Extract all links that end with the .js extension in an HTML page

I want to extract all links that end with .js within an HTML page. I am able to fetch links that are within script tags,
but how can I fetch links from properties like {"yui":"http://l.yimg.com/nn/lib/metro/g/uicontrib/yui/yui_3.4.1.js"}?
I want to do this in PHP.

A simple PHP HTML DOM parser written in PHP5+, supports invalid HTML, and provides a very easy way to handle HTML elements. Find tags on an HTML page with selectors just like jQuery. Extract contents from HTML in a single line.
Here is the link to get it: http://sourceforge.net/projects/simplehtmldom/
...and here is the official web site: http://simplehtmldom.sourceforge.net/
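Simple HTML DOM is a separate download. As a rough sketch of the same idea using only PHP's built-in DOM extension (swapped in here, not the library above), the function below lists the src of every script tag whose src ends in .js. The function name is made up for illustration, and this only covers script tags, not JSON properties.

```php
<?php
// Sketch with PHP's built-in DOM extension: return the src of every
// <script> element whose src attribute ends in ".js".
function scriptSrcsEndingInJs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath 1.0 has no ends-with(), so compare the last three characters.
    $query = '//script[@src and substring(@src, string-length(@src) - 2) = ".js"]';
    $srcs = [];
    foreach ($xpath->query($query) as $script) {
        $srcs[] = $script->getAttribute('src');
    }
    return $srcs;
}
```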

For basic HTML elements you can use http://code.google.com/p/phpquery/ to parse DOM content (it handles jQuery-like CSS selectors and functions such as attr and find). Here is an example of how to use selectors with phpQuery: http://code.google.com/p/phpquery/wiki/Selectors.
For properties, it depends:
If they are inside JavaScript or some other text, use some kind of regular expression.
If they are in data attributes and you know the attribute name, you can read that JSON string and simply run PHP's json_decode function on it.
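A minimal sketch of both options, assuming the URLs are absolute (http or https) and, for the second function, that you pass in the name of the data attribute holding the JSON. The function names and the data-config attribute in the example are hypothetical.

```php
<?php
// Option 1: regex over the raw HTML. Finds absolute http(s) URLs ending in
// ".js" anywhere in the text: <script src>, inline JavaScript, JSON blobs.
// Relative URLs such as "/js/app.js" are NOT matched by this sketch.
function extractJsUrls(string $html): array
{
    // JSON often escapes "/" as "\/", so normalise that first.
    $text = str_replace('\\/', '/', $html);
    // Lazy match up to ".js", optional query string, then require a quote,
    // whitespace, angle bracket or end of text, so ".json" is skipped.
    $pattern = '~https?://[^\s"\'<>]+?\.js(?:\?[^\s"\'<>]*)?(?=["\'\s<>]|$)~i';
    preg_match_all($pattern, $text, $matches);
    return array_values(array_unique($matches[0]));
}

// Option 2: the JSON lives in an attribute whose name you know.
// Parse the HTML, json_decode the attribute, keep string values ending in ".js".
function jsUrlsFromDataAttribute(string $html, string $attribute): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // don't warn about imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@' . $attribute . ']') as $node) {
        $data = json_decode($node->getAttribute($attribute), true);
        if (!is_array($data)) {
            continue; // attribute did not hold a JSON object/array
        }
        // Walk nested arrays too, since config objects are often nested.
        array_walk_recursive($data, function ($value) use (&$urls) {
            if (is_string($value) && preg_match('~\.js$~i', $value)) {
                $urls[] = $value;
            }
        });
    }
    return $urls;
}
```

The regex version catches both cases in the question at once, but it is only as reliable as the pattern; the json_decode version is stricter and needs the attribute name up front.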

Related

How can I update an element property using simple_html_dom

I'd like to update the 'src' attribute of an img tag using Simple HTML DOM. I've got this at the top of the PHP file (join.php) that contains the img tag:
include_once("simplehtmldom/simple_html_dom.php");
$htmldomOb = file_get_html('join.php');
$htmldomOb->find('img[id=imgtapchat]', 0)->src = './tapchat/clss_tapcht-1.php';
echo $htmldomOb;
This works, but it outputs the entire page again, since I read the entire page into a DOM object. How can I update just the image src, similar to how it is done in jQuery? As the SimpleHTMLDOM site docs say:
Find tags on an HTML page with selectors just like jQuery
With thanks

I used the PHP DOM extension to rewrite PHP Simple HTML DOM and have just finished. You can try it here:
http://shinbonlin.github.io/html-parser/
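PHP runs on the server, so it cannot change an element in a page the browser already has the way jQuery does; it can only change the markup before sending it. If the goal is just to avoid echoing the whole page again, one option, sketched here with PHP's built-in DOM extension rather than Simple HTML DOM or the parser linked above, is to serialise only the modified node, since DOMDocument::saveHTML() accepts a node argument. The function name is illustrative.

```php
<?php
// Sketch using PHP's built-in DOM extension: set a new src on <img id="...">
// and return only that <img> element's HTML, not the whole page.
function updateImgSrc(string $html, string $id, string $newSrc): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $img = $xpath->query('//img[@id="' . $id . '"]')->item(0);
    if ($img === null) {
        return ''; // no image with that id
    }
    $img->setAttribute('src', $newSrc);

    // Passing a node to saveHTML() serialises just that node (PHP 5.3.6+).
    return $doc->saveHTML($img);
}
```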

Simple HTML DOM can't see all hrefs

I'm trying to retrieve the YouTube link from a certain site. But when I use the Simple HTML DOM parser, it can't find the links I'm looking for.
$new_html = file_get_html("https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/");
foreach ($new_html->find('href') as $youtube) {
    echo $youtube;
}
It should find the link: https://www.youtube.com/watch?v=vJ2aNG0aJPU
Does anyone know what the problem is here?

That particular link is inserted by JavaScript, via onYouTubeIframeAPIReady("vJ2aNG0aJPU"), during the onload event.
SimpleHtmlDom (or any other PHP based HTML parser for that matter) will not execute any JavaScript. They just parse the markup returned by the webserver.
You'd need a scraper capable of executing JavaScript before you can scrape it. Alternatively, you can match the argument to that function and assemble the link yourself.
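A sketch of the second option, assuming the raw HTML still contains a call of the form onYouTubeIframeAPIReady("VIDEO_ID") with the ID in quotes (the page may have changed since). The function name is illustrative.

```php
<?php
// Sketch: pull the video ID out of a JavaScript call such as
// onYouTubeIframeAPIReady("vJ2aNG0aJPU") in the raw HTML and build the
// watch URL yourself. Assumes the ID is a quoted string of letters,
// digits, "_" or "-".
function youtubeLinkFromHtml(string $html): ?string
{
    $pattern = '~onYouTubeIframeAPIReady\(\s*["\']([\w-]+)["\']\s*\)~';
    if (preg_match($pattern, $html, $m)) {
        return 'https://www.youtube.com/watch?v=' . $m[1];
    }
    return null; // the call (or its argument) is not in this markup
}

// Usage (needs network access, and the page markup may have changed):
// $html = file_get_contents('https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/');
// echo youtubeLinkFromHtml($html);
```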
On a side note: $new_html->find('href') will try to find any elements named "href", which is obviously wrong. To get all href attributes for any element, you'd have to use *[href] instead.
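For comparison, a sketch of the same "any element with an href" query using PHP's built-in DOM extension instead of Simple HTML DOM, where the XPath //*[@href] plays the role of *[href]. Like any server-side parser, it still won't see the YouTube link, because that link is added by JavaScript. The function name is illustrative.

```php
<?php
// Sketch: collect every href attribute on any element. With Simple HTML DOM
// the answer's fix is $new_html->find('*[href]'); here the XPath "//*[@href]"
// does the same job using PHP's built-in DOM extension.
function allHrefs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href]') as $element) {
        $hrefs[] = $element->getAttribute('href');
    }
    return $hrefs;
}
```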
On another side note: SimpleHtmlDom is a crap library. Consider your options:
How do you parse and process HTML/XML in PHP?

What's the best way to remove some divs (with selectors) from an HTML string using PHP?

I have the following text in a PHP variable:
<div class="keep">Content</div>
<div class="remove">Content</div>
<div class="keep">Content</div>
<div id="remove">Content</div>
I need to remove the div with id="remove" and the one with class="remove" using PHP. The HTML could be more complex; basically, I need to target a div with jQuery-style selectors and then remove it along with its content. Thanks.

This post has some decent links to DOM manipulation packages for PHP. Also, if you Google "php dom manipulation", you can find other resources such as querypath.org. Some of these packages use selectors that you may be more familiar with.
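If you would rather not add a package, PHP's built-in DOM extension can do this with XPath, which is a different selector syntax from jQuery's. A minimal sketch, assuming the input is an HTML fragment like the one in the question; the wrapper id __wrap and the function name are made up for illustration.

```php
<?php
// Sketch: remove div#remove and every div whose class list contains
// "remove", then return the remaining markup of the fragment.
function removeDivs(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    // Wrap the fragment so it has a single root element, and stop libxml
    // from adding <html>/<body> and a doctype around it.
    $doc->loadHTML('<div id="__wrap">' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath equivalents of the CSS selectors div#remove and div.remove.
    $query = '//div[@id="remove"] | //div[contains(concat(" ", normalize-space(@class), " "), " remove ")]';
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialise the wrapper's children only, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```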

I want to get data from another website and display it on mine, but with my style.css

My school has a very annoying way to view my timetable:
you have to click through 5 links to get to it.
This is the link for my class (the timetable updates weekly, but the link stays the same):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my
own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe.

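I think this can be done better with jQuery and Ajax. You can have jQuery load the target page, use selectors to extract what you need, and then attach it to your document tree. You should then be able to style it any way you like. Note that the browser's same-origin policy will block this request unless the target site allows cross-origin requests (CORS) or you route the request through your own server.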

I would recommend using the cURL library: http://www.php.net/manual/en/curl.examples.php
But you have to extract the part of the page you want to display, because you will get the whole HTML document.
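A minimal sketch of "fetch with cURL, then keep one part", assuming the part you want is an element with a known id; the id timetable in the usage comment and both function names are hypothetical. Also note that everything after # in a URL (like the #Timetable?... part of the question's link) is never sent to the server, so a server-side fetch cannot select a specific timetable that way.

```php
<?php
// Sketch: fetch a page with cURL, then keep only one element of it.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

// Return the outer HTML of the element with the given id ('' if absent).
function extractById(string $html, string $id): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    return $node === null ? '' : $doc->saveHTML($node);
}

// Usage (needs network access; "timetable" is a hypothetical id):
// echo extractById(fetchPage('https://example.com/'), 'timetable');
```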

You'd probably read the whole page into a string variable (using file_get_contents as you mentioned, for example) and parse the content. Here are some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
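A sketch of the DOMDocument route, assuming you want to drop the remote page's own styling entirely and hook your style.css onto its tables through a class of your choosing; the function name and the class name passed in are hypothetical.

```php
<?php
// Sketch: strip the remote page's own styling so your style.css takes over.
// $tableClass is a hypothetical class name defined in your own stylesheet.
function restyle(string $html, string $tableClass): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Remove the remote <style> blocks and linked stylesheets.
    foreach (iterator_to_array($xpath->query('//style | //link[@rel="stylesheet"]')) as $node) {
        $node->parentNode->removeChild($node);
    }
    // Drop inline style="..." and the remote class="..." attributes.
    foreach ($xpath->query('//*[@style]') as $el) {
        $el->removeAttribute('style');
    }
    foreach ($xpath->query('//*[@class]') as $el) {
        $el->removeAttribute('class');
    }
    // Hook your own CSS onto the tables.
    foreach ($xpath->query('//table') as $table) {
        $table->setAttribute('class', $tableClass);
    }
    return $doc->saveHTML();
}
```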

Creating a personalization engine with PHP

I am new to PHP and I want to create a PHP engine that changes the content of a web page using data in MySQL, for example changing the order of the navigation links so that the most-clicked links come first. I am not sure how PHP will read the HTML file and change the elements in the HTML file and also output the HTML file with the changes. Is this possible?

I am not quite sure why you would want to generate the html, read it, change it and then output it. It seems to be a lot easier to just generate it the way you want to in the first place.
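A minimal sketch of generating the navigation in the right order in the first place. The $links array stands in for rows you would read from MySQL (the url, label and clicks keys are hypothetical); with a real query you could also let ORDER BY clicks DESC do the sorting.

```php
<?php
// Sketch: print the navigation links sorted by click count, highest first.
// Each $links entry is a hypothetical row: ['url' => ..., 'label' => ..., 'clicks' => ...].
function renderNav(array $links): string
{
    usort($links, function (array $a, array $b) {
        return $b['clicks'] <=> $a['clicks']; // most-clicked first
    });
    $html = "<ul class=\"nav\">\n";
    foreach ($links as $link) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($link['url']),
            htmlspecialchars($link['label'])
        );
    }
    return $html . "</ul>\n";
}

echo renderNav([
    ['url' => '/about', 'label' => 'About', 'clicks' => 12],
    ['url' => '/news', 'label' => 'News', 'clicks' => 40],
    ['url' => '/contact', 'label' => 'Contact', 'clicks' => 7],
]);
```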
I am not sure how PHP will read the HTML file and change the elements in the HTML file and also output the HTML file with the changes. Is this possible?
You could use file_get_contents:
$html = file_get_contents($url);
Then use an HTML parser such as Simple HTML DOM Parser, change what you want, and output it.
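A sketch of that read, modify, output approach using PHP's built-in DOM extension instead of Simple HTML DOM. It assumes the navigation is a ul with id="nav" containing li/a items and that $clicks maps each href to its click count; both the markup and the function name are hypothetical.

```php
<?php
// Sketch: reorder the <li> items of <ul id="nav"> by click count.
// $clicks maps href => count; links without an entry count as 0.
function reorderNav(string $html, array $clicks): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $ul = $xpath->query('//ul[@id="nav"]')->item(0);
    if ($ul === null) {
        return $html; // nothing to reorder
    }

    // Copy the <li> nodes into a plain array so they can be sorted.
    $items = iterator_to_array($xpath->query('./li', $ul));
    $countFor = function (DOMElement $li) use ($xpath, $clicks): int {
        $a = $xpath->query('.//a', $li)->item(0);
        $href = $a ? $a->getAttribute('href') : '';
        return $clicks[$href] ?? 0;
    };
    usort($items, function ($x, $y) use ($countFor) {
        return $countFor($y) <=> $countFor($x); // highest count first
    });

    // appendChild() moves an existing node, so this re-inserts them in order.
    foreach ($items as $li) {
        $ul->appendChild($li);
    }
    return $doc->saveHTML();
}
```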

If you want to modify the HTML structure, use ganon, an HTML DOM parser for PHP:
include('path/ganon.php');
// Parse the google code website into a DOM
$html = file_get_dom('http://code.google.com/');
foreach($html('p[class]') as $element) {
    echo $element->class, "<br>\n";
}
