Simple HTML DOM can't see all hrefs - PHP

I'm trying to retrieve the YouTube link from a certain site, but the Simple HTML DOM parser can't find the link I'm looking for.
$new_html = file_get_html("https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/");
foreach ($new_html->find('href') as $youtube) {
echo $youtube;
}
It should find the link https://www.youtube.com/watch?v=vJ2aNG0aJPU.
Does anyone know what the problem is here?

That particular link is inserted by JavaScript: onYouTubeIframeAPIReady("vJ2aNG0aJPU") is called during the onload event.
SimpleHtmlDom (or any other PHP-based HTML parser, for that matter) will not execute any JavaScript. These parsers only parse the markup returned by the web server.
You'd need a scraper capable of executing JavaScript before you can scrape it. Alternatively, you can match the argument to that function and assemble the link yourself.
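Matching the argument and assembling the link could look roughly like the sketch below. It assumes the call appears literally in the returned markup as onYouTubeIframeAPIReady("...") with an 11-character video ID; the helper name extractYoutubeUrl is made up for illustration, and the sample markup is invented, not copied from the site.

```php
<?php
// Hypothetical helper: find the video ID passed to onYouTubeIframeAPIReady()
// in the raw markup and build the watch URL from it.
function extractYoutubeUrl(string $html): ?string
{
    // Assumption: the ID is a quoted 11-character string of letters, digits, "_" or "-".
    $pattern = '/onYouTubeIframeAPIReady\(\s*["\']([A-Za-z0-9_-]{11})["\']\s*\)/';
    if (preg_match($pattern, $html, $m)) {
        return 'https://www.youtube.com/watch?v=' . $m[1];
    }
    return null; // the call was not found in the markup
}

// Invented sample markup; in practice $html would come from file_get_contents($url).
$html = '<body onload=\'onYouTubeIframeAPIReady("vJ2aNG0aJPU")\'>';
echo extractYoutubeUrl($html), "\n";
```

If the site changes how it calls that function, the pattern will need adjusting.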
On a side note: $new_html->find('href') will try to find elements named "href", which is not what you want. To select every element that has an href attribute, use the selector *[href] instead.
On another side note: SimpleHtmlDom is a crap library. Consider your options:
How do you parse and process HTML/XML in PHP?
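As one of those options, PHP's built-in DOM extension can collect every href attribute without a third-party library. This is only a sketch: collectHrefs is a hypothetical name, the sample markup is invented, and the XPath expression //*[@href] plays the role of the *[href] selector mentioned above.

```php
<?php
// Hypothetical helper: return the href value of every element that has one.
function collectHrefs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $hrefs = [];
    foreach ($xpath->query('//*[@href]') as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// Invented sample; a real page would be fetched first.
$sample = '<html><head><link href="style.css"></head>'
        . '<body><a href="/a">A</a><p>no link</p></body></html>';
print_r(collectHrefs($sample));
```

Like any PHP parser, this only sees links present in the markup the server returns, not ones added later by JavaScript.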

Related

I want to get data from another website and display it on mine, but with my own style.css

So my school has a very annoying way to view my timetable.
You have to click through 5 links to get to it.
This is the link for my class (it updates weekly without the link changing):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my
own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe....
I think this can be done better using jQuery and Ajax. You can get jQuery to load the target page, use selectors to pick out what you need, then attach it to your document tree. You should then be able to style it any way you like.
I would recommend using the cURL library: http://www.php.net/manual/en/curl.examples.php
But you have to extract the part of the page you want to display, because you will get the whole HTML document.
You'd probably read the whole page into a string variable (using file_get_contents like you mentioned for example) and parse the content, here you have some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
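A minimal sketch of the DOM-walking route, assuming the part you want is the first element with a given tag name (a table here). The function name extractUnstyled and the sample markup are made up for illustration; the real timetable page may well build its content with JavaScript, in which case the fetched markup will not contain it.

```php
<?php
// Hypothetical helper: return the first <$tagName> element as HTML,
// with all style and class attributes removed so your own CSS applies.
function extractUnstyled(string $html, string $tagName): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $node = $doc->getElementsByTagName($tagName)->item(0);
    if ($node === null) {
        return null; // no such element in the page
    }

    // Strip presentation attributes from the element and everything inside it.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('descendant-or-self::*[@style or @class]', $node) as $el) {
        $el->removeAttribute('style');
        $el->removeAttribute('class');
    }
    return $doc->saveHTML($node);
}

// Invented sample; in practice $page = file_get_contents($url) or a cURL call.
$page = '<div class="wrap"><table style="color:red" class="grid">'
      . '<tr><td class="c">Math</td></tr></table></div>';
echo extractUnstyled($page, 'table'), "\n";
```

After that you can add your own class names to the returned fragment or style it through a wrapping element.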

Extract all links that end with the .js extension in an HTML page

I want to extract all links that end with .js within an HTML page. I am able to fetch links that are within script tags,
but how can I fetch links from properties like {"yui":"http://l.yimg.com/nn/lib/metro/g/uicontrib/yui/yui_3.4.1.js"}?
I want to do this in PHP.
A simple PHP HTML DOM parser written in PHP5+, supports invalid HTML, and provides a very easy way to handle HTML elements. Find tags on an HTML page with selectors just like jQuery. Extract contents from HTML in a single line.
Here is the link to get it: http://sourceforge.net/projects/simplehtmldom/
...and here is the official web site: http://simplehtmldom.sourceforge.net/
For basic HTML elements you can use http://code.google.com/p/phpquery/ to parse DOM content (it handles jQuery-like CSS selectors and functions such as attr and find). Here is an example of how to use selectors with phpQuery: http://code.google.com/p/phpquery/wiki/Selectors.
For properties, it depends:
Some kind of regexp if they are inside JavaScript code or something else,
If they are in data attributes and you know the attribute names, then you can get that JSON string and simply run PHP's json_decode function on it.
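Both cases can be sketched as below. The function names are hypothetical, and the regular expression assumes absolute http(s) URLs written with plain slashes (JSON that escapes slashes as \/ would need to be decoded first).

```php
<?php
// Hypothetical helper: scan any text (markup, inline scripts) for URLs ending in .js.
function findJsUrlsInText(string $text): array
{
    // Assumption: URLs are absolute and end at a quote, whitespace, bracket, "?" or "#".
    preg_match_all('~https?://[^\s"\'<>]+?\.js(?=["\'\s<>?#]|$)~i', $text, $matches);
    return array_values(array_unique($matches[0]));
}

// Hypothetical helper: decode a JSON string and collect every string value ending in .js.
function findJsUrlsInJson(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return []; // not valid JSON, or not an object/array
    }
    $urls = [];
    array_walk_recursive($data, function ($value) use (&$urls) {
        if (is_string($value) && preg_match('/\.js$/i', $value)) {
            $urls[] = $value;
        }
    });
    return $urls;
}

$json = '{"yui":"http://l.yimg.com/nn/lib/metro/g/uicontrib/yui/yui_3.4.1.js"}';
$page = '<script src="http://example.com/a.js"></script><script>var cfg = ' . $json . ';</script>';

print_r(findJsUrlsInText($page));
print_r(findJsUrlsInJson($json));
```

The regexp route is the cruder of the two, so prefer json_decode whenever you can isolate the JSON string.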

Creating a personalization engine with PHP

I am new to PHP and I want to create a PHP engine that changes the content of a webpage using data in MySQL (for example, ordering the navigation links on a page by highest click count). I am not sure how PHP will read the HTML file and change the elements in the HTML file and also output the HTML file with the changes. Is this possible?
I am not quite sure why you would want to generate the html, read it, change it and then output it. It seems to be a lot easier to just generate it the way you want to in the first place.
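To illustrate generating it the right way in the first place, here is a rough sketch that sorts navigation links by click count before rendering them. The array shape (label, url, clicks) and the function name renderNav are assumptions; in a real application the rows would come from a MySQL query.

```php
<?php
// Hypothetical helper: render a <ul> of links, most-clicked first.
function renderNav(array $links): string
{
    // Sort descending by click count.
    usort($links, function (array $a, array $b) {
        return $b['clicks'] <=> $a['clicks'];
    });

    $items = '';
    foreach ($links as $link) {
        $items .= sprintf(
            '<li><a href="%s">%s</a></li>',
            htmlspecialchars($link['url']),
            htmlspecialchars($link['label'])
        );
    }
    return '<ul>' . $items . '</ul>';
}

// Invented rows standing in for a database result.
$links = [
    ['label' => 'Home', 'url' => '/',     'clicks' => 3],
    ['label' => 'Shop', 'url' => '/shop', 'clicks' => 10],
];
echo renderNav($links), "\n";
```

With a database you could also let the query do the sorting (ORDER BY on the click column) and skip usort entirely.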
I am not sure how PHP will read the HTML file and change the elements in the HTML file and also output the HTML file with the changes. Is this possible?
You could use file_get_contents:
$html = file_get_contents($url);
Then use an HTML parser like Simple HTML DOM Parser, make the changes you want and output the result.
If you want to modify HTML structure, use ganon - HTML DOM parser for PHP
include('path/ganon.php');
// Parse the google code website into a DOM
$html = file_get_dom('http://code.google.com/');
foreach($html('p[class]') as $element) {
echo $element->class, "<br>\n";
}

Get contents of a div from a URL [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How to implement a web scraper in PHP?
How to parse and process HTML with PHP?
I need to crawl through a page and get the contents of a particular div. I have php and javascript as my two main options. How can it be done?
There are many ways to get the contents of a URL:
First Method:
http://simplehtmldom.sourceforge.net/
Simple HTML DOM Parser
Second Method:
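<?php
$contents = file_get_contents("http://www.url.com");
// Keep only the <div> tags so the pattern below has less markup to skip.
$contents = strip_tags($contents, "<div>");
// Capture the text inside each non-nested <div>; the texts land in $file_contents[1].
preg_match_all("/<div[^>]*>([^<]*)<\/div>/is", $contents, $file_contents);
?>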
Third Method:
You can use jQuery-like selectors:
http://api.jquery.com/category/selectors/
This is quite a basic way to do it in PHP, and it returns the content as plain text. However, you may need to revise the regex for your particular case.
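<?php
$link = file_get_contents("http://www.domain.com");
// Remove every tag except <div>.
$file = strip_tags($link, "<div>");
// Capture the text inside each non-nested <div>; the texts land in $content[1].
preg_match_all("/<div[^>]*>([^<]*)<\/div>/is", $file, $content);
print_r($content);
?>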
You can use Simple HTML DOM Parser, as documented here: http://simplehtmldom.sourceforge.net/manual.htm
It requires PHP5+, but the nice thing is you can find tags on an HTML page with selectors just like jQuery.
Specifically with jQuery, if you have a div like the following:
<div id="cool_div">Some content here</div>
You could use jQuery to get the contents of the div like this:
$('#cool_div').text(); // will return text version of contents...
$('#cool_div').html(); // will return HTML version of contents...
If you're using PHP to generate the content of the page, then you should be able to get a decent handle on the content and manipulate it even before it's returned to the screen and displayed. Hope this helps!
Using PHP, you can try the DOMDocument class and its getElementById() or getElementsByTagName() methods.
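A short sketch of the DOMDocument approach, using the cool_div example from above. The function name getDivContents is hypothetical; for a live page you would first load the markup with file_get_contents($url) or cURL.

```php
<?php
// Hypothetical helper: return the text content of the element with the given id.
function getDivContents(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings about sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $div = $doc->getElementById($id);
    return $div !== null ? $div->textContent : null;
}

$html = '<div id="cool_div">Some content here</div>';
echo getDivContents($html, 'cool_div'), "\n"; // prints "Some content here"
```

To get the inner markup rather than plain text, you could pass each child node of the element to $doc->saveHTML() and concatenate the results.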

Getting the first URL of an image search result with google image API in PHP

Do you know of a PHP script (a class would be nice) that gets the URL of the first image result of a Google image search via the API? Thanks
Example.
<?php echo(geturl("searchterm")) ?>
I have found a solution to get the first image from a Google Images result using Simple HTML DOM, as Sarfraz suggested.
Kindly check the below code. Currently it is working fine for me.
// urlencode() turns spaces into "+" and also escapes characters such as "&" and "#".
$search_keyword = urlencode($search_keyword);
$newhtml = file_get_html("https://www.google.com/search?q=" . $search_keyword . "&tbm=isch");
$result_image_source = $newhtml->find('img', 0)->src;
echo '<img src="'.$result_image_source.'">';
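The same idea can be sketched without Simple HTML DOM, using PHP's built-in DOM extension. Both helper names below are made up, and the sketch assumes the first img tag in the returned markup is the first result, which depends entirely on how Google structures the page at the time and may not hold.

```php
<?php
// Hypothetical helper: build the image search URL for a keyword.
function buildImageSearchUrl(string $keyword): string
{
    $query = http_build_query(['q' => $keyword, 'tbm' => 'isch'], '', '&');
    return 'https://www.google.com/search?' . $query;
}

// Hypothetical helper: return the src of the first <img> in the markup, if any.
function firstImageSrc(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }
    return $img->getAttribute('src');
}

echo buildImageSearchUrl('red apple'), "\n";

// Invented sample markup standing in for a fetched results page.
$sample = '<html><body><img src="https://example.com/first.jpg"><img src="/second.jpg"></body></html>';
echo firstImageSrc($sample), "\n";
```

A geturl("searchterm") wrapper as asked for in the question would fetch buildImageSearchUrl($term) and pass the response to firstImageSrc().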
You should be able do that easily with Simple HTML DOM.
Note: See the examples on their site for more information.
An HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way!
Find tags on an HTML page with selectors just like jQuery.
