I'm trying to parse an HTML page where most of the content is generated by JavaScript. In the Chrome developer tools I can see that the div I want to grab content from has the class doodle-image. However, when I either view the page source or try to fetch it with PHP:
<?php
include_once('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('http://www.google.com/doodles/finder/2012/All%20doodles');
$doodles = $html->find('.doodle-image');
echo $html;
?>
It returns the frame of the page but contains none of the divs or content. How can I grab the full content of the page?
That's because the element is empty when your PHP client fetches it. Google loads a JSON object with JavaScript to populate the list of doodles. It makes an Ajax request to this page, and you can probably do the same.
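If the list really does come from a JSON endpoint, PHP can request that endpoint and decode the response without parsing any HTML. The following is a minimal sketch, assuming the response is a JSON array of objects. The field names `title` and `url`, the sample payload, and the commented-out URL are made up for illustration; the real ones have to be read from the request shown in the browser's Network tab.

```php
<?php
// Decode a JSON payload and pull one field from every entry.
// The field name is an assumption; check the real response first.
function extractField(string $json, string $field): array
{
    $items = json_decode($json, true);
    if (!is_array($items)) {
        return []; // not valid JSON, or not an array/object
    }
    $values = [];
    foreach ($items as $item) {
        if (is_array($item) && isset($item[$field])) {
            $values[] = $item[$field];
        }
    }
    return $values;
}

// Hypothetical sample standing in for the real Ajax response.
$sample = '[{"title":"Doodle A","url":"/a.png"},{"title":"Doodle B","url":"/b.png"}]';
print_r(extractField($sample, 'url'));

// Against a live endpoint (placeholder URL):
// $json = file_get_contents('http://example.com/doodles.json');
// print_r(extractField($json, 'url'));
```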
I've just started learning PHP. How can I find a video's source URL with PHP? Can someone show me how to get the video URL from a website using simple_html_dom?
Thank you
First of all, you need to understand what you get when you fetch an HTML document from a site. When you fetch an HTML document with PHP, you get the markup exactly as the server sends it; none of the page's JavaScript is run. So if content is loaded into the page later by an Ajax call like the one below, you can't get it from the fetched document.
function loadVideoURL() {
    var xhttp = new XMLHttpRequest();
    xhttp.onreadystatechange = function() {
        // readyState 4 means the response has been fully received
        if (this.readyState == 4 && this.status == 200) {
            document.getElementById("videosrc").innerHTML = this.responseText;
        }
    };
    xhttp.open("GET", "get_video_url.php", true);
    xhttp.send();
}
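To see why the fetched document lacks that content, the sketch below parses the HTML as a server might send it, before loadVideoURL() has run. The markup is a made-up sample; only the element id videosrc comes from the script above. It uses PHP's built-in DOM extension so that it runs without any extra library.

```php
<?php
// HTML as delivered by the server: the container exists but is empty,
// because the script that fills it only runs in a browser.
$served = '<html><body><div id="videosrc"></div>'
        . '<script>loadVideoURL();</script></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($served);

$xpath = new DOMXPath($doc);
$node  = $xpath->query('//*[@id="videosrc"]')->item(0);

// PHP never executes the page's JavaScript, so the div stays empty.
$content = trim($node->textContent);
echo $content === '' ? "videosrc is empty\n" : $content . "\n";
```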
For the rest of this answer, assume that the video URLs are included in the HTML document from the start.
First, download the latest version of simplehtmldom from here. It is a PHP class that makes it easy to manipulate HTML, and it requires PHP 5 or later. Save it as a file named simplehtmldomparser.php in the main directory of your project and include it in your main PHP code as shown below.
<?php
include 'simplehtmldomparser.php';
?>
This loads the class into your main PHP file. Next, fetch the page that contains the video URL.
<?php
$html = file_get_html('http://www.videos.com/');
/* Fetch the HTML document from the given URL. */
?>
Now you have the HTML document in the $html variable. Next, find the video tags in it. For example, if the fetched page uses a <video> tag for videos, you can get the video URLs as follows.
<?php
foreach($html->find('video') as $element){
echo $element->src . '<br>';
}
?>
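Many pages put the URL on <source> elements nested inside <video> rather than on the <video> tag itself, in which case $element->src is empty. The sketch below covers both cases. Note that it swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, and the sample markup and the function name findVideoSources are made up for illustration.

```php
<?php
// Collect candidate video URLs from an HTML string.
function findVideoSources(string $html): array
{
    libxml_use_internal_errors(true); // HTML5 tags can trigger libxml warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $urls = [];
    // src on <video> itself, or on any <source> nested inside it
    foreach ($xpath->query('//video/@src | //video//source/@src') as $attr) {
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

// Hypothetical markup standing in for a fetched page.
$sample = '<video src="intro.mp4"></video>'
        . '<video><source src="clip.mp4"><source src="clip.webm"></video>';
print_r(findVideoSources($sample));
```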
Here is another example that might be useful for your project. Suppose you want to move on to another video on the same site: fetch all <a> tags, read their hrefs, and call $html = file_get_html($newlyfetchedanchor); again with one of them. To fetch all <a> tags from the current $html, use the code below.
<?php
foreach($html->find('a') as $element){
echo $element->href;
}
?>
The class has more useful functions, which you can find here.
require('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('https://www7.fmovies.se/film/hometown-hero.m2r28/6xpjrp');
foreach($html->find('div[id=player]') as $div)
{
foreach($div->find('iframe') as $iframe)
{
echo $iframe->src;
}
}
This is my code and as you see I'm trying to get the src of the iframe under the player div using PHP Simple HTML DOM Parser, can you explain to me why I'm getting a blank page as a result?
Thanks!
UPDATE: After using a javascript switcher addon and disabling javascript, I noticed that the iframe I'm looking for is not loaded. What should I do to get the iframe src?
There are two possible solutions:
1. Try to figure out how the JavaScript works, and mimic this behavior in your PHP script.
2. Load the page in a real browser driven by, for example, Selenium, and then grab the src from the iframe using Selenium (https://www.seleniumhq.org/).
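For the first option, the browser's Network tab usually shows which URL the page's script requests. A hedged sketch of replaying such a request from PHP follows. The function names, the placeholder URLs, and the idea that the site checks the X-Requested-With and Referer headers are all assumptions that have to be confirmed for each site; some sites also require cookies or tokens that this sketch does not handle.

```php
<?php
// Build stream-context options that imitate a browser Ajax request.
function ajaxContextOptions(string $referer): array
{
    return [
        'http' => [
            'method'  => 'GET',
            'header'  => "X-Requested-With: XMLHttpRequest\r\n"
                       . "Referer: {$referer}\r\n",
            'timeout' => 10,
        ],
    ];
}

// Fetch a URL the way the page's script might; returns null on failure.
function fetchAsAjax(string $url, string $referer): ?string
{
    $context = stream_context_create(ajaxContextOptions($referer));
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Placeholder URLs: substitute the request seen in the Network tab.
// $body = fetchAsAjax('https://example.com/ajax/episode?id=123',
//                     'https://example.com/film/123');
```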
Hope this helps
I want to load a webpage and extract some items from it. I'm using php. This is my code
<?php
$html = file_get_contents('www.website.com');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
echo $html;
}
?>
When I compare the source code of www.website.com with the HTML that I loaded, I see that some tags have changed. For example, there is
<span class="str">
in the www.website.com source code, which changes to
<h5 class="item-subtitle">
in my loaded HTML. What is the reason, and how can I correct it?
Some websites change the markup with JavaScript. When you load a site's markup with file_get_contents, you do not run its JavaScript; you only retrieve the HTML. Compare your retrieved HTML with "View page source" in your browser. Those should be the same.
Hopefully you can parse what you need from that HTML. If not, you will have to contact the owner of the website and use their API, if they supply one.
I'm attempting to create a page where I input a url and the PHP code uses that to pull page elements from another website to be displayed on my blog post. I haven't even made it as far as the form, right now I just need to understand how to get this code to work so that it displays the page elements within the div with the class "products-grid first odd".
<?php
$homepage = file_get_contents('website');
$dochtml = new DOMDocument();
$dochtml->loadHTML($strhtml);
$dochtml->getElementsByClassName('products-grid first odd');
echo ????
?>
PHP's DOMDocument class does not have a getElementsByClassName() method.
Instead, you would have to call getElementsByTagName(), loop through the resulting DOMElements, and check getAttribute('class') on each until you find the right one.
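An alternative to looping by hand is an XPath query through DOMXPath, which is part of the same DOM extension. The sketch below is one way to do it, assuming at least one class name is passed and that class names contain no quote characters. The helper name findByClasses and the sample markup are made up for illustration.

```php
<?php
// Find elements carrying every class in $classes, in any order.
// Assumes $classes is non-empty and contains plain class names.
function findByClasses(DOMDocument $doc, array $classes): DOMNodeList
{
    $conditions = [];
    foreach ($classes as $class) {
        // Pad with spaces so "odd" does not match "oddball".
        $conditions[] = sprintf(
            'contains(concat(" ", normalize-space(@class), " "), " %s ")',
            $class
        );
    }
    $xpath = new DOMXPath($doc);
    return $xpath->query('//*[' . implode(' and ', $conditions) . ']');
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Made-up markup standing in for the fetched page.
$doc->loadHTML(
    '<ul class="products-grid first odd"><li>Item</li></ul>'
    . '<ul class="products-grid"><li>Other</li></ul>'
);

foreach (findByClasses($doc, ['products-grid', 'first', 'odd']) as $node) {
    echo $doc->saveHTML($node), "\n";
}
```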
I'm trying to 'iframe' a div using PHP / DOM instead of showing the whole page, but I am having difficulties because the div contains a custom Google map, and even when I try to show the entire page I am not able to make the map load. How can I do this?
Link : http://satbeams.com/footprints?beam=5491
Div Id : "map_container"
What I have tried so far :
<?php
set_time_limit(0);
ini_set('memory_limit', '-1');
ini_set('display_errors',false);
include 'includes/dom.php';
$html = file_get_contents('http://satbeams.com/footprints?beam=5491');
$map = $html->find('div[map_container]');
echo $map;
?>
Thanks
file_get_contents returns a plain string, not a DOM object you can manipulate, so calling $html->find() on it cannot work. Have a look at http://php.net/manual/en/domdocument.loadhtml.php
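A minimal sketch of that approach: parse the string with DOMDocument::loadHTML first, then look the element up by its id. The helper name findById and the sample markup are made up for illustration; on the real page the div is filled in by JavaScript, so expect it to come back empty or nearly so.

```php
<?php
// file_get_contents() gives a string; parse it before querying.
function findById(string $html, string $id): ?DOMElement
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $node = $xpath->query(sprintf('//*[@id="%s"]', $id))->item(0);
    return $node instanceof DOMElement ? $node : null;
}

// Made-up markup standing in for the fetched page.
$html = '<div id="map_container"><p>placeholder</p></div>';
$map = findById($html, 'map_container');
echo $map ? $map->ownerDocument->saveHTML($map) : 'not found';
```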
A Google Map is mainly rendered via a JavaScript API, so there isn't any existing iframe in the HTML markup that you could extract. You would have to parse the JSON data included in the page and rebuild the map.
Be sure to not violate any copyrights!