Using cURL or DOM to webscrape


I've been working on this for about four hours and have been all over the internet trying to understand it, so please be gentle.
I'd like to display a div from an external source on my PHP page. I've tried using file_get_dom, simplexml_load_file, and file_get_contents with preg_match_all, then printing the results on my page, but none of them work. From what I have seen, cURL is over my head and I can't understand any of it, but I've been told it is the best way to do it. Every approach results in various errors, when all I want is to grab the contents of an external div. What should I do?
For example, I'd like to scrape the div with id='hmenus' on this page and then display it on my local page.
Thanks!

If cURL is over your head, then perhaps try Simple HTML DOM:
require 'simple_html_dom.php';
// Fetch the page and echo the first div whose id is "hmenus"
$html = file_get_html($url);
echo $html->find('div[id=hmenus]', 0);
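Since the question asks about cURL specifically, here is a minimal sketch of the cURL route, assuming the cURL and DOM extensions are enabled. It swaps Simple HTML DOM for PHP's built-in DOMDocument/DOMXPath; fetch_html and element_html_by_id are hypothetical helper names, not part of any library.

```php
<?php
// Hypothetical helper: download a page with cURL; returns the body, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: outer HTML of the element with the given id, or null if absent.
function element_html_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $doc->saveHTML($nodes->item(0));
}
```

Usage would be something like `echo element_html_by_id(fetch_html($url) ?? '', 'hmenus') ?? '';`. Like Simple HTML DOM, this only sees markup that is in the HTML the server returns, not anything added later by JavaScript.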

Related

How to get tag's attribute using PHP simple HTML DOM parser

I am using the PHP Simple HTML DOM parser to scrape website data, but unfortunately I am not able to extract the data I want. I have also tried Google and the documentation but could not solve the issue. The code structure of what I am trying to scrape is something like this:
<div id="section1">
<h1>Some content</h1>
<p>Some content</p>
............
<<Not fixed number of element>>
............
<script> <<Some script>></script>
<video>
<source src="www.exmple.com/34/exmple.mp4">
</video>
</div>
With JavaScript I can get the value like this:
document.getElementById("section1").getElementsByTagName("source")[0].getAttribute("src");
But when I try the same thing with the PHP DOM parser, I don't get any data.
Here is what my code looks like:
require $_SERVER['DOCUMENT_ROOT'] . '/../lib/simplehtmldom/simple_html_dom.php';
$html_content = get($url); // This is a cURL function to get the website content.
$obj_content = str_get_html($html_content);
$linkURL = $obj_content->getElementById('section1')->find('source', 0)->getAttribute('src');
var_dump($linkURL);
This results in an empty string. I also tried changing the code a bit here and there, but none of the variations worked; every time the result came back blank. However, if I var_dump $obj_content, I get a lot of DOM elements.
I tried to follow these similar posts from Stack Overflow, but they did not help me:
How do I get the HTML code of a web page in PHP?
PHP Simple HTML DOM
PHP Simple HTML DOM Parser Call to a member function children() on a non-object
And their manual: http://simplehtmldom.sourceforge.net/manual.htm
Can anyone please help me?
Thank you

The code snippet is fine as it is. The problem was that the URL I was targeting was not in the page at load time; it was added by the <script> tag after the page loaded.
Thank you #WillardSolutions
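For comparison, the same lookup with PHP's built-in DOM extension instead of Simple HTML DOM might look like the sketch below (first_source_src is a hypothetical helper). Returning null instead of calling methods on a missing node makes the case from the answer, a <source> that a script inserts after page load, easy to detect.

```php
<?php
// Hypothetical helper: src of the first <source> inside the element with id $id, or null.
function first_source_src(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Tolerate tags the HTML parser does not know (e.g. <video>) and sloppy markup
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $attr = $xpath->query('//*[@id="' . $id . '"]//source/@src');
    if ($attr === false || $attr->length === 0) {
        return null; // not in the static HTML, e.g. inserted later by JavaScript
    }
    return $attr->item(0)->nodeValue;
}
```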

php simple html dom and iframe src

require('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('https://www7.fmovies.se/film/hometown-hero.m2r28/6xpjrp');
foreach ($html->find('div[id=player]') as $div)
{
    foreach ($div->find('iframe') as $iframe)
    {
        echo $iframe->src;
    }
}
As you can see, I'm trying to get the src of the iframe inside the player div using PHP Simple HTML DOM Parser. Can you explain why I'm getting a blank page as a result?
Thanks!
UPDATE: After disabling JavaScript with a JavaScript-switcher add-on, I noticed that the iframe I'm looking for is not loaded. What should I do to get the iframe src?

There are two possible solutions:
Try to figure out how the JavaScript works, and mock this behavior in your PHP script.
Load the page in, for example, Selenium, and then grab the src of the iframe using Selenium. (https://www.seleniumhq.org/)
Hope this helps
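A rough sketch of the first option, under assumptions: it checks whether the iframe is in the static HTML and, if not, lists absolute URLs that appear in inline <script> blocks as a starting point for working out what the JavaScript does. All helper names are hypothetical, and the URL regex is only a heuristic; real players often build the URL from several pieces or fetch it via Ajax, in which case you would need to follow those requests instead.

```php
<?php
// Hypothetical helper: parse an HTML string, tolerating sloppy markup.
function load_dom(string $html): ?DOMDocument
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    return $doc;
}

// src of every <iframe> inside the element with id $containerId, as present in
// the static HTML (what PHP receives, before any JavaScript runs).
function static_iframe_srcs(string $html, string $containerId): array
{
    $doc = load_dom($html);
    if ($doc === null) {
        return [];
    }
    $xpath = new DOMXPath($doc);
    $srcs = [];
    foreach ($xpath->query('//*[@id="' . $containerId . '"]//iframe/@src') as $attr) {
        $srcs[] = $attr->nodeValue;
    }
    return $srcs;
}

// Heuristic: absolute http(s) URLs that appear inside inline <script> blocks.
function urls_in_inline_scripts(string $html): array
{
    $doc = load_dom($html);
    if ($doc === null) {
        return [];
    }
    $urls = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        if (preg_match_all('~https?://[^\s"\'<>]+~', $script->textContent, $m)) {
            $urls = array_merge($urls, $m[0]);
        }
    }
    return array_values(array_unique($urls));
}
```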

HTML DOM Parser, only get the data inside the DIV, and not all the rest from the start div

I've started using the PHP HTML DOM Parser and am still learning it. I've got a problem, though: I'm trying to obtain data from inside a certain div tag on a video website. I've got it mostly working, but one small problem remains: it catches all the data from the start of that specific div tag onward. I would probably need to add a small line of code, something like "[/div]", but I'm completely out of ideas.
Here's my code:
foreach ($html->find('div[class=video]') as $key => $info)
{
    echo $info->innertext;
}
How can I fix this so that it only gets the content inside that div, and not the rest of the file?
Thanks!
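One way to check what a parser sees is to take only each matched div's child nodes and serialize them, which by construction cannot include anything after the div's closing tag. The sketch below does that with PHP's built-in DOM extension rather than Simple HTML DOM; inner_html_by_class is a hypothetical helper, and unlike div[class=video] (an exact attribute match) it also matches divs with several classes, such as class="video wide". If a tree-based parser still returns content beyond the div, it is worth checking whether the page's markup actually closes the div where you expect.

```php
<?php
// Hypothetical helper: inner HTML of every <div> whose class list contains $class.
function inner_html_by_class(string $html, string $class): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Token match: finds class="video" and class="video wide", but not class="videos"
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
    $result = [];
    foreach ($xpath->query($query) as $div) {
        $inner = '';
        foreach ($div->childNodes as $child) {
            $inner .= $doc->saveHTML($child); // serialize only the div's children
        }
        $result[] = $inner;
    }
    return $result;
}
```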

phpQuery - make php script wait until iframe content has loaded

I'm using the phpQuery library (http://code.google.com/p/phpquery/) to parse web pages, but I have run into a problem with sites that use Ajax to display all their content.
I have worked out that I can get all the content if I load it into an iframe (the code below works):
$temp = phpQuery::newDocumentHTML('<iframe src="" id="test">a</iframe>')->find('iframe[id=test]')->attr('src', 'http://www.example.com/');
echo $temp;
BUT, my question is, how can I get my PHP script to wait until the iframe has loaded before proceeding?
Below is the jQuery equivalent. Does anybody know how to do the same thing using phpQuery?
// Attach the handler before setting src so the load event is not missed
$(iFrame).on('load', function () {
    alert("Loaded");
});
$(iFrame).attr('src', 'http://www.example.com');
Thanks in advance.

BUT, my question is, how can I get my PHP script to wait until the iframe has loaded before proceeding?
This is not how PHP-side HTML parsing works. phpQuery just parses the HTML code; it doesn't do anything with it, such as loading or rendering iframes or running JavaScript events.
There is probably a way to do what you want to do, if you tell us what that is!
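If the goal is the Ajax-loaded content, one common alternative is to skip the iframe entirely and request the URL that the page's JavaScript calls, which often returns JSON. A minimal sketch, assuming such an endpoint exists; the URL in the usage comment is a placeholder, and the real one would have to be found in the browser's developer tools (Network tab). Both function names are hypothetical.

```php
<?php
// Hypothetical helper: fetch an Ajax endpoint directly and decode its JSON body.
function fetch_json(string $url): ?array
{
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    $body = @file_get_contents($url, false, $context); // null result on network errors
    if ($body === false) {
        return null;
    }
    return decode_json_array($body);
}

// Decode a JSON string into an associative array; null if it is not a JSON object/array.
function decode_json_array(string $json): ?array
{
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Usage (placeholder endpoint, not run here):
// $items = fetch_json('http://www.example.com/ajax/items');
```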

Getting the first URL of an image search result with google image API in PHP

Do you know of a PHP script (a class would be nice) that gets the URL of the first image result of a Google API image search? Thanks
Example:
<?php echo(geturl("searchterm")) ?>

I found a way to get the first image from a Google Images result using Simple HTML DOM, as Sarfraz suggested.
Please check the code below. It is currently working fine for me.
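require 'simple_html_dom.php';
// urlencode() turns spaces into "+" and also escapes characters such as "&"
$search_keyword = urlencode($search_keyword);
$newhtml = file_get_html("https://www.google.com/search?q=" . $search_keyword . "&tbm=isch");
// Take the src attribute of the first <img> on the results page
$result_image_source = $newhtml->find('img', 0)->src;
echo '<img src="' . htmlspecialchars($result_image_source) . '">';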
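A variant of the same idea without Simple HTML DOM, wrapped in the geturl() shape the question asked for. It is a sketch under assumptions: it uses PHP's built-in DOM extension and http_build_query(), which escapes characters such as & in the search term, and it returns whichever <img> appears first in the markup Google returns, which may not be a result thumbnail. image_search_url and first_img_src are hypothetical helper names.

```php
<?php
// Hypothetical helper: Google Images search URL for a term (spaces become "+").
function image_search_url(string $term): string
{
    return 'https://www.google.com/search?' . http_build_query(['q' => $term, 'tbm' => 'isch']);
}

// Hypothetical helper: src of the first <img> in an HTML string, or null if none.
function first_img_src(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $img = $doc->getElementsByTagName('img')->item(0);
    return $img instanceof DOMElement ? $img->getAttribute('src') : null;
}

// geturl() as sketched in the question; performs a network request when called.
function geturl(string $term): ?string
{
    $html = @file_get_contents(image_search_url($term));
    return $html === false ? null : first_img_src($html);
}
```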

You should be able to do that easily with Simple HTML DOM.
Note: See the examples on their site for more information.
A HTML DOM parser written in PHP5+ let you manipulate HTML in a very easy way!
Find tags on an HTML page with selectors just like jQuery.
