This is the code I am using to scrape specific data from http://www.partyhousedecorations.com. However, I keep getting this error, and I can't work out how to fix it:
Fatal error: Call to a member function children() on a non-object in C:\wamp\www\webScraping\PartyHouseDecorations.php on line 8
This is my code:
<?php
include_once("simple_html_dom.php");
$serv=$_GET['search'];
$url = 'http://www.partyhousedecorations.com/category-adult-birthday-party-themes'.$serv;
$output = file_get_html($url);
$arrOfStuff = $output->find('div[class=product-grid]', 0)->children();
foreach( $arrOfStuff as $item )
{
echo "Party House Decorations".'<br>';
echo $item->find('div[class=name]', 0)->find('a', 0)->innertext.'<br>';
echo '<img src="http://www.partyhousedecorations.com'.$item->find('div[class=image]', 0)->find('img', 0)->src.'"><br>';
echo str_replace('KWD', 'AED', $item->find('div[class=price]',0)->innertext.'<br>');
}
?>
It looks like $output->find('div[class=product-grid]', 0) doesn't return an object with a children() method. It is probably returning null or some other non-object value. Put the result in a separate variable and look at what it contains:
$what_is_this = $output->find('div[class=product-grid]', 0);
var_dump($what_is_this);
Update:
I debugged your program. Apart from the Simple HTML DOM parser seemingly expecting classes to be given as 'div.product-grid' instead of 'div[class=x]', it turns out that the webpage returns a product list rather than a product grid. I've included a working copy below.
<?php
include_once("simple_html_dom.php");
$serv=$_GET['search'];
$url = 'http://www.partyhousedecorations.com/category-adult-birthday-party-themes';
$output = file_get_html($url);
$arrOfStuff = $output->find('div.product-list', 0)->children();
foreach( $arrOfStuff as $item )
{
echo "Party House Decorations".'<br>';
echo $item->find('div.name', 0)->find('a', 0)->innertext.'<br>';
echo '<img src="http://www.partyhousedecorations.com'.$item->find('div.image', 0)->find('img', 0)->src.'"><br>';
echo str_replace('KWD', 'AED', $item->find('div.price',0)->innertext.'<br>');
}
?>
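To illustrate the null check suggested above, here is a minimal sketch. It assumes PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, so that it runs without the library; the sample markup and the productNames() helper are made up for the example. The idea carries over either way: test the result of the lookup before calling a method on it.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$sampleHtml = '<div class="product-list">'
    . '<div><div class="name"><a href="#">Balloons</a></div></div>'
    . '<div><div class="name"><a href="#">Banners</a></div></div>'
    . '</div>';

// Build an XPath test that matches one class name inside a class attribute.
function hasClass(string $class): string
{
    return sprintf("contains(concat(' ', normalize-space(@class), ' '), ' %s ')", $class);
}

// Return the link text of every div.name inside the first container with the
// given class, or an empty array when no such container exists.
function productNames(string $html, string $containerClass): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $container = $xpath->query('//div[' . hasClass($containerClass) . ']')->item(0);

    // item(0) is null when nothing matched, so check before using it.
    if ($container === null) {
        return [];
    }

    $names = [];
    foreach ($xpath->query('.//div[' . hasClass('name') . ']//a', $container) as $a) {
        $names[] = trim($a->textContent);
    }
    return $names;
}

print_r(productNames($sampleHtml, 'product-list')); // the two product names
print_r(productNames($sampleHtml, 'product-grid')); // empty array instead of a fatal error
```

With this shape, a page that switches from a grid to a list layout gives you an empty result to handle rather than a crash.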
I want to fetch images from Google using PHP. I found a script online that does what I need, but it shows this fatal error:
Fatal error: Call to a member function find() on a non-object in C:\wamp\www\nq\qimages.php on line 7
Here is my script:
<?php
include "simple_html_dom.php";
$search_query = "car";
$search_query = urlencode( $search_query );
$html = file_get_html( "https://www.google.com/search?q=$search_query&tbm=isch" );
$image_container = $html->find('div#rcnt', 0);
$images = $image_container->find('img');
$image_count = 10; //Enter the amount of images to be shown
$i = 0;
foreach($images as $image){
if($i == $image_count) break;
$i++;
// DO with the image whatever you want here (the image element is '$image'):
echo $image;
}
?>
I am also using Simple HTML DOM.
Here is a working example that gets the first image from the Google results:
<?php
$url = "https://www.google.hr/search?q=aaaa&biw=1517&bih=714&source=lnms&tbm=isch&sa=X&ved=0CAYQ_AUoAWoVChMIyKnjyrjQyAIVylwaCh06nAIE&dpr=0.9";
$content = file_get_contents($url);
libxml_use_internal_errors(true);
$dom = new DOMDocument;
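// libxml_use_internal_errors(true) above already silences parse warnings
$dom->loadHTML($content);
$images_dom = $dom->getElementsByTagName('img');
$image_url = null;
foreach ($images_dom as $img) {
    if ($img->hasAttribute('src')) {
        $image_url = $img->getAttribute('src');
        break; // stop at the first image that actually has a src
    }
}
// this is the first image on the page (null if none was found)
echo $image_url;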
This error usually means that $html isn't an object.
It's odd that you say this seems to work. What happens if you output $html? I'd imagine that the URL isn't available and that $html is null.
Edit: It looks like this may be an error in the parser. Someone has submitted a bug report and added a check in their code as a workaround.
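A check of that kind could look like the sketch below. It is only an illustration: it uses PHP's built-in DOMDocument rather than Simple HTML DOM, the parseFetchedHtml() helper is a made-up name, and the fetch result is simulated instead of coming from a live request. It relies on the fact that file_get_contents() returns false when the request fails.

```php
<?php
// Parse markup that may have failed to download.
// $content is whatever the fetch returned: a string, or false on failure.
// Returns a DOMDocument, or null when there is nothing usable to parse.
function parseFetchedHtml($content): ?DOMDocument
{
    if ($content === false || trim($content) === '') {
        return null;
    }
    $previous = libxml_use_internal_errors(true); // collect warnings quietly
    $dom = new DOMDocument();
    $ok = $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $dom : null;
}

// Simulated fetch results instead of a live request.
$failed = parseFetchedHtml(false);
$loaded = parseFetchedHtml('<div id="rcnt"><img src="a.png"></div>');

if ($failed === null) {
    echo "Fetch failed, nothing to parse\n";
}
if ($loaded !== null) {
    echo $loaded->getElementsByTagName('img')->length, " image(s) found\n";
}
```

In a real script you would pass the return value of file_get_contents($url) straight into the helper and stop early when it returns null.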
I'm parsing some iTunes links with a DOM parser in PHP. It works perfectly with most of the links, but fails with others of exactly the same type. I need the "img" tag and its "src-swap-high-dpi" attribute. It drives me nuts. This is part of my PHP code:
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("img") as $element) {
$value = $element->getAttribute("src-swap-high-dpi");
echo $value;
}
For example, I can parse the following links:
https://itunes.apple.com/us/podcast/id201671138
https://itunes.apple.com/us/podcast/id523121474
https://itunes.apple.com/us/podcast/id152249110
But not this one, for example:
https://itunes.apple.com/us/podcast/id278981407
I do not get any output.
Edit:
The new code doesn't work either. Very strange. This is my complete code now:
<?php
ini_set("display_errors",1); error_reporting(E_ALL);
require_once ('simple_html_dom.php');
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
?>
I get this output:
Fatal error: Call to a member function find() on a non-object in /home/www/whatever/delete.php on line 10
Line 10 is the line starting with "foreach". Your code works fine with the links above that I listed as working, but as soon as I use one of the links that doesn't work, I get the error message above.
I think this is one of the cases where Simple DOM gets a bit confused and you need to give it a parent element to search within:
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
UPDATE
Here are the results using the above fragment:
http://a3.mzstatic.com/us/r30/Podcasts/v4/61/cc/7f/61cc7f25-131f-7616-6549-5553e6444b87/mza_7489225285918350214.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts6/v4/04/a9/64/04a964d7-7c10-72d6-871b-97619cf89066/mza_1416781107029663068.150x150-75.jpg
http://a5.mzstatic.com/us/r30/Podcasts4/v4/bb/a6/f4/bba6f4b6-eeab-d7d9-8591-adb2bd277ccb/mza_5223368352447971673.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts5/v4/aa/54/16/aa541600-cc8b-772b-9c0a-824efe8fdc42/mza_6772270613386652594.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts3/v4/95/3d/2f/953d2f75-c2c2-4815-a752-f30fdcc0b9fb/mza_9037746738018570312.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts4/v4/a2/1c/f5/a21cf5a4-2d8d-1ed7-983f-1c90f2f4f948/mza_7120473049241631392.340x340-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts4/v4/5d/21/8d/5d218d2a-2980-0ac9-0bc7-9321ea6eb334/mza_6358466742996313573.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts/b2/bb/bf/ps.ykmejwzs.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts6/v4/17/ea/31/17ea3187-ef8c-4756-e488-0c65adced988/mza_7931750363714403933.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts2/v4/0b/3c/7d/0b3c7d2b-19bf-f7a2-7c50-ca15338b8316/mza_2792239161425784587.150x150-75.jpg
Can you verify that you're not getting any errors at all? For example, if you put some invalid characters in your PHP file, does PHP show an error? If not, try adding this to your .htaccess file:
<IfModule mod_php5.c>
# display errors
php_value display_errors 1
</IfModule>
UPDATE 2
$url = "https://itunes.apple.com/us/podcast/id278981407";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,FALSE); // skips certificate verification; only acceptable for a quick test
$html = curl_exec($ch);
curl_close($ch);
//$htmlContent = str_get_html(file_get_contents($url));
$htmlContent = str_get_html($html);
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
The reason I didn't use Simple DOM's file_get_html is that it simply uses file_get_contents internally.
I'm writing this PHP script to read the data from the following website and then write it to a database.
Here's the code:
<?php
require('simple_html_dom.php');
$html = file_get_html('http://backpack.tf/pricelist/spreadsheet');
$data = $html->find('.table tr td[1]');
foreach($data as $result)
{
echo $result->plaintext . '<br />';
}
?>
I intend to get all the data in the tds and also the attributes of the trs.
So I tried getting them as plain text first.
So far the code returns:
Fatal error: Call to a member function find() on a non-object
How can I fix and improve the code?
The following code works for your example.
The memory limit of your script may be what's causing the trouble.
ini_set('memory_limit','160M');
require('simple_html_dom.php');
$url = 'http://backpack.tf/pricelist/spreadsheet';
$html = new simple_html_dom();
$html->load_file($url);
$data = $html->find('.table tr td[1]');
foreach($data as $result)
{
echo $result->plaintext . '<br />';
}
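Since the question also asks for the attributes on the rows, here is a rough sketch of reading the first cell of each row together with a row attribute. It is an alternative written with PHP's built-in DOMDocument and DOMXPath, not the Simple HTML DOM calls used above, and the sample table, the data-id attribute and the firstColumn() helper are all invented for the illustration.

```php
<?php
// Hypothetical sample table standing in for the downloaded page.
$sampleHtml = '<table class="table">'
    . '<tr data-id="1"><td>Key</td><td>2.33 ref</td></tr>'
    . '<tr data-id="2"><td>Ticket</td><td>1.5 keys</td></tr>'
    . '</table>';

// Return the first cell of every row, plus the row's data-id attribute.
function firstColumn(string $html): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' table ')]//tr/td[1]";

    $rows = [];
    // td[1] selects the first cell of each row (XPath positions start at 1).
    foreach ($xpath->query($query) as $td) {
        $rows[] = [
            'id'   => $td->parentNode->getAttribute('data-id'), // attribute on the <tr>
            'text' => trim($td->textContent),
        ];
    }
    return $rows;
}

print_r(firstColumn($sampleHtml));
```

Each entry of the returned array is then easy to bind to a prepared INSERT statement when writing to the database.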
I'm trying to build a personal project of mine, however I'm a bit stuck when using the Simple HTML DOM class.
What I'd like to do is scrape a website and retrieve all the content, including its inner HTML, that matches a certain class.
My code so far is:
<?php
error_reporting(E_ALL);
include_once("simple_html_dom.php");
//use curl to get html content
$url = 'http://www.peopleperhour.com/freelance-seo-jobs';
$html = file_get_html($url);
//Get all data inside the <div class="item-list">
foreach($html->find('div[class=item-list]') as $div) {
//get all div's inside "item-list"
foreach($div->find('div') as $d) {
//get the inner HTML
$data = $d->outertext;
}
}
print_r($data);
echo "END";
?>
All I get with this is a blank page with "END", nothing else outputted at all.
It seems your $data variable is being assigned a different value on each iteration. Try this instead:
$data = "";
foreach($html->find('div[class=item-list]') as $div) {
//get all divs inside "item-list"
foreach($div->find('div') as $d) {
//get the inner HTML
$data .= $d->outertext;
}
}
print_r($data);
I hope that helps.
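If you would rather keep the items separate than join them into one string, collecting them in an array works too. The sketch below is an assumption-laden illustration: it uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, and the sample markup and the collectItems() helper are made up.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$sampleHtml = '<div class="item-list">'
    . '<div class="item">First job</div>'
    . '<div class="item">Second job</div>'
    . '</div>';

// Return the outer HTML of every direct child div of div.item-list.
function collectItems(string $html): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' item-list ')]/div";

    $items = [];
    foreach ($xpath->query($query) as $div) {
        // Append on every pass; a plain "$data = ..." would keep only the last item.
        // saveHTML($node) serialises the node including its own tag.
        $items[] = $dom->saveHTML($div);
    }
    return $items;
}

$items = collectItems($sampleHtml);
echo count($items), " item(s) collected\n";
print_r($items);
```

An array also makes it obvious when nothing matched, because count($items) is simply 0.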
I think you may want something like this:
$url = 'http://www.peopleperhour.com/freelance-seo-jobs';
$html = file_get_html($url);
foreach ($html->find('div.item-list div.item') as $div) {
echo $div . '<br />';
}
This outputs the HTML of each matching item (if you add the proper style sheet, it will be displayed nicely).
I'm trying to read the XML that Tumblr provides to create a kind of news feed from the Tumblr blog, but I'm very stuck.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts[0] AS $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo .$title.;
echo '<p>'.$small_post.'</p>';
}
}
?>
It always breaks as soon as it tries to go through the nodes, so "tumblr->posts;... etc." is displayed on my HTML page.
I've tried saving the information as a local XML file. I've tried different ways of creating the SimpleXML object, such as loading it from a string (probably a silly idea). I double-checked that my web hosting is running PHP 5. So I'm stuck on why this isn't working.
EDIT: OK, I changed the starting point back to the way it was originally (starting from tumblr was just another, admittedly silly, attempt to fix it). It still breaks right after the first ->, so "posts[0] AS $post... etc." is displayed on screen.
This is the first thing I've ever done in PHP, so there may be something obvious that I should have set up beforehand. I don't know what, and I couldn't find anything like that.
This should work:
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if ( !$xml ){
exit('Failed to retrieve data.');
}else{
foreach ( $xml->posts[0] AS $post){
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
echo '<hr>';
}
}
The first problem in your code is that you went through the root element, which should not be used.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts->post as $post)
{
$title = $post->{'regular-title'};
$body = (string) $post->{'regular-body'}; // separate variable, so the loop variable is not overwritten
$small_post = substr($body,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
}
}
?>
$xml->posts returns the posts nodes, so to iterate over the post nodes use $xml->posts->post, which lets you iterate through the post nodes inside the first posts node.
Also, as Needhi pointed out, you shouldn't go through the root node (tumblr), because $xml itself represents the root node. (So I fixed my answer.)
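To try the traversal without depending on the live feed, you can run it against an inline document. The XML below is a made-up sample that only mimics the shape described here (a tumblr root, one posts node, several post children); the real feed may carry more elements and attributes.

```php
<?php
// Hypothetical sample shaped like the feed discussed above.
$sampleXml = <<<XML
<tumblr version="1.0">
  <posts start="0" total="2">
    <post id="1" type="regular">
      <regular-title>First title</regular-title>
      <regular-body>First body text</regular-body>
    </post>
    <post id="2" type="regular">
      <regular-title>Second title</regular-title>
      <regular-body>Second body text</regular-body>
    </post>
  </posts>
</tumblr>
XML;

$xml = simplexml_load_string($sampleXml);
if ($xml === false) {
    exit('Failed to parse data.');
}

$summaries = [];
// $xml already represents the root (<tumblr>), so start from its <posts> child.
foreach ($xml->posts->post as $post) {
    // Cast to string to get the text rather than a SimpleXMLElement object.
    $title = (string) $post->{'regular-title'};
    $body  = (string) $post->{'regular-body'};
    $summaries[] = $title . ': ' . substr($body, 0, 320);
}

print_r($summaries);
```

The curly-brace syntax is needed because the element names contain a hyphen, which is not valid in a plain PHP property name.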