why is this php code not working? - php

This is the code I am using to scrape specific data from http://www.partyhousedecorations.com
however I keep getting this error (Fatal error: Call to a member function children() on a non-object in C:\wamp\www\webScraping\PartyHouseDecorations.php on line 8 )and I am stuck and can't seem to be able to fix it.
This is my code:
<?php
include_once("simple_html_dom.php");
$serv=$_GET['search'];
$url = 'http://www.partyhousedecorations.com/category-adult-birthday-party-themes'.$serv;
$output = file_get_html($url);
$arrOfStuff = $output->find('div[class=product-grid]', 0)->children();
foreach( $arrOfStuff as $item )
{
echo "Party House Decorations".'<br>';
echo $item->find('div[class=name]', 0)->find('a', 0)->innertext.'<br>';
echo '<img src="http://www.partyhousedecorations.com'.$item->find('div[class=image]', 0)->find('img', 0)->src.'"><br>';
echo str_replace('KWD', 'AED', $item->find('div[class=price]',0)->innertext.'<br>');
}
?>

Looks like $output->find('div[class=product-grid]', 0) doesn't return an object with a method called children(). Maybe it's returning null or something that's not an object. Put it in a separate variable and look what the value of that variable is.
$what_is_this = $output->find('div[class=product-grid]', 0);
var_dump($what_is_this)
Update:
I debugged your program, and apart from the simple html dom parser seemingly expecting classes to be given as 'div.product-grid' instead of 'div[class=x]' it also turns out that the webpage responds by returning a product list instead of a product grid. I've included a working copy below.
<?php
include_once("simple_html_dom.php");
$serv=$_GET['search'];
$url = 'http://www.partyhousedecorations.com/category-adult-birthday-party-themes';
$output = file_get_html($url);
$arrOfStuff = $output->find('div.product-list', 0)->children();
foreach( $arrOfStuff as $item )
{
echo "Party House Decorations".'<br>';
echo $item->find('div.name', 0)->find('a', 0)->innertext.'<br>';
echo '<img src="http://www.partyhousedecorations.com'.$item->find('div.image', 0)->find('img', 0)->src.'"><br>';
echo str_replace('KWD', 'AED', $item->find('div.price',0)->innertext.'<br>');
}
?>

Related

Fetching image from google using dom

I want to fetch image from google using PHP. so I tried to get help from net I got a script as I needed but it is showing this fatal error
Fatal error: Call to a member function find() on a non-object in C:\wamp\www\nq\qimages.php on line 7**
Here is my script:
<?php
include "simple_html_dom.php";
$search_query = "car";
$search_query = urlencode( $search_query );
$html = file_get_html( "https://www.google.com/search?q=$search_query&tbm=isch" );
$image_container = $html->find('div#rcnt', 0);
$images = $image_container->find('img');
$image_count = 10; //Enter the amount of images to be shown
$i = 0;
foreach($images as $image){
if($i == $image_count) break;
$i++;
// DO with the image whatever you want here (the image element is '$image'):
echo $image;
}
?>
I am also using Simple html dom.
Look at my example that works and gets first image from google results:
<?php
$url = "https://www.google.hr/search?q=aaaa&biw=1517&bih=714&source=lnms&tbm=isch&sa=X&ved=0CAYQ_AUoAWoVChMIyKnjyrjQyAIVylwaCh06nAIE&dpr=0.9";
$content = file_get_contents($url);
libxml_use_internal_errors(true);
$dom = new DOMDocument;
#$dom->loadHTML($content);
$images_dom = $dom->getElementsByTagName('img');
foreach ($images_dom as $img) {
if($img->hasAttribute('src')){
$image_url = $img->getAttribute('src');
}
break;
}
//this is first image on url
echo $image_url;
This error usually means that $html isn't an object.
It's odd that you say this seems to work. What happens if you output $html? I'd imagine that the url isn't available and that $html is null.
Edit: Looks like this may be an error in the parser. Someone has submitted a bug and added a check in his code as a workaround.

Cant parse specific Links with php dom parser

I´m parsing some itunes links with dom parser in php. With most of the links it works perfectly. Others which are totally the same type it doesn`t?! I need the "img" tag and the "src-swap-high-dpi" attribute. It drives me nuts. That´s a part of my php-code
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("img") as $element) {
$value = $element->getAttribute("src-swap-high-dpi");
echo $value;
}
So e.g. I can parse the following links:
https://itunes.apple.com/us/podcast/id201671138
https://itunes.apple.com/us/podcast/id523121474
https://itunes.apple.com/us/podcast/id152249110
But this e.g. not:
https://itunes.apple.com/us/podcast/id278981407
I do not get any output.
Edit:
New Code doesnt work as well:
Still not working for me. Very strange. Thats my new complete code now:
<?php
ini_set("display_errors",1); error_reporting(E_ALL);
require_once ('simple_html_dom.php');
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
?>
I get the Output:
Fatal error: Call to a member function find() on a non-object in /home/www/whatever/delete.php on line 10
line 10 is the line starting with "foreach". Your code works fine with the links provided above which I declared as working. But as soon as I take one of the designated one which doesnt work I get the error message provided above. ?!
I think this is one of the cases Simple DOM gets a bit confused and you need to provide it with a parent:
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
UPDATE
Here are the results using the above fragment:
http://a3.mzstatic.com/us/r30/Podcasts/v4/61/cc/7f/61cc7f25-131f-7616-6549-5553e6444b87/mza_7489225285918350214.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts6/v4/04/a9/64/04a964d7-7c10-72d6-871b-97619cf89066/mza_1416781107029663068.150x150-75.jpg
http://a5.mzstatic.com/us/r30/Podcasts4/v4/bb/a6/f4/bba6f4b6-eeab-d7d9-8591-adb2bd277ccb/mza_5223368352447971673.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts5/v4/aa/54/16/aa541600-cc8b-772b-9c0a-824efe8fdc42/mza_6772270613386652594.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts3/v4/95/3d/2f/953d2f75-c2c2-4815-a752-f30fdcc0b9fb/mza_9037746738018570312.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts4/v4/a2/1c/f5/a21cf5a4-2d8d-1ed7-983f-1c90f2f4f948/mza_7120473049241631392.340x340-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts4/v4/5d/21/8d/5d218d2a-2980-0ac9-0bc7-9321ea6eb334/mza_6358466742996313573.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts/b2/bb/bf/ps.ykmejwzs.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts6/v4/17/ea/31/17ea3187-ef8c-4756-e488-0c65adced988/mza_7931750363714403933.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts2/v4/0b/3c/7d/0b3c7d2b-19bf-f7a2-7c50-ca15338b8316/mza_2792239161425784587.150x150-75.jpg
Can you verify you're not getting errors at all ? Say, just write some weird characters in your PHP file, does the PHP shows the error? If not, try to add this in your .htaccess file.
<IfModule mod_php5.c>
# do not display errors
php_value display_errors 1
</IfModule>
UPDATE 2
$url = "https://itunes.apple.com/us/podcast/id278981407";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,FALSE);
$html = curl_exec($ch);
curl_close($ch);
//$htmlContent = str_get_html(file_get_contents($url));
$htmlContent = str_get_html($html);
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
The reason i didn't use file_get_html of Simple Dom is because it simply uses file_get_contents internally.

Get table value using PHP Simple HTML DOM

I'm writing this PHP to read the data from the following website, and the write it into database.
Here's the code:
<?php
require('simple_html_dom.php');
$html = file_get_html('http://backpack.tf/pricelist/spreadsheet');
$data = $html->find('.table tr td[1]');
foreach($data as $result)
{
echo $result->plaintext . '<br />';
}
?>
I intended to get all the data in the tds and even the attribute inside the trs.
So, I tried by getting them in plain text first.
By far the code returns:
Fatal error: Call to a member function find() on a non-object
How can I solve and improve the code?
The following code is working for your example.
It could be the memory limit for your executing script that's causing trouble.
ini_set('memory_limit','160M');
require('simple_html_dom.php');
$url = 'http://backpack.tf/pricelist/spreadsheet';
$html = new simple_html_dom();
$html->load_file($url);
$data = $html->find('.table tr td[1]');
foreach($data as $result)
{
echo $result->plaintext . '<br />';
}

PHP Simple HTML DOM Scrape External URL

I'm trying to build a personal project of mine, however I'm a bit stuck when using the Simple HTML DOM class.
What I'd like to do is scrape a website and retrieve all the content, and it's inner html, that matches a certain class.
My code so far is:
<?php
error_reporting(E_ALL);
include_once("simple_html_dom.php");
//use curl to get html content
$url = 'http://www.peopleperhour.com/freelance-seo-jobs';
$html = file_get_html($url);
//Get all data inside the <div class="item-list">
foreach($html->find('div[class=item-list]') as $div) {
//get all div's inside "item-list"
foreach($div->find('div') as $d) {
//get the inner HTML
$data = $d->outertext;
}
}
print_r($data)
echo "END";
?>
All I get with this is a blank page with "END", nothing else outputted at all.
It seems your $data variable is being assigned a different value on each iteration. Try this instead:
$data = "";
foreach($html->find('div[class=item-list]') as $div) {
//get all divs inside "item-list"
foreach($div->find('div') as $d) {
//get the inner HTML
$data .= $d->outertext;
}
}
print_r($data)
I hope that helps.
I think, you may want something like this
$url = 'http://www.peopleperhour.com/freelance-seo-jobs';
$html = file_get_html($url);
foreach ($html->find('div.item-list div.item') as $div) {
echo $div . '<br />';
};
This will give you something like this (if you add the proper style sheet, it'll be displayed nicely)

PHP SimpleXML Breaking when trying to traverse nodes

I'm trying to read the xml information that tumblr provides to create a kind of news feed off the tumblr, but I'm very stuck.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts[0] AS $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo .$title.;
echo '<p>'.$small_post.'</p>';
}
}
?>
Which always breaks as soon as it tries to go through the nodes. So basically "tumblr->posts;....ect" is displayed on my html page.
I've tried saving the information as a local xml file. I've tried using different ways to create the simplexml object, like loading it as a string (probably a silly idea). I double checked that my webhosting was running PHP5. So basically, I'm stuck on why this wouldn't be working.
EDIT: Ok I tried changing from where I started (back to the original way it was, starting from tumblr was just another (actually silly) way to try to fix it. It still breaks right after the first ->, so displays "posts[0] AS $post....ect" on screen.
This is the first thing I've ever done in PHP so there might be something obvious that I should have set up beforehand or something. I don't know and couldn't find anything like that though.
This should work :
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if ( !$xml ){
exit('Failed to retrieve data.');
}else{
foreach ( $xml->posts[0] AS $post){
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
echo '<hr>';
}
}
First thing in you code is that you used root element that should not be used.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts->post as $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo .$title.;
echo '<p>'.$small_post.'</p>';
}
}
?>
$xml->posts returns you the posts nodes, so if you want to iterate the post nodes you should try $xml->posts->post, which gives you the ability to iterate through the post nodes inside the first posts node.
Also as Needhi pointed out you shouldn't pass through the root node (tumblr), because $xml represents itself the root node. (So I fixed my answer).

Categories