I´m parsing some itunes links with dom parser in php. With most of the links it works perfectly. Others which are totally the same type it doesn`t?! I need the "img" tag and the "src-swap-high-dpi" attribute. It drives me nuts. That´s a part of my php-code
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("img") as $element) {
$value = $element->getAttribute("src-swap-high-dpi");
echo $value;
}
So e.g. I can parse the following links:
https://itunes.apple.com/us/podcast/id201671138
https://itunes.apple.com/us/podcast/id523121474
https://itunes.apple.com/us/podcast/id152249110
But this e.g. not:
https://itunes.apple.com/us/podcast/id278981407
I do not get any output.
Edit:
New Code doesnt work as well:
Still not working for me. Very strange. Thats my new complete code now:
<?php
ini_set("display_errors",1); error_reporting(E_ALL);
require_once ('simple_html_dom.php');
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
?>
I get the Output:
Fatal error: Call to a member function find() on a non-object in /home/www/whatever/delete.php on line 10
line 10 is the line starting with "foreach". Your code works fine with the links provided above which I declared as working. But as soon as I take one of the designated one which doesnt work I get the error message provided above. ?!
I think this is one of the cases Simple DOM gets a bit confused and you need to provide it with a parent:
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
UPDATE
Here are the results using the above fragment:
http://a3.mzstatic.com/us/r30/Podcasts/v4/61/cc/7f/61cc7f25-131f-7616-6549-5553e6444b87/mza_7489225285918350214.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts6/v4/04/a9/64/04a964d7-7c10-72d6-871b-97619cf89066/mza_1416781107029663068.150x150-75.jpg
http://a5.mzstatic.com/us/r30/Podcasts4/v4/bb/a6/f4/bba6f4b6-eeab-d7d9-8591-adb2bd277ccb/mza_5223368352447971673.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts5/v4/aa/54/16/aa541600-cc8b-772b-9c0a-824efe8fdc42/mza_6772270613386652594.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts3/v4/95/3d/2f/953d2f75-c2c2-4815-a752-f30fdcc0b9fb/mza_9037746738018570312.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts4/v4/a2/1c/f5/a21cf5a4-2d8d-1ed7-983f-1c90f2f4f948/mza_7120473049241631392.340x340-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts4/v4/5d/21/8d/5d218d2a-2980-0ac9-0bc7-9321ea6eb334/mza_6358466742996313573.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts/b2/bb/bf/ps.ykmejwzs.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts6/v4/17/ea/31/17ea3187-ef8c-4756-e488-0c65adced988/mza_7931750363714403933.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts2/v4/0b/3c/7d/0b3c7d2b-19bf-f7a2-7c50-ca15338b8316/mza_2792239161425784587.150x150-75.jpg
Can you verify you're not getting errors at all ? Say, just write some weird characters in your PHP file, does the PHP shows the error? If not, try to add this in your .htaccess file.
<IfModule mod_php5.c>
# do not display errors
php_value display_errors 1
</IfModule>
UPDATE 2
$url = "https://itunes.apple.com/us/podcast/id278981407";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,FALSE);
$html = curl_exec($ch);
curl_close($ch);
//$htmlContent = str_get_html(file_get_contents($url));
$htmlContent = str_get_html($html);
foreach($htmlContent->find("div.artwork") as $div) {
$value = $div->find("img",0)->getAttribute("src-swap-high-dpi");
echo $value."<br/>";
}
The reason i didn't use file_get_html of Simple Dom is because it simply uses file_get_contents internally.
Related
I'm having an issue with php file_get_content(), I have a txt file with links where I created a foreach loop that display multiple links in the same webpage but it's not working, please take a look at the code:
<?php
$urls = file("links.txt");
foreach($urls as $url) {
file_get_contents($url);
echo $url;
}
The content of links.txt is: https://www.google.com
Result: Only a String displaying "https://www.google.com"
Another code that works is :
$url1 = file_get_contents('https://google.com');
echo $url1;
This code returns google's homepage, but I need to use first method with loops to provide multiple links.
Any idea?
Here's one way of combining the things you already had implemented:
$urls = file("links.txt");
foreach($urls as $url) {
$contents = file_get_contents($url);
echo $contents;
}
Both file and file_get_contents are functions that return some value; what you had to do is putting return value of the latter one inside a variable, then outputting that variable with echo.
In fact, you didn't even need to use variable: this...
$urls = file("links.txt");
foreach($urls as $url) {
echo file_get_contents($url);
}
... should have been sufficient too.
My case is, I want to scrap a website, which is success, and I'm using PHP cURL. The problem start when I want to use the DOM Parser to get the content I want. Here is the warning came out:
the error image is here
And the code I use is here. Before this code, I scrap a website using cURL, it's working, but just this part got error :
include 'simple_html_dom.php';
//Here is where I scraping, no need to show it
$fp = fopen(dirname(__FILE__) . '/airpaz.html', 'w');
//$html contain the page I scrap
fwrite($fp, $html);
fclose($fp);
$html_content = file_get_contents(dirname(__FILE__) . '/airpaz.html');
echo $html_content;
$html2 = new simple_html_dom();
$html2->load_file($html_content);
Hope you guys can help, thanks
It looks like you are trying to read a file 3 times:
$read_file = fread($fr, filesize(dirname(__FILE__) . '/airpaz.html'));
and:
$html_content = file_get_contents($read_file);
and:
$html2->load_file($html_content);
In the last two instances, instead of a file-name you pass html contents to the function so that will not work.
You should read the file only once and use string functions on the contents you receive. Or you open the url directly in $html2->load_file().
try this code
include 'simple_html_dom.php';
$html_content = file_get_html(dirname(__FILE__) . '/airpaz.html');
echo $html_content;
$html2 = new simple_html_dom();
$html2->load_file($html_content);
This screenshot shows that the URL is getting stored in $url
This screenshots shows that after I add echo $html to the code, it says undefined variable $url and file_get_contents(): filename cannot be empty
Also, I have tried almost everything that's there on stackoverflow including file_get_html() and cURL. Nothing seems to work. Please tell me where I'm going wrong here.
<?php
include_once('simple_html_dom.php');
$base_url = "https://www.instagram.com/";
$html = "";
if ( isset($_POST['username']) ) {
$url = $base_url.htmlspecialchars($_POST['username'])."/";//concatenate $base_url to username to generate full URL
}
$html = file_get_contents($url); //access the URL in $url
$doc = new DOMDocument;
$doc->loadHTML($html); //get HTML of the webpage given by file_get_contents
$tags = $doc->getElementsByTagName('img');
$arr = (array)$tags;
if (empty($arr)) {
echo 'emptyarray';
}
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
?>
Edit:
If 'http:// stackoverflow.com/questions' is used instead of 'https:// www.instagram.com/ its_kushal_here' file_get_contents() is working fine and not failing.
When you refreshed the page did you make sure your post parameters carried through to the new request ?
The issue seems to be here
if ( isset($_POST['username']) ) {
$url = $base_url.htmlspecialchars($_POST['username'])."/";
}
If $_POST['username'] is not set then $url will not be defined. Also remove the # from #$doc->loadHTML($html); so you can see the error it outputs. That will help you workout what fails after that point.
This is the code I am using to scrape specific data from http://www.partyhousedecorations.com
however I keep getting this error (Fatal error: Call to a member function children() on a non-object in C:\wamp\www\webScraping\PartyHouseDecorations.php on line 8 )and I am stuck and can't seem to be able to fix it.
This is my code:
<?php
include_once("simple_html_dom.php");
$serv=$_GET['search'];
$url = 'http://www.partyhousedecorations.com/category-adult-birthday-party-themes'.$serv;
$output = file_get_html($url);
$arrOfStuff = $output->find('div[class=product-grid]', 0)->children();
foreach( $arrOfStuff as $item )
{
echo "Party House Decorations".'<br>';
echo $item->find('div[class=name]', 0)->find('a', 0)->innertext.'<br>';
echo '<img src="http://www.partyhousedecorations.com'.$item->find('div[class=image]', 0)->find('img', 0)->src.'"><br>';
echo str_replace('KWD', 'AED', $item->find('div[class=price]',0)->innertext.'<br>');
}
?>
Looks like $output->find('div[class=product-grid]', 0) doesn't return an object with a method called children(). Maybe it's returning null or something that's not an object. Put it in a separate variable and look what the value of that variable is.
$what_is_this = $output->find('div[class=product-grid]', 0);
var_dump($what_is_this)
Update:
I debugged your program, and apart from the simple html dom parser seemingly expecting classes to be given as 'div.product-grid' instead of 'div[class=x]' it also turns out that the webpage responds by returning a product list instead of a product grid. I've included a working copy below.
<?php
include_once("simple_html_dom.php");
$serv=$_GET['search'];
$url = 'http://www.partyhousedecorations.com/category-adult-birthday-party-themes';
$output = file_get_html($url);
$arrOfStuff = $output->find('div.product-list', 0)->children();
foreach( $arrOfStuff as $item )
{
echo "Party House Decorations".'<br>';
echo $item->find('div.name', 0)->find('a', 0)->innertext.'<br>';
echo '<img src="http://www.partyhousedecorations.com'.$item->find('div.image', 0)->find('img', 0)->src.'"><br>';
echo str_replace('KWD', 'AED', $item->find('div.price',0)->innertext.'<br>');
}
?>
Hi I have been having problems with the google weather api having errors Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 2: parser error ....
I tried to use the script of the main author(thinking it was my edited script) but still I am having this errors I tried 2
//komunitasweb.com/2009/09/showing-the-weather-with-php-and-google-weather-api/
and
//tips4php.net/2010/07/local-weather-with-php-and-google-weather/
The weird part is sometimes it fixes itself then goes back again to the error I have been using it for months now without any problem, this just happened yesterday. Also the demo page of the authors are working but I have the same exact code any help please.
this is my site http://j2sdesign.com/weather/widgetlive1.php
#Mike I added your code
<?
$xml = file_get_contents('http://www.google.com/ig/api?weather=jakarta'); if (! simplexml_load_string($xml)) { file_put_contents('malformed.xml', $xml); }
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=jakarta');
$information = $xml->xpath("/xml_api_reply/weather/forecast_information");
$current = $xml->xpath("/xml_api_reply/weather/current_conditions");
$forecast_list = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
?>
and made a list of the error but I can't seem to see the error cause it's been fixing itself then after sometime goes back again to the error
here is the content of the file
<?php include_once('simple_html_dom.php'); // create doctype $dom = new DOMDocument("1.0");
// display document in browser as plain text
// for readability purposes //header("Content-Type: text/plain");
// create root element
$xmlProducts = $dom->createElement("products");
$dom->appendChild($xmlProducts);
$pages = array( 'http://myshop.com/small_houses.html', 'http://myshop.com/medium_houses.html', 'http://myshop.com/large_houses.html' ) foreach($pages as $page) { $product = array(); $source = file_get_html($page); foreach($source->find('img') as $src) { if (strpos($src->src,"http://myshop.com") === false) { $product['image'] = "http://myshop.com/$src->src"; } } foreach($source->find('p[class*=imAlign_left]') as $description) { $product['description'] = $description->innertext; } foreach($source->find('span[class*=fc3]') as $title) { $product['title'] = $title->innertext; } //debug perposes! echo "Current Page: " . $page . "\n"; print_r($product); echo "\n\n\n"; //Clear seperator } ?>
When simplexml_load_string() fails you need to store the data you're trying to load somewhere for review. Examining the data is the first step to diagnose what it causing the error.
$xml = file_get_contents('http://example.com/file.xml');
if (!simplexml_load_string($xml)) {
file_put_contents('malformed.xml', $xml);
}