Why could the server not fetch the title of a website? - PHP

I am trying to get the title of a website. This code works perfectly on my computer, but it does not run smoothly on the server: there it cannot fetch the URL content, while on my computer it redirects without any problem.
<?php
ini_set('max_execution_time', 300);
$url = "http://www.cricinfo.com/ci/engine/match/companion/597928.html";
// strip the "/companion" segment so we fetch the full page
if (strpos($url, "companion") !== false) {
    $url = str_replace("/companion", "", $url);
}
$html = file_get_contents($url);
echo $html; // debug output: show what was fetched
// parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings from malformed HTML
$nodes = $doc->getElementsByTagName('title');
// get and display what you need:
$title = $nodes->item(0)->nodeValue;
$msg1 = current(explode("|", $title));
$msg = rawurlencode($msg1);
echo $msg;
if (empty($msg)) {
    echo "no data to send";
} else {
    // note: this redirect will fail because output was already sent above
    header("Location: fullonsms.php?msg=" . $msg);
}
exit();
?>
The output on the server can be seen here: http://sendmysms.bugs3.com/cricket/fetch.php

It appears that the fopen wrappers aren't enabled. As you can see in the notes section of the PHP docs for file_get_contents, allow_url_fopen must be set to true in order to open a URL with file_get_contents. Try running the following on the server to see if you can use file_get_contents with a URL.
echo "urls ";
echo ini_get('allow_url_fopen') ? "allowed" : "not allowed";
echo " in file_get_contents.";
If that says 'urls not allowed in file_get_contents', then you'll need to update the setting via php.ini, a .htaccess file, the Apache config, or some such equivalent. That is, if you would like to continue using file_get_contents to access the URL. Another option is to use cURL, if you have the PHP curl extension installed.
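If you go the cURL route, a minimal fetch sketch looks like this (illustrative only, assuming the curl extension is available):
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirects
$html = curl_exec($ch);
curl_close($ch);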
P.S. I know this is a problem with the call to file_get_contents, since you can see that his script echoes the $html variable after he sets it. His link to his script on the server doesn't output any HTML, which tells me this is an issue with grabbing the HTML rather than with the HTML parser.

Related

Warning file_get_contents in DOM Parser

My case is: I want to scrape a website, which succeeds, and I'm using PHP cURL. The problem starts when I want to use the DOM parser to get the content I want. Here is the warning that came out:
(error screenshot not reproduced here)
And the code I use is below. Before this code I scrape the website using cURL, and that part works; it is just this part that errors:
include 'simple_html_dom.php';
// here is where I scrape; no need to show it
$fp = fopen(dirname(__FILE__) . '/airpaz.html', 'w');
// $html contains the page I scraped
fwrite($fp, $html);
fclose($fp);
$html_content = file_get_contents(dirname(__FILE__) . '/airpaz.html');
echo $html_content;
$html2 = new simple_html_dom();
$html2->load_file($html_content);
Hope you guys can help, thanks
It looks like you are trying to read a file 3 times:
$read_file = fread($fr, filesize(dirname(__FILE__) . '/airpaz.html'));
and:
$html_content = file_get_contents($read_file);
and:
$html2->load_file($html_content);
In the last two instances you pass HTML contents, rather than a file name, to the function, so that will not work.
You should read the file only once and use string functions on the contents you receive, or open the URL directly in $html2->load_file().
Try this code:
include 'simple_html_dom.php';
// file_get_html() reads and parses the file in one step,
// so no second simple_html_dom / load_file() pass is needed
$html2 = file_get_html(dirname(__FILE__) . '/airpaz.html');
echo $html2;

Can't parse specific links with PHP DOM parser

I'm parsing some iTunes links with the DOM parser in PHP. With most of the links it works perfectly; others, which are exactly the same type, it doesn't?! I need the "img" tag and the "src-swap-high-dpi" attribute. It drives me nuts. This is part of my PHP code:
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("img") as $element) {
    $value = $element->getAttribute("src-swap-high-dpi");
    echo $value;
}
So e.g. I can parse the following links:
https://itunes.apple.com/us/podcast/id201671138
https://itunes.apple.com/us/podcast/id523121474
https://itunes.apple.com/us/podcast/id152249110
But this one, for example, I cannot:
https://itunes.apple.com/us/podcast/id278981407
I do not get any output.
Edit:
The new code doesn't work either. Still not working for me; very strange. This is my complete new code now:
<?php
ini_set("display_errors",1); error_reporting(E_ALL);
require_once ('simple_html_dom.php');
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("div.artwork") as $div) {
    $value = $div->find("img", 0)->getAttribute("src-swap-high-dpi");
    echo $value . "<br/>";
}
?>
I get the output:
Fatal error: Call to a member function find() on a non-object in /home/www/whatever/delete.php on line 10
Line 10 is the line starting with "foreach". Your code works fine with the links I listed above as working, but as soon as I use one of the ones that doesn't work, I get the error message above. ?!
I think this is one of the cases where Simple HTML DOM gets a bit confused and you need to provide it with a parent:
$url = "https://itunes.apple.com/us/podcast/id278981407";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("div.artwork") as $div) {
    $value = $div->find("img", 0)->getAttribute("src-swap-high-dpi");
    echo $value . "<br/>";
}
UPDATE
Here are the results using the above fragment:
http://a3.mzstatic.com/us/r30/Podcasts/v4/61/cc/7f/61cc7f25-131f-7616-6549-5553e6444b87/mza_7489225285918350214.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts6/v4/04/a9/64/04a964d7-7c10-72d6-871b-97619cf89066/mza_1416781107029663068.150x150-75.jpg
http://a5.mzstatic.com/us/r30/Podcasts4/v4/bb/a6/f4/bba6f4b6-eeab-d7d9-8591-adb2bd277ccb/mza_5223368352447971673.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts5/v4/aa/54/16/aa541600-cc8b-772b-9c0a-824efe8fdc42/mza_6772270613386652594.150x150-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts3/v4/95/3d/2f/953d2f75-c2c2-4815-a752-f30fdcc0b9fb/mza_9037746738018570312.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts4/v4/a2/1c/f5/a21cf5a4-2d8d-1ed7-983f-1c90f2f4f948/mza_7120473049241631392.340x340-75.jpg
http://a2.mzstatic.com/us/r30/Podcasts4/v4/5d/21/8d/5d218d2a-2980-0ac9-0bc7-9321ea6eb334/mza_6358466742996313573.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts/b2/bb/bf/ps.ykmejwzs.150x150-75.jpg
http://a4.mzstatic.com/us/r30/Podcasts6/v4/17/ea/31/17ea3187-ef8c-4756-e488-0c65adced988/mza_7931750363714403933.150x150-75.jpg
http://a1.mzstatic.com/us/r30/Podcasts2/v4/0b/3c/7d/0b3c7d2b-19bf-f7a2-7c50-ca15338b8316/mza_2792239161425784587.150x150-75.jpg
Can you verify that you're not getting any errors at all? For example, write some invalid characters into your PHP file; does PHP show the error? If not, try adding this to your .htaccess file.
<IfModule mod_php5.c>
# display errors
php_value display_errors 1
</IfModule>
UPDATE 2
$url = "https://itunes.apple.com/us/podcast/id278981407";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$html = curl_exec($ch);
curl_close($ch);
//$htmlContent = str_get_html(file_get_contents($url));
$htmlContent = str_get_html($html);
foreach ($htmlContent->find("div.artwork") as $div) {
    $value = $div->find("img", 0)->getAttribute("src-swap-high-dpi");
    echo $value . "<br/>";
}
The reason I didn't use Simple DOM's file_get_html is that it simply uses file_get_contents internally.

Cannot get XML from URL in PHP

I am very new to both PHP and XML. What I am trying to do in PHP is read in XML from a call to a URL and then parse the XML.
I can get this to work in the example below when $urlip = 'localfile.xml', but not when I put in a URL. I've checked the URL by going to it with my browser, and I can see the XML. I also did a view-source, copied it, pasted the XML into the local file, and that works fine.
What am I doing wrong in trying to get the XML from the URL?
Thank you
The error being returned is:
Error loading XML Start tag expected, '<' not found
Here is my code snippet:
$urlip = "test.xml"; # for debugging since I cannot read from the URL yet! not sure why....
if (($xml = file_get_contents($urlip)) === false) {
    echo "error fetching XML\n";
} else {
    libxml_use_internal_errors(true);
    $data = simplexml_load_string($xml, null, LIBXML_NOCDATA);
    if (!$data) {
        echo "Error loading XML\n";
        foreach (libxml_get_errors() as $error) {
            echo "\t", $error->message;
        }
    } else {
        foreach ($data as $item) {
            $type = $item->TAB_TYPE;
            $number = $item->ALT_ID;
            $title = $item->SHORT_DESCR;
            $searchlink = $item->ID;
            $rsite = $item->CATEGORY;
            echo "type $type, number $number, title $title, search link $searchlink, site $rsite\n";
        }
    }
}
The most likely situation, from what it looks like:
Your function queries the remote URL and returns an empty string. An empty string is not identical to false, so it still passes your 'if' check.
After that you try to parse the empty string as XML, which cannot work, so it gives you that error.
Your steps to solve it:
configure PHP to allow opening remote URLs, as the comments on your question state (allow_url_fopen)
use another way to get content from the URL - the cURL library works well, as sketched below
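A minimal sketch of the cURL approach, feeding the result into the same SimpleXML parsing as above (assuming the curl extension is installed):
$ch = curl_init($urlip);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body as a string
$xml = curl_exec($ch);
curl_close($ch);
if ($xml === false) {
    echo "error fetching XML\n";
} else {
    $data = simplexml_load_string($xml, null, LIBXML_NOCDATA);
    // ... continue with the same parsing loop as above ...
}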

How to extract price from an Amazon URL without the Amazon API

I'm trying to load an HTML file from an Amazon URL to extract the product price, using a simple PHP function in Yii.
I started by getting the entire file with the PHP function file_get_contents, and then extracting only the price from the HTML with DOM.
I'm using a DOM parser to read the HTML file. It has convenient functions to read the tags of an HTML file. This is the parser:
http://simplehtmldom.sourceforge.net/
The URL that PHP analyzes can be from amazon.com, amazon.co.uk, amazon.it, etc. In the future this feature will also be used to analyze URLs other than Amazon's.
I created a simple function that extracts the price from a URL; here it is:
public function findAmazonPriceFromUrl($url) {
    Yii::import('ext.HtmlDOMParser.*');
    require_once('simple_html_dom.php');
    $html = file_get_html($url);
    $item = $html->getElementsById('actualPriceValue');
    if ($item) {
        $price = $item[0]->firstChild()->innertext;
    } else {
        $item = $html->getElementsById('current-price');
        $price = $item[0]->innertext;
    }
    return $price;
}
The file_get_html function is the following:
function file_get_html($url) {
    $dom = new simple_html_dom();
    $contents = file_get_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE) {
        return false;
    }
    $dom->load($contents);
    return $dom;
}
I noticed that after a few requests (various links) I always get an error from the server (Error 500). I checked my Apache log file, but everything looks fine.
Could Amazon be blocking my requests after a certain time? How can I fix it?
Thanks in advance for the help.
I had the same problem and this is my fix: I run the script again if the image is not parsed. The image is parsed first in my PHP script, so I check whether it worked and Amazon returned the information. I hope it helps.
if ($html->find('#main-image')) {
    foreach ($html->find('#main-image') as $e) {
        echo '<span href="' . $e->src . '" class="imgblock parseimg">
              <img src="' . $e->src . '" class="resultimg" alt="' . $name . '" title="' . $name . '">
              </span>
              <input type="hidden" name="my-item-img" value="' . $e->src . '" />';
    }
} else {
    // gethtml() is the poster's own fetch function; re-run the scrape
    gethtml($url, $domain);
    die;
}
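That retry idea can also be wrapped in a small helper. Here is a sketch (the function name, retry count, and User-Agent string are illustrative assumptions, not the poster's code); sending a browser-like User-Agent and pausing between attempts often helps, since Amazon tends to throttle bare scripted requests:
// sketch: fetch a URL with cURL, retrying a few times with a pause
function fetch_with_retry($url, $maxTries = 3) {
    for ($i = 0; $i < $maxTries; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; example-scraper)');
        $html = curl_exec($ch);
        curl_close($ch);
        if ($html !== false && $html !== '') {
            return $html; // got a response body
        }
        sleep(2); // back off before the next attempt
    }
    return false;
}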

PHP returning page error on simplexml print_r

The problem is only happening with one file when I try a DOMDocument/SimpleXML method, so it seems the issue is with that file. No clue what it could be.
If I do the following:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
print_r($xml);
in Chrome, I get a "Page Unavailable" error. In Firefox, I get nothing.
If I do the same thing with "test2.html", I get a printout as expected.
If I try the same thing but doing it this way:
$file = "test1.html";
$data = file_get_contents($file);
$dom = DOMDocument::loadHTML($data);
$xml = simplexml_import_dom($dom);
print_r($xml);
I get the same issue.
If I comment out the print_r line, Chrome goes from the "Page Unavailable" to blank.
I changed the permissions to 777 in case that was an issue; no fix.
I tried simply echoing out the contents of the HTML file; no problem at all.
Any clues as to why a) Chrome would do that, and b) why I'm not getting any usable results?
Update:
If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if (!$dom) {
    echo "No Load!";
}
else {
    $xml = simplexml_import_dom($dom);
    print_r($xml);
}
I get the same issue. If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if (!$dom) {
    echo "No Load!";
}
else {
    echo "Load!";
}
I get the "Load!" output, meaning that the dom method shouldn't be the problem (?)
I'll try the same exact test with the simplexml.
Update 2:
If I put in:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if (!$xml) {
    echo "No Load!";
}
else {
    echo "Load!";
}
I get "Load!" but if I do:
$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if (!$xml) {
    echo "No Load!";
}
else {
    echo "Load!";
    print_r($xml);
}
I get the error. I did finally notice that I had an option to view the error in Chrome:
Error 324 (net::ERR_EMPTY_RESPONSE): Unknown error.
The troublesome HTML file is 288 KB. Could that be the issue? If so, how would I adjust for that?
Last update:
Very odd. I can use methods and functions on the object (as SimpleXML or DOMDocument), so I can do things like XPath to delete or parse the HTML, etc. In some cases (small results) it can echo out results, but for big output (showing all spans) it fails in the same way.
So, since the end result will, I think, fit within these parameters, I SHOULD be okay (I guess).
But any real solution is very welcome.
Turn on error reporting: error_reporting(E_ALL); in the first line of your PHP code.
Check the memory limit of your PHP configuration: memory_limit in the respective php.ini
What's the difference between test1.html and test2.html? Perhaps test1.html is not well-formed.
DOMDocument and/or SimpleXML may bail out if the document is malformed. Try something like:
$dom = DOMDocument::loadHTMLFile($file);
if (!$dom) {
    echo 'Loading file failed';
    exit;
}
$xml = simplexml_import_dom($dom);
if (!$xml) {
    ...
}
If creating the $dom worked, conversion to $xml should work as well, but make sure anyway.
Edit: As Gehrig said, make sure error reporting is on; that should make it obvious where the process fails.
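A quick way to apply the first two suggestions at the top of the script (a sketch; the memory value is just an example to rule out an exhausted limit on the 288 KB file):
error_reporting(E_ALL);
ini_set('display_errors', 1);    // show errors in the browser while debugging
ini_set('memory_limit', '256M'); // illustrative value; print_r on a large tree can use a lot of memory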
