file_get_html returning empty page

I am using Simple HTML DOM Parser, but file_get_html() returns an empty page even though the page is not empty (you can check by opening it in a browser). Here is my code:
include "11/simple_html_dom.php";
$link = "http://www.flipkart.com/transcend-storejet-25m3-2-5-inch-1-tb-external-hard-disk/p/itmd72p3y3zcsbku? pid=ACCD72ZXFC6ZRTST&srno=b_1&ref=549d7873-2897-4bd5-8451-776337341be8";
$html = file_get_html($link);
if(!empty($html)){
echo $html->find("span.fk-font-verybig") ;
}
else{
echo 'file is empty';
}
Any help would be appreciated.

Try this:
In your simple_html_dom.php, change the line define('MAX_FILE_SIZE', 600000); to define('MAX_FILE_SIZE', 900000); or more, depending on the size of your file.
This can happen when the HTML is larger than the defined limit, in which case file_get_html() returns empty without any error.
I hope it works.

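Instead of echo $html->find("span.fk-font-verybig");
try echo reset( $html->find("span.fk-font-verybig") );
find() returns an array of matching elements, so echo needs a single element from it. If reset() raises an "Only variables should be passed by reference" notice, assign the result of find() to a variable first.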
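
To illustrate the point, here is a minimal sketch that uses a plain array as a stand-in for the result of find(). The $matches variable and its values are hypothetical, not data from the page in question; the only assumption is that find() without an index returns an array.

```php
<?php
// Hypothetical stand-in for what $html->find("span.fk-font-verybig") might return.
$matches = ['Rs. 4,999', 'Rs. 5,499'];

// reset() takes its argument by reference, so pass a variable, not a call result.
// It returns the first element, or false when the array is empty.
$first = reset($matches);

$empty = [];
$none = reset($empty);

echo $first === false ? 'no match' : $first, "\n";
echo $none === false ? 'no match' : $none, "\n";
```

Checking for false before echoing avoids printing nothing at all when the selector matches no element.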

Related

PHP SimplePie Error: $item->get_enclosure() always return true

I am trying to build a news reader using the PHP SimplePie library. I try to get the image from a feed item using this code:
if ($enclosure = $item->get_enclosure()){
$imageLink = $enclosure->get_link();
echo "<img src=\"$imageLink\">";
}
When I fetch an RSS feed that doesn't have an enclosure, it echoes an image tag with the following source:
src="//?#"
The above code works fine with feeds that have enclosures.
I also tried this code:
if ($enclosure = $item->get_enclosure()){
if($imageLink = $enclosure->get_link()){
echo "<img src=\"$imageLink\">";
}
}
Can someone tell me what I am doing wrong in this code?

It seems the value of $imageLink is //?#, so if you do
if($imageLink = $enclosure->get_link())
the result is true, because a non-empty string is truthy.
Check the exact value when there is no enclosure, and then change the condition, e.g.
$imageLink = $enclosure->get_link();
if($imageLink !== "//?#") {
You can check the exact value using
if ($enclosure = $item->get_enclosure()){
$imageLink = $enclosure->get_link();
var_dump($imageLink);
}

Check whether $imageLink is assigned a value anywhere else in your code; most probably that is the cause. Use print_r or var_dump at each step to find where exactly your code assigns that value to the variable.
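
Rather than comparing against the literal "//?#", one option is to check that the link looks like a real URL before printing the tag. The sketch below is only an illustration: isUsableImageLink() is a hypothetical helper, not part of SimplePie, and it assumes that only absolute http(s) links are wanted.

```php
<?php
/**
 * Hypothetical helper: returns true only for an absolute http(s) URL.
 */
function isUsableImageLink($link): bool
{
    if (!is_string($link) || $link === '') {
        return false;
    }
    // FILTER_VALIDATE_URL rejects strings without a scheme, such as "//?#".
    if (filter_var($link, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($link, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(isUsableImageLink('//?#'));
var_dump(isUsableImageLink('http://example.com/cover.jpg'));
var_dump(isUsableImageLink(null));
```

In the code from the question, the echo would then sit inside if (isUsableImageLink($imageLink)) { ... }.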

newline in json_encode() output

I am building an output array like so
if (count($errors)) {
$success = 'false';
$output['json_msg'] = "Please try your submission again.";
$output['errors'] = $errors;
} else {
$success = 'true';
$output['json_msg'] = "Thanks for Becoming a NOLA Insider!";
}
$output['success'] = $success;
header('Content-type:application/json;charset=utf-8');
if (count($errors)) { http_response_code(500); }
echo json_encode($output);
exit;
But when I look at the response in the Network pane of Chrome's developer tools, I see what appears to be a newline in the response.
I tried wrapping json_encode() in trim() but this gave garbled output.
How do I eliminate the carriage return?

You can try to remove new line using str_replace
$output = str_replace(array("\r\n", "\n", "\r"),'',$output);
echo json_encode($output);
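
It may help to first confirm where the newline comes from. json_encode() escapes line breaks inside string values, so a raw newline in the response probably originates outside the encoded data. A minimal sketch with a hypothetical payload:

```php
<?php
// Hypothetical payload with line breaks inside a value.
$output = [
    'json_msg' => "Please try\nyour submission again.\r\n",
    'success'  => 'false',
];

$json = json_encode($output);

// Control characters inside strings are escaped as \n and \r,
// so the encoded text itself holds no raw line breaks.
$hasRawNewline = strpbrk($json, "\r\n") !== false;

echo $json, "\n";
var_dump($hasRawNewline);
```

If $hasRawNewline is false but the response still starts or ends with a line break, the extra character is being emitted by something other than json_encode().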

Do you have a ?> at the end of your PHP file, and what happens when you remove it?
You may have trailing whitespace or line breaks after the closing tag at the end of the script, which are sent before your response:
?>\n
// END OF FILE
This happens because PHP is actually a templating language: anything outside the <?php ... ?> tags is sent to the output as-is.
Here is a file that defines a function and also contains some text after the closing tag:
<?php
/**
* @File lib.php
*/
function sayHello()
{
echo "hello";
}
?>
forgotten text
And here is a file that includes this file.
<?php
/**
* @file index.php
*/
include_once('lib.php');
sayHello();
This will output:
forgotten text
hello
The "forgotten text" is output when lib.php is included, whereas "hello" is output afterwards.
(But it may be even simpler and just the point that @nanocv suggested.)
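
If the stray output cannot be tracked down right away, one possible workaround is to buffer whatever the included files print and discard it before sending the JSON. This is a sketch under assumptions: buildJsonBody() and the closure that simulates a leaky include are hypothetical names for illustration only.

```php
<?php
/**
 * Hypothetical helper: runs $loadIncludes while buffering output, throws the
 * captured bytes away, and returns the JSON body on its own.
 */
function buildJsonBody(array $payload, callable $loadIncludes): string
{
    ob_start();
    try {
        $loadIncludes();   // e.g. include_once 'lib.php'; stray text lands in the buffer
    } finally {
        ob_end_clean();    // discard whatever was echoed
    }
    return json_encode($payload);
}

// Simulates an included file that leaks "forgotten text" and a newline.
$body = buildJsonBody(['success' => 'true'], function () {
    echo "forgotten text\n";
});

echo $body;
```

Removing the closing ?> tag or the stray whitespace remains the cleaner fix; buffering only hides the symptom.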

If newline characters such as \r\n end up in your JSON after json_encode(), you can strip them from the final JSON string.
Apply preg_replace() to the encoded value as follows; it replaces every newline with the empty string given as the second parameter.
Also avoid any whitespace outside the PHP opening and closing tags at the beginning or end of the file, since that can sometimes cause this issue.
Code:
$output_json = preg_replace("!\r?\n!","", $output_json);

I bet your php code starts this way:
1. <--- Note the blank line here
2. <?php
That's a newline character that will become part of the result.
(This way I could recreate the same behavior)

I solved it by removing the whitespace from the PHP file referenced in the include.

image_container remains null therefore it throws the error? [duplicate]

I am using this library (PHP Simple HTML DOM parser) to parse a link, here's the code:
function getSemanticRelevantKeywords($keyword){
$results = array();
$html = file_get_html("http://www.semager.de/api/keyword.php?q=". urlencode($keyword) ."&lang=de&out=html&count=2&threshold=");
foreach($html->find('span') as $e){
$results[] = $e->plaintext;
}
return $results;
}
but I am getting this error when I output the results:
Fatal error: Call to a member function find() on a non-object in
/var/www/vhosts/efamous.de/subdomains/sandbox/httpdocs/getNewTrusts.php
on line 25
(line 25 is the foreach loop), the odd thing is that it outputs everything (at least seemingly) correctly but I still get that error and can't figure out why.

The reason for this error is that Simple HTML DOM does not return the object if the size of the response from the URL is greater than 600000 (the MAX_FILE_SIZE value).
You can avoid it by changing the simple_html_dom.php file: remove strlen($contents) > MAX_FILE_SIZE from the if condition of the file_get_html function.
This will solve your issue.

You just need to increase the MAX_FILE_SIZE constant in simple_html_dom.php.
For example:
define('MAX_FILE_SIZE', 999999999999999);

This error usually means that $html isn't an object.
It's odd that you say this seems to work. What happens if you output $html?
I'd imagine that the url isn't available and that $html is null.
Edit:
Looks like this may be an error in the parser. Someone has submitted a bug and added a check in his code as a workaround.
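
A defensive version of the function from the question could check the value before calling find() on it. The following is a sketch, not the library's recommended pattern: collectPlaintext() is a hypothetical helper, and FakeDom is a stub that stands in for a parsed document so the example runs without the parser or network access.

```php
<?php
/**
 * Hypothetical guard: collects the plaintext of all matches, or returns an
 * empty array when $html is not a usable parser object (for example false).
 */
function collectPlaintext($html, string $selector): array
{
    if (!is_object($html) || !method_exists($html, 'find')) {
        return [];
    }
    $results = [];
    foreach ($html->find($selector) as $element) {
        $results[] = $element->plaintext;
    }
    return $results;
}

// Stub standing in for a parsed document, only for this demonstration.
class FakeDom
{
    public function find($selector)
    {
        $node = new stdClass();
        $node->plaintext = 'keyword';
        return [$node];
    }
}

var_dump(collectPlaintext(false, 'span'));
var_dump(collectPlaintext(new FakeDom(), 'span'));
```

With a guard like this, a failed fetch yields an empty result list instead of a fatal error.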

Before calling file_get_html/load_file, you should first check whether the URL exists.
If the URL exists, you have passed the first step.
(Some servers serve the 404 page as a valid HTML page, with a proper HTML structure such as head and body, but containing only text like "This page could not be found. 404 error...".)
If the URL returns 200 OK, then you should check whether the fetched result is an object and whether its nodes are set.
This is the code I use in my pages:
function url_exists($url){
    // Prepend a scheme when the URL has none
    if (strpos($url, "http") === false) $url = "http://" . $url;
    // @ suppresses the warning get_headers() emits when the request fails
    $headers = @get_headers($url);
    // print_r($headers);
    if (is_array($headers)){
        // $headers[0] is the status line, e.g. "HTTP/1.1 404 Not Found"
        if (strpos($headers[0], '404 Not Found'))
            return false;
        else
            return true;
    }
    else
        return false;
}
$pageAddress = 'http://www.google.com';
// Create the parser object before loading the page into it
$htmlPage = new simple_html_dom();
if ( url_exists($pageAddress) ) {
    $htmlPage->load_file( $pageAddress );
} else {
    echo 'url doesn t exist, i stop';
    return;
}
if ( $htmlPage && is_object($htmlPage) && isset($htmlPage->nodes) )
{
    // do your work here...
} else {
    echo 'fetched page is not ok, i stop';
    return;
}
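
The check above only rules out "404 Not Found". A slightly stricter variant could parse the numeric status code from the status line and accept only 2xx responses. The helpers below are hypothetical and operate on a status line string, such as the first entry returned by get_headers(), so the sketch runs without a network request. Note that when redirects are involved, the first status line may be a 3xx.

```php
<?php
/**
 * Hypothetical helper: extracts the numeric status code from a status line.
 * Returns null if the line does not look like an HTTP status line.
 */
function statusCodeFromLine(string $statusLine): ?int
{
    if (preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b~', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

/**
 * Hypothetical helper: true only for 2xx status codes.
 */
function isSuccessStatus(?int $code): bool
{
    return $code !== null && $code >= 200 && $code < 300;
}

var_dump(isSuccessStatus(statusCodeFromLine('HTTP/1.1 200 OK')));
var_dump(isSuccessStatus(statusCodeFromLine('HTTP/1.1 404 Not Found')));
var_dump(isSuccessStatus(statusCodeFromLine('HTTP/1.0 500 Internal Server Error')));
```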

For those arriving here via a search engine (as I did): after reading the info (and linked bug report) above, I started some code-prodding and ended up fixing my problems with two extra checks after loading the DOM:
$html = file_get_html('<your url here>');
// first check if $html->find exists
if (method_exists($html,"find")) {
// then check if the html element exists to avoid trying to parse non-html
if ($html->find('html')) {
// and only then start searching (and manipulating) the dom
}
}

I'm having the same error come up in my logs and apart from the solutions mentioned above, it could also be that there is no 'span' in the document. I get the same error when searching for divs with a particular class that doesn't exist on the page, but when searching for something that I know exists on the page, the error doesn't pop up.

Your script is OK.
I receive this error when it does not find the element that I'm looking for on that page.
In your case, please check whether the page you are accessing has a 'span' element.

Simplest solution to this problem:
if ($html = file_get_html("http://www.semager.de/api/keyword.php?q=". urlencode($keyword) ."&lang=de&out=html&count=2&threshold=")) {
    // $html is usable here, so call $html->find(...)
} else {
    // do something else because couldn't find html
}

The error means that the find() method is either not defined yet or not available. Make sure you have loaded or included the library that provides it.

Loading an HTML page in PHP

I'm trying to load an HTML page by using a URL. This is what I'm doing now to find the count of images on a page:
$html = "http://stackoverflow.com/";
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('*');
$count = 0;
foreach ($tags as $tag) {
if (strcmp($tag->tagName, "img") == 0) {
$count++;
}
}
echo $count;
I know this isn't an efficient way to do this, I just set it up as an example. Each time, count is 0. But there are images on the page. Which brings me to believe the page isn't loading right. What am I doing wrong? Thanks.

Tag names in HTML are canonically in upper-case, however you can avoid the issue by using strcasecmp instead of strcmp.
Or avoid both problems by doing it properly:
$count = $doc->getElementsByTagName('img')->length;

From the docs
DOMDocument::loadHTML — Load HTML from a string
Its signature is quite clear about this, too:
public bool DOMDocument::loadHTML ( string $source [, int $options = 0 ] )
You could try using DOMDocument::loadHTMLFile, or simply get the markup of the given url using file_get_contents or a cURL request (whichever works best for you).
And please don't use the error-suppression operator @ of death. If something emits a notice/warning/error, there's a problem. Don't ignore it, fix it!
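
Putting both answers together, a minimal sketch might look like the following. The inline $markup string is a stand-in for the fetched page (in practice it would come from something like file_get_contents($url)), and collecting libxml warnings is just one way to avoid the @ operator.

```php
<?php
// Inline markup stands in for the fetched page.
$markup = '<html><body><img src="a.png"><p>text</p><img src="b.png"></body></html>';

// Collect libxml parse warnings instead of printing or suppressing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup);   // loadHTML() expects markup, not a URL

$warnings = libxml_get_errors();   // inspect or log these as needed
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Ask for the img elements directly instead of looping over every tag.
$count = $doc->getElementsByTagName('img')->length;
echo $count, "\n";
```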
