Unfortunately I can't check it right now, because the XML (which will be on another server) is offline. The URL to the XML file will look like this: http://url.com:123/category?foo=bar. As you can see, it comes with no .xml file extension. I forgot to insert a file check to avoid error messages that print out the URL of the XML file.
simplexml_load_file() works fine with that URL, but I'm not sure about file_exists()!
Would this work?:
if (file_exists('http://url.com:123/category?foo=bar')) {
    $xml = simplexml_load_file('http://url.com:123/category?foo=bar');
    // stuff happens here
} else {
    echo 'Error message';
}
I'm not sure since file_exists doesn't work with URLs.
Thank you!
As you suspect, file_exists() doesn't work with URLs, but fopen() and fclose() do:
if ($handle = @fopen("http://url.com:123/category?foo=bar", "r")) {
    fclose($handle);
    $xml = simplexml_load_file('http://url.com:123/category?foo=bar');
    // stuff happens here
} else {
    echo 'Error message';
}
That is not really useful if you are just going to fetch the data and parse it anyway, especially if the URL you call is a program/script itself: checking first and then fetching means the remote script is executed twice.
I suggest you fetch the data with file_get_contents(), handle/catch the errors and parse the fetched data.
Just blocking the errors:
if ($xml = @file_get_contents($url)) {
    $element = new SimpleXMLElement($xml);
    ...
}
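If you would rather handle the failure than just silence it, a sketch along these lines should work ($url again stands for your feed URL; simplexml_load_string() returns false instead of throwing, which makes the check easier):
$xml = @file_get_contents($url);
if ($xml === false) {
    echo 'Error message';
} else {
    libxml_use_internal_errors(true);          // collect XML parse errors instead of emitting warnings
    $element = simplexml_load_string($xml);
    if ($element === false) {
        echo 'The response was not valid XML'; // inspect libxml_get_errors() for details
    } else {
        // stuff happens here
    }
}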
I'm using file_get_contents as below and have set a cron job to run that file every hour, so it opens the described URL, which runs some other functions. Now I have two closely related questions.
<?php
file_get_contents('http://107.150.52.251/~IdidODiw/AWiwojdPDOmwiIDIWDIcekSldcdndudsAoiedfiee1.php');
?>
1) If the above URL returns a null value, does it store anything on the server (a temporary value or log)?
2) If the above URL returns an error, does it permanently store anything like errors or temporary values on the server?
The function itself does not leave any trace.
Since you are running this code in a cron job, you cannot directly inspect its output, so you need to log the result to a log file. Look into Monolog, for instance.
You would then log the result of your function like this:
$contents = file_get_contents( ... );
if ($contents === false) {
    $log->error("An error occurred");
} else {
    $log->debug("Result", array('content' => $contents));
}
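For completeness, a minimal Monolog setup might look like this (the channel name and log path are placeholders, and constant names vary slightly between Monolog versions):
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

require 'vendor/autoload.php';

$log = new Logger('cron');                                                 // arbitrary channel name
$log->pushHandler(new StreamHandler('/path/to/cron.log', Logger::DEBUG));  // append everything to a file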
If you suspect anything is wrong with the above command, or simply want to debug it, you can print the error/success message with the following code and redirect it to a log file.
$error = error_get_last();
echo $error['message'];
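For example, to append the last error to a file of your choice (the path is just a placeholder):
$error = error_get_last();
if ($error !== null) {
    // message_type 3 appends the message to the given file
    error_log(date('c') . ' ' . $error['message'] . PHP_EOL, 3, '/path/to/cron-errors.log');
}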
I'm mining data from a site that has a paginator, but I need to get all the pages.
The link to the next page is written in a link tag with rel=next. If there are no more pages, the link tag is missing. I created a function called getAll which should call itself again and again as long as the link tag is present.
function getAll($url, &$links) {
    $dom = file_get_html($url); // create dom object from $url
    $tmp = $dom->find('link[rel=next]', 0); // find link rel=next
    if (is_object($tmp)) { // is there the link tag?
        $link = $tmp->getAttribute('href'); // get url of next page - href attribute
        $links[] = $link; // insert url into array
        getAll($link, $links); // call self
    } else {
        return $links; // there are no more urls, return the array
    }
}
// usage
$links = array();
getAll('http://www.zbozi.cz/vyrobek/apple-iphone-5/', $links);
print_r($links); // dump the links
But I have a problem: when I run the script, the message "No data received" appears in Chrome, and I get no error output at all. The function should work, because when I don't call it recursively it returns one link (to the second page).
I think the problem is bad syntax or incorrect use of the reference.
Could you please help me?
I don't know what file_get_html or find should do, but this should work:
<?php
function getAll($url, &$links) {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);               // real-world HTML is rarely valid; silence parser warnings
    $dom->loadHTML(file_get_contents($url));
    foreach ($dom->getElementsByTagName('link') as $linkElement) {
        if ($linkElement->hasAttribute('rel') && $linkElement->getAttribute('rel') === 'next') {
            $nextURL = $linkElement->getAttribute('href');
            $links[] = $nextURL;
            getAll($nextURL, $links);               // recurse into the next page
        }
    }
}

$links = array();
getAll('http://www.zbozi.cz/vyrobek/apple-iphone-5/', $links);
print_r($links);
Firstly, this would be easier with an error message. Without one, the cause could be anything from a DNS error to a corrupted space character inside your file. So if you haven't already, try adding this to the top of your script:
error_reporting(E_ALL);
ini_set("display_errors", "1");
It should reveal any error that might have taken place. But if that doesn't work I have two ideas:
You can't have a syntax error because then the script wouldn't even run. You said that removing the recursion yielded a result so the script must work.
One possibility is that it's timing out. This depends on the server configuration. Try adding
echo $url, "<br>";
flush();
to the top of getAll. If you receive any of the links this is your problem.
This can be fixed by calling a function like set_time_limit(0).
Another possibility is a connection error. This could be caused by coincidence or a server configuration limit. I can't be certain but I know some hosting providers limit file_get_contents and curl requests. There is a possibility your scripts are limited to one external request per execution.
Besides that, there is nothing I can think of that could really go wrong with your script. You could remove the recursion and run the function in a while loop, as sketched below, but unless you expect a lot of pages there is no need for such a modification.
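For reference, an iterative version might look roughly like this (a sketch using DOMDocument rather than the simple_html_dom library from the question, so treat the details as assumptions):
function getAllIterative($url) {
    $links = array();
    while ($url !== null) {
        $dom = new DOMDocument();
        @$dom->loadHTML(file_get_contents($url)); // suppress warnings from sloppy markup
        $url = null;                              // stop unless another rel=next link is found
        foreach ($dom->getElementsByTagName('link') as $link) {
            if ($link->getAttribute('rel') === 'next') {
                $links[] = $url = $link->getAttribute('href');
                break;
            }
        }
    }
    return $links;
}
It collects the same list without growing the call stack.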
And finally, the library you are using for DOM parsing will either return a DOM element object or null. So you can change if(is_object($tmp)){ to if($tmp){. And since you are passing the result by reference, returning a value is pointless. You can safely remove the else statement.
I wish you good luck.
I am having problems locating a PHP script that will allow me to obtain the contents of a .txt file on a remote server and then output it to a variable. Outputting something to a variable is not the hard part; it's picking up and reading the contents of the file that's hard. Anyone have any ideas?
I have trawled the forum and can only locate a method that works locally, which is not ideal as the target is remote.
The objective really is: how do I find out if a file exists on the remote server and output a status in HTML?
Ideas?
Assuming your remote server is accessible by HTTP or FTP you can try file_exists(); note, however, that the plain http:// wrapper does not support the stat() call file_exists() relies on, so only the ftp:// form works reliably:
if (file_exists("http://www.example.com/somefile.txt")) {
    echo "Found it!";
}
or, better:
if (file_exists("ftp://user:password@www.example.com/somefile.txt")) {
    echo "Found it!";
}
Use this:
$url = 'http://php.net';
$file_headers = @get_headers($url);
if ($file_headers[0] == 'HTTP/1.1 404 Not Found') {
    echo "URL does not exist";
} else {
    echo "URL exists";
}
Source: http://www.php.net/manual/en/function.file-exists.php#75064
You can try to use this code:
if (file_exists($path)) {
    echo "it exists";
} else {
    echo "it does not exist";
}
As you can see, $path is the path of your file. Of course, you can write anything else instead of those echo statements.
Accessing files on other servers can be quite tricky! If you have access to the file via ftp, you can use ftp to fetch the file, for example with ftp_fget().
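As a rough sketch of the FTP route (hostname, credentials and path are placeholders), ftp_size() returning -1 is a convenient way to tell whether the file exists before fetching it:
$conn = ftp_connect('ftp.example.com');
if ($conn && ftp_login($conn, 'user', 'password')) {
    ftp_pasv($conn, true);                       // passive mode is usually needed behind firewalls
    if (ftp_size($conn, '/path/to/somefile.txt') !== -1) {
        $local = fopen('php://temp', 'r+');
        ftp_fget($conn, $local, '/path/to/somefile.txt', FTP_ASCII);
        rewind($local);
        $contents = stream_get_contents($local); // the file contents, as a string
    } else {
        echo 'File not found on the remote server';
    }
    ftp_close($conn);
}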
If you do not have access to the file system via SSH either, you can only check the response the server gives when requesting the file. If the server responds with a 404 error, the file either does not exist or is not accessible via HTTP due to the server configuration.
You can check this through cURL; see this tutorial for a detailed explanation of obtaining the response code through cURL.
I know this is an old thread, but as Lars Ebert points out, checking for the existence of a file on a remote server can be tricky, so checking the server response, using cURL, is how I have been able to do it on our big travel site. Using file_exists() threw an error every time, but checking for a "200 OK" has proved quite successful. Here is the code we are using to check for images for our hotel listings page:
$media_url = curl_init("http://pathto/remote_file.png");
curl_setopt($media_url, CURLOPT_RETURNTRANSFER, true);
$media_img = curl_exec($media_url);
$server_response = curl_getinfo($media_url, CURLINFO_HTTP_CODE);
curl_close($media_url);

if ($server_response != 200) {
    echo "pathto/graphics/backup_image.png";
} else {
    echo "http://pathto/remote_file.png";
}
Where "http://pathto/remote_file.png" is the remote image we seek, but we need to know whether it is really there. And "pathto/graphics/backup_image.png" is what we display if the remote image does not exist.
I know it's awfully verbose, compared to file_exists(), but it's also more accurate, at least so far.
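If downloading the image body itself is wasteful, a HEAD-style check should behave the same way; this is only a sketch with a placeholder URL:
$check = curl_init("http://pathto/remote_file.png");
curl_setopt($check, CURLOPT_NOBODY, true);         // send a HEAD request, skip the body
curl_setopt($check, CURLOPT_RETURNTRANSFER, true);
curl_exec($check);
$code = curl_getinfo($check, CURLINFO_HTTP_CODE);
curl_close($check);
$image_exists = ($code == 200);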
The following function receives a string parameter representing a URL and then loads the URL into a simple_html_dom object. If the loading fails, it attempts to load the URL again.
public function getSimpleHtmlDomLoaded($url)
{
    $ret = false;
    $count = 1;
    $max_attemps = 10;

    while ($ret === false) {
        $html = new simple_html_dom();
        $ret = $html->load_file($url);

        if ($ret === false) {
            echo "Error loading url: $url\n";
            sleep(5);
            $count++;
            $html->clear();
            unset($html);
            if ($count > $max_attemps)
                return false;
        }
    }

    return $html;
}
However, if loading the URL fails once, it keeps failing for the current URL, and after the maximum attempts are exhausted it also keeps failing on the next calls to the function, for the rest of the URLs it has to process.
It would make sense for it to keep failing if the URLs were temporarily offline, but they are not (I've checked while the script was running).
Any ideas why this is not working properly?
I would also like to point out that when it starts failing to load the URLs, it only gives a single warning (instead of multiple ones), with the following message:
PHP Warning: file_get_contents(http://www.foo.com/resource): failed
to open stream: HTTP request failed! in simple_html_dom.php on line
1081
Which is prompted by this line of code:
$ret = $html->load_file($url);
I have tested your code and it works perfectly for me; every time I call that function it returns a valid result on the first attempt.
So even if you load the pages from the same domain, there can be some protection on the page or on the server.
For example, the page can look for certain cookies, or the server can look at your user agent, and if it sees you as a bot it will not serve the correct content.
I had similar problems while parsing some websites.
The answer for me was to see what the page/server expects and make my code simulate that: everything from faking the user agent to generating cookies and such.
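For example, since simple_html_dom's load_file() goes through file_get_contents(), sending a browser-like User-Agent can be as simple as setting the user_agent ini value before loading the page (the UA string below is only an example):
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'); // pretend to be a regular browser
$html = new simple_html_dom();
$ret  = $html->load_file($url);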
By the way, have you tried creating a simple PHP script just to test that the 'simple html dom' parser can run on your server with no errors? That is the first thing I would check.
In the end I must add that in one case, after numerous failed attempts at parsing one page, I could not win the masking game. So I made a script that loads the page in the Linux command-line text browser lynx, saves the whole page locally, and then parses that local file, which worked perfectly.
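A sketch of that kind of workaround (lynx must be installed; the URL and the temporary path are placeholders):
$url  = 'http://www.example.com/stubborn-page';
$file = '/tmp/stubborn-page.html';
shell_exec('lynx -source ' . escapeshellarg($url) . ' > ' . escapeshellarg($file));
$html = file_get_html($file);                    // parse the locally saved copy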
Maybe it is a problem with the load_file() function itself.
The problem was that error_get_last() returns all previous errors too; I don't know, maybe it depends on the PHP version.
I solved the problem by changing it to check whether the error changed, not whether it is null
(or use the non-object function file_get_html()):
function load_file()
{
    $preerror = error_get_last();
    $args = func_get_args();
    $this->load(call_user_func_array('file_get_contents', $args), true);
    // Throw an error if we can't properly load the dom.
    if (($error = error_get_last()) !== $preerror) {
        $this->clear();
        return false;
    }
}
This is a bit of a long shot, but I figured I'd ask anyway. I have an application that has web-based code editing, like you find on Github, using the ACE editor. The problem is, it is possible to edit code that is within the application itself.
I have managed to detect parse errors before saving the file, which works great, but if the user creates a runtime error, such as MyClass extends NonExistentClass, the file passes the parse check, but saves to the filesystem, killing the application.
Is there any way to test whether the new code will cause a runtime error before I save it to the filesystem? It seems completely counter-intuitive, but I figured I'd ask.
Possibly use register_shutdown_function to build a JSON object containing information about the fatal error. Then use an AJAX call to test the file; parse the returned value from the call to see if there is an error. (Obviously you could also run the PHP file and parse the JSON object without using AJAX, just thinking about what would be the best from a UX standpoint)
function my_shutdown() {
    $error = error_get_last();
    if ($error && $error['type'] == 1) { // 1 == E_ERROR (fatal error)
        echo json_encode($error);
    }
}
register_shutdown_function('my_shutdown');
Will output something like
{"type":1,"message":"Fatal error message","line":1}
Prepend that to the beginning of the test file, then:
$.post('/test.php', function(data) {
    var json = $.parseJSON(data);
    if (json.type == 1) {
        // Don't allow test file to save?
    }
});
Possibly helpful: php -f <file> will return a non-zero exit code if there's a runtime error.
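A sketch of that approach, assuming the candidate code is already held in a variable (here called $newCode, a placeholder):
$tmpFile = tempnam(sys_get_temp_dir(), 'code_');
file_put_contents($tmpFile, $newCode);           // $newCode holds the code being saved
exec('php -f ' . escapeshellarg($tmpFile) . ' 2>&1', $output, $exitCode);
unlink($tmpFile);
if ($exitCode !== 0) {
    // fatal/runtime error: refuse to save and show $output to the user
}
Note that this actually executes the submitted code, with whatever side effects it has, so it should only run in a sandboxed environment.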
Perhaps run the code in a separate file first and attach some fixed code at the bottom to check whether it evaluates?