Unable to download XML file on server, works fine on other - php

I have had an application running successfully for a couple months that relies on a cron job to get an xml feed of air pollution statistics. Since January it has run without error, but this morning from 7:00 it has not read the data. The relevant code is as follows:
<?php
define('FEED_URL', 'http://www.beijingaqifeed.com/BeijingAQI/BeijingAir.xml');
$contents = file_get_contents(FEED_URL);
if ($contents === false) echo "READ FAILED";
echo "FILE_GET_CONTENTS SIZE IS " . strlen($contents) . "<br>\n";
If I run this on my machine at home, it works:
FILE_GET_CONTENTS SIZE IS 21538
If it runs on my server it does not:
FILE_GET_CONTENTS SIZE IS 0
I have confirmed with support at the server site that they can browse the url and see the xml data, so there is no firewall or anything blocking this. And, as I say, this has worked successfully over 1000 times (as measured by entries in my database) until this morning, and now it always fails. I have no connection at the data supplier so I can't investigate from their side.
Can anyone suggest why this started failing, and what I could try doing? I have tried fread() and file(), with the same results.
Thanks...
(I have checked allow_url_fopen is turned on)

In this case it's probably something on the server blocking your PHP, maybe an OS update or something like that. In the past I had similar problems, but mine involved an unkillable daemon linked to a cron job, so the support team and I had big headaches turning it off. The crucial line for further investigation is this one: FILE_GET_CONTENTS SIZE IS 21538. If you can get the server to produce that and read the data, there's the catch. This answer might not be helpful at all, but as I stated, that line is the key.
Odd, I've just checked the XML URL, and it works normally, as it should.

Probably a permission issue. Try adding the following after file_get_contents to see the response:
if (!empty($http_response_header)) {
    var_dump($http_response_header); // to see what you get back
}

First I thought it would be permissions, but that isn't the case.
Try changing servers; maybe your IP is blocked or something?
<?php
function download($website) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $website);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $retValue = curl_exec($ch);
    curl_close($ch);
    return $retValue;
}
$XML = download('http://www.beijingaqifeed.com/BeijingAQI/BeijingAir.xml');
var_dump($XML);
Perform:
wget http://www.beijingaqifeed.com/BeijingAQI/BeijingAir.xml via SSH (if possible) and see the response.

It is most likely a 500 error, so it's on their side. It depends on what they use, but many admins (like me) avoid exposing server errors, replacing them with unhelpful messages or simply removing them. This is done to deter intruders, since an error code could draw an attacker to a server under my administration, and if it goes down, that's my fault.

This is not a final answer, but it clarifies things somewhat. I tried uploading the file to the server and reading it from there the same way (http:/young-0/testfile.xml), and it succeeded. Then I tried getting "http://www.beijingaqifeed.com" from the server, and that failed. So the BOM was a red herring; the connection is being blocked either by my provider (who says it is not them) or the site is refusing connections from my server. Thanks to everyone who helped.
For now I have returned to using the Twitter feed, which is far less reliable but has the advantage that I am able to read it.

Gateway Timeout 504 on multiple requests. Apache

I have an XML file locally. It contains data from a marketplace.
It roughly looks like this:
<offer id="2113">
<picture>https://anotherserver.com/image1.jpg</picture>
<picture>https://anotherserver.com/image2.jpg</picture>
</offer>
<offer id="2117">
<picture>https://anotherserver.com/image3.jpg</picture>
<picture>https://anotherserver.com/image4.jpg</picture>
</offer>
...
What I want is to save the images in those <picture> nodes locally.
There are about 9,000 offers and about 14,000 images.
When I iterate through them I see that the images are being copied from that other server, but at some point it gives a 504 Gateway Timeout.
The thing is that sometimes the error comes after 2,000 images, sometimes after way more or fewer.
I tried getting only one image 12,000 times from that server (i.e. only https://anotherserver.com/image3.jpg) but it still gave the same error.
From what I've read, that other server is blocking my requests after some quantity.
I tried PHP sleep(20) after every 100th image, but it still gave me the same error (sleep(180): same). When I tried a local image with its full path it didn't give any errors. When I tried a second (non-local) server, the same thing occurred.
I use the PHP copy() function to copy each image from that server.
I've also used file_get_contents() for testing purposes but got the same error.
I have
set_time_limit(300000);
ini_set('default_socket_timeout', 300000);
as well but no luck.
Is there any way to do this without chunking requests?
Does this error occur on one specific image? It would be great to catch this error, or just keep track of the response delay so another request can be sent after some time, if that can be done.
Is there any constant number of seconds that I have to wait in order to get those requests rolling?
And please give me non-curl answers if possible.
UPDATE
cURL and exec(wget) didn't work either; they both ran into the same error.
Can the remote server be tweaked so it doesn't block me? (If it is indeed blocking.)
P.S. If I do echo "<img src='https://anotherserver.com/image1.jpg' />"; in a loop for all 12,000 images, they show up just fine.
Since you're accessing content on a server you have no control over, only the server administrators know the blocking rules in place.
But you have a few options, as follows:
Run batches of 1000 or so, then sleep for a few hours.
Split the request up between computers that are requesting the information.
Maybe even something as simple as changing the requesting user agent info every 1000 or so images would be good enough to bypass the blocking mechanism.
Or some combination of all of the above.
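If the block is purely volume-based, a rough sketch of combining batches, pauses, and a rotating user agent might look like the following. The batch size, pause length, and user agent strings are guesses rather than known limits of that server, and $imageURLs stands in for the picture URLs pulled from your XML:
<?php
// Placeholder list: in practice, fill this from the <picture> nodes in the XML.
$imageURLs = ['https://anotherserver.com/image1.jpg', 'https://anotherserver.com/image2.jpg'];

$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
];

foreach (array_chunk($imageURLs, 1000) as $batchIndex => $batch) {
    // Rotate the user agent once per batch via a stream context.
    $context = stream_context_create([
        'http' => [
            'user_agent' => $userAgents[$batchIndex % count($userAgents)],
            'timeout'    => 30,
        ],
    ]);

    foreach ($batch as $url) {
        $data = @file_get_contents($url, false, $context);
        if ($data !== false) {
            file_put_contents(basename(parse_url($url, PHP_URL_PATH)), $data);
        }
    }

    sleep(600); // pause between batches; stretch this to hours if the block persists
}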
I would suggest you try the following:
1. Reuse the previously opened connection using cURL:
$imageURLs = array('https://anotherserver.com/image1.jpg', 'https://anotherserver.com/image2.jpg', ...);
$notDownloaded = array();

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

foreach ($imageURLs as $URL) {
    $filepath = parse_url($URL, PHP_URL_PATH);
    $fp = fopen(basename($filepath), "w");
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_URL, $URL);
    curl_exec($ch);
    fclose($fp);
    if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) == 504) {
        $notDownloaded[] = $URL;
    }
}
curl_close($ch);
// check to see if $notDownloaded is empty
2. If the images are accessible via both https and http, try using http instead (this will at least speed up the downloading).
3. Check the response headers when a 504 is returned, as well as when you load the URL in your browser. Make sure there are no X-RateLimit-* headers. By the way, what are the response headers, actually?
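A quick way to dump them from PHP is get_headers(); a sketch, using one of the image URLs from the question:
<?php
// Print the raw response headers for one of the problem URLs.
$headers = get_headers('https://anotherserver.com/image3.jpg');
if ($headers === false) {
    echo "Request failed entirely\n";
} else {
    foreach ($headers as $header) {
        echo $header, "\n"; // look for 504 status lines and any X-RateLimit-* headers
    }
}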

File_get_contents failed to open stream: http request failed 404 not found

I need to add an SVG file to a website and apply a class to this SVG. This is frustrating: I've tried different solutions posted on here and none of them have worked for me. This worked on a different server, but after being moved to a new server it no longer works. Here is how I am calling it in the PHP:
<?php
$svg = file_get_contents("http://www.folklorecoffee.com/wp-content/uploads/2018/04/folkloretextwhite-1.svg");
$dom = new DOMDocument();
$dom->loadHTML($svg);
foreach ($dom->getElementsByTagName('svg') as $element) {
    $element->setAttribute('class', 'logo-light');
}
$svg = $dom->saveHTML();
echo $svg;
?>
I'm getting these warnings:
WARNING: file_get_contents(http://www.folklorecoffee.com/wp-content/uploads/2018/04/folkloretextwhite-1.svg): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
in /home/folklorecoffee/public_html/wp-content/themes/lily/header.php on line 34
WARNING: DOMDocument::loadHTML(): Empty string supplied as input in /home/folklorecoffee/public_html/wp-content/themes/lily/header.php on line 36
But when I test the url in my browser, it comes up fine. Not sure why I'm getting a 404 error. What am I doing wrong? Thank you in advance!
The request is working for me:
echo $svg = file_get_contents("http://www.folklorecoffee.com/wp-content/uploads/2018/04/folkloretextwhite-1.svg"); // prints the SVG data as expected
You could try the same request via a different mechanism, such as cURL:
$url = "http://www.folklorecoffee.com/wp-
content/uploads/2018/04/folkloretextwhite-1.svg";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
echo $data; // prints the SVG data as expected
But if the URL is the same, I expect you will get the same result.
Given the above, the best answer is for you to do troubleshooting on your own about what is happening, since it is very specific to your setup.
You may gain some insight by looking at the Apache logs for the request. Does it show up? Does the URL it displays match what you expect?
Can you make other requests with file_get_contents to your domain that work? To the uploads folder? To other domains?
Check with your hosting provider to see if they can explain. There may be a configuration item that is interfering in some way.
Finally, you may want to try investigating more carefully why the logo cannot be loaded from the file system. Is it a permissions issue? Can you load any other file from that directory?
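As a concrete starting point for those comparison requests, a small test loop like the one below makes them quick to run. The URLs are placeholders for your own domain, the uploads folder, and an external site:
<?php
// Hypothetical test URLs: your own domain root, the uploads folder, and an external site.
$tests = [
    'http://www.folklorecoffee.com/',
    'http://www.folklorecoffee.com/wp-content/uploads/2018/04/folkloretextwhite-1.svg',
    'http://www.example.com/',
];

foreach ($tests as $url) {
    unset($http_response_header);   // don't let a failed request reuse the previous headers
    $body = @file_get_contents($url);
    $status = isset($http_response_header[0]) ? $http_response_header[0] : 'no response';
    printf("%s => %s (%d bytes)\n", $url, $status, $body === false ? 0 : strlen($body));
}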
I tried everyone's suggestions, and I thank you all for taking the time to offer help, but I was never able to get it to work and continued to get the same error.
Interestingly, any file on my server that I tried to call with file_get_contents would get the 404 error. It didn't matter whether I used a local path or the exact URL; the paths were correct. Here's what is interesting: it would work if I pointed to a file that was NOT on my server. So I'm thinking there must be a configuration setting somewhere that I would need to change. I do not know what this setting would be, but if someone out there is reading this and experiencing the same issue, I hope this helps point you in the right direction.
I was in a time crunch and decided instead to simply write the SVG file in with an object tag, and that did the trick for now. I'd prefer using a different method, and I'll revisit this in the future and update this post if I figure out what would fix it. Thank you again for all of your help; I hope this helps someone else.
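For reference, that stop-gap can be as simple as echoing the object tag directly. This is only a sketch assuming the same SVG URL, and note that the class ends up on the wrapper element rather than on the svg node itself:
<?php
// Embed the SVG via an <object> tag; the class is applied to the wrapper element.
echo '<object type="image/svg+xml" class="logo-light"'
   . ' data="http://www.folklorecoffee.com/wp-content/uploads/2018/04/folkloretextwhite-1.svg">'
   . '</object>';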

Calling file() on pastebin URL fails, but on local file or google.com it works

I'm working on a bit of PHP code that depends on a remote file which happens to be hosted on pastebin. The server I am working on has all the necessary functions enabled, as running it with FILE_URL set to http://google.com returns the expected results. I've also verified through php.ini for extra measure.
Everything should work, but it doesn't. Calling file() on a URL formed as such, http://pastebin.com/raw.php?i=<paste id here>, returns a 500 server error. Doing the same on the exact same file hosted locally or on google.com returns a reasonable result.
I have verified that the URL is set to the correct value and verified that the remote page is where I think that it is. I'm at a loss.
ini_set("allow_url_fopen", true);
// Prefer remote (up-to-date) file, fallback to local file
if( ini_get("allow_url_fopen") ){
$file = file( FILE_URL );
}
if(!isset( $file ) || !$file ) {
$file = file( LOCAL_FILE_PATH );
}
I wasn't able to test this, but you should use cURL; try something like this:
<?php
$url = "http://pastebin.com/2ZdFcEKh";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
Pastebin appear to use a protection system that will automatically block IP addresses that issue requests that are "bot-like".
In the case of your example, you will get a 500 server error since the file() command never completes (since their protection system never closes the connection) and there is no timeout facility in your call. The script is probably considered "bot-like" since file() does not pass through all the standard HTTP headers a typical browser would.
To solve this problem, I would recommend investigating cURL and perhaps look at setting a browser user agent as a starting point to grant access to your script. I should also mention that it would be in your interests to investigate whether or not this is considered a breach of the Pastebin user agreement. While I cannot see any reference to using scripts in their FAQ (as of 2012/12/29), they have installed protection against scripts for a reason.
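As a minimal sketch of that starting point (the user agent string and timeout value are just placeholders, and the paste id is left as in the question):
<?php
// Fetch the raw paste with a browser-like user agent and an explicit timeout.
$ch = curl_init('http://pastebin.com/raw.php?i=<paste id here>');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
$raw = curl_exec($ch);
if ($raw === false) {
    echo 'cURL error: ' . curl_error($ch) . "\n"; // shows whether it timed out or was refused
}
curl_close($ch);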

XML/API cannot be retrieved by PHP/curl

Yeah, I'm stumped. I'm getting nothing. curl_exec is returning no content. I've tried file_get_contents, but that completely times out. I'm attempting to get an API XML from my Subsonic media server and display it on my web server (different servers). The end result would be that I can have people log in to my web server with the media server account. I can deal with the actual parsing later, but I can't even grab the XML right now. I've tried their forums, but haven't gotten much help since they're not really PHP inclined. Figure I'd ask here.
$url = "http://{$subserver}/rest/getUser.view?u={$username}&p={$password}&username={$username}&v=1.8.0&c={$appID}";
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_HEADER, 0);
$result = curl_exec($c);
curl_close($c);
echo $result;
This returns nothing. The variables are defined correctly, and I get the same response as if I typed in the whole URL. Here is their API page: http://www.subsonic.org/pages/api.jsp. I've even tried their "ping" function: still empty.
The url itself looks fine. In the web browser, it returns:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<subsonic-response xmlns="http://subsonic.org/restapi" status="ok" version="1.8.0">
<user username="xxxxxx" email="xxxxxx#xxxxxx.com" scrobblingEnabled="false" adminRole="true" settingsRole="true" downloadRole="true" uploadRole="true" playlistRole="true" coverArtRole="true" commentRole="true" podcastRole="true" streamRole="true" jukeboxRole="true" shareRole="true"/>
</subsonic-response>
I admit I've never used XML, but according to everything I've read, this should work. And it does work with other random XML files I found on the web.
It might have something to do with the fact that it's not an ".xml" file but XML generated via a URL, as this same exact code will work with some random XML file I found (http://www.w3schools.com/xml/note.xml).
Any thoughts?
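One way to narrow this down is to ask cURL what actually went wrong. Here is a sketch that reuses the same $url and only adds error reporting:
<?php
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_TIMEOUT, 15);
$result = curl_exec($c);
if ($result === false) {
    // Surfaces DNS, connection, SSL, and timeout problems that otherwise come back as an empty string.
    echo 'cURL error: ' . curl_error($c) . "\n";
} else {
    echo 'HTTP ' . curl_getinfo($c, CURLINFO_HTTP_CODE) . ', ' . strlen($result) . " bytes received\n";
}
curl_close($c);
If curl_error() comes back empty, check the HTTP status code it reports: a redirect (add CURLOPT_FOLLOWLOCATION) or an authentication failure would both explain an empty body.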

file_get_contents() GET request not showing up on my webserver log

I've got a simple PHP script to ping some of my domains using file_get_contents(); however, I have checked my logs and they are not recording any GET requests.
I have
$result = file_get_contents($url);
echo $url. ' pinged ok\n';
where $url for each of the domains is just a simple string of the form http://mydomain.com/, echo verifies this. Manual requests made by myself are showing.
Why would the get requests not be showing in my logs?
Actually, I've got it to register the hit when I send $result to the browser. I guess this means the webserver only records browser requests? Is there any way to mimic that in PHP?
OK, I tried cURL in PHP:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "getcorporate.co.nr");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
Same effect though: no hit registered in the logs. So far it only registers when I feed the HTTP response back from my script to the browser. Obviously this will only work for a single request and not a batch, which is the purpose of my script.
If something else is going wrong, what debugging output can I look at?
Edit: D'oh! See comments below accepted answer for explanation of my erroneous thinking.
If the request is actually being made, it would be in the logs.
Your example code could be failing silently.
What happens if you do:
<?php
if ($result = file_get_contents($url)) {
    echo "Success";
} else {
    echo "Epic Fail!";
}
If that's failing, you'll want to turn on some error reporting or logging and try to figure out why.
Note: if you're in safe mode, or otherwise have fopen url wrappers disabled, file_get_contents() will not grab a remote page. This is the most likely reason things would be failing (assuming there's not a typo in the contents of $url).
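Something along these lines will at least make the failure visible (a sketch, assuming $url is set as in your script):
<?php
// Make the failure visible instead of silent.
error_reporting(E_ALL);
ini_set('display_errors', '1');
var_dump(ini_get('allow_url_fopen')); // must be enabled ("1") for remote URLs
$result = file_get_contents($url);
if ($result === false) {
    $err = error_get_last();
    echo 'file_get_contents failed: ' . ($err ? $err['message'] : 'unknown error') . "\n";
}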
Use curl instead?
That's odd. Maybe there is some caching afoot? Have you tried changing the URL dynamically ($url = $url."?timestamp=".time() for example)?
