Is file_get_contents() a blocking function? - php

I am connecting to an unreliable API via file_get_contents. Since it's unreliable, I decided to put the api call into a while loop thusly:
$resultJSON = FALSE;
while(!$resultJSON) {
$resultJSON = file_get_contents($apiURL);
set_time_limit(10);
}
Putting it another way: Say the API fails twice before succeeding on the 3rd try. Have I sent 3 requests, or have I sent however many hundreds of requests as will fit into that 3 second window?

file_get_contents(), like basically all functions in PHP, is a blocking call.

Yes, it is a blocking function. You should also check to see if the value is specifically "false". (Note that === is used, not ==.) Lastly, you want to sleep for 10 seconds. set_time_limit() is used to set the max execution time before it is automatically killed.
set_time_limit(300); //Run for up to 5 minutes.
$resultJSON = false;
while($resultJSON === false)
{
$resultJSON = file_get_contents($apiURL);
if($resultJSON === false) sleep(10); // wait before retrying, but not after a success
}

Expanding on @Sammitch's suggestion to use cURL instead of file_get_contents():
<?php
$apiURL = 'http://stackoverflow.com/';
$curlh = curl_init($apiURL);
// Use === not ==
// if ($curlh === FALSE) handle error;
curl_setopt($curlh, CURLOPT_FOLLOWLOCATION, TRUE); // maybe, up to you
curl_setopt($curlh, CURLOPT_HEADER, FALSE); // or TRUE, according to your needs
curl_setopt($curlh, CURLOPT_RETURNTRANSFER, TRUE);
// set your timeout in seconds here
curl_setopt($curlh, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlh, CURLOPT_TIMEOUT, 30);
$resultJSON = curl_exec($curlh);
curl_close($curlh);
// if ($resultJSON === FALSE) handle error;
echo "$resultJSON\n"; // Now process $resultJSON
?>
There are a lot more curl_setopt options. You should check them out.
Of course, this assumes you have cURL available.

I am not aware of any function in PHP that does not "block". As an alternative, and if your server permits such things, you can:
Use pcntl_fork() and do other stuff in your script while waiting for the API call to go through.
Use exec() to call another script in the background [using &] to do the API call for you if pcntl_fork() is unavailable.
However, if you literally cannot do anything else in your script without a successful call to that API then it doesn't really matter if the call 'blocks' or not. What you should really be concerned about is spending so much time waiting for this API that you exceed the configured max_execution_time and your script is aborted in the middle without being properly completed.
$max_calls = 5;
for( $i=1; $i<=$max_calls; $i++ ) {
$resultJSON = file_get_contents($apiURL);
if( $resultJSON !== false ) {
break;
} else if( $i === $max_calls ) { // compare, don't assign: "=" here would throw on the first failure
throw new Exception("Could not reach API within $max_calls requests.");
}
usleep(250000); //wait 250ms between attempts
}
It's worth noting that file_get_contents() has a default timeout of 60 seconds (the default_socket_timeout ini setting), so you're really in danger of the script being killed. Give serious consideration to using cURL instead, since it lets you set much more reasonable timeout values.
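The bounded retry loop above can be sketched as a small helper. This is only an illustration: fetchWithRetry() and its parameters are hypothetical names, not part of PHP, and the helper assumes the fetch callable returns false on failure, as file_get_contents() does. Because each call blocks until it finishes or times out, the attempt counter it returns equals the number of requests actually sent.

```php
<?php
/**
 * Call $fetch up to $maxAttempts times, pausing between failed attempts.
 * $fetch must return false on failure (as file_get_contents() does).
 * Returns [result, attemptsUsed]; throws after the last failed attempt.
 * (fetchWithRetry is a hypothetical helper name used for illustration.)
 */
function fetchWithRetry(callable $fetch, int $maxAttempts = 5, int $pauseMicroseconds = 250000): array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetch(); // blocks until the request finishes or times out
        if ($result !== false) {
            return [$result, $attempt];
        }
        if ($attempt < $maxAttempts) {
            usleep($pauseMicroseconds); // no pause after the final failure
        }
    }
    throw new RuntimeException("Could not reach API within $maxAttempts requests.");
}

// Possible usage with a per-request timeout of 5 seconds (value chosen arbitrarily):
// $context = stream_context_create(['http' => ['timeout' => 5]]);
// [$resultJSON, $attempts] = fetchWithRetry(function () use ($apiURL, $context) {
//     return @file_get_contents($apiURL, false, $context);
// });
```

Passing the request as a callable keeps the retry logic independent of whether file_get_contents() or cURL performs the call.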

Related

Inconsistencies with CURL Multi PHP

When I run a check on 10 urls, if I am able to get a connection with the host server, the handle will return a success message (CURLE_OK)
When processing each handle if a server refuses the connection, the handle will include a error message.
The problem
I assumed that when we get a bad handle, CURL will mark this handle but continue to process the unprocessed handles, however this is not what seems to happen.
When we come across a bad handle, CURL will mark this handle as bad, but will not process the remaining unprocessed handles.
This can be hard to detect: if I do get a connection with all handles, which is what happens most of the time, the problem is not visible (CURL only stops on the first bad connection).
For the test, I had to find a suitable site which loads slowly or refuses x number of simultaneous connections.
set_time_limit(0);
$l = array(
'http://smotri.com/video/list/',
'http://smotri.com/video/list/sports/',
'http://smotri.com/video/list/animals/',
'http://smotri.com/video/list/travel/',
'http://smotri.com/video/list/hobby/',
'http://smotri.com/video/list/gaming/',
'http://smotri.com/video/list/mult/',
'http://smotri.com/video/list/erotic/',
'http://smotri.com/video/list/auto/',
'http://smotri.com/video/list/humour/',
'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while($s <= $f)
{
$ch = curl_init();
$curlsettings = array(
CURLOPT_URL => $l[$s],
CURLOPT_TIMEOUT => 0,
CURLOPT_CONNECTTIMEOUT => 0,
CURLOPT_RETURNTRANSFER => 1
);
curl_setopt_array($ch, $curlsettings);
curl_multi_add_handle($mh,$ch);
$s++;
}
$active = null;
do
{
curl_multi_exec($mh,$active);
curl_multi_select($mh);
$info = curl_multi_info_read($mh);
echo '<pre>';
var_dump($info);
if($info['result'] === CURLE_OK)
echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
if($info['result'] != 0)
echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
} while ($active > 0);
curl_multi_close($mh);
I have dumped $info in the script which asks the Multi Handle if there is any new information on any handles whilst running.
When the script has ended we will see some bool(false) - when no new information was available(handles were still processing), along with all handles if all was successful or limited handles if one handle failed.
I have failed at fixing this. It's probably something I have overlooked, and I have gone too far down the road of attempting to fix things which are not relevant.
Some of my attempts at fixing this were:
Assigning each $ch handle to an array - $ch[1], $ch[2] etc. (instead of adding the current $ch handle to the multi handle and then overwriting it, as in the test)
Removing handles after success/failure with curl_multi_remove_handle
Setting CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT to infinity
Many more (I will update this post, as I have forgotten all of them)
Testing this with Php version 5.4.14
Hopefully I have illustrated the points well enough.
Thanks for reading.
I've been mucking around with your script for a while now trying to get it to work. It was only when I read "Repeated calls to this function will return a new result each time, until a FALSE is returned as a signal that there is no more to get at this point." at http://se2.php.net/manual/en/function.curl-multi-info-read.php that I realized a while loop might work.
The extra while loop makes it behave exactly how you'd expect. Here is the output I get:
http://smotri.com/video/list/sports/ failed
http://smotri.com/video/list/travel/ failed
http://smotri.com/video/list/gaming/ failed
http://smotri.com/video/list/erotic/ failed
http://smotri.com/video/list/humour/ failed
http://smotri.com/video/list/animals/ success
http://smotri.com/video/list/film/ success
http://smotri.com/video/list/auto/ success
http://smotri.com/video/list/ failed
http://smotri.com/video/list/hobby/ failed
http://smotri.com/video/list/mult/ failed
Here's the code I used for testing:
<?php
set_time_limit(0);
$l = array(
'http://smotri.com/video/list/',
'http://smotri.com/video/list/sports/',
'http://smotri.com/video/list/animals/',
'http://smotri.com/video/list/travel/',
'http://smotri.com/video/list/hobby/',
'http://smotri.com/video/list/gaming/',
'http://smotri.com/video/list/mult/',
'http://smotri.com/video/list/erotic/',
'http://smotri.com/video/list/auto/',
'http://smotri.com/video/list/humour/',
'http://smotri.com/video/list/film/'
);
$mh = curl_multi_init();
$s = 0;
$f = 10;
while($s <= $f)
{
$ch = curl_init();
if($s%2)
{
$curlsettings = array(
CURLOPT_URL => $l[$s],
CURLOPT_TIMEOUT_MS => 3000,
CURLOPT_RETURNTRANSFER => 1,
);
}
else
{
$curlsettings = array(
CURLOPT_URL => $l[$s],
CURLOPT_TIMEOUT_MS => 4000,
CURLOPT_RETURNTRANSFER => 1,
);
}
curl_setopt_array($ch, $curlsettings);
curl_multi_add_handle($mh,$ch);
$s++;
}
$active = null;
do
{
$mrc = curl_multi_exec($mh,$active);
curl_multi_select($mh);
while($info = curl_multi_info_read($mh))
{
echo '<pre>';
//var_dump($info);
if($info['result'] === 0)
{
echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
}
else
{
echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
}
}
} while ($active > 0);
curl_multi_close($mh);
Hope that helps. For testing just adjust CURLOPT_TIMEOUT_MS to your internet connection. I made it so it alternates between 3000 and 4000 milliseconds as 3000 will fail and 4000 usually succeeds.
Update
After going through the PHP and libcurl docs I have found how curl_multi_exec works (in libcurl it's curl_multi_perform). Upon first being called it starts handling transfers for all the added handles (added before via curl_multi_add_handle).
The number it assigns to $active is the number of transfers still running. So if it's less than the total number of handles you have, then you know one or more transfers are complete. So curl_multi_exec acts as a kind of progress indicator as well.
As all transfers are handled in a non-blocking fashion (transfers can finish simultaneously), one iteration of the while loop that curl_multi_exec sits in does not correspond to one completed URL request.
All data is stored in a queue so as soon as one or more transfers are complete you can call curl_multi_info_read to fetch this data.
In my original answer I had curl_multi_info_read in a while loop. This loop would keep iterating until curl_multi_info_read found no remaining data in the queue. After which the outer while loop would move onto the next iteration if $active != 0 (meaning curl_multi_exec reported transfers still not complete).
To summarize, the outer loop keeps iterating when there are still transfers not completed and the inner loop iterates only when there's data from a completed transfer.
The PHP documentation is pretty bad for curl multi functions so I hope this cleared a few things up. Below is an alternative way to do the same thing.
do
{
curl_multi_exec($mh,$active);
} while ($active > 0);
// while($info = curl_multi_info_read($mh)) would work also here
for($i = 0; $i <= $f; $i++){
$info = curl_multi_info_read($mh);
if($info['result'] === 0)
{
echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' success<br>';
}
else
{
echo curl_getinfo($info['handle'],CURLINFO_EFFECTIVE_URL) . ' failed<br>';
}
}
From this information you can also see curl_multi_select is not needed as you don't want something that blocks until there is activity.
With the code you provided in your question, it only seemed like curl wasn't proceeding after a few failed transfers, but there was actually still data queued in the buffer. Your code just wasn't calling curl_multi_info_read enough times. The reason all the successful transfers were picked up by your code is that PHP runs on a single thread, so the script hung waiting for the requests. The timeouts for the failed requests didn't make PHP hang/wait that long, so the while loop ran fewer iterations than there were queued messages.
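The outer/inner loop idea described in this answer can be sketched as two small functions. This is a sketch under assumptions, not the answerer's code: drainCompleted() and runAll() are hypothetical names, and the curl_multi_select() call is a design choice added here to avoid spinning the CPU while transfers are still running. The key point is that the message queue is drained completely after every curl_multi_exec() call.

```php
<?php
/**
 * Drain every message currently queued by calling $readInfo until it
 * returns false. With cURL, $readInfo wraps curl_multi_info_read($mh).
 * (drainCompleted is a hypothetical helper name.)
 */
function drainCompleted(callable $readInfo): array
{
    $messages = [];
    while (($info = $readInfo()) !== false) {
        $messages[] = $info;
    }
    return $messages;
}

/**
 * Run all handles already added to $mh.
 * Returns a map of effective URL => true (success) or false (failure).
 * (runAll is a hypothetical helper name.)
 */
function runAll($mh): array
{
    $results = [];
    do {
        curl_multi_exec($mh, $active);
        // Inner loop: pick up every transfer that has finished so far.
        $finished = drainCompleted(function () use ($mh) {
            return curl_multi_info_read($mh);
        });
        foreach ($finished as $info) {
            $url = curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL);
            $results[$url] = ($info['result'] === CURLE_OK);
        }
        if ($active > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($active > 0); // outer loop: transfers still running
    return $results;
}
```

Because the drain happens after each curl_multi_exec() call, including the one that sets $active to 0, no completed transfer is left unread when the outer loop ends.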

file get content or fsockopen - timeout issue

I have a php file called testResponse.php which is only :
<?php
sleep(5);
echo"go";
?>
Now, I'm calling this file from a another page using file_get_contents like this :
$start= microtime(true);
$opts = array('http' =>
array(
'method' => 'GET',
'timeout' => 1
)
);
$context = stream_context_create($opts);
$loc = @file_get_contents("http://www.mywebsite.com/testResponse.php", false, $context);
$end= microtime(true);
echo $end - $start, "\n";
The output is more than 5 sec, which means that my timeout has been ignored...
I followed the advice of this post : stackoverflow.com/questions/3689371
But it seems that the hostname there cannot include a path (like www.mywebsite.com/testResponse.php); it has to be the bare hostname, like www.mywebsite.com.
So I'm stuck trying to achieve this goal:
Get content of page www.test.com/x.php with constraint :
if test.com doesn't exist or the page x.php doesn't exist returns nothing quickly
if the page exist but takes more than 1 sec to load, abort
else get the content of the file
Edit: By the way, it seems to work when I call this page (testResponse.php) from my local server. Well, it multiplies the timeout by 2. For instance, if I set the timeout to 1, it echoes something like "2.0054645". But only locally...
The solution is to use PHP's cURL functions. The other question you linked to explains things properly, about the read timeouts vs. the connection timeouts, and so on, but neither of those are truly what you're looking for here. Even the connection timeout won't work, because the connection to testResponse.php is always successful; after that it's waiting, so what you need is an execution timeout. This is where cURL comes in handy.
So, testResponse.php doesn't need to be altered. In your main file, though, try the following code (this is tested and it works on my server):
$start = microtime(true);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.mywebsite.com/testResponse.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
$output = curl_exec($ch);
$errno = curl_errno($ch);
if ($errno > 0) {
if ($errno === 28) {
echo "Connection timed out.";
}
else {
echo "Error #" . $errno . ": " . curl_error($ch);
}
}
else {
echo $output;
}
$end = microtime(true);
echo "<br><br>" . ($end - $start);
curl_close($ch);
This sets the execution time of the cURL session, via the CURLOPT_TIMEOUT option you see on line 5. So, when the connection is timed out, $errno will equal 28, the code for cURL's operation timeout error. The rest of the error codes are listed in the cURL documentation, so you can expand the script above to act accordingly.
Finally, because of the CURLOPT_RETURNTRANSFER option that's set, curl_exec($ch) will return the content of the retrieved page if the session succeeds. Otherwise, it will return false.
Hope this helps!
Edit: Removed the statement setting CURLOPT_HEADER. I also, for some reason, was under the impression that curl_exec($ch) set the value of $ch to the returned contents, forgetting that the contents are returned by curl_exec().

Safe image download from PHP

I want to allow my users to upload a file by providing a URL to the image.
Pretty much like imgur, you enter http://something.com/image.png and the script downloads the file, then keeps it on the server and publishes it.
I tried using file_get_contents() and getimagesize(). But I'm thinking there would be problems:
how can I protect the script from 100 users supplying 100 URLs to large images?
how can I determine if the download process will take or already takes too long?
This is actually interesting.
It appears that you can actually track and control the progress of a cURL transfer. See documentation on CURLOPT_NOPROGRESS, CURLOPT_PROGRESSFUNCTION and CURLOPT_WRITEFUNCTION
I found this example and changed it to:
<?php
file_put_contents('progress.txt', '');
$target_file_name = 'targetfile.zip';
$target_file = fopen($target_file_name, 'w');
$ch = curl_init('http://localhost/so/testfile2.zip');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_NOPROGRESS, FALSE);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progress_callback');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_callback');
curl_exec($ch);
if ($target_file) {
fclose($target_file);
}
$_download_size = 0;
// Since PHP 5.5 the cURL handle is passed as the first argument to the progress callback.
function progress_callback($ch, $download_size, $downloaded_size, $upload_size, $uploaded_size) {
global $_download_size;
$_download_size = $download_size;
static $previous_progress = 0;
if ($download_size == 0) {
$progress = 0;
}
else {
$progress = round($downloaded_size * 100 / $download_size);
}
if ($progress > $previous_progress) {
$previous_progress = $progress;
$fp = fopen('progress.txt', 'a');
fputs($fp, $progress .'% ('. $downloaded_size .'/'. $download_size .")\n");
fclose($fp);
}
}
function write_callback($ch, $data) {
global $target_file_name;
global $target_file;
global $_download_size;
if ($_download_size > 1000000) {
return '';
}
return fwrite($target_file, $data);
}
write_callback checks whether the size of the data is greater than a specified limit. If it is, it returns an empty string that aborts the transfer. I tested this on 2 files with 80K and 33M, respectively, with a 1M limit. In your case, progress_callback is pointless beyond the second line, but I kept everything in there for debugging purposes.
One other way to get the size of the data is to do a HEAD request but I don't think that servers are required to send a Content-length header.
To answer question one, you simply need to add the appropriate limits in your code. Define how many requests you want to accept in a given amount of time, track your requests in a database, and go from there. Also put a cap on file size.
For question two, you can set appropriate timeouts if you use cURL.
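Since a server may omit Content-Length, a variation on the write callback is to count the bytes actually received and abort once a cap is exceeded. The following is a minimal sketch of that idea; makeCappedWriter() is a hypothetical helper name, and the 1000000-byte cap and 10-second timeout in the usage comment are arbitrary example values.

```php
<?php
/**
 * Build a CURLOPT_WRITEFUNCTION callback that writes to $stream and aborts
 * the transfer once more than $maxBytes have been received in total.
 * (makeCappedWriter is a hypothetical helper name.)
 */
function makeCappedWriter($stream, int $maxBytes): callable
{
    $received = 0; // running total, shared across calls via the reference below
    return function ($ch, string $data) use ($stream, $maxBytes, &$received): int {
        $received += strlen($data);
        if ($received > $maxBytes) {
            return 0; // returning a value other than strlen($data) makes cURL abort
        }
        $written = fwrite($stream, $data);
        return $written === false ? 0 : $written;
    };
}

// Possible usage:
// $out = fopen('image.tmp', 'w');
// $ch = curl_init($userSuppliedUrl);
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeCappedWriter($out, 1000000));
// curl_setopt($ch, CURLOPT_TIMEOUT, 10); // also bound the total download time
// curl_exec($ch);
```

Counting received bytes rather than trusting the reported download size means the cap still applies when the size is unknown or misreported.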

Exit out of a cURL fetch

I'm trying to find a way to only quickly access a file and then disconnect immediately.
So I've decided to use cURL since it's the fastest option for me. But I can't figure out how I should "disconnect" cURL.
With the code below, Apache's access logs says that the file I tried accessing was indeed accessed, but I'm feeling a little iffy about this, because when I just run the while loop without breaking out of it, it just keeps looping. Shouldn't the loop stop when cURL has finished fetching the file? Or am I just being silly; is the loop just restarting constantly?
<?php
$Resource = curl_init();
curl_setopt($Resource, CURLOPT_URL, '...');
curl_setopt($Resource, CURLOPT_HEADER, 0);
curl_setopt($Resource, CURLOPT_USERAGENT, '...');
while(curl_exec($Resource)){
break;
}
curl_close($Resource);
?>
I tried setting the CURLOPT_CONNECTTIMEOUT_MS / CURLOPT_CONNECTTIMEOUT options to very small values, but it didn't help in this case.
Is there a more "proper" way of doing this?
This statement is superfluous:
while(curl_exec($Resource)){
break;
}
Instead just keep the return value for future reference:
$result = curl_exec($Resource);
The while loop does not help anything. So now to your question: You can tell curl that it should only take some bytes from the body and then quit. That can be achieved by reducing the CURLOPT_BUFFERSIZE to a small value and by using a callback function to tell curl it should stop:
$withCallback = array(
CURLOPT_BUFFERSIZE => 20, # ~ value of bytes you'd like to get
CURLOPT_WRITEFUNCTION => function($handle, $data) {
echo "WRITE: (", strlen($data), ") $data\n";
return 0;
},
);
$handle = curl_init("http://stackoverflow.com/");
curl_setopt_array($handle, $withCallback);
curl_exec($handle);
curl_close($handle);
Output:
WRITE: (10) <!DOCTYPE
Another alternative is to make a HEAD request by using CURLOPT_NOBODY which will never fetch the body. But it's not a GET request.
The connect timeout settings control how long cURL waits before the connection attempt times out. The connect phase lasts until the server accepts input from curl and curl starts to learn what the server does. It's not related to the phase in which curl fetches data from the server; that is governed by:
CURLOPT_TIMEOUT: The maximum number of seconds to allow cURL functions to execute.
You'll find a long list of available options in the PHP Manual under curl_setopt.
Perhaps that might be helpful?
$GLOBALS["dataread"] = 0;
define("MAX_DATA", 3000); // how many bytes should be read?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "handlewrite");
curl_exec($ch);
curl_close($ch);
function handlewrite($ch, $data)
{
$GLOBALS["dataread"] += strlen($data);
echo "READ " . strlen($data) . " bytes\n";
if ($GLOBALS["dataread"] > MAX_DATA) {
return 0;
}
return strlen($data);
}

Timing out a script portion and allowing the rest to continue

I have a widget that runs on my homepage which is loading xml data from an external source. I want to time out the xml load after x seconds (lately the other site has been having load issues). Here is the function I have so far. I can't figure out how to make the timer interact with simplexml_load_file().
Am I on the right track? Is there a way to make this work? Or is there a better way to do this? If this does timeout, I still need the rest of the page to continue loading, so I can't use set_time_limit(), because that will end all script execution, right?
function timer($end) {
$count = 0;
while($end > $count) {
sleep(1);
$count++;
}
return true;
}
$we = simplexml_load_file('http://forecast.weather.gov/MapClick.php?lat=44.08920&lon=-70.17250&FcstType=xml');
if(timer(3)) return;
So you want to set a timeout for simplexml_load_file(). You can't set it specifically, but you can just set it globally (for all socket based streams) before using the function:
ini_set('default_socket_timeout', 3);
$we = simplexml_load_file($url);
// you can restore the default value after use, if you want
ini_restore('default_socket_timeout');
I would use CURL instead of loading the URL directly...
function getXml($url, $timeout = 0){
$ch = curl_init($url);
curl_setopt_array($ch,array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_TIMEOUT => (int) $timeout
));
if($xml = curl_exec($ch)){
return new SimpleXmlElement($xml);
}
else {
return null;
}
}
//Example
$xmlData = getXml('http://yoururl.com', 2); // 2 second timeout
You could first read the content of the file with a function whose timeout you can control (like fopen, fsockopen or curl; choose the best one available to you) and then pass the content to simplexml_load_string instead of using simplexml_load_file.
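The "fetch first, then parse the string" approach might look roughly like this. It is a sketch under assumptions: loadXmlOrNull() is a hypothetical helper name, the fetch callable is expected to return false on failure, and the 3-second stream timeout in the usage comment is an example value (for HTTP streams it is a read timeout rather than a hard limit on total time).

```php
<?php
/**
 * Fetch XML text with $fetch and parse it. Returns null if the fetch fails
 * or times out, or if the body is not well-formed XML, so the rest of the
 * page can carry on. (loadXmlOrNull is a hypothetical helper name.)
 */
function loadXmlOrNull(callable $fetch): ?SimpleXMLElement
{
    $body = $fetch();
    if ($body === false || $body === '') {
        return null;
    }
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting
    return $xml === false ? null : $xml;
}

// Possible usage:
// $context = stream_context_create(['http' => ['timeout' => 3]]);
// $we = loadXmlOrNull(function () use ($url, $context) {
//     return @file_get_contents($url, false, $context);
// });
// if ($we === null) { /* render the page without the widget */ }
```

Returning null instead of throwing keeps the failure local to the widget, which matches the requirement that the rest of the page continues loading.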
