PHP curl maximum execution time using HHVM

I am trying to download all the data from an API, so I am curling it and saving the results to a JSON file. But the execution stops, the results are truncated, and the request never finishes.
How can this be remedied? Maybe the API server cannot keep serving for that long, so it stops. I think there are more than 10,000 results.
Is there a way to download the first 1000 results, then the next 1000, and so on? By the way, the API is built with Sails.js.
Here is my code:
<?php
$url = 'http://api.example.com/model';
$data = array (
'app_id' => '234567890976',
'limit' => 100000
);
$fields_string = '';
foreach($data as $key=>$value) { $fields_string .= $key.'='.urlencode($value).'&'; }
$fields_string = rtrim($fields_string,'&');
$url = $url.'?'.$fields_string;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 300000000); // effectively no timeout
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
$response = curl_exec($ch);
print($response);
$file = fopen("results.json", 'w+'); // Create a new file, or overwrite the existing one.
fwrite($file, $response);
fclose($file);
curl_close($ch);

Lots of possible problems might be the cause. Without more details that show whether the problem is on the client or the server, such as error codes or other info, it's hard to say.
Given that you are calling the API with a URL, what happens when you put your URL into a browser? If you get a good response in a browser then it seems likely the problem is with your local configuration and not with node/sails.
Here are a few ideas to see if the problem is local, but I'll admit I can't say any one is the right answer because I don't have enough information to do better:
Check your php.ini settings for memory_limit and max_execution_time, and, if you are using Apache, the httpd.conf Timeout setting. A test using the URL in a browser is a way to see if these settings may help. If the browser downloads the response fine, start checking things like these settings for reasons your system is prematurely ending things.
If you are saving the response to disk and not manipulating the data, you could try removing CURLOPT_RETURNTRANSFER and instead use CURLOPT_FILE. This can be more memory efficient and (in my experience) faster if you don't need the data in memory; a minimal sketch follows.
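For instance, a minimal sketch of streaming the response straight to results.json instead of buffering it (this assumes $url is built the same way as in the question):
$fp = fopen('results.json', 'w');
$ch = curl_init($url); // $url built as in the question
curl_setopt($ch, CURLOPT_FILE, $fp);    // write the body directly to the file as it arrives
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // an overall limit in seconds; adjust to taste
if (curl_exec($ch) === false) {
    // curl_errno()/curl_error() tell you why the transfer stopped
    error_log('curl error ' . curl_errno($ch) . ': ' . curl_error($ch));
}
curl_close($ch);
fclose($fp);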
Check curl_errno() and curl_error() if the script isn't crashing.
Related: what is your error reporting level? If error reporting is off...why haven't you turned it on as you debug this? If error reporting is on...are you getting any errors?
Given the way you are using foreach to construct a URL, I have to wonder if you are writing a really huge URL with up to 10,000 items in your query string. If so, that's a bad approach. In a situation like that, you could consider breaking up the requests into individual queries and then use curl_multi or the Rolling Curl library that uses curl_multi to do the work to queue and execute multiple requests. (If you are just making a single request and get one gigantic response with tons of detail, this won't be useful.)
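On the original question of fetching 1000 results at a time: Sails.js blueprint APIs typically accept limit and skip query parameters, so one possible approach (a sketch only; parameter support is an assumption, not confirmed for this API) is to page through the data and append each chunk to disk:
$base = 'http://api.example.com/model';
$pageSize = 1000;
$fp = fopen('results.json', 'w');
for ($skip = 0; ; $skip += $pageSize) {
    $query = http_build_query(array(
        'app_id' => '234567890976',
        'limit'  => $pageSize,
        'skip'   => $skip,
    ));
    $ch = curl_init($base . '?' . $query);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120); // each small page should finish quickly
    $chunk = curl_exec($ch);
    curl_close($ch);
    if ($chunk === false) break;            // transfer failed; stop here (or retry)
    $rows = json_decode($chunk, true);
    if (empty($rows)) break;                // no more records
    fwrite($fp, $chunk . "\n");             // one JSON page per line
    if (count($rows) < $pageSize) break;    // last (partial) page reached
}
fclose($fp);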
Good luck.


PHP: understand the CURL timeout

From a PHP page, I have to do a GET request to another PHP file.
I don't care about waiting for the response of the GET or knowing whether it succeeded.
The called script could also take 5-6 seconds to finish, so I don't know how to handle the GET timeout given the above.
The code is this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
$content = trim(curl_exec($ch));
curl_close($ch);
For the first part (where you don't need to wait for the response), you can start a new background process, and after that write the code that redirects to another page.
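A minimal sketch of that idea (the worker path and the redirect target are placeholders; this assumes a Unix-like host where exec() is permitted):
// Launch the slow script as a detached background process and return immediately;
// output is discarded and the trailing "&" keeps it from blocking this request.
exec('php /path/to/myfile.php > /dev/null 2>&1 &');
// ...then carry on, e.g. redirect to another page right away
header('Location: /next-page.php');
exit;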
Yeah, you definitely shouldn't be creating a file on the server in response to a GET request. Even as a side-effect, it's less than ideal; as the main purpose of the request, it just doesn't make sense.
If you were doing this as a POST, you'd still have the same issue to work with, however. In that case, if the action can't be guaranteed to happen quickly enough to be acceptable in the context of HTTP, you'll need to hive it off somewhere else. E.g. make your HTTP request send a message to some other system which then works in parallel whilst the HTTP response is free to be sent back immediately.
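If you do stick with the plain GET from the question and simply refuse to wait for it, one common workaround is a sub-second timeout (a sketch only; whether the remote script keeps running after the client disconnects depends on its ignore_user_abort setting):
$ch = curl_init('http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOSIGNAL, true);  // required for sub-second timeouts on some builds
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 100); // fire the request, then give up after 100 ms
curl_exec($ch);                            // will "fail" quickly; the result is ignored
curl_close($ch);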

Running file_put_contents in parallel?

I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is using to tackle this problem... ;)
Basically I have an array with, give or take, a few hundred URLs, pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files has changed, and if it has, I download the newer XMLs to my server.
PHP code:
$urls = array(
'http://stackoverflow.com/a-really-nice-file.xml',
'http://stackoverflow.com/another-cool-file2.xml'
);
foreach($urls as $url){
set_time_limit(0);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
$contents = curl_exec($ch);
curl_close($ch);
file_put_contents($filename, $contents);
}
Now, $filename is set somewhere else and gives each XML its own ID based on my logic.
So far this script runs OK and does what it should, but it is terribly slow. I know my server can handle a lot more and I suspect my foreach is slowing down the process.
Is there any way I can speed up the foreach? Currently I am thinking of running 10 or 20 of the downloads per loop iteration, basically cutting my execution time 10- or 20-fold, but I can't think of the best and most performant way to approach this. Any help or pointers on how to proceed?
Your bottleneck is (most likely) the curl requests: you can only write a file after its request is done, and in a single synchronous script there is no way to speed that process up.
I don't know exactly how it all works, but you can execute curl requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch the data (if memory is available to store it) and then write out each file as its request completes.
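For what it's worth, here is a rough curl_multi sketch for this exact case (each URL downloaded in parallel, then written to its own file; the file naming is simplified to an index because the real $filename logic isn't shown in the question):
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}
// Drive all transfers, waiting for activity instead of busy-looping
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running && $status == CURLM_OK);
foreach ($handles as $i => $ch) {
    $contents = curl_multi_getcontent($ch);
    if (!empty($contents)) {
        file_put_contents("file_$i.xml", $contents); // substitute your real $filename logic
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);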
Just run more scripts. Each script will download some of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
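A rough sketch of that pattern: split the URL list into chunks and launch one background PHP worker per chunk (worker.php is a hypothetical script containing the existing foreach/curl logic, reading its chunk file from argv):
$chunks = array_chunk($urls, 50);
foreach ($chunks as $i => $chunk) {
    $chunkFile = sys_get_temp_dir() . "/urls_$i.json";
    file_put_contents($chunkFile, json_encode($chunk));
    // the trailing "&" backgrounds each worker, so the chunks download in parallel
    exec('php worker.php ' . escapeshellarg($chunkFile) . ' > /dev/null 2>&1 &');
}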
For parallel requests I use a Guzzle pool ;) (you can send x parallel requests):
http://docs.guzzlephp.org/en/stable/quickstart.html
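A short sketch of a Guzzle pool for this case (assumes Guzzle 6+ installed via Composer and the $urls array from the question; file names are illustrative):
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();
$requests = function ($urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};
$pool = new Pool($client, $requests($urls), array(
    'concurrency' => 10, // x parallel requests
    'fulfilled' => function ($response, $index) {
        file_put_contents("file_$index.xml", (string) $response->getBody());
    },
    'rejected' => function ($reason, $index) {
        error_log("Request $index failed: " . $reason->getMessage());
    },
));
$pool->promise()->wait(); // block until every request has completed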

Facebook graph extremely slow in PHP

Whether using the Facebook PHP SDK, or just loading data using curl with $contents = file_get_contents("https://graph.facebook.com/$id?access_token=$accessToken"), it takes around a whole second for the response to come.
That counts as very slow when I need to check the data for a bunch of ids.
When in a browser, if I type in a facebook graph url, I get the results almost instantly, under a tenth of the time it takes in PHP.
What is causing this problem, and how can I make it as fast as it would be in any browser? I know the browser can do it. There has to be a way to make it fast in PHP too.
IDEA: perhaps I need to configure something in cURL?
What I have tried:
Using the PHP SDK. It's just as slow. The reason I tried file_get_contents() in the first place was because I was hoping the PHP SDK simply wasn't configured properly.
Using curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);. It didn't make a difference. AFTER ANSWER ACCEPT EDIT: actually, this together with reusing the curl handle made the subsequent requests really fast.
EDIT: here is a pastebin of the code I used to measure the time it takes to do the requests: http://pastebin.com/bEbuqq5g.
I corrected the text that used to say microseconds to say seconds. This is what produces results similar to the ones I wrote in my comment on this question. Note also that the requests take similarly long even if the access token is expired, as in my pastebin example.
EDIT 2: part of the problem appears to be SSL. I tried benchmarking http://graph.facebook.com/4 (no HTTPS), and three requests took 1.2 seconds, whereas the same three over HTTPS took 2.2 seconds. This is in no way a solution though, because any request that needs an access token must use HTTPS.
file_get_contents can be very slow in PHP because it doesn't send/process headers properly, leading to the HTTP connection not getting closed properly when the file transfer is complete. I have also read about DNS issues, though I don't have any information about that.
The solution that I highly recommend is to either use the PHP SDK, which is designed for making API calls to Facebook, or make use of cURL (which the SDK uses). With cURL you can really configure a lot of aspects of the request, since it's basically designed for making API calls like this.
PHP SDK information: https://developers.facebook.com/docs/reference/php/
PHP SDK source: https://github.com/facebook/facebook-php-sdk
If you choose to do it without the SDK, you could look at how they make use of cURL in base_facebook.php. Here is some sample code you could use to fetch with cURL:
function get_url($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, FALSE); // Return contents only
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return results instead of outputting
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // Give up if it can't connect within 10 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 60); // Only execute 60s at most
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Don't verify SSL cert
$response = curl_exec($ch);
curl_close($ch);
return $response;
}
$contents = get_url("https://graph.facebook.com/$id?access_token=$accessToken");
The function will return FALSE on failure.
I see that you said you've used the PHP SDK, but maybe you didn't have cURL set up. Try installing or updating it, and if it still seems to be slow, you should use
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
and check out the output.
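One small note (a sketch, not required): CURLOPT_VERBOSE writes its log to STDERR, so when running under a web server you may want to redirect it to a file with CURLOPT_STDERR:
$log = fopen('/tmp/curl_debug.log', 'w'); // path is just an example
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $log); // verbose connection/SSL timing info ends up here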
I wondered what would happen if I did two subsequent curl_exec() calls without doing a curl_close(), enabling the use of HTTP Keep-Alive.
The test code:
$ch = curl_init('https://graph.facebook.com/xxx');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// FIRST REQUEST
curl_exec($ch);
print_r(curl_getinfo($ch));
// SECOND REQUEST
curl_exec($ch);
print_r(curl_getinfo($ch));
curl_close($ch);
Below are the results, showing parts of the output from curl_getinfo():
// FIRST REQUEST
[total_time] => 0.976259
[namelookup_time] => 0.008271
[connect_time] => 0.208543
[pretransfer_time] => 0.715296
// SECOND REQUEST
[total_time] => 0.253083
[namelookup_time] => 3.7E-5
[connect_time] => 3.7E-5
[pretransfer_time] => 3.9E-5
The first request is pretty slow, almost one whole second, similar to your experience. But from the time of the second request (only 0.25s) you can see how much difference the keep-alive made.
Your browser uses this technique as well, of course; loading the page in a fresh instance of your browser would take considerably longer.
Just two thoughts:
Have you verified that the browser doesn't have a persistent connection to Facebook, and that the browser hasn't cached the DNS lookup? (You could try adding graph.facebook.com to your hosts file to rule DNS in or out.)
Are you running the PHP code from the same system/environment as your browser (not from a VM, not from another host)? Is PHP running with the same scheduling priorities as your browser (same nice level etc.)?
The overall biggest factor in making Graph API calls “slow” is the HTTP connection itself.
Maybe there’s a little improvement in there by tweaking some parameters or getting a server with a better connection.
But this will most likely make no big difference, as HTTP is generally to be considered “slow”, and there’s little that can be done about this.
That counts as very slow when I need to check the data for a bunch of ids.
The best thing you can do to speed things up is, of course, to minimize the number of HTTP requests.
If you have to do several Graph API calls in a row, try doing them as a Batch Request instead. That allows you to query several portions of data, while at the same time making only one HTTP request.
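For example, a rough sketch of fetching several IDs in one round trip with cURL (the batch parameter format follows Facebook's batch request documentation; $accessToken and the IDs are placeholders):
$ids = array('4', '5', '6'); // example IDs
$batch = array();
foreach ($ids as $id) {
    $batch[] = array('method' => 'GET', 'relative_url' => $id);
}
$ch = curl_init('https://graph.facebook.com/');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'access_token' => $accessToken,
    'batch'        => json_encode($batch),
)));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$responses = json_decode(curl_exec($ch), true); // one entry per sub-request
curl_close($ch);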
This is purely speculation, but the cause might be that Facebook uses the SPDY protocol (not sure whether that's true for the API). PHP is not able to load the page using SPDY.

php curl check if url is reachable before query

We're having problems with an api we are using.
Here is the code we're using (naming no names on the api front)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$ch_output = curl_exec($ch);
curl_close($ch);
This request does time out, but not until ages have passed. This hideously slows down our web app, and further code then breaks because of the bad return value. That part I can fix; it's the response timeout I don't know how to fix. Is there any way to quickly see if a URL is "responding" (e.g. something like ping in a terminal) before trying to do a curl request?
Thank you.
Do you mean using curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, NUMERIC_TIMEOUT_VALUE); to set the timeout?
Your best option would be to set the timeout on curl to a more acceptable level. There are several timeout options available for DNS lookup, connect timeout, transfer timeout, etc. More information is available here http://php.net/manual/en/function.curl-setopt.php
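A minimal sketch based on the question's code, with both timeouts tightened (the exact values are up to you):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); // give up if no connection within 2 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // give up if the whole transfer exceeds 5 seconds
$ch_output = curl_exec($ch);
if ($ch_output === false) {
    // handle the failure explicitly (see curl_error($ch)) instead of letting bad data break later code
}
curl_close($ch);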

PHP file_get_contents very slow when using full url

I am working with a script (that I did not create originally) that generates a pdf file from an HTML page. The problem is that it is now taking a very long time, like 1-2 minutes, to process. Supposedly this was working fine originally, but has slowed down within the past couple of weeks.
The script calls file_get_contents on a php script, which then outputs the result into an HTML file on the server, and runs the pdf generator app on that file.
I seem to have narrowed down the problem to the file_get_contents call on a full url, rather than a local path.
When I use
$content = file_get_contents('test.txt');
it processes almost instantaneously. However, if I use the full url
$content = file_get_contents('http://example.com/test.txt');
it takes anywhere from 30-90 seconds to process.
It's not limited to our server, it is slow when accessing any external url, such as http://www.google.com. I believe the script calls the full url because there are query string variables that are necessary that don't work if you call the file locally.
I also tried fopen, readfile, and curl, and they were all similarly slow. Any ideas on where to look to fix this?
Note: This has been fixed in PHP 5.6.14. A Connection: close header will now automatically be sent even for HTTP/1.0 requests. See commit 4b1dff6.
I had a hard time figuring out the cause of the slowness of file_get_contents scripts.
By analyzing it with Wireshark, the issue (in my case and probably yours too) was that the remote web server DIDN'T CLOSE THE TCP CONNECTION FOR 15 SECONDS (i.e. "keep-alive").
Indeed, file_get_contents doesn't send a "Connection" HTTP header, so the remote web server considers it a keep-alive connection by default and doesn't close the TCP stream for 15 seconds (this might not be a standard value; it depends on the server config).
A normal browser would consider the page fully loaded once the HTTP payload length reaches the length specified in the response's Content-Length header. file_get_contents doesn't do this, and that's a shame.
SOLUTION
SO, if you want to know the solution, here it is:
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
file_get_contents("http://www.something.com/somepage.html",false,$context);
The thing is just to tell the remote web server to close the connection when the download is complete, as file_get_contents isn't intelligent enough to do it by itself using the response Content-Length HTTP header.
I would use cURL to fetch external content, as it is much quicker than the file_get_contents method. Not sure if this will solve the issue, but worth a shot.
Also note that your server's speed will affect the time it takes to retrieve the file.
Here is an example of usage:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/test.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
Sometimes it's because the DNS is too slow on your server; try this:
replace
echo file_get_contents('http://www.google.com');
with
$context=stream_context_create(array('http' => array('header'=>"Host: www.google.com\r\n")));
echo file_get_contents('http://74.125.71.103', false, $context);
I had the same issue. The only thing that worked for me was setting the timeout in the $options array.
$options = array(
    'http' => array(
        'header'  => implode("\r\n", $headers), // $headers is an array of header lines, defined elsewhere
        'method'  => 'POST',
        'content' => '',
        'timeout' => .5
    ),
);
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
$string = file_get_contents("http://localhost/testcall/request.php",false,$context);
Time: 50976 ms (average over 5 attempts)
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, "http://localhost/testcall/request.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
echo $data = curl_exec($ch);
curl_close($ch);
Time: 46679 ms (average over 5 attempts)
Note: request.php is used to fetch some data from a MySQL database.
Can you try fetching that url, on the server, from the command line? curl or wget come to mind. If those retrieve the URL at a normal speed, then it's not a network problem and most likely something in the apache/php setup.
I have a huge amount of data passed by an API, and I was using file_get_contents to read it, but it took around 60 seconds. However, using KrisWebDev's solution it took around 25 seconds.
// note: the 'http' context options apply to https:// URLs as well
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
file_get_contents($url,false,$context);
What I would also consider with cURL is that you can "thread" the requests. This has helped me immensely, as I do not have access to a version of PHP that allows threading at the moment.
For example, I was getting 7 images from a remote server using file_get_contents and it was taking 2-5 seconds per request. That alone was adding 30 seconds or so to the process while the user waited for the PDF to be generated.
Using parallel requests literally reduced the time to roughly that of one image. As another example, I now verify 36 URLs in the time it previously took to do one. I think you get the point. :-)
$timeout = 30;
$retTxfr = 1;
$user = '';
$pass = '';
$master = curl_multi_init();
$node_count = count($curlList);
$keys = array("url");
for ($i = 0; $i < $node_count; $i++) {
foreach ($keys as $key) {
if (empty($curlList[$i][$key])) continue;
$ch[$i][$key] = curl_init($curlList[$i][$key]);
curl_setopt($ch[$i][$key], CURLOPT_TIMEOUT, $timeout); // -- timeout after X seconds
curl_setopt($ch[$i][$key], CURLOPT_RETURNTRANSFER, $retTxfr);
curl_setopt($ch[$i][$key], CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch[$i][$key], CURLOPT_USERPWD, "{$user}:{$pass}");
curl_setopt($ch[$i][$key], CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($master, $ch[$i][$key]);
}
}
// -- get all requests at once, finish when done or timeout met --
do { curl_multi_exec($master, $running); }
while ($running > 0);
Then check over the results:
// Once the transfers are done, read each handle's content and keep only good responses
for ($i = 0; $i < $node_count; $i++) {
    foreach ($keys as $key) {
        if (empty($ch[$i][$key])) continue;
        $results[$i][$key] = curl_multi_getcontent($ch[$i][$key]);
        if ((int)curl_getinfo($ch[$i][$key], CURLINFO_HTTP_CODE) > 399 || empty($results[$i][$key])) {
            unset($results[$i][$key]);
        } else {
            $results[$i]["options"] = $curlList[$i]["options"];
        }
        curl_multi_remove_handle($master, $ch[$i][$key]);
        curl_close($ch[$i][$key]);
    }
}
Then close the multi handle:
curl_multi_close($master);
I know this is an old question, but I found it today and the answers didn't work for me. I didn't see anyone mention that the maximum number of connections per IP may be set to 1. That way, you are making an API request and the API is making another request back to the same host because you use the full URL. That's why loading directly from disk works. For me, this fixed the problem:
if (strpos($file->url, env('APP_URL')) === 0) {
$url = substr($file->url, strlen(env('APP_URL')));
} else {
$url = $file->url;
}
return file_get_contents($url);
