PHP: Right process to avoid timeout issues

I have a PHP website, and one of its pages makes a cURL call to another server. That server needs about 45 seconds to respond, and there is nothing I can do about it. There are actually two steps to get the information: the first step is to send the request to update the information (this takes about 43 seconds), and after that I need to send another request to get the data back (normally 2-5 seconds).
My server is on GoDaddy, and it sometimes times out (CGI timeout) because the limit is, I think, normally 30 seconds.
This script (sending the request + getting the data back) is normally triggered overnight via a cron job; however, it can also be triggered during the day.
So I was wondering: what would be the best way to split the process to avoid timeout issues?
I was thinking of just sending the update request and not caring about the result. Then, about a minute later, I would send a request to get the data back. However, I have no idea if it's even possible to do a timer in PHP, and if so, would the page time out anyway?
Thanks!

You can set a timeout value in your PHP code to allow more time.
Setting Curl's Timeout in PHP
If you want to run the files separately, I would set up a separate cron job for the second file.
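For example, if the two steps live in two separate files, the crontab could look something like this as a sketch (the paths, file names, and times are placeholders, not part of the original setup):
# Hypothetical crontab; paths, file names, and times are placeholders.
# Step 1: send the slow update request overnight.
0 2 * * * /usr/bin/php /home/user/public_html/send_update_request.php
# Step 2: a few minutes later, fetch the data once the remote server has had time to finish.
10 2 * * * /usr/bin/php /home/user/public_html/fetch_update_result.php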

Use CURLOPT_CONNECTTIMEOUT to allow more time for the connection to the server to be established.
CURLOPT_CONNECTTIMEOUT
The number of seconds to wait while trying to connect. Use 0 to wait indefinitely.
You also need to set CURLOPT_TIMEOUT alongside CURLOPT_CONNECTTIMEOUT; it limits how long the entire request may take, not just the connection.
Something like this:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0); // 0 means wait indefinitely for the connection; not good practice
curl_setopt($ch, CURLOPT_TIMEOUT, 400); // in seconds
You can also set the overall timeout in milliseconds instead, using:
CURLOPT_TIMEOUT_MS
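Putting these options together, a minimal sketch might look like this (the URL and the timeout values are placeholders, not recommendations):
$ch = curl_init('https://example.com/slow-endpoint'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // up to 10 seconds to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 60);          // up to 60 seconds for the whole transfer
// Finer-grained alternative to CURLOPT_TIMEOUT (milliseconds):
// curl_setopt($ch, CURLOPT_TIMEOUT_MS, 60000);
$response = curl_exec($ch);
if ($response === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);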

Related

MultiCurl requires a big timeout in order to fetch data from many URLs

I use this multi curl wrapper: https://github.com/php-curl-class/php-curl-class/
I'm looping through ~160 URLs and fetching XML data from them. As I understand it, the cURL requests are done in parallel. The strange thing is that if I set a small timeout (say, 10 seconds), more than half of the URLs can't be handled: I receive the error callback with the message "Timeout was reached".
However, if I set the timeout to 100 seconds, almost all URLs are handled properly.
But I cannot understand why this happens. If I use a single Curl instance and fetch data from any one of the URLs, I get a response pretty quickly; it doesn't take 100 seconds to fetch data from a single URL.
So the purpose of multi cURL is to do requests in parallel, and every request has its own timeout. Then, if the timeout is set to a small value (10-20-30 seconds), why does it turn out not to be enough?
Later I'll have ~600 URLs, which would mean that the timeout would probably have to be increased to 400-500 seconds, which is weird. I might as well create a single Curl instance and do the requests one by one with almost the same results.
cURL and PHP can't make a request truly asynchronous.
Unlike fetch in JavaScript, where promises give you a second or third chance to finish the job, PHP has no native equivalent. There are, however, some ways to approximate it with cURL, already discussed in Async curl request in PHP and PHP Curl async response.

PHP / HTTP Timeout when using a Large Set of Curl Requests

I have a PHP script which basically goes through my database (currently only about 200 rows),
checks the XYZ URLs, and then tries to scrape them to see if an updated value is available.
Now, I set the timeout on cURL to 5 seconds, as each request shouldn't take longer than that.
I have set the timeout on PHP (max execution time) to 0, which is unlimited.
I can now successfully do sets of 40 or so rows by adding LIMIT 40 to my SQL command.
Currently the scrape is part of a loop over each row returned by the SQL command.
This is ending in an HTTP timeout, presumably because the page isn't returned within a specific time.
Now my question is: can I output each loop iteration to the page individually, so you can see the results appear as they are completed? Would this also resolve the HTTP timeout?
I don't have access to Apache, as it's a standard hosting package.
Could anyone point me in the right direction to write this code so it doesn't time out?
Hope this makes sense; if you want me to go over anything above, please let me know!
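A rough sketch of the per-row output idea described in the question above; the $pdo connection, table, and column names are assumptions, and a CGI or gateway timeout may still apply regardless of PHP's own limit:
set_time_limit(0);            // no PHP execution time limit (a CGI/gateway timeout may still apply)
while (ob_get_level()) {
    ob_end_flush();           // drop output buffers so each line can reach the browser
}
$rows = $pdo->query('SELECT id, url FROM items LIMIT 40'); // hypothetical table and columns
foreach ($rows as $row) {
    $ch = curl_init($row['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);       // 5 seconds per scrape, as in the question
    $html = curl_exec($ch);
    curl_close($ch);
    // ... compare $html against the stored value here ...
    echo 'Checked ' . htmlspecialchars($row['url']) . '<br>';
    flush();                  // push this line to the browser immediately
}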

CURL and DDOS Problems

I need to get some data from a remote HTTP server. I'm using cURL classes for multi-requests.
My problem is the remote server's firewall. I'm sending between 1,000 and 10,000 GET and POST requests, and the server bans me for DDoS.
I have taken these measures:
The requests include header information:
curl_setopt($this->ch, CURLOPT_HTTPHEADER, $header);
The requests include a random referer:
curl_setopt($this->ch, CURLOPT_REFERER, $refs[rand(0, count($refs) - 1)]); // -1 keeps the index in range
The requests include a random user agent:
curl_setopt($this->ch, CURLOPT_USERAGENT, $agents[rand(0, count($agents) - 1)]);
And I send the requests at random intervals using sleep:
sleep(rand(0,10));
But the server still bans my access for 1 hour each time.
Sorry for my bad English :)
Thanks to all.
Sending a large number of requests to the server in a short space of time is likely to have the same impact as a DoS attack, whether that is what you intended or not. A quick fix would be to change the sleep line from sleep(rand(0,10)); (which gives a 1 in 11 chance of sending the next request instantly) to sleep(3); (which means there will always be approximately 3 seconds between requests). 3 seconds should be enough of a gap to keep most servers happy. Once you've verified this works, you can reduce the value to 2 or 1 to see if you can speed things up.
A far better solution would be to create an API on the server that allows you to get the data you need in 1, or at least only a few, requests. Obviously this is only possible if you're able to make changes to the server (or can persuade those who can to make the changes on your behalf).
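In other words, the delay moves out of rand() and becomes a fixed pause after every request; a minimal sketch ($urls and the processing step are assumptions):
foreach ($urls as $url) {        // $urls is a hypothetical list of targets
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    curl_close($ch);
    // ... process $data ...
    sleep(3);                    // fixed ~3 second gap before the next request
}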

best value for curl timeout and connection timeout

Greetings everyone
I am working on a small crawling engine and am using cURL to request pages from various websites. The question is: what do you suggest I set my connection_timeout and timeout values to? The stuff I would normally be crawling would be pages with lots of images and text.
cURL knows two different timeouts.
For CURLOPT_CONNECTTIMEOUT it doesn't matter how much text the site contains or how many other resources like images it references because this is a connection timeout and even the server cannot know about the size of the requested page until the connection is established.
For CURLOPT_TIMEOUT it does matter. Even large pages require only a few packets on the wire, but the server may need more time to assemble the output. Also the number of redirects and other things (e.g. proxies) can significantly increase response time.
Generally speaking, the "best value" for timeouts depends on your requirements and on the conditions of the networks and servers. Those conditions are subject to change. Therefore there is no single "best value".
I recommend using rather short timeouts and retrying failed downloads later.
Btw cURL does not automatically download resources referenced in the response. You have to do this manually with further calls to curl_exec (with fresh timeouts).
If you set it too high, your script will be slow, as a single URL that is down will take all the time you set in CURLOPT_TIMEOUT to finish processing. If you are not using proxies, then you can just set the following values:
CURLOPT_TIMEOUT = 3
CURLOPT_CONNECTTIMEOUT = 1
Then you can go through the failed URLs at a later time to double-check them.
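A sketch of that approach, with short timeouts on the first pass and a more patient second pass over whatever failed ($urls and the helper name are assumptions):
// Hypothetical helper; returns false on failure (including timeouts).
function fetch($url, $connectTimeout, $timeout) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

$failed = array();
foreach ($urls as $url) {
    if (fetch($url, 1, 3) === false) {
        $failed[] = $url;           // remember it for a second pass
    }
}
foreach ($failed as $url) {
    fetch($url, 5, 30);             // retry later with more generous timeouts
}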
The best response is rik's.
I have a proxy checker, and in my benchmarks I saw that most working proxies take less than 10 seconds to connect.
So I use 10 seconds for the connection timeout and the timeout, but that's my case; you have to decide how much time you want to allow, so start with big values, use curl_getinfo() to see timing benchmarks, and then decrease the values.
Note: a proxy that takes more than 5 or 10 seconds to connect is useless for me, which is why I use those values.
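For the curl_getinfo() benchmarking mentioned above, a minimal sketch (the URL is a placeholder):
$ch = curl_init('https://example.com/');   // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo 'Connect time: ' . $info['connect_time'] . "s\n";
echo 'Total time:   ' . $info['total_time'] . "s\n";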
Yes. If your target is a proxy that queries another site, such a cascading connection will require a fairly long period, like these values, to execute the cURL calls.
Especially when you encounter intermittent cURL problems, please check these values first.
I use
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,30);
curl_setopt($ch, CURLOPT_TIMEOUT,60);

Faster alternative to file_get_contents()

Currently I'm using file_get_contents() to submit GET data to an array of sites, but upon execution of the page I get this error:
Fatal error: Maximum execution time of 30 seconds exceeded
All I really want the script to do is start loading the webpage, and then leave. Each webpage may take up to 5 minutes to load fully, and I don't need it to load fully.
Here is what I currently have:
foreach ($sites as $s) // Create one line to read from a wide array
{
    file_get_contents($s['url']); // Send to the shells
}
EDIT: To clear any confusion, this script is being used to start scripts on other servers, that return no data.
EDIT: I'm now attempting to use cURL to do the trick, by setting a timeout of one second to make it send the data and then stop. Here is my code:
$ch = curl_init($s['url']); //load the urls
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1); //Only send the data, don't wait.
curl_exec($ch); //Execute
curl_close($ch); //Close it off.
Perhaps I've set the option wrong. I'm looking through some manuals as we speak. Just giving you an update. Thank you all of you that are helping me thus far.
EDIT: Ah, found the problem. I was using CURLOPT_CONNECTTIMEOUT instead of CURLOPT_TIMEOUT. Whoops.
However, now the scripts aren't triggering. They each use ignore_user_abort(TRUE); so I can't understand the problem.
Hah, scratch that. It works now. Thanks a lot, everyone.
There are many ways to solve this.
You could use cURL with its curl_multi_* functions to execute the requests asynchronously. Or use cURL the common way but with 1 as the timeout limit, so the call will return with a timeout, but the request will still have been sent.
If you don't have cURL installed, you could continue using file_get_contents but fork processes (not so elegant, but it works) using something like ZendX_Console_Process_Unix, so you avoid the waiting between each request.
As Franco mentioned, and I'm not sure it was picked up on, you specifically want to use the curl_multi functions, not the regular curl ones. This packs multiple curl handles into a curl_multi handle and executes them simultaneously, returning (or not, in your case) the responses as they arrive.
Example at http://php.net/curl_multi_init
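For reference, a minimal curl_multi sketch along those lines (the URLs and the 10-second timeout are placeholders):
$urls = array('http://example.com/a', 'http://example.com/b'); // placeholder URLs
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);   // per-handle timeout
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
// Run all handles until every request has finished (or timed out).
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);              // wait for activity instead of busy-looping
    }
} while ($running && $status === CURLM_OK);
foreach ($handles as $ch) {
    // $content = curl_multi_getcontent($ch); // only if you actually need the responses
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);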
Re your update that you only need to trigger the operation:
You could try using file_get_contents with a timeout. This would lead to the remote script being called, but the connection being terminated after n seconds (e.g. 1).
If the remote script is configured so it continues to run even if the connection is aborted (in PHP that would be ignore_user_abort), it should work.
Try it out. If it doesn't work, you won't get around increasing your time_limit or using an external executable. But from what you're saying - you just need to make the request - this should work. You could even try to set the timeout to 0 but I wouldn't trust that.
From here:
<?php
$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 1
    )
));
file_get_contents("http://example.com/", 0, $ctx);
?>
To be fair, Chris's answer already includes this possibility: curl also has a timeout switch.
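For either approach to work as a trigger, the remote script that gets called has to keep running after the caller disconnects, as mentioned above; a minimal sketch of what that remote end might contain, purely as an assumption about those scripts:
ignore_user_abort(true);  // keep running even after the caller disconnects
set_time_limit(0);        // no execution limit for the long-running work
// ... the long-running work goes here ...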
It is not file_get_contents() that consumes that much time, but the network connection itself.
Consider not submitting GET data to an array of sites; instead, create an RSS feed and let them fetch the RSS data.
I don't fully understand the meaning behind your script.
But here is what you can do:
In order to avoid the fatal error quickly you can just add set_time_limit(120) at the beginning of the file. This will allow the script to run for 2 minutes. Of course you can use any number that you want and 0 for infinite.
If you just need to call the URL and you don't "care" about the result, you should use cURL in asynchronous mode. In this case, a call to the URL will not wait until it finishes, and you can fire them all off very quickly.
BR.
If the remote pages take up to 5 minutes to load, your file_get_contents will sit and wait for that 5 minutes. Is there any way you could modify the remote scripts to fork into a background process and do the heavy processing there? That way your initial hit will return almost immediately, and not have to wait for the startup period.
Another possibility is to investigate if a HEAD request would do the trick. HEAD does not return any data, just headers, so it may be enough to trigger the remote jobs and not wait for the full output.
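A minimal sketch of such a HEAD request with cURL (the URL is a placeholder):
$ch = curl_init('http://example.com/start-job.php'); // placeholder URL
curl_setopt($ch, CURLOPT_NOBODY, true);         // send HEAD instead of GET: no response body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the (empty) response
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);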
