I'm trying to get information from a site by parsing/scraping it via PHP & cURL. But sometimes the page doesn't finish loading, so the script just sits there with nothing happening. It's a simple script like this...
...
curl_setopt($curl, CURLOPT_URL, $url);
$page = curl_exec($curl);
...
Is there a way to simply retry loading the same page if it doesn't finish loading after (for example) 60 seconds, without aborting the rest of the script?
It would be great if someone could help me out with a way to accomplish this.
You can use CURLOPT_TIMEOUT, which is the maximum number of seconds to allow cURL functions to execute.
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout_in_seconds); // your timeout in seconds
Something simple would be:
$retry = true;
while ($retry) {
    $page = curl_exec($curl);
    // curl_exec() returns false on failure, including a timeout
    if ($page !== false) {
        $retry = false; // got a page, stop retrying
    }
}
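If you would rather not retry forever when the page is permanently broken, a bounded variant might look like this (a sketch; the cap of 3 attempts is an arbitrary choice):
curl_setopt($curl, CURLOPT_TIMEOUT, 60); // give each attempt up to 60 seconds
$page = false;
for ($attempt = 0; $attempt < 3 && $page === false; $attempt++) {
    $page = curl_exec($curl); // returns false on timeout or other failure
}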
I have a cURL script that I would like to wrap in an if condition to proceed anyway if the page isn't loading. Is it possible to check the page load time and, if it's taking too long, proceed anyway?
if (pageLoad != TooMuchTime) {
    // cURL script to some URL
} else {
    // Took too long to get a response
}
The reason I do this is that I use a cURL request as part of an install script (to track installs). The cURL call is to a PHP file which inserts data into the database. If for any reason (network congestion, site down, etc.) the page doesn't load, I want the user to still be able to install the product.
cURL offers options to achieve the desired behaviour; there is no need for if conditions.
CURLOPT_CONNECTTIMEOUT - The number of seconds to wait while trying to connect. Use 0 to wait indefinitely.
CURLOPT_TIMEOUT - The maximum number of seconds to allow cURL functions to execute.
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 400); //timeout in seconds
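For the install-tracking use case, a minimal sketch might look like this (the URL and the timeout values are placeholders, not from the question):
$ch = curl_init('http://example.com/track-install.php'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // give the connection 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // and the whole request 10
$response = curl_exec($ch);
if ($response === false) {
    // Timed out or otherwise failed: log it if you like, but carry on with the install.
}
curl_close($ch);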
From a PHP page, I have to do a GET to another PHP file.
I don't care about waiting for the response of the GET or knowing whether it was successful or not.
The called file could also take 5-6 seconds to finish its script, so I don't know how to handle the GET timeout, considering what has been said before.
The code is this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
$content = trim(curl_exec($ch));
curl_close($ch);
For the first task (where you don't need to wait for the response), you can start a new background process, and below that write the code which will redirect you to another page.
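If you stick with cURL rather than a background process, one common variant is to fire the request and abandon it almost immediately (a sketch reusing the URL from the question; note that whether myfile.php keeps running after the client disconnects depends on its own ignore_user_abort() setting):
$ch = curl_init('http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);     // required for sub-second timeouts on *nix
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 100); // abandon the transfer after 100 ms
curl_exec($ch);                            // the timeout error is expected; ignore it
curl_close($ch);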
Yeah, you definitely shouldn't be creating a file on the server in response to a GET request. Even as a side-effect, it's less than ideal; as the main purpose of the request, it just doesn't make sense.
If you were doing this as a POST, you'd still have the same issue to work with, however. In that case, if the action can't be guaranteed to happen quickly enough to be acceptable in the context of HTTP, you'll need to hive it off somewhere else. E.g. make your HTTP request send a message to some other system which then works in parallel whilst the HTTP response is free to be sent back immediately.
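As a rough illustration of that hand-off (a sketch only; the queue file and the payload are hypothetical stand-ins for a real job queue such as Redis or beanstalkd):
// In the HTTP endpoint: record the work and respond immediately.
$job = json_encode(array('task' => 'do-something', 'when' => time())); // hypothetical payload
file_put_contents('/tmp/jobs.queue', $job . "\n", FILE_APPEND | LOCK_EX);
echo "OK"; // a separate worker process consumes /tmp/jobs.queue in parallel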
It's my first time developing an API, which is why I'm not surprised it was running a little slow, taking 2-4 seconds to load (I used a microtime timer on my web page).
But then I found out how long the API commands themselves take to execute: around 0.002 seconds. So why, when I use cURL in PHP, does it take another 2 seconds to load?
My API Connection Code:
function APIPost($DataToSend){
$APILink = curl_init();
curl_setopt($APILink,CURLOPT_URL, "http://api.subjectplanner.co.uk");
curl_setopt($APILink,CURLOPT_POST, 4);
curl_setopt($APILink,CURLOPT_POSTFIELDS, $DataToSend);
curl_setopt($APILink, CURLOPT_HEADER, 0);
curl_setopt($APILink, CURLOPT_RETURNTRANSFER, 1);
return curl_exec($APILink);
curl_close($APILink);
}
How I retrieve data in my web page:
$APIData=array(
'com'=>'todayslessons',
'json'=>'true',
'sid'=>$_COOKIE['SID']
);
$APIResult = json_decode(APIPost($APIData), true);
if($APIResult['functionerror']==0){
$Lessons['Error']=false;
$Lessons['Data']=json_decode($APIResult['data'], true);
}else{
$Lessons['Error']=true;
$Lessons['ErrorDetails']="An error has occurred.";
}
The APIPost function is within a functions.php file, which is included at the beginning of my page. The time it took from the beginning of the second snippet of code to the end is about 2.0126 seconds. What is the best way to fetch my API data?
This is just a guess, so please don't beat me up about it. But maybe it's waiting for cURL to complete (i.e. to time out), as you don't close cURL before doing the return.
Try this tiny amendment and see if it helps:
function APIPost($DataToSend){
$APILink = curl_init();
curl_setopt($APILink,CURLOPT_URL, "http://api.subjectplanner.co.uk");
curl_setopt($APILink, CURLOPT_POST, true); // CURLOPT_POST expects a boolean, not a field count
curl_setopt($APILink,CURLOPT_POSTFIELDS, $DataToSend);
curl_setopt($APILink, CURLOPT_HEADER, 0);
curl_setopt($APILink, CURLOPT_RETURNTRANSFER, 1);
$ret = curl_exec($APILink);
curl_close($APILink);
return $ret;
}
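If the amendment doesn't help, it may be worth measuring where the time actually goes before guessing further. cURL records per-phase timings you can read after curl_exec() (a diagnostic sketch; if the DNS figure dominates, name resolution is the culprit, and if time-to-first-byte dominates, the API itself is slow to respond):
$ret = curl_exec($APILink);
printf(
    "dns: %.3fs  connect: %.3fs  first byte: %.3fs  total: %.3fs\n",
    curl_getinfo($APILink, CURLINFO_NAMELOOKUP_TIME),
    curl_getinfo($APILink, CURLINFO_CONNECT_TIME),
    curl_getinfo($APILink, CURLINFO_STARTTRANSFER_TIME),
    curl_getinfo($APILink, CURLINFO_TOTAL_TIME)
);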
I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is using to tackle this problem... ;)
Basically I have an array with, give or take, a few hundred URLs pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files has changed, and if it has, I'll download the newer XMLs to my server.
PHP code:
$urls = array(
'http://stackoverflow.com/a-really-nice-file.xml',
'http://stackoverflow.com/another-cool-file2.xml'
);
foreach($urls as $url){
set_time_limit(0);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
$contents = curl_exec($ch);
curl_close($ch);
file_put_contents($filename, $contents);
}
Now, $filename is set somewhere else and gives each XML its own ID based on my logic.
So far this script runs OK and does what it should, but it does so terribly slowly. I know my server can handle a lot more, and I suspect my foreach is slowing down the process.
Is there any way I can speed up the foreach? Currently I am thinking of handling 10 or 20 downloads per loop iteration instead of one, essentially cutting my execution time 10- or 20-fold, but I can't think of the best and most performant way to approach this. Any help or pointers on how to proceed?
Your bottleneck is (most likely) the cURL requests; you can only write to a file after each request is done, and there is no way (in a single sequential script) to speed that process up.
I don't know how it all works, but you can execute cURL requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch the data (if memory is available to store it) and then, as the requests complete, write it out.
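A rough sketch of that approach (the per-URL $filename logic from the question is replaced with a placeholder name here):
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}
// Drive all transfers until none are still running.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($active && $status == CURLM_OK);
foreach ($handles as $i => $ch) {
    $contents = curl_multi_getcontent($ch);
    if ($contents) {
        file_put_contents("file_$i.xml", $contents); // placeholder for your $filename logic
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);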
Just run more scripts. Each script will download some of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
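A sketch of that pattern (worker.php is a hypothetical script containing the download loop from the question, and the worker count of 4 is arbitrary):
// Split the URL list into chunks and launch one background worker per chunk.
$chunks = array_chunk($urls, (int) ceil(count($urls) / 4));
foreach ($chunks as $i => $chunk) {
    file_put_contents("chunk_$i.txt", implode("\n", $chunk));
    exec("php worker.php chunk_$i.txt > /dev/null 2>&1 &"); // & puts it in the background (*nix)
}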
For parallel requests I use a Guzzle pool ;) (you can send x parallel requests):
http://docs.guzzlephp.org/en/stable/quickstart.html
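A sketch based on the linked quickstart (the concurrency value and the save path are placeholder choices):
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();
$requests = function ($urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};
$pool = new Pool($client, $requests($urls), [
    'concurrency' => 10, // how many requests run in parallel
    'fulfilled' => function ($response, $index) {
        // Placeholder save path; substitute your own naming logic.
        file_put_contents("response_$index.xml", (string) $response->getBody());
    },
    'rejected' => function ($reason, $index) {
        // Request $index failed; inspect $reason if needed.
    },
]);
$pool->promise()->wait();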
I'm running a cURL request against an eXist database through PHP. The dataset is very large, and as a result, the database consistently takes a long time to return an XML response. To fix that, we set up a cURL request with what is supposed to be a long timeout.
$ch = curl_init();
$headers["Content-Length"] = strlen($postString);
$headers["User-Agent"] = "Curl/1.0";
curl_setopt($ch, CURLOPT_URL, $requestUrl);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, 'admin:');
curl_setopt($ch, CURLOPT_POSTFIELDS, $postString); // presumably intended, since Content-Length is computed from it
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);
$response = curl_exec($ch);
curl_close($ch);
However, the cURL request consistently ends before the request is completed (in under 1000 seconds, as it does when requested via a browser). Does anyone know if this is the proper way to set timeouts in cURL?
See documentation: http://www.php.net/manual/en/function.curl-setopt.php
CURLOPT_CONNECTTIMEOUT - The number of seconds to wait while trying to connect. Use 0 to wait indefinitely.
CURLOPT_TIMEOUT - The maximum number of seconds to allow cURL functions to execute.
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 400); //timeout in seconds
Also don't forget to extend the execution time of the PHP script itself:
set_time_limit(0); // to infinity, for example
Hmm, it looks to me like CURLOPT_TIMEOUT defines the amount of time that any cURL function is allowed to take to execute. I think you should actually be looking at CURLOPT_CONNECTTIMEOUT instead, since that tells cURL the maximum amount of time to wait for the connection to complete.
There is a quirk with this that might be relevant for some people... From the PHP docs comments.
If you want cURL to timeout in less than one second, you can use CURLOPT_TIMEOUT_MS, although there is a bug/"feature" on "Unix-like systems" that causes libcurl to timeout immediately if the value is < 1000 ms with the error "cURL Error (28): Timeout was reached". The explanation for this behavior is:
"If libcurl is built to use the standard system name resolver, that portion of the transfer will still use full-second resolution for timeouts with a minimum timeout allowed of one second."
What this means to PHP developers is "You can't use this function without testing it first, because you can't tell if libcurl is using the standard system name resolver (but you can be pretty sure it is)"
The problem is that on (Li|U)nix, when libcurl uses the standard name resolver, a SIGALRM is raised during name resolution which libcurl thinks is the timeout alarm.
The solution is to disable signals using CURLOPT_NOSIGNAL. Here's an example script that requests itself causing a 10-second delay so you can test timeouts:
if (!isset($_GET['foo'])) {
// Client
$ch = curl_init('http://localhost/test/test_timeout.php?foo=bar');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200);
$data = curl_exec($ch);
$curl_errno = curl_errno($ch);
$curl_error = curl_error($ch);
curl_close($ch);
if ($curl_errno > 0) {
echo "cURL Error ($curl_errno): $curl_error\n";
} else {
echo "Data received: $data\n";
}
} else {
// Server
sleep(10);
echo "Done.";
}
From http://www.php.net/manual/en/function.curl-setopt.php#104597
Your code sets the timeout to 1000 seconds. For milliseconds, use CURLOPT_TIMEOUT_MS.
You will need to take care of the timeouts between you and the file; in this case, PHP and cURL.
To tell cURL to never time out while a transfer is still active, you need to set CURLOPT_TIMEOUT to 0 instead of 1000.
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
In PHP, again, you must remove the time limit, or PHP itself (after 30 seconds by default) will kill the script along with cURL's request. This alone should fix your issue.
In addition, if you require data integrity, you could add a layer of security by using ignore_user_abort:
# The maximum execution time, in seconds. If set to zero, no time limit is imposed.
set_time_limit(0);
# Make sure to keep the script alive when a client disconnects.
ignore_user_abort(true);
A client disconnection will interrupt the execution of the script, possibly damaging data,
e.g. a non-transactional database query, building a config file, etc.; in your case it would leave a partially downloaded file... and you might, or might not, care about this.
Answering this old question because this thread is at the top of search-engine results for CURL_TIMEOUT.
You can't run the request from a browser; it will time out waiting for the server running the cURL request to respond. The browser is probably timing out after 1-2 minutes, the default network timeout.
You need to run it from the command line/terminal.
If you are using PHP as a FastCGI application, then make sure you check the FastCGI timeout settings.
See: PHP curl put 500 error
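For example, a sketch of where those timeout settings usually live (the directive names are real; the values are arbitrary):
# nginx vhost, when proxying to PHP-FPM:
fastcgi_read_timeout 1000s;
# PHP-FPM pool config (e.g. www.conf):
request_terminate_timeout = 1000s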