I could only think of calling curl_close() from one of the callback functions, but PHP throws a warning:
PHP Warning: curl_close(): Attempt to close cURL handle from a callback.
Any ideas how to abort the transfer from a callback?
You can return false, or anything else that is not the length of the data just handed to your callback, from the callback function to abort the curl transfer.
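As a rough illustration of that idea (the URL and the $abort flag here are only placeholders), a write callback that returns a length which doesn't match the chunk it was given makes curl abort the download:
$abort = false; // flip this to true from your own logic when you want to stop

$ch = curl_init('http://example.com/big-file');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) use (&$abort) {
    if ($abort) {
        return -1; // anything other than strlen($data) aborts the transfer
    }
    // ... process $data here ...
    return strlen($data); // tell curl the whole chunk was handled
});
curl_exec($ch);  // returns false with a "Failed writing body" style error when aborted
curl_close($ch);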
I had a similar problem that required me to be able to stop a curl transfer in the middle. This is easily in my personal top ten of 'dirty hacks that seem to work' of all time.
Create a curl read function that knows when it's time to cancel the upload.
function curlReadFunction($ch, $fileHandle, $maxDataSize){
    // When the abort flag is set, stall briefly and return an empty string so the
    // upload rate collapses and the low-speed limit below kicks in.
    if($GLOBALS['abortTransfer'] == TRUE){
        sleep(1);
        return "";
    }
    // Otherwise feed curl the next chunk of the file as usual.
    return fread($fileHandle, $maxDataSize);
}
And tell Curl to stop if the data read rate drops too low for a certain amount of time.
curl_setopt($ch, CURLOPT_READFUNCTION, 'curlReadFunction');
curl_setopt($ch, CURLOPT_LOW_SPEED_LIMIT, 1024);
curl_setopt($ch, CURLOPT_LOW_SPEED_TIME, 5);
This will cause the curl transfer to abort during the upload. Obviously not ideal but it seems to work.
If the problem is that the curl request is taking too long to execute, you could set a timeout, for example:
<?php
$c = curl_init('http://slow.example.com/');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 15); // give up if connecting takes longer than 15 seconds
curl_setopt($c, CURLOPT_TIMEOUT, 60);        // give up if the whole request takes longer than 60 seconds
$page = curl_exec($c);
curl_close($c);
echo $page;
I have to call an external script in which I make a first call with cURL to get data, which takes about 2-3 minutes. During this time I need to make another external call with cURL to get the progress of the first call. The issue is that my next call waits until the reply from the first cURL call comes back. I also checked curl_multi, but that is not helping me either, as I want to make many calls while the first call is in progress. Can anyone help me solve this, please?
I suppose there is no need to make a second call to track the cURL progress. You can achieve the same thing by using the cURL option CURLOPT_PROGRESSFUNCTION with a callback function.
The callback method takes 5 arguments:
cURL resource
Total number of bytes expected to be downloaded
Number of bytes downloaded so far
Total number of bytes expected to be uploaded
Number of bytes uploaded so far
In the callback method you can calculate the percentage downloaded/uploaded. An example is given below:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://stackoverflow.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progress');
curl_setopt($ch, CURLOPT_NOPROGRESS, false);
curl_setopt($ch, CURLOPT_HEADER, 0);
$html = curl_exec($ch);
curl_close($ch);
function progress($resource, $download_size, $downloaded, $upload_size, $uploaded)
{
    if($download_size > 0)
        echo $downloaded / $download_size * 100 . "%\n";
    sleep(1); // note: sleeping here also slows the transfer itself down
}
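As a side note, the same callback is also a clean way to cancel a transfer: returning any non-zero value from the progress function makes curl abort. A tiny sketch, assuming a $GLOBALS['abortTransfer'] flag of your own, added at the top of progress():
if (!empty($GLOBALS['abortTransfer'])) {
    return 1; // any non-zero return value from the progress callback aborts the transfer
}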
There is a way to do this - please see the following links, they explain how to do this using curl_multi_init: php.net curl_multi_init and http://arguments.callee.info/2010/02/21/multiple-curl-requests-with-php/
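If it helps, here is a rough sketch of that approach with two placeholder URLs; both requests run in parallel and you read the results once the loop finishes:
$ch1 = curl_init('http://example.com/slow-job');  // the long 2-3 minute call (placeholder URL)
$ch2 = curl_init('http://example.com/progress');  // the progress call (placeholder URL)
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);

$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

$running = null;
do {
    curl_multi_exec($mh, $running); // drive both transfers
    curl_multi_select($mh);         // wait for activity instead of busy-looping
} while ($running > 0);

$job      = curl_multi_getcontent($ch1);
$progress = curl_multi_getcontent($ch2);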
It's my first time developing an API, which is why I'm not surprised it was running a little slow, taking 2-4 seconds to load (I used a microtime timer on my web page).
But then I found out how long the API commands themselves took to execute: around 0.002 seconds. So why, when I use cURL in PHP, does it take another 2 seconds to load?
My API Connection Code:
function APIPost($DataToSend){
    $APILink = curl_init();
    curl_setopt($APILink, CURLOPT_URL, "http://api.subjectplanner.co.uk");
    curl_setopt($APILink, CURLOPT_POST, 4);
    curl_setopt($APILink, CURLOPT_POSTFIELDS, $DataToSend);
    curl_setopt($APILink, CURLOPT_HEADER, 0);
    curl_setopt($APILink, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($APILink);
    curl_close($APILink);
}
How I retrieve data in my web page:
$APIData = array(
    'com'  => 'todayslessons',
    'json' => 'true',
    'sid'  => $_COOKIE['SID']
);
$APIResult = json_decode(APIPost($APIData), true);
if($APIResult['functionerror'] == 0){
    $Lessons['Error'] = false;
    $Lessons['Data'] = json_decode($APIResult['data'], true);
}else{
    $Lessons['Error'] = true;
    $Lessons['ErrorDetails'] = "An error has occurred.";
}
The APIPost function is within a functions.php file, which is included at the beginning of my page. The time it took from the beginning of the second snippet of code to the end is about 2.0126 seconds. What is the best way to fetch my API data?
This is just a guess, so please don't beat me up about it, but maybe it's waiting for curl to complete (i.e. time out), as you don't close curl before doing the return.
Try this tiny amendment and see if it helps:
function APIPost($DataToSend){
    $APILink = curl_init();
    curl_setopt($APILink, CURLOPT_URL, "http://api.subjectplanner.co.uk");
    curl_setopt($APILink, CURLOPT_POST, 4);
    curl_setopt($APILink, CURLOPT_POSTFIELDS, $DataToSend);
    curl_setopt($APILink, CURLOPT_HEADER, 0);
    curl_setopt($APILink, CURLOPT_RETURNTRANSFER, 1);
    $ret = curl_exec($APILink); // keep the response so the handle can be closed before returning
    curl_close($APILink);
    return $ret;
}
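If the two seconds still don't make sense after that, it may also help to ask curl itself where the time goes. As a rough sketch, dropping something like this into APIPost() right after the curl_exec() call would show whether DNS, connecting, or the server response is the slow part:
$info = curl_getinfo($APILink); // timing breakdown for the request just made
echo 'DNS lookup: ' . $info['namelookup_time'] . "s\n";
echo 'Connect:    ' . $info['connect_time'] . "s\n";
echo 'First byte: ' . $info['starttransfer_time'] . "s\n";
echo 'Total:      ' . $info['total_time'] . "s\n";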
I have to fetch multiple web pages, let's say 100 to 500. Right now I am using curl to do so.
function get_html_page($url) {
    //create curl resource
    $ch = curl_init();
    //set url
    curl_setopt($ch, CURLOPT_URL, $url);
    //return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    //$html contains the output string
    $html = curl_exec($ch);
    //close curl resource to free up system resources
    curl_close($ch);
    return $html;
}
My major concern is the total time taken by my script to fetch all these web pages. I know that the time taken is directly proportional to my internet speed, and hence the majority of the time is taken by the $html = curl_exec($ch); call.
I was thinking that instead of creating and destroying a curl instance again and again for each and every web page, I could create it only once, reuse it for every page, and destroy it at the end. Something like:
<?php
function get_html_page($ch, $url) {
    //point the reused curl handle at the next url
    curl_setopt($ch, CURLOPT_URL, $url);
    //$html contains the output string
    $html = curl_exec($ch);
    return $html;
}
//create curl resource
$ch = curl_init();
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
.
.
.
<fetch web pages using get_html_page()>
.
.
.
//close curl resource to free up system resources
curl_close($ch);
?>
Will it make any significant difference in the total time taken? If there is a better approach, please let me know about that as well.
How about trying to benchmark it? It may be more efficient to do it the second way, but I don't think it will add up to much. I'm sure your system can create and destroy curl instances in microseconds. It has to initiate the same HTTP connections each time either way, too.
If you were running many of these at the same time and were worried about system resources, not time, it might be worth exploring. As you noted, most of the time spent doing this will be waiting for network transfers, so I don't think you'll notice a change in overall time with either method.
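A quick way to benchmark it, if you want to check, is simply to time both versions with microtime() over the same list of URLs ($urls here is a placeholder for your own list, and the first loop uses your original get_html_page()):
$urls = array(/* ... your 100-500 urls ... */);

$start = microtime(true);
foreach ($urls as $url) {
    get_html_page($url); // version 1: a fresh handle per page
}
echo 'per-page handles: ' . (microtime(true) - $start) . "s\n";

$start = microtime(true);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url); // version 2: one reused handle
    curl_exec($ch);
}
curl_close($ch);
echo 'reused handle:    ' . (microtime(true) - $start) . "s\n";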
For web scraping I would use: YQL + JSON + XPath. You'll implement it using cURL, and I think you'll save a lot of resources.
Normally I POST data when I initiate cURL, and I wait for the response, parse it, etc.
I want to simply post data, and not wait for any response.
In other words, can I send data to a Url, via cURL, and close my connection immediately? (not waiting for any response, or even to see if the url exists)
It's not a normal thing to ask, but I'm asking anyway.
Here's what I have so far:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $MyUrl);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
curl_exec($ch);
curl_close($ch);
I believe the only way to not actually receive the whole response from the remote server is by using CURLOPT_WRITEFUNCTION. For example:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $MyUrl);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'do_nothing');
curl_exec($ch);
curl_close($ch);
function do_nothing($curl, $input) {
    return 0; // aborts transfer with an error
}
Important notes
Be aware that this will generate a warning, as the transfer will be aborted.
Make sure that you do not set the value of CURLOPT_RETURNTRANSFER, as this will interfere with the write callback.
You could do this through the curl_multi_* functions that are designed to execute multiple simultaneous requests - just fire off one request and don't bother asking for the response.
Not sure what the implications are in terms of what will happen if the script exits and curl is still running.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $MyUrl);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_to_send);
$mh = curl_multi_init();
curl_multi_add_handle($mh,$ch);
$running = 'idc';
curl_multi_exec($mh,$running); // asynchronous
// don't bother with the usual cleanup
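One caveat with the snippet above: a single curl_multi_exec() call can return before the request has actually been written to the socket, so if the script exits straight away the POST may never leave the machine. Pumping the multi handle for a short moment before moving on is a bit safer; a rough sketch (the half-second cap is an arbitrary choice):
$start = microtime(true);
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 0.1); // wait briefly for socket activity
} while ($running > 0 && (microtime(true) - $start) < 0.5);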
Not sure if this helps, but via command-line I suppose you could use the '--max-time' option - "Maximum time in seconds that you allow the whole operation to take."
I had to do something quick and dirty and didn't want to have to re-program code or wait for a response, so I found the --max-time option in the curl manual:
curl --max-time 1 URL
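Back in PHP, the rough equivalent of --max-time is CURLOPT_TIMEOUT (or CURLOPT_TIMEOUT_MS for sub-second limits, on reasonably recent curl/PHP versions), applied to the same handle as above:
curl_setopt($ch, CURLOPT_TIMEOUT, 1);       // cap the whole operation at 1 second
// or, for finer control:
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 1000); // the same limit expressed in milliseconds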
I'm doing some cURL work in php 5.3.0.
I'm wondering if there is any way to tell the curl handle/object to keep the cookies in memory (assuming I'm reusing the same handle for multiple requests), or to somehow return them and let me pass them back when making a new handle.
There's this long-accepted method for getting them in/out of the request:
curl_setopt($ch, CURLOPT_COOKIEJAR, $filename);
curl_setopt($ch, CURLOPT_COOKIEFILE, $filename);
But I'm hitting some scenarios where I need to be running multiple copies of a script out of the same directory, and they step on each other's cookie files. Yes, I know I could use tempnam() and make sure each run has its own cookie file, but that leads me to my 2nd issue.
There is also the issue of having these cookie files on disk at all. Disk I/O is slow and I'm sure it's a bottleneck. I don't want to have to deal with cleaning up the cookie file when the script is finished (if it even exits in a way that lets me clean it up).
Any ideas? Or is this just the way things are?
You can use the CURLOPT_COOKIEJAR option and set the file to "/dev/null" for Linux / Mac OS X or "NUL" for Windows. This will prevent the cookies from being written to disk, but it will keep them around in memory as long as you reuse the handle and don't call curl_close().
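In other words, something along these lines, with the handle reused for every request (the URLs are just placeholders):
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, '/dev/null'); // enables the cookie engine; nothing useful ever hits the disk
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

curl_setopt($ch, CURLOPT_URL, 'http://example.com/login');
curl_exec($ch); // any Set-Cookie headers are now held in memory on $ch

curl_setopt($ch, CURLOPT_URL, 'http://example.com/account');
curl_exec($ch); // cookies from the first request are sent automatically

curl_close($ch); // cookies are "written" to /dev/null and discarded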
Unfortunately, I don't think you can use 'php://memory' as the input and output stream. The workaround is to parse the headers yourself. This can be done pretty easily. Here is an example of a page making two requests and passing the cookies yourself.
curl.php:
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://localhost/test.php?message=Hello!');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($curl);
curl_close($curl);
preg_match_all('|Set-Cookie: (.*);|U', $data, $matches);
$cookies = implode('; ', $matches[1]);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://localhost/test.php');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_COOKIE, $cookies);
$data = curl_exec($curl);
echo $data;
?>
test.php:
<?php
session_start();
if(isset($_SESSION['message'])) {
    echo $_SESSION['message'];
} else {
    echo 'No message in session';
}
if(isset($_GET['message'])) {
    $_SESSION['message'] = $_GET['message'];
}
?>
This will output 'Hello!' on the second request.
Just set CURLOPT_COOKIEFILE to a file that doesn't exist; usually an empty string is the best option. Then DON'T set CURLOPT_COOKIEJAR; this is the trick. This will prevent a file from being written, but the cookies will stay in memory. I just tested this and it works (my test: send HTTP auth data to a URL that redirects you to a login URL that authenticates the request, then redirects you back to the original URL with a cookie).
There is, but it's completely unintuitive.
curl_setopt($curl, CURLOPT_COOKIEFILE, "");
If using Linux, you could set these to point somewhere within /dev/shm .. this will keep them in memory and you can be assured that they won't persist across re-boots.
I somehow thought that Curl's cleanup handled the unlinking of cookies, but I could be mistaken.
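For example, something like this would keep the cookie file in RAM and clean it up when the script ends, assuming a Linux box with /dev/shm mounted and $ch being your existing handle:
$cookieFile = tempnam('/dev/shm', 'cookies_'); // unique per script run, lives in RAM
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);

// remove the file even if the script exits early (fatal errors excluded)
register_shutdown_function(function () use ($cookieFile) {
    @unlink($cookieFile);
});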
What works for me is using this setting:
curl_setopt($ch, CURLOPT_HEADER, 1);
And then parsing the result. Details in this blog post where I found out how to do this.
And since that is old, here is a gist replacing deprecated functions.