I am using the following code, in PHP, to get the thumbnail (from the JSON response) of a Vimeo video via the Vimeo API:
private function curl_get($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);
    $return = curl_exec($curl);
    curl_close($curl);
    return $return;
}
After profiling the page I noticed that curl_exec() takes about 220 milliseconds, which I find a lot considering that I only want the thumbnail of the video.
Do you know a faster way to get the thumbnail?
curl_exec() takes about 220 milliseconds
It's probably the network overhead (DNS lookup - connect - transfer - fetching the transferred data). It may not be possible to speed this up any further.
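If you want to confirm where those milliseconds go, curl_getinfo() can break a request down into its phases. A minimal sketch, where the URL is only a placeholder for whatever Vimeo endpoint you already call:

// Rough timing breakdown of a single request; the URL below is a placeholder.
$curl = curl_init('https://vimeo.com/api/v2/video/123456789.json');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_exec($curl);

$info = curl_getinfo($curl);
printf(
    "DNS: %.3fs  connect: %.3fs  first byte: %.3fs  total: %.3fs\n",
    $info['namelookup_time'],
    $info['connect_time'],
    $info['starttransfer_time'],
    $info['total_time']
);
curl_close($curl);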
Make sure you are caching the results locally, so they don't have to be fetched anew every time.
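A minimal caching sketch, written as a method that could sit next to curl_get() in the same class; the cache directory, the one-hour TTL, and the Vimeo endpoint URL are assumptions you would adapt to your setup:

private function get_video_json_cached($videoId)
{
    // Hypothetical cache location and one-hour TTL -- adjust as needed.
    $cacheFile = __DIR__ . '/cache/vimeo_' . $videoId . '.json';
    $ttl = 3600;

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        // Cache hit: no HTTP request, so no 220 ms penalty.
        return file_get_contents($cacheFile);
    }

    // Cache miss: fetch once and store the response locally.
    // The endpoint is a placeholder for whatever URL you already use.
    $json = $this->curl_get('https://vimeo.com/api/v2/video/' . $videoId . '.json');
    if ($json !== false && $json !== '') {
        if (!is_dir(dirname($cacheFile))) {
            mkdir(dirname($cacheFile), 0775, true);
        }
        file_put_contents($cacheFile, $json);
    }
    return $json;
}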
Related
I use the following cURL code to multipart-post the contents of a CSV file to another server.
I have 1500 records in the CSV file and it takes around 5 minutes to post the data.
Is there a way to make the post of each record faster?
Will curl_multi work here, or will it just end up posting the same content multiple times?
move_uploaded_file($_FILES['file']['tmp_name'], __DIR__.'/uploads/'. $_FILES["image"]['name']);
$file = "uploads/payables.csv";
$authorization = "Authorization: Bearer [some_token_key]";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://example.org/api/v1/imports.json");
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_HTTPHEADER, [$authorization, 'Content-Type: text/csv']);
$cfile = new CurlFile($file, 'text/csv');
// CurlFile handles the real path itself (no need for the old '@' prefix)
$data = array('data-binary' => $cfile);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$curl_response = curl_exec($curl);
curl_close($curl);
Using curl_multi_*() is unlikely to help and is only going to be possible if you can shard the data at both ends of the exchange.
Since you've presented no analysis of where the bottleneck is, it's impossible to say what will make this exchange go faster. It's unlikely to be latency, but it might be bandwidth, network configuration, capacity at either the client or the server, lots of other things, or a combination of factors.
You might consider trying to isolate the problem before you start trying to fix it.
If the issue is bandwidth (and some network configuration issues) then compressing the file should make a significant difference. Unfortunately, I don't think libcurl's option for enabling SSL compression is exposed in PHP - so you'd need to inject a transfer-encoding header and compress the file in your code. Compressing a file in PHP is easy, but only if you can load it all in memory. Compressing a stream is a bit more involved.
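If the server does accept compressed request bodies, something along these lines might work. This is a sketch under two assumptions the question does not confirm: that the API honours a Content-Encoding: gzip request header, and that it accepts the CSV as a raw request body rather than a multipart field.

$file = 'uploads/payables.csv';
$compressed = gzencode(file_get_contents($file), 6); // fine for a CSV that fits in memory

$curl = curl_init('https://example.org/api/v1/imports.json');
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_HTTPHEADER, [
    'Authorization: Bearer [some_token_key]',
    'Content-Type: text/csv',
    'Content-Encoding: gzip', // assumption: the API understands compressed bodies
]);
curl_setopt($curl, CURLOPT_POSTFIELDS, $compressed); // raw body instead of multipart
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$curl_response = curl_exec($curl);
curl_close($curl);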
I have to call an external script: I make a first cURL call to get data, which takes about 2-3 minutes. During this time I need to make another external cURL call to get the progress of the first call, but the issue is that my next call waits until the reply of the first cURL call comes back. I also checked curl_multi, but that is not helping me either, as I want to make many calls while the first call is in progress. Can anyone help me solve this?
I suppose there is no need to make a second call to track the cURL progress. You can achieve the same thing by using the cURL option CURLOPT_PROGRESSFUNCTION with a callback function.
The callback takes 5 arguments:
cURL resource
Total number of bytes expected to be downloaded
Number of bytes downloaded so far
Total number of bytes expected to be uploaded
Number of bytes uploaded so far
In the callback method you can calculate the percentage downloaded/uploaded. An example is given below:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://stackoverflow.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progress');
curl_setopt($ch, CURLOPT_NOPROGRESS, false); // progress callbacks are disabled by default
curl_setopt($ch, CURLOPT_HEADER, 0);
$html = curl_exec($ch);
curl_close($ch);

// As of PHP 5.5 the cURL resource is passed as the first argument.
function progress($resource, $download_size, $downloaded, $upload_size, $uploaded)
{
    if ($download_size > 0) {
        echo $downloaded / $download_size * 100 . "%\n";
    }
    sleep(1); // only here to slow down the example output; drop it in real code
}
There is a way to do this - please see the following links; they explain how to do it using curl_multi_init: php.net curl_multi_init and http://arguments.callee.info/2010/02/21/multiple-curl-requests-with-php/
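For illustration, a minimal curl_multi sketch under the assumption that both the long-running job and the progress check are plain GET endpoints (the URLs below are placeholders); repeated polling would mean re-adding the status handle inside the loop:

$long = curl_init('https://example.com/long-running-job'); // placeholder URL
$poll = curl_init('https://example.com/job-status');       // placeholder URL
curl_setopt($long, CURLOPT_RETURNTRANSFER, true);
curl_setopt($poll, CURLOPT_RETURNTRANSFER, true);

$mh = curl_multi_init();
curl_multi_add_handle($mh, $long);
curl_multi_add_handle($mh, $poll);

do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running && $status === CURLM_OK);

$longResult = curl_multi_getcontent($long);
$pollResult = curl_multi_getcontent($poll);

curl_multi_remove_handle($mh, $long);
curl_multi_remove_handle($mh, $poll);
curl_multi_close($mh);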
It's my first time developing an API, which is why I'm not surprised it was running a little slow, taking 2-4 seconds to load (I used a microtime timer on my web page).
But then I found out how long the API commands take to execute: around 0.002 seconds. So why, when I use cURL in PHP, does it take another 2 seconds to load?
My API Connection Code:
function APIPost($DataToSend){
    $APILink = curl_init();
    curl_setopt($APILink, CURLOPT_URL, "http://api.subjectplanner.co.uk");
    curl_setopt($APILink, CURLOPT_POST, 4);
    curl_setopt($APILink, CURLOPT_POSTFIELDS, $DataToSend);
    curl_setopt($APILink, CURLOPT_HEADER, 0);
    curl_setopt($APILink, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($APILink);
    curl_close($APILink);
}
How I retrieve data in my web page:
$APIData = array(
    'com'  => 'todayslessons',
    'json' => 'true',
    'sid'  => $_COOKIE['SID']
);
$APIResult = json_decode(APIPost($APIData), true);
if ($APIResult['functionerror'] == 0) {
    $Lessons['Error'] = false;
    $Lessons['Data'] = json_decode($APIResult['data'], true);
} else {
    $Lessons['Error'] = true;
    $Lessons['ErrorDetails'] = "An error has occurred.";
}
The APIPost function is within a functions.php file, which is included at the beginning of my page. The time it took from the beginning of the second snippet of code to the end is about 2.0126 seconds. What is the best way to fetch my API data?
This is just a guess, so please don't beat me up about it, but maybe it's waiting for cURL to complete (i.e. time out), as you don't close cURL before doing the return.
Try this tiny amendment and see if it helps:
function APIPost($DataToSend){
    $APILink = curl_init();
    curl_setopt($APILink, CURLOPT_URL, "http://api.subjectplanner.co.uk");
    curl_setopt($APILink, CURLOPT_POST, 4);
    curl_setopt($APILink, CURLOPT_POSTFIELDS, $DataToSend);
    curl_setopt($APILink, CURLOPT_HEADER, 0);
    curl_setopt($APILink, CURLOPT_RETURNTRANSFER, 1);
    $ret = curl_exec($APILink);
    curl_close($APILink);
    return $ret;
}
I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is using to tackle this problem... ;)
Basically I have an array with, give or take, a few hundred URLs pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files has changed, and if it has, I'll download the newer XMLs to my server.
PHP code:
$urls = array(
    'http://stackoverflow.com/a-really-nice-file.xml',
    'http://stackoverflow.com/another-cool-file2.xml'
);

foreach ($urls as $url) {
    set_time_limit(0);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
    $contents = curl_exec($ch);
    curl_close($ch);
    file_put_contents($filename, $contents);
}
Now, $filename is set somewhere else and gives each XML its own ID based on my logic.
So far this script is running OK and does what it should, but it does it terribly slowly. I know my server can handle a lot more, and I suspect my foreach is slowing down the process.
Is there any way I can speed up the foreach? Currently I am thinking of handling 10 or 20 downloads per loop iteration, basically cutting my execution time 10- or 20-fold, but I can't think of the best and most performant way to approach this. Any help or pointers on how to proceed?
Your bottleneck (most likely) is your cURL requests; you can only write to a file after each request is done, and there is no way (in a single script) to speed up that process.
I don't know how it all works internally, but you can execute cURL requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch the data (if memory is available to store it) and then, as the requests complete, write the data out.
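A sketch of that idea, fetching the URLs in parallel batches with curl_multi and writing each result to disk; the batch size of 10 is arbitrary, and get_filename_for() stands in for whatever logic currently produces $filename:

set_time_limit(0);

foreach (array_chunk($urls, 10) as $batch) {
    $mh = curl_multi_init();
    $handles = [];

    foreach ($batch as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers in the batch until they are done.
    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);
        }
    } while ($running);

    foreach ($handles as $url => $ch) {
        $contents = curl_multi_getcontent($ch);
        if ($contents !== false && $contents !== '') {
            // get_filename_for() is hypothetical: your own URL-to-ID mapping.
            file_put_contents(get_filename_for($url), $contents);
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}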
Just run more scripts. Each script will download some of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
For parallel requests I use a Guzzle pool ;) (you can send X parallel requests):
http://docs.guzzlephp.org/en/stable/quickstart.html
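Roughly like this, assuming guzzlehttp/guzzle is installed via Composer; the URL list, the concurrency of 10, and the output file names are placeholders:

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['timeout' => 30]);
$urls = ['https://example.com/file1.xml', 'https://example.com/file2.xml']; // placeholders

// Lazily yield one request per URL so the pool can cap concurrency.
$requests = function (array $urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$pool = new Pool($client, $requests($urls), [
    'concurrency' => 10,
    'fulfilled' => function ($response, $index) {
        // $response is a PSR-7 response; persist or process the body here.
        file_put_contents('download_' . $index . '.xml', (string) $response->getBody());
    },
    'rejected' => function ($reason, $index) {
        error_log("Request $index failed: $reason");
    },
]);

$pool->promise()->wait(); // block until every request has settled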
I have to fetch multiple web pages, let's say 100 to 500. Right now I am using curl to do so.
function get_html_page($url) {
    // create curl resource
    $ch = curl_init();
    // set url
    curl_setopt($ch, CURLOPT_URL, $url);
    // return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    // $html contains the output string
    $html = curl_exec($ch);
    // close curl resource to free up system resources
    curl_close($ch);
    return $html;
}
My major concern is the total time taken by my script to fetch all these web pages. I know that the time taken is directly proportional to my internet speed, and hence the majority of the time is taken by the $html = curl_exec($ch); call.
I was thinking that instead of creating and destroying a cURL instance again and again for each web page, I could create it only once, reuse it for every page, and destroy it at the end. Something like:
<?php
function get_html_page($ch, $url) {
    // $html contains the output string
    $html = curl_exec($ch);
    return $html;
}

// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $url);
// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
.
.
.
<fetch web pages using get_html_page()>
.
.
.
// close curl resource to free up system resources
curl_close($ch);
?>
Will it make any significant difference in the total time taken? If there is any other, better approach, please let me know about it as well.
How about trying to benchmark it? It may be more efficient to do it the second way, but I don't think it will add up to much. I'm sure your system can create and destroy curl instances in microseconds. It has to initiate the same HTTP connections each time either way, too.
If you were running many of these at the same time and were worried about system resources, not time, it might be worth exploring. As you noted, most of the time spent doing this will be waiting for network transfers, so I don't think you'll notice a change in overall time with either method.
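If you do want to benchmark it, a minimal microtime() harness along these lines would do; the URL list is a placeholder workload:

// Compare "new handle per request" vs "one reused handle" on the same URL list.
$urls = array_fill(0, 20, 'https://example.com/'); // placeholder workload

$start = microtime(true);
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
}
printf("Fresh handle each time: %.3fs\n", microtime(true) - $start);

$start = microtime(true);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch); // a reused handle can also keep connections alive
}
curl_close($ch);
printf("Reused handle:          %.3fs\n", microtime(true) - $start);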
For web scraping I would use: YQL + JSON + XPath. You'll implement it using cURL.
I think you'll save a lot of resources.