I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is using to tackle this problem... ;)
Basically, I have an array with, give or take, a few hundred URLs pointing to different XML files on a remote server. I do some file-checking to see whether the content of an XML file has changed, and if it has, I download the newer XML to my server.
PHP code:
$urls = array(
    'http://stackoverflow.com/a-really-nice-file.xml',
    'http://stackoverflow.com/another-cool-file2.xml'
);

set_time_limit(0);
foreach ($urls as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
    $contents = curl_exec($ch);
    curl_close($ch);
    file_put_contents($filename, $contents);
}
Now, $filename is set elsewhere and gives each XML its own ID based on my logic.
So far this script runs OK and does what it should, but it is terribly slow. I know my server can handle a lot more, and I suspect my foreach is slowing down the process.
Is there any way I can speed up the foreach? Currently I am thinking of running 10 or 20 downloads per iteration instead of one, which would cut my execution time 10- or 20-fold, but I can't think of the best and most performant way to approach this. Any help or pointers on how to proceed?
Your bottleneck (most likely) is the curl requests themselves: you can only write each file after its request is done, and in a single sequential script there is no way to speed that up.
I don't know all the internals, but you can execute curl requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch all the data first (if you have the memory to store it) and then write the files out as the requests complete.
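As a rough sketch of the curl_multi approach (the downloadBatch helper name and the concurrency of 10 are my assumptions, not part of the original code):

```php
<?php
// Sketch: download a batch of URLs in parallel with curl_multi,
// processing them in chunks so we never have too many open connections.
function downloadBatch(array $urls, int $concurrency = 10): array
{
    $results = [];
    $mh = curl_multi_init();
    foreach (array_chunk($urls, $concurrency, true) as $chunk) {
        $handles = [];
        foreach ($chunk as $key => $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_FAILONERROR, true);
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }
        // Drive all handles in this chunk until every transfer finishes.
        do {
            $status = curl_multi_exec($mh, $active);
            if ($active) {
                curl_multi_select($mh); // block briefly instead of busy-waiting
            }
        } while ($active && $status === CURLM_OK);
        foreach ($handles as $key => $ch) {
            $results[$key] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
    }
    curl_multi_close($mh);
    return $results;
}
```

You would then loop over the returned array and file_put_contents() each body with your own $filename logic, e.g. `foreach (downloadBatch($urls) as $key => $body) { file_put_contents($filenames[$key], $body); }`.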
Just run more scripts; each script downloads a subset of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
For parallel requests I use a Guzzle pool ;) (you can send X parallel requests):
http://docs.guzzlephp.org/en/stable/quickstart.html
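A minimal sketch of a Guzzle pool, assuming guzzlehttp/guzzle is installed via Composer (the URLs, the concurrency of 20, and the output filenames are illustrative):

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();
$urls = ['http://example.com/a.xml', 'http://example.com/b.xml'];

// Lazily yield one Request per URL so the pool can pull them as slots free up.
$requests = function () use ($urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 20,
    'fulfilled' => function ($response, $index) {
        // Save each body as its request completes.
        file_put_contents('file_' . $index . '.xml', (string) $response->getBody());
    },
    'rejected' => function ($reason, $index) {
        error_log("Request $index failed: $reason");
    },
]);

// Start the transfers and wait for the whole pool to finish.
$pool->promise()->wait();
```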
Related
So, I have an issue where I need to make multiple HTTP requests (hundreds, even thousands) using PHP, and these requests need to be run through a proxy to protect the server's info/IP.
The problem is that making these requests with plain php curl works perfectly fine, but takes a long time. If I use curl_multi, it takes a lot less time. But if I introduce the PROXY, curl_multi takes significantly longer than not using curl_multi (one curl after another).
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $_REQUEST['url']);

// Proxy
$proxyStr = 'user:pass@123.123.123.123:80'; // username:password@ip:port
curl_setopt($ch, CURLOPT_PROXY, $proxyStr);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP); // takes a CURLPROXY_* constant, not a string
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);

$postData = 'something=somethingelse'; // request body that needs to be sent
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData); // note: CURLOPT_CUSTOMREQUEST sets the method name, not the body

$result = curl_exec($ch);

// This is some sample code; obviously the curl_multi version uses curl_multi_init() and curl_multi_exec().
// Here is sample code for curl_multi - just introduce the proxy options in the middle of it:
// http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
My question is: what am I doing wrong? curl_multi with the proxy takes a HUGE amount of time longer, regardless of whether I use 2 at a time or 5 at a time. But using multi with 1 at a time is quicker (not quick enough, but a LOT quicker than 2 or more).
As a poor man's multithreading, I added a worker script to the server which pretty much takes an HTTP POST and does a single curl on the data (the POST contains the proxy, so this worker uses the proxy). Using the main server, I use curl_multi to POST details to the worker script (about 2-5 requests at a time). This method works "better", but not by a huge amount. E.g. if I use curl_multi to send 10 requests 1 at a time to the worker, it finishes in ~10 secs. If I send 10 requests 5 at a time, it finishes in ~8-9 secs. So it's definitely faster, but not as fast as I need it to be.
Does anyone have any ideas/suggestions on why this is the case?
Is there something about php curl_multi and proxies that I'm missing? I assumed that multiple PHP requests coming INTO a server would execute in parallel (which is why I was referring to curl_multi'ing into the worker script that does the proxy as a poor man's multithreading). For anyone suggesting pthreads: will that actually solve the problem? (It is not practical for me to install pthreads, but from what I can see, I doubt it would solve the problem.)
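For what it's worth, each easy handle in a curl_multi batch can carry its own proxy options. A minimal sketch (the helper name, proxy address, and credentials below are placeholders, not the asker's actual setup):

```php
<?php
// Sketch: build an easy handle with per-handle proxy options, suitable
// for adding to a curl_multi batch via curl_multi_add_handle().
function makeProxiedHandle(string $url, string $proxy, string $userPwd)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);           // e.g. '123.123.123.123:80'
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, $userPwd);  // e.g. 'user:pass'
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
    // Skipping CURLOPT_HTTPPROXYTUNNEL for plain-HTTP targets avoids an
    // extra CONNECT round-trip per request; keep the tunnel for HTTPS.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    return $ch;
}
```

One thing worth checking: with a tunnel, every request pays for a full TCP + CONNECT handshake through the proxy, so many simultaneous handles can saturate a single slow proxy; that would explain why 2-5 at a time is slower than 1 at a time.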
I am trying to download all the data from an API, so I am curling into it and saving the results to a JSON file. But the execution stops, the results are truncated, and the request never finishes.
How can this be remedied? Maybe the maximum execution time on the API server cannot serve for so long, so it stops. I think there are more than 10,000 results.
Is there a way to download the first 1,000 results, the second 1,000, and so on? By the way, the API uses Sails.js.
Here is my code :
<?php
$url = 'http://api.example.com/model';
$data = array(
    'app_id' => '234567890976',
    'limit'  => 100000
);

$fields_string = '';
foreach ($data as $key => $value) {
    $fields_string .= $key . '=' . urlencode($value) . '&';
}
$fields_string = rtrim($fields_string, '&');
$url = $url . '?' . $fields_string;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // seconds, as an integer, not a string
$response = curl_exec($ch);
curl_close($ch);
print($response);

$file = fopen("results.json", 'w+'); // Create a new file, or overwrite the existing one.
fwrite($file, $response);
fclose($file);
Lots of possible problems could be the cause. Without more details that help pinpoint whether the problem is on the client or the server, such as error codes or other info, it's hard to say.
Given that you are calling the API with a URL, what happens when you put that URL into a browser? If you get a good response in the browser, then the problem is likely with your local configuration and not with Node/Sails.
Here are a few ideas to see if the problem is local, but I'll admit I can't say any one is the right answer because I don't have enough information to do better:
Check your php.ini settings for memory_limit, max_execution_time and if you are using Apache, the httpd.conf timeout setting. A test using the URL in a browser is a way to see if these settings may help. If the browser downloads the response fine, start checking things like these settings for reasons your system is prematurely ending things.
If you are saving the response to disk and not manipulating the data, you could try removing CURLOPT_RETURNTRANSFER and instead using CURLOPT_FILE. This can be more memory-efficient and (in my experience) faster if you don't need the data in memory.
Check what's in curl_errno if the script isn't crashing.
Related: what is your error reporting level? If error reporting is off...why haven't you turned it on as you debug this? If error reporting is on...are you getting any errors?
Given the way you are using foreach to construct a URL, I have to wonder if you are writing a really huge URL with up to 10,000 items in your query string. If so, that's a bad approach. In a situation like that, you could consider breaking up the requests into individual queries and then use curl_multi or the Rolling Curl library that uses curl_multi to do the work to queue and execute multiple requests. (If you are just making a single request and get one gigantic response with tons of detail, this won't be useful.)
Good luck.
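The CURLOPT_FILE idea above can be sketched as follows (the helper name, URL, and path are placeholders):

```php
<?php
// Sketch: stream the HTTP response straight to a file on disk
// instead of buffering the whole body in memory with RETURNTRANSFER.
function downloadToFile(string $url, string $path): bool
{
    $fp = fopen($path, 'w');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write body directly to $fp
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300);         // seconds, as an integer
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    fclose($fp);
    return $ok;
}
```

Something like `downloadToFile($url, 'results.json')` would replace the curl_exec/fwrite sequence in the question.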
What is the best solution for fetching JSON from more than 100 URLs? The PHP script is too slow doing it one URL at a time.
To be sure, at the top of the script I used set_time_limit(0);
I use this little bit of code with cURL, but it is still slow:
$curl_connection = curl_init($jsonurl);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
$data = json_decode(curl_exec($curl_connection), true);
curl_close($curl_connection);
What do you think about this?
This is almost impossible to answer without more context, but it sounds like a job for a job queue and a cron job to process the queue periodically.
You can investigate the use of curl_multi_* functionality. This will allow multiple parallel cURL requests.
Here is a simple PHP REST client I built that leverages curl_multi_*. Feel free to use it.
https://github.com/mikecbrant/php-rest-client
For some reason my curl call is very slow. Here is the code I used.
$postData = "test";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$result = curl_exec($ch);
Executing this code takes on average 250 ms to finish.
However, when I just open the URL in a browser, Firebug says it only takes about 80 ms.
Is there something I am doing wrong, or is this overhead associated with PHP cURL?
It's the call to curl_exec that is taking up all the time.
UPDATE:
So I figured out right after I posted this that if I set the curl option
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
It significantly slows down
curl_exec
The post data could be anything and it will slow it down.
Even if I set
curl_setopt($ch, CURLOPT_POST, false);
It's slow.
I'll try to work around it by just adding the parameters to the URI as a query string.
SECOND UPDATE:
Confirmed that if I just call the URI using GET and pass the parameters
as a query string, it is much faster than using POST and putting the parameters in the body.
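The GET workaround can be sketched like this (the URL and parameters are placeholders):

```php
<?php
// Sketch: encode the parameters into the query string with
// http_build_query instead of sending them as a POST body.
$params = ['foo' => 'bar', 'q' => 'hello world'];
$url = 'http://example.com/api?' . http_build_query($params);
// $url is now 'http://example.com/api?foo=bar&q=hello+world'

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// No CURLOPT_POST / CURLOPT_POSTFIELDS: cURL defaults to GET.
// $result = curl_exec($ch);
```

http_build_query also saves you the manual foreach/urlencode/rtrim loop seen in other snippets above.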
cURL can run into slow DNS lookups. Try using the IP address instead of the domain name to see whether name resolution is the bottleneck.
Curl has the ability to tell exactly how long each piece took and where the slowness is (name lookup, connect, transfer time). Use curl_getinfo (http://www.php.net/manual/en/function.curl-getinfo.php) after you run curl_exec.
If curl is slow, it is generally not the PHP code, it's almost always network related.
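The curl_getinfo suggestion above can be sketched like this (the helper name is a placeholder; the timing keys are the ones libcurl reports):

```php
<?php
// Sketch: after curl_exec, curl_getinfo breaks down where the time went,
// so you can tell DNS, connect, and transfer delays apart.
function curlTimings(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);
    return [
        'dns'        => $info['namelookup_time'],    // name resolution
        'connect'    => $info['connect_time'],       // TCP connect
        'first_byte' => $info['starttransfer_time'], // server think time
        'total'      => $info['total_time'],
    ];
}
```

If 'dns' dominates, the IP-address or CURL_IPRESOLVE_V4 suggestions in this thread are the ones to try; if 'first_byte' dominates, the slowness is on the remote server.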
Try this:
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
This forces cURL to resolve names to IPv4 only, which avoids slow or failing IPv6 (AAAA) lookups.
Adding curl_setopt($ch, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL); solved it here. Is there any problem with this solution?
I just resolved this exact problem by removing the following two options:
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
Somehow, on the site I was fetching, the POST request took over ten full seconds. With GET, it's less than a second.
So, in my wrapper function that does the cURL requests, it now only sets those two options when there is something in $postData.
I just experienced a massive speed-up through compression. By setting the Accept-Encoding header to "gzip, deflate", or to all formats cURL supports, my ~200MB download took 6s instead of 20s:
curl_setopt($ch, CURLOPT_ENCODING, '');
Notes:
If an empty string, "", is set, a header containing all supported encoding types is sent.
You do not even have to care about decompression after the download, as cURL handles it internally.
CURLOPT_ENCODING requires Curl 7.10+
The curl functions in PHP use libcurl, the same library behind the curl command-line tool on *nix systems.
Therefore it really only depends on the network speed, since in general curl itself is much faster than a web browser: by default it does not load any additional data such as images or stylesheets referenced by a page.
It might also be that the network performance of the server on which you were testing your PHP script is much worse than that of the local computer where you were testing with the browser; in that case the two measurements are not really comparable.
Generally, that's acceptable when you are loading content from, or posting to, a slower part of the world. cURL call time is directly proportional to your network speed and the throughput of the remote web server.
There is an epic lack of PHP cURL love on the Internet for beginners like me. I was wondering how to use cURL to download and display an ICS file (they're plain text to me...) in my PHP code. Unless fopen() is 1,000 times easier, I'd like to stick with cURL for this one.
If your webserver allows it, file_get_contents() is even easier.
echo file_get_contents('http://www.example.com/path/to/your/file.ics');
If you cannot open URLs with file_get_contents(), check whether allow_url_fopen is disabled in your php.ini; there is plenty of material on Stack Overflow about that, which should be fine for a beginner.
If remote file_get_contents is not enabled, cURL can indeed do this.
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://example.com/file.ics');
// This is the key option - it makes curl_exec return the HTTP response
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$file_contents = curl_exec($curl);
curl_close($curl);
echo $file_contents;