PHP Curl Multi (curl_multi) and proxies - php

So,
I have an issue where I need to make a lot of HTTP requests (hundreds, even thousands) from PHP, and these requests need to be run through a proxy to protect the server's info/IP.
The problem is that doing these requests with plain PHP curl works perfectly fine, but takes a long time. If I use php curl_multi, it takes a lot less time. But if I introduce the PROXY to this, curl_multi takes significantly longer than not using curl_multi at all (running the curls one after another).
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, $_REQUEST['url']);
//Proxy
curl_setopt($ch, CURLOPT_PROXY, '123.123.123.123:80');   // ip:port
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass');     // username:password
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);     // use the constant rather than the string 'HTTP'
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
$req = "something= somethingelse \r\n"; //request that needs to be sent
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $req); // Query
$result = curl_exec($ch);
//This is some sample code; obviously the curl_multi version uses curl_multi_init() and curl_multi_exec()
//Here is sample code for curl_multi; just introduce the proxy code in the middle of it: http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
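A minimal sketch of what that combination might look like (the $urls array and the proxy values below are placeholders assumed for illustration, not the real ones):
$urls = array('http://example.com/a', 'http://example.com/b'); // placeholder list of targets
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // Same proxy settings as above, applied to every handle
    curl_setopt($ch, CURLOPT_PROXY, '123.123.123.123:80');
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass');
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}
// Run all handles until every transfer has finished
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);
$results = array();
foreach ($handles as $i => $ch) {
    $results[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);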
My question is: what am I doing wrong? curl_multi with the proxy takes a HUGE amount of time longer regardless of whether I run 2 at a time or 5 at a time. But running the multi with 1 at a time is quicker (not quick enough, but a LOT quicker than 2 or more).
As a poor man's multithreading, I added a worker script to the server which pretty much takes an HTTP POST and does a single curl on the data (the POST contains the proxy, so this worker uses the proxy). From the main server, I use curl_multi to POST details to the worker script (about 2-5 requests at a time). This method works "better", but not by a huge amount. E.g. if I use curl_multi to send 10 requests 1 at a time to the worker, it finishes in ~10 secs. If I send 10 requests 5 at a time, it finishes in ~8-9 secs. So it's definitely faster, but not as fast as I need it to be.
Does anyone have any ideas/suggestions on why this is the case?
Is there something about PHP curl_multi and proxies that I'm missing? I assumed that multiple PHP requests coming INTO a server would execute in parallel (which is why I was referring to curl_multi'ing into the worker script that does the proxy as a poor man's multithreading). For anyone suggesting pthreads: would that actually solve the problem? (It is not practical for me to install pthreads, but from what I can see, I doubt it would.)

Related

PHP: understand the CURL timeout

From a PHP page, I have to do a GET to another PHP file.
I don't care about waiting for the response of the GET, or about knowing whether it succeeded or not.
The script being called could also take 5-6 seconds to finish, so I don't know how to handle the GET timeout given what has been said before.
The code is this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
$content = trim(curl_exec($ch));
curl_close($ch);
For the first case (where you don't need to wait for the response) you can start a new background process, and below that write the code that redirects to another page.
Yeah, you definitely shouldn't be creating a file on the server in response to a GET request. Even as a side-effect, it's less than ideal; as the main purpose of the request, it just doesn't make sense.
If you were doing this as a POST, you'd still have the same issue to work with, however. In that case, if the action can't be guaranteed to happen quickly enough to be acceptable in the context of HTTP, you'll need to hive it off somewhere else. E.g. make your HTTP request send a message to some other system which then works in parallel whilst the HTTP response is free to be sent back immediately.
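If you really only want to fire the request and move on, another option (a sketch only, under the assumption that losing the occasional request is acceptable) is to give cURL a timeout in milliseconds so the calling page barely waits at all:
// Fire-and-forget sketch: same URL as above, but with a very short timeout
$ch = curl_init('http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);      // needed on some systems for sub-second timeouts
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200);  // wait at most ~200 ms, then give up
curl_exec($ch);                             // will usually "fail" with a timeout; that is fine here
curl_close($ch);
Be aware that if the timeout fires before the request has been fully sent, myfile.php may never run at all, which is why handing the work to a separate process or queue, as described above, is the more reliable pattern.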

JSON more than 100 urls PHP

What is the best solution for fetching JSON from more than 100 URLs? The PHP script is too slow doing it this way.
Sure, at the top of the script I used set_time_limit(0);
I use this little bit of code with cURL, but it is still slow:
$curl_connection = curl_init($jsonurl);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
$data = json_decode(curl_exec($curl_connection), true);
curl_close($curl_connection);
What do you think about this?
This is almost impossible to answer without more context, but it sounds like a job for a job queue and a cron job to process the queue periodically.
You can investigate the use of curl_multi_* functionality. This will allow multiple parallel cURL requests.
Here is a simple PHP REST client I built that leverages curl_multi_*. Feel free to use it.
https://github.com/mikecbrant/php-rest-client

Running file_put_contents in parallel?

I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is using to tackle this problem... ;)
Basically I have an array with, give or take, a few hundred URLs, pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files has changed and, if it has, I download the newer XMLs to my server.
PHP code:
$urls = array(
'http://stackoverflow.com/a-really-nice-file.xml',
'http://stackoverflow.com/another-cool-file2.xml'
);
foreach($urls as $url){
    set_time_limit(0);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
    $contents = curl_exec($ch);
    curl_close($ch);
    file_put_contents($filename, $contents);
}
Now, $filename is set somewhere else and gives each XML its own ID based on my logic.
So far this script runs OK and does what it should, but it does it terribly slowly. I know my server can handle a lot more, and I suspect my foreach is slowing the process down.
Is there any way I can speed up the foreach? Currently I am thinking of doing 10 or 20 of the downloads and file_put_contents calls per loop iteration, essentially cutting my execution time 10- or 20-fold, but I can't think of how to approach this in the best and most performant way. Any help or pointers on how to proceed?
Your bottleneck (most likely) is your curl requests; you can only write to a file after each request is done, and there is no way (in a single script) to speed that process up.
I don't know how it all works, but you can execute curl requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch the data (if memory is available to store it) and then, as the requests complete, write it out.
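A rough sketch of that idea, fetching everything with curl_multi, keeping the bodies in memory, and writing each file out as soon as its transfer finishes (the filename_for() helper is hypothetical and stands in for whatever ID logic sets $filename in the question):
$mh = curl_multi_init();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_PRIVATE, $url); // remember which URL this handle belongs to
    curl_multi_add_handle($mh, $ch);
}
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
    // Write out each file as soon as its transfer has finished
    while ($info = curl_multi_info_read($mh)) {
        $ch  = $info['handle'];
        $url = curl_getinfo($ch, CURLINFO_PRIVATE);
        if ($info['result'] === CURLE_OK) {
            file_put_contents(filename_for($url), curl_multi_getcontent($ch)); // filename_for() is a hypothetical helper
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
} while ($running > 0);
curl_multi_close($mh);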
Just run more scripts. Each script will download some of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
For parallel requests I use a Guzzle pool ;) (you can send X parallel requests):
http://docs.guzzlephp.org/en/stable/quickstart.html
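A minimal sketch of that pool approach, based on the quickstart page linked above (it assumes Guzzle is installed via Composer and that $urls is your array of URLs):
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();
$requests = function ($urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};
$pool = new Pool($client, $requests($urls), [
    'concurrency' => 10, // how many requests run in parallel
    'fulfilled' => function ($response, $index) {
        // e.g. save $response->getBody() somewhere
    },
    'rejected' => function ($reason, $index) {
        // a request failed; log or retry it
    },
]);
$pool->promise()->wait(); // run the pool until all requests finish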

php curl check if url is reachable before query

We're having problems with an api we are using.
Here is the code we're using (naming no names on the api front)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$ch_output = curl_exec($ch);
curl_close($ch);
This request does time out eventually, but only after a long wait. That is hideously slowing down our web app, and further code then breaks because of the bad return value. The breakage I can fix, but I don't know how to fix the slow timeout itself. Is there any way to quickly see if a URL is "responding" (e.g. something like ping in the terminal) before trying to do a curl request?
Thank you.
Do you mean using curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, NUMERIC_TIMEOUT_VALUE); to set the timeout?
Your best option would be to set the timeout on curl to a more acceptable level. There are several timeout options available for DNS lookup, connect timeout, transfer timeout, etc. More information is available here http://php.net/manual/en/function.curl-setopt.php
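A sketch of how that could look with the snippet from the question (the timeout numbers are examples only; tune them to whatever "too slow" means for your app):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); // give up if no connection within 2 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // give up if the whole request takes more than 5 seconds
$ch_output = curl_exec($ch);
if ($ch_output === false) {
    // timed out or otherwise failed; handle it here instead of letting later code break
    $error = curl_error($ch);
}
curl_close($ch);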

PHP Curl Slowness

For some reason my curl call is very slow. Here is the code I used.
$postData = "test"
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$result = curl_exec($ch);
Executing this code takes on average 250ms to finish.
However, when I just open the URL in a browser, Firebug says it only takes about 80ms.
Is there something I am doing wrong? Or is this just the overhead associated with PHP cURL?
It's the call to curl_exec that is taking up all the time.
UPDATE:
So I figured out right after I posted this that if I set the curl option
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
it significantly slows down curl_exec.
The post data could be anything and it will slow it down.
Even if I set
curl_setopt($ch, CURLOPT_POST, false);
it's slow.
I'll try to work around it by just adding the parameters to the URI as a query string.
SECOND UPDATE:
Confirmed that if I just call the URI using GET and pass the parameters as a query string, it is much faster than using POST and putting the parameters in the body.
cURL has some problems with DNS look-ups. Try using an IP address instead of a domain name.
Curl has the ability to tell exactly how long each piece took and where the slowness is (name lookup, connect, transfer time). Use curl_getinfo (http://www.php.net/manual/en/function.curl-getinfo.php) after you run curl_exec.
If curl is slow, it is generally not the PHP code, it's almost always network related.
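For example, something along these lines after the curl_exec call from the question will show where the time goes (a sketch; the array keys are the standard curl_getinfo fields, the formatting is arbitrary):
$result = curl_exec($ch);
$info = curl_getinfo($ch);
echo 'DNS lookup:     ' . $info['namelookup_time'] . " s\n";
echo 'Connect:        ' . $info['connect_time'] . " s\n";
echo 'Pre-transfer:   ' . $info['pretransfer_time'] . " s\n";
echo 'Start transfer: ' . $info['starttransfer_time'] . " s\n";
echo 'Total:          ' . $info['total_time'] . " s\n";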
try this
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 );
Adding "curl_setopt($ch, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);" solved here. Any problem with this solution?
I just resolved this exact problem by removing the following two options:
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
Somehow, on the site I was fetching, the POST request took over ten full seconds. If it's GET, it's less than a second.
So... in my wrapper function that does the cURL requests, it now only sets those two options when there is something in $postData.
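Something along these lines (a sketch of that wrapper idea; the function name and the exact option set are made up for illustration):
function fetch_url($url, $postData = null) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    // Only switch to POST when there is actually a body to send
    if (!empty($postData)) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    }
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}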
I just experienced a massive speed-up through compression. By adding the Accept-Encoding header to "gzip, deflate", or just to all formats which Curl supports, my ~200MB download took 6s instead of 20s:
curl_setopt($ch, CURLOPT_ENCODING, '');
Notes:
If an empty string, "", is set, a header containing all supported encoding types is sent.
you do not even have to care about decompression after the download, as this is done by Curl internally.
CURLOPT_ENCODING requires Curl 7.10+
The curl functions in PHP use libcurl, the same library that the curl command line tool is built on.
Therefore it really only depends on the network speed, since in general curl itself is much faster than a web browser: by default it does not load any additional data such as the images, stylesheets, etc. of a website.
It might be that you are not aware that the network performance of the server on which you were testing your PHP script is much worse than on your local computer where you were testing with the browser, in which case the two measurements are not really comparable.
Generally that's acceptable when you are loading content from, or posting to, a slower part of the world. cURL call times are directly proportional to your network speed and the throughput of your web server.
