For some reason my curl call is very slow. Here is the code I used.
$postData = "test";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$result = curl_exec($ch);
Executing this code takes on average 250ms to finish.
However when I just open the url in a browser, firebug says it only takes about 80ms.
Is there something I am doing wrong, or is this just the overhead associated with PHP cURL?
It's the call to curl_exec that is taking up all the time.
UPDATE:
So I figured out right after I posted this that if I set the curl option
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
it significantly slows down curl_exec. The post data could be anything and it will still slow it down.
Even if I set
curl_setopt($ch, CURLOPT_POST, false);
it's still slow.
I'll try to work around it by just adding the parameters to the URI as a query string.
SECOND UPDATE:
Confirmed that if I just call the URI using GET, passing the parameters as a query string, it is much faster than using POST with the parameters in the body.
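One possible culprit worth ruling out (an assumption, not something confirmed in this thread): for POST bodies, libcurl may send an "Expect: 100-continue" header and wait for the server's interim response before transmitting the body, and some servers stall on that handshake. Suppressing the header is a one-line test:
// Assumption: the slowdown is the Expect: 100-continue handshake.
// An empty Expect header disables it.
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:'));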
cURL sometimes has problems with DNS look-ups. Try using the IP address instead of the domain name.
cURL can tell you exactly how long each phase took and where the slowness is (name lookup, connect, transfer time). Use curl_getinfo (http://www.php.net/manual/en/function.curl-getinfo.php) after you run curl_exec.
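For example, a small sketch (the keys come straight from the curl_getinfo return array):
$result = curl_exec($ch);
$info = curl_getinfo($ch);
// All times are in seconds, measured from the start of the request.
echo 'DNS lookup: ' . $info['namelookup_time'] . "\n";
echo 'Connect:    ' . $info['connect_time'] . "\n";
echo 'First byte: ' . $info['starttransfer_time'] . "\n";
echo 'Total:      ' . $info['total_time'] . "\n";
Whichever number dominates tells you where to look: namelookup_time points at DNS, connect_time at the network path, and starttransfer_time at the remote server.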
If curl is slow, it is generally not the PHP code; it's almost always network related.
Try this:
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 );
Adding "curl_setopt($ch, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);" solved here. Any problem with this solution?
I just resolved this exact problem by removing the following two options:
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
Somehow, on the site I was fetching, the POST request took over ten full seconds. With GET, it's less than a second.
So... my wrapper function that does the cURL requests now only sets those two options when there is actually something in $postData.
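Something along these lines, as a sketch with hypothetical names rather than the actual wrapper:
// Hypothetical wrapper: only switch to POST when there is a body to send.
function fetchUrl($url, $postData = null) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if (!empty($postData)) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    }
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}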
I just experienced a massive speed-up through compression. By setting the Accept-Encoding header to "gzip, deflate", or to all formats cURL supports, my ~200 MB download took 6 s instead of 20 s:
curl_setopt($ch, CURLOPT_ENCODING, '');
Notes:
If an empty string, "", is set, a header containing all supported encoding types is sent.
You do not even have to care about decompression after the download, as this is done by cURL internally.
CURLOPT_ENCODING requires Curl 7.10+
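If you need to support older builds, curl_version() reports the libcurl version at runtime, so you can guard the option (a small sketch):
// Sketch: only enable compression on libcurl 7.10 or newer.
$ver = curl_version();
if (version_compare($ver['version'], '7.10', '>=')) {
    curl_setopt($ch, CURLOPT_ENCODING, ''); // '' = offer every supported encoding
}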
The curl functions in PHP use the libcurl library, not the command-line curl tool. That said, the timing really depends mostly on the network, since curl itself is generally faster than a web browser: by default it does not load any additional resources like the images and stylesheets a page includes.
It may also be that the network performance of the server where you ran the PHP script is much worse than that of the local computer where you tested with the browser. In that case the two measurements are not really comparable.
Generally that's acceptable when you are loading content from, or posting to, a slow corner of the world. cURL call times are directly proportional to your network speed and the throughput of the web server.
Related
So, I have an issue: I need to make multiple HTTP requests (I mean hundreds, even thousands) using PHP, and these requests need to be run through a proxy to protect the server info/IP.
The problem is, doing these requests with PHP curl one after another works perfectly fine but takes a long time. If I use curl_multi, it takes a lot less time. But if I introduce the PROXY, curl_multi takes significantly longer than not using curl_multi (curl one after another).
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, $_REQUEST['url']);
//Proxy
$proxyStr = 'user:pass@123.123.123.123:80'; // username:password@ip:port
curl_setopt($ch, CURLOPT_PROXY, $proxyStr);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
$req = "something= somethingelse \r\n"; //request that needs to be sent
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $req); // Query
$result = curl_exec($ch);
//This is some sample code, obviously the curl_multi is using curl_multi_init() and curl_multi_exec()
//Here is a sample code for curl_multi, just introduce the proxy code in the middle of it: http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
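For reference, a minimal curl_multi loop with the proxy options folded in might look like this sketch ($urls and the proxy string are placeholders, not real values):
$urls = array('http://example.com/a', 'http://example.com/b'); // placeholders
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $u) {
    $ch = curl_init($u);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PROXY, 'user:pass@123.123.123.123:80'); // placeholder proxy
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // block briefly instead of busy-waiting
} while ($running > 0);
$results = array();
foreach ($handles as $ch) {
    $results[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);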
My question is, what am I doing wrong? curl_multi with the proxy takes a HUGE amount of time longer, regardless of whether I run 2 at a time or 5 at a time. But running multi with 1 at a time is quicker (not quick enough, but a LOT quicker than 2 or more).
As a poor man's multithreading, I added a worker script to the server which pretty much takes an HTTP POST and does a single curl on the data (the POST contains the proxy, so this worker uses the proxy). Using the main server, I then curl_multi POST details to the worker script (about 2-5 requests at a time). This method works "better", but not by a huge amount. E.g. if I use curl_multi to send 10 requests 1 at a time to the worker, it finishes in ~10 secs. If I send 10 requests 5 at a time, it finishes in ~8-9 secs. So it's definitely faster, but not as fast as I need it to be.
Does anyone have any ideas/suggestions on why this is the case?
Is there something about PHP curl_multi and proxies that I'm missing? I assumed that multiple PHP requests coming INTO a server would execute in parallel (which is why I described curl_multi'ing into the worker script, which does the proxy, as a poor man's multithreading). For anyone about to suggest pthreads: will that actually solve the problem? It is not practical for me to install pthreads, and from what I can see, I doubt it would.
We're having problems with an API we are using.
Here is the code we're using (naming no names on the API front):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$ch_output = curl_exec($ch);
curl_close($ch);
This response times out, but only after a long wait. This hideously slows down our web app, and further code then breaks because of the bad return value. That part I can fix; the response timeout, however, I don't know how to fix. Is there any way to quickly check whether a URL is responding (e.g. something like ping in a terminal) before attempting the curl request?
Thank you.
Do you mean using curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, NUMERIC_TIMEOUT_VALUE); to set the timeout?
Your best option would be to set the cURL timeouts to a more acceptable level. There are several timeout options available for DNS lookup, connect, transfer, and so on. More information is available here: http://php.net/manual/en/function.curl-setopt.php
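For example, with illustrative values:
// Tune these to what your app can tolerate.
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // max seconds to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // max seconds for the entire request
With these set, a dead or slow API returns control to your code quickly instead of hanging the page.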
I'm encountering a problem to which I can't find a solution anywhere. Even worse, no one else seems to have this problem, so I'm probably doing something very stupid.
Some background info: I'm trying to make a proxy-like page that forwards an AJAX request to a different server, to circumvent the same-origin policy. All I want this code to do is take the POST variables, forward them to a different page, and then return the results. It's been working, except for one thing: every time, it waits for the timeout before continuing. I've set the timeout to 1 second for now, so it's doing OK, but I'd rather have a fast response and a proper timeout.
Here's my code:
// create a new cURL resource
$call = curl_init();
// set URL and other appropriate options
curl_setopt($call, CURLOPT_URL, $url);
curl_setopt($call, CURLOPT_POST, true);
curl_setopt($call, CURLOPT_POSTFIELDS, $params);
curl_setopt($call, CURLOPT_HEADER, false);
curl_setopt($call, CURLOPT_RETURNTRANSFER, true);
curl_setopt($call, CURLOPT_CONNECTTIMEOUT, 1);
// grab URL and pass it to the browser
$response = curl_exec($call);
// close cURL resource, and free up system resources
curl_close($call);
echo $response;
I've tried sending a "Connection: close" header with it, and several ways to make the target code signal that it's done running (setting Content-Length, flushing, die(), etc.). At this point I really don't know what's going on; what surprises me most is that I can't find anyone with a similar problem.
Who can help me?
This would make sense if the server weren't actually completing the request. This would be expected in a page streaming or service streaming scenario. Are you sure that the server is actually returning a full and complete HTTP response to each request?
Sounds like it's trying to connect, timing out, and the retry is working.
This fixed it for me:
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
I can connect on the command line via IPv6, so I don't know why this helps.
I'm using a web service from a provider who is being a little too helpful in anticipating my needs. They have given me an HTML snippet to paste on my website, for users to click on to trigger their services. I'd prefer to script this process, so I've got a PHP script which posts a cURL request to the same URL, as appropriate. However, this provider is keeping tabs on my session, and interprets each new request as an update of the first one, rather than treating each as a unique request.
I've contacted the provider regarding my issue, and they've gone so far as to inform me that their system is working as intended, and that it's impossible for me to avoid using the same ASP.NET session for each subsequent cURL request. While my favored option would be to switch to a different vendor, that doesn't appear to be an option right now. Is there a reliable way to get a new ASP.NET session with each cURL request?
I've tried the following set of CURLOPT's, to no avail:
//initialize curl
$ch = curl_init($url);
//build a string out of the post_vars
$post_str = http_build_query($post_vars);
//set the necessary curl options
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_str);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "UZ_".uniqid());
curl_setopt($ch, CURLOPT_REFERER, CURRENT_SITE_URL."index.php?newsession=".uniqid());
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Pragma: no-cache", "Cache-Control: no-cache"));
//execute the call to the backend script, retrieve the results
$xmlstr = curl_exec($ch);
If cURL isn't helping much, why not try other methods to call the services from your script, like PHP's file() function or file_get_contents()?
If you do not see any difference at all, then the service provider might be using your IP to track your requests. Try using a proxy as a test.
A normal ASP.NET session is tracked by a cookie called ASP.NET_SessionId, sent with the response to your first request. As long as your cURL requests don't send this cookie back, your requests will have no connection to each other. Use curl's -c option on the command line to see what cookies are flying between you and them. If you confirm it is a normal ASP.NET session in play, overriding this cookie with a cookie file should work.
It is quite poor for a service to rely on sessions (HTTP has much cleaner ways of maintaining state, which REST exploits), so I wouldn't completely rule out the vendor-switch option.
Well, given the options you are using, it seems you have covered your basics. Can you find out how their sessions are set up? If you know what they key a session on (the IP or whatnot), you can figure out a workaround. Another option is trying to store the cookies in a different cookie file:
CURLOPT_COOKIEFILE - The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
But if all they do is check cookies, your current code should work. If you can figure out the cookie's name, you can pass a blank custom cookie with the request to see if that works. If you can get information out of them on how their sessions work, though, that would be best.
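For instance, assuming it really is the standard ASP.NET_SessionId cookie named in the earlier answer, sending it blank would look like this (an untested sketch):
// Assumption: the session rides on the default ASP.NET_SessionId cookie.
// Sending it empty should keep the server from linking requests together.
curl_setopt($ch, CURLOPT_COOKIE, 'ASP.NET_SessionId=');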
Use these two lines to handle the session:
curl_setopt($ch, CURLOPT_COOKIEJAR, "path/to/cookies.txt"); // cookies.txt should be writable
curl_setopt($ch, CURLOPT_COOKIEFILE, "path/to/cookies.txt");
I have simple code that does a HEAD request for a URL and then prints the response headers. I've noticed that on some sites, this can take a long time to complete.
For example, requesting http://www.arstechnica.com takes about two minutes. I've tried the same request using another web site that does the same basic task, and it comes back immediately. So there must be something I have set incorrectly that's causing this delay.
Here's the code I have:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
$content = curl_exec ($ch);
curl_close ($ch);
Here's a link to the web site that does the same function: http://www.seoconsultants.com/tools/headers.asp
The code above, at least on my server, takes two minutes to retrieve www.arstechnica.com, but the service at the link above returns it right away.
What am I missing?
Try simplifying it a little bit:
print htmlentities(file_get_contents("http://www.arstechnica.com"));
The above outputs instantly on my web server. If it doesn't on yours, there's a good chance your web host has some kind of setting in place to throttle this kind of request.
EDIT:
Since the above happens instantly for you, try setting this curl setting on your original code:
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
Using the tool you posted, I noticed that http://www.arstechnica.com has a 301 header sent for any request sent to it. It is possible that cURL is getting this and not following the new Location specified to it, thus causing your script to hang.
SECOND EDIT:
Curiously enough, trying the same code you have above was making my webserver hang too. I replaced this code:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
With this:
curl_setopt($ch, CURLOPT_NOBODY, true);
Which is the way the manual recommends you do a HEAD request. It made it work instantly.
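Put together, a minimal HEAD request with that fix looks like this (a sketch based on the code above):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($ch, CURLOPT_HEADER, true); // include the response headers in the output
curl_setopt($ch, CURLOPT_NOBODY, true); // send a real HEAD request, no body
$headers = curl_exec($ch);
curl_close($ch);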
You have to remember that HEAD is only a suggestion to the web server. For HEAD to do the right thing, it often takes some explicit effort on the part of the admins. If you HEAD a static file, Apache (or whatever your web server is) will often step in and do the right thing. If you HEAD a dynamic page, the default for most setups is to execute the GET path, collect all the results, and just send back the headers without the content. If that application is in a 3 (or more) tier setup, that call could potentially be very expensive and needless for a HEAD context. For instance, on a Java servlet, by default doHead() just calls doGet(). To do something a little smarter for the application, the developer would have to explicitly implement doHead() (and more often than not, they will not).
I encountered an app from a fortune 100 company that is used for downloading several hundred megabytes of pricing information. We'd check for updates to that data by executing HEAD requests fairly regularly until the modified date changed. It turns out that this request would actually make back end calls to generate this list every time we made the request which involved gigabytes of data on their back end and xfer it between several internal servers. They weren't terribly happy with us but once we explained the use case they quickly came up with an alternate solution. If they had implemented HEAD, rather than relying on their web server to fake it, it would not have been an issue.
If my memory doesn't fail me, doing a HEAD request in cURL changes the HTTP protocol version to 1.0 (which is slow and probably the guilty party here). Try changing that to 1.1:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1); // ADD THIS
$content = curl_exec ($ch);
curl_close ($ch);
I used the function below to find the redirect URL.
$head = get_headers($url, 1);
The second argument makes it return an array with keys. For example, the following gives the Location value:
$head["Location"]
http://php.net/manual/en/function.get-headers.php
This:
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
I wasn't trying to get headers; I was just trying to make a page load of some data not take 2 minutes, similar to what's described above.
That magical little option dropped it down to 2 seconds.