In my PHP script I am using the curl library, and the function curl_exec takes 1-5 seconds to execute (for some URLs it takes as long as 10 seconds). Is this normal?
This is my script:
$ch = curl_init();
$timeout = 5;
$url = "http://www.mashable.com/feed";
curl_setopt ($ch, CURLOPT_URL, $url );
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
That depends entirely on your connection, the target URLs, and the server the script runs on. It may well be normal.
If you have command line access to your server, you could try replicating the actions in command line curl and see how long they take there; also try them from your local machine. If there are massive differences, there could be a networking or firewall issue.
But those kinds of loading times are not unheard of.
It isn't necessarily unusual. You are under the same conditions as if you had requested the URL with your own browser: connecting and exchanging requests takes some time, and if the URL you are requesting is busy or on a slower connection, the time naturally increases.
Using PHP, I am taking image links from my MySQL database and echoing them out. There are 600 or so, but it keeps stopping after 100 or so. It is not a logic error; there seems to be a setting that stops PHP from continuing the curl calls. Please advise which setting I should raise to allow curl to run longer. Thanks!
Here is what I am using now:
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch,CURLOPT_BINARYTRANSFER, true);
$data = curl_exec($ch);
return $data;
}
$htmlaa = file_get_contents_curl($getimagefrom);
$docaa = new DOMDocument();
#$docaa->loadHTML($htmlaa);
Again, it is working just fine, but it keeps stopping after running for maybe 3 minutes.
You can set the curl timeout like so:
curl_setopt($ch, CURLOPT_TIMEOUT, 1000); //seconds to live
Since there are multiple factors that influence execution time you should also check out these two as well:
http://php.net/manual/en/function.set-time-limit.php
http://php.net/manual/en/info.configuration.php#ini.max-execution-time
Also note that CURLOPT_TIMEOUT sets the maximum number of seconds that cURL functions are allowed to run. You should also check out the CURLOPT_CONNECTTIMEOUT option.
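As a rough sketch of how the settings mentioned above fit together: the helper name fetch_with_timeouts and the limits (10 seconds to connect, 300 seconds in total) are assumptions chosen for illustration, not values taken from the question.

```php
<?php
// Hypothetical helper: fetch a URL with explicit connect and total timeouts.
function fetch_with_timeouts(string $url, int $connectTimeout = 10, int $totalTimeout = 300)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed for the whole transfer

    $data = curl_exec($ch);
    if ($data === false) {
        // Log why the transfer failed (for example, a timeout).
        error_log('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }
    curl_close($ch);

    return $data; // response body as a string, or false on failure
}

// PHP's own execution limit can also stop a long-running script; 0 removes it.
set_time_limit(0);
```

Calling fetch_with_timeouts($getimagefrom) in place of file_get_contents_curl($getimagefrom) would apply both limits to each request. If the script still stops after a few minutes, the web server's own request timeout is another place to look.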
Hi guys.
I'm having serious trouble trying to solve this.
The scenario:
Here at work we use the Vulnerability Management tool QualysGuard.
Skipping all technical details, this tool basically detects vulnerabilities in all servers and for each vulnerability in each server it creates a Ticket Number.
From the UI I can access all these tickets and download a CSV file with all of them.
The other way of doing it is by using the API.
The API uses some cURL calls to access the database and retrieve the info that I specify in the parameters.
The method:
I'm using a script like this to get the data:
<?php
$username="myUserName";
$password="myPassword";
$proxy= "myProxy";
$proxyauth = 'myProxyUser:myProxyPassword';
$url="https://qualysapi.qualys.com/msp/ticket_list.php?"; //This is the official script, provided by Qualys, for doing this task.
$postdata = "show_vuln_details=0&SINCE_TICKET_NUMBER=1&CURRENT_STATE=Open&ASSET_GROUPS=All";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
$xml = simplexml_load_string($result);
?>
The script above works fine. It connects to the API, passes some parameters to it, and the ticket_list.php file generates an XML file with everything I need.
The Problems:
1-) This script only allows a limit of 1000 results in the XML file it returns.
If my request has generated more than 1000 results, the script creates a TAG like this, at the end of the XML:
<TRUNCATION last="5066">Truncated after 1000 records</TRUNCATION>
In this case, I would need to execute another cURL call, with the parameters below:
$postdata = "show_vuln_details=0&SINCE_TICKET_NUMBER=5066&CURRENT_STATE=Open&ASSET_GROUPS=All";
2-) There are approximately 300,000 tickets in Qualys' database (cloud), and I need to download all of them and insert in MY database, which is used by an application that I'm creating. This application has some forms, which are filled by the user and a bunch of queries are run against the database.
The question:
What would be the best way for me to do the task above?
I've got some ideas, but I'm at a complete loss.
I thought:
1-) Create a function that makes the call above, parses the XML and, if the TRUNCATION tag exists, reads its value and calls itself again, recursing until a result without the TRUNCATION tag comes back.
The problem with this one is that I wasn't able to merge the XML results of each call, and I'm not sure whether it would cause memory issues, since it would need nearly 300 cURL calls. This script would be run automatically from the server's crontab outside business hours.
2-) Instead of retrieving all the data, I make the forms that I've mentioned post the data to the script and make the cURL calls with the parameters that the user POSTed. But again I'm not sure if that would be good, since I would still need to do multiple calls, depending on the parameters that the user sends.
3-) This is a crazy one: Use some sort of Macro software to record me while I log in the UI, go to the page where the tickets are located, click the download button, check the CSV option and click to download again. Then, export this script to some language like python or java, create a task in the cronTab and create a script that parses the CSV downloaded and inserts the data to the database. (Crazy or not? =P )
Any help is very welcome; maybe the answer is right before my eyes and I haven't seen it yet.
Thanks in advance!
I believe the proper way would involve a queue worker. However, if I were you I'd make your script grab 5 of these XML files in one execution: grab 1, insert rows, remove from memory, repeat. Then, I'd test it by running it a few times manually to see what sort of execution time and memory it requires. Once you've got a good idea of the execution time and you can see memory will not be a problem, schedule a cron for a little under double that time. If all goes well it should be about a minute between runs and you can have it all in your DB within an hour.
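The "grab 1, insert rows, remove from memory, repeat" loop could be sketched roughly as follows. The function names and the two callables are hypothetical: $fetchPage stands for the cURL POST from the question with SINCE_TICKET_NUMBER set to the given value, and $storePage stands for whatever inserts the rows into your database. The sketch reuses the last value from the TRUNCATION tag as-is, as in the question; whether the API treats that number as inclusive is not something I can confirm, so the insert step may need to tolerate a duplicate row.

```php
<?php
// Return the "last" attribute of a TRUNCATION element,
// or null when the response was not truncated.
function next_ticket_number(SimpleXMLElement $xml): ?int
{
    $nodes = $xml->xpath('//TRUNCATION');
    if (empty($nodes) || !isset($nodes[0]['last'])) {
        return null;
    }
    return (int) $nodes[0]['last'];
}

// Process up to $maxPages result pages, one at a time.
// $fetchPage(int $since): string      hypothetical callable doing the cURL POST
// $storePage(SimpleXMLElement $xml)   hypothetical callable inserting the rows
// Returns the ticket number to resume from, or null when nothing is left.
function import_tickets(callable $fetchPage, callable $storePage, int $since = 1, int $maxPages = 5): ?int
{
    for ($page = 0; $page < $maxPages; $page++) {
        $xml = simplexml_load_string($fetchPage($since));
        if ($xml === false) {
            throw new RuntimeException("Could not parse the response for ticket $since");
        }

        $storePage($xml);
        $next = next_ticket_number($xml);
        unset($xml); // release this page before fetching the next one

        if ($next === null) {
            return null; // no TRUNCATION tag: everything has been fetched
        }
        $since = $next;
    }
    return $since; // the next run should start here
}
```

Because only one page is held at a time, there is no need to merge the XML results. The return value can be written to a file or a table so that the next cron run picks up where this one stopped.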
For some reason my curl call is very slow. Here is the code I used.
$postData = "test";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$result = curl_exec($ch);
Executing this code takes on average 250ms to finish.
However when I just open the url in a browser, firebug says it only takes about 80ms.
Is there something I am doing wrong? Or is this the overhead associated with PHP Curl.
It's the call to curl_exec that is taking up all the time.
UPDATE:
So I figured out right after I posted this that if I set the curl option
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
it significantly slows down curl_exec.
The post data could be anything and it will slow it down.
Even if I set
curl_setopt($ch, CURLOPT_POST, false);
It's slow.
I'll try to work around it by just adding the parameters to the URI as a query string.
SECOND UPDATE:
Confirmed that if I just call the URI using GET and passing parameters
as a query string it is much faster than using POST and putting the parameters in the body.
cURL can be slow on DNS look-ups. Try using the IP address instead of the domain name to rule that out.
Curl has the ability to tell exactly how long each piece took and where the slowness is (name lookup, connect, transfer time). Use curl_getinfo (http://www.php.net/manual/en/function.curl-getinfo.php) after you run curl_exec.
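For example, something along these lines would show which phase is slow. The function names are made up for this sketch; the array keys are the ones curl_getinfo reports, and since those values are cumulative from the start of the request, the sketch subtracts neighbouring values to get per-phase durations.

```php
<?php
// Turn the cumulative timings reported by curl_getinfo() into
// per-phase durations, in seconds.
function timing_breakdown(array $info): array
{
    return [
        'dns'      => $info['namelookup_time'],
        'connect'  => $info['connect_time'] - $info['namelookup_time'],
        'setup'    => $info['pretransfer_time'] - $info['connect_time'],       // includes any TLS handshake
        'wait'     => $info['starttransfer_time'] - $info['pretransfer_time'], // request sent until first byte received
        'transfer' => $info['total_time'] - $info['starttransfer_time'],
    ];
}

// Hypothetical helper: run a GET request and return its timing breakdown.
function time_request(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);

    return timing_breakdown($info);
}
```

print_r(time_request($url)) then gives one number per phase. A large 'dns' value points at name resolution, a large 'wait' value at the remote server.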
If curl is slow, it is generally not the PHP code, it's almost always network related.
Try this:
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 );
Adding "curl_setopt($ch, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);" solved it for me. Is there any problem with this solution?
I just resolved this exact problem by removing the following two options:
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
Somehow, on the site I was fetching, the POST request took over ten full seconds. With GET, it takes less than a second.
So in my wrapper function that does the curl requests, I now only set those two options when there is something in $postData.
I just experienced a massive speed-up through compression. By setting the Accept-Encoding header to "gzip, deflate", or simply to all formats that curl supports, my ~200MB download took 6s instead of 20s:
curl_setopt($ch, CURLOPT_ENCODING, '');
Notes:
If an empty string, "", is set, a header containing all supported encoding types is sent.
You do not even have to care about decompression after the download, as this is done by curl internally.
CURLOPT_ENCODING requires Curl 7.10+
The curl functions in PHP use libcurl, the same library that the curl command-line tool is built on.
Speed therefore depends mostly on the network. In general curl itself is much faster than a web browser, because by default it does not load additional resources such as the images and stylesheets included in a page.
You may not be aware that the network performance of the server on which you were testing your PHP script may be far worse than that of your local computer, where you were testing with the browser. In that case the two measurements are not really comparable.
Generally that's acceptable when you are loading content from, or posting to, a slow or distant server. The duration of a curl call depends directly on your network speed and the throughput of your web server.
I want to run my PHP script every 5 minutes. Here is my PHP code.
function call_remote_file($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
}
set_time_limit(0);
$root='http://mywebsiteurl'; //remote location of the invoking and the working script
$url=$root."invoker.php";
$workurl=$root."script.php";
call_remote_file($workurl);//call working script
sleep(60*5);// wait for 300 seconds.
call_remote_file($url); //call again this script
I ran this code once. It works perfectly, even after I close the entire browser window.
The problem is that it stops working if I turn off my system's internet connection.
How can I solve this problem? Please help me out.
While I wouldn't really recommend doing this for something critical (you're going to have stability issues), this could work:
function call_remote_file($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
}
set_time_limit(0);
$root='http://mywebsiteurl'; //remote location of the invoking and the working script
$url=$root."invoker.php";
$workurl=$root."script.php";
while(true)
{
call_remote_file($workurl);//call working script
sleep(60*5);// wait for 300 seconds.
}
Another way would be to call it from the command line using exec():
function call_remote_file($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
}
set_time_limit(0);
$root='http://mywebsiteurl'; //remote location of the invoking and the working script
$url=$root."invoker.php";
$workurl=$root."script.php";
call_remote_file($workurl);//call working script
sleep(60*5);// wait for 300 seconds.
exec('php ' . $_SERVER['SCRIPT_FILENAME']);
You should really use cron though if at all possible.
The above code is ok but if you want to add multiple scripts to run at different intervals then the coding becomes far more complicated.
If you try phpjobscheduler (open source so free to use) it provides an interface to add, modify and remove scripts to run.
I have simple code that does a HEAD request for a URL and then prints the response headers. I've noticed that on some sites, this can take a long time to complete.
For example, requesting http://www.arstechnica.com takes about two minutes. I've tried the same request using another web site that does the same basic task, and it comes back immediately. So there must be something I have set incorrectly that's causing this delay.
Here's the code I have:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
$content = curl_exec ($ch);
curl_close ($ch);
Here's a link to the web site that does the same function: http://www.seoconsultants.com/tools/headers.asp
The code above, at least on my server, takes two minutes to retrieve www.arstechnica.com, but the service at the link above returns it right away.
What am I missing?
Try simplifying it a little bit:
print htmlentities(file_get_contents("http://www.arstechnica.com"));
The above outputs instantly on my webserver. If it doesn't on yours, there's a good chance your web host has some kind of setting in place to throttle these kind of requests.
EDIT:
Since the above happens instantly for you, try setting this curl setting on your original code:
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
Using the tool you posted, I noticed that http://www.arstechnica.com has a 301 header sent for any request sent to it. It is possible that cURL is getting this and not following the new Location specified to it, thus causing your script to hang.
SECOND EDIT:
Curiously enough, trying the same code you have above was making my webserver hang too. I replaced this code:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
With this:
curl_setopt($ch, CURLOPT_NOBODY, true);
Which is the way the manual recommends you do a HEAD request. It made it work instantly.
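Put together, a HEAD request along those lines might look like the sketch below. The names head_request and parse_headers are made up for the example. Redirects are deliberately not followed here, so the returned string contains a single header block; with CURLOPT_FOLLOWLOCATION enabled there would be one block per hop and the parsing would need to account for that.

```php
<?php
// Split a raw header block (as returned with CURLOPT_HEADER) into
// the status line and lower-cased name => value pairs.
// Repeated header names overwrite each other in this simple version.
function parse_headers(string $raw): array
{
    $lines = preg_split('/\r\n|\n/', trim($raw));
    $status = array_shift($lines);
    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip anything that is not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers];
}

// Hypothetical helper: send a HEAD request and return the parsed headers,
// or an empty array if the request failed.
function head_request(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body expected
    curl_setopt($ch, CURLOPT_HEADER, true);          // headers go into the returned string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    $raw = curl_exec($ch);
    curl_close($ch);

    return $raw === false ? [] : parse_headers($raw);
}
```

print_r(head_request("http://www.arstechnica.com")) would then show the status line and, for a 301 response, the location header.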
You have to remember that HEAD is only a suggestion to the web server. For HEAD to do the right thing it often takes some explicit effort on the part of the admins. If you HEAD a static file, Apache (or whatever your web server is) will often step in and do the right thing. If you HEAD a dynamic page, the default for most setups is to execute the GET path, collect all the results, and just send back the headers without the content. If that application is in a 3 (or more) tier setup, that call could potentially be very expensive and needless for a HEAD context. For instance, on a Java servlet, by default doHead() just calls doGet(). To do something a little smarter for the application the developer would have to explicitly implement doHead() (and more often than not, they will not).
I encountered an app from a Fortune 100 company that is used for downloading several hundred megabytes of pricing information. We'd check for updates to that data by executing HEAD requests fairly regularly until the modified date changed. It turns out that each request would actually make back-end calls to regenerate the list, which involved gigabytes of data on their back end and transferring it between several internal servers. They weren't terribly happy with us, but once we explained the use case they quickly came up with an alternate solution. If they had implemented HEAD, rather than relying on their web server to fake it, it would not have been an issue.
If memory serves, doing a HEAD request in cURL changes the HTTP protocol version to 1.0 (which is slow and probably the culprit here). Try changing that to:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1); // ADD THIS
$content = curl_exec ($ch);
curl_close ($ch);
I used the below function to find out the redirected URL.
$head = get_headers($url, 1);
The second argument makes it return an array with keys. For example, the line below gives the Location value.
$head["Location"]
http://php.net/manual/en/function.get-headers.php
This:
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
I wasn't trying to get headers.
I was just trying to make the page load of some data not take 2 minutes similar to described above.
That magical little option has dropped it down to 2 seconds.