Why would cURL in PHP return a timeout message when fetching HTML from a web page?
Here is the PHP code.
function getFromUrl( $url )
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($curl);
    if (curl_errno($curl))
    {
        echo 'Error:' . curl_error($curl) . '<br>';
    }
    curl_close($curl);
    return $result;
}
I get the expected results when I run the function with www.google.com as the URL.
$url = 'http://www.google.com' ;
$result = getFromUrl($url) ;
But when I pass in the URL of a web page on a second web server, I get a timeout response. The URL works when I paste it into a browser. Why the timeout message?
$url = "http://xxx.54.20.170:10080/accounting/tester/hello.html" ;
echo $url . '<br>' ;
$rv = getFromUrl( $url ) ;
echo $rv . '<br>' ;
Here is the cURL error message:
Error:Failed to connect to xxx.54.20.170 port 10080: Connection timed out
I am looking to transfer data from one web server to another.
thanks,
For PHP,
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 400); //timeout in seconds
From a terminal, first check whether curl works using the extra options below.
--connect-timeout
Maximum time in seconds that you allow the connection to the server to take. This only limits the connection phase; once curl has connected this option is of no more use. Since 7.32.0, this option accepts decimal values, but the actual timeout will decrease in accuracy as the specified timeout increases in decimal precision. See also the -m, --max-time option.
If this option is used several times, the last one will be used.
and
-m, --max-time
Maximum time in seconds that you allow the whole operation to take. This is useful for preventing your batch jobs from hanging for hours due to slow networks or links going down. Since 7.32.0, this option accepts decimal values, but the actual timeout will decrease in accuracy as the specified timeout increases in decimal precision. See also the --connect-timeout option.
If this option is used several times, the last one will be used.
Try using them to increase the timeout.
There are many reasons for curl not working. Some of them can be:
1) The response time is slow.
2) Some sites check certain header parameters before responding to a request. These include User-Agent, Referer, etc., to make sure the request comes from a valid source and not from a bot.
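For case 2, a hedged sketch of sending browser-like headers with cURL. The header values and example Referer here are illustrative assumptions, not values any particular site requires:

```php
// Some servers respond only when the request looks like it came from a
// browser. These header values are placeholders for illustration.
function fetch_like_browser($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64)');
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Referer: http://example.com/',           // placeholder referer
        'Accept: text/html,application/xhtml+xml',
    ));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

If a request succeeds from a browser but not from a script, comparing the headers the browser sends (via its developer tools) against these is a reasonable way to narrow it down.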
Related
I want the following curl code to stay stable for up to 50 connections from different IPs, so it can handle up to 50 simultaneous requests without hanging or putting much load on the server.
I am on shared hosting, but I want this curl script to put as little load as possible on the server even if it gets more than 50 or 100 requests at once in the future; otherwise my hosting resources could be limited by the admin.
One more thing: each request fetches only about a 30 KB file on average from the remote server, so I think each request's job will complete in a few seconds, less than 3, because the file size is very small.
Also, please tell me: does this script need any modification (like curl_multi) to face 50 to 100 small requests at once, or is it fine without any modification, or do I just need to change the shared hosting php.ini settings via cPanel?
$userid = $_GET['id'];
if (file_exists($userid.".txt") && (filemtime($userid.".txt") > (time() - 3600 * $ttime))) {
    $ffile = file_get_contents($userid.".txt");
} else {
    $dcurl = curl_init();
    $fh = fopen($userid.".txt", "w+"); // keep the file handle separate from the result
    curl_setopt($dcurl, CURLOPT_URL, "http://remoteserver.com/data/$userid");
    curl_setopt($dcurl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
    curl_setopt($dcurl, CURLOPT_TIMEOUT, 50);
    curl_setopt($dcurl, CURLOPT_FILE, $fh); // write the response straight to the file
    curl_exec($dcurl);
    if (curl_errno($dcurl)) // check for execution errors
    {
        echo 'Script error: ' . curl_error($dcurl);
        exit;
    }
    curl_close($dcurl);
    fclose($fh);
    $ffile = file_get_contents($userid.".txt");
}
You can use curl_multi:
http://php.net/manual/en/function.curl-multi-init.php - description and example
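A minimal curl_multi sketch of the idea: open all the handles, drive them concurrently, then collect the bodies. The per-request timeout value is an assumption for illustration:

```php
// Fetch many URLs in parallel with curl_multi instead of one at a time.
function fetch_all(array $urls)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10); // per-request cap (assumed value)
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    do {
        curl_multi_exec($mh, $running);
        if (curl_multi_select($mh) === -1) {
            usleep(1000); // nothing to wait on yet; avoid a busy loop
        }
    } while ($running > 0);
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // body, or false on failure
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

The select/usleep pair keeps the loop from spinning at 100% CPU while transfers are in flight, which matters on shared hosting.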
I am using an API provided by flipkart.com, which allows me to search and get the results as JSON output.
The code I am using is:
$time_start = microtime(true); // start the timer ($time_start was missing from the snippet)
$snapword = $_GET['p'];
$snapword = str_replace(' ', '+', $snapword);
$headers = array(
'Fk-Affiliate-Id: myaffid',
'Fk-Affiliate-Token: c0f74c4esometokesndad68f50666'
);
$pattern = "#\(.*?\)#";
$snapword = preg_replace($pattern,'',$snapword);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://affiliate-api.flipkart.net/affiliate/search/json?query='.$snapword.'&resultCount=5');
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_USERAGENT,'php');
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$snapdeal = curl_exec($ch);
curl_close($ch);
$time_end = microtime(true);
$time = $time_end - $time_start;
echo "Process Time: {$time}";
and the time it is taking is : Process Time: 5.3794288635254
Which is way too much, any ideas on how to reduce this?
Use curl_getinfo() to retrieve more accurate information. It also shows how much time spent resolving DNS etc.
You can see exact times taken for each step with the following keys:
CURLINFO_TOTAL_TIME - Total transaction time in seconds for last transfer
CURLINFO_NAMELOOKUP_TIME - Time in seconds until name resolving was complete
CURLINFO_CONNECT_TIME - Time in seconds it took to establish the connection
CURLINFO_PRETRANSFER_TIME - Time in seconds from start until just before file transfer begins
CURLINFO_STARTTRANSFER_TIME - Time in seconds until the first byte is about to be transferred
CURLINFO_SPEED_DOWNLOAD - Average download speed
CURLINFO_SPEED_UPLOAD - Average upload speed
$info = curl_getinfo($curl);
echo $info['connect_time']; // same values as above, but the keys are lowercase without the CURLINFO_ prefix
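As a sketch (the keys are the documented lowercase counterparts of the constants listed above), you could print the whole timing breakdown after curl_exec() to see which phase dominates:

```php
// Break down where the time went for a transfer on handle $ch.
function timing_report($ch)
{
    $info = curl_getinfo($ch);
    return array(
        'dns'         => $info['namelookup_time'],   // name resolution
        'connect'     => $info['connect_time'],      // TCP connect
        'pretransfer' => $info['pretransfer_time'],  // incl. TLS handshake
        'first_byte'  => $info['starttransfer_time'],// server think time
        'total'       => $info['total_time'],
    );
}

// Usage, after curl_exec($ch):
// print_r(timing_report($ch));
```

If 'first_byte' accounts for most of 'total', the remote API itself is slow; if 'dns' is large, a resolver change would help.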
Most probably, the API is slow.
You could try to change to a faster DNS server (in Linux: /etc/resolv.conf).
Other than that, not much you can do.
I would see if you can determine your server's connection speed from a terminal/console window, as that greatly affects the time it takes to access a resource on the web. Also consider the response time of the remote resource, since that page needs to gather the requested information and send it back.
I would also consider caching as much of the information you need as possible with a cron job late at night, so that you don't have to fetch it upfront.
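A minimal sketch of that caching idea. The cache path and TTL are assumptions; a nightly cron job could call the same function to keep the cache warm so user-facing requests hit the fast path:

```php
// Serve a fresh local copy when available; refresh from the network otherwise.
function cached_fetch($url, $cacheFile, $ttl = 3600)
{
    if (file_exists($cacheFile) && filemtime($cacheFile) > time() - $ttl) {
        return file_get_contents($cacheFile); // cache hit: no network call
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);
    if ($body !== false) {
        file_put_contents($cacheFile, $body); // refresh the cache
    }
    return $body;
}
```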
I am running a dedicated server that fetches data from an API server. My machine runs on a Windows Server 2008 OS.
I use PHP curl function to fetch the data via http requests ( and using proxy ). The function I've created for that:
function get_http($url)
{
    $proxy_file = file_get_contents("proxylist.txt");
    $proxy_file = explode("\n", $proxy_file);
    $how_Many_Proxies = count($proxy_file);
    $which_Proxy = rand(0, $how_Many_Proxies - 1); // rand() is inclusive; count() alone would overrun the array
    $proxy = $proxy_file[$which_Proxy];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $curl_scraped_page = curl_exec($ch);
    curl_close($ch);
    return $curl_scraped_page;
}
I then save the result in the MySQL database using this simple code, of which I run 20-40-60-100 copies in parallel (beyond some number it doesn't increase performance, and I wonder where the bottleneck is):
function retrieveData($id)
{
    $the_data = get_http("http://api-service-ip-address/?id=$id");
    return $the_data;
}
$ids_List = file_get_contents("the-list.txt");
$ids_List = explode("\n", $ids_List);
for ($a = 0; $a < 50; $a++)
{
    $array[$a] = get_http($ids_List[$a]);
}
for ($b = 0; $b < 50; $b++)
{
    $insert_Array[] = "('$ids_List[$b]', NULL, '$array[$b]')";
}
$insert_Array = implode(',', $insert_Array);
$sql = "INSERT INTO `the_data` (`id`, `queue_id`, `data`) VALUES $insert_Array;";
mysql_query($sql);
After many optimizations, I am stuck on retrieving/fetching/saving around 23 rows with data per second.
The MySQL table is pretty simple and looks like this:
id | queue_id(AI) | data
Keep in mind, that the database doesn't seem to be the bottleneck. When I check the CPU usage, the mysql.exe process barely ever goes over 1%.
I fetch the data via 125 proxies. I've decreased the amount to 20 for the test and it DIDN'T make any difference ( suggesting that the proxies are not the bottleneck? - because I get the same performance when using 5 times less of them? )
So if the MySQL and Proxies are not the cause of the limit, what else can be it and how can I find out?
So far, the optimizations I've done:
- replaced file_get_contents with curl functions for retrieving the HTTP data
- replaced the https:// URL with an http:// one (is this faster?)
- indexed the table
- replaced the API domain name with a plain IP address (so DNS time isn't a factor)
- I use only private proxies that have low latency.
My questions:
What may be the possible cause of the performance limit?
How do I find the reason for the limit?
Can this be caused by some TCP/IP limitation / poorly configured apache/windows?
The API is really fast and it serves many times more queries to other people so I don't believe it can't respond any faster.
You are reading the proxy file on every call to the curl function. I recommend moving the read outside the function: read the proxies once and store them in an array for reuse.
Use the CURLOPT_TIMEOUT option to define a fixed time limit for the curl execution (for example 3 seconds). It will help you determine whether the curl operation itself is the issue.
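A sketch combining both suggestions: load the proxy list once per process and give each request explicit timeouts so a dead proxy fails fast. The filename "proxylist.txt" comes from the question; the timeout values are assumptions:

```php
// Read the proxy file only once per process, then pick at random.
function pick_proxy($file = 'proxylist.txt')
{
    static $proxies = null; // cached after the first call
    if ($proxies === null) {
        $proxies = array_values(array_filter(array_map('trim', file($file))));
    }
    return $proxies[rand(0, count($proxies) - 1)]; // rand() bounds are inclusive
}

function get_http_fast($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, pick_proxy());
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3); // give up quickly on a dead proxy
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // cap the whole transfer
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

With short timeouts, a slow or dead proxy costs seconds instead of stalling a worker for the system default, which is one plausible explanation for a throughput ceiling.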
I am working with a script (that I did not create originally) that generates a pdf file from an HTML page. The problem is that it is now taking a very long time, like 1-2 minutes, to process. Supposedly this was working fine originally, but has slowed down within the past couple of weeks.
The script calls file_get_contents on a php script, which then outputs the result into an HTML file on the server, and runs the pdf generator app on that file.
I seem to have narrowed down the problem to the file_get_contents call on a full url, rather than a local path.
When I use
$content = file_get_contents('test.txt');
it processes almost instantaneously. However, if I use the full url
$content = file_get_contents('http://example.com/test.txt');
it takes anywhere from 30-90 seconds to process.
It's not limited to our server; it is slow when accessing any external URL, such as http://www.google.com. I believe the script calls the full URL because there are necessary query-string variables that don't work if you call the file locally.
I also tried fopen, readfile, and curl, and they were all similarly slow. Any ideas on where to look to fix this?
Note: This has been fixed in PHP 5.6.14. A Connection: close header will now automatically be sent even for HTTP/1.0 requests. See commit 4b1dff6.
I had a hard time figuring out the cause of the slowness of file_get_contents scripts.
By analyzing it with Wireshark, the issue (in my case and probably yours too) was that the remote web server DIDN'T CLOSE THE TCP CONNECTION UNTIL 15 SECONDS (i.e. "keep-alive").
Indeed, file_get_contents doesn't send a "Connection" HTTP header, so the remote web server considers by default that it's a keep-alive connection and doesn't close the TCP stream until 15 seconds have passed (this might not be a standard value; it depends on the server conf).
A normal browser would consider the page is fully loaded if the HTTP payload length reaches the length specified in the response Content-Length HTTP header. File_get_contents doesn't do this and that's a shame.
SOLUTION
SO, if you want to know the solution, here it is:
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n"))); // double quotes, so \r\n is sent as a real CRLF
file_get_contents("http://www.something.com/somepage.html",false,$context);
The thing is just to tell the remote web server to close the connection when the download is complete, as file_get_contents isn't intelligent enough to do it by itself using the response Content-Length HTTP header.
I would use curl() to fetch external content, as it is much quicker than the file_get_contents method. Not sure if this will solve the issue, but worth a shot.
Also note that your server's speed will affect the time it takes to retrieve the file.
Here is an example of usage:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/test.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
Sometimes, it's because the DNS is too slow on your server, try this:
replace
echo file_get_contents('http://www.google.com');
with
$context=stream_context_create(array('http' => array('header'=>"Host: www.google.com\r\n")));
echo file_get_contents('http://74.125.71.103', false, $context);
I had the same issue. The only thing that worked for me was setting a timeout in the $options array.
$options = array(
    'http' => array(
        'header'  => implode("\r\n", $headers), // glue first, pieces second
        'method'  => 'POST',
        'content' => '',
        'timeout' => .5
    ),
);
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n"))); // double quotes, so \r\n is sent as a real CRLF
$string = file_get_contents("http://localhost/testcall/request.php",false,$context);
Time: 50976 ms (average over 5 attempts in total)
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, "http://localhost/testcall/request.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
echo $data = curl_exec($ch);
curl_close($ch);
Time: 46679 ms (average over 5 attempts in total)
Note: request.php is used to fetch some data from a MySQL database.
Can you try fetching that url, on the server, from the command line? curl or wget come to mind. If those retrieve the URL at a normal speed, then it's not a network problem and most likely something in the apache/php setup.
I have huge data passed by an API. I'm using file_get_contents to read the data, but it took around 60 seconds. Using KrisWebDev's solution, it took around 25 seconds.
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n"))); // the context key is 'http' even for https:// URLs; double quotes so \r\n is a real CRLF
file_get_contents($url,false,$context);
What I would also consider with cURL is that you can "thread" the requests. This has helped me immensely, as I do not have access to a version of PHP that allows threading at the moment.
For example, I was getting 7 images from a remote server using file_get_contents and it was taking 2-5 seconds per request. That process alone added 30 seconds or so while the user waited for the PDF to be generated.
This literally reduced the time to about that of 1 image. Another example: I verify 36 URLs in the time it previously took to do one. I think you get the point. :-)
$timeout = 30;
$retTxfr = 1;
$user = '';
$pass = '';
$master = curl_multi_init();
$node_count = count($curlList);
$keys = array("url");
for ($i = 0; $i < $node_count; $i++) {
    foreach ($keys as $key) {
        if (empty($curlList[$i][$key])) continue;
        $ch[$i][$key] = curl_init($curlList[$i][$key]);
        curl_setopt($ch[$i][$key], CURLOPT_TIMEOUT, $timeout); // -- timeout after X seconds
        curl_setopt($ch[$i][$key], CURLOPT_RETURNTRANSFER, $retTxfr);
        curl_setopt($ch[$i][$key], CURLOPT_HTTPAUTH, CURLAUTH_ANY);
        curl_setopt($ch[$i][$key], CURLOPT_USERPWD, "{$user}:{$pass}");
        curl_multi_add_handle($master, $ch[$i][$key]);
    }
}
// -- run all requests at once, finish when done or timeout met --
do { curl_multi_exec($master, $running); }
while ($running > 0);
Then collect and check the results (the surrounding loop and the curl_multi_getcontent() call are implied by the indices in the original snippet):
for ($i = 0; $i < $node_count; $i++) {
    foreach ($keys as $key) {
        if (empty($ch[$i][$key])) continue;
        $results[$i][$key] = curl_multi_getcontent($ch[$i][$key]); // read each response body
        if ((int)curl_getinfo($ch[$i][$key], CURLINFO_HTTP_CODE) > 399 || empty($results[$i][$key])) {
            unset($results[$i][$key]);
        } else {
            $results[$i]["options"] = $curlList[$i]["options"];
        }
        curl_multi_remove_handle($master, $ch[$i][$key]);
        curl_close($ch[$i][$key]);
    }
}
Then close the multi handle:
curl_multi_close($master);
I know this is an old question, but I found it today and the answers didn't work for me. I haven't seen anyone mention that the maximum connections per IP may be set to 1. That way, you make an API request and the API makes another request back to your server because you use the full URL. That's why loading directly from disk works. For me, this fixed the problem:
if (strpos($file->url, env('APP_URL')) === 0) {
    $url = substr($file->url, strlen(env('APP_URL')));
} else {
    $url = $file->url;
}
return file_get_contents($url);
I'm running a curl request on an eXist database through php. The dataset is very large, and as a result, the database consistently takes a long amount of time to return an XML response. To fix that, we set up a curl request, with what is supposed to be a long timeout.
$ch = curl_init();
$headers["Content-Length"] = strlen($postString);
$headers["User-Agent"] = "Curl/1.0";
curl_setopt($ch, CURLOPT_URL, $requestUrl);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, 'admin:');
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);
$response = curl_exec($ch);
curl_close($ch);
However, the curl request consistently ends before the request is completed (<1000 when requested via a browser). Does anyone know if this is the proper way to set timeouts in curl?
See documentation: http://www.php.net/manual/en/function.curl-setopt.php
CURLOPT_CONNECTTIMEOUT - The number of seconds to wait while trying to connect. Use 0 to wait indefinitely.
CURLOPT_TIMEOUT - The maximum number of seconds to allow cURL functions to execute.
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 400); //timeout in seconds
Also don't forget to enlarge the execution time of the PHP script itself:
set_time_limit(0); // to infinity, for example
Hmm, it looks to me like CURLOPT_TIMEOUT defines the amount of time that any cURL function is allowed to take to execute. I think you should actually be looking at CURLOPT_CONNECTTIMEOUT instead, since that tells cURL the maximum amount of time to wait for the connection to complete.
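A small sketch of the distinction (the URL is a placeholder): CURLOPT_CONNECTTIMEOUT bounds only the connection phase, while CURLOPT_TIMEOUT bounds the entire transfer, so a long-running query needs the latter to be large even when the connection itself is fast:

```php
// CONNECTTIMEOUT: how long to wait for the TCP connection.
// TIMEOUT: how long to allow the whole request, including the response.
$ch = curl_init('http://example.com/slow-query'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // fail if no connection within 10 s
curl_setopt($ch, CURLOPT_TIMEOUT, 1200);      // but tolerate a response taking up to 20 min
```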
There is a quirk with this that might be relevant for some people... From the PHP docs comments.
If you want cURL to timeout in less than one second, you can use CURLOPT_TIMEOUT_MS, although there is a bug/"feature" on "Unix-like systems" that causes libcurl to timeout immediately if the value is < 1000 ms with the error "cURL Error (28): Timeout was reached". The explanation for this behavior is:
"If libcurl is built to use the standard system name resolver, that portion of the transfer will still use full-second resolution for timeouts with a minimum timeout allowed of one second."
What this means to PHP developers is "You can't use this function without testing it first, because you can't tell if libcurl is using the standard system name resolver (but you can be pretty sure it is)"
The problem is that on (Li|U)nix, when libcurl uses the standard name resolver, a SIGALRM is raised during name resolution which libcurl thinks is the timeout alarm.
The solution is to disable signals using CURLOPT_NOSIGNAL. Here's an example script that requests itself causing a 10-second delay so you can test timeouts:
if (!isset($_GET['foo'])) {
    // Client
    $ch = curl_init('http://localhost/test/test_timeout.php?foo=bar');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOSIGNAL, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200);
    $data = curl_exec($ch);
    $curl_errno = curl_errno($ch);
    $curl_error = curl_error($ch);
    curl_close($ch);
    if ($curl_errno > 0) {
        echo "cURL Error ($curl_errno): $curl_error\n";
    } else {
        echo "Data received: $data\n";
    }
} else {
    // Server
    sleep(10);
    echo "Done.";
}
From http://www.php.net/manual/en/function.curl-setopt.php#104597
Your code sets the timeout to 1000 seconds. For milliseconds, use CURLOPT_TIMEOUT_MS.
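For completeness, a sketch of the two variants side by side (placeholder URL; note CURLOPT_NOSIGNAL, discussed in the answer above, which matters for sub-second values on Unix):

```php
$ch = curl_init('http://example.com/'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);         // avoid the SIGALRM/name-resolver issue
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);       // whole-transfer limit, in seconds
// or, for millisecond precision:
// curl_setopt($ch, CURLOPT_TIMEOUT_MS, 1500); // 1.5 seconds
```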
You will need to take care of timeouts between you and the file at every layer; in this case, PHP and cURL.
To tell Curl to never timeout when a transfer is still active, you need to set CURLOPT_TIMEOUT to 0, instead of 1000.
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
In PHP, again, you must remove the time limit, or PHP itself (after 30 seconds by default) will kill the script along with cURL's request. This alone should fix your issue.
In addition, if you require data integrity, you could add a layer of safety by using ignore_user_abort:
# The maximum execution time, in seconds. If set to zero, no time limit is imposed.
set_time_limit(0);
# Make sure to keep alive the script when a client disconnect.
ignore_user_abort(true);
A client disconnection would otherwise interrupt the execution of the script and possibly damage data, e.g. a non-transactional database query, a half-built config file, etc.; in your case it would leave a partially downloaded file, which you may or may not care about.
Answering this old question because this thread is at the top of search-engine results for CURL_TIMEOUT.
You can't run the request from a browser; the browser will time out waiting for the server running the cURL request to respond, probably after 1-2 minutes, the default network timeout.
You need to run it from the command line/terminal.
If you are using PHP as a fastCGI application then make sure you check the fastCGI timeout settings.
See: PHP curl put 500 error