How to make a PHP cURL request not wait? - php

I have a PHP function which makes a cURL request. This request sometimes takes longer than expected, and so my PHP function takes longer to return.
In my particular case, the output of the cURL request is not important. So is it possible with cURL to just place a request and proceed, without waiting for curl_exec() to finish?

PHP does not support multi-threading, so this is not possible. You can, however, limit the amount of time cURL will execute.
$max_exe_time = 250; // time in milliseconds
curl_setopt($curl_handle, CURLOPT_TIMEOUT_MS, $max_exe_time);
You can read about this configuration option and others: http://php.net/manual/function.curl-setopt.php
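One caveat: on some libcurl builds, sub-second timeouts only behave as expected if signals are disabled with CURLOPT_NOSIGNAL. A minimal fire-and-forget sketch combining these options (the URL is a placeholder, not from the question):
$curl_handle = curl_init('http://example.com/slow-endpoint'); // placeholder URL
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);       // keep the response off the page
curl_setopt($curl_handle, CURLOPT_NOSIGNAL, true);             // may be required for timeouts < 1 s on some builds
curl_setopt($curl_handle, CURLOPT_TIMEOUT_MS, $max_exe_time);  // give up after 250 ms
curl_exec($curl_handle);                                       // returns quickly; the result is discarded
curl_close($curl_handle);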

Related

Send response and continue executing script

I'm building a high-load Telegram bot, which means many requests come in and they take time to handle.
I use a webhook (Telegram sends updates to, say, handler.php), which requires the script to respond with a correct [] answer and fully close the connection before Telegram will send the next update. Otherwise Telegram only increments pending_update_count and won't send any new updates until the previous one is handled.
So what I'm trying to do is respond correctly and close the connection before the heavy code executes.
Stack Overflow suggests some solutions, but none of them work for me, because they only close the connection if there was no output and I need to respond with [].
How do I close a connection early?
Send response and continue executing script - PHP
The only workaround I came up with is to call shell_exec('php sendMessage.php >/dev/null 2>/dev/null &') from handler.php. It works perfectly well, but that's not what I need. Are there any other suggestions for responding and closing the connection so the code can execute in the background?
I found a simple solution that works for Telegram Bot API webhooks. It allows me to call the handling script from within the script itself, without using exec() or shell_exec().
Time-consuming code
$bool = true; // some condition in your regular code
if($bool) {
sleep(10);
}
This will cause a huge delay for other bot users, since Telegram has not received the response yet and won't send any other updates.
Solution
// handler.php
if(isset($_GET['action'])) {
    // Heavy part: this branch only runs when the script calls itself below.
    sleep(10);
    exit; // stop here so the self-call doesn't fire yet another self-call
}
// Fire-and-forget: call this same script with ?action, give up after 50 ms,
// then fall through and return the quick [] response to Telegram.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://localhost/handler.php?action');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 50);
curl_exec($ch);
curl_close($ch);
This way the script calls itself (or any other script), Telegram gets a correct and quick response, and the heavy part executes separately.
It is strongly recommended not to go below 50 ms on CURLOPT_TIMEOUT_MS; with lower values the self-call sometimes simply does not fire when the incoming request comes from Telegram.

Something faster than get_headers()

I'm trying to make a PHP script that will check the HTTP status of a website as fast as possible.
I'm currently using get_headers() and running it in a loop over 200 random URLs from a MySQL database.
Checking all 200 takes an average of 2m 48s.
Is there anything I can do to make it (much) faster?
(I know about fsockopen - it can check port 80 on 200 sites in 20s - but it's not the same as requesting the HTTP status code, because the server may respond on the port but might not be serving the website correctly, etc.)
Here is the code:
<?php
function get_httpcode($url) {
    $headers = get_headers($url, 0);
    // Return the HTTP status code from a line like "HTTP/1.1 200 OK"
    return substr($headers[0], 9, 3);
}

###
## Grab task and execute it
###

// Loop through tasks ($sql is the result of a mysql_query() call, not shown)
while($data = mysql_fetch_assoc($sql)):
    $result = get_httpcode('http://'.$data['url']);
    echo $data['url'].' = '.$result.'<br/>';
endwhile;
?>
You can try the cURL library. You can send multiple requests in parallel with curl_multi_exec().
Example:
$ch = curl_init('http_url');                   // replace 'http_url' with the URL to check
curl_setopt($ch, CURLOPT_HEADER, 1);           // include response headers
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the response instead of printing it
$c = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_HTTP_CODE); // numeric HTTP status code
print_r($info);
UPDATE
Look at this example: http://www.codediesel.com/php/parallel-curl-execution/
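In case that link goes away, here is a minimal curl_multi sketch for the status-check use case in the question (the $urls array stands in for the rows fetched from the database):
$urls = array('http://example.com', 'http://example.org'); // would come from the database in practice
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);   // HEAD request: we only need the status code
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}
// run all transfers until every one of them has finished
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $url => $ch) {
    echo $url.' = '.curl_getinfo($ch, CURLINFO_HTTP_CODE).'<br/>';
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);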
I don't know if this is an option you can consider, but you could run all of them almost at the same time using a fork; this way the script will take only a bit longer than a single request.
http://www.php.net/manual/en/function.pcntl-fork.php
You could add this in a script that is run in CLI mode and launch all the requests at the same time, for example:
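A minimal sketch, assuming CLI mode with the pcntl extension and re-using the get_httpcode() helper from the question (the URL list is hard-coded for illustration):
$urls = array('example.com', 'example.org'); // would come from the database in practice
$children = array();
foreach ($urls as $url) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('could not fork');
    } elseif ($pid == 0) {
        // child process: check one URL, print the result, then exit
        echo $url.' = '.get_httpcode('http://'.$url)."\n";
        exit(0);
    }
    $children[] = $pid; // parent: remember the child PID
}
// parent: wait for every child to finish
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}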
Edit: you say that you have 200 calls to make, so one thing you might run into is losing the database connection. The problem is that the link is destroyed when the first child script completes. To avoid that, create a new connection for each child. I see that you are using the standard mysql_* functions, so be sure to pass the 4th parameter to mysql_connect() so that a new link is created each time. Also check the maximum number of simultaneous connections allowed on your server.

Stream response from CURL request without waiting for it to finish

I have a PHP script on my server that is making a request to another server for an image.
The script is accessed just like a regular image source like this:
<img src="http://example.com/imagecontroller.php?id=1234" />
Browser -> Script -> External Server
The script is doing a CURL request to the external server.
Is it possible to "stream" the CURL response directly back to the client (browser) as it is received on the server?
Assume my script is on a slow shared hosting server and the external server is blazing fast (a CDN). Is there a way to serve the response directly back to the client without my script being a bottleneck? It would be great if my server didn't have to wait for the entire image to be loaded into memory before beginning the response to the client.
Pass the -N/--no-buffer flag to the curl command-line tool. It does the following:
Disables the buffering of the output stream. In normal work
situations, curl will use a standard buffered output stream that will
have the effect that it will output the data in chunks, not
necessarily exactly when the data arrives. Using this option will
disable that buffering.
Note that this is the negated option name documented. You can thus use
--buffer to enforce the buffering.
Check out Pascal Martin's answer to an unrelated question, in which he discusses using CURLOPT_FILE for streaming cURL responses. His explanation for handling "Manipulate a string that is 30 million characters long" should work in your case.
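A minimal sketch of that CURLOPT_FILE idea applied to the image case here (the CDN URL and the image type are assumptions): cURL writes each chunk straight to php://output, so the whole image never has to sit in memory.
$src = 'http://cdn.example.com/image.jpg'; // placeholder for the external image URL
$out = fopen('php://output', 'w');         // stream straight to the browser

header('Content-Type: image/jpeg');        // assumes the content type is known up front

$ch = curl_init($src);
curl_setopt($ch, CURLOPT_FILE, $out);      // write the response body to the stream as it arrives
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
curl_close($ch);
fclose($out);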
Hope this helps!
Yes, you can use the CURLOPT_WRITEFUNCTION option:
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $callback);
Here $ch is the cURL handle and $callback is the callback function name.
cURL will then stream response data from the remote site to the callback as it arrives. The callback function can look something like this:
$result = '';
$callback = function ($ch, $str) use (&$result) {
    // $str holds the chunk of data streamed back
    $result .= $str;
    // here you can work with the stream data, either via $result or $str
    return strlen($str); // cURL expects the number of bytes handled here
};
If not interrupted, $result will contain the full response from the remote site at the end.
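For the image-proxy question above, a variant of this callback could forward each chunk to the browser instead of buffering it. A sketch, assuming the content type is known and $ch is the handle from the snippet above:
header('Content-Type: image/jpeg');            // assumed image type
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $str) {
    echo $str;           // pass the chunk straight through to the client
    flush();             // push it out rather than letting PHP buffer it
    return strlen($str); // tell cURL how many bytes were handled
});
curl_exec($ch);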
Not with cURL; you could use fsockopen() to do streaming.

PHP Curl Performance Bottleneck Making Google Maps Geocoding Requests

I am using PHP and cURL to make HTTP reverse geocoding (lat, long -> address) requests to Google Maps. I have a premier account, so we can make a lot of requests without being throttled or blocked.
Unfortunately, I have reached a performance limit. We get approximately 500,000 requests daily that need to be reverse geocoded.
The code is quite trivial (I will write pieces in pseudo-code for the sake of saving time and space). The following code fragment is called every 15 seconds via a job.
<?php
// get requests from database
$requests = get_requests();

foreach($requests as $request) {
    // build up the URL string to send to Google
    $url = build_url_string($request->latitude, $request->longitude);
    // make the cURL request
    $response = Curl::get($url);
    // write the response address back to the database
    write_response($response);
}

class Curl {
    public static function get($p_url, $p_timeout = 5) {
        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, $p_url);
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, $p_timeout);
        curl_setopt($curl_handle, CURLOPT_TIMEOUT, $p_timeout);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        $response = curl_exec($curl_handle);
        curl_close($curl_handle);
        return $response;
    }
}
?>
The performance problem seems to be the cURL requests. They are extremely slow, probably because a full HTTP request is made for every operation. We have a 100 Mbps connection, but the script running at full speed only utilizes about 1 Mbps. The load on the server is essentially nothing; the server is a quad core with 8 GB of memory.
What things can we do to increase the throughput of this? Is there a way to open a persistent (keep-alive) HTTP request with Google Maps? How about exploding the work out horizontally, i.e. making 50 concurrent requests?
Thanks.
Some things I would do:
No matter how "premium" you are, doing external HTTP requests will always be a bottleneck, so for starters, cache request + response - you can still refresh them via cron on a regular basis (see the sketch after this list).
These are single HTTP requests - you will never get "full speed" with them, especially if request and response are that small (< 1 MB), because of TCP handshaking, headers, etc.
So try using multi cURL (if your premium account allows it) in order to start multiple requests in parallel - this should give you full speed ;)
Add "Connection: close" to the request headers you send; this will immediately close the HTTP connection so your server and Google's won't get hammered with half-open connections.
Considering you are running all your requests sequentially, you should look into dividing the work up across multiple machines or processes so the batches can run in parallel. Judging by your benchmarks you are limited by how long each cURL response takes, not by CPU or bandwidth.
My first instinct is to look at a queuing system (Gearman, RabbitMQ).

How do I check for valid (not dead) links programmatically using PHP?

Given a list of URLs, I would like to check that each URL:
Returns a 200 OK status code
Returns a response within X amount of time
The end goal is a system that is capable of flagging urls as potentially broken so that an administrator can review them.
The script will be written in PHP and will most likely run on a daily basis via cron.
The script will be processing approximately 1,000 URLs at a go.
Question has two parts:
Are there any big gotchas with an operation like this? What issues have you run into?
What is the best method for checking the status of a url in PHP considering both accuracy and performance?
Use the PHP cURL extension. Unlike fopen(), it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, as you don't have to download the entire body of the page to check.
As a starting point you could use some function like this:
function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(
        CURLOPT_RETURNTRANSFER => true,     // do not output to browser
        CURLOPT_URL            => $url,     // set URL
        CURLOPT_NOBODY         => true,     // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout, // set timeout
    );
    curl_setopt_array($ch, $opts);

    curl_exec($ch); // do it!
    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK

    curl_close($ch); // close handle
    return $retval;
}
However, there are a ton of possible optimizations: you might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.
Oh, and this code checks strictly for HTTP response code 200. It does not follow redirects (302) - but there is also a cURL option for that.
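A sketch of those two optimizations combined, re-using one handle and following redirects (the $urls array is assumed; CURLOPT_FOLLOWLOCATION is the option referred to above):
$urls = array('http://example.com/', 'http://amazon.com/'); // assumed input list
$status = array();

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_NOBODY         => true,  // HEAD request only
    CURLOPT_FOLLOWLOCATION => true,  // treat the redirect target's status as the answer
    CURLOPT_MAXREDIRS      => 5,
    CURLOPT_TIMEOUT        => 30,
));

foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url); // same handle, new URL: connections can be re-used
    curl_exec($ch);
    $status[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
}
curl_close($ch);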
Look into cURL. There's a library for PHP.
There's also an executable version of cURL so you could even write the script in bash.
I actually wrote something in PHP that does this over a database of 5k+ URLs. I used the PEAR class HTTP_Request, which has a method called getResponseCode(). I just iterate over the URLs, passing them to getResponseCode() and evaluating the response.
However, it doesn't work for FTP addresses, URLs that don't begin with http or https (unconfirmed, but I believe it's the case), or sites with invalid security certificates (it returns 0 instead of a status code). Also, a 0 is returned for server-not-found (there's no status code for that).
And it's probably easier than cURL, as you just include a few files and use a single function to get an integer code back.
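A sketch of how that loop might look, assuming the classic PEAR HTTP_Request API (sendRequest() followed by getResponseCode()); the $urls array is an assumption:
require_once 'HTTP/Request.php';     // PEAR HTTP_Request

foreach ($urls as $url) {            // $urls is the list being checked
    $req = new HTTP_Request($url);
    $req->sendRequest();
    $code = $req->getResponseCode(); // integer status code, or 0 on failure as described above
    echo $url.' = '.$code."\n";
}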
fopen() supports HTTP URIs.
If you need more flexibility (such as timeout), look into the cURL extension.
Seems like it might be a job for curl.
If you're not stuck on PHP, Perl's LWP might be an answer too.
You should also be aware of URLs returning 301 or 302 HTTP responses which redirect to another page. Generally this doesn't mean the link is invalid. For example, http://amazon.com returns 301 and redirects to http://www.amazon.com/.
Just returning a 200 response is not enough; many valid links will continue to return "200" after they change into porn / gambling portals when the former owner fails to renew the domain.
Domain squatters typically ensure that every URL in their domains returns 200.
One potential problem you will undoubtedly run into is when the box this script is running on loses access to the Internet... you'll get 1000 false positives.
It would probably be better for your script to keep some kind of history and only report a failure after 5 consecutive days of failures.
Also, the script should be self-checking in some way (like checking a known good web site [google?]) before continuing with the standard checks.
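A small sketch of that self-check, re-using the is_available() helper from the earlier answer (google.com as the "known good" site is just an assumption):
// abort this run if even a known-good site is unreachable, to avoid mass false positives
if (!is_available('http://www.google.com', 10)) {
    error_log('Link checker: connectivity self-check failed, skipping this run');
    exit(1);
}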
You only need a bash script to do this. Please check my answer on a similar post here. It is a one-liner that reuses HTTP connections to dramatically improve speed, retries n times for temporary errors and follows redirects.
