This is a loop used in our script with curl, and it's driving CPU usage up to 100%. A friend said, "Your computer is looping so fast here that it doesn't have time to process the request, since it is constantly checking for a finish." So my question is: how can this loop be rewritten to slow down? Thanks
$running = null;
do {
curl_multi_exec($mh, $running);
} while($running > 0);
Try this: http://php.net/manual/function.curl-multi-select.php
Adding a call to http://php.net/sleep or http://php.net/usleep in every iteration should reduce the CPU usage by allowing other running processes to be scheduled by the operating system.
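Combining the two suggestions above (curl_multi_select() plus a short sleep as a fallback), a minimal sketch of the rewritten loop might look like this; the 1-second select timeout and 100 ms back-off are illustrative values, not from the original:

```php
// Assumes $mh already has the transfer handles added; curl_multi_init()
// is called here only so the sketch is self-contained.
$mh = curl_multi_init();

$running = null;
do {
    curl_multi_exec($mh, $running);
    // Wait up to 1 s for socket activity instead of spinning; back off
    // briefly if select() reports an error.
    if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
        usleep(100000); // 100 ms
    }
} while ($running > 0);
```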
Unfortunately, you didn't post the whole code. I suppose you are doing something like this:
$mh = curl_multi_init();
for ($i = 0; $i < $desiredThreadsNumber; $i++) {
$ch = curl_init();
// set up $ch here
curl_multi_add_handle($mh, $ch);
}
You should understand that at this point you haven't actually run the transfers yet. curl_multi_exec() runs all of the "threads", but it can't run all $desiredThreadsNumber of them simultaneously. If you look at the example on the curl_multi_exec() page on php.net, you will see that you must wait while curl_multi_exec() runs them all. In other words, you need the following nested loop here:
$running = null;
do {
do {
$mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
} while($running > 0);
Finally, let me suggest that you read this article http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/ and use the code snippet from it; I have used it in two or three projects.
curl_multi_select (http://php.net/manual/function.curl-multi-select.php) is indeed the way to go, but there are a couple of caveats.
First, if curl_multi_exec returns CURLM_CALL_MULTI_PERFORM, it has more data to process immediately, so it should be run again. It is also important to check that curl_multi_exec did not fail outright; in that case curl_multi_select could block forever.
This should work:
do {
while (CURLM_CALL_MULTI_PERFORM === curl_multi_exec($mh, $running)) {};
if (!$running) break;
while (curl_multi_select($mh) === 0) {};
} while (true);
If anyone sees a good way to avoid the while(true) without duplicating code, please point it out.
I tried all the solutions provided above, but this one worked for me on a heavily loaded system that makes more than 1,000 multi-cURL requests per second.
//Execute Handles
$running = null;
do {
$mrc = curl_multi_exec($mh, $running);
} while($mrc == CURLM_CALL_MULTI_PERFORM);
while ($running && $mrc == CURLM_OK) {
if (curl_multi_select($mh) == -1) {
usleep(1);
}
do {
$mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
Try:
$running = null;
do {
do {
$mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM && curl_multi_select($mh) === 0 );
} while($running > 0 && $mrc == CURLM_OK );
You could add sleep(1), which sleeps for one second, into the loop.
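For example, as a sketch: a sub-second usleep() is usually a gentler trade-off than a full sleep(1), since it keeps transfers responsive while still yielding the CPU. The 100 ms figure is just an illustration:

```php
$mh = curl_multi_init(); // transfer handles would be added here in real code

$running = null;
do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        usleep(100000); // sleep 100 ms so the loop doesn't peg the CPU
    }
} while ($running > 0);
```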
After looking at the php.net examples and contributor code, I found that there are different approaches; however, after testing, some of them either don't work or are deprecated.
Different articles around the internet suggest different approaches:
do {
curl_multi_exec($mh,$active);
}
while ($active > 0);
Other examples/programmers use a more "advanced" way:
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) == -1) {
usleep(1000);
}
else {
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
Can anyone tell me which is the up-to-date and best usage of curl_multi_exec?
(By the way, I've personally found that usleep doesn't make any difference to performance.)
The usleep() call doesn't "improve performance". It is there to avoid a busy loop in the case where the function doesn't wait on anything but returns instantly. That can happen in particular with (some older) libcurl versions during the name-resolver phase at the beginning of a transfer. (That precaution can probably be removed in the future, once cURL/PHP no longer behave like that.)
But you can safely skip the checks for CURLM_CALL_MULTI_PERFORM, since libcurl hasn't returned that value in many years. So, that would make it just:
$active = 1;
$mrc = CURLM_OK;
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) == -1) {
usleep(1000);
}
else {
$mrc = curl_multi_exec($mh, $active);
}
}
How can I send a message to more than 50 mobiles at a time? The code below executes, but it takes a long time.
<?php
$sqljobseekers=$con->query("SELECT * FROM users");
$y=mysqli_num_rows($sqljobseekers);
while($jobseekers=$sqljobseekers->fetch_assoc()) {
$seekermobile=$jobseekers['emp_mobile'];
$msg="Dear Candidate, .".$job_cmp_name.". is looking .".$jobrolename.". like u, for more details logon www.venkymama.com / www.lifemadeeasyglobal.com";
$msg=urlencode($msg);
$sms_file="http://tra.bulksmshyderabad.co.in/websms/sendsms.aspx?userid=$user&password=$password&sender=atmm&mobileno=".$seekermobile."&msg=$msg.";
$sms_h=fopen($sms_file,"r");
fclose($sms_h);
}
?>
Instead of looping through your users, making an API call, waiting for a response, and moving on to the next user, why not execute all the calls at the same time?
Take a look into curl_multi_exec. It will allow you to send multiple API calls at the same time. Something similar to this:
<?php
$sqljobseekers=$con->query("SELECT * FROM users");
$y=mysqli_num_rows($sqljobseekers);
$mh = curl_multi_init();
while($jobseekers=$sqljobseekers->fetch_assoc()) {
$seekermobile=$jobseekers['emp_mobile'];
$msg="Dear Candidate, .".$job_cmp_name.". is looking .".$jobrolename.". like u, for more details logon www.venkymama.com / www.lifemadeeasyglobal.com";
$msg=urlencode($msg);
$sms_file="http://tra.bulksmshyderabad.co.in/websms/sendsms.aspx?userid=$user&password=$password&sender=atmm&mobileno=".$seekermobile."&msg=$msg.";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $sms_file);
curl_multi_add_handle($mh, $ch);
}
$active = null;
//execute the handles
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) == -1) {
usleep(100); // back off briefly if select() fails, instead of busy-looping
}
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
?>
So I am having trouble getting this code to work. I pulled it from a blog, and it is based on the WordPress link checker. I have about 6,000 URLs in a database whose HTTP status I need to check, so this seemed like a great choice. I've modified the code slightly to fit my needs, and it works (kind of).
I have checked the url_list array throughout the code, and it contains all of the URLs. The problem is that it basically stops executing after about the 110th row; it is somewhat random, but generally around that number. I'm not really sure whether I need to set a timeout somewhere or whether I have a bug in the code. I noticed that if I set $max_connections greater than 8, it returns a 500 error. Any suggestions?
<?php
// CONFIG
$db_host = 'localhost';
$db_user = 'test';
$db_pass = 'yearight';
$db_name = 'URLS';
$excluded_domains = array();
$max_connections = 7;
$dbh = new PDO('mysql:host=localhost;dbname=URLS', $db_user, $db_pass);
$sth = $dbh->prepare("SELECT url FROM list");
$sth->execute();
$result = $sth->fetchAll(PDO::FETCH_COLUMN, 0);
// initialize some variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;
foreach($result as $d) {
// get all links via regex
if (preg_match_all('#((http?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', $d, $matches)) {
foreach ($matches[1] as $url) {
// store the url
$url_list []= $url;
}
}
}
// 1. multi handle
$mh = curl_multi_init();
// 2. add multiple URLs to the multi handle
for ($i = 0; $i < $max_connections; $i++) {
add_url_to_multi_handle($mh, $url_list);
}
// 3. initial execution
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
// 4. main loop
while ($active && $mrc == CURLM_OK) {
// 5. there is activity
if (curl_multi_select($mh) != -1) {
// 6. do work
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
// 7. is there info?
if ($mhinfo = curl_multi_info_read($mh)) {
// this means one of the requests were finished
// 8. get the info on the curl handle
$chinfo = curl_getinfo($mhinfo['handle']);
// 9. dead link?
if (!$chinfo['http_code']) {
$dead_urls []= $chinfo['url'];
// 10. 404?
} else if ($chinfo['http_code'] == 404) {
$not_found_urls []= $chinfo['url'];
// 11. working
} else {
$working_urls []= $chinfo['url'];
}
// 12. remove the handle
curl_multi_remove_handle($mh, $mhinfo['handle']);
curl_close($mhinfo['handle']);
// 13. add a new url and do work
if (add_url_to_multi_handle($mh, $url_list)) {
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
}
}
// 14. finished
curl_multi_close($mh);
echo "==Dead URLs==<br/>";
echo implode("<br/>",$dead_urls) . "<br/><br/>";
echo "==404 URLs==<br>";
echo implode("<br/>",$not_found_urls) . "<br/><br/>";
echo "==Working URLs==<br/>";
echo implode("<br/>",$working_urls);
echo "<pre>";
var_dump($url_list);
echo "</pre>";
// 15. adds a url to the multi handle
function add_url_to_multi_handle($mh, $url_list) {
static $index = 0;
// if we have another url to get
if ($url_list[$index]) {
// new curl handle
$ch = curl_init();
// set the url
curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
// to prevent the response from being outputted
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// follow redirections
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// do not need the body. this saves bandwidth and time
curl_setopt($ch, CURLOPT_NOBODY, 1);
// add it to the multi handle
curl_multi_add_handle($mh, $ch);
// increment so next url is used next time
$index++;
return true;
} else {
// we are done adding new URLs
return false;
}
}
?>
UPDATE:
I have written a bash script that does the same thing as this one. Going through the text file the output was written to, I noticed that when it fails, it is typically around links that return odd HTTP status codes like 000 and 522; some of them run for up to 5 minutes! So I am wondering whether the PHP version of cURL stops execution when it encounters these status codes. It is just a thought, but it might help solve the issue.
1 - This looks like an execution-time issue.
2 - Declaring the max execution time at the top of the code will definitely help:
bool set_time_limit ( int $seconds )
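Alongside the PHP-side limit, per-handle cURL timeouts should stop the odd 000/522 links mentioned in the update from running for five minutes each. A sketch of the relevant options (the option names are real cURL options; the 10-second values and the URL are illustrative):

```php
set_time_limit(0); // lift PHP's own execution-time limit for a long batch job

$ch = curl_init('http://example.com/'); // illustrative URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1); // status check only, skip the body
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // give up on unreachable hosts after 10 s
curl_setopt($ch, CURLOPT_TIMEOUT, 10);        // cap the whole transfer at 10 s
```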
I use multi cURL to retrieve some pages, from 1 to 200.
The problem is that the first links from the list always come back empty!
I don't understand why!! O_o
$mh = curl_multi_init();
for($j=0; $j<$i; $j++){
$ch[$j] = curl_init($Links[$j]);
curl_setopt($ch[$j], CURLOPT_CONNECTTIMEOUT, $curlConTimeOut);
curl_setopt($ch[$j], CURLOPT_TIMEOUT, $curlTimeOut);
curl_setopt($ch[$j], CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch[$j], CURLOPT_MAXREDIRS, 3);
curl_setopt($ch[$j], CURLOPT_FOLLOWLOCATION, 1);
curl_multi_add_handle($mh, $ch[$j]);
}
$active = null;
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) != -1) {
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
$Si = 0; $Fi = 0; $Disp = "";
for($j=0; $j<$i; $j++){
if($ch[$j]){
if(curl_multi_getcontent($ch[$j]) == null){
$Disp .= '0';
$Fi++;
}else{
$Disp .= '1';
$Si++;
}
curl_multi_remove_handle($mh, $ch[$j]);
curl_close($ch[$j]);
}
}
curl_multi_close($mh);
$Si / $Fi / $Disp are just for testing; an example result is:
Link Success: 65/161
Link Failed : 96/161
Disp: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111101111110011111111001111111111111111111111111111111111
Where 0 stands for failed and 1 for success. If the Nth element is 0, it means that the Nth link returned NULL.
It can't be that, every time, only the initial elements return null!! What are the odds?!
I checked curl_error, and they all report: "Connection timed out after XXXXX milliseconds"!
1°: 13852 milliseconds
2°: 13833 milliseconds
...
12676 ms
...
10195
...
and it continues down to 6007 ms, after which the correct responses start!
CURLOPT_CONNECTTIMEOUT is set to 6 seconds!
Why does it start from a higher number each time, go down to 6 seconds, and only then return correct results? O_o
I want to underline that the order of the null responses depends only on the list, not on the multi-cURL response time!
Another example with fewer links:
| Link Success: 30/52
| Link Failed : 22/52
| Disp: 0000000000000000000001111111111011111111111111111111
As you can see, when you execute/request less content/fewer pages, you hit the 1s faster (1 is success and 0 is error).
As I understand it from what you describe, your first requests hit a timeout. My guess is that you need to lower the number of requests executed at a time. Let's say you execute 5, get the values, and then the next 5.
5 is just a number I'm suggesting, so test which number is better for you. The number can be bigger if your processor can handle more at the same time, but you are also limited by how fast the other side of the internet responds.
Hope it helps
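The batching idea above can be sketched as follows; fetchBatch() is an illustrative helper name (not from the original code), and the 6-second connect timeout mirrors the question's $curlConTimeOut:

```php
// Fetch one batch of URLs concurrently and return their bodies
// (null entries mark failed transfers, matching the question's check).
function fetchBatch(array $urls, int $timeout = 6): array
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }
    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0); // wait for activity instead of spinning
        }
    } while ($active && $mrc == CURLM_OK);
    $results = array();
    foreach ($handles as $ch) {
        $results[] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

The full list would then be processed as foreach (array_chunk($Links, 5) as $batch) { $responses = fetchBatch($batch); ... }, tuning the chunk size of 5 up or down as suggested above.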
Why do I need to call the curl_multi_exec function twice in this piece of code?
In the first loop, I execute the curl_multi_exec handler to run the sub-handles. When $mrc differs from CURLM_CALL_MULTI_PERFORM, the loop ends.
The second loop is where we collect the results from the curl handles, and the first loop is executed again. Why?
<?php
do {
$mrc = curl_multi_exec($multiHandle, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($multiHandle, $timeout) != -1) {
do {
$mrc = curl_multi_exec($multiHandle, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
?>
The code was taken from the PHP documentation site.
The answer is in the curl_multi_exec() documentation.
It's frustrating, as PHP's documentation can be unhelpful in some respects...
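In short: the first loop pumps curl_multi_exec() until libcurl stops asking to be called again immediately (that is what CURLM_CALL_MULTI_PERFORM means), and the outer loop waits for socket activity before pumping again, which is why the inner loop reappears inside it. A commented sketch of the same pattern, made self-contained with an empty multi handle standing in for the real one:

```php
$multiHandle = curl_multi_init(); // real code would add sub-handles here
$active = null;

// First loop: drive the transfers until libcurl no longer demands an
// immediate re-call ("call me again right away").
do {
    $mrc = curl_multi_exec($multiHandle, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// Outer loop: while transfers remain, block in curl_multi_select() until a
// socket is ready, then repeat the first loop to process the new data.
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($multiHandle) != -1) {
        do {
            $mrc = curl_multi_exec($multiHandle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
```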