I'm working with multi cURL and was wondering how to handle the errors. I want to check which error occurred, and if it is something like a rate-limit error I want to crawl that link again after some delay (sleep()). My question: is there a built-in function that can do this for me, or do I need to collect all failed URLs in an array and just run those again?
This is what I've got now:
<?php
$urls = array("https://API-URL.com",
              "https://API-URL.com",
              "https://API-URL.com",
              "https://API-URL.com",
              ...);

// create the multiple cURL handle
$mh = curl_multi_init();

// number of elements in $urls
$nbr = count($urls);

// set URL and options
for ($x = 0; $x < $nbr; $x++) {
    // create a cURL resource per URL
    $ch[$x] = curl_init();
    // set URL and other appropriate options
    curl_setopt($ch[$x], CURLOPT_URL, $urls[$x]);
    curl_setopt($ch[$x], CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch[$x], CURLOPT_SSL_VERIFYPEER, false);
    // add the handle to the multi handle
    curl_multi_add_handle($mh, $ch[$x]);
}

// execute the handles
do {
    curl_multi_exec($mh, $running);
} while ($running);

for ($x = 0; $x < $nbr; $x++) {
    $result = curl_multi_getcontent($ch[$x]);
    $decoded = json_decode($result, true);
    // get info about the request
    $error = curl_getinfo($ch[$x], CURLINFO_HTTP_CODE);
    // error handling
    if ($error != 200) {
        $again[] = array("Url" => $urls[$x], "errornbr" => $error);
    } else {
        // here I do whatever I want with the data
    }
    curl_multi_remove_handle($mh, $ch[$x]);
    curl_close($ch[$x]);
}
curl_multi_close($mh);
?>
For multiple handles there is curl_multi_info_read():
https://www.php.net/manual/en/function.curl-multi-info-read.php
So the error check (assuming an HTTP connection) should look like this:
while ($a = curl_multi_info_read($mh)) {
    if ($b = $a['result']) {
        echo curl_strerror($b); // CURLE_* error
    } elseif (!($b = curl_getinfo($a['handle'], CURLINFO_RESPONSE_CODE))) {
        echo 'connection failed';
    } elseif ($b !== 200) {
        echo 'HTTP status is not 200 OK';
    }
}
Consider this pseudo-code for modern PHP versions (I didn't test this exact variant, but the scheme works). Note that calling curl_errno() on "easy" handles that were added to a "multi" handle will return 0, which is not an error.
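A fleshed-out sketch of that scheme (equally untested, assuming PHP 5.5+ for curl_strerror() and CURLINFO_RESPONSE_CODE; the URLs are placeholders), with curl_multi_select() used to wait for activity instead of busy-looping:
<?php
$urls = ["https://example.com/a", "https://example.com/b"]; // placeholder URLs

$mh = curl_multi_init();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
}

do {
    curl_multi_exec($mh, $running);
    // drain finished transfers as they complete
    while ($a = curl_multi_info_read($mh)) {
        $h = $a['handle'];
        if ($b = $a['result']) {
            echo curl_strerror($b), "\n"; // CURLE_* transport error
        } elseif (!($b = curl_getinfo($h, CURLINFO_RESPONSE_CODE))) {
            echo "connection failed\n";
        } elseif ($b !== 200) {
            echo "HTTP status is not 200 OK ($b)\n";
        } else {
            $body = curl_multi_getcontent($h); // success: use the body
        }
        curl_multi_remove_handle($mh, $h);
        curl_close($h);
    }
    if ($running) {
        curl_multi_select($mh, 1.0); // wait for activity instead of spinning
    }
} while ($running);

curl_multi_close($mh);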
In the second for-loop, where you cycle through the cURL handles to examine what each one returned, I hope this approach will answer your question:
foreach ($ch as $key => $h) {
    // This checks for any error that may occur; whatever that error is, you
    // can handle it in the if-branch and save those URLs to the $again array
    // to call them again at a later stage.
    if (curl_errno($h)) {
        // This is how you get complete information about what happened to the
        // cURL handle and why it failed; it is all stored under "error_info".
        $again[] = array("Url" => curl_getinfo($h, CURLINFO_EFFECTIVE_URL), "error_info" => curl_getinfo($h));
    } else {
        // Here you handle the success scenario for each cURL handle.
        $responses[$key] = ['data' => curl_multi_getcontent($h)];
    }
    // Remove the cURL handle here, as you are already doing in your loop.
}
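There is no built-in retry in the multi interface, so collecting the failed URLs and running them again yourself (as you suspected) is the usual approach. A minimal sketch, assuming the API signals "rate limit exceeded" with HTTP 429 (check your API's documentation) and that $again holds the HTTP status codes under "errornbr" as in your original code; since retries are usually few, plain sequential cURL is fine here:
foreach ($again as $failed) {
    if ($failed["errornbr"] != 429) {
        continue; // only retry rate-limited requests in this sketch
    }
    sleep(5); // give the API time to cool down
    $ch = curl_init($failed["Url"]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($code == 200) {
        $decoded = json_decode($result, true);
        // handle the data exactly as in the success branch above
    }
}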
I am trying to detect the status of an invoice from a JSON file; if the status is a confirmed payment, I update the status, write the JSON to a new location, and then unlink the existing JSON file.
<?php
// get posted variables or die
if (isset($_POST['num'])) {
    $invoice = strip_tags($_POST['num']);
    $filename = $invoice.'.json';
} else {
    die;
}
if (isset($_POST['status'])) {
    $status = strip_tags($_POST['status']);
} else {
    die;
}

// get existing invoice
$content = file_get_contents('data/'.$invoice.'.json');
$data = json_decode($content, true);

// read json into variables
$email = $data['email'];
$id = $data['id'];
$addr = $data['tac_address'];
$os = $data['os'];
$exp = $data['experience'];
$hosting = $data['type'];
if (isset($data['telegram']) && $data['telegram'] != '') { $telegram = $data['telegram']; } else { $telegram = ''; }
if (isset($data['linkedin']) && $data['linkedin'] != '') { $linkedin = $data['linkedin']; } else { $linkedin = ''; }
if (isset($data['pay_status']) && $data['pay_status'] != '' && $data['pay_status'] == $status) { $status = $data['pay_status']; }
$payment_addr = $data['bitcoin'];
$payment_value = $data['value'];
$payment = substr($payment_value, 0, -4);

// turn variables into json array
$arr = array(
    'id' => $invoice,
    'email' => $email,
    'tac_address' => $addr,
    'os' => $os,
    'experience' => $exp,
    'type' => $hosting,
    'telegram' => $telegram,
    'linkedin' => $linkedin,
    'bitcoin' => $payment_addr,
    'value' => $payment_value,
    'pay_status' => $status
);
$json = json_encode($arr);

// check status: if paid, save output to new location and delete old file
if ($status == 'Confirmed Payment') {
    file_put_contents('paid_data/'.$filename, $json);
    unlink('data/'.$filename);
}
The problem I am facing is that file_put_contents('paid_data/'.$filename, $json); ends up producing a file full of NULL values. If I remove the unlink, the values save just fine; when I add it back, they are all NULL.
So how can I verify that file_put_contents takes place before the unlinking happens?
Also... WHY does this happen? Isn't PHP supposed to be linear, and shouldn't file_put_contents finish before the next line is carried out? Everything I have read about file_put_contents suggests as much. So why does the unlink take place before the content is written to the new location?
I still hope for a better answer, but so far this is my working solution to the problem: I changed the final if statement to the following. This seems to solve the issue, but there really has to be a better way than this; it feels very "hacky".
if ($status == 'Confirmed Payment') {
    file_put_contents('paid_data/'.$filename, $json);
    $i = 0;
    while ($i < 1000) {
        $i++;
        if (file_exists('paid_data/'.$filename)) {
            unlink('data/'.$filename);
            break;
        }
    }
}
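A less hacky variant might be to let the filesystem do the move: write the updated JSON over the old file, then rename() it into place, which is a single operation instead of a write followed by a delete. This is only a sketch and assumes data/ and paid_data/ are on the same filesystem:
if ($status == 'Confirmed Payment') {
    // overwrite the old invoice with the updated JSON, then move it
    file_put_contents('data/'.$filename, $json);
    rename('data/'.$filename, 'paid_data/'.$filename);
}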
After mimicking your file structure and seeding a few examples, I was able to execute your code as-is with the expected results. However, file_put_contents will return false on failure, so you might try something like this:
if ($status == 'Confirmed Payment') {
    if (!file_put_contents('paid_data/'.$filename, $json)) {
        print_r(error_get_last());
        die;
    }
    unlink('data/'.$filename);
}
Your code as originally written should be fine, as far as I can see. Usually when I see the kind of behavior you're describing, the problem is that the script itself is being called twice (or more) and overlapping calls are manipulating the same file.
I would definitely put in some debugging statements to verify this; I don't know your environment, but a simple line written to a log will probably be enlightening.
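For example, a minimal sketch of such a log line, dropped in at the very top of the script (the log path and format are just assumptions):
// append one line per request; two entries for the same invoice within a
// second or two of each other would confirm overlapping calls
file_put_contents(
    __DIR__.'/debug.log',
    date('c').' pid='.getmypid().' invoice='.($_POST['num'] ?? '?').' status='.($_POST['status'] ?? '?')."\n",
    FILE_APPEND
);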
I am trying to check the header response of multiple URLs at the same time without a complicated PHP block.
<?php
$url = "http://www.simplysup.co.uk/download/dl/trjsetup692.exe";
$headers = get_headers($url);
$response = substr($headers[0], 9, 3);
if ($response != "404") {
    echo "PASS";
} else {
    echo "FAIL";
}
?>
The above code checks a single URL at a time. How can I perform the same check for multiple URLs at once? I will also need to trigger an email with the URL when the header response is 404. Any help would be much appreciated.
I think this could solve your problem.
$fail = false;
$urls = array("http://www.simplysup.co.uk/download/dl/trjsetup692.exe");

foreach ($urls as $url) {
    $headers = get_headers($url);
    $response = substr($headers[0], 9, 3);
    if ($response === "404") {
        $fail = true;
    }
}

if ($fail) {
    echo "FAIL";
} else {
    echo "PASS";
}
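For the second part of your question, triggering an email when a URL returns 404, the same loop can collect the failing URLs and send them in one message. A sketch assuming a working mail setup on the server; the recipient address is a placeholder:
$failedUrls = array();
$urls = array("http://www.simplysup.co.uk/download/dl/trjsetup692.exe");

foreach ($urls as $url) {
    $headers = get_headers($url);
    $response = substr($headers[0], 9, 3);
    if ($response === "404") {
        $failedUrls[] = $url; // remember which URL failed
    }
}

if ($failedUrls) {
    // mail() relies on a configured MTA/sendmail; adjust to your environment
    mail('admin@example.com', 'URL check: 404 responses', implode("\n", $failedUrls));
    echo "FAIL";
} else {
    echo "PASS";
}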
I need to check if a file exists on multiple domains/servers and then either show the download link to the user or print an error message. I have this script working for one domain:
<?php
$domain0 = 'www.example.com';
$file = $_GET['file'];
$resourceUrl = "http://$domain0/$file";
$resourceExists = false;

$ch = curl_init($resourceUrl);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// 200 = OK
if ($statusCode == '200') {
    $resourceExists = true;
}

if ($resourceExists == true) {
    echo "Exist! $file";
} else {
    echo "$file doesn't exist!";
}
?>
Now I need to check if that file exists on four domains. How can I do this? I don't know how to use arrays, so if someone could explain how to do this, I'd be very grateful.
I would create an array for domains
I would loop through the array with "foreach"
I would call a function to get the result
function checkFileOnDomain($file, $domain) {
    $resourceUrl = "http://$domain/$file";
    $ch = curl_init($resourceUrl);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_exec($ch);
    $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($statusCode == '200') {
        return true;
    }
    return false; // be explicit instead of implicitly returning null
}
$file = $_GET["file"];
// $_GET should be sanitized!
$domain_list = array("www.test1.com", "www.test2.com");

foreach ($domain_list as $domain) {
    echo "Check DOMAIN: $domain <hr/>";
    if (checkFileOnDomain($file, $domain)) {
        echo ">> [ $file ] EXISTS";
    } else {
        echo ">> [ $file ] DOES NOT EXIST";
    }
    echo "<br/><br/>";
}
unset($domain);
EDIT:
To apply your specifications, you need an extra variable before the foreach loop.
$link_to_file = "";

foreach ($domain_list as $domain) {
    if (checkFileOnDomain($file, $domain)) {
        $link_to_file = "$domain/$file";
        break; // take the first hit and stop
    }
}
unset($domain);

if (!empty($link_to_file)) {
    echo $link_to_file; // file is here
} else {
    echo "404";
}
An array should solve the problem for you. This creates an array of the domains you want to check, and then cycles through them one by one running the code you wrote.
If you're struggling with arrays, have a look here for some more information
<?php
// Create an array of domains
$domains = ['www.example.com', 'www.example2.com', ...];

$file = $_GET['file'];

// Cycle through all the domains and run the code
foreach ($domains as $domain) {
    $resourceUrl = "http://$domain/$file";
    $resourceExists = false;

    $ch = curl_init($resourceUrl);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_exec($ch);
    $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // 200 = OK
    if ($statusCode == '200') {
        $resourceExists = true;
    }

    if ($resourceExists == true) {
        echo "Exist! $file";
    } else {
        echo "$file doesn't exist!";
    }
}
?>
So I am having trouble getting this code to work. I pulled it from a blog, and it is based on the WordPress link checker. I have about 6000 URLs in a database whose HTTP status I need to check, so this seemed like a great choice. I've modified the code slightly to fit my needs and it works (kind of).
I have checked the url_list array throughout the code and it contains all of the URLs. The problem is that it basically stops executing after about the 110th row; it is somewhat random but generally around that number. I'm not really sure if I need to set a timeout somewhere or if I have a bug in the code. I noticed that if I set $max_connections greater than 8 it returns a 500 error. Any suggestions?
<?php
// CONFIG
$db_host = 'localhost';
$db_user = 'test';
$db_pass = 'yearight';
$db_name = 'URLS';
$excluded_domains = array();
$max_connections = 7;

$dbh = new PDO("mysql:host=$db_host;dbname=$db_name", $db_user, $db_pass);
$sth = $dbh->prepare("SELECT url FROM list");
$sth->execute();
$result = $sth->fetchAll(PDO::FETCH_COLUMN, 0);

// initialize some variables
$url_list = array();
$working_urls = array();
$dead_urls = array();
$not_found_urls = array();
$active = null;

foreach ($result as $d) {
    // get all links via regex
    if (preg_match_all('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', $d, $matches)) {
        foreach ($matches[1] as $url) {
            // store the url
            $url_list[] = $url;
        }
    }
}

// 1. multi handle
$mh = curl_multi_init();

// 2. add multiple URLs to the multi handle
for ($i = 0; $i < $max_connections; $i++) {
    add_url_to_multi_handle($mh, $url_list);
}

// 3. initial execution
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// 4. main loop
while ($active && $mrc == CURLM_OK) {
    // 5. there is activity
    if (curl_multi_select($mh) != -1) {
        // 6. do work
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        // 7. is there info?
        if ($mhinfo = curl_multi_info_read($mh)) {
            // this means one of the requests has finished
            // 8. get the info on the curl handle
            $chinfo = curl_getinfo($mhinfo['handle']);

            // 9. dead link?
            if (!$chinfo['http_code']) {
                $dead_urls[] = $chinfo['url'];
            // 10. 404?
            } else if ($chinfo['http_code'] == 404) {
                $not_found_urls[] = $chinfo['url'];
            // 11. working
            } else {
                $working_urls[] = $chinfo['url'];
            }

            // 12. remove the handle
            curl_multi_remove_handle($mh, $mhinfo['handle']);
            curl_close($mhinfo['handle']);

            // 13. add a new url and do work
            if (add_url_to_multi_handle($mh, $url_list)) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
    }
}

// 14. finished
curl_multi_close($mh);

echo "==Dead URLs==<br/>";
echo implode("<br/>", $dead_urls) . "<br/><br/>";
echo "==404 URLs==<br>";
echo implode("<br/>", $not_found_urls) . "<br/><br/>";
echo "==Working URLs==<br/>";
echo implode("<br/>", $working_urls);
echo "<pre>";
var_dump($url_list);
echo "</pre>";

// 15. adds a url to the multi handle
function add_url_to_multi_handle($mh, $url_list) {
    static $index = 0;

    // if we have another url to get (isset() avoids a notice when the list
    // runs out, and keeps falsy entries from ending the run early)
    if (isset($url_list[$index])) {
        // new curl handle
        $ch = curl_init();
        // set the url
        curl_setopt($ch, CURLOPT_URL, $url_list[$index]);
        // to prevent the response from being outputted
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // follow redirections
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // do not need the body; this saves bandwidth and time
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // add it to the multi handle
        curl_multi_add_handle($mh, $ch);
        // increment so the next url is used next time
        $index++;
        return true;
    } else {
        // we are done adding new URLs
        return false;
    }
}
?>
UPDATE:
I have written a script in bash that does the same thing as this. Going through the text file the info was output to, I noticed that when it fails it is typically around links that return odd HTTP status codes like 000 and 522, and some of them tend to execute for up to 5 minutes! So I am wondering if the PHP version of cURL stops execution when it encounters these status codes. It is just a thought, and might add more value toward solving the issue.
1. This is an execution-time issue.
2. Declaring a higher max execution time at the top of the code will help with this for sure:
bool set_time_limit ( int $seconds )
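A sketch of both, plus (going beyond this answer) per-handle cURL timeouts, which would also keep the links you observed hanging for ~5 minutes with odd codes like 000/522 from stalling the run:
// at the very top of the script: lift PHP's execution-time cap
// (0 = no limit); 6000 URLs will easily blow past the default 30 s
set_time_limit(0);

// inside add_url_to_multi_handle(), next to the other curl_setopt() calls:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // max seconds to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // max seconds for the whole transfer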
I have this code to retrieve some counter values from a copy machine.
foreach ($sett as $key => $value) {
    if (intval(str_replace("INTEGER: ", "", snmpget($ip, "public", $base.$value["MIB"])))) {
        $c = intval(str_replace("INTEGER: ", "", snmpget($ip, "public", $base.$value["MIB"])));
        $error = false;
    } else {
        $c = 0;
        $error = true;
    }
    $counters = array_push_assoc($counters, ucwords($key), array("total" => $c, "code" => $value["code"]));
}
Everything works like a charm, but the problem is that when a machine is down and the code cannot perform an SNMP GET, the whole script fails.
First I want to check whether the connection to the device is alive, and then retrieve the counters with snmpget().
Is there any solution you guys can offer me?
Thanks!
The snmpget() function returns FALSE if it fails to retrieve the object.
See docs: http://www.php.net/manual/en/function.snmpget.php
You should do a check for this within your code, for example:
try {
    foreach ($sett as $key => $value) {
        $snmpReturn = snmpget($ip, "public", $base.$value["MIB"]);
        if ($snmpReturn === false) {
            // Do something to handle the failed SNMP request.
            throw new Exception("Failed to execute the SNMP request to the machine.");
        } else {
            if (intval(str_replace("INTEGER: ", "", $snmpReturn))) {
                // Reuse the value we already fetched instead of a second snmpget() call.
                $c = intval(str_replace("INTEGER: ", "", $snmpReturn));
                $error = false;
            } else {
                $c = 0;
                $error = true;
            }
            $counters = array_push_assoc($counters, ucwords($key), array("total" => $c, "code" => $value["code"]));
        }
    }
} catch (Exception $e) {
    // Handle the exception; maybe kill the script because it failed?
}
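On the "check the connection first" part of the question: snmpget() also accepts timeout and retry parameters (microseconds and a count, respectively), so a sketch like the following could probe the device once before the loop and fail fast when the machine is down. The OID used here, sysDescr.0, is a common choice but an assumption; any object the copier is known to expose will do:
$timeout = 500000; // half a second, in microseconds
$retries = 1;

// one cheap probe before the counter loop acts as an "is it alive?" check
$alive = @snmpget($ip, "public", "1.3.6.1.2.1.1.1.0", $timeout, $retries);
if ($alive === false) {
    die("Device $ip is not responding to SNMP.");
}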