php curl multi error handler


i want to capture curl errors and warnings in my error handler so that they do not get echoed to the user. to prove that all errors have been caught i prepend the $err_start string to the error. currently here is a working (but simplified) snippet of my code (run it in a browser, not cli):
<?php
set_error_handler('handle_errors');
test_curl();
function handle_errors($error_num, $error_str, $error_file, $error_line)
{
$err_start = 'caught error'; //to prove that the error has been properly caught
die("$err_start $error_num, $error_str, $error_file, $error_line<br>");
}
function test_curl()
{
$curl_multi_handle = curl_multi_init();
$curl_handle1 = curl_init('iamdooooooooooown.com');
curl_setopt($curl_handle1, CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($curl_multi_handle, $curl_handle1);
$still_running = 1;
while($still_running > 0) $multi_errors = curl_multi_exec($curl_multi_handle, $still_running);
if($multi_errors != CURLM_OK) trigger_error("curl error [$multi_errors]: ".curl_error($curl_multi_handle), E_USER_ERROR);
if(strlen(curl_error($curl_handle1))) trigger_error("curl error: [".curl_error($curl_handle1)."]", E_USER_ERROR);
$curl_info = curl_getinfo($curl_handle1); //info for individual requests
$content = curl_multi_getcontent($curl_handle1);
curl_multi_remove_handle($curl_multi_handle, $curl_handle1);
curl_close($curl_handle1);
curl_multi_close($curl_multi_handle);
}
?>
note that my full code has multiple requests in parallel, however the issue still manifests with a single request as shown here. note also that the error handler shown in this code snippet is very basic - my actual error handler will not die on warnings or notices, so no need to school me on this.
now if i try and curl a host which is currently down then i successfully capture the curl error and my script dies with:
caught error 256, curl error: [Couldn't resolve host 'iamdooooooooooown.com'], /var/www/proj/test_curl.php, 18
however the following warning is not caught by my error handler function, and is being echoed to the page:
Warning: (null)(): 3 is not a valid cURL handle resource in Unknown on line 0
i would like to capture this warning in my error handler so that i can log it for later inspection.
one thing i have noticed is that the warning only manifests when the curl code is inside a function - it does not happen when the code is at the highest scope level. is it possible that one of the curl globals (eg CURLM_OK) is not accessible within the scope of the test_curl() function?
i am using PHP Version 5.3.2-1ubuntu4.19
edits
updated the code snippet to fully demonstrate the error
the uncaptured warning only manifests when inside a function or class method

I don't think I agree with the way you are capturing the error ... you can try
$nodes = array(
"http://google.com",
"http://iamdooooooooooown.com",
"https://gokillyourself.com"
);
echo "<pre>";
print_r(multiplePost($nodes));
Output
Array
(
[google.com] => #HTTP-OK 48.52 kb returned
[iamdooooooooooown.com] => #HTTP-ERROR 0 for : http://iamdooooooooooown.com
[gokillyourself.com] => #HTTP-ERROR 0 for : https://gokillyourself.com
)
Function Used
function multiplePost($nodes) {
$mh = curl_multi_init();
$curl_array = array();
foreach ( $nodes as $i => $url ) {
$url = trim($url);
$curl_array[$i] = curl_init($url);
curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_array[$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)');
curl_setopt($curl_array[$i], CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($curl_array[$i], CURLOPT_TIMEOUT, 15);
curl_setopt($curl_array[$i], CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_array[$i], CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl_array[$i], CURLOPT_SSL_VERIFYPEER, 0);
curl_multi_add_handle($mh, $curl_array[$i]);
}
$running = NULL;
do {
usleep(10000);
curl_multi_exec($mh, $running);
} while ( $running > 0 );
$res = array();
foreach ( $nodes as $i => $url ) {
$domain = parse_url($url, PHP_URL_HOST);
$curlErrorCode = curl_errno($curl_array[$i]);
if ($curlErrorCode === 0) {
$info = curl_getinfo($curl_array[$i]);
$info['url'] = trim($info['url']);
if ($info['http_code'] == 200) {
$content = curl_multi_getcontent($curl_array[$i]);
$res[$domain] = sprintf("#HTTP-OK %0.2f kb returned", strlen($content) / 1024);
} else {
$res[$domain] = "#HTTP-ERROR {$info['http_code'] } for : {$info['url']}";
}
} else {
$res[$domain] = sprintf("#CURL-ERROR %d: %s ", $curlErrorCode, curl_error($curl_array[$i]));
}
curl_multi_remove_handle($mh, $curl_array[$i]);
curl_close($curl_array[$i]);
flush();
ob_flush();
}
curl_multi_close($mh);
return $res;
}
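The polling loop above wakes up every 10 ms whether or not anything has happened. As a minimal sketch of an alternative, the same loop can wait on socket activity with curl_multi_select(). Note that run_multi_handle() is a hypothetical helper name for this illustration, not part of the code above.

```php
<?php
/**
 * Drive all transfers on a multi handle to completion without fixed-interval polling.
 * run_multi_handle() is a hypothetical helper name used only in this sketch.
 * Returns the last status code reported by curl_multi_exec().
 */
function run_multi_handle($mh, float $selectTimeout = 1.0): int
{
    do {
        $status = curl_multi_exec($mh, $running);
        if ($status === CURLM_OK && $running > 0) {
            // Block until a socket has activity or the timeout expires.
            // curl_multi_select() returns -1 when there is nothing to wait on yet,
            // so pause briefly in that case to avoid spinning the CPU.
            if (curl_multi_select($mh, $selectTimeout) === -1) {
                usleep(1000);
            }
        }
        // Older libcurl versions may return CURLM_CALL_MULTI_PERFORM, meaning "call again".
    } while ($running > 0 && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    return $status;
}
```

It would be called as run_multi_handle($mh) in place of the do/while loop, after all handles have been added.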

it is possible that this is a bug with php-curl. when the following line is removed, then everything behaves ok:
if(strlen(curl_error($curl_handle1))) trigger_error("curl error: [".curl_error($curl_handle1)."]", E_USER_ERROR);
as far as i can tell, curling a host that is down is corrupting $curl_handle1 in some way that the curl_error() function is not prepared for. to get around this problem (until a bug fix is made) just test if the http_code returned by curl_getinfo() is 0. if it is 0 then do not use the curl_error function:
if($multi_errors != CURLM_OK) trigger_error("curl error [$multi_errors]: ".curl_error($curl_multi_handle), E_USER_ERROR);
$curl_info = curl_getinfo($curl_handle1); //info for individual requests
$is_up = ($curl_info['http_code'] == 0) ? 0 : 1;
if($is_up && strlen(curl_error($curl_handle1))) trigger_error("curl error: [".curl_error($curl_handle1)."]", E_USER_ERROR);
it's not a very elegant solution, but it may have to do for now.
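another option, as a rough sketch rather than a confirmed fix for the warning: with the multi interface, curl_multi_info_read() reports a result code for each finished request, so the outcome can be checked without guessing from http_code. this assumes PHP 5.5 or later for curl_strerror() (the question uses 5.3.2, where that call is not available), and collect_multi_results() is a made-up helper name.

```php
<?php
/**
 * Collect the result code of every finished transfer on a multi handle.
 * collect_multi_results() is a hypothetical helper name used only in this sketch.
 * Call it after the curl_multi_exec() loop has finished.
 */
function collect_multi_results($mh): array
{
    $results = [];
    // curl_multi_info_read() returns false once there are no more messages.
    while (($msg = curl_multi_info_read($mh)) !== false) {
        if ($msg['msg'] !== CURLMSG_DONE) {
            continue;
        }
        $results[] = [
            'handle'  => $msg['handle'],                      // the easy handle that finished
            'code'    => $msg['result'],                      // a CURLE_* code; CURLE_OK (0) means success
            'message' => (string) curl_strerror($msg['result']), // assumes PHP >= 5.5
        ];
    }
    return $results;
}
```

each entry whose 'code' is not CURLE_OK can then be passed to trigger_error() or to a logger from inside test_curl().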

Related

I'm trying to log into a website with a curl php script but can't because of viewstate generator and eventvalidation. Is there any way to bypass that?

I'm trying to log into a website Using cUrl and scrape certain data from the site. It's a homework project. But the site has 3 different form data that changes every time I log in.
Is it possible to bypass that and log in or is it just not possible? If so, can someone please get me started in the right direction?
The cURL code I've tried is:
<?php
include("simple_html_dom.php");
$cofile = dirname(__FILE__).'/cookie.txt';
$postfield= array(
"SM"=>"UpPnlLogin|btnLogin",
"__LASTFOCUS"=>"",
"__EVENTTARGET"=>"btnLogin",
"__EVENTARGUMENT"=>"",
"__VIEWSTATE"=>"hly8ipIDyvfEpBj01vjkB/HmrA
yIw+UuyvBkGc5NHMexWF+PvAVQZYkSrcwJM4rO9aaz
93ogQuFxowVMDPueJz5DU3obstDtyl7KuLvZXQ+GJ1
JKRGEtTTRl5vM2RIi7mwL+j3LRqHgl+ZW1wftsnt2q
nUy7rrxSC6j0eoqabUM/hpS1hveORvLcEbo+5o1J+r
W0+UYYnZ/cFQcUNhx5538uRaD8PIxq6GxTrT/qI2ef
DDLJB5qmmANILYPxsVg++dXFmQFD59MvETq+R3Om0g
==",
"__VIEWSTATEGENERATOR"=>"CADA6983",
"__EVENTVALIDATION"=>"y2iWoj4pBfE6Ij55U/Hf
Sq/mWPNVk4Hv4Nvg7IDxuN6KElLeNsq4iUIbHMfGQS
8s6oProuk3wXUrqQWG6VleouPj+M3LLkKYR8XhLzmw
e4Cck3tqa/YpGmNLZiNOLkbN4/RhPFq+onAiQ2GDc4
gHlU5aU94WwONQ9ItyzsH4V111bPhKX3gjr9YXhpPg
9UiyWwkNXohLJSWRM9jGfHrgMg==",
"txtCustNo"=>"username",
"txtPassword"=>"password",
"__ASYNCPOST"=>"true",
"btnLogin"=>"Нэвтрэх"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cofile);
curl_setopt($ch, CURLOPT_URL,"https://e.khanbank.com/");//url that is requested when logging in
curl_setopt($ch, CURLOPT_REFERER,"https://e.khanbank.com/");//CURLOPT_REFERER
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postfield));
ob_start(); // prevent any output
curl_exec ($ch); // execute the curl command
ob_end_clean(); // stop preventing output
curl_close ($ch);
unset($ch);
$ch = curl_init();
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cofile);
curl_setopt($ch, CURLOPT_URL,"https://e.khanbank.com/pageMain?content=ucMain_Welcome");
$result = curl_exec ($ch);
curl_close ($ch);
echo $result;
?>

you can't hardcode the values, they change for every login, and they're tied to your cookie session, meaning the EVENTVALIDATION that you get from your browser is tied to your browser's cookie session, and is not valid for curl.
i'll write an example with the hhb_curl library,
first add this function somewhere, you'll need it (it makes DOMDocument load HTML with utf-8 characterset, which is not the default for DOMDocument, but utf-8 is used by khanbank),
function my_dom_loader(string $html): \DOMDocument
{
$html = trim($html);
if (empty($html)) {
//....
}
if (false === stripos($html, '<?xml encoding=')) {
$html = '<?xml encoding="UTF-8">' . $html;
}
$ret = new DOMDocument('', 'UTF-8');
$ret->preserveWhiteSpace = false;
$ret->formatOutput = true;
if (!(@$ret->loadHTML($html, LIBXML_NOBLANKS | LIBXML_NONET | LIBXML_BIGLINES))) {
throw new \Exception("failed to create DOMDocument from input html!");
}
$ret->preserveWhiteSpace = false;
$ret->formatOutput = true;
return $ret;
}
first create the hhb_curl handle,
<?php
declare (strict_types = 1);
require_once('hhb_.inc.php');
$hc = new hhb_curl('', true);
now, khanbank.com uses a browser whitelist: if you're not using a whitelisted browser, you cannot log in. an example of a whitelisted browser is Google Chrome 75 x64, so impersonate that browser by setting
$hc->setopt(CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36');
next fetch the login page to get the cookie and the EVENTVALIDATION stuff,
$html = $hc->exec('https://e.khanbank.com/')->getStdOut();
now we got the EVENTVALIDATION stuff in html, and we need to parse it out from the html,
$domd = my_dom_loader($html);
$xp = new DOMXPath($domd);
$form = $domd->getElementById("Form1");
$post_data = array();
foreach ($form->getElementsByTagName("input") as $input) {
$post_data[$input->getAttribute("name")] = $input->getAttribute("value");
}
assert(isset($post_data['txtCustNo']), "ERROR: COULD NOT FIND USERNAME INPUT!");
assert(isset($post_data['txtPassword']), "ERROR: COULD NOT FIND PASSWORD INPUT!");
now $post_data contains:
array (
'__VIEWSTATE' => '9GT5O4HrKQJrWbF7PRSXu9RiMlpkqY5hO+sN9H0OXxmwYjWMfr2uf4yIgpHtk9sp56RWot30dvKeuGF3+eoOhpNu5nsuGBjtrpb8g8AGMaDbQ0nxpEKS3HILkqccMwFfn7y0LThLfjm0Ow84RGosJa+/5iM9YfP/HFM5HnyHKGJkM84nGEh7QZfoGYwMOU9SSb5dKmxfnmrIo/xXUUh4DT8+LOFGCQ2H5+nPFudTonwfgX6AKBNhkRijlfrUY+ns7HMq699AU38bsaxgD67KEw==',
'__VIEWSTATEGENERATOR' => 'CADA6983',
'__EVENTVALIDATION' => '4FZipDfTouUXBNMfIqlf/SXhPNyW5SBkcH/JIZB/j8kdaJUlMAQzvodpEq2n6WBRvxs6IBGVASOFouDQbqjygKK8+01KbRa9CpEGRiYGdxSIlt0wbZ2wJZeN6kB2ncn2DSd3C3nymCcz1kGHIdR3Dy5l2OlS6JngVCVoXuhpDzsjDQbrRwHST85XOlXdF6jl8/aQPYkSlZkSRQ5BFzdbnw==',
'txtCustNo' => '',
'txtPassword' => '',
'chkRemUser' => '',
)
these are tied to this specific cookie session, so you must parse them out of the html every time, you cannot hardcode it, but there are still some variables missing (because they are set with javascript, not with HTML), so add those:
$post_data['SM'] = 'UpPnlLogin|btnLogin';
$post_data['__LASTFOCUS'] = '';
$post_data['__EVENTARGUMENT'] = '';
$post_data['__EVENTTARGET'] = 'btnLogin';
$post_data['__ASYNCPOST'] = 'true';
now setting the username and password:
$post_data['txtCustNo'] = "username";
$post_data['txtPassword'] = "password";
and finally to send the actual login request:
$html = $hc->setopt_array(array(
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => http_build_query($post_data),
CURLOPT_URL => 'https://e.khanbank.com/'
))->exec()->getStdOut();
and finally-finally: check for login errors:
$domd = my_dom_loader($html);
$xp = new DOMXPath($domd);
$login_errors = array();
//uk-alert uk-alert-warning
foreach ($xp->query("//*[contains(@class,'alert')]") as $login_error) {
$login_error = trim($login_error->textContent);
if (!empty($login_error)) {
$login_errors[] = $login_error;
}
}
if (!empty($login_errors)) {
var_dump($login_errors);
throw new \RuntimeException("login errors: " . json_encode($login_errors, JSON_PRETTY_PRINT));
}
echo "logged in successfully! :)";
which yields:
$ php wtf4.php
array(1) {
[0]=>
string(69) "Нэвтрэх нэр эсвэл нууц үг буруу байна!"
}
PHP Fatal error: Uncaught RuntimeException: login errors: [
"\u041d\u044d\u0432\u0442\u0440\u044d\u0445 \u043d\u044d\u0440 \u044d\u0441\u0432\u044d\u043b \u043d\u0443\u0443\u0446 \u04af\u0433 \u0431\u0443\u0440\u0443\u0443 \u0431\u0430\u0439\u043d\u0430!"
] in /cygdrive/c/projects/misc/wtf4.php:63
Stack trace:
#0 {main}
thrown in /cygdrive/c/projects/misc/wtf4.php on line 63
because "username" and "password" are not valid login credentials. also, the weird \u0431\u0430\u0439\u043d\u0430 stuff in the exception message comes from json_encode(), which escapes non-ASCII characters by default (pass JSON_UNESCAPED_UNICODE to keep them readable). the error message itself is written in Cyrillic; it is Mongolian and says roughly "the login name or password is incorrect".
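for anyone who does not want to depend on hhb_curl, the "parse the hidden inputs out of the login page" step can be sketched with nothing but DOMDocument. this is only an illustration: extract_form_fields() is a hypothetical helper name, and the form id would be "Form1" only if the page still matches what is shown above.

```php
<?php
/**
 * Collect name => value pairs from every <input> inside the form with the given id.
 * extract_form_fields() is a hypothetical helper name used only in this sketch.
 */
function extract_form_fields(string $html, string $formId): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint makes DOMDocument treat the markup as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($dom->getElementsByTagName('form') as $form) {
        if ($form->getAttribute('id') !== $formId) {
            continue;
        }
        foreach ($form->getElementsByTagName('input') as $input) {
            $name = $input->getAttribute('name');
            if ($name !== '') {
                $fields[$name] = $input->getAttribute('value');
            }
        }
    }
    return $fields;
}
```

the returned array plays the same role as $post_data: add the javascript-set fields and the credentials to it, then send it with http_build_query() using the same cookie jar that fetched the page.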

Curl PHP cannot display amazon

I am using the following code and am not able to display amazon.com using php and curl. Im using curl_error and getting no errors, so I'm not sure what im doing wrong
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://www.amazon.com');
curl_exec($curl);
curl_close ($curl);
I'm doing this on local host

if you just want to display amazon, then use this
echo file_get_contents("https://www.amazon.com");

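You should use the following (with CURLOPT_RETURNTRANSFER and CURLOPT_HEADER both set to true, so that curl_exec() returns the headers and the body as one string instead of printing them):
$response = curl_exec($curl);
$response can then be split into a $result array. You can get, for example, the body of the response by using: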
$header_size = curl_getinfo($curl,CURLINFO_HEADER_SIZE);
$result['header'] = substr($response, 0, $header_size);
$result['body'] = substr( $response, $header_size );
$result['http_code'] = curl_getinfo($curl,CURLINFO_HTTP_CODE);
$result['last_url'] = curl_getinfo($curl,CURLINFO_EFFECTIVE_URL);
echo $result['body'];
For more information: http://php.net/manual/de/function.curl-exec.php
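Put together, the steps above might look like the following minimal sketch. The helper names split_http_response() and fetch_with_headers() are hypothetical, and setting CURLOPT_ENCODING to an empty string (to accept compressed responses) is an added assumption rather than part of the snippet above.

```php
<?php
/**
 * Split a raw HTTP response (headers and body in one string) at the header boundary.
 * split_http_response() is a hypothetical helper name used only in this sketch.
 */
function split_http_response(string $raw, int $headerSize): array
{
    return [
        'header' => (string) substr($raw, 0, $headerSize),
        'body'   => (string) substr($raw, $headerSize),
    ];
}

/**
 * Fetch a URL and return its header, body and HTTP status code.
 * Throws if the transfer itself fails.
 */
function fetch_with_headers(string $url): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($curl, CURLOPT_HEADER, true);         // include the response headers in that string
    curl_setopt($curl, CURLOPT_ENCODING, '');         // accept and decode compressed responses

    $response = curl_exec($curl);
    if ($response === false) {
        throw new RuntimeException('curl_exec failed! ' . curl_errno($curl) . ': ' . curl_error($curl));
    }

    $result = split_http_response($response, curl_getinfo($curl, CURLINFO_HEADER_SIZE));
    $result['http_code'] = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return $result; // the handle is released when $curl goes out of scope
}
```

Calling fetch_with_headers('https://www.amazon.com') and echoing the 'body' entry would then display the page, provided the request succeeds.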

when debugging curl code, use CURLOPT_VERBOSE, and post the CURLOPT_VERBOSE log when asking for help. also when debugging, do not ignore the return values of curl_setopt, because it returns bool(false) if there was an error, and if there was an error, that error would probably explain why the code isn't working. also do not ignore the return value of curl_exec, because it returns bool(false) if there was an error, which goes unnoticed if you ignore the return value (and your code does)
here is a version of your code that doesn't ignore any errors and enables CURLOPT_VERBOSE logging, it should reveal where your code fails:
<?php
$curl = curl_init();
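if ($curl === false) { // curl_init() returns false on failure; since PHP 8 a successful handle is an object, so is_resource() would always fail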
throw new \RuntimeException('curl_init() failed!');
}
ecurl_setopt($curl, CURLOPT_URL, 'https://www.amazon.com');
ecurl_setopt($curl, CURLOPT_VERBOSE, 1);
$curlstderr = etmpfile();
$curlstdout = etmpfile();
ecurl_setopt($curl, CURLOPT_STDERR, $curlstderr);
ecurl_setopt($curl, CURLOPT_FILE, $curlstdout);
if (true !== curl_exec($curl)) {
throw new \RuntimeException("curl_exec failed! " . curl_errno($curl) . ": " . curl_error($curl));
}
rewind($curlstderr); // https://bugs.php.net/bug.php?id=76268
rewind($curlstdout); // https://bugs.php.net/bug.php?id=76268
$verbose = stream_get_contents($curlstderr);
$output = stream_get_contents($curlstdout);
curl_close($curl);
fclose($curlstderr);
fclose($curlstdout);
var_dump($verbose, $output);
function ecurl_setopt ( /*resource*/$ch, int $option , /*mixed*/ $value): bool
{
$ret = curl_setopt($ch, $option, $value);
if ($ret !== true) {
// option should be obvious by stack trace
throw new RuntimeException('curl_setopt() failed. curl_errno: ' . curl_errno($ch) . '. curl_error: ' . curl_error($ch));
}
return true;
}
function etmpfile()
{
$ret = tmpfile();
if (false === $ret) {
throw new \RuntimeException('tmpfile() failed!');
}
return $ret;
}
also, it appears that https://www.amazon.com has a bug, see "is it a bug to send response gzip-compressed to clients that doesn't specify Accept-Encoding: gzip?"
in any case, to make curl automatically decompress the gzip-compressed response from amazon, add ecurl_setopt($curl,CURLOPT_ENCODING,''); , that tells libcurl to add the Accept-Encoding: gzip,deflate header, and automatically decompress the result.

PHP script executes faster in browser than in python/another PHP script

I wrote an API in PHP. It executes pretty fast for my purpose (3s) when I call it using the browser. However, if I call it from another PHP script (which I wrote for testing) it takes a long time (24s) for each request! I use curl to call the URL. Does anybody know what's happening?
System Config :
Using WAMP to run the PHP.
Hosted on local computer.
Solutions tried :
Disabled all firewalls
Added the option curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
I even wrote a python script to call the PHP API and it also takes a long time. Seems like browser gives the best response time.
Any help is appreciated.
Updated with the code :
<?php
// Class to handle all Utilities
Class Utilities{
// Make a curl call to a URL and return both JSON & Array
public function callBing($bingUrl){
// Initiate curl
$ch = curl_init();
// Disable SSL verification
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Will return the response, if false it print the response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Set the url
curl_setopt($ch, CURLOPT_URL,$bingUrl);
// Performance Tweak
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
session_write_close();
// Execute
$bingJSON=curl_exec($ch);
// Closing
curl_close($ch);
$bingArray = json_decode($bingJSON,true);
return array( "array" => $bingArray , "json" => $bingJSON );
}
}
?>
<?php
// The Test script
include_once('class/class.Utilities.php');
$util = new Utilities();
echo "<style> td { border : thin dashed black;}</style>";
// Test JSON
$testJSON = '
{
"data" : [
{ "A" : "24324" , "B" : "64767", "expectedValue" : "6.65" , "name" : "Test 1"},
{ "A" : "24324" , "B" : "65464", "expectedValue" : "14" , "name" : "Test 2"}
]
}
';
$testArray = json_decode($testJSON, TRUE);
echo "<h1> Test Results </h1>";
echo "<table><tr><th>Test name</th><th> Expected Value</th><th> Passed ? </th></tr>";
$count = count($testArray["data"]);
for ($i=0; $i < $count ; $i++) {
$url = "http://localhost/API.php?txtA=".urlencode($testArray["data"][$i]["A"])."&txtB=".urlencode($testArray["data"][$i]["B"]);
$result = $util->callBing($url);
if($testArray["data"][$i]["expectedValue"] == $result["value"])
$passed = true;
else
$passed = false;
if($passed)
$passed = "<span style='background:green;color: white;font-weight:bold;'>Passed</span>";
else
$passed = "<span style='background:red;color: white;font-weight:bold;'>Failed</span>";
echo "<tr><td>".$testArray["data"][$i]["name"]."</td><td>".$testArray["data"][$i]["expectedValue"]."</td><td>$passed</td></tr>";
}
echo "</table>";
?>

There is an overhead cost involved in starting up the interpreter and parsing the code (whether PHP, Python, Ruby, etc). When you have the code running in a server process, that startup cost is paid when the server starts initially, and the application logic (plus some minor request/response overhead) is simply executed on the request. When running the code manually, however, that additional startup overhead happens before your code can be run and causes the slowness you are seeing. This is the reason that mod_php and mod_wsgi exist (as opposed to frameworks that use the CGI API).
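Whichever explanation applies, one way to see where the 24 seconds actually go is to look at the timing fields that curl_getinfo() reports for the request. The following is only a diagnostic sketch; summarize_timings() is a hypothetical helper name, and the phase labels are my own.

```php
<?php
/**
 * Break a cURL transfer down into phases, given the array returned by curl_getinfo().
 * summarize_timings() is a hypothetical helper name used only in this sketch.
 * All values are in seconds.
 */
function summarize_timings(array $info): array
{
    $dns       = (float) ($info['namelookup_time'] ?? 0);
    $connect   = (float) ($info['connect_time'] ?? 0);
    $firstByte = (float) ($info['starttransfer_time'] ?? 0);
    $total     = (float) ($info['total_time'] ?? 0);

    return [
        'dns'      => $dns,                              // resolving the host name
        'connect'  => max(0.0, $connect - $dns),         // opening the TCP connection
        'wait'     => max(0.0, $firstByte - $connect),   // sending the request and waiting for the first byte
        'download' => max(0.0, $total - $firstByte),     // receiving the response
        'total'    => $total,
    ];
}
```

In callBing(), calling summarize_timings(curl_getinfo($ch)) right after curl_exec($ch) and printing the result shows whether the time is spent before the connection is made or while waiting for the server to answer.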

php my crawler crash after some time segmentation fault error

I am a newbie in PHP, and with my limited knowledge I built a script in PHP, but after some time it crashes.
I tested it on 5-6 different Linux distributions (Debian, Ubuntu, Red Hat, Fedora, etc.). Only on Fedora does it not crash, but after 3-4 hours of running it stops working without giving me any error. The process remains open; it doesn't crash, it just stops working, but this happens only on Fedora.
Here's my script code:
<?php
ini_set('max_execution_time', 0);
include_once('simple_html_dom.php');
$file = fopen("t.txt", "r");
while(!feof($file)) {
$line = fgets($file);
$line = trim($line);
$line = crawler($line);
}
fclose($file);
function crawler($line) {
$site = $line;
// Check target.
$agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; pt-pt) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27";
$ch=curl_init();
curl_setopt ($ch, CURLOPT_URL,$line);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch,CURLOPT_VERBOSE,false);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if($httpcode>=200 && $httpcode<=300) {
$check2 = $html = @file_get_html($site);
if($check2 === false) {
return $line;
} else {
foreach($html->find('a') as $element) {
$checkurl = parse_url($element->href);
$checkline = parse_url($line);
if(isset($checkurl['scheme'], $checkurl['host'])) {
if($checkurl['host'] !== $checkline['host']) {
$split = str_split($checkurl['host']);
$replacethis = ".";
$replacewith = "dot";
for($i=0;$i<count($split);$i++) {
if($split[$i] == $replacethis) {
$split[$i] = $replacewith;
}
}
chdir('C:\xampp\htdocs\_test\db');
foreach($split as $element2) {
if(!chdir($element2)) { mkdir($element2); chdir($element2); };
}
$save = fopen('results.txt', 'a'); $txt = "$line,$element->innertext\n"; fwrite($save,$txt); fclose($save);
}
}
}
}
}
}
?>
So my script crawls all backlinks from the targets I specified in t.txt, but only outgoing backlinks. Then it walks into a directory tree built from the host name and saves the information there.
Here are the errors I got:
Allowed memory size of 16777216 bytes exhausted (tried to allocate 24 bytes)
Segmentation fault (core dumped)
It seems there is a bug somewhere... something is wrong... any idea? Thanks.

Such an error can be thrown when you run out of free memory. I believe it happens inside simple_html_dom. According to its documentation, when using it in a loop you need to call
void clear () Clean up memory.
on each parsed document once you are done with it.
Also, you perform two HTTP requests for each line, but a single curl request is enough. Just save the response
$html = curl_exec($ch);
and then use str_get_html($html) instead of file_get_html($site);
Also, it's bad practice to use the error suppression operator @. Check the return value for failure instead, and if a call can throw an exception, handle it with a try ... catch construct.
Also, you don't need to do things like
$site = $line;
just use $line
And finally, instead of your long line $save = fopen('results.txt', 'a');............... you can use a simple file_put_contents() with the FILE_APPEND flag.
I also suggest printing to the console what the script is currently doing, like
echo "getting HTML from URL ".$line;
echo "parsing text...";
so you can follow the process.
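As a rough sketch of the advice above (one request per URL, parse the saved response, append results with file_put_contents()), here is a version that swaps PHP's built-in DOMDocument in for simple_html_dom so that it runs without the library. That swap, and the helper names external_links() and fetch_page(), are assumptions made for this illustration.

```php
<?php
/**
 * Return [url, link text] pairs for every <a> that points to a host other than $ownHost.
 * Uses DOMDocument instead of simple_html_dom (a substitution made for this sketch).
 */
function external_links(string $html, string $ownHost): array
{
    if (trim($html) === '') {
        return [];
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href  = $anchor->getAttribute('href');
        $parts = parse_url($href);
        // Keep only absolute links (scheme and host present) that leave the current site.
        if (is_array($parts) && isset($parts['scheme'], $parts['host']) && $parts['host'] !== $ownHost) {
            $links[] = [$href, trim($anchor->textContent)];
        }
    }
    return $links;
}

/** Fetch a page with a single cURL request; returns null on failure or a non-2xx status. */
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($body !== false && $code >= 200 && $code < 300) ? $body : null;
}
```

Inside the read loop, each $line would go through fetch_page(), then external_links(), and every pair would be appended with file_put_contents('results.txt', "$line,$text\n", FILE_APPEND). Because the document object goes out of scope at the end of each call, memory should not accumulate from one URL to the next.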

How to tell when curl_multi_exec is done _sending_ data

I need to call a webservice from a PHP script. The web service is slow, and I'm not interested in its response, I only want to send data to it.
I'm trying to use curl_multi_exec (following an example here: http://www.jaisenmathai.com/articles/php-curl-asynchronous.html), and its second parameter ($still_running) lets you know when it's done sending AND receiving. But, again, I'm only interested in knowing when my script is done sending. Of course, if I exit the script before it's done sending the data, the web service never registers receiving the request.
Another way to look at it is to detect when PHP is idle, waiting for a response from the server.
What I'd like to achieve is this dialogue:
PHP: Hi, please save this data
WS: Ok, ho hum, lets think about this.
PHP: Cya! (off to do something more important)
WS: Ok, Im done processing, here is your response... PHP? Where did you go? I feel used :(

You can try
$url = "http://localhost/server.php";
$nodes = array();
$nodes["A"] = array("data" => mt_rand()); // random data
$nodes["B"] = array("data" => mt_rand());
$nodes["C"] = array("data" => mt_rand());
$nodes["D"] = array("data" => mt_rand());
echo "<pre>";
$mh = curl_multi_init();
$curl_array = array();
foreach ( $nodes as $i => $data ) {
$curl_array[$i] = curl_init($url);
curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_array[$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)');
curl_setopt($curl_array[$i], CURLOPT_POST, true);
curl_setopt($curl_array[$i], CURLOPT_POSTFIELDS, $data);
curl_setopt($curl_array[$i], CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($curl_array[$i], CURLOPT_TIMEOUT, 15);
curl_multi_add_handle($mh, $curl_array[$i]);
echo "Please save this data No : $i ", $data['data'], PHP_EOL;
}
echo PHP_EOL ,PHP_EOL;
$running = NULL;
do {
usleep(10000);
curl_multi_exec($mh, $running);
} while ( $running > 0 );
$res = array();
foreach ( $nodes as $i => $url ) {
$curlErrorCode = curl_errno($curl_array[$i]);
if ($curlErrorCode === 0) {
$info = curl_getinfo($curl_array[$i]);
if ($info['http_code'] == 200) { // connection OK
echo "Cya! (off to do something more important No : $i Done", PHP_EOL;
echo curl_multi_getcontent($curl_array[$i]) , PHP_EOL ;
}
}
curl_multi_remove_handle($mh, $curl_array[$i]);
curl_close($curl_array[$i]);
}
curl_multi_close($mh);
Output
Please save this data No : A 1130087324
Please save this data No : B 1780371600
Please save this data No : C 764866719
Please save this data No : D 2042666801
Cya! (off to do something more important No : A Done
Ok, Im done processing, here is your response...
{"data":"1130087324"} PHP? Where did you go?
I feel used :(
113
Cya! (off to do something more important No : B Done
Ok, Im done processing, here is your response...
{"data":"1780371600"} PHP? Where did you go?
I feel used :(
113
Cya! (off to do something more important No : C Done
Ok, Im done processing, here is your response...
{"data":"764866719"} PHP? Where did you go?
I feel used :(
112
Cya! (off to do something more important No : D Done
Ok, Im done processing, here is your response...
{"data":"2042666801"} PHP? Where did you go?
I feel used :(
113
Simple Test Server server.php
echo printf("Ok, Im done processing, here is your response... \n\t%s PHP? Where did you go? \n\tI feel used :(\n", json_encode($_REQUEST));
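The loop above still waits for every response. If the goal is only to know when the request body has gone out, one possible approach (a sketch under assumptions, not something the code above does) is cURL's progress callback, which reports upload counters and can abort the transfer. The names upload_finished() and send_and_forget() are hypothetical, the five-argument callback assumes PHP 5.5 or later, and "uploaded" here means handed to the network stack, which is not proof that the server has processed it. The server script may also need ignore_user_abort(true) to keep working after the client disconnects.

```php
<?php
/**
 * Decide from cURL's progress counters whether the whole request body has been sent.
 * upload_finished() is a hypothetical helper name used only in this sketch.
 */
function upload_finished(int $uploadTotal, int $uploaded): bool
{
    return $uploadTotal > 0 && $uploaded >= $uploadTotal;
}

/**
 * POST $fields to $url and abort the transfer once the body has been sent.
 * Returns true if the upload counters reached the expected total.
 */
function send_and_forget(string $url, array $fields): bool
{
    $sent = false;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($ch, CURLOPT_NOPROGRESS, false); // progress callbacks are disabled by default
    curl_setopt(
        $ch,
        CURLOPT_PROGRESSFUNCTION,
        function ($handle, $dlTotal, $dlNow, $ulTotal, $ulNow) use (&$sent) {
            if (upload_finished((int) $ulTotal, (int) $ulNow)) {
                $sent = true;
                return 1; // a non-zero return value aborts the transfer
            }
            return 0;
        }
    );
    curl_exec($ch); // expected to fail with an "aborted by callback" error once $sent is true
    return $sent;
}
```

Whether the counters behave this way for a given request type is worth confirming with CURLOPT_VERBOSE before relying on it.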
