How to tell when curl_multi_exec is done _sending_ data - php

I need to call a web service from a PHP script. The web service is slow, and I'm not interested in its response; I only want to send data to it.
I'm trying to use curl_multi_exec (following an example here: http://www.jaisenmathai.com/articles/php-curl-asynchronous.html); its second parameter ($still_running) lets you know when it's done sending AND receiving. But, again, I'm only interested in knowing when my script is done sending. Of course, if I exit the script before it's done sending the data, the web service never registers receiving the request.
Another way to look at it is to detect when PHP is idle, waiting for a response from the server.
What I'd like to achieve is this dialogue:
PHP: Hi, please save this data
WS: Ok, ho hum, let's think about this.
PHP: Cya! (off to do something more important)
WS: Ok, I'm done processing, here is your response... PHP? Where did you go? I feel used :(

You can try
$url = "http://localhost/server.php";
$nodes = array();
$nodes["A"] = array("data" => mt_rand()); <-------- Random Data
$nodes["B"] = array("data" => mt_rand());
$nodes["C"] = array("data" => mt_rand());
$nodes["D"] = array("data" => mt_rand());
echo "<pre>";
$mh = curl_multi_init();
$curl_array = array();
foreach ( $nodes as $i => $data ) {
$curl_array[$i] = curl_init($url);
curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_array[$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)');
curl_setopt($curl_array[$i], CURLOPT_POST, true);
curl_setopt($curl_array[$i], CURLOPT_POSTFIELDS, $data);
curl_setopt($curl_array[$i], CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($curl_array[$i], CURLOPT_TIMEOUT, 15);
curl_multi_add_handle($mh, $curl_array[$i]);
echo "Please save this data No : $i ", $data['data'], PHP_EOL;
}
echo PHP_EOL ,PHP_EOL;
$running = NULL;
do {
usleep(10000);
curl_multi_exec($mh, $running);
} while ( $running > 0 );
$res = array();
foreach ( $nodes as $i => $url ) {
$curlErrorCode = curl_errno($curl_array[$i]);
if ($curlErrorCode === 0) {
$info = curl_getinfo($curl_array[$i]);
if ($info['http_code'] == 200) { <------- Connection OK
echo "Cya! (off to do something more important No : $i Done", PHP_EOL;
echo curl_multi_getcontent($curl_array[$i]) , PHP_EOL ;
}
}
curl_multi_remove_handle($mh, $curl_array[$i]);
curl_close($curl_array[$i]);
}
curl_multi_close($mh);
Output
Please save this data No : A 1130087324
Please save this data No : B 1780371600
Please save this data No : C 764866719
Please save this data No : D 2042666801
Cya! (off to do something more important No : A Done
Ok, Im done processing, here is your response...
{"data":"1130087324"} PHP? Where did you go?
I feel used :(
113
Cya! (off to do something more important No : B Done
Ok, Im done processing, here is your response...
{"data":"1780371600"} PHP? Where did you go?
I feel used :(
113
Cya! (off to do something more important No : C Done
Ok, Im done processing, here is your response...
{"data":"764866719"} PHP? Where did you go?
I feel used :(
112
Cya! (off to do something more important No : D Done
Ok, Im done processing, here is your response...
{"data":"2042666801"} PHP? Where did you go?
I feel used :(
113
Simple Test Server server.php
echo printf("Ok, Im done processing, here is your response... \n\t%s PHP? Where did you go? \n\tI feel used :(\n", json_encode($_REQUEST));

Related

How to get a specified row using cUrl PHP

Hey guys, I use curl to communicate with an external web server, but the response type is HTML. I was able to convert it to JSON code (more than 4000 rows) but I have no idea how to get the specific row which contains my result. Any idea?
Here is my cURL code:
require_once('getJson.php');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.reputationauthority.org/domain_lookup.php?ip=website.com&Submit.x=9&Submit.y=5&Submit=Search');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$data = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

$data = '<<<EOF'.$data.'EOF';
$json = new GetJson();
header("Content-Type: text/plain");
$res = json_encode($json->html_to_obj($data), JSON_PRETTY_PRINT);
$myArray = json_decode($res, true);
For getJson.php
class GetJson {
    function html_to_obj($html) {
        libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        return $this->element_to_obj($dom->documentElement);
    }

    function element_to_obj($element) {
        if ($element->nodeType == XML_ELEMENT_NODE) {
            $obj = array("tag" => $element->tagName);
            foreach ($element->attributes as $attribute) {
                $obj[$attribute->name] = $attribute->value;
            }
            foreach ($element->childNodes as $subElement) {
                if ($subElement->nodeType == XML_TEXT_NODE) {
                    $obj["html"] = $subElement->wholeText;
                } else {
                    $obj["children"][] = $this->element_to_obj($subElement);
                }
            }
            return $obj;
        }
    }
}
My idea is, instead of walking down the rows to reach line 2175 (doing something like $data['children'][2]['children'][7]['children'][3]['children'][1]['children'][1]['children'][0]['children'][1]['children'][0]['children'][1]['children'][2]['children'][0]['children'][0]['html'] is not a good idea to me), I want to go directly to it.
If the HTML being returned has a consistent structure every time, and you just want one particular value from one part of it, you may be able to use regular expressions to parse the HTML and find the part you need. This is an alternative to trying to put the whole thing into an array. I have used this technique before to parse an HTML document and find a specific item. Here's a simple example; you will need to adapt it to your needs, since you haven't specified the exact nature of the data you're seeking, and you may need to go down several levels of parsing to find the right bit:
$data = curl_exec($ch);

// Split the output into an array that we can loop through line by line
$array = preg_split('/\n/', $data);

// For each line in the output
foreach ($array as $element)
{
    // See if the line contains a hyperlink
    if (preg_match("/<a href/", "$element"))
    {
        // ...do something here, e.g. store the data retrieved,
        // or do more matching to find something within it...
    }
}
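As a concrete follow-on, once a matching line is found, a capture group can pull out just the value you want. A minimal sketch (the pattern and attribute here are assumptions; adapt them to the real markup):

// Extract the href value from the first matching line.
// The regex is an assumption about the markup and will need adjusting.
foreach ($array as $element) {
    if (preg_match('/<a href="([^"]+)"/', $element, $matches)) {
        $href = $matches[1]; // first capture group: the URL itself
        echo $href, PHP_EOL;
        break; // stop after the first hit
    }
}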

PHP - The fastest way to get content of another website and parse this content

I have to get a few of a user's params from a website. I can do it because every user has a unique ID and I can search for users by URL:
http://page.com/search_user.php?uid=X
So I called this URL in a for() loop and tried to get 500 results:
<?php
$start = time();
$results = array();

for ($i = 0; $i <= 500; $i++)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, 'http://page.com/search_user.php?uid='.$i);
    curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; pl; rv:1.9.1.2) Gecko/20090729 desktopsmiley_2_2_5643778701369665_44_71 DS_gamingharbor Firefox/3.5.2 (.NET CLR 3.5.30729)');
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    $p = curl_exec($c);
    curl_close($c);

    if (preg_match('"<span class=\"uname\">(.*?)</span>"si', $p, $matches))
    {
        $username = $matches[1];
    }
    else
    {
        continue;
    }

    preg_match('"<table cellspacing=\"0\">(.*?)</table>"si', $p, $matches);
    $comments = $matches[1];

    preg_match('"<tr class=\"pos\">(.*?)</tr>"si', $comments, $matches_pos);
    preg_match_all('"<td>([0-9]+)</td>"si', $matches_pos[1], $matches);
    $comments_pos = $matches[1][2];

    preg_match('"<tr class=\"neu\">(.*?)</tr>"si', $comments, $matches_neu);
    preg_match_all('"<td>([0-9]+)</td>"si', $matches_neu[1], $matches);
    $comments_neu = $matches[1][2];

    preg_match('"<tr class=\"neg\">(.*?)</tr>"si', $comments, $matches_neg);
    preg_match_all('"<td>([0-9]+)</td>"si', $matches_neg[1], $matches);
    $comments_neg = $matches[1][2];

    $comments_all = $comments_pos + $comments_neu + $comments_neg;

    $about_me = 0;
    if (preg_match('"<span>O mnie</span>"si', $p))
    {
        $about_me = 1;
    }

    $results[] = array('comments' => $comments_all, 'about_me' => $about_me, 'username' => $username);
}

echo 'Generated in: <b>'.(time() - $start).'</b> seconds.<br><br>';
var_dump($results);
?>
Finally I got the results:
- everything was generated in 135 seconds.
Then I replaced curl with file_get_contents() and got: 155 seconds.
Is there a faster way to get these results than curl? I have to get 20,000,000 results from another page and 135 seconds is too much for me.
Thanks.
If you really need to query different URLs 500 times, maybe you should consider an asynchronous approach. The problem with the above is that the slowest part (the bottleneck) is the curl requests themselves. While waiting for the response, your code is doing nothing.
Try to have a look at PHP asynchronous cURL with callback (i.e. you would make 500 requests "almost at once" and process the responses as they come in - asynchronously).
Take a look at a previous answer of mine regarding how to divide and conquer this kind of job:
debugging long running PHP script
In your case I'd say you should follow the same idea, but further chunk the requests into groups of 500, say. A sketch of the batched approach follows.
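To make the chunking idea concrete, here's a minimal sketch that fetches the pages in parallel batches with curl_multi (the batch size of 50 and the URL pattern are placeholders; tune them for the target server):

function fetchBatch(array $urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handles[$key], CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($handles[$key], CURLOPT_TIMEOUT, 15);
        curl_multi_add_handle($mh, $handles[$key]);
    }
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // wait for activity instead of busy-looping
    } while ($running > 0);
    $pages = array();
    foreach ($handles as $key => $h) {
        $pages[$key] = curl_multi_getcontent($h);
        curl_multi_remove_handle($mh, $h);
        curl_close($h);
    }
    curl_multi_close($mh);
    return $pages;
}

// 500 uids, fetched 50 at a time instead of one by one
for ($offset = 0; $offset <= 500; $offset += 50) {
    $urls = array();
    for ($i = $offset; $i < $offset + 50 && $i <= 500; $i++) {
        $urls[$i] = 'http://page.com/search_user.php?uid='.$i;
    }
    foreach (fetchBatch($urls) as $uid => $html) {
        // ...run the same preg_match parsing as before on $html...
    }
}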

PHP script executes faster in browser than in python/another PHP script

I wrote an API in PHP. It executes pretty fast for my purpose (3s) when I call it using the browser. However, if I call it using another PHP script (which I wrote to do testing), it takes a long time (24s) for each request! I use curl to call the URL. Does anybody know what's happening?
System config:
Using WAMP to run the PHP.
Hosted on local computer.
Solutions tried:
Disabled all firewalls
Added the option curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
I even wrote a Python script to call the PHP API and it also takes a long time. It seems like the browser gives the best response time.
Any help is appreciated.
Updated with the code :
<?php
// Class to handle all utilities
class Utilities {

    // Make a curl call to a URL and return both JSON & array
    public function callBing($bingUrl) {
        // Initiate curl
        $ch = curl_init();
        // Disable SSL verification
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        // Return the response instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Set the url
        curl_setopt($ch, CURLOPT_URL, $bingUrl);
        // Performance tweak
        curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
        // Release the session lock so the called script isn't blocked on it
        session_write_close();
        // Execute
        $bingJSON = curl_exec($ch);
        // Closing
        curl_close($ch);
        $bingArray = json_decode($bingJSON, true);
        return array("array" => $bingArray, "json" => $bingJSON);
    }
}
?>
<?php
// The test script
include_once('class/class.Utilities.php');

$util = new Utilities();
echo "<style> td { border : thin dashed black;}</style>";

// Test JSON
$testJSON = '
{
    "data" : [
        { "A" : "24324" , "B" : "64767", "expectedValue" : "6.65" , "name" : "Test 1"},
        { "A" : "24324" , "B" : "65464", "expectedValue" : "14" , "name" : "Test 2"}
    ]
}
';
$testArray = json_decode($testJSON, TRUE);

echo "<h1> Test Results </h1>";
echo "<table><tr><th>Test name</th><th> Expected Value</th><th> Passed ? </th></tr>";
$count = count($testArray["data"]);
for ($i = 0; $i < $count; $i++) {
    $url = "http://localhost/API.php?txtA=".urlencode($testArray["data"][$i]["A"])."&txtB=".urlencode($testArray["data"][$i]["B"]);
    $result = $util->callBing($url);
    if ($testArray["data"][$i]["expectedValue"] == $result["value"])
        $passed = true;
    else
        $passed = false;
    if ($passed)
        $passed = "<span style='background:green;color: white;font-weight:bold;'>Passed</span>";
    else
        $passed = "<span style='background:red;color: white;font-weight:bold;'>Failed</span>";
    echo "<tr><td>".$testArray["data"][$i]["name"]."</td><td>".$testArray["data"][$i]["expectedValue"]."</td><td>$passed</td></tr>";
}
echo "</table>";
?>
There is an overhead cost involved in starting up the interpreter and parsing the code (whether PHP, Python, Ruby, etc.). When the code runs in a server process, that startup cost is paid once when the server starts, and on each request only the application logic (plus some minor request/response overhead) is executed. When running the code manually, however, that additional startup overhead happens before your code can run, and causes the slowness you are seeing. This is the reason that mod_php and mod_wsgi exist (as opposed to frameworks that use the CGI API).
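If you want to verify where the extra seconds actually go, curl itself can break a request down by phase. A minimal diagnostic sketch (the URL is a placeholder for the API under test):

$ch = curl_init('http://localhost/API.php?txtA=1&txtB=2'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

// Break the total time down by phase to see where the 24s is spent
printf("DNS lookup: %.3fs\n", curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME));
printf("Connect:    %.3fs\n", curl_getinfo($ch, CURLINFO_CONNECT_TIME));
printf("First byte: %.3fs\n", curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME));
printf("Total:      %.3fs\n", curl_getinfo($ch, CURLINFO_TOTAL_TIME));
curl_close($ch);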

php curl multi error handler

I want to capture curl errors and warnings in my error handler so that they do not get echoed to the user. To prove that all errors have been caught I prepend the $err_start string to the error. Currently here is a working (but simplified) snippet of my code (run it in a browser, not the CLI):
<?php
set_error_handler('handle_errors');
test_curl();

function handle_errors($error_num, $error_str, $error_file, $error_line)
{
    $err_start = 'caught error'; // to prove that the error has been properly caught
    die("$err_start $error_num, $error_str, $error_file, $error_line<br>");
}

function test_curl()
{
    $curl_multi_handle = curl_multi_init();
    $curl_handle1 = curl_init('iamdooooooooooown.com');
    curl_setopt($curl_handle1, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($curl_multi_handle, $curl_handle1);

    $still_running = 1;
    while ($still_running > 0) $multi_errors = curl_multi_exec($curl_multi_handle, $still_running);

    if ($multi_errors != CURLM_OK) trigger_error("curl error [$multi_errors]: ".curl_error($curl_multi_handle), E_USER_ERROR);
    if (strlen(curl_error($curl_handle1))) trigger_error("curl error: [".curl_error($curl_handle1)."]", E_USER_ERROR);

    $curl_info = curl_getinfo($curl_handle1); // info for individual requests
    $content = curl_multi_getcontent($curl_handle1);

    curl_multi_remove_handle($curl_multi_handle, $curl_handle1);
    curl_close($curl_handle1);
    curl_multi_close($curl_multi_handle);
}
?>
Note that my full code has multiple requests in parallel; however, the issue still manifests with a single request as shown here. Note also that the error handler shown in this code snippet is very basic - my actual error handler will not die on warnings or notices, so no need to school me on this.
Now if I try to curl a host which is currently down, then I successfully capture the curl error and my script dies with:
caught error 256, curl error: [Couldn't resolve host 'iamdooooooooooown.com'], /var/www/proj/test_curl.php, 18
However, the following warning is not caught by my error handler function, and is echoed to the page:
Warning: (null)(): 3 is not a valid cURL handle resource in Unknown on line 0
I would like to capture this warning in my error handler so that I can log it for later inspection.
One thing I have noticed is that the warning only manifests when the curl code is inside a function - it does not happen when the code is at the highest scope level. Is it possible that one of the curl globals (e.g. CURLM_OK) is not accessible within the scope of the test_curl() function?
I am using PHP Version 5.3.2-1ubuntu4.19
Edits
updated the code snippet to fully demonstrate the error
the uncaptured warning only manifests when inside a function or class method
I don't think I agree with the way you are capturing the error... you can try:
$nodes = array(
    "http://google.com",
    "http://iamdooooooooooown.com",
    "https://gokillyourself.com"
);
echo "<pre>";
print_r(multiplePost($nodes));
Output
Array
(
    [google.com] => #HTTP-OK 48.52 kb returned
    [iamdooooooooooown.com] => #HTTP-ERROR 0 for : http://iamdooooooooooown.com
    [gokillyourself.com] => #HTTP-ERROR 0 for : https://gokillyourself.com
)
Function Used
function multiplePost($nodes) {
    $mh = curl_multi_init();
    $curl_array = array();
    foreach ($nodes as $i => $url) {
        $url = trim($url);
        $curl_array[$i] = curl_init($url);
        curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl_array[$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)');
        curl_setopt($curl_array[$i], CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($curl_array[$i], CURLOPT_TIMEOUT, 15);
        curl_setopt($curl_array[$i], CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl_array[$i], CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($curl_array[$i], CURLOPT_SSL_VERIFYPEER, 0);
        curl_multi_add_handle($mh, $curl_array[$i]);
    }

    $running = null;
    do {
        usleep(10000);
        curl_multi_exec($mh, $running);
    } while ($running > 0);

    $res = array();
    foreach ($nodes as $i => $url) {
        $domain = parse_url($url, PHP_URL_HOST);
        $curlErrorCode = curl_errno($curl_array[$i]);
        if ($curlErrorCode === 0) {
            $info = curl_getinfo($curl_array[$i]);
            $info['url'] = trim($info['url']);
            if ($info['http_code'] == 200) {
                $content = curl_multi_getcontent($curl_array[$i]);
                $res[$domain] = sprintf("#HTTP-OK %0.2f kb returned", strlen($content) / 1024);
            } else {
                $res[$domain] = "#HTTP-ERROR {$info['http_code']} for : {$info['url']}";
            }
        } else {
            $res[$domain] = sprintf("#CURL-ERROR %d: %s", $curlErrorCode, curl_error($curl_array[$i]));
        }
        curl_multi_remove_handle($mh, $curl_array[$i]);
        curl_close($curl_array[$i]);
        flush();
        ob_flush();
    }
    curl_multi_close($mh);
    return $res;
}
It is possible that this is a bug with php-curl. When the following line is removed, everything behaves OK:
if(strlen(curl_error($curl_handle1))) trigger_error("curl error: [".curl_error($curl_handle1)."]", E_USER_ERROR);
As far as I can tell, curling a host that is down corrupts $curl_handle1 in some way that the curl_error() function is not prepared for. To get around this problem (until a bug fix is made), just test whether the http_code returned by curl_getinfo() is 0. If it is 0, do not use the curl_error function:
if($multi_errors != CURLM_OK) trigger_error("curl error [$multi_errors]: ".curl_error($curl_multi_handle), E_USER_ERROR);
$curl_info = curl_getinfo($curl_handle1); //info for individual requests
$is_up = ($curl_info['http_code'] == 0) ? 0 : 1;
if($is_up && strlen(curl_error($curl_handle1))) trigger_error("curl error: [".curl_error($curl_handle1)."]", E_USER_ERROR);
It's not a very elegant solution, but it may have to do for now.
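As a side note (a sketch, not from either answer above): the multi API can report per-transfer results itself via curl_multi_info_read(), which avoids calling curl_error() on a handle that may be in a bad state:

// Drain completion messages after the multi loop.
while ($done = curl_multi_info_read($curl_multi_handle)) {
    // $done['result'] is the CURLE_* code for that transfer; 0 means success
    if ($done['msg'] == CURLMSG_DONE && $done['result'] != 0) {
        trigger_error("curl transfer failed with error code {$done['result']}", E_USER_WARNING);
    }
}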

Caching JSON output in PHP

Got a slight bit of an issue. I've been playing with the Facebook and Twitter APIs and getting the JSON output of status search queries with no problem. However, I've read up further and realised that I could end up being "rate limited", as quoted from the documentation.
I was wondering, is it easy to cache the JSON output each hour so that I can at least try to prevent this from happening? If so, how is it done? I tried a YouTube video, but that didn't give much information, only how to write the contents of a directory listing to a cache.php file; it didn't point out whether this can be done with JSON output, and it certainly didn't say how to use a time interval of 60 minutes or how to get the information back out of the cache file.
Any help or code would be very much appreciated, as there seems to be very little in the way of tutorials on this sort of thing.
Here's a simple function that adds caching to fetching some URL's contents:
function getJson($url) {
    // cache files are created like cache/abcdef123456...
    $cacheFile = 'cache' . DIRECTORY_SEPARATOR . md5($url);

    if (file_exists($cacheFile)) {
        $fh = fopen($cacheFile, 'r');
        $size = filesize($cacheFile);
        $cacheTime = trim(fgets($fh));

        // if data was cached recently, return cached data
        if ($cacheTime > strtotime('-60 minutes')) {
            return fread($fh, $size);
        }

        // else delete cache file
        fclose($fh);
        unlink($cacheFile);
    }

    $json = /* get from Twitter as usual */;

    $fh = fopen($cacheFile, 'w');
    fwrite($fh, time() . "\n");
    fwrite($fh, $json);
    fclose($fh);

    return $json;
}
It uses the URL to identify cache files; a repeated request to the identical URL will be read from the cache the next time. It writes the timestamp into the first line of the cache file, and cached data older than an hour is discarded. It's just a simple example and you'll probably want to customize it.
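A usage sketch, once the placeholder fetch above is filled in (the search URL here is purely illustrative):

// First call hits the network; repeat calls within 60 minutes
// are served from the cache/ directory (which must exist and be writable).
$raw = getJson('http://search.twitter.com/search.json?q=php');
$tweets = json_decode($raw, true);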
It's a good idea to use caching to avoid the rate limit.
Here's some example code that shows how I did it for Google+ data, in some PHP code I wrote recently.
private function getCache($key) {
    $cache_life = intval($this->instance['cache_life']); // minutes
    if ($cache_life <= 0) return null;

    // fully-qualified filename
    $fqfname = $this->getCacheFileName($key);

    if (file_exists($fqfname)) {
        if (filemtime($fqfname) > (time() - 60 * $cache_life)) {
            // The cache file is fresh.
            $fresh = file_get_contents($fqfname);
            $results = json_decode($fresh, true);
            return $results;
        }
        else {
            unlink($fqfname);
        }
    }
    return null;
}

private function putCache($key, $results) {
    $json = json_encode($results);
    $fqfname = $this->getCacheFileName($key);
    file_put_contents($fqfname, $json, LOCK_EX);
}
and to use it:
// $cacheKey is a value that is unique to the
// concatenation of all params. A string concatenation
// might work.
$results = $this->getCache($cacheKey);
if (!$results) {
    // cache miss; must call out
    $results = $this->getDataFromService(....);
    $this->putCache($cacheKey, $results);
}
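The getCacheFileName() helper isn't shown in the original; a minimal sketch of what such a helper might look like (the cache directory and .json suffix are assumptions):

// Hypothetical helper: maps a cache key to a file path.
// The cache/ directory is an assumption; it must exist and be writable.
private function getCacheFileName($key) {
    return 'cache' . DIRECTORY_SEPARATOR . md5($key) . '.json';
}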
I know this post is old, but it shows up in Google, so for everyone looking: I made this simple function that curls a JSON URL and caches the result in a file inside a specific folder. When the JSON is requested again, if 5 minutes have passed it curls the URL again; if the 5 minutes haven't passed yet, it serves the result from the file. It uses a timestamp (in the file name) to track time. Enjoy!
function ccurl($url, $id) {
    $path = "./private/cache/$id/";
    $files = array_values(array_diff(scandir($path), array('.', '..')));

    // if more than one cache file has piled up, clear them all out
    if (count($files) > 1) {
        foreach ($files as $file) {
            unlink($path.$file);
        }
        $files = array_values(array_diff(scandir($path), array('.', '..')));
    }

    if (empty($files)) {
        // no cache yet: fetch and store it
        $c = curl_init();
        curl_setopt($c, CURLOPT_URL, $url);
        curl_setopt($c, CURLOPT_TIMEOUT, 15);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($c, CURLOPT_USERAGENT,
            'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
        $response = curl_exec($c);
        curl_close($c);
        file_put_contents($path.time().'.json', $response);
        return $response;
    } else {
        if (time() - str_replace('.json', '', $files[0]) > 300) {
            // cache file is older than 5 minutes (300 s): refresh it
            unlink($path.$files[0]);
            $c = curl_init();
            curl_setopt($c, CURLOPT_URL, $url);
            curl_setopt($c, CURLOPT_TIMEOUT, 15);
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($c, CURLOPT_USERAGENT,
                'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
            $response = curl_exec($c);
            curl_close($c);
            file_put_contents($path.time().'.json', $response);
            return $response;
        } else {
            return file_get_contents($path.$files[0]);
        }
    }
}
For usage, create a directory for all cached files (for me it's /private/cache), then create another directory inside it for the request cache, x for example. When calling the function it should look like this:
ccurl('json_url', 'x')
where x is the id. If you have questions please ask me ^_^ Also, enjoy (I might update it later so it doesn't need a directory per id).
