curl_multi_exec - parsing the html - php

I found this script on php.net. Let's say I wanted to get only the info from part of the page. How would one go about doing this? I know how to do it with curl_init, but the multi interface seems much more efficient.
For example, from php.net:
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
//close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
I'd like to get just the info below from the request:
<b>Key enhancements in PHP 5.3.3 include:</b>
</p>
<ul>
<li>Upgraded bundled sqlite to version 3.6.23.1.</li>
<li>Upgraded bundled PCRE to version 8.02.</li>
<li>Added FastCGI Process Manager (FPM) SAPI.</li>
<li>Added stream filter support to mcrypt extension.</li>
<li>Added full_special_chars filter to ext/filter.</li>
<li>Fixed a possible crash because of recursive GC invocation.</li>
<li>Fixed bug #52238 (Crash when an Exception occured in iterator_to_array).</li>
<li>Fixed bug #52041 (Memory leak when writing on uninitialized variable returned from function).</li>
<li>Fixed bug #52060 (Memory leak when passing a closure to method_exists()).</li>
<li>Fixed bug #52001 (Memory allocation problems after using variable variables).</li>
<li>Fixed bug #51723 (Content-length header is limited to 32bit integer with Apache2 on Windows).</li>
<li>Fixed bug #48930 (__COMPILER_HALT_OFFSET__ incorrect in PHP >= 5.3).</li>
</ul>
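One possible way to isolate that list is sketched below. It assumes the response body has already been captured into a string (for example with CURLOPT_RETURNTRANSFER and curl_multi_getcontent()) and that the wanted list is the first <ul> after a <b> whose text starts with "Key enhancements". The function name extract_enhancements and the XPath query are illustrative choices, not something taken from php.net.

```php
<?php
// Hypothetical helper: returns the text of each <li> in the first <ul>
// that follows a <b> element starting with "Key enhancements".
function extract_enhancements(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely well-formed; collect parser warnings
    // internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = '//b[starts-with(normalize-space(.), "Key enhancements")]'
           . '/following::ul[1]/li';

    $items = [];
    foreach ($xpath->query($query) as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}
```

Calling extract_enhancements($body) on the fetched page would then give a plain array of strings, one per list item, provided the page still has that structure.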

Related

What is the object that contains data in PHP multi cURL

According to the PHP manual, a multi cURL request is performed like this:
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
//---------
Link: http://php.net
I can copy and paste the code as is and it gets me the content. Notice that I don't "echo" anything, yet the content still shows up.
So my question is: where does the data come from? What is the object holding the data? I know you can set CURLOPT_RETURNTRANSFER to true and then get the content with curl_multi_getcontent(), but as I said, the script already retrieves content, so where is the object?
Both curl_exec and curl_multi_exec output the response by default. You need to set the CURLOPT_RETURNTRANSFER option to true to disable the output. curl_exec then returns the response, and for handles run through curl_multi_exec you retrieve it with curl_multi_getcontent().
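A minimal sketch of that approach follows, assuming you simply want every body returned as a string. The helper name fetch_all and the one-second select timeout are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetches all URLs in parallel and returns the
// response bodies, keyed like the input array.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // Buffer the body on the handle instead of printing it.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            // Wait for socket activity, for at most one second.
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        // The buffered response lives on each individual handle.
        $bodies[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $bodies;
}
```

So the "object holding the data" is each easy handle: once CURLOPT_RETURNTRANSFER is set, nothing is printed and the body is read from the handle after the transfer completes.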

When is it best to check asynchronous cURL requests for completion?

Multiple cURL requests are best made asynchronously, that is, without each request waiting until all previous requests have received responses. Another optimization in many cases is to start processing a received response without waiting for the others. However, the docs and official examples are not clear about the earliest point at which completed requests can be checked for (typically with the curl_multi_info_read function).
So when is the earliest point to check for completed requests? Or what is the optimal set of such points?
This is the example from the curl_multi_exec's page (comments in upper case are mine):
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
// SHOULD REQUESTS BE CHECKED FOR COMPLETION HERE?
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        // SHOULD REQUESTS BE CHECKED FOR COMPLETION HERE?
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        // SHOULD REQUESTS BE CHECKED FOR COMPLETION HERE?
    }
    // SHOULD REQUESTS BE CHECKED FOR COMPLETION HERE?
}
// SHOULD REQUESTS BE CHECKED FOR COMPLETION HERE?
//close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
First, to simplify your life: the CURLM_CALL_MULTI_PERFORM return code isn't used in modern libcurl versions (7.20.0 or later).
Then, as long as 'active' is larger than zero, there is at least one transfer in progress, so you can hold off on calling curl_multi_info_read() if you want.
Or you can call curl_multi_info_read() immediately after every call to curl_multi_exec(). That's up to you!
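The second option, checking right after every curl_multi_exec() call, could look roughly like the sketch below. The function name run_and_process, the callback signature, and the one-second select timeout are assumptions for illustration only.

```php
<?php
// Hypothetical helper: runs all transfers and invokes $onDone as soon as
// each one finishes, instead of waiting for the whole batch.
function run_and_process(array $urls, callable $onDone): void
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch; // keep a reference for the lifetime of the loop
    }

    do {
        $status = curl_multi_exec($mh, $active);

        // Drain the completion queue right after every exec call.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            // $info['result'] is CURLE_OK (0) when the transfer succeeded.
            $onDone($url, $info['result'], curl_multi_getcontent($ch));
            curl_multi_remove_handle($mh, $ch);
        }

        if ($active) {
            // Wait for socket activity, for at most one second.
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    curl_multi_close($mh);
}
```

Because the queue is drained on every pass, a fast response is handed to the callback while slower transfers are still running.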

curl_multi_exec fails on Mac OS X

I'm trying to run this simple piece of code from php.net on Mac OS X (Mavericks) to try the cURL multi exec feature:
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
//close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
That does not work; the only output I get is:
PHP Fatal error: Maximum execution time of 30 seconds exceeded in
This is my environment: Mac OS X 10.9, PHP 5.4.17, Apache 2.2.24.
cURL is installed, as my regular "single" cURL requests work fine.
I think this is an issue with Mac OS X, but I can't find a fix. Do you have any idea?
EDIT: I tried the same code on a Linux server and everything worked fine.
On PHP 5.3.18+, be aware that curl_multi_select() may return -1 forever until you call curl_multi_exec().
Try this:
while ($this->active && $mrc == CURLM_OK)
{
    // add this line
    while (curl_multi_exec($this->mh, $this->active) === CURLM_CALL_MULTI_PERFORM);
    if (curl_multi_select($this->mh) != -1)
    {
        do {
            $mrc = curl_multi_exec($this->mh, $this->active);
            if ($mrc == CURLM_OK)
            {
                while($info = curl_multi_info_read($this->mh))
                {
                    $this->process($info);
                }
            }
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
See https://bugs.php.net/bug.php?id=63411 or http://marchtea.com/?p=109 for more information.
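A commonly used alternative is to call curl_multi_exec() on every pass of the loop regardless of what curl_multi_select() returned, and to sleep briefly when it returns -1 so the loop does not spin at full CPU. The sketch below shows that idea; the wrapper name wait_for_activity and the 100 ms back-off are arbitrary assumptions, not values from the bug report.

```php
<?php
// Hypothetical wrapper around curl_multi_select(). When select reports -1,
// sleep briefly and return, so the caller can go on to curl_multi_exec().
function wait_for_activity($mh, float $timeout = 1.0): int
{
    $ready = curl_multi_select($mh, $timeout);
    if ($ready === -1) {
        usleep(100000); // 100 ms; an arbitrary back-off for this sketch
    }
    return $ready;
}

// Usage inside the transfer loop:
$mh = curl_multi_init();
// ... curl_multi_add_handle() calls go here ...
do {
    // exec runs on every pass, whatever select returned last time
    $mrc = curl_multi_exec($mh, $active);
    if ($active) {
        wait_for_activity($mh);
    }
} while ($active && $mrc === CURLM_OK);
curl_multi_close($mh);
```

The key difference from the manual example is that curl_multi_exec() is no longer guarded by the `!= -1` check, so a persistent -1 cannot stall the loop until the execution time limit is hit.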

Loading remote items together or separately?

I'm looking to optimize my application. It uses the Twitter and Facebook APIs and loads large files to be displayed on the user's screen. Right now, I am running the script linearly: AJAX calls one file that makes both API calls, and all of the information is loaded onto the screen at once. Would it be faster to separate the two API calls into two different files and load each one separately with AJAX? This way, if one response was taking longer than the other, the faster one would still be displayed.
Thank you.
If it matters, I'm using PHP and CURL for API calls.
Certainly it would be better if the AJAX calls don't depend on each other. You can also do this on the PHP side using curl_multi_init, which lets you execute HTTP calls in parallel.
Sample from the PHP manual:
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
//close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>

how to make non blocking call in php

I am using a PHP script to upload a lot of files with the curl command. The remote server accepts only POST requests. When I execute the script below, it processes the first request and waits until the first file is uploaded. Is there a way to make it non-blocking and run two curl upload requests simultaneously? The code sample is below.
<?php
// File names must be quoted strings
$arr = array("somefile1.txt", "somefile2.txt");
for ($i = 0; $i < 2; $i++) {
    $cmd = "curl -F name=aaa -F type=yyy FileName=#/xxxxx/xxxx/$arr[$i] http://someurl.com";
    print "Executing file ";
    // Start the command in the background; "echo $!" makes the shell output its PID
    shell_exec("nohup $cmd 2> /dev/null & echo $!");
    print "======= done ================";
}
?>
I believe you may want curl_multi_init. Here is an outbound example; it will have to be adapted for your inbound problem. This seems cleaner than you forking multiple threads yourself.
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
//create the multiple cURL handle
$mh = curl_multi_init();
//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
//close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
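Adapted to the upload case, the same pattern might look like the sketch below. It assumes the server accepts a multipart/form-data POST with the same field names as the -F flags in the question; the helper names make_upload_handle and upload_all are made up for illustration, and the URL and paths are whatever the caller supplies.

```php
<?php
// Hypothetical helper: builds a handle that POSTs one file as
// multipart/form-data. Field names mirror the -F flags in the question.
function make_upload_handle(string $url, string $path)
{
    $ch = curl_init($url);
    // Passing an array makes cURL send a multipart/form-data POST.
    curl_setopt($ch, CURLOPT_POSTFIELDS, [
        'name'     => 'aaa',
        'type'     => 'yyy',
        'FileName' => new CURLFile($path),
    ]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return $ch;
}

// Hypothetical helper: uploads all files in parallel and returns the
// server responses keyed by file path.
function upload_all(string $url, array $paths): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($paths as $path) {
        $handles[$path] = make_upload_handle($url, $path);
        curl_multi_add_handle($mh, $handles[$path]);
    }

    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            // Wait for socket activity, for at most one second.
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    $responses = [];
    foreach ($handles as $path => $ch) {
        $responses[$path] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $responses;
}
```

This keeps everything inside one PHP process, so there is no need for nohup or background shell commands.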
There is a good article about "multithreading", take a look at it here: Multithreading in PHP with CURL
You can try using PHP Simple Curl Wrapper: https://github.com/Graceas/php-simple-curl-wrapper. This library allows multiple requests to be processed asynchronously.
You can find full answer here: php asynchronous cURL request
No, you cannot run two plain curl statements simultaneously.
A curl_exec call is blocking by design: it makes the later statements
wait until it finishes its operation.
