When running a file download to a file pointer using a single cURL handle, it works fine. When using curl_multi, it doesn't download the full file (the transfer stops somewhere in the middle).
Single thread (works)
$fp = fopen('php://output', 'w');
$ch = curl_init(str_replace(" ", "%20", $url)); //Here is the file we are downloading, replace spaces with %20
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$return = curl_exec($ch); // get curl response
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
curl_close($ch);
fclose($fp);
Multithread (incomplete download)
$bodyStream = fopen('php://output', 'w');
$headerStream = fopen('php://temp', 'rw');
$ch = curl_init(str_replace(" ", "%20", $url)); // Here is the file we are downloading, replace spaces with %20
curl_setopt($ch, CURLOPT_WRITEHEADER, $headerStream);
curl_setopt($ch, CURLOPT_FILE, $bodyStream);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch);

$headerProcessed = false;
ob_start(); // Buffer body output until headers are ready

do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    // Process headers
    if (!$headerProcessed) {
        $currentPos = ftell($headerStream);
        rewind($headerStream);
        $header = stream_get_contents($headerStream);
        fseek($headerStream, $currentPos); // Is this really needed?
        if (strpos($header, "\r\n\r\n") !== false) {
            // Copy headers such as Content-Length etc.
            $this->generateProxyHeader($header);
            // Headers set. Now send output to browser
            $headerProcessed = true;
            ob_end_flush();
        }
    }
}

curl_multi_remove_handle($mh, $ch);
curl_multi_close($mh);
exit; // Download complete, stop processing
The main reason I need the headers prior to output is to catch errors on the backend. (The single-handle approach would always send a 200 OK header, even if the backend responds with 404 or 500, which would effectively corrupt the data in the file.)
How can I ensure the full file is sent to php://output before PHP stops sending data to the browser, while still using curl_multi (so that I can proxy large files, headers included)?
I found the answer in: PHP & curl_multi and CURLOPT_FILE = No File Contents
It seems there is some unexpected buffering behavior when combining CURLOPT_FILE with curl_multi. The workaround is to explicitly call fclose($bodyStream) once the transfer has finished.
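A minimal sketch of that fix applied to the end of the code above (variable names as in the question):

// Transfer loop has finished: remove and close the multi handle as before.
curl_multi_remove_handle($mh, $ch);
curl_multi_close($mh);

// The workaround: explicitly close the body stream so any data curl has
// buffered for CURLOPT_FILE is flushed to php://output before exiting.
fclose($bodyStream);
fclose($headerStream);
exit; // Download complete, stop processing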
Related
In the code below, taken from http://php.net/manual/en/function.curl-multi-init.php, how can I add code before the second request is made (e.g. sleep(5) before cURL sends the request to twitter)?
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();

// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "https://www.google.com");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "https://twitter.com");
curl_setopt($ch2, CURLOPT_HEADER, 0);

// create the multiple cURL handle
$mh = curl_multi_init();

// add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

$active = null;
// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
I'm no PHP guy, or a competent programmer for that matter :D Now that the disclaimer is out there, here's my solution.
There's probably a much cleaner way to do this, but I have limited knowledge of PHP and of how to extend classes. For that reason, I decided to use the built-in process control (pcntl) extension and a helper function to run each cURL request in its own process. I'm sure there are much better programmers out there ready to provide a much cleaner solution though.
<?php
// Helper function: sleep for $delay seconds, then fetch the URL with cURL
function async_curl($url, $delay){
    sleep($delay);
    echo "FORK: Getting $url after $delay seconds\n";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
    // Mute the output for demonstration purposes (the returned content is discarded).
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
}

$urls = array("http://google.com", "http://twitter.com", "http://www.facebook.com");

foreach($urls as $url){
    // Generate a random delay for demonstration purposes.
    $delay = rand(1, 20);
    // Create a forked child process for each URL
    $pid = pcntl_fork();
    if ($pid == -1) {
        // Exit if the fork failed
        exit("Error, failed to create a child process for the URL: $url");
    } else if ($pid == 0) {
        // Child process: call the helper function, then exit
        echo "MAIN: Forking process for $url\nPID: " . getmypid() . "\tDelay: $delay\n";
        async_curl($url, $delay);
        exit();
    }
}

// Wait for all forked processes to complete before exiting.
while (($pid = pcntl_waitpid(0, $status)) > 0) {
    echo "MAIN: Process $pid completed\n";
}
?>
When the following PHP runs, I would like to download the file with cURL on the client side. That is, if one of my website visitors triggers an action that runs this PHP file, the file should be downloaded to the visitor's PC.
I tried different locations, but without any success. When I run my code, it always downloads the file onto my web server, which is not what I want.
<?php
// The resource that we want to download.
$fileUrl = 'https://www.example.com/this-is-a-example-video';

// The path & filename to save to.
$saveTo = 'test.mp4';

// Open file handler.
$fp = fopen($saveTo, 'w+');

// If $fp is FALSE, something went wrong.
if($fp === false){
    throw new Exception('Could not open: ' . $saveTo);
}

// Create a cURL handle.
$ch = curl_init($fileUrl);

// Pass our file handle to cURL.
curl_setopt($ch, CURLOPT_FILE, $fp);

// Timeout if the file doesn't download after 20 seconds.
curl_setopt($ch, CURLOPT_TIMEOUT, 20);

// Execute the request.
curl_exec($ch);

// If there was an error, throw an Exception.
if(curl_errno($ch)){
    throw new Exception(curl_error($ch));
}

// Get the HTTP status code.
$statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

// Close the cURL handler.
curl_close($ch);

if($statusCode == 200){
    echo 'Downloaded!';
} else {
    echo "Status Code: " . $statusCode;
}
?>
How can I change the cURL download process so the file is saved on the client side?
PHP cannot run client-side.
You could use cURL to download data to the server (without saving it to a file) and then output that data to the client.
Don't do this:
//Open file handler.
$fp = fopen($saveTo, 'w+');
or this:
//Pass our file handle to cURL.
curl_setopt($ch, CURLOPT_FILE, $fp);
Then capture the output:
//Execute the request.
curl_exec($ch);
Should be:
//Execute the request.
$output = curl_exec($ch);
Then you can:
echo $output;
… but make sure you set the Content-Type and consider setting the Content-Length response headers. You might also want Content-Disposition.
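Putting that together (a minimal sketch; the MIME type and filename here are assumptions for illustration, adjust them to your file):

$ch = curl_init($fileUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$output = curl_exec($ch);
curl_close($ch);

// Send the headers, then the body, to the visitor's browser.
header('Content-Type: video/mp4');                              // assumed MIME type
header('Content-Length: ' . strlen($output));
header('Content-Disposition: attachment; filename="test.mp4"'); // triggers a save dialog
echo $output;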
Under most circumstances, it would probably be better to simply send the browser to fetch the file directly instead of proxying it through the server.
$fileUrl = 'https://www.example.com/this-is-a-example-video';
header("Location: $fileUrl");
I have an API written in PHP that sends 10 requests with cURL.
The problem is that when I send an HTTP request to the API, I get the response right away, even though the server hasn't finished working (that is, getting the responses for all 10 requests).
I can't use ignore_user_abort() because I need to know exactly when the API finishes.
How can I tell the connection, "hey, wait for the script to finish working"?
Important note: if I use sleep(), the connection holds.
Here's my code: gist
This is just an example to show how ob_start works.
echo "hello";
ob_start(); // output buffering starts here
echo "hello1";
// ...all cURL requests run here...
if ($allCurlRequestsCompleted) { // pseudocode: true once every request is done
    ob_end_flush();
}
With no code to refer to, I can only show how ob_start is used. You will have to adapt this code to your requirements.
$handlers = [];
$mh = curl_multi_init();
ob_start(); // output buffering starts here

foreach($query->fetchAll() as $domain){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://'.$domain['name']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $DEFAULT_REQUEST_TIMEOUT);
    curl_setopt($ch, CURLOPT_TIMEOUT, $DEFAULT_REQUEST_TIMEOUT);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 2);
    curl_multi_add_handle($mh, $ch);
    $handlers[] = ['ch'=>$ch, 'domain_id'=>$domain['domain_id']];
    echo $domain['name'];
}

// Execute the handles
$active = null;
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // Wait for activity on any curl connection
    if (curl_multi_select($mh) == -1) {
        usleep(1);
    }
    // Continue to exec until curl is ready to give us more data
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}

// Extract the content
$values = [];
foreach($handlers as $key => $handle){
    // Check for errors
    echo $key.'. result: ';
    $curlError = curl_error($handle['ch']);
    if($curlError == ""){
        $res = curl_multi_getcontent($handle['ch']);
        echo 'done';
    } else {
        echo "Curl error on handle $key: $curlError".' <br />';
    }
    // Remove and close the handle
    curl_multi_remove_handle($mh, $handle['ch']);
    curl_close($handle['ch']);
}

// Clean up the curl_multi handle
curl_multi_close($mh);
ob_end_flush(); // output flushed here
Source - http://php.net/manual/en/function.ob-start.php
I use this code for my website
ob_start("unique_identifier");
// your header script
// your page script
// your footer script
ob_end_flush("unique_identifier");
ob_end_clean("unique_identifier");
I use "unique_identifier" because inside my script also exists another
ob_start()
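If the goal is just to tell nested buffers apart, ob_get_level() (a standard PHP function, not part of the answer above) reports the current nesting depth; a minimal sketch:

ob_start();           // outer buffer
echo ob_get_level();  // prints the current nesting depth, e.g. 1
ob_start();           // inner buffer
echo ob_get_level();  // depth is now one greater, e.g. 2
ob_end_flush();       // closes the inner buffer only
ob_end_flush();       // closes the outer buffer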
I wrote a PHP script that lets me get the dimensions (width and height) of a remotely hosted JPG without having to download it in full (just the first 10 KB).
The problem with this is that I write the partial download to a file, then read that file back to extract the information I need (using getImageSize).
I know this can be done without writing to disk, but I do not know how.
Does anyone have suggestions/solutions?
Here is my original code:
function remoteImage($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RANGE, "0-10240");
    $fn = "partial.jpg";
    $raw = curl_exec($ch);
    $result = array();
    if(file_exists($fn)){
        unlink($fn);
    }
    if ($raw !== false) {
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($status == 200 || $status == 206) {
            $result["w"] = 0;
            $result["h"] = 0;
            $fp = fopen($fn, 'x');
            fwrite($fp, $raw);
            fclose($fp);
            $size = getImageSize($fn);
            if ($size === false) {
                // Cannot get file size information
            } else {
                // Return width and height
                list($result["w"], $result["h"]) = $size;
            }
        }
    }
    curl_close($ch);
    return $result;
}
My original question, which led to this one, is here and might be helpful.
It may be possible to use a memory file stream.
$fn = 'php://memory';
See: http://php.net/manual/en/wrappers.php.php
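Alternatively (my own suggestion, not part of the answer above; requires PHP 5.4+), since $raw already holds the image bytes in memory thanks to CURLOPT_RETURNTRANSFER, getimagesizefromstring() can read the dimensions with no file or stream at all:

// Inside remoteImage(), replace the fopen/fwrite/getImageSize block with:
$size = getimagesizefromstring($raw); // works directly on the in-memory bytes
if ($size !== false) {
    list($result["w"], $result["h"]) = $size;
}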
I want to get several pages through curl_exec. The first page comes back normally, but all the others return a 302 header. What is the reason?
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, ROOT_URL);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($curl); // here good content
curl_close($curl);

preg_match_all('/href="(\/users\/[^"]+)"[^>]+>\s*/i', $content, $p);

for ($j = 0; $j < count($p[1]); $j++){
    $new_curl = curl_init();
    curl_setopt($new_curl, CURLOPT_URL, NEW_URL.$p[1][$j]);
    curl_setopt($new_curl, CURLOPT_RETURNTRANSFER, 0);
    $content = curl_exec($new_curl); // here 302
    curl_close($new_curl);
    preg_match('/[^#]+#[^"]+/i', $content, $p2);
}
Something like this.
You probably want to provide a sample of your code so we can see if you're omitting something.
A 302 response code typically indicates that the server is redirecting you to a different location (found in the Location response header). Depending on what flags you use, cURL can either follow the redirect automatically, or you can watch for the 302 response and retrieve the new location yourself.
Here is how you would get cURL to follow the redirects (where $ch is the handle to your cURL connection):
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
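And a minimal sketch of handling the redirect yourself (assumes CURLOPT_RETURNTRANSFER is set and CURLOPT_FOLLOWLOCATION is off; CURLINFO_REDIRECT_URL requires PHP 5.3.7+):

$content = curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($code == 302) {
    // Read the Location target and request it manually
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_setopt($ch, CURLOPT_URL, $target);
    $content = curl_exec($ch); // follow the redirect ourselves
}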
You can use curl_multi, which is faster because it fetches the data from all the URLs in parallel.
You can use it like this:
// Initialize
$curlOptions = array(CURLOPT_RETURNTRANSFER => 1); // Add whatever you additionally want.

$curlHandle1 = curl_init($url1);
curl_setopt_array($curlHandle1, $curlOptions);
$curlHandle2 = curl_init($url2);
curl_setopt_array($curlHandle2, $curlOptions);

$mh = curl_multi_init();
curl_multi_add_handle($mh, $curlHandle1);
curl_multi_add_handle($mh, $curlHandle2);

// Run the handles
$running = null;
do {
    $status = curl_multi_exec($mh, $running);
} while ($status == CURLM_CALL_MULTI_PERFORM);

while ($running && $status == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $status = curl_multi_exec($mh, $running);
        } while ($status == CURLM_CALL_MULTI_PERFORM);
    }
}

// Retrieve the results
$response1 = curl_multi_getcontent($curlHandle1);
$status1 = curl_getinfo($curlHandle1);
$response2 = curl_multi_getcontent($curlHandle2);
$status2 = curl_getinfo($curlHandle2);
You can find more information at http://www.php.net/manual/en/function.curl-multi-exec.php
Check out Example #1.