This is a bit of a continuation of my previous thread (PHP CURL Chunked encoding a large file (700mb)), but I've now improvised something else.
Right now, I'm using fread() and sending the file through cURL chunk by chunk (each chunk around 1MB). The idea is good and it does work, but it times out the server, so I was wondering whether there is a way to reduce how many chunks it sends per second, or some other way to keep it from completely overloading my PHP process.
$length = (1024 * 1024) * 1;
$handle = fopen($getFile, "r");
while (($buffer = fread($handle, $length)) !== false) {
    if ($response = sendChunk($getServer, $buffer)) {
        $chunk++;
        print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
    }
}
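One possible way to ease the load (not part of the code above; just a sketch that assumes $getFile, $getServer and sendChunk() as used in this post, with an arbitrary 0.25-second pause) is to usleep() between chunks:
// Sketch only: throttle the upload by sleeping between chunks.
// $getFile, $getServer and sendChunk() are assumed to be defined as in this post.
$length = 1024 * 1024; // 1MB per chunk
$handle = fopen($getFile, "r");
$chunk  = 0;

while (!feof($handle) && ($buffer = fread($handle, $length)) !== false) {
    if ($buffer === '') {
        break; // nothing left to read
    }
    if ($response = sendChunk($getServer, $buffer)) {
        $chunk++;
        print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
    }
    usleep(250000); // pause 0.25s between chunks (arbitrary value)
}

fclose($handle);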
The sendChunk() function is:
function sendChunk($url, $chunk) {
    $POST_DATA = [
        'file' => base64_encode($chunk)
    ];

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_TIMEOUT, 2048);
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $POST_DATA);
    curl_exec($curl);
    $response = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $response;
}
I tried making it read the file line by line, but that doesn't work, since a video file (mp4, wmv) is binary data rather than lines of text.
UPDATE: I have discovered the issue: the timeout was actually Cloudflare timing out because no HTTP response was returned in time. So I decided to run the script over SSH instead, and it worked fine .... except for one thing.
After the file is successfully sent over, the script keeps sending 0-byte chunks in an endless loop, and I was told that's because feof() isn't always accurate at detecting the end of the file. So I tried the (($buffer = fread($handle, $length)) !== false) trick and it still does the same thing. Any ideas?
After working on this for around 8 hours, I noticed that I wasn't using $buffer to send the chunk, so now I have fixed that.
while (!feof($handle) && ($buffer = fread($handle, $length)) !== false) {
    if ($response = sendChunk($getServer, $buffer)) {
        $chunk++;
        print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
    }
}
Everything works fine now. I did some other touch-ups, like checking for a response code of 200, but the core of it works.
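A rough sketch of that kind of check, as the body of the while loop above (the else branch is my own guess at the error handling, which isn't shown in the post):
$response = sendChunk($getServer, $buffer);
if ($response == 200) {
    $chunk++;
    print "Chunk " . $chunk . " Sent (Code: " . $response . ")! \n";
} else {
    // Assumption: simply report the failure; the post doesn't say how errors were handled.
    print "Chunk failed (Code: " . $response . ")! \n";
}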
A lesson for anyone who is using Cloudflare and wants to transfer a file (up to 2GB) to another server via cURL:
There are better ways than cURL to do this in my opinion, but the client requested it be done this way, and it works.
Cloudflare has a maximum upload limit of 250MB for free users, and you cannot work around it with cURL's streaming upload support, because Cloudflare still sees the full size (> 250MB) in the request headers.
When I got this code working, it would still time out on certain chunks, because Cloudflare needs an HTTP response within 100 seconds or it returns a timeout error. Thankfully my script is executed via cron, so it doesn't need to go through Cloudflare at all. However, if you are looking to run this from a browser, you may want to take a look at this: https://github.com/marcialpaulg/Fixing-Cloudflare-Error-524
I am making a website that will check whether another website is live and working. I pass in the URL of the site I would like to check, and the following code checks whether the site is live and returns the HTTP response code as well as true or false.
function urlExists($url = NULL)
{
    if ($url == NULL) return false;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($httpcode == 0) {
        return array(false, $httpcode);
    } else if ($httpcode < 400) {
        return array(true, $httpcode);
    } else {
        return array(false, $httpcode);
    }
}
With one of the sites I am testing, though, I am getting an HTTP response code of 0 even though I know that the site is live and working.
The site is very slow, as it's a large site on a not very powerful server, so response times can vary between 7 and 25 seconds.
Any help would be greatly appreciated.
Thanks,
Sam
Based on these two links:
https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT.html
And
https://curl.haxx.se/libcurl/c/CURLOPT_CONNECTTIMEOUT.html
The first one sets the maximum time the whole request is allowed to take.
The second one sets the timeout for the connect phase.
As you said, the site URL you are hitting takes 7-25 seconds to respond; meanwhile, your cURL request is terminated and closed because of these two time settings.
Increase these two time settings in your code and it will work for you.
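For example (a sketch only; the 10 and 30 second values are arbitrary, chosen to cover the 7-25 second response times mentioned in the question, and $ch is the handle inside urlExists()):
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // time allowed to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // total time allowed for the whole request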
Thanks.
I will offer two alternatives for you to compare; along with your cURL function, you will have three options to see which one is better/faster for you.
Option A (all PHP versions; requires allow_url_fopen to be enabled):
if (!$fp = fopen($url, 'r'))
{
    trigger_error("Unable to open URL ($url)", E_USER_ERROR);
}

$headers = stream_get_meta_data($fp);
fclose($fp);

$http_header_info = $headers['wrapper_data'][0];
$httpCode = (int)substr($http_header_info, 9, 3);
Option B (PHP 5+):
$headers = get_headers($url, 1);
$http_header_info = $headers[0];
$httpCode = substr($http_header_info, 9, 3);
Also, if anyone has benchmarks on these three approaches, I am curious to see which is more appropriate (only for retrieving HTTP response headers, of course).
A code of 0 is often returned when the URL syntax is invalid or the host cannot be found.
You can also call the curl_error($ch) function (http://php.net/manual/en/function.curl-error.php) to determine the error details.
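For example (a minimal sketch, placed right after curl_exec() inside the urlExists() function from the question):
$data = curl_exec($ch);
if ($data === false) {
    // curl_errno()/curl_error() explain why the request failed
    // (e.g. could not resolve host, operation timed out, ...).
    echo 'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch) . "\n";
}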
I want to allow my users to upload a file by providing a URL to the image.
Pretty much like imgur, you enter http://something.com/image.png and the script downloads the file, then keeps it on the server and publishes it.
I tried using file_get_contents() and getimagesize(), but I'm thinking there would be problems:
How can I protect the script from 100 users supplying 100 URLs to large images?
How can I determine whether the download will take, or is already taking, too long?
This is actually interesting.
It appears that you can actually track and control the progress of a cURL transfer. See the documentation on CURLOPT_NOPROGRESS, CURLOPT_PROGRESSFUNCTION, and CURLOPT_WRITEFUNCTION.
I found this example and changed it to:
<?php
file_put_contents('progress.txt', '');

$target_file_name = 'targetfile.zip';
$target_file = fopen($target_file_name, 'w');

$ch = curl_init('http://localhost/so/testfile2.zip');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_NOPROGRESS, FALSE);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progress_callback');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_callback');
curl_exec($ch);

if ($target_file) {
    fclose($target_file);
}

$_download_size = 0;

function progress_callback($download_size, $downloaded_size, $upload_size, $uploaded_size) {
    global $_download_size;
    $_download_size = $download_size;
    static $previous_progress = 0;

    if ($download_size == 0) {
        $progress = 0;
    }
    else {
        $progress = round($downloaded_size * 100 / $download_size);
    }

    if ($progress > $previous_progress) {
        $previous_progress = $progress;
        $fp = fopen('progress.txt', 'a');
        fputs($fp, $progress . '% (' . $downloaded_size . '/' . $download_size . ")\n");
        fclose($fp);
    }
}

function write_callback($ch, $data) {
    global $target_file_name;
    global $target_file;
    global $_download_size;

    if ($_download_size > 1000000) {
        return '';
    }
    return fwrite($target_file, $data);
}
write_callback checks whether the total download size (as reported to the progress callback) is greater than a specified limit. If it is, it returns an empty string, which aborts the transfer. I tested this on two files of 80K and 33M, respectively, with a 1M limit. In your case, progress_callback is pointless beyond its second line, but I kept everything in there for debugging purposes.
One other way to get the size of the data is to do a HEAD request, but note that servers are not required to send a Content-Length header.
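A sketch of that approach (the URL is just the example from the question, and the 1000000-byte limit mirrors the one used above; it only works when the server actually reports a Content-Length):
$ch = curl_init('http://something.com/image.png'); // example URL from the question
curl_setopt($ch, CURLOPT_NOBODY, true);            // HEAD-style request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if the server didn't say
curl_close($ch);

if ($length > 0 && $length <= 1000000) {
    // Small enough: go ahead and download the file for real.
}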
To answer question one, you simply need to add the appropriate limits in your code. Define how many requests you want to accept in a given amount of time, track your requests in a database, and go from there. Also put a cap on file size.
For question two, you can set appropriate timeouts if you use cURL.
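A minimal sketch of that (the variable names, timeout values and target path below are my own placeholders, not from this answer):
$ch = curl_init($userUrl);                      // $userUrl: the URL the user submitted (placeholder)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // give up if no connection within 10 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 60);          // give up if the whole transfer takes over 60 seconds
$image = curl_exec($ch);
$code  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($image !== false && $code == 200) {
    file_put_contents('uploads/image.png', $image); // hypothetical target path
}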
I'm trying to find a way to only quickly access a file and then disconnect immediately.
So I've decided to use cURL since it's the fastest option for me. But I can't figure out how I should "disconnect" cURL.
With the code below, Apache's access log says that the file I tried accessing was indeed accessed, but I'm feeling a little iffy about this, because when I run the while loop without breaking out of it, it just keeps looping. Shouldn't the loop stop when cURL has finished fetching the file? Or am I just being silly, and the loop is simply restarting constantly?
<?php
$Resource = curl_init();
curl_setopt($Resource, CURLOPT_URL, '...');
curl_setopt($Resource, CURLOPT_HEADER, 0);
curl_setopt($Resource, CURLOPT_USERAGENT, '...');
while (curl_exec($Resource)) {
    break;
}
curl_close($Resource);
?>
I tried setting the CURLOPT_CONNECTTIMEOUT_MS / CURLOPT_CONNECTTIMEOUT options to very small values, but it didn't help in this case.
Is there a more "proper" way of doing this?
This statement is superfluous:
while (curl_exec($Resource)) {
    break;
}
Instead just keep the return value for future reference:
$result = curl_exec($Resource);
The while loop does not accomplish anything. Now to your question: you can tell cURL to take only some bytes from the body and then quit. That can be achieved by reducing CURLOPT_BUFFERSIZE to a small value and by using a callback function to tell cURL to stop:
$withCallback = array(
    CURLOPT_BUFFERSIZE    => 20, # ~ value of bytes you'd like to get
    CURLOPT_WRITEFUNCTION => function ($handle, $data) {
        echo "WRITE: (", strlen($data), ") $data\n";
        return 0;
    },
);

$handle = curl_init("http://stackoverflow.com/");
curl_setopt_array($handle, $withCallback);
curl_exec($handle);
curl_close($handle);
Output:
WRITE: (10) <!DOCTYPE
Another alternative is to make a HEAD request by using CURLOPT_NOBODY, which never fetches the body. But then it's not a GET request any more.
The connect timeout settings control how long the connect phase may take before it times out. The connect phase lasts until the server has accepted the connection and cURL knows that it has. It is not related to the phase in which cURL fetches data from the server; that is governed by CURLOPT_TIMEOUT, the maximum number of seconds to allow cURL functions to execute.
You will find a long list of available options in the PHP manual: curl_setopt.
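If the goal is really just to touch the URL and disconnect quickly, one more option (not mentioned in the answer above; a sketch, and the 500 ms cap is an arbitrary value) is to put a hard millisecond limit on the whole transfer:
$handle = curl_init('http://stackoverflow.com/'); // same example URL as above
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_TIMEOUT_MS, 500);    // abort the whole transfer after 500 ms
curl_exec($handle);                               // returns false once the cap is hit
curl_close($handle);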
Perhaps this might be helpful:
$GLOBALS["dataread"] = 0;
define("MAX_DATA", 3000); // how many bytes should be read?

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "handlewrite");
curl_exec($ch);
curl_close($ch);

function handlewrite($ch, $data)
{
    $GLOBALS["dataread"] += strlen($data);
    echo "READ " . strlen($data) . " bytes\n";
    if ($GLOBALS["dataread"] > MAX_DATA) {
        return 0;
    }
    return strlen($data);
}
This is my code:
function get_remote_file_to_cache() {
    $sites_array = array("http://www.php.net", "http://www.engadget.com", "http://www.google.se", "http://arstechnica.com", "http://wired.com");
    $the_site = $sites_array[rand(0, 4)];

    $curl = curl_init();
    $fp = fopen("rr.txt", "w");
    curl_setopt($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_FILE, $fp);
    curl_exec($curl);
    curl_close($curl);
}
$cache_file = 'rr.txt';
$cache_life = '15'; // caching time, in seconds

$filemtime = @filemtime($cache_file);
if (!$filemtime or (time() - $filemtime >= $cache_life)) {
    ob_start();
    echo file_get_contents($cache_file);
    ob_get_flush();
    echo " <br><br><h1>Writing to cache</h1>";
    get_remote_file_to_cache();
} else {
    echo "<h1>Reading from cache file:</h1><br> ";
    ob_start();
    echo file_get_contents($cache_file);
    ob_get_flush();
}
Everything works as it should, with no problems or surprises, and as you can see it's pretty simple code, but I am new to cURL and would just like to add one check to the code, but I don't know how:
Is there any way to check that the file fetched from the remote site is not a 404 (not found) page or the like, but came back with status code 200 (successful)?
So basically, only write to the cache file if the response has status code 200.
Thanks!
To get the status code from a cURL handle, use curl_getinfo after curl_exec:
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
But the cached file will be overwritten as soon as
$fp = fopen("rr.txt", "w");
is called, regardless of the HTTP code. This means that to update the cache only when the status is 200, you need to read the contents into memory or write to a temporary file, and only then write to the real file if the status is 200.
It is also a good idea to call
touch('rr.txt');
before executing cURL, so that another request arriving before the current operation finishes will not also try to fetch the remote page.
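A sketch of that idea, adapted from the function in the question (the temporary file name is my own choice, not from the answer):
function get_remote_file_to_cache() {
    $sites_array = array("http://www.php.net", "http://www.engadget.com", "http://www.google.se", "http://arstechnica.com", "http://wired.com");
    $the_site = $sites_array[rand(0, 4)];

    touch('rr.txt'); // bump the mtime so parallel requests don't also try to re-fetch

    $tmp  = 'rr.tmp'; // write here first instead of straight into the cache file
    $fp   = fopen($tmp, "w");
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_FILE, $fp);
    curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    fclose($fp);

    if ($status == 200) {
        rename($tmp, 'rr.txt'); // only replace the cache on success
    } else {
        unlink($tmp);           // discard the failed download
    }
}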
Try this after curl_exec():
$httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
I have a simple download function in a class that might be dealing with files of many hundreds of megabytes at a time from an Amazon Web Services bucket. The whole file cannot be loaded into memory at once, so it must be streamed directly to a file pointer. This is my understanding as this is the first time I've dealt with this issue and I'm picking things up as I go along.
I've ended up with this, based on a 4 KB file buffer which simple testing showed was a good size:
$fs = fsockopen($host, 80, $errno, $errstr, 30);

if (!$fs) {
    $this->writeDebugInfo("FAILED ", $errstr . '(' . $errno . ')');
} else {
    $out = "GET $file HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fs, $out);

    $fm = fopen($temp_file_name, "w");
    stream_set_timeout($fs, 30);

    while (!feof($fs) && ($debug = fgets($fs)) != "\r\n"); // ignore headers

    while (!feof($fs)) {
        $contents = fgets($fs, 4096);
        fwrite($fm, $contents);
        $info = stream_get_meta_data($fs);
        if ($info['timed_out']) {
            break;
        }
    }

    fclose($fm);
    fclose($fs);

    if ($info['timed_out']) {
        // Delete temp file if fails
        unlink($temp_file_name);
        $this->writeDebugInfo("FAILED - Connection timed out: ", $temp_file_name);
    } else {
        // Move temp file if succeeds
        $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
        rename($temp_file_name, $media_file_name);
        $this->writeDebugInfo("SUCCESS: ", $media_file_name);
    }
}
In testing it's fine. However, I have gotten into a conversation with someone who says that I am not understanding how fgets() and feof() work together, and he mentions chunked encoding as a more efficient method.
Is the code generally OK, or am I missing something vital here? What is the benefit that chunked encoding will give me?
Your solution seems fine to me, however I have a few comments.
1) Don't build the HTTP request yourself, i.e. don't hand-craft the raw request. Instead, use something like cURL. This is more foolproof and will support a wider range of responses the server might reply with. Additionally, cURL can be set up to write directly to a file, saving you from doing it yourself.
2) Using fgets() may be a problem if you are reading binary data. fgets() reads up to the end of a line, and with binary data this may corrupt your download. Instead, I suggest fread($fs, 4096);, which handles both text and binary data.
3) Chunked encoding is a way for a web server to send you the response in multiple chunks. I don't think this is very useful to you; however, a better option the web server might support is gzip encoding. This would allow the web server to compress the response on the fly. If you use a library like cURL, it can tell the server it supports gzip and then automatically decompress the response for you.
I hope this helps.
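For point 3, a minimal sketch of letting cURL negotiate gzip (not from this answer; the URL is just an example):
$ch = curl_init('http://example.com/bigfile.bin'); // example URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// An empty string tells the server cURL accepts all encodings it supports
// (gzip, deflate, ...) and makes cURL decompress the response transparently.
curl_setopt($ch, CURLOPT_ENCODING, '');
$data = curl_exec($ch);
curl_close($ch);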
Don't deal with sockets yourself; simplify your code and use the cURL library (PHP cURL), like this:
$url = 'http://'.$host.'/'.$file;
// create a new cURL resource
$fh = fopen ($temp_file_name, "w");
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FILE, $fh);
//curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// fetch the URL and write the response to the file
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
fclose($fh);
And here is the final result, in case it helps anyone else. I also wrapped the whole thing in a retry loop to decrease the risk of a completely failed download, though it does increase resource use:
$download_attempt = 0;               // retry counter used in the loop conditions below
$info = array('timed_out' => false); // so the while condition is defined even if fopen() fails

do {
    $fs = fopen('http://' . $host . $file, "rb");

    if (!$fs) {
        $this->writeDebugInfo("FAILED ", 'could not open http://' . $host . $file);
    } else {
        $fm = fopen($temp_file_name, "w");
        stream_set_timeout($fs, 30);

        while (!feof($fs)) {
            $contents = fread($fs, 4096); // Buffered download
            fwrite($fm, $contents);
            $info = stream_get_meta_data($fs);
            if ($info['timed_out']) {
                break;
            }
        }

        fclose($fm);
        fclose($fs);

        if ($info['timed_out']) {
            // Delete temp file if fails
            unlink($temp_file_name);
            $this->writeDebugInfo("FAILED on attempt " . $download_attempt . " - Connection timed out: ", $temp_file_name);
            $download_attempt++;
            if ($download_attempt < 5) {
                $this->writeDebugInfo("RETRYING: ", $temp_file_name);
            }
        } else {
            // Move temp file if succeeds
            $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
            rename($temp_file_name, $media_file_name);
            $this->newDownload = true;
            $this->writeDebugInfo("SUCCESS: ", $media_file_name);
        }
    }
} while ($download_attempt < 5 && $info['timed_out']);