I have a file uploading script running on my server which also supports remote uploads. Everything works fine, but I am wondering what the best way is to upload via URL. Right now I am using fopen to get the file from the remote URL pasted into the text box named "from". I have heard that fopen isn't the best way to do it. Why is that?
Also, I am using file_get_contents to get the file size of the file from the URL. I have heard that curl is better for that part. Why is that, and how can I apply these changes to this script?
<?php
$from = htmlspecialchars(trim($_POST['from']));
if ($from != "") {
$file = file_get_contents($from);
$filesize = strlen($file);
while (!feof($file)) {
$move = "./uploads/" . $rand2;
move_upload($_FILES['from']['tmp_name'], $move);
$newfile = fopen("./uploads/" . $rand2, "wb");
file_put_contents($newfile, $file);
}
}
?>
You can use filesize to get the file size of a file on disk.
file_get_contents actually pulls the whole file into memory, so $filesize = strlen(file_get_contents($from)); has already downloaded the file; you just don't do anything with it other than find its size. You can substitute file_put_contents for your fwrite call.
See: file_get_contents and file_put_contents.
curl is used when you need more access to the HTTP protocol. There are many questions and examples on StackOverflow using curl in PHP.
So we can first download the file (in this example I will use file_get_contents), get its size, then put the file in a directory on your local disk.
$tmpFile = file_get_contents($from);
$fileSize = strlen($tmpFile);
// you could do a check for file size here
$newFileName = "./uploads/$rand2";
file_put_contents($newFileName, $tmpFile);
In your code you have move_upload($_FILES['from']['tmp_name'], $move);, but $_FILES is only populated when the request comes from an <input type="file"> element, which it doesn't seem you have. (The built-in function is also called move_uploaded_file, not move_upload.)
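For contrast, a browser upload (the only case where $_FILES is populated) would look roughly like this. This is just a sketch; the field name "upload" and the target directory are my assumptions:
// the HTML form would need: <input type="file" name="upload">
if (isset($_FILES['upload']) && $_FILES['upload']['error'] === UPLOAD_ERR_OK) {
    $target = "./uploads/" . basename($_FILES['upload']['name']);
    // move_uploaded_file() is the function to use for locally uploaded files
    move_uploaded_file($_FILES['upload']['tmp_name'], $target);
}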
P.S. You should probably whitelist the characters you allow in a filename, for instance $goodFilename = preg_replace("/[^a-zA-Z0-9]+/", "-", $filename); This is often easier to read and safer than trying to blacklist bad characters.
Replace:
while (!feof($file)) {
    $move = "./uploads/" . $rand2;
    move_upload($_FILES['from']['tmp_name'], $move);
    $newfile = fopen("./uploads/" . $rand2, "wb");
    file_put_contents($newfile, $file);
}
With:
$newFile = "./uploads/" . $rand2;
file_put_contents($newFile, $file);
The whole file is read in by file_get_contents, and the whole file is written by file_put_contents.
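Putting it together, a minimal corrected version of the script (just a sketch; $rand2 is assumed to already hold your generated file name) could look like this:
$from = trim($_POST['from']);
if ($from != "") {
    $file = file_get_contents($from);   // the whole file is read into memory here
    $filesize = strlen($file);
    // you could reject files above a certain size here
    file_put_contents("./uploads/" . $rand2, $file);
}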
As far as I understand your question: you want to get the filesize of a remote file given by a URL, and you're not sure which solution is best/fastest.
At first, the biggest difference between CURL, file_get_contents() and fread() in this context is that CURL and file_get_contents() put the whole thing into memory, while fopen() gives you more control over which parts of the file you want to read. I think fopen() and file_get_contents() are nearly equivalent in your case, because you're dealing with small files and you actually want the whole file, so it doesn't make any difference in terms of memory usage.
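For completeness, here is what a chunked copy with fopen() would look like; this is a sketch that only pays off for files too large to hold in memory:
$in  = fopen($from, 'rb');
$out = fopen("./uploads/" . $rand2, 'wb');
$bytes = 0;
while (!feof($in)) {
    $chunk = fread($in, 8192);   // 8 KiB at a time instead of the whole file
    $bytes += strlen($chunk);
    fwrite($out, $chunk);
}
fclose($in);
fclose($out);
// $bytes now holds the file size without the file ever being fully in memory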
CURL is just the big brother of file_get_contents(). It is actually a complete HTTP client rather than a thin wrapper around a few simple functions.
And talking about HTTP: don't forget there's more to HTTP than GET and POST. Why not use the resource's metadata to check its size before you even fetch it? That's one thing the HTTP method HEAD is meant for. PHP even comes with a built-in function for getting the headers: get_headers(). It has some flaws though: it still sends a GET request by default, which probably makes it a little slower, and it follows redirects, which may cause security issues. But you can fix this pretty easily by adjusting the default context:
$opts = array(
    'http' => array(
        'method'        => 'HEAD',
        'max_redirects' => 1,
        'ignore_errors' => true
    )
);
stream_context_set_default($opts);
Done. Now you can simply get the headers:
$headers = get_headers('http://example.com/pic.png', 1);
//set the keys to lowercase so we don't have to deal with lower- and upper case
$lowerCaseHeaders = array_change_key_case($headers);
// 'content-length' is the header we're interested in:
$filesize = $lowerCaseHeaders['content-length'];
NOTE: filesize() will not work on a http / https stream wrapper, because stat() is not supported (http://php.net/manual/en/wrappers.http.php).
And that's pretty much it. Of course you can achieve the same with CURL just as easily if you like it better; the approach would be the same (reading the headers).
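For example, a HEAD-style request with CURL (a rough sketch) would be:
$ch = curl_init('http://example.com/pic.png');
curl_setopt($ch, CURLOPT_NOBODY, true);           // send a HEAD request, skip the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_exec($ch);
$filesize = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if the server didn't send it
curl_close($ch);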
And here's how you get the file and its size (after downloading) with CURL:
// Create a CURL handle
$ch = curl_init();
// Set all the options on this handle
// find a full list on
// http://au2.php.net/manual/en/curl.constants.php
// http://us2.php.net/manual/en/function.curl-setopt.php (for actual usage)
curl_setopt($ch, CURLOPT_URL, 'http://example.com/pic.png');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Send the request and store what is returned to a variable
// This actually contains the raw image data now, you could
// pass it to e.g. file_put_contents();
$data = curl_exec($ch);
// get the required info about the request
// find a full list on
// http://us2.php.net/manual/en/function.curl-getinfo.php
$filesize = curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD);
// close the handle after you're done
curl_close($ch);
Pure PHP approach: http://codepad.viper-7.com/p8mlOt
Using CURL: http://codepad.viper-7.com/uWmsYB
For a nicely formatted and human readable output of the file size I've learned this amazing function from Laravel:
function get_file_size($size)
{
    $units = array('Bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB');
    return @round($size / pow(1024, ($i = floor(log($size, 1024)))), 2) . ' ' . $units[$i];
}
If you don't want to deal with all this, you should check out Guzzle. It's a very powerful and extremely easy-to-use library for any kind of HTTP work.
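As a rough sketch (assuming Guzzle 6+ installed via Composer), reading the Content-Length with a HEAD request looks like:
require 'vendor/autoload.php';

$client = new \GuzzleHttp\Client();
$response = $client->head('http://example.com/pic.png');
$filesize = $response->getHeaderLine('Content-Length'); // empty string if the server doesn't send it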
Related
I'm using Backblaze B2 to store files and am using their documentation code to upload via their API. However, their code uses fread to read the file, which is causing issues for files larger than 100MB as it tries to load the entire file into memory. Is there a better way to do this that doesn't try to load the entire file into RAM?
$file_name = "file.txt";
$my_file = "<path-to-file>" . $file_name;
$handle = fopen($my_file, 'r');
$read_file = fread($handle,filesize($my_file));
$upload_url = ""; // Provided by b2_get_upload_url
$upload_auth_token = ""; // Provided by b2_get_upload_url
$bucket_id = ""; // The ID of the bucket
$content_type = "text/plain";
$sha1_of_file_data = sha1_file($my_file);
$session = curl_init($upload_url);
// Add read file as post field
curl_setopt($session, CURLOPT_POSTFIELDS, $read_file);
// Add headers
$headers = array();
$headers[] = "Authorization: " . $upload_auth_token;
$headers[] = "X-Bz-File-Name: " . $file_name;
$headers[] = "Content-Type: " . $content_type;
$headers[] = "X-Bz-Content-Sha1: " . $sha1_of_file_data;
curl_setopt($session, CURLOPT_HTTPHEADER, $headers);
curl_setopt($session, CURLOPT_POST, true); // HTTP POST
curl_setopt($session, CURLOPT_RETURNTRANSFER, true); // Receive server response
$server_output = curl_exec($session); // Let's do this!
curl_close ($session); // Clean up
echo ($server_output); // Tell me about the rabbits, George!
I have tried using:
curl_setopt($session, CURLOPT_POSTFIELDS, array('file' => '@'.realpath('file.txt')));
However I get an error response: Error reading uploaded data: SocketTimeoutException(Read timed out)
Edit: Streaming the filename within the CURL call also doesn't seem to work.
The issue you are having is related to this.
fread($handle,filesize($my_file));
With the filesize in there you might as well just do file_get_contents. It's much better memory-wise to read one line at a time with fgets:
$handle = fopen($myfile, 'r');
while (!feof($handle)) {
    $line = fgets($handle);
}
This way you only read one line into memory, but if you need the full file contents you will still hit a bottleneck.
The only real way is to stream the upload.
I did a quick search and it seems the default for CURL is to stream the file if you give it the filename
$post_data['file'] = 'myfile.csv';
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
You can see the previous answer for more details
Is it possible to use cURL to stream upload a file using POST?
So as long as you can get past the sha1_file, it looks like you can just stream the file, which should avoid the memory issues. There may be issues with the time limit, though. Also, I can't really think of a way around getting the hash if that fails.
Just FYI, personally I have never tried this; typically I just use SFTP for large file transfers. So I don't know if it has to be specifically $post_data['file'], I just copied that from the other answer.
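If passing a plain filename doesn't behave, another option (an untested sketch on my part, reusing the variables from the question) is to hand curl an open file handle and let it stream the request body itself:
$handle = fopen($my_file, 'rb');
curl_setopt($session, CURLOPT_UPLOAD, true);            // stream the body from CURLOPT_INFILE
curl_setopt($session, CURLOPT_CUSTOMREQUEST, 'POST');   // B2 expects POST rather than PUT
curl_setopt($session, CURLOPT_INFILE, $handle);
curl_setopt($session, CURLOPT_INFILESIZE, filesize($my_file));
// keep the same $headers / CURLOPT_HTTPHEADER setup as in the question
$server_output = curl_exec($session);
fclose($handle);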
Good luck...
UPDATE
Seeing as streaming seems to have failed (see comments).
You may want to test the streaming to make sure it works. I don't know exactly what that would involve, maybe stream a file to your own server? I am also not sure why it wouldn't work "as advertised", and you may have tested it already. But it never hurts to test something; never assume something works until you know for sure. It's very easy to try something new as a solution, only to miss a setting or get a path wrong, and then fall back to thinking it's all caused by the original issue.
I've spent a lot of time tearing things apart only to realize I had a spelling error. I'm pretty adept at programming these days, so I typically overthink the errors too. My point is: be sure it's not a simple mistake before moving on.
Assuming everything is set up right, I would try file_get_contents. I don't know if it will be any better, but it's more meant for opening whole files. It would also seem more readable in the code, because then it's clear that the whole file is needed. It just seems more semantically correct, if nothing else.
You can also increase the RAM PHP has access to by using
ini_set('memory_limit', '512M')
You can even go higher than that, depending on your server. The highest I went before was 3G, but the server I use has 54GB of RAM and that was a one-time thing (we migrated 130 million rows from MySQL to MongoDB; the InnoDB index was eating up 30+GB). Typically I run with 512M and have some scripts that routinely need 1G. But I wouldn't just up the memory willy-nilly; that is usually a last resort for me after optimizing and testing. We do a lot of heavy processing, which is why we have such a big server; we also have 2 slave servers (among other things) that run with 16GB each.
As far as what size to set, typically I increase it by 128M until it works, then add an extra 128M just to be sure, but you might want to go in smaller steps. Typically people always use multiples of 8, but I don't know if that makes much difference these days.
Again, Good Luck.
I will download a lot of images (20,000+) from a website to my server, and I'm trying to figure out the best way to do this since there are so many images to download.
Currently I have the code below which works in testing. But is there a better solution or should I use some software to do this?
foreach ($products as $product) {
    $url = $product->img;
    $imgName = $product->product_id;
    $path = "images/";
    $img = $path . $imgName . ".png";
    file_put_contents($img, file_get_contents($url));
}
Also, is there a chance that I will break something or crash the website when I download that many images at once?
First off, I agree with @Rudy Palacois here; wget would probably be better. That said, if you want to do it in PHP, curl would be much faster than file_get_contents, for 2 reasons.
1: Unlike file_get_contents, curl can reuse the same connection to download multiple files, while file_get_contents will create and close a new connection for each download. That takes time, so curl will be faster (as long as you're not using CURLOPT_FORBID_REUSE / CURLOPT_FRESH_CONNECT, anyway).
2: curl stops the download once the number of bytes given in the Content-Length HTTP header has been downloaded, but file_get_contents completely ignores this header and keeps downloading everything it can until the connection is closed. This can again be much slower than curl's approach, because it's up to the web server when the connection will close; on some servers it's A LOT slower than reading Content-Length bytes.
(And generally, curl is faster than file_get_contents because curl supports compressed transfers, gzip and deflate, which file_get_contents does not... but that's generally not applicable for images, as most common image formats are already pre-compressed. Notable exceptions include .bmp images, though.)
like this:
$ch = curl_init ();
curl_setopt ( $ch, CURLOPT_ENCODING, '' ); // if you're downloading files that benefit from compression (like .bmp images), this line enables compressed transfers.
foreach ($products as $product) {
    $url = $product->img;
    $imgName = $product->product_id;
    $path = "images/";
    $img = $path . $imgName . ".png";
    $img = fopen($img, 'wb');
    curl_setopt_array($ch, array(
        CURLOPT_URL  => $url,
        CURLOPT_FILE => $img
    ));
    curl_exec($ch);
    fclose($img);
    // file_put_contents($img, file_get_contents($url));
}
curl_close ( $ch );
edit: fixed a code-breaking typo, it's called CURLOPT_FILE, not CURLOPT_OUTFILE
edit 2: CURLOPT_FILE wants a file resource, not a filepath, fixed that too x.x
If you have access to a shell, you could use wget. I mean, the main problem with PHP, if you are executing this code from a browser, is the execution time: it will stop after a few minutes, or it can keep loading forever and get stuck. But if you have a complete URL and a pattern, as I can see, you can create a file with the URLs, one URL per line (list.txt, for example) and then execute
wget -i list.txt
Check this answer too: https://stackoverflow.com/a/14578517/5415074
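As a small sketch of that idea (reusing the $products collection from the question), you could build list.txt from PHP and then let wget do the downloading:
$list = '';
foreach ($products as $product) {
    $list .= $product->img . "\n";   // one URL per line
}
file_put_contents('list.txt', $list);
// then from the shell: wget -i list.txt -P images/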
I have a php script that needs to determine the size of a file on the file system after being manipulated by a separate php script.
For example, there exists a zip file that has a fixed size but gets an additional file of unknown size inserted into it based on the user that tries to access it. So the page that's serving the file is something like getfile.php?userid=1234.
So far, I know this:
filesize('getfile.php'); //returns the actual file size of the php file, not the result of script execution
readfile('getfile.php'); //same as filesize()
filesize('getfile.php?userid=1234'); //returns false, as it can't find the file matching the name with GET vars attached
readfile('getfile.php?userid=1234'); //same as filesize()
Is there a way to read the result size of the php script instead of just the php file itself?
filesize
As of PHP 5.0.0, this function can also be used with some URL
wrappers.
something like
filesize('http://localhost/getfile.php?userid=1234');
should be enough
Someone had posted an option for using curl to do this but removed their answer after a downvote. Too bad, because it's the one way I've gotten this to work. So here's their answer that worked for me:
$ch = curl_init('http://localhost/getfile.php?userid=1234');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //This was not part of the poster's answer, but I needed to add it to stop the fetched file from being output by the requesting script
curl_exec($ch);
$size = 0;
if (!curl_errno($ch))
{
    $info = curl_getinfo($ch);
    $size = $info['size_download'];
}
curl_close($ch);
echo $size;
The only way to get the size of the output is to run the script and then look. Depending on the input the result might differ, though for practical use the best thing to do is to estimate based on what you know: i.e. if you have a 5MB file and add another 5KB of user-specific content, it's still about 5MB in the end, etc.
To expand on Ivan's answer:
Your string is 'getfile.php' with or without GET parameters; this is being treated as a local file, and therefore it retrieves the filesize of the PHP file itself.
It is being treated as a local file because it isn't starting with the http protocol. See http://us1.php.net/manual/en/wrappers.php for supported protocols.
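A short illustration of the difference (assuming allow_url_fopen is enabled and the script is reachable over HTTP):
// treated as a local path: size of the PHP source file itself
$sourceSize = filesize('getfile.php');

// goes through the http:// wrapper: size of the script's output
$outputSize = filesize('http://localhost/getfile.php?userid=1234');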
When using filesize() I got a warning:
Warning: filesize() [function.filesize]: stat failed for ...link... in ..file... on line 233
Instead of filesize() I found two working options to replace it:
1)
$headers = get_headers($pdfURL, 1);
$fileSize = $headers['Content-Length'];
echo $fileSize;
2)
echo strlen(file_get_contents($pdfURL));
Now it's working fine.
I've written a script that searches through existing legal case dockets for things like "motion to intervene" and "motion to compel". If the regular expression returns true, then it looks to see if there is a scanned image of the document online for public use. That image is a TIFF file, but not an ordinary TIFF file. Here is a link to an example of what I'm trying to copy to my own server.
http://www.oscn.net/applications/oscn/getimage.tif?submitted=true&casemasterid=2565129&db=OKLAHOMA&barcode=1012443256
Here is the error you get if you only try to look at http://www.oscn.net/applications/oscn/getimage.tif
It is a TIFF file, but a dynamically generated one. I've used fopen(), CURL, etc. without success. I've used these types of functions with JPG images from random sites just to check that my server allows this type of thing, and it worked.
I don't have PDFlib installed on the server (I checked the PEAR and it's not available there either, though I'm not 100% sure that is where it would be.) My host uses cPanel. The server is running Apache. I'm not sure where else to look for a solution to this problem.
I've seen some solutions that used PDFlib, but each of those grabbed a normal TIFF image, not one that was dynamically created. My thought, though, is that it shouldn't matter as long as I can get the image data to stream; shouldn't I be able to use fopen() and write or buffer that data into my own .tif file?
Thanks for any input and Happy Thanksgiving!
UPDATE: The issue wasn't with CURL, it was with the URL I scraped to pass to CURL. When I printed the $url to the screen it looked right, but it wasn't. Somewhere & was turned into &amp;, which then threw off CURL because it was fetching an invalid URL (invalid at least according to the remote server where the TIF file is).
For those of you finding this later, here is the script that works perfectly.
//*******************************************************************************
$url = 'http://www.oscn.net/applications/oscn/getimage.tif';
$url .= '?submitted=true&casemasterid=2565129&db=OKLAHOMA&barcode=1016063497';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // set the url
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // get the transfer as a string, rather than output it directly
print "Attempting to fetch file...\n";
$img = curl_exec($ch); // get the image
//I used the time() so that in testing I would know when a new file was created rather than always overwriting the old file. This will be changed for final version
if ($img) {
    $fh = fopen('oscn_docs/' . time() . '.tif', 'w'); // this will simply overwrite the file. If that's not what you want to do, you'll have to change the 'w' argument!
    if ($fh) {
        $byteswritten = fwrite($fh, $img);
        fclose($fh);
    } else {
        print "Unable to open file.\n";
    }
} else {
    print "Unable to fetch file.\n";
}
print "Done.\n";
exit(0);
//*******************************************************************************
jarod
Is it possible to stream an FTP upload with PHP? I have files I need to upload to another server, and I can only access that server through FTP. Unfortunately, I can't up the timeout time on this server. Is it at all possible to do this?
Basically, if there is a way to write part of a file, and then append the next part (and repeat) instead of uploading the whole thing at once, that'd save me. However, my Googling hasn't provided me with an answer.
Is this achievable?
OK then... This might be what you're looking for. Are you familiar with curl?
CURL can support appending for FTP:
curl_setopt($ch, CURLOPT_FTPAPPEND, TRUE ); // APPEND FLAG
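A fuller sketch of appending one chunk over FTP with curl (credentials and paths are placeholders) might look like this:
$fh = fopen('php://temp', 'r+');
fwrite($fh, $chunk);
rewind($fh);

$ch = curl_init('ftp://username:password@hostname/path/to/file');
curl_setopt($ch, CURLOPT_UPLOAD, true);
curl_setopt($ch, CURLOPT_FTPAPPEND, true);   // append to the remote file instead of overwriting it
curl_setopt($ch, CURLOPT_INFILE, $fh);
curl_setopt($ch, CURLOPT_INFILESIZE, strlen($chunk));
curl_exec($ch);
curl_close($ch);
fclose($fh);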
The other option is to use ftp:// / ftps:// streams; since PHP 5 they allow appending. See the ftp:// and ftps:// wrapper docs. They might be easier to work with.
The easiest way to append a chunk to the end of a remote file is to use file_put_contents with FILE_APPEND flag:
file_put_contents('ftp://username:password@hostname/path/to/file', $chunk, FILE_APPEND);
If it does not work, it's probably because you do not have URL wrappers enabled in PHP.
If you need greater control over the writing (transfer mode, passive mode, etc.), or you cannot use file_put_contents, use ftp_fput with a handle to the php://temp (or the php://memory) stream:
$conn_id = ftp_connect('hostname');
ftp_login($conn_id, 'username', 'password');
ftp_pasv($conn_id, true);
$h = fopen('php://temp', 'r+');
fwrite($h, $chunk);
rewind($h);
// prevent ftp_fput from seeking local "file" ($h)
ftp_set_option($conn_id, FTP_AUTOSEEK, false);
$remote_path = '/path/to/file';
$size = ftp_size($conn_id, $remote_path);
$r = ftp_fput($conn_id, $remote_path, $h, FTP_BINARY, $size);
fclose($h);
ftp_close($conn_id);
(add error handling)