PHP: Receive and store large binary data

My PHP script receives large amounts of data (100 - 500 MB) from a client. I want the script to run fast without using too much memory.
To save traffic, I don't use Base64 or form data. I send the binary data directly in a POST request.
The data consists of two parts: a 2000-byte header, and the rest, which has to be stored as a file on the server.
$fle = file_get_contents("php://input",FALSE,NULL,2000);
file_put_contents("file.bin", $fle);
The problem is that file_get_contents ignores the offset parameter and reads the data from byte 0. Is there a better way to do it?
I don't want to read the whole data and slice off the last N-2000 bytes, as I am afraid it would use too much memory.

Use the lower-level file IO functions and read/write a little bit at a time.
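$bufsz = 4096;
$fi = fopen("php://input", "rb");
$fo = fopen("file.bin", "wb");
// Skip the 2000-byte header by reading it; this works even if the stream cannot seek.
$header = stream_get_contents($fi, 2000);
// Compare against false explicitly so a chunk such as "0" does not end the loop early.
while (!feof($fi) && ($buf = fread($fi, $bufsz)) !== false) {
    fwrite($fo, $buf);
}
fclose($fi);
fclose($fo);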
This will read/write in 4kB chunks.
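As a minimal sketch of the same idea, the header can be consumed by reading it and the remainder handed to stream_copy_to_stream(), which copies in internal chunks. The helper name copyBodyAfterHeader() and its parameters are hypothetical, chosen only for illustration; it assumes both arguments are already-open stream resources.

```php
<?php
/**
 * Discard the first $headerLen bytes of $in, then copy the rest to $out.
 * Returns the number of body bytes copied.
 * copyBodyAfterHeader() is a hypothetical helper name, not a PHP built-in.
 */
function copyBodyAfterHeader($in, $out, int $headerLen): int
{
    // Consume the header by reading it, so no seek support is required.
    $remaining = $headerLen;
    while ($remaining > 0 && !feof($in)) {
        $piece = fread($in, min(8192, $remaining));
        if ($piece === false || $piece === '') {
            break; // nothing more to read
        }
        $remaining -= strlen($piece);
    }

    // Let PHP move the remaining bytes without building one large string.
    $copied = stream_copy_to_stream($in, $out);
    if ($copied === false) {
        throw new RuntimeException('Copy failed');
    }
    return $copied;
}
```

For the case in the question it might be called as copyBodyAfterHeader(fopen('php://input', 'rb'), fopen('file.bin', 'wb'), 2000).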

Related

Downloading a large file in PHP, max 8192 bytes?

I'm using the following code to download a large file (>100mb). The code is executed in a shell.
$fileHandle = fopen($url, 'rb');
$bytes = 100000;
while ($read = @fread($fileHandle, $bytes)) {
debug(strlen($read));
if (!file_put_contents($filePath, $read, FILE_APPEND)) {
return false;
}
}
Where I would expect that debug(strlen($read)) would output 100000, this is the actual output:
10627
8192
8192
8192
...
Why doesn't fread read more than 8192 bytes after the first time, and why does it read 10627 bytes on the first iteration?
This makes downloading the file very slow, is there a better way to do this?
The answer to your question is (quoting from the PHP docs for fread()):
if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size
The solution to your performance problem is to use stream_copy_to_stream(), which should be faster than block reading using fread(), and more memory efficient as well.
I checked the manual, and found this: http://php.net/manual/en/function.fread.php
"If the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made;"
Since you're opening a URL this is probably the case.
It doesn't explain the 10627 though...
Besides that, why do you expect 100000 byte reads to be faster than 8192?
I doubt that's your bottleneck. My guess is that either the download speed from the URL or the write speed of the hard disk is the problem.

PHP using fwrite and fread with input stream

I'm looking for the most efficient way to write the contents of the PHP input stream to disk, without using much of the memory that is granted to the PHP script. For example, the maximum file size that can be uploaded might be 1 GB while PHP only has 32 MB of memory.
define('MAX_FILE_LEN', 1073741824); // 1 GB in bytes
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR.'/'.$MyTempName.'.tmp', 'w');
fwrite($hDest, fread($hSource, MAX_FILE_LEN));
fclose($hDest);
fclose($hSource);
Does fread inside an fwrite like the above code shows mean that the entire file will be loaded into memory?
For doing the opposite (writing a file to the output stream), PHP offers a function called fpassthru which I believe does not hold the contents of the file in the PHP script's memory.
I'm looking for something similar but in reverse (writing from input stream to file). Thank you for any assistance you can give.
Yep - fread used in that way would read up to 1 GB into a string first, and then write that back out via fwrite. PHP just isn't smart enough to create a memory-efficient pipe for you.
I would try something akin to the following:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
while (!feof($hSource)) {
/*
* I'm going to read in 1K chunks. You could make this
* larger, but as a rule of thumb I'd keep it to 1/4 of
* your php memory_limit.
*/
$chunk = fread($hSource, 1024);
fwrite($hDest, $chunk);
}
fclose($hSource);
fclose($hDest);
If you wanted to be really picky, you could also unset($chunk); within the loop after fwrite to absolutely ensure that PHP frees up the memory - but that shouldn't be necessary, as the next loop will overwrite whatever memory is being used by $chunk at that time.
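One possible alternative to the manual loop is stream_copy_to_stream() with a length cap, which also makes it easy to notice an upload larger than the allowed maximum. This is only a sketch: copyWithLimit() and the shape of its return value are hypothetical, and it assumes both arguments are open stream resources.

```php
<?php
/**
 * Copy at most $maxBytes from $src to $dst and report whether the source had more.
 * copyWithLimit() is a hypothetical helper name used for illustration.
 */
function copyWithLimit($src, $dst, int $maxBytes): array
{
    $copied = stream_copy_to_stream($src, $dst, $maxBytes);
    if ($copied === false) {
        throw new RuntimeException('Copy failed');
    }

    // If one more byte can still be read, the source exceeded the limit.
    $extra = fread($src, 1);
    $tooLarge = ($extra !== false && $extra !== '');

    return ['bytes' => $copied, 'tooLarge' => $tooLarge];
}
```

With the names from the question, the call could look like copyWithLimit($hSource, $hDest, MAX_FILE_LEN); the caller can then delete the partial file when 'tooLarge' is true.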

PHP - how to read big remote files efficiently and use buffer in loop

I would like to understand how to use a buffer when reading a file.
Assume we have a big file with a list of emails, one per line (the delimiter is a plain \n).
Now we want to compare each line with each record of a table in our database, in a check like line_of_file == table_row.
This is a simple task with a normal-sized file, but with a huge file the server usually stops the operation after a few minutes.
So what's the best way of doing this kind of thing with a file buffer?
What I have so far is something like this:
$buffer = file_get_contents('file.txt');
while($row = mysql_fetch_array($result)) {
if ( preg_match('/'.$email.'/im',$buffer)) {
echo $row_val;
}
}
$buffer = file_get_contents('file.txt');
$lines = preg_split('/\n/',$buffer);
//or $lines = explode("\n",$buffer);
while($row = mysql_fetch_array($result)) {
if ( in_array($email,$lines)) {
echo $row_val;
}
}
As already suggested in my close votes on your question (hence CW):
You can use SplFileObject which implements Iterator to iterate over a file line by line to save memory. See my answers to
Least memory intensive way to read a file in PHP and
How to save memory when reading a file in Php?
for examples.
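A minimal sketch of the SplFileObject approach, under the assumption that the list of emails from the database is small enough to hold in an array while the file is not. The function name findMatchingLines() and its parameters are hypothetical.

```php
<?php
/**
 * Return the lines of $path that also appear in $wanted, reading one line at a time.
 * findMatchingLines() and its parameters are hypothetical names for this sketch.
 */
function findMatchingLines(string $path, array $wanted): array
{
    // Flip the list so each lookup is a hash check instead of a linear scan.
    $lookup = array_flip($wanted);

    $file = new SplFileObject($path, 'r');
    $file->setFlags(
        SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE
    );

    $matches = [];
    foreach ($file as $line) {
        // Only the current line is held in memory at any time.
        $line = trim((string) $line);
        if (isset($lookup[$line])) {
            $matches[] = $line;
        }
    }
    return $matches;
}
```

Here $wanted would be filled from the database rows once, before the file is scanned, rather than scanning the file once per row.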
Don't use file_get_contents for large files. This pulls the entire file into memory all at once. You have to read it in pieces:
$fp = fopen('file.txt', 'r');
// fgets() returns false at end of file, which ends the loop
while (($buffer = fgets($fp)) !== false) {
    // $buffer holds one line (including its trailing newline)
    // do your stuff
}
fclose($fp);
Open the file with fopen() and read it incrementally. Probably one line at a time with fgets().
file_get_contents reads the whole file into memory, which is undesirable if the file is larger than a few megabytes
Depending on how long this takes, you may need to worry about the PHP execution time limit, or the browser timing out if it doesn't receive any output for 2 minutes.
Things you might try:
set_time_limit(0) to avoid running up against the PHP time limit
Make sure to output some data every 30 seconds or so so the browser doesn't time out; make sure to flush(); and possibly ob_flush(); so your output is actually sent over the network (this is a kludge)
start a separate process (e.g. via exec()) to run this in the background. Honestly, anything that takes more than a second or two is best run in the background

Is fopen() limited by the filesystem?

I wrote a program to generate large .SQL files for quickly populating very large databases. I scripted it in PHP. When I started coding I was using fopen() and fwrite(). When files got too large the program would return control to the shell and the file would be incomplete.
Unfortunately I'm not sure exactly how large is 'too large'. I think it may have been around 4GB.
To solve this problem I had the file echo to stdout. I redirected it when I called the program like so:
[root@localhost]$ php generatesql.php > myfile.sql
Which worked like a charm. My output file ended up being about 10GB.
My question, then, is: Are fopen() and fwrite() limited by the file system in terms of how large a file they are capable of generating? If so; is this a limitation of PHP? Does this happen in other languages as well?
What's probably occurring is that the underlying PHP build is 32-bit and can't handle file pointers beyond its integer range (2 GB for a signed 32-bit offset, 4 GB unsigned) - see this related question.
Your underlying OS is obviously capable of storing large files, which is why you're able to redirect stdout to a large file.
Incidentally, an SQL file is likely to be highly compressible, so you might like to consider using the gzip fopen wrapper to compress the file as you write it.
$file = 'compress.zlib:///path/to/my/file.sql.gz';
$f = fopen($file, 'wb');
//just write as normal...
fwrite($f, 'CREATE TABLE foo (....)');
fclose($f);
Your dump will be a fraction of the original size, and you can restore it simply by piping the output from zcat into an SQL client, e.g. for mysql:
zcat /path/to/my/file.sql.gz | mysql mydatabase
Yes and no. It's not fopen() or fwrite() that are limited directly; it's the files, which cannot exceed certain sizes depending on the filesystem. Have a look at Comparison of filesystems on Wikipedia.
Is it possible that your script is taking too long to execute and has timed out?
Alternatively, is it possible that you're reaching your memory limits within the script?
You can write more than 2GB of data in a stream, not in a file (as the fseek internal pointer of the file is exceeding PHP limits, and streams are not usually seekable)
<?php
$target = fopen('test.tar', 'w'); // this is a file, limited by PHP to 2GB
$body = str_repeat("===", 1024 * 1024);
while (true)
    fwrite($target, $body);
<?php
$target = popen('cat > test.tar', 'w'); // this is a stream, no limitation here
$body = str_repeat("===", 1024 * 1024);
while (true)
    fwrite($target, $body);

How to base64-decode large files in PHP

My PHP web application has an API that can receive reasonably large files (up to 32 MB) which are base64 encoded. The goal is to write these files somewhere on my filesystem, decoded of course. What would be the least resource-intensive way of doing this?
Edit: Receiving the files through an API means that I have a 32 MB string in my PHP app, not a 32 MB source file somewhere on disk. I need to get that string decoded and onto the filesystem.
Using PHP's own base64_decode() isn't cutting it because it uses a lot of memory, so I keep running into PHP's memory limit (I know, I could raise that limit, but I don't feel good about allowing PHP to use 256 MB or so per process).
Any other options? Could I do it manually? Or write the file to disk encoded and call some external command? Any thoughts?
Even though this has an accepted answer, I have a different suggestion.
If you are pulling the data from an API, you should not store the entire payload in a variable. Using curl or other HTTP fetchers you can automatically store your data in a file.
Assuming you are fetching the data through a simple GET url:
$url = 'http://www.example.com/myfile.base64';
$target = 'localfile.data';
$rhandle = fopen($url,'r');
stream_filter_append($rhandle, 'convert.base64-decode');
$whandle = fopen($target,'w');
stream_copy_to_stream($rhandle,$whandle);
fclose($rhandle);
fclose($whandle);
Benefits:
Should be faster (less copying of huge variables)
Very little memory overhead
If you must grab the data from a temporary variable, I can suggest this approach:
$data = 'your base64 data';
$target = 'localfile.data';
$whandle = fopen($target,'w');
stream_filter_append($whandle, 'convert.base64-decode',STREAM_FILTER_WRITE);
fwrite($whandle,$data);
fclose($whandle);
Decode the data in smaller chunks. Four characters of Base64 data equal three bytes of “Base256” data.
So you could group each 1024 characters and decode them to 768 octets of binary data:
$chunkSize = 1024;
$src = fopen('base64.data', 'rb');
$dst = fopen('binary.data', 'wb');
while (!feof($src)) {
fwrite($dst, base64_decode(fread($src, $chunkSize)));
}
fclose($dst);
fclose($src);
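The same chunking idea can be applied to a base64 string that is already in memory, as described in the question's edit. The sketch below slices the string in multiples of four characters and decodes one slice at a time. It assumes the data contains no whitespace or line breaks, and decodeBase64StringToFile() is a hypothetical name.

```php
<?php
/**
 * Decode a base64 string to a file in slices so only one slice is decoded at a time.
 * Assumes $data contains no whitespace or line breaks; the function name is hypothetical.
 */
function decodeBase64StringToFile(string $data, string $target, int $chunkSize = 8192): int
{
    // Round down to a multiple of 4 so every slice is a complete base64 group.
    $chunkSize = max(4, $chunkSize - ($chunkSize % 4));

    $dst = fopen($target, 'wb');
    if ($dst === false) {
        throw new RuntimeException("Cannot open $target");
    }

    $written = 0;
    $length = strlen($data);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        // Strict mode makes base64_decode() return false on invalid input.
        $decoded = base64_decode(substr($data, $offset, $chunkSize), true);
        if ($decoded === false) {
            fclose($dst);
            throw new RuntimeException("Invalid base64 at offset $offset");
        }
        $written += fwrite($dst, $decoded);
    }
    fclose($dst);
    return $written;
}
```

Only the final slice can contain padding characters, which is why the slice size is kept at a multiple of four.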
It's not a good idea to transfer a 32 MB string. But I have a solution for my task that can accept files of any size from the browser to the app server. Algorithm:
Client
JavaScript: read the file from the INPUT with FileReader and readAsDataURL() into a FILE variable.
Cut everything in FILE from the start to the first "," position.
Split the rest into chunks sized to fit PHP's upload and POST size limits (upload_max_filesize / post_max_size).
Send each chunk with a UID, the chunk number and the total number of chunks, wait for the response, then send the next chunk, one by one.
Server Side
Write each chunk to a file until the last one arrives. Then base64-decode it with streams:
$src = fopen($source, 'r');
$trg = fopen($target, 'w');
stream_filter_append($src, 'convert.base64-decode');
stream_copy_to_stream($src, $trg);
fclose($src);
fclose($trg);
... now you have your base64 decoded file in $target local path. Note! You can not read and write the same file, so $source and $target must be different.
