I wrote a program to generate large .SQL files for quickly populating very large databases. I scripted it in PHP. When I started coding I was using fopen() and fwrite(). When files got too large the program would return control to the shell and the file would be incomplete.
Unfortunately I'm not sure exactly how large is 'too large'. I think it may have been around 4GB.
To solve this problem I had the file echo to stdout. I redirected it when I called the program like so:
[root@localhost]$ php generatesql.php > myfile.sql
Which worked like a charm. My output file ended up being about 10GB.
My question, then, is: Are fopen() and fwrite() limited by the file system in terms of how large a file they are capable of generating? If so, is this a limitation of PHP? Does this happen in other languages as well?
What's probably occurring is that the underlying PHP build is 32-bit and can't handle file pointers >4GB - see this related question.
Your underlying OS is obviously capable of storing large files, which is why you're able to redirect stdout to a large file.
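As a quick sanity check (not from the original answer, just a sketch): a 64-bit PHP build reports an 8-byte integer size, which is roughly what you need for file offsets beyond 4GB.
<?php
// int(8) on 64-bit builds, int(4) on 32-bit builds
var_dump(PHP_INT_SIZE);
var_dump(PHP_INT_MAX);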
Incidentally, an SQL file is likely to be highly compressible, so you might like to consider using the compress.zlib fopen wrapper to compress the file as you write it.
$file = 'compress.zlib:///path/to/my/file.sql.gz';
$f = fopen($file, 'wb');
//just write as normal...
fwrite($f, 'CREATE TABLE foo (....)');
fclose($f);
Your dump will be a fraction of the original size, and you can restore it simply by piping the output from zcat into an SQL client, e.g. for MySQL:
zcat /path/to/my/file.sql.gz | mysql mydatabase
Yes and no. It's not fopen() or fwrite() that are limited directly; it's the files themselves, which cannot exceed certain sizes depending on the filesystem. Have a look at Comparison of file systems on Wikipedia.
Is it possible that your script is taking too long to execute and has timed out?
Alternatively, is it possible that you're reaching your memory limits within the script?
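A quick sketch for checking those two possibilities (the limit values here are illustrative, not from the original question):
<?php
echo ini_get('max_execution_time'), "\n";      // 0 means no time limit
echo ini_get('memory_limit'), "\n";
echo memory_get_peak_usage(true), " bytes peak\n";

// If either limit is the culprit, you could raise it for this script:
set_time_limit(0);
ini_set('memory_limit', '512M');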
You can write more than 2GB of data to a stream, but not to a regular file (because the file's internal fseek pointer exceeds PHP's integer limit, and streams are not usually seekable):
<?php
$target = fopen('test.tar', 'w'); // this is a regular file, limited by (32-bit) PHP to 2GB
$body = str_repeat("===", 1024 * 1024);
while (true)
    fwrite($target, $body);
<?php
$target = popen('cat > test.tar', 'w'); // this is a stream, no such limitation here
$body = str_repeat("===", 1024 * 1024);
while (true)
    fwrite($target, $body);
From a comment to this answer I read that "stream_get_contents is low-level" compared to file_get_contents. However, according to the manual, stream_get_contents is
Identical to file_get_contents(), except that stream_get_contents() operates on an already open stream resource and returns the remaining contents in a string, up to maxlength bytes and starting at the specified offset.
Which statement is correct?
Is stream_get_contents really lower level and faster?
Specifically I am interested in reading local files from HD.
I'm late here, but it might help others.
file_get_contents() loads the file content into memory. It sits there in memory until the program calls echo, at which point it is delivered to the output buffer.
A good usage example is:
echo file_get_contents('file.txt');
stream_get_contents() delivers the content on an already open stream. An example is this:
$handle = fopen('file.txt', 'r');
echo stream_get_contents($handle);
You can see that stream_get_contents() uses an existing stream created by fopen() to get the contents as a string.
file_get_contents() is generally the preferred way, as it doesn't depend on an open stream and is memory-efficient, using memory-mapping techniques. When reading from external sites, you can also set HTTP headers when getting the content. (Refer to https://www.php.net/manual/en/function.file-get-contents.php for more info.)
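A short sketch of the "set HTTP headers" remark, for a remote URL (the URL and headers here are placeholders):
<?php
$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => "Accept: text/plain\r\nUser-Agent: my-script\r\n",
    ],
]);
$data = file_get_contents('https://example.com/somefile.txt', false, $context);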
For larger files/resources, stream_get_contents() may be preferable, as it can deliver the content in pieces, as opposed to file_get_contents(), which dumps the entire contents into memory.
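For example, a sketch of reading a large local file in pieces by passing a length argument to stream_get_contents() (the 1MB chunk size is just an example):
<?php
$handle = fopen('file.txt', 'rb');
while (!feof($handle)) {
    $piece = stream_get_contents($handle, 1024 * 1024); // up to 1MB per call
    // ... process $piece ...
}
fclose($handle);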
I'd like to know if there is a faster way of concatenating 2 text files in PHP, than the usual way of opening txt1 in a+, reading txt2 line by line and copying each line to txt1.
If you want a pure-PHP solution, you could use file_get_contents to read the whole file into a string and then write that out (no error checking, just to show how you could do it):
$fp1 = fopen("txt1", 'a+');
$file2 = file_get_contents("txt2");
fwrite($fp1, $file2);
It's probably much faster to use the cat program on Linux, if PHP is allowed to run shell commands:
system('cat txt1 txt2 > txt3');
$content = file_get_contents("file1");
file_put_contents("file2", $content, FILE_APPEND);
I have found using *nix cat to be the most effective here, but if for whatever reason you don't have access to it and you are concatenating large files, you can use this line-by-line function. (Error handling stripped for simplicity.)
function catFiles($arrayOfFiles, $outputPath) {
    $dest = fopen($outputPath, "a");
    foreach ($arrayOfFiles as $f) {
        $FH = fopen($f, "r");
        $line = fgets($FH);
        while ($line !== false) {
            fputs($dest, $line);
            $line = fgets($FH);
        }
        fclose($FH);
    }
    fclose($dest);
}
While the fastest way is undoubtedly to use OS commands like cp or cat, this is hardly advisable for compatibility.
The fastest "PHP only" way is using file_get_contents, which reads the whole source file in one shot, but it also has some drawbacks: it requires a lot of memory for large files, and for this reason it may fail depending on the memory assigned to PHP.
A universal clean and fast solution is to use fread and fwrite with a large buffer.
If the file is smaller than the buffer, all reading will happen in one burst, so speed is optimal, otherwise reading happens at big chunks (the size of the buffer) so the overhead is minimal and speed is quite good.
Reading line by line with fgets instead has to test every character, one by one, to see whether it is a newline.
Also, reading a file with many short lines line by line with fgets will be slower, as you will read many little pieces of different sizes, depending on where the newlines are positioned.
fread is faster, as it only checks for EOF (which is easy) and reads the file using a fixed-size chunk you decide, so it can be tuned for your OS, disk, or kind of files (say you have many files under 12KB: you can set the buffer size to 16KB so each is read in one shot).
// Code is untested; it was written on a mobile phone inside Stack Overflow and is based on various examples you can find online.
<?php
$BUFFER_SIZE = 1 * 1024 * 1024; // 1MB; bigger is faster, depending on file sizes and count
$dest = fopen($fileToAppendTo, "a+");
if (FALSE === $dest) die("Failed to open destination");
$handle = fopen("source.txt", "rb");
if (FALSE === $handle) {
    fclose($dest);
    die("Failed to open source");
}
while (!feof($handle)) {
    fwrite($dest, fread($handle, $BUFFER_SIZE));
}
fclose($handle);
fclose($dest);
?>
I'm looking for the most efficient way to write the contents of the PHP input stream to disk, without using much of the memory that is granted to the PHP script. For example, if the max file size that can be uploaded is 1 GB but PHP only has 32 MB of memory.
define('MAX_FILE_LEN', 1073741824); // 1 GB in bytes
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR.'/'.$MyTempName.'.tmp', 'w');
fwrite($hDest, fread($hSource, MAX_FILE_LEN));
fclose($hDest);
fclose($hSource);
Does fread inside an fwrite like the above code shows mean that the entire file will be loaded into memory?
For doing the opposite (writing a file to the output stream), PHP offers a function called fpassthru which I believe does not hold the contents of the file in the PHP script's memory.
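For reference, this is roughly the fpassthru() pattern described above (a sketch; 'file.bin' is a placeholder):
<?php
$fp = fopen('file.bin', 'rb');
fpassthru($fp); // streams the remaining contents of the file straight to output
fclose($fp);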
I'm looking for something similar but in reverse (writing from input stream to file). Thank you for any assistance you can give.
Yep - fread used in that way would read up to 1 GB into a string first, and then write that back out via fwrite. PHP just isn't smart enough to create a memory-efficient pipe for you.
I would try something akin to the following:
$hSource = fopen('php://input', 'r');
$hDest = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
while (!feof($hSource)) {
/*
* I'm going to read in 1K chunks. You could make this
* larger, but as a rule of thumb I'd keep it to 1/4 of
* your php memory_limit.
*/
$chunk = fread($hSource, 1024);
fwrite($hDest, $chunk);
}
fclose($hSource);
fclose($hDest);
If you wanted to be really picky, you could also unset($chunk); within the loop after fwrite to absolutely ensure that PHP frees up the memory - but that shouldn't be necessary, as the next loop will overwrite whatever memory is being used by $chunk at that time.
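If you'd rather not write the loop yourself, a shorter sketch of the same chunked-copy idea uses stream_copy_to_stream(); as far as I know it copies in internal chunks rather than building one large string, but treat that as an assumption to verify against your memory_limit:
<?php
$hSource = fopen('php://input', 'r');
$hDest   = fopen(UPLOADS_DIR . '/' . $MyTempName . '.tmp', 'w');
stream_copy_to_stream($hSource, $hDest); // copies until EOF on the source
fclose($hSource);
fclose($hDest);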
Not sure if this is possible, but it's become an academic struggle now.
Using the __halt_compiler() trick to embed binary data in a PHP file, I've successfully created a self-opening script which will fseek() to __COMPILER_HALT_OFFSET__ (not too hard seeing as this precise example is documented in the manual)
Anyways, I've stored a small lump of binary ZIP data (a single folder containing a single file that says "hello world") after my call to __halt_compiler()
What I've tried to do is copy the data directly to the php://temp stream, and have done so with success (if I rewind() and passthru() the temporary stream handle, it dumps the data)
$php = fopen(__FILE__, 'rb');
$tmp = fopen('php://temp', 'r+b');
fseek($php, __COMPILER_HALT_OFFSET__);
stream_copy_to_stream($php, $tmp);
My problem comes with trying to now open php://temp [1] with zip_open()
$zip = zip_open('php://temp');
[1] From what I can see (despite other possibilities, such as lack of stream support in zip_open()), the problem here is the inherent non-permanence of data in php://memory and php://temp streams between handles. If this can be worked around, perhaps it is in fact possible.
It keeps kicking back error code 11, which I have found no documentation on [2] (seemingly, like most other possible error codes)
var_dump($zip); // int(11)
[2] As @cweiske pointed out, error code 11 = ZipArchive::ER_OPEN, "Can't open file"
Is this a consequence of my attempt to use the php://temp stream, or some other possible issue? I'm also aware there is an OOP approach (ZipArchive, et al.), but I figured I'd start with the basics.
Any ideas?
11 is the constant ZIPARCHIVE::ER_OPEN, which the manual describes as
Can't open file
Note that the manual does not state that stream wrappers may be used.
Please think about using PHP's phar extension - it does what you want, and is well tested.
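For reference, a minimal phar sketch (not from the original answer; it assumes the phar extension is loaded and phar.readonly=0 in php.ini so archives can be created, and uses placeholder names):
<?php
$phar = new Phar('bundle.phar');
$phar->addFromString('hello.txt', "hello world\n");

// Reading it back through the phar:// stream wrapper:
echo file_get_contents('phar://bundle.phar/hello.txt');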
I would like to understand how to use the buffer of a file I'm reading.
Assuming we have a big file with a list of emails, one per line (the delimiter is a classic \n),
we want to compare each line with each record of a table in our database, in a kind of check like line_of_file == table_row.
This is a simple task if you have a normal file; otherwise, if you have a huge file, the server usually stops the operation after a few minutes.
So what's the best way of doing this kind of thing with the file buffer?
What I have so far is something like this:
$buffer = file_get_contents('file.txt');
while ($row = mysql_fetch_array($result)) {
    if (preg_match('/'.$email.'/im', $buffer)) {
        echo $row_val;
    }
}
$buffer = file_get_contents('file.txt');
$lines = preg_split('/\n/', $buffer);
// or $lines = explode("\n", $buffer);
while ($row = mysql_fetch_array($result)) {
    if (in_array($email, $lines)) {
        echo $row_val;
    }
}
As already suggested in my close votes to your question (hence CW):
You can use SplFileObject, which implements Iterator, to iterate over a file line by line and save memory. See my answers to
Least memory intensive way to read a file in PHP and
How to save memory when reading a file in Php?
for examples.
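A minimal sketch of that approach, assuming the email list is in file.txt as in the question:
<?php
$file = new SplFileObject('file.txt');
$file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE);

foreach ($file as $line) {
    // compare $line against your database records here, one email at a time
}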
Don't use file_get_contents for large files. It pulls the entire file into memory all at once. You have to read it in pieces:
$fp = fopen('file.txt', 'r');
while (!feof($fp)) {
    // get one line
    $buffer = fgets($fp);
    // do your stuff
}
fclose($fp);
Open the file with fopen() and read it incrementally. Probably one line at a time with fgets().
file_get_contents reads the whole file into memory, which is undesirable if the file is larger than a few megabytes
Depending on how long this takes, you may need to worry about the PHP execution time limit, or the browser timing out if it doesn't receive any output for 2 minutes.
Things you might try:
set_time_limit(0) to avoid running up against the PHP time limit
Make sure to output some data every 30 seconds or so, so the browser doesn't time out; make sure to flush(), and possibly ob_flush(), so your output is actually sent over the network (this is a kludge; see the sketch after this list)
Start a separate process (e.g. via exec()) to run this in the background. Honestly, anything that takes more than a second or two is best run in the background.
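A rough sketch combining the first two suggestions; get_next_chunk() and process_chunk() are hypothetical stand-ins for your own work loop:
<?php
set_time_limit(0); // lift PHP's execution time limit

$lastPing = time();
while ($work = get_next_chunk()) {   // hypothetical: fetch the next unit of work
    process_chunk($work);            // hypothetical: do the actual processing
    if (time() - $lastPing >= 30) {
        echo ' ';                    // emit something so the browser keeps waiting
        flush();
        if (ob_get_level() > 0) {
            ob_flush();
        }
        $lastPing = time();
    }
}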