Additional info:
I'm running this from the command line. CentOS 6, 32 GB RAM total, 2 GB memory limit for PHP.
I tried increasing the memory limit to 4GB, but now I get a Fatal error: String size overflow. PHP maximum string size is 2GB.
My code is very simple test code:
$Reader = new SpreadsheetReader($_path_to_files . 'ABC.xls');
$i = 0;
foreach ($Reader as $Row) {
    $i++;
    print_r($Row);
    if ($i > 10) break;
}
It only prints 10 rows, and that is taking 2 gigabytes of memory?
The error is occurring at line 253 in excel_reader2.php, inside class OLERead, in function read($sFileName).
Here is the code causing the memory exhaustion:
if ($this->numExtensionBlocks != 0) {
    $bbdBlocks = (BIG_BLOCK_SIZE - BIG_BLOCK_DEPOT_BLOCKS_POS) / 4;
}
for ($i = 0; $i < $bbdBlocks; $i++) { // LINE 253
    $bigBlockDepotBlocks[$i] = GetInt4d($this->data, $pos);
    $pos += 4;
}
I solved the problem. It turned out to be somewhat unrelated to the php code.
The program I am writing downloads .xls, .xlsx, and .csv files from email and FTP. The .xls file that was causing the memory overflow was downloaded in ASCII mode instead of Binary.
I changed my default to binary mode, and added a check that changes it to ASCII mode for .csv files.
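For the record, the check boils down to something like this, using PHP's FTP extension (a sketch; $ftp, $remoteFile and $localFile are placeholders, not the actual program):
// Default to binary transfers; switch to ASCII only for .csv files.
$ext = strtolower(pathinfo($remoteFile, PATHINFO_EXTENSION));
$mode = ($ext === 'csv') ? FTP_ASCII : FTP_BINARY;
ftp_get($ftp, $localFile, $remoteFile, $mode);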
I still find it strange that the program creates a 2 GB string because of that. If there are no line breaks in the binary file, I can perhaps see how the entire file might end up in one string, but the file is only 286 KB. So that's strange.
Related
I have a file with 3,200,000 lines of csv data (with 450 columns). Total file size is 6 GB.
I read the file like this:
$data = file('csv.out');
Without fail, it only reads 897,000 lines. I confirmed this with print_r() and echo sizeof($data). I increased my memory_limit to a ridiculous value like 80 GB, but it didn't make a difference.
Now, it DID read in my other large file, same number of lines (3,200,000) but only a few columns so total file size 1.1 GB. So it appears to be a total file size issue. FYI, 897,000 lines in the $data array is around 1.68 GB.
Update: I increased the second (longer) file to 2.1 GB (over 5 million lines) and it reads it in fine, yet truncates the other file at 1.68 GB. So does not appear to be a size issue. If I continue to increase the size of the second file to 2.2 GB, instead of truncating it and continuing the program (like it does for the first file), it dies and core dumps.
Update: I verified my system is 64 bit by printing integer and float numbers:
<?php
$large_number = 2147483647;
var_dump($large_number); // int(2147483647)
$large_number = 2147483648;
var_dump($large_number); // float(2147483648)
$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number); // float(50000000000)
$large_number = 9223372036854775807;
var_dump($large_number); // int(9223372036854775807)
$large_number = 9223372036854775808;
var_dump($large_number); // float(9.2233720368548E+18)
$million = 1000000;
$large_number = 50000000000000 * $million;
var_dump($large_number); // float(5.0E+19)
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
?>
The output from this script is:
int(2147483647)
int(2147483648)
int(50000000000)
int(9223372036854775807)
float(9.2233720368548E+18)
float(5.0E+19)
PHP_INT_MAX: 9223372036854775807
PHP_INT_SIZE: 8 bytes (64 bits)
So since it's 64 bit, and memory limit is set really high, why is PHP not reading files > 2.15 GB?
Some things that come to mind:
If you're using a 32-bit PHP, you cannot read files that are larger than 2 GB.
If reading the file takes too long, there could be time-outs.
If the file is really huge, then reading it all into memory is going to be problematic. It's usually better to read blocks of data and process those, unless you need random access to all parts of the file (see the sketch below this list).
Another approach (I've used it in the past) is to chop the large file into smaller, more manageable ones (this should work if it's a straightforward log file, for example).
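A minimal skeleton of that block-by-block suggestion (the filename and the 1 MB chunk size are arbitrary placeholders):
$fh = fopen('csv.out', 'r');
while (!feof($fh)) {
    $block = fread($fh, 1024 * 1024); // read 1 MB at a time
    // process $block here; it is discarded and overwritten on the next iteration
}
fclose($fh);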
I fixed it. All I had to do was change the way I read the files. Why...I do not know.
Old code that only reads 2.15 GB out of 6.0 GB:
$data = file('csv.out');
New code that reads the full 6.0 GB:
$data = array();
$i = 1;
$handle = fopen('csv.out', 'r');
if ($handle) {
    while (($data[$i] = fgets($handle)) !== false) {
        // process the line read
        $i++;
    }
    fclose($handle);
}
Feel free to shed some light on why. There must be some limitation when using
$var = file();
Interestingly, 2.15 GB is close to the 32-bit limit I read about.
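For what it's worth, SplFileObject offers yet another way to walk a huge file line by line without ever holding it all in memory (a sketch that sidesteps, rather than explains, the 2.15 GB cutoff):
$file = new SplFileObject('csv.out', 'r');
$file->setFlags(SplFileObject::DROP_NEW_LINE);
foreach ($file as $lineNumber => $line) {
    // process one line at a time; nothing accumulates in an array
}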
This question already has answers here: Streaming a large file using PHP (5 answers). Closed 4 years ago.
I have a Laravel 5.3 project. I need to import and parse a pretty large (1.6M lines) text file.
I am having memory resource issues. I think at some point I need to use chunking, but I am having trouble getting the file loaded to do so.
Here is what I am trying:
if (Input::hasFile('file')) {
    $path = Input::file('file')->getRealPath(); // assign file from input
    $data = file($path); // load the file
    $data->chunk(100, function ($content) { // parse it 100 lines at a time
        foreach ($content as $line) {
            // use $line
        }
    });
}
I understand that file() will return an array vs. File::get() which will return a string.
I have increased my php.ini upload and memory limits to be able to handle the file size, but am running into this error:
Allowed memory size of 524288000 bytes exhausted (tried to allocate 4096 bytes)
This is occurring at the line:
$data = file($path);
What am I missing? And/or is this the most ideal way to do this?
Thanks!
As mentioned, file() reads the entire file into an array, in this case 1.6 million elements. I doubt that will fit in memory. You can read each line one by one, overwriting the previous one:
$fh = fopen($path, "r");
if ($fh) {
    while (($line = fgets($fh)) !== false) {
        // use $line
    }
    fclose($fh);
}
The only way to keep it from timing out is to set the maximum execution time:
set_time_limit(0);
If the file is too large, you need to split it outside of PHP. You can use the exec() command safely for that. Doing it with the PHP interpreter alone needs a lot of memory and takes a long time; the Linux command saves you time on each run.
exec('split -C 20m --numeric-suffixes input_filename output_prefix');
After that you can use DirectoryIterator and read each file, as in the sketch below.
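A sketch of that follow-up step (the chunk directory and the output_prefix from the split command above are assumptions about where the pieces end up):
foreach (new DirectoryIterator('/path/to/chunks') as $fileInfo) {
    if ($fileInfo->isFile() && strpos($fileInfo->getFilename(), 'output_prefix') === 0) {
        $fh = fopen($fileInfo->getPathname(), 'r');
        while (($line = fgets($fh)) !== false) {
            // process $line
        }
        fclose($fh);
    }
}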
Regards
I have a PHP script which reads in a text file and does a count of all the lines in the file which match a specified regular expression. The script had worked well up until now, when it segfaulted on the fread of a file over 2 GB.
Actually before the segfault, I initially received the PHP Fatal Error: PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 2223941409 bytes).
To fix that I added this line to my script: ini_set('memory_limit', '4G');
That fixes the memory size exhausted error but I get the segfault on fread now.
Here's a condensed working version of the script which will exhibit the error:
#!/usr/bin/php
<?php
ini_set('memory_limit', '4G');
$file = $argv[1];
$fh = fopen($file, 'r');
$fsize = filesize($file);
print("SIZE: ".$fsize."\n" );
$myData = fread($fh, $fsize);
print("Got passed fread!\n");
fclose($fh);
preg_match_all( '/Z\t/', $myData, $sArray );
$scount = count($sArray,COUNT_RECURSIVE);
print("COUNT: ".$scount."\n");
?>
Sample output:
$ runtest.php testfile.txt
SIZE: 2223941408
Segmentation fault (core dumped)
Other info:
OS: CentOS release 6.7 (Final) x86_64
PHP 5.3.3 (cli) (built: Jul 9 2015 17:39:00) 64-bit
You're probably using a 32-bit PHP distribution. Under that architecture a PHP process cannot allocate more than 2 GB of RAM. In practice the upper limit is closer to 1 GB than 2 GB; the interpreter crashes well before reaching the 2 GB mark. Additionally, integer variables cannot be greater than PHP_INT_MAX, which in 32-bit builds is as small as 2,147,483,647 (2^31 - 1).
This highlights two problems in your code:
$fsize = filesize($file);
... will not work if the file size is greater than PHP_INT_MAX.
Because PHP's integer type is signed and many platforms use 32-bit integers, some filesystem functions may return unexpected results for files which are larger than 2 GB.
$myData = fread($fh, $fsize);
... will crash for large files because you're loading the complete file contents in memory and then doing additional processing that will probably eat even more memory.
You'd better redesign your algorithm and read the file in small chunks (a task at which fread() excels). Counting the occurrences of a two-character substring should only need a few KB of RAM.
Here's a possible approach that assumes single byte encoding (as your code does):
// Ridiculously small value for illustration purposes; set it to something bigger for better performance
define('CHUNK_SIZE', 4);
$fsize = $scount = 0;
$fh = fopen($file, 'r');
$possible_pending_match = false;
while (!feof($fh)) {
    $chunk = fread($fh, CHUNK_SIZE);
    $fsize += strlen($chunk);
    $scount += substr_count($chunk, "Z\t");
    // Handle a "Z\t" pair that straddles two chunks
    if ($possible_pending_match && isset($chunk[0]) && $chunk[0] === "\t") {
        $scount++;
    }
    $possible_pending_match = substr($chunk, -1) === 'Z';
}
fclose($fh);
print("SIZE: ".$fsize."\n");
print("COUNT: ".$scount."\n");
print("MEMORY: ".memory_get_peak_usage(true)." bytes\n");
You'd need to add 1 to $scount to get the same result as your code, which counts one extra item because count() with COUNT_RECURSIVE also counts the outer array that preg_match_all() fills in, not just the matches.
Hi, 2 GB suggests there is some internal 32-bit limitation in PHP. Are you running 32-bit PHP?
There is an alternative solution. You can do it with a very small memory overhead using a shell command called by PHP. The memory used would be no more than a couple of MB as grep and wc only load portions of the file into memory.
$lines = shell_exec("grep 'Z\t' $file | wc --lines");
grep: command to search files using regular expression
wc: command that returns the number of words/lines/chars
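If the filename can contain spaces or shell metacharacters, quoting it with escapeshellarg() is the safer form of the same command (a sketch):
$lines = (int) shell_exec("grep 'Z\t' " . escapeshellarg($file) . " | wc --lines");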
I don't know whether this is right or not. I have a PHP file with contents like this:
$x = explode("\n", $y); // makes $x have length 65000
foreach ($x as $k) {
    // some code here
}
And my script often stops by itself after ~25,000 iterations.
Why? Is it PHP's default configuration?
This behaviour can be due to one of two reasons:
The script's execution time is more than what is allocated to it. Try increasing max_execution_time in php.ini.
The script's memory usage may be more than what is allocated to it. For this, try changing the value of memory_limit in php.ini.
The default memory limit of PHP is 8 MB (I mean in standard distros, not a default PHP compiled from source, because that one is limitless).
When I do this code:
$x = array();
for ($i = 0; $i < 65000; $i++) {
    $x[$i] = $i;
}
echo (memory_get_peak_usage()/1024).'<br />';
echo (memory_get_usage()/1024).'<br />';
echo count($x);
It outputs:
9421.9375
9415.875
65000
To test this, I did increase my memory limit, though. It would abort with an error if it can't allocate more memory:
for ($i = 0; $i < 1e6; $i++) { // 1 million
    $x[$i] = $i;
}
It reports back:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 32 bytes) in /Applications/MAMP/htdocs/run_app.php on line 5
For personal use (I have 16 GB RAM, so it's no issue) I start my scripts with these settings:
// Settings
ini_set('error_reporting', E_ALL); // shows all feedback from the parser for debugging
ini_set('max_execution_time', 0); // changes the 30-second parser exit to infinite
ini_set('memory_limit', '2048M'); // sets the memory that may be used to 2048 megabytes
This way you can increase your limit the way you want. This won't work with online hosts unless you have a dedicated server, though. It is also VERY dangerous if you don't know what you're doing: infinite loops will crash your browser or even your OS if it starts to run out of RAM/resources.
In the foreach loop, pass the array by reference. In PHP, foreach iterates over a copy of the array, so if you have an array that is 100K, foreach will allocate at the very least another 100K for processing. By passing it by reference, you are only concerned with the size of an address.
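A minimal sketch of the by-reference form (here $x stands for whatever array is being looped over, and the trim() call is just a stand-in for real work):
foreach ($x as &$line) {
    $line = trim($line); // modify the element in place; no full copy of $x is made
}
unset($line); // break the reference left over after the loop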
So I'm trying to use PHP to read a large CSV file (about 64,000 entries) and stack it into a big array.
Using fopen() and fgetcsv(), I managed to get most of the file read, though it suddenly stops at entry 51,829 for no apparent reason.
I checked my array and the CSV, and the data gets imported correctly; line 51,829 in the CSV and in the array are the same, etc.
Do any of you have an idea why I can't read the whole file?
Here's my code ^^ Thanks in advance
$this->load->model('Import_model', 'import');
$this->import->clear_student_group();
// LOAD THE CSV
$row = 1;
$data = array();
$file = "Eleve_Groupe_Mat.csv";
$lines = count(file($file));
if ($fp = fopen('Eleve_Groupe_Mat.csv', 'r')) {
    $rownumber = 0;
    while (!feof($fp)) {
        $rownumber++;
        $row = fgetcsv($fp);
        $datarow = explode(";", $row[0]);
        for ($i = 0; $i <= 7; $i++) {
            $dataset[$rownumber][$i] = $datarow[$i];
        }
    }
    fclose($fp);
}
$this->import->insert_student_group($dataset);
Your script is probably running out of memory. Check your error logs or turn on error reporting to confirm. To make your script work you can try increasing the memory_limit, which can be done either in php.ini or using ini_set(). But a much better approach would be not to store the csv data in an array at all, but rather process each line as you read it. This keeps the memory footprint low and alleviates the need for increasing the memory_limit.
You're exhausting all the memory PHP has available to it. A file that big can't fit into memory, especially not in PHP, which stores a lot of additional data with every variable created.
You should read a limited number of lines in, say 100 or so, then process the lines you've read in and discard them. Then read the next 100 lines or so and repeat the process until you've processed the entire file.
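A sketch of that approach applied to the code in the question, reading the semicolon-separated rows with fgetcsv() directly and flushing a batch every 100 lines (whether insert_student_group() accepts partial batches like this is an assumption):
$batch = array();
if ($fp = fopen('Eleve_Groupe_Mat.csv', 'r')) {
    while (($row = fgetcsv($fp, 0, ';')) !== false) {
        $batch[] = array_slice($row, 0, 8); // keep the first 8 columns, as before
        if (count($batch) >= 100) {
            $this->import->insert_student_group($batch); // process this block...
            $batch = array();                            // ...then discard it
        }
    }
    if ($batch) {
        $this->import->insert_student_group($batch); // remaining partial batch
    }
    fclose($fp);
}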
I think fopen() has restrictions on reading files. Try using file_get_contents().