PHP Memory Debugging - php

For one of my projects I need to import a very large text file (~950 MB). I'm using Symfony2 & Doctrine 2 for my project.
My problem is that I get errors like:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 24 bytes)
The error even occurs if I increase the memory limit to 1GB.
I tried to analyze the problem using Xdebug and KCacheGrind (as part of PHPEdit), but I don't really understand the values :(
I'm looking for a tool or a method (quick & simple, since I don't have much time) to find out why memory is allocated and not freed again.
Edit
To clear some things up, here is my code:
$handle = fopen($geonameBasePath . 'allCountries.txt', 'r');
$i = 0;
$batchSize = 100;
if ($handle) {
    while (($buffer = fgets($handle, 16384)) !== false) {
        if ($buffer[0] == '#') { //skip comments
            continue;
        }

        //split parts
        $parts = explode("\t", $buffer);

        if ($parts[6] != 'P') {
            continue;
        }

        if ($i % $batchSize == 0) {
            echo 'Flush & Clear' . PHP_EOL;
            $em->flush();
            $em->clear();
        }

        $entity = $em->getRepository('MyApplicationBundle:City')->findOneByGeonameId($parts[0]);
        if ($entity !== null) {
            $i++;
            continue;
        }

        //create city object
        $city = new City();
        $city->setGeonameId($parts[0]);
        $city->setName($parts[1]);
        $city->setInternationalName($parts[2]);
        $city->setLatitude($parts[4]);
        $city->setLongitude($parts[5]);
        $city->setCountry($em->getRepository('MyApplicationBundle:Country')->findOneByIsoCode($parts[8]));

        $em->persist($city);

        unset($city);
        unset($entity);
        unset($parts);
        unset($buffer);

        echo $i . PHP_EOL;
        $i++;
    }
}
fclose($handle);
Things I have tried, but nothing helped:
Adding a second parameter to fgets
Increasing memory_limit
Unsetting vars

Increasing the memory limit is not going to be enough. When importing files like that, you should buffer the reading.
$f = fopen('yourfile', 'r');
while (($data = fread($f, 4096)) !== false && $data !== '') {
    // Do your stuff using the read $data
}
fclose($f);
Update:
When working with an ORM, you have to understand that nothing is actually inserted into the database until the flush call. All those objects are kept by the ORM, tagged as "to be inserted". Only when flush is called does the ORM check the collection and start inserting.
Solution 1: Flush often. And clear.
Solution 2: Don't use the ORM. Go for plain SQL commands. They will take up far less memory than the object + ORM solution.
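To illustrate Solution 2, here is a minimal sketch of inserting one row with plain SQL through Doctrine's DBAL connection (DBAL 2 style, as used with Symfony2); the table and column names are hypothetical:
// Plain SQL insert via the DBAL connection behind the entity manager.
$conn = $em->getConnection();
$stmt = $conn->prepare(
    'INSERT INTO city (geoname_id, name, international_name, latitude, longitude)
     VALUES (:geonameId, :name, :internationalName, :latitude, :longitude)'
);
$stmt->execute(array(
    'geonameId'         => $parts[0],
    'name'              => $parts[1],
    'internationalName' => $parts[2],
    'latitude'          => $parts[4],
    'longitude'         => $parts[5],
));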

33554432 bytes is 32 MB.
Change the memory limit in php.ini, for example to 75 MB:
memory_limit = 75M
and restart the server.
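As a quick sanity check that the new limit is actually in effect, a minimal sketch (ini_get and memory_get_peak_usage are standard PHP functions):
<?php
// Print the currently active memory limit and the peak usage so far.
echo 'memory_limit: ' . ini_get('memory_limit') . PHP_EOL;
echo 'peak usage: ' . memory_get_peak_usage(true) . ' bytes' . PHP_EOL;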

Instead of reading the whole file at once, you should read it line by line and process the data for each line as you go. Do NOT try to fit EVERYTHING in memory. You will fail. The reason is that while you can fit the text file itself in RAM, you will not also be able to hold the data as PHP objects/variables at the same time, since PHP itself needs much more memory for each of them.
What I suggest instead is:
a) read a new line,
b) parse the data in the line,
c) create the new object to store in the database,
d) go to step a, unset()ting the old object first or reusing its memory (see the sketch below).
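A minimal sketch of that loop, where parseLine() and persistCity() are hypothetical helpers standing in for your own parsing and persistence code:
$handle = fopen('allCountries.txt', 'r');
if ($handle) {
    while (($line = fgets($handle)) !== false) {   // a) read a new line
        $record = parseLine($line);                // b) parse the data (hypothetical helper)
        if ($record !== null) {
            persistCity($record);                  // c) store it in the database (hypothetical helper)
        }
        unset($record, $line);                     // d) free everything before the next pass
    }
    fclose($handle);
}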

Related

How to free memory in PHP or improve PHPExcel?

PHPExcel needs too much memory to load the file. I want to improve this code's memory usage.
It breaks with the error Fatal error: Allowed memory size of 536870912 bytes exhausted on this code:
/* class PHPExcel_Cell
 *
 * $returnValue = array()
 *
 */
$sortKeys = array();
foreach (array_unique($returnValue) as $coord) {
    sscanf($coord, '%[A-Z]%d', $column, $row);
    $sortKeys[sprintf('%3s%09d', $column, $row)] = $coord;
}
ksort($sortKeys);
return array_values($sortKeys);
$returnValue = array("B1", "C12", "C1", "D3", "B2"...)
must be sorted like array("B1", "B2", "C1", "C12")
First problem: If I understand correctly, array_unique uses one additional array to store the result, so total memory usage is doubled (if the array only contains unique elements). But I think there is no need to use array_unique, because any duplicates will be overwritten by this line:
$sortKeys[sprintf('%3s%09d',$column,$row)] = $coord;
Second problem: This code uses two arrays, $returnValue and $sortKeys (2x memory), so I rewrote it like this:
$len = count($returnValue);
for ($i = 0; $i < $len; $i++) {
    $val = $returnValue[$i];
    unset($returnValue[$i]);
    sscanf($val, '%[A-Z]%d', $column, $row);
    $returnValue[sprintf('%3s%09d', $column, $row)] = $val;
}
ksort($returnValue);
return array_values($returnValue);
But unset() doesn't free the memory, it only removes the element from the array, and gc_collect_cycles() doesn't help either.
How can I free the memory after unset()?
Maybe you know another way to improve this code's memory usage?
P.S. I cannot use xlsx2csv or other bash tools.
Your code looks pretty strange: you remove an element from the array and then add a new one in the same loop.
When I used PHPExcel, I used destructors to free memory and it worked very well:
function __destruct()
{
    if ($this->phpExcelObj) {
        \PHPExcel_Calculation::unsetInstance($this->phpExcelObj);
        if ($this->phpExcelObj) {
            $this->phpExcelObj->disconnectWorksheets();
            unset($this->phpExcelObj);
        }
    }
}
The other way is to make templates in Excel and load the template with PHPExcel; then you avoid the memory-hungry operations needed to build the markup of your document.
These techniques helped me load millions of rows using PHPExcel without using much memory.
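As a rough illustration of the template approach (a minimal sketch; template.xlsx, the cell address, and report.xlsx are hypothetical), using the classic PHPExcel API:
// Load a template prepared in Excel instead of building all markup in PHP.
$objPHPExcel = PHPExcel_IOFactory::load('template.xlsx');
// Fill in only the data cells; the styling/markup already lives in the template.
$objPHPExcel->getActiveSheet()->setCellValue('A1', 'Hello world');
$writer = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$writer->save('report.xlsx');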
I'm not sure if it helps, but you could try this:
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);
from: https://phpspreadsheet.readthedocs.io/en/latest/topics/creating-spreadsheet/#clearing-a-workbook-from-memory

phpexcel memory exhausted with 128Mb memory reading only first row of a big file

I have a memory problem with an xlsx file of about 95,500 rows and 28 columns.
To handle such a big file (more than 10 MB as xlsx) I wrote the code below, but when I execute it and call the load method I receive a memory exhausted error even with only one row read! (I've assigned only 128 MB to the PHP interpreter.)
Please consider that:
Currently I try to read only one single row and I receive the memory exhausted error (see $chunkFilter->setRows(1,1);)
After solving this problem with reading the first line, I need to read all the other lines to load their content into a database table
If you think there is another library or solution, please consider that I prefer PHP because it is the main language used for this application, but I can accept solutions in other languages (like Go)
Please don't simply suggest increasing the memory of the PHP process. I already know that this is possible, but this code runs on a shared VPS with only 512 MB of RAM maximum, and I need to keep memory use as low as possible
Is there a solution? Please find below the code that I use:
/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter to read the file in "chunks" */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter {
    private $_startRow = 0;
    private $_endRow = 0;

    /** Set the list of rows that we want to read */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        // Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}

function loadXLSFile($inputFile) {
    // Initiate cache
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_sqlite3;
    if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
        echo date('H:i:s'), " Unable to set Cell Caching using ", $cacheMethod,
            " method, reverting to memory", EOL;
    }

    $inputFileType = PHPExcel_IOFactory::identify($inputFile);
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $chunkFilter = new chunkReadFilter();
    // Tell the Read Filter the limits on which rows we want to read this iteration
    $chunkFilter->setRows(1,1);
    // Tell the Reader that we want to use the Read Filter that we've instantiated
    $objReader->setReadFilter($chunkFilter);
    $objReader->setReadDataOnly(true);
    $objPHPExcel = $objReader->load($inputFile);
}
UPDATE
Below is the error returned, as requested by pamelus:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 112 bytes) in /vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php on line 471
PHP Stack trace:
PHP 1. {main}() dataimport.php:0
PHP 2. loadFileToDb($inputFile = *uninitialized*, $tabletoupdate = *uninitialized*) dataimport.php:373
PHP 3. PHPExcel_Reader_Excel2007->load($pFilename = *uninitialized*) dataimport.php:231
Given the low memory limit you have, I can suggest an alternative to PHPExcel that would solve your problem once and for all: Spout. It only requires 10 MB of memory, so you should be good!
Your loadXLSFile() function would become:
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

function loadXLSFile($inputFile) {
    $reader = ReaderFactory::create(Type::XLSX);
    $reader->open($inputFile);

    foreach ($reader->getSheetIterator() as $sheet) {
        foreach ($sheet->getRowIterator() as $row) {
            // $row is the first row of the sheet. Do something with it
            break; // you won't read any other rows
        }
        break; // if you only want to read the first sheet
    }

    $reader->close();
}
It's that simple! No need for caching, filters, and other optimizations :)
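Since you mentioned that after the first row you need to load all the other rows into a database table, the same iterators can be reused without the break statements. A minimal sketch, assuming the Spout 2.x API used above (where each $row is a plain array of cell values) and a hypothetical insertRowIntoDb() helper:
function importXLSFile($inputFile) {
    $reader = ReaderFactory::create(Type::XLSX);
    $reader->open($inputFile);

    foreach ($reader->getSheetIterator() as $sheet) {
        foreach ($sheet->getRowIterator() as $row) {
            // Each $row is an array of cell values with this reader version.
            insertRowIntoDb($row); // hypothetical helper doing the actual INSERT
        }
    }

    $reader->close();
}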

Caching large Array causes memory exhaustion

So I'm trying to cache an array in a file and use it somewhere else.
import.php
// The code above reads each line of the CSV and puts it in an array
// (each line becomes one sub-array of the multidimensional array $csv)
$export = var_export($csv, true);
$content = "<?php \$data=" . $export . ";?>";
$target_path1 = "/var/www/html/Samples/test";
file_put_contents($target_path1 . "recordset.php", $content);
somewhere.php
ini_set('memory_limit','-1');
include_once("/var/www/html/Samples/test/recordset.php");
print_r($data);
Now, I've included recordset.php in somewhere.php to use the array stored in it. It works fine when the uploaded CSV file has 5000 lines; but if I try to upload a CSV with 50000 lines, for example, I get a fatal error:
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 79691776 bytes)
How can I fix this, or is there a more convenient way to achieve what I want? Speaking of performance, should I consider the CPU of the server? I've overridden the memory limit and set it to -1 in somewhere.php.
There are 2 ways to fix this:
You need to increase the memory (RAM) on the server, as memory_limit can only use memory that is actually available on the server. It seems that you have very little RAM available for PHP.
To check the total RAM on a Linux server:
<?php
$fh = fopen('/proc/meminfo', 'r');
$mem = 0;
while ($line = fgets($fh)) {
    $pieces = array();
    if (preg_match('/^MemTotal:\s+(\d+)\skB$/', $line, $pieces)) {
        $mem = $pieces[1];
        break;
    }
}
fclose($fh);
echo "$mem kB RAM found";
?>
Source: get server ram with php
You should parse your CSV file in chunks and release the occupied memory each time using unset(); a sketch follows below.
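A minimal sketch of chunked parsing; the chunk size, the file path, and the processChunk() helper are hypothetical:
// Read the CSV in batches of $chunkSize rows instead of caching the whole array.
$chunkSize = 1000;
$chunk = array();
if (($handle = fopen('/var/www/html/Samples/test/upload.csv', 'r')) !== false) {
    while (($row = fgetcsv($handle)) !== false) {
        $chunk[] = $row;
        if (count($chunk) >= $chunkSize) {
            processChunk($chunk); // hypothetical helper handling one batch
            unset($chunk);        // release the memory held by the batch
            $chunk = array();
        }
    }
    if (!empty($chunk)) {
        processChunk($chunk);     // don't forget the last partial batch
    }
    fclose($handle);
}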

Handle Large File with PHP

I have a file around 10 GB or more in size. The file contains only numbers ranging from 1 to 10, one per line, and nothing else. The task is to read the numbers from the file, sort them in ascending or descending order, and create a new file with the sorted numbers.
Can anyone please help me with the answer?
I'm assuming this is some kind of homework and the goal is to sort more data than you can hold in RAM?
Since you only have the numbers 1-10, this is not that complicated a task. Just open your input file and count how many occurrences of each specific number you have. After that you can construct a simple loop and write the values into another file (a counting sort). The following example is pretty self-explanatory.
$inFile = '/path/to/input/file';
$outFile = '/path/to/output/file';

$input = fopen($inFile, 'r');
if ($input === false) {
    throw new Exception('Unable to open: ' . $inFile);
}

//$map will be an array of size 10, filled with 0-s
$map = array_fill(1, 10, 0);

//Read the file line by line and count how many of each specific number you have
while (($line = fgets($input)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue; //skip empty lines so they don't get counted as 0
    }
    $map[(int) $line]++;
}
fclose($input);

$output = fopen($outFile, 'w');
if ($output === false) {
    throw new Exception('Unable to open: ' . $outFile);
}

/*
 * Reverse the array (preserving keys) if you need to change direction between
 * ascending and descending order
 */
//$map = array_reverse($map, true);

//Write values into your output file
foreach ($map AS $number => $count) {
    $string = ((string) $number) . PHP_EOL;
    for ($i = 0; $i < $count; $i++) {
        fwrite($output, $string);
    }
}
fclose($output);
Taking into account the fact that you are dealing with huge files, you should also check the script execution time limit of your PHP environment; the example above will take VERY long for 10 GB+ files, but since I didn't see any limitations concerning execution time or performance in your question, I'm assuming that is OK (see the one-liner below if it is not).
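If your environment does enforce a time limit, a minimal sketch of lifting it for this one script (set_time_limit is a standard PHP function; 0 means no limit):
// Allow the script to run as long as it needs for very large input files.
set_time_limit(0);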
I had a similar issue before. Trying to manipulate such a large file ended up being a huge drain on resources and it couldn't cope. The easiest solution I ended up with was to import it into a MySQL database using the fast bulk-load statement LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Once it's in you should be able to manipulate the data.
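For illustration, a minimal sketch of invoking it from PHP via PDO; the database credentials, the numbers table, and its value column are hypothetical, and LOCAL INFILE must be enabled on both client and server:
// Bulk-load the plain-number file into a one-column table (hypothetical schema).
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'secret', array(
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // required for LOAD DATA LOCAL INFILE
));
$pdo->exec("
    LOAD DATA LOCAL INFILE '/path/to/input/file'
    INTO TABLE numbers
    LINES TERMINATED BY '\\n'
    (value)
");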
Alternatively, you could just read the file line by line while outputting the result into another file line by line with the sorted numbers. Not too sure how well this would work though.
Have you had any previous attempts at it or are you just after a possible method of doing it?
If that's all, you don't need PHP (if you have a Linux machine at hand):
sort -n file > file_sorted-asc
sort -nr file > file_sorted-desc
Edit: OK, here's your solution in PHP (if you have a Linux machine at hand):
<?php
// Sort ascending
`sort -n file > file_sorted-asc`;
// Sort descending
`sort -nr file > file_sorted-desc`;
?>
:)

php fgetcsv multiple lines not only one or all

I want to read big CSV files and insert them into a database. That already works:
if (($handleF = fopen($path . "\\" . $file, 'r')) !== false) {
    $i = 1;
    // loop through the file line-by-line
    while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
        // Only start at the startRow, otherwise skip the row.
        if ($i >= $startRow) {
            // Check if to use headers
            if ($lookAtHeaders == 1 && $i == $startRow) {
                $this->createUberschriften(array_map(array($this, "convert"), $dataRow));
            } else {
                $dataRow = array_map(array($this, "convert"), $dataRow);
                $data = $this->changeMapping($dataRow, $startCol);
                $this->executeInsert($data, $tableFields);
            }
            unset($dataRow);
        }
        $i++;
    }
    fclose($handleF);
}
My problem with this solution is that it's very slow. But the files are too big to put directly into memory... So I want to ask: is there a possibility to read, for example, 10 lines at a time into the $dataRow array, not only one or all of them?
I want to get a better balance between memory and performance.
Do you understand what I mean? Thanks for your help.
Greetz
V
EDIT:
OK, I still had to find a solution with the MSSQL database. My solution was to stack the data and then make a multi-row MSSQL insert:
while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
    // Only start at the startRow, otherwise skip the row.
    if ($i >= $startRow) {
        // Check if to use headers
        if ($lookAtHeaders == 1 && $i == $startRow) {
            $this->createUberschriften(array_map(array($this, "convert"), $dataRow));
        } else {
            $dataRow = array_map(array($this, "convert"), $dataRow);
            $data = $this->changeMapping($dataRow, $startCol);
            $this->setCurrentRow($i);
            if (count($dataStack) > 210) {
                array_push($dataStack, $data);
                #echo '<pre>', print_r($dataStack), '</pre>';
                $this->executeInsert($dataStack, $tableFields, true);
                // reset the stack
                unset($dataStack);
                $dataStack = array();
            } else {
                array_push($dataStack, $data);
            }
            unset($data);
        }
        $i++;
        unset($dataRow);
    }
}
Finally, I loop over the stack and build a multi-row insert in the method "executeInsert", to create a query like this:
INSERT INTO [myTable] (field1, field2) VALUES ('data1', 'data2'),('data2', 'datta3')...
That works much better. I still have to check for the best balance, but for that I only have to change the value '210' in the code above. I hope that helps everybody with a similar problem.
Attention: Don't forget to execute the method "executeInsert" again after reading the complete file, because there may still be data left in the stack, and the method is only executed when the stack reaches the size of 210... (a sketch of such a method follows below).
Greetz
V
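For illustration, here is a hedged sketch of what the executeInsert() method described in the edit above could look like when it turns the stacked rows into a single multi-row INSERT; the quoteValue() helper and the connection property are hypothetical, and a real implementation should use proper driver-specific escaping or parameter binding:
// Hypothetical sketch: turn the stacked rows into one multi-row INSERT.
private function executeInsert(array $data, array $tableFields, $isStack = false)
{
    $rows = $isStack ? $data : array($data);

    $valueGroups = array();
    foreach ($rows as $row) {
        // quoteValue() stands in for proper escaping/binding of each cell.
        $valueGroups[] = '(' . implode(', ', array_map(array($this, 'quoteValue'), $row)) . ')';
    }

    $sql = 'INSERT INTO [myTable] (' . implode(', ', $tableFields) . ') VALUES '
         . implode(', ', $valueGroups);

    $this->connection->exec($sql); // hypothetical DB connection property
}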
I think your bottleneck is not reading the file, which is a text file. Your bottleneck is the INSERT into the SQL table.
Try something: just comment out the line that actually does the insert and you will see the difference.
I had this same issue in the past, where I did exactly what you are doing: reading a CSV with 5+ million lines and inserting it into a MySQL table. The execution time was 60 hours, which is
unrealistic.
My solution was to switch to another DB technology. I selected MongoDB and the execution time
was reduced to 5 minutes. MongoDB performs really fast in these scenarios and also has a tool called mongoimport that allows you to import a CSV file directly from the command line.
Give it a try if the DB technology is not a limitation on your side.
Another solution would be splitting the huge CSV file into chunks and then running the same PHP script multiple times in parallel, where each instance takes care of the chunks with a specific prefix or suffix in the filename.
I don't know which specific OS you are using, but on Unix/Linux there is a command-line tool
called split that will do that for you and will also add any prefix or suffix you want to the filenames of the chunks.
