In my script I wanted to clear array elements to free the memory used by data that is no longer needed. I found myself in a strange situation where using unset() causes:
( ! ) Fatal error: Allowed memory size of 134217728 bytes exhausted
(tried to allocate 16777224 bytes) in
.../models/Persons.php on line 60
This is the part of the code that causes the problem:
$chunks_count = count($this->xml_records_chunk['fnames']) - 1;
for ($num = 0; $num <= $chunks_count; $num++) {
    $chunks_count = count($this->xml_records_chunk['fnames']) - 1;
    $not_last = ($num < $chunks_count ? ',' : '');
    $new_records .= '(' . $this->xml_records_chunk['fnames'][$chunks_count] . ','
        . $this->xml_records_chunk['lnames'][$chunks_count] . ' , '
        . $this->xml_records_chunk['dobs'][$chunks_count] . ' , '
        . $this->xml_records_chunk['phones'][$chunks_count] . ' )' . $not_last;
    unset($this->xml_records_chunk['fnames'][$chunks_count]);
    unset($this->xml_records_chunk['lnames'][$chunks_count]);
    unset($this->xml_records_chunk['dobs'][$chunks_count]);
    unset($this->xml_records_chunk['phones'][$chunks_count]);
}
The script works just fine without unset().
Now the questions are:
Why does unset() cause memory exhaustion here?
What is the correct way to unset unused array elements in this case?
I've already checked this, for example:
What's better at freeing memory with PHP: unset() or $var = null
OK, null indeed works a bit differently, since with it the script dies on line 61 - the 3rd unset.
It is a good question why unset() breaks your memory here. But you use $this->xml_records_chunk as a variable in your code, so I assume you have an array with all elements already present, which means the complete memory for it is already allocated.
In that case you don't really need to clean up your array to free memory, because that memory has already been allocated. The GC is not that bad: once your script no longer uses the variable, it gets cleaned up.
In your case I would suggest changing your array structure and putting the iterator key at the first level of your value, something like this:
$this->xml_records_chunk[$chunks_count]['phones']
Then you have the following structure:
$this->xml_records_chunk[$chunks_count] = [
    'phones',
    '...',
    '...',
];
Then you can clean up the complete record with a single unset:
unset($this->xml_records_chunk[$chunks_count]);
That should cause fewer problems, and perhaps you could also look at the Iterator interface to iterate over and delete your data.
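To illustrate, here is a minimal, self-contained sketch of that restructuring (the field names and sample values are hypothetical, and quoting/escaping of the values is omitted):
<?php
// One sub-array per record instead of one parallel array per column.
$xml_records_chunk = [
    ['fname' => 'Alice', 'lname' => 'Smith', 'dob' => '1990-01-01', 'phone' => '555-0100'],
    ['fname' => 'Bob',   'lname' => 'Jones', 'dob' => '1985-05-23', 'phone' => '555-0101'],
];

$values = [];
foreach ($xml_records_chunk as $num => $record) {
    // Build one "( ... )" group per record.
    $values[] = '(' . implode(', ', $record) . ')';
    // A single unset() now drops the whole record at once.
    unset($xml_records_chunk[$num]);
}
$new_records = implode(',', $values);
echo $new_records . PHP_EOL;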
We have 2 million encrypted records in a table to export. We are using Drupal 8, but we cannot export the data through custom views or the webform export because of the encryption of the sensitive data, so we have to write a custom function to export the data to CSV or Excel. It throws an "Allowed Memory Exhausted" error due to the large amount of data whenever we try to export it.
It seems the best option is loading the data in smaller chunks and appending to the same sheet. How can we achieve this approach? Or is there another way to do it in PHP or Drupal 8?
Exporting to CSV is by far the simpler operation. There are a couple of ways to do this.
1. You could always use mysqldump with text delimiters to avoid PHP memory constraints:
mysqldump -u YOUR_USERNAME -p -t DATABASE_NAME TABLE_NAME
--fields-terminated-by=","
--fields-optionally-enclosed-by="\""
--fields-escaped-by="\""
--lines-terminated-by="\r\n"
--tab /PATH/TO/TARGET_DIR
Line breaks added for readability. By default, mysqldump also generates a .sql file with DROP/CREATE TABLE statements. The -t option skips that.
2. You can make a MySQL query and define INTO OUTFILE with the appropriate delimiters to format your data as CSV and save it into a file:
SELECT * FROM `db_name`.`table_name`
INTO OUTFILE 'path_to_folder/table_dump.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n';
If you run this on the command line, you can probably get away with a single call without the need to batch it (subject to your server specs and MySQL memory config).
If you do need to batch, then add something like LIMIT 0, 100000 where 100000 is whatever is a good result set size, and adapt your filename to match: table_dump_100000.csv etc. Merging the resulting CSV dumps into one file should be a simple operation.
3. If you do want to run this over PHP, then you most likely have to batch it. Basic steps:
A loop with for($i = 0; $i <= $max_rows; $i += $incr) where $incr is the batch size. In the loop:
Make MySQL query with variables used in the LIMIT clause; as in LIMIT $i, $incr.
Write the rows with fputcsv into your target file. Define your handle before the loop.
The above is more of a homework assignment than an attempt to provide ready code. Get started and ask again (with code shown). Whatever you do, make sure the data variables used for each batch iteration are reused or cleared to prevent a massive memory buildup; a rough sketch of these steps follows below.
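As a rough sketch of these steps (the PDO connection details, table name, and batch size below are hypothetical; decryption of the sensitive columns would go where each row is written):
<?php
// Hypothetical connection; adjust credentials and DSN to your environment.
$pdo = new PDO('mysql:host=localhost;dbname=db_name', 'user', 'pass');

// Define your handle before the loop.
$fh = fopen('/tmp/table_dump.csv', 'w');

$incr     = 10000; // batch size
$max_rows = (int) $pdo->query('SELECT COUNT(*) FROM table_name')->fetchColumn();

for ($i = 0; $i < $max_rows; $i += $incr) {
    $stmt = $pdo->query("SELECT * FROM table_name LIMIT $i, $incr");
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // Decrypt the sensitive columns here before writing, if required.
        fputcsv($fh, $row);
    }
    // Release the batch before the next iteration to keep memory flat.
    unset($stmt, $row);
}

fclose($fh);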
You can raise your script's memory limit with ini_set('memory_limit', '2048M'); (or whatever your server can handle). If you run into the max execution time, call set_time_limit(600); (10 minutes, or whatever seems enough) at the start of your script.
I've never tried this with 2 million records, but it works with a few hundred thousand, using drush and a script similar to this:
<?php

// php -d memory_limit=-1 vendor/bin/drush php:script export_nodes.php

// options
$type = 'the_type';
$csv_file_name = '/path/to/csv_file.csv';
$delimiter = '"';
$separator = ',';

// fields
$fields = [
  'nid',
  'title',
  'field_one',
  'field_two',
  'field_three',
];

// header
$header = '';
foreach ($fields as $field) {
  $header = $header . $delimiter . $field . $delimiter . $separator;
}
$header = $header . PHP_EOL;
file_put_contents($csv_file_name, $header, FILE_APPEND);
unset($header);

// get nodes
$nodes = \Drupal::entityTypeManager()
  ->getStorage('node')
  ->loadByProperties([
    'type' => $type,
  ]);

// loop nodes
foreach ($nodes as $node) {
  $line = '';

  // loop fields
  foreach ($fields as $field) {
    $field_value_array = $node->get($field)->getValue();
    if (empty($field_value_array[0]['value'])) {
      $field_value = '';
    }
    else {
      $field_value = $field_value_array[0]['value'];
    }
    $line = $line . $delimiter . $field_value . $delimiter . $separator;
  }
  unset($field_value_array);
  unset($field_value);

  // break line
  $line = $line . PHP_EOL;

  // write line
  file_put_contents($csv_file_name, $line, FILE_APPEND);
  unset($line);
}
unset($nodes);
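One caveat: loadByProperties() loads every node of the type at once, which is exactly what runs out of memory with millions of records. A chunked variant (a sketch only, reusing the $type, $fields and CSV-writing logic from the script above) would load the IDs first and reset the entity cache between chunks:
<?php
// Load only the IDs, then load and process the nodes in chunks of 500.
$storage = \Drupal::entityTypeManager()->getStorage('node');
$nids = \Drupal::entityQuery('node')
  ->condition('type', $type)
  ->accessCheck(FALSE)
  ->execute();

foreach (array_chunk($nids, 500, TRUE) as $chunk) {
  $nodes = $storage->loadMultiple($chunk);
  foreach ($nodes as $node) {
    // ... build and append the CSV line exactly as in the script above ...
  }
  // Drop the loaded entities from the static entity cache before the next chunk.
  $storage->resetCache($chunk);
  unset($nodes);
}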
I already have my code working: it parses the files and inserts the records. The issue that is stumping me, as I have never had to do this before, is how to tell my code to parse files 1-300, then wait, then parse the next batch, 301-500, and so on until it has parsed all the files. I need to parse over 50 thousand files, so obviously I'm hitting PHP's memory limit and execution time, which have already been increased, but I don't think I can set them high enough to process 50 thousand in one go.
I need help with how to tell my code to run 1-x, then rerun and run x-y.
My code is below (note, I am gathering more information than what's in the snippet):
$xml_files = glob(storage_path('path/to/*.xml'));

foreach ($xml_files as $file) {
    $data = simplexml_load_file($file);

    // ... Parse XML and get certain nodes ...
    $name = $data->record->memberRole->member->name;

    // ... SQL to insert record into DB ...
    Members::firstOrCreate(
        ['name' => $name]
    );
}
The simplest, if inelegant, solution is calling the script multiple times with an offset, and using a for loop instead of a foreach.
$xml_files = glob(storage_path('path/to/*.xml'));

$offset = $_GET['offset'];
// Or if calling the script via command line:
// $offset = $argv[1];

// Don't run past the end of the file list.
$limit = min($offset + 300, count($xml_files));

for ($i = $offset; $i < $limit; $i++) {
    $data = simplexml_load_file($xml_files[$i]);
    // process and whatever
}
If you're calling the script as a web page, just add a query param like my-xml-parser.php?offset=300 and get the offset like this: $offset = $_GET['offset'];
If you're calling this as a command line script, call it like this: php my-xml-parser.php 300, and get the offset from $argv: $offset = $argv[1];
EDIT
If it's a web script, you can try adding a curl call that makes the script call itself with the next offset, without waiting for the answer.
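A minimal sketch of that self-call (the script URL is hypothetical; the very short timeout means the request is fired and then abandoned rather than waited on):
<?php
// Fire-and-forget request to this same script with the next offset.
$next = $offset + 300;
$ch = curl_init('https://example.com/my-xml-parser.php?offset=' . $next);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);     // allow sub-second timeouts
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 100); // don't wait for the response
curl_exec($ch);
curl_close($ch);
Depending on the server, a timeout this short can abort the request before the next batch starts, so a cron job or queue worker is usually the more reliable option.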
PHPExcel needs too much memory to load the file. I want to improve the memory usage of this code.
It breaks with Fatal error: Allowed memory size of 536870912 bytes exhausted on this code:
/* class PHPExcel_Cell
 *
 * $returnValue = array()
 */
$sortKeys = array();
foreach (array_unique($returnValue) as $coord) {
    sscanf($coord, '%[A-Z]%d', $column, $row);
    $sortKeys[sprintf('%3s%09d', $column, $row)] = $coord;
}
ksort($sortKeys);
return array_values($sortKeys);
$returnValue = array("B1", "C12", "C1", "D3", "B2"...)
must be sorted like array("B1", "B2", "C1", "C12")
First problem: if I understand correctly, array_unique() uses one additional array to store its result, so total memory usage is doubled (if the array contains only unique elements). But I don't think array_unique() is needed at all, because any duplicates are overwritten by this line:
$sortKeys[sprintf('%3s%09d',$column,$row)] = $coord;
Second problem: this code uses two arrays, $returnValue and $sortKeys (2x memory), so I rewrote it like this:
$len = count($returnValue);
for ($i = 0; $i < $len; $i++) {
    $val = $returnValue[$i];
    unset($returnValue[$i]);
    sscanf($val, '%[A-Z]%d', $column, $row);
    $returnValue[sprintf('%3s%09d', $column, $row)] = $val;
}
ksort($returnValue);
return array_values($returnValue);
But unset() doesn't free the memory, it only removes the element from the array, and gc_collect_cycles() doesn't help either.
How can I free memory after unset()?
Or maybe you know another way to improve the memory usage of this code?
P.S. I cannot use xlsx2csv or other bash tools.
Your code looks pretty strange: you remove an element from the array and then add a new one in the same loop.
When I used PHPExcel I used destructors to free memory, and it worked very well:
function __destruct()
{
    if ($this->phpExcelObj) {
        \PHPExcel_Calculation::unsetInstance($this->phpExcelObj);
        if ($this->phpExcelObj) {
            $this->phpExcelObj->disconnectWorksheets();
            unset($this->phpExcelObj);
        }
    }
}
The other way is to make templates in Excel and load the template with PHPExcel; then you avoid the memory-consuming operations needed to build the markup of the document yourself.
These techniques helped me to load millions of rows with PHPExcel without using too much memory.
I'm not sure if it helps, but you could try this:
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);
from: https://phpspreadsheet.readthedocs.io/en/latest/topics/creating-spreadsheet/#clearing-a-workbook-from-memory
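For context, here is a minimal sketch of where that call fits in a PhpSpreadsheet script (PhpSpreadsheet being the successor of PHPExcel; the cell value and output path are just examples):
<?php
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$spreadsheet->getActiveSheet()->setCellValue('A1', 'Hello');

$writer = new Xlsx($spreadsheet);
$writer->save('/tmp/example.xlsx');

// Break the cyclic worksheet references so the memory can actually be reclaimed.
$spreadsheet->disconnectWorksheets();
unset($spreadsheet);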
So I'm trying to cache an array in a file and use it somewhere else.
import.php
// Above code gets each line in the CSV and puts it in an array
// (1 line is 1 multidimensional array) - $csv
$export = var_export($csv, true);
$content = "<?php \$data=" . $export . ";?>";
$target_path1 = "/var/www/html/Samples/test/";
file_put_contents($target_path1 . "recordset.php", $content);
somewhere.php
ini_set('memory_limit','-1');
include_once("/var/www/html/Samples/test/recordset.php");
print_r($data);
Now, I've included recordset.php in somewhere.php to use the array stored in it. It works fine when the uploaded CSV file has 5,000 lines, but if I try to upload a CSV with 50,000 lines, for example, I get a fatal error:
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 79691776 bytes)
How can I fix this, or is there a more convenient way to achieve what I want? Regarding performance, should I consider the CPU of the server? I've already overridden the memory limit and set it to -1 in somewhere.php.
There are 2 ways to fix this:
You need to increase the memory (RAM) on the server, as memory_limit can only use memory that is actually available on the server, and it seems you have very little RAM available for PHP.
To check the total RAM on a Linux server:
<?php
$fh = fopen('/proc/meminfo', 'r');
$mem = 0;
while ($line = fgets($fh)) {
    $pieces = array();
    if (preg_match('/^MemTotal:\s+(\d+)\skB$/', $line, $pieces)) {
        $mem = $pieces[1];
        break;
    }
}
fclose($fh);
echo "$mem kB RAM found";
?>
Source: get server ram with php
You should parse your CSV file in chunks and release the occupied memory each time using unset().
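A rough sketch of that chunked approach (process_chunk() is a hypothetical placeholder for whatever import.php needs to do with the rows, e.g. inserting them into a database instead of var_export-ing one giant array):
<?php
$handle = fopen('/path/to/upload.csv', 'r'); // hypothetical path
$chunk = [];
$chunkSize = 1000;

while (($row = fgetcsv($handle)) !== false) {
    $chunk[] = $row;
    if (count($chunk) >= $chunkSize) {
        process_chunk($chunk); // hypothetical: insert into DB, write to cache, etc.
        unset($chunk);         // release the rows we no longer need
        $chunk = [];
    }
}
if ($chunk) {
    process_chunk($chunk);     // handle the final partial chunk
}
fclose($handle);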
For one of my projects I need to import a huge text file (~950 MB). I'm using Symfony2 and Doctrine 2 for the project.
My problem is that I get errors like:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 24 bytes)
The error occurs even if I increase the memory limit to 1 GB.
I tried to analyze the problem using Xdebug and KCacheGrind (as part of PHPEdit), but I don't really understand the values. :(
I'm looking for a tool or a method (quick and simple, since I don't have much time) to find out why memory is allocated and not freed again.
Edit
To clear some things up, here is my code:
$handle = fopen($geonameBasePath . 'allCountries.txt', 'r');

$i = 0;
$batchSize = 100;

if ($handle) {
    while (($buffer = fgets($handle, 16384)) !== false) {
        if ($buffer[0] == '#') // skip comments
            continue;

        // split parts
        $parts = explode("\t", $buffer);

        if ($parts[6] != 'P')
            continue;

        if ($i % $batchSize == 0) {
            echo 'Flush & Clear' . PHP_EOL;
            $em->flush();
            $em->clear();
        }

        $entity = $em->getRepository('MyApplicationBundle:City')->findOneByGeonameId($parts[0]);
        if ($entity !== null) {
            $i++;
            continue;
        }

        // create city object
        $city = new City();
        $city->setGeonameId($parts[0]);
        $city->setName($parts[1]);
        $city->setInternationalName($parts[2]);
        $city->setLatitude($parts[4]);
        $city->setLongitude($parts[5]);
        $city->setCountry($em->getRepository('MyApplicationBundle:Country')->findOneByIsoCode($parts[8]));

        $em->persist($city);

        unset($city);
        unset($entity);
        unset($parts);
        unset($buffer);

        echo $i . PHP_EOL;
        $i++;
    }
}

fclose($handle);
Things I have tried, but nothing helped:
Adding a second parameter to fgets
Increasing memory_limit
Unsetting vars
Increasing the memory limit is not going to be enough. When importing files like that, you should buffer the reading:
$f = fopen('yourfile', 'r');
while (($data = fread($f, 4096)) !== false && $data !== '') {
    // Do your stuff using the read $data
}
fclose($f);
Update:
When working with an ORM, you have to understand that nothing is actually inserted into the database until the flush call. All those objects are stored by the ORM, tagged as "to be inserted". Only when flush is called does the ORM check the collection and start inserting.
Solution 1: Flush often. And clear.
Solution 2: Don't use the ORM. Go for plain SQL commands; they take up far less memory than the object + ORM solution.
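A minimal sketch of solution 2, assuming the same tab-separated line format as in the question and the Doctrine DBAL connection that the entity manager already exposes (the table and column names are illustrative):
$conn = $em->getConnection(); // Doctrine DBAL connection
$stmt = $conn->prepare(
    'INSERT INTO city (geoname_id, name, international_name, latitude, longitude) VALUES (?, ?, ?, ?, ?)'
);

$handle = fopen($geonameBasePath . 'allCountries.txt', 'r');
while (($buffer = fgets($handle, 16384)) !== false) {
    if ($buffer[0] == '#') {
        continue;
    }
    $parts = explode("\t", $buffer);
    if ($parts[6] != 'P') {
        continue;
    }
    // Plain prepared-statement insert: no entities are kept in memory between rows.
    $stmt->execute([$parts[0], $parts[1], $parts[2], $parts[4], $parts[5]]);
}
fclose($handle);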
33554432 bytes is 32 MB.
Change the memory limit in php.ini, for example to 75 MB:
memory_limit = 75M
and restart the server.
Instead of simply reading the whole file, you should read it line by line. Every time you read a line, process its data; do NOT try to fit everything in memory. You will fail. The reason is that while the TEXT file may fit in RAM, you will not be able to also hold the data as PHP objects/variables/what-have-you at the same time, since PHP itself needs much larger amounts of memory for each of them.
What I suggest instead is:
a) read a new line,
b) parse the data in the line,
c) create the new object to store in the database,
d) go to step a, unset(ting) the old object first or reusing its memory.