php load big json file once - php

I created a local web application to display and interact with data that I have in a big JSON file (around 250 MB). I have several functions to display the data in different ways, but currently each one starts by reading and parsing the JSON file:
$string = file_get_contents("myFile.json");
$json_a = json_decode($string, true);
The problem is that this is quite slow (about 4 seconds) because the file is big... So I would like to parse the JSON file once and for all and store the parsed data in memory so that each function can use it:
session_start();
$string = file_get_contents("myFile.json");
$_SESSION['json_a'] = json_decode($string, true);
and then use $_SESSION['json_a'] in my functions.
But I get an error when a function accesses this $_SESSION['json_a'] variable:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 195799278 bytes)
I assume it is because the file is too big... but why does it crash when the variable is used and not when it is built? And why does my first solution (parsing every time) work if the data is too big for memory?
And finally my real question: how can I optimise this? (I know an SQL database would be much better, but I happen to have JSON data.)
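For reference, the workaround most of the related answers below fall back on is simply raising memory_limit so that the decoded array fits; a minimal sketch, where 512M is an assumed value rather than a measured one:
ini_set('memory_limit', '512M'); // assumed value; it must cover the decoded array, not just the 250 MB file
$string = file_get_contents("myFile.json");
$json_a = json_decode($string, true);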

Related

Allocate memory failure

Working on an XML string variable that normally occupies 1.4 MB, the following happens to me.
When this part of the script is executed:
echo memory_get_usage()." - ";
$aux_array['T3']=substr($xml, $array_min["T3"], strpos($xml, "</second>", $contador)-$array_min["T3"]);
print_r(memory_get_usage());
The display is
5059720 - 5059896
But when it is this part:
echo memory_get_usage()." - ";
$aux_array['US']=substr($xml, $array_min["US"], strpos($xml, "</second>", $contador)-$array_min["US"]);
print_r(memory_get_usage());
the display is
5059896 - 6417152
To me, both statements are the same, but memory_get_usage() doesn't lie.
The exact output is:
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted
(tried to allocate 1305035 bytes) in
/var/www/html/devel/soap/index.php on line 138
This happens because a while loop makes that allocation many times.
Can you figure out what the problem is?
The problem is that you are holding ~268MB of data in your PHP script, even though your input XML file is only 1.4MB big. Clear/remove variables you don't need anymore to clean up some memory while your PHP script is running. Or change the way you work with the XML file. Instead of loading the whole XML file at once you might want to use an XMLReader which walks over the XML nodes instead.
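A minimal sketch of that XMLReader approach (the file name and the <second> element are assumptions taken from the snippet above, not the real schema):
$reader = new XMLReader();
$reader->open('input.xml'); // assumed file name
while ($reader->read()) {
    // Handle one element at a time instead of holding the whole document in a string.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'second') {
        $node = $reader->readOuterXml(); // only this node is expanded into memory
        // ... work with $node; it goes out of scope on the next iteration
    }
}
$reader->close();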

Make an array out of contents of a very large file

I have a 16GB file. I'm trying to make an array whose elements are the file's lines, split on the newlines. Right now I'm using file_get_contents with preg_split, as follows:
$list = preg_split('/$\R?^/m', file_get_contents("file.txt"));
However, I get the following error:
Fatal error: Allowed memory size of 10695475200 bytes exhausted (tried to allocate 268435456 bytes) in /var/www/html/mysite/script.php on line 35
I don't want to use too much memory. I know you can buffer it with fopen, but I'm not sure how to create an array from the contents of the file with a newline as the delimiter.
The question does not address how I would make an array from the contents of the file using preg_split similar to how I do above.
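The fopen/fgets buffering mentioned above could be sketched like this; it handles one line at a time instead of materialising the whole array (process_line() is a hypothetical stand-in for whatever each line is needed for):
$handle = fopen('file.txt', 'r');
if ($handle === false) {
    die('Error opening file');
}
while (($line = fgets($handle)) !== false) {
    $line = rtrim($line, "\r\n"); // drop the line ending, roughly what the preg_split pattern did
    process_line($line);          // hypothetical per-line handler; 16GB of lines will never fit in one array
}
fclose($handle);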

PHP - How to append to a JSON file

I generate JSON files which I load into datatables, and these JSON files can contain thousands of rows from my database. To generate them, I need to loop through every row in the database and add each database row as a new row in the JSON file. The problem I'm running into is this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 262643 bytes)
What I'm doing is I get the JSON file with file_get_contents($json_file) and decode it into an array, then I add a new row to the array, then encode the array back into JSON and export it to the file with file_put_contents($json_file).
Is there a better way to do this? Is there a way I can prevent the memory increasing with each loop iteration? Or is there a way I can clear the memory before it reaches the limit? I need the script to run to completion, but with this memory problem it barely gets up to 5% completion before crashing.
I can keep rerunning the script, and each time I rerun it, it adds more rows to the JSON file, so if this memory problem is unavoidable, is there a way to automatically rerun the script numerous times until it's finished? For example, could I detect the memory usage, notice when it's about to reach the limit, then exit out of the script and restart it? I'm on wpengine, so they won't allow security-risky functions like exec().
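For reference, the memory check described above could be sketched roughly like this; it is purely hypothetical and not what was used in the end:
$limit_bytes = 128 * 1024 * 1024; // assumed to match the 134217728-byte limit in the error above
if (memory_get_usage(true) > 0.9 * $limit_bytes) { // 0.9 is an arbitrary safety margin
    exit; // bail out cleanly; something external (a cron job, a retry loop) would have to relaunch the script
}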
So I switched to using CSV files and it solved the memory problem. The script runs vastly faster too. jQuery DataTables doesn't have built-in support for CSV files, so I wrote a function to convert the CSV file to JSON:
public function csv_to_json($post_type) {
    // Wrapper structure expected by DataTables.
    $data = array(
        "recordsTotal"    => $this->num_rows,
        "recordsFiltered" => $this->num_rows,
        "data"            => array()
    );
    if (($handle = fopen($this->csv_file, 'r')) === false) {
        die('Error opening file');
    }
    // The first line of the tab-delimited CSV holds the column names.
    $headers = fgetcsv($handle, 1024, "\t");
    $complete = array();
    while ($row = fgetcsv($handle, 1024, "\t")) {
        $complete[] = array_combine($headers, $row);
    }
    fclose($handle);
    $data['data'] = $complete;
    file_put_contents($this->json_file, json_encode($data, JSON_PRETTY_PRINT));
}
So the result is I create a CSV file and a JSON file much faster than creating a JSON file alone, and there are no issues with memory limits.
Personally, as I said in the comments, I would use CSV files. They have several advantages:
you can read / write one line at a time so you only manage the memory for one line
you can just append new data to the file (sketched after this answer)
PHP has plenty of built in support using either the fputcsv() or SPL file objects.
you can load them directly into the database using "Load Data Infile"
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
The only cons are
keep the same schema through the whole file
no nested data structures
The issue with JSON is (as far as I know) that you have to keep the whole thing in memory as a single data set. Therefore you cannot stream it (line by line) like a normal text file. There is really no solution besides limiting the size of the JSON data, which may or may not even be easy to do. You can increase the memory some, but that is just a temporary fix if you expect the data to continue to grow.
We use CSV files in a production environment and I regularly deal with datasets that are 800k or 1M rows. I've even seen one that was 10M rows. We have a single table of 60M rows ( MySql ) that is populated from CSV uploads. So it will work and be robust.
If you're set on JSON, then I would just come up with a fixed number of rows that works and design your code to only run that many rows at a time. It's impossible for me to guess how to do that without more details.
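A rough sketch of the one-line-at-a-time appending described above ($csv_file, $rows and the column names are assumptions, not taken from the question):
$handle = fopen($csv_file, 'a'); // 'a' mode appends, so the existing rows stay untouched
foreach ($rows as $row) {        // $rows: the current batch of database rows being exported
    fputcsv($handle, array($row['id'], $row['title'], $row['date']), "\t"); // tab-delimited, to match the reader above
}
fclose($handle);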

Avoiding wasteful memory consumption

Using php.exe 5.2.17.17 on Windows 7, this:
include_once('simple_html_dom.php');

function fix($setlink)
{
    $setaddr = $setlink->href;
    $filename = "..\\" . urldecode($setaddr);
    $set = file_get_contents($filename);
    $setstr = str_get_html($set);
    // Do stuff requiring whole file
    unset($set);
    unset($setstr);
}

$setindexpath = "..\index.htm";
foreach (file_get_html($setindexpath)->find('a.setlink') as $setlink)
{
    fix($setlink);
}
(relying on external data files) fails thus:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in [snip]\simple_html_dom.php on line 620
"function fix" is a suggestion from the answer to a similar question here. unset() is wishful thinking :-)
How can I avoid the continuing consumption of memory by strings that are unused on the next loop iteration, without defacing the code too much, and while still providing the whole file as a string?
try $setstr->clear(); before unset($setstr);
see http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak
side note: $setstr seems to be a misnomer; it's not a string but the DOM representation of the HTML doc.
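Applied to the fix() function from the question, the suggestion would look something like this:
function fix($setlink)
{
    $setaddr = $setlink->href;
    $filename = "..\\" . urldecode($setaddr);
    $set = file_get_contents($filename);
    $setstr = str_get_html($set);
    // Do stuff requiring whole file
    $setstr->clear(); // releases simple_html_dom's internal references before the handles are unset
    unset($set);
    unset($setstr);
}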

Memory exhausted error for json_parse with PHP

I have the following code:
<?php
$FILE="giant-data-barf.txt";
$fp = fopen($FILE,'r');
//read everything into data
$data = fread($fp, filesize($FILE));
fclose($fp);
$data_arr = json_decode($data);
var_dump($data_arr);
?>
The file giant-data-barf.txt is, as its name suggests, a huge file (it's 5.4 MB right now, but it could grow to several GB).
When I execute this script, I get the following error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in ........./data.php on line 12
I looked at possible solutions, and saw this:
ini_set('memory_limit','16M');
and my question is, is there a limit to how big I should set my memory? Or is there a better way of solving this problem?
THIS IS A VERY BAD IDEA, that said, you'll need to set
ini_set('memory_limit',filesize($FILE) + SOME_OVERHEAD_AMOUNT);
because you're reading the entire thing into memory. You may very well have to set the memory limit to two times the size of the file, since you also want to json_decode it.
NOTE THAT ON A WEB SERVER THIS WILL CONSUME MASSIVE AMOUNTS OF MEMORY AND YOU SHOULD NOT DO THIS IF THE FILE WILL BE MANY GIGABYTES AS YOU SAID!!!!
Is it really a giant JSON blob? You should look at converting this to a database or another format that allows random or row-level access before parsing it with PHP.
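Spelled out against the script in the question, that ini_set suggestion amounts to something like this (the factor of two and the extra 16M margin are assumptions, since json_decode's overhead is hard to predict):
$FILE = "giant-data-barf.txt";
// Assumed sizing: twice the file plus a margin, because json_decode builds its own structure on top of $data.
ini_set('memory_limit', (string)(filesize($FILE) * 2 + 16 * 1024 * 1024));
$data = file_get_contents($FILE);
$data_arr = json_decode($data);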
I've given all my servers a memory_limit of 100M... didn't run into trouble yet.
I would consider splitting up that file somehow, or get rid of it and use a database
