So, I have a large JSON file and I want to insert data from that file into a MySQL database. I can only use PHP 5.6 and can't change the php.ini file.
When I use json_decode(), I get an error saying that too much memory would have to be allocated. So I searched for some kind of library, found this library, and I'm using it like this:
set_time_limit(300);
$listener = new \JsonStreamingParser\Listener\InMemoryListener();
$stream = fopen('data/stops.json', 'r');
try {
$parser = new \JsonStreamingParser\Parser($stream, $listener);
$parser->parse();
fclose($stream);
} catch (Exception $e) {
fclose($stream);
throw $e;
}
var_dump($listener->getJson());
But I still get that annoying error about memory:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
to allocate 72 bytes) in
SOME/PATH/TO/vendor/salsify/json-streaming-parser/src/Parser.php on
line 516
I have no clue how to handle my JSON file. So I'm looking for some advice, or for someone who can help me write code that converts the JSON file to an array, so I could insert the data from that array into the database. Also, I'm not looking for a one-time solution, because I need to parse that JSON at least once per day.
Here is the whole JSON file: JSON. The structure looks like this:
{
"2017-07-26":
{
"lastUpdate":"2017-07-26 07:07:01",
"stops":[
{
"stopId":32640,
"stopCode":null,
"stopName":null,
"stopShortName":"2640",
"stopDesc":"Amona",
"subName":"2640",
"date":"2017-07-26",
"stopLat":54.49961,
"stopLon":18.44532,
"zoneId":null,
"zoneName":null,
"stopUrl":"",
"locationType":null,
"parentStation":null,
"stopTimezone":"",
"wheelchairBoarding":null,
"virtual":null,
"nonpassenger":null,
"depot":null,
"ticketZoneBorder":null,
"onDemand":null,
"activationDate":"2017-07-25"
},
{...},
{...}
]
}
}
You need to set the option via ini_set(): http://php.net/manual/en/function.ini-set.php
ini_set('memory_limit', '512M'); // needs to be higher than the 128M that is already being exhausted
adapted from https://davidwalsh.name/increase-php-memory-limit-ini_set
Alternatively, you can use a .htaccess file:
php_value memory_limit '512M'
credit goes to https://stackoverflow.com/a/42578190/351861
Related
I've read somewhere that I should use the library salsify/jsonstreamingparser to open a big JSON file, but it's giving me the same error as json_decode:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /data/www/default/database/vendor/salsify/json-streaming-parser/src/Listener/InMemoryListener.php on line 92
I have to do it in PHP because I'm using free hosting which doesn't have Python.
Basically, what I want to do is download a big JSON file, unzip it, and process the content. I don't know why I wasn't able to do it in PHP in a whole day, but in Python I did it in 5 minutes:
import os
import json
import urllib
import zipfile
json_file = 'AllSets-x.json'
zip_file = json_file + '.zip'
urllib.urlretrieve ("https://mtgjson.com/json/" + zip_file, zip_file)
dir_path = os.path.dirname(os.path.realpath(__file__))
zip_ref = zipfile.ZipFile(dir_path + "/" + zip_file, 'r')
zip_ref.extractall(dir_path)
zip_ref.close()
json_data = json.load(open(json_file, 'r'))
print json_data.keys()[0]
This is what I have in PHP:
<?php
require_once __DIR__ . '/vendor/autoload.php';
include "../credentials.php";
error_reporting(E_ALL); # Reports all errors
ini_set('display_errors','Off'); # Do not display errors for the end-users (security issue)
ini_set('error_log','/tmp/php-errors.log'); # Set a logging file
$logger = new Monolog\Logger('channel-name');
$logger->pushHandler(new Monolog\Handler\StreamHandler('/tmp/php-errors.log', Monolog\Logger::DEBUG));
// Override the default exception handler behavior ($logger must exist first and be captured by the closure)
set_exception_handler(function($exception) use ($logger) {
    $logger->error($exception);
    echo "Something went wrong!";
});
$logger->info("Parsing json file");
$listener = new \JsonStreamingParser\Listener\InMemoryListener();
$json_file = __DIR__ . "/AllSets-x.json";
$stream = fopen($json_file, 'r');
try {
$parser = new \JsonStreamingParser\Parser($stream, $listener);
$parser->parse();
fclose($stream);
} catch (Exception $e) {
fclose($stream);
throw $e;
}
$logger->info("Json file parsed");
$json_data = $listener->getJson();
$logger->info("Displaying json data");
var_dump($json_data);
Using the InMemoryListener certainly defeats the purpose of a streaming parser. That'll just unpack everything into memory (likely worse memory-wise than plain json_decode).
You'll need to catch each JSON object block individually, if you want to work under such constraints.
There's the SimpleObjectQueueListener which could possibly fit the bill. If the specific JSON has a bunch of [{…}, {…}, {…}] objects to be processed:
$listener = new \JsonStreamingParser\Listener\SimpleObjectQueueListener("print_r", 0);
// would just print out each object block from the JSON stream
Obviously you would use a callback like "process_my_json_blobs" instead. (Or a prepared callback like [$pdo, "execute"] perhaps.)
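For example, here is a rough sketch of feeding each streamed stop object straight into MySQL via PDO. The DSN, credentials, table name, column names, and the depth value passed to the listener are all assumptions to adjust to your setup; only one decoded object is held in memory at a time.

<?php
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical connection and table; replace with your real credentials/schema.
$pdo = new PDO('mysql:host=localhost;dbname=transit;charset=utf8', 'user', 'pass');
$insert = $pdo->prepare(
    'INSERT INTO stops (stop_id, stop_short_name, stop_desc, stop_lat, stop_lon)
     VALUES (:id, :short_name, :desc, :lat, :lon)'
);

// Called once per decoded object at the configured depth.
$storeStop = function ($stop) use ($insert) {
    $stop = (array) $stop;          // the listener may hand over arrays or objects
    if (!isset($stop['stopId'])) {
        return;                     // ignore wrapper objects that are not stop records
    }
    $insert->execute(array(
        ':id'         => $stop['stopId'],
        ':short_name' => $stop['stopShortName'],
        ':desc'       => $stop['stopDesc'],
        ':lat'        => $stop['stopLat'],
        ':lon'        => $stop['stopLon'],
    ));
};

// The depth value is a guess for the structure shown above ({date} -> stops -> [...]);
// experiment until the callback receives the individual stop objects.
$listener = new \JsonStreamingParser\Listener\SimpleObjectQueueListener($storeStop, 4);

$stream = fopen('data/stops.json', 'r');
try {
    $parser = new \JsonStreamingParser\Parser($stream, $listener);
    $parser->parse();
} finally {
    fclose($stream);
}

A callback like this is what the "process_my_json_blobs" placeholder above stands for, and the whole script can simply be scheduled for the daily run.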
Btw, reading the whole JSON input just works in your local Python because there's usually no memory_limit like there is in common PHP setups. (Python at best relies on the system's ulimit.)
I am using PHPExcel (found here: https://github.com/PHPOffice/PHPExcel). If I try to read more than approximately 2000 rows then it shows a memory error as follows.
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
to allocate 71 bytes) in
/home/sample/PHPExcelReader/Classes/PHPExcel/worksheet.php on line 89
My Excel data range is A1:X2000
Below is the code I use to read the Excel file.
ini_set('memory_limit', '-1');
/** Include path **/
set_include_path(get_include_path() . PATH_SEPARATOR . 'Classes/');
/** PHPExcel_IOFactory */
include $unsecured_param['home_dir'].'APIs/PHPExcelReader/Classes/PHPExcel/IOFactory.php';
$inputFileName = $target; // File to read
//echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory to identify the format<br />';
try {
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
} catch(Exception $e) {
die('Error loading file "'.pathinfo($inputFileName,PATHINFO_BASENAME).'": '.$e->getMessage());
}
$sheetData = $objPHPExcel->getActiveSheet()->rangeToArray('A1:X2000', null, true, true, true);
//store data into array..
$i = 0; $j = 0; $max_rows = 0; $max_columns = 0;
foreach ($sheetData as $rec)
{
    foreach ($rec as $part)
    {
        //echo "items[$j][$i]="; echo $part; echo "<br>";
        $items[$j][$i] = $part;
        $i = $i + 1;
        if ($j == 0) { $max_columns = $i; }
    }
    $j = $j + 1;
    $i = 0;
}
$max_rows = $j;
Could anyone please let me know how to overcome this issue?
Consider using cell caching to reduce the memory required to hold the workbook in memory, as described in section 4.2.1 of the developer documentation.
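A minimal sketch of enabling cell caching; the cache method and threshold here are example choices, and it has to be configured before the workbook is loaded:

// Cache cell objects in php://temp once they exceed ~32MB of memory,
// instead of holding every PHPExcel_Cell object in RAM.
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '32MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

// Only now load the workbook.
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);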
And consider not using rangeToArray() and then using that to build another array in memory... doing this really uses a lot of memory to hold duplicated data, when you could simply loop through the rows and columns of the worksheet to do what you need.
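For instance, a rough sketch of walking the worksheet directly with the row and cell iterators instead of rangeToArray(); the A1:X2000 bounds follow the question, and each row should be used as you go rather than collected into a second array:

$worksheet = $objPHPExcel->getActiveSheet();

foreach ($worksheet->getRowIterator(1, 2000) as $row) {
    $cellIterator = $row->getCellIterator('A', 'X');
    $cellIterator->setIterateOnlyExistingCells(false); // also visit empty cells

    $values = array();
    foreach ($cellIterator as $cell) {
        $values[] = $cell->getValue();
    }

    // ... use $values here (e.g. insert the row into the database),
    //     instead of appending it to an ever-growing $items array
}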
This error means that the PHP script you are running has exceeded the memory allowed for PHP on your server. You can edit your php.ini file to let your PHP scripts allocate more memory when they run, which may help, but if you are running a 32-bit Linux OS on your server for whatever reason, there is a hard cap of about 3.5GB that the process can take up, so even allocating more than that will still fail and cause a similar issue.
In cases like this, it really comes down to the fact that the amount of data you are trying to pull is too large, and you need to scale it back somehow. It isn't necessarily an issue with the code, but rather how much data you are actually attempting to show/process.
Using Google, I found that the amount of memory you're noting (134217728 bytes) matches the 128MB default that php.ini uses for memory_limit. Changing the value in the ini will resolve this issue. If you are unable to do that, then you need to somehow limit the amount of data that you pull at one time (see the chunked read-filter sketch after the link below).
Information:
http://ca1.php.net/manual/en/ini.core.php#ini.memory-limit
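If raising the limit really isn't an option, one way to limit the data pulled at once with PHPExcel is a chunked read filter. A rough sketch follows; the chunk size and row bounds are example values:

// Load only a slice of rows per pass instead of the whole worksheet.
class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 0;
    private $endRow   = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Always read the heading row, plus the rows of the current chunk.
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$reader = PHPExcel_IOFactory::createReaderForFile($inputFileName);
$filter = new ChunkReadFilter();
$reader->setReadFilter($filter);

for ($startRow = 2; $startRow <= 2000; $startRow += 500) {
    $filter->setRows($startRow, 500);
    $objPHPExcel = $reader->load($inputFileName);
    // ... process just these 500 rows, then free the workbook
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}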
I'm trying to export some documents from MongoDB to .csv. For some large lists, the files would be something like 40MB, and I get errors about the memory limit:
Fatal error: Allowed memory size of 134217728 bytes exhausted
(tried to allocate 44992513 bytes) in
/usr/share/php/Zend/Controller/Response/Abstract.php on line 586
I wonder why this error happens. What consumes such an amount of memory? How do I avoid such an error without changing memory_limit, which is currently set to 128M?
I use something like this:
public static function exportList($listId, $state = self::SUBSCRIBED)
{
$list = new Model_List();
$fieldsInfo = $list->getDescriptionsOfFields($listId);
$headers = array();
$params['list_id'] = $listId;
$mongodbCursor = self::getCursor($params, $fieldsInfo, $headers);
$mongodbCursor->timeout(0);
$fp = fopen('php://output', 'w');
foreach ($mongodbCursor as $subscriber) {
foreach ($fieldsInfo as $fieldInfo) {
$field = ($fieldInfo['constant']) ? $fieldInfo['field_tag'] : $fieldInfo['field_id'];
if (!isset($subscriber->$field)) {
$row[$field] = '';
} elseif (Model_CustomField::isMultivaluedType($fieldInfo['type'])) {
$row[$field] = array();
foreach ($subscriber->$field as $value) {
$row[$field][] = $value;
}
$row[$field] = implode(self::MULTIVALUED_DELEMITOR, $row[$field]);
} else {
$row[$field] = $subscriber->$field;
}
}
fputcsv($fp, $row);
}
}
Then in my controller I call it with something like this:
public function exportAction()
{
set_time_limit(300);
$this->_helper->layout->disableLayout();
$this->_helper->viewRenderer->setNoRender();
$fileName = $list->list_name . '.csv';
$this->getResponse()->setHeader('Content-Type', 'text/csv; charset=utf-8')
->setHeader('Content-Disposition', 'attachment; filename="'. $fileName . '"');
Model_Subscriber1::exportList($listId);
echo 'Peak memory usage: ', memory_get_peak_usage()/1024, ' Memory usage: ', memory_get_usage()/1024;
}
So that's the end of the code where I export the data. It's rather strange that for a list with something like 1M documents, the export succeeds and displays:
> Peak memory usage: 50034.921875 Kb Memory usage: 45902.546875 Kb
But when I try to export 1.3M documents, then after several minutes all I get in the export file is:
Fatal error: Allowed memory size of 134217728 bytes exhausted
(tried to allocate 44992513 bytes) in
/usr/share/php/Zend/Controller/Response/Abstract.php on line 586.
The sizes of the documents I export are approximately the same.
I increased memory_limit to 256M and tried to export 1.3M list, this is what it showed:
Peak memory usage: 60330.4609375Kb Memory usage: 56894.421875 Kb.
This seems very confusing to me. Is this data really that inaccurate? Otherwise, why does it cause a memory-exhausted error with memory_limit set to 128M?
While the size of the documents may be about the same, the memory PHP allocates to process them isn't directly proportional to the document size or the number of documents. This is because different types require different memory allocations in PHP. You may be able to free some memory as you go, but I don't see any place in your code where you can.
The best answer is to probably just increase the memory limit.
One thing you could do is offload the processing to an external script and call that from PHP. Many languages do this sort of processing in a more memory efficient way than PHP.
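For instance, a tiny sketch of shelling out to a hypothetical external script that writes the CSV itself; the script name, arguments, and output path are assumptions:

$csvPath = '/tmp/export_' . (int) $listId . '.csv';
$cmd = 'python export_list.py ' . escapeshellarg($listId) . ' ' . escapeshellarg($csvPath);

// Run the external exporter and capture anything it prints for error reporting.
exec($cmd . ' 2>&1', $output, $exitCode);
if ($exitCode !== 0) {
    throw new RuntimeException("Export script failed:\n" . implode("\n", $output));
}
// ... then send $csvPath back to the client (or zip it first, as in the answer further below).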
I've also noticed that memory_get_peak_usage() isn't always accurate. I would try an experiment: increase memory_limit to, say, 256M and run it on the larger data set (the 1.3 million). You are likely to find that it reports below the 128M limit as well.
I could reproduce this issue in a similar case of exporting a CSV file, where my system should have had enough memory, as shown by memory_get_usage(), but ended up with the same fatal error:
Fatal error: Allowed memory size.
I circumvented this issue by outputting the CSV contents into a physical temporary file, which I eventually zipped before reading it back out.
I wrote the file in a loop, so that each iteration wrote only a limited chunk of data, and I never exceeded the memory limit.
After zipping, the compression ratio was such that I could handle raw files of over 10 times the size at which I initially hit the wall. All in all, it was a success.
Hint: when creating your archive, don't unlink the archive component(s) before invoking $zip->close(), as this call seems to be the one doing the business. Otherwise you'll end up with an empty archive!
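For the chunked write itself, a rough sketch (the cursor, the hypothetical buildRow() helper, and the batch size are assumptions); the zipping step is shown in the code sample that follows:

// Write the CSV to a physical temp file in batches instead of php://output.
$fp = fopen($full_csv_path, 'w');
fputcsv($fp, $headers);

$count = 0;
foreach ($mongodbCursor as $subscriber) {
    fputcsv($fp, buildRow($subscriber, $fieldsInfo)); // hypothetical row builder
    if (++$count % 1000 === 0) {
        fflush($fp); // push the current batch to disk so it doesn't pile up
    }
}
fclose($fp);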
Code sample for the zipping step:
<?php
$zip = new ZipArchive;
if ($zip->open($full_zip_path, ZipArchive::CREATE) === TRUE) {
$zip->addFile($full_csv_path, $csv_name);
$zip->close();
$Response->setHeader("Content-type", "application/zip; charset=utf-8");
$Response->setHeader("Content-disposition", "attachment; filename=" . $zip_name);
$Response->setBody(file_get_contents($full_zip_path));
}
else {
var_dump(error_get_last());
echo utf8_decode("Couldn't create zip archive '$full_zip_path'."), "\r\n";
}
unset($zip);
?>
Attention: when adding items to the zip archive, don't prepend a leading slash to the item's name if you are using a Windows-based OS.
Discussion of the original issue:
The Zend code at the quoted line is
public function outputBody()
{
    $body = implode('', $this->_body);
    echo $body;
}
from the outputBody() method of the Zend_Controller_Response_Abstract class.
It looks like, however you do it, through echo, or print, or readfile, the output is always captured and stuck into the response body, even if you turn the response return feature off before the dispatch.
I even tried to use the clearBody() class method within the echo loop, with the idea that each $response->sendResponse() followed by $response->clearBody() would release memory, but it failed.
The way Zend handles the sending of the response is such that I always ended up with a memory allocation the full size of the raw CSV file.
Yet to be determined how it would be possible to tell Zend not to "capture" the output buffer.
I'm trying to parse a moderately large XML file (6 MB) in PHP using SimpleXML. The script takes each record from the XML file, checks to see if it's already been imported, and, if it hasn't, updates/inserts that record into my own DB.
The problem is I'm constantly getting a Fatal error about exceeding memory allocation:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 256 bytes) in /.../system/database/drivers/mysql/mysql_result.php on line 162
I avoided that error by using the following line to increase the max memory allocation (following a tip from here):
ini_set('memory_limit', '-1');
However, then I run up against the max execution time of 60 seconds, and, for whatever reason, my server (XAMPP on Mac OS X) won't let me increase that time (the script simply won't run if I try to include a line like the following):
set_time_limit(240);
This all seems very inefficient, however; shouldn't I be able to break the file up somehow and process it sequentially? In the controller below I have a count variable ($cycle) to keep track of which record I'm on, but I can't figure out how to use it so that the script doesn't have to process the whole XML file in one go.
The controller (I'm using CodeIgniter) has this basic structure:
$f = base_url().'data/data.xml';
if($data = file_get_contents($f))
{
$cycle = 0;
$xml = new SimpleXMLElement($data);
foreach($xml->person as $p)
{
//this makes a single call to db for single field based on id of record in XML file
if($this->_notImported('source',$p['id']))
{
//various processing here, mainly breaking up the data for inserting into four different tables
}
$cycle++;
}
}
Any thoughts?
Edited
To shed further light on what I'm doing: I'm grabbing most of the attributes of each element and subelement and inserting them into my DB. For example, using my old code, I have something like this:
$insert = array('indiv_name' => $p['fullname'],
'indiv_first' => ($p['firstname']),
'indiv_last' => ($p['lastname']),
'indiv_middle' => ($p['middlename']),
'indiv_other' => ($p['namemod']),
'indiv_full_name' => $full_name,
'indiv_title' => ($p['title']),
'indiv_dob' => ($p['birthday']),
'indiv_gender' => ($p['gender']),
'indiv_religion' => ($p['religion']),
'indiv_url' => ($url)
);
Given the suggestion to use XMLReader (see below), how can I parse the attributes of both the main element and its subelements?
Use XMLReader.
Say your document is like this:
<test>
<hello>world</hello>
<foo>bar</foo>
</test>
With XMLReader:
$xml = new XMLReader;
$xml->open('doc.xml');
$xml->read();
while ($xml->read()) {
if ($xml->nodeType == XMLReader::ELEMENT) {
print $xml->name.': ';
} else if ($xml->nodeType == XMLReader::TEXT) {
print $xml->value.PHP_EOL;
}
}
This outputs:
hello: world
foo: bar
The nice thing is that you can also use expand to fetch the node as a DOMNode object.
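To answer the edit about attributes: a rough sketch that expands each <person> element and reads its attributes through SimpleXML. The attribute names follow the question, it assumes the <person> elements are siblings in the feed, and only one record is in memory at a time:

$xml = new XMLReader;
$xml->open('data/data.xml');

// Skip ahead to the first <person> element.
while ($xml->read() && $xml->name !== 'person');

$doc = new DOMDocument;
while ($xml->name === 'person') {
    // Expand only the current <person> subtree and wrap it in SimpleXML.
    $p = simplexml_import_dom($doc->importNode($xml->expand(), true));

    $insert = array(
        'indiv_name'  => (string) $p['fullname'],
        'indiv_first' => (string) $p['firstname'],
        'indiv_last'  => (string) $p['lastname'],
        'indiv_dob'   => (string) $p['birthday'],
    );
    // Sub-elements work the same way, e.g. (string) $p->somechild['someattr'] (hypothetical names).
    // ... check _notImported() and insert/update here, then let $insert go out of scope.

    $xml->next('person'); // jump to the next <person> without re-reading its children
}
$xml->close();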
It sounds like the problem is that you are reading the whole XML file into memory before trying to manipulate it. Use XMLReader to walk your way through the file stream instead of loading everything into memory for manipulation.
How about using JSON instead of XML? The data will be much smaller in JSON format, and I would imagine you won't run into the same memory issues because of that.