Read big json file with php

Read big json file with php - php

I've read somewhere that I should use the library salsify/jsonstreamingparser to open a big json file but it's giving me the same error as with json_decode:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /data/www/default/database/vendor/salsify/json-streaming-parser/src/Listener/InMemoryListener.php on line 92
I have to do it in php because I'm using a free hosting which doesn't have python.
Basically what I want to do is download a big json file unzip it and process the content. I don't know why in php I wasn't able to do it in all day but in python I did it in 5 minutes:
import os
import json
import urllib
import zipfile
json_file = 'AllSets-x.json'
zip_file = json_file + '.zip'
urllib.urlretrieve ("https://mtgjson.com/json/" + zip_file, zip_file)
dir_path = os.path.dirname(os.path.realpath(__file__))
zip_ref = zipfile.ZipFile(dir_path + "/" + zip_file, 'r')
zip_ref.extractall(dir_path)
zip_ref.close()
json_data = json.load(open(json_file, 'r'))
print json_data.keys()[0]
This is what I have in php:
<?php
require_once __DIR__ . '/vendor/autoload.php';
include "../credentials.php";
error_reporting(E_ALL); # Reports all errors
ini_set('display_errors','Off'); # Do not display errors for the end-users (security issue)
ini_set('error_log','/tmp/php-errors.log'); # Set a logging file
// Override the default error handler behavior
set_exception_handler(function($exception) {
$logger->error($exception);
echo "Something went wrong!";
});
$logger = new Monolog\Logger('channel-name');
$logger->pushHandler(new Monolog\Handler\StreamHandler('/tmp/php-errors.log', Monolog\Logger::DEBUG));
$logger->info("Parsing json file");
$listener = new \JsonStreamingParser\Listener\InMemoryListener();
$json_file = __DIR__ . "/AllSets-x.json";
$stream = fopen($json_file, 'r');
try {
$parser = new \JsonStreamingParser\Parser($stream, $listener);
$parser->parse();
fclose($json_file);
} catch (Exception $e) {
fclose($json_file);
throw $e;
}
$logger->info("Json file parsed");
$json_data = $listener->getJson();
$logger->info("Displaying json data");
var_dump($json_data);

Using the InMemoryListener certainly defeats the purpose of a streaming parser. That'll just unpack everything into memory (likely worse memory-wise than plain json_decode).
You'll need to catch each JSON object block individually, if you want to work under such constraints.
There's the SimpleObjectQueueListener which could possibly fit the bill. If the specific JSON has a bunch of [{…}, {…}, {…}] objects to be processed:
$listener = new \JsonStreamingParser\Listener\SimpleObjectQueueListener("print_r", 0);
// would just print out each object block from the JSON stream
Obviously you would use a callback like "process_my_json_blobs" instead. (Or a prepared callback like [$pdo, "execute"] perhaps.)
Btw, reading the whole JSON input just works on your local Python, because there's usually no memory_limit as for common PHP setups. (Python at best relies on the systems ulimit.)

Related

How to insert large JSON file to database with PHP

So, I have a large JSON file and I want to insert data from that file to MySQL database. I can only use PHP 5.6 and can't change php.ini file.
When I'm using json_decode(), I get error, that there is to much memory to allocate. So I searched for some kind of library and I found this library and I'm using it like that:
set_time_limit(300);
$listener = new \JsonStreamingParser\Listener\InMemoryListener();
$stream = fopen('data/stops.json', 'r');
try {
$parser = new \JsonStreamingParser\Parser($stream, $listener);
$parser->parse();
fclose($stream);
} catch (Exception $e) {
fclose($stream);
throw $e;
}
var_dump($listener->getJson());
But I still get that annoying error about momory:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
to allocate 72 bytes) in
SOME/PATH/TO/vendor/salsify/json-streaming-parser/src/Parser.php on
line 516
I have no clue how to obtain my JSON file. So I'm looking for some advice, or someone who can help me to write a code that will be responsible for converting JSON file to array, so I colud insert data from that array to database. Also I'm not looking for one time solution, becouse I need to parse that JSON a least one per day.
Here is the whole JSON file: JSON, the structure looks like this:
{
"2017-07-26":
{
"lastUpdate":"2017-07-26 07:07:01",
"stops":[
{
"stopId":32640,
"stopCode":null,
"stopName":null,
"stopShortName":"2640",
"stopDesc":"Amona",
"subName":"2640",
"date":"2017-07-26",
"stopLat":54.49961,
"stopLon":18.44532,
"zoneId":null,
"zoneName":null,
"stopUrl":"",
"locationType":null,
"parentStation":null,
"stopTimezone":"",
"wheelchairBoarding":null,
"virtual":null,
"nonpassenger":null,
"depot":null,
"ticketZoneBorder":null,
"onDemand":null,
"activationDate":"2017-07-25"
},
{...},
{...}
]
}
}

You need to set the option via ini_set : http://php.net/manual/en/function.ini-set.php
ini_set('memory_limit','16M');
taken from https://davidwalsh.name/increase-php-memory-limit-ini_set
alternatively, you can use a .htaccess file :
php_value memory_limit '512M'
credit goes to https://stackoverflow.com/a/42578190/351861

XML fails to load, without any errors message

I have a XML structure in my PHP file.
For example:
$file = file_get_contents($myFile);
$response = '<?xml version="1.0"?>';
$response .= '<responses>';
$response .= '<file>';
$response .= '<name>';
$response .= '</name>';
$response .= '<data>';
$response .= base64_encode($file);
$response .= '</data>';
$response .= '</file>';
$response .= '</responses>';
echo $response;
If i create .doc file or with other extension and put little text in, it works. But, if user load file with complex structure (not only text) - XML just not load, and i have a empty file without errors.
But the same files works on my other server.
I have try use simplexml_load_string for output errors, but i have no errors.
The server with PHP 5.3.3 have the problem; the one with PHP 5.6 hasn’t. It works if I try it with 5.3.3 on my local server.
Is the problem due to the PHP version? If so, how exactly?

There're basically three things that can be improved in your code:
Configure error reporting to actually see error messages.
Generate XML with a proper library, to ensure you cannot send malformed data.
Be conservative in memory usage (you're currently storing the complete file in RAM three times, two of them in a plain text representation that depending of file type can be significantly larger).
Your overall code could like like this:
// Quick code, needs more error checking and refining
$fp = fopen($myFile, 'rb');
if ($fp) {
$writer = new XMLWriter();
$writer->openURI('php://output');
$writer->startDocument('1.0');
$writer->startElement('responses');
$writer->startElement('file');
$writer->startElement('name');
$writer->endElement();
$writer->startElement('data');
while (!feof($fp)) {
// If I recall correctly, substring size must be multiple of 4
// to encode it properly (except for last part)
$writer->text(base64_encode(fread($fp, 10240)));
}
$writer->endElement();
$writer->endElement();
$writer->endElement();
fclose($fp);
}
I've tried this code with a 316 MB file and used 256 KB on my PC.
As a side note, inserting binary files inside XML is pretty troublesome when files are large. It makes extraction problematic because you can't use most of the usual tools due to extensive memory usage.

How do I upload big (video) files in streams to AWS S3 with Laravel 5 and filesystem?

I want to upload a big video file to my AWS S3 bucket. After a good deal of hours, I finally managed to configure my php.ini and nginx.conf files, so they allowed bigger files.
But then I got a "Fatal Error: Allowed Memory Size of XXXXXXXXXX Bytes Exhausted". After some time I found out larger files should be uploaded with streams using fopen(),fwrite(), and fclose().
Since I'm using Laravel 5, the filesystem takes care of much of this. Except that I can't get it to work.
My current ResourceController#store looks like this:
public function store(ResourceRequest $request)
{
/* Prepare data */
$resource = new Resource();
$key = 'resource-'.$resource->id;
$bucket = env('AWS_BUCKET');
$filePath = $request->file('resource')->getRealPath();
/* Open & write stream */
$stream = fopen($filePath, 'w');
Storage::writeStream($key, $stream, ['public']);
/* Store entry in DB */
$resource->title = $request->title;
$resource->save();
/* Success message */
session()->flash('message', $request->title . ' uploadet!');
return redirect()->route('resource-index');
}
But now I get this long error:
CouldNotCreateChecksumException in SignatureV4.php line 148:
A sha256 checksum could not be calculated for the provided upload body, because it was not seekable. To prevent this error you can either 1) include the ContentMD5 or ContentSHA256 parameters with your request, 2) use a seekable stream for the body, or 3) wrap the non-seekable stream in a GuzzleHttp\Stream\CachingStream object. You should be careful though and remember that the CachingStream utilizes PHP temp streams. This means that the stream will be temporarily stored on the local disk.
So I am currently completely lost. I can't figure out if I'm even on the right track. Here are the resource I try to make sense of:
AWS SDK guide for PHP: Stream Wrappers
AWS SDK introduction on stream wrappers
Flysystem original API on stream wrappers
And just to confuse me even more, there seems to be another way to upload large files other than streams: The so called "multipart" upload. I actually thought that was what the streams where all about...
What is the difference?

I had the same problem and came up with this solution.
Instead of using
Storage::put('file.jpg', $contents);
Which of course ran into an "out of memory error" I used this method:
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;
// ...
public function uploadToS3($fromPath, $toPath)
{
$disk = Storage::disk('s3');
$uploader = new MultipartUploader($disk->getDriver()->getAdapter()->getClient(), $fromPath, [
'bucket' => Config::get('filesystems.disks.s3.bucket'),
'key' => $toPath,
]);
try {
$result = $uploader->upload();
echo "Upload complete";
} catch (MultipartUploadException $e) {
echo $e->getMessage();
}
}
Tested with Laravel 5.1
Here are the official AWS PHP SDK docs:
http://docs.aws.amazon.com/aws-sdk-php/v3/guide/service/s3-multipart-upload.html

the streaming part applies to downloads.
for uploads you need to know the content size. for large files multipart uploads are the way to go.

PHPExcel throws Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes)

I am using PHPExcel (found here: https://github.com/PHPOffice/PHPExcel). If i try to read more than approximately 2000 rows then it shows memory error as follows.
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
to allocate 71 bytes) in
/home/sample/PHPExcelReader/Classes/PHPExcel/worksheet.php on line 89
My Excel data range is A1:X2000
Below is my code used to read the excel.
ini_set('memory_limit', '-1');
/** Include path **/
set_include_path(get_include_path() . PATH_SEPARATOR . 'Classes/');
/** PHPExcel_IOFactory */
include $unsecured_param['home_dir'].'APIs/PHPExcelReader/Classes/PHPExcel/IOFactory.php';
$inputFileName = $target; // File to read
//echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory to identify the format<br />';
try {
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
} catch(Exception $e) {
die('Error loading file "'.pathinfo($inputFileName,PATHINFO_BASENAME).'": '.$e->getMessage());
}
$sheetData = $objPHPExcel->getActiveSheet()->rangeToArray('A1:X2000', null, true, true, true)
//store data into array..
$i=0;$j=0;$max_rows=0;$max_columns=0;
foreach($sheetData as $rec)
{
foreach($rec as $part)
{//echo "items[$j][$i]=" ; echo $part;echo "<br>";
$items[$j][$i]=$part; $i=$i+1;
if($j==0) {$max_columns=$i;}
}
$j=$j+1;$i=0;
}
$max_rows=$j;
Could any one please let me know how to overcome this issue ?

Consider using cell caching to reduce the memory required to hold the workbook in memory, as described in section 4.2.1 of the developer documentation
And consider not using toArray() and then using that to build another array in memory.... doing this is really using a lot of memory to hold duplicated data, when you could simply loop through the rows and columns of the worksheet to do what you need

This error means that the PHP file that you are running has exceeded the allowed size in memory for PHP on your server. You can edit your PHP.ini file to allow your PHP files to allocate more space in memory when they are running, which may assist in this, but at the same time, if you are running a 32 bit Linux OS on your server for whatever reason, there is a hard cape of 3.5GB that the process can take up, so even allocating more than that, it will still fail and therefore cause a similar issue.
In cases such as this, it really comes down to the fact that the amount of data that you are trying to pull is too large and you need to scale it back somehow. It isn't necessarily an issue with the code, but rather how much data you are actually attempting to show/process.
Using google, I managed to find that the amount of memory that your noting (134217728 bytes), matches with the 128MB default that PHP.ini uses for memory_limit. Changing the value in the ini natively, will resolve this issue. If unable to do that, then you need to somehow limit the amount of data that you pull in one time.
Information:
http://ca1.php.net/manual/en/ini.core.php#ini.memory-limit

Allowed memory size exhausted error exporting from mongodb

I try to export some documents from mongodb to .csv. For some large lists, the files would be something like 40M, I get errors about memory limit:
Fatal error: Allowed memory size of 134217728 bytes exhausted
(tried to allocate 44992513 bytes) in
/usr/share/php/Zend/Controller/Response/Abstract.php on line 586
I wonder why this error happens. What consumes such an amount of memory? How do I avoid such error without changing memory_limit which is set 128M now.
I use something like this:
public static function exportList($listId, $state = self::SUBSCRIBED)
{
$list = new Model_List();
$fieldsInfo = $list->getDescriptionsOfFields($listId);
$headers = array();
$params['list_id'] = $listId;
$mongodbCursor = self::getCursor($params, $fieldsInfo, $headers);
$mongodbCursor->timeout(0);
$fp = fopen('php://output', 'w');
foreach ($mongodbCursor as $subscriber) {
foreach ($fieldsInfo as $fieldInfo) {
$field = ($fieldInfo['constant']) ? $fieldInfo['field_tag'] : $fieldInfo['field_id'];
if (!isset($subscriber->$field)) {
$row[$field] = '';
} elseif (Model_CustomField::isMultivaluedType($fieldInfo['type'])) {
$row[$field] = array();
foreach ($subscriber->$field as $value) {
$row[$field][] = $value;
}
$row[$field] = implode(self::MULTIVALUED_DELEMITOR, $row[$field]);
} else {
$row[$field] = $subscriber->$field;
}
}
fputcsv($fp, $row);
}
}
Then in my controller I try to call it something like this:
public function exportAction()
{
set_time_limit(300);
$this->_helper->layout->disableLayout();
$this->_helper->viewRenderer->setNoRender();
$fileName = $list->list_name . '.csv';
$this->getResponse()->setHeader('Content-Type', 'text/csv; charset=utf-8')
->setHeader('Content-Disposition', 'attachment; filename="'. $fileName . '"');
Model_Subscriber1::exportList($listId);
echo 'Peak memory usage: ', memory_get_peak_usage()/1024, ' Memory usage: ', memory_get_usage()/1024;
}
So I'm at the end of the file where I export data. It's rather strange that for the list I export with something like 1M documents, it exports successfully and displays:
> Peak memory usage: 50034.921875 Kb Memory usage: 45902.546875 Kb
But when I try to export 1.3M documents, then after several minutes I only get in export file:
Fatal error: Allowed memory size of 134217728 bytes exhausted
(tried to allocate 44992513 bytes) in
/usr/share/php/Zend/Controller/Response/Abstract.php on line 586.
The size of documents I export are approximately the same.
I increased memory_limit to 256M and tried to export 1.3M list, this is what it showed:
Peak memory usage: 60330.4609375Kb Memory usage: 56894.421875 Kb.
It seems very confusing to me. Isn't this data so inaccurate? Otherwise, why it causes memory exhausted error with memory_limit set to 128M?

While the size of the documents may be about the same, the size allocated by PHP to process them isn't directly proportional to the document size or number of documents. This is because different types require different memory allocation in PHP. You may be able to free some memory as you go, but I don't see any place where you can in your code.
The best answer is to probably just increase the memory limit.
One thing you could do is offload the processing to an external script and call that from PHP. Many languages do this sort of processing in a more memory efficient way than PHP.
I've also noticed that the memory_get_peak_usage() isn't always accurate. I would try an experiment to increase the mem_limit to say 256 and run it on the larger data set (the 1.3 million). You are likely to find that it reports below the 128 limit as well.

I could reproduce this issue in a similar case of exporting a CSV file, where my system should have had enough memory, as shown by memory_get_usage(), but ended up with the same fatal error:
Fatal error: Allowed memory size.
I circumvented this issue by outputting the CSV contents into a physical temporary file, that I eventually zipped, before reading it out.
I wrote the file in a loop, so that each iteration wrote only a limited chunk of data, so that I never exceded the memory limit.
After zipping, the compression ratio was such, that I could handle raw files of over 10 times the size I initially hit the wall at. All up, it was a success.
Hint: when creating your archive, don't unlink the archive component(s) before invoking $zip->close(), as this call seems to be the one doing the business. Otherwise you'll end up with an empty archive!
Code sample:
<?php
$zip = new ZipArchive;
if ($zip->open($full_zip_path, ZipArchive::CREATE) === TRUE) {
$zip->addFile($full_csv_path, $csv_name);
$zip->close();
$Response->setHeader("Content-type", "application/zip; charset=utf-8");
$Response->setHeader("Content-disposition", "attachment; filename=" . $zip_name);
$Response->setBody(file_get_contents($full_zip_path));
}
else {
var_dump(error_get_last());
echo utf8_decode("Couldn't create zip archive '$full_zip_path'."), "\r\n";
}
unset($zip);
?>
Attention: when adding items to the zip archive, don't prepend a leading slash to the item's name if using Windows based OS.
Discussion over the original issue:
The Zend file at the line quoted is the
public function outputBody()
{
$body = implode('', $this->_body);
echo $body;
}
from the outputBody() method of the Zend_Controller_Response_Abstract class.
It looks like, however you do it, through echo, or print, or readfile, the output is always captured, and stuck into the response body, even if your turn the response return feature off before the dispatch.
I even tried to use the clearBody() class method, within the echo loop, with in mind that each $response->sendResponse() followed by $response->clearBody() would release memory, but it failed.
The way Zend handles the sending of the response is such that I always got the memory allocation of the full size of the raw CSV file.
Yet to be determined how it would be possible to tell Zend not to "capture" the output buffer.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Read big json file with php - php

Related

How to insert large JSON file to database with PHP

XML fails to load, without any errors message

How do I upload big (video) files in streams to AWS S3 with Laravel 5 and filesystem?

PHPExcel throws Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes)

Allowed memory size exhausted error exporting from mongodb

Categories

Resources