I'm running a long-running PHP script (PHP 7.3) that parses XML files in a loop and inserts the data into MySQL. I've set the memory limit to 512MB with ini_set('memory_limit', '512M') inside the script. The problem is that after parsing about half of the XML files, the OOM killer kills PHP with this error:
[Mon May 6 16:52:09 2019] Out of memory: Kill process 12103 (php) score 704 or sacrifice child
[Mon May 6 16:52:09 2019] Killed process 12103 (php) total-vm:12540924kB, anon-rss:12268740kB, file-rss:3408kB, shmem-rss:0kB
I've tried debugging the code with memory_get_usage and php_meminfo. Both show that the script does not exceed 100MB of memory at the start and end of each loop iteration (the XML files are all the same size). I'm already unsetting all possible variables at the end of each iteration.
It looks like PHP used about 12.5GB of RAM in spite of the 0.5GB memory limit set in the script. I expected PHP to throw a fatal memory-exhausted error when the memory limit is reached, but that never happens.
Any ideas on how I can debug this problem?
I recently ran into this problem, and the trick was to allow PHP as little memory as possible, say 8MB. Otherwise the system ran out of memory and killed the process before PHP hit its own limit. The kill provided no information, so I did not know which part of the script was causing it.
With an 8MB memory limit, however, I got a PHP error with a line number and additional information. It was trying to allocate a rather large chunk of memory (about 200k) even though nothing in my script asked for it.
Following the line number, it became immediately obvious that one of the functions had gone recursive, causing unbounded memory consumption.
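To illustrate the low-limit trick, here is a minimal sketch. The helper names describeFatalError and enableLowMemoryDiagnostics are made up for this example, and the 8M value is just the figure mentioned above. It lowers memory_limit and logs the last fatal error from a shutdown function, so the file and line are recorded even if nothing is printed to the screen.

```php
<?php
// Hypothetical helper: turns the array returned by error_get_last()
// into a one-line report, or null if the last error was not fatal.
function describeFatalError(?array $error): ?string
{
    if ($error === null) {
        return null;
    }
    $fatalTypes = [E_ERROR, E_CORE_ERROR, E_COMPILE_ERROR, E_PARSE];
    if (!in_array($error['type'], $fatalTypes, true)) {
        return null;
    }
    return sprintf('%s in %s on line %d', $error['message'], $error['file'], $error['line']);
}

// Hypothetical helper: lowers the limit so PHP fails before the OOM killer
// steps in, and logs where the failing allocation happened.
function enableLowMemoryDiagnostics(string $limit = '8M'): void
{
    ini_set('memory_limit', $limit);
    register_shutdown_function(static function (): void {
        $report = describeFatalError(error_get_last());
        if ($report !== null) {
            error_log('[memory-debug] ' . $report);
        }
    });
}
```

Calling enableLowMemoryDiagnostics() at the top of the script should be enough for a test run. Note that memory_limit only covers memory allocated through PHP's own allocator, so memory grabbed directly by an extension or library may not count against it.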
I have worked on importing large XML files, and they require more memory than most people expect. It is not only the memory used for the file itself, but also for the related variables and processing. As @NigelRen suggested, the best way to handle large XML is to read the file in parts. Here is a simple example that I hope gives you an idea of how to do this.
$reader = new \XMLReader();
// https://www.php.net/manual/en/book.xmlreader.php
// Path to the file; LIBXML_NOCDATA helps if you have CDATA in your content.
$reader->open($xmlPath, 'ISO-8859-1', LIBXML_NOCDATA);
while ($reader->read()) {
    if ($reader->nodeType === \XMLReader::ELEMENT) {
        try {
            // Note: this runs for every element, including the root and nested ones.
            $xmlNode = new \SimpleXMLElement($reader->readOuterXml());
            // do whatever you want with $xmlNode
        } catch (\Throwable $th) {
            // handle error
        }
    }
}
$reader->close();
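One caveat with the loop above is that readOuterXml() is also called on the root element, which pulls the whole document into memory. A possible variation, sketched below, only materialises elements with a given name and skips over their subtrees with next(). The function name streamElements and the element name 'item' in the usage note are assumptions for illustration, not something from the original question.

```php
<?php
// Hypothetical helper: yields one SimpleXMLElement per matching element,
// so only a single record is held in memory at a time.
function streamElements(string $path, string $name): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Advance to the first element with the requested name.
        while ($reader->read() && $reader->name !== $name) {
            // keep reading
        }
        while ($reader->name === $name) {
            yield new SimpleXMLElement($reader->readOuterXml());
            // Jump to the next element with that name, skipping this subtree.
            $reader->next($name);
        }
    } finally {
        $reader->close();
    }
}
```

Usage would look like foreach (streamElements($xmlPath, 'item') as $item) { ... }, with the database insert done inside the loop body.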
I have some XML files that I need to "transform" in Html and display on screen.
I have developed a simple script that works -almost- all of the times, using DOMDocument and XSLTProcessor.
The problem is that sometimes it gives this error, and the resulting html is only a part of the complete content:
XSLTProcessor::transformToUri(): Memory allocation failed : reaching arbitrary MAX_URI_LENGTH limit in /var/www/test/index.php on line 14
This is a working copy of my script, which gives the same error with the same files.
<?php
$xslPath = 'test.xsl';
$xmlString = file_get_contents('test.xml');
$xml = new DOMDocument;
$xml->loadXML($xmlString);
$xsl = new DOMDocument;
$xsl->load($xslPath);
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
$proc->transformToURI($xml, 'php://output');
I have tried saving the output to a file, but I still get the same error, so php://output shouldn't be the problem. How can I solve this issue?
EDIT:
It looks like the problem lies in the following code. In fact, if I remove the following lines, I no longer see the issue. I hope this helps:
<a name="link" href="data:{$mimeType}/{$format};base64,{normalize-space(Attachment)}" download="{$attachmentName}">
<xsl:value-of select="attachmentName" />
</a>
The attachment itself is a base64-encoded PDF file (in this case a ~1MB string, but it could be even larger).
EDIT 2: This is what happens if I try to generate the html using the command line xsltproc command:
xsltproc --stringparam target cora_cmd test.xsl test.xml > test.html
URI error : Memory allocation failed : reaching arbitrary MAX_URI_LENGTH limit
URI error : Memory allocation failed : escaping URI value
EDIT 3: I have tried replacing transformToURI with transformToXML, with no success. libxml_get_errors() returns nothing either.
You need to increase the amount of memory PHP is allowed to use in a script. Start by opening your php.ini file. If you don't know where it is located, run php --ini in the terminal, look for the row titled Loaded Configuration File, and edit that file (you may need sudo access).
Locate the memory_limit setting and change it to something larger. (I changed mine from 128M to 512M.)
Beware of the potential ramifications of doing this: a higher limit lets a single script use more RAM, so the machine itself may run out of memory.
I am trying to implement a sentiment analysis with PHP-ML. I have a training data set of roughly 15000 entries. I have the code working, however, I have to reduce the data set down to 100 entries for it to work. When I try to run the full data set I get this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 917504 bytes) in C:\Users\<username>\Documents\Github\phpml\vendor\php-ai\php-ml\src\Phpml\FeatureExtraction\TokenCountVectorizer.php on line 95
The two files I have are index.php:
<?php
declare(strict_types=1);
namespace PhpmlExercise;
include 'vendor/autoload.php';
include 'SentimentAnalysis.php';
use PhpmlExercise\Classification\SentimentAnalysis;
use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;
$dataset = new CsvDataset('clean_tweets2.csv', 1, true);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();
$samples = [];
foreach ($dataset->getSamples() as $sample) {
    $samples[] = $sample[0];
}
$vectorizer->fit($samples);
$vectorizer->transform($samples);
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);
$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels = $randomSplit->getTrainLabels();
$testSamples = $randomSplit->getTestSamples();
$testLabels = $randomSplit->getTestLabels();
$classifier = new SentimentAnalysis();
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());
echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);
And SentimentAnalysis.php:
<?php
namespace PhpmlExercise\Classification;
use Phpml\Classification\NaiveBayes;
class SentimentAnalysis
{
    protected $classifier;

    public function __construct()
    {
        $this->classifier = new NaiveBayes();
    }

    public function train($samples, $labels)
    {
        $this->classifier->train($samples, $labels);
    }

    public function predict($samples)
    {
        return $this->classifier->predict($samples);
    }
}
I am pretty new to machine learning and PHP-ML, so I am not really sure how to work out where the issue is, or whether there is even a way to fix this without having a ton of memory. The most I can tell is that the error is happening in TokenCountVectorizer on line 22 of the index file. Does anyone have any idea what may be causing this issue, or has anyone run into this before?
The link to PHP-ML is here: http://php-ml.readthedocs.io/en/latest/
Thank you
This error comes from loading more into memory than what PHP is set up to handle in one process. There are other causes, but these are much less common.
In your case, your PHP instance seems configured to allow a maximum of 128MB of memory to be used. In machine learning, that is not very much and if you use large datasets you will most definitely hit that limit.
To raise the amount of memory PHP is allowed to use to 1GB, you can edit your php.ini file and set
memory_limit = 1024M
If you don't have access to your php.ini file but still have the permissions to change the setting you can do it at runtime using
<?php
ini_set('memory_limit', '1024M');
Alternatively, if you run Apache you can try to set the memory limit using a .htaccess file directive
php_value memory_limit 1024M
Do note that most shared hosting solutions have a hard, and often low, limit on the amount of memory you are allowed to use.
Other things you can do to help are:
If you load data from files, look at fgets and SplFileObject::fgets to read files line by line instead of reading the complete file into memory at once.
Make sure you are running as recent a version of PHP as possible.
Make sure your PHP extensions are up to date.
Disable PHP extensions you don't use.
unset data or large objects that you are done with and no longer need in memory. Note that PHP's garbage collector will not necessarily free the memory right away. Instead, by design, it does so when it judges that the required CPU cycles are available or when the script is about to run out of memory, whichever occurs first.
You can use something like echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL; to print memory usage at a given place in your program to try and profile how much memory different parts use.
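A minimal sketch of the line-by-line reading and memory reporting suggestions above. The helper names readLines and formatMemory are made up for this example and are not part of PHP or PHP-ML.

```php
<?php
// Hypothetical helper: yields a file one line at a time using fgets,
// so only the current line is held in memory.
function readLines(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical helper: formats a byte count as kilobytes for logging.
function formatMemory(int $bytes): string
{
    return sprintf('%.1f KB', $bytes / 1024);
}
```

Inside a loop such as foreach (readLines($path) as $line) { ... } you could print formatMemory(memory_get_usage()) every few thousand lines to see whether usage stays flat or keeps growing.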
What is the best strategy to debug a "Fatal error: Allowed memory size of 268435456 bytes exhausted" error? The error I'm getting is strange, and something is obviously wrong. The function that is causing it is
/**
 * Flush all output buffers for PHP 5.2.
 *
 * Make sure all output buffers are flushed before our singletons are destroyed.
 *
 * @since 2.2.0
 */
function wp_ob_end_flush_all() {
    $levels = ob_get_level();
    for ($i=0; $i<$levels; $i++)
        ob_end_flush();
}
I simply rebased some code I was working on and started getting this.
What's your strategy to debug this?
Try the code below. If your script's memory usage reaches the specified number of bytes, it just echoes that and stops instead of crashing:
function wp_ob_end_flush_all() {
    $levels = ob_get_level();
    for ($i=0; $i<$levels; $i++){
        ob_end_flush();
        if(memory_get_peak_usage() > 268435400) { // just below the 268435456 limit
            echo memory_get_peak_usage(). ' reached! now we should stop the script.' ;
            break; // or die();
        }
    }
}
Update
To answer your question: one way to debug a leak is to use Xdebug. Another way is to use the function I gave in the example, or to wrap your suspicious functions in memory_get_usage calls and compare the difference.
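A small sketch of the "wrap and compare" idea. The name measureMemory is hypothetical. It records memory_get_usage() before and after calling a function and reports the difference.

```php
<?php
// Hypothetical helper: runs $fn with the given arguments and reports how
// much the script's memory usage changed across the call.
function measureMemory(callable $fn, ...$args): array
{
    $before = memory_get_usage();
    $result = $fn(...$args);
    $after = memory_get_usage();

    return [
        'result' => $result,
        'delta'  => $after - $before,        // bytes still held after the call
        'peak'   => memory_get_peak_usage(), // highest usage seen so far
    ];
}
```

For example, $m = measureMemory('some_suspicious_function', $arg); followed by error_log($m['delta']) gives a rough per-call figure. The delta only reflects memory still allocated when the call returns, so short-lived spikes show up in 'peak' rather than 'delta'.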
I was also getting this error when starting the Apache server with EasyPHP-Devserver-16.1.
In my case, it was because EasyPHP was trying to load an error.log file that was too large.
Deleting the old server log in
C:\Program Files (x86)\EasyPHP-Devserver-16.1\eds-binaries\httpserver\apache2418x160331124251\logs
and creating an empty one solved my problem.
Hope this can help others.
Usually, when you reach the memory limit, the solution is not to bypass it, enlarge the allowed size, or turn it into a controlled error. What you have to do is find out what is causing the overflow.
You are using 256MB and consuming all of it in a loop that only deals with output buffers, so something is wrong.
The first thing to do is check how many iterations you are trying to run:
function wp_ob_end_flush_all() {
    $levels = ob_get_level();
    // Cast to string: die() with an integer uses it as the exit status and prints nothing.
    die((string) $levels);
}
In my case, it was because Easyphp was trying to load a too large error.log file.
Deleting the server log file in eds-binaries\httpserver\apache2418x160331124251\logs helped me solve the problem.
The problem
How can I write data to the start of a file if I do not have enough RAM to load it and not enough space to make a copy of it on the current FS partition? For example, I have a file of 100MB, a 30MB memory limit in my PHP script (which cannot be adjusted in any way), and only 50MB free on the current FS partition. I want to add 2-10 rows to the file (definitely less than the remaining 50MB of FS space).
Some background
I know about the XY problem and agree that it applies to this case. But to reconsider it I would need to change a significant part of the current application (actually, it was inherited from a previous team) and possibly the API of other applications that use this file.
My attempt
I have not found a solution for this yet. My previous approach was to use some network buffer (i.e. to connect to some external storage, such as MySQL, located on another machine where there is enough space to write a copy of the file).
The question
So, is it possible to write data to the start of a file when I do not have enough RAM to load it and not enough space to create a copy of it on the FS? Is using network (external) storage the only solution?
Say you want to write 2K to the beginning of a file. Your only real option is to:
open the file
read as much from the end of the file as you can fit into memory
write it back into the file 2K later than you started to read
continue with the previous block of data until you have shifted the entire content of the file 2K towards the end
write your 2K to the beginning
To visualize that:
|------------------------|
|-----------------XXXXXXX|
------>
|-------------------XXXXXXX|
|----------XXXXXXX---------|
------>
|------------XXXXXXX-------|
...repeat...
Note that this is a very unsafe operation which edits the file in place. If the process crashes, you're left with a file in an inconsistent state. If you don't have enough room on disk to duplicate a file, you arguably shouldn't be working with that file; expand your storage capacity first.
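The steps above could be sketched roughly like this. The function name prependInPlace and the default chunk size of 1MB are assumptions for illustration. The chunk size bounds how much is held in memory at once, so it should be chosen well below the script's memory limit.

```php
<?php
// Hypothetical helper: shifts the file's content towards the end in chunks,
// starting from the tail, then writes $data into the gap at the beginning.
// Unsafe if interrupted: the file is modified in place.
function prependInPlace(string $path, string $data, int $chunkSize = 1048576): void
{
    $shift = strlen($data);
    if ($shift === 0) {
        return;
    }
    $handle = fopen($path, 'r+b');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $end = fstat($handle)['size'];
        // Move chunks back to front so unread data is never overwritten.
        while ($end > 0) {
            $start = max(0, $end - $chunkSize);
            fseek($handle, $start);
            $chunk = fread($handle, $end - $start);
            fseek($handle, $start + $shift);
            fwrite($handle, $chunk);
            $end = $start;
        }
        // The first $shift bytes are now free to be overwritten.
        fseek($handle, 0);
        fwrite($handle, $data);
        fflush($handle);
    } finally {
        fclose($handle);
    }
}
```

The sketch assumes a regular local file, where fread returns the full requested length. It needs only strlen($data) extra bytes on disk, since the file simply grows by that amount.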
@deceze gave me a great idea. I ended up with:
function reverseFile($sIn, $sOut, $bRemoveSource=false)
{
    $rFile = @fopen($sIn, 'a+');
    $rTemp = @fopen($sOut,'a+');
    if(!$rFile || !$rTemp)
    {
        return false;
    }
    $iPos = filesize($sIn)-1;
    // Move the file byte by byte, from its end, into the temp file,
    // truncating the source as we go so total disk usage stays constant.
    while($iPos>=0)
    {
        fseek($rFile, $iPos, SEEK_SET);
        fwrite($rTemp, fread($rFile, 1));
        ftruncate($rFile, $iPos>0?$iPos:0);
        clearstatcache();
        $iPos--;
    }
    fclose($rFile);
    fclose($rTemp);
    if($bRemoveSource)
    {
        unlink($sIn);
    }
    return true;
}
function writeReverse($sFile, $sData, $sTemp=null)
{
    if(!isset($sTemp))
    {
        $sTemp=$sFile.'.rev';
    }
    if(reverseFile($sFile, $sTemp, true))
    {
        // The temp file holds the content reversed, so appending the
        // reversed data puts it at the start once reversed back.
        file_put_contents($sTemp, strrev($sData), FILE_APPEND);
        return reverseFile($sTemp, $sFile, true);
    }
    return false;
}
It will be quite slow, but it is recoverable if the process is interrupted (simply look at the .rev file).
Thanks to all who participated in this.
I've tried the code suggested by @AlmaDo. Don't use it on real projects: it is VERY slow (a 60MB file took 19 minutes to process).
You can run a shell script instead - https://stackoverflow.com/a/9533736/2064576 (it ran in 420ms; I could not tell how much memory it uses)
Or try this PHP script - https://stackoverflow.com/a/16813550/2064576 (160ms; it worked with memory_limit=3M but not with 2M)
I'm trying to parse a 50 megabyte .csv file. The file itself is fine, but I'm trying to get past the massive timeout issues involved. Everything is set up upload-wise: I can easily upload and re-open the file, but after the browser times out, I receive a 500 Internal Server Error.
My guess is that I can save the file on the server, open it, and keep a session value recording which line I last dealt with. After a certain number of lines, I reset the connection via a refresh and reopen the file at the line where I left off. Is this a doable idea? The previous developer made a very inefficient MySQL class that controls the entire site, so I don't want to write my own class if I don't have to, and I don't want to mess with his class.
TL;DR version: Is it efficient to save the last line I'm on in a CSV file that has 38K lines of products and then, after X number of rows, reset the connection and start from where I left off? Or is there another way to parse a large CSV file without timeouts?
NOTE: It's the PHP script execution time. Currently, at 38K lines, it takes about 46 minutes and 5 seconds to run via the command line. It works correctly 100% of the time when I take the browser out of the picture, suggesting that it is a browser timeout. Chrome's timeout is not editable as far as Google has told me, and Firefox's timeout setting rarely works.
You could do something like this:
<?php
namespace database;

class importcsv
{
    private $crud;

    public function __construct($dbh, $table)
    {
        $this->crud = new \database\crud($dbh, $table);
    }

    public function import($columnNames, $csv, $separator)
    {
        $x = 0; // number of imported lines
        $lines = explode("\n", $csv);
        foreach ($lines as $line)
        {
            // Restart the execution timer for every line.
            \set_time_limit(30);
            $line = explode($separator, $line);
            $data = new \stdClass();
            foreach ($line as $i => $item)
            {
                if (isset($columnNames[$i]) && !empty($columnNames[$i]))
                {
                    // Braces are needed: PHP 7+ reads $data->$columnNames[$i]
                    // as ($data->$columnNames)[$i].
                    $data->{$columnNames[$i]} = $item;
                }
            }
            $x++;
            $this->crud->create($data);
        }
        return $x;
    }

    public function importFile($columnNames, $csvPath, $separator)
    {
        if (file_exists($csvPath))
        {
            // Note: this loads the whole file into memory at once.
            $content = file_get_contents($csvPath);
            return $this->import($columnNames, $content, $separator);
        }
        else
        {
            // Error
        }
    }
}
TL;DR: calling \set_time_limit(30); every time you loop through a line might fix your timeout issues.
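If you prefer the resume-where-you-left-off idea from the question, here is a rough sketch that avoids loading the whole file. The function name processCsvBatch, the callback, and the batch size are assumptions for illustration. It stores a byte offset rather than a line number, so each request can seek straight to the right place.

```php
<?php
// Hypothetical helper: processes up to $maxRows rows starting at byte
// $offset and returns the offset to resume from, or null when finished.
function processCsvBatch(string $path, int $offset, int $maxRows, callable $handleRow): ?int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $size = fstat($handle)['size'];
        fseek($handle, $offset);
        $rows = 0;
        while ($rows < $maxRows && ($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // blank line
            }
            set_time_limit(30); // restart the timer for every row
            $handleRow($row);
            $rows++;
        }
        $position = ftell($handle);
        return $position >= $size ? null : $position;
    } finally {
        fclose($handle);
    }
}
```

The returned offset could be kept in $_SESSION between requests, for example $_SESSION['csv_offset'] = processCsvBatch($path, $_SESSION['csv_offset'] ?? 0, 500, $insertRow);, refreshing the page until null comes back. Here $insertRow stands for whatever code writes one row to the database.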
I suggest running PHP from the command line and setting it up as a cron job. This way you don't have to modify your code. There will be no timeout issue, and you can easily parse large CSV files.
also check this link
Your post is a little unclear due to the typos and grammar, could you please edit?
If you are saying that the upload itself is okay but the delay is in the processing of the file, then the easiest thing to do is to parse the file in parallel using multiple threads. You can use Java's built-in Executor class, or Quartz or Jetlang, to do this.
Find the size of the file or number of lines.
Select a Thread load (Say 1000 lines per thread)
Create an Executor
Read the file in a loop.
For each 1000 lines, create a Runnable and load it into the Executor
Start the Executor
Wait till all threads are finished
Each runnable does this:
Fetch a connection
Insert the 1000 lines
Log the results
Close the connection