I am trying to implement sentiment analysis with PHP-ML. I have a training data set of roughly 15,000 entries. I have the code working; however, I have to reduce the data set down to 100 entries for it to work. When I try to run the full data set, I get this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 917504 bytes) in C:\Users\<username>\Documents\Github\phpml\vendor\php-ai\php-ml\src\Phpml\FeatureExtraction\TokenCountVectorizer.php on line 95
The two files I have are index.php:
<?php
declare(strict_types=1);
namespace PhpmlExercise;
include 'vendor/autoload.php';
include 'SentimentAnalysis.php';
use PhpmlExercise\Classification\SentimentAnalysis;
use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;
$dataset = new CsvDataset('clean_tweets2.csv', 1, true);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();
$samples = [];
foreach ($dataset->getSamples() as $sample) {
$samples[] = $sample[0];
}
$vectorizer->fit($samples);
$vectorizer->transform($samples);
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);
$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels = $randomSplit->getTrainLabels();
$testSamples = $randomSplit->getTestSamples();
$testLabels = $randomSplit->getTestLabels();
$classifier = new SentimentAnalysis();
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());
echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);
And SentimentAnalysis.php:
<?php
namespace PhpmlExercise\Classification;
use Phpml\Classification\NaiveBayes;
class SentimentAnalysis
{
protected $classifier;
public function __construct()
{
$this->classifier = new NaiveBayes();
}
public function train($samples, $labels)
{
$this->classifier->train($samples, $labels);
}
public function predict($samples)
{
return $this->classifier->predict($samples);
}
}
I am pretty new to machine learning and php-ml, so I am not really sure how to deduce where the issue is, or if there is even a way to fix this without having a ton of memory. The most I can tell is that the error is happening in TokenCountVectorizer on line 22 of the index file. Does anyone have any idea what may be causing this issue or have run into this before?
The link to PHP-ML is here: http://php-ml.readthedocs.io/en/latest/
Thank you
This error comes from loading more into memory than what PHP is set up to handle in one process. There are other causes, but these are much less common.
In your case, your PHP instance seems configured to allow a maximum of 128MB of memory to be used. In machine learning, that is not very much and if you use large datasets you will most definitely hit that limit.
To raise the amount of memory PHP is allowed to use to, say, 1GB, you can edit your php.ini file and set
memory_limit = 1024M
If you don't have access to your php.ini file but still have permission to change the setting, you can do it at runtime using
<?php
ini_set('memory_limit', '1024M');
Alternatively, if you run Apache, you can try to set the memory limit using a .htaccess file directive
php_value memory_limit 1024M
Do note that most shared hosting solutions etc have a hard, and often low, limit on the amount of memory you are allowed to use.
Other things you can do to help:
If you load data from files, look at fgets and SplFileObject::fgets to read files line by line instead of reading the complete file into memory at once (see the sketch after this list).
Make sure you are running as up-to-date a version of PHP as possible
Make sure PHP extensions are up to date
Disable PHP extensions you don't use
unset data or large objects that you are done with and don't need in memory anymore. Note that PHP's garbage collector will not necessarily free the memory right away. Instead, by design, it will do that when the required CPU cycles are available or before the script is about to run out of memory, whichever occurs first.
You can use something like echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL; to print memory usage at a given place in your program to try and profile how much memory different parts use.
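For example, here is a minimal sketch (reusing the asker's clean_tweets2.csv, but not a drop-in replacement for PHP-ML's CsvDataset) that reads the CSV line by line with SplFileObject and prints memory checkpoints along the way:
<?php
// Read the CSV one row at a time instead of loading it all at once.
$file = new SplFileObject('clean_tweets2.csv');
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);

$samples = [];
foreach ($file as $i => $row) {
    if ($i === 0) {
        continue; // skip the header row
    }
    if (isset($row[0])) {
        $samples[] = $row[0]; // first column holds the tweet text
    }
    if ($i % 5000 === 0) {
        // memory checkpoint, as suggested above
        echo memory_get_usage() / 1024.0 . " kb after $i rows" . PHP_EOL;
    }
}
Note that the vectorization step itself will still hold the full token matrix in memory, so this only trims the loading phase.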
Related
I'm executing a long-running PHP script (PHP 7.3) which parses XMLs in a loop and adds data to MySQL. I've set the memory limit to 512MB with ini_set('memory_limit', '512M') inside the script. The problem is that after parsing about half of the XMLs, the OOM killer kills PHP with this error:
[Mon May 6 16:52:09 2019] Out of memory: Kill process 12103 (php) score 704 or sacrifice child
[Mon May 6 16:52:09 2019] Killed process 12103 (php) total-vm:12540924kB, anon-rss:12268740kB, file-rss:3408kB, shmem-rss:0kB
I've tried debugging the code with memory_get_usage and php_meminfo. They both show that the script does not exceed 100MB of memory at the start and end of each loop (the XMLs are all the same size). I'm already unsetting all possible vars at the end of each loop.
It looks like PHP used 12.5GB of RAM in spite of the 0.5GB memory limit in the script. I'm expecting PHP to throw a fatal memory-exhausted error if the memory limit is reached, but it never happens.
Any ideas how can I debug this problem?
I have recently met this problem, and the trick was to allow PHP as little memory as possible, say 8MB. Otherwise, the system triggered an out-of-memory error before PHP did; it did not provide any info, so I did not know which part of the script was causing it.
But with an 8MB memory limit I got a PHP error with a line number and additional info. It was trying to allocate a rather big chunk of memory (about 200k) even though nothing in my script was demanding it.
Following the line number, it became immediately obvious that one of the functions had gone recursive, causing runaway memory consumption.
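In code, the trick is just a one-liner at the top of the suspect script (8M is only an example value; pick something below the point where the OS steps in):
<?php
// Clamp the limit low so PHP itself reports the exhaustion, complete
// with a file and line number, before the OS OOM killer gets involved.
ini_set('memory_limit', '8M');

// ... run the suspect code; the fatal error points at the allocation site.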
I have worked with importing large XML files, and they require more memory than most people expect. It is not only the memory used by the file itself, but also the related variables and processes. As @NigelRen suggested, the best way to handle large XML is to read the file in parts. Here is a simple example that I hope gives you an idea of how to do this.
$reader = new \XMLReader();
// https://www.php.net/manual/en/book.xmlreader.php
// Path to the file; the LIBXML_NOCDATA flag will help if you have CDATA
// in your content.
$reader->open($xmlPath, 'ISO-8859-1', LIBXML_NOCDATA);
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT) {
        try {
            $xmlNode = new \SimpleXMLElement($reader->readOuterXml());
            // do whatever you want
        } catch (\Throwable $th) {
            // handle error
        }
    }
}
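One caveat with readOuterXml(): it still materializes the entire subtree of the current element, so process at a level where individual elements are small. A minimal sketch (assuming hypothetical <item> elements) that jumps between siblings without descending into them:
// Advance to the first <item>, then hop sibling to sibling.
while ($reader->read() && $reader->localName !== 'item');
while ($reader->localName === 'item') {
    $node = new \SimpleXMLElement($reader->readOuterXml());
    // ... process $node ...
    $reader->next('item'); // skips the subtree we just handled
}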
What is the best strategy to debug a "Fatal error: Allowed memory size of 268435456 bytes exhausted" error? The error I'm getting is strange and something is obviously wrong. The function causing it is:
/**
 * Flush all output buffers for PHP 5.2.
 *
 * Make sure all output buffers are flushed before our singletons are destroyed.
 *
 * @since 2.2.0
 */
function wp_ob_end_flush_all() {
    $levels = ob_get_level();
    for ($i = 0; $i < $levels; $i++)
        ob_end_flush();
}
I simply rebased some code I was working on and started getting this.
What's your strategy to debug this?
Try the code below; if your script reaches the specified number of bytes, it just echoes a message and stops instead of crashing:
function wp_ob_end_flush_all() {
    $levels = ob_get_level();
    for ($i = 0; $i < $levels; $i++) {
        ob_end_flush();
        if (memory_get_peak_usage() > 268435400) { // 268435456
            echo memory_get_peak_usage() . ' reached! now we should stop the script.';
            break; // or die();
        }
    }
}
Update
To answer your question, one way to debug a leak is to use Xdebug; another way is to use the function I gave in the example, or to wrap your suspicious functions with memory_get_usage and compare the difference (see the sketch below).
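For instance, a minimal sketch of that wrap-and-compare idea, with a hypothetical suspicious_function() standing in for the code you want to measure:
$before = memory_get_usage();
suspicious_function();
echo 'Delta: ' . (memory_get_usage() - $before) . ' bytes' . PHP_EOL;
echo 'Peak:  ' . memory_get_peak_usage() . ' bytes' . PHP_EOL;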
I was also getting this error upon starting the Apache server with EasyPHP-Devserver-16.1.
In my case, it was because EasyPHP was trying to load a too-large error.log file.
Deleting the old server log in
C:\Program Files (x86)\EasyPHP-Devserver-16.1\eds-binaries\httpserver\apache2418x160331124251\logs
and creating an empty one solved my problem.
Hope this can help others.
Usually, when you reach the memory limit, the solution is not to bypass it, enlarge the allowed size, or emit a controlled error. What you have to do is find what is causing that overflow.
You are using 256MB and consuming all of it in a loop concerning just output buffers, so something is wrong.
The first thing to do is check how many iterations you are attempting:
function wp_ob_end_flush_all() {
    $levels = ob_get_level();
    die((string) $levels); // die() needs a string argument to print the count
}
The problem
How do I write data to the start of a file if I don't have enough space to allocate it in RAM, and not enough space to make a copy of it on the current FS partition? I.e., I have a file of 100MB, a 30MB memory limit in my PHP script (which cannot be adjusted in any way), and only 50MB free on my current FS partition. I want to add 2-10 rows to the file (definitely less than the remaining 50MB of FS space).
Some background
I know about the XY problem and agree that it applies here. But reconsidering this case would require changing a significant part of the current application (which was inherited from a previous team) and maybe the API of other applications that use this file.
My attempt
I have not found a solution for this yet. My previous approach was to use some network buffer (i.e., to connect to some external storage, such as MySQL, located on another machine with enough space to write a copy of the file).
The question
So, is it possible to write data to the start of a file when there is not enough space to allocate it in RAM and not enough space to create a copy of the file on the FS? Is using network (external) storage the only solution?
Say you want to write 2K to the beginning of a file; your only real option is to:
open the file
read as much from the end of the file as you can fit into memory
write it back into the file 2K later than you started to read
continue with the previous block of data until you have shifted the entire content of the file 2K towards the end
write your 2K to the beginning
To visualize that:
|------------------------|
|-----------------XXXXXXX|
------>
|-------------------XXXXXXX|
|----------XXXXXXX---------|
------>
|------------XXXXXXX-------|
...repeat...
Note that this is a very unsafe operation which edits the file in place. If the process crashes, you're left with a file in an inconsistent state. If you don't have enough room on disk to duplicate a file you arguably shouldn't work with that file and expand your storage capacity first.
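For concreteness, here is a minimal sketch of those steps as a hypothetical prependInPlace() helper. It inherits all the crash-safety caveats just mentioned: if it dies halfway, the file is corrupt.
function prependInPlace(string $path, string $prefix, int $chunkSize = 8192): void
{
    $shift = strlen($prefix);
    $size  = filesize($path);
    $fp    = fopen($path, 'r+b');

    // Walk backwards through the file in chunks, moving each chunk
    // $shift bytes toward the end. Starting from the tail means we
    // never overwrite data we have not yet read.
    for ($pos = $size; $pos > 0; $pos -= $chunkSize) {
        $readAt = max(0, $pos - $chunkSize);
        $len    = $pos - $readAt;

        fseek($fp, $readAt, SEEK_SET);
        $chunk = fread($fp, $len);

        fseek($fp, $readAt + $shift, SEEK_SET);
        fwrite($fp, $chunk);
    }

    // Fill the hole at the beginning with the new data.
    fseek($fp, 0, SEEK_SET);
    fwrite($fp, $prefix);
    fclose($fp);
}
Memory stays bounded by $chunkSize, and the extra disk usage is only the $shift bytes the file grows by.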
@deceze hinted at a great idea. So I've finished with:
function reverseFile($sIn, $sOut, $bRemoveSource = false)
{
    $rFile = @fopen($sIn, 'a+');
    $rTemp = @fopen($sOut, 'a+');
    if (!$rFile || !$rTemp) {
        return false;
    }
    $iPos = filesize($sIn) - 1;
    while ($iPos >= 0) {
        fseek($rFile, $iPos, SEEK_SET);
        fwrite($rTemp, $tmp = fread($rFile, 1));
        ftruncate($rFile, $iPos > 0 ? $iPos : 0);
        clearstatcache();
        $iPos--;
    }
    fclose($rFile);
    fclose($rTemp);
    if ($bRemoveSource) {
        unlink($sIn);
    }
    return true;
}

function writeReverse($sFile, $sData, $sTemp = null)
{
    if (!isset($sTemp)) {
        $sTemp = $sFile . '.rev';
    }
    if (reverseFile($sFile, $sTemp, 1)) {
        file_put_contents($sTemp, strrev($sData), FILE_APPEND);
        return reverseFile($sTemp, $sFile, 1);
    }
    return false;
}
It will be quite slow, but recoverable if the process is interrupted (simply look at the .rev file).
Thanks to all who participated in this.
I've tried the code suggested by @AlmaDo. Don't try it on real projects or you will burn in hell, it is VERY slow (60MB file: 19 minutes of processing).
You can run a shell script - https://stackoverflow.com/a/9533736/2064576 (processed in 420ms; I could not tell how much memory it uses)
Or try this PHP script - https://stackoverflow.com/a/16813550/2064576 (160ms, worked with memory_limit=3M, did not work with 2M)
I've seen many questions about how to efficiently use PHP to download files rather than allowing direct HTTP requests (to keep files secure, to track downloads, etc.).
The answer is almost always PHP readfile().
Downloading large files reliably in PHP
How to force download of big files without using too much memory?
Best way to transparently log downloads?
BUT, although it works great during testing with huge files, when it's on a live site with hundreds of users, downloads start to hang and PHP memory limits are exhausted.
So what is it about how readfile() works that causes memory to blow up so badly when traffic is high? I thought it was supposed to bypass heavy use of PHP memory by writing directly to the output buffer?
EDIT: (To clarify, I'm looking for a "why", not "what can I do". I think that Apache's mod_xsendfile is the best way to circumvent)
Description
int readfile ( string $filename [, bool $use_include_path = false [, resource $context ]] )
Reads a file and writes it to the output buffer.
PHP has to read the file and it writes to the output buffer.
So, for a 300MB file, no matter which implementation you write (many small segments, or one big chunk), PHP has to read through the full 300MB of the file eventually.
If multiple users have to download the file, there will be a problem.
(On one server, hosting providers will limit the memory given to each hosting user. With such limited memory, using buffers is not going to be a good idea.)
I think using a direct link to download the file is a much better approach for big files.
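If serving the file through PHP is unavoidable, one common pattern (a sketch with a hypothetical $file path; not the only fix) is to stream it in fixed-size chunks and flush as you go, so PHP never buffers the whole file:
$file = '/path/to/large/file.zip';

if (ob_get_level()) {
    ob_end_clean(); // drop any active output buffer first
}
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($file));

$fp = fopen($file, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192); // 8K per iteration
    flush();               // push it out to the client
}
fclose($fp);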
If you have output buffering on, then use ob_end_flush() right before the call to readfile():
header(...);
ob_end_flush();
@readfile($file);
As mentioned here: "Allowed memory .. exhausted" when using readfile, the following block of code at the top of the PHP file did the trick for me.
It checks whether PHP output buffering is active and, if so, turns it off.
if (ob_get_level()) {
ob_end_clean();
}
You might want to turn off output buffering altogether for that particular location, using PHP's output_buffering configuration directive.
Apache example:
<Directory "/your/downloadable/files">
...
php_admin_value output_buffering "0"
...
</Directory>
"Off" as the value seems to work as well, while it really should throw an error. At least according to how other types are converted to booleans in PHP. *shrugs*
Came up with this idea in the past (as part of my library) to avoid high memory usage:
function suTunnelStream( $sUrl, $sMimeType, $sCharType = null )
{
    $f = @fopen( $sUrl, 'rb' );
    if( $f === false )
    { return false; }

    $b = false;
    $u = true;

    while( $u !== false && !feof( $f ))
    {
        $u = @fread( $f, 1024 );
        if( $u !== false )
        {
            if( !$b )
            {
                $b = true;
                suClearOutputBuffers();
                suCachedHeader( 0, $sMimeType, $sCharType, null, !suIsValidString($sCharType)?('content-disposition: attachment; filename="'.suUniqueId($sUrl).'"'):null );
            }
            echo $u;
        }
    }

    @fclose( $f );
    return ( $b && $u !== false );
}
Maybe this can give you some inspiration.
Well, readfile() is a memory-intensive function. I would pipe users to a static server that has a specific rule set in place to control downloads, instead of using readfile().
If that's not an option, add more RAM to satisfy the load, or introduce a queuing system that gracefully controls server usage.
Can I release the memory allocated for an included file? Here is my code:
a.php
<?php
echo memory_get_usage();
include_once "b.php";
echo memory_get_usage();
$n = new obj();
echo memory_get_usage();
?>
b.php
<?php
class obj {
    protected $_obj = array(
        ....
    );
    function ....
}
?>
I checked that after I include b.php, memory use increases, by even more than creating a new object does. The result is below:
348832
496824
497072
So, how can I release the included file's memory?
I think PHP cannot "de-include" (I mean, free the memory space held by an included file), since the contents of the file may be used later. This is a design choice of PHP's creators.
After your PHP script finishes, it will free the consumed memory; do not worry about it too much unless it really creates too much overhead and you have a high-volume traffic load.
If there is a (let's say huge) object coming from the included file that you want to deallocate right now, use unset($obj). It will help some. You should read more about PHP's garbage collection policy to do fine tuning.
PHP compiles the code from all your included/required files to opcode for faster execution; this memory cannot be deallocated, and PHP frees it when the script finishes.
If you allocate some memory or objects within your second required file, they will take memory too, but you can unset those variables (this is not your case, though, since you are just declaring a class within your b.php).
Also, PHP must know that you don't want to include file b.php again (include_ONCE), so it keeps an internal record of the files you have included so as not to include them again (which means this also consumes memory).
As ahmet alp balkan said, you can also try to keep your script's memory usage as low as possible by deallocating variables you no longer need via unset().
But for performance reasons, PHP doesn't deallocate this memory at the moment you call unset(); rather, it marks the unset variable as "freed".
Then the garbage collector comes and frees all freed variables (plus those it thinks you won't need anymore). GC is triggered over time.
Try for example this:
<?php
echo memory_get_usage();
include_once "b.php";
echo memory_get_usage();
$n = new obj();
echo memory_get_usage();
unset($n);
echo memory_get_usage();
// try to wait for GC
sleep(5);
echo memory_get_usage();
?>
If there is a real danger of running out of memory and you only need to extract specific information from the file, you can use $x = file_get_contents() inside a function or method, then extract the information with preg_match().
This will cost you speed, but $x should be released when the function or method returns. It has the further advantage that the memory taken by the file will not be used at all if the function or method is never called. For example:
/* You need the value of $modx->lang_attribute and there is something like this
in the file: $modx_lang_attribute = 'en'; */
$x = file_get_contents('path/to/file');
$pattern = "/modx_lang_attribute\s*=\s*'(\w\w)'/";
preg_match($pattern, $x, $matches);
return isset($matches[1]) ? $matches[1] : 'en';
In some cases, you can save even more memory by processing the file line by line (see the sketch below).
The downside of this approach is that the file will not be tokenized, so it will take up more memory while in use, but at least you won't be carrying it around for the rest of the program.
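A minimal sketch of that line-by-line variant (a hypothetical findLangAttribute() helper, reusing the $modx_lang_attribute example above):
function findLangAttribute($path)
{
    $pattern = "/modx_lang_attribute\s*=\s*'(\w\w)'/";
    $fh = fopen($path, 'r');
    while (($line = fgets($fh)) !== false) {
        if (preg_match($pattern, $line, $matches)) {
            fclose($fh);
            return $matches[1]; // only one line is ever held in memory
        }
    }
    fclose($fh);
    return 'en'; // same default as the example above
}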