PHP release memory of include file

Can I release the memory used by an included file? Here is my code:
a.php
<?php
echo memory_get_usage();
include_once "b.php";
echo memory_get_usage();
$n = new obj();
echo memory_get_usage();
?>
b.php
<?php
class obj {
    protected $_obj = array(
        ....
    );
    function ....
}
?>
I checked and found that including b.php increases memory usage far more than creating a new object does. The results are as follows:
348832
496824
497072
So, how can I release the included file's memory?

I don't think PHP can "de-include" a file (that is, free the memory held by an included file), since the contents of the file may be used later. This is a design choice by PHP's creators.
PHP frees all consumed memory when your script finishes, so don't worry about it too much unless the overhead is really significant and you have a high-traffic load.
If the included file creates a (let's say huge) object that you want to deallocate right away, use unset($obj). It will help somewhat. Read up on PHP's garbage collection if you need to fine-tune further.
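A minimal sketch of the unset() advice, assuming a large string stands in for the "huge object" (the variable names and the 5 MB size are made up for illustration). It prints memory usage before the allocation, while the value is alive, and after unset(); the exact numbers depend on the PHP version and configuration.

```php
<?php
// Illustrative only: a big string plays the role of a large value
// created by an included file.
$size = 5 * 1024 * 1024; // roughly 5 MB

$before = memory_get_usage();
$big = str_repeat('x', $size);
$during = memory_get_usage();
unset($big); // nothing else references the string, so it is released here
$after = memory_get_usage();

printf("before: %d\nwith string: %d\nafter unset: %d\n", $before, $during, $after);
```

Note that this only covers data. The compiled class and function definitions from the included file stay in memory until the script ends.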

PHP compiles the code from all your included/required files to opcodes for faster execution. Your script cannot deallocate this memory; PHP frees it when the script finishes.
If you allocate memory or create objects within the included file, they take up memory too, but you can unset those variables (this is not your case, though, since b.php only declares a class).
Also, PHP must remember not to include b.php again (include_ONCE), so it keeps an internal record of the files you have already included, and that record consumes memory as well.
As ahmet alp balkan said, you can also keep your script's memory usage as low as possible by deallocating variables you no longer need via unset().
unset() removes the variable and decrements the reference count of its value. Once nothing else references the value, PHP releases its memory immediately (back to PHP's own allocator, not necessarily to the operating system).
The garbage collector is only needed for values that reference each other in a cycle. It does not run on a timer: it is triggered when its internal buffer of candidate values fills up, or when you call gc_collect_cycles() yourself.
Try for example this:
<?php
echo memory_get_usage(), PHP_EOL;
include_once "b.php";
echo memory_get_usage(), PHP_EOL;
$n = new obj();
echo memory_get_usage(), PHP_EOL;
unset($n);
echo memory_get_usage(), PHP_EOL;
// force a collection of reference cycles instead of waiting for one
gc_collect_cycles();
echo memory_get_usage(), PHP_EOL;
?>

If there is a real danger of running out of memory and you only need to extract specific information from the file, you can use $x = file_get_contents() inside a function or method, then extract the information with preg_match().
This will cost you speed, but $x should be released when the function or method returns. It has the further advantage that the memory taken by the file will not be used at all if the function or method is never called. For example:
/* You need the value of $modx->lang_attribute and there is something like this
   in the file: $modx_lang_attribute = 'en'; */
function get_lang_attribute() // wrapper name is illustrative
{
    $x = file_get_contents('path/to/file');
    $pattern = "/modx_lang_attribute\s*=\s*'(\w\w)'/";
    preg_match($pattern, $x, $matches);
    // $x goes out of scope, and is released, when the function returns
    return isset($matches[1]) ? $matches[1] : 'en';
}
In some cases, you can save more memory by processing the file line by line.
The downside of reading the file this way is that it is not tokenized, so it takes up more memory while in use, but at least you won't be carrying it around for the rest of the program.
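The line-by-line variant could look roughly like the sketch below. The function name, its parameters, and the temporary demo file are assumptions made for illustration; only one line of the file is held in memory at a time, and reading stops at the first match.

```php
<?php
// Hypothetical helper: return the first capture group of the first line
// that matches $pattern, or $default if no line matches.
function find_in_file_by_line(string $path, string $pattern, string $default): string
{
    if (!is_readable($path)) {
        return $default;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return $default;
    }
    $result = $default;
    // fgets() returns one line per call and false at end of file
    while (($line = fgets($handle)) !== false) {
        if (preg_match($pattern, $line, $matches) === 1 && isset($matches[1])) {
            $result = $matches[1];
            break; // stop reading as soon as the value is found
        }
    }
    fclose($handle);
    return $result;
}

// Demo with a throwaway file standing in for the real config file.
$demo = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($demo, "<?php\n\$other = 1;\n\$modx_lang_attribute = 'de';\n");
$lang = find_in_file_by_line($demo, "/modx_lang_attribute\s*=\s*'(\w\w)'/", 'en');
echo $lang, PHP_EOL;
unlink($demo);
```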

Related

Running out of memory on PHP-ML

I am trying to implement a sentiment analysis with PHP-ML. I have a training data set of roughly 15000 entries. I have the code working, however, I have to reduce the data set down to 100 entries for it to work. When I try to run the full data set I get this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 917504 bytes) in C:\Users\<username>\Documents\Github\phpml\vendor\php-ai\php-ml\src\Phpml\FeatureExtraction\TokenCountVectorizer.php on line 95
The two files I have are index.php:
<?php
declare(strict_types=1);
namespace PhpmlExercise;
include 'vendor/autoload.php';
include 'SentimentAnalysis.php';
use PhpmlExercise\Classification\SentimentAnalysis;
use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;
$dataset = new CsvDataset('clean_tweets2.csv', 1, true);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();
$samples = [];
foreach ($dataset->getSamples() as $sample) {
$samples[] = $sample[0];
}
$vectorizer->fit($samples);
$vectorizer->transform($samples);
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);
$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels = $randomSplit->getTrainLabels();
$testSamples = $randomSplit->getTestSamples();
$testLabels = $randomSplit->getTestLabels();
$classifier = new SentimentAnalysis();
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());
echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);
And SentimentAnalysis.php:
<?php
namespace PhpmlExercise\Classification;
use Phpml\Classification\NaiveBayes;
class SentimentAnalysis
{
protected $classifier;
public function __construct()
{
$this->classifier = new NaiveBayes();
}
public function train($samples, $labels)
{
$this->classifier->train($samples, $labels);
}
public function predict($samples)
{
return $this->classifier->predict($samples);
}
}
I am pretty new to machine learning and php-ml, so I am not really sure how to deduce where the issue is, or whether there is even a way to fix this without having a ton of memory. The most I can tell is that the error is happening in TokenCountVectorizer, on line 22 of the index file. Does anyone have any idea what may be causing this issue, or has anyone run into this before?
The link to PHP-ML is here: http://php-ml.readthedocs.io/en/latest/
Thank you

This error comes from loading more into memory than what PHP is set up to handle in one process. There are other causes, but these are much less common.
In your case, your PHP instance seems configured to allow a maximum of 128MB of memory to be used. In machine learning, that is not very much and if you use large datasets you will most definitely hit that limit.
To alter the amount of memory you allow PHP to use to 1GB you can edit your php.ini file and set
memory_limit = 1024M
If you don't have access to your php.ini file but still have the permissions to change the setting you can do it at runtime using
<?php
ini_set('memory_limit', '1024M');
Alternatively, if you run Apache you can try to set the memory limit using a .htaccess file directive
php_value memory_limit 1024M
Do note that most shared hosting solutions etc have a hard, and often low, limit on the amount of memory you are allowed to use.
Other things you can do to help:
If you load data from files, look at fgets and SplFileObject::fgets to read files line by line instead of reading the complete file into memory at once.
Make sure you are running as recent a version of PHP as possible.
Make sure your PHP extensions are up to date.
Disable PHP extensions you don't use.
unset data or large objects that you are done with and no longer need in memory. Note that PHP's garbage collector will not necessarily free the memory right away: values caught in reference cycles are only reclaimed when the collector runs, which happens when its internal buffer fills up or when you call gc_collect_cycles().
You can use something like echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL; to print memory usage at a given place in your program to try and profile how much memory different parts use.
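To make that kind of profiling a little easier to read, the one-liner can be wrapped in small helpers. This is only a sketch: format_bytes() and memory_checkpoint() are hypothetical names, not part of PHP or PHP-ML, and the range() call stands in for loading a real dataset.

```php
<?php
// Hypothetical helper: turn a byte count into a human-readable string.
function format_bytes(int $bytes): string
{
    $units = ['B', 'KB', 'MB', 'GB'];
    $value = (float) $bytes;
    $i = 0;
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }
    return sprintf('%.1f %s', $value, $units[$i]);
}

// Hypothetical helper: print current and peak usage with a label.
function memory_checkpoint(string $label): void
{
    printf(
        "%-16s current: %s, peak: %s\n",
        $label,
        format_bytes(memory_get_usage()),
        format_bytes(memory_get_peak_usage())
    );
}

memory_checkpoint('start');
$data = range(1, 100000); // stand-in for a loaded dataset
memory_checkpoint('after load');
unset($data);
memory_checkpoint('after unset');
```

With this, the 134217728 bytes from the error message would be shown as 128.0 MB, which matches the default memory_limit of 128M.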

creating only new files in PHP without cpu intensive code

In my cache system, when a new page is requested, I want to check whether a file exists. If it doesn't, a copy is stored on the server; if it does, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-length:" . strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better set of functions meant for performance and yet still achieve the same functionality. All caching files including sitemodification.file reside on a ramdisk. I added a flush before exit in hopes that content will be outputted faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that can execute the code I provided faster by at least a few milliseconds, especially the loading files code?
I'm trying to keep my time to first byte low.

First, prefer is_file to file_exists and use file_put_contents:
if ( !is_file($filename) ) {
file_put_contents($filename,$c);
}
Then, use the proper function for this kind of work, readfile:
if (($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-length:' . filesize($file));
    readfile($file);
}
You should see a small improvement, but keep in mind that file accesses are slow and that you hit the filesystem three times before sending any content.
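A different technique from the one suggested above, offered only as a sketch: the "check, then write" pair can be folded into a single call by opening the file in fopen()'s 'x' mode, which creates the file and fails if it already exists. That saves one filesystem check and avoids two requests overwriting each other. The function name write_if_absent() and the demo path are assumptions.

```php
<?php
// Hypothetical helper: create $filename with $contents only if it does
// not exist yet. Returns true if the file was created by this call.
function write_if_absent(string $filename, string $contents): bool
{
    // 'x' makes fopen() fail when the file already exists;
    // @ suppresses the E_WARNING PHP raises in that case.
    $h = @fopen($filename, 'xb');
    if ($h === false) {
        return false;
    }
    fwrite($h, $contents);
    fclose($h);
    return true;
}

// Demo with a throwaway path in the temp directory.
$path = sys_get_temp_dir() . '/cache_demo_' . uniqid('', true) . '.html';
$first = write_if_absent($path, 'first copy');
$second = write_if_absent($path, 'second copy');
var_dump($first, $second);
$stored = file_get_contents($path);
unlink($path);
```

The second call leaves the first copy untouched, which is the "must not be overwritten" behaviour asked for in the question.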

Write data to start of file when not enough space

The problem
How can I write data to the start of a file if I don't have enough RAM to hold the file and not enough space to make a copy of it on the current FS partition? For example, I have a 100 MB file, a 30 MB memory limit in my PHP script (which cannot be adjusted in any way), and only 50 MB free on the current FS partition. I want to add 2-10 rows to the file (definitely less than the remaining 50 MB of FS space).
Some background
I know about the XY problem and agree that it applies to this case. But to rework this, I would need to change a significant part of the current application (which was inherited from a previous team) and possibly the API of other applications that use this file.
My attempt
I have not found a solution yet. My previous approach was to use a network buffer, i.e. to connect to external storage (MySQL, for example) on another machine where there is enough space to write a copy of the file.
The question
So, is it possible to write data to the start of a file when I don't have enough RAM to hold it and not enough space to create a copy of it on the FS? Is network (external) storage the only solution?

Say you want to write 2K to the beginning of a file. Your only real option is to:
open the file
read as much from the end of the file as you can fit into memory
write it back into the file 2K later than you started to read
continue with the previous block of data until you have shifted the entire content of the file 2K towards the end
write your 2K to the beginning
To visualize that:
|------------------------|
|-----------------XXXXXXX|
------>
|-------------------XXXXXXX|
|----------XXXXXXX---------|
------>
|------------XXXXXXX-------|
...repeat...
Note that this is a very unsafe operation which edits the file in place. If the process crashes, you're left with a file in an inconsistent state. If you don't have enough room on disk to duplicate a file, you arguably shouldn't work with that file and should expand your storage capacity first.
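The block-shifting steps above can be sketched in PHP as follows. This is an illustration under assumptions, not a hardened implementation: the function name prepend_in_place() and the chunk size are made up, there is no locking, and (as noted above) an interruption halfway through leaves the file corrupted.

```php
<?php
// Hypothetical helper: prepend $data to $path by shifting the existing
// content towards the end in chunks, starting from the end of the file.
// Memory use is bounded by roughly $chunkSize plus strlen($data).
function prepend_in_place(string $path, string $data, int $chunkSize = 1048576): bool
{
    $shift = strlen($data);
    if ($shift === 0) {
        return true;
    }
    $h = fopen($path, 'r+b');
    if ($h === false) {
        return false;
    }
    $pos = fstat($h)['size']; // start at the end of the file
    while ($pos > 0) {
        $len = min($chunkSize, $pos);
        $pos -= $len;
        fseek($h, $pos);
        $block = fread($h, $len);
        if ($block === false || strlen($block) !== $len) {
            fclose($h);
            return false;
        }
        // write the block $shift bytes later than where it was read
        fseek($h, $pos + $shift);
        fwrite($h, $block);
    }
    // the first $shift bytes are now free for the new data
    fseek($h, 0);
    fwrite($h, $data);
    fclose($h);
    return true;
}

// Demo on a throwaway file.
$path = tempnam(sys_get_temp_dir(), 'pre');
$original = str_repeat("existing row\n", 1000);
file_put_contents($path, $original);
$ok = prepend_in_place($path, "new first row\n", 4096);
$result = file_get_contents($path);
unlink($path);
```

Working from the end of the file backwards is what makes this safe against overwriting data that has not been moved yet: each block lands in a region whose old content was already copied further along.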

@deceze's hint gave me a great idea. I ended up with:
function reverseFile($sIn, $sOut, $bRemoveSource=false)
{
$rFile = @fopen($sIn, 'a+');
$rTemp = @fopen($sOut,'a+');
if(!$rFile || !$rTemp)
{
return false;
}
$iPos = filesize($sIn)-1;
while($iPos>=0)
{
fseek($rFile, $iPos, SEEK_SET);
fwrite($rTemp, $tmp=fread($rFile, 1));
ftruncate($rFile, $iPos>0?$iPos:0);
clearstatcache();
$iPos--;
}
fclose($rFile);
fclose($rTemp);
if($bRemoveSource)
{
unlink($sIn);
}
return true;
}
function writeReverse($sFile, $sData, $sTemp=null)
{
if(!isset($sTemp))
{
$sTemp=$sFile.'.rev';
}
if(reverseFile($sFile, $sTemp, 1))
{
file_put_contents($sTemp, strrev($sData), FILE_APPEND);
return reverseFile($sTemp, $sFile, 1);
}
return false;
}
It will be quite slow, but it is recoverable if the process is interrupted (simply look at the .rev file).
Thanks to all who participated in this.

I tried the code suggested by @AlmaDo. Don't use it on real projects: it is VERY slow (a 60 MB file took 19 minutes to process).
You can run a shell script instead - https://stackoverflow.com/a/9533736/2064576 (took 420 ms; I could not tell how much memory it uses)
Or try this PHP script - https://stackoverflow.com/a/16813550/2064576 (160 ms; worked with memory_limit=3M, did not work with 2M)

Caching includes in PHP for iterated reuse

Is there a way to cache a PHP include effectively for reuse, without APC, et al?
Simple (albeit stupid) example:
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
while($i++ < 1000){
echo include($file);
}
Again, while ridiculous, this pair of scripts dumps 1000 random numbers. However, for every iteration, PHP has to hit the filesystem (Correct? There is no inherent caching functionality I've missed, is there?)
Basically, how can I prevent the previous scenario from resulting in 1000 hits to the filesystem?
The only consideration I've come to so far is a goofy one, and it may not prove effective at all (haven't tested, wrote it here, error prone, but you get the idea):
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
$cache = array();
while($i++ < 1000){
if(isset($cache[$file])){
echo eval('?>' . $cache[$file] . '<?php;');
}else{
$cache[$file] = file_get_contents($file);
echo include($file);
}
}
A more realistic and less silly example:
When including files for view generation, given a view file is used a number of times in a given request (a widget or something) is there a realistic way to capture and re-evaluate the view script without a filesystem hit?

This would only make any sense if the include file was accessed across a network.
"There is no inherent caching functionality I've missed, is there?"
All operating systems are very highly optimized to reduce the amount of physical I/O and to speed up file operations. On a properly configured system, in most cases the system will rarely go to disk to fetch PHP code. Sit down with a spreadsheet and think about how long it would take to process PHP code if every file had to be fetched from disk. It would be ridiculous: suppose your script is in /var/www/htdocs/index.php and includes /usr/local/php/resource.inc.php. That's 8 seek operations just to locate the files; at 8 ms each, that's 64 ms to find the files! Run some timings on your test case and you'll see that it's running much, much faster than that.
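One rough way to run such timings, as a sketch: the script below writes a throwaway include file to the temp directory (an assumption made so the example is self-contained), includes it 1000 times, and reports the elapsed time. The numbers vary by machine and configuration, so none are quoted here.

```php
<?php
// Assumption: a throwaway include file in the system temp directory
// stands in for rand.php from the question.
$file = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($file, '<?php return rand(0, 999);');

$iterations = 1000;
$sum = 0;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    // include evaluates to the value returned by the included file
    $sum += (include $file);
}
$elapsed = microtime(true) - $start;
unlink($file);

printf(
    "%d includes took %.2f ms (%.4f ms each)\n",
    $iterations,
    $elapsed * 1000,
    $elapsed * 1000 / $iterations
);
```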

As in Sabeen Malik's answer, you could capture the output of each include with output buffering, concatenate the results, save them to a file, and then include that one file each time.
This combined include could be kept for an hour: check the file's modification time, and regenerate it from the original includes only once an hour.

I think a better design would be something like this:
// rand.php
function get_rand() {
return rand(0, 999);
}
// index.php
$file = 'rand.php';
include($file);
while($i++ < 1000){
echo get_rand();
}

Another option:
while($i++ < 1000) echo rand(0, 999);

Is the same file tokenized every time I include it?

This question is about the PHP parsing engine.
When I include a file multiple times in a single runtime, does PHP tokenize it every time or does it keep a cache and just run the compiled code on subsequent inclusions?
EDIT: More details: I am not using an external caching mechanism and I am dealing with the same file being included multiple times during the same request.
EDIT 2: The file I'm trying to include contains procedural code. I want it to be executed every time I include() it, I am just curious if PHP internally keeps track of the tokenized version of the file for speed reasons.

You should use a PHP bytecode cache such as APC. That will accomplish what you want, to re-use a compiled version of a PHP page on subsequent requests. Otherwise, PHP reads the file, tokenizes and compiles it on every request.

By default, the file is parsed every time it is (really) included, even within the same PHP instance.
But there are opcode caches, e.g. APC.
<?php
$i = 'include_test.php';
file_put_contents($i, '<?php $x = 1;');
include $i;
echo $x, ' ';
file_put_contents($i, '<?php $x = 2;');
include $i;
echo $x, ' ';
prints
1 2
(OK, weak proof. PHP could check whether the file's mtime has changed, and that is what APC does, I think. But without a cache PHP really doesn't.)

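Look at include_once().
It will not include the file again if it has already been included.
Also, if you are using classes, look at __autoload() (replaced by spl_autoload_register() in current PHP versions).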

I just wrote a basic test, much like VolkerK's. Here's what I tested:
<?php
file_put_contents('include.php','<?php echo $i . "<br />"; ?>');
for($i = 0; $i<10; $i++){
include('include.php');
if($i == 5){
file_put_contents('include.php','<?php echo $i+$i; echo "<br />"; ?>');
}
}
?>
This generated the following:
0
1
2
3
4
5
12
14
16
18
So, unless it caches based on mtime of the file, it seems it parses every include. You would likely want to use include_once() instead of standard include(). Hope that helps!
