PHP different behavior when called between browser and curl

I have a strange behavior and I hope that someone here can help me.
I make the same call once with a browser and once with curl.
Example
browser: https://www.myurl.com/index.php?foo=bar
curl: curl https://www.myurl.com/index.php?foo=bar
If I call the URL with a browser (no matter which one), everything works as expected. But when I call the URL with curl, memory usage blows up.
Code snippet:
private function doSomething()
{
    $this->doSomeLog('Start process');
    foreach ($foo as $bar) {
        $this->doLogMemory('Memory usage: ' . memory_get_usage());
        $this->callMethodA();
        $this->callMethodB();
        // some more code
    }
}
// Output browser (memory stays at the same level):
>Start process
>Memory usage: 25384948
>Memory usage: 25386731
>Memory usage: 25396326
>Memory usage: 25396326
// Output curl (memory grows steadily):
>Start process
>Memory usage: 25384948
>Memory usage: 162495865
>Memory usage: 236915437
>Memory usage: 426158496
Does anyone have any idea why this might be and how to avoid/fix it?

I found the mistake: it was an entry in php.ini.
zend.assertions was set to 1. After I changed the value to -1, the memory no longer increased.
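If you need to verify which value is actually in effect for a given request, a minimal check (assuming you can drop a debug line into the script) is:
<?php
// Print the effective setting and the SAPI that served the request.
// Different SAPIs (mod_php, php-fpm, cli) can load different ini files,
// so the same code may run under different configurations.
echo ini_get('zend.assertions'), ' via ', php_sapi_name(), PHP_EOL;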

Related

Running out of memory on PHP-ML

I am trying to implement a sentiment analysis with PHP-ML. I have a training data set of roughly 15000 entries. I have the code working; however, I have to reduce the data set to 100 entries for it to work. When I try to run the full data set, I get this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 917504 bytes) in C:\Users\<username>\Documents\Github\phpml\vendor\php-ai\php-ml\src\Phpml\FeatureExtraction\TokenCountVectorizer.php on line 95
The two files I have are index.php:
<?php
declare(strict_types=1);
namespace PhpmlExercise;
include 'vendor/autoload.php';
include 'SentimentAnalysis.php';
use PhpmlExercise\Classification\SentimentAnalysis;
use Phpml\Dataset\CsvDataset;
use Phpml\Dataset\ArrayDataset;
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WordTokenizer;
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Metric\Accuracy;
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;
$dataset = new CsvDataset('clean_tweets2.csv', 1, true);
$vectorizer = new TokenCountVectorizer(new WordTokenizer());
$tfIdfTransformer = new TfIdfTransformer();
$samples = [];
foreach ($dataset->getSamples() as $sample) {
    $samples[] = $sample[0];
}
$vectorizer->fit($samples);
$vectorizer->transform($samples);
$tfIdfTransformer->fit($samples);
$tfIdfTransformer->transform($samples);
$dataset = new ArrayDataset($samples, $dataset->getTargets());
$randomSplit = new StratifiedRandomSplit($dataset, 0.1);
$trainingSamples = $randomSplit->getTrainSamples();
$trainingLabels = $randomSplit->getTrainLabels();
$testSamples = $randomSplit->getTestSamples();
$testLabels = $randomSplit->getTestLabels();
$classifier = new SentimentAnalysis();
$classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
$predictedLabels = $classifier->predict($randomSplit->getTestSamples());
echo 'Accuracy: '.Accuracy::score($randomSplit->getTestLabels(), $predictedLabels);
And SentimentAnalysis.php:
<?php
namespace PhpmlExercise\Classification;
use Phpml\Classification\NaiveBayes;
class SentimentAnalysis
{
    protected $classifier;

    public function __construct()
    {
        $this->classifier = new NaiveBayes();
    }

    public function train($samples, $labels)
    {
        $this->classifier->train($samples, $labels);
    }

    public function predict($samples)
    {
        return $this->classifier->predict($samples);
    }
}
I am pretty new to machine learning and php-ml, so I am not really sure how to deduce where the issue is, or if there is even a way to fix this without a ton of memory. The most I can tell is that the error is happening in TokenCountVectorizer, called on line 22 of the index file. Does anyone have any idea what may be causing this issue, or have you run into this before?
The link to PHP-ML is here: http://php-ml.readthedocs.io/en/latest/
Thank you
This error comes from loading more into memory than what PHP is set up to handle in one process. There are other causes, but these are much less common.
In your case, your PHP instance seems configured to allow a maximum of 128 MB of memory to be used. In machine learning that is not very much, and if you use large datasets you will most definitely hit that limit.
To raise the amount of memory PHP is allowed to use to 1 GB, you can edit your php.ini file and set
memory_limit = 1024M
If you don't have access to your php.ini file but still have permission to change the setting, you can do it at runtime using
<?php
ini_set('memory_limit', '1024M');
Alternatively, if you run Apache you can try to set the memory limit using a .htaccess file directive
php_value memory_limit 1024M
Do note that most shared hosting solutions etc have a hard, and often low, limit on the amount of memory you are allowed to use.
Other things you can do to help are
If you load data from files, look at fgets and SplFileObject::fgets to read files line-by-line instead of reading the complete file into memory at once (see the sketch after this list).
Make sure you are running as up-to-date a version of PHP as possible
Make sure PHP extensions are up to date
Disable PHP extensions you don't use
unset data or large objects that you are done with and don't need in memory anymore. Note that PHP's garbage collector will not necessarily free the memory right away. Instead, by design, it does so when it decides the required CPU cycles are available, or before the script is about to run out of memory, whichever occurs first.
You can use something like echo memory_get_usage() / 1024.0 . ' kb' . PHP_EOL; to print memory usage at a given place in your program to try and profile how much memory different parts use.
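As a minimal sketch of the line-by-line approach mentioned in the list (the CSV name is the one from the question; adjust as needed):
<?php
// Stream the file one row at a time instead of loading it whole.
$file = new SplFileObject('clean_tweets2.csv');
$file->setFlags(SplFileObject::READ_CSV);
foreach ($file as $row) {
    if ($row === [null]) {
        continue; // skip blank lines
    }
    // Process $row here; only one row is held in memory at a time.
}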

PHP exec ImageMagick always returns 0

I'm trying to get ImageMagick to count the amount of pages in a PDF file for me. The function is as follows:
<?php
function countPdfPages($filepath)
{
$magick = "identify -format %n ".$filepath;
exec($magick, $debug, $result);
return $result;
}
?>
However, that function always returns 0. I have verified that ImageMagick is running properly, so that shouldn't be a problem. Am I not using exec() properly? Should I retrieve the output in another way? I've also tried using $debug, but that didn't give me any output, oddly.
I bet I'm doing something stupid here, but I just don't see it. Can anyone give me a push in the right direction? Thanks!
As noted in the manual, exec() provides the return status of the executed command via the third argument; a value of 0 means that it exited normally. What you want is the command's output, so it sounds like you should be using something like popen.
Here's an example lifted from Example #3 of the fread man page (edited to use popen):
<?php
// For PHP 5 and up
$handle = popen("identify -format %n myfile.jpg", "r");
$contents = stream_get_contents($handle);
pclose($handle);
// $contents is the output of the 'identify' process
?>
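For completeness, a hedged variant that stays with exec(): the command's stdout lands in exec()'s second argument (the third is only the exit status), and plain identify prints one line per page, so counting those lines gives the page count. The function name is illustrative:
<?php
// Sketch, not from the answer above: count identify's output lines.
function countPdfPagesViaExec($filepath)
{
    exec('identify ' . escapeshellarg($filepath), $output, $status);
    return ($status === 0) ? count($output) : 0;
}
?>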

Why does readfile() exhaust PHP memory?

I've seen many questions about how to efficiently use PHP to download files rather than allowing direct HTTP requests (to keep files secure, to track downloads, etc.).
The answer is almost always PHP readfile().
Downloading large files reliably in PHP
How to force download of big files without using too much memory?
Best way to transparently log downloads?
BUT, although it works great during testing with huge files, when it's on a live site with hundreds of users, downloads start to hang and PHP memory limits are exhausted.
So what is it about how readfile() works that causes memory to blow up so badly when traffic is high? I thought it was supposed to bypass heavy use of PHP memory by writing directly to the output buffer?
EDIT: (To clarify, I'm looking for a "why", not a "what can I do". I think Apache's mod_xsendfile is the best way to circumvent it.)
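For reference, a minimal sketch of the mod_xsendfile approach mentioned above (assumes Apache with mod_xsendfile enabled and XSendFile On configured; the paths are illustrative):
<?php
// Apache streams the file itself after these headers, so PHP never
// buffers the file contents in memory.
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="bigfile.zip"');
header('X-Sendfile: /var/files/bigfile.zip');
exit;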
Description
int readfile ( string $filename [, bool $use_include_path = false [, resource $context ]] )
Reads a file and writes it to the output buffer.
PHP has to read the file, and it writes to the output buffer.
So, for a 300 MB file, no matter what implementation you write (many small segments or one big chunk), PHP eventually has to read through all 300 MB of the file.
If multiple users have to download the file at the same time, there will be a problem.
(On a single server, hosting providers will limit the memory given to each hosting user. With such limited memory, using the buffer is not going to be a good idea.)
I think using the direct link to download a file is a much better approach for big files.
If you have output buffering on, then use ob_end_flush() right before the call to readfile():
header(...);
ob_end_flush();
@readfile($file);
As mentioned in "Allowed memory .. exhausted" when using readfile, the following block of code at the top of the PHP file did the trick for me.
It checks whether PHP output buffering is active and, if so, turns it off:
if (ob_get_level()) {
ob_end_clean();
}
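Note that ob_get_level() returns the nesting depth, so if buffers can be nested, a small variation on the snippet above clears them all:
// Pop every active output buffer, not just the top one.
while (ob_get_level()) {
    ob_end_clean();
}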
You might want to turn off output buffering altogether for that particular location, using PHP's output_buffering configuration directive.
Apache example:
<Directory "/your/downloadable/files">
...
php_admin_value output_buffering "0"
...
</Directory>
"Off" as the value seems to work as well, while it really should throw an error. At least according to how other types are converted to booleans in PHP. *shrugs*
Came up with this idea in the past (as part of my library) to avoid high memory usage:
function suTunnelStream( $sUrl, $sMimeType, $sCharType = null )
{
    // suClearOutputBuffers(), suCachedHeader(), suIsValidString() and
    // suUniqueId() are helpers from the author's library.
    $f = @fopen( $sUrl, 'rb' );
    if( $f === false ) {
        return false;
    }

    $b = false; // headers sent yet?
    $u = true;  // last fread() result

    while( $u !== false && !feof( $f ) )
    {
        $u = @fread( $f, 1024 ); // stream in 1 KB chunks
        if( $u !== false )
        {
            if( !$b )
            {
                $b = true;
                suClearOutputBuffers();
                suCachedHeader( 0, $sMimeType, $sCharType, null,
                    !suIsValidString($sCharType) ? ('content-disposition: attachment; filename="'.suUniqueId($sUrl).'"') : null );
            }
            echo $u; // write the chunk straight to the client
        }
    }

    @fclose( $f );
    return ( $b && $u !== false );
}
Maybe this can give you some inspiration.
Well, it is a memory-intensive function. Instead of using readfile(), I would pipe users to a static server that has a specific rule set in place to control downloads.
If that's not an option, add more RAM to satisfy the load, or introduce a queuing system that gracefully controls server usage.

PHP release memory of include file

Can I release the memory allocated by an included file? Here is my code:
a.php
<?php
echo memory_get_usage();
include_once "b.php";
echo memory_get_usage();
$n = new obj();
echo memory_get_usage();
?>
b.php
<?php
class obj {
    protected $_obj = array(
        ....
    );
    function ....
}
?>
I checked that after I include b.php, the memory usage increases by much more than creating the new object does. The result is as below:
348832
496824
497072
So, how can I release the included file's memory?
I think PHP cannot "de-include" (I mean, free the memory held by an included file), since the contents of the file may be used later. This is a design choice of PHP's creators.
After your PHP script finishes, it frees the consumed memory, so do not worry about it too much unless it creates real overhead and you have a high-volume traffic load.
If there is a (let's say huge) object coming from the included file that you want to deallocate right now, use unset($obj). It will help some. You should read more about PHP's garbage collection policy to do fine tuning.
PHP compiles the code from all your included/required files to opcode for faster execution, this memory cannot be de-allocated, php frees it when script finishes.
If you allocate some memory / object within your second required file, it will take the memory too, but you can unset those variables (but this is not your case, since you are just declaring a class within your b.php).
Also, php must know, that you don't want to include file b.php again (include_ONCE), so it keeps internal record of files which you have included to not try to include them again (that means this also consumes memory).
As ahmet alp balkan said, you can also try to keep your script's memory usage as low as possible by deallocating variables you no longer need via unset().
But for performance reasons, PHP doesn't deallocate this memory at the moment you call unset(); rather, it marks the unset variable as "freed".
Then the garbage collector comes and frees all freed variables (plus those it thinks you won't need anymore). GC is triggered over time.
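If you don't want to wait for the GC to trigger on its own, you can force a collection of cyclic garbage explicitly (a minimal sketch):
<?php
$n = new obj();
unset($n);
// Force a cycle collection; returns the number of cycles collected.
$collected = gc_collect_cycles();
echo memory_get_usage();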
Try for example this:
<?php
echo memory_get_usage();
include_once "b.php";
echo memory_get_usage();
$n = new obj();
echo memory_get_usage();
unset($n);
echo memory_get_usage();
// try to wait for GC
sleep(5);
echo memory_get_usage();
?>
If there is a real danger of running out of memory and you only need to extract specific information from the file, you can use $x = file_get_contents() inside a function or method, then extract the information with preg_match().
This will cost you speed, but $x should be released when the function or method returns. It has the further advantage that the memory taken by the file will not be used at all if the function or method is never called. For example:
/* You need the value of $modx->lang_attribute and there is something like this
   in the file: $modx_lang_attribute = 'en'; */
function getLangAttribute() // hypothetical wrapper so $x is freed on return
{
    $x = file_get_contents('path/to/file');
    $pattern = "/modx_lang_attribute\s*=\s*'(\w\w)'/";
    preg_match($pattern, $x, $matches);
    return isset($matches[1]) ? $matches[1] : 'en';
}
In some cases, you can save more memory by processing the file line by line.
The downside of this is that the file will not be tokenized, so it will take up more memory while in use, but at least you won't be carrying it around for the rest of the program.

Get MD5 Checksum for Very Large Files

I've written a script that reads through all files in a directory and returns the md5 hash for each file. However, it renders nothing for a rather large file. I assume that the interpreter has some value set for maximum processing time, and since it takes too long to get this value, it just skips along to other files. Is there any way to get an md5 checksum for large files through PHP? If not, could it be done through a cron job with cPanel? I gave it a shot there, but it doesn't seem that my md5sum command was ever processed: I never got an email with the hash. Here's the PHP I've already written. It's very simple code and works fine for files of a reasonable size:
function md5_dir($dir) {
    if (is_dir($dir)) {
        if ($dh = opendir($dir)) {
            while (($file = readdir($dh)) !== false) {
                // note: readdir() returns bare names, so this only
                // works when run from inside $dir
                echo nl2br($file . "\n" . md5_file($file) . "\n\n");
            }
            closedir($dh);
        }
    }
}
Make sure to use escapeshellarg (http://us3.php.net/manual/en/function.escapeshellarg.php) if you decide to use a shell_exec() or system() call, e.g.:
shell_exec('md5sum -b ' . escapeshellarg($filename));
While I couldn't reproduce it with PHP 5.2 or 5.3 on a 2 GB file, the issue seems to come up on 32-bit PHP builds.
Even though it's not a really nice solution, you could try letting the system do the hashing:
echo system("md5sum test.txt");
46d6a7bcbcf7ae0501da341cb3bae27c test.txt
If you're hitting a memory limit or the maximum execution time, PHP should be throwing an error message to that effect. Check your error logs. If you are hitting a limit, you can set the maximum values for PHP memory usage and execution time in your php.ini file:
memory_limit = 16M
will set max memory usage to 16 megs. For maximum execution time:
max_execution_time = 30
will set maximum execution time to 30 seconds.
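If you can't edit php.ini, the same limits can usually be raised at runtime, assuming your host allows overriding them (a minimal sketch):
<?php
// Runtime equivalents of the php.ini settings above.
ini_set('memory_limit', '16M');
set_time_limit(30); // same effect as max_execution_time = 30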
You could also achieve it via the command line:
shell_exec('md5sum -b '. $fileName);
FYI, in case someone needs a fast md5() checksum: PHP is pretty fast even with larger files. This returns the checksum of a Linux Mint .iso (880 MB) in 3 seconds:
<?php
// checksum
$path = $_SERVER['DOCUMENT_ROOT']; // get upload folder path
$file = $path."/somefolder/linux-mint.iso"; // any file
echo md5_file($file);
?>
