PHP memory limit test

It seems this is a perennially unsolved question: I ran a simple test of the memory limits on my local machine (from the command line):
<?php
$R = array();
for ($i = 0; $i < 4000 * 4000; $i++) {
    $R[$i] = 1.00001;
}
?>
and I have the memory limit set at 128M, yet PHP still aborts with an "Allowed memory size exhausted" message. Why?

Well, I wouldn't say ever unsolved question. There are a few reasons for it. PHP is a very inefficient language in terms of memory management; that's no secret. The code you provided could be optimized a little, but not enough to make a difference: for example, hoist the multiplication out of the for loop and store the result in a variable, otherwise that mathematical operation runs on every iteration. Even then the saving is negligible: 2310451248 bytes as written versus 2310451144 bytes with the hoisted value. (Spread over the 16 million elements, that is roughly 144 bytes per element, which matches the oft-cited per-element overhead of PHP 5 arrays on 64-bit builds.) The point remains that PHP is not a low-level language, so you can't expect it to have the same efficiency as C, for example. In your particular case, the memory required to do all this is a little over 2 GB (about 2.15 GB):
<?php
ini_set('memory_limit', '4096M');

$ii = 4000 * 4000;
//$R = new SplFixedArray($ii);
$R = array();
for ($i = 0; $i < $ii; $i++) {
    $R[$i] = 1.00001;
}

echo humanize(memory_get_usage()) . "\n";

function humanize($size)
{
    $unit = array('b', 'kb', 'mb', 'gb', 'tb', 'pb');
    return round($size / pow(1024, ($i = floor(log($size, 1024)))), 2) . ' ' . $unit[$i];
}
?>
But using SplFixedArray things change a lot:
<?php
ini_set('memory_limit', '4096M');

$ii = 4000 * 4000;
$R = new SplFixedArray($ii);
for ($i = 0; $i < $ii; $i++) {
    $R[$i] = 1.00001;
}

echo humanize(memory_get_usage()) . "\n";

function humanize($size)
{
    $unit = array('b', 'kb', 'mb', 'gb', 'tb', 'pb');
    return round($size / pow(1024, ($i = floor(log($size, 1024)))), 2) . ' ' . $unit[$i];
}
?>
Which requires "only" 854.72 mb.
This is one of the main reasons why companies that deal with larger amounts of data generally avoid PHP and go for languages such as Python instead. There is a great article describing the problems and causes around this topic, found here. Hope that helps.

Related

Efficient way to write 2^20 files to ext4 filesystem

I'm trying to write 2^20 files with about 45k lines each, where each line is a word.
I need this to be flat files, no SQL, for storage purposes, and I've optimized it so it doesn't take too much disk space.
As of now the files are written across 16 directories, which makes 65536 files per directory. The file names are 4 characters long.
That didn't seem like much to me: my script takes a huge file, reads each line, and writes each line to its dedicated file.
I first tried this with 2^16 files, so 4096 files per directory, and it worked like a charm, but I wanted to make the lookup faster. It seems to hit a wall at this number of files, yet from what I've seen on the internet it's not even close to the ext4 maximum file count.
After roughly 45 GB it became very, very, very slow (it took more than an hour to write 100 lines to each file).
My script takes advantage of the server's 64 GB of RAM by buffering writes, and I have a 1.7 TB SSD with more than 700 MB/s write speed, but it still looks like it can't manage that many files. Even a du -h /dir/ takes forever.
Is there any way to do this task faster, or should I just go with 65536 files?
Thank you for your help :)
Have a great day.
EDIT: Thank you for your answer, and sorry I didn't post any code at first. Here it is:
$time_pre = microtime(true);
$bufferSize = (1 << 9);
$dico = fopen('/path/to/file1', "r+");
$hexa = array("0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f");

// Initialize one empty string buffer per 5-hex-digit prefix (variable variables)
for ($i = 0; $i < 16; $i++) {
    for ($j = 0; $j < 16; $j++) {
        for ($k = 0; $k < 16; $k++) {
            for ($l = 0; $l < 16; $l++) {
                for ($m = 0; $m < 16; $m++) {
                    $ijklm = "$hexa[$i]$hexa[$j]$hexa[$k]$hexa[$l]$hexa[$m]";
                    $$ijklm = "";
                }
            }
        }
    }
}

while (!feof($dico)) {
    $ligne = rtrim(fgets($dico));
    $md4 = hash("md5", $ligne);
    $add4 = "$md4[0]$md4[1]$md4[2]$md4[3]$md4[4]";
    $$add4 .= "$md4[5]$md4[6]$ligne\n";
    // Flush once the buffer grows past $bufferSize bytes
    // (brackets instead of the curly-brace string offsets removed in PHP 8)
    if (isset($$add4[$bufferSize])) {
        $baseFichier = '/path/to/' . $md4[0] . '/' . $md4[1] . '/' . $md4[2] . '/' . $md4[3] . '/' . $md4[4];
        $fichier = fopen($baseFichier, "a");
        fwrite($fichier, $$add4);
        fclose($fichier);
        $$add4 = "";
    }
}

// Flush whatever is left in the buffers
for ($i = 0; $i < 16; $i++) {
    for ($j = 0; $j < 16; $j++) {
        for ($k = 0; $k < 16; $k++) {
            for ($l = 0; $l < 16; $l++) {
                for ($m = 0; $m < 16; $m++) {
                    $ijklm = "$hexa[$i]$hexa[$j]$hexa[$k]$hexa[$l]$hexa[$m]";
                    if ($$ijklm != "") {
                        $baseFichier = '/path/to/' . $hexa[$i] . '/' . $hexa[$j] . '/' . $hexa[$k] . '/' . $hexa[$l] . '/' . $hexa[$m];
                        $fichier = fopen($baseFichier, "a");
                        fwrite($fichier, $$ijklm); // was $$ijkl, an undefined variable
                        fclose($fichier);
                        $$ijklm = "";
                    }
                }
            }
        }
    }
}
fclose($dico);
I used fflush and some file locks at first, because I had 6 workers running at the same time and wanted to avoid write conflicts, but I removed them for this test. In this run I'm trying to create 16^4 directories instead of one directory with 65536 files in it.
I also tried a smaller buffer, to check whether that works better on the SSD.
It's running right now, but it looks slower than the previous test. The input file is 75 GB, and I have 6 like it to process.
What would you do to speed things up?
I'm using PHP 8. I tried C first but hit the same issue, so I went with a higher-level language for convenience.
Thank you :)
PS: Sorry if the code isn't pro; it's a hobby for me.
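As a side note, the same buffer-and-flush idea can be written with an ordinary associative array instead of variable variables, which avoids the 16^5 initialization loops entirely; a minimal sketch, assuming the same /path/to/ directory layout and buffer size as above:
$bufferSize = 1 << 9;
$buffers = array();                  // hex prefix => pending lines
$dico = fopen('/path/to/file1', 'r');

while (($ligne = fgets($dico)) !== false) {
    $ligne = rtrim($ligne);
    $md5 = hash('md5', $ligne);
    $prefix = substr($md5, 0, 5);    // first 5 hex chars pick the target file
    $buffers[$prefix] = ($buffers[$prefix] ?? '') . substr($md5, 5, 2) . $ligne . "\n";

    // Flush once this prefix's buffer grows past $bufferSize bytes
    if (strlen($buffers[$prefix]) > $bufferSize) {
        $path = '/path/to/' . implode('/', str_split($prefix));
        file_put_contents($path, $buffers[$prefix], FILE_APPEND);
        $buffers[$prefix] = '';
    }
}

// Flush whatever is left over
foreach ($buffers as $prefix => $buf) {
    if ($buf !== '') {
        $path = '/path/to/' . implode('/', str_split($prefix));
        file_put_contents($path, $buf, FILE_APPEND);
    }
}
fclose($dico);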

Why is a sorted array slower than a non-sorted array in PHP?

I have the following script, and I know about the branch-prediction principle, but it doesn't seem to apply here.
Why is it faster to process a sorted array than an unsorted array?
Here it seems to work the other way around.
When I run the following script without the sort($data) call, it takes 193.23883700371 seconds to complete.
When I enable the sort($data) line, the script takes 300.26129794121 seconds to complete.
Why is it so much slower in PHP? I used PHP 5.5 and 5.6.
In PHP 7 the script is faster when sort() is not commented out.
<?php
$size = 32768;
$data = array_fill(0, $size, null);
for ($i = 0; $i < $size; $i++) {
    $data[$i] = rand(0, 255);
}

// Improved performance when disabled
//sort($data);

$total = 0;
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    for ($x = 0; $x < $size; $x++) {
        if ($data[$x] >= 127) {
            $total += $data[$x];
        }
    }
}
$end = microtime(true);
echo($end - $start);
Based on my comments above, the solution is to either find or implement a sort function that moves the values so that memory remains contiguous (which gives you the speedup), or to push the values from the sorted array into a second array so that the new array's memory is contiguous.
Assuming you MEANT not to time the actual sort (your code doesn't time that action), it's difficult to assess any true performance difference, because you've filled the array with random data. This means that one pass might have MANY more values greater than or equal to 127 (and thus run an additional statement) than another pass. To really compare the two, fill your array with an identical set of fixed data. Otherwise, you'll never know if the random fill is causing the time differences you're seeing.
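A minimal sketch of that second option, repacking after the sort so the values sit in allocation order again (illustrative only; appending in a fresh array allocates its buckets sequentially, restoring memory locality for the scan):
sort($data);

// Copy the sorted values into a fresh array so its backing storage
// is allocated in one contiguous, in-order pass
$packed = array();
foreach ($data as $value) {
    $packed[] = $value;
}
$data = $packed;
unset($packed);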

Prevent my script from using so much memory?

I have a script which lists all possible permutations of an array, which, admittedly, might be used in place of a wordlist. If I get this to work, it will be impossible not to get a hit eventually, unless there is a limit on attempts.
Anyway, the script obviously takes a HUGE amount of memory, something that will set any server on fire. What I need help with is finding a way to spread out the memory usage, something like resetting the script and continuing where it left off by going to another file or something, possibly using sessions. I have no clue.
Here's what I've got so far:
<?php
ini_set('memory_limit', '-1');
ini_set('max_execution_time', '0');

$possible = "abcdefghi";
$input = "$possible";

function string_getpermutations($prefix, $characters, &$permutations)
{
    if (count($characters) == 1) {
        $permutations[] = $prefix . array_pop($characters);
    } else {
        for ($i = 0; $i < count($characters); $i++) {
            $tmp = $characters;
            unset($tmp[$i]);
            string_getpermutations($prefix . $characters[$i], array_values($tmp), $permutations);
        }
    }
}

$characters = array();
for ($i = 0; $i < strlen($input); $i++) {
    $characters[] = $input[$i];
}

$permutations = array();
print_r($characters);
string_getpermutations("", $characters, $permutations);
print_r($permutations);
?>
Any ideas? :3
You could store the permutations in files every XXX permutations, then reopen the files when needed, in the correct order, to display/use your permutations. (Files or whatever you want, as long as you can free PHP memory.)
I see that you're just echoing the permutations, but maybe you want to do something else with them? So it depends somewhat.
Also, try to unset as many unused variables as soon as possible while generating your permutations.
Edit: Sometimes, using references as you did for your permutations array can result in higher memory usage. In case you didn't try it, check which is better, with or without the reference.
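If generators are available (PHP 5.5+), another way to keep the footprint flat is to yield one permutation at a time instead of accumulating them all; a rough sketch, not a drop-in replacement for the code above:
<?php
function permutations($prefix, array $characters)
{
    if (count($characters) <= 1) {
        yield $prefix . implode('', $characters);
        return;
    }
    foreach ($characters as $i => $char) {
        $rest = $characters;
        unset($rest[$i]);
        // Re-yield from the recursive call; memory stays bounded by the
        // recursion depth rather than by the full set of result strings
        foreach (permutations($prefix . $char, array_values($rest)) as $p) {
            yield $p;
        }
    }
}

foreach (permutations('', str_split('abcdefghi')) as $p) {
    echo $p, "\n"; // or append to a file instead of echoing
}
?>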

Peak memory usage of code block? [duplicate]

This question already has answers here:
Capturing (externally) the memory consumption of a given Callback
(3 answers)
Closed 9 years ago.
Is it possible to get the peak memory usage of a particular block of code in PHP? The memory_get_peak_usage() function seems to get the peak over the entire process execution up to the point of the function call, but this is not what I'm trying to obtain, since other code blocks could have skewed the value. I'm trying to isolate code blocks themselves, rather than the process as a whole.
Example:
// Block 1
for ($i = 0; $i < $iterations; ++$i) {
// some code
}
// Block 2
for ($i = 0; $i < $iterations; ++$i) {
// some different code
}
// Determine which of the two blocks used the most memory during their execution here
Unfortunately, xdebug is not an option for me at this time.
I don't know of a specific function that does this, but if you're just trying to isolate a block of code, wouldn't something as simple as this do?
$before = memory_get_peak_usage();
for ($i = 0; $i < $iterations; ++$i) {
    // some code
}
$after = memory_get_peak_usage();
$used_memory = $after - $before;
EDIT: XHProf does it, check my answer here.
Don't use memory_get_peak_usage(): it reports the process-wide high-water mark, so if an earlier block already pushed the peak higher, the difference tells you nothing about this block. Use memory_get_usage() instead:
$alpha = memory_get_usage();
for ($i = 0; $i < $iterations; ++$i) {
    // some code
}
$used_memory = memory_get_usage() - $alpha;
Bear in mind that this will only return the final amount of memory your // some code needed; intermediate memory consumption (such as creating/destroying values or calling functions) won't be counted.
You can hack your way around this using register_tick_function(), but it still won't work inside function calls.
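A rough sketch of that tick-function hack, sampling memory_get_usage() on every tick of the current scope ($iterations is assumed from the question; statements inside called functions still go unsampled):
declare(ticks=1);

$baseline  = memory_get_usage();
$blockPeak = $baseline;
$sampler = function () use (&$blockPeak) {
    $blockPeak = max($blockPeak, memory_get_usage());
};
register_tick_function($sampler);

for ($i = 0; $i < $iterations; ++$i) {
    // some code
}

unregister_tick_function($sampler);
echo ($blockPeak - $baseline) . " bytes peaked inside the block\n";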

Is PHP capable of caching count call inside loop?

I know the more efficient way to loop over an array is a foreach, or to store the count in a variable to avoid calling it multiple times.
But I am curious whether PHP has some kind of "caching" for something like:
for ($i = 0; $i < count($myarray); $i++) { /* ... */ }
Does it have something like that which I'm missing, or does it have nothing, meaning you should write:
$count = count($myarray);
for ($i = 0; $i < $count; $i++) { /* ... */ }
PHP does exactly what you tell it to. The length of the array may change inside the loop, so calling count on every iteration may well be intentional. PHP doesn't try to infer what you mean here, and neither should it. The standard way to do this is therefore:
for ($i = 0, $length = count($myarray); $i < $length; $i++)
PHP will execute the count each time the loop iterates. However, PHP keeps internal track of an array's size, so count is a relatively cheap operation; it's not as if PHP literally counts each element. But it's still not free.
Using a very simple 10-million-item array and a simple variable increment, I get 2.5 seconds for the in-loop count version and 0.9 seconds for the count-before-loop version. A fairly large difference, but not 'massive'.
Edit: the code:
$x = range(1, 10000000);
$z = 0;
$start = microtime(true);
for ($i = 0; $i < count($x); $i++) {
    $z++;
}
$end = microtime(true); // $end - $start = 2.5047581195831
Switching to
$count = count($x);
for ($i = 0; $i < $count; $i++) {
and leaving everything else the same, the time is 0.96466398239136.
PHP is an imperative language, which means it is not supposed to optimize away anything that could possibly have an effect. Given that it's also an interpreted language, this couldn't be done safely even if someone really wanted to.
Plus, if you simply want to iterate over the array, you really want to use foreach. In that case not only the count but the whole array is copied (and you can modify the original as you wish), or you can modify it in place using foreach ($arr as &$el) { $el = ...; } unset($el);. What I mean to say is that PHP (like any other language) often provides better solutions to your original problem (if you have one).
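For illustration, the two foreach styles from that last paragraph side by side (just a sketch):
$arr = array(1, 2, 3, 4);

// Plain foreach: iterates a (copy-on-write) copy; the original stays untouched
foreach ($arr as $el) {
    echo $el, "\n";
}

// By-reference foreach: modifies in place; unset the reference afterwards,
// otherwise $el keeps aliasing the last element
foreach ($arr as &$el) {
    $el *= 2;
}
unset($el);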
