I'm making a little benchmark class to display page load time and memory usage.
Load time is already working, but when I display the memory usage, it doesn't change.
Example:
$conns = array();
ob_start();
benchmark::start();
$conns[] = mysql_connect('localhost', 'root', '');
benchmark::stop();
ob_flush();
uses the same memory as
$conns = array();
ob_start();
benchmark::start();
for($i = 0; $i < 1000; $i++)
{
$conns[] = mysql_connect('localhost', 'root', '');
}
benchmark::stop();
ob_flush();
I'm using memory_get_usage(true) to get the memory usage in bytes.
memory_get_usage(true) shows the amount of memory allocated by the PHP engine, not the amount actually used by the script. It's very possible that your test script hasn't required the engine to ask for more memory.
As a test, grab a large(ish) file and read it into memory. You should see a change then.
I've successfully used memory_get_usage(true) to track the memory usage of web crawling scripts, and it worked fine (since the goal was to slow things down before hitting the system memory limit). The one thing to remember is that it doesn't change based on actual usage; it changes based on the memory requested by the engine. So what you end up seeing is sudden jumps instead of slow growth (or shrinkage).
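For anyone wanting to do similar throttling, a minimal sketch of the idea might look like this; the one-second pause and the function name are arbitrary choices of mine, not taken from the actual crawler:

```php
<?php
// Back off when the engine's allocated memory nears a soft limit.
// real_usage = true reports memory requested from the system, which
// is what you want to compare against a system-level budget.
function throttle_if_needed(int $softLimitBytes): bool
{
    if (memory_get_usage(true) >= $softLimitBytes) {
        sleep(1); // pause so pending work can finish before continuing
        return true;
    }
    return false;
}

// With a huge budget, no throttling occurs:
var_dump(throttle_if_needed(PHP_INT_MAX)); // bool(false)
```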
If you set the real_usage flag to false, you may be able to see very small memory changes; however, this won't help you monitor the true amount of memory PHP is requesting from the system.
(Update: To be clear the difference I describe is between memory used by the variables of your script, compared to the memory the engine requested to run your script. All the same script, different way of measuring.)
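To see the difference concretely, compare the two modes around a single large allocation (the 5 MB string below is just an arbitrary test size):

```php
<?php
$realBefore = memory_get_usage(true);  // memory requested from the system
$usedBefore = memory_get_usage(false); // memory used by script variables

$blob = str_repeat('x', 5 * 1024 * 1024); // ~5 MB string

$realAfter = memory_get_usage(true);
$usedAfter = memory_get_usage(false);

// "used" grows by roughly the string's size; "real" grows in coarse
// chunks, and only when the engine actually has to ask for more.
printf("used: +%d bytes\n", $usedAfter - $usedBefore);
printf("real: +%d bytes\n", $realAfter - $realBefore);
```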
I'm no guru in PHP's internals, but I'd imagine that an echo does not affect the amount of memory used by PHP, as it just outputs something to the client.
It could be different if you enable output buffering.
The following should make a difference:
$result = null;
benchmark::start();
for($i = 0; $i < 10000; $i++)
{
$result .= 'test';
}
benchmark::stop();
Look at:
for($i = 0; $i < 1000; $i++)
{
$conns[] = mysql_connect('localhost', 'root', '');
}
You could have looped to 100,000 and nothing would have changed; it's the same connection. No resources are allocated for it because the linked list remembering them never grew. Why would it grow? There's already (presumably) a valid handle at $conns[0]. It won't make a difference in memory_get_usage(). You did test $conns[15] to see if it worked, yes?
Can root@localhost have multiple passwords? No. Why would PHP bother to open another connection just because you told it to? (tongue in cheek).
I suggest running the same thing via CLI through Valgrind to see the actual heap usage:
valgrind /usr/bin/php -f foo.php (or something similar). At the bottom, you'll see what was allocated, what was freed, and garbage collection at work.
Disclaimer: I do know my way around PHP internals, but I am no expert in that deliberately obfuscated maze written in C that Zend calls PHP.
echo won't change the allocated number of bytes (unless you use output buffers).
The $i variable is unset after the for loop, so it's not changing the number of allocated bytes either.
Try an output-buffering example:
ob_start();
benchmark::start();
for($i = 0; $i < 10000; $i++)
{
echo 'test';
}
benchmark::stop();
ob_flush();
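If you don't have the benchmark class handy, a self-contained check of the same effect is to read memory_get_usage() directly on either side of the buffered echoes (the buffer is discarded at the end, so nothing is actually sent):

```php
<?php
ob_start();
$before = memory_get_usage(false);

for ($i = 0; $i < 10000; $i++) {
    echo 'test'; // with buffering on, this accumulates in memory
}

$after = memory_get_usage(false);
ob_end_clean(); // throw the buffer away instead of flushing it

// The buffer holds 10000 * 4 = 40000 bytes, so usage should have grown.
printf("memory grew by %d bytes\n", $after - $before);
```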
With this code:
for($i = 1; $i <= $times; $i++){
$milliseconds = round(microtime(true) * 1000);
$res = file_get_contents($url);
$milliseconds2 = round(microtime(true) * 1000);
$milisecondsCount = $milliseconds2 - $milliseconds;
echo 'miliseconds=' . $milisecondsCount . ' ms' . "\n";
}
I get this output:
miliseconds=1048 ms
miliseconds=169 ms
miliseconds=229 ms
miliseconds=209 ms
....
But with sleep:
for($i = 1; $i <= $times; $i++){
$milliseconds = round(microtime(true) * 1000);
$res = file_get_contents($url);
$milliseconds2 = round(microtime(true) * 1000);
$milisecondsCount = $milliseconds2 - $milliseconds;
echo 'miliseconds=' . $milisecondsCount . ' ms' . "\n";
sleep(2);
}
This:
miliseconds=1172 ms
miliseconds=1157 ms
miliseconds=1638 ms
....
So what is happening here?
My questions:
1) Why don't you test this yourself by using clearstatcache and comparing the timings with and without it?
2) Your method of testing is flawed; as a first fix, have you tried swapping the order, so that the "sleep" version runs first rather than second?
3) How many iterations of your test have you done? If it's fewer than 10,000, then I suggest you repeat your tests, firstly to identify the average delay (or lack thereof) and then to establish what makes you think this is caused specifically by caching.
4) What are the specs of your machine, your RAM, and the free and available memory on each iteration?
5) Is your server live? Are you able to rule out outside services as possible causes of the time disparity (such as anti-virus software, background processes, server traffic, etc.)?
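On point 3, a rough timing harness for averaging over many iterations could look like the sketch below; the callable and counts are placeholders, and in the real test you would time file_get_contents($url) instead of the dummy operation:

```php
<?php
// Run a callable $n times and return the average wall-clock time in ms.
function average_ms(callable $fn, int $n): float
{
    $total = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $t0 = microtime(true);
        $fn();
        $total += (microtime(true) - $t0) * 1000;
    }
    return $total / $n;
}

// Dummy workload standing in for the real file_get_contents($url) call:
$avg = average_ms(function () { str_repeat('a', 100000); }, 1000);
printf("average: %.4f ms\n", $avg);
```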
My Answer:
Short answer: no. file_get_contents does not use a cache.
However, operating system level caches may cache recently used files to
make for quicker access, but I wouldn't count on those being
available.
Calling a file reference multiple times will only read the file
from disk a single time, subsequent calls will read the value from the
static variable within the function. Note that you shouldn't count on
this approach for large files or ones used only once per page, since
the memory used by the static variable will persist until the end of
the request. Keeping the contents of files unnecessarily in static
variables is a good way to make your script a memory hog.
Quoted from This answer.
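For context, the behavior that quote describes presumably comes from a helper along these lines; this is a hypothetical reconstruction (the function name is made up), showing why later calls return stale content:

```php
<?php
// Hypothetical helper: reads a file from disk only on the first call
// per path; later calls are served from the static array, which lives
// until the end of the request.
function cached_get_contents(string $path): string
{
    static $cache = [];
    if (!isset($cache[$path])) {
        $cache[$path] = file_get_contents($path);
    }
    return $cache[$path];
}

$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, 'first');
$a = cached_get_contents($tmp);  // reads from disk
file_put_contents($tmp, 'second');
$b = cached_get_contents($tmp);  // served from the static cache: stale
var_dump($a === $b);             // bool(true)
unlink($tmp);
```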
For remote (non-local-filesystem) files, there are so many possible causes of variable delays that file_get_contents caching is really a long way down the list of options.
As you claim to be connecting using a localhost/ reference structure, I would hazard a guess (though I'm not certain) that your server is using various firewall and checking techniques on the incoming request, which will add a large variable component to the time taken.
Concatenate a random query string onto the URL.
For example: $url = 'http://example.com/file.html?r=' . rand(0, 9999);
I am working on a security script, and it seems that I've run into a problem with PHP and the way PHP uses memory.
my.php:
<?php
// Display current PID
echo 'pid= ', posix_getpid(), PHP_EOL;
// The user type a very secret key
echo 'Fill secret: ';
$my_secret_key = trim(fgets(STDIN));
// 'Destroy' the secret key
unset($my_secret_key);
// Wait for something
echo 'waiting...';
sleep(60);
And now I run the script:
php my.php
pid= 1402
Fill secret: AZERTY <= User input
waiting...
Before the script ends (while it sleeps), I generate a core file by sending a SIGSEGV signal to the script:
kill -11 1402
I inspect the corefile:
strings core | less
Here is an extract of the result:
...
fjssdd
sleep
STDIN
AZERTY <==== this is the secret key
zergdf
...
I understand that the memory is merely released by the unset, not 'destroyed': the data are not actually erased (it's the equivalent of a free() call).
So if someone dumps the memory of the process, even after the script has finished, they could read $my_secret_key (until that memory space is overwritten by another process).
Is there a way to overwrite this memory segment, or the full memory space, after the PHP script ends?
Thanks to all for your comments.
I already know how memory is managed by the system.
Even though PHP doesn't use malloc and free directly (but its own wrappers, such as emalloc and efree), it seems (and I understand why) that it is simply impossible for PHP to scrub memory after it has been freed.
The question was more out of curiosity, and the comments confirm what I previously intended to do: write a little piece of code in a memory-aware language (C?) to handle this sensitive part, allocating the string with malloc, overwriting it with XXXXXX after use, then freeing it.
Thanks to all
J
You seem to be lacking a lot of understanding about how memory management works in general, and specifically within PHP.
A discussion of the various salient points is redundant when you consider what the security risk is here:
So if someone dumps the memory of the process, even after the script execution
If someone can access the memory of a program running under a different uid, then they have root access and can compromise the target in so many other ways - and it doesn't matter whether it's a PHP script, ssh, an Oracle DBMS....
If someone can access the memory previously occupied by a process which has now terminated, then not only have they got root, they've already compromised the kernel.
You seem to have missed an important lesson in what computers mean by "delete" operations.
See, computers generally don't bother to zero out memory when freeing it; instead they simply "forget" that they were using it.
In other words, if you want to clear memory, you most definitely need to overwrite it, just as @hakre suggested.
That said, I hardly see the point of your script. PHP just isn't made for the sort of thing you are doing. You're probably better off with a small dedicated solution rather than using PHP. But this is just my opinion. I think.
I don't know if this works, but if you can, please add these lines to your tests and see the outcome:
...
// Overwrite it:
echo 'Overwrite secret: ';
for($l = strlen($my_secret_key), $i = 0; $i < $l; $i++)
{
$my_secret_key[$i] = '#';
}
And I wonder whether or not running
gc_collect_cycles();
makes a difference. Even when the values are freed, they might still be in memory (within the script's process, or even elsewhere in the address space).
I would try whether overwriting memory with some data would eventually erase your original locations of variables:
$buffer = '';
for ($i = 0; $i < 1e6; $i++) {
$buffer .= "\x00";
}
As soon as PHP releases the memory, I suppose further allocations might be given the same location. It's hardly fail-proof, though.
I'm posting a strange behavior that can be reproduced (at least on apache2 + php5).
I don't know whether I'm doing something wrong, but let me explain what I'm trying to achieve.
I need to send chunks of binary data (let's say 30 of them) and analyze the average Kbit/s at the end:
I sum each chunk's output time and each chunk's size, and perform my Kbit/s calculation at the end.
<?php
// build my binary chunk
$var= '';
$o=10000;
while($o--)
{
$var.= pack('N', 85985);
}
// get the size, prepare the memory.
$size = strlen($var);
$tt_sent = 0;
$tt_time = 0;
// I send my chunk 30 times
for ($i = 0; $i < 30; $i++)
{
// start time
$t = microtime(true);
echo $var."\n";
ob_flush();
flush();
$e = microtime(true);
// end time
// the difference should represent what it takes the server to
// transmit the chunk to the client, right?
// add this chunk's bench to the total
$tt_time += round($e-$t,4);
$tt_sent += $size;
}
// total result
echo "\n total: ".(($tt_sent*8)/($tt_time)/1024)."\n";
?>
In the example above, it works so far (on localhost, it oscillates between 7000 and 10000 Kbit/s across different tests).
Now, let's say I want to shape the transmission, because I know that the client will have enough data in one chunk to process for a second.
I decide to use usleep(1000000) to mark a pause between chunk transmissions.
<?php
// build my binary chunk
$var= '';
$o=10000;
while($o--)
{
$var.= pack('N', 85985);
}
// get the size, prepare the memory.
$size = strlen($var);
$tt_sent = 0;
$tt_time = 0;
// I send my chunk 30 times
for ($i = 0; $i < 30; $i++)
{
// start time
$t = microtime(true);
echo $var."\n";
ob_flush();
flush();
$e = microtime(true);
// end time
// the difference should represent what it takes the server to
// transmit the chunk to the client, right?
// add this chunk's bench to the total
$tt_time += round($e-$t,4);
$tt_sent += $size;
usleep(1000000);
}
// total result
echo "\n total: ".(($tt_sent*8)/($tt_time)/1024)."\n";
?>
In this last example, I don't know why, but the calculated bandwidth can jump from 72,000 Kbit/s to 1,200,000; it is totally inaccurate/irrelevant.
Part of the problem is that the time measured to output my chunk is ridiculously low each time a chunk is sent (after the first usleep).
Am I doing something wrong? Is the output buffer not synchronous?
I'm not sure how definitive these tests are, but I found it interesting. On my box I'm averaging around 170,000 kb/s. From a networked box this number goes up to around 280,000 kb/s. I guess we have to assume microtime(true) is fairly accurate, even though I've read it's operating-system dependent. Are you on a Linux-based system? The real question is: how do we calculate the kilobits transferred in a one-second period? I try to project how many chunks can be sent in one second and then store the calculated kb/s to be averaged at the end. I added a sleep(1) before flush() and this resulted in a negative kb/s, as expected.
Something doesn't feel right, and I'd be interested to know whether you have improved your testing method. Good luck!
<?php
// build my binary chunk
$var= '';
$o=10000;
//Alternative to get actual bytes
$m1 = memory_get_usage();
while($o--)
{
$var.= pack('N', 85985);
}
$m2 = memory_get_usage();
//Your size estimate
$size = strlen($var);
//Calculate alternative bytes
$bytes = ($m2 - $m1); //40108
//Convert to Kilobytes 1 Kilobyte = 1024 bytes
$kilobytes = $size/1024;
//Convert to Kilobits 1 byte = 8 bits
$kilobits = $kilobytes * 8;
//Display our data for the record
echo "<pre>size: $size</pre>";
echo "<pre>bytes: $bytes</pre>";
echo "<pre>kilobytes: $kilobytes</pre>";
echo "<pre>kilobits: $kilobits</pre>";
echo "<hr />";
//The test count
$count = 100;
//Initialize total kb/s variable
$total = 0;
for ($i = 0; $i < $count; $i++)
{
// Start Time
$start = microtime(true);
// Utilize html comment to prevent browser from parsing
echo "<!-- $var -->";
// End Time
$end = microtime(true);
// Seconds it took to flush binary chunk
$seconds = $end - $start;
// Calculate how many chunks we can send in 1 second
$chunks = (1/$seconds);
// Calculate the kilobits per second
$kbs = $chunks * $kilobits;
// Store the kbs and we'll average all of them out of the loop
$total += $kbs;
}
//Process the average (data generation) kilobits per second
$average = $total/$count;
echo "<h4>Average kbit/s: $average</h4>";
Analysis
Even though I arrive at a somewhat arbitrary value in the test, it is still a value that can be measured. Using a networked computer adds insight into what is really going on. I would have thought the localhost machine would have a higher value than the networked box, but the tests prove otherwise in a big way. On localhost we have to both send the raw binary data and receive it. This of course shows that two threads are sharing CPU cycles, and therefore the supposed kb/s value is in fact lower when testing in a browser on the same machine. We are therefore really measuring CPU cycles, and we obtain higher values when the server is allowed to just be a server.
Some interesting things start to show up when you increase the test count to 1000. First, don't make the browser parse the data: it takes a lot of CPU to attempt to render raw data at such high test counts. We can manually watch what is going on with, say, System Monitor and Task Manager. In my case the local box is a Linux server and the networked box is XP. You can obtain some real kb/s speeds this way, and it makes it obvious that we are dynamically serving data using mainly the CPU and the network interfaces. The server doesn't replicate the data, so no matter how high we set the test count, we only need 40 kilobytes of memory space. So 40 kilobytes can generate 40 megabytes dynamically at 1000 test cases, and 400 MB at 10,000 cases.
I crashed Firefox on XP with a "virtual memory too low" error after running the 10,000-case test several times. System Monitor on the Linux server showed some understandable spikes in CPU and network, but overall it pushed out a large amount of data really quickly and had plenty of room to spare. Running 10,000 cases on Linux a few times actually spun up the swap drive and pegged the server's CPU. The most interesting fact, though, is that the values I obtained above only changed when I was both receiving in Firefox and transmitting in Apache while testing locally. I practically locked up the XP box, yet my network value of ~280,000 kb/s did not change on the printout.
Conclusion:
The kb/s value we arrive at above is virtually useless other than to prove it's useless. The test itself, however, shows some interesting things. At high test counts I believe I could actually see some physical buffering going on in both the server and the client. Our test script dumps the data to Apache and is then released to continue its work; Apache handles the details of the transfer, of course. This is nice to know, but it proves we can't measure the actual server-to-browser transmission rate this way. We can measure our server's data-generation rate, I guess, if that's meaningful in some way. And guess what: flushing actually slowed down the speeds; there's a penalty for flushing. In this case there is no reason for it, and removing flush() actually speeds up our data-generation rate. Since we are not dealing with networking, our value above is actually more meaningful kept as kilobytes. It's useless anyway, so I'm not changing it.
I'm having problems updating a MySQL database.
I want to run this script on a ~800,000-row database, but my memory runs out after 5 rows, so somehow I need to automatically advance this MySQL query by adding +5 to the LIMIT.
I want to use a redirect, or reset the memory somehow.
<?php
include(dirname(__FILE__) .'/include/simple_html_dom.php');
mysql_connect('localhost', '', '');
mysql_select_db('');
$result = mysql_query("SELECT gamertag FROM gamertags LIMIT 0, 5 ");
while ($row = mysql_fetch_assoc($result)) {
$gamertag = $row['gamertag'];
//pages to pull stats from
$site_url = 'http://www.bungie.net';
$stats_page = 'http://www.bungie.net/stats/reach/default.aspx?player=';
//create dom
$html = new simple_html_dom();
$html->load_file($stats_page . urlencode(strtolower($gamertag)));
//pull nameplate emblem url, ****if it exist****
$nameplate_emblem_el = $html->find("#ctl00_mainContent_identityBar_namePlateImg");
if (!$nameplate_emblem_el){
$nameplate_emblem = 'No nameplate emblem!';
}
else{
//only if #ctl00_mainContent_identityBar_namePlateImg is found
$nameplate_emblem = htmlspecialchars_decode($nameplate_emblem_el[0]->attr['src']);
$nameplate_emblem = $site_url . $nameplate_emblem;
}
mysql_query("INSERT IGNORE INTO star SET gamertag = '".$gamertag."',nameplate = '".$nameplate_emblem."'");
}
?>
800,000 is a large number; I'm not sure it's going to happen, but from a comment on the PHP manual:
"unset() does just what it's name says - unset a variable. It does not force immediate memory freeing. PHP's garbage collector will do it when it see fits - by intention as soon, as those CPU cycles aren't needed anyway, or as late as before the script would run out of memory, whatever occurs first.
If you are doing $whatever = null; then you are rewriting variable's data. You might get memory freed / shrunk faster, but it may steal CPU cycles from the code that truly needs them sooner, resulting in a longer overall execution time."
Source: http://php.net/manual/en/function.unset.php
So, try to unset (or set to null) the variables after each iteration of the loop.
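Stripped of the MySQL and DOM specifics, the effect of freeing each iteration's large value looks roughly like this; the 1 MB string is just a stand-in for the parsed page:

```php
<?php
$peak = 0;
for ($i = 0; $i < 50; $i++) {
    $big = str_repeat('x', 1024 * 1024); // stand-in for a parsed DOM
    // ... work with $big here ...
    unset($big); // or $big = null; lets the engine reuse the block
    $peak = max($peak, memory_get_usage(false));
}
// Peak stays near one chunk's worth instead of 50 MB.
printf("peak usage: %d bytes\n", $peak);
```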
I'm guessing that you hit PHP's memory limit, not the machine's. If so, you can edit php.ini (typically /etc/php5/apache2/php.ini) to give more memory to PHP. Look for "memory_limit = X", where X is something like 256M.
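You can also inspect, and raise, the limit from within the script itself instead of editing php.ini (no server restart needed):

```php
<?php
// Current limit ("-1" means unlimited, common on the CLI):
echo "memory_limit: ", ini_get('memory_limit'), "\n";

// Raise it for this request only:
ini_set('memory_limit', '256M');
echo "memory_limit now: ", ini_get('memory_limit'), "\n";
```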
What is the best way to write to files in a large PHP application? Let's say lots of writes are needed per second. What is the best way to go about this?
Could I just open the file and append the data, or should I open, lock, write, and unlock?
What will happen if the file is being worked on and other data needs to be written? Will that activity be lost, or will it be saved? And if it is saved, will it halt the application?
If you have been, thank you for reading!
Here's a simple example that highlights the danger of simultaneous writes:
<?php
for($i = 0; $i < 100; $i++) {
$pid = pcntl_fork();
//only spawn more children if we're not a child ourselves
if(!$pid)
break;
}
$fh = fopen('test.txt', 'a');
//The following is a simple attempt to get multiple threads to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);
$myPid = posix_getpid();
//create a line starting with pid, followed by 10,000 copies of
//a "random" char based on pid.
$line = $myPid . str_repeat(chr(ord('A')+$myPid%25), 10000) . "\n";
for($i = 0; $i < 1; $i++) {
fwrite($fh, $line);
}
fclose($fh);
echo "done\n";
If appends were safe, you should get a file with 100 lines, each roughly 10,000 chars long and beginning with an integer. And sometimes, when you run this script, that's exactly what you'll get. Sometimes, though, a few appends will conflict and lines will get mangled.
You can find corrupted lines with grep '^[^0-9]' test.txt
This is because file append is only atomic if:
You make a single fwrite() call
and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
and you write to a fully POSIX-compliant filesystem
If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.
Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
I have a high-performance, multi-threaded application where all threads write (append) to a single log file. So far I haven't noticed any problems with that; each thread writes multiple times per second and nothing gets lost. I think just appending to a huge file should be no issue. But if you want to modify already-existing content, especially with concurrency, I would go with locking; otherwise a big mess can happen.
If concurrency is an issue, you should really be using databases.
If you're just writing logs, maybe you should take a look at the syslog function, since syslog provides an API.
You could also delegate writes to a dedicated backend and do the job in an asynchronous manner.
These are my 2p.
Unless a unique file is needed for a specific reason, I would avoid appending everything to a huge file. Instead, I would rotate the file by time and size; a couple of configuration parameters (wrap_time and wrap_size) could be defined for this.
Also, I would probably introduce some buffering to avoid waiting for the write operation to complete.
PHP is probably not the most suitable language for this kind of operation, but it is still possible.
Use flock()
See this question
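A minimal sketch of a locked append (the temp-file path is just a throwaway example):

```php
<?php
$path = sys_get_temp_dir() . '/locked_append_demo.log';

$fh = fopen($path, 'a');
if ($fh !== false) {
    if (flock($fh, LOCK_EX)) {       // blocks until the lock is free
        fwrite($fh, "one complete log line\n");
        fflush($fh);                 // push PHP's buffer to the OS first
        flock($fh, LOCK_UN);         // release for the next writer
    }
    fclose($fh);
}
```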
If you just need to append data, PHP should be fine with that as filesystem should take care of simultaneous appends.