I am working on a security script, and I have run into a problem with PHP and the way it uses memory.
my.php:
<?php
// Display current PID
echo 'pid= ', posix_getpid(), PHP_EOL;
// The user type a very secret key
echo 'Fill secret: ';
$my_secret_key = trim(fgets(STDIN));
// 'Destroy' the secret key
unset($my_secret_key);
// Wait for something
echo 'waiting...';
sleep(60);
And now I run the script:
php my.php
pid= 1402
Fill secret: AZERTY <= User input
waiting...
Before the script ends (while it is sleeping), I generate a core file by sending a SIGSEGV signal to the process:
kill -11 1402
I inspect the corefile:
strings core | less
Here is an extract of the result:
...
fjssdd
sleep
STDIN
AZERTY <==== this is the secret key
zergdf
...
I understand that unset() only releases the memory, it does not 'destroy' it. The data is not actually wiped (the release is comparable to a call to free()).
So if someone dumps the memory of the process, even after the script has finished, they could read $my_secret_key (until that memory is overwritten by another process).
Is there a way to overwrite that memory segment, or the full memory space, after the PHP script has finished?
Thanks to all for your comments.
I already know how memory is managed by the system.
Even though PHP doesn't use malloc and free directly (it uses its own wrappers such as emalloc and efree), it seems (and I understand why) that it is simply impossible to have PHP scrub memory it has already freed.
The question was more out of curiosity, and every comment seems to confirm what I previously intended to do: write a small piece of code in a memory-aware language (C?) to handle this sensitive part by allocating a simple string with malloc, overwriting it with XXXXXX after use, THEN freeing it.
Thanks to all
J
You seem to be lacking a lot of understanding about how memory management works in general, and specifically within PHP.
A discussion of the various salient points is redundant when you consider what the security risk is here:
So if someone dumps the memory of the process, even after the script execution
If someone can access the memory of a program running under a different uid then they have root access and can compromise the target in so many other ways - and it doesn't matter whether it's a PHP script, ssh, an Oracle DBMS....
If someone can access the memory previously occupied by a process which has now terminated, then not only have they got root, they've already compromised the kernel.
You seem to have missed an important lesson in what computers mean by "delete operations".
See, it's generally not worthwhile for a computer to zero out memory when it is released; instead it just "forgets" that it was using that memory.
In other words, if you want to clear memory, you most definitely need to overwrite it, just as @hakre suggested.
That said, I hardly see the point of your script. PHP just isn't made for the sort of thing you are doing. You're probably better off with a small dedicated solution rather than using PHP. But this is just my opinion. I think.
I don't know if that works, but if you can, please add these lines in your tests to see the outcome:
...
// Overwrite it:
echo 'Overwrite secret: ';
for ($l = strlen($my_secret_key), $i = 0; $i < $l; $i++)
{
    $my_secret_key[$i] = '#';
}
And I wonder whether or not running
gc_collect_cycles();
makes a difference. Even if the values are freed, they might still be in memory (in the script's process or even somewhere else in memory space).
I would test whether overwriting memory with some data eventually erases the original locations of your variables:
$buffer = '';
for ($i = 0; $i < 1e6; $i++) {
    $buffer .= "\x00";
}
As soon as PHP releases the memory, I suppose new allocations might be given the same location. It's hardly foolproof, though.
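On newer PHP (7.2+ with the bundled sodium extension) there is also a dedicated primitive for exactly this; a minimal sketch, assuming such a version is available:
// sodium_memzero() overwrites the string's internal buffer with zeros (PHP 7.2+ / ext-sodium)
$my_secret_key = trim(fgets(STDIN));
// ... use the secret ...
sodium_memzero($my_secret_key);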
Related
I've been trying to validate over 1 million randomly generated values (strings) with PHP and a client-side programming language on an online form, but there are a few challenges I'm facing:
PHP
Link to the (editable) PHP code: https://3v4l.org/AtTkO
The PHP code:
<?php
function generateRandomString($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz-_.';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

$unique = array();
for ($i = 0; $i < 9000000; $i++) {
    $u = $i + 1;
    $random = generateRandomString(5);
    if (!in_array($random, $unique)) {
        echo $u . ".m" . $random . "#[server]\n";
        $unique[] = $random;
        gc_collect_cycles();
    } else {
        echo "duplicate detected";
        $i--;
    }
}
echo memory_get_peak_usage();
What should happen:
New 5 character value gets randomly generated
Value gets checked if it already exists in the array
Value gets added to array
All randomly generated values are exported to a .txt file to be used for validating. (Not in the script yet)
What actually happens:
I hit either a memory usage limit or a server timeout for the execution time.
What I've tried
I've tried using sleep(3) during the for loop.
Setting Memory limit to -1 and timeout to 0. The unlimited memory doesn't make a difference and is too dangerous in a working environment.
Using gc_collect_cycles() during the for loop
Using echo memory_get_peak_usage(); -> I don't really understand how I could use this for debugging.
What I need help with:
Memory management in PHP
Having pauses in the script that will reset the PHP execution timer
Client Side Programming language
This is where I have absolutely no clue which way I should go or which programming language I should use for this.
What I want to achieve
Load a webpage that has a form
Load the .txt with all randomly generated strings
fill in the form with the first string
submit the form:
If positive response from form > save string in special .txt file or array, go to the next value
If negative response from form > delete string from file, go to the next value | or just go to the next value
All values with a positive response are filtered out and easily accessible at the end.
I don't know which programming language I should use for this function. I've been thinking about Javascript and Python but I'm not sure how I could combine that with PHP. A nudge in the right direction would be appreciated.
I might be completely wrong for trying to achieve this with PHP, if so, please let me know what would be the better and easier option.
Thanks!
Interesting question. First of all, whenever you think of a solution like this, one of the first things you need to consider is: can it be async? If the answer is yes, your implementation will likely be simple; otherwise, you will likely have to pay huge server costs or render random cached results.
NB remove gc_collect_cycles. It does the opposite of what you want, and you hardly ever need to call it manually.
That being said, the approach I would recommend in your case is as follows:
Use a websocket that is opened only once in the client browser, and then forward results in real time from the server to the browser. Of course, this code itself can run completely client-side via JavaScript, so if it's not just a PoC, you can convert the PHP code to JavaScript.
Change your code to yield items, or forward results via websocket, once a generated code has been confirmed as unique (see the sketch below).
However, if you're really only doing what the PHP code says, you can do it completely in JavaScript and save your server resources. See this answer for example code to replace your generateRandomString function.
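To illustrate the "yield items" idea, here is a minimal, non-authoritative sketch; it assumes the generateRandomString() function from the question and swaps the in_array() scan for an array-key lookup, which is what keeps the uniqueness check fast at millions of values:
function uniqueCodes($count, $length = 5) {
    $seen = array();
    while (count($seen) < $count) {
        $code = generateRandomString($length); // the function defined in the question
        if (!isset($seen[$code])) {
            $seen[$code] = true;
            yield $code; // hand back one code at a time instead of building a huge list first
        }
    }
}

foreach (uniqueCodes(1000000) as $i => $code) {
    echo ($i + 1) . ".m" . $code . "#[server]\n";
}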
Assuming you have the ability to edit the php.ini:
Increase your memory limit as described here: PHP MEMORY LIMIT INCREASE (that link covers the memory_limit setting).
For the 'timeout for the execution time', add:
set_time_limit(0);
at the top of the PHP file.
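For example, a rough sketch of both overrides done from the script itself (the value is a placeholder; raising the limit only masks the cost of the in_array() lookup):
// Illustrative values only
ini_set('memory_limit', '512M'); // or raise memory_limit in php.ini
set_time_limit(0);               // disable the execution time limit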
Have you tried using sets? https://www.php.net/manual/en/class.ds-set.php
Sets are very efficient whenever you want to ensure a value isn't present twice.
Checking the presence of a value in a set is way, way faster than looping across all entries of an array.
I'm not an expert with PHP, but it would look something like this in Ruby:
require 'set'

CHARS = '0123456789abcdefghijklmnopqrstuvwxyz-_.'.split('')
unique = Set.new

def generateRandomString(l = 10)
  Array.new(l) { CHARS.sample }.join
end

while unique.length < 1_000_000
  random_string = generateRandomString
  if !unique.include?(random_string)
    unique.add(random_string)
  end
end
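For reference, a rough PHP translation of the same idea, assuming the ds extension (Ds\Set from the link above) is installed; treat it as a sketch rather than a drop-in replacement:
$characters = '0123456789abcdefghijklmnopqrstuvwxyz-_.';
$unique = new \Ds\Set();
while ($unique->count() < 1000000) {
    $random = '';
    for ($i = 0; $i < 5; $i++) {
        $random .= $characters[rand(0, strlen($characters) - 1)];
    }
    if (!$unique->contains($random)) { // hash lookup, not a linear scan
        $unique->add($random);
    }
}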
hope it helps
Using fgetcsv, can I somehow do a destructive read, where rows I've read and processed are discarded, so that if I don't make it through the whole file in the first pass, I can come back and pick up where I left off before the script timed out?
Additional Details:
I'm getting a daily product feed from a vendor that comes across as a 200 MB .gz file. When I unpack the file, it turns into a 1.5 GB .csv with nearly 500,000 rows and 20-25 fields. I need to read this information into a MySQL db, ideally with PHP so I can schedule a cron job to run the script at my web hosting provider every day.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and a maximum memory utilization limit of 128 MB for any single script. These limits cannot be changed by me.
My idea was to grab the information from the .csv using the fgetcsv function, but since I'm expecting to need multiple passes at the file because of the 3-minute timeout, I was thinking it would be nice to whittle the file away as I process it, so I wouldn't have to spend cycles skipping over rows that were already processed in a previous pass.
From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.
Assuming you save how many rows you have already processed, you can skip rows like this:
$alreadyProcessed = 42; // for example

$i = 0;
while ($row = fgetcsv($fileHandle)) {
    if ($i++ < $alreadyProcessed) {
        continue;
    }
    ...
}
However, this means you're reading the entire 2 GB file from the beginning each time you go through it, which in itself already takes a while and you'll be able to process fewer and fewer rows each time you start again.
The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:
// Falls back to position 0 if the file doesn't exist yet.
$lastPosition = (int) @file_get_contents('last_position.txt');
$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);
while ($row = fgetcsv($fh)) {
    ...
    file_put_contents('last_position.txt', ftell($fh));
}
This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
You can avoid the timeout and memory errors to some extent by reading the file as a stream: read it line by line and insert each line into the database (or process it accordingly). That way only a single line is held in memory on each iteration. Please note, don't try to load a huge CSV file into an array; that really would consume a lot of memory.
if (($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    // Get the first row (header)
    $header = fgetcsv($handle);

    // Loop through the file line by line
    while (($data = fgetcsv($handle)) !== false)
    {
        // Process your data
        unset($data);
    }

    fclose($handle);
}
I think a better solution (it would be phenomenally inefficient to continuously rewind and rewrite an open file stream) would be to track the file position of each record read (using ftell) and store it with the data you've read; then, if you have to resume, just fseek to the last position.
You could try loading the file directly using MySQL's file-reading facility (which will likely be a lot faster), although I've had problems with this in the past and ended up writing my own PHP code.
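For reference, a hedged sketch of what the MySQL-side import might look like; the table name and file path are placeholders, and LOCAL INFILE has to be enabled on both client and server:
// Illustrative only: "products" and the file path are placeholders
$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);
$mysqli->real_connect('localhost', 'user', 'pass', 'shop');
$sql = "LOAD DATA LOCAL INFILE '/path/to/feed.csv'
        INTO TABLE products
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES";
$mysqli->query($sql);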
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
What have you tried?
The memory can be limited by means other than the php.ini file, but I can't imagine how anyone could actually prevent you from using a different execution time (even if ini_set is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php).
Unless you are trying to fit the entire data file into memory, there should be no issue with a memory limit of 128 MB.
I'm working on a game, written in PHP and that runs in a console. Think back to old MUDs and other text-based games, even some ASCII art!
Anyway, what I'm trying to do is have things happening while also accepting user input.
For instance, let's say it's a two player game and Player 1 is waiting for Player 2 to make a move. This is easily done by just listening for a message.
But what if Player 1 wants to change some options? What if they want to view details on aspects of the game state? What about conceding the game? There are many things a Player may want to do while waiting for their opponent to make a move.
Unfortunately the best I have right now is the fact that Ctrl+C completely kills the program. The other player is then left hanging, until the connection is dropped. Oh, and the game is completely lost.
I get user input with fgets(STDIN). But this blocks execution until input has been received (which is usually a good thing).
Is it even possible for a console program like this to handle input and output simultaneously? Or should I just look at some other interface?
In short, PHP is not built for this, but you might get some help from one of these extensions. I'm not sure how thorough they are, but you probably really want to use a text UI library. (And really, you probably do not want to use PHP for this.)
All that said, you need to get non-blocking input from STDIN character by character. Unfortunately, most terminals are buffered from PHP's point of view, so you won't get anything until Enter is pressed.
If you run stty -icanon (or your OS's equivalent) on your terminal to disable buffering, then the following short program basically works:
<?php
stream_set_blocking(STDIN, false);

$line = '';
$time = microtime(true);
$prompt = '> ';
echo $prompt;

while (true)
{
    if (microtime(true) - $time > 5)
    {
        echo "\nTick...\n$prompt$line";
        $time = microtime(true);
    }

    $c = fgetc(STDIN);
    if ($c !== false)
    {
        if ($c != "\n")
            $line .= $c;
        else
        {
            if ($line == 'exit' || $line == 'quit')
                break;
            else if ($line == 'help')
                echo "Type exit\n";
            else
                echo "Unrecognized command.\n";

            echo $prompt;
            $line = '';
        }
    }
}
(It relies on local echo being enabled to print the characters as they are typed.)
As you see, we are just looping around forever. If a character exists, add it to the $line. If enter is pressed, process $line. Meanwhile, we are ticking every five seconds just to show that we could be doing something else while we wait for input. (This will consume maximum CPU; you'd have to issue a sleep() to get around that.)
This isn't meant to be a practical example, per se, but perhaps will get you thinking in the proper direction.
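As a small, hedged illustration of the sleep() remark: dropping a short usleep() at the end of the loop body keeps CPU usage down without making the prompt feel sluggish.
// At the very end of the while (true) body:
usleep(50000); // sleep 50 ms per iteration; the value is arbitrary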
It is possible to build a game like you describe using ncurses (non-blocking mode) and libevent. That way, you get close to no CPU consumption. Handling individual keys is sometimes awkward (implement Backspace yourself, it's not fun at all - and did you know various OSes send different keycodes on Backspace press?), and gets really tricky if you want to support UTF-8 properly. Still, completely viable.
In particular, it is beneficial to make extensive use of libevent, by reading both the network and keyboard (stdin) input with it. This function enables you to listen for individual keys:
http://www.php.net/manual/en/function.ncurses-cbreak.php
which you can later read using the libevent API. The thing to keep in mind is that you will sometimes end up reading more than one key at a time, and that has to be handled (so loop over everything that you have read). Otherwise, the user will be annoyed to see that not all key presses are "reaching" the application and some are lost.
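As a rough, non-authoritative sketch of the same event-driven idea, here is the keyboard/network multiplexing done with the built-in stream_select() instead of libevent (the server address is a placeholder):
// Sketch only: watch stdin and a game-server socket at the same time.
$server = stream_socket_client('tcp://127.0.0.1:4000'); // placeholder address
stream_set_blocking(STDIN, false);
stream_set_blocking($server, false);
while (true) {
    $read = array(STDIN, $server);
    $write = null;
    $except = null;
    // Wait until either stream has data (1-second timeout for housekeeping).
    if (stream_select($read, $write, $except, 1) > 0) {
        foreach ($read as $stream) {
            $data = fread($stream, 1024);
            if ($stream === STDIN) {
                echo "keyboard: " . $data; // may contain several key presses at once
            } else {
                echo "network: " . $data;
            }
        }
    }
}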
Sorry Matthew, I'm going to have to un-accept your answer, because I have found the solution myself:
Use the following code to receive user input while still doing something else:
while (/* some condition that the code running is waiting on */) {
    // perform one step or iteration of that code
    exec("choice /N /C ___ /D _ /T _", $out, $ret);
    // /C is a list of letters that do something
    // /D is the default action that will be used as a no-op
    // /T is the amount of time to wait, probably best set to one second
    switch ($ret) {
        // handle cases - the "default" case should be "continue 2"
    }
}
This can then be used to interrupt the loop and enter an options menu, or trigger some other event, or could even be used to type out a command if used right.
I'm making a little benchmark class to display page load time and memory usage.
Load time is already working, but when I display the memory usage, it doesn't change.
Example:
$conns = array();
ob_start();
benchmark::start();
$conns[] = mysql_connect('localhost', 'root', '');
benchmark::stop();
ob_flush();
uses the same memory as
$conns = array();
ob_start();
benchmark::start();
for ($i = 0; $i < 1000; $i++)
{
    $conns[] = mysql_connect('localhost', 'root', '');
}
benchmark::stop();
ob_flush();
I'm using memory_get_usage(true) to get the memory usage in bytes.
memory_get_usage(true) will show the amount of memory allocated by the php engine, not actually used by the script. It's very possible that your test script hasn't required the engine to ask for more memory.
For a test, grab a large(ish) file and read it into memory. You should see a change then.
I've successfully used memory_get_usage(true) to track the memory usage of web crawling scripts, and it's worked fine (since the goal was to slow things down before hitting the system memory limit). The one thing to remember is that it doesn't change based on actual usage; it changes based on the memory requested by the engine. So what you end up seeing is sudden jumps instead of slow growth (or shrinking).
If you set the real_usage flag to false, you may be able to see very small memory changes; however, this won't help you monitor the true amount of memory PHP is requesting from the system.
(Update: To be clear the difference I describe is between memory used by the variables of your script, compared to the memory the engine requested to run your script. All the same script, different way of measuring.)
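A small sketch of the difference (exact numbers will vary by PHP version and platform; the point is only that the two figures move at different moments):
// real_usage = true  : memory the engine has requested from the system
// real_usage = false : memory actually in use by the script's variables
printf("before: real=%d used=%d\n", memory_get_usage(true), memory_get_usage(false));
$data = str_repeat('x', 5 * 1024 * 1024); // allocate roughly 5 MB
printf("after:  real=%d used=%d\n", memory_get_usage(true), memory_get_usage(false));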
I'm no Guru in PHP's internals, but I could imagine an echo does not affect the amount of memory used by PHP, as it just outputs something to the client.
It could be different if you enable output buffering.
The following should make a difference:
$result = null;
benchmark::start();
for ($i = 0; $i < 10000; $i++)
{
    $result .= 'test';
}
benchmark::stop();
Look at:
for ($i = 0; $i < 1000; $i++)
{
    $conns[] = mysql_connect('localhost', 'root', '');
}
You could have looped to 100,000 and nothing would have changed; it's the same connection. No resources are allocated for it because the linked list remembering them never grew. Why would it grow? There's already (presumably) a valid handle at $conns[0]. It won't make a difference in memory_get_usage(). You did test $conns[15] to see if it worked, yes?
Can root@localhost have multiple passwords? No. Why would PHP bother to open another connection just because you told it to? (tongue in cheek).
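For what it's worth, the legacy mysql extension did have a way to force separate connections: the fourth $new_link argument. A hedged sketch (mysql_* was removed in PHP 7, so this only applies to the old extension):
// With $new_link = true each call opens a distinct connection,
// so memory usage (and the number of server connections) actually grows.
for ($i = 0; $i < 1000; $i++) {
    $conns[] = mysql_connect('localhost', 'root', '', true);
}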
I suggest running the same thing via CLI through Valgrind to see the actual heap usage:
valgrind /usr/bin/php -f foo.php ... or something similar. At the bottom, you'll see what was allocated, what was freed, and garbage collection at work.
Disclaimer: I do know my way around PHP internals, but I am no expert in that deliberately obfuscated maze written in C that Zend calls PHP.
echo won't change the allocated number of bytes (unless you use output buffers).
the $i-variable is being unset after the for-loop, so it's not changing the number of allocated bytes either.
Try using an output buffering example:
ob_start();
benchmark::start();
for ($i = 0; $i < 10000; $i++)
{
    echo 'test';
}
benchmark::stop();
ob_flush();
What is the best way to write to files in a large PHP application? Let's say there are lots of writes needed per second. What is the best way to go about this?
Could I just open the file and append the data? Or should I open, lock, write, and unlock?
What will happen if the file is being worked on and other data needs to be written? Will that activity be lost, or will it be saved? And if it is saved, will it halt the application?
If you have been, thank you for reading!
Here's a simple example that highlights the danger of simultaneous writes:
<?php
for ($i = 0; $i < 100; $i++) {
    $pid = pcntl_fork();
    // only spawn more children if we're not a child ourselves
    if (!$pid)
        break;
}

$fh = fopen('test.txt', 'a');

// The following is a simple attempt to get multiple processes to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);

$myPid = posix_getpid();
// create a line starting with the pid, followed by 10,000 copies of
// a "random" char based on the pid.
$line = $myPid . str_repeat(chr(ord('A') + $myPid % 25), 10000) . "\n";

for ($i = 0; $i < 1; $i++) {
    fwrite($fh, $line);
}

fclose($fh);
echo "done\n";
If appends were safe, you should get a file with 100 lines, each of which is roughly 10,000 chars long and begins with an integer. And sometimes, when you run this script, that's exactly what you'll get. Sometimes, however, a few appends will conflict and the output will get mangled.
You can find corrupted lines with grep '^[^0-9]' test.txt
This is because file append is only atomic if:
You make a single fwrite() call
and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
and you write to a fully POSIX-compliant filesystem
If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.
Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
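If you do decide you want the locking, a minimal sketch of the usual pattern is a single call per line (the log path is a placeholder):
// FILE_APPEND appends instead of truncating; LOCK_EX serializes concurrent writers.
file_put_contents('/var/log/myapp.log', $line . "\n", FILE_APPEND | LOCK_EX);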
I do have a high-performance, multi-threaded application where all threads write (append) to a single log file. So far I have not noticed any problems with that; each thread writes multiple times per second and nothing gets lost. I think just appending to a huge file should be no issue. But if you want to modify already existing content, especially with concurrency, I would go with locking, otherwise a big mess can happen...
If concurrency is an issue, you should really be using databases.
If you're just writing logs, maybe you should take a look at the syslog function, since syslog provides an API.
You could also delegate writes to a dedicated backend and do the job asynchronously.
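A small sketch of the syslog route (the ident and facility are placeholders):
// Let the system logger deal with concurrent writers.
openlog('myapp', LOG_PID, LOG_LOCAL0); // 'myapp' is a placeholder ident
syslog(LOG_INFO, 'something worth logging');
closelog();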
These are my 2p.
Unless a single file is needed for a specific reason, I would avoid appending everything to one huge file. Instead, I would rotate the file by time and size; a couple of configuration parameters (wrap_time and wrap_size) could be defined for this.
Also, I would probably introduce some buffering to avoid waiting for the write operation to complete.
PHP is probably not the best-suited language for this kind of operation, but it could still be possible.
Use flock()
See this question
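A minimal sketch of what that typically looks like (the file name is a placeholder):
$fh = fopen('data.txt', 'a'); // placeholder file name
if (flock($fh, LOCK_EX)) {    // exclusive lock; other writers block until it is released
    fwrite($fh, $line);
    fflush($fh);              // flush PHP's buffer before releasing the lock
    flock($fh, LOCK_UN);
}
fclose($fh);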
If you just need to append data, PHP should be fine with that, as the filesystem should take care of simultaneous appends.