Does PHP copy variables when retrieving from shared memory?

If I run shm_get_var(), will it return a "reference", keeping the data in shared memory?
I want to keep an array of about 50MB in shared memory so that it can be used by multiple processes without having to keep multiple copies of this 50MB array hanging around. If shared memory isn't the answer, does anyone have another idea?

This is the relevant C code snippet from sysvshm.c in PHP 5.2.9:
/* setup string-variable and serialize */
/* get serialized variable from shared memory */
shm_varpos = php_check_shm_data((shm_list_ptr->ptr), key);

if (shm_varpos < 0) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "variable key %ld doesn't exist", key);
    RETURN_FALSE;
}

shm_var = (sysvshm_chunk*) ((char *)shm_list_ptr->ptr + shm_varpos);
shm_data = &shm_var->mem;

PHP_VAR_UNSERIALIZE_INIT(var_hash);
if (php_var_unserialize(&return_value, (const unsigned char **) &shm_data, shm_data + shm_var->length, &var_hash TSRMLS_CC) != 1) {
    PHP_VAR_UNSERIALIZE_DESTROY(var_hash);
    php_error_docref(NULL TSRMLS_CC, E_WARNING, "variable data in shared memory is corrupted");
    RETURN_FALSE;
}
PHP_VAR_UNSERIALIZE_DESTROY(var_hash);
PHP will have to unserialize the entire value every time you call shm_get_var(), which, on a 50MB array, is going to be really, really slow.
How about breaking it up into individual values?
Also you might want to consider using APC's variable cache, which will handle all of the shared memory and locking for you (and will also use a hash table for key lookups)
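If you go the APC route, usage looks roughly like this (a minimal sketch using apc_store()/apc_fetch(); the cache key and the build_big_array() helper are made up for the example, and with APCu the equivalent functions are apcu_store()/apcu_fetch()):
<?php
// Try the shared-memory user cache first; $success tells us whether the key existed.
$data = apc_fetch('my_big_array', $success);

if (!$success) {
    // Cache miss: do the expensive build once, then store it for other processes.
    $data = build_big_array();            // hypothetical expensive 50MB build step
    apc_store('my_big_array', $data, 0);  // ttl 0 = keep until the cache is cleared
}

// $data is now an ordinary PHP array in this process.
Keep in mind that every fetch still materializes a copy of the value in the calling process, so splitting the 50MB blob into smaller keys, as suggested above, is what really keeps the per-request cost down.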

I'm no expert on this, but would it be possible to write a quick test for this, something like the following?
$key = 1234;
// put something small into shared memory
$identifier = shm_attach($key, 1024, 0777);
shm_put_var($identifier, $key, 'shave and a hair cut');

$firstVar = shm_get_var($identifier, $key);
$firstVar .= 'Test String of Doom';
$secondVar = shm_get_var($identifier, $key);

if ($firstVar == $secondVar) {
    echo 'shm_get_var passes by reference';
} else {
    echo 'shm_get_var passes by value';
}

From the wording of the documentation:
shm_get_var() returns the variable with a given variable_key, in the given shared memory segment. The variable is still present in the shared memory.
I would say yes, it's a reference to the shared memory space.

You can use shm_remove(). Check this out: http://php.net/manual/en/function.shm-remove.php


Array Insert Time Jump

While digging into the hash and zval structures and how PHP arrays are built on them, I ran into some strange insert times.
Here is example:
$array = array();
$someValueToInsert = 100;
for ($i = 0; $i < 10000; ++$i) {
    $time = microtime(true);
    array_push($array, $someValueToInsert);
    echo $i . " : " . (int)((microtime(true) - $time) * 100000000) . "</br>";
}
So I found that every 1024th, 2048th, 4096th... element is inserted using much more time (>~10x).
It doesn't depend on whether I use array_push, array_unshift, or simply $array[] = $someValueToInsert.
I'm thinking it has to do with this in the HashTable structure:
typedef struct _hashtable {
...
uint nNumOfElements;
...
} HashTable;
nNumOfElements has a default max value, but that doesn't answer why it takes more time to insert at those particular counts (1024, 2048, ...).
Any thoughts ?
While I would suggest double checking my answer on the PHP internals list, I believe the answer lies in zend_hash_do_resize(). When more elements are needed in the hash table, this function is called and the extant hash table is doubled in size. Since the table starts life at 1024, this doubling explains the results you've observed. Code:
} else if (ht->nTableSize < HT_MAX_SIZE) { /* Let's double the table size */
    void *old_data = HT_GET_DATA_ADDR(ht);
    Bucket *old_buckets = ht->arData;

    HANDLE_BLOCK_INTERRUPTIONS();
    ht->nTableSize += ht->nTableSize;
    ht->nTableMask = -ht->nTableSize;
    HT_SET_DATA_ADDR(ht, pemalloc(HT_SIZE(ht), ht->u.flags & HASH_FLAG_PERSISTENT));
    memcpy(ht->arData, old_buckets, sizeof(Bucket) * ht->nNumUsed);
    pefree(old_data, ht->u.flags & HASH_FLAG_PERSISTENT);
    zend_hash_rehash(ht);
    HANDLE_UNBLOCK_INTERRUPTIONS();
}
I am uncertain whether the reallocation is the performance hit, or the rehashing, or the fact that the whole block is uninterruptible. Would be interesting to put a profiler on it. I think some might have already done that for PHP 7.
Side note, the Thread Safe version does things differently. I'm not overly familiar with that code, so there may be a different issue going on if you're using ZTS.
I think it is related to implementation of dynamic arrays.
See here "Geometric expansion and amortized cost" http://en.wikipedia.org/wiki/Dynamic_array
To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, **such as doubling in size**, and use the reserved space for future expansion
You can read about arrays in PHP here as well https://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
It is a standard practice for dynamic arrays. E.g. check here C++ dynamic array, increasing capacity
capacity = capacity * 2; // doubles the capacity of the array
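To see the growth cost from userland, here is a rough sketch of my own (not a rigorous benchmark; absolute numbers vary by PHP version) comparing a plain array, whose backing hash table doubles as described above, with a pre-sized SplFixedArray that is allocated once up front:
<?php
$n = 100000;

// Plain array: the backing hash table doubles at powers of two,
// which is where the insert-time spikes in the question come from.
$start = microtime(true);
$a = array();
for ($i = 0; $i < $n; $i++) {
    $a[] = 100;
}
$plain = microtime(true) - $start;

// SplFixedArray is allocated once at the requested size,
// so filling it never triggers a resize.
$start = microtime(true);
$f = new SplFixedArray($n);
for ($i = 0; $i < $n; $i++) {
    $f[$i] = 100;
}
$fixed = microtime(true) - $start;

printf("plain array: %.4fs, SplFixedArray: %.4fs\n", $plain, $fixed);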

In PHP, how can I detect that input vars were truncated due to max_input_vars being exceeded?

I know that an E_WARNING is generated by PHP
PHP Warning: Unknown: Input variables exceeded 1000
But how can I detect this in my script?
A "close enough" method would be to check if( count($_POST, COUNT_RECURSIVE) == ini_get("max_input_vars"))
This will cause a false positive if the number of POST vars happens to be exactly on the limit, but considering the default limit is 1000 it's unlikely to ever be a concern.
count($_POST, COUNT_RECURSIVE) is not accurate because it counts all nodes in the array tree whereas input_vars are only the terminal nodes. For example, $_POST['a']['b'] = 'c' has 1 input_var but using COUNT_RECURSIVE will return 3.
php://input cannot be used with enctype="multipart/form-data". http://php.net/manual/en/wrappers.php.php
Since this issue only arises with PHP >= 5.3.9, we can use anonymous functions. The following recursively counts the terminals in an array.
function count_terminals($a) {
    return is_array($a)
        ? array_reduce($a, function ($carry, $item) { return $carry + count_terminals($item); }, 0)
        : 1;
}
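A possible way to combine this with the limit check (just a sketch that reuses count_terminals() from above; like the COUNT_RECURSIVE approach, it can only say "maybe truncated" when the count exactly reaches the limit):
<?php
$received = count_terminals($_POST);
$limit    = (int) ini_get('max_input_vars');

if ($limit > 0 && $received >= $limit) {
    // Possibly truncated: the number of terminal values hit the configured limit.
    error_log("POST may have been truncated: $received vars received, limit is $limit");
}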
What works for me is this. Firstly, I put this at the top of my script/handler/front controller. This is where the error will be saved (or $e0 will be null, which is OK).
$e0 = error_get_last();
Then I run a bunch of other processing - bootstrapping my application, registering plugins, establishing sessions, checking database state, lots of things - that I can accomplish regardless of exceeding this condition. Then I check this $e0 state. If it's not null, we have an error, so I bail out (assume that App is a big class with lots of your magic in it):
if (null != $e0) {
    ob_end_clean(); // Purge the outputted Warning
    App::bail($e0); // Spew the warning in a friendly way
}
Tweak and tune error handlers for your own state.
Registering an error handler won't catch this condition because it exists before your error handler is registered.
Checking input var count to equal the maximum is not reliable.
The above $e0 will be an array, with type => 8, and line => 0; the message will explicitly mention input_vars so you could regex match to create a very narrow condition and ensure positive identification of the specific case.
Also note, according to the PHP specs this is a Warning not an Error.
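For example, the narrow-match idea might look something like this (a sketch only; the exact message wording can differ between PHP versions, so treat the pattern as an assumption to verify against your own logs):
<?php
// Run as early as possible, before anything else can overwrite the last error.
$e0 = error_get_last();

$truncated = $e0 !== null
    && $e0['line'] === 0
    && preg_match('/Input variables exceeded \d+/', $e0['message']) === 1;

if ($truncated) {
    // Bail out, log, or show a "form too large" page here.
}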
function checkMaxInputVars()
{
    $max_input_vars = ini_get('max_input_vars');
    // Value of the configuration option as a string, or an empty string for
    // null values, or FALSE if the configuration option doesn't exist
    if ($max_input_vars == FALSE)
        return FALSE;

    // '&' separators in the raw request body vs. what actually made it into $_POST
    $php_input = substr_count(file_get_contents('php://input'), '&');
    $post = count($_POST, COUNT_RECURSIVE);
    echo $php_input, ' ', $post, ' ', $max_input_vars; // debug output
    return $php_input > $post;
}
echo checkMaxInputVars() ? 'POST has been truncated.' : 'POST is not truncated.';
Call error_get_last() as soon as possible in your script (before you have a chance to cause errors, as they will obscure this one.) In my testing, the max_input_vars warning will be there if applicable.
Here is my test script with max_input_vars set to 100:
<?php
if (($error = error_get_last()) !== null) {
    echo 'got error:';
    var_dump($error);
    return;
}
unset($error);

if (isset($_POST['0'])) {
    echo 'Got ', count($_POST), ' vars';
    return;
}
?>
<form method="post">
<?php
for ($i = 0; $i < 200; $i++) {
    echo '<input name="', $i, '" value="foo" type="hidden">';
}
?>
<input type="submit">
</form>
Output when var limit is hit:
got error:
array
'type' => int 2
'message' => string 'Unknown: Input variables exceeded 100. To increase the limit change max_input_vars in php.ini.' (length=94)
'file' => string 'Unknown' (length=7)
'line' => int 0
Tested on Ubuntu with PHP 5.3.10 and Apache 2.2.22.
I would be hesitant to check explicitly for this error string, for stability (they could change it) and general PHP good practice. I prefer to turn all PHP errors into exceptions, like this (separate subclasses may be overkill, but I like this example because it allows @ error suppression). It would be a little different coming from error_get_last() but should be pretty easy to adapt.
I don't know if there are other pre-execution errors that could get caught by this method.
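The error-to-exception conversion mentioned above is usually done with set_error_handler() and ErrorException; a minimal sketch of that pattern follows. Note that, as pointed out earlier, it will not catch the max_input_vars warning itself, since that is raised before any handler is registered, which is why error_get_last() is still needed for this particular case.
<?php
set_error_handler(function ($severity, $message, $file, $line) {
    if (!(error_reporting() & $severity)) {
        // Error is suppressed (e.g. with @) or excluded from error_reporting.
        return false;
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});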
What about something like this:
$num_vars = count( explode( '###', http_build_query($array, '', '###') ) );
You can repeat it for $_POST, $_GET, $_COOKIE, whatever.
It still can't be considered 100% accurate, but I guess it gets pretty close.

issue with stream_select() in PHP

I am using stream_select(), but after a few seconds it starts returning 0 descriptors while there is still data to be read.
Another odd thing is that if I set the timeout to 0, then I always get zero descriptors back:
$num = stream_select($read, $w, $e, 0);
stream_select() must be used in a loop
The stream_select() function basically just polls the streams you provided in the first three arguments, which means it will wait until one of the following events occurs: either some data arrives, or the timeout (set with $tv_sec and $tv_usec) is reached without any data arriving.
So receiving 0 as a return value is perfectly normal; it means there was no new data in the current polling cycle.
I'd suggest putting the function in a loop, something like this:
$EOF = false;
do {
    $tmp = '';
    // Note: stream_select() modifies the arrays in place, so if you loop,
    // repopulate $read/$write/$excl from your full stream list each cycle.
    $ready = stream_select($read, $write, $excl, 0, 50000);
    if ($ready === false) {
        // something went wrong!!
        break;
    } elseif ($ready > 0) {
        foreach ($read as $r) {
            $tmp .= stream_get_contents($r);
            if (feof($r)) $EOF = true;
        }
        if (!empty($tmp)) {
            //
            // DO SOMETHING WITH DATA
            //
            continue;
        }
    } else {
        // No data in the current cycle
    }
} while (!$EOF);
Please note that in this example, the script totally ignores everything aside from the input stream. Also, the third section of the "if" statement is completely optional.
Does it return the number 0 or a FALSE boolean? FALSE means there was some error, but zero could be just because of a timeout or because nothing interesting has happened with the streams, and you should do a new select, etc.
I would guess this could happen with a zero timeout, as it will check and return immediately. Also, if you read the PHP manual about stream_select() you will see this warning about using a zero timeout:
Using a timeout value of 0 allows you to instantaneously poll the status of the streams, however, it is NOT a good idea to use a 0 timeout value in a loop as it will cause your script to consume too much CPU time.
If this is a TCP stream and you want to check for connection close, you should check the return value from fread() etc. to determine if the other peer has closed the connection. About the read streams array argument:
The streams listed in the read array will be watched to see if characters become available for reading (more precisely, to see if a read will not block - in particular, a stream resource is also ready on end-of-file, in which case an fread() will return a zero length string).
http://www.php.net/stream_select
Due to a limitation in the current Zend Engine it is not possible to pass a constant modifier like NULL directly as a parameter to a function which expects this parameter to be passed by reference. Instead use a temporary variable or an expression with the leftmost member being a temporary variable:
<?php $e = NULL; stream_select($r, $w, $e, 0); ?>
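Picking up the earlier point about checking fread(): after stream_select() reports a TCP stream as readable, you still have to look at what fread() actually returns to spot a closed connection. A sketch, assuming $read has already been through stream_select():
<?php
foreach ($read as $stream) {
    $chunk = fread($stream, 8192);
    if ($chunk === '' || $chunk === false) {
        // Readable but nothing to read: the peer closed the connection (EOF).
        fclose($stream);
        // ...and remove $stream from the master list you pass to stream_select().
    } else {
        // process $chunk
    }
}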
I have a similar issue which is caused by the underlying socket timeout.
Eg. I create some streams
$streams = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
Then fork, and use a block such as the following
stream_set_blocking($pipes[1], 1);
stream_set_blocking($pipes[2], 1);
$pipesToRead = array($pipes[1], $pipes[2]);

while (!feof($pipesToRead[0]) || !feof($pipesToRead[1])) {
    $reads   = $pipesToRead;
    $writes  = null;
    $excepts = $pipesToRead;
    $tSec    = null;
    stream_select($reads, $writes, $excepts, $tSec);

    // while it's generating any kind of output, duplicate it wherever it
    // needs to go
    foreach ($reads as &$read) {
        $chunk = fread($read, 8192);
        foreach ($streams as &$stream)
            fwrite($stream, $chunk);
    }
}
Glossing over what other things might be wrong there, my $tSec argument to stream_select is ignored, and the "stream" will time out after 60 seconds of inactivity and produce an EOF.
If I add the following after creating the streams
stream_set_timeout($streams[0], 999);
stream_set_timeout($streams[1], 999);
Then I get the result I desire, even if there's no activity on the underlying stream for longer than 60 seconds.
I feel that this might be a bug, because I don't want that EOF after 60 seconds of inactivity on the underlying stream, and I don't want to plug in some arbitrarily large value to avoid hitting the timeout if my processes are idle for some time.
In addition, even if the 60 second timeout remains, I think it should just time out on my stream_select() call and my loop should be able to continue.

How do you debug php "Out of Memory" issues?

I've had some issues lately with PHP memory limits:
Out of memory (allocated 22544384) (tried to allocate 232 bytes)
These are quite the nuisance to debug since I'm not left with a lot of info about what caused the issue.
Adding a shutdown function has helped
register_shutdown_function('shutdown');
then, using error_get_last(), I can obtain information about the last error (in this case, the "Out of memory" fatal error), such as the line number and the PHP file name.
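For reference, that shutdown hook looks roughly like this (a minimal sketch; where you send the output is up to you):
<?php
register_shutdown_function(function () {
    $error = error_get_last();
    if ($error !== null && $error['type'] === E_ERROR) {
        // Fatal errors, including the out-of-memory case, land here;
        // at least the file and line number survive.
        error_log(sprintf(
            'Fatal: %s in %s on line %d (peak memory: %d bytes)',
            $error['message'],
            $error['file'],
            $error['line'],
            memory_get_peak_usage(true)
        ));
    }
});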
This is nice and all, but my php program is heavily object oriented. An error deep in the stack doesn't tell me much about the control structure or the execution stack at the moment of the error. I've tried debug_backtrace(), but that just shows me the stack during shutdown, not the stack at the time of the error.
I know I can just raise the memory limit using ini_set or modifying php.ini, but that doesn't get me any closer to actually figuring out what is consuming so much memory or what my execution flow looks like during the error.
Anyone have a good methodology for debugging memory errors in advanced Object Oriented PHP programs?
echo '<pre>';
$vars = get_defined_vars();
foreach ($vars as $name => $var)
{
    echo '<strong>' . $name . '</strong>: ' . strlen(serialize($var)) . '<br />';
}
exit();

/* ... Code that triggers memory error ... */
I use this to print out a list of currently assigned variables just before a problem section of my code, along with a (very) rough estimate of the size of the variable. I go back and unset anything that isn't needed at and beyond the point of interest.
It's useful when installing an extension isn't an option.
You could modify the above code to use memory_get_usage in a way that will give you a different estimate of the memory in a variable, not sure whether it'd be better or worse.
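One way to do that (a rough sketch; the copy-then-measure trick over-counts a little and won't work for resources or closures, so treat the numbers as ballpark figures):
<?php
// Approximate a variable's footprint by measuring usage before and after
// forcing an unshared copy of it.
function approx_var_size($var) {
    $before = memory_get_usage();
    $copy = unserialize(serialize($var)); // defeat copy-on-write
    $after = memory_get_usage();
    unset($copy);
    return $after - $before;
}

foreach (get_defined_vars() as $name => $var) {
    echo $name, ': ~', approx_var_size($var), " bytes\n";
}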
Memprof is a PHP extension that helps find those memory-eating snippets, especially in object-oriented code.
This adapted tutorial is quite useful.
Note: I unsuccessfully tried to compile this extension for Windows. If you try, be sure your PHP is not thread safe. To avoid some headaches I suggest using it under *nix environments.
Another interesting link is a slideshare describing how PHP handles memory. It gives you some clues about your script's memory usage.
I wonder if perhaps your thinking regarding methodology is flawed here.
The basic answer to your question - how do I find out where this error is occurring? - has already been answered; you know what's causing that.
However, this is one of those cases where the triggering error isn't really the problem; certainly, that 232-byte allocation isn't your problem at all. It is the 20+ MB that were allocated before it.
There have been some ideas posted which can help you track that down; you really need to look "higher level" here, at the application architecture, and not just at individual functions.
It may be that your application requires more memory to do what it does, with the user load you have. Or it may be that there are some real memory hogs that are unnecessary - but you have to know what is necessary or not to answer that question.
That basically means going line-by-line, object-by-object, profiling as needed, until you find what you seek; big memory users. Note that there might not be one or two big items... if only it were so easy! Once you find the memory-hogs, you then have to figure out if they can be optimized. If not, then you need more memory.
Check the documentation of the function memory_get_usage() to view the memory usage in run time.
Website "IF !1 0" provides a simple to use MemoryUsageInformation class. It is very useful for debugging memory leaks.
<?php
class MemoryUsageInformation
{
private $real_usage;
private $statistics = array();
// Memory Usage Information constructor
public function __construct($real_usage = false)
{
$this->real_usage = $real_usage;
}
// Returns current memory usage with or without styling
public function getCurrentMemoryUsage($with_style = true)
{
$mem = memory_get_usage($this->real_usage);
return ($with_style) ? $this->byteFormat($mem) : $mem;
}
// Returns peak of memory usage
public function getPeakMemoryUsage($with_style = true)
{
$mem = memory_get_peak_usage($this->real_usage);
return ($with_style) ? $this->byteFormat($mem) : $mem;
}
// Set memory usage with info
public function setMemoryUsage($info = '')
{
$this->statistics[] = array('time' => time(),
'info' => $info,
'memory_usage' => $this->getCurrentMemoryUsage());
}
// Print all memory usage info and peak memory usage
public function printMemoryUsageInformation()
{
foreach ($this->statistics as $statistic)
{
echo "Time: " . $statistic['time'] .
" | Memory Usage: " . $statistic['memory_usage'] .
" | Info: " . $statistic['info'];
echo "\n";
}
echo "\n\n";
echo "Peak of memory usage: " . $this->getPeakMemoryUsage();
echo "\n\n";
}
// Set start with default info or some custom info
public function setStart($info = 'Initial Memory Usage')
{
$this->setMemoryUsage($info);
}
// Set end with default info or some custom info
public function setEnd($info = 'Memory Usage at the End')
{
$this->setMemoryUsage($info);
}
// Byte formatting
private function byteFormat($bytes, $unit = "", $decimals = 2)
{
$units = array('B' => 0, 'KB' => 1, 'MB' => 2, 'GB' => 3, 'TB' => 4,
'PB' => 5, 'EB' => 6, 'ZB' => 7, 'YB' => 8);
$value = 0;
if ($bytes > 0)
{
// Generate automatic prefix by bytes
// If wrong prefix given
if (!array_key_exists($unit, $units))
{
$pow = floor(log($bytes) / log(1024));
$unit = array_search($pow, $units);
}
// Calculate byte value by prefix
$value = ($bytes / pow(1024, floor($units[$unit])));
}
// If decimals is not numeric or decimals is less than 0
// then set default value
if (!is_numeric($decimals) || $decimals < 0)
{
$decimals = 2;
}
// Format output
return sprintf('%.' . $decimals . 'f ' . $unit, $value);
}
}
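Typical usage of the class above would be something like this (my own sketch; $bigArray just stands in for whatever your script builds):
<?php
$mem = new MemoryUsageInformation();
$mem->setStart();

$bigArray = range(1, 100000);               // some memory-hungry work
$mem->setMemoryUsage('after building array');

unset($bigArray);
$mem->setMemoryUsage('after unset');

$mem->setEnd();
$mem->printMemoryUsageInformation();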
Use xdebug to profile memory usage.

PHP extension for Linux: reality check needed!

Okay, I've written my first functional PHP extension. It worked but it was a proof-of-concept only. Now I'm writing another one which actually does what the boss wants.
What I'd like to know, from all you PHP-heads out there, is whether this code makes sense. Have I got a good grasp of things like emalloc and the like, or is there stuff there that's going to turn around later and try to bite my hand off?
Below is the code for one of the functions. It returns a base64 of a string that has also been Blowfish encrypted. When the function is called, it is supplied with two strings, the text to encrypt and encode, and the key for the encryption phase. It's not using PHP's own base64 functions because, at this point, I don't know how to link to them. And it's not using PHP's own mcrypt functions for the same reason. Instead, it links in the SSLeay BF_ecb_encrypt functions.
PHP_FUNCTION(Blowfish_Base64_encode)
{
    char *psData = NULL;
    char *psKey = NULL;
    int argc = ZEND_NUM_ARGS();
    int psData_len;
    int psKey_len;
    char *Buffer = NULL;
    char *pBuffer = NULL;
    char *Encoded = NULL;
    BF_KEY Context;
    int i = 0;
    unsigned char Block[ 8 ];
    unsigned char * pBlock = Block;
    char *plaintext;
    int plaintext_len;
    int cipher_len = 0;

    if (zend_parse_parameters(argc TSRMLS_CC, "ss", &psData, &psData_len, &psKey, &psKey_len) == FAILURE)
        return;

    Buffer = (char *) emalloc( psData_len * 2 );
    pBuffer = Buffer;
    Encoded = (char *) emalloc( psData_len * 4 );
    BF_set_key( &Context, psKey_len, psKey );

    plaintext = psData;
    plaintext_len = psData_len;

    for (;;)
    {
        if (plaintext_len--)
        {
            Block[ i++ ] = *plaintext++;
            if (i == 8 )
            {
                BF_ecb_encrypt( Block, pBuffer, &Context, BF_ENCRYPT );
                pBuffer += 8;
                cipher_len += 8;
                memset( Block, 0, 8 );
                i = 0;
            }
        } else {
            BF_ecb_encrypt( Block, pBuffer, &Context, BF_ENCRYPT );
            cipher_len += 8;
            break;
        }
    }
    b64_encode( Encoded, Buffer, cipher_len );
    RETURN_STRINGL( Encoded, strlen( Encoded ), 0 );
}
You'll notice that I have two emalloc calls, for Encoded and for Buffer. Only Encoded is passed back to the caller, so I'm concerned that Buffer won't be freed. Is that the case? Should I use malloc/free for Buffer?
If there are any other glaring errors, I'd really appreciate knowing.
emalloc() allocates memory per request, and it's freed automatically when the request ends.
You should, however, compile PHP with
--enable-debug --enable-maintainer-zts
It will tell you if anything goes wrong (it can detect memory leaks if you've used the e*() functions and report_memleaks is set in your php.ini).
And yes, you should efree() Buffer.
You'll notice that I have two emalloc calls, for Encoded and for Buffer. Only Encoded is passed back to the caller, so I'm concerned that Buffer won't be freed. Is that the case? Should I use malloc/free for Buffer?
Yes, you should free it with efree before returning.
Although PHP has a safety net and memory allocated with emalloc will be freed at the end of the request, it's still a bug to leak memory, and you will be warned if running a debug build with report_memleaks = On.
