I want to send a UDP message over a network in PHP.
The message follows a predefined protocol: say it must be 10 bytes long, where the first 3 bytes are for entry 1, the next 2 bytes for entry 2, the next 3 bytes for entry 3, and the last 2 bytes for entry 4.
How can I do this in PHP?
In C we could use memcpy.
Just create a string of the required bytes. A character of a string is a byte in PHP.
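A minimal sketch of that idea, assuming each entry is already available as a byte string. The field values, host, and port are made up for illustration, and buildMessage/sendUdp are hypothetical helper names. pack() with the 'a' format writes a fixed number of bytes per field, which is roughly what a series of memcpy calls into a 10-byte buffer would do in C.

```php
<?php
// Layout from the question: 3 + 2 + 3 + 2 = 10 bytes.
function buildMessage(string $entry1, string $entry2, string $entry3, string $entry4): string
{
    // 'aN' writes exactly N bytes: shorter input is NUL-padded, longer input is truncated.
    return pack('a3a2a3a2', $entry1, $entry2, $entry3, $entry4);
}

// Sends one datagram. Host and port are assumptions, not part of the question.
function sendUdp(string $host, int $port, string $message): bool
{
    $socket = stream_socket_client("udp://$host:$port", $errno, $errstr);
    if ($socket === false) {
        return false;
    }
    $written = fwrite($socket, $message);
    fclose($socket);
    return $written === strlen($message);
}

// Hypothetical field values; 'XY' is one byte short and gets padded.
$message = buildMessage('ABC', "\x01\x02", 'XY', "\x00\x2A");
echo strlen($message), "\n";   // 10
echo bin2hex($message), "\n";  // 4142430102585900002a
// sendUdp('127.0.0.1', 9999, $message);
```

For numeric fields, pack() format codes such as 'n' (16-bit big-endian) may be a better fit than raw byte strings.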
I really wish there was a fast way to do this in PHP! All the PHP string functions
return a string rather than allowing you to copy directly into an existing string.
The latter is extremely useful from a memory perspective if you have megabyte-sized
arrays. The following code is a PHP variant of the kind of thing I would like.
It would be nice if you could do this kind of manipulation directly in PHP as a native
function call rather than having to write your own extension. In the code below,
$source is a short string we would like to copy over part of the data in the
much longer string $destination. We pass the latter by reference so we don't
waste memory on the function call.
function charCopy($source, &$destination, $start, $length)
{
    $endk = intval($length - 1);
    $end = intval($start + $endk);
    $start = intval($start);
    for ($j = $end, $k = $endk; $j >= $start; $j--, $k--) {
        $destination[$j] = $source[$k];
    }
}
This is close:
function &memcpy(&$dest, $src, $n) {
    $dest = substr($src, 0, $n) . substr($dest, $n);
    return $dest;
}
Note that unlike C, PHP manages memory and reallocates for you, so you can't overflow. I don't think you can make an exact memcpy clone in userland, though you could throw an exception or call trigger_error() if $n > strlen($src) or $n > strlen($dest). The code above will automatically resize $dest for you if needed, unlike C's memcpy.
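The bounds checks mentioned above could look something like this. It is only a sketch: memcpyChecked is a hypothetical name, and throwing LengthException is one choice among several. It uses substr_replace(), which still builds a new string rather than writing in place.

```php
<?php
// Hypothetical helper: overwrite the first $n bytes of $dest with the first
// $n bytes of $src, refusing to read or write past the end of either string.
function memcpyChecked(string &$dest, string $src, int $n): void
{
    if ($n < 0 || $n > strlen($src) || $n > strlen($dest)) {
        throw new LengthException("cannot copy $n bytes");
    }
    // substr_replace() returns a new string; assigning it back updates the caller's variable.
    $dest = substr_replace($dest, substr($src, 0, $n), 0, $n);
}

$buffer = '0123456789';
memcpyChecked($buffer, 'abcdef', 3);
echo $buffer, "\n"; // abc3456789
```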
In PHP, function parameters can be passed by reference by prepending an ampersand to the parameter in the function declaration, like so:
function foo(&$bar)
{
// ...
}
Now, I am aware that this is not designed to improve performance, but to allow functions to change variables that are normally out of their scope.
Instead, PHP seems to use Copy On Write to avoid copying objects (and maybe also arrays) until they are changed. So, for functions that do not change their parameters, the effect should be the same as if you had passed them by reference.
However, I was wondering whether the Copy On Write logic is perhaps short-circuited on pass-by-reference, and whether that has any performance impact.
ETA: To be sure, I assume that it's not faster, and I am well aware that this is not what references are for. So I think my own guesses are quite good; I'm just looking for an answer from someone who really knows what's definitely happening under the hood. In five years of PHP development, I've always found it hard to get quality information on PHP internals short of reading the source.
In a test with 100 000 iterations of calling a function with a string of 20 kB, the results are:
Function that just reads / uses the parameter
pass by value: 0.12065005 seconds
pass by reference: 1.52171397 seconds
Function to write / change the parameter
pass by value: 1.52223396 seconds
pass by reference: 1.52388787 seconds
Conclusions
Passing the parameter by value is always faster.
If the function changes the value of the variable passed in, passing by reference is, for practical purposes, the same as passing by value.
The Zend Engine uses copy-on-write, and when you use a reference yourself, it incurs a little extra overhead. I can only find this one mention at the time of writing, though; comments in the manual contain other links.
(EDIT) The manual page on Objects and references contains a little more info on how object variables differ from references.
I ran some tests on this because I was unsure of the answers given.
My results show that passing large arrays or strings by reference IS significantly faster.
Here are my results:
The Y axis (Runs) is how many times a function could be called in 1 second * 10.
The test was repeated 8 times for each function/variable.
And here are the variables I used:
$large_array = array_fill(PHP_INT_MAX / 2, 1000, 'a');
$small_array = array('this', 'is', 'a', 'small', 'array');
$large_object = (object)$large_array;
$large_string = str_repeat('a', 100000);
$small_string = 'this is a small string';
$value = PHP_INT_MAX / 2;
These are the functions:
function pass_by_ref(&$var) {
}
function pass_by_val($var) {
}
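A rough harness for timing those two empty functions might look like this. It assumes PHP 7.3+ for hrtime(); the call count is arbitrary, and the numbers will vary with PHP version and machine, so treat it as a starting point rather than a definitive benchmark.

```php
<?php
function pass_by_ref(&$var) {
}
function pass_by_val($var) {
}

$large_string = str_repeat('a', 100000);
$calls = 100000; // arbitrary; raise it for more stable timings

// hrtime(true) returns a monotonic timestamp in nanoseconds.
$t0 = hrtime(true);
for ($i = 0; $i < $calls; $i++) {
    pass_by_val($large_string);
}
$byValueNs = hrtime(true) - $t0;

$t0 = hrtime(true);
for ($i = 0; $i < $calls; $i++) {
    pass_by_ref($large_string);
}
$byRefNs = hrtime(true) - $t0;

printf("by value:     %.2f ms\n", $byValueNs / 1e6);
printf("by reference: %.2f ms\n", $byRefNs / 1e6);
```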
I experimented with passing a 10 kB string to two otherwise identical functions, one taking its argument by value and the other by reference. They were ordinary functions: take the argument, do some simple processing, and return a value. I made 100,000 calls to each and concluded that references are not designed to increase performance: the gain from passing by reference was about 4-5%, and it grows only when the string becomes large enough (100 kB and longer gave a 6-7% improvement). So my conclusion is: do not use references to increase performance; that is not what they are for.
I used PHP Version 5.3.1
I'm pretty sure that no, it's not faster.
Additionally, it says specifically in the manual not to try using references to increase performance.
Edit: Can't find where it says that, but it's there!
I tried to benchmark this with a real-world example based on a project I was working on. As always, the differences are trivial, but the results were somewhat unexpected. For most of the benchmarks I've seen, the called function doesn't actually change the value passed in. I performed a simple str_replace() on it.
**Pass by Value Test Code:**
$originalString=''; // 1000 pseudo-random digits
function replace($string) {
return str_replace('1', 'x',$string);
}
$output = '';
/* set start time */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tstart = $mtime;
set_time_limit(0);
for ($i = 0; $i < 10; $i++ ) {
for ($j = 0; $j < 1000000; $j++) {
$string = $originalString;
$string = replace($string);
}
}
/* report how long it took */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tend = $mtime;
$totalTime = ($tend - $tstart);
$totalTime = sprintf("%2.4f s", $totalTime);
$output .= "\n" . 'Total Time' .
': ' . $totalTime;
$output .= "\n" . $string;
echo $output;
Pass by Reference Test Code
The same except for
function replace(&$string) {
$string = str_replace('1', 'x',$string);
}
/* ... */
replace($string);
Results in seconds (10 million iterations):
PHP 5
Value: 14.1007
Reference: 11.5564
PHP 7
Value: 3.0799
Reference: 2.9489
The difference is a fraction of a millisecond per function call, but for this use case, passing by reference is faster in both PHP 5 and PHP 7.
(Note: the PHP 7 tests were performed on a faster machine -- PHP 7 is faster, but probably not that much faster.)
There is nothing better than a piece of test code:
<?PHP
$r = array();
for($i=0; $i<500;$i++){
$r[]=5;
}
function a($r){
$r[0]=1;
}
function b(&$r){
$r[0]=1;
}
$start = microtime(true);
for($i=0;$i<9999;$i++){
//a($r);
b($r);
}
$end = microtime(true);
echo $end-$start;
?>
Final result! The bigger the array (or the greater the number of calls), the bigger the difference. So in this case, calling by reference is faster because the value is changed inside the function.
Otherwise there is no real difference between "by reference" and "by value"; the compiler is smart enough not to create a new copy each time if there is no need.
It's simple; there is no need to test anything.
It depends on the use case.
Passing by value will ALWAYS BE FASTER than passing by reference for a small number of arguments. This depends on how many values the architecture's calling convention (ABI) allows to be passed in registers.
For example, x64 allows 4 values of 64 bits each to be passed in registers.
https://en.wikipedia.org/wiki/X86_calling_conventions
This is because you don't have to dereference a pointer; you just use the value directly.
If the data to be passed is bigger than the ABI allows in registers, the rest of the values go on the stack.
In that case, an array or an object (an instance of a class, or a structure plus headers) will ALWAYS BE FASTER BY REFERENCE.
This is because a reference is just a pointer to your data (not the data itself) with a fixed size, say 32 or 64 bits depending on the machine. That pointer fits in one CPU register.
PHP is written in C, so I'd expect it to behave the same.
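There is no need to add the & operator when passing objects. In PHP 5+ a function receives a handle to the same object, so it can modify the caller's object anyway (strictly speaking the handle itself is passed by value, as the manual page on Objects and references explains).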
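A small sketch of what that means in practice. The Counter class and function names are made up for illustration. Mutating the object works without &, but rebinding the caller's variable to a different object still needs it.

```php
<?php
class Counter
{
    public $count = 0;
}

// No & needed: the function gets a copy of the handle, which points at the same object.
function increment(Counter $c): void
{
    $c->count++;
}

// Reassigning the parameter only rebinds the local copy of the handle.
function replace(Counter $c): void
{
    $c = new Counter();
    $c->count = 99;
}

// With &, the caller's variable itself is rebound to the new object.
function replaceByRef(Counter &$c): void
{
    $c = new Counter();
    $c->count = 99;
}

$counter = new Counter();
increment($counter);
$afterIncrement = $counter->count;     // 1
replace($counter);
$afterReplace = $counter->count;       // still 1
replaceByRef($counter);
$afterReplaceByRef = $counter->count;  // 99
echo "$afterIncrement $afterReplace $afterReplaceByRef\n";
```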
In my PHP script I need to create an array of >600k integers. Unfortunately my web server's memory_limit is set to 32M, so when initializing the array the script aborts with the message
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes) in /home/www/myaccount/html/mem_test.php on line 8
I am aware that PHP does not store the array values as plain integers, but rather as zvals, which are much bigger than the plain integer value (8 bytes on my 64-bit system). I wrote a small script to estimate how much memory each array entry uses, and it turns out to be almost exactly 128 bytes. 128!!! I'd need >73M just to store the array. Unfortunately the web server is not under my control, so I cannot increase the memory_limit.
My question is: is there any way in PHP to create an array-like structure that uses less memory? I don't need this structure to be associative (plain index access is sufficient). It also does not need dynamic resizing; I know exactly how big the array will be. Also, all elements would be of the same type. Just like a good old C array.
Edit:
So deceze's solution works out-of-the-box with 32-bit integers. But even if you're on a 64-bit system, pack() does not seem to support 64-bit integers. In order to use 64-bit integers in my array I applied some bit-manipulation. Perhaps the below snippets will be of help for someone:
function push_back(&$storage, $value)
{
    // split the 64-bit value into two 32-bit chunks, then pass these to pack().
    $storage .= pack('ll', ($value >> 32), $value);
}
function get(&$storage, $idx)
{
    // read two 32-bit chunks from $storage and glue them back together.
    // the low half is unpacked as unsigned ('L') so its top bit does not
    // sign-extend into the high half.
    return (current(unpack('l', substr($storage, $idx * 8, 4))) << 32 |
        current(unpack('L', substr($storage, $idx * 8 + 4, 4))));
}
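On 64-bit builds of PHP 5.6.3 and later, pack() also accepts the 'q' format code (signed 64-bit, machine byte order), so splitting into two halves may not be necessary there. A sketch under that assumption; pushBack64 and get64 are hypothetical names:

```php
<?php
// Assumes a 64-bit PHP build, where the 'q' format code is available.
function pushBack64(string &$storage, int $value): void
{
    $storage .= pack('q', $value);
}

function get64(string $storage, int $idx): int
{
    // Each entry occupies 8 bytes; unpack() returns a 1-indexed array.
    return unpack('q', substr($storage, $idx * 8, 8))[1];
}

$storage = '';
foreach ([0, -1, 2147483648, PHP_INT_MAX, PHP_INT_MIN] as $value) {
    pushBack64($storage, $value);
}
echo strlen($storage), "\n";   // 40
echo get64($storage, 2), "\n"; // 2147483648
```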
The most memory efficient you'll get is probably by storing everything in a string, packed in binary, and use manual indexing to it.
$storage = '';
$storage .= pack('l', 42);
// ...
// get 10th entry
$int = current(unpack('l', substr($storage, 9 * 4, 4)));
This can be feasible if the "array" initialisation can be done in one fell swoop and you're just reading from the structure. If you need a lot of appending to the string, this becomes extremely inefficient. Even this can be done using a resource handle though:
$storage = fopen('php://memory', 'r+');
fwrite($storage, pack('l', 42));
...
This is very efficient. You can then read this buffer back into a variable and use it as string, or you can continue to work with the resource and fseek.
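Both ways of reading the buffer back could be sketched as follows. The values written and the readEntry helper are made up for illustration.

```php
<?php
$storage = fopen('php://memory', 'r+');
for ($i = 0; $i < 100; $i++) {
    fwrite($storage, pack('l', $i * 3)); // 4 bytes per entry
}

// Random access on the resource: seek to the entry and decode 4 bytes.
function readEntry($storage, int $index): int
{
    fseek($storage, $index * 4);
    return unpack('l', fread($storage, 4))[1];
}

$tenth = readEntry($storage, 9);
echo $tenth, "\n"; // 27

// Or read the whole buffer back into a string and index it with substr().
rewind($storage);
$asString = stream_get_contents($storage);
echo strlen($asString), "\n"; // 400
fclose($storage);
```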
A PHP Judy Array will use significantly less memory than a standard PHP array or an SplFixedArray.
I quote "An array with 1 million entries using regular PHP array data structure takes 200MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation."
You could use an object if possible. These often use less memory than arrays.
Also, SplFixedArray is a good option.
But it really depends on what you need to implement. If you need a function to return an array and are using PHP 5.5, you could use a generator (yield) to stream the values back.
You can try an SplFixedArray; it's faster and takes less memory (the doc comment says ~30% less). Test here and here.
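A quick way to try SplFixedArray for the 600k case is sketched below. The memory figure it prints depends heavily on the PHP version, so no particular number is claimed here.

```php
<?php
$size = 600000;

$before = memory_get_usage();
$fixed = new SplFixedArray($size); // size is fixed up front, like a C array
for ($i = 0; $i < $size; $i++) {
    $fixed[$i] = $i;
}
$usedBytes = memory_get_usage() - $before;

printf("SplFixedArray with %d ints: %.1f MB\n", $size, $usedBytes / 1048576);
echo $fixed[599999], "\n"; // 599999
```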
Use a string - that's what I'd do. Store the numbers in a string at fixed offsets (16 or 20 digits should do it, I guess?) and use substr to get the one needed. Blazing fast write / read, super easy, and 600,000 integers will only take ~12M to store.
base_convert() - if you need something more compact but with minimum effort, convert your integers to base-36 instead of base-10; in this case, a 14-digit number would be stored in 9 alphanumeric characters. You'll need to make 2 pieces of 64-bit ints, but I'm sure that's not a problem. (I'd split them to 9-digit chunks where conversion gives you a 6-char version.)
pack()/unpack() - binary packing is the same thing with a bit more efficiency. Use it if nothing else works; split your numbers to make them fit to two 32-bit pieces.
600K is a lot of elements. If you are open to alternative methods, I personally would use a database for that. Then use standard sql/nosql select syntax to pull things out. Perhaps memcache or redis if you have an easy host for that, such as garantiadata.com. Maybe APC.
Depending on how you generate the integers, you could potentially use PHP's generators, assuming you are traversing the array and doing something with the individual values.
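A sketch of the generator approach, assuming PHP 5.5+ (the type declarations need PHP 7). The generateIntegers name and the squares it produces are placeholders for however the real values are computed. Only one value exists in memory at a time, but there is no random access.

```php
<?php
// Hypothetical source of values: squares of 0 .. $count - 1.
function generateIntegers(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield $i * $i;
    }
}

// Values are produced one at a time as the loop asks for them.
$sum = 0;
foreach (generateIntegers(600000) as $value) {
    $sum += $value;
}
echo $sum, "\n";
```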
I took the answer by @deceze and wrapped it in a class that can handle 32-bit integers. It is append-only, but you can still use it as a simple, memory-optimized PHP array, queue, or heap. AppendItem and ItemAt are both O(1), and it has no memory overhead. I added currentPosition/currentSize to avoid unnecessary fseek function calls. If you need to cap memory usage and switch to a temporary file automatically, use php://temp instead.
class MemoryOptimizedArray
{
private $_storage;
private $_currentPosition;
private $_currentSize;
const BYTES_PER_ENTRY = 4;
function __construct()
{
$this->_storage = fopen('php://memory', 'r+');
$this->_currentPosition = 0;
$this->_currentSize = 0;
}
function __destruct()
{
fclose($this->_storage);
}
function AppendItem($value)
{
if($this->_currentPosition != $this->_currentSize)
{
fseek($this->_storage, 0, SEEK_END);
}
fwrite($this->_storage, pack('l', $value));
$this->_currentSize += self::BYTES_PER_ENTRY;
$this->_currentPosition = $this->_currentSize;
}
function ItemAt($index)
{
$itemPosition = $index * self::BYTES_PER_ENTRY;
if($this->_currentPosition != $itemPosition)
{
fseek($this->_storage, $itemPosition);
}
$binaryData = fread($this->_storage, self::BYTES_PER_ENTRY);
$this->_currentPosition = $itemPosition + self::BYTES_PER_ENTRY;
$unpackedElements = unpack('l', $binaryData);
return $unpackedElements[1];
}
}
$arr = new MemoryOptimizedArray();
for($i = 0; $i < 3; $i++)
{
$v = rand(-2000000000,2000000000);
$arr->AppendItem($v);
print("added $v\n");
}
for($i = 0; $i < 3; $i++)
{
print($arr->ItemAt($i)."\n");
}
for($i = 2; $i >=0; $i--)
{
print($arr->ItemAt($i)."\n");
}
I have to calculate a % b for two very large numbers.
I cannot use the default modulo operator, because a and b are larger than PHP_INT_MAX, so I have to handle them as "strings".
I know that there are special math libraries like BC or GMP, but I can't use them, because my app will probably be hosted on a shared host where these are not enabled.
I have to write a function in PHP that will do the job. The function will take two strings (the two numbers) as parameters and has to return a % b, but I don't know how to start.
How to solve this problem?
Since PHP 4.0.4, libbcmath is bundled with PHP. You don't need any external libraries for this extension. These functions are only available if PHP was configured with --enable-bcmath .
The Windows version of PHP has built-in support for this extension. You do not need to load any additional extensions in order to use these functions. You should be able to enable these functions yourself, without any action on the part of the hosting company.
I thought of this solution:
$n represents a huge number, $m the (not so huge) modulus.
function getModulus($n, $m)
{
$a = str_split($n);
$r = 0;
foreach($a as $v)
{
$r = ((($r * 10) + intval($v)) % $m);
}
return $r;
}
Hope it helps someone.
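Here is a slightly more defensive sketch of the same digit-by-digit idea as getModulus() above, with a usage example. stringMod is a hypothetical name, and it assumes PHP 7+ for intdiv() and the type declarations. Like the original, it only handles a modulus that fits in a native integer, so it does not cover the case where b is also huge.

```php
<?php
function stringMod(string $number, int $modulus): int
{
    if ($number === '' || strspn($number, '0123456789') !== strlen($number)) {
        throw new InvalidArgumentException('expected a non-negative decimal string');
    }
    // $remainder * 10 + 9 must not overflow a native integer.
    if ($modulus < 1 || $modulus > intdiv(PHP_INT_MAX - 9, 10)) {
        throw new InvalidArgumentException('modulus out of range');
    }
    $remainder = 0;
    for ($i = 0, $len = strlen($number); $i < $len; $i++) {
        // Horner's rule: fold in one decimal digit at a time.
        $remainder = ($remainder * 10 + (int) $number[$i]) % $modulus;
    }
    return $remainder;
}

echo stringMod('123456789123456789', 45), "\n";     // 9
echo stringMod('1' . str_repeat('0', 30), 7), "\n"; // 1 (10^30 mod 7)
```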
The largest integer PHP can compute with exactly depends on your platform: 2^63-1 on a 64-bit machine and 2^31-1 on a 32-bit machine. Above that you will get wrong values.
You can still do the calculation by splitting your number into chunks.
Example:
My number is 18 decimal digits long, so I split it into chunks of 9, 7, and 2 digits (9 + 7 + 2 = 18).
Calculate the mod of the first chunk.
Prepend the mod of the first chunk to the second chunk. Example: if the result of the first mod is 23, this gives 23XXXXXXX.
Find the mod of the resulting 23XXXXXXX and prepend it to the last chunk. Example: if that mod is 15, this gives 15XX.
$string = '123456789123456789'; // 18 decimal long
$chunk[0] = '123456789'; // 9 decimal long
$chunk[1] = '1234567'; // 7 decimal long
$chunk[2] = '89'; // 2 decimal long
$modulus = null;
foreach($chunk as $value){
$modulus = (int)($modulus.$value) % 45;
}
The resulting $modulus above should be the same as
$modulus = $string % 45
Better late than never.
Hope this helps. Has anyone taken a similar approach?
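You can use fmod for values larger than PHP_INT_MAX, but note that it operates on floats, so results are only exact while both operands fit in a float's 53 bits of precision.
Read more about it here:
http://php.net/manual/en/function.fmod.php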
I've seen this Markov chain gibberish detector, written in response to another question on Stack Overflow, and I would like to convert it to PHP. I'm not looking for someone to do this for me, but I am confused by portions of the Python code that I have no knowledge of. I've looked at the Python docs, but they confuse me even further.
What is the PHP equivalent of yield?
def ngram(n, l):
    """ Return all n grams from l after normalizing """
    filtered = normalize(l)
    for start in range(0, len(filtered) - n + 1):
        yield ''.join(filtered[start:start + n])
What exactly is xrange? There is a PECL extension, but I would prefer a pure PHP implementation. Would this be possible?
counts = [[10 for i in xrange(k)] for i in xrange(k)]
for i, row in enumerate(counts):
    s = float(sum(row))
    for j in xrange(len(row)):
        row[j] = math.log(row[j] / s)
What does assert do? Is it the equivalent of throwing an Exception?
assert min(good_probs) > max(bad_probs)
Python Pickle, is that essentially serialize?
pickle.dump({'mat': counts, 'thresh': thresh}, open('gib_model.pki', 'wb'))
Thanks for any help.
Edit: typos.
1. What is the PHP equivalent of yield?
Before PHP 5.5 there was no equivalent to yield in PHP (5.5 added generators, which use a yield keyword of their own). yield is used in generator functions - a special class of function that returns a result but retains its state.
For example:
def simple_generator(start=0, end=100):
    while start < end:
        start += 1
        yield start
gen = simple_generator()
next(gen) # 1
next(gen) # 2
next(gen) # 3
You can do something similar in PHP like so:
class simple_generator {
    private $start;
    private $end;
    function __construct($start=0, $end=100) {
        $this->start = $start;
        $this->end = $end;
    }
    // __invoke() lets the object be called like a function
    function __invoke() {
        if($this->start < $this->end) {
            $this->start++;
            return $this->start;
        }
    }
}
$gen = new simple_generator();
$gen(); // 1
$gen(); // 2
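On PHP 5.5 and later, the ngram function from the question could be translated fairly directly with a native generator. This is only a sketch: it assumes the text has already been normalized (the Python normalize() helper is not reproduced here) and uses PHP 7 type declarations.

```php
<?php
// Yields every substring of length $n, left to right.
// Assumption: $text is already normalized.
function ngram(int $n, string $text): Generator
{
    for ($start = 0; $start <= strlen($text) - $n; $start++) {
        yield substr($text, $start, $n);
    }
}

foreach (ngram(2, 'abcd') as $gram) {
    echo $gram, "\n"; // ab, bc, cd
}
```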
2. What exactly is xrange?
xrange behaves just like range, but uses a generator function. This is a performance tweak for working with very large lists or when memory is tight.
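A pure-PHP stand-in for xrange can be written as a generator on PHP 5.5+. The sketch below applies it to the counts snippet from the question. This xrange function is hypothetical (it is not the PECL one), and $k = 3 is an arbitrary size chosen for the example.

```php
<?php
// Lazily yields 0, 1, ..., $stop - 1 without building an array.
function xrange(int $stop): Generator
{
    for ($i = 0; $i < $stop; $i++) {
        yield $i;
    }
}

// counts = [[10 for i in xrange(k)] for i in xrange(k)]
$k = 3;
$counts = [];
foreach (xrange($k) as $i) {
    $counts[$i] = array_fill(0, $k, 10);
}

// Turn each row into log-probabilities, as the Python loop does.
foreach ($counts as $i => $row) {
    $s = (float) array_sum($row);
    foreach (xrange(count($row)) as $j) {
        $counts[$i][$j] = log($row[$j] / $s);
    }
}
```

For small sizes like this, PHP's built-in range() would work just as well; the generator only matters when the range is large.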
3. What does assert do? Is it the equivalent of throwing an Exception?
Yes. Beware - it is not the same as PHP's assert - which is a really fun vector for attacks on your software.
4. Python Pickle, is that essentially serialize?
Yes.
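As a rough PHP counterpart to the pickle.dump call, serialize() and unserialize() can round-trip the model through a file. The matrix and threshold values below are made-up placeholders, and the temp-file path stands in for gib_model.pki.

```php
<?php
// Placeholder model; real values would come from training.
$model = [
    'mat' => [[-1.0986, -1.0986], [-0.6931, -0.6931]],
    'thresh' => 0.018,
];

// Write the serialized model to a temporary file and read it back.
$path = tempnam(sys_get_temp_dir(), 'gib');
file_put_contents($path, serialize($model));
$restored = unserialize(file_get_contents($path));
unlink($path);

var_dump($restored === $model); // bool(true)
```

As with pickle, unserialize() should not be fed data from untrusted sources.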
xrange returns an iterator. This is different from range which returns a list. Both behave mostly in the same fashion so just use it like you use range.
Yes
Yes