Cheating PHP integers - php

This is related to my earlier post, "Charset detection in PHP", but taken in a completely different direction.
Essentially, I'm looking to reduce the memory that many huge arrays consume.
These arrays are just full of integers, but since PHP uses 32-bit or 64-bit integers internally (depending on which version you have compiled for your CPU type), they eat memory.
Is there a way to cheat PHP into using 8-bit or 16-bit integers?
I've thought about using pack() to accomplish this, so I can have an array of packed binary values and just unpack them as I need them. Yes, I know this would make it slower, but it is much faster than the alternative of loading and then running through each array individually: the text is streamed through, so all the arrays need to be in memory at the same time to keep speed up.
Can you suggest any better alternatives? I know it's very hacky, but I need to prevent huge memory surges.

Don't tell nobody!
class IntegerstringArray implements ArrayAccess {
    // Each element is stored as 4 hex digits, i.e. one 16-bit value.
    public $evil = "0000111122220000ffff";

    function offsetExists($offset): bool {
        return (strlen($this->evil) / 4) - 1 >= $offset;
    }
    function offsetGet($offset): int {
        return hexdec(substr($this->evil, $offset * 4, 4));
    }
    function offsetSet($offset, $value): void {
        // Keep only the low 16 bits and left-pad to exactly 4 hex digits.
        $hex = str_pad(dechex($value & 0xffff), 4, "0", STR_PAD_LEFT);
        for ($i = 0; $i < 4; $i++) {
            $this->evil[$offset * 4 + $i] = $hex[$i];
        }
    }
    function offsetUnset($offset): void {
        assert(false); // removing elements is not supported
    }
}
So you can pretty much create an array object from this:
$array = new IntegerstringArray();
$array[2] = 65535;
print $array[2];
Internally it stores the values in a single hex string, four characters per 16-bit integer. The array offsets must be consecutive.
Not tested. It is only meant as an implementation guide.
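A minimal sketch of the pack() idea from the question, for comparison: it keeps two raw bytes per value instead of four hex characters. The class name PackedUint16Array and its exceptions are made up for illustration, and it assumes PHP 7.4 or later and values in the range 0 to 65535.

```php
<?php
// Hypothetical helper: a list of unsigned 16-bit integers stored as
// 2 raw bytes each (little-endian, pack format 'v') in one string.
final class PackedUint16Array implements ArrayAccess, Countable
{
    private string $data = '';

    public function count(): int
    {
        return intdiv(strlen($this->data), 2);
    }

    public function offsetExists($offset): bool
    {
        return is_int($offset) && $offset >= 0 && $offset < $this->count();
    }

    public function offsetGet($offset): int
    {
        if (!$this->offsetExists($offset)) {
            throw new OutOfRangeException("No element at offset $offset");
        }
        // unpack() returns a 1-based array for unnamed format codes.
        return unpack('v', $this->data, $offset * 2)[1];
    }

    public function offsetSet($offset, $value): void
    {
        if (!is_int($value) || $value < 0 || $value > 0xFFFF) {
            throw new RangeException('Value must be an integer from 0 to 65535');
        }
        $count  = $this->count();
        $offset = $offset ?? $count; // $array[] = $value appends
        if (!is_int($offset) || $offset < 0 || $offset > $count) {
            throw new OutOfRangeException('Offsets must be consecutive');
        }
        $bytes = pack('v', $value);
        if ($offset === $count) {
            $this->data .= $bytes;
        } else {
            // Overwrite the two bytes of an existing element in place.
            $this->data[$offset * 2]     = $bytes[0];
            $this->data[$offset * 2 + 1] = $bytes[1];
        }
    }

    public function offsetUnset($offset): void
    {
        throw new LogicException('Removing elements is not supported');
    }
}

$array = new PackedUint16Array();
$array[] = 65535;
$array[] = 7;
$array[0] = 4660;
echo $array[0], ' ', $array[1], ' ', count($array), "\n"; // prints "4660 7 2"
```

Every read goes through unpack(), so element access is slower than with a native array; the trade is memory for CPU time, as the question anticipates.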

Related

One-pass algorithm (clarification needed) Why the space complexity is O(1)?

From en.wikipedia:
A one-pass algorithm generally requires O(n) (see 'big O' notation) time and less than O(n) storage (typically O(1)), where n is the size of the input.
I made a test with xdebug.profiler_enable=1:
function onePassAlgorithm(array $inputArray): int
{
$size = count($inputArray);
for ($countElements = 0; $countElements < $size; ++$countElements) {
}
return $countElements;
}
$range = range(1, 1_000_000);
$result = onePassAlgorithm($range);
The memory usage of this code in qcachegrind is 33 558 608 bytes, and 100% of it was used by the range() function.
This part seems fine to me, because inside the onePassAlgorithm function we have only two int variables.
That is why the space complexity is O(1).
Then I made another test:
function onePassAlgorithm(array $inputArray, int $twoSum): array
{
$seen_nums = [];
foreach ($inputArray as $key => $num) {
$complement = $twoSum - $num;
if (isset($seen_nums[$complement])) {
return [$seen_nums[$complement], $key];
}
$seen_nums[$num] = $key;
}
return [];
}
$range = range(1, 1_000_000);
$result = onePassAlgorithm($range, 1_999_999);
In qcachegrind we can see that the onePassAlgorithm function uses only 376 bytes (the size of the return value). Don't we need more memory to store the values in $seen_nums one by one? So is the space complexity O(1) again?
My question is: why does qcachegrind show that $seen_nums, into which we copy the entire $inputArray, consumes no memory?
Or, in other words, why is the space complexity of my second implementation of this algorithm O(1)?
From Xdebug documentation:
[2007-05-17] — Removed support for Memory profiling as that didn't work properly.
[2015-02-22] — Xdebug 2.3.0
Added the time index and memory usage for function returns in normal tracefiles.
So the reason for my confusion was that Xdebug profiles show memory usage only for function returns, not the full memory profiling I was expecting.
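One way to see the O(n) storage without a profiler is to compare PHP's own peak-memory counter before and after the call. This is only a rough sketch: the function name twoSumOnePass and the smaller input size are chosen here for illustration, and the exact byte counts depend on the PHP version.

```php
<?php
// Same one-pass two-sum as in the question, under a hypothetical name.
function twoSumOnePass(array $inputArray, int $twoSum): array
{
    $seen = [];
    foreach ($inputArray as $key => $num) {
        $complement = $twoSum - $num;
        if (isset($seen[$complement])) {
            return [$seen[$complement], $key];
        }
        $seen[$num] = $key; // grows by one entry per element visited
    }
    return [];
}

$range = range(1, 100_000);

// $range stays alive, so any rise in the peak comes from inside the call.
$peakBefore = memory_get_peak_usage();
$result     = twoSumOnePass($range, 199_999);
$peakAfter  = memory_get_peak_usage();

printf("keys of the pair: [%d, %d]\n", $result[0], $result[1]);
printf("extra peak memory during the call: %d bytes\n", $peakAfter - $peakBefore);
```

The extra memory is released again when the function returns, which is why a measurement taken only at return time does not show it.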

PHP rand vs mt_rand vs openssl_random_pseudo_bytes

I want to generate a random string and was doing some research and found the following link:
http://golearnphp.com/php-rand-vs-mt_rand-and-openssl_random_pseudo_bytes/
function generateRandom($length) {
$validCharacters = 'abcdefghijklmnopqrstuvwxyz0123456789';
$myKeeper = '';
for ($n = 1; $n < $length; $n++) {
$whichCharacter = rand(0, strlen($validCharacters) - 1);
$myKeeper .= $validCharacters[$whichCharacter];
}
return $myKeeper;
}
function generateRandomdMT($length) {
$validCharacters = 'abcdefghijklmnopqrstuvwxyz0123456789';
$myKeeper = '';
for ($n = 1; $n < $length; $n++) {
$whichCharacter = mt_rand(0, strlen($validCharacters) - 1);
$myKeeper .= $validCharacters[$whichCharacter];
}
return $myKeeper;
}
$start = microtime(true);
echo htmlentities(generateRandom(100000));
var_dump(microtime(true) - $start);
$start = microtime(true);
echo htmlentities(generateRandomdMT(100000));
var_dump(microtime(true) - $start);
$start = microtime(true);
echo htmlentities(substr(base64_encode(openssl_random_pseudo_bytes(100000)), 0, 100000));
var_dump(microtime(true) - $start);
In the post the writer says that openssl_random_pseudo_bytes is significantly faster than the other two. Is this true? Is openssl_random_pseudo_bytes really that much faster? Is that the correct way to test the speed of functions?
openssl_random_pseudo_bytes was created to be cryptographically strong (check the second parameter). rand is the old random function, with a short period before it repeats. mt_rand is better than rand, but it is not meant to be used in cryptographic systems.
I bet the difference in execution time has no noticeable impact on your application.
Also, those functions return different results. The first two return a string built from 36 possible characters, and the third returns a string built from 64 possible symbols. The result of the first two functions is also shorter than that of the third.
If you are optimizing to speed up your application, the first thing you should know is how to profile your code.
In the post the writer says that openssl_random_pseudo_bytes is significantly faster than the other two. Is this true?
In normal situations mt_rand() is significantly faster than openssl_random_pseudo_bytes().
It's only slower in the test code you've posted because you are comparing apples and oranges. For rand() and mt_rand() you are using complex functions which build up a string one byte at a time, whereas for openssl_random_pseudo_bytes() you're using the raw binary stream it produces with base64_encode() which is going to be much faster.
If you could get a raw binary stream out of mt_rand() or rand(), or a sequence of numbers 0 to 63 from openssl_random_pseudo_bytes(), you could do an apples to apples comparison.
In my testing, I found mt_rand() about 4 times as fast as openssl_random_pseudo_bytes(4) when I used unpack('V', openssl_random_pseudo_bytes(4) & "\xff\xff\xff\x7f") in order to get an equivalent output to mt_rand(). However this is still technically an apples to oranges situation because I'm doing additional processing on one in order to match it to the other, just in the opposite direction to you.
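To make the comparison closer to apples to apples, both generators can be pushed through the same string-building loop so that only the source of randomness differs. The sketch below swaps in random_int() as the cryptographically secure side, because it takes the same (min, max) arguments as mt_rand(); that substitution, and the helper names buildString and timeIt, are assumptions made for illustration. Timings vary by machine and PHP version.

```php
<?php
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789';

// Builds a string of $length characters, asking $pickIndex(min, max)
// for each character position.
function buildString(int $length, callable $pickIndex): string
{
    $alphabet = ALPHABET;
    $max = strlen($alphabet) - 1;
    $out = '';
    for ($n = 0; $n < $length; $n++) {
        $out .= $alphabet[$pickIndex(0, $max)];
    }
    return $out;
}

// Returns [result, elapsed milliseconds] for one call of $fn.
function timeIt(callable $fn): array
{
    $start  = hrtime(true);
    $result = $fn();
    return [$result, (hrtime(true) - $start) / 1e6];
}

$length = 100000;
[$a, $msMt]     = timeIt(fn() => buildString($length, 'mt_rand'));
[$b, $msSecure] = timeIt(fn() => buildString($length, 'random_int'));

printf("mt_rand:    %.2f ms\n", $msMt);
printf("random_int: %.2f ms\n", $msSecure);
```

A single run like this is still noisy; repeating each measurement several times and comparing the medians gives a fairer picture.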
At the time you asked this question there was an open bug report about this: https://bugs.php.net/bug.php?id=70014 (PHP 5.6.10). It seems to be fixed in newer versions of PHP.
In my experience openssl_random_pseudo_bytes() has never been necessary; I prefer mt_rand(). However, if you are generating random values for encryption purposes, as I am, do not use mt_rand(); use random_bytes() instead. Ref.: https://www.php.net/manual/en/function.random-bytes.php
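A short sketch of how random_bytes() might be turned into a printable token. The function names hexToken and urlSafeToken are made up here, and the 16-byte default is just an example size.

```php
<?php
// Hex token: two characters per random byte.
function hexToken(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

// URL-safe Base64 token: '+' and '/' replaced, '=' padding removed.
function urlSafeToken(int $bytes = 16): string
{
    return rtrim(strtr(base64_encode(random_bytes($bytes)), '+/', '-_'), '=');
}

echo hexToken(), "\n";
echo urlSafeToken(), "\n";
```

The hex form is twice as long as the number of bytes requested; the Base64 form is shorter for the same amount of randomness.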

Implement an ERF function in PHP

I want to implement an ERF function in PHP. I got its formula from Wikipedia:
P_Value = 1 - ERF( ABS(Residual - mean) / (√2 * SD) )
I don't see how to implement it in PHP.
Based on the formula you provided, only the ERF (Error function) part should look like this:
function ERF ($difference) {
return abs($difference);
}
Now call $value = ERF($residual - $mean); from anywhere inside a php script to store the ERF value in the $value variable.
Edit:
Let's assume you meant this formula: erf(x) = (2/√π) ∫_0^x e^(−t²) dt
So, it should be:
function ERF($x, $dt = 0.0001) {
    $val = 0;
    // Riemann sum of exp(-t^2) for t from 0 up to $x, in steps of $dt.
    for ($t = 0; $t < $x; $t += $dt) {
        $val += exp(-pow($t, 2)) * $dt;
    }
    return (2 / sqrt(pi())) * $val;
}
Now call $value = ERF($x); where $x >= 0 is the upper limit of the integral and the optional $dt is the step width of the integration variable t.
Note: because this is an integral over a continuous domain, $dt should be very close to 0 for a good approximation. With a large step such as 1, the result is a coarse summation rather than an integral.
For better approximation on integration, see Numerical integration algorithms and techniques.
There are better ways to approximate the error function than by naive numerical integration. The Wikipedia article about erf has a formula for numerical approximation. You can probably find others in Abramowitz & Stegun "Handbook of Mathematical Functions" or maybe the Digital Library of Mathematical Functions.
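As one example of such a formula, here is a sketch of the polynomial approximation 7.1.26 from Abramowitz & Stegun, whose stated maximum absolute error is about 1.5e-7. The function names erfApprox and pValue are made up here, and pValue assumes the question's formula means 1 - erf(|residual - mean| / (√2 * SD)).

```php
<?php
// Abramowitz & Stegun 7.1.26: erf(x) ~ 1 - (a1*t + ... + a5*t^5) * exp(-x^2),
// with t = 1 / (1 + p*x) for x >= 0; erf is odd, so erf(-x) = -erf(x).
function erfApprox(float $x): float
{
    $sign = $x < 0 ? -1.0 : 1.0;
    $x = abs($x);
    $t = 1.0 / (1.0 + 0.3275911 * $x);
    // Polynomial evaluated with Horner's scheme.
    $poly = ((((1.061405429 * $t - 1.453152027) * $t + 1.421413741) * $t
        - 0.284496736) * $t + 0.254829592) * $t;
    return $sign * (1.0 - $poly * exp(-$x * $x));
}

// Assumed reading of the formula in the question.
function pValue(float $residual, float $mean, float $sd): float
{
    return 1.0 - erfApprox(abs($residual - $mean) / (sqrt(2.0) * $sd));
}

printf("erf(1)  ~ %.6f\n", erfApprox(1.0));
printf("p-value ~ %.4f\n", pValue(1.96, 0.0, 1.0));
```

For a residual 1.96 standard deviations from the mean, this should land close to the familiar two-sided value of 0.05.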
I found an implementation in PHP here: http://php.net/manual/en/function.stats-stat-percentile.php (look for the term "error function" in the text). Not sure which formula this implements.

Workaround needed, PHP dechex maximum integer [duplicate]

I have some large HEX values that I want to display as regular numbers, I was using hexdec() to convert to float, and I found a function on PHP.net to convert that to decimal, but it seems to hit a ceiling, e.g.:
$h = 'D5CE3E462533364B';
$f = hexdec($h);
echo $f .' = '. Exp_to_dec($f);
Output: 1.5406319846274E+19 = 15406319846274000000
Result from calc.exe = 15406319846273791563
Is there another method to convert large hex values?
As said on the hexdec manual page:
The function can now convert values that are to big for the platforms integer type, it will return the value as float instead in that case.
If you want to get some kind of big integer (not float), you'll need it stored inside a string. This might be possible using BC Math functions.
For instance, if you look in the comments of the hexdec manual page, you'll find this note
If you adapt that function a bit, to avoid a notice, you'll get:
function bchexdec($hex)
{
$dec = 0;
$len = strlen($hex);
for ($i = 1; $i <= $len; $i++) {
$dec = bcadd($dec, bcmul(strval(hexdec($hex[$i - 1])), bcpow('16', strval($len - $i))));
}
return $dec;
}
(This function has been copied from the note I linked to; and only a bit adapted by me)
And using it on your number:
$h = 'D5CE3E462533364B';
$f = bchexdec($h);
var_dump($f);
The output will be:
string '15406319846273791563' (length=20)
So the result is no longer the imprecise float you had, and it matches what you were expecting:
Result from calc.exe = 15406319846273791563
Hope this helps ;-)
And, yes, user notes on the PHP documentation are sometimes a real gold mine ;-)
hexdec() switches from int to float when the result is too large to be represented as an int. If you want arbitrarily long values, you're probably going to have to roll your own conversion function to change the hex string to a GMP integer.
function gmp_hexdec($n) {
$gmp = gmp_init(0);
$mult = gmp_init(1);
for ($i=strlen($n)-1;$i>=0;$i--,$mult=gmp_mul($mult, 16)) {
$gmp = gmp_add($gmp, gmp_mul($mult, hexdec($n[$i])));
}
return $gmp;
}
print gmp_strval(gmp_hexdec("D5CE3E462533364B"));
Output: 15406319846273791563
$num = gmp_init( '0xD5CE3E462533364B' ); // way to input a number in gmp
echo gmp_strval($num, 10); // display value in decimal
That's the module to use. Convert it to a function and then use on your numbers.
Note: provide these hex numbers as strings so:
$num = "0x348726837469972346"; // set variable
$gmpnum = gmp_init("$num"); // gmp number format
echo gmp_strval($gmpnum, 10); // convert to decimal and print out
1.5406319846274E+19 is a limited representation of your number. You can get a more complete one by using printf():
printf("%u\n", hexdec($h));
...will output "15406319846273792000". PHP uses floats for such big numbers, so you may lose a bit of precision. If you have to work with arbitrary precision numbers, you may try the bcmath extension. By splitting the hex into two 32-bit words (which should be safe on most systems) you should be able to get more precision. For instance:
$f = bcadd(bcmul((string) hexdec(substr($h, 0, -8)), '4294967296'), (string) hexdec(substr($h, -8))); // 4294967296 = 0x100000000
...would set $f to 15406319846273791563.
Converting hex to decimal is easy, but reconstructing the hexadecimal number from the decimal one is harder.
Try base_convert():
$hexadecimal = base_convert(2826896153644826, 10, 16);
// result: a0b0c0d0e0f1a
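base_convert() works on floats internally, so, as the PHP manual warns, it can lose precision on large numbers. For values beyond the integer range, the reverse conversion can be sketched with BC Math by repeatedly dividing by 16. The function name bcdechex is made up here, and the sketch assumes the bcmath extension is loaded and the input is a non-negative decimal string.

```php
<?php
// Converts a non-negative decimal string to lowercase hex by repeated
// division by 16 (assumes ext-bcmath and the default bcscale of 0).
function bcdechex(string $dec): string
{
    $hex = '';
    do {
        $digit = (int) bcmod($dec, '16');  // remainder is the next hex digit
        $hex   = dechex($digit) . $hex;    // digits come out least significant first
        $dec   = bcdiv($dec, '16', 0);     // integer division
    } while (bccomp($dec, '0') > 0);
    return $hex;
}
```

Calling bcdechex('15406319846273791563') should give back d5ce3e462533364b, the value from the question.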
I ran into this issue while storing 64-bit keys in a MySQL database. I was able to get a bit-perfect conversion to a 64-bit signed integer (a PHP limitation) using a few binary operators. (This code is 16x faster than the bchexdec function, and the resulting variables use half the memory on average.)
function x64toSignedInt($k){
$left = hexdec(substr($k,0,8));
$right = hexdec(substr($k,8,8));
return (int) ($left << 32) | $right;
}
MySQL signed BIGINT datatype is a great match for this as an index or storage in general. HEX(column) is a simple way to convert it back to HEX within the SQL query for use elsewhere.
This solution also uses the BC Math Functions. However, an algorithm is used which does without the bcpow function. This function is a bit shorter and faster than the accepted solution, tested on PHP 7.4.
function hexDecBc(string $hex) : string
{
for ($dec = '0', $i = 0; $i < strlen($hex); $i++) {
$dec = bcadd(bcmul($dec,'16'),(string)hexdec($hex[$i]));
}
return $dec;
}
Make sure to enable gmp extension. ext-gmp
$number = gmp_strval(gmp_init('0x03....')); // outputs: 1234324....
Doesn't intval(var, base) take care of it?
From the PHP Manual.

Alternative/faster methods of converting an integer to a cartesian coordinate?

As a fun side-project for myself to help in learning yet another PHP MVC framework, I've been writing Reversi / Othello as a PHP & Ajax application, mostly straightforward stuff. I decided against using a multidimensional array for a number of reasons and instead have a linear array ( in this case 64 elements long ) and a couple methods to convert from the coordinates to integers.
So I was curious: are there any other, possibly faster, algorithms for converting an integer to a coordinate point?
function int2coord($i){
$x = (int)($i/8);
$y = $i - ($x*8);
return array($x, $y);
}
//Not a surprise but this is .003 MS slower on average
function int2coord_2($i){
$b = base_convert($i, 10, 8);
$x = (int) ($b / 10); // $b holds the octal digits, e.g. "52" for 42, so the row is the tens digit
$y = $b % 10;
return array($x, $y);
}
And for posterity's sake, the method I wrote for coord2int:
function coord2int($x, $y){
return ($x*8)+$y;
}
Update:
So, in the land of the weird, the results were not what I was expecting: a pre-computed lookup table has predominantly proven to be the fastest. I guess trading memory for speed is always a winner?
There was a table with times here but I cut it due to styling issues with SO.
Oh yes! This is a perfect example of binary:
function int2coord($i){
$x = $i >> 3;
$y = $i & 0x07;
return array($x, $y);
}
The reality is that a good compiler will find this optimization and use it, so it's not necessarily faster. Test and see if your compiler/interpreter does this.
It works because any binary division by 8 is the same as a right shift by 3 bits. Modern processors have barrel shifters that can do up to a 32 bit shift in one instruction.
The reverse is as easy:
function coord2int($x, $y){
return ($x << 3)+$y;
}
-Adam
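A quick sanity check, under hypothetical function names, that the shift-and-mask version agrees with plain division and modulo for all 64 squares and that the round trip is lossless:

```php
<?php
function int2coordShift(int $i): array
{
    return [$i >> 3, $i & 0x07]; // row = i / 8, column = i % 8
}

function coord2intShift(int $x, int $y): int
{
    return ($x << 3) | $y; // same as $x * 8 + $y when 0 <= $y < 8
}

for ($i = 0; $i < 64; $i++) {
    [$x, $y] = int2coordShift($i);
    if ($x !== intdiv($i, 8) || $y !== $i % 8 || coord2intShift($x, $y) !== $i) {
        throw new LogicException("Mismatch at index $i");
    }
}
echo "all 64 squares round-trip\n";
```

The equivalence only holds for non-negative indices and a board width that is a power of two.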
I don't have the time to measure this myself right now, but I would suspect that a pre-computed lookup table would beat your solution in speed. The code would look something like this:
class Converter {
private $_table;
function __construct()
{
$this->_table = array();
for ($i=0; $i<64; $i++) {
$this->_table[$i] = array( (int)($i/8), (int)($i%8) );
}
}
function int2coord( $i )
{
return $this->_table[$i];
}
}
$conv = new Converter();
$coord = $conv->int2coord( 42 );
Of course, this does add a lot of overhead, so in practice you would only bother to pre-compute all coordinates if your conversion code was called very often.
I'm not in a position to measure right now, but you should be able to eke out some additional speed with this:
function int2coord($i){
$y = $i%8;
$x = (int)($i/8);
return array($x, $y);
}
edit: ignore me -- Adam's bitshifting answer should be superior.
function int2coord_3($i){
return array((int) ($i / 8), ($i % 8));
}
This is a little faster because no variables are declared or assigned.
I think most of your performance is lost by returning array(...) at the end. Instead, I propose:
* define two functions, one for x and one for y
or
* inline the bit arithmetic in code needing the calculation
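Since several answers here were posted without measurements, a small harness like the following can settle it on your own setup. It is a sketch only: the variant names and the benchmark helper are invented for illustration, it assumes PHP 7.4 or later, and results differ between PHP versions and machines.

```php
<?php
// Three of the approaches discussed above, as interchangeable callables.
$variants = [
    'divide' => fn(int $i): array => [intdiv($i, 8), $i % 8],
    'shift'  => fn(int $i): array => [$i >> 3, $i & 7],
];
// Pre-computed table of all 64 coordinates for the lookup variant.
$table = array_map($variants['divide'], range(0, 63));
$variants['lookup'] = fn(int $i): array => $table[$i];

// Converts every index 0..63 $rounds times; returns elapsed milliseconds.
function benchmark(callable $fn, int $rounds): float
{
    $start = hrtime(true);
    for ($r = 0; $r < $rounds; $r++) {
        for ($i = 0; $i < 64; $i++) {
            $fn($i);
        }
    }
    return (hrtime(true) - $start) / 1e6;
}

foreach ($variants as $name => $fn) {
    printf("%-7s %8.2f ms\n", $name, benchmark($fn, 10000));
}
```

Much of each measurement is function-call overhead, which is the point of the suggestion to inline the arithmetic where it is needed.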
