Compression Algorithm in PHP - php

I'm getting into compression algorithms and my native language is PHP, and I can understand that PHP is a language that you wouldn't normally make a big algorithm in, but I was wondering if it was possible. (of course it's possible - more so efficient and powerful)
The first type of algorithm I'd attempt to make is a simple adaptive algorithm, by taking in the most used bytes (chars) and converting them into binary types, (ex: a = 0001, b = 0010, c = 0011) - tho there is no real way to do that in PHP that I am familiar with, before this I was using simple ASCII conversions like a = chr(33), b = chr(34) that would get the least valued ASCII values use them as the smallest -> largest operators for the compression definitions.
So what I am asking is that if there's a way to assign binary values to a variable instead of it being represented as ASCII, if I go:
$string_after_compression = 000100100011;
#split it by 4 bits per = 0001 | 0010 | 0011
That would be interpreted as an int - therefore making an over-large int therefore most likely running out of available ram with a simple sentence, then if I try to store the value in a string instead, that removes the point of compressing as it's just making a string like:
$string_after_compression = "000100100011";
#split it by 4 bits per = "0001" . "0010" . "0011";
-----
This question is a bit confusing but the aim is: Is there a way to assign a PHP variable using a PHP intigers
Example solution:
$binary_var = (binary) 0001;

To work with binary in PHP, your data type of choice is a string. Because strings are mere byte arrays:
$bytes = '';
There's no simple base 2 notation for binary in PHP (0100101), usually the next best option to work with binary is hex notation:
$bytes = "\x42"; // 0100 0010
You'll need to convert between base 2 and base 16 notation back and forth in your head, but once you get used to that it's typically easier to follow and work with than long strings of 1s and 0s.
To test or manipulate binary data you'll want to get used to the binary operators:
if (($bytes[3] & "\x02") === "\x02") {
// the second bit of the forth byte in the sequence is set (0000 0010)
}
$bytes[6] |= "\x02"; // setting the second bit of the seventh byte

Related

Reliable Margin of Error for Float -> String -> Float Conversion?

I have a float value that I need to store as a string in PHP and then compare later after casting back into a float.
Due to the conversion I know that relying on equality would be a mistake, as there's potential for a loss of precision, so I'm doing something like the following:
if (abs((float)$string_value - $float_value) < 0.001) { echo "Values are close enough\n"; }
Now, while a margin for error of 0.001 should be fine for my immediate purposes, it got me wondering; what is the smallest margin of error that I can reliably/safely use?
I realise that the safe margin of error will change with the size of the float (i.e- larger values have less or even no fractional precision), so an answer should probably account for this.
So to put it another way; given a float value that I want to store in base 10 and read back, how can I reliably decide what my margin of error should be such that I can reasonably confirm that the two values are the same?
Unfortunately the values I'm handling must be stored in plain decimal form, so my usual go-to of packing them as a network order 64-bit integer is not an option here ☹️
EDIT: To clarify; please assume that my question is about handling arbitrarily sized floats; the example code I've given is for a recent case where I'm handling floats within a limited range, so setting the margin of error manually is fine, but I'd like to be able to handle floats of any magnitude in future.
As mentioned in Mark Dickinson's comment, it is possible to convert a floating-point number to a string and back without losing precision. This only works if
you use enough significant decimal digits (17 for IEEE doubles)
the conversions are accurate (i.e. they're guaranteed to convert to the nearest number)
From a quick look, it seems that casting a double $f to a string in PHP, either implicitly or with (string) $f, only uses 14 significant digits, so this method isn't accurate enough. But you can use sprintf with a %.16e conversion specifier to get 17 significant digits. So after the following roundtrip
$s = sprintf("%.16e", $f);
$f2 = (double) $s;
$f2 should equal $f exactly unless PHP uses suboptimal algorithms internally.
Note that the %e conversion specifier uses scientific (exponential) notation. If you need plain decimal strings, you can use the %f specifier and calculate the required number of digits after the decimal point using log10:
if ($f != 0) {
$prec = 16 - floor(log10(abs($f)));
if ($prec < 0) $prec = 0;
}
else {
$prec = 0;
}
$s = sprintf("%.${prec}f", $f);
This can produce extremely long strings for very small or large numbers, though.
It would probably require a huge amount of research to tell the whether these methods are completely reliable, and if not what the maximum error is. It all depends on several implementation details like PHP version, underlying C library, etc.
Another idea is to compare the string representations instead of floating-point values:
# Assuming $string_value was also converted with float_to_string
if ($string_value == float_to_string($float_value)) {
echo "Values are close enough\n";
}
This should be reliable as long as you stick to the same PHP version.
If you must compare floating-point numbers, it often makes more sense to compare the relative error. See Bruce Dawson's excellent blog for more details.

Exceed PHP binary limit

I was trying to do some sort of XOR binary crypting algorithm in PHP and so I needed to convert large strings into binary. The problem is that PHP seems to be very limited in terms of binary calculation / storage as a string of six letters only, once converted, exceeds the PHP INT limit.
That means unpacking a big string to binary just gives a unusable number. I tried to do the string unpacking by splitting the string into packs of 4 letters and then unpacking them, but then I've got troubles with the repacking where it gives random characters instead of the original ones.
How can I do the unpacking of very long strings, and then store them either in a string (made only of 0s and 1s) or in a big array (where each value is either a 0 or 1, the key indicating the location of this bit) ?
have you tried the GMP library? Man page GMP
quick test code:
<?php
$gmpValue1 = gmp_init("1562767628166296698262", 10); // note: using base 10 (decimal)
$gmpValue2 = gmp_init("2163623626362663286446", 10);
$gmpValue3 = gmp_xor($gmpValue1, $gmpValue2);
echo gmp_strval($gmpValue3, 10) . "\n"; // note: using base 10 (decimal)

PHP pack: do not really understand

I posted this (php pack: problems with data types and verification of my results) and found that I had two problems.
So here again only one issue (I solved the other one) Hopefully this is easy to understand:
I want to use the PHP pack() function.
1) My aim is to convert any integer number info a hex one of length 2-Bytes.
Example: 0d37 --> 0x0025
2) Second aim is to toggle high / low byte of each value: 0x0025 --> 0x2500
3) There are many input values which will form 12-Bytes of binary data.
Can anyone help me?
You just have to lookup the format table in the pack() manual page and it is quite easy.
2 bytes means 16 bits, or also called a "short". I assume you want that unsigned ... so we get n for big endian (high) and v for little endian (low) byte order.
The only potentially tricky part is figuring out how to combine the format and parameters, as each format character is tied to a value argument:
bin2hex(pack('nv', 34, 34)) // returns 00222200
If you need a variable number of values, you'll need agument unpacking (a PHP language feature, not to be confused with unpack()):
$format = 'nv';
$values = [34, 34];
pack($format, ... $values); // does the same thing
And alternatively, if all of your values should be packed with the same format, you could do this:
pack('v*', $values); // will "pack" as many short integers as you want

How to iterate over a bit value?

I want to build a chessboard via bitboard system.
Starting with 12 bitboards i want to display a table (chessboard), during loop/iteration a piece must be drawn.
How do i loop through all bitvalues?
I was thinking of something like:
for(i=0;i<64;i++)
draw table / build array / draw empty square
These are my my values to start a game:
function init_game($whitePlayer,$blackPlayer)
{
$WhitePawns = '0000000000000000000000000000000000000000000000001111111100000000';
$WhiteKnights = '0000000000000000000000000000000000000000000000000000000001000010';
$WhiteBishops = '0000000000000000000000000000000000000000000000000000000000100100';
$WhiteRooks = '0000000000000000000000000000000000000000000000000000000010000001';
$WhiteQueens = '0000000000000000000000000000000000000000000000000000000000010000';
$WhiteKing = '0000000000000000000000000000000000000000000000000000000000001000';
$BlackPawns = '0000000011111111000000000000000000000000000000000000000000000000';
$BlackKnights = '0100001000000000000000000000000000000000000000000000000001000010';
$BlackBishops = '0010010000000000000000000000000000000000000000000000000000100100';
$BlackRooks = '1000000100000000000000000000000000000000000000000000000000000000';
$BlackQueens = '0000100000000000000000000000000000000000000000000000000000000000';
$BlackKing = '0001000000000000000000000000000000000000000000000000000000000000';
$WhitePieces = $WhitePawns|$WhiteKnights|$WhiteBishops|$WhiteRooks|$WhiteQueens|$WhiteKing;
$BlackPieces = $BlackPawns|$BlackKnights|$BlackBishops|$BlackRooks|$BlackQueens|$BlackKing;
}
Some people asked me: why bitboard appoach?
Answer:
About bitboard
A bitboard, often used for boardgames such as chess, checkers and othello, is a specialization of the bitset data structure, where each bit represents a game position or state, designed for optimization of speed and/or memory or disk use in mass calculations. Bits in the same bitboard relate to each other in the rules of the game often forming a game position when taken together. Other bitboards are commonly used as masks to transform or answer queries about positions. The "game" may be any game-like system where information is tightly packed in a structured form with "rules" affecting how the individual units or pieces relate.
First you have to check if your PHP version supports 64bit integers, otherwise you will have strange results.
Just run:
echo PHP_INT_MAX;
and if result is 9223372036854775807 then it should work.
You're using strings and I suppose that when you'll do $string | $string in form like you're doing it above then it will be cast as integer with base 10, so the result won't be what you want. Since PHP 5.4 you can use 0b000 notation, for lower PHP version you'll need to keep it in hexadecimal or base 10 format. If you're storing values in DB or somewhere like that and you'll receive value as string or you just want to keep it in format presented above, then you have to use intVal($value, 2) first to cast it properly.
To iterate over the value you can use just for loop (as you suggested):
$value = intVal($WhitePieces,2);
for ($i = 0 ; $i < 64 ; ++$i) {
if ((pow(2,$i) & $value)) {
// draw piece
}
}
You do not have bitvalues, you do have strings. And strings should be difficult to or.
How do you loop? Use an array and foreach.
How do you use 64bit values? Use PHP 5.4 and the binary number format: 0b00001111 => 16 - alternatively express the integer value as hex or decimal, which should be completely ok for a game setup routine that will not change because the rules are known for centuries.
Remember that you have to use a 64Bit system to execute your code, otherwise PHP will be unable to support 64Bit integers, and either treat them as float values, or shorten them to 32Bit values, depending on what you actually do.
Because of all this, I'd suggest NOT to use bit fields for the solution. They seem like a great idea to program more assembler-like, but you are not writing assembler, and will probably pay for this approach with non-optimal performance compared to anything else.

How to compare two 64 bit numbers

In PHP I have a 64 bit number which represents tasks that must be completed. A second 64 bit number represents the tasks which have been completed:
$pack_code = 1001111100100000000000000011111101001111100100000000000000011111
$veri_code = 0000000000000000000000000001110000000000000000000000000000111110
I need to compare the two and provide a percentage of tasks completed figure. I could loop through both and find how many bits are set, but I don't know if this is the fastest way?
Assuming that these are actually strings, perhaps something like:
$pack_code = '1001111100100000000000000011111101001111100100000000000000011111';
$veri_code = '0000000000000000000000000001110000000000000000000000000000111110';
$matches = array_intersect_assoc(str_split($pack_code),str_split($veri_code));
$finished_matches = array_intersect($matches,array(1));
$percentage = (count($finished_matches) / 64) * 100
Because you're getting the numbers as hex strings instead of ones and zeros, you'll need to do a bit of extra work.
PHP does not reliably support numbers over 32 bits as integers. 64-bit support requires being compiled and running on a 64-bit machine. This means that attempts to represent a 64-bit integer may fail depending on your environment. For this reason, it will be important to ensure that PHP only ever deals with these numbers as strings. This won't be hard, as hex strings coming out of the database will be, well, strings, not ints.
There are a few options here. The first would be using the GMP extension's gmp_xor function, which performs a bitwise-XOR operation on two numbers. The resulting number will have bits turned on when the two numbers have opposing bits in that location, and off when the two numbers have identical bits in that location. Then it's just a matter of counting the bits to get the remaining task count.
Another option would be transforming the number-as-a-string into a string of ones and zeros, as you've represented in your question. If you have GMP, you can use gmp_init to read it as a base-16 number, and use gmp_strval to return it as a base-2 number.
If you don't have GMP, this function provided in another answer (scroll to "Step 2") can accurately transform a string-as-number into anything between base-2 and 36. It will be slower than using GMP.
In both of these cases, you'd end up with a string of ones and zeros and can use code like that posted by #Mark Baker to get the difference.
Optimization in this case is not worth of considering. I'm 100% sure that you don't really care whether your scrip will be generated 0.00000014 sec. faster, am I right?
Just loop through each bit of that number, compare it with another and you're done.
Remember words of Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
This code utilizes the GNU Multi Precision library, which is supported by PHP, and since it is implemented in C, should be fast enough, and supports arbitrary precision.
$pack_code = gmp_init("1001111100100000000000000011111101001111100100000000000000011111", 2);
$veri_code = gmp_init("0000000000000000000000000001110000000000000000000000000000111110", 2);
$number_of_different_bits = gmp_popcount(gmp_xor($pack_code, $veri_code));
$a = 11111;
echo sprintf('%032b',$a)."\n";
$b = 12345;
echo sprintf('%032b',$b)."\n";
$c = $a & $b;
echo sprintf('%032b',$c)."\n";
$n=0;
while($c)
{
$n += $c & 1;
$c = $c >> 1;
}
echo $n."\n";
Output:
00000000000000000010101101100111
00000000000000000011000000111001
00000000000000000010000000100001
3
Given your PHP-setuo can handle 64bit, this can be easily extended.
If not you can sidestep this restriction using GNU Multiple Precision
You could also split up the HEx-Representation and then operate on those coresponding parts parts instead. As you need just the local fact of 1 or 0 and not which number actually is represented! I think that would solve your problem best.
For example:
0xF1A35C and 0xD546C1
you just compare the binary version of F and D, 1 and 5, A and 4, ...

Categories