Efficient way for manipulating binary data

Efficient way for manipulating binary data - php

I have a string containing bitmap data. Basically, it holds cycles of 1 byte of red, green, blue for each pixel in an image.
I want to manipulate the 8-bit integer value of each color channel in the bitmap. Currently I do this using unpack('C'.strlen($string)) which gives me an array of integers. This is very slow as it converts a lot of data.
Is there a more efficient way to access and modify data from a string as integers?
By more efficient, I mean more efficient than this:
for($i = 0, $l = strlen($string); $i < $l; $i++)
$string[$i] = chr(floor(ord($string[$i]) / 2));
The manipulation where I divide by 2 is simply an example. The manipulations would be more advanced in a real example.
The problem with the above example is that for say 1 million pixels, you get 3 million function calls initialized from PHP (chr, floor, ord). On my server, this amounts to about 1.4 seconds for an example picture.

PHP strings essentially are byte arrays. The fastest way would simply be:
for ($i = 0, $length = strlen($data); $i < $length; $i++) {
$byte = $data[$i];
...
}
If you manipulate the byte with bitwise operators, you'll hardly get more efficient than that. You can write the byte back into the string with $data[$i] = $newValue. If you need to turn a single byte into an int, use ord(), the other way around use chr().

Related

Convert string to consistent but random 1 of 10 options

I have many strings. Each string something like:
"i_love_pizza_123"
"whatever_this_is_now_later"
"programming_is_awesome"
"stack_overflow_ftw"
...etc
I need to be able to convert each string to a random number, 1-10. Each time that string gets converted, it should consistently be the same number. A sampling of strings, even with similar text should result in a fairly even spread of values 1-10.
My first thought was to do something like md5($string), then break down a-f,0-9 into ten roughly-equal groups, determine where the first character of the hash falls, and put it in that group. But doing so seems to have issues when converting 16 down to 10 by multiplying by 0.625, but that causes the spread to be uneven.
Thoughts on a good method to consistently convert a string to a random/repeatable number, 1-10? There has to be an easier way.

Here's a quick demo how you can do it.
function getOneToTenHash($str) {
$hash = hash('sha256', $str, true);
$unpacked = unpack("L", $hash); // convert first 4 bytes of hash to 32-bit unsigned int
$val = $unpacked[1];
return ($val % 10) + 1; // get 1 - 10 value
}
for ($i = 0; $i < 100; $i++) {
echo getOneToTenHash('str' . $i) . "\n";
}
How it works:
Basically you get the output of a hash function and downscale it to desired range (1..10 in this case).
In the example above, I used sha256 hash function which returns 32 bytes of arbitrary binary data. Then I extract just first 4 bytes as integer value (unpack()).
At this point I have a 4 bytes integer value (0..4294967295 range). In order to downscale it to 1..10 range I just take the remainder of division by 10 (0..9) and add 1.
It's not the only way to downscale the range but an easy one.
So, the above example consists of 3 steps:
get the hash value
convert the hash value to integer
downscale integer range
A much shorter example with crc32() function which returns integer value right away thus allowing us to omit step 2:
function getOneToTenHash($str) {
$int = crc32($str); // 0..4294967295
return ($int % 10) + 1; // 1..10
}

below maybe what u want
$inStr = "hello world";
$md5Str = md5($inStr);
$len = strlen($md5Str);
$out = 0;
for($i=0; $i<$len; $i++) {
$out = 7*$out + intval($md5Str[$i]); // if you want more random, can and random() here
}
$out = ($out % 10 + 9)%10; // scope= [1,10]

Store a playing cards deck in MySQL (single column)

I'm doing a game with playing cards and have to store shuffled decks in MySQL.
What is the most efficient way to store a deck of 52 cards in a single column? And save/retrieve those using PHP.
I need 6 bits to represent a number from 0 to 52 and thus thought of saving the deck as binary data but I've tried using PHP's pack function without much luck. My best shot is saving a string of 104 characters (52 zero-padded integers) but that's far from optimal.
Thanks.

I do agree that it's not necessary or rather impractical to do any of this, but for fun, why not ;)
From your question I'm not sure if you aim to have all cards encoded as one value and stored accordingly or whether you want to encode cards individually. So I assume the former.
Further I assume you have a set 52 cards (or items) that you represent with an integer value between 1 and 52.
I see some approaches, outlined as follows, not all actually for the better of using less space, but included for the sake of being complete:
using a comma separated list (CSV), total length of 9+2*42+51 = 144 characters
turning each card into a character ie a card is represented with 8 bits, total length of 52 characters
encoding each card with those necessary 6 bits and concatenating just the bits without the otherwise lost 2 bits (as in the 2nd approach), total length of 39 characters
treat the card-ids as coefficients in a polynomial of form p(cards) = cards(1)*52^51 + cards(2)*52^50 + cards(3)*52^49 + ... + cards(52)*52^0 which we use to identify the card-set. Roughly speaking p(cards) must lie in the value range of [0,52^52], which means that the value can be represented with log(52^52)/log(2) = 296.422865343.. bits or with a byte sequence of length 37.052 respectively 38.
There naturally are further approaches, taking into account mere practical, technical or theoretical aspects, as is also visible through the listed approaches.
For the theoretical approaches (which I consider the most interesting) it is helpful to know a bit about information theory and entropy. Essentially, depending on what is known about a problem, no further information is required, respectively only the information to clarify all remaining uncertainty is needed.
As we are working with bits and bytes, it mostly is interesting for us in terms of memory usage which practically speaking is bit- or byte-based (if you consider only bit-sequences without the underlying technologies and hardware); that is, bits represent one of two states, ergo allow the differentiation of two states. This is trivial but important, actually :/
then, if you want to represent N states in a base B, you will need log(N) / log(B) digits in that base, or in your example log(52) / log(2) = 5.70.. -> 6 bits. you will notice that actually only 5.70.. bits would be required, which means with 6 bits we actually have a loss.
Which is the moment the problem transformation comes in: instead of representing 52 states individually, the card set as a whole can be represented. The polynomial approach is a way to do this. Essentially it works as it assumes a base of 52, ie where the card set is represented as 1'4'29'31.... or mathematically speaking: 52 + 1 = 1'1 == 53 decimal, 1'1 + 1 = 1'2 == 54 decimal, 1'52'52 + 1 = 2'1'1 == 5408 decimal,
But if you further look at the polynomial-approach you will notice that there is a total of 52^52 possible values whereas we would only ever use 52! = 1*2*3*...*52 because once a card is fixed the remaining possibilities decrease, respectively the uncertainty or entropy decreases. (please note that 52! / 52^52 = 4.7257911e-22 ! which means the polynomial is a total waste of space).
If we now were to use a value in [1,52!] which is pretty much the theoretical minimum, we could represent the card set with log(52!) / log(2) = 225.581003124.. bits = 28.1976.. bytes. Problem with that is, that any of the values represented as such does not contain any structure from which we can derive its semantics, which means that for each of the 52! possible values (well 52! - 1, if you consider the principle of exclusion) we need a reference of its meaning, ie a lookup table of 52! values and that would certainly be a memory overkill.
Although we can make a compromise with the knowledge of the decreasing entropy of an encoded ordered set. As an example: we sequentially encode each card with the minimum number of bits required at that point in the sequence. So assume N<=52 cards remain, then in each step a card can be represented in log(N)/log(2) bits, meaning that the number of required bits decreases, until for the last card, you don't need a bit in the first place. This would give about (please correct)..
20 * 6 bits + 16 * 5 bits + 8 * 4 bits + 4 * 3 bits + 2 * 2 bits + 1 bit = 249 bits = 31.125.. bytes
But still there would be a loss because of the partial bits used unnecessarily, but the structure in the data totally makes up for that.
So a question might be, hej can we combine the polynomial with this??!?11?! Actually, I have to think about that, I'm getting tired.
Generally speaking, knowing about the structure of a problem drastically helps decreasing the necessary memory space. Practically speaking, in this day and age, for your average high-level developer such low level considerations are not so important (hej, 100kByte of wasted space, so what!) and other considerations are weighted higher; also because the underlying technologies are often reducing memory usage by themselves, be it your filesystem or gzip-ed web-server responses, etc. The general knowledge of these kind of things though is still helpful in creating your services and datastructures.
But these latter approaches are very problem-specific "compression procedures"; general compression works differently, where as an example approach the procedures sequencially run through the bytes of the data and for any unseen bit sequences add these to a lookup table and represent the actual sequence with an index (as a sketch).
Well enough of funny talk, let's get technical!
1st approach "csv"
// your unpacked card set
$cards = range(1,52);
$coded = implode(',',$cards);
$decoded = explode(',',$coded);
2nd approach: 1 card = 1 character
// just a helper
// (not really necessary, but using this we can pretty print the resulting string)
function chr2($i)
{
return chr($i + 32);
}
function myPack($cards)
{
$ar = array_map('chr2',$cards);
return implode('',$ar);
}
function myUnpack($str)
{
$set = array();
$len = strlen($str);
for($i=0; $i<$len; $i++)
$set[] = ord($str[$i]) - 32; // adjust this shift along with the helper
return $set;
}
$str = myPack($cards);
$other_cards = myUnpack($str);
3rd approach, 1 card = 6 bits
$set = ''; // target string
$offset = 0;
$carry = 0;
for($i=0; $i < 52; $i++)
{
$c = $cards[$i];
switch($offset)
{
case 0:
$carry = ($c << 2);
$next = null;
break;
case 2:
$next = $carry + $c;
$carry = 0;
break;
case 4:
$next = $carry + ($c>>2);
$carry = ($c << 6) & 0xff;
break;
case 6:
$next = $carry + ($c>>4);
$carry = ($c << 4) & 0xff;
break;
}
if ($next !== null)
{
$set .= chr($next);
}
$offset = ($offset + 6) % 8;
}
// and $set it is!
$new_cards = array(); // the target array for cards to be unpacked
$offset = 0;
$carry = 0;
for($i=0; $i < 39; $i++)
{
$o = ord(substr($set,$i,1));
$new = array();
switch($offset)
{
case 0:
$new[] = ($o >> 2) & 0x3f;
$carry = ($o << 4) & 0x3f;
break;
case 4:
$new[] = (($o >> 6) & 3) + $carry;
$new[] = $o & 0x3f;
$carry = 0;
$offset += 6;
break;
case 6:
$new[] = (($o >> 4) & 0xf) + $carry;
$carry = ($o & 0xf) << 2;
break;
}
$new_cards = array_merge($new_cards,$new);
$offset = ($offset + 6) % 8;
}
4th approach, the polynomial, just outlined (please consider using bigints because of the integer overflow)
$encoded = 0;
$base = 52;
foreach($cards as $c)
{
$encoded = $encoded*$base + $c;
}
// and now save the binary representation
$decoded = array();
for($i=0; $i < 52; $i++)
{
$v = $encoded % $base;
$encoded = ($encoded - $v) / $base;
array_shift($v, $decoded);
}

Convert random text to a number with PHP

I need to convert a random text to a number. But the ramdom text has always to be converted to the same number. For example:
xxxx -> 10
testing -> 396
stackoverflow -> 72
I cant use the number of characters to convert the string cause if I have 2 strings with the same number characters they need to have a different number (at most times at least).
I do not need to have this number in a range. No! It can be any number, since it will always be the same given a certain string.

You could try using hashes (md5, sha1, etc):
$number = hexdec( md5("hello world") );
$number = hexdec( sha1("hello world") );
Hashes of the same string will transform to the same number.

What about;
$number = crc32($string);
Should be cheap, gives integer output, and produce reasonable randomness for your use case.

Other methods that have been shown have the potential of having collisions. The following should not.
$num = "";
for($i = 0; $i < strlen($str); $i++)
$num .= str_pad(ord($str[$i]), 3 "0", STR_PAD_LEFT);
return $num;

Generating unique 6 digit code

I'm generating a 6 digit code from the following characters. These will be used to stamp on stickers.
They will be generated in batches of 10k or less (before printing) and I don't envisage there will ever be more than 1-2 million total (probably much less).
After I generate the batches of codes, I'll check the MySQL database of existing codes to ensure there are no duplicates.
// exclude problem chars: B8G6I1l0OQDS5Z2
$characters = 'ACEFHJKMNPRTUVWXY4937';
$string = '';
for ($i = 0; $i < 6; $i++) {
$string .= $characters[rand(0, strlen($characters) - 1)];
}
return $string;
Is this a solid approach to generating the code?
How many possible permutations would there be? (6 Digit code from pool of 21 characters). Sorry math isn't my strong point

21^6 = 85766121 possibilities.
Using a DB and storing used values is bad. If you want to fake randomness you can use the following:
Reduce to 19 possible numbers and make use of the fact that groups of order p^k where p is an odd prime are always cyclic.
Take the group of order 7^19, using a generator co-prime to 7^19 (I'll pick 13^11, you can choose anything not divisible by 7).
Then the following works:
$previous = 0;
function generator($previous)
{
$generator = pow(13,11);
$modulus = pow(7,19); //int might be too small
$possibleChars = "ACEFHJKMNPRTUVWXY49";
$previous = ($previous + $generator) % $modulus;
$output='';
$temp = $previous;
for($i = 0; $i < 6; $i++) {
$output += $possibleChars[$temp % 19];
$temp = $temp / 19;
}
return $output;
}
It will cycle through all possible values and look a little random unless they go digging. An even safer alternative would be multiplicative groups but I forget my math already :(

There is a lot of possible combination with or without repetition so your logic would be sufficient
Collision would be frequent because you are using rand see str_shuffle and randomness.
Change rand to mt_rand
Use fast storage like memcached or redis not MySQL when checking
Total Possibility
21 ^ 6 = 85,766,121
85,766,121 should be ok , To add database to this generation try:
Example
$prifix = "stamp.";
$cache = new Memcache();
$cache->addserver("127.0.0.1");
$stamp = myRand(6);
while($cache->get($prifix . $stamp)) {
$stamp = myRand(6);
}
echo $stamp;
Function Used
function myRand($no, $str = "", $chr = 'ACEFHJKMNPRTUVWXY4937') {
$length = strlen($chr);
while($no --) {
$str .= $chr{mt_rand(0, $length- 1)};
}
return $str;
}

as Baba said generating a string on the fly will result in tons of collisions. the closer you will go to 80 millions already generated ones the harder it will became to get an available string
another solution could be to generate all possible combinations once, and store each of them in the database already, with some boolean column field that marks if a row/token is already used or not
then to get one of them
SELECT * FROM tokens WHERE tokenIsUsed = 0 ORDER BY RAND() LIMIT 0,1
and then mark it as already used
UPDATE tokens SET tokenIsUsed = 1 WHERE token = ...

You would have 21 ^ 6 codes = 85 766 121 ~ 85.8 million codes!
To generate them all (which would take some time), look at the selected answer to this question: algorithm that will take numbers or words and find all possible combinations.

I had the same problem, and I found very impressive open source solution:
http://www.hashids.org/php/
You can take and use it, also it's worth it to look in it's source code to understand what's happening under the hood.

Or... you can encode username+datetime in md5 and save to database, this for sure will generate an unique code ;)

Is this algorithm reversible?

I have this algorithm written in PHP for my project:
<?php
$s = "abc"; //input -- string
$n = strlen($s);
$b = 0;
for ($i = 0; $i < $n; $i++)
{
$b += ord($s[$i]) * pow(31, ($n - ($i + 1)));
}
echo $b; //output -- int
?>
But now I have to reverse it to take the string from integer. I tried but it failed, is there any way to reverse it?
EDIT: By "any way" I meant that it doesn't have to reverse to the original text, but only to reverse to text that gives that value.

no, it's not...
easier example: let's assign every letter a value: a=1,b=2,c=3,d=4 etc...
and here we go: you have "5" - you don't know whether it is "ad" or "bba" or "bc" etc.

IF the string can be guaranteed to only have lowercase letters on it, it can; you'll have to figure out the maths for it (I suggest you work out in paper the algorithm, leaving the letters as variables; solve the equations and you'll see how to reverse it).
If the string is arbitrary, then no; because you're converting each character to a base-31 representation of the number, shifting it and adding the results -- however, this addition has lots of carries, so you can't work out the original characters just from the digits (that is, the final number) of the result.
EDIT: given your edit, then yes, it is possible. It can be a bit complicated, though -- i'll leave you to work out the maths yourself. Try juggling a bit with the number 31.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Efficient way for manipulating binary data - php

Related

Convert string to consistent but random 1 of 10 options

Store a playing cards deck in MySQL (single column)

Convert random text to a number with PHP

Generating unique 6 digit code

Is this algorithm reversible?

Categories

Resources