Two-way hashing of fixed range numbers - php

I need to create a function which takes a single integer as argument in the range 0-N and returns a seemingly random number in the same range.
Each input number should always have exactly one output and it should always be the same.
Such a function would produce something like this:
f(1) = 4
f(2) = 1
f(3) = 5
f(4) = 2
f(5) = 3
I believe this could be accomplished by some kind of a hashing algorithm? I don't need anything complex, just not something too simple like f(1) = 2, f(2) = 3 etc.
The biggest issue is that I need this to be reversible. E.g. the above table should be true left-to-right as well as right-to-left, using a different function for the right-to-left conversion is fine.
I know the easiest way is to create an array, shuffle it and just store the relations in a db or something, but as I need N to be quite large I'd like to avoid this if possible.
Edit: For my particular case N is a specific number, it's exactly 16777216 (64^4).

If the range is always a power of two -- like [0,16777216) -- then you can use exclusive-or just as #MarkBaker suggested. It just doesn't work so easily if your range is not a power of two.
You can use addition and subtraction modulo N, although these alone are too obvious, so you have to combine it with something else.
You can also do multiplication modulo-N, but reversing that is complicated. To make it simpler, we can isolate the bottom eight bits and multiply those and add them in a way that doesn't interfere with those bits so we can use them again to reverse the operation.
I don't know PHP so I'm going to give an example in C, instead. Maybe it's the same.
int enc(int x) {
x = x + 4799 * 256 * (x % 256);
x = x + 8896843;
x = x ^ 4777277;
return (x + 1073741824) % 16777216;
}
And to decode, play the operations back in reverse order:
int dec(int x) {
x = x + 1073741824;
x = x ^ 4777277;
x = x - 8896843;
x = x - 4799 * 256 * (x % 256);
return x % 16777216;
}
That 1073741824 must be a multiple of N, and 256 must be a factor of N, and if N is not a power of two then you can't (necessarily) use exclusive-or (^ is exclusive-or in C and I assume in PHP too). The other numbers you can fiddle with, and add and remove stages, at your leisure.
The addition of 1073741824 in both functions is to ensure that x stays positive; this is so that the modulo operation doesn't ever give a negative result, even after we've subtracted values from x which might have made it go negative in the interim.

I offered to describe how I "randomly" scramble up 9-digit SSNs when producing research data sets. This does not replace or hash an SSN. It re-orders the digits. It is difficult to put the digits back in the correct order if you don't know the order in which they were scrambled. I have a gut feeling that this is not what the questioner really wants. So, I am happy to delete this answer if it is deemed off-topic.
I know that I have 9 digits. So, I start with an array that has 9 index values in order:
$a = array(0,1,2,3,4,5,6,7,8);
Now, I need to turn a key that I can remember into a way to shuffle the array. The shuffling has to be the same order for the same key every time. I use a couple tricks. I use crc32 to turn a word into a number. I use srand/rand to get a predictable order of random values. Note: mt_rand no longer produces the same sequence of random digits with the same seed, so I have to use rand.
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
The array $a still has the digits 0 through 8, but they are shuffled. If I use the same keyword I will get the same shuffled order every time. That lets me repeat this every month and get the same result. Then, with a shuffled array, I can pick the digits off the SSN. First, I ensure it has 9 characters (some SSNs are sent as integers and a leading 0 is omitted). Then, I build a masked SSN by picking the digits using $a.
$ssn = str_pad($ssn, 9, '0', STR_PAD_LEFT);
$masked_ssn = '';
foreach($a as $i) $masked_ssn.= $ssn{$i};
$masked_ssn will now have all the digits in $ssn, but in a different order. Technically, there are keywords that make $a become the original ordered array after shuffling, but that is very very rare.
Hopefully this makes sense. If so, you can do it all much faster. If you turn the original string into an array of characters, you can shuffle the array of characters. You just need to reseed rand every time.
$ssn = "111223333"; // Assume I'm using a proper 9-digit SSN
$a = str_split($ssn);
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
$masked_ssn = implode('', $a);
This is not really faster in a runtime way because rand is a rather expensive function and you run rand a hell of lot more here. If you are masking thousands of values as I do, you will want to use an index array that is shuffled just once, not a shuffling for every value.
Now, how do I undo it? Assume I'm using the first method with the index array. It will be something like $a = {5, 3, 6, 1, 0, 2, 7, 8, 4}. Those are the indexes for the original SSN in the masked order. So, I can easily build the original SSN.
$ssn = '000000000'; // I like to define all 9 characters before I start
foreach($a as $i=>$j) $ssn[$j] = $masked_ssn{$i};
As you can see, $i counts from 0 to 8 across the masked SSN. $j counts 5, 3, 6... and puts each value from the masked SSN in the correct place in the original SSN.

Looks like you've got good answer, but still there is an alternative. Linear Congruential Generator (LCG) could provide 1-to-1 mapping and it is known to be a reversible using Euclid's algorithm. For 24bit
Xi = [(A * Xi-1) + C] Mod M
where M = 2^24 = 16,777,216
A = 16,598,013
C = 12,820,163
For LCG reversability take a look at Reversible pseudo-random sequence generator

Related

how to create a row of digits based on a string in php [duplicate]

In php is there a way to give a unique hash from a string, but that the hash was made up from numbers only?
example:
return md5(234); // returns 098f6bcd4621d373cade4e832627b4f6
but I need
return numhash(234); // returns 00978902923102372190
(20 numbers only)
the problem here is that I want the hashing to be short.
edit:
OK let me explain the back story here.
I have a site that has a ID for every registered person, also I need a ID for the person to use and exchange (hence it can't be too long), so far the ID numbering has been 00001, 00002, 00003 etc...
this makes some people look more important
this reveals application info that I don't want to reveal.
To fix point 1 and 2 I need to "hide" the number while keeping it unique.
Edit + SOLUTION:
Numeric hash function based on the code by https://stackoverflow.com/a/23679870/175071
/**
* Return a number only hash
* https://stackoverflow.com/a/23679870/175071
* #param $str
* #param null $len
* #return number
*/
public function numHash($str, $len=null)
{
$binhash = md5($str, true);
$numhash = unpack('N2', $binhash);
$hash = $numhash[1] . $numhash[2];
if($len && is_int($len)) {
$hash = substr($hash, 0, $len);
}
return $hash;
}
// Usage
numHash(234, 20); // always returns 6814430791721596451
An MD5 or SHA1 hash in PHP returns a hexadecimal number, so all you need to do is convert bases. PHP has a function that can do this for you:
$bignum = hexdec( md5("test") );
or
$bignum = hexdec( sha1("test") );
PHP Manual for hexdec
Since you want a limited size number, you could then use modular division to put it in a range you want.
$smallnum = $bignum % [put your upper bound here]
EDIT
As noted by Artefacto in the comments, using this approach will result in a number beyond the maximum size of an Integer in PHP, and the result after modular division will always be 0. However, taking a substring of the hash that contains the first 16 characters doesn't have this problem. Revised version for calculating the initial large number:
$bignum = hexdec( substr(sha1("test"), 0, 15) );
You can try crc32(). See the documentation at: http://php.net/manual/en/function.crc32.php
$checksum = crc32("The quick brown fox jumped over the lazy dog.");
printf("%u\n", $checksum); // prints 2191738434
With that said, crc should only be used to validate the integrity of data.
There are some good answers but for me the approaches seem silly.
They first force php to create a Hex number, then convert this back (hexdec) in a BigInteger and then cut it down to a number of letters... this is much work!
Instead why not
Read the hash as binary:
$binhash = md5('[input value]', true);
then using
$numhash = unpack('N2', $binhash); //- or 'V2' for little endian
to cast this as two INTs ($numhash is an array of two elements). Now you can reduce the number of bits in the number simply using an AND operation. e.g:
$result = $numhash[1] & 0x000FFFFF; //- to get numbers between 0 and 1048575
But be warned of collisions! Reducing the number means increasing the probability of two different [input value] with the same output.
I think that the much better way would be the use of "ID-Crypting" with a Bijectiv function. So no collisions could happen! For the simplest kind just use an Affine_cipher
Example with max input value range from 0 to 25:
function numcrypt($a)
{
return ($a * 15) % 26;
}
function unnumcrypt($a)
{
return ($a * 7) % 26;
}
Output:
numcrypt(1) : 15
numcrypt(2) : 4
numcrypt(3) : 19
unnumcrypt(15) : 1
unnumcrypt(4) : 2
unnumcrypt(19) : 3
e.g.
$id = unnumcrypt($_GET('userid'));
... do something with the ID ...
echo ' go ';
of course this is not secure, but if no one knows the method used for your encryption then there are no security reasons then this way is faster and collision safe.
The problem of cut off the hash are the collisions, to avoid it try:
return hexdec(crc32("Hello World"));
The crc32():
Generates the cyclic redundancy checksum polynomial of 32-bit lengths
of the str. This is usually used to validate the integrity of data
being transmitted.
That give us an integer of 32 bit, negative in 32 bits installation, or positive in the 64 bits. This integer could be store like an ID in a database. This donĀ“t have collision problems, because it fits into 32bits variable, once you convert it to decimal with the hexdec() function.
First of all, md5 is basically compromised, so you shouldn't be using it for anything but non-critical hashing.
PHP5 has the hash() function, see http://www.php.net/manual/en/function.hash.php.
Setting the last parameter to true will give you a string of binary data. Alternatively, you could split the resulting hexadecimal hash into pieces of 2 characters and convert them to integers individually, but I'd expect that to be much slower.
Try hashid.
It hash a number into format you can define. The formats include how many character, and what character included.
Example:
$hashids->encode(1);
Will return "28630" depends on your format,
Just use my manual hash method below:
Divide the number (e.g. 6 digit) by prime values, 3,5,7.
And get the first 6 values that are in the decimal places as the ID to be used. Do a check on uniqueness before actual creation of the ID, if a collision exists, increase the last digit by +1 until a non collision.
E.g. 123456 gives you 771428
123457 gives you 780952
123458 gives you 790476.

Store many numbers as a single unique number

I have the necessity to store many numbers (i can decide which numbers) as a single unique number from which i should be able to retrieve the original number.
I already know 2 ways to do this:
1) Fundamental theorem of arithmetic (Prime Numbers)
Say i have 5 values, i assign a prime number other than 1 to each value
a = 2
b = 3
c = 5
d = 7
e = 13
If i want to store a, b and c i can multiply them 2*3*5=30 and i know no other product of primes can be 30. Then to check if a value contains, for example, b, all i need to do is 30 % b == 0
2) Bitmask
Just like Linux permissions, use powers of 2 and sum each value
But these 2 methods grow up fast (1st way faster than 2nd), and using prime numbers requires me to have a lot of primes.
Is there any other method to do this efficiently when you have, for example, a thousand values?
If you are storing, say, base 10 numbers, then do a conversion through base 11 numbers. With the increased base, you have an extra 'digit'. Use that digit as a separator. So, three base 10 numbers "10, 42, 457" become "10A42A457": a single base 11 number (with 'A' as the additional digit).
Whatever base your original numbers are in, increase the base by 1 and concatenate, using the extra digit as a separator. That will give you a single number in the increased base.
That single number can be stored in whatever number base you find convenient: binary, denary or hex for example.
To retrieve your original numbers just convert to base 11 (or whatever) and replace the extra digit with separators.
ETA: You don't have to use base 11. The single number "10A42A457" is also a valid hexadecimal number, so any base of 11 or above could be used. Hex may be easier to work with than base 11.
Is there any other method to do this efficiently when you have, for example, a thousand values?
I an not a mathematician but it's basic math, all depends on range
Range 0-1: You want to store 4 numbers 0-1 - it's basically binary system
Number1 + Number2 * 2^1 + Number3 * 2^2 + Number4 * 2^3
Range 0-50 You want to store 4 numbers 0-49
Number1 + Number2 * 50^1 + Number3 * 50^2 + Number4 * 50^3
Range 0-X You want to store N numbers 0-X
Number1 + Number2 * (X+1)^1 + Number3 * (X+1)^2 + ... + NumberN * (X+1)^(N-1)
If you have no pattern for your numbers (so it can get compressed in some way) there is really no other way.
It's also super easy for computer to resolve the number unlike the prime numbers
Predetermined values
#FlorainK comment pointed me to fact I missed
(i can decide which numbers)
The only logical solution is give your numbers references
0 is 15342
1 is 6547
2 is 76234
3 is "i like stack overflow"
4 is 42141
so you'll work range 0-4 (5 options) and whatever combination length. Use reference when "encoding" and "decoding" the number
a thousand values?
so you'll work with Range 0-999
0 is 62342
1 is 7456345653
2 is 45656234532
...
998 is 7623452
999 is 4324234326453
Let's say you use 64-bit system and programming/db language that works with 64-bit integers
2^64 = 18446744073709551616
your max range is 1000^X < 18446744073709551616 where X is number of numbers you can store in one single 64-bit integer number
Which is only 6.
You can store only 6 separate numbers 0-999 that will fit one 64-bit integer number.
0,0,0,0,0,0 is 0
1,0,0,0,0,0 is 1
0,1,0,0,0,0 is 1000
999,999,999,999,999,999 is ~1e+18
Ok so you want to store "a,b,c" or "a,b" or "a,b,c,d" or "a" etc. (thanks #FlorianK)
in such case just could use bitwise operators and powers of two
$a = 1 << 0; // 1
$b = 1 << 1; // 2
$c = 1 << 2; // 4
$d = 1 << 3; // 8
.. etc
let's say $flag has $a and $c
$flag = $a | $c; // $flag is integer here
now check it
$ok = ($flag & $a) && ($flag & $c); // true
$ok = ($flag & $a) && ($flag & $b); // false
so in 64 bit system/language/os you can use up to 64 flags which gives you a 2^64 combinations
there is no really other option. prime numbers are much worse for this as you skip many numbers in-between while binary system uses every single number.
I see you are using database and you want to store this in DB.
I really think we are dealing here with XY Problem and you should reconsider your application instead of making such workarounds.

What is the best algorithm to see if my number is in an array of ranges?

I have a 2 dimensional arrays in php containing the Ranges. for example:
From.........To
---------------
125..........3957
4000.........5500
5217628......52198281
52272128.....52273151
523030528....523229183
and so on
and it is a very long list. now I want to see if a number given by user is in range.
for example numbers 130, 4200, 52272933 are in my range but numbers 1, 5600 are not.
of course I can count all indexes and see if my number is bigger than first and smaller than second item. but is there a faster algorithm or a more efficient way of doing it using php function?
added later
It is sorted. it is actually numbers created with ip2long() showing all IPs of a country.
I just wrote a code for it:
$ips[1] = array (2,20,100);
$ips[2] = array (10,30,200);
$n=11;// input ip
$count = count($ips);
for ($i = 0; $i <= $count; $i++) {
if ($n>=$ips[1][$i]){
if ($n<=$ips[2][$i]){
echo "$i found";
break;
}
}else if($n<$ips[1][$i]){echo "not found";break;}
}
in this situation numbers 2,8,22,and 200 are in range. but not numbers 1,11,300
Put the ranges in a flat array, sorted from lower to higher, like this:
a[0] = 125
a[1] = 3957
a[2] = 4000
a[3] = 5500
a[4] = 5217628
a[5] = 52198281
a[6] = 52272128
a[7] = 52273151
a[8] = 523030528
a[9] = 523229183
Then do a binary search to determine at what index of this array the number in question should be inserted. If the insertion index is even then the number is not in any sub-range. If the insertion index is odd, then the number falls inside one of the ranges.
Examples:
n = 20 inserts at index 0 ==> not in a range
n = 126 inserts at index 1 ==> within a range
n = 523030529 inserts at index 9 ==> within a range
You can speed things up by implementing a binary search algorithm. Thus, you don't have to look at every range.
Then you can use in_array to check if the number is in the array.
I'm not sure if I got you right, do your arrays really look like this:
array(125, 126, 127, ..., 3957);
If so, what's the point? Why not just have?
array(125, 3957);
That contains all the information necessary.
The example you give suggests that the numbers may be large and the space sparse by comparison.
At that point, you don't have very many options. If the array is sorted, binary search is about all there is. If the array is not sorted, you're down to plain, old CS101 linear search.
The correct data structure to use for this problem is an interval tree. This is, in general, much faster than binary search.
I am assuming that the ranges do not overlap.
If that is the case, you can maintain a map data structure that is keyed on the lower value of the range.
Now all you have to do (given the number N) is to find the key in the map that is just lower than N (using binary search - logarithmic complexity) and then check if the number is lesser than the right value.
Basically, it is a binary search (logarithmic) on the constructed map.
From a pragmatic point of view, a linear search may very well turn out to be the fastest lookup method. Think of page faults and hard disk seek time here.
If your array is large enough (whatever "enough" actually means), it may be wise to stuff your IPs in a SQL database and let the database figure out how to efficiently compute SELECT ID FROM ip_numbers WHERE x BETWEEN start AND end;.

PHP: Permutations of 26 letters in X characters

I have an alphabet array with 26 characters A..Z .
I am searching for a performant algorithm that lists all permutations that fill an array of length X without any repeating characters.
Examples:
X=3 . Target array: _ _ _
Permutations are A B C until Z Y X .
X=4 . Target array: _ _ _ _
Permutations are A B C D until Z Y X W
X=5 . Target array: _ _ _ _ _
Permutations are A B C D E until Z Y X W V
(Sorry, I don't know how this kind of algorithm is named)
Thanks in advance.
Code in C, Delphi or Java is also OK, since it can be easy translated.
A simple solution is a recursive one
char current_combination[27];
int char_used[26];
void enumerate(int i, int n)
{
for (int j=0; j<26; j++)
{
if (!char_used[j])
{
char_used[j] = 1;
current_combination[i] = 'A' + j;
if (i+1 == n)
{
puts(current_combination);
}
else
{
enumerate(i+1, n);
}
char_used[j] = 0;
}
}
}
The function above accepts the index i of the character to be computed and the total number n of characters in a combination (the code assumes i<n). It keeps the current combination and the array of flags for already used variables in globals to avoiding copying them around.
To generate for example all combinations of length 5 call enumerate(0, 5).
Note that the total number of combinations grows very fast. For example for n=6 there are 165,765,600 combinations, with more than 1Gb of output.
I'd take a simple brute force approach though do understand the number of permutations can get up there as the number is 26!/(26-x)! which can be rather large as for 3 there are 15,600 permutations and for 5 there are 7,893,600 permutations which isn't exactly small. Basically you could just loop through all the values with loops in loops that unfortunately would be O(n^x) where x is the number of characters since the nesting of loops causing the complexity jump.
Something to consider is how finely are you examining complexity here. For example, while you could consider ways to go about being clever in the first pair of loops to avoid duplication, the third loop in becomes a bit trickier though if you started with a List of 26 letters and removed the previous ones, this would make the last loop be a simply iterative as you know there isn't any duplicates though this can be expensive in terms of memory consumed in having to make copies of the list on each pass from the outer loop. Thus the first time, you'd go through AB_ and then AC_ and so forth, but the copying of the list may be where this gets expensive in terms of operations as there would be thousands of times that the list is copied that one could wonder if that is more efficient than doing comparisons.
Are you sure you want to see all permutations? If you have X=3, you will have 26*25*24 combinations = 15600. And if X=5 number of combinations is equal to 7893600.
You need to randomly select one letter(or array index) and store it somewhere and on each iteration you should check if this letter(or index) has been already selected on one of the previous iteration. After this you will get random sequence which length is X characters. You need to store it too. Then you need to reapeat all operation made on the previous step and also you have to check if there is random sequense with subsequence you have been generating now.
Or you could use direct enumeration.
Sorry for not satisfactory english. I tried to be clear.
I hope it will be usefull.

How to compare two 64 bit numbers

In PHP I have a 64 bit number which represents tasks that must be completed. A second 64 bit number represents the tasks which have been completed:
$pack_code = 1001111100100000000000000011111101001111100100000000000000011111
$veri_code = 0000000000000000000000000001110000000000000000000000000000111110
I need to compare the two and provide a percentage of tasks completed figure. I could loop through both and find how many bits are set, but I don't know if this is the fastest way?
Assuming that these are actually strings, perhaps something like:
$pack_code = '1001111100100000000000000011111101001111100100000000000000011111';
$veri_code = '0000000000000000000000000001110000000000000000000000000000111110';
$matches = array_intersect_assoc(str_split($pack_code),str_split($veri_code));
$finished_matches = array_intersect($matches,array(1));
$percentage = (count($finished_matches) / 64) * 100
Because you're getting the numbers as hex strings instead of ones and zeros, you'll need to do a bit of extra work.
PHP does not reliably support numbers over 32 bits as integers. 64-bit support requires being compiled and running on a 64-bit machine. This means that attempts to represent a 64-bit integer may fail depending on your environment. For this reason, it will be important to ensure that PHP only ever deals with these numbers as strings. This won't be hard, as hex strings coming out of the database will be, well, strings, not ints.
There are a few options here. The first would be using the GMP extension's gmp_xor function, which performs a bitwise-XOR operation on two numbers. The resulting number will have bits turned on when the two numbers have opposing bits in that location, and off when the two numbers have identical bits in that location. Then it's just a matter of counting the bits to get the remaining task count.
Another option would be transforming the number-as-a-string into a string of ones and zeros, as you've represented in your question. If you have GMP, you can use gmp_init to read it as a base-16 number, and use gmp_strval to return it as a base-2 number.
If you don't have GMP, this function provided in another answer (scroll to "Step 2") can accurately transform a string-as-number into anything between base-2 and 36. It will be slower than using GMP.
In both of these cases, you'd end up with a string of ones and zeros and can use code like that posted by #Mark Baker to get the difference.
Optimization in this case is not worth of considering. I'm 100% sure that you don't really care whether your scrip will be generated 0.00000014 sec. faster, am I right?
Just loop through each bit of that number, compare it with another and you're done.
Remember words of Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
This code utilizes the GNU Multi Precision library, which is supported by PHP, and since it is implemented in C, should be fast enough, and supports arbitrary precision.
$pack_code = gmp_init("1001111100100000000000000011111101001111100100000000000000011111", 2);
$veri_code = gmp_init("0000000000000000000000000001110000000000000000000000000000111110", 2);
$number_of_different_bits = gmp_popcount(gmp_xor($pack_code, $veri_code));
$a = 11111;
echo sprintf('%032b',$a)."\n";
$b = 12345;
echo sprintf('%032b',$b)."\n";
$c = $a & $b;
echo sprintf('%032b',$c)."\n";
$n=0;
while($c)
{
$n += $c & 1;
$c = $c >> 1;
}
echo $n."\n";
Output:
00000000000000000010101101100111
00000000000000000011000000111001
00000000000000000010000000100001
3
Given your PHP-setuo can handle 64bit, this can be easily extended.
If not you can sidestep this restriction using GNU Multiple Precision
You could also split up the HEx-Representation and then operate on those coresponding parts parts instead. As you need just the local fact of 1 or 0 and not which number actually is represented! I think that would solve your problem best.
For example:
0xF1A35C and 0xD546C1
you just compare the binary version of F and D, 1 and 5, A and 4, ...

Categories