Recently I wanted to search an array for numeric values (ints, doubles, and numbers with exponent notation) as quickly as possible.
I initially used is_numeric(), since that is our usual go-to check, but I wanted to see whether I could use something faster.
I noticed that if I cast to float, PHP produces a value other than zero as long as the value is numeric. So, using the bitwise operators, I can do a logical NOT-zero check in the if statement surrounding the search:
if (~(~(float)$value)) {
// add to result array
}
After initial testing, things seemed to speed up by a whole 2 seconds with a moderately sized array of numerics and non-numerics. However, this was little more than a simple unit test.
Does anyone have experience with the performance of casting to float vs. is_numeric()? I know they're probably not 100% functionally equivalent (I think the cast to float would convert hexadecimal), but for my purposes I'm only going to be casting ints, doubles and numbers in exponent notation. Is this a real performance gain over is_numeric(), or have I imagined it?
Warning!
is_numeric() is not just a whim. Here is a small piece of code that shows the error your type conversion makes. Many attacks on PHP use strings that look like numbers but also carry injected code.
Code:
<?php
$a="1809809808908099878758765<?php echo \"I powned you\"; ?>";
echo is_numeric($a)?"yes":"no"; // out no
echo "\n";
echo (~(~(float)$a))?"Yes":"No"; // out Yes
If you do it that way you may gain performance, but depending on what you do with the values, you could open a security hole!
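To see concretely where the two checks diverge, here is a small sketch comparing is_numeric() with the cast-based test. It relies on the fact that `~` converts a float operand to an integer first, so for in-range values `~(~(float)$value)` behaves like `(int)(float)$value`; the helper name cast_check() is hypothetical.

```php
<?php
// Hypothetical helper: the cast-based test written without the double ~.
// ~ converts a float operand to int, so ~(~(float)$v) acts like (int)(float)$v.
function cast_check(string $value): bool
{
    return (int) (float) $value !== 0;
}

$samples = ['42', '1e3', '0', '0.5', '12abc', 'abc'];
foreach ($samples as $s) {
    printf(
        "%-6s is_numeric: %-5s cast: %s\n",
        $s,
        is_numeric($s) ? 'true' : 'false',
        cast_check($s) ? 'true' : 'false'
    );
}
```

'0' and '0.5' are numeric but fail the cast test because they truncate to 0, while '12abc' passes it because the cast reads the leading digits and silently ignores the rest.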
I'm integrating a PHP application with an API that uses permissions in the form of 0x00000008, 0x20000000, etc., where each of these represents a given permission. Their API returns an integer. In PHP, how do I interpret something like 0x20000000 as an integer so I can use bitwise operators?
Another dev told me he thought these numbers were hex notation, but googling that for PHP I'm finding limited results. What kind of numbers are these, and how can I take an integer set of permissions and use bitwise operators to determine whether the user has permission 0x00000008?
As stated in the PHP documentation, much like in C, this format is recognized by PHP as a native integer (in hexadecimal notation).
<?php
echo 0xDEADBEEF; // outputs 3735928559
Demo : https://3v4l.org/g1ggf
Ref : http://php.net/manual/en/language.types.integer.php#language.types.integer.syntax
So you can perform any bitwise operation on them, since they are plain integers, bearing in mind the integer size of your PHP build (nowadays you are most likely on a 64-bit build; if in doubt, you can confirm it with phpinfo() or by checking PHP_INT_SIZE).
echo 0x00000001<<3; // 1*2*2*2, outputs 8
Demo : https://3v4l.org/AKv7H
Ref : http://php.net/manual/en/language.operators.bitwise.php
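Applied to the question, checking a single permission is a bitwise AND against the flag. A minimal sketch, assuming the API hands you the permission set as a native integer; the function name has_permission() and the sample value are hypothetical:

```php
<?php
// Hypothetical helper: true when every bit of $flag is set in $perms.
function has_permission(int $perms, int $flag): bool
{
    return ($perms & $flag) === $flag;
}

$userPerms = 0x20000008; // example value with two permission bits set

var_dump(has_permission($userPerms, 0x00000008)); // bool(true)
var_dump(has_permission($userPerms, 0x00000004)); // bool(false)

$granted = $userPerms | 0x00000004;  // set a bit
$revoked = $userPerms & ~0x00000008; // clear a bit
printf("0x%08X 0x%08X\n", $granted, $revoked); // 0x2000000C 0x20000000
```

Comparing against $flag, rather than just testing for non-zero, keeps the helper correct when $flag combines several bits.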
As suggested by @iainn and @SaltyPotato in their respective answer and comment, since you are reading the data from an API you might receive these values as strings instead of native integers. In that case, convert them with the hexdec() function before use.
Ref : http://php.net/manual/en/function.hexdec.php
If you receive it from the API as a string such as '0x00000008', PHP will treat it as a string, not as an integer literal, so convert it with hexdec():
echo hexdec('0x00000008');
returns the integer 8
This should work.
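If the API sends several flags as hex strings, they can be converted and combined into one mask with `|`. This sketch assumes the strings look like '0x00000008'; it uses intval() with base 16, which accepts an optional "0x" prefix and returns an int, as an alternative to hexdec(). The variable names are hypothetical.

```php
<?php
// Flags as they might arrive from the API (assumed format: "0x" + 8 hex digits).
$apiFlags = ['0x00000008', '0x20000000'];

$mask = 0;
foreach ($apiFlags as $flag) {
    // intval() with base 16 accepts an optional "0x" prefix and returns an int.
    $mask |= intval($flag, 16);
}

printf("%d 0x%08X\n", $mask, $mask); // 536870920 0x20000008
var_dump(($mask & 0x00000008) !== 0); // bool(true)
```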
I posted this (php pack: problems with data types and verification of my results) and found that I had two problems.
So here again is only the one remaining issue (I solved the other one). Hopefully this is easy to understand:
I want to use the PHP pack() function.
1) My aim is to convert any integer into a 2-byte hex value.
Example: 0d37 --> 0x0025
2) The second aim is to swap the high and low bytes of each value: 0x0025 --> 0x2500
3) There are many input values which will form 12-Bytes of binary data.
Can anyone help me?
You just have to look up the format table in the pack() manual page; after that it is quite easy.
2 bytes means 16 bits, also called a "short". I assume you want it unsigned, so we get n for big-endian (high byte first) and v for little-endian (low byte first) byte order.
The only potentially tricky part is figuring out how to combine the format and parameters, as each format character is tied to a value argument:
bin2hex(pack('nv', 34, 34)) // returns 00222200
If you need a variable number of values, you'll need argument unpacking (a PHP language feature, not to be confused with unpack()):
$format = 'nv';
$values = [34, 34];
pack($format, ... $values); // does the same thing
And alternatively, if all of your values should be packed with the same format, you could do this:
pack('v*', ...$values); // will "pack" as many short integers as you pass
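Putting this together for the question's three goals (37 becomes 0x0025, the byte-swapped form is 0x2500, and several values form 12 bytes), here is a sketch with six example values that are made up for illustration:

```php
<?php
$values = [37, 4660, 1, 255, 256, 65535]; // six 16-bit values -> 12 bytes

echo bin2hex(pack('n', 37)), "\n"; // 0025 (big-endian: high byte first)
echo bin2hex(pack('v', 37)), "\n"; // 2500 (little-endian: bytes swapped)

$bigEndian = pack('n*', ...$values);
$littleEndian = pack('v*', ...$values);

echo strlen($bigEndian), "\n";     // 12
echo bin2hex($bigEndian), "\n";    // 00251234000100ff0100ffff
echo bin2hex($littleEndian), "\n"; // 250034120100ff000001ffff

// Round trip: unpack() returns a 1-based array, so reindex before comparing.
var_dump(array_values(unpack('n*', $bigEndian)) === $values); // bool(true)
```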
The title pretty much describes what we would like to accomplish:
a) start with a possible range of integers, for example 1 to 10000.
b) take any md5 hash and run it through this algorithm.
c) the result that pops out will be an integer between 1 and 10000.
We are open to using another hashing method too.
The flow would ideally look like this:
string -> md5(string) -> algo(md5(string),range) -> resulting integer within range
Is something like this possible?
Final note: the range will always start with 1.
If you have an answer, feel free to post just the general idea, or if you so desire, a PHP snippet works too :)
Thanks!
MD5 gives you 128 bits of data (SHA-1 gives 160 bits, and so on). In PHP you get it as a hexadecimal string, so you need to convert it to an integer first. That number modulo 10000, plus 1, will give you your integer in the range 1 to 10000.
Note however that many different hashes will convert to the same integer; this is unavoidable with any sort of conversion to your integer range, as the modulo operation essentially maps a larger set of numbers (in this case 128 bits, that is, numbers from 0 to 340,282,366,920,938,463,463,374,607,431,768,211,455) to a smaller set of numbers (14 bits are enough for numbers from 1 to 10,000).
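A minimal PHP sketch of that idea. Because plain PHP integers cannot hold all 128 bits, it assumes a 64-bit build and uses only the first 32 bits (8 hex characters) of the hash; with the GMP extension you could instead reduce the full value, e.g. gmp_mod(gmp_init($hex, 16), $range). The function name is hypothetical, and the slight modulo bias (2^32 is not a multiple of the range) is ignored.

```php
<?php
// Hypothetical helper: deterministic map of any string into 1..$range.
// Assumes 64-bit PHP so that 32 bits of the hash fit in a native int.
function hash_to_range(string $input, int $range): int
{
    $hex = md5($input);              // 32 hex chars = 128 bits
    $n = hexdec(substr($hex, 0, 8)); // first 32 bits as an int
    return ($n % $range) + 1;        // 0..$range-1 shifted to 1..$range
}

echo hash_to_range('some string', 10000), "\n";
```

The same input always yields the same number, which is what makes the mapping usable as a lookup key; distinct inputs can still collide, as described above.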
Since the range we want will always start at 1, the following works great. All credit goes to Piskvor, who provided the basic idea of how to go about this.
The code below seems to accomplish what we want. Please chime in if the idea (not the code, which is just for reference) can be improved at all. Running the code below results in 6305 unique results out of 10000, which is good enough in our case.
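<?php
// Map test strings into 1..$range with crc32() and count the distinct results.
$final = array();
$range = 10000;
for ($i = 1; $i <= $range; $i++) {
    $string = 'this is my test string - attempt #' . $i;
    echo 'initial string: ' . $string . PHP_EOL;
    // On 64-bit PHP crc32() is never negative; on 32-bit builds it can be.
    $crc32 = crc32($string);
    echo 'crc32 of string: ' . $crc32 . PHP_EOL;
    // % yields 0..$range-1; adding 1 shifts the result into 1..$range.
    $postalgo = ($crc32 % $range) + 1;
    echo 'post algo: ' . $postalgo . PHP_EOL;
    // Use the value as an array key: constant-time lookup instead of in_array().
    $final[$postalgo] = true;
}
echo 'unique results for ' . ($i - 1) . ' attempts: ' . count($final) . PHP_EOL;
?>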
enjoy!
In my user database table, I take the MD5 hash of the email address of a user as the id.
Example: email(example#example.org) = id(d41d8cd98f00b204e9800998ecf8427e)
Unfortunately, I have to represent the ids as integer values now - in order to be able to use an API where the id can only be an integer.
Now I'm looking for a way to encode the id into an integer for sending, and to decode it again when receiving. How could I do this?
My ideas so far:
convert_uuencode() and convert_uudecode() for the MD5 hash
replace every character of the MD5 hash by its ord() value
Which approach is better? Do you know even better ways to do this?
I hope you can help me. Thank you very much in advance!
Be careful. Converting the MD5s to an integer will require support for big (128-bit) integers. Chances are the API you're using will only support 32-bit integers - or worse, might be dealing with the number in floating-point. Either way, your ID will get munged. If this is the case, just assigning a second ID arbitrarily is a much better way to deal with things than trying to convert the MD5 into an integer.
However, if you are sure that the API can deal with arbitrarily large integers without trouble, you can just convert the MD5 from hexadecimal to an integer. PHP does not support this natively, as it will try to represent the value either as a native integer (32 or 64 bits, depending on the build), which is too small, or as a float, which loses precision; you'll probably need PHP's GMP extension for it.
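A sketch of the GMP route, assuming the GMP extension is installed (the usage part checks with extension_loaded()); the function names are hypothetical. The result is a decimal string, because the value is far too large for a native PHP int:

```php
<?php
// Hypothetical helpers; both require the GMP extension.
function md5_to_decimal(string $md5): string
{
    return gmp_strval(gmp_init($md5, 16), 10);
}

function decimal_to_md5(string $dec): string
{
    // Left-pad to 32 hex chars: hashes starting with zero nibbles come back shorter.
    return str_pad(gmp_strval(gmp_init($dec, 10), 16), 32, '0', STR_PAD_LEFT);
}

if (extension_loaded('gmp')) {
    $md5 = md5('example');
    $dec = md5_to_decimal($md5);
    echo $dec, "\n";
    var_dump(decimal_to_md5($dec) === $md5); // bool(true)
}
```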
There are good reasons, stated by others, for doing it a different way.
But if what you want to do is convert an md5 hash into a string
of decimal digits (which is what I think you really mean by
"represent by an integer", since an md5 is already an integer in string form),
and transform it back into the same md5 string:
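function md5_hex_to_dec($hex_str)
{
    // 32 hex chars -> 8 groups of 4 hex digits (each 0..65535).
    $dec = array();
    foreach (str_split($hex_str, 4) as $grp) {
        // Pad each group to 5 decimal digits so the string can be split back.
        $dec[] = str_pad(hexdec($grp), 5, '0', STR_PAD_LEFT);
    }
    return implode('', $dec);
}
function md5_dec_to_hex($dec_str)
{
    $hex = array();
    foreach (str_split($dec_str, 5) as $grp) {
        // (int) drops the padding zeros; re-pad each group to 4 hex digits.
        $hex[] = str_pad(dechex((int) $grp), 4, '0', STR_PAD_LEFT);
    }
    return implode('', $hex);
}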
Demo:
$md5 = md5('example#example.com');
echo $md5 . '<br />'; // 23463b99b62a72f26ed677cc556c44e8
$dec = md5_hex_to_dec($md5);
echo $dec . '<br />'; // 0903015257466342942628374306682186817640
$hex = md5_dec_to_hex($dec);
echo $hex; // 23463b99b62a72f26ed677cc556c44e8
Of course, you'd have to be careful using either string, like making sure to use them only as string type to avoid losing leading zeros, ensuring the strings are the correct lengths, etc.
A simple solution could use hexdec() for conversions for parts of the hash.
Systems that can accommodate 64-bit Ints can split the 128-bit/16-byte md5() hash into four 4-byte sections and then convert each into representations of unsigned 32-bit Ints. Each hex pair represents 1 byte, so use 8 character chunks:
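$hash = md5($value);
$int_hashes = array();
foreach (str_split($hash, 8) as $chunk) {
    // Each 8-hex-char chunk is 32 bits, which fits in a 64-bit signed int.
    $int_hashes[] = hexdec($chunk);
}
On the other end, use dechex() to convert the values back, left-padding each chunk to 8 hex digits so that leading zeros are not lost:
$original_hash = '';
foreach ($int_hashes as $ihash) {
    $original_hash .= str_pad(dechex($ihash), 8, '0', STR_PAD_LEFT);
}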
Caveat: Due to underlying deficiencies with how PHP handles integers and how it implements hexdec() and intval(), this strategy will not work with 32-bit systems.
Edit Takeaways:
Ints in PHP are always signed, there are no unsigned Ints.
Although intval() may be useful for certain cases, hexdec() is more performant and more simple to use for base-16.
hexdec() converts values above 7fffffffffffffff into Floats, making its use moot for splitting the hash into two 64-bit/8-byte chunks.
Similarly for intval($chunk, 16), it returns the same Int value for 7fffffffffffffff and above.
Why ord()? MD5 produces a plain 16-byte value, which is presented to you in hex for better readability. So you can't convert a 16-byte value into a 4- or 8-byte integer without loss. You must change some part of your algorithms to use this as an id.
You could use hexdec() to parse the hexadecimal string and store the number in the database (note that for a full 32-character hash hexdec() returns a float, so precision is lost).
Couldn't you just add another field that was an auto-increment int field?
what about:
$float = hexdec(md5('string'));
or
$int = (integer) (substr(hexdec(md5('string')),0,9)*100000000);
There are definitely bigger chances of collision, but it's still good enough to use instead of the hash in the DB, isn't it?
Add these two columns to your table.
`email_md5_l` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(left(md5(`email`),16),16,10)) STORED,
`email_md5_r` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(right(md5(`email`),16),16,10)) STORED,
It might or might not help to create a PK on these two columns though, as it probably concatenates two string representations and hashes the result. It would kind of defeat your purpose and a full scan might be quicker but that depends on number of columns and records. Don't try to read these bigints in php as it doesn't have unsigned integers, just stay in SQL and do something like:
select email
into result
from `address`
where email_md5_l = conv(left(md5(the_email), 16), 16, 10)
and email_md5_r = conv(right(md5(the_email), 16), 16, 10)
limit 1;
MD5 does collide btw.
Use the email address as the file name of a blank, temporary file in a shared folder, like /var/myprocess/example#example.org
Then call ftok() on the file name (it also takes a one-character project identifier). ftok() will return an integer key derived from the file.
It won't be guaranteed to be unique though, but it will probably suffice for your API.