PHP: XOR cypher decryption function doesn't work - php

Lately, I have been doing some research into cryptography.
To get a better understanding of all of it, I have been trying to write a more advanced version of the XOR cypher in PHP.
I got the encryption function to work just fine, but the decryption function output is quite strange and totally different from the message inputted.
The idea of the algorithm is to run an XOR operation first on the first and last characters, then on the second and second-to-last characters, and so on.
After that, it runs an XOR operation on the first two characters and the last two characters, then on the third and fourth characters and the 2nd and 3rd to last, and so on once again.
This goes on with blocks of 3, 4, 5 and more characters.
The code I have right now:
<?php
function encrypt($message, $key) {
$output_text = '';
// Add zeros at the end until the length of the message corresponds with the length of the key
$message = str_pad($message,strlen($key),0);
if((strlen($message) % 2)) {
// The length of the message is odd, add a zero
$message = $message . 0;
}
// Define the final length of the message
$length = strlen($message);
// Firstly, take 1 character, then 2, then 3, etc. until you reach half the length of the message
for($characters=1; $characters<=($length/2); $characters++) {
// Loop from i til half the length of the message
for($i=0; $i<=(($length/2)-1); $i += $characters) {
// Take the first and last character, then the first two and the last two, etc.
// Stop when it crosses half the length
if( ($i + $characters ) >= ( $length / 2 ) ) break;
// firstly, the characters at the beginning
$beginning = substr($message, $i, $characters);
for($j=0; $j<$characters; $j++) {
$position = ( $i + 1 ) + $j;
$output_text .= chr(ord($beginning[$j]) ^ ord($key[$position]));
}
// Then those at the end
$ending = substr($message, $length-(($i+1) * $characters), $characters);
for($j=0; $j<$characters; $j++) {
$position = ( $length - ( ( $i + 1 )* $characters) ) + $j;
$output_text .= chr(ord($ending[$j]) ^ ord($key[$position]));
}
}
}
return $output_text;
}
function decrypt($message, $key) {
$output_text = null;
// Define the final length of the message
$length = strlen($message);
// Firstly, take 1 character, then 2, then 3, etc. until you reach half the length of the message
for($characters=1; $characters<=($length/2); $characters++) {
// Loop from i til half the length of the message
for($i=0; $i<=(($length/2)-1); $i += $characters) {
// Take the first and last character, then the first two and the last two, etc.
// Stop when it crosses half the length
if( ($i + $characters ) >= ( $length / 2 ) ) break;
// firstly, the characters at the beginning
$beginning = substr($message, $i, $characters);
for($j=0; $j<$characters; $j++) {
$position = ( $i + 1 ) + $j;
$output_text .= chr(ord($key[$position]) ^ ord($beginning[$j]));
}
// Then those at the end
$ending = substr($message, $length-(($i+1) * $characters), $characters);
for($j=0; $j<$characters; $j++) {
$position = ( $length - ( ( $i + 1 )* $characters) ) + $j;
$output_text .= chr(ord($key[$position]) ^ ord($ending[$j]));
}
}
}
return $output_text;
}
$message = 'sampletextjusttotrythisoutcreatedin2012';
$key = '123';
$output_text = encrypt($message, $key);
echo $output_text . '<br /><hr />';
echo decrypt($output_text, $key);
Thanks in advance for trying to help me!

Let's start with something a bit simpler - given a message and a key, XOR the message with the key to encrypt it. XOR the encrypted message with the key to decrypt it.
$msg = "The rooster crows at midnight!";
$key = "secret key";
$cipher_text = simple_xor($msg, $key);
$plain_text = simple_xor($cipher_text, $key);
echo "Original msg: $msg\n";
echo "Supplied key: $key\n";
echo "\n";
echo "Cipher Text: " . base64_encode($cipher_text) . "\n";
echo " Decrypted: " . $plain_text . "\n";
function simple_xor($input, $key) {
# Input must be of even length.
if (strlen($input) % 2)
$input .= '0';
# Keys longer than the input will be truncated.
if (strlen($key) > strlen($input))
$key = substr($key, 0, strlen($input));
# Keys shorter than the input will be padded.
if (strlen($key) < strlen($input))
$key = str_pad($key, strlen($input), '0', STR_PAD_RIGHT);
# Now the key and input are the same length.
# Zero is used for any trailing padding required.
# Simple XOR'ing, each input byte with each key byte.
$result = '';
for ($i = 0; $i < strlen($input); $i++) {
$result .= $input[$i] ^ $key[$i];
}
return $result;
}
Here you can see intrinsic value of XOR. Given Msg XOR Key = C then C XOR Key = Msg and C XOR Msg = Key.
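A quick way to check those identities yourself is PHP's built-in ^ operator, which XORs two strings byte by byte. This is only a minimal sketch; the 'Feet' / 'abcd' pair is borrowed from the example further down, and both strings are assumed to have the same length (PHP stops at the shorter one).

```php
<?php
// Byte-wise XOR of two equal-length strings using PHP's string ^ operator.
$msg = 'Feet';
$key = 'abcd';

$cipher = $msg ^ $key;                  // Msg XOR Key = C

var_dump(($cipher ^ $key) === $msg);    // C XOR Key = Msg
var_dump(($cipher ^ $msg) === $key);    // C XOR Msg = Key
```

The second check is the reason a known plain-text immediately reveals the key with plain XOR.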
Now let's return to your approach - it appears you wish to mix more characters together to generate a 'stronger' encrypted result. Before doing that, take a moment to reflect on what creates encryption strength when using XOR in this fashion. During this process, assume the attacker has the above code, but not the $msg or $key.
The attacker will know how long your message and key are, because this algorithm always generates a result that is the same number of bytes as the message and key.
The strongest key will be one where each byte is different - this way the result will not contain patterns. For example, if you encrypt English text with a key containing just one repeated byte, I might notice that the cipher-text contains one byte repeated multiple times. This is probably the letter 'e', the most common letter in English text. If the key contained completely random bytes, then any pattern spotted in the cipher-text would not help me identify the plain-text.
So, is a msg of 'Feet' and key of 'abcd' strong? Well, it's certainly stronger than using a key of '0000', but it could be stronger. The attacker might assume that you used a simple key, containing just lower-case letters. This means that to brute-force this four-letter key, the attacker needs to try at most 26 ^ 4 = 456,976 possible options. This can be done in less than a second on modern computers. A better key would incorporate upper-case letters, digits, punctuation and other characters. An even better key would include non-printable characters as well, for example: $key = chr(27) . chr(6) . 'q.';
Another interesting element to consider with this algorithm is that it requires the key to be of equal length to the msg. This means that strongly encrypting a large amount of text (such as a novel) requires a completely random key that is as long as the novel. Most mainstream algorithms avoid this requirement by encrypting the message in blocks. There are many different ways to implement block encryption; let me illustrate one known as cipher-block chaining (CBC).
Simple CBC works by taking the first few bytes of the plain-text, XOR'ing it with the key, which generates the first few bytes of the cipher-text. The next few bytes of the plain-text are encrypted by XOR'ing them with the first few bytes of the cipher-text AND the key. This process repeats until all plain-text is encrypted. This creates a chain, where each block in the cipher text has been created using the prior block and key. To decrypt the last result, you must XOR the cipher-text with the previous cipher-text block and then again with the key.
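The chaining described above could be sketched roughly like this. It is a toy for illustration only, not a secure cipher. The function names are made up for this example, the block size is assumed to equal the key length, the key is assumed to be non-empty, the message is padded with NUL bytes, and the first block is chained to an all-zero block because no initialisation vector is mentioned above.

```php
<?php
// Toy XOR chaining in the style described above (illustration only, not secure).
// Assumptions: block size = strlen($key), key is non-empty, NUL padding, no IV.

function xor_cbc_encrypt(string $plain, string $key): string {
    $size = strlen($key);
    // Pad the message with NUL bytes to a whole number of blocks.
    $plain = str_pad($plain, (int) (ceil(strlen($plain) / $size) * $size), "\0");
    $prev = str_repeat("\0", $size);   // nothing to chain to for the first block
    $cipher = '';
    foreach (str_split($plain, $size) as $block) {
        $prev = $block ^ $prev ^ $key; // plain XOR previous cipher block XOR key
        $cipher .= $prev;
    }
    return $cipher;
}

function xor_cbc_decrypt(string $cipher, string $key): string {
    $size = strlen($key);
    $prev = str_repeat("\0", $size);
    $plain = '';
    foreach (str_split($cipher, $size) as $block) {
        $plain .= $block ^ $prev ^ $key; // undo both XORs
        $prev = $block;                  // chain on the cipher block just read
    }
    return rtrim($plain, "\0");          // drop the NUL padding
}

$cipher = xor_cbc_encrypt('The rooster crows at midnight!', 'secret key');
echo base64_encode($cipher), "\n";
echo xor_cbc_decrypt($cipher, 'secret key'), "\n";
```

Note that stripping the padding with rtrim() would also eat NUL bytes that really belong to the end of a message; a real scheme stores the length or uses an unambiguous padding.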
Strong algorithms incorporate other features to ensure the cipher-text is as random as possible, including features that allow you to determine if an encrypted message has been modified. A good place to read more about block cipher modes is Wikipedia: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
The Cryptography page also has a great set of introductory information on different forms of encryption and the process of Cryptanalysis. http://en.wikipedia.org/wiki/Cryptography

Right now, the hardest part of "decrypting" a string is figuring out how long the input was. If we take that as an additional parameter, we can nearly decrypt it like this:
function decrypt($cipher, $messagelen, $key) {
if($messagelen % 2) { $messagelen++; }
$x = substr($cipher, -$messagelen + 2);
$y = substr($x, 0, strlen($key) - 1) ^ substr($key, 1);
$z = substr($x, strlen($key) - 1);
return $y . $z;
}
This is made much easier because most of the message appears in the clear at the end of the ciphertext. Oops. The only characters in that repetition which are "encrypted" are the first few, which are just XORed with the key.
The middle two characters are irretrievably lost due to an off-by-one error in encryption. Notes on how to fix this are in my comments below.

Related

How to generate random key separated by hyphen

I am working on Yii. I want to generate 20 digit random keys. I had written a function as -
public function GenerateKey()
{
//for generating random confirm key
$length = 20;
$chars = array_merge(range(0,9), range('a','z'), range('A','Z'));
shuffle($chars);
$password = implode(array_slice($chars, 0, $length));
return $password;
}
This function generates a 20-character key correctly. But I want the key in a format like "g12a-Gh45-gjk7-nbj8-lhk8", i.e. separated by hyphens. So what changes do I need to make?
You can use chunk_split() to add the hyphens. substr() is used to remove the trailing hyphen it adds, leaving only those hyphens that actually separate groups.
return substr(chunk_split($password, 4, '-'), 0, 24);
However, note that shuffle() not only uses a relatively poor PRNG but also never lets the same character appear twice in the key. Instead, pick each character with mt_rand() in a for loop; that also makes it easy to insert the hyphens without chunk_split():
$password = '';
for ($i = 0; $i < $length; $i++) {
if ( $i != 0 && $i % 4 == 0 ) { // nonzero and divisible by 4
$password .= '-';
}
$password .= $chars[mt_rand(0, count($chars) - 1)];
}
return $password;
(Even mt_rand() is not a cryptographically secure PRNG. If you need to generate something that must be extremely hard to predict (e.g. an encryption key or password reset token), use openssl_random_pseudo_bytes() to generate bytes and then a separate function such as bin2hex() to encode them into printable characters. I am not familiar with Yii, so I cannot say whether or not it has a function for this.)
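If you are on PHP 7 or later, random_int() draws from the system's cryptographically secure generator, so the loop above can be kept almost unchanged. A minimal sketch, assuming that PHP version; generate_key() and its parameters are names made up for this example:

```php
<?php
// Sketch: hyphen-separated key built with random_int() (PHP 7+, CSPRNG-backed).
function generate_key(int $length = 20, int $group = 4): string {
    $chars = array_merge(range(0, 9), range('a', 'z'), range('A', 'Z'));
    $key = '';
    for ($i = 0; $i < $length; $i++) {
        // Each position is drawn independently, so characters may repeat.
        $key .= $chars[random_int(0, count($chars) - 1)];
    }
    // chunk_split() appends a trailing separator, so trim it off again.
    return rtrim(chunk_split($key, $group, '-'), '-');
}

echo generate_key(), "\n";
```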
You can use this Yii internal function:
Yii::app()->getSecurityManager()->generateRandomString($length);

How does this code extract the signature?

I have to debug an old PHP script from a developer who has left the company. I understand most of the code, except the following function. My question: What does...
if($seq == 0x03 || $seq == 0x30)
...mean in context of extracting the signature out of an X.509 certificate?
public function extractSignature($certPemString) {
$bin = $this->ConvertPemToBinary($certPemString);
if(empty($certPemString) || empty($bin))
{
return false;
}
$bin = substr($bin,4);
while(strlen($bin) > 1)
{
$seq = ord($bin[0]);
if($seq == 0x03 || $seq == 0x30)
{
$len = ord($bin[1]);
$bytes = 0;
if ($len & 0x80)
{
$bytes = ($len & 0x0f);
$len = 0;
for ($i = 0; $i < $bytes; $i++)
{
$len = ($len << 8) | ord($bin[$i + 2]);
}
}
if($seq == 0x03)
{
return substr($bin,3 + $bytes, $len);
}
else
{
$bin = substr($bin,2 + $bytes + $len);
}
}
else
{
return false;
}
}
return false;
}
An X.509 certificate contains data in multiple sections (called Tag-Length-Value triplets). Each section starts with a Tag byte, which indicates the data format of the section. You can see a list of these data types here.
0x03 is the Tag byte for the BIT STRING data type, and 0x30 is the Tag byte for the SEQUENCE data type.
So this code is designed to handle the BIT STRING and SEQUENCE data types. If you look at this part:
if($seq == 0x03)
{
return substr($bin,3 + $bytes, $len);
}
else // $seq == 0x30
{
$bin = substr($bin,2 + $bytes + $len);
}
you can see that the function is designed to skip over Sequences (0x30), until it finds a Bit String (0x03), at which point it returns the value of the Bit String.
You might be wondering why the magic number is 3 for Bit String and 2 for Sequence. That is because in a Bit String, the first value byte is a special extra field which indicates how many bits are unused in the last byte of the data. (For example, if you're sending 13 bits of data, it will take up 2 bytes = 16 bits, and the "unused bits" field will be 3.)
Next issue: the Length field. When the length of the Value is less than 128 bytes, the length is simply specified using a single byte (the most significant bit will be 0). If the length is 128 or greater, then the first length byte has bit 7 set, and the remaining 7 bits indicate how many following bytes contain the length (in big-endian order). More description here. The parsing of the length field happens in this section of the code:
$len = ord($bin[1]);
$bytes = 0;
if ($len & 0x80)
{
// length is greater than 127!
$bytes = ($len & 0x0f);
$len = 0;
for ($i = 0; $i < $bytes; $i++)
{
$len = ($len << 8) | ord($bin[$i + 2]);
}
}
After that, $bytes contains the number of extra bytes used by the length field, and $len contains the length of the Value field (in bytes).
Did you spot the error in the code? Remember,
If the length is 128 or greater, then the first length byte has bit 7
set, and the remaining 7 bits indicate how many following bytes
contain the length.
but the code says $bytes = ($len & 0x0f), which only takes the lower 4 bits of the byte! It should be:
$bytes = ($len & 0x7f);
Of course, this error is only a problem for extremely long messages: it will work fine as long as the length value will fit within 0x0f = 15 bytes, meaning the data has to be less than 256^15 bytes. That's about a trillion yottabytes, which ought to be enough for anybody.
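To see the length rule in isolation, here is a small sketch of just that step with the 0x7f mask applied. der_read_length() is a hypothetical helper name, not part of the original script, and it assumes $bin starts at the Tag byte and is long enough to hold the whole length field.

```php
<?php
// Sketch: decode the Length field of one Tag-Length-Value triplet.
// Returns [number of extra length bytes, length of the Value in bytes].
function der_read_length(string $bin): array {
    $len = ord($bin[1]);
    $bytes = 0;
    if ($len & 0x80) {
        // Long form: the low 7 bits say how many length bytes follow.
        $bytes = $len & 0x7f;
        $len = 0;
        for ($i = 0; $i < $bytes; $i++) {
            $len = ($len << 8) | ord($bin[$i + 2]); // big-endian
        }
    }
    return [$bytes, $len];
}

// Short form: SEQUENCE (0x30) with a 5-byte value.
var_dump(der_read_length("\x30\x05"));
// Long form: BIT STRING (0x03), 0x82 = two length bytes follow, 0x0101 = 257.
var_dump(der_read_length("\x03\x82\x01\x01"));
```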
As Pateman says above, you just have a logical if, we're just checking if $seq is either 0x30 or 0x03.
I have a feeling you already know that though, so here goes. $seq is the first byte of the certificate, which is probably either the version of the certificate or the magic number to denote that the file is a certificate (also known as "I'm guessing this because 10:45 is no time to start reading RFCs").
In this case, we're comparing against 0x30 and 0x03. These numbers are expressed in hexadecimal (as is every number starting with 0x), which is base-16. This is just really a very convenient shorthand for binary, as each hex digit corresponds to exactly four binary bits. A quick table is this:
0 = 0000
1 = 0001
2 = 0010
3 = 0011
...
...
E = 1110
F = 1111
Equally well, we could have said if($seq == 3 || $seq == 48), but hex is just much easier to read and understand in this case.
I'd hazard a guess that it's a byte-order-independent check for version identifier '3' in an x.509 certificate. See RFC 1422, p7. The rest is pulling the signature byte-by-byte.
ord() gets the value of the ASCII character you pass it. In this case it's checking to see if the ASCII character is either a 0 or end of text (according to this ASCII table).
0x03 and 0x30 are hex values. Look that up and you'll have what $seq is matching to

unique number from a string - php

I have some strings containing alpha numeric values, say
asdf1234,
qwerty//2345
etc..
I want to generate a specific constant number for each string. The number should not match the number generated for any other string.
Does it have to be a number?
You could simply hash the string, which would give you a unique value.
echo md5('any string in here');
Note: This is a one-way hash, it cannot be converted from the hash back to the string.
This is how passwords are typically stored (using this or another hash function, typically with a 'salt' method added.) Checking a password is then done by hashing the input and comparing to the stored hash.
edit: md5 hashes are 32 characters in length.
Take a look at other hash functions:
http://us3.php.net/manual/en/function.crc32.php (returns a number, possibly negative)
http://us3.php.net/manual/en/function.sha1.php (40 characters)
You can use a hashing function like md5, but that's not very interesting.
Instead, you can turn the string into the sequence of its characters' ASCII codes (since you said that it's alphanumeric). That way it can easily be converted back, its length follows from the string's length (length*3 to be exact), it has zero chance of collision since it's just another representation of the same data, it's always a number, and it's a little more interesting. Example code:
function encode($string) {
$ans = array();
$string = str_split($string);
#go through every character, changing it to its ASCII value
for ($i = 0; $i < count($string); $i++) {
#ord turns a character into its ASCII values
$ascii = (string) ord($string[$i]);
#make sure it's 3 characters long
$ascii = str_pad($ascii, 3, '0', STR_PAD_LEFT);
$ans[] = $ascii;
}
#turn it into a string
return implode('', $ans);
}
function decode($string) {
$ans = '';
$string = str_split($string);
$chars = array();
#construct the characters by going over the three numbers
for ($i = 0; $i < count($string); $i+=3)
$chars[] = $string[$i] . $string[$i+1] . $string[$i+2];
#chr turns a single integer into its ASCII character
for ($i = 0; $i < count($chars); $i++)
$ans .= chr($chars[$i]);
return $ans;
}
Example:
$original = 'asdf1234';
#will echo
#097115100102049050051052
$encoded = encode($original);
echo $encoded . "\n";
#will echo asdf1234
$decoded = decode($encoded);
echo $decoded . "\n";
echo $original === $decoded; #echoes 1, meaning true
You're looking for a hash function, such as md5. You probably want to pass it the $raw_output=true parameter to get access to the raw bytes, then cast them to whatever representation you want the number in.
A cryptographic hash function will give you a different number for each input string, but it's a rather large number — 20 bytes in the case of SHA-1, for example. In principle it's possible for two strings to produce the same hash value, but the chance of it happening is so extremely small that it's considered negligible.
If you want a smaller number — say, a 32-bit integer — then you can't use a hash function because the probability of collision is too high. Instead, you'll need to keep a record of all the mappings you've established. Make a database table that associates strings with numbers, and each time you're given a string, look it up in the table. If you find it there, return the associated number. If not, choose a new number that isn't used by any of the existing records, and add the new string and number to the table.
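The lookup logic could be sketched like this. To keep the example self-contained, an in-memory array stands in for the database table, and StringIds / idFor() are names made up for the illustration; a real version would run the same "look up, else insert" steps against the table (ideally with a unique index on the string column). Typed properties need PHP 7.4 or later.

```php
<?php
// Sketch: hand out small, collision-free numbers by remembering every mapping.
// The array is a stand-in for the database table described above.
final class StringIds {
    private array $ids = [];  // string => number
    private int $next = 1;    // next unused number

    public function idFor(string $s): int {
        if (!isset($this->ids[$s])) {
            // Not seen before: assign a number no existing record uses.
            $this->ids[$s] = $this->next++;
        }
        return $this->ids[$s];
    }
}

$table = new StringIds();
echo $table->idFor('asdf1234'), "\n";
echo $table->idFor('qwerty//2345'), "\n";
echo $table->idFor('asdf1234'), "\n"; // same string, same number as before
```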

shorter php cipher than md5?

For a variety of stupid reasons, the maximum length of a given form variable that we are posting to an external server is 12 characters.
I wanted to obscure that value with md5, but obviously with 12 characters that isn't going to work. Is there a cipher with an already-made PHP function which will result in something 12 characters or less?
The security and integrity of the cipher isn't super important here. My last resort is to just write a function which moves each letter up or down an ascii value by x. So the goal isn't to obscure it from a cryptography expert, but just to not post it in plain text so a non-technical worker looking at it won't know what it is.
Thanks for any advice.
maybe this will help you generate a 12 char string that you can pass in a URL, without increasing the risk of collisions
substr(base_convert(md5($string), 16,32), 0, 12);
This is an addition to this answer.
The answer proposes to take the first twelve characters from a 32 character representation of md5. Thus 20 characters of information will be lost - this will result in way more possible collisions.
You can reduce the loss of information by taking the first twelve characters of a 16 character representation (the raw form):
substr(md5($string, true), 0, 12);
This will maintain 75% of the data, whereas the use of the 32 char form only maintains 37.5% of the data.
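The percentages are easy to check by counting bits: a hex digit carries 4 bits and a raw byte carries 8. A small sketch (the input string is arbitrary):

```php
<?php
// Compare how much of the 128-bit md5 survives in 12 characters.
$string = 'example input';
$hex = md5($string);        // 32 hex characters, 4 bits each
$raw = md5($string, true);  // 16 raw bytes, 8 bits each

$hex12 = substr($hex, 0, 12);
$raw12 = substr($raw, 0, 12);

printf("hex: %d of 128 bits kept\n", strlen($hex12) * 4);
printf("raw: %d of 128 bits kept\n", strlen($raw12) * 8);

// The raw bytes are the same data as the first 24 hex digits.
var_dump(bin2hex($raw12) === substr($hex, 0, 24));
```

Keep in mind that the raw bytes are usually not printable, so they may need extra care when posted in a form field.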
Try crc32() maybe?
If you just need a hash, you can still use the first 12 characters from the md5 hash.
substr(md5($yourString), 0, 12);
All the answers suggest losing some of the data (higher collision possibility), but it looks like using base conversion is a better approach:
e.g. like described here http://proger.i-forge.net/Short_MD5/OMF
You may also generate any random string and insert it into database, checking if not already exists prior to saving. This will allow you to have short hashes, and ensure there are no collisions.
I have to put this suggestion across as I have to assume you are in control of the script that your encrypted value is sent to....
I also have to assume that you can create many form fields but they can't have a length larger than 12 characters each.
If that's the case, could you not simply create more than one form field and spread the md5 string across multiple hidden fields?
You could just split the md5 string into chunks of 8 and submit each chunk in a hidden form field and then join them together at the other end.
Just a thought...
You can make use of a larger alphabet and make hash shorter but still reversible to original value.
I implemented it here - for example, hash ee45187ab28b4814cf03b2b4224eb974 becomes 7fBKxltZiQd7TFsUkOp26w - it goes from 32 to 22 characters. And it can become even shorter if you use a larger alphabet. If you use Unicode, you can even encode the hash with emoji...
This probably won't be of use to the OP since they were looking for 2 way function but may help someone looking for a shorter hash than md5. Here is what I came up with for my needs (thanks to https://rolandeckert.com/notes/md5 for highlighting the base64_encode function). Encode the md5 hash as base(64) and remove any undesirable base(64) characters. I'm removing vowels + and / so reducing the effective base from 64 to 52.
Note if you truncate a base(b) encoded hash after c characters it will allow for b ^ c unique hashes. Is this robust enough to avoid collisions? It depends on how many items (k) you are hashing. The probability of collision is roughly (k * k) / (b ^ c) / 2, so if you used the function below to hash k = 1 million items with base b = 52 encoding truncated after c = 12 characters the probability of collision is < 1 in 750 million. Compare to truncating the hex encoded (b = 16) hash after c = 12 characters. The probability of collision is roughly 1 in 500! Just say no to truncating hex encoded hashes. :)
I'll go out on a limb and say the function below (with length 12) is reasonably safe for 10 million items (< 1 in 7.5 million probability of collision), but if you want to be extra safe use base(64) encoding (comment out the $remove array) and/or truncate fewer characters.
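If you want to plug in your own numbers, the estimate above is a one-liner. A sketch using the same approximation (k * k) / (b ^ c) / 2; collision_estimate() is just a name picked for this example, and the approximation only holds while the result is much smaller than 1:

```php
<?php
// Approximate collision probability for k items hashed into b^c possible values.
function collision_estimate(float $k, int $b, int $c): float {
    return ($k * $k) / pow($b, $c) / 2;
}

$p52 = collision_estimate(1e6, 52, 12); // base 52, 12 characters
$p16 = collision_estimate(1e6, 16, 12); // hex, 12 characters

printf("base 52: about 1 in %.0f\n", 1 / $p52);
printf("base 16: about 1 in %.0f\n", 1 / $p16);
```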
// convert md5 to base64, remove undesirable characters and truncate to $length
function tinymd5($str, $length) { // $length 20-22 not advised unless $remove = '';
// remove vowels to prevent undesirable words and + / which may be problematic
$remove = array('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U', '+', '/');
$salt = $str;
do { // re-salt and loop if rebase removes too many characters
$salt = $base64 = base64_encode(md5($salt, TRUE));
$rebase = substr(str_replace($remove, '', $base64), 0, $length);
} while ($length < 20 && substr($rebase, -1) == '=');
return str_pad($rebase, min($length, 22), '='); // 22 is max possible length
}
$str = 'Lorem ipsum dolor sit amet 557726776';
echo '<br />' . md5($str); // 565a0bf7e0ba474fdaaec57b82e6504a
$x = md5($str, TRUE);
echo '<br />' . base64_encode($x); // VloL9+C6R0/arsV7guZQSg==
echo '<br />' . tinymd5($str, 12); // VlL9C6R0rsV7
echo '<br />' . tinymd5($str, 17); // VlL9C6R0rsV7gZQSg
$x = md5(base64_encode($x), TRUE); // re-salt triggered < 20
echo '<br />' . base64_encode($x); // fmkPW/OQLqp7PTex0nK3NQ==
echo '<br />' . tinymd5($str, 18); // fmkPWQLqp7PTx0nK3N
echo '<br />' . tinymd5($str, 19); // fmkPWQLqp7PTx0nK3NQ
echo '<br />' . tinymd5($str, 20); // fmkPWQLqp7PTx0nK3NQ=
echo '<br />' . tinymd5($str, 22); // fmkPWQLqp7PTx0nK3NQ===
$hashlen = 4;
$cxtrong = TRUE;
$sslk = openssl_random_pseudo_bytes($hashlen, $cxtrong);
$rand = bin2hex($sslk);
echo $rand;
You can change the hash length (in multiples of two) by changing the value of the variable $hashlen
I came up with base 90 for reducing md5 to 20 multi-byte characters (that I tested to fit properly in a mysql's varchar(20) column). Unfortunately this actually makes the string potentially larger than even the 32 bytes from php's md5, with the only advantage that they can be stored in varchar(20) columns. Of course you could just replace the alphabet with single-byte ones if your worries are about storage...
There are a couple of rules that are important to have in mind if your idea is to use this reduced hash as a lookup key in something like mysql and for other kinds of processing:
By default, MySQL does not differentiate upper case from lower case in a typical WHERE clause, which takes a lot of characters right out of the possible target alphabets. This includes not only English characters but also almost all characters in other languages.
It's important that your hash can be upper-cased and lower-cased transparently since many systems uppercase these keys, so to keep it consistent with md5 in that sense you should use only lowercase when using case-able characters.
This is the alphabet I used (I handpicked each character to make the hashes as nice as possible):
define('NICIESTCHARS', [
"0","1","2","3","4","5","6","7","8","9",
"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
"¢","£","¥","§","¶","ø","œ","ƒ","α","δ","ε","η","θ","ι","λ","μ","ν","π","σ","τ","φ","ψ","ω","ћ","џ","ѓ","ѝ","й","ќ","ў","ф","э","ѣ","ѷ","ѻ","ѿ","ҁ","∂","∆","∑","√","∫",
"!","#","$","%","&","*","+","=","#","~","¤","±"
]);
Here is the code in PHP (I suppose it's not the best code, but it does the job). Keep in mind that it only works for hexadecimal strings (0-f) whose length is a multiple of 8, like PHP's md5, which is 32 hexadecimal characters:
function mbStringToArray ($string) {
$strlen = mb_strlen($string);
while ($strlen) {
$array[] = mb_substr($string,0,1,"UTF-8");
$string = mb_substr($string,1,$strlen,"UTF-8");
$strlen = mb_strlen($string);
}
return $array;
}
class Base90{
static function toHex5($s){
// Converts a base 90 number with a multiple of 5 digits to hex (compatible with "hexdec").
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
$map = array_flip(NICIESTCHARS);
$rt = '';
$part = [];
$b90part = '';
foreach($chars as $c){
$b90part .= $c;
$part[] = $map[$c];
if(count($part) == 5){
$int = base90toInt($part);
$rt .= str_pad(dechex($int), 8, "0", STR_PAD_LEFT);
$part = [];
$b90part = '';
}
}
return $rt;
}
static function fromHex8($m){
// Converts an hexadecimal number compatible with "hexdec" to base 90
$parts = [];
$part = '';
foreach(str_split($m) as $i => $c){
$part.= $c;
if(strlen($part) === 8){
$parts[] = intToBase90(hexdec($part));
$part = '';
}
}
return implode('', $parts);
}
}
function intToBase90($int){
$residue = $int;
$result = [];
while($residue){
$digit = $residue % 90;
$residue -= $digit;
$residue = $residue / 90;
array_unshift($result, NICIESTCHARS[$digit]);
}
$result = implode('', $result);
return $result;
}
function base90toInt($digits){
$weight = 1;
$rt = 0;
while(count($digits)){
$rt += array_pop($digits)*$weight;
$weight *= 90;
}
return $rt;
}

PHP function to create 8 chars long hash ([a-z] = no numbers allowed)

I need PHP function that will create 8 chars long [a-z] hash from any input string.
So e.g. when I'll submit "Stack Overflow" it will return e.g. "gdqreaxc" (8 chars [a-z] no numbers allowed)
Perhaps something like:
$hash = substr(strtolower(preg_replace('/[0-9_\/]+/','',base64_encode(sha1($input)))),0,8);
This produces a SHA1 hash, base-64 encodes it (giving us the full alphabet), removes non-alpha chars, lowercases it, and truncates it.
For $input = 'yar!';:
mwinzewn
For $input = 'yar!!';:
yzzhzwjj
So the spread seems pretty good.
This function will generate a hash containing evenly distributed characters [a-z]:
function my_hash($string, $length = 8) {
// Convert to a string which may contain only characters [0-9a-p]
$hash = base_convert(md5($string), 16, 26);
// Get part of the string
$hash = substr($hash, -$length);
// In rare cases it will be too short, add zeroes
$hash = str_pad($hash, $length, '0', STR_PAD_LEFT);
// Convert character set from [0-9a-p] to [a-z]
$hash = strtr($hash, '0123456789', 'qrstuvwxyz');
return $hash;
}
By the way, if this is important for you: for 100,000 different strings you'll have a ~2% chance of a hash collision (for an 8-character hash), and for a million strings this chance rises to ~90%, if my math is correct.
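Those figures can be reproduced with the usual birthday approximation, p ≈ 1 - exp(-n² / (2N)), where N = 26^8 is the number of possible hashes. A rough sketch, assuming the hash values are evenly distributed; collision_probability() is a name made up for this example:

```php
<?php
// Birthday approximation for n items hashed into alphabet^length possible values.
function collision_probability(float $items, int $alphabet = 26, int $length = 8): float {
    $space = pow($alphabet, $length); // 26^8 = 208,827,064,576
    return 1 - exp(-($items * $items) / (2 * $space));
}

printf("100,000 strings:   %.1f%%\n", 100 * collision_probability(1e5));
printf("1,000,000 strings: %.1f%%\n", 100 * collision_probability(1e6));
```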
function md5toabc($myMD5)
{
$newString = "";
for ($i = 0; $i < 16; $i+=2)
{
//add the first val of 0-15 to the second val of 0-15 for a range of 0-30
$myintval = hexdec(substr($myMD5, $i, 1)) +
hexdec(substr($myMD5, $i + 1, 1));
// mod by 26 and add 97 to get to the lowercase ascii range
$newString .= chr(($myintval%26) + 97);
}
return $newString;
}
Note this introduces bias to various characters, but do with it what you will.
(Like when you roll two dice, the most common value is a 7 combined...) plus the modulo, etc...
You can get a good a-p{8} hash (but not a-z) by modifying the output of a well-known algorithm:
function mini_hash( $string )
{
$h = hash( 'crc32' , $string );
for($i=0;$i<8;$i++) {
$h[$i] = chr(97 + hexdec($h[$i])); // 0-f becomes a-p
}
return $h;
}
Interesting set of constraints you posted there.
How about
substr(preg_replace('/[0-9]/', '', md5($mystring)), 0, 8);
You could add a bit more entropy by replacing the digits instead of stripping them:
$hash = md5($mystring);
$hash = str_replace('1', 'g', $hash);
$hash = str_replace('2', 'h', $hash);
$hash = str_replace('3', 'i', $hash);
and so on for the remaining digits.
