What is the best way to create a short (6 characters), random string with a low collision probability? I need to create short links like bit.ly.
The problem with md5, sha1, uniqid, etc. is that they don't generate uppercase characters, so I'm looking for case-sensitive output to have a wider range of possible values.
I like to use Hashids for this kind of thing:
Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or an array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
Hashids has been ported to many languages, including PHP.
(Note that, despite the name, Hashids is not a true hashing system since it is designed to be reversible.)
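If the IDs don't need to be reversible, the random, case-sensitive string asked for in the question can be sketched in a few lines. This is only an illustration: the function name `randomCode` and the alphabet are my own assumptions, and it relies on `random_int()` (PHP 7+).

```php
<?php
// Sketch: a random code drawn from 62 case-sensitive characters.
// randomCode() is a hypothetical helper, not part of any library.
function randomCode(int $length = 6): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() returns a uniformly distributed integer in [0, $max]
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

echo randomCode(), "\n"; // a different 6-character string on each run
```

Six characters from this alphabet give 62^6 = 56,800,235,584 possible values. Random codes can still collide, so you would still want a unique index on the stored code and a retry when an insert fails.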
Please check this
https://github.com/namick/obfuscate_id
This plugin converts id 7000 to 5270192353
I tried https://github.com/ivanakimov/hashids.php and similar ones, but they convert IDs into a mix of letters and numbers (like yJJpo90). I don't want that. I want IDs to be converted into positive integers. Are there any PHP packages for this?
You can try Optimus id transformation:
With this library, you can transform your internal IDs to obfuscated integers based on Knuth's integer hash. It is similar to Hashids, but will generate integers instead of random strings. It is also super fast.
https://github.com/jenssegers/optimus
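To give an idea of how an integer-to-integer transform in the spirit of Knuth's multiplicative hash can work, here is a rough sketch. It is not the Optimus library's code: the function names and both constants are my own illustrative assumptions, and it assumes 64-bit PHP and IDs below 2^31.

```php
<?php
// Sketch of a Knuth-style multiplicative ID transform (NOT the Optimus source).
// Assumes 64-bit PHP so the product of two 31-bit integers cannot overflow.

// Inverse of an odd $a modulo 2^31, found by Newton iteration.
function inverseMod2Pow31(int $a): int
{
    $mask = 0x7FFFFFFF;
    $inv = $a & $mask;                 // an odd number is its own inverse modulo 8
    for ($i = 0; $i < 5; $i++) {       // each pass doubles the number of correct bits
        $t = (2 - (($a * $inv) & $mask)) & $mask;
        $inv = ($inv * $t) & $mask;
    }
    return $inv;
}

// $multiplier must be odd and below 2^31; $xor is any value below 2^31.
// Both defaults are arbitrary example values.
function obfuscateId(int $id, int $multiplier = 1580030173, int $xor = 1163945558): int
{
    return (($id * $multiplier) & 0x7FFFFFFF) ^ $xor;
}

function revealId(int $code, int $multiplier = 1580030173, int $xor = 1163945558): int
{
    return (($code ^ $xor) * inverseMod2Pow31($multiplier)) & 0x7FFFFFFF;
}

echo revealId(obfuscateId(7000)), "\n"; // 7000
```

Because the multiplier is odd, multiplication modulo 2^31 is a bijection, so every ID maps to exactly one obfuscated integer and back. This hides the sequence from casual readers but is not encryption.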
This is driving me crazy: my MD5 hashes don't agree. I have this string:
The Combinations Generator is a tool that allows you to easily create
a series of combinations by selecting the related attributes. For
example, if you're selling t-shirts in three different sizes and two
different colors, the generator will create six combinations for you.
When I hash it on my computer using the md5 function (with php 5.5.0) it produces the following hash: 422f3f656e1a5f95e8b5cf7565d815b5
http://www.miraclesalad.com/webtools/md5.php agrees with my computer's result.
http://www.md5.cz/ disagrees with both my computer and miraclesalad.
This string/md5 pair was initially computed by another computer which also gives the same result as md5.cz.
I read about encoding issues (although the string doesn't contain any non ASCII characters), so I tried the following code on my computer:
<?php
$str = "The Combinations Generator is a tool that allows you to easily create a series of combinations by selecting the related attributes. For example, if you're selling t-shirts in three different sizes and two different colors, the generator will create six combinations for you.";
echo "$str<BR/>";
echo md5($str)."<BR/>";
echo md5(utf8_encode($str))."<BR/>";
echo md5(utf8_decode($str))."<BR/>";
die();
The output is:
The Combinations Generator is a tool that allows you to easily create
a series of combinations by selecting the related attributes. For
example, if you're selling t-shirts in three different sizes and two
different colors, the generator will create six combinations for you.
422f3f656e1a5f95e8b5cf7565d815b5
422f3f656e1a5f95e8b5cf7565d815b5
422f3f656e1a5f95e8b5cf7565d815b5
So it is not about utf8.
Any idea what's happening?
My best guess is that it has something to do with the ' mark in the word "you're" and character encodings. If you remove that quote both sites report the same md5.
I tried feeding the string above incrementally to both sites you linked to in your question, and it turns out that the character breaking the generator at md5.cz is the apostrophe in if you're selling t-shirts.
If you strip the string of special characters before feeding it to a hasher, possibly preserving the string's uniqueness using something like urlencode(), you should get matching hashes for any string.
The strings need to be exactly the same, including whitespace.
Probably the sites are using some transformation like trim() or stripslashes().
md5 will return the same value only if the strings are exact.
md5 is md5. That's all there is to it. If you get different hashes from different (non-buggy) implementations, then you're feeding in different inputs. Remember that md5 is DESIGNED to produce wildly different outputs if the inputs are even slightly different. A single whitespace character (tab, line break, etc.) at the end of one of your test strings will totally trash your expected hash, because you've fed in a different input.
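A quick way to check this is to dump the exact bytes next to each hash. The sketch below uses a shortened sample string, and the two "mangled" variants are only guesses at what a web form might do to the input (escaping the apostrophe, adding a line break); I don't know what md5.cz actually does.

```php
<?php
// Three inputs that look alike on screen but differ in their bytes.
$original = "if you're selling";
$escaped  = addslashes($original); // if you\'re selling (assumed form-handler escaping)
$trailing = $original . "\n";      // stray line break at the end

foreach ([$original, $escaped, $trailing] as $s) {
    // bin2hex() exposes the exact bytes that md5() receives
    echo md5($s), '  ', bin2hex($s), "\n";
}
```

All three lines show different hashes, and the hex dump shows where the inputs diverge (for example the extra 5c byte for the backslash).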
I need to generate a short hash from URLs: the shortest possible, say under 6 characters.
I need them to be unique just for the same domain, so a hash from
www.example.com/category/sth/blablabla must be different than one from
www.example.com/category2/sth/blabla but not from:
www.example2.com/category/sth/blablabla
Would using md5($url) and then picking some 5 characters out of that result (for example the first, last, middle and 2 other characters) give a unique ID?
Would this abbreviated hash be unique as well?
A hash is not unique by definition. It's mathematically impossible to get a unique hash for every input longer than the hash itself, unless the inputs do not vary freely. That is the case for URLs, but you cannot exploit it in general. Alternatively, you could use a simple incrementing ID, but that won't allow you to recognize matching URLs.
Either use a really long hash (at least 10 characters, ideally using upper and lower case letters), or accept collisions and handle them appropriately, which is how actual hash tables work.
For a low probability of collisions you can use universal hashing techniques. For example, choose a prime number P. Then for each character position i of the URL choose a random number a[i] in the interval [0, P). Compute the hash of the URL as SUM(a[i]*c[i]) mod P, where c[i] is the i-th character of the original URL. Then take the string containing the digits of the resulting integer as the hash.
Read more in this paper: http://www.cs.cmu.edu/~avrim/451/lectures/lect0929.pdf.
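A minimal sketch of that SUM(a[i]*c[i]) mod P scheme might look like this. The prime 1000003, the seed, the maximum length and the function names are all assumptions of mine for illustration; the coefficients are generated once from a fixed seed so that every URL is hashed with the same a[i].

```php
<?php
// Sketch of the SUM(a[i]*c[i]) mod P hash. All names and constants are illustrative.

// Generate one random coefficient a[i] in [0, P) per character position.
function makeCoefficients(int $count, int $p, int $seed): array
{
    mt_srand($seed); // fixed seed: the same coefficients on every run
    $a = [];
    for ($i = 0; $i < $count; $i++) {
        $a[] = mt_rand(0, $p - 1);
    }
    return $a;
}

function universalHash(string $url, array $a, int $p): string
{
    $len = strlen($url);
    if ($len > count($a)) {
        throw new InvalidArgumentException('URL is longer than the coefficient table');
    }
    $sum = 0;
    for ($i = 0; $i < $len; $i++) {
        // c[i] is taken as the byte value of the i-th character
        $sum = ($sum + $a[$i] * ord($url[$i])) % $p;
    }
    return (string) $sum; // the digits of the resulting integer
}

$p = 1000003;                              // a prime, so hashes have at most 7 digits
$a = makeCoefficients(2048, $p, 12345);    // supports URLs up to 2048 bytes
echo universalHash('www.example.com/category/sth/blablabla', $a, $p), "\n";
```

A larger P lowers the collision probability at the cost of a longer hash.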
Yes, a small change in a URL will change pretty much every character in a good hash. MD5 or SHA1 is probably fine for this. Hence, take the first X characters - and you won't get any improvement by choosing the last X characters, or the first/last/middle. They're all good!
Obviously the more characters you put in your partial hash, the less likely you are to get collisions.
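Taking the first X characters is a one-liner. The helper name `shortHash` below is just an illustrative assumption:

```php
<?php
// Sketch: keep only the first $length hex characters of the MD5 of a URL.
function shortHash(string $url, int $length = 6): string
{
    return substr(md5($url), 0, $length);
}

echo shortHash('www.example.com/category/sth/blablabla'), "\n";
```

To put a number on the collision risk: 6 hex characters give 16^6 = 16,777,216 possible values, so by the birthday bound a collision becomes more likely than not after roughly 4,800 URLs. Each extra hex character multiplies the number of values by 16.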
I would try using crc32($url); it gives an integer of up to 10 digits (it can be negative on 32-bit PHP builds, which adds a sign character), but either way it will be shorter than the 32 characters of an md5.
The only problem is that crc32 is not 100% unique, but it's very unlikely that two different URLs will end up with the same checksum (but still there is a possibility).
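As a sketch of how to make the crc32 result shorter and sign-safe, you could format it as unsigned and convert it to base 36. The helper name `crcShort` is my own:

```php
<?php
// Sketch: CRC32 of a URL as a short base-36 string.
function crcShort(string $url): string
{
    // '%u' forces an unsigned reading, which matters on 32-bit PHP builds
    $unsigned = sprintf('%u', crc32($url));
    return base_convert($unsigned, 10, 36);
}

echo crcShort('www.example.com/category/sth/blablabla'), "\n";
```

The largest CRC32 value is 4294967295, which needs at most 7 base-36 characters.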
I'd like to have a super simple, fast encrypt/decrypt function for non-critical pieces of data. I'd prefer the encrypted string to be URL-friendly (bonus points for pure alphanumerics), and no longer than it has to be. Ideally it should have some sort of key or other mechanism to randomize the cipher as well.
Because of server constraints the solution should not use mcrypt. Ideally it should also avoid base64, because that is too easy to decode.
Example strings:
sample#email_address.com
shortstring
two words
or three words
555-123-4567
Capitals Possible?
You will probably have to code it yourself, but a Vigenère cypher on the characters A-Z, a-z, 0-9 should meet your needs.
With careful key generation and a long key (ideally longer than the encrypted text) Vigenère can be secure, but you have to use it very carefully to ensure that.
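For what it's worth, a Vigenère cipher over A-Z, a-z, 0-9 could be sketched like this. The function name and the choice to pass other characters (spaces, '-', '?') through unchanged are my own assumptions, so the output is only purely alphanumeric if the input is. Treat it as obfuscation for non-critical data, not as real security.

```php
<?php
// Sketch of a Vigenère cipher over a 62-character alphabet.
// Characters outside the alphabet are copied unchanged and do not consume key characters.
function vigenere(string $text, string $key, bool $decrypt = false): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $size = strlen($alphabet);
    if ($key === '' || !ctype_alnum($key)) {
        throw new InvalidArgumentException('Key must be a non-empty alphanumeric string');
    }
    $keyLen = strlen($key);
    $out = '';
    $k = 0; // index of the next key character to use
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $pos = strpos($alphabet, $text[$i]);
        if ($pos === false) {
            $out .= $text[$i];
            continue;
        }
        $shift = strpos($alphabet, $key[$k % $keyLen]);
        $k++;
        // shift forward to encrypt, backward to decrypt, wrapping around the alphabet
        $pos = $decrypt ? ($pos - $shift + $size) % $size : ($pos + $shift) % $size;
        $out .= $alphabet[$pos];
    }
    return $out;
}

$secret = vigenere('Capitals Possible?', 'k3Y');
echo $secret, "\n";
echo vigenere($secret, 'k3Y', true), "\n"; // Capitals Possible?
```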
There's a wide variety of easy-to-implement ciphers around, such as XTEA. Don't invent your own, or use a trivially broken one like the Vigenère cipher. Better yet, don't do this at all: inventing your own cryptosystems is fraught with danger, and if you don't want your users to view the data, you probably shouldn't be sending it to them in the first place.
I'm not sure what this is called, which is why I'm having trouble searching for it.
What I'm looking to do is to take numbers and convert them to some alphanumeric base so that the number, say 5000, wouldn't read as '5000' but as 'G4u', or something like that. The idea is to save space and also not make it obvious how many records there are in a given system. I'm using php, so if there is something like this built into php even better, but even a name for this method would be helpful at this point.
Again, sorry for not being able to be more clear, I'm just not sure what this is called.
You want to change the base of the number to something other than base 10 (I think you want base 36 as it uses the entire alphabet and numbers 0 - 9).
The built-in base_convert function may help, although it has the limitation that it can only convert between bases 2 and 36:
$number = '5000';
echo base_convert($number, 10, 36); //3uw
Funnily enough, I asked the exact opposite question yesterday.
The first thing that comes to mind is converting your decimal number into hexadecimal. 5000 would turn into 1388, 10000 into 2710. Will save a few bytes here and there.
You could also use a higher base that utilizes the full alphabet (0-Z instead of 0-F) or even all 256 byte values. As @Yacoby points out, you can use base_convert() for that.
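Since base_convert() stops at base 36, going higher means writing the conversion yourself. Here is a sketch for base 62 (digits, lowercase, uppercase); the function names and the digit ordering are my own assumptions, and it only handles non-negative integers.

```php
<?php
// Sketch: base-62 encoding of non-negative integers. Names are illustrative.
const BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function base62Encode(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported');
    }
    $out = '';
    do {
        $out = BASE62[$n % 62] . $out; // least significant digit first, prepended
        $n = intdiv($n, 62);
    } while ($n > 0);
    return $out;
}

function base62Decode(string $s): int
{
    $n = 0;
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $pos = strpos(BASE62, $s[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException('Invalid base-62 digit: ' . $s[$i]);
        }
        $n = $n * 62 + $pos;
    }
    return $n;
}

echo base62Encode(5000), "\n";  // 1iE
echo base62Decode('1iE'), "\n"; // 5000
```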
As I said in the comment, keep in mind that this is not an efficient way to mask IDs. If you have a security problem when people can guess the next or previous ID to a record, this is very poor protection.
dechex will convert a number to hex for you. It won't obfuscate how many records are in a given system, however. I don't think it will make it any more efficient to store or save space, either.
You'd probably want to use a 2 way crypt function if obfuscation is needed. That won't save space, either.
Please state your goals more clearly and give more background, because this seems a bit pointless as it is.
This might confuse more people than simply converting the base of the numbers ...
Try using signed digits to represent your numbers. For example, instead of using digits 0..9 for decimal numbers, use digits -5..5. This Wikipedia article gives an example for the binary representation of numbers, but the approach can be used for any numeric base.
Using this together with, say, base-36 arithmetic might satisfy you.
EDIT: This answer is not really a solution to the question, so ignore it unless you are trying to hash a number.
My first thought would be to hash it using e.g. md5 or sha1. (You'd probably not save any space though...)
To prevent people from using rainbow tables or brute force to guess which number you hashed, you can always add a salt. It can be as simple as a string prepended to your number before hashing it.
md5 would return a hexadecimal string of exactly 32 chars and sha1 would return one of exactly 40 chars.
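Prepending a salt can be sketched as follows. The helper name and the salt value are placeholders of mine; in practice the salt would be a secret string kept on the server.

```php
<?php
// Sketch: hash a number with a salt prepended. The salt shown is a placeholder.
function saltedHash(int $number, string $salt): string
{
    return sha1($salt . $number);
}

$salt = 'replace-with-your-own-secret';
echo saltedHash(5000, $salt), "\n";           // 40 hex characters
echo strlen(md5($salt . 5000)), "\n";         // 32
```

Unlike a base conversion, this is one-way: you would need to store the hash alongside the record to look the number up again.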