I'm trying to generate a unique order number for my ecommerce application. This is my code:
<?php
$bytes = random_bytes(3);
$random_hash = bin2hex($bytes);
$order_num = $random_hash . "1";
echo strtoupper(hash('crc32b', $order_num));
The order number (1 in the example) is going to be an auto-increment value retrieved from MySQL.
Does this guarantee uniqueness?
I want a short, unique final value of at most 8-10 characters.
A numbers-only solution would be fine too.
As far as I know, most hash algorithms make no guarantee of when collisions might occur, so you're probably just as likely to get a collision with your proposed code as using the random part on its own.
If the auto-increment part is unique, and the random part is just to avoid guesses, you could just concatenate the two parts together (i.e. everything in your example before the hash call). That way if the same random number comes up twice, it will have different numbers on the end.
If that results in something too long, you could do something with base_convert or asc to convert the number into a shorter representation.
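A minimal sketch of the concatenation idea, assuming PHP 7 or later. The function name makeOrderNumber and the fixed four-character base-36 random prefix are illustrative assumptions, not part of the original code. Keeping the prefix at a fixed width means two different (random, id) pairs can never produce the same string.

```php
<?php
// Hypothetical helper: the random prefix makes the number hard to guess,
// the auto-increment id makes it unique.
function makeOrderNumber(int $autoIncrementId): string
{
    // Random value in [0, 36^4 - 1], rendered as exactly 4 base-36 characters.
    $random = random_int(0, 36 ** 4 - 1);
    $prefix = str_pad(base_convert((string) $random, 10, 36), 4, '0', STR_PAD_LEFT);

    // The id is unique on its own, so the combined string is unique too.
    $suffix = base_convert((string) $autoIncrementId, 10, 36);

    return strtoupper($prefix . $suffix);
}
```

With ids below 36^6 (about 2.1 billion) the suffix needs at most 6 characters, so the result stays within 10 characters.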
The hash function will not provide any uniqueness to the id, it only obfuscates the id a bit.
If you have, let's say, 100 possible values, you would get 100 possible hashes from them, no more. If an attacker wants to brute-force the hashes, he can pick the 100 possible hashes and try them.
In your case, with 3 bytes of randomness, you will not get through all 2^24 possible combinations before you hit a duplicate. The same random number is likely to come up after only a few thousand values, much earlier than the number of possible combinations suggests.
There are two common approaches when it comes to unique ids:
You let the database automatically increment the id, this makes sure that the id is unique.
You generate a UUID (global id with 16 bytes) which offers such a huge keyspace that a duplicate is extremely unlikely. In practice one can neglect the possibility of duplicates.
The UUID has a lot of advantages and one disadvantage:
(+) UUIDs can work decentralized, e.g. in an offline scenario.
(+) One can generate the id before it is inserted in the database, so one does not have to wait until the row is created in the db.
(+) The ids are not deterministic, so an attacker cannot guess the next id.
(-) They use more storage space and are a bit slower when searching.
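For illustration, a random (version 4) UUID can be generated in PHP 7+ roughly as follows. The function name uuidV4 is an assumption of this sketch; in a real project a maintained library may be the better choice.

```php
<?php
// Sketch of a version 4 (random) UUID built from 16 cryptographically secure bytes.
function uuidV4(): string
{
    $bytes = random_bytes(16);

    // Set the version field to 4 and the variant field to the RFC 4122 variant.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters, split into 8 groups of 4 and joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

Because the id is created in application code, it is known before the row is inserted.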
I have a question regarding the uniqueness of the md5 function.
I know that md5 hashes (of a microtime value) are not unique; however, they are pretty unique :)
How can I calculate the probability of a collision between two portions of md5 hashes?
For example, the following PHP generates an 8-character string from the md5 result:
substr(md5(microtime()), 0, 8);
A second scenario - What if the index of it is unique (so it gets a different portion of the hash each time)?
substr(md5(microtime()), rand(0, 32), 8);
There are 2^32 combinations of 8 hexadecimal digits. Even if they are completely random, you can only generate about 65000 such strings, on average, before you get 2 that are the same.
md5(), using a random index or not, doesn't significantly change anything as long as all the microtime() values you use are unique. But if you are generating these too fast, or across many machines, then the situation is much, much worse, because there's a good chance you could end up using the same microtime() value twice.
Uniqueness of your string is really a matter of probability: the more distinct characters you use and the longer the random string, the lower the chance of generating the same string twice.
So, to get a truly unique string, you need to store the strings in your DB and compare each new random string against them. If you find a match, generate a fresh string, and repeat until you get one that is unique.
It depends on how many "sub-hashes" you are going to generate and how many bits you're keeping from the original MD5 hash (length of a "sub-hash"). If you generate just 1 sub-hash and keep just 1 bit, then there is no collision at all. If you generate 2 sub-hashes, expect a 50% chance of collision. Use 2 bits and the odds are 25%. You do the math. Refer to the birthday paradox for more info.
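The birthday-paradox arithmetic can be checked with a few lines of PHP. This is a sketch that assumes the truncated hashes behave like uniformly random values; the function name collisionProbability is made up for the example.

```php
<?php
// Probability that at least two of $draws uniformly random values
// out of $space possible values are equal (exact product formula).
function collisionProbability(int $draws, float $space): float
{
    $noCollision = 1.0;
    for ($i = 1; $i < $draws; $i++) {
        // The (i+1)-th value must avoid the i values drawn before it.
        $noCollision *= 1.0 - $i / $space;
        if ($noCollision <= 0.0) {
            return 1.0; // more draws than possible values
        }
    }
    return 1.0 - $noCollision;
}
```

Two sub-hashes of 1 bit give collisionProbability(2, 2.0) = 0.5, and two sub-hashes of 2 bits give 0.25, matching the figures above. For 8 hex digits (2^32 values) the result is roughly 39% at 65,536 strings and passes 50% at roughly 77,000.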
I have a table with few rows (50 at most) and I need to get a random value out of it. I can do that with
ORDER BY RAND() LIMIT 1
My main question: when I have 6k selects in 5 seconds, is rand still 'reliable'?
How is rand calculated, can I seed it over time? (idk, every 5 seconds).
The MySQL pseudo-random number generator is completely deterministic. The docs say:
RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.
It can't use /dev/random because MySQL is designed to work on a variety of operating systems, some of which don't have a /dev/random.
MySQL initializes a default seed at server startup, using the integer returned by time(0).
If you're interested in the source line, it's in the MySQL source in file sql/mysqld.cc, function init_server_components(). I don't think it ever re-seeds itself.
Then the subsequent "random" numbers are based solely on the seed. See source file mysys_ssl/my_rnd.cc, function my_rnd().
The best practice solution to your random-selection task, for both performance and quality of randomization, is to generate a random value between the minimum primary key value and maximum primary key value. Then use that random value to select a primary key in your table:
SELECT ... FROM MyTable WHERE id >= $random ORDER BY id LIMIT 1
The reason you'd use >= instead of = is that you might have gaps in the id due to rows being deleted or rolled back, or you might have other conditions in your WHERE clause so that you have gaps in between rows that match your conditions.
The disadvantages of this greater-than method:
Rows following such a gap have a higher chance of being chosen, and the larger the gap the greater the chance.
You need to know the MIN(id) and MAX(id) before you generate the random value.
Doesn't work as well if you need more than one random row.
Advantages of this method:
It's much faster than ORDER BY RAND(), even for a modest table size.
You can use a random function outside of SQL.
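The gap bias mentioned in the disadvantages can be made visible without a database. The sketch below is a pure-PHP stand-in for the query: it picks the first id at or after the random value (an assumption of this sketch, so that every id is reachable) and counts how many random values lead to each row. Both function names are hypothetical.

```php
<?php
// Stand-in for the SQL lookup: first id that is >= the random value.
function firstIdAtOrAfter(array $sortedIds, int $random): ?int
{
    foreach ($sortedIds as $id) {
        if ($id >= $random) {
            return $id;
        }
    }
    return null; // random value was above the largest id
}

// For every possible random value between MIN(id) and MAX(id),
// record which row would be selected.
function selectionCounts(array $sortedIds): array
{
    $counts = array_fill_keys($sortedIds, 0);
    for ($r = min($sortedIds); $r <= max($sortedIds); $r++) {
        $counts[firstIdAtOrAfter($sortedIds, $r)]++;
    }
    return $counts;
}
```

For the ids 1, 2, 3 and 10, selectionCounts() reports one hit each for 1, 2 and 3 but seven hits for 10: the row after the gap absorbs every random value that falls into the gap.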
RAND is pseudorandom. Be careful using it for security stuff. I don't think your "choose one row randomly out of fifty" is for security, so you're probably OK.
It's pretty fast for a small table. It will be horrible for picking a random row out of a large table: it has to tag every row with a pseudorandom number and then sort them. For the application you're describing, @TheEwook's suggestion is exactly right; sorting even a small table more often than once a millisecond can swamp even powerful MySQL hardware.
Don't seed RAND, ever, unless you're testing and you want a repeatable sequence of random numbers for some kind of unit test. I learned this the hard way once when generating what I thought were hard-to-guess session tokens. The MySQL guys did a good job with RAND and you can trust them for the application you're talking about.
I think (not sure), if you don't seed it, it starts with a random seed from /dev/random.
If you need crypto-grade random numbers, read /dev/random yourself. But keep in mind that /dev/random can only generate a limited rate. /dev/urandom uses /dev/random to generate a faster rate, but isn't as high-grade in its entropy pool.
If your table is not too big (let's say max 1000 records) it doesn't really matter. But for big tables you must choose an alternative way.
This article may help you:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
I don't want my database IDs to be sequential, so I'm trying to generate uids with this code:
$bin = openssl_random_pseudo_bytes(12);
$hex = bin2hex($bin);
return base_convert($hex, 16, 36);
My question is: how many bytes would I need to make the ids unique enough to handle large amounts of records (like Twitter)?
Use PHP's uniqid(), with an added entropy factor. That'll give you plenty of room.
You might consider something like the way tinyurl and other shortening services work. I've used similar techniques, which guarantee uniqueness until all combinations are exhausted. Basically, you choose an alphabet and how many characters you want as a length. Let's say we use alphanumeric, upper and lower case, so that's 62 characters in the alphabet, and let's do 5 characters per code. That's 62^5 = 916,132,832 combinations.
You start with your sequential database ID and pick some prime number (choose one that's fairly large, like 2097593). All you do is multiply the prime by your database ID, making sure to wrap around if you exceed 62^5, and then convert that number to base-62 as per your chosen alphabet.
This makes each code look fairly unique, yet because we use a prime number that shares no factor with 62^5, we're guaranteed not to hit the same number twice until we've used all codes. And it's very short.
You can use longer keys with a smaller alphabet, too, if length isn't a concern.
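The multiply-and-wrap scheme might look like this in PHP. It is a sketch only: the function name shortCode and the order of the alphabet are assumptions, the multiplier is the one suggested above, and a 64-bit PHP build is assumed so that the product fits in an integer.

```php
<?php
// Sketch: turn a sequential id into a 5-character base-62 code.
function shortCode(int $id): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $length = 5;
    $multiplier = 2097593;            // shares no factor with 62^5

    $base = strlen($alphabet);        // 62
    $space = $base ** $length;        // 916,132,832 possible codes

    // Multiply and wrap around inside the code space.
    $n = ($id * $multiplier) % $space;

    // Convert to base 62, left-padded to the fixed length.
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code = $alphabet[$n % $base] . $code;
        $n = intdiv($n, $base);
    }
    return $code;
}
```

Consecutive ids land far apart in the code space, and no two ids below 62^5 map to the same code.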
Here's a question I asked along the same lines: Tinyurl-style unique code: potential algorithm to prevent collisions
Assuming that openssl_random_pseudo_bytes may generate every possible value, N bytes will give you 2 ^ (N * 8) distinct values. For 12 bytes this is 7.923 * 10^28.
Use MySQL's UUID() function:
insert into `database`(`unique`,`data`) values(UUID(),'Test');
If you're not using MySQL, search Google for "UUID" plus the name of your database and it will give you an option.
Source: Wikipedia
In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%
In order to produce a unique Id I suppose I must use the uniqid function in php.
But uniqid produces a 13-digit hexadecimal number by default:
4f66835b507db
I would like to reduce this to a 7-digit numeric value while preserving uniqueness. Is that possible?
4974012
This number will be used as the user ID. Authentication will be done with this ID and a password.
Some people say uniqid is not unique! Is it a bad choice?
Any "unique" number will eventually have a collision after generating enough records. To ensure uniqueness, you need to store the values you generated into a database and when generating next one, you need to check if there is no collision.
However, in practice, applications usually generate IDs as a simple sequence 1,2,3,... That way you know you won't get a collision until you run out of the datatype (UINT is usually 32 bits long, which gives you 4 billion unique ids).
Uniqid is not guaranteed to be unique, even in its full length.
Furthermore, uniqid is intended to be unique only locally. This means that if you create users simultaneously on two or more servers, you may end up with one ID for two different users, even if you use full-length uniqid.
My recommendations:
If you are really looking for globally unique identifiers (i.e. your application is running on multiple servers with separate databases), you should use UUIDs. These are even longer than the ones returned by uniqid, but there is no practical chance of collisions.
If you need only locally unique identifiers, stick with AUTO_INCREMENT in your database. This is (a little) faster and (a little) safer than checking if a short random ID already exists in your database.
EDIT: As it turns out in the comments below, you are looking not only for an ID for the user, but rather you are forced to provide your users with a random login name... Which is weird, but okay. In such case, you may try to use rand in a loop, until you get one that does not exist in your database.
Pseudocode:
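$min = 1;
do {
    // Widen the range on every retry so that collisions become less likely.
    $username = "user" . rand($min, $min * 10);
    $min = $min * 10;
} while (user_exists($username)); // user_exists() stands for your own database lookup
// Create your user here.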
Write a while loop that generates random letters and numbers of a desired length, which loops until it creates an ID that is not already in use.
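Such a loop could be sketched as follows. The function name generateUnusedId, the $isTaken callback (which stands in for a database lookup) and the attempt limit are all assumptions made for this example.

```php
<?php
// Generate random IDs until one is found that $isTaken reports as free.
function generateUnusedId(callable $isTaken, int $length = 7, int $maxAttempts = 100): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    $last = strlen($alphabet) - 1;

    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $id = '';
        for ($i = 0; $i < $length; $i++) {
            $id .= $alphabet[random_int(0, $last)];
        }
        if (!$isTaken($id)) {
            return $id;
        }
    }
    // Give up instead of looping forever when the ID space is nearly full.
    throw new RuntimeException('No unused ID found after ' . $maxAttempts . ' attempts.');
}
```

In a real application the column should still carry a UNIQUE index, because two requests can pick the same ID between the check and the insert.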
Well, by reducing it to 7 characters and only numeric, you are reducing the 'uniqueness' by a lot.
I suggest using an auto increment of the user ID and start at 1000000 if it has to be 7 digits long.
If you really must generate it without auto increment, you can use mt_rand() to generate a random number 7 digits long:
$random = mt_rand(1000000, 9999999);
This is not ideal because you will need to check if the number is already in use by another user.
If you are using a database, define an id column as unique and auto-incremented, and then let the database manage your ids.
It's safer.
Read more: mysql-doc
Take a look at this article:
Create short IDs with PHP - Like Youtube or TinyURL
It explains how to generate short unique ids, like youtube does.
Actually, the function in the article is closely related to the PHP function base_convert, which converts a number from one base to another (but only up to base 36).
I have just found this great tutorial as it is something that I need.
However, after having a look, it seems that this might be inefficient. The way it works is, first generate a unique key then check if it exists in the database to make sure it really is unique. However, the larger the database gets the slower the function gets, right?
Instead, I was thinking, is there a way to add ordering to this function? So all that has to be done is check the previous entry in the DB and increment the key. So it will always be unique?
function generate_chars()
{
    $num_chars = 4; // length of the random string
    $i = 0;
    $my_keys = "123456789abcdefghijklmnopqrstuvwxyz"; // characters to choose from
    $keys_length = strlen($my_keys);
    $url = "";
    while ($i < $num_chars)
    {
        // Start at index 0 so the first character of $my_keys can be picked too.
        $rand_num = mt_rand(0, $keys_length - 1);
        $url .= $my_keys[$rand_num];
        $i++;
    }
    return $url;
}
function isUnique($chars)
{
    // Check whether the generated string is already stored.
    // Note: the mysql_* functions used here were removed in PHP 7.
    global $link;
    $q = "SELECT * FROM `urls` WHERE `unique_chars`='" . mysql_real_escape_string($chars, $link) . "'";
    $r = mysql_query($q, $link);
    return mysql_num_rows($r) == 0;
}
The tiny url people like to use random tokens because then you can't just troll the tiny url links. "Where does #2 go?" "Oh, cool!" "Where does #3 go?" "Even cooler!" You can type in random characters but it's unlikely you'll hit a valid value.
Since the key is rather sparse (4 values each having 36* possibilities gives you 1,679,616 unique values, 5 gives you 60,466,176) the chance of collisions is small (indeed, it's a desired part of the design) and a good SQL index will make the lookup trivial (indeed, it's the primary lookup for the url so they optimize around it).
If you really want to avoid the lookup and just use auto-increment, you can create a function that turns an integer into a string of seemingly-random characters with the ability to convert back. So "1" becomes "54jcdn" and "2" becomes "pqmw21". Similar to Base64-encoding, but not using consecutive characters.
(*) I actually like using fewer than 36 characters: single-cased, no vowels, and no similar characters (1, l, I). This prevents accidental swear words and also makes it easier for someone to speak the value to someone else. I even map similar characters to each other, accepting "0" for "O". If you're entirely machine-based you could use upper and lower case and all digits for even greater possibilities.
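A reduced alphabet with lookalike mapping could be sketched like this. The 30-character alphabet, the mapping table and both function names are assumptions chosen for the example, not the author's actual code. The sketch covers only the alphabet and the forgiving decoder, not the scrambling step that makes consecutive ids look random, and it expects non-negative ids.

```php
<?php
// Digits plus upper-case consonants, without the letter L: 30 symbols, no vowels.
const CODE_ALPHABET = '0123456789BCDFGHJKMNPQRSTVWXYZ';

function encodeId(int $id): string
{
    $alphabet = CODE_ALPHABET;
    $base = strlen($alphabet);
    $code = '';
    do {
        $code = $alphabet[$id % $base] . $code;
        $id = intdiv($id, $base);
    } while ($id > 0);
    return $code;
}

function decodeCode(string $code): int
{
    // Be forgiving with typed or spoken input: fold case and map lookalikes.
    $code = strtr(strtoupper(trim($code)), ['O' => '0', 'I' => '1', 'L' => '1']);
    if ($code === '') {
        throw new InvalidArgumentException('Empty code.');
    }
    $alphabet = CODE_ALPHABET;
    $base = strlen($alphabet);
    $id = 0;
    foreach (str_split($code) as $char) {
        $pos = strpos($alphabet, $char);
        if ($pos === false) {
            throw new InvalidArgumentException('Invalid character in code: ' . $char);
        }
        $id = $id * $base + $pos;
    }
    return $id;
}
```

Because the decoder normalizes its input, a code read aloud as "letter O" or "lower-case L" still resolves to the intended id.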
In the database table, there is an index on the unique_chars field, so I don't see why that would be slow or inefficient.
UNIQUE KEY `unique_chars` (`unique_chars`)
Don't rush to do premature optimization on something that you think might be slow.
Also, there may be some benefit in a url shortening service that generates random urls instead of sequential urls.
I don't know why you'd bother. The premise of the tutorial is to create a "random" URL. If the random space is large enough, then you can simply rely on pure, dumb luck. If your random character space is 62 characters (A-Za-z0-9), then the chance of a conflict with the 4 characters they use, given a reasonable random number generator, is 1 in 62^4, which is 1 in 14,776,336. Five characters is 1 in 916,132,832. So, a conflict is, literally, "1 in a billion".
Obviously, as the documents fill, your odds increase for the chance of a collision.
With 10,000 documents, it's 1 in 91,613, almost 1 in 100,000 (for round numbers).
That means, for every new document, you have a 1 in 91,613 chance of hitting the DB again for another pull on the slot machine.
It is not deterministic. It's random. It's luck. In theory, you can hit a string of really, really, bad luck and just get collision after collision after collision. Also, it WILL, eventually, fill up. How many URLs do you plan on hashing?
But if 1 in 91,613 odds isn't good enough, boosting it to 6 chars makes it more than 1 in 5M for 10,000 documents. We're talking almost LOTTO odds here.
Simply put, make the key big enough (7 characters? 8?) and the problem pretty much "wishes" itself out of existence.
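The odds quoted above are simple divisions and can be reproduced with a short PHP function. The name collisionOdds is made up for this sketch, and it assumes keys are drawn uniformly at random.

```php
<?php
// Returns X in "a new random key has a 1 in X chance of hitting an existing one".
function collisionOdds(int $alphabetSize, int $length, int $existingKeys): float
{
    $keySpace = $alphabetSize ** $length; // number of possible keys
    return $keySpace / $existingKeys;
}
```

collisionOdds(62, 5, 10000) comes out at about 91,613 and collisionOdds(62, 6, 10000) at about 5.68 million, in line with the figures in the answer.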
Couldn't you encode the URL as Base36 when it's generated, and then decode it when visited - that would allow you to remove the database completely?
A snippet from Channel9:
The formula is simple, just turn the
Entry ID of our post, which is a long
into a short string by Base-36
encoding it and then stick
'http://ch9.ms/' onto the front of it.
This produces reasonably short URLs,
and can be computed at either end
without any need for a database look
up. The result, a URL like
http://ch9.ms/A49H is then used in
creating the twitter link.
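In PHP the Base-36 round trip described in the snippet could look like this. The function names are hypothetical, and the entry id 472085 is simply the decimal value of the example code A49H.

```php
<?php
// Encode a numeric entry id as an upper-case Base-36 string.
function encodeEntryId(int $entryId): string
{
    return strtoupper(base_convert((string) $entryId, 10, 36));
}

// Decode the short code back to the numeric id; no database lookup needed.
function decodeShortCode(string $code): int
{
    return intval($code, 36);
}
```

encodeEntryId(472085) yields "A49H" and decodeShortCode("A49H") yields 472085. Note that base_convert can lose precision on very large numbers, so this sketch is only suitable for ids of ordinary size.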
I solved a similar problem by implementing an algorithm that generated serial numbers one by one in base36. I had my own ordering of the base36 characters, all of which are unique. Since it was generating numbers serially, I did not have to worry about duplication. The complexity and apparent randomness of the number depend on the ordering of the base36 characters, and that only matters to the public, because to my application they are just serial numbers :)
Check out this guy's functions - http://www.pgregg.com/projects/php/base_conversion/base_conversion.php source - http://www.pgregg.com/projects/php/base_conversion/base_conversion.inc.phps
You can use any base you like, for example to convert 554512 to base 62, call
$tiny = base_base2base(554512, 10, 62); and that evaluates to $tiny = '2KFk'.
So, just pass in the unique id of the database record.
In a project I used this in a removed a few characters from the $sChars string, and am using base 58. You can also rearrange the characters in the string if you want the values to be less easy to guess.
You could of course add ordering by simply numbering the urls:
http://mytinyfier.com/1
http://mytinyfier.com/2
and so on. But if the hash key is indexed in the database (which it obviously should be), the performance boost would be minimal at best.
I wouldn't bother doing ordered enumeration for two reasons:
1) SQL servers are very effective at checking such hash collisions (given correct indexes)
2) That might hurt privacy, as users would be able to easily figure out what other users are tinyurl-ing.
Use autoincrement on the database, and get the latest id as described by http://www.acuras.co.uk/articles/24-php-use-mysqlinsertid-to-get-the-last-entered-auto-increment-value
Perhaps this is a bit off-answer, but my general rule for creating always-unique keys is simple: md5( time() * 100 + rand( 0, 100 ) ); There is a one in 100,000 chance that if two people are using the same service at the same second they will get the same result (nigh impossible).
That said, md5( rand( 0, n ) ) works too.
That might work, but the easiest way to accomplish the problem would probably be with hashing. Theoretically speaking, hashing runs in O(1) time, as in, it only has to perform the hash, and then does only one actual hit to the database to retrieve the value. Then, you would introduce complications for checking for hash collisions, but it seems like this is probably what most of the tinyurl providers do. And, a good hash function isn't terribly hard to write.
I have also created small tinyurl service.
I wrote a Python script that generates keys and stores them in a MySQL table named tokens with status U (unused).
I do this offline, though. I have a cron job on my VPS that runs the script every 10 minutes. The script checks whether there are fewer than 1000 keys in the table; if so, it keeps generating keys and inserting those that do not already exist in the table until the count reaches 1000.
For my service, 1000 keys every 10 minutes is more than enough; you can adjust the timing or the number of keys generated to your needs.
Now, when a tiny URL needs to be created on my website, my PHP script just fetches any unused key from the table and marks its status as T (taken). The PHP script does not have to worry about uniqueness, because my Python script has already populated the table with unique keys only.
Couldn't you just trim the hash to the length you wish?
$tinyURL = substr(md5($longURL . time()),0,4);
Granted, this may not provide as much pseudo randomness as using the entire string length. But, if you hash the long URL concatenated with the time(), wouldn't this be sufficient? Thoughts on using this method? Thanks!