Background:
I am creating a service booking website. Each order needs to have a unique order number. I have chosen 16 digits because that's what the previous software used.
Questions
I am not sure if there is any benefit to putting data into the order number or if it should just be a purely random string.
If it is just a random string then its only purpose is to act as an ID. If that is the case, then why not just use an incremental ID? Other than to obfuscate the number of orders we have generated from the end user, I can't think of a good reason.
If it is a good idea to put data into the string, what kind of data should I include? Probably the date of the order, but other than that I don't know.
I am currently generating a purely random 16 digit string like this.
public function generateOrderNumber()
{
$time = time(); // Current Unix timestamp (timezone-independent) to hash
$token = md5($time); // 32-character hex hash stored in variable
return str_shuffle(substr($token, 0, 16)); // First 16 hex characters of the hash, shuffled
}
However I am not sure if this is good enough for production.
If you need global uniqueness, say across multiple databases that are synchronized at intervals, then I'd go with a standard 128-bit GUID, which fits into 16 8-bit bytes to maintain backwards compatibility. PHP has com_create_guid to generate GUIDs (it is part of the Windows-only COM extension).
MD5 only produces values in the a-f0-9 range which is severely limiting here. You really need to expand this and use the entire alphabet, maybe even Base62, a variant of Base64 minus the two "annoying" characters.
A cryptographically random number, not the junk rand() produces, encoded as a 5-character Base62 value could work.
If you need people to be able to read and write these values by hand you'll want to omit 0, O and 1 and l and I for clarity.
Remember, on really short values you will probably get collisions, so you'll need to test any INSERT you do against a UNIQUE constraint and retry if it fails.
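To make the suggestions above concrete, here is a minimal sketch of a random code drawn from an alphabet without the look-alike characters. The function name, the constant and the default length of 16 are my own choices for illustration, not anything from the question, and the UNIQUE constraint with a retry is still needed on top of it.

```php
<?php
// Alphabet without the look-alike characters 0, O, 1, l and I (57 symbols).
const ORDER_ALPHABET = '23456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';

// Hypothetical helper: builds a random code of $length characters.
function generateOrderCode(int $length = 16, string $alphabet = ORDER_ALPHABET): string
{
    $max = strlen($alphabet) - 1; // last valid index into the alphabet
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from the operating system's CSPRNG (PHP 7+).
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

echo generateOrderCode(), PHP_EOL;
```

Longer codes collide less often, but however long they are, the database constraint remains the thing that actually guarantees uniqueness.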
I need to generate a unique string in PHP.
Currently I'm using a technique like this
$clipId = base_convert(microtime(), 8, 36);
However, as this is based on time, the ID changes when the page is re-rendered, and I need it to always remain the same.
If there were a way to feed in the image URL and the post title as strings to output an alphanumeric ID, that would be perfect, and 'random' enough for what I need to do here. Also, if it were possible to get the Unix time the image was uploaded to WordPress (together with the Unix time the post was created), I could use that.
So, you want an algorithm that turns one string into another string. That's not random, that's either an encoding or a hash. An encoding expresses the same value merely in different terms, for example base64_encode. You can convert between the original string and the encoded form back and forth as often as you like.
Alternatively, you probably want a hash like SHA1 or MD5 to turn arbitrary input into a fixed-length output. You cannot convert a hash back into its original value.
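As a rough sketch of the hash route, assuming the image URL and post title are already available as strings: the function name, the SHA-256 choice and the 12-character truncation are illustrative assumptions, not something from the question.

```php
<?php
// Hypothetical helper: derives a stable ID from two strings.
function stableClipId(string $imageUrl, string $postTitle, int $length = 12): string
{
    // A separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
    $digest = hash('sha256', $imageUrl . "\n" . $postTitle);
    // Keep only the first $length hex characters; shorter IDs collide more easily.
    return substr($digest, 0, $length);
}

// The same inputs always give the same ID, no matter when the page is rendered.
echo stableClipId('http://example.com/image.jpg', 'My post title'), PHP_EOL;
```

The ID only changes if the URL or the title changes, which is the property the time-based version was missing.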
Alternatively you can use an entirely arbitrary random string generated with a pseudo random number generator. These generators need to be seeded with an initial value, and will then return a predictable and repeatable series of seemingly random numbers. If you seed it with the same value, it will return you the same random number sequence. You can use that to produce random numbers which have no direct connection with your string yet are still reproducible when necessary. e.g. mt_rand, mt_srand.
Example security code:
a35sfj9ksdf
How can I ask a user for several characters (e.g. first, fourth and ninth) of their security code and then check these? The main difficulty is how to store the security code in an encrypted form: if I were to store each character individually, the encryption would be incredibly easy to break.
A possibility that was described neither here nor at How to store and verify digits chosen at random from a PIN/Password is this:
Create a random salt of the same length as the security code (here 11)
Store the salt with the user
For every char of the security code, replace the corresponding char of the salt with the char from the security code and hash it securely
Store these hashes with the user
Now you have to store the manageable quantity of n+1 fields for a security code of length n and can still verify single (position,char) tuples
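A minimal sketch of the salt-per-position scheme described above. The function names and the record layout are made up for illustration, and password_hash() is my pick for "hash it securely". Keep in mind that anyone who obtains the salt and the hashes only has to try one alphabet's worth of candidates per position, so this hides the code from casual inspection rather than from a determined offline attack.

```php
<?php
// Hypothetical helper: builds the n+1 stored fields (one salt, n hashes).
function buildPartialCodeRecord(string $code): array
{
    $n = strlen($code);
    if ($n < 1) {
        throw new InvalidArgumentException('The security code must not be empty.');
    }
    // Printable hex salt of the same length as the code.
    $salt = substr(bin2hex(random_bytes($n)), 0, $n);
    $hashes = [];
    for ($i = 0; $i < $n; $i++) {
        // Put the real character into the salt at position $i, then hash it.
        $candidate = substr_replace($salt, $code[$i], $i, 1);
        $hashes[$i] = password_hash($candidate, PASSWORD_DEFAULT);
    }
    return ['salt' => $salt, 'hashes' => $hashes];
}

// Hypothetical helper: checks one (position, char) tuple; $position is 1-based.
function verifyCharacter(array $record, int $position, string $char): bool
{
    $i = $position - 1;
    if (strlen($char) !== 1 || !isset($record['hashes'][$i])) {
        return false;
    }
    $candidate = substr_replace($record['salt'], $char, $i, 1);
    return password_verify($candidate, $record['hashes'][$i]);
}

$record = buildPartialCodeRecord('a35sfj9ksdf');
var_dump(verifyCharacter($record, 4, 's')); // fourth character of the example code
```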
What about using substr()?
substr("a35sfj9ksdf", 0, 1);
That would return 'a', the first character
substr("a35sfj9ksdf", 4, 1);
This would return 'f', the 5th character
So ask something like "please enter character $n" and use
substr("a35sfj9ksdf", $n-1, 1);
You can follow these steps:
Store all your desired characters in an array
Generate n random numbers (n being the length of the user code), where each number is an index into your array
Concatenate the selected characters to make a string
Store the string in the session, and when the user enters their code, just compare it with the session value
You can also build a simple captcha service in a similar way
I'm creating a link shortening service and I'm using base64 encoding/decoding of an incremented ID field to create my urls. A url with the ID "6" would be: http://mysite.com/Ng==
I need to also allow users to create a custom url name, like http://mysite.com/music
Here's my (possibly faulty) approach so far. Help in fixing it would be appreciated.
When someone creates a new link:
I get the largest link ID from the database (it's not auto incremented)
Increment the ID by 1
Generate a short URL code (http://website.com/[short url name]) by base64_encoding that ID
Insert into links table: id, short_url_code, destination_url
When someone creates a new link and passes a custom short URL:
My plan was base64_decode their custom string and use that as the link ID, but I didn't realize that you can't just base64_decode any alphanumeric string and turn it into a number.
Is there a better encoding method that will let me turn any number into a short string, and any string into a number, so I can always lookup short urls (whether custom or autogenerated) by turning the name into a number and querying for a link with an ID equal to that number?
First and foremost, make sure you have uniqueness (UNIQUE) constraints in place on the ID and short_url_code columns.
When someone creates a new link:
Get the next largest link ID from the database (for performance reasons you should really REALLY use autoincrement or SEQUENCE, depending on what your RDBMS offers; otherwise go ahead and select MAX(ID)+1 )
Generate a short URL code (http://website.com/[short url name]) from ID using base64_encode or any other custom or standard encoding scheme
Insert into the links table: ID, short_url_code, destination_url
If the insert fails because of a constraint violation go back to step 1 to try a new ID; you may have had a violation because:
the same ID has already been used (i.e. inserted) in parallel by another thread/process etc. (this will not happen if you used autoincrement or SEQUENCE, and may happen quite often otherwise), and/or
the same short_url_code has already been used as a custom URL (this will happen very seldomly unless someone is trying to cause trouble on your site)
If the insert succeeded, commit and return the short URL to the user
When someone creates a new link and passes a custom short URL:
Perform the same step 1 as above
Instead of generating the short URL part from ID as in step 2 above, use the custom short_url_code provided by the user
Perform the same step 3 as above
If the insert failed because of:
a constraint violation on ID: go back to step 1 to try a new ID
a constraint violation on short_url_code: return an error to the user asking them to pick a different custom URL, as the short URL they provided has already been used
Perform the same step 5 as above
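The two flows above could be sketched roughly like this with PDO. This is an adaptation rather than a literal transcription: it assumes an auto-incrementing id (so the row is inserted first and the code is filled in afterwards), a hypothetical links table whose short_url_code column is UNIQUE and nullable, base 36 via base_convert() as the encoding, and PDO running in exception mode.

```php
<?php
// Assumed schema (hypothetical):
//   links(id INTEGER PRIMARY KEY AUTOINCREMENT,
//         short_url_code TEXT UNIQUE, destination_url TEXT NOT NULL)
// Assumes PDO::ERRMODE_EXCEPTION (the default since PHP 8).
function createLink(PDO $db, string $destinationUrl, ?string $customCode = null): ?string
{
    if ($customCode !== null) {
        try {
            $db->prepare('INSERT INTO links (short_url_code, destination_url) VALUES (?, ?)')
               ->execute([$customCode, $destinationUrl]);
            return $customCode;
        } catch (PDOException $e) {
            if ((string) $e->getCode() === '23000') {
                return null; // code already taken: ask the user for another one
            }
            throw $e;
        }
    }

    for ($attempt = 0; $attempt < 5; $attempt++) {
        // Let the database hand out the next ID.
        $db->prepare('INSERT INTO links (destination_url) VALUES (?)')
           ->execute([$destinationUrl]);
        $id = (int) $db->lastInsertId();
        $code = base_convert((string) $id, 10, 36);
        try {
            $db->prepare('UPDATE links SET short_url_code = ? WHERE id = ?')
               ->execute([$code, $id]);
            return $code;
        } catch (PDOException $e) {
            if ((string) $e->getCode() !== '23000') {
                throw $e;
            }
            // The generated code clashes with a custom one: drop the row, try a new ID.
            $db->prepare('DELETE FROM links WHERE id = ?')->execute([$id]);
        }
    }
    throw new RuntimeException('Could not allocate a short URL code.');
}
```

A null return means the custom name is taken. The retry loop only kicks in when someone has already claimed a custom name that happens to equal a generated code.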
base64 can be used to make short URLs, but it can also make the URL longer. For instance, the base64_encode of the number 1 is 'MQ==', which is 4 times the size. Base64 output is always padded with '=' to a multiple of 4 characters, which is not ideal for short URLs.
If size is the most important factor then you may be able to produce the shortest URLs by relying on internationalization.
This can make a URI rather long (up to 9 ASCII characters for a single Unicode character), but the intention is that browsers only need to display the decoded form, and many protocols can send UTF-8 without the %HH escaping.
Keep in mind that browsers work quite well with UTF-8, and Twitter will have no trouble with these URLs.
I've always wondered how and why they do this...an example: http://youtube.com/watch?v=DnAMjq0haic
How are these IDs generated such that there are no duplicates, and what advantage does this have over having a simple auto incrementing numeric ID?
How does one keep it short but still keep its uniqueness? The strings uniqid creates are pretty long.
Kevin van Zonneveld has written an excellent article including a PHP function to do exactly this. His approach is the best I've found while researching this topic.
His function is quite clever. It uses a fixed $index variable so problematic characters can be removed (vowels for instance, or to avoid O and 0 confusion). It also has an option to obfuscate ids so that they are not easily guessable.
Try this: http://php.net/manual/en/function.uniqid.php
uniqid — Generate a unique ID...
Gets a prefixed unique identifier based on the current time in microseconds.
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.
Warning
This function does not guarantee uniqueness of return value. Since most systems adjust system clock by NTP or like, system time is changed constantly. Therefore, it is possible that this function does not return unique ID for the process/thread. Use more_entropy to increase likelihood of uniqueness...
Base62 or Base64 encode your primary key's value, then store it in another field.
Example: Base62 for primary key 12443 = 3eH
This saves some space, which is why I'm sure YouTube is using it.
Doing a Base62 (A-Za-z0-9) encode on your PK or unique identifier avoids the overhead of having to check whether the key already exists :)
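For reference, a small Base62 encoder and decoder might look like the sketch below. The function names are mine, and the digit order 0-9, a-z, A-Z is an assumption, although it is the order under which 12443 comes out as 3eH.

```php
<?php
// Digit order assumed here: 0-9, then a-z, then A-Z.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function base62Encode(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported.');
    }
    $alphabet = BASE62_ALPHABET;
    $out = '';
    do {
        // Peel off the least significant base-62 digit and prepend it.
        $out = $alphabet[$n % 62] . $out;
        $n = intdiv($n, 62);
    } while ($n > 0);
    return $out;
}

function base62Decode(string $s): int
{
    if ($s === '') {
        throw new InvalidArgumentException('Empty string.');
    }
    $n = 0;
    foreach (str_split($s) as $ch) {
        $pos = strpos(BASE62_ALPHABET, $ch);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid Base62 character: $ch");
        }
        $n = $n * 62 + $pos;
    }
    return $n;
}

echo base62Encode(12443), PHP_EOL; // 3eH
echo base62Decode('3eH'), PHP_EOL; // 12443
```

Since every integer maps to exactly one string and back, the encoded value never needs a separate uniqueness check beyond the one the primary key already has.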
I had a similar issue - I had primary id's in the database, but I did not want to expose them to the user - it would've been much better to show some sort of a hash instead. So, I wrote hashids.
Documentation: http://www.hashids.org/php/
Source: https://github.com/ivanakimov/hashids.php
Hashes created with this class are unique and decryptable. You can provide a custom salt value, so others cannot decrypt your hashes (not that it's a big problem, but still a "good-to-have").
To encrypt a number you would do this:
require('lib/Hashids/Hashids.php');
$hashids = new Hashids\Hashids('this is my salt');
$hash = $hashids->encrypt(123);
Your $hash would now be: YDx
You can also set minimum hash length as the second parameter to the constructor so your hashes can be longer. Or if you have a complex clustered system you could even encrypt several numbers into one hash:
$hash = $hashids->encrypt(2, 456); /* aXupK */
(for example, if you have a user in cluster 2 and an object with primary id 456) Decryption works the same way:
$numbers = $hashids->decrypt('aXupK');
$numbers would then be: [2, 456].
The good thing about this is you don't even have to store these hashes in the database. You could get the hash from url once request comes in and decrypt it on the fly - and then pull by primary id's from the database (which is obviously an advantage in speed).
Same with output - you could encrypt the id's on the way out, and display the hash to the user.
EDIT:
Changed urls to include both doc website and code source
Changed example code to adjust to the main lib updates (current PHP lib version is 0.3.0 - thanks to all the open-source community for improving the lib)
Auto-incrementing can easily be crawled. These cannot be predicted, and therefore cannot be sequentially crawled.
I suggest going with a double-url format (Similar to the SO URLs):
yoursite.com/video_idkey/url_friendly_video_title
If you required both the id, and the title in the url, you could then use simple numbers like 0001, 0002, 0003, etc.
Generating these keys can be really simple. You could use the uniqid() function in PHP to generate 13 chars, or 23 with more entropy.
If you want short URLs and predictability is not a concern, you can convert the auto-incrementing ID to a higher base.
Here is a small function that generates a random key on each call. The chance of it repeating the same ID is very low.
function uniqueKey($limit = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $max = strlen($characters) - 1; // last valid index; strlen() itself is out of range
    $randstring = '';
    for ($i = 0; $i < $limit; $i++) {
        // random_int() (PHP 7+) is a cryptographically secure replacement for rand()
        $randstring .= $characters[random_int(0, $max)];
    }
    return $randstring;
}
source: generate random unique IDs like YouTube or TinyURL in PHP
Consider using something like:
$id = base64_encode(md5(uniqid(),true));
uniqid() gives you an identifier based on the current time. MD5 diffuses it into a 128-bit result. Base64 encoding yields 6 bits per character in an identifier suitable for use on the web, weighing in at 24 characters including the two trailing '=' padding characters. Because the input is derived from the clock, the result is only as hard to guess as the timestamp itself. If you want to be even more paranoid, upgrade from md5 to sha1 or higher.
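If guessability matters, one alternative is to skip uniqid() and MD5 altogether and encode random bytes directly. A rough sketch, with a made-up function name and the URL-safe Base64 variant as an assumption:

```php
<?php
// Hypothetical helper: URL-safe random identifier built from CSPRNG bytes.
function randomUrlId(int $bytes = 16): string
{
    $raw = random_bytes($bytes); // PHP 7+
    // base64url: swap '+' and '/' for '-' and '_', then drop the '=' padding.
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo randomUrlId(), PHP_EOL; // 22 characters for the default 16 bytes
```

This carries the same 128 bits as the MD5 version, but none of them come from the clock.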
A way to do it is by a hash function with unique input every time.
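Example (you've tagged the question with php, therefore):
$uniqueID = null;
do {
    // Strings are concatenated with "." in PHP; microtime() changes on every pass
    $uniqueID = sha1($fileName . microtime());
} while (!isUnique($uniqueID)); // isUnique() is your own lookup, e.g. a database check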
There should be a library for PHP to generate these IDs. If not, it's not difficult to implement it.
The advantage is that later you won't have name conflicts, when you try to reorganize or merge different server resources. With numeric ids you would have to change some of them to resolve conflicts and that will result in Url change leading to SEO hit.
So much of this depends on what you need to do. How 'unique' is unique? Are you serving up the unique IDs, and do they mean something in your DB? If so, a sequential number might be OK.
On the other hand, if you use sequential numbers, someone could systematically steal your content by iterating through the numbers.
There are filesystem commands that will generate unique file names - you could use those.
Or GUID's.
Results of hash functions like SHA-1 or MD5 and GUIDs tend to become very long, which is probably something you don't want. (You've specifically mentioned YouTube as an example: Their identifiers stay relatively short even with the bazillion videos they are hosting.)
This is why you might want to look into converting your numeric IDs, which you are using behind the scenes, into another base when putting them into URLs. Flickr, for example, uses Base58 for its canonical short URLs. Details about this are available here: http://www.flickr.com/groups/api/discuss/72157616713786392/. If you are looking for a generic solution, have a look at the PEAR package Math_Basex.
Please note that even in another base, the IDs can still be predicted from outside of your application.
I don't have a formula but we do this on a project that I'm on. (I can't share it). But we basically generate one character at a time and append the string.
Once we have a completed string, we check it against the database. If there is no other, we go with it. If it is a duplicate, we start the process over. Not very complicated.
The advantage is, I guess that of a GUID.
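The generate, check and start-over loop described above could look something like this sketch. The function name, the $isTaken callback and the attempt limit are assumptions of mine; the real lookup would be a query against your own table.

```php
<?php
// Hypothetical helper: $isTaken is any callable that reports whether an ID exists.
function generateUniqueId(callable $isTaken, int $length = 8, int $maxAttempts = 10): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $max = strlen($alphabet) - 1;
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $id = '';
        // Generate one character at a time and append it to the string.
        for ($i = 0; $i < $length; $i++) {
            $id .= $alphabet[random_int(0, $max)];
        }
        if (!$isTaken($id)) {
            return $id; // no duplicate found, go with it
        }
        // Duplicate: start the process over.
    }
    throw new RuntimeException('No free ID found; consider a longer length.');
}

// Usage with an in-memory stand-in for the database lookup.
$used = ['aaaaaaaa' => true];
$id = generateUniqueId(function (string $candidate) use ($used): bool {
    return isset($used[$candidate]);
});
echo $id, PHP_EOL;
```

Between the check and the insert another request could grab the same ID, so a UNIQUE constraint on the column is still worth having as a backstop.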
This is not PHP, but it can be converted to PHP. As it is JavaScript, it can also run client-side without slowing down the server, and you can post whatever needs a unique ID to your PHP script.
Here is a way to create unique IDs, limited to
9 007 199 254 740 992 unique IDs.
It always returns 9 characters,
where iE2XnNGpF is 9 007 199 254 740 992.
You can encode a large Number and then decode the generated 9-character String,
and it returns the number.
Basically, this function uses a base-62 index together with Math.log() and Math.pow() to find the right index for the number. I would explain more about the function, but I found it some time ago and can't find the site anymore, and it took me a very long time to understand how it works. Anyway, I rewrote the function from scratch, and this one is 2-3 times faster than the one I found.
I looped through 10 million numbers, checking that each number survives the encode/decode round trip; it took 33 seconds with this one and 90 seconds with the other.
var UID={
ix:'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
enc:function(N){
N<=9007199254740992||(alert('OMG no more uid\'s'));
var M=Math,F=M.floor,L=M.log,P=M.pow,r='',I=UID.ix,l=I.length,i;
for(i=F(L(N)/L(l));i>=0;i--){
r+=I.substr((F(N/P(l,i))%l),1)
};
return UID.rev(new Array(10-r.length).join('a')+r)
},
dec:function(S){
var S=UID.rev(S),r=0,i,l=S.length,I=UID.ix,j=I.length,P=Math.pow;
for(i=0;i<=(l-1);i++){r+=I.indexOf(S.substr(i,1))*P(j,(l-1-i))};
return r
},
rev:function(a){return a.split('').reverse().join('')}
};
As I wanted a 9-character string, I also padded the generated string with a's, which stand for 0's.
To encode a number you need to pass a Number and not a String.
var uniqueId=UID.enc(9007199254740992);
To decode the Number again you need to pass the generated 9-character String
var id=UID.dec(uniqueId);
Here are some numbers:
console.log(UID.enc(9007199254740992))// 9 quadrillion, i.e. 9 million billion
console.log(UID.enc(1)) //baaaaaaaa
console.log(UID.enc(10)) //kaaaaaaaa
console.log(UID.enc(100)) //Cbaaaaaaa
console.log(UID.enc(1000)) //iqaaaaaaa
console.log(UID.enc(10000)) //sBcaaaaaa
console.log(UID.enc(100000)) //Ua0aaaaaa
console.log(UID.enc(1000000)) //cjmeaaaaa
console.log(UID.enc(10000000)) //u2XFaaaaa
console.log(UID.enc(100000000)) //o9ALgaaaa
console.log(UID.enc(1000000000)) //qGTFfbaaa
console.log(UID.enc(10000000000)) //AOYKUkaaa
console.log(UID.enc(100000000000)) //OjO9jLbaa
console.log(UID.enc(1000000000000)) //eAfM7Braa
console.log(UID.enc(10000000000000)) //EOTK1dQca
console.log(UID.enc(100000000000000)) //2ka938y2a
As you can see, there are a lot of a's, and you don't want that, so just start with a high number.
Let's say your DB ID is 1: just add 100000000000000 so that you have 100000000000001,
and your unique ID looks like a YouTube ID: 3ka938y2a
I don't think it's easy to use up the other 8907199254740992 unique IDs.