I need to generate serial numbers using PHP in the following format: "ASDK3-JDAL9-24SFT-J5D8R-D4AL9". One requirement is that I need to somehow encode a timestamp and an email address inside this serial number and then retrieve them when needed. Is there any easy way to do this?
EDIT
To be more specific, the characters allowed in the serial number are 0-9 and A-Z. To make it more generic, I need to have 2 short strings encoded in that serial number, for example a date "04/03/2013" and a number "324", or an email address if possible. The strings don't need to be human readable in the serial number, but I need to be able to retrieve them when needed.
Let's do some simple math using base32 encode. You have 36 characters but we'll assume 32 because we're doing a rough estimate.
Base 32 adds 60% overhead. If you want to store a date of 8 characters and a number of 3 characters you'll need at least: ( 8 + 3 ) * 1.6 = 18 characters for this data. Your key is 25 characters long so you'll have 7 / 1.6 = 4 characters left for some randomness. If your random keys have 64 characters you'll have 64^4 = 16 million possibilities.
PHP doesn't have a native base 32 function available but you can write one yourself; the outline is the same as base64 except you take 5 bits at a time instead of 6.
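As a rough illustration of that outline, here is a minimal base 32 sketch. The names `base32Encode`, `base32Decode` and `B32_ALPHABET` are made up for this example, and the alphabet (0-9 followed by A-V) is just one choice that stays inside the allowed 0-9 and A-Z range. It treats the input as a bit string and emits one character per 5 bits; it does no input validation.

```php
<?php
// Hypothetical alphabet: 32 symbols drawn from the allowed 0-9 and A-Z.
const B32_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUV';

function base32Encode(string $data): string
{
    // Turn every byte into its 8-bit binary representation.
    $bits = '';
    foreach (str_split($data) as $ch) {
        $bits .= str_pad(decbin(ord($ch)), 8, '0', STR_PAD_LEFT);
    }
    // Emit one alphabet character per 5 bits; the last group is
    // right-padded with zero bits if it is short.
    $out = '';
    foreach (str_split($bits, 5) as $group) {
        $out .= B32_ALPHABET[bindec(str_pad($group, 5, '0'))];
    }
    return $out;
}

function base32Decode(string $text): string
{
    // Turn every character back into its 5-bit value.
    $bits = '';
    foreach (str_split($text) as $ch) {
        $bits .= str_pad(decbin(strpos(B32_ALPHABET, $ch)), 5, '0', STR_PAD_LEFT);
    }
    // Reassemble full bytes and drop the leftover padding bits.
    $out = '';
    foreach (str_split($bits, 8) as $byte) {
        if (strlen($byte) === 8) {
            $out .= chr(bindec($byte));
        }
    }
    return $out;
}
```

With this sketch the 11 bytes of "04032013324" (date plus number) encode to 18 characters, in line with the estimate above. The remaining 7 characters of the serial could then be filled with random characters before splitting the result into groups of five.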
Related
I'm looking for a way to convert an alphanumeric string, e.g. "aBcd3f", into a purely numeric representation, and get the shortest possible output string. The valid characters in the input string are a-z, A-Z, 0-9, and the resultant string would be comprised only of digits 0-9.
Since there are 62 valid values for each character in the input string, I can assign values 00-61 to each input character, and convert the 6 input characters into a 12 character numeric string.
But I would like to get something more compact, if possible - e.g. 8-10 digits. Is it possible, and if so, are there any algorithms or functions for doing this in PHP?
Note that this has to be a 2-way function. I also need to be able to go back from the numeric string to the alphanumeric.
I haven't found this question asked on this site. My question is the opposite of this question, as I'm trying to go in the opposite direction.
A decimal digit encodes log2(10) = 3.32 bits of information on average. Alphanumeric data has 62 possible "digits", so each one encodes log2(62) = 5.95 bits of information on average.
This means that converting from alphanumeric to decimal digits only will require approximately 5.95 / 3.32 = 1.79 times more characters in the output than there are in the input. If your output is constrained to 10 characters maximum you can expect it to encode at most 5.58 characters of alphanumeric input, which for practical purposes means just 5. There is no room for maneuvering here; this is cold math.
The manner of converting from one representation to the other is fairly straightforward, because in essence you are simply converting a number from base 62 to base 10 and back. You can tweak the code from this answer of mine only slightly to achieve the aim.
See it in action.
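The code from the linked answer is not reproduced here, so the following is only a sketch of the base 62 to base 10 conversion described above, not the original author's code. The function names and the digit order (0-9, then a-z, then A-Z, which makes "Z" the largest digit) are assumptions. It relies on native 64-bit integers, so it is only suitable for inputs of up to 10 characters, and it does not validate its input.

```php
<?php
// Assumed digit order: 0-9, a-z, A-Z, so 'Z' has the largest value (61).
const ALPHABET_62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Interpret the input as a base 62 number and return it in decimal.
function alnumToDecimal(string $input): string
{
    $value = 0;
    foreach (str_split($input) as $ch) {
        $value = $value * 62 + strpos(ALPHABET_62, $ch);
    }
    return (string) $value;
}

// Convert the decimal string back to base 62. The original length must be
// supplied because leading '0' characters vanish in the numeric form.
function decimalToAlnum(string $digits, int $length): string
{
    $value = (int) $digits;
    $out = '';
    while ($value > 0) {
        $out = ALPHABET_62[$value % 62] . $out;
        $value = intdiv($value, 62);
    }
    return str_pad($out, $length, '0', STR_PAD_LEFT);
}
```

Passing the length back in on decoding is one simple way to keep the mapping two-way; zero-padding the decimal output to a fixed width would be an equivalent choice.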
Note that with the (arbitrary) order of digits I picked the "largest" possible input with 5 characters is "ZZZZZ", which encodes to 9 decimal digits. If you expand the input to 6 characters the largest input would be "ZZZZZZ" which would need 11 decimal digits to encode -- more than the limit we imposed, as predicted.
Also note that this analysis assumes every possible input string is as likely to occur as any other, i.e. the input is perfectly random. If this is not the case then the actual information content of the input would be lower than the theoretical maximum and consequently you could take advantage of this with some kind of compression scheme.
I have a very large integer, 12-14 digits long, and I want to encrypt/compress it to an alphanumeric value so that the integer can be recovered later from the alphanumeric value. I tried to convert this integer using base 62 and map those values to a-zA-Z0-9, but the value generated from this is 7 characters long. This is still too long; I want to get it down to about 4-5 characters.
Is there a general way to do this or some method in which this can be done so that recovering the integer would still be possible? I am asking the mathematical aspects here but I would be programming this in PHP and I recently started programming in php.
Edit:
I was thinking in terms of assigning a masking bit and using this in a fashion to generate less number of Chars. I am aware of the fact that the range is not enough and that is the reason I was focusing on using a mathematical trick or a way of representation. The 62 base was an Idea that I already applied but is not working out.
14 digit decimal numbers can express 100,000,000,000,000 values (10^14).
5 characters of a 62 character alphabet can express 916,132,832 values (62^5).
You cannot cram the equivalent number of values of a 14 digit number into a 5 character base 62 string. It's simply not possible to express each possible value uniquely. See http://en.wikipedia.org/wiki/Pigeonhole_principle. Even base 64 with 7 characters is not enough (only 4,398,046,511,104 possible values). In fact, if you target a 5 character short string you'd need to compensate by using a base 631 alphabet (631^5 = 100,033,806,792,151).
Even compression doesn't help you. It would mean that two or more numbers would need to compress to the same compressed string (because there aren't enough possible unique compressed values), which logically means it's impossible to uncompress them into two different values.
To illustrate this very simply: Say my alphabet and target "string length" consists of one bit. That one bit can be 0 or 1. It can express 2 unique possible values. Say I have a compression algorithm which compresses anything and everything into this one bit. ... How could I possibly uncompress 100,000,000,000,000 unique values out of that one bit with two possible values? If you'd solve that problem, bandwidth and storage concerns would immediately evaporate and you'd be a billionaire.
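The counting argument can be checked with a few lines of PHP. The helper name `minLength` is invented for this sketch; it simply multiplies up the capacity of the alphabet until it covers the required number of values, and assumes 64-bit integers.

```php
<?php
// Smallest string length whose capacity (base^length) is at least $values.
function minLength(int $values, int $base): int
{
    $length = 0;
    $capacity = 1;
    while ($capacity < $values) {
        $capacity *= $base;
        $length++;
    }
    return $length;
}

echo minLength(10 ** 14, 62), PHP_EOL;   // characters needed in base 62
echo minLength(10 ** 14, 631), PHP_EOL;  // characters needed in base 631
```

For 10^14 values this reports 8 characters in base 62 and 5 characters in base 631, which agrees with the figures above.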
With 95 printable ASCII characters you can switch to base 95 encoding instead of 62:
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
That way an integer string of length X can be compressed into length Y base 95 string, where
Y = X * log 10/ log 95 = roughly X / 2
which is pretty good compression. So from length 12 you get down to about 6. If the purpose of compression is to save bandwidth when sending JSON, then base 92 can be a good choice (excluding ", \ and /, which become escaped in JSON).
Surely you can get better compression but the price to pay is a larger alphabet. Just replace 95 in the above formula by the number of symbols.
Unless of course, you know the structure of your integers. For instance, if they have plenty of zeroes, you can base your compression on this knowledge to get much better results.
Because of the pigeonhole principle, you will end up with some values that get compressed and other values that get expanded. It is simply impossible to create a compression algorithm that compresses every possible input string (i.e. in your case, every one of your numbers).
If you force the cardinality of the output set to be smaller than the cardinality of the input set, you'll get collisions (i.e. several input strings get "compressed" to the same output string). A compression algorithm should be reversible, right? :)
I am trying to resolve the following problem via PHP. The aim is to generate a unique 6-character string based on an integer seed and containing a predefined range of characters. The second requirement is that the string must appear random (so if code 1 were 100000, it is not acceptable for code 2 to be 100001, and 3 100002)
The range of characters is:
Uppercase A-Z excluding: B, I, O, S and Z
0-9 excluding: 0, 1, 2, 5, 8
So that would be a total of 26 characters if I am not mistaken. My first idea would be to encode from base 10 to base 24, starting at number 7962624. So do 7962624 + seed, and then base24 encode that number.
This gives me the characters 0-N. If I replace the resulting string in the following fashion, I then meet the first criteria:
B=P, I=Q, 0=R, 1=T, 2=U, 5=V, 8=W
So at this point, my codes will look something like this:
1=TRRRR, 2=TRRRT, 3=TRRRU
So my question to you gurus is: How can I make a method that behaves consistently (so the return string for a given integer is always the same) and meets the 2 requirements above? I have spent 2 full days on this now and short of dumping 700,000,000 codes into a database and retrieving them randomly I'm all out of ideas.
Stephen
You get a reasonably random looking sequence if you take your input sequence 1,2,3... and apply a linear map modulo a prime number. The number of unique codes is limited to the prime number so you should choose a large one. The resulting codes will be unique as long as you choose a multiplier that's not divisible by the prime.
Here's an example: With 6 characters you can make 26^6 = 308915776 unique strings, so a suitable prime number could be 308915753. This function therefore will generate over 300.000.000 unique codes:
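function encode($num) {
    // Linear map modulo a prime: distinct inputs below the prime give distinct outputs.
    $scrambled = (240049382*$num + 37043083) % 308915753;
    // Render the scrambled value in base 26 (digits 0-9 and a-p).
    return base_convert($scrambled, 10, 26);
}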
Make sure that you run this on 64bit PHP though, otherwise the multiplication will overflow. On 32bit you'll have to use bcmath. The codes generated for the numbers 1 through 9 are:
n89a2d
hdh4jo
biopb9
5o6k2k
3eek5
k8m9aj
ee4424
8jbojf
2ojjb0
All that's left is filling in the initial 0s that are sometimes missing and replacing the letters and numbers so that none of the forbidden characters are produced.
As you can see, there's no obvious pattern, but someone with some time on their hands, enough motivation and access to a few of these codes will be able to find out what's going on. A safer alternative is using an encryption algorithm with a small block size, such as Skip32.
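The remaining steps (padding, character replacement and decoding) could look roughly like the sketch below. The multiplier, offset and prime are the ones from the function above; everything else is an assumption made for illustration: the names `encodeCode`, `decodeCode` and `modInverse`, and the particular mapping of base 26 digits onto the 26 allowed characters. Decoding uses the modular inverse of the multiplier, computed with the extended Euclidean algorithm. 64-bit PHP is assumed.

```php
<?php
const PRIME = 308915753;
const MULTIPLIER = 240049382;
const OFFSET = 37043083;
// Hypothetical mapping of base 26 digits onto the allowed characters
// (A-Z without B, I, O, S, Z and 0-9 without 0, 1, 2, 5, 8).
const BASE26_DIGITS = '0123456789abcdefghijklmnop';
const ALLOWED_CHARS = '34679ACDEFGHJKLMNPQRTUVWXY';

function encodeCode(int $num): string
{
    $scrambled = (MULTIPLIER * $num + OFFSET) % PRIME;
    // Left-pad to 6 digits so every code has the same length.
    $base26 = str_pad(base_convert((string) $scrambled, 10, 26), 6, '0', STR_PAD_LEFT);
    return strtr($base26, BASE26_DIGITS, ALLOWED_CHARS);
}

// Modular inverse via the extended Euclidean algorithm.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    return (($oldS % $m) + $m) % $m;
}

function decodeCode(string $code): int
{
    $base26 = strtr($code, ALLOWED_CHARS, BASE26_DIGITS);
    $scrambled = (int) base_convert($base26, 26, 10);
    // Undo the offset, then undo the multiplication.
    $shifted = ($scrambled - OFFSET + PRIME) % PRIME;
    return (modInverse(MULTIPLIER, PRIME) * $shifted) % PRIME;
}
```

With this mapping the code for 1 is "WEFG6K" and the code for 5 (whose base 26 form is missing a leading zero) is "37LLTA". The fact that decoding is this easy also shows why the scheme is obfuscation rather than encryption.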
I need to group a number of parameters into a short, non-predictable, spellable code. Ex:
serial: WJ-JHA5JK7E9RTAS
date: 04/02/2013
days: 30
valid: true
Compressed code could look like this: 3xy9b0laiph3s
My goal is to make the code as short as possible (without losing any information, of course). The algorithm must be easily implemented in other languages as well (so it can't have crazy specific dependencies). Any thoughts?
Most of the time this is handled by storing the data somewhere and creating an ID which is then compressed and used. The most common users of this system are short URL sites.
1. Store the data in a DB and get the row ID.
2. Convert the base-10 row ID to a higher base such as 32 or 36 (base_convert in PHP handles bases 2 to 36; base 64 needs a custom function).
3. Use the new ID, which looks like '4F7c'.
4. When that ID is passed, convert it back to base 10 and look up the data in the DB.
Code:
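$id = 23590;
print $id . PHP_EOL;                 // 23590
$hash = base_convert($id, 10, 32);   // encode the row ID in base 32
print $hash . PHP_EOL;               // n16
$id = base_convert($hash, 32, 10);   // decode back to base 10 (returned as a string)
print $id . PHP_EOL;                 // 23590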
For arbitrary short strings there is not enough information to apply generalized predictive methods of compression.
You'll need to exploit the known features of your data.
Example:
Serial numbers appear to be capital letters and numbers - 36 values per character - and 15 characters long. That's 36^15 possible values which will fit in 78 bits.
Date can be converted into number of days since a fixed date. If all the dates are known to fall within 100 years of each other, this can be stored in 16 bits.
If days is never more than a year's worth, this can be stored in 9 bits.
Valid can be stored in 1 bit.
That's 104 bits in total (78 + 16 + 9 + 1), which can be Base64 encoded to 18 characters.
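A sketch of that bit budget in PHP follows. The function names and the field layout are assumptions for illustration. To stay within native integers, the 15-character serial (hyphen removed) is split into three 5-character chunks; each chunk is a base 36 number below 36^5 = 60,466,176, which fits in 26 bits, so three chunks give the same 78 bits. `$dayNumber` is assumed to be the number of days since some fixed reference date chosen by the application.

```php
<?php
// Pack serial (78 bits), day number (16), days (9) and valid flag (1)
// into 104 bits and return them as unpadded Base64.
function packLicense(string $serial, int $dayNumber, int $days, bool $valid): string
{
    $bits = '';
    foreach (str_split($serial, 5) as $chunk) {
        $value = (int) base_convert($chunk, 36, 10);
        $bits .= str_pad(decbin($value), 26, '0', STR_PAD_LEFT);
    }
    $bits .= str_pad(decbin($dayNumber), 16, '0', STR_PAD_LEFT);
    $bits .= str_pad(decbin($days), 9, '0', STR_PAD_LEFT);
    $bits .= $valid ? '1' : '0';

    $bytes = '';
    foreach (str_split($bits, 8) as $byte) {
        $bytes .= chr(bindec($byte));
    }
    return rtrim(base64_encode($bytes), '=');
}

function unpackLicense(string $code): array
{
    $bits = '';
    foreach (str_split(base64_decode($code)) as $ch) {
        $bits .= str_pad(decbin(ord($ch)), 8, '0', STR_PAD_LEFT);
    }
    $serial = '';
    for ($i = 0; $i < 3; $i++) {
        $value = bindec(substr($bits, $i * 26, 26));
        $chunk = strtoupper(base_convert((string) $value, 10, 36));
        $serial .= str_pad($chunk, 5, '0', STR_PAD_LEFT);
    }
    return [
        'serial'    => $serial,
        'dayNumber' => bindec(substr($bits, 78, 16)),
        'days'      => bindec(substr($bits, 94, 9)),
        'valid'     => $bits[103] === '1',
    ];
}
```

For example, `packLicense('WJJHA5JK7E9RTAS', 4840, 30, true)` yields an 18-character code, and `unpackLicense` recovers the four fields from it. The standard Base64 alphabet includes `+` and `/`, so a URL-safe or spellable alphabet would need an extra substitution step.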
Note that oftentimes serial numbers have a checksum character or two. If you know how the checksum is calculated, you can omit this character and recalculate it upon decoding. This could save you a Base64 digit here.
If you want to make the result less predictable without resorting to heavy encryption, you can just deterministically shuffle your encoded string.
UUencode or Base64 would work, but these encodings are case-sensitive. You could adapt them for your purposes (for example, lowercase letters only). If your data always has exactly the same size, this would be the easiest solution, though not the most compact one.
I want to create a way for images to generate a short link such as domain.com/9t6So63
I want the part after domain.com/ to be 8 characters long and to accept
0123456789abcdefghijklmnopqrstuvwxyz
How many total generated links could I get out of this? And should I make it longer?
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
This is a 62-character string.
If your problem is only to know the number of links you can generate, it's no more than a simple math problem:
62^8 = 218,340,105,584,896
how many total generated links could i get out of this?
You would get 62^8 combinations from this.
You could use uniqid() to generate the unique string. This string is generated based on a current time in microseconds so beware if you have multiple server instances generating the id at the same microsecond.
With an empty prefix, the returned string will be 13 characters long. If more_entropy is TRUE, it will be 23 characters.
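Since uniqid() returns 13 characters rather than the 8 asked for, one alternative (a different technique, not uniqid) is to draw each character at random from the allowed set. The sketch below uses PHP's `random_int`; the function name `randomCode` and its defaults are made up for this example. Random codes can still collide, so a uniqueness check against already issued codes (for example a unique index in the database) is assumed.

```php
<?php
// Build a code by picking $length characters at random from $alphabet.
function randomCode(
    int $length = 8,
    string $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'
): string {
    $code = '';
    $max = strlen($alphabet) - 1;
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

echo randomCode(), PHP_EOL;
```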
The number of links is determined by the number of available characters and the length of the string. You've got twenty-six letters and ten digits available for each position, and you're able to use each character more than once, so:
36^8 = 2821109907456
If you're able to use upper-case characters as well, then you've got 62 available characters for each of the eight positions, which gives 62^8 (218340105584896) possible combinations.
This is not just a PHP problem; it is a combinatorics problem.
The number of possible "links" depends on whether or not characters from the chosen set may be repeated.
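Both cases are easy to compute. In this sketch the function names are invented for illustration: with repetition the count is n^k, and without repetition it is n * (n - 1) * ... * (n - k + 1). 64-bit integers are assumed.

```php
<?php
// Characters may repeat: n^k.
function countWithRepetition(int $alphabetSize, int $length): int
{
    return $alphabetSize ** $length;
}

// Each character used at most once: n * (n-1) * ... * (n-k+1).
function countWithoutRepetition(int $alphabetSize, int $length): int
{
    $count = 1;
    for ($i = 0; $i < $length; $i++) {
        $count *= $alphabetSize - $i;
    }
    return $count;
}

echo countWithRepetition(36, 8), PHP_EOL;     // 2821109907456
echo countWithRepetition(62, 8), PHP_EOL;     // 218340105584896
echo countWithoutRepetition(36, 8), PHP_EOL;  // 1220096908800
```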