Encoding numbers with base62 and base36 - php

Currently I have some code that converts numbers into base62 format.
This works fine; however, when putting the data into the database, it's possible to end up with two base62 strings that are identical apart from case,
e.g. SsUTF1 and Ssutf1.
To get around this issue, is base36 a viable alternative to base62? My very limited understanding is that base36 won't produce strings that differ only by case, but on the flip side, I am assuming the encoded string for bigger numbers can be longer in base36 than in base62?
If I already have base62 strings in the database, is it possible to end up with duplicates after switching to base36, given that the numbers the strings are derived from are never going to be the same?

Given that the numbers are different, their base36 representations will always be different. The same is true of base62 if you compare the strings using a case-sensitive or binary comparison.
Also, the base36 representation of a number can be longer than its base62 representation. Say we have 10 positions; that means we could represent:
base36: 36^10 = 3 656 158 440 062 976 possibilities
base62: 62^10 = 839 299 365 868 340 224 possibilities
Hope this helps.
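A minimal sketch of the length trade-off, assuming the digit order 0-9, a-z, A-Z (the question does not show its alphabet) and a hypothetical helper named encode_int. It also shows one thing to watch for when old base62 strings and new base36 strings share a column: the same text can come from different numbers under the two encodings, because the text alone does not say which base produced it.

```php
<?php
// Assumed digit order: 0-9, a-z, A-Z. The real code may use a different one.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Hypothetical helper: encode a non-negative integer using the first $base digits.
function encode_int(int $n, int $base): string
{
    if ($n < 0 || $base < 2 || $base > 62) {
        throw new InvalidArgumentException('need $n >= 0 and 2 <= $base <= 62');
    }
    $out = '';
    do {
        $out = BASE62_ALPHABET[$n % $base] . $out;
        $n = intdiv($n, $base);
    } while ($n > 0);
    return $out;
}

$n = 123456789012;
echo strlen(encode_int($n, 62)), "\n"; // 7 characters in base62
echo strlen(encode_int($n, 36)), "\n"; // 8 characters in base36

// For bases up to 36, the built-in base_convert() gives the same lowercase text.
var_dump(encode_int($n, 36) === base_convert((string) $n, 10, 36)); // bool(true)

// Different numbers, same text, when the two encodings are mixed:
echo encode_int(62, 62), "\n"; // 10
echo encode_int(36, 36), "\n"; // 10
```

If the existing base62 rows stay in the table, storing which encoding each row uses (or re-encoding the old rows) would avoid that ambiguity.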

Related

PhpExcel how to turn off the default conversion of numbers to scientific notation

How can I turn off the default conversion of numbers to scientific notation? When importing from an Excel file, large numbers are automatically converted to scientific notation (3.5868405364945E+14 when it should be 358684053649447). Is there any option to turn off this conversion in PhpExcel?
Or can the conversion be reversed from PHP? When I try to use printf,
printf("%d", "3.5868405364945E+14"); // 358684053649450 wrong value
final number is inaccurate.
Sorry, you'll never get the full value again. It has already been rounded, because your number has 16 digits and 15 digits is the limit for numbers in Excel.
It happens at the entry point, when you enter a number that exceeds 15 digits. Excel will round it, modifying your entry forever.
It's similar to storing a decimal number like 1.2 as an integer: you'll lose that 0.2, and no matter what you do, it will be 1 forever.
The only solution for this (too late in your case) is storing the large number as text in the first place, by adding a single quote before the number: '358684053649447 instead of 358684053649447. Excel will interpret that as a string, not as a number, and you'll be able to save numbers longer than 15 digits.
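A small illustration of why the rounded value cannot be turned back into the original digits from PHP. It uses only the two values quoted in the question; nothing here is PhpExcel API.

```php
<?php
// The string PHP received has already been rounded to 14 significant digits.
$rounded = '3.5868405364945E+14';

// Printing the float in full shows the last digits are 450, not 447.
$digits = sprintf('%.0f', (float) $rounded);
echo $digits, "\n"; // 358684053649450

// A value kept as text from the start is not touched by numeric rounding.
$asText = '358684053649447';
var_dump($digits === $asText); // bool(false)
```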

Convert IEEE 754 to decimal floating point

I have what I think is an IEEE 754 value with single or double precision (not sure) and I'd like to convert it to decimal in PHP.
Given 4 hex values (which might be in little-endian format, so basically in reversed order) 4A,5B,1B,05, I need to convert them to a decimal value which I know will be very close to 4724.50073.
I've tried some online converters but they are far from the expected result so I'm clearly missing something.
If I echo 0x4A; I get 74 and the others are 91, 27 and 5. Not sure where to take it from here...
To convert it to float, use unpack. If the byte order is incorrect, you'll have to reverse it yourself before unpacking. 4 bytes (32 bits) usually means it's a float, 8 for double.
$bin = "\x4A\x5B\x1B\x05";
$a = unpack('f', strrev($bin));
echo $a[1]; // 3589825.25
I don't see any way this maps to 4724.50073 directly, though. Without more test data or the manufacturer's manual, this question is not fully answerable.
Speculation: judging from the size of the coordinate it's probably some sort of projection (XYZ or mercator) which can then be converted to WGS84 or whatever you need. Unfortunately there's no way to check since you haven't provided both latitude and longitude.
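A sketch that avoids reversing the bytes by hand, assuming PHP 7.0.15 / 7.1.1 or later, where unpack() has the 'G' (big-endian float) and 'g' (little-endian float) format codes. It also prints the 32-bit integer readings, in case the device sends a scaled integer rather than a float; that part is pure speculation.

```php
<?php
// The four bytes from the question, in the order given.
$bin = "\x4A\x5B\x1B\x05";

// 'G' = 32-bit float, big-endian; 'g' = 32-bit float, little-endian.
$bigEndian    = unpack('G', $bin)[1];
$littleEndian = unpack('g', $bin)[1];

printf("float, big-endian:    %.2f\n", $bigEndian); // 3589825.25
printf("float, little-endian: %e\n", $littleEndian);

// 'N' = unsigned 32-bit integer, big-endian; 'V' = the same, little-endian.
$intBig    = unpack('N', $bin)[1];
$intLittle = unpack('V', $bin)[1];

printf("int, big-endian:      %d\n", $intBig);
printf("int, little-endian:   %d\n", $intLittle);
```

None of these four readings is 4724.50073, which is consistent with the point above that more sample data is needed.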

Representing alphanumeric string as a shortest possible numeric string

I'm looking for a way to convert an alphanumeric string, e.g. "aBcd3f", into a purely numeric representation, and get the shortest possible output string. The valid characters in the input string are a-z, A-Z, 0-9, and the resultant string would consist only of the digits 0-9.
Since there are 62 valid values for each character in the input string, I can assign values 00-61 to each input character, and convert the 6 input characters into a 12-character numeric string.
But I would like to get something more compact, if possible - e.g. 8-10 digits. Is it possible, and if so, are there any algorithms or functions for doing this in PHP?
Note that this has to be a 2-way function. I also need to be able to go back from the numeric string to the alphanumeric.
I haven't found this question asked on this site. My question is the opposite of this question, as I'm trying to go in the opposite direction.
A decimal digit encodes log2(10) = 3.32 bits of information on average. Alphanumeric data has 62 possible "digits", so each one encodes log2(62) = 5.95 bits of information on average.
This means that converting from alphanumeric to decimal digits only will require approximately 5.95 / 3.32 = 1.79 times more characters in the output than there are in the input. If your output is constrained to 10 characters maximum you can expect it to encode at most 5.58 characters of alphanumeric input, which for practical purposes means just 5. There is no room for maneuvering here; this is cold math.
The manner of converting from one representation to the other is fairly straightforward, because in essence you are simply converting a number from base 62 to base 10 and back. You can tweak the code from this answer of mine only slightly to achieve the aim.
See it in action.
Note that with the (arbitrary) order of digits I picked the "largest" possible input with 5 characters is "ZZZZZ", which encodes to 9 decimal digits. If you expand the input to 6 characters the largest input would be "ZZZZZZ" which would need 11 decimal digits to encode -- more than the limit we imposed, as predicted.
Also note that this analysis assumes every possible input string is as likely to occur as any other, i.e. the input is perfectly random. If this is not the case then the actual information content of the input would be lower than the theoretical maximum and consequently you could take advantage of this with some kind of compression scheme.
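Since the code linked from the answer is not reproduced here, the following is a minimal stand-in sketch, not the answerer's code. Assumptions: the digit order is 0-9, a-z, A-Z (so "Z" is the largest digit), both function names are hypothetical, and plain integers are used, which limits the input to 10 characters on 64-bit PHP. The decoder takes the expected length so that leading "0" characters, which vanish in the number, can be restored.

```php
<?php
// Assumed digit order; any fixed order of the 62 characters would work.
const ALNUM_DIGITS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Read the string as a base-62 number and return it in decimal.
function alnum_to_decimal(string $s): string
{
    $n = 0;
    for ($i = 0; $i < strlen($s); $i++) {
        $pos = strpos(ALNUM_DIGITS, $s[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException("invalid character: {$s[$i]}");
        }
        $n = $n * 62 + $pos;
    }
    return (string) $n;
}

// Convert back, left-padding with the zero digit up to $length characters.
function decimal_to_alnum(string $digits, int $length): string
{
    $n = (int) $digits;
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out = ALNUM_DIGITS[$n % 62] . $out;
        $n = intdiv($n, 62);
    }
    return $out;
}

echo alnum_to_decimal('ZZZZZ'), "\n";          // 916132831 (9 digits)
echo strlen(alnum_to_decimal('ZZZZZZ')), "\n"; // 11
echo decimal_to_alnum(alnum_to_decimal('aBcd3f'), 6), "\n"; // aBcd3f
```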

Encoding/Compressing a large integer into alphanumeric value

I have a very large integer 12-14 digits long and I want to encrypt/compress this to an alphanumeric value so that the integer can be recovered later from the alphanumeric value. I tried to convert this integer using a 62 base and tried to map those values to a-zA-Z0-9, but the value generated from this is 7 characters long. This length is still long enough and I want to convert to about 4-5 characters.
Is there a general way to do this or some method in which this can be done so that recovering the integer would still be possible? I am asking the mathematical aspects here but I would be programming this in PHP and I recently started programming in php.
Edit:
I was thinking in terms of assigning a masking bit and using this in a fashion to generate less number of Chars. I am aware of the fact that the range is not enough and that is the reason I was focusing on using a mathematical trick or a way of representation. The 62 base was an Idea that I already applied but is not working out.
14 digit decimal numbers can express 100,000,000,000,000 values (10^14).
5 characters of a 62 character alphabet can express 916,132,832 values (62^5).
You cannot cram the equivalent number of values of a 14 digit number into a 5 character base 62 string. It's simply not possible to express each possible value uniquely. See http://en.wikipedia.org/wiki/Pigeonhole_principle. Even base 64 with 7 characters is not enough (only 4,398,046,511,104 possible values). In fact, if you target a 5 character short string you'd need to compensate by using a base 631 alphabet (631^5 = 100,033,806,792,151).
Even compression doesn't help you. It would mean that two or more numbers would need to compress to the same compressed string (because there aren't enough possible unique compressed values), which logically means it's impossible to uncompress them into two different values.
To illustrate this very simply: Say my alphabet and target "string length" consists of one bit. That one bit can be 0 or 1. It can express 2 unique possible values. Say I have a compression algorithm which compresses anything and everything into this one bit. ... How could I possibly uncompress 100,000,000,000,000 unique values out of that one bit with two possible values? If you'd solve that problem, bandwidth and storage concerns would immediately evaporate and you'd be a billionaire.
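The counting argument can be checked with a few lines of integer arithmetic. This is only a sketch with hypothetical function names, assuming 64-bit PHP so that the intermediate powers stay within integer range.

```php
<?php
// Smallest string length whose capacity (base^length) covers $values distinct values.
function min_length(int $values, int $base): int
{
    $length = 0;
    $capacity = 1;
    while ($capacity < $values) {
        $capacity *= $base;
        $length++;
    }
    return $length;
}

// Smallest alphabet size that covers $values distinct values in $length characters.
function min_base(int $values, int $length): int
{
    $base = 2;
    while ($base ** $length < $values) {
        $base++;
    }
    return $base;
}

$values = 10 ** 14; // every 14-digit decimal number

echo min_length($values, 62), "\n"; // 8
echo min_length($values, 64), "\n"; // 8
echo min_base($values, 5), "\n";    // 631
```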
With 95 printable ASCII characters you can switch to base 95 encoding instead of 62:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
That way an integer string of length X can be compressed into length Y base 95 string, where
Y = X * log(10) / log(95) = roughly X / 2
which is pretty good compression. From length 12 you get down to 6 or 7 (95^6 = 735,091,890,625 is slightly less than 10^12, so the largest 12-digit values need a seventh character). If the purpose of compression is to save bandwidth when using JSON, then base 92 can be a good choice (excluding ", \ and /, which may be escaped in JSON).
Surely you can get better compression but the price to pay is a larger alphabet. Just replace 95 in the above formula by the number of symbols.
Unless of course, you know the structure of your integers. For instance, if they have plenty of zeroes, you can base your compression on this knowledge to get much better results.
Because of the pigeonhole principle, you will end up with some values that get compressed and other values that get expanded. It is simply impossible to create a compression algorithm that compresses every possible input string (i.e. in your case, your numbers).
If you force the cardinality of the output set to be smaller than the cardinality of the input set you'll get collisions (i.e. more input strings get "compressed" to the same compressed binary string). A compression algorithm should be reversible, right? :)

How to encode strings inside serial number using PHP?

I need to generate serial numbers using PHP in the following format "ASDK3-JDAL9-24SFT-J5D8R-D4AL9". One requirement is the fact that I need to encode somehow a timestamp and an email address inside this serial number and then retrieve them when needed. Is there any easy way to do this?
EDIT
To be more specific, the allowed characters in the serial number are 0-9 and A-Z. To make it more generic, I need to have two short strings encoded in that serial number, for example a date "04/03/2013" and a number "324", or an email address if possible. The strings don't need to be human readable in the serial number, but I need to be able to retrieve them when needed.
Let's do some simple math using base32 encode. You have 36 characters but we'll assume 32 because we're doing a rough estimate.
Base 32 adds 60% overhead. If you want to store a date of 8 characters and a number of 3 characters you'll need at least: ( 8 + 3 ) * 1.6 = 18 characters for this data. Your key is 25 characters long so you'll have 7 / 1.6 = 4 characters left for some randomness. If your random keys have 64 characters you'll have 64^4 = 16 million possibilities.
PHP doesn't have a native base 32 function available, but you can write one yourself. The outline is the same as base64, except that you take 5 bits at a time instead of 6.
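A minimal sketch of such a function pair, assuming the RFC 4648 alphabet (A-Z and 2-7, which fits inside the 0-9/A-Z set the question allows) and no padding characters. The function names are hypothetical, and the date is assumed to be stored compactly as 8 digits, as in the estimate above.

```php
<?php
// Assumption: RFC 4648 base32 alphabet, without "=" padding.
const B32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

function base32_encode_sketch(string $data): string
{
    // Expand every byte to its 8-bit binary form.
    $bits = '';
    for ($i = 0; $i < strlen($data); $i++) {
        $bits .= str_pad(decbin(ord($data[$i])), 8, '0', STR_PAD_LEFT);
    }
    $out = '';
    for ($i = 0; $i < strlen($bits); $i += 5) {
        // Take 5 bits at a time; the last group is right-padded with zeros.
        $chunk = str_pad(substr($bits, $i, 5), 5, '0', STR_PAD_RIGHT);
        $out .= B32_ALPHABET[bindec($chunk)];
    }
    return $out;
}

function base32_decode_sketch(string $text): string
{
    $bits = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $pos = strpos(B32_ALPHABET, $text[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException("invalid character: {$text[$i]}");
        }
        $bits .= str_pad(decbin($pos), 5, '0', STR_PAD_LEFT);
    }
    $out = '';
    // Only whole bytes are emitted; leftover padding bits are dropped.
    for ($i = 0; $i + 8 <= strlen($bits); $i += 8) {
        $out .= chr(bindec(substr($bits, $i, 8)));
    }
    return $out;
}

$payload = '04032013' . '324';             // 8-digit date plus 3-digit number
$encoded = base32_encode_sketch($payload);
echo strlen($encoded), "\n";               // 18
echo base32_decode_sketch($encoded), "\n"; // 04032013324
```

The 18 characters match the estimate above, leaving 7 of the 25 serial characters free. Note that this only encodes the data; anyone who knows the scheme can read it back.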
