PHP URL Shortening Algorithm - php

Could anyone recommend a preferred algorithm to use for URL shortening? I'm coding using PHP. Initially I thought about writing something that would start at a character such as "a" and iterate through requests, creating records in a database and therefore having to increment the character to b, c, d ... A, B and so on as appropriate.
However it dawned on me that this algorithm could be pretty heavy/clumsy and there could be a better way to do it.
I read around a bit on Google and some people seem to be doing it with base conversion from the database's ID column. This isn't something I'm too familiar with.
Could someone elaborate and explain to me how this would work? A couple of code examples would be great, too.
I obviously don't want a complete solution as I would like to learn by doing it myself, but just an explanation/pseudo-code on how this would work would be excellent.

Most shortening services just use a counter that is incremented with every entry and convert the base from 10 to 64.
An implementation in PHP could look like this:
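function encode($number) {
    // 'N' packs an unsigned 32-bit big-endian integer, so the result does not depend on the platform
    return strtr(rtrim(base64_encode(pack('N', $number)), '='), '+/', '-_');
}
function decode($base64) {
    // pad back to a multiple of 4 characters before decoding
    $padded = str_pad(strtr($base64, '-_', '+/'), strlen($base64) + (4 - strlen($base64) % 4) % 4, '=');
    $number = unpack('N', base64_decode($padded));
    return $number[1];
}
$number = mt_rand(0, 0x7FFFFFFF); // stay within the 32 bits that pack('N') can hold
var_dump(decode(encode($number)) === $number);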
The encode function takes an integer, converts it into bytes (pack), encodes it with Base-64 (base64_encode), trims the trailing padding = (rtrim), and replaces the characters + and / with - and _ respectively (strtr). The decode function is the inverse of encode and performs the same steps in reverse, restoring the trailing padding before decoding.
The additional use of strtr translates the original Base-64 alphabet to the URL- and filename-safe alphabet, since + and / would otherwise need to be percent-encoded in a URL.

You can use the base_convert function to do a base conversion from 10 to 36 on the database IDs.
<?php
$id = 315;
echo base_convert($id, 10, 36), "\n";
?>
Or you can reuse some of the ideas presented in the comments on the page below:
http://php.net/manual/en/function.base-convert.php
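As a small sketch of the full round trip (the helper names idToSlug and slugToId are made up for this example; only base_convert() itself comes from PHP), the ID can be turned into a base-36 slug and back like this:

```php
<?php
// Hypothetical helper names; only base_convert() itself comes from PHP.

// Turn a numeric database ID into a base-36 slug (digits 0-9, then a-z).
function idToSlug(int $id): string {
    return base_convert((string) $id, 10, 36);
}

// Turn the slug back into the numeric ID.
function slugToId(string $slug): int {
    return (int) base_convert($slug, 36, 10);
}

$slug = idToSlug(315);
echo $slug, "\n";           // 8r
echo slugToId($slug), "\n"; // 315
```

The lookup side of a shortener would call slugToId() on the path segment and fetch the row with that ID.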

Assuming your PRIMARY KEY is an INT and it auto_increments, the following code will get you going =).
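<?php
// $mysqli is assumed to be an open mysqli connection; the old mysql_* functions were removed in PHP 7
$inSQL = "INSERT INTO short_urls() VALUES();";
$inResult = $mysqli->query($inSQL);
$databaseID = base_convert($mysqli->insert_id, 10, 36);
// $databaseID is now your short URL
?>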
EDIT: Included the base_convert from HGF's answer. I forgot to base_convert in the original post.

I used to break down the ID with an algorithm similar to converting from decimal to hex, but using 62 characters instead of the 16 that hex uses:
'0','1','2','3','4','5','6','7','8','9',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
Example: converting ID = 1234567890 gives kv7yl1 as the key (the base-62 digits written least significant first).
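A minimal sketch of that conversion, assuming the alphabet order listed above (digits, then lowercase, then uppercase). The function and constant names are made up for this example. It writes the most significant digit first, so the example key above is the reverse of what it returns:

```php
<?php
// Alphabet in the order listed above: digits, lowercase, uppercase.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encode a non-negative integer, most significant digit first.
function base62Encode(int $id): string {
    if ($id === 0) {
        return '0';
    }
    $out = '';
    while ($id > 0) {
        $out = BASE62_ALPHABET[$id % 62] . $out; // prepend the next digit
        $id = intdiv($id, 62);
    }
    return $out;
}

// Decode a string produced by base62Encode() (assumes only valid characters).
function base62Decode(string $key): int {
    $id = 0;
    foreach (str_split($key) as $char) {
        $id = $id * 62 + strpos(BASE62_ALPHABET, $char);
    }
    return $id;
}

echo base62Encode(1234567890), "\n";         // 1ly7vk
echo strrev(base62Encode(1234567890)), "\n"; // kv7yl1
```

This is the same repeated divide-and-take-the-remainder loop used for decimal to hex, just with 62 as the divisor.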

I adopted a "light" solution. On user request I generate a unique identifier (checking for conflicts in the db) with this Python snippet:
url_hash = base64.b64encode(os.urandom(int(math.ceil(0.75*7))))[:6]
and store it in db.
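Since the question is about PHP, here is a rough PHP equivalent of the same idea. It is only a sketch: the function names are made up, the $taken array stands in for the database lookup, and it swaps in the URL-safe Base64 alphabet (- and _ instead of + and /), which the Python line does not do.

```php
<?php
// $taken stands in for the database table of identifiers already in use (assumption).
function randomSlug(int $length = 6): string {
    // Base64 yields 4 characters per 3 bytes, so this many bytes is always enough.
    $bytes = random_bytes((int) ceil($length * 0.75));
    $slug = rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
    return substr($slug, 0, $length);
}

function uniqueSlug(array &$taken, int $length = 6): string {
    do {
        $slug = randomSlug($length);
    } while (isset($taken[$slug])); // retry on a conflict
    $taken[$slug] = true;
    return $slug;
}

$taken = [];
echo uniqueSlug($taken), "\n";
```

With a real database, a UNIQUE index on the slug column would take the place of the isset() check.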

The native PHP base_convert() works well for small ranges of numbers, but if you really need to encode large values, consider using something like the implementation provided here which will work to base 64 and beyond if you simply provide more legal characters for the encoding.
http://af-design.com/blog/2010/08/10/working-with-big-integers-in-php/

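Try this method:
hash_hmac('joaat', "http://www.example.com/long/url/", "secretkey");
It produces an 8-character hexadecimal hash suitable for a URL shortener, e.g. '142ecd53'. Note that since PHP 7.2 hash_hmac() no longer accepts non-cryptographic algorithms such as joaat; on current versions, hash('joaat', $url) gives an unkeyed hash of the same length.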

Related

PHP int to longer string for use with qrcode

I'm trying to build an app that identifies a user by scanning a QR code. For this, I want to use the primary key as the identifier. Since the integer is only a few characters long, the resulting QR code wouldn't look good.
So my question is: is it possible to convert the int to a string longer than 10-12 characters (fixed length if possible), a mix of letters and numbers, that can be reversed to the original integer?
What you can do is compute the SHA-256 of the user's ID and encode that as the QR code.
Then, when the user scans the QR code and sends you the SHA value, you match it against the SHA of the user IDs in the database.
Here is how to get the SHA hash from a user ID:
$hash = hash('sha256', $userId); // The result is a long enough string for a QR code
Then, when you need to find a user based on the SHA, do the following:
select * from users where SHA2(id, 256) = 'SHA_PROVIDED_BY_USER';
To speed up the lookup, you can also store the SHA in the database; the query will then be much faster.
Another option is to prepend some letters to the number. This gives you a random-looking string and nice QR codes, and you can extract the numeric ID with a simple regexp.
Using the function from "PHP random string generator" (don't forget to remove the digits from $characters), the code could be:
//encoding
$size = 12;
$str = generateRandomString($size-strlen($userId)).$userId;
//decoding
preg_match('/(\d+)$/', $str, $matching);
$userId = $matching[1];
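A self-contained version of the snippet above might look like this. The generateRandomString() here is a hypothetical stand-in for the linked function, restricted to letters, and encodeUserId/decodeUserId are names invented for the sketch:

```php
<?php
// Hypothetical stand-in for the linked generator, restricted to letters
// so that the trailing digits are always the ID.
function generateRandomString(int $length): string {
    $characters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $characters[random_int(0, strlen($characters) - 1)];
    }
    return $out;
}

function encodeUserId(int $userId, int $size = 12): string {
    $digits = (string) $userId;
    // max() guards against IDs that are already $size digits or longer.
    return generateRandomString(max(1, $size - strlen($digits))) . $digits;
}

function decodeUserId(string $code): int {
    preg_match('/(\d+)$/', $code, $matching);
    return (int) $matching[1];
}

$code = encodeUserId(4711);
echo $code, ' -> ', decodeUserId($code), "\n";
```

Keep in mind that the ID is still readable at the end of the string; the letters only pad it out.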
You can convert your integer to any base between 2 and 36 with the base_convert function.
Here is the documentation:
http://php.net/manual/en/function.base-convert.php
The notion that a number, in PHP, has a "maximum size" is a little off (not wrong, just off =P)
From the manual:
If PHP encounters a number beyond the bounds of the integer type, it will be interpreted as a float instead.
So, you could use really large numbers for your QR Codes if you want. Shouldn't be an issue. However, what would be better is to think of "what exactly do you need"?
If you need a numeric value, but want it in hex, you can use base_convert() to go back and forth between the numbers:
$val = 1234;
$hex = base_convert($val, 10, 16);
However, if strings are more for you, you could use base64_encode() to encode it:
$val = 'awesome string value';
$encoded = base64_encode($val);
UPDATE
Based on comments, it sounds like you also want to pad the string if it's too short. You can use str_pad() to accomplish this:
$val = str_pad("1", 10, "0", STR_PAD_LEFT);
echo $val;
// displays: 0000000001
$orig = intval($val);
echo $orig;
// displays: 1
Coderpad Example of str_pad()
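Putting base_convert() and str_pad() together gives a fixed-length code that mixes letters and digits and can still be reversed. This is only a sketch with made-up helper names. The uppercase step is my own choice, made because QR alphanumeric mode only covers uppercase letters:

```php
<?php
// Hypothetical helpers: a fixed-width, reversible base-36 code for an integer ID.
function idToCode(int $id, int $width = 12): string {
    $base36 = base_convert((string) $id, 10, 36);
    return strtoupper(str_pad($base36, $width, '0', STR_PAD_LEFT));
}

function codeToId(string $code): int {
    // Leading zeros do not change the value, so no trimming is needed.
    return (int) base_convert(strtolower($code), 36, 10);
}

$code = idToCode(1234);
echo $code, "\n";           // 0000000000YA
echo codeToId($code), "\n"; // 1234
```

Small IDs still show a long run of zeros, so combine this with some obfuscation if the look of the code matters.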

php encode string and vice-versa

I have some entities (objects), each one having a unique id and a name.
When I want to display one of these, I have a URL like www.domain.com/view/key:xxx.
The key is just the id of the entity encoded with base64_encode, so it's not straightforward from the URL what the id is.
What I'm trying to do now (due to the project's specifications) is have the key contain only numbers and letters (base64_encode provides a result like eyJpZCI6IjM2In0= or eyJpZCI6IjM2In0%3D after URL encoding).
Is there a simple alternative to this? It's not a high-security issue (there are many ways the id can be revealed); I just need a key containing only letters and numbers that is produced from the entity ID (maybe in combination with its name) and can be decoded to give me the ID back.
All the different encoding methods I've found can produce special characters as well.
Any help here?
Thanks in advance
This answer doesn't really apply encryption, but since your question was tagged with encoding as well ...
You can use bin2hex (its inverse, hex2bin, requires PHP 5.4 or later):
$s = base64_decode('eyJpZCI6IjM2In0=');
echo bin2hex($s);
Output:
7b226964223a223336227d
To decode:
$s = hex2bin($data);
Or:
$s = pack('H*', $data);
Btw, if the id is sensitive you might want to consider tamper proofing it as an alternative to full-blown encryption.
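One way to read "tamper proofing" is to append a short keyed hash (HMAC) of the id to the key. The sketch below is an assumption about what that could look like, not a description of any particular library: the function names, the 'x' separator and the 10-character tag length are all made up. The result contains only letters and digits.

```php
<?php
// Assumption: $secret is a server-side key; the 10-character tag length is arbitrary.
function signId(int $id, string $secret): string {
    $tag = substr(hash_hmac('sha256', (string) $id, $secret), 0, 10);
    return $id . 'x' . $tag; // the id, an "x", then 10 hex characters
}

// Returns the ID, or null if the key was altered.
function verifyKey(string $key, string $secret): ?int {
    if (!preg_match('/^(\d+)x([0-9a-f]{10})$/', $key, $m)) {
        return null;
    }
    $expected = substr(hash_hmac('sha256', $m[1], $secret), 0, 10);
    return hash_equals($expected, $m[2]) ? (int) $m[1] : null;
}

$key = signId(36, 'my secret');
var_dump(verifyKey($key, 'my secret')); // int(36)
```

The id stays visible in the key; the tag only makes it hard to change the id without being noticed.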
Forgot to mention how you can make base64 encoded data URL safe:
function base64_url_encode($input)
{
return strtr(base64_encode($input), '+/=', '-_,');
}
function base64_url_decode($input)
{
return base64_decode(strtr($input, '-_,', '+/='));
}
There are many PHP encoding/decoding functions.
You can find a lot here and here.
Alternatively just get rid of the = at the end of the base64_encode and add it in the PHP code for base64_decode to find the ID.

PHP short encrypt

I'm using this code:
$url = "http://www.webtoolkit.info/javascript-base64.html";
print base64_encode($url);
But the result is very long: "aHR0cDovL3d3dy53ZWJ0b29sa2l0LmluZm8vamF2YXNjcmlwdC1iYXNlNjQuaHRtbA=="
Is there a way to transform a long string into a short encrypted one and still be able to transform it back?
for example:
new_encrypt("http://www.webtoolkit.info/javascript-base64.html")
Result: "431ASDFafk2"
Encoding is not encrypting. If you're depending on this for security then you're in for a very nasty shock in the future.
Base 64 encoding is intended for converting data that's 8 bits wide into a format that can be sent over a communications channel that uses 6 or 7 bits without loss of data. As 6 bits is less than 8 bits the encoded string is obviously going to be longer than the original.
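To put numbers on that, here is a rough sketch of the size arithmetic for the URL in the question (base64Length is a helper name made up for this example):

```php
<?php
// Base64 emits 4 output characters for every 3 input bytes (padding included).
function base64Length(int $inputBytes): int {
    return 4 * (int) ceil($inputBytes / 3);
}

$url = 'http://www.webtoolkit.info/javascript-base64.html';
echo strlen($url), "\n";                // 49
echo base64Length(strlen($url)), "\n";  // 68
echo strlen(base64_encode($url)), "\n"; // 68
```

So the encoded form is always about a third longer than the input, never shorter.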
This q/a might have what you're looking for:
An efficient compression algorithm for short text strings
It actually links here:
http://github.com/antirez/smaz/tree/master
I did not test it, just found the links.
First off, base64 is an encoding standard and it is not meant to encrypt data, so don't use that. The reason your data is so much longer is that for every 6 bits in the input string, base64 will output 8 bits.
There is no form of encryption that will directly output a shortened string. The result will be just as long in the best case.
A solution to that problem would be to gzip your string and then encrypt it, but with your URL the added data for the zip format will still end up making your output longer than the input.
There are many different algorithms for encryption/decryption. You can take a look at the following documentation: http://www.php.net/manual/en/function.mcrypt-list-algorithms.php (this uses mcrypt with different algorithms).
...BUT, you can't force something to be really small (depends on the size you want). The encrypted string needs to have all the information available to be able to decrypt it. Anyways, a base64-string is not that long (compared with really secure salted hashes for example).
I don't see the problem.
Well... you could try using md5() or uniqid().
The first one generate the md5 hash of your string.
md5("http://www.webtoolkit.info/javascript-base64.html");
http://php.net/manual/en/function.md5.php
The second one generates a 13-character unique ID, and then you can create a relation between your string and that ID.
http://php.net/manual/en/function.uniqid.php
P.S. I'm not sure of what you want to achieve but these solutions will probably satisfy you.
You can be creative and just do some 'stuff' to obfuscate the URL so that it is not easily guessable but can still be encoded and decoded,
like reversing the string,
or taking 3 random letters, then your string encoded with base64 (or with letters replaced by numbers and numbers by letters), and then 3 more random letters. Once you know the recipe, you can do and undo it.
$keychars = "abcdefghijklmnopqrstuvwxyz0123456789";
$length = 2;
$randkey = "";
$randkey2 = "";
for ($i = 0; $i < $length; $i++) $randkey .= substr($keychars, rand(0, strlen($keychars) - 1), 1); // valid offsets run from 0 to strlen - 1

How to generate unguessable "tiny url" based on an id?

I'm interested in creating tiny-url-like links. My idea was to simply store an incrementing identifier for every long URL posted and then convert this id to its base 36 variant, like the following in PHP:
$tinyurl = base_convert($id, 10, 36);
The problem here is that the result is guessable, while it has to be hard to guess what the next url is going to be, while still being short (tiny). Eg. atm if my last tinyurl was a1, the next one will be a2. This is a bad thing for me.
So, how would I make sure that the resulting tiny url is not as guessable but still short?
What you are asking for is a balance between reduction of information (URLs to their indexes in your database), and artificial increase of information (to create holes in your sequence).
You have to decide how important each of these is to you. Another question is whether you just do not want sequential URLs to be guessable, or want them sufficiently random to make guessing any valid URL difficult.
Basically, you want to declare n out of N valid ids. Choose N smaller to make the URLs shorter, and make n smaller to generate URLs that are difficult to guess. Make n and N larger to generate more URLs when the shorter ones are taken.
To assign the ids, you can just take any kind of random generator or hash function and cap this to your target range N. If you detect a collision, choose the next random value. If you have reached a count of n unique ids, you must increase the range of your ID set (n and N).
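A minimal sketch of that assignment scheme, under a few assumptions of mine: the $used array stands in for the stored IDs, the function name is made up, and the range is doubled once a quarter of it is taken (the ratio n/N is yours to pick).

```php
<?php
// $used stands in for the set of IDs already stored (assumption).
// $range is N; it is doubled once a quarter of it is taken,
// an arbitrary threshold chosen for this sketch.
function allocateId(array &$used, int &$range): int {
    if (count($used) * 4 >= $range) {
        $range *= 2; // grow N so that valid IDs stay sparse
    }
    do {
        $id = random_int(0, $range - 1);
    } while (isset($used[$id])); // collision: draw again
    $used[$id] = true;
    return $id;
}

$used = [];
$range = 36 ** 3; // every 3-character base-36 string
$id = allocateId($used, $range);
echo base_convert((string) $id, 10, 36), "\n";
```

A lower threshold makes valid URLs harder to hit by guessing, at the cost of reaching longer URLs sooner.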
I would simply crc32 url
$url = 'http://www.google.com';
$tinyurl = hash('crc32', $url ); // db85f073
cons: constant 8 character long identifier
This is really cheap, but as long as the user doesn't know it's happening it makes the id much less guessable: prefix and postfix the actual id with 2 or 3 random numbers/letters.
If I saw 9d2a1me3 I wouldn't guess that dm2a2dq2 was the next in the series.
Try Xor'ing the $id with some value, e.g. $id ^ 46418 - and to convert back to your original id you just perform the same Xor again i.e. $mungedId ^ 46418. Stack this together with your base_convert and perhaps some swapping of chars in the resultant string and it'll get quite tricky to guess a URL.
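A quick sketch of the XOR idea combined with base_convert (the function names are made up; 46418 is just the example value from above):

```php
<?php
// 46418 is the example mask from the text; any fixed secret integer works.
const XOR_MASK = 46418;

function obfuscateId(int $id): string {
    return base_convert((string) ($id ^ XOR_MASK), 10, 36);
}

function revealId(string $slug): int {
    return ((int) base_convert($slug, 36, 10)) ^ XOR_MASK;
}

foreach ([1, 2, 3] as $id) {
    $slug = obfuscateId($id);
    echo $id, ' -> ', $slug, ' -> ', revealId($slug), "\n";
}
```

XOR with a fixed mask only flips the same bits every time, so neighbouring ids still produce similar-looking slugs. That is why it helps to stack it with other steps such as swapping characters.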
Another way would be to set the maximum number of characters for the URL (let's say it's n). You could then choose a random number between 1 and n!, which would be your permutation number.
On each new URL, you would increment the id and use the permutation number to derive the actual id that would be used. Finally, you would base 32 (or whatever) encode that id. This would look random while staying perfectly reversible.
If you want an injective function, you can use any form of encryption. For instance:
<?php
$key = "my secret";
$enc = mcrypt_ecb (MCRYPT_3DES, $key, "42", MCRYPT_ENCRYPT);
$f = unpack("H*", $enc);
$value = reset($f);
var_dump($value); //string(16) "1399e6a37a6e9870"
To reverse:
$rf = pack("H*", $value);
$dec = rtrim(mcrypt_ecb (MCRYPT_3DES, $key, $rf, MCRYPT_DECRYPT), "\x00");
var_dump($dec); //string(2) "42"
This will not give you a number in base 32; it will give you the encrypted data with each byte converted to base 16 (i.e., the conversion is global). If you really need, you can trivially convert this to base 10 and then to base 32 with any library that supports big integers.
You can pre-define the 4-character codes in advance (all possible combinations), then randomize that list and store it in this random order in a data table. When you want a new value, just grab the first one off the top and remove it from the list. It's fast, no on-the-fly calculation, and guarantees pseudo-randomness to the end-user.
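Roughly like this, as a sketch. It uses 2-character codes to keep the demo small (4 characters over the same alphabet would be 36^4 = 1,679,616 rows). The function name is made up, and a plain array stands in for the data table:

```php
<?php
// Build every code of the given length over the alphabet, then shuffle.
// In practice the shuffled list would live in a database table (assumption);
// here it is a plain array.
function buildCodePool(string $alphabet, int $length): array {
    $codes = [''];
    for ($i = 0; $i < $length; $i++) {
        $next = [];
        foreach ($codes as $prefix) {
            foreach (str_split($alphabet) as $char) {
                $next[] = $prefix . $char;
            }
        }
        $codes = $next;
    }
    shuffle($codes);
    return $codes;
}

$pool = buildCodePool('abcdefghijklmnopqrstuvwxyz0123456789', 2); // 36^2 = 1296 codes
$code = array_pop($pool); // take one code and remove it from the pool
echo $code, ' (', count($pool), " left)\n";
```

shuffle() is not a cryptographically secure shuffle, which is probably fine for "pseudo-random to the end user" but worth knowing.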
Hashids is an open-source library that generates short, unique, non-sequential, YouTube-like ids from one or many numbers. You can think of it as an algorithm to obfuscate numbers.
It converts numbers like 347 into strings like "yr8", or array like [27, 986] into "3kTMd". You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
Use it when you don't want to expose your database ids to the user.
It allows custom alphabet as well as salt, so ids are unique only to you.
Incremental input is mangled to stay unguessable.
There are no collisions because the method is based on integer to hex conversion.
It was written with the intent of placing created ids in visible places, like the URL. Therefore, the algorithm avoids generating most common English curse words.
Code example
$hashids = new Hashids();
$id = $hashids->encode(1, 2, 3); // o2fXhV
$numbers = $hashids->decode($id); // [1, 2, 3]
I ended up creating an MD5 sum of the identifier and using its first 4 characters; if that is a duplicate, I simply increase the length until it is no longer a duplicate.
function idToTinyurl($id) {
$md5 = md5($id);
for ($i = 4; $i < strlen($md5); $i++) {
$possibleTinyurl = substr($md5, 0, $i);
$res = mysql_query("SELECT id FROM tabke WHERE tinyurl='".$possibleTinyurl."' LIMIT 1");
if (mysql_num_rows($res) == 0) return $possibleTinyurl;
}
return $md5;
}
Accepted relet's answer, as it led me to this strategy.

Convert MD5 to base62 for URL

I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?
I have read in many places that because the number from an MD5 is bigger than php can handle as an integer it will be inaccurate... As I want a short URL anyway and was not planning on using the whole hash, maybe just 8 characters of it....
So my question is how to get part of the number of an MD5 hash?
Also is it a bad idea to use only part of the MD5 hash?
I'm going to suggest a different thing here.. Since you are only interested in using a decimal chunk of the md5 hash why don't you use any other short numeric hash like CRC32 or Adler? Here is an example:
$hash = sprintf('%u', crc32('your string here'));
This will produce a decimal hash of up to 10 digits for your string (the same value is 8 characters long in hexadecimal).
EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.
EDIT (Again): To work with arbitrary length numbers you must use either the bc_math or the GMP extension, here is a function that uses the bc_math extension and can also convert from base 2 up to base 62. You should use it like this:
echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash
and the inverse:
echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
Hope it helps. =)
If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions... especially if you're truncating the hash. If you go ahead and implement an id-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second'll be 2, etc---if you're using MySQL, just throw in an autoincrement column.
To make a short id:
//the basic example
$sid = base_convert($id, 10, 36);
//if you're going to be needing 64 bit numbers converted
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);
To make a short id back into the base-10 id:
//the basic example
$id = base_convert($sid, 36, 10);
//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));
Hope this helps!
If you're truly wanting base 62 (which can't be done with gmp or base_convert), check this out:
http://snipplr.com/view/22246/base62-encode--decode/
You can do this like this: (Not all steps are in php, it's been a long time that I've used it.)
Create a md5 hash of the script like this:
$hash = md5($script, true); // second argument true = raw binary output
Convert that number to base 62.
See the questions about base conversion of arbitrary sized numbers in PHP
Truncate the string to a length you like.
There's no risk in using only a few of the bits of a md5. All that changes is danger of collisions.
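The steps above can be sketched without GMP or BCMath by doing the long division by hand. This is only an illustration under my own assumptions: the function names and the alphabet order (digits, lowercase, uppercase) are made up, and it starts from the hexadecimal form of the MD5 rather than the raw output.

```php
<?php
const B62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Convert a hexadecimal string of any length to base 62 using only
// small-integer arithmetic (long division on an array of hex digits).
function hexToBase62(string $hex): string {
    $digits = array_map('hexdec', str_split(strtolower($hex)));
    $out = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $acc = $remainder * 16 + $digit;
            $q = intdiv($acc, 62);
            $remainder = $acc % 62;
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        $out = B62[$remainder] . $out; // each pass yields one base-62 digit
        $digits = $quotient;
    }
    return $out;
}

// Hash, convert, then truncate to the length you like.
function shortHash(string $input, int $length = 8): string {
    return substr(hexToBase62(md5($input)), 0, $length);
}

echo hexToBase62('ff'), "\n"; // 47
echo shortHash('http://www.example.com/long/url/'), "\n";
```

Truncating keeps the most significant characters. The shorter the result, the sooner two inputs will collide, so a uniqueness check in the database is still needed.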
There actually is a Java implementation which you could probably extract. It's an open-source CMS solution called Pulse.
Look here for the code of toBase62() and fromBase62().
http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html
The only dependency in StringUtils is the LifeCycle class, which provides a way to get a salted hash for a string. You might even omit it altogether, or just copy the method over to your copy of StringUtils. Voilà.
You can do something like this,
$hash = md5("The data to be hashed", true);
$ints = unpack("L*num", $hash);
$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']); // base62() is your own integer-to-base-62 function
As of PHP 5.3.2, GMP supports bases up to 62 (was previously only 36), so brianreavis's suggestion was very close. I think the simplest answer to your question is thus:
function base62hash($source, $chars = 22) {
return substr(gmp_strval(gmp_init(md5($source), 16), 62), 0, $chars);
}
Converting from base-16 to base-62 obviously has space benefits. A normal 128-bit MD5 hash is 32 chars in hex, but in base-62 it's only 22. If you're storing the hashes in a database, you can convert them to raw binary and save even more space (16 bytes for an MD5).
Since the resulting hash is just a string representation, you can just use substr if you only want a bit of it (as the function does).
You may try base62x to get a safe and compatible encoded representation.
More information about base62x is available here, or simply under -base62x in -NatureDNS.
shell> ./base62x -n 16 -enc 16AF
1Ql
shell> ./base62x -n 16 -dec 1Ql
16AF
shell> ./base62x
Usage: ./base62x [-v] [-n <2|8|10|16|32>] <-enc|dec> string
Version: 0.60
Here is an open-source Java library that converts MD5 strings to Base62 strings
https://github.com/inder123/base62
Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6") ==> cbIKGiMVkLFTeenAa5kgO4
Md5ToBase62.fromBase62("4KfZYA1udiGCjCEFC0l") ==> 0000bdd3bb56865852a632deadbc62fc
The conversion is two-way, so you will get the original md5 back if you convert it back to md5:
Md5ToBase62.fromBase62(Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6")) ==> 9e107d9d372bb6826bd81d3542a419d6
Md5ToBase62.toBase62(Md5ToBase62.fromBase62("cbIKGiMVkLFTeenAa5kgO4")) ==> cbIKGiMVkLFTeenAa5kgO4
You could use a slightly modified Base 64 with - and _ instead of + and /:
function base64_url_encode($str) {
return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}
Additionally you could remove the trailing padding = characters.
And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in the manual) to true:
$raw_md5 = md5($str, true);
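Combined, the pieces above give a 22-character form of the MD5. This is a sketch with made-up function names. Strictly speaking it is base 64 with - and _, not base 62, so it only fits if those two characters are acceptable in your URLs:

```php
<?php
// URL-safe Base64 of the raw 16-byte MD5 digest, with the "=" padding removed.
function md5Base64Url(string $str): string {
    return rtrim(strtr(base64_encode(md5($str, true)), '+/', '-_'), '=');
}

// Recover the 32-character hexadecimal MD5 from the short form.
function md5FromBase64Url(string $short): string {
    return bin2hex(base64_decode(strtr($short, '-_', '+/')));
}

$short = md5Base64Url('your url here');
echo $short, ' (', strlen($short), " characters)\n"; // 22 characters
echo md5FromBase64Url($short), "\n";                 // same as md5('your url here')
```

base64_decode() accepts the input without its padding in its default non-strict mode, so the = characters do not have to be restored here.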
