YouTube URL-style hashes - php

I'm trying to find out how to build nice, short alphanumeric hashes like the kind used in YouTube URLs.
Example: http://www.youtube.com/watch?v=rw71YOSXhpE
Where rw71YOSXhpE would convert into video number 12834233 (for example).
These hashes could be converted back to an integer in PHP and then looked up in a database.
I've run the following in PHP:
<?php
$algoList = hash_algos();
foreach ($algoList as $algoName) {
    // hash() expects its data as a string
    echo $algoName . ": " . hash($algoName, '357892345234') . "\n";
}
?>
But none of them return characters beyond the 0-9 and a-f you'd expect from hexadecimal output. YouTube uses the whole English alphabet in upper and lower case. Any idea how they've done it?

You want to convert your integer to a different base, one which uses the full alphabet. Base64 could work but you will get strings which are longer than the original integer because the base64_encode() function takes a string, not an integer.
My suggestion would be to use the base_convert() function like so:
$id = 12834233;
$hash = base_convert($id, 10, 36);
and the reverse
$hash = '7n2yh';
$id = base_convert($hash, 36, 10);
This however will only use lowercase letters a-z and 0-9. If you wish to use all upper and lower case letters you would need to convert to base 62 (or higher if you use symbols). However to do this you will have to write your own code.
Edit: Gordon pointed out this great link to base62 encoding in php.
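As a rough illustration of the "write your own code" part, a base 62 conversion could look like the sketch below. The function names and the alphabet order are my own assumptions (they are not taken from the linked article), and it only handles non-negative integers that fit in a PHP int.

```php
<?php
// Hypothetical helpers; the alphabet order is an arbitrary choice.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function base62_encode(int $num): string
{
    if ($num < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported.');
    }
    $out = '';
    do {
        // Peel off the least significant base 62 digit and prepend it.
        $out = BASE62_ALPHABET[$num % 62] . $out;
        $num = intdiv($num, 62);
    } while ($num > 0);
    return $out;
}

function base62_decode(string $str): int
{
    if ($str === '') {
        throw new InvalidArgumentException('Empty string cannot be decoded.');
    }
    $num = 0;
    foreach (str_split($str) as $char) {
        $pos = strpos(BASE62_ALPHABET, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid character: $char");
        }
        // Shift the accumulated value by one base 62 digit and add the new one.
        $num = $num * 62 + $pos;
    }
    return $num;
}

$hash = base62_encode(12834233);
$id   = base62_decode($hash);   // back to 12834233
```

Unlike base 36, the result is case-sensitive, so the hash must not be lowercased anywhere on its way through the URL.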

You could use base_convert() to convert your number into base 36, which uses 0-9 plus a-z, and which has the advantage that your URL parameter is not case-sensitive.

I had a similar problem, and wrote a class for myself just for this.
Documentation: http://www.hashids.org/php/
Source: https://github.com/ivanakimov/hashids.php
You would use it like this:
require('lib/Hashids/Hashids.php');
$hashids = new Hashids\Hashids('salt value', 11);
$hash = $hashids->encrypt(12834233);
You would get the following $hash: Rz0zlKZGg6g
Provide your own unique string for the salt value. The 11 in the code is optional and stands for minimum hash length. (You can also define your own alphabet string as 3rd param to the constructor).
To decrypt the hash you would do this:
$numbers = $hashids->decrypt($hash);
So $numbers will be: [12834233]
(It's an array because hashids can encrypt/decrypt several numbers into one hash.)
EDIT:
Changed urls to include both doc website and code source
Changed example code to adjust to the main lib updates (current PHP lib version is 0.3.0 - thanks to all the open-source community for improving the lib)

Probably a base64 encoding of (part of) an MD5 hash? Although I seem to recall that there are short ones and long ones, so it could be MD5 or SHA-1.
If you base64-decode the token you gave, with proper padding, the result is an 8-byte (64-bit) value, so it's not a full MD5 hash, which is 16 bytes. It could be only the first half of it.

Something similar could be done with base64_encode().
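A minimal sketch of that idea, assuming a 64-bit PHP build (the J format code of pack() is 64-bit big-endian) and using helper names I made up: the integer is packed into 8 bytes and those bytes are base64-encoded with URL-safe characters. Eight bytes always give 11 characters once the padding is stripped.

```php
<?php
// Hypothetical helpers, not an existing API.
function id_to_token(int $id): string
{
    $bytes = pack('J', $id);                      // 8 bytes, big-endian
    $b64   = rtrim(base64_encode($bytes), '=');   // 12 chars minus one '=' pad
    return strtr($b64, '+/', '-_');               // make it URL-safe
}

function token_to_id(string $token): int
{
    $bytes = base64_decode(strtr($token, '-_', '+/'));
    if ($bytes === false || strlen($bytes) !== 8) {
        throw new InvalidArgumentException('Malformed token.');
    }
    return unpack('J', $bytes)[1];                // unpack() arrays start at key 1
}

$token = id_to_token(12834233);
$id    = token_to_id($token);   // back to 12834233
```

Small ids produce tokens with a long run of leading A characters, so this is an encoding, not a way to hide the sequence of ids.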

Related

PHP Replace all characters with a symbol

I am trying to make an account generator with censored passwords, and I don't want to replace all characters with just 10 *'s. I want it to be like this:
if the password is 15 characters long, it will be replaced with 15 *'s. I tried to use this:
$censpass = preg_replace('/[a-zA-Z0-9\']/', '*', $accounts[$i]['password']);
but as you might know, that doesn't work for !'s. How can I use preg_replace with every single character in PHP?
If someone doesn't understand:
I want this: "password123!"
to be replaced with this: "************" with the accurate length using preg_replace
If this exists somewhere else, please link it below, I tried to find this but I could only find how to replace some characters, like numbers only
Thank you :)
For your goal I'd use a different approach, such as:
$encpass = str_pad('', strlen($accounts[$i]['password']), '*');
In fact, there is no need to use a regular expression (which is slow and resource consuming) just to generate a string the same length as another one.
Anyway, if you still want to use your solution, the correct regexp for your use case is simply a . such as:
$censpass = preg_replace('/./', '*', $accounts[$i]['password']);
Have a look here: http://php.net/manual/en/regexp.reference.dot.php
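One caveat with both snippets: strlen() and a plain /./ count bytes, so a password containing multibyte UTF-8 characters would get too many asterisks, and . skips newlines unless the s modifier is set. A possible variant, assuming UTF-8 input and the mbstring extension (the function names are hypothetical):

```php
<?php
// Hypothetical helper: one mask character per character, not per byte.
function mask_password(string $password, string $maskChar = '*'): string
{
    return str_repeat($maskChar, mb_strlen($password, 'UTF-8'));
}

// Regex equivalent: "u" treats the subject as UTF-8, "s" lets "." match newlines.
function mask_password_regex(string $password): string
{
    return preg_replace('/./su', '*', $password);
}

echo mask_password('password123!'), "\n";
```

For "password123!" both functions return twelve asterisks.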

Shortest possible query string for a numerically indexed array in PHP

I’m looking for the most concise URL rather than the shortest PHP code. I don’t want my users to be scared by the hideous URLs that PHP creates when encoding arrays.
PHP will do a lot of repetition in the query string if you just stuff an array ($fs) through http_build_query:
$fs = array(5, 12, 99);
$url = "http://$_SERVER[HTTP_HOST]/?" .
http_build_query(array('c' => 'asdf', 'fs' => $fs));
The resulting $url is
http://example.com/?c=asdf&fs[0]=5&fs[1]=12&fs[2]=99
How do I get it down to a minimum (using PHP or methods easily implemented in PHP)?
Default PHP way
What http_build_query does is a common way to serialize arrays to URL. PHP automatically deserializes it in $_GET.
When wanting to serialize just a (non-associative) array of integers, you have other options.
Small arrays
For small arrays, conversion to underscore-separated list is quite convenient and efficient. It is done by $fs = implode('_', $fs). Then your URL would look like this:
http://example.com/?c=asdf&fs=5_12_99
The downside is that you’ll have to explicitly explode('_', $_GET['fs']) to get the values back as an array.
Other delimiters may be used too. Underscore is considered alphanumeric and as such rarely has special meaning. In URLs, it is usually used as a space replacement (e.g. by MediaWiki). It is hard to distinguish when used in underlined text. Hyphen is another common replacement for space. It is also often used as a minus sign. Comma is a typical list separator, but unlike underscore and hyphen it is percent-encoded by http_build_query and has special meaning almost everywhere. The situation is similar with the vertical bar ("pipe").
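As a sketch of the round trip for small arrays, the helpers below (hypothetical names) build the underscore-separated value and parse it back. They assume non-negative integers and reject anything else, since the parameter comes from the user.

```php
<?php
// Hypothetical helpers for an underscore-separated list of non-negative integers.
function encode_id_list(array $ids): string
{
    return implode('_', array_map('intval', $ids));
}

function decode_id_list(string $param): array
{
    if ($param === '') {
        return [];
    }
    $ids = [];
    foreach (explode('_', $param) as $part) {
        // ctype_digit() only accepts strings made of the digits 0-9.
        if (!ctype_digit($part)) {
            throw new InvalidArgumentException("Invalid list element: $part");
        }
        $ids[] = (int) $part;
    }
    return $ids;
}

$query = 'fs=' . encode_id_list([5, 12, 99]);   // "fs=5_12_99"
$fs    = decode_id_list('5_12_99');             // [5, 12, 99]
```

In a request handler the second call would receive $_GET['fs'] instead of a literal string.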
Large arrays
When you have large arrays in URLs, you should first stop coding and start thinking. This almost always indicates bad design. Wouldn't the POST HTTP method be more appropriate? Don't you have a more readable and space-efficient way of identifying the addressed resource?
URLs should ideally be easy to understand and (at least partially) remember. Placing a large blob inside is really a bad idea.
Now I have warned you. If you still need to embed a large array in a URL, go ahead. Compress the data as much as you can, base64-encode it to convert the binary blob to text, and url-encode the text to sanitize it for embedding in a URL.
Modified base64
Or, better, use a modified version of base64. The variant I chose
uses - instead of +,
uses _ instead of /, and
omits the = padding.
define('URL_BASE64_FROM', '+/');
define('URL_BASE64_TO', '-_');
function url_base64_encode($data) {
$encoded = base64_encode($data);
if ($encoded === false) {
return false;
}
return str_replace('=', '', strtr($encoded, URL_BASE64_FROM, URL_BASE64_TO));
}
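function url_base64_decode($data) {
    if (!is_string($data)) {
        return false;
    }
    // Restore the '=' padding so that the length is a multiple of 4.
    // str_pad() takes the total target length, not the number of characters to add.
    $len = strlen($data);
    $padded = str_pad($data, $len + (4 - $len % 4) % 4, '=', STR_PAD_RIGHT);
    return base64_decode(strtr($padded, URL_BASE64_TO, URL_BASE64_FROM));
}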
This saves two bytes for each character that would otherwise be percent-encoded. There is also no need to call the urlencode function.
Compression
A choice has to be made between gzip (gzcompress) and bzip2 (bzcompress). I did not invest time in a thorough comparison, but gzip looks better on several relatively small inputs (around 100 chars) for any setting of the bzip2 block size.
Packing
But what data should be fed into the compression algorithm?
In C, one would cast array of integers to array of chars (bytes) and hand it over to the compression function. That’s the most obvious way to do things. In PHP the most obvious way to do things is converting all the integers to their decimal representation as strings, then concatenation using delimiters, and only after that compression. What a waste of space!
So, let’s use the C approach! We’ll get rid of the delimiters and otherwise wasted space and encode each integer in 2 bytes using pack:
define('PACK_NUMS_FORMAT', 'n*');
function pack_nums($num_arr) {
array_unshift($num_arr, PACK_NUMS_FORMAT);
return call_user_func_array('pack', $num_arr);
}
function unpack_nums($packed_arr) {
    // unpack() returns an array indexed from 1; array_values() reindexes it from 0
    return array_values(unpack(PACK_NUMS_FORMAT, $packed_arr));
}
Note: the n format code always means an unsigned 16-bit integer in big-endian byte order, so the packed bytes do not depend on the machine. Byte order only becomes a problem with the machine-dependent codes such as s, S, i or l: links created on a system with one endianness would break on a system with the other. Also keep in mind that n only holds values from 0 to 65535; larger integers do not fit in 2 bytes and need a wider code such as N.
Encoding together
Now packing, compression and modified base64, all in one:
function url_embed_array($arr) {
return url_base64_encode(gzcompress(pack_nums($arr)));
}
function url_parse_array($data) {
return unpack_nums(gzuncompress(url_base64_decode($data)));
}
See the result on IdeOne. On the OP's 40-element array my solution produces 91 characters, while the OP's produces 98. With range(1, 1000) (which generates array(1, 2, 3, …, 1000)) as a benchmark, the OP's solution produces 2712 characters while mine produces just 2032, which is about 25 % shorter.
For the sake of completeness, OP’s solution is
function url_embed_array($arr) {
return urlencode(base64_encode(gzcompress(implode(',', $arr))));
}
There are multiple approaches possible:
serialize + base64 - can swallow any object, but data overhead is horrible.
implode + base64 - limited to arrays, forces user to find unused char as delimiter, data overhead is much smaller.
implode - unsafe for unescaped strings. Requires strict data control.
$foo = array('some unsafe data', '&&&==http://', '65535');
$ser = base64_encode(serialize($foo));
$imp = implode('|', $foo);
$imp2 = base64_encode($imp);
echo "$ser\n$imp\n$imp2";
Results are as follows:
YTozOntpOjA7czoxNjoic29tZSB1bnNhZmUgZGF0YSI7aToxO3M6MTI6IiYmJj09aHR0cDovLyI7aToyO3M6NToiNjU1MzUiO30=
some unsafe data|&&&==http://|65535
c29tZSB1bnNhZmUgZGF0YXwmJiY9PWh0dHA6Ly98NjU1MzU=
While serialize+base64 results are horribly long, implode+base64 gives output of manageable length that is safe for GET… except for that = at the end.
I believe the answer depends on the size of the query string.
Short query strings
For shorter query strings, this may be the best way:
$fs = array(5, 12, 99);
$fs_no_array = implode(',', $fs);
$url = "http://$_SERVER[HTTP_HOST]/?" .
http_build_query(array('c' => 'asdf', 's' => 'jkl')) . '&fs=' . $fs_no_array;
resulting in
http://example.com/?c=asdf&s=jkl&fs=5,12,99
On the other end you do this to get your array back:
$fs = array_map('intval', explode(',', $_GET['fs']));
Quick note about delimiters: A valid reason to avoid commas is that they are used as delimiters in so many other applications. On the off-chance you may want to parse your URLs in Excel, for example, the commas might make it slightly more difficult. Underscores would also work, but can blend in with the underlining that is standard in web formatting for links. So dashes may actually be a better choice than either commas or underscores.
Long query strings
I came across another possible solution:
$fs_compressed = urlencode(base64_encode(gzcompress($fs_no_array)));
On the other end it can be decompressed by
$fs_decompressed = gzuncompress(base64_decode($_GET['fs']));
$fs = array_map('intval', explode(',', $fs_decompressed));
assuming it’s passed in through GET variable.
Efficiency tests
31 elements
$fs = array(7,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,52,53,54,61);
Result:
eJwFwckBwCAQxLCG%2FMh4D6D%2FxiIdpGiG5fLIR0IkRZoMWXLIJQ8%2FDIqFjYOLBy8jU0yz%2BQGlbxAB
$fs_no_array is 84 characters long, $fs_compressed 84 characters long. The same!
40 elements
$fs = array(7,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,52,53,54,61);
Result:
eJwNzEkBwDAQAzFC84jtPRL%2BxFoB0GJC0QyXhw4SMgoq1GjQoosePljYOLhw48GLL37kEJE%2FDCnSZMjSpkMXow%2BdIBUs
$fs_no_array is 111 characters long, $fs_compressed 98 characters long.
Summary
The savings are only about 10 % here. But at greater lengths the savings will increase to beyond 50 %.
If you use Yahoo sites, you notice things like comma separated lists as well as sometimes a series of random looking characters. They may be employing these solutions in the wild already.
Also check out this stack question, which talks in way too much detail about what is allowed in a URI.

Number_format for arabic/persian numbers

I have a "price" field in a mysql database, which contains the price of a product in arabic or persian numbers.
Example of number: ۱۲۳۴۵۶۷۸۹۰ //1234567890
I cannot figure out how to format this so that it is formatted in a user-friendly way.
By that I mean, grouped by thousands or something similar.
This would be ideal: ۱ ۲۳۴ ۵۶۷ ۸۹۰
number_format in php is what I would have used on latin numbers.
Is there any function which I don't know about, to make this possible?
If not, ideas of how to create one is appreciated.
Thanks
You could use a regex like this:
([۱۲۳۴۵۶۷۸۹۰])(?=(?:[۱۲۳۴۵۶۷۸۹۰]{3})+$)
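Replacing each match with \1 followed by a comma (that is, \1,) turns the string ۱۲۳۴۵۶۷۸۹۰ into ۱,۲۳۴,۵۶۷,۸۹۰. (I used , instead of a space since SO trims spaces off, but a space in the replacement will work just as well.) In PHP the pattern needs the u modifier so that the multibyte digits are treated as single characters.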
I would have to agree with the suggestion in the comments though, store the data using the numeric types available and convert them on input/output.
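Applied in PHP, the regex above might be wrapped like this. This is only a sketch: the function name is hypothetical, the input is assumed to be UTF-8 and to consist solely of Persian digits, and the separator is assumed not to contain $ or \ (which have a special meaning in replacement strings).

```php
<?php
// Hypothetical helper: insert a separator between groups of three Persian digits.
function group_persian_digits(string $number, string $separator = ' '): string
{
    // "u" makes PCRE treat pattern and subject as UTF-8, so each digit is one character.
    $pattern = '/([۱۲۳۴۵۶۷۸۹۰])(?=(?:[۱۲۳۴۵۶۷۸۹۰]{3})+$)/u';
    return preg_replace($pattern, '$1' . $separator, $number);
}

echo group_persian_digits('۱۲۳۴۵۶۷۸۹۰'), "\n";       // ۱ ۲۳۴ ۵۶۷ ۸۹۰
echo group_persian_digits('۱۲۳۴۵۶۷۸۹۰', ','), "\n";  // ۱,۲۳۴,۵۶۷,۸۹۰
```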
If you can store numbers in your database instead of strings (or convert to ascii numbers), then standard currency formatting with group-separators can be done with php5-intl functions. You just need ISO locale and currency codes:
$nf = new \NumberFormatter('fa_IR', \NumberFormatter::CURRENCY);
echo $nf->formatCurrency(1234.1234, 'IRR');
۱٬۲۳۴ ﷼
Otherwise, @rvalvik's answer is good.
See http://php.net/manual/en/class.numberformatter.php
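If the intl extension is not available, the "convert to ASCII numbers" route can also be sketched with plain string functions: map the Persian digits to ASCII on input, use number_format as usual, and map back on output. The helper and constant names below are assumptions of mine, and the price is assumed to be a whole number that fits in a PHP int.

```php
<?php
// Index i of each array holds the digit i in the respective script.
const PERSIAN_DIGITS = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
const ASCII_DIGITS   = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];

// Hypothetical helper: Persian digit string to integer (for storage and arithmetic).
function persian_to_int(string $number): int
{
    return (int) str_replace(PERSIAN_DIGITS, ASCII_DIGITS, $number);
}

// Hypothetical helper: integer to grouped Persian digit string (for display).
function int_to_persian_grouped(int $number, string $separator = ' '): string
{
    $ascii = number_format($number, 0, '', $separator);
    return str_replace(ASCII_DIGITS, PERSIAN_DIGITS, $ascii);
}

$price = persian_to_int('۱۲۳۴۵۶۷۸۹۰');       // 1234567890
echo int_to_persian_grouped($price), "\n";   // ۱ ۲۳۴ ۵۶۷ ۸۹۰
```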
More elegantly written than @rvalvik's regex pattern, you can add a comma after a character that is followed by 3, 6, 9, etc. characters.
Code: (Demo)
$str = '۱۲۳۴۵۶۷۸۹۰';
var_export(
preg_replace(
'~.\K(?=(?:.{3})+$)~u',
",",
$str
)
);
Output:
'۱,۲۳۴,۵۶۷,۸۹۰'
Here is a similar answer to a related question.

Reduce a MongoDB id into a shorter hash

I am looking for the best way to convert a MongoDB id such as 504aaedeff558cb507000004 into a shorter representation in PHP. Basically, users can reference IDs in the app, and that long string is difficult to work with.
The one caveat is, collisions should be 'rare'. Can we somehow get it down to 4, 5 or 6 characters?
Thanks.
While a hex digit can store 16 different states, a base64 encoded digit can store 64 different states, so you can store your whole MongoDB Id in 16 digits instead of 24 without losing any information:
print hexToBase64("50b3701de3de2a2416000000") . "\n"; # -> ULNwHePeKiQWAAAA
print base64ToHex("ULNwHePeKiQWAAAA") . "\n"; # -> 50b3701de3de2a2416000000
function base64ToHex($string) {
return bin2hex(base64_decode($string));
}
function hexToBase64($string) {
return base64_encode(hex2bin($string));
}
Your unique ID to start with can be mapped by [0-9a-f]. Shortening can be done in multiple ways - one easy way is to re-map character sets.
Our aim will be to cut the string size in two by replacing characters. A single hex character is one of 16 values, so two characters give you 16^2 = 256 possibilities, which is exactly one byte. Take each pair of characters in your string, calculate the value it maps to, and use the character with that byte value instead. If you dislike having such an ugly (binary) ID at the end, base64-encode it: you'll get a string which is roughly 1/3 shorter than the one you started with.
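To make the pair-by-pair mapping concrete, here is a small sketch with a hypothetical function name. It does by hand what the built-in hex2bin() does, so in real code the built-in is the simpler choice.

```php
<?php
// Hypothetical helper: map each pair of hex digits to the byte with that value.
function hex_pairs_to_bytes(string $hex): string
{
    if (strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Expected an even number of hex digits.');
    }
    $bytes = '';
    foreach (str_split($hex, 2) as $pair) {
        $bytes .= chr(hexdec($pair));   // e.g. "50" becomes the byte 0x50
    }
    return $bytes;
}

$id    = '504aaedeff558cb507000004';   // 24 hex characters
$raw   = hex_pairs_to_bytes($id);      // 12 bytes, not printable
$short = base64_encode($raw);          // 16 printable characters
```

Both steps are lossless, so the original id is recovered with bin2hex(base64_decode($short)).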

php convert a hexadecimal number 273ef9 into a path 27/3e/f9

As the title reads, what is an efficient way to convert a hexadecimal number such as 273ef9 into a path such as 27/3e/f9 in PHP?
Update:
Actually, I want to convert an ordinary number to hexadecimal and then convert that to a path, but maybe we can skip the middle step.
How about combining a str_split with implode? Might not be super efficient but very readable:
implode('/',str_split("273ef9",2));
As a side note, this will of course work well with larger hex strings and can handle partial (3,5,7 in length) hex numbers (by just printing it as a single letter after the last slash).
Edit: With what you're asking now (decimal -> hex -> path), it would look like this:
$num = 2572025;
$hex = dechex($num);
implode('/',str_split($hex,2));
Of course, you can combine it for an even shorter but less readable representation:
implode('/',str_split(dechex($num),2));
The most efficient approach is to touch each character in the hex value exactly once, building up the string as you go. Because the string may have either an odd or even number of digits, you'll have to start with a check for this, outputting a single digit if it's an odd-length string. Then use a for loop to append groups of two digits, being careful with whether or not to add a slash. It will be a few lines of code.
Unless this code is being executed many millions of times, it probably isn't worth writing out this algorithm; Michael Petrov's is so readable and so nice. Go with this unless you have a real need to optimize.
By the way, to go from a decimal number to a hex string, just use dechex :)
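For reference, the single-pass approach described above might look like the sketch below (the function name is hypothetical). Note one difference from the str_split version: with an odd number of digits, the lone digit ends up in the first path segment here, not the last.

```php
<?php
// Hypothetical helper: build the path while visiting each hex digit once.
function hex_to_path(string $hex): string
{
    $len = strlen($hex);
    if ($len === 0) {
        return '';
    }
    $start = $len % 2;                   // 1 for odd-length input, 0 otherwise
    $path  = substr($hex, 0, $start);    // the leading single digit, or ''
    for ($i = $start; $i < $len; $i += 2) {
        if ($path !== '') {
            $path .= '/';                // separator between segments only
        }
        $path .= $hex[$i] . $hex[$i + 1];
    }
    return $path;
}

echo hex_to_path(dechex(2572025)), "\n";   // 27/3e/f9
echo hex_to_path('73ef9'), "\n";           // 7/3e/f9
```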
