Is it possible to convert a UTF-8 path in a URL to a short, fixed-length unique ID in PHP?
I want the same result every time.
For example, this:
questions/صصثبصث/ask?q=asd$a=سیبث
should become something like this: 34Sdd6N
CRC32 is the most suitable option when performance matters. If you use MySQL/MariaDB as your DBMS, you can create an INT column with the UNSIGNED flag, and it will store the full CRC32 result in only 4 bytes per URL.
PHP has a crc32 function that accepts strings and disperses hashes well. Keep in mind that a 32-bit value cannot be truly unique: by the birthday bound, a collision becomes more likely than not at around 77,000 distinct URLs, so store the original URL alongside the hash if you need to detect duplicates.
You can also wrap the CRC32 result in dechex if you want the hash to include the hexadecimal letters a-f, with no database performance impact (and use hexdec to convert it back).
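A minimal sketch of the CRC32 approach, assuming a 64-bit PHP build (where crc32() always returns a non-negative integer). The helper name shortUrlId() and the sample URL are made up for illustration; sprintf('%08x', ...) is used in place of a bare dechex() so the result always has exactly 8 characters.

```php
<?php
// Hypothetical helper: maps any URL string to a fixed-length, 8-character hex ID.
function shortUrlId(string $url): string
{
    // crc32() hashes the raw bytes, so UTF-8 paths need no special handling.
    // %08x left-pads with zeros, which keeps the length fixed.
    return sprintf('%08x', crc32($url));
}

$url = 'questions/example/ask?q=asd';   // sample input, not from a real site
$id  = shortUrlId($url);                // the same input always yields the same ID
$int = hexdec($id);                     // back to the integer for an INT UNSIGNED column

echo $id, PHP_EOL;
```

The ID is deterministic but not reversible: to get from the ID back to the URL, the URL still has to be stored and looked up.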
Save the URL in a database table and add a column named "code" (or something similar). Generate a random string for each URL you insert, and look the URL up by that code later. Be careful not to use a duplicate code.
What you are asking about is basically hashing. PHP has several functions available for hashing. One such function is available here. I hope you find it useful.
What is the best way to create a short (6 chars), random string with a low collision probability? I need to create short links like bit.ly.
The problem with md5, sha1, uniqid, etc. is that they don't generate uppercase characters, so I'm looking for a case-sensitive output to have a wider range of possible values...
I like to use Hashids for this kind of thing:
Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
Hashids has been ported to many languages, including PHP.
(Note that, despite the name, Hashids is not a true hashing system since it is designed to be reversible.)
This is driving me crazy: my md5 hashes don't agree. I have this string:
The Combinations Generator is a tool that allows you to easily create
a series of combinations by selecting the related attributes. For
example, if you're selling t-shirts in three different sizes and two
different colors, the generator will create six combinations for you.
When I hash it on my computer using the md5 function (with php 5.5.0) it produces the following hash: 422f3f656e1a5f95e8b5cf7565d815b5
http://www.miraclesalad.com/webtools/md5.php agrees with my computer's result.
http://www.md5.cz/ disagrees with both my computer and miraclesalad.
This string/md5 pair was initially computed by another computer which also gives the same result as md5.cz.
I read about encoding issues (although the string doesn't contain any non ASCII characters), so I tried the following code on my computer:
<?php
$str = "The Combinations Generator is a tool that allows you to easily create a series of combinations by selecting the related attributes. For example, if you're selling t-shirts in three different sizes and two different colors, the generator will create six combinations for you.";
echo "$str<BR/>";
echo md5($str)."<BR/>";
echo md5(utf8_encode($str))."<BR/>";
echo md5(utf8_decode($str))."<BR/>";
die();
The output is:
The Combinations Generator is a tool that allows you to easily create
a series of combinations by selecting the related attributes. For
example, if you're selling t-shirts in three different sizes and two
different colors, the generator will create six combinations for you.
422f3f656e1a5f95e8b5cf7565d815b5
422f3f656e1a5f95e8b5cf7565d815b5
422f3f656e1a5f95e8b5cf7565d815b5
So it is not about utf8.
Any idea what's happening?
My best guess is that it has something to do with the ' mark in the word "you're" and character encodings. If you remove that quote both sites report the same md5.
I tried feeding the string above incrementally to both sites you linked to in your question, and it turns out that the character breaking the generator at md5.cz is the apostrophe in if you're selling t-shirts.
If you strip the string of special characters before feeding it to a hasher, possibly preserving the string's uniqueness using something like urlencode(), you should get matching hashes for any string.
The strings need to be exactly the same, including whitespace.
Probably the sites are using some transformation like trim() or stripslashes().
md5 will return the same value only if the strings are exact.
md5 is md5. That's all there is to it. If you get different hashes from different (non-buggy) implementations, then you're feeding in different inputs. Remember that md5 is DESIGNED to produce wildly different outputs if the inputs are even slightly different. A single whitespace character (tab, line break, etc.) at the end of one of your test strings will totally trash your expected hash, because you've fed in a different input.
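To illustrate, here is a small sketch showing that any preprocessing changes the input bytes and therefore the hash. The trim() and addslashes() calls are only guesses at what a hashing site might apply to submitted text, not confirmed behavior of either site.

```php
<?php
$original = "if you're selling t-shirts ";   // note the trailing space

// Each variant differs from $original by at least one byte.
$trimmed = trim($original);                  // trailing space removed
$escaped = addslashes($original);            // apostrophe becomes \'

$hashes = [
    'original' => md5($original),
    'trimmed'  => md5($trimmed),
    'escaped'  => md5($escaped),
];

// All three lines print different hashes.
foreach ($hashes as $label => $hash) {
    echo $label, ': ', $hash, PHP_EOL;
}
```

Comparing bin2hex() of the two inputs is a quick way to find the byte that differs.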
I'd like to store UUIDs in the database as BINARY(16), but I need to accept them and present them in the XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX format. Before I jump into splitting the GUID in PHP code (what fun that is!), are there any libraries or functions that do this out of the box?
You can first remove the dashes with Replace()
and then use Unhex(), which performs the inverse operation of HEX(str). That is, it interprets each pair of hexadecimal digits in the argument as a number and converts it to the character represented by the number.
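The same two steps can also be done on the PHP side before the value reaches the database. This is a minimal sketch; the function names uuidToBinary() and binaryToUuid() and the sample UUID are made up for illustration.

```php
<?php
// Hypothetical helper: "XXXXXXXX-XXXX-..." text form -> 16 raw bytes for BINARY(16).
function uuidToBinary(string $uuid): string
{
    $hex = str_replace('-', '', $uuid);
    if (strlen($hex) !== 32 || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Not a valid UUID string');
    }
    return hex2bin($hex);
}

// Hypothetical helper: 16 raw bytes -> dashed text form (lowercase hex).
function binaryToUuid(string $bytes): string
{
    if (strlen($bytes) !== 16) {
        throw new InvalidArgumentException('Expected exactly 16 bytes');
    }
    $hex = bin2hex($bytes);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

$text   = '550e8400-e29b-41d4-a716-446655440000';   // sample value
$binary = uuidToBinary($text);
echo binaryToUuid($binary), PHP_EOL;
```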
Don't you mean BINARY(36) instead of BINARY(16)?
Try http://pecl.php.net/package/uuid
CakePHP has some neat functionality for uuids and it works automatically for CHAR(36) and BINARY(36) from what I understand. Either way you could dig into the source a bit to see if it has what you need.
Edit: Here is a stand alone class that can generate and convert into just about any format you might need: http://www.shapeshifter.se/2008/09/29/uuid-generator-for-php/ I posted this in the below comment but wanted to make sure the link is easily visible for other users.
I've always wondered how and why they do this...an example: http://youtube.com/watch?v=DnAMjq0haic
How are these IDs generated such that there are no duplicates, and what advantage does this have over having a simple auto incrementing numeric ID?
How does one keep it short but still keep its uniqueness? The strings that uniqid creates are pretty long.
Kevin van Zonneveld has written an excellent article including a PHP function to do exactly this. His approach is the best I've found while researching this topic.
His function is quite clever. It uses a fixed $index variable so problematic characters can be removed (vowels for instance, or to avoid O and 0 confusion). It also has an option to obfuscate ids so that they are not easily guessable.
Try this: http://php.net/manual/en/function.uniqid.php
uniqid — Generate a unique ID...
Gets a prefixed unique identifier based on the current time in microseconds.
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.
Warning
This function does not guarantee uniqueness of return value. Since most systems adjust system clock by NTP or like, system time is changed constantly. Therefore, it is possible that this function does not return unique ID for the process/thread. Use more_entropy to increase likelihood of uniqueness...
Base62- or Base64-encode your primary key's value, then store it in another field.
Example: Base62 for primary key 12443 = 3eH
This saves some space, which is why I'm sure YouTube is using it.
Doing a Base62 (A-Za-z0-9) encode on your PK or unique identifier avoids the overhead of having to check whether the key already exists :)
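A minimal sketch of such an encoder and decoder. The function names are made up, and the digit order 0-9, a-z, A-Z is an assumption; it happens to be the order that reproduces the 12443 = 3eH example.

```php
<?php
// Assumed digit order: 0-9, then a-z, then A-Z.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function base62Encode(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported');
    }
    if ($n === 0) {
        return '0';
    }
    $out = '';
    while ($n > 0) {
        // Prepend the least significant digit, then shift right by one base-62 digit.
        $out = BASE62_ALPHABET[$n % 62] . $out;
        $n = intdiv($n, 62);
    }
    return $out;
}

function base62Decode(string $s): int
{
    $n = 0;
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $digit = strpos(BASE62_ALPHABET, $s[$i]);
        if ($digit === false) {
            throw new InvalidArgumentException('Invalid Base62 character');
        }
        $n = $n * 62 + $digit;
    }
    return $n;
}

echo base62Encode(12443), PHP_EOL;   // 3eH
echo base62Decode('3eH'), PHP_EOL;   // 12443
```

Because the mapping is one-to-one, the short string never needs its own uniqueness check; it is exactly as unique as the primary key.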
I had a similar issue - I had primary id's in the database, but I did not want to expose them to the user - it would've been much better to show some sort of a hash instead. So, I wrote hashids.
Documentation: http://www.hashids.org/php/
Source: https://github.com/ivanakimov/hashids.php
Hashes created with this class are unique and decryptable. You can provide a custom salt value, so others cannot decrypt your hashes (not that it's a big problem, but still a "good-to-have").
To encrypt a number you would do this:
require('lib/Hashids/Hashids.php');
$hashids = new Hashids\Hashids('this is my salt');
$hash = $hashids->encrypt(123);
Your $hash would now be: YDx
You can also set minimum hash length as the second parameter to the constructor so your hashes can be longer. Or if you have a complex clustered system you could even encrypt several numbers into one hash:
$hash = $hashids->encrypt(2, 456); /* aXupK */
(for example, if you have a user in cluster 2 and an object with primary id 456) Decryption works the same way:
$numbers = $hashids->decrypt('aXupK');
$numbers would then be: [2, 456].
The good thing about this is you don't even have to store these hashes in the database. You could get the hash from url once request comes in and decrypt it on the fly - and then pull by primary id's from the database (which is obviously an advantage in speed).
Same with output - you could encrypt the id's on the way out, and display the hash to the user.
EDIT:
Changed urls to include both doc website and code source
Changed example code to adjust to the main lib updates (current PHP lib version is 0.3.0 - thanks to all the open-source community for improving the lib)
Auto-incrementing can easily be crawled. These cannot be predicted, and therefore cannot be sequentially crawled.
I suggest going with a double-url format (Similar to the SO URLs):
yoursite.com/video_idkey/url_friendly_video_title
If you require both the ID and the title in the URL, you could then use simple numbers like 0001, 0002, 0003, etc.
Generating these keys can be really simple. You could use the uniqid() function in PHP to generate 13 chars, or 23 with more entropy.
If you want short URLs and predictability is not a concern, you can convert the auto-incrementing ID to a higher base.
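As a minimal sketch of that idea, PHP's built-in base_convert() handles bases up to 36 (digits 0-9, then a-z). The ID value below is an arbitrary example. Note that base_convert() can lose precision on very large numbers, so this suits ordinary auto-increment ranges.

```php
<?php
$id = 12443;                                    // example auto-increment value

// Base 10 -> base 36 shortens the number; base_convert() takes and returns strings.
$short = base_convert((string) $id, 10, 36);    // "9ln"
$back  = (int) base_convert($short, 36, 10);    // 12443

echo $short, ' <-> ', $back, PHP_EOL;
```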
Here is a small function that generates a random key on each call. The chance of it repeating the same ID is very low, though not zero.
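function uniqueKey($limit = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    // Last valid index. The upper bound is inclusive, so using strlen() itself
    // would occasionally read past the end of the string.
    $maxIndex = strlen($characters) - 1;
    $randstring = '';
    for ($i = 0; $i < $limit; $i++) {
        // random_int() (PHP 7+) replaces rand() for better randomness
        $randstring .= $characters[random_int(0, $maxIndex)];
    }
    return $randstring;
}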
source: generate random unique IDs like YouTube or TinyURL in PHP
Consider using something like:
$id = base64_encode(md5(uniqid(),true));
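uniqid will get you a unique identifier. MD5 will diffuse it, giving you a 128-bit result. Base64-encoding that gives you 6 bits per character in a compact identifier, weighing in at 24 characters (22 plus two = padding characters) and hard to guess by inspection. Note that standard Base64 output can contain + and /, which need escaping in URLs, and that uniqid() is derived from the current time, so the result is not cryptographically unpredictable. If you want to be even more paranoid, upgrade from md5 to sha1 or higher.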
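The one-liner above relies on uniqid(), which is time-based. As a variation (a swapped-in technique, not the original suggestion), the sketch below draws 128 bits from random_bytes() and uses the URL-safe Base64 alphabet. The function name randomUrlSafeId() is made up for illustration.

```php
<?php
// Hypothetical helper: random, URL-safe identifier (22 characters for 16 bytes).
function randomUrlSafeId(int $bytes = 16): string
{
    // random_bytes() (PHP 7+) returns cryptographically secure random bytes.
    $b64 = base64_encode(random_bytes($bytes));
    // Swap the two URL-unfriendly characters and drop the "=" padding.
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

$id = randomUrlSafeId();
echo $id, PHP_EOL;
```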
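One way to do it is to use a hash function with a different input every time.
Example (in PHP, since you've tagged the question with it); isUnique() stands for your own lookup, for instance a database query:
$uniqueID = null;
do {
    // PHP concatenates strings with "." (not "+"); microtime() changes between retries
    $uniqueID = sha1($fileName . microtime());
} while (!isUnique($uniqueID));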
There should be a library for PHP to generate these IDs. If not, it's not difficult to implement it.
The advantage is that later you won't have name conflicts when you try to reorganize or merge different server resources. With numeric IDs you would have to change some of them to resolve conflicts, and that would result in URL changes, leading to an SEO hit.
So much of this depends on what you need to do. How 'unique' is unique? Are you serving up the unique IDs, and do they mean something in your DB? If so, a sequential number might be OK.
On the other hand, if you use sequential numbers, someone could systematically steal your content by iterating through the numbers.
There are filesystem commands that will generate unique file names - you could use those.
Or GUID's.
Results of hash functions like SHA-1 or MD5 and GUIDs tend to become very long, which is probably something you don't want. (You've specifically mentioned YouTube as an example: Their identifiers stay relatively short even with the bazillion videos they are hosting.)
This is why you might want to look into converting the numeric IDs you are using behind the scenes into another base when putting them into URLs. Flickr, for example, uses Base58 for its canonical short URLs. Details about this are available here: http://www.flickr.com/groups/api/discuss/72157616713786392/. If you are looking for a generic solution, have a look at the PEAR package Math_Basex.
Please note that even in another base, the IDs can still be predicted from outside of your application.
I don't have a formula but we do this on a project that I'm on. (I can't share it). But we basically generate one character at a time and append the string.
Once we have a completed string, we check it against the database. If there is no other, we go with it. If it is a duplicate, we start the process over. Not very complicated.
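The generate-then-check loop described above could be sketched as follows. This is not the poster's actual code: the function name, the alphabet, and the $exists callback (standing in for the database lookup) are all assumptions, and an in-memory array plays the role of the table.

```php
<?php
// Hypothetical helper: builds a code one character at a time and retries on duplicates.
function generateUniqueCode(callable $exists, int $length = 6): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $max = strlen($alphabet) - 1;

    do {
        $code = '';
        for ($i = 0; $i < $length; $i++) {
            $code .= $alphabet[random_int(0, $max)];
        }
        // Start over if the code is already taken.
    } while ($exists($code));

    return $code;
}

// Stand-in for the database: a set of codes that are already in use.
$taken = ['abc123' => true];
$exists = function (string $code) use ($taken): bool {
    return isset($taken[$code]);
};

$code = generateUniqueCode($exists);
echo $code, PHP_EOL;
```

With a real database, a UNIQUE index on the code column would still be needed to cover two requests generating the same code at the same moment.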
The advantage is, I guess that of a GUID.
This is NOT PHP, but it can be converted to PHP. Since it is JavaScript, it can also run client-side without slowing down the server: generate the ID in the browser and post it to your PHP along with whatever needs a unique ID.
Here is a way to create unique IDs, limited to
9 007 199 254 740 992 unique IDs.
It always returns 9 characters,
where iE2XnNGpF is 9 007 199 254 740 992.
You can encode a large Number and then decode the generated 9-character string,
and it returns the number.
Basically, this function uses a base-62 index with Math.log() and Math.pow() to get the right index based on the number. I would explain more about the function, but I found it some time ago and can't find the site anymore, and it took me a very long time to understand how it works. Anyway, I rewrote the function from scratch, and this one is 2-3 times faster than the one I found.
I looped through 10 million numbers, checking that each one survived the encode/decode round trip; it took 33 seconds with this one and 90 seconds with the other.
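var UID = {
  // 62-character alphabet; 'a' is the zero digit
  ix: 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  // Encode a Number (up to 2^53) as a 9-character string
  enc: function (N) {
    N <= 9007199254740992 || (alert('OMG no more uid\'s'));
    var M = Math, F = M.floor, L = M.log, P = M.pow, r = '', I = UID.ix, l = I.length, i;
    // Emit base-62 digits, most significant first
    for (i = F(L(N) / L(l)); i >= 0; i--) {
      r += I.substr((F(N / P(l, i)) % l), 1);
    }
    // Left-pad with 'a' to 9 characters, then reverse
    return UID.rev(new Array(10 - r.length).join('a') + r);
  },
  // Decode a string produced by enc back into the Number
  dec: function (S) {
    var S = UID.rev(S), r = 0, i, l = S.length, I = UID.ix, j = I.length, P = Math.pow;
    for (i = 0; i <= (l - 1); i++) {
      r += I.indexOf(S.substr(i, 1)) * P(j, (l - 1 - i));
    }
    return r;
  },
  // Reverse a string
  rev: function (a) { return a.split('').reverse().join(''); }
};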
Since I wanted a 9-character string, I also padded the generated string with a's, which stand for 0's.
To encode a number you need to pass a Number and not a string.
var uniqueId=UID.enc(9007199254740992);
To decode the Number again, you need to pass the generated 9-character string.
var id=UID.dec(uniqueId);
Here are some numbers:
console.log(UID.enc(9007199254740992)) // 9 quadrillion (9 million billion)
console.log(UID.enc(1)) //baaaaaaaa
console.log(UID.enc(10)) //kaaaaaaaa
console.log(UID.enc(100)) //Cbaaaaaaa
console.log(UID.enc(1000)) //iqaaaaaaa
console.log(UID.enc(10000)) //sBcaaaaaa
console.log(UID.enc(100000)) //Ua0aaaaaa
console.log(UID.enc(1000000)) //cjmeaaaaa
console.log(UID.enc(10000000)) //u2XFaaaaa
console.log(UID.enc(100000000)) //o9ALgaaaa
console.log(UID.enc(1000000000)) //qGTFfbaaa
console.log(UID.enc(10000000000)) //AOYKUkaaa
console.log(UID.enc(100000000000)) //OjO9jLbaa
console.log(UID.enc(1000000000000)) //eAfM7Braa
console.log(UID.enc(10000000000000)) //EOTK1dQca
console.log(UID.enc(100000000000000)) //2ka938y2a
As you can see, there are a lot of a's, and you don't want that, so just start with a high number.
Let's say your DB ID is 1: just add 100000000000000 so that you have 100000000000001,
and your unique ID looks like a YouTube ID: 3ka938y2a
I don't think it's easy to use up the other 8907199254740992 unique IDs.