I'm building a MySQL database with a table that will store lots of rows (say, around 1,000,000).
Each row will have a numeric ID, but I don't want it to be auto-incremented; instead, it has to be generated from a unique string.
For example, a user ABC will create a new element at time 123, so the original string will be "ABC-123". A PHP function will "translate" it to a number.
This way, I'll have the possibility to re-generate the same ID from the same pair of data in future. More or less... see it as a Java hashCode() function.
I've found this function that "translates" a string into a number:
function hashCode($string) {
    return base_convert(substr(md5($string), 0, 16), 16, 10);
}
I have some doubts about it. First, it starts by creating an md5 hash, which is 32 characters long, and then cuts it to 16. That visibly throws data away, so how could the result be a unique hash?
Second, the resulting 16-digit hexadecimal number is converted from base 16 to base 10, so the max value is 18446744073709552046. The MySQL column that will store this number has an UNSIGNED BIGINT datatype, so its maximum value is 18446744073709551615. It's not enough since
18446744073709551615 - 18446744073709552046 = -431
Am I missing something, or is there a better way to do what I need?
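On the second doubt: the PHP manual warns that base_convert() may lose precision on large numbers because it goes through a float internally, which would explain a "maximum" that is larger than 2^64 - 1. A minimal sketch that avoids this, assuming a 64-bit PHP build and that 60 bits of the digest are enough (the name hashCode60 is hypothetical, not from the question):

```php
<?php
// Hypothetical helper: derive a repeatable integer ID from a string.
// Keeps 15 hex digits (60 bits) of the MD5 digest so the value always
// fits in a signed 64-bit PHP integer and in a MySQL BIGINT column.
function hashCode60(string $string): int
{
    $hex = substr(md5($string), 0, 15);
    // hexdec() returns an int as long as the value fits the platform integer.
    return hexdec($hex);
}

$id = hashCode60("ABC-123");
```

Truncating the digest never makes the IDs unique. With around 1,000,000 rows spread over 2^60 values a collision is very unlikely, but a UNIQUE index on the column is still the only guarantee.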
I'm using Murmurhash3 to create unique hashes for text entries. When text entries are created, I'm using this PHP implementation, which returns a 32-bit hash integer, to get the hash value. The hash is stored in a BINARY(16) database column. I also need to update our existing database, so I'm using this MySQL implementation to do that. To match the PHP-created hash, I'm base-converting the result and lower-casing it.
UPDATE column SET hash=LOWER(CONV(murmur_hash_v3(CONCAT(column1, column2), 0), 10, 32));
It matches the PHP version about 80% of the time, which obviously isn't going to cut it. For example, hashing the string 'engtest' creates 15d15m in PHP and 3uqiuqa in MySQL. However, the string 'engtest sentence' creates the same hash in both. What could I be doing wrong?
Figured it out. PHP's integer type is signed, and occasionally Murmurhash was producing negative hash values that didn't match the always-positive MySQL values. The solution was to format PHP's hash value using sprintf with the format set to "%u" before the base conversion.
$hash = murmurhash3_int($text);
return base_convert(sprintf("%u", $hash), 10, 32);
See the php crc32 docs for more info.
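For anyone wanting to check the fix against the values in the question, here is a small sketch. It assumes a 64-bit PHP build, where a negative 32-bit hash has to be masked to 32 bits before "%u" gives the value MySQL sees, and it assumes that 15d15m was the magnitude of a negative hash whose sign base_convert dropped. The helper name unsignedBase32 is made up for illustration:

```php
<?php
// Hypothetical helper: render a possibly negative 32-bit hash as the
// base-32 string of its unsigned value, to line up with
// LOWER(CONV(x, 10, 32)) on the MySQL side.
function unsignedBase32(int $hash): string
{
    // Mask to 32 bits so a negative value maps to its unsigned
    // equivalent (assumes a 64-bit PHP build).
    $unsigned = $hash & 0xFFFFFFFF;
    return base_convert(sprintf("%u", $unsigned), 10, 32);
}

// The 'engtest' case: treat 15d15m as the magnitude of a negative hash.
$negative = -intval("15d15m", 32);
echo unsignedBase32($negative), "\n"; // 3uqiuqa
```

The two strings from the question add up to exactly 2^32 in base 32, which is what you would expect if one side was signed and the other unsigned.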
How to generate unique numeric value with fixed length from given data in PHP? For instance, I can have a string that contains numbers and characters and I need to generate unique numeric value with length 6. Thanks!
No algorithm can generate a guaranteed-unique numeric value of fixed length from arbitrary input. That is inherent in mapping a large input space onto a smaller pseudorandom output. If you have an input string of 20 characters and an output of only 6, there will be repeated results, because:
input of 20 characters (assuming 58 alphanumerical possibilities):
58^20 = 1.8559226468222606056912232424512e+35 possibilities
output of 6 characters (assuming 10 numerical possibilities):
10^6 = 1000000 possibilities
So, to sum up, you won't be able to generate a unique number out of a string. Your best chances are to use a hashing function like md5 or sha1. They are alphanumerical but you can always convert them into numbers. However, once you crop them to, let's say, 6 digits, their chances to be repeated increase a lot.
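To put a rough number on "increase a lot", the usual birthday-bound approximation can be used. This is only a sketch, assuming the cropped hashes are uniformly distributed over the 10^6 possible outputs; the function name is hypothetical:

```php
<?php
// Birthday-bound approximation: probability that at least two of $n
// uniformly distributed values collide in a space of $space outputs.
function collisionProbability(int $n, float $space): float
{
    return 1.0 - exp(-($n * ($n - 1)) / (2.0 * $space));
}

printf("%.3f\n", collisionProbability(1000, 1e6)); // 0.393
```

So under that assumption, after only about 1000 inputs there is already roughly a 39% chance that two of them share the same 6-digit value.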
Unfortunately, it is impossible to generate a completely unique value from an arbitrary value when the number of characters is limited. There are an infinite number of possible inputs, while a numeric value of length 6 has only 1,000,000 possible values (000000 to 999999).
In PHP however you can do the following:
$list_of_numeric_values = array();
foreach ($list_of_given_values as $value)
{
    if (!in_array($value, $list_of_numeric_values))
        $list_of_numeric_values[] = $value;
}
Once this is complete, each distinct value has its own unique numeric key in the array, which you can use as its ID.
If you don't need to calculate these all at the same time, you can follow a similar algorithm where, instead of searching the array in PHP, you run a SELECT on a MySQL table to see whether the entry already exists, and use the auto-increment primary key as your value.
I'm making an anonymous commenting system for my blog. I need each user to get a username picked from an array I have made, which holds 600 usernames. I can't just pick it at random, because then people wouldn't know whether the same person was posting a reply. So I have given each post a randomly generated key between 1 and 9999, and I want to do some sort of calculation on that key and the user's ID so that the resulting number stays consistent throughout that particular post. The result has to be within 1-600.
something like:
user_id x foo(1-9999) = bar(1-600)
Thanks.
What you're probably looking for is a hash function. To quote Wikipedia:
A hash function is any algorithm or subroutine that maps large data sets of variable length, called keys, to smaller data sets of a fixed length.
So you can use a standard hash function, plus modular arithmetic to further map the output of that hash function to your username range, like so:
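function anonymise($username, $post_key, array $usernames) {
    // adler32 yields 8 hex digits, so the value fits in an integer
    $hash = hash("adler32", "$username/$post_key");
    $hash_decimal = hexdec($hash);
    // map the hash onto a valid index of the username list
    $anonymised_id = $hash_decimal % count($usernames);
    return $usernames[$anonymised_id];
}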
So, what you really want is a unique identifier for every poster?
Why not use http://php.net/ip2long modded 600?
Of course, you'll have to do some collision detection with that too.
You can try using md5 on the concatenated ID and post key. It gives you a consistent 32-character hash (128 bits) of that input. It is a hexadecimal string, so you can convert it, or a prefix short enough to fit in an integer, to a number with a hex-to-int conversion.
Edit: Based on your feedback, you can take the generated int modulo 600.
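A minimal sketch of the md5 plus modulo idea, assuming integer user IDs and post keys. The function name, the "/" separator and the 8-digit prefix are choices made here for illustration, not part of the answer:

```php
<?php
// Sketch: map a (user ID, post key) pair to a stable index in 0..599.
// Only the first 8 hex digits (32 bits) of the MD5 digest are used so
// that hexdec() returns an exact integer on a 64-bit build.
function usernameIndex(int $userId, int $postKey, int $poolSize = 600): int
{
    $prefix = substr(md5($userId . "/" . $postKey), 0, 8);
    return hexdec($prefix) % $poolSize;
}

$index = usernameIndex(42, 1234);
```

The index is zero-based, so it can be used directly on a 600-element array; add 1 if you need a value within 1-600. Two different users can still land on the same name within one post, since 600 names cannot cover every user.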
I'm generating a random 10-character string with PHP and inserting it into a column in my DB. My issue is that I want this string to be unique (in the database). I've thought of many ways of doing this, but I'm wondering which way is the most efficient.
My PHP looks like this (the random string can only be 10 chars long):
//generates an almost unique(not quite) ticket number
$pretrimmedtask = md5(uniqid(mt_rand(),true));
$tasknum = substr($pretrimmedtask ,0,10);
I then take this "unique" value and insert it. But because of the trim of the string, this value is by no means unique. I'm wondering what is the best way of making sure this value could never be duplicated, while still being efficient.
(I understand that querying the db to look for this value in there is possible... but I would rather do it in a more elegant fashion)
You should update your table and make the relevant column a UNIQUE KEY, then try to insert the generated string. If no rows were inserted, generate another key and try again.
ALTER TABLE table_name ADD UNIQUE KEY (column_name);
The code below will try to INSERT a new row into table1; if it is unable to, it will try again with a different randomly generated $key.
I.e., the query will not succeed if col2 has a unique key constraint and the value of $key already exists in the column.
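function generate_random_string () {
    // letters and digits; shuffling means no character repeats
    $charset = array_merge (
        range ('a', 'z'), range ('A','Z'), range ('0','9')
    );
    shuffle ($charset);
    // take 10 characters, the length the question asks for
    return join ('', array_slice ($charset, 0, 10));
}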
/* ....................................................... */
do {
$key = generate_random_string ();
} while (
!mysql_query ("INSERT INTO table1 (col1,col2) VALUES (123, '$key')")
);
You can of course use your own algorithm for generating random strings.
NOTE: Make sure that the query can potentially succeed so that you don't get caught in an endless loop.
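One way to guard against the endless loop is to cap the number of attempts and to retry only on duplicate-key errors. The following is a sketch rather than a drop-in replacement: it assumes a PDO connection in exception mode instead of mysql_query, reuses the table and column names from above, and the function name and $maxAttempts parameter are made up for illustration.

```php
<?php
// Sketch: insert with a freshly generated key, retrying on duplicates.
// $generateKey is any callable returning a candidate string.
// Gives up after $maxAttempts tries instead of looping forever.
function insertWithUniqueKey(PDO $pdo, callable $generateKey, int $maxAttempts = 5): string
{
    $stmt = $pdo->prepare("INSERT INTO table1 (col1, col2) VALUES (123, ?)");
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $key = $generateKey();
        try {
            $stmt->execute([$key]);
            return $key; // inserted, so the key is unique in col2
        } catch (PDOException $e) {
            // SQLSTATE 23000 is an integrity constraint violation
            // (here: duplicate key); anything else is a real error.
            if ($e->getCode() != "23000") {
                throw $e;
            }
        }
    }
    throw new RuntimeException("No unique key found after $maxAttempts attempts");
}
```

It would be called as insertWithUniqueKey($pdo, 'generate_random_string'). Using a prepared statement also keeps the key out of the SQL string itself.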
Create unique mysql index that covers only that field and insert the value in a loop until success.
Like:
while (true) {
$random = generate it;
try to insert;
if (inserted without errors) break;
}
Does it have to be 10 characters? With crypt() you can generate 13-character hashes.
At http://www.php.net/manual/en/function.hash.php you can check the output length of the different hashing methods. Unfortunately, none of them produces a string exactly 10 characters long, but you can generate an 8-character string and add two characters.
Another possible solution that I came up with uses the date and time at which the string is generated. A Unix timestamp contains only digits and is too long, but we can convert the date into a 10-character string in the following manner.
Create two arrays. The first has keys from 1 to 31 and assigns one character to each key (26 letters plus 10 digits will do the trick); the second has keys from 0 to 99 and two-character strings as values.
Now take the day, month, year (2 digits), hour, minute and second of the current time and replace each value with its entry from the arrays, where day and month use the first array and the rest use the second. Combine them and you have a 10-character string that is unique as long as no two strings are generated within the same second.
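The scheme above can be sketched roughly as follows. The concrete lookup tables are assumptions made here: a-z followed by 0-9 for the one-character table, and the zero-padded number itself for the two-character table. The function name dateKey is hypothetical.

```php
<?php
// Sketch of the date-based scheme: day and month map to one character
// each, and year, hour, minute and second to two characters each.
function dateKey(DateTimeInterface $when): string
{
    // First lookup table: index 0..35 -> one character (a-z, then 0-9).
    $single = array_merge(range('a', 'z'), range('0', '9'));
    // Second lookup table is assumed to be the zero-padded number itself.
    $pair = fn(int $n): string => sprintf("%02d", $n);

    return $single[(int) $when->format("j")]   // day of month, 1..31
         . $single[(int) $when->format("n")]   // month, 1..12
         . $pair((int) $when->format("y"))     // two-digit year
         . $pair((int) $when->format("G"))     // hour, 0..23
         . $pair((int) $when->format("i"))     // minute
         . $pair((int) $when->format("s"));    // second
}

echo dateKey(new DateTimeImmutable("2012-03-05 14:07:09")), "\n"; // fd12140709
```

Two calls within the same second return the same string, so a UNIQUE index on the column is still worth having.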
I had this issue myself.
I take time() at insert and the insert/row ID (plus an array of special letters), join them into one string and md5 the whole thing, for some cookie purposes, let's say. So that key is exposed.
The insert (or row) ID cannot be duplicated, and merging it with the Unix timestamp (10 digits) plus random letters and md5-ing it all together creates a practically unique "second key" that is somewhat harder to break than what is available via cookies. In this case it is impossible to break.
But it is a hash.
If 10 chars is essential (I can't find a reason for it to be), you may create a function that builds keys like (99999999999999999999 - primary key) + substr 10, with letters included as well, but that depends on how exposed that key is.
However, substr is not an option, and the role of the primary key is simply essential.
Hi, I have a column in my database which is set to INT.
But my data always starts with a 0, so whenever I add a new record the 0 is stripped off. I don't want that to happen, in case the first character has to be a 1 at some point.
How can I overcome this issue?
Is the best way to use VARCHAR and then validate using PHP?
Update
If I enter 02118272 it is saved to the database as 2118272.
The integer 7 is the same thing as the integer 000000000000000000000000007. They're both... ya know... seven. When stored in the database, it's actually stored as something like 000000000000000000000000007 but most MySQL clients won't bother to show all those zeros.
If it's important in your application to show these numbers using a certain number of digits, you can add back as many leading zeros as you want using the str_pad() function:
str_pad($your_string, 10, '0', STR_PAD_LEFT);
Here, 10 is the length you want the string to be, '0' is the character that will get added on in order to make it that length, and STR_PAD_LEFT says to add characters to the left-hand side of the string.
If, on the other hand, the number '007' is fundamentally different than the number '7', then you will have to use a VARCHAR() field to store it. Those are no longer integers; they're strings with very different meanings.
What you should be storing in your database is data. Formatting of that data is the responsibility of applications, not the database itself.
I would store it as an integer and, if you need it shown as 7 digits with leading zeros, the right place to do that is in your application, after extracting the data.
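As a small illustration of formatting on the application side, using the value from the question's update and assuming a display width of 8 digits:

```php
<?php
// The value as it comes back from an INT column.
$stored = 2118272;

// Pad to 8 digits for display; both calls give the same result.
$padded = str_pad((string) $stored, 8, "0", STR_PAD_LEFT);
$formatted = sprintf("%08d", $stored);

echo $padded, "\n"; // 02118272
```

This only works if every value has the same fixed width; if 02118272 and 2118272 must be told apart, the column has to hold a string.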
I think that you should use the VARCHAR type for that field. If you want to make sure a variable contains only digits in PHP, you can simply do this:
$var=preg_replace("/[^\d]/","",$var);
This deletes all characters that aren't digits; you can then put the value into the db as a string, preserving the initial 0.
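Wrapped in a function, that could look like the sketch below. The name normaliseCode and the sample input are made up for illustration; \D is shorthand for the same [^\d] character class.

```php
<?php
// Sketch: strip everything except digits and keep the result as a
// string, so leading zeros survive when it is stored in a VARCHAR.
function normaliseCode(string $input): string
{
    return preg_replace('/\D/', '', $input);
}

echo normaliseCode(" 0211-8272 "), "\n"; // 02118272
```

The important part is never casting the result to int, since that is what drops the leading 0.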