I'm building a database and would like to generate two unique IDs for each player, as in the game Clash Royale:
1. A unique ID made of digits only (known only to the user).
2. A unique ID made of digits and letters (visible to all players).
I was thinking of using the time to get the first unique id, and then add a random number, but I think this would create a string that is too long.
Moreover, it still would not guarantee a 100% unique ID.
I'm working with PHP and MySQL
https://secure.php.net/manual/en/function.uniqid.php
It won't return a numeric string, for that I guess you could use microtime plus a sufficiently long random number to limit the chances of collision to virtually nil. But why not use uniqid for both?
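As a rough sketch of the two-ID idea (not a definitive implementation), both values could be generated independently with PHP's random_int(). The function names, the 12-digit and 9-character lengths, and the alphabet are assumptions made for illustration. Neither value is guaranteed to be unique by itself, so each column would still need a UNIQUE index in MySQL, with a retry when an insert hits a duplicate.

```php
<?php
// Hypothetical helper: a private, digits-only ID of fixed length.
function generatePrivateId(int $digits = 12): string
{
    $id = (string) random_int(1, 9);          // first digit is never zero
    for ($i = 1; $i < $digits; $i++) {
        $id .= (string) random_int(0, 9);
    }
    return $id;
}

// Hypothetical helper: a public tag made of digits and upper-case letters.
function generatePublicTag(int $length = 9): string
{
    // Assumed alphabet; look-alike characters (0, O, 1, I) are left out.
    $alphabet = '23456789ABCDEFGHJKLMNPQRSTUVWXYZ';
    $tag = '';
    for ($i = 0; $i < $length; $i++) {
        $tag .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $tag;
}
```

Keeping the length fixed addresses the concern about overly long strings, at the cost of handling the rare duplicate at insert time.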
Related
I am trying to come up with a solution to generate a Unique Id preferably on the Fly. Usage scope could be Order, Product or Plan Id, where there is no security involved.
I don't like idea of generating a random number and then querying the db to check its uniqueness and repeating the process if it is not in this case where security isn't an issue.
I would also rather not use the auto-increment ID directly, since it looks too simple.
My initial thought is to combine the auto-increment ID with a timestamp and convert the decimal value to hex so that it looks like a random string, then finally add a random hex digit as a prefix and another as a suffix.
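function generateUID($num) {
    // Offset the current Unix timestamp by the auto_increment id and hex-encode the sum
    $str = dechex(time() + intval($num));
    // One random hex digit (1-F) for each end of the string
    $prefix = dechex(rand(1, 15));
    $suffix = dechex(rand(1, 15));
    return strtoupper($suffix . $str . $prefix);
}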
Where $num is the auto_increment id
Returns something like E53D42A046
Is this the right way to go about it? Are there collision issues?
Thanks for all responses!
I acknowledge the usefulness of uniqid(), but in this context the auto_increment value needs to play a significant part for the result to be genuinely unique, and I don't see how it would do so with uniqid(). Passing it as a prefix would produce product IDs that vary greatly in length (153d432da861fe, 999999953d432f439bc0).
To expand the scope further: ideally we want a unique code that looks random, has a fairly consistent length, and can be reversed to the auto_increment id from which it was created.
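One way to get a fixed-length, random-looking, reversible code is to multiply the auto_increment id by an odd constant modulo a power of two and undo it with the modular inverse. The sketch below is an illustration under stated assumptions, not the asker's method: it assumes 64-bit PHP, ids below 2^31, and the function names and MULTIPLIER value are arbitrary choices. It only obscures the id; it is not a security measure.

```php
<?php
const MODULUS = 2147483648;     // 2^31, the size of the id space (assumption)
const MULTIPLIER = 1580030173;  // arbitrary odd number below MODULUS (assumption)

// Modular inverse via the extended Euclidean algorithm.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    return (($oldS % $m) + $m) % $m;   // normalise to the range [0, $m)
}

// Scramble the id and render it as 8 upper-case hex characters.
function encodeId(int $id): string
{
    $scrambled = ($id * MULTIPLIER) % MODULUS;
    return strtoupper(str_pad(dechex($scrambled), 8, '0', STR_PAD_LEFT));
}

// Recover the original auto_increment id from the code.
function decodeCode(string $code): int
{
    return (hexdec($code) * modInverse(MULTIPLIER, MODULUS)) % MODULUS;
}
```

Because every odd multiplier is invertible modulo 2^31, distinct ids always map to distinct codes, so no uniqueness check against the database is needed.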
Such a function already exists - uniqid()
http://php.net/manual/en/function.uniqid.php
It works based on the timestamp down to the microsecond - you can add a prefix based on the process ID to further refine it. There are a couple more robust versions out there as well - see PHP function to generate v4 UUID
I am probably thinking about this wrong but here goes.
A computer starts spitting out a gazillion random numbers between 11111111111111111111 and 99999999999999999999, in a linear row:
Sometimes the computer adds a number to one end of the line.
Sometimes the computer adds a number to the other end of the line.
Each number has a number that comes, or will come, before.
Each number has a number that comes, or will come, after.
Not all numbers are unique, many, but not most, are repeated.
The computer never stops spitting out numbers.
As I record all of these numbers, I need to be able to make an educated guess, at any given time:
If this is the second time I have seen a number I must know what number preceded it in line last time.
If it has appeared more than two times, I must know the probability/frequency of numbers preceding it.
If this is the second time I have seen a number, I must also know what number came after it in line last time.
If it has appeared more than two times, I must know the probability/frequency of numbers coming after it.
How the heck do I structure the tables in a MySQL database to store all these numbers? Which engine do I use and why? How do I formulate my queries? I need to know fast, but capacity is also important because when will the thing stop spitting them out?
My ill-conceived plan:
2 Tables:
1. Unique ID/#
2. #/ID/#
My thoughts:
Unique ID's are almost always going to be shorter than the number = faster match.
Numbers repeat = fewer ID rows = faster match initially.
SELECT * FROM table2 WHERE id = (SELECT id FROM table1 WHERE # = ?)
OR:
3 Tables:
1. Unique ID/#
2. #/ID
3. ID/#
My thoughts:
If I only need left/before, or only need after/right, I'm shrinking the size of the second query.
SELECT # FROM table2 (or table3) WHERE id = (SELECT id FROM table1 WHERE # = ?)
OR
1 Table:
1. #/#/#
Thoughts:
Fewer queries = less time.
SELECT * FROM table WHERE col2 = #
I'm lost.... :( Each number has four attributes, that which comes before+frequency and that which comes after+frequency.
Would I be better off thinking of it in that way? If I store and increment frequency in the table, I do away with repetition and thus speed up my queries? I was initially thinking that if I store every occurrence, it would be faster to figure the frequency programmatically.......
Such simple data, but I just don't have the knowledge of how databases function to know which is more efficient.
In light of a recent comment, I would like to add a bit of information about the actual problem: I have a string of indefinite length. I am trying to store a Markov chain frequency table of the various characters, or chunks of characters, in this string.
Given any point in the string I need to know the probability of the next state, and the probability of the previous state.
I am anticipating user input, based on a corpus of text and past user input. A major difference compared to other applications I have seen is that I am going farther down the chain, more states, at a given time and I need the frequency data to provide multiple possibilities.
I hope that clarifies the picture a lot more. I didn't want to get into the nitty gritty of the problem, because in the past I have created questions that are not specific enough to get a specific answer.
This seems maybe a bit better. My primary question with this solution is: Would providing the "key" (first few characters of the state) increase the speed of the system? i.e query for state_key, then query only the results of that query for the full state?
Table 1:
name: state
col1:state_id - unique, auto incrementing
col2:state_key - the first X characters of the state
col3:state - fixed length string or state
Table 2:
name: occurence
col1:state_id_left - non unique key from table 1
col2:state_id_right - non unique key from table 1
col3:frequency - int, incremented every time the two states occur next to each other.
QUERY TO FIND PREVIOUS STATES:
SELECT * FROM occurence WHERE state_id_right = (SELECT state_id FROM state WHERE state_key = ? AND state = ?)
QUERY TO FIND NEXT STATES:
SELECT * FROM occurence WHERE state_id_left = (SELECT state_id FROM state WHERE state_key = ? AND state = ?)
I'm not familiar with Markov Chains but here is an attempt to answer the question. Note: To simplify things, let's call each string of numbers a 'state'.
First of all I imagine a table like this
Table states:
order : integer autonumeric (add an index here)
state_id : integer (add an index here)
state : varchar (?)
order: just use a sequential number (1,2,3,...,n) this will make it easy to search for the previous or next state.
state_id: a unique number associated with the state. As an example, you can use the number 1 to represent the state '1111111111...1' (whatever the length of the sequence is). What's important is that a reoccurrence of a state needs to use the same state_id that was used before. You may be able to derive the state_id from the string itself (maybe by subtracting a number). Of course a state_id only makes sense if the number of possible states fits in a MySQL int field.
state: that is the string of numbers '11111111...1' to '99999999...9' ... I'm guessing this can only be stored as a string but if it fits in an integer/number column you should try it as it may well be that you don't need the state_id
The point of state_id is that searching number is quicker than searching texts, but there will always be trade-offs when it comes to performance ... profile and identify your bottlenecks to make better design decisions.
So, how do you look for a previous occurrence of the state S_i?
"SELECT `order`, state_id, state FROM states WHERE state_id = " and then append get_state_id(S_i), where get_state_id ideally uses a formula to generate a unique id for the state. (Note that order is a reserved word in MySQL, so the column name must be quoted with backticks or renamed.)
Now, with order - 1 or order + 1, you can access the neighboring states by issuing an additional query.
Next we need to track the frequency of different occurrences. You can do that in a different table that could look like this:
Table state_frequencies:
state_id integer (indexed)
occurrences integer
And only add records as you get the numbers.
Finally, you can have tables to track frequency for the neighboring states:
Table prev_state_frequencies (next_state_frequencies is the same):
state_id: integer (indexed)
prev_state_id: integer (indexed)
occurrences: integer
You will be able to infer probabilities (I guess this is what you are trying to do) by comparing the number of occurrences of a state (in state_frequencies) with the number of occurrences of its predecessor state (in prev_state_frequencies).
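A minimal sketch of that counting step, assuming the whole sequence fits in a PHP array (in the database the same counts would live in the prev_state_frequencies and next_state_frequencies tables). The function names and the sample states are made up for illustration.

```php
<?php
// Count how often each state is followed by, and preceded by, each other state.
function countTransitions(array $states): array
{
    $next = [];
    $prev = [];
    for ($i = 0; $i + 1 < count($states); $i++) {
        $left = $states[$i];
        $right = $states[$i + 1];
        $next[$left][$right] = ($next[$left][$right] ?? 0) + 1;
        $prev[$right][$left] = ($prev[$right][$left] ?? 0) + 1;
    }
    return ['next' => $next, 'prev' => $prev];
}

// Turn the raw counts for one state into probabilities that sum to 1.
function toProbabilities(array $counts): array
{
    $total = array_sum($counts);
    return array_map(fn ($count) => $count / $total, $counts);
}

$counts = countTransitions(['a', 'b', 'a', 'c', 'a', 'b']);
$afterA = toProbabilities($counts['next']['a']);   // 'b' => 2/3, 'c' => 1/3
$beforeA = toProbabilities($counts['prev']['a']);  // 'b' => 1/2, 'c' => 1/2
```

Storing one row per pair with an incremented counter, rather than one row per occurrence, keeps the tables small and makes the probability a simple division at query time.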
I'm not sure if I got your problem right but if this makes sense I'm guessing I have.
Hope it helps,
AH
It seems to me that the Markov chain is finite, so I would start by defining the limit of the chain (i.e. 26 characters with x number of spaces to fill); then you can calculate the total number of possible combinations. To determine the probability of a certain arrangement of characters, the math, if I remember correctly, is:
x = ((C)(C))(P)
where
C = the number of possible characters and
P = the total potential outcomes.
This is a ton of data to store, and creating procedures to filter through the data could turn out to be a seemingly endless task.
If you are using an auto-incremented ID in your table, you could query the table and use preg_match to test each new result against the previous results, then insert the new result together with its total number of matches. This would also let you query the preceding rows to see what came before it. That should give you a general idea of the pattern within the results, as well as a base for statistical relevance and new algorithm generation.
After searching SO and other sites, I've failed to come up with conclusive evidence of how Facebook, Twitter and Pinterest generate their IDs. The reason this is needed is to avoid URL collisions. Moving to an entirely different ID will prevent this because there won't be quadrillions of records.
Facebook.com/username/posts/362095193814294
Pinterest.com/pin/62487513549577588
Twitter.com/#!/username/status/17994686627061761
If you look at Pinterest as an example, the first few digits relate to the user id, and the last 6 or so digits represent the save id which possibly could be an auto increment.
To create a similar ID, but not unique I was able to use: base_convert(user_id.save_id, 16, 10). The problem here is that it's not unique, ex: base_convert(15.211, 16, 10) vs. base_convert(152.11, 16, 10). These two are the same. Simply just merging two unique sets of numbers will still produce duplicate results. Throwing uniqid() into the mix will essentially fix the duplicates, but this doesn't seem like a great practice.
Update: Twitter appears to use this: https://github.com/twitter/snowflake
Any suggestions on generating a unique ID like the above examples?
Suppose your IDs are all numeric. Delimit them by a character A (since it surely does not appear in the original IDs) and do a base conversion from base-11 to base-10.
For the example you did we now get different results:
echo base_convert("15A211", 11, 10); //247820
echo base_convert("152A11", 11, 10); //238140
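A small sketch of the delimiter idea as a pair of functions, so that the combined value can also be split back into its parts. The function names are hypothetical. One caveat: the PHP manual warns that base_convert() may lose precision on large numbers, so this is only safe while the combined value stays small.

```php
<?php
// Join two numeric IDs with the delimiter "A" and read the result as base 11.
function combineIds(int $userId, int $saveId): string
{
    return base_convert($userId . 'A' . $saveId, 11, 10);
}

// Reverse the conversion and split on the delimiter again.
function splitIds(string $combined): array
{
    // base_convert() returns lower-case letters, hence 'a'.
    return array_map('intval', explode('a', base_convert($combined, 10, 11)));
}
```

For example, combineIds(15, 211) and combineIds(152, 11) give the two different values shown above, and splitIds() recovers the original pair from either one.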
The Flickr comment up above was very useful. We use sharding as well. We have a bigint (int64) locator field. It is generated by combining an int (int32) database id and an int (int32) identity field.
If you know you will have an int16 number of database max (quite likely), you could combine an int16 (smallint) database id and an int32 (int) user id and an int16 (smallint) action id. I don't know reasonable numbers for your application. But reserve some part for the database id, even if it's just tinyint, so you know you're future safe if you add more databases.
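The int32 + int32 layout can be sketched with bit shifts. This is an illustration only, assuming 64-bit PHP; the function names are hypothetical, and the database id is limited to 31 bits here so that the result stays a positive signed integer that fits a MySQL BIGINT.

```php
<?php
// Pack a database (shard) id and a per-database identity value into one integer.
function packLocator(int $databaseId, int $localId): int
{
    if ($databaseId < 0 || $databaseId > 0x7FFFFFFF) {
        throw new InvalidArgumentException('database id out of range');
    }
    if ($localId < 0 || $localId > 0xFFFFFFFF) {
        throw new InvalidArgumentException('local id out of range');
    }
    return ($databaseId << 32) | $localId;   // high 32 bits: database, low 32 bits: identity
}

// Split a locator back into [database id, local id].
function unpackLocator(int $locator): array
{
    return [$locator >> 32, $locator & 0xFFFFFFFF];
}
```

Because each part has its own fixed bit range, two different pairs can never produce the same locator, which avoids the ambiguity of plain string concatenation.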
Actually, if you look at, for example, the IDs of the users in your Facebook friends list, you'll notice that they are sequential across all users, exactly like an AUTO_INCREMENT database field. However, they probably don't start at 1. My friends list, for example, has some numbers in the millions that then suddenly jump to 1 trillion and something, so my guess is that the auto_increment value was bumped up, perhaps to "hide" exactly how many users there are.
Anyway, to generate unique IDs, just create them sequentially with that AUTO_INCREMENT field. Optionally, set the initial value to something high.
Suppose Table1 contains column orderid (not a key, although it's NOT NULL and unique). It contains 5-digit numbers.
What's the best way to generate a PHP variable $unique_var whose value is not in that column?
What may be important: between 10% and 30% of all 5-digit numbers are already in the table, so generating a number and checking it with while (mysql_num_rows() == 0) {} is probably not the best way to find one. Is there a better-performing solution?
Thank you.
If only 10-30% of the numbers are already taken, then only 10-30% of the attempts will need at least a second query, which is not a big performance issue at all.
Otherwise, create a table listing all 5-digit numbers (just 100k rows) and remove those that already exist. When you need another random number, pick one and delete it.
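To illustrate why the retry approach is cheap: if a fraction p of the numbers is taken, the expected number of attempts is 1 / (1 - p), which is roughly 1.1 to 1.4 for 10-30% occupancy. The sketch below uses an in-memory array of used numbers as a stand-in for the orderid column; the function name is hypothetical, and in practice the check would be a query or a UNIQUE-constraint violation on insert.

```php
<?php
// $used: array whose keys are the numbers already present in the column.
function pickUnusedNumber(array $used): int
{
    do {
        $candidate = random_int(0, 99999);   // any 5-digit number, leading zeros allowed
    } while (isset($used[$candidate]));      // retry only when the number is taken
    return $candidate;
}

// Example: pretend the first 30% of the range is already taken.
$used = array_fill_keys(range(0, 29999), true);
$orderId = sprintf('%05d', pickUnusedNumber($used));
```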
I would suggest finding the biggest number (with the MAX() function) and starting from there.
Here are a couple of suggestions. Each has its drawbacks.
Pre-populate your table and add a column to indicate whether a number is unused. Select an unused number using LIMIT 1 and mark it as used. This uses a lot of space.
Keep a separate table containing previously used numbers. If that table is empty, generate numbers sequentially from the last used number (or from 00001 if Table1 is empty). This requires some extra bookkeeping.
Is there any way I could make my model ID (primary key) generated into random unique 8 digits containing only numbers instead of the default auto increment?
A client requested this specific 8-digits-number-only feature, so I can't argue much about the reasons.
I want to use the PHP uniqid but it's 13 digits and contains alphabets as well.
Any idea?
Thanks.
Update
I forgot to tell that I need the ID randomly generated each new record being saved.
Just want to ask the mechanism on generating the ID and then saving the ID (also the attributes). Do I have to check the database first for the randomly generated ID whether another same key already exists and then save the attributes or what?
Why don't you keep the auto increment on your primary key, but set it to start from 10000000 instead of 1?
ALTER TABLE some_table AUTO_INCREMENT=10000000;
Yes, you can. I assume you are on MySQL, since you mention auto increment. Just do not declare the column as auto increment and insert the value as you would for the other columns. You can create a function or method that produces a number of up to 8 digits, either randomly or in a specific order (algorithm).
INSERT INTO model (id, name, value, etc) VALUES (87654321, 'My selected name', 'some price or text', 'etc');
Consider that an INT column accepts values from -2147483648 to 2147483647 (the 11 in INT(11) is only a display width), which is enough for 8-digit numbers. If the client later requests bigger numbers, you may need to switch to BIGINT.
I usually declare primary keys as UNSIGNED, which allows values between 0 and 4294967295.
For php function - generator of 8 digits:
<?php
mt_srand();
$id = mt_rand(10000000, 99999999);
?>
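You can read more about mt_srand() and mt_rand() in the PHP documentation. Calling mt_srand() is optional, because PHP seeds the generator automatically. Note that a random value alone is not guaranteed to be unique, so the column still needs its primary or unique key, and the insert has to be retried when it hits a duplicate.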
Keep the ID, but pad it.
$id = 6;
$padded_id = sprintf("%013d", $id);
// $padded_id is now "0000000000006"
That'll pad the $id so that it's 13 digits long.
Every time you need to display the ID use a function to convert it, like this.
function padId($id) {
    return sprintf("%013d", $id);
}
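Or you could add a column to your table called pad-id, then run this function when you create a record, using mysql_insert_id to get the ID just inserted (mysqli_insert_id or PDO::lastInsertId in current PHP versions, since the mysql_* functions were removed in PHP 7).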
The best approach depends on a subtle aspect of your client's randomness requirement --
When they say random, do they mean completely unpredictable or just hard to predict? I don't mean to sound like Clinton at the Lewinsky trial, but what your client intends by "random" determines whether it will even be possible for you to meet the requirement.
If the client wants to hide user IDs (for some perceived security benefit) and make them virtually impossible to predict or reverse-engineer, then that is very difficult. If the client would be satisfied with just "hard" to predict (which I suspect), then you can do something simple, similar to the md5 approach (@Dotty). But md5 is not collision resistant. And even with the best, provably unique hash algorithms (which md5 is not), you'll have a collision problem if the number of users is large compared to the number of digits you are allowed for user IDs (8). You have about 27 bits to work with in the 8 decimal digits allowed, which means you're likely to get a collision after about 2^(N/2) = 2^(27/2) IDs (the birthday bound), which is roughly 10K users. So if your client's user list approaches 10K users, even the best hash algorithm will spend a lot of time filtering out collisions.
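The 10K figure can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / (2N)), where n is the number of users and N the number of possible IDs. The helper below is a sketch for that arithmetic only; the function name is made up, and it assumes IDs are drawn uniformly at random.

```php
<?php
// Approximate probability that at least two of $n random IDs collide
// when each is drawn uniformly from $space possible values.
function collisionProbability(int $n, int $space): float
{
    return 1.0 - exp(-$n * ($n - 1) / (2 * $space));
}

// 10,000 users in a space of 10^8 eight-digit IDs: roughly a 39% chance of a collision.
$p = collisionProbability(10000, 100000000);
```

So with hash-style random IDs, a collision check becomes unavoidable well before the ID space is anywhere near full.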
To solve this without filters and nondeterministic algorithms, just use a simple "Full Cycle" algorithm. Some will produce pseudo-random numbers (PRNs) that are guaranteed to be unique and guaranteed to fully span whatever range you're trying to cover (e.g., the set of all 8-digit positive integers). And if you ever need to reverse engineer the user registration sequence just rerun the full cycle PRN generator again with whatever initial value you used. And you can keep this initial value a secret, like a private key, if your client wants to make it slightly more difficult than easy for a hacker to reverse-engineer your user ID sequence.
Another question for your client is whether leading zeros are allowed in the user id. If so, (and the client's randomness requirements are liberal) then the simple Full Cycle algorithm on Wikipedia will work nicely for you. It could be distilled to 2 lines of PHP.
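A minimal sketch in the spirit of the simple additive full-cycle generator, assuming leading zeros are allowed. The function name, the increment and the seed are hypothetical values chosen for illustration; the only requirement is that the increment shares no factor with the range size (for 10^8, that means it is odd and not a multiple of 5). Consecutive IDs differ by a constant, so this only suits a liberal reading of "random".

```php
<?php
// One step of an additive full-cycle generator: every value in [0, $size)
// is visited exactly once per cycle when $increment and $size are coprime.
function fullCycleNext(int $previous, int $increment, int $size): int
{
    return ($previous + $increment) % $size;
}

const ID_SPACE = 100000000;      // 10^8 possible 8-digit IDs
const ID_INCREMENT = 73939133;   // hypothetical; odd and not a multiple of 5

$seed = 12345678;                // hypothetical secret starting value
$id = fullCycleNext($seed, ID_INCREMENT, ID_SPACE);
echo sprintf('%08d', $id), PHP_EOL;
```

Storing only the last issued value is enough to produce the next ID, and rerunning the generator from the seed reproduces the registration order.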
Whatever algorithm you use, it might be good to actually generate the list of official 8-digit semi-random user IDs in a separate table, and then just "pop" the value from the top of the table (deleting that row) whenever you add a new user. The database memory requirements shouldn't be prohibitive and it will streamline the user experience, eliminating any delays and memory gobbling caused by sophisticated, nondeterministic, random number generators and uniqueness filters. Trying to create the user ID online, live, it's conceivable you could get into a perpetual loop with some hash algorithms stalling your user registration indefinitely. And this stall (due to perpetual collision) might not occur until user 1000 or 10000. In contrast, with the offline lookup table approach, you can easily add additional client-prescribed filters like eliminating IDs with leading zeros; in case the client never wants to see a user with the ID 1 (00000001). And you'd know in advance whether everything is going to always work, without any hangs.