I have to create unique codes for each "company" in my database.
The only way I can see to do this is to generate a random number with rand(), check whether that number already exists for this "company" in the DB, and regenerate it if it does.
My question is: is there a better, more efficient way to do this? If I am creating 10 000 codes and there are already 500 000 in the DB, it is going to get progressively slower.
Any ideas or tips on perhaps a better way to do it?
EDIT:
Sorry, perhaps I can explain better. The codes will not all be generated at the same time; they can be created once a day, month, or year, whenever.
Also, I need to be able to define the characters used in the codes, for example alphanumeric or numbers only.
I recommend using a "Universally Unique Identifier" (http://en.wikipedia.org/wiki/Universally_unique_identifier) to generate the random codes for each company. That way you can avoid checking your database for duplicates:
Anyone can create a UUID and use it to identify something with
reasonable confidence that the same identifier will never be
unintentionally created by anyone to identify something else.
Information labeled with UUIDs can therefore be later combined into a
single database without needing to resolve identifier (ID) conflicts.
In PHP you can use the uniqid function for this purpose, although note that it returns a time-based identifier rather than a true UUID: http://es1.php.net/manual/en/function.uniqid.php
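As a rough illustration of the UUID approach (written in Python rather than PHP, purely as a sketch), the snippet below generates one random version-4 UUID per company. The helper name new_company_code is hypothetical and not part of any library.

```python
import uuid


def new_company_code() -> str:
    """Return a random (version 4) UUID as a 36-character string.

    new_company_code is a hypothetical helper name used for illustration.
    """
    return str(uuid.uuid4())


# Generate a batch of codes without consulting the database at all.
codes = [new_company_code() for _ in range(10_000)]
print(codes[0])
print(len(set(codes)) == len(codes))
```

Note that a UUID has a fixed hexadecimal format, so on its own it does not cover the requirement of choosing the character set of the codes.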
MySQL's UUID function should help: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
INSERT INTO `table` (col1, col2) VALUES (UUID(), 'someValue')
If the codes are just integers, use auto-increment, or get the current max value and start incrementing from it.
Related
I'm trying to create a URL similar to youtube's /v=xxx in look and in behavior. In short, users will upload files and be able to access them through that URL. This URL code needs to be some form of the database's primary key so the page can gather the data needed. I'm new to databases and this is more of a database problem than anything.
In my database I have an auto-increment primary key through which file data is accessed. I want to use that number to create the URL for files. I started looking into different hash functions, but I'm worried about collisions. I don't want the same URL for two different files.
I also considered using uniqid() as my primary key CHAR(13), and just use that directly. But with this I'm worried about efficiency. Also looking around I can't seem to find much about it so it's probably a strange idea. Not to mention I would need to test for collisions when ids are generated which can be inefficient. Auto increment is a lot easier.
Is there any good solution to this? Will either of my ideas work? How can I generate a unique URL from an auto incremented primary key and avoid collisions?
I'm leaning toward my second idea. It won't be greatly efficient, but the largest performance drawback comes when things are added to the database (testing for collisions), which for the end user only happens once. The other performance drawback will probably be in looking up chars instead of ints. But I'm mainly worried that it's bad practice.
EDIT:
A simple solution would be to just use the auto-incremented value directly. Call me picky, but that looks kind of ugly.
Generating short hashes that never collide will indeed be a headache. Instead, the slug format used by Stack Overflow is very promising and is guaranteed to produce non-duplicate URLs.
For example, this very same question has
https://stackoverflow.com/questions/11991785/unique-url-from-primary-key
Here, the URL contains the unique primary key and also the title, which makes it more SEO friendly.
However, as noted in the comments, there are a few previously asked questions that might clarify why what you are trying is better avoided:
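A minimal Python sketch of that slug format is shown below. The function names slugify and question_url are made up for illustration; the point is only that uniqueness comes from the primary key, while the title part is decorative.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into a hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def question_url(primary_key: int, title: str) -> str:
    # The primary key alone identifies the row; the slug is only for readability.
    return f"/questions/{primary_key}/{slugify(title)}"


print(question_url(11991785, "Unique URL from primary key"))
# prints /questions/11991785/unique-url-from-primary-key
```

Because the lookup uses only the numeric part, two files with the same title still get different URLs.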
How to generate a unique hash for a URL?
Create Tinyurl style hash
Creating short hashes increases the chance of a collision a lot, so it is better to use a long hash such as sha512 (note that base64 is an encoding, not a hash function).
You can simply make a hash of the time, and afterwards check for that hash (or part of that hash) in your DB.
If you set an index on that field in your DB (and make sure the hash is long enough not to cause a lot of collisions), it won't be an issue at all time-wise.
<?php
// Assumes $db is an open mysqli connection (the mysql_* functions were removed in PHP 7).
$hashChecked = false;
while ($hashChecked === false) {
    // varchar 8 (make sure that is enough with a very big margin)
    $hash = substr(sha1(time() . mt_rand(9999, 99999999)), 0, 8);
    $q = $db->query("SELECT `hash` FROM `tableName` WHERE `hash` = '" . $hash . "'");
    // The hash is usable only if no existing row already has it.
    $hashChecked = $q->num_rows === 0;
}
$db->query("INSERT INTO `tableName` SET `hash` = '" . $hash . "'");
This is fairly straightforward if you're willing to use a random number to generate your short URL. For example, you can do this:
SELECT BASE64_ENCODE(CAST(RAND()*1000000 AS UNSIGNED INTEGER)) AS tag
This is capable of giving you one million different tags. To get more possible tags, increase the value by which the RAND() number is multiplied. These tag values will be hard to predict.
To make sure you don't get duplicates you need to dedupe the tag values. That's easy enough to do but will require logic in your program. Insert the tag values into a table which uses them as a primary key. If your insert fails, try again, reinvoking RAND().
If you get close to your maximum number of tags you'll start having lots of insert failures (tag collisions).
BASE64_ENCODE comes from a stored function you need to install. You can find it here:
http://wi-fizzle.com/downloads/base64.sql
If you're using MySQL 5.6 or higher you can use the built-in TO_BASE64 function.
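The insert-and-retry idea described above could look roughly like the following Python sketch. It uses an in-memory SQLite database in place of MySQL so that it runs on its own, and the table name tags and the helpers random_tag and insert_unique_tag are assumptions made for this example. The alphabet parameter is there so the character set of the tags can be chosen.

```python
import secrets
import sqlite3
import string


def random_tag(length: int, alphabet: str) -> str:
    """Build a random tag from the given alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def insert_unique_tag(conn, length=6,
                      alphabet=string.ascii_letters + string.digits,
                      max_attempts=10):
    """Insert a fresh tag, retrying when the primary key rejects a duplicate."""
    for _ in range(max_attempts):
        tag = random_tag(length, alphabet)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO tags (tag) VALUES (?)", (tag,))
            return tag
        except sqlite3.IntegrityError:
            continue  # tag already taken: draw another one
    raise RuntimeError("no free tag found; the tag space may be nearly full")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (tag TEXT PRIMARY KEY)")

# Deliberately tiny tag space (two digits) so that collisions actually occur.
tags = [insert_unique_tag(conn, length=2, alphabet=string.digits, max_attempts=1000)
        for _ in range(50)]
print(len(set(tags)))
```

The database does the duplicate check as part of the insert, so no separate SELECT is needed and the cost per code stays roughly constant until the tag space starts to fill up.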
I wanted to do something similar (but with articles, not uploaded documents), and came up with something a bit different:
Take a prime number [y] (much) larger than the maximum number [n] of documents there will ever be (e.g. if 25000 is large enough for the total number of documents, 1000099 is a much larger prime number than 25001).
For the current document id [x], compute (x*y) modulus (n+1).
This will generate a number between 1 and n that is never duplicated.
Although the URL may look like a traditional primary key, it does have the slight advantage that each subsequent document will have an id that does not simply follow the previous one. Some people also argue that not exposing the primary key has a very slight security advantage.
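The steps above can be sketched in Python as follows. The function name scramble_id is hypothetical, and the explicit gcd check is an addition of this sketch: the mapping is collision-free whenever y shares no factor with n + 1, which a prime y larger than n + 1 always satisfies.

```python
from math import gcd


def scramble_id(x: int, y: int, n: int) -> int:
    """Map document id x (1..n) to (x * y) mod (n + 1)."""
    if not 1 <= x <= n:
        raise ValueError("x must be between 1 and n")
    # Added safety check: the mapping only avoids duplicates if y and n + 1 are coprime.
    if gcd(y, n + 1) != 1:
        raise ValueError("y must share no factor with n + 1")
    return (x * y) % (n + 1)


N = 25000
Y = 1000099  # multiplier taken from the answer above

public_ids = [scramble_id(x, Y, N) for x in range(1, N + 1)]
print(public_ids[:3])
# Every value from 1 to N appears exactly once.
print(sorted(public_ids) == list(range(1, N + 1)))
```

With these particular values the first outputs are 59, 118 and 177, because 1000099 mod 25001 is 59, so consecutive ids are still evenly spaced until the sequence wraps around. A multiplier whose remainder modulo n + 1 is large gives a less obvious pattern.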
Before you read on, I just want to make something perfectly clear: I'm not looking for someone to code this for me. I just need to know whether this is possible, as I don't have long to spend on this part of the task, and I thought it'd be best to ask the experts on Stack Overflow.
So my question is:
I have a number of questionnaires which will be completed by participants. Since the answers are strings, I was wondering whether it would be possible to store these answers as integers. For example, you'd have:
1 = "Never True", 2 = "Rarely True", 3 = "Sometime true", 4 = "Often True", 5 = "Very Often True". I want to store only the numbers; I was just wondering whether that would be possible.
Thank You.
Finished
I think I had worded the question quite badly before, my bad. However, I did manage to complete that part: I stored in the database the values that I assigned to each of the answers. Also, as @octern mentioned, creating a codebook was very handy, so thank you.
I appreciate all the responses, and your time for dealing with this question.
You can. If your participants are using an HTML form to submit their answers, just set the VALUE attribute of the form element to the numeric code for the answer, and that's what will go in your database.
Just make very, very sure that you create a codebook so you can figure out which number corresponds to which answer in the future! You don't want to rely on parsing the web page, which may have changed over time.
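A codebook can be as simple as a mapping kept alongside the application code. The Python sketch below uses the labels from the question verbatim; the names ANSWER_CODEBOOK, encode_answer and decode_answer are illustrative only.

```python
# Codebook: the single place that records what each stored integer means.
ANSWER_CODEBOOK = {
    1: "Never True",
    2: "Rarely True",
    3: "Sometime true",
    4: "Often True",
    5: "Very Often True",
}

# Reverse lookup, derived from the codebook so the two can never drift apart.
LABEL_TO_CODE = {label: code for code, label in ANSWER_CODEBOOK.items()}


def encode_answer(label: str) -> int:
    """Return the integer to store for a given answer label."""
    return LABEL_TO_CODE[label]


def decode_answer(code: int) -> str:
    """Return the answer label for a stored integer."""
    return ANSWER_CODEBOOK[code]


print(encode_answer("Often True"))
print(decode_answer(2))
```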
Yes, you can store integers in a database. And obviously you can assign your own meaning to those integers.
Yes, you can do that. Alternatively, you can create a table called "possible_answers", for example, with id as the primary key and text as the answer label. The table that stores the answers to a question can then have a foreign key to the possible_answers table to make sure the integer saved is always valid.
There are two basic ways to store the values as integers in a relatively easy way to read and understand.
The simplest way is to store them as an ENUM because the value stored in the row is actually an index to the list of possibilities in the ENUM declaration. The problem with ENUMs is the list of possible answers must be known in advance, and the list needs to remain relatively static unless you like taking outages to change the table's structure each time you want to maintain that list.
The most flexible way is to create a table with an ID column (typically an auto_increment or serial of some sort) and the label used for that number. Then, the original table simply refers to the ID column in the "other" table. This is commonly referred to as a foreign key reference.
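As a sketch of the foreign key approach, the Python example below uses an in-memory SQLite database (not MySQL) so that it is self-contained. The table and column names (possible_answers, responses, answer_id) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE possible_answers (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        participant TEXT NOT NULL,
        answer_id INTEGER NOT NULL REFERENCES possible_answers(id)
    )
""")
conn.executemany(
    "INSERT INTO possible_answers (id, label) VALUES (?, ?)",
    [(1, "Never True"), (2, "Rarely True"), (3, "Sometime true"),
     (4, "Often True"), (5, "Very Often True")],
)

# Only the integer is stored with the response.
conn.execute("INSERT INTO responses (participant, answer_id) VALUES (?, ?)", ("p1", 4))

# A join translates the stored integer back into its label.
row = conn.execute(
    "SELECT r.participant, a.label FROM responses r "
    "JOIN possible_answers a ON a.id = r.answer_id"
).fetchone()
print(row)

# An integer with no matching label is rejected by the foreign key.
try:
    conn.execute("INSERT INTO responses (participant, answer_id) VALUES (?, ?)", ("p2", 9))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

New answer options can be added with a plain INSERT into possible_answers, with no change to the table structure.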
In MySQL, the performance of ENUMs versus foreign key references is nearly identical.
Yes, you can certainly store integers in a database. Keep in mind that if you do this, you will need to "translate" those integers into the actual answers in your application, probably with a switch statement.
I am looking to generate a random number for every user contribution as a title of the contribution.
I could simply query the database each time and generate a number that does not equal any existing entry. But I imagine this is inefficient and could become slow if the database is big. I would also have to hold all the numbers from the database somewhere, in an array or something similar, to handle the "not equal to" check, and that could end up being huge.
Excuse the layman's speech I am new to this.
Any suggestions how this can be solved efficiently without straining the resources too much? You can explain it linguistically and do not have to provide me any scripts, I will figure it out.
You can use uniqid(). I'm not sure how portable it is.
Example:
printf("uniqid(): %s\r\n", uniqid());
Will output something like:
uniqid(): 4b3403665fea6
uniqid() will give you an identifier derived from the current time, so it can technically repeat.
Maybe you can apply a simple algorithm on an auto-increment field? n(n+1)/2 or something?
So I noticed in a script I'm using that the id column in the database I have set up starts at 1728.
Are there any specific benefits to starting a database id number at a large number, or at anything other than 1?
Some people think it looks somewhat cool.
profile.php?id=1728 looks better than profile.php?id=1
But in your case, it's probably just an SQL dump that had AUTO_INCREMENT set to 1728.
Not that I'm aware of, although I've seen it used as a very simple security measure, which prevents the first user in a table of users (typically the admin / creator) from having user ID = 1.
No, there are no benefits. As long as the id is unique, it doesn't matter. Some developers prefer to start ids for some rows higher because it seems to look better in a url. For example, this url:
http://www.example.com/user/profile.php?id=142541
looks better than:
http://www.example.com/user/profile.php?id=1
I don't know of any real reason. Perhaps then people don't guess that your admin user ID is 1.
I have seen tables start with non-1 IDs when using auto-incrementing IDs. Every attempted insert will increment even if the insert fails. In your case the table may have incremented to 1728 while the script was being developed so the first "real" record was 1728.
We can only guess. There are no technical benefits as such. But there may be soft benefits. I can imagine that it's done with the intention of having some reserved IDs for old/previous backup .sql dumps or even default database entries.
I occasionally start IDs (if I really have to expose database-internal numbering in the UI) at larger numbers like 1000, so I get e.g. 4-digit numbers for all IDs. It is not technically necessary, but it may look more consistent.
@Jake Chapman, the reason behind it is that if one sees
profile.php?id=1 or profile.php?id=2 ...
these catch a programmer's attention, and a crafty programmer may try some hacking tricks because they know well what can be done with them.
Numbers like
profile.php?id=1343 or profile.php?id=2543 ...
are somewhat less obvious and do not attract attention as quickly. That is why.
I have an idea. It might be bad, for reasons not known by me, but I would really appreciate your feedback on this!
We've been using session IDs in a PHP project, and for the first time in 4 years a duplicate session ID was generated. Luckily, I randomly decided to go looking through the Customers table because I was bored, noticed there was a duplicate entry in the sessionid column, and changed it and the references to it before any real problems occurred.
Which led (or lead?) me to ask myself this question:
Basically, any type of ID, be it a session id, uuid or guid can eventually be duplicated because the length of them is not infinite. So then I got thinking, because we never want this to happen again.
I thought, What if we use a combination of date and time, as an identifier? This wouldn't be "random", but it would solve the duplication issue. For example:
14th of May 2011 04:36:05PM
If used as an identifier, could be changed to:
14052011163605
The reason this form of ID will never become duplicated is because no date, combined with time in the future will ever be the same as one in the past, or present.
And since, in our case, these ID's are not meant to be seen by anybody, there's no reason for it to be random, is there?
I'd love to hear your thoughts on this, and how you approach situations like this. What's your best method of generating [practically] unique IDs?
The reason this form of ID will never become duplicated is because no date, combined with time in the future will ever be the same as one in the past, or present.
If you only had one user, this would be true.
The space of UUIDs/GUIDs is very large: 128 bits, or roughly 3.4 × 10^38 possible values.
Your date/time solution will fail under high load. What happens when I need 100+ new IDs per second?
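The collision is easy to reproduce. The Python sketch below formats two moments using the DDMMYYYYHHMMSS layout proposed in the question; timestamp_id is a hypothetical helper name. Two requests arriving within the same second receive the same identifier.

```python
from datetime import datetime


def timestamp_id(moment: datetime) -> str:
    """Format a moment as DDMMYYYYHHMMSS, the layout proposed in the question."""
    return moment.strftime("%d%m%Y%H%M%S")


# Two different requests, 0.8 seconds apart but within the same clock second.
first = datetime(2011, 5, 14, 16, 36, 5, 100_000)
second = datetime(2011, 5, 14, 16, 36, 5, 900_000)

print(timestamp_id(first))
# prints 14052011163605
print(timestamp_id(first) == timestamp_id(second))
# prints True
```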
Why not just make your session ID column a unique column, and then generate a new session ID if you get a constraint violation error? That way the database will find this problem for you in basically the same way that you did (and as a bonus, it can fix it too).
Version 1 UUIDs are already generated from timestamps counted in 100-nanosecond intervals (see the Wikipedia article). If you are using PHP, I'd suggest taking a look at this page to see how to generate the different versions, depending on your use:
http://php.net/manual/en/function.uniqid.php
The reason you can't guarantee uniqueness is that you have to eventually pick a size limit for the actual string/variable containing your UUID, so there is always the potential for a duplicate. Given the number of possibilities for a UUID, though, this should be practically impossible.
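To put a rough number on "practically impossible", the Python sketch below applies the usual birthday-bound approximation, assuming identifiers are drawn uniformly at random. The default of 122 bits corresponds to the random portion of a version-4 UUID; the function name is made up for this example.

```python
def approx_collision_probability(num_ids: int, random_bits: int = 122) -> float:
    """Birthday-bound estimate n^2 / (2 * 2^bits); only meaningful while the result is small."""
    return num_ids ** 2 / (2 * 2 ** random_bits)


# Chance of at least one duplicate among a billion random version-4 UUIDs.
print(approx_collision_probability(1_000_000_000))

# The same estimate for a short 32-bit random identifier and 10 000 ids.
print(approx_collision_probability(10_000, random_bits=32))
```

The second figure is above one percent, which shows why short random identifiers still need a uniqueness check in the database.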
I agree with the other posters... this probably shouldn't ever happen. How are you actually generating unique IDs? Are you sure your code is properly creating them?