A table in a MySQL database has a column for e-mail addresses. Ultimately the e-mail addresses are supposed to be unique and have valid formats. I'm having trouble deciding where the checking should be done and what checking is necessary.
Obviously SQL alone can't entirely validate an e-mail address but I was considering adding the NOT NULL constraint to prevent the submission of blank e-mail addresses. Since each e-mail address must be unique making the e-mail column a unique key seems reasonable, but just because a column is a unique key doesn't make it NOT NULL right? Since I'm probably going to be validating the e-mail address on the server using PHP I could just as well check to see if it's empty there.
A critical piece of information I'm missing is this: does adding a unique key or a constraint make searches faster or slower?
For a column that holds e-mail addresses where there should be no duplicates and no empty strings/nulls etc. should it be made a unique key and/or given a NOT NULL constraint or something else?
I'm very novice with MySQL so code samples would be helpful. I've got phpMyAdmin if it's easier to work with.
For the unique I would use ALTER TABLE user ADD UNIQUE INDEX (`e-mail`);
For the not null I would use ALTER TABLE user MODIFY `e-mail` VARCHAR(254) NOT NULL;
Another idea I had was to insert a row with a null e-mail address and then make the e-mail column unique so no other null e-mail addresses can be inserted.
Adding a unique constraint will actually make searches on that column faster, because it indexes the table by this field. Based on your description of the problem, I think your alter table statements are correct.
Fields with unique indexes can still allow NULLs. NULLs are never equal to anything else, including themselves, so multiple NULLs do not violate the uniqueness constraint. You can, however, disallow NULLs in the field by declaring it NOT NULL.
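Here is a minimal sketch of that behaviour. It is written in Python with the built-in sqlite3 module, used purely as a stand-in so it runs anywhere; SQLite treats NULLs in a UNIQUE column the way described above, but the table names are made up for the example and MySQL's error messages will differ.

```python
import sqlite3

# SQLite is an assumption here: a stand-in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nullable_user (email TEXT UNIQUE)")
conn.execute("CREATE TABLE strict_user (email TEXT NOT NULL UNIQUE)")

# A UNIQUE column accepts several NULLs, because NULL never equals NULL.
conn.execute("INSERT INTO nullable_user (email) VALUES (NULL)")
conn.execute("INSERT INTO nullable_user (email) VALUES (NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM nullable_user WHERE email IS NULL"
).fetchone()[0]

# With NOT NULL added, the same insert is rejected.
try:
    conn.execute("INSERT INTO strict_user (email) VALUES (NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

Keep in mind that NOT NULL does not reject an empty string, so the "is it blank?" check still belongs in the application code.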
A unique key is a normal index on a field that simply doesn't allow multiple instances of a particular value. There will be a slight slowdown on insert/update so the key can be updated, but searches will be faster, because the index can (in some cases) be used to accelerate the search.
The answers so far are good, and I would recommend using UNIQUE KEY and NOT NULL for your application. Using UNIQUE KEY may slow down INSERT or UPDATE, but it would certainly not slow down searches.
However, one thing you should consider is that a UNIQUE KEY only rejects identical strings; it does not necessarily enforce unique e-mail addresses. As an example, abc@gmail.com and a.b.c@gmail.com represent the same e-mail. If you don't want to allow this, you should normalize e-mail addresses in PHP before sending them to your database.
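A rough sketch of what such a normalization step could look like, shown in Python rather than PHP so it can be run as-is. The function name `normalize_email` is hypothetical, the only provider rule it knows is the Gmail dot example from above, and it assumes the address has already passed format validation.

```python
def normalize_email(address: str) -> str:
    """Hypothetical helper: return the canonical form to store in the UNIQUE column."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    if domain == "gmail.com":
        # Assumed rule, taken from the example above: dots in the local part are ignored.
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"
```

For instance, `normalize_email("a.b.c@gmail.com")` and `normalize_email("abc@gmail.com")` both give `abc@gmail.com`, so the unique key then sees them as the same value. Which rules you apply for other providers is a design decision, not something the database can know.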
With MySQL you have to remember that a unique index compares values using the column's collation, which defaults to the collation of the table (in other databases you can instead build the index on an upper() expression).
See this link:
http://sqlfiddle.com/#!2/37386/1
Now, if you use utf8_general_ci instead of utf8_bin, the index creation would fail.
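The same effect can be reproduced locally with a small sketch. This uses Python's sqlite3 module as a stand-in for MySQL (an assumption made for runnability): SQLite's BINARY collation plays the role of utf8_bin and NOCASE plays the role of a case-insensitive collation such as utf8_general_ci. The table contents and index names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
# Two addresses that differ only in letter case.
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("abc@example.com",), ("ABC@example.com",)],
)

def try_unique_index(name: str, collation: str) -> bool:
    """Return True if a unique index using the given collation can be built."""
    try:
        conn.execute(
            f"CREATE UNIQUE INDEX {name} ON users (email COLLATE {collation})"
        )
        return True
    except sqlite3.Error:
        return False

binary_ok = try_unique_index("idx_email_bin", "BINARY")     # byte-wise comparison
nocase_ok = try_unique_index("idx_email_nocase", "NOCASE")  # case-insensitive comparison
```

Under the byte-wise collation the two rows are distinct and the index is built; under the case-insensitive one they collide and the index cannot be created.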
Related
I need to make my table only allow unique values across two columns - a compound key.
Would it be more efficient to do this at database level, or should I not create a compound key and let my application check if a record exists for the two unique values about to be submitted, and if so, do not create the row?
Compound UNIQUE or PRIMARY keys pose little or no extra burden on database servers beyond what the keys' data types pose. (Data types like long strings with case-insensitive collation make for relatively expensive unique keys, for example. BIGINTs are cheap.)
And, if you use compound unique keys, your dbms will serve as a backstop to your application code. If there's a duplicate, you probably don't want to say "Exception: 1062 Duplicate" to your users. You'd be better off saying "that combination is already taken."
So do it in the DBMS. To get this right you need both pieces: the unique key on your table and code in your app that handles the duplicate case.
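As a sketch of the "backstop plus friendly message" idea: the snippet below uses Python with sqlite3 standing in for MySQL, so the exception type is SQLite's rather than MySQL error 1062. The `booking` table and the `add_booking` function are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking ("
    " room TEXT NOT NULL,"
    " slot TEXT NOT NULL,"
    " UNIQUE (room, slot))"  # compound unique key across both columns
)

def add_booking(room: str, slot: str) -> str:
    """Insert a row and translate a duplicate-key failure into a user-facing message."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO booking (room, slot) VALUES (?, ?)", (room, slot)
            )
        return "saved"
    except sqlite3.IntegrityError:
        return "That combination is already taken."
```

The database does the checking; the application only decides how to word the failure.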
Every day I add almost 5000 new records to MySQL and I want to prevent duplicate rows from being inserted into a table. I think I should check the whole database before every insert operation. Is that suitable?
Or is there a better way to do that?
Thanks in advance.
It's a good choice to prevent the data model from being corrupted by software by applying a unique index to the fields that must not contain duplicates.
It's even better to ask the database for duplicate candidates before inserting data.
The best option is to combine both: the constraint in the database model and the check for duplicates in the software layer, because a) error handling is much more expensive than querying and b) the constraint protects the data from human failure.
MySQL supports unique indexes with the CREATE UNIQUE INDEX statement.
e.g: CREATE UNIQUE INDEX IDX_FOO ON BAR(X,Y,Z);
creates a unique index on table BAR. This index will also be used when running the query for duplicates, which speeds up that check considerably.
See MySQL Documentation for more details.
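To make the "both combined" approach concrete, here is a small sketch. It is Python with sqlite3 as a stand-in for MySQL; the table and index mirror the BAR(X,Y,Z) example above, and `insert_if_new` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (x INTEGER, y INTEGER, z INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_foo ON bar (x, y, z)")

def insert_if_new(x: int, y: int, z: int) -> bool:
    """Return True if the row was inserted, False if it already existed."""
    # Step 1: ask the database first (this lookup can use the unique index).
    exists = conn.execute(
        "SELECT 1 FROM bar WHERE x = ? AND y = ? AND z = ?", (x, y, z)
    ).fetchone()
    if exists:
        return False
    # Step 2: insert, with the unique index as a safety net in case another
    # writer added the same row between the check and the insert.
    try:
        conn.execute("INSERT INTO bar (x, y, z) VALUES (?, ?, ?)", (x, y, z))
        return True
    except sqlite3.IntegrityError:
        return False
```

The check keeps the common path free of errors, while the index still guarantees correctness when the check alone would not.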
When you have a data integrity issue, you want the database to enforce the rules (if possible). In your case, you do this with a unique index or unique constraint, which are two names for the same thing. Here is sample syntax:
create unique index idx_table_col1_col2 on table(col1, col2)
You want to do this in the database, for three reasons:
You want the database to know that that column is unique.
You do not want a multi-threaded application to "accidentally" insert duplicate values.
You do not want to put such important checks into the application, where they might "accidentally" be removed.
MySQL then has very useful constructs to deal with duplicates, in particular, insert . . . on duplicate key update, insert ignore, and replace.
When you run SQL queries from your application, you should be checking for errors anyway, so catching duplicate key errors should be no additional burden on the application.
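A rough illustration of two of those constructs. It is sketched in Python with sqlite3 as a stand-in, so the spelling differs slightly: SQLite writes `INSERT OR IGNORE` where MySQL writes `INSERT IGNORE`, while `REPLACE INTO` exists in both. The table and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (email TEXT NOT NULL UNIQUE, name TEXT)")
conn.execute("INSERT INTO user VALUES ('a@example.com', 'first')")

# Skip the duplicate silently instead of raising an error.
conn.execute("INSERT OR IGNORE INTO user VALUES ('a@example.com', 'second')")
name_after_ignore = conn.execute(
    "SELECT name FROM user WHERE email = 'a@example.com'"
).fetchone()[0]

# Remove the conflicting row and insert the new one in its place.
conn.execute("REPLACE INTO user VALUES ('a@example.com', 'third')")
rows_after_replace = conn.execute("SELECT email, name FROM user").fetchall()
```

The ignore variant keeps the existing row untouched, while the replace variant leaves a single row holding the new values. Which one is appropriate depends on whether the old or the new data should win.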
Firstly, for any column that needs to be unique you can use the UNIQUE constraint:
CREATE TABLE IF NOT EXISTS tableName
(id SERIAL, someUniqueColumnName VARCHAR(255) NOT NULL UNIQUE);
See the MySQL Documentation for adding uniqueness to existing columns.
You need to decide what constitutes a duplicate in your table, because uniqueness is not always restricted to a single column. For instance, in a table where you store users together with the id of something else, it may be the combination of the two that has to be unique. For that you can have a PRIMARY KEY which uses two columns:
CREATE TABLE IF NOT EXISTS tableName (
id BIGINT(20) UNSIGNED NOT NULL,
pictureId BIGINT(20) UNSIGNED NOT NULL,
someOtherColumn VARCHAR(12),
PRIMARY KEY(id, pictureId));
$db->query("SELECT * FROM ".DB_PREFIX."users WHERE uid='".$uid_id."' AND login='ExpressCheckoutUser'");
if ($db->moveNext())
{
    $db->assignStr("address1", $_REQUEST['address_street']);
    $db->assignStr("city", $_REQUEST['address_city']);
    $db->assignStr("state", $_REQUEST['address_state']);
    $db->assignStr("fname", $_REQUEST['first_name']);
    $db->assignStr("lname", $_REQUEST['last_name']);
    $db->assignStr("email", $_REQUEST['payer_email']);
    $db->assignStr("country", $country_code);
    $db->assignStr("zip", $_REQUEST['address_zip']);
    $db->update(DB_PREFIX."users", "WHERE uid='".$uid_id."'");
    $db->reset();
}
Every time I make a payment via PayPal, my info is captured in the database, but I want to prevent duplicates. How do I go about it? Or should I check for duplicate emails?
EDIT
As far as I can tell, uid is set to primary by pinnaclecart. So wouldn't it be 'dangerous' to set it to be unique instead?
First and last name are nice, but anything but unique. I know a few people that have the same name I do, so I guess building a unique index on those two columns will only frustrate, not help. The thing that makes me unique though is that I am the only one who has both my e-mail address and password, so I think that would be a better candidate.
ALTER TABLE users ADD UNIQUE unique_emailaddress ( email );
That should at least help with some of the duplicates, but not all: users may have multiple e-mail addresses (I know I do ;)), but it is still better than an arbitrary combination of first and last name, which isn't unique at all.
If all you need is a single UNIQUE column, you can do something like:
ALTER TABLE `users` ADD UNIQUE `lname` (lname);
If you set a column to UNIQUE it will only make that column unique, so if you have two people, one named "John Smith" and another named "Jane Smith", a UNIQUE on lname will cause the second to fail. If you set UNIQUE keys on the first and last name fields separately, then an insert will fail whenever either the first name or the last name is already taken.
You will probably instead wish to add a compound key to enforce uniqueness across multiple fields combined. For this:
ALTER TABLE `users` ADD UNIQUE `unique_key`(fname,lname);
This would force a constraint in the database that would throw an error if you tried to create a duplicate record with the same first and last name.
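The difference between the single-column key and the compound key can be seen in a short sketch. This is Python with sqlite3 standing in for MySQL (an assumption for runnability); `try_insert` is a hypothetical helper that reports whether an insert was accepted.

```python
import sqlite3

def try_insert(conn: sqlite3.Connection, fname: str, lname: str) -> bool:
    """Return True if the row was accepted, False if a unique key rejected it."""
    try:
        conn.execute(
            "INSERT INTO users (fname, lname) VALUES (?, ?)", (fname, lname)
        )
        return True
    except sqlite3.IntegrityError:
        return False

# UNIQUE on lname alone: a second Smith is rejected.
single = sqlite3.connect(":memory:")
single.execute("CREATE TABLE users (fname TEXT, lname TEXT, UNIQUE (lname))")
single_results = [
    try_insert(single, "John", "Smith"),
    try_insert(single, "Jane", "Smith"),
]

# Compound UNIQUE (fname, lname): only an exact repeat of both is rejected.
compound = sqlite3.connect(":memory:")
compound.execute("CREATE TABLE users (fname TEXT, lname TEXT, UNIQUE (fname, lname))")
compound_results = [
    try_insert(compound, "John", "Smith"),
    try_insert(compound, "Jane", "Smith"),
    try_insert(compound, "John", "Smith"),
]
```

With the single-column key the results are accepted then rejected; with the compound key both Smiths are accepted and only the repeated John Smith is rejected.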
You can throw exceptions on error, and handle these higher up in your codebase, or you can instead just see that you have an error and choose to ignore it.
Considering your last edit which says
uid is set to primary by pinnaclecart. So wouldn't it be 'dangerous' to set it to be unique instead?
In this case, don't do that. In fact you don't need to do anything at all: a PRIMARY KEY is UNIQUE by definition, so it cannot be duplicated.
Create a UNIQUE index in the database.
I'm thinking about this: if I make a web site for a specific university's students, should I use standard MySQL IDs (integer, auto increment), or should I derive the IDs from their email addresses? For example, if the student's email address is e12234@university.edu then his/her id is e12234 (on my web site). Is that OK, and what about performance?
Edit: Also, there are such email addresses:
n13345@university.edu (exchange student)
jennajameson@university.edu (this is a professor)
I would strongly recommend a separate, independent value for the id (integer, auto increment). Id values should never change, never be updated. People change their emails all the time, corporations reissue the same email address to new users sometimes.
If an email address is unique and static in your population (and make very sure it is), you may make it a primary key, and actually a full normalization would favor that option. There are however some pitfalls to consider:
People change email addresses once in a while. What if a student becomes a professor, or is harassed at his/her email address and so applies for a new address and gets one? The primary key should not change, ever, so there goes your schema.
Sanitizing email addresses takes a little bit more effort than integers.
Depending on how many foreign keys point to this ID, the needed storage space could increase, and joining on CHARs rather than INTs could suffer in performance (you should test that though).
Generally you'd want to map strings to ids and reference the ID everywhere:
CREATE TABLE `student` (
  `id` int unsigned NOT NULL auto_increment,
  `email` varchar(150) NOT NULL,
  PRIMARY KEY (`id`)
);
This will reduce the size of any table referencing the student table, as it will be using an INT instead of a VARCHAR.
Also if you used part of their email and the user ever changed their email you'd have to go back through every table and update their ID.
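A small sketch of why the separate id pays off when an email changes. It uses Python with sqlite3 as a stand-in for MySQL; the `enrollment` table, the course name and the new address are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " email TEXT NOT NULL UNIQUE)"
)
# Hypothetical child table: it references the student by id, never by email.
conn.execute(
    "CREATE TABLE enrollment ("
    " student_id INTEGER NOT NULL REFERENCES student(id),"
    " course TEXT NOT NULL)"
)

student_id = conn.execute(
    "INSERT INTO student (email) VALUES (?)", ("e12234@university.edu",)
).lastrowid
conn.execute("INSERT INTO enrollment VALUES (?, ?)", (student_id, "Databases"))

# The email changes: one row in one table is updated and the id stays the same.
conn.execute(
    "UPDATE student SET email = ? WHERE id = ?",
    ("new.address@university.edu", student_id),
)

row = conn.execute(
    "SELECT s.email, e.course"
    " FROM enrollment e JOIN student s ON s.id = e.student_id"
).fetchone()
```

The enrollment row never had to be touched, and the join still finds the student under the new address.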
For example, I'm doing the next action:
SELECT COUNT(id)
FROM users
WHERE unique_name = 'Wiliam';
-- if Wiliam doesn't exist then...
INSERT INTO users
SET unique_name = 'Wiliam';
The question is: I'm doing the SELECT COUNT(id) check every time I insert a new user, whether or not I use a unique key. So if "unique_name" has a UNIQUE key, will that be better for performance than using a normal key?
What you mean is a UNIQUE CONSTRAINT on the column which will be updated. Reads will be faster, inserts will be just a bit slower. It will still be faster than your code checking first and then inserting the value though. Just let MySQL do its thing and return an error to you if the value is not unique.
You didn't say what this is for, which would help. If it's part of an authentication system, then why doesn't your query include the user's password as well? If it's not, a unique indexed column used to store names isn't going to work very well in a real-world system unless you are OK with having one and only one Wiliam in your system. (Was that supposed to be William?)
And if that name field is really unique you do not need to use COUNT(ID) in your query. If 'unique_name' is truly unique you either get an id number returned from your query or you get nothing.
You'd want something like this:
SELECT id FROM users WHERE unique_name = 'Wiliam'
No record returned, no Wiliam.
An index (unique or non-unique -- I don't know what you're after here) on unique_name will improve the performance.
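One way to see the effect of the index is to ask the database how it plans to run the query. The sketch below does this in Python with sqlite3 as a stand-in, using SQLite's `EXPLAIN QUERY PLAN` (MySQL has its own `EXPLAIN` with a different output format). The two table names and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_plain (id INTEGER PRIMARY KEY, unique_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE users_indexed (id INTEGER PRIMARY KEY, unique_name TEXT NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX idx_unique_name ON users_indexed (unique_name)")

def plan(sql: str) -> str:
    """Join the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Same lookup against a table without and with an index on unique_name.
plan_plain = plan("SELECT id FROM users_plain WHERE unique_name = 'Wiliam'")
plan_indexed = plan("SELECT id FROM users_indexed WHERE unique_name = 'Wiliam'")
```

Printing the two strings shows the unindexed table being scanned row by row, while the indexed table is searched through `idx_unique_name`.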
Your use of 'unique key' isn't very logical so I suspect you are getting confused about the nomenclature of keys, indexes, their relationships, and the purposes for them.
KEYS in a database are used to create and identify relationships between sets of data. This is what makes the 'relational' possible in a relational database.
Keys come in 2 flavors: Primary and foreign.
PRIMARY KEYS identify each row in a table. The value or values that comprise the key must be unique.
Primary keys can be made from a single column or from several columns (in which case it is called a composite key) that together uniquely identify the row. Again the important thing here is uniqueness.
I use MySQL's auto-increment integer data type for my primary keys.
FOREIGN KEYS identify which rows in a table have a relationship with other rows in other tables. A foreign key of a record in one table is the primary key of the related record in the other table. A foreign key is not unique -- in many-to-many relationships there are by definition multiple records with the same foreign key. They should however be indexed.
INDEXES are used by the database as a sort of short-hand method to quickly look up values, as opposed to scanning the entire table or column for a match. Think of the index in the back of a book. Much easier to find something using a book's index than by flipping through the pages looking for it.
You may also want to index a non-key column for better performance when searching on that column. What column do you use frequently in a WHERE clause? Probably should index it then.
UNIQUE INDEX is an index where all the values in it must be distinct. A column with a unique index will not let you insert a duplicate value, because it would violate the unique constraint. Primary keys are unique indexes. But unique indexes do not have to be primary keys, or even a key.
Hope that helps.
[edited for brevity]
Having a unique constraint is a good thing because it prevents insertion of duplicated entries in case your program is buggy (are you missing a "for update" clause in your select statement?) or in case someone inserts data not using your application.
You should, however, not depend on it in your application for normal operation. Let's assume unique_name is an input field a user can specify. Your application should check whether the name is unique. If it is, insert it. If it is not, tell the user.
It is a bad idea to just try the insert in all cases and see if it was successful: it will create errors in the database server logs that make it more difficult to find real errors. And it will render your current transaction useless, which may be an issue depending on the situation.