I need my table to only allow unique combinations of values across two columns, i.e. a compound key.
Would it be more efficient to do this at the database level, or should I skip the compound key and have my application check whether a record already exists for the two values about to be submitted, and if so, not create the row?
Compound UNIQUE or PRIMARY keys pose little or no extra burden on database servers beyond what the keys' data types pose. (Data types like long strings with case-insensitive collation make for relatively expensive unique keys, for example. BIGINTs are cheap.)
And, if you use compound unique keys, your dbms will serve as a backstop to your application code. If there's a duplicate, you probably don't want to say "Exception: 1062 Duplicate" to your users. You'd be better off saying "that combination is already taken."
So: in the DBMS, go for it. To get this right you need both the unique key in your table and the check in your application code.
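For illustration, a minimal sketch of such a compound unique key (the table and column names are hypothetical):

CREATE TABLE registration (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT UNSIGNED NOT NULL,
  event_id BIGINT UNSIGNED NOT NULL,
  UNIQUE KEY uq_user_event (user_id, event_id)  -- the compound unique key
);
-- A second INSERT with the same (user_id, event_id) pair fails with
-- error 1062, which the app can translate into "that combination is already taken."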
Every day I add almost 5,000 new records to a MySQL table, and I want to prevent inserting duplicate rows. I think I should check the whole table before any insert operation. Is that suitable?
Or is there a better way to do that?
Thanks in advance.
It's a good choice to prevent the data model from being corrupted by software: apply a unique index to the attributes that must not contain duplicates.
It's even better to ask the database for duplicate candidates before inserting data.
The best is to have both combined: the safety net on the database model and the duplicate check in the software layer, because a) error handling is much more expensive than querying, and b) the constraint protects the data from human error.
MySQL supports unique indexes with the CREATE UNIQUE INDEX statement, e.g.:
CREATE UNIQUE INDEX IDX_FOO ON BAR(X, Y, Z);
creates a unique index on table BAR. This index will also be used when running the query for duplicates, which speeds up that check considerably.
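A sketch of the combined approach, assuming the BAR table from the example and application-side placeholders (?):

-- Ask for duplicate candidates first (fast, thanks to IDX_FOO):
SELECT 1 FROM BAR WHERE X = ? AND Y = ? AND Z = ? LIMIT 1;
-- If no row comes back, insert; the unique index is the backstop
-- in case another client inserted the same tuple in between:
INSERT INTO BAR (X, Y, Z) VALUES (?, ?, ?);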
See MySQL Documentation for more details.
When you have a data integrity issue, you want the database to enforce the rules (if possible). In your case, you do this with a unique index or unique constraint, which are two names for the same thing. Here is sample syntax:
create unique index idx_table_col1_col2 on table(col1, col2)
You want to do this in the database, for three reasons:
You want the database itself to know that the column (or combination of columns) is unique.
You do not want a multi-threaded application to "accidentally" insert duplicate values.
You do not want to put such important checks into the application, where they might "accidentally" be removed.
MySQL then has very useful constructs to deal with duplicates, in particular INSERT ... ON DUPLICATE KEY UPDATE, INSERT IGNORE, and REPLACE.
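Hedged sketches of those three constructs, assuming a hypothetical table t with a unique index on (col1, col2) and a hits counter column:

INSERT INTO t (col1, col2, hits) VALUES (1, 2, 1)
  ON DUPLICATE KEY UPDATE hits = hits + 1;               -- update the existing row instead of failing
INSERT IGNORE INTO t (col1, col2, hits) VALUES (1, 2, 1); -- silently skip the duplicate
REPLACE INTO t (col1, col2, hits) VALUES (1, 2, 1);       -- delete the old row, then insert the new one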
When you run SQL queries from your application, you should be checking for errors anyway, so catching duplicate key errors should be no additional burden on the application.
Firstly, for any single column that needs to be unique, you can use the UNIQUE constraint:
CREATE TABLE IF NOT EXISTS tableName (
  id SERIAL,
  someUniqueColumnName VARCHAR(255) NOT NULL UNIQUE
);
See the MySQL Documentation for adding uniqueness to existing columns.
You need to decide what constitutes a duplicate in your table, because uniqueness is not always restricted to a single column. For instance, in a table where you store users together with a corresponding id for something else, it may be the combination of the two that has to be unique. For that you can have a PRIMARY KEY that spans two columns:
CREATE TABLE IF NOT EXISTS tableName (
id BIGINT(20) UNSIGNED NOT NULL,
pictureId BIGINT(20) UNSIGNED NOT NULL,
someOtherColumn VARCHAR(12),
PRIMARY KEY(id, pictureId));
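To illustrate what the compound key buys you (values made up):

INSERT INTO tableName (id, pictureId, someOtherColumn) VALUES (1, 10, 'first');   -- OK
INSERT INTO tableName (id, pictureId, someOtherColumn) VALUES (1, 20, 'second');  -- OK, different pair
INSERT INTO tableName (id, pictureId, someOtherColumn) VALUES (1, 10, 'third');   -- fails: duplicate (id, pictureId)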
I have a website connected to a database. In one of its tables, one entity attribute that is not the primary key needs to be unique in that table.
Currently, I am querying the database before inserting a value into that column to check if the value already exists. If it does, my script alters the value and the procedure repeats until no result comes back, meaning the value doesn't exist in the database yet.
While this works, I feel it's a great performance hog: even when the value is unique, the database needs to be queried at least twice, once for checking and once for writing.
To improve performance and to make my (possibly buggy/unnecessary) code obsolete, my idea is to mark the column as a unique key and use a try/catch block for the writing/error-handling process. That way, the database engine handles the uniqueness, which seems more reasonable than my query-then-write procedure.
Is this a good idea or are Unique Keys not made for this behavior? What is the typical use case of a Unique Key in a SQL database?
INSERT INTO table (uniquerow) VALUES(1) ON DUPLICATE KEY UPDATE uniquerow = 1;
With this statement, you can insert if the value is unique and update if the key already exists.
With a unique constraint you can ensure that a tuple of values cannot occur more than once, without making those columns the primary key.
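For the statement above to work, the column needs a unique index behind it; a sketch of that setup (the table name is a placeholder, and the two-column variant is hypothetical):

ALTER TABLE `table` ADD UNIQUE (uniquerow);
-- The same works for a tuple of columns that together must be unique:
ALTER TABLE `table` ADD UNIQUE (col_a, col_b);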
A table in a MySQL database has a column for e-mail addresses. Ultimately the e-mail addresses are supposed to be unique and have valid formats. I'm having trouble deciding where the checking should be done and what checking is necessary.
Obviously SQL alone can't entirely validate an e-mail address, but I was considering adding a NOT NULL constraint to prevent the submission of blank e-mail addresses. Since each e-mail address must be unique, making the e-mail column a unique key seems reasonable, but just because a column is a unique key doesn't make it NOT NULL, right? Since I'm probably going to be validating the e-mail address on the server using PHP, I could just as well check there whether it's empty.
A critical piece of information I'm missing is: does adding a unique key or a constraint make searches faster or slower?
For a column that holds e-mail addresses where there should be no duplicates and no empty strings/nulls etc. should it be made a unique key and/or given a NOT NULL constraint or something else?
I'm very novice with MySQL so code samples would be helpful. I've got phpMyAdmin if it's easier to work with.
For the uniqueness I would use ALTER TABLE user ADD UNIQUE INDEX (`e-mail`);
For the NOT NULL I would use ALTER TABLE user CHANGE `e-mail` `e-mail` VARCHAR(254) NOT NULL;
Another idea I had was to insert a row with a NULL e-mail address and then make the e-mail column unique so no other NULL e-mail addresses can be inserted.
Adding a unique constraint will actually make searches faster, because it will index the table by this field. Based on your description of the problem, I think your alter table statements are correct.
Fields with unique indexes can still allow nulls. nulls can never be equal to anything else, including themselves, so multiple nulls are not a violation of the uniqueness constraint. You can disallow nulls in the field by specifying it as NOT NULL, however.
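A quick demonstration, assuming a hypothetical user table whose `e-mail` column has the unique index but is still nullable:

INSERT INTO user (`e-mail`) VALUES (NULL);             -- OK
INSERT INTO user (`e-mail`) VALUES (NULL);             -- also OK: NULL never equals NULL
INSERT INTO user (`e-mail`) VALUES ('a@example.com');  -- OK
INSERT INTO user (`e-mail`) VALUES ('a@example.com');  -- fails with a duplicate-key error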
A unique key is a normal field index that simply doesn't allow multiple instances of a particular value. There will be a slight slowdown on insert/update so the index can be maintained, but searches will be faster, because the index can (in some cases) be used to accelerate the search.
The answers so far are good, and I would recommend using UNIQUE KEY and NOT NULL for your application. Using UNIQUE KEY may slow down INSERT or UPDATE, but it would certainly not slow down searches.
However, one thing you should consider is that just because you use UNIQUE KEY, it does not necessarily enforce unique e-mail addresses. As an example, abc@gmail.com and a.b.c@gmail.com represent the same mailbox, because Gmail ignores dots in the local part. If you don't want to allow this, you should normalize e-mail addresses in PHP before sending them to your database.
With MySQL you have to remember that a unique index depends on the collation of the column (in other databases you could build the index on an upper() expression).
See this link:
http://sqlfiddle.com/#!2/37386/1
Now, if you used utf8_general_ci instead of utf8_bin, the index creation would fail, because under a case-insensitive collation, values differing only in case count as duplicates.
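A hypothetical demonstration of the effect (table and column names made up):

CREATE TABLE t (name VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin);
INSERT INTO t VALUES ('abc'), ('ABC');
CREATE UNIQUE INDEX idx_name ON t (name);  -- succeeds: utf8_bin is case-sensitive
-- With COLLATE utf8_general_ci, 'abc' and 'ABC' compare equal,
-- and the CREATE UNIQUE INDEX fails with a duplicate-entry error.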
What is the purpose of a secondary key? Say I have a table that logs all check-ins (similar to Foursquare), with columns id, user_id, location_id, post, time, and there can be millions of rows. Many people have suggested using secondary keys to speed up the process.
Why does this work? And should both user_id and location_id be secondary keys?
I'm using mySQL btw...
Edit: There will be a page that lists/calculates all the check-ins for a particular user, and another page that lists all the users who have checked in to a particular location.
MySQL queries
Type 1
SELECT location_id FROM checkin WHERE user_id = 1234
SELECT user_id FROM checkin WHERE location_id = 4321
Type 2
SELECT COUNT(location_id) as num_users FROM checkin
SELECT COUNT(user_id) as num_checkins FROM checkin
The key (also called index) is for speeding up queries. If you want to see all check-ins for a given user, you need a key on the user_id field. If you want to see all check-ins for a given location, you need an index on the location_id field. You can read more in the MySQL documentation.
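For the checkin table from the question, that would look something like this (the index names are made up):

ALTER TABLE checkin
  ADD INDEX idx_user (user_id),
  ADD INDEX idx_location (location_id);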
I want to comment on your question and your examples.
Let me strongly suggest that, since you are using MySQL, you make sure your tables use the InnoDB engine, for many reasons you can research on your own.
One important feature of InnoDB is referential integrity. What does that mean? In your checkin table, you have a foreign key, user_id, which is the primary key of the user table. With referential integrity (provided you declare the foreign key constraint), MySQL will not let you insert a checkin row with a user_id that doesn't exist in the user table. Using MyISAM, you can. That alone should be enough to make you want to use the InnoDB engine.
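A sketch of declaring that foreign key (the constraint name is made up; both tables must use InnoDB):

ALTER TABLE checkin
  ADD CONSTRAINT fk_checkin_user
  FOREIGN KEY (user_id) REFERENCES user (id);
-- After this, inserting a checkin row whose user_id has no matching
-- row in user fails with a foreign-key constraint error.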
To your question about keys/indexes: essentially, when a table is defined and a key is declared for a column or some combination of columns, MySQL will create an index.
Indexes are essential for performance as a table grows with the insert of rows.
Virtually all relational databases and document databases depend on an implementation of B-tree indexing. What B-trees are very good at is finding an item (or determining its absence) in a predictable number of lookups. So when people talk about the performance of a relational database, the essential building block is the use of B-tree indexes, which are created via KEY clauses or with ALTER TABLE or CREATE INDEX statements.
To understand why this is, imagine that your user table was simply a text file, with one line per row, perhaps separated by commas. As you add a row, a new line in the text file gets added at the bottom.
Eventually you get to the point that you have 10,000 lines in the file.
Now you want to find out if you entered a line for one particular person with the last name of Smith. How can you find that out?
Without any sorting of the file, or a separate index, you have but one option: start at the first line in the file and scan through every line looking for a match. Even if you find a Smith, that might not be the only 'Smith' in the table, so you have to read the entire file from top to bottom every time you want to do this search.
Obviously as the table grows the performance of searching gets worse and worse.
In relational database parlance, this is known as a "table scan". The database has to start at the first row and scan through reading every row until it gets to the end.
Without indexes, relational databases still work, but they are highly dependent on IO performance.
With a Btree index, the rows you want to find are found in the index first. The indexes have a pointer directly to the data you want, so the table no longer needs to be scanned, but instead the individual data pages required are read. This is how a database can maintain adequate performance even when there are millions or 10's or 100's of millions of rows.
To really start to gain insight into how mysql works, you need to get familiar with EXPLAIN EXTENDED ... and start looking at the explain plans for queries. Simple ones like those you've provided will have simple plans that show you how many rows are being examined to get a result and whether or not they are using one or more indexes.
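For example, for the first query from the question:

EXPLAIN EXTENDED
SELECT location_id FROM checkin WHERE user_id = 1234;
-- The `key` column of the output shows which index (if any) is used,
-- and `rows` estimates how many rows will be examined.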
For your summary queries, indexes are not helpful because you are doing a COUNT(). The table will need to be scanned when you have no other criteria constraining the search.
I did notice what looks like a mistake in your summary queries. Just based on your labels, I would think that these are the right queries to get what you would want given your column alias names.
SELECT COUNT(DISTINCT user_id) as num_users FROM checkin
SELECT COUNT(*) as num_checkins FROM checkin
This is yet another reason to use InnoDB, which, when properly configured, has a data cache (the InnoDB buffer pool) similar to other RDBMSs like Oracle and SQL Server. MyISAM doesn't cache data at all, so if you repeatedly run the same sorts of IO-heavy queries, MySQL has to do all that data-reading work over and over. With InnoDB, that data may well be sitting in cache memory, and the result can be returned without going back to storage.
Primary vs Secondary
There really is no such concept internally. A primary key is special because it allows the database to find one single row. Primary keys must be unique, and to reflect that, the associated B-tree index is unique, which simply means that it will not allow two entries with the same key data to exist in the index.
Making an index unique is an excellent tool for maintaining the consistency of your database in many other cases. Let's say you have an employee table with an SS_Number column to store social security numbers. It makes sense to have an index on that column if you want the system to support finding an employee by SS number; without an index, you will table-scan. But you also want that index to be unique, so that once an employee with a given SS# is inserted, there is no way the database will let you enter a duplicate employee with the same SS#.
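In SQL that would be something like the following, using the table and column from the example:

ALTER TABLE employee ADD UNIQUE INDEX idx_ssn (SS_Number);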
But to demystify this for you, when you declare keys these indexes are just being created for you and used automagically in most cases, when you define the tables.
It's when you aren't dealing with keys (primary or foreign), as in the example of usernames, first & last names, SS#s, etc., that you also need to be aware of how to create an index, because you are searching (using WHERE clause criteria) on one or more columns that aren't keys.
For example, I'm doing the following:
SELECT COUNT(id)
FROM users
WHERE unique_name = 'Wiliam';

-- if 'Wiliam' doesn't exist then...
INSERT INTO users
SET unique_name = 'Wiliam';
The question is: I'm doing the SELECT COUNT(id) check every time I insert a new user, regardless of whether there is a unique key or not. So, if unique_name has a UNIQUE key, will it perform better than a normal key?
What you mean is a UNIQUE CONSTRAINT on the column which will be updated. Reads will be faster, Inserts will be just a bit slower. It will still be faster than your code checking first and then inserting the value though. Just let mysql do its thing and return an error to you if the value is not unique.
You didn't say what this is for, which would help. If it's part of an authentication system, then why doesn't your query include the user's password as well? If it's not, a unique indexed column used to store names isn't going to work very well in a real-world system unless you are OK with having one and only one Wiliam in your system. (Was that supposed to be William?)
And if that name field is really unique you do not need to use COUNT(ID) in your query. If 'unique_name' is truly unique you either get an id number returned from your query or you get nothing.
You'd want something like this:
SELECT id FROM users WHERE unique_name = 'Wiliam'
No record returned, no Wiliam.
An index (unique or non-unique -- I don't know what you're after here) on unique_name will improve the performance.
Your use of 'unique key' isn't very logical so I suspect you are getting confused about the nomenclature of keys, indexes, their relationships, and the purposes for them.
KEYS in a database are used to create and identify relationships between sets of data. This is what makes the 'relational' possible in a relational database.
Keys come in 2 flavors: Primary and foreign.
PRIMARY KEYS identify each row in a table. The value or values that comprise the key must be unique.
Primary keys can be made from a single column or from several columns (in which case it is called a composite key) that together uniquely identify the row. Again, the important thing here is uniqueness.
I use MySQL's AUTO_INCREMENT attribute on an integer column for my primary keys.
FOREIGN KEYS identify which rows in a table have a relationship with other rows in other tables. A foreign key of a record in one table is the primary key of the related record in the other table. A foreign key is not unique -- in many-to-many relationships there are by definition multiple records with the same foreign key. They should however be indexed.
INDEXES are used by the database as a sort of short-hand method to quickly look up values, as opposed to scanning the entire table or column for a match. Think of the index in the back of a book. Much easier to find something using a book's index than by flipping through the pages looking for it.
You may also want to index a non-key column for better performance when searching on that column. What column do you use frequently in a WHERE clause? Probably should index it then.
UNIQUE INDEX is an index where all the values in it must be distinct. A column with a unique index will not let you insert a duplicate value, because it would violate the unique constraint. Primary keys are unique indexes. But unique indexes do not have to be primary keys, or even a key.
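A minimal sketch pulling those concepts together (all names are hypothetical):

CREATE TABLE department (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- primary key: uniquely identifies the row
  name VARCHAR(50) NOT NULL UNIQUE             -- unique index on a column that is not a key
) ENGINE=InnoDB;

CREATE TABLE employee (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  department_id INT UNSIGNED NOT NULL,         -- foreign key: not unique, but indexed
  last_name VARCHAR(50) NOT NULL,
  INDEX idx_last_name (last_name),             -- non-unique index for WHERE searches
  FOREIGN KEY (department_id) REFERENCES department (id)
) ENGINE=InnoDB;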
Hope that helps.
Having a unique constraint is a good thing because it prevents the insertion of duplicate entries in case your program is buggy (are you missing a FOR UPDATE clause in your SELECT statement?) or in case someone inserts data without going through your application.
You should, however, not depend on it for normal operation. Let's assume unique_name is an input field a user can specify. Your application should check whether the name is unique. If it is, insert it. If it is not, tell the user.
It is a bad idea to just attempt the insert in all cases and see whether it succeeds: it will create errors in the database server logs that make it more difficult to find real errors, and it will render your current transaction useless, which may be an issue depending on the situation.