Avoiding duplicates in MySQL on multiple column values

I keep finding myself writing queries to avoid inserting when there are duplicates - things like
select * from foobar where bar=barbar and foo=foofoo
and then checking in PHP with mysql_num_rows() to see if the number of results is > 0 to determine whether to go forward with my insert.
EDIT: For instance, let's say a user wants to send an invitation to another user. I want to make sure that I don't add another entry to my invitations table with the same pair of invited_id AND game_id, so this requires some sort of check.
This feels inefficient (and slightly dirty). Is there a better way?

What about a unique index?
A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. For all engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL.
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
EDIT:
A column list of the form (col1,col2,...) creates a multiple-column index. Index values are formed by concatenating the values of the given columns.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html

In this case, create a unique index on (bar, foo), so that the insert fails when the pair of values already exists. You just need to handle the resulting error in PHP. This way the code is cleaner and faster, because the separate SELECT is no longer needed.
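A minimal sketch of what handling that error could look like with PDO, using the invitations example from the question. The table and column names (invitations, invited_id, game_id) come from the question; the function name addInvitation is hypothetical, and the sketch assumes a unique index on (invited_id, game_id) already exists and that PDO is in exception mode.

```php
<?php
/**
 * Try to insert an invitation and report whether a new row was created.
 * Hypothetical helper: assumes a table `invitations` with a unique index
 * on (invited_id, game_id) and a PDO handle using PDO::ERRMODE_EXCEPTION.
 */
function addInvitation(PDO $pdo, int $invitedId, int $gameId): bool
{
    $stmt = $pdo->prepare(
        'INSERT INTO invitations (invited_id, game_id) VALUES (?, ?)'
    );
    try {
        $stmt->execute([$invitedId, $gameId]);
        return true; // a new row was inserted
    } catch (PDOException $e) {
        // SQLSTATE 23000 is "integrity constraint violation"; MySQL uses it
        // for duplicate keys (driver error 1062, see $e->errorInfo[1]).
        // Other constraint failures share this SQLSTATE, so a stricter
        // check on the driver code may be preferable on MySQL.
        if ((string) $e->getCode() === '23000') {
            return false; // the pair already exists
        }
        throw $e; // anything else is a genuine error
    }
}
```

Called twice with the same pair, the second call would return false instead of adding a row, and no prior SELECT is needed.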

Just use a UNIQUE key on the columns and the INSERT IGNORE statement to insert new rows (duplicate rows are ignored).
Beware that with MyISAM (the engine used below) a UNIQUE key may not exceed 1000 bytes, meaning that the potential number of bytes contained in the fields foo and bar together may not exceed 1000 bytes; other storage engines have their own limits. If this creates a problem, just MD5 the concatenated values into their own column at insert time, like (in PHP) md5($foo.$bar), and set the unique key to that column.
CREATE TABLE `test_unique` (
`id` int(10) unsigned NOT NULL auto_increment,
`foo` varchar(45) default NULL,
`bar` varchar(45) default NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `Unique` (`foo`,`bar`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT IGNORE INTO `test_unique` VALUES
(1, 'foo1', 'bar1'),
(2, 'foo2', 'bar2');
-- This row duplicates an existing one, so it is silently skipped.
INSERT IGNORE INTO `test_unique` VALUES
(2, 'foo2', 'bar2');
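One caveat with hashing the plain concatenation: md5($foo.$bar) gives the same hash for ('ab', 'c') and ('a', 'bc'), because both concatenate to 'abc'. Below is a minimal sketch of one way around that, assuming the hash is computed in PHP at insert time. The function name pairHash and the length-prefix scheme are illustrative choices, not part of the answer above.

```php
<?php
/**
 * Build a fixed-length key for a (foo, bar) pair, meant to be stored in
 * its own column with a UNIQUE index. Hypothetical helper name.
 * Prefixing the length of $foo keeps ('ab', 'c') and ('a', 'bc') from
 * producing the same input string, and therefore the same hash.
 */
function pairHash(string $foo, string $bar): string
{
    return md5(strlen($foo) . ':' . $foo . $bar);
}
```

The result is always 32 hexadecimal characters, so a CHAR(32) column would be enough to hold it.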

Related

php MySql 2 column as index and 1 as another index?

I'm using PHP and I have a table with two VARCHAR columns: one is used for user identification and the other for the page name.
Both must be VARCHAR.
I want to INSERT IGNORE a row when a user enters a page, so that I know whether they have visited it, and I want to fetch all the rows for the pages a user has been to. In short:
fetch all rows matching the first VARCHAR column;
insert only if the pair of values does not exist yet.
I'm hoping to do it in the most efficient way.
What is the best way to insert without first checking with another query whether the row exists?
What is the best way, other than
SELECT * FROM table WHERE id = id
to fetch when the column needed is VARCHAR?

You should consider a normalized table structure like this:
CREATE TABLE user (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE page (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE pages_visited (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED,
page_id INT UNSIGNED,
UNIQUE KEY (user_id, page_id)
);
-- Record a visit; a repeat visit to the same page is ignored.
INSERT IGNORE INTO pages_visited (user_id, page_id) VALUES (:userId, :pageId);
-- Fetch every page the user has visited.
SELECT page_id FROM pages_visited WHERE user_id = :userId;

I think you want to implement a composite primary key.
A composite primary key tells MySQL that you want your primary key to be a combination of fields.
More info here: Why use multiple columns as primary keys (composite primary key)
I don't know of a better option for your query, although I can advise, if possible:
Define columns to be NOT NULL. This gives you faster processing and requires less storage. It will also simplify queries sometimes because you don't need to check for NULL as a special case.
And with variable-length rows, you get more fragmentation in tables where you perform many deletes or updates due to the differing sizes of the records. You'll need to run OPTIMIZE TABLE periodically to maintain performance.

How to limit number of entries in MySQL database?

How can I limit the number of entries inserted into a MySQL database from PHP to 1?
Any suggestions? Thanks.

You probably can't get it right in PHP since the trip back and forth to the database leaves room for another part of your application to create an entry. Normally we achieve this sort of thing by putting a unique index on the table that prevents duplication of data. For example:
CREATE TABLE alf_mimetype
(
id BIGINT NOT NULL AUTO_INCREMENT,
version BIGINT NOT NULL,
mimetype_str VARCHAR(100) NOT NULL,
PRIMARY KEY (id),
UNIQUE (mimetype_str)
) ENGINE=InnoDB;
If you attempt to insert a row with duplicate mimetype_str, the database will generate an exception. Catch it in your application and you'll know that your single entry for that particular row is already there.
You can create UNIQUE keys on multiple columns as well. Your primary key also represents a unique constraint and can consist of multiple columns.

On duplicate key update - increase by 2

I am inserting search terms into a database, and when I ran it to test, I saw duplicates being inserted. I was able to solve that, and rows are no longer duplicated, but now I am trying to get ON DUPLICATE KEY UPDATE to work, and it increases the popularity by 2 each time. What do I have wrong here?
$entryDate = date("c");
$insertsearchquery="insert into article_searches (termSafe,entryDate) values (\"$termSafe\",\"$entryDate\") on duplicate key update popularity=popularity+1";
mysql_query($insertsearchquery);
I have a UNIQUE key set for both termSafe and entryDate.
CREATE TABLE `article_searches` (
`id` bigint(20) NOT NULL AUTO_INCREMENT ,
`termSafe` varchar(150) ,
`entryDate` varchar(255),
`popularity` tinyint(6) UNSIGNED NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `termSafe` USING BTREE (`termSafe`),
UNIQUE INDEX `entryDate` USING BTREE (`entryDate`)
)
Although I just deleted the entryDate unique Index.

I'm guessing that this is because you have a UNIQUE (termSafe) and a UNIQUE (entryDate); perhaps you just want a single UNIQUE (termSafe, entryDate).
Bear in mind that the mysql_* functions are deprecated (and were removed in PHP 7); you should use PDO or mysqli_* instead.

Try dropping the two unique indexes and creating only one:
create unique index idx_article_searches_termSafe_entryDate on article_searches(termSafe, entryDate);
When using ON DUPLICATE KEY UPDATE, the behavior can be ill-defined when more than one unique constraint violation is detected. I think it is doing the update twice, once for each constraint.
Here is the warning in the documentation:
In general, you should try to avoid using an ON DUPLICATE KEY UPDATE
clause on tables with multiple unique indexes.

MySQL: what's the difference between INDEX, UNIQUE, FOREIGN KEY, and PRIMARY KEY?

OK, so I'm a newbie here at SQL.
I'm setting up my tables, and I'm getting confused about indexes, keys, and foreign keys.
I have a users table, and a projects table.
I want to use the users (id) to attach a project to a user.
This is what I have so far:
DROP TABLE IF EXISTS projects;
CREATE TABLE projects (
id int(8) unsigned NOT NULL,
user_id int(8),
name varchar(120) NOT NULL,
description varchar(300),
created_at date,
updated_at date,
PRIMARY KEY (id),
KEY users_id (user_id)
) ENGINE=InnoDB;
ALTER TABLE projects (
ADD CONSTRAINT user_projects,
FOREIGN KEY (user_id) REFERENCES users(id),
ON DELETE CASCADE
)
So what I'm getting lost on is: what are the differences between a key, an index, a constraint and a foreign key?
I've been looking online and can't find a newbie explanation for it.
PS. I'm using phpactiverecord and have the relationships set up in the models
user-> has_many('projects');
projects -> belongs_to('user');
Not sure if that has anything to do with it, but thought i'd throw it in there..
Thanks.
EDIT:
I thought it could possibly be something to do with Navicat, so I went into WampServer -> phpMyAdmin and ran this...
DROP TABLE IF EXISTS projects;
CREATE TABLE projects (
id int(8) unsigned NOT NULL,
user_id int(8) NOT NULL,
name varchar(120) NOT NULL,
description varchar(300),
created_at date,
updated_at date,
PRIMARY KEY (id),
KEY users_id (user_id),
FOREIGN KEY (user_id) REFERENCES users(id)
) ENGINE=InnoDB;
Still nothing... :(

Expanding on Shamil's answers:
INDEX is similar to the index at the back of a book. It provides a simplified look-up for the data in that column so that searches on it are faster. Fun details: both MyISAM and InnoDB store ordinary indexes as B-trees (hash indexes are what the MEMORY engine uses by default). A B-tree is similar to a nested set: it breaks the data down into sorted child groups, so search depth grows only logarithmically with table size and lookups by range are efficient. A hash index is faster for lookups of a single key but cannot serve range queries (try to remember the Big O of hashtables and binary trees).
UNIQUE INDEX is an index in which each row in the table must have a unique value for that column or group of columns. This is useful for preventing duplication, e.g. for an email column in a users table where you want only one account per email address. Important note: in MySQL, an INSERT... ON DUPLICATE KEY UPDATE statement will execute the update if it finds a duplicate unique index match, even if it's not your primary key. This is a pitfall to be aware of when using INSERT... UPDATE statements on tables with uniques. You may wind up unintentionally overwriting records! Another note about uniques in MySQL: per the ANSI-92 standard, NULL values are not considered equal to one another, which means you can have multiple NULL values in a nullable unique-indexed column. Although it's a standard, some other RDBMSes differ in their implementation of this.
PRIMARY KEY is a UNIQUE INDEX that is the identifier for any given row in the table. As such, it must not be null, and in InnoDB it is saved as a clustered index. Clustered means that the rows themselves are stored in primary-key order. This makes searches on the primary key significantly faster than on any other index type (in MySQL, only the PK may be your clustered index). Note that clustering also causes concerns with INSERT statements if your data is not AUTO_INCREMENTed, as MySQL will have to shift data around if you insert a new row with a PK with a lower ordinal value. This could hamper your DB performance. So unless you're certain you know what you're doing, always use an auto-incremented value for your PK in MySQL.
FOREIGN KEY is a reference to a column in another table. It enforces referential integrity, which means that you cannot create an entry in a column which has a foreign key to another table if the entered value does not exist in the referenced table. In MySQL, a FOREIGN KEY does not improve search performance. It also requires that both tables in the key definition use the InnoDB engine, and that the referencing and referenced columns have the same data type and, for string columns, the same character set and collation.

KEY is just another word for INDEX.
A UNIQUE index means that all values within that index must be unique, and not the same as any other within that index. An example would be an Id column in a table.
A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL, i.e., all values in the index must be set. Ideally, each table should have (and can have) one primary key only.
A FOREIGN KEY is a referential constraint between two tables. This column/index must have the same type and length as the referred column within the referred table. An example of a FOREIGN KEY is a userId, between a user-login table and a users table. Note that it usually points to a PRIMARY KEY in the referred table.
http://dev.mysql.com/doc/refman/5.1/en/create-table.html

How to get next alpha-numeric ID based on existing value from MySQL

First, I apologize if this has been asked before - indeed I'm sure it has, but I can't find it/can't work out what to search for to find it.
I need to generate unique quick reference id's, based on a company name. So for example:
Company Name Reference
Smiths Joinery smit0001
Smith and Jones Consulting smit0002
Smithsons Carpets smit0003
These will all be stored in a varchar column in a MySQL table. The data will be collected, escaped and inserted like 'HTML -> PHP -> MySQL'. The ID's should be in the format depicted above, four letters, then four numerics (initially at least - when I reach smit9999 it will just spill over into 5 digits).
I can deal with generating the 4 letters from the company name, I will simply step through the name until I have collected 4 alpha characters, and strtolower() it - but then I need to get the next available number.
What is the best/easiest way to do this, so that the possibility of duplicates is eliminated?
At the moment I'm thinking:
$fourLetters = 'smit';
$query = "SELECT `company_ref`
FROM `companies`
WHERE
`company_ref` LIKE '$fourLetters%'
ORDER BY `company_ref` DESC
LIMIT 1";
$last = mysqli_fetch_assoc(mysqli_query($link, $query));
$newNum = ((int) ltrim(substr($last['company_ref'],4),'0')) + 1;
$newRef = $fourLetters.str_pad($newNum, 4, '0', STR_PAD_LEFT);
But I can see this causing a problem if two users try to enter company names that would result in the same ID at the same time. I will be using a unique index on the column, so it would not result in duplicates in the database, but it will still cause a problem.
Can anyone think of a way to have MySQL work this out for me when I do the insert, rather than calculating it in PHP beforehand?
Note that actual code will be OO and will handle errors etc - I'm just looking for thoughts on whether there is a better way to do this specific task, it's more about the SQL than anything else.
EDIT
I think that @EmmanuelN's suggestion of using a MySQL trigger may be the way to handle this, but:
I am not good enough with MySQL, particularly triggers, to get this to work, and would like a step-by-step example of creating, adding and using a trigger.
I am still not sure whether this will eliminate the possibility of two identical IDs being generated. What happens if two rows are inserted at the same time, so that the trigger runs simultaneously and produces the same reference? Is there any way to lock the trigger (or a UDF) in such a way that it can only have one concurrent instance?
Or I would be open to any other suggested approaches to this problem.

If you are using MyISAM, then you can create a compound primary key on a text field + auto increment field. MySQL will handle incrementing the number automatically. They are separate fields, but you can get the same effect.
CREATE TABLE example (
company_name varchar(100),
key_prefix char(4) not null,
key_increment int unsigned auto_increment,
primary key co_key (key_prefix,key_increment)
) ENGINE=MYISAM;
When you do an insert into the table, the key_increment field will increment from the highest existing value for that key_prefix. So an insert with key_prefix "smit" will start with 1 in key_increment, key_prefix "jone" will start with 1 in key_increment, etc.
Pros:
You don't have to do anything with calculating numbers.
Cons:
You do have a key split across 2 columns.
It doesn't work with InnoDB.

How about this solution with a trigger and a table to hold the company_ref's uniquely. Made a correction - the reference table has to be MyISAM if you want the numbering to begin at 1 for each unique 4char sequence.
DROP TABLE IF EXISTS company;
CREATE TABLE company (
company_name varchar(100) DEFAULT NULL,
company_ref char(8) DEFAULT NULL
) ENGINE=InnoDB;
DELIMITER ;;
CREATE TRIGGER company_reference BEFORE INSERT ON company
FOR EACH ROW BEGIN
INSERT INTO reference SET company_ref=SUBSTRING(LOWER(NEW.company_name), 1, 4), numeric_ref=NULL;
SET NEW.company_ref=CONCAT(SUBSTRING(LOWER(NEW.company_name), 1, 4), LPAD(CAST(LAST_INSERT_ID() AS CHAR(10)), 4, '0'));
END ;;
DELIMITER ;
DROP TABLE IF EXISTS reference;
CREATE TABLE reference (
company_ref char(4) NOT NULL DEFAULT '',
numeric_ref int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (company_ref, numeric_ref)
) ENGINE=MyISAM;
And for completeness here is a trigger that will create a new reference if the company name is altered.
DROP TRIGGER IF EXISTS company_reference_up;
DELIMITER ;;
CREATE TRIGGER company_reference_up BEFORE UPDATE ON company
FOR EACH ROW BEGIN
IF NEW.company_name <> OLD.company_name THEN
DELETE FROM reference WHERE company_ref=SUBSTRING(LOWER(OLD.company_ref), 1, 4) AND numeric_ref=SUBSTRING(OLD.company_ref, 5, 4);
INSERT INTO reference SET company_ref=SUBSTRING(LOWER(NEW.company_name), 1, 4), numeric_ref=NULL;
SET NEW.company_ref=CONCAT(SUBSTRING(LOWER(NEW.company_name), 1, 4), LPAD(CAST(LAST_INSERT_ID() AS CHAR(10)), 4, '0'));
END IF;
END ;;
DELIMITER ;

Given you're using InnoDB, why not use an explicit transaction to grab an exclusive row lock and prevent another connection from reading the same row before you're done setting a new ID based on it?
(Naturally, doing the calculation in a trigger would hold the lock for less time.)
mysqli_begin_transaction($link); // start the transaction
$query = "SELECT `company_ref`
FROM `companies`
WHERE
`company_ref` LIKE '$fourLetters%'
ORDER BY `company_ref` DESC
LIMIT 1
FOR UPDATE"; // lock the row that was read until the commit
$last = mysqli_fetch_assoc(mysqli_query($link, $query));
// $last is null when no reference with this prefix exists yet
$newNum = ((int) ltrim(substr($last['company_ref'] ?? '',4),'0')) + 1;
$newRef = $fourLetters.str_pad($newNum, 4, '0', STR_PAD_LEFT);
mysqli_query($link, "INSERT INTO companies . . . (new row using $newRef)");
mysqli_commit($link);
Edit: Just to be 100% sure I ran a test by hand to confirm that the second transaction will return the newly inserted row after waiting rather than the original locked row.
Edit2: Also tested the case where there is no initial row returned (Where you would think there is no initial row to put a lock on) and that works as well.

Ensure you have a unique constraint on the Reference column.
Fetch the current max sequential reference the same way you do it in your sample code. You don't actually need to trim the zeroes before you cast to (int); '0001' is a valid integer.
Roll a loop and do your insert inside.
Check affected rows after the insert. You can also check the SQL state for a duplicate key error, but having zero affected rows is a good indication that your insert failed due to inserting an existing Reference value.
If you have zero affected rows, increment the sequential number and roll the loop again. If you have non-zero affected rows, you're done and have a unique identifier inserted.
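The steps above could be sketched roughly as follows with PDO. Note that this variant detects the collision through the duplicate-key SQL state (the alternative the steps mention) rather than through the affected-row count. The companies table and company_ref column come from the question; the company_name column, the function name insertCompany and the retry limit are assumptions made for illustration.

```php
<?php
/**
 * Insert a company, retrying with the next number while the reference
 * is already taken. Hypothetical helper: assumes companies.company_ref
 * has a UNIQUE constraint and PDO uses PDO::ERRMODE_EXCEPTION.
 */
function insertCompany(PDO $pdo, string $name, string $fourLetters, int $maxTries = 20): string
{
    // Fetch the current highest reference for this prefix, as in the question.
    $select = $pdo->prepare(
        'SELECT company_ref FROM companies
          WHERE company_ref LIKE ?
          ORDER BY company_ref DESC
          LIMIT 1'
    );
    $select->execute([$fourLetters . '%']);
    $last = $select->fetchColumn(); // false when no row exists yet
    $num  = ($last === false) ? 1 : ((int) substr($last, 4)) + 1;

    $insert = $pdo->prepare(
        'INSERT INTO companies (company_name, company_ref) VALUES (?, ?)'
    );
    for ($try = 0; $try < $maxTries; $try++, $num++) {
        $ref = $fourLetters . str_pad((string) $num, 4, '0', STR_PAD_LEFT);
        try {
            $insert->execute([$name, $ref]);
            return $ref; // success: this reference now belongs to the row
        } catch (PDOException $e) {
            if ((string) $e->getCode() !== '23000') {
                throw $e; // not a constraint violation
            }
            // Another insert took $ref first; loop and try the next number.
        }
    }
    throw new RuntimeException("No free reference found for '$fourLetters'");
}
```

The unique constraint is what makes this safe: two concurrent callers may compute the same number, but only one insert can succeed, and the other simply moves on to the next value.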

Easiest way to avoid duplicate values for the reference column is to add a unique constraint. So if multiple processes try to set to the same value, MySQL will reject the second attempt and throw an error.
ALTER TABLE table_name ADD UNIQUE KEY (`company_ref`);
If I were faced with your situation, I would handle the company reference ID generation within the application layer; triggers can get messy if not set up correctly.

A hacky version that works for InnoDB as well.
Replace the insert to companies with two inserts in a transaction:
INSERT INTO __keys
VALUES (LEFT(LOWER('Smiths Joinery'),4), LAST_INSERT_ID(1))
ON DUPLICATE KEY UPDATE
num = LAST_INSERT_ID(num+1);
INSERT INTO __companies (comp_name, reference)
VALUES ('Smiths Joinery',
CONCAT(LEFT(LOWER(comp_name),4), LPAD(LAST_INSERT_ID(), 4, '0')));
where:
CREATE TABLE `__keys` (
`prefix` char(4) NOT NULL,
`num` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`prefix`)
) ENGINE=InnoDB COLLATE latin1_general_ci;
CREATE TABLE `__companies` (
`comp_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`comp_name` varchar(45) NOT NULL,
`reference` char(8) NOT NULL,
PRIMARY KEY (`comp_id`)
) ENGINE=InnoDB COLLATE latin1_general_ci;
Notice:
latin1_general_ci can be replaced with utf8_general_ci,
LEFT(LOWER('Smiths Joinery'),4) would better become a function in PHP
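As a sketch of that last note, the prefix could be built in PHP the way the question describes it (collect the first four letters, lowercased). The function name companyPrefix is hypothetical, and the sketch assumes only ASCII letters count. Unlike LEFT(LOWER(...), 4), it skips spaces, digits and punctuation.

```php
<?php
/**
 * Return the first four ASCII letters of a company name, lowercased.
 * Hypothetical helper; a name with fewer than four letters yields a
 * shorter prefix, which the caller would have to handle.
 */
function companyPrefix(string $companyName): string
{
    // Drop everything that is not a letter, then keep the first four.
    $lettersOnly = preg_replace('/[^a-z]/', '', strtolower($companyName)) ?? '';
    return substr($lettersOnly, 0, 4);
}
```

The resulting string would then be passed to the first INSERT in place of LEFT(LOWER('Smiths Joinery'),4).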
