How can I reset the AUTO_INCREMENT of a field?
I want it to start counting from 1 again.
You can reset the counter with:
ALTER TABLE tablename AUTO_INCREMENT = 1
For InnoDB, you cannot set the auto_increment value lower than or equal to the highest current index (quote from ViralPatel):
Note that you cannot reset the counter to a value less than or equal
to any that have already been used. For MyISAM, if the value is less
than or equal to the maximum value currently in the AUTO_INCREMENT
column, the value is reset to the current maximum plus one. For
InnoDB, if the value is less than the current maximum value in the
column, no error occurs and the current sequence value is not changed.
See How can I reset an MySQL AutoIncrement using a MAX value from another table? on how to dynamically get an acceptable value.
SET @num := 0;
UPDATE your_table SET id = @num := (@num+1);
ALTER TABLE your_table AUTO_INCREMENT = 1;
Simply like this:
ALTER TABLE tablename AUTO_INCREMENT = value;
Reference: 13.1.9 ALTER TABLE Statement
There is a very easy way with phpMyAdmin under the "operations" tab. In the table options you can set autoincrement to the number you want.
The best solution that worked for me:
ALTER TABLE my_table MODIFY COLUMN ID INT(10) UNSIGNED;
COMMIT;
ALTER TABLE my_table MODIFY COLUMN ID INT(10) UNSIGNED AUTO_INCREMENT;
COMMIT;
It's fast, works with InnoDB, and I don't need to know the current maximum value!
This way, the auto-increment counter is reset and automatically continues after the highest existing value.
The highest rated answers to this question all recommend "ALTER TABLE yourtable AUTO_INCREMENT = value". However, this only works when the value in the ALTER statement is greater than the current maximum value of the auto-increment column. According to the MySQL 8 documentation:
You cannot reset the counter to a value less than or equal to the value that is currently in use. For both InnoDB and MyISAM, if the value is less than or equal to the maximum value currently in the AUTO_INCREMENT column, the value is reset to the current maximum AUTO_INCREMENT column value plus one.
In essence, you can only alter AUTO_INCREMENT to increase the value of the auto-increment column, not reset it to 1, as the OP asks in the second part of the question. For options that actually allow you to set AUTO_INCREMENT below its current maximum, take a look at Reorder / reset auto increment primary key.
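The behaviour described in the documentation quote above can be checked with a small experiment. This is only a sketch, assuming MySQL 8.0, default auto-increment settings, and a throwaway InnoDB table; the table name demo_counter and its columns are hypothetical.

```sql
-- Hypothetical scratch table, used only for illustration.
CREATE TABLE demo_counter (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  note VARCHAR(20)
) ENGINE=InnoDB;

-- On a fresh table these rows get ids 1, 2 and 3.
INSERT INTO demo_counter (note) VALUES ('a'), ('b'), ('c');

-- Ask for a value at or below the current maximum ...
ALTER TABLE demo_counter AUTO_INCREMENT = 1;

-- ... and, per the documentation, the next insert should still
-- continue after the maximum rather than reuse a low id.
INSERT INTO demo_counter (note) VALUES ('d');
SELECT id, note FROM demo_counter ORDER BY id;

-- Once the rows are gone, the same statement has room to take effect.
DELETE FROM demo_counter;
ALTER TABLE demo_counter AUTO_INCREMENT = 1;
INSERT INTO demo_counter (note) VALUES ('e');
SELECT id, note FROM demo_counter ORDER BY id;
```

Comparing the two SELECT results shows whether your server version clamps the counter the way the quoted documentation describes.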
As of MySQL 5.6 you can use the simple ALTER TABLE with InnoDB:
ALTER TABLE tablename AUTO_INCREMENT = 1;
The documentation has been updated to reflect this:
13.1.7 ALTER TABLE Statement
My testing also shows that the table is not copied. The value is simply changed.
Beware! TRUNCATE TABLE your_table will delete everything in your_table.
You can also use the TRUNCATE TABLE syntax, like this:
TRUNCATE TABLE table_name
ALTER TABLE news_feed DROP id;
ALTER TABLE news_feed ADD id BIGINT( 200 ) NOT NULL AUTO_INCREMENT FIRST, ADD PRIMARY KEY (id);
I used this in some of my scripts. The id field is dropped and then added back with its previous settings. All existing rows in the table are filled in with new auto-increment values. This should also work with InnoDB.
Note that all the rows in the table will be renumbered and will have different ids!
It is for an empty table:
ALTER TABLE `table_name` AUTO_INCREMENT = 1;
If you have data but want to tidy it up, I recommend using this:
ALTER TABLE `table_name` DROP `auto_colmn`;
ALTER TABLE `table_name` ADD `auto_colmn` INT( {many you want} ) NOT NULL AUTO_INCREMENT FIRST ,ADD PRIMARY KEY (`auto_colmn`);
To update to the latest plus one id:
ALTER TABLE does not accept a subquery or a user variable as the AUTO_INCREMENT value, so build the statement dynamically:
SET @nextId = (SELECT COALESCE(MAX(id), 0) + 1 FROM table_name);
SET @sql = CONCAT('ALTER TABLE table_name AUTO_INCREMENT = ', @nextId);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Not tested; please test before you run it.
Warning: If your column has constraints or is connected as a foreign key to other tables this will have bad effects.
First, drop the column:
ALTER TABLE tbl_access DROP COLUMN `access_id`;
Next, recreate the column and set it as FIRST (assuming you want it as the first column):
ALTER TABLE tbl_access ADD COLUMN `access_id` int(10) NOT NULL PRIMARY KEY AUTO_INCREMENT FIRST;
As of MySQL 5.6 the approach below works faster due to online DDL (note algorithm=inplace):
alter table tablename auto_increment=1, algorithm=inplace;
SET @num := 0;
UPDATE your_table SET id = @num := (@num+1);
ALTER TABLE your_table AUTO_INCREMENT = 1;
ALTER TABLE tablename AUTO_INCREMENT = 1
Try to run this query:
ALTER TABLE tablename AUTO_INCREMENT = value;
Or try this query to reset the auto increment. First remove the AUTO_INCREMENT attribute:
ALTER TABLE `tablename` CHANGE `id` `id` INT(10) UNSIGNED NOT NULL;
Then set auto increment again by running this query:
ALTER TABLE `tablename` CHANGE `id` `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT;
The auto-increment counter for a table can be (re)set in two ways:
By executing a query, like others already explained:
ALTER TABLE <table_name> AUTO_INCREMENT=<table_id>;
Using Workbench or another visual database design tool. I am going to show how it is done in Workbench, but it shouldn't be much different in other tools. Right-click the desired table and choose Alter table from the context menu. At the bottom you can see all the available options for altering a table. Choose Options to open the table options form.
Then just set the desired value in the Auto increment field. This will basically execute the query shown in the first option.
If you're using PHPStorm's database tool you have to enter this in the database console:
ALTER TABLE <table_name> AUTO_INCREMENT = 0;
I tried to alter the table and set auto_increment to 1, but it did not work. I resolved to delete the column I was incrementing, then create a new column with my preferred name and set that new column to auto-increment from the outset.
I googled and found this question, but the answer I am really looking for fulfils two criteria:
using purely MySQL queries
reset an existing table auto-increment to max(id) + 1
Since I couldn't find exactly what I want here, I have cobbled the answer from various answers and sharing it here.
A few things to note:
the table in question is InnoDB
the table uses the field id with type as int as primary key
the only way to do this purely in MySQL is to use stored procedure
my images below are using SequelPro as the GUI. You should be able to adapt it based on your preferred MySQL editor
I have tested this on MySQL Ver 14.14 Distrib 5.5.61, for debian-linux-gnu
Step 1: Create Stored Procedure
create a stored procedure like this:
DELIMITER //
CREATE PROCEDURE reset_autoincrement(IN tablename varchar(200))
BEGIN
SET @get_next_inc = CONCAT('SELECT @next_inc := max(id) + 1 FROM ',tablename,';');
PREPARE stmt FROM @get_next_inc;
EXECUTE stmt;
SELECT @next_inc AS result;
DEALLOCATE PREPARE stmt;
set @alter_statement = concat('ALTER TABLE ', tablename, ' AUTO_INCREMENT = ', @next_inc, ';');
PREPARE stmt FROM @alter_statement;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END //
DELIMITER ;
Then run it.
Before running it, look under Stored Procedures in your database to see what is already there.
To run it, I simply select the stored procedure code and press Run Selection.
Note: the delimiter part is crucial. If you copy and paste from the top answers in this question, they tend not to work for this reason.
After I run it, I should see the stored procedure listed.
If you need to change the stored procedure, you need to delete it first, then select the code and run it again.
Step 2: Call the stored procedure
This time you can simply use normal MySQL queries.
call reset_autoincrement('products');
Originally from my own SQL queries notes in https://simkimsia.com/reset-mysql-autoincrement-to-max-id-plus-1/ and adapted for Stack Overflow.
delete from url_rewrite where 1=1;
ALTER TABLE url_rewrite AUTO_INCREMENT = 1;
and then reindex
ALTER TABLE `table_name` DROP `id`;
ALTER TABLE `table_name` ADD `id` INT NOT NULL AUTO_INCREMENT FIRST, ADD PRIMARY KEY (`id`) ;
In short: first we dropped the id column, then added it again as an auto-increment primary key.
The best way is remove the field with AI and add it again with AI. It works for all tables.
You need to follow the advice from Miles M's comment and here is some PHP code that fixes the range in MySQL. Also you need to open up the my.ini file (MySQL) and change max_execution_time=60 to max_execution_time=6000; for large databases.
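Don't rely on "ALTER TABLE tablename AUTO_INCREMENT = 1" alone. It does not delete anything (unlike TRUNCATE TABLE), but it also does not renumber the rows that already exist, so the ids have to be rewritten first: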
$con = mysqli_connect($dbhost, $dbuser, $dbpass, $database);
// Fetch every id in ascending order, then rewrite them as 1, 2, 3, ...
$res = mysqli_query($con, "SELECT id FROM data ORDER BY id ASC");
$count = 0;
while ($row = mysqli_fetch_array($res)){
    $count++;
    mysqli_query($con, "UPDATE data SET id='".$count."' WHERE id='".$row['id']."'");
}
echo 'Done resetting id';
mysqli_close($con);
I suggest you to go to Query Browser and do the following:
Go to schemata and find the table you want to alter.
Right click and select copy create statement.
Open a result tab and paste the create statement there.
Go to the last line of the create statement and look for Auto_Increment=N
(where N is the current number for the auto_increment field).
Replace N with 1.
Press Ctrl + Enter.
Auto_increment should reset to one once you enter a new row in the table.
I don't know what will happen if you try to add a row where an auto_increment field value already exist.
First, I apologize if this has been asked before - indeed I'm sure it has, but I can't find it/can't work out what to search for to find it.
I need to generate unique quick reference id's, based on a company name. So for example:
Company Name                 Reference
Smiths Joinery               smit0001
Smith and Jones Consulting   smit0002
Smithsons Carpets            smit0003
These will all be stored in a varchar column in a MySQL table. The data will be collected, escaped and inserted like 'HTML -> PHP -> MySQL'. The ID's should be in the format depicted above, four letters, then four numerics (initially at least - when I reach smit9999 it will just spill over into 5 digits).
I can deal with generating the 4 letters from the company name, I will simply step through the name until I have collected 4 alpha characters, and strtolower() it - but then I need to get the next available number.
What is the best/easiest way to do this, so that the possibility of duplicates is eliminated?
At the moment I'm thinking:
$fourLetters = 'smit';
$query = "SELECT `company_ref`
FROM `companies`
WHERE
`company_ref` LIKE '$fourLetters%'
ORDER BY `company_ref` DESC
LIMIT 1";
$last = mysqli_fetch_assoc(mysqli_query($link, $query));
$newNum = ((int) ltrim(substr($last['company_ref'],4),'0')) + 1;
$newRef = $fourLetters.str_pad($newNum, 4, '0', STR_PAD_LEFT);
But I can see this causing a problem if two users try to enter company names that would result in the same ID at the same time. I will be using a unique index on the column, so it would not result in duplicates in the database, but it will still cause a problem.
Can anyone think of a way to have MySQL work this out for me when I do the insert, rather than calculating it in PHP beforehand?
Note that actual code will be OO and will handle errors etc - I'm just looking for thoughts on whether there is a better way to do this specific task, it's more about the SQL than anything else.
EDIT
I think that @EmmanuelN's suggestion of using a MySQL trigger may be the way to handle this, but:
I am not good enough with MySQL, particularly triggers, to get this to work, and would like a step-by-step example of creating, adding and using a trigger.
I am still not sure whether this will eliminate the possibility of two identical IDs being generated. What happens if two rows are inserted at the same time, so that the trigger runs simultaneously and produces the same reference? Is there any way to lock the trigger (or a UDF) in such a way that it can only have one concurrent instance?
Or I would be open to any other suggested approaches to this problem.
If you are using MyISAM, then you can create a compound primary key on a text field + auto increment field. MySQL will handle incrementing the number automatically. They are separate fields, but you can get the same effect.
CREATE TABLE example (
company_name varchar(100),
key_prefix char(4) not null,
key_increment int unsigned auto_increment,
primary key co_key (key_prefix,key_increment)
) ENGINE=MYISAM;
When you do an insert into the table, the key_increment field will increment from the highest existing value for that key_prefix. So inserts with key_prefix "smit" will start with 1 in key_increment, inserts with key_prefix "jone" will also start with 1 in key_increment, etc.
Pros:
You don't have to do anything with calculating numbers.
Cons:
You do have a key split across 2 columns.
It doesn't work with InnoDB.
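To make the per-prefix numbering concrete, here is a small sketch along the lines of the table above. The table name prefix_demo and the 'Jones Bakery' row are made up for illustration, and it assumes a server where the MyISAM engine is available.

```sql
-- Hypothetical demo table; same shape as the example table above.
CREATE TABLE prefix_demo (
  company_name  VARCHAR(100),
  key_prefix    CHAR(4) NOT NULL,
  key_increment INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (key_prefix, key_increment)
) ENGINE=MyISAM;

-- key_increment is left out, so MyISAM assigns it per key_prefix.
INSERT INTO prefix_demo (company_name, key_prefix) VALUES
  ('Smiths Joinery',             'smit'),
  ('Smith and Jones Consulting', 'smit'),
  ('Jones Bakery',               'jone'),
  ('Smithsons Carpets',          'smit');

-- Glue the two key columns together into the reference format
-- from the question (four letters plus a zero-padded number).
SELECT company_name,
       CONCAT(key_prefix, LPAD(key_increment, 4, '0')) AS reference
FROM prefix_demo
ORDER BY key_prefix, key_increment;
```

Under these assumptions the three 'smit' rows should come back as smit0001 to smit0003 and the 'jone' row as jone0001, since each prefix keeps its own counter.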
How about this solution with a trigger and a table to hold the company_ref's uniquely. Made a correction - the reference table has to be MyISAM if you want the numbering to begin at 1 for each unique 4char sequence.
DROP TABLE IF EXISTS company;
CREATE TABLE company (
company_name varchar(100) DEFAULT NULL,
company_ref char(8) DEFAULT NULL
) ENGINE=InnoDB;
DELIMITER ;;
CREATE TRIGGER company_reference BEFORE INSERT ON company
FOR EACH ROW BEGIN
INSERT INTO reference SET company_ref=SUBSTRING(LOWER(NEW.company_name), 1, 4), numeric_ref=NULL;
SET NEW.company_ref=CONCAT(SUBSTRING(LOWER(NEW.company_name), 1, 4), LPAD(CAST(LAST_INSERT_ID() AS CHAR(10)), 4, '0'));
END ;;
DELIMITER ;
DROP TABLE IF EXISTS reference;
CREATE TABLE reference (
company_ref char(4) NOT NULL DEFAULT '',
numeric_ref int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (company_ref, numeric_ref)
) ENGINE=MyISAM;
And for completeness here is a trigger that will create a new reference if the company name is altered.
DROP TRIGGER IF EXISTS company_reference_up;
DELIMITER ;;
CREATE TRIGGER company_reference_up BEFORE UPDATE ON company
FOR EACH ROW BEGIN
IF NEW.company_name <> OLD.company_name THEN
DELETE FROM reference WHERE company_ref=SUBSTRING(LOWER(OLD.company_ref), 1, 4) AND numeric_ref=SUBSTRING(OLD.company_ref, 5, 4);
INSERT INTO reference SET company_ref=SUBSTRING(LOWER(NEW.company_name), 1, 4), numeric_ref=NULL;
SET NEW.company_ref=CONCAT(SUBSTRING(LOWER(NEW.company_name), 1, 4), LPAD(CAST(LAST_INSERT_ID() AS CHAR(10)), 4, '0'));
END IF;
END ;;
DELIMITER ;
Given you're using InnoDB, why not use an explicit transaction to grab an exclusive row lock and prevent another connection from reading the same row before you're done setting a new ID based on it?
(Naturally, doing the calculation in a trigger would hold the lock for less time.)
mysqli_query($link, "START TRANSACTION");
$query = "SELECT `company_ref`
FROM `companies`
WHERE
`company_ref` LIKE '$fourLetters%'
ORDER BY `company_ref` DESC
LIMIT 1
FOR UPDATE";
$last = mysqli_fetch_assoc(mysqli_query($link, $query));
$newNum = ((int) ltrim(substr($last['company_ref'],4),'0')) + 1;
$newRef = $fourLetters.str_pad($newNum, 4, '0', STR_PAD_LEFT);
mysqli_query($link, "INSERT INTO companies . . . (new row using $newRef)");
mysqli_commit($link);
Edit: Just to be 100% sure I ran a test by hand to confirm that the second transaction will return the newly inserted row after waiting rather than the original locked row.
Edit2: Also tested the case where there is no initial row returned (Where you would think there is no initial row to put a lock on) and that works as well.
Ensure you have a unique constraint on the Reference column.
Fetch the current max sequential reference the same way you do it in your sample code. You don't actually need to trim the zeroes before you cast to (int); '0001' is a valid integer.
Roll a loop and do your insert inside.
Check affected rows after the insert. You can also check the SQL state for a duplicate key error, but having zero affected rows is a good indication that your insert failed due to inserting an existing Reference value.
If you have zero affected rows, increment the sequential number and roll the loop again. If you have non-zero affected rows, you're done and have a unique identifier inserted.
Easiest way to avoid duplicate values for the reference column is to add a unique constraint. So if multiple processes try to set to the same value, MySQL will reject the second attempt and throw an error.
ALTER TABLE table_name ADD UNIQUE KEY (`company_ref`);
If I were faced with your situation, I would handle the company reference id generation within the application layer; triggers can get messy if not set up correctly.
A hacky version that works for InnoDB as well.
Replace the insert to companies with two inserts in a transaction:
INSERT INTO __keys
VALUES (LEFT(LOWER('Smiths Joinery'),4), LAST_INSERT_ID(1))
ON DUPLICATE KEY UPDATE
num = LAST_INSERT_ID(num+1);
INSERT INTO __companies (comp_name, reference)
VALUES ('Smiths Joinery',
CONCAT(LEFT(LOWER(comp_name),4), LPAD(LAST_INSERT_ID(), 4, '0')));
where:
CREATE TABLE `__keys` (
`prefix` char(4) NOT NULL,
`num` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`prefix`)
) ENGINE=InnoDB COLLATE latin1_general_ci;
CREATE TABLE `__companies` (
`comp_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`comp_name` varchar(45) NOT NULL,
`reference` char(8) NOT NULL,
PRIMARY KEY (`comp_id`)
) ENGINE=InnoDB COLLATE latin1_general_ci;
Notice:
latin1_general_ci can be replaced with utf8_general_ci,
LEFT(LOWER('Smiths Joinery'),4) is better computed by a function in PHP
I'm trying to create more robust MySQL Queries and learn in the process. Currently I'm having a hard time trying to grasp the ON DUPLICATE KEY syntax and possible uses.
I have an INSERT query that I want to INSERT only if there is no record with the same ID and name, otherwise UPDATE. ID and name are not UNIQUE, but ID is indexed. ID isn't UNIQUE because it references another record from another table, and I want to have multiple records in this table that reference that one specific record from the other table.
How can I use ON DUPLICATE KEY to INSERT only if there is no record with that ID and name already set else UPDATE that record?
I can easily achieve this with a couple of QUERIES and then have PHP do the IF ELSE part, but I want to know how to LIMIT the amount of QUERIES I send to MySQL.
UPDATE: Note you need to use IF EXISTS instead of IS NULL as indicated in the original answer.
Code to create stored procedure to encapsulate all logic and check if Flavours exist:
DELIMITER //
DROP PROCEDURE IF EXISTS `GetFlavour`//
CREATE PROCEDURE `GetFlavour`(`FlavourID` INT, `FlavourName` VARCHAR(20))
BEGIN
IF EXISTS (SELECT * FROM Flavours WHERE ID = FlavourID) THEN
UPDATE Flavours SET Name = FlavourName WHERE ID = FlavourID;
ELSE
INSERT INTO Flavours (ID, Name) VALUES (FlavourID, FlavourName);
END IF;
END //
DELIMITER ;
ORIGINAL:
You could use this code. It will check for the existence of a particular record, and if the recordset is NULL, then it will go through and insert the new record for you.
IF (SELECT * FROM `TableName` WHERE `ID` = 2342 AND `Name` = 'abc') IS NULL THEN
INSERT INTO `TableName` (`ID`, `Name`) VALUES ('2342', 'abc');
ELSE UPDATE `TableName` SET `Name` = 'xyz' WHERE `ID` = '2342';
END IF;
I'm a little rusty on my MySQL syntax, but that code should at least get you most of the way there, rather than using ON DUPLICATE KEY.
id and name are not unique but id is
indexed. id isn't unique
How can I use ON DUPLICATE KEY to
INSERT only if there is no record with
that id and name already set else
UPDATE that record?
You can't. ON DUPLICATE KEY UPDATE needs a unique or primary key to determine which row to update. You are better off having PHP do the IF ELSE part.
edit:
If the combination of name and id IS supposed to be unique, you can create a multi-column UNIQUE index. From there you can use ON DUPLICATE KEY UPDATE.
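As a rough sketch of that edit: assuming the (id, name) pair really is meant to be unique, a multi-column UNIQUE index lets a single statement do the insert-or-update. The table item_notes, the note column and the index name uq_id_name are hypothetical names chosen for this example.

```sql
-- Hypothetical table: id points at a row in another table and is
-- deliberately not unique on its own; only the (id, name) pair is.
CREATE TABLE item_notes (
  id   INT NOT NULL,
  name VARCHAR(50) NOT NULL,
  note VARCHAR(255),
  UNIQUE KEY uq_id_name (id, name)
);

-- No row with (2342, 'abc') exists yet, so this inserts one.
INSERT INTO item_notes (id, name, note)
VALUES (2342, 'abc', 'first version')
ON DUPLICATE KEY UPDATE note = VALUES(note);

-- Same (id, name) pair: the unique index reports a duplicate,
-- so the existing row is updated instead of a second one being added.
INSERT INTO item_notes (id, name, note)
VALUES (2342, 'abc', 'second version')
ON DUPLICATE KEY UPDATE note = VALUES(note);

-- A different name with the same id is still a separate row.
INSERT INTO item_notes (id, name, note)
VALUES (2342, 'xyz', 'another record')
ON DUPLICATE KEY UPDATE note = VALUES(note);
```

VALUES(col) inside the UPDATE clause is the long-standing spelling; recent MySQL 8.0 releases deprecate it in favour of a row alias, so adjust to whatever your server version prefers.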
Why not just use a stored procedure? Then you can embed all the logic there, plus you have a reusable piece of code (the stored proc) that you can use in other applications. Finally, this only requires one round trip to the server to call the stored proc.
I asked this question a little earlier today but am not sure as to how clear I was.
I have a MySQL column filled with ordered numbers 1-56. These numbers were generated by my PHP script, not by auto_increment.
What I'd like to do is make this column auto_incrementing after the PHP script sets the proper numbers. The PHP script works hand in hand with a jQuery interface that allows me to reorder a list of items using jQuery's UI plugin.
Once I decide what order I'd like the entries in, I'd like the column to be set to auto increment, so that if I were to insert a new entry, it would recognize the highest number already existing in the column and set its own id number to be one higher than that.
Does anyone have any suggestions on how to approach this scenario?
I'd suggest creating the table with your auto_increment already in place. You can specify a value for the auto_inc column, and mysql will use it, and still the next insert to specify a NULL or 0 value for the auto_inc column will magically get $highest + 1 assigned to it.
example:
mysql> create table foobar (i int auto_increment primary key);
mysql> insert into foobar values (10),(25);
mysql> insert into foobar values (null);
mysql> select * from foobar;
# returns 10,25,26
You can switch it to MySQL's auto_increment implementation, but it'll take 3 queries to do it:
a) ALTER TABLE to add the auto_increment to the field in question
b) SELECT MAX(id) + 1 to find out what you need to set the ID to
c) ALTER TABLE table AUTO_INCREMENT =result from (b)
MySQL considers altering the AUTO_INCREMENT value a table-level action, so you can't do it in (a), and it doesn't allow you to do MAX(id) in (c), so 3 queries.
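The three steps (a) to (c) might look roughly like this. It is a sketch only: the table sortable_items and its columns are hypothetical, the id column is assumed to already be the primary key, and step (c) goes through a prepared statement because ALTER TABLE wants a literal number there.

```sql
-- Hypothetical table whose ids were assigned by the application.
CREATE TABLE sortable_items (
  id    INT UNSIGNED NOT NULL PRIMARY KEY,
  label VARCHAR(50)
);
INSERT INTO sortable_items (id, label)
VALUES (1, 'first'), (2, 'second'), (56, 'last');

-- (a) Add AUTO_INCREMENT to the existing column.
ALTER TABLE sortable_items MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;

-- (b) Work out the next value to hand out.
SELECT COALESCE(MAX(id), 0) + 1 INTO @next_id FROM sortable_items;

-- (c) Build the ALTER TABLE text with the number filled in, then run it.
SET @sql = CONCAT('ALTER TABLE sortable_items AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

-- A new row without an explicit id should now continue after 56.
INSERT INTO sortable_items (label) VALUES ('new entry');
SELECT id, label FROM sortable_items ORDER BY id;
```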
You can change that with a query, issued through php, using the mysql console interface or (easiest) using phpmyadmin.
ALTER TABLE table_name CHANGE old_column_name new_column_name column_definition;
ALTER TABLE table_name AUTO_INCREMENT = highest_current_index + 1;
column_definition:
old_column_definition AUTO_INCREMENT
More info:
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
EDIT
Always use mysql_insert_id or the appropriate function of your abstraction layer to get the last created id, as LAST_INSERT_ID may lead to wrong results.
No, stop it. This isn't the point of auto_increment. If you aren't going to make them ordered by the id then don't make them auto_increment, just add a column onto the end of the table for ordering and enjoy the added flexibility it gives you. It seems like you're trying to pack two different sets of information into one column and it's really only going to bite you in the ass despite all the well-meaning people in this thread telling you how to go about shooting yourself in the foot.
In MySQL you can set a custom value for an auto_increment field. MySQL will then use the highest auto_increment column value for new rows, essentially MAX(id)+1. This means you can effectively reserve a range of IDs for custom use. For instance:
CREATE TABLE mytable (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
col1 VARCHAR(256)
);
ALTER TABLE mytable AUTO_INCREMENT = 5001;
In this schema all ids < 5001 are reserved for use by your system. So, your PHP script can auto-generate values:
for ($i=1; $i<=56; $i++)
mysql_query("INSERT INTO mytable SET id = $i, col1= 'whatevers'");
New entries will use the non-reserved range by not specifying id or setting it to null:
INSERT INTO mytable SET id = NULL, col1 = 'whatevers2';
-- The id of the new row will be 5001
Reserving a range like this is key - in case you need more than 56 special/system rows in the future.
ALTER TABLE <table name> MODIFY <column name> <column type> NOT NULL AUTO_INCREMENT
More info:
AUTO_INCREMENT Handling in InnoDB
Server SQL Modes
I've got a table of URLs and I don't want any duplicate URLs. How do I check to see if a given URL is already in the table using PHP/MySQL?
If you don't want to have duplicates you can do following:
add uniqueness constraint
use "REPLACE" or "INSERT ... ON DUPLICATE KEY UPDATE" syntax
If multiple users can insert data into the DB, the method suggested by @Jeremy Ruten can lead to an error: after you have performed the check, someone else can insert the same data into the table.
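A minimal sketch of those two points combined, assuming a hypothetical table url_list with a last_seen column (neither name comes from the question) and a MySQL version whose index size limit allows a unique key on VARCHAR(255):

```sql
-- Hypothetical table: the unique key is what rejects duplicate URLs.
CREATE TABLE url_list (
  url       VARCHAR(255) NOT NULL,
  last_seen DATETIME NOT NULL,
  UNIQUE KEY uq_url (url)
);

-- First time: the row is inserted.
INSERT INTO url_list (url, last_seen)
VALUES ('http://www.example.com/', NOW())
ON DUPLICATE KEY UPDATE last_seen = NOW();

-- Second time: the unique key detects the duplicate, and the existing
-- row is updated in the same statement, with no separate check beforehand.
INSERT INTO url_list (url, last_seen)
VALUES ('http://www.example.com/', NOW())
ON DUPLICATE KEY UPDATE last_seen = NOW();

-- Still exactly one row for that URL.
SELECT COUNT(*) AS copies FROM url_list
WHERE url = 'http://www.example.com/';
```

Because the check and the write happen inside one statement, there is no window between them in which another connection could slip in the same URL.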
To answer your initial question, the easiest way to check whether there is a duplicate is to run an SQL query against what you're trying to add!
For example, were you to want to check for the url http://www.example.com/ in the table links, then your query would look something like
SELECT * FROM links WHERE url = 'http://www.example.com/';
Your PHP code would look something like
$conn = mysql_connect('localhost', 'username', 'password');
if (!$conn)
{
die('Could not connect to database');
}
if(!mysql_select_db('mydb', $conn))
{
die('Could not select database mydb');
}
$result = mysql_query("SELECT * FROM links WHERE url = 'http://www.example.com/'", $conn);
if (!$result)
{
die('There was a problem executing the query');
}
$number_of_rows = mysql_num_rows($result);
if ($number_of_rows > 0)
{
die('This URL already exists in the database');
}
I've written this out longhand here, with all the connecting to the database, etc. It's likely that you'll already have a connection to a database, so you should use that rather than starting a new connection (replace $conn in the mysql_query command and remove the stuff to do with mysql_connect and mysql_select_db)
Of course, there are other ways of connecting to the database, like PDO, or using an ORM, or similar, so if you're already using those, this answer may not be relevant (and it's probably a bit beyond the scope to give answers related to this here!)
However, MySQL provides many ways to prevent this from happening in the first place.
Firstly, you can mark a field as "unique".
Let's say I have a table where I want to just store all the URLs that are linked to from my site, and the last time they were visited.
My definition might look something like this:-
CREATE TABLE links
(
url VARCHAR(255) NOT NULL,
last_visited TIMESTAMP
)
This would allow me to add the same URL over and over again, unless I wrote some PHP code similar to the above to stop this happening.
However, were my definition to change to
CREATE TABLE links
(
url VARCHAR(255) NOT NULL,
last_visited TIMESTAMP,
PRIMARY KEY (url)
)
Then this would make mysql throw an error when I tried to insert the same value twice.
An example in PHP would be
$result = mysql_query("INSERT INTO links (url, last_visited) VALUES ('http://www.example.com/', NOW())", $conn);
if (!$result)
{
die('Could not Insert Row 1');
}
$result2 = mysql_query("INSERT INTO links (url, last_visited) VALUES ('http://www.example.com/', NOW())", $conn);
if (!$result2)
{
die('Could not Insert Row 2');
}
If you ran this, you'd find that on the first attempt, the script would die with the comment Could not Insert Row 2. However, on subsequent runs, it'd die with Could not Insert Row 1.
This is because MySQL knows that the url is the Primary Key of the table. A Primary key is a unique identifier for that row. Most of the time, it's useful to set the unique identifier for a row to be a number. This is because MySQL is quicker at looking up numbers than it is looking up text. Within MySQL, keys (and especially Primary Keys) are used to define relationships between two tables. For example, if we had a table for users, we could define it as
CREATE TABLE users (
username VARCHAR(255) NOT NULL,
password VARCHAR(40) NOT NULL,
PRIMARY KEY (username)
)
However, when we wanted to store information about a post the user had made, we'd have to store the username with that post to identify that the post belonged to that user.
I've already mentioned that MySQL is faster at looking up numbers than strings, so this would mean we'd be spending time looking up strings when we didn't have to.
To solve this, we can add an extra column, user_id, and make that the primary key (so when looking up the user record based on a post, we can find it quicker)
CREATE TABLE users (
user_id INT(10) NOT NULL AUTO_INCREMENT,
username VARCHAR(255) NOT NULL,
password VARCHAR(40) NOT NULL,
PRIMARY KEY (`user_id`)
)
You'll notice that I've also added something new here - AUTO_INCREMENT. This basically allows us to let that field look after itself. Each time a new row is inserted, it adds 1 to the previous number, and stores that, so we don't have to worry about numbering, and can just let it do this itself.
So, with the above table, we can do something like
INSERT INTO users (username, password) VALUES('Mez', 'd3571ce95af4dc281f142add33384abc5e574671');
and then
INSERT INTO users (username, password) VALUES('User', '988881adc9fc3655077dc2d4d757d480b5ea0e11');
When we select the records from the database, we get the following:-
mysql> SELECT * FROM users;
+---------+----------+------------------------------------------+
| user_id | username | password                                 |
+---------+----------+------------------------------------------+
|       1 | Mez      | d3571ce95af4dc281f142add33384abc5e574671 |
|       2 | User     | 988881adc9fc3655077dc2d4d757d480b5ea0e11 |
+---------+----------+------------------------------------------+
2 rows in set (0.00 sec)
However, here - we have a problem - we can still add another user with the same username! Obviously, this is something we don't want to do!
mysql> SELECT * FROM users;
+---------+----------+------------------------------------------+
| user_id | username | password                                 |
+---------+----------+------------------------------------------+
|       1 | Mez      | d3571ce95af4dc281f142add33384abc5e574671 |
|       2 | User     | 988881adc9fc3655077dc2d4d757d480b5ea0e11 |
|       3 | Mez      | d3571ce95af4dc281f142add33384abc5e574671 |
+---------+----------+------------------------------------------+
3 rows in set (0.00 sec)
Let's change our table definition!
CREATE TABLE users (
user_id INT(10) NOT NULL AUTO_INCREMENT,
username VARCHAR(255) NOT NULL,
password VARCHAR(40) NOT NULL,
PRIMARY KEY (user_id),
UNIQUE KEY (username)
)
Let's see what happens when we now try and insert the same user twice.
mysql> INSERT INTO users (username, password) VALUES('Mez', 'd3571ce95af4dc281f142add33384abc5e574671');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO users (username, password) VALUES('Mez', 'd3571ce95af4dc281f142add33384abc5e574671');
ERROR 1062 (23000): Duplicate entry 'Mez' for key 'username'
Huzzah!! We now get an error when we try and insert the username for the second time. Using something like the above, we can detect this in PHP.
Now, let's go back to our links table, but with a new definition.
CREATE TABLE links
(
link_id INT(10) NOT NULL AUTO_INCREMENT,
url VARCHAR(255) NOT NULL,
last_visited TIMESTAMP,
PRIMARY KEY (link_id),
UNIQUE KEY (url)
)
and let's insert "http://www.example.com" into the database.
INSERT INTO links (url, last_visited) VALUES ('http://www.example.com/', NOW());
If we try and insert it again....
ERROR 1062 (23000): Duplicate entry 'http://www.example.com/' for key 'url'
But what happens if we want to update the time it was last visited?
Well, we could do something complex with PHP, like so:-
$result = mysql_query("SELECT * FROM links WHERE url = 'http://www.example.com/'", $conn);
if (!$result)
{
die('There was a problem executing the query');
}
$number_of_rows = mysql_num_rows($result);
if ($number_of_rows > 0)
{
$result = mysql_query("UPDATE links SET last_visited = NOW() WHERE url = 'http://www.example.com/'", $conn);
if (!$result)
{
die('There was a problem updating the links table');
}
}
Or, even grab the id of the row in the database and use that to update it.
$result = mysql_query("SELECT * FROM links WHERE url = 'http://www.example.com/'", $conn);
if (!$result)
{
die('There was a problem executing the query');
}
$number_of_rows = mysql_num_rows($result);
if ($number_of_rows > 0)
{
$row = mysql_fetch_assoc($result);
$result = mysql_query('UPDATE links SET last_visited = NOW() WHERE link_id = ' . intval($row['link_id']), $conn);
if (!$result)
{
die('There was a problem updating the links table');
}
}
But MySQL has a nice built-in feature called REPLACE INTO.
Let's see how it works.
mysql> SELECT * FROM links;
+---------+-------------------------+---------------------+
| link_id | url | last_visited |
+---------+-------------------------+---------------------+
| 1 | http://www.example.com/ | 2011-08-19 23:48:03 |
+---------+-------------------------+---------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO links (url, last_visited) VALUES ('http://www.example.com/', NOW());
ERROR 1062 (23000): Duplicate entry 'http://www.example.com/' for key 'url'
mysql> REPLACE INTO links (url, last_visited) VALUES ('http://www.example.com/', NOW());
Query OK, 2 rows affected (0.00 sec)
mysql> SELECT * FROM links;
+---------+-------------------------+---------------------+
| link_id | url | last_visited |
+---------+-------------------------+---------------------+
| 2 | http://www.example.com/ | 2011-08-19 23:55:55 |
+---------+-------------------------+---------------------+
1 row in set (0.00 sec)
Notice that when using REPLACE INTO, it's updated the last_visited time, and not thrown an error!
This is because REPLACE INTO works on rows that conflict with a primary key or unique index. As you've set url to be unique, MySQL finds the existing row with that url, deletes it, and inserts a new row with the values you passed in. That is why it reports "2 rows affected" (one deleted, one inserted), and why the link_id has changed - which is a bit unexpected! (In fact, I didn't realise this would happen until I just saw it happen!)
But what if you wanted to add a new URL? Well, REPLACE INTO will happily insert a new row if it can't find a matching unique row!
mysql> REPLACE INTO links (url, last_visited) VALUES ('http://www.stackoverflow.com/', NOW());
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM links;
+---------+-------------------------------+---------------------+
| link_id | url | last_visited |
+---------+-------------------------------+---------------------+
| 2 | http://www.example.com/ | 2011-08-20 00:00:07 |
| 3 | http://www.stackoverflow.com/ | 2011-08-20 00:01:22 |
+---------+-------------------------------+---------------------+
2 rows in set (0.00 sec)
I hope this answers your question, and gives you a bit more information about how MySQL works!
Are you concerned purely about URLs that are the exact same string? If so, there is a lot of good advice in other answers. Or do you also have to worry about canonicalization?
For example, http://google.com and http://go%4fgle.com are the exact same URL, but would be allowed as duplicates by any of the database-only techniques. If this is an issue, you should preprocess the URLs to resolve any character escape sequences.
Depending on where the URLs are coming from, you will also have to worry about parameters and whether they are significant in your application.
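As a rough illustration of that preprocessing step, here is a minimal Python sketch that resolves percent-escapes in the host part and lowercases it. The function name normalize_host is made up for this example, and it assumes the URL carries no user:password@ prefix (lowercasing that would be wrong). It deliberately leaves the path and query untouched, because decoding something like %2F there can change which resource the URL refers to.

```python
from urllib.parse import urlsplit, urlunsplit, unquote


def normalize_host(url: str) -> str:
    """Decode percent-escapes in the host and lowercase it.

    Hypothetical helper for illustration; assumes no userinfo in the URL.
    """
    parts = urlsplit(url)
    # Domain names are case-insensitive, so decoding then lowercasing is safe here.
    host = unquote(parts.netloc).lower()
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(normalize_host("http://go%4fgle.com"))
```

With this in place, both spellings from the example above collapse to the same string before they reach the database, so a UNIQUE constraint can catch the duplicate.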
First, prepare the database.
Domain names aren't case-sensitive, but you have to assume the rest of a URL is. (Not all web servers respect case in URLs, but most do, and you can't easily tell by looking.)
Assuming you need to store more than a domain name, use a case-sensitive collation.
If you decide to store the URL in two columns--one for the domain name and one for the resource locator--consider using a case-insensitive collation for the domain name, and a case-sensitive collation for the resource locator. If I were you, I'd test both ways (URL in one column vs. URL in two columns).
Put a UNIQUE constraint on the URL column. Or on the pair of columns, if you store the domain name and resource locator in separate columns, as UNIQUE (url, resource_locator).
Use a CHECK() constraint to keep encoded URLs out of the database. This CHECK() constraint is essential to keep bad data from coming in through a bulk copy or through the SQL shell.
Second, prepare the URL.
Domain names aren't case-sensitive. If you store the full URL in one column, lowercase the domain name on all URLs. But be aware that some languages have uppercase letters that have no lowercase equivalent.
Think about trimming trailing characters. For example, these two URLs from amazon.com point to the same product. You probably want to store the second version, not the first.
http://www.amazon.com/Systemantics-Systems-Work-Especially-They/dp/070450331X/ref=sr_1_1?ie=UTF8&qid=1313583998&sr=8-1
http://www.amazon.com/Systemantics-Systems-Work-Especially-They/dp/070450331X
Decode encoded URLs. (See PHP's urldecode() function. Note carefully its shortcomings, as described in that page's comments.) Personally, I'd prefer to handle these kinds of transformations in the database rather than in client code. That would involve revoking permissions on the tables and views, and allowing inserts and updates only through stored procedures; the stored procedures handle all the string operations that put the URL into a canonical form. But keep an eye on performance when you try that. CHECK() constraints (see above) are your safety net.
Third, if you're inserting only the URL, don't test for its existence first. Instead, try to insert and trap the error that you'll get if the value already exists. Testing and inserting hits the database twice for every new URL. Insert-and-trap just hits the database once. Note carefully that insert-and-trap isn't the same thing as insert-and-ignore-errors. Only one particular error means you violated the unique constraint; other errors mean there are other problems.
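The insert-and-trap idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, and the table name urls and the function insert_url are invented for the example. With MySQL you would check for error 1062 (duplicate entry) instead of SQLite's message text.

```python
import sqlite3


def insert_url(conn: sqlite3.Connection, url: str) -> bool:
    """Try the insert; return True if inserted, False if the URL already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO urls (url) VALUES (?)", (url,))
        return True
    except sqlite3.IntegrityError as exc:
        # Trap only the unique-constraint violation; anything else is a real problem.
        if "UNIQUE constraint failed" in str(exc):
            return False
        raise


# Stand-in schema (assumption): one URL column with a UNIQUE constraint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)")

print(insert_url(conn, "http://www.example.com/"))  # first attempt
print(insert_url(conn, "http://www.example.com/"))  # second attempt hits the constraint
```

Each call makes a single round trip to the database, and errors other than the duplicate-key one are re-raised rather than swallowed.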
On the other hand, if you're inserting the URL along with some other data in the same row, you need to decide ahead of time whether you'll handle duplicate urls by
deleting the old row and inserting a new one (See MySQL's REPLACE extension to SQL)
updating existing values (See ON DUPLICATE KEY UPDATE)
ignoring the issue
requiring the user to take further action
REPLACE eliminates the need to trap duplicate key errors, but it might have unfortunate side effects if there are foreign key references.
To guarantee uniqueness you need to add a unique constraint. Assuming your table name is "urls" and the column name is "url", you can add the unique constraint with this alter table command:
alter table urls add constraint unique_url unique (url);
The alter table will probably fail (who really knows with MySQL) if you've already got duplicate urls in your table.
The simple SQL solutions require a unique field; the logic solutions do not.
You should normalize your urls to ensure there is no duplication. PHP functions such as strtolower() and urldecode() or rawurldecode() can help with this.
Assumptions: Your table name is 'websites', the column name for your url is 'url', and the arbitrary data to be associated with the url is in the column 'data'.
Logic Solutions
SELECT COUNT(*) AS UrlResults FROM websites WHERE url='http://www.domain.com'
Test the previous query with if statements in SQL or PHP to ensure that it is 0 before you continue with an INSERT statement.
Simple SQL Statements
Scenario 1: Your db is a first-come, first-served table and you have no desire to have duplicate entries in the future.
ALTER TABLE websites ADD UNIQUE (url)
This will prevent any row from being inserted into the table if its url value already exists in that column.
Scenario 2: You want the most up to date information for each url and don't want to duplicate content. There are two solutions for this scenario. (These solutions also require 'url' to be unique so the solution in Scenario 1 will also need to be carried out.)
REPLACE INTO websites (url, data) VALUES ('http://www.domain.com', 'random data')
This will trigger a DELETE action if a row exists followed by an INSERT in all cases, so be careful with ON DELETE declarations.
INSERT INTO websites (url, data) VALUES ('http://www.domain.com', 'random data')
ON DUPLICATE KEY UPDATE data='random data'
This will trigger an UPDATE action if a row exists and an INSERT if it does not.
In considering a solution to this problem, you need to first define what a "duplicate URL" means for your project. This will determine how to canonicalize the URLs before adding them to the database.
There are at least two definitions:
Two URLs are considered duplicates if they represent the same resource knowing nothing about the corresponding web service that generates the corresponding content. Some considerations include:
The scheme and domain name portion of the URLs are case-insensitive, so HTTP://WWW.STACKOVERFLOW.COM/ is the same as http://www.stackoverflow.com/.
If one URL specifies a port, but it is the conventional port for the scheme and they are otherwise equivalent, then they are the same ( http://www.stackoverflow.com/ and http://www.stackoverflow.com:80/).
If the parameters in the query string are simple rearrangements and the parameter names are all different, then they are the same; e.g. http://authority/?a=test&b=test and http://authority/?b=test&a=test. Note that http://authority/?a%5B%5D=test1&a%5B%5D=test2 is not the same, by this first definition of sameness, as http://authority/?a%5B%5D=test2&a%5B%5D=test1.
If the scheme is HTTP or HTTPS, then the hash portions of the URLs can be removed, as this portion of the URL is not sent to the web server.
A shortened IPv6 address can be expanded.
Append a trailing forward slash to the authority only if it is missing.
Unicode canonicalization changes the referenced resource; e.g. you can't conclude that http://google.com/?q=%C3%84 (%C3%84 represents 'Ä' in UTF-8) is the same as http://google.com/?q=A%CC%88 (%CC%88 represents U+0308, COMBINING DIAERESIS).
If the scheme is HTTP or HTTPS, 'www.' in one URL's authority can not simply be removed if the two URLs are otherwise equivalent, as the text of the domain name is sent as the value of the Host HTTP header, and some web servers use virtual hosts to send back different content based on this header. More generally, even if the domain names resolve to the same IP address, you can not conclude that the referenced resources are the same.
Apply basic URL canonicalization (e.g. lower case the scheme and domain name, supply the default port, stable sort query parameters by parameter name, remove the hash portion in the case of HTTP and HTTPS, ...), and take into account knowledge of the web service. Maybe you will assume that all web services are smart enough to canonicalize Unicode input (Wikipedia is, for example), so you can apply Unicode Normalization Form Canonical Composition (NFC). You would strip 'www.' from all Stack Overflow URLs. You could use PostRank's postrank-uri code, ported to PHP, to remove all sorts of pieces of the URLs that are unnecessary (e.g. &utm_source=...).
Definition 1 leads to a stable solution (i.e. there is no further canonicalization that can be performed and the canonicalization of a URL will not change). Definition 2, which I think is what a human considers the definition of URL canonicalization, leads to a canonicalization routine that can yield different results at different moments in time.
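To make Definition 1 more concrete, here is a minimal Python sketch of some of the rules listed above: lowercase the scheme and host, drop the port when it is the conventional one for the scheme, stable-sort query parameters by name, drop the hash portion for HTTP and HTTPS, and add the trailing slash when the path is empty. The function name canonicalize and the DEFAULT_PORTS table are assumptions made for this example. It does not expand shortened IPv6 addresses, and it drops the default port rather than supplying it; either choice works as long as it is applied consistently.

```python
from urllib.parse import urlsplit, urlunsplit

# Conventional ports for the schemes handled here (assumption: HTTP and HTTPS only).
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Apply a subset of the Definition 1 rules to a URL string."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if ":" in host:  # IPv6 literal: restore the brackets that .hostname strips
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:  # the login stays case-sensitive
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = f"{userinfo}@{netloc}"
    path = parts.path or "/"
    # sorted() is stable, so repeated names such as a[] keep their relative order.
    pairs = parts.query.split("&") if parts.query else []
    query = "&".join(sorted(pairs, key=lambda p: p.split("=", 1)[0]))
    fragment = "" if scheme in DEFAULT_PORTS else parts.fragment
    return urlunsplit((scheme, netloc, path, query, fragment))


print(canonicalize("HTTP://WWW.STACKOVERFLOW.COM:80"))
print(canonicalize("http://authority/?b=test&a=test"))
```

The query string is split on the raw text instead of being decoded and re-encoded, so the percent-escapes in the stored URL stay exactly as they were given.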
Whichever definition you choose, I suggest that you use separate columns for the scheme, login, host, port, and path portions. This will allow you to use indexes intelligently. The columns for scheme and host can use a case-insensitive character collation (in MySQL, the collations whose names end in "_ci"), but the columns for the login and path need to use a binary, case-sensitive collation. Also, if you use Definition 2, you need to preserve the original scheme, authority, and path portions, as certain canonicalization rules might be added or removed from time to time.
EDIT: Here are example table definitions:
CREATE TABLE `urls1` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`scheme` VARCHAR(20) NOT NULL,
`canonical_login` VARCHAR(100) DEFAULT NULL COLLATE 'utf8mb4_bin',
`canonical_host` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci', /* the "ci" stands for case-insensitive. Also, we want 'utf8mb4_unicode_ci'
rather than 'utf8mb4_general_ci' because 'utf8mb4_general_ci' treats accented characters as equivalent. */
`port` INT UNSIGNED,
`canonical_path` VARCHAR(4096) NOT NULL COLLATE 'utf8mb4_bin',
PRIMARY KEY (`id`),
INDEX (`canonical_host`(10), `scheme`)
) ENGINE = 'InnoDB';
CREATE TABLE `urls2` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`canonical_scheme` VARCHAR(20) NOT NULL,
`canonical_login` VARCHAR(100) DEFAULT NULL COLLATE 'utf8mb4_bin',
`canonical_host` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`port` INT UNSIGNED,
`canonical_path` VARCHAR(4096) NOT NULL COLLATE 'utf8mb4_bin',
`orig_scheme` VARCHAR(20) NOT NULL,
`orig_login` VARCHAR(100) DEFAULT NULL COLLATE 'utf8mb4_bin',
`orig_host` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`orig_path` VARCHAR(4096) NOT NULL COLLATE 'utf8mb4_bin',
PRIMARY KEY (`id`),
INDEX (`canonical_host`(10), `canonical_scheme`),
INDEX (`orig_host`(10), `orig_scheme`)
) ENGINE = 'InnoDB';
Table `urls1` is for storing canonical URLs according to definition 1. Table `urls2` is for storing canonical URLs according to definition 2.
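Unfortunately you will not be able to specify a UNIQUE constraint on the tuple (`scheme`/`canonical_scheme`, `canonical_login`, `canonical_host`, `port`, `canonical_path`), as MySQL limits the length of InnoDB index keys (767 bytes per column with the older row formats, 3072 bytes with the DYNAMIC row format), and `canonical_path` alone can exceed that.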
I don't know the syntax for MySQL, but all you need to do is wrap your INSERT in an IF statement that queries the table to see whether a record with the given url EXISTS. If it exists, don't insert a new record.
In MSSQL you can do this:
IF NOT EXISTS (SELECT 1 FROM YOURTABLE WHERE URL = 'URL')
INSERT INTO YOURTABLE (...) VALUES (...)
If you want to insert urls into the table, but only those that don't exist already, you can add a UNIQUE constraint on the column and add IGNORE to your INSERT query so that you don't get an error.
Example: INSERT IGNORE INTO urls SET url = 'url-to-insert'
First things first. If you haven't already created the table, or you created a table but do not have data in it, then you need to add a unique constraint or a unique index. More information about choosing between an index and a constraint follows at the end of the post. But they both accomplish the same thing: enforcing that the column only contains unique values.
To create a table with a unique index on this column, you can use:
CREATE TABLE MyURLTable(
ID INTEGER NOT NULL AUTO_INCREMENT
,URL VARCHAR(512)
,PRIMARY KEY(ID)
,UNIQUE INDEX IDX_URL(URL)
);
If you just want a unique constraint, and no index on that table, you can use
CREATE TABLE MyURLTable(
ID INTEGER NOT NULL AUTO_INCREMENT
,URL VARCHAR(512)
,PRIMARY KEY(ID)
,CONSTRAINT UNIQUE UNIQUE_URL(URL)
);
Now, if you already have a table, and there is no data in it, then you can add the index or constraint to the table with one of the following pieces of code.
ALTER TABLE MyURLTable
ADD UNIQUE INDEX IDX_URL(URL);
ALTER TABLE MyURLTable
ADD CONSTRAINT UNIQUE UNIQUE_URL(URL);
Now, you may already have a table with some data in it. In that case, you may already have some duplicate data in it. You can try creating the constraint or index shown above, and it will fail if you already have duplicate data. If you don't have duplicate data, great. If you do, you'll have to remove the duplicates. You can see a list of urls with duplicates using the following query.
SELECT URL,COUNT(*),MIN(ID)
FROM MyURLTable
GROUP BY URL
HAVING COUNT(*) > 1;
To delete rows that are duplicates, and keep one, do the following:
DELETE RemoveRecords
FROM MyURLTable As RemoveRecords
LEFT JOIN
(
SELECT MIN(ID) AS ID -- the lowest ID for every URL, duplicated or not
FROM MyURLTable
GROUP BY URL
) AS KeepRecords
ON RemoveRecords.ID = KeepRecords.ID
WHERE KeepRecords.ID IS NULL;
Now that you have deleted all the duplicate records, you can go ahead and create your index or constraint. Now, if you want to insert a value into your database, you should use something like:
INSERT IGNORE INTO MyURLTable(URL)
VALUES('http://www.example.com');
That will attempt to do the insert, and if it finds a duplicate, nothing will happen. Now, let's say you have other columns. You can do something like this:
INSERT INTO MyURLTable(URL,Visits)
VALUES('http://www.example.com',1)
ON DUPLICATE KEY UPDATE Visits=Visits+1;
That will try to insert the value, and if it finds the URL, it will update the record by incrementing the visits counter. Of course, you can always do a plain old insert and handle the resulting error in your PHP code.
Now, as for whether you should use constraints or indexes, that depends on a lot of factors. Indexes make for faster lookups, so your performance will be better as the table gets bigger, but storing the index will take up extra space. Indexes also usually make inserts and updates take longer, because the index has to be updated as well. However, since the value will have to be looked up either way to enforce the uniqueness, in this case it may be quicker to just have the index anyway. As with anything performance related, the answer is to try both options and profile the results to see which works best for your situation.
If you just want a yes or no answer this syntax should give you the best performance.
select if(exists (select url from urls where url = 'http://asdf.com'), 1, 0) from dual
If you just want to make sure there are no duplicates, then add a unique index to the url field. That way there is no need to explicitly check whether the url exists: just insert as normal, and if it is already there, the insert will fail with a duplicate key error.
The answer depends on whether you want to know when an attempt is made to enter a record with a duplicate field. If you don't care then use the "INSERT... ON DUPLICATE KEY" syntax as this will make your attempt quietly succeed without creating a duplicate.
If on the other hand you want to know when such an event happens and prevent it, then you should use a unique key constraint which will cause the attempted insert/update to fail with a meaningful error.
$url = "http://www.scroogle.com";
$query = "SELECT `id` FROM `urls` WHERE `url` = '$url' ";
$resultdb = mysql_query($query) or die(mysql_error());
list($idtemp) = mysql_fetch_array($resultdb) ;
if(empty($idtemp)) // if $idtemp is empty the url doesn't exist and we go ahead and insert it into the db.
{
mysql_query("INSERT INTO urls (`url` ) VALUES('$url') ") or die (mysql_error());
}else{
//do something else if the url already exists in the DB
}
Make the column the primary key
You can locate (and remove) using a self-join. Your table has some URL and also some PK (We know that the PK is not the URL because otherwise you would not be allowed to have duplicates)
SELECT
*
FROM
yourTable a
JOIN
yourTable b -- Join the same table
ON b.[URL] = a.[URL] -- where the URL's match
AND b.[PK] <> a.[PK] -- but the PK's are different
This will return all rows which have duplicated URLs.
Say, though, that you wanted to only select the duplicates and exclude the original.... Well you would need to decide what constitutes the original. For the purpose of this answer let's assume that the lowest PK is the "original"
All you need to do is add the following clause to the above query:
WHERE
a.[PK] NOT IN (
SELECT
TOP 1 c.[PK] -- Only grabbing the original!
FROM
yourTable c
WHERE
c.[URL] = a.[URL] -- has the same URL
ORDER BY
c.[PK] ASC) -- sort it by whatever your criterion is for "original"
Now you have a set of all non-original duplicated rows. You could easily execute a DELETE or whatever you like from this result set.
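If it helps to see the selection rule outside of SQL, here is a small Python sketch of the same logic: for each URL the row with the lowest PK counts as the "original", and every other row with that URL is a non-original duplicate. The function name and the (pk, url) tuple layout are assumptions made for this example, not part of any schema above.

```python
def non_original_duplicates(rows):
    """Return the PKs of rows whose URL already appears with a lower PK.

    rows is a list of (pk, url) tuples; the layout is assumed for illustration.
    """
    lowest = {}
    for pk, url in rows:
        # Remember the smallest PK seen for each URL: that row is the "original".
        if url not in lowest or pk < lowest[url]:
            lowest[url] = pk
    return sorted(pk for pk, url in rows if pk != lowest[url])


rows = [
    (1, "http://www.example.com/"),
    (2, "http://www.stackoverflow.com/"),
    (3, "http://www.example.com/"),
    (4, "http://www.example.com/"),
]
print(non_original_duplicates(rows))
```

Changing what counts as the "original" (say, the most recently visited row) only means changing the comparison inside the loop.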
Note that this approach may be inefficient, in part because MySQL doesn't always handle IN well, but I understand from the OP that this is a sort of one-off "clean up" on the table, not a check that runs every time.
If you want to check at INSERT time whether or not a value already exists you can run something like this
SELECT
1
WHERE
EXISTS (SELECT * FROM yourTable WHERE [URL] = 'testValue')
If you get a result then you can conclude the value already exists in your DB at least once.
You could do this query:
SELECT url FROM urls WHERE url = 'http://asdf.com' LIMIT 1
Then check if mysql_num_rows() == 1 to see if it exists.