I started by googling and found the article How to write INSERT if NOT EXISTS queries in standard SQL which talks about mutex tables.
I have a table with ~14 million records. If I want to add more data in the same format, is there a way to ensure the record I want to insert does not already exist without using a pair of queries (i.e., one query to check and one to insert if the result set is empty)?
Does a unique constraint on a field guarantee the insert will fail if it's already there?
It seems that with merely a constraint, when I issue the insert via PHP, the script croaks.
Use INSERT IGNORE INTO table.
There's also INSERT … ON DUPLICATE KEY UPDATE syntax, and you can find explanations in 13.2.6.2 INSERT ... ON DUPLICATE KEY UPDATE Statement.
Post from bogdan.org.ua according to Google's webcache:
18th October 2007
To start: as of the latest MySQL, syntax presented in the title is not
possible. But there are several very easy ways to accomplish what is
expected using existing functionality.
There are 3 possible solutions: using INSERT IGNORE, REPLACE, or
INSERT … ON DUPLICATE KEY UPDATE.
Imagine we have a table:
CREATE TABLE `transcripts` (
`ensembl_transcript_id` varchar(20) NOT NULL,
`transcript_chrom_start` int(10) unsigned NOT NULL,
`transcript_chrom_end` int(10) unsigned NOT NULL,
PRIMARY KEY (`ensembl_transcript_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now imagine that we have an automatic pipeline importing transcripts
meta-data from Ensembl, and that due to various reasons the pipeline
might be broken at any step of execution. Thus, we need to ensure two
things:
repeated executions of the pipeline will not destroy our database
repeated executions will not die due to ‘duplicate primary key’ errors.
Method 1: using REPLACE
It’s very simple:
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet
exist, it will be created. However, using this method isn’t efficient
for our case: we do not need to overwrite existing records, it’s fine
just to skip them.
Method 2: using INSERT IGNORE
Also very simple:
INSERT IGNORE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
Here, if the ‘ensembl_transcript_id’ is already present in the
database, it will be silently skipped (ignored). (To be more precise,
here’s a quote from MySQL reference manual: “If you use the IGNORE
keyword, errors that occur while executing the INSERT statement are
treated as warnings instead. For example, without IGNORE, a row that
duplicates an existing UNIQUE index or PRIMARY KEY value in the table
causes a duplicate-key error and the statement is aborted.”.) If the
record doesn’t yet exist, it will be created.
This second method has several potential weaknesses, including
non-abortion of the query in case any other problem occurs (see the
manual). Thus it should be used only if the query was previously
tested without the IGNORE keyword.
Method 3: using INSERT … ON DUPLICATE KEY UPDATE:
The third option is to use INSERT … ON DUPLICATE KEY UPDATE
syntax, and in the UPDATE part just do some meaningless
(empty) operation, like calculating 0+0 (Geoffray suggests doing the
id=id assignment for the MySQL optimization engine to ignore this
operation). The advantage of this method is that it only ignores
duplicate key events, and still aborts on other errors.
As a final notice: this post was inspired by Xaprb. I’d also advise to
consult his other post on writing flexible SQL queries.
Solution:
INSERT INTO `table` (`value1`, `value2`)
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1)
Explanation:
The innermost query
SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1
used as the WHERE NOT EXISTS-condition detects if there already exists a row with the data to be inserted. After one row of this kind is found, the query may stop, hence the LIMIT 1 (micro-optimization, may be omitted).
The intermediate query
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
represents the values to be inserted. DUAL refers to a special one row, one column table present by default in all Oracle databases (see https://en.wikipedia.org/wiki/DUAL_table). On a MySQL-Server version 5.7.26 I got a valid query when omitting FROM DUAL, but older versions (like 5.5.60) seem to require the FROM information. By using WHERE NOT EXISTS the intermediate query returns an empty result set if the innermost query found matching data.
The outer query
INSERT INTO `table` (`value1`, `value2`)
inserts the data, if any is returned by the intermediate query.
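The same three-level query can be tried out end to end with a small script. This is only a sketch, using Python's standard sqlite3 module as a stand-in for MySQL; the table name items and the helper insert_if_absent are hypothetical. SQLite accepts the SELECT without FROM DUAL, whereas on older MySQL versions you would keep FROM DUAL as described above.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (value1 TEXT, value2 TEXT)")

def insert_if_absent(conn, value1, value2):
    """Insert (value1, value2) unless an identical row already exists."""
    cur = conn.execute(
        "INSERT INTO items (value1, value2) "
        "SELECT ?, ? "                      # intermediate query: the candidate row
        "WHERE NOT EXISTS ("                # innermost query: is it already there?
        "  SELECT 1 FROM items WHERE value1 = ? AND value2 = ?)",
        (value1, value2, value1, value2),
    )
    conn.commit()
    return cur.rowcount  # number of rows actually inserted by this statement

first = insert_if_absent(conn, "stuff for value1", "stuff for value2")
second = insert_if_absent(conn, "stuff for value1", "stuff for value2")
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Note that without a unique index two concurrent sessions could still both pass the NOT EXISTS check, so this pattern alone is not a guarantee under concurrency.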
In MySQL, ON DUPLICATE KEY UPDATE or INSERT IGNORE can be viable solutions.
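An example of ON DUPLICATE KEY UPDATE, based on mysql.com. If column a is declared UNIQUE and already contains the value 1, the following two statements have a similar effect:
INSERT INTO t1 (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE t1 SET c=c+1 WHERE a=1;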
The INSERT syntax with the optional IGNORE keyword, based on mysql.com:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
{VALUES | VALUE} ({expr | DEFAULT},...),(...),...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
SET col_name={expr | DEFAULT}, ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Any simple constraint should do the job, if an exception is acceptable. Examples:
primary key if not surrogate
unique constraint on a column
multi-column unique constraint
Sorry if this seems deceptively simple. I know it looks bad compared to the link you shared with us. ;-(
But I nevertheless give this answer, because it seems to fill your need. (If not, it may prompt you to update your requirements, which would be "a Good Thing"(TM) also.)
If an insert would break the database unique constraint, an exception is thrown at the database level and relayed by the driver. It will certainly stop your script with a failure, unless you handle it. It must be possible in PHP to address that case...
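To illustrate the idea of letting the constraint do the work and catching the error in the application, here is a minimal sketch. It uses Python's standard sqlite3 module rather than PHP and MySQL, so the exception class differs from what a PHP driver would raise; the users table and the try_insert helper are hypothetical names.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")

def try_insert(conn, email):
    """Return True if the row was inserted, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row; the script keeps running.
        return False

results = [try_insert(conn, "a@example.com"), try_insert(conn, "a@example.com")]
```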
Try the following (inside a stored procedure, since MySQL only allows IF statements in stored programs):
IF (SELECT COUNT(*) FROM beta WHERE name = 'John') > 0 THEN
UPDATE alfa SET c1 = (SELECT id FROM beta WHERE name = 'John');
ELSE
INSERT INTO beta (name) VALUES ('John');
INSERT INTO alfa (c1) VALUES (LAST_INSERT_ID());
END IF;
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet exist, it will be created.
Here is a PHP function that will insert a row only if no existing row has all of the specified column values.
If one of the columns differs, the row will be added.
If the table is empty, the row will be added.
If a row exists where all the specified columns have the specified values, the row won't be added.
function insert_unique($table, $vars)
{
if (count($vars)) {
$table = mysql_real_escape_string($table);
$vars = array_map('mysql_real_escape_string', $vars);
$req = "INSERT INTO `$table` (`". join('`, `', array_keys($vars)) ."`) ";
$req .= "SELECT '". join("', '", $vars) ."' FROM DUAL ";
$req .= "WHERE NOT EXISTS (SELECT 1 FROM `$table` WHERE ";
foreach ($vars AS $col => $val)
$req .= "`$col`='$val' AND ";
$req = substr($req, 0, -5) . ") LIMIT 1";
$res = mysql_query($req) OR die();
return mysql_insert_id();
}
return False;
}
Example usage:
<?php
insert_unique('mytable', array(
'mycolumn1' => 'myvalue1',
'mycolumn2' => 'myvalue2',
'mycolumn3' => 'myvalue3'
)
);
?>
There are several answers that cover how to solve this if you have a UNIQUE index that you can check against with ON DUPLICATE KEY or INSERT IGNORE. That is not always the case, and as UNIQUE has a length constraint (1000 bytes) you might not be able to change that. For example, I had to work with metadata in WordPress (wp_postmeta).
I finally solved it with two queries:
UPDATE wp_postmeta SET meta_value = ? WHERE meta_key = ? AND post_id = ?;
INSERT INTO wp_postmeta (post_id, meta_key, meta_value) SELECT DISTINCT ?, ?, ? FROM wp_postmeta WHERE NOT EXISTS(SELECT * FROM wp_postmeta WHERE meta_key = ? AND post_id = ?);
Query 1 is a regular UPDATE query without any effect when the data set in question is not there. Query 2 is an INSERT which depends on a NOT EXISTS, i.e. the INSERT is only executed when the data set doesn't exist.
Something worth noting is that INSERT IGNORE will still increment the AUTO_INCREMENT counter of the primary key whether or not the row was actually inserted, just like a normal INSERT would.
This will cause gaps in your primary keys that might make a programmer mentally unstable. Or if your application is poorly designed and depends on perfect incremental primary keys, it might become a headache.
Look into innodb_autoinc_lock_mode = 0 (server setting, and comes with a slight performance hit), or use a SELECT first to make sure your query will not fail (which also comes with a performance hit and extra code).
Update or insert without known primary key
If you already have a unique or primary key, the other answers with either INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO ... should work fine (note that replace into deletes if exists and then inserts - thus does not partially update existing values).
But suppose you have the values for some_column_id and some_type, the combination of which is known to be unique, you want to update some_value if the row exists or insert it if not, and you want to do it in just one query (to avoid using a transaction). This might be a solution:
INSERT INTO my_table (id, some_column_id, some_type, some_value)
SELECT t.id, t.some_column_id, t.some_type, t.some_value
FROM (
SELECT id, some_column_id, some_type, some_value
FROM my_table
WHERE some_column_id = ? AND some_type = ?
UNION ALL
SELECT s.id, s.some_column_id, s.some_type, s.some_value
FROM (SELECT NULL AS id, ? AS some_column_id, ? AS some_type, ? AS some_value) AS s
) AS t
LIMIT 1
ON DUPLICATE KEY UPDATE
some_value = ?
Basically, the query executes this way (less complicated than it may look):
Select an existing row via the WHERE clause match.
Union that result with a potential new row (table s), where the column values are explicitly given (s.id is NULL, so it will generate a new auto-increment identifier).
If an existing row is found, then the potential new row from table s is discarded (due to LIMIT 1 on table t), and it will always trigger an ON DUPLICATE KEY which will UPDATE the some_value column.
If an existing row is not found, then the potential new row is inserted (as given by table s).
Note: Every table in a relational database should have at least a primary auto-increment id column. If you don't have this, add it, even when you don't need it at first sight. It is definitely needed for this "trick".
INSERT INTO table_name (columns) VALUES (values) ON CONFLICT (id) DO NOTHING;
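Note that ON CONFLICT ... DO NOTHING is PostgreSQL and SQLite syntax, not MySQL syntax; in MySQL the closest equivalents are the INSERT IGNORE and ON DUPLICATE KEY UPDATE forms discussed above. A small sketch of how it behaves, using Python's standard sqlite3 module (this assumes the bundled SQLite is 3.24 or newer; the table t is hypothetical):

```python
import sqlite3

# In-memory SQLite database; ON CONFLICT needs SQLite >= 3.24 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")

sql = "INSERT INTO t (id, name) VALUES (?, ?) ON CONFLICT (id) DO NOTHING"
conn.execute(sql, (1, "first"))
conn.execute(sql, (1, "second"))  # same id: silently skipped, no error raised

rows = conn.execute("SELECT id, name FROM t").fetchall()
```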
I have a PHP 7.3 project which connects via PDO to a MySQL database or a MSSQL database, depending on being run on Linux or Windows.
I want to insert new values into a table if the unique value is not yet in that table. If it is already in the table, I want to update the non-unique values.
I searched a lot of docs and SO posts, but I couldn't find a syntax that does this in one query for both database types.
SQL Server query:
IF (EXISTS (SELECT * FROM failed_logins_ip_address WHERE ip_address = 'xxx'))
BEGIN
UPDATE failed_logins_ip_address
SET attempts_count = attempts_count + 1, attempt_datetime = CURRENT_TIMESTAMP
WHERE ip_address = 'xxx'
END
ELSE
BEGIN
INSERT INTO failed_logins_ip_address (ip_address, attempts_count, attempt_datetime)
VALUES ('xxx', 1, CURRENT_TIMESTAMP)
END
MySQL query:
INSERT INTO failed_logins_ip_address (ip_address, attempts_count, attempt_datetime)
VALUES ('xxx', 1, CURRENT_TIMESTAMP)
ON DUPLICATE KEY
UPDATE attempts_count = attempts_count + 1, attempt_datetime = CURRENT_TIMESTAMP
The 'ip_address' column is unique, and the table structure is identical for both MSSQL and MySQL.
Is there a syntax, which can do an IF INSERT ELSE UPDATE in both database types?
Yes, I do (PDO) parameter binding, xxx is just to shorten the code snippet.
Yes, I could use identical syntax if I did it in two queries (first select, then insert or update) but I want to avoid (hopefully) unnecessary queries.
No, I do not want to insert a row for every login attempt just to avoid the update, because I do not need that data.
If the REPLACE approach would work: this does not update, it deletes and inserts, which I also do not want.
My current solution: I check in PHP for the current database type and switch/case the query strings. It is clean but one string is even less smelly ;-)
UPDATE:
I changed the MSSQL query around, from IF NOT EXISTS to IF EXISTS, to improve efficiency. UPDATE will occur a lot more often than INSERT, so in most cases only the first (sub)query will be executed.
After digging deeper, I found this post by Derek Dieter, which describes how to replace SQL Server's IF EXISTS ... ELSE with WHERE EXISTS:
https://sqlserverplanet.com/optimization/avoiding-if-else-by-using-where-exists
The WHERE EXISTS syntax seems to be the same in MySQL and MSSQL.
Derek Dieter's example, with IF EXISTS:
IF NOT EXISTS (SELECT 1 FROM customer_totals WHERE cust_id = @cust_id)
BEGIN
INSERT INTO customer_totals
(
cust_id,
order_amt
)
SELECT
cust_id = @cust_id
,order_amt = @order_amt
END
ELSE
BEGIN
UPDATE customer
SET order_amt = order_amt + @order_amt
WHERE cust_id = @cust_id
END
Derek Dieter's example, with WHERE EXISTS:
INSERT INTO customer_totals
(
cust_id,
order_amt
)
SELECT TOP 1 -- important since we’re not constraining any records
cust_id = @cust_id
,order_amt = @order_amt
FROM customer_totals ct
WHERE NOT EXISTS -- this replaces the if statement
(
SELECT 1
FROM customer_totals
WHERE cust_id = @cust_id
)
SET @rowcount = @@ROWCOUNT -- return back the rows that got inserted
UPDATE customer
SET order_amt = order_amt + @order_amt
WHERE @rowcount = 0
AND cust_id = @cust_id -- if no rows were inserted, the cust_id must exist, so update
I still have to test it, though, in MySQL. I'll update this post and add the code, if it works.
If you are using PHP, then you are calling the code through an interface. You can do the following:
Create a unique index on ip_address.
Attempt to insert a new row. This will fail if the row already exists.
If the insert fails (particularly with a duplicate key error), then update the existing row.
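The three steps above can be sketched as follows. This is not PHP/PDO code; it uses Python's standard sqlite3 module as a stand-in, so the exception type and the database differ from the question's setup. The table and column names are taken from the question, while the helper record_failed_login and the sample IP address are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL / SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE failed_logins_ip_address ("
    "ip_address TEXT UNIQUE, "          # step 1: unique index on ip_address
    "attempts_count INTEGER, "
    "attempt_datetime TEXT)"
)

def record_failed_login(conn, ip_address):
    """Insert a new row, or bump the counter if the address already exists."""
    try:
        with conn:  # step 2: attempt the insert
            conn.execute(
                "INSERT INTO failed_logins_ip_address "
                "(ip_address, attempts_count, attempt_datetime) "
                "VALUES (?, 1, CURRENT_TIMESTAMP)",
                (ip_address,),
            )
    except sqlite3.IntegrityError:
        with conn:  # step 3: duplicate key, so update the existing row
            conn.execute(
                "UPDATE failed_logins_ip_address "
                "SET attempts_count = attempts_count + 1, "
                "attempt_datetime = CURRENT_TIMESTAMP "
                "WHERE ip_address = ?",
                (ip_address,),
            )

for _ in range(3):
    record_failed_login(conn, "203.0.113.7")

count = conn.execute(
    "SELECT attempts_count FROM failed_logins_ip_address WHERE ip_address = ?",
    ("203.0.113.7",),
).fetchone()[0]
```

The SQL statements themselves are plain INSERT and UPDATE, so the same two strings should be usable on both databases; only the error handling around them is driver-specific.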
However, your goal of trying to have the same code in both databases is . . . just not going to work very well. The two databases are rather different. Perhaps you should consider constructing stored procedures in each database to do what you want and then calling those stored procedures.
UPDATE AggregatedData SET datenum="734152.979166667",
Timestamp="2010-01-14 23:30:00.000" WHERE datenum="734152.979166667";
It works if the datenum exists, but I want to insert this data as a new row if the datenum does not exist.
UPDATE
the datenum is unique but that's not the primary key
Jai is correct that you should use INSERT ... ON DUPLICATE KEY UPDATE.
Note that you do not need to include datenum in the update clause since it's the unique key, so it should not change. You do need to include all of the other columns from your table. You can use the VALUES() function to make sure the proper values are used when updating the other columns.
Here is your update re-written using the proper INSERT ... ON DUPLICATE KEY UPDATE syntax for MySQL:
INSERT INTO AggregatedData (datenum,Timestamp)
VALUES ("734152.979166667","2010-01-14 23:30:00.000")
ON DUPLICATE KEY UPDATE
Timestamp=VALUES(Timestamp)
Try using this:
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL performs an UPDATE (http://dev.mysql.com/doc/refman/5.7/en/update.html) of the old row...
The ON DUPLICATE KEY UPDATE clause can contain multiple column assignments, separated by commas.
With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values. If you specify the CLIENT_FOUND_ROWS flag to mysql_real_connect() when connecting to mysqld, the affected-rows value is 1 (not 0) if an existing row is set to its current values...
This is not too bad, but we could actually combine everything into one query. I found different solutions on the internet. The simplest, but MySQL only solution is this:
INSERT INTO wp_postmeta (post_id, meta_key)
SELECT
?id,
'page_title'
FROM
DUAL
WHERE
NOT EXISTS (
SELECT
meta_id
FROM
wp_postmeta
WHERE
post_id = ?id
AND meta_key = 'page_title'
);
UPDATE
wp_postmeta
SET
meta_value = ?page_title
WHERE
post_id = ?id
AND meta_key = 'page_title';
Link to documentation.
I had a situation where I needed to update or insert on a table according to two fields (both foreign keys) on which I couldn't set a UNIQUE constraint (so INSERT ... ON DUPLICATE KEY UPDATE won't work). Here's what I ended up using:
replace into last_recogs (id, hasher_id, hash_id, last_recog)
select l.* from
(select id, hasher_id, hash_id, [new_value] from last_recogs
where hasher_id in (select id from hashers where name=[hasher_name])
and hash_id in (select id from hashes where name=[hash_name])
union
select 0, m.id, h.id, [new_value]
from hashers m cross join hashes h
where m.name=[hasher_name]
and h.name=[hash_name]) l
limit 1;
This example is cribbed from one of my databases, with the input parameters (two names and a number) replaced with [hasher_name], [hash_name], and [new_value]. The nested SELECT...LIMIT 1 pulls the first of either the existing record or a new record (last_recogs.id is an autoincrement primary key) and uses that as the value input into the REPLACE INTO.
How can I implement an undo-changes function for a MySQL database, just like Gmail offers when you delete/move/tag an email?
So far I have a system log table that holds the exact sql statements executed by the user.
For example, I'm trying to transform:
INSERT INTO table (id, column1, column2) VALUES (1,'value1', 'value2')
into:
DELETE FROM table WHERE id=1 AND column1='value1' AND column2='value2'
Is there a built-in function to do this, like the Cisco router commands, something like
(NO|UNDO|REVERT) INSERT INTO table (id, column1, column2) VALUES (1,'value1', 'value2')
Maybe my approach is incorrect. Should I save the current state of my row and the changed row, to get back to its original state?
something like:
original_query = INSERT INTO table (id, column1, column2) VALUES (1,'value1', 'value2')
executed_query = INSERT INTO table (id, column1, column2) VALUES (1,'change1', 'change2')
to later transform into:
INSERT INTO table (id, column1, column2) VALUES (1,'value1', 'value2') ON DUPLICATE KEY UPDATE
column1=VALUES(column1), column2=VALUES(column2)
But maybe it won't work with newly inserted rows, or it could cause trouble if I modify the primary key, so I would rather leave those unchanged.
This is my log table:
CREATE TABLE `log` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT ,
`date` datetime NOT NULL ,
`user` int(11) NOT NULL,
`client` text,
`module` int(11) unsigned NOT NULL ,
`query` text NOT NULL ,
`result` tinyint(1) NOT NULL ,
`comment` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8
The objective is, like I said, to undo changes from a certain period of time based on the date of the statement execution, for example (can be in PHP):
function_undo(startdate, enddate)
{
RESULT = SELECT query FROM log WHERE date BETWEEN startdate AND enddate
FOR EACH RESULT AS KEY - query
REVERT query
}
or an undo button to revert one single action (a single logged query).
Is my concept of these 'incremental backup changes' correct, or am I overcomplicating everything?
Consider the obvious fact that the size of my database will double or maybe triple if I store the full queries. Should I store the log in a different database? Or simply erase the log table once I make a scheduled full backup, to keep only recent changes?
Any advice is welcome...
This was always problematic; SQL 2012 addresses this issue.
The temporal model is simple: add interval columns (valid_from, valid_to). Implementing constraints on it, however, is very complicated.
Model manipulation is also simple:
1. insert - new version with valid_from=now, valid_to=null
2. update - new version with valid_from=now, valid_to=null; update the previous version to valid_to=now
3. delete - update the current version to valid_to=now
4. undo delete - update the last version to valid_to=null
5. undo update/insert - delete the current version (if you do not need redo), and update the previous version to valid_to=null if one exists
Redo is more complicated, but similar. This model is typically used in data warehouses to track changes rather than to provide redo, but it should work for redo too. In data warehousing it is also known as a slowly changing dimension.
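The five manipulations listed above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module rather than MySQL, the table `item_versions` and all function names are hypothetical, and `now` is an integer tick supplied by the caller in place of a real timestamp.

```python
import sqlite3

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_versions ("
        " item_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " valid_from INTEGER NOT NULL,"  # hypothetical integer tick instead of a timestamp
        " valid_to INTEGER)"             # NULL marks the current version
    )
    return conn

def insert(conn, item_id, name, now):
    # 1. insert: new version, valid_from=now, valid_to=null
    conn.execute("INSERT INTO item_versions VALUES (?, ?, ?, NULL)", (item_id, name, now))

def update(conn, item_id, name, now):
    # 2. update: close the current version, then open a new one
    conn.execute(
        "UPDATE item_versions SET valid_to = ? WHERE item_id = ? AND valid_to IS NULL",
        (now, item_id),
    )
    insert(conn, item_id, name, now)

def delete(conn, item_id, now):
    # 3. delete: close the current version
    conn.execute(
        "UPDATE item_versions SET valid_to = ? WHERE item_id = ? AND valid_to IS NULL",
        (now, item_id),
    )

def undo_delete(conn, item_id):
    # 4. undo delete: reopen the most recent version
    conn.execute(
        "UPDATE item_versions SET valid_to = NULL WHERE item_id = ? AND valid_from = "
        "(SELECT MAX(valid_from) FROM item_versions WHERE item_id = ?)",
        (item_id, item_id),
    )

def undo_update(conn, item_id):
    # 5. undo update/insert: drop the current version (no redo), reopen the previous one if any
    conn.execute("DELETE FROM item_versions WHERE item_id = ? AND valid_to IS NULL", (item_id,))
    undo_delete(conn, item_id)

def current(conn, item_id):
    row = conn.execute(
        "SELECT name FROM item_versions WHERE item_id = ? AND valid_to IS NULL", (item_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the old versions instead of deleting them would be the starting point for redo, at the cost of tracking which version is "current" separately.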
I think you need to record the reverse of each insert / update / delete queries and then perform them to do the undo. Here is a solution for you but this does not take foreign key relationships (cascade operations) into account. It is just a simple solution concept. Hopefully it will give you more ideas. Here it goes:
Assume you have a table like this, whose changes you want to be able to undo:
create table if not exists table1
(id int auto_increment primary key, mydata varchar(15));
here is the table that records reverse queries
create table if not exists undoer(id int auto_increment primary key,
undoquery text , created datetime );
Create triggers for insert, update and delete operations that save the reverse (rescue) query:
create trigger after_insert after insert on table1 for each row
insert into undoer(undoquery,created) values
(concat('delete from table1 where id = ', cast(new.id as char)), now());
-- quote() escapes embedded quotes and renders a NULL value as the word NULL
create trigger after_update after update on table1 for each row
insert into undoer(undoquery,created) values
(concat('update table1 set mydata = ', quote(old.mydata),
' where id = ', cast(new.id as char)), now());
create trigger after_delete after delete on table1 for each row
insert into undoer(undoquery,created) values
(concat('insert into table1(id,mydata) values(',
cast(old.id as char), ', ', quote(old.mydata), ')'), now());
To undo, execute the reverse queries from the undoer table for your date range, sorted by date in descending order.
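A rough end-to-end sketch of this idea follows. It is written against SQLite through Python's `sqlite3` module, not MySQL, so the trigger syntax differs slightly from the statements above. The helper `undo_since` is hypothetical, and as an assumption it orders by the auto-increment `id` of the undo log rather than by a `created` date, which avoids ties between changes made within the same second.

```python
import sqlite3

SCHEMA = """
CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, mydata TEXT);
CREATE TABLE undoer (id INTEGER PRIMARY KEY AUTOINCREMENT, undoquery TEXT);

CREATE TRIGGER after_insert AFTER INSERT ON table1 BEGIN
    INSERT INTO undoer (undoquery)
    VALUES ('DELETE FROM table1 WHERE id = ' || NEW.id);
END;

CREATE TRIGGER after_update AFTER UPDATE ON table1 BEGIN
    INSERT INTO undoer (undoquery)
    VALUES ('UPDATE table1 SET mydata = ' || quote(OLD.mydata) || ' WHERE id = ' || NEW.id);
END;

CREATE TRIGGER after_delete AFTER DELETE ON table1 BEGIN
    INSERT INTO undoer (undoquery)
    VALUES ('INSERT INTO table1 (id, mydata) VALUES (' || OLD.id || ', ' || quote(OLD.mydata) || ')');
END;
"""

def undo_since(conn, first_undo_id):
    """Replay the reverse queries with id >= first_undo_id, newest first."""
    rows = conn.execute(
        "SELECT undoquery FROM undoer WHERE id >= ? ORDER BY id DESC", (first_undo_id,)
    ).fetchall()
    for (query,) in rows:
        conn.execute(query)
    # Replaying fires the triggers again, so this removes both the replayed
    # entries and the new entries that the replay itself produced.
    conn.execute("DELETE FROM undoer WHERE id >= ?", (first_undo_id,))
```

Usage would be to remember the next undo id before a batch of changes (for example `MAX(id) + 1`) and pass it to `undo_since` to roll that batch back.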
The best solution is a soft delete in the database table: usually a column named "is_deleted" and another named "datetime_deleted", populated automatically when the user deletes a record.
When the delete completes, the response includes the ID of the record, which populates a link to an undo method that the user can click. That method simply undeletes the record by updating the database again.
You can then run a job, either triggered by the user or as a scheduled task, to clean up all data that has been marked "is_deleted = 1" for longer than a given period of time.
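A minimal sketch of the soft-delete flow, assuming a hypothetical `notes` table and using Python's `sqlite3` module in place of MySQL. The function names are illustrative only; the column names follow the ones suggested above.

```python
import sqlite3

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT,"
        " is_deleted INTEGER NOT NULL DEFAULT 0,"
        " datetime_deleted TEXT)"
    )
    return conn

def soft_delete(conn, note_id):
    # Mark the row instead of removing it; the returned id can feed an "undo" link.
    conn.execute(
        "UPDATE notes SET is_deleted = 1, datetime_deleted = datetime('now') WHERE id = ?",
        (note_id,),
    )
    return note_id

def undelete(conn, note_id):
    conn.execute(
        "UPDATE notes SET is_deleted = 0, datetime_deleted = NULL WHERE id = ?", (note_id,)
    )

def purge(conn, older_than_days):
    # Clean-up job: physically remove rows soft-deleted at least N days ago.
    cur = conn.execute(
        "DELETE FROM notes WHERE is_deleted = 1 AND datetime_deleted <= datetime('now', ?)",
        (f"-{older_than_days} days",),
    )
    return cur.rowcount

def visible(conn):
    # Every normal read has to filter out the soft-deleted rows.
    return [r[0] for r in conn.execute("SELECT body FROM notes WHERE is_deleted = 0 ORDER BY id")]
```

The main cost of this design is that every ordinary query needs the `is_deleted = 0` filter, as `visible` does here.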
I think a combination of techniques would be needed here...
You could implement a queue system which executes a job (sending emails etc.) after a certain time.
E.g. if the user deletes an object, send it to the queue for 30 seconds or so, just in case the user clicks undo. If the user does click undo, you can simply remove the job from the queue.
This, combined with soft deleting, may be a good option to look into.
I've used Laravel's Queue class, which is really good.
I'm not really sure there will ever be a correct answer for this, as there's no single correct way of doing it. Good luck though :)
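To make the delayed-job idea concrete, here is a small in-memory sketch in Python. It is not Laravel's Queue API; the class `DelayedQueue` and its methods are hypothetical, and time is passed in explicitly as a number so the behaviour is easy to follow.

```python
import heapq
import itertools

class DelayedQueue:
    """Jobs run only once their deadline has passed, unless cancelled first."""

    def __init__(self):
        self._heap = []                 # entries are (run_at, job_id, action)
        self._cancelled = set()
        self._ids = itertools.count(1)

    def schedule(self, run_at, action):
        job_id = next(self._ids)
        heapq.heappush(self._heap, (run_at, job_id, action))
        return job_id

    def cancel(self, job_id):
        # "Undo": the job stays in the heap but is skipped when it comes due.
        self._cancelled.add(job_id)

    def run_due(self, now):
        ran = []
        while self._heap and self._heap[0][0] <= now:
            _, job_id, action = heapq.heappop(self._heap)
            if job_id in self._cancelled:
                self._cancelled.discard(job_id)
                continue
            action()
            ran.append(job_id)
        return ran
```

A delete request would call `schedule(now + 30, do_the_real_delete)`, and the undo button would call `cancel` with the returned job id.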
I would suggest you use something like the following table to log the changes to your database.
TABLE audit_entry_log
-- This is an audit entry log table where you can track changes and log them here.
( audit_entry_log_id INTEGER PRIMARY KEY
, audit_entry_type VARCHAR2(10) NOT NULL
-- Stores the entry type or DML event - INSERT, UPDATE or DELETE.
, table_name VARCHAR2(30)
-- Stores the name of the table which got changed
, column_name VARCHAR2(30)
-- Stores the name of the column which was changed
, primary_key INTEGER
-- Stores the PK column value of the row which was changed.
-- This is to uniquely identify the row which has been changed.
, ts TIMESTAMP
-- Timestamp when the change was made.
, old_number NUMBER(36, 2)
-- If the changed field was a number, the old value should be stored here.
-- If it's an INSERT event, this would be null.
, new_number NUMBER(36,2)
-- If the changed field was a number, the new value in it should be stored here.
-- If it's a DELETE statement, this would be null.
, old_text VARCHAR2(2000)
-- Similar to old_number but for a text/varchar field.
, new_text VARCHAR2(2000)
-- Similar to new_number but for a text/varchar field.
, old_date DATE
-- Similar to old_number but for a date field.
, new_date DATE
-- Similar to new_number but for a date field.
, ...
, ... -- Any other data types you wish to include.
, ...
);
Now, suppose you have a table like this:
TABLE user
( user_id INTEGER PRIMARY KEY
, user_name VARCHAR2(50)
, birth_date DATE
, address VARCHAR2(50)
)
On this table, I have a trigger that populates audit_entry_log tracking the changes to this table.
I am giving this code example for Oracle; you can tweak it a little to suit MySQL:
CREATE OR REPLACE TRIGGER user_id_trg
BEFORE INSERT OR UPDATE OR DELETE ON user
REFERENCING new AS new old AS old
FOR EACH ROW
BEGIN
IF INSERTING THEN
IF :new.user_name IS NOT NULL THEN
INSERT INTO audit_entry_log (audit_entry_type,
table_name,
column_name,
primary_key,
ts,
new_text)
VALUES ('INSERT',
'USER',
'USER_NAME',
:new.user_id,
current_timestamp(),
:new.user_name);
END IF;
--
-- Similar code would go for birth_date and address columns.
--
ELSIF UPDATING THEN
IF :new.user_name != :old.user_name THEN
INSERT INTO audit_entry_log (audit_entry_type,
table_name,
column_name,
primary_key,
ts,
old_text,
new_text)
VALUES ('UPDATE',
'USER',
'USER_NAME',
:new.user_id,
current_timestamp(),
:old.user_name,
:new.user_name);
END IF;
--
-- Similar code would go for birth_date and address columns
--
ELSIF DELETING THEN
IF :old.user_name IS NOT NULL THEN
INSERT INTO audit_entry_log (audit_entry_type,
table_name,
column_name,
primary_key,
ts,
old_text)
VALUES ('DELETE',
'USER',
'USER_NAME',
:old.user_id,
current_timestamp(),
:old.user_name);
END IF;
--
-- Similar code would go for birth_date and address columns
--
END IF;
END;
/
Now, consider, as a simple example, you run this query on timestamp 31-JAN-2014 14:15:30:
INSERT INTO user (user_id, user_name, birth_date, address)
VALUES (100, 'Foo', '04-JUL-1995', 'Somewhere in New York');
Next you run an UPDATE query on timestamp 31-JAN-2014 15:00:00:
UPDATE user
SET user_name = 'Bar',
address = 'Somewhere in Los Angeles'
WHERE user_id = 100;
Thus your user table would have data:
user_id user_name birth_date  address
------- --------- ----------- --------------------------
100     Bar       04-JUL-1995 Somewhere in Los Angeles
This results in following data in the audit_entry_log table:
audit_entry_type table_name column_name primary_key ts                   old_text              new_text                 old_date new_date
---------------- ---------- ----------- ----------- -------------------- --------------------- ------------------------ -------- -----------
INSERT           USER       USER_NAME   100         31-JAN-2014 14:15:30                       Foo
INSERT           USER       BIRTH_DATE  100         31-JAN-2014 14:15:30                                                         04-JUL-1995
INSERT           USER       ADDRESS     100         31-JAN-2014 14:15:30                       Somewhere in New York
UPDATE           USER       USER_NAME   100         31-JAN-2014 15:00:00 Foo                   Bar
UPDATE           USER       ADDRESS     100         31-JAN-2014 15:00:00 Somewhere in New York Somewhere in Los Angeles
Create a procedure like the following, which accepts a table name and the timestamp to which that table should be restored.
The table is restored only up to that timestamp. There is no "from" timestamp: the restore always runs from the current state back to a timestamp in the past.
CREATE OR REPLACE PROCEDURE restore_db (p_table_name varchar, p_to_timestamp timestamp)
AS
  CURSOR cur_log IS
    SELECT *
    FROM audit_entry_log
    WHERE table_name = p_table_name
    AND ts > p_to_timestamp
    ORDER BY ts DESC; -- Undo the most recent change first.
BEGIN
  FOR i IN cur_log LOOP
    IF i.audit_entry_type = 'INSERT' THEN
      -- Delete the row that was inserted.
      EXECUTE IMMEDIATE 'DELETE FROM '||p_table_name||' WHERE '||p_table_name||'_id = '||i.primary_key;
    ELSIF i.audit_entry_type = 'UPDATE' THEN
      -- Put all the old data back into the table.
      IF i.old_number IS NOT NULL THEN
        EXECUTE IMMEDIATE 'UPDATE '||p_table_name||' SET '||i.column_name||' = '||i.old_number
          ||' WHERE '||p_table_name||'_id = '||i.primary_key;
      ELSIF i.old_text IS NOT NULL THEN
        NULL; -- Similar EXECUTE IMMEDIATE statement as above, for i.old_text
      ELSE
        NULL; -- Similar EXECUTE IMMEDIATE statement as above, for i.old_date
      END IF;
    ELSIF i.audit_entry_type = 'DELETE' THEN
      NULL; -- Write an INSERT statement for the row that has been deleted.
    END IF;
  END LOOP;
END;
/
Now, if you want to restore the user table to its state at 31-JAN-2014 14:30:00, after the INSERT was fired but before the UPDATE, a procedure call like this would do the job:
restore_db ('USER', '31-JAN-2014 14:30:00');
To reiterate: treat all the above code as pseudo-code and make the necessary changes when you try to run it. This is the most fail-proof design I have seen for manual query flashbacks.
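Since the code above is pseudo-code, a reduced version that actually runs may help show the moving parts. This sketch makes several assumptions: it uses SQLite via Python's `sqlite3` module instead of Oracle, it audits a single text column of a hypothetical `users` table, and `restore` rolls back to an audit log id rather than to a timestamp so that the ordering is unambiguous.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE audit_entry_log (
    audit_entry_log_id INTEGER PRIMARY KEY AUTOINCREMENT,
    audit_entry_type   TEXT NOT NULL,
    table_name         TEXT,
    column_name        TEXT,
    primary_key        INTEGER,
    old_text           TEXT,
    new_text           TEXT
);
CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO audit_entry_log (audit_entry_type, table_name, column_name, primary_key, new_text)
    VALUES ('INSERT', 'users', 'user_name', NEW.user_id, NEW.user_name);
END;
CREATE TRIGGER users_upd AFTER UPDATE OF user_name ON users
WHEN NEW.user_name IS NOT OLD.user_name BEGIN
    INSERT INTO audit_entry_log (audit_entry_type, table_name, column_name, primary_key, old_text, new_text)
    VALUES ('UPDATE', 'users', 'user_name', NEW.user_id, OLD.user_name, NEW.user_name);
END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO audit_entry_log (audit_entry_type, table_name, column_name, primary_key, old_text)
    VALUES ('DELETE', 'users', 'user_name', OLD.user_id, OLD.user_name);
END;
"""

def restore(conn, after_log_id):
    """Undo every audited change to users logged after after_log_id, newest first."""
    rows = conn.execute(
        "SELECT audit_entry_type, primary_key, old_text FROM audit_entry_log "
        "WHERE table_name = 'users' AND audit_entry_log_id > ? "
        "ORDER BY audit_entry_log_id DESC",
        (after_log_id,),
    ).fetchall()
    for entry_type, pk, old_text in rows:
        if entry_type == "INSERT":
            conn.execute("DELETE FROM users WHERE user_id = ?", (pk,))
        elif entry_type == "UPDATE":
            conn.execute("UPDATE users SET user_name = ? WHERE user_id = ?", (old_text, pk))
        elif entry_type == "DELETE":
            conn.execute("INSERT INTO users (user_id, user_name) VALUES (?, ?)", (pk, old_text))
    # The restore itself is audited by the triggers; drop those entries too.
    conn.execute("DELETE FROM audit_entry_log WHERE audit_entry_log_id > ?", (after_log_id,))
```

Because old values are bound as parameters rather than concatenated into SQL text, the restore does not depend on how the logged values are quoted.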
Have you considered passing the old values into a separate table as XML values? Then, if you need to restore them, you can retrieve the XML values from the table.
For this kind of system, a log table is the way to go. Yes, the table will most likely be big, but it all depends on how far back you want to be able to go. You could use a time limit, as you said, and delete all logs older than 6 months. You could also create some sort of recycle bin and not allow users to have more than, let's say, 100 "items" in it: always keep the most recent 100 log entries for each user.
Regarding the issue of what queries to keep in your log table, there is no built in function that allows you to do what you want. But since you only log updates and deletes (no need to log inserts since users usually have the option to delete their stuff), you can easily build your own function.
Before any UPDATE or DELETE statement, you get the entire row from the database, and you create a REPLACE statement for it - it works both as an UPDATE and an INSERT. The only thing to keep in mind is that you need a PRIMARY KEY or UNIQUE index for all of your tables.
Here is an idea of how the function should look:
function translateStatement($table, $primaryKey, $id)
{
// Note: the mysql_* extension used here was removed in PHP 7; the same logic applies with mysqli or PDO.
$sql = "SELECT * FROM `$table` WHERE `$primaryKey` = '" . mysql_real_escape_string($id) . "'"; // should always return one row
$result = mysql_query($sql) or die(mysql_error());
$row = mysql_fetch_assoc($result);
$columns = implode(',', array_map( function($item){ return '`'.$item.'`'; }, array_keys($row)) ); // get column names
// get escaped column values, keeping NULL as NULL instead of an empty string
$values = implode(',', array_map( function($item){ return $item === null ? 'NULL' : "'".mysql_real_escape_string($item)."'"; }, $row) );
return "REPLACE INTO `$table` (".$columns.') VALUES ('.$values.')'; // double quotes so that $table is interpolated
}
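The same snapshot-before-change idea, sketched in Python with the built-in `sqlite3` module (SQLite also accepts `REPLACE INTO`). The function name `snapshot_statement` is hypothetical. Instead of building an escaped SQL string, this version returns the statement together with its bound parameters, which is one way to avoid quoting problems; both would need to be stored in the log.

```python
import sqlite3

def snapshot_statement(conn, table, primary_key, row_id):
    """Return (sql, params) that restores the row as it is right now, or None if absent.

    Identifiers cannot be bound as parameters, so table and primary_key are
    assumed to come from trusted code, never from user input.
    """
    cur = conn.execute(f'SELECT * FROM "{table}" WHERE "{primary_key}" = ?', (row_id,))
    row = cur.fetchone()
    if row is None:
        return None
    columns = ", ".join(f'"{d[0]}"' for d in cur.description)  # column names
    placeholders = ", ".join("?" for _ in row)                  # one ? per value
    return f'REPLACE INTO "{table}" ({columns}) VALUES ({placeholders})', tuple(row)
```

Calling it just before an UPDATE or DELETE and later running `conn.execute(*undo)` puts the row back, whether it was changed or removed in the meantime.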
I have a simple table made up of two columns: col_A and col_B.
The primary key is defined over both.
I need to update some rows and assign to col_A values that may generate duplicates, for example:
UPDATE `table` SET `col_A` = 66 WHERE `col_A` = 70
This statement sometimes yields a duplicate key error.
I don't want to simply ignore the error with UPDATE IGNORE, because then the rows that generate the error would remain unchanged. Instead, I want them to be deleted when they would conflict with another row after being updated.
I'd like to write something like:
UPDATE `table` SET `col_A` = 66 WHERE `col_A` = 70 ON DUPLICATE KEY REPLACE
which unfortunately isn't legal in SQL, so I need help finding another way around.
Also, I'm using PHP and could consider a hybrid solution (i.e. part query part php code), but keep in mind that I have to perform this updating operation many millions of times.
thanks for your attention,
Silvio
Reminder: UPDATE's syntax has problems with joins with the same table that is being updated
EDIT: sorry, the column name in the WHERE clause was wrong, now I fixed it
Answer to revised question:
DELETE FROM
table_A
USING
`table` AS table_A
JOIN `table` AS table_B ON
table_A.col_B = table_B.col_B AND
table_B.col_A = 70
WHERE
table_A.col_A = 66
This gets rid of the rows that would cause problems. Then you issue your UPDATE query. Ideally you will do it all inside a transaction to avoid a situation where troublesome rows are re-inserted in between the two queries.
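The delete-then-update sequence inside one transaction might look like the sketch below. It is an assumption-laden illustration: it runs on SQLite through Python's `sqlite3` module, the table is called `pairs` (a hypothetical name, to avoid the reserved word `table`), and the self-join is expressed as a subquery because SQLite has no multi-table DELETE.

```python
import sqlite3

def merge_col_a(conn, old_value, new_value):
    """Change col_A from old_value to new_value, removing rows that would collide."""
    with conn:  # one transaction: commits on success, rolls back on any error
        # Remove existing (new_value, col_B) rows whose col_B also occurs with old_value;
        # these are exactly the rows the UPDATE would otherwise duplicate.
        conn.execute(
            "DELETE FROM pairs WHERE col_A = ? "
            "AND col_B IN (SELECT col_B FROM pairs WHERE col_A = ?)",
            (new_value, old_value),
        )
        conn.execute("UPDATE pairs SET col_A = ? WHERE col_A = ?", (new_value, old_value))
```

Because the table consists only of the two key columns, deleting the existing row and letting the updated row take its place leaves the same final data as deleting the updated row would.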
Are there any foreign keys referencing this table? If not then the following should do:
CREATE PROCEDURE `MyProcedure` (IN invarA INT, IN invarB INT)
LANGUAGE SQL
NOT DETERMINISTIC
MODIFIES SQL DATA
SQL SECURITY DEFINER
BEGIN
DELETE FROM `table` WHERE col_B = invarB;
IF ROW_COUNT() > 0 THEN
INSERT INTO `table` (`col_A`, `col_B`) VALUES (invarA, invarB);
END IF;
END
Example call:
CALL `MyProcedure`(66, 70)