If I have a table with 2 columns, and I update a column in a way that would create duplicate rows, and the table has a unique constraint on that column, is there any way to process the offending row when the duplicate would otherwise be created during the update?
Usually an SQL inline IF statement with some criteria, optional processing, and a self-join to detect duplication will do what you're looking for. The true answer will be specific to your structure, but I will give an example for a table called user with a column called id, which is the primary key, and SSN, which has a unique constraint on it. We'll populate it with 2 users and update one of them to duplicate the first one in the unique SSN column:
CREATE TABLE `test`.`user` (
`id` INT NOT NULL,
`SSN` VARCHAR(45) NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `ssn_UNIQUE` (`SSN` ASC));
INSERT INTO user VALUES (1, "1234567"), (2, "0123456");
As you might expect, if I run the following update when another user (where id=1) already has SSN="1234567", the statement fails with a duplicate-key error and no update is made:
UPDATE user SET SSN="1234567" WHERE id=2;
ERROR 1062 (23000): Duplicate entry '1234567' for key 'ssn_UNIQUE'
However, consider the following instead:
UPDATE user u
LEFT JOIN user AS u2
ON u2.SSN="1234567"
SET u.SSN=IF(
u2.id IS NOT NULL,
CONCAT(u2.SSN, "duplicates", u2.id, "onto", u.id),
"1234567")
WHERE u.id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
In the above example, the following scenarios could play out:
If user id=1 already has SSN="1234567", and I run the above update, the result will be:
SELECT * FROM test.user;
+----+-------------------------+
| id | SSN                     |
+----+-------------------------+
|  2 | 1234567duplicates1onto2 |
|  1 | 1234567                 |
+----+-------------------------+
2 rows in set (0.00 sec)
If I instead try to set the value to "01234567" and run the same update, the result will be:
SELECT * FROM test.user;
+----+----------+
| id | SSN      |
+----+----------+
|  2 | 01234567 |
|  1 | 1234567  |
+----+----------+
2 rows in set (0.00 sec)
If I had a 3rd user, and two users had similarly attempted to set the value to "1234567", the third would end up with the value "1234567duplicates1onto3":
SELECT * FROM test.user;
+----+-------------------------+
| id | SSN                     |
+----+-------------------------+
|  1 | 1234567                 |
|  2 | 1234567duplicates1onto2 |
|  3 | 1234567duplicates1onto3 |
+----+-------------------------+
3 rows in set (0.00 sec)
As you can see, the "onto" part keeps the processed values unique even when there are many duplicates in the same update batch.
To adapt this technique, change the output of the inline IF to whatever formula you would use for processing, and make the JOIN criteria whatever detects duplication in your schema.
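As a rough template (all names below are placeholders, not part of the example above):

UPDATE your_table t
LEFT JOIN your_table dup
    ON dup.unique_col = 'new-value'                 -- duplication-detection criteria
SET t.unique_col = IF(
    dup.id IS NOT NULL,
    CONCAT('new-value', '-collided-with-', dup.id), -- your processing formula
    'new-value')                                    -- no duplicate: plain update
WHERE t.id = 42;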
http://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html
Related
I'm using Laravel and Migrations to build my entire database structure.
Problem description
In my schema, I have a pack table that belongs to user and group, and I need to keep a kind of unique "index" for each distinct combination of these tables.
That is: a sequential number that increments per distinct (user_id, group_id) pair. For example:
| id | user_id | group_id | sequence |
|----|---------|----------|----------|
|  1 |       1 |        1 |        1 |
|  2 |       1 |        2 |        1 |
|  3 |       1 |        3 |        1 |
|  4 |       1 |        1 |        2 |
|  5 |       1 |        2 |        2 |
|  6 |       1 |        3 |        2 |
|  7 |       2 |        1 |        1 |
|  8 |       2 |        2 |        1 |
|  9 |       2 |        3 |        1 |
This will be used to reference a pack in the view layer:
user 1, this is your pack 1 of group 1.
user 1, this is your pack 2 of group 1.
user 1, this is your pack 1 of group 2.
I designed my migration (the up method) like this:
Schema::create('pack', function (Blueprint $table) {
$table->increments('id');
$table->integer('user_id')->unsigned();
$table->foreign('user_id')->references('id')->on('user');
$table->integer('group_id')->unsigned();
$table->foreign('group_id')->references('id')->on('group');
$table->integer('sequence')->unsigned();
});
I currently use business logic to fill the $pack->sequence field in the model layer.
Question 1:
Theoretically, is this the best strategy for the described scenario?
Question 2:
Is there a pattern/approach that can be used to fill the sequence field at the database layer?
It appears you already have an auto-increment column id. MySQL does not support more than one auto-increment column per table.
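(As a side note on Question 2: a BEFORE INSERT trigger is one way to compute the sequence at the database layer. This is only a sketch with a made-up trigger name, and it inherits the same race condition discussed next, so it is not a substitute for the locking shown below.)

DELIMITER //
CREATE TRIGGER pack_before_insert
BEFORE INSERT ON pack
FOR EACH ROW
BEGIN
    -- Compute the next sequence for this user/group pair;
    -- COALESCE handles the first row for a new combination.
    SET NEW.sequence = (
        SELECT COALESCE(MAX(sequence), 0) + 1
        FROM pack
        WHERE user_id = NEW.user_id
          AND group_id = NEW.group_id
    );
END//
DELIMITER ;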
In general, you can't get the behavior you're describing while allowing concurrent inserts to the table. The reason is that you have to read the max sequence value for some user/group pair, then insert the next value as you insert a new row.
But this creates a race condition, because some other concurrent session could be doing the same thing, and it will sneak in and insert a row with the next sequence value in between your session's steps of reading and inserting.
The solution is to use locks in a way that prevents a concurrent insert of the same user_id and group_id. InnoDB's gap locks help with this.
Example:
Open two MySQL clients. In the first session, try this:
mysql> begin;
mysql> select max(sequence) from pack where user_id=1 and group_id=1 for update;
+---------------+
| max(sequence) |
+---------------+
|             2 |
+---------------+
The FOR UPDATE locks the rows examined, and it locks the "gap" which is the place where other rows with the same user_id and group_id would be inserted.
To prove this, try in the second session:
mysql> begin;
mysql> insert into pack set user_id=1, group_id=1, sequence=3;
It hangs. It can't do the insert, because that conflicts with the gap lock still held by the first session. The race-condition has been avoided.
Now in the first session, finish the work.
mysql> insert into pack set user_id=1, group_id=1, sequence=3;
mysql> commit;
Notice that as soon as the commit happens, session 1's locks are released. The second session's blocked INSERT then resolves, but it correctly gets an error:
ERROR 1062 (23000): Duplicate entry '1-1-3' for key 'user_id'
Of course, session 2 should have done the same SELECT...FOR UPDATE. That would have also been blocked until it could resolve the lock conflict. Once it resolved, it would have returned the correct new max sequence value.
The locks apply only per user_id/group_id combination, but only if you have a suitable index. I used:
ALTER TABLE pack ADD UNIQUE KEY (user_id, group_id, sequence);
Once you have that key, the SELECT...FOR UPDATE is able to be specific to the right set of rows when it locks them.
What this means is that even if user_id=1, group_id=1 is locked, you can still insert a new entry for any other values of user_id or group_id. They lock distinct parts of the index, so there's no conflict.
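For instance, while session 1 still holds its lock on user_id=1 and group_id=1, a third session inserting a different combination should proceed without blocking; you would expect something like this (output illustrative):

mysql> insert into pack set user_id=2, group_id=1, sequence=2;
Query OK, 1 row affected (0.00 sec)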
I encourage you to do some experiments yourself to prove to yourself you understand how it works. You can do this without writing any PHP code. I just opened two Terminal windows, ran the mysql command-line client, and started writing at the mysql> prompt. You can too!
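Putting it together, a minimal sketch of the whole insert step in plain SQL (values hardcoded for illustration; COALESCE handles the first row for a new combination):

BEGIN;

-- Lock the (user_id, group_id) slice of the unique index, including the gap,
-- so no concurrent session can insert the same combination.
SELECT COALESCE(MAX(sequence), 0) + 1 INTO @next_seq
FROM pack
WHERE user_id = 1 AND group_id = 1
FOR UPDATE;

INSERT INTO pack (user_id, group_id, sequence)
VALUES (1, 1, @next_seq);

COMMIT;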
I would like to create a queue system that works in this way:
A user fills in a form where they will have to enter some data.
They click on Send, and the data is saved in a SQL table.
Visiting the index.php page, they will see a box containing a message like this: "There are 4 requests in front of you, please wait a few minutes."
I have already tried to do such a thing, but as new requests are created, the number "4" in the message keeps growing.
This is because I created a query that counts all the results on the table.
$query = $mysql->query("SELECT COUNT(*) AS q FROM application_approve");
while I want it to count only the rows above the request that the user sent.
| id | name           | text       | text2    |
|----|----------------|------------|----------|
|  1 | First request  | dassasad   | dsadasas |
|  2 | Second request | dassasad   | dsadasas |
|  3 | Third request  | dsadasdsas | dsadasad |
In the example above I would like to count only how many rows there are above "Second request": in this case, 1.
Assuming your table has a PK (id) and a user_id column that identifies which request belongs to which user, and assuming there can only be a single request in the queue per user, your query would look something like the following.
SELECT COUNT(id) AS q FROM application_approve
WHERE id < (
SELECT id FROM application_approve
WHERE user_id = ?
)
This also assumes the PK id is an auto-incrementing key.
Given the user_id this query would return the number of rows above the given user's row (assuming they have one). Or, in other words, all ids less than the id of the given user.
For simplicity, let's assume this schema only has 2 columns (id and user_id):
mysql> SELECT * FROM application_approve;
+------+---------+
| id   | user_id |
+------+---------+
|    1 |       1 |
|    2 |       2 |
|    3 |       3 |
+------+---------+
3 rows in set (0.00 sec)
So in the given table, there are 3 users, each with 1 entry in the queue.
If we wanted to find which position user 2 is in, the query would give us the following result:
mysql> SELECT COUNT(id) AS q FROM application_approve WHERE id < (SELECT id FROM application_approve WHERE user_id = 2);
+---+
| q |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
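One hedged refinement: if the given user has no queued request, the subquery returns NULL, the comparison id < NULL matches nothing, and q comes back as 0, which is indistinguishable from being first in line, so check for that case in application code. Using MIN(id) also keeps the subquery single-valued if a user ever has more than one request:

SELECT COUNT(id) AS q
FROM application_approve
WHERE id < (
    SELECT MIN(id)
    FROM application_approve
    WHERE user_id = ?
);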
I have a table tags and I forgot to set the id column as the primary key when I created it.
Now I am facing a duplicate key problem.
tags table:
| id | text  |
|----|-------|
|  1 | man   |
|  2 | ball  |
|  2 | ball  |
|  2 | ball  |
|  3 | love  |
|  3 | love  |
|  4 | heart |
|  4 | heart |
How do I remove the duplicates, and then set id as the primary key?
Expected result: ( the new required tags table)
| id | text  |
|----|-------|
|  1 | man   |
|  2 | ball  |
|  3 | love  |
|  4 | heart |
I think the easiest way is to create a temporary table with the data and then reload the data:
create temporary table tags_temp as
select distinct id, text
from tags;
truncate table tags;
alter table tags add primary key (id);
insert into tags(id, text)
select id, text
from tags_temp;
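One caveat, as a hedged variant: if the same id ever maps to two different text values, SELECT DISTINCT id, text keeps both rows and the ADD PRIMARY KEY step will fail; grouping by id and picking one value avoids that:

create temporary table tags_temp as
select id, min(text) as text   -- keep one text value per id
from tags
group by id;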
What I would do is create a new table, add the key, insert the data from the old table, then drop tags and rename the temporary table:
/* Make a copy of the database table (including indexes) */
create table tags_tmp like tags;
/* Add the primary key to the table */
alter table tags_tmp add primary key (id);
/* Insert the data from the bad table and ignore any duplicates */
insert ignore into tags_tmp (id, text)
select id, text from tags;
/* Drop the bad table */
drop table tags;
/* Rename the temporary table to the original name */
rename table tags_tmp to tags;
First, I created your table and inserted data in:
mysql> select * from tags;
+----+-------+
| id | text  |
+----+-------+
|  1 | man   |
|  2 | ball  |
|  2 | ball  |
|  2 | ball  |
|  3 | love  |
|  3 | love  |
|  4 | heart |
|  4 | heart |
+----+-------+
8 rows in set (0.00 sec)
I back up the distinct entries only:
mysql> create table T as select distinct * from tags;
Query OK, 4 rows affected (0.27 sec)
Records: 4 Duplicates: 0 Warnings: 0
I no longer need the original table, so I drop it from the database:
mysql> drop table tags;
Query OK, 0 rows affected (0.12 sec)
I rename the previous backup table:
mysql> rename table T to tags;
Query OK, 0 rows affected (0.08 sec)
Now it is time to add the PRIMARY KEY constraint to our table:
mysql> alter table tags add primary key(id);
Query OK, 0 rows affected (0.48 sec)
Records: 0 Duplicates: 0 Warnings: 0
Now, let us test if what we did is correct. First, let us display the data:
mysql> select * from tags;
+----+-------+
| id | text  |
+----+-------+
|  1 | man   |
|  2 | ball  |
|  3 | love  |
|  4 | heart |
+----+-------+
4 rows in set (0.00 sec)
Let's try to add a row with id = 4:
mysql> insert into tags values(4,'proof');
ERROR 1062 (23000): Duplicate entry '4' for key 'PRIMARY'
Conclusion: what we did is correct.
I'm trying to update my quite large table (nearly 3 million rows) with the following query:
$length = strlen($this);
$query = "UPDATE database
SET row_to_update='1'
WHERE row='{$this}'
AND row_length='{$length}'
LIMIT 1";
It gets words ($this) from a file (quite a lot of them) and then searches for a match. If one is found, it updates row_to_update with the value 1 (none is set as the default).
Every row_length already contains the length of the corresponding cell, which I thought might speed up the process significantly. Sadly, it didn't.
It manages only ~30k queries in 8 hours. That's slow, to say the least!
Is there any way I could improve this bit of inefficient code?
Try collecting a bunch of the values you're looking for and using:
UPDATE table SET row_to_update='1' WHERE row IN ({$my_values});
You can use EXPLAIN <your_query> and EXPLAIN EXTENDED ... to check whether it uses indexes, and adjust the query or create indexes to speed it up. Experiment with a SELECT that has the same WHERE conditions.
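For example, a sketch of that check against the question's table (the index name is made up; the backticks guard against database and row colliding with MySQL keywords):

EXPLAIN SELECT * FROM `database`
WHERE `row` = 'someword' AND row_length = 8;

-- If the plan shows type: ALL (a full table scan), an index on the
-- filtered columns turns each lookup into an index seek:
ALTER TABLE `database` ADD INDEX idx_row_length (`row`, row_length);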
Much more you can get using:
SET profiling = 1;
<your query goes here>
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;
Be careful with it if you're not in a dev environment.
Consider as well filling a temp table with the values you're interested in and using it this way:
UPDATE table SET row_to_update='1' WHERE row IN (SELECT data FROM temp_table);
When you get there, you can improve it to:
UPDATE table INNER JOIN temp_table ON table.row = temp_table.data SET row_to_update = '1';
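A sketch of creating and filling such a temp table (the file path is hypothetical; the data column matches the examples below); the index on the join column lets the UPDATE above use it:

CREATE TEMPORARY TABLE temp_table (
    data VARCHAR(255) NOT NULL,
    INDEX (data)              -- lets the join/IN lookup use an index seek
);

-- One word per line in the input file.
LOAD DATA LOCAL INFILE '/path/to/words.txt'
INTO TABLE temp_table (data);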
EXAMPLES:
As you asked for examples, let's say the example table represents your original one with a lot of data in it. In this example I'll use just 4 rows:
mysql> select * from example;
+----+------+
| id | data |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
+----+------+
4 rows in set (0.00 sec)
Let's say that you're looking for the ids of rows that have data = 'a', 'b', or 'c'.
You can do this in 3 ways:
1) SELECT ... IN (list)
mysql> select id from example where data in ('a', 'b', 'c');
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)
2) SELECT ... IN (SELECT ... FROM temp_table)
mysql> select * from temp_table;
+----+------+
| id | data |
+----+------+
| 10 | foo~ |
| 11 | a    |
| 12 | bar  |
| 13 | baz  |
| 14 | b    |
| 15 | c    |
+----+------+
6 rows in set (0.00 sec)
mysql> select id from example where data in (SELECT data from temp_table);
[..]
3 rows in set (0.00 sec)
3) SELECT ... INNER JOIN temp_table ...
mysql> select example.id from example inner join temp_table on example.data = temp_table.data;
[..]
3 rows in set (0.01 sec)
And when you're ready, use UPDATE with the same conditions to apply the changes you like.
I have a table that records tickets, separated by a column that denotes the "database". I have a unique key on the database and cid columns so that cid increments independently per database (cid has the AUTO_INCREMENT attribute to accomplish this). I increment id manually, since I cannot make two AUTO_INCREMENT columns (and I'd rather have AUTO_INCREMENT take care of the more complicated task of the uniqueness).
This makes my data look like this basically:
+----+-----+----------+
| id | cid | database |
+----+-----+----------+
|  1 |   1 |        1 |
|  2 |   1 |        2 |
|  3 |   2 |        2 |
+----+-----+----------+
This works perfectly well.
I am trying to make a feature that will allow a ticket to be "moved" to another database; frequently a user may enter the ticket in the wrong database. Instead of having to close the ticket and completely create a new one (copy/pasting all the data over), I'd like to make it easier for the user of course.
I want to be able to change the database and cid fields uniquely without having to tamper with the id field. I want to do an UPDATE (or the like) because there are foreign key constraints on other tables that link to the id field; this is why I don't simply do a REPLACE or a DELETE then INSERT, as I don't want to delete all of the other table data and then have to recreate it (log entries, transactions, appointments, etc.).
How can I get the next unique AUTO_INCREMENT value (based on the new database value), then use that to update the desired row?
For example, in the above dataset, I want to change the first record to go to "database #2". Whatever query I make needs to make the data change to this:
+----+-----+----------+
| id | cid | database |
+----+-----+----------+
|  1 |   3 |        2 |
|  2 |   1 |        2 |
|  3 |   2 |        2 |
+----+-----+----------+
I'm not sure if the AUTO_INCREMENT needs to be incremented, as my understanding is that the unique key makes it just calculate the next appropriate value on the fly.
I actually ended up making it work once I re-read an excerpt on using AUTO_INCREMENT with multiple columns:
For MyISAM and BDB tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
This was the clue I needed. I simply mimicked the query MySQL runs internally according to that quote and joined it into my UPDATE query, as follows. Assume $new_database is the database to move to and $id is the current ticket id.
UPDATE `tickets` AS t1,
(
    SELECT MAX(cid) + 1 AS new_cid
    FROM `tickets`
    WHERE `database` = {$new_database}
) AS t2
SET t1.cid = t2.new_cid,
    t1.`database` = {$new_database}
WHERE t1.id = {$id}
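One hedged caveat on this solution: the MAX(cid) + 1 subquery isn't protected against two concurrent moves into the same database computing the same new cid. A locking read inside a transaction (values hardcoded for illustration) avoids that:

BEGIN;

-- Lock the target database's rows (and gap) so a concurrent mover
-- cannot compute the same next cid.
SELECT COALESCE(MAX(cid), 0) + 1 INTO @new_cid
FROM `tickets`
WHERE `database` = 2
FOR UPDATE;

UPDATE `tickets`
SET cid = @new_cid, `database` = 2
WHERE id = 1;

COMMIT;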