I have a MySQL table of records, e.g.:
CREATE TABLE test (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
value VARCHAR(255) NOT NULL
);
Now, what I need is to generate a unique sequence 1, 2, ..., N in a PHP script and store it in another table... How can I achieve this in a thread-safe way, without creating duplicates or skipping anything?
I was wondering if an additional MySQL table could help, but I don't know how to create something like a "separate auto-increment for each column value" or anything else...
test:
1 ... apples
2 ... oranges
3 ... lemons
Some PHP script (accessed in parallel by multiple users at a time):
save_next_fruit($_GET['fruit']);
will create a record in another table with values like this:
saved_fruit:
ID | FRUIT(FK) | FRUIT_NO
 1 |     1     |    1
 2 |     1     |    2
 3 |     2     |    1
 4 |     3     |    1
 5 |     3     |    2
 6 |     1     |    3
 7 |     3     |    3
 8 |     2     |    2
 9 |     1     |    4
10 |     2     |    3
11 |     1     |    5
12 |     2     |    4
13 |     1     |    6
14 |     3     |    4
15 |     3     |    5
In other words, I need to do this (e.g. for fruit 3, lemons):
insert into saved_fruit (fruit, fruit_no) values (3, (select MAX(fruit_no)+1 from saved_fruit where fruit = 3));
but in a thread-safe way (I understand that the above command is not thread-safe on a MyISAM MySQL database).
Can you help?
Thanks
MyISAM does support this behavior. Create a two-column primary key, and make the second column auto-increment. It'll start over for each distinct value in the first column.
CREATE TABLE t (i INT, j INT AUTO_INCREMENT, PRIMARY KEY (i,j)) ENGINE=MyISAM;
INSERT INTO t (i) VALUES (1), (1), (2), (2), (1), (3);
SELECT * FROM t;
+---+---+
| i | j |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
+---+---+
But if you think about it, this is only thread-safe in a storage engine that does table-level locking for INSERT statements, because the INSERT has to read other rows in the table to find the max j value for the same i value. If other sessions are doing INSERTs concurrently, that creates a race condition.
Thus, the dependency on MyISAM, which does table-level locking on INSERT.
See this reference in the manual: http://dev.mysql.com/doc/refman/5.6/en/example-auto-increment.html under the section, MyISAM Notes.
There are a whole lot of good reasons not to use MyISAM. The deciding factor for me is MyISAM's tendency to corrupt data.
Re your comment:
InnoDB does not support the increment-per-group behavior described above. You can make a multi-column primary key, but the error you got is because InnoDB requires that the auto-increment column be the first column in a key of the table (it doesn't strictly have to be the primary key).
Regardless of the position of the auto-increment column in the multi-column key, it only increments when you use it with InnoDB; it does not number entries per distinct value in another column.
To do this with an InnoDB table, you'd have to lock the table explicitly for the duration of the INSERT, to avoid race conditions. You'd do your own SELECT query for the max value in the group you're inserting to. Then insert that value + 1.
Basically, you have to bypass the auto-increment feature and specify values instead of having them automatically generated.
Since you are using MyISAM, you could lock the whole table:
LOCK TABLES `saved_fruit` WRITE;
-- Insert query with select.
UNLOCK TABLES;
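For illustration, a minimal sketch of what the locked insert could look like (table and column names taken from the question; note that under LOCK TABLES a table cannot be referenced twice under the same name, so the SELECT part needs an alias with its own lock):

LOCK TABLES saved_fruit WRITE, saved_fruit AS sf READ;
-- compute the next per-fruit number and insert it in one statement
INSERT INTO saved_fruit (fruit, fruit_no)
SELECT 3, COALESCE(MAX(sf.fruit_no), 0) + 1
FROM saved_fruit AS sf
WHERE sf.fruit = 3;
UNLOCK TABLES;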
I'm using Laravel and Migrations to build my entire database structure.
Problem description
In the schema, I have a pack table that belongs to user and group, and I need to keep a kind of unique "index" for each distinct combination of these tables.
That is: a sequential number that increments separately for each distinct user_id and group_id pair. For example:
| id | user_id | group_id | sequence |
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 3 | 1 |
| 4 | 1 | 1 | 2 |
| 5 | 1 | 2 | 2 |
| 6 | 1 | 3 | 2 |
| 7 | 2 | 1 | 1 |
| 8 | 2 | 2 | 1 |
| 9 | 2 | 3 | 1 |
This will be used to reference a pack in the view layer:
user 1, this is your pack 1 of group 1.
user 1, this is your pack 2 of group 1.
user 1, this is your pack 1 of group 2.
I designed my migration (on up) like:
Schema::create('pack', function (Blueprint $table) {
$table->increments('id');
$table->integer('user_id')->unsigned();
$table->foreign('user_id')->references('id')->on('user');
$table->integer('group_id')->unsigned();
$table->foreign('group_id')->references('id')->on('group');
$table->integer('sequence')->unsigned();
});
And I use business logic to fill the $pack->sequence field at the model layer.
Question 1:
Theoretically, should this be considered the best strategy to use in the described scenario?
Question 2:
Is there a pattern/approach that can be used to fill the sequence field at the database layer?
It appears you already have an auto-increment column id. MySQL does not support more than one auto-increment column per table.
In general, you can't get the behavior you're describing while allowing concurrent inserts to the table. The reason is that you have to read the max sequence value for some user/group pair, then insert the next value as you insert a new row.
But this creates a race condition, because some other concurrent session could be doing the same thing, and it will sneak in and insert a row with the next sequence value in between your session's steps of reading and inserting.
The solution is to use locks in a way that prevents a concurrent insert of the same user_id and group_id. InnoDB will use gap locks to help with this.
Example:
Open two MySQL clients. In the first session, try this:
mysql> begin;
mysql> select max(sequence) from pack where user_id=1 and group_id=1 for update;
+---------------+
| max(sequence) |
+---------------+
| 2 |
+---------------+
The FOR UPDATE locks the rows examined, and it locks the "gap" which is the place where other rows with the same user_id and group_id would be inserted.
To prove this, try in the second session:
mysql> begin;
mysql> insert into pack set user_id=1, group_id=1, sequence=3;
It hangs. It can't do the insert, because that conflicts with the gap lock still held by the first session. The race-condition has been avoided.
Now in the first session, finish the work.
mysql> insert into pack set user_id=1, group_id=1, sequence=3;
mysql> commit;
Notice that immediately after the commit, session 1's locks are released. The second session's blocked INSERT then resolves, but it correctly gets an error:
ERROR 1062 (23000): Duplicate entry '1-1-3' for key 'user_id'
Of course, session 2 should have done the same SELECT...FOR UPDATE. That would have also been blocked until it could resolve the lock conflict. Once it resolved, it would have returned the correct new max sequence value.
The locks are scoped to a single user_id/group_id combo, but only if you have a suitable index. I used:
ALTER TABLE pack ADD UNIQUE KEY (user_id, group_id, sequence);
Once you have that key, the SELECT...FOR UPDATE is able to be specific to the right set of rows when it locks them.
What this means is that even if user_id=1, group_id=1 is locked, you can still insert a new entry for any other values of user_id or group_id. They lock distinct parts of the index, so there's no conflict.
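Putting the steps together, one insert could look roughly like this (a sketch with hard-coded ids; COALESCE just covers the case where the group has no rows yet):

BEGIN;
-- lock the group's index range and read its current max
SELECT COALESCE(MAX(sequence), 0) + 1 AS next_seq
FROM pack
WHERE user_id = 1 AND group_id = 1
FOR UPDATE;
-- insert using the value returned above (3 in the example session)
INSERT INTO pack (user_id, group_id, sequence) VALUES (1, 1, 3);
COMMIT;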
I encourage you to do some experiments yourself to prove to yourself you understand how it works. You can do this without writing any PHP code. I just opened two Terminal windows, ran the mysql command-line client, and started writing at the mysql> prompt. You can too!
Hi :)
I need to insert 13,500 lines with 500+ columns from a CSV.
So I use LOAD DATA INFILE, and it works.
But I need exactly the same order in my MySQL database as in my CSV.
Right now, for example, the 1000th line of the CSV can end up in 800th place in my database.
I need something like "ORDER BY column1", but I can't find the solution.
Thanks for your help.
PS: I have a composite primary key of two columns (product references), and they are not in numerical order (e.g. 1, 8, 4, etc.).
EDIT: My code
$dataload = 'LOAD DATA LOCAL INFILE "'.__DIR__.'/../../../../bo/csv/'.$nomfichier.'"
REPLACE
INTO TABLE gc_csv CHARACTER SET "latin1"
FIELDS TERMINATED BY "\t"
IGNORE 1 LINES
';
I just take the CSV and use LOAD DATA LOCAL INFILE with it... and the order isn't perfectly respected, I don't know why...
My table design:
CREATE TABLE `csv` (
`RefCatSYS` int(20) unsigned NOT NULL,
`IdProduit` int(15) unsigned NOT NULL,
`example` varchar(10) default NULL,
[...]
`example` varchar(4) default NULL,
PRIMARY KEY (`RefCatSYS`,`IdProduit`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Add an auto_increment column to your table, with DEFAULT NULL. When you load data with LOAD DATA INFILE, there will be no value for the column, and it will get assigned an automatically generated id. Select data ordered by the column.
kostja#annie:~$ sudo cat /var/lib/mysql/test/foo.csv
10
9
8
7
6
5
4
3
2
1
mysql> create table tmp (example int primary key, id int unique auto_increment default null);
Query OK, 0 rows affected (0.11 sec)
mysql> load data infile "foo.csv" into table tmp;
Query OK, 10 rows affected, 10 warnings (0.03 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 10
mysql> select * from tmp;
+---------+----+
| example | id |
+---------+----+
| 10 | 1 |
| 9 | 2 |
| 8 | 3 |
| 7 | 4 |
| 6 | 5 |
| 5 | 6 |
| 4 | 7 |
| 3 | 8 |
| 2 | 9 |
| 1 | 10 |
+---------+----+
10 rows in set (0.00 sec)
The tables in relational databases are unordered collections of records. You can get the rows in a particular order when you run a query by using ORDER BY. If the query does not contain an ORDER BY clause, the server returns the rows in the order they are on the storage medium. This order sometimes changes when records are updated. You must never rely on it and always use ORDER BY (and indexes) to get a certain order of the rows in the result set.
LOAD DATA INFILE reads the lines from the CSV file and inserts them in the same order they are in the file. Apart from setting the value of an auto-incremented column (if there is one in the table), the order of the lines in the CSV file does not matter.
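Applied to the table from the question, a hedged version of the previous answer's approach could look like this (the column name line_no is just an example; add it before running LOAD DATA INFILE so the generated values follow the file order):

ALTER TABLE gc_csv
  ADD COLUMN line_no INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD UNIQUE KEY (line_no);
-- the CSV has no value for line_no, so each loaded row gets the next auto-increment value
-- (warnings about the missing field are expected, as in the session above)
SELECT * FROM gc_csv ORDER BY line_no;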
I have solved the problem. In fact, the web service sends me a CSV containing a duplicate pair of the two primary key columns: at lines 2365 and 9798, RefCatSYS and IdProduit are the same. So LOAD DATA INFILE with REPLACE replaced line 2365 with line 9798, and that changed the order.
I asked them to send a third, UNIQUE key.
Thanks for your help, and sorry for the disruption.
Sorry for asking a trivial question. I want to translate some of the fields of my database which has one million rows. So what I want to do is
to read field 1, run the translate function on it, and write the result to field 3; likewise, the translation of field 2 needs to be written into field 4.
initial table
field id|field 1 |field 2 |field 3|field 4|
1 | apple | pear | empty |empty |
2 | banana | pineapple | empty |empty |
end result table, where translate(apple) = yabloko:
field id|field 1 |field 2 |field 3|field 4|
1 | apple | pear | yabloko |grusha |
2 | banana | pineapple | banan |ananas |
I already have the translate function; the question is how to perform
this on all one million rows. How do I construct the loop through it correctly? (Surely some IDs are missing, as some of the data was removed.)
thank you so much in advance!!!
Rather than "construct a loop" and process row by row, the normative pattern would be to perform the operation in a single statement.
I'd populate a translation table:
CREATE TABLE my_translation
( old_word VARCHAR(100) NOT NULL PRIMARY KEY
, new_word VARCHAR(100)
) Engine=InnoDB;
INSERT INTO my_translation (old_word, new_word) VALUES
('apple' ,'yabloko')
,('pear' ,'grusha')
,('banana' ,'banan')
,('pineapple','ananas');
Then do an update. The tricky part is leaving field_3 and field_4 unmodified if there's no match.
UPDATE my_table t
LEFT
JOIN my_translation c3
ON c3.old_word = t.field_1
LEFT
JOIN my_translation c4
ON c4.old_word = t.field_2
SET t.field_3 = IF(c3.old_word IS NULL,t.field_3,c3.new_word)
, t.field_4 = IF(c4.old_word IS NULL,t.field_4,c4.new_word);
NOTE: If this is a one-time operation, I might consider doing this as an INSERT into a new table, and then swapping the table names and changing foreign key references, to put the new table in place of the old table.
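A rough sketch of that variant (the new table name is chosen here just for illustration, and the column names field_id, field_1, ... are assumed to match the UPDATE above):

CREATE TABLE my_table_new LIKE my_table;
INSERT INTO my_table_new (field_id, field_1, field_2, field_3, field_4)
SELECT t.field_id
     , t.field_1
     , t.field_2
     , COALESCE(c3.new_word, t.field_3)
     , COALESCE(c4.new_word, t.field_4)
  FROM my_table t
  LEFT JOIN my_translation c3 ON c3.old_word = t.field_1
  LEFT JOIN my_translation c4 ON c4.old_word = t.field_2;
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;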
I have a table which contains a standard auto-incrementing ID, a type identifier, a number, and some other irrelevant fields. When I insert a new object into this table, the number should auto-increment based on the type identifier.
Here is an example of how the output should look:
id type_id number
1 1 1
2 1 2
3 2 1
4 1 3
5 3 1
6 3 2
7 1 4
8 2 2
As you can see, every time I insert a new object, the number increments according to the type_id (i.e. if I insert an object with type_id of 1 and there are 5 objects matching this type_id already, the number on the new object should be 6).
I'm trying to find a performant way of doing this with huge concurrency. For example, there might be 300 inserts within the same second for the same type_id and they need to be handled sequentially.
Methods I've tried already:
PHP
This was a bad idea, but I've added it for completeness. A request was made to get the MAX() number for the item type, and then that number + 1 was used as part of an insert. This is quick but doesn't work concurrently, as there could be 200 inserts between the request for MAX() and that particular insert, leading to multiple objects with the same number and type_id.
Locking
Manually locking and unlocking the table before and after each insert in order to maintain the increment. This caused performance issues due to the number of concurrent inserts and because the table is constantly read from throughout the app.
Transaction with Subquery
This is how I'm currently doing it but it still causes massive performance issues:
START TRANSACTION;
INSERT INTO objects (type_id,number) VALUES ($type_id, (SELECT COALESCE(MAX(number),0)+1 FROM objects WHERE type_id = $type_id FOR UPDATE));
COMMIT;
Another negative thing about this approach is that I need to do a follow up query in order to get the number that was added (i.e. searching for an object with the $type_id ordered by number desc so I can see the number that was created - this is done based on a $user_id so it works but adds an extra query which I'd like to avoid)
Triggers
I looked into using a trigger in order to dynamically add the number upon insert but this wasn't performant as I need to perform a query on the table I'm inserting into (which isn't allowed so has to be within a subquery causing performance issues).
Grouped Auto-Increment
I've had a look at grouped auto-increment (so that the number would auto-increment based on type_id) but then I lose my auto-increment ID.
Does anybody have any ideas on how I can make this performant at the level of concurrent inserts that I need? My table is currently InnoDB on MySQL 5.5
Appreciate any help!
Update: Just in case it is relevant, the objects table has several million objects in it. Some of the type_id can have around 500,000 objects assigned to them.
Use a transaction and SELECT ... FOR UPDATE. This will solve the concurrency conflicts.
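For example, a sketch against the objects table (with a hard-coded type_id; capturing the value in a session variable also avoids the follow-up query you mention):

START TRANSACTION;
-- lock the type's index range and compute the next number
SELECT COALESCE(MAX(number), 0) + 1 INTO @next
FROM objects
WHERE type_id = 1
FOR UPDATE;
INSERT INTO objects (type_id, number) VALUES (1, @next);
COMMIT;
-- @next still holds the number that was just assigned
SELECT @next;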
Regarding your "Transaction with Subquery" approach: try creating an index on the type_id column. I think an index covering type_id will speed up your subquery.
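For example (the index name is just an assumption); a composite index on (type_id, number) lets the MAX(number) for a single type_id be read directly from the index:

ALTER TABLE objects ADD INDEX idx_type_number (type_id, number);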
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,type_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1,1),(2,1),(3,2),(4,1),(5,3),(6,3),(7,1),(8,2);
SELECT x.*
, COUNT(*) rank
FROM my_table x
JOIN my_table y
ON y.type_id = x.type_id
AND y.id <= x.id
GROUP
BY id
ORDER
BY type_id
, rank;
+----+---------+------+
| id | type_id | rank |
+----+---------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 4 | 1 | 3 |
| 7 | 1 | 4 |
| 3 | 2 | 1 |
| 8 | 2 | 2 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
+----+---------+------+
or, if performance is an issue, just do the same thing with a couple of @variables, as sketched below.
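A sketch of the @variable version (the derived table forces the ordering before the variables are evaluated row by row):

SELECT x.id
     , x.type_id
     , @rank := IF(@prev_type = x.type_id, @rank + 1, 1) AS rank
     , @prev_type := x.type_id AS prev_type
  FROM (SELECT id, type_id FROM my_table ORDER BY type_id, id) x
 CROSS JOIN (SELECT @rank := 0, @prev_type := NULL) vars;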
Perhaps an idea: create a (temporary) table for all rows with a common type_id.
In that table you can use auto-increment for your num column.
Then your num should be fully trustworthy.
Then you can select your data and update your first table.
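Roughly, for a single type_id (a sketch with assumed names, following the objects table from the question):

CREATE TEMPORARY TABLE tmp_numbers (
  number INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  object_id INT NOT NULL
);
-- number the existing rows of one type in id order
INSERT INTO tmp_numbers (object_id)
SELECT id FROM objects WHERE type_id = 1 ORDER BY id;
-- copy the generated numbers back to the main table
UPDATE objects o
JOIN tmp_numbers t ON t.object_id = o.id
SET o.number = t.number;
DROP TEMPORARY TABLE tmp_numbers;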
I'm developing a QA web app which will have some points to be evaluated, each assigned to one of the following categories:
Call management
Technical skills
Ticket management
As these aren't likely to change, it's not worth making them dynamic, but the problem is that the points themselves are likely to change.
At first I had a 'quality' table which had a column for each point, but then the requirements changed and I'm kind of stuck.
I have to store "evaluations" that hold all points with their values, but maybe, in the future, those points will change.
I thought that in the quality table I could store some kind of string like this:
1=1|2=1|3=2
where each pair is a point ID and the punctuation (score) given for that point.
Can someone point me to a better method to do that?
As mentioned many times here on SO, NEVER PUT MORE THAN ONE VALUE INTO A DB FIELD IF YOU WANT TO ACCESS THEM SEPARATELY.
So I suggest having 2 additional tables:
CREATE TABLE categories (id int AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) NOT NULL);
INSERT INTO categories VALUES (1,"Call management"),(2,"Technical skills"),(3,"Ticket management");
and
CREATE TABLE qualities (id int AUTO_INCREMENT PRIMARY KEY, category int NOT NULL, punctuation int NOT NULL);
Then store and query your data accordingly.
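For example, a small sketch using the two tables above:

-- store one punctuation per category for an evaluation
INSERT INTO qualities (category, punctuation) VALUES (1, 1), (2, 1), (3, 2);
-- read it back with the category names
SELECT c.name, q.punctuation
FROM qualities q
JOIN categories c ON c.id = q.category;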
This table is not normalized. It violates 1st Normal Form (1NF):
Evaluation
----------------------------------------
EvaluationId | List Of point=punctuation
1 | 1=1|2=1|3=2
2 | 1=5|2=6|3=7
You can read more about Database Normalization basics.
The table could be normalized as:
Evaluation
-------------
EvaluationId
1
2
Quality
---------------------------------------
EvaluationId | Point | Punctuation
1 | 1 | 1
1 | 2 | 1
1 | 3 | 2
2 | 1 | 5
2 | 2 | 6
2 | 3 | 7