I'm using Laravel and Migrations to build my entire database structure.
Problem description
In my schema, I have a pack table that belongs to user and group, and I need to keep a kind of unique "index" for each distinct combination of these tables.
That is: a sequential number that increments per distinct (user_id, group_id) pair. For example:
+----+---------+----------+----------+
| id | user_id | group_id | sequence |
+----+---------+----------+----------+
|  1 |       1 |        1 |        1 |
|  2 |       1 |        2 |        1 |
|  3 |       1 |        3 |        1 |
|  4 |       1 |        1 |        2 |
|  5 |       1 |        2 |        2 |
|  6 |       1 |        3 |        2 |
|  7 |       2 |        1 |        1 |
|  8 |       2 |        2 |        1 |
|  9 |       2 |        3 |        1 |
+----+---------+----------+----------+
This will be used to reference a pack in the view layer:
user 1, this is your pack 1 of group 1.
user 1, this is your pack 2 of group 1.
user 1, this is your pack 1 of group 2.
I designed my migration (in up()) like this:
Schema::create('pack', function (Blueprint $table) {
    $table->increments('id');
    $table->integer('user_id')->unsigned();
    $table->foreign('user_id')->references('id')->on('user');
    $table->integer('group_id')->unsigned();
    $table->foreign('group_id')->references('id')->on('group');
    $table->integer('sequence')->unsigned();
});
And I use business logic at the model layer to fill the $pack->sequence field.
Question 1:
In principle, should this be considered the best strategy for the described scenario?
Question 2:
Is there a pattern/approach that can be used to fill the sequence field at the database layer?
It appears you already have an auto-increment column id. MySQL does not support more than one auto-increment column per table.
In general, you can't get the behavior you're describing while allowing concurrent inserts to the table. The reason is that you have to read the max sequence value for a given user/group pair, and then write that value plus one as you insert the new row.
But this creates a race condition, because some other concurrent session could be doing the same thing, and it will sneak in and insert a row with the next sequence value in between your session's steps of reading and inserting.
The solution is to use locking in a way that prevents a concurrent insert for the same user_id and group_id. InnoDB's gap locks help with this.
Example:
Open two MySQL clients. In the first session, try this:
mysql> begin;
mysql> select max(sequence) from pack where user_id=1 and group_id=1 for update;
+---------------+
| max(sequence) |
+---------------+
| 2 |
+---------------+
The FOR UPDATE locks the rows examined, and it locks the "gap" which is the place where other rows with the same user_id and group_id would be inserted.
To prove this, try in the second session:
mysql> begin;
mysql> insert into pack set user_id=1, group_id=1, sequence=3;
It hangs. It can't do the insert, because that conflicts with the gap lock still held by the first session. The race-condition has been avoided.
Now in the first session, finish the work.
mysql> insert into pack set user_id=1, group_id=1, sequence=3;
mysql> commit;
Notice that after the commit, session 1's locks are released immediately. The second session's blocked INSERT then resolves, but it correctly gets an error:
ERROR 1062 (23000): Duplicate entry '1-1-3' for key 'user_id'
Of course, session 2 should have done the same SELECT...FOR UPDATE. That would have also been blocked until it could resolve the lock conflict. Once it resolved, it would have returned the correct new max sequence value.
The locks apply only to a given user_id/group_id combination, but only if you have a suitable index. I used:
ALTER TABLE pack ADD UNIQUE KEY (user_id, group_id, sequence);
Once you have that key, the SELECT...FOR UPDATE is able to be specific to the right set of rows when it locks them.
What this means is that even if user_id=1, group_id=1 is locked, you can still insert a new entry for any other values of user_id or group_id. They lock distinct parts of the index, so there's no conflict.
I encourage you to do some experiments yourself to prove to yourself you understand how it works. You can do this without writing any PHP code. I just opened two Terminal windows, ran the mysql command-line client, and started writing at the mysql> prompt. You can too!
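Putting the pieces together, the whole pattern can run as a single transaction. Here is a minimal sketch against the question's pack table (the COALESCE is an addition, to handle the very first pack for a user/group pair, when MAX() returns NULL):

BEGIN;
-- Lock the (user_id, group_id) slice of the unique index and compute the next value
SELECT COALESCE(MAX(sequence), 0) + 1 INTO @next
FROM pack WHERE user_id = 1 AND group_id = 1 FOR UPDATE;
-- Insert while the gap lock is still held, so no concurrent session can take @next
INSERT INTO pack (user_id, group_id, sequence) VALUES (1, 1, @next);
COMMIT;

From application code, the equivalent is simply to run those same three steps inside one database transaction.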
Related
I have a table which contains a standard auto-incrementing ID, a type identifier, a number, and some other irrelevant fields. When I insert a new object into this table, the number should auto-increment based on the type identifier.
Here is an example of how the output should look:
+----+---------+--------+
| id | type_id | number |
+----+---------+--------+
|  1 |       1 |      1 |
|  2 |       1 |      2 |
|  3 |       2 |      1 |
|  4 |       1 |      3 |
|  5 |       3 |      1 |
|  6 |       3 |      2 |
|  7 |       1 |      4 |
|  8 |       2 |      2 |
+----+---------+--------+
As you can see, every time I insert a new object, the number increments according to the type_id (i.e. if I insert an object with type_id of 1 and there are 5 objects matching this type_id already, the number on the new object should be 6).
I'm trying to find a performant way of doing this with huge concurrency. For example, there might be 300 inserts within the same second for the same type_id and they need to be handled sequentially.
Methods I've tried already:
PHP
This was a bad idea, but I've added it for completeness. A request was made to get the MAX() number for the item type, and then number + 1 was added as part of an insert. This is quick but doesn't work concurrently: there could be 200 inserts between the request for MAX() and that particular insert, leading to multiple objects with the same number and type_id.
Locking
Manually locking and unlocking the table before and after each insert in order to maintain the increment. This caused performance issues due to the number of concurrent inserts and because the table is constantly read from throughout the app.
Transaction with Subquery
This is how I'm currently doing it but it still causes massive performance issues:
START TRANSACTION;
INSERT INTO objects (type_id,number) VALUES ($type_id, (SELECT COALESCE(MAX(number),0)+1 FROM objects WHERE type_id = $type_id FOR UPDATE));
COMMIT;
Another negative of this approach is that I need a follow-up query to get the number that was added (i.e. searching for an object with the $type_id ordered by number desc so I can see the number that was created; this is done based on a $user_id, so it works, but it adds an extra query which I'd like to avoid).
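A sketch of one way to avoid that follow-up query, reusing the same transaction/SELECT ... FOR UPDATE shape but computing the number into a session variable first, so it is already in hand after the insert (the literal type_id is just for illustration):

BEGIN;
SELECT COALESCE(MAX(number), 0) + 1 INTO @n
FROM objects WHERE type_id = 1 FOR UPDATE;
INSERT INTO objects (type_id, number) VALUES (1, @n);
SELECT @n; -- the number that was just inserted; no second read of objects needed
COMMIT;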
Triggers
I looked into using a trigger to add the number dynamically upon insert, but this wasn't performant, as I need to query the table I'm inserting into (which isn't allowed directly, so it has to be done within a subquery, causing performance issues).
Grouped Auto-Increment
I've had a look at grouped auto-increment (so that the number would auto-increment based on type_id) but then I lose my auto-increment ID.
Does anybody have any ideas on how I can make this performant at the level of concurrent inserts that I need? My table is currently InnoDB on MySQL 5.5.
Appreciate any help!
Update: Just in case it is relevant, the objects table has several million objects in it. Some type_id values can have around 500,000 objects assigned to them.
Use a transaction and SELECT ... FOR UPDATE. This will resolve the concurrency conflicts.
In your "Transaction with Subquery" approach, try creating an index on the type_id column. I think an index on type_id will speed up your subquery.
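For example, something like this should help (assuming the table is named objects, as in the question; a composite index that also covers number lets MySQL resolve the MAX(number) lookup for a single type_id from the index alone):

ALTER TABLE objects ADD INDEX idx_type_number (type_id, number);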
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,type_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1,1),(2,1),(3,2),(4,1),(5,3),(6,3),(7,1),(8,2);
SELECT x.*
     , COUNT(*) AS rank
  FROM my_table x
  JOIN my_table y
    ON y.type_id = x.type_id
   AND y.id <= x.id
 GROUP
    BY x.id
 ORDER
    BY x.type_id
     , rank;
+----+---------+------+
| id | type_id | rank |
+----+---------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 4 | 1 | 3 |
| 7 | 1 | 4 |
| 3 | 2 | 1 |
| 8 | 2 | 2 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
+----+---------+------+
Or, if performance is an issue, just do the same thing with a couple of @variables, as in the sketch below.
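A minimal sketch of that user-variable version (not the original answerer's code; it relies on MySQL 5.x evaluating the select-list expressions row by row, in order, which is implementation-specific behavior):

SELECT id,
       type_id,
       @rank := IF(@prev = type_id, @rank + 1, 1) AS rank,
       @prev := type_id AS current_type
FROM my_table,
     (SELECT @prev := NULL, @rank := 0) AS init
ORDER BY type_id, id;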
Perhaps an idea: create a (temporary) table for all rows with a common type_id.
In that table you can use auto-increment for your num column.
Then your num should be fully trustworthy.
Then you can select your data and update your first table, as in the sketch after this paragraph.
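A literal reading of that suggestion might look like this (the renumber table and its columns are hypothetical; objects, number, and type_id come from the question):

-- Renumber one type_id at a time through a temporary auto-increment table
CREATE TEMPORARY TABLE renumber (
    num INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    obj_id INT NOT NULL
);
INSERT INTO renumber (obj_id)
    SELECT id FROM objects WHERE type_id = 1 ORDER BY id;
UPDATE objects o
JOIN renumber r ON r.obj_id = o.id
SET o.number = r.num;
DROP TEMPORARY TABLE renumber;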
I have a table that records tickets that are separated by a column that denotes the "database". I have a unique key on the database and cid columns so that it increments each database uniquely (cid has the AUTO_INCREMENT attribute to accomplish this). I increment id manually since I cannot make two AUTO_INCREMENT columns (and I'd rather the AUTO_INCREMENT take care of the more complicated task of the uniqueness).
This makes my data look like this basically:
-----------------------------
| id | cid | database |
-----------------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 2 |
-----------------------------
This works perfectly well.
I am trying to make a feature that will allow a ticket to be "moved" to another database; frequently a user may enter the ticket in the wrong database. Instead of having to close the ticket and completely create a new one (copy/pasting all the data over), I'd like to make it easier for the user of course.
I want to be able to change the database and cid fields uniquely without having to tamper with the id field. I want to do an UPDATE (or the like) since other tables have foreign key constraints linking to the id field; this is why I don't simply do a REPLACE, or a DELETE then INSERT: I don't want to delete all of the other tables' data and then have to recreate it (log entries, transactions, appointments, etc.).
How can I get the next unique AUTO_INCREMENT value (based on the new database value), then use that to update the desired row?
For example, in the above dataset, I want to change the first record to go to "database #2". Whatever query I make needs to make the data change to this:
-----------------------------
| id | cid | database |
-----------------------------
| 1 | 3 | 2 |
| 2 | 1 | 2 |
| 3 | 2 | 2 |
-----------------------------
I'm not sure if the AUTO_INCREMENT needs to be incremented, as my understanding is that the unique key makes it just calculate the next appropriate value on the fly.
I actually ended up making it work once I re-read an excerpt on using AUTO_INCREMENT on multiple columns:
For MyISAM and BDB tables you can specify AUTO_INCREMENT on a
secondary column in a multiple-column index. In this case, the
generated value for the AUTO_INCREMENT column is calculated as
MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is
useful when you want to put data into ordered groups.
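The quoted behavior is easy to try in isolation. A minimal sketch (the table name is hypothetical; `database` is backquoted because it is a reserved word in MySQL):

CREATE TABLE tickets_demo (
    `database` INT NOT NULL,
    cid INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (`database`, cid)
) ENGINE=MyISAM;

INSERT INTO tickets_demo (`database`) VALUES (1), (1), (2);
-- rows become (database, cid) = (1,1), (1,2), (2,1): cid restarts per group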
This was the clue I needed. I simply mimicked the query MySQL runs internally according to that quote, and joined it into my UPDATE query as follows. Assume $new_database is the database to move to, and $id is the current ticket id.
UPDATE `tickets` AS t1,
(
    SELECT MAX(cid) + 1 AS new_cid
    FROM `tickets`
    WHERE `database` = {$new_database}
) AS t2
SET t1.cid = t2.new_cid,
    t1.`database` = {$new_database}
WHERE t1.id = {$id}
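One edge case worth guarding against (an addition, assuming cid should start at 1 in a previously empty database): if the target database has no tickets yet, MAX(cid) is NULL and the SET would write NULL into cid. Wrapping it in COALESCE avoids that:

UPDATE `tickets` AS t1,
(
    SELECT COALESCE(MAX(cid), 0) + 1 AS new_cid
    FROM `tickets`
    WHERE `database` = {$new_database}
) AS t2
SET t1.cid = t2.new_cid,
    t1.`database` = {$new_database}
WHERE t1.id = {$id}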
I have this structure in a table named grades:
------------------------------
| english | math |
------------------------------
| 3 | 4 |
| 4 | 4 |
| 5 | 2 |
| 2 | 5 |
------------------------------
How do I now calculate the average of the english column?
I tried with:
"SELECT AVG(english) as `averageenglish` FROM grades"
It always gives me "No database selected"
Database selections are per session. If you go into the MySQL command prompt and type
USE MyDatabase
you do not have to qualify table names in queries for that session. As soon as you leave (exit, Ctrl+C, etc.), that selection is gone; you will have to USE the database again, or qualify table names in your queries. This also applies to a session created by a separate script that is not running on the command line.
In production code, it is helpful to always qualify table names in queries. Your query should look like:
SELECT AVG(english) as averageenglish FROM MyDatabase.grades
To demonstrate what @eggyal means:
SELECT AVG(english) as `averageenglish` FROM databasename.grades
The preferred method is to specify the database name in your connection if you only have one database.
I want to insert into the cart table
**orderId** | cartId | cartDate | cartStatus
____________________________________________
1 | 1 | 20120102 | complete
2 | 2 | 20120102 | complete
3 | 3 | 20120102 | complete
4 | 4 | 20120102 | complete
using the auto-increment value orderId from the order table
**orderId** | orderStatus | secret | sauce
____________________________________________
1 | 7 | 020200202 | bbq
2 | 6 | 020200202 | bbq
3 | 6 | 020200202 | t
4 | 4 | 020200202 | m
INSERT INTO ordertable VALUES (NULL, 7, '020200202', 'bbq')
but then using the orderId (which will now be 5)
INSERT INTO carttable VALUES (orderId, '20120102', 'complete')
However,
this insert must be done as part of the same query. If I use mysql_insert_id() (PHP), there is an opportunity for someone else to insert into the database before my cart insert is executed. Or the connection might time out. The database is MyISAM (and I cannot change this; it's a 3rd-party solution).
Thank you,
J
I think your concern about using mysql_insert_id() is unfounded: it returns the last id for the current connection, not the last id globally across all connections.
So unless you have multiple threads sharing the same database connection, or you perform another identity insert on the same connection before calling mysql_insert_id(), you should have nothing to worry about.
ETA: You could do this by sending multiple queries at once, like this:
INSERT INTO ordertable VALUES (NULL, 7, '020200202', 'bbq');
INSERT INTO carttable VALUES (LAST_INSERT_ID(), '20120102', 'complete');
But if you are using mysql_query it usually won't let you send multiple queries in the same call (mostly as a security measure to try to prevent SQL injection).
I have a report I'm rewriting for an application that uses MySQL as the database. Currently, the report relies on a lot of grunt work in PHP, which creates arrays, re-stores them into a temp database, then generates results from that temp DB.
One of the main goals of rewriting the bulk of this code is to simplify and clean a lot of my old code, and I'm wondering whether the process below can be simplified, or, even better, done solely in MySQL, letting PHP just handle the distribution of the data to the client.
I will use a made up scenario to describe what I am attempting to do:
Let's assume the following table (note: in the real app, this table's information is actually pulled from several tables, but this should get the point across):
+----+-----------+--------------+--------------+
| id | location | date_visited | time_visited |
+----+-----------+--------------+--------------+
| 1 | place 1 | 2012-04-20 | 11:00:00 |
+----+-----------+--------------+--------------+
| 2 | place 2 | 2012-04-20 | 11:06:00 |
+----+-----------+--------------+--------------+
| 3 | place 1 | 2012-04-20 | 11:06:00 |
+----+-----------+--------------+--------------+
| 4 | place 3 | 2012-04-20 | 11:20:00 |
+----+-----------+--------------+--------------+
| 5 | place 2 | 2012-04-20 | 11:21:00 |
+----+-----------+--------------+--------------+
| 6 | place 1 | 2012-04-20 | 11:22:00 |
+----+-----------+--------------+--------------+
| 7 | place 3 | 2012-04-20 | 11:23:00 |
+----+-----------+--------------+--------------+
The report I need requires me to first list each location and then the number of visits made to that place. However, the caveat, and what makes the query difficult for me, is that a time interval needs to be met for a visit to count within this report.
For example: Let's say the interval between visits to any given place is 10 minutes.
The first entry is locked in automatically because there are no previous entries, and so is the second since there are no other entries for 'place 2' yet. However on the third entry, place 1 is checked for the last time it was visited, which was less than the interval defined (10 minutes), therefore the report would ignore this entry and move along to the next one.
In essence, we are checking on a case-by-case basis, where the time interval is measured not from the last entry overall, but from the last entry for the same location.
The results from the report should look something like this in the end:
+----+-----------+--------+
| id | location | visits |
+----+-----------+--------+
| 1 | place 1 | 2 |
+----+-----------+--------+
| 2 | place 2 | 2 |
+----+-----------+--------+
| 3 | place 3 | 1 |
+----+-----------+--------+
My current implementation on a basic level goes through the following steps to acquire the above result set:
MySQL query creates one temp table with a list of all the required locations and their ID.
MySQL query selects all the visit data within the specified time frame and passes it to PHP.
PHP & MySQL populate the temporary table with the visits data, PHP does the grunt work here.
MySQL selects data from temporary table and returns it to client for display.
My question is: is there a way to do most of this with MySQL alone? What I've been trying to find is a way to write a MySQL query that selects only the visits which meet the above criteria, then groups them by location and gives me a COUNT(*) for each group.
I really don't know if it's possible and am in hopes that one of the database gurus out there might be able to shed some light on how to do this.
Suppose you have a table (probably temporary) of a slightly different structure:
CREATE TABLE `visits` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`location` varchar(45) NOT NULL,
`visited` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `loc_vis` (`location`,`visited`)
) ENGINE=InnoDB;
INSERT INTO visits (location, visited) VALUES
('place 1', '2012-04-20 11:00:00'),
('place 2', '2012-04-20 11:06:00'),
('place 1', '2012-04-20 11:06:00'),
('place 3', '2012-04-20 11:20:00'),
('place 2', '2012-04-20 11:21:00'),
('place 1', '2012-04-20 11:22:00'),
('place 1', '2012-04-20 11:23:00');
which, as you see, has an index on (location, visited). Then the following query will use the index (that is, read the data in index order) and return the results you expected:
SELECT
    location,
    COUNT(IF(@loc <> @loc:=location,
             @vis:=visited,
             IF(@vis + INTERVAL 10 MINUTE < @vis:=visited,
                visited,
                NULL))) AS visit_count
FROM visits,
    (SELECT @loc:='', @vis:=FROM_UNIXTIME(0)) AS init
GROUP BY location;
Result:
+----------+-------------+
| location | visit_count |
+----------+-------------+
| place 1 | 2 |
| place 2 | 2 |
| place 3 | 1 |
+----------+-------------+
3 rows in set (0.00 sec)
Some explanation:
The key to the solution is that it sets aside the declarative nature of SQL and relies on MySQL implementation specifics (which people rightly say is bad: never do it again!).
If a table has an index (an ordered representation of column values) and the index is used in a query, that means that the data from the table is read in the order of the index.
GROUP BY operation will benefit from an index (since the data is already grouped there) and will choose it if it is applicable.
All aggregate functions in SQL (except COUNT(*), which has a special meaning) check each row and use the value only if it is not NULL (the expression within COUNT above returns NULL for rows that fail the conditions).
The rest is just a hacky representation of procedural iteration over a list of rows (read in the order of the index, that is, ordered by location ASC, visited ASC): I initialize some variables; if the location differs from the previous row, I count it; if not, I check the interval and return NULL if it is wrong.
You can populate the temporary table using an INSERT ... SELECT statement.
See the manual: http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
I'd use GROUP BY in the SELECT statement to narrow down the places.
The visits column can be populated as a COUNT operation, and I think it might be possible to perform that as part of the INSERT ... SELECT as well.
See the manual: http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
So your SQL might look something like this.
INSERT INTO temp
SELECT location, COUNT(*) AS visits
FROM source
WHERE date_visited > xxxx AND date_visited < xxxx
GROUP BY location
Seriously, that is off the top of my head, but it should give you some ideas of how the SQL can be structured. You can likely do the report using just one good query.