ID | RID | SID | Name
 1 |  1  |  1  | Alpha
 2 |  1  |  2  | Beta
 3 |  2  |  1  | Charlie
ID is auto-incrementing and unique; it's not the problem here. RID groups sets of data together, and I need a way to make SID unique per RID 'group'. This structure is correct; I anticipate someone saying 'split it into multiple tables', but that's not an option here (it's taxonomic classification).
As shown, under RID 1, the SID increments from 1 to 2, but when the RID changes to 2, the SID is 1.
I have the code to get the next value: SELECT IFNULL(MAX(SID),0)+1 AS NextVal FROM t WHERE RID=1. The question is how do I use that value when inserting a new record?
I can't simply run two queries, as that can result in duplication, so somehow the table needs to be locked, ideally for writes only. What would be the correct way to do this?
First you should constrain your data to be exactly the way you want it to be, so put a unique composite index on (RID, SID).
For your problem you should start a transaction (BEGIN) and then take an exclusive lock on the rows you need, which blocks access to these rows for other connections (not the whole table, which would be bad for performance!):
SELECT .... FOR UPDATE
This locks all selected rows exclusively. Further, you should not use READ UNCOMMITTED as the isolation level; the manual explains how to check the current isolation level and how to change it.
REPEATABLE READ is the default isolation level, which would be fine here.
Then run your INSERT and commit the transaction (COMMIT).
This should prevent duplicates altogether, since you created a unique index, and it should also stop your scripts from simply failing with a 'unique check failed' error; instead they will wait for other sessions to finish and then insert the next row.
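A minimal sketch of the whole flow, assuming MySQL/InnoDB and the table and columns from the question ('Delta' is just an illustrative new row):

-- One-time: enforce uniqueness of (RID, SID).
ALTER TABLE t ADD UNIQUE INDEX uq_rid_sid (RID, SID);

START TRANSACTION;

-- Lock the existing rows for this RID; a concurrent insert for the same
-- RID waits here instead of reading a stale MAX(SID).
SELECT SID FROM t WHERE RID = 1 FOR UPDATE;

-- Compute the next SID and insert it in a single statement.
INSERT INTO t (RID, SID, Name)
SELECT 1, IFNULL(MAX(SID), 0) + 1, 'Delta'
FROM t
WHERE RID = 1;

COMMIT;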
Related
I have a table which has an id (primary key with auto increment), a uid (a key referring to a user's id, for example) and something else which doesn't matter for my question.
I want to make, let's call it, a different auto-increment key on id for each uid entry.
So, I will add an entry with uid 10, and the id field for this entry will be 1 because there were no previous entries with a value of 10 in uid. Then I will add a new one with uid 4, and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explanatory and clear as I can to demonstrate the idea... clearly.
What SQL engine can provide such functionality natively? (non-Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know of a non-SQL database engine providing such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
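The example there is along these lines (a sketch rather than a verbatim copy of the manual):

CREATE TABLE animals (
    grp  ENUM('fish','mammal','bird') NOT NULL,
    id   MEDIUMINT NOT NULL AUTO_INCREMENT,
    name CHAR(30) NOT NULL,
    PRIMARY KEY (grp, id)
) ENGINE = MyISAM;

INSERT INTO animals (grp, name) VALUES
    ('mammal', 'dog'), ('mammal', 'cat'),
    ('bird', 'penguin'), ('fish', 'lax');

-- id restarts at 1 within each grp:
-- ('mammal', 1, 'dog'), ('mammal', 2, 'cat'),
-- ('bird', 1, 'penguin'), ('fish', 1, 'lax')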
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. He gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. He also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
The solutions to this are either:
Get an exclusive table lock. Before attempting an INSERT, lock the table (see the sketch after this list). This prevents concurrent INSERTs from creating a race condition like the one in the example above. It has to be the whole table: since you're trying to restrict INSERT, there is no specific row to lock (if you were governing access to a given row with UPDATE, you could lock just that row). But locking the table makes access to it serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
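For illustration, a sketch of option 1 in MySQL (the table name t, the columns userid/id/other_col and the example values are assumptions):

-- The same table needs a second, aliased READ lock because the INSERT
-- also selects from it while LOCK TABLES is in effect.
LOCK TABLES t WRITE, t AS t_read READ;

INSERT INTO t (userid, id, other_col)
SELECT 4, IFNULL(MAX(id), 0) + 1, 'value'
FROM t AS t_read
WHERE userid = 4;

UNLOCK TABLES;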
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask about efficiency. Unless you are dealing with extreme volumes, storing an 8-byte DATETIME isn't much of an overhead compared to using, for example, a 4-byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than not MS/Oracle), the general logic is simple...
Start a transaction (often one is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
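One way to express those steps in MySQL is a BEFORE INSERT trigger that sets the value before the row is written (rather than updating it afterwards); the table and column names follow the property_id / amendment_id suggestion above and are otherwise assumptions:

DELIMITER //
CREATE TRIGGER amendments_before_insert
BEFORE INSERT ON amendments
FOR EACH ROW
BEGIN
    -- Only fill amendment_id when the caller didn't supply one.
    IF NEW.amendment_id IS NULL OR NEW.amendment_id = 0 THEN
        SET NEW.amendment_id = (
            SELECT IFNULL(MAX(amendment_id), 0) + 1
            FROM amendments
            WHERE property_id = NEW.property_id
        );
    END IF;
END//
DELIMITER ;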
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
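For illustration, a sketch of the stored-procedure route in MySQL, using the property_id / amendment_id naming above (the table name, the body column and the locking style are assumptions):

DELIMITER //
CREATE PROCEDURE add_amendment(IN p_property_id INT, IN p_body TEXT)
BEGIN
    DECLARE next_id INT;

    START TRANSACTION;

    -- Lock this property's rows so concurrent calls for the same
    -- property wait rather than computing the same next_id.
    SELECT IFNULL(MAX(amendment_id), 0) + 1 INTO next_id
    FROM amendments
    WHERE property_id = p_property_id
    FOR UPDATE;

    INSERT INTO amendments (property_id, amendment_id, body)
    VALUES (p_property_id, next_id, p_body);

    COMMIT;
END//
DELIMITER ;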
I personally recommend the Stored Procedure route, but triggers do work.
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
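A sketch of what that looks like in practice (entity_version and its columns are illustrative names):

CREATE TABLE entity_version (
    entity_id      INT NOT NULL,
    version_number INT NOT NULL,
    payload        TEXT,
    PRIMARY KEY (entity_id, version_number)
);

-- Latest version of each entity, computed on the fly instead of
-- maintaining a separate "latest version" table.
SELECT ev.*
FROM entity_version AS ev
JOIN (
    SELECT entity_id, MAX(version_number) AS version_number
    FROM entity_version
    GROUP BY entity_id
) AS latest USING (entity_id, version_number);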
Hi everyone! I'm making a simple todo app, and I've run into a problem. I want to allow users to change the order of elements in a list (saving this to the database).
One of my first ideas was:
Create a column (order) and change it every time the user does something.
That's fine when we have a few records, but what about a bigger number?
My thought:
id | name  | order
 1 | lorem | 1
 2 | ipsum | 2
 3 | dolor | 3
When the user moves "dolor" to the first position, the script must update all of the records.
This isn't the best solution, I think.
Can anyone share the knowledge of how to optimize this?
I will be grateful!
You could use a column called next or previous. This is called a linked list, or, if you use both, a doubly linked list. See:
https://en.wikipedia.org/wiki/Doubly_linked_list
Moving a record up one step in a database table would involve two steps:
Remove the record from the order.
Insert the record back into the order.
In all, you would need about five record changes for a doubly linked list, and a minimum of three for a singly linked list.
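A sketch of the doubly linked variant as a table (names are illustrative; @x, @a, @b and so on stand for the ids involved in a move):

CREATE TABLE todo_items (
    id      INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    prev_id INT NULL,   -- NULL for the first item in the list
    next_id INT NULL    -- NULL for the last item in the list
);

-- Step 1, remove item X from the order: point its neighbours at each other.
UPDATE todo_items SET next_id = @x_next WHERE id = @x_prev;
UPDATE todo_items SET prev_id = @x_prev WHERE id = @x_next;

-- Step 2, insert X between A and B: update A, B and X itself.
UPDATE todo_items SET next_id = @x WHERE id = @a;
UPDATE todo_items SET prev_id = @x WHERE id = @b;
UPDATE todo_items SET prev_id = @a, next_id = @b WHERE id = @x;

Each move touches only the moved item and its old and new neighbours, regardless of how long the list is.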
If you want to store this data in a database, then an "ordering" column is appropriate.
Whenever you insert into or update the table, you will need to update this column (deletes don't require it). In general, you will need to update all the rows after the changed row. A trigger can do this work.
Cycling through rows is probably fine for a few dozen or even a few hundred rows (depending on how powerful your database is). So, depending on the length of the list, this is likely to be fine.
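For example, moving "dolor" (id 3) from position 3 to position 1 on the question's table could look like this (the table name todo_list is assumed, and the column is called position here because ORDER is a reserved word):

START TRANSACTION;

-- Make room at position 1 by shifting everything at or after it
-- (and before dolor's old position 3) down by one.
UPDATE todo_list
SET position = position + 1
WHERE position >= 1 AND position < 3;

-- Put 'dolor' (id = 3) into the freed slot.
UPDATE todo_list
SET position = 1
WHERE id = 3;

COMMIT;

Only the rows between the old and new positions are touched, so a single move never rewrites the entire list.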
Any enhancements depend on additional factors. Some that I can think of:
How large will the lists really be?
What kind of transformations are most important? (Swaps? Inserts? Deletes? Updates?)
Will the transformations happen in bulk?
Will multiple users be changing the list at the same time?
I have 2 tables,
the first table stores URLs
| link_id | link_url |   <== schema for url_table ::: contains 2 million+ rows
and the second table stores user_bookmarks
| user_id | link_id | is_bookmarked |   <== schema for user_table ::: over 3.5 million+ rows
is_bookmarked stores 1 or 0, according to whether the link has been bookmarked by the user or not.
Here is the problem:
When a new link is added, these are the steps followed
1) Check if the url already exists in url_table, which means going through millions of rows
2) If it does not exist, add a new row to url_table and user_table
The database (MySQL) is simply taking too much time, due to the enormous row set.
Also, it's a very simple PHP+MySQL app, with no search-assisted indexing programs whatsoever.
Any suggestions to speed this up?
Why not remove the column user_bookmarks.is_bookmarked and use the sole existence of an entry with user_id and link_id as indicator that the link was bookmarked?
A new link has no entries in the user_bookmarks table, because nobody bookmarked it yet. When a user bookmarks a link, you add an entry. When the user removes the bookmark, you remove the row.
To check whether a user has bookmarked a link, simply SELECT COUNT(*) FROM user_bookmarks WHERE user_id=? AND link_id=?. When the count is 1, it is bookmarked; when it is 0, it isn't.
The insert query for adding a new entry to the URL table could also be sped up with an appropriate index.
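For example (the index name is an assumption, and this assumes link_url fits within the engine's index key length limit; otherwise see the hashing idea in the next answer):

-- Lets the "does this URL already exist?" check use an index lookup
-- instead of scanning millions of rows.
ALTER TABLE url_table ADD UNIQUE INDEX uq_link_url (link_url);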
If you told us what your current schema was (i.e. the CREATE TABLE statements including indexes) rather than just what your column names were, then we might be able to make practical suggestions as to how to improve it.
There's certainly scope for improving the method of adding rows:
Assuming that the link_url can be larger than the 767-byte index key limit for an InnoDB table (you didn't say what engine you are using), change the id column to contain an MD5 hash of the link_url, with a unique index on it. Then when you want to add a record, go ahead and try to insert it using INSERT IGNORE ....
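A sketch of one variation of that idea, keeping link_id as it is and adding a separate hashed column rather than replacing the id (the column and index names are assumptions):

ALTER TABLE url_table ADD COLUMN link_hash CHAR(32) NOT NULL DEFAULT '';
UPDATE url_table SET link_hash = MD5(link_url);
ALTER TABLE url_table ADD UNIQUE INDEX uq_link_hash (link_hash);

-- Adding a URL is now a single statement; an already-existing URL is
-- silently skipped thanks to the unique index plus INSERT IGNORE.
INSERT IGNORE INTO url_table (link_url, link_hash)
VALUES ('http://example.com/page', MD5('http://example.com/page'));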
Say I have the following table
//table user_update
update_id | userid | update_message | timestamp
Is there a way to set a maximum number of entries per userid? I want to make it so that after a user types, say, 5 updates, any further update entered after that deletes the oldest one. I know how to do this through PHP; I'm just curious whether there's any way to do this through MySQL only.
The only way I can think of doing this database-side would be to use a TRIGGER on the table. Maybe something like this:
CREATE TRIGGER check_for_too_many_rows
AFTER INSERT ON User_Update
FOR EACH ROW BEGIN
DO SOME LOGIC TO CHECK THE COUNT AND DELETE IF MORE THAN 5...
END;
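The tricky part is the placeholder body: MySQL will not let a trigger delete from the table it fires on, so in practice the cleanup tends to run as a follow-up statement right after the INSERT (or inside a stored procedure that wraps both). A sketch of that cleanup, assuming "oldest" means smallest update_id and using 42 as an example userid:

DELETE FROM user_update
WHERE userid = 42
  AND update_id NOT IN (
      -- Keep the 5 newest updates for this user; the extra nesting works
      -- around MySQL's restriction on deleting from a table that the
      -- subquery also selects from.
      SELECT update_id FROM (
          SELECT update_id
          FROM user_update
          WHERE userid = 42
          ORDER BY update_id DESC
          LIMIT 5
      ) AS keep_newest
  );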
Here is some additional information:
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
Good luck.
That is actually possible, but it is questionable whether you really want to make that effort.
Read a little about 'triggers'. You can use a trigger to start an action when certain conditions are met. This way you can trigger a delete action on every insert on that table. Then all that is left is a condition so that only up to five entries are kept.
I have several servers running their own instance of a particular MySQL database which unfortunately cannot be setup in replication/cluster. Each server inserts data into several user-related tables which have foreign key constraints between them (e.g. user, user_vote). Here is how the process goes about:
all the servers start with the same data
each server grows its own set of data independently from the other servers
periodically, the data from all the servers is merged manually together and applied back to each server (the process therefore repeats itself from step 1).
This is made possible because in addition to its primary key, the user table contains a unique email field which allows identifying which users are already existing in each database, and merging those who are new while changing the primary and foreign keys to avoid collisions and maintain the correct foreign key constraints. It works, but it's quite some effort because primary and foreign keys have to be changed to avoid collision, hence my question:
Is there a way to have each server use primary keys that don't collide with other servers to facilitate the merging?
I initially wanted to use a composite primary key (e.g. server_id, id) but I am using Doctrine which doesn't support primary keys composed of multiple foreign keys so I would have problems with my foreign key constraints.
I thought about using a VARCHAR as an id and using part of the string as a prefix (SERVER1-1,SERVER1-2, SERVER2-1, SERVER2-2...) but I'm thinking it will make the DB slower as I will have to do some manipulations with the ids (e.g. on insert, I have to parse existing ids and extract highest, increment it, concatenate it with server id...).
PS: Another option would be to implement replication with read from slaves and write to master but this option was discarded because of issues such as replication lag and single point of failure on the master which can't be solved for now.
You can make sure each server uses a different auto-increment step, and a different start offset:
Change the step auto_increment fields increment by
(assuming you are using auto-increments)
I've only ever used this across two servers, so my set-up had one with even ids and one with odd.
When they are merged back together nothing will collide, as long as you make sure all tables follow the above idea.
To implement this for 4 servers, you would set up the following offsets:
Server 1 = 1
Server 2 = 2
Server 3 = 3
Server 4 = 4
You would set your incrementation as such (I've used 10 to leave space for extra servers):
Server 1 = 10
Server 2 = 10
Server 3 = 10
Server 4 = 10
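Concretely, these are the two MySQL server variables involved; for example on Server 2 (the other servers differ only in the offset, and the same settings can go in my.cnf):

-- New auto-increment values on this server will be 2, 12, 22, 32, ...
SET GLOBAL auto_increment_increment = 10;
SET GLOBAL auto_increment_offset = 2;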
And then after you have merged, before copying back to each server, you would just need to update the auto-increment value for each table so it has the correct offset again. Imagine each server had created 100 rows; the next auto-increment values would be:
Server 1 = 1001
Server 2 = 1002
Server 3 = 1003
Server 4 = 1004
This is where it gets tricky with four servers, because certain tables may not have had any rows inserted from a particular server. You could end up with some tables whose last auto-increment id came not from server 4 but from server 2 instead. That makes it tricky to work out what the next auto-increment value should be for any particular table.
For this reason it is probably best to also include a column in each of your tables that records the server number when any rows are inserted.
id | field1 | field2 | ... | server
That way you can easily find out what the last autoinc value should be for a particular server by selecting the following on any of your tables:
SELECT MAX(id) FROM `table` WHERE `server`=4 LIMIT 0,1
Using this value you can reset the next autoinc value you need for each table on each server, before rolling the merged dataset out to the server in question.
-- information_schema is read-only, so set the value via ALTER TABLE instead:
SELECT MAX(id) + n INTO @next_autoinc FROM `table` WHERE `server` = s;
SET @sql = CONCAT('ALTER TABLE `table` AUTO_INCREMENT = ', @next_autoinc);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Where s is the server number and n is set to the offset, so in my example it would be 10.
Prefixing the ID would do the trick. As for the DB being slower, it depends on how much traffic is served there. You could also have the "prefixed id" split into two columns, "prefix" and "id", which can be of any type. It would require some logic to cope with it in requests, but may be worth evaluating.
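A sketch of the two-column variant (the table and column names are illustrative):

CREATE TABLE users (
    server_prefix VARCHAR(16) NOT NULL,   -- e.g. 'SERVER1'
    id            INT NOT NULL,           -- per-server counter
    email         VARCHAR(255) NOT NULL,
    PRIMARY KEY (server_prefix, id),
    UNIQUE KEY uq_email (email)
);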