Multiple commands atomically

Multiple commands atomically - php

I'm using PHP and MySQL to call a couple of commands. There is a group-employee cross-reference table, and employees may be added or removed from the group. I'm writing this feature by first removing all the employees and then adding back in each employee that is in the table if the group is modified. But I want to make sure that all the commands go through or else have the table roll back. How can I make sure the commands happen atomically?

You're looking for Transactions unless I'm mistaken.
Here's a great place to start: http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-transactions.html

If you're using the MyIsam engine instead of InnoDB, then you can't take advantage of transactions. In that case, your best bet is to perform all your INSERTS firsts (your cross-ref table should have a two-field unique constraint so dups will silently fail). Then perform a single DELETE:
DELETE FROM employees_groups
WHERE group_id = <groupid>
AND employee_id NOT IN (list of employee ids you just inserted)

Assuming your target tables support transactions, then you want the mysqli transaction family of methods.
See:
mysqli->commit()
mysqli->autocommit()
mysqli->rollback()
If your tables do not support transactions, you could likely change your table types to 'InnoDB` without a problem, and thus support transactions. Check with your DB admin before doing so first.

Related

SQL - auto increment withing group inside one table [duplicate]

I have got a table which has an id (primary key with auto increment), uid (key refering to users id for example) and something else which for my question won’t matter.
I want to make, lets call it, different auto-increment keys on id for each uid entry.
So, I will add an entry with uid 10, and the id field for this entry will have a 1 because there were no previous entries with a value of 10 in uid. I will add a new one with uid 4 and its id will be 3 because I there were already two entried with uid 4.
...Very obvious explanation, but I am trying to be as explainative an clear as I can to demonstrate the idea... clearly.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know about a non SQL database engine providing such a functioality, name it anyway, I am curious.
Thanks.

MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to computes MAX(id)+1 for user 4. You get 3.
Bill's session fires a trigger to compute MAX(id). I get 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like in the example above. It's necessary to lock the whole table, since you're trying to restrict INSERT there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just the specific row). But locking the table causes access to the table to become serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?

SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.

In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than NOT MS/Oracle) the general logic is simple...
Start a transaction (often this is Implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.

It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.

PHP MySQL Task API, Prevent Duplicate Records

I am building a PHP RESTful-API for remote "worker" machines to self-assign tasks. The MySQL InnoDB table on the API host holds pending records that the workers can pick up from the API whenever they are ready to work on a record. How do I prevent concurrently requesting worker system from ever getting the same record?
My initial plan to prevent this is to UPDATE a single record with a uniquely generated ID in a default NULL field, and then poll for the details of the record where the unique ID field matches.
For example:
UPDATE mytable SET status = 'Assigned', uniqueidfield = '3kj29slsad'
WHERE uniqueidfield IS NULL LIMIT 1
And in the same PHP instance, the next query:
SELECT id, status, etc FROM mytable WHERE uniqueidfield = '3kj29slsad'
The resulting record from the SELECT statement above is then given to the worker. Would this prevent simultaneously requesting workers from getting the same records shown to them? I am not exactly sure on how MySQL handles the lookups within an UPDATE query, and if two UPDATES could "find" the same record, and then update it sequentially. If this works, is there a more elegant or standardized way of doing this (not sure if FOR UPDATE would need to be applied to this)? Thanks!

Nevermind my previous answer. I believe I understand what you are asking. I'll reword it so maybe it is clearer to others.
"If I issue two of the above update statements at the same time, what would happen?"
According to http://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html, the second statement would not interfere with the first one.
Normally, you do not need to lock tables, because all single UPDATE
statements are atomic; no other session can interfere with any other
currently executing SQL statement.
A more elegant way is probably opinion based, but I don't see anything wrong with what you're doing.

Keep updated mysql data between multiple mysql tables

I have two tables in mysql. When I insert/delete values in the first table I want that the values get duplicated in table 2 to keep them "aligned".
table1:
id - username
1 - test_user
table2:
Same id as table1 and username as table1 (on insert/delete)
I want to keep the data between the tables aligned without doing multiple queries. I've read about triggers not sure if it's the correct road, i am a beninner.
I said two tables but i will need to do this in multiple tables.

You can use Mysql triggers. This way you can auto insert/update/delete datas from second table.
MySql Using Triggers

When you INSERT new records, given that you don't want to do two inserts for some reason, using a trigger to insert into the second table will work. For UPDATE and DELETE you might want to look at the CASCADE option with foreign keys. If all you are doing is keeping the data consistent between tables, that's exactly what cascade is for.
When you create table2 you just add a foreign key like this:
FOREIGN KEY (id, username)
REFERENCES table1(id, username) ON UPDATE CASCADE ON DELETE CASCADE
Then whenever you alter table1 the changes will automatically get pushed through to table2.
Couple prerequisites for this to work:
You have to use a storage engine that supports foreign keys, something like InnoDB and not MyISAM
You need to have an index on (id,username) in table1; the foriegn key needs to match a key in the parent table
You should read the doc page for foreign keys. There are a couple other ways you can tweak them, and you should figure out what works best for your purposes.

You can certainly put triggers on your table1 to make parallel changes to your other tables as your application changes table1.
See here for the documentation: http://dev.mysql.com/doc/refman/5.0/en/trigger-syntax.html
But, you should think over your design. It will take multiple queries to do your inserts and updates; they'll just be done "behind your back" on the server. They'll still take time. Triggers can really slow things down.
Also, triggers are a little bit fragile. If you add a column to a table, you'll have to rework your triggers. Triggers are generally a pain in the neck to keep in a source-control system and a huge pain in the neck to test, so using them will make your application more troublesome to maintain.
Could you think of another approach to handling this need for duplication? Could you, for example, use a view or a join to present the data you need to your application program without actually duplicating tables and the rows in them? If you figure out how to do that you'll be much happier in the long run.
CREATE VIEW table2 AS
SELECT *
FROM table1;
will produce a "fake" table2 with the contents of table1.
Or if you're hoping to view only the test users in a second table, a view can do that for you too, for example:
CREATE VIEW table3 AS
SELECT *
FROM table1
WHERE usertype = 'test_user' ;
If you're using duplicate tables for "backup," that's a bad way to make sure your information is safe. Instead, you need to back up your MySQL server instance.
Formal relational database design principles teach us to duplicating data, but instead use view and joins to structure the data the way applications need to see it.

PHP MySQL inserting data to multiple tables

I'm trying to make an experimental web application which minimises redundant data.
I have three example tables set up like so:
Table one
ID | created_at (unix timestamp) | updated_at (unix timestamp)
Table two
ID | Foreign Key to table one | Title
Table three (pages)
ID | Foreign Keys to both table one and two | Content | Metadata
The idea being that everything created in the application will have a creation/edit time.
Many (but not all) things will have a title (For example a page or a section for a page to go into).
Finally, some things will have attributes specific to themselves, eg content and metadata for a page.
I'm trying to work out the best way to enter data into multiple tables. I know I could do multiple insert queries from PHP, keep track of rows created in the current transaction and delete those rows should a later part of the transaction fail. However, if the PHP script dies completely, it may stop before all of the deletions can be completed.
Does MySQL have any inbuilt logic which would allow the insert query to be split up? Would a trigger be able to handle this type of transaction or is it beyond its capabilities?
Any advice, thoughts or ideas would be greatly appreciated.
Thanks!

A solution would be to use Transactions, which allow to get "all or nothing" behaviour.
The idea is the following :
you start a transaction
you do your inserts/updates
if everything is OK, you commit the transaction ; which will save everything you did during this transaction
if not, you rollback the transaction ; and everything you did in it will be cancelled.
if you don't commit and disconnected (if your PHP script dies, for instance), nothing will be commited, and what you did during the un-commited transaction will automatically be rolled-back.
For more informations, you can take a look at 12.4.1. START TRANSACTION, COMMIT, and ROLLBACK Syntax, for MySQL.
Note that transactions are only available for some DB engines :
MyISAM doesn't support transactions
InnoDB does (it also supports foreign keys, for instance -- it's far more advanced that MyISAM).

For multiple inserts you can create a procedure and on PHP you call the procedure.

Key problem: Which key strategy should I use in my database?

Problem: When I use an auto-incrementing primary key in my database, this happens all the time:
I want to store an Order with 10 Items. The ordered Items belong to the Order. So I store the order, ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?), and then store the 10 Items with the foreign key (order_id).
So I always have to do:
INSERT ...
last_inserted_id = db.lastInsertId();
INSERT ...
INSERT ...
INSERT ...
and I believe this prevents me from using transactions in almost all INSERT cases where I need a foreign key.
So... here some solutions, and I don't know if they're really good:
A) Don't use auto_increment keys! Use a key table?
Key Table would have two fields: table_name, next_key. Every time I need a key for a table to insert a new dataset, first I ask for the next_key by accessing a special static KeyGenerator class method. This does a SELECT and an UPDATE, if possible in one transaction (would that work?). Of course I would request that for every affected table. Next, I can INSERT my entire object graph in one transaction without playing ping-pong with the database, before I know the keys already in advance.
B) Using GUUID / UUID algorithm for keys?
These suppose to be really unique worldwide, and they're LARGE. I mean ... L_A_R_G_E. So a big amount of memory would go into these gigantic keys. Indexing will be hard, right? And data retrieval will be a pain for the database - at least I guess - integer keys are much faster to handle. On the other hand, these also provide some security: Visitors can't iterate anymore over all orders or all users or all pictures by just incrementing the id parameter.
C) Stick with auto_incremented keys?
Ok, if then, what about transactions like described in the example above? How can I solve that? Maybe by inserting a Ghost Row first and then doing an transaction with one UPDATE + n INSERTs?
D) What else?

When storing orders, you need transactions to prevent situations where only half your products are added to the database.
Depending on your database and your connector, the value returned by the last-insert-id function might be transaction-independent. For instance, with MySQL, mysql_insert_id returns the identifier for the last query from that particular client (without being affected by what other clients are doing concurrently).

Which database are you using?
Yes, typically inserting a record and then trying to select it again to find the auto-generated key is bad, especially if you are using a naive select max(id) from table query. This is because as soon as two threads are creating records max(id) may not actually return the last id your current thread used.
One way to avoid this is to create a sequence in the database. From your code you select sequence.NextValue then use that value to then execute your inserts (or you can craft a more complex SQL statement that does this selection and the inserts in one go). Sequences are atomic / thread-safe.
In MySQL you can ask for the last inserted id from the execution results which I believe will always give you the correct answer.

Sql Server supports SCOPE_IDENTITY (Transact-SQL) which should take care of your transaction issue and concurrency issue.
I would say stick with auto_increment.

(Assuming you are using MySQL)
"ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?)"
If you use MySQLs last_insert_id() function, you only see what happened in your session. So this is safe. You mention ths:
db.last_insert_id()
I don't know what framework or language it is, but I would assume that uses MySQL's last_insert_id() under the covers (if not, it is a pretty useless database abstraction fromework)
" I believe this prevents me from using transactions in almost all INSERT cases w"
I don't see why. Please explain.

D) Sequence
: may not be available in your DBMS, but if it is, solves your problem elegantly.
For Postgresql, have a look at Sequence Functions

There is no final and general answer to this question.
auto incrementing columns are easy to use when you add new records. To use them as foreign keys within the same transaction, they are not so straight forward. You need database specific commands to get the newly created key. This technology is common for certain databases, for instance sql server.
Sequences seem to be harder to use, because you need to get a key before you insert a row, but at the end its easier to use them as foreign keys. This technology is common for certain databases, for instance oracle.
When you use Hibernate or NHibernate, it is discouraged to use auto incrementing keys, because some optimizations are not possible anymore. Using a hi-lo algorithm which uses an additional table is recommended.
Guids are strong, for instance when sharing data between different databases, systems, disconnected scenarios, import / export etc. In many databases, most of the tables contain only a few hundred records, so memory and performance are not such an issue. When using NHibernate, you get an guid generator which produces sequential guids, because some databases perform better when keys are sequential.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.