PHP MySQL Task API, Prevent Duplicate Records

I am building a PHP RESTful API for remote "worker" machines to self-assign tasks. The MySQL InnoDB table on the API host holds pending records that the workers can pick up from the API whenever they are ready to work on a record. How do I prevent concurrently requesting worker systems from ever getting the same record?
My initial plan to prevent this is to UPDATE a single record with a uniquely generated ID in a default NULL field, and then poll for the details of the record where the unique ID field matches.
For example:
UPDATE mytable SET status = 'Assigned', uniqueidfield = '3kj29slsad'
WHERE uniqueidfield IS NULL LIMIT 1
And in the same PHP instance, the next query:
SELECT id, status, etc FROM mytable WHERE uniqueidfield = '3kj29slsad'
The resulting record from the SELECT statement above is then given to the worker. Would this prevent simultaneously requesting workers from being shown the same records? I am not exactly sure how MySQL handles the lookups within an UPDATE query, or whether two UPDATEs could "find" the same record and then update it sequentially. If this works, is there a more elegant or standardized way of doing this (I am not sure whether FOR UPDATE would need to be applied here)? Thanks!

Never mind my previous answer. I believe I understand what you are asking. I'll reword it so that it may be clearer to others.
"If I issue two of the above update statements at the same time, what would happen?"
According to http://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html, the second statement would not interfere with the first one.
Normally, you do not need to lock tables, because all single UPDATE
statements are atomic; no other session can interfere with any other
currently executing SQL statement.
A more elegant way is probably opinion based, but I don't see anything wrong with what you're doing.
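For illustration, the claim-then-read pattern from the question could be wrapped in PHP roughly as follows. This is a minimal sketch rather than a definitive implementation: it assumes a PDO connection to the MySQL database, reuses the table and column names from the question (mytable, status, uniqueidfield), and the helper names newClaimToken() and claimTask() are made up for the example. Checking the affected-row count tells the API whether any pending record was left to hand out.

```php
<?php
// Hypothetical helper: a random token that identifies one claim attempt.
function newClaimToken(): string
{
    return bin2hex(random_bytes(8)); // 16 hexadecimal characters
}

// Hypothetical helper: claim at most one pending row, then read it back.
// Returns the claimed row, or null when no pending row was available.
function claimTask(PDO $pdo, string $token): ?array
{
    // The single UPDATE both finds and marks the row.
    $claim = $pdo->prepare(
        "UPDATE mytable SET status = 'Assigned', uniqueidfield = :token
         WHERE uniqueidfield IS NULL LIMIT 1"
    );
    $claim->execute([':token' => $token]);
    if ($claim->rowCount() === 0) {
        return null; // nothing left to assign
    }

    // Only this request knows the token, so only it can read the row back.
    $read = $pdo->prepare('SELECT id, status FROM mytable WHERE uniqueidfield = :token');
    $read->execute([':token' => $token]);
    $row = $read->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}
```

A worker request would then call something like $task = claimTask($pdo, newClaimToken()); and return $task to the worker, or an empty response when it is null.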

Related

How to check in real-time if new row was added to MySQL table

We have an automatic car plate reader that records the plates of cars entering the firm. My colleague asked me whether we can instantly get the plate number of an arriving car. The software uses MySQL, and I only have database access; I cannot reach or edit the PHP code.
My proposal is to check periodically with a query, for example every 10 seconds. But this way it is possible to miss cars that come within 5 seconds. Decreasing the interval increases the request/response count, which means extra load for the server. I do not want the script to run all the time. It should run only when a new row is added to the database, show the plate, and exit.
How can I get the last recorded row from the database right after it is inserted? I mean, there should be a trigger that runs my PHP script after the insertion, but I do not know how to do that.
What I want is for MySQL to run my PHP script after a new record is inserted.

If your table is MyISAM, I would stick to your initial idea. Getting the row count from a MyISAM table is instant. It only takes the reading of one single value as MyISAM maintains the row count at all times.
With InnoDB, this approach can still be acceptable. Assuming car_table.id is primary key, SELECT COUNT(id) FROM car_table only requires an index scan, which is very fast. You can improve on this idea by adding another indexed boolean column to your table:
ALTER TABLE car_table ADD COLUMN checked BOOLEAN NOT NULL DEFAULT 0, ADD INDEX (checked);
The default value ensures new cars will be inserted with this flag set to 0 without modifying the inserting statement. Then:
START TRANSACTION; -- make sure nobody interferes
SELECT COUNT(checked) FROM car_table WHERE checked = FALSE FOR UPDATE; -- this gets you the number of new, unchecked cars
UPDATE car_table SET checked = TRUE WHERE checked = FALSE; -- mark these cars as checked
COMMIT;
This way, you only scan a very small number of index entries at each polling.
A more advanced approach consists of adding the IDs of newly created cars to a side table through a trigger. This side table is scanned every now and then, without locking the main table and without altering its structure. Simply TRUNCATE this side table after each polling.
Finally, there is the option of triggering a UDF, as suggested by Panagiotis, but this seems to be overkill in most situations.
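The side-table approach described above might look roughly like the following. Treat it as a sketch under stated assumptions: the side table new_car_ids, the trigger name, the INT UNSIGNED key type, and the helper names are all hypothetical, and only car_table.id comes from the answer. One deliberate difference: instead of TRUNCATE, the sketch deletes exactly the IDs it has just read, so a row that arrives between the SELECT and the cleanup is kept for the next poll.

```php
<?php
// One-time setup statements. All names except car_table.id are assumptions.
const SETUP_STATEMENTS = [
    'CREATE TABLE IF NOT EXISTS new_car_ids (car_id INT UNSIGNED NOT NULL PRIMARY KEY)',
    'CREATE TRIGGER car_table_after_insert AFTER INSERT ON car_table
     FOR EACH ROW INSERT INTO new_car_ids (car_id) VALUES (NEW.id)',
];

// Build a "?,?,?" placeholder list for an IN (...) clause.
function placeholders(int $count): string
{
    return implode(',', array_fill(0, $count, '?'));
}

// Return the car IDs queued since the last poll and remove them from the side table.
function pollNewCarIds(PDO $pdo): array
{
    $ids = $pdo->query('SELECT car_id FROM new_car_ids ORDER BY car_id')
               ->fetchAll(PDO::FETCH_COLUMN);
    if ($ids === []) {
        return [];
    }

    // Delete only the rows that were read (this replaces the TRUNCATE step).
    $delete = $pdo->prepare(
        'DELETE FROM new_car_ids WHERE car_id IN (' . placeholders(count($ids)) . ')'
    );
    $delete->execute($ids);

    return array_map('intval', $ids);
}
```

The polling script can then look up the plates for the returned IDs in car_table; the main table itself is never locked or altered.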

Although this is not the greatest of designs and I have not implemented it, there is a way to call an external script from a trigger through the sys_exec() UDF, as mentioned here:
B.5.11: Can triggers call an external application through a UDF?
Yes. For example, a trigger could invoke the sys_exec() UDF.
http://dev.mysql.com/doc/refman/5.1/en/faqs-triggers.html#qandaitem-B-5-1-11
Also have a look at this thread, which is similar to your needs:
Invoking a PHP script from a MySQL trigger

Tricky MySQL Batch Design

I have a scraper which visits many sites and finds upcoming events and another script which is actually supposed to put them in the database. Currently the inserting into the database is my bottleneck and I need a faster way to batch the queries than what I have now.
What makes this tricky is that a single event has data across three tables which have keys to each other. To insert a single event I insert the location or get the already existing id of that location, then insert the actual event text and other data or get the event id if it already exists (some are repeating weekly etc.), and finally insert the date with the location and event ids.
I can't use REPLACE INTO because it will orphan older data with those same keys. I asked about this in Tricky MySQL Batch Query; in short (TL;DR), the outcome was that I have to check which keys already exist, preallocate those that don't exist, and then make a single insert for each of the tables (i.e. do most of the work in PHP). That's great, but the problem is that if more than one batch is processing at a time, they could both choose to preallocate the same keys and then overwrite each other. Is there any way around this, so that I could go back to this solution? The batches have to be able to work in parallel.
What I have right now is that I simply turn off the indexing for the duration of the batch and insert each of the events separately but I need something faster. Any ideas would be helpful on this rather tricky problem. (The tables are InnoDB now... could transactions help solve any of this?)

I'd recommend starting with MySQL's LOCK TABLES, which you can use to prevent other sessions from writing to the tables while you insert your data.
For example, you might do something similar to this:
$db = new mysqli("localhost", "root", "password", "EventsDB");
$db->query("LOCK TABLES events WRITE");
// An insert ID is only available after this session has inserted a row, so read
// the highest existing key instead (assumes the key column is named id).
$row = $db->query("SELECT COALESCE(MAX(id), 0) + 1 AS next_id FROM events")->fetch_assoc();
$firstEntryIndex = (int) $row["next_id"];
/* Do stuff */
...
$db->query("UNLOCK TABLES");
The above does two things. First, it locks the table, preventing other sessions from writing to it until you are finished and the UNLOCK TABLES statement runs. Second, it sets $firstEntryIndex, the first key value that is free to use in the subsequent insert queries.

How MySQL manage multiple queries from multiple users simultaneously?

Just to give you an example:
I have a PHP script that manages users votes.
When a user votes, the script runs a query to check whether someone has already voted for the same ID/product. If nobody has voted, it runs another query to insert the ID into a general ID votes table and another one to insert the data into a per-user ID votes table. This kind of behavior is repeated in other scripts.
The question is: if two different users vote simultaneously, is it possible that the two instances of the code try to insert a new ID (or run some similar type of query) and cause an error?
If yes, how do I prevent this from happening?
Thanks!
Important note: I'm using MyISAM! My web host doesn't allow InnoDB.

The question is, if two different users votes simultaneously its possible that the two instances of the code try to insert a new ID (or some similar type of query) that will give an error??
Yes, you might end up with two queries doing the insert. Depending on the constraints on the table, one of them will either generate an error, or you'll end up with two rows in your database.
You could solve this, I believe, by applying some locking,
e.g. if you need to add a vote to the product with id theProductId (pseudo code):
START TRANSACTION;
//lock on the row for our product id (assumes the product really exists)
select 1 from products where id=theProductId for update;
//assume the vote exist, and increment the no.of votes
update votes set numberOfVotes = numberOfVotes + 1 where productId=theProductId ;
//if the last update didn't affect any rows, the row didn't exist
if(rowsAffected == 0)
insert into votes(numberOfVotes,productId) values(1,theProductId )
//insert the new vote in the per user votes
insert into user_votes(productId,userId) values(theProductId,theUserId);
COMMIT;
Some more info here
MySQL offers another solution that might be applicable here: INSERT ... ON DUPLICATE KEY UPDATE.
For example, you might be able to just do:
insert into votes(numberOfVotes,productId) values(1,theProductId ) on duplicate key update numberOfVotes = numberOfVotes + 1;
If your votes table has a unique key on the product id column, the above will do an insert if the particular theProductId doesn't exist; otherwise it will do an update that increments the numberOfVotes column by 1.
You could probably avoid a lot of this if you created a row in the votes table at the same time you added the product to the database. That way you could be sure there's always a row for your product, and just issue an UPDATE on that row.
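As a rough PHP illustration of the INSERT ... ON DUPLICATE KEY UPDATE variant, the sketch below assumes a PDO connection in exception error mode and a UNIQUE (or PRIMARY) key on votes.productId; the helper name recordVote() is hypothetical, while the table and column names follow the pseudo code above. Because the counter is maintained by a single statement, this part does not rely on transactions, which matters on MyISAM. The two statements together are still not one atomic unit on a non-transactional engine.

```php
<?php
// Single-statement upsert; assumes a UNIQUE (or PRIMARY) key on votes.productId.
const UPSERT_VOTE_SQL =
    'INSERT INTO votes (numberOfVotes, productId) VALUES (1, :productId)
     ON DUPLICATE KEY UPDATE numberOfVotes = numberOfVotes + 1';

// Hypothetical helper: count one vote for a product and log it for the user.
// Assumes PDO::ERRMODE_EXCEPTION, so a failed statement throws instead of returning false.
function recordVote(PDO $pdo, int $productId, int $userId): void
{
    // Creates the counter row with 1 vote, or increments it if it already exists.
    $pdo->prepare(UPSERT_VOTE_SQL)->execute([':productId' => $productId]);

    // Record the individual vote in the per-user table.
    $pdo->prepare('INSERT INTO user_votes (productId, userId) VALUES (:productId, :userId)')
        ->execute([':productId' => $productId, ':userId' => $userId]);
}
```

If user_votes also had a unique key on (productId, userId), which is an assumption beyond the original answer, a second vote by the same user would fail with a duplicate-key error instead of being counted silently.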

The question is, if two different users votes simultaneously its possible that the two instances of the code try to insert a new ID (or some similar type of query) that will give an error??
Yes, in general this is possible. This is an example of a very common problem in concurrent systems, called a race condition.
Avoiding it can be rather tricky, but in general you need to make sure that the operations cannot interleave in the way you describe, e.g. by locking the database for a while.
There are several practical solutions to this, all with their own advantages and risks (e.g. deadlocks). See the Wikipedia article for a discussion and further pointers to information.

The easiest way:
LOCK TABLES table1 WRITE, table2 WRITE, table3 WRITE
-- check for record, insert if not exists, etc...
UNLOCK TABLES
If voting does not occur many times per second, then the above should be sufficient.
InnoDB tables offer transactions, which might be useful here as well. Others have already commented on it, so I won't go into any detail.
Alternatively, you could solve it at the code level by using some sort of shared-memory mutex that prevents concurrent execution of that section of PHP code.

This is where the singleton pattern comes in handy. It ensures that a piece of code is executed by only one process at a time.
http://en.wikipedia.org/wiki/Singleton_pattern
You have to make a singleton class for the database access; this will protect you from the type of error you are describing.
Cheers.

Deferring frequent updates in MySQL

I have frequent updates to a user table that simply sets the last seen time of a user, and I was wondering whether there is a simple way to defer them and group them into a single query after a short timeout (5 minutes or so). This would reduce queries on my user database quite a lot.

If you do an UPDATE LOW_PRIORITY table ... MySQL delays the update until no other clients are reading from the table (this only affects storage engines that use table-level locking, such as MyISAM). Besides that, I don't think there are many options inside MySQL.
Also, is it causing problems now or are you simply optimizing something that isn't a problem? Personally, if I would batch updates like these I would simply insert all the IDs in memcached and use a cronjob to update every 5 minutes.

Wolph's suggestion should do the trick. It is also possible to create a second table without any indices on it and insert all your data into that table. It can even be an in-memory table. Then you can do a periodic INSERT INTO table1 SELECT * FROM TABLE2 ON DUPLICATE KEY UPDATE ... to transfer the data to the main table.

Ids from mysql massive insert from simultaneous sources

I've got an application in PHP and MySQL where users write to and read from a particular table. One of the write modes is a batch, which runs a single query with multiple values. The table has an ID that auto-increments.
The idea is that for each row inserted into the table, a copy is inserted into a separate table as a history log, including the ID that was generated.
The problem is that multiple users can do this at once, and I need to be sure that the ID I record is the correct one.
Can I be sure that if I do for example:
INSERT INTO table1 VALUES ('','test1'),('','test2')
that the ids generated are sequential?
How can I get the IDs that were just inserted, and be sure that those are the ones that were just inserted?
I've thought of LOCK TABLE, but the users shouldn't notice this.
Hope I made myself clear...

Building an application that requires generated IDs to be sequential usually means you're taking a wrong approach - what happens when you have to delete a value some day, are you going to re-sequence the entire table? Much better to just let the values fall as they may, using a primary key to prevent duplication.

Based on the current implementation of MyISAM and InnoDB, yes. However, this is not guaranteed to be so in the future, so I would not rely on it.
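To make the discussion concrete, here is a hedged sketch of how the IDs of a batch could be derived in PHP. For a multi-row INSERT, MySQL's LAST_INSERT_ID() (exposed through PDO::lastInsertId()) reports the ID generated for the first row, and the value is tracked per connection, so other users' inserts do not change it. The sketch then assumes the remaining IDs are consecutive, which is exactly the behavior the answer above warns is not guaranteed. The helper names and the column name name are hypothetical.

```php
<?php
// Pure helper: the IDs of a multi-row insert, ASSUMING they are consecutive.
function idsForBatch(int $firstId, int $rowCount): array
{
    return $rowCount > 0 ? range($firstId, $firstId + $rowCount - 1) : [];
}

// Hypothetical helper: insert all values in one statement and return their IDs.
// The column name "name" is an assumption; the question does not name it.
function insertBatch(PDO $pdo, array $values): array
{
    if ($values === []) {
        return [];
    }

    // One "(?)" group per row; the auto-increment column is simply omitted.
    $rows = implode(',', array_fill(0, count($values), '(?)'));
    $pdo->prepare("INSERT INTO table1 (name) VALUES $rows")
        ->execute(array_values($values));

    // lastInsertId() is per connection and refers to the FIRST row of the batch.
    return idsForBatch((int) $pdo->lastInsertId(), count($values));
}
```

For example, insertBatch($pdo, ['test1', 'test2']) would return two IDs that can be copied into the history table. If relying on consecutive IDs is not acceptable, inserting the rows one at a time and reading lastInsertId() after each insert avoids the assumption at the cost of more queries.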
