How to prevent selection of the row while it is being handled? - php

I have a MySQL (InnoDB) table with a column is_locked that shows the current state of each record (whether it is being handled by the system right now or not).
I also have many nodes that run SELECT * FROM table_name WHERE is_locked = 0 and then handle the rows they get back.
In my code I do this:
The system takes a row from the DB (SELECT * FROM table_name WHERE is_locked = 0)
The system locks the row with UPDATE table_name SET is_locked = 1 WHERE id = <id>
Problem:
The nodes work very fast, so several of them may get the same row before the first one updates it and sets is_locked to 1.
I read about table LOCKING, but I don't think it is the right way.
Can anybody tell me how to handle such cases?

I recommend two things:
Limit your select to one row. Since you're dealing with concurrency issues, it is better to take smaller "bites" with each iteration.
Use transactions. Start the transaction, get the record with a locking read (SELECT ... FOR UPDATE), set is_locked to 1, and then commit. A plain SELECT does not lock anything, but with the locking read InnoDB makes other transactions that try to lock the same row wait until you commit.
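A related way to close the race is to let the UPDATE itself be the claim: add is_locked = 0 to its WHERE clause and check how many rows it changed. This is a different technique from the locking read recommended above, shown only as a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for MySQL so that it runs on its own; the helper name claim_one_row and the two-row demo table are assumptions made for illustration.

```python
import sqlite3

def claim_one_row(conn):
    """Try to claim one unlocked row; return its id, or None if nothing was claimed."""
    row = conn.execute(
        "SELECT id FROM table_name WHERE is_locked = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing left to handle
    # The extra "AND is_locked = 0" makes the claim atomic: if another node
    # got there first, this statement changes zero rows.
    cur = conn.execute(
        "UPDATE table_name SET is_locked = 1 WHERE id = ? AND is_locked = 0",
        (row[0],),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None

# Demo on an in-memory database (assumed schema, two unlocked rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, is_locked INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO table_name (id) VALUES (?)", [(1,), (2,)])
conn.commit()

first = claim_one_row(conn)
second = claim_one_row(conn)
third = claim_one_row(conn)
print(first, second, third)
```

A node that gets None back even though rows remain simply lost the race and can try again with the next candidate.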

Related

Locking transaction on mysql table with multi threading [duplicate]

I have one table that is read at the same time by different threads.
Each thread must select 100 rows, execute some tasks on each row (unrelated to the database) then they must delete the selected row from the table.
rows are selected using this query:
SELECT id FROM table_name FOR UPDATE;
My question is: How can I ignore (or skip) rows that were previously locked using a select statement in MySQL ?
I typically create a process_id column that is default NULL and then have each thread use a unique identifier to do the following:
UPDATE table_name SET process_id = #{process.id} WHERE process_id IS NULL LIMIT 100;
SELECT id FROM table_name WHERE process_id = #{process.id} FOR UPDATE;
That ensures that each thread selects a unique set of rows from the table.
Hope this helps.
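The process_id approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: it runs against Python's standard-library sqlite3 instead of MySQL, the helper name claim_batch and the uuid-based token are made up for the example, and because stock SQLite builds do not accept UPDATE ... LIMIT, the batch is picked with an IN (SELECT ... LIMIT ?) subquery (on MySQL the UPDATE ... LIMIT 100 form from the answer would be used instead).

```python
import sqlite3
import uuid

def claim_batch(conn, batch_size):
    """Tag up to batch_size unclaimed rows with a fresh token and return their ids."""
    token = uuid.uuid4().hex  # unique identifier for this worker's batch
    conn.execute(
        "UPDATE table_name SET process_id = ? "
        "WHERE id IN (SELECT id FROM table_name WHERE process_id IS NULL "
        "ORDER BY id LIMIT ?)",
        (token, batch_size),
    )
    conn.commit()
    # Only rows carrying our token belong to this worker.
    rows = conn.execute(
        "SELECT id FROM table_name WHERE process_id = ? ORDER BY id", (token,)
    ).fetchall()
    return [r[0] for r in rows]

# Demo: five unclaimed rows, two workers asking for three rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, process_id TEXT)")
conn.executemany("INSERT INTO table_name (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.commit()

batch_a = claim_batch(conn, 3)
batch_b = claim_batch(conn, 3)
print(batch_a, batch_b)
```

The two batches never overlap because the second UPDATE only sees rows whose process_id is still NULL.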
It is not the best solution, but since I know of no way to ignore locked rows, I select a random one and try to obtain a lock.
START TRANSACTION;
SET @v1 = (SELECT myId FROM tests.table WHERE status IS NULL LIMIT 1);
SELECT * FROM tests.table WHERE myId = @v1 FOR UPDATE; #<- lock
I set a small timeout for the transaction: if that row is already locked, the transaction is aborted and I try another one. If I obtain the lock, I process the row. If (bad luck) the row was locked, processed and released by another process before my timeout expired, I end up selecting a row that has already been processed. To catch that, I check a field that my processes set (e.g. status): if the other process's transaction ended OK, that field tells me the work has already been done and I do not process the row again.
Every other possible solution without transactions (e.g. setting another field if the row has no status, etc.) can easily lead to race conditions and missed rows (e.g. if one thread dies abruptly, the data it allocated is still tagged, whereas a transaction expires; ref. comment here).
Hope it helps

How do I make sure I'm not updating the same record from multiple processes? Do I need table locks?

I'm working on a project using a MySQL database as the back-end (accessed from PHP). Sometimes, I select a row, do some operations on it, and then update the record in the database.
I am worried that another user could have initiated a similar process on the same row right after the first select, and his changes could overwrite some of the changes the first user did (because the second user's select did not yet include those changes).
Is this an actual problem? Should I lock the table, and won't this severely impact my application's performance? Any other solutions?
Just to be thorough with my information, I also have some CRON jobs running that could also be modifying the same data.
Thanks!
I can think of two solutions, other than explicitly using transactions:
Use SELECT .. FOR UPDATE : http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
Manually change a value so the row is not selected by other queries:
SET @update_id := 0;
UPDATE table_name SET status = 'IN_PROCESS', id = (SELECT @update_id := id) WHERE status = 'WAITING' AND [your condition] LIMIT 1;
SELECT @update_id;
Here, the rows to be selected must have status = 'WAITING'. When this query runs, it captures the ID and changes the value of status in the same statement, so the row can't be selected by other queries.

MYSQL table locking with PHP

I have a MySQL table fg_stock. This table is accessed concurrently most of the time. I used this code but it doesn't work:
<?php
mysql_query("LOCK TABLES fg_stock READ");
$select = mysql_query("SELECT stock FROM fg_stock WHERE Item='$item'");
while ($res = mysql_fetch_array($select))
{
    $stock = $res['stock'];
    $close_stock = $stock + $qty_in;
    $update = mysql_query("UPDATE fg_stock SET stock='$close_stock' WHERE Item='$item' LIMIT 1");
}
mysql_query("UNLOCK TABLES");
?>
Is this okay?
"Most of the time concurrent access is happening in this table"
So why would you want to lock the ENTIRE table when it's clear you are attempting to access a specific row from the table (WHERE Item='$item')? Chances are you are running the MyISAM storage engine for the table in question. You should look into using the InnoDB engine instead, as one of its strong points is that it supports row-level locking, so you don't need to lock the entire table.
Why do you need to lock your table anyway?
mysql_query("UPDATE fg_stock SET stock=stock+$qty_in WHERE Item='$item'");
That's it! No need to lock the table, and no need for an unnecessary loop with a set of queries. Just make sure to avoid SQL injection, for example by applying PHP's intval function to $qty_in (if it is an integer, of course).
And the concurrent access probably only happens because of non-optimized work with the database, with an excessive number of queries.
P.S. Moreover, your example does not make sense, because MySQL could update the same record on every pass through the loop. You did not tell MySQL exactly which record you want to update, only to update one record with Item='$item'. On the next iteration the SAME record could be updated again, as MySQL does not know the difference between records it has already updated and those it has not touched yet.
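The single-statement increment from this answer, combined with bound parameters instead of string interpolation, might look like the sketch below. It uses Python's standard-library sqlite3 as a stand-in for MySQL so that it runs as is; the helper name add_stock and the sample item 'bolt' are assumptions made for the example.

```python
import sqlite3

def add_stock(conn, item, qty_in):
    """Add qty_in to the stock of item in one statement; return the number of rows changed."""
    # "stock = stock + ?" is evaluated inside the database, so two concurrent
    # callers cannot overwrite each other's result. The placeholders keep
    # user input out of the SQL text.
    cur = conn.execute(
        "UPDATE fg_stock SET stock = stock + ? WHERE Item = ?",
        (int(qty_in), item),
    )
    conn.commit()
    return cur.rowcount

# Demo with an assumed one-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fg_stock (Item TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO fg_stock VALUES ('bolt', 10)")
conn.commit()

changed = add_stock(conn, "bolt", 5)
missing = add_stock(conn, "nut", 1)  # no such item, so nothing changes
stock_after = conn.execute(
    "SELECT stock FROM fg_stock WHERE Item = 'bolt'"
).fetchone()[0]
print(changed, missing, stock_after)
```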
http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html
mysql> LOCK TABLES real_table WRITE, temp_table WRITE;
mysql> INSERT INTO real_table SELECT * FROM temp_table;
mysql> DELETE FROM temp_table;
mysql> UNLOCK TABLES;
So your syntax is correct.
Also from another question:
Troubleshooting: You can test for table lock success by trying to work
with another table that is not locked. If you obtained the lock,
trying to write to a table that was not included in the lock statement
should generate an error.
You may want to consider an alternative solution. Instead of locking,
perform an update that includes the changed elements as part of the
where clause. If the data that you are changing has changed since you
read it, the update will "fail" and return zero rows modified. This
eliminates the table lock, and all the messy horrors that may come
with it, including deadlocks.
PHP, mysqli, and table locks?
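The alternative described in the quote (put the value you read into the WHERE clause and treat zero modified rows as "somebody changed it") could be sketched like this. The code is illustrative only: it runs on Python's standard-library sqlite3 rather than MySQL, and the helper name optimistic_add, the retry limit and the sample data are assumptions.

```python
import sqlite3

def optimistic_add(conn, item, qty_in, max_attempts=5):
    """Read stock, then write it back only if it is still the value that was read."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT stock FROM fg_stock WHERE Item = ?", (item,)
        ).fetchone()
        if row is None:
            return False  # unknown item
        old = row[0]
        # The old value is part of the WHERE clause: if another writer changed
        # the row in the meantime, this statement modifies zero rows.
        cur = conn.execute(
            "UPDATE fg_stock SET stock = ? WHERE Item = ? AND stock = ?",
            (old + qty_in, item, old),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our write went through
    return False  # gave up after repeated conflicts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fg_stock (Item TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO fg_stock VALUES ('bolt', 10)")
conn.commit()

ok = optimistic_add(conn, "bolt", 5)
# A writer still holding the stale value 10 now fails to match the row.
stale = conn.execute(
    "UPDATE fg_stock SET stock = ? WHERE Item = ? AND stock = ?", (99, "bolt", 10)
).rowcount
final = conn.execute("SELECT stock FROM fg_stock WHERE Item = 'bolt'").fetchone()[0]
print(ok, stale, final)
```

One caveat when porting this to MySQL: by default its affected-row count reports rows actually changed, so an update that writes the same value back also reports zero.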

row lock in mysql

I have to do some network IO based on every row in a table with more than 70 million rows. Since high TPS is needed, I have created a PHP script that does this task for a single row in the table. I plan to call this PHP script from a cron job about 40 times every second. How do I do this so that no two scripts access the same row?
To do it purely based on the table, you will need to set something in the table (a boolean, a timestamp, deleting the row, etc.) that indicates that you've processed the row. After that, a transaction is all you need.
START TRANSACTION;
SELECT * FROM table WHERE processing = 0 ORDER BY id ASC LIMIT 1 FOR UPDATE;
UPDATE table SET processing = 1 WHERE id = $id_of_what_we_got;
COMMIT;
-- process row here
-- optionally, tell the db we're done
UPDATE table SET processing = 2 WHERE id = $id_of_what_we_got;
Just make sure to use the same MySQL connection (PHP resource) for the entire transaction.
Further reading:
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
https://github.com/ryandotsmith/Queue-Classic/blob/master/lib/queue_classic/durable_array.rb

What's the fastest way to poll a MySQL table for new rows?

My application needs to poll a MySQL database for new rows. Every time new rows are added, they should be retrieved. I was thinking of creating a trigger to place references to new rows on a separate table. The original table has over 300,000 rows.
The application is built in PHP.
Some good answers, I think the question deserves a bounty.
For external applications I find that using a timestamp column is a more robust method, one that is independent of auto id and other primary key issues.
Add columns to the tables such as:
insertedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP
or to track inserts and updates
updatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
In the external application, all you need to do is track the timestamp of your last poll. Then select from that timestamp forward on all the relevant tables. In large tables you may need to index the timestamp column.
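A minimal sketch of that polling loop, assuming a hypothetical table named events with the insertedOn column from above. It runs on Python's standard-library sqlite3 rather than MySQL, with timestamps written explicitly so the example is deterministic; the helper name poll_since is also made up for illustration.

```python
import sqlite3

def poll_since(conn, last_polled):
    """Return rows inserted after last_polled, plus the new timestamp to remember."""
    rows = conn.execute(
        "SELECT id, insertedOn FROM events WHERE insertedOn > ? "
        "ORDER BY insertedOn, id",
        (last_polled,),
    ).fetchall()
    # Advance the marker to the newest row seen; keep the old one if nothing is new.
    new_mark = rows[-1][1] if rows else last_polled
    return rows, new_mark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, insertedOn TEXT NOT NULL)")
# Index the timestamp column so each poll does not scan the whole table.
conn.execute("CREATE INDEX idx_events_inserted_on ON events (insertedOn)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"),
        (2, "2024-01-01 10:00:05"),
        (3, "2024-01-01 10:00:09"),
    ],
)
conn.commit()

first_rows, mark = poll_since(conn, "2024-01-01 10:00:00")
second_rows, mark_again = poll_since(conn, mark)
print(first_rows, mark, second_rows)
```

With a strict > comparison, a row that arrives later but carries exactly the same timestamp as the saved marker would be skipped, so a real poller may prefer >= together with a check against ids it has already seen.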
You can use the following statement to find out if a new record was inserted in the table:
select max(id) from table_name
Replace the primary key name and the table name in the above statement with your own. Keep the max(id) value in a temporary variable, and retrieve all new records between this and the last saved max(id) value. After fetching the new records, set the saved max(id) value to the one you got from the query.
Create a PHP daemon to monitor the MySQL table file size. If the size changes, query for new records; if new records are found, run the next process.
I think there is an active PEAR daemon you can easily configure to monitor the MySQL table file size and kick off your script.
Assuming you have an identity column or some other value that always grows, you should keep track in your PHP application of the last id retrieved.
That would work for most scenarios. Unless you are in the real-time camp, I don't think you'd need any more than that.
I would do something like this. Of course, this assumes that ID is an incrementing numerical ID.
And how you store your "current location" in the database is up to you.
<?php
// File that remembers the highest ID processed so far
$idFile = 'lastID.dat';
if (is_file($idFile)) {
    $lastSelectedId = (int)file_get_contents($idFile);
} else {
    $lastSelectedId = 0;
}
// Fetch only the rows that are newer than the last one we saw
$res = mysql_query("select * from table_name where id > {$lastSelectedId}");
while ($row = mysql_fetch_assoc($res)) {
    // Do something with the new rows
    if ($row['id'] > $lastSelectedId) {
        $lastSelectedId = $row['id'];
    }
}
// Save the new position for the next run
file_put_contents($idFile, $lastSelectedId);
?>
I would concur with TFD's answer about keeping track of a timestamp in a separate file/table and then fetching all rows newer than that. That's how I do it for a similar application.
Having your application query a single-row table (or file) to see whether a timestamp has changed from the locally stored one should not be much of a performance hit. Then, fetching new rows from the 300k-row table based on the timestamp should again be fine, assuming the timestamp column is properly indexed.
However, reading your question, I was curious whether MySQL triggers can make system calls, say to a PHP script that would do some heavy lifting. It turns out they can, by using the sys_exec() User-Defined Function. You could use this to do all sorts of processing by passing the inserted row data into it, essentially getting an instant notification of inserts.
Finally, a word of caution about using triggers to call external applications.
One option might be to use an INSERT INTO SELECT statement. Taking from the suggestions using timestamps to pull the latest rows, you could do something like...
INSERT INTO t2 (
SELECT *
FROM t1
WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
This would take all of the rows inserted in the previous hour and insert them in to table 2. You could have a script run this query and have it run every hour (or whatever interval you need).
This would drastically simplify your PHP script for pulling rows as you wouldn't need to iterate over any rows. It also gets rid of having to keep track of the last insert id.
The solution Fanis proposed also sounds like it could be interesting.
As a note, the select query in the above insert can be adjusted to insert only certain fields. If you only need certain fields, you would need to specify them in the insert like so...
INSERT INTO t2 (field1, field2) (
SELECT field1, field2
FROM t1
WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
