I have to do some network IO for every row in a table with more than 70 million rows. Since high TPS is needed, I have created a PHP script that does this task for a single row of the table. I plan to call this script from a cron job about 40 times every second. How do I do this so that no two scripts access the same row?
To do it purely based on the table, you will need to set something in the table - a boolean, a timestamp, deleting the row, etc. - that indicates that you've processed the row. After that, a transaction is all you need.
START TRANSACTION;
-- lock the first unprocessed row so no concurrent transaction can claim it
SELECT * FROM `table` WHERE processing = 0 ORDER BY id ASC LIMIT 1 FOR UPDATE;
-- mark the row as claimed while we still hold the lock
UPDATE `table` SET processing = 1 WHERE id = $id_of_what_we_got;
COMMIT;
-- process row here
-- optionally, tell the db we're done
UPDATE `table` SET processing = 2 WHERE id = $id_of_what_we_got;
Just make sure to use the same MySQL connection (PHP resource) for the entire transaction.
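A minimal PHP sketch of that claim-and-process loop, using mysqli (the connection details are placeholders; the `table` and processing names come from the SQL above):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

$db->begin_transaction();
$res = $db->query("SELECT * FROM `table` WHERE processing = 0 ORDER BY id ASC LIMIT 1 FOR UPDATE");
$row = $res->fetch_assoc();
if ($row === null) {
    $db->rollback(); // nothing left to claim
    exit;
}
$db->query("UPDATE `table` SET processing = 1 WHERE id = " . (int)$row['id']);
$db->commit(); // the row is now ours and the lock is released

// ... do the network IO for $row here ...

// optionally, tell the db we're done
$db->query("UPDATE `table` SET processing = 2 WHERE id = " . (int)$row['id']);

Because the SELECT ... FOR UPDATE and the claiming UPDATE run on the same connection inside one transaction, two scripts started at the same instant can never walk away with the same row.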
Further reading:
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
https://github.com/ryandotsmith/Queue-Classic/blob/master/lib/queue_classic/durable_array.rb
I have a MySQL (InnoDB) table with a column is_locked which shows the current state of a record (whether or not it is being handled by the system right now).
On the other hand, I have many nodes that perform SELECT * FROM table_name WHERE is_locked = 0 and then handle the rows they get from this table.
In my code I do this:
The system takes a row from the DB (SELECT * FROM table_name WHERE is_locked = 0)
The system locks the row with the command UPDATE table_name SET is_locked = 1 WHERE id = <id>
Problem:
The nodes work very fast, so several of them may get the same row before the first one updates it and sets is_locked to 1.
I found out about LOCK TABLES, but I don't think it is the right way.
Can anybody tell me, how to handle such cases?
I recommend two things:
Limit your select to one row: since you're dealing with concurrency issues, it is better to take smaller "bites" with each iteration.
Use transactions: start the transaction, get the record, lock it, and then commit. This forces MySQL to enforce your concurrency locks; see the sketch below.
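Concretely, with the is_locked column from your schema, each node would claim a row like this (a minimal sketch in PHP/mysqli; the connection details are placeholders):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

$db->begin_transaction();
$res = $db->query("SELECT * FROM table_name WHERE is_locked = 0 LIMIT 1 FOR UPDATE");
if ($row = $res->fetch_assoc()) {
    $db->query("UPDATE table_name SET is_locked = 1 WHERE id = " . (int)$row['id']);
}
$db->commit();

FOR UPDATE makes a second node block on the row the first one is claiming; once the first commits, the second node's locking read sees is_locked = 1 and moves on to the next free row, so no two nodes ever claim the same record.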
I have one table that is read at the same time by different threads.
Each thread must select 100 rows, execute some tasks on each row (unrelated to the database), and then delete the selected rows from the table.
Rows are selected using this query:
SELECT id FROM table_name FOR UPDATE;
My question is: how can I ignore (or skip) rows that were previously locked using a SELECT statement in MySQL?
I typically create a process_id column that is default NULL and then have each thread use a unique identifier to do the following:
UPDATE table_name SET process_id = #{process.id} WHERE process_id IS NULL LIMIT 100;
SELECT id FROM table_name WHERE process_id = #{process.id} FOR UPDATE;
That ensures that each thread selects a unique set of rows from the table.
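In PHP that might look like the following (a sketch; the table and column names are from above, and the worker id here is just the process PID, which assumes all workers run on a single host):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials
$pid = getmypid(); // unique per worker on one machine

// Claim up to 100 unclaimed rows; a single UPDATE statement is atomic,
// so two workers can never mark the same row.
$stmt = $db->prepare("UPDATE table_name SET process_id = ? WHERE process_id IS NULL LIMIT 100");
$stmt->bind_param('i', $pid);
$stmt->execute();

// Read back exactly the rows this worker claimed.
$stmt = $db->prepare("SELECT id FROM table_name WHERE process_id = ?");
$stmt->bind_param('i', $pid);
$stmt->execute();
$ids = $stmt->get_result()->fetch_all(MYSQLI_ASSOC); // get_result() needs mysqlnd

(As an aside, MySQL 8.0 later added SELECT ... FOR UPDATE SKIP LOCKED, which skips rows locked by other transactions and answers the original question directly.)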
Hope this helps.
Even though it is not the best solution, as there is no way that I know of to ignore locked rows, I select a random one and try to obtain a lock.
START TRANSACTION;
SET @v1 = (SELECT myId FROM tests.`table` WHERE status IS NULL LIMIT 1);
SELECT * FROM tests.`table` WHERE myId = @v1 FOR UPDATE; # <- lock
I set a small timeout for the transaction; if the row is locked, the transaction aborts and I try another one. If I obtain the lock, I process the row. If (bad luck) the row was locked but the other process finished and released the lock before my timeout, I end up selecting a row that has already been processed. To handle that, I check a field my processes set (e.g. status): if the other process's transaction ended OK, that field tells me the work has already been done, and I do not process that row again.
Every other possible solution without transactions (e.g. setting another field if the row has no status, etc.) can easily lead to race conditions and missed rows (e.g. one thread dies abruptly and the data allocated to it stays tagged forever, whereas a transaction's locks expire); ref. the comment here.
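A sketch of that retry loop in PHP/mysqli (the one-second lock wait timeout is an arbitrary choice; error 1205 is MySQL's "lock wait timeout exceeded"):

<?php
mysqli_report(MYSQLI_REPORT_OFF); // make failed queries return false instead of throwing
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials
$db->query("SET SESSION innodb_lock_wait_timeout = 1"); // give up on locked rows quickly

while (true) {
    $db->begin_transaction();
    $res = $db->query("SELECT myId FROM tests.`table` WHERE status IS NULL LIMIT 1");
    $row = $res->fetch_assoc();
    if ($row === null) { $db->rollback(); break; } // nothing left to do

    $locked = $db->query("SELECT * FROM tests.`table` WHERE myId = " . (int)$row['myId'] . " FOR UPDATE");
    if ($locked === false && $db->errno == 1205) {
        $db->rollback(); // another worker holds the lock; try a different row
        continue;
    }
    $r = $locked->fetch_assoc();
    if ($r['status'] !== null) { $db->rollback(); continue; } // already processed meanwhile

    // ... process the row, set its status ...
    $db->commit();
    break;
}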
Hope it helps
I have 10 separate PHP cron jobs running that each select 100 records at a time from the same table using
SELECT `username` FROM `data` WHERE `id` <> '' LIMIT 0,100
How do I ensure that each of these recordsets is unique? Is there a way of ensuring that each cron job does not select the same 100 records?
username is unique, if that helps.
Thanks
Jonathan
You can either choose a different 100 records each time:
LIMIT 100,100, LIMIT 200,100 ...
Or choose 100 randomly:
... FROM `data` WHERE `id` <> '' ORDER BY RAND() LIMIT 0,100
If you want to ensure that a record is never chosen twice, you'll have to mark it ("make it dirty") so the other cron jobs can query only the records that have not been chosen yet: just add another boolean column called chosen, and set it to true once a record has been picked. You'll also have to run the cron jobs one by one, or use a lock or mutex mechanism, so they don't run in parallel and race each other.
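For the first option, each cron entry could pass its own starting offset (a sketch; the argument handling and connection are assumptions):

<?php
// crontab runs: php job.php 0, php job.php 100, ..., php job.php 900
$offset = isset($argv[1]) ? (int)$argv[1] : 0;

$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials
$res = $db->query("SELECT `username` FROM `data` WHERE `id` <> '' ORDER BY `username` LIMIT $offset, 100");
while ($row = $res->fetch_assoc()) {
    // ... process $row['username'] ...
}

Note the explicit ORDER BY: LIMIT slices are only guaranteed to be disjoint if every job reads the rows in the same deterministic order.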
What you could do is 'mark' the records each job is going to use - the trick would be ensuring there's no race condition in marking them. Here's one way to do that.
create table job
(
job_id int not null auto_increment,
#add any other fields for a job you might want
primary key(job_id)
);
# add a job_id column to data
alter table data add column job_id int not null default 0, add index(job_id);
Now, when you want to get 100 data rows to work on, get a unique job_id by inserting a row into job and reading back the automatically generated id. Here's how you might do this in the mysql command-line client; it's easy to see how to adapt it to code:
insert into job (job_id) values(0);
set @myjob=last_insert_id();
Then, mark a hundred rows which are currently 0:
update data set job_id=@myjob where job_id=0 limit 100;
Now, you can take your time and process all rows where job_id=@myjob, safe in the knowledge no other process will touch them.
No doubt you'll need to tailor this to suit your problem, but this illustrates how you can use simple features of MySQL to avoid a race condition among parallel processes competing for access to the same records.
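The same steps from PHP might look like this (a sketch; the connection details are placeholders):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

$db->query("INSERT INTO job (job_id) VALUES (0)"); // 0 lets AUTO_INCREMENT assign a new id
$myjob = $db->insert_id; // last_insert_id() is per-connection, so this is race-free

// Claim up to 100 unmarked rows in one atomic statement.
$db->query("UPDATE data SET job_id = $myjob WHERE job_id = 0 LIMIT 100");

// Work through only the rows this job marked.
$res = $db->query("SELECT * FROM data WHERE job_id = $myjob");
while ($row = $res->fetch_assoc()) {
    // ... process $row ...
}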
I am using the following query on MySQL from PHP:
$sql = "SELECT MAX(SrNo) FROM cart";
$result = mysql_query($sql);
The structure of table CART is
CART (SrNo int(10));
Now I am using the result to do some processing and then inserting the maximum value plus one back into this table. My problem: suppose user1 has fetched the maximum value of SrNo and is in the middle of the processing. During this time user2 also sends a request to the server, gets the same maximum value of SrNo as user1, and starts processing.
Now when both are done with the processing and insertion, I will have two duplicate values in the table CART. How can I prevent this from happening?
In other words, I want no one else to be able to get the maximum value of SrNo until the user that fetched it has finished its processing.
Not a trivial thing with a web application that creates a new connection on each request.
You'd need to add lockedBy and lockedTime columns to this table, and put into them the ID of the user that requested the lock as well as a timestamp of when the lock was requested. You need the timestamp so that you can ignore locks older than a certain amount of time.
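A sketch of that claim with a five-minute expiry (the column names are from above; the cutoff is an arbitrary choice, and $userId and $srNo are assumed to be set by the caller):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

// Try to take the lock on a given row; a stale lock (older than 5 minutes)
// is treated as free so a crashed client cannot block the row forever.
$stmt = $db->prepare(
    "UPDATE cart SET lockedBy = ?, lockedTime = NOW()
      WHERE SrNo = ?
        AND (lockedBy IS NULL OR lockedTime < NOW() - INTERVAL 5 MINUTE)"
);
$stmt->bind_param('ii', $userId, $srNo);
$stmt->execute();

if ($stmt->affected_rows == 1) {
    // lock acquired: safe to process this row
} else {
    // someone else holds a fresh lock: retry later or report busy
}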
Wouldn't you be fine with the AUTO_INCREMENT feature on the PRIMARY KEY?
create table cart ( SrNo int(10) AUTO_INCREMENT PRIMARY KEY ) ENGINE = InnoDB;
Then just insert new rows and MySQL will increment the values automatically. That would probably be a very easy way to do what you are (maybe?) trying to do.
But if you need to lock the maximum, you can do this:
start transaction;
select max(SrNo) from cart for update;
/* do some other stuff, insert the max value + 1 etc... */
commit;
Remember: you should use a transaction for any operation that consists of more than one single query!
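From PHP, the important part is keeping every statement on the same connection (a minimal sketch; connection details are placeholders):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

$db->begin_transaction();
$res = $db->query("SELECT MAX(SrNo) AS m FROM cart FOR UPDATE"); // blocks other sessions' locking reads
$max = (int)$res->fetch_assoc()['m'];

// ... do some other stuff with $max ...

$db->query("INSERT INTO cart (SrNo) VALUES (" . ($max + 1) . ")");
$db->commit(); // releases the lock; the next user now sees the new maximum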
If you set SrNo as the primary key on the table, then the second attempt to add the same row is going to fail, and when it fails, you can request a new number.
ALTER TABLE cart ADD PRIMARY KEY (SrNo);
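With SrNo as the primary key, a duplicate insert fails with MySQL error 1062 (duplicate entry), and the script can simply fetch a fresh maximum and retry (a sketch):

<?php
mysqli_report(MYSQLI_REPORT_OFF); // make failed queries return false instead of throwing
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

do {
    $res = $db->query("SELECT MAX(SrNo) AS m FROM cart");
    $max = (int)$res->fetch_assoc()['m'];
    // ... processing based on $max ...
    $ok = $db->query("INSERT INTO cart (SrNo) VALUES (" . ($max + 1) . ")");
} while ($ok === false && $db->errno == 1062); // 1062 = duplicate key: lost the race, retry

The trade-off is that any expensive processing between the SELECT and the INSERT is redone on every collision.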
I have multiple workers SELECTing and UPDATing rows.
id status
10 new
11 new
12 old
13 old
Worker selects a 'new' row and updates its status to 'old'.
What if two workers select same row at the same time?
I mean, worker1 selects a new row, and before worker1 updates its status, worker2 selects the same row?
Should I SELECT and UPDATE in one query or is there another way?
You can use LOCK TABLES, but sometimes I prefer the following solution (roughly; error handling omitted):
// get 1 new row
$result = mysql_query("SELECT * FROM `table` WHERE status = 'new' LIMIT 0, 1");
$row = mysql_fetch_assoc($result);

// update it to old while making sure no one else has done that
mysql_query("UPDATE `table` SET status = 'old' WHERE status = 'new' AND id = " . (int)$row['id']);

// check
if (mysql_affected_rows() == 1) {
    // status was changed - this worker owns the row
} else {
    // failed - someone else got there first
}
You could LOCK the table before your read, and unlock it after your write. This would eliminate the chance of two workers updating the same record at the same time.
http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
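For example (a sketch; note that while a session holds a WRITE lock, other sessions can neither read nor write the table, so keep the critical section short):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // hypothetical credentials

$db->query("LOCK TABLES table_name WRITE");

$res = $db->query("SELECT * FROM table_name WHERE status = 'new' LIMIT 1");
if ($row = $res->fetch_assoc()) {
    $db->query("UPDATE table_name SET status = 'old' WHERE id = " . (int)$row['id']);
}

$db->query("UNLOCK TABLES"); // always release, or every other worker stays blocked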
Depending on your database storage engine (InnoDB, MyISAM, etc.), you may be able to lock the table while a process is modifying it. That would then prevent simultaneous actions on the same table.
Could you put conditions in your PHP logic to imply a lock? For example, set a status attribute on a row that would prevent the second user from performing an update. This would probably require querying the database before an update to make sure the row is not locked.