MySQL lock concurrent read/update of row - PHP

I have a table, and many (too many) concurrent requests that each select a single row from it. After selecting a row, the script runs an update query to set a flag marking that row as "selected". But because there are so many requests at a time, in the window between one thread selecting a row and updating its flag, another thread has time to select the same row.
The select query gets one row from the table, ordering by some field and using LIMIT 0, 1. I need the DB to simply skip rows that have already been selected.
The engine is InnoDB.

Just before you start a transaction, call the following:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
This will ensure that if you read a row with a flag, it'll still be that way when you update it within the same transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT id_site
INTO @site
FROM table1 WHERE flag = 0 ORDER BY field LIMIT 0, 1;
UPDATE table1 SET flag = 1 WHERE id_site = @site;
COMMIT;
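Note that under REPEATABLE READ a plain SELECT is a non-locking snapshot read, so two concurrent transactions can still pick the same unflagged row before either one updates it. A minimal sketch of a locking variant (same table1/id_site/flag columns as above), using SELECT ... FOR UPDATE so that the second transaction blocks until the first commits:

START TRANSACTION;
-- FOR UPDATE write-locks the examined row; a concurrent transaction
-- running the same statement waits here until we COMMIT
SELECT id_site
INTO @site
FROM table1 WHERE flag = 0 ORDER BY field LIMIT 0, 1
FOR UPDATE;
UPDATE table1 SET flag = 1 WHERE id_site = @site;
COMMIT;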


Set a variable in MySQL with START TRANSACTION

I'm not familiar with databases. This is my test syntax:
START TRANSACTION;
SET @VAR = (SELECT `some ID` FROM `some table` ORDER BY `some ID` DESC LIMIT 1);
SELECT @VAR;
COMMIT;
SELECT @VAR;
I think the result should be: the first SELECT is NULL (because it runs before the COMMIT) and the second SELECT has a value. But in my test, both the first and the second SELECT have a value. Why, and how do I fix my syntax?
You seem confused. First, changes made within a transaction are visible within that same transaction. Second, transactions are about changes to the database, not changes to the session. After all, it is the database that is ACID-compliant (or not), not the variables in a session.
The first SELECT shows the value present during the transaction. Changes within a transaction are visible -- in the transaction. This is true for changes to tables as well: if you insert a row in a table and, in the same transaction, look for the row, you will see it.
What you should not see is the row in another session. It won't be visible elsewhere until the changes are committed.
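To see that user variables are session state rather than transactional data, try this in one session (a minimal sketch):

START TRANSACTION;
SET @VAR = 42;
SELECT @VAR; -- returns 42 immediately; no COMMIT needed
ROLLBACK;
SELECT @VAR; -- still 42: rolling back does not undo SET @VAR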

Update Current Row in MySQL Loop

I have a MySQL table with over 16 million rows and no primary key. Whenever I try to add one, my connection crashes. I have tried adding one as an auto-increment column in phpMyAdmin and in the shell, but the connection is always lost after about 10 minutes.
What I would like to do is loop through the table's rows in PHP so I can limit the number of results, and with each returned row add an auto-incremented ID number. Since each query would touch fewer rows, the load on MySQL would be reduced and I wouldn't lose my connection.
I want to do something like
SELECT * FROM MYTABLE LIMIT 1000001, 2000000;
Then, in the loop, update the current row
UPDATE (current row) SET ID='$i++'
How do I do this?
Note: the original data was given to me as a txt file. I don't know if there are duplicates, but I cannot eliminate any rows. Also, no rows will be added; this table is going to be used only for querying purposes. When I added indexes, however, there were no problems.
I suspect you are trying to use phpMyAdmin to add the index. As handy as it is, it is a PHP script and is limited to the same resources as any PHP script on your server: typically 30-60 seconds of run time and a limited amount of RAM.
I suggest you work out the MySQL statement you need to add the index, then use SSH to shell in and use the command-line MySQL client to run it.
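For reference, the statement itself would be along these lines (a sketch; MYTABLE and the column name ID match the placeholders used in the next answer):

ALTER TABLE MYTABLE
ADD COLUMN ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

Run from the mysql command-line client it is not subject to PHP's time and memory limits, although rebuilding a 16-million-row table will still take a while.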
If you don't have duplicate rows, then the following approach might shed some light.
Suppose you want to set the auto-incremented value for the first 10000 rows.
UPDATE MYTABLE
INNER JOIN
  (SELECT
     *,
     @rn := @rn + 1 AS row_number
   FROM MYTABLE, (SELECT @rn := 0) var
   ORDER BY SOME_OF_YOUR_FIELD
   LIMIT 0, 10000) t
ON t.field1 = MYTABLE.field1 AND t.field2 = MYTABLE.field2 AND ... AND t.fieldN = MYTABLE.fieldN
SET MYTABLE.ID = t.row_number;
For the next 10000 rows you only need to change two things:
(SELECT @rn := 10000) var
LIMIT 10000, 10000
Repeat.
Note: ORDER BY SOME_OF_YOUR_FIELD is important; otherwise the rows would come back in an arbitrary order and the batches could overlap. Since you need to repeat the process, it is better to wrap this in a routine that takes the limit and offset as parameters (a sketch follows the explanation below).
Explanation:
The idea is to build a derived table t holding N rows, assigning a unique row number to each row. Then an inner join is made between your main table MYTABLE and this derived table t, matching on all the fields (since there is no key to join on), and the ID field of the corresponding row in MYTABLE is set to the incremented value (in this case row_number).
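A sketch of such a routine as a stored procedure, under a few assumptions: number_batch is a hypothetical name; MYTABLE, SOME_OF_YOUR_FIELD, and the fieldN join columns are the placeholders from above; and your MySQL is 5.5.6 or later (the first version that allows routine parameters in LIMIT):

DELIMITER //
CREATE PROCEDURE number_batch(IN p_offset INT, IN p_size INT)
BEGIN
  -- Seed the counter at the batch offset, then number one batch of rows.
  UPDATE MYTABLE
  INNER JOIN
    (SELECT *, @rn := @rn + 1 AS row_number
     FROM MYTABLE, (SELECT @rn := p_offset) var
     ORDER BY SOME_OF_YOUR_FIELD
     LIMIT p_offset, p_size) t
  ON t.field1 = MYTABLE.field1 -- ... match every non-ID column, as above
  SET MYTABLE.ID = t.row_number;
END //
DELIMITER ;

-- Usage:
-- CALL number_batch(0, 10000);
-- CALL number_batch(10000, 10000);
-- and so on.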
Another idea:
You could use multithreading in PHP to do this job:
Create N threads.
Assign each thread a non-overlapping region (1 to 10000, 10001 to 20000, etc.) like the above query.
Caution: the query gets slower at higher offsets.

Disadvantages of MySQL Row Locking

I am using row locking (transactions) in MySQL to create a job queue. The engine used is InnoDB.
SQL Query
START TRANSACTION;
SELECT *
FROM mytable
WHERE status IS NULL
ORDER BY timestamp DESC LIMIT 1
FOR UPDATE;
UPDATE mytable SET status = 1 WHERE <primary key> = <value from the SELECT>;
COMMIT;
According to this webpage:
"The problem with SELECT FOR UPDATE is that it usually creates a single synchronization point for all of the worker processes, and you see a lot of processes waiting for the locks to be released with COMMIT."
Question: Does this mean that when the first query is executed and its transaction takes some time to finish, a second, similar query arriving before the first transaction is committed will have to wait for it? If so, I do not understand why locking a single row (which is what I assume happens) should affect the next transaction, which would not need to read that locked row.
Additionally, can this problem be solved (while still achieving the effect row locking gives a job queue) by doing a plain UPDATE instead of the transaction?
UPDATE mytable SET status = 1
WHERE status IS NULL
ORDER BY timestamp DESC
LIMIT 1
If you use FOR UPDATE with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. Using LOCK IN SHARE MODE sets a shared lock that permits other transactions to read the examined rows but not to update or delete them.
And about this query:
UPDATE mytable SET status = 1
WHERE status IS NULL
ORDER BY timestamp DESC
LIMIT 1
Since InnoDB automatically acquires locks during the processing of SQL statements, I think it works the same way.
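If you are on MySQL 8.0 or later, FOR UPDATE SKIP LOCKED was added for exactly this job-queue pattern: rows already locked by another worker are skipped instead of waited on, so the single synchronization point disappears. A sketch against the same mytable:

START TRANSACTION;
SELECT *
FROM mytable
WHERE status IS NULL
ORDER BY timestamp DESC LIMIT 1
FOR UPDATE SKIP LOCKED; -- locked rows are skipped, not waited on
-- mark the claimed row by its key, then:
COMMIT;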

How to synchronize MySQL database requests?

I have a lot of entries in a table that are fetched to perform jobs. This is scaled across several servers.
When a server fetches a bunch of rows to add to its own job queue, they should be "locked" so that no other server fetches them.
When the update is performed, a timestamp is increased and they are "unlocked".
I currently do this by updating a field in the table called "jobserver", which defaults to NULL, with the ID of the job server.
A job server only selects rows where the field is NULL.
When all rows are processed, their timestamp is updated and finally the jobserver field is set to NULL again.
So I need to synchronize this:
$jobs = mysql_query("
SELECT itemId
FROM items
WHERE
jobserver IS NULL
AND
DATE_ADD(updated_at, INTERVAL 1 DAY) < NOW()
LIMIT 100
");
// mysql_fetch_assoc() returns only one row per call, so collect the IDs in a loop
$ids = array();
while ($row = mysql_fetch_assoc($jobs)) {
    $ids[] = $row['itemId'];
}
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (".join(',', $ids).")");
// do the update process in a foreach loop
// update updated_at for each item and set jobserver to NULL
Every server executes the above in an infinite loop. If no rows are returned, everything is up to date (no last update is longer than 24 hours ago) and the server sleeps for 10 minutes.
I currently have MyISAM and I would like to stay with it, because it has had far better performance than InnoDB in my case, but I have heard that InnoDB has ACID transactions.
So I could execute the select and update as one, but how would that look and work?
The problem is that I cannot afford to lock the table or anything like that, because other processes need to read/write and cannot be blocked.
I am also open to a higher-level solution like a shared semaphore, etc. The problem is that the synchronization needs to work across several servers.
Is the approach generally sane? Would you do it differently?
How can I synchronize the job selection to ensure that two servers don't update the same rows?
You can run the UPDATE first, but with the WHERE and LIMIT that you had on the SELECT. Then you SELECT the rows that have the jobserver field set to your server.
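A minimal sketch of that claim-then-read order (same items columns as in the question; 'current_job_server' stands for each server's own ID):

-- Claim up to 100 stale, unclaimed rows in one atomic statement:
UPDATE items
SET jobserver = 'current_job_server'
WHERE jobserver IS NULL
AND DATE_ADD(updated_at, INTERVAL 1 DAY) < NOW()
LIMIT 100;

-- Then read back exactly the rows this server claimed:
SELECT itemId FROM items WHERE jobserver = 'current_job_server';

Because a single UPDATE statement is atomic even on MyISAM (the statement holds a table-level write lock while it runs), two servers can never claim the same row this way.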
If you can't afford to lock the tables, then I would make the update conditional on the row not being modified. Something like:
// mysql_query() returns a result resource, so fetch the value out of it
$result = mysql_query("SELECT DATE_SUB(NOW(), INTERVAL 1 DAY)");
list($timestamp) = mysql_fetch_row($result);
$jobs = mysql_query("
SELECT itemId
FROM items
WHERE
jobserver IS NULL
AND
updated_at < '".$timestamp."'
LIMIT 100
");
$ids = array();
while ($row = mysql_fetch_assoc($jobs)) {
    $ids[] = $row['itemId'];
}
// Update only those which haven't been updated in the meantime
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (".join(',', $ids).") AND updated_at < '".$timestamp."'");
// Now get a list of jobs which were updated
$actual_jobs_to_do = mysql_query("
SELECT itemId
FROM items
WHERE jobserver = 'current_job_server'
");
// Continue processing, with the actual list of jobs
You could even combine the select and update queries, like this:
mysql_query("
UPDATE items
SET jobserver = 'current_job_server'
WHERE jobserver IS NULL
AND updated_at < '".$timestamp."'
LIMIT 100
");

Does MySQL load the whole table into cache every time?

Let's say I have a table with, say, 1 million rows, where the first column is the primary key.
Then, if I run the following:
SELECT * FROM table WHERE id='tomato117' LIMIT 1
Does the WHOLE table get put into the cache (thereby causing the query to slow down as more and more rows are added), or does the number of rows in the table not matter, since the query uses the primary key?
Edit: added LIMIT 1.
If id is defined as the primary key, there is only one record with the value tomato117, so the LIMIT is not useful.
Using SELECT * will make MySQL read from disk, because it is unlikely that all columns are stored in the index (MySQL is not able to fetch them from the index alone). In theory, this will affect performance.
However, your SQL matches the query cache conditions, so MySQL will store the result in the query cache for subsequent use.
If your query cache size is huge, MySQL will keep storing all SQL results in the query cache until memory is full.
This comes with a cost: if there is an update on your table, query cache invalidation becomes harder for MySQL.
http://www.mysqlperformanceblog.com/2007/03/23/beware-large-query_cache-sizes/
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
Nothing of the sort.
It will only fetch the row you selected and perhaps a few other blocks. They will remain in cache until something pushes them out.
By cache, I refer to the InnoDB buffer pool, not the query cache, which should probably be off anyway.
SELECT * FROM table WHERE id = 'tomato117' LIMIT 1
When tomato117 is found, the scan stops; if you don't set LIMIT 1, it will search until the end of the table. tomato117 could be the second row, and MySQL would still scan the remaining rows looking for another tomato117.
http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Showing rows 0 - 0 (1 total, Query took 0.0159 sec)
SELECT *
FROM `forum_posts`
WHERE pid = 643154
LIMIT 0, 30
Showing rows 0 - 0 (1 total, Query took 0.0003 sec)
SELECT *
FROM `forum_posts`
WHERE pid = 643154
LIMIT 1
Table is about 1GB, 600 000+ rows.
If you add the word EXPLAIN before the word SELECT, it will show you a table summarizing how many rows MySQL reads, instead of the normal results.
If your table has an index on the id column (including when it is the primary key), the engine can jump straight to the exact row (or rows, for a non-unique index) and read only the minimal amount of data. If there's no index, it will need to read the whole table.
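For example, with the query from the question (a sketch; the exact EXPLAIN output varies by MySQL version):

EXPLAIN SELECT * FROM `table` WHERE id = 'tomato117' LIMIT 1;
-- With a primary key on id: type = const, rows = 1
--   (the engine jumps straight to the single matching row)
-- With no index on id:      type = ALL, rows ~ total row count
--   (a full table scan)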
