How to synchronize MySQL database requests? - PHP

I have a lot of entries in a table that are fetched for performing jobs. This is scaled across several servers.
When a server fetches a batch of rows to add to its own job queue, they should be "locked" so that no other server fetches them.
When the update is performed, a timestamp is increased and they are "unlocked".
I currently do this by updating a field called "jobserver" in the table, which defaults to NULL, with the ID of the job server.
A job server only selects rows where the field is NULL.
When all rows are processed, their timestamp is updated and the jobserver field is finally set back to NULL.
So I need to synchronize this:
$jobs = mysql_query("
SELECT itemId
FROM items
WHERE
jobserver IS NULL
AND
DATE_ADD(updated_at, INTERVAL 1 DAY) < NOW()
LIMIT 100
");
// mysql_fetch_assoc() returns a single row, so collect the ids in a loop first
$ids = array();
while ($row = mysql_fetch_assoc($jobs)) {
    $ids[] = $row['itemId'];
}
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (" . join(',', $ids) . ")");
// do the update process in a foreach loop
// update updated_at for each item and set jobserver back to NULL
Every server executes the above in an infinite loop. If no rows are returned, everything is up to date (no last update is longer ago than 24 hours) and the server sleeps for 10 minutes.
I currently use MyISAM and would like to stay with it because it has far better performance than InnoDB in my case, but I have heard that InnoDB offers ACID transactions.
So I could execute the select and update as one, but how would that look and work?
The problem is that I cannot afford to lock the table, because other processes need to read/write and cannot wait.
I am also open to a higher-level solution like a shared semaphore; the synchronization just needs to work across several servers.
Is the approach generally sane? Would you do it differently?
How can I synchronize the job selection to ensure that two servers don't claim the same rows?

You can run the UPDATE first, with the WHERE and LIMIT that you had on the SELECT. You then SELECT the rows whose jobserver field is set to your server's ID.
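A runnable sketch of this claim-then-select pattern, using Python's built-in sqlite3 in place of MySQL so it can be tried without a server (the items/jobserver schema follows the question; the server ids and batch size are made up for the demo). Note that SQLite lacks UPDATE ... LIMIT by default, so the limit goes through a subquery here; in MySQL the plain UPDATE ... WHERE jobserver IS NULL LIMIT 100 works directly.

```python
import sqlite3

def claim_jobs(conn, server_id, limit=100):
    # Atomically tag up to `limit` unclaimed rows with our server id.
    conn.execute(
        """UPDATE items SET jobserver = ?
           WHERE itemId IN (SELECT itemId FROM items
                            WHERE jobserver IS NULL LIMIT ?)""",
        (server_id, limit))
    conn.commit()
    # Read back exactly the rows we claimed: no other server can see them now.
    return [r[0] for r in conn.execute(
        "SELECT itemId FROM items WHERE jobserver = ?", (server_id,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemId INTEGER PRIMARY KEY, jobserver TEXT)")
for _ in range(5):
    conn.execute("INSERT INTO items (jobserver) VALUES (NULL)")

a = claim_jobs(conn, "srv-A", 3)   # first server claims up to 3 jobs
b = claim_jobs(conn, "srv-B", 3)   # second server gets only what is left
print(sorted(a), sorted(b))        # the two claim sets never overlap
```

Because the UPDATE both tests `jobserver IS NULL` and sets the field in one statement, two servers can never end up owning the same row, which is the whole point of running the UPDATE before the SELECT.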

If you can't afford to lock the tables, then I would make the update conditional on the row not being modified. Something like:
// mysql_query() returns a result resource, so fetch the timestamp as a string
$timestamp = mysql_result(mysql_query("SELECT DATE_SUB(NOW(), INTERVAL 1 DAY)"), 0);
$jobs = mysql_query("
SELECT itemId
FROM items
WHERE
jobserver IS NULL
AND
updated_at < '".$timestamp."'
LIMIT 100
");
$ids = array();
while ($row = mysql_fetch_assoc($jobs)) {
    $ids[] = $row['itemId'];
}
// Update only those which haven't been claimed or updated in the meantime
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (".join(',', $ids).") AND jobserver IS NULL AND updated_at < '".$timestamp."'");
// Now get a list of jobs which were updated
$actual_jobs_to_do = mysql_query("
SELECT itemId
FROM items
WHERE jobserver = 'current_job_server'
");
// Continue processing, with the actual list of jobs
You could even combine the select and update queries, like this:
mysql_query("
UPDATE items
SET jobserver = 'current_job_server'
WHERE jobserver IS NULL
AND updated_at < '".$timestamp."'
LIMIT 100
");

Related

display huge data in batches of 100 every hour in mysql/php

I have a database with more than 600 rows, but I can only retrieve/display 100 every hour. So I use
select * from table ORDER BY id DESC LIMIT 100
to retrieve the first 100. How do I write a script that retrieves the data in batches of 100 every hour so that I can use it in a cron job?
Possible solution:
1. Add a field to mark that a record was already shown:
ALTER TABLE tablename
ADD COLUMN shown TINYINT NULL DEFAULT NULL;
NULL means the record has not been selected yet, 1 means it is marked for selection, and 0 means it was already selected.
2. When you need to select up to 100 records:
2.1. Mark the records to be shown:
UPDATE tablename
SET shown = 1
WHERE shown = 1
OR shown IS NULL
ORDER BY shown = 1 DESC, id ASC
LIMIT 100;
The shown = 1 condition in the WHERE clause accounts for records that were marked earlier but never selected due to some error; ORDER BY shown = 1 DESC re-marks such records before unmarked ones.
If there are 100 or fewer unselected records, all of them will be marked; otherwise only the 100 records with the lowest id (the oldest) will be marked.
2.2. Select marked records.
SELECT *
FROM tablename
WHERE shown = 1
ORDER BY id
LIMIT 100;
2.3. Mark selected records.
UPDATE tablename
SET shown = 0
WHERE shown = 1
ORDER BY id
LIMIT 100;
This is applicable when only one client selects the records.
If many clients work in parallel, and each record must be selected by only one client, then mark a record for selection with a client number (unique across all clients) instead of 1.
Of course, if there is only one client and you can guarantee that selection will not fail, you may simply store the last shown ID somewhere (on the client side, or in a service table on the MySQL side) and select the "next 100" starting from that stored ID:
SELECT *
FROM tablename
WHERE id > #stored_id
ORDER BY id
LIMIT 100;
and
SELECT MAX(id) FROM (
SELECT id
FROM tablename
WHERE id > #stored_id
ORDER BY id
LIMIT 100
) AS batch;
to get the value to store as the new #stored_id.
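The mark/select/unmark cycle above can be tried out in miniature with Python's built-in sqlite3 standing in for MySQL (table name and the shown column follow the answer; the batch size is shrunk from 100 to 3 so the demo stays small). As in the earlier sketch, SQLite needs the LIMIT routed through a subquery because it has no UPDATE ... ORDER BY ... LIMIT.

```python
import sqlite3

BATCH = 3

def fetch_batch(conn):
    # 2.1 Mark: re-mark error leftovers (shown = 1) first, then oldest unmarked rows.
    conn.execute(
        """UPDATE tablename SET shown = 1
           WHERE id IN (SELECT id FROM tablename
                        WHERE shown = 1 OR shown IS NULL
                        ORDER BY shown = 1 DESC, id ASC LIMIT ?)""", (BATCH,))
    # 2.2 Select the marked rows.
    rows = [r[0] for r in conn.execute(
        "SELECT id FROM tablename WHERE shown = 1 ORDER BY id LIMIT ?", (BATCH,))]
    # 2.3 Mark them as already selected (we selected every marked row, so no LIMIT needed).
    conn.execute("UPDATE tablename SET shown = 0 WHERE shown = 1")
    conn.commit()
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, shown TINYINT)")
for _ in range(7):
    conn.execute("INSERT INTO tablename (shown) VALUES (NULL)")

batches = [fetch_batch(conn) for _ in range(3)]
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Each call hands out the next slice of unseen rows exactly once, which is the behavior the answer describes for the hourly cron job.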
Thank you #Akina and #Vivek_23 for your contributions. I was able to figure out an easier way to go about it.
Add a new field to the table, e.g. shownstatus.
Create a cron job to display 100 (LIMIT 100) records whose shownstatus is not marked as shown, then update each record's shownstatus to shown. NB: if the cron job runs every hour for the whole day, all records get displayed and their shownstatus updated to shown by close of day.
Create a second cron job to reset every record's shownstatus to notshown.
The downside is that you can only display a total of 2,400 records a day, i.e. 100 records every hour times 24 hours. So if the table grows to about 10,000 records, the cron job will need at least 5 days to display all of them.
Still open to a better approach if there is one, but until then I will stick with this.
Let's say you made a cron that hits a URL something like
http://yourdomain.com/fetch-rows
or a script for instance, like
your_project_folder/fetch-rows.php
Let's say you have a DB table in place that looks something like this:
| id | offset | created_at |
|----|--------|---------------------|
| 1 | 100 | 2019-01-08 03:15:00 |
| 2 | 200 | 2019-01-08 04:15:00 |
Your script:
<?php
define('FETCH_LIMIT', 100);
$conn = mysqli_connect(....); // connect to DB
// select the last record to get the latest offset
$result = mysqli_query($conn, "select * from cron_hit_table where id = (select max(id) from cron_hit_table)");
$offset = 0; // initial default offset
if (mysqli_num_rows($result) > 0) {
    $offset = intval(mysqli_fetch_assoc($result)['offset']);
}
// Now, hit your query with $offset included
$result = mysqli_query($conn, "select * from table ORDER BY id DESC LIMIT $offset," . FETCH_LIMIT);
while ($row = mysqli_fetch_assoc($result)) {
    // your data processing
}
// insert new row to store next offset for next cron hit
$offset += FETCH_LIMIT; // increment current offset
// ID is auto increment and created_at defaults to current_timestamp
mysqli_query($conn, "insert into cron_hit_table(offset) values($offset)");
mysqli_close($conn);
Whenever the cron hits, you fetch the last row from your hit table to get the offset, run the query with that offset, and store the next offset for the next hit in your table.
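The offset bookkeeping in that script can be sketched on its own, with Python's built-in sqlite3 standing in for mysqli (cron_hit_table and FETCH_LIMIT come from the answer; "offset" is quoted because it is a keyword in the LIMIT clause). Three loop iterations play the role of three cron hits.

```python
import sqlite3

FETCH_LIMIT = 100

def next_offset(conn):
    # Read the offset stored by the previous cron hit (0 on the very first run).
    row = conn.execute(
        'SELECT "offset" FROM cron_hit_table ORDER BY id DESC LIMIT 1').fetchone()
    return row[0] if row else 0

def record_hit(conn, offset):
    # Store the offset the *next* run should start from.
    conn.execute('INSERT INTO cron_hit_table ("offset") VALUES (?)',
                 (offset + FETCH_LIMIT,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE cron_hit_table (id INTEGER PRIMARY KEY, "offset" INTEGER)')

offsets = []
for _ in range(3):          # three simulated cron hits
    off = next_offset(conn)
    offsets.append(off)     # here the real script would run LIMIT off,100
    record_hit(conn, off)
print(offsets)  # [0, 100, 200]
```

Each run resumes exactly where the previous one stopped, which is what makes the hourly batches non-overlapping.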
Update:
As pointed out by #Dharman in the comments, you can use PDO for a more abstracted way of dealing with different types of databases (but make sure you have the appropriate driver for it; see the list of drivers PDO supports), along with better checking of query syntax.

Mysql Return ID where last activity is older than X Days

I have a log table where every user activity is stored.
UserActivityTable (around 15 million records):
id  userID  category  value     timestamp
1   2       Visit     homepage  2018-02-21 13:13:54
1   2       Visit     page2     2018-02-18 13:13:45
1   2       Visit     page1     2018-02-15 13:13:30
1   3       Visit     homepage  2018-02-01 13:13:12
With an SQL query I need to get all userIDs where the last activity is older than X days (let's say 30), if the user is set to "Active".
Users (around 15k users):
id  Groups    Active  Name   Mails ...
2   Customer  1       Hans
3   Customer  0       Wurst
If I get all users that are active (around 5k) and then try to get their last activity, I run into a timeout (the query is not performant, I think).
If I limit it to 5, there is no problem.
What I tried:
Select all users that are active, then use a foreach loop to get each one's last activity; if it is older than 30 days, write the id into a new array, and at the end use that array to set the active flag in the user table to false.
Until the last 2-3 months this was fine, but now we have a lot of new users and the function can't handle it.
Is there a clean way to get all that in one SQL query?
You can use the following query to get the Users:
SELECT `userID`, MAX(`timestamp`) AS lastActive FROM `UserActivityTable`
WHERE `userID` IN (
SELECT `id` FROM `Users` WHERE `Active` = 1
) GROUP BY `userID` HAVING lastActive < DATE_SUB(NOW(), INTERVAL 30 DAY)
Indexing
You should have a PRIMARY KEY index on the Users table.
You should have a FOREIGN KEY index on the UserActivityTable's userID column.
To speed up the query above, create an index on the timestamp column:
CREATE INDEX index_timestamp ON `UserActivityTable` (`timestamp`);
You can also use a single query to UPDATE the active state on the Users table:
UPDATE `Users` SET `active` = EXISTS (
SELECT `userID` FROM `UserActivityTable` WHERE `UserActivityTable`.`userID` = `Users`.`id` GROUP BY `UserActivityTable`.`userID` HAVING MAX(`UserActivityTable`.`timestamp`) > DATE_SUB(NOW(), INTERVAL 30 DAY)
)
Is there a clean way to get all that stuff in one sql query?
Yes, you can update the Users table in a single step with the following query:
UPDATE `Users` SET `Active` = EXISTS(
SELECT * FROM `UserActivityTable` WHERE
`UserActivityTable`.`userID` = `Users`.`id` AND
`timestamp` > DATE_SUB(NOW(), INTERVAL 30 DAY)
)
The EXISTS expression returns 1 or 0 depending on whether at least one activity record exists for the user in the last 30 days, so the Active field is correctly set to 1 or 0 for every user.
Mysql Return ID where last activity is older than X Days
If you just want the list of ids of users with recent activity, you have:
SELECT `Users`.`id` FROM `Users` WHERE EXISTS(
SELECT * FROM `UserActivityTable` WHERE
`UserActivityTable`.`userID` = `Users`.`id` AND
`timestamp` > DATE_SUB(NOW(), INTERVAL 30 DAY)
)
In order to have good performance, the timestamp field (at least) must be indexed.
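The single-query flag update can be exercised with Python's built-in sqlite3 in place of MySQL (table and column names mirror the question; the 30-day window is replaced by an explicit cutoff parameter so the demo is deterministic, and the sample timestamps are made up).

```python
import sqlite3

def refresh_active(conn, cutoff):
    # Set Active = 1 exactly for users with at least one activity newer
    # than `cutoff`; the correlated EXISTS evaluates per user row.
    conn.execute(
        """UPDATE Users SET Active = EXISTS(
               SELECT * FROM UserActivityTable
               WHERE UserActivityTable.userID = Users.id
                 AND timestamp > ?)""", (cutoff,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, Active INTEGER)")
conn.execute("CREATE TABLE UserActivityTable (userID INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(2, 0), (3, 1)])
conn.executemany("INSERT INTO UserActivityTable VALUES (?, ?)",
                 [(2, "2018-02-21"), (3, "2018-02-01")])

refresh_active(conn, "2018-02-10")
result = dict(conn.execute("SELECT id, Active FROM Users"))
print(result)  # user 2 has recent activity, user 3 does not: {2: 1, 3: 0}
```

Note how user 2 (last active after the cutoff) ends up Active = 1 while user 3 ends up Active = 0, regardless of what the flags were before the update.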
Side note
You already have 15M records.
As your events table will grow indefinitely over time, you should consider periodically deleting old entries or moving them to a separate table/dump file.
Don't do it.
It is improper to have redundant information in a database. (active is redundant because it can be discovered by a query against UserActivityTable.)
OK, you need more performance, so you are setting this flag. I assume this is not a one-time task but needs to be updated daily? Or what? I ask because active=0 will be wrong if the 'user' does something after you run the UPDATE and before you run it again!
Let's solve that bug, then discover that we are making the UPDATE very fast in the process.
The 'only' way to fix that bug is to reach into UserActivityTable dynamically. However, we can make that so cheap that it is OK to do it in 'realtime'.
SELECT ...
FROM Users AS x
WHERE EXISTS ( SELECT * FROM UserActivityTable
               WHERE userID = x.id
                 AND timestamp > NOW() - INTERVAL 30 DAY )  -- == "active"
UserActivityTable needs INDEX(userID, timestamp)
Oops! I just obviated the need for the active column.
One of your comments mentioned purging 'old, inactive' users?? Is the UPDATE aimed at that? Please fold that requirement into the question, otherwise I (and others) are not necessarily helping you.

Disadvantages of MySQL Row Locking

I am using row locking (transactions) in MySQL for creating a job queue. Engine used is InnoDB.
SQL Query
START TRANSACTION;
SELECT *
FROM mytable
WHERE status IS NULL
ORDER BY timestamp DESC LIMIT 1
FOR UPDATE;
-- restrict the update to the row just selected; without a WHERE clause it would update every row
UPDATE mytable SET status = 1 WHERE id = <id of the selected row>;
COMMIT;
According to this webpage,
The problem with SELECT FOR UPDATE is that it usually creates a
single synchronization point for all of the worker processes, and you
see a lot of processes waiting for the locks to be released with
COMMIT.
Question: does this mean that when a second similar query arrives before the first transaction is committed, it has to wait for the first one to finish? If so, I do not understand why locking a single row would affect the next transaction's query, which should not need to read that locked row.
Additionally, can this problem be solved (while still getting the effect row locking gives a job queue) by doing an UPDATE instead of the transaction?
UPDATE mytable SET status = 1
WHERE status IS NULL
ORDER BY timestamp DESC
LIMIT 1
If you use FOR UPDATE with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. Using LOCK IN SHARE MODE sets a shared lock that permits other transactions to read the examined rows but not to update or delete them.
and about this query
UPDATE mytable SET status = 1
WHERE status IS NULL
ORDER BY timestamp DESC
LIMIT 1
since InnoDB
automatically acquires locks during the processing of SQL statements, I think it works the same way.
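One practical wrinkle with the single-UPDATE approach is knowing which row you just claimed. A common workaround (an assumption here, not part of the original post) is to tag the row with a unique token and read it back. A sketch with Python's built-in sqlite3 standing in for InnoDB (mytable/status follow the question; the uuid token scheme is the added part):

```python
import sqlite3
import uuid

def pop_job(conn):
    token = uuid.uuid4().hex  # unique per claim attempt
    # Claim the newest unclaimed row in one statement; SQLite needs the
    # LIMIT in a subquery, MySQL supports UPDATE ... ORDER BY ... LIMIT 1.
    conn.execute(
        """UPDATE mytable SET status = ?
           WHERE id = (SELECT id FROM mytable WHERE status IS NULL
                       ORDER BY timestamp DESC LIMIT 1)""", (token,))
    conn.commit()
    # The token tells us exactly which row we got (if any).
    row = conn.execute("SELECT id FROM mytable WHERE status = ?",
                       (token,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO mytable (status, timestamp) VALUES (NULL, ?)",
                 [("2024-01-01",), ("2024-01-02",)])

jobs = [pop_job(conn), pop_job(conn), pop_job(conn)]
print(jobs)  # newest first, then None once the queue is empty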

Mysql lock concurrent read/update of row

I have a table, and many (too many) requests selecting a single row from it. After a row is selected, the script runs an update query to set a flag marking that row as "selected". But because we have so many requests per second, in the window between one thread selecting a row and updating its flag, another thread has time to select the same row.
The select query gets one row from the table, ordering by some field and using LIMIT 0, 1. I need the DB to simply skip rows that have already been selected.
The engine is InnoDB.
Just before you start a transaction, call the following:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
This will ensure that if you read a row with a flag, it'll still be that way when you update it within the same transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT id_site
INTO @site
FROM table1 WHERE flag = 0 ORDER BY field LIMIT 0,1;
UPDATE table1 SET flag = 1 WHERE id_site = @site;
COMMIT;

Keep only 10 records per user

I run a points system on my site, so I need to keep logs of the different actions of my users in the database. The problem is that I have too many users, and keeping all the records permanently may cause server overload... Is there a way to keep only 10 records per user and automatically delete older entries? Does MySQL have some function for this?
Thanks in advance
You can add a trigger that takes care of removing old entries.
For instance,
DELIMITER //
CREATE DEFINER='root'@'localhost' TRIGGER afterMytableInsert AFTER INSERT ON MyTable
FOR EACH ROW
BEGIN
DELETE FROM MyTable WHERE user_id = NEW.user_id AND id NOT IN
(SELECT id FROM (SELECT id FROM MyTable WHERE user_id = NEW.user_id ORDER BY action_time DESC LIMIT 10) AS newest);
END//
Caveat: MySQL does not allow a trigger to modify the table it fired on (it raises error 1442), so in practice the trigger must be attached to a different table than the one being pruned. The derived table in the subquery is also needed because MySQL does not support LIMIT directly inside an IN subquery.
Just run an hourly cron job that deletes everything beyond each user's 10 newest records.
Before inserting a record, you could check how many the user already has. If they have >= 10, delete the oldest one, then insert the new one.
If your goal is to have the database ensure that for a given table there are never more than N rows per a given subkey (user) then the correct way to solve this will be either:
Use stored procedures to manage inserts in the table.
Use a trigger to delete older rows after an insert.
If you're already using stored procedures for data access, then modifying the insert procedure would make the most sense, otherwise a trigger will be the easiest solution.
Alternatively, if your goal is to periodically remove old data, then using a cron job to start a stored procedure that prunes old data would make the most sense.
When you are inserting a new record for a user, first trim their history (don't forget the WHERE condition). Note that MySQL's DELETE does not accept an offset in its LIMIT clause, so deleting "everything after the 9 newest" needs a subquery:
DELETE FROM tablename
WHERE userID = 'currentUserId'
AND id NOT IN (
    SELECT id FROM (
        SELECT id FROM tablename
        WHERE userID = 'currentUserId'
        ORDER BY id DESC LIMIT 9
    ) AS newest
);
After that you can insert the new data. This keeps the data at ten records for each user.
INSERT INTO tablename VALUES(....)
TOP is SQL Server syntax; in MySQL use LIMIT inside a derived table:
DELETE FROM Table
WHERE ID NOT IN
(
    SELECT ID FROM (
        SELECT ID FROM Table WHERE USER_ID = 1 ORDER BY ID DESC LIMIT 10
    ) AS newest
)
AND USER_ID = 1
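The trigger idea can be tried end-to-end with Python's built-in sqlite3, because SQLite (unlike MySQL) does allow a trigger to delete from its own table. Names follow the answers above; the per-user cap is shrunk from 10 to 3 to keep the demo short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE MyTable (
    id INTEGER PRIMARY KEY, user_id INTEGER, action_time TEXT)""")
# After every insert, drop this user's rows that are not among their 3 newest.
conn.execute("""
    CREATE TRIGGER afterMytableInsert AFTER INSERT ON MyTable
    BEGIN
        DELETE FROM MyTable
        WHERE user_id = NEW.user_id
          AND id NOT IN (SELECT id FROM MyTable
                         WHERE user_id = NEW.user_id
                         ORDER BY action_time DESC, id DESC LIMIT 3);
    END""")

for i in range(6):  # six actions for user 1
    conn.execute("INSERT INTO MyTable (user_id, action_time) VALUES (1, ?)",
                 (f"2024-01-0{i + 1}",))

kept = [r[0] for r in conn.execute(
    "SELECT id FROM MyTable WHERE user_id = 1 ORDER BY id")]
print(kept)  # only the three newest rows survive: [4, 5, 6]
```

Every insert leaves the table capped automatically, with no cron job or application-side check; in MySQL the same pruning has to happen from a stored procedure, a cron job, or a trigger on a different table, per the caveat above.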
