How to block a piece of code from running in multiple processes - php

I have a cron job that runs the same script every minute, so several instances of the script end up executing at the same time.
The script contains this part:
$query = mysql_query("select distinct `task_id` from tasks_pending where `checked`='0' and `taken`='0' limit 50");
The script then sets `taken = '1'` on the rows it obtains.
Since several processes execute at the same time, the query returns the same rows to different processes. Is it possible to lock this part of the code somehow, so that only one process can execute it at a time?
Sorry for bad English.

Use SELECT ... FOR UPDATE and it will block other processes from selecting the same rows until they are updated.

You might want to lock the table before doing anything with it, and unlock it afterwards using the [UN]LOCK TABLES statements.
Otherwise, you could use a SELECT FOR UPDATE query within a transaction scope.
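As a rough sketch of the FOR UPDATE approach with PDO (the question uses the old mysql_* API; the table and column names are taken from the question, the connection details are placeholders, and the table must be InnoDB for FOR UPDATE to lock rows):
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
// FOR UPDATE locks the matching rows until COMMIT, so a second cron
// process blocks here instead of reading the same tasks.
$stmt = $pdo->query(
    "SELECT task_id FROM tasks_pending
     WHERE checked = '0' AND taken = '0'
     LIMIT 50 FOR UPDATE"
);
$ids = $stmt->fetchAll(PDO::FETCH_COLUMN);
if ($ids) {
    $in = implode(',', array_map('intval', $ids));
    $pdo->exec("UPDATE tasks_pending SET taken = '1' WHERE task_id IN ($in)");
}
$pdo->commit();
// The claimed task ids can now be processed; other processes will
// skip them because taken is no longer '0'.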

Related

PHP - Query on page with many requests

I have around 700-800 visitors at any given time on my home page (according to analytics) and a lot of hits in general. However, I wish to show live statistics of my users and other stuff on my homepage. I therefore have this:
$stmt = $dbh->prepare("
SELECT
count(*) as totalusers,
sum(cashedout) cashedout,
(SELECT sum(value) FROM xeon_stats_clicks
WHERE typ='1') AS totalclicks
FROM users
");
$stmt->execute();
$stats=$stmt->fetch();
Which I then use as $stats["totalusers"] etc.
table.users has `22210` rows, with an index on `id, username, cashedout`; `table.xeon_stats_clicks` has indexes on `value` and `typ`.
However, whenever I enable the above query my website instantly becomes very slow. As soon as I disable it, the load time drops back down.
How else can this be done?
You should not do it that way. You will eventually exhaust your precious DB resources, as you are now experiencing. The better way is to run a separate cron job at a 30-second or 1-minute interval, and write the result to a file:
file_put_contents('stats.txt', $stats["totalusers"]);
and then on your mainpage
<span>current users :
<b><?php echo file_get_contents('stats.txt'); ?></b>
</span>
The beauty is that the server will cache this file, so until stats.txt is changed, a copy will be served from cache.
Example of saving and loading JSON via a file:
$test = array('test' => 'qwerty');
file_put_contents('test.txt', json_encode($test));
echo json_decode(file_get_contents('test.txt'))->test;
This will output qwerty. Replace $test with $stats, e.g.:
echo json_decode(file_get_contents('stats.txt'))->totalclicks;
From what I can tell, there is nothing about this query that is specific to any user on the site. So if you have this query being executed for every user that makes a request, you are making thousands of identical queries.
You could do a sort of caching like so:
Create a table that basically looks like the output of this query.
Make a PHP script that just executes this query and updates the aforementioned table with the latest result.
Execute this PHP script as a cron job every minute to update the stats.
Then the query that gets run for every request can be real simple, like:
SELECT totalusers, cashedout, totalclicks FROM stats_table
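A sketch of the cron script for step 2 (the stats_table layout and connection details are assumptions; the aggregate query is the one from the question):
// stats_cron.php -- run from cron, e.g. every minute:
// * * * * * php /path/to/stats_cron.php
$dbh = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
// Run the expensive aggregate once per minute instead of once per visitor.
$stats = $dbh->query("
    SELECT
        count(*) AS totalusers,
        sum(cashedout) AS cashedout,
        (SELECT sum(value) FROM xeon_stats_clicks WHERE typ='1') AS totalclicks
    FROM users
")->fetch(PDO::FETCH_ASSOC);
// stats_table is assumed to hold exactly one row that we overwrite each run.
$upd = $dbh->prepare("UPDATE stats_table SET totalusers = ?, cashedout = ?, totalclicks = ?");
$upd->execute([$stats['totalusers'], $stats['cashedout'], $stats['totalclicks']]);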
Looking at the query, I can't see any real reason for the sub-query, as it doesn't use any of the data in the users table, and it's likely that this is what is slowing things down; if memory serves, it will query the xeon_stats_clicks table once for every row in your users table (which is a lot of rows by the looks of things).
Try doing it as two separate queries rather than one.
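Split out, that could look like this (sketch, reusing the $dbh handle from the question):
$users = $dbh->query("SELECT count(*) AS totalusers, sum(cashedout) AS cashedout FROM users")
             ->fetch(PDO::FETCH_ASSOC);
$clicks = $dbh->query("SELECT sum(value) AS totalclicks FROM xeon_stats_clicks WHERE typ='1'")
              ->fetch(PDO::FETCH_ASSOC);
$stats = array_merge($users, $clicks); // same keys as before: totalusers, cashedout, totalclicks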

SELECT+UPDATE to avoid returning the same result

I have a cron task running every x seconds on n servers. It will "SELECT FROM table WHERE time_scheduled<CURRENT_TIME" and then perform a lengthy task on this result set.
My problem is now: how do I avoid having two separate servers perform the same task at the same time?
The idea is to update *time_scheduled* with a set interval after selecting it. But if two servers happen to run the query at the same time, that will be too late, no?
All ideas are welcome. It doesn't have to be a strictly MySQL solution.
Thanks!
I am guessing you have a single MySQL instance, and connections from your n servers to run this processing job. You're implementing a job queue here.
The table you mention needs to use the InnoDB access method (or one of the other transaction-friendly access methods offered by Percona or MariaDB).
Do these items in your table need to be processed in batches? That is, are they somehow inter-related? Or is it possible for your server processes to handle them one-by-one? This is an important question, because you'll get better load balancing between your server processes if you can handle them individually or in small batches. Let's assume the small batches.
The idea is to prevent any server process from grabbing onto a row in your table if some other server process has that row. I've had to do this kind of thing a lot, and here is my suggestion; I know this works.
First, add an integer column to your table. Call it "working" or some such thing. Give it a default value of zero.
Second, assign a permanent id number to each server. The last part of the server's IP address (for example, if the server's IP address is 10.1.0.123, the id number is 123) is a good choice, because it's probably unique in your environment.
Then, when a server's grabbing work to do, use these two SQL queries.
UPDATE table
SET working = :this_server_id
WHERE working = 0
AND time_scheduled < CURRENT_TIME
ORDER BY time_scheduled
LIMIT 1
SELECT table_id, whatever, whatever
FROM table
WHERE working = :this_server_id
The first query will consistently grab a batch of rows to work on. If another server process comes in at the same time, it won't ever grab the same rows, because no process can grab rows unless working = 0. Notice that the LIMIT 1 will limit your batch size. You don't have to do this, but you can. I also threw in ORDER BY to process the rows first that have been waiting the longest. That's probably a useful way to do things.
The second query retrieves the information you need to do the work. Don't forget to retrieve the primary key values (I called them table_id) for the rows you're working on.
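Wired together in PHP, the claim-and-fetch step might look like this sketch (PDO assumed; the table and column names are the placeholders used in the queries above):
$pdo = new PDO('mysql:host=dbhost;dbname=queue', 'user', 'pass');
$serverId = 123; // e.g. the last octet of this server's IP address
// Claim at most one unclaimed, due row; rowCount() tells us whether we got work.
$claim = $pdo->prepare("
    UPDATE `table`
    SET working = :id
    WHERE working = 0 AND time_scheduled < CURRENT_TIME
    ORDER BY time_scheduled
    LIMIT 1
");
$claim->execute([':id' => $serverId]);
if ($claim->rowCount() > 0) {
    // Fetch the row(s) this server just claimed.
    $rows = $pdo->prepare("SELECT table_id, whatever FROM `table` WHERE working = :id");
    $rows->execute([':id' => $serverId]);
    foreach ($rows as $row) {
        // ... do the work, then reschedule the row as shown below ...
    }
}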
Then, your server process does whatever it needs to do.
When it's done, it needs to throw the row back into the queue for a later time. To do that, the server process needs to set the time_scheduled to whatever it needs to be, then to set working = 0. So, for example, you could run this query for each row you're processing.
UPDATE table
SET time_scheduled = CURRENT_TIME + INTERVAL 5 MINUTE,
working = 0
WHERE table_id = ?table_id_from_previous_query
That's it.
Except for one thing. In the real world these queuing systems get fouled up sometimes. Server processes crash. Etc. Etc. See Murphy's Law. You need a monitoring query. That's easy in this system.
This query will give a count of stale jobs, those more than five minutes overdue, grouped by the server that's supposed to be working on them.
SELECT working, COUNT(*) stale_jobs
FROM table
WHERE time_scheduled < CURRENT_TIME - INTERVAL 5 MINUTE
GROUP BY working
If this query comes up empty, all is well. If it comes up with lots of jobs with working set to zero, your servers aren't keeping up. If it comes up with jobs with working set to some server's id number, that server is taking a lunch break.
You can reset all the jobs assigned to the server that's gone to lunch with this query, if need be.
UPDATE table
SET working=0
WHERE working=?server_id_at_lunch
By the way, a compound index on (working, time_scheduled) will probably help this perform well.
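For example (sketch; `table` is the placeholder name used above):
ALTER TABLE `table` ADD INDEX idx_working_scheduled (working, time_scheduled);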

Multiple instances of the same PHP script processing different MySQL rows using row locking

What I want to do is to execute the same script every few minutes with cron.
The script needs to process some data read from the database, so obviously each instance needs to work on different rows.
My concept was to use row locking to make sure each instance work on different row, but it doesn't seem to work that way.
Is it even possible to use row locks this way? Any other solutions?
Example:
while ($c < $limit) {
    $sql = mysql_query("SELECT * FROM table WHERE ... LIMIT 1 FOR UPDATE");
    $data = mysql_fetch_assoc($sql);
    // (process data)
    mysql_query("UPDATE table SET value='something', timestamp=NOW() WHERE id=" . $data['id']);
    $c++;
}
Basically what I need is: SCRIPT1 reads R1 from the table, while SCRIPT2 reads R2 (the next non-locked row matching the criteria).
EDIT:
Let's say for example that:
1) the table stores a list of URLs
2) the script checks whether each URL responds, and updates its status (and a timestamp) in the database
This should essentially be treated as two separate problems:
Finding a job for each worker to process. Ideally this should be very efficient and pre-emptively avoid failures in step 2, which comes next.
Ensuring that each job gets processed at most once or exactly once. No matter what happens the same job should not be concurrently processed by multiple workers. You may want to ensure that no jobs are lost due to buggy/crashing workers.
Both problems have multiple workable solutions. I'll give some suggestions about my preference:
Finding a job to process
For low-velocity systems it should be sufficient just to look for the most recent un-processed job. You do not want to take the job yet, just identify it as a candidate. This could be:
SELECT id FROM jobs ORDER BY created_at ASC LIMIT 1
(Note that this will process the oldest job first—FIFO order—and we assume that rows are deleted after processing.)
Claiming a job
In this simple example, this would be as simple as (note I am avoiding some potential optimizations that will make things less clear):
BEGIN;
SELECT * FROM jobs WHERE id = <id> FOR UPDATE;
DELETE FROM jobs WHERE id = <id>;
COMMIT;
If the SELECT returns our job when queried by id, we've now locked it. If another worker has already taken this job, an empty set will be returned, and we should look for a different job. If two workers are competing for the same job, they will block each other from the SELECT ... FOR UPDATE onwards, such that the previous statements are universally true. This will allow you to ensure that each job is processed at most once. However...
Processing a job exactly once
A risk in the previous design is that a worker takes a job, fails to process it, and crashes. The job is now lost. Most job processing systems therefore do not delete the job when they claim it, instead marking it as claimed by some worker and implementing a job-reclaim system.
This can be achieved by keeping track of the claim itself using either additional columns in the job table, or a separate claim table. Normally some information is written about the worker, e.g. hostname, PID, etc., (claim_description) and some expiration date (claim_expires_at) is provided for the claim e.g. 1 hour in the future. An additional process then goes through those claims and transactionally releases claims which are past their expiration (claim_expires_at < NOW()). Claiming a job then also requires that the job row is checked for claims (claim_expires_at IS NULL) both at selection time and when claiming with SELECT ... FOR UPDATE.
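A sketch of those claim queries, using the column names mentioned above on a hypothetical jobs table:
-- Claim a candidate job, re-checking that it is still unclaimed.
BEGIN;
SELECT * FROM jobs WHERE id = <id> AND claim_expires_at IS NULL FOR UPDATE;
UPDATE jobs
   SET claim_description = 'worker-7@host:12345',
       claim_expires_at  = NOW() + INTERVAL 1 HOUR
 WHERE id = <id>;
COMMIT;
-- Reclaim process: release claims that have expired.
UPDATE jobs
   SET claim_description = NULL, claim_expires_at = NULL
 WHERE claim_expires_at < NOW();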
Note that this solution still has problems: If a job is processed successfully, but the worker crashes before successfully marking the job as completed, we may eventually release the claim and re-process the job. Fixing that requires a more advanced system which is left as an exercise for the reader. ;)
If you are going to read the row once, and only once, then I would create an is_processed column and simply update that column on the rows that you've processed. Then you can simply query for the first row that has is_processed = 0
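For example (sketch; jobs and is_processed are illustrative names):
SELECT id FROM jobs WHERE is_processed = 0 ORDER BY id LIMIT 1;
-- Flag it; the affected-row count shows whether this worker actually won the row.
UPDATE jobs SET is_processed = 1 WHERE id = <id> AND is_processed = 0;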

Using PHP to delete many rows in a large table from MySQL

I am having trouble deleting many rows in a large table. I am trying to delete 200-300k rows from a 2m-row table.
My PHP script is something like this
for ($i = 0; $i < 1000; $i++) {
    $query = "DELETE FROM record_table LIMIT 100";
    $queryres = mysql_query($query) or die(mysql_error());
}
This is just an example: the script deletes 100 rows at a time, looping 1000 times, to delete 100k records.
However, the PHP script just seems to keep running forever and not returning anything.
But when I tried to run the query from command line, it seems to delete just fine, although it takes about 5-6 minutes to delete.
Could there be something else that is preventing the PHP script from executing the query? I tried deleting 100k in one query and the result is the same too.
The query that I really wanted to run is `DELETE FROM table WHERE timeinlong BETWEEN 'timefrom' AND 'timeto'`.
The timeinlong column is indexed.
Hopefully you have an ID field so you can just do something like this:
$delete = mysql_query("DELETE FROM records WHERE id > 1000");
That would leave the first 1,000 rows and remove every other entry.
Perhaps adding a field to track deleted items will work for you. Then, rather than actually deleting the rows, you update a 'deleted' flag to TRUE. Obviously your other queries need to be modified to select only rows where deleted equals FALSE. But it's fast. You can then trim the db via a script at some other time.
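A sketch of that approach (the deleted column is illustrative):
-- One-time schema change:
ALTER TABLE record_table ADD COLUMN deleted TINYINT(1) NOT NULL DEFAULT 0;
-- "Delete" quickly by flagging instead of removing:
UPDATE record_table SET deleted = 1 WHERE timeinlong BETWEEN 'timefrom' AND 'timeto';
-- Other queries then filter:
SELECT * FROM record_table WHERE deleted = 0;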
Deleting in a loop so many times is not a good way to delete. If you have an auto-incremented id, use it in the WHERE clause
(e.g. DELETE FROM record_table WHERE id < some_id).
Or, if you do want to delete in a loop that runs for a long time, you should also call set_time_limit(0) to keep the PHP script executing.
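A chunked version of the delete, as a sketch (mysql_* kept to match the question; timefrom/timeto are the question's placeholders):
set_time_limit(0); // don't let PHP's time limit kill the script mid-delete
do {
    // Delete in small chunks so each statement finishes (and releases locks) quickly.
    $res = mysql_query(
        "DELETE FROM record_table
         WHERE timeinlong BETWEEN 'timefrom' AND 'timeto'
         LIMIT 1000"
    ) or die(mysql_error());
} while (mysql_affected_rows() > 0);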

Execution order of mysql queries from php script when same script is launched quickly twice

I have a php script that executes mysql pdo queries. There are a few reads and writes to the same table in this script.
For sake of example let's say that there are 4 queries, a read, write, another read, another write, each read takes 10 second to execute, and each write takes .1 seconds to execute.
If I execute this script from the cli nohup php execute_queries.php & twice in 1/100th of a second, what would the execution order of the queries be?
Would all the queries from the first instance of the script need to finish before the queries from the 2nd instance begin to run, or would the first read from both instances start and finish before the table is locked by the write?
NOTE: assume that I'm using MyISAM and that the write is an update to a record (i.e., the entire table gets locked during the write).
Since you are not using transactions: no, the queries from the first instance won't all finish before the second instance's queries begin, so the queries from the two instances may get interleaved.
There is an entire field of study called concurrent programming that teaches this.
In databases it's about transactions, isolation levels and data locks.
Typical (simple) race condition:
$visits = $pdo->query('SELECT visits FROM articles WHERE id = 44')->fetch()['visits'];
/*
* do some time-consuming thing here
*
*/
$visits++;
$pdo->exec('UPDATE articles SET visits = '.$visits.' WHERE id = 44');
The above race condition can easily turn sour if two PHP processes read visits from the database one millisecond apart. Assuming the initial value of visits was 6, both would increment it to 7 and both would write 7 back to the database, even though the desired effect was for the two visits to increment the value by 2 (the final value should have been 8).
The solution to this is using atomic operations (because the operation is simple and can be reduced to one single atomic operation).
UPDATE articles SET visits = visits+1 WHERE id = 44;
Atomic operations are guaranteed by the database engines to take place uninterrupted by other processes/threads. Usually the database has to queue incoming updates so that they don't affect each other. Queuing obviously slows things down, because each process has to wait for all the processes ahead of it before it gets the chance to execute.
In a less simple operation we need more than one statement:
SELECT @visits := visits FROM articles WHERE ID = 44;
SET @visits = @visits + 1;
UPDATE articles SET visits = @visits WHERE ID = 44;
But again, even at the database level, three separate atomic statements are not guaranteed to yield an atomic result; they can overlap with other operations, just like the PHP example.
To solve this you have to do the following:
START TRANSACTION;
SELECT @visits := visits FROM articles WHERE ID = 44 FOR UPDATE;
SET @visits = @visits + 1;
UPDATE articles SET visits = @visits WHERE ID = 44;
COMMIT;
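In PHP with PDO, the same pattern looks like this (sketch; the table must be InnoDB for FOR UPDATE to work):
$pdo->beginTransaction();
// FOR UPDATE makes a concurrent process block on this row until COMMIT,
// so both increments are applied instead of one overwriting the other.
$visits = $pdo->query('SELECT visits FROM articles WHERE id = 44 FOR UPDATE')
              ->fetchColumn();
$pdo->exec('UPDATE articles SET visits = ' . ($visits + 1) . ' WHERE id = 44');
$pdo->commit();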
