MySQL query lag time / deadlock? - php

When there are multiple PHP scripts running in parallel, each making an UPDATE query to the same record in the same table repeatedly, is it possible for there to be a 'lag time' before the table is updated with each query?
I have basically 5-6 instances of a PHP script running in parallel, having been launched via cron. Each script gets all the records in the items table, and then loops through them and processes them.
However, to avoid processing the same item more than once, I store the id of the last item being processed in a separate table. So this is how my code works:
function getCurrentItem()
{
    $sql = "SELECT currentItemId FROM settings";
    $result = $this->db->query($sql);
    return $result->get('currentItemId');
}

function setCurrentItem($id)
{
    $sql = "UPDATE settings SET currentItemId='$id'";
    $this->db->query($sql);
}
$currentItem = $this->getCurrentItem();
$sql = "SELECT * FROM items WHERE status='pending' AND id > $currentItem";
$result = $this->db->query($sql);
$items = $result->getAll();
foreach ($items as $i)
{
    // Check if $i has been processed by a different instance of the
    // script, and if so, leave it untouched.
    if ($this->getCurrentItem() > $i->id)
        continue;
    $this->setCurrentItem($i->id);
    // Process the item here
}
But despite all these precautions, most items are processed more than once, which makes me think there is some lag between the UPDATE queries issued by the PHP script and the moment the database actually updates the record.
Is that true? And if so, what other mechanism should I use to ensure the PHP scripts always see the latest currentItemId, even with multiple scripts running in parallel? Would using a text file instead of the database help?

If this runs in parallel, there is nothing here to prevent race conditions.
script1:
getCurrentItem() yields Id 1234
...context switch to script2 before script1 gets to run its UPDATE statement.
script2:
getCurrentItem() yields Id 1234
And both scripts process Id 1234.
You'd want updating and checking the status of the item to be an all-or-nothing operation. You don't need the settings table; you'd do something like this (pseudo code, with a runnable PHP sketch after it):
SELECT * FROM items WHERE status='pending'
foreach ($items as $i) {
    rows = update items set status='processing' where id = $i->id and status='pending';
    if (rows == 0) // someone beat us to it and is already processing the item
        continue;
    // process item ...
    update items set status='done' where id = $i->id;
}
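In runnable PHP that could look like the following; a minimal sketch, assuming a mysqli connection in $db and a hypothetical doProcess() helper for the actual work:
$items = $db->query("SELECT id FROM items WHERE status='pending'");
while ($item = $items->fetch_assoc()) {
    // Atomic claim: the status='pending' condition guarantees that
    // only one process can win this UPDATE for a given row.
    $db->query("UPDATE items SET status='processing'
                WHERE id = {$item['id']} AND status='pending'");
    if ($db->affected_rows == 0) {
        continue; // someone beat us to it
    }
    doProcess($item['id']); // hypothetical: process the item here
    $db->query("UPDATE items SET status='done' WHERE id = {$item['id']}");
}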

What you need is for any thread to be able to:
find a pending item
record that that item is now being worked on (in the settings table)
And it needs to do both of those in one go, without any other thread interfering half-way through.
I recommend putting the whole SQL in a stored procedure; that will be able to run the entire thing as a single transaction, which makes it safe from competing threads.
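A sketch of that idea, assuming the items table from the question and a mysqli connection in $db (the procedure name claim_next_item is made up; a caller must handle a NULL result when nothing is pending):
// Create the procedure once. FOR UPDATE row-locks the candidate row,
// so a competing caller blocks until this transaction commits.
$db->query("
    CREATE PROCEDURE claim_next_item(OUT claimed_id INT)
    BEGIN
        START TRANSACTION;
        SELECT id INTO claimed_id FROM items
         WHERE status = 'pending' ORDER BY id LIMIT 1 FOR UPDATE;
        UPDATE items SET status = 'processing' WHERE id = claimed_id;
        COMMIT;
    END");

// Each worker then claims its next item atomically:
$db->query("CALL claim_next_item(@id)");
$row = $db->query("SELECT @id")->fetch_row();
$currentItem = $row[0]; // NULL when no pending items remain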

Related

Exporting large data in Yii gets the server stuck

I am working in Yii and want to export a large amount of data, approximately 200,000 (2 lakh) records at a time. The problem is that when I try to export the data, the server stops working and every process in the system hangs; I have to kill all the services and restart the server. Can anyone tell me an appropriate way to export the data to a CSV file?
$connection = Yii::app()->db;
$count = $connection->createCommand('SELECT COUNT(*) FROM TEST_DATA')->queryScalar();
$maxRows = 1000;
$maxPages = ceil($count / $maxRows);
for ($i = 0; $i < $maxPages; $i++)
{
    $offset = $i * $maxRows;
    $rows = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset,$maxRows")->query();
    foreach ($rows as $row)
    {
        // Here your code
    }
}
Maybe it is because the code runs without closing the session. When you start the process and do not close the session, then for the duration of the processing you cannot load any page of the site in the same browser, because the session is busy. That can be mistaken for "hanging of the server", but the server is running as it should. You can check this by loading the site in a different browser: if it loads, the process is running as it should.
In my experience, I used a table to store processing state (the last successfully processed offset and the last iteration time) so I could see the current state of the processing. For example, a table processing_data with the columns id (int), stop_request (tinyint; if 1, stop the iteration), offset (int) and last_iterated_time (datetime). Add only one record to this table, and on every iteration check the stop_request value; if it is 1, break out of the iteration. On every iteration also save the current offset and the current datetime. This way you can stop the processing and continue it later.
And you can use a while loop (to reduce memory usage) and iterate without counting:
set_time_limit(0);
$offset = 0;
$nextRow = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset, 1")->queryRow();
while ($nextRow) {
    // Here your code
    $processingData = ProcessingData::model()->findByPk(1);
    $processingData->offset = $offset;
    $processingData->last_iterated_time = new CDbExpression('NOW()');
    $processingData->save();
    if ($processingData->stop_request == 1) { break; }
    $offset++;
    $nextRow = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset, 1")->queryRow();
}
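To actually write the CSV, the "// Here your code" placeholder in the paged loop above could stream rows out with fputcsv; a sketch, assuming an output path (the file should be opened once before the loop, not per row):
$fp = fopen('/tmp/test_data_export.csv', 'w'); // assumed output path
$headerWritten = false;
foreach ($rows as $row) {                // the inner loop from the answer
    if (!$headerWritten) {
        fputcsv($fp, array_keys($row));  // header row from the column names
        $headerWritten = true;
    }
    fputcsv($fp, $row);                  // one CSV line per record
}
fclose($fp);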

Prevent PHP from sending multiple emails when running parallel instances

This is more of a logic question than language question, though the approach might vary depending on the language. In this instance I'm using Actionscript and PHP.
I have a flash graphic that is getting data stored in a mysql database served from a PHP script. This part is working fine. It cycles through database entries every time it is fired.
The graphic is not on a website, but is being used at 5 locations, set to load and run at regular intervals (all 5 locations fire at the same time, or at least within 500 ms of each other). This is real-time info, so time is of the essence; currently the script loads and parses at all 5 locations in 30-300 ms (depending on the distance from the server).
I was originally having a pagination problem, where each of the 5 locations would pull a different database entry, since I was moving to the next entry every time the script ran. I solved this by making the script move to the next entry only after a certain amount of time had passed.
However, I also need the script to send an email every time it displays a new entry, I only want it to send one email. I've attempted to solve this by adding a "has been emailed" boolean to the database. But, since all the scripts run at the same time, this rarely works (it does sometimes). Most of the time I get 5 emails sent. The timeliness of sending this email doesn't have to be as fast as the graphic gets info from the script, 5-10 second delay is fine.
I've been trying to come up with a solution for this. Currently I'm thinking of spawning a Python script from PHP with a random delay (between 2 and 5 seconds), hopefully alleviating the problem. However, I'm not quite sure how to run an exec() command from PHP without the script waiting for the command to finish. Or is there a better way to accomplish this?
UPDATE: here is my current logic (relevant code only):
// Get the top "unread" entry from the database
// (`Read` is a reserved word in MySQL, so it has to be backtick-quoted)
$query = "SELECT * FROM `database` WHERE `Read` = '0' ORDER BY Entry ASC LIMIT 1";
// DATA
$emailed = $row["emailed"];
$Entry = $row["databaseEntryID"];
if ($emailed == 0)
{
    // **CODE TO SEND EMAIL**
    $EmailSent = "UPDATE `database` SET emailed = '1' WHERE databaseEntryID = '$Entry'";
    $mysqli->query($EmailSent);
}
Thanks!
You need to use some kind of locking, e.g. database locking:
function send_email_sync($message)
{
    // Atomically claim the right to send: only one process can flip
    // email_sent from 0 to 1.
    sql_query("UPDATE email_table SET email_sent=1 WHERE email_sent=0");
    $result = FALSE;
    if (number_of_affected_rows() == 1) {
        send_email_now($message);
        $result = TRUE;
    }
    return $result;
}
The functions sql_query and number_of_affected_rows need to be adapted to your particular database.
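With mysqli, for instance, that could be adapted roughly like this (a sketch; send_email_now() stands in for the actual mail-sending code, and email_table/email_sent come from the example above):
function send_email_sync(mysqli $db, $message)
{
    // Atomic claim: only one process can flip email_sent from 0 to 1.
    $db->query("UPDATE email_table SET email_sent = 1 WHERE email_sent = 0");
    if ($db->affected_rows == 1) {
        send_email_now($message); // your existing mail-sending code
        return TRUE;
    }
    return FALSE; // another process already sent the email
}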
Old answer:
Use file-based locking (this only works if the script runs on a single server):
function send_email_sync($message)
{
    $fd = fopen(__FILE__, "r");
    if (!$fd) {
        die("something bad happened in ".__FILE__.":".__LINE__);
    }
    $result = FALSE;
    if (flock($fd, LOCK_EX | LOCK_NB)) {
        if (!email_has_already_been_sent()) {
            actually_send_email($message);
            mark_email_as_sent();
            $result = TRUE; // email has been sent
        }
        flock($fd, LOCK_UN);
    }
    fclose($fd);
    return $result;
}
You will need to lock the row in your database by using a transaction.
Pseudo code:
start transaction
select row ... for update
update row
commit
if (mysqli_affected_rows($connection) == 1)
    send_email();
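Fleshed out with mysqli, that pseudo code could look roughly like this; a sketch, using the question's table and column names (note `database` has to be backtick-quoted because DATABASE is a reserved word), with minimal error handling:
$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // assumed credentials
$db->begin_transaction();
// FOR UPDATE locks the row: a parallel instance blocks here until we commit.
$result = $db->query("SELECT emailed FROM `database` WHERE databaseEntryID = '$Entry' FOR UPDATE");
$row = $result->fetch_assoc();
if ($row && $row['emailed'] == 0) {
    // **CODE TO SEND EMAIL**
    $db->query("UPDATE `database` SET emailed = '1' WHERE databaseEntryID = '$Entry'");
}
$db->commit();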

MySQL query queueing (using PHP/MySQL)

I am creating a panel where the user would select a contact group and then schedule mail.
In order to keep track of the mails sent, I am storing them in the database.
But storing this data takes a long time: the loop generates about 1000+ INSERT queries, which leaves the page unresponsive while the data is inserted into the table.
This is just for a single user; if 10 users performed the same action at the same time, the application's performance would suffer badly.
I need to know whether there is any mechanism in MySQL and PHP to queue my queries so that they can be executed later and the user does not have to wait for the query execution to complete.
My code is as below:
$recordsCounter = 0;
// insert into EMAIL_RECIPIENTS
foreach ($this->recipients as $val) {
    $query = "INSERT INTO bas_email_recipients SET
        recipient_type='".$val['recipientType']."',
        email_dump_id='".$emailDumpId."', ";
    if (isset($val['contactDetailsId'])) {
        $query .= "contact_details_id='".$val['contactDetailsId']."', ";
    }
    if (isset($val['contactName'])) {
        $query .= "recipient_name='".$val['contactName']."', ";
    }
    $query .= "email_address='".$val['emailAddress']."',
        mail_to='".$val['mailTo']."'";
    if ($db->query($query) > 0) {
        $recordsCounter++;
    }
}
For example, you can put those queries into a cache and retrieve/execute/remove them from a cron job.
Instead of
if ($db->query($query) > 0) { // putting to database
    $recordsCounter++;
}
put the query into an array:
$queryList[] = $query;
and then put this array into the cache under some unique key:
$cache->put('key', $queryList);
Install a cron job for, say, every minute to execute:
$queryList = $cache->get('key');
if ($queryList) {
    foreach ($queryList as $query) {
        $db->query($query);
    }
    $cache->remove('key'); // remove the executed batch so it does not run twice
}
Heavy database operations will then be executed by cron in the background, so your web application will perform much better.
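If no shared cache is available, a plain spool file can play the same role. A sketch, with an assumed file path; each batch is serialized onto one line so queries containing newlines survive, and the cron side grabs the file with an atomic rename so the two sides cannot collide:
// Web request side: append the batch as one serialized line.
file_put_contents('/tmp/query_spool.txt',
    base64_encode(serialize($queryList)) . "\n", FILE_APPEND | LOCK_EX);

// Cron side: take the whole spool, execute it, discard it.
$spool = '/tmp/query_spool.txt';
if (is_file($spool) && rename($spool, $spool . '.work')) {
    foreach (file($spool . '.work', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        foreach (unserialize(base64_decode($line)) as $query) {
            $db->query($query);
        }
    }
    unlink($spool . '.work');
}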

are php files executed parallel or sequential?

If two users execute the same PHP file, will it be executed in parallel or sequentially? Example:
If I have a database table data which has only one column, id, would it be possible for the following code to produce the same outcome for two different users?
1. $db = startConnection();
2. $query = "SELECT id FROM data";
3. $result = $db->query($query) or die($db->error);
4. $zeile = mysqli_fetch_assoc($result);
5. $number = $zeile['id'];
6. $newnumber = $number + 1;
7. echo $number;
8. $update = "UPDATE data SET id = '$newnumber' WHERE id = '$number'";
9. $db->query($update) or die($db->error);
10. mysqli_close($db);
If it is not executed in parallel, does that mean that when 100 people load a PHP file that has a loading time of 1 second, one of them has to wait 99 seconds?
Edit: in the comments it is stated that I could mess up my database. I guess this is how it could happen:
User A executes the file from 1.-7.; at that moment user B executes the file from 1.-7.; then A runs 8.-10. and B runs 8.-10. In this scenario both users would end up with the same number on the screen.
Now let's take the following example:
1. $db = startConnection();
2. $query = "INSERT INTO data VALUES ()";
3. $result = $db->query($query) or die($db->error);
4. echo $db->insert_id;
5. mysqli_close($db);
Let's say A executes the file from 1.-3.; at that moment user B executes the file from 1.-5.; after that user A runs 4.-5. I guess in this scenario both would also get the same number on the screen, right? Do transactions prevent both scenarios?
You can say that PHP files are executed in parallel (for most cases it is so, but it depends on the web server).
Yes, it is possible for the code above to produce the same outcome for two different users.
How to avoid this possibility?
1) If you are using MySQL, you can use transactions together with SELECT ... FOR UPDATE to avoid it. Just using a transaction alone wouldn't help!
2) Make sure you are using InnoDB or another storage engine that supports transactions; MyISAM, for example, does not. You can also run into problems if any form of snapshotting is enabled in the database for handling reads of locked records.
3) Example of using SELECT ... FOR UPDATE:
$db = startConnection();
// Start transaction
$db->query("START TRANSACTION") or die($db->error);
// Your SELECT request, but with a "FOR UPDATE" lock
$query = "SELECT id FROM data FOR UPDATE";
$result = $db->query($query);
// Roll back changes if there is an error
if (!$result)
{
    $db->query("ROLLBACK");
    die($db->error);
}
$zeile = mysqli_fetch_assoc($result);
$number = $zeile['id'];
$newnumber = $number + 1;
echo $number;
$update = "UPDATE data SET id = '$newnumber' WHERE id = '$number'";
$result = $db->query($update);
// Roll back changes if there is an error
if (!$result)
{
    $db->query("ROLLBACK");
    die($db->error);
}
// Commit changes in the database after the requests executed successfully
$db->query("COMMIT");
mysqli_close($db);
Why wouldn't just using a transaction help?
A transaction alone locks only for writes. You can test the examples below by running two MySQL console clients in two separate terminal windows; I did so, and that is how it works.
We have client#1 and client#2 executing in parallel.
Example #1. Without "SELECT ... FOR UPDATE":
client#1: BEGIN
client#2: BEGIN
client#1: SELECT id FROM data // fetched id = 3
client#2: SELECT id FROM data // fetched id = 3
client#1: UPDATE data Set id = 4 WHERE id = 3
client#2: UPDATE data Set id = 4 WHERE id = 3
client#1: COMMIT
client#2: COMMIT
Both clients fetched the same id (3).
Example #2. With "SELECT ... FOR UPDATE":
client#1: BEGIN
client#2: BEGIN
client#1: SELECT id FROM data FOR UPDATE // fetched id = 3
client#2: SELECT id FROM data FOR UPDATE // here! client#2 will wait for end of transaction started by client#1
client#1: UPDATE data Set id = 4 WHERE id = 3
client#1: COMMIT
client#2: client#1 ended transaction and client#2 fetched id = 4
client#2: UPDATE data Set id = 5 WHERE id = 4
client#2: COMMIT
Hey, I think such read locks reduce performance!
SELECT ... FOR UPDATE read-locks only for clients that also use SELECT ... FOR UPDATE. That's good: it means such a read lock won't affect standard SELECT requests without FOR UPDATE.
Links
MySQL documentation: "SELECT ... FOR UPDATE" and other read-locks
Parallel or Sequential?
Part of your question was about whether PHP runs in parallel or sequentially. As I have read everything and its opposite on that topic, I decided to test it myself.
Field testing:
On a LAMP stack running PHP 5.5 with Apache 2, I made a script with a very expensive recursive function:
function fibo($n)
{
    return ($n > 1) ? fibo($n - 1) + fibo($n - 2) : 1;
}

$start = microtime(true);
print "result: " . fibo(38);
$end = microtime(true);
print " - took " . round(($end - $start), 3) . ' s';
Result with 1 script running:
result: 63245986 - took 19.871 s
Result with 2 scripts running at the same time in two different browser windows:
result: 63245986 - took 20.753 s
result: 63245986 - took 20.847 s
Result with 3 scripts running at the same time in three different browser windows:
result: 63245986 - took 26.172 s
result: 63245986 - took 28.302 s
result: 63245986 - took 28.422 s
[screenshot: CPU usage while running 2 instances of the script]
[screenshot: CPU usage while running 3 instances of the script]
So, it's parallel!
Although you can't easily use multithreading inside a PHP script (it is possible), Apache takes advantage of your server having multiple cores to dispatch the load.
So if your 1-second script is run by 100 users at the same time, and you have 100 CPU cores, the 100th user will hardly notice anything. If you have 8 CPU cores (which is more common), the 100th user will theoretically have to wait something like 100 / 8 = 12.5 seconds for his instance of the script to begin. In practice, as the "benchmark" above shows, each thread's performance diminishes when other threads are running at the same time on other cores, so it could be a lot more. But not 100 seconds more.

Running multiple PHP scripts at the same time (database loop issue)

I am running 10 PHP scripts at the same time, processing in the background on Linux.
For Example:
$i = 1;
while ($i <= 10) {
    exec("/usr/bin/php-cli run-process.php > /dev/null 2>&1 & echo $!");
    sleep(10);
    $i++;
}
In run-process.php I am having a problem with the database loop: one of the processes might already have updated the status field to 1, but the other PHP processes do not seem to see it. For example:
$SQL = "SELECT * FROM data WHERE status = 0";
$query = $db->prepare($SQL);
$query->execute();
while ($row = $query->fetch(PDO::FETCH_ASSOC)) {
$SQL2 = "SELECT status from data WHERE number = " . $row['number'];
$qCheckAgain = $db->prepare($SQL2);
$qCheckAgain->execute();
$tempRow = $qCheckAgain->fetch(PDO::FETCH_ASSOC);
//already updated from other processs?
if ($tempRow['status'] == 1) {
continue;
}
doCheck($row)
sleep(2)
}
How do I ensure the processes do not redo the same data?
When you have multiple processes, you need each process to take "ownership" of a certain set of records. Usually you do this with an UPDATE that has a LIMIT clause, then SELECT the records the script just "claimed".
For example, have a field that specifies whether the record is available for processing (say, a value of 0 means it is available). Your UPDATE then sets that field to the script's process ID, or some other number unique to the process, and you SELECT on that process ID. When you're done processing, set the field to a "finished" value, like 1. Update, select, update, repeat; see the sketch below.
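A sketch of that pattern against the question's PDO connection (status values are assumptions: 0 = available, 1 = finished; note a real process ID could collide with the finished value, so a production version would use a separate owner column):
$pid = getmypid();
// Claim a batch: only rows still marked available (0) can be taken.
$db->exec("UPDATE data SET status = $pid WHERE status = 0 LIMIT 10");
// Work only on the rows this process just claimed.
$query = $db->prepare("SELECT * FROM data WHERE status = ?");
$query->execute(array($pid));
while ($row = $query->fetch(PDO::FETCH_ASSOC)) {
    doCheck($row);
    $db->exec("UPDATE data SET status = 1 WHERE number = " . (int)$row['number']);
}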
The reason your script executes the same query multiple times is the parallelisation you are creating: process 1 reads from the database, process 2 reads from the database, and both start to process their data.
Databases provide transactions in order to get rid of such race conditions. Have a look at what PDO provides for handling database transactions; a minimal sketch follows.
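For example, wrapping the claim in a transaction with a row lock might look like this (a sketch; it assumes PDO::ERRMODE_EXCEPTION is set and the data table is InnoDB, and doCheck() comes from the question):
try {
    $db->beginTransaction();
    // FOR UPDATE blocks other transactions on this row until we commit.
    $check = $db->query("SELECT status FROM data WHERE number = " . (int)$row['number'] . " FOR UPDATE");
    $tempRow = $check->fetch(PDO::FETCH_ASSOC);
    if ($tempRow && $tempRow['status'] == 0) {
        $db->exec("UPDATE data SET status = 1 WHERE number = " . (int)$row['number']);
        doCheck($row);
    }
    $db->commit();
} catch (Exception $e) {
    $db->rollBack(); // release the lock on failure
}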
I am not entirely sure how or what you are processing, but you could introduce a LIMIT clause and pass it as a parameter, so the first process does the first 10 records, the second does the next 10, and so on; see the sketch below.
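Something like this, for instance (a sketch; passing the chunk number on the command line is an assumption, as is the chunk size of 10):
// Each process is started with its chunk number, e.g.:
//   php run-process.php 0   (rows 1-10)
//   php run-process.php 1   (rows 11-20)
$chunk = isset($argv[1]) ? (int)$argv[1] : 0;
$offset = $chunk * 10;
$query = $db->prepare("SELECT * FROM data WHERE status = 0 LIMIT $offset, 10");
$query->execute();
while ($row = $query->fetch(PDO::FETCH_ASSOC)) {
    doCheck($row);
}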
You need a lock such as SELECT ... FOR UPDATE; InnoDB supports row-level locks.
See http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html for details.
