How to handle deadlock in Doctrine? - php

I have a mobile application and a Symfony-based server that provides the API for the mobile app.
Users can like a Post. When a user likes a Post, I add an entry in a ManyToMany table recording that this particular user liked this particular Post (step 1). Then, in the Post table, I increase likesCounter (step 2). Then, in the User table, I increase the user's gamification points (because he liked the Post) (step 3).
So there are situations where many users like a particular Post at the same time and a deadlock occurs (on the Post table or on the User table).
How should I handle this? In the Doctrine docs I can see a solution like this:
<?php
try {
    // process stuff
} catch (\Doctrine\DBAL\Exception\RetryableException $e) {
    // retry the processing
}
but what should I do in the catch part? Retry the whole liking process (steps 1 to 3), for instance 3 times, and return a BadRequest to the mobile application if it still fails? Or something else?
I don't know if this is a good example, because maybe I could rebuild the process so the deadlock doesn't happen, but I would like to know what I should do if deadlocks actually occur.

I disagree with Stefan; deadlocks are normal, as the MySQL documentation says:
Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
See: MySQL documentation
However, the loop suggested by Stefan is the right solution, except that it lacks an important point: after Doctrine has thrown the exception, the EntityManager becomes unusable and you must create a new one in the catch clause with resetManager() from the ManagerRegistry instance.
When I had exactly the same concern as you, I searched the web but couldn't find any completely satisfactory answer, so I got my hands dirty and came back with an article where you'll find an example implementation of what I said above:
Thread-safe business logic with Doctrine
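For illustration, here is a minimal sketch of that retry-and-reset pattern, assuming a service that receives Doctrine's ManagerRegistry and a callable that performs steps 1 to 3 (the class and method names are placeholders, not code from the linked article; the ManagerRegistry namespace depends on your doctrine/persistence version):

<?php

use Doctrine\DBAL\Exception\RetryableException;
use Doctrine\Persistence\ManagerRegistry;

class LikeHandler
{
    private $doctrine;

    public function __construct(ManagerRegistry $doctrine)
    {
        $this->doctrine = $doctrine;
    }

    public function handleLike(callable $doLike, int $maxRetries = 3): void
    {
        for ($attempt = 1; $attempt <= $maxRetries; $attempt++) {
            $em = $this->doctrine->getManager();

            try {
                $doLike($em); // steps 1 to 3: persist the like, bump both counters
                $em->flush();

                return;
            } catch (RetryableException $e) {
                // the EntityManager is closed after this exception,
                // so replace it before the next attempt
                $this->doctrine->resetManager();

                if ($attempt === $maxRetries) {
                    throw $e; // let the controller turn this into a BadRequest
                }
            }
        }
    }
}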

What I'd do is post all likes to a queue and consume them with a batch consumer, so that you can group the updates on a single post.
If you insist on keeping your current implementation, you could go down the road you yourself suggested, like this:
<?php
for ($i = 0; $i < $retryCount; $i++) {
    try {
        // try updating
        break;
    } catch (\Doctrine\DBAL\Exception\RetryableException $e) {
        // you could also add a delay here
        continue;
    }
}

if ($i === $retryCount) {
    // throw BadRequest
}
This is an ugly solution and I wouldn't suggest it. Deadlocks shouldn't be "avoided" by retrying or using delays. Also have a look at named locks and use the same retry system, but don't wait for the deadlock to happen.
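For the named-lock idea, a rough sketch using MySQL's GET_LOCK through the DBAL connection might look like this (the lock name and timeout are arbitrary; fetchOne() assumes a recent DBAL version, older ones use fetchColumn()):

// serialize "like" processing per post with a MySQL named lock
$lockName = 'post_like_' . $postId;

// wait up to 2 seconds; GET_LOCK returns 1 on success, 0 on timeout
$acquired = (int) $connection->fetchOne('SELECT GET_LOCK(?, 2)', [$lockName]);

if ($acquired === 1) {
    try {
        // steps 1 to 3: insert the like, bump the post counter, add the points
    } finally {
        $connection->fetchOne('SELECT RELEASE_LOCK(?)', [$lockName]);
    }
} else {
    // could not get the lock in time: retry or report the failure
}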

The problem is that after the Symfony Entity Manager fails, it closes the DB connection and you can't continue working with the DB even if you catch the ORMException.
The first good solution is to process your 'likes' asynchronously, with RabbitMQ or another queue implementation.
Step-by-step:
Create a message like {type: 'like', user: 123, post: 456}
Publish it to the queue
Consume it and update the 'likes' count.
You can have several consumers that try to obtain a lock based on the postId. If two consumers try to update the same post, one of them will fail to obtain the lock. But that's OK; you can consume the failed message later.
The second solution is to have a special table, e.g. post_likes (userId, postId, timestamp). Your endpoint can create new rows in this table synchronously, and you can count the 'likes' on a post from this table. Or you can write a cron script which updates the post's likes count from this table.
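A rough sketch of what that endpoint logic could look like with the DBAL connection (table and column names are placeholders; executeStatement()/fetchOne() assume a recent DBAL version, and INSERT IGNORE is MySQL-specific):

// record the like synchronously; the (user_id, post_id) primary key also
// prevents duplicate likes, so IGNORE makes the call idempotent
$connection->executeStatement(
    'INSERT IGNORE INTO post_likes (user_id, post_id, created_at) VALUES (?, ?, NOW())',
    [$userId, $postId]
);

// count on read (or in a cron job) instead of updating a hot counter row
$likes = (int) $connection->fetchOne(
    'SELECT COUNT(*) FROM post_likes WHERE post_id = ?',
    [$postId]
);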

I've made a special class to retry on deadlocks (I'm on Symfony 4.4).
Here it is:
use Doctrine\DBAL\Exception\RetryableException;
use Doctrine\ORM\EntityManager;
use Doctrine\ORM\EntityManagerInterface;

class AntiDeadlockService
{
    /**
     * @var EntityManagerInterface
     */
    private $em;

    public function __construct(EntityManagerInterface $em)
    {
        $this->em = $em;
    }

    public function safePush(): void
    {
        // to retry on deadlocks or other retryable exceptions
        $connection = $this->em->getConnection();
        $retry = 0;
        $maxRetries = 3;

        while ($retry < $maxRetries) {
            try {
                if (!$this->em->isOpen()) {
                    // the previous attempt closed the manager; recreate it on the same connection
                    $this->em = EntityManager::create(
                        $connection,
                        $this->em->getConfiguration()
                    );
                }

                $connection->beginTransaction(); // suspend auto-commit
                $this->em->flush();
                $connection->commit();

                break;
            } catch (RetryableException $exception) {
                $connection->rollBack();
                $retry++;

                if ($retry === $maxRetries) {
                    throw $exception;
                }
            }
        }
    }
}
Use this safePush() method instead of the $entityManager->flush() one ;)
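Usage would look roughly like this, assuming the service is injected wherever you currently flush (the entity methods are placeholders for steps 1 to 3):

// hypothetical controller/handler code
$post->addLike($user);                  // step 1: ManyToMany entry
$post->increaseLikesCounter();          // step 2
$user->increaseGamificationPoints();    // step 3

$this->antiDeadlockService->safePush(); // instead of $entityManager->flush()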

Related

Symfony Doctrine: When flushing a large amount of entities, how to handle a situation with possible database changes in the background?

Not having a lot of experience with Doctrine, I would like to know how best to solve the following problem.
This is code from within an e-mail reminder function:
foreach ($this->entityRepository->getEntityExpirationReminderCandidates($reminderDay) as $entity) {
    if ($entity instanceof Entity) {
        try {
            $sent = $this->entityStatusUpdateMailer->sendUpdateFetchUser($entity);
            if (!$sent) {
                throw new MailException("Due to misconfiguration an email could not be sent.");
            }
        } catch (Throwable $exception) {
            // unable to send status update email
            $sent = false;
        }
        if ($sent) {
            $setReminderSent($entity);
            $this->entityRepository->persist($entity);
        }
    }
}
$this->entityRepository->flushEntities();
$this->entityRepository->clearEntities();
Now let's assume that the call to getEntityExpirationReminderCandidates can return up to 5000 entities, which means the loop may run for several seconds. If any of these entities is changed or deleted in the database in the background, will Doctrine still be able to perform the rest of the transaction? Flushing each entity on its own inside the loop does not feel right for performance reasons.
I want to avoid the whole transaction failing just because a certain entity is deleted in the background while the loop runs. I am curious what could happen and how Doctrine will handle this.
NOTE for better understanding: as far as I know, ->flushEntities() runs a COMMIT; transaction, which basically means all or nothing.
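For the performance side, a common middle ground is to flush and clear in batches rather than per entity or all at once; a minimal sketch using the repository methods from the question (the batch size is arbitrary, and it assumes the candidates are streamed lazily, e.g. via an iterator, since clearing detaches entities that are already loaded):

// flush and clear every $batchSize entities so one failing batch
// doesn't roll back all 5000 reminders
$batchSize = 100;
$i = 0;

foreach ($this->entityRepository->getEntityExpirationReminderCandidates($reminderDay) as $entity) {
    // ... send the mail and mark the reminder as in the original loop ...
    $this->entityRepository->persist($entity);

    if ((++$i % $batchSize) === 0) {
        $this->entityRepository->flushEntities();
        $this->entityRepository->clearEntities();
    }
}

// flush whatever remains in the last partial batch
$this->entityRepository->flushEntities();
$this->entityRepository->clearEntities();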

LARAVEL: Is re-queueing a job a bad idea?

I would like to know whether re-queueing a Laravel job is a bad idea or not. I had a scenario where I need to pull a user's posts from Facebook once they have connected their Facebook account to my application. I want to pull {x} days of historic data. The Facebook API, like any other API, limits requests per minute. I keep track of the request headers, and once the rate limit is reached I save that information in the database; on each re-queue I check whether I am eligible to make a call to the Facebook API.
Here is a code snippet for a better visualization:
<?php
namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldQueue;

class FacebookData implements ShouldQueue
{
    /**
     * The number of seconds the job can run before timing out.
     *
     * @var int
     */
    public $timeout = 120;

    public $userid;

    public function __construct($id)
    {
        $this->userid = $id;
    }

    public function handle()
    {
        // $fbhelper construction and the usual queue traits are omitted in this rough sketch
        if ($fbhelper->canPullData()) {
            $res = $fbhelper->getData($this->userid);

            if ($res['code'] == 429) {
                $fbhelper->storeRetryAfter($res);
                self::dispatch($this->userid);
            }
        }
    }
}
The above snippet is a rough idea. Is this a good idea? The reason I post this question is that self::dispatch($this->userid); looks like recursion, and it will keep retrying until $fbhelper->canPullData() returns true, which will probably take 6 minutes. I am worried about any impact this could have on my application. Thanks in advance.
Retrying a job is not a bad idea; it is built into the job design already. Laravel has retries precisely so that jobs can perform unreliable operations.
As an example, in a project I have been working on, an external API we use returns 1-5 HTTP 500 errors per 100 requests we send. This is handled by Laravel's built-in retry functionality.
As of Laravel 5.4 you can set it on the class like so. This will do exactly what you want without you defining the logic. Finally, to control when a failed attempt should be retried, you can define a retryAfter() method.
class FacebookData
{
    public $tries = 5;

    public function retryAfter()
    {
        // wait 6 minutes
        return 360;
    }
}
If you want to keep your logic where you only retry 429 errors, I would use the inverse of that to delete the job if it is anything other than a 429:

if ($res['code'] !== 429) {
    $this->delete();
}

Handling MySQL transactions from PHP side: strategy and best practices

Let's say I have the following dummy code that copy-pastes company (client) information with all related objects:
class Company extends BaseModel
{
    public function companyCopyPaste($existingCompanyId)
    {
        $this->db->transaction->start();

        try {
            $newCompanyId = $this->createNewCompany($existingCompanyId);
            $this->copyPasteClientObjects($existingCompanyId, $newCompanyId);
            $this->db->transaction->commit();
        } catch (Exception $e) {
            $this->db->transaction->rollback();
        }
    }
    ...
}
The copyPasteClientObjects method contains a lot of logic inside, like selecting and updating existing data, aggregating it and saving it.
The whole process may take up to 10 seconds to complete (due to the amount of information to process).
The easiest way is to start a transaction at the beginning of the method and commit it when it's done, but I guess this is not the right way to do it. Still, I want to keep everything consistent and avoid deadlocks as well, so if one of the steps fails, I want the previous steps to be rolled back.
Any good advice on how to handle such situations properly?
This is not an answer, just an opinion.
If I understand you correctly, you want to implement a create-new-from-existing kind of operation.
Nothing really dangerous happens yet while you are only creating new records.
I would suggest transforming the code this way:

try {
    $newCompanyId = $this->createNewCompany($existingCompanyId);
    $this->copyPasteClientObjects($existingCompanyId, $newCompanyId);
} catch (Exception $e) {
    $this->deleteNewCompany($newCompanyId);
}
This way you don't need any transaction, but your deleteNewCompany should revert everything that was created but not finished. Yes, it is more work to build that functionality, but to me it makes more sense than blocking the DB for 10 seconds.
And } catch (Exception $e) { is, imho, not best practice; you should define a custom, case-specific exception type, like CopyPasteException or whatever.
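A minimal sketch of what that could look like (the exception class and where it gets thrown are illustrative, not prescribed):

class CopyPasteException extends \RuntimeException
{
}

// inside companyCopyPaste(): only clean up once the new company actually exists
$newCompanyId = $this->createNewCompany($existingCompanyId);

try {
    $this->copyPasteClientObjects($existingCompanyId, $newCompanyId);
} catch (CopyPasteException $e) {
    $this->deleteNewCompany($newCompanyId);
    throw $e; // or log it and report the failure to the caller
}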

PHP concurrency issue, multiple simultaneous requests; mutexes?

So I've just realised that PHP is potentially running multiple requests simultaneously. The logs from last night seem to show that two requests came in, were processed in parallel; each triggered an import of data from another server; each attempted to insert a record into the database. One request failed when it tried to insert a record that the other thread had just inserted (the imported data comes with PKs; I'm not using incrementing IDs): SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '865020' for key 'PRIMARY' ....
Have I diagnosed this issue correctly?
How should I address this?
The following is some of the code. I've stripped out much of it (the logging, the creation of other entities beyond the Patient from the data), but the following should include the relevant snippets. Requests hit the import() method, which calls importOne() for each record to import, essentially. Note the save method in importOne(); that's an Eloquent method (using Laravel and Eloquent) that will generate the SQL to insert/update the record as appropriate.
public function import()
{
    // Get data from the other server in the time range from last import to current import
    $now = Carbon::now();
    $calls = $this->getCalls($this->getLastImport(), $now);

    // For each call to import, insert it into the DB (or update if it already exists)
    foreach ($calls as $call) {
        $this->importOne($call);
    }

    // Update the last import time to now so that the next import uses the correct range
    $this->setLastImport($now);
}

private function importOne($call)
{
    // Get the existing patient for the call, or create a new one
    $patient = Patient::where('id', '=', $call['PatientID'])->first();
    $isNewPatient = $patient === null;
    if ($isNewPatient) {
        $patient = new Patient(array('id' => $call['PatientID']));
    }

    // Set the fields
    $patient->given_name = $call['PatientGivenName'];
    $patient->family_name = $call['PatientFamilyName'];

    // Save; will insert/update appropriately
    $patient->save();
}
I'd guess that the solution would require a mutex around the entire import block? And if a request couldn't attain a mutex, it'd simply move on with the rest of the request. Thoughts?
EDIT: Just to note, this isn't a critical failure. The exception is caught and logged, and then the request is responded to as per usual. And the import succeeds on the other request, and then that request is responded to as per usual. The users are none-the-wiser; they don't even know about the import, and that isn't the main focus of the request coming in. So really, I could just leave this running as is, and aside from the occasional exception, nothing bad happens. But if there is a fix to prevent additional work being done/multiple requests being sent of to this other server unnecessarily, that could be worth pursuing.
EDIT2: Okay, I've taken a swing at implementing a locking mechanism with flock(). Thoughts? Would the following work? And how would I unit test this addition?
public function import()
{
    try {
        $fp = fopen('/tmp/lock.txt', 'w+');
        if (flock($fp, LOCK_EX)) {
            $now = Carbon::now();
            $calls = $this->getCalls($this->getLastImport(), $now);
            foreach ($calls as $call) {
                $this->importOne($call);
            }
            $this->setLastImport($now);
            flock($fp, LOCK_UN);
            // Log success.
        } else {
            // Could not acquire file lock. Log this.
        }
        fclose($fp);
    } catch (Exception $ex) {
        // Log failure.
    }
}
EDIT3: Thoughts on the following alternate implementation of the lock:
public function import()
{
    try {
        if ($this->lock()) {
            $now = Carbon::now();
            $calls = $this->getCalls($this->getLastImport(), $now);
            foreach ($calls as $call) {
                $this->importOne($call);
            }
            $this->setLastImport($now);
            $this->unlock();
            // Log success
        } else {
            // Could not acquire DB lock. Log this.
        }
    } catch (Exception $ex) {
        // Log failure
    }
}

/**
 * Get a DB lock, returns true if successful.
 *
 * @return boolean
 */
public function lock()
{
    return DB::select("SELECT GET_LOCK('lock_name', 1) AS result")[0]->result === 1;
}

/**
 * Release a DB lock, returns true if successful.
 *
 * @return boolean
 */
public function unlock()
{
    return DB::select("SELECT RELEASE_LOCK('lock_name') AS result")[0]->result === 1;
}
Your example code would block the second request until the first is finished. You would need to use the LOCK_NB option for flock() to return false immediately instead of waiting.
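For reference, the non-blocking variant of the EDIT2 code would look roughly like this:

$fp = fopen('/tmp/lock.txt', 'w+');

// LOCK_NB makes flock() return false immediately if another request holds the lock
if (flock($fp, LOCK_EX | LOCK_NB)) {
    try {
        // run the import
    } finally {
        flock($fp, LOCK_UN);
    }
} else {
    // another request is already importing: skip this run and log it
}

fclose($fp);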
Yes, you can use either locking or semaphores, either at the filesystem level or directly in the database.
In your case, where each import file must be processed only once, the best solution would be a SQL table with a row for each import file. At the beginning of the import you insert a row saying the import is in progress, so other threads know not to process it again. After the import is finished, you mark it as such. (Then, a few hours later, you can check the table to see whether the import really finished.)
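A minimal sketch of that bookkeeping table approach, assuming an imports table with a unique key on an import identifier (all names and the identifier value are hypothetical; INSERT IGNORE is MySQL-specific):

// try to claim the import; the unique key means only one request wins
$claimed = $db->exec(
    "INSERT IGNORE INTO imports (import_key, status, started_at)
     VALUES ('calls-batch-1', 'in_progress', NOW())"
);

if ($claimed === 1) {
    // this request owns the import: run it, then mark it finished
    // ... import ...
    $db->exec("UPDATE imports SET status = 'done' WHERE import_key = 'calls-batch-1'");
} else {
    // another request is already importing (or has imported) this batch
}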
It is also better to run one-off, long-lasting things like imports in separate scripts rather than while serving normal web pages to visitors. For example, you can schedule a nightly cron job which picks up the import file and processes it.
It doesn't seem like you have a race condition, because the ID comes from the import file, and if your import algorithm were working correctly each thread would have its own shard of the work to do and would never conflict with the others. Right now it seems that two threads receive a request to create the same patient and conflict with each other because of a bad algorithm.
Make sure that each spawned thread gets a new row from the import file, and repeat only on failure.
If you can't do that and want to stick with a mutex, a file lock doesn't seem like a very nice solution, since you would be solving within the application a conflict that actually occurs in your database. A DB lock should be a lot faster too, and is overall a more decent solution.
Request a database lock, like this:

$db->exec('LOCK TABLES table1 WRITE, table2 WRITE');

You can then expect a SQL error when you write to a locked table, so surround your $patient->save() with a try/catch.
An even better solution would be a conditional atomic query: a DB query that has the condition built into it. You could use a query like this:
INSERT INTO targetTable(field1)
SELECT field1
FROM myTable
WHERE NOT(field1 IN (SELECT field1 FROM targetTable))
I see three options:
- use a mutex/semaphore/some other flag - not easy to code and maintain
- use the DB's built-in transaction mechanism
- use a queue (like RabbitMQ or 0MQ) to write messages to the DB one at a time

Gearman & PHP: Proper Way For a Worker to Send Back a Failure

The PHP docs are a bit fuzzy on this one, so I'm asking it here. Given this worker code:
<?php
$gmworker = new GearmanWorker();
$gmworker->addServer();
$gmworker->addFunction("doSomething", "doSomethingFunc");

while ($gmworker->work());

function doSomethingFunc($job)
{
    try {
        $value = doSomethingElse($job->workload());
    } catch (Exception $e) {
        // Need to notify the client of the error
    }

    return $value;
}
What's the proper way to notify the client of any error that took place? Return false? Use GearmanJob::sendFail()? If it's the latter, do I need to return from my doSomethingFunc() after calling sendFail()? Should the return value be whatever sendFail() returns?
The client is using GearmanClient::returnCode() to check for failures. Additionally, simply using "return $value" seems to work, but should I be using GearmanJob::sendData() or GearmanJob::sendComplete() instead?
This may not be the best way to do it, but it is the method I have used in the past and it has worked well for me.
I use sendException() followed by sendFail() in the worker to signal a job failure. The exception part is optional, but I use it so the client can error out and know roughly why the job failed. After the sendFail() I return nothing else.
As an example, this is the method that the worker registers as the callback for doing work:
public function doJob(GearmanJob $job)
{
    $this->_gearmanJob = $job;

    try {
        // This method does the actual work
        $this->_doJob($job->functionName());
    } catch (Exception $e) {
        $job->sendException($e->getMessage());
        $job->sendFail();
    }
}
After sendFail(), do not return anything else, otherwise you may get strange results, such as the job server thinking the job ended OK.
As for returning data, I use sendData() if I am returning data in chunks (such as streaming transcoded video, or any 'big' data where I don't want to move around one large blob) at various intervals during the job, with a sendComplete() at the end. Otherwise, if I only want to return my data in one go at the end of the job, I just use sendComplete().
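On the client side, checking for that failure might look roughly like this (a sketch against the pecl gearman extension; doNormal() assumes a reasonably recent extension version, older ones use do(), and $workload is whatever payload you submit):

$client = new GearmanClient();
$client->addServer();

$result = $client->doNormal("doSomething", $workload);

if ($client->returnCode() === GEARMAN_WORK_FAIL) {
    // the worker called sendFail(); handle or log the error here
} elseif ($client->returnCode() === GEARMAN_SUCCESS) {
    // $result holds whatever the worker passed to sendComplete()
}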
Hope this helps.
