So I've just realised that PHP may be running multiple requests simultaneously. Last night's logs seem to show that two requests came in and were processed in parallel; each triggered an import of data from another server, and each attempted to insert a record into the database. One request failed when it tried to insert a record that the other had just inserted (the imported data comes with primary keys; I'm not using auto-incrementing IDs): SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '865020' for key 'PRIMARY' ....
Have I diagnosed this issue correctly?
How should I address this?
The following is some of the code. I've stripped out much of it (the logging, the creation of other entities beyond the Patient from the data), but the following should include the relevant snippets. Requests hit the import() method, which calls importOne() for each record to import, essentially. Note the save method in importOne(); that's an Eloquent method (using Laravel and Eloquent) that will generate the SQL to insert/update the record as appropriate.
public function import()
{
    $now = Carbon::now();

    // Get data from the other server in the time range from last import to current import
    $calls = $this->getCalls($this->getLastImport(), $now);

    // For each call to import, insert it into the DB (or update if it already exists)
    foreach ($calls as $call) {
        $this->importOne($call);
    }

    // Update the last import time to now so that the next import uses the correct range
    $this->setLastImport($now);
}

private function importOne($call)
{
    // Get the existing patient for the call, or create a new one
    $patient = Patient::where('id', '=', $call['PatientID'])->first();
    $isNewPatient = $patient === null;
    if ($isNewPatient) {
        $patient = new Patient(array('id' => $call['PatientID']));
    }

    // Set the fields
    $patient->given_name = $call['PatientGivenName'];
    $patient->family_name = $call['PatientFamilyName'];

    // Save; will insert/update appropriately
    $patient->save();
}
I'd guess that the solution would require a mutex around the entire import block? And if a request couldn't obtain the mutex, it'd simply move on with the rest of the request. Thoughts?
EDIT: Just to note, this isn't a critical failure. The exception is caught and logged, and then the request is responded to as per usual. And the import succeeds on the other request, and then that request is responded to as per usual. The users are none the wiser; they don't even know about the import, and it isn't the main focus of the incoming request. So really, I could just leave this running as is, and aside from the occasional exception, nothing bad happens. But if there is a fix to prevent additional work being done/multiple requests being sent off to this other server unnecessarily, that could be worth pursuing.
EDIT2: Okay, I've taken a swing at implementing a locking mechanism with flock(). Thoughts? Would the following work? And how would I unit test this addition?
public function import()
{
    try {
        $fp = fopen('/tmp/lock.txt', 'w+');
        if (flock($fp, LOCK_EX)) {
            $now = Carbon::now();
            $calls = $this->getCalls($this->getLastImport(), $now);

            foreach ($calls as $call) {
                $this->importOne($call);
            }

            $this->setLastImport($now);
            flock($fp, LOCK_UN);
            // Log success.
        } else {
            // Could not acquire file lock. Log this.
        }
        fclose($fp);
    } catch (Exception $ex) {
        // Log failure.
    }
}
EDIT3: Thoughts on the following alternate implementation of the lock:
public function import()
{
    try {
        if ($this->lock()) {
            $now = Carbon::now();
            $calls = $this->getCalls($this->getLastImport(), $now);

            foreach ($calls as $call) {
                $this->importOne($call);
            }

            $this->setLastImport($now);
            $this->unlock();
            // Log success
        } else {
            // Could not acquire DB lock. Log this.
        }
    } catch (Exception $ex) {
        // Log failure
    }
}
/**
 * Get a DB lock, returns true if successful.
 *
 * @return boolean
 */
public function lock()
{
    return DB::select("SELECT GET_LOCK('lock_name', 1) AS result")[0]->result === 1;
}

/**
 * Release a DB lock, returns true if successful.
 *
 * @return boolean
 */
public function unlock()
{
    return DB::select("SELECT RELEASE_LOCK('lock_name') AS result")[0]->result === 1;
}
Your example code would block the second request until the first is finished. You would need to use the LOCK_NB option for flock() to make it return immediately instead of waiting.
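For illustration, a non-blocking variant of the same flock() call might look like this (a sketch based on the code above, with minimal error handling):

$fp = fopen('/tmp/lock.txt', 'w+');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    // Another request holds the lock (or the file couldn't be opened);
    // skip the import and carry on with the rest of the request.
    return;
}

try {
    // ... run the import as before ...
} finally {
    flock($fp, LOCK_UN);
    fclose($fp);
}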
Yes, you can use either locks or semaphores, either at the filesystem level or directly in the database.
In your case, where you need each import to be processed only once, the best solution would be to have an SQL table with a row for each import file. At the beginning of the import, you insert the info that an import is in progress, so other threads know not to process it again. After the import is finished, you mark it as such. (Then a few hours later you can check the table to see whether the import really finished.)
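A rough sketch of that idea, using Laravel's DB facade since the question uses Laravel, and assuming an imports table with a unique key on an import identifier (the table, column names, and $importId are placeholders I've made up):

// Try to claim this import; the unique key makes the insert fail
// if another request has already claimed it.
try {
    DB::insert(
        'INSERT INTO imports (source, status, started_at) VALUES (?, ?, NOW())',
        [$importId, 'in_progress']
    );
} catch (\Illuminate\Database\QueryException $e) {
    return; // someone else is (or was) importing this file
}

// ... run the import ...

// Mark the import as finished so it can be audited later.
DB::update(
    'UPDATE imports SET status = ?, finished_at = NOW() WHERE source = ?',
    ['finished', $importId]
);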
Also, it is better to run one-off, long-running things like imports in separate scripts and not while serving normal web pages to visitors. For example, you can schedule a nightly cron job which picks up the import file and processes it.
It doesn't seem like you have a race condition, because the ID is coming from the import file, and if your import algorithm were working correctly then each thread would have its own shard of the work to be done and should never conflict with others. Right now it seems like two threads receive a request to create the same patient and conflict with each other because the work isn't partitioned correctly.
Make sure that each spawned thread gets a new row from the import file, and repeat only on failure.
If you can't do that, and want to stick with a mutex, a file lock doesn't seem like a very nice solution, since you would be resolving the conflict within the application while it is actually occurring in your database. A DB lock should be a lot faster too, and overall a more decent solution.
Request a database lock, like this:
$db->exec('LOCK TABLES table1 WRITE, table2 WRITE');
You can expect a SQL error when you write to a locked table, so surround your Patient->save() with a try/catch.
An even better solution would be to use a conditional atomic query: a DB query that also has the condition within it. You could use a query like this:
INSERT INTO targetTable(field1)
SELECT field1
FROM myTable
WHERE NOT(field1 IN (SELECT field1 FROM targetTable))
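Applied to the Patient insert from the question, the same idea could also be expressed with MySQL's INSERT ... ON DUPLICATE KEY UPDATE. A sketch, assuming a patients table matching the Eloquent model above (this bypasses Eloquent and goes straight to the query builder):

// Upsert the patient in one atomic statement instead of select-then-save.
DB::statement(
    'INSERT INTO patients (id, given_name, family_name)
     VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE given_name = VALUES(given_name), family_name = VALUES(family_name)',
    [$call['PatientID'], $call['PatientGivenName'], $call['PatientFamilyName']]
);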
I see three options:
- use mutex/semaphore/some other flag - not easy to code and maintain
- use DB built-in transaction mechanism
- use queue (like RabbitMQ or 0MQ) to write messages into DB in a row
I want to prevent a user from making the same request twice by using the Symfony Lock component, because right now users can click a link two times (by accident?) and duplicate entities are created. I want to use the Unique Entity Constraint, which does not protect against race conditions by itself.
The Symfony Lock component does not seem to work as expected. When I create a lock at the beginning of a page and open the page twice at the same time, the lock can be acquired by both requests. When I open the test page in a standard and an incognito browser window, the second request doesn't acquire the lock, but I can't find anything in the docs about this being linked to a session. I have created a small test file in a fresh project to isolate the problem. This is using PHP 7.4, Symfony 5.3, and the Lock component.
<?php

namespace App\Controller;

use Sensio\Bundle\FrameworkExtraBundle\Configuration\Template;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Routing\Annotation\Route;

class LockTest extends AbstractController
{
    /**
     * @Route("/test")
     * @Template("lock/test.html.twig")
     */
    public function test(LockFactory $factory): array
    {
        $lock = $factory->createLock("test");
        $acquired = $lock->acquire();
        dump($lock, $acquired);

        sleep(2);
        dump($lock->isAcquired());

        return ["message" => "testing"];
    }
}
I slightly rewrote your controller like this (with symfony 5.4 and php 8.1):
class LockTestController extends AbstractController
{
    #[Route("/test")]
    public function test(LockFactory $factory): JsonResponse
    {
        $lock = $factory->createLock("test");

        $t0 = microtime(true);
        $acquired = $lock->acquire(true);
        $acquireTime = microtime(true) - $t0;

        sleep(2);

        return new JsonResponse(["acquired" => $acquired, "acquireTime" => $acquireTime]);
    }
}
It waits for the lock to be released and it counts the time the controller waits for the lock to be acquired.
I ran two requests with curl against a caddy server.
curl -k 'https://localhost/test' & curl -k 'https://localhost/test'
The output confirms one request was delayed while the first one slept with the acquired lock.
{"acquired":true,"acquireTime":0.0006971359252929688}
{"acquired":true,"acquireTime":2.087146043777466}
So, the lock works to guard against concurrent requests.
If the lock is not blocking:
$acquired = $lock->acquire(false);
The output is:
{"acquired":true,"acquireTime":0.0007710456848144531}
{"acquired":false,"acquireTime":0.00048804283142089844}
Notice how the second lock is not acquired. You should use this flag to reject the user's request with an error instead of creating the duplicate entity.
If the two requests are sufficiently spaced apart for each to get the lock in turn, you can check that the entity already exists (because it had time to be fully committed to the DB) and return an error.
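Putting both together, a controller along these lines could work (a sketch only; the route, ThingRepository, and the requestKey field are placeholders I made up, not the poster's actual code):

#[Route("/things/{key}")]
public function create(string $key, LockFactory $factory, ThingRepository $things): Response
{
    $lock = $factory->createLock('create-thing-' . $key);

    // Non-blocking: a concurrent identical request is rejected immediately.
    if (!$lock->acquire(false)) {
        return new Response('Request already in progress', Response::HTTP_CONFLICT);
    }

    try {
        // Covers the case where the first request already finished and released the lock.
        if ($things->findOneBy(['requestKey' => $key]) !== null) {
            return new Response('Already created', Response::HTTP_CONFLICT);
        }

        // ... create and persist the entity ...

        return new Response('Created', Response::HTTP_CREATED);
    } finally {
        $lock->release();
    }
}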
Despite those encouraging results, the doc mentions this note:
Unlike other implementations, the Lock Component distinguishes lock instances even when they are created for the same resource. It means that for a given scope and resource one lock instance can be acquired multiple times. If a lock has to be used by several services, they should share the same Lock instance returned by the LockFactory::createLock method.
I understand that two locks acquired by two distinct factories should not block each other. Unless the note is outdated or badly phrased, it seems possible to have non-working locks under some circumstances. But not with the above test code.
StreamedResponse
A lock is released when it goes out of scope.
As a special case, when a StreamedResponse is returned, the lock goes out of scope when the response is returned by the controller. But the StreamedResponse has yet to return anything!
To keep the lock while the response is generated, it must be passed to the function executed by the StreamedResponse:
public function export(LockFactory $factory): Response
{
    // create a lock with a TTL of 60s
    $lock = $factory->createLock("test", 60);
    if (!$lock->acquire(false)) {
        return new Response("Too many downloads", Response::HTTP_TOO_MANY_REQUESTS);
    }

    $response = new StreamedResponse(function () use ($lock) {
        // now $lock is still alive when this function is executed
        $lockTime = time();
        while (have_some_data_to_output()) {
            if (time() - $lockTime > 50) {
                // refresh the lock well before it expires to be on the safe side
                $lock->refresh();
                $lockTime = time();
            }
            output_data();
        }
        $lock->release();
    });

    $response->headers->set('Content-Type', 'text/csv');

    // lock would be released here if it wasn't passed to the StreamedResponse
    return $response;
}
The above code refreshes the lock every 50s to cut down on communication time with the storage engine (such as redis).
The lock remains locked for at most 60s should the php process suddenly die.
I have a mobile application and server based on Symfony which gives API for the mobile app.
I have a situation where users can like a Post. When a user likes a Post, I add an entry in a ManyToMany table recording that this particular user liked this particular Post (step 1). Then in the Post table I increase likesCounter (step 2). Then in the User table I increase the user's gamification points, because he liked the Post (step 3).
So there is a situation where many users like a particular Post at the same time and a deadlock occurs (on the Post table or on the User table).
How should I handle this? In the Doctrine docs I can see a solution like this:
<?php

try {
    // process stuff
} catch (\Doctrine\DBAL\Exception\RetryableException $e) {
    // retry the processing
}
but what should I do in the catch part? Retry the whole liking process (steps 1 to 3), for instance 3 times, and if it still fails return BadRequest to the mobile application? Or something else?
I don't know if this is a good example, because maybe I could rebuild the process so the deadlock won't happen, but I would like to know what I should do if deadlocks actually do happen.
I disagree with Stefan; deadlocks are normal, as the MySQL documentation says:
Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
See: MySQL documentation
However, the loop suggested by Stefan is the right solution. Except that it lacks an important point: after Doctrine has thrown an Exception, the EntityManager becomes unusable and you must create a new one in the catch clause with resetManager() from the ManagerRegistry instance.
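In practice the catch clause ends up looking something like this (a sketch; $doctrine is an injected ManagerRegistry and $em the current EntityManager):

try {
    $em->persist($like);
    $em->flush();
} catch (\Doctrine\DBAL\Exception\RetryableException $e) {
    // The EntityManager is closed at this point; get a fresh one before retrying.
    $doctrine->resetManager();
    $em = $doctrine->getManager();

    // ... re-fetch/re-attach the entities involved and retry steps 1 to 3 ...
}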
When I had exactly the same concern as you, I searched the web but couldn't find any completely satisfactory answer. So I got my hands dirty and came back with an article where you'll find an implementation example of what I said above:
Thread-safe business logic with Doctrine
What I'd do is post all likes on a queue and consume them using a batch consumer so that you can group the updates on a single post.
If you insist on keeping your current implementation, you could go down the road you yourself suggested, like this:
<?php

for ($i = 0; $i < $retryCount; $i++) {
    try {
        // try updating
        break;
    } catch (\Doctrine\DBAL\Exception\RetryableException $e) {
        // you could also add a delay here
        continue;
    }
}

if ($i === $retryCount) {
    // throw BadRequest
}
This is an ugly solution and I wouldn't suggest it. Deadlocks shouldn't be "avoided" by retrying or using delays. Also have a look at named locks and use the same retry system, but don't wait for the deadlock to happen.
The problem is that after the Symfony EntityManager fails, it closes the DB connection and you can't continue working with the DB even if you catch the ORMException.
The first good solution is to process your 'likes' asynchronously, with RabbitMQ or another queue implementation.
Step-by-step:
Create message like {type: 'like', user:123, post: 456}
Publish it in queue
Consume it and update 'likes' count.
You can have several consumers that try to obtain a lock based on postId. If two consumers try to update the same post, one of them will fail to obtain the lock. But that's OK; you can consume the failed message afterwards.
A second solution is to have a special table, e.g. post_likes (userId, postId, timestamp). Your endpoint could create new rows in this table synchronously, and you can count the 'likes' on a post from this table. Or you can write a cron script which updates the post's likes count from this table; a sketch follows below.
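A sketch of that second option, assuming a unique key on (user_id, post_id) and a plain DBAL connection (the exact DBAL methods depend on your version; $userId and $postId are whatever your endpoint receives):

// Endpoint: record the like synchronously; nothing else touches Post or User here,
// so there is no contended counter row to deadlock on.
$conn = $this->em->getConnection();
$conn->executeStatement(
    'INSERT IGNORE INTO post_likes (user_id, post_id, created_at) VALUES (?, ?, NOW())',
    [$userId, $postId]
);

// Counting likes when needed (or from a cron script that syncs Post.likesCounter):
$likes = (int) $conn->fetchOne('SELECT COUNT(*) FROM post_likes WHERE post_id = ?', [$postId]);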
I've made a special class to retry on deadlock (I'm on Symfony 4.4).
Here it is:
class AntiDeadlockService
{
    /**
     * @var EntityManagerInterface
     */
    private $em;

    public function __construct(EntityManagerInterface $em)
    {
        $this->em = $em;
    }

    public function safePush(): void
    {
        // to retry on deadlocks or other retryable exceptions
        $connection = $this->em->getConnection();

        $retry = 0;
        $maxRetries = 3;

        while ($retry < $maxRetries) {
            try {
                if (!$this->em->isOpen()) {
                    $this->em = $this->em->create(
                        $connection = $this->em->getConnection(),
                        $this->em->getConfiguration()
                    );
                }

                $connection->beginTransaction(); // suspend auto-commit
                $this->em->flush();
                $connection->commit();

                break;
            } catch (RetryableException $exception) {
                $connection->rollBack();

                $retry++;
                if ($retry === $maxRetries) {
                    throw $exception;
                }
            }
        }
    }
}
Use this safePush() method instead of the plain $entityManager->flush() one ;)
I have an API written in Laravel. There is the following code in it:
public function getData($cacheKey)
{
    if (Cache::has($cacheKey)) {
        return Cache::get($cacheKey);
    }

    // if cache is empty for the key, get data from external service
    $dataFromService = $this->makeRequest($cacheKey);
    $dataMapped = array_map([$this->transformer, 'transformData'], $dataFromService);

    Cache::put($cacheKey, $dataMapped);

    return $dataMapped;
}
In getData(), if the cache contains the necessary key, the data is returned from the cache.
If the cache does not have the key, the data is fetched from the external API, processed, placed in the cache, and then returned.
The problem is: when there are many concurrent requests to the method, the data gets corrupted. I guess the data is written to the cache incorrectly because of race conditions.
You seem to be experiencing some sort of critical-section problem. But here's the thing: Redis operations are atomic, however Laravel does its own checks before calling Redis.
The major issue here is that all the concurrent requests will each make a request to the external service and then all of them will write the results to the cache (which is definitely not good). I would suggest implementing a simple mutual-exclusion lock in your code.
Replace your current method body with the following:
public function getData($cacheKey)
{
    $mutexKey = "getDataMutex";
    if (!Redis::setnx($mutexKey, true)) {
        // Already running; you can either busy-wait until the cache key is ready,
        // or fail this request and assume that another one will succeed.
        // Definitely don't trust what the cache says at this point.
    }

    $value = Cache::rememberForever($cacheKey, function () use ($cacheKey) { // just the convenience method, it doesn't change anything
        $dataFromService = $this->makeRequest($cacheKey);
        $dataMapped = array_map([$this->transformer, 'transformData'], $dataFromService);

        return $dataMapped;
    });

    Redis::del($mutexKey);

    return $value;
}
setnx is a native Redis command that sets a value only if it doesn't exist already. This is done atomically, so it can be used to implement a simple locking mechanism, but (as mentioned in the manual) it will not work if you're using a Redis cluster. In that case the Redis manual describes a method to implement distributed locks.
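If you also want the lock to expire on its own (so a crashed request cannot hold it forever), the plain SET command with the NX and EX options does the same thing as setnx plus a TTL. A sketch (the 30-second TTL is arbitrary, and the exact argument order can differ between Redis clients):

// Truthy if the lock was acquired; automatically released after 30 seconds.
$acquired = Redis::set($mutexKey, 1, 'EX', 30, 'NX');
if (!$acquired) {
    // another request is already refreshing this cache key
}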
In the end I came to the following solution: I use the retry() function from the Laravel 5.5 helpers to read the cache value until it has been written there properly, with an interval of 1 second.
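For reference, that ended up looking roughly like this (a sketch: 5 attempts with 1000 ms between them, both numbers arbitrary):

$value = retry(5, function () use ($cacheKey) {
    if (!Cache::has($cacheKey)) {
        throw new \RuntimeException('Cache key not written yet');
    }

    return Cache::get($cacheKey);
}, 1000); // sleep 1000 ms between attempts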
On a project I am working on, we use Symfony2 console commands to run image conversion (using LaTeX and some Imagick). Due to the nature of the project, not all conditions may be met during a console command run, so the execution will fail and be restarted later by a cron job, but only if the attempt count is not higher than a predefined limit.
We already have logging in our project; we use the Monolog logger. What I basically want is to somehow duplicate everything that goes to the main log file into another log file, created specifically for that console command execution, and only if the attempt limit is reached.
So, if we run the command once and it fails, that's OK and nothing should be created.
But if we run the command for the 10th time, which is the attempt limit, I want to have a separate log file named, say, '/logs/failed_commands//fail.log'. That log file should only have messages for the last failed attempt, not for all the previous ones.
How do I do that? Do I need some combination of a special logger handler (like FingersCrossed) and proper exception handling? Or should I rather create an additional logger instance (and if so, how can I pass it over to dependent services)?
This is a simplified and cleaned piece of the command that runs the image conversion. The attempt limit is checked within the $this->checkProcessLimit() method.
public function execute(InputInterface $input, OutputInterface $output)
{
    try {
        set_time_limit(0); // lose any time restrictions

        $this->checkingDirectories();
        $this->checkProcessLimit();
        $this->isBuildRunning();
        $this->checkingFiles();

        try {
            $this->startPdfBuilding();
        } catch (InternalProjectException $e) {
            throw PdfBuildingException::failedStartBuilding($this->pressSheet->getId());
        }
    } catch (PdfBuildingException $e) {
        $this->printError($output, $e);

        return;
    }

    try {
        $this->logger->info('Building Image.');
        $this->instantiatePdfBuilder();
        $buildingResult = $this->pdfBuilder->outputPdf();

        $this->generatePreview($buildingResult);
        $this->movePDFs($buildingResult);

        $this->unlinkLockFile();
        $output->writeln('<info>Image successfully built</info>');
    } catch (LaTeXException $e) {
        $this->unlinkLockFile();
        $this->abortPdfBuilding($e->getMessage());
        $this->printError($output, $e);

        return;
    }
}
UPD: It seems that for dumping a bunch of log entries I need to use the BufferHandler bundled with the Monolog logger. But I still need to figure out how to set it up so that dumps happen only when the error limit (not the error level) is reached.
UPD2: I've managed to make it work, but I don't like the solution.
Since in Symfony2 you have to define loggers in config.yml and rebuild the cache for any configuration change, I had to resort to dynamically adding a handler to the logger. But the logger itself is type-hinted as the Psr\Log\LoggerInterface interface, which has no means of adding handlers. The solution I had to use actually checks whether the injected logger is an instance of Monolog\Logger and then manually adds a BufferHandler to it in the Symfony2 console command's initialize() method.
Then, when it comes to the point where I check the attempt limit, I close the buffer handler and delete the actual log file if the limit is not yet reached (since BufferHandler has no way of removing/closing itself without flushing all its contents). If the limit is reached, I just let the log file stay.
This way it works, but it always writes the log, and I have to remove the log if the condition (reached attempt limit) is not met.
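A rough sketch of what that looks like (paths, property names, and the attempt-limit variables are placeholders; it assumes the injected logger really is a Monolog\Logger):

// In initialize(): buffer everything into a handler aimed at a per-run fail log.
if ($this->logger instanceof \Monolog\Logger) {
    $this->failLogPath = $logDir . '/failed_commands/fail_' . date('YmdHis') . '.log';
    $this->bufferHandler = new \Monolog\Handler\BufferHandler(
        new \Monolog\Handler\StreamHandler($this->failLogPath)
    );
    $this->logger->pushHandler($this->bufferHandler);
}

// Later, once the attempt count is known:
if ($attempts < $this->attemptLimit) {
    $this->bufferHandler->close(); // flushes the buffer to the file...
    @unlink($this->failLogPath);   // ...which is then thrown away
}
// otherwise the handler is left alone and the buffered entries stay in the fail log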
I think you must create a custom handler.
With Monolog, you can log to a database (see for example https://github.com/Seldaek/monolog/blob/master/doc/04-extending.md).
That makes it easy to know how many times an error has been raised in the last x days
(something like: "select count(*) from monolog where channel='...' and time > ...").
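A minimal sketch of such a handler, along the lines of the PDOHandler shown in that Monolog doc (the monolog table with channel/level/message/time columns is assumed to exist):

use Monolog\Handler\AbstractProcessingHandler;
use Monolog\Logger;

class PdoHandler extends AbstractProcessingHandler
{
    private $pdo;

    public function __construct(\PDO $pdo, $level = Logger::DEBUG, $bubble = true)
    {
        $this->pdo = $pdo;
        parent::__construct($level, $bubble);
    }

    protected function write(array $record)
    {
        // One row per log record; query the table later to count occurrences.
        $statement = $this->pdo->prepare(
            'INSERT INTO monolog (channel, level, message, time) VALUES (?, ?, ?, ?)'
        );
        $statement->execute(array(
            $record['channel'],
            $record['level'],
            $record['message'],
            $record['datetime']->format('Y-m-d H:i:s'),
        ));
    }
}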
I wrote a class that syncs the DB from an XML file and reports any alerts through email.
The XML contains product prices and stock.
The sync method only executes if the XML file's modification time is newer than that of the last file synced.
Here is the first problem: I suspect that the server (randomly) changes the file time for some reason, because the sync method runs although no new XML file was produced.
The XML file is exported from a local server and uploaded to the remote server through an FTP client
(SyncBack).
The second problem is that during heavy-traffic hours the do_sync method runs more than once, because I get the alerts more than once in my email.
I understand why it is called many times, so I created a flag, syncing_now, to prevent the execution.
The mistake is that the flag is stored in the DB, and since the first call has to update the DB, all the other calls can still run the method.
<?php

class Sync extends Model
{
    public function __construct()
    {
        parent::__construct();
        $this->syncing_now = $this->db->get('syncing_now');
    } // END constructor

    public function index()
    {
        if ($this->determine_sync()) {
            $this->do_sync();
        } else {
            return FALSE;
        }
    }

    public function determine_sync()
    {
        if (filemtime($file) <= $this->db->last_sync() or !$this->syncing_now) {
            return FALSE;
        } else {
            return TRUE;
        }
    }

    public function do_sync()
    {
        $this->db->update('syncing_now', TRUE);
        // the sync code works fine..
        $this->db->update('syncing_now', FALSE);
    }
}
So what can I do to run the method only once, and how can I track down why the file time changes?
Thanks all, any help is appreciated.
I suggest you use a Table that stores your Synchronisations.
id | md5_of_xml_file | synched_date
Now use LOCK TABLES to ensure that only one process at a time may process your sync files.
Lock the synchronisations table. If locking it fails, just quit.
if (!$mysqli->query('LOCK TABLES synchronisations WRITE')) {
    die(); // quit
}
If an entry with the hash of the XML Sync File already exists, just quit.
$md5Hash = md5_file('yourXmlSyncFile.xml');
$result = null;

$stmt = $mysqli->prepare('SELECT md5_of_xml_file FROM synchronisations WHERE md5_of_xml_file = ?');
$stmt->bind_param("s", $md5Hash);
$stmt->execute();
$stmt->bind_result($result);
$stmt->fetch();
$stmt->close();

if ($result == $md5Hash) {
    die(); // quit
}
Else, attempt to sync the file. If that works, add an entry, storing when you did this and a hash of the file used for synchronisation.
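That last step could look something like this, reusing the $mysqli connection and $md5Hash from above:

// Sync succeeded: remember this file so later runs skip it, then free the lock.
$stmt = $mysqli->prepare(
    'INSERT INTO synchronisations (md5_of_xml_file, synched_date) VALUES (?, NOW())'
);
$stmt->bind_param('s', $md5Hash);
$stmt->execute();
$stmt->close();

$mysqli->query('UNLOCK TABLES');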