Transaction priority? - php

I have a cron that runs a script every 3 minutes. The script contains a function that does:
try
    begin transaction
    loop
        // parse large XML data
        // send data to database
    endloop
    commit
catch
    rollback
endtry
Now, the data insertion is a long process that takes about 3-6 minutes, and the cron runs every 3 minutes, so sometimes the processes conflict.
I see that when I add a commit inside the loop, the new process takes priority. Can I somehow tell the new transaction to wait until the previous transaction commits?

I would try to Keep It Simple S....., and use a simple file-locking approach like this at the top of your existing cron script.
<?php
// open (or create) the lock file
$fp = fopen("/tmp/my_cron_lock.txt", "c+");
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    // another cron run still holds the lock,
    // so I'll get restarted in 3 mins
    // and will let the other job finish
    if ($fp) {
        fclose($fp);
    }
    exit;
}

// existing script

// free the lock,
// although this will happen automatically when the script terminates
flock($fp, LOCK_UN);
fclose($fp);

You can store a lock somewhere persistent; typically that is done with a lock file in the file system:
The process first checks if a file already exists. If so, it exits right away.
If no lock file exists it creates the lock file itself and writes its own process id into it. When terminating, it again checks if that is still its own lock file (by the process id) and removes it if all is fine.
That way you can fire your trigger script (cron job) every minute without any risk.
The same can be done at the database or even table level. However, that can be less robust depending on the situation, since it obviously fails if there is an issue with the database connection. The fewer layers involved, the more robust. And as always: you have to decide yourself which approach is best. But in general: locking is the answer.
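To illustrate, a minimal sketch of the lock-file approach described above; the path /tmp/my_job.pid and the surrounding structure are illustrative, not a finished implementation:
<?php
// sketch only: the lock-file path is an assumption
$lockFile = '/tmp/my_job.pid';

if (file_exists($lockFile)) {
    // another run is (or was) active; exit right away
    exit;
}

// claim the lock by writing our own process id into it
file_put_contents($lockFile, (string) getmypid());

// ... do the actual work here ...

// only remove the lock file if it is still ours
if ((int) file_get_contents($lockFile) === getmypid()) {
    unlink($lockFile);
}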

Best way to prevent PHP script from running more than once simultaneously?

I have a long-running script (anywhere from 30 to 160 seconds, based on settings) that requests data from an API using NuSOAP; based on this data it builds one big ~1000-4000 row insert query. It then truncates a table and runs the big insert.
When I run the script too closely timed, one run after the other, it creates a problem where data is lost. I want to prevent this script from being run twice simultaneously.
This script will in the future also be run every ~5-10 minutes via cron/task scheduler.
Currently I block running the script simultaneously by checking if a file exists:
<?php
header('content-type: application/json');
ignore_user_abort(true);
if (!file_exists('lock.txt')) {
    $lock = fopen('lock.txt', 'w');
    fclose($lock);
    //~450 API requests using NuSOAP.
    //TRUNCATE `table`
    //INSERT ~1000-4000 rows into `table`
    $jsonArray = array(utf8_encode('script') => utf8_encode('finished'));
    unlink('lock.txt');
} else {
    $jsonArray = array(utf8_encode('script') => utf8_encode('locked'));
}
echo json_encode($jsonArray);
?>
Is this a secure way of blocking the script from being run simultaneously? Is it better to check whether a MySQL column contains 'true' or 'false', instead of using a file?
Is there a better way?
I reckon a lock file may not be ideal; I would rather create a temporary session variable set to 'true' or 'false' to track whether the script is running.
Optionally, I would also keep track of the actual time taken to execute each script, which can be useful for spotting unnecessary delays and the overall average run time.
You should still be able to use the session even when running the script via the cron scheduler, by manually assigning session_id() before calling session_start().
However, cookies are browser-dependent and will not work with cron.
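A rough sketch of that session-based flag; the fixed session id and the 'running' key are illustrative choices, not a finished implementation:
<?php
// sketch only: the fixed session id is an assumption
session_id('my_import_lock');   // same id for web and cron runs
session_start();

if (!empty($_SESSION['running'])) {
    session_write_close();
    exit('locked');
}

$_SESSION['running'] = true;
session_write_close();          // release the session file while working

// ... long-running work ...

session_id('my_import_lock');
session_start();
$_SESSION['running'] = false;
session_write_close();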

How can I lock parts of the code in an inter-request manner?

Consider a PHP script (possibly calling functions in other scripts). I want to make sure some part of it can only be executed by one request at a time. For example:
doSomething();
doSomethingElse(); // Lock this: Can only be executed by one request at a time
yetAnotherThing();
So if request A is currently 'inside' doSomethingElse(), I want any further requests to be queued before the line of code calling this function.
I haven't found a solution to this online, because I'm talking about a lock between requests, not a lock between separate threads executing as part of the same request. I am using an Apache server.
You would need to guard the execution by setting a flag that indicates whether the work may proceed or not.
You can store the guard-status in any other storage, which is persistent across requests: database, session, flat-file...
The most basic thing you could do is to write a flag file.
This will exclude all subsequent requests from processing doSomethingElse(), while the file exists. But, when the file is gone, the next request will exec doSomethingElse() again.
You might use flock() (http://php.net/manual/en/function.flock.php)
or your own flat-file locking approach. Just for the concept:
Add file_put_contents(__DIR__.'/doSomethingElse.processing.flag', 'processing');
at the start of doSomethingElse() and unlink() the file at the end of the function.
Then wrap the execution into a condition check:
doSomething();
if (!is_file(__DIR__ . '/doSomethingElse.processing.flag')) {
    doSomethingElse();
}
yetAnotherThing();
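For completeness, a hedged sketch of what doSomethingElse() itself could look like with the flag handling added; the function body is a placeholder:
function doSomethingElse()
{
    $flag = __DIR__ . '/doSomethingElse.processing.flag';
    file_put_contents($flag, 'processing');
    try {
        // ... the actual work goes here ...
    } finally {
        // remove the flag even if the work throws
        unlink($flag);
    }
}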
Building a Queue
Well, you could expand the given idea or use a prepared library/tool for the job.
For building a "queue" you would need to expand the lock-idea and:
add a processing ID,
add a stack to lookup the current lowest ID for processing,
and the lookup and processing logic itself
including "locking" for the resource processed
A queue locks the resource to process and only allows the lowest ID.
The lock is often called a semaphore. (It could actually be the highest ID; that depends on your processing logic - it's basically LIFO or FIFO stack processing.)
Create a queue and put the job in, then add a worker running as a cron job or daemon. The worker takes jobs off the queue, processes them and returns the result with the status flag "done". You might then periodically poll the queue to see whether a job has finished. You can use a database for the queue; pick one that supports locking.
while (1) {
    begin new transaction;
    remove item from queue;
    process item;
    save new state of item;
    commit;
}
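A hedged PDO sketch of that worker loop, assuming a jobs table with id/payload/state columns, a processItem() function of your own, and a storage engine with row locks (e.g. InnoDB); all of these names are assumptions:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

while (true) {
    $pdo->beginTransaction();
    // lock the oldest queued job; other workers block on it until we commit
    $job = $pdo->query(
        "SELECT id, payload FROM jobs WHERE state = 'queued'
         ORDER BY id LIMIT 1 FOR UPDATE"
    )->fetch(PDO::FETCH_ASSOC);

    if ($job === false) {
        $pdo->commit();
        sleep(5);                  // queue is empty, wait a bit
        continue;
    }

    processItem($job['payload']);  // your own processing function

    $done = $pdo->prepare("UPDATE jobs SET state = 'done' WHERE id = ?");
    $done->execute([$job['id']]);
    $pdo->commit();
}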
Not sure where you are heading, but you have a lot of options to implement it:
For a file based queue-ing mechanism see this basic tutorial: http://squirrelshaterobots.com/programming/php/building-a-queue-server-in-php-part-1-understanding-the-project/
You could rely on SplQueue and combine it with the locking idea.
PHP has support for semaphores, too: http://php.net/manual/en/ref.sem.php
Then there are real job-queue systems like Gearman, Beanstalk, Redis or any message queue, like RabbitMQ, ZeroMQ.
See the gearman examples http://php.net/manual/en/gearman.examples-reverse.php
http://laravel.com/docs/5.1/queues
You can put the doSomethingElse() function in another file and then use flock() (http://php.net/manual/en/function.flock.php) to prevent simultaneous access.
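As a rough sketch (the lock-file path is illustrative), the behaviour you are after could look like this; later requests simply wait at flock() until the lock is released, which gives the "queue before this line" effect:
<?php
doSomething();

$fp = fopen(__DIR__ . '/doSomethingElse.lock', 'c');
if ($fp !== false && flock($fp, LOCK_EX)) {   // blocks until the lock is free
    try {
        doSomethingElse();
    } finally {
        flock($fp, LOCK_UN);
        fclose($fp);
    }
}

yetAnotherThing();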

How to design the backend to put requests into a queue?

I want to design a system where users can submit files. After they submit a file, I will run some scripts on it. I want to process the files in order, so I want to maintain a queue of requests. How can I do this with PHP? Is there an open-source library for this?
Thanks!
I would use a database.
When a user submits a file, add a link to its location in the file system to the database.
Then have a cron job that checks for new submissions and processes them in order; when done, it marks each one as processed in the database.
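A hedged sketch of the submit side; the submissions table, its columns, the upload field name and the paths are all assumptions:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// assumed upload field name and target directory
$storedPath = '/var/uploads/' . basename($_FILES['upload']['name']);
move_uploaded_file($_FILES['upload']['tmp_name'], $storedPath);

$stmt = $pdo->prepare("INSERT INTO submissions (path, processed) VALUES (?, 0)");
$stmt->execute([$storedPath]);
// the cron job later selects rows WHERE processed = 0 ORDER BY id,
// runs the scripts on each file, and sets processed = 1 when done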
I would use Redis. Redis is a super-fast key-value store; usually its response time is in the double-digit microseconds (10-99 microseconds).
Redis transactions are atomic (they either happen or they don't), and you can have a job running constantly without using cron.
To use Redis with PHP, you can use Predis.
Once Redis is installed and Predis is set up to work with your script, upon uploading the file I would do something like this:
// 'hostname' and 'port' is the hostname and the port
// where Redis is installed and listening to.
$client = new Predis\Client('tcp://hostname:port');
// assuming the path to the file is stored in $pathToFile
$client->lpush('queue:files', $pathToFile);
then the script that needs to work with the files, just needs to do something like:
$client = new Predis\Client('tcp://hostname:port');
// assuming the path to the file is stored in $pathToFile
while (true) {
    $pathToFile = $client->rpop('queue:files');
    if (!$pathToFile) {
        // list is empty, so move on.
        continue;
    }
    // there was something in the list, do whatever you need to do with it.
    // if there's an exception or an error, you can always use break; or exit; to terminate the loop.
}
Take into consideration that PHP tends to use a lot of memory, so I would explicitly collect garbage (via gc_enable() and gc_collect_cycles()) and unset() variables as you go.
Alternatively, you can use software like supervisord to make this script run just once, and as soon as it ends, start it up again.
In general, I would stay away from using a database and cron to implement queues. It can lead to serious problems.
Assume, for example, you use a table as a queue.
In the first run, your script pulls a job from the database and starts doing its thing.
Then for some reason your script takes longer to run, the cron job kicks in again, and now you have two scripts working on the same file. This could have no consequences, or it could have serious ones; that depends on what your application is actually doing.
So unless you're working with a very small dataset, and you know for sure that each run will finish before the next cron run starts so there will be no collisions, you should be fine. Otherwise, stay away from that approach.
A third-party library? This is too simple to need an entire library. You could use Redis (see the answer by AlanChavez) if you want to waste time and resources and then have to worry about garbage collection, when the real solution is not to bring garbage into the mix in the first place.
Your queue is a text file. When a file is uploaded, the name of the file is appended to the queue.
$q = fopen('queue.txt', 'a');
The 'a' mode is important. It automatically moves the write pointer to the end of the file for append writes. But the reason it is important is because if the file does not exist, a new one is created.
fwrite($q,"$filename\n");
fclose($q);
If there are simultaneous append writes to this file, the OS will arbitrate the conflict without error. No need for file locking, cooperative multitasking, or transactional processing.
When your script that processes the queue begins to run it renames the live queue to a working queue.
if (!file_exists('q.txt')) {
    if (!file_exists('queue.txt')) { exit; }
    rename('queue.txt', 'q.txt');
    $q = fopen('q.txt', 'r');
    while (($filename = fgets($q, 4096)) !== false) {
        process(rtrim($filename, "\n")); // strip the newline added when queuing
    }
    fclose($q);
    unlink('q.txt');
} else {
    echo 'Houston, we have a problem';
}
Now you see why the 'a' mode was important. We rename the queue and when the next upload occurs, the queue.txt is automatically created.
If the file is being written to as it is being renamed, the OS will sort it out without error. The rename is so fast that the chance of a simultaneous write is astronomically small. And it is a basic OS feature to sort out file-system contention. No need for file locking, cooperative multitasking, or transactional processing.
This is a bullet proof method I have been using for many years.
Replace the Apollo 13 quote with an error recovery routine. If q.txt exists the previous processing did not complete successfully.
That was too easy.
Because it is so simple and we have plenty of memory because we are so efficient: Let's have some fun.
Let's see if writing to the queue is faster than AlanChavez's "super fast" Redis with its double-digit microsecond response.
The time in seconds to add a file name to the queue = 0.000014537, or 14.5 µs - right at the low end of Redis's "super fast" 10-99 µs response time, with no server, no client library and no extra moving parts.

Two users write to a file at the same time? (PHP/file_put_contents)

If I write data to a file via file_put_contents with the FILE_APPEND flag set and two users submit data at the same time, will it append regardless, or is there a chance one entry will be overwritten?
If I set the LOCK_EX flag, will the second submission wait for the first submission to complete, or is the data lost when an exclusive lock can't be obtained?
How does PHP generally handle that? I'm running version 5.2.9, if that matters.
Thanks,
Ryan
You could also check the flock() function to implement proper locking (not based on the while/sleep trick).
If you set an exclusive file lock via LOCK_EX, the second script (time-wise) that attempts to write will simply return false from file_put_contents.
i.e.: It won't sit and wait until the file becomes available for writing.
As such, if required you'll need to program this behaviour in yourself, perhaps by attempting file_put_contents a limited number of times (e.g. 3) with a suitably sized sleep() between each attempt.
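A minimal sketch of that retry idea; the file name, attempt count and sleep length are illustrative:
<?php
$line = "new entry\n";   // data to append
$written = false;

for ($attempt = 0; $attempt < 3; $attempt++) {
    if (file_put_contents('data.txt', $line, FILE_APPEND | LOCK_EX) !== false) {
        $written = true;
        break;
    }
    sleep(1);   // wait a moment before trying again
}

if (!$written) {
    error_log('Could not append to data.txt after 3 attempts');
}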

PHP and concurrent file access

I'm building a small web app in PHP that stores some information in a plain text file. However, this text file is used/modified by all users of my app at some given point in time, and possibly at the same time.
So the questions is. What would be the best way to make sure that only one user can make changes to the file at any given point in time?
You should put a lock on the file
$fp = fopen("/tmp/lock.txt", "r+");
if (flock($fp, LOCK_EX)) { // acquire an exclusive lock
    ftruncate($fp, 0);     // truncate file
    fwrite($fp, "Write something here\n");
    fflush($fp);           // flush output before releasing the lock
    flock($fp, LOCK_UN);   // release the lock
} else {
    echo "Couldn't get the lock!";
}
fclose($fp);
Take a look at http://www.php.net/flock
My suggestion is to use SQLite. It's fast, lightweight, stored in a file, and has mechanisms for preventing concurrent modification. Unless you're dealing with a preexisting file format, SQLite is the way to go.
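To sketch the idea (the database path and table layout are assumptions, not a prescribed schema), the shared text file could be replaced with something like:
<?php
$db = new PDO('sqlite:' . __DIR__ . '/app_data.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)");

// SQLite serialises writers itself, so concurrent requests do not clobber each other
$ins = $db->prepare("INSERT INTO notes (body) VALUES (?)");
$ins->execute(['text that used to go into the shared file']);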
You could use a commit-log sort of format, similar to how Wikipedia does it.
Use a database, and have every saved change create a new row with an incremented revision number, making the previous record redundant; then you only have to worry about getting table locks during the save phase.
That way, if two people happen to edit something concurrently, both changes will appear in the history, and whichever one lost the commit race can be copied into the new revision.
Now if you don't want to use a database, then you have to worry about having a revision control file backing every visible file.
You could put revision control (Git/Mercurial/SVN) on the file system and then automate commits during the save phase.
Pseudocode:
user->save:
    getWriteLock();
    write( $file );
    write_commitmessage( $commitmessagefile );  # <-- author, comment, etc.
    call "hg commit -l $commitmessagefile $file";
    releaseWriteLock();
done.
At least this way when 2 people make critical commits at the same time, neither will get lost.
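A hedged PHP rendering of that pseudocode; the lock file, commit-message handling and hg invocation are illustrative, not a finished implementation:
<?php
function saveWithHistory($file, $contents, $author, $comment)
{
    $lock = fopen($file . '.lock', 'c');
    if ($lock === false || !flock($lock, LOCK_EX)) {
        throw new RuntimeException('could not acquire write lock');
    }
    try {
        file_put_contents($file, $contents);

        $msgFile = tempnam(sys_get_temp_dir(), 'msg');
        file_put_contents($msgFile, $comment . "\n");

        // record the change in the repository the file lives in
        exec('hg commit -u ' . escapeshellarg($author)
            . ' -l ' . escapeshellarg($msgFile)
            . ' ' . escapeshellarg($file));

        unlink($msgFile);
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}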
A single file for many users really shouldn't be the strategy you use, I don't think - otherwise you'll probably need to implement a single (global) access point that monitors whether the file is currently being edited or not: acquire a lock, do your modification, release the lock, and so on. I'd go with 'Nobody's suggestion to use a database (SQLite if you don't want the overhead of a fully decked-out RDBMS).
