Run function in PHP CLI script after period of inactivity

I used Symfony2 with the RabbitMqBundle to create a worker that sends documents to Elasticsearch. Indexing documents one by one is much slower than using the Elasticsearch bulk API, so I created a buffer that flushes the documents to ES in batches of a thousand. The code looks (a bit simplified) as follows:
class SearchIndexator
{
    protected $elasticaService;
    protected $buffer = [];
    protected $bufferSize = 0;

    // The maximum number of documents to keep in the buffer.
    // If the buffer reaches this number of documents, its contents
    // are sent to Elasticsearch for indexation.
    const MAX_BUFFER_SIZE = 1000;

    public function __construct(ElasticaService $elasticaService)
    {
        $this->elasticaService = $elasticaService;
    }

    /**
     * Destructor
     *
     * Flush any documents that remain in the buffer.
     */
    public function __destruct()
    {
        $this->flush();
    }

    /**
     * Add a document to the indexation buffer.
     */
    public function onMessage(array $document)
    {
        // Prepare the document for indexation.
        $this->doHeavyWeightStuff($document);

        // Create an Elastica document.
        $document = new \Elastica\Document(
            $document['key'],
            $document
        );

        // Add the document to the buffer.
        $this->buffer[] = $document;

        // Flush the buffer when the max buffer size has been reached.
        if (self::MAX_BUFFER_SIZE <= ++$this->bufferSize) {
            $this->flush();
        }
    }

    /**
     * Send the current buffer to Elasticsearch for indexation.
     */
    public function flush()
    {
        // Send documents to Elasticsearch for indexation.
        if (1 <= $this->bufferSize) {
            $this->elasticaService->addDocuments($this->buffer);
        }

        // Clear the buffer.
        $this->buffer = [];
        $this->bufferSize = 0;
    }
}
This all works quite nicely, but there is a slight problem. The queue gets filled with messages at an unpredictable rate. Sometimes 100000 in 5 minutes, sometimes not one for hours. When there are, for instance, 82671 documents queued, the last 671 documents don't get indexed before receiving another 329 documents, which might take hours. I would like to be able to do the following:
Warning: Sci-Fi code! This obviously won't work:
class SearchIndexator
{
    protected $elasticaService;
    protected $buffer = [];
    protected $bufferSize = 0;
    protected $flushTimer;

    // The maximum number of documents to keep in the buffer.
    // If the buffer reaches this number of documents, its contents
    // are sent to Elasticsearch for indexation.
    const MAX_BUFFER_SIZE = 1000;

    public function __construct(ElasticaService $elasticaService)
    {
        $this->elasticaService = $elasticaService;

        // Highly Sci-Fi code
        $this->flushTimer = new Timer();
        // Flush the buffer after 5 minutes of inactivity.
        $this->flushTimer->setTimeout(5 * 60);
        $this->flushTimer->setCallback([$this, 'flush']);
    }

    /**
     * Destructor
     *
     * Flush any documents that remain in the buffer.
     */
    public function __destruct()
    {
        $this->flush();
    }

    /**
     * Add a document to the indexation buffer.
     */
    public function onMessage(array $document)
    {
        // Prepare the document for indexation.
        $this->doHeavyWeightStuff($document);

        // Create an Elastica document.
        $document = new \Elastica\Document(
            $document['key'],
            $document
        );

        // Add the document to the buffer.
        $this->buffer[] = $document;

        // Flush the buffer when the max buffer size has been reached.
        if (self::MAX_BUFFER_SIZE <= ++$this->bufferSize) {
            $this->flush();
        } else {
            // Start a timer that will flush the buffer after a timeout.
            $this->initTimer();
        }
    }

    /**
     * Send the current buffer to Elasticsearch for indexation.
     */
    public function flush()
    {
        // Send documents to Elasticsearch for indexation.
        if (1 <= $this->bufferSize) {
            $this->elasticaService->addDocuments($this->buffer);
        }

        // Clear the buffer.
        $this->buffer = [];
        $this->bufferSize = 0;

        // There are no longer messages to be sent; stop the timer.
        $this->flushTimer->stop();
    }

    protected function initTimer()
    {
        // Start or restart the timer.
        $this->flushTimer->isRunning()
            ? $this->flushTimer->reset()
            : $this->flushTimer->start();
    }
}
Now, I know about the limitations of PHP not being event driven. But this is 2015 and there are solutions like ReactPHP, so this should be possible, right? For ØMQ there is this function. What would be a solution that works for RabbitMQ, or independently of any message queue extension?
Solutions that I'm skeptical about:
There is crysalead/code. It simulates a timer using declare(ticks = 1);. I'm not sure whether this is a performant and solid approach. Any ideas?
I could run a cronjob that publishes a 'FLUSH' message to the same queue every 5 minutes and then explicitly flush the buffer when receiving this message, but that would be cheating.

As I've mentioned in my comment, you could use signals. PHP allows you to register handlers for signals delivered to your script (e.g. SIGINT, SIGTERM, etc.; note that SIGKILL can never be caught).
For your use case you can use the SIGALRM signal. This signal will alarm your script after a certain time (that you can set) has expired. The positive side of this signal is that it is non-blocking; in other words, the normal operation of your script will not be interfered with.
The adjusted solution (since PHP 5.3 you can use pcntl_signal_dispatch() instead of relying on ticks):
function signal_handler($signal) {
    // You would flush here.
    print "Caught SIGALRM\n";
    // Set the SIGALRM timer again or it won't trigger again.
    pcntl_alarm(300);
}

// Register your handler with the SIGALRM signal.
pcntl_signal(SIGALRM, "signal_handler", true);

// Set the timeout for the SIGALRM signal to 300 seconds.
pcntl_alarm(300);

// Start the loop and check for pending signals on every iteration.
while (pcntl_signal_dispatch() && your_loop_condition) {
    // Execute your code here.
}
Note: you can only use one SIGALRM signal per process. If you set the time of your signal with pcntl_alarm, the timer for your alarm will reset (without firing the signal) to its newly set value.
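Applied to the indexer from the question, the wiring could look like the sketch below. This is an illustration, not a drop-in: $consumer->wait() is a made-up stand-in for whatever blocking receive loop the RabbitMqBundle worker uses, and PHP >= 7.1 is assumed for pcntl_async_signals().
$indexator = new SearchIndexator($elasticaService);

// Run signal handlers as soon as a signal arrives (PHP >= 7.1),
// instead of calling pcntl_signal_dispatch() by hand.
pcntl_async_signals(true);

pcntl_signal(SIGALRM, function () use ($indexator) {
    // No message has arrived for 5 minutes: flush whatever is buffered.
    $indexator->flush();
});

while ($message = $consumer->wait()) { // stand-in for the consumer loop
    $indexator->onMessage($message);
    pcntl_alarm(5 * 60); // (re)arm; setting a new alarm replaces the old one
}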

Related

PHP pcntl_alarm - pcntl_signal handler fires late

Problem
The title really says it all. I had a Timeout class that handled timing out using pcntl_alarm(), and during ssh2_* calls, the signal handler simply does not fire when it should.
In its most basic form,
print date('H:i:s'); // A - start waiting, ideally 5 seconds at the most
pcntl_alarm(5);
// SSH attempt to download remote file /dev/random to ensure it blocks *hard*
// (omitted)
print date('H:i:s'); // B - get here once it realizes it's a tripeless-cat scenario
with a signal handler that also outputs the date, I would expect this (rows B and C might be inverted; that does not matter):
"A, at 00:00:00"
"C, at 00:00:05" <-- from the signal handler
"B, at 00:00:05"
but the above will obtain this disappointing result instead:
"A, at 00:00:00"
"C, at 00:01:29" <-- from the signal handler
"B, at 00:01:29"
In other words, the signal handler does fire, but it does so only once another, longer timeout has expired. I can only guess this timeout is inside ssh2_*; quickly browsing through the source code yielded nothing obvious. What's worse, this ~90-second timeout is what I can reproduce by halting a download. In other cases, when the firewall dropped the wrong SSH2 packet, I got a stuck process with an effectively infinite timeout.
As the title might reveal, I have already checked other questions and this is definitely not a mis-quoting of SIGALRM (also, the Timeout class worked beautifully). It seems like I need a louder alarm when libssh2 is involved.
Workaround
This modification of the Timeout class yields a slightly less precise, but always-working system, yet it does so in an awkward, clunky way. I'd really prefer for pcntl_alarm() to work in all cases as it's supposed to.
public static function install() {
    self::$enabled = function_exists('pcntl_async_signals') && function_exists('posix_kill') && function_exists('pcntl_wait');
    if (self::$enabled) {
        // Just to be on the safe side
        pcntl_async_signals(true);
        pcntl_signal(SIGALRM, static function(int $signal, $siginfo) {
            // Child just died.
            $status = null;
            pcntl_wait($status);
            static::$expired = true;
        });
    }
    return self::$enabled;
}

public static function wait(int $time) {
    if (self::$enabled) {
        static::$expired = false;
        // arrange for a SIGALRM to be delivered in $time seconds
        // pcntl_alarm($time);
        $ppid = posix_getpid();
        $pid = pcntl_fork();
        if ($pid == -1) {
            throw new RuntimeException('Could not fork alarming child');
        }
        if ($pid) {
            // save the child's PID
            self::$thread = $pid;
            return;
        }
        // we are the child. Send SIGALRM to the parent after the requested timeout.
        sleep($time);
        posix_kill($ppid, SIGALRM);
        die();
    }
}

/**
 * Cancel the timeout and verify whether it expired.
 **/
public static function expired(): bool {
    if (self::$enabled) {
        // Have we spawned an alarm?
        if (self::$thread) {
            // Yes, so kill it.
            posix_kill(self::$thread, SIGTERM);
            self::$thread = 0;
        }
        // Maybe.
        $status = null;
        pcntl_wait($status);
        // pcntl_alarm(0);
    }
    return static::$expired;
}
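For context, using the workaround might look like the following sketch. The ssh2_scp_recv() call and paths are illustrative, and the Timeout class is assumed to carry the static $enabled, $expired and $thread properties used above.
Timeout::install();
Timeout::wait(5); // fork a child that SIGALRMs us in 5 seconds

// A call that may block hard; session and paths are made up for the example.
@ssh2_scp_recv($session, '/remote/big.file', '/tmp/big.file');

if (Timeout::expired()) {
    // The alarm fired before the transfer finished.
}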
Question
Can pcntl_alarm() be made to work as expected?

PHP cURL Timing Issue

I have a PHP script that is used to query an API and download some JSON information / insert that information into a MySQL database; we'll call this scriptA.php. I need to run this script multiple times a minute, preferably as many times in a minute as I can without allowing two instances to run at the same exact time or with any overlap. My solution to this has been to create scriptB.php and put it on a one-minute cron job. Here is the source code of scriptB.php...
function next_run()
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, "http://somewebsite.com/scriptA.php");
    curl_exec($curl);
    curl_close($curl);
    unset($curl);
}

$i = 0;
$times_to_run = 7;
$function = array();
while ($i++ < $times_to_run) {
    $function = next_run();
    sleep(3);
}
My question at this point is how cURL performs when used in a loop: does this code trigger scriptA.php and THEN, once it has finished loading, start the next cURL request? Does the 3-second sleep even make a difference, or will this literally run as fast as the time it takes each cURL request to complete? My objective is to time this script and run it as many times as possible in a one-minute window without two iterations of it being run at the same time. I don't want to include the sleep statement if it is not needed. I believe what happens is that cURL will run each request upon finishing the last; if I am wrong, is there some way I can instruct it to do this?
preferably as many times in a minute that I can without allowing two instances to run at the same exact time or with any overlap - then you shouldn't use a cronjob at all; you should use a daemon. But if you for some reason have to use a cronjob (e.g. if you're on a shared webhosting platform that doesn't allow daemons), you could use the sleep hack to run the same code several times a minute:
* * * * * /usr/bin/php /path/to/scriptA.php
* * * * * sleep 10; /usr/bin/php /path/to/scriptA.php
* * * * * sleep 20; /usr/bin/php /path/to/scriptA.php
* * * * * sleep 30; /usr/bin/php /path/to/scriptA.php
* * * * * sleep 40; /usr/bin/php /path/to/scriptA.php
* * * * * sleep 50; /usr/bin/php /path/to/scriptA.php
This should make it execute every 10 seconds.
As for making sure it doesn't run in parallel if the previous execution hasn't finished yet, add this to the start of scriptA:
call_user_func(function () {
    static $lock;
    $lock = fopen(__FILE__, "rb");
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        // failed to get a lock, probably means another instance is already running
        die();
    }
    register_shutdown_function(function () use (&$lock) {
        flock($lock, LOCK_UN);
    });
});
and it will just die() if another instance of scriptA is already running. However, if you want it to wait for the previous execution to finish instead of just exiting, remove LOCK_NB, as sketched below. That could be dangerous, though: if a majority of the executions take more than 10 seconds, you'll have more and more processes waiting for the previous execution to finish, until you run out of RAM.
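A minimal sketch of that blocking variant (same code as above, same RAM caveat applies):
call_user_func(function () {
    static $lock;
    $lock = fopen(__FILE__, "rb");
    // Without LOCK_NB this blocks until the previous instance releases the lock.
    flock($lock, LOCK_EX);
    register_shutdown_function(function () use (&$lock) {
        flock($lock, LOCK_UN);
    });
});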
As for your cURL questions:
does this code trigger scriptA.php and THEN once it has finished loading start the next cURL request - that is correct: cURL waits until the page has completely loaded, usually meaning the entire scriptA has completed. (You can tell scriptA to finish the pageload prematurely with the fastcgi_finish_request() function if you really want, but that's unusual; see the sketch after these answers.)
Does the 3 second sleep even make a difference or will this literally run as fast as the time it takes each cURL request to complete - yes, the sleep will make the loop 3 seconds slower per iteration.
My objective is to time this script and run it as many times as possible in a one minute window without two iterations of it being run at the same time - then make it a daemon that never exits, rather than a cronjob.
I don't want to include the sleep statement if it is not needed. - it's not needed.
I believe what happens is cURL will run each request upon finishing the last - this is correct.
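To illustrate the fastcgi_finish_request() aside: under PHP-FPM, scriptA could answer the HTTP request early and keep working. This is a sketch; process_api_data() is a made-up stand-in for scriptA's real JSON-import work.
// scriptA.php under PHP-FPM: end the HTTP response early, keep working.
echo "accepted";
fastcgi_finish_request(); // the cURL call in scriptB returns here

process_api_data();       // hypothetical stand-in; keeps running after the response was sent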
I need to run this script multiple times a minute, preferably as many times in a minute as I can without allowing two instances to run
You're in luck, as I wrote a class to handle just such a thing. You can find it on my github here:
https://github.com/ArtisticPhoenix/MISC/blob/master/ProcLock.php
I'll also copy the full code at the end of this post.
The basic idea is to create a file; I will call it afile.lock for this example. In this file is recorded the PID, the process ID, of the current process run by cron. Then, when cron attempts to run the process again, it checks this lock file and sees whether there is a PHP process running that is using this PID.
If there is, it updates the modified time of the file (and throws an exception).
If there is not, you are free to create a new instance of the "worker".
As a bonus, the modified time of the lock file can be used by the script (whose PID we are tracking) as a way of shutting down in the event the file is not updated. So, for example, if cron is stopped or the lock file is manually deleted, you can set it up in such a way that the running script will detect this and self-destruct.
So not only can you keep multiple instances from running, you can tell the current instance to die if cron is turned off.
The basic usage is as follows. In the cron file that starts up the "worker"
//define a lock file (this is actually optional)
ProcLock::setLockFile(__DIR__.'/afile.lock');

try {
    //if you didn't set a lock file you can pass it in with this method call
    ProcLock::lock();
    //execute your process
} catch (\Exception $e) {
    if ($e->getCode() == ProcLock::ALREADY_LOCKED) {
        //just exit or what have you
    } else {
        //some other exception happened.
    }
}
It's basically that easy.
Then, in the running process, you can check every so often (for example if you have a loop that runs something):
$expires = 90; //1 1/2 minutes (you may need a bit of fudge time)
foreach ($something as $a => $b) {
    $lastAccess = ProcLock::getLastAccess();
    if (false == $lastAccess || $lastAccess + $expires < time()) {
        //if last access is false (no lock file)
        //or last access + expiration is less than the current time
        //log something like "killed by lock timeout"
        exit();
    }
}
Basically what this says is that either the lock file was deleted while the process was running, or cron failed to update it before the expiration time. Here we are giving it 90 seconds, and cron should be updating the lock file every 60 seconds. As I said, the lock file's mtime is updated automatically: when cron calls lock() while the previous worker is still alive, lock() calls canLock(), which finds the existing PID still running, runs touch($lockFile) to update the mtime (modified time), and refuses the new lock.
Obviously you can only self kill the process in this way if it is actively checking the access and expiration times.
This script is designed to work on both Windows and Linux. On Windows, under certain circumstances, the lock file won't properly be deleted (sometimes when hitting ctrl+c in the CMD window); however, I have taken great pains to make sure this does not happen, so the class file contains a custom register_shutdown_function that runs when the PHP script ends.
When running something using the ProcLock in a browser, please note that the process ID will always be the same no matter which tab it's run in. So if you open one tab that is process-locked, then open another tab, the process locker will see it as the same process and allow it to lock again. To properly run it in a browser and test the locking, it must be done using two separate browsers, such as Chrome and Firefox. It's not really intended to be run in the browser, but this is one quirk I noticed.
One last note: this class is completely static, as you can have only one process ID per running process, which should be obvious.
The tricky parts are:
making sure the lock file is disposed of in the event of even critical PHP failures
making sure another process didn't pick up the PID number when it was freed from PHP. This can be done with relative accuracy: we can tell if a PHP process is using it, and if so we assume it's the process we need. There is little chance a re-used PID would show up for another process very quickly, and even less that it would be another PHP process.
making all this work on both Linux and Windows
Lucky for you, I have already invested sufficient time in this to do all these things. This is a more generic version of an original lock script I made for my job, which we have used successfully for 3 years to maintain control over various synchronous cron jobs: everything from sFTP upload scanning and expired-file cleanup to RabbitMq message workers that run for an indefinite period of time.
In any case, here is the full code. Enjoy.
<?php
/*
 * (c) 2017 ArtisticPhoenix
 * For license information please view the LICENSE file included with this source code GPL3.0.
 *
 * Process Locker
 * ==================================================================
 * This is a pseudo implementation of mutex since php does not have
 * any thread synchronization objects.
 * This class uses files to provide locking functionality.
 * Lock will be released in the following cases:
 * 1 - user calls unlock
 * 2 - when this lock object gets deleted
 * 3 - when request or script ends
 * 4 - when pid of lock does not match self::$_pid
 * ==================================================================
 * Only one Lock per Process!
 * -note- when running in a browser, typically all tabs will have the same PID,
 * so the locking will not be able to tell if it's the same process; to get
 * around this run in CLI, or use 2 different browsers, so the PID numbers are different.
 * This class is static for the simple fact that locking is done per process, so there is no need
 * to ever have duplicate ProcLocks within the same process.
 * ---------------------------------------------------------------
 */
final class ProcLock
{
    /**
     * exception code numbers
     * @var int
     */
    const DIRECTORY_NOT_FOUND = 2000;
    const LOCK_FIRST          = 2001;
    const FAILED_TO_UNLOCK    = 2002;
    const FAILED_TO_LOCK      = 2003;
    const ALREADY_LOCKED      = 2004;
    const UNKNOWN_PID         = 2005;
    const PROC_UNKNOWN_PID    = 2006;

    /**
     * process _key
     * @var string
     */
    protected static $_lockFile;

    /**
     * @var int
     */
    protected static $_pid;

    /**
     * No construction allowed
     */
    private function __construct(){}

    /**
     * No clones allowed
     */
    private function __clone(){}

    /**
     * globally sets the lock file
     * @param string $lockFile
     */
    public static function setLockFile($lockFile)
    {
        $dir = dirname($lockFile);
        if (!is_dir($dir)) {
            throw new Exception("Directory {$dir} not found", self::DIRECTORY_NOT_FOUND); //pid directory invalid
        }
        self::$_lockFile = $lockFile;
    }

    /**
     * return global lockfile
     */
    public static function getLockFile()
    {
        return (self::$_lockFile) ? self::$_lockFile : false;
    }

    /**
     * safe check for local or global lock file
     */
    protected static function _chk_lock_file($lockFile = null)
    {
        if (!$lockFile && !self::$_lockFile) {
            throw new Exception("Lock first", self::LOCK_FIRST);
        } elseif ($lockFile) {
            return $lockFile;
        } else {
            return self::$_lockFile;
        }
    }

    /**
     * @param string $lockFile
     */
    public static function unlock($lockFile = null)
    {
        if (!self::$_pid) {
            //no pid stored - not locked for this process
            return;
        }
        $lockFile = self::_chk_lock_file($lockFile);
        if (!file_exists($lockFile) || unlink($lockFile)) {
            return true;
        } else {
            throw new Exception("Failed to unlock {$lockFile}", self::FAILED_TO_UNLOCK); //no lock file exists to unlock or no permissions to delete file
        }
    }

    /**
     * @param string $lockFile
     */
    public static function lock($lockFile = null)
    {
        $lockFile = self::_chk_lock_file($lockFile);
        if (self::canLock($lockFile)) {
            self::$_pid = getmypid();
            if (!file_put_contents($lockFile, self::$_pid)) {
                throw new Exception("Failed to lock {$lockFile}", self::FAILED_TO_LOCK); //no permission to create pid file
            }
        } else {
            throw new Exception('Process is already running[ '.$lockFile.' ]', self::ALREADY_LOCKED); //there is a process running with this pid
        }
    }

    /**
     * @param string $lockFile
     */
    public static function getPidFromLockFile($lockFile = null)
    {
        $lockFile = self::_chk_lock_file($lockFile);
        if (!file_exists($lockFile) || !is_file($lockFile)) {
            return false;
        }
        $pid = file_get_contents($lockFile);
        return intval(trim($pid));
    }

    /**
     * @return int|false
     */
    public static function getMyPid()
    {
        return (self::$_pid) ? self::$_pid : false;
    }

    /**
     * @param string $lockFile
     * @param string $myPid
     * @throws Exception
     */
    public static function validatePid($lockFile = null, $myPid = false)
    {
        $lockFile = self::_chk_lock_file($lockFile);
        if (!self::$_pid && !$myPid) {
            throw new Exception('no pid supplied', self::UNKNOWN_PID); //no stored or injected pid number
        } elseif (!$myPid) {
            $myPid = self::$_pid;
        }
        return ($myPid == self::getPidFromLockFile($lockFile));
    }

    /**
     * update the mtime of the lock file
     * @param string $lockFile
     */
    public static function canLock($lockFile = null)
    {
        if (self::$_pid) {
            throw new Exception("Process was already locked", self::ALREADY_LOCKED); //process was already locked - call this only before locking
        }
        $lockFile = self::_chk_lock_file($lockFile);
        $pid = self::getPidFromLockFile($lockFile);
        if (!$pid) {
            //if there is not a pid then there is no lock file and it's ok to lock it
            return true;
        }
        //validate the pid in the existing file
        $valid = self::_validateProcess($pid);
        if (!$valid) {
            //if it's not valid - delete the lock file
            if (unlink($lockFile)) {
                return true;
            } else {
                throw new Exception("Failed to unlock {$lockFile}", self::FAILED_TO_UNLOCK); //no lock file exists to unlock or no permissions to delete file
            }
        }
        //if there was a valid process running return false, we cannot lock it.
        //update the lock file's mtime - this is useful for a heartbeat, a periodic keepalive script.
        touch($lockFile);
        return false;
    }

    /**
     * @param string $lockFile
     */
    public static function getLastAccess($lockFile = null)
    {
        $lockFile = self::_chk_lock_file($lockFile);
        clearstatcache(true, $lockFile);
        if (file_exists($lockFile)) {
            return filemtime($lockFile);
        }
        return false;
    }

    /**
     * @param int $pid
     */
    protected static function _validateProcess($pid)
    {
        $task = false;
        $pid = intval($pid);
        if (stripos(php_uname('s'), 'win') !== false) {
            $task = shell_exec("tasklist /fi \"PID eq {$pid}\"");
            /*
             'INFO: No tasks are running which match the specified criteria.
             '
             */
            /*
             '
             Image Name                     PID Session Name        Session#    Mem Usage
             ========================= ======== ================ =========== ============
             php.exe                       5064 Console                    1     64,516 K
             '
             */
        } else {
            $cmd = "ps ".intval($pid);
            $task = shell_exec($cmd);
            /*
             '  PID TTY      STAT   TIME COMMAND
             '
             */
        }
        if ($task) {
            return (preg_match('/php|httpd/', $task)) ? true : false;
        }
        throw new Exception("pid detection failed {$pid}", self::PROC_UNKNOWN_PID); //failed to parse the pid look-up results
        //this has been tested on CentOS 5, 6, 7 and Windows 7 and 10
    }

    /**
     * destroy a lock (safe unlock)
     */
    public static function destroy($lockFile = null)
    {
        try {
            $lockFile = self::_chk_lock_file($lockFile);
            self::unlock($lockFile);
        } catch (Exception $e) {
            //ignore errors here - this is called during destruction so we don't care if it fails or succeeds
            //generally a new process will be able to tell if the pid is still in use so
            //this is just a cleanup process
        }
    }
}

/*
 * Register our shutdown handler - if the script dies, unlock the lock.
 * This is superior to __destruct() because the shutdown handler runs even in situations where PHP exhausts all memory.
 */
register_shutdown_function(array('ProcLock', 'destroy'));

How to avoid race hazard with multiple requests?

In order to protect the script from a race hazard, I am considering the approach described by this code sample:
$file = 'yxz.lockctrl';

// if the file exists, it means that some other request is running
while (file_exists($file)) {
    sleep(1);
}

file_put_contents($file, '');

// do some work

unlink($file);
If I go this way, is it possible for the file to be created with the same name simultaneously from multiple requests?
I know that there is a PHP mutex. I would like to handle this situation without any extensions (if possible).
The task of the program is to handle bids in an auctions application. I would like to process every bid request sequentially, with the least possible latency.
From what I understand you want to make sure only a single process at a time is running a certain piece of code. A mutex or similar mechanism could be used for this. I myself use lockfiles to have a solution that works on many platforms and doesn't rely on a specific library only available on Linux etc.
For that, I have written a small Lock class. Do note that it uses some non-standard functions from my library, for instance, to get where to store temporary files etc. But you could easily change that.
<?php

class Lock
{
    private $_owned = false;
    private $_name = null;
    private $_lockFile = null;
    private $_lockFilePointer = null;

    public function __construct($name)
    {
        $this->_name = $name;
        $this->_lockFile = PluginManager::getInstance()->getCorePlugin()->getTempDir('locks') . $name . '-' . sha1($name . PluginManager::getInstance()->getCorePlugin()->getPreference('EncryptionKey')->getValue()).'.lock';
    }

    public function __destruct()
    {
        $this->release();
    }

    /**
     * Acquires a lock
     *
     * Returns true on success and false on failure.
     * Could be told to wait (block), and if so for a max amount of seconds, or return false right away.
     *
     * @param bool $wait
     * @param null $maxWaitTime
     * @return bool
     * @throws \Exception
     */
    public function acquire($wait = false, $maxWaitTime = null)
    {
        $this->_lockFilePointer = fopen($this->_lockFile, 'c');
        if (!$this->_lockFilePointer) {
            throw new \RuntimeException(__('Unable to create lock file', 'dliCore'));
        }

        if ($wait && $maxWaitTime === null) {
            $flags = LOCK_EX;
        } else {
            $flags = LOCK_EX | LOCK_NB;
        }

        $startTime = time();
        while (1) {
            if (flock($this->_lockFilePointer, $flags)) {
                $this->_owned = true;
                return true;
            } else {
                if ($maxWaitTime === null || time() - $startTime > $maxWaitTime) {
                    fclose($this->_lockFilePointer);
                    return false;
                }
                sleep(1);
            }
        }
    }

    /**
     * Releases the lock
     */
    public function release()
    {
        if ($this->_owned) {
            @flock($this->_lockFilePointer, LOCK_UN);
            @fclose($this->_lockFilePointer);
            @unlink($this->_lockFile);
            $this->_owned = false;
        }
    }
}
Usage
Now you can have two processes that run at the same time and execute the same script.
Process 1
$lock = new Lock('runExpensiveFunction');

if ($lock->acquire()) {
    // Some expensive function that should only run one at a time
    runExpensiveFunction();
    $lock->release();
}
Process 2
$lock = new Lock('runExpensiveFunction');

// Check will be false since the lock will already be held by someone else, so the function is skipped
if ($lock->acquire()) {
    // Some expensive function that should only run one at a time
    runExpensiveFunction();
    $lock->release();
}
Another alternative would be to have the second process wait for the first one to finish instead of skipping the code.
$lock = new Lock('runExpensiveFunction');

// Process will now wait for the lock to become available. A max wait time can be set if needed.
if ($lock->acquire(true)) {
    // Some expensive function that should only run one at a time
    runExpensiveFunction();
    $lock->release();
}
Ram disk
To limit the number of writes to your HDD/SSD with the lockfiles, you could create a RAM disk to store them in.
On Linux you could add something like the following to /etc/fstab
tmpfs /mnt/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=1024M 0 0
On Windows you can download something like ImDisk Toolkit and create a ramdisk with that.

php mutex for ram based wordpress cache in php

I'm trying to implement a cache for a high-traffic WP site in PHP. So far I've managed to store the results to a ramfs and load them directly from the .htaccess. However, during peak hours more than one process is generating a certain page, and this is becoming an issue.
I was thinking that a mutex would help, and I was wondering if there is a better way than system("mkdir cache.mutex").
The same lockfile-based Lock class approach from the previous question applies here verbatim; see above for the full class, the usage examples, and the RAM disk tip.
I agree with @gries: a reverse proxy is going to be a really good bang-for-the-buck way to get high performance out of a high-volume WordPress site. I've leveraged Varnish with quite a lot of success, though I suspect you can do so with nginx as well.

Executing functions in parallel

I have a function that needs to go over around 20K rows from an array and apply an external script to each. This is a slow process, as PHP waits for the script to be executed before continuing with the next row.
In order to make this process faster, I was thinking of running the function in different parts at the same time. So, for example, rows 0 to 2000 as one function, 2001 to 4000 in another one, and so on. How can I do this in a neat way? I could make different cron jobs, one for each function with different params: myFunction(0, 2000), then another cron job with myFunction(2001, 4000), etc., but that doesn't seem too clean. What's a good way of doing this?
If you'd like to execute parallel tasks in PHP, I would consider using Gearman. Another approach would be to use pcntl_fork(), but I'd prefer actual workers when it's task based.
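A minimal Gearman sketch, assuming the pecl/gearman extension and a gearmand server on localhost; the function name 'process_row' and the $rows variable are made up for the example.
// worker.php - run one of these per core
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
$worker->addFunction('process_row', function (GearmanJob $job) {
    $row = json_decode($job->workload(), true);
    // apply the external script to $row here
});
while ($worker->work());

// producer.php - queue all 20K rows as background jobs
$client = new GearmanClient();
$client->addServer('127.0.0.1');
foreach ($rows as $row) {
    $client->doBackground('process_row', json_encode($row));
}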
The only waiting time you suffer is between getting the data and processing the data. Processing the data is completely blocking anyway (you simply have to wait for it). You will not likely gain any benefit from increasing the number of processes past the number of cores that you have. Basically, I think this means the number of processes is small, so scheduling the execution of 2-8 processes doesn't sound that hideous. If you are worried about not being able to process data while retrieving data, you could in theory get your data from the database in small blocks and then distribute the processing load between a few processes, one for each core; a chunk-per-core fork along those lines is sketched below.
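A rough sketch of that chunk-per-core idea using pcntl_fork(); $rows, the core count, and the per-row work are stand-ins.
$rows = range(0, 19999); // stand-in for the real 20K-row array
$cores = 4;

foreach (array_chunk($rows, (int) ceil(count($rows) / $cores)) as $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    }
    if ($pid === 0) { // child: process one chunk, then exit
        foreach ($chunk as $row) {
            // apply the external script to $row here
        }
        exit(0);
    }
}

// parent: reap all children before exiting
while (pcntl_waitpid(-1, $status) > 0);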
I think I align more with the forking child processes approach for actually running the processing threads. There is a brilliant demonstration in the comments on the pcntl_fork doc page showing an implementation of a job daemon class
http://php.net/manual/en/function.pcntl-fork.php
<?php

declare(ticks=1);

//A very basic job daemon that you can extend to your needs.
class JobDaemon
{
    public $maxProcesses = 25;
    protected $jobsStarted = 0;
    protected $currentJobs = array();
    protected $signalQueue = array();
    protected $parentPID;

    public function __construct()
    {
        echo "constructed \n";
        $this->parentPID = getmypid();
        pcntl_signal(SIGCHLD, array($this, "childSignalHandler"));
    }

    /**
     * Run the Daemon
     */
    public function run()
    {
        echo "Running \n";
        for ($i = 0; $i < 10000; $i++) {
            $jobID = rand(0, 10000000000000);

            while (count($this->currentJobs) >= $this->maxProcesses) {
                echo "Maximum children allowed, waiting...\n";
                sleep(1);
            }

            $launched = $this->launchJob($jobID);
        }

        //Wait for child processes to finish before exiting here
        while (count($this->currentJobs)) {
            echo "Waiting for current jobs to finish... \n";
            sleep(1);
        }
    }

    /**
     * Launch a job from the job queue
     */
    protected function launchJob($jobID)
    {
        $pid = pcntl_fork();
        if ($pid == -1) {
            //Problem launching the job
            error_log('Could not launch new job, exiting');
            return false;
        } else if ($pid) {
            // Parent process
            // Sometimes you can receive a signal to the childSignalHandler function before this code executes if
            // the child script executes quickly enough!
            $this->currentJobs[$pid] = $jobID;

            // In the event that a signal for this pid was caught before we get here, it will be in our signalQueue array,
            // so let's go ahead and process it now as if we'd just received the signal
            if (isset($this->signalQueue[$pid])) {
                echo "found $pid in the signal queue, processing it now \n";
                $this->childSignalHandler(SIGCHLD, $pid, $this->signalQueue[$pid]);
                unset($this->signalQueue[$pid]);
            }
        } else {
            //Forked child, do your deeds....
            $exitStatus = 0; //Error code if you need to or whatever
            echo "Doing something fun in pid ".getmypid()."\n";
            exit($exitStatus);
        }
        return true;
    }

    public function childSignalHandler($signo, $pid = null, $status = null)
    {
        //If no pid is provided, that means we're getting the signal from the system. Let's figure out
        //which child process ended.
        if (!$pid) {
            $pid = pcntl_waitpid(-1, $status, WNOHANG);
        }

        //Make sure we get all of the exited children
        while ($pid > 0) {
            if ($pid && isset($this->currentJobs[$pid])) {
                $exitCode = pcntl_wexitstatus($status);
                if ($exitCode != 0) {
                    echo "$pid exited with status ".$exitCode."\n";
                }
                unset($this->currentJobs[$pid]);
            } else if ($pid) {
                //Oh no, our job has finished before this parent process could even note that it had been launched!
                //Let's make note of it and handle it when the parent process is ready for it.
                echo "..... Adding $pid to the signal queue ..... \n";
                $this->signalQueue[$pid] = $status;
            }
            $pid = pcntl_waitpid(-1, $status, WNOHANG);
        }
        return true;
    }
}
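A usage sketch for the daemon above; in a real setup, each forked child would do your per-row work where it currently prints a message.
$daemon = new JobDaemon();
$daemon->run();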
You can use "pthreads".
It is very easy to install and works great on Windows.
Download from here -> http://windows.php.net/downloads/pecl/releases/pthreads/2.0.4/
Extract the zip file, then:
move the file 'php_pthreads.dll' to the php\ext\ directory;
move the file 'pthreadVC2.dll' to the php\ directory.
Then add this line to your 'php.ini' file:
extension=php_pthreads.dll
Save the file, and you're done :-)
Now let's see an example of how to use it:
class ChildThread extends Thread {
    public $data;

    public function run() {
        /* Do some expensive work */
        $this->data = 'result of expensive work';
    }
}

$thread = new ChildThread();

if ($thread->start()) {
    /*
     * Do some expensive work, while already doing other
     * work in the child thread.
     */

    // wait until the thread is finished
    $thread->join();

    // we can now even access $thread->data
}
For more information about pthreads, read the PHP docs here:
PHP DOCS PTHREADS
If you're using WAMP like me, then you should add 'pthreadVC2.dll' into
\wamp\bin\apache\ApacheX.X.X\bin
and also edit the 'php.ini' file (same path) and add the same line as before:
extension=php_pthreads.dll
Good luck!
What you are looking for is parallel, which is a succinct concurrency API for PHP 7.2+:
$runtime = new \parallel\Runtime();

$future = $runtime->run(function() {
    for ($i = 0; $i < 500; $i++) {
        echo "*";
    }
    return "easy";
});

for ($i = 0; $i < 500; $i++) {
    echo ".";
}

printf("\nUsing \\parallel\\Runtime is %s\n", $future->value());
Output:
.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*..*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
Using \parallel\Runtime is easy
Have a look at pcntl_fork. This allows you to spawn child processes which can then do the separate work that you need.
Not sure if this is a solution for your situation, but you can redirect the output of system calls to a file; PHP will then not wait until the program is finished. This may result in overloading your server, though.
http://www.php.net/manual/en/function.exec.php - If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
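A sketch of that redirect-and-detach trick; the script path, $rows, and the per-row argument are made up for the example.
foreach ($rows as $row) {
    // Redirecting output and appending '&' detaches the process,
    // so exec() returns immediately instead of waiting for the script.
    exec('php /path/to/external_script.php ' . escapeshellarg($row) . ' > /dev/null 2>&1 &');
}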
There's Guzzle with its concurrent requests:
use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$client = new Client(['base_uri' => 'http://httpbin.org/']);

$promises = [
    'image' => $client->getAsync('/image'),
    'png'   => $client->getAsync('/image/png'),
    'jpeg'  => $client->getAsync('/image/jpeg'),
    'webp'  => $client->getAsync('/image/webp')
];

$responses = Promise\Utils::unwrap($promises);
There's the overhead of promises, and more importantly, Guzzle only works with HTTP requests. The API shown requires Guzzle 7+, which ships with frameworks like Laravel.
