Multithreaded File Processing in PHP with pthreads - php

I'm trying to create a script that processes a number of files simultaneously. The rule is that each file can only be processed once, and the input file is deleted after it has been processed. I created this script:
<?php
// Libraries for reading files
require_once "spooler.php";
// Configuration section ///////////////////////////////////////////////////////
$config["data"] = "data";
$config["threads"] = 20;
$config["timer"] = 1;
// Array to store currently processed files
$config["processed_files"] = array();
// Processing section //////////////////////////////////////////////////////////
$timer = 0;
$pool = new Pool($config["threads"], \ProcessingWorker::class);
while (true) {
// Read a number of files from the data folder according to the number of thread
$files = Spooler::read_spool_file($config["data"], $config["threads"]);
foreach ($files as $file) {
// Check if the file is already processed
if (in_array($file, $config["processed_files"])) continue;
// Submit the file to the worker
echo "Submitting $file\n";
$config["processed_files"][$file] = $file;
$pool->submit(new ProcessingJob($config, $file));
}
sleep($config["timer"]);
$timer++;
}
$pool->shutdown();
// Processing thread section ///////////////////////////////////////////////////
class ProcessingJob extends Stackable {
private $config;
private $file;
public function __construct($config, $file)
{
$this->config = $config;
$this->file = $file;
$this->complete = false;
}
public function run()
{
echo "Processing $this->file\n";
// Pretend we're doing something that takes time
sleep(mt_rand(1, 10));
file_put_contents("_LOG", $this->file."\n", FILE_APPEND);
// Delete the file
#unlink($this->file);
// Remove the file from the currently processing list
unset($this->config["processed_files"][$this->file]);
}
}
class ProcessingWorker extends Worker {
public function run() {}
}
However, this code doesn't work well. It doesn't process the same file twice, but it sometimes skips some files. Here's the list of files it should process, but it only processes these files.
Where am I doing it wrong?

Output to the log file isn't synchronized, so it's highly likely that two threads call file_put_contents on the log file concurrently and corrupt its output.
You should not write to a log file in this way.
If $config['processed_files'] is intended to be manipulated by multiple contexts, then it should be a thread-safe structure descended from pthreads, not a plain PHP array. As written, each ProcessingJob receives its own copy of $config, so the unset() in run() never changes the array that the main loop checks.
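One way to keep concurrent appends from interleaving is to pass LOCK_EX together with FILE_APPEND. The following is a minimal sketch, assuming every writer goes through the same helper and that the platform honors advisory locks. The helper name appendLogLine and the temporary log path are made up for illustration, and the sketch does not address the shared $config array.

```php
<?php
// Hypothetical helper: append one line to a log under an exclusive advisory lock.
function appendLogLine(string $logFile, string $line): bool
{
    // FILE_APPEND adds to the end of the file; LOCK_EX holds an exclusive
    // lock for the duration of the write.
    return file_put_contents($logFile, $line . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Usage with a throwaway file (the path is an assumption for the demo).
$logFile = tempnam(sys_get_temp_dir(), 'log');
appendLogLine($logFile, 'file_a.txt');
appendLogLine($logFile, 'file_b.txt');
echo file_get_contents($logFile);
unlink($logFile);
```

In the job above, the call inside run() would become appendLogLine("_LOG", $this->file).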

Related

When using inotify_read, pcntl_signal does not interrupt the inotify_read

I am building a log parser program in PHP. Log parser reads the data from the log file created by ProFTPD and then runs some actions if it detects specific commands. To be able to detect changes in the log file, I am using Inotify. If log file gets too large, I want to rotate the log by sending a signal to the log parser to finish processing the current file and then terminate the log parser. Logrotate would then restart the log parser again after it makes sure that the original file that is being read is emptied.
The problem is that when I use Inotify and when the inotify is in blocking state, the interrupts won't work.
For example:
#!/usr/bin/php -q
<?php
declare(ticks = 1);
$log_parser = new LogParser();
$log_parser->process_log();
class LogParser {
private $ftp_log_file = '/var/log/proftpd/proftpd.log';
# file descriptor for the log file
private $fd;
# inotify instance
private $inotify_inst;
# watch id for the inotifier
private $watch_id;
public function process_log() {
// Open an inotify instance
$this->inotify_inst = inotify_init();
$this->watch_id = inotify_add_watch($this->inotify_inst, $this->ftp_log_file, IN_MODIFY);
$this->fd = fopen($this->ftp_log_file, 'r');
if ($this->fd === false)
die("unable to open $this->ftp_log_file!\n");
pcntl_signal(SIGUSR1, function($signal) {
$this->sig_handler($signal);
});
while (1) {
# If the thread gets blocked here, the signals do not work
$events = inotify_read($this->inotify_inst);
while ($line = trim(fgets($this->fd))) {
// Parse the log ...
}
}
fclose($this->fd);
// stop watching our directory
inotify_rm_watch($this->inotify_inst, $this->watch_id);
// close our inotify instance
fclose($this->inotify_inst);
}
private function sig_handler($signo) {
switch ($signo) {
case SIGUSR1:
// Do some action ...
}
}
}
I know that one solution could be to start a parent process and add the signal handler there: the parent would start the log parser, and the log parser would get blocked by inotify_read while the parent would not. But I was wondering whether there is a solution that does not involve a parent process. Is inotify able to support interrupts?
Thanks
Found a solution here: php inotify blocking but with timeout
Final code:
#!/usr/bin/php -q
<?php
declare(ticks = 1);
$log_parser = new LogParser();
$log_parser->process_log();
class LogParser {
private $ftp_log_file = '/var/log/proftpd/proftpd.log';
# file descriptor for the log file
private $fd;
# inotify instance
private $inotify_inst;
# watch id for the inotifier
private $watch_id;
public function process_log() {
// Open an inotify instance
$this->inotify_inst = inotify_init();
stream_set_blocking($this->inotify_inst, false);
$this->watch_id = inotify_add_watch($this->inotify_inst, $this->ftp_log_file, IN_MODIFY);
$this->fd = fopen($this->ftp_log_file, 'r');
if ($this->fd === false)
die("unable to open $this->ftp_log_file!\n");
pcntl_signal(SIGUSR1, function($signal) {
$this->sig_handler($signal);
});
while (1) {
while (1) {
$r = array($this->inotify_inst);
$timeout = 60;
$w = array();
$e = array();
$time_left = stream_select($r, $w, $e, $timeout);
if ($time_left != 0) {
$events = inotify_read($this->inotify_inst);
if ($events) {
break;
}
}
}
while ($line = trim(fgets($this->fd))) {
// Parse the log ...
}
}
fclose($this->fd);
// stop watching our directory
inotify_rm_watch($this->inotify_inst, $this->watch_id);
// close our inotify instance
fclose($this->inotify_inst);
}
private function sig_handler($signo) {
switch ($signo) {
case SIGUSR1:
// Do some action ...
}
}
}
The suggested solution puts the inotify stream in non-blocking mode and waits in stream_select with a timeout, so signals are no longer blocked.
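The waiting step can be isolated into a small function. This is only a sketch: the name waitReadable is hypothetical, and a socket pair stands in for the inotify stream so that the code runs without the inotify extension.

```php
<?php
// Hypothetical helper: wait up to $timeoutSeconds for $stream to become readable.
// Returns true if data is ready, false on timeout or if the wait failed.
function waitReadable($stream, int $timeoutSeconds): bool
{
    $read = [$stream];
    $write = null;
    $except = null;
    // stream_select() modifies $read in place and returns the number of ready
    // streams, 0 on timeout, or false on error (for example when a signal
    // interrupts the call).
    $ready = @stream_select($read, $write, $except, $timeoutSeconds);
    return $ready !== false && $ready > 0;
}

// Demo: the socket pair is an assumption standing in for the inotify stream.
[$a, $b] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
var_dump(waitReadable($a, 0)); // nothing has been written yet
fwrite($b, "x");
var_dump(waitReadable($a, 1)); // one byte is waiting
fclose($a);
fclose($b);
```

Because the function returns after at most $timeoutSeconds, the surrounding loop regains control regularly and can check a flag set by the signal handler.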

PHP Eval alternative to include a file

I am currently running a queue system with beanstalk + supervisor + PHP.
I would like my workers to automatically die when a new version is available (basically code update).
My current code is as follows:
class Job1Controller extends Controller
{
public $currentVersion = 5;
public function actionIndex()
{
while (true) {
// check if a new version of the worker is available
$file = '/config/params.php';
$paramsContent = file_get_contents($file);
$params = eval('?>' . file_get_contents($file));
if ($params['Job1Version'] != $this->currentVersion) {
echo "not the same version, exit worker \n";
sleep(2);
exit();
} else {
echo "same version, continue processing \n";
}
}
}
}
When I update the code, the params file will change to a new version number, which will force the worker to terminate. I cannot use include, as the file will be loaded in memory in the while loop. Knowing that the file params.php isn't critical in terms of security, I wanted to know if there is another way of doing so.
Edit: the params.php looks as follows:
<?php
return [
'Job1Version' => 5
];
$params = require($file);
Since your file has a return statement, the returned value will be passed along.
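A minimal sketch of that idea, assuming the params file always ends in a return statement. The helper name loadParams and the temporary file are made up for the demo, and the OPcache call is a precaution that only matters if OPcache is active for the worker.

```php
<?php
// Hypothetical helper: load a PHP file that ends in `return [...]`
// and hand back the returned array.
function loadParams(string $file): array
{
    // With OPcache active, a cached compile could hide a fresh edit,
    // so drop any cached copy first.
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($file, true);
    }
    // Unlike require_once, require evaluates the file on every call.
    return require $file;
}

// Demo with a throwaway params file (the path is an assumption).
$file = tempnam(sys_get_temp_dir(), 'params');
file_put_contents($file, "<?php\nreturn ['Job1Version' => 5];\n");
echo loadParams($file)['Job1Version'], "\n";
file_put_contents($file, "<?php\nreturn ['Job1Version' => 6];\n");
echo loadParams($file)['Job1Version'], "\n";
unlink($file);
```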
After a few tests I finally managed to find a solution which doesn't require versioning anymore.
$reflectionClass = new \ReflectionClass($this);
$lastUpdatedTimeOnStart = filemtime($reflectionClass->getFileName());
while (true) {
clearstatcache();
$reflectionClass = new \ReflectionClass($this);
$lastUpdatedTime = filemtime($reflectionClass->getFileName());
if ($lastUpdatedTime != $lastUpdatedTimeOnStart) {
// An update has been made, exit
} else {
// worker hasn't been modified since running
}
}
Whenever the file is updated, the worker will automatically exit.
Thanks to @Rudie, who pointed me in the right direction.

How to touch a file and read the modification date in PHP on Linux?

I need to touch a file from within one PHP script and read the last time this file was touched from within another script, but no matter how I touch the file and read out the modification date, the modification date doesn't change, below is a test file.
How can I touch the log file and thus change the modification date, and then read this modification date?
class TestKeepAlive {
protected $log_file_name;
public function process() {
$this->log_file_name = 'test_keepalive_log.txt';
$this->_writeProcessIdToLogFile();
for ($index = 0; $index < 10; $index++) {
echo 'test' . PHP_EOL;
sleep(1);
touch($this->log_file_name);
$this->_touchLogFile();
$dateTimeLastTouched = $this->_getDateTimeLogFileLastTouched();
echo $dateTimeLastTouched . PHP_EOL;
}
}
protected function _touchLogFile() {
//touch($this->log_file_name);
exec("touch {$this->log_file_name}");
}
protected function _getDateTimeLogFileLastTouched() {
return filemtime($this->log_file_name);
}
protected function _writeProcessIdToLogFile() {
file_put_contents($this->log_file_name, getmypid());
}
}
$testKeepAlive = new TestKeepAlive();
$testKeepAlive->process();
You should use the function clearstatcache, found in the PHP Manual:
PHP caches the information those functions (filemtime) return in order to provide faster performance. However, in certain cases, you may want to clear the cached information. For instance, if the same file is being checked multiple times within a single script, and that file is in danger of being removed or changed during that script's operation, you may elect to clear the status cache. In these cases, you can use the clearstatcache() function to clear the information that PHP caches about a file.
Function:
protected function _getDateTimeLogFileLastTouched() {
clearstatcache();
return filemtime($this->log_file_name);
}
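As a standalone sketch of the same fix, the cache can be cleared for just the one file. The helper name freshMtime, the temporary file, and the fixed timestamps are assumptions for the demo.

```php
<?php
// Hypothetical helper: read the modification time without trusting PHP's stat cache.
function freshMtime(string $file): int
{
    // Clear the cached stat data (and realpath cache entry) for this file only.
    clearstatcache(true, $file);
    return filemtime($file);
}

// Demo with a throwaway file and explicit timestamps.
$file = tempnam(sys_get_temp_dir(), 'keepalive');
touch($file, 1000000000);
echo freshMtime($file), "\n";
touch($file, 1000000060);
echo freshMtime($file), "\n";
unlink($file);
```

Limiting the call to one file avoids throwing away cached data for other paths the script may be checking.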

PHP Counter Using OOP

I'm new to OOP terminology, and I am trying to create a class that implements a hit counter.
I tried the code below, but it just creates a counter.txt file containing the value 1. I don't know why it's not incrementing.
class LOGFILE {
public function READ($FileName) {
$handle = fopen($FileName, 'r');
$fread = file_get_contents($FileName);
return $fread;
fclose($handle);
}
public function WRITE($FileName, $FileData) {
$handle = fopen($FileName, 'w');
$FileData = $fread +1;
fwrite($handle, $FileData);
fclose($handle);
}
}
$logfile = new LOGFILE();
$logfile -> WRITE("counter.txt",$FileData);
echo $logfile -> READ("counter.txt");
The reason is that $fread is a local variable in both the READ and WRITE methods. You need to make it a private property of your class:
class LOGFILE {
private $fread;
public function READ($FileName) {
$this->fread = file_get_contents($FileName);
return $this->fread;
}
public function WRITE($FileName) {
$this->READ($FileName);
$handle = fopen($FileName, 'w');
$FileData = $this->fread +1;
fwrite($handle, $FileData);
fclose($handle);
}
}
$logfile = new LOGFILE();
$logfile -> WRITE("counter.txt");
echo $logfile -> READ("counter.txt");
Note: I have removed fopen and fclose from READ because file_get_contents does not need them. In WRITE you can use file_put_contents. I also removed the unused variable $FileData. It's always good practice to create variables, methods and classes only when they are needed.
Also take a look at best practices for naming your classes, variables, methods and so on. Here's the best guide, IMO.
Let's start going over the corrected code and see what was missing:
<?php
class LOGFILE {
public function READ($FileName) {
$handle = fopen($FileName, 'r');
$fread = fgets($handle, 8192);
fclose($handle);
return $fread;
}
public function WRITE($FileName, $FileData) {
$counter = $this->READ($FileName);
$handle = fopen($FileName, 'w');
fwrite($handle, $FileData + $counter);
fclose($handle);
}
}
$logfile = new LOGFILE();
$FileData = 1;
$logfile -> WRITE("counter.txt",$FileData);
echo $logfile -> READ("counter.txt")."\n";
$logfile -> WRITE("counter.txt",$FileData);
echo $logfile -> READ("counter.txt")."\n";
?>
use of fgets instead of file_get_contents in READ (you can choose to use file_get_contents but I rather stay consistent with the other function that uses fopen)
use of READ inside function WRITE (the principle of code reuse)
open of file with write permissions in WRITE: 'w'
init $FileData = 1;
no need to hold a private member: $fread
most important: do not write statements after return (like you did in READ) - statements that are written after return will not be executed!
This solution was tested successfully.
OOP should be used where it's needed. You need something simple, so there is no need for OOP.
<?php
function addValue($file='counter.txt', $amount=1) {
if( false == is_file($file) ) {
return false;
}
$initial = file_get_contents($file);
return @file_put_contents($file, $initial + $amount);
}
addValue();
?>
Test your OOP knowledge on something complex, like a shopping cart or some other concept.
EDIT // so, if you need a simple example that looks complex, here you go :)
<?php
class log {
public $file = '';
private $amount = 0;
public function __construct( $file ) {
$this->file = $file;
$this->amount = 1;
}
public function makeAdd() {
$initial = file_get_contents($this->file);
return @file_put_contents($this->file, $initial + $this->amount);
}
function __call($f, $args) {
switch( $f ) {
case 'add':
if(isset($args[0]) && !empty($args[0])) {
$this->amount = (int)$args[0];
}
if( $this->amount == 0 ) {
throw new Exception('Not a valid amount.');
}
return $this->makeAdd();
break;
}
}
}
try {
// create log
$L = new log('count.txt');
// this will add 2
var_dump($L->add(2));
// this will also add 2
var_dump($L->add());
// until you rewrite the amount
var_dump($L->add(1));
// final result -> 5
} catch(Exception $e) {
die($e->getMessage());
}
?>
Good luck!
Use UpperCamelCase for class names. LogFile, not LOGFILE. When you have a variable and the most interesting thing about it is that it's expected to hold a reference to something that is_a LogFile you should name it logFile.
Use lowerCamelCase for functions. read and write, not READ and WRITE
No spaces around the arrow operator
Code after a return statement in a method can never be reached, so delete it.
read() does not use the handle returned by fopen, so don't call fopen
the temp variable $fread doesn't help us understand the code, so we can lose it
read is a slightly unconventional name. If we rename the function to getCount it will be more obvious what it does.
You said you wanted to make a hit counter. So rename the class from LogFile to HitCounter, and the variable to hitCounter
the $FileData parameter to write doesn't get used because the variable is re-assigned inside the function. We can lose it.
The write method is supposed to add one to the number in the file. Write doesn't really express that. Rename it to increment.
Use a blank line between functions. The procedural code at the end should generally be in a separate file, but here we can just add a couple of extra lines. Delete the blanks between the last three lines of code.
Don't repeat yourself - we shouldn't have to mention 'counter.txt' more than once. OOP is all about combining data structures and behaviour into classes, so make a class private variable to hold the filename, and pass it via a constructor
$fread doesn't exist in the scope of increment, so we can't use it. This won't work. Replace it with a call to getCount()
Swap the first two lines of increment, so we're not doing two concurrent accesses to the same file, although we might be running inside a server that's running our script twice and still doing two concurrent accesses.
Rename the variable $FileData to $count, since that's what it is.
Replace the fopen,fwrite,fclose sequence with file_put_contents, since that does the same thing and is more succinct.
We don't need a closing ?> tag, since our php code continues to the end of the file.
That leaves us with:
<?php
class HitCounter {
private $fileName;
public function __construct($fileName){
$this->fileName = $fileName;
}
public function getCount() {
return file_get_contents($this->fileName);
}
public function increment() {
$count = $this->getCount() + 1;
file_put_contents($this->fileName, $count);
}
}
$hitCounter = new HitCounter("counter.txt");
$hitCounter->increment();
echo $hitCounter->getCount();
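The concurrent-access caveat mentioned in the list above can be reduced by holding an exclusive lock for the whole read-modify-write cycle. This is a sketch, not part of the answer's code: the function name incrementLocked is hypothetical, and it assumes all writers use the same function and that the platform honors advisory locks.

```php
<?php
// Hypothetical variant of increment() that keeps an exclusive lock while
// reading, adding one, and writing back.
function incrementLocked(string $fileName): int
{
    // 'c+' opens for reading and writing, creates the file if it is missing,
    // and does not truncate it.
    $handle = fopen($fileName, 'c+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $fileName");
    }
    try {
        if (!flock($handle, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $fileName");
        }
        // An empty file casts to 0, so the first hit yields 1.
        $count = (int) stream_get_contents($handle) + 1;
        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, (string) $count);
        fflush($handle);
        flock($handle, LOCK_UN);
        return $count;
    } finally {
        fclose($handle);
    }
}

// Demo with a throwaway counter file (the path is an assumption).
$file = tempnam(sys_get_temp_dir(), 'counter');
incrementLocked($file);
incrementLocked($file);
echo incrementLocked($file), "\n";
unlink($file);
```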
You can create a static counter and increment it each time (instead of creating a file):
<?php
class CountClass {
public static $counter = 0;
function __construct() {
self::$counter++;
}
}
new CountClass();
new CountClass();
echo CountClass::$counter;
?>

Flag for just create an empty file if not exists?

Which flag should I use to create a file if it does not exist? Please note that I'll close the pointer right after fopen(), because the "hard part" (decoding the encrypted content) is carried out by the load() function (decoding logic is not shown):
Class MyClass
{
protected $filename, $data;
public function __construct($filename)
{
$this->filename = $filename;
// Create if not exists
if(!file_exists($this->filename))
{
$fp = fopen($this->filename, '');
fclose($fp);
}
$this->load();
}
public function load()
{
$data = file_get_contents($this->filename);
$this->data = $data === false ? array() : $data;
}
}
wb is about all you'd need. Open a file for writing, truncate any file which already exists, set the file pointer to the start of this new file, and enable binary mode (which prevents PHP from translating line-ending characters on certain platforms like Windows).
'a+'; see the manual. Ensure that the permissions for the directory are OK.
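For comparison, here is a sketch that uses the 'c' mode, which creates a missing file but never truncates an existing one, so it is safe even without the file_exists() check. The helper name ensureFileExists and the temporary path are made up for the demo.

```php
<?php
// Hypothetical helper: make sure $filename exists without touching existing content.
function ensureFileExists(string $filename): bool
{
    // 'c' opens for writing, creates the file if it is missing,
    // and does not truncate it.
    $fp = fopen($filename, 'c');
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}

// Demo with a throwaway path (an assumption for illustration).
$path = sys_get_temp_dir() . '/ensure_' . uniqid() . '.dat';
var_dump(ensureFileExists($path)); // creates an empty file
file_put_contents($path, 'keep');
var_dump(ensureFileExists($path)); // leaves the content alone
echo file_get_contents($path), "\n";
unlink($path);
```

By contrast, 'w' and 'wb' truncate an existing file, so they are only safe behind a check such as the one in the constructor above.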
