PHP parent/child timing communication

I am working on an app that lets me log in to a remote telnet server and monitor statistics. The problem is that the report has a minimum refresh rate of 10 seconds, and that rate varies slightly depending on server load (the refresh limit belongs to the report itself, not the server). I need the front end of this system to refresh more often than every 10 seconds (at most every 5 seconds). I have mostly been able to accomplish this by forking, but eventually the timings synchronize (it is a gradual process, which I assume is due to remote server load).
Child code (I have removed quite a bit of the code that handles logging in and navigating to the report; it is basically key presses through 7 menus to reach the page that refreshes. I can include it if needed):
// 1 - Telnet Handshaking and initial screen load
$sHandshaking = '<IAC><WILL><OPT_BINARY><IAC><DO><OPT_BINARY><IAC><WILL><OPT_TSPEED><IAC><SB><OPT_TSPEED><IS>38400,38400<IAC><SE>';

// 2-4 - Keypresses to get to report (login, menus, etc. - Removed)

// Loop and cache
while(1){
    // 4 - View Report
    $oVT100->listen();
    $reference = $temp;
    $screen = $oVT100->getScreenFull();
    if(empty($screen)){
        Failed:
        echo "FAILED";
        file_put_contents($outFile,array('html'=>"<div class=\"header\"><font color='red'>Why are things always breaking?!<font color='red'></div>"));
        goto restartIt; // If screen does not contain valid report, assume logout and start at top
    }else{
        $screen = parseReport($oVT100->getScreenFull());
        $temp = json_decode($screen);
        // Check old report file, if different save, else sleep a bit
        $currentFile = file_get_contents($outFile);
        if($screen !== $currentFile){
            file_put_contents($outFile,$screen);
            sleep(5);
        }else{
            sleep(1);
            //usleep(500000);
        }
    }
}
As you can see from the child code above, it logs in and then loops over the report indefinitely, writing the screen out to a cache file whenever it differs from the existing file. As a dirty workaround to the refresh issue, I wrote a parent that forks the children:
$pids = array();
for($i = 0; $i < 2; $i++) {
    sleep(5);
    $pids[$i] = pcntl_fork();
    if(!$pids[$i]) {
        require_once(dirname(__FILE__).'/checkQueue.php');
        exit();
    }
}
for($i = 0; $i < 2; $i++) {
    pcntl_waitpid($pids[$i], $status, WUNTRACED);
}
I immediately noticed the timing drift and had to spawn three children instead of two to stay under 5 seconds, but the timing worked for a few days. Eventually all of the children were updating the file at the same time, and I had to restart the parent. What would be the best way to monitor the child processes so that I maintain a refresh interval of less than five seconds?
[EDIT]
This is not a running log; it is a file containing current call statistics for a call center. The data in the cache file needs to be no more than 5 seconds old at any time, but eventually all of the children sync up and write to the file at nearly the same moment. The issue isn't really with the local file; it is the inconsistent remote server response times, which eventually lead to the child processes retrieving their reports at the same time.

I'm hoping there's a better solution, but here's how I solved it - I feel kind of stupid for not having thought of this in the beginning:
$currentFile = file_get_contents($outFile);
if($screen !== $currentFile){
    if(time()-filemtime($outFile) > 2){
        file_put_contents($outFile,$screen);
        sleep(5);
    }else{
        echo "Failed, sleeping\r";
        sleep(2);
    }
}else{
    sleep(1);
    //usleep(500000);
}
Basically, I check the write time of the cache file, and if it was modified less than 2 seconds ago, I sleep and re-run the loop. I was hoping there would be a solution contained within the parent, but I guess this works...
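If the coordination should live closer to the children themselves rather than relying only on filemtime(), one option would be an exclusive lock around the write plus the same staleness check. This is only a sketch under the assumptions of the loop above ($outFile and $screen come from there); the .lock file name is made up:

$lockFp = fopen($outFile . '.lock', 'c');          // sketch: serialize writes with flock()
if ($lockFp && flock($lockFp, LOCK_EX)) {          // only one child writes at a time
    clearstatcache();
    $age = file_exists($outFile) ? time() - filemtime($outFile) : PHP_INT_MAX;
    if ($age > 2 && $screen !== file_get_contents($outFile)) {
        file_put_contents($outFile, $screen);      // fresh data and cache older than 2s: write it
        $wrote = true;
    } else {
        $wrote = false;                            // another child refreshed it recently
    }
    flock($lockFp, LOCK_UN);
    fclose($lockFp);
    sleep($wrote ? 5 : 2);                         // back off differently per outcome
}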


Long PHP script runs multiple times

I have a products database that synchronizes with product data every morning.
The process is very clear:
Get all products from database by query
Loop through all products, and get an XML from the other server by product_id
Update data from the XML
Log the changes to a file.
If I query a low number of items, for example limiting it to 500 random products, everything goes fine. But when I query all products, my script SOMETIMES goes on the fritz and starts looping multiple times. Hours later I still see my log file growing and products being added.
I checked everything I could think of, for example:
Are any variables reused without being overwritten? No.
Does the function call itself? No.
Does it happen with a low number of products too? No.
The script is called using a cronjob; are the settings OK? Yes.
What makes it especially weird is that it sometimes goes right and sometimes it doesn't. Could this be some memory problem?
EDIT
wget -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync — it is set up in Webmin, called at a specific hour and minute.
Code is hundreds of lines long...
Thanks
You have:
max_execution_time disabled. Your script won't be terminated; it runs for as long as it needs to complete the process.
memory_limit disabled. There is no limit to how much data can be stored in memory.
500 records were completed without issues. This indicates that the script completes its process before the next cronjob iteration. For example, if your cron runs every hour, then the 500 records are processed in less than an hour.
If you have a cronjob that is going to process a large amount of records, then consider adding a lock mechanism to the process: only allow the script to run once, and start again when the previous process is complete.
You can create a script lock as part of a shell script before executing your PHP script (see the flock sketch after the class below). Or, if you don't have shell access to your server, you can use a database lock within the PHP script, something like this:
class ProductCronJob
{
    protected $lockValue;

    public function run()
    {
        // Obtain a lock
        if ($this->obtainLock()) {
            // Run your script if you have a valid lock
            $this->syncProducts();
            // Release the lock on complete
            $this->releaseLock();
        }
    }

    protected function syncProducts()
    {
        // your long running script
    }

    protected function obtainLock()
    {
        $time = new \DateTime;
        $timestamp = $time->getTimestamp();
        $this->lockValue = $timestamp . '_syncProducts';

        $db = JFactory::getDbo();

        $lock = [
            'lock' => $this->lockValue,
            'timemodified' => $timestamp
        ];

        // lock = '0' indicates that the cronjob is not active.
        // Update #__cronlock set lock = '', timemodified = '' where name = 'syncProducts' and lock = '0'
        // $result = $db->updateObject('#__cronlock', $lock, 'id');

        // $lock = SELECT * FROM #__cronlock where name = 'syncProducts';

        if ($lock !== false && (string)$lock !== (string)$this->lockValue) {
            // Currently there is an active process - can't start a new one
            return false;

            // You can return false as above or add extra logic as below

            // Check the current lock age - how long it has been running for
            // $diff = $timestamp - $lock['timemodified'];
            // if ($diff >= 25200) {
            //     // The current script has been active for 7 hours.
            //     // You can change 25200 to any number of seconds you want.
            //     // Here you can send a notification email to the site administrator.
            //     // ...
            // }
        }

        return true;
    }

    protected function releaseLock()
    {
        // Update #__cronlock set lock = '0' where name = 'syncProducts'
    }
}
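For the shell-script variant mentioned above, a minimal sketch using flock(1) around the existing wget call might look like this (the lock-file path is a placeholder):

#!/bin/bash
# Sketch: only start the sync if no other instance currently holds the lock.
flock -n /tmp/sync_products.lock \
    wget -q -O /dev/null "http://example.eu/xxxxx/cron.php?operation=sync" \
    || echo "previous sync still running, skipping this run"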
Your script is running for quite some time (~45 min) and wget thinks it is "timing out" since you don't return any data. By default wget has a 900-second timeout value and a retry count of 20. So first you should probably change your wget command to prevent this:
wget --tries=0 --timeout=0 -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync
Now, removing the timeout could lead to other issues, so instead you could send some data from your script (and flush it to force the web server to send it) to make sure wget doesn't think the script "timed out", for example every 1000 loops or so. Think of this as a progress bar...
Just keep in mind that you will hit an issue when the run time gets close to your cron period, because two crons will then run in parallel. You should optimize your process and/or maybe add a lock mechanism?
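A minimal sketch of that keep-alive idea inside the product loop might look like this (the loop variable is a placeholder for whatever the real script iterates over):

$i = 0;
foreach ($productsToSync as $product) {
    // ... existing per-product work here ...
    if (++$i % 1000 === 0) {
        echo '.';       // any small payload keeps wget from seeing a "silent" response
        flush();        // push it through PHP's buffer
        if (ob_get_level() > 0) {
            ob_flush(); // and through any active output buffer
        }
    }
}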
I see two possibilities:
- cron calls the script much more often than intended
- the script somehow takes too long.
You can try to estimate the time a single iteration of the loop takes.
This can be done with time(). Perhaps the result is surprising, perhaps not. You can probably get the number of results too; multiply the two and you will have an estimate of how long the whole process should take.
$productsToSync = $db->loadObjectList();
and
foreach ($productsToSync AS $product) {
It seems you load every result into an array. This won't work for huge databases, because obviously a million rows won't fit in memory. You should fetch one result at a time. With MySQL there are methods that fetch just one row at a time from the resource; I hope your database layer allows the same.
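As an illustration of the row-at-a-time idea with plain mysqli (not the Joomla database layer used in the script), an unbuffered query keeps only one row in PHP memory at a time; connection details, table and column names here are made up:

// Sketch: stream rows instead of loading them all into an array.
$mysqli = new mysqli('localhost', 'user', 'pass', 'shop');
$mysqli->real_query('SELECT product_id FROM products');
$result = $mysqli->use_result();         // unbuffered: rows stay on the server

while ($row = $result->fetch_assoc()) {
    syncProduct($row['product_id']);     // hypothetical per-product worker
}
$result->free();
$mysqli->close();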
I also see you execute another query on each iteration of the loop. This is something I try to avoid. Perhaps you can move this to after the first query has finished and do all of those in one big query? On the other hand, this may conflict with my first suggestion.
Also, if something goes wrong, be paranoid when debugging: measure as much as you can, and time as much as you can when it's a performance issue. Put the timings in your log file; usually you will find the bottleneck.
I solved the problem myself. Thanks for all the replies!
My MySQL connection timed out; that was the problem. As soon as I added:
ini_set('mysql.connect_timeout', 14400);
ini_set('default_socket_timeout', 14400);
to my script, the problem stopped. I really hope this helps someone. I'll upvote all the locking answers, because those were very helpful!

creating schedule task without Cron job

A scheduled task needs to be created, but it's not possible to use a cron job (there is a warning from the hosting provider that "running the cron job more than once within a 45-minute period is an infraction of their rules and could result in shutting down the account").
A PHP script (which inserts data from a txt file into a MySQL database) should be executed every minute, i.e. this link should be called: http://www.myserver.com/ImportCumulusFile.php?type=dayfile&key=letmein&table=Dayfile&file=./data/Jan10log.txt
Is there any other way?
There are multiple ways of doing repetitive jobs. Some of the ways that I can think about right away are:
Using: https://www.setcronjob.com/
Use an external site like this to fire off your URL at set intervals
Using a meta refresh. You'd have to open the page and leave it running.
JavaScript/Ajax refresh. Similar to the above example.
Setting up a cron job. Most shared hosts do provide a way to set up cron jobs. Have a look at the cPanel of your hosting.
If you have shell access you could execute a PHP script via the shell.
Something like this would be an endless loop that sleeps 60 seconds, executes, collects garbage, and repeats until the end of time.
while(true) {
    sleep(60);
    //script here
    gc_collect_cycles(); // collect garbage, as described above
    //end your script
}
Or you could do a "poor man's cron" with AJAX or a meta refresh. I've done it before: you just place a redirect with either JavaScript or HTML's meta refresh at the beginning of your script, access the script from your browser, and leave the tab open. It'll refresh every 60 seconds, just like a cronjob.
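A minimal sketch of that approach (the included job script is a placeholder; a Refresh header is used here, which behaves like HTML's meta refresh):

<?php
// Sketch: "poor man's cron" page that re-runs itself every 60 seconds
// for as long as it stays open in a browser tab.
header('Refresh: 60');                // equivalent to <meta http-equiv="refresh" content="60">

require __DIR__ . '/your_job.php';    // placeholder for the real work
echo 'Last run: ' . date('Y-m-d H:i:s');
?>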
Yet another alternative to a cronjob would be a bash script such as:
#!/bin/bash
while :
do
    sleep 60
    wget http://127.0.0.1/path/to/cronjob.php -O Temp --delete-after
done
All this being said, you will probably get caught by the host and get terminated anyway.
So your best solution:
Go sign up for a 5-10 dollar a month VPS, and say goodbye to shared hosting and hello to running your own little server.
If you do this, you can even stop using crappy PHP and use Facebook's HHVM instead and enjoy its awesome performance.
There's a free service at
http://cron-job.org
that lets you set up a nice little alternative.
Option A
An easy way to realize it would be to create a file or database entry containing the next execution time of each of your PHP scripts:
<?php
// crons.php
return [
    'executebackup.php' => 1507979485,
    'sendnewsletter.php' => 1507999485
];
?>
And on every request made by your visitors you check the current time, and if it is past the stored time you include your PHP script:
<?php
// cronpixel.php
$crons = @include 'cache/crons.php';

foreach ($crons as $script => $time) {
    if ($time < time()) {
        // create lock to avoid race conditions
        $lock = 'cache/' . md5($script) . '.lock';
        if (file_exists($lock) || !mkdir($lock)) {
            continue;
        }

        // start your php script
        include($script);

        // now update crons.php
        $crons[ $script ] += 86400; // tomorrow
        file_put_contents('cache/crons.php', '<?php return ' . var_export($crons, true) . '; ?' . '>');

        // finally delete lock
        rmdir($lock);
    }
}

header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");

// image data
$im = imagecreate(1, 1);
$blk = imagecolorallocate($im, 0, 0, 0);
imagecolortransparent($im, $blk);

// image output
header("Content-type: image/gif");
imagegif($im);

// free memory
imagedestroy($im);
?>
Note: It will rarely be called at the exact second, because you do not know when a visitor will open your page (maybe 2 seconds later). So it makes sense not to set the new time for the next day by adding 86400 seconds; use mktime instead.
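For example, something like this (a sketch; midnight as the target time is just an assumption) re-anchors the next run to a fixed time of day instead of letting the small delays accumulate:

// Sketch: schedule the next run for 00:00 tomorrow instead of "now + 86400".
$crons[$script] = mktime(0, 0, 0, (int)date('n'), (int)date('j') + 1);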
Option B
This is a little project I realized in the past. It is similar to #r3wt's idea, but it covers race conditions and runs at exact times like a cronjob would in a scheduler, without hitting max_execution_time. And it works most of the time without needing to be resurrected (as is done through visitors in Option A).
Explanation:
The script writes a lock file (to avoid race conditions) for the 15th, 30th, 45th and 60th second of a minute:
// cron monitoring
foreach ($allowed_crons as $cron_second) {
    $cron_filename = 'cache/' . $cron_second . '_crnsec_lock';

    // start missing cron requests
    if (!file_exists($cron_filename)) {
        cron_request($cron_second);
    }
    // restart interrupted cron requests
    else if (filemtime($cron_filename) + 90 < time()) {
        rmdir($cron_filename);
        cron_request($cron_second);
    }
}
Every time a lock file is missing the script creates it and uses sleep() to reach the exact second:
if (file_exists($cron_filename) || !mkdir($cron_filename)) {
    return;
}

// add one minute if necessary
$date = new DateTime();
$cron_date = new DateTime();
$cron_date->setTime($cron_date->format('H'), $cron_date->format('i'), $sec);

$diff = $date->diff($cron_date);
if ($diff->invert && $diff->s > 0) {
    $cron_date->setTime($cron_date->format('H'), $cron_date->format('i') + 1, $sec);
}

$diff = $date->diff($cron_date);
// we use sleep() as time_sleep_until() starts one second too early (https://bugs.php.net/bug.php?id=69044)
sleep($diff->s);
After waking up again, it sends a request to itself through fopen():
// note: filter_input returns the unchanged SERVER var (http://php.net/manual/de/function.filter-input.php#99124)
// note: filter_var is insecure (http://www.d-mueller.de/blog/why-url-validation-with-filter_var-might-not-be-a-good-idea/)
$url = 'http' . isSecure() . '://' . filter_input(INPUT_SERVER, 'HTTP_HOST', FILTER_SANITIZE_URL) . htmlspecialchars($request_uri, ENT_QUOTES, 'UTF-8');

$context = stream_context_create(array(
    'http' => array(
        'timeout' => 1.0
    )
));

// note: returns "failed to open stream: HTTP request failed!" because timeout < time_sleep_until
if ($fp = @fopen($url, 'r', false, $context)) {
    fclose($fp);
}

rmdir($cron_filename);
In that way it calls itself indefinitely, and you are able to define different starting times:
if (isset($_GET['cron_second'])) {
    if ($cron_second === 0 && !(date('i') % 15)) {
        mycron('every 15 minutes');
    }
    if ($cron_second === 0 && !(date('i') % 60)) {
        mycron('every hour');
    }
}
Note: It produces 5760 requests per day (4 per minute). Not much, but a cronjob uses far fewer resources. If your max_execution_time is high enough, you could change it to call itself only once per minute (1440 requests/day).
I understand that this question is a bit old, but I stumbled on it a week ago with this very question, and the best and most secure option we found was using a web service.
Our context:
We have our system in both shared hosting and private clouds.
We need a script to be activated once a month (there are plans to create more schedules and to allow users to create some predetermined actions).
Our system is used by many clients, so whenever anyone uses the system it calls a web service via AJAX and doesn't care about the response (after all, everything is logged in our database and must run without user interaction).
What we've done is:
1 - An AJAX call is fired upon access to any major screen.
2 - The web service reads a schedule table in our database and calls whatever needs calling.
3 - To avoid many stacked web service calls, we check the datetime against a 10-minute interval before actually performing any actions (a sketch of this check follows below).
That's also a way to distribute the load, and the schedules don't interfere with user interaction.
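A minimal sketch of that check (the table and column names are made up; the real system presumably has its own schema):

<?php
// Sketch: endpoint hit by the AJAX call. It performs the scheduled action only
// if it is due AND has not already been started in the last 10 minutes, so
// simultaneous visitors cannot stack up calls.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$claim = $pdo->prepare(
    'UPDATE schedules
        SET last_run = NOW()
      WHERE name = :name
        AND next_run <= NOW()
        AND last_run < NOW() - INTERVAL 10 MINUTE'
);
$claim->execute([':name' => 'monthly_report']);

if ($claim->rowCount() === 1) {
    runMonthlyReport(); // hypothetical action; it would also advance next_run
}
?>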

PHP ends script execution automatically

Firstly I want to give you the basic idea of what I am trying to do:
I'm trying to make a free web hosting service do some work for me. I've created one PHP page and a MySQL database. The basic idea behind my PHP page is a while loop conditioned on $shutdown, with a counter inside the loop so I can track whether the code is still running.
<?php
/*
Connect to database etc. etc
*/
$shutdown = false;

// Main loop
while (!$shutdown)
{
    // Check for user shutdown request
    $strq = "SELECT * FROM TB_Shutdown;";
    $result = mysql_query($strq);
    $row = mysql_fetch_array($result);
    if ($row[0] == "true")
    {
        $shutdown = true; // I know this statement is useless but nevermind
        break;
    }

    // Increase counter
    $strq = "SELECT * FROM TB_Counter;";
    $result = mysql_query($strq);
    $row = mysql_fetch_array($result);
    if (intval($row[0]) == 60)
    {
        // Reset counter
        $strq = "UPDATE TB_Counter SET value = 0";
        $result = mysql_query($strq);
        /*
        I have some code to do some work here; it's not important, just curl stuff
        */
    }
    else
    {
        // Increase counter
        $strq = "UPDATE TB_Counter SET value = " . (intval($row[0]) + 1);
        $result = mysql_query($strq);
    }

    /*
    I have some code to do some work here; it's not important, just curl stuff
    */

    // Sleep
    sleep(1);
}
?>
And I have a check.php which returns me the value from TB_Counter.
The problem: I'm tracking the TB_Counter table every second, and it stops after a while. If I close my web browser (from which I called my main PHP loop page), it stops after about 2 minutes. If I don't, after 5-7 minutes I get a "connection has been reset" error in the browser and the loop stops.
What should I do to make my loop last forever?
You need to allow PHP to execute completely. There is an option in the PHP.INI file which says:
max_execution_time = 30;
This sets the maximum time in seconds a script is allowed to run
before it is terminated by the parser. This helps prevent poorly
written scripts from tying up the server. The default setting is 30.
When running PHP from the command line the default setting is 0.
The function set_time_limit:
Set the number of seconds a script is allowed to run. If this is
reached, the script returns a fatal error. The default limit is 30
seconds or, if it exists, the max_execution_time value defined in the
php.ini.
To check if PHP is running in safe mode, you can use this:
echo $phpinfo['PHP Core']['safe_mode'][0]
If it is going to be a huge process, you can consider running it via cron as a cronjob. A small explanation of it:
Cron is very simply a Linux module that allows you to run commands at predetermined times or intervals. In Windows, it’s called Scheduled Tasks. The name Cron is in fact derived from the same word from which we get the word chronology, which means order of time.
Using Cron, a developer can automate such tasks as mailing ezines that might be better sent during an off-hour, automatically updating stats, or the regeneration of static pages from dynamic sources. Systems administrators and Web hosts might want to generate quota reports on their clients, complete automatic credit card billing, or similar tasks. Cron has something for everyone!
Read more about Cron
You could use the PHP function set_time_limit().
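For example, a sketch of lifting the limit at the top of the looping script (0 means no limit; pairing it with ignore_user_abort() is an extra assumption, aimed at the browser-closing part of the problem):

set_time_limit(0);        // no execution time limit
ignore_user_abort(true);  // keep running even if the client disconnects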
You should not handle this from a browser. Running a cron every minute to do the checks you need would be a better solution.
Apart from that, why would you update every second? Just write down a timestamp so you know when a request was made.
Making something run forever is not doable. More important is to ensure that your business process keeps running. So maybe it would be wise to state your business case here; you seem to need to count seconds and do something within a minute, but it's not totally clear. So what do you need to do?

Multiple "agents" handling a single array

Apologies if this has been covered before; I did my searching, but I may not know the correct terms to have used.
This process is handled with PHP.
Here's the situation:
I have a large array of file names. The script I have opens these files and enters their content into a database. Processing these files one at a time takes over 24 hours, and these files are updated on a daily basis.
Breaking the single large array into four smaller arrays and running concurrent processes finishes the job before the 24 hour window elapses, but sometimes one or two processes will finish hours before the others because file sizes vary on a daily basis.
Much like people who stock retail shelves (who else has worked that nightmare before?) pitch in to help out with what's left after finishing their own tasks, I'd like to have a script in place where these "agents" do the same.
Here are some basics of what I have figured out - it could be wrong, and I'm not too proud to protest if I am :-)
$files = array('file1','file2','file3','file4','file5');
//etc... on to over 4k elements

while($file = array_pop($files)){
    //Something in here... I have no idea what.
}
Ideas? Something like four function calls or four loops within that overarching 'while' has crossed my mind, but I'm pretty sure it's going to wait on executing subsequent calls until the previous one(s) finish.
Any help is appreciated. I'm seriously stuck on this one!
Thanks!
A database-backed message queue seems the obvious solution but I think that's overkill in this case. I would simply put the files to be processed into a single dedicated queue directory, then use the DirectoryIterator class to scan it. Something like this:
// $queueDir, $workDir and processFile() are placeholders for your own setup
while (true) {
    $next = null; // look in the queue directory for a file
    foreach (new DirectoryIterator($queueDir) as $entry) {
        if ($entry->isFile()) { $next = $entry->getFilename(); break; }
    }
    if ($next === null) break; // none found: exit the script, all processing is done
    // rename/move it to a work directory; if that fails, another worker got it first
    if (@rename("$queueDir/$next", "$workDir/$next")) {
        processFile("$workDir/$next"); // process the file
    }
}
Edit:
Regarding launching the workers, you could use a simple shell script to spawn the PHP processes in the background:
NUM_WORKERS=5
for WORKER in $(seq 1 ${NUM_WORKERS})
do
    echo "starting worker ${WORKER}"
    php -f /path/to/my/process.php &
done
Then, create a cron entry to run this launcher, for example, at midnight:
0 0 * * * /path/to/launcher.sh
You want what's called a "message queue", something like beanstalkd.
You'll basically create a list of messages that include your individual filenames. You'll then create a set of processors to process them. Each processor will handle one file, then go back to the queue to see if there are more messages/files waiting to be processed.
EDIT:
Here's an analogy to help explain message queues. Your first idea is like a human manager taking a stack of files, dividing them into four piles and then handing each of his four employees a pile to process. A message queue is more like this: the manager puts all the files on a table and tells each employee to take a single file from the table and process it. He tells them when they're done with the first file to keep taking files until there are no more files on the table. When all the files are done, the employees can go home.
One employee might end up with really large files and only handle a few, while another employee might get smaller files and handle many. It doesn't matter how many each employee handles, they'll all keep working until the table is empty.
I would have a socket server master script that hands out file paths to x number of slave scripts, until there are no files left to process. This way, all the slave scripts will keep running, and you can hand out file paths dynamically as they are requested.
Something like this:
master.php
<?php
// load the array of files to process (however you do this)
$fileList = file('filelist.txt');

// Create a listening socket on localhost
$serverSocket = stream_socket_server('tcp://127.0.0.1:7878');
$sockets = array($serverSocket);
$clients = array();

// Loop while there are still files to process
while (count($fileList)) {

    // Run a select() call on the existing sockets' read buffers
    // Skip to next iteration if no sockets are waiting for handling
    if (stream_select($read = $sockets, $write = NULL, $except = NULL, 1) < 1) {
        continue;
    }

    // Loop sockets with data to read
    foreach ($read as $socket) {
        if ($socket == $serverSocket) {

            // Accept new clients
            $sockets[] = $clients[] = stream_socket_accept($serverSocket);

        } else if (trim(fgets($socket)) == 'next') {

            // Hand out a new file path to the client
            fwrite($socket, array_shift($fileList)."\n");
            if (!count($fileList)) {
                break 2;
            }

        }
    }
}

// When we're done, disconnect the clients
foreach ($clients as $socket) {
    @fclose($socket);
}

// ...and close the listen socket
@fclose($serverSocket);
slave.php
<?php
$socket = fsockopen('127.0.0.1', 7878);

while (!feof($socket)) {

    // Get a new file path from the master
    fwrite($socket, "next\n");
    $path = trim(fgets($socket));

    if (is_file($path)) {
        // Process the file at $path here
    }

}
You then just need to start master.php, and once it is running you can start as many instances of slave.php as you want; they will all keep running until there are no more files to process.
Obviously, this has no error handling, but it should provide a basic framework to get you started. This relies on blocking function calls (stream_select() and fgets()) to avoid a race condition - this may or may not be sufficient for your purposes.

Speeding up a PHP App

I have a list of data that needs to be processed. The way it works right now is this:
A user clicks a process button.
The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.
This takes way too long. What I'd like instead is that:
The user clicks the process button.
A PHP script takes the first item and starts to process it.
Simultaneously another instance of the script takes the next item and processes it.
And so on, so around 5-6 of the items are being processed simultaneously and we get 6 items processed in 15-25 secs instead of just one.
Is something like this possible?
I was thinking that I could use cron to launch an instance of the script every second. All items that need to be processed are flagged as such in the MySQL database, so whenever an instance is launched through cron, it will simply take the next item flagged to be processed and remove the flag.
Thoughts?
Edit: To clarify something, each 'item' is stored as a separate row in a MySQL database table. Whenever processing starts on an item, it is flagged as being processed in the DB, so each new instance will simply grab the next row that is not being processed and process it. Hence I don't have to supply the items as command-line arguments.
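A minimal sketch of that claim-the-next-row step (table and column names are made up); doing the claim in a single UPDATE means two concurrent instances can never grab the same row:

// Sketch: atomically claim one pending item, then process it.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$token = uniqid('worker_', true);

$claim = $pdo->prepare("UPDATE items SET status = 'processing', claimed_by = :t
                         WHERE status = 'pending' ORDER BY id LIMIT 1");
$claim->execute([':t' => $token]);

if ($claim->rowCount() === 1) {
    $item = $pdo->query("SELECT * FROM items WHERE claimed_by = " . $pdo->quote($token))
                ->fetch(PDO::FETCH_ASSOC);
    processItem($item); // hypothetical 15-25 second job
    $pdo->exec("UPDATE items SET status = 'done' WHERE id = " . (int)$item['id']);
}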
Here's one solution; it's not the greatest, but it will work fine on Linux:
Split the processing PHP into a separate CLI script in which:
The command line inputs include `$id` and `$item`
The script writes its PID to a file in `/tmp/$id.$item.pid`
The script echoes results to stdout as XML or something that can be read back into PHP
When finished the script deletes the `/tmp/$id.$item.pid` file
Your master script (presumably on your webserver) would do:
`exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml");` for each item
Poll the `/tmp/$id.$item.pid` files until all are deleted (sleep/check poll is enough)
If they are never deleted kill all the processing scripts and report failure
If successful read the from `/tmp/$id.$item.xml` for format/output to user
Delete the XML files if you don't want to cache for later use
A backgrounded, nohup-started application will run independently of the script that started it.
This interested me sufficiently that I decided to write a POC.
test.php
<?php
$dir = realpath(dirname(__FILE__));
$start = time();

// Time in seconds after which we give up and kill everything
$timeout = 25;

// The unique identifier for the request
$id = uniqid();

// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");

// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
    exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}

echo "<pre>";

// Run until timeout or all processing has finished
while(time() - $start < $timeout)
{
    echo (time() - $start), " seconds\n";
    clearstatcache(); // Required since PHP will cache for file_exists

    $running = array();
    foreach($items as $item)
    {
        // If the pid file still exists the process is still running
        if (file_exists("$dir/proc.$id.$item.pid")) {
            $running[] = $item;
        }
    }

    if (empty($running)) break;

    echo implode(',', $running), " running\n";
    flush();
    sleep(1);
}

// Clean up if we time out
if (!empty($running)) {
    clearstatcache();
    foreach ($items as $item) {
        // Kill process of anything still running (i.e. that has a pid file)
        if(file_exists("$dir/proc.$id.$item.pid")
            && $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
            posix_kill($pid, 9);
            unlink("$dir/proc.$id.$item.pid");

            // Would want to log this in the real world
            echo "Failed to process: ", $item, " pid ", $pid, "\n";
        }

        // delete the useless data
        unlink("$dir/proc.$id.$item.out");
    }
} else {
    echo "Successfully processed all items in ", time() - $start, " seconds.\n";
    foreach ($items as $item) {
        // Grab the processed data and delete the file
        echo(file_get_contents("$dir/proc.$id.$item.out"));
        unlink("$dir/proc.$id.$item.out");
    }
}

echo "</pre>";
?>
proc.php
<?php
$dir = realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];

// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());

for($i = 0; $i < 80; ++$i)
{
    echo $item, ':', $i, "\n";
    usleep(250000);
}

// Remove our pid file to say we're done processing
unlink("$dir/proc.$id.$item.pid");
?>
Put test.php and proc.php in the same folder of your server, load test.php and enjoy.
You will of course need nohup (unix) and PHP cli to get this to work.
Lots of fun, I may find a use for it later.
Use an external work queue like beanstalkd, which your PHP script writes a bunch of jobs to. You then have as many worker processes pulling jobs from beanstalkd and processing them as fast as possible; you can spin up as many workers as you have memory/CPU for. Your job body should contain as little information as possible, maybe just some IDs that you hit the DB with. beanstalkd has a slew of client APIs and itself has a very basic API; think memcached.
We use beanstalkd to process all of our background jobs, and I love it. It's easy to use and very fast.
There is no multithreading in PHP; however, you can use fork.
php.net:pcntl-fork
Or you could execute a system() command and start another process which is multithreaded.
Can you implement threading in JavaScript on the client side? It seems to me I've seen a JavaScript library (from Google perhaps?) that implements it. Google it and I'm sure you'll find something. I've never done it, but I know it's possible. Anyway, your client-side JavaScript could activate (via AJAX) a PHP script once for each item in separate threads. That might be easier than trying to do it all on the server side.
-don
If you are running a high traffic PHP server you are INSANE if you do not use Alternative PHP Cache: http://php.net/manual/en/book.apc.php . You do not have to make code modifications to run APC.
Another useful technique that can work along with APC is using the Smarty template system which allows you to cache output so that pages do not have to be rebuilt.
To solve this problem, I've used two different products: Gearman and RabbitMQ.
The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that if you have multiple machines, they can all participate in processing items off the queue(s).
Gearman is easier to set up, so I'd suggest poking around with it a bit first. If you find you need something more heavy-duty in terms of queue robustness, look into RabbitMQ.
http://www.danga.com/gearman/
http://pear.php.net/package/Net_Gearman (PEAR library)
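As a rough illustration of the Gearman approach with PHP's pecl/gearman extension (the 'process_item' function name and the payload are made up), the worker runs as a long-lived CLI process and the client just queues jobs:

worker.php
<?php
// Sketch: a long-running worker pulling jobs from Gearman.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1');
$worker->addFunction('process_item', function (GearmanJob $job) {
    $itemId = $job->workload();  // keep the payload small, e.g. just an ID
    // ... load the item from the DB and do the 15-25 second processing here ...
    return "done:$itemId";
});
while ($worker->work());         // block, handling one job after another
?>
client.php
<?php
// Sketch: queue one background job per item; $itemIds would come from the DB.
$client = new GearmanClient();
$client->addServer('127.0.0.1');
foreach ($itemIds as $itemId) {
    $client->doBackground('process_item', (string)$itemId);
}
?>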
You can use pcntl_fork() and family to fork a process; however, you may need something like IPC to communicate back to the parent process that the child process (the one you forked) is finished.
You could have them write to shared memory, for example via memcache or a DB.
You could also have the child processes write the completed data to files that the parent process keeps checking: as each child process completes, its file is created/written/updated, and the parent process can grab them one at a time and throw them back to the caller/client.
The parent's job is to control the queue, to make sure the same data isn't processed twice, and also to sanity-check the children (better to kill that runaway process and start over, etc.).
Something else to keep in mind: on Windows platforms you are going to be severely limited; I don't even think you have access to pcntl_* unless you compiled PHP with support for it.
Also, can you cache the data once it's been processed, or is it unique data every time? That would surely speed things up...?
