I am trying to create a daemon using the System_Daemon package with CodeIgniter's CLI. This is a new area for me and I'm struggling.
Here's what I have:
A CI controller that injects messages into an AWS SQS queue (thanks to coccodrillo for providing excellent instructions on how to integrate the AWS SDK into CI; see: Integrating AWS SDK as a library in Codeigniter).
A CI controller that receives messages from the queue, writes them out to a log file and then deletes them from the queue.
I would like to have a CI daemon which will listen to this queue, receive messages when they are there, do something useful with each message and then delete it. So I started with the example in the documentation for System_Daemon and added in the CI code from the receiving program. See the code below.
Is this the right thing to do? Can you guide me into doing this the "right way"? I've trawled the various knowledgeable forums and have come up short... Help me please!
Mmiz
#!/usr/bin/php -q
<?php
// Make it possible to test in source directory
// This is for PEAR developers only
ini_set('include_path', ini_get('include_path').':..');
// Include Class
error_reporting(E_ALL);
require_once "System/Daemon.php";
// Bare minimum setup
System_Daemon::setOption("appName", "receiveaws");
System_Daemon::setOption("logLocation","/tmp/log/receiveaws.log");
System_Daemon::setOption("appPidLocation","/tmp/log/receiveaws/receiveaws.pid");
System_Daemon::log(System_Daemon::LOG_INFO, "Daemon not yet started so this will be written on-screen");
// Spawn Daemon!
System_Daemon::start();
System_Daemon::log(System_Daemon::LOG_INFO, "Daemon: '".
    System_Daemon::getOption("appName").
    "' spawned! This will be written to ".
    System_Daemon::getOption("logLocation"));
System_Daemon::log(System_Daemon::LOG_WARNING, 'My php code starting');
class Receiveaws extends CI_Controller {
    public function index() {
        if ($this->input->is_cli_request()) {
            // Load the AWS library
            $this->load->library('awslib');
            $sqs = new AmazonSQS();

            // Get the queue's URL
            $res = $sqs->get_queue_url('example-queue');
            $qurl = (string) $res->body->GetQueueUrlResult->QueueUrl;
            System_Daemon::log(System_Daemon::LOG_INFO, $qurl);

            // Get a message from the queue
            $response = $sqs->receive_message($qurl);

            // If a message was received, then do something with it
            if ($response->isOK() && isset($response->body->ReceiveMessageResult->Message)) {
                System_Daemon::log(System_Daemon::LOG_INFO, "Receive message successful");

                // Now delete the message from the queue, using its receipt handle
                $rcpt_hand = (string) $response->body->ReceiveMessageResult->Message->ReceiptHandle;
                $res = $sqs->delete_message($qurl, $rcpt_hand);
                if ($res->isOK()) {
                    System_Daemon::log(System_Daemon::LOG_INFO, "Delete message successful");
                }
            } else {
                // Go back to check for messages
                // How do you do that?
            }
        } else {
            // Access from URL - so bail out?
            // How do you not bail out of the daemon from here?
        }
    }
}
System_Daemon::stop();
?>
A daemon is a process running 'forever' in the background.
Here, all you do is check for one new message in the queue, then you exit.
Basically you have to add a loop around all the code that needs to be executed. You also need to sleep inside that loop, to avoid your daemon eating up all available resources.
Anyway, PHP isn't great for daemons because some memory is never freed until the end of the script. If your script never ends (like a daemon), it will eat up all available memory (up to the PHP memory limit) and then die with an error. You will have to code your script very carefully to avoid such memory leaks!
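To illustrate, here is a minimal sketch of what such a loop could look like (isDying() and iterate() are helpers used in the System_Daemon package's own examples; checkQueueOnce() is a hypothetical stand-in for your SQS receive/process/delete code):
#!/usr/bin/php -q
<?php
// Sketch of the main loop only; options would be set up as in your script.
require_once "System/Daemon.php";
System_Daemon::setOption("appName", "receiveaws");
System_Daemon::start();
// Keep running until the daemon is asked to stop (e.g. by a signal).
while (!System_Daemon::isDying()) {
    // checkQueueOnce() is hypothetical: receive one SQS message,
    // process it, delete it, then return.
    checkQueueOnce();
    // Sleep between polls so the daemon doesn't hog the CPU;
    // iterate() sleeps and lets System_Daemon do its housekeeping.
    System_Daemon::iterate(10);
}
System_Daemon::stop();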
Also, take note that each time you ask the SQS library for something, it sends an HTTP request to Amazon's servers. It can be very costly to do that too often.
To compensate, I recommend you use a cron job that runs every minute to check for new tasks. This way you avoid memory leaks (the PHP process exits between executions) and excessive network usage (requests are made once a minute).
On a last note, if you don't plan on having many tasks (that is, your daemon does nothing 99% of the time), consider using a push queue instead. With a push queue, your script no longer polls the queue; instead, the queue notifies your script (i.e. calls your script with a standard HTTP request) every time some task needs to be done. This avoids running the script needlessly.
I don't know if Amazon provides push queues, but IronMQ (another 'free' queue service) does.
More information: http://dev.iron.io/mq/reference/push_queues/
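For illustration, a push-queue subscriber is just an HTTP endpoint; here is a minimal sketch of one (the JSON body shape is a generic assumption, not any specific service's exact format):
<?php
// push_endpoint.php - called by the push queue via HTTP whenever a task arrives.
$payload = json_decode(file_get_contents('php://input'), true);
if ($payload === null) {
    http_response_code(400);
    exit('Invalid payload');
}
// Do something useful with the task, e.g. append it to a log file.
file_put_contents('/tmp/log/tasks.log', print_r($payload, true), FILE_APPEND);
// A 200 response tells the queue the task was handled and can be removed.
http_response_code(200);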
I'm trying to execute a Symfony Command using the Symfony Process component so it executes asynchronously when handling an API request.
When I do so I get the error Exit Code: 127 (Command not found), but when I run the command manually from my console it works like a charm.
This is the call:
public function asyncTriggerExportWitnesses(): bool
{
    $process = new Process(['php /var/www/bin/console app:excel:witness']);
    $process->setTimeout(600);
    $process->setOptions(['create_new_console' => true]);
    $process->start();

    $this->logInfo('pid witness export: ' . $process->getPid());

    if (!$process->isSuccessful()) {
        $this->logError('async witness export failed: ' . $process->getErrorOutput());
        throw new ProcessFailedException($process);
    }

    return true;
}
And this is the error I get:
The command \"'php /var/www/bin/console app:excel:witness'\" failed.
Exit Code: 127(Command not found)
Working directory: /var/www/public
Output:================
Error Output:================
sh: exec: line 1: php /var/www/bin/console app:excel:witness: not found
What is wrong with my usage of the Process component?
Calling it like this doesn't work either:
$process = new Process(['/usr/local/bin/php', '/var/www/bin/console', 'app:excel:witness']);
this results in following error:
The command \"'/usr/local/bin/php' '/var/www/bin/console' 'app:excel:witness'\" failed.
Exit Code: ()
Working directory: /var/www/public
Output:
================
Error Output:
================
First, note that the Process component is not meant to run asynchronously after the parent process dies. So triggering async jobs to run during an API request is not a good use case.
These two comments in the docs about running things asynchronously are very pertinent:
If a Response is sent before a child process had a chance to complete, the server process will be killed (depending on your OS). It means that your task will be stopped right away. Running an asynchronous process is not the same as running a process that survives its parent process.
If you want your process to survive the request/response cycle, you can take advantage of the kernel.terminate event, and run your command synchronously inside this event. Be aware that kernel.terminate is called only if you use PHP-FPM.
Beware also that if you do that, the said PHP-FPM process will not be available to serve any new request until the subprocess is finished. This means you can quickly block your FPM pool if you’re not careful enough. That is why it’s generally way better not to do any fancy things even after the request is sent, but to use a job queue instead.
If you want to run jobs asynchronously, just store the job "somewhere" (e.g. a database, Redis, a text file, etc.), and have a decoupled consumer go through the "pending jobs" and execute whatever you need without triggering the job within an API request.
The above is very easy to implement, but you could also just use Symfony Messenger, which will do it for you. Dispatch messages on your API request, consume messages with your job queue consumer.
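For example, here is a rough sketch of the Messenger approach (the class names are made up for illustration, the three parts would normally live in separate files, and an async transport still has to be configured in messenger.yaml):
<?php
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Messenger\MessageBusInterface;

// 1) A plain message object describing the work to be done.
class ExportWitnessesMessage
{
}

// 2) A handler that the messenger worker invokes when it consumes the message.
//    Register it as a message handler in your Messenger/services configuration.
class ExportWitnessesHandler
{
    public function __invoke(ExportWitnessesMessage $message): void
    {
        // Run the export here (what app:excel:witness currently does).
    }
}

// 3) In the API controller: dispatch the message and return immediately;
//    a separate "messenger:consume" worker process runs the handler later.
class ExportController
{
    public function __construct(private MessageBusInterface $bus)
    {
    }

    public function triggerExport(): Response
    {
        $this->bus->dispatch(new ExportWitnessesMessage());

        return new Response('', Response::HTTP_ACCEPTED);
    }
}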
All this being said, your use of Process is also failing because you are mixing sync and async methods.
Your second attempt at calling the command is at least successful in finding the executable, but it fails because you call isSuccessful() before the job is done.
If you use start() (instead of run()), you cannot simply call isSuccessful() directly, because the job is not finished yet.
Here is how you would execute an async job (although again, this would very rarely be useful during an API request):
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Process\PhpExecutableFinder;
use Symfony\Component\Process\Process;

class ProcessCommand extends Command
{
    protected static $defaultName = 'process_bg';

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        // Locate the PHP binary instead of relying on "php" being in the PATH
        $phpBinaryFinder = new PhpExecutableFinder();

        $pr = new Process([$phpBinaryFinder->find(), 'bin/console', 'bg']);
        $pr->setWorkingDirectory(__DIR__ . '/../..');
        $pr->start();

        // Wait until the asynchronous process has finished before checking the result
        while ($pr->isRunning()) {
            $output->write('.');
        }
        $output->writeln('');

        if ( ! $pr->isSuccessful()) {
            $output->writeln('Error!!!');
            return self::FAILURE;
        }

        $output->writeln('Job finished');
        return self::SUCCESS;
    }
}
I like to use exec().
You'd need to add a couple of bits to the end of your command:
Use '2>&1' so the output has somewhere to go. From memory, this is important so that PHP isn't waiting for the output to be returned (or streamed or whatever).
Put '&' on the end to make the command run in the background.
Then it's a good idea to return a 202 (Accepted) rather than 200, because we don't yet know whether it was successful, as the command hasn't completed.
public function runMyCommandIntheBackground(string $projectDir): Response
{
    exec("{$projectDir}/bin/console your:command:name 2>&1 &");

    return new Response('', Response::HTTP_ACCEPTED);
}
I noticed that when I have an endless worker I cannot profile PHP shell scripts, because when the worker is killed it doesn't send the probe.
What changes shall I do?
When you are trying to profile a worker that is running an endless loop, you have to manually edit your code to either remove the endless loop or instrument your code to manually call the close() method of the probe (https://blackfire.io/doc/manual-instrumentation).
That's because the data is sent to the agent only when the close() method is called (it is called automatically at the end of the program unless you kill it).
You can manually instrument some code by using the BlackfireProbe class that comes bundled with the Blackfire's probe:
// Get the probe main instance
$probe = BlackfireProbe::getMainInstance();
// start profiling the code
$probe->enable();
// Calling close() instead of disable() stops the profiling and forces the collected data to be sent to Blackfire:
// stop the profiling
// send the result to Blackfire
$probe->close();
As with auto-instrumentation, profiling is only active when the code is run through the Companion or the blackfire CLI utility. If not, all calls are converted to noops.
I don't know, maybe in 2015 the following page did not exist, but now you can do profiling in the following way: https://blackfire.io/docs/24-days/17-php-sdk
use Blackfire\Client;
use Blackfire\LoopClient;
use Blackfire\Profile\Configuration;

// Profile configuration for the collected profiles
$profileConfig = new Configuration();

// Wrap the Blackfire client; one profile will cover 10 loop iterations
$blackfire = new LoopClient(new Client(), 10);
$blackfire->setSignal(SIGUSR1);
$blackfire->attachReference(7);
$blackfire->promoteReferenceSignal(SIGUSR2);

for (;;) {
    $blackfire->startLoop($profileConfig);
    consume(); // your worker's job-processing function
    $blackfire->endLoop();
    usleep(400000);
}
Now you can send the SIGUSR1 signal to this worker's process and LoopClient will start profiling. It will profile 10 iterations of the consume() method and send the last probe. After that it will stop profiling.
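For example, to trigger a profile from another PHP script (this assumes the worker writes its PID to a file; posix_kill and the SIGUSR constants require the posix and pcntl extensions):
<?php
// Hypothetical pid file written by the worker on startup.
$workerPid = (int) file_get_contents('/tmp/worker.pid');
// Send SIGUSR1 to start profiling; the LoopClient inside the worker picks it up.
posix_kill($workerPid, SIGUSR1);
// Later, SIGUSR2 can be sent to promote the collected profile to reference #7.
posix_kill($workerPid, SIGUSR2);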
We are running Laravel 4 with supervisord / SQS and we have 30+ different tasks being run using 10 worker processes. All has been going very well; however, it seems that certain tasks have started to time out. We get an exception like this:
[Symfony\Component\Process\Exception\ProcessTimedOutException]
The process ""/usr/bin/php5" artisan queue:work --queue="https://sqs.us-east- 1.amazonaws.com/xxxx" --delay=0 --memory=128 --sleep=3 --tries=0 --env=development" exceeded the timeout of 180 seconds.
I can catch this exception using this:
App::error(function(Symfony\Component\Process\Exception\ProcessTimedOutException $exception) {
/// caught!
});
However I can't seem to determine WHICH task is being run when the timeout occurs, and ideally I'd also like to access the data which was passed to the task.
I have tried logging the exception object stack trace:
$exception->getTraceAsString()
However, this doesn't get me enough detail about the task that was called.
UPDATE
I have done more research on how the php artisan queue:listen works. Some references:
Illuminate/Queue/Console/Listen
Illuminate/Queue/Listener
Symfony/Component/Process
Basically, when you call php artisan queue:listen, a SUB-PROCESS (using Symfony/Component/Process) is created which essentially runs the command php artisan queue:work. That sub-process fetches the next job from the queue, runs it, reports when complete, and then the Listener spawns another sub-process to handle the next job.
So, if one of the sub-processes is taking longer than the established timeout limit, the PARENT Listener throws an exception however, the parent instance has no data about the sub-process it created. WITH A SLIGHT EXCEPTION! It appears that the parent Listener DOES handle the sub-process' output. It appears to me that the parent does nothing more than render the sub-process' (worker) output to the console. However, perhaps there is a way to capture this output so that when an exception is thrown, we can log the output and therefore have knowledge about which task was running when the timeout occurred!
I have also noticed that when using supervisord, we are able to specify a stdout_logfile which logs all of the worker output. Right now we are using a single log file for all 10 of our supervisord "programs". We could change this to have each "program" use its own log file, and then perhaps when the timeout exception is thrown on the parent Listener, we could have it grab the last 10 lines of that log file. That would also give us info on which tasks were being run during the timeout. However, I am not sure how to "inform" the parent Listener which supervisord program it is running under so it knows which log file to look at!
Looking at the exception class (Symfony\Component\Process\Exception\ProcessTimedOutException) I found the method getProcess(), which returns an instance of Symfony\Component\Process\Process. In there you have getOutput(). The method does what its name says.
As you proposed in the comments, you can use this by echoing the class name and parameters in each task and then using the generated output to determine the problematic task. As you said, it's not very elegant, but I can't think of a better way (except maybe tinkering with the Illuminate\Queue\Listener class...).
Here's an example of how you could do it (untested though).
I chose this format for the output:
ClassName:ParametersAsJson;ClassName:ParametersAsJson;
So in a BaseTask I'd do this:
abstract class BaseTask {
    public function fire() {
        echo get_class($this) . ':' . json_encode(func_get_args()) . ';';
    }
}
Unfortunately that means in every task you will have to call parent::fire
class Task extends BaseTask {
    public function fire($param1, $param2) {
        parent::fire($param1, $param2);
        // do stuff
    }
}
And finally, the exception handler:
App::error(function(Symfony\Component\Process\Exception\ProcessTimedOutException $exception) {
    $output = $exception->getProcess()->getOutput();
    $tasks = explode(';', $output);
    array_pop($tasks); // remove the empty entry that's here because of the trailing ";"
    $lastTask = end($tasks);
    $parts = explode(':', $lastTask);
    $className = $parts[0];
    $parameters = json_decode($parts[1]);
    // write details to log
});
Since Laravel 4.1 there is a built-in mechanism for handling failed jobs, where all job details are persisted in the database or available at exception time (or both). Detailed and clear documentation is available on Laravel's website.
To summarise:
Laravel can move failed jobs into a failed_jobs table for later review
You can register an exception handler via Queue::failing, which will receive detailed job information for immediate handling
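For instance, a rough sketch of registering such a handler with the Laravel 4.1 Queue::failing API (typically somewhere like app/start/global.php):
Queue::failing(function($connection, $job, $data)
{
    // $data contains the payload that was passed to the task, so the
    // job class and its parameters can be logged here for later review.
    Log::error('Queue job failed', array(
        'connection' => $connection,
        'job'        => get_class($job),
        'data'       => $data,
    ));
});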
However, it is questionable whether a timeout is considered a failure in Laravel, so this requires testing as I do not have hands-on experience with queues.
If you use Laravel 4.0 perhaps it would be worthwhile to evaluate an upgrade at least to 4.1 instead of writing complicated code that will become redundant once you really have to upgrade (you will upgrade at some point, right? :) ). Upgrade path seems quite straightforward.
While this does not directly answer your question for Laravel 4.0 it is something you and any future reader may consider.
We are writing a PHP script which creates virtual machines via a RESTful API call. That part is quite easy. Once the request to create the VM is sent to the server, the API request returns with essentially "Machine queued to be created...". When we create a virtual machine, we insert a record into a MySQL database, basically with the VM label and DATE-CREATED-STARTED. That record also has a field DATE-CREATED-FINISHED which is NULL.
LABEL DATE-CREATED-STARTED DATE-CREATED-FINISHED
test-vm-1 2011-05-14 12:00:00 NULL
So here is our problem. How do we basically spin/spawn off a PHP worker, on the initial request, that checks the status of the queued virtual machine every 10 seconds, and when the virtual machine is up and running, updates DATE-CREATED-FINISHED? Keep in mind, the initial API request immediately returns "Machine queued to be created." and then exits. The PHP worker needs to be doing the 10-second check in the background.
Can your server not fire a request once the VM has been created?
Eg.
PHP script requests the server via your API to create a new VM.
PHP script records start time and exits. VM in queue on server waits to be created.
Server finally creates VM and calls an update tables php script.
That way you have no polling, no cron scripts, no background threads, etc. But only if your system can work this way. Otherwise I'd look at setting up a cron script as mentioned by @dqhendricks or, if possible, a background script as @Savas Alp mentioned.
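As an illustration of that callback idea, here is a minimal sketch of the update script the server could call once the VM is up (the table name, connection details and parameter name are assumptions; the column names follow the question):
<?php
// vm_created_callback.php - called by the VM server once the machine is running.
$label = isset($_POST['label']) ? $_POST['label'] : null;
if ($label === null) {
    http_response_code(400);
    exit('Missing label');
}
// Connection details are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=vms', 'user', 'password');
// Mark the machine as finished by filling in DATE-CREATED-FINISHED.
$stmt = $pdo->prepare(
    'UPDATE vms SET `DATE-CREATED-FINISHED` = NOW() WHERE `LABEL` = :label'
);
$stmt->execute(array(':label' => $label));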
If your hosting allows, create a PHP CLI program and execute it in the background like the following.
<?php
while (true)
{
sleep(10);
// Do the checks etc.
}
?>
And run it like the following command:
php background.php &   # assuming you're using Linux
If your hosting does not allow running background jobs, you must utilize every opportunity to do this check, like doing it at the beginning of every PHP page request. To help facilitate this, after creating a virtual machine, the resulting page may refresh itself every 10 seconds!
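For illustration, a rough sketch of that "check on every page request" idea, with a file-based guard so the check runs at most once every 10 seconds (the stamp file path and check_pending_vms() are made up):
<?php
// Include at the top of every page request (a poor man's cron).
$stampFile = '/tmp/vm_check.stamp';
// Only run the check if at least 10 seconds have passed since the last one.
if (!file_exists($stampFile) || time() - filemtime($stampFile) >= 10) {
    touch($stampFile);
    // check_pending_vms() is hypothetical: query the API for queued VMs
    // and update DATE-CREATED-FINISHED for any that are now running.
    check_pending_vms();
}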
As a variant, you can use the Tasks module; here is a sample of the task code:
class VMCheck extends \Tasks\Task
{
    protected $vm_name;

    public function add($vm_name)
    {
        $this->getStorage()->store(__CLASS__, $vm_name, true);
    }

    public function execute()
    {
        do
        {
            $check = CheckAPI_call($this->vm_name); // your checking code here
            sleep(10);
        }
        while (empty($check));
    }

    public function restore($data)
    {
        $this->vm_name = $data;
    }
}
I have developed an eblast application.
The program is used to send an email to a list of recipients.
The recipients' emails are grabbed from an XLS file,
and the program is set to send 10 emails at a time and then sleep 30 seconds,
and it uses ob_flush() and flush() to output the progress of the process and display it in the frontend.
Yesterday my client tested it with 9000 recipients (it should take around 10 hours),
and he told me the program had stopped; I found the log file showed that the program stopped at 65XX emails,
which means the program had already sent 6XXX emails (around 7 hours).
This problem never happens with a cron job, but only when executing through the web browser.
My friend told me it is because of the long sleeps?
He suggested using a cron job; however, my application already has a cron job set up,
and the client just wants a feature to send the emails immediately.
Is there any other solution? Use PHP to call a Linux command and execute a PHP email-sending script?
Long running processes in Apache or IIS are tricky. The problem is that if anything happens, like a restart of the webserver or a timeout, you lose your work. You are better off keeping this simple and making it into a cron job, but if you are up for the challenge it is possible to get around.
I've gotten around occasional webserver restarts by saving the state of my process into a database and a script that continually hits the page to check if it's up and working. So when the long running process first loads it checks if it should be running and if it should continue a job or not. In your case that might be the line number of the excel file.
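As an illustration of that state-saving idea (the table name, columns and connection details are made up):
<?php
// Resume the email run from where it last stopped (sketch; schema is hypothetical).
$pdo = new PDO('mysql:host=localhost;dbname=eblast', 'user', 'password');
// Find out which row of the XLS file was processed last.
$lastRow = (int) $pdo->query(
    'SELECT last_row FROM eblast_state WHERE job_id = 1'
)->fetchColumn();
// ... load the XLS file and continue sending from $lastRow + 1 ...
// After each batch of 10 emails, persist the new position so a webserver
// restart or timeout only loses the current batch.
$stmt = $pdo->prepare('UPDATE eblast_state SET last_row = :row WHERE job_id = 1');
$stmt->execute(array(':row' => $lastRow + 10));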
It ends up being lot of extra work and you need to be very careful. From the sounds of your project I would keep it simple by going the cron job route you mentioned.
My solution is: try to set your cron job to run every minute.
However, you should save the state of your cron job so that it doesn't run twice.
I usually do it this way (Note that this cron is intended to run every minute):
if (stat_check_file('cron.stat'))
{
    die("Found CRON.STAT, Exit!");
}
else
{
    stat_create_stat_file('cron.stat');
    // do your long process here...
}
stat_delete_stat_file('cron.stat');

function stat_check_file($filename)
{
    global $rootdir;
    return file_exists($rootdir.'/'.$filename);
}

function stat_create_stat_file($filename)
{
    global $rootdir;
    touch($rootdir.'/'.$filename);
}

function stat_delete_stat_file($filename)
{
    global $rootdir;
    if (stat_check_file($filename))
    {
        @unlink($rootdir.'/'.$filename);
    }
}
Now, in your cron job, simply load the XLS, run it, and write a log to either a database or a file.
Then, in your panel, read that log and display it so that your client can see in real time that there are xxx emails sent and xxx emails to go.