Amazon Elastic Beanstalk Worker cronjob (SQS) triggers same message multiple times - php

All,
I have a quite disturbing problem with my Amazon Elastic Beanstalk Worker combined with SQS, which is supposed to provide cron job scheduling - all of this running with PHP.
Following scenario - I need a PHP script to be executed regularly in the background, which might eventually run for hours. I saw this nice introduction which seems to cover exactly my scenario (AWS Worker Environments - see the Periodic Task part).
So I read quite a lot of how-tos and set up an Elastic Beanstalk Worker with SQS (which is actually done automatically during creation of the worker) and provided the cron config (cron.yaml) within my deployment package.
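For reference, a minimal cron.yaml for a worker environment looks like this (the task name, URL path and schedule here are made-up examples):

```yaml
version: 1
cron:
  - name: "nightly-job"        # hypothetical task name
    url: "/run-nightly"        # path aws-sqsd POSTs to on schedule
    schedule: "0 2 * * *"      # standard cron syntax, evaluated in UTC
```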
The cron script is properly recognized. The sqs daemon starts, messages are put into the queue and trigger my PHP script exactly on schedule. The script is run and everything works fine.
The configuration of the queue looks like this:
[screenshot: SQS configuration]
However, after some time of processing (the script is still busy - and NO, it is not the next scheduled run^^) a second message is opened and another instance of the same script is executed, and another, and another... at exactly 5-minute intervals.
I suspect the message is somehow not removed from the queue (although I ensured that the script sends status 200 back), which results in a new delivery if the script runs for too long.
Is there a way to prevent the spawning of additional messages? Can I tell the queue or the sqs daemon not to create new in-flight messages? Do I have to remove the message in my code, even though the tutorial states it should happen automatically?
I would like to just trigger the script, remove the message from queue and let the script run. No fancy fallback / retry mechanisms please :-)
I spent many hours trying to find something on the internet. Unsuccessful. Any help is appreciated.
Thanks

a second message is opened and another instance of the same script is executed, and another, and another... at exactly 5-minute intervals.
I doubt it is a second message. I believe it is the same message.
If you don't respond 200 OK before the Inactivity Timeout expires, then the message goes back to the queue, and yes, you'll receive it again, because the system assumes you've crashed, and you would want to see it again. That's part of the design.
There's an X-Aws-Sqsd-Receive-Count request header you're receiving that tells you approximately how many times the current message has been delivered. The X-Aws-Sqsd-Msgid request header identifies the unique message.
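As a defensive measure, the handler can inspect that header and skip work it has already started; a minimal sketch (the header name is as documented for aws-sqsd, treating any count above 1 as a redelivery is my assumption):

```python
def is_redelivery(headers):
    # aws-sqsd includes the approximate delivery count with every POST;
    # anything above 1 means this same message was delivered before.
    count = int(headers.get("X-Aws-Sqsd-Receive-Count", "1"))
    return count > 1
```

Combined with X-Aws-Sqsd-Msgid you could also deduplicate against a small table of already-processed message IDs.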
If you can't ensure that the script will finish before the timeout, then this is not likely an appropriate use case for this service. It sounds like the service is working correctly.
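If you do stay with this setup, the practical fix is to raise the queue's visibility timeout past your slowest job (settable in the console, or via boto3's `set_queue_attributes`). A sketch of sizing the attribute value; the 1.5x safety factor is an arbitrary assumption:

```python
def visibility_attributes(max_job_seconds, safety_factor=1.5):
    # SQS redelivers a message whose visibility timeout lapses before a
    # 200 OK comes back, so size the timeout past the slowest expected job.
    # SQS caps the visibility timeout at 12 hours (43200 seconds).
    timeout = min(int(max_job_seconds * safety_factor), 43200)
    return {"VisibilityTimeout": str(timeout)}
```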

I know this doesn't directly answer your question regarding configuration, but I ran into a similar issue - my queue configuration is set exactly like yours, and in my Elastic Beanstalk setup, I've set the Visibility Timeout to 1800 seconds (or half an hour) and Max Retries to 2.
If a job runs for more than a minute, it gets run again and then thrown into the dead letter queue, even though after a 200 OK is returned from the application every time.
After a few hours, I realized that it was the Nginx server that was timing out - checking the Nginx error log yielded that insight. I don't know why Elastic Beanstalk includes a web server in this scenario... You may want to check if EB spawns a web server in front of your application, if all else fails.

Look at the Worker Environment documentation for details on the values you can configure. You can configure several different timeout values as well as "Max retries", which if set to 1 will prevent re-sends. However, your Dead Letter Queue will fill up with messages that were actually processed successfully, so that might not be your best option.

Related

RabbitMQ basic_get with multiple consumers

I'm moving some resource-intensive functionality currently running on a cron to a RabbitMQ queue. I'm wary of having long-running PHP consumer scripts, so I'm thinking of doing the following:
Jobs are added to the queue at the start of the day.
A cron runs a command which starts a consumer.
The consumer uses basic_get to get a job, processes the job, acknowledges the job and then exits.
The cron runs again and the next job is processed.
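Steps 2-4 can be sketched as a function that takes an already-open channel (with pika this would be a BlockingChannel; the queue name and handler are whatever your jobs need):

```python
def process_one(channel, queue, handler):
    # Fetch at most one job, run it, acknowledge it, then return so the
    # process can exit and the next cron run starts with a fresh worker.
    method, properties, body = channel.basic_get(queue=queue, auto_ack=False)
    if method is None:
        return False  # queue empty, nothing to do this run
    handler(body)
    channel.basic_ack(delivery_tag=method.delivery_tag)
    return True
```

Because the ack only happens after the handler returns, a worker that crashes mid-job leaves the message unacked and RabbitMQ requeues it.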
I have a couple of questions around how well this will work.
If I decide to fire up 2 workers via the cron (running the command twice) and the first job is still being processed, and hasn't been acknowledged, would RabbitMQ ever send the same job to the second worker?
I've noticed that basic_consume will be more performant since there's no round trip when receiving each job. Is it possible to use basic_consume rather than basic_get without having to worry about the workers being left to run for too long?
The first part:
No, it would not. That would happen only if the first consumer dies without ACKing the message - then the message gets requeued and the next consumer receives it.
The second part:
You should use basic_consume because it's faster, asynchronous and generally better. The retrieval method has nothing to do with how long the consumers run.

What is a method to notify you when things "don't" happen?

I have lots of different scripts and quite a few cron jobs that trigger different things throughout the day. Many times it is to download data from an external API or to periodically run a script of some type.
However, I am at a loss in finding a simple method to notify me if these things don't happen. For example, recently, something happened on one of my servers that caused all the cron jobs to stop running. It took a few days before I started getting complaints that things weren't working right. What are some of the methods you use to make sure things happen on a regular basis?
Nagios supports a type of check called "Passive Checks". Normally Nagios directly monitors a thing, such as whether a server pings or a service is up, using Active Checks; i.e. you ping a server, or ask about the status of a service, every five minutes. If there's no response, or the underlying Nagios check script reports a failure, then Nagios will eventually mark that host or service as "Hard Down". Then, depending on your notification and alert rules, you'll be alerted that something is broken.
Some checks, such as checking whether cron jobs have run, are a bit more tricky because you can't directly ask a cron job if it ran. You could write a script to trawl your cron logs to see if a cron job ran within a certain time period, but it can get complicated.
However, Nagios can be configured to "Passively" check for an "UP" status that is submitted to Nagios in a certain period of time by external services. So instead of Nagios directly polling for a status, you can turn things around and have your scripts submit a success/fail status to Nagios.
So say you have a task that should run every 24 hours. When the task completes it would submit a result directly to Nagios. On Nagios you'd configure a passive check to make sure this result appears within a window of 24 hours. If it doesn't (for example crond crashed or something deleted the cron job entry) then Nagios would alert you that it's had no result.
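Submitting the result is just a PROCESS_SERVICE_CHECK_RESULT line written to Nagios's external command file (or sent via send_nsca for remote hosts). A sketch of building that line; the host and service names are hypothetical:

```python
import time

def passive_result_line(host, service, code, output, now=None):
    # Nagios external-command format:
    # [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return-code;output
    # where return-code is 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    stamp = int(now if now is not None else time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        stamp, host, service, code, output)
```

Your nightly task would append this line to the command file (commonly nagios.cmd, location varies per install) when it finishes.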
Relevant Nagios documentation:
http://nagios.sourceforge.net/docs/3_0/passivechecks.html
This article shows a worked example:
http://www.admin-magazine.com/Archive/2014/22/Nagios-Passive-Checks
The key to all this is the passive service check's freshness_threshold: if Nagios doesn't see a new result within that time period, it'll raise an alert.
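A service definition for such a passive check might look like this (host and service names are hypothetical; the threshold allows a ~25-hour window for a 24-hour task):

```
define service {
    use                     generic-service
    host_name               myhost
    service_description     daily-backup
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     90000   ; seconds; ~25h window for a 24h task
    check_command           check_dummy!2!"no result received in time"
}
```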
These situations are typically what server monitoring tools such as Nagios and Munin are used for. These tools let you monitor your server's uptime, alerting you (and allowing you to take action) in case of anomalies.
Please let me know if you have any questions!
Try setting up an account at https://www.setcronjob.com/ - then your crons run independently of your server, you can manage crons on multiple servers, get more fine grained control of timings, and so on.

PHP program in shared server terminates in different location each time - fails 3% times

I've written a PHP script which scrapes one site and parses the input for my website.
The script is driven by cronjob periodically, and everything is hosted in a shared web-server.
The problem is: my script terminates several times a day, with no error message, and at a random location in the code each time.
The script is long, performing 2 HTTP GETs and 4 HTTP POSTs to a website in another country, with each HTTP request taking ~3 seconds to complete; it also writes to files and reads/writes a MySQL database.
I'm stuck on it after trying the following things:
1) Talking with my hosting support (IxWebHosting) - they just wasted my time, denied their responsibility and advised me to limit the cron job to a 5-minute rate maximum (before, it ran at a 3-minute interval; however, changing it didn't help.)
2) Instead of running from cronjob context, I've switched to the following method:
a. Cronjob calls a "loader PHP script" every 5 minutes.
b. The "loader PHP script" calls the real PHP script using HTTP Get and terminates before waiting for an answer.
c. The real PHP script performs its ~20-second job (this is where the program terminates at a random location).
3) Writing timestamps to a log file in many places along the code in order to see where the program terminated on each run - this showed me the program terminates everywhere in the code.
4) In order to prove it's not my code fault I've performed the following test:
a. Cronjob calls another loader PHP script.
b. The PHP script performs HTTP request to a different testing-purpose PHP script and terminates without waiting for response.
c. The 2nd PHP script performs a dummy 20-second task: sleep for a second and write a timestamp into a log file, 20 times.
Result: the test succeeded! The 2nd program didn't fail... which means it has something to do with my code and the web server I'm running on - however, since it fails at a different place every time, and only ~10 times a day (out of the 288 times it runs a day), I can't tell where (and there is no PHP error message).
Thanks in advance, sorry for long description - I'll be happy to provide more details upon request.
Are you logging the actual process, rather than writing logs from within the process? e.g. does your cron job look like:
* * * * * /home/user/myTroublesomeJob.php >/tmp/crash.log 2>&1
This will catch the stdout/stderr of the process itself. (Note the redirection order: `2>&1` must come after the file redirect, or stderr won't end up in the file.) It may also be worth invoking your script from a parent shell script that catches the PHP process exiting and dumps out the exit code (which would indicate a core dump, a signal being caught, etc.). See here for more info.
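A rough equivalent of such a wrapper, sketched in Python rather than shell (the log paths are arbitrary assumptions):

```python
import datetime
import subprocess

def run_logged(cmd, log_path="/tmp/job-exit.log"):
    # Run the job, capture its output, and always record how it exited.
    # A negative return code means the process was killed by that signal,
    # which is exactly the kind of silent death being hunted here.
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write("%s %s exited with status %d\n" % (
            datetime.datetime.now().isoformat(), cmd[0], result.returncode))
    return result.returncode
```

The cron entry would then invoke this wrapper instead of the PHP script directly, so even a crash that prints nothing leaves an exit-status trail.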
Try setting the timeout at the start of the script.
set_time_limit(1800);
I found recently that if I ran a script manually, it went fine, but if it was run by Cron, it would throw timeout errors. Putting this limit in helped.
If you are running a script on a shared server, then it will not let you run long-running scripts.
If your script takes that much time, please use a dedicated server, because on a shared server many users share the same resources, so the server automatically kills a script which is using extra resources.
I'd suggest you use the Amazon EC2 free tier. There you will be able to run long-running scripts.
Thanks

Limit to cron job with PHP

I have a cron job with PHP which I want to set up on my web host, but at the moment the script takes about 20 seconds to run with only 3 users' data being refreshed. If I get 1000 users, it's going to take ages. Is there an alternative to a cron job? Will my web host let me run a cron job which takes, for example, 10 minutes to run?
Your cron job can be as long as you want.
The main problem for you is that you must ensure the next cron job execution does not occur while the first one is still running. You have a lot of solutions to avoid this; basically, use a semaphore.
It can be a lock file or a record in a database. Your cron job should check whether the previous one has finished or not. A good addition is to send yourself an email if it cannot run because of a long previous job (this way you'll have a notice alerting you that something may be going wrong). By default, cron emails any output a job produces to the account running the job; depending on how the platform is configured you could use this behavior, or build an SMTP connection in the job (or store the alert in a database table).
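A minimal lock-file semaphore, sketched with Linux flock() semantics (the lock path is an arbitrary assumption):

```python
import fcntl

def acquire_cron_lock(path="/tmp/myjob.lock"):
    # Take an exclusive, non-blocking lock. If a previous run still holds
    # it, return None so the caller can skip this run (and alert, if wanted).
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None  # previous job still running
    return handle  # keep this open for the job's lifetime; the lock dies with it
```

Because the kernel releases the lock when the process exits, a crashed job never leaves a stale lock behind, which is the main advantage over a plain "does the file exist" check.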
If you want some alternatives to cron jobs you should have a look at work queues. You can mix work queues with a cron job, or use a work queue in an Apache/PHP environment; there are a lot of solutions, but the main idea is to make one single queue of things that should be done and execute them one after the other (but be careful: if you handle these tasks too slowly you'll get a big fat waiting queue).
A cron job shouldn't have any bearing on how long its job takes to complete. If your jobs are taking 20 seconds to complete, it's PHP's fault, not cron's.
Will my web host let me run a cron job which takes, for example, 10 minutes to run?
Ask your webhost.
If you want to learn about optimizing php scripts, take a look at Profiling PHP Code.

Checking the status of my PHP beanstalkd background processes

I have a website written in PHP (CakePHP) where certain resource intensive tasks are handled by a background process. This is done through the Beanstalkd message queue. I need some way to retrieve the status of that background process so I can monitor it with Monit.
The background process is a CakePHP Shell (just a PHP CLI script) that communicates with Beanstalkd. It simply does a reserve() on Beanstalkd and waits for a new message. When it gets a message, it processes it. I want some way of monitoring this process with Monit so that it can restart the background process if something has gone wrong.
What I have been thinking about so far is writing a PHP CLI script that drops a message in Beanstalkd. The background process picks up the message and somehow communicates its internal status back to the CLI script. But how? Sockets? Shared memory? Some other IPC method?
Or am I perhaps being too complicated here and is there a much easier way to monitor such a process with Monit?
Thanks in advance!
Here's what I ended up doing in the end.
The CLI script connects to beanstalkd, creates a new queue (tube) and starts watching it. Then it drops a highest priority message in the queue that the background daemon is watching. That message contains the name of the new queue that the CLI script is monitoring.
The background process receives this message almost immediately (because it is highest priority), generates a status message and puts it in the queue that the CLI script is watching. The CLI script receives it and then closes the queue.
When the CLI script does not get a response in 30 seconds it will exit with an error indicating the background daemon is (most likely) hung.
I tied all this into Monit. Monit can now check that the background daemon is running (via the pidfile and process list) and verify that it is actually still processing messages (by using the CLI tool to test that it responds to status requests).
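The Monit side of this might look roughly like the following (service names, paths and the status-check binary are all hypothetical):

```
# restart the daemon if its process disappears
check process worker-daemon with pidfile /var/run/worker.pid
  start program = "/etc/init.d/worker start"
  stop program  = "/etc/init.d/worker stop"

# run the CLI status check; a non-zero exit means the daemon is hung
check program worker-responds with path "/usr/local/bin/worker-status-check"
  if status != 0 then alert
```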
There is probably a plugin for Monit or Nagios to connect, run the stats and return if there are 'too many'. There isn't a 'protocol' written already for that, but it doesn't appear to be exceedingly difficult to modify an existing text-based one (like NNTP or SMTP) to do what you want. It does mean writing it in C though, by the looks of it.
From a CLI-PHP script, I would go about it through one (or both) of two different methods.
1/ drop a (low-ish) priority message into the queue, and make sure it comes back within a few seconds. Putting it into a dedicated queue and making sure there's nothing there before you put it in there would be a good addition as well.
2/ perform a 'stats' and see how many are waiting: 'current-jobs-ready'.
To get the information back to a website (either way), you can write to a file, or into something like Memcached, which gets read and acted upon.
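For option 2/, beanstalkd answers the `stats` command with a YAML body; pulling `current-jobs-ready` out of it is only a few lines (the socket setup and the sample counters are assumptions):

```python
def ready_jobs(stats_body):
    # beanstalkd's `stats` reply is a YAML dict of counters; return the
    # number of jobs ready to be reserved, or None if the key is missing.
    for line in stats_body.splitlines():
        if line.startswith("current-jobs-ready:"):
            return int(line.split(":", 1)[1])
    return None
```

A monitoring script would send `stats\r\n` over a TCP connection to beanstalkd (port 11300 by default), feed the reply body to this function, and alert when the count exceeds a threshold.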
