Beanstalkd (via pheanstalk) allowing duplicate, simultaneous reserves? - php

Trying to wrap Pheanstalk in my PHP job base class. I'm testing the reserve and reserve-with-delay functionality, and I've found that I can reserve a job from a second instance of my base class without the first instance releasing the job or the TTR timing out. This is unexpected, since I was thinking this is exactly the thing job queues are supposed to prevent. Here are the beanstalkd commands for the first put and the first reserve, along with timestamps. I also do a stats-job request at the end:
01:40:15: Sending command: use QueuedCoreEvent
01:40:15: Got response: USING QueuedCoreEvent
01:40:15: Sending command: put 1024 0 300 233
a:4:{s:9:"eventName";s:21:"ReQueueJob_eawu7xr9bi";s:6:"params";a:2:{s:12:"InstanceName";s:21:"ReQueueJob_eawu7xr9bi";s:17:"aValueToIncrement";i:123456;}s:9:"behaviors";a:1:{i:0;s:22:"BehMCoreEventTestDummy";}s:12:"failureCount";i:0;}
01:40:15: Got response: INSERTED 10
01:40:15: Sending command: watch QueuedCoreEvent
01:40:15: Got response: WATCHING 2
01:40:15: Sending command: ignore default
01:40:15: Got response: WATCHING 1
01:40:15: Sending command: reserve-with-timeout 0
01:40:15: Got response: RESERVED 10 233
01:40:15: Data: a:4:{s:9:"eventName";s:21:"ReQueueJob_eawu7xr9bi";s:6:"params";a:2:{s:12:"InstanceName";s:21:"ReQueueJob_eawu7xr9bi";s:17:"aValueToIncrement";i:123456;}s:9:"behaviors";a:1:{i:0;s:22:"BehMCoreEventTestDummy";}s:12:"failureCount";i:0;}
01:40:15: Sending command: stats-job 10
01:40:15: Got response: OK 162
01:40:15: Data: ---
id: 10
tube: QueuedCoreEvent
state: reserved
pri: 1024
age: 0
delay: 0
ttr: 300
time-left: 299
file: 0
reserves: 1
timeouts: 0
releases: 0
buries: 0
kicks: 0
So far, so good. Now I do another reserve from a second instance of my base class, followed by another stats-job request. Notice the timestamps are within the same second, nowhere near the 300-second TTR I've set. Also notice in this second stats-job printout that there are 2 reserves of this job with 0 timeouts and 0 releases.
01:40:15: Sending command: watch QueuedCoreEvent
01:40:15: Got response: WATCHING 2
01:40:15: Sending command: ignore default
01:40:15: Got response: WATCHING 1
01:40:15: Sending command: reserve-with-timeout 0
01:40:15: Got response: RESERVED 10 233
01:40:15: Data: a:4:{s:9:"eventName";s:21:"ReQueueJob_eawu7xr9bi";s:6:"params";a:2:{s:12:"InstanceName";s:21:"ReQueueJob_eawu7xr9bi";s:17:"aValueToIncrement";i:123456;}s:9:"behaviors";a:1:{i:0;s:22:"BehMCoreEventTestDummy";}s:12:"failureCount";i:0;}
01:40:15: Sending command: stats-job 10
01:40:15: Got response: OK 162
01:40:15: Data: ---
id: 10
tube: QueuedCoreEvent
state: reserved
pri: 1024
age: 0
delay: 0
ttr: 300
time-left: 299
file: 0
reserves: 2
timeouts: 0
releases: 0
buries: 0
kicks: 0
Anyone have any ideas on what I might be doing wrong? Is there something I have to do to tell the queue I want jobs to only be accessed by one worker at a time? I'm doing an "unset" on the Pheanstalk instance as soon as I get the job off the queue, which I believe terminates the session with beanstalkd. Could this cause beanstalkd to decide the worker has died and automatically release the job without a timeout? I'm uncertain how much beanstalkd relies on session state to determine worker state. I was assuming that I could open and close sessions with impunity and that the job id was the only thing beanstalkd cared about to tie job operations together, but that may have been foolish on my part... This is my first foray into job queues.
Thanks!

My guess is your first client instance closed the TCP socket to the beanstalkd server before the second one reserved the job.
Closing the TCP connection implicitly releases the job back onto the queue. These implicit releases (closing the connection, the quit command, etc.) do not seem to increment the releases counter.
Here's an example:
# Create a job, reserve it, close the connection:
pda@paulbookpro ~ > telnet 0 11300
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
put 0 0 600 5
hello
INSERTED 1
reserve
RESERVED 1 5
hello
^]
telnet> close
Connection closed.
# Reserve the job, stats-job shows two reserves, zero releases.
# Use 'quit' command to close connection.
pda@paulbookpro ~ > telnet 0 11300
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
reserve
RESERVED 1 5
hello
stats-job 1
OK 151
---
id: 1
tube: default
state: reserved
pri: 0
age: 33
delay: 0
ttr: 600
time-left: 593
file: 0
reserves: 2
timeouts: 0
releases: 0
buries: 0
kicks: 0
quit
Connection closed by foreign host.
# Reserve the job, stats-job still shows zero releases.
# Explicitly release the job, stats-job shows one release.
pda@paulbookpro ~ > telnet 0 11300
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
reserve
RESERVED 1 5
hello
stats-job 1
OK 151
---
id: 1
tube: default
state: reserved
pri: 0
age: 46
delay: 0
ttr: 600
time-left: 597
file: 0
reserves: 3
timeouts: 0
releases: 0
buries: 0
kicks: 0
release 1 0 0
RELEASED
stats-job 1
OK 146
---
id: 1
tube: default
state: ready
pri: 0
age: 68
delay: 0
ttr: 600
time-left: 0
file: 0
reserves: 3
timeouts: 0
releases: 1
buries: 0
kicks: 0
quit
Connection closed by foreign host.
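In Pheanstalk terms, the fix is for the worker to hold its connection open for as long as it holds the job, and to delete or release the job on that same connection. A rough sketch of that pattern, assuming the Pheanstalk 3.x-style API used elsewhere on this page (the tube name is taken from the question):

use Pheanstalk\Pheanstalk;

$pheanstalk = new Pheanstalk('localhost', 11300);

// Hold one connection for the whole reserve -> process -> delete cycle.
$job = $pheanstalk
    ->watch('QueuedCoreEvent')
    ->ignore('default')
    ->reserve(); // blocks until a job is available

try {
    $payload = unserialize($job->getData());
    // ... process the job here, keeping $pheanstalk (and its socket) alive ...
    $pheanstalk->delete($job); // success: remove the job for good
} catch (\Exception $e) {
    // Hand the job back explicitly rather than relying on the TTR or on the
    // implicit release that happens when the connection drops.
    $pheanstalk->release($job);
}

// Only unset/close $pheanstalk after delete() or release(); closing earlier
// implicitly releases the job back to the ready queue.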

I had the same issue. The problem was multiple connections being opened to beanstalkd.
use Pheanstalk\Pheanstalk;

$pheanstalk = connect();
$pheanstalk->put(serialize([1]), 1, 0, 1800);

/** @var \Pheanstalk\Job $job */
$job = $pheanstalk->reserve(10);
print_r($pheanstalk->statsJob($job->getId()));
// state is "reserved", but only the connection that reserved the job
// can resolve/update it

$pheanstalk2 = connect();
print_r($pheanstalk->statsJob($job->getId()));
$pheanstalk2->delete($job);
// a new connection opened in the same process still cannot update the job:
// PHP Fatal error: Uncaught Pheanstalk\Exception\ServerException: Cannot delete job 89: NOT_FOUND in /var/www/vendor/pda/pheanstalk/src/Command/DeleteCommand.php:45

function connect() {
    $pheanstalk = new Pheanstalk(
        'localhost',
        11300,
        5
    );
    return $pheanstalk;
}
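In other words, the delete has to happen on the connection that holds the reservation. A minimal sketch of the working variant, using the same connect() helper as above:

$pheanstalk = connect();

$job = $pheanstalk->reserve(10);
// ... process the job ...

// Works, because this connection is the one holding the reservation.
$pheanstalk->delete($job);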

Related

Guzzle HTTP / Curl error 7 for Laravel app running with Supervisord on localhost Mac OS

The Laravel app works fine if it's started manually with the command "php artisan octane:start".
So I decided to run it with supervisord, and I discovered that all external HTTP requests were rejected with curl error 7. Below is a test configuration using curl:
------ curl-test.conf -------
[program:curl_test]
process_name=%(program_name)s
command=/bin/sh -c "curl -v google.com"
autostart=true
loglevel=debug
autorestart=false
stdout_logfile=/tmp/curl-test.log
redirect_stderr=true
------------------ curl-test.log -----------
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 216.58.223.206...
* TCP_NODELAY set
* Immediate connect fail for 216.58.223.206: Software caused connection abort
* Closing connection 0
curl: (7) Couldn't connect to server
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 216.58.223.206...
* TCP_NODELAY set
* Immediate connect fail for 216.58.223.206: Software caused connection abort
* Closing connection 0
curl: (7) Couldn't connect to server
This happened as a result of an internet firewall restriction on my Mac. This issue is now resolved.

How to troubleshoot job Queues with Beanstalk on Laravel

Background
I've been given a Laravel app whose queue is configured by Forge, and I'm now trying to get it running on my localhost, which is OS X.
This is what I did:
installed beanstalkd on OS X
ran the beanstalkd server from my console: $ beanstalkd
ran the laravel worker command
$ php artisan queue:work beanstalkd --env=local --queue=default
I then did some actions that create jobs, but they never got processed. I used telnet as a poor man's monitor for beanstalkd, like so:
$ telnet localhost 11300
Trying ::1...
Connected to localhost.
Escape character is '^]'.
stats
OK 923
---
current-jobs-urgent: 0
current-jobs-ready: 3
current-jobs-reserved: 0
current-jobs-delayed: 0
current-jobs-buried: 0
cmd-put: 3
cmd-peek: 0
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 0
cmd-reserve-with-timeout: 652
cmd-delete: 0
cmd-release: 0
cmd-use: 1
cmd-watch: 0
cmd-ignore: 0
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 8
cmd-stats-job: 0
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 0
total-jobs: 3
max-job-size: 65535
current-tubes: 2
current-connections: 2
current-producers: 0
current-workers: 1
current-waiting: 0
total-connections: 8
pid: 56692
version: 1.10
rusage-utime: 0.010171
rusage-stime: 0.031001
uptime: 2023
binlog-oldest-index: 0
binlog-current-index: 0
binlog-records-migrated: 0
binlog-records-written: 0
binlog-max-size: 10485760
id: 3620777b4ee08cdc
Question
I can see that 3 jobs are ready, but I have no idea how to dispatch them (or, for that matter, find out what exactly is inside them). What should I do?
You can use the beanstalk console web app: https://github.com/ptrofimov/beanstalk_console.
I would also log some info to a separate log file to record the values and details from inside the running job, then tail that log file while executing the queued jobs and watching the beanstalk console interface.
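For the logging part, a minimal sketch of what that could look like inside a queued job (Laravel 5.x style; the class name and payload here are made up for illustration):

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class ExampleJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    public $payload;

    public function __construct(array $payload)
    {
        $this->payload = $payload;
    }

    public function handle()
    {
        // Log enough detail to see what the worker is actually doing,
        // then `tail -f storage/logs/laravel.log` while the queue runs.
        Log::info('ExampleJob started', ['payload' => $this->payload]);

        // ... the actual work ...

        Log::info('ExampleJob finished');
    }
}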

How to investigate why Laravel Beanstalk Queue is hoarding memory

Background
I'm running a Laravel 5.3-powered web app on nginx. Prod is working fine (running on an AWS t2.medium), and staging had been running fine until it recently got overloaded. Staging is a t2.micro.
Problem
The problem started when we began hitting the API endpoints and getting this error:
503 (Service Unavailable: Back-end server is at capacity)
Using htop, we found that our beanstalk queues were taking insane amounts of memory.
What I have tried
We used telnet to peek at what's going on inside beanstalkd:
$~/beanstalk-console$ telnet localhost 11300
Trying 127.0.0.1...
Connected to staging-api-3.
Escape character is '^]'.
stats
OK 940
---
current-jobs-urgent: 0
current-jobs-ready: 0
current-jobs-reserved: 0
current-jobs-delayed: 2
current-jobs-buried: 0
cmd-put: 451
cmd-peek: 0
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 0
cmd-reserve-with-timeout: 769174
cmd-delete: 449
cmd-release: 6
cmd-use: 321
cmd-watch: 579067
cmd-ignore: 579067
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 1
cmd-stats-job: 464
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 0
total-jobs: 451
max-job-size: 65535
current-tubes: 2
current-connections: 1
current-producers: 0
current-workers: 0
current-waiting: 0
total-connections: 769377
pid: 1107
version: 1.10
rusage-utime: 97.572000
rusage-stime: 274.560000
uptime: 1609870
binlog-oldest-index: 0
binlog-current-index: 0
binlog-records-migrated: 0
binlog-records-written: 0
binlog-max-size: 10485760
id: 906b3629b01390dc
hostname: staging-api-3
Nothing there seems concerning.
Question
I would like a more transparent look at what's going on in these jobs (i.e., what exactly are the jobs?). I know Laravel Horizon provides this, but it only comes with Laravel 5.5. I researched the available queue monitors and tried to install beanstalk console. Right now, when I install it, I get "52.16.%ip%.%ip% took too long to respond", which I think is expected considering the whole machine is already jammed.
I figure if I reboot the machine I can install beanstalk_console just fine, but then I'll lose the opportunity to investigate what's causing the problem this time around, since it's a rare occurrence. What else can I do to investigate and see exactly which jobs are draining the CPU, and why?
Update
I restarted the instance and the APIs work now, but the CPU is still at 100%. What am I missing?

App Engine Win SDK PHP timeout stuck at 30 seconds, should be 60?

I'm using Google's latest Windows App Engine PHP SDK, v1.9.38, to run some long-running scripts on the local dev server, and for some reason they're timing out at 30 seconds. The error is e.g. "Fatal error: The request was aborted because it exceeded the maximum execution time. in [my script path]\timertest.php on line 8"
The timeout is supposed to be 60 seconds for automatic scaling! I'm not sure what I'm missing here... I'm doing various file processing in one script, but I then wrote a test script to see if that failed at 30 seconds too, and it did. The script is:
<?php
$a = 1;
do
{
    syslog(LOG_INFO, $a.' Sleeping for 10 secs...\n');
    sleep(10);
    $a++;
}
while ($a < 8);
?>
Output is:
INFO: 1 Sleeping for 10 secs...\n
INFO: 2 Sleeping for 10 secs...\n
INFO: 3 Sleeping for 10 secs...\n
ERROR:root:php failure (255) with:
stdout:
X-Powered-By: PHP/5.5.26
Content-type: text/html
<br />
<b>Fatal error</b>: The request was aborted because it exceeded the maximum execution time. in <b>[my script path]\timertest.php</b> on line <b>8</b><br />
INFO 2016-06-02 20:52:56,693 module.py:788] default: "GET /testing/timertest.php HTTP/1.1" 500 195
I was thinking it was a config error somewhere, but not sure what or where. My app.yaml is very standard:
application: ak2016-1
version: 1
runtime: php55
api_version: 1

handlers:
# Serve php scripts.
- url: /(.+\.php)$
  script: \1
  login: admin
and php.ini too:
google_app_engine.disable_readonly_filesystem = 1
upload_max_filesize = 8M
display_errors = "1"
display_startup_errors = "1"
As I say, this is an issue with the local dev SDK server only; I'm not bothered about the live online side, as the files I'm processing are local (and need to remain so).
Thanks for any suggestions etc!
I deployed the sample app from the Request Timer documentation and was not able to duplicate your issue. My requests all time out after ~60 seconds:
$ time curl https://<project-id>.appspot.com/timeout.php
Got timeout! Cleaning up...
real 1m0.127s
user 0m0.021s
sys 0m0.010s
I then copied your code, app.yaml, and php.ini to see if I could duplicate that, and received the following in my syslogs:
INFO: 1 Sleeping for 10 secs...\n
INFO: 2 Sleeping for 10 secs...\n
INFO: 3 Sleeping for 10 secs...\n
INFO: 4 Sleeping for 10 secs...\n
INFO: 5 Sleeping for 10 secs...\n
INFO: 6 Sleeping for 10 secs...\n
INFO: PHP Fatal error: The request was aborted because it exceeded the maximum execution time. in /base/data/home/apps/.../timeout2.php on line 9
However, if you continue to have issues with requests timing out after 30 seconds, I would suggest moving the offending code into task queues. I hope this helps!
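For reference, a rough sketch of what pushing the slow work onto a task queue could look like with the legacy App Engine PHP SDK; the /worker handler path and the payload here are hypothetical, not taken from the question:

<?php
use google\appengine\api\taskqueue\PushTask;

// Enqueue the long-running work so it runs in its own request;
// push queue handlers get a longer deadline than user-facing requests.
$task = new PushTask('/worker', ['file' => 'somefile.csv']);
$task->add(); // adds the task to the 'default' push queue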

APC making PHP 5.3 slower?

I recently learned about APC (I know, I'm late to the show) and decided to try it out on my development server. I did some benchmarking with ApacheBench, and to my surprise I found that things are running slower than before.
I haven't made any code optimizations to use apc_fetch or anything, but I was under the impression that opcode caching should make a positive impact on its own?
C:\Apache24\bin>ab -n 1000 http://localhost/
This is ApacheBench, Version 2.3 <$Revision: 1178079 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Finished 1000 requests
Server Software: Apache/2.4.2
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 22820 bytes
Concurrency Level: 1
Time taken for tests: 120.910 seconds
Complete requests: 1000
Failed requests: 95
(Connect: 0, Receive: 0, Length: 95, Exceptions: 0)
Write errors: 0
Total transferred: 23181893 bytes
HTML transferred: 22819893 bytes
Requests per second: 8.27 [#/sec] (mean)
Time per request: 120.910 [ms] (mean)
Time per request: 120.910 [ms] (mean, across all concurrent requests)
Transfer rate: 187.23 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 1
Processing: 110 120 7.2 121 156
Waiting: 61 71 7.1 72 103
Total: 110 121 7.2 121 156
Percentage of the requests served within a certain time (ms)
50% 121
66% 122
75% 123
80% 130
90% 131
95% 132
98% 132
99% 137
100% 156 (longest request)
Here's the APC section of my php.ini. I've left most things at their defaults, except for expanding the cache size to 128MB instead of the default 32.
[APC]
apc.enabled = 1
apc.enable_cli = 1
apc.ttl=3600
apc.user_ttl=3600
apc.shm_size = 128M
apc.slam_defense = 0
Am I doing something wrong, or do I just need to use apc_fetch/store to really get a benefit from APC?
Thanks for any insight you guys can give.
Enabling APC with default settings will make a noticeable (to say the least) difference in response times for your PHP script. You don't have to use any of its specific store/fetch functions to get benefits from APC. In fact, normally you don't even need a benchmark to tell the difference; the difference should be apparent by simply navigating through your site.
If you don't see any difference and your benchmarks don't have some kind of error, then I'd suggest that you start debugging the issue (enable error reporting, check the logs, etc).
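One quick sanity check worth running (purely illustrative) is to confirm the extension is actually loaded and enabled for the Apache SAPI that serves the benchmark, since apc.enable_cli only affects the command line:

<?php
// Confirm the opcode cache is active for this SAPI, not just the CLI.
var_dump(extension_loaded('apc'));   // true if the extension is loaded
var_dump(ini_get('apc.enabled'));    // "1" if enabled for this SAPI
print_r(apc_cache_info('', true));   // limited summary of the system (opcode) cache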
