So I have a Docker container, built with a buildpack, that runs the following command as PID 1:
procmgr /layers/paketo-buildpacks_php-web/php-web/procs.yml
Is it possible to reload the configs passed to procmgr somehow?
🐳 [DEV] backend-v1-7476cc6cfd-l68p2 app #
ps aufx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
cnb 30 1.0 0.0 18648 3444 pts/0 Ss 03:22 0:00 bash
cnb 77 0.0 0.0 34412 2944 pts/0 R+ 03:22 0:00 \_ ps aufx
cnb 1 0.0 0.1 1013616 4096 ? Ssl 03:20 0:00 procmgr /layers/paketo-buildpacks_php-web/php-web/procs.yml
cnb 24 0.0 0.7 323596 31260 ? Ss 03:20 0:00 php-fpm: master process (/layers/paketo-buildpacks_php-web/php-web/etc/php-fpm.conf)
cnb 28 0.3 0.6 330252 24112 ? S 03:20 0:00 \_ php-fpm: pool www
cnb 29 0.1 0.6 330016 23900 ? S 03:20 0:00 \_ php-fpm: pool www
cnb 25 0.0 0.1 25132 5648 ? S 03:20 0:00 nginx: master process nginx -p /workspace/app -c /workspace/app/nginx.conf
cnb 26 0.0 0.0 25132 1988 ? S 03:20 0:00 \_ nginx: worker process
cnb 27 0.0 0.0 25132 1988 ? S 03:20 0:00 \_ nginx: worker process
🐳 [DEV] backend-v1-7476cc6cfd-l68p2 app #
cat /layers/paketo-buildpacks_php-web/php-web/procs.yml
processes:
nginx:
command: nginx
args:
- -p
- /workspace/app
- -c
- /workspace/app/nginx.conf
php-fpm:
command: php-fpm
args:
- -p
- /layers/paketo-buildpacks_php-web/php-web
- -y
- /layers/paketo-buildpacks_php-web/php-web/etc/php-fpm.conf
- -c
- /layers/paketo-buildpacks_php-web/php-web/etc
At the time of writing, no. The procmgr CLI used here is very basic:
https://github.com/paketo-buildpacks/php-web/blob/main/cmd/procmgr/main.go
It loads the process definitions listed in procs.yml, starts those processes, redirects all STDOUT/STDERR, and watches for any process to exit. If one process exits, they all exit. There's no reload capability.
If you need to reload the information in procs.yml, you'll need to restart your container.
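The supervision behavior described above can be sketched roughly like this (a simplification for illustration, not the actual code from the linked main.go):

```python
import subprocess
import time

# Rough sketch of procmgr's supervision loop: start every configured
# process, and as soon as any one of them exits, bring the whole
# group down. Note there is no signal handler or config re-read here.
procs = {
    "short-lived": subprocess.Popen(["sleep", "0.2"]),
    "long-lived": subprocess.Popen(["sleep", "60"]),
}

first_exit = None
while first_exit is None:
    for name, proc in procs.items():
        if proc.poll() is not None:  # this process has exited
            first_exit = name
            break
    time.sleep(0.05)

# One process died, so terminate all the others.
for proc in procs.values():
    if proc.poll() is None:
        proc.terminate()
        proc.wait()

print(first_exit)  # the first process to exit takes the group down
```

Since there's no SIGHUP handling or re-reading of procs.yml anywhere in a loop like this, restarting the container is the only way to pick up config changes.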
Supervisor is not started in my container, so I am not able to run my php artisan queue:work command for my Laravel project.
Extract from my Dockerfile:
# Add worker to supervisor config file
COPY laravel-worker.conf /etc/supervisor/conf.d/
CMD ["/usr/bin/supervisord"]
Here is the laravel-worker.conf:
[program:laravel-worker]
command=php /var/www/test/current/artisan queue:work --tries=3
user=myuser
process_name=%(program_name)s_%(process_num)d
directory=/var/www/test/current
stdout_logfile=/tmp/supervisord.log
redirect_stderr=true
numprocs=1
autostart=true
autorestart=true
When I go into the container, the supervisor service is not started:
root@e7227ef40f63:/# service supervisor status
supervisord is not running.
And the processes are as follows:
root@e7227ef40f63:/# ps -aux | grep supervisor
root 1 0.0 0.0 4328 652 ? Ss 18:21 0:00 /bin/sh -c service ssh restart && service apache2 restart && service cron start && bash /usr/bin/supervisord
root 365 0.0 0.0 55808 10632 ? Ss 18:25 0:00 /usr/bin/python /usr/bin/supervisord
root 380 0.0 0.0 11120 712 ? S+ 18:27 0:00 grep supervisor
UPDATE
I edited my Dockerfile and put in this line:
ENTRYPOINT service ssh restart && service apache2 restart && service cron start && /usr/bin/supervisord && bash
The service now starts correctly when the container starts:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 4328 652 ? Ss 05:20 0:00 /bin/sh -c service ssh restart && service apache2 restart && service cron start && /usr/bin/supervisord && bash
root 25 0.0 0.0 55176 1140 ? Ss 05:20 0:00 /usr/sbin/sshd
root 43 0.1 0.0 406408 25504 ? Ss 05:20 0:00 /usr/sbin/apache2 -k start
www-data 46 0.0 0.0 406440 8416 ? S 05:20 0:00 /usr/sbin/apache2 -k start
www-data 47 0.0 0.0 406440 8416 ? S 05:20 0:00 /usr/sbin/apache2 -k start
www-data 48 0.0 0.0 406440 8416 ? S 05:20 0:00 /usr/sbin/apache2 -k start
www-data 49 0.0 0.0 406440 8416 ? S 05:20 0:00 /usr/sbin/apache2 -k start
www-data 50 0.0 0.0 406440 8416 ? S 05:20 0:00 /usr/sbin/apache2 -k start
root 59 0.0 0.0 17484 636 ? Ss 05:20 0:00 /usr/sbin/cron
root 63 0.2 0.0 56012 10788 ? Ss 05:20 0:00 /usr/bin/python /usr/bin/supervisord
root 64 0.0 0.0 20032 1280 ? S 05:20 0:00 bash
root 89 0.1 0.0 20240 1996 ? Ss 05:20 0:00 bash
root 112 0.0 0.0 17492 1168 ? R+ 05:21 0:00 ps -aux
But it seems that supervisor doesn't load my config file, because I don't see the 8 processes that should be running.
This is the bad part:
&& bash /usr/bin/supervisord
supervisord is not a bash script. Execute it directly: && /usr/bin/supervisord.
However, I recommend you avoid using service in a container entirely. In general, running more than one process in a container is considered an antipattern, but if you really need it, it's better to use only supervisor: create a .conf file for each process (cron, sshd, etc.) and run only supervisord in your CMD.
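For example, instead of chaining service calls in the ENTRYPOINT, each service can get its own program section. A sketch, assuming Debian-style binary paths (adjust names and flags to your image):

```ini
; /etc/supervisor/conf.d/services.conf (sketch; paths/flags may differ)
[program:cron]
command=/usr/sbin/cron -f
autorestart=true

[program:sshd]
command=/usr/sbin/sshd -D
autorestart=true

[program:apache2]
command=apachectl -D FOREGROUND
autorestart=true
```

with `CMD ["/usr/bin/supervisord", "-n"]` as the only command. The -f, -D, and -D FOREGROUND flags matter: supervisor can only manage processes that stay in the foreground rather than daemonizing.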
I have installed Zabbix 2.4 (using RPMs) on CentOS 6.8 with PostgreSQL 8.4.20, Apache httpd 2.2.15, and PHP 5.6.22, but port 80 was occupied by nginx, so I changed the httpd port to 8080. When I restart the httpd service:
[root@kf8 ~]# service httpd restart
stop httpd: [yes]
start httpd:[Mon Aug 22 19:40:03 2016] [warn] module php5_module is already loaded, skipping
[Mon Aug 22 19:40:03 2016] [warn] module php5_module is already loaded, skipping
httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
[yes]
[root@kf8 ~]#
and I have started the zabbix-server/agent; the processes are listed below:
[root@kf8 ~]# ps aux|grep zabbix
zabbix 2616 0.0 0.0 163304 2832 ? S 16:48 0:00 zabbix_server -c /etc/zabbix/zabbix_server.conf
zabbix 2623 0.0 0.0 163340 2404 ? S 16:48 0:00 zabbix_server: configuration syncer [synced configuration in 0.007229 sec, idle 60 sec]
zabbix 2624 0.0 0.0 163304 2016 ? S 16:48 0:00 zabbix_server: db watchdog [synced alerts config in 0.005168 sec, idle 60 sec]
zabbix 2625 0.0 0.0 261388 3596 ? S 16:48 0:00 zabbix_server: poller #1 [got 0 values in 0.000070 sec, idle 5 sec]
zabbix 2626 0.0 0.0 261388 3596 ? S 16:48 0:00 zabbix_server: poller #2 [got 0 values in 0.000079 sec, idle 5 sec]
zabbix 2628 0.0 0.0 261388 3596 ? S 16:48 0:00 zabbix_server: poller #3 [got 0 values in 0.000076 sec, idle 5 sec]
zabbix 2629 0.0 0.0 261388 3596 ? S 16:48 0:00 zabbix_server: poller #4 [got 0 values in 0.000075 sec, idle 5 sec]
zabbix 2630 0.0 0.0 261388 3596 ? S 16:48 0:00 zabbix_server: poller #5 [got 0 values in 0.000074 sec, idle 5 sec]
zabbix 2631 0.0 0.0 261388 3596 ? S 16:48 0:00 zabbix_server: unreachable poller #1 [got 0 values in 0.000072 sec, idle 5 sec]
zabbix 2632 0.0 0.0 163304 2004 ? S 16:48 0:00 zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection]
zabbix 2633 0.0 0.0 163304 2004 ? S 16:48 0:00 zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection]
zabbix 2634 0.0 0.0 163304 2004 ? S 16:48 0:00 zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection]
postgres 2635 0.0 0.0 217652 5160 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2636 0.0 0.0 163304 2004 ? S 16:48 0:00 zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection]
zabbix 2637 0.0 0.0 163304 2004 ? S 16:48 0:00 zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection]
zabbix 2638 0.0 0.0 163752 1824 ? S 16:48 0:00 zabbix_server: icmp pinger #1 [got 0 values in 0.000103 sec, idle 5 sec]
zabbix 2639 0.0 0.0 163304 1988 ? S 16:48 0:00 zabbix_server: alerter [sent alerts: 0 success, 0 fail in 0.000385 sec, idle 30 sec]
zabbix 2640 0.0 0.0 163616 2136 ? S 16:48 0:00 zabbix_server: housekeeper [deleted 0 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 0.003631 sec, idle 1 hour(s)]
postgres 2641 0.0 0.0 217652 5160 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2642 0.0 0.0 163304 2092 ? S 16:48 0:00 zabbix_server: timer #1 [processed 0 triggers, 0 events in 0.000045 sec, 0 maintenances in 0.000000 sec, idle 30 sec]
postgres 2643 0.0 0.0 217652 5156 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2644 0.0 0.0 163304 2008 ? S 16:48 0:00 zabbix_server: http poller #1 [got 0 values in 0.000521 sec, idle 5 sec]
postgres 2645 0.0 0.0 217652 5156 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2646 0.0 0.0 260936 3548 ? S 16:48 0:00 zabbix_server: discoverer #1 [processed 0 rules in 0.000418 sec, idle 60 sec]
postgres 2647 0.0 0.0 217652 5148 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2648 0.0 0.0 163308 2004 ? S 16:48 0:00 zabbix_server: history syncer #1 [synced 0 items in 0.000025 sec, idle 5 sec]
zabbix 2649 0.0 0.0 163308 2004 ? S 16:48 0:00 zabbix_server: history syncer #2 [synced 0 items in 0.000019 sec, idle 5 sec]
zabbix 2650 0.0 0.0 163308 2004 ? S 16:48 0:00 zabbix_server: history syncer #3 [synced 0 items in 0.000025 sec, idle 5 sec]
postgres 2651 0.0 0.0 217744 5908 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2652 0.0 0.0 163308 2004 ? S 16:48 0:00 zabbix_server: history syncer #4 [synced 0 items in 0.000023 sec, idle 5 sec]
postgres 2653 0.0 0.0 217736 5988 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2654 0.0 0.0 217780 6120 ? Ss 16:48 0:01 postgres: zabbix zabbix [local] idle
zabbix 2655 0.0 0.0 163304 2008 ? S 16:48 0:00 zabbix_server: escalator [processed 0 escalations in 0.000228 sec, idle 3 sec]
zabbix 2656 0.0 0.0 163304 2020 ? S 16:48 0:00 zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000070 sec, idle 5 sec]
zabbix 2657 0.0 0.0 163304 1780 ? S 16:48 0:01 zabbix_server: self-monitoring [processed data in 0.000041 sec, idle 1 sec]
postgres 2658 0.0 0.0 217652 5164 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2659 0.0 0.0 217652 5176 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2661 0.0 0.0 217652 5168 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2662 0.0 0.0 217652 5536 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2663 0.0 0.0 217652 5176 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2664 0.0 0.0 217652 5172 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2665 0.0 0.0 217652 5176 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2666 0.0 0.0 217652 5172 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2667 0.0 0.0 217652 5176 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2668 0.0 0.0 217652 5172 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2669 0.0 0.0 217652 5184 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2670 0.0 0.0 217652 5172 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
postgres 2671 0.0 0.0 217740 6068 ? Ss 16:48 0:00 postgres: zabbix zabbix [local] idle
zabbix 2746 0.0 0.0 77136 1384 ? S 16:48 0:00 zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
zabbix 2748 0.0 0.0 77136 1976 ? S 16:48 0:06 zabbix_agentd: collector [idle 1 sec]
zabbix 2749 0.0 0.0 77136 1216 ? S 16:48 0:00 zabbix_agentd: listener #1 [waiting for connection]
zabbix 2750 0.0 0.0 77136 1216 ? S 16:48 0:00 zabbix_agentd: listener #2 [waiting for connection]
zabbix 2751 0.0 0.0 77136 1216 ? S 16:48 0:00 zabbix_agentd: listener #3 [waiting for connection]
postgres 2812 0.0 0.0 218652 7768 ? Ss 16:49 0:00 postgres: zabbix zabbix [local] idle
root 4790 0.0 0.0 103328 908 pts/1 S+ 19:42 0:00 grep zabbix
But I cannot connect to the website, and I cannot find an error in the zabbix-server/agent's error log files, even though I enabled the most detailed log level.
The website shows this:
404 Page Not Found
The page you requested was not found.
Try http://yourzabbixserverIP:8080/zabbix to access the Zabbix web UI, and trace the Apache error logs stored in /var/log/httpd/error_log.
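On an RPM install, the /zabbix URL prefix typically comes from an Apache config file dropped in by the frontend package. Something like the following is usual, though the exact path and contents vary by version:

```apache
# /etc/httpd/conf.d/zabbix.conf (as typically installed by the RPM;
# contents here are illustrative, check your own file)
Alias /zabbix /usr/share/zabbix
<Directory "/usr/share/zabbix">
    Options FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```

If that file is missing or not loaded, requesting the bare / on port 8080 will 404 exactly as shown.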
What does the error log say?
I have a WordPress site, hosted on a DigitalOcean droplet and installed through their one-click installation, that goes down very often.
Here is the URL: sinc.marchespettacolo.it
Almost once a week I have to restart Apache from the console because the site is down; if I try to connect to it I get a timeout.
I'm not much of a pro at diagnosing what is causing this error.
Could someone help me with how to proceed to find out what's causing this situation and how I can find a solution to the problem?
I also have another droplet where I've installed WordPress through the same process, and it gives no problems!
Some say it could be a problem connected to Apache opening too many processes.
This is the latest process listing, taken while the site was down.
root@sinc:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 33460 980 ? Ss 04:27 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S 04:27 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 04:27 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 04:27 0:00 [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S 04:27 0:12 [rcu_sched]
root 8 0.0 0.0 0 0 ? R 04:27 0:13 [rcuos/0]
root 9 0.0 0.0 0 0 ? S 04:27 0:00 [rcu_bh]
root 10 0.0 0.0 0 0 ? S 04:27 0:00 [rcuob/0]
root 11 0.0 0.0 0 0 ? S 04:27 0:00 [migration/0]
root 12 0.0 0.0 0 0 ? S 04:27 0:00 [watchdog/0]
root 13 0.0 0.0 0 0 ? S< 04:27 0:00 [khelper]
root 14 0.0 0.0 0 0 ? S 04:27 0:00 [kdevtmpfs]
root 15 0.0 0.0 0 0 ? S< 04:27 0:00 [netns]
root 16 0.0 0.0 0 0 ? S< 04:27 0:00 [writeback]
root 17 0.0 0.0 0 0 ? S< 04:27 0:00 [kintegrityd]
root 18 0.0 0.0 0 0 ? S< 04:27 0:00 [bioset]
root 19 0.0 0.0 0 0 ? S< 04:27 0:00 [kworker/u3:0]
root 20 0.0 0.0 0 0 ? S< 04:27 0:00 [kblockd]
root 21 0.0 0.0 0 0 ? S< 04:27 0:00 [ata_sff]
root 22 0.0 0.0 0 0 ? S 04:27 0:00 [khubd]
root 23 0.0 0.0 0 0 ? S< 04:27 0:00 [md]
root 24 0.0 0.0 0 0 ? S< 04:27 0:00 [devfreq_wq]
root 25 0.0 0.0 0 0 ? S 04:27 0:05 [kworker/0:1]
root 27 0.0 0.0 0 0 ? S 04:27 0:00 [khungtaskd]
root 28 3.5 0.0 0 0 ? S 04:27 18:14 [kswapd0]
root 29 0.0 0.0 0 0 ? SN 04:27 0:00 [ksmd]
root 30 0.0 0.0 0 0 ? SN 04:27 0:00 [khugepaged]
root 31 0.0 0.0 0 0 ? S 04:27 0:00 [fsnotify_mark]
root 32 0.0 0.0 0 0 ? S 04:27 0:00 [ecryptfs-kthrea]
root 33 0.0 0.0 0 0 ? S< 04:27 0:00 [crypto]
root 45 0.0 0.0 0 0 ? S< 04:27 0:00 [kthrotld]
root 47 0.0 0.0 0 0 ? S 04:27 0:00 [vballoon]
root 48 0.0 0.0 0 0 ? S 04:27 0:00 [scsi_eh_0]
root 49 0.0 0.0 0 0 ? S 04:27 0:00 [scsi_eh_1]
root 70 0.0 0.0 0 0 ? S< 04:27 0:00 [deferwq]
root 71 0.0 0.0 0 0 ? S< 04:27 0:00 [charger_manager]
root 116 0.0 0.0 0 0 ? S 04:27 0:00 [scsi_eh_2]
root 118 0.0 0.0 0 0 ? S< 04:27 0:00 [kpsmoused]
root 119 0.0 0.0 0 0 ? S 04:27 0:00 [kworker/0:2]
root 126 0.0 0.0 0 0 ? S 04:27 0:00 [jbd2/vda1-8]
root 127 0.0 0.0 0 0 ? S< 04:27 0:00 [ext4-rsv-conver]
root 313 0.0 0.0 19476 0 ? S 04:27 0:00 upstart-udev-bridge --daemon
root 323 0.0 0.0 51340 128 ? Ss 04:27 0:00 /lib/systemd/systemd-udevd --daemon
message+ 326 0.0 0.0 39228 268 ? Ss 04:27 0:00 dbus-daemon --system --fork
root 392 0.0 0.0 43452 560 ? Ss 04:27 0:00 /lib/systemd/systemd-logind
syslog 396 0.0 0.0 255844 360 ? Ssl 04:27 0:01 rsyslogd
root 430 0.0 0.0 15408 4 ? S 04:27 0:00 upstart-file-bridge --daemon
root 542 0.0 0.0 0 0 ? S< 04:27 0:00 [ttm_swap]
root 635 0.0 0.0 0 0 ? S< 04:27 0:00 [kvm-irqfd-clean]
root 771 0.0 0.0 15656 4 ? S 04:27 0:00 upstart-socket-bridge --daemon
root 791 0.0 0.0 15820 116 tty4 Ss+ 04:27 0:00 /sbin/getty -8 38400 tty4
root 795 0.0 0.0 15820 116 tty5 Ss+ 04:27 0:00 /sbin/getty -8 38400 tty5
root 800 0.0 0.0 15820 116 tty2 Ss+ 04:27 0:00 /sbin/getty -8 38400 tty2
root 801 0.0 0.0 15820 116 tty3 Ss+ 04:27 0:00 /sbin/getty -8 38400 tty3
root 804 0.0 0.0 15820 116 tty6 Ss+ 04:27 0:00 /sbin/getty -8 38400 tty6
root 837 0.0 0.0 61364 272 ? Ss 04:27 0:01 /usr/sbin/sshd -D
root 841 0.0 0.0 4368 96 ? Ss 04:27 0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
root 843 0.0 0.0 23656 204 ? Ss 04:27 0:00 cron
daemon 844 0.0 0.0 19140 56 ? Ss 04:27 0:00 atd
whoopsie 868 0.0 0.0 335684 792 ? Ssl 04:27 0:00 whoopsie
mysql 916 0.1 2.5 918056 25668 ? Ssl 04:27 1:01 /usr/sbin/mysqld
root 1001 0.0 0.0 25344 376 ? Ss 04:27 0:00 /usr/lib/postfix/master
postfix 1006 0.0 0.0 27572 936 ? S 04:27 0:00 qmgr -l -t unix -u
root 1126 0.0 0.0 15820 116 tty1 Ss+ 04:27 0:00 /sbin/getty -8 38400 tty1
root 1384 0.0 0.0 0 0 ? S< 04:28 0:12 [kworker/u3:1]
root 1416 0.0 0.0 0 0 ? S 04:28 0:00 [kauditd]
root 2853 0.0 0.0 0 0 ? S 05:33 0:00 [kworker/u2:2]
root 3541 0.0 0.0 0 0 ? S 05:59 0:00 [kworker/u2:0]
postfix 6465 0.0 0.0 27408 968 ? S 12:48 0:00 pickup -l -t unix -u -c
root 6508 0.0 0.0 105632 596 ? Ss 12:58 0:00 sshd: root@pts/0
root 6578 0.0 0.1 22404 1484 pts/0 Ss 12:58 0:00 -bash
root 6639 0.0 0.0 0 0 ? S 13:00 0:00 [kworker/u2:1]
root 6673 0.0 1.5 312976 15964 ? Ss 13:00 0:00 /usr/sbin/apache2 -k start
www-data 6677 1.8 2.6 319132 27180 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6678 1.8 2.6 319132 27180 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6679 2.0 3.8 324792 38892 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6680 1.8 3.5 325320 35736 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6681 1.8 2.6 319132 27180 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6688 1.8 2.6 319132 27176 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6690 1.8 2.4 317344 25372 ? S 13:00 0:03 /usr/sbin/apache2 -k start
www-data 6691 1.7 2.6 319132 27176 ? S 13:00 0:02 /usr/sbin/apache2 -k start
root 6700 0.0 0.1 18448 1292 pts/0 R+ 13:03 0:00 ps aux
Any wonderful ideas?
Thanks in advance.
UPDATE
05-Apr-2016
Here is the Apache error log, as requested:
https://gist.github.com/iperdiscount/ff06cf131f7ac1ec31aa28761e36b1c9#file-gistfile1-txt
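One clue worth noting in the process listing above: kswapd0 has accumulated over 18 minutes of CPU time, which usually means the droplet is under memory pressure and swapping heavily. If Apache's prefork workers are consuming the RAM, capping them is a common mitigation. A sketch only, with illustrative numbers; tune them to your droplet's memory and your measured per-process RSS:

```apache
# /etc/apache2/mods-available/mpm_prefork.conf (sketch; values are
# illustrative, not a recommendation for this specific droplet)
<IfModule mpm_prefork_module>
    StartServers            2
    MinSpareServers         2
    MaxSpareServers         5
    MaxRequestWorkers      10
    MaxConnectionsPerChild 1000
</IfModule>
```

The idea is to keep (MaxRequestWorkers × worker RSS) + MySQL comfortably under the droplet's total RAM so the kernel never has to swap.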
I have a query that looks like this:
SELECT id FROM user WHERE id='47'
The ID is indexed, and reads for this query are always fast according to profiling data gathered like this:
SET profiling = 1;
SHOW PROFILES;
The queries always execute in around 0.0002 seconds.
However, if I profile the query from the PHP side, like this:
$current = microtime(true);
$data = $conn->query($full_query);
$elapsed = microtime(true) - $current;
Then occasionally, maybe 1 out of 50 of these queries will take something like 0.2 seconds. However, my test script also profiles the query using SET profiling = 1;, and even though the PHP round trip through PDO might be 0.2 seconds, the query time was still 0.0002.
Things I know, or know aren't causing the issue:
The query isn't slow. When I look at the same query, from the same query run, profiled in PHP and profiled using SET PROFILING, the query is always fast and is never logged in the slow query log, even when it shows as taking 0.2 seconds from the PHP side.
This is not skip-name-resolve related: the issue is intermittent, and I already have skip-name-resolve on.
This is not query-cache related; the behavior exists whether the cache is on or off, and happens even on queries served from the cache.
The real query doesn't actually select the ID, but I use this query for testing to show that it isn't a disk-access issue, since that field is definitely indexed.
This table is only 10-20 MB, with something like a 1 MB index. The machine shows very little load, and InnoDB is not using all of its buffers.
This is tested against a table that has no activity other than my test queries.
Does anyone have any ideas of what else to check? This seems to me to be a networking issue, but I need to be able to see it and find the issue to fix it and I'm running out of places to check next. Any ideas?
I would profile the machine.
You say this occurs ~1 per 50 times, and that each query has a 0.2 sec benchmark. You should be able to put top in a screen, and then run a loop of queries in PHP to load-test the RDBMS and gather performance stats.
You will probably have to run for more than 50 * 0.2 = 10 seconds, since your "1 out of 50" statistic is probably based on hand-running individual queries, going by your description. Try 30-second and 90-second load tests.
During this time, watch your top process screen. Sort it by CPU by pressing P, and make sure you have the most-consuming processes on top. (Pressing M sorts by memory usage; check the man page for more.)
Look for anything that bubbles to the top during the time(s) of your load-test. You should see something jump higher - however momentarily.
(note, such a process may not reach the top of the list — it need not, but could still introduce enough disk load or other activity to lag the MySQL server)
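A minimal harness for that kind of load test might look like the following (a sketch: the workload here is a stand-in, and you would wire in your real PDO/MySQL query call in its place):

```python
import time

def load_test(fn, runs=100):
    """Time fn over many runs and report min/avg/max latency,
    to catch the occasional outlier rather than just the typical case."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "avg": sum(samples) / len(samples),
        "max": max(samples),
    }

# Stand-in workload; replace with a function that runs the real
# query, e.g. SELECT id FROM user WHERE id='47' through PDO/mysqli.
stats = load_test(lambda: sum(range(10_000)))
print(sorted(stats))  # ['avg', 'max', 'min']
```

If max is wildly larger than avg while top shows a brief spike from some other process, that process is your suspect.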
I have noticed the same phenomenon on my systems. Queries which normally take a millisecond will suddenly take 1-2 seconds. All of my cases are simple, single-table INSERT/UPDATE/REPLACE statements, never SELECTs. No load, locking, or thread build-up is evident.
I had suspected that it's due to clearing out dirty pages, flushing changes to disk, or some hidden mutex, but I have yet to narrow it down.
Also ruled out:
Server load -- no correlation with high load
Engine -- happens with InnoDB/MyISAM/Memory
MySQL Query Cache -- happens whether it's on or off
Log rotations -- no correlation in events
Good for you to have been using the query profiler already. If you're using MySQL 5.6, you also have access to a lot of new performance measurements in the PERFORMANCE_SCHEMA. This has the capability to measure a lot more detail than the query profiler, and it also measures globally instead of just one session. P_S is reportedly going to replace the query profiler.
To diagnose your issue, I would start by confirming or ruling out a TCP/IP issue. For example, test the PHP script to see if it gets the same intermittent latency when connecting via the UNIX socket. You can do this by connecting to localhost which means the PHP script must run on the same server as the database. If the problem goes away when you bypass TCP/IP, this would tell you that the root cause is likely to be TCP/IP.
If you're in a virtual environment like a cloud hosting, you can easily experience variations in performance because of other users of the same cloud intermittently using up all the bandwidth. This is one of the downsides of the cloud.
If you suspect it's a TCP/IP issue, you can test TCP/IP latency independently from PHP or MySQL. Typical tools that are readily available include ping or traceroute. But there are many others. You can also test network speed with netcat. Use a tool that can measure repeatedly over time, because it sounds like you have good performance most of the time, with occasional glitches.
Another possibility is that the fault lies in PHP. You can try profiling PHP with XHProf to find out where it is spending its time.
Try to isolate the problem. Run a little script like this:
https://drive.google.com/file/d/0B0P3JM22IdYZYXY3Y0h5QUg2WUk/edit?usp=sharing
... to see which steps in the chain are spiking. If you have ssh2 installed, it will also capture ps axu immediately after the longest-running test loop, to show what's running at that moment.
Running against localhost on my home development box, the results look like this:
Array
(
[tests summary] => Array
(
[host_ping] => Array
(
[total_time] => 0.010216474533081
[max_time] => 0.00014901161193848
[min_time] => 9.7036361694336E-5
[tests] => 100
[failed] => 0
[last_run] => 9.8943710327148E-5
[average] => 0.00010216474533081
)
[db_connect] => Array
(
[total_time] => 0.11583232879639
[max_time] => 0.0075201988220215
[min_time] => 0.0010058879852295
[tests] => 100
[failed] => 0
[last_run] => 0.0010249614715576
[average] => 0.0011583232879639
)
[db_select_db] => Array
(
[total_time] => 0.011744260787964
[max_time] => 0.00031399726867676
[min_time] => 0.00010991096496582
[tests] => 100
[failed] => 0
[last_run] => 0.0001530647277832
[average] => 0.00011744260787964
)
[db_dataless_query] => Array
(
[total_time] => 0.023221254348755
[max_time] => 0.00026106834411621
[min_time] => 0.00021100044250488
[tests] => 100
[failed] => 0
[last_run] => 0.00021481513977051
[average] => 0.00023221254348755
)
[db_data_query] => Array
(
[total_time] => 0.075078248977661
[max_time] => 0.0010559558868408
[min_time] => 0.00023698806762695
[tests] => 100
[failed] => 0
[last_run] => 0.00076413154602051
[average] => 0.00075078248977661
)
)
[worst full loop] => 0.039211988449097
[times at worst loop] => Array
(
[host_ping] => 0.00014400482177734
[db_connect] => 0.0075201988220215
[db_select_db] => 0.00012803077697754
[db_dataless_query] => 0.00023698806762695
[db_data_query] => 0.00023698806762695
)
[ps_at_worst] => USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2884 1368 ? Ss Sep19 0:29 /sbin/init
root 2 0.0 0.0 0 0 ? S Sep19 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Sep19 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S Sep19 0:06 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Sep19 0:00 [migration/0]
root 6 0.0 0.0 0 0 ? S Sep19 0:25 [watchdog/0]
root 7 0.0 0.0 0 0 ? S Sep19 7:42 [events/0]
root 8 0.0 0.0 0 0 ? S Sep19 0:00 [cgroup]
root 9 0.0 0.0 0 0 ? S Sep19 0:00 [khelper]
root 10 0.0 0.0 0 0 ? S Sep19 0:00 [netns]
root 11 0.0 0.0 0 0 ? S Sep19 0:00 [async/mgr]
root 12 0.0 0.0 0 0 ? S Sep19 0:00 [pm]
root 13 0.0 0.0 0 0 ? S Sep19 0:23 [sync_supers]
root 14 0.0 0.0 0 0 ? S Sep19 0:24 [bdi-default]
root 15 0.0 0.0 0 0 ? S Sep19 0:00 [kintegrityd/0]
root 16 0.0 0.0 0 0 ? S Sep19 0:47 [kblockd/0]
root 17 0.0 0.0 0 0 ? S Sep19 0:00 [kacpid]
root 18 0.0 0.0 0 0 ? S Sep19 0:00 [kacpi_notify]
root 19 0.0 0.0 0 0 ? S Sep19 0:00 [kacpi_hotplug]
root 20 0.0 0.0 0 0 ? S Sep19 0:00 [ata/0]
root 21 0.0 0.0 0 0 ? S Sep19 0:00 [ata_aux]
root 22 0.0 0.0 0 0 ? S Sep19 0:00 [ksuspend_usbd]
root 23 0.0 0.0 0 0 ? S Sep19 0:00 [khubd]
root 24 0.0 0.0 0 0 ? S Sep19 0:00 [kseriod]
root 25 0.0 0.0 0 0 ? S Sep19 0:00 [md/0]
root 26 0.0 0.0 0 0 ? S Sep19 0:00 [md_misc/0]
root 27 0.0 0.0 0 0 ? S Sep19 0:01 [khungtaskd]
root 28 0.0 0.0 0 0 ? S Sep19 0:00 [kswapd0]
root 29 0.0 0.0 0 0 ? SN Sep19 0:00 [ksmd]
root 30 0.0 0.0 0 0 ? S Sep19 0:00 [aio/0]
root 31 0.0 0.0 0 0 ? S Sep19 0:00 [crypto/0]
root 36 0.0 0.0 0 0 ? S Sep19 0:00 [kthrotld/0]
root 38 0.0 0.0 0 0 ? S Sep19 0:00 [kpsmoused]
root 39 0.0 0.0 0 0 ? S Sep19 0:00 [usbhid_resumer]
root 70 0.0 0.0 0 0 ? S Sep19 0:00 [iscsi_eh]
root 74 0.0 0.0 0 0 ? S Sep19 0:00 [cnic_wq]
root 75 0.0 0.0 0 0 ? S< Sep19 0:00 [bnx2i_thread/0]
root 87 0.0 0.0 0 0 ? S Sep19 0:00 [kstriped]
root 123 0.0 0.0 0 0 ? S Sep19 0:00 [ttm_swap]
root 130 0.0 0.0 0 0 ? S< Sep19 0:04 [kslowd000]
root 131 0.0 0.0 0 0 ? S< Sep19 0:05 [kslowd001]
root 231 0.0 0.0 0 0 ? S Sep19 0:00 [scsi_eh_0]
root 232 0.0 0.0 0 0 ? S Sep19 0:00 [scsi_eh_1]
root 291 0.0 0.0 0 0 ? S Sep19 0:35 [kdmflush]
root 293 0.0 0.0 0 0 ? S Sep19 0:00 [kdmflush]
root 313 0.0 0.0 0 0 ? S Sep19 2:11 [jbd2/dm-0-8]
root 314 0.0 0.0 0 0 ? S Sep19 0:00 [ext4-dio-unwrit]
root 396 0.0 0.0 2924 1124 ? S<s Sep19 0:00 /sbin/udevd -d
root 705 0.0 0.0 0 0 ? S Sep19 0:00 [kdmflush]
root 743 0.0 0.0 0 0 ? S Sep19 0:00 [jbd2/sda1-8]
root 744 0.0 0.0 0 0 ? S Sep19 0:00 [ext4-dio-unwrit]
root 745 0.0 0.0 0 0 ? S Sep19 0:00 [jbd2/dm-2-8]
root 746 0.0 0.0 0 0 ? S Sep19 0:00 [ext4-dio-unwrit]
root 819 0.0 0.0 0 0 ? S Sep19 0:18 [kauditd]
root 1028 0.0 0.0 3572 748 ? Ss Sep19 0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid eth0
root 1072 0.0 0.0 13972 828 ? S<sl Sep19 2:13 auditd
root 1090 0.0 0.0 2052 512 ? Ss Sep19 0:00 /sbin/portreserve
root 1097 0.0 0.2 37568 3940 ? Sl Sep19 2:01 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
rpc 1120 0.0 0.0 2568 800 ? Ss Sep19 0:09 rpcbind
rpcuser 1138 0.0 0.0 2836 1224 ? Ss Sep19 0:00 rpc.statd
root 1161 0.0 0.0 0 0 ? S Sep19 0:00 [rpciod/0]
root 1165 0.0 0.0 2636 472 ? Ss Sep19 0:00 rpc.idmapd
root 1186 0.0 0.0 2940 756 ? Ss Sep19 13:27 lldpad -d
root 1195 0.0 0.0 0 0 ? S Sep19 0:00 [scsi_tgtd/0]
root 1196 0.0 0.0 0 0 ? S Sep19 0:00 [fc_exch_workque]
root 1197 0.0 0.0 0 0 ? S Sep19 0:00 [fc_rport_eq]
root 1199 0.0 0.0 0 0 ? S Sep19 0:00 [fcoe_work/0]
root 1200 0.0 0.0 0 0 ? S< Sep19 0:00 [fcoethread/0]
root 1201 0.0 0.0 0 0 ? S Sep19 0:00 [bnx2fc]
root 1202 0.0 0.0 0 0 ? S< Sep19 0:00 [bnx2fc_l2_threa]
root 1203 0.0 0.0 0 0 ? S< Sep19 0:00 [bnx2fc_thread/0]
root 1206 0.0 0.0 2184 564 ? Ss Sep19 1:08 /usr/sbin/fcoemon --syslog
root 1240 0.0 0.0 8556 976 ? Ss Sep19 1:22 /usr/sbin/sshd
root 1415 0.0 0.1 12376 2088 ? Ss Sep19 6:09 sendmail: accepting connections
smmsp 1424 0.0 0.0 12168 1680 ? Ss Sep19 0:02 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root 1441 0.0 0.0 5932 1260 ? Ss Sep19 0:56 crond
root 1456 0.0 0.0 2004 504 tty2 Ss+ Sep19 0:00 /sbin/mingetty /dev/tty2
root 1458 0.0 0.0 2004 504 tty3 Ss+ Sep19 0:00 /sbin/mingetty /dev/tty3
root 1460 0.0 0.0 2004 508 tty4 Ss+ Sep19 0:00 /sbin/mingetty /dev/tty4
root 1462 0.0 0.0 2004 504 tty5 Ss+ Sep19 0:00 /sbin/mingetty /dev/tty5
root 1464 0.0 0.0 2004 508 tty6 Ss+ Sep19 0:00 /sbin/mingetty /dev/tty6
root 1467 0.0 0.0 3316 1740 ? S< Sep19 0:00 /sbin/udevd -d
root 1468 0.0 0.0 3316 1740 ? S< Sep19 0:00 /sbin/udevd -d
apache 3796 0.0 0.4 32668 9452 ? S Dec16 0:08 /usr/sbin/httpd
apache 3800 0.0 0.4 32404 9444 ? S Dec16 0:08 /usr/sbin/httpd
apache 3801 0.0 0.4 33184 9556 ? S Dec16 0:07 /usr/sbin/httpd
apache 3821 0.0 0.4 32668 9612 ? S Dec16 0:08 /usr/sbin/httpd
apache 3840 0.0 0.4 32668 9612 ? S Dec16 0:07 /usr/sbin/httpd
apache 3841 0.0 0.4 32404 9464 ? S Dec16 0:07 /usr/sbin/httpd
apache 4032 0.0 0.4 32668 9632 ? S Dec16 0:07 /usr/sbin/httpd
apache 4348 0.0 0.4 32668 9460 ? S Dec16 0:07 /usr/sbin/httpd
apache 4355 0.0 0.4 32664 9464 ? S Dec16 0:07 /usr/sbin/httpd
apache 4356 0.0 0.5 32660 9728 ? S Dec16 0:07 /usr/sbin/httpd
apache 4422 0.0 0.4 32676 9460 ? S Dec16 0:06 /usr/sbin/httpd
root 5002 0.0 0.0 2004 504 tty1 Ss+ Nov21 0:00 /sbin/mingetty /dev/tty1
root 7540 0.0 0.0 5112 1380 ? S Dec17 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql 7642 0.1 1.0 136712 20140 ? Sl Dec17 2:35 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
root 8001 0.0 0.4 31028 9600 ? Ss Dec13 0:18 /usr/sbin/httpd
root 8092 0.0 0.0 0 0 ? S 13:47 0:00 [flush-253:2]
root 8511 0.0 0.0 0 0 ? S 13:48 0:00 [flush-8:0]
root 8551 16.0 0.4 28612 8008 pts/0 S+ 13:49 0:00 php test-mysql-connection.php exit
root 8552 44.0 0.1 11836 3252 ? Ss 13:49 0:00 sshd: root@notty
root 8560 0.0 0.0 4924 1032 ? Rs 13:49 0:00 ps axu
root 12520 0.0 0.1 11500 3212 ? Ss 09:05 0:00 sshd: jonwire [priv]
jonwire 12524 0.0 0.1 11832 1944 ? S 09:05 0:05 sshd: jonwire@pts/0
jonwire 12525 0.0 0.0 5248 1736 pts/0 Ss 09:05 0:00 -bash
root 16309 0.0 0.0 5432 1436 pts/0 S 12:01 0:00 su -
root 16313 0.0 0.0 5244 1732 pts/0 S 12:01 0:00 -bash
apache 16361 0.0 0.5 32908 9836 ? S Dec15 0:08 /usr/sbin/httpd
apache 16363 0.0 0.5 32908 9784 ? S Dec15 0:08 /usr/sbin/httpd
apache 16364 0.0 0.4 32660 9612 ? S Dec15 0:08 /usr/sbin/httpd
apache 16365 0.0 0.4 32668 9608 ? S Dec15 0:08 /usr/sbin/httpd
apache 16366 0.0 0.7 35076 13948 ? S Dec15 0:08 /usr/sbin/httpd
apache 16367 0.0 0.4 32248 9264 ? S Dec15 0:08 /usr/sbin/httpd
apache 16859 0.0 0.5 32916 9844 ? S Dec15 0:08 /usr/sbin/httpd
apache 20379 0.0 0.4 32248 8904 ? S Dec15 0:08 /usr/sbin/httpd
root 28368 0.0 0.0 0 0 ? S Nov01 0:21 [flush-253:0]
apache 31973 0.0 0.4 31668 8608 ? S Dec16 0:08 /usr/sbin/httpd
)
The results of ps axu here are pretty useless, because I'm connecting to localhost. But I can see from these results that the DB connect latency spikes occasionally, as does the "network" latency (some TCP/IP buffer?).
If I were you, I'd bump the number of test cycles up to 5000 or 50000.
I can merely guess, but since you've eliminated server load, and I assume you've checked for red flags in the InnoDB stats (phpMyAdmin is a great help there, although there are more professional tools), what remains is inconsistent use of keys. Could it be that your query varies slightly, and that there is a case where suboptimal indexes are used?
Please add a FORCE INDEX (PRIMARY) hint or similar and repeat your tests.
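For the test query shown earlier, that would look something like this (assuming the id lookup is meant to use the primary key):

```sql
SELECT id FROM user FORCE INDEX (PRIMARY) WHERE id='47';
```

If the timings stabilize with the hint in place, the optimizer was occasionally picking a different plan.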
Something I've found immensely useful in diagnosing MySQL issues in this vein is mysqltuner. It's a Perl script that looks at your instance of MySQL and suggests various tuning improvements. Honestly, it gets hard to keep track of all the tuning you can do, and this script is great for giving you a breakdown of potential choke points.
Something else to consider is how Linux itself works, which might also explain why you're lagging randomly. When you load top on a Linux box (any box, regardless of load), you'll notice your memory is almost totally used (unless you just rebooted). This isn't a problem or an overloading of your box: Linux loads as much as it can into RAM to save time, and swaps infrequently used things out to your swap file, just like all modern operating systems (virtual memory).
Normally that's not a big deal, but you're probably using InnoDB as the table type (the current default), which also loads things into RAM to save time. What could be happening is that your query's data got loaded into RAM (speedy), but sat idle just long enough to get swapped out to the swap file (much slower). You would then take a small performance hit while Linux moved it back into RAM (swap files are more efficient at this than MySQL would be reading it from disk). Neither MySQL nor InnoDB has any way to tell, because as far as they are concerned it's still in RAM. The problem is described in detail on this blog, with the relevant portion being
Normally a tiny bit of swap usage could be OK (we’re really concerned
about activity—swaps in and out), but in many cases, “real” useful
memory is being swapped: primarily parts of InnoDB’s buffer pool. When
it’s needed once again, a big performance hit is taken to swap it back
in, causing random delays in random queries. This can cause overall
unpredictable performance on production systems, and often once
swapping starts, the system may enter a performance death-spiral.
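If buffer-pool swapping is the suspect, the usual knob is lowering vm.swappiness so the kernel prefers dropping page cache over swapping out process memory. This is a common recommendation in the context that blog describes, not a universal fix, so measure before and after:

```ini
# /etc/sysctl.conf -- make the kernel much less eager to swap
# out anonymous memory (such as InnoDB's buffer pool)
vm.swappiness = 1
# apply without reboot: sysctl -p
```

Watch the si/so columns in vmstat while load-testing to confirm whether swap activity actually coincides with the slow queries.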
We found out that an issue with the underlying hardware was causing this. We moved the server to new hardware using vMotion and the issue went away. VMware was not showing alerts or issues with the hardware; nonetheless, a move off that hardware fixed the issue. Very, very odd.