I am running form submission tests on a website (WAMP) to create new users. Currently, my test breaks when moving from 6-7 users to 8. The configuration that breaks is:
1 Thread
8 Users
20 Loops
Each form submission contains approximately 25 input fields with no more than 25 characters in each.
I have removed all but one View Results Tree listener, increased the heap size to 4096 MB (which I come nowhere near), and am running in non-GUI mode, which helped, but I am still hitting the error well before I would expect. The system monitor shows Java using only around 400 MB.
Error says:
java.net.SocketException: No buffer space available (maximum connections reached?): connect
This results in an equal number of failed MySQL INSERTs (i.e. the number of buffer errors in the Results Tree equals the number of failed inserts).
I have looked into some of the JMeter tuning tips, but none of them seem to apply beyond running in non-GUI mode. Is this JMeter, or is my application not handling the submissions correctly? Is this too many input fields per submission for this load? With only 5 users, everything works fine. Thanks in advance.
There is a known issue in Windows with leaky sockets. If you fit that description, and you are already running JMeter in x64 mode but are still running out of sockets or memory prematurely during stress tests, this may help solve the problem: https://support.microsoft.com/kb/2577795
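If the error really is ephemeral-port exhaustion rather than a JMeter memory problem (the "maximum connections reached?" hint points that way), a quick check on a default Windows setup is to count sockets stuck in TIME_WAIT while the test runs:

netstat -an | find /c "TIME_WAIT"

If that number climbs into the thousands as the loops progress, the machine is burning through its ephemeral ports faster than closed connections are released, which produces exactly this SocketException no matter how much heap JMeter has.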
A few days ago I implemented an autocomplete-suggestion system that works against a per-language dictionary. Here is how it works:
jQuery UI autocomplete -> call to a .php file -> call to a VB6 COM DLL function -> query against an .sqlite file, finding results based on the typed letters -> return results to PHP -> return results to JS.
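For context, the PHP -> COM -> SQLite hop looks roughly like the sketch below. This is a minimal sketch only: the ProgID "MyDict.Lookup", its GetSuggestions method, and the delimited return value are placeholders for whatever the real DLL exposes, not the actual code.

<?php
// autocomplete.php - hedged sketch of the PHP hop in the pipeline above.
// Assumes PHP's COM extension is enabled and the VB6 DLL is registered on the server.
header('Content-Type: application/json; charset=utf-8');

$term = isset($_GET['term']) ? $_GET['term'] : '';
$lang = isset($_GET['lang']) ? $_GET['lang'] : 'en';

try {
    // Each request creates (and implicitly releases) its own COM instance.
    // "MyDict.Lookup" and GetSuggestions() are hypothetical names.
    $dict = new COM('MyDict.Lookup');
    $raw  = $dict->GetSuggestions($lang, $term);   // the DLL queries the .sqlite file
    echo json_encode(explode('|', (string)$raw));  // assumes a delimited string comes back
} catch (com_exception $e) {
    // This is where "Not enough storage is available..." and
    // "CoInitialize has not been called" surface on the PHP side.
    error_log('autocomplete COM error: ' . $e->getMessage());
    echo json_encode(array());
}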
It works reasonably fast: on average each operation takes ~7 ms. During peak hours Google Analytics shows ~1200 online users, and on a typical day we get ~half a million calls to this particular function.
From the day this autocomplete-suggestion system went online I started noticing hundreds of occurrences of two very specific error messages:
Not enough storage is available to complete this operation.
CoInitialize has not been called.
Some information which may help:
a) These messages appear mostly during the peak visitor hours
b) They do not appear only for this specific function, but for others as well (though they never did before we implemented the aforementioned system)
c) I have used SQLite databases before for other things (not as "real-time", i.e. returning results as a user types), but this problem never appeared
d) The SQLite file is ~350 MB with 3 tables: one has ~2.2 million entries, another ~1.6 million, and the third ~16 thousand; all the necessary columns are indexed
e) Obviously this database is used for read-only operations
f) Once the system gets disabled, all messages stop.
g) I get about a thousand of each error message per day (against about 500,000 calls to the function per day)
The server setup is two boxes: Core i7 4770 at 3.8 GHz, 32 GB of RAM, Windows Server 2012 and IIS.
The messages appear randomly and only during peak hours. I cannot replicate the problem on the development machine. Searching the internet has been fruitless so far. Any ideas on what's causing it and how to solve it would be more than welcome.
Thank you.
The fact that this is a transient problem that only happens at peak usage times is a pretty clear indication that something is running out of memory - probably your VB DLL (that's likely the only place where CoInitialize is being called).
I've seen similar problems when too many applications are running and Windows runs out of system handles. I have a W2K Advanced Server box I've been using as a desktop machine (because Microsoft doesn't provide an upgrade path from an Advanced Server instance to a desktop one), typically running a dozen Explorer windows, WinSCP with 8-10 tabs, several PuTTY sessions, several DOS boxes, remote desktop connections, Firefox with 184 tabs in 4 windows, Thunderbird watching 40 email accounts, WinAmp, and a few others - and PaintShop Pro and PhpED simply refuse to run at the same time because of that system handle limitation.
Your problem might be fixed simply by throwing more RAM at the issue, but you should probably do some performance checking and look for system settings to tweak.
I'm trying to figure out what is causing my system to open a large number of PHP processes. This issue has occurred 3 times over the last 2 weeks, and it can crash our application if it goes undetected for several hours, since once it opens 300 database connections it prevents anyone else from connecting.
The application is based on CakePHP 2.x and runs across multiple EC2 instances, which share an RDS database.
The primary indicator that something is going wrong is a high number of database connections, as shown by this graph:
We have CloudWatch monitoring set up to notify us on Slack when average connections go above 40 for more than 5 minutes (normally connections don't go much above 10).
Looking at New Relic I can also see that the number of PHP processes steadily increased by 1 per minute. This is on our operations server, which just handles background processing and tasks and does not handle any web traffic.
Over the same time the graphs on the web servers appear normal.
Looking at New Relic's information on long-running processes, there is nothing to suggest any PHP process ran for 20+ minutes; however, these processes were killed manually, which may be why they are not visible in New Relic - I believe it may not record processes that are killed.
While this issue has now occurred 3 times, I'm still unsure what is causing the problem or how to debug what a particular running PHP process is doing.
The last time this happened I could see all the PHP processes running, and could see they had been running for some time, but I had no idea what they were doing or how to find out, and to prevent the database from becoming overloaded I had to kill them all.
Are there any tools, or other information I am overlooking, that might help me determine which particular process is causing this issue?
MySQL 5.1.73
Apache/2.2.15
PHP 5.6.13
CentOS release 6.5
CakePHP 3.1
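One low-effort way to see what those long-running connections are actually doing is to poll SHOW FULL PROCESSLIST from a small CLI script on the operations server and log anything that has been in the same state unusually long. This is a hedged sketch: the endpoint, credentials, and 60-second threshold are placeholders, and it assumes the monitoring user has the PROCESS privilege on the RDS instance.

<?php
// processlist_watch.php - hedged sketch: dump long-running MySQL connections.
$pdo = new PDO('mysql:host=your-rds-endpoint;dbname=information_schema', 'monitor', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

foreach ($pdo->query('SHOW FULL PROCESSLIST') as $row) {
    // Time is seconds spent in the current state; 60 s is an arbitrary threshold.
    if ((int)$row['Time'] > 60) {
        printf("[%s] id=%s user=%s host=%s db=%s state=%s time=%ss\n  query: %s\n",
            date('c'), $row['Id'], $row['User'], $row['Host'],
            $row['db'], $row['State'], $row['Time'],
            $row['Info'] !== null ? $row['Info'] : '(none)');
    }
}

The Host column narrows a stuck connection down to a single EC2 instance, and the Info column shows the exact statement it is waiting on, which is the piece that was missing when the processes had to be killed blind.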
After about 4 minutes (3 min, 57 seconds) the import process I'm running stops. There are no errors or warnings in any log that I can find. The import process consists of a lot of SQL calls and data processing, nothing too crazy, but it can take about 10 minutes to get through 5500 records if it's doing a full compare for updates.
Firefox: Secure Connection Failed - The connection to the server was reset while the page was loading.
Chrome: ERR_NO RESPONSE
The PHP time limit (set_time_limit) is set to 900 and is working: if I set it to 5 seconds I get an error, so the limit is not being reached.
I can make another controller sleep for 10 minutes and this error does not happen, indicating that something in the actual program is causing the failure, and not the hosting service killing the request because it is taking too long (I read about VPS hosts doing this to prevent spam).
PHP error reporting is turned all the way up in php.ini and, just to be sure, in the controller itself.
The import process completes if I reduce the size of the file being imported. If the file is just long enough, the import will complete AND the browser will still show the error message. This indicates to me it's not failing at the same point of execution each time.
I have deleted all the cache and restarted the server.
I do not see any output in the Apache logs other than that the request was made.
I do not see any errors in the MySQL log; however, I don't know whether that's because logging is not turned on.
The exact same code works on my localhost without any issue. It's not a perfect match to the server, but it's close: Ubuntu Desktop vs CentOS, PHP 5.5 vs PHP 5.6.
I have kept an eye on the memory usage and don't see any issues there.
At this point I'm looking for any good suggestions on what else to look at or insights into what could be causing the failure. There are a lot of possible places to look, and without an error, it's really difficult to narrow down where the issue might be. Thanks in advance for any advice!
UPDATE
After taking a closer look at the memory usage during the request, I noticed it was getting much higher than it ideally should.
The httpd (Apache) process gets killed and a new one is spawned. Once the new process runs out of memory, the error shows up in the browser. When I had looked previously, memory was only at 30%, probably because the old process had just been killed. Watching it the whole way through, I saw it get as high as 80%, which, combined with the other processes, was enough to run out of memory - and a killed process can't log anything, hence no errors or warnings. It is interesting to me that the process just starts right back up.
I found a command to show which processes had been killed due to memory which proved very useful:
dmesg | egrep -i 'killed process'
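To find where in the import the memory actually climbs (rather than just seeing the process die), one option is to log memory_get_usage() at checkpoints inside the import loop. A hedged sketch follows; $records, importRecord(), the 500-row interval, and the log path are placeholders for the real import code.

<?php
// Hedged sketch: memory checkpoints through the import loop.
$logFile = '/tmp/import-memory.log';   // placeholder path

foreach ($records as $i => $record) {
    importRecord($record);             // stands in for the existing per-record work

    if ($i % 500 === 0) {              // arbitrary checkpoint interval
        file_put_contents(
            $logFile,
            sprintf("row %d: current %.1f MB, peak %.1f MB\n",
                $i,
                memory_get_usage(true) / 1048576,
                memory_get_peak_usage(true) / 1048576),
            FILE_APPEND
        );
    }
}

Because each checkpoint is flushed to disk as the import runs, the last line survives even when the process is killed, which is exactly the information the Apache and MySQL logs never capture.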
I did have similar problems with DebugKit.
I had a bug in my code during a memory peak, and the context was written out as HTML in the error "log".
SEE THE LAST UPDATE AT THE BOTTOM FOR MY FINAL EVAL / SUGGESTIONS
This seems so basic that it should be a common problem... but I've already searched for anything pertaining to this issue with no luck.
-- Scenario --
I have a web application that, as one of its functions, allows users to upload photos to the server. File size limits aren't the issue, but I can notice a visible difference in the speed of the server when I'm uploading a file vs. not.
-- Testing --
I uploaded a 3 MB file while signed in to another account (on another computer entirely) to test the page load times in Firebug. Caching has been disabled. The results are below:
Baseline page speed (without upload): 0.409, 0.449, 0.468
During 3 MB file upload: 1.28, 8.58, -- upload complete -- 0.379
This problem obviously compounds if more than one user is uploading a photo at the same time. This seems insane considering all the power I have with the current setup.
-- Setup --
Mediatemple DV Level 3 application server (4 GB RAM, 16 cores)
Mediatemple DV Dev Level 1 database server (running MyISAM tables)
Cloudflare CDN
Custom PHP application
WordPress sales website (front end, same server - not connected in any way to the web app)
CentOS 6.5
MySQL 5.5
-- So Far --
I had the CloudTech team at MT tune the Apache & nginx settings for me, since I thought I had screwed this up, but the issue is still showing up
I am considering changing all the DB tables to InnoDB for concurrency, but this is not related to the question at hand
The server load averages do not seem to be significantly affected when uploading my test file
the output of "free -m" is below
             total       used       free     shared    buffers     cached
Mem:          4096        634       3461          0          0        215
-/+ buffers/cache:         419       3676
Swap:         1536         36       1499
EDIT 1
Is it possible to offload these types of things to an independent server? I realize the PHP used to upload the file would also have to run on that server, but at least then only the upload / long-process server would be affected and not the entire application.
Also, is there any other language or workflow that would work better for this? Perl? Python?
EDIT 2 (2014-08-28)
I forgot to mention two things
1) This issue isn't just with file uploads - it happens whenever a PHP script runs for an extended time. As a test, I ran a 3-minute PHP script on my end and, sure enough, got a phone call from a client during the execution about the "slow" system.
2) I have a high number of concurrent logged-in sessions running. Many of these users are likely on the same script at the same time.
Here is the output from htop. The "php-cgi" processes are the obvious offenders, but I don't know how to see which scripts are causing this load. Again, even if I did find the script, I feel like I should be able to run more than a handful of PHP scripts at a time.
EDIT 3 (2014-08-28)
Here is the htop output at peak hours (right now). What's annoying is that the system is flying at the moment with 2x the traffic and processes.
EDIT 4 / UPDATE (2014-09-30)
After a month of staring at this, I've found some things to be true. I'm hoping some of this will help others get their high-growth applications in check before it turns into an issue of racing traffic with server upgrades (which is what happened here).
The issue: I was using MyISAM as the exclusive database engine. I had read through hundreds of docs and forum posts on whether InnoDB or MyISAM is the better engine to use, most sources giving vague evaluations or (at best) overly complicated benchmarks with vague settings claiming to increase (or even decrease..?) performance. Forget it all and USE InnoDB FOR ALL APPLICATIONS.
Find a good resource to help you tune your MySQL server settings and run with it (see the links below). Apparently the concurrent traffic on the server was overloading PHP while it waited for MyISAM table locks to release. This was causing excessive load on the application server, while the DB server was just hanging out with hardly any CPU or memory load. Transitioning to InnoDB allows high concurrency at the cost of CPU and memory (both GOOD trades; buy a bigger DB server if you have to).
In the end, the load has transferred to the DB server, increasing concurrent traffic performance. To summarize: DON'T USE MyISAM ON WEB APPLICATIONS. Period. I'm sure I'll get burned a bit by saying that, but the slight performance hit (yes, MyISAM is a BIT faster at low concurrency, but who cares if your web app crashes) is well worth the increase in concurrency.
IMPORTANT POINTS
1) When you move your database over to InnoDB (ALTER TABLE my_table ENGINE=InnoDB;) you will need to use the info found in the following links to set your InnoDB-specific settings.
2) Be sure to code a retry-loop wrapper in PHP (3 iterations sounds good) for your query calls to account for deadlocks (a situation where queries are competing for the same row(s) and one or both are stalled completely); a sketch of such a wrapper follows this list.
3) Write your PHP to look for ERRORS coming out of your new query wrapper.
E.g.: send the query -> error found -> check for deadlock -> if deadlock, retry the query after waiting 0.1 sec -> check again; if an error is found that isn't a deadlock, return the error to the app, else continue until the iteration limit is reached (3 in this example) -> if the iteration limit is hit, return the error to the application and handle the failed query there.
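A minimal sketch of such a wrapper, assuming PDO with exceptions enabled; MySQL reports a deadlock as error code 1213, and the 3 tries / 0.1-second wait simply mirror the steps above (the function name is illustrative):

<?php
// Hedged sketch of a deadlock-aware query wrapper.
function runQuery(PDO $pdo, $sql, array $params = array(), $maxTries = 3)
{
    for ($try = 1; $try <= $maxTries; $try++) {
        try {
            $stmt = $pdo->prepare($sql);
            $stmt->execute($params);
            return $stmt;                          // success
        } catch (PDOException $e) {
            $mysqlCode = isset($e->errorInfo[1]) ? $e->errorInfo[1] : 0;

            // 1213 = "Deadlock found when trying to get lock"; wait and retry.
            if ($mysqlCode == 1213 && $try < $maxTries) {
                usleep(100000);                    // 0.1 s, as described above
                continue;
            }

            // Not a deadlock, or out of retries: hand the error back to the app.
            throw $e;
        }
    }
}

The application then decides what "do something now that your query has failed" means wherever the exception finally propagates.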
Helpful Links
http://www.percona.com/blog/2013/09/20/innodb-performance-optimization-basics-updated/
Handling innoDB deadlock
https://dba.stackexchange.com/questions/27328/how-large-should-be-mysql-innodb-buffer-pool-size
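For orientation only, the kind of my.cnf block those links walk you through looks something like this. Every value here is an assumption to be sized against your own DB server's RAM and workload, not a recommendation:

[mysqld]
# Illustrative starting points only - size these using the Percona links above.
innodb_buffer_pool_size        = 2G     # often ~50-75% of RAM on a dedicated DB box
innodb_log_file_size           = 256M   # see the CAUTION below before changing this
innodb_flush_log_at_trx_commit = 2      # trades a little durability for write throughput
innodb_file_per_table          = 1      # one tablespace file per table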
CAUTION
I crashed the MySQL server with the setting in the following link. You MUST completely stop MySQL (service mysqld stop) before renaming (NOT deleting) the original log files (ib_logfile0, ib_logfile1), which on RH/CentOS are usually found here:
/var/lib/mysql/ib_logfile0
http://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/
Once they are renamed, start your MySQL daemon again (service mysqld start).
Hopefully this helps someone,
-B
My Drupal 6 site has been running smoothly for years but recently has experienced intermittent periods of extreme slowness (10-60 second page loads): several hours of slowness followed by hours of normal (4-6 second) page loads. The page always loads with no error; it just sometimes takes forever.
My setup:
Windows Server 2003
Apache/2.2.15 (Win32) Jrun/4.0
PHP 5
MySQL 5.1
Drupal 6
ColdFusion 9
VMware virtual environment
DMZ behind a corporate firewall
Traffic: 1-3 hits/sec peak
Troubleshooting
No applicable errors in apache error log
No errors in drupal event log
Drupal devel module shows 242 queries in 366.23 milliseconds, page execution time 2069.62 ms (so it looks like queries and PHP scripts are not the problem)
No unusually high CPU, memory, or disk I/O
ColdFusion apps and other static pages outside of Drupal also load slowly
webpagetest.org test shows very high time-to-first-byte
The problem seems to be with Apache responding to requests, but previously I've only seen this behavior under 100% CPU load. Judging solely by resource monitoring, it looks as though very little is going on.
Here is the kicker - roughly half of the site's access comes from our LAN, but if I disable the firewall rule and block access from outside of our network, internal (LAN) access (1000+ devices) is speedy. But as soon as outside access is restored the site is crippled.
Apache config? Crawlers/bots? Attackers? I'm at the end of my rope, where should I be looking to determine where the problem lies?
------Edit:-----
Attached is a waterfall chart from webpagetest.org showing a 15-second load time. I've seen times as high as several minutes. And again, the server runs fine much of the time. The green areas indicate that the browser has sent a request and is waiting to receive the first byte of data back from the server. This is certainly a back-end delay, but it is puzzling that the CPU is barely used during this slowness.
(Not enough rep to post an image; see https://webmasters.stackexchange.com/questions/54658/apache-very-high-page-load-time)
------Edit------
On the Apache side of things - Is this possibly a ThreadsPerChild issue?
After much research, I may have found the solution. If I'm correct, it was an apache config problem. Specifically, the "ThreadsPerChild" directive. See... http://httpd.apache.org/docs/2.2/platform/windows.html
Because Apache for Windows is multithreaded, it does not use a
separate process for each request, as Apache can on Unix. Instead
there are usually only two Apache processes running: a parent process,
and a child which handles the requests. Within the child process each
request is handled by a separate thread.
ThreadsPerChild: This directive is new. It tells the server how many
threads it should use. This is the maximum number of connections the
server can handle at once, so be sure to set this number high enough
for your site if you get a lot of hits. The recommended default is
ThreadsPerChild 150, but this must be adjusted to reflect the greatest
anticipated number of simultaneous connections to accept.
Turns out, this directive was not set at all in my config and thus defaulted to 64. I confirmed this by viewing the number of threads for the second httpd.exe process in task manager. When the server was hitting more than 64 connections, the excess requests were simply having to wait for a thread to open up. I added ThreadsPerChild 150 in my httpd.conf.
Additionally, I enabled the Apache status module
http://httpd.apache.org/docs/2.2/mod/mod_status.html
...which, among other things, allows one to see the total number of active requests on the server at any given moment. Right away, I could see spikes of up to 80 active requests. Time will tell, but I'm confident that this will resolve my issue. So far, 30 hours without a hiccup.
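As a follow-up, mod_status's machine-readable output makes it easy to log those spikes continuously instead of eyeballing them. Below is a hedged sketch that assumes mod_status is reachable at http://localhost/server-status?auto and that allow_url_fopen is enabled; the log path and 60-second interval are arbitrary.

<?php
// Hedged sketch: poll mod_status and log busy/idle worker counts over time.
$statusUrl = 'http://localhost/server-status?auto';   // assumes mod_status is served here
$logFile   = 'C:/temp/apache-workers.log';            // placeholder path

while (true) {
    $raw = @file_get_contents($statusUrl);
    if ($raw !== false
        && preg_match('/BusyWorkers:\s*(\d+)/', $raw, $busy)
        && preg_match('/IdleWorkers:\s*(\d+)/', $raw, $idle)) {
        file_put_contents(
            $logFile,
            sprintf("%s busy=%d idle=%d\n", date('c'), $busy[1], $idle[1]),
            FILE_APPEND
        );
    }
    sleep(60);   // arbitrary sampling interval
}

A busy-worker count that repeatedly pins at whatever ThreadsPerChild is set to is the same signal described above, just captured around the clock.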
Apache is too bulky and clumsy for "1-3 hits/sec avg".
I once had a similar problem with a much lighter site (almost static HTML, no DB) and a similar number of hits per second.
No errors, no high network/CPU/memory/disk load. Apache on WinXP.
I inserted nginx before Apache for static files and it started working like a charm.
Caching. The solution is caching.
Drupal (in common with most other large CMS platforms) has a tendency toward this kind of thing due to its nature -- every page is built on the fly, constructed from a whole stack of database tables and code modules. The more you've got in there, the slower it will be, but even fairly simple pages can become horribly slow if your site gets a bit of traffic.
Drupal has a page cache mechanism built-in which will cut your load dramatically. As long as your pages are static (ie no dynamic content) then you can simply switch on caching and watch the performance go right back up.
If you have dynamic content, you can still enable caching for the static parts of the page. It is a bit more complex (and beyond the scope of this answer), but it is worth the effort.
If that's still not enough, a server-based caching solution such as Varnish will definitely help.