I get around 4 million hits on my web server daily. On each page, I open 2 MySQL connections, which get closed after script execution.
After some optimisation, for 10% of my requests, i.e. for 400k hits, I now open a single MySQL connection instead of 2.
After reducing the total MySQL connections per day, I checked the total MySQL process count, sleeping connections, etc., and I don't see any significant gain. It's almost the same pre- and post-optimisation.
In which area can I expect to see a performance benefit? Will it be CPU utilisation? Memory? IO?
I use the LAMP stack.
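For reference, here is roughly what the change looks like (a simplified sketch with placeholder credentials and table names; the real code is more involved):

<?php
// Simplified sketch, placeholder credentials and tables: one shared connection
// per request instead of two separate ones.
$db = new mysqli('localhost', 'app_user', 'secret', 'app_db');

// Both "halves" of the page reuse the same handle...
$posts = $db->query('SELECT id, title FROM posts LIMIT 10')->fetch_all(MYSQLI_ASSOC);
$stats = $db->query('SELECT COUNT(*) AS n FROM visits')->fetch_all(MYSQLI_ASSOC);

// ...and it is closed exactly once, when the script finishes.
$db->close();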
I am using CodeIgniter for my API implementation. The server resources and technologies used are as follows:
SUMMARY
Framework : CodeIgniter
Database : MySQL (Hosted on RDS) (1 MASTER & 2 SLAVE)
Hosting : AWS t2.micro
Web Server : Nginx
Following is the LOADER.IO report for my test.
My API MIN RESPONSE TIME : 383 MS
NUMBER OF HITS : 10000 / 1 MIN CONCURRENT
As you can see in the report, the AVERAGE RESPONSE time is 6436 MS.
I am expecting at least 100000 users / 1 MIN watching an event on my application.
I would appreciate it if anybody could help with some OPTIMIZATION suggestions.
MAJOR THINGS I have done so far
1) SWITCHED TO NGINX FROM APACHE
2) MASTER / SLAVE Configuration (1 MASTER , 2 SLAVE)
3) CHECKED each INDEX in the USER JOURNEY in the application
4) CODE OPTIMIZATION : As you can see, 383 MS is a good response time for an API
5) USED MySQL's EXPLAIN to check the execution plans of queries
I would suggest you focus on tuning MySQL to get faster query execution, which will save you time.
To do this, I would suggest the following:
You can set these in the /etc/my.cnf (Red Hat) or /etc/mysql/my.cnf (Debian) file:
# vi /etc/my.cnf
And then append the following directives:
query_cache_size = 268435456
query_cache_type=1
query_cache_limit=1048576
In the above example, the maximum size of an individual query result that can be cached is set to 1048576 bytes (1 MB) via the query_cache_limit system variable.
These changes make frequently executed queries return faster by caching their results; MySQL invalidates a cached entry whenever the underlying tables change. This is handled by the MySQL engine, and this is how you can save time.
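As a quick sanity check, here is a minimal sketch (assuming mysqli with hypothetical credentials; the query cache only exists up to MySQL 5.7) to confirm the directives took effect and to watch the cache counters:

<?php
// Minimal sketch, hypothetical credentials: inspect query cache settings and counters.
$db = new mysqli('localhost', 'app_user', 'secret', 'app_db');

// Current query cache configuration (should reflect my.cnf after a restart).
$vars = $db->query("SHOW VARIABLES LIKE 'query_cache%'");
while ($row = $vars->fetch_assoc()) {
    echo $row['Variable_name'] . ' = ' . $row['Value'] . PHP_EOL;
}

// Runtime counters: many Qcache_lowmem_prunes relative to Qcache_hits means
// the cache is being churned rather than helping.
$status = $db->query("SHOW STATUS LIKE 'Qcache%'");
while ($row = $status->fetch_assoc()) {
    echo $row['Variable_name'] . ' = ' . $row['Value'] . PHP_EOL;
}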
ONE MORE SUGGESTION:
As you are using a t2.micro, you get 1 GiB of RAM and 1 vCPU. So I would suggest going with a t2.medium, which will give you 4 GiB of RAM and 2 vCPUs.
For 1667 SELECTs per second, you may need to have multiple Slaves. With such, you can scale arbitrarily far.
However, it may be that the SELECTs can be made efficient enough to not need the extra Slaves. Let's see the queries. Please include SHOW CREATE TABLE and EXPLAIN SELECT ....
It is possible to run thousands of simple queries per second.
"100000 / 1 MIN" -- Is that 100K connections? Or 100K queries from a smaller number of connections? There is a big difference -- establishing a connection is more costly than performing a simple query. Also, having 100K simultaneous connections is more than I have every heard of. (And I have seen thousands of servers. I have seen 10K connections (high-water-mark) and 3K "Threads_connected" -- both were in deep do-do for various reasons. I have almost never seen more than 200 "Threads_running" -- that is actual queries being performed simultaneously; that is too many for stability.)
Ouch -- With the query_cache_size at 256MB on 1GB of RAM, you don't have room for anything else! That is a tiny server. Even on a larger server do not set that tunable to more than 50M. Otherwise the "pruning" slows things down more than the QC speeds them up!
And, how big are the tables in question?
And, SHOW VARIABLES LIKE '%buffer%';
And, what version are you running? Version 5.7 is rated at about 64 simultaneous queries before the throughput stops improving, and (instead), response time heads for infinity.
To do realistic benchmarking, you need to provide realistic values for
How often a query is issued. (Benchmark programs tend to throw queries at the server one after another; this is not realistic.)
How long a query takes.
How many connections are involved. (I claim that 100K is not realistic.)
The heavy-hitters deliver millions of web pages per day. The typical page involves: connect, do a few queries, build the HTML, disconnect -- all (typically) in less than a second. But only a small fraction of that time is any query actually running. That is, 100 connections may equate to 0-5 queries running at any instant.
Please talk about Queries per second that need to be run. And please limit the number of queries run simultaneously.
I have a MySQL table with about 90k rows. I have a routine I've written which loops through each one of these rows and then cross-checks the results against another table with about 90k rows. If there is a match, I delete one of the rows. I've indexed all the columns I'm cross-checking in MySQL.
When I run this script on a dedicated local server with 2 x quad-core 2.4 GHz Intel Xeon, 24 GB of RAM (with the PHP memory_limit set to 12288M), and an SSD, the whole script takes about a minute to complete. I would imagine the server's resources are maxing out, but actually the CPU is about 93% idle, RAM utilisation is about 6%, and looking at reads/writes on the SSD, barely anything is happening at all.
I mentioned the problem to somebody else, who said the problem is that I'm executing a single-threaded process and wondering why it's not using all 8 processors. But even so, is checking through a MySQL table 90k times really a big deal? Wouldn't at least one CPU be running at max?
Why doesn't my server attempt to throw more resources at the script when I run it? Or, how can I unleash more resources so that my local web app doesn't run like a low-spec'd VPS?
Depending on the size of the rows, 90K rows isn't a whole lot. Odds are they're all cached in RAM.
As for the CPUs, your process is not quite single-threaded, but it's pretty close. Your process and the DB server are separate processes; the problem, of course, is that your process stops while the DB server handles the request, so whatever core has your process scheduled goes idle just as the one running the DB spools up.
As the commenter mentioned, it's likely you can do this more efficiently by offloading most of the processing to the DB server. Most of your time is just in statement overhead sending 90K SQL statements to the server.
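For illustration, a minimal sketch of that idea (table and column names are hypothetical, since the question doesn't show its schema): a single set-based DELETE lets the server do the matching in one pass instead of receiving 90K separate statements.

<?php
// Minimal sketch, hypothetical schema: replace the per-row PHP loop with one
// multi-table DELETE that MySQL executes in a single pass.
$db = new mysqli('localhost', 'app_user', 'secret', 'app_db');

// Delete rows from `duplicates` that have a matching keyword in `posts`.
// With both keyword columns indexed, this is one round trip instead of 90K.
$db->query(
    'DELETE d
       FROM duplicates AS d
       JOIN posts AS p ON p.keyword = d.keyword'
);

echo $db->affected_rows . ' rows deleted' . PHP_EOL;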
I have a PHP script that runs very simple queries on a MySQL database (up to a maximum of 25 times per page). Is it going to be worth caching the results (e.g. using APC), or is the performance difference likely to be negligible?
Caching is just about always worth it. Pulling from APC's in-memory user cache vs. establishing a DB connection and running queries is a massive difference -- especially if you're doing 25 queries on a page!
The benefits will compound:
Pulling from memory, you'll serve up requests faster by requiring less overhead
You'll free up DB connections
You'll free up Apache processes faster
All of which will help serve up requests faster... (see the sketch below)
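A minimal sketch of the pattern (assuming APC's user cache as in the question -- with APCu you would use apcu_fetch()/apcu_store() instead -- and a hypothetical query and cache key):

<?php
// Minimal sketch: serve a query result from APC's user cache when possible.
// The key, TTL, and query are placeholders.
function get_sidebar_posts(mysqli $db)
{
    $key    = 'sidebar_posts';
    $cached = apc_fetch($key, $success);
    if ($success) {
        return $cached;               // served from memory, no DB round trip
    }

    $rows = $db->query('SELECT id, title FROM posts ORDER BY id DESC LIMIT 10')
               ->fetch_all(MYSQLI_ASSOC);

    apc_store($key, $rows, 60);       // cache for 60 seconds
    return $rows;
}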
I am seeing very high CPU spikes on the mysqld process (greater than 100%, and I even saw 300% at one point). My load averages are around .25, .34, .28.
I read this great post about this issue: MySQL high CPU usage
One of the main things to do is disable persistent connections. So I checked my php.ini: mysql.allow_persistent = On and mysql.max_persistent = -1, which means no limit.
This raises a few questions for me before I change anything, just to be sure:
If my mysqld process is spiking over 100% every couple of seconds, shouldn't my load averages be higher than they are?
What will disabling persistent links do - will my scripts continue to function as is?
If I turn this off and reload PHP, what does this mean for my current users, as there will be many active users?
EDIT:
CPU Info: Core 2 Quad Q9400 2.6 GHz
Persistent connections won't use any CPU by themselves - if nothing's using a connection, it's just sitting idle and only consumes a bit of memory and occupies a socket.
Load averages are just that - averages. If you have a process that alternates between 0% and 100% ten times a second, you'd get a load average of 0.5. They're good for spotting long-term, persistent high CPU usage, but by their nature they hide/obliterate signs of spikes.
Persistent connections with MySQL are usually not needed. MySQL has a relatively fast connection protocol, and any time savings from using persistent connections are fairly minimal. The downside is that once a connection goes persistent, it can be left in an inconsistent state. For example, if an app using the connection dies unexpectedly, MySQL will not see that and start cleaning up. This means that any server-side variables created by the app, any locks, any transactions, etc. will be left in the state they were in when the app crashed.
When the connection gets re-used by another app, you'll start out with what amounts to dirty dishes in the sink and an unflushed toilet. It can quite easily cause deadlocks because of the dangling transactions/locks - the new app won't know about them, and the old app is no longer around to relinquish those.
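For reference, a minimal sketch of both styles in mysqli (hypothetical credentials; the question's php.ini settings belong to the older mysql extension, but the idea is the same):

<?php
// Minimal sketch, hypothetical credentials.

// Non-persistent: opened for this request, fully torn down when the script ends.
$db = new mysqli('localhost', 'app_user', 'secret', 'app_db');

// Persistent: the "p:" host prefix asks mysqli to reuse a pooled connection,
// which is where the leftover-state problems described above can come from.
$pdb = new mysqli('p:localhost', 'app_user', 'secret', 'app_db');

// If you do keep persistent connections, finish cleanly so the next script
// doesn't inherit an open transaction or table locks.
$pdb->rollback();
$pdb->query('UNLOCK TABLES');
$pdb->close();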
Spikes are fine. This is MySQL doing work. Your load average seems appropriate.
Disabling persistent links simply means that the scripts cannot reuse an existing connection to the database. I wouldn't recommend disabling this. At the very least, if you want to disable them, do it at the application layer rather than in MySQL. This might even increase load slightly, depending on the conditions.
Finally, DB persistence has nothing to do with the users on your site (generally). Users make a request, and once all of the page resources are loaded, that is it, until the next request. (Except in a few specific cases.) In any case, while the request is happening, the script will still be connected to the DB.
I created a crawler that will operate as a cron job. The object of the crawler is to go through posts on my site and pull keywords from them.
Currently, I am optimizing the script for both speed and server load - but I am curious what types of benchmarks for each are considered "good"?
For example, here are some configurations I have tested, running through 5,000 posts each time (you'll notice the trade-off between speed and memory):
Test 1 - script optimized for memory conservation:
Run time: 52 seconds
Avg. memory load: ~6 MB
Peak memory load: ~7 MB
Test 2 - script optimized for speed:
Run time: 30 seconds
Avg. memory load: ~40 MB
Peak memory load: ~48 MB
Clearly the decision here is speed vs. server load. I am curious what your reactions are to these numbers. Is 40 MB an expensive price to pay if it increases speed so drastically (and also minimizes MySQL connections)?
Or is it better to run the script slower with more MySQL connections and keep the memory overhead low?
This is a really subjective question given that what is "tolerable" depends on many factors such as how many concurrent processes will be running, the specs of the hardware it'll be running on, and how long you expect it to take.
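If you want to keep gathering numbers like the ones you posted, PHP's own timing and memory functions are usually enough. A minimal sketch (the query and the keyword-extraction step are placeholders):

<?php
// Minimal sketch, hypothetical schema: time the crawler run and report peak memory.
$db    = new mysqli('localhost', 'app_user', 'secret', 'app_db');
$start = microtime(true);

$result = $db->query('SELECT id, body FROM posts LIMIT 5000');
while ($row = $result->fetch_assoc()) {
    extract_keywords($row);   // placeholder for the real per-post work
}

printf(
    "Run time: %.1f s, peak memory: %.1f MB\n",
    microtime(true) - $start,
    memory_get_peak_usage(true) / 1048576
);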