Optimizing mysql / PHP based website | 300 qps - php

Hey,
I currently have over 300+ qps on my MySQL server. There are roughly 12,000 UIP (unique IPs) a day, no cron jobs, on a fairly heavy PHP website. I know it's pretty hard to judge whether that is OK without seeing the website, but do you think it is total overkill?
What is your experience? If I optimize the scripts, do you think I would be able to get substantially lower qps? I mean, getting down to 200 qps won't help me much. Thanks

currently have over 300+ qps on my mysql
Your website can run on a Via C3, good for you!
do you think that it is a total overkill?
That depends:
1 page/s doing 300 queries each - yeah, you've got a problem.
30-60 pages/s doing 5-10 queries each - then you've got no problem.
12000 UIP a day
We had a site with 50-60,000 UIP a day, and it ran on a Via C3 (your toaster is a datacenter compared to that crap server), but the torrent tracker used about 50% of the CPU, so only half of that tiny CPU was available to the website, which never seemed to use any significant fraction of it anyway.
What is your experience?
If you want to know whether you are going to kill your server, or whether your website is optimized, the following has close to zero information content:
UIP (unless you get Facebook-like numbers)
queries/s (unless you're above 10,000 - I've seen a cheap dual core blast out 20,000 qps using Postgres)
But the following is extremely important (a small measuring sketch follows below):
dynamic pages/second served
number of queries per page
time duration of each query (ALL OF THEM)
server architecture
vmstat, iostat outputs
database logs
webserver logs
database's own slow_query, lock, and IO logs and statistics
You're not focusing on the right metric...
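If you want to start collecting the per-page numbers from that list, a minimal sketch (not production code; the wrapper function, credentials and log path are made up for illustration) is to route every query through one function and log one line per page:

<?php
// Minimal measuring sketch: count queries and total time spent in MySQL per page.
$GLOBALS['query_count'] = 0;
$GLOBALS['query_time']  = 0.0;

function timed_query(mysqli $db, string $sql)
{
    $start  = microtime(true);
    $result = $db->query($sql);                       // run the real query
    $GLOBALS['query_time']  += microtime(true) - $start;
    $GLOBALS['query_count'] += 1;
    return $result;
}

// At the end of every request, write one line: URI, query count, time in the DB.
register_shutdown_function(function () {
    error_log(sprintf("%s\t%d queries\t%.4f s in DB\n",
        $_SERVER['REQUEST_URI'] ?? 'cli',
        $GLOBALS['query_count'],
        $GLOBALS['query_time']), 3, '/tmp/query-stats.log');
});

A few days of that log tells you the pages/second, queries/page and query time figures listed above far more reliably than a raw qps number.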

I think you are missing the point here. Whether 300+ qps is too much depends heavily on the website itself, on the users per second that visit it, on the background scripts that run concurrently, and so on. You should be able to test and/or compute an average query throughput for your server to understand whether 300+ qps is fair or not. And, by the way, it depends on what these queries are asking for (a couple of fields, or large amounts of binary data?).
Surely, if you optimize the scripts and/or reduce the number of queries, you can lower the load on the database, but without specific data we cannot properly answer your question. To lower a 300+ qps load to under 200 qps, you would have to cut your total queries by at least a third.

Optimizing a script can do wonders. I've taken scripts that took 3 minutes down to 0.5 seconds simply by optimizing how the calls were made to the server. That is an extreme case, of course. I would focus mainly on minimizing the number of queries by combining them where possible - maybe get creative with your queries so that each hit returns more information (see the sketch below).
And going from 300 to 200 qps is actually a huge improvement. That's a 33% drop in traffic to your server... that's significant.
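As a hedged illustration of combining queries (table and column names here are invented): instead of one query per article to get its comment count, ask for all of the counts in a single grouped query.

<?php
// Hypothetical before/after sketch; $db is an existing mysqli connection,
// and the table/column names are made up.
$articleIds = [1, 2, 3];
$counts = [];

// Before: one query per article => 1 + N queries per page.
foreach ($articleIds as $id) {
    $counts[$id] = $db->query(
        "SELECT COUNT(*) AS c FROM comments WHERE article_id = $id"
    )->fetch_assoc()['c'];
}

// After: a single grouped query returns every count at once.
$idList = implode(',', array_map('intval', $articleIds));
$result = $db->query(
    "SELECT article_id, COUNT(*) AS c
     FROM comments
     WHERE article_id IN ($idList)
     GROUP BY article_id"
);
while ($row = $result->fetch_assoc()) {
    $counts[$row['article_id']] = $row['c'];
}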

You should not focus on the script; focus on the server.
You are not saying whether these 300+ queries are causing issues. If your server is not dead, there is no reason to lower the number. And if you have already done optimization, you should focus on the server: upgrade it or buy more servers.

Related

Couple of questions for a server

I have a theoretical question I'm hoping you can help me with.
Alright, I have a home-based server with
300 down / 30 up Internet (the best I can get where I am).
I have a static IP and permission from my ISP to host said server.
Alright, everything's in place, right? No... well, maybe.
Since my server isn't top of the line, I worry about one single thing:
let's say I release my app (using my server as the back-end)
and I have 20 thousand active users daily, for about 1 hour each per day. What's the likelihood of hundreds of people submitting posts (think Twitter-style posts, text only) within one given 0-300 millisecond window?
What I mean is: think of MySQL running the queries - would 500 people each posting 140 characters of text drop the system to a crawl, even with (in a perfect world) perfectly designed queries? And how likely is it that 500 people submit within the same 0-300 ms? To me it doesn't seem very likely until you get into hundreds of thousands of people.
In other words, a theory-of-time question:
20K active users daily.
Likelihood of the estimated theoretical queries per second IF they are all active at one given time. Let's say the average time per person posting ranges from 5-90 seconds (reading, then posting).
I just don't see an issue, but something in the back of my head is making me overthink this. The reason this came up is that my web host (HostGator), I found out, has something like a 25-connection limit on MySQL. That freaked me out and made me wonder: what's the likelihood of 25 people at any given moment, on a small app (20K active users), reaching that limit?
I haven't set any max connections on my own server yet; I will eventually (for obvious reasons). I just want to make sure I set it up optimally, and to do so I need to at least estimate how many queries per second I can expect on average for a given number of people, e.g. 20K.
To me 20K users isn't that many, but at the same time I'm not very good at averaging things out in this kind of situation, because it's nothing but one big unknown. I mean, how can I truly predict something like this without being in a live production environment? But being in a live production environment and having it smack me right back in the face, ruining my credibility with the end users, is the worst that can happen.
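Here's the kind of rough averaging I mean - just a back-of-the-envelope sketch where the posts-per-user figure and the peak window are pure guesses on my part:

<?php
// Back-of-the-envelope sketch; the numbers below are guesses, not measurements.
$activeUsers  = 20000;    // 20K active users per day
$postsPerUser = 5;        // ASSUMPTION: each user posts ~5 times a day
$peakSeconds  = 2 * 3600; // ASSUMPTION: all activity squeezed into a 2-hour peak

$postsPerDay    = $activeUsers * $postsPerUser;  // 100,000 posts/day
$postsPerSecond = $postsPerDay / $peakSeconds;   // ~13.9 posts/s at peak
$postsPer300ms  = $postsPerSecond * 0.3;         // ~4.2 posts per 300 ms window

printf("~%.1f posts/s at peak, ~%.1f per 300 ms window\n",
       $postsPerSecond, $postsPer300ms);
// So even with pessimistic assumptions, 500 simultaneous posts inside 300 ms
// is far outside what 20K users should ever generate.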
Server Specs: (TO BE UPGRADED WHEN NEEDED).
Windows 10
8 GB of RAM.
Intel Core i3
Apache + PHP + MySQL (not XAMPP or WAMP; each component set up individually).
1 TB SSD
Nothing else running on it - dedicated only for this sole purpose.
Yes I know, not very strong, but it's all I have and can afford at the moment.
There's no way in the world you'll catch me spending $174 a month for a decent dedicated server when I can save up for a few months and just buy a new one outright later.
So this is just a temporary solution until I can afford something better.
Thanks guys/gals.
First off, I would drop the Windows 10 OS and go with Ubuntu 16.
Be realistic when buying a server: some companies advertise servers in the low hundreds, but after you really get things built out correctly, you are in the thousands of dollars.
Think of it like you are getting a 'chassis only'.
Second, don't try to emulate what good IP companies do for pennies on the dollar - you'll go broke and not have enough time to focus on your core application.
Once it takes off and starts generating revenue, then put your first dollars into infrastructure improvements.
Take it from someone who has gone down this path before ;)
Good luck!

Running 600+ threads with PHP pthreads - what about the overhead

I have a server with 2 physical CPUs, which together have 24 cores, and 10 GB of RAM.
The PHP program calculates a statistic, and I can run each section totally independently of the others. Once all calculations are finished I only have to "merge" them.
Therefore I had the idea to perform each calculation phase in a separate thread created/controlled by pthreads.
Each calculation takes around 0.10 seconds, but the sheer number of calculations makes the whole run take long when they are serialized.
My questions:
Is there a limitation when creating a new thread with pthreads?
What is the overhead of creating a new thread? I must take this into account to avoid introducing a new delay.
I can imagine that for several seconds the load would be very high, but it ends suddenly once every calculation has finished. That is not the problem: it is "my" server and I do not have to consider other users [as I would on a shared server].
While "waiting" for an answer :-) I started to rewrite the class.
I can summarize it like this:
There is no way to start 600 threads at once. I expected this, but I wanted to know where the limit is. My configuration "allowed" around 160 threads to be started.
When starting more than those ~150-160 threads, the PHP script stopped working without any further notice.
As Franz Gleichmann pointed out, the whole process takes longer when starting lots of threads. I found that starting 20 threads gives the best performance.
The achieved performance gain is between 20% and 50% - I am satisfied.
I don't know whether it is a bug in the pthreads library, but I could not access any class members. I had to move the class members inside the function. Since the calculation is in one function anyway, this did not bother me and I did not investigate further.
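For reference, a minimal sketch of that pool-of-20-workers approach with the pthreads extension (v3, PHP CLI only); the task class and the calculation inside it are placeholders, not the author's actual code:

<?php
// Requires the pthreads extension. Sketch only; the "calculation" is a placeholder.
class CalcTask extends Threaded
{
    public $result = 0;
    private $lo;
    private $hi;

    public function __construct(int $lo, int $hi)
    {
        $this->lo = $lo;
        $this->hi = $hi;
    }

    public function run()
    {
        // Placeholder for the ~0.1 s independent calculation.
        $sum = 0;
        for ($i = $this->lo; $i < $this->hi; $i++) {
            $sum += sqrt($i);
        }
        $this->result = $sum;
    }
}

$pool  = new Pool(20);          // 20 workers: the size the author found fastest
$tasks = [];
for ($i = 0; $i < 600; $i++) {  // 600 independent sections
    $task = new CalcTask($i * 1000, ($i + 1) * 1000);
    $tasks[] = $task;
    $pool->submit($task);
}
$pool->shutdown();              // wait until every submitted task has run

// "Merge" step: combine the per-task results.
$total = 0;
foreach ($tasks as $task) {
    $total += $task->result;
}
echo $total, PHP_EOL;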

MySQL takes up to 300% CPU when the site is visited by more than X people

I'm puzzled; I assume a slow query.
Note: all my queries are tested and run great when there are fewer people using my app/website (less than 0.01 sec each).
So I'm seeing high CPU usage with my current setup and I was wondering why. Is it possible it's an index issue?
Our possible solution: we thought we could use an XML cache file to store the information each hour and thereby reduce the load on MySQL (updating the files each hour).
Would that be a good thing for us to do, given that we have an SSD drive? Or will it be slower than before?
Currently, at high-traffic times, our website/app can take up to 30 seconds before returning the first byte. My website is running on a Plesk 12 server.
UPDATE
Here's more information about my MySQL setup:
http://pastebin.com/KqvFYy8y
Is it possible it's an index issue?
Perhaps, but not necessarily. You first need to identify which query is slow. You will find that in the slow query log. Then analyze the query. This is explained in the literature, or you can contact a consultant / tutor for that.
We thought we could use an XML cache file to store the information each hour... and then reduce the load on our MySQL query?
Well, cache invalidation is not the easiest thing to do, but with a fixed rhythm of once an hour this seems easy enough. But take care: it will only help if the query you cache was actually slow. MySQL normally has a query cache built in; check whether it is enabled first.
Will it be good for us to do such things?
Normally, if the things you do are good, the results will be good, too. Sometimes even bad things lead to good results, so such a general question is hard to answer. Instead, I suggest you gather more concrete information before you continue to ask around. This sounds more like guessing. Stop guessing. Really - guessing is only for the first two minutes; after that, just stop guessing.
Since we have an SSD drive? Or will it be slower than before?
You can try to throw hardware at it. Again, literature and a consultant / tutor can help you greatly with that. But just stop guessing. Really.
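A quick, hedged way to check the two things mentioned above (the slow query log and the built-in query cache) from PHP; host and credentials are placeholders:

<?php
// Check whether the slow query log and the query cache are enabled.
$db = new mysqli('localhost', 'user', 'pass');

$vars = ['slow_query_log', 'slow_query_log_file', 'long_query_time',
         'query_cache_type', 'query_cache_size'];
foreach ($vars as $name) {
    $row = $db->query("SHOW VARIABLES LIKE '$name'")->fetch_assoc();
    if ($row) {
        echo $row['Variable_name'], ' = ', $row['Value'], PHP_EOL;
    }
}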
I assume the query is not slow all the time. If that is true, the query is not very likely the problem.
You need to find out what is using the CPU - likely a runaway script with an infinite loop.
Try this:
<?php
// Print the full process list so you can see what is actually eating the CPU.
header('Content-Type: text/plain; charset=utf-8');
system('ps auxww');
?>
This should return a list in this format:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Scan down the %CPU column and look for your user name in the USER column.
If you see a process taking 100% CPU, you may want to grab its PID and run:
system('kill 1234');
where 1234 is the PID.
The MySQL processes running at 441% and 218% seem very problematic.
Assuming this is a shared server, there may be another user running queries who is hogging the CPU. You may need to take that up with your provider.
I've been watching one of my shared servers, and the CPU for the MySQL process has not gone over 16%.
MySQLTuner
From the link it appears you have heavy traffic.
The Tuner was running for 23.5 minutes:
Joins performed without indexes: 69863
69863 in 23.5 minutes comes out to almost 50 per second.
Does this sound correct - running a query with an unindexed JOIN about 50 times per second?
Index the JOIN table
You have a query with a JOIN.
The tables are joined by column(s).
On the joined table, add an index to the column that joins the two tables together (see the example below).
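As a concrete but hypothetical illustration (table and column names are invented), the fix looks like this:

<?php
// Hypothetical example: posts joined to comments on comments.post_id.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// The join column on the joined table gets the index.
$db->query('ALTER TABLE comments ADD INDEX idx_post_id (post_id)');

// Afterwards, EXPLAIN should show indexed ("ref") access on comments
// instead of a full table scan for every row of posts.
$explain = $db->query(
    'EXPLAIN SELECT p.id, COUNT(c.id)
     FROM posts p
     JOIN comments c ON c.post_id = p.id
     GROUP BY p.id'
);
print_r($explain->fetch_all(MYSQLI_ASSOC));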

PHP and CPU - Process of chat + notifications

My site has a PHP process running for each open window/tab; it runs for at most 1 minute and returns notifications, chat messages, and people going online or offline. When the JavaScript gets the output, it calls the same PHP process again, and so on.
This is like Facebook chat.
But it seems to take too much CPU while it is running. Do you have any idea how Facebook handles this problem? What do they do so that their processes don't take too much CPU and bring their servers down?
My process has a while(true) loop with a sleep(1) at the end. Inside the loop, it checks for notifications, checks whether any of the currently online people went offline or changed status, reads unread messages, etc.
Let me know if you need more info about how my process works.
Would calling other PHP scripts via system() (and waiting for their output) alleviate this?
I ask because it would make other processes check the notifications and flush when finished, while the main PHP script just collects the results.
Thank you.
I think your main problem here is the parallelism. Apache and PHP do not excel at tasks like this, where 100+ users each have an open HTTP request.
If in your while(true) loop you spend 0.1 seconds on CPU-bound work (checking status changes or other useful things) and 1 second on the sleep, this results in a CPU load of 100% as soon as you have 10 users online in the chat. So in order to serve more users with THIS model of a chat, you would have to optimize the workload in your while(true) cycle and/or raise the sleep interval from 1 second to 3 or higher.
I had the same problem in an HTTP-based chat system I wrote many years ago, where at some point too many parallel MySQL SELECTs were slowing down the chat, creating heavy load on the system.
What I did was implement a fast "ring buffer" for messages and status information in shared memory (SysV back in the day - today I would probably use APC or memcached). All operations read and write in the buffer, and the buffer itself gets periodically "flushed" into the database to persist it (but a lot less often than once per second per user). If no persistence is needed you can omit the backend, of course.
I was able to increase the number of users I could serve by roughly 500% that way.
BUT as soon as you solve this issue you will be faced with another: available system memory (100+ Apache processes at ~5 MB each - fun) and process context-switching overhead. The more active processes you have, the more your operating system will spend on the overhead involved with assigning "fair enough" CPU slots, AFAIK.
You'll see it is very hard to scale efficiently with Apache and PHP alone for your use case. There are open-source tools, client- and server-based, to help, though. One I remember places a server in front of Apache and queues messages internally, while having very efficient multi-socket communication with the JavaScript clients, making real "push" events possible. Unfortunately I do not remember any names, so you'll have to research or hope the Stack Overflow community brings in what my brain has already discarded ;)
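To make that buffering idea a bit more concrete, here is a minimal sketch using APCu instead of SysV shared memory (key names and fields are invented, and the unsynchronized read-modify-write is a simplification):

<?php
// Sketch only: messages are buffered in APCu and a separate worker/cron
// flushes them to MySQL far less often than once per second per user.
function push_message(int $conversationId, string $from, string $text): void
{
    $key      = "chat:$conversationId";
    $buffer   = apcu_fetch($key) ?: [];
    $buffer[] = ['ts' => microtime(true), 'from' => $from, 'text' => $text];
    apcu_store($key, $buffer, 3600);   // keep at most an hour in memory
}

function read_messages(int $conversationId, float $sinceTs): array
{
    $buffer = apcu_fetch("chat:$conversationId") ?: [];
    // The polling clients only ever read from memory, never from MySQL.
    return array_values(array_filter(
        $buffer,
        function ($m) use ($sinceTs) { return $m['ts'] > $sinceTs; }
    ));
}

push_message(42, 'alice', 'hello');
print_r(read_messages(42, 0.0));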
Edit:
Hi Nuno,
the comment field has too few characters, so I reply here.
Let's get back to the 10 users in parallel:
10 * 0.1 seconds of CPU time per cycle (assumed) is roughly 1 s of combined CPU time over a period of 1.1 seconds (1 second sleep + 0.1 seconds execute). That is 1 / 1.1, which I would boldly round to 100% CPU utilization even though it is "only" 90.9%.
If the same 10 * 0.1 s of CPU time is "stretched" over a period of not 1.1 seconds but 3.1 (3 seconds sleep + 0.1 seconds execute), the calculation is 1 / 3.1 ≈ 32%.
And it is logical: if your checking cycle queries your backend a third as often, you have only a third of the load on your system.
Regarding the shared memory: the name might imply otherwise, but if you use good IDs for your cache areas, like one ID per conversation or user, you will have private areas within the shared memory. Database tables also rely on you providing good IDs to separate private data from public information, so those should be around already :)
I would also not "split" any further. The fewer PHP processes you have to "juggle" in parallel, the easier it is for your systems and for you. Unless you see that it makes absolute sense because one type of notification takes a lot more querying resources than another and you want different refresh times, or something like that. But even that can be decided inside the while cycle: a user's "away" status could be checked every 30 seconds, while the messages he might have written could be checked every 3. No reason to create more cycles - just use different counter variables, or the right divisor in a modulo operation.
The inventor of PHP said that he believes man is too limited to control parallel processes :)
Edit 2
OK, let's build a formula. We have these variables:
duration of execution (e)
duration of sleep (s)
duration of one cycle (c)
number of concurrent users (u)
CPU load (l)
c = e + s
l = u*e / c   (the combined CPU work per cycle, u*e, divided by the cycle length c = the fraction of a CPU you need)
l = u*e / (e + s)
For 30 users, ASSUMING 0.1 s execution time and 1 second of sleep:
l = 30*0.1 / (0.1 + 1)
l = 2.73
l = 273% CPU utilization (i.e. you need 3 cores :P)
Exceeding the capabilities of your CPU means that cycles will run longer than you intend; the overall response time will increase (and the CPU runs hot).
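The same estimate expressed as a small PHP sketch, using the variables defined above:

<?php
// l = u*e / (e + s): CPU load for u users, e seconds of work and s seconds of sleep per cycle.
function cpu_load(float $e, float $s, int $u): float
{
    return $u * $e / ($e + $s);
}

printf("%.0f%%\n", 100 * cpu_load(0.1, 1.0, 30)); // ~273% -> roughly 3 cores
printf("%.0f%%\n", 100 * cpu_load(0.1, 3.0, 30)); // ~97%  -> about one core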
In PHP, sleep() and system() calls block the whole script. What you really need to research is pcntl_fork(). Fortunately, I had these problems over a decade ago, and you can look at most of my code.
I needed a PHP application that could connect to multiple IRC servers, sit in an unlimited number of IRC chatrooms, moderate, interact with, and receive commands from people. All this and more was done in a process-efficient way.
You can check out the entire project at http://sourceforge.net/projects/phpegg/ - the code you want is in source/connect.inc.
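For reference, a minimal pcntl_fork() sketch (CLI only, pcntl extension required); the "work" in the child is a placeholder, not code taken from the phpegg project:

<?php
// Fork a child to do the blocking work while the parent keeps collecting results.
$pid = pcntl_fork();

if ($pid === -1) {
    exit("fork failed\n");
} elseif ($pid === 0) {
    // Child: do the blocking check (placeholder), then exit.
    sleep(1);                       // e.g. poll the database for notifications
    exit(0);
} else {
    // Parent: continue working, then reap the child so it doesn't become a zombie.
    pcntl_waitpid($pid, $status);
    echo "child $pid finished with status ", pcntl_wexitstatus($status), PHP_EOL;
}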

php memory how much is too much

I'm currently rewriting my site using my own framework (it's very simple and does exactly what I need; I have no need for something like Zend or CakePHP). I've done a lot of work making sure everything is cached properly, caching pages in files to avoid SQL queries and generally limiting the number of SQL queries.
Overall it looks very speedy. The average time taken for the front page (taken over 100 runs) is 0.046152 microseconds.
But one thing I'm not sure about is whether I've done enough to reduce PHP memory usage. The only time I've ever encountered problems with it is when uploading large files.
Using memory_get_peak_usage(TRUE), which I THINK returns the highest amount of memory used while the script has been running, the average (taken over 100 runs) is 1572864 bytes.
Is that good?
I realise you don't know what I'm doing (it's rather simple: get the 10 latest articles, the comment count for each, the user controls, popular tags in the sidebar, etc.). But would you be at all worried about a script using that sort of memory getting hit 50,000 times a day? Or once every second at peak times?
I realise that this is a very open-ended question. Hopefully you can understand that it's a bit of a stab in the dark, and I'm really just looking for some reassurance that it's not going to die horribly come relaunch day.
EDIT: Just a mini experiment I did for myself. I downloaded and installed WordPress; a default installation with no extra add-ons, just one user and one post, used 10.5 megabytes of memory, or "11010048 bytes". Quite pleased with my 1.5 MB now.
Memory usage values can vary heavily and are subject to fluctuation, but as you already say in your update, a regular WordPress instance is much, much fatter than that. I have had great trouble getting the WordPress backend running with a memory_limit of sixteen megabytes - let alone when plug-ins come into play. So from that, I'd say a peak of 1.5 megabytes while performing normal tasks is quite OK.
Generation time is extremely dependent on the hardware your site runs on, obviously. However, a generation time of 0.046152 seconds (I assume you mean seconds here) sounds perfectly fine to me under normal circumstances.
It is a subjective question. PHP has a lot of overhead, and when you call the function with TRUE, that overhead is included. You'll see what I mean when you call the function in a simple Hello World script (see below). Also keep in mind that results can differ greatly depending on whether PHP runs as an Apache module or as FastCGI.
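The baseline check mentioned above is just something like this (a minimal sketch); the two values show the difference between memory allocated from the system and memory actually used by the script:

<?php
// "Hello World" baseline for memory_get_peak_usage().
echo "Hello World\n";
echo memory_get_peak_usage(true),  " bytes allocated from the system\n";
echo memory_get_peak_usage(false), " bytes actually used by the script\n";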
Unfortunately, no one can provide assurances. There will always be unforeseen variables that can bring down a site. Perform load testing. Use a code profiler to narrow down the location of any bottlenecks and see whether there are ways to make those code blocks more efficient.
Encyclopaedia Britannica thought they were prepared when they launched their ad-supported encyclopedia ten years ago. The developers didn't know they would be announcing it on Good Morning America the day of the launch. The whole thing came crashing down for days.
As long as your systems aren't swapping, your memory usage is reasonable. Any additional concern is just premature optimization.
