Lately my site has been getting about 2.5 million hits per day (on average). I record hits to each and every page (it's an adult site), so I'm able to have a Top 10 sort of thing that shows top Websites, Models, Galleries and Images. I record the hit, as well as the users IP so those individual sections only get incremented one time per user, every 24 hours. The problem with this is that it's updating the mysql database each hit. So of course, my site has started getting 504 errors.
I looked around and saw that memcached might be a solution. Store hits in memory and push to the database every X mins. I also saw some people suggest using MongoDB, which to my understanding is also a memory type storage. Would this be the way to go? Would you recommend memcached or MongoDB for what I'm trying to do? Or is this not the way to proceed because it just means more mysql calls in a shorter time frame (1 huge batch, say, every minute would mean 60 seconds worth of hits versus smaller batches every second).
I have both memcached and MongoDB installed on my server, so either is an option.
there may be much easier solutions to obtain better database performance without new software packages. the volumes you mention are not particularly large.
i'll list a just a few of many possibilities.
1. if you are on a version of mysql older than 5.6, then updating to 5.6+ will almost certainly yield a very significant improvement because the storage engine is much better for 5.6 and above.
2. if the busiest tables use a storage engine other than innodb, then switch to innodb. [you can do this with phpmyadmin]
3. get some help tuning buffer sizes in my.ini [it takes some skill] and/or increasing ram on the database server(s).
4. consider spreading the workload across more drives and/or switch part or all of the database to solid state drives [or better conventional drives]
5. if the database server(s) is/are memory or compute bound then bigger or more servers may be needed.
6. make sure the bottleneck is not external to the database server(s).
The way this site (stackoverflow.com) implements it is by maintaining in-memory data structure of question views which gets flushed to DB every 15 minutes or so. There is no need to stress DB by saving each hit - too much IO. This in-memory structure could be just within your application as a map of ip and hits/time or it could be in memcached. I don't think you really need memcached for this purpose.
So the general idea to do batch updates that you had is a good one.
Related
I have a PHP application that is executed up to one hundred times simultaneously, and very often. (its a telegram anti-spam bot with 250k+ users)
The script itself makes various DB calls (tickers update, counters etc.) but it also load each time some more or less 'static' data from the database, like regexes or json config files.
My script is also doing image manipulation, so the server's CPU and RAM are sometimes under pressure.
Some days ago i ran into a problem, the apache2 OOM-Killer was killing the mysql server process due to lack of avaible memory. The mysql server were not restarting automaticaly, leaving my script broken for hours.
I already made some code optimisations that enabled my server to breathe, but what i'm looking now, is to have some caching method to store data between script executions, with the possibility to update them based on a time interval.
First i thought about flat file where i could serialize data, but i would like to know if it is a good idea or not regarding performances.
In my case, is there a benefit of using caching data over mysql queries ?
What are the pro/con, regarding speed of access, speed of execution ?
Finaly, what caching method should i implement ?
I know that the simplest solution is to upgrade my server capacity, I plan to do so anytime soon.
Server is running Debian 11, PHP 8.0
Thank you.
If you could use a NoSQL to provide those queries it would speed up dramatically.
Now if this is a no go, you can go old school and keep that "static" data in the filesystem.
You can then create a timer of your own that runs, for example, every 20 minutes to update the files.
When you ask info regarding speed of access, speed of execution the answer will always be "depends" but from what you said it would be better to access the file system that being constantly querying the database for the same info...
The complexity, consistency, etc, lead me to recommend against a caching layer. Instead, let's work a bit more on other optimizations.
OOM implies that something is tuned improperly. Show us what you have in my.cnf. How much RAM do you have? How much RAM des the image processing take? (PHP's image* library is something of a memory hog.) We need to start by knowing how much RAM can MySQL can have.
For tuning, please provide GLOBAL STATUS and VARIABLES. See http://mysql.rjweb.org/doc.php/mysql_analysis
That link also shows how to gather the slowlog. In it we should be able to find the "worst" queries and work on optimizing them. For "one hundred times simultaneously", even fast queries need to be further optimized. When providing the 'worst' queries, please provide SHOW CREATE TABLE.
Another technique is to decrease the number of children that Apache is allowed to run. Apache will queue up others. "Hundreds" is too many for Apache or MySQL; it is better to wait to start some of them rather than having "hundreds" stumbling over each other.
I have a shop on PHP platform (bad developed) where there are a lot of bad queries (long queries without indexes, order by rand(), dynamic counting, ..)
I do not have the possibility to change the queries for now, but I have to tune the server to stay alive.
I tried everything I know, I had very big databases, which I optimized with changing MySQL engine and tune it, but it is first project, where everything goes down.
Current situation:
1. Frontend (PHP) - AWS m4.large instance, Amazon Linux (latest), PHP 7.0 with opcache enabled (apache module), mod_pagespeed enabled with external Memcached on AWS Elasticache (t2.micro, 5% average load)
2. SQL - Percona DB with TokuDB engine on AWS c4.xlarge instance (50% average load)
I need to lower the instances, mainly the c4.xlarge, but if I switch down to c4.large, after a few hours there is a big overload.
Database has only 300MB of data, I tried query cache, InnoDB, XtraDB, ... with no success, and the traffic is also very low. PHP uses MySQLi extension.
Do you know any possible solution, how to lower the load without refactoring the queries?
Use of www.mysqlcalculator.com would be a quick way to get a brain check on about a dozen memory consumption factors in less than 2 minutes.
Would love to see your SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES, if you could get them posted to allow analysis of activity and configuration requests. 3 minutes of your GENERAL_LOG during a busy time would be interesting and could identify troublesome/time consuming SELECT's, etc.
We're running a website which performs financial modeling and it takes a while for memcache to build its cache. For instance, after 1 week the number of hits is only at 48% and the cache used is 2GB (out of 5GB allocated). Since we don't want to loose that cache should the server crash or need to be restarted, we would like to save it somewhere.
Q: What are the best options for storing the content of the memcache cache somewhere permanent (and restoring that content)?
So far we haven't seen memcache reach the point whereby % of hits doesn't improve. We know we quickly get to 30% hits with 300MB of data, which corresponds to caching of shared content. Afterwards, objects become much bigger and are created less frequently. By looking at our munin graphs, I would say we could reach our best % of hits within 2 to 3 months. I really think we have a case for saving our memcache data.
FYI I'm not adding the graph showing the % of hits/misses because it evolves so slowly that it's not really readable.
You might try:
CouchBase
Redis
Or you may find more tips on StackOverflow here: alternative to memcached that can persist to disk
I'd not suggest using Redis in a financial production environment, it's to unstable for it at this time. It is something to keep an eye out for in the future.
CouchBase is a solution.
You can also use repcached (http://repcached.lab.klab.org/) to replicate the memcache data to a 2nd memcache daemon in case the first one crashes, you still have all your memcache data. You can also use this type of setup to loadbalance your memcache usage if you want, i've been using it for quite some time now and are very happy with it's performance.
I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database that keeps track of the processing done on the server-side,
and a Slave program that queries the database to see what to do for keeping in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried setting up a MySQL database on a shared hosting server as where the DB is hosted
and made C++ programs for the master and slave that use CURL library to make request to a php file (ex.: www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second and some PHP code is executed to keep the database up to date. I did a test with a single slave program that calls the URL every second also and execute PHP code that queries the database.
With that setup however, my web hoster suspended my account and told me that I was 'using too much CPU resources' and I that would need to use a dedicated server (200$ per month rather than 10$) from their analysis of the CPU resources that were needed. And that was with one Master and only one Slave, so no more than 5-6 MySql queries per second. What would it be with 10 slaves then..?
Am I missing something?
Would there be a better setup than what I was planning to use in order to achieve the syncing mechanism that I need between two and more far apart programs?
I would use Google App Engine for storing the data. You can read about free quotas and pricing here.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is, what is the maximum acceptable time between sync's that is acceptable? If you truly need to have virtually realtime syncing happening between two databases on opposite sites of the world, then you will be using significant bandwidth and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to only sync once a minute? Once every 5 minutes?
Also, when running sync's like this in rapid succession, it is important to make sure you are not overlapping your syncs: Before a sync happens, test to see if a sync is already in process and has not finished yet. If a sync is still happening, then don't start another. If there is not a sync happening, then do one. This will prevent a lot of unnecessary overhead and sync's happening on top of eachother.
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host - use a VPS instead. You can get an unmanaged VPS with 512M for 10-20USD pcm depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling up updates inside a transaction? Let us say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for 20 statements, bundle them together in a single transaction that executes every two seconds. That would substantially reduce your CPU usage.
I own a community website of about 12.000 users (write heavy), 100 concurrent users max on a single VPS with 1Gb ram. The load rarely goes above 3 and response is quite good.
Currently a simple file cache is used to store DB query results to ease the load on the DB, but the website still can slow down over 220 concurrent users (load test).
How can I find out what the bottleneck is?
I assume that DB is fine as cache is working fine, however Disk IO could cause problem. Each pageload has about 10 includes and 10-20 querys from DB or from the file cache, plus lots of php processing.
I tried using memcache instead of the file cache, but to my suprise the load test seemed to like file cache more.
I plan to use Alternative PHP Cache, but I still don't really understand how that cache is invalidated. I have a singe index.php that handles all requests. Will the cache store the result for each individual request? Will it clear the cache automatically if one of my includes (or query result from cache) change?
Any other suggestions for finding bottlenecks (tried xdebug)?
Thanks,
Hamlet
I plan to use Alternative PHP Cache,
but I still don't really understand
how that cache is invalidated. I have
a singe index.php that handles all
requests. Will the cache store the
result for each individual request?
Will it clear the cache automatically
if one of my includes (or query result
from cache) change?
APC doesn't cache output. It caches your compiled bytecode.
Essentially, a normal PHP request looks like this:
PHP files are parsed and compiled to bytecode
The PHP interpreter executes the bytecode
APC caches the result of the first step, so you aren't reparsing/recompiling the same code over and over again. By default, it still stat()s your PHP files on every request, to see if the file has been modified since its cached copy was compiled -- so any changes to your code will automatically invalidate the cached copy.
You can also use APC much like you'd use memcached, for storing arbitrary user data. Keep in mind, however:
A memcached server can serve data to multiple servers; data cached in APC can only really be used locally. Better to serve a gig of data from one memcached box to four servers, than to have 4 copies of that gig of data in APC on each individual server.
Memcached, in my experience, is better at handling large numbers of concurrent writes to a single cache key.
APC doesn't seem to cope very well with its cache filling up. Fragmentation increases, and performance drops.
Also, beware: unless you've set up some sort of locking mechanism, your file-based cache is likely to become corrupt due to simultaneous writes. If you have implemented locking, that may become a bottleneck of its own. IMO, concurrency is tricky -- let memcached/APC/the database deal with it.
You mention you used XDebug - what weren't you able to do? Typically, to start tracking down a bottleneck you enable profiling of a request and then view the resulting "cachegrind" file in KCacheGrind or WinCacheGrind.
As for using a cache system, a dynamic script such as yours will generally do something like this
construct a cache "key" from the unique inputs to the script
ask the caching system if it has data for that key. If has, you're good to go!
otherwise, do all the hard work to generate the data, and ask the caching system to store it under the desired key for next time.
APC Cache can help to speed things up further by caching the parsed version of the PHP code.
MySQL has its own query cache.
You can enable it by setting query_cache_size to more than 0.
The query results are taken from the cache if the query is repeated verbatim and does not contain certain things like non-deterministic functions, session variables and some other things describe here:
The cache for a query is invalidated by issuing any DML operation against any of the underlying queries.
I turned on and configured APC on the test server and got a performance increase of about 400%
300 concurrent users with response time 1,4 secs max :) Good for a start.
Update:
Live server test results
Original:
No APC: 220 concurrent users, server load 20, response time 5000ms
No APC: 250 concurrent users, server load 20+, site is unavailable
New:
APC enabled: 250 concurrent users, server load 2, response time is 600ms
APC enabled: 350 concurrent users, server load 10, response time is 1500ms
APC enabled: 500 concurrent users, server load 20, response is 5000ms + site is fully operational, but a bit slow but can be used normally
Thanks for the suggestions, this is pretty great improvement.
Query cache is disabled as the site is write heavy thus cache would be invalidated constantly for whole tables.
I would say that it's likely that your database is IO bound, I don't know exactly what a "VPS" is, but if it's some kind of VM, then there is almost guaranteed to be very poorly performing IO.
Get it on to real hardware ASAP; and get a sensible amount of ram (1G is tiny; 16G sounds more reasonable).
Then you may be able to tune your db so it can behave properly. How big are your data in total? If you can get all of them (or most of them) to fit in your database cache (not the dodgy query cache, the proper innodb buffer pool one), then do so.
I'm assuming you're using the innodb engine; if so, then set up the buffer pool to be big enough for all your data - if you don't have enough ram, buy more until you do (No, really!).
Then your db queries should be fast even if they're fairly bad (yes).
The tricky bit is, if you have a single machine, how to carve up ram usage between mysql and PHP - the web server (I assume Apache), particularly if you use prefork and lots of MaxClients, can use up loads of ram and deprive your database of it.
Get some decent monitoring on the job (with trending), and make changes carefully and record exactly when you made them.