I have a PHP application that is executed up to one hundred times simultaneously, and very often. (It's a Telegram anti-spam bot with 250k+ users.)
The script itself makes various DB calls (ticker updates, counters, etc.), but on every run it also loads some more or less 'static' data from the database, such as regexes or JSON config files.
My script is also doing image manipulation, so the server's CPU and RAM are sometimes under pressure.
A few days ago I ran into a problem: the OOM killer was killing the MySQL server process due to a lack of available memory. The MySQL server was not restarting automatically, leaving my script broken for hours.
I have already made some code optimisations that let my server breathe, but what I'm looking for now is a caching method to store data between script executions, with the possibility of refreshing it at a fixed time interval.
At first I thought about a flat file holding serialized data, but I would like to know whether that is a good idea performance-wise.
In my case, is there a benefit to caching data rather than running MySQL queries?
What are the pros and cons regarding speed of access and speed of execution?
Finally, what caching method should I implement?
I know that the simplest solution is to upgrade my server capacity, I plan to do so anytime soon.
Server is running Debian 11, PHP 8.0
Thank you.
If you could serve those queries from a NoSQL store, it would likely speed things up considerably.
If that is a no-go, you can go old school and keep that "static" data in the filesystem.
You can then create a timer of your own that runs, for example, every 20 minutes to update the files.
When you ask about speed of access and speed of execution, the answer will always be "it depends", but from what you said it would be better to access the file system than to be constantly querying the database for the same info.
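As a rough illustration of the file-based approach, here is a minimal sketch of a cache helper with a time-based refresh. The function name cached(), the file path, and the sample regexes are made up for the example; the closure stands in for whatever MySQL query currently loads the data. The 1200-second TTL mirrors the 20-minute interval mentioned above.

```php
<?php
declare(strict_types=1);

/**
 * Return the data stored in $file if it is younger than $ttl seconds;
 * otherwise call $loader (e.g. the MySQL query), store its result and return it.
 */
function cached(string $file, int $ttl, callable $loader): mixed
{
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = json_decode((string) file_get_contents($file), true);
        if (json_last_error() === JSON_ERROR_NONE) {
            return $data; // fresh, readable cache: no database access needed
        }
    }

    $data = $loader();

    // Write to a temporary file first, then rename it into place.
    // rename() is atomic on the same filesystem, so concurrent readers
    // never see a half-written cache file.
    $tmp = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, json_encode($data));
    rename($tmp, $file);

    return $data;
}

// Usage: hypothetical data; in the real script the closure would query MySQL.
$regexes = cached(sys_get_temp_dir() . '/bot_regexes.json', 1200, function (): array {
    return ['/free\s+crypto/i', '/t\.me\/joinchat/i'];
});
```

With this shape no separate timer is needed: the first request after the TTL expires refreshes the file. Under heavy concurrency several requests may refresh at once, which is harmless here but does cost a few extra queries.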
The complexity, consistency issues, etc., lead me to recommend against a caching layer. Instead, let's work a bit more on other optimizations.
OOM implies that something is tuned improperly. Show us what you have in my.cnf. How much RAM do you have? How much RAM does the image processing take? (PHP's image* library is something of a memory hog.) We need to start by knowing how much RAM MySQL can have.
For tuning, please provide GLOBAL STATUS and VARIABLES. See http://mysql.rjweb.org/doc.php/mysql_analysis
That link also shows how to gather the slowlog. In it we should be able to find the "worst" queries and work on optimizing them. For "one hundred times simultaneously", even fast queries need to be further optimized. When providing the 'worst' queries, please provide SHOW CREATE TABLE.
Another technique is to decrease the number of children that Apache is allowed to run. Apache will queue up others. "Hundreds" is too many for Apache or MySQL; it is better to wait to start some of them rather than having "hundreds" stumbling over each other.
Lately my site has been getting about 2.5 million hits per day (on average). I record hits to each and every page (it's an adult site), so I'm able to have a Top 10 sort of thing that shows top Websites, Models, Galleries and Images. I record the hit, as well as the user's IP, so those individual sections only get incremented once per user every 24 hours. The problem with this is that it updates the MySQL database on every hit. So of course, my site has started getting 504 errors.
I looked around and saw that memcached might be a solution. Store hits in memory and push to the database every X mins. I also saw some people suggest using MongoDB, which to my understanding is also a memory type storage. Would this be the way to go? Would you recommend memcached or MongoDB for what I'm trying to do? Or is this not the way to proceed because it just means more mysql calls in a shorter time frame (1 huge batch, say, every minute would mean 60 seconds worth of hits versus smaller batches every second).
I have both memcached and MongoDB installed on my server, so either is an option.
There may be much easier ways to get better database performance without new software packages. The volumes you mention are not particularly large.
I'll list just a few of many possibilities.
1. If you are on a version of MySQL older than 5.6, then upgrading to 5.6+ will almost certainly yield a very significant improvement, because the storage engine is much better in 5.6 and above.
2. If the busiest tables use a storage engine other than InnoDB, then switch to InnoDB. [You can do this with phpMyAdmin.]
3. Get some help tuning buffer sizes in my.ini [it takes some skill] and/or increase the RAM on the database server(s).
4. Consider spreading the workload across more drives and/or switching part or all of the database to solid state drives [or better conventional drives].
5. If the database server(s) is/are memory or compute bound, then bigger or more servers may be needed.
6. Make sure the bottleneck is not external to the database server(s).
The way this site (stackoverflow.com) implements it is by maintaining an in-memory data structure of question views which gets flushed to the DB every 15 minutes or so. There is no need to stress the DB by saving each hit; that is too much IO. This in-memory structure could live within your application as a map of IP and hits/time, or it could be in memcached. I don't think you really need memcached for this purpose.
So your general idea of doing batch updates is a good one.
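To make the batching idea concrete, here is a minimal sketch of the flush step: buffered hits are collapsed into one count per page and written with a single multi-row upsert instead of one UPDATE per hit. The table and column names (page_hits, page_id, hits) are hypothetical, page_id is assumed to be a unique key, and where the buffer lives (memcached, application memory, a file) is left open. The once-per-IP-per-24-hours rule is not handled here.

```php
<?php
declare(strict_types=1);

/**
 * Collapse a list of buffered hits (one page ID per hit) into
 * an array of page ID => number of hits.
 */
function aggregateHits(array $hits): array
{
    return array_count_values($hits);
}

/**
 * Build one multi-row MySQL upsert for all aggregated counts.
 * Returns [sql, params] ready for a prepared statement.
 * Table and column names are hypothetical.
 */
function buildFlushQuery(array $counts): array
{
    if ($counts === []) {
        throw new InvalidArgumentException('Nothing to flush');
    }

    $placeholders = implode(', ', array_fill(0, count($counts), '(?, ?)'));
    $sql = "INSERT INTO page_hits (page_id, hits) VALUES $placeholders "
         . "ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)";

    $params = [];
    foreach ($counts as $pageId => $n) {
        $params[] = $pageId;
        $params[] = $n;
    }

    return [$sql, $params];
}

// Six buffered hits become three rows in a single statement.
$buffer = [42, 7, 42, 42, 7, 99];
[$sql, $params] = buildFlushQuery(aggregateHits($buffer));

// With an open PDO connection to MySQL (assumed, not shown):
// $pdo->prepare($sql)->execute($params);
```

The database then sees one write per flush interval rather than one per page view, which is the point of the batching approach described above.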
I have a PHP file which parses a txt file and writes the data to a MySQL table. The file is quite big, with over 6 million lines. I did this on my home computer, and it took about six hours for the whole process. Now I'm trying to do the exact same thing on my beefed-up dedicated server (32GB RAM), and 12 hours later, it has barely got through 10% of the records.
I don't know if it's connected, but I also imported a large sql file through phpmyadmin several days ago, and I thought it took much longer than it should.
What could be the problem?
TIA!
Unless you do profiling and stuff like EXPLAIN queries, it's hard to say.
There are some possibilities that may be worth investigating though:
Lots of indexes: If you're doing INSERTs, then every index associated with the table you're inserting into will need to be updated. If there are a lot of indexes, then a single insert can trigger a lot of writes. You can solve this by dropping the indexes before you start and reinstating them afterward.
MyISAM versus InnoDB: The former tends to be faster as it sacrifices features for speed. Writing to an InnoDB table tends to be slower. NOTE: I'm merely pointing out that this is a potential cause of an application running slower, I'm not recommending that you change an InnoDB table to MyISAM!
No transaction: If using InnoDB, you can speed up bulk operations by doing them inside a transaction. If you're not using a transaction, then there's an implicit transaction around every INSERT you do.
Connection between the PHP machine and the SQL server: In testing you were probably running both PHP and the SQL server on the same box. You may have been connecting through a named pipe or over a TCP/IP connection (which has more overhead), but in either case the bandwidth is effectively unlimited. If the SQL server isn't the same machine as the one running the PHP script then it will be restricted to whatever bandwidth exists in the connection between the two.
Concurrent users: You were the only user at any given time of your test SQL database. The live system may and will have any number of additional users connected and running queries at a given time. That's going to take time away from your script, adding to its run time. You should run big SQL jobs at night so as not to inconvenience other users, but also so they can't take performance away from you too.
There are other reasons too, but the ones above are worth investigating first.
Of course the problem may be on the PHP side, you can't be sure that it's on the database until you investigate exactly where it's slowing down and why.
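For the "no transaction" point above, a minimal sketch of a batched import with PDO could look like this. The table and column names are hypothetical, and the in-memory SQLite database is used only so the example runs without a server (it assumes the pdo_sqlite extension is available); against MySQL you would construct the PDO object with a mysql: DSN instead.

```php
<?php
declare(strict_types=1);

/**
 * Insert rows inside explicit transactions, committing once per batch
 * instead of once per row. Returns the number of rows inserted.
 * Table and column names are hypothetical.
 */
function bulkInsert(PDO $pdo, iterable $rows, int $batchSize = 1000): int
{
    $stmt = $pdo->prepare('INSERT INTO records (name, value) VALUES (?, ?)');
    $count = 0;

    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            $stmt->execute($row);
            // Commit every $batchSize rows so a single transaction never grows unbounded.
            if (++$count % $batchSize === 0) {
                $pdo->commit();
                $pdo->beginTransaction();
            }
        }
        $pdo->commit();
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack(); // discard the partially written batch
        }
        throw $e;
    }

    return $count;
}

// Stand-in database so the sketch is runnable as-is.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE records (name TEXT, value INTEGER)');

$inserted = bulkInsert($pdo, [['a', 1], ['b', 2], ['c', 3]], 2);
```

Because $rows is an iterable, a generator that reads the text file line by line can be passed in, so the 6 million lines never have to sit in memory at once.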
Check whether PHP's memory_limit setting or the MySQL buffer settings are lower on the server than on your local machine.
Well, I ended up implementing all the changes to the DB settings as advised here: http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
And now the DB is roaring along! I'm not sure exactly which setting made the difference, but it's working now, so that's the main thing! In any case, all of you also gave me great advice which I'll be following up on, so thanks!
I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database that keeps track of the processing done on the server-side,
and a Slave program that queries the database to see what to do for keeping in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried hosting the MySQL database on a shared hosting server
and made C++ programs for the master and slave that use the CURL library to make requests to a PHP file (e.g. www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second and some PHP code is executed to keep the database up to date. I did a test with a single slave program that also calls the URL every second and executes PHP code that queries the database.
With that setup, however, my web host suspended my account and told me that I was 'using too much CPU resources' and that, based on their analysis of the CPU resources needed, I would need a dedicated server (200$ per month rather than 10$). And that was with one Master and only one Slave, so no more than 5-6 MySQL queries per second. What would it be with 10 slaves, then?
Am I missing something?
Would there be a better setup than what I was planning to use in order to achieve the syncing mechanism that I need between two and more far apart programs?
I would use Google App Engine for storing the data. You can read about free quotas and pricing here.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is: what is the maximum acceptable time between syncs? If you truly need virtually real-time syncing between two databases on opposite sides of the world, then you will be using significant bandwidth and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to only sync once a minute? Once every 5 minutes?
Also, when running syncs like this in rapid succession, it is important to make sure they do not overlap: before a sync happens, test whether a sync is already in progress and has not finished yet. If a sync is still running, don't start another. If no sync is running, start one. This prevents a lot of unnecessary overhead and syncs happening on top of each other.
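One common way to implement that "is a sync already running?" check in PHP is a non-blocking file lock. The sketch below is only an illustration under that assumption: the function name runExclusive() and the lock file path are made up, and the empty closure stands in for the real sync code.

```php
<?php
declare(strict_types=1);

/**
 * Run $job only if no other process currently holds the lock.
 * Returns true if the job ran, false if a previous run is still in progress.
 */
function runExclusive(string $lockFile, callable $job): bool
{
    // Mode 'c' creates the file if needed without truncating it.
    $fh = fopen($lockFile, 'c');
    if ($fh === false) {
        throw new RuntimeException("Cannot open lock file $lockFile");
    }

    // LOCK_NB: fail immediately instead of waiting for the current holder.
    if (!flock($fh, LOCK_EX | LOCK_NB)) {
        fclose($fh);
        return false;
    }

    try {
        $job();
    } finally {
        // Always release the lock, even if the job throws.
        flock($fh, LOCK_UN);
        fclose($fh);
    }

    return true;
}

$ran = runExclusive(sys_get_temp_dir() . '/sync.lock', function (): void {
    // the actual sync work would go here
});
```

If the script crashes, the operating system releases the lock when the process exits, so a stale lock does not block later runs the way a hand-written "in progress" flag in the database could.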
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host - use a VPS instead. You can get an unmanaged VPS with 512M for 10-20USD pcm depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling up updates inside a transaction? Let us say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for each of the 20 statements, bundle them together in a single transaction that executes every two seconds. That should substantially reduce your CPU usage.
Is there a standard solution for scaling up a website that runs on PHP + Apache web server?
I currently get about 100,000 requests/day. Six months down the line I expect that to grow to 200,000 requests/day. The first solution that comes to mind is deploying more Apache web servers with mod_php, but something seems so wrong about it.
Any ideas?
Try these two options first before adding new servers. They may allow you to stick with one server, but your results may vary.
To speed the site up when you are hit with many concurrent users, look into installing the APC PECL extension (http://us2.php.net/manual/en/book.apc.php). APC caches the compiled version of your scripts, saving the parse-and-compile step each time a script is executed. (On PHP 5.5 and later, the bundled OPcache extension fills this role.)
Also, if you are experiencing heavy load on the database server, look into installing memcached and caching database results for a certain time period, if possible (http://us2.php.net/manual/en/book.memcache.php).
Finally, if you do decide to get a separate server, look into possibly getting a dedicated SQL box. This, of course, assumes that your application is a database heavy application, as web apps are these days. Segregating SQL into a separate box allows it to take advantage of all of the resources on that box, with more cache and processing power. It could be the way to go.
I don't have any experience with scaling really large websites, but I don't think you'll need to scale to different servers in this case. I have a browser game with 40.000-60.000 requests per day, some cronjobs doing a lot of stuff every 5 minutes, and a TeamSpeak server, all on a small server (40 $ / month), and I haven't had any performance problems so far.
20.000 requests / day is only one every fifth second, sounds like one box should be able to deal with that just fine? If not I'd first have a look at bottlenecks in your code. Redundant database calls? Double-looping database calls rather than simple joins? Are you caching anything?
How to scale after this is totally dependent on your application, how/where do you keep session state and so forth, general advice has limited applicability.
if you like it then you should have put a cache on it
That question may appear strange.
But every time I made PHP projects in the past, I encountered this sort of bad experience:
Scripts stop running after 10 seconds. This results in very bad database inconsistencies (bad example of a deleting loop: a user is about to delete a photo album. The album object gets deleted from the database, and then, halfway through deleting the photos, the script gets killed right where it is, and 10.000 photos are left with no reference).
It's not transaction-safe. I've never found a way to do something securely, to ensure it gets done. If the script gets killed, it gets killed. Right in the middle of a loop. It just gets killed. That never happened on Tomcat with Java. Java runs and runs and runs if it takes long.
Lots of newsletter scripts try to get around that problem by splitting the job into a lot of packages, i.e. sending 100 at a time, then reloading the page (oh man, really stupid), doing the next one, and so on. Most often something hangs or the script takes longer than 10 seconds, and your platform is crippled.
But then, I hear that very big projects use PHP like studivz (the german facebook clone, actually the biggest german website). So there is a tiny light of hope that this bad behavior just comes from unprofessional hosting companies who just kill php scripts because their servers are so bad. What's the truth about this? Can it be configured in such a way, that scripts never get killed because they take a little longer?
Is PHP suitable for very large projects?
Whenever I see a question like that, I get a bit uneasy. What does very large mean? What may be large to you may be small to me, or vice versa. And that is even assuming that we use the same metric. Are you measuring time to build the project, complete life-cycle of the project, money involved, number of people using it, number of developers to build/maintain it, etc.?
That said, the problems you're describing sound like you don't know your technology well enough. That would be a problem for you regardless of which technology you picked. For example, use database transactions to ensure atomicity. And use asynchronous offline jobs to process long-running tasks (such as dispatching a mailing list).
A lot of the bad behaviour is covered in good frameworks like the Zend Framework.
Anything that takes longer than 10 seconds is really messed up, but you can always raise the execution time with http://de3.php.net/set_time_limit
A lot of big sites are written in PHP: Facebook, Wikipedia, StudiVZ, Digg.com, etc. A lot of the things you are talking about are just configuration issues; maybe you should look into that?
Are you looking for set_time_limit() and ignore_user_abort()?
Performance is not a feature you can just throw in after most of the site is done.
You have to design the site for heavy load.
If a database task normally involves 10K rows, you should be prepared not just for execution time issues, but for other maintenance questions as well.
Worst case: make a consistency tool to check and fix those errors.
Better: instead of physically deleting the images, just flag them and let background services take care of the expensive maneuvers.
Best: you can utilize a job queue service and add this job to the queue.
If you do need to do transactions in PHP, you can just do the following (the old mysql_query() API was removed in PHP 7, so this uses mysqli and assumes $mysqli is an open connection):
$mysqli->begin_transaction();
// do your queries here
$mysqli->commit();
The commit call completes the transaction.
If any errors occur, you can roll back with:
$mysqli->rollback();
Edit: Note that this only works if the tables use a storage engine that supports transactions, such as InnoDB.
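Applied to the photo album example from the question, a minimal sketch could look like this. It uses PDO rather than mysqli, the table names (albums, photos) are hypothetical, and the in-memory SQLite database is there only so the example runs without a server (it assumes pdo_sqlite is installed); with MySQL the tables would need to be InnoDB.

```php
<?php
declare(strict_types=1);

/**
 * Delete an album and its photos atomically: either both deletes are
 * committed or neither is. Table names are hypothetical.
 */
function deleteAlbum(PDO $pdo, int $albumId): void
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('DELETE FROM photos WHERE album_id = ?')->execute([$albumId]);
        $pdo->prepare('DELETE FROM albums WHERE id = ?')->execute([$albumId]);
        $pdo->commit();
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack(); // undo the photo deletes as well
        }
        throw $e;
    }
}

// Stand-in database with two albums and three photos.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE albums (id INTEGER PRIMARY KEY)');
$pdo->exec('CREATE TABLE photos (id INTEGER PRIMARY KEY, album_id INTEGER)');
$pdo->exec('INSERT INTO albums (id) VALUES (1), (2)');
$pdo->exec('INSERT INTO photos (album_id) VALUES (1), (1), (2)');

deleteAlbum($pdo, 1);
```

If the script is killed between the two deletes, the connection drops without a commit and the database discards the uncommitted work, so no orphaned photos are left behind.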
You can configure how much time is allowed for executing a script, either in the php.ini setting or via ini_set/set_time_limit
Instead of studivz (the German Facebook clone), you could look at the actual Facebook which is entirely PHP. Or Digg. Or many Yahoo sites. Or many, many others.
ignore_user_abort is probably what you're looking for, but you could also add another layer in terms of scheduled maintenance jobs. They basically run on a specified interval and do various things to make sure your data/filesystem are in a state that you want... deleting old/unlinked files is just one of many things you can do.
For large loops like deleting photo albums or sending thousands of emails, you're looking for ignore_user_abort and set_time_limit.
Something like this:
ignore_user_abort(true); // a user leaving the web page will not kill the script
set_time_limit(0); // the script can take as long as it wants
for ($i = 0; $i < 10000; $i++) {
    costly_very_important_operation();
}
Be careful, however, because this could potentially run the script forever:
ignore_user_abort(true); // a user leaving the web page will not kill the script
set_time_limit(0); // the script can take as long as it wants
while (true) {
    do_something();
}
That script will never die unless you restart your server.
Therefore it is best never to set the time limit to 0.
Technically no programming language is transaction safe, it's the database that needs to be transaction safe. So if the script/code running dies or disconnects, for whatever reason, the transaction will be rolled back.
Putting queries in a loop is a very bad idea unless the loop is specifically designed to run in batches, breaking a much larger set into smaller pieces. Adjusting PHP timers and limits is generally a stopgap solution; you are still dependent on the client browser if you use the web to kick off a script.
If I have a long process that needs to be kicked off by a browser, I "disconnect" the process from the browser and web server so control is returned to the user while the script runs. PHP scripts run from the command line can run for hours if you want. You can then use AJAX, or reload the page, to check on the progress of the long running script.
There are security concerns with this code, but to "disconnect" a process from PHP running under something like Apache:
exec("nohup /usr/bin/php -f /path/to/script.php > /dev/null 2>&1 &");
But that really has nothing to do with PHP being suitable for large projects or being transaction safe. PHP can be used for large projects, but since by default there is no code that remains "resident" between hits, it can get slow if not designed right. Also, since PHP versions before 5.3 have no namespace support, you want to plan ahead if you have a large development team.
It's fine for a Java based system to take a few minutes to startup, initialize and load all the default objects. But this is unacceptable with PHP. PHP will take more planning for larger systems. The question is, when does the time saved in using PHP get wasted by the additional planning time required for a large system?
The most likely reason you experienced database inconsistencies in the past is that you were using the MyISAM engine for MySQL (which DOES NOT support transactions). Use InnoDB instead; it supports transactions and performs row-level locking.
Or use postgreSQL.
Many, many sites are made in PHP. However, you will not hear about the millions of web pages made in PHP that no longer exist because they were abandoned. Those sites may have burned all of the company's money dealing with PHP mess, or maybe they went bankrupt because their software was so crappy that customers did not want it… PHP seems good at the start, but it does not scale very well. Yes, there are many huge websites made in PHP, but they are the exception rather than the norm.