Should I be concerned about performance with connections to multiple databases?

I have a PHP app that is running on an Apache server with MySQL databases.
Based on the subdomain that users access, I am connecting them to a database (sub1.domain.com connects to database_sub1 and sub2.domain.com connects to database_sub2). Right now there are 10 subdomain-database combos, but that number could potentially grow to well over 100.
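For context, the routing logic is roughly this (a simplified sketch; the credentials and variable names are illustrative, not my exact code):

    <?php
    // Derive the database name from the subdomain:
    // sub1.domain.com -> database_sub1, sub2.domain.com -> database_sub2, ...
    $subdomain = explode('.', $_SERVER['HTTP_HOST'])[0];
    $link = mysql_connect('localhost', 'app_user', 'app_pass');
    mysql_select_db('database_' . $subdomain, $link);
    ?>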
So, is this a bad thing?
Considering my situation, is mysql_pconnect the way to go?
Thanks, and please let me know if more info would be helpful.
Josh

Is this an app you have written?
If so, from a maintenance standpoint this may turn into a nightmare.
What happens when you alter the program and need to change the database?
Unless you have a sweet migration tool to help you make changes to all your databases to the new schema you may find yourself in a world of hurt.
I realize you may be too far into this project now, but if a small additional relation were added to the schema to differentiate between the domains (companies/users), you could run them all off one database with little additional overhead.
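For example (table and column names are hypothetical), a single tenant key per table is usually all it takes:

    -- Tag every row with the company/user it belongs to
    ALTER TABLE orders ADD COLUMN company_id INT NOT NULL;
    ALTER TABLE orders ADD INDEX idx_company (company_id);

    -- Every query then filters on the tenant resolved from the subdomain
    SELECT * FROM orders WHERE company_id = 42 AND status = 'open';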
If performance really becomes a problem you can implement clustering or another elegant solution, but at least you won't have 100+ databases to maintain.

It partly depends on the rest of your configuration, but as long as each transaction involves only one connection, the database client code should perform about as you would expect - roughly the same as with a single database, but with more opportunities to improve the performance of the database servers, up to the limit of the network bandwidth.
If more than one connection participates in a transaction, then you probably need an XA-compliant transaction manager, and those usually carry a significant performance overhead.

No, it's not a bad thing.
It's rather a question of the total number of parallel connections. That is governed by max_connections in the MySQL settings (the default is 151 since MySQL 5.1.15) and limited by the capabilities of your platform (up to about 2048 on Windows, more on Linux), your hardware (RAM) and system settings (mainly the open-files limit). It can become a bottleneck if you have many parallel users; the number of databases is not important.
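To see where you stand, you can compare the configured ceiling against the peak you have actually hit (standard MySQL statements):

    SHOW VARIABLES LIKE 'max_connections';
    SHOW STATUS LIKE 'Max_used_connections';
    -- raise it at runtime if needed (or set it permanently in my.cnf):
    SET GLOBAL max_connections = 300;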
I made a script which connects to 400+ databases in one execution (one after another, not in parallel) and I found that MySQL + PHP handled it very well (no significant memory leaks, no big overhead). So I assume there will be no problem with your configuration.
And finally - mysql_pconnect is generally not a good thing in web development unless there is significant overhead in connecting to the database per se. You have to manage it really carefully to avoid problems with max_connections, locks, pending scripts etc. I think pconnect has limited use (e.g. a cron job run every second, or something like that).


Caching data to spare mysql queries

I have a PHP application that is executed up to one hundred times simultaneously, and very often (it's a Telegram anti-spam bot with 250k+ users).
The script itself makes various DB calls (ticker updates, counters etc.) but it also loads, on each execution, some more or less 'static' data from the database, like regexes or JSON config files.
My script also does image manipulation, so the server's CPU and RAM are sometimes under pressure.
A few days ago I ran into a problem: the kernel's OOM killer was killing the MySQL server process due to lack of available memory. The MySQL server was not restarting automatically, leaving my script broken for hours.
I have already made some code optimisations that let my server breathe, but what I'm looking for now is a caching method to store data between script executions, with the possibility of refreshing it on a time interval.
First I thought about a flat file where I could serialize data, but I would like to know whether that is a good idea performance-wise.
In my case, is there a benefit to caching data over making MySQL queries?
What are the pros/cons regarding speed of access and speed of execution?
Finally, what caching method should I implement?
I know that the simplest solution is to upgrade my server capacity, I plan to do so anytime soon.
Server is running Debian 11, PHP 8.0
Thank you.
If you could use a NoSQL store for those queries, it would speed things up dramatically.
If that is a no-go, you can go old school and keep that 'static' data in the filesystem.
You can then create a timer of your own that runs, for example, every 20 minutes to update the files.
When you ask about speed of access and speed of execution, the answer will always be 'it depends', but from what you have described it would be better to hit the filesystem than to be constantly querying the database for the same info...
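A minimal sketch of that old-school approach, assuming a hypothetical config table and cache path (the 20-minute TTL matches the timer idea above):

    <?php
    // Return the 'static' data from a local cache file if it is fresh,
    // otherwise rebuild it from MySQL and rewrite the cache.
    function getStaticConfig(PDO $db, string $cacheFile = '/tmp/bot_config.cache', int $ttl = 1200): array
    {
        if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
            return unserialize(file_get_contents($cacheFile));
        }
        // Hypothetical table holding regexes / JSON config as name-value pairs
        $rows = $db->query('SELECT name, value FROM config')->fetchAll(PDO::FETCH_KEY_PAIR);
        // Write to a temp file and rename: rename() is atomic, so a concurrent
        // script never reads a half-written cache.
        $tmp = $cacheFile . '.' . getmypid();
        file_put_contents($tmp, serialize($rows));
        rename($tmp, $cacheFile);
        return $rows;
    }

With a hundred concurrent executions, the atomic rename matters more than the serialization format.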
The complexity, consistency issues, etc. lead me to recommend against a caching layer. Instead, let's work a bit more on other optimizations.
OOM implies that something is tuned improperly. Show us what you have in my.cnf. How much RAM do you have? How much RAM does the image processing take? (PHP's image* library is something of a memory hog.) We need to start by knowing how much RAM MySQL can have.
For tuning, please provide GLOBAL STATUS and VARIABLES. See http://mysql.rjweb.org/doc.php/mysql_analysis
That link also shows how to gather the slowlog. In it we should be able to find the "worst" queries and work on optimizing them. For "one hundred times simultaneously", even fast queries need to be further optimized. When providing the 'worst' queries, please provide SHOW CREATE TABLE.
Another technique is to decrease the number of children that Apache is allowed to run. Apache will queue up others. "Hundreds" is too many for Apache or MySQL; it is better to wait to start some of them rather than having "hundreds" stumbling over each other.
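With the prefork MPM that is a couple of directives (the numbers here are placeholders; size them to your RAM divided by the per-child footprint):

    # /etc/apache2/mods-available/mpm_prefork.conf
    # Cap concurrent children so requests queue in Apache instead of
    # "hundreds" of PHP processes fighting MySQL for memory.
    <IfModule mpm_prefork_module>
        ServerLimit          40
        MaxRequestWorkers    40
    </IfModule>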

Is there a way to figure out why my production mysql is so slow?

I have a PHP file which parses a text file and writes the data to a MySQL table. The file is quite big, with over 6 million lines. I did this on my home computer, and it took about six hours for the whole process. Now I'm trying to do the exact same thing on my beefed-up dedicated server (32GB RAM), and 12 hours later it has barely gotten through 10% of the records.
I don't know if it's connected, but I also imported a large SQL file through phpMyAdmin several days ago, and I thought it took much longer than it should have.
What could be the problem?
TIA!
Unless you do profiling and stuff like EXPLAIN queries, it's hard to say.
There are some possibilities that may be worth investigating though:
Lots of indexes: If you're doing INSERTs, then every index associated with the table you're inserting into will need to be updated. If there are a lot of indexes, then a single insert can trigger a lot of writes. You can solve this by dropping the indexes before you start and recreating them afterwards.
MyISAM versus InnoDB: The former tends to be faster as it sacrifices features for speed. Writing to an InnoDB table tends to be slower. NOTE: I'm merely pointing out that this is a potential cause of an application running slower, I'm not recommending that you change an InnoDB table to MyISAM!
No transaction: If using InnoDB, you can speed up bulk operations by doing them inside a transaction (see the sketch after this list). If you're not using a transaction, then there's an implicit transaction around every INSERT you do.
Connection between the PHP machine and the SQL server: In testing you were probably running both PHP and the SQL server on the same box. You may have been connecting through a named pipe or over a TCP/IP connection (which has more overhead), but in either case the bandwidth is effectively unlimited. If the SQL server isn't the same machine as the one running the PHP script then it will be restricted to whatever bandwidth exists in the connection between the two.
Concurrent users: You were the only user of your test SQL database at any given time. The live system will have any number of additional users connected and running queries at any given moment, and that takes time away from your script, adding to its run time. You should run big SQL jobs at night, both so you don't inconvenience other users and so they can't take performance away from you.
There are other reasons too, but the ones above are worth investigating first.
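To illustrate the transaction point, here is a minimal PDO sketch (the table, columns and $rows variable are made up) that pays one commit per batch instead of one per row:

    <?php
    $pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO records (col_a, col_b) VALUES (?, ?)');

    $pdo->beginTransaction();
    $i = 0;
    foreach ($rows as $row) {          // $rows: the lines parsed from the big file
        $stmt->execute([$row['a'], $row['b']]);
        if (++$i % 10000 === 0) {      // commit in chunks to bound the transaction size
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();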
Of course the problem may be on the PHP side, you can't be sure that it's on the database until you investigate exactly where it's slowing down and why.
Check whether the PHP memory_limit setting or the MySQL buffer settings are lower on the server than on your local machine.
Well, I ended up implementing all the changes to the DB settings as advised here: http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
And now the DB is roaring along! I'm not sure exactly which setting made the difference, but it's working now, and that's the main thing! In any case, all of you gave me great advice which I'll be following up on, so thanks!

Performance testing my PHP / MySQL Application for AWS

I am writing a PHP application which uses MySQL in the backend. I am expecting about 800 users a second to be hitting our servers, with requests coming from an iOS app.
The application is spread out over about 8 different PHP scripts which do very simple SELECT queries (occasionally with 1 join) and simple INSERT queries where I'm only inserting one row at a time (with less than 10kb of data per row on average). There's about a 50/50 split between SELECTs and INSERTs.
The plan is to use Amazon Web Services, hosting the application on EC2 instances to spread the CPU load and using RDS (with MySQL) to handle the database, but I'm aware RDS doesn't scale out, only up. So, before committing to an AWS solution, I need to benchmark my application on our development server (not a million miles off the medium RDS spec) to see roughly how many requests a second my application and MySQL can handle (for ballpark figures), before doing an actual benchmark on AWS itself.
I believe I only really need to performance test the queries within the PHP, as EC2 should handle the CPU load, but I do need to see if / how RDS (MySQL) copes under that many users.
Any advice on how to handle this situation would be appreciated.
Thank-you in advance!
Have you considered using Apache Benchmark (ab)? It should do the job here. I've also heard good things about Siege but haven't tested it yet.
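For example (URL and numbers are placeholders), 10,000 requests at a concurrency of 100 against one of your endpoints:

    ab -n 10000 -c 100 http://dev.example.com/api/endpoint.php

Watch the 'Requests per second' figure and the percentile table in the output to see where latency starts to climb.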
If you have 800 user hits per second, it could be a good idea to consider sharding from the beginning. Designing and implementing sharding right at the start will allow you to begin with a small number of hosts and then scale out more easily later. If you design for only one server, even if it handles the load for now, pretty soon you will need to scale, and switching to a sharding architecture once the application is already in production is much more complex.
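A sketch of the idea (the hosts and helper are illustrative): derive the shard deterministically from the user id, so a given user's rows always land on the same host:

    <?php
    // Hypothetical shard map: user_id -> one of N MySQL hosts
    $shards = ['db1.internal', 'db2.internal', 'db3.internal', 'db4.internal'];

    function shardFor(int $userId, array $shards): string
    {
        return $shards[$userId % count($shards)];
    }

    $pdo = new PDO('mysql:host=' . shardFor($userId, $shards) . ';dbname=app', 'user', 'pass');

Keeping the choice inside one helper means you can later swap the modulo for consistent hashing without touching the call sites.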

Keeping two distant programs in-sync using MySql

I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database that keeps track of the processing done on the server-side,
and a Slave program that queries the database to see what to do in order to stay in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried setting up a MySQL database on a shared hosting server to host the DB,
and made C++ programs for the master and slave that use the cURL library to make requests to a PHP file (e.g. www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second and some PHP code is executed to keep the database up to date. I did a test with a single slave program that also calls the URL every second and executes PHP code that queries the database.
With that setup, however, my web host suspended my account and told me that I was 'using too much CPU resources' and that I would need a dedicated server ($200 per month rather than $10), based on their analysis of the CPU resources needed. And that was with one master and only one slave, so no more than 5-6 MySQL queries per second. What would it be with 10 slaves, then?
Am I missing something?
Would there be a better setup than what I was planning to use in order to achieve the syncing mechanism that I need between two and more far apart programs?
I would use Google App Engine for storing the data. You can read about free quotas and pricing here.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is: what is the maximum acceptable time between syncs? If you truly need virtually realtime syncing between two databases on opposite sides of the world, then you will be using significant bandwidth and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to sync only once a minute? Once every 5 minutes?
Also, when running syncs in rapid succession like this, it is important to make sure they don't overlap: before a sync starts, test whether one is already in progress and has not finished yet. If a sync is still happening, don't start another; if not, go ahead. This prevents a lot of unnecessary overhead and syncs piling up on top of each other.
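A simple way to enforce that (the lock path and runSync() are illustrative) is an exclusive, non-blocking lock around the whole sync:

    <?php
    // Skip this run entirely if a previous sync still holds the lock.
    $fp = fopen('/tmp/sync.lock', 'c');
    if (!flock($fp, LOCK_EX | LOCK_NB)) {
        exit(0);       // another sync is in progress
    }
    runSync();         // your actual sync routine (hypothetical)
    flock($fp, LOCK_UN);
    fclose($fp);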
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host - use a VPS instead. You can get an unmanaged VPS with 512MB of RAM for 10-20 USD per month depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling up updates inside a transaction? Let's say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for each of those 20 statements, bundle them together in a single transaction that executes every two seconds. That would substantially reduce your CPU usage.

Keep MySQL DBs sync'ed with minimum latency from PHP

I have searched SO and read many questions but didn't find any that really answers my question, first a little background info:
A PHP script receives data (gaming site)
Several DB servers are available for redundancy and performance
I am well aware of MySQL Replication and Cluster but here are my problems with those solutions:
1) With replication, if the master fails, the entire grid fails or long downtimes are suffered
2) With Cluster, I first thought that in order to add another node one must also suffer downtime, but reading the documentation again I'm not so sure anymore
Q1: Can someone please clarify whether the "rolling restart" actually means downtime for any application connecting to the grid?
Since I was under the impression that downtime was inevitable, it seemed to me that a third application would solve this problem:
PHP connects to the third app; the third app inserts/updates/deletes into one database so it can quickly return last_insert_id; PHP continues its process while the third app carries on inserting/updating/deleting into the other data nodes. In this scenario the DBs are not replicated or clustered; they are standalone DB servers, and the third app is a daemon.
Q2: Does anybody know of such an app?
In the above scenario, SELECTs from the PHP end would randomly choose a DB server (to load balance)
Thank you for your time and wisdom
A rolling restart basically walks through a series of nodes and restarts them one by one. It makes sure no users are logged on to a node before restarting it, then restarts it, moves on to the next node, and so forth. So yes, each server will be restarted, but in sequence: in a cluster of n nodes, each node restarts one at a time, which removes the downtime or at least limits it.
I would suggest integrating your PHP script with a NoSQL database; you can set up clusters for those and will have almost no latency. If you still want a synced MySQL database, you can also set up the NoSQL store as a master and replicate to a MySQL slave; that too is possible.
Lots of questions here.
There is no implicit functionality within master-slave replication for promoting a slave in the event that a master fails. But scripting it yourself is trivial.
For master-master replication that is just not an issue - OTOH, running with lots of nodes, a failure can increase divergence between the datasets.
A lot of the functionality described for your third app is implemented by MySQL Proxy - although there's nothing to stop you building the functionality into your own DB abstraction layer (you can hand off processing via an asynchronous message/HTTP call or in a shutdown function).
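As a rough sketch of the shutdown-function route (the DSNs, table and values are made up; this is not a substitute for real replication): write to the primary, return last_insert_id immediately, and fan the write out after the response:

    <?php
    $primary = new PDO('mysql:host=db-primary;dbname=app', 'user', 'pass');
    $primary->exec("INSERT INTO scores (player, points) VALUES ('bob', 10)");
    $id = $primary->lastInsertId();    // hand this back to the caller right away

    // Replay the write on the other standalone nodes after the script finishes.
    register_shutdown_function(function () use ($id) {
        foreach (['db-node2', 'db-node3'] as $host) {
            $node = new PDO("mysql:host=$host;dbname=app", 'user', 'pass');
            $node->exec("INSERT INTO scores (id, player, points) VALUES ($id, 'bob', 10)");
        }
    });

Under PHP-FPM you can call fastcgi_finish_request() first so the client is not kept waiting; note there is no retry here, so a node that is down will silently diverge.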
