PHP / PDO with a MySQL Cluster

I have been asked to redevelop an old PHP web app which currently uses the mysql_query functions to access a replicated database (4 slaves, 1 master).
Part of this redevelopment will move some of the database into a MySQL Cluster. I usually use PDO to access databases these days, and I am trying to find out whether or not PDO will play nicely with a cluster, but I can't find much useful information on the web.
Does anyone have any experience with this? I have never worked with a cluster before ...

I've done this a couple of different ways with different levels of success. The short answer is that your PDO connections should work fine. The options, as I see them, are as follows:
If you are using replication, then either write a class that handles connections to the various servers or use a proxy. The proxy may be hardware or software. MySQL Proxy (http://docs.oracle.com/cd/E17952_01/refman-5.5-en/mysql-proxy.html) is the software load balancer I used to use, and for the most part it did the trick. It automatically routes traffic between your readers and writers, and handles failover like a champ. Every now and then we'd write a query that would throw it off and have to tweak things, but that was years ago. It may be in better shape now.
Another option is to use a standard load balancer and create two connections - one for the writer and the other for the readers. Your app can decide which connection to use based on the function it's trying to perform.
Finally, you could consider using the MaxDB cluster available from MySQL. In this setup, the MySQL servers are all readers AND writers. You only need one connection, but you'll need a load balancer to route all of the traffic. A MaxDB cluster can be tricky if the indexes become out of sync, so tread lightly if you go with this option.
Clarification: When I refer to connections what I mean is an address and port to connect to MySQL on - not to be confused with concurrent connections running on the same port.
Good luck!
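To make the two-connection option concrete, here is a minimal PDO sketch; the hostnames, database name, and credentials are placeholders you would replace with your own load-balancer and master endpoints:

```php
<?php
// Hypothetical endpoints: the readers sit behind a load balancer,
// writes go straight to the master.
function dsnFor(bool $write): string
{
    $host = $write ? 'master.db.example.com'   // writer: the master
                   : 'read-lb.db.example.com'; // readers: load-balanced pool
    return "mysql:host={$host};dbname=app;charset=utf8mb4";
}

// Open at most one PDO connection per role and reuse it.
function db(bool $write = false): PDO
{
    static $conns = [];
    $key = $write ? 'write' : 'read';
    if (!isset($conns[$key])) {
        $conns[$key] = new PDO(dsnFor($write), 'user', 'secret', [
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        ]);
    }
    return $conns[$key];
}

// Usage: db() for SELECTs, db(true) for anything that writes.
```

Because the reader handle points at whatever address the load balancer answers on, adding or removing read servers never touches application code.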

Have you considered hiding the cluster behind a hardware or software load balancer (e.g. HAProxy)? This way, the client code doesn't need to deal with the cluster at all, it sees the cluster as just one virtual server.
You still need to distinguish applications that write from those that read. In our system, we put the slave servers behind the load balancer, and read-only applications use this cluster, while writing applications access the master server directly. We don't try to make this happen automatically; applications that need to update the database simply use a different server hostname and username.

Write a wrapper class for the DB that has your connect and query functions in it...
The query function needs to look at the very first word to detect whether it's a SELECT and use the slave DB connection; anything else (INSERT, UPDATE, RENAME, CREATE, etc.) needs to go to the MASTER server.
The connect() function would look at the array of slaves and pick a random one to use.
You should only connect to the master server when you need to do an update (most webpages shouldn't be updating the DB, only reading data... make sure you don't waste time connecting to the MASTER db when you won't use it).
You can also use a static variable in your class to hold your DB connections; that way connections are shared between instances of your DB class (i.e. you only have to open the DB connection once instead of every time you call '$db = new DB()').
Abstracting the database functions into a class like this also makes it easier to debug or add features.
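A sketch of that wrapper, assuming PDO; the hostnames, credentials, and slave list are placeholders, and the first-word check is deliberately simple (SELECT-family statements go to a random slave, everything else to the master, and the master connection is only opened when a write actually happens):

```php
<?php
// Sketch of the wrapper class described above. Hostnames, credentials
// and the slave list are placeholders; the routing rule is the point.
class DB
{
    private const MASTER = 'master.db.example.com';
    private const SLAVES = ['slave1.db.example.com', 'slave2.db.example.com'];

    private static ?PDO $master = null;
    private static ?PDO $slave  = null;

    // Route by the first word: reads may go to a slave, everything else
    // (INSERT, UPDATE, RENAME, CREATE, ...) must hit the master.
    public static function isRead(string $sql): bool
    {
        $first = strtoupper((string) strtok(ltrim($sql), " \t\r\n"));
        return in_array($first, ['SELECT', 'SHOW', 'DESCRIBE', 'EXPLAIN'], true);
    }

    public function query(string $sql, array $params = []): PDOStatement
    {
        $pdo  = self::isRead($sql) ? self::slave() : self::master();
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }

    // Lazy connections held in statics, shared by every DB instance;
    // the master link is only opened when a write actually happens.
    private static function slave(): PDO
    {
        return self::$slave ??= self::connect(self::SLAVES[array_rand(self::SLAVES)]);
    }

    private static function master(): PDO
    {
        return self::$master ??= self::connect(self::MASTER);
    }

    private static function connect(string $host): PDO
    {
        return new PDO("mysql:host={$host};dbname=app", 'user', 'secret',
            [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
    }
}
```

A first-word check errs on the safe side: anything it doesn't recognise as a read (e.g. a CTE starting with WITH) is sent to the master, so extend the read list if you use such statements.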

Related

Query Percona Cluster Nodes For Current Load

I'm attempting to put together a MySQL load balancer with a mock-up PHP-style script. The problem is that I've been looking through countless variables in the database and can't find one that reports the current load on the server, so that I can pick the fastest server to give the client.
MySQL is not aware of the server's resource usage, so one approach is to use a monitoring tool such as Cacti, get the data from there, and use that in your load-balancing app.
Another way is just to use round robin and assume the load will be pretty evenly distributed over time.
A third option is to auto-scale the number of slave servers using, for example, Kubernetes with NFS & ZFS for central storage, making snapshots of the database available on the slave nodes (for a read-only solution).
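Absent per-server load data, the round-robin option above can be as small as this (the server names are placeholders):

```php
<?php
// A minimal round-robin picker over the slave pool: no load metrics,
// just cycle through the servers and assume the load evens out over time.
class RoundRobin
{
    private int $next = 0;

    public function __construct(private array $servers) {}

    // Return the next server in the cycle.
    public function pick(): string
    {
        $server = $this->servers[$this->next];
        $this->next = ($this->next + 1) % count($this->servers);
        return $server;
    }
}

$pool = new RoundRobin(['slave1.db.example.com', 'slave2.db.example.com']);
// Each new connection takes the next server in turn:
// $host = $pool->pick();
```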

advantages of persistent mysql connections

On my website, when a user opens his profile or any other page (almost all pages use data from MySQL), my website makes around 50 connections to MySQL while loading the page. This is the case on my development server, running elementary OS.
When I came to know about persistent connections in MySQL, I was puzzled. If I am to run this website on a VPS (with low RAM to start with), and considering the overhead produced by a large number of MySQL connections, will using persistent connections improve my website's performance?
Currently, I start and end a connection in every function. Is there a better way to connect to MySQL?
And, taking into account that 100 users may be using my website simultaneously, what will the performance be if each page makes around 50-60 connections?
You asked so I will answer. You are doing this incorrectly. You should start the processing on each page request (each externally-accessible .php file) by using a common function to establish a single database connection, then you should reuse that connection.
You are getting away with this because you're probably using an automatic connection pool built in to your php database-access library, and because you have not yet scaled up your application.
You won't be able to scale this up very far using this multiconnection strategy because it will perform very badly indeed when you add users.
There are lots of examples of working open-source php-based web app systems you can look at. WordPress is an example. You'll find that most of them start by opening a database connection and storing its handle in a global variable.
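The pattern described above, one connection opened by a common function and reused everywhere, can be sketched like this. The DSN and credentials are placeholders, and the optional $dsn parameter exists only so a different database can be injected for testing:

```php
<?php
// One shared PDO handle per request: opened on first use, reused after.
function getDb(string $dsn = 'mysql:host=localhost;dbname=app'): PDO
{
    static $pdo = null;
    if ($pdo === null) {
        $pdo = new PDO($dsn, 'user', 'secret', [
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        ]);
    }
    return $pdo; // every later call gets the same handle
}

// Any function that needs the database calls getDb() (or receives the
// handle as a parameter) instead of opening its own connection:
// $stmt = getDb()->prepare('SELECT name FROM users WHERE id = ?');
```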
You asked:
According to CBroe's comment, I changed my strategy. Actually, I am using multiple database connections, but the functions are the same (don't ask why lol). So if I open connections on start, and then pass the handle to the function, will that be an improvement?
Yes, that will be fine. You need to avoid churning connections to get best performance.
If you need connections to more than one different database, you can open them all. But it sounds like you need to hit just one database.
PHP doesn't have significant overhead when passing a handle to a function, so don't worry about that.
As explained wonderfully by Ollie Jones, I opened the connection on start, and my connections dropped from 50-60 per page to 1 per page. Although I don't see any change in performance on my local development server, this will surely be a great improvement when it's on a live server. There is no need for me to use persistent connections yet.

Load balancing multiple read database on MySql / PHP / CodeIgniter

I am using Amazon's RDS. I have a single database, and we are getting fairly heavy traffic. I already scaled our EC2 instances without any issues, it's been working great, but I want to lighten the database load by creating:
1 - Write database
2 - Read databases
Obviously, I will have to have multiple connections going on in my script, and reading from one and writing to one is easy enough, but what is the logic for load balancing multiple read databases?
Is there something in Amazon I can setup to do this? Like the load balancing for EC2? Or is this something I have to setup within my scripts automatically?
Technically, I may NOT need 2 read db instances at this time, but surely this is a common thing, right? I would assume this would need to be done, and I was curious about the architecture.
Unfortunately, there is no easy way of doing this. Due to the automagically managed nature of RDS, you are at the mercy of Amazon and the services they provide. You have a few options, though.
1. You stick with RDS and set up a round robin DNS.
This is achieved most easily through Route 53. You do this by creating multiple weighted CNAME records, one for each of your read replicas' endpoints, e.g. db.mydomain.com -> somename.23ui23asdad4r.region.rds.amazonaws.com
Make sure to turn on the weighted routing policy and set the weight and "set ID" to the same value.
Rinse and repeat for each read replica.
http://note.io/1agsSMB
Caveat 1: this is not a true load balancer. It is simply rolling a die and pointing each request at one of your RDS instances.
Caveat 2: There is no way to health-check your RDS instances, and there is no way to auto-scale the instances either, unless you do some crazy things with CloudWatch-triggered scripts to manually add and remove RDS read replicas and update Route 53.
2. Use a die roll in your application itself.
A really cheap and nasty approach you could try is to create a config for each of your read replicas in CodeIgniter and when you connect to the database you randomly choose one.
Caveats: Same as above but even worse as you will need to update your codeigniter config each time you add or remove a read replica.
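The "die roll" itself is a few lines; the group names below are hypothetical CodeIgniter database config groups, one per read replica:

```php
<?php
// One database config group per read replica, chosen at random when
// connecting. Group names are hypothetical CodeIgniter config groups.
$read_groups = ['read1', 'read2', 'read3'];

function pickReadGroup(array $groups): string
{
    return $groups[array_rand($groups)]; // uniform random choice
}

// In a CodeIgniter model/controller:
// $db = $this->load->database(pickReadGroup($read_groups), TRUE);
```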
3. Spend hours and hours porting your RDS to ec2 instances.
You move your database to EC2 instances. This is perhaps the most difficult solution, as you will need to manage ALL of your database tweaking and tuning yourself. On the plus side, you will be able to put them in an auto-scaling group and behind an internal load balancer in your VPC.
An RDS cluster provides you with two endpoints: read and write. If you send read traffic to the read endpoint, AWS will manage load balancing across all read replicas. You can also apply a scaling policy to the read replicas.
These options are available for AWS Aurora clusters.

Do unused mysql connections slow down scripts?

I am in the process of writing an API for my website, along with a pretty large class to process all of the API requests.
Most, if not all, of the pages on the website will send at least one request to the API on load. The most important priority for this website is efficiency and, as a result, very quick server processing.
I am therefore seeking a little bit of advice when it comes to classes and certain PHP functions.
Firstly, it looks like the class I am writing will probably end up being around 3000 lines of code. If this is initialised on each page, ignoring the fact that only one or two of the functions within the class will be used per page, will this make the API much slower? Should I be looking at separate files with extensions to the class for each class method?
Secondly, I currently have all of my connections to the various databases in separate files within a directory. Each of these files calls mysql_pconnect(). At the moment, I only require these files when needed, so if a method needs a connection to the x database, I simply place require(connection...) into the method. Is it bad to include files within a class?
I am asking, because the only other solution is to require all of the connections at the top of the page so that the class can access them without requiring them for each method. This is why I would like to know whether having several connections at once, even if they are not used, is taxing on the server?
So three questions really:
Does initiating a large class at the beginning of each page slow down the script runtime, even if only one class method is being used? Is this why people use 'class extends class'?
Is it bad to 'require()' files within a class?
Do unused connections to a mysql database slow down the runtime of a script?
No, an unused MySQL connection won't consume much (if any) CPU time, though it will occupy a bit of memory to maintain the various bits of 'state' that have to be kept on a per-connection basis.
Note also that MySQL's connection protocol is fairly "light-weight": maintaining a pool of persistent connections sounds attractive, but the cost of establishing a new connection is already very low anyway.
Persistent connections are a quick fix to solving connection overhead, but they do bring in issues. The worst one being abandoned connections can leave the connections in an indeterminate state (in-progress transactions, changed server variables/configurations, etc...) and you can quite easily create inadvertent deadlocks unless you're very careful.
If you are writing an API to serve data from your DB to multiple 3rd parties, you are better off not letting them query the DB at all, not even through a web service (unless you have second-to-second DB mutations that these 3rd parties need immediately).
A better way would be to write XML file(s) to a protected location and use a web service to authenticate a 3rd party and serve the XML file(s).
Then update the XML periodically.
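A sketch of that serve-from-file approach, assuming a hypothetical $buildXml callback that queries the database and returns the XML string; the path and refresh interval are placeholders:

```php
<?php
// Serve a pre-built XML file, regenerating it only when it is older
// than the refresh interval.
function serveXml(string $path, callable $buildXml, int $maxAge = 300): string
{
    if (!is_file($path) || time() - filemtime($path) > $maxAge) {
        file_put_contents($path, $buildXml(), LOCK_EX); // periodic refresh
    }
    return file_get_contents($path); // everyone else reads the cached copy
}

// e.g. serveXml('/var/data/export.xml', 'buildExportXml');
```

Only the first request after the interval expires pays the cost of hitting the database; every other request just reads the file.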

PHP, MySQL and a large number of simple queries

I'm implementing an application that will have a lot of clients querying lots of small data packages from my webserver. Now I'm unsure whether or not to use persistent connections to the database. The database is currently on the same system as the webserver and could connect via the socket, but this may change in the near future.
As far as I know, mysqli_pconnect was removed a few PHP releases ago because it behaved suboptimally; in the meantime it seems to be back again.
Based on my scenario, I suppose I have no option for handling thousands of queries per minute other than loads of persistent connections and a MySQL configuration that reserves only a little in the way of resources per connection, right?
Thanks for your input!
What happened when you tested it?
With the best will in the world, there's no practical way you can convey all the information required for people to provide a definitive answer in an SO response. However, there is usually very little overhead in establishing a MySQL connection, particularly if the client (in this case the webserver) resides on the same system as the database. There's even less overhead if you use a filesystem (Unix) socket rather than a network socket.
So I'd suggest abstracting all your database calls so you can easily switch between connection types, but write your system to use on-demand connections, and ensure your code explicitly releases the connection as soon as practical; then see how it behaves.
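One way to sketch that abstraction is to put the connection style behind a single flag, so the two strategies can be benchmarked against each other; the DSN and credentials are placeholders:

```php
<?php
// One factory for both connection styles: flip a single flag to compare
// on-demand against persistent connections.
function connectionOptions(bool $persistent): array
{
    return [
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_PERSISTENT => $persistent, // reuse the link across requests?
    ];
}

function openDb(bool $persistent = false): PDO
{
    return new PDO('mysql:host=localhost;dbname=app', 'user', 'secret',
        connectionOptions($persistent));
}

// On-demand (default): $db = openDb(); ... $db = null; // release promptly
// Persistent, for comparison: $db = openDb(true);
```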
C.
Are PHP persistent connections evil?
The problem is there can be only so many connections active between Host “Apache” and Host “MySQL”
Persistent connections usually give problems in that you hit the maximum number of connections. Also, in your case they do not give a great benefit, since your database server is on the same host. Keep to normal connections for now.
As they say, your mileage may vary, but I've never had good experiences using persistent connections from PHP, including MySQL and Oracle (both ODBC and OCI8). Every time I've tested it, the system fails to reuse connections. With high load, I end up hitting the top limit while I have hundreds of idle connections.
So my advice is that you actually try it and find out whether your set-up is reusing connections properly. If it isn't working as expected, it won't be a big loss anyway: opening a MySQL connection is not particularly costly compared to other DBMSs.
Also, don't forget to reset all relevant settings when appropriate (whatever session value you change will be waiting for you the next time you establish a connection and happen to reuse that one).
