Query Percona Cluster Nodes For Current Load - php

I'm attempting to put together a MySQL load balancer with a mock-up PHP script. The problem is that I've looked through countless variables in the database and can't find one that reports the current load on the server, so that I can pick the fastest server to give the client.

MySQL is not aware of the server's resource use, so one approach is to use a monitoring tool such as Cacti, pull the load data from there, and use that in your load-balancing app.
Another way is to just use round robin and assume the load will be fairly evenly distributed over time; a sketch follows this list.
A third option is to auto-scale the number of slave servers using, for example, Kubernetes with NFS and ZFS for central storage, making snapshots of the database available on the slave nodes (for a read-only solution).
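A minimal sketch of the round-robin option in PHP. The node addresses are placeholders, and the rotating counter is kept in APCu here purely for illustration; any shared store (memcached, a file) would do the same job.

    <?php
    // Hypothetical Percona node addresses.
    $nodes = ['10.0.0.11', '10.0.0.12', '10.0.0.13'];

    // Rotate a shared counter so successive requests hit successive nodes.
    $i = apcu_fetch('rr_counter');
    if ($i === false) {
        $i = 0;
    }
    apcu_store('rr_counter', $i + 1);

    $host = $nodes[$i % count($nodes)];
    $pdo  = new PDO("mysql:host=$host;dbname=app", 'user', 'secret');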

Related

LOAD DATA LOCAL INFILE from Google Storage / Google App Engine?

I need to run a process that will perform about 10,000 mysql inserts into a GoogleSQL instance. Normally, I would use a load data local infile query for this to avoid the script timing out, but my app is running in Google App Engine which has a read-only filesystem. Normally, when I need my GAE app to write to the filesystem, I can just use file names prefixed with gs:// and the php code will read/write to/from Google Storage transparently.
However, I doubt that MySQL will understand a file path of gs://path/to/my/file.
Is there another way that I can make a dynamically generated local file available in a Google App Engine environment so that I can load it into my GoogleSQL instance?
Otherwise, I feel like I'm going to need to build a looping AJAX system to insert X rows at a time until it has gone through however many I need (10,000... 20,000, etc.).
I know that I can put multiple value sets into a single INSERT to speed it all up, and I'm planning to do that, but with datasets as large as the ones I'm dealing with, that still won't speed things up enough to consistently avoid the timeouts.
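For reference, a hedged sketch of the multi-value INSERT batching mentioned above, using PDO; the table and column names are made up for illustration:

    <?php
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // $rows would normally be your dynamically generated dataset.
    $rows = [
        ['name' => 'widget', 'qty' => 3],
        ['name' => 'gadget', 'qty' => 7],
        // ... thousands more ...
    ];

    $batchSize = 500; // tune so each statement stays under max_allowed_packet
    foreach (array_chunk($rows, $batchSize) as $chunk) {
        // One "(?, ?)" placeholder group per row in this chunk.
        $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?)'));
        $stmt = $pdo->prepare("INSERT INTO items (name, qty) VALUES $placeholders");

        $params = [];
        foreach ($chunk as $row) {
            $params[] = $row['name'];
            $params[] = $row['qty'];
        }
        $stmt->execute($params);
    }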

Load balancing multiple read database on MySql / PHP / CodeIgniter

I am using Amazon's RDS. I have a single database, and we are getting fairly heavy traffic. I already scaled our EC2 instances without any issues, it's been working great, but I want to loosen the database load by creating:
1 - Write database
2 - Read databases
Obviously, I will have to have multiple connections going on in my script, and reading from one and writing to one is easy enough, but what is the logic for load balancing multiple read databases?
Is there something in Amazon I can setup to do this? Like the load balancing for EC2? Or is this something I have to setup within my scripts automatically?
Technically, I may NOT need 2 read db instances at this time, but surely this is a common thing, right? I would assume this would need to be done, and I was curious about the architecture.
Unfortunately there is no easy way of doing this. Due to the automagically managed nature of RDS, you are at the mercy of Amazon and the services they provide. You have a few options, though.
1. You stick with RDS and set up a round robin DNS.
This is most easily achieved through Route 53: you create a CNAME record for each of your read replicas' endpoints, e.g. db.mydomain.com -> somename.23ui23asdad4r.region.rds.amazonaws.com
Make sure to turn on the weighted routing policy, give every record the same weight, and give each one its own "set ID".
Rinse and repeat for each read replica.
http://note.io/1agsSMB
Caveat 1: this is not a true load balancer. It simply rolls a die and points each request at one of your RDS instances.
Caveat 2: there is no way to health-check your RDS instances, and no way to auto-scale them either, unless you do some crazy things with CloudWatch-triggered scripts that manually add and remove RDS read replicas and update Route 53.
2. Use a die roll in your application itself.
A really cheap and nasty approach you could try: create a config entry for each of your read replicas in CodeIgniter and randomly choose one when you connect to the database (see the sketch after the caveat below).
Caveats: the same as above, but even worse, as you will need to update your CodeIgniter config each time you add or remove a read replica.
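A sketch of the die roll, assuming plain PDO; the hostnames are placeholders, and in CodeIgniter proper you would keep one database config group per replica and randomly pick a group name instead:

    <?php
    // Hypothetical read replica endpoints.
    $readReplicas = [
        'replica1.abc123.us-east-1.rds.amazonaws.com',
        'replica2.abc123.us-east-1.rds.amazonaws.com',
    ];

    // Roll the die: pick one replica for this request's reads.
    $host = $readReplicas[array_rand($readReplicas)];
    $read = new PDO("mysql:host=$host;dbname=app", 'user', 'secret');

    // Writes always go to the master endpoint.
    $write = new PDO('mysql:host=master.abc123.us-east-1.rds.amazonaws.com;dbname=app', 'user', 'secret');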
3. Spend hours and hours porting your RDS to EC2 instances.
You move your database onto EC2 instances. This is perhaps the most difficult solution, as you will need to manage ALL of the database tweaking and tuning yourself. On the plus side, you will be able to put the instances in an auto-scaling group behind an internal load balancer in your VPC.
An Aurora cluster provides you with two endpoints, one for reads and one for writes. If you send read traffic to the reader endpoint, AWS will load-balance it across all the read replicas. You can also apply a scaling policy to the read replicas.
Note that these options are only available for AWS Aurora clusters.
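In PHP the two-endpoint pattern is just two connections; the endpoint names and table below are placeholders:

    <?php
    // Writer endpoint: all writes. Reader endpoint: Aurora spreads these
    // connections across the read replicas for you.
    $writer = new PDO('mysql:host=mycluster.cluster-abc123.us-east-1.rds.amazonaws.com;dbname=app', 'user', 'secret');
    $reader = new PDO('mysql:host=mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com;dbname=app', 'user', 'secret');

    $rows = $reader->query('SELECT id, name FROM users')->fetchAll();
    $writer->exec("UPDATE users SET name = 'Bob' WHERE id = 1");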

mysql incremental replication from remote DB

We are developing a web application using MySQL and PHP.
In our application, we are expected to sync our local MySQL database with a remote MySQL database (running on a different host) based on a trigger from the user interface.
The user trigger is a webpage: when the user clicks a button, a PHP server script is fired which should perform this synchronization in the background.
We planned to do it the simple way, opening DB connections to the remote and local databases and inserting the rows one at a time. But the remote table can be very large (as big as a few million entries), so we need a more efficient solution.
Can someone help us with the SQL query / PHP code that can do this sync efficiently without burdening the remote DB too much?
Thanks in advance
Shyam
UPDATE
The remote DB is not under my control, so I cannot configure it as a master or change any other settings on it. That is one major limitation I have, and it is why I want to do it programmatically using PHP. Another option I have is to read blocks of 1000 rows from the remote DB and insert them into the local DB (sketched below). But I wanted to know if there is a better way?
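A hedged sketch of that block-wise approach with PDO, assuming the table has an auto-increment id column (the table and column names are made up). Resuming from the local MAX(id) means each run only pulls rows the local copy doesn't yet have, though this handles inserts only, not updates or deletes:

    <?php
    $remote = new PDO('mysql:host=remote.example.com;dbname=app', 'user', 'secret');
    $local  = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');

    // Start from the highest id we already have locally.
    $lastId = (int) $local->query('SELECT COALESCE(MAX(id), 0) FROM items')->fetchColumn();

    $select = $remote->prepare('SELECT id, name, qty FROM items WHERE id > ? ORDER BY id LIMIT 1000');
    $insert = $local->prepare('INSERT INTO items (id, name, qty) VALUES (?, ?, ?)');

    do {
        // Pull the next block of 1000 rows from the remote DB.
        $select->execute([$lastId]);
        $rows = $select->fetchAll(PDO::FETCH_ASSOC);
        foreach ($rows as $row) {
            $insert->execute([$row['id'], $row['name'], $row['qty']]);
            $lastId = $row['id'];
        }
    } while (count($rows) === 1000);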
You shouldn't concern yourself with MySQL replication from an application layer when the data layer has this functionality built-in. Please read up on "Master/Slave replication with MySQL". https://www.digitalocean.com/community/articles/how-to-set-up-master-slave-replication-in-mysql is a starting point.
The cost of replication is minimal as far as the Master is concerned. It basically works like this:
The Master logs all (relevant) activity into a flat log file
The Slave downloads this binary log every now and then and replays it locally
Therefore, the impact on the Master is only writing linear data to a log file, plus a little bit of bandwidth. If you still worry about the impact on the Master, you could throttle the link between Master and Slave (at the system level), or just open the link during low activity times (by issuing STOP/START SLAVE commands at appropriate times).
I should also mention that built-in replication takes place at a low level inside the MySQL engine; I do not think you can achieve better performance with an external process. If you want to fully synchronise your local database when you hit this "Synchronise" button, then look no further.
If you can live with partial synchronisation, then you could have this button resume replication for a short timeframe (e.g. START SLAVE, then STOP SLAVE again automatically after 10 seconds; the user needs to click again to get more data synchronised).
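A minimal sketch of that timed window, assuming the connecting PHP user has the replication privileges these statements require:

    <?php
    $slave = new PDO('mysql:host=127.0.0.1', 'repl_admin', 'secret');

    $slave->exec('START SLAVE');
    sleep(10);                   // let the slave catch up for ~10 seconds
    $slave->exec('STOP SLAVE');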

PHP / PDO with a MySQL Cluster

I have been asked to re-develop an old php web app which currently uses mysql_query functions to access a replicated database (4 slaves, 1 master).
Part of this redevelopment will move some of the database into a MySQL Cluster. I usually use PDO to access databases these days, and I am trying to find out whether PDO will play nicely with a cluster, but I can't find much useful information on the web.
Does anyone have any experience with this? I have never worked with a cluster before ...
I've done this a couple different ways with different levels of success. The short answer is that your PDO connections should work fine. The options, as I see them, are as follows:
If you are using replication, then either write a class that handles connections to various servers or use a proxy. The proxy may be a hardware or a software. MySQL Proxy (http://docs.oracle.com/cd/E17952_01/refman-5.5-en/mysql-proxy.html) is the software load balancer I used to use and for the most part it did the trick. It automatically routes traffic between your readers and writers, and handles failover like a champ. Every now and then we'd write a query that would throw it off and have to tweak things, but that was years ago. It may be in better shape now.
Another option is to use a standard load balancer and create two connections - one for the writer and the other for the readers. Your app can decide which connection to use based on the function it's trying to perform.
Finally, you could consider using the MaxDB cluster available from MySQL. In this setup, the MySQL servers are all readers AND writers. You only need one connection, but you'll need a load balancer to route all of the traffic. The cluster can be tricky if the indexes get out of sync, so tread lightly if you go with this option.
Clarification: When I refer to connections what I mean is an address and port to connect to MySQL on - not to be confused with concurrent connections running on the same port.
Good luck!
Have you considered hiding the cluster behind a hardware or software load balancer (e.g. HAProxy)? This way, the client code doesn't need to deal with the cluster at all, it sees the cluster as just one virtual server.
You still need to distinguish applications that write from those that read. In our system, we put the slave servers behind the load balancer, and read-only applications use this cluster, while writing applications access the master server directly. We don't try to make this happen automatically; applications that need to update the database simply use a different server hostname and username.
Write a wrapper class for the DB that has your connect and query functions in it.
The query function needs to look at the very first word to detect whether it's a SELECT and use the slave DB connection; anything else (INSERT, UPDATE, RENAME, CREATE, etc.) needs to go to the MASTER server.
The connect() function would look at the array of slaves and pick a random one to use.
You should only connect to the master when you need to do an update (most web pages shouldn't be updating the DB, only reading data... make sure you don't waste time connecting to the MASTER DB when you won't use it).
You can also use a static variable in your class to hold your DB connections; that way connections are shared between instances of your DB class (i.e. you only have to open the DB connection once instead of every time you call '$db = new DB()').
Abstracting the database functions into a class like this also makes it easier to debug or add features.
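A sketch of that wrapper, assuming PDO; the hosts and credentials are placeholders:

    <?php
    class DB
    {
        private static $master = null;
        private static $slave  = null;
        private static $slaveHosts = ['10.0.0.21', '10.0.0.22']; // hypothetical slaves

        // Lazily open (and share) the single master connection.
        private static function master()
        {
            if (self::$master === null) {
                self::$master = new PDO('mysql:host=10.0.0.20;dbname=app', 'user', 'secret');
            }
            return self::$master;
        }

        // Lazily open one connection to a randomly chosen slave.
        private static function slave()
        {
            if (self::$slave === null) {
                $host = self::$slaveHosts[array_rand(self::$slaveHosts)];
                self::$slave = new PDO("mysql:host=$host;dbname=app", 'user', 'secret');
            }
            return self::$slave;
        }

        // Route on the first word: SELECTs go to a slave, everything else to the master.
        public static function query($sql, array $params = [])
        {
            $first = strtoupper(strtok(ltrim($sql), " \t\r\n"));
            $pdo   = ($first === 'SELECT') ? self::slave() : self::master();
            $stmt  = $pdo->prepare($sql);
            $stmt->execute($params);
            return $stmt;
        }
    }

    // Reads hit a slave, writes hit the master, and each connection opens once.
    $rows = DB::query('SELECT id, name FROM users')->fetchAll();
    DB::query('UPDATE users SET name = ? WHERE id = ?', ['Bob', 1]);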

File / Image Replication

I have a simple question and wish to hear others' experiences regarding which is the best way to replicate images across multiple hosts.
I have determined that storing images in the database and then using database replication over multiple hosts would result in maximum availability.
The worry I have with the filesystem is the difficulty of synchronising the images (e.g. I don't want 5 servers all hitting the same server for images!).
Now, the only concerns I have with storing images in the database are the extra queries hitting the database and the extra handling I'd have to put in place in Apache if I wanted 'virtual' image links to point to database entries (e.g. AddHandler).
As far as my understanding goes:
If you have a script serving up the images, each image would require a database call.
If you display the images inline as binary data, this could be done in a single database call.
To provide external / linkable images, you would have to add an AddHandler for the extension you wish to 'fake' and point it to your scripting language (e.g. PHP, ASP).
I might have missed something, but I'm curious if anyone has any better ideas?
Edit:
Tom has suggested using mod_rewrite to avoid using an AddHandler, which I have accepted as a proposed solution to the AddHandler issue; however, I don't yet feel I have a complete solution, so please, please, keep answering ;)
A few have suggested using lighttpd over Apache. How different are the ISAPI modules for lighttpd?
If you store images in the database, you take an extra database hit plus you lose the innate caching/file serving optimizations in your web server. Apache will serve a static image much faster than PHP can manage it.
In our large app environments, we use up to 4 clusters:
App server cluster
Web service/data service cluster
Static resource (image, documents, multi-media) cluster
Database cluster
You'd be surprised how much traffic a static resource server can handle. Since it's not really computing (no app logic), a response can be optimized like crazy. If you go with a separate static resource cluster, you also leave yourself open to change just that portion of your architecture. For instance, in some benchmarks lighttpd is even faster at serving static resources than apache. If you have a separate cluster, you can change your http server there without changing anything else in your app environment.
I'd start with a 2-machine static resource cluster and see how that performs. That's another benefit of separating functions - you can scale out only where you need it. As far as synchronizing files, take a look at existing file synchronization tools versus rolling your own. You may find something that does what you need without having to write a line of code.
Serving the images from wherever you decide to store them is a trivial problem; I won't discuss how to solve it.
Deciding where to store them is the real decision you need to make. You need to think about what your goals are:
Redundancy of hardware
Lots of cheap storage
Read-scaling
Write-scaling
The last two are not the same and will definitely cause problems.
If you are confident that the size of this image library will not exceed the discs you're happy to put in your web servers (say 200G at the time of writing, the largest high-speed server-grade discs that can be obtained; I assume you want to use 1U web servers, so you won't be able to store more than that in RAID 1, depending on your vendor), then you can get very good read-scaling by placing a copy of all the images on every web server.
Of course you might want to keep a master copy somewhere too, and have a daemon or process which syncs them from time to time, and have monitoring to check that they remain in sync and this daemon works, but these are details. Keeping a copy on every web server will make read-scaling pretty much perfect.
But keeping a copy everywhere will ruin write-scalability, as every single web server will have to write every changed / new file. Therefore your total write throughput will be limited to the slowest single web server in the cluster.
"Sharding" your image data between many servers will give good read/write scalability, but is a nontrivial exercise. It may also allow you to use cheap(ish) storage.
Having a single central server (or active/passive pair or something) with expensive IO hardware will give better write-throughput than using "cheap" IO hardware everywhere, but you'll then be limited by read-scalability.
Having your images in a database doesn't necessarily mean a database call for each one; you could cache them separately on each host (e.g. in temporary files) when they are retrieved. The source images would still be in the database and easy to synchronise across servers.
You also don't really need to add Apache handlers to serve an image through a PHP script whilst maintaining nice URLs: you can make URLs like http://server/image.php/param1/param2/param3.JPG and read the parameters through $_SERVER['PATH_INFO']. You could also remove the 'image.php' portion of the URL (if you needed to) using mod_rewrite.
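A hedged sketch of image.php combining the PATH_INFO trick with a temporary-file cache, so each image only costs a database call on its first request; the table and column names are assumptions:

    <?php
    // image.php — e.g. http://server/image.php/photo1.jpg
    $name  = basename($_SERVER['PATH_INFO']);          // "photo1.jpg"
    $cache = sys_get_temp_dir() . '/imgcache_' . $name;

    if (!is_file($cache)) {
        // First request: pull the blob from the database and cache it.
        $pdo  = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');
        $stmt = $pdo->prepare('SELECT data FROM images WHERE name = ?');
        $stmt->execute([$name]);
        $blob = $stmt->fetchColumn();
        if ($blob === false) {
            http_response_code(404);
            exit;
        }
        file_put_contents($cache, $blob);
    }

    header('Content-Type: image/jpeg');
    readfile($cache);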
What you are looking for already exists and is called MogileFS.
The target setup involves mogilefsd, replicated MySQL databases, and lighttpd/Perlbal for serving files. It will bring you failover and fine-grained file replication (for example, you can decide to duplicate end-user images on several physical devices while keeping only one physical instance of the thumbnails). Load balancing can also be achieved quite easily.
