PHP/PDO MariaDB Galera Cluster

PHP/PDO MariaDB Galera Cluster - php

I am in the final stages of configuring a service that is accessible from four global locations (with plans to add more later). I will be running the servers on an Ubuntu 12.04 box with MariaDB. My initial thought was to create servers that run independently of each other with 4 distinct databases and live with the constraint that users would only be able to login to the server where they were initially registered.
However, I have just run into this article that has got me thinking... .
From my reading of things if I set up a Galera cluster with master-master replication as suggested in the article I can move have the luxury of one large database that is consistently available across all four servers. I have gathered (and am hoping) that with the cluster setup correctly and functioning well I need do pretty much nothing in my PHP code (the four MariaDB instances will have the same user to access the database) - not even alter the PDO connection string.
However, this sounds almost too good to be true. My questions are:
are there other issues involved here that make for complications?
Do the PHP PDO connection strings need to be altered in anway?
Does the fact that my application is already structured to ensure that there is absolutely zero chance of two servers attempting to simultaneously write the same row help?
And finally, reading from the MariaDB docs, that this will not work with the TokuDB storage engine?
Is there a way to specifically stop the replication of a selected table? Could I in fact exploit the "only InnoDB/XtraDB" constraint and use another storage engine on the table I do not want to have replicated?

are there other issues involved here that make for complications?
There are some Known Limitations that you should be aware of. Generally, with clusters, you should ideally have an odd number of nodes to prevent split brain conditions, but an even number will usually work just as well.
Do the PHP PDO connection strings need to be altered in anway?
No. Your existing connection strings should work.
Does the fact that my application is already structured to ensure that there is absolutely zero chance of two servers attempting to simultaneously write the same row help?
Look at the known limitations and make sure your application will still do that. If you're using named locks, you'll need to change your application.
And finally, reading from the MariaDB docs, that this will not work with the TokuDB storage engine?
TokuDB support was added in the recent galera cluster distribution. I have used some and it does replicate just like InnoDB but I wouldn't rely on it since it's new in the galera cluster build.
Is there a way to specifically stop the replication of a selected table? Could I in fact exploit the "only InnoDB/XtraDB" constraint and use another storage engine on the table I do not want to have replicated?
I've heard a lot of people ask if they can omit tables or databases from replication but I still haven't heard a good reason why. Galera replication provides HA and is cheap and easy so even if some tables aren't important I can't find any realistic reason to not replicate the data. That being said, you could have data not replicated by using MyISAM/Aria.
I've been using MariaDB with galera in multiple moderately sized projects and it is the best solution I've found for HA and it also provides performance benefits. Other solutions are generally expensive or not mature. One thing you should consider is setting up a proxy for connecting to the database servers like HA Proxy, mysql-proxy, or glbd (which I use) to provide better redundancy and connection balancing for performance.
In response to DroidOS's comment below:
Every write in the cluster needs to be agreed upon by every node so any latency between nodes is added to every write. So, basically, every write will have the greatest round trip time between the writing server and the other nodes added to it.
No. Galera replication is all or nothing across the entire cluster. If any node has a problem writing the data, which can happen if a table doesn't have a primary key, the node will gracefully kill itself since it can't guarantee its data is consistent with the rest of the cluster. If that happens, the rest of the cluster will continue to operate normally. If there is a network issue, if one of the segments has quorum, it will continue to operate normally. Any segments without quorum will wait for more nodes to get quorum but will not accept queries. With this behavior, you can be sure that any node that you are able to query is consistent with the rest of the cluster.

Given that this has turned out to be such a popular question I thought I should add an extra answer by way of comment for anyone who runs into it.
The big issue with synchronous replication is the latency that introduced by the process. There will certainly be times when synchronous replication is required and latency has to be managed and then lived with. However, you might on reflection -as I did - realize that you can live with lazy replication. There are commercial solutions that deliver this albeit at a hefty fee. You also have the possibility of spinning your own solution - easier than you might think.

Related

MariaDB multiple DB split between multiple servers - can I query like they all are on the same instance?

I have this scenario: I have a PhP 5.6 application that connects to a MariaDB (edit: I stated MySQL before but is not correct) instance.
In this instance I have a "master" DB and a DB for each firm using the application. Each new firm -> a new DB.
The number of DBs is growing exponentially and we are considering a way to split these client's DBs between multiple MariaDB servers.
The problem is that there are many queries that join between client's DB and master DB, so we cannot blindly just connect to another Host.
Is it possible to setup a MariaDB instance that have databases in other hosts, but still "virtually sees them" as on the same instance (so that cross DB queries still work) ?
I tried to google this without success.
Thank you in advance!
EDIT: I've found out about Federated Tables that might be the solution.
Question is, we like to split DBs between servers because we might have a number of DBs in the range of 50.000-100.000 and we fear about performances.
If I create these DBs locally all with federated tables, will this solve my issue or are we still facing performance issues?

Sounds like you need FederatedX to get to the common database from the client dbs.
And "sharding" is putting several clients on each of several servers. Not one client on each of 50K servers.
Or is there some requirement you have not provided yet?
How big are these "clients"? How big is the largest? Are there security concerns if multiple clients live on the same server? In the same MariaDB instance? In the same table?
Have you already set up some form of Proxy to route a client request to the server for that client? Or are you expecting the client to own his server?
Etc, etc.
Keep in mind that reaching into a 'federated' table is slow, in some cases orders of magnitude slower. So work hard on minimizing the need for federation.
Another approach would be less burden on the individual queries, but more admin effort -- duplicate the "common" database across all of the shards. This could probably be done via replication by having the "common" server be a Master and all the shards by Slaves.

You can use Firebase it's fast easy more then MySQL v
But if you need to this
You can make a simple API on the other server that has the instance for MySQL this API will run your query on the server and return back the results

Zend Framework Multiple Database Stop Working with slow query

I'm running a big Zend Framework web application with 5 database, independent of each other, distributed on 2 database servers running on Mysql 5.6.36 - CentOS7 with 16gb ram 8 core processor each. However, if one of the 2 database servers stops responding because of slows query, the users on the other server cannot access the web application. The only way to turn on the application is to restart mysql on that server. I try different things without success. The strange thing is that if I turn off one of the servers the system continues to work correctly.

It's hard to offer a meaningful answer, because you've given us no information about your tables or your queries (in fact, you haven't asked a question at all, you've just told us a story! :-).
I will offer a guess that you are using MyISAM for one or more of your tables. This means a query from one client locks the table(s) it queries, and blocks concurrent updates from other clients.
To confirm you have this problem, use SHOW PROCESSLIST on each of your two database servers at the time you experience the contention between web apps. You might see a bunch of queries stuck waiting for a lock (it may appear in the processlist with the state of "Updating").
If so, you might have better luck if you alter your tables' storage engine to InnoDB. See https://dev.mysql.com/doc/refman/5.7/en/converting-tables-to-innodb.html

keep 2 mysql dbs identical on two machines

I have two mysql databases on two machines. Let's say first is production db and the second one is identical clone. My php app uses production db for default. But I need to have both dbs identical at the same time. It means I need solution for cases when production db is unavailable (for example connection error) so I just manually set second one and my app runs as usual. I would like to make it "failure resistant".
How should I do this? I think of making dump every minute but it is not a good solution when db is complex with many data...

These answers assume that standard replication is not an option for you, for whatever reason:
The following options are existing methods for manual data-sync that are well known, and would be good when combined when wrapped into a scripting language like bash etc. for a cronjob as needed with some logic to specify specific tables as needed, guarantee it is safe to run them in light of load, etc. on a production box.
Option 1: pt-table-sync
The pt-table-sync tool from the Percona MySQL toolkit allows for master-master, master-slave sync on demand in an existing replication scheme. Or you can use it to sync two servers that do not have any relationship.
Docs here from Percona
Following the example, for one way sync'ing.
pt-table-sync --execute h=sourcehost1,u=msandbox,p=msandbox h=desthost d=yourdb t=yourtables
Additionally the following features exist:
Dry Run Mode (--dry-run) - Program will connect, plan the sync, analyze conflicts and tell you how it would resolve the sync. This is key to making sure you use this powerful tool the right way.
Conflict analysis - see how the data compare - feed this back into your script to catch potential issues, or don't perform the sync to save time if there isn't a difference.
As I understand, a master-slave relationship need not exist necessarily - but the sync is more efficient if it does exist since more efficient checksum algorithms can be used for comparing the data.
Option 2: Hot/Streaming Backups with XtraDb
Alternatively, you could use something like the free Percona XtraBackup in it's host streaming mode to keep a backup file in sync, and restore to your dev box as needed.
XtraBackup makes MySQL hot backups for all versions of Percona Server,
MySQL, MariaDB, and Drizzle. It performs streaming, compressed, and
incremental MySQL backups.
Option C: LVM Snapshots
LVM snapshots are probably not the best option for a production box if you need to take them at any sort of frequency due to the brief locking/IO issues, but nonetheless here you go: MySQL Backups Using LVM Snapshots
All of these tools move data either one way, or bi-directionally - as such a thorough understanding of how to employ them is critical to avoid data loss.

MySQL replication vs other techniques

Im having a really hard time trying to go down the RIGHT road in a project.
I'm a one man band with a tight budget.
2 dedicated servers
MySQL 5 / php5
I'm using server 1 to consume a lot of data from various feeds. The server/software is running 24/7 generating a huge database.
Server 2 - holds a copy
Of the database with a web frontend
I don't have any experience of MySQL replication. I've been researching and from what I can tell the slaves are updated right after the master.
I want to have a very speedy website so that's why the processing is done on server 1, whilst sever 2 simply selects data.
If MySQL replication is mimicking server 1 then surely this is going slow down server 2 and have the opposite of the desired effect.
What I thought might best suit this scenario is to write a script to automate the process.
Server 2 has 2 databases. One for live one for processing.
The script ascertains which database is live and instead uses the other one.
It's drops any tables in it.
The script dumps the database from server 1.
Installs it on server 2's newly emptied database.
The script changes the websites config file to utilise the new database.
The process can be repeated over and over.
Whilst the database install will be large it can happen its entirety at night and should mean no down time.
Is this better than doing MySQL replication ?
I would welcome advice.

Its hard to believe that a database dump/load cycle would be faster than replication. Especially row-based (non-query) replication. Replication can be lagged (by running SLAVE STOP SQL_THREAD on the slave) if you don't want it during peak times (but of course you must have sufficient non-peak times to catch up). (Remember that MySQL has three replication modes: statement, row, and mixed. Statement-based does the exact same update load on the slaves, row-based just sends the rows that changed, and should be fairly cheap CPU-wise)
Either all your slaves are fast enough to apply changes, and still have plenty of I/O bandwidth and CPU time to handle SELECTs, or no number of slaves will help. Its possible some other method (e.g., direct copying of data files) might be faster, but more fragile, and really you're talking some relatively minor gains. If you can't handle the update load, your choice with MySQL is to shard (split so each server is only responsible for part of the data) or buy faster hardware.
But ultimately, this is all taking shots in the dark. You can fairly easily change from replication, to rsync, to some insane scheme involving drbd, to whatever, that really only affects your database layer, maybe only the database itself. You need actual benchmarks—actual data—to make decisions like this. I will tell you that as a general rule, properly-designed large OLTP databases run out of I/O bandwidth first.
I'd suggest start with what's easy. And that'd be a single database server, or built-in replication. Keep in mind that sharding may be necessary at some point.
Actually, there is probably one question you want to answer fairly early: Do you really want to go with MySQL? Consider PostgreSQL.

A high volume of inserts can most certainly impact front end performance, but the answer for your scenario depends on very specifically how your processing engine impacts resources. There are certain combinations of settings that will allow high performance on selects while inserting data constantly. It depends on your specific duty cycle, storage engine, indexing scheme, etc.
You start by thoroughly understanding table locking http://dev.mysql.com/doc/refman/5.0/en/table-locking.html This is a must!
Then you can explore features like INSERT DELAYED http://dev.mysql.com/doc/refman/5.0/en/insert-delayed.html
And optimize your indices (as few as possible) to reduce the impact of each insert http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Since it sounds like your requirements are driven by lots of data growth (inserts), if you can't get the performance you need from a single instance, replication probably won't help. In which case you should go for the nightly load scenario.
We have a similar use case, and we do nightly batch loads, with replication for backup/failover purposes only.

You say "If MySQL replication is mimicking server 1 then surely this is going slow down server 2 and have the opposite of the desired effect."
I don't think this is going to slow down the server. Have you tried it and measured any performance difference? I really think this is the best way to go for you, unless you clearly measure a performance impact because of the replication.

You really haven't provided enough info on what you're aiming to do, but here's my best understanding: server1 is fetching data (using bandwidth) and processing it in some way, (using processing power and I/O); server2 is serving live info to users that is based on the post-processed data. Availability for server2 is more important than for server1, and a problem on server1 should not affect server2's operations.
If the there's a significant difference between the raw data that server1 is fetching and the 'finished' data for use on server2, perhaps with some temporary data being produced along the way, just have server1 do its work, and use some kind of a script to periodically bring post-processed data from server1 to server2. Perhaps post-processed data is smaller than the raw stuff that server1 is working on?
If server1 is not really doing much processing, just fetching of data and insertion into db, then replication might be reasonable way to move data from #1 to #2.
An in-between approach would be to only replicate certain post-processed tables, so server1 can do its work in other tables in mysql, and when the final product is being inserted into the replicated table, it will automatically appear on server2.
Have fun.

Does a separate MySQL server make sense when using Nginx instead of Apache?

Consider a web app in which a call to the app consists of PHP script running several MySQL queries, some of them memcached.
The PHP does not do very complex job. It is mainly serving the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example when there are ten different customers using the service and it is possible to divide the data per customer) and when Nginx +FastCGI is used instead of heavier Apache, doesn't it make sense to put Nginx Memcache and MySQL on the same box? Then when more customers come, add similar boxes?
Background: We are moving to Amazon Ec2. And a separate box for MySQL and app server means double EBS volumes (needed on app servers to keep the code persistent as it changes often). Also if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).

If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.

You need to measure carefully how much memory overhead everything has - I can't see enginex vs Apache making much difference, it's PHP which will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally I'd stay away from enginex on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache stuff in its innodb_buffer_pool quite efficiently (This saves IO, but may end up using more CPU as you won't cache presentation logic etc, which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.

If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is : will your servers be able to handle the load of both Apache/nginx/PHP, MySQL, and memcached ?
And there is only one way to answer that question : you have to test in a "real" "production" configuration, to determine own loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
If there is not too much load with everything on the same server... Well, go with it, if it makes the hosting of your application cheapier ;-)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.