I have two MySQL databases on two machines. Let's say the first is the production db and the second is an identical clone. My PHP app uses the production db by default, but I need both dbs to stay identical at the same time. That means I need a solution for cases when the production db is unavailable (for example a connection error), so that I can manually switch to the second one and my app keeps running as usual. I would like to make it "failure resistant".
How should I do this? I have thought of making a dump every minute, but that is not a good solution when the db is complex and holds a lot of data...
These answers assume that standard replication is not an option for you, for whatever reason:
The following are well-known methods for manual data sync. They work best when combined and wrapped in a scripting language such as bash and run from a cron job, with some logic to target specific tables as needed and to make sure it is safe to run them on a production box given the current load.
Option 1: pt-table-sync
The pt-table-sync tool from the Percona Toolkit allows on-demand master-master or master-slave sync within an existing replication scheme, or you can use it to sync two servers that have no relationship at all.
Docs here from Percona
Following the documentation's example, for one-way syncing:
pt-table-sync --execute h=sourcehost1,u=msandbox,p=msandbox h=desthost d=yourdb t=yourtables
Additionally the following features exist:
Dry Run Mode (--dry-run) - the program will connect, plan the sync, analyze conflicts, and tell you how it would resolve them without changing anything. This is key to making sure you use this powerful tool the right way (see the sketch after this list).
Conflict analysis - see how the data compare; feed this back into your script to catch potential issues, or skip the sync to save time if there is no difference.
As I understand it, a master-slave relationship does not necessarily need to exist - but the sync is more efficient when it does, since faster checksum algorithms can be used to compare the data.
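As a concrete illustration of the cron-wrapper idea above, here is a minimal sketch. It reuses the hosts and credentials from the example command; the load threshold and log path are arbitrary placeholders, and the --dry-run output should be reviewed before ever enabling --execute.
#!/bin/bash
# Minimal pt-table-sync wrapper sketch: dry-run first, then a one-way sync,
# skipping the whole run if the box is already under heavy load.
# Hosts, credentials, table list, threshold, and log path are placeholders.

SRC="h=sourcehost1,u=msandbox,p=msandbox,D=yourdb,t=yourtables"
DST="h=desthost"
LOG=/var/log/table-sync.log

# Bail out if the 1-minute load average is too high for a production box.
LOAD=$(awk '{print int($1)}' /proc/loadavg)
if [ "$LOAD" -gt 4 ]; then
    echo "$(date) load $LOAD too high, skipping sync" >> "$LOG"
    exit 0
fi

# Dry run: plans the sync and reports differences without changing any data.
pt-table-sync --dry-run "$SRC" "$DST" >> "$LOG" 2>&1

# One-way sync from source to destination.
pt-table-sync --execute --verbose "$SRC" "$DST" >> "$LOG" 2>&1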
Option 2: Hot/Streaming Backups with XtraBackup
Alternatively, you could use something like the free Percona XtraBackup in its streaming mode to keep a backup file in sync, and restore it to your dev box as needed.
XtraBackup makes MySQL hot backups for all versions of Percona Server, MySQL, MariaDB, and Drizzle. It performs streaming, compressed, and incremental MySQL backups.
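For illustration only, a streamed backup to a standby box might look roughly like this with the 2.x innobackupex wrapper; the exact options vary by XtraBackup version, and hosts, credentials, and paths are placeholders.
#!/bin/bash
# Rough sketch: stream a hot backup from the production box to a standby box.

# Take the backup and stream it over SSH to the standby machine.
innobackupex --user=backup --password=secret --stream=xbstream /tmp \
    | ssh standby-host "mkdir -p /var/backups/mysql/latest && xbstream -x -C /var/backups/mysql/latest"

# On the standby box, the backup must be prepared before it can be restored:
# innobackupex --apply-log /var/backups/mysql/latest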
Option 3: LVM Snapshots
LVM snapshots are probably not the best option for a production box if you need to take them at any sort of frequency due to the brief locking/IO issues, but nonetheless here you go: MySQL Backups Using LVM Snapshots
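The usual pattern, very roughly sketched below, holds a global read lock while the snapshot is taken and then copies the data off the mounted snapshot. Volume names, mount points, and sizes are placeholders, credentials are assumed to live in ~/.my.cnf, and a tool such as mylvmbackup wraps the same steps more robustly.
#!/bin/bash
# Rough LVM snapshot backup sketch. The lock, the snapshot, and the unlock all
# go through one mysql session so the read lock is held while lvcreate runs.

mysql <<'EOF'
FLUSH TABLES WITH READ LOCK;
\! lvcreate --size 10G --snapshot --name mysql_snap /dev/vg0/mysql_data
UNLOCK TABLES;
EOF

# Mount the snapshot, copy the data files off, then drop the snapshot.
mount /dev/vg0/mysql_snap /mnt/mysql_snap
rsync -a /mnt/mysql_snap/ /backups/mysql/"$(date +%F)"/
umount /mnt/mysql_snap
lvremove -f /dev/vg0/mysql_snap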
All of these tools move data either one way or bi-directionally; as such, a thorough understanding of how to employ them is critical to avoid data loss.
Related
I want to sync a local database with a live database in Laravel using cURL and a cron job, with manual code and no third-party tools
This is quite a complicated issue. You need to pay attention to a handful of concepts:
CAP Theorem
In the CAP theorem, also named Brewer's theorem, we know that any distributed data store can only provide two of the following three guarantees:
Consistency
Every read receives the most recent write or an error.
Availability
Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Partition tolerance
The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Since your desired setup must tolerate partitions, your problem comes down to a choice between consistency and availability.
I mention this theorem to point out that you cannot have a guarantee of both in whatever solution you ask for.
Anyway...
Database availability and consistency are among the most debated topics in information technology. There are several ways to approach the goal, for example:
implementing clustering
master/slave, master/master, active/passive, active/active, and similar architectures
using changelogs to keep the two DBs in sync
implementing CDC (change data capture) or a structure like it
and so on.
So I must say that your requirement cannot be satisfied without third-party libraries or some system-level configuration; you cannot do it with just a simple cURL call.
But if you insist on the approach nearest to cURL, you can use NFS: take an SQL backup of the live database and load it in place of your local one.
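A rough sketch of that NFS approach, assuming the live server writes its dump onto the exported share and the local machine has the share mounted at /mnt/live-share; database names, paths, and the schedule are placeholders.
#!/bin/bash
# On the live server (e.g. from cron): write a consistent dump to the NFS export.
mysqldump --single-transaction --routines --triggers live_db > /export/share/live_db.sql

# On the local machine, with the share mounted: reload the local copy from it.
mysql local_db < /mnt/live-share/live_db.sql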
I am in the final stages of configuring a service that is accessible from four global locations (with plans to add more later). I will be running the servers on an Ubuntu 12.04 box with MariaDB. My initial thought was to create servers that run independently of each other with 4 distinct databases and live with the constraint that users would only be able to log in to the server where they were initially registered.
However, I have just run into this article that has got me thinking...
From my reading of things, if I set up a Galera cluster with master-master replication as suggested in the article, I can have the luxury of one large database that is consistently available across all four servers. I have gathered (and am hoping) that with the cluster set up correctly and functioning well I need do pretty much nothing in my PHP code (the four MariaDB instances will have the same user to access the database) - not even alter the PDO connection string.
However, this sounds almost too good to be true. My questions are:
Are there other issues involved here that make for complications?
Do the PHP PDO connection strings need to be altered in any way?
Does the fact that my application is already structured to ensure that there is absolutely zero chance of two servers attempting to simultaneously write the same row help?
And finally, am I reading the MariaDB docs correctly that this will not work with the TokuDB storage engine?
Is there a way to specifically stop the replication of a selected table? Could I in fact exploit the "only InnoDB/XtraDB" constraint and use another storage engine on the table I do not want to have replicated?
Are there other issues involved here that make for complications?
There are some Known Limitations that you should be aware of. Generally, with clusters, you should ideally have an odd number of nodes to prevent split brain conditions, but an even number will usually work just as well.
Do the PHP PDO connection strings need to be altered in any way?
No. Your existing connection strings should work.
Does the fact that my application is already structured to ensure that there is absolutely zero chance of two servers attempting to simultaneously write the same row help?
Look at the known limitations and make sure your application will still do that. If you're using named locks, you'll need to change your application.
And finally, am I reading the MariaDB docs correctly that this will not work with the TokuDB storage engine?
TokuDB support was added in a recent Galera cluster distribution. I have used it a little and it does replicate just like InnoDB, but I wouldn't rely on it since it's new in the Galera cluster build.
Is there a way to specifically stop the replication of a selected table? Could I in fact exploit the "only InnoDB/XtraDB" constraint and use another storage engine on the table I do not want to have replicated?
I've heard a lot of people ask whether they can omit tables or databases from replication, but I still haven't heard a good reason why. Galera replication provides HA and is cheap and easy, so even if some tables aren't important I can't find any realistic reason not to replicate the data. That being said, you could keep data from being replicated by using MyISAM/Aria, as in the example below.
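For example (table and column names are hypothetical), rows written to a MyISAM table are not replicated by Galera in the default configuration (wsrep_replicate_myisam is off unless you enable it), although the CREATE TABLE statement itself still propagates to the other nodes:
# Hypothetical node-local scratch table; its data stays on this node only
# under default Galera settings.
mysql -e "CREATE TABLE app.local_scratch (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload TEXT
) ENGINE=MyISAM;"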
I've been using MariaDB with Galera in multiple moderately sized projects and it is the best solution I've found for HA; it also provides performance benefits. Other solutions are generally expensive or not mature. One thing you should consider is setting up a proxy in front of the database servers, such as HAProxy, mysql-proxy, or glbd (which I use), to provide better redundancy and connection balancing for performance.
In response to DroidOS's comment below:
Every write in the cluster needs to be agreed upon by every node so any latency between nodes is added to every write. So, basically, every write will have the greatest round trip time between the writing server and the other nodes added to it.
No. Galera replication is all or nothing across the entire cluster. If any node has a problem writing the data, which can happen if a table doesn't have a primary key, the node will gracefully kill itself since it can't guarantee its data is consistent with the rest of the cluster. If that happens, the rest of the cluster will continue to operate normally. If there is a network issue, if one of the segments has quorum, it will continue to operate normally. Any segments without quorum will wait for more nodes to get quorum but will not accept queries. With this behavior, you can be sure that any node that you are able to query is consistent with the rest of the cluster.
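That behaviour is also what a proxy health check can key off. Here is a rough sketch, similar in spirit to the clustercheck scripts shipped with some Galera distributions (credentials assumed to be in ~/.my.cnf):
#!/bin/bash
# Report this node as healthy only if it is in the primary component and synced.
STATUS=$(mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_status';" | awk '{print $2}')
STATE=$(mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" | awk '{print $2}')

if [ "$STATUS" = "Primary" ] && [ "$STATE" = "Synced" ]; then
    echo "node healthy"
    exit 0
else
    echo "node unavailable: status=$STATUS state=$STATE"
    exit 1
fi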
Given that this has turned out to be such a popular question I thought I should add an extra answer by way of comment for anyone who runs into it.
The big issue with synchronous replication is the latency introduced by the process. There will certainly be times when synchronous replication is required and the latency has to be managed and then lived with. However, you might on reflection - as I did - realize that you can live with lazy replication. There are commercial solutions that deliver this, albeit at a hefty fee. You also have the possibility of rolling your own solution - easier than you might think.
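As a tiny illustration of the roll-your-own idea, and assuming every synced table carries an updated_at column (a big assumption; deletes are not captured at all), a periodic job could ship only the rows changed since the last run:
#!/bin/bash
# Hypothetical "lazy replication" job: push rows changed since the previous run.
# Table names, hosts, and the state file are placeholders.
STATE_FILE=/var/lib/lazy-sync/last_run
LAST_RUN=$(cat "$STATE_FILE" 2>/dev/null || echo '1970-01-01 00:00:00')
NOW=$(date '+%F %T')

# --replace turns the dump into REPLACE statements so re-sent rows just overwrite.
mysqldump --no-create-info --replace --where="updated_at >= '$LAST_RUN'" \
    appdb orders customers \
    | mysql -h replica-host appdb

echo "$NOW" > "$STATE_FILE"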
I'm having a really hard time trying to go down the RIGHT road in a project.
I'm a one man band with a tight budget.
2 dedicated servers
MySQL 5 / php5
I'm using server 1 to consume a lot of data from various feeds. The server/software is running 24/7 generating a huge database.
Server 2 holds a copy of the database with a web frontend.
I don't have any experience of MySQL replication. I've been researching and from what I can tell the slaves are updated right after the master.
I want to have a very speedy website, so that's why the processing is done on server 1, whilst server 2 simply selects data.
If MySQL replication is mimicking server 1 then surely this is going to slow down server 2 and have the opposite of the desired effect.
What I thought might best suit this scenario is to write a script to automate the process (a rough sketch follows the steps below).
Server 2 has 2 databases. One for live one for processing.
The script ascertains which database is live and instead uses the other one.
It drops any tables in it.
The script dumps the database from server 1.
Installs it on server 2's newly emptied database.
The script changes the website's config file to utilise the new database.
The process can be repeated over and over.
Whilst the database install will be large, it can happen in its entirety at night and should mean no downtime.
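Roughly the sort of thing I have in mind - the database names, hosts, and the config-switch mechanism below are just placeholders:
#!/bin/bash
# Nightly dump-and-swap sketch: rebuild the spare database, then flip the site to it.
LIVE=$(cat /var/www/app/live_db.txt)                 # currently "app_a" or "app_b"
if [ "$LIVE" = "app_a" ]; then SPARE="app_b"; else SPARE="app_a"; fi

# Empty the spare database and reload it from server 1.
mysql -e "DROP DATABASE IF EXISTS $SPARE; CREATE DATABASE $SPARE;"
mysqldump -h server1 --single-transaction sourcedb | mysql "$SPARE"

# Point the website at the freshly loaded database.
echo "$SPARE" > /var/www/app/live_db.txt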
Is this better than doing MySQL replication ?
I would welcome advice.
It's hard to believe that a database dump/load cycle would be faster than replication, especially row-based (non-query) replication. Replication can be lagged (by running STOP SLAVE SQL_THREAD on the slave) if you don't want it during peak times (but of course you must have sufficient non-peak time to catch up). Remember that MySQL has three replication modes: statement, row, and mixed. Statement-based replication repeats the exact same update load on the slaves; row-based just sends the rows that changed and should be fairly cheap CPU-wise.
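For reference, pausing only the applier thread on the slave, resuming it later, and checking the lag looks like this:
# Pause the SQL (applier) thread; the I/O thread keeps fetching binlogs meanwhile.
mysql -e "STOP SLAVE SQL_THREAD;"

# During off-peak hours, let the slave apply the backlog again.
mysql -e "START SLAVE SQL_THREAD;"

# Seconds_Behind_Master shows how far the applier has fallen behind.
mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master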
Either all your slaves are fast enough to apply changes and still have plenty of I/O bandwidth and CPU time to handle SELECTs, or no number of slaves will help. It's possible some other method (e.g., direct copying of data files) might be faster, but more fragile, and really you're talking about relatively minor gains. If you can't handle the update load, your choices with MySQL are to shard (split the data so each server is only responsible for part of it) or to buy faster hardware.
But ultimately, this is all taking shots in the dark. You can fairly easily change from replication, to rsync, to some insane scheme involving drbd, to whatever, that really only affects your database layer, maybe only the database itself. You need actual benchmarks—actual data—to make decisions like this. I will tell you that as a general rule, properly-designed large OLTP databases run out of I/O bandwidth first.
I'd suggest starting with what's easy, and that'd be a single database server, or built-in replication. Keep in mind that sharding may be necessary at some point.
Actually, there is probably one question you want to answer fairly early: Do you really want to go with MySQL? Consider PostgreSQL.
A high volume of inserts can most certainly impact front end performance, but the answer for your scenario depends on very specifically how your processing engine impacts resources. There are certain combinations of settings that will allow high performance on selects while inserting data constantly. It depends on your specific duty cycle, storage engine, indexing scheme, etc.
You start by thoroughly understanding table locking: http://dev.mysql.com/doc/refman/5.0/en/table-locking.html. This is a must!
Then you can explore features like INSERT DELAYED: http://dev.mysql.com/doc/refman/5.0/en/insert-delayed.html
And optimize your indices (as few as possible) to reduce the impact of each insert: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
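For what it's worth, here is a tiny example of the delayed form (the table is hypothetical; INSERT DELAYED only works with a few engines such as MyISAM and was deprecated and later removed in newer MySQL versions, so treat this as advice for the MySQL 5.0/5.1 era the links above describe):
# The client returns immediately; the server queues the row and writes it
# when the table is free, which can soften the impact of logging-style inserts.
mysql -e "INSERT DELAYED INTO feed_log (source_id, payload, created_at)
          VALUES (42, 'raw feed data', NOW());"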
Since it sounds like your requirements are driven by lots of data growth (inserts), if you can't get the performance you need from a single instance, replication probably won't help. In which case you should go for the nightly load scenario.
We have a similar use case, and we do nightly batch loads, with replication for backup/failover purposes only.
You say "If MySQL replication is mimicking server 1 then surely this is going slow down server 2 and have the opposite of the desired effect."
I don't think this is going to slow down the server. Have you tried it and measured any performance difference? I really think this is the best way to go for you, unless you clearly measure a performance impact because of the replication.
You really haven't provided enough info on what you're aiming to do, but here's my best understanding: server1 is fetching data (using bandwidth) and processing it in some way (using processing power and I/O); server2 is serving live info to users based on the post-processed data. Availability for server2 is more important than for server1, and a problem on server1 should not affect server2's operations.
If there's a significant difference between the raw data that server1 is fetching and the 'finished' data for use on server2, perhaps with some temporary data being produced along the way, just have server1 do its work and use some kind of script to periodically bring post-processed data from server1 to server2 (a rough sketch follows). Perhaps the post-processed data is smaller than the raw stuff that server1 is working on?
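A minimal sketch of that kind of periodic copy, piping just the finished tables straight from server1 into server2 (table, database, and host names are placeholders):
#!/bin/bash
# Push only the post-processed tables from server1 to server2.
TABLES="summary_daily summary_hourly search_index"
mysqldump -h server1 --single-transaction appdb $TABLES | mysql -h server2 appdb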
If server1 is not really doing much processing, just fetching data and inserting it into the db, then replication might be a reasonable way to move data from #1 to #2.
An in-between approach would be to only replicate certain post-processed tables, so server1 can do its work in other tables in mysql, and when the final product is being inserted into the replicated table, it will automatically appear on server2.
Have fun.
I maintain a PHP driven web application with Oracle backend. The app interacts with a number of third-party apps so information is managed with a combination of XML files, Microsoft Access databases and HTML forms. There are currently 80 tables with many BLOBs and a pretty good bunch of foreign key relationships. All procedures are carefully explained in a document that (of course) nobody ever reads. The customer was feeling uneasy about his data so he was given an estimate with some improvements that could be made (stuff like adding previews and confirmations in some operations).
Sadly, the customer misinterpreted one of the specs (a partial export to be written in 12 man-hours) and he's expecting a full backup and restore feature that would allow him to save and restore the complete database through a web browser without the DBA intervention.
Before having yet another argument with the client, I'd like to know whether I have any option to actually implement this feature in a timely manner, considering that it doesn't need any refinements (e.g., there is no need to select what to restore).
Production server is a Windows Server 2003 box running PHP/5.2.9. The Oracle server is a remote box running "Oracle9i Release 9.2.0.1.0 - 64bit Production".
(Please note I'm not a DBA so there may be well-known solutions I'm not aware of.)
Oracle is a monster. Once you've read this you'll realise that how you back up the system depends entirely on how it has been configured. The short answer is to automate whatever the manual process is - invoke it as a long-running process (since this is MS Windows, prefix the rman command with 'start'), then use polling to detect when it finishes (e.g. wrap rman in a DOS batch file which logs start and end times).
I'd be hard pushed to think of a more difficult problem to provide a generic solution for than Oracle running on top of MS Windows. The latter may be nice for users clicking on buttons, but automating anything on it is a PITA.
Have fun :)
Finally, I had the chance to implement a full Oracle backup from PHP in a later project. I used the Oracle Data Pump command-line utilities, available since 10g. In short:
You define an Oracle directory to map a keyword to a physical directory and grant write permission to the app's Oracle user.
You run expdp with the appropriate arguments and get a complete dump in a single file.
To restore a backup, you run impdp.
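Roughly, the commands involved look like this, shown in Unix shell form for brevity; the directory path, schema name, and credentials are placeholders, and the exact parameters depend on the Oracle version and what needs to be included.
# One-time setup as a privileged user: map a directory object and grant access.
sqlplus -S / as sysdba <<'SQL'
CREATE OR REPLACE DIRECTORY app_dump_dir AS '/u01/app/oracle/dumps';
GRANT READ, WRITE ON DIRECTORY app_dump_dir TO app_user;
SQL

# Full export of the application schema into a single dump file.
expdp app_user/secret schemas=APP_USER directory=app_dump_dir \
    dumpfile=app_full.dmp logfile=app_full_exp.log

# Restore: re-import the dump, replacing tables that already exist.
impdp app_user/secret schemas=APP_USER directory=app_dump_dir \
    dumpfile=app_full.dmp logfile=app_full_imp.log table_exists_action=replace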
It's also advisable to run the commands with proc_open() rather than system(), since on Windows you can use the bypass_shell option and get fine-grained control over the process.
As for this question, the pre-10g alternative is the "exp" / "imp" combo.
Consider a web app in which a call to the app consists of PHP script running several MySQL queries, some of them memcached.
The PHP does not do a very complex job. It mainly serves the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example when there are ten different customers using the service and it is possible to divide the data per customer), and when Nginx + FastCGI is used instead of the heavier Apache, doesn't it make sense to put Nginx, Memcached, and MySQL on the same box? Then, when more customers come, add similar boxes?
Background: We are moving to Amazon EC2. A separate box for MySQL and the app server means double the EBS volumes (needed on app servers to keep the code persistent, as it changes often). Also, if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).
If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.
You need to measure carefully how much memory overhead everything has - I can't see nginx vs Apache making much difference; it's PHP that will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally, I'd stay away from nginx on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache stuff in its innodb_buffer_pool quite efficiently (This saves IO, but may end up using more CPU as you won't cache presentation logic etc, which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.
If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is: will your servers be able to handle the load of Apache/nginx/PHP, MySQL, and memcached combined?
And there is only one way to answer that question: you have to test in a "real" "production" configuration to determine how loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
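For example, a first rough measurement with ab against the page that hits MySQL and memcached hardest (URL and numbers are placeholders):
# 10,000 requests, 50 concurrent, against the heaviest page of the app.
ab -n 10000 -c 50 "http://your-app.example.com/dashboard"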
If there is not too much load with everything on the same server... well, go with it, if it makes the hosting of your application cheaper ;-)