MySQL database replication methods - php

I am building a site to be used by clients; it would store their basic information and the projects or services they are paying my company for. The entire login + panel would run under SSL/HTTPS, but my main concern comes down to database replication, to prevent any event where something is lost.
Because some of the projects are hosted by me for the clients, I need a way to ensure their data is safe and sound. At the moment I am using the Media Temple GS service, but will move to the DV service once more customers start to pick up.
Based on my personal knowledge, I was thinking of doing something like you would do with hard drives, where there is a master and then a slave. In SQL terms there would be a master (index) database and a few slave (cache) databases.
But the question is: what would be the best way to replicate or back up the master onto the slave(s), and should I have additional GS or DV servers, or is using the same server but with a different DB name good enough?
Edit
I did some looking around MT and came across their MySQL GridContainer, which seems to do the same as owning a 2nd server. Would this be a good alternative to an actual 2nd server?

The idea of replication for backup is to replicate the database to another database that you can stop, and then create a full backup of that stopped database while your production database keeps running.
You can use the same server for creating the backup files, but don't forget that a backup can hurt server performance (hard disk load). Additionally, when the database is big and you need historical backup files, you may need to compress them, and the compression operation will hurt your server's performance even more.
You can't avoid a second server, because you have to copy the backup to another machine anyway (a backup kept on the same machine makes no sense).
So in general, it's better to replicate to another machine, which can also be used in crisis situations when the master server is down.
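As a rough sketch of that workflow (Go, standard library plus a MySQL driver; hostnames, credentials and file names are placeholders you would adjust): pause replication on the replica, run mysqldump against it, then resume.

package main

import (
    "database/sql"
    "fmt"
    "log"
    "os"
    "os/exec"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // Placeholder DSN: point this at the replica, not the production master.
    db, err := sql.Open("mysql", "backup:secret@tcp(replica.example.com:3306)/")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := backup(db); err != nil {
        log.Fatal(err)
    }
}

// backup pauses replication, dumps the replica with mysqldump, and resumes
// replication (via defer) even if the dump fails.
func backup(db *sql.DB) error {
    if _, err := db.Exec("STOP SLAVE"); err != nil {
        return err
    }
    defer db.Exec("START SLAVE")

    out, err := os.Create(fmt.Sprintf("backup-%s.sql", time.Now().Format("2006-01-02")))
    if err != nil {
        return err
    }
    defer out.Close()

    // Dump everything from the (now paused) replica into the file.
    dump := exec.Command("mysqldump", "--all-databases",
        "-h", "replica.example.com", "-u", "backup", "-psecret")
    dump.Stdout = out
    return dump.Run()
}

The dump file then still has to be copied (and, if needed, compressed) on another machine, per the point above.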
I found a nice article about the many solutions for high-availability MySQL: link to mysql.com.

Related

How to handle video and image upload to storage servers?

I am in the process of developing an application (in Go or possibly in PHP) where users need to upload photos and images.
I have set up a couple of ZFS (mirror) storage servers in different locations, but I am in doubt about how to best let users upload files. ZFS handles quotas and reservations.
I am running with a replicated Galera database on all servers, both for safety, but also for easy access to user accounts from each server. In other words, each server has a local copy of the database all the time. All users are virtual users only.
So far I have tested the following setup options:
Solution 1
Running SFTP (ProFTPD with a module) or FTPS (PureFTPd with TLS) on the storage servers with virtual users.
This gives people direct access to the storage servers using a client like Filezilla. At the same time users can also upload using our web GUI from our main web server.
One advantage with this setup is that the FTP server can handle virtual users. Our web application will also send files via SFTP or FTPS.
One disadvantage is that FTP is meh, annoying to firewall. Also, I much prefer FTP over SSH (SFTP) rather than FTP over TLS (FTPS). However, only ProFTPD has a module for SSH, and it has been a real pain to work with compared to PureFTPd (many problems with non-working configuration options and file permission errors), but PureFTPd only supports TLS.
Running with real SSH/SCP accounts and using PAM is not an option.
Solution 2
Mount the storage servers locally on the web server using NFS or CIFS (Samba is great at automatic resume in case a box goes down).
In this setup users can only upload via our main web server. The web server application, and the application running on the storage servers, then need to support resumable uploads. I have been looking into using the tus protocol.
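For reference, a minimal sketch of what a tus upload endpoint could look like with the tusd v1 Go packages (github.com/tus/tusd/pkg/...); the upload directory, base path and port are placeholders, and the import paths should be checked against the tusd version actually used.

package main

import (
    "log"
    "net/http"

    "github.com/tus/tusd/pkg/filestore"
    "github.com/tus/tusd/pkg/handler"
)

func main() {
    // Store upload chunks on the local filesystem of this storage server.
    store := filestore.FileStore{Path: "./uploads"}
    composer := handler.NewStoreComposer()
    store.UseIn(composer)

    h, err := handler.NewHandler(handler.Config{
        BasePath:      "/files/",
        StoreComposer: composer,
    })
    if err != nil {
        log.Fatal(err)
    }

    http.Handle("/files/", http.StripPrefix("/files/", h))
    log.Fatal(http.ListenAndServe(":8080", nil))
}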
A disadvantage with both the above setups is that storage capacity needs to be managed somehow. When storage server 1 reaches its maximum number of users, the application needs to know this and then only create virtual users for storage server 2, 3, etc.
I have calculated how many users each storage server can hold and then have the web application check the database with virtual users to see when it needs to move newly created users to the next storage server.
This is rather old school, but it works.
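Roughly, the assignment check boils down to a query like this sketch (Go with database/sql; the storage_servers / virtual_users tables and the capacity column are simplified placeholders, not my real schema):

// pickStorageServer returns the ID of the first storage server that still has
// free capacity, based on a count of virtual users per server.
func pickStorageServer(db *sql.DB) (int, error) {
    row := db.QueryRow(`
        SELECT s.id
        FROM storage_servers s
        LEFT JOIN virtual_users u ON u.server_id = s.id
        GROUP BY s.id, s.capacity
        HAVING COUNT(u.id) < s.capacity
        ORDER BY s.id
        LIMIT 1`)

    var id int
    if err := row.Scan(&id); err != nil {
        return 0, err // sql.ErrNoRows means every server is full
    }
    return id, nil
}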
Solution 3
Same as solution 2 (no FTP), but clone our web application upload thingy to each storage server and then redirect users (or provide them with a physical link to the storage server, s1.example.com, s2.example.com, etc.)
The possible advantage with this setup is that users upload directly to the storage server they have been assigned to, rather than going through our main web server (preventing it from becoming a possible bottleneck).
Solution 4
Use GlusterFS on the storage servers and build a cluster that can easily be expanded. I have tested out GlusterFS and it works very well for this purpose.
The advantage with this setup is that I don't really need to care about where files physically go on which storage servers, and I can easily expand storage by adding more servers to the cluster.
However, the disadvantage here is again that our main web server might become a bottleneck.
I have also considered adding a load balancer and then using multiple web servers in case our main web server becomes a bottleneck for uploading files.
In any case, I much prefer to keep it simple! I don't like adding stuff. I want it to be easy to maintain in the long run.
Any ideas, suggestions, and advice will be greatly appreciated.
How do you do it?
A web application should be agnostic of the underlying storage when we are talking about file storage; separation of concerns.
(S)FTP(S), on the other hand, is not a storage method. It is a communication protocol. It does not preclude you from having shared storage. See above.
ZFS does not come with shared storage built in, so you are basically down to the following choices:
Which underlying filesystem?
Do I want to offer an additional access mode via (S)FTP(S)?
How do I make my filesystem available across multiple servers? GlusterFS, CIFS or NFS?
So, let us walk this through.
Filesystem
I know ZFS is intriguing, but here is the thing: xfs, for example, already has a maximum filesystem size of 8 exbibytes minus one byte. The specialist term for this is "a s...load". To put that into relation: the Library of Congress holds about 20TB of digital media - and would fit into that roughly 400k times. Even good ol' ext4 can hold 50k LoCs. And if you hold that much data, your FS is your smallest concern. Building the next couple of power plants to keep your stuff going presumably is.
Gist: Nice to think about, but use whatever you feel comfortable with. I personally use xfs (on LVM) for pretty much everything.
Additional access methods
Sure, why not? Aside from the security nightmare (privilege escalation, anyone?). And ProFTPd, with its built-in coffee machine and kitchen sink, is the last FTP server I would use for anything. It has a ginormous code base, which lends itself to accidentally introducing vulnerabilities.
Basically it boils down to the skills present in the project. Can you guys properly harden a system and an FTP server and monitor it for security incidents? Unless your answer is a confident "Yes, of course, plenty of experience with it!", you should minimize the attack surface you present.
Gist: Don't, unless you really know what you are doing. And if you have to ask, you probably do not. No offense intended, just stating facts.
Shared filesystem
Personally, I have had... less than perfect experiences with GlusterFS. The replication has quite some requirements when it comes to network latency and the like. In a nutshell: if we are talking of multiple availability zones, say EMEA, APAC and NCSA, it is close to impossible. You'd be stuck with georeplication, which is less than ideal for the use case you describe.
NFS and CIFS on the other hand have the problem that there is no replication at all, and all clients need to access the same server instance in order to access the data - hardly a good idea if you think you need an underlying ZFS to get along.
Gist: Shared filesystems at a global scale with halfway decent replication lag and access times are very hard to do and can get very expensive.
Haha, Smartypants, so what would you suggest?
Scale. Slowly. In the beginning, you should be able to get along with a simple FS based repository for your files. And then check various other means for large scale shared storage and migrate to it.
Taking the turn towards implementation, I would even go a step further: you should make your storage an interface:
// Storer takes the source and stores its contents under path for further
// reading via Retriever.
type Storer interface {
    StreamTo(path string, source io.Reader) (err error)
}

// Retriever takes a path and streams the file it has stored under path to w.
type Retriever interface {
    StreamFrom(path string, w io.Writer) (err error)
}

// Repository is a composite interface. It requires a
// repository to accept and provide streams of files.
type Repository interface {
    Storer
    Retriever
    Close() error
}
Now, you can implement various storage methods quite easily:
// FileStore represents a filesystem-based file Repository.
type FileStore struct {
    basepath string
}

// StreamFrom satisfies the Retriever interface.
func (s *FileStore) StreamFrom(path string, w io.Writer) (err error) {
    f, err := os.OpenFile(filepath.Join(s.basepath, path), os.O_RDONLY|os.O_EXCL, 0640)
    if err != nil {
        // handleErr (not shown here) wraps path-specific error handling.
        return handleErr(path, err)
    }
    defer f.Close()
    _, err = io.Copy(w, f)
    return err
}
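To give an idea of how the calling code stays backend-agnostic, an upload handler might look roughly like this (illustrative only; the "X-User" header and path layout are made up, and it additionally needs "net/http" and "path" imported):

// uploadHandler programs against the Repository interface, so the concrete
// backend (filesystem, GridFS, ...) can be swapped without touching handler code.
func uploadHandler(repo Repository) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Stream the request body straight into the backend without
        // buffering the whole file in memory.
        dst := path.Join(r.Header.Get("X-User"), path.Base(r.URL.Path))
        if err := repo.StreamTo(dst, r.Body); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusCreated)
    }
}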
Personally, I think this would be a great use case for GridFS, which, despite its name, is not a filesystem but a feature of MongoDB. As for the reasons:
MongoDB comes with a concept called replica sets to ensure availability with transparent automatic failover between servers
It comes with a rather simple mechanism of automatic data partitioning, called a sharded cluster
It comes with an indefinite number of access gateways called mongos query routers to access your sharded data.
For the client, aside from the connection URL, all this is transparent. So it does not make a difference (almost - aside from read preference and write concern) whether its storage backend consists of a single server or a globally replicated sharded cluster with 600 nodes.
If done properly, there is not a single point of failure, you can replicate across availability zones while keeping the "hot" data close to the respective users.
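As a rough illustration (not the code from the repository linked below), the same Storer/Retriever interfaces backed by GridFS could look like this, using the official Go driver's go.mongodb.org/mongo-driver/mongo/gridfs package:

// GridFSStore is a GridFS-backed Repository; the bucket would be created once
// via gridfs.NewBucket(client.Database("files")).
type GridFSStore struct {
    bucket *gridfs.Bucket
}

// StreamTo satisfies the Storer interface by uploading source into GridFS under path.
func (s *GridFSStore) StreamTo(path string, source io.Reader) (err error) {
    _, err = s.bucket.UploadFromStream(path, source)
    return err
}

// StreamFrom satisfies the Retriever interface by streaming the file stored under path to w.
func (s *GridFSStore) StreamFrom(path string, w io.Writer) (err error) {
    _, err = s.bucket.DownloadToStreamByName(path, w)
    return err
}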
I have created a repository on GitHub which contains an example of the interface suggestion and implements a filesystem based repository as well as a MongoDB repository. You might want to have a look at it. It lacks caching at the moment. In case you would like to see that implemented, please open an issue there.

MySQL failover & PHP

We run a fairly busy website, and currently it runs on a traditional one server LAMP stack.
The site has lots of legacy code, and the database is very big (approx. 50GB when gzipped, so probably 4 or 5 times that uncompressed).
Unfortunately, the site is very fragile, and although I'm quite confident load balancing servers with just one database backend, I'm at a bit of a loss with replication and that sort of thing.
There's a lot of data written to and read from the database at all times. I think we can probably fail over to a slave MySQL database fairly easily, but I'm confused about what needs to happen when the master comes back online (if a master/slave setup is suitable at all): does the master pick up from the slave any rows written while it was down, or does something else have to happen?
Is there a standard way of making PHP decide whether to use a master or slave database?
Perhaps someone can point me in the way of a good blog post that can guide me?
Thanks,
John
If you are trying to create a failover solution for your entire website, I found this article interesting. It talks about creating a clone of the MySQL database and keeping the two in sync with rsync.
A simpler solution would be to just back up your database periodically with a script that runs from a cron job. Also set up a static web page as a failover solution; this website has an article on setting that up. That's the way we do it. This way, if your database has issues, you can restore it using one of your backups while you fail over to your temporary static page.
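The static-page idea is language-agnostic; here is a rough sketch (in Go, just to keep it concrete) of a health-check front that serves a static page whenever MySQL stops answering pings. Hostnames, credentials and file names are placeholders.

package main

import (
    "database/sql"
    "log"
    "net/http"

    _ "github.com/go-sql-driver/mysql"
)

// withFailover serves maintenance.html whenever the database stops answering
// pings, and passes the request on to the real application otherwise.
func withFailover(db *sql.DB, app http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if err := db.Ping(); err != nil {
            http.ServeFile(w, r, "maintenance.html")
            return
        }
        app.ServeHTTP(w, r)
    })
}

func main() {
    db, err := sql.Open("mysql", "app:secret@tcp(db.example.com:3306)/site")
    if err != nil {
        log.Fatal(err)
    }
    app := http.FileServer(http.Dir("./public")) // stand-in for the real application
    log.Fatal(http.ListenAndServe(":8080", withFailover(db, app)))
}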

Scaling Image Hosting With Multiple Servers

I have an image host which is gaining in popularity, and I need to start thinking about scaling (it's all currently hosted on a single machine).
I want to host the content on multiple Amazon machines in order to be able to scale horizontally.
Can someone give me a basic rundown of the architecture (DB, image files, etc.) and or point me to some resources?
As far as the database is concerned, you probably want to use replication - whereby your 'master' database is replicated (in real time) to several 'slave' databases. All transactional statements (inserts, updates, deletes, etc.) are executed on the master database, then these are replicated to all of the slave databases in real time. Then, you can distribute the applications queries (select statements) across all of the slave databases for load balancing.
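In application code, that split boils down to something like this sketch (Go, using database/sql and sync/atomic; a real setup also needs health checks and awareness of replication lag):

// Cluster sends writes to the master and spreads reads across the slaves.
type Cluster struct {
    master  *sql.DB
    slaves  []*sql.DB
    counter uint64
}

// Exec routes all writes (INSERT/UPDATE/DELETE) to the master.
func (c *Cluster) Exec(query string, args ...interface{}) (sql.Result, error) {
    return c.master.Exec(query, args...)
}

// Query distributes reads across the slaves in round-robin fashion.
func (c *Cluster) Query(query string, args ...interface{}) (*sql.Rows, error) {
    i := atomic.AddUint64(&c.counter, 1) % uint64(len(c.slaves))
    return c.slaves[i].Query(query, args...)
}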
You may also want to keep copies of files on multiple servers for redundancy. Tools like rsync are good for this.
Finally, Amazon has cloud load balancers, so that incoming connections can be distributed to multiple servers.

Shared database desktop/web app

We have 2 applications: a web-based one (PHP) and a desktop one (VB) sharing the same database (Hostgator). Our web app has fast access to the database (it's localhost). Our desktop application suffers from slow access and frequent timeouts.
What's the best approach to this kind of issue? Should we share a database? Are there any other solutions?
Thanks
Some possible solutions:
Get a faster DB server
Move your database to a server that is closer to the desktop(s)
Host your webserver/DB at the location of the desktop(s)
Have two DBs: the current one that is local to the webserver and a second one that is local to the desktop(s), and set the second up as a slave to the first. You would have to consider whether the desktop(s) write to the DB in this scenario. This option is probably not a good one unless the desktop(s) are read-only and aren't worried about possibly out-of-date data (checking replication lag, as sketched below, helps decide whether the replica's data is fresh enough). This could potentially work if the desktop(s) read a lot but write less frequently.
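The lag check mentioned above could look roughly like this (Go, standard database/sql plus "errors" and "strconv"; the column is located by name because the SHOW SLAVE STATUS layout differs between MySQL versions):

// replicationLag returns how many seconds the slave is behind its master,
// by reading Seconds_Behind_Master from SHOW SLAVE STATUS.
func replicationLag(db *sql.DB) (int, error) {
    rows, err := db.Query("SHOW SLAVE STATUS")
    if err != nil {
        return 0, err
    }
    defer rows.Close()

    cols, err := rows.Columns()
    if err != nil {
        return 0, err
    }
    if !rows.Next() {
        return 0, errors.New("not a slave: SHOW SLAVE STATUS returned no rows")
    }

    // Scan every column as raw bytes, then pick out the one we care about.
    values := make([]sql.RawBytes, len(cols))
    dest := make([]interface{}, len(cols))
    for i := range values {
        dest[i] = &values[i]
    }
    if err := rows.Scan(dest...); err != nil {
        return 0, err
    }
    for i, name := range cols {
        if name == "Seconds_Behind_Master" {
            return strconv.Atoi(string(values[i]))
        }
    }
    return 0, errors.New("Seconds_Behind_Master not found")
}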
There is no problem with "sharing" a DB. Have you checked the server load and the connection stability?
AFAIK, I don't suppose this could be a problem: web or desktop, both access the database through the MySQL server, so it shouldn't be giving mixed performance results.
The problem is probably not that it's shared; rather, it's probably the network that the data is going over. There are very few circumstances in which it's faster to use a network connection than localhost for accessing MySQL data, so you can't expect the same performance from both.
However, you should be able to get a fairly fast and reliable DB connection over a good network. If you're moving huge amounts of data, you may have to employ some sort of caching. But if the issues are happening even on moderately-sized queries, you may have to bring that issue to your hosting company for troubleshooting. Many shared hosts are not optimized for remote DB hosting (most sites don't need/use/want it), so if they can't accommodate it, you may have to move to a host that will meet your needs.

Does a separate MySQL server make sense when using Nginx instead of Apache?

Consider a web app in which a call to the app consists of PHP script running several MySQL queries, some of them memcached.
The PHP does not do very complex job. It is mainly serving the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example, when there are ten different customers using the service and it is possible to divide the data per customer) and when Nginx + FastCGI is used instead of the heavier Apache, doesn't it make sense to put Nginx, memcached and MySQL on the same box? Then, when more customers come, add similar boxes?
Background: We are moving to Amazon Ec2. And a separate box for MySQL and app server means double EBS volumes (needed on app servers to keep the code persistent as it changes often). Also if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).
If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.
You need to measure carefully how much memory overhead everything has - I can't see nginx vs Apache making much difference; it's PHP which will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally I'd stay away from nginx on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache stuff in its innodb_buffer_pool quite efficiently (This saves IO, but may end up using more CPU as you won't cache presentation logic etc, which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.
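To make the trade-off concrete: what memcached buys you is a cache-aside read path like this sketch, which only pays off when those Get calls hit a box with RAM to spare (Go, using the third-party github.com/bradfitz/gomemcache client plus database/sql and fmt; the key naming and query are placeholders):

// customerName shows the cache-aside pattern: try memcached, fall back to
// MySQL on a miss, then populate the cache for the next reader.
func customerName(mc *memcache.Client, db *sql.DB, id int) (string, error) {
    key := fmt.Sprintf("customer:%d", id)

    if item, err := mc.Get(key); err == nil {
        return string(item.Value), nil // cache hit: no database round trip
    }

    // Cache miss (or memcached unreachable): read from MySQL.
    var name string
    if err := db.QueryRow("SELECT name FROM customers WHERE id = ?", id).Scan(&name); err != nil {
        return "", err
    }

    // Best effort; ignore the error if the cache is down.
    mc.Set(&memcache.Item{Key: key, Value: []byte(name), Expiration: 300})
    return name, nil
}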
If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is: will your servers be able to handle the load of Apache/nginx/PHP, MySQL, and memcached combined?
And there is only one way to answer that question: you have to test in a "real" "production" configuration to determine how loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
If there is not too much load with everything on the same server... Well, go with it, if it makes the hosting of your application cheaper ;-)
