I am developing a web application using PHP and MySQL. This application runs in three different locations:
On the internet
Head office
Branch office
The application runs on a local server at the head office and at the branch office. An internet connection is not available at all times. Customers place orders through all three locations. My problem is that I want to synchronize the data among these three databases and keep all three up to date. Is there any way to do this?
I'm using SymmetricDS to synchronize databases. It is capable of synchronizing or replicating data between nodes (servers/databases), pushing or pulling only the data you define. It is Java-based software with a steep learning curve, but it really does the job.
SymmetricDS can be set up to push changes from one node to the two other nodes, thus making sure that all three nodes contain the same data. You need to make sure that primary keys are globally unique values, and not auto-incremented values assigned by the database, as those will most likely collide across the three databases you'd like to synchronize.
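One common way to get globally unique keys is to generate them in the application instead of relying on AUTO_INCREMENT. Below is a minimal sketch of that idea in PHP; the table and column names ("orders", "order_id") and the connection details are placeholders, not anything SymmetricDS requires. Another option, not shown here, is to give each MySQL server a different auto_increment_offset.

```php
<?php
// Minimal sketch: avoid auto-increment collisions across nodes by generating
// a UUID primary key in the application before inserting.
// Table/column names and credentials are placeholders for illustration.

function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8mb4', 'app_user', 'secret');

// The same row keeps the same key no matter which location created it.
$stmt = $pdo->prepare('INSERT INTO orders (order_id, customer_id, total) VALUES (?, ?, ?)');
$stmt->execute([uuidV4(), 42, 199.90]);
```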
The software installs triggers on the database and captures changes when INSERT, UPDATE or DELETE (and other) operations are carried out. These data changes are then applied on the other nodes. The software needs to run at each location, but does not need an internet connection that is available at all times.
I did worry in the beginning that triggers on all my tables would slow down performance, but this has not been a problem at all; we haven't discovered any performance issues since the triggers were installed.
Have a look at http://symmetricds.org/ for more details.
Try the Schema and Data Comparison tools in dbForge Studio for MySQL.
We also have stand-alone tools:
dbForge Data Compare
dbForge Schema Compare
Team,
We are building a web project (like an IT ticketing system), and we expect to have some big clients as soon as we release the product. There should be three ways to raise a ticket: 1) via the web application (forms), 2) via email or 3) via a phone call to an agent. According to our research, 99% of tickets come via email, which means we will be storing a lot of long messages.
The project is scoped so that we have two interfaces: agents (IT folks handling queries) and clients (people who ask for help).
The question here is: what would you suggest we do, considering the expected data and storage growth?
1. Centralize everything so that we have one app with a single huge database (easy to back up etc., unless we run into, e.g., data corruption or similar)...
2. Split the app into two parts: one for IT agents and another for clients. The idea is to have one centralized interface and back-end for IT agents and a separate one for clients. For each client we would create a separate database along with a copy of the PHP project (code syncing is easy to automate). Multiple client instances could be hosted on one or many servers. They would communicate via APIs. For example: an IT agent opens a dashboard and the list of outstanding tickets is displayed. If that agent is working with 10 big clients, the back-end would need to contact 10 instances via API and request the outstanding tickets. We could ensure that only a certain number of tickets would be displayed...
Please feel free to add a third option as well.
I am not quite sure that I understood everything correctly, but from what I did understand I can point out the following key points about your system requirements:
You are dealing with a lot of data, and the data will grow fast
Most of the traffic comes from the email ticketing system
You have a multi-client system
You have agents who can view data from multiple clients
The question is: can an agent also manipulate (create, update, delete) data from multiple clients?
This is quite an important point for the future limitations of the architecture. I will assume that agents can only read data across multiple clients.
Your 2 suggestions:
Regarding the first (one app with a single huge database): I would not recommend that approach, as many other problems will arise as the database grows. For example, you will be forced to add indexes to speed up queries, which will help in the beginning but will come back to haunt you later, especially if you have to add a lot of non-clustered indexes. You could make it a little better by using read-only replicas, but even with those you will hit issues at some point; the problem still remains in your one main database, which keeps growing.
Regarding the second (splitting the application in two, with a centralized interface and back-end for IT agents, a separate database and PHP instance per client, and communication via APIs):
You can split it into 2 separate apps as you said:
A centralized interface + back-end, which would call one or multiple databases
A client application + back-end (monolith or multiple services), which would call the same database as the centralized interface, but only for the current client
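For example, the dashboard fan-out you described could look roughly like the sketch below: the centralized back-end asks each client instance for its outstanding tickets over HTTP and merges the results. The base URLs and the /api/tickets/outstanding endpoint are invented for illustration; a real setup would also need authentication and proper error handling.

```php
<?php
// Rough sketch: the agent back-end queries each per-client instance in
// parallel and merges the outstanding tickets for the dashboard.

$clientInstances = [
    'clientA' => 'https://clienta.example.com',
    'clientB' => 'https://clientb.example.com',
];

$multi = curl_multi_init();
$handles = [];

foreach ($clientInstances as $client => $baseUrl) {
    $ch = curl_init($baseUrl . '/api/tickets/outstanding?limit=20');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5); // don't let one slow instance block the dashboard
    curl_multi_add_handle($multi, $ch);
    $handles[$client] = $ch;
}

// Run all requests in parallel.
do {
    $status = curl_multi_exec($multi, $running);
    if ($running) {
        curl_multi_select($multi);
    }
} while ($running && $status === CURLM_OK);

$outstanding = [];
foreach ($handles as $client => $ch) {
    $body = curl_multi_getcontent($ch);
    $tickets = ($body !== false && $body !== '') ? json_decode($body, true) : [];
    foreach ((array) $tickets as $ticket) {
        $ticket['client'] = $client;
        $outstanding[] = $ticket;
    }
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);

// $outstanding now holds the merged list shown on the agent dashboard.
```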
As far as I understood, your problem is not scaling the web servers (your back-end) but the database? If your problem is scaling the back-end as well, then you can consider either scaling to multiple instances or splitting your domain into micro-services and scaling that architecture at the micro-service level, independently for each service.
My Suggestion:
1. Scaling your back-end:
You can keep everything in one service (monolithic approach), deploy it on multiple servers and scale the whole service together. There is nothing wrong with this. Like everything, it depends on your business/domain requirements and what works best for you. Although it is very popular these days to use micro-services, they are not the best solution for every problem. I have worked with both types of architecture and they have worked fine for different scenarios.
You can even take a middle-ground solution between them: extract the specific part that has a high scaling demand into a separate service (like a ticket-creation sub-system service), while the rest of the application, which has lower demand, stays one big service.
2. Scaling your database:
Considering the above points, I would suggest you use data sharding or data partitioning.
You can read about data sharding here.
In general, it is a way to logically and physically split your data from one database into multiple databases based on some partitioning or shard key.
This means that you can take one specific concept in your domain as the shard key and split the data based on it.
In your case this could be CustomerId. This only works if business operations spanning more than one customer are not required, i.e. if all your operations are done within a single customer.
The only exception here would be reading/viewing several customers together, which is fine because it does not need any transactional behavior.
This really depends on your business scenarios and logic.
If splitting your database into multiple databases based on the shard key CustomerId is not enough, you can take a shard key that is even more specific inside the customer scope; again, it depends on whether your domain allows this.
In this case it could be, for example, that CustomerA would have a CustomerA-Europe shard, a CustomerA-USA shard, a CustomerA-Africa shard and so on.
This would represent the logical shard; the physical shard would be the physical database.
The important point is that you pick your logical shard key at the beginning, so that you can easily migrate your data to different physical databases later, when you need to, based on that shard key.
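To make this more concrete, here is a minimal sketch of what shard routing by CustomerId could look like in PHP. The shard map, DSNs and table names are placeholders; in a real system the map would live in configuration or a lookup service so shards can be moved without code changes.

```php
<?php
// Sketch of shard routing by CustomerId: the application looks up which
// physical database holds a given customer's data and connects there.
// The shard map and credentials below are purely illustrative.

$shardMap = [
    ['min' => 1,    'max' => 4999, 'dsn' => 'mysql:host=db-shard-1;dbname=tickets'],
    ['min' => 5000, 'max' => 9999, 'dsn' => 'mysql:host=db-shard-2;dbname=tickets'],
];

function connectionForCustomer(int $customerId, array $shardMap): PDO
{
    foreach ($shardMap as $shard) {
        if ($customerId >= $shard['min'] && $customerId <= $shard['max']) {
            return new PDO($shard['dsn'], 'app_user', 'secret');
        }
    }
    throw new RuntimeException("No shard configured for customer $customerId");
}

// All queries for one customer stay inside that customer's shard.
$db = connectionForCustomer(7312, $shardMap);
$stmt = $db->prepare('SELECT id, subject, status FROM tickets WHERE customer_id = ? AND status = ?');
$stmt->execute([7312, 'open']);
$openTickets = $stmt->fetchAll(PDO::FETCH_ASSOC);
```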
In addition to this, you could introduce historization for some heavy tables to separate the up-to-date data from your historical data. You can read more about this here.
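A simple form of that historization could be a scheduled job like the sketch below, which moves old closed tickets out of the hot table. The table names, the 90-day cutoff and the status value are assumptions for illustration; the transaction keeps the copy and the delete consistent.

```php
<?php
// Illustrative historization job: copy closed tickets older than 90 days
// into tickets_history and remove them from the hot table.
// Table names, cutoff and status value are placeholders.

$pdo = new PDO('mysql:host=localhost;dbname=tickets;charset=utf8mb4', 'app_user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$cutoff = (new DateTimeImmutable('-90 days'))->format('Y-m-d H:i:s');

$pdo->beginTransaction();
try {
    // Assumes tickets_history has the same columns as tickets.
    $copy = $pdo->prepare(
        'INSERT INTO tickets_history SELECT * FROM tickets WHERE status = ? AND closed_at < ?'
    );
    $copy->execute(['closed', $cutoff]);

    $purge = $pdo->prepare('DELETE FROM tickets WHERE status = ? AND closed_at < ?');
    $purge->execute(['closed', $cutoff]);

    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack();
    throw $e;
}
```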
This is somewhat of an abstract question but hopefully pretty simple at the same time. I just have no idea of the best way to go about this other than an export/import, and I can't do that due to permission issues, so I need some alternatives.
On one server, we'll call it 1.2.3, I have a database with 2 schemas, Rdb and test. These schemas have 27 and 3 tables respectively. This database stores call info from our phone system, but we have reader access only, so we're very limited in what we can do beyond selecting and joining for data records and info.
I then have a production database server, call it 3.2.1, with my main schemas, and I'd like to place the previous 30 tables into one of those production schemas. After the migration is done, I'll need to create a script that will check the data on the first connection and then update the new schema on the production connection, but that's after the bulk migration is done.
I'm wondering if a PHP script would be the way to go about this initial migration, though. I'm using MySQL Workbench, and the export wizard fails for the read-only database; if there's another way in the interface, I don't know about it.
It's quite a bit of data, and I'm not necessarily looking for the fastest way but the easiest and most fail-safe way.
For a one time data move, the easiest way is to use the command line tool mysqldump to dump your tables to file, then load the resulting file with mysql. This assumes that you are either shutting down 1.2.3, or will reconfigure your phone system to point to 3.2.1 (or update DNS appropriately). Also, this is much easier if you can get downtime on the phone system to move the data.
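Since you mentioned a PHP script: for a one-time move, the script can simply drive mysqldump and mysql rather than copying rows itself. A rough sketch is below; the users, passwords and target schema name are placeholders, the hosts come from your question, and --single-transaction keeps the dump to plain SELECTs, which matters with read-only access (you may still hit other privilege limits on the source, e.g. for triggers).

```php
<?php
// Sketch of the one-time move as a PHP wrapper around mysqldump/mysql.
// 1.2.3 = read-only source, 3.2.1 = production target (from the question).
// Credentials and the target schema name are placeholders.

$schemas = ['Rdb', 'test'];          // the two source schemas (27 + 3 tables)
$targetSchema = 'production_schema'; // placeholder name on 3.2.1

foreach ($schemas as $schema) {
    $dumpFile = sprintf('/tmp/%s.sql', $schema);

    // Dump one schema from the read-only server to a file.
    $dump = sprintf(
        'mysqldump --host=1.2.3 --user=%s --password=%s --single-transaction %s > %s',
        escapeshellarg('reader'),
        escapeshellarg('reader_pw'),
        escapeshellarg($schema),
        escapeshellarg($dumpFile)
    );
    passthru($dump, $exitCode);
    if ($exitCode !== 0) {
        exit("mysqldump failed for $schema\n");
    }

    // Load the file into the production schema.
    $load = sprintf(
        'mysql --host=3.2.1 --user=%s --password=%s %s < %s',
        escapeshellarg('writer'),
        escapeshellarg('writer_pw'),
        escapeshellarg($targetSchema),
        escapeshellarg($dumpFile)
    );
    passthru($load, $exitCode);
    if ($exitCode !== 0) {
        exit("mysql load failed for $schema\n");
    }
}
```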
we have reader access only so we're very limited in what we can do beyond selecting and joining for data records
This really limits your options.
Master/Slave replication requires the REPLICATION SLAVE privilege, and you probably need a user with the SUPER privilege to create the replication user.
Trigger-based replication solutions like SymmetricDS will require a user with CREATE ROUTINE in order to create the triggers.
An "Extract, Transform, Load" solution like Clover ETL will work best if tables have LAST_CHANGED timestamps. If they don't, then you would need the ALTER TABLE privilege to add them.
Different tools for different goals.
Master/Slave replication is generally used for Disaster Recovery, Availability or Read Scaling
Heterogeneous replication to replicate some (or all) tables between different environments (could be different RDBMSs, or different replica sets) in a continuous but asynchronous fashion.
ETL for bulk, hourly/daily/periodic data movements, with the ability to pick a subset of columns, aggregate, convert timestamp formats, merge with multiple sources, and generally fix whatever you need to with the data.
That should help you determine what your situation really is - whether it's a one-time load with a temporary data sync, or an ongoing replication (real-time or delayed).
Edit:
https://www.percona.com/doc/percona-toolkit/LATEST/index.html
Check out the Percona Toolkit, specifically pt-table-sync and pt-table-checksum. They will help with this.
I am working on a project that has to synchronize online and offline features because of an unstable internet connection. I have come up with a possible solution: create two similar databases, one online and one offline, and sync the two. My question is: is this a good method, or are there better options?
I have researched the subject online but haven't come across anything substantive. One useful link I found was on database replication, but I want the offline version to detect internet presence and sync accordingly.
Please can you help me find solutions or clues to solve my problem?
I'd suggest you have online storage for syncing and a local database (browser IndexedDB, SQLite in a program, or something similar). Log all your changes in your local database, but keep a record of which data was entered after the last sync.
When you have a connection, you sync all new data with the online storage at set intervals (like once every 5 minutes, or a constant stream if you have the bandwidth/CPU capacity).
When the user logs in from a "fresh" location, the online database pushes all data to the client, which fills the local database with the data and then resumes the normal syncing function.
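The server side of that scheme could be as small as an endpoint that accepts the batch of locally logged changes and applies them. A very rough sketch is below; the table, the field names and the payload shape are invented, and real code would need authentication and conflict handling.

```php
<?php
// Rough sketch of the sync endpoint: the client posts the changes it logged
// locally since its last sync, the server applies them and returns a marker.
// Table/field names and the payload format are placeholders.

header('Content-Type: application/json');

$payload = json_decode(file_get_contents('php://input'), true);
$changes = $payload['changes'] ?? [];   // e.g. [{uuid, data, changed_at}, ...]

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'app_user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
$stmt = $pdo->prepare(
    'INSERT INTO orders (uuid, payload, changed_at) VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE payload = VALUES(payload), changed_at = VALUES(changed_at)'
);
foreach ($changes as $change) {
    // Client-generated UUIDs keep re-sent batches idempotent: a replayed
    // change updates the same row instead of duplicating it.
    $stmt->execute([$change['uuid'], json_encode($change['data']), $change['changed_at']]);
}
$pdo->commit();

echo json_encode(['synced_until' => date('Y-m-d H:i:s')]);
```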
Plan A: Primary-Primary replication (formerly called Master-Master). You do need to be careful with PRIMARY KEYs and UNIQUE keys. While the "other" machine is offline, you could write conflicting values to a table. Later, when they try to sync up, replication will freeze, requiring manual intervention. (Not a pretty sight.)
Plan B: Write changes to some storage other than the db. This suffers the same drawbacks as Plan A, plus there is a bunch of coding on your part to implement it.
Plan C: Galera cluster with 3 nodes. When all 3 nodes are up, all can take writes. If one node goes down, or network problems make it seem offline to the other two, it will automatically become read-only. After things get fixed, the sync is done automatically.
Plan D: Only write to a reliable Primary; let the other be a readonly Replica. (But this violates your requirement about an "unstable Internet".)
None of these perfectly fits the requirements. Plan A seems to be the only one that has a chance. Let's look at that.
If you have any UNIQUE key in any table and you might insert new rows into it, the problem exists. Even something as innocuous as a 'normalization table' wherein you insert a name and get back an id for use in other tables has the problem. You might do that on both servers with the same name and get different ids. Now you have a mess that is virtually impossible to fix.
Not sure if it's outside the scope of the project, but you can try these:
https://pouchdb.com/
https://couchdb.apache.org/
" PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser.
PouchDB was created to help web developers build applications that work as well offline as they do online.
It enables applications to store data locally while offline, then synchronize it with CouchDB and compatible servers when the application is back online, keeping the user's data in sync no matter where they next login. "
Many database libraries come set up for multiple database connections, but I've never actually known of a scripting application that needed to connect to two databases during its run (compiled, daemon-running languages are a different matter).
I understand having database slaves so that you can spread the load out, but usually on startup only one of them is chosen to handle that script's needs.
So why would a PHP or Ruby application need to connect to more than one database? Or rather, why would you split your data up among several databases?
The only thing I can think of is bad design from a slowly evolving system that started off in multiple separate parts.
Are you talking about different physical database servers or different databases in the "schema" sense?
Regarding physical servers, if you're using MySQL replication you might write to a master and always read from a slave. This helps split the load between the databases.
The simple answer is "scalability".
The ready availability of replication and clustering in a number of database products makes multiple database use a definite 'this must be possible'. Any decent ORM should know how to connect to multiple databases as required.
But even when the main application doesn't connect to more than one, there will often be other needs that do. Report generation, either scripted or ad hoc, often involves queries that run for a long time. These are best run on database replicas dedicated (and configured) for such queries so they don't disrupt the main application.
Another good use is a type of scripted processing. Many apps will have a regular process that needs to rummage through a large part of the database. Whilst updates obviously have to go to the master, the big read queries can be run off a replica.
Of course, the obvious need is simple performance. I oversaw a webapp and database that grew from surviving comfortably on one MySQL database on a 32-bit dual-core machine with 3 GB to needing two 8-core 64-bit servers with 8 GB. Once it reached this stage, it relied on the database handler directing traffic to both servers. We had a window of about 50 minutes in a day where it could survive on just one database.
I have a Ruby application that connects to multiple databases. One database contains user login credentials (which is shared between several other projects). Another database contains archived data that my application tracks and compares (that only my application accesses). Another database contains data regarding physical machine resources which my application uses to generate new data (these resources are used by several different applications). By splitting the data into multiple databases, different applications only access the data that they need to be accessing.
It is all too frequently the case that some of the data you need is stored in The Wrong Database. Sometimes it's personnel records in a PeopleSoft (Oracle) database. Maybe it's Enterprise CRM data on Informix. Or some departmental database stored in MS SQL Server. Whatever it is, it's in a different database, but you still need access (hopefully read-only).
Unless your primary database is magic-based, it isn't going to be able to provide you with remote table access for every other database out there. (Most will only provide remote access to other databases of the same type, eg: MySQL->MySQL.) When that all too frequent situation occurs, you'll have no other option but to have multiple database connections, and be glad that your framework supports it.
I have a site that connects to two databases. One powers the website content (the CMS DB); the other drives a web application that runs within the site (large amounts of non-CMS data). In fact, the latter uses replication.
I don't feel that's bad design. If one set of data has no relation to the other, then it makes sense even from a pure organization perspective to house it in a separate DB. Otherwise, people would just put all their tables in one DB.
For added security, I always create two accounts for every database: a read-only account (good for SELECT) and a read-write account (for SELECT, UPDATE, INSERT, DELETE and whatever else I might need). On some pages, I may need to use both accounts, thus I will consume two connections for only one database.
Well, reading from one and writing to another is a very common use case. It's easy and fun to write a data access layer that reads from one connection (reading from the slave), and writes to another (the master). A single script might make multiple reads before writing -- perhaps some lookups are necessary for validation, for instance.
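Such a layer can be quite small. Here is a minimal sketch of the idea in PHP: reads go to the replica connection, writes go to the master. The hostnames, credentials and the routing rule are placeholders, and real code would need error handling plus a way to force reads to the master when replication lag matters.

```php
<?php
// Minimal sketch of a read/write-splitting data access layer:
// SELECTs hit the replica, everything else goes to the master.
// Hostnames and credentials are placeholders.

class SplitDb
{
    private PDO $master;
    private PDO $replica;

    public function __construct()
    {
        $this->master  = new PDO('mysql:host=db-master;dbname=app',  'app', 'secret');
        $this->replica = new PDO('mysql:host=db-replica;dbname=app', 'app', 'secret');
    }

    /** Reads hit the replica. */
    public function select(string $sql, array $params = []): array
    {
        $stmt = $this->replica->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    /** Writes always go to the master. */
    public function execute(string $sql, array $params = []): int
    {
        $stmt = $this->master->prepare($sql);
        $stmt->execute($params);
        return $stmt->rowCount();
    }
}

// Example: a validation lookup before a single write.
$db = new SplitDb();
$user = $db->select('SELECT id FROM users WHERE email = ?', ['jo@example.com']);
if ($user) {
    $db->execute('UPDATE users SET last_seen = NOW() WHERE id = ?', [$user[0]['id']]);
}
```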
Scripting languages are also frequently used for integration. You might have two off-the-shelf codebases, both of which want to maintain their own database. Your integration code might want to talk to both of them.
In general you can usually design your way out of using more than one connection, but I don't see anything fundamentally wrong with using connections to more than one database.
Other reasons to have multiple databases: we have one application that everyone can access. We also have client databases that are very different from client to client. It is easier to maintain the application that all clients use (and which is maintained by a different team) if the client-specific data is separated out into its own databases. It is also easier to move a client to a new server when they become a large enterprise client, rather than one of the smaller clients who run on a server with many other clients.
Further, there are types of data that are transactional and need to be in databases set to full recovery mode with full transaction logging. Other data is only populated from imports and does not need transaction logging, which might slow down the system as the log grows large enough to handle a 10,000,000 record import. These are often split out into a separate database so they can be in simple recovery mode; it is not necessary to recover that data from the transaction log if there is a problem, since it can easily be recovered by re-running the import.
Then data is split out into data warehouses, which are optimized for reporting, not transactions. Again, these reporting databases are usually separate databases (often on separate servers).
Then you have the databases for multiple different COTS applications (we have accounting databases, credit card transaction processing databases, HR databases, our project management database). A particular website might need to access more than one of these or transfer information from one to the other. Believe me, vendors won't let you copy their database structure into one database to rule them all.
We have several hundred databases here on many different servers.
Currently I have two websites:
1. A website connected to a MySQL database on Host A.
2. A website connected to an MS Access database on Host B.
Is there any way that when I update the database on Host B, the database on Host A gets updated automatically?
Thank you. Really appreciate your help.
Two options: (1) at the database level, with what's commonly called ETL (Extract, Transform and Load). In the Microsoft world you'd use SSIS (which comes as part of MS SQL Server) to move data about. This would be a common approach within an enterprise, particularly if you have a lot of control over the environment.
(2) some sort of "service"-based approach. Maybe you provide some sort of interface (like a web service) so that one application can call the other. The issue with that is that you need to build it into the application - but you seem to be after a database-driven solution (?)
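For what option (2) could look like on the MySQL side: a small PHP endpoint on Host A that the Host B application calls whenever it writes to the Access database. The sketch below is purely illustrative; the URL, the fields, the table and the shared-secret check are all invented.

```php
<?php
// Sketch of a web-service endpoint on Host A: Host B posts a record here
// after updating its Access database, and Host A mirrors it into MySQL.
// Field names, table and the API-key check are placeholders.

if (($_SERVER['HTTP_X_API_KEY'] ?? '') !== 'change-me') {
    http_response_code(403);
    exit;
}

$update = json_decode(file_get_contents('php://input'), true);
if (!isset($update['record_id'], $update['name'], $update['price'])) {
    http_response_code(400);
    exit;
}

$pdo = new PDO('mysql:host=localhost;dbname=site_a;charset=utf8mb4', 'app', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Insert or update the mirrored record on Host A.
$stmt = $pdo->prepare(
    'INSERT INTO products (record_id, name, price) VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)'
);
$stmt->execute([$update['record_id'], $update['name'], $update['price']]);

http_response_code(204);
```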
Have a think about what you're trying to do and who should be responsible for that - are you sure it's the database?
Regarding your specific technology - I'm not sure about MySQL as I haven't used it much myself; I don't know of any 'easy' way to have MySQL and Access talk to each other, so you may have to write something.
The data you're exchanging - how much and how often? How timely (can you have one poll the other every hour, or does it need to be 'real-time')?
You could consider using a (new) third system that brokers communication between the two databases, so that they can remain ignorant of each other and of the need to update.
Is it likely you'll have a third database to update later (or a 4th, etc.)?
Can you change the database platform to something that is common across both/all sites, and which has some sort of messaging/updating system built in?