Replicate Master DB to Different Slaves - php

I have a master database, which would be the cloud server, containing the data of different schools.
It has a dashboard that shows the details of each school, where their information and other data can be edited.
Each of those schools also has a deployment at its own location, which would be the local server.
That local server has a dashboard that can only edit the specific school it is deployed for, including its information and other data.
What I want to happen is to synchronize the cloud server with the corresponding school's local server whenever something is changed, and likewise from the local server back to the cloud.
Note: if you have ever tried Evernote, you can edit a note's information on whatever device you're using and it still synchronizes once you have internet or manually click synchronize.
When the local server has no internet connection and some school data is edited, the local and cloud servers should synchronize once the internet is back up.
That's the behavior I'm trying to achieve.
Would anyone shed some light on where to start? I can't think of a solution that fits my problem.
I also thought of using PHP to foreach-loop over every table and pick out the data matching the current date and time, but I know that would be terribly inefficient.
Edit: I deleted the references/links to other SO questions regarding this matter.
The applications I'm using as a reference (pegs) are:
Evernote
Todoist
Servers:
Local Server Computer: Windows 10 (Deployed in Schools)
Cloud Server: probably some dedicated hosting that is managed through phpMyAdmin
Not to be picky, but please bear in mind that you're talking to a newbie at the master-to-slave database process; I don't have any experience with it.

When we used to do this we would:
Make sure every table we wanted to sync had datetime columns for Created; Modified; & Deleted. They would also have a boolean isDeleted column (so rather than physically delete records we would flag it to true and ignore it in queries). This means we could query for any records that have been deleted since a certain time and return an array of these deleted IDs.
In each DB (Master and slave) create a table that stores the last successful sync datetime. In the master this table stores multiple records: 1 for each school, but in the slave it just needs 1 record - the last time it synced with the master.
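As a rough sketch of those first two points (all table, column, and connection names below are invented for illustration, not taken from the question):

<?php
// Minimal sketch of the supporting structures described above.
$pdo = new PDO('mysql:host=localhost;dbname=school_db', 'user', 'pass');

// Every synced table carries Created/Modified/Deleted timestamps plus a
// soft-delete flag, so "deleted since X" can be answered with a query.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS students (
        id        INT AUTO_INCREMENT PRIMARY KEY,
        name      VARCHAR(255) NOT NULL,
        created   DATETIME NOT NULL,
        modified  DATETIME NOT NULL,
        deleted   DATETIME NULL,
        isDeleted TINYINT(1) NOT NULL DEFAULT 0
    )
");

// On the master this holds one row per school; on a slave, a single row.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS sync_state (
        school_id   VARCHAR(64) PRIMARY KEY,
        last_synced DATETIME NOT NULL
    )
");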
In your case every so often each of the slaves would:
Call a webservice (a URL) of the master, let's say called 'helloMaster'. It would pass in the school name (or some specific identifier), the last time it successfully synced with the master, and authentication details (for security), and expect a response from the master saying whether the master has any updates for the school since that datetime. Really the point here is just to get an acknowledgement that the master is available and listening (i.e. the internet is still up).
Then the slave would call another webservice, let's say called 'sendUpdates'. It would again pass in the school name, the last successful sync, the security/authentication details, and three arrays for any added, updated and deleted records since the last sync. The master just acknowledges receipt. If receipt is acknowledged then the slave moves to step 3, otherwise it tries step 1 again after a pause of some duration. So now the master has the updates from the slave. Note: it is up to the master to decide how to merge records if there are conflicts with its pending slave updates.
The slave then calls a webservice, let's say 'getUpdates'. It passes in the school name, the last successful sync and the security/authentication details, and the master then returns three arrays for any added, updated and deleted records it has, which the slave is expected to apply to its database.
Finally, once the slave has tried to apply those updates, it notifies the master of success/failure through another webservice, say 'updateStatus'. If successful, the master returns a new sync date for the slave to store (this will exactly match the date the master stores in its table). If it fails, the error is logged in the master and we go back to step 1 after a pause.
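Put together, the slave side of those four steps could look something like the sketch below. The endpoint names follow the ones above; everything else (the base URL, payload shapes, the postJson() helper, and the local-DB stubs) is an assumption for illustration:

<?php
// Sketch of one slave sync cycle. Helper names and payload format are invented.
function postJson(string $url, array $payload): ?array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : json_decode($body, true);
}

$base     = 'https://master.example.com/api';   // hypothetical master URL
$school   = 'school_1';
$auth     = 'secret-token';
$lastSync = '2019-01-01 00:00:00';              // in real code: read from sync_state

// 1. helloMaster: is the master reachable, and does it have anything for us?
if (postJson("$base/helloMaster", ['school' => $school, 'since' => $lastSync, 'auth' => $auth]) === null) {
    exit;   // offline - try again later
}

// 2. sendUpdates: push our added/updated/deleted records since the last sync.
$added = $updated = $deletedIds = [];           // in real code: query rows where modified > $lastSync
$receipt = postJson("$base/sendUpdates", [
    'school' => $school, 'since' => $lastSync, 'auth' => $auth,
    'added' => $added, 'updated' => $updated, 'deleted' => $deletedIds,
]);
if ($receipt === null) { exit; }

// 3. getUpdates: pull the master's changes and apply them to the local DB.
$updates = postJson("$base/getUpdates", ['school' => $school, 'since' => $lastSync, 'auth' => $auth]);
$success = ($updates !== null);                 // ...apply $updates to local tables here...

// 4. updateStatus: report the result and store the new sync date on success.
$status = postJson("$base/updateStatus", ['school' => $school, 'auth' => $auth, 'success' => $success]);
if ($success && $status !== null) {
    // ...save $status['newSyncDate'] into sync_state here...
}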
I have left out some detail about error handling, getting the times accurate across all devices (there might be different time zones involved), and some other bits and pieces, but that's the gist of it.
I may make refinements after thinking on it more (or others might edit my post).
Hope that helps at least.

I will suggest you go with the trivial solution, which in my view is:
Create an SQLite database, or any database of your choice (MySQL, for example), on the local server.
Keep an always-running thread that pings (makes an API call to) your master server every 5 minutes (the interval depends on how much delay is acceptable).
With that thread you can detect whether you're connected to the internet or not.
If connected to the internet:
a) Send the local changes along with the request to the master server. This master server is an application server, which will be able to apply the changes made on the school's local machines (received through that API call) to the master database after whatever validations your application requires.
b) Receive the updated changes from the server in the API response. These changes are served after resolving conflicts (for example, if the data on the school server was updated earlier than the data in the master database, which one you accept depends on your requirements).
If not connected to the internet, keep storing the changes in the local database and reflect them in the application running at the school; once you get connected again, push those changes to the master server and pull whatever applicable changes the master has.
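A minimal sketch of such a polling job (run from cron or a loop every 5 minutes; the /ping, /push and /pull endpoint names and the change_log table are assumptions, not part of the answer):

<?php
// Detect connectivity with a cheap call to the master, then push/pull.
$master = 'https://master.example.com/api';     // hypothetical application server
$online = @file_get_contents("$master/ping") !== false;

$local    = new PDO('sqlite:school_local.db');
$lastSync = '2019-01-01 00:00:00';              // in real code: read from a sync-state table

if (!$online) {
    exit; // offline: local edits keep accumulating with synced = 0
}

// Push every locally stored change that has not been synced yet.
$pending  = $local->query("SELECT * FROM change_log WHERE synced = 0")->fetchAll(PDO::FETCH_ASSOC);
$response = @file_get_contents("$master/push", false, stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/json',
        'content' => json_encode($pending),
    ],
]));
if ($response !== false) {
    $local->exec("UPDATE change_log SET synced = 1 WHERE synced = 0");
}

// Pull the changes the master has accepted/merged since our last sync.
$updates = json_decode((string) @file_get_contents("$master/pull?since=" . urlencode($lastSync)), true);
// ...apply $updates to the local tables here...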
This is complicated to do on your own, but if the scale is small I would prefer implementing your own APIs for the database applications to connect in this manner.
A better solution would be to use Google Firebase, which is a realtime database that is asynchronously updated whenever there is a change on any machine, but it can cost you more if that isn't really required. It will, however, give you Evernote-style realtime editing features for your database systems.

This is not a problem that can be solved by database replication.
Generally speaking, database replication can operate in one of two modes:
Master/slave replication, which is what MySQL uses. In this mode, all writes must be routed to a single "master" server, and all of the replica databases receive a feed of changes from the master.
This doesn't suit your needs, as writes can only be made to the master. (Modifying one of the replicas directly would result in it becoming permanently out of sync with the master.)
Quorum-based replication, which is used by some newer databases. All database replicas connect to each other. So long as a majority of the replicas are connected (that is, the cluster has reached "quorum"), writes can be made to any of the active databases, and will be propagated to all of the other databases. A database that is not connected will be brought up to date when it joins the quorum.
This doesn't suit your needs either, as a disconnected replica cannot be written to. Worse, having more than half of all replicas disconnect from the master would prevent the remaining databases from being written to either!
What you need is some sort of data synchronization solution. Any solution will require some logic -- which you will have to write! -- to resolve conflicts. (For instance, if a record is modified in the master database while a school's local replica is disconnected, and the same record is also modified there, you will need some way to reconcile those differences.)
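To make the conflict point concrete, here is a bare-bones last-write-wins reconciliation sketch. This is only one possible policy, and the record shape, timestamps and merge() helper are invented for illustration:

<?php
// One possible conflict policy: whichever copy was modified most recently wins,
// decided by a 'modified' timestamp kept on every record.
function merge(array $masterRow, array $localRow): array
{
    return strtotime($masterRow['modified']) >= strtotime($localRow['modified'])
        ? $masterRow
        : $localRow;
}

// Example: the same record edited on both sides while the school was offline.
$master = ['id' => 125, 'name' => 'Juan Dela Cruz', 'modified' => '2019-01-05 22:00:00'];
$local  = ['id' => 125, 'name' => 'Juan D. Cruz',   'modified' => '2019-01-05 23:00:00'];
$winner = merge($master, $local);   // here the local edit wins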

No need for any complicated setup or APIs. MySQL allows you to replicate your database easily; it will make sure the replication is done correctly and in a timely manner whenever the internet is available (and it's fast, too).
There are:
Master-slave: the master edits, the slave reads; in other words, one-way synchronization from master to slave.
Master-master: master 1 edits, master 2 reads and edits; in other words, two-way synchronization. Both servers will push and pull updates.
Assuming your cloud server has a schema for each school, and each schema is accessible with its own username and password (e.g. db_school1, db_school2),
you then have the option to replicate only a selected database schema from your cloud to the local master. In your case, school one's local master would only replicate db_school1 ("replicate-do-db").
In case you want to replicate only specific tables, MySQL also has the "replicate-do-table" option.
The actual replication process is very easy to set up but can get very deep when you have different scenarios.
A few things you want to take note of: server IDs; different auto-increment settings on each server to avoid conflicts with new records (e.g. master 1 generates odd-numbered IDs and master 2 even-numbered ones, so there won't be duplicate primary key issues); server-down alerts/monitoring; and error skipping.
I'm not sure if you are on Linux or Windows; I've written a simple C# application which checks whether either master is not replicating or has stopped for any reason and sends an email. Monitoring is crucial!
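The answer mentions a C# monitor; for reference, here is a minimal PHP sketch of the same idea (the credentials and alert address are placeholders):

<?php
// Minimal replication monitor: check the slave threads on this server and
// send an email alert if either has stopped.
$pdo = new PDO('mysql:host=127.0.0.1', 'monitor_user', 'monitor_pass');
$status = $pdo->query('SHOW SLAVE STATUS')->fetch(PDO::FETCH_ASSOC);

if (!$status
    || $status['Slave_IO_Running'] !== 'Yes'
    || $status['Slave_SQL_Running'] !== 'Yes') {
    $detail = is_array($status) ? ($status['Last_Error'] ?? '') : 'no slave status returned';
    mail('admin@example.com', 'Replication stopped', 'Last error: ' . $detail);
}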
Here are some links for master-master replication:
https://www.howtoforge.com/mysql_master_master_replication
https://www.digitalocean.com/community/tutorials/how-to-set-up-mysql-master-master-replication
Also worth reading, this optimised table-level replication info:
https://dba.stackexchange.com/questions/37015/how-can-i-replicate-some-tables-without-transferring-the-entire-log
hope this helps.

Edit:
The original version of this answer proposed MongoDB, but with further reading MongoDB is not so reliable with dodgy internet connections. CouchDB is designed for offline documents, which is what you need - although it's harder to get going than MongoDB, unfortunately.
Original:
I'd suggest not using MySQL but deploying a document store designed for replication, such as CouchDB - unless you go for the commercial MySQL clustering services.
Being a lover of the power of MySQL I find it hard to suggest you use something else, but in this case, you really should.
Here is why -
Problems using MySQL replication
While MySQL has good replication (and that's most likely what you should be using if you're synchronizing a MySQL database - as recommended by others), there are some things to watch out for.
"Unique Key" clashes will give you a massive headache; the most
likely cause of this is "Auto Incrementing" IDs that are common in
MySQL applications (don't use them for syncing operation unless there
is a clear "read+write"->"read-only" relationship - which there isn't
in your case.)
Primary keys must be generated by each server but unique across all servers. Possibly by adding a mix of a server identifier and a unique ID for that server (Server1_1, Server1_2, Server1_3 etc will not clash with Server2_1)
MySQL sync only supports one-way replication unless you look at their clustering solutions (https://www.mysql.com/products/cluster/).
Problems doing it "manually" with time stamping the record.
Another answer recommends keeping "Time Updated" records. While I've done this approach, there are some big gotchas to be careful of.
"Unique Key" clashes (as mentioned above; same problems - don't use auto-incrementing values except for primary keys, and generate primary keys unique to the server)
Multiple updates on multiple servers need to be precisely time-synced and clashes handled according to rules. This can be a headache.
What happens when updates are received way out-of-order; which fields have been updated, which weren't? You probably don't need to update the whole record, but how do you know?
If you must, try one of the commercial solutions as mentioned in answers https://serverfault.com/questions/58829/how-to-keep-multiple-read-write-db-servers-in-sync and https://community.spiceworks.com/topic/352125-how-to-synchronize-multiple-mysql-servers and Strategy on synchronizing database from multiple locations to a central database and vice versa (etc - Google for more)
Problems doing it "manually" with journalling.
Journalling is keeping a separate record of what has changed and when. "Database X, Table Y, Field Z was updated to value A at time B" or "Table A had new record added with these details [...]". This allows you much finer control of what to update.
If you look at database sync techniques, this is actually what is going on in the background; in MySQL's case it keeps a binary log of the updates.
You only ever share the journal, never the original record.
When another server receives a journal entry, it has a much fuller picture of what has happened before/after and can replay updates and ensure you get the correct details.
Problems arise when the journal and the database get out of sync (MySQL is actually a pain when this happens!). You need to have a "refresh" script ready to roll that sits outside the journalling and will sync the DB to the master.
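For illustration only, a bare-bones sketch of what journal rows and their replay could look like (the table layout and helper are invented, not how MySQL's binary log actually stores things):

<?php
// Invented journal structure: every write also appends a row describing it,
// and only these rows are shared with the other server.
$pdo = new PDO('mysql:host=localhost;dbname=school_db', 'user', 'pass');
$pdo->exec("
    CREATE TABLE IF NOT EXISTS journal (
        entry_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
        table_name VARCHAR(64) NOT NULL,
        row_key    VARCHAR(64) NOT NULL,    -- e.g. 'Server1_42'
        field_name VARCHAR(64) NOT NULL,
        new_value  TEXT NULL,
        changed_at DATETIME NOT NULL
    )
");

// Replaying received entries in order gives the receiver the fuller picture.
function replay(PDO $pdo, array $entries): void
{
    foreach ($entries as $e) {
        // NOTE: table/field names must be whitelisted before building SQL like this.
        $stmt = $pdo->prepare(
            "UPDATE `{$e['table_name']}` SET `{$e['field_name']}` = ? WHERE row_key = ?"
        );
        $stmt->execute([$e['new_value'], $e['row_key']]);
    }
}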
It's complicated. So...
Solution: Using a document store designed for replication, e.g. CouchDB
Bearing all this in mind, why not use a document store that already does all that for you? CouchDB has support for this and handles all the journalling and syncing (http://docs.couchdb.org/en/master/replication/protocol.html).
There are others out there, but I believe you'll end up with less headaches and errors than with the other solutions.

Master to master replication in MySQL can be accomplished without key violations while using auto_increment. Here is a link that explains how.
If you have tables without primary keys I'm not sure what will happen (I always include auto_increment primary keys on tables)
http://brendanschwartz.com/post/12702901390/mysql-master-master-replication
The auto-increment-offset and auto-increment-increment settings affect the auto_increment values, as shown in the config samples from the article...
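# server 1 configuration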
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
log_bin_index = /var/log/mysql/mysql-bin.log.index
relay_log = /var/log/mysql/mysql-relay-bin
relay_log_index = /var/log/mysql/mysql-relay-bin.index
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
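
# server 2 configuration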
server_id = 2
log_bin = /var/log/mysql/mysql-bin.log
log_bin_index = /var/log/mysql/mysql-bin.log.index
relay_log = /var/log/mysql/mysql-relay-bin
relay_log_index = /var/log/mysql/mysql-relay-bin.index
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 2

Related

Which database writer model is optimal?

I have 2 servers.
Server 1: (Master Server)
- Master Database MariaDB
- PHP Service
+ Reader from Master Database
+ Writer into Master database
Server 2: (Slave Server)
- Slave Database from Server 1
- PHP Service
+ Reader from Slave Database
So, I want to create a writer into the Master Database for Server 2.
I have drawn two ways to insert data into the Master Server in this image (the direction of the arrows is the direction the data will flow).
I do not know which way to choose for optimization, and I do not have much time to program and test both.
Has anyone experience with this, or a new idea? Thank you.
I assume the Web Service next to the Master is for ingesting data from sources other than the "Clients"?
The first picture is better. Why incur the extra hop and the extra coding of the second picture?
But... beware of the "critical read" problem. This is where, for example, a client posts a blog comment, then goes to the next web page, but that page fails to show the comment. The problem is that the write to the Master may not have reached the Slave before the user gets to that page.
This is solved in a variety of ways; I am merely alerting you of the issue. (The issue is probably more prevalent in the second picture.)
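One common workaround, shown here only as a sketch (connection details and the 10-second window are arbitrary choices, not from the answer), is to pin a user's reads to the Master for a short time after that user writes:

<?php
// After a user writes, remember it in the session and route that user's reads
// to the Master for a short window so they always see their own change.
session_start();

function dbForReads(): PDO
{
    $recentWrite = ($_SESSION['last_write'] ?? 0) > time() - 10;
    return $recentWrite
        ? new PDO('mysql:host=master.example.com;dbname=app', 'user', 'pass')
        : new PDO('mysql:host=slave.example.com;dbname=app', 'user', 'pass');
}

function recordWrite(): void
{
    $_SESSION['last_write'] = time();   // call this right after any INSERT/UPDATE
}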
I have done the first picture many times; never even considered the second picture.

Mysql, data migration between databases/servers (migrate now with regular updates later)

This is somewhat of an abstract question but hopefully pretty simple at the same time. I just have no idea of the best way to go about this other than an export/import, and I can't do that due to permission issues, so I need some alternatives.
On one server, we'll call it 1.2.3, I have a database with 2 schemas, Rdb and test. These schemas have 27 and 3 tables respectively. This database stores call info from our phone system, but we have reader access only, so we're very limited in what we can do beyond selecting and joining for data records and info.
I then have a production database server, call it 3.2.1, with my main schemas, and I'd like to place the previous 30 tables into one of those production schemas. After the migration is done, I'll need to create a script that will check the data on the first connection and then update the new schema on the production connection, but that's after the bulk migration is done.
I'm wondering if a PHP script would be the way to go about this initial migration, though. I'm using MySQL Workbench and the export wizard fails for the read-only database, but if there's another way in the interface then I don't know about it.
It's quite a bit of data, and I'm not necessarily looking for the fastest way but the easiest and most fail-safe way.
For a one time data move, the easiest way is to use the command line tool mysqldump to dump your tables to file, then load the resulting file with mysql. This assumes that you are either shutting down 1.2.3, or will reconfigure your phone system to point to 3.2.1 (or update DNS appropriately). Also, this is much easier if you can get downtime on the phone system to move the data.
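For what it's worth, here is a sketch of that dump-and-load wrapped in PHP (since a PHP script was mentioned in the question). Hosts, credentials and the target schema name are placeholders; --single-transaction avoids needing LOCK TABLES on the read-only source:

<?php
// One-time copy: dump schema Rdb from the read-only server (1.2.3) and load it
// into a production schema on 3.2.1. Everything below is a placeholder.
$dump = 'mysqldump --single-transaction -h 1.2.3 -u reader -pREADER_PW Rdb > /tmp/rdb.sql';
$load = 'mysql -h 3.2.1 -u prod_user -pPROD_PW production_schema < /tmp/rdb.sql';

exec($dump, $output, $dumpStatus);
if ($dumpStatus === 0) {
    exec($load, $output, $loadStatus);
}
// Repeat for the 'test' schema, then build the incremental update script later.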
we have reader access only so we're very limited in what we can do beyond selecting and joining for data records
This really limits your options.
Master/slave replication requires the REPLICATION SLAVE privilege, and you probably need a user with the SUPER privilege to create a replication user.
Trigger-based replication solutions like SymmetricDS will require a user with CREATE ROUTINE in order to create the triggers.
An "Extract, Transform, Load" solution like Clover ETL will work best if tables have LAST_CHANGED timestamps. If they don't, then you would need ALTER TABLE privilege.
Different tools for different goals.
Master/Slave replication is generally used for Disaster Recovery, Availability or Read Scaling
Heterogeneous replication replicates some (or all) tables between different environments (could be different RDBMSs, or different replica sets) in a continuous but asynchronous fashion.
ETL for bulk, hourly/daily/periodic data movements, with the ability to pick a subset of columns, aggregate, convert timestamp formats, merge with multiple sources, and generally fix whatever you need to with the data.
That should help you determine what your situation really is - whether it's a one-time load with a temporary data sync, or an ongoing replication (real-time or delayed).
Edit:
https://www.percona.com/doc/percona-toolkit/LATEST/index.html
Check out the Percona Toolkit, specifically pt-table-sync and pt-table-checksum. They will help with this.

Online and offline synchronization

I am working on a project that needs to synchronize online and offline features because of an unstable internet connection. I have come up with a possible solution: create 2 similar databases, one online and one offline, and sync the two. My question is: is this a good method, or are there better options?
I have researched the subject online but I haven't come across anything substantive. One useful link I found was on database replication. But I want the offline version to detect internet presence and sync accordingly.
Please, can you help me find solutions or clues to solve my problem?
I'd suggest you have online storage for syncing and a local database (browser IndexedDB, SQLite or something similar), log all your changes in your local database, and keep a record of which data was entered after the last sync.
When you have a connection, you sync all new data with the online storage at set intervals (like once every 5 minutes, or a constant stream if you have the bandwidth/CPU capacity).
When the user logs in from a "fresh" location, the online database pushes all data to the client, which fills the local database with the data and then resumes normal syncing.
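As a concrete (invented) sketch of "log all your changes ... with a record of what data was entered after last sync", every local write could also append to a change log carrying a synced flag. The table and function names here are assumptions:

<?php
// Local SQLite store: the data table plus a change_log with a synced flag.
$db = new PDO('sqlite:offline_app.db');
$db->exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT, modified TEXT)");
$db->exec("CREATE TABLE IF NOT EXISTS change_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               table_name TEXT, row_id INTEGER, action TEXT,
               changed_at TEXT, synced INTEGER DEFAULT 0)");

function saveNote(PDO $db, int $id, string $body): void
{
    $now = date('Y-m-d H:i:s');
    $db->prepare("INSERT OR REPLACE INTO notes (id, body, modified) VALUES (?, ?, ?)")
       ->execute([$id, $body, $now]);
    // Record the change so the sync job knows what to push once online.
    $db->prepare("INSERT INTO change_log (table_name, row_id, action, changed_at)
                  VALUES ('notes', ?, 'upsert', ?)")
       ->execute([$id, $now]);
}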
Plan A: Primary-Primary replication (formerly called Master-Master). You do need to be careful with PRIMARY KEYs and UNIQUE keys. While the "other" machine is offline, you could write conflicting values to a table. Later, when they try to sync up, replication will freeze, requiring manual intervention. (Not a pretty sight.)
Plan B: Write changes to some storage other than the db. This suffers the same drawbacks as Plan A, plus there is a bunch of coding on your part to implement it.
Plan C: Galera cluster with 3 nodes. When all 3 nodes are up, all can take writes. If one node goes down, or network problems make it seem offline to the other two, it will automatically become read-only. After things get fixed, the sync is done automatically.
Plan D: Only write to a reliable Primary; let the other be a readonly Replica. (But this violates your requirement about an "unstable Internet".)
None of these perfectly fits the requirements. Plan A seems to be the only one that has a chance. Let's look at that.
If you have any UNIQUE key in any table and you might insert new rows into it, the problem exists. Even something as innocuous as a 'normalization table' wherein you insert a name and get back an id for use in other tables has the problem. You might do that on both servers with the same name and get different ids. Now you have a mess that is virtually impossible to fix.
Not sure if it's outside the scope of the project, but you can try these:
https://pouchdb.com/
https://couchdb.apache.org/
" PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser.
PouchDB was created to help web developers build applications that work as well offline as they do online.
It enables applications to store data locally while offline, then synchronize it with CouchDB and compatible servers when the application is back online, keeping the user's data in sync no matter where they next login. "

MySQL Database - Capacity planning dilemma, advice please

I have a MySQL hosting and capacity planning question. I would like to know the minimum hosting requirements to host a MySQL database of the type and size described below:
Background: I have a customer in the finance industry who has bought a bespoke software CMS platform written in PHP with a MySQL database.
Their current solution does not have any reports, and the software vendor who provided it only allows them to use some PHP pages to export the entire contents of tables which the customer then has to manually manipulate in Excel to obtain their business reporting.
The vendor will not allow them access to their live database in order to run Crystal Reports, saying that this is a risk to the database, preferring them to purchase an expensive database replication solution; so the customer continues to perform tedious manual exports of entire tables every day.
The database: The database is currently 90MB in size and a custom 9 month old PHP solution sits on top of it. The customer has no access to this as it is hosted by their current vendor. There are 43 tables in total, of which one - a whopping big log table uses up 99% of the database size.
The top four tables containing the business data are tiny:
34.62 MB
13.79 MB
8.46 MB
7.59 MB
The vast majority of the tables are simple look-up tables for data values and have only a few rows.
The largest table in the database, however, is a big-ass log table which is 1400MB in size. This table alone accounts for over 99.9% of the total database size.
The question: Considering that the solution is (log table notwithstanding) very small, with only a few staff members making data entry via some simple PHP forms, is there a realistic problem with running Crystal Reports against such a database in production? Bearing in mind that there are times during the day - the majority of the day in fact - when this database is simply not being used. Lunchtimes for example and out of hours.
The vendor maintains that there is a fundamental risk to the business to query live data and that running Crystal Reports against this database could cause it to "crash the live db and the business loses operations".
The customer is keen to have a live dashboard too; which could be written with a very small SQL query to aggregate some numbers from those small tables listed above.
I usually work with SQL Server and Oracle and I have absolutely no qualms about allowing a Crystal Report or running a view to populate a UI with some real time data from the live database - especially a database this small; after all what is the database for if one cannot SELECT from it now and again?
Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server", to replicate this MySQL database to a second, reporting database? In my experience, the need to do this only applies to sensitive, security-risk or databases with high transactional volumes.
System usage: The system is heavily reliant on scheduled CRON jobs every half hour. There may be 500 users per week each logging on and entering some data (but not much data - see table sizes above).
Any comments are warmly welcome.
Thanks for your time.
1) You need two $5 DigitalOcean servers.
2) "Crash the live db and the business loses operations" is absolutely false. They are idiots. What they are likely hiding is the poor structure of their database. They likely have a single-table architecture for all of their clients, separated only by a client_id. Giving access to the table would give access to all of the client data, which is why they force a giant replication solution, so they can make sure you are only getting YOUR data.
3) Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server"? Yes it is.
4) To replicate this MySQL database to a second, reporting database? Yes, this is good practice, as you can set up failover in the event that the worst happens. If you are really paranoid you can set up remote failover with different companies; seeing as this is in the financial sector, I am pretty sure you want that.
5) In my experience, the need to do this only applies to sensitive, security-risk, or high-transaction-volume databases. In my experience it is always good to have your data backed up, because sh*t happens in life, usually when you least expect it.
As for your real-time usage: assuming the database is structured properly with indexes and uses InnoDB, you should have minimal issues supporting 100 requests per second, so I think your 500-users-a-week load is nothing to worry about.
As I mentioned, what you likely want is two servers at different providers, likely the cheapest instances you can get since you don't need a huge amount of space or resources. You can set up DNS to make one the primary and one the replication slave, then in a disaster scenario change the DNS and make the other one the master.
I hope this helps.

Android SyncAdapter mechanism [duplicate]

I'm looking for some general strategies for synchronizing data on a central server with client applications that are not always online.
In my particular case, I have an android phone application with an sqlite database and a PHP web application with a MySQL database.
Users will be able to add and edit information on the phone application and on the web application. I need to make sure that changes made one place are reflected everywhere even when the phone is not able to immediately communicate with the server.
I am not concerned with how to transfer data from the phone to the server or vice versa. I'm mentioning my particular technologies only because I cannot use, for example, the replication features available to MySQL.
I know that the client-server data synchronization problem has been around for a long, long time and would like information - articles, books, advice, etc - about patterns for handling the problem. I'd like to know about general strategies for dealing with synchronization to compare strengths, weaknesses and trade-offs.
The first thing you have to decide is a general policy about which side is considered "authoritative" in case of conflicting changes.
I.e.: suppose Record #125 is changed on the server on January 5th at 10pm and the same record is changed on one of the phones (let's call it Client A) on January 5th at 11pm.
Last synch was on Jan 3rd. Then the user reconnects on, say, January 8th.
Identifying what needs to be changed is "easy" in the sense that both the client and the server know the date of the last synch, so anything created or updated (see below for more on this) since the last synch needs to be reconciled.
So, suppose that the only changed record is #125.
You either decide that one of the two automatically "wins" and overwrites the other, or you need to support a reconcile phase where a user can decide which version (server or client) is the correct one, overwriting the other.
This decision is extremely important and you must weigh the "role" of the clients. Especially if there is a potential conflict not only between client and server, but also in case different clients can change the same record(s).
[Assuming that #125 can be modified by a second client (Client B) there is a chance that Client B, which hasn't synched yet, will provide yet another version of the same record, making the previous conflict resolution moot]
Regarding the "created or updated" point above... how can you properly identify a record if it has been originated on one of the clients (assuming this makes sense in your problem domain)?
Let's suppose your app manages a list of business contacts. If Client A says you have to add a newly created John Smith, and the server has a John Smith created yesterday by Client D... do you create two records because you cannot be certain that they aren't different persons? Will you ask the user to reconcile this conflict too?
Do clients have "ownership" of a subset of data? I.e. if Client B is setup to be the "authority" on data for Area #5 can Client A modify/create records for Area #5 or not? (This would make some conflict resolution easier, but may prove unfeasible for your situation).
To sum it up the main problems are:
How to define "identity" considering that detached clients may not have accessed the server before creating a new record.
The previous situation, no matter how sophisticated the solution, may result in data duplication, so you must foresee how to periodically solve these and how to inform the clients that what they considered as "Record #675" has actually been merged with/superseded by Record #543
Decide if conflicts will be resolved by fiat (e.g. "The server version always trumps the client's if the former has been updated since the last synch") or by manual intervention
In case of fiat, especially if you decide that the client takes precedence, you must also take care of how to deal with other, not-yet-synched clients that may have some more changes coming.
The previous items don't take in account the granularity of your data (in order to make things simpler to describe). Suffice to say that instead of reasoning at the "Record" level, as in my example, you may find more appropriate to record change at the field level, instead. Or to work on a set of records (e.g. Person record + Address record + Contacts record) at a time treating their aggregate as a sort of "Meta Record".
Bibliography:
More on this, of course, on Wikipedia.
A simple synchronization algorithm by the author of Vdirsyncer
OBJC article on data synch
SyncML®: Synchronizing and Managing Your Mobile Data (Book on O'Reilly Safari)
Conflict-free Replicated Data Types
Optimistic Replication YASUSHI SAITO (HP Laboratories) and MARC SHAPIRO (Microsoft Research Ltd.) - ACM Computing Surveys, Vol. V, No. N, 3 2005.
Alexander Traud, Juergen Nagler-Ihlein, Frank Kargl, and Michael Weber. 2008. Cyclic Data Synchronization through Reusing SyncML. In Proceedings of the The Ninth International Conference on Mobile Data Management (MDM '08). IEEE Computer Society, Washington, DC, USA, 165-172. DOI=10.1109/MDM.2008.10 http://dx.doi.org/10.1109/MDM.2008.10
Lam, F., Lam, N., and Wong, R. 2002. Efficient synchronization for mobile XML data. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM, New York, NY, 153-160. DOI= http://doi.acm.org/10.1145/584792.584820
Cunha, P. R. and Maibaum, T. S. 1981. Resource = abstract data type + synchronization - A methodology for message oriented programming -. In Proceedings of the 5th international Conference on Software Engineering (San Diego, California, United States, March 09 - 12, 1981). International Conference on Software Engineering. IEEE Press, Piscataway, NJ, 263-272.
(The last three are from the ACM digital library, no idea if you are a member or if you can get those through other channels).
From the Dr.Dobbs site:
Creating Apps with SQL Server CE and SQL RDA by Bill Wagner May 19, 2004 (Best practices for designing an application for both the desktop and mobile PC - Windows/.NET)
From arxiv.org:
A Conflict-Free Replicated JSON Datatype - the paper describes a JSON CRDT implementation (Conflict-free replicated datatypes - CRDTs - are a family of data structures that support concurrent modification and that guarantee convergence of such concurrent updates).
I would recommend that you have a timestamp column in every table, and every time you insert or update, update the timestamp value of each affected row. Then you iterate over all tables checking whether the timestamp is newer than the one you have in the destination database. If it's newer, check whether you have to insert or update.
Observation 1: be aware of physical deletes, since rows deleted from the source db have to be deleted at the server db as well. You can solve this by avoiding physical deletes, or by logging every delete in a table with timestamps, something like this: DeletedRows = (id, table_name, pk_column, pk_column_value, timestamp). Then you read all the new rows of the DeletedRows table and execute a delete at the server using table_name, pk_column and pk_column_value.
Observation 2: be aware of FKs, since inserting data into a table that's related to another table could fail. You should deactivate every FK before data synchronization.
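A rough sketch of that timestamp-driven pass (the table list, the ts column name, the hard-coded $lastSync value and the connection details are all assumptions for illustration):

<?php
// Pull rows changed since the last sync from every table, then replay deletes.
$source = new PDO('mysql:host=source.example.com;dbname=src', 'user', 'pass');
$dest   = new PDO('mysql:host=dest.example.com;dbname=dst', 'user', 'pass');
$lastSync = '2019-01-01 00:00:00';          // in real code: stored per run
$tables   = ['contacts', 'calls'];          // every synced table has a `ts` column

foreach ($tables as $t) {
    $stmt = $source->prepare("SELECT * FROM `$t` WHERE ts > ?");
    $stmt->execute([$lastSync]);
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        // REPLACE covers both the insert and the update case (MySQL-specific).
        $cols = array_keys($row);
        $sql  = "REPLACE INTO `$t` (`" . implode('`,`', $cols) . "`) VALUES ("
              . rtrim(str_repeat('?,', count($cols)), ',') . ")";
        $dest->prepare($sql)->execute(array_values($row));
    }
}

// Propagate deletes recorded in DeletedRows (Observation 1).
$stmt = $source->prepare("SELECT * FROM DeletedRows WHERE timestamp > ?");
$stmt->execute([$lastSync]);
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $d) {
    $dest->prepare("DELETE FROM `{$d['table_name']}` WHERE `{$d['pk_column']}` = ?")
         ->execute([$d['pk_column_value']]);
}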
If anyone is dealing with a similar design issue and needs to synchronize changes across multiple Android devices, I recommend checking out Google Cloud Messaging for Android (GCM).
I am working on one solution where changes done on one client must be propagated to the other clients. I just implemented a proof-of-concept implementation (server & client) and it works like a charm.
Basically, each client sends delta changes to the server. E.g. resource id ABCD1234 has changed from value 100 to 99.
The server validates these delta changes against its database and either approves the change (client is in sync) and updates its database, or rejects the change (client is out of sync).
If the change is approved by the server, the server then notifies the other clients (excluding the one that sent the delta change) via GCM, sending a multicast message carrying the same delta change. Clients process this message and update their databases.
The cool thing is that these changes are propagated almost instantaneously if the devices are online, and I do not need to implement any polling mechanism on those clients.
Keep in mind that if a device is offline too long and there are more than 100 messages waiting in the GCM queue for delivery, GCM will discard those messages and send a special message when the device gets back online. In that case the client must do a full sync with the server.
Check also this tutorial to get started with a GCM client implementation.
This answer is for developers who are using the Xamarin framework (see https://stackoverflow.com/questions/40156342/sync-online-offline-data).
A very simple way to achieve this with the Xamarin framework is to use Azure's Offline Data Sync, as it allows you to push and pull data from the server on demand. Read operations are done locally, and write operations are pushed on demand; if the network connection breaks, the write operations are queued until the connection is restored, then executed.
The implementation is rather simple:
1) Create a Mobile App in the Azure portal (you can try it for free here: https://tryappservice.azure.com/)
2) Connect your client to the mobile app:
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started/
3) The code to set up your local repository:
const string path = "localrepository.db";
//Create our azure mobile app client
this.MobileService = new MobileServiceClient("the api address as setup on Mobile app services in azure");
//setup our local sqlite store and initialize a table
var repository = new MobileServiceSQLiteStore(path);
// initialize a Foo table
repository.DefineTable<Foo>();
// init repository synchronisation
await this.MobileService.SyncContext.InitializeAsync(repository);
var fooTable = this.MobileService.GetSyncTable<Foo>();
4) Then push and pull your data to ensure you have the latest changes:
await this.MobileService.SyncContext.PushAsync();
await fooTable.PullAsync("allFoos", fooTable.CreateQuery());
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started-offline-data/
I suggest you also take a look at SymmetricDS. It is a replication library that supports SQLite and is available on Android systems. You can use it to synchronize your client and server databases. I also suggest having separate databases on the server for each client; trying to hold the data of all users in one MySQL database is not always the best idea, especially if the user data is going to grow fast.
Let's call it the CUDR sync problem (I don't like CRUD, because Create/Update/Delete are writes and should be paired together).
The problem may also be looked at from a write-offline-first or a write-online-first perspective. The write-offline-first approach has a problem with unique identifier conflicts, and also with multiple network calls for the same transaction, increasing risk (or cost)...
I personally find the write-online-first approach easier to manage (so the online store is the single source of truth, from which everything else is synced). The write-online-first approach requires not letting users write offline first; they write locally only after getting an OK response from the online write.
They may read offline-first, and as soon as the network is available get the data from online, update the local database, and then update the UI...
One way to avoid the unique identifier conflict would be to use a combination of unique user id + table name (or table id) + row id (generated by SQLite), and then use a synced boolean flag column with it. But the registration still has to be done online first to get the unique user id on which all other ids will be generated... Another issue here is if clocks are not synced, which someone mentioned above...
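A small sketch of that composite-identifier idea (the notes table, column names and helper are invented; the user id is assumed to come from the online registration step):

<?php
// The globally unique key combines the online-assigned user id, the table name
// and the local SQLite rowid, so two offline devices cannot mint the same id.
$db = new PDO('sqlite:client.db');
$db->exec("CREATE TABLE IF NOT EXISTS notes (
               local_id  INTEGER PRIMARY KEY AUTOINCREMENT,
               global_id TEXT UNIQUE,
               body      TEXT,
               synced    INTEGER DEFAULT 0)");

function addNote(PDO $db, int $userId, string $body): string
{
    $db->prepare("INSERT INTO notes (body) VALUES (?)")->execute([$body]);
    $localId  = (int) $db->lastInsertId();
    $globalId = $userId . '_notes_' . $localId;   // e.g. "42_notes_7"
    $db->prepare("UPDATE notes SET global_id = ? WHERE local_id = ?")
       ->execute([$globalId, $localId]);
    return $globalId;
}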
