Android SyncAdapter mechanism [duplicate]

I'm looking for some general strategies for synchronizing data on a central server with client applications that are not always online.
In my particular case, I have an android phone application with an sqlite database and a PHP web application with a MySQL database.
Users will be able to add and edit information on the phone application and on the web application. I need to make sure that changes made one place are reflected everywhere even when the phone is not able to immediately communicate with the server.
I am not concerned with how to transfer data from the phone to the server or vice versa. I'm mentioning my particular technologies only because I cannot use, for example, the replication features available to MySQL.
I know that the client-server data synchronization problem has been around for a long, long time and would like information - articles, books, advice, etc - about patterns for handling the problem. I'd like to know about general strategies for dealing with synchronization to compare strengths, weaknesses and trade-offs.

The first thing you have to decide is a general policy about which side is considered "authoritative" in case of conflicting changes.
I.e.: suppose Record #125 is changed on the server on January 5th at 10pm and the same record is changed on one of the phones (let's call it Client A) on January 5th at 11pm.
Last synch was on Jan 3rd. Then the user reconnects on, say, January 8th.
Identifying what needs to be changed is "easy" in the sense that both the client and the server know the date of the last synch, so anything created or updated (see below for more on this) since the last synch needs to be reconciled.
So, suppose that the only changed record is #125.
You either decide that one of the two automatically "wins" and overwrites the other, or you need to support a reconcile phase where a user can decide which version (server or client) is the correct one, overwriting the other.
This decision is extremely important and you must weigh the "role" of the clients, especially if there is a potential conflict not only between client and server, but also when different clients can change the same record(s).
[Assuming that #125 can be modified by a second client (Client B), there is a chance that Client B, which hasn't synched yet, will provide yet another version of the same record, making the previous conflict resolution moot.]
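To make the "who wins" policy concrete, here is a minimal PHP sketch of a last-write-wins reconcile on the server, assuming each row carries an updated_at timestamp. The table and column names (records, payload, updated_at) are illustrative assumptions, not from the question.
<?php
// Minimal sketch of a "newest write wins, server wins on ties" policy.

/**
 * Decide which version of a record survives a conflict.
 * Returns 'server' or 'client'.
 */
function resolveConflict(array $serverRow, array $clientRow): string
{
    $serverTs = new DateTimeImmutable($serverRow['updated_at']);
    $clientTs = new DateTimeImmutable($clientRow['updated_at']);

    // Last write wins; the server version is kept when timestamps are equal.
    return ($clientTs > $serverTs) ? 'client' : 'server';
}

function applyClientChange(PDO $db, array $clientRow): void
{
    $stmt = $db->prepare('SELECT id, payload, updated_at FROM records WHERE id = ?');
    $stmt->execute([$clientRow['id']]);
    $serverRow = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($serverRow === false || resolveConflict($serverRow, $clientRow) === 'client') {
        $up = $db->prepare(
            'REPLACE INTO records (id, payload, updated_at) VALUES (?, ?, ?)'
        );
        $up->execute([$clientRow['id'], $clientRow['payload'], $clientRow['updated_at']]);
    }
    // Otherwise the server copy stands; the client will receive it on its next pull.
}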
Regarding the "created or updated" point above... how can you properly identify a record if it has been originated on one of the clients (assuming this makes sense in your problem domain)?
Let's suppose your app manages a list of business contacts. If Client A says you have to add a newly created John Smith, and the server has a John Smith created yesterday by Client D... do you create two records because you cannot be certain that they aren't different persons? Will you ask the user to reconcile this conflict too?
Do clients have "ownership" of a subset of data? I.e., if Client B is set up to be the "authority" on data for Area #5, can Client A modify/create records for Area #5 or not? (This would make some conflict resolution easier but may prove unfeasible for your situation.)
To sum it up the main problems are:
How to define "identity" considering that detached clients may not have accessed the server before creating a new record.
The previous situation, no matter how sophisticated the solution, may result in data duplication, so you must foresee how to periodically resolve these duplicates and how to inform the clients that what they considered to be "Record #675" has actually been merged with/superseded by Record #543 (a minimal sketch of this appears after this list)
Decide if conflicts will be resolved by fiat (e.g. "The server version always trumps the client's if the former has been updated since the last synch") or by manual intervention
In case of fiat, especially if you decide that the client takes precedence, you must also take care of how to deal with other, not-yet-synched clients that may have some more changes coming.
The previous items don't take into account the granularity of your data (in order to keep the description simple). Suffice to say that instead of reasoning at the "Record" level, as in my example, you may find it more appropriate to record changes at the field level instead, or to work on a set of records (e.g. Person record + Address record + Contacts record) at a time, treating their aggregate as a sort of "Meta Record".
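As a hedged illustration of the identity and merge points above: one common approach is to let clients generate UUIDs for new records, and to keep a merge map on the server so that clients holding an old id can be redirected to the surviving record. The table names (contacts, merge_map) are assumptions for illustration, not part of the original answer.
<?php
// Sketch only: client-generated UUIDs avoid relying on server auto-increment keys.

function newUuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// When the server later discovers that two UUIDs describe the same real-world
// entity (e.g. the two "John Smith" records), it records the merge so that
// clients still holding the old id can be redirected to the surviving one.
function mergeRecords(PDO $db, string $duplicateId, string $survivorId): void
{
    $db->prepare('INSERT INTO merge_map (old_id, new_id, merged_at) VALUES (?, ?, NOW())')
       ->execute([$duplicateId, $survivorId]);
    $db->prepare('DELETE FROM contacts WHERE uuid = ?')->execute([$duplicateId]);
}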
Bibliography:
More on this, of course, on Wikipedia.
A simple synchronization algorithm by the author of Vdirsyncer
OBJC article on data synch
SyncML®: Synchronizing and Managing Your Mobile Data (Book on O'Reilly Safari)
Conflict-free Replicated Data Types
Optimistic Replication, Yasushi Saito (HP Laboratories) and Marc Shapiro (Microsoft Research Ltd.), ACM Computing Surveys, Vol. 37, No. 1, March 2005.
Alexander Traud, Juergen Nagler-Ihlein, Frank Kargl, and Michael Weber. 2008. Cyclic Data Synchronization through Reusing SyncML. In Proceedings of the The Ninth International Conference on Mobile Data Management (MDM '08). IEEE Computer Society, Washington, DC, USA, 165-172. DOI=10.1109/MDM.2008.10 http://dx.doi.org/10.1109/MDM.2008.10
Lam, F., Lam, N., and Wong, R. 2002. Efficient synchronization for mobile XML data. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM, New York, NY, 153-160. DOI= http://doi.acm.org/10.1145/584792.584820
Cunha, P. R. and Maibaum, T. S. 1981. Resource = abstract data type + synchronization - A methodology for message oriented programming -. In Proceedings of the 5th international Conference on Software Engineering (San Diego, California, United States, March 09 - 12, 1981). International Conference on Software Engineering. IEEE Press, Piscataway, NJ, 263-272.
(The last three are from the ACM digital library, no idea if you are a member or if you can get those through other channels).
From the Dr.Dobbs site:
Creating Apps with SQL Server CE and SQL RDA by Bill Wagner May 19, 2004 (Best practices for designing an application for both the desktop and mobile PC - Windows/.NET)
From arxiv.org:
A Conflict-Free Replicated JSON Datatype - the paper describes a JSON CRDT implementation (Conflict-free replicated datatypes - CRDTs - are a family of data structures that support concurrent modification and that guarantee convergence of such concurrent updates).

I would recommend that you add a timestamp column to every table and, every time you insert or update, set the timestamp of each affected row. Then you iterate over all tables, checking whether each timestamp is newer than the one you have in the destination database. If it's newer, check whether you have to insert or update.
Observation 1: be aware of physical deletes. Since rows are deleted from the source db, you have to do the same at the server db. You can solve this by avoiding physical deletes or by logging every delete in a table with timestamps. Something like this: DeletedRows = (id, table_name, pk_column, pk_column_value, timestamp). Then you read all the new rows of the DeletedRows table and execute a delete at the server using table_name, pk_column and pk_column_value.
Observation 2: be aware of FKs, since inserting data into a table that's related to another table can fail. You should deactivate every FK check before data synchronization (or order the inserts so that parent rows are written first).
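Here is a rough PHP sketch of that timestamp + tombstone scheme. It assumes each synced table has an updated_at column and that deletes are logged in the DeletedRows table described above; all names are illustrative assumptions.
<?php
// Rough sketch of timestamp-based sync with a DeletedRows tombstone table.

function pullChangesSince(PDO $db, string $table, string $lastSync): array
{
    // $table comes from our own fixed list of synced tables, never from user input.
    $stmt = $db->prepare("SELECT * FROM `$table` WHERE updated_at > ?");
    $stmt->execute([$lastSync]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

function pullDeletesSince(PDO $db, string $lastSync): array
{
    $stmt = $db->prepare(
        'SELECT table_name, pk_column, pk_column_value
           FROM DeletedRows WHERE timestamp > ?'
    );
    $stmt->execute([$lastSync]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

function applyDeletes(PDO $db, array $deletes): void
{
    foreach ($deletes as $d) {
        // Table/column names come from our own tombstone table, not user input.
        $sql = sprintf('DELETE FROM `%s` WHERE `%s` = ?', $d['table_name'], $d['pk_column']);
        $db->prepare($sql)->execute([$d['pk_column_value']]);
    }
}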

If anyone is dealing with similar design issue and needs to synchronize changes across multiple Android devices I recommend checking Google Cloud Messaging for Android (GCM).
I am working on one solution where changes done on one client must be propagated to other clients. And I just implemented a proof of concept implementation (server & client) and it works like a charm.
Basically, each client sends delta changes to the server. E.g. resource id ABCD1234 has changed from value 100 to 99.
Server validates these delta changes against its database and either approves the change (client is in sync) and updates its database or rejects the change (client is out of sync).
If the change is approved, the server notifies the other clients (excluding the one that sent the delta change) via GCM, sending a multicast message carrying the same delta change. The clients process this message and update their databases.
The cool thing is that these changes are propagated almost instantaneously when the devices are online, and I do not need to implement any polling mechanism on those clients.
Keep in mind that if a device is offline for too long and there are more than 100 messages waiting in the GCM queue for delivery, GCM will discard those messages and will send a special message when the device gets back online. In that case the client must do a full sync with the server.
Check also this tutorial to get started with a GCM client implementation.
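For illustration only, a minimal PHP sketch of the server-side flow described in this answer: validate the delta against the current value, apply it, then fan it out to the other clients. The payload shape and the sendGcmMulticast() helper are assumptions, not part of any real GCM SDK.
<?php
// Sketch: validate a delta change, apply it, then broadcast it to other devices.

function handleDeltaChange(PDO $db, array $delta, array $otherRegIds): array
{
    // $delta example: ['resource_id' => 'ABCD1234', 'old_value' => 100, 'new_value' => 99]
    // $otherRegIds: registration ids of the other devices (sender already excluded).
    $stmt = $db->prepare('SELECT value FROM resources WHERE id = ?');
    $stmt->execute([$delta['resource_id']]);
    $current = $stmt->fetchColumn();

    if ($current === false || (int)$current !== (int)$delta['old_value']) {
        // Client was out of sync: reject and let it do a full sync.
        return ['status' => 'rejected', 'current_value' => $current];
    }

    $db->prepare('UPDATE resources SET value = ? WHERE id = ?')
       ->execute([$delta['new_value'], $delta['resource_id']]);

    // Hypothetical helper that posts the multicast message to GCM/FCM.
    sendGcmMulticast($otherRegIds, $delta);

    return ['status' => 'approved'];
}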

This answer is for developers who are using the Xamarin framework (see https://stackoverflow.com/questions/40156342/sync-online-offline-data).
A very simple way to achieve this with the Xamarin framework is to use Azure's Offline Data Sync, as it allows you to push and pull data from the server on demand. Read operations are done locally, and write operations are pushed on demand; if the network connection breaks, the write operations are queued until the connection is restored, then executed.
The implementation is rather simple:
1) create a Mobile app in azure portal (you can try it for free here https://tryappservice.azure.com/)
2) connect your client to the mobile app.
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started/
3) the code to setup your local repository:
const string path = "localrepository.db";
// Create our Azure mobile app client
this.MobileService = new MobileServiceClient("the api address as setup on Mobile app services in azure");
// Set up our local SQLite store
var repository = new MobileServiceSQLiteStore(path);
// Initialize a Foo table
repository.DefineTable<Foo>();
// Initialize repository synchronisation
await this.MobileService.SyncContext.InitializeAsync(repository);
var fooTable = this.MobileService.GetSyncTable<Foo>();
4) then to push and pull your data to ensure we have the latest changes:
await this.MobileService.SyncContext.PushAsync();
await fooTable.PullAsync("allFoos", fooTable.CreateQuery());
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started-offline-data/

I suggest you also take a look at SymmetricDS. It is a replication library that supports SQLite on Android systems, and you can use it to synchronize your client and server databases. I also suggest having a separate database on the server for each client; trying to hold the data of all users in one MySQL database is not always the best idea, especially if the user data is going to grow fast.

Let's call it the CUDR sync problem (I don't like CRUD, because Create/Update/Delete are writes and should be paired together).
The problem can also be looked at from a write-offline-first or a write-online-first perspective. The write-offline-first approach has a problem with unique identifier conflicts, and also with multiple network calls for the same transaction, which increases risk (or cost).
I personally find the write-online-first approach easier to manage (the server becomes the single source of truth from which everything else is synced). The write-online-first approach requires not letting users write offline first: they write locally only after getting an OK response from the online write.
The user may read offline first, and as soon as the network is available, get the data from online, update the local database and then update the UI.
One way to avoid the unique identifier conflict would be to use a combination of unique user id + table name or table id + row id (generated by SQLite), and then use a synced boolean flag column with it. But the registration still has to be done online first, to obtain the unique user id on which all other ids will be based. There is also an issue if clocks are not synced, as someone mentioned above. A minimal sketch of such composite ids follows.
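A minimal PHP sketch of those composite ids and the synced flag, as a hedged illustration; the table and column names are assumptions, and the schema comment mirrors what the device-side SQLite table could look like.
<?php
// Sketch only: compose a globally unique id from (user id, table, local row id).

function composeGlobalId(int $userId, string $table, int $localRowId): string
{
    return sprintf('%d:%s:%d', $userId, $table, $localRowId);
}

// Example schema (SQLite on the device, mirrored on the MySQL side):
//   CREATE TABLE notes (
//       local_id   INTEGER PRIMARY KEY AUTOINCREMENT,
//       global_id  TEXT UNIQUE,          -- e.g. "42:notes:17"
//       body       TEXT,
//       updated_at TEXT,
//       synced     INTEGER DEFAULT 0     -- 0 = pending upload, 1 = acknowledged by server
//   );

// After a successful online write, mark the row as synced.
function markSynced(PDO $db, string $globalId): void
{
    $db->prepare('UPDATE notes SET synced = 1 WHERE global_id = ?')
       ->execute([$globalId]);
}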

Related

Web project: Multiple instances vs single instance

Team,
We are building a web project (like an IT ticketing system), and we expect to have some big clients as soon as we release the product. There should be three ways to raise a ticket: 1) via the web application (forms), 2) via email or 3) via a phone call to an agent. According to our research, 99% of tickets come via email, which means we will be storing a lot of long messages etc.
The project is scoped so that we have two interfaces: agents (IT folks handling queries) and clients (people who ask for help).
The question here is what would you suggest us to do considering expected data and storage growth:
1) Centralize everything so that we have one app with a single huge database (easy to back up etc., unless we get stuck with, e.g., data corruption or similar).
2) Separate the app into two parts: one for IT agents and another one for clients. The idea is to split the application in two: one centralized interface and back-end for IT agents, and another one for clients. For each client we would create a separate database along with a copy of the PHP project (code syncing is easy to automate). Multiple client instances could be hosted on one or many servers. They would communicate via APIs. For example: an IT agent opens a dashboard and the list of outstanding tickets is displayed. If that agent is working on 10 big clients, the back-end would need to contact 10 instances via API and request outstanding tickets. We can ensure only a certain number of queries would be displayed.
Please feel free to add third option as well.
I am not quite sure that I understood everything correctly, but from what I understood I can point out the following key points about your system requirements:
You are dealing with a lot of data, and the data will grow fast
Most of the traffic comes from the email ticketing system
You have a multi-client system
You have an agent which can view data from multiple clients.
The question is: can this agent manipulate (create, update, delete) data from multiple clients? This is quite an important point for future limitations of the architecture. I will assume that it can only read data from multiple clients.
Your 2 suggestions:
I would not recommend the first approach, as many other problems could arise as the database grows. For example, you will be forced to add indexes to speed up queries on your db, which will help in the beginning but will later come back to haunt you, especially if you have to add a lot of non-clustered indexes. You could make it a little better by using read-only replicas, but even with this you will have issues at some point. The problem will still remain in your one main database, which will keep growing.
Quote:
separate app in two parts one for IT agents and another one for clients. The idea is to split application in two: one centralized interface and back-end for IT agents and another one for clients. For each client we would create a separate database along with a copy of the PHP project (code syncing is easy to automate). Multiple client instances could be hosted on one or many servers. They would communicate via APIs. For example: IT agent opens a dashboard and the list of outstanding tickets is displayed. If that agent is working on 10 big clients back-end would need to contact 10 instances via API and request outstanding tickets. We can ensure only certain number of queries would be displayed...
You can split it into 2 separate apps as you said:
Centralized interface + back-end, which would call the one or multiple databases
Client application + back-end (monolith or multiple services), which would call the same database as the centralized interface, but only for the current client
As far as I understood, your problem is not scaling the web servers (your back-end) but the db? If your problem is scaling the back-end as well, then you can consider either scaling to multiple instances or splitting your domain into micro-services and scaling that architecture at the micro-service level, for each service independently.
My suggestion:
1. Scaling your back-end:
You can keep everything in one service (monolithic approach), deploy it on multiple servers and scale the whole service together. There is nothing wrong with this. Like everything, it depends on your business/domain requirements and what works best for you. Although it is very popular these days to use micro-services, they are not the best solution for every problem. I have worked with both types of architecture and they have worked fine for different scenarios. You can even have a middle-ground solution between them: take the specific part which has high scaling demand and extract it into a separate service (like a Tickets sub-system service), and keep the rest of the application, which has low demand, as one big service.
2. Scaling your database:
Considering the above points, I would suggest you use data sharding or data partitioning. You can read about data sharding here. In general it is a way to logically and physically split your data from one database into multiple databases, based on some partitioning or shard key.
This means that you can take one specific concept in your domain as the shard key to split the data on. In your case this could be CustomerId. This only works if business operations spanning more than one customer are not required, i.e. if all your operations are done within one customer. The only exception would be reading/viewing several customers together, which is fine as it does not need any transactional behavior. This really depends on your business scenarios and logic.
If splitting your database into multiple databases based on the shard key CustomerId is not enough, you can take a shard key which is even more specific inside the customer scope; again, this depends on whether your domain allows it. In that case, for example, CustomerA could have a CustomerA-Europe shard, a CustomerA-USA shard, a CustomerA-Africa shard and so on. This represents the logical shard; the physical shard is the physical database. The important point is that you pick your logical shard key at the beginning, so that you can easily migrate your data to different physical databases later, when you need to, based on that shard key (see the routing sketch below).
Additionally, you could include historization for some heavy tables to separate the up-to-date data from your historical data. You can read more about this here.
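As a hedged illustration of the shard-key idea (not part of the original answer), here is a minimal PHP sketch that routes each request to the physical database holding that customer's shard. The shard map, DSNs and credentials are assumptions for illustration.
<?php
// Minimal sketch: map the assumed shard key (CustomerId) to a physical database.
$shardMap = [
    // customer id range => connection settings
    ['min' => 1,    'max' => 1000, 'dsn' => 'mysql:host=db-shard-1;dbname=tickets'],
    ['min' => 1001, 'max' => 2000, 'dsn' => 'mysql:host=db-shard-2;dbname=tickets'],
];

function connectionForCustomer(array $shardMap, int $customerId): PDO
{
    foreach ($shardMap as $shard) {
        if ($customerId >= $shard['min'] && $customerId <= $shard['max']) {
            return new PDO($shard['dsn'], 'app_user', 'secret');
        }
    }
    throw new RuntimeException("No shard configured for customer $customerId");
}

// Usage: every query for a customer goes through its shard connection.
// $db = connectionForCustomer($shardMap, 1042);
// $db->prepare('SELECT * FROM tickets WHERE customer_id = ?')->execute([1042]);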

Replicate Master DB to Different Slaves

I have a master database, which would be the cloud server, and it holds the data of different schools.
It is a dashboard type of application that has the details of each school and can edit their information and other data.
Those schools are deployed to their corresponding school locations, which are the local servers.
Each local server runs a dashboard type of application that can only edit the specific school deployed on that local server; it can edit its information and other data.
Now what I want to happen is to synchronize the cloud with the local server of the corresponding school if something is changed, and the same from local to cloud server.
Note: If you have ever tried Evernote, you can edit a note's information on whatever device you're using and it is still able to synchronize when you have internet or when you manually click synchronize.
When the local server doesn't have an internet connection and some data is edited at the school, then once the internet is up, the data from the local and cloud servers should be synchronized.
That's the logic that I'm pursuing.
Would anyone shed some light on where to start? I can't think of any solution that fits my problem.
I have also thought of using PHP to loop over every table and all data that corresponds to the current date and time, but I know that would be very bad.
Edited: I deleted references / posts of other SO questions regarding this matter.
The example applications that I found are
Evernote
Todoist
Servers:
Local Server Computer: Windows 10 (Deployed in Schools)
Cloud Server: Probably some dedicated hosting that uses phpmyadmin
Not to be picky, but hopefully the answers will keep in mind that you're talking to a newbie to the master-to-slave database replication process; I don't have experience with this.
When we used to do this we would:
Make sure every table we wanted to sync had datetime columns for Created; Modified; & Deleted. They would also have a boolean isDeleted column (so rather than physically delete records we would flag it to true and ignore it in queries). This means we could query for any records that have been deleted since a certain time and return an array of these deleted IDs.
In each DB (Master and slave) create a table that stores the last successful sync datetime. In the master this table stores multiple records: 1 for each school, but in the slave it just needs 1 record - the last time it synced with the master.
In your case every so often each of the slaves would:
Call a webservice (a URL) of the master, let's say 'helloMaster'. It would pass in the school name (or some specific identifier), the last time it successfully synced with the master, and authentication details (for security), and expect a response from the master as to whether the master has any updates for the school since the datetime provided. Really the point here is just to look for an acknowledgement that the master is available and listening (i.e. the internet is still up).
Then the slave would call another webservice, let's say 'sendUpdates'. It would again pass in the school name, the last successful sync and the security authentication details, plus three arrays for any added, updated and deleted records since the last sync. The master just acknowledges receipt. If receipt is acknowledged, the slave moves to step 3; otherwise the slave tries step 1 again after a pause of some duration. So now the master has updates from the slave. Note: it is up to the master to decide how to merge any records if there are conflicts with its pending slave updates.
The slave then calls a webservice, let's say 'getUpdates'. It passes in the school name, the last successful sync and the security authentication details, and the master returns three arrays for any added, updated and deleted records it has, which the slave is expected to apply to its database (a minimal sketch of such an endpoint follows after this list).
Finally, once the slave has tried to update its records, it notifies the master of success/failure through another webservice, say 'updateStatus'. If successful, the master returns a new sync date for the slave to store (this will exactly match the date the master stores in its table). If it fails, the error is logged on the master and we go back to step 1 after a pause.
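As a hedged illustration (not part of the original answer), a minimal PHP sketch of what the 'getUpdates' webservice on the master could look like, assuming the Created/Modified/Deleted datetime columns and the isDeleted flag described above. The students table, credentials and parameter names are illustrative assumptions.
<?php
// Sketch of a 'getUpdates'-style endpoint on the master.
header('Content-Type: application/json');

$school   = $_POST['school']    ?? '';
$lastSync = $_POST['last_sync'] ?? '1970-01-01 00:00:00';
$token    = $_POST['token']     ?? '';

// ... authenticate $school/$token here ...

$db = new PDO('mysql:host=localhost;dbname=master', 'sync_user', 'secret');

$added = $db->prepare(
    'SELECT * FROM students WHERE school = ? AND Created > ? AND isDeleted = 0'
);
$added->execute([$school, $lastSync]);

$updated = $db->prepare(
    'SELECT * FROM students WHERE school = ? AND Modified > ? AND Created <= ? AND isDeleted = 0'
);
$updated->execute([$school, $lastSync, $lastSync]);

$deleted = $db->prepare(
    'SELECT id FROM students WHERE school = ? AND Deleted > ? AND isDeleted = 1'
);
$deleted->execute([$school, $lastSync]);

echo json_encode([
    'added'   => $added->fetchAll(PDO::FETCH_ASSOC),
    'updated' => $updated->fetchAll(PDO::FETCH_ASSOC),
    'deleted' => $deleted->fetchAll(PDO::FETCH_COLUMN),
]);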
I have left out some detail about error handling, getting the times accurate across all devices (there might be different time zones involved), and some other bits and pieces, but that's the gist of it.
I may make refinements after thinking on it more (or others might edit my post).
Hope that helps at least.
I suggest you go with the trivial solution, which in my view is:
Create an SQLite or any other database (MySQL or your choice) on the local server.
Keep an always-running thread which pings (makes an API call to) your master server every 5 minutes (the interval depends on how much delay is acceptable).
With that thread you can detect whether you're connected to the internet or not.
If connected to the internet:
a) Send local changes with the request to the master server. This master server is an application server which is able to apply the changes from the local machines at the school (received via the API call) to the master database, after certain validations according to your application usage.
b) Receive updated changes from the server after the API call. These changes are served after resolving conflicts (for example, if data on the school server was updated earlier than the data updated in the master database, which one you accept depends on your requirements).
If not connected to the internet, keep storing changes in the local database and reflect those changes in the application running at the school; when you get connected, push those changes to the master server and pull the applicable changes from it.
This is complicated to do on your own, but if the scale is small I would prefer implementing your own APIs for the database applications, which would connect in this manner.
A better solution would be to use Google Firebase, which is a realtime database that is updated asynchronously whenever there is a change on any machine. It can cost you more if it's not really required, but it will give you Evernote-style realtime editing features for your database systems.
This is not a problem that can be solved by database replication.
Generally speaking, database replication can operate in one of two modes:
Master/slave replication, which is what MySQL uses. In this mode, all writes must be routed to a single "master" server, and all of the replica databases receive a feed of changes from the master.
This doesn't suit your needs, as writes can only be made to the master. (Modifying one of the replicas directly would result in it becoming permanently out of sync with the master.)
Quorum-based replication, which is used by some newer databases. All database replicas connect to each other. So long as at least half of all replicas are connected (that is, the cluster has reached "quorum"), writes can be made to any of the active databases, and will be propagated to all of the other databases. A database that is not connected will be brought up to date when it joins the quorum.
This doesn't suit your needs either, as a disconnected replica cannot be written to. Worse, having more than half of all replicas disconnect from the master would prevent the remaining databases from being written to either!
What you need is some sort of data synchronization solution. Any solution will require some logic -- which you will have to write! -- to resolve conflicts. (For instance, if a record is modified in the master database while a school's local replica is disconnected, and the same record is also modified there, you will need some way to reconcile those differences.)
No need for any complicated setup or APIs. MySQL allows you to easily replicate your database, and will ensure the replication is done correctly and in a timely manner whenever the internet is available (and it's fast too).
There are:
Master - slave: the master is edited, the slave reads; in other words, one-way synchronization from master to slave.
Master - master: master1 is edited, master2 reads and edits; in other words, two-way synchronization. Both servers push and pull updates.
Assuming your cloud server has a schema for each school, and each schema is accessible by its own username and password (i.e. db_school1, db_school2), you have the option to replicate only a selected database schema from your cloud to the local master. In your case, school one's local master would only use "replicate-do-db = db_school1".
If you want to replicate only a specific table, MySQL also has the "replicate-do-table" option.
The actual replication process is very easy but can get very deep when you have different scenarios.
A few things you want to take note of: server ids, and different auto-increment offsets on each server to avoid conflicts with new records (i.e. Master1 generates records on odd numbers, Master2 on even numbers, so there won't be duplicate primary key issues), plus server-down alerts/monitoring and error skipping.
I'm not sure if you are on Linux or Windows; I've written a simple C# application which checks whether any of the masters is not replicating or has stopped for any reason and sends an email. Monitoring is crucial!
here some links for master master replication:
https://www.howtoforge.com/mysql_master_master_replication
https://www.digitalocean.com/community/tutorials/how-to-set-up-mysql-master-master-replication
Also worth reading this optimised table-level replication info:
https://dba.stackexchange.com/questions/37015/how-can-i-replicate-some-tables-without-transferring-the-entire-log
hope this helps.
Edit:
The original version of this answer proposed MongoDB; but with further reading, MongoDB is not so reliable with dodgy internet connections. CouchDB is designed for offline documents, which is what you need, although it's harder to get going than MongoDB, unfortunately.
Original:
I'd suggest not using MySQL but deploying a document store designed for replication, such as CouchDB, unless you go for the commercial MySQL clustering services.
Being a lover of the power of MySQL I find it hard to suggest you use something else, but in this case, you really should.
Here is why -
Problems using MySQL replication
While MySQL has good replication (and that's most likely what you should be using if you're synchronizing a MySQL database, as recommended by others), there are some things to watch out for.
"Unique key" clashes will give you a massive headache; the most likely cause of this is "auto-incrementing" IDs that are common in MySQL applications (don't use them for syncing operations unless there is a clear "read+write" -> "read-only" relationship, which there isn't in your case).
Primary keys must be generated by each server but be unique across all servers, possibly by combining a server identifier with an ID unique to that server (Server1_1, Server1_2, Server1_3 etc. will not clash with Server2_1).
MySQL sync only supports one-way replication unless you look at their clustering solutions (https://www.mysql.com/products/cluster/).
Problems doing it "manually" by timestamping the records.
Another answer recommends keeping "time updated" records. While I've used this approach, there are some big gotchas to be careful of.
"Unique key" clashes (as mentioned above; the same problems - don't use auto-increment values except as local primary keys, and generate primary keys unique to the server).
Multiple updates on multiple servers need to be precisely time-synced and clashes handled according to rules. This can be a headache.
What happens when updates are received way out of order; which fields have been updated, and which weren't? You probably don't need to update the whole record, but how do you know?
If you must, try one of the commercial solutions as mentioned in answers https://serverfault.com/questions/58829/how-to-keep-multiple-read-write-db-servers-in-sync and https://community.spiceworks.com/topic/352125-how-to-synchronize-multiple-mysql-servers and Strategy on synchronizing database from multiple locations to a central database and vice versa (etc - Google for more)
Problems doing it "manually" with journalling.
Journalling means keeping a separate record of what has changed and when: "Database X, Table Y, Field Z was updated to value A at time B" or "Table A had a new record added with these details [...]". This allows you much finer control over what to update.
If you look at database sync techniques, this is actually what is going on in the background; in MySQL's case it keeps a binary log of the updates.
You only ever share the journal, never the original record.
When another server receives a journal entry, it has a much greater picture of what has happened before/after and can replay updates and ensure you get the correct details.
Problems arise when the journal and database get out of sync (MySQL is actually a pain when this happens!). You need to have a "refresh" script ready to roll that sits outside the journalling and will sync the DB to the master.
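For illustration, a minimal PHP sketch of such a hand-rolled journal and its replay on another server; the journal table, its columns and the per-record uuid are assumptions, not part of the original answer.
<?php
// Sketch of a hand-rolled change journal ("Table Y, Field Z was updated to value A at time B").
//
// CREATE TABLE journal (
//     id          BIGINT AUTO_INCREMENT PRIMARY KEY,
//     table_name  VARCHAR(64),
//     record_uuid CHAR(36),
//     field_name  VARCHAR(64),
//     new_value   TEXT,
//     changed_at  DATETIME
// );

function journalChange(PDO $db, string $table, string $uuid, string $field, $value): void
{
    $db->prepare(
        'INSERT INTO journal (table_name, record_uuid, field_name, new_value, changed_at)
         VALUES (?, ?, ?, ?, NOW())'
    )->execute([$table, $uuid, $field, $value]);
}

// On the receiving server: replay journal entries in order.
function replayJournal(PDO $db, array $entries): void
{
    foreach ($entries as $e) {
        // Table/field names come from our own journal, not user input.
        $sql = sprintf('UPDATE `%s` SET `%s` = ? WHERE uuid = ?', $e['table_name'], $e['field_name']);
        $db->prepare($sql)->execute([$e['new_value'], $e['record_uuid']]);
    }
}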
It's complicated. So...
Solution: use a document store designed for replication, e.g. CouchDB
Bearing all this that in mind, why not use a document store that already does all that for you? CouchDB has support and handles all the journalling and syncing (http://docs.couchdb.org/en/master/replication/protocol.html).
There are others out there, but I believe you'll end up with less headaches and errors than with the other solutions.
Master to master replication in MySQL can be accomplished without key violations while using auto_increment. Here is a link that explains how.
If you have tables without primary keys I'm not sure what will happen (I always include auto_increment primary keys on tables)
http://brendanschwartz.com/post/12702901390/mysql-master-master-replication
The auto-increment-offset and auto-increment-increment settings affect the auto_increment values, as shown in the config samples from the article...
# Server 1
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
log_bin_index = /var/log/mysql/mysql-bin.log.index
relay_log = /var/log/mysql/mysql-relay-bin
relay_log_index = /var/log/mysql/mysql-relay-bin.index
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
# Server 2
server_id = 2
log_bin = /var/log/mysql/mysql-bin.log
log_bin_index = /var/log/mysql/mysql-bin.log.index
relay_log = /var/log/mysql/mysql-relay-bin
relay_log_index = /var/log/mysql/mysql-relay-bin.index
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 2

Online and offline synchronization

I am working on a project that has to synchronize online and offline features due to an unstable internet connection. The possible solution I have come up with is to create 2 similar databases, one online and one offline, and sync the two. My question is: is this a good method? Or are there better options?
I have researched the subject online but I haven't come across anything substantive. One useful link I found was on database replication, but I want the offline version to detect internet presence and sync accordingly.
Please, can you help me find solutions or clues to solve my problem?
I'd suggest you have online storage for syncing and a local database (browser IndexedDB, SQLite or something similar), log all your changes in the local database, and keep a record of which data was entered after the last sync.
When you have a connection, you sync all new data with the online storage at set intervals (like once every 5 minutes, or as a constant stream if you have the bandwidth/CPU capacity).
When the user logs in from a "fresh" location, the online database pushes all data to the client, which fills the local database with the data and then resumes the normal syncing function.
Plan A: Primary-Primary replication (formerly called Master-Master). You do need to be careful with PRIMARY KEYs and UNIQUE keys. While the "other" machine is offline, you could write conflicting values to a table. Later, when they try to sync up, replication will freeze, requiring manual intervention. (Not a pretty sight.)
Plan B: Write changes to some storage other than the db. This suffers the same drawbacks as Plan A, plus there is a bunch of coding on your part to implement it.
Plan C: Galera cluster with 3 nodes. When all 3 nodes are up, all can take writes. If one node goes down, or network problems make it seem offline to the other two, it will automatically become read-only. After things get fixed, the sync is done automatically.
Plan D: Only write to a reliable Primary; let the other be a readonly Replica. (But this violates your requirement about an "unstable Internet".)
None of these perfectly fits the requirements. Plan A seems to be the only one that has a chance. Let's look at that.
If you have any UNIQUE key in any table and you might insert new rows into it, the problem exists. Even something as innocuous as a "normalization table", wherein you insert a name and get back an id for use in other tables, has the problem: you might do that on both servers with the same name and get different ids. Now you have a mess that is virtually impossible to fix.
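To make the pitfall concrete, here is a small PHP sketch of the usual lookup-or-insert pattern on such a normalization table; run independently on two offline primaries, it hands out different ids for the same name. The names table is an illustrative assumption.
<?php
// Illustration of the "normalization table" pitfall: two disconnected primaries
// running this same code will assign different auto_increment ids to the same
// name, and every row referencing those ids then disagrees between servers.

function nameToId(PDO $db, string $name): int
{
    $sel = $db->prepare('SELECT id FROM names WHERE name = ?');
    $sel->execute([$name]);
    $id = $sel->fetchColumn();
    if ($id !== false) {
        return (int)$id;
    }
    // With auto-increment-offset = 1/2 on the two servers, server A might hand
    // out id 7 for "John Smith" while server B hands out id 8 for the same name.
    $db->prepare('INSERT INTO names (name) VALUES (?)')->execute([$name]);
    return (int)$db->lastInsertId();
}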
Not sure if it's outside the scope of the project, but you can try these:
https://pouchdb.com/
https://couchdb.apache.org/
" PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser.
PouchDB was created to help web developers build applications that work as well offline as they do online.
It enables applications to store data locally while offline, then synchronize it with CouchDB and compatible servers when the application is back online, keeping the user's data in sync no matter where they next login. "

MySQL Database - Capacity planning dilemma, advice please

I have a MySQL hosting and capacity planning question. I would like to know the minimum hosting requirements to host a MySQL database of the type and size described below:
Background: I have a customer in the finance industry who has bought a bespoke software CMS platform written in PHP with a MySQL database.
Their current solution does not have any reports, and the software vendor who provided it only allows them to use some PHP pages to export the entire contents of tables, which the customer then has to manually manipulate in Excel to obtain their business reporting.
The vendor will not allow them access to their live database in order to run Crystal Reports, saying that this is a risk to the database and preferring that they purchase an expensive database replication solution; so the customer continues to perform tedious manual exports of entire tables every day.
The database: The database is currently 90MB in size and a custom 9 month old PHP solution sits on top of it. The customer has no access to this as it is hosted by their current vendor. There are 43 tables in total, of which one - a whopping big log table uses up 99% of the database size.
The top four tables containing the business data are tiny:
34.62 MB
13.79 MB
8.46 MB
7.59 MB
The vast majority of the tables are simple look-up tables for data values and have only a few rows.
The largest table in the database, however, is a big-ass log table which is 1400MB in size. This table alone accounts for over 99.9% of the total database size.
The question: Considering that the solution is (log table notwithstanding) very small, with only a few staff members making data entry via some simple PHP forms, is there a realistic problem with running Crystal Reports against such a database in production? Bearing in mind that there are times during the day - the majority of the day in fact - when this database is simply not being used. Lunchtimes for example and out of hours.
The vendor maintains that there is a fundamental risk to the business to query live data and that running Crystal Reports against this database could cause it to "crash the live db and the business loses operations".
The customer is keen to have a live dashboard too; which could be written with a very small SQL query to aggregate some numbers from those small tables listed above.
I usually work with SQL Server and Oracle and I have absolutely no qualms about allowing a Crystal Report or running a view to populate a UI with some real time data from the live database - especially a database this small; after all what is the database for if one cannot SELECT from it now and again?
Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server", to replicate this MySQL database to a second, reporting database? In my experience, the need to do this only applies to sensitive, security-risk, or high-transaction-volume databases.
System usage: The system is heavily reliant on scheduled CRON jobs every half hour. There may be 500 users per week each logging on and entering some data (but not much data - see table sizes above).
Any comments are warmly welcome.
Thanks for your time.
1) You need 2 $5 digital ocean servers.
2) "Crash the live db and the business loses operations" is absolutely false. They are idiots. What they are likely hiding is the poor structure of their database. They likely have a single-table architecture for all of their clients, only separated by a client_id. Giving access to the table would give access to all of the client data, which is why they force a giant replication solution, so they can make sure you are only getting YOUR data.
3) Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server"? Yes it is.
4) To replicate this MySQL database to a second, reporting database? Yes, this is good practice, as you can set up failover in the event that the worst happens. If you are really paranoid you can set up remote failover with different companies; seeing as this is in the financial sector, I am pretty sure you want that.
5) In my experience, the need to do this only applies to sensitive, security-risk or high-transaction-volume databases. In my experience it is always good to have your data backed up, because sh*t happens in life, and usually when you least expect it.
As for your real-time usage: assuming the database is structured properly with indexes and uses InnoDB, you should have minimal issues supporting 100 requests per second, so I think your 500-users-a-week load is nothing to worry about.
Like I mentioned, what you likely want is 2 servers at different providers, likely the cheapest instances you can get since you don't need a huge amount of space or resources. You can set up DNS to make one the primary and one the replication slave, then in a disaster scenario change the DNS and make the other one the master.
I hope this helps.

Database structure for a system with multisite - Database & PHP

The system I'm working on is structured as below, given that I'm planning to use Joomla as the base.
a (www.a.com), b (www.b.com), c (www.c.com) are search portals which allow users to search for reservations.
x (www.x.com), y (www.y.com), z (www.z.com) are hotels where bookings are made by users.
www.a.com's users can only search for the bookings which are in www.x.com
www.b.com's users can only search for the bookings which are in www.x.com and www.y.com
www.c.com's users can search for all the bookings, which are in www.x.com, www.y.com and www.z.com
a, b, c, x, y and z all run the same system, but they should have separate domains. So according to my findings and research, the architecture should be as above, where an API integrates all database calls.
Given that only 6 instance are shown here(a,b,c,x,y,z). There can be up to 100 with different search combinations.
My problems,
Should I maintain a single database for the whole system? If so, how can I unplug one instance if required (e.g. removing www.a.com or www.z.com from the system)? Since I'm using MySQL, won't it become cumbersome for the system due to the number of records?
If I maintain a separate database for each instance, how can I do the search? How can I integrate the required records into one set and search over them?
Is there a different database approach to be used rather than those mentioned above?
The problem you describe is "multitenancy" - it's a fairly tricky problem to solve, but luckily, others have written up some useful approaches. (Though the link is to Microsoft, it applies to most SQL environments except in the details).
The trade-offs in your case are:
Does the data from your hotels fit into a single schema? Do their "vacancy" records have the same fields?
How many hotels will there be? 3 separate databases is kinda manageable; 30 is probably not; 300 is definitely not.
How large will the database grow? How many vacancy records?
How likely is it that the data structures will change over time? How likely is it that one hotel will need a change that the others don't?
By far the simplest to manage and develop against is the "single database" model, but only if the data is moderately homogenous in schema, and as long as you can query the data with reasonable performance. I'd not worry about putting a lot of records in MySQL - it scales very well.
In such a design, you'd map "portal" to "hotel" in a lookup table:
PortalHotelAccess
PortalID HotelID
-----------------
A X
B X
B Y
C X
C Y
C Z
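As a hedged illustration of the single-database model, here is a small PHP sketch of a search restricted to the hotels a portal may see, via the PortalHotelAccess table above; the vacancies table and its columns are assumptions for illustration.
<?php
// Sketch: restrict a reservation search to the hotels the portal is allowed to see.

function searchVacancies(PDO $db, string $portalId, string $date): array
{
    $stmt = $db->prepare(
        'SELECT v.*
           FROM vacancies v
           JOIN PortalHotelAccess pha ON pha.HotelID = v.hotel_id
          WHERE pha.PortalID = ?
            AND v.available_on = ?'
    );
    $stmt->execute([$portalId, $date]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// "Unplugging" a portal (e.g. www.a.com) is then just deleting its rows from the
// lookup table; unplugging a hotel means deleting its lookup rows and its vacancy data.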
I can suggest 2 approaches. Which one to choose depends on some additional information about the whole system. In fact, the main question is whether your system can impersonate (substitute itself for, in the legal sense) any of the data providers (x, y, z, etc.) from the consumers' point of view (a, b, c, etc.) or not.
Centralized DB
The first approach is actually based on your original scheme with a centralized API. It implies a single search engine collecting the required data from the data sources, aggregating it in its own DB, and providing it to the data consumers.
This is most likely the preferable solution if the data sources differ in their data representation, so you need to preprocess it for uniformity. This variant also protects your clients from connectivity problems: if one of the source sites goes offline for a short period (I think this may even be up to several hours without a great impact on how up to date the booking service is), you can still handle requests for the offline site and store all new documents in the central DB until the problem is solved. On the other hand, this means that you have to provide some sort of two-way synchronization between your DB and every data source site. Also, the centralized DB should be built with reliability in mind from the start, so it probably has to be distributed (preferably over different data centers).
As a result, this approach will probably give the best user experience, but will require significant effort for a robust implementation.
Multiple Local DBs
If every data provider runs its own DB, but all of them (including the backend APIs) are based on a single standard, you can eliminate the need to copy their data into a central DB. Of course, the central point should remain, but it will host only middle-layer logic, without a DB. That layer is actually an API which binds (x, y, z) with the appropriate (a, b, c); that is, configuration and nothing more. Every consumer site will host a widget (it can be just JavaScript or a fully-fledged web application) loaded from your central point with the appropriate settings embedded into it.
The widget will request all the specified backends directly and aggregate their results into a single list.
This variant is much like how most of today's web applications work; it's simpler to implement, but it is more error-prone.
