Which database writer model is optimal? - php

I have 2 servers.
Server 1: (Master Server)
- Master Database MariaDB
- PHP Service
+ Reader from Master Database
+ Writer into Master database
Server 2: (Slave Server)
- Slave Database from Server 1
- PHP Service
+ Reader from Slave Database
So, I want to create Writer into Master Database for Server 2.
And I drawing and two way for insert data into Master Server in this image (The direction of the arrow is the direction the data will come.).
I do not know which the way to choose for optimization and I do not have much time to program both tests.
Anyone who has experience in this or a new idea. thank you

I assume the Web Service next to the Master is for ingesting data from sources other than the "Clients"?
The first picture is better. Why incur the extra hop and the extra coding of the second picture?
But... Beware of the "critical read" problem. This where, for example, a client posts a blog comment, then goes to the next web page, but that page fails to show the comment. The problem is that the write to the Master may not have gotten to the Slave before the user gets to that page.
This is solved in a variety of ways; I am merely alerting you of the issue. (The issue is probably more prevalent in the second picture.)
I have done the first picture many times; never even considered the second picture.

Related

HTTP - single user on a single connection

I have a raspberry pi3 running this project from instructables.com
Simple and intuitive web interface for your Raspberry Pi
I need to make the buttons function so that only one user at a time can push a button while a lot of users can view the page. This is to control a pan/tilt web camera where Button Zero pans the camera left, Button One pans the camera right, etc. The raspberry drives relays that drive the motors in the Pelco Camera Pan/tilt mount. I can't have one user trying to pan left while another user on a different http connection tries to pan right. There is no log-in to access this raspberry.
Is there an Apache2 setting to accomplish this? I don't think this can be solved with adding code to the GPIO.php file, or is there? Can I use a semophore or $global flag to limit button actuation with multiple concurrent viewers?
You need a mutex lock. Two quick and dirty ways I may suggest.
Lock a local file on the system
Open the file, and lock it for writing.
During this time if another user attempted to control the camera you would first attempt to open this file with and lock it for writing and it would fail with an error.
At that point you know you can't give the second user control.
PHP lock a file for writing
Note: You could use multiple different files for different locks. For example if you had multiple cameras then you could use multiple files to lock, one for each camera.
Use a database like MySQL
Using a database like MySQL you can lock a specific row in a table, effectively doing the same thing as we did with a file in the last example.
If a second user comes along we again attempt to lock that same row and we will fail, at this point we can reject the second user's request.
Lock a single row in MySQL
Note: You can user multiple rows, where each row may represent a different camera as mentioned above.
Other things to consider
I highly recommend providing your users the ability to see if they are they current user or not, and implementing a way to fairly switch between users so that a single user can't hog all the fun. Perhaps something as simple as a 15 or 30 second timer which switches control between the current active users.

Replicate Master DB to Different Slaves

I have a master database which would be the cloud server that consisted of different schools.
Dashboard type that has the details of each school. Can edit their information and other data.
Now those schools are deployed to their corresponding school location which would be the local server.
Dashboard type that can only edit the specific school deployed in the local server. Can edit their information and other data.
Now what I want to happen is, to synchronize the cloud to local server on their corresponding school if something is changed. That also goes for local to cloud server.
Note: If you guys ever tried Evernote, that can edit the notes information on whatever device you're using and still be able to
synchronize when you have internet or manually clicked synchronize.
When the local server doesn't have internet connection and edited some data in school. Once the internet is up, the data from local and cloud server should be synchronize.
That's the logic that I'm pursuing to have.
Would anyone shed some light for me where to start off? I couldn't think of any solution that fit my problem.
I also think of using php to foreach loop all over the table and data that corresponds to current date and time. But I know that would be so bad.
Edited: I deleted references / posts of other SO questions regarding this matter.
The application pegs that I found are
Evernote
Todoist
Servers:
Local Server Computer: Windows 10 (Deployed in Schools)
Cloud Server: Probably some dedicated hosting that uses phpmyadmin
Not to be picky but, hopefully the answer would be you're talking to a newbie to master to slave database process. I don't have experience for this.
When we used to do this we would:
Make sure every table we wanted to sync had datetime columns for Created; Modified; & Deleted. They would also have a boolean isDeleted column (so rather than physically delete records we would flag it to true and ignore it in queries). This means we could query for any records that have been deleted since a certain time and return an array of these deleted IDs.
In each DB (Master and slave) create a table that stores the last successful sync datetime. In the master this table stores multiple records: 1 for each school, but in the slave it just needs 1 record - the last time it synced with the master.
In your case every so often each of the slaves would:
Call a webservice (a URL) of the master, lets say called 'helloMaster'. It would pass in the school name (or some specific identifier), the last time they successfully synced with the master, authentication details (for security) and expect a response from the master of whether the master had any updates for the school since that datetime provided. Really the point here is just looking for an acknowledgement that the master available and listening (ie. the internet is still up).
Then, the slave would call another webservice, lets say called 'sendUpdates'. It would again pass in the school name, last successful sync, (plus security authentication details) & three arrays for any added, updated and deleted records since last sync. The master just acknowledge receipt. If a receipt was acknowledged then the slave to move to step 3, otherwise the slave would try step 1 again after a pause of some duration. So now the Master has updates from the slave. Note: it is up to the master to decide how to merge any records if there are conflicts with its pending slave updates.
The slave then calls a webservice, lets say 'getUpdates'. It passes in the school name, last successful sync, security authentication details, & the master then return to it three arrays for any added, updated and deleted records it has which the slave is expected to apply to its database.
Finally once the slave tries to update its records it will then notifies the master of success/failure through another webservice, say 'updateStatus'. If successful then the master will return a new sync date for the slave to store (this will exactly match the date the master stores in its table). If it fails then the error is logged in the master and we go back to step 1 after a pause.
I have left out some detail out about error handling, getting the times accurate across all devices (there might be different time zones involved), and some other bits and pieces, but that's the gist of it.
I may make refinements after thinking on it more (or others might edit my post).
Hope that helps at least.
I will suggest you to go with the Trivial Solution, which according to me is:
Create a SQLlite or any database (MySQL or your choice) in local server
Keep a always running thread which will be pinging (makes an API call) your Master database every 5 minutes (depends on how much delay is accepted)
With that thread you can detect whether you're connected to the internet or not.
If connected to internet
a) Send local changes with the request to master server, this master server is an application server, which will be capable to update changes of local machines in school (you received this changes by an API call) to the master database after certain validations according to your application usage.
b) Receive updated changes from the server after the API call, this changes are served after solving conflicts (like if data in school server was updated earlier than data updated in master database so which one you will accept based on your requirement).
If not connected to internet, keep storing changes in local database and reflect those changes in Application which is running in school, but when you get connected push those changes to master server and pull actual changes which is applicable from the master server.
This is complicated to do it by your own, but if the scale is small I will prefer to implement your own APIs for the database applications which will connect in this manner.
Better solution will be to use Google Firebase, which is a real time database which is asynchronously updated whenever there is change in any machine, but can cost you higher if its really not required. But yes it will really give you Evernote type realtime editing features for your database systems.
This is not a problem that can be solved by database replication.
Generally speaking, database replication can operate in one of two modes:
Master/slave replication, which is what MySQL uses. In this mode, all writes must be routed to a single "master" server, and all of the replica databases receive a feed of changes from the master.
This doesn't suit your needs, as writes can only be made to the master. (Modifying one of the replicas directly would result in it becoming permanently out of sync with the master.)
Quorum-based replication, which is used by some newer databases. All database replicas connect to each other. So long as at least half of all replicas are connected (that is, the cluster has reached "quorum"), writes can be made to any of the active databases, and will be propagated to all of the other databases. A database that is not connected will be brought up to date when it joins the quorum.
This doesn't suit your needs either, as a disconnected replica cannot be written to. Worse, having more than half of all replicas disconnect from the master would prevent the remaining databases from being written to either!
What you need is some sort of data synchronization solution. Any solution will require some logic -- which you will have to write! -- to resolve conflicts. (For instance, if a record is modified in the master database while a school's local replica is disconnected, and the same record is also modified there, you will need some way to reconcile those differences.)
No need for any complicated setup or APIs. MySQL allows you to easily replicate your database. MySQL will ensure the replication is correctly and timely done and whenever internet is available. (and its fast too)
There are:
Master - slave: Master edits slave reads or in other words one way synchronization from master to slave.
Master - Master: Master1 edits master2 reads and edits or in other words two way synchronization. Both server will push and pull updates.
assuming your cloud server has schema for each school and each schema is accessible by its own username and password. i.e db_school1, db_school2
now you have the option to replicate only a selected database schema from your cloud to local master. In your case, school one's local master will only "do replicate db_school1"
in case if you want to replicate only specific table, MySQL also has that option "replicate-do-table"
the actual replication process is very easy but can get very deep when you have different scenarios.
few things you want to take a note, server ids, different auto-increment value on each server to avoid conflicts with new records. i.e Master1 generates records on odd number, Master 2 on even numbers so there won't be a duplicate primary key issues. Server down alerts/monitoring, error skipping
I'm not sure if you are on linux or windows, I've wrote simple c# application which checks if any of the master is not replicating or stopped for any reason and sends email. monitoring is crucial!
here some links for master master replication:
https://www.howtoforge.com/mysql_master_master_replication
https://www.digitalocean.com/community/tutorials/how-to-set-up-mysql-master-master-replication
also worth reading this optimised tabl-level replication info:
https://dba.stackexchange.com/questions/37015/how-can-i-replicate-some-tables-without-transferring-the-entire-log
hope this helps.
Edit:
The original version of this answer proposed MongoDB; but with further reading MongoDB is not so reliable with dodgy internet connections. CouchDB is designed for offline documents, which is what you need - although it's harder to get gong than MongoDB, unfortunately.
Original:
I'd suggest not using MySQL but deploy a document store designed for replication such as CouchDB - unless you go for the commercial MySQL clustering services.
Being a lover of the power of MySQL I find it hard to suggest you use something else, but in this case, you really should.
Here is why -
Problems using MySQL replication
Why MySQL had good replication (and that's most likely what you should be using if you're synchronizing a MySQL database - as recommended by others) there are some things to watch out for.
"Unique Key" clashes will give you a massive headache; the most
likely cause of this is "Auto Incrementing" IDs that are common in
MySQL applications (don't use them for syncing operation unless there
is a clear "read+write"->"read-only" relationship - which there isn't
in your case.)
Primary keys must be generated by each server but unique across all servers. Possibly by adding a mix of a server identifier and a unique ID for that server (Server1_1, Server1_2, Server1_3 etc will not clash with Server2_1)
MySQL sync only supports on-way unless you look at their clustering solutions (https://www.mysql.com/products/cluster/).
Problems doing it "manually" with time stamping the record.
Another answer recommends keeping "Time Updated" records. While I've done this approach there are some big gotchas to be careful of.
"Unique Key" clashes (as mentioned above; same problems - don't use them except primary keys, and generate primary keys unique to the server)
Multiple updates on multiple servers need to be precisely time-synced
and clashes handled according to rules. This can be a headache.
What happens when updates are received way out-of-order; which fields have been updated, which weren't? You probably don't need to update the whole record, but how do you know?
If you must, try one of the commercial solutions as mentioned in answers https://serverfault.com/questions/58829/how-to-keep-multiple-read-write-db-servers-in-sync and https://community.spiceworks.com/topic/352125-how-to-synchronize-multiple-mysql-servers and Strategy on synchronizing database from multiple locations to a central database and vice versa (etc - Google for more)
Problems doing it "manually" with journalling.
Journalling is keeping a separate record of what has changed and when. "Database X, Table Y, Field Z was updated to value A at time B" or "Table A had new record added with these details [...]". This allows you much finer control of what to update.
if you look at database sync techniques, this is actually what is going on in the background; in MySQL's case it keeps a binary log of the updates
you only ever share the journal, never the original record.
When another server receives a journal entry, if has a much greater picture of what has happened before/after and can replay updates and ensure you get the correct details.
problems arise when the journalling/database get out of Sync (MySQL is actually a pain when this happens!). You need to have a "refresh" script ready to roll that sits outside the journalling that will sync the DB to the master.
It's complicated. So...
Solution: Using a document store designed for replication, e.g. MongoDB
Bearing all this that in mind, why not use a document store that already does all that for you? CouchDB has support and handles all the journalling and syncing (http://docs.couchdb.org/en/master/replication/protocol.html).
There are others out there, but I believe you'll end up with less headaches and errors than with the other solutions.
Master to master replication in MySQL can be accomplished without key violations while using auto_increment. Here is a link that explains how.
If you have tables without primary keys I'm not sure what will happen (I always include auto_increment primary keys on tables)
http://brendanschwartz.com/post/12702901390/mysql-master-master-replication
The auto-increment-offset and auto-increment-increment effect the auto_increment values as shown in the config samples from the article...
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
log_bin_index = /var/log/mysql/mysql-bin.log.index
relay_log = /var/log/mysql/mysql-relay-bin
relay_log_index = /var/log/mysql/mysql-relay-bin.index
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
server_id = 2
log_bin = /var/log/mysql/mysql-bin.log
log_bin_index = /var/log/mysql/mysql-bin.log.index
relay_log = /var/log/mysql/mysql-relay-bin
relay_log_index = /var/log/mysql/mysql-relay-bin.index
expire_logs_days = 10
max_binlog_size = 100M
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 2

Online and offline synchronization

I am working on a project that synchronizes online and offline features due to the unstable Internet. I have come up with a possible solution. That is to create 2 similar databases for both online and offline and sync the two. My question is that is this a good method? Or are there better options?
I have researched online on the subject but I haven't come across anything substantive. One useful link I found was on database Replication. But I want the offline version to detect Internet presence and sync accordingly.
Pls can you help me find solutions or clues to solve my problem?
I'd suggest you have an online storage for syncing and a local database(browser indexeddb, program sqllite or something similar) and log all your changes in your local database but have a record with what data was entered after last sync.
When you have a connection you sync all new data with the online storage at set intervals(like once every 5 mins or constant stream if you have the bandwidth/cpu capacity)
When the user logs in from a "fresh" location the online database pushes all data to the client who fills the local database with the data and then it resumes normal syncing function.
Plan A: Primary-Primary replication (formerly called Master-Master). You do need to be careful PRIMARY KEYs and UNIQUE keys. While the "other" machine is offline, you could write conflicting values to a table. Later, when they try to sync up, replication will freeze, requiring manual intervention. (Not a pretty sight.)
Plan B: Write changes to some storage other than the db. This suffers the same drawbacks as Plan A, plus there is a bunch of coding on your part to implement it.
Plan C: Galera cluster with 3 nodes. When all 3 nodes are up, all can take writes. If one node goes down, or network problems make it seem offline to the other two, it will automatically become read-only. After things get fixed, the sync is done automatically.
Plan D: Only write to a reliable Primary; let the other be a readonly Replica. (But this violates your requirement about an "unstable Internet".)
None of these perfectly fits the requirements. Plan A seems to be the only one that has a chance. Let's look at that.
If you have any UNIQUE key in any table and you might insert new rows into it, the problem exists. Even something as innocuous as a 'normalization table' wherein you insert a name and get back an id for use in other tables has the problem. You might do that on both servers with the same name and get different ids. Now you have a mess that is virtually impossible to fix.
Not sure if its outside the scope of the project but you can try these:
https://pouchdb.com/
https://couchdb.apache.org/
" PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser.
PouchDB was created to help web developers build applications that work as well offline as they do online.
It enables applications to store data locally while offline, then synchronize it with CouchDB and compatible servers when the application is back online, keeping the user's data in sync no matter where they next login. "

How to "upgrade" the database in real world?

My company have develop a web application using php + mysql. The system can display a product's original price and discount price to the user. If you haven't logined, you get the original price, if you loginned , you get the discount price. It is pretty easy to understand.
But my company want more features in the system, it want to display different prices base on different user. For example, user A is a golden parnter, he can get 50% off. User B is a silver parnter, only have 30 % off. But this logic is not prepare in the original system, so I need to add some attribute in the database, at least a user type in this example. Is there any recommendation on how to merge current database to my new version of database. Also, all the data should preserver, and the server should works 24/7. (within stop the database)
Is it possible to do so? Also , any recommend for future maintaince advice? Thz u.
I would recommend writing a tool to run SQL queries to your databases incrementally. Much like Rails migrations.
In the system I am currently working on, we have such tool written in python, we name our scripts something like 000000_somename.sql, where the 0s is the revision number in our SCM (subversion), and the tool is run as part of development/testing and finally deploying to production.
This has the benefit of being able to go back in time in terms of database changes, much like in code (if you use a source code version control tool) too.
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
Here are more concrete examples of ALTER TABLE.
http://php.about.com/od/learnmysql/p/alter_table.htm
You can add the necessary columns to your table with ALTER TABLE, then set the user type for each user with UPDATE. Then deploy the new version of your app. that uses the new column.
Did you use an ORM for data access layer ? I know Doctrine comes with a migration API which allow version switch up and down (in case something went wrong with new version).
Outside any framework or ORM consideration, a fast script will minimize slowdown (or downtime if process is too long).
To my opinion, I'd rather prefer a 30sec website access interruption with an information page, than getting shorter interuption time but getting visible bugs or no display at all. If interruption times matters, it's best doing this at night or when lesser traffic.
This can all be done in one script (or at least launched by one commande line), when we'd to do such scripts we include in a shell script :
putting application in standby (temporary static page) : you can use .htaccess redirect or whatever applicable to your app/server environment.
svn udpate (or switch) for source code and assets upgrade
empty caches, cleaning up temp files, etc.
rebuild generated classes (symfony specific)
upgrade DB structure with ALTER / CREATE TABLE querys
if needed, migrate data from old structure to new : depending on what you changed on structure, it may require fetching data before altering DB structure, or use tmp tables.
if all went well, remove temporary page. Upgrade done
if something went wrong display a red message to the operator so it can see what happened, try to fix it and then remove waiting page by hand.
The script should do checks at each steps and stop a first error, and it should be verbose (but concise) about what it does at all steps, thus you can fix the app faster if something has to went wrong.
The best would be a recoverable script (error at step 2 - stop process - manual fix - recover at step 3), I never took the time to implement it this way.
If works pretty well but these kind of script have to be intensively tested, on an environnement as closest as possible to the production one.
In general we develop such scripts locally, and test them on the same platform tha the production env (just different paths and DB)
If the waiting page is not an option, you can go whithout but you need to ensure data and users session integrity. As an example, use LOCK on tables during upgrade/data transfer and use exclusive locks on modified files (SVN does I think)
There could other better solutions, but it's basically what I use and it do the job for us. The major drawback is that kind of script had to be rewritten at each major release, this incitate me to search for other options to do this, but which one ??? I would be glad if someone here had better and simpler alternative.

Best way to handle concurrency issues

i have a LAPP (linux, apache, postgresql and php) environment, but the question is pretty the same both on Postgres or Mysql.
I have an cms app i developed, that handle clients, documents (estimates, invoices, etc..) and other data, structured in 1 postgres DB with many schemas (one for each our customer using the app); let's assume around 200 schemas, each of them used concurrently by 15 people (avg).
EDIT: I do have an timestamp field named last_update on every table, and a trigger that update the timestamp every time the row is update.
The situation is:
People Foo and Bar are editing the document 0001, using a form with every document details.
Foo change the shipment details, for example.
Bar change the phone numbers, and some items in the document.
Foo press the 'Save' button, the app update the db.
Bar press the 'Save' button after bar, resending the form with the old shipment details.
In the database, the Foo changes have been lost.
The situation i want to have:
People Foo, Bar, John, Mary, Paoul are editing the document 0001, using a form with every document details.
Foo change the shipment details, for example.
Bar and the others change something else.
Foo press the 'Save' button, the app update the db.
Bar and the others get an alert 'Warning! this document has been changet by someone else. Click here to load the actuals data'.
I've wondered to use ajax to do this; simply using an hidden field with the id of the document and the last-updated timestamp, every 5 seconds check if the last-updated time is the same and do nothing, else, show the alert dialog box.
So, the page check-last-update.php should look something like:
<?php
//[connect to db, postgres or mysql]
$documentId = isset($_POST['document-id']) ? $_POST['document-id'] : 0;
$lastUpdateTime = isset($_POST['last-update-time']) ? $_POST['last-update-time'] : 0;
//in the real life i sanitize the data and use prepared statements;
$qr = pg_query("
SELECT
last_update_time
FROM
documents
WHERE
id = '$documentId'
");
$ray = pg_fetch_assoc($qr);
if($ray['last_update_time'] > $lastUpdateTime){
//someone else updated the document since i opened it!
echo 'reload';
}else{
echo 'ok';
}
?>
But i dont like to stress the db every 5 seconds for every user that have one (or more...) documents opened.
So, what can be another efficent solution without nuking the db?
I thought to use files, creating for example an empty txt file for each document, and everytime the document is updated, i 'touch' the file updating the 'last modified time' as well... but i guess that this would be slower than db and give problems when i have much users editing the same document.
If someone else have a better idea or any suggestion, please describe it in details!
* - - - - - UPDATE - - - - - *
I definitely choosen to NOT hit the db for check the 'last update timestamp', dont mind if the query will be pretty fast, the (main) database server has other tasks to fullfill, dont like the idea to increase his overload for that thing.
So, im taking this way:
Every time a document is updated by someone, i must do something to sign the new timestamp outside the db environment, e.g. without asking the db. My ideas are:
File-system: for each document i create an empry txt files named as the id of the document, everytime the document is update, i 'touch' the file. Im expecting to have thousands of those empty files.
APC, php cache: this will be probably a more flexible way than the first one, but im wondering if keeping thousands and thousands of data permanently in the apc wont slow down the php execution itself, or consume the server memory. Im little bit afraid to choose this way.
Another db, sqlite or mysql (that are faster and lighter with simple db structures) used to store just the documents ID and timestamps.
Whatever way i choose (files, apc, sub-db) im seriously thinking to use another web-server (lighttp?) on a sub-domain, to handle all those.. long-polling requests.
YET ANOTHER EDIT:
The file's way wouldnt work.
APC can be the solution.
Hitting the DB can be the solution too, creating a table just to handle the timestamps (with only two column, document_id and last_update_timestamp) that need to be as fast and light as possible.
Long polling: that's the way i'll choose, using lighttpd under apache to load static files (images, css, js, etc..), and just for this type of long-polling; This will lighten the apache2 load, specially for the polling.
Apache will proxy-up all those request to lighttpd.
Now, i only have to decide between db solution and APC solution..
p.s: thanks to all whom already answered me, you have been really usefull!
I agree that I probably wouldn't hit the database for this. I suppose I would use APC cache (or some other in-memory cache) to maintain this information. What you are describing is clearly optimistic locking at the detailed record level. The higher the level in the database structure the less you need to deal with. It sounds like you want to check with multiple tables within a structure.
I would maintain a cache (in APC) of the IDs and the timestamps of the last updated time keyed by the table name. So for example I might have an array of table names where each entry is keyed by ID and the actual value is the last updated timestamp. There are probably many ways to set this up with arrays or other structures but you get the idea. I would probably add a timeout to the cache so that entries in the cache are removed after a certain period of time - i.e., I wouldn't want the cache to grow and assume that 1 day old entries aren't useful anymore).
With this architecture you would need to do the following (in addition to setting up APC):
on any update to any (applicable) table, update the APC cache entry with the new timestamp.
within ajax just go as far "back" as php (to obtain the APC cache to check the entry) rather than all of the way "back" to the database.
I think you can use a condition in the UPDATE statement like WHERE ID=? AND LAST_UPDATE=?.
The idea is that you will only succeed in updating when you are the last one reading that row. If someone else has committed something, you will fail, and once you know you've failed, you can query the changes.
Hibernate uses a version field to do that. Give every table such a field and use a trigger to increment it on every update. When storing an update, compare the current version with the version when the data was read earlier. If those don't match, throw an exception. Use transactions to make the check-and-update atomic.
You will need some type of version stamp field for each record. What it is doesn't matter as long as you can guarantee that making any change to a record will result in that version stamp being different. Best practice is to then check and make sure the loaded record's version stamp is the same as the version stamp in the DB when the user clicks save, and if it's different handle it.
How you handle it is up to you. At the very least you'd want to offer to reload from the DB so the user can verify that they still want to save. One up from that would be to attempt to merge their changes into the new DB record and then ask them to verify that the merge worked correctly.
If you want to periodically poll any DB capable of handling your system should be able to take the poll load. 10 users polling once every 5 seconds is 2 transactions per second. This is a trivial load, and should be no problem at all. To keep the average load close to the actual load, just jitter the polling time slightly (instead of doing it exactly every 5 seconds, do it every 4-6 seconds, for example).
Donnie's answer (polling) is probably your best option - simple and works. It'll cover almost every case (its unlikely a simple PK lookup would hurt performance, even on a very popular site).
For completeness, and if you wanted to avoid polling, you can use a push-model. There's various ways described in the Wikipedia article. If you can maintain a write-through cache (everytime you update the record, you update the cache), then you can almost completely eliminate the database load.
Don't use a timestamp "last_updated" column, though. Edits within the same second aren't unheard of. You could get away with it if you add extra information (server that did the update, remote address, port, etc) to ensure that, if two requests came in at the same second, to the same server, you could detect the difference. If you need that precision, though, you might as well use a unique revision field (it doesn't necessarily have to be an incrementing integer, just unique within that record's lifespan).
Someone mentioned persistent connections - this would reduce the setup cost of the polling queries (every connection consumes resources on the database and host machine, naturally). You would keep a single connection (or as few as possible) open all the time (or as long as possible) and use that (in combination with caching and memoization, if desired).
Finally, there are SQL statements that allow you to add a condition on UPDATE or INSERT. My SQl is really rusting, but I think its something like UPDATE ... WHERE .... To match this level of protection, you would have to do your own row locking prior to sending the update (and all the error handling and cleanup that might entail). Its unlikely you'd need this; I'm just mentioning it for completness.
Edit:
Your solution sounds fine (cache timestamps, proxy polling requests to a another server). The only change I'd make is to update the cached timestamps on every save. This will keep the cache fresher. I'd also check the timestamp directly from the db when saving to prevent a save sneaking in due to stale cache data.
If you use APC for caching, then a second HTTP server doesn't make sense - you'd have to run it on the same machine (APC uses shared memory). The same physical machine would be doing the work, but with the additional overhead of a second HTTP server. If you want to off load the polling requests to a second server (lighttpd, in your case), then it would be better to setup lightttpd in front of Apache on a second physical machine and use a shared caching server (memcache) so that the lighttpd server can read the cached timestamps, and Apache can update the cached timestamps. The rationale for putting lighttpd in front of Apache is, if most requests are polling requests, to avoid the heavier-weight Apache process usage.
You probably don't need a second server at all, really. Apache should be able to handle the additional requests. If it can't, then I'd revisit your configuration (specifically the directives that control how many worker processes you run and how many requests they are allowed to handle before being killed).
Your approach of querying the database is the best one. If you do it every 5 seconds and you have 15 concurrent users then you're looking at ~3 queries a second. It should be a very small query too, returning only one row of data. If your database can't handle 3 transactions a second then you might have to look at a better database because 3 queries/second is nothing.
Timestamp the records in the table so you can quickly see if anything has changed without having to diff each field.
This is slightly off topic, but you can use the PEAR package (or PECL package, I forget which) xdiff to send back good user guidance when you do get a collision.
First off only update the fields that have changed on when writing to the database, this will decrease database load.
Second, query the timestamp of the last update, if you have a older timestamp then the current version in the database then throw the warning to the client.
Third is to somehow push this information to the client, though some kind of persistent connection with the server, enabling a concurrent two way connection.
Polling is rarely a nice solution.
You could do the timstamp check only when the user (with the open document) is doing something active with the document like scrolling, moving the mouse over it or starts to edit. Then the user gets an alert if the document has been changed.
.....
I know it was not what you asked for but ... why not a edit-singleton?
The singleton could be a userID column in the document-table.
If a user wants to edit the document, the document is locked for edit by other users.
Or have edit-singletons on the individual fields/groups of information.
Only one user can edit the document at a time. If another user has the document open and want to edit a single timestamp check reveal that the document has been altered and is reloaded.
With a singleton there is no polling and only one timestamp check when the user "touches" and/or wants to edit the document.
But perhaps a singleton mechanism doesn't fit your system.
Regards
Sigersted
Ahhh, i though it was easyer.
So, lets make the point: i have a generic database (pgsql or mysql doesn't matter), that contains many generic objects.
I have $x (actually $x = 200, but is growing, hoping will reach 1000 soon) of exact copy of this database, and for each of them up to 20 (avg 10) users for 9 hours at day.
If one of those users is viewing a record, any record, i must advice him if someone edit the same record.
Let's say Foo is watching the document 0001, sit up for a coffee, Bar open and edit the same document, when Foo come back he must see an 'Warning, someone else edited this document! click here to refresh tha page.'.
That'all i need atm, probably i'll extend this situation, adding a way to see the changes and rollback, but this is not the point.
Some of you suggested to check the 'last update' timestamp only when foo try to save the document; Can be a solution too, but i need something in real-time ( 10 sec deelay ).
Long polling, bad way, but seem to be the only one.
So, what i've done:
Installed Lighttp on my machine (and php5 as fastcgi);
Loaded apache2's proxy module (all, or 403 error will hit you);
Changed the lighttpd port from 80 (that is used by apache2) to 81;
Configured apache2 to proxying the request from mydomain.com/polling/* to polling.mydomain.com (served with Lighttp)
Now, i have another sub http-service that i'll use both for polling and load static content (images, etc..), in order to reduce the apache2's load.
Becose i dont want to nuke the database for the timestamp check, i've tryed some caches system (that can be called from php).
APC: quite simple to install and manage, very lightweight and faster, this would be my first choice.. if only the cache would be sharable between two cgi process (i need to store in cache a value from apache2's php process, and read it from lighttpd's php process)
Memcached: around 4-5 times slower than APC, but run as a single process that can be touched everywhere in my environment. I'll go with this one, atm. (even if is slower, the use i'll do of it is relatively simple).
Now, i just have to try this system loading some test datas to see ho will move 'under pressure' and optimize it.
I suppost this environment will work for other long-polling situations (chat?)
Thanks to everyone who gave me hear!
I suggest: when you first query the record that might be changed, hang onto a local copy. When "updating", compare the copy in the locked table/row against your copy, and if it's changed, kick it back to the user.

Categories