What is the best way to do testing database (MYSQL spesific) - php

Right now i'm on testing something in a database. It's a wordpress database. i have to write and delete and do other operation on it. As you know it, it has indexing mechanism that will always make every new post inherit the next highest possible ID.
Please consider that this database is a copying of used database. it has been written before. So, i will need to make sure when i finish my testing, it will be the same
Right now, my only solution is making backup. So if i have end in some section of planned testing, i will backup it and start next testing on another copy of it.
Fortunately, the size of database is only a small one. so delete and copy and backup it will be easy. but i know this way of database testing is only partial solution.It force me to create too many backup copy. I don't know what i will do if the database has bigger size. it will be a very long of testing nightmare.
so i wonder is there any solution that work just like rollback. So it will just lock the database and just put new entry as some kind of cache. I can erase it or write it into the database.
i use mysql and phpmyadmin and use it to developed some custom solution.
EDIT ::: How to effectively doing testing on database when developing PHP solution ?

If your file system supports writeable snapshots - LVM, ZFS, Veritas, etc. - you can take an instant copy of the entire database partition, mount that in another place, start a new instance of MYSQL which uses the snapshot, perform your testing, remove your snapshot - all without disturbing your replica.
The snapshot only needs storage for the amount of data that gets changed during your testing, and so might only need a few GB.

After you delete the record, when you insert the new record, if you set the id to the number you want, rather than leaving it blank, it should just fill that index again.

Is the post number an autoincrement field? If not, hack into WordPress (temporarily) and look for the code where the new posts are stored. Before saving why don't you add a constant e.g. 1000 to post id number. When you are finished with your testing simply delete the id numbers that are greater than your constant.

Related

How to merge local and live databases?

We've been developing for Wordpress for several years and whilst our workflow has been upgraded at several points there's one thing that we've never solved... merging a local Wordpress database with a live database.
So I'm talking about having a local version of the site where files and data are changed, whilst the data on the live site is also changing at the same time.
All I can find is the perfect world scenario of pulling the site down, nobody (even customers) touching the live site, then pushing the local site back up. I.e copying one thing over the other.
How can this be done without running a tonne of mysql commands? (it feels like they could fall over if they're not properly checked!) Can this be done via Gulp's (I've seen it mentioned) or a plugin?
Just to be clear, I'm not talking about pushing/pulling data back and forth via something like WP Migrate DB Pro, BackupBuddy or anything similar - this is a merge, not replacing one database with another.
I would love to know how other developers get around this!
File changes are fairly simple to get around, it's when there's data changes that it causes the nightmare.
WP Stagecoach does do a merge but you can't work locally, it creates a staging site from the live site that you're supposed to work on. The merge works great but it's a killer blow not to be able to work locally.
I've also been told by the developers that datahawk.io will do what I want but there's no release date on that.
It sounds like VersionPress might do what you need:
VersionPress staging
A couple of caveats: I haven't used it, so can't vouch for its effectiveness; and it's currently in early access.
Important : Take a backup of Live database before merging Local data to it.
Follow these steps might help in migrating the large percentage of data and merging it to live
Go to wp back-end of Local site Tools->Export.
Select All content radio button (if not selected by default).
This will bring an Xml file containing all the local data comprised of all default post types and custom post types.
Open this XML file in notepad++ or any editor and find and replace the Local URL with the Live URL.
Now visit the Live site and Import the XML under Tools->Import.
Upload the files (images) manually.
This will bring a large percentage of data from Local to Live .
Rest of the data you will have to write custom scripts.
Risk factors are :
When uploading the images from Local to Live , images of same name
will be overriden.
Wordpress saves the images in post_meta generating a serialized data for the images , than should be taken care of when uploading the database.
Serialized data in post_meta for post_type="attachment" saves serialized data for 3 or 4 dimensions of the images.
Usernames or email ids of users when importing the data , can be same (Or wp performs the function of checking unique usernames and emails) then those users will not be imported (might be possible).
If I were you I'd do the following (slow but affords you the greatest chance of success)
First off, set up a third database somewhere. Cloud services would probably be ideal, since you could get a powerful server with an SSD for a couple of hours. You'll need that horsepower.
Second, we're going to mysqldump the first DB and pipe the output into our cloud DB.
mysqldump -u user -ppassword dbname | mysql -u root -ppass -h somecloud.db.internet
Now we have a full copy of DB #1. If your cloud supports snapshotting data, be sure to take one now.
The last step is to write a PHP script that, slowly but surely, selects the data from the second DB and writes it to the third. We want to do this one record at a time. Why? Well, we need to maintain the relationships between records. So let's take comments and posts. When we pull post #1 from DB #2 it won't be able to keep record #1 because DB #1 already had one. So now post #1 becomes post #132. That means that all the comments for post #1 now need to be written as belonging to post #132. You'll also have to pull the records for the users who made those posts, because their user IDs will also change.
There's no easy fix for this but the WP structure isn't terribly complex. Building a simple loop to pull the data and translate it shouldn't be more then a couple of hours of work.
If I understand you, to merge local and live database, until now I'm using other software such as NavicatPremium, it has Data Sycn feature.
This can be achieved live using spring-xd, create a JDBC Stream to pull data from one db and insert into the other. (This acts as streaming so you don't have to disturb any environment)
The first thing you need to do is asses if it would be easier to do some copy-paste data entry instead of a migration script. Sometimes the best answer is to suck it up and do it manually using the CMS interface. This avoids any potential conflicts with merging primary keys, but you may need to watch for references like the creator of a post or similar data.
If it's just outright too much to manually migrate, you're stuck with writing a script or finding one that is already written for you. Assuming there's nothing out there, here's what you do...
ALWAYS MAKE A BACKUP BEFORE RUNNING MIGRATIONS!
1) Make a list of what you need to transfer. Do you need users, posts, etc.? Find the database tables and add them to the list.
2) Make a note all possible foreign keys in the database tables being merged into the new database. For example, wp_posts has post_author referencing wp_users. These will need specific attention during the migration. Use this documentation to help find them.
3) Once you know what tables you need and what they reference, you need to write the script. Start by figuring out what content is new for the other database. The safest way is to do this manually with some kind of side-by-side list. However, you can come up with your own rules on how to automatically match table rows. Maybe to check for $post1->post_content === $post2->post_content in cases the text needs to be the same. The only catch here is the primary/foreign keys are off limits for these rules.
4) How do you merge new content? The general idea is that all primary keys will need to be changed for any new content. You want to use everything except for the id of post and insert that into the new database. There will be an auto-increment to create the new id, so you wont need the previous id (unless you want it for script output/debug).
5) The tricky part is handling the foreign keys. This process is going to vary wildly depending on what you plan on migrating. What you need to know is which foreign key goes to which (possibly new) primary key. If you're only migrating posts, you may need to hard-code a user id to user id mapping for the post_author column, then use this to replace the values.
But what if I don't know the user ids for the mapping because some users also need to be migrated?
This is where is gets tricky. You will need to first define the merge rules to see if a user already exists. For new users, you need record the id of the newly inserted users. Then after all users are migrated, the post_author value will need to be replaced when it references a newly merged user.
6) Write and test the script! Test it on dummy databases first. And again, make backups before using it on your databases!
I've done something simillar with ETL (Extract, Transform, Load) process when I was moving data from one CMS to another.
Rather than writing a script I used a Pentaho Data Integration (Kettle) tool.
The Idea of ETL is pretty much straight forward:
Extract the data (for instance from one database)
Transform it to suit your needs
Load it to the final destination (your second database).
The tool is easy to use and it allows you to experiment with various steps and outputs to investigate the data. When you design a right ETL proces, you are ready to merge those databases of yours.
How can this be done without running a tonne of mysql commands?
No way. If both local and web sites are running at the same time how can you prevent not having the same ids' with different content?
so if you want to do this you can use mysql repication.i think it will help you to merge with different database mysql.

What will be better: click counter on mysql or on flat file?

I always was sure it is better and faster to use flat files to store realtime visit/click counter data: open file in append mode, lock it, put data and then close. Then read this file by crontab once in a five minutes, store contents to DB and truncate file for new data.
But today my friend told me, that it is a wrong way. It will better to have a permanent MySql connection and write data right to DB on every click. First, DB can store results to memory table. Second, even we store to a table located on disk, then this file is permanently opened by it, so no need to find it on disk and open again and again on every query.
What do you think about it?
UPD: We talking about high-traffic sites, about million per day.
Your friend is right. Write to a file and then a cronjob sending to database every 5 minutes? That sounds very convoluted. I can't imagine a good reason for not writing directly to DB.
Also, when you write to a file in the way you described, the operations are serialized. A user will have to wait for the other one to release the lock before writing. That simply won't scale if you ever need it. The same will happen with a DB if you always write to the same row, but you can have multiple rows for the same value, write to a random one and sum them when you need the total.
It doesn't make much sense to use a memory table in this case. If your data doesn't need to be persisted, it's much simpler to use a memcache you probably already have somewhere and simply increment the value for the key.
If you use a database WITHOUT transactions, you will get the same underlying performance as using files with more reliability and less coding.
It could be true that writing to a database is heavy - e.g. the DB could be on a different server so you have network traffic, or it could be a transactional DB in which case every write has at least 2 writes (potentially more if indexes are involved), but if you're aware of all this stuff then you can use a DB, take advantage of decades of work by others and make your programming task easy.

Writing xml feed to database, How do I safely delete old records and update with new?

I am writing information from an XML feed to a database for use on our site. We have found the xml feeds can be inconsistent, so writing info to the database has been a good solution for us.
Ideally I want to cron a file once a day that parses the xml and then writes it to the database. What methodology should I use to eliminate the data from the previous day because I no longer need it once we cron the file and update with the new daily records.
Bad:
cron file -> delete old records -> write new records
What if the xml is not quite right or there is a problem with the script? Then we blew away the data and can't get any new data at the moment.
If the XML info is bad, at least I can then write in some php on the front end to still display the older data but with dates modified or something.
What type of checks and fail safes would be best for my application? I need to update the records each day but only delete the old records if I know for sure we have good new data to import.
I would suggest a backup in the form of a mysql dump. Essentially, the dump is a snapshot of a database at a given time. So if you start the process and something goes wrong, you can revert it back to the point it was at before you started. The workflow would be something along the lines of:
Create dump -> try {Delete old records -> Create new records } catch (Load dump back into database)
If you are using mySQL more information on dumps can be found at: http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
most other databases have some form of dump as well
Create a guid for your table by hashing a couple of the fields together - whichever ones are persistant between updates. For example, if you are updating inventory you might use the distributor and sku as the input for your guid.
Then when you update just use a mysql REPLACE query to exchange the old data for new data.
REPLACE
Or use an INSERT...on duplicate key update
The nice thing about this is if your script fails for some reason you can safely run it again without getting extra rows pushed into your table.
If you are worried about bad XML data being pushed into your db just validate all your data before pushing it into your table and anything that shouldn't go just gets skipped.
You might want to take a sql backup at the beginning of the script - and if somehow your table gets really messed up you can always go back and restore to a safe backup.

Best way to handle concurrency issues

i have a LAPP (linux, apache, postgresql and php) environment, but the question is pretty the same both on Postgres or Mysql.
I have an cms app i developed, that handle clients, documents (estimates, invoices, etc..) and other data, structured in 1 postgres DB with many schemas (one for each our customer using the app); let's assume around 200 schemas, each of them used concurrently by 15 people (avg).
EDIT: I do have an timestamp field named last_update on every table, and a trigger that update the timestamp every time the row is update.
The situation is:
People Foo and Bar are editing the document 0001, using a form with every document details.
Foo change the shipment details, for example.
Bar change the phone numbers, and some items in the document.
Foo press the 'Save' button, the app update the db.
Bar press the 'Save' button after bar, resending the form with the old shipment details.
In the database, the Foo changes have been lost.
The situation i want to have:
People Foo, Bar, John, Mary, Paoul are editing the document 0001, using a form with every document details.
Foo change the shipment details, for example.
Bar and the others change something else.
Foo press the 'Save' button, the app update the db.
Bar and the others get an alert 'Warning! this document has been changet by someone else. Click here to load the actuals data'.
I've wondered to use ajax to do this; simply using an hidden field with the id of the document and the last-updated timestamp, every 5 seconds check if the last-updated time is the same and do nothing, else, show the alert dialog box.
So, the page check-last-update.php should look something like:
<?php
//[connect to db, postgres or mysql]
$documentId = isset($_POST['document-id']) ? $_POST['document-id'] : 0;
$lastUpdateTime = isset($_POST['last-update-time']) ? $_POST['last-update-time'] : 0;
//in the real life i sanitize the data and use prepared statements;
$qr = pg_query("
SELECT
last_update_time
FROM
documents
WHERE
id = '$documentId'
");
$ray = pg_fetch_assoc($qr);
if($ray['last_update_time'] > $lastUpdateTime){
//someone else updated the document since i opened it!
echo 'reload';
}else{
echo 'ok';
}
?>
But i dont like to stress the db every 5 seconds for every user that have one (or more...) documents opened.
So, what can be another efficent solution without nuking the db?
I thought to use files, creating for example an empty txt file for each document, and everytime the document is updated, i 'touch' the file updating the 'last modified time' as well... but i guess that this would be slower than db and give problems when i have much users editing the same document.
If someone else have a better idea or any suggestion, please describe it in details!
* - - - - - UPDATE - - - - - *
I definitely choosen to NOT hit the db for check the 'last update timestamp', dont mind if the query will be pretty fast, the (main) database server has other tasks to fullfill, dont like the idea to increase his overload for that thing.
So, im taking this way:
Every time a document is updated by someone, i must do something to sign the new timestamp outside the db environment, e.g. without asking the db. My ideas are:
File-system: for each document i create an empry txt files named as the id of the document, everytime the document is update, i 'touch' the file. Im expecting to have thousands of those empty files.
APC, php cache: this will be probably a more flexible way than the first one, but im wondering if keeping thousands and thousands of data permanently in the apc wont slow down the php execution itself, or consume the server memory. Im little bit afraid to choose this way.
Another db, sqlite or mysql (that are faster and lighter with simple db structures) used to store just the documents ID and timestamps.
Whatever way i choose (files, apc, sub-db) im seriously thinking to use another web-server (lighttp?) on a sub-domain, to handle all those.. long-polling requests.
YET ANOTHER EDIT:
The file's way wouldnt work.
APC can be the solution.
Hitting the DB can be the solution too, creating a table just to handle the timestamps (with only two column, document_id and last_update_timestamp) that need to be as fast and light as possible.
Long polling: that's the way i'll choose, using lighttpd under apache to load static files (images, css, js, etc..), and just for this type of long-polling; This will lighten the apache2 load, specially for the polling.
Apache will proxy-up all those request to lighttpd.
Now, i only have to decide between db solution and APC solution..
p.s: thanks to all whom already answered me, you have been really usefull!
I agree that I probably wouldn't hit the database for this. I suppose I would use APC cache (or some other in-memory cache) to maintain this information. What you are describing is clearly optimistic locking at the detailed record level. The higher the level in the database structure the less you need to deal with. It sounds like you want to check with multiple tables within a structure.
I would maintain a cache (in APC) of the IDs and the timestamps of the last updated time keyed by the table name. So for example I might have an array of table names where each entry is keyed by ID and the actual value is the last updated timestamp. There are probably many ways to set this up with arrays or other structures but you get the idea. I would probably add a timeout to the cache so that entries in the cache are removed after a certain period of time - i.e., I wouldn't want the cache to grow and assume that 1 day old entries aren't useful anymore).
With this architecture you would need to do the following (in addition to setting up APC):
on any update to any (applicable) table, update the APC cache entry with the new timestamp.
within ajax just go as far "back" as php (to obtain the APC cache to check the entry) rather than all of the way "back" to the database.
I think you can use a condition in the UPDATE statement like WHERE ID=? AND LAST_UPDATE=?.
The idea is that you will only succeed in updating when you are the last one reading that row. If someone else has committed something, you will fail, and once you know you've failed, you can query the changes.
Hibernate uses a version field to do that. Give every table such a field and use a trigger to increment it on every update. When storing an update, compare the current version with the version when the data was read earlier. If those don't match, throw an exception. Use transactions to make the check-and-update atomic.
You will need some type of version stamp field for each record. What it is doesn't matter as long as you can guarantee that making any change to a record will result in that version stamp being different. Best practice is to then check and make sure the loaded record's version stamp is the same as the version stamp in the DB when the user clicks save, and if it's different handle it.
How you handle it is up to you. At the very least you'd want to offer to reload from the DB so the user can verify that they still want to save. One up from that would be to attempt to merge their changes into the new DB record and then ask them to verify that the merge worked correctly.
If you want to periodically poll any DB capable of handling your system should be able to take the poll load. 10 users polling once every 5 seconds is 2 transactions per second. This is a trivial load, and should be no problem at all. To keep the average load close to the actual load, just jitter the polling time slightly (instead of doing it exactly every 5 seconds, do it every 4-6 seconds, for example).
Donnie's answer (polling) is probably your best option - simple and works. It'll cover almost every case (its unlikely a simple PK lookup would hurt performance, even on a very popular site).
For completeness, and if you wanted to avoid polling, you can use a push-model. There's various ways described in the Wikipedia article. If you can maintain a write-through cache (everytime you update the record, you update the cache), then you can almost completely eliminate the database load.
Don't use a timestamp "last_updated" column, though. Edits within the same second aren't unheard of. You could get away with it if you add extra information (server that did the update, remote address, port, etc) to ensure that, if two requests came in at the same second, to the same server, you could detect the difference. If you need that precision, though, you might as well use a unique revision field (it doesn't necessarily have to be an incrementing integer, just unique within that record's lifespan).
Someone mentioned persistent connections - this would reduce the setup cost of the polling queries (every connection consumes resources on the database and host machine, naturally). You would keep a single connection (or as few as possible) open all the time (or as long as possible) and use that (in combination with caching and memoization, if desired).
Finally, there are SQL statements that allow you to add a condition on UPDATE or INSERT. My SQl is really rusting, but I think its something like UPDATE ... WHERE .... To match this level of protection, you would have to do your own row locking prior to sending the update (and all the error handling and cleanup that might entail). Its unlikely you'd need this; I'm just mentioning it for completness.
Edit:
Your solution sounds fine (cache timestamps, proxy polling requests to a another server). The only change I'd make is to update the cached timestamps on every save. This will keep the cache fresher. I'd also check the timestamp directly from the db when saving to prevent a save sneaking in due to stale cache data.
If you use APC for caching, then a second HTTP server doesn't make sense - you'd have to run it on the same machine (APC uses shared memory). The same physical machine would be doing the work, but with the additional overhead of a second HTTP server. If you want to off load the polling requests to a second server (lighttpd, in your case), then it would be better to setup lightttpd in front of Apache on a second physical machine and use a shared caching server (memcache) so that the lighttpd server can read the cached timestamps, and Apache can update the cached timestamps. The rationale for putting lighttpd in front of Apache is, if most requests are polling requests, to avoid the heavier-weight Apache process usage.
You probably don't need a second server at all, really. Apache should be able to handle the additional requests. If it can't, then I'd revisit your configuration (specifically the directives that control how many worker processes you run and how many requests they are allowed to handle before being killed).
Your approach of querying the database is the best one. If you do it every 5 seconds and you have 15 concurrent users then you're looking at ~3 queries a second. It should be a very small query too, returning only one row of data. If your database can't handle 3 transactions a second then you might have to look at a better database because 3 queries/second is nothing.
Timestamp the records in the table so you can quickly see if anything has changed without having to diff each field.
This is slightly off topic, but you can use the PEAR package (or PECL package, I forget which) xdiff to send back good user guidance when you do get a collision.
First off only update the fields that have changed on when writing to the database, this will decrease database load.
Second, query the timestamp of the last update, if you have a older timestamp then the current version in the database then throw the warning to the client.
Third is to somehow push this information to the client, though some kind of persistent connection with the server, enabling a concurrent two way connection.
Polling is rarely a nice solution.
You could do the timstamp check only when the user (with the open document) is doing something active with the document like scrolling, moving the mouse over it or starts to edit. Then the user gets an alert if the document has been changed.
.....
I know it was not what you asked for but ... why not a edit-singleton?
The singleton could be a userID column in the document-table.
If a user wants to edit the document, the document is locked for edit by other users.
Or have edit-singletons on the individual fields/groups of information.
Only one user can edit the document at a time. If another user has the document open and want to edit a single timestamp check reveal that the document has been altered and is reloaded.
With a singleton there is no polling and only one timestamp check when the user "touches" and/or wants to edit the document.
But perhaps a singleton mechanism doesn't fit your system.
Regards
Sigersted
Ahhh, i though it was easyer.
So, lets make the point: i have a generic database (pgsql or mysql doesn't matter), that contains many generic objects.
I have $x (actually $x = 200, but is growing, hoping will reach 1000 soon) of exact copy of this database, and for each of them up to 20 (avg 10) users for 9 hours at day.
If one of those users is viewing a record, any record, i must advice him if someone edit the same record.
Let's say Foo is watching the document 0001, sit up for a coffee, Bar open and edit the same document, when Foo come back he must see an 'Warning, someone else edited this document! click here to refresh tha page.'.
That'all i need atm, probably i'll extend this situation, adding a way to see the changes and rollback, but this is not the point.
Some of you suggested to check the 'last update' timestamp only when foo try to save the document; Can be a solution too, but i need something in real-time ( 10 sec deelay ).
Long polling, bad way, but seem to be the only one.
So, what i've done:
Installed Lighttp on my machine (and php5 as fastcgi);
Loaded apache2's proxy module (all, or 403 error will hit you);
Changed the lighttpd port from 80 (that is used by apache2) to 81;
Configured apache2 to proxying the request from mydomain.com/polling/* to polling.mydomain.com (served with Lighttp)
Now, i have another sub http-service that i'll use both for polling and load static content (images, etc..), in order to reduce the apache2's load.
Becose i dont want to nuke the database for the timestamp check, i've tryed some caches system (that can be called from php).
APC: quite simple to install and manage, very lightweight and faster, this would be my first choice.. if only the cache would be sharable between two cgi process (i need to store in cache a value from apache2's php process, and read it from lighttpd's php process)
Memcached: around 4-5 times slower than APC, but run as a single process that can be touched everywhere in my environment. I'll go with this one, atm. (even if is slower, the use i'll do of it is relatively simple).
Now, i just have to try this system loading some test datas to see ho will move 'under pressure' and optimize it.
I suppost this environment will work for other long-polling situations (chat?)
Thanks to everyone who gave me hear!
I suggest: when you first query the record that might be changed, hang onto a local copy. When "updating", compare the copy in the locked table/row against your copy, and if it's changed, kick it back to the user.

What Would be a Suitable Way to Log Changes Within a Database Using CodeIgniter

I want to create a simple auditing system for my small CodeIgniter application. Such that it would take a snapshot of a table entry before the entry has been edited. One way I could think of would be to create a news_audit table, which would replicate all the columns in the news table. It would also create a new record for each change with the added column of date added. What are your views, and opinions of building such functionality into a PHP web application?
There are a few things to take into account before you decide which solution to use:
If your table is large (or could become large) your audit trail needs to be in a seperate table as you describe or performance will suffer.
If you need an audit that can't (potentially) be modified except to add new entries it needs to have INSERT permissions only for the application (and to be cast iron needs to be on a dedicated logging server...)
I would avoid creating audit records in the same table as it might be confusing to another developer (who might no realize they need to filter out the old ones without dates) and will clutter the table with audit rows, which will force the db to cache more disk blocks than it needs to (== performance cost). Also to index this properly might be a problem if your db does not index NULLS. Querying for the most recent version will involve a sub-query if you choose to time stamp them all.
The cleanest way to solve this, if your database supports it, is to create an UPDATE TRIGGER on your news table that copies the old values to a seperate audit table which needs only INSERT permissions). This way the logic is built into the database, and so your applications need not be concerned with it, they just UPDATE the data and the db takes care of keeping the change log. The body of the trigger will just be an INSERT statement, so if you haven't written one before it should not take long to do.
If I knew which db you are using I might be able to post an example...
What we do (and you would want to set up archiving beforehand depending on size and use), but we created an audit table that stores user information, time, and then the changes in XML with the table name.
If you are in SQL2005+ you can then easily search the XML for changes if needed.
We then added triggers to our table to catch what we wanted to audit (inserts, deletes, updates...)
Then with simple serialization we are able to restore and replicate changes.
What scale are we looking at here? On average, are entries going to be edited often or infrequently?
Depending on how many edits you expect for the average item, it might make more sense to store diff's of large blocks of data as opposed to a full copy of the data.
One way I like is to put it into the table itself. You would simply add a 'valid_until' column. When you "edit" a row, you simply make a copy of it and stamp the 'valid_until' field on the old row. The valid rows are the ones without 'valid_until' set. In short, you make it copy-on-write. Don't forget to make your primary keys a combination of the original primary key and the valid_until field. Also set up constraints or triggers to make sure that for each ID there can be only one row that does not have it's valid_until set.
This has upsides and downsides. The upside is less tables. The downside is far more rows in your tables. I would recommend this structure if you often need to access old data. By simply adding a simple WHERE to your queries you can query the state of a table at a previous date/time.
If you only need to access your old data occasionally then I would not recommend this though.
You can take this all the way to the extreme by building a Temportal database.
In small to medium size project I use the following set of rules:
All code is stored under Revision Control System (i.e. Subversion)
There is a directory for SQL patches in source code (i.e. patches/)
All files in this directory start with serial number followed by short description (i.e. 086_added_login_unique_constraint.sql)
All changes to DB schema must be recorded as separate files. No file can be changed after it's checked in to version control system. All bugs must be fixed by issuing another patch. It is important to stick closely to this rule.
Small script remembers serial number of last executed patch in local environment and runs subsequent patches when needed.
This way you can guarantee, that you can recreate your DB schema easily without the need of importing whole data dump. Creating such patches is no brainer. Just run command in console/UI/web frontend and copy-paste it into patch if successful. Then just add it to repo and commit changes.
This approach scales reasonably well. Worked for PHP/PostgreSQL project consisting of 1300+ classes and 200+ tables/views.

Categories