How to merge local and live databases?


We've been developing for Wordpress for several years and whilst our workflow has been upgraded at several points there's one thing that we've never solved... merging a local Wordpress database with a live database.
So I'm talking about having a local version of the site where files and data are changed, whilst the data on the live site is also changing at the same time.
All I can find is the perfect-world scenario of pulling the site down, nobody (not even customers) touching the live site, then pushing the local site back up, i.e. copying one thing over the other.
How can this be done without running a tonne of mysql commands? (It feels like they could fall over if they're not properly checked!) Can this be done via Gulp (I've seen it mentioned) or a plugin?
Just to be clear, I'm not talking about pushing/pulling data back and forth via something like WP Migrate DB Pro, BackupBuddy or anything similar - this is a merge, not replacing one database with another.
I would love to know how other developers get around this!
File changes are fairly simple to get around, it's when there's data changes that it causes the nightmare.
WP Stagecoach does do a merge but you can't work locally, it creates a staging site from the live site that you're supposed to work on. The merge works great but it's a killer blow not to be able to work locally.
I've also been told by the developers that datahawk.io will do what I want but there's no release date on that.

It sounds like VersionPress might do what you need:
VersionPress staging
A couple of caveats: I haven't used it, so can't vouch for its effectiveness; and it's currently in early access.

Important: take a backup of the live database before merging local data into it.
The following steps may help migrate a large percentage of the data and merge it into live:
In the WP back end of the local site, go to Tools->Export.
Select the All content radio button (if not selected by default).
This produces an XML file containing all the local data, comprising all default post types and custom post types.
Open this XML file in Notepad++ or any editor and find and replace the local URL with the live URL.
Now visit the live site and import the XML under Tools->Import.
Upload the files (images) manually.
This will bring a large percentage of the data from local to live.
For the rest of the data you will have to write custom scripts.
Risk factors are:
When uploading the images from local to live, images with the same name will be overwritten.
WordPress stores image details in post_meta as serialized data, which should be taken care of when uploading the database.
The serialized data in post_meta for post_type="attachment" covers 3 or 4 sizes of each image.
Usernames or email addresses of imported users may match existing ones (WP checks that usernames and emails are unique), in which case those users might not be imported.

If I were you I'd do the following (slow but affords you the greatest chance of success)
First off, set up a third database somewhere. Cloud services would probably be ideal, since you could get a powerful server with an SSD for a couple of hours. You'll need that horsepower.
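Second, we're going to mysqldump the first DB and pipe the output into our cloud DB. A plain dump of a single database contains no USE statement, so name the target database on the receiving side (here also called dbname; it must already exist on the cloud server).
mysqldump -u user -ppassword dbname | mysql -u root -ppass -h somecloud.db.internet dbname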
Now we have a full copy of DB #1. If your cloud supports snapshotting data, be sure to take one now.
The last step is to write a PHP script that, slowly but surely, selects the data from the second DB and writes it to the third. We want to do this one record at a time. Why? Well, we need to maintain the relationships between records. So let's take comments and posts. When we pull post #1 from DB #2 it won't be able to keep record #1 because DB #1 already had one. So now post #1 becomes post #132. That means that all the comments for post #1 now need to be written as belonging to post #132. You'll also have to pull the records for the users who made those posts, because their user IDs will also change.
There's no easy fix for this, but the WP structure isn't terribly complex. Building a simple loop to pull the data and translate it shouldn't be more than a couple of hours of work.
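To make the ID translation concrete, here is a minimal sketch of that loop. It is written in Python against in-memory SQLite only so that it runs without a MySQL server; the two-table schema (posts, comments) and the helper names make_db and merge_posts are simplified stand-ins invented for the illustration, not the real WordPress tables.

```python
import sqlite3


def make_db(posts, comments):
    """Build a tiny stand-in database (hypothetical schema, not wp_posts)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    db.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
    )
    db.executemany("INSERT INTO posts (id, title) VALUES (?, ?)", posts)
    db.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)", comments)
    db.commit()
    return db


def merge_posts(src, dst):
    """Copy posts and comments from src into dst, one record at a time."""
    id_map = {}  # old post id in src -> new post id in dst
    for old_id, title in src.execute("SELECT id, title FROM posts ORDER BY id"):
        # Leave the id out so the auto-increment in dst assigns a free one.
        cur = dst.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        id_map[old_id] = cur.lastrowid
    for post_id, body in src.execute(
        "SELECT post_id, body FROM comments ORDER BY id"
    ):
        # Rewrite the foreign key so the comment follows its post.
        dst.execute(
            "INSERT INTO comments (post_id, body) VALUES (?, ?)",
            (id_map[post_id], body),
        )
    dst.commit()
    return id_map


if __name__ == "__main__":
    live = make_db([(1, "Live post")], [(1, "live comment")])
    local = make_db([(1, "Local post")], [(1, "local comment")])
    print(merge_posts(local, live))
```

The dictionary is the important part: every table that references a post has to be passed through it, and users need a map of their own for the same reason.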

If I understand you correctly, you want to merge the local and live databases. So far I've been using other software for this, such as Navicat Premium, which has a Data Sync feature.

This can be achieved live using spring-xd, create a JDBC Stream to pull data from one db and insert into the other. (This acts as streaming so you don't have to disturb any environment)

The first thing you need to do is assess if it would be easier to do some copy-paste data entry instead of a migration script. Sometimes the best answer is to suck it up and do it manually using the CMS interface. This avoids any potential conflicts with merging primary keys, but you may need to watch for references like the creator of a post or similar data.
If it's just outright too much to manually migrate, you're stuck with writing a script or finding one that is already written for you. Assuming there's nothing out there, here's what you do...
ALWAYS MAKE A BACKUP BEFORE RUNNING MIGRATIONS!
1) Make a list of what you need to transfer. Do you need users, posts, etc.? Find the database tables and add them to the list.
2) Make a note of all possible foreign keys in the database tables being merged into the new database. For example, wp_posts has post_author referencing wp_users. These will need specific attention during the migration. Use this documentation to help find them.
3) Once you know what tables you need and what they reference, you need to write the script. Start by figuring out what content is new for the other database. The safest way is to do this manually with some kind of side-by-side list. However, you can come up with your own rules on how to automatically match table rows, for example checking $post1->post_content === $post2->post_content in cases where the text needs to be the same. The only catch here is that the primary/foreign keys are off limits for these rules.
4) How do you merge new content? The general idea is that all primary keys will need to be changed for any new content. You want to take everything except the id of the post and insert that into the new database. There will be an auto-increment to create the new id, so you won't need the previous id (unless you want it for script output/debug).
5) The tricky part is handling the foreign keys. This process is going to vary wildly depending on what you plan on migrating. What you need to know is which foreign key goes to which (possibly new) primary key. If you're only migrating posts, you may need to hard-code a user id to user id mapping for the post_author column, then use this to replace the values.
But what if I don't know the user ids for the mapping because some users also need to be migrated?
This is where it gets tricky. You will need to first define the merge rules to see if a user already exists. For new users, you need to record the id of the newly inserted users. Then, after all users are migrated, the post_author value will need to be replaced when it references a newly merged user.
6) Write and test the script! Test it on dummy databases first. And again, make backups before using it on your databases!
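Steps 4 and 5 can be sketched roughly like this. It is an illustration under assumptions, not a finished migration: Python with in-memory SQLite is used so it runs anywhere, the users and posts tables are cut-down stand-ins for wp_users and wp_posts, and "same email means same user" is just one possible merge rule.

```python
import sqlite3

# Hypothetical, simplified schema standing in for wp_users / wp_posts.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, post_author INTEGER, title TEXT);
"""


def new_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def merge_users_and_posts(src, dst):
    """Merge users first, then posts, rewriting post_author on the way."""
    user_map = {}  # user id in src -> user id in dst
    for uid, email, name in src.execute("SELECT id, email, name FROM users"):
        row = dst.execute(
            "SELECT id FROM users WHERE email = ?", (email,)
        ).fetchone()
        if row:
            # Merge rule (an assumption): same email means same user.
            user_map[uid] = row[0]
        else:
            cur = dst.execute(
                "INSERT INTO users (email, name) VALUES (?, ?)", (email, name)
            )
            user_map[uid] = cur.lastrowid  # record the newly created id
    for author, title in src.execute("SELECT post_author, title FROM posts"):
        # Old primary key is dropped, foreign key is translated.
        dst.execute(
            "INSERT INTO posts (post_author, title) VALUES (?, ?)",
            (user_map[author], title),
        )
    dst.commit()
    return user_map
```

Running it against copies of both databases first, as step 6 says, is the cheapest way to find out whether the merge rule is too loose or too strict.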

I've done something similar with an ETL (Extract, Transform, Load) process when I was moving data from one CMS to another.
Rather than writing a script I used the Pentaho Data Integration (Kettle) tool.
The idea of ETL is pretty straightforward:
Extract the data (for instance from one database)
Transform it to suit your needs
Load it to the final destination (your second database).
The tool is easy to use and it allows you to experiment with various steps and outputs to investigate the data. Once you have designed the right ETL process, you are ready to merge those databases of yours.

How can this be done without running a tonne of mysql commands?
No way. If both the local and web sites are running at the same time, how can you prevent the same ids from ending up with different content?

If you want to do this, you can use MySQL replication. I think it will help you merge data across different MySQL databases.

Related

Redis CRUD patterns

I've recently started learning Redis and am currently building an app using it as sole datastore, and I'd like to check with other Redis users if some of my conclusions are correct as well as ask a few questions. I'm using phpredis if that's relevant, but I guess the questions should apply to any language as it's more of a pattern thing.
As an example, consider a CRUD interface to save websites (name and domain) with the following requirements:
Check for existing names/domains when saving/validating a new site (duplicate check)
Listing all websites with sorting and pagination
I have initially chosen the following "schema" to save this information:
A key "prefix:website_ids" in which I use INCR to generate new website id's
A set "prefix:wslist" in which I add the website id generated above
A hash for each website "prefix:ws:ID" with the fields name and domain
The saving/validation issue
With the above information alone I was unable (as far as I know) to check for duplicate names or domains when adding a new website. To solve this issue I've done the following:
Two sets with keys "prefix:wsnames" and "prefix:wsdomains" where I also SADD the website name and domains.
This way, when adding a new website I can check if the submitted name or domain already exist in either of these sets with SISMEMBER and fail the validation if needed.
Now if I'm saving data with 50 fields instead of just 2 and wanted to prevent duplicates I'd have to create a similar set for each of the fields I wanted to validate.
QUESTION 1: Is the above a common pattern to solve this problem or is there any other/better way people use to solve this type of issue?
The listing/sorting issue
To list websites and sort by name or domain (ascending or descending) as well as limiting results for pagination I use something like:
SORT prefix:wslist BY prefix:ws:*->name ALPHA ASC LIMIT 0 10
This gives me 10 website ids ordered by name. Now to get these results I came to the following options (examples in php):
Option 1:
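// The SORT command above, via phpredis
$wslist = $redis->sort('prefix:wslist', array(
    'by' => 'prefix:ws:*->name', 'alpha' => true, 'sort' => 'asc', 'limit' => array(0, 10),
));
$websites = array();
foreach ($wslist as $ws) {
    $websites[$ws] = $redis->hGetAll('prefix:ws:'.$ws);
}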
The above gives me a usable array with website id's as key and an array of fields. Unfortunately this has the problem that I'm doing multiple requests to redis inside a loop and common sense (at least coming from RDBMs) tells me that's not optimal.
The better way would seem to be to use redis pipelining/multi and send all requests in a single go:
Option 2:
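// The SORT command above, via phpredis
$wslist = $redis->sort('prefix:wslist', array(
    'by' => 'prefix:ws:*->name', 'alpha' => true, 'sort' => 'asc', 'limit' => array(0, 10),
));
$redis->multi();
foreach ($wslist as $ws) {
    $redis->hGetAll('prefix:ws:'.$ws);
}
$websites = $redis->exec();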
The problem with this approach is that now I don't get each website's respective ID unless I then loop the $websites array again to associate each one. Another option is to maybe also save a field "id" with the respective website id inside the hash itself along with name and domain.
QUESTIONS 2/3: What's the best way to get these results in a usable array without having to loop multiple times? Is it correct or good practice to also save the id number as a field inside the hash just so I can also get it with the results?
Disclaimer: I understand that the coding and schema building paradigms when using a key->value datastores like Redis are different from RDBMs and document stores and so notions of "best way to do X" are likely to be different depending on the data and application at hand.
I also understand that Redis might not even be the most suitable datastore to use in mostly CRUD type apps but I'd still like to get any insights from more experienced developers since CRUD interfaces are very common on most apps.

Answer 1
Your proposal looks pretty common. I'm not sure why you need an auto-incrementing ID though. I imagine the domain name has to be unique, or the website name has to be unique, or at the very least the combination of the two has to be unique. If this is the case it sounds like you already have a perfectly good key, so why invent an integer key when you don't need it?
Having a SET for domains and a SET for website names is a perfect solution for quickly checking to see if a specific domain or website name already exists. Though, if one of those (domain or website name) is your key you might not even need these SETs since you could just look if the key prefix:ws:domain-or-ws-name-here exists.
Also, using a HASH for each website so you can store your 50 fields of details for the website inside is perfect. That is what hashes are for.
Answer 2
First, let me point out that if your websites and domain names are stored in SORTED SETs instead of SETs, they will already be alphabetized (assuming they are given the same score). If you are trying to support other sort options this might not help much, but wanted to point it out.
Your Option 1 and Option 2 are actually both relatively reasonable. Redis is lightning fast, so Option 1 isn't as unreasonable as it seems at first. Option 2 is clearly even more optimal from the perspective of redis since all the commands will be buffered and executed all at once. Though, it will require additional processing in PHP afterwards, as you noted, if you want the array to be indexed by the id.
There is a 3rd option: Lua scripting. You can have redis execute a Lua script that returns both the ids and hash values all in one shot. But, not being super familiar with PHP anymore and how redis's multi-bulk replies map to PHP's arrays, I'm not 100% sure what the Lua script would look like. You'll need to look for examples or do some trial and error. It should be a pretty simple script, though.
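On the extra processing for Option 2: EXEC returns its replies in the same order the commands were queued, so the ids from SORT and the hashes from EXEC line up position by position and can be paired in one pass. A small sketch, written in Python with plain lists standing in for the two Redis replies (key_by_id is a made-up helper name):

```python
def key_by_id(ids, replies):
    """Pair each id with the reply at the same position, keeping sort order."""
    if len(ids) != len(replies):
        raise ValueError("expected one reply per queued command")
    return dict(zip(ids, replies))


if __name__ == "__main__":
    # Stand-ins for the SORT result and the EXEC result (assumed data).
    wslist = ["7", "3"]
    websites = [
        {"name": "alpha", "domain": "alpha.example"},
        {"name": "beta", "domain": "beta.example"},
    ]
    print(key_by_id(wslist, websites))
```

In PHP, array_combine($wslist, $websites) does the same pairing, so storing the id inside each hash is optional rather than required.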
Conclusion
I think redis sounds like a decent solution for your problem. Just keep in mind the dataset needs to always be small enough to keep in memory. If that's not really a concern (unless your fields are huge, you should be able to fit thousands of websites into only a few MB) or if you don't mind having to upgrade your RAM to grow your DB, then Redis is perfectly suitable.
Be familiar with the various persistence options and configurations for redis and what they mean for availability and reliability. Also, make sure you have a backup solution in place. I would recommend having both a secondary redis instance that slaves off of your main instance, and a recurring process that backs up your redis database file at least daily.

Recreate a database using existing php code

So I have an old website which was coded over an extended period of time but has been inactive for 3 or so years. I have the full PHP source to the site, but the problem is I do not have a backup of the database any longer. I'm wondering what the best solution to recreating the database would be? It is a large site so manually going through each PHP file and trying to keep track of which tables are referenced is no small task. I've tried googling for the answer but have had no luck. Does anyone know of any tools that are available to help extract this information from the PHP and at least give me the basis of a database skeleton? Otherwise, has anyone ever had to do this? Any tips to help me along and possibly speed up the process? It is a mySQL database I'm trying to use.

The way I would do it:
Write a stub implementing a subset of MySQLi (or whatever interface was used to access the DB) to intercept all DB accesses.
Replace all DB accesses with the dummy version of yours.
The basic idea is to emulate the DB so that the PHP code runs long enough to activate the various DB accesses, which in turn will allow you to analyze the way the DB is built and used.
From within these dummy functions:
print the SQL code used
regenerate just enough dummy results to let the rest of the code run, based on the tables and fields mentioned in the query parameters and the PHP code that retrieves them (you won't learn much from a SELECT *, but you can see what fields the PHP code expects to get from it)
once you have understood enough of the DB structure, recreate the tables and let the original code work on them little by little
have the previous designer flogged to death for not having provided a way to recreate the DB programmatically

There are currently two answers based on the information you provided.
1) you can't do this
PHP is a dynamically typed language. You could scan your SQL statements for field and table names, but the result will not be complete. If there is a select * from table, you can't see the fields, so you need to check where the PHP accesses them, maybe by name or by index. You're in luck if it is done by name, because then you can extract the field names. Finally, the data types will be missing. Also missing: which columns are indexed, what the primary keys are, constraints, etc.
2) easy, yes you can!
This applies if your PHP uses a modern framework that contains an ORM which created the database for you. The meta information is included in the PHP classes/design.
Just check the manual on how to recreate the database.

Copy SQL From one Server to another and ignore Duplicates

I have 2 websites that offer exactly the same content, just with different layouts. I'm currently updating both sites daily, putting the same content on both. I want to make a PHP script or something similar and run it on a cron to automatically copy the DB from one server to the other, but I don't want it to duplicate the content already there. The database has a unique field which it can check against.
Thanks

You should have only the 1 database, with 2+ templates (skins) for displaying the content. Duplicating the same data in production, when there are no differences between the data sets, is somewhat pointless.
--EDIT-- May 15, 2012 # 7:06 PM ET
If you REALLY want to maintain duplicate production db's, I would suggest a web service that sends the data from one site to another. You can also think about using a trigger in your db. It really does depend on your setup, where the web server(s) are located, the DB server(s), etc.
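If the two databases do stay separate, the unique field mentioned in the question can do the de-duplication by itself: put a UNIQUE constraint on it and let the database skip rows it already has. A minimal sketch, assuming a hypothetical articles table keyed by a unique slug column, in Python with SQLite so it runs without a server. SQLite spells the statement INSERT OR IGNORE; MySQL's equivalent is INSERT IGNORE.

```python
import sqlite3

# Hypothetical table; "slug" plays the role of the unique field.
SCHEMA = (
    "CREATE TABLE articles "
    "(id INTEGER PRIMARY KEY, slug TEXT UNIQUE, body TEXT)"
)


def new_db(rows=()):
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO articles (slug, body) VALUES (?, ?)", rows)
    db.commit()
    return db


def copy_new_rows(src, dst):
    """Copy rows from src to dst, skipping slugs dst already has."""
    rows = src.execute("SELECT slug, body FROM articles").fetchall()
    before = dst.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    # Rows that would violate the UNIQUE constraint are silently skipped.
    dst.executemany(
        "INSERT OR IGNORE INTO articles (slug, body) VALUES (?, ?)", rows
    )
    dst.commit()
    after = dst.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    return after - before  # number of rows actually added
```

Because already-present rows are ignored, the job is safe to run repeatedly from cron; a second run right after the first copies nothing.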

PHP / MySQL Conceptual Database 'Sync' question

I am working on a PHP class implementing PDO to sync a local database's table with a remote one.
The Question
I am looking for some ideas / methods / suggestions on how to implement a 'backup' feature in my 'syncing' process. The idea is: before the actual insert of the data takes place, I do a full wipe of the local table's data. Time is not a factor, so I figure this is the cleanest and simplest solution, and I won't have to worry about checking for differences in the data and all that jazz. The problem is, I want to implement some kind of safety measure in case there is a problem during the insert of data, like loss of internet connection or something. The only idea I have so far is: copy the table to be synced -> wipe that table -> insert the remote table's data into the local table -> if successful, delete the backup copy.

Check out mk-table-sync. It compares two tables on different servers, using checksums of chunks of rows. If a given chunk is identical between the two servers, no copying is needed. If the chunk differs, it copies just the chunk it needs. You don't have to wipe the local table.
Another alternative is to copy the remote data to a distinct table name. If it completes successfully, then DROP the old table and RENAME the new local copy to the original table's name. If the copy fails or is interrupted, then drop the local copy with the distinct name and try again. Meanwhile, your other local table with the previous data is untouched.
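The copy-then-rename alternative might look like the sketch below. It uses Python with SQLite rather than PDO/MySQL purely so that it is runnable as-is; the table names items and items_new and the function refresh_table are placeholders for the illustration, and remote_rows stands for whatever iterator yields the remote table's rows.

```python
import sqlite3


def refresh_table(local, remote_rows):
    """Load remote rows into a staging table, then swap it in.

    Returns True on success. On any failure the old table is left as it was.
    """
    local.execute("DROP TABLE IF EXISTS items_new")
    local.execute("CREATE TABLE items_new (id INTEGER PRIMARY KEY, name TEXT)")
    try:
        local.executemany(
            "INSERT INTO items_new (id, name) VALUES (?, ?)", remote_rows
        )
    except Exception:
        # Copy failed or was interrupted: discard the partial staging table.
        local.rollback()
        local.execute("DROP TABLE IF EXISTS items_new")
        return False
    # Copy finished: replace the old table with the fresh copy.
    local.execute("DROP TABLE items")
    local.execute("ALTER TABLE items_new RENAME TO items")
    local.commit()
    return True
```

The point of the design is that the live table is never emptied: readers keep seeing the previous data until the very last step, so no separate backup copy is needed.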

The following is a web tool that syncs a database between you and a server or another developer.
It is Git based, so you should use Git in your project.
It is only helpful while developing an application; it is not a tool for comparing databases.
To sync databases, you regularly push code to Git.
Git Project : https://github.com/hardeepvicky/DB-Sync

What Would be a Suitable Way to Log Changes Within a Database Using CodeIgniter

I want to create a simple auditing system for my small CodeIgniter application. Such that it would take a snapshot of a table entry before the entry has been edited. One way I could think of would be to create a news_audit table, which would replicate all the columns in the news table. It would also create a new record for each change with the added column of date added. What are your views, and opinions of building such functionality into a PHP web application?

There are a few things to take into account before you decide which solution to use:
If your table is large (or could become large) your audit trail needs to be in a separate table as you describe or performance will suffer.
If you need an audit that can't (potentially) be modified except to add new entries it needs to have INSERT permissions only for the application (and to be cast iron needs to be on a dedicated logging server...)
I would avoid creating audit records in the same table as it might be confusing to another developer (who might not realize they need to filter out the old ones without dates) and will clutter the table with audit rows, which will force the db to cache more disk blocks than it needs to (== performance cost). Also, indexing this properly might be a problem if your db does not index NULLs. Querying for the most recent version will involve a sub-query if you choose to time stamp them all.
The cleanest way to solve this, if your database supports it, is to create an UPDATE TRIGGER on your news table that copies the old values to a separate audit table (which needs only INSERT permissions). This way the logic is built into the database, and so your applications need not be concerned with it; they just UPDATE the data and the db takes care of keeping the change log. The body of the trigger will just be an INSERT statement, so if you haven't written one before it should not take long to do.
If I knew which db you are using I might be able to post an example...
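Since the database isn't named, here is a sketch of such a trigger in SQLite, driven from Python so it can be run as-is. The column list for news (id, title, body) is an assumption, and the trigger syntax differs slightly between databases, so treat it as the shape of the solution rather than something to paste into MySQL or PostgreSQL.

```python
import sqlite3

# Assumed columns for the news table; adjust to the real schema.
SCHEMA = """
CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE news_audit (
    audit_id   INTEGER PRIMARY KEY,
    news_id    INTEGER,
    title      TEXT,
    body       TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER news_audit_on_update AFTER UPDATE ON news
BEGIN
    -- OLD holds the row as it was before the UPDATE.
    INSERT INTO news_audit (news_id, title, body)
    VALUES (OLD.id, OLD.title, OLD.body);
END;
"""


def make_audited_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


if __name__ == "__main__":
    db = make_audited_db()
    db.execute("INSERT INTO news (title, body) VALUES ('Launch', 'v1')")
    db.execute("UPDATE news SET body = 'v2' WHERE id = 1")
    db.commit()
    print(db.execute("SELECT news_id, title, body FROM news_audit").fetchall())
```

The application code only ever issues the UPDATE; the snapshot row appears in news_audit without the PHP side knowing the audit table exists.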

What we did (and you would want to set up archiving beforehand, depending on size and use) was create an audit table that stores user information, the time, and then the changes in XML along with the table name.
If you are on SQL Server 2005+ you can then easily search the XML for changes if needed.
We then added triggers to our table to catch what we wanted to audit (inserts, deletes, updates...)
Then with simple serialization we are able to restore and replicate changes.

What scale are we looking at here? On average, are entries going to be edited often or infrequently?
Depending on how many edits you expect for the average item, it might make more sense to store diffs of large blocks of data as opposed to a full copy of the data.

One way I like is to put it into the table itself. You would simply add a 'valid_until' column. When you "edit" a row, you simply make a copy of it and stamp the 'valid_until' field on the old row. The valid rows are the ones without 'valid_until' set. In short, you make it copy-on-write. Don't forget to make your primary keys a combination of the original primary key and the valid_until field. Also set up constraints or triggers to make sure that for each ID there can be only one row that does not have its valid_until set.
This has upsides and downsides. The upside is fewer tables. The downside is far more rows in your tables. I would recommend this structure if you often need to access old data. By adding a simple WHERE to your queries you can query the state of a table at a previous date/time.
If you only need to access your old data occasionally then I would not recommend this though.
You can take this all the way to the extreme by building a temporal database.
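A rough sketch of the copy-on-write idea, in Python with SQLite so it is runnable. Two things here are my own assumptions rather than part of the description above: a valid_from column is added so that "state at a given time" becomes a simple WHERE, and the "only one current row per ID" rule is enforced with a partial unique index instead of a composite primary key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE news (
    id          INTEGER NOT NULL,
    title       TEXT NOT NULL,
    valid_from  TEXT NOT NULL,   -- assumption: added to simplify time queries
    valid_until TEXT             -- NULL means "current version"
);
-- At most one row per id may have valid_until unset.
CREATE UNIQUE INDEX news_one_current ON news (id) WHERE valid_until IS NULL;
"""


def new_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def save(db, news_id, title, now):
    """Copy-on-write edit: close the current version, then add the new one."""
    db.execute(
        "UPDATE news SET valid_until = ? WHERE id = ? AND valid_until IS NULL",
        (now, news_id),
    )
    db.execute(
        "INSERT INTO news (id, title, valid_from) VALUES (?, ?, ?)",
        (news_id, title, now),
    )
    db.commit()


def title_at(db, news_id, when):
    """Return the title as it was at time `when`, or None if not yet created."""
    row = db.execute(
        "SELECT title FROM news WHERE id = ? AND valid_from <= ? "
        "AND (valid_until IS NULL OR valid_until > ?)",
        (news_id, when, when),
    ).fetchone()
    return row[0] if row else None
```

Timestamps are passed in as ISO date strings, which compare correctly as text; for example, after save(db, 1, "Draft", "2024-01-01") and save(db, 1, "Final", "2024-02-01"), asking title_at(db, 1, "2024-01-15") gives back "Draft".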

In small to medium size projects I use the following set of rules:
All code is stored under a revision control system (e.g. Subversion)
There is a directory for SQL patches in the source code (e.g. patches/)
All files in this directory start with a serial number followed by a short description (e.g. 086_added_login_unique_constraint.sql)
All changes to DB schema must be recorded as separate files. No file can be changed after it's checked in to version control system. All bugs must be fixed by issuing another patch. It is important to stick closely to this rule.
Small script remembers serial number of last executed patch in local environment and runs subsequent patches when needed.
This way you can guarantee that you can recreate your DB schema easily without the need to import a whole data dump. Creating such patches is a no-brainer. Just run the command in console/UI/web frontend and copy-paste it into a patch if successful. Then just add it to the repo and commit the changes.
This approach scales reasonably well. Worked for PHP/PostgreSQL project consisting of 1300+ classes and 200+ tables/views.
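The "small script" from the rules above could look something like this sketch. It is Python with SQLite for the sake of being runnable; the schema_version table name and the apply_patches function are my own placeholders, and the patches are passed in as a dict of file name to SQL text where a real script would read them from the patches/ directory.

```python
import sqlite3


def serial_of(name):
    """'086_added_login_unique_constraint.sql' -> 86"""
    return int(name.split("_", 1)[0])


def apply_patches(db, patches):
    """Run every patch whose serial number is above the last one recorded."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (serial INTEGER NOT NULL)"
    )
    last = db.execute("SELECT MAX(serial) FROM schema_version").fetchone()[0] or 0
    applied = []
    for name in sorted(patches, key=serial_of):
        serial = serial_of(name)
        if serial <= last:
            continue  # already executed in this environment
        db.executescript(patches[name])
        # Remember the serial so the patch is never run twice here.
        db.execute("INSERT INTO schema_version (serial) VALUES (?)", (serial,))
        db.commit()
        applied.append(name)
    return applied


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    patches = {
        "001_create_users.sql":
            "CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);",
        "002_added_login_unique_constraint.sql":
            "CREATE UNIQUE INDEX users_login ON users (login);",
    }
    print(apply_patches(db, patches))  # both patches on a fresh database
    print(apply_patches(db, patches))  # nothing left to do the second time
```

Keeping the last serial inside the database itself means each environment (local, staging, live) tracks its own position, which is what lets the same patch directory bring all of them up to date.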
