MySQL live database migration/conversion - PHP

This is probably something any team will encounter at some point, so I'm counting on the experience others have had.
We are in the process of migrating an old MySQL database to a new database whose structure has changed quite a bit. Some tables were split into multiple smaller tables, and some data was joined from multiple smaller tables into one larger table.
We ran a test and it takes a few hours to migrate the database to the new form. The problem is that the old database is our production database and it changes every minute. We cannot have a few hours of downtime.
What approach do you think would be ok in such a situation?
Let's say you have a table called "users" with 1M rows. It changes every second: some fields are updated, some rows are added and some rows are deleted. That's why we cannot just take a snapshot at a certain point in time; after the migration is done, we would have 3 hours of unsynced data.

One approach I've used in the past was to use replication.
We created a replication scheme between the old production database and a slave which was used for the data migration. When we started the migration, we switched off the replication temporarily, and used the slave database as the data source for the migration; the old production system remained operational.
Once the migration script had completed, and our consistency checks had run, we re-enabled replication from the old production system to the replicated slave. Once the replication had completed, we hung up the "down for maintenance" sign on production, re-ran the data migration scripts and consistency checks, pointed the system to the new database, and took down the "down for maintenance" sign.
There was downtime, but it was minutes, rather than hours.
This does depend on your database schema making it easy to identify changed/new data.
If your schema does not lend itself to easy querying to find new or changed records, and you don't want to add new columns to keep track of this, the easiest solution is to create separate tables to keep track of the migration status.
For instance:
TABLE: USERS (your normal, replicated table)
----------------------
USER_ID
NAME
ADDRESS
.....
TABLE: USERS_STATUS (keeps track of changes, only exists on the "slave")
-----------------
USER_ID
STATUS
DATE
You populate this table via a trigger on the USERS table for insert, delete and update - for each of those actions, you set a separate status.
This allows you to quickly find all records that changed since you ran your first migration script, and only migrate those records.
Because you're not modifying your production environment, and the triggers only fire on the "slave" environment, you shouldn't introduce any performance or instability problems on the production environment.
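As a rough sketch (column types and the single-letter status codes are placeholders, and this assumes statement-based replication, since row-based replication does not fire slave-side triggers for replicated changes), the tracking table and triggers could look like this:

CREATE TABLE USERS_STATUS (
    USER_ID INT NOT NULL,
    STATUS  CHAR(1) NOT NULL,   -- 'I' = inserted, 'U' = updated, 'D' = deleted
    DATE    DATETIME NOT NULL,
    KEY (DATE),
    KEY (USER_ID)
);

CREATE TRIGGER users_ai AFTER INSERT ON USERS FOR EACH ROW
    INSERT INTO USERS_STATUS (USER_ID, STATUS, DATE) VALUES (NEW.USER_ID, 'I', NOW());

CREATE TRIGGER users_au AFTER UPDATE ON USERS FOR EACH ROW
    INSERT INTO USERS_STATUS (USER_ID, STATUS, DATE) VALUES (NEW.USER_ID, 'U', NOW());

CREATE TRIGGER users_ad AFTER DELETE ON USERS FOR EACH ROW
    INSERT INTO USERS_STATUS (USER_ID, STATUS, DATE) VALUES (OLD.USER_ID, 'D', NOW());

The delta pass then only needs something like SELECT DISTINCT USER_ID, STATUS FROM USERS_STATUS WHERE DATE > [timestamp of the first migration run], instead of scanning the whole USERS table.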

There's one approach I used once that should work for you too; however, you'll need to modify your production datasets for it. Briefly:
Add a new column named "migrated" (or so) to every table you want to migrate. Give it a boolean type. Set it to 0 by default.
When your migration script runs, it has to set this flag to 1 for every entry that has been migrated to the new db. All entries that are already "1" are ignored, so you won't run into synchronization issues.
That way you can run the migration script as often as you like.
You will have downtime, but it will be minimal, because during that downtime you only have to migrate a few records (essentially the last "delta" between the last run of the migration script and now).
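A minimal sketch of that idea, with made-up table and column names:

-- one-time change to each production table you want to migrate
ALTER TABLE users ADD COLUMN migrated TINYINT(1) NOT NULL DEFAULT 0;

-- each run of the migration script only picks up what is still pending
SELECT * FROM users WHERE migrated = 0;

-- ...insert those rows into the new schema, then mark each copied row as done
UPDATE users SET migrated = 1 WHERE user_id = 42;   -- repeated (or batched) for each copied id

One thing to watch out for: if a row can change after it has been migrated, the application (or a trigger) also has to reset migrated back to 0 on UPDATE, otherwise that later change will be missed by the next run.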

Could you run the new database in parallel with the current one? That way you can later migrate the old data from your old db to your new one and your "live" situation will already have been captured on the new one.
What I mean is: when you write something to the old db, you will also have to write the data to the new one.
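A sketch of what that dual write looks like at the SQL level, assuming (purely as an example) that the old users table is being split into users and user_addresses in the new database:

-- the write the application already performs against the old schema
INSERT INTO olddb.users (user_id, name, address) VALUES (42, 'Alice', '1 Main St');

-- the same logical write, issued against the new schema at the same time
INSERT INTO newdb.users (user_id, name) VALUES (42, 'Alice');
INSERT INTO newdb.user_addresses (user_id, address) VALUES (42, '1 Main St');

Once every write path is mirrored like this, the one-off migration only has to backfill the rows that existed before the dual writing started.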

Related

Auto sync multiple SQL tables with different table structures

I'm going to build a new version of my website. The old one is written in CakePHP, and for the new one I'm going to use Laravel; for that I need to split some large tables into smaller ones.
An example is given above.
But until development of the new project is complete, I need to sync data between these tables.
I can't use cron or replication in this situation, because a cron job will take time, and I can't update the old code because that will also take time.
So how do I do this?

Clone object and save in a different table

I have a number of tables with a large number of rows, some of them nearing a million. There are background tasks which keep accessing some recent records in these tables. Because of the ever increasing size, the tasks keep on taking a longer time to complete. Besides, when showing data on the front end, the calls to server also take a very long time.
Hence, I thought it is better to create a replica of such tables (as an archive) and keep saving data in these 'archive' tables (for future use if any). The idea is that whenever a record is completely processed, it will be deleted from the 'live' tables and be stored in the 'archive' tables.
PHP clone does not work, as it creates an entity exactly the same as the original.
One definite way is to follow the exact same steps to create the archive entity and keep modifying it in parallel with the original.
Is there a better way to do this?
What you are looking for is "Partitioning". Both MySQL and Postgres have some beefy manuals.
Probably the best way to implement this is to use a daemon script that runs the partitioning queries at a regular interval.
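For example (table, column and partition names are invented, and the right layout depends on your queries), a range-partitioned table plus the periodic maintenance statement the daemon would run might look like:

CREATE TABLE tasks (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    payload    TEXT,
    created_at DATE NOT NULL,
    PRIMARY KEY (id, created_at)   -- the partitioning column must be part of every unique key
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
    PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- run periodically by the daemon: split pmax so there is always a partition for new data
ALTER TABLE tasks REORGANIZE PARTITION pmax INTO (
    PARTITION p2025 VALUES LESS THAN (TO_DAYS('2026-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Queries on recent records then only touch the newest partition(s), and old partitions can be dropped or archived with a single ALTER TABLE ... DROP PARTITION instead of row-by-row DELETEs.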

Update a whole table with PHP every x minutes according to a CSV

I have to update a big table (products) in a MySQL database every 10 minutes with PHP. I run the PHP script via a cron job, and I get the most up-to-date products from a CSV file. The table currently has ~18000 rows, and unfortunately I cannot tell how much it will change in a 10-minute period. The most important thing is, of course, that I do not want users to notice the update happening in the background.
These are my ideas and fears:
Idea 1: I know that there is a way to load a CSV file into a table with MySQL, so maybe I can use a transaction to truncate the table and import the CSV. But even if I use transactions, since the table is large, I'm afraid there is a small chance that some users will see an empty table.
Idea 2: I could compare the old and the new CSV files with a library and only update/add/remove the changed rows. This way I don't think it's possible for a user to see an empty table, but I'm afraid this method will cost a lot of RAM and CPU, and I'm on shared hosting.
So basically I would like to know which method is the safest way to update a table completely without users noticing it.
Assuming InnoDB and default isolation level, you can start a transaction, delete all rows, insert your new rows, then commit. Before the commit completes, users will see the previous state.
While the transaction is open (after the deletes), updates will block, but SELECTs will not. Since it's a read only table for the user, it won't be an issue. They'll still be able to SELECT while the transaction is open.
You can learn the details by reading about MVCC. The gist of it is that any time someone performs a SELECT, MySQL uses the data in the database plus the rollback segment to fetch the previous state until the transaction is committed or rolled back.
From MySQL docs:
InnoDB uses the information in the rollback segment to perform the
undo operations needed in a transaction rollback. It also uses the
information to build earlier versions of a row for a consistent read.
Only after the commit completes will the users see the new data instead of the old data, and they won't see the new data until their current transaction is over.
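A bare-bones version of that swap, assuming an InnoDB products table and a CSV written by the cron job (file path, delimiters and column list are placeholders):

START TRANSACTION;

-- TRUNCATE is DDL and would commit implicitly, so use DELETE to stay inside the transaction
DELETE FROM products;

-- LOAD DATA does not implicitly commit on InnoDB; LOCAL requires local_infile to be enabled
LOAD DATA LOCAL INFILE '/path/to/products.csv'
INTO TABLE products
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(sku, name, price);

COMMIT;

Readers keep seeing the old rows right up until the COMMIT, at which point they switch to the new data.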

dump selected data from one db to another in mysql

Here's the situation:
I have a MySQL db on a remote server. I need data from 4 of its tables. On occasion, the schema of these tables is changed (new fields are added, but not removed). At the moment, the tables have > 300,000 records.
This data needs to be imported into the localhost MySQL instance. These same 4 tables exist (with the same names), but the fields needed are a subset of the fields in the remote db tables. The data in these local tables is considered read-only and is never written to. Everything needs to be run in a transaction so there is always some data in the local tables, even if it is a day old. The localhost tables are used by an active website, so this entire process needs to complete as quickly as possible to minimize downtime.
This process runs once per day.
The options as I see them:
Option 1: Get a mysqldump of the structure/data of the remote tables and save it to a file. Drop the localhost tables, and run the dumped SQL script. Then recreate the needed indexes on the 4 tables.
Option 2: Truncate the localhost tables. Run SELECT queries on the remote db in PHP and retrieve only the fields needed instead of the entire row. Then loop through the results and create INSERT statements from this data.
My questions:
Performance-wise, which is my best option?
Which one will complete the fastest?
Will either one put a heavier load on the server?
Would indexing the tables take the same amount of time in both options?
If there is no good reason for having the local d/b be a subset of the remote, make the structure the same and enable database replication on the needed tables. Replication works by the master tracking all changes made, and managing each slave d/b's pointer into the changes. Each slave says give me all changes since the last request. For a sizeable database, this is far more efficient than any alternative you have selected. It comes with only modest cost.
As for schema changes, I think the alter information is logged by the master, so the slave(s) can replicate those as well. The mechanism definitely replicates drop table ... if exists and create table ... select, so alter logically should follow, but I have not tried it.
Here it is: confirmation that alter is properly replicated.
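Roughly, the moving parts on the local replica would be the following (host names, credentials, database and table names are placeholders, and the exact statements depend on your MySQL version):

# my.cnf on the local server: only replicate the four tables you need
replicate-do-table = remotedb.table1
replicate-do-table = remotedb.table2
replicate-do-table = remotedb.table3
replicate-do-table = remotedb.table4

-- then point the local server at the remote master and start replicating
CHANGE MASTER TO
    MASTER_HOST = 'remote.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = '...',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;
START SLAVE;

From then on the local copies stay continuously up to date, so there is no daily bulk import and no downtime window at all.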

How to add index to 18 GB innodb mysql table without affecting production performance?

How can I add an index to an 18 GB InnoDB MySQL table without affecting production performance? The table is frequently accessed. I tried altering the table just now, and it ended up locking out more than 200 queries, which is bad for performance. Are there any other ways to do it?
TheOnly92 - there is another option, one that even Amazon and eBay use. It's not considered 'bad' form to have infrequent maintenance periods where the site is inaccessible. On those occasions, you'll see a maintenance page displayed with a user-friendly message saying that the site is undergoing essential upgrades between the hours of .... etc
This might be the most pragmatic solution, as creating this page will take you 5 mins, whereas the other option may take many hours to figure out and many more to implement. Also, as it would be infrequent, it's unlikely that your users would be put off by such a message or period of downtime.
jim
Another option is to use pt-online-schema-change. It creates a copy of the old table with the new index, and creates triggers that replay all changes from the old table onto the new one. At the end, it swaps the table names so the new copy takes over the old table's name.
It depends: how critical is it that you don't lose new records?
Duplicate the table structure using CREATE TABLE ... LIKE ..., add the new index to the duplicate table, do an INSERT INTO ... SELECT ... FROM ... to grab all the data, then run a pair of ALTERs to rename the old table and then rename the new table to the old table's name.
Unfortunately if any data in the old table changes between the time that the INSERT/SELECT runs and the tables get renamed, it may be lost. It might be possible to get the data back using one of the Maatkit tools for table comparison.
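Spelled out, that copy-and-swap looks roughly like this (table, column and index names are placeholders):

CREATE TABLE big_table_new LIKE big_table;
ALTER TABLE big_table_new ADD INDEX idx_some_column (some_column);

-- bulk copy; anything written to big_table after this starts is what may get lost
INSERT INTO big_table_new SELECT * FROM big_table;

-- swap the names; RENAME TABLE does both renames in one atomic statement,
-- and a pair of ALTER TABLE ... RENAME statements works as well
RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;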
Another common pattern relies on duplicate hardware. Set up a replication slave, add the index there, then make the slave the master.
Neither of these is likely to be fast, but they'll probably be faster than adding the index directly. If you can afford the downtime, however, you really should just take the system down while you're altering/copying/switching slaves. Not having to worry about getting the data back in sync will make your life easier.
(You may wish to consider switching to a database that lets you add indexes without locking the table.)
