We created an import script which imports about 120GB of data into a MySQL database. The data is saved in a few hundred directories (each one is a separate database). Each directory contains files with table structures and table data.
The issue: it works on my local machine with a subset of the actual data, but when the import is run on the server (which takes a few days), not all the tables are created (even tables that were tested locally). The odd thing is that the script, when run on the server, does not show any errors on the creation of the tables.
Here is, at a high level, how the script works:
Find all directories that represent a database
Create all databases
Per database loop through the tables: create table, fill table
Added the code on gist: https://gist.github.com/3349872
Add more logging to see which steps succeeded, since you might be having problems with memory usage or execution times.
Why don't you create SQL files from the given CSV files and then just do a normal import in bash?
mysql -u root -ppassword db_something < db_user.sql
Ouch, the problem was in the code. Amazingly stupid mistake.
When testing the code on a subset of the files, all table information and table content were available. When a table could not be created, the function wrote a log statement and then returned. On the real data this was the mistake: there are files with no data and no structure, so after creating a few tables, the creation of one table in a database failed, the function returned, and the remaining tables of that database were never created.
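For anyone hitting the same thing, here is a rough sketch of the fix in shell form (the original script is not reproduced here). The directory layout, the `*.structure.sql` naming, and the `MYSQL` override variable are all assumptions made for illustration. The point is only that a bad table is logged and skipped with `continue`, so the loop keeps going.

```shell
#!/bin/bash
# Sketch only: the directory layout and the *.structure.sql naming are
# assumptions for illustration, not taken from the gist.

# Create every table of one database directory. A table whose structure
# file is empty is logged and skipped with `continue`, so the remaining
# tables are still processed (a `return` here would end the whole loop).
import_database() {
    local dir="$1" db
    db="$(basename "$dir")"
    local created=0 skipped=0 structure table

    for structure in "$dir"/*.structure.sql; do
        [ -e "$structure" ] || continue          # glob matched nothing
        table="$(basename "$structure" .structure.sql)"

        if [ ! -s "$structure" ]; then           # empty file: no structure
            echo "SKIP $db.$table (empty structure file)" >&2
            skipped=$((skipped + 1))
            continue                             # not `return`
        fi

        # ${MYSQL:-mysql} lets a dry run substitute another command.
        if ${MYSQL:-mysql} "$db" < "$structure"; then
            echo "OK   $db.$table" >&2
            created=$((created + 1))
        else
            echo "FAIL $db.$table" >&2
            skipped=$((skipped + 1))
        fi
    done

    # Per-database summary, so missing tables show up in the log.
    echo "$db: created=$created skipped=$skipped"
}
```

Calling `import_database /path/to/some_db` once per directory gives one summary line per database, which also covers the earlier suggestion to log which steps succeeded.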
Related
I have a MySQL database into which I want to import data from multiple CSV files. For the data I created one table into which I want to merge (join) the several files. Unfortunately my data is too big, so it is quite time-consuming to get everything stored in the table. Therefore the question: what is the best way to deal with a huge amount of data?
I took the liberty of creating a temporary table for each CSV file and loading the data into it. Then I joined all the tables and wanted to insert the result of my query into the big table, and there I already had the problem with the long waiting time. I would like to limit the solutions to the following languages: MySQL, PHP. So far I have used the DataGrip GUI and the SQL console for importing these files.
Use any data integration tool like Pentaho, then follow the steps below:
Pentaho has a CSV import object
You can join multiple CSV files using the join object
Select all the columns from the merged output
Then push it to MySQL using the DB connector output object
There is a pretty neat library that does exactly this. It helps you migrate data from one source to another, and it does so pretty quickly.
https://github.com/DivineOmega/uxdm
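You could use a shell script to loop through the files (this one assumes they're in the current directory):
#!/bin/bash
# Load every CSV file in the current directory into my_table.
for f in *.csv
do
    # Assumes the script runs on the database host: the server reads the file,
    # so pass an absolute path. LOAD DATA defaults to tab-separated fields.
    mysql -u username --password=your_password my_database \
        -e "LOAD DATA INFILE '$PWD/$f' INTO TABLE my_table FIELDS TERMINATED BY ','"
done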
You can achieve this easily with Pentaho Data Integration (an ETL tool).
It provides a CSV data input step in which you specify your CSV file. Then link it to a table output step, in which you can use a JDBC or JNDI connection to your MySQL database.
Hi, I'm building an enterprise management system (PHP based) for a medium-sized company. I'm trying to migrate their existing customer records (about 9000 records) into my db. Our db schemas are different.
Here are the steps I'm planning to take:
1.) Get the .csv file for each table and clean it up (get rid of unnecessary columns, remove blank rows which seem to be littered throughout the table)
2.) Import the tables into my database via phpMyAdmin
3.) Write a PHP script to loop through the tables with this old data, then process the rows and insert them into MY db tables
I was wondering if the plan I outlined above makes sense, or whether there is a better way to do it?
Thanks
Data migration is possible in MySQL Workbench 6.0. I have migrated millions of records with it, so this is not a big deal.
Try
http://www.mysql.com/products/workbench/migrate/
Here's the situation:
I have a MySQL db on a remote server. I need data from 4 of its tables. On occasion, the schema of these tables is changed (new fields are added, but never removed). At the moment, the tables have > 300,000 records.
This data needs to be imported into the localhost MySQL instance. These same 4 tables exist (with the same names), but the fields needed are a subset of the fields in the remote db tables. The data in these local tables is considered read-only and is never written to. Everything needs to be run in a transaction so there is always some data in the local tables, even if it is a day old. The localhost tables are used by an active website, so this entire process needs to complete as quickly as possible to minimize downtime.
This process runs once per day.
The options as I see them:
Get a mysqldump of the structure/data of the remote tables and save to file. Drop the localhost tables, and run the dumped sql script. Then recreate the needed indexes on the 4 tables.
Truncate the localhost tables. Run SELECT queries on the remote db in PHP and retrieve only the fields needed instead of the entire row. Then loop through the results and create INSERT statements from this data.
My questions:
Performance wise, which is my best option?
Which one will complete the fastest?
Will either one put a heavier load on the server?
Would indexing the tables take the same amount of time in both options?
If there is no good reason for having the local d/b be a subset of the remote, make the structure the same and enable database replication on the needed tables. Replication works by the master tracking all changes made, and managing each slave d/b's pointer into the changes. Each slave says give me all changes since the last request. For a sizeable database, this is far more efficient than any alternative you have selected. It comes with only modest cost.
As for schema changes, I think the alter information is logged by the master, so the slave(s) can replicate those as well. The mechanism definitely replicates drop table ... if exists and create table ... select, so alter logically should follow, but I have not tried it.
Here it is: confirmation that alter is properly replicated.
If I have exported a .sql file with my database in it, can I then import only "parts" of that database instead of the entire database into MySQL?
The question appeared when I was trying it out on a test database.
I exported the testdatabase.
Then emptied some of the tables in the database.
Then I planned on importing from the .sql file and hoped the emptied tables would be refilled with whatever they were populated with.
But I get an error:
#1007 Can't create database 'database_name' - database exists
Of course it exists, but is it possible to import only the values of the already existing tables from the .sql backup?
Or must I remove the entire database and then import the database?
FYI I am using PhpMyAdmin for this currently.
It's straightforward to edit the file and remove the parts you're not interested in having restored, Camran.
Alternatively, import the entire file into a separate database (change the database name at the top of the file) and then use INSERT statements to copy the data from the tables in this new database to the other.
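A minimal sketch of that second route, assuming the dump has already been restored into a scratch database and that the tables have identical column layouts in both databases. The function name and the database and table names in the usage line are placeholders, not anything from the question.

```shell
#!/bin/bash
# Sketch only: database and table names are placeholders.

# Print one INSERT ... SELECT per table, copying rows from a scratch
# database (where the full dump was restored) into the live database.
# Assumes the tables have the same columns, in the same order, in both.
copy_tables_sql() {
    local src="$1" dst="$2" table
    shift 2
    for table in "$@"; do
        printf 'INSERT INTO `%s`.`%s` SELECT * FROM `%s`.`%s`;\n' \
            "$dst" "$table" "$src" "$table"
    done
}

# Usage (not run here): review the generated SQL, then pipe it to the client.
#   copy_tables_sql restore_tmp database_name users orders | mysql -u root -p
```

Generating the statements first and piping them in afterwards makes it easy to check exactly which tables will be refilled before anything is written.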
I solved this problem by writing a script that dumps each table into its own file and puts all the CREATE TABLE statements in a separate file. It's gotten a bit long and fancy over the years, so I can't really post it here.
The other approach is to tell MySQL to ignore errors. With the CLI, you provide the -f switch, but I'm not familiar enough with PhpMyAdmin to know how to do that.
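For the CLI route, something like the following might do. It is only a sketch: the wrapper function and the `MYSQL` override variable are made up for illustration, and credentials are assumed to come from an option file such as `~/.my.cnf`. The `--force` option is the long form of `-f`.

```shell
#!/bin/bash
# Sketch only: the function name and MYSQL override are hypothetical.

# Replay a dump and keep going past statements that fail, for example
# CREATE DATABASE or CREATE TABLE on objects that already exist.
import_ignoring_errors() {
    local db="$1" dump="$2"
    ${MYSQL:-mysql} --force "$db" < "$dump"
}

# Usage (not run here):
#   import_ignoring_errors database_name backup.sql
```

Keep in mind that every INSERT in the dump still runs, so tables that were not emptied may produce duplicate-key errors (which are then skipped) or, without a unique key, duplicate rows.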
I'm building a big database locally using MySQL and phpMyAdmin. I'm constantly adding a lot of info to the database. Right now I have more than 10MB of data and I want to export the database to the server, but my web host has a 10MB file size limit in the Import section of phpMyAdmin.
So, first question is how I can split the data or something like that to be able to import?
BUT, because I'm constantly adding new data locally, I also need to export the new data to the web host database.
So second question is: How to update the database if the new data added is in between all the 'old/already uploaded' data?
Don't use phpMyAdmin to import large files. You'll be way better off using the mysql CLI to import a dump of your DB. Importing is very easy: transfer the SQL file to the server and afterwards execute the following on the server (you can launch this command from a PHP script using shell_exec or system if needed): mysql --user=user --password=password database < database_dump.sql. Of course the database has to exist, and the user you provide must have the necessary privilege(s) to update the database.
As for syncing changes: that can be very difficult, and depends on a lot of factors. Are you the only party providing new information, or are others adding new records as well? Are you going to modify the table structure over time as well?
If you're the only one adding data, and the table structure doesn't vary then you could use a boolean flag or a timestamp to determine the records that need to be transferred. Based on that field you could create partial dumps with phpMyAdmin (by writing a SQL command and clicking Export at the bottom, making sure you only export the data) and import these as described above.
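If you have shell access, the same kind of partial dump can be produced with mysqldump instead of phpMyAdmin. A rough sketch, assuming each table has a timestamp column (called `updated_at` here, a made-up name) and that credentials come from an option file such as `~/.my.cnf`; the function name and the `MYSQLDUMP` override are hypothetical as well.

```shell
#!/bin/bash
# Sketch only: `updated_at` is a hypothetical timestamp column, and the
# function name and MYSQLDUMP override are made up for illustration.

# Dump only the rows changed since a given time, data only
# (--no-create-info leaves out the CREATE TABLE statements).
dump_changes_since() {
    local db="$1" table="$2" since="$3"
    ${MYSQLDUMP:-mysqldump} --no-create-info \
        --where="updated_at > '$since'" "$db" "$table"
}

# Usage (not run here):
#   dump_changes_since mydb items '2012-08-01 00:00:00' > items_delta.sql
#   mysql --user=user --password=password database < items_delta.sql
```

If some of the exported rows may already exist on the server, mysqldump's `--replace` or `--insert-ignore` options are worth a look, since plain INSERT statements would fail on duplicate keys.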
BTW You could also look into setting up a master-slave scenario with MySQL, where your data is transferred automatically to the other server (just another option, which might be better depending on your specific needs). For more information, refer to the Replication chapter in the MySQL manual.
What I would do, in 3 steps:
Step 1:
Export your db structure, without content. This is easy to do on the export page of phpMyAdmin. After that, I'd insert it into the new db.
Step 2:
Add a new BOOL column to every table in your local db. Its purpose is to store whether a row is new or not, so set the default to true.
Step 3:
Create a PHP script which connects to both databases. The script needs to get the data from your local database and put it into the new one.
I would do this with the following MySQL statements: http://dev.mysql.com/doc/refman/5.0/en/show-tables.html, http://dev.mysql.com/doc/refman/5.0/en/describe.html, select, update and insert
Then you have to run your script every time you want to sync your local PC with the server.