Import data from multiple CSV files into one master table - PHP

I have a MySQL database into which I want to import data from multiple CSV files. For the data I created a table into which the several files are supposed to be merged (joined). Unfortunately the data is so large that it takes quite a long time until everything is stored in that table. Hence the question: what is the best way to deal with such a huge amount of data?
I took the liberty of creating a temporary table for each CSV file and loading the data into it. Then I joined all the tables and wanted to insert the result of my query into the big table, and that is where I ran into the problem of the long waiting time. I would like to limit the solutions to the following languages: MySQL and PHP. So far I have used the DataGrip GUI and the SQL console for importing these files.
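For reference, a minimal sketch of the staging-table workflow described above, driven from PHP rather than the DataGrip console; the file names, staging tables (staging_a, staging_b), master table and columns are all illustrative placeholders:
<?php
// Sketch: load each CSV into its own staging table, then let MySQL build the
// master table with a single INSERT ... SELECT over the join.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // allow the client-side files to be read
]);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$files = ['a.csv' => 'staging_a', 'b.csv' => 'staging_b'];
foreach ($files as $csv => $table) {
    $pdo->exec("LOAD DATA LOCAL INFILE '$csv'
                INTO TABLE `$table`
                FIELDS TERMINATED BY ','
                LINES TERMINATED BY '\\n'");
}

// The merge happens entirely inside MySQL instead of row by row through PHP.
$pdo->exec("INSERT INTO master (id, col_a, col_b)
            SELECT a.id, a.col_a, b.col_b
            FROM staging_a a
            JOIN staging_b b ON b.id = a.id");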

Use any data integration tool like Pentaho, then follow the steps below:
Pentaho has a CSV import object
You can join multiple CSV files using a join object
Select all the columns from the merge output
Then push it to MySQL using a DB connector output object

There is a pretty neat library that does exactly this. It helps you migrate data from one source to another, and it does so pretty quickly.
https://github.com/DivineOmega/uxdm

You could use a shell script to loop through the files (this one assumes they're in the current directory):
#!/bin/bash
# Import every CSV file in the current directory into my_table.
# LOAD DATA INFILE is executed by the MySQL server and needs the FILE privilege;
# use LOAD DATA LOCAL INFILE instead if the files only exist on the client machine.
for f in *.csv
do
    mysql -e "LOAD DATA INFILE '$f' INTO TABLE my_table" -u username --password=your_password my_database
done

You can achieve this easily with Pentaho Data Integration (an ETL tool).
It provides a CSV data input step in which you can specify your CSV file. Then link it to a table output step, in which you can use a JDBC or JNDI connection to your MySQL database.

Related

Generate .sql file vs execute queries with php

I am trying to import data from a table in one database into another database.
I cannot just copy it, because the formats of the tables in the two databases are different.
With the data fetched from one database, I am able to create insert queries.
I want to know which is better:
Execute those queries in PHP itself by creating a new connection to second database.
Write all queries to .sql file and then import it directly in second database.
I am looking at the aspects of performance and ease of implementation.
Note: I am expecting the data in the table to be more than ten thousand rows
If you go with the first option, there is a chance that you could make some mistakes.
I would prefer the second option: write all queries to a .sql file and then import it directly into the second database.
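A minimal sketch of that second option in PHP, writing the generated INSERT statements to a .sql file; the database, table and column names and the credentials are illustrative placeholders, and NULL handling is left out:
<?php
// Sketch: generate a .sql file of INSERT statements from the source database.
// source_db, source_table, target_table and the column list are placeholders.
$source = new PDO('mysql:host=localhost;dbname=source_db;charset=utf8mb4', 'user', 'pass');
$source->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$out  = fopen('export.sql', 'w');
$stmt = $source->query('SELECT id, name, price FROM source_table');

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Quote values through the PDO connection so the generated file imports cleanly.
    $values = implode(', ', array_map(fn ($v) => $source->quote((string) $v), $row));
    fwrite($out, "INSERT INTO target_table (id, name, price) VALUES ($values);\n");
}
fclose($out);
// Afterwards: mysql -u user -p target_db < export.sql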
I would certainly go for the second option. Why use PHP for a one-time action?
You can solve this in the database with SQL alone.
I would go for the second option.
Then I would:
Get an overview of both table structures.
Export the data from the first table in a flat file format like CSV.
If necessary, transform the data from the first table to the second using a script or a tool.
Import the modified data into the second table.
The database vendors have good tools for exporting, manipulating and importing data.
If only the names of the tables are different, the import features of the vendor tools often have good functionality for mapping data from one table to another. In my own case I've used Oracle SQL Developer, but please let me know your vendor and I can give you a pointer in the right direction.

Best practice, import data feed and download image on every row with PHP as fast as possible

I have multiple CSV files (150k-500k lines for now) with data I want to import to my MySQL DB.
This is my workflow at the moment:
Import files to a temporary table in db (raw lines)
Select one line at a time, explode it into an array, clean it up and import it.
Every item has an image, which I download using curl. After downloading it, I resize it with CodeIgniter's resizer (GD2). Both of these steps are absolutely necessary and take time. I want (need) to delete and reimport fresh files daily to keep the content fresh.
The reason for the temporary DB save was to see if I could spawn multiple instances of the import script with crontab. This didn't give me the results that I wanted.
Do you have any design ideas on how I can do this in a “fast” way?
The site is running on a 4GB 1.8 Ghz Dual core dedicated server.
Thanks :)
MySQL has a feature called LOAD DATA INFILE which does exactly what it sounds like you're trying to do.
From the question, it's not clear whether you're using it already or not. But even if you are, it sounds like you could improve the way you're doing it.
A SQL script like this could work for you:
LOAD DATA INFILE 'filename.csv'
INTO TABLE tablename
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(
field1,
field2,
field3,
@var1,
@var2
-- etc.
)
SET
field4 = @var1 / 100,
field5 = (SELECT id FROM table2 WHERE name = @var2 LIMIT 1)
-- etc.
That's a fairly complex example, showing how you can import your CSV data directly into your table, and manipulate it into the correct format all in one go.
The great thing about this is that it's actually very quick. We use this to import a 500,000 record file on a weekly basis, and it is several orders of magnitude faster than a PHP program that would read the file and write to the DB. We do run it from a PHP program, but PHP isn't responsible for any of the importing; MySQL does everything itself from the one query.
In our case, even though we do manipulate the import data a lot, we still write it to a temp table, as we have about a dozen further processing steps before it goes into the master table. But in your case it sounds like this method may save you from having to use a temp table at all.
MySQL manual page: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
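Since the answer mentions launching this from a PHP program, here is a minimal sketch of how that might look with PDO; the file path, table name and credentials are placeholders, and LOCAL is assumed because the CSV sits on the web server rather than on the database server:
<?php
// Sketch: PHP only issues the query; MySQL does the actual import.
// /path/to/feed.csv and the imports table are placeholder names.
$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8mb4',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // needed for LOAD DATA LOCAL INFILE
);

$rows = $pdo->exec("LOAD DATA LOCAL INFILE '/path/to/feed.csv'
                    INTO TABLE imports
                    FIELDS TERMINATED BY ','
                    LINES TERMINATED BY '\\n'");
echo "Imported $rows rows\n";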
As for downloading the images, I'm not sure how you could speed that up, other than keeping an eye on which of the imported records have been updated, and only fetching the images for the records that have changed. But I'm guessing if that's a viable solution then you're probably doing it already.
Still, I hope the MySQL suggestion is helpful.
The fastest thing to do is use threading.
I would suggest two Workers: one with a connection to MySQL, and one to download and resize your images. Open the CSV and read it using fgets or whatever; for each line, create a Stackable that will insert into the database, then pass that Stackable to another that will download the file (and knows the ID of the row where the data is stored) and resize it. You might want to employ more than one worker for images ...
http://docs.php.net/Worker
http://docs.php.net/Stackable
http://docs.php.net/Thread
(be sure to reference docs.php.net, the docs build is a little behind)
http://pthreads.org (a basic breakdown of how things work to be found on index)
http://github.com/krakjoe/pthreads (windows downloads available here if you want to test locally )
http://pecl.php.net/package/pthreads (last release is a little out of date)
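If pthreads is not an option, a non-threaded way to get some parallelism on the download side is curl's multi interface (a different technique from the Worker/Stackable approach above). A minimal sketch, where the URL list and the images/ directory are placeholder assumptions:
<?php
// Sketch: download one batch of images in parallel with curl's multi interface.
// $urls (row ID => image URL) would come from the imported rows.
$urls = [
    12 => 'http://example.com/a.jpg',
    13 => 'http://example.com/b.jpg',
];

$multi   = curl_multi_init();
$handles = [];

foreach ($urls as $id => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($multi, $ch);
    $handles[$id] = $ch;
}

// Drive all transfers until every download has finished.
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi);
} while ($running > 0);

foreach ($handles as $id => $ch) {
    file_put_contents("images/$id.jpg", curl_multi_getcontent($ch));
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);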

Importing large amounts of data using MySQL/PHP

We created an import script which imports about 120GB of data into a MySQL database. The data is saved in a few hundred directories (all are separate databases). Each directory contains files with table structures and table data.
The issue is that it works on my local machine with a subset of the actual data, but when the import is run on the server (which takes a few days), not all the tables are created (even tables that were tested locally). The odd thing is that the script, when run on the server, does not show any errors on the creation of the tables.
Here is, at a high level, how the script works:
Find all directories that represent a database
Create all databases
Per database loop through the tables: create table, fill table
Added the code on gist: https://gist.github.com/3349872
Add more logging to see which steps succeeded, since you might be having problems with memory usage or execution times.
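A minimal sketch of that kind of per-step logging, assuming the layout described in the question (one directory per database, one .sql file per table, each file executable as-is); paths and credentials are placeholders:
<?php
// Sketch: wrap every create/fill step in logging so a silent failure on the
// server becomes visible.
$pdo = new PDO('mysql:host=localhost', 'root', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$log = fopen('import.log', 'a');

foreach (glob('data/*', GLOB_ONLYDIR) as $dir) {
    $db = basename($dir);
    fwrite($log, date('c') . " creating database $db\n");
    $pdo->exec("CREATE DATABASE IF NOT EXISTS `$db`");
    $pdo->exec("USE `$db`");

    foreach (glob("$dir/*.sql") as $tableFile) {
        fwrite($log, date('c') . " running $tableFile ... ");
        try {
            $pdo->exec(file_get_contents($tableFile));
            fwrite($log, "ok\n");
        } catch (PDOException $e) {
            // Log and carry on so one bad file does not skip the rest of the tables.
            fwrite($log, 'FAILED: ' . $e->getMessage() . "\n");
        }
    }
    fwrite($log, date('c') . ' memory: ' . memory_get_usage(true) . " bytes\n");
}
fclose($log);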
Why don't you create SQL files from the given CSV files and then just do a normal import in bash?
mysql -u root -ppassword db_something< db_user.sql
Ouch, the problem was in the code. Amazingly stupid mistake.
When testing the code on a subset of all the files, all table information and table content were available. When a table could not be created, the function logs a statement and then returns. On the real data this was the mistake, because there are files with no data and no structure, so after creating a few tables, the creation of a table in a certain database went wrong and the function returned, and so the other tables were never created.

Only import tables from a complete MySql database export

If I have exported a .sql file with my database in it, can I then only import "parts" of that database instead of the entire database to MySql?
The question appeared when I was trying it out on a test database.
I exported the testdatabase.
Then emptied some of the tables in the database.
Then I planned on importing from the .sql file, hoping the emptied tables would be refilled with whatever they were populated with.
But I get an error:
#1007 Can't create database 'database_name' - database exists
Of course it exists, but is it possible to only import the values of the already existing tables from the .sql backup?
Or must I remove the entire database and then import the database?
FYI I am using PhpMyAdmin for this currently.
It's straightforward to edit the file and remove the parts you're not interested in having restored, Camran.
Alternatively, import the entire file into a separate database (change the database name at the top of the file) and then use INSERT statements to copy the data from the tables in this new database to the other.
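A minimal sketch of that copy step from PHP, assuming the dump was restored into a hypothetical scratch_db and only a couple of tables need refilling; all names and credentials are placeholders:
<?php
// Sketch: copy the restored rows from the scratch database back into the live one.
$pdo = new PDO('mysql:host=localhost', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

foreach (['users', 'orders'] as $table) { // only the tables you emptied
    $pdo->exec("INSERT INTO live_db.`$table` SELECT * FROM scratch_db.`$table`");
}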
I solved this problem by writing a script to dump each table into each individual file and all the CREATE TABLE statements off in their own file. It's gotten a bit long and fancy over the years, so I can't really post it here.
The other approach is to tell MySQL to ignore errors. With the CLI, you provide the -f switch, but I'm not familiar enough with PhpMyAdmin to know how to do that.

SQL/PHP: How to upload big database to server when I have import file size limit? And then update

I'm creating a big database locally using MySQL and phpMyAdmin. I'm constantly adding a lot of data to it. I now have more than 10MB of data and I want to export the database to the server, but I have a 10MB file size limit in the Import section of my web host's phpMyAdmin.
So, the first question is: how can I split the data (or do something like that) to be able to import it?
BUT, because I'm constantly adding new data locally, I also need to export the new data to the web host database.
So the second question is: how do I update the database if the newly added data sits in between all the 'old/already uploaded' data?
Don't use phpMyAdmin to import large files. You'll be way better off using the mysql CLI to import a dump of your DB. Importing is very easy: transfer the SQL file to the server and afterwards execute the following on the server (you can launch this command from a PHP script using shell_exec or system if needed): mysql --user=user --password=password database < database_dump.sql. Of course the database has to exist, and the user you provide should have the necessary privilege(s) to update the database.
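A minimal sketch of launching that import from a PHP script, as mentioned above; the credentials, database name and dump path are placeholders:
<?php
// Sketch: hand the dump to the mysql client instead of parsing it in PHP.
$cmd = sprintf(
    'mysql --user=%s --password=%s %s < %s 2>&1',
    escapeshellarg('user'),
    escapeshellarg('password'),
    escapeshellarg('database'),
    escapeshellarg('/path/to/database_dump.sql')
);
echo shell_exec($cmd); // surfaces any errors from the mysql client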
As for syncing changes: that can be very difficult, and depends on a lot of factors. Are you the only party providing new information, or are others adding new records as well? Are you going to modify the table structure over time as well?
If you're the only one adding data, and the table structure doesn't vary then you could use a boolean flag or a timestamp to determine the records that need to be transferred. Based on that field you could create partial dumps with phpMyAdmin (by writing a SQL command and clicking Export at the bottom, making sure you only export the data) and import these as described above.
BTW You could also look into setting up a master-slave scenario with MySQL, where your data is transferred automatically to the other server (just another option, which might be better depending on your specific needs). For more information, refer to the Replication chapter in the MySQL manual.
What I would do, in 3 steps:
Step 1:
Export your db structure, without content. This is easy to manage on the export page of phpMyAdmin. After that, I'd insert that into the new db.
Step 2:
Add a new BOOL column to every table in your local db. Its function is to store whether a row is new or not. Because of this, set the default to true.
Step 3:
Create a PHP script which connects to both databases. The script needs to get the data from your local database and put it into the new one.
I would do this with the following MySQL statements: SHOW TABLES (http://dev.mysql.com/doc/refman/5.0/en/show-tables.html), DESCRIBE (http://dev.mysql.com/doc/refman/5.0/en/describe.html), SELECT, UPDATE and INSERT.
Then you have to run your script every time you want to sync your local PC with the server.
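A minimal sketch of such a sync script, assuming the flag column from step 2 is called is_new and exists in every table; the host names, credentials and column name are placeholder assumptions:
<?php
// Sketch: push rows flagged as new from the local database to the remote one,
// then clear the flag locally.
$local  = new PDO('mysql:host=localhost;dbname=local_db', 'user', 'pass');
$remote = new PDO('mysql:host=remote.example.com;dbname=live_db', 'user', 'pass');
$local->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$remote->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

foreach ($local->query('SHOW TABLES')->fetchAll(PDO::FETCH_COLUMN) as $table) {
    $rows = $local->query("SELECT * FROM `$table` WHERE is_new = 1")
                  ->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) {
        continue;
    }

    // Build a prepared INSERT from the column names, leaving the flag out.
    $cols = array_diff(array_keys($rows[0]), ['is_new']);
    $placeholders = implode(', ', array_fill(0, count($cols), '?'));
    $insert = $remote->prepare(
        "INSERT INTO `$table` (`" . implode('`, `', $cols) . "`) VALUES ($placeholders)"
    );

    foreach ($rows as $row) {
        unset($row['is_new']);
        $insert->execute(array_values($row));
    }
    $local->exec("UPDATE `$table` SET is_new = 0 WHERE is_new = 1");
}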
