I've been having an issue for days now and have hit a brick wall. Firstly, as the title suggests, I have been working on importing CSV files into a SQL database.
To be more specific, the import is handled by PHP scripts on the server, which load the data into a MySQL database.
I currently have around 30 CSV files (this number is projected to increase) which are updated daily; a cron script is then triggered once per day to import the new data. It loads each file with LOAD DATA INFILE. All of this works perfectly.
The problem is:
Each CSV file contains a different column count, ranging from 50 to 56 columns. The data I am storing in this collective database only requires the first 8 columns. I already know how to skip individual columns using @dummy thanks to the following Q&A: How to skip columns in CSV file when importing into MySQL table using LOAD DATA INFILE?
However, since the number of dummies will not always be the same due to the differing column counts, I was wondering if there is a way to take the data from columns 1-8 and then ignore everything after them, regardless of the column count?
A rather rough workaround would be to first read the opening line of the file in PHP and count the columns by splitting on commas. Knowing that count, subtract 8 and generate the SQL command with that many columns to ignore.
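A minimal sketch of that idea, assuming a plain comma-delimited file with no quoted commas in the header line, an existing PDO connection in $pdo, and hypothetical table/column names t1 and c1-c8:

// Count the columns in the first line of the file.
$file = '/path/to/file.csv';
$handle = fopen($file, 'r');
$header = fgetcsv($handle);
fclose($handle);
$totalColumns = count($header);

// Build the column list: the 8 real columns, then @dummy placeholders
// for every remaining column, however many there happen to be.
$columns = array('c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8');
for ($i = 8; $i < $totalColumns; $i++) {
    $columns[] = '@dummy';
}

$sql = "LOAD DATA INFILE " . $pdo->quote($file) . "
        INTO TABLE t1
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        (" . implode(', ', $columns) . ")";
$pdo->exec($sql);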
Just include the eight columns to populate, and MySQL will use the first eight fields from each CSV row:
LOAD DATA INFILE 'file.txt' INTO TABLE t1 (c1, c2, c3, c4, c5, c6, c7, c8)
Related
I get data from around 10,000 new XML files every day,
and I always run a query to check whether the data from those XML files already exists in our database; if it doesn't, I insert it into our table.
Here is the code
if (!Dictionary::where('word_code', '=', $dic->word_code)->exists()) {
    // then insert into the database.
}
Here $dic->word_code comes from thousands of XML files. The script opens the XML files one by one, checks whether each record exists, inserts it if it doesn't, and then moves on to the next file, repeating the same procedure across all 10,000 XML files.
Each XML file is about 40 to 80 MB and contains a lot of data.
I already have 2,981,293 rows so far, and checking my XML files against those 2,981,293 rows and then inserting seems to be a really time-consuming and resource-hungry task.
word_code is already indexed.
The current method takes about 8 hours to finish the procedure.
By the way, I should mention that after running this huge 8-hour procedure, it only ends up adding about 1,500 to 2,000 rows of data per day.
Comparing the files to the database line by line is the core issue. Both filesystem tools and the database itself can compare millions of rows very quickly.
You have two options.
Option 1:
Keep a backup of the previous run's files and do a filesystem compare against them to find what has changed in each file (a rough sketch of this is shown after these options).
Option 2:
Load the XML files into a MySQL staging table (e.g. with LOAD XML, or with LOAD DATA INFILE after converting them to CSV). Then run a query over all rows to find both the new and the changed rows. Be sure to index the table with a well-defined unique key to keep this query efficient.
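For Option 1, a minimal sketch in PHP of the file-compare idea, assuming hypothetical paths and that the export writes one record per line so a line-based diff is meaningful:

// Lines that exist in today's file but not in yesterday's backup.
$previous = file('/backups/words_yesterday.xml', FILE_IGNORE_NEW_LINES);
$current  = file('/incoming/words_today.xml', FILE_IGNORE_NEW_LINES);
$newLines = array_diff($current, $previous);

foreach ($newLines as $line) {
    // parse $line and insert only these records into the database
}

// Keep today's file as the baseline for the next run.
copy('/incoming/words_today.xml', '/backups/words_yesterday.xml');

For files of 40-80 MB you may prefer an external diff tool instead of loading both files into memory; this only illustrates the shape of the approach.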
I would split this job into two tasks:
Use your PHP script to load the XML data unconditionally into a temporary table that has no constraints, no primary key, and no indexes. Make sure to truncate that table before loading the data.
Perform one single INSERT statement to merge records from that temporary table into your main table, possibly with an ON DUPLICATE KEY UPDATE or IGNORE option, or otherwise an anti-join clause. See INSERT IGNORE vs INSERT … ON DUPLICATE KEY UPDATE
For the second point, you could for instance do this:
INSERT IGNORE
INTO main
SELECT *
FROM temp;
If the field to compare is not a primary key in the main table, or is not uniquely indexed, then you might need a statement like this:
INSERT INTO main
SELECT temp.*
FROM temp
LEFT JOIN main m2
ON m2.word_code = temp.word_code
WHERE m2.word_code IS NULL;
But this will be a bit slower than a primary-key based solution.
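Putting the two tasks together, a rough sketch in PHP under assumed names (staging table temp_words, main table main, XML records shaped like <record><word_code>…</word_code></record>; only the word_code column is shown):

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Task 1: reload the staging table unconditionally.
$pdo->exec('TRUNCATE TABLE temp_words');
$insert = $pdo->prepare('INSERT INTO temp_words (word_code) VALUES (?)');

foreach (glob('/incoming/*.xml') as $file) {
    $xml = simplexml_load_file($file);
    foreach ($xml->record as $record) {
        $insert->execute(array((string) $record->word_code));
    }
}

// Task 2: one set-based statement to merge only the new rows.
$pdo->exec('
    INSERT INTO main (word_code)
    SELECT temp_words.word_code
    FROM temp_words
    LEFT JOIN main m2 ON m2.word_code = temp_words.word_code
    WHERE m2.word_code IS NULL
');

For 40-80 MB files you may want XMLReader instead of SimpleXML to keep memory usage down, and wrapping the per-file inserts in a transaction will speed up task 1 considerably.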
I have built a database with 6 tables, roughly 175 fields. About 130 of these fields are to be populated from data on a CSV.
Currently, a handheld device exports this CSV and it is read into a spreadsheet, but the process is moving to a database. So, on the front end, when someone uploads a CSV, it will populate the database.
Question:
I'm trying to figure out the best way to break that CSV up line by line and put certain info into certain tables. Is that possible? If so, how?
I was hoping I could create a header for each CSV field and map it to the corresponding database fields (since the CSV will always be in the same order).
I don't think of it as an RBAR (row-by-agonizing-row) problem. If you load the file as-is into a single staging table, you can then run something like the following for each destination table:
INSERT INTO destTable (col1, col2)
SELECT col1, col2
FROM StageTable
WHERE col3 = 'criteria'
That way, you keep everything set-based. Of course it depends on the number of records involved, but processing data row by row in T-SQL is generally not a good fit; SSIS does a much better job of that than T-SQL.
Tag the values by column using an associative array. For example, given a CSV like:
id,name,color
1,jo,red
2,ma,blue
3,j,yellow
Read the first line into an array of column names, then in a loop match each value in the following rows to its column by index.
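A small sketch of that idea in PHP, assuming the sample file above is saved as data.csv:

// Read the header row once, then key every following row by those column names.
$handle = fopen('data.csv', 'r');
$header = fgetcsv($handle);                // array('id', 'name', 'color')

while (($row = fgetcsv($handle)) !== false) {
    $record = array_combine($header, $row);
    // $record['id'], $record['name'] and $record['color'] are now available,
    // so each value can be routed to the right table and column.
}
fclose($handle);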
I have a database table with 6 columns of 365 rows of data. I need to swap the 3rd column (named 'Date_line') with new data while leaving the other 5 columns in place, without exporting the whole table, but can't get phpMyAdmin to work with me.
Normally I'd just truncate the table and upload a revised CSV file for the whole table, but here's the catch: I have to update 232 data tables with this same exact column of data (the column data is common to all 232 tables). To do all 232 individually would mean exporting each table, opening it in Excel, swapping the old column for the new one, converting to CSV, then re-uploading. It would be a lot easier if I could just import a single-column CSV to overwrite the old one. But I don't know how.
I'd like to do this using the phpMyAdmin interface... I'm not very experienced with writing scripts. Is there a way?
I have a large Excel file of ship locations for the next 2 years. Currently, I am manually splitting this file into multiple files and then importing them into MySQL. This won't work for long, though, as the Excel file gets updated every day and needs an easier way to be imported.
The data doesn't start until row 10, column D, and goes to column I; then another ship starts on row 10, columns J to O, etc., for 22 ships (row 10 being the header titles).
Is there a way to automate this? I have done some research and found I probably need to convert the XLS to CSV, which isn't a problem, but I haven't found a way to say that row 10's columns D-I go into Table1, columns J-O into Table2, columns P-U into Table3, etc.
Can someone point me in the right direction or provide some assistance? Thanks for all your help!
It's hard to tell without knowing all the details, but I would propose the following flow:
Save your file as CSV (can be automated via VBA, PowerShell...)
Load it as-is into a staging table using LOAD DATA INFILE
If necessary, validate, clean and normalize the data in the staging table
Use INSERT INTO ... SELECT ... to copy data into the target tables, selecting only the necessary columns (e.g. those from J to O...) -- see the sketch below
Truncate the staging table
If the rules of data extraction are deterministic, steps 2-5 can be wrapped in a stored procedure.
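A rough sketch of steps 2, 4 and 5 in PHP, under the assumption that the staging table (here called stage_ships) simply mirrors the CSV columns as c1, c2, ..., and that each ship has its own target table ship_1 ... ship_22 (all of these names are made up for illustration):

$pdo = new PDO('mysql:host=localhost;dbname=fleet', 'user', 'pass');

// Step 2: load the converted CSV as-is, skipping the 9 rows above the
// headers plus the header row itself.
$pdo->exec("
    LOAD DATA INFILE '/path/to/ships.csv'
    INTO TABLE stage_ships
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    IGNORE 10 LINES
");

// Step 4: copy each ship's 6-column block into its own table.
// Ship 1 starts at spreadsheet column D (the 4th CSV column), ship 2 at J, etc.
for ($ship = 1; $ship <= 22; $ship++) {
    $first = 4 + ($ship - 1) * 6;        // first CSV column for this ship
    $cols = array();
    for ($i = 0; $i < 6; $i++) {
        $cols[] = 'c' . ($first + $i);
    }
    $pdo->exec("INSERT INTO ship_$ship (col1, col2, col3, col4, col5, col6)
                SELECT " . implode(', ', $cols) . " FROM stage_ships");
}

// Step 5: empty the staging table for the next run.
$pdo->exec('TRUNCATE TABLE stage_ships');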
I have a question to which I have been unable to find the answer.
I can create an extra column in a PHP recordset by using an existing column and duplicating it:
SELECT
id_tst,
name_tst,
age_tst,
price_tst,
price_tst AS newprice_tst
FROM test_tst
From what I can work out, AS will only duplicate an existing column or rename a column in the recordset.
I want to add two extra columns to a table, but with no values.
I know a lot of people will say what's the point of that; it's pointless to have 2 columns with no data.
The reason is that I am building a price-updating module for a CMS, where the user can download a .csv file containing prices, modify the prices in a spreadsheet, then re-upload the CSV to update the prices.
The two extra columns would hold the new prices while keeping the old ones, so a rollback from the CSV file could be performed if necessary.
I could just get the client to add the two new columns into the spreadsheet, but I would prefer the exported CSV to have the columns already in place.
Is it possible to create blank columns when creating a recordset?
You can create empty "dummy" columns by aliasing a blank string:
SELECT '' AS emptyColumn, column1, column2 FROM table
This will produce another column in your query with all blank values.
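In the price-export scenario above, that could look roughly like this in PHP (the table and existing column names come from the question; the second extra column name, the file name and the connection details are assumptions):

// Export the current prices plus two blank columns, so the client gets a CSV
// that already has the extra columns in place.
$pdo = new PDO('mysql:host=localhost;dbname=cms', 'user', 'pass');

$rows = $pdo->query("
    SELECT id_tst, name_tst, age_tst, price_tst,
           '' AS newprice_tst, '' AS rollbackprice_tst
    FROM test_tst
");

$out = fopen('prices.csv', 'w');
fputcsv($out, array('id_tst', 'name_tst', 'age_tst', 'price_tst',
                    'newprice_tst', 'rollbackprice_tst'));
foreach ($rows as $row) {
    fputcsv($out, array($row['id_tst'], $row['name_tst'], $row['age_tst'],
                        $row['price_tst'], $row['newprice_tst'], $row['rollbackprice_tst']));
}
fclose($out);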