I have built a database with 6 tables, roughly 175 fields. About 130 of these fields are to be populated from data on a CSV.
Currently, a handheld device exports this CSV and it is read into a spreadsheet, but the process is moving to a database. So, on the front end, when someone uploads a CSV it will populate the database.
Question:
I'm trying to figure out the best way to break that CSV up line by line and put certain info into certain tables. Is that possible? If so, how?
I was hoping I could create a header for each CSV field and map it to the corresponding database fields (since the CSV will always be in the same order).
I wouldn't approach this as an RBAR ("row by agonizing row") problem. If you load the file as-is into a single staging table, you can then run something like the following for each destination table:
INSERT INTO destTable (col1, col2)
SELECT col1, col2
FROM StageTable
WHERE col3 = 'criteria'
That way, you keep everything set-based. Of course it depends on the number of records involved, but row-by-row processing and T-SQL are generally not a good fit; SSIS does a much better job of that than T-SQL.
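The same staging pattern applies in the PHP/MySQL setup described in the question. A minimal sketch, assuming a PDO connection, a staging table named stage_table whose columns match the CSV order, an upload field named csv, and placeholder destination names:

<?php
// Sketch: load the uploaded CSV into a staging table as-is, then do set-based inserts.
// stage_table, dest_table, col1..col3, and the 'csv' upload field are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pdo->exec('TRUNCATE TABLE stage_table');

// Put every CSV row into the staging table unchanged.
$insert = $pdo->prepare('INSERT INTO stage_table (col1, col2, col3) VALUES (?, ?, ?)');
$fh = fopen($_FILES['csv']['tmp_name'], 'r');
while (($row = fgetcsv($fh)) !== false) {
    $insert->execute(array_slice($row, 0, 3));
}
fclose($fh);

// The set-based part: one INSERT ... SELECT per destination table.
$pdo->exec("INSERT INTO dest_table (col1, col2)
            SELECT col1, col2 FROM stage_table WHERE col3 = 'criteria'");

If the server allows it, LOAD DATA LOCAL INFILE would be a faster way to fill the staging table than the row-by-row prepared insert shown here.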
You can tag the columns with an associative array. For example, given a CSV like this:
id,name,color
1,jo,red
2,ma,blue
3,j,yellow
read the first (header) line into one array, then in a loop map each value to its column by index and compare by column name.
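A minimal PHP sketch of that idea, using the example CSV above (data.csv is a placeholder filename):

<?php
// Sketch: read the header row once, then key every data row by column name
// so values can be looked up by name instead of position.
$fh = fopen('data.csv', 'r');
$header = fgetcsv($fh);                         // ['id', 'name', 'color']

while (($row = fgetcsv($fh)) !== false) {
    $assoc = array_combine($header, $row);      // e.g. ['id' => '1', 'name' => 'jo', 'color' => 'red']
    echo $assoc['color'], "\n";                 // route/compare by column name rather than index
}
fclose($fh);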
I get tens of thousands of new XML files every day, and I always have to run a query to check whether there is any new data in those XML files; if it doesn't already exist in our database, I insert that data into our table.
Here is the code:
if(!Dictionary::where('word_code' , '=' , $dic->word_code)->exists()) {
// then insert into the database.
}
where $dic->word_code comes from thousands of XML files. Each time, the script opens an XML file, checks whether the record exists, inserts it if it doesn't, then moves on to the next file and repeats the same procedure for all 10,000 XML files.
Each XML file is about 40 to 80 MB and contains a lot of data.
I already have 2,981,293 rows so far, and checking my XML files against those 2,981,293 rows before inserting is a really time-consuming and resource-hungry task.
word_code is already indexed.
The current method takes about 8 hours to finish the procedure.
By the way, I should mention that after running this huge 8-hour procedure, it only brings in about 1,500 to 2,000 new rows of data per day.
Comparing the file to the database line by line is the core issue. Both filesystem tools and databases can compare millions of rows very quickly.
You have two options.
Option 1:
Keep a backup of the file from the previous run and do a filesystem compare against it to find the differences in the file.
Option 2:
Load the XML file into a MySQL table using LOAD DATA INFILE. Then run a query on all rows to find both new and changed rows. Be sure to index the table with a well defined unique key to keep this query efficient.
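A rough sketch of what those comparison queries could look like once the data is loaded, assuming a staging table named staging, the existing table named dictionary keyed on word_code, and a hypothetical value column named definition (plain PDO is used for brevity; the same SQL could be run through Laravel's query builder):

<?php
// Sketch: after bulk-loading the day's data into `staging`, find new and changed rows
// with two set-based queries. The staging table and the `definition` column are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// New rows: word_codes present in staging but not yet in dictionary.
$newRows = $pdo->query(
    "SELECT s.* FROM staging s
     LEFT JOIN dictionary d ON d.word_code = s.word_code
     WHERE d.word_code IS NULL"
)->fetchAll(PDO::FETCH_ASSOC);

// Changed rows: word_codes present in both tables, but with a different value.
$changedRows = $pdo->query(
    "SELECT s.* FROM staging s
     JOIN dictionary d ON d.word_code = s.word_code
     WHERE d.definition <> s.definition"
)->fetchAll(PDO::FETCH_ASSOC);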
I would split this job into two tasks:
Use your PHP script to load the XML data unconditionally into a temporary table that has no constraints, no primary key, and no indexes. Make sure to truncate that table before loading the data (a rough sketch of this step follows below).
Perform one single INSERT statement to merge records from that temporary table into your main table, possibly with an ON DUPLICATE KEY UPDATE or IGNORE option, or otherwise with a negative (anti) join clause. See INSERT IGNORE vs INSERT … ON DUPLICATE KEY UPDATE
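For the first point, a rough sketch of the loading script, assuming each XML file has <word> elements with word_code and definition children (the real feed structure isn't shown in the question) and a temporary table named temp_dictionary:

<?php
// Sketch: bulk-load every XML file into a bare temporary table with no existence checks.
// The element names and the temp_dictionary table are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pdo->exec('TRUNCATE TABLE temp_dictionary');
$insert = $pdo->prepare('INSERT INTO temp_dictionary (word_code, definition) VALUES (?, ?)');

foreach (glob('/data/xml/*.xml') as $file) {
    $xml = simplexml_load_file($file);
    $pdo->beginTransaction();                    // one transaction per file keeps the inserts fast
    foreach ($xml->word as $word) {
        $insert->execute([(string) $word->word_code, (string) $word->definition]);
    }
    $pdo->commit();
}

For 40-80 MB files, XMLReader would avoid holding each whole document in memory, but the idea is the same.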
For the second point, you could for instance do this:
INSERT IGNORE
INTO main
SELECT *
FROM temp;
If the field to compare is not a primary key in the main table, or is not uniquely indexed, then you might need a statement like this:
INSERT INTO main
SELECT temp.*
FROM temp
LEFT JOIN main m2
ON m2.word_code = temp.word_code
WHERE m2.word_code IS NULL;
But this will be a bit slower than a primary-key based solution.
I've been having an issue for days now and have hit a brick wall. Firstly, as the title suggests, I have been working on importing CSV files into a SQL database.
To be more specific, this is done through PHP scripts on the server, loading the data through MySQL into the DB.
I currently have around 30 CSV files (this number is projected to increase) which are updated daily; a cron script is triggered once per day to load the new data. It loads each file through LOAD DATA INFILE. All of this works perfectly.
The problem is:
Each CSV file has a different column count, ranging between 50 and 56 columns. The data I am storing in this collective database only requires the first 8 columns. I already know how to skip individual columns using @dummy variables thanks to the following Q&A: How to skip columns in CSV file when importing into MySQL table using LOAD DATA INFILE?
However, as the number of dummy columns will not always be the same due to the differing column counts, I was wondering whether there is a way to take the data from columns 1-8 and then ignore everything after, regardless of the column count?
A rather rough patch-up would be to first read the first line in PHP and count the columns by the commas. Then, knowing that count, subtract 8 and generate the SQL command with the right number of columns to ignore.
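A minimal sketch of that idea; the file path, table t1, and column names c1..c8 are placeholders:

<?php
// Sketch of the "count first, then build the statement" idea.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$file = '/path/to/data.csv';

$fh    = fopen($file, 'r');
$total = count(fgetcsv($fh));                    // number of columns in this particular file
fclose($fh);

$cols = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8'];   // the 8 wanted columns
for ($i = 9; $i <= $total; $i++) {
    $cols[] = '@dummy' . $i;                     // throwaway user variables for the rest
}

$pdo->exec("LOAD DATA INFILE " . $pdo->quote($file) . "
            INTO TABLE t1
            FIELDS TERMINATED BY ','
            (" . implode(', ', $cols) . ")");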
Just include the eight columns to populate and it will use the first eight fields from the CSV row:
LOAD DATA INFILE 'file.txt' INTO TABLE t1 (c1, c2, c3, c4, c5, c6, c7, c8)
I have a database table with 6 columns and 365 rows of data. I need to replace the 3rd column (named 'Date_line') with new data while leaving the other 5 columns in place, without exporting the whole table, but I can't get phpMyAdmin to work with me.
Normally I'd just truncate the table and upload a revised CSV file for the whole table, but here's the catch: I have to update 232 data tables with this same exact column of data (the column data is common to all 232 tables). To do all 232 individually would mean exporting each table, opening it in Excel, swapping the old column for the new one, converting to CSV, then re-uploading. It would be a lot easier if I could just import a single-column CSV to overwrite the old one. But I don't know how.
I'd like to do this using the phpMyAdmin interface... I'm not very experienced with writing scripts. Is there a way?
So basically I have a bunch of 1 GB data files (compressed); they are just text files containing JSON data with timestamps and other stuff.
I will be using PHP code to insert this data into a MySQL database.
I will not be able to store these text files in memory! Therefore I have to process each data file line by line. To do this I am using stream_get_line().
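Roughly, that reading loop might look like this (a sketch; the compress.zlib:// wrapper and the one-JSON-object-per-line layout are assumptions):

<?php
// Sketch: stream one compressed text file line by line without loading it into memory.
$fh = fopen('compress.zlib://data-file-001.gz', 'r');    // filename is a placeholder

while (!feof($fh)) {
    $line = stream_get_line($fh, 1024 * 1024, "\n");     // read up to the next newline
    if ($line === false || $line === '') {
        continue;
    }
    $record = json_decode($line, true);                  // one JSON object per line (assumed)
    // ... decide here whether $record is an insert or an update ...
}
fclose($fh);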
Some of the data contained will be updates, some will be inserts.
Question
Would it be faster to use Insert / Select / Update statements, or create a CSV file and import it that way?
Or should I create a file as a bulk operation and then execute it from SQL?
I basically need to insert data whose primary key doesn't exist, and update fields on data whose primary key does exist. But I will be doing this in LARGE quantities.
Performance is always an issue.
Update
The table has 22,000 columns, and only, say, 10-20 of them do not contain 0.
I would load all of the data into a temporary table and let MySQL do the heavy lifting (a sketch of the whole sequence follows the steps below).
Create the temporary table by doing create table temp_table as select * from live_table where 1=0;
Read the file and produce output in a format that can be loaded with load data infile.
Load the data into the temporary table and add an index for your primary key.
Next, isolate your updates by doing an inner join between the live table and the temporary table, walk through them, and do your updates.
Remove all of your update rows from the temporary table (again using an inner join between it and the live table).
Process all of the inserts with a simple insert into live_table select * from temp_table.
Drop the temporary table, go home and have a frosty beverage.
This may be oversimplified for your use case, but with a little tweaking it should work a treat.
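Under the same assumptions (placeholder names live_table, temp_table, a primary key column id, and data columns col1, col2), the sequence might look roughly like this, with the update step done as one set-based UPDATE ... JOIN rather than walking the rows:

<?php
// Sketch of the steps above: empty clone, bulk load, update the matches, insert the rest.
// live_table, temp_table, id, col1, col2, and the file path are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

// 1. Empty clone of the live table: no keys, no constraints.
$pdo->exec('DROP TABLE IF EXISTS temp_table');
$pdo->exec('CREATE TABLE temp_table AS SELECT * FROM live_table WHERE 1=0');

// 2-3. Bulk load the prepared file, then index the key for the joins below.
$pdo->exec("LOAD DATA INFILE '/path/to/prepared.csv'
            INTO TABLE temp_table FIELDS TERMINATED BY ','");
$pdo->exec('ALTER TABLE temp_table ADD INDEX (id)');

// 4. Apply the updates: rows whose id already exists in the live table.
$pdo->exec('UPDATE live_table l
            JOIN temp_table t ON t.id = l.id
            SET l.col1 = t.col1, l.col2 = t.col2');

// 5. Remove those rows from the temp table so only genuinely new rows remain.
$pdo->exec('DELETE t FROM temp_table t JOIN live_table l ON l.id = t.id');

// 6. Insert everything that is left.
$pdo->exec('INSERT INTO live_table SELECT * FROM temp_table');

// 7. Clean up.
$pdo->exec('DROP TABLE temp_table');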
I have a question to which I have been unable to find the answer.
I can create an extra column in a PHP recordset by using an existing column and duplicating it:
SELECT
id_tst,
name_tst,
age_tst,
price_tst,
price_tst AS newprice_tst
FROM test_tst
From what I can work out, AS will only duplicate an existing column or rename a column in the recordset.
I want to add two extra columns to a table, but with no values.
I know a lot of people will say what's the point in that; it's pointless to have 2 columns with no data.
The reason is that I am building a price-updating module for a CMS system, where the user can download a .csv file containing prices, modify the prices in a spreadsheet, then re-upload the CSV to update the prices.
The two extra columns would hold the new prices while keeping the old ones, so a rollback from the CSV file could be performed if necessary.
I could just get the client to add the two new columns to the spreadsheet, but I would prefer the exported CSV to have the columns already in place.
Is it possible to create blank columns when creating a recordset?
You can create empty "dummy" columns by aliasing a blank string:
SELECT '' AS emptyColumn, column1, column2 FROM table
This will produce another column in your query with all blank values.
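Tying that back to the export described in the question, a sketch in PHP might look like this; the two extra column names (newprice_tst, rollback_tst) and the output filename are made up:

<?php
// Sketch: export the table to CSV with two blank placeholder columns already appended.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->query(
    "SELECT id_tst, name_tst, age_tst, price_tst,
            '' AS newprice_tst, '' AS rollback_tst
     FROM test_tst",
    PDO::FETCH_NUM
);

$out = fopen('prices.csv', 'w');
fputcsv($out, ['id_tst', 'name_tst', 'age_tst', 'price_tst', 'newprice_tst', 'rollback_tst']);
foreach ($stmt as $row) {
    fputcsv($out, $row);
}
fclose($out);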