This might be a vague question. I am given 4 CSV files with about 500k rows in each of them on a daily basis. I need to perform 'join' and 'where' equivalent RDBMS operations on them to create daily reports. For example, the workflow could be:
Join 2 CSV files based on a column with IDs
Filter dataset down based on a date column
Join the new filtered dataset with another CSV file based on some where conditions
Further filter them down based on more criteria
.... // Repeat
Output final dataset into a CSV file
I was thinking of writing a PHP script (roughly sketched below) to:
1. Load each CSV file into a relational database like MySQL
2. Perform the joins and where conditions with SQL
3. Load the results into a temporary table
4. Repeat steps 2 and 3 as needed
5. Load the final data into a table
6. Export that table to a CSV file
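Roughly, I imagine something like this; a minimal sketch assuming PDO/MySQL, where all table names, column names, and file paths are placeholders I made up:

<?php
// Minimal sketch, assuming PDO/MySQL; table names, columns, and paths are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=reports;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // required for LOAD DATA LOCAL INFILE
]);

// 1. Load each CSV into its own staging table (tables created beforehand).
foreach (['orders', 'customers', 'payments', 'products'] as $name) {
    $pdo->exec("TRUNCATE TABLE stage_$name");
    $pdo->exec("LOAD DATA LOCAL INFILE '/data/$name.csv'
                INTO TABLE stage_$name
                FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
                IGNORE 1 LINES");
}

// 2.-4. Join + filter into a temporary table; repeat with further joins/filters as needed.
$pdo->exec("CREATE TEMPORARY TABLE tmp_step1 AS
            SELECT o.*, c.region
            FROM stage_orders o
            JOIN stage_customers c ON c.id = o.customer_id
            WHERE o.order_date >= CURDATE() - INTERVAL 1 DAY");

// 5.-6. Export the final result to CSV with fputcsv().
$out = fopen('/data/daily_report.csv', 'w');
foreach ($pdo->query('SELECT * FROM tmp_step1') as $row) {
    fputcsv($out, $row);
}
fclose($out);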
What do you think is the best approach?
I need to export my database to Excel, but I have two tables (users, users document) that I need to combine when exporting, and I also need to add an extra column dynamically. Is there any way to do this?
I have watched many videos, but those all use a single table and can't add a column.
You can use SELECT ... INTO OUTFILE, which writes the query output to a file, and in this case you want to write CSV. Write your JOIN query against the tables, then send the result into the file. You must define the columns in the query; if you want dynamic columns, you must write a function to build that column list.
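A minimal sketch of what that could look like; I'm assuming the second table is called users_document and guessing the columns and output path, and note that INTO OUTFILE writes on the MySQL server and requires the FILE privilege:

<?php
// Minimal sketch, assuming PDO/MySQL; table, column, and path names are guesses.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Join the two tables, add the extra column, and write the result straight to a CSV file.
$pdo->exec("SELECT u.id, u.name, d.document_name, 'extra value' AS extra_column
            INTO OUTFILE '/var/lib/mysql-files/users_export.csv'
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            FROM users u
            JOIN users_document d ON d.user_id = u.id");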
I have built a database with 6 tables, roughly 175 fields. About 130 of these fields are to be populated from data on a CSV.
Currently, a handheld device exports this CSV and it is read into a spreadsheet, but it's moving to a database. So, on the front end, when someone uploads a CSV it will populate the database.
Question:
I'm trying to figure out the best way to break that CSV up line by line and put certain info into certain tables. Is that possible? If so, how?
I was hoping I could query to create a header for each CSV field and map it to database fields (since the CSV will always be in the same order).
I don't think of it as an RBAR (row-by-agonizing-row) problem. If you load the file as-is into a single staging table, you can then run something like the following for each destination table:
INSERT INTO destTable (col1, col2)
SELECT col1, col2
FROM StageTable
WHERE col3 = 'criteria'
That way, you keep everything set-based. Of course it depends on the number of records involved, but processing data row by row in T-SQL is generally not a good fit; SSIS does a much better job of that than T-SQL.
Tag the values by column with an associative array. Example:
id,name,color
1,jo,red
2,ma,blue
3,j,yellow
Get the first line into one array, then just compare the values by index in a loop.
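A minimal sketch of that idea in PHP (the file name is made up): combine the header line with each data row so you can look values up by column name instead of position.

<?php
// Minimal sketch: map each CSV row to an associative array using the header line.
$handle = fopen('users.csv', 'r');            // hypothetical file name
$header = fgetcsv($handle);                   // first line, e.g. ['id', 'name', 'color']

while (($row = fgetcsv($handle)) !== false) {
    $record = array_combine($header, $row);   // e.g. ['id' => '1', 'name' => 'jo', 'color' => 'red']
    if ($record['color'] === 'red') {
        // ... handle this row
    }
}
fclose($handle);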
I have a huge CSV file with 20,000+ user entries and user fields that I have to compare with the users in our database.
The aim is to archive every user in our database that is not in the CSV file.
My solution would be:
Get a multidimensional array out of the CSV file
Get every user from the database
While fetching the users, iterate through the CSV array and check whether the user is in the CSV
It is a solution that works, but it costs far too much performance:
~20,000 users in the CSV × ~20,000 users in the database
=> ~400,000,000 iterations (if no user is found, of course...)
Is there a way to reduce the iterations to ~20,000?
Yes, you can import the CSV data into another table and use an SQL join to fetch the desired result. That way your data will be fetched much faster than before. Use a temporary table to import the CSV file.
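A sketch of that approach, assuming a users table with an id column and an archived flag, and that the user id is the first column in the CSV (all names are placeholders); one set-based UPDATE then replaces the 20,000 × 20,000 comparisons:

<?php
// Minimal sketch, assuming PDO/MySQL; table and column names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);

// 1. Import the CSV into a temporary table (only the user id is needed here,
//    assuming it is the first column in the file).
$pdo->exec("CREATE TEMPORARY TABLE csv_users (user_id INT PRIMARY KEY)");
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/users.csv'
            INTO TABLE csv_users
            FIELDS TERMINATED BY ','
            IGNORE 1 LINES
            (user_id)");

// 2. Archive every database user that has no matching row in the CSV.
$pdo->exec("UPDATE users u
            LEFT JOIN csv_users c ON c.user_id = u.id
            SET u.archived = 1
            WHERE c.user_id IS NULL");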
I have some questions about customizing an export result (Excel) with PHP. The idea is that I want to export a query of my raw data (a MySQL table) to an Excel file, but with some customization of the result.
For example, I want the result to be a summary of the table, like the table below:
The 3rd through 7th columns are named based on the last 5 days of my report date.
My idea is:
1. Create a temporary table with the same format as the result table I want to generate
2. Insert my raw data into that table
3. Drop the temporary table afterwards
Is that effective, or is there a better idea?
You can always use a view, which is essentially a SELECT statement with your data in it, and which will be updated whenever your tables are updated. Then you can just do a 'SELECT * FROM view_name' and export that into your Excel file.
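A minimal sketch of that, assuming a raw table called raw_data with region, report_date, and amount columns (all names are placeholders):

<?php
// Minimal sketch, assuming PDO/MySQL; table, column, and view names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=reports', 'user', 'pass');

// Define the summary once as a view; it always reflects the current table data.
$pdo->exec("CREATE OR REPLACE VIEW v_daily_summary AS
            SELECT region,
                   report_date,
                   SUM(amount) AS total_amount
            FROM raw_data
            WHERE report_date >= CURDATE() - INTERVAL 5 DAY
            GROUP BY region, report_date");

// Exporting is then just a plain SELECT against the view.
foreach ($pdo->query('SELECT * FROM v_daily_summary', PDO::FETCH_ASSOC) as $row) {
    // ... write $row to your Excel/CSV export
}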
Depending on the size of the data, there may be no need to think about performance at all.
Edit the data before
You can use a temp table. Depending on the data, this is very fast if you can select and insert the data based on indexes. Then you run a SELECT * FROM tmp_table; and you have all your data.
Edit the data after
You can just join over the different tables, get the data, then loop (read: foreach) over the result array, change the data, and export it afterwards.
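A minimal sketch of the 'edit the data after' variant, where the query, column names, and output file are made up:

<?php
// Minimal sketch, assuming PDO/MySQL; query and column names are made up.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query("SELECT o.id, o.amount, c.name
                     FROM orders o
                     JOIN customers c ON c.id = o.customer_id", PDO::FETCH_ASSOC);

$out = fopen('export.csv', 'w');
foreach ($rows as $row) {
    // Change the data after fetching, e.g. reformat the amount...
    $row['amount'] = number_format((float) $row['amount'], 2);
    // ...then write the adjusted row to the export file.
    fputcsv($out, $row);
}
fclose($out);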
I want to upload a CSV to a database with PHP, but before I do that I want to modify some of the content.
The database table the CSV will come from has 3 columns: id, postcode, debt_amount
The database table the CSV will go to has 4 columns: id, postcode, debt_amount, count
What I want to do first is modify the full postcode to just show the first part, before the space.
Then I want to consolidate all the rows that have the same modified postcode; this will do two things:
Count the number of rows with the same modified postcode and place the total number into the consolidated row in the column 'count'.
Add up the 'debt_amount' column for the same modified postcode and put the total amount into the consolidated row under the 'debt_amount' column.
These processes would need to run together.
After that is done, I want to upload it to the database.
I don't know if this is the best way of doing it, or if I should process the data in the first database, export it into a CSV, and just upload that CSV to the other database.
Any help on either process would be good.
Thanks
I think it is best to process this data in MySQL itself. You may decide whether you would like to process it in the source database or the target database.
So, if the processing involves:
modify the postcode
count #rows with same modified-postcode
sum debt_amount for same modified-postcode
Either do the processing in the source database, store the results in a temporary table, generate the CSV and then import to the target database. Or generate the CSV from the source DB, import to the target database in a temporary table, do the processing and store the results to the final table.
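For example, the consolidation itself can be a single GROUP BY statement. The sketch below assumes source_table and target_table stand in for the two tables from the question and uses SUBSTRING_INDEX to take the part of the postcode before the space:

<?php
// Minimal sketch, assuming PDO/MySQL; source_table and target_table are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Trim the postcode to the part before the space, then count the rows
// and sum debt_amount per trimmed postcode in one set-based statement.
$pdo->exec("INSERT INTO target_table (postcode, debt_amount, `count`)
            SELECT SUBSTRING_INDEX(postcode, ' ', 1) AS postcode,
                   SUM(debt_amount)                  AS debt_amount,
                   COUNT(*)                          AS `count`
            FROM source_table
            GROUP BY SUBSTRING_INDEX(postcode, ' ', 1)");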
Do the standard file upload.
Read the CSV content from the temporary upload file.
Process the CSV data (e.g. with SplFileObject; see the sketch after this list).
Insert the processed data into your database.
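A minimal sketch of steps 2-4; the upload field name, the CSV column layout (id, postcode, debt_amount), and the table name are assumptions on my part:

<?php
// Minimal sketch of steps 2-4; upload field, CSV layout, and table name are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// 2. Read the CSV content from the temporary upload file.
$csv = new SplFileObject($_FILES['csv']['tmp_name']);
$csv->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);

$stmt = $pdo->prepare('INSERT INTO debts (id, postcode, debt_amount) VALUES (?, ?, ?)');

foreach ($csv as $i => $row) {
    if ($i === 0 || $row === [null]) {
        continue;                               // skip the header line and trailing blank line
    }
    // 3. Process the row (trim, cast, modify values) before it goes into the database.
    // 4. Insert the processed data.
    $stmt->execute([$row[0], trim($row[1]), (float) $row[2]]);
}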