I have a program that I use to read a CSV file and insert the data into a database. I am having trouble with it because it needs to be able to insert big batches of data (up to 10,000 rows) at a time. At first I had it looping through and inserting each record one at a time. That is slow because it calls an insert function 10,000 times... Next I tried to group it together so it inserted 50 rows at a time. I figured this way it would have to connect to the database less often, but it is still too slow. What is an efficient way to insert many rows of a CSV file into a database? Also, I have to edit some data (such as adding a 1 to a username if two are the same) before it goes into the database.
For a text file you can use the LOAD DATA INFILE command which is designed to do exactly this. It'll handle CSV files by default, but has extensive options for handling other text formats, including re-ordering columns, ignoring input rows, and reformatting data as it loads.
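A minimal sketch of what that can look like from PHP through PDO (the file path, table name and column list are placeholders, not from the original post; LOCAL also needs local_infile enabled on the server):
<?php
// Sketch: bulk-load a CSV in one statement instead of row-by-row inserts.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass',
               [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/records.csv'
            INTO TABLE users
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES                  -- skip the header row
            (username, email, created_at)");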
So I ended up using fputcsv() to write the data I had changed into a new CSV file, and then I used the LOAD DATA INFILE command to load the data from that new CSV file into the table. This changed it from timing out at 120 seconds for 1,000 entries to taking about 10 seconds for 10,000 entries. Thank you to everyone who replied.
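The intermediate step might look roughly like this (a sketch only; the column layout and the username-dedup rule are assumptions standing in for whatever edits are needed):
<?php
// Sketch: read the original CSV, apply the edits, write a cleaned CSV,
// then bulk-load the cleaned file with LOAD DATA INFILE as above.
$in  = fopen('original.csv', 'r');
$out = fopen('/tmp/records.csv', 'w');

$seen = [];                              // usernames already written
while (($row = fgetcsv($in)) !== false) {
    $username = $row[0];                 // assumption: username is column 0
    if (isset($seen[$username])) {
        $row[0] = $username . '1';       // e.g. append a 1 to a duplicate
    }
    $seen[$row[0]] = true;
    fputcsv($out, $row);
}
fclose($in);
fclose($out);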
I have this crazy idea: could you run multiple parallel scripts, each one taking care of a chunk of rows from your CSV?
Something like this:
<?php
// this tells Linux to run import.php in the background
// and releases your caller script.
//
// do this several times and you can cut the overall time
$cmd = "nohup php import.php [start] [end] > /dev/null 2>&1 &";
exec($cmd);
Also, have you tried increasing that batch of 50 rows per bulk insert to 100 or 500, for example?
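For reference, a bigger multi-row INSERT built with a single prepared statement per batch might look like this (a sketch; the table and column names are made up):
<?php
// Sketch: insert $batchSize rows per statement instead of one or 50 at a time.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

$rows = [
    ['username' => 'alice', 'email' => 'alice@example.com'],
    ['username' => 'bob',   'email' => 'bob@example.com'],
];  // stand-in for the rows parsed from the CSV

$batchSize = 500;
foreach (array_chunk($rows, $batchSize) as $chunk) {
    // one "(?, ?)" placeholder group per row in the chunk
    $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO users (username, email) VALUES $placeholders");

    $params = [];
    foreach ($chunk as $row) {
        $params[] = $row['username'];
        $params[] = $row['email'];
    }
    $stmt->execute($params);
}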
Some background:
I have a PHP program that does a lot of things with large data sets that I get every 15 minutes (about 10 million records per file, every 15 minutes). I have a table in a MySQL database with phone numbers (over 300 million rows) that I need to check against each row in my file: if a phone number from the MySQL table is contained in the raw file record, I need to know, so I can add it to my statistics record. So far I have tried just doing an SQL call each time, like:
select * from phone.table where number = '$phoneNumber';
Where $phoneNumber is the number in the raw record that I'm trying to compare. Then I check whether the query brought back results, and that is how I know if that record contained a phone number I need to check for.
That means I'm doing 10 million SQL queries every 15 minutes, and it is just too slow and too memory intensive. The second thing I tried was to do the SQL query once, store the results in an array, and compare the raw record phone numbers against that. But a 300-million-record array stored in memory was just too much as well.
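For context, the per-row check described above boils down to something like this (a sketch assuming PDO; the table and column names come from the query, the file handling is an assumption):
<?php
// Sketch of the per-row lookup: prepare once, execute once per phone number.
$pdo  = new PDO('mysql:host=localhost;dbname=phone;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('SELECT 1 FROM `phone`.`table` WHERE number = ? LIMIT 1');

$fh = fopen('raw_file.txt', 'r');        // assumption: one raw record per line
while (($line = fgets($fh)) !== false) {
    $phoneNumber = trim($line);          // assumption: the record is just the number
    $stmt->execute([$phoneNumber]);
    if ($stmt->fetchColumn() !== false) {
        // the number exists in the table: count it in the statistics record
    }
}
fclose($fh);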
I'm at a loss here and I can't seem to find a way to do it. Just to add a few things: yes, I have to have the table stored in MySQL, and yes, I have to do this with PHP (my boss requires it to be done in PHP).
I have a PHP program which gets from an API the weather forecast data for the following 240 hours, for 100 different cities (for a total of 24,000 records; I save them in a single table). The program gets, for every city and for every hour, the temperature, humidity, probability of precipitation, sky cover and wind speed. This data is in JSON format, and I have to store all of it into a database, preferably MySQL. It is important that this operation is done in one go for all the cities.
Since I would like to update the values every 10 minutes or so, performance is very important. If someone could tell me the most efficient way to update my table with the values from the JSON, it would be of great help.
So far I have tried the following strategies:
1) decode the JSON and use a loop with a prepared statement to update one value at a time (too slow; sketched below for reference);
2) use a stored procedure (I do not know how to pass the procedure a whole JSON object, and I know there is a limited number of individual parameters I can pass);
3) use LOAD DATA INFILE (generating the CSV file is too slow);
4) use UPDATE with CASE, generating the SQL dynamically (the string gets so long that the execution is too slow).
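For reference, strategy 1 as described would boil down to something like this (a sketch only; the forecast table name, its columns and the JSON keys are assumptions, not taken from the actual API):
<?php
// Sketch of strategy 1: decode the JSON, then update row by row
// with one prepared statement reused inside the loop.
$pdo  = new PDO('mysql:host=localhost;dbname=weather;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare(
    'UPDATE forecast
        SET temperature = ?, humidity = ?, precip_prob = ?, sky_cover = ?, wind_speed = ?
      WHERE city_id = ? AND forecast_hour = ?'
);

$records = json_decode(file_get_contents('forecast.json'), true);  // raw API response
foreach ($records as $r) {
    $stmt->execute([
        $r['temperature'], $r['humidity'], $r['precip_prob'],
        $r['sky_cover'], $r['wind_speed'],
        $r['city_id'], $r['hour'],
    ]);
}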
I will be happy to provide additional information if needed.
You have a single table with about a dozen columns, correct? And you need to insert 100 rows every 10 minutes, correct?
Inserting 100 rows like that every second would be only slightly challenging. Please show us the SQL code; something must be miserably wrong with it. I can't imagine how any of your options would take more than a few seconds. Is "a few seconds" too slow?
Or does the table have only 100 rows? And you are issuing 100 updates every 10 minutes? Still, no sweat.
Rebuild technique:
If practical, I would build a new table with the new data, then swap tables:
CREATE TABLE new LIKE real;
-- load the data into new (LOAD DATA INFILE is good if you have a .csv file)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
There is no downtime -- the real table is always available, regardless of how long the load takes.
(Doing a massive update is much more "effort" inside the database; reloading should be faster.)
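Driven from PHP, the whole swap might look roughly like this (a sketch; the connection details and CSV path are placeholders, and the table names are backtick-quoted since real is a reserved word in MySQL):
<?php
// Sketch of the rebuild-and-swap technique described above.
$pdo = new PDO('mysql:host=localhost;dbname=weather;charset=utf8mb4', 'user', 'pass',
               [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

$pdo->exec('CREATE TABLE `new` LIKE `real`');
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/forecast.csv'
            INTO TABLE `new`
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'");
$pdo->exec('RENAME TABLE `real` TO `old`, `new` TO `real`');  // atomic swap, no downtime
$pdo->exec('DROP TABLE `old`');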
I have a MySQL table with 2.8 million records and I want to convert all of them to JSON. I wrote a script to do the conversion, but it stops with a memory warning.
Then I tried to create smaller files (file 1 is records 0 to 100,000, file 2 is records 100,000 to 1,000,000, etc.) and combine them with the Windows copy command. It works, but each file is a JSON array (like [{...}]), and when they are merged the result becomes separate sections like [{}][{}] (where I want it like [{................}]).
Is there any better solution to do this?
I would suggest you change 'memory_limit' in your php.ini configuration. Also, if this takes a long time, you can handle it with a cron job (if possible).
OR
you can decode ALL your JSON files, merge them into a single array, and then encode the result back to JSON and put it in a single JSON file.
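That merge could look something like this (a sketch; the file names are placeholders):
<?php
// Sketch: decode each part file, merge the arrays, re-encode once.
$merged = [];
foreach (['file1.json', 'file2.json', 'file3.json'] as $file) {
    $part   = json_decode(file_get_contents($file), true);
    $merged = array_merge($merged, $part);
}
file_put_contents('combined.json', json_encode($merged));
Keep in mind that this still holds everything in memory at once, which is exactly what triggered the original warning, so it only works if memory_limit is raised high enough.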
Finally, I did this. Please see the steps (I am not sure this is the right way, but it works).
In total, I have 2.6 million records in my table. I created a script which selects the MySQL rows, converts them to JSON and writes them to a file.
Select records from 0 to 1 million and create file 1. Repeat from 1 to 2 million and from 2 to 2.6 million for file 2 and file 3.
Combine these files using jq (http://stedolan.github.io/jq/) to create a single JSON file.
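The jq invocation for that merge might look something like this (an assumption on my part, given that each part file holds a single JSON array):
# -s slurps the three arrays into one list; 'add' concatenates them
jq -s 'add' file1.json file2.json file3.json > combined.json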
I have a report generation feature that exports to CSV or TXT.
For each month there will be 25,000 records, each row with 55 columns.
For a full year it will be more than 300,000!!
I tried increasing the memory limit, but I don't think that's a good approach!! Anyway, it is 128M now.
My expectation
I will split the date range selected by the user into ranges of 25 or 30 days.
I will fetch the data for 25 days, then write the CSV.
Then fetch the next 25,000 records and write those, and so on.
How can I achieve this?
For fetching I am using a function: $result = fetchRecords();
For writing the CSV, I pass this $result array to the view page and print it by looping and separating the values with commas.
So in the controller it will be $template->records = $result;
If I do this in a for loop:
for(){
$result= fetchRecords();
$template->records=$result;
}
I don't think this will work.
How do I do this? Execute fetch, then write, then fetch, then write.
Can you please suggest a better way to implement this in PHP while keeping within the memory limit?
I was fetching a huge amount of data from the database into an array, then looping over that array again to write it to CSV or text. That was really killing the memory.
Now, as I fetch data from the database, I write it to the file in the same loop.
No array in between. Great difference.
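A sketch of that pattern (the connection details, query and file name are just placeholders):
<?php
// Sketch: stream rows straight from the query into the CSV,
// without building an intermediate array.
$pdo = new PDO('mysql:host=localhost;dbname=reports;charset=utf8mb4', 'user', 'pass');
// unbuffered query so the client does not hold the whole result set in memory
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$out  = fopen('report.csv', 'w');
$stmt = $pdo->prepare('SELECT * FROM report_data WHERE report_date BETWEEN ? AND ?');
$stmt->execute(['2015-01-01', '2015-01-25']);   // one 25-day chunk at a time

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($out, $row);                        // write each row as it arrives
}
fclose($out);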
PHP can export to CSV or text very well. PHP is really a great language.
Now I am downloading a CSV of 25 MB. :) I even broke Excel, as it exceeds the 65,536-row limit. :)
More than enough for me.
So the lesson learned is: don't use arrays for storing this kind of huge data.
I need to write a script to export contact data for printing on stickers for mailing. There could be up to 20,000 records in the database table.
The main problem is that the number of records could be that high and the site is hosted on a shared server, so exporting the whole 20k records would presumably kill or stall the script.
If the numbers were not so high I would simply export all the data into an hCard file.
The solution needs to be in PHP, and the resulting file must be usable by MS Office for use to print out address stickers.
All ideas are welcome!
I'm presuming that you DO have access to the MySQL server. Why not connect to MySQL directly? It should save you a LOT of time. If the operation takes too long or you expect performance issues, then schedule it for midnight.
You can export directly to CSV like this:
SELECT * INTO OUTFILE 'result.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM my_table;
Load it into Word through mail merge and off you go!
20k records should be very fast to export to a CSV file. If your shared hosting is so resource-starved that it can't process 20k records in under a couple of seconds, then you've got bigger problems.
Work in batches (rough sketch below)...
Load 50 addresses into $i.csv.
Reload your page (tip: header()), and repeat around 400 times.
Copy the 400 CSV files into one big one (even by using Notepad).
Open it e.g. with Excel.
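A rough sketch of that approach (everything here is an assumption: the table, the columns, the export.php file name, and tracking the offset through a query parameter):
<?php
// Sketch: export 50 addresses per request, then redirect to the next batch.
$batchSize = 50;
$offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=contacts;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('SELECT name, street, city, zip FROM addresses LIMIT ? OFFSET ?');
$stmt->bindValue(1, $batchSize, PDO::PARAM_INT);
$stmt->bindValue(2, $offset, PDO::PARAM_INT);
$stmt->execute();

$out  = fopen("batch_$offset.csv", 'w');
$rows = 0;
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($out, $row);
    $rows++;
}
fclose($out);

if ($rows === $batchSize) {
    // more rows left: reload the script for the next batch
    header('Location: export.php?offset=' . ($offset + $batchSize));
    exit;
}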
It depends on the host, but you can generally allow for longer script execution times by using set_time_limit. This, coupled with dumping the data into a CSV file, is one way. As longneck has stated, exporting 20k records should take far less than the 30 seconds usually allotted for scripts to run.
Code example (I've had to do this before):
$static_amount = 100; // 100 entries at a time
for ($i = 0; $i < $mysql->count(); $i += $static_amount)
{
    $toFile[$i] = $mysql->query("SELECT * FROM table WHERE id >= $i AND id < " . ($i + $static_amount));
    sleep(10); // very important!!!
}
OR
if (!isset($_COOKIE['amount']))
{
    $_COOKIE['amount'] = 100;
}
if (!isset($_COOKIE['alreadyPerformed']))
{
    $_COOKIE['alreadyPerformed'] = 0;
}
$toFile = $mysql->query("SELECT * FROM table WHERE id > {$_COOKIE['alreadyPerformed']} LIMIT {$_COOKIE['amount']}");
setcookie('amount', 100);
setcookie('alreadyPerformed', $_COOKIE['alreadyPerformed'] + 100);