I have a report generation feature that exports to CSV or TXT.
Each month produces about 25,000 records, each row with 55 columns.
A yearly report is more than 300,000 records!
I tried raising the memory limit, but I don't think that's a good solution. Right now it's 128M.
My expectation:
I will split the date range selected by the user into chunks of 25 or 30 days.
I will fetch the data for the first 25 days, then write it to the CSV.
Then fetch the next chunk and write that, and so on.
How can I achieve this?
For fetching I am using a function: $result = fetchRecords();
For writing the CSV, I pass this $result array to the view page, loop over it, and print the values separated by commas.
So in the controller it is $template->records = $result;
If I do this in a for loop:
for (...) {
    $result = fetchRecords();
    $template->records = $result;
}
I don't think this will work.
How do I do this: fetch, write, then fetch, write again?
Can you please suggest a better way to implement this in PHP while staying within the memory limit?
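One possible shape for that loop, as a sketch only -- fetchRecords() taking a date range and the splitIntoRanges() helper are assumptions about your code, not something that exists as-is:

$fh = fopen('report.csv', 'w');

// Walk the user's date range in 25-day chunks: fetch one chunk, write it, move on.
foreach (splitIntoRanges($startDate, $endDate, 25) as list($from, $to)) {  // hypothetical helper
    $rows = fetchRecords($from, $to);      // fetch only ~25 days of data
    foreach ($rows as $row) {
        fputcsv($fh, $row);                // write immediately instead of passing to the view
    }
    unset($rows);                          // free this chunk before fetching the next
}

fclose($fh);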
I was fetching the huge data set from the DB into an array, then looping over that array again to write the CSV or text file. That was really killing the memory.
Now, as I fetch each row from the DB, I write it to the file in the same loop.
No intermediate array at all. That makes a great difference.
PHP can export to CSV or text just fine. PHP is really a great language.
Now I am downloading a 25 MB CSV. :) It breaks Excel, as it exceeds the 65,536-row limit. :)
More than enough for me.
So the lesson learned is: don't buffer this kind of huge data set in arrays.
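In code, that pattern looks roughly like this (a sketch, assuming a mysqli connection and an unbuffered query; the table name is a placeholder):

$fh = fopen('report.csv', 'w');

// MYSQLI_USE_RESULT streams rows from the server instead of buffering the whole result set in PHP.
$result = $mysqli->query("SELECT * FROM report_data", MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    fputcsv($fh, $row);    // write each row as soon as it is fetched; nothing accumulates
}

$result->free();
fclose($fh);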
I have a PHP program that gets from an API the weather forecast data for the next 240 hours for 100 different cities (a total of 24,000 records, which I save in a single table). For every city and every hour the program gets temperature, humidity, probability of precipitation, sky cover and wind speed. This data arrives as JSON, and I have to store all of it in a database, preferably MySQL. It is important that this operation is done in one go for all the cities.
Since I would like to update the values every 10 minutes or so, performance is very important. If someone can tell me the most efficient way to update my table with the values from the JSON, it would be of great help.
So far I have tried the following strategies:
1) decode the JSON and use a loop with a prepared statement to update one value at a time (too slow);
2) use a stored procedure (I do not know how to pass the procedure a whole JSON object, and I know there is a limit on the number of individual parameters I can pass);
3) use LOAD DATA INFILE (the generation of the CSV file is too slow);
4) use UPDATE with CASE, generating the SQL dynamically (the string gets so long that the execution is too slow).
I will be happy to provide additional information if needed.
You have a single table with about a dozen columns, correct? And you need to insert 100 rows every 10 minutes, correct?
Inserting 100 rows like that every second would be only slightly challenging. Please show us the SQL code; something must be miserably wrong with it. I can't imagine how any of your options would take more than a few seconds. Is "a few seconds" too slow?
Or does the table have only 100 rows? And you are issuing 100 updates every 10 minutes? Still, no sweat.
Rebuild technique:
If practical, I would build a new table with the new data, then swap tables:
CREATE TABLE `new` LIKE `real`;
Load the data (LOAD DATA INFILE is good if you have a .csv)
RENAME TABLE `real` TO `old`, `new` TO `real`;
DROP TABLE `old`;
There is no downtime -- the `real` table is always available, regardless of how long the load takes.
(Doing a massive UPDATE is much more "effort" inside the database; reloading should be faster.)
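A minimal sketch of that swap from PHP, assuming a PDO connection in $pdo and a forecasts table with the columns below (all of these names are hypothetical):

// Build the replacement table, fill it, then swap it in atomically.
$pdo->exec("CREATE TABLE forecasts_new LIKE forecasts");

$insert = $pdo->prepare(
    "INSERT INTO forecasts_new (city, hour, temperature, humidity, precipitation, sky_cover, wind_speed)
     VALUES (?, ?, ?, ?, ?, ?, ?)"
);

$pdo->beginTransaction();                 // one commit for all 24,000 rows
foreach ($forecasts as $row) {            // $forecasts = json_decode($apiResponse, true)
    $insert->execute([
        $row['city'], $row['hour'], $row['temperature'], $row['humidity'],
        $row['precipitation'], $row['sky_cover'], $row['wind_speed'],
    ]);
}
$pdo->commit();

// RENAME TABLE is atomic, so readers never see an empty or half-loaded table.
$pdo->exec("RENAME TABLE forecasts TO forecasts_old, forecasts_new TO forecasts");
$pdo->exec("DROP TABLE forecasts_old");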
I have a MySQL table with 2.8 million records and I want to convert all of them to JSON. I wrote a script to do the conversion, but it stops with a memory warning.
Then I tried creating smaller files (file 1 holds records 0 to 100,000, file 2 holds 100,000 to 1,000,000, and so on) and combining them with the Windows copy command. It works, but each file is a JSON array (like [{...}]), so when they are merged the result contains separate arrays like [{}][{}], where I want a single array [{...}].
Is there any better solution for this?
I would suggest you change 'memory_limit' in your php.ini configuration. Also, if this takes a long time, you can handle it with a cron job (if possible).
OR
You can decode ALL of your JSON files, merge them into a single array, then encode that again and write it to one JSON file.
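A rough sketch of that merge (the file names are placeholders; note that this still needs enough memory to hold every record at once):

$merged = [];
foreach (['file1.json', 'file2.json', 'file3.json'] as $file) {
    // Each part file holds one JSON array like [{...}, {...}]
    $merged = array_merge($merged, json_decode(file_get_contents($file), true));
}
file_put_contents('final.json', json_encode($merged));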
Finally, I did this. Please see the steps (I am not sure this is the right approach, but it works).
In total I have 2.6 million records in my table. I created a script that selects the MySQL rows, converts them to JSON and writes the result to a file.
Select records 0 to 1 million and create file 1. Repeat for 1 to 2 million and 2 to 2.6 million to create file 2 and file 3.
Combine these files using jq (http://stedolan.github.io/jq/) to create a single JSON file.
I have a program that I use to read a CSV file and insert the data into a database. I am having trouble with it because it needs to be able to insert big batches (up to 10,000 rows) of data at a time. At first I had it looping through and inserting each record one at a time. That is slow because it calls an insert function 10,000 times. Next I tried grouping the rows so it inserted 50 at a time. I figured this way it would have to connect to the database less often, but it is still too slow. What is an efficient way to insert many rows of a CSV file into a database? Also, I have to edit some of the data (such as appending a 1 to a username if two are the same) before it goes into the database.
For a text file you can use the LOAD DATA INFILE command which is designed to do exactly this. It'll handle CSV files by default, but has extensive options for handling other text formats, including re-ordering columns, ignoring input rows, and reformatting data as it loads.
So I ended up using fputcsv to write the data I changed into a new CSV file, then I used the LOAD DATA INFILE command to load the data from the new CSV file into the table. This changed it from timing out at 120 seconds for 1,000 entries to taking about 10 seconds for 10,000 entries. Thank you to everyone who replied.
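Roughly what that combination looks like, as a sketch only -- the table name, columns and connection details are assumptions, and LOAD DATA LOCAL INFILE must be enabled on both the client and the server:

// 1) Write the cleaned-up rows to a temporary CSV with fputcsv().
$tmp = tempnam(sys_get_temp_dir(), 'import');
$fh  = fopen($tmp, 'w');
foreach ($rows as $row) {      // $rows = the edited data (e.g. de-duplicated usernames)
    fputcsv($fh, $row);
}
fclose($fh);

// 2) Bulk-load the whole file in a single statement.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);
$pdo->exec(
    "LOAD DATA LOCAL INFILE " . $pdo->quote($tmp) . "
     INTO TABLE users
     FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'
     (username, email, created_at)"
);
unlink($tmp);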
I have this crazy idea: could you run multiple parallel scripts, each one taking care of a bunch of rows from your CSV?
Something like this:
<?php
// This tells Linux to run import.php in the background
// and releases your caller script.
//
// Launch this several times (with different ranges) and you can improve the overall time.
$cmd = "nohup php import.php [start] [end] > /dev/null 2>&1 &";
exec($cmd);
Also, have you tried increasing that limit of 50 rows per bulk insert to 100 or 500, for example?
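For reference, a multi-row insert with a larger batch size could look roughly like this (a sketch; the table, columns and PDO connection are placeholders):

$batch = array_slice($rows, $offset, 500);   // e.g. 500 rows per statement instead of 50

// Build one INSERT with a (?, ?, ?) group per row, then bind all values at once.
$placeholders = implode(', ', array_fill(0, count($batch), '(?, ?, ?)'));
$stmt = $pdo->prepare("INSERT INTO users (username, email, created_at) VALUES $placeholders");

$params = [];
foreach ($batch as $row) {
    array_push($params, $row['username'], $row['email'], $row['created_at']);
}
$stmt->execute($params);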
I have a data set in MySQL with 150 rows. I have a pair of nested for loops that run math calculations based on some user inputs and the data set. The code does calculations for 30-row windows and accumulates the results for each 30-row window in an array. What I mean is, I do a "cycle" of calculations on rows 0-29, then 1-30, then 2-31, and so on. That results in 120 "cycles".
Right now the loops are set up like this (there are more fields; I just trimmed the code for the simplicity of this question):
$window = 30;
$query = "SELECT * FROM table";
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
    $data[] = array("Date" => $row['Date'], "ID" => $row['ID']);
}

for ($i = 0; $i < (count($data) - $window); $i++) {
    for ($j = 0; $j < $window; $j++) {
        // do calculations here with $data[$i + $j]
        $results[$i][$j] = calculations;
    }
}
This works fine for the number of rows I have. However, I opened the script up to a larger data set (1,700 rows) with a different window (360 rows). That means vastly more iterations. It gave me an out-of-memory error, and some quick use of memory_get_peak_usage() showed that memory usage just kept climbing.
I'm starting to think that having the loops search through that data array is extremely laborious, especially when the "window" overlaps so many of the "cycles". Example: cycle 0 goes through rows 0-29 and cycle 1 goes through rows 1-30, so those two cycles share most of the rows they need, but I'm telling PHP to look the data up again each time.
Is there a way to structure this better? I'm getting kind of lost thinking about running these concurrent cycles.
I think the array that is blowing the memory will be the $results array. In your small sample it is a two-dimensional array of (150 - 30) x 30 = 3,600 cells. At roughly 144 bytes per element that is about 0.5 MB plus remaining bucket space -- not a problem.
In your second, larger sample it is (1700 - 360) x 360 = 482,400 cells. At roughly 144 bytes per element that is about 66 MB plus remaining bucket space, just to hold the results of your calculations.
I think you need to ask whether you really need to hold all this data. If you really do, you may have to come up with another way of storing it.
I don't see any point in attempting thousands of extra database calls, as that only adds overhead while you still have to maintain the huge list of results in an array.
The SQL Way
You can accomplish this by using LIMIT
$window = 30;
$cycle = 0;   // offset of the first row in the current cycle
$query = "SELECT * FROM table LIMIT $cycle,$window";
This will return only the rows you need for each cycle. You will need to loop and increment $cycle by 1 each time. The way you are doing it now is probably better, however.
This won't wrap around and grab the first of the data again, though; you would have to add additional logic to handle that case.
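If you did go this route, the loop could look something like the sketch below, using mysqli instead of the deprecated mysql_* functions; $totalRows (the row count) and the table name are assumptions:

$window = 30;
$stmt = $mysqli->prepare("SELECT * FROM `data_table` LIMIT ?, ?");

for ($cycle = 0; $cycle < $totalRows - $window; $cycle++) {
    $stmt->bind_param("ii", $cycle, $window);   // offset, row count
    $stmt->execute();
    $rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);

    // do the calculations for this 30-row window with $rows;
    // $rows is overwritten on the next iteration, so nothing accumulates
}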
I'm developing an algorithm that does intense calculations on multiple huge arrays. Right now I use PHP arrays to do the job, but it seems slower than I need it to be. I was thinking of converting the PHP arrays into MySQL tables and then running the calculations there, to solve the speed issue.
At the very first step, when I was converting a 20x10 PHP array into 200 database rows containing zeros, it took a long time. Here is the code (basically the following code generates a zero matrix, if you're interested to know):
$stmt = $mysqli->prepare("INSERT INTO `table` (`Row`, `Col`, `Value`) VALUES (?, ?, '0')");
for ($i = 0; $i < $rowsNo; $i++) {
    for ($j = 0; $j < $colsNo; $j++) {
        //$myArray[$j]=array_fill(0,$colsNo,0);
        $stmt->bind_param("ii", $i, $j);
        $stmt->execute();
    }
}
$stmt->close();
The commented-out line "$myArray[$j]=array_fill(0,$colsNo,0);" generates the array very quickly, while filling the table in the next two lines takes far longer.
Array time: 0.00068 seconds
MySQLi time: 25.76 seconds
There is a lot more calculation remaining, and I am worried that even after optimizing numerous parts it may get worse. I searched a lot, but I couldn't find an answer to whether arrays or MySQL tables are the better choice here. Has anybody done, or does anybody know of, any benchmarking tests on this?
I really appreciate any help.
Thanks in advance
UPDATE:
I did the following test for a 273x273 matrix. I created two versions of the same data: first a two-dimensional PHP array, and second a table with 273x273 = 74,529 rows, both containing the same data. The following are the speed test results for retrieving the same data from both [here, finding out which column(s) of a certain row have a value equal to 1 - the other columns are zero]:
It took 0.00021 seconds for the array.
It took 0.0026 seconds for mysqli table. (more than 10 times slower)
My conclusion is to stick with the arrays instead of converting them into database tables.
One last thing: if the data is stored in a database table in the first place, generating an array from it and then using that array is much, much slower, as shown below (slower because of the data retrieval from the database):
It took 0.9 seconds for the array. (more than 400 times slower)
It took 0.0021 seconds for mysqli table.
The main reason is not that the database itself is slower. The main reason is that the database accesses the hard drive to store data, while PHP arrays live entirely in RAM, which is much faster than the hard drive.
Although there is a way to speed up your insert queries (most likely you are using an InnoDB table without a transaction), the very premise of the question is wrong.
A database is intended - in the first place - to store data. To store it permanently. It does that well. It can do calculations too, but before doing any calculations there is one necessary step: storing the data.
If you want to run your calculations on data that is already stored, it's fine to use a database.
If you want to push your data into a database only to do calculations on it, it doesn't make much sense.
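To illustrate the insert-speed remark: with InnoDB, wrapping the question's loop in a single transaction avoids flushing to disk once per row. A minimal sketch, reusing the question's prepared statement:

$mysqli->begin_transaction();

$stmt = $mysqli->prepare("INSERT INTO `table` (`Row`, `Col`, `Value`) VALUES (?, ?, '0')");
for ($i = 0; $i < $rowsNo; $i++) {
    for ($j = 0; $j < $colsNo; $j++) {
        $stmt->bind_param("ii", $i, $j);
        $stmt->execute();
    }
}
$stmt->close();

$mysqli->commit();   // one flush for all rows instead of one per INSERT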
In my case, as shown in the update to the question, arrays perform better than MySQL tables.
Array usage was about 10 times faster, even when I searched through the cells to find the desired values in a row. Even good indexing of the table couldn't beat the array's speed.