Export big mysql table into .xls format - php

I need to export a MySQL table to .xls format. This is a fragment of my code:
$result = mysql_query( /* here query */ );
$objPHPExcel = new PHPExcel();
$rowNumber = 1;
while ($row = mysql_fetch_row($result)) {
    $col = 'A';
    foreach ($row as $cell) {
        $objPHPExcel->getActiveSheet()->setCellValue($col . $rowNumber, $cell);
        $col++;
    }
    $rowNumber++;
}
The problem is that the table has 500,000 rows, and running the inner foreach loop on every iteration of the while loop makes the script take a very long time to execute.
Is it possible to optimize this code?

500,000 rows will always take a lot of time to write, even if you speed it up by using the worksheet's fromArray() method to get rid of your foreach loop; and (as nickhar has pointed out) this is too many rows for the .xls format to handle unless you split them across multiple worksheets.
You can reduce the memory requirements by enabling cell caching (SQLite gives the best memory usage), but it will still take a long time to execute for 500,000 rows, and anything this size should be run as a batch/cron job.
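A rough sketch combining both ideas, assuming PHPExcel 1.8, the mysqli extension, and placeholder connection details and query; the blocks of 1,000 rows are arbitrary, and for 500,000 rows you would still have to split across worksheets (see the next answer):
require_once 'PHPExcel.php';

// Enable SQLite-backed cell caching before the workbook is created.
PHPExcel_Settings::setCacheStorageMethod(
    PHPExcel_CachedObjectStorageFactory::cache_to_sqlite3
);

$mysqli = new mysqli('localhost', 'user', 'pass', 'db');   // placeholder credentials
$result = $mysqli->query('SELECT * FROM big_table');       // placeholder query

$objPHPExcel = new PHPExcel();
$rowNumber   = 1;
$buffer      = array();

while ($row = $result->fetch_row()) {
    $buffer[] = $row;
    if (count($buffer) === 1000) {                          // flush in blocks of 1,000 rows
        $objPHPExcel->getActiveSheet()->fromArray($buffer, null, 'A' . $rowNumber);
        $rowNumber += count($buffer);
        $buffer = array();
    }
}
if ($buffer) {                                              // write any remaining rows
    $objPHPExcel->getActiveSheet()->fromArray($buffer, null, 'A' . $rowNumber);
}

$writer = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5'); // .xls writer
$writer->save('export.xls');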

This is a point to note rather than a direct answer to your question, but if the Excel file format you're outputting is .xls, the maximum number of rows is 65,536; if it is the MS Excel 2007+ format (.xlsx), the maximum is 1,048,576.
So without changing the output format to .xlsx (which is an entirely different structure), the files will be too large to open.
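A minimal sketch of the worksheet-splitting idea for the .xls limit, assuming PHPExcel and a mysqli result set in $result (names are placeholders):
$maxRowsPerSheet = 65536;   // hard row limit of the .xls format
$sheetIndex = 0;
$rowNumber  = 1;
$sheet = $objPHPExcel->getActiveSheet();

while ($row = $result->fetch_row()) {
    if ($rowNumber > $maxRowsPerSheet) {
        // Row limit reached: start writing into a fresh worksheet.
        $sheet = $objPHPExcel->createSheet(++$sheetIndex);
        $rowNumber = 1;
    }
    $sheet->fromArray(array($row), null, 'A' . $rowNumber);
    $rowNumber++;
}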

Consider dumping the data into a CSV file and then importing it into Excel. That should be a lot faster.
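For example, a rough sketch of the CSV route, assuming the mysqli extension and placeholder connection, table, and file names:
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');   // placeholder credentials
$result = $mysqli->query('SELECT * FROM big_table');       // placeholder query

$fp = fopen('export.csv', 'w');
while ($row = $result->fetch_row()) {
    fputcsv($fp, $row);   // one row is written and released per iteration
}
fclose($fp);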

If you get a PHP timeout, you can reset the limit by adding this inside the while or for loop:
set_time_limit(300); // or however many seconds you want
If you're running it through the browser, your server may be timing out. I recommend you run it on the command line to avoid this.
Also, similar to what nickhar mentioned, it may be an Excel issue. I would try outputting a CSV file instead; I think it will allow you to output more lines.

Related

PHP - How to append to a JSON file

I generate JSON files which I load into datatables, and these JSON files can contain thousands of rows from my database. To generate them, I need to loop through every row in the database and add each database row as a new row in the JSON file. The problem I'm running into is this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 262643 bytes)
What I'm doing is: I get the JSON file with file_get_contents($json_file) and decode it into an array, then I add a new row to the array, then encode the array back into JSON and write it to the file with file_put_contents($json_file).
Is there a better way to do this? Is there a way I can prevent the memory increasing with each loop iteration? Or is there a way I can clear the memory before it reaches the limit? I need the script to run to completion, but with this memory problem it barely gets up to 5% completion before crashing.
I can keep rerunning the script, and each time I rerun it, it adds more rows to the JSON file. So if this memory problem is unavoidable, is there a way to automatically rerun the script numerous times until it's finished? For example, could I detect the memory usage, detect when it's about to reach the limit, then exit the script and restart it? I'm on WP Engine, so they won't allow security-risky functions like exec().
So I switched to using CSV files and it solved the memory problem. The script runs vastly faster too. JQuery DataTables doesn't have built in support for CSV files, so I wrote a function to convert the CSV file to JSON:
public function csv_to_json($post_type) {
    $data = array(
        "recordsTotal"    => $this->num_rows,
        "recordsFiltered" => $this->num_rows,
        "data"            => array()
    );

    if (($handle = fopen($this->csv_file, 'r')) === false) {
        die('Error opening file');
    }

    $headers  = fgetcsv($handle, 1024, "\t");
    $complete = array();
    while ($row = fgetcsv($handle, 1024, "\t")) {
        $complete[] = array_combine($headers, $row);
    }
    fclose($handle);

    $data['data'] = $complete;
    file_put_contents($this->json_file, json_encode($data, JSON_PRETTY_PRINT));
}
So the result is I create a CSV file and a JSON file much faster than creating a JSON file alone, and there are no issues with memory limits.
Personally, as I said in the comments, I would use CSV files. They have several advantages:
you can read / write one line at a time, so you only ever manage the memory for one line
you can just append new data to the file
PHP has plenty of built-in support, using either fputcsv() or the SPL file objects
you can load them directly into the database using "Load Data Infile" (a sketch follows below):
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
The only cons are:
you have to keep the same schema through the whole file
there are no nested data structures
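If you go the CSV-into-MySQL direction, a rough sketch of "Load Data Infile" through PDO could look like this (assumes the PDO MySQL driver with local_infile enabled on both client and server; table, column, and file names are placeholders):
$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8',
    'user',
    'pass',
    array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)   // required for LOCAL infile
);

$sql = "LOAD DATA LOCAL INFILE '/path/to/export.csv'
        INTO TABLE my_table
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
        (col_a, col_b, col_c)";

$pdo->exec($sql);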
The issue with JSON is (as far as I know) that you have to keep the whole thing in memory as a single data set, so you cannot stream it (line by line) like a normal text file. There is really no solution besides limiting the size of the JSON data, which may or may not even be easy to do. You can increase the memory somewhat, but that is just a temporary fix if you expect the data to continue to grow.
We use CSV files in a production environment and I regularly deal with datasets that are 800k or 1M rows. I've even seen one that was 10M rows. We have a single table of 60M rows (MySQL) that is populated from CSV uploads. So it will work and be robust.
If you're set on JSON, then I would just come up with a fixed number of rows that works and design your code to only run that many rows at a time. It's impossible for me to guess how to do that without more details.

PHP Using fgetcsv on a huge csv file

Using fgetcsv, can I somehow do a destructive read where rows I've read and processed would be discarded so if I don't make it through the whole file in the first pass, I can come back and pick up where I left off before the script timed out?
Additional Details:
I'm getting a daily product feed from a vendor that comes across as a 200 MB .gz file. When I unpack the file, it turns into a 1.5 GB .csv with nearly 500,000 rows and 20-25 fields. I need to read this information into a MySQL db, ideally with PHP, so I can schedule a cron job to run the script at my web hosting provider every day.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
My idea was to grab the information from the .csv using the fgetcsv function, but because of the 3-minute timeout I'm expecting to have to take multiple passes at the file. I was thinking it would be nice to whittle away at the file as I process it, so I wouldn't need to spend cycles skipping over rows that were already processed in a previous pass.
From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.
Assuming you save how many rows you have already processed, you can skip rows like this:
$alreadyProcessed = 42; // for example

$i = 0;
while ($row = fgetcsv($fileHandle)) {
    if ($i++ < $alreadyProcessed) {
        continue;
    }
    ...
}
However, this means you're reading the entire 2 GB file from the beginning each time you go through it, which in itself already takes a while and you'll be able to process fewer and fewer rows each time you start again.
The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:
$lastPosition = (int) @file_get_contents('last_position.txt'); // 0 on the first run, when the file does not yet exist

$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);

while ($row = fgetcsv($fh)) {
    ...
    file_put_contents('last_position.txt', ftell($fh));
}
This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
You can avoid timeouts and memory errors to some extent by reading the file like a stream: read it line by line and insert each line into the database (or process it accordingly). That way only a single line is held in memory on each iteration. Please note: don't try to load a huge CSV file into an array, as that really would consume a lot of memory.
if (($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    // Get the first row (header)
    $header = fgetcsv($handle);

    // Loop through the file line by line
    while (($data = fgetcsv($handle)) !== false)
    {
        // Process your data
        unset($data);
    }
    fclose($handle);
}
I think a better solution (it would be phenomenally inefficient to continuously rewind and write to an open file stream) would be to track the file position of each record read (using ftell) and store it with the data you've read - then if you have to resume, just fseek to the last position.
You could try loading the file directly using MySQL's LOAD DATA INFILE (which will likely be a lot faster), although I've had problems with this in the past and ended up writing my own PHP code.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
What have you tried?
The memory can be limited by means other than the php.ini file, but I can't imagine how anyone could actually prevent you from using a different execution time (even if ini_set() is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php).
Unless you are trying to fit the entire data file into memory, there should be no issue with a memory limit of 128 MB.

Read CSV from end to beginning in PHP

I am using PHP to expose vehicle GPS data from a CSV file. This data is captured at least every 30 seconds for over 70 vehicles and includes 19 columns of data. This produces several thousand rows of data and file sizes around 614kb. New data is appended to end of the file. I need to pull out the last row of data for each vehicle, which should represent the most the recent status. I am able to pull out one row for each unit, however since the CSV file is in chronological order I am pulling out the oldest data in the file instead of the newest. Is it possible to read the CSV from the end to the beginning? I have seen some solutions, however they typically involve loading the entire file into memory and then reversing it, this sounds very inefficient. Do I have any other options? Thank you for any advice you can offer.
EDIT: I am using this data to map real-time locations on-the-fly. The data is only provided to me in CSV format, so I think importing into a DB is out of the question.
With fseek you can set the pointer to the end of the file and offset it by a negative value to read the file backwards.
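A rough sketch of that approach: read a fixed-size chunk from the tail of the file and walk it backwards, keeping the first row seen per vehicle (the chunk size and the column holding the vehicle id are assumptions and may need tuning):
// Return the complete lines contained in the last $chunkSize bytes of a file.
function tail_lines($path, $chunkSize = 65536) {
    $handle = fopen($path, 'r');
    if (!$handle) {
        return array();
    }
    fseek($handle, 0, SEEK_END);
    $fileSize = ftell($handle);
    if ($fileSize === 0) {
        fclose($handle);
        return array();
    }
    $start = max(0, $fileSize - $chunkSize);
    fseek($handle, $start);                  // jump close to the end of the file
    $tail = fread($handle, $fileSize - $start);
    fclose($handle);

    $lines = explode("\n", trim($tail));
    if ($start > 0) {
        array_shift($lines);                 // the first line may be cut off, drop it
    }
    return $lines;
}

// Walk the tail newest-first and keep only the latest row per vehicle.
$latest = array();
foreach (array_reverse(tail_lines('/path/to/yourfile.csv')) as $line) {
    $fields    = str_getcsv($line);
    $vehicleId = $fields[0];                 // assumption: vehicle id in column 0
    if (!isset($latest[$vehicleId])) {
        $latest[$vehicleId] = $fields;
    }
}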
If you must use CSV files instead of a database, then perhaps you could read the file line by line. This will prevent more than the last line from being kept in memory (thanks to the garbage collector).
$handle = #fopen("/path/to/yourfile.csv", "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
// old values of $last are garbage collected after re-assignment
$last = $line;
// you can perform optional computations on past data here if desired
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
// $last will now contain the last line of the file.
// You may now do whatever with it
}
edit: I did not see the fseek() post. If all you need is the last line, then that is the way to go.

Writing a very large CSV file From DB output in PHP

I have a DB of sensor data that is being collected every second. The client would like to be able to download 12hour chunks in CSV format - This is all done.
The output is sadly not straight data and needs to be processed before the CSV can be created (parts are stored as JSON in the DB), so I can't just dump the table.
So, to reduce load, I figured that the first time the file is downloaded, I would cache it to disk, then any more requests just download that file.
If I don't try to write it (using file_put_contents with FILE_APPEND) and just echo every line, it is fine; but when writing it, even if I give the script 512M, it runs out of memory.
So this works:
while ($stmt->fetch()) {
    //processing code
    $content = //CSV formatting
    echo $content;
}
This does not:
while ($stmt->fetch()) {
    //processing code
    $content = //CSV formatting
    file_put_contents($pathToFile, $content, FILE_APPEND);
}
It seems like even though I am calling file_put_contents on every line, it is storing it all in memory.
Any suggestions?
The problem is that file_put_contents tries to dump the entire thing at once. Instead, you should loop through your formatting and use fopen, fwrite, and fclose.
$file = fopen($pathToFile, 'a');   // open once, in append mode
while ($stmt->fetch()) {
    //processing code
    $content = //CSV formatting
    fwrite($file, $content);       // write only this line
}
fclose($file);
This will limit the amount of data being held in memory at any given time.
I agree completely with writing one line at a time; you will never have memory issues this way, since there is never more than one line loaded into memory at a time. I have an application that does the same.
A problem I have found with this method, however, is that the file takes forever to finish writing. So this post is to back up what has already been said, but also to ask all of you for an opinion on how to speed it up. For example, my system cleans a data file against a suppression file, so I read in one line at a time and look for a match in the suppression file; if no match is found, I write the line into the new cleaned file. A 50k-line file is taking about 4 hours to finish, however, so I am hoping to find a better way. I have tried this several ways, and at this point I load the entire suppression file into memory to avoid my main reading loop having to run another loop through each line of the suppression file, but even that is still taking hours.
So, line by line is by far the best way to manage your system's memory, but I'd like to get the processing time for a 50k-line file (lines are email addresses and first and last names) down to under 30 minutes if possible.
FYI: the suppression file is 16,000 KB in size, and the total memory used by the script as reported by memory_get_usage() is about 35 MB.
Thanks!
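As an aside, the slow part in the scenario above is most likely the per-line scan of the suppression file. A hedged sketch of one alternative: read the suppression file once into array keys, so that each check becomes a constant-time hash lookup (file names and the column holding the e-mail address are assumptions):
// Index the suppression list once; only the keys matter.
$suppressed = array();
if (($sh = fopen('suppression.csv', 'r')) !== false) {
    while (($srow = fgetcsv($sh)) !== false) {
        $suppressed[strtolower(trim($srow[0]))] = true;   // assumption: e-mail in column 0
    }
    fclose($sh);
}

$in  = fopen('data.csv', 'r');
$out = fopen('cleaned.csv', 'w');
while (($row = fgetcsv($in)) !== false) {
    $email = strtolower(trim($row[0]));                   // assumption: e-mail in column 0
    if (!isset($suppressed[$email])) {                    // O(1) lookup instead of an inner loop
        fputcsv($out, $row);
    }
}
fclose($in);
fclose($out);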

PHP parallel process or one process after another for file writing

I am exporting data to CSV. After 25,000 records, memory is exhausted.
Increasing the memory limit is OK.
If I have 100,000 rows, can I write it as 4 processes?
Write the first 25,000 rows, then the next 25,000, then the next...
Is this possible in a CSV export?
Will this have any advantage, or is it the same as exporting the whole data at once?
Does multiple processing or parallel processing have any advantage?
Well, this depends on how you're generating the CSV.
Assuming that you're doing it as the result of a database query (or some other import), you could try streaming instead of building and then returning.
Basically, you turn off output buffering first:
while (ob_get_level() > 0) {
    ob_end_flush();
}
Then, when you're building it, echo it out row by row:
foreach ($rows as $row) {
    echo '"' . $row[0] . '","' . $row[1] . '"' . "\n";
}
That way, you're not using too much memory in PHP.
You could also write the data to a temporary file, and then stream that file back:
$file = tmpfile();
foreach ($rows as $row) {
    fputcsv($file, $row);
}
rewind($file);
fpassthru($file); // Sends the file to the client
fclose($file);
But again, it all depends on what you're doing. It sounds to me like you're building the CSV in a string (which is eating all your memory). That's why I suggested these two options...
The problem is that if you fork the process, you have to worry about cleaning up its children, and you're still using the same amount of memory. Ultimately you're limited by the machine's memory, but if you don't want to have to conditionally increase PHP's memory_limit based on the number of iterations, then forking may be the way to go.
If you compiled PHP with --enable-pcntl and --enable-sigchild, you're good to go - otherwise you won't be able to fork the process. One workaround would be to have a master script that delegates the execution of other scripts, but if you're using backticks, shell_exec(), or exec() (or anything similar), it starts to get sloppy and you'll have to take a lot of steps to ensure that your commands cannot be tainted/exploited.
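For completeness, a rough sketch of the chunked-export idea with pcntl_fork, run from the command line; export_chunk() is a hypothetical helper that writes one slice of rows to its own part file, and each child should open its own database connection:
$totalRows = 100000;
$chunkSize = 25000;
$children  = array();

for ($offset = 0; $offset < $totalRows; $offset += $chunkSize) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('Could not fork');
    }
    if ($pid === 0) {
        // Child process: export its slice to a part file, then exit.
        // (Open a fresh database connection here; do not reuse the parent's.)
        export_chunk($offset, $chunkSize, "export_part_{$offset}.csv");
        exit(0);
    }
    $children[] = $pid;                     // parent keeps track of child PIDs
}

// Parent: reap every child so no zombies are left behind.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}

// The part files can then be concatenated into a single CSV.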
