I was hoping to get by with a simple solution to read records from a database and save them to a text file that the user downloads. I have been doing this on the fly, and for under 20,000 records it works great. Over 20,000 records, I'm loading too much data into memory and PHP hits a fatal error.
My thought was to grab everything in chunks: fetch XX rows, echo them, and loop to get the next XX rows until I'm done.
I am just echoing the results right now though, not building the file and then sending it for download, which I'm guessing I'll have to do.
The issue, succinctly, is that with up to 20,000 rows the file builds and downloads perfectly. With more than that, I get an empty file.
The code:
header('Content-type: application/txt');
header('Content-Disposition: attachment; filename="export.'.$file_type.'"');
header('Expires: 0');
header('Cache-Control: must-revalidate');

// I do other things to check for records before, hence the do-while loop
$this->items = $model->getItems();
do {
    foreach ($this->items as $k => $item) {
        $i = 0;
        $tables = count($this->data['column']);
        foreach ($this->data['column'] as $table => $fields) {
            $columns = count($fields);
            $j = 0;
            foreach ($fields as $field => $junk) {
                if ($quote_output) {
                    echo '"'.ucwords(str_replace(array('"'), array('\"'), $item->$field)).'"';
                } else {
                    echo ''.$item->$field.'';
                }
                $j++;
                if ($j < $columns) {
                    echo $delim;
                }
            }
            $i++;
            if ($i < $tables) {
                echo $delim;
            }
        }
        echo "\n";
    }
} while ($this->items = $this->_model->getItems());
Very large tables won't work that way.
You have to output the data as you read it from the database. If you need it sorted, use the database's ORDER BY for that purpose.
So, more or less:
// assuming you use a var such as $query to handle the DB
while (!$query->eof()) {
    $fields = $query->read_next();
    echo $fields; // with your formatting, maybe call a function...
}
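For instance, with PDO and MySQL the same idea might look roughly like this (just a sketch; $pdo and format_line() are assumed names, and the unbuffered-query attribute is specific to the MySQL driver):
// Minimal sketch: stream each row to the client as it is read.
// format_line() is a hypothetical helper that applies your delimiter/quoting rules.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // don't hold the whole result set in memory
$stmt = $pdo->query('SELECT * FROM items ORDER BY id');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo format_line($row);
}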
The empty result is normal: if memory is exhausted before any echo happens, nothing is sent to the browser.
Note also that PHP has a time limit (a watchdog) that you may need to tweak. The default is defined in your php.ini. You may set it to zero if you expect the tables to grow very large.
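If you'd rather not touch php.ini, the limit can also be raised at runtime, for example:
set_time_limit(0);               // 0 removes the execution time limit for this request
ini_set('memory_limit', '256M'); // optionally raise the memory ceiling as well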
You could change your str_replace() to addslashes(). This will probably free some memory.
Then I suggest you save to a file, using PHP's file functions to do so: fopen() or file_put_contents().
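A minimal sketch of that approach, assuming each chunk of rows has already been formatted into strings (the $lines variable here is an assumption):
$tmp = tempnam(sys_get_temp_dir(), 'export'); // temporary file for the export
$fh  = fopen($tmp, 'w');
foreach ($lines as $line) {       // $lines stands in for one chunk of formatted rows
    fwrite($fh, $line . "\n");
}
fclose($fh);
readfile($tmp);                   // stream the finished file to the client
unlink($tmp);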
I hope that might help you!
Actually, this might be a simple fix. If PHP is running out of memory, it's probably because the output buffer is filling up before the file is sent. If so, simply flush() at regular intervals.
This will flush after each line:
do {
    foreach ($this->items as $item) {
        // assemble and echo your output line here (the inner loops)
        echo "\n";
        flush();
    }
} while ($this->items = $this->_model->getItems());
Flushing after each line might prove too slow, in which case add a counter and flush after every hundred, or whatever works best.
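As a rough sketch of that counter idea, built on the loop from the question:
$lines = 0;
do {
    foreach ($this->items as $item) {
        // ... echo one formatted line here ...
        echo "\n";
        if (++$lines % 100 === 0) {
            flush(); // push output to the client every 100 lines
        }
    }
} while ($this->items = $this->_model->getItems());
flush(); // send whatever is left over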
I have a problem with my WordPress query.
What I'm trying to do:
I have a CSV file with product data (name, price, stock, SKU, etc.)
And I want to import this file, but when I try to get the product ID by SKU, the query load is too heavy for my server. I know I'm doing something silly: inside a foreach I look up every product_id one by one.
Is it possible to split my WP query without killing my server?
I tried sleep() but it made no difference...
My code is here:
public function new_import_stock_prices(){
    global $wpdb;
    global $post;
    if ( !function_exists( 'wc_get_product_id_by_sku' ) ) {
        require_once '/includes/wc-product-functions.php';
    }
    echo '<h1>Import of stock levels and prices from a CSV file</h1>';
    echo '<h4>The file is fetched from netis/products.csv</h4>';
    $fn = 'https://e-xxxxx.pl/xxx/products.csv';
    $file_array = file($fn);
    echo '<table>';
    echo '<tr>';
    echo '<td>No.</td>';
    echo '<td>Name</td>';
    echo '<td>SKU</td>';
    echo '<td>Stock level</td>';
    echo '<td>Price</td>';
    echo '<td>Product ID</td>';
    $i = 1;
    if ( in_array( 'woocommerce/woocommerce.php', apply_filters( 'active_plugins', get_option( 'active_plugins' ) ) ) ) {
        foreach ($file_array as $line_number => &$line)
        {
            if ($line_number > 0 && $line_number % 10 == 0) {
                $row2 = explode('|', $line);
                $sku = $row2[1];
                // get the product ID from the SKU
                $product_id = $wpdb->get_var( $wpdb->prepare( "SELECT post_id FROM $wpdb->postmeta WHERE meta_key='_sku' AND meta_value='%s' LIMIT 1", $sku ) );
                // Get an instance of the WC_Product object
                $product = new WC_Product( $product_id );
                // Get product stock quantity and stock status
                $stock_quantity = $product->get_stock_quantity();
                $stock_status   = $product->get_stock_status();
                echo '<tr>';
                echo '<td>'.$i.'</td>';
                echo '<td>'.$row2[0].'</td>';
                echo '<td>'.$row2[1].'</td>';
                echo '<td>'.$row2[5].'</td>';
                echo '<td>'.$row2[2].'</td>';
                echo '<td>'.$product_id.'</td>';
                echo '</tr>';
                $i = $i + 1;
                sleep(10);
            }
        }
    }
    echo '</table>';
}
BTW, my wp_postmeta table has ~900,000+ records :O
And I want to import this file
I don't see any code for importing; I see code for displaying. Assuming by import you mean display:
What's probably happening is one of a few things.
You're running out of memory (you should get an error for this).
Don't use file($fn); use file functions that open the file and read it line by line, such as fgetcsv() (see the sketch after this list).
You're running out of time.
Not much you can do about this, except send less data.
You're overwhelming the browser buffer by sending too much output.
Again, not much you can do about this but send less data.
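For the line-by-line reading mentioned above, a sketch using fgetcsv() with the pipe delimiter from the question might look like this:
$fh = fopen($fn, 'r');                              // instead of file($fn)
while (($row = fgetcsv($fh, 0, '|')) !== false) {
    $sku = $row[1];                                 // work with one row at a time
    // ... look up / display the product here ...
}
fclose($fh);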
The only real solution (assuming by import you mean display) is to page the data.
Now even with a file you can page the data, but I would suggest using SplFileObject instead of the procedural file functions. That said, you can page with the procedural style too, but it works by byte offset, not line number.
While I can't code an entire paging system I can give you some tips:
For example
//hard to tell how many lines in the file
$fn = 'https://e-xxxxx.pl/xxx/products.csv';
$f = fopen($fn, 'r');
fseek($f, $_GET['offset']); // seek to a byte offset
$i = 0;
while (!feof($f) && ($row = fgetcsv($f)) && null !== $row[0]) {
    // output/process $row here
    if (++$i == 10) {
        $offset = ftell($f); // byte offset to resume from on the next page
        break;               // stop after one page of 10 rows
    }
}
ftell() and fseek() let you get or move the file pointer (in bytes), so you can start reading from a predefined offset that you pass around in the URL, etc.
You can do the same thing with SplFileObject, but a bit better.
try {
    $fn = 'https://e-xxxxx.pl/xxx/products.csv';
    $csv = new SplFileObject($fn, 'r');
} catch (RuntimeException $e) {
    printf("Error opening csv: %s\n", $e->getMessage());
}

$csv->seek($_GET['line']); // seek to a predefined line
while (!$csv->eof() && ($row = $csv->fgetcsv()) && null !== $row[0]) {
    // output/process $row here
    if (($csv->key() - $_GET['line']) == 10) {
        $line = $csv->key(); // line to resume from on the next page
        break;               // stop after one page of 10 rows
    }
}
The main advantage of SPL is you can use the row number, which is much easier to work with.
You can also get the total number of lines in a file like this
$csv->seek(PHP_INT_MAX);
$total = $csv->key();
$csv->rewind(); //or $csv->seek($_GET['line'])
Basically this seeks to the largest INT PHP can handle; because the file has a finite length, that puts the pointer at the end of the file. Then key() gives us the line number, and we simply rewind (or seek) to where we want to read from.
I mention the total number of rows because in paging it's nice to be able to show that.
Another option (for display), besides paging, is to output the page without buffering.
// Turn off output buffering
ini_set('output_buffering', 'off');
// Turn off PHP output compression
ini_set('zlib.output_compression', false);
//Flush (send) the output buffer and turn off output buffering
//ob_end_flush();
while (ob_get_level()) ob_end_flush();
// Implicitly flush the buffer(s)
ini_set('implicit_flush', true);
ob_implicit_flush(true);
Combine this with one of the methods I showed above to read the file 1 line at a time, and you may be able to eventually read all that data out.
Saving
For saving the data, you're probably going to need to break it into batches; the same paging technique can be used here (by offset or line), so that you only import a couple thousand rows at a time. I would also recommend not outputting the data, because you can give the browser more than its buffer can handle and lock it up. However, if you page the data you can break it into chunks small enough for the browser to handle.
You can even automate this using successive AJAX calls. Basically you would call the code on the backend to save a certain number of rows (x). The server would respond, and then you would make another call for (x) more rows, save, and repeat.
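A very rough sketch of the server side of such a batch endpoint; every name here (the file, the parameters, the JSON shape) is made up for illustration:
// batch_import.php - hypothetical endpoint called repeatedly via AJAX
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$batch  = 1000;                                   // rows to save per call

$fh = fopen('products.csv', 'r');
fseek($fh, $offset);                              // resume where the last call stopped

$done = 0;
while ($done < $batch && ($row = fgetcsv($fh, 0, '|')) !== false) {
    // save one row here (e.g. update stock/price by SKU)
    $done++;
}

header('Content-Type: application/json');
echo json_encode(array(
    'next'     => ftell($fh),                     // byte offset for the next call
    'finished' => feof($fh),
));
fclose($fh);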
I want to display all product IDs to check that they're correct. The next step is to change stock and price and save the products.
It would be easier to do this work in something like Excel, just from a data-entry standpoint; no one wants to edit thousands of rows on a web page and then have their session time out or something like that.
Hope that helps.
I sometimes create large CSV files from DB information for users to download - 100k or more rows. It appears I am running into a memory issue during CSV creation on some of the larger files. Here is an example of how I currently handle creation of the CSV.
Is there any way around this? I originally had 32MB, changed that to 64MB, and I'm still having the issue.
//columns array
$log_columns = array(
    '1',
    '2',
    '3',
    '4',
    '5',
    '6',
    '7',
    '8',
    '9'
);

//results from the db
$results = $log_stmt->fetchAll(PDO::FETCH_ASSOC);

$log_file = 'test.csv';
$log_path = $_SERVER['DOCUMENT_ROOT'].'/../user-data/'.$_SESSION['user']['account_id'].'/downloads/';

// if location does not exist create it
if (!file_exists($log_path)) {
    mkdir($log_path, 0755, true);
}

// open file handler
$fp = fopen($log_path.$log_file, 'wb');

// write the csv column titles / labels
fputcsv($fp, $log_columns);

//are there any logs?
if ($results) {
    //write the rows
    foreach ($results as $row) {
        //rows array
        $log_rows = array(
            $row['1'],
            $row['2'],
            $row['3'],
            $row['4'],
            $row['5'],
            $row['6'],
            $row['7'],
            $row['8'],
            $row['9']
        );
        //write the rows
        $newcsv = fputcsv($fp, $log_rows);
    } //end foreach
}
// there were no results so just return an empty log
else {
    $newcsv = fputcsv($fp, array('No results found.'));
}

//close handler
fclose($fp);

// if csv was created return true
if ($newcsv) {
    return true;
}
UPDATE:
Using a while loop and fetch instead of foreach and fetchAll still produces a memory error.
while($result = $log_stmt->fetch(PDO::FETCH_ASSOC))
How is that possible if I am only loading one row at a time?
UPDATE 2:
I have further tracked this down to the while loop using memory_get_usage();
echo (floor( memory_get_usage() / 1024) ).' kb<br />';
Before the while loop starts, the result is 4658 kb; then for each iteration of the while loop it increases by 1 kb every 2-3 loops, until it reaches the 32748 kb maximum memory allowed.
What can I do to solve this issue?
UPDATE 3:
Played around with this more today... the way this works just does not make much sense to me - I can only assume it is strange behavior with PHP's GC.
Scenario 1: My query gets all 80k rows and uses a while loop to output them. Memory used is around 4500 kb after the query is executed, then increments by 1 kb every two to three rows output in the loop. Memory is not released whatsoever, and at some point it crashes for lack of memory.
while ($results = $log_stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $results['timestamp'].'<br/>';
}
Scenario 2: My query is now looped and gets 1000 rows at a time, with an inner loop outputting each row. Memory maxes out at 400k as it loops and completes the entire output with no memory issues.
For this example I just used a counter of 80, since I know there are more than 80k rows to retrieve. In reality I would have to do this differently, obviously.
$t_counter = 0;
while ($t_counter < 80) {
    //set bindings
    $binding = array(
        'cw_start' => $t_counter * 1000,
        //some other bindings...
    );
    $log_stmt->execute($binding);
    echo $t_counter.' after statement '.floor(memory_get_usage() / 1024).' kb<br />';
    while ($results = $log_stmt->fetch(PDO::FETCH_ASSOC)) {
        echo $results['capture_timestamp'].'<br/>';
    }
    echo $t_counter.' after while '.floor(memory_get_usage() / 1024).' kb<br />';
    $t_counter++;
}
So I guess my question is why does the first scenario have incrementing memory usage and nothing is released? In that while loop there are no new variables and everything is 'reused'. The exact same situation happens in the second scenario just within another loop.
fetchAll() fetches all the records at once. Why not just run the query and do a while loop with fetch()? Then it doesn't need to load the whole result set into memory.
http://php.net/manual/en/pdostatement.fetch.php
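A sketch of that pattern using the variables from the question - each row is written out as it is fetched instead of being collected with fetchAll():
$fp = fopen($log_path . $log_file, 'wb');
fputcsv($fp, $log_columns);                       // header row
while ($row = $log_stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($fp, $row);                           // one row in memory at a time
}
fclose($fp);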
Then I think you should try reading the file in bits: read them and append to one CSV file; that way you free memory during the process.
You could do a COUNT(*), but try to find the total count before the chunked collection.
I have been using PHP's CSV functions myself; I even use CSV as a database system (NoSQL).
Try this.
CSV code for reading:
<?php
$CSVfp = fopen("filename.csv", "r");
if ($CSVfp !== FALSE) {
    $con = 1;
    while (!feof($CSVfp)) {
        // do something with the line here, e.g. $row = fgetcsv($CSVfp);
    }
    fclose($CSVfp);
}
?>
CSV code for writing:
<?php
$list = array(
    "edmond,dog,cat,redonton",
    "Glenn,Quagmire,Oslo,Norway",
);
$file = fopen("filename.csv", "w");
foreach ($list as $line) {
    fputcsv($file, explode(',', $line));
}
fclose($file);
?>
I want to read big CSV files and insert them into a database. That already works:
if (($handleF = fopen($path."\\".$file, 'r')) !== false) {
    $i = 1;
    // loop through the file line-by-line
    while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
        // Only start at the startRow, otherwise skip the row.
        if ($i >= $startRow) {
            // Check if to use headers
            if ($lookAtHeaders == 1 && $i == $startRow) {
                $this->createUberschriften( array_map(array($this, "convert"), $dataRow) );
            } else {
                $dataRow = array_map(array($this, "convert"), $dataRow);
                $data = $this->changeMapping($dataRow, $startCol);
                $this->executeInsert($data, $tableFields);
            }
            unset($dataRow);
        }
        $i++;
    }
    fclose($handleF);
}
My problem with this solution is that it's very slow, but the files are too big to load directly into memory... So I want to ask: is there a possibility to read, for example, 10 lines at a time into the $dataRow array, rather than only one or all of them?
I want to get a better balance between memory and performance.
Do you understand what I mean? Thanks for the help.
Greetz
V
EDIT:
OK, I still had to find a solution for the MSSQL database. My solution was to stack the data and then do a multi-row MSSQL insert:
while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
    // Only start at the startRow, otherwise skip the row.
    if ($i >= $startRow) {
        // Check if to use headers
        if ($lookAtHeaders == 1 && $i == $startRow) {
            $this->createUberschriften( array_map(array($this, "convert"), $dataRow) );
        } else {
            $dataRow = array_map(array($this, "convert"), $dataRow);
            $data = $this->changeMapping($dataRow, $startCol);
            $this->setCurrentRow($i);
            if (count($dataStack) > 210) {
                array_push($dataStack, $data);
                #echo '<pre>', print_r($dataStack), '</pre>';
                $this->executeInsert($dataStack, $tableFields, true);
                // reset the stack
                unset($dataStack);
                $dataStack = array();
            } else {
                array_push($dataStack, $data);
            }
            unset($data);
        }
        $i++;
        unset($dataRow);
    }
}
Finally, I loop over the stack inside the executeInsert() method and build a multi-row insert, to create a query like this:
INSERT INTO [myTable] (field1, field2) VALUES ('data1', 'data2'), ('data3', 'data4'), ...
That works much better. I still have to find the best balance, but for that I only need to change the value '210' in the code above. I hope this helps everybody with a similar problem.
Attention: don't forget to execute executeInsert() once more after reading the complete file, because there may still be data left in the stack, and the method is only executed when the stack reaches a size of 210...
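For illustration only, a minimal sketch of what a batched-insert helper along these lines might look like; the real executeInsert() isn't shown above, so the names and details here are assumptions:
// Hypothetical multi-row INSERT builder (PDO, MSSQL-style bracketed table name)
function insertBatch(PDO $pdo, $table, array $fields, array $rows)
{
    if (empty($rows)) {
        return; // nothing left in the stack
    }
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($fields), '?')) . ')';
    $sql = 'INSERT INTO [' . $table . '] (' . implode(',', $fields) . ') VALUES '
         . implode(',', array_fill(0, count($rows), $rowPlaceholder));

    $params = array();
    foreach ($rows as $row) {
        foreach ($fields as $field) {
            $params[] = $row[$field];
        }
    }
    $pdo->prepare($sql)->execute($params);
}
Such a helper would be called after every ~210 stacked rows, and once more after the loop for the remainder.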
Greetz
V
I think your bottleneck is not reading the file, which is just a text file. Your bottleneck is the INSERT into the SQL table.
Do one thing: just comment out the line that actually does the insert and you will see the difference.
I had this same issue in the past, where I did exactly what you are doing: reading a 5+ million line CSV and inserting it into a MySQL table. The execution time was 60 hours, which is unrealistic.
My solution was to switch to another DB technology. I selected MongoDB, and the execution time was reduced to 5 minutes. MongoDB performs really fast in these scenarios and also has a tool called mongoimport that will let you import a CSV file directly from the command line.
Give it a try if the DB technology is not a limitation on your side.
Another solution would be splitting the huge CSV file into chunks and then running the same PHP script multiple times in parallel, with each instance taking care of the chunks with a specific prefix or suffix in the filename.
I don't know which specific OS you are using, but in Unix/Linux there is a command-line tool called split that will do that for you and will also add any prefix or suffix you want to the filenames of the chunks.
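For example, a single split invocation of that sort might look like this (the chunk size and prefix are arbitrary):
split -l 100000 products.csv products_part_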
I have been struggling to create a simple (really simple) chat system for my website, as my knowledge of JavaScript/AJAX is limited. After gathering resources and help from many kind people, I was able to create my simple chat system, but I'm left with one problem.
The messages are posted to a file called "msg.html" in this format :
<p><span id="name">$name</span><span id="Msg">$message</span></p>
And then, using PHP and AJAX, I retrieve the messages instantly from the file using the file() function and a foreach() {} loop within PHP. Here is the code:
<?php
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
if (count($data) > $max_lines) {
    // here I want the data to be deleted from oldest until I only have 20 messages left.
}
foreach ($data as $line_num => $line) {
    echo $line_num . " . " . $line;
}
?>
My question is: how can I delete the oldest messages so that I am only left with the latest 20 messages?
How does something like this seem to you:
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
foreach ($data as $line_num => $line) {
    if ($line_num < $max_lines) {
        echo $line_num . " . " . $line;
    } else {
        unset($data[$line_num]);
    }
}
file_put_contents('msg.html', $data);
http://www.php.net/manual/en/function.file-put-contents.php for more info :)
I suppose you can read the file, explode it into an array, chop off everything but the last 20 entries, and write it back to the file, overwriting the old one... Perhaps not the best solution, but one that comes to mind if you really can't use a database as Delan suggested.
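A minimal sketch of that idea, assuming one message per line in msg.html:
$lines = file('msg.html', FILE_IGNORE_NEW_LINES);
$lines = array_slice($lines, -20);                       // keep only the newest 20 messages
file_put_contents('msg.html', implode("\n", $lines) . "\n");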
That's called round-robin if I recall correctly.
As far as I know, you can't remove arbitrary portions of a file. You need to overwrite the file with the new contents (or create a new file and remove the old one). You could also store messages in individual files but of course that implies up to $max_lines files to read.
You should also use flock() to avoid data corruption. Depending on the platform it's not 100% reliable but it's better than nothing.
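A sketch of that locking, wrapped around the trim-and-rewrite step:
$fh = fopen('msg.html', 'c+');          // read/write, create if missing
if ($fh && flock($fh, LOCK_EX)) {       // exclusive lock while rewriting
    $lines = explode("\n", rtrim(stream_get_contents($fh), "\n"));
    $lines = array_slice($lines, -20);  // keep the newest 20 messages
    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, implode("\n", $lines) . "\n");
    fflush($fh);
    flock($fh, LOCK_UN);                // release the lock
}
fclose($fh);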
I'm working on a project for a client - a wordpress plugin that creates and maintains a database of organization members. I'll note that this plugin creates a new table within the wordpress database (instead of dealing with the data as custom_post_type meta data). I've made a lot of modifications to much of the plugin, but I'm having an issue with a feature (that I've left unchanged).
One half of this feature does a CSV import and insert, and that works great. The other half of this sequence is a feature to download the contents of this table as a CSV. This part works fine on my local system, but fails when running from the server. I've pored over each portion of this script and everything seems to make sense. I'm, frankly, at a loss as to why it's failing.
The php file that contains the logic is simply linked to. The file:
<?php
// initiate wordpress
include('../../../wp-blog-header.php');
// phpinfo();
function fputcsv4($fh, $arr) {
    $csv = "";
    while (list($key, $val) = each($arr)) {
        $val = str_replace('"', '""', $val);
        $csv .= '"'.$val.'",';
    }
    $csv = substr($csv, 0, -1);
    $csv .= "\n";
    if (!@fwrite($fh, $csv))
        return FALSE;
}
//get member info and column data
$table_name = $wpdb->prefix . "member_db";
$year = date ('Y');
$members = $wpdb->get_results("SELECT * FROM ".$table_name, ARRAY_A);
$columns = $wpdb->get_results("SHOW COLUMNS FROM ".$table_name, ARRAY_A);
// echo 'SQL: '.$sql.', RESULT: '.$result.'<br>';
//output headers
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"members.csv\"");
//open output stream
$output = fopen("php://output",'w');
//output column headings
$data[0] = "ID";
$i = 1;
foreach ($columns as $column) {
    //DIAG: echo '<pre>'; print_r($column); echo '</pre>';
    $field_name = '';
    $words = explode("_", $column['Field']);
    foreach ($words as $word) $field_name .= $word.' ';
    if ( $column['Field'] != 'id' && $column['Field'] != 'date_updated' ) {
        $data[$i] = ucwords($field_name);
        $i++;
    }
}
$data[$i] = "Date Updated";
fputcsv4($output, $data);
//output data
foreach ($members as $member) {
    // echo '<pre>'; print_r($member); echo '</pre>';
    $data[0] = $member['id'];
    $i = 1;
    foreach ($columns as $column) {
        //DIAG: echo '<pre>'; print_r($column); echo '</pre>';
        if ( $column['Field'] != 'id' && $column['Field'] != 'date_updated' ) {
            $data[$i] = $member[$column['Field']];
            $i++;
        }
    }
    $data[$i] = $member['date_updated'];
    //echo '<pre>'; print_r($data); echo '</pre>';
    fputcsv4($output, $data);
}
fclose($output);
?>
So, obviously, a routine wherein a query is run, $output is established with fopen(), each row is then formatted as comma-delimited and written with fwrite(), and finally the file is closed with fclose(), at which point it gets pushed to the local system.
The error that I'm getting (from the server) is
Error 6 (net::ERR_FILE_NOT_FOUND): The file or directory could not be found.
But it clearly is getting found; it's just failing. If I enable phpinfo() (PHP Version 5.2.17) at the top of the file, I definitely get a response - notably 'Cannot modify header information' (I'm pretty sure because phpinfo() has already sent output). All the expected data does get printed to the bottom of the page (after all the phpinfo diagnostics), however, so that much at least is working correctly.
I am guessing there is something preventing the fopen, fwrite, or fclose functions from working properly (a server setting?), but I don't have enough experience with this to identify exactly what the problem is.
I'll note again that this works exactly as expected in my test environment (localhost/XAMPP, netbeans).
Any thoughts would be most appreciated.
update
OK - I spent some more time with this today. I've tried each of the suggested fixes, including @Rudu's writeCSVLine fix and @Fernando Costa's file_put_contents() recommendation. The fact is, they all work locally. Whether I just echo or use the fopen/fwrite/fclose routine doesn't matter; it works great.
What does seem to be a problem is the inclusion of wp-blog-header.php at the start of the file and then the additional header() calls. (The path is definitely correct on the server, by the way.)
If I comment out the include, I get a CSV file downloaded with some errors planted in it (because $wpdb doesn't exist). And if I comment out the headers, I get all my data printed to the page.
So... any ideas what could be going on here?
Some obvious conflict of the wordpress environment and the proper creation of a file.
Learning a lot, but no closer to an answer... Thinking I may need to just avoid the wordpress stuff and do a manual sql query.
OK, so I'm wondering why you've taken this approach. There's nothing wrong with php://output, but all it does is allow you to write to the output buffer the same way as print and echo... if you're having trouble with it, just use print or echo :) Any optimization you could have gained from using fwrite() on the stream is lost by string-building the $csv variable and then writing it to the output stream in one go (not that optimization is particularly necessary here). With all that in mind, my solution (in keeping with your original design) would be this:
function escapeCSVcell($val) {
    return str_replace('"', '""', $val);
    // What about new lines in values? Perhaps not relevant to your
    // data, but they'll mess up your output ;)
}

function writeCSVLine($arr) {
    $first = true;
    foreach ($arr as $v) {
        if (!$first) { echo ","; }
        $first = false;
        echo "\"".escapeCSVcell($v)."\"";
    }
    echo "\n"; // May want to use \r\n depending on consuming script
}
Now use writeCSVLine in place of fputcsv4.
Ran into this same issue. Stumbled upon this thread which does the same thing but hooks into the 'plugins_loaded' action and exports the CSV then. https://wordpress.stackexchange.com/questions/3480/how-can-i-force-a-file-download-in-the-wordpress-backend
Exporting the CSV early eliminates the risk of the headers already being modified before you get to them.
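A rough sketch of that approach - the plugins_loaded hook is real WordPress API, but the query parameter, function name, and body of the callback are assumptions (and a named function is used here since the question mentions PHP 5.2):
function my_plugin_maybe_export_csv() {
    if (!isset($_GET['my_plugin_export_csv'])) {
        return;                                        // not an export request
    }
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="members.csv"');
    $out = fopen('php://output', 'w');
    // ... query the member table and fputcsv()/writeCSVLine() each row here ...
    fclose($out);
    exit;                                              // stop WordPress from rendering anything else
}
add_action('plugins_loaded', 'my_plugin_maybe_export_csv');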