Although my application is built using the Yii framework this is more of a general PHP issue (I think).
I have some code which takes a Yii CActiveDataProvider, loops over it and builds a CSV export. This works great and correctly builds and sends the CSV.
I encounter a problem when trying to export a larger dataset. I have successfully output ~2500 records without any problem, but when I run the exact same code for a larger set of data (~5000 records) the script appears to run OK yet sends a zero-length/blank CSV. I can't figure out why: it seems to run for a while and then sends the CSV, with no errors or warnings in the logs. Could the output be getting flushed (or something similar) before it's ready?
Code is as follows (added a couple of inline comments here for clarity):
<?php
header('Content-type: text/csv');
header('Content-Disposition: attachment; filename="vacancies.csv"');
set_time_limit(240); // I know this is arbitrarily long, it's just to avoid any timeout
$outstream = fopen("php://output", 'w');
$headings = array(
$vacancy->getAttributeLabel('vacancy_id'), // this is a Yii method that returns the active record attribute as a string
...
);
fputcsv($outstream, $headings, ',', '"');
foreach($dp->getData() as $vacancy){ // the getData() method pulls the next active record model out of the Yii dataprovider and the values for various attributes are set below
$row = array(
$vacancy->vacancy_id,
...
);
fputcsv($outstream, $row, ',', '"');
}
fclose($outstream);
?>
Any thoughts on why this is working ok up to a certain number of records?
Update
After re-checking the logs as suggested below I've found I am in fact running out of memory, doh!
I can write out to the filesystem and that gets me up to about 3000 records but then runs out of memory. Any idea of the best way to alter my code to avoid running out of memory?
Thanks very much for the suggestions to check the error logs, I had somehow missed an out of memory error I was getting.
The problem was in fact caused by the way I was using the CActiveDataProvider from the Yii framework. Reading straight from the data provider as I was doing in my question loads each row into memory; as the script ran on, this meant I eventually exhausted the memory available to PHP.
There are a couple of ways to fix this. One is to set pagination on the data provider to a smaller number of records and manually iterate over the data, loading only one page of records into memory per iteration (a rough sketch of this approach follows).
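For anyone curious, here is a rough sketch of that pagination approach (not what I ended up using); the page size of 500 and the single vacancy_id column are just placeholders, and $outstream is the already opened php://output handle:
<?php
$pagination = $dp->getPagination();
$pagination->setPageSize(500); // load 500 records per page instead of everything at once
$pagination->setItemCount($dp->getTotalItemCount());
for ($page = 0; $page < $pagination->getPageCount(); $page++) {
    $pagination->setCurrentPage($page);
    foreach ($dp->getData(true) as $vacancy) { // true forces a fresh query for this page
        fputcsv($outstream, array($vacancy->vacancy_id /* , ... */), ',', '"');
    }
}
?>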
The option I went for is to use a CDataProviderIterator to handle this for me ($iterator = new CDataProviderIterator($dp);), which prevents memory filling up with the records I'm retrieving.
Note that I also had to add an ob_flush(); call to prevent the output buffer from filling up with the CSV contents itself.
For reference I ended up with the following:
<?php
header('Content-type: text/csv');
header('Content-Disposition: attachment; filename="vacancies.csv"');
set_time_limit(240);
$outstream = fopen("php://output", 'w');
$headings = array(
$vacancy->getAttributeLabel('vacancy_id'),
...
);
fputcsv($outstream, $headings, ',', '"');
$iterator = new CDataProviderIterator($dp); // create an iterator instead of just using the dataprovider
foreach($iterator as $vacancy){ // use the new iterator here
$row = array(
$vacancy->vacancy_id,
...
);
fputcsv($outstream, $row, ',', '"');
ob_flush(); // explicitly call a flush to avoid filling the buffer
}
fclose($outstream);
?>
Would not have thought to go back and look at the logs again without the suggestion so many thanks :)
Related
So I have this CI project that converts data from a database into CSV.
It is deployed on a server I access over SSH.
I try to load all the data (over 2,000,000+ rows) and then convert it to CSV.
On my first try I filtered it to rows that have emails (which gives me 66,000+ rows).
It successfully exported the data into CSV (it took a bit of time).
But when I finally try to export all the data, after I click "Convert to CSV" it spends a very long time loading and then the browser gives this error:
This page isn’t working
<server-ip-address> didn’t send any data.
ERR_EMPTY_RESPONSE
Does this have something to do with the server?
I tried changing these settings in /etc/php.ini:
max_execution_time = 259200
max_input_time = 259200
memory_limit = 300M
session.gc_maxlifetime = 1440
But it still gives me the same error.
How can I resolve this? Please help.
UPDATE: I've included my code for the CSV download, here it is:
public function convcsv(){
ini_set('memory_limit', '-1');
set_time_limit(0);
$prefKey = $this->session->flashdata('prefKey');
$searchKey = $this->session->flashdata('searchKey');
$withEmail = $this->session->flashdata('withEmail');
log_message('debug', 'CONVCSV prefKey = ' . $prefKey);
log_message('debug', 'CONVCSV searchKey = ' . $searchKey);
$list = $this->user_model->get_users($prefKey, $searchKey, $withEmail, "", "");
log_message('debug', 'Fetched data');
$headerArray = array("id", "prefecture_id", "industry_id", "offset", "name", "email");
// Header
$header = str_replace(",", "", $headerArray);
$datas = implode(',', $header) . "\r\n";
// Body
foreach($list as $body)
{
// Join the array contents with "," as the separator
$orig_email = $body['email'];
$mstring = preg_replace("/^([^a-zA-Z0-9])*/",',',$orig_email);
preg_match_all("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $mstring, $matches);
$email = implode($matches[0]);
//$email = $matches[0];
$datas .= $body["id"].",".$body["prefecture_id"].",".$body["industry_id"].",".$body["offset"].",".preg_replace('/[,]/',' ',$body["name"]).",".$email."\r\n";
}
// Convert the character encoding
$datas = mb_convert_encoding($datas, "SJIS-win", "UTF-8");
// Start the download
$csvFileName = "phpList_" . date('Ymd_His') . ".csv";
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename=' . $csvFileName);
header('Content-Transfer-Encoding: binary');
while (ob_get_level() > 0)
{
ob_end_clean();
}
ob_start();
print trim($datas);
ob_flush();
ob_end_clean();
exit;
}
OK, I will try to explain this as best I can with what little data you gave. I will assume you can pull the data from the database; if not, you can use unbuffered queries in PDO (I have only used PDO for the last 4-5 years):
http://php.net/manual/en/mysqlinfo.concepts.buffering.php
As a side note, I've pulled 110 million rows from MySQL using unbuffered queries, though that was on a server with 56GB of RAM (Azure Standard_A8, it's pretty l33t).
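To make the unbuffered part concrete, here's a minimal sketch; the DSN, the credentials, and the table/column names (taken from your header array) are placeholders:
<?php
// Disable PDO's client-side result buffering for MySQL so rows stream from
// the server instead of being loaded into PHP memory all at once.
$pdo = new PDO('mysql:host=localhost;dbname=yourdb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->query('SELECT id, prefecture_id, industry_id, `offset`, name, email FROM users');
// $stmt can now be fetched row by row, as in the loop below.
?>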
For outputting the data: typically, when the browser loads a page, the server "builds" the whole response server side and then dumps it to the browser in one go (generally speaking). In your case that is too much data. So,
(Pseudo-ish code)
set_time_limit(-1); // set the time limit for this script only
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename=data.csv');
$f = fopen('php://output', 'w');
while (false !== ($row = $stmt->fetch(PDO::FETCH_ASSOC))) {
    fputcsv($f, $row);
    flush(); // push this row out to the browser before fetching the next one
}
The disadvantage is there is no real way to tell the download's file size beforehand. We are basically sending the download headers, then dumping each line into the output stream and flushing it to the browser one line at a time.
Overall this is slower than doing it in one push because there is more network traffic, but it manages memory better; sometimes it is what it is.
You can see some examples of streaming output on this page:
http://php.net/manual/en/function.flush.php
And you might have to use some stuff like this (first) if it doesn't work,
#apache_setenv('no-gzip', 1);
ini_set('zlib.output_compression', 0);
ini_set('implicit_flush', 1);
The download should start pretty much instantly, but there is no guarantee that the complete file will be output, as an error part way through could prematurely end the script.
You may also have issues with memory_limit = 300M; I'm spoiled and have 2GB as the default and up to 6GB at run time. You can raise it per script with ini_set('memory_limit', '300M'); // set at runtime
Lastly, I feel compelled to say not to set the time limit globally but instead to do it at run time with set_time_limit(-1);. That way it only affects this script and not the server as a whole. However, you may run into issues with timeouts in Apache itself. It's a very tricky thing because there are a lot of moving parts between the server and the browser, many of which depend on the server, the server's OS, the browser, etc. (the environment).
Ideally you would do this through an FTP download, but this is probably the best solution (at least in concept); it's just a matter of sending easily digestible chunks.
I am working on a web application (PHP, JavaScript, HTML) that holds a large amount of user information. It is designed for temporary jobs.
The thing is, I have three tables (MySQL) with information about the users: one for the address and other things, another for driver's licenses, certificates, etc., and the last one for user experience.
What I want to do is have a "print" button that generates a txt file with all that data in one file. I already have an idea of how to do it:
1) Retrieve all the information from the tables in a join on the username.
2) I already have the "resource_to_array" function for building an array with all the data.
3) I could then go through each column of the array and save the information that I want.
But what I am asking for is experience doing something like this. This is the first time for me, and I want to make it good and scalable for the future.
What would be a good way to implement it? Also, how could I create a plain text file with all that information? (This is the part I have the most doubts about.)
I know that maybe this is a weird question, but I do not want code, I just want a vision for the implementation. Also, is there a library or something similar that does what I want to do?
Thank you very much.
First of all, make a link that will generate the text file:
<a href="print.php">Print</a>
Now, in print.php you have to put your logic.
Execute the join query to fetch the data.
Fetch the results into an array with all the data.
Get the elements from the array that you want to save in the text file.
Now, use file handling to create the text file and save the data.
The logic (print.php):
<?php
$conn = new mysqli("host-name", "username", "password", "database-name");
$query = "...."; // your join query
$result = $conn->query($query);
$f = fopen("save.txt", 'a'); // open once; 'a' appends (see the sidenote below)
while($rs = $result->fetch_array(MYSQLI_BOTH))
{
    $data = $rs['column-name']; // data you want to save in the text file
    fwrite($f, $data . PHP_EOL);
}
fclose($f);
$filename = "save-data.txt"; // name of the file offered for download when clicking the link
$file = "save.txt"; // the file that was just written above
$type = "text/plain"; // filetype() only returns "file"/"dir", so send a plain-text MIME type
// Send file headers
header("Content-type: $type");
header("Content-Disposition: attachment;filename=$filename");
header("Content-Transfer-Encoding: binary");
header('Pragma: no-cache');
header('Expires: 0');
// Send the file contents.
set_time_limit(0);
readfile($file);
?>
Sidenote: The a switch appends to file. If you do not wish to keep adding to it, use the w switch:
$f = fopen("save.txt", 'w');
I hope it will help you.
function outputCSV($data) {
$outstream = fopen("php://output", 'w');
function __outputCSV(&$vals, $key, $filehandler){
fputcsv($filehandler, $vals, ',', '"');
}
array_walk($data, '__outputCSV', $outstream);
fclose($outstream);
}
function someFunctionInTheBigPHPFile() {
header("Content-type: text/csv");
header("Content-Disposition: attachment; filename=file.csv");
header("Pragma: no-cache");
header("Expires: 0");
$mydata = array(
array('data11', 'data12', 'data13'),
array('data21', 'data22', 'data23'),
array('data31', 'data32', 'data23'));
outputCSV($mydata);
exit;
}
The output CSV does contain the data array. The problem is that the array is displayed along with the rest of the webpage, i.e. everything output before this function is called and everything that comes after it, despite these two functions being the only ones that deal with any fopen calls or file writing.
How can I stop the rest of the webpage from interfering? I only want the data array in the CSV.
EDIT: I managed to chop off everything after my array by adding exit;, but I still have the problem of the entire website being displayed before the array.
Stop execution after outputting the CSV data. You can do this with die() or exit().
At the beginning of the PHP file, check straight away if you want to print your CSV (this is probably passed through $_POST or $_GET). If so, run it straight through the function and end that function with an exit;.
This prevents anything from happening before or after the CSV is created. For some reason all code on the page is included in the new file, even if the file stream was opened and closed at times independent of when the page's content was computed.
And this effectively leaves you with only what you wanted, not the rubbish before or after it.
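A minimal sketch of that early check, assuming the export is requested via a hypothetical $_POST['export_csv'] flag:
<?php
// Very top of the page, before any HTML is echoed.
if (isset($_POST['export_csv'])) {
    someFunctionInTheBigPHPFile(); // sends the CSV headers and rows
    exit; // nothing after this point is ever sent
}
// ... the rest of the page's HTML/PHP continues as normal ...
?>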
Maybe I misunderstood you, but someFunctionInTheBigPHPFile() prints out the file to the screen. So why are you using this if you don't want screen output?
I'm currently testing out a new export-to-CSV feature for a report generated via a webapp. The relevant code looks like this:
$my_report_data = ReportDAO::runCampaignAnalysis($campaign_id, $start_date, $end_date);
$this->getResponse()->clearHttpHeaders();
$this->getResponse()->setHttpHeader('Content-Type', 'application/vnd.ms-excel');
$this->getResponse()->setHttpHeader('Content-Disposition', 'attachment; filename=export.csv');
$outstream = fopen("php://output", "w");
function __outputCSV(&$vals, $key, $filehandler) {
$retval = fputcsv($filehandler, $vals);
if($retval === FALSE) {
error_log('Uh oh, spaghetti o!');
error_log('The current line being processed is: ' . join('|', $vals));
}
}
array_walk($my_report_data, "__outputCSV", $outstream);
fclose($outstream);
return sfView::HEADER_ONLY;
$my_report_data is merely a multi-dimensional array of the form seen here.
The code works perfectly with small datasets, e.g. 100 rows and below (not quite sure where the cut-off is, unfortunately). With larger datasets, however, rather than being presented with a File Open/Save dialog by my browser when I attempt to export to CSV, the raw report contents get displayed on the web page.
I've examined the HTTP headers with the 'Live HTTP Headers' plugin for Firefox and found that with larger datasets the headers aren't set properly and appear as 'text/html; charset=utf-8' rather than 'application/vnd.ms-excel'. Very bizarre.
Odd: you might want to try flushing the output buffers periodically. I've found I need to do that to prevent similar errors. Something along the lines of:
if($len > 250){ $len = 0; if(ob_get_length()) ob_flush(); }
where $len is a count of lines. It's messy to fit that into your array_walk, but it might help (see the sketch below).
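A rough sketch of how that could be folded into the callback from the question (the 250-row threshold is arbitrary):
function __outputCSV(&$vals, $key, $filehandler) {
    static $len = 0; // rows written since the last flush
    if (fputcsv($filehandler, $vals) === FALSE) {
        error_log('The current line being processed is: ' . join('|', $vals));
    }
    if (++$len >= 250) {
        $len = 0;
        if (ob_get_length()) {
            ob_flush(); // empty PHP's output buffer
        }
        flush(); // push the buffered rows out to the client
    }
}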
I believe I've found a possible solution. I'm not entirely sure why it works, but it does. I merely added the following before the call to clearHttpHeaders:
ob_start('ob_gzhandler');
Worked like a charm.
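For clarity on placement, a minimal sketch reusing the lines from the question:
ob_start('ob_gzhandler'); // start output buffering with the gzip handler
$this->getResponse()->clearHttpHeaders();
$this->getResponse()->setHttpHeader('Content-Type', 'application/vnd.ms-excel');
// ... the rest of the export code is unchanged ...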
Of course, after hours of pondering this problem, the first comment on my question led me to solve it immediately.
The problem was that, although I was including this code within its own function at the top of the page, I was calling it only if a certain flag was set in the $_POST array. I wasn't checking for the flag until the end of the PHP file. I moved that check before the function, and it worked.
The original question is below:
I'm trying to use the fopen() function in PHP to output a CSV file, and although it contains the data I want, it also contains the entire HTML structure of the page, as well as inline stylesheets, before the content that I actually want to output.
I'm using this code (from here) pretty much unchanged. I'm very unfamiliar with PHP streaming and output, so I started from what I hope was a firm foundation:
$fileName = 'somefile.csv';
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header('Content-Description: File Transfer');
header("Content-type: text/csv");
header("Content-Disposition: attachment; filename={$fileName}");
header("Expires: 0");
header("Pragma: public");
$fh = @fopen( 'php://output', 'w' );
global $wpdb;
$query = "SELECT * FROM `{$wpdb->prefix}my_table`";
$results = $wpdb->get_results( $query, ARRAY_A );
$headerDisplayed = false;
foreach ( $results as $data ) {
// Add a header row if it hasn't been added yet
if ( !$headerDisplayed ) {
// Use the keys from $data as the titles
fputcsv($fh, array_keys($data));
$headerDisplayed = true;
}
// Put the data into the stream
fputcsv($fh, $data);
}
// Close the file
fclose($fh);
// Make sure nothing else is sent, our file is done
exit;
My assumption is that this example was intended to be included in its own external PHP file, but due to the constraints I'm dealing with, I'm trying to include it inline instead. I've mucked about with output buffering a bit with no positive results, but the PHP documentation on these is quite sparse, so there's probably something I'm missing.
The problem seems to be that you're trying to output both the CSV file AND some HTML content from the same PHP file at the same time. You've got to separate them and have two different URLs.
I guess your PHP code is surrounded by the HTML code (and inline CSS) you're talking about.
What you've got to do is:
have a PHP script that only outputs the CSV content (which only contains the code you showed us, with the opening php tag of course)
have another PHP script which produces html code, and provides a link to the previous script (for example).
You are on the right track with the 'include it inline' reasoning as to why you're getting everything else before the data.
This script will need to be its own separate file called directly, instead of being included inline in another script. I understand you have other database connections and such that have to be set up first. You'll have to extract those out of your standard pages and include them on this page.
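For example, since the question's code uses $wpdb this looks like WordPress, so one blunt way to bootstrap a standalone export script (hypothetical name export.php, assumed to sit in the WordPress root) is:
<?php
// export.php: bootstrap WordPress so $wpdb and the table prefix are available,
// then run only the CSV code from the question.
require_once dirname(__FILE__) . '/wp-load.php';
// ... the CSV-output code from the question goes here, ending with exit; ...
?>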