I'm trying to export a lot of data through a CSV export. The amount of data is really big: around 100,000 records and counting.
My client usually uses two tabs to browse and check several things at the same time, so a requirement is that he can continue browsing the system while the export is being made.
The issue is that while the CSV is being generated on the server, the session is blocked: you cannot load another page until the generation is complete.
This is what I'm doing:
Open the file
Loop through the data (one query per cycle, each cycle fetches 5000 records). P.S.: I cannot change this because of certain limitations.
write the data into the file
free memory
close the file
set headers to begin download
During the entire process, it's not possible to navigate the site in another tab.
The block of code:
$temp = 1;
$first = true;
$fileName = 'csv_data_' . date("Y-m-d") . '-' . time() . '.csv';
$filePath = CSV_EXPORT_PATH . $fileName;
// create CSV file
$fp = fopen($filePath, 'a');
// get data
for ($i = 1; $i <= $temp; $i++) {
// get lines
$data = $oPB->getData(ROWS_PER_CYCLE, $i); // ROWS_PER_CYCLE = 5000
// if something is empty, exit
if (empty($data)) {
break;
}
// write the data that will be exported into a file
fwrite($fp, $export->arrayToCsv($data, '', '', $first));
// count element
$temp = ceil($data[0]->foundRows / ROWS_PER_CYCLE); // foundRows is always the same value, doesn't change per query.
$first = false; // hide header for next rows
// free memory (the batch is stored in $data; $lines was never defined)
unset($data);
}
// close file
fclose($fp);
/**
* Begin Download
*/
$export->csvDownload($filePath); // set headers
Some considerations:
The count is made in the same query, and it does not cause an infinite loop; it works as expected. The total is contained in $data[0]->foundRows, which avoids an unnecessary query to count all the available records.
There are several memory limitations due to environment settings that I cannot change.
Does anyone know how I can improve this, or have any other solution?
Thanks for reading.
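I'm answering my own question because it may be helpful to someone else. A colleague came up with a solution for this problem.
Call session_write_close() before the line
$temp = 1;
By default, PHP's file-based session handler keeps the session file locked until the script finishes, so every other request that uses the same session (for example, another tab) has to wait. Calling session_write_close() ends the current session and stores the session data, which releases that lock, so I'm able to download the file and continue navigating in other tabs.
I hope it helps someone.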
Some considerations about this solution:
You must not need to modify session data after calling session_write_close(): $_SESSION can still be read, but changes made after that call are not saved.
The export script is in a separate file. For example, home.php links to export.php.
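A minimal sketch of the same idea, assuming the export runs in its own request (export.php). `exportCsv()` and the `$fetchBatch` callable are hypothetical names standing in for `$oPB->getData()` and `$export->arrayToCsv()`, and each row is assumed to be a plain array. The session is closed before the slow loop, so other tabs sharing the session cookie are no longer blocked.

```php
<?php
// Hypothetical helper: writes a CSV in batches after releasing the session lock.
// $fetchBatch(int $rowsPerCycle, int $page) stands in for $oPB->getData() and
// must return an array of rows (each row an array), or an empty array when done.
function exportCsv(string $filePath, callable $fetchBatch, array $header, int $rowsPerCycle = 5000): int
{
    // Release the session file lock; $_SESSION stays readable, but later changes are not saved.
    if (session_status() === PHP_SESSION_ACTIVE) {
        session_write_close();
    }

    $fp = fopen($filePath, 'w');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $filePath for writing");
    }

    fputcsv($fp, $header, ',', '"', '\\');
    $written = 0;
    for ($page = 1; ; $page++) {
        $rows = $fetchBatch($rowsPerCycle, $page);
        if (empty($rows)) {
            break; // no more data
        }
        foreach ($rows as $row) {
            fputcsv($fp, $row, ',', '"', '\\');
            $written++;
        }
        unset($rows); // free the batch before fetching the next one
    }

    fclose($fp);
    return $written;
}
```

Unlike the original loop, this version stops at the first empty batch instead of computing the page count from foundRows, so it relies on the data source eventually returning an empty page.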
Related
I have a small web-page that delivers different content to a user based on a %3 (modulo 3) of a counter. The counter is read in from a file with php, at which point the counter is incremented and written back into the file over the old value.
I am trying to get an even distribution of the content in question which is why I have this in place.
I am worried that if two users access the page at a similar time then they could either both be served the same data or that one might fail to increment the counter since the other is currently writing to the file.
I am fairly new to web development, so I am not sure how to approach this without mutexes. I considered having only one open and close and doing all of the operations inside of it, but I am trying to minimize the time during which a user could fail to access the file (hence the read and write being in separate opens).
What would be the best way to implement a sort of mutual exclusion, so that only one person accesses the file at a time and multiple overlapping requests for the file are queued? The primary goal is to preserve the ratio of the content that is being shown to users, which involves keeping the counter consistent.
The code is as follows:
<?php
session_start();
$counterName = "<path/to/file>";
$file = fopen($counterName, "r");
$currCount = fread($file, filesize($counterName));
fclose($file);
$newCount = $currCount + 1;
$file = fopen($counterName,'w');
if(fwrite($file, $newCount) === FALSE){
echo "Could not write to the file";
}
fclose($file);
?>
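Just in case anyone finds themselves with the same issue, I was able to fix the problem by adding
flock($fp, LOCK_EX | LOCK_NB) before writing to the file, as per the documentation for PHP's flock() function. I have read that it is not the most reliable, but for what I am doing it was enough. Note that on most systems flock() locks are advisory (they only coordinate code that also calls flock()), and with LOCK_NB the call returns false immediately instead of waiting when another process holds the lock, so its return value needs to be checked.
Documentation here for convenience: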
https://www.php.net/manual/en/function.flock.php
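For the original goal (no two requests reading the same value), the lock has to cover the read and the write together, which means using a single handle instead of two separate fopen() calls. A minimal sketch, assuming the counter file holds a single integer; `incrementCounter()` is a hypothetical name. It uses a blocking LOCK_EX (no LOCK_NB), so overlapping requests wait their turn instead of failing.

```php
<?php
// Hypothetical helper: atomically increment an integer stored in a text file.
// One handle is used for both read and write, and the exclusive lock is held
// for the whole read-modify-write, so concurrent requests queue up on flock().
function incrementCounter(string $path): int
{
    $fp = fopen($path, 'c+'); // read/write, create if missing, do not truncate
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($fp, LOCK_EX)) { // blocks until the lock is granted
            throw new RuntimeException("Cannot lock $path");
        }
        $next = (int) stream_get_contents($fp) + 1; // an empty file counts as 0
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, (string) $next);
        fflush($fp); // make sure the new value is written before unlocking
        flock($fp, LOCK_UN);
        return $next;
    } finally {
        fclose($fp);
    }
}
```

The page would then pick its content with something like `$variant = incrementCounter($counterName) % 3;`.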
I have a web interface that I built into the admin section of a WordPress site. It scrapes a few tables in my database and just displays a big list of data row by row. There are about 30,000 rows of this data, displayed with a basic echo in a for loop. Displaying all 30,000 rows on a page works fine.
Additionally, I include an option to download a CSV file of the complete rows of data. I use fopen() and then fputcsv() to build the CSV file for download from the result of the data query. This feature used to work, but now that the dataset is at 30,000 rows, the CSV no longer generates correctly: only the first 200 to 1,000 rows are written to the file, leaving out the majority of the data. I estimate that the complete CSV would be about 10 MB. The truncated file then downloads as though everything had worked correctly.
Here is the code:
// This gets a huge list of data from a SP I built. This data is well formed
$data = $this->run_stats_stored_procedure($job_to_report);
// This is where the data is converted into a csv file. This part is broken
// the file may already exist at that location; burn it down if it does
// (one path variable, so the file that is deleted is the same file that is written;
// fopen() with 'w' also truncates an existing file)
$csv_path = ABSPATH . "some/path/to/my/file/candidate_export.csv";
if(file_exists($csv_path)) {
unlink($csv_path);
}
$csv_file_handler = fopen($csv_path, 'w');
if($csv_file_handler !== false) {
$title_array = array(
"ID",
"other_field"
);
fputcsv($csv_file_handler, $title_array, ",");
if(!empty($data)) {
foreach($data as $data_piece) {
$array_as_csv_line = array();
foreach($data_piece as $object_property) {
$array_as_csv_line[] = (string)$object_property;
}
fputcsv($csv_file_handler, $array_as_csv_line, ",");
unset($array_as_csv_line);
}
} else {
fputcsv($csv_file_handler, array("empty"), ",");
}
// pros clean everything up when they are done
fclose($csv_file_handler);
}
I'm not sure what I need to change to get the entire CSV file to download. I believe this could be a configuration issue, but I'm not sure. I suspect this because the function used to work with 20,000 CSV rows; it is now at 30,000 and breaking. Please let me know if additional info would help. Has anyone run into issues with huge CSV files before? Thank you to anyone who can help.
Is the "download" taking more than, say, one, two, or three minutes? If so, the web server could be closing the connection. For example, if you're using the Apache FastCGI module (mod_fcgid), it has this directive:
FcgidBusyTimeout
which defaults to 300 seconds.
This is the maximum time limit for request handling. If a FastCGI request does not complete within FcgidBusyTimeout seconds, it will be subject to termination.
Hope this helps you solve your problem.
The answer that I am currently implementing is to allow the script to use more time. To do this, I am simply running the following code before the script runs:
set_time_limit(3600);
I am doing further research because this is not a sustainable solution. Any further advice would be greatly appreciated.
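One way to avoid both the temporary file and the time spent building it before the download starts is to stream rows straight to the client as they are produced. A minimal sketch: `streamCsv()` is a hypothetical helper that writes to any open stream. In the WordPress handler the stream would presumably be `fopen('php://output', 'w')`, opened after sending `Content-Type: text/csv` and `Content-Disposition: attachment` headers. Web-server limits such as FcgidBusyTimeout still apply.

```php
<?php
// Hypothetical helper: write a header row and then each data row to an open stream.
// $rows can be an array or a generator, so all rows never have to be in memory at once.
function streamCsv($out, array $header, iterable $rows): int
{
    fputcsv($out, $header, ',', '"', '\\');
    $count = 0;
    foreach ($rows as $row) {
        // Rows may be objects (as returned by the stored procedure) or arrays;
        // every value is cast to string, as in the original loop.
        fputcsv($out, array_map('strval', (array) $row), ',', '"', '\\');
        $count++;
    }
    fflush($out);
    return $count;
}
```

Taking the stream as a parameter keeps the function testable against php://memory, while the real request only decides which stream to pass in.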
What's the cleanest way in PHP to open a file, read the contents, and subsequently overwrite the file's contents with some output based on the original contents? Specifically, I'm trying to open a file populated with a list of items (separated by newlines), process/add items to the list, remove the oldest N entries from the list, and finally write the list back into the file.
fopen(<path>, 'a+')
flock(<handle>, LOCK_EX)
fread(<handle>, filesize(<path>))
// process contents and remove old entries
fwrite(<handle>, <contents>)
flock(<handle>, LOCK_UN)
fclose(<handle>)
Note that I need to lock the file with flock() in order to protect it across multiple page requests. Will the 'w+' flag when fopen()ing do the trick? The PHP manual states that it will truncate the file to zero length, so it seems that may prevent me from reading the file's current contents.
If the file isn't overly large (that is, you can be confident loading it won't blow PHP's memory limit), then the easiest way to go is to just read the entire file into a string (file_get_contents()), process the string, and write the result back to the file (file_put_contents()). This approach has two problems:
If the file is too large (say, tens or hundreds of megabytes), or the processing is memory-hungry, you're going to run out of memory (even more so when you have multiple instances of the thing running).
The operation is destructive: if saving fails halfway through, you lose all your original data.
If either of these is a concern, plan B is to process the file while writing the output to a temporary file; after successful completion, close both files, rename (or delete) the original file and then rename the temporary file to the original filename.
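Plan B can be sketched as follows. `rewriteViaTempFile()` and its per-line `$transform` callback are hypothetical names. The temporary file is created in the same directory so that rename() does not have to cross filesystems, and on POSIX systems rename() replaces the original in a single step, so there is no separate delete.

```php
<?php
// Hypothetical helper: stream the original file into a temporary file line by line,
// then rename the temporary file over the original.
// $transform receives each line without its newline; returning null drops the line.
function rewriteViaTempFile(string $path, callable $transform): void
{
    $in = fopen($path, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Same directory as the original, so rename() stays on one filesystem.
    // tempnam() creates the file with mode 0600; chmod() it if others need access.
    $tmpPath = tempnam(dirname($path), 'rw_');
    $out = $tmpPath === false ? false : fopen($tmpPath, 'w');
    if ($out === false) {
        fclose($in);
        if ($tmpPath !== false) {
            unlink($tmpPath);
        }
        throw new RuntimeException("Cannot create a temporary file next to $path");
    }

    $ok = false;
    try {
        while (($line = fgets($in)) !== false) {
            $result = $transform(rtrim($line, "\r\n"));
            if ($result !== null) {
                fwrite($out, $result . "\n");
            }
        }
        $ok = true;
    } finally {
        fclose($in);
        fclose($out);
        if (!$ok) {
            unlink($tmpPath); // the original file is left untouched on failure
        }
    }

    if (!rename($tmpPath, $path)) {
        unlink($tmpPath);
        throw new RuntimeException("Cannot replace $path");
    }
}
```

Only one line is held in memory at a time, which addresses the first problem, and a failure before the rename leaves the original intact, which addresses the second.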
Read
$data = file_get_contents($filename);
Write
file_put_contents($filename, $data);
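Those two calls are fine when only one request touches the file at a time. For the question as asked (read, process, write back, protected by flock()), 'w+' truncates on open, and in 'a+' mode every write is appended to the end. Mode 'c+' opens for reading and writing, creates the file if it is missing, and does not truncate, so the contents can be read under the lock and then replaced. A minimal sketch; `updateList()` and its parameters are hypothetical, and the newest items are assumed to be at the end of the file.

```php
<?php
// Hypothetical helper: append $newItems to a newline-separated list and drop the
// $dropOldest oldest entries, holding an exclusive lock for the whole read-modify-write.
function updateList(string $path, array $newItems, int $dropOldest): array
{
    $fp = fopen($path, 'c+'); // read/write, create if missing, no truncation on open
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($fp, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }
        $contents = stream_get_contents($fp);
        $items = $contents === '' ? [] : explode("\n", rtrim($contents, "\n"));

        $items = array_merge($items, $newItems);   // newest entries at the end
        $items = array_slice($items, $dropOldest); // remove the oldest N

        // Replace the old contents while still holding the lock.
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, $items === [] ? '' : implode("\n", $items) . "\n");
        fflush($fp);
        flock($fp, LOCK_UN);
        return $items;
    } finally {
        fclose($fp);
    }
}
```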
One solution is to use a separate lock file to control access.
This solution assumes that only your script, or scripts you have access to, will want to write to the file. This is because the scripts will need to know to check a separate file for access.
$file_lock = obtain_file_lock();
if ($file_lock) {
$old_information = file_get_contents('/path/to/main/file');
$new_information = update_information_somehow($old_information);
file_put_contents('/path/to/main/file', $new_information);
release_file_lock($file_lock);
}
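function obtain_file_lock() {
$attempts = 10;
// There are probably better ways of dealing with waiting for a file
// lock but this shows the principle of dealing with the original
// question.
for ($ii = 0; $ii < $attempts; $ii++) {
// 'c' creates the lock file if it does not exist yet ('r' would fail on a missing file)
$lock_file = fopen('/path/to/lock/file', 'c');
// LOCK_NB makes flock() return false immediately instead of blocking,
// which is what makes this retry loop meaningful
if ($lock_file !== false && flock($lock_file, LOCK_EX | LOCK_NB)) {
return $lock_file;
}
if ($lock_file !== false) {
fclose($lock_file); // don't leak a handle on each failed attempt
}
//give time for other process to release lock
usleep(100000); //0.1 seconds
}
//This is only reached if all attempts fail.
//Error code here for dealing with that eventuality.
return false;
}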
function release_file_lock($lock_file) {
flock($lock_file, LOCK_UN);
fclose($lock_file);
}
This should prevent a concurrently-running script reading old information and updating that, causing you to lose information that another script has updated after you read the file. It will allow only one instance of the script to read the file and then overwrite it with updated information.
While this hopefully answers the original question, it doesn't give a good solution to making sure all concurrent scripts have the ability to record their information eventually.
OK, so I have a .txt file with a bunch of URLs. I have a script that picks one of the lines at random, and I include it in another page.
However, I want the URL to change every 15 minutes. I'm guessing I'm going to need a cron job, but I'm not sure how to put it all in place.
I found that if you include the file, it still gives a random result on every page load, so I'm guessing that combining the cron job with the include is going to get messy.
So what I'm thinking is I have a script that randomly selects a URL from my initial text file, then saves it to another .txt file, and I include that file on the final page.
I just found this, which is sort of in the right direction:
Include php code within echo from a random text
I'm not the best at writing PHP (though I can understand it perfectly), so all help is appreciated!
"So what I'm thinking is I have a script that randomly selects a url from my initial text file then it saves it to another .txt file and I include that file on the final page."
That's pretty much what I would do.
To regenerate that file, though, you don't necessarily need a cron job.
You could use the following idea:
if the file was modified less than 15 minutes ago (which you can find out by comparing filemtime() with time())
then use what's in the file
else
regenerate the file, randomly choosing one URL from the big file
and use the newly generated file
This way, there's no need for a cron job: the first user who arrives more than 15 minutes after the file's last modification will regenerate it with a new URL.
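The filemtime()/time() idea can be sketched like this, assuming 900 seconds (15 minutes) as the refresh interval; `currentUrl()` and the file names are hypothetical. If two visitors hit an expired cache at the same moment, both pick a URL and the last write wins, which is usually harmless here.

```php
<?php
// Hypothetical helper: return the URL stored in $cacheFile, picking a new one at
// random from $listFile when the cache is missing or older than $maxAge seconds.
function currentUrl(string $listFile, string $cacheFile, int $maxAge = 900): string
{
    clearstatcache(true, $cacheFile); // make sure filemtime() sees the current value
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return trim(file_get_contents($cacheFile));
    }

    $lines = file($listFile, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $listFile");
    }
    // Trim each line and drop blank ones.
    $urls = array_values(array_filter(array_map('trim', $lines), fn(string $l): bool => $l !== ''));
    if ($urls === []) {
        throw new RuntimeException("No URLs in $listFile");
    }

    $url = $urls[array_rand($urls)];
    file_put_contents($cacheFile, $url, LOCK_EX); // also updates the file's mtime
    return $url;
}
```

The final page would then call something like `echo currentUrl('urls.txt', 'output.txt');` instead of including output.txt directly.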
Alright so I sorta solved my own question:
<?php
// load the file that contains the codes
$adfile = "urls.txt";
$ads = array();
// one line per code
$fh = fopen($adfile, "r");
while(!feof($fh)) {
$line = fgets($fh, 10240);
$line = trim($line);
if($line != "") {
$ads[] = $line;
}
}
// randomly pick a code
$num = count($ads);
$idx = rand(0, $num-1);
$f = fopen("output.txt", "w");
fwrite($f, $ads[$idx]);
fclose($f);
?>
However, is there any way I can delete the chosen line once it has been picked?
This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize this for speed and efficiency.
The AJAX POST request is made to this script:
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script, which reads a text file:
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
$lines = file('text/'.$fileName.'.txt');
echo $lines[sizeof($lines)-1];
}
else{
echo 0;
}
?>
I would appreciate any help. I think there is more improvement to be made in the first script. It makes an expensive function call (file_get_contents()), or at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
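One way to do that, sketched under the assumption that valid names contain only letters, digits, underscores and hyphens and that all files live in one base directory; `safeTextPath()` is a hypothetical helper. The regular expression rejects `../` outright, and the realpath() comparison additionally catches symlinks that point outside the directory.

```php
<?php
// Hypothetical helper: map a user-supplied name to "<baseDir>/<name>.txt",
// returning null if the name is not allowed or the file is outside $baseDir.
function safeTextPath(string $baseDir, string $name): ?string
{
    // Allow-list: plain names only (\z also rejects a trailing newline),
    // so "../", slashes and NUL bytes never get through.
    if (!preg_match('/^[A-Za-z0-9_-]+\z/', $name)) {
        return null;
    }
    $base = realpath($baseDir);
    if ($base === false) {
        return null;
    }
    // realpath() resolves symlinks and returns false if the file does not exist.
    $path = realpath($base . DIRECTORY_SEPARATOR . $name . '.txt');
    if ($path === false || strpos($path, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $path;
}
```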
Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or at every change), and save that to a file that your other script can access directly.
readfile() is your friend here:
it reads a file from disk and streams it to the client.
script 1:
<?php
session_start();
// added basic argument filtering
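$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile'] ?? '');
// read the file locally: this assumes the script runs on the server that holds text/,
// since file_exists() does not work on http:// URLs such as $_SESSION['serverURL']
$fileName = 'text/'.$fileName.'.txt';
if (file_exists($fileName)) {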
// script 2 could be pasted here
//for the entire file
//readfile($fileName);
//for just the last line
$lines = file($fileName);
echo $lines[count($lines)-1];
exit(0);
}
echo 0;
?>
This script could be improved further by adding caching, but that is more complicated.
Very basic caching could look like this:
script 2:
<?php
// $fileName is assumed to be set as in script 1; these headers must be sent before any output
$lastModifiedTimeStamp = filemtime($fileName);
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
$browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
header("HTTP/1.0 304 Not Modified");
exit(0);
}
}
header('Content-Length: '.filesize($fileName));
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp)); // gmdate(), since the value is labelled GMT
?>
First things first: do you really need to optimize that? Is it the slowest part of your use case? Have you used Xdebug to verify that? If you have, read on:
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script would operate on).
As for the second script: reading the whole file into memory does look like some overhead, but it is negligible if the files are small. The code looks very readable; I would leave it as is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread():
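# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
function lastLine($fileName) {
$size = filesize($fileName);
if (!$size) {
return ''; # empty or unreadable file
}
$filePointer = fopen($fileName, 'r');
$chunkSize = 200;
$readSize = 0;
# Read ever larger chunks (200, 400, ... bytes) from the end of the file until
# the chunk contains a newline before the last line, or covers the whole file
do {
$readSize = min($size, $readSize + $chunkSize);
fseek($filePointer, -$readSize, SEEK_END);
$chunk = rtrim(fread($filePointer, $readSize), "\r\n"); # ignore trailing newlines
$pos = strrpos($chunk, "\n");
} while ($pos === false && $readSize < $size);
fclose($filePointer);
return $pos === false ? $chunk : substr($chunk, $pos + 1);
}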
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read.
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server's filesystem directly is out of the question. If it isn't, it's the best and simplest option.
Otherwise, you could use caching. Instead of getting the whole file, you just issue a HEAD request and compare the timestamp to a local copy.
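A sketch of that check, assuming the file server sends a Last-Modified header, the local copy's modification time is set when it is saved, and the two servers' clocks are roughly in sync; `remoteHeaders()` and `isStale()` are hypothetical helpers. get_headers() sends a GET request by default, so a stream context is used to turn it into a HEAD request.

```php
<?php
// Hypothetical helper: fetch only the response headers of $url via a HEAD request.
function remoteHeaders(string $url): array
{
    $ctx = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = get_headers($url, true, $ctx);
    return $headers === false ? [] : $headers;
}

// Hypothetical helper: true if the local copy is missing or older than the remote file.
function isStale(array $headers, string $localCopy): bool
{
    // Header names keep the server's casing, so compare case-insensitively.
    $headers = array_change_key_case($headers, CASE_LOWER);
    clearstatcache(true, $localCopy);
    if (!isset($headers['last-modified']) || !is_file($localCopy)) {
        return true; // no information, or nothing cached yet: refetch
    }
    $value = $headers['last-modified'];
    if (is_array($value)) {
        $value = end($value); // after redirects there can be several values
    }
    $remote = strtotime($value);
    return $remote === false || $remote > filemtime($localCopy);
}
```

The script would then download the whole file only when `isStale(remoteHeaders($url), $localCopy)` returns true, and otherwise serve the last line from the local copy.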
Also, if you are AJAX-updating a lot of clients based on the same files, you might consider using Comet (Meteor, for example). It's used for things like chats, where a single change has to be broadcast to several clients.