PHP large file download timeout

First time posting, so sorry if I get anything wrong.
I'm trying to create a secure file download storefront. It works, but only with small files. I have a 1.9 GB product to download and it keeps stopping partway through the transfer. The amount transferred is inconsistent too: I've had up to 1 GB, but often it is 200-500 MB.
The aim is to create a space where only users with a registered account can download the file, so a direct link is not possible.
I've read elsewhere on this site that resetting the script timeout within the file read loop should get around the script time limit.
try {
    $num_bytes = filesize("products/" . $filename);
    $mp3content = fopen("products/" . $filename, "rb") or die("Couldn't get handle");
    $bytes_read = 0;
    if ($mp3content) {
        while (!feof($mp3content)) {
            set_time_limit(30);                 // reset the script timeout on every chunk
            $buffer = fread($mp3content, 4096);
            echo $buffer;
            $bytes_read += 4096;
        }
        fclose($mp3content);                    // close the handle that was actually opened
    }
}
catch (Exception $e) {
    error_log("User failed to download file: " . $row['FILENAME'] . " (" . $row['MIMETYPE'] . ")\n" . $e, 1, getErrorEmail());
}
error_log("Bytes downloaded: " . $bytes_read . " of " . $num_bytes, 1, getErrorEmail());
I don't receive the final error log email on large files that fail, but I do get the emails on smaller files that succeed, so I know the code works in principle.

Turns out my hosting is the issue. The PHP code is correct, but my shared hosting environment limits all PHP scripts to 30 seconds, while the code above takes about 15 minutes to run its course. Unless someone can come up with a way of keeping PHP tied up in file-handling calls that don't count towards the timer, it looks like this one is stuck.

Try this one:
set_time_limit(0);
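For completeness, here is a minimal sketch of how that could look in the context of the download loop from the question, assuming the host actually lets set_time_limit() override the limit (the asker's shared host apparently does not) and that $filename is the variable from the question:
set_time_limit(0);                       // assumes the host honours this; many shared hosts do not
$path = "products/" . $filename;         // $filename as in the question
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($path) . '"');
header('Content-Length: ' . filesize($path));
// Disable output buffering so each chunk goes straight to the client.
while (ob_get_level() > 0) {
    ob_end_clean();
}
$handle = fopen($path, 'rb');
while (!feof($handle)) {
    echo fread($handle, 8192);
    flush();                             // push the chunk out instead of holding it in memory
}
fclose($handle);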

I had the same problem, so I thought of a different approach.
When a file is requested, I make a hard link of the file inside a randomly named directory under the "download" folder and give the user that link for 4 hours.
The file URL ends up looking like this:
http://example.com/downloads/3nd83js92kj29dmcb39dj39/myfile.zip
Every call to the script scans the "download" folder and deletes any directories (and their contents) that were created more than 4 hours ago, to keep things clean.
This is not safe against brute-force attacks, but that can be worked around.
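A rough sketch of that approach (the downloads/ and products/ paths, the PHP 7 random_bytes() token and the 4-hour window are just the assumptions described above, not tested code):
$downloadRoot = __DIR__ . '/downloads';
$token        = bin2hex(random_bytes(16));              // random directory name (PHP 7+)
$tokenDir     = $downloadRoot . '/' . $token;

// Clean up download directories older than 4 hours.
foreach (glob($downloadRoot . '/*', GLOB_ONLYDIR) as $dir) {
    if (filemtime($dir) < time() - 4 * 3600) {
        array_map('unlink', glob($dir . '/*'));
        rmdir($dir);
    }
}

// Create the random directory and hard-link the product into it.
mkdir($tokenDir, 0755);
link(__DIR__ . '/products/' . $filename, $tokenDir . '/' . $filename);

// Give this URL to the logged-in user; the web server then serves the file itself.
$url = 'http://example.com/downloads/' . $token . '/' . rawurlencode($filename);
Because the web server, not PHP, streams the file, the script time limit never comes into play.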

Related

Weird PHP error: exec() hangs sometimes on simple script

Heyas,
So this simple exec() script, which generates a PDF file from a webpage (using wkhtmltopdf), runs fine the first two times.
It first deletes the existing file, then creates the new PDF file in its place. If I run the script a second time, it deletes the file again and then creates a new one, as expected. However, if I run it one more time, it deletes the file and creates a new one, but then the script seems to hang until the 30-second 504 timeout error is returned. When it works, the script only takes about 3 seconds to run. The hang also kills the entire server (no other local PHP sites respond any more), and even after restarting the PHP server everything still hangs. Interestingly, if I run the script once and then restart the PHP server, I can keep doing that without issue (but only generating the PDF up to two times). No PHP errors are logged.
Why would it be stalling out subsequent times?
$filePath = 'C:\\wtserver\\tmp\\order_' . $orderId . '.pdf';

// delete an existing file
if (file_exists($filePath)) {
    if (!unlink($filePath)) {
        echo 'Error deleting existing file: ' . $filePath;
        return;
    }
}

// generates PDF file at C:\wtserver\tmp\order_ID.pdf
exec('wkhtmltopdf http://google.com ' . $filePath);
I've tried a simple loop to check for the script's completion (successful output) and then exit, but it still hangs:
while (true) {
    if (file_exists($filePath)) {
        echo 'exit';
        exit();          // have also tried die()
        break;           // never reached after exit()
    }
    // todo: add time check/don't hang
}
If I can't figure this bit out, for now, is there a way to kill the exec script, wrapping it somehow? The PDF is still generated, so the script is working, but I need to kill it and return a response to the user.
Solution:
You have to redirect standard output AND standard error to end the process immediately, i.e. on Windows:
exec('wkhtmltopdf http://google.com ' . $filePath . ' > NUL 2> NUL');
Did you know that you can run the executable in the background, like this:
exec($cmd . " > /dev/null &");
That way the exec() call returns immediately.
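Putting the two suggestions together, a hedged sketch (the command is the one from the question; the OS check and the use of escapeshellarg() are my assumptions):
$cmd = 'wkhtmltopdf http://google.com ' . escapeshellarg($filePath);

if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
    // Windows: redirecting stdout and stderr stops exec() from hanging on the output pipes.
    exec($cmd . ' > NUL 2> NUL');
} else {
    // Unix-like: redirect output and background the process so the call returns immediately.
    exec($cmd . ' > /dev/null 2>&1 &');
}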

PHP, check if the file is being written to/updated by PHP script?

I have a script that rewrites a file every few hours. This file is inserted into end users' HTML via a PHP include.
How can I check whether my script is, at this exact moment, working on (i.e. rewriting) the file when it is requested for display? Is it even an issue? What happens if a user accesses the file at the same time, what are the odds of that, and will the user just have to wait until the script has finished its work?
Thanks in advance!
More on the subject...
Is this a way forward, using file_put_contents() and LOCK_EX?
When the script saves its data every now and then:
file_put_contents("text", $content, LOCK_EX);   // filename first, then the data
and when the user opens the page:
if (file_exists("text")) {
    function include_file() {
        $file = fopen("text", "r");
        if (flock($file, LOCK_EX)) {
            include_file();
        } else {
            echo file_get_contents("text");
        }
    }
} else {
    echo 'no such file';
}
Could anyone advise me on the syntax? Is this a proper way to call include_file() after the condition, and how can I limit the number of such calls?
I guess this solution is also good, apart from the same recursive call to include_file(); would it even work?
function include_file() {
    $time = time();
    $file = filectime("text");
    if ($file + 1 < $time) {
        echo "good to read";
    } else {
        echo "have to wait";
        include_file();
    }
}
To check whether the file is currently being written, you can use the filectime() function to get the time the file was last changed.
You can store the current timestamp in a variable at the top of your script, and whenever you need to access the file, compare that timestamp with the filectime() of the file. If the file's change time is newer, you are in the scenario where you have to wait for the file to finish being written, and you can log that to a database or to another file.
To prevent this scenario from happening, you can change the script that writes the file so that it first creates a temporary file and, once it's done, replaces (moves or renames) the temporary file over the original. That action takes far less time than writing the whole file, which makes the problematic overlap a very rare possibility.
Even if a read and a replace do occur simultaneously, the time the reading script has to wait will be very short.
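A minimal sketch of that write-then-rename idea (the function name and the "text" file name are placeholders):
// Write to a temporary file in the same directory, then swap it into place.
function replace_file_atomically(string $target, string $content): bool
{
    $tmp = $target . '.tmp.' . getmypid();          // same filesystem as the target
    if (file_put_contents($tmp, $content) === false) {
        return false;
    }
    // rename() replaces the target in one step, so readers always see either
    // the complete old file or the complete new file, never a half-written one.
    return rename($tmp, $target);
}

// Writer side:
replace_file_atomically('text', $newContent);       // $newContent: whatever your script produces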
Depending on the size of the file, this might be an issue of concurrency, but you can solve it quite easily: before starting to write the file, create a kind of "lock file", i.e. if your file is named "incfile.php", create an "incfile.php.lock". Once you're done writing, remove this file.
On the include side, check for the existence of "incfile.php.lock" and wait until it has disappeared; that needs some looping and sleeping in the unlikely case of a concurrent access.
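A sketch of that lock-file idea (the file names, retry count and 100 ms sleep are assumptions):
// Writer: create the lock file, rewrite the include, then remove the lock.
touch('incfile.php.lock');
file_put_contents('incfile.php', $newContent);      // $newContent is a placeholder
unlink('incfile.php.lock');

// Reader: wait briefly while the lock file exists, then include.
$tries = 0;
while (file_exists('incfile.php.lock') && $tries++ < 50) {
    usleep(100000);                                 // 100 ms per attempt, 5 s at most
}
include 'incfile.php';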
Basically, though, you should consider another solution: write the data that is rendered into that file to a database (where locks etc. are available) and render it in a module which then gets included in your page. Solutions like yours are hard to maintain in the long run ...
This question is old, but I add this answer because the other answers have no code.
function write_to_file(string $fp, string $string) : bool {
    $timestamp_before_fwrite = time();
    $stream = fopen($fp, "w");
    fwrite($stream, $string);
    while (is_resource($stream)) {
        fclose($stream);
    }
    clearstatcache(true, $fp);                // make sure filemtime() is not served from the stat cache
    $file_last_changed = filemtime($fp);
    if ($file_last_changed < $timestamp_before_fwrite) {
        // file not changed code
        return false;
    }
    return true;
}
This is the function I use to write to file, it first gets the current timestamp before making changes to the file, and then I compare the timestamp to the last time the file was changed.

wkhtmltopdf failing to convert local pages to PDF

I've been trying to get wkhtmltopdf to convert pages on a website, and it's failing to convert pages that are on the same site. It will convert and store external pages (I tried it with Google and bbc.co.uk, both worked), so the permissions are fine, but if I try to convert a local page, either a static HTML file or one generated by a script, it takes around 3 minutes before failing.
The output says the page has failed to load; if I forcibly ignore this, I end up with a blank PDF.
I thought it might be session locking, but closing the session resulted in the same issue. I feel it's something to do with the way the server is behaving, though.
Here's the code in question:
session_write_close();
set_time_limit(0);
ini_set('memory_limit', '1024M');

Yii::app()->setTheme("frontend");

// Grabbing the page name
$ls_url = Yii::app()->request->getHostInfo() . Yii::app()->request->url;

// Let's remove the .pdf, otherwise we'll be in an endless loop
$ls_url = str_replace('.pdf', '', $ls_url);

// Setting paths
$ls_basePath = Yii::app()->basePath . "/../extras/wkhtmltopdf/";
if (PHP_OS == "Darwin") {
    $ls_binary = $ls_basePath . "wkhtmltopdf-osx";
} else {
    $ls_binary = $ls_basePath . "wkhtmltopdf";
}
$ls_generatedPagesPath = $ls_basePath . "generated-pages/";
$ls_outputFileName = str_replace(array("/", ":"), "-", $ls_url) . "--" . date("dmY-His") . ".pdf";
$ls_outputFile = $ls_generatedPagesPath . $ls_outputFileName;

// Making sure no nasty chars are in place
$ls_command = escapeshellcmd($ls_binary . " --load-error-handling ignore " . $ls_url . " " . $ls_outputFile);

// Let's run things now
system($ls_command);
Did you lynx that exact URL? Since wkhtmltopdf is actually a small but powerful WebKit browser, it fails in the same places a normal browser would.
Check the URL you gave, and check that the external URLs within your page are accessible from your server. It loads CSS, external images, iframes, everything, before it even starts making the PDF.
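One way to check that from PHP itself is to fetch the URL from the server before handing it to wkhtmltopdf; a hedged sketch using cURL ($ls_url is the variable from the question, and the timeout is an assumption):
$ch = curl_init($ls_url);
curl_setopt($ch, CURLOPT_NOBODY, true);              // HEAD-style request, we only need the status
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error    = curl_error($ch);
curl_close($ch);

if ($httpCode !== 200) {
    error_log("wkhtmltopdf target not reachable from this server: $ls_url ($httpCode) $error");
}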
Personally, I love wkhtmltopdf. Nothing beats it.

Creating files on a time (hourly) basis

I'm experimenting with the Twitter streaming API.
I use Phirehose to connect to Twitter and fetch the data, but I'm having problems storing it in files for further processing.
Basically, what I want to do is create a file named
date("YmdH") . ".txt"
for every hour of connection.
Here is what my code looks like right now (not handling the hourly change of files):
public function enqueueStatus($status)
{
    $data = json_decode($status, true);
    if (isset($data['text']) /* more conditions here */) {
        $fp = fopen("/tmp/$time.txt");      // note: no mode given, and $time is never set
        fwrite($fp, $status);
        fclose($fp);
    }
}
Help is as always much appreciated :)
You want the 'append' mode in fopen - this will either append to a file or create it.
if (isset($data['text']) /* more conditions here */) {
    $fp = fopen("/tmp/" . date("YmdH") . ".txt", "a");
    fwrite($fp, $status);
    fclose($fp);
}
From the Phirehose Google Code wiki:
As of Phirehose version 0.2.2 there is an example of a simple "ghetto queue" included in the tarball (see files: ghetto-queue-collect.php and ghetto-queue-consume.php) that shows how statuses could be easily collected on to the filesystem for processing and then picked up by a separate process (consume).
This is a complete working sample of doing what you want to do. The rotation time interval is configurable too. Additionally there's another script to consume and process the written files too.
Now if only I could find a way to stop the whole script; my log keeps filling up (the script continues executing) even if I close the browser tab :P
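On stopping the script: by default PHP aborts a script when the client disconnects, but it only notices the disconnect while writing output. A hedged sketch, assuming the collector is started from a browser request, is to emit a byte from inside enqueueStatus() and check connection_aborted():
// Call this from inside enqueueStatus() (or any long-running loop started from a browser).
function stop_if_client_gone(): void
{
    echo ' ';                      // PHP only detects a closed connection while sending output
    flush();
    if (connection_aborted()) {
        exit;                      // ends the collector once the browser tab is closed
    }
}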

How can I optimize this simple PHP script?

This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize this for speed and efficiency:
An AJAX POST request is made to this script:
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script, which reads a text file:
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/' . $fileName . '.txt')) {
    $lines = file('text/' . $fileName . '.txt');
    echo $lines[sizeof($lines) - 1];
} else {
    echo 0;
}
?>
I would appreciate any help. I think there is more improvement to be made in the first script. It makes an expensive function call (file_get_contents), well, at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
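For example, a hedged sketch that locks the parameter down to the text/ directory used by the script (basename() strips any path components, realpath() confirms the result stays inside that directory):
$fileName = basename($_GET['textFile']);                 // drop any ../ or directory parts
$path     = realpath('text/' . $fileName . '.txt');
$baseDir  = realpath('text');

if ($path === false || strpos($path, $baseDir . DIRECTORY_SEPARATOR) !== 0) {
    echo 0;                                              // unknown file or outside text/
    exit;
}

$lines = file($path);
echo $lines[count($lines) - 1];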
Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or on every change) and saves it to a file that your other script can access directly.
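The cron side of that could be as small as this sketch (the paths are placeholders; schedule it with something like * * * * * php /path/to/extract-last-line.php):
// Copy the last line of the big log to a small cache file that the web script reads directly.
$source = '/path/to/big/logfile.txt';     // placeholder
$cache  = '/path/to/lastline.txt';        // placeholder

$lines = file($source, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines !== false && count($lines) > 0) {
    file_put_contents($cache, end($lines), LOCK_EX);
}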
readfile() is your friend here: it reads a file on disk and streams it to the client.
script 1:
<?php
session_start();
// added basic argument filtering
$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$fileName = $_SESSION['serverURL'] . 'text/' . $fileName . '.txt';
if (file_exists($fileName)) {
    // script 2 could be pasted here

    // for the entire file
    // readfile($fileName);

    // for just the last line
    $lines = file($fileName);
    echo $lines[count($lines) - 1];
    exit(0);
}
echo 0;
?>
This script could be improved further by adding caching, but that is more complicated. Very basic caching could look like this:
script 2:
<?php
$lastModifiedTimeStamp = filemtime($fileName);
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
    if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
        header("HTTP/1.0 304 Not Modified");
        exit(0);
    }
}
header('Content-Length: ' . filesize($fileName));
header('Expires: ' . gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: ' . gmdate('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
First things first: do you really need to optimize this? Is it the slowest part in your use case? Have you used Xdebug to verify that? If you have done that, read on:
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script operates on).
As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code looks very readable; I would leave it as is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread():
# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
$filePointer = fopen($fileName, 'r');
$i = 1;
$chunkSize = 200;
# Read 200 byte chunks from the file and check if the chunk
# contains a newline
do {
    fseek($filePointer, -($i * $chunkSize), SEEK_END);
    $line = fread($filePointer, $i++ * $chunkSize);
} while (($pos = strrpos($line, "\n")) === false);
return substr($line, $pos + 1);
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read.
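A sketch of that memcache idea using the Memcached extension ($logFile, $line and the server address are placeholders):
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Writer side: cache the last line at the moment it is appended to the log.
file_put_contents($logFile, $line . "\n", FILE_APPEND);
$mc->set('lastline:' . basename($logFile), $line);

// Reader side: try the cache first, fall back to the file on a miss.
$last = $mc->get('lastline:' . basename($logFile));
if ($last === false) {
    $lines = file($logFile);
    $last  = $lines[count($lines) - 1];
}
echo $last;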
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server filesystem directly is out of the question. If not, then it's the best and simplest option.
Otherwise, you could use caching. Instead of getting the whole file, you just issue a HEAD request and compare the timestamp to a local copy.
Also, if you are AJAX-updating a lot of clients based on the same files, you might consider looking at using Comet (Meteor, for example). It's used for things like chats, where a single change has to be broadcast to several clients.
