I currently have a script that downloads a large (1.3GB) XML file from the web, but I have encountered a number of problems. This is my code:
function readfile_chunked ($filename) {
    $chunksize = 1*(1024*1024);
    $buffer = '';
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        //print $buffer;
        $myFile = "test.xml";
        $fh = fopen($myFile, 'a') or die("can't open file");
        fwrite($fh, $buffer);
        fclose($fh);
    }
    return fclose($handle);
}
The first (and main) problem is the following error during the download:
Fatal error: Maximum execution time of 30 seconds exceeded in /Applications/MAMP/htdocs/test/test.php on line 53
As I understand it, this is basically a timeout. I've read about changing the timeout settings in php.ini, but I'm conscious that when this application goes live I won't be able to edit php.ini on the shared server.
This brings me to my next problem: I want to implement some kind of error checking and prevention. For example, if the connection to the server goes down, I'd like to be able to resume when the connection is restored. I realise this may not be possible, though. An alternative might be to compare the file sizes of the local and remote copies?
I also need to add an Accept-Encoding: gzip HTTP header to my request.
Finally, that brings me to the progress notification I would like, presumably done by constantly polling with JavaScript and comparing the local and remote file sizes?
The first two points are the most important, however, as currently I can't download the file I require at all. Any help would be appreciated.
Regarding your question about the timeout: I would suggest running that task as a cron job. When running PHP from the command line, the default maximum execution time is 0 (no time limit). This way you avoid guessing how long the download will take, which depends on various factors. I believe the majority of shared hosts allow you to run cron jobs.
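For example, a minimal sketch of what the cron-driven version might look like (the source URL, file names and schedule below are placeholders, not part of the original question):
<?php
// download.php - meant to be run from the command line, e.g. by a crontab
// entry such as:  0 3 * * * php /path/to/download.php
// On the CLI, max_execution_time defaults to 0, so no timeout applies.
$source = 'http://example.com/feed.xml';   // placeholder URL
$target = __DIR__ . '/test.xml';

$in  = fopen($source, 'rb');
$out = fopen($target, 'wb');
if ($in === false || $out === false) {
    fwrite(STDERR, "Could not open source or target\n");
    exit(1);
}

// Copies in small internal chunks, so the 1.3GB file never has to fit in memory.
stream_copy_to_stream($in, $out);

fclose($in);
fclose($out);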
For download resuming and gzip, I would suggest using the PEAR package HTTP_Download. It supports HTTP compression, caching, partial downloads, resuming and sending raw data.
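If HTTP_Download turns out not to fit, a rough cURL sketch of resuming and requesting gzip could look like the following (the URL and file names are placeholders, and it assumes the remote server honours range requests):
<?php
$url    = 'http://example.com/feed.xml';  // placeholder
$target = 'test.xml';

// Resume from however many bytes we already have on disk.
$already = file_exists($target) ? filesize($target) : 0;

$fp = fopen($target, 'ab');               // append mode
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_RESUME_FROM, $already);
// Sends "Accept-Encoding: gzip" and decompresses the response automatically.
// Note that not every server will combine compression with range requests.
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
curl_setopt($ch, CURLOPT_FAILONERROR, true);

if (curl_exec($ch) === false) {
    // Connection dropped: running the script again resumes from filesize().
    error_log('Download failed: ' . curl_error($ch));
}
curl_close($ch);
fclose($fp);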
I had a similar problem with PHP and inserted the following code to get around the execution time problem:
ignore_user_abort(true);          // keep running even if the client disconnects
set_time_limit(0);                // remove the execution time limit
ini_set('memory_limit', '2048M'); // raise the memory limit (only works if the host allows it)
We have a complex web interface on our dedicated server (Windows Server) with a few console processes running PHP applications in the background and doing various data processing. Some of them write to log files:
file_put_contents( $logsFileNamePath, $fileContents, FILE_APPEND | LOCK_EX );
and there is one console process that analyzes these log files and looks for potential problems. It executes the code below (which can happen at the same time as the other console processes are writing to the log file):
$logContent = file( $filePath );
$logContent = array_reverse( $logContent );
//... analyze the content
This had been working fine for years until a recent PHP upgrade from 7.4.20 to 7.4.27; suddenly, the process that analyzes the log files sometimes gets content from a log file that is truncated in the middle of a line. My suspicion is that there is some internal error in this newer version of PHP causing this; maybe the file locking no longer works? It looks like it!
Has anyone seen a similar problem recently? Do you have any ideas what to do about it? Thanks.
A patch update from .20 to .27 should not include any backward compatibility breaks, so your experience here is almost certainly just a matter of perception.
LOCK_EX is advisory: it does not have to be respected, and file() does not look at it. It seems pretty clear that your reader is reading before the writer has finished. If you want the reader to respect the lock, you'll have to use identical locking mechanisms (i.e., fopen() and flock()) in the reader as well:
$fp = fopen('/path/to/file', 'r');
flock($fp, LOCK_EX); // this will block until the writer is done
while (($line = fgets($fp)) !== false) {
    // process $line
}
fclose($fp);
You might also simply update the writer so that it writes to a temp file and then renames it when it's done:
file_put_contents('/path/to/file.temp', $contents);
rename('/path/to/file.temp', '/path/to/file');
And then update your reader to expect the file to occasionally be missing:
if (!file_exists('/path/to/file')) {
    throw new Exception('Nothing to do.');
}
I need to read a large file to find some labels and create a dynamic form. I cannot use file() or file_get_contents() because of the file size.
If I read the file line by line with the following code:
set_time_limit(0);
$handle = fopen($file, 'r');
set_time_limit(0);
if ($handle) {
    while (!feof($handle)) {
        $line = fgets($handle);
        if ($line) {
            //do something.
        }
    }
}
echo 'Read complete';
I get the following error in Chrome:
Error 101 (net::ERR_CONNECTION_RESET)
This error occurs after several minutes, so I don't think the max_input_time setting (which is set to 60) is the problem.
What server software do you use? Apache, nginx? You should set the maximum accepted file upload somewhere higher than 500MB. Furthermore, the max upload size in php.ini should be bigger than 500MB too, and I think the PHP process must be allowed to grow larger than 500MB (check this in your PHP config).
Set the memory limit:
ini_set("memory_limit", "600M");
You also need to set the timeout limit:
set_time_limit(0);
Generally, long-running processes should not be run while the user waits for them to complete.
I'd recommend using a background job oriented tool that can handle this type of work and can be queried about the status of the job (running/finished/error).
My first guess is that something in the middle breaks the connection because of a timeout. Whether it's a timeout in the web server (which PHP cannot know about) or some firewall doesn't really matter: PHP gets a signal to close the connection and the script stops running. You could circumvent this behaviour by using ignore_user_abort(true); this, along with set_time_limit(0), should do the trick.
The caveat is that whatever caused the connection abort will still cause it, though the script will still finish its job. One very annoying side effect is that this script could possibly be executed multiple times in parallel, with none of them ever completing.
Again, I recommend using some background task to do it and an interface for the end user (browser) to verify the status of that task. You could also implement a basic version yourself via cron jobs and database/text files that hold the status, as sketched below.
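A very small sketch of that do-it-yourself version, assuming a plain text status file (the file name, script names and the job body are made up for illustration):
<?php
// worker.php - started by cron, not by the browser
file_put_contents('/tmp/job.status', 'running');
try {
    // ... the long-running download / processing goes here ...
    file_put_contents('/tmp/job.status', 'finished');
} catch (Exception $e) {
    file_put_contents('/tmp/job.status', 'error: ' . $e->getMessage());
}
and, on the web-facing side:
<?php
// status.php - polled by the browser to show progress to the user
header('Content-Type: text/plain');
echo file_exists('/tmp/job.status')
    ? file_get_contents('/tmp/job.status')
    : 'not started';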
I'm a novice, so I'll try and do my best to explain a problem I'm having. I apologize in advance if there's something I left out or is unclear.
I'm serving an 81MB zip file outside my root directory to people who are validated beforehand. I've been getting reports of corrupted downloads or an inability to complete the download. I've verified this happening on my machine if I simulate a slow connection.
I'm on shared hosting running Apache-Coyote/1.1.
I get a network timeout error. I think my host might be killing the downloads if they take too long, but they haven't confirmed it either way.
I thought I was maybe running into a memory limit or time limit, so my host installed the Apache module XSendFile. The headers in the file that handles the download after validation are set this way:
<?php
set_time_limit(0);
$file = '/absolute/path/to/myzip/myzip.zip';
header("X-Sendfile: $file");
header("Content-type: application/zip");
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
Any help or suggestions would be appreciated. Thanks!
I would suggest taking a look at this comment:
http://www.php.net/manual/en/function.readfile.php#99406
Particularly if you are using Apache. If not, the code in the link above should still be helpful:
I started running into trouble when I had really large files being sent to clients with really slow download speeds. In those cases, the
script would time out and the download would terminate with an
incomplete file. I am dead-set against disabling script timeouts - any
time that is the solution to a programming problem, you are doing
something wrong - so I attempted to scale the timeout based on the
size of the file. That ultimately failed, though, because it was impossible to predict the speed at which the end user would be downloading the file, so it was really just a best guess, and inevitably we still get reports of script timeouts.
Then I stumbled across a fantastic Apache module called mod_xsendfile ( https://tn123.org/mod_xsendfile/ (binaries) or
https://github.com/nmaier/mod_xsendfile (source)). This module
basically monitors the output buffer for the presence of special
headers, and when it finds them it triggers apache to send the file on
its own, almost as if the user requested the file directly. PHP
processing is halted at that point, so no timeout errors regardless of
the size of the file or the download speed of the client. And the end
client gets the full benefits of Apache sending the file, such as an
accurate file size report and download status bar.
The code I finally ended up with is too long to post here, but in general it uses the mod_xsendfile module if it is present, and if not the script falls back to using the code I originally posted. You can find some example code at https://gist.github.com/854168
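To illustrate the "use mod_xsendfile if present, otherwise fall back" idea from the quoted comment, a sketch might look like this (the file path is a placeholder, and apache_get_modules() is only available when PHP runs as an Apache module):
<?php
$file = '/absolute/path/to/file.zip';  // placeholder path

header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');

if (function_exists('apache_get_modules')
        && in_array('mod_xsendfile', apache_get_modules())) {
    // Hand the file to Apache; PHP stops doing any work at this point.
    header('X-Sendfile: ' . $file);
} else {
    // Fallback: stream it from PHP (a chunked version like the one in the
    // EDIT below avoids loading the whole file into memory at once).
    header('Content-Length: ' . filesize($file));
    readfile($file);
}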
EDIT
Just to have a reference for code that does the "chunking" (Link to Original Code):
<?php
function readfile_chunked($filename, $type = 'array') {
    $chunksize = 1 * (1024 * 1024); // how many bytes per chunk
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    // initialise the return value according to the requested type
    $lines = ($type === 'array') ? array() : '';
    while (!feof($handle)) {
        switch ($type) {
            case 'array':
                // Returns an array of lines, like file()
                $lines[] = fgets($handle, $chunksize);
                break;
            case 'string':
                // Returns one string, like file_get_contents()
                $lines .= fread($handle, $chunksize);
                break;
        }
    }
    fclose($handle);
    return $lines;
}
?>
Okay, I have a problem that I hope you can help me fix.
I am running a server that stores very large video files, some up to 650 MB. I need a user to be able to request this page and have the file download to their machine. I have tried everything: a plain readfile() request hangs for about 90 seconds before quitting and gives me a "No data received (error 324)" message; a chunked readfile script that I found on several websites doesn't even start a download; FTP-through-PHP solutions did nothing but give me errors when I tried to get the file; and the only cURL solutions I have found just create another copy of the file on my server. That is not what I need.
To be clear I need the user to be able to download the file to their computer and not to the server.
I don't know if this code is garbage or if it just needs a tweak or two, but any help is appreciated!
<?php
$fn = $_GET["fn"];
echo $fn."<br/>";
$url = $fn;
$path = "dl".$fn;
$fp = fopen($path, 'w');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
$data = curl_exec($ch);
curl_close($ch);
fclose($fp);
?>
I wouldn't recommend serving large binary files using PHP, or any other scripting technology for that matter. They were never designed for this; use Apache, nginx or whatever standard HTTP server you have on the back end. If you still need to use PHP, then you should probably check out readfile_chunked.
http://php.net/readfile#48683
and here's a great tutorial.
http://teddy.fr/blog/how-serve-big-files-through-php
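For illustration, a minimal sketch of the chunked approach those links describe, sending the download headers and then printing the file a piece at a time so it never sits in memory whole ($path is a placeholder; in real code the requested file name must be validated rather than taken straight from $_GET):
<?php
$path = '/var/media/video.mp4';  // placeholder

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($path) . '"');
header('Content-Length: ' . filesize($path));

$handle = fopen($path, 'rb');
while (!feof($handle)) {
    echo fread($handle, 1024 * 1024); // 1 MB at a time
    flush();                          // push the chunk out to the client
                                      // (if output buffering is on, ob_flush() may be needed as well)
}
fclose($handle);
?>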
good luck.
readfile() doesn't buffer. However, PHP itself might buffer. Turn buffering off:
while (ob_get_level()) {
    ob_end_clean();
}
readfile($file);
Your web server might buffer as well. Turn that off too; how you do it depends on the web server and why it's buffering.
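For example, if nginx is proxying the request, it honours a response header that asks it not to buffer (this is an nginx convention, not a PHP feature; other servers need their own settings):
<?php
header('X-Accel-Buffering: no'); // tells nginx not to buffer this response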
I see two problems that can happen:
First: your web server may be closing the connection due to a timeout. You should look at the web server config.
Second: a timeout with cURL. I recommend looking at this post.
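For the cURL side, the relevant options look roughly like this (the URL and the values are placeholders):
<?php
$ch = curl_init('http://example.com/large-file.zip');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); // seconds allowed to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 0);         // 0 = no limit on the total transfer time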
I've written a simple PHP script to download a hidden file if the user has proper authentication. The whole set up works fine: it sends the proper headers, and the file transfer begins just fine (and ends just fine - for small files).
However, when I try to serve a 150 MB file, the connection gets mysteriously interrupted somewhere close to the middle of the file. Here's the relevant code fragment (taken from somewhere on the Internet and adapted by me):
function readfile_chunked($filename, $retbytes = TRUE) {
    $handle = fopen($filename, 'rb');
    if ($handle === false) return false;
    while (!feof($handle) and (connection_status() == 0)) {
        print(fread($handle, 1024*1024));
        set_time_limit(0);
        ob_flush();
        flush();
    }
    return fclose($handle);
}
I also run some other code BEFORE calling the function above to try to solve the issue, but as far as I can tell it does nothing:
session_write_close();
ob_end_clean();
ignore_user_abort();
set_time_limit(0);
As you can see, it doesn't attempt to load the whole file into memory at once or anything insane like that. To make it even more puzzling, the actual point in the transfer where it dies seems to float between 50 and 110 MB, and it seems to kill ALL connections to the same file within a few seconds of each other (I tested this by downloading simultaneously with a friend). Nothing is appended to the interrupted file, and I see no errors in the logs.
I'm using Dreamhost, so I suspect that their watchdog might be killing my process because it's been running for too long. Does anyone have any experience to share on the matter? Could something else be the issue? Is there any workaround?
For the record, my Dreamhost account is set up to use PHP 5.2.1 FastCGI.
I have little experience with Dreamhost, but you could use mod_xsendfile instead (if Dreamhost allows it).