Unpack large files with gzip in PHP

I'm using a simple unzip function (as seen below) for my files so I don't have to unzip files manually before they are processed further.
function uncompress($srcName, $dstName) {
    $string = implode("", gzfile($srcName));
    $fp = fopen($dstName, "w");
    fwrite($fp, $string, strlen($string));
    fclose($fp);
}
The problem is that if the gzip file is large (e.g. 50 MB), unzipping it takes a large amount of RAM to process.
The question: can I parse a gzipped file in chunks and still get the correct result? Or is there a better way to handle extracting large gzip files (even if it takes a few seconds longer)?

gzfile() is a convenience function that calls gzopen, gzread, and gzclose for you.
So, yes, you can do the gzopen yourself and gzread the file in chunks.
This will uncompress the file in 4 KB chunks:
function uncompress($srcName, $dstName) {
    $sfp = gzopen($srcName, "rb");
    $fp = fopen($dstName, "w");
    while (!gzeof($sfp)) {
        $string = gzread($sfp, 4096);
        fwrite($fp, $string, strlen($string));
    }
    gzclose($sfp);
    fclose($fp);
}

Try with:
function uncompress($srcName, $dstName) {
    $fp = fopen($dstName, "w");
    fwrite($fp, implode("", gzfile($srcName)));
    fclose($fp);
}
The $length parameter of fwrite is optional.

If you are on a Linux host, have the required privileges to run commands, and the gzip command is installed, you could try calling it with something like shell_exec.
Something a bit like this, I guess, would do:
shell_exec('gzip -d your_file.gz');
This way, the file wouldn't be unzipped by PHP itself.
As a sidenote:
Take care where the command is run from (or use a switch to tell gzip to decompress to a specific destination).
You might want to take a look at escapeshellarg too ;-)
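For example, here is a minimal sketch (the paths are hypothetical) that decompresses to a destination of your choice while escaping both arguments:
$src = '/path/to/your_file.gz';          // hypothetical source path
$dst = '/path/to/output/your_file.txt';  // hypothetical destination path
// -d decompresses, -c writes to stdout so the output can be redirected anywhere;
// escapeshellarg() protects against shell metacharacters in the paths.
shell_exec('gzip -dc ' . escapeshellarg($src) . ' > ' . escapeshellarg($dst));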

As maliayas mentioned, it may lead to a bug. I experienced an unexpected exit from the while loop even though the gz file had been decompressed successfully. The whole code looks like this and works better for me:
function gzDecompressFile($srcName, $dstName) {
    $error = false;
    if ($file = gzopen($srcName, 'rb')) { // open gz file
        $out_file = fopen($dstName, 'wb'); // open destination file
        while (($string = gzread($file, 4096)) != '') { // read 4 KB at a time
            if (!fwrite($out_file, $string)) { // check if writing was successful
                $error = true;
            }
        }
        // close files
        fclose($out_file);
        gzclose($file);
    } else {
        $error = true;
    }
    if ($error)
        return false;
    else
        return true;
}

Related

Transfer a file of any type in 1k chunks over HTTP

I need to transfer files of any type or size over HTTP/GET in ~1k chunks. The resulting file hash needs to match the source file. This needs to be done in native PHP without any special tools. I have a basic strategy but I'm getting odd results. This proof of concept just copies the file locally.
CODE
<?php
$input = "/home/lm1/Music/Ellise - Feeling Something Bad.mp3";
$a = pathinfo($input);
$output = $a["basename"];
echo "\n> " . md5_file($input);
$fp = fopen($input, 'rb');
if ($fp) {
    while (!feof($fp)) {
        $buffer = base64_encode(fread($fp, 1024));
        // echo "\n\n".md5($buffer);
        write($output, $buffer);
    }
    fclose($fp);
    echo "\n> " . md5_file($output);
    echo "\n";
}
function write($file, $buffer) {
    // echo "\n".md5($buffer);
    $fp = fopen($file, 'ab');
    fwrite($fp, base64_decode($buffer));
    fclose($fp);
}
?>
OUTPUT
> d31e102b1cae9c73bbf5a12615a8ea36
> 9f03f6c88ed61c07cb534922d6d31864
Thanks in advance.
fread already advances the file pointer position, so there's no need to keep track of it. The same goes for fwrite, so consecutive calls automatically append to the given file. Thus, you could simplify your approach to (code adapted from this answer on how to efficiently write a large input stream to a file):
$src = "a.test";
$dest = "b.test";
$fp_src = fopen($src, 'rb');
if ($fp_src) {
    $fp_dest = fopen($dest, 'wb');
    $buffer_size = 1024;
    while (!feof($fp_src)) {
        fwrite($fp_dest, fread($fp_src, $buffer_size));
    }
    fclose($fp_src);
    fclose($fp_dest);
    echo md5_file($src)."\n";  // 88e4af2f85080a280e7f00e50d96b7f7
    echo md5_file($dest)."\n"; // 88e4af2f85080a280e7f00e50d96b7f7
}
If you want to keep both processes separated, you'd do:
$src = "a.test";
$dest = "b.test";
if (file_exists($dest)) {
    unlink($dest); // So we don't append to an existing file
}
$fp = fopen($src, 'rb');
if ($fp) {
    while (!feof($fp)) {
        $buffer = base64_encode(fread($fp, 1024));
        write($dest, $buffer);
    }
    fclose($fp);
}
function write($file, $buffer) {
    $fp = fopen($file, 'ab');
    fwrite($fp, base64_decode($buffer));
    fclose($fp);
}
echo md5_file($src)."\n";  // 88e4af2f85080a280e7f00e50d96b7f7
echo md5_file($dest)."\n"; // 88e4af2f85080a280e7f00e50d96b7f7
As for how to stream files over HTTP, you might want to have a look at:
Streaming a large file using PHP
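The core idea there is the same chunked loop, only each chunk is echoed to the client instead of being written to a second file. A rough sketch (the path and Content-Type are placeholders):
$path = '/path/to/large.bin';
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
$fp = fopen($path, 'rb');
while (!feof($fp)) {
    echo fread($fp, 1024); // send 1 KB at a time
    flush();               // push the chunk to the client instead of buffering the whole file
}
fclose($fp);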

Issue on Reading .txt inside a Zipped File by PHP [duplicate]

I need to read the content of a single file, "test.txt", inside of a zip file. The whole zip file is a very large file (2gb) and contains a lot of files (10,000,000), and as such extracting the whole thing is not a viable solution for me. How can I read a single file?
Try using the zip:// wrapper:
$handle = fopen('zip://test.zip#test.txt', 'r');
$result = '';
while (!feof($handle)) {
    $result .= fread($handle, 8192);
}
fclose($handle);
echo $result;
You can use file_get_contents too:
$result = file_get_contents('zip://test.zip#test.txt');
echo $result;
Please note that @Rocket-Hazmat's fopen solution may cause an infinite loop if the zip file is protected with a password, since fopen fails and feof never returns true.
You may want to change it to
$handle = fopen('zip://file.zip#file.txt', 'r');
$result = '';
if ($handle) {
    while (!feof($handle)) {
        $result .= fread($handle, 8192);
    }
    fclose($handle);
}
echo $result;
This solves the infinite loop issue, but if your zip file is protected with a password then you may see something like:
Warning: file_get_contents(zip://file.zip#file.txt): failed to open stream: operation failed
There's a solution, however.
As of PHP 7.2, support for encrypted archives was added.
So you can do it this way for both file_get_contents and fopen:
$options = [
    'zip' => [
        'password' => '1234'
    ]
];
$context = stream_context_create($options);
echo file_get_contents('zip://file.zip#file.txt', false, $context);
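And the equivalent with fopen, passing the same context as the fourth argument (a sketch using the same hypothetical file names):
$handle = fopen('zip://file.zip#file.txt', 'r', false, $context);
$result = '';
if ($handle) {
    while (!feof($handle)) {
        $result .= fread($handle, 8192);
    }
    fclose($handle);
}
echo $result;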
A better solution, however, for checking whether a file exists before reading it (without worrying about encrypted archives) is to use ZipArchive:
$zip = new ZipArchive;
if ($zip->open('file.zip') !== TRUE) {
    exit('failed');
}
if ($zip->locateName('file.txt') !== false) {
    echo 'File exists';
} else {
    echo 'File does not exist';
}
This will work (no need to know the password)
Note: To locate a folder using the locateName method you need to pass it like folder/, with a forward slash at the end.
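Once an entry has been located, its contents can also be read without extracting the whole archive. A small sketch using the same hypothetical file names:
$zip = new ZipArchive;
if ($zip->open('file.zip') === TRUE) {
    if ($zip->locateName('file.txt') !== false) {
        echo $zip->getFromName('file.txt'); // returns the entry's contents as a string
    }
    $zip->close();
}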

PHP not writing to file from one source

I have an issue I can't seem to find the solution for. I am trying to write to a flat text file. I have echoed all variables out on the screen, verified permissions for the user (www-data), and just for grins set everything in the whole folder to 777 - all to no avail. Worst part is I can call the same function from another file and it writes. I can't seem to find the common thread here.
function ReplaceAreaInFile($AreaStart, $AreaEnd, $File, $ReplaceWith) {
    $FileContents = GetFileAsString($File);
    $Section = GetAreaFromFile($AreaStart, $AreaEnd, $FileContents, TRUE);
    if (isset($Section)) {
        $SectionTop = $AreaStart."\n";
        $SectionTop .= $ReplaceWith;
        $NewContents = str_replace($Section, $SectionTop, $FileContents);
        if (!$Handle = fopen($File, 'w')) {
            return "Cannot open file ($File)";
            exit;
        }
        /*
        if (!flock($Handle, LOCK_EX | LOCK_NB)) {
            echo 'Unable to obtain file lock';
            exit(-1);
        }
        */
        if (fwrite($Handle, $NewContents) === FALSE) {
            return "Cannot write to file ($File)";
            exit;
        } else {
            return $NewContents;
        }
    } else {
        return "<p align=\"center\">There was an issue saving your settings. Please try again. If the issue persists contact your provider.</p>";
    }
}
Try with...
$Handle = fopen($File, 'w');
if ($Handle === false) {
    die("Cannot open file ($File)");
}
$written = fwrite($Handle, $NewContents);
if ($written === false) {
    die("Invalid arguments - could not write to file ($File)");
}
if ((strlen($NewContents) > 0) && ($written < strlen($NewContents))) {
    die("There was a problem writing to $File - $written chars written");
}
fclose($Handle);
echo "Wrote $written bytes to $File\n"; // or log to a file
return $NewContents;
and also check for any problems in the error log. There should be something, assuming you've enabled error logging.
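If you're not sure error logging is enabled, a quick sketch (the log path is just an example) you could drop at the top of the script:
ini_set('log_errors', '1');
ini_set('error_log', '/var/log/php_errors.log'); // example path, must be writable by the web server user
error_reporting(E_ALL);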
You need to check the number of characters written, since in PHP fwrite behaves like this:
After having problems with fwrite() returning 0 in cases where one would fully expect a return value of false, I took a look at the source code for php's fwrite() itself. The function will only return false if you pass in invalid arguments. Any other error, such as a broken pipe or closed connection, will result in a return value of less than strlen($string), in most cases 0.
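In practice that means wrapping fwrite in a loop that checks the return value each time. A rough sketch (the helper name is made up):
function fwrite_all($handle, $string) {
    $total = 0;
    $length = strlen($string);
    while ($total < $length) {
        $written = fwrite($handle, substr($string, $total));
        if ($written === false || $written === 0) {
            return false; // broken pipe, full disk, closed connection, ...
        }
        $total += $written;
    }
    return $total; // everything made it to the stream
}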
Also, note that you might be writing to a file, just not the file you expect to be writing to. Absolute paths might help with tracking this down.
The final solution I ended up using for this:
function ReplaceAreaInFile($AreaStart, $AreaEnd, $File, $ReplaceWith) {
    $FileContents = GetFileAsString($File);
    $Section = GetAreaFromFile($AreaStart, $AreaEnd, $FileContents, TRUE);
    if (isset($Section)) {
        $SectionTop = $AreaStart."\n";
        $SectionTop .= $ReplaceWith;
        $NewContents = str_replace($Section, $SectionTop, $FileContents);
        return $NewContents;
    } else {
        return "<p align=\"center\">There was an issue saving your settings.</p>";
    }
}
function WriteNewConfigToFile($File2WriteName, $ContentsForFile) {
    file_put_contents($File2WriteName, $ContentsForFile, LOCK_EX);
}
I did end up using absolute file paths and had to check the permissions on the files. I had to make sure the www-data user in Apache was able to write to the files and was also the user running the script.

Catching errors when downloading massive files via PHP

I am attempting to download fairly large files (up to, possibly over 1GB) from a remote HTTP server through a PHP script. I am using fgets() to read the remote file line by line and write the file contents into a local file that is created through tempnam(). However, the downloads of very large files (several hundred MB) are failing. Is there any way I can rework the script to catch the errors that are occurring?
Because the download is only part of a larger overall process, I would like to be able to handle the downloads and deal with errors in the PHP script rather than having to go to wget or some other process.
This is the script I am using now:
$tempfile = fopen($inFilename, 'w');
$handle = @fopen("https://" . $server . ".domain.com/file/path.pl?keyID=" . $keyID . "&format=" . $format . "&zipped=true", "r");
$firstline = '';
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        if ($firstline == '') $firstline = $buffer;
        fwrite($tempfile, $buffer);
    }
    fclose($handle);
    fclose($tempfile);
    return $firstline;
} else {
    throw new Exception('Unable to open remote file.');
}
I'd say you're looking for stream_notification_callback (especially the STREAM_NOTIFY_FAILURE & STREAM_NOTIFY_COMPLETED constants)
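A rough sketch of wiring up such a callback on the HTTP stream (the URL is a placeholder and the callback name is made up):
function download_progress($code, $severity, $message, $message_code, $bytes, $bytes_max) {
    switch ($code) {
        case STREAM_NOTIFY_FAILURE:
            // e.g. log it, or throw so the surrounding process can react
            echo "Download failed: $message ($message_code)\n";
            break;
        case STREAM_NOTIFY_COMPLETED:
            echo "Download completed: $bytes bytes transferred\n";
            break;
    }
}
$context = stream_context_create();
stream_context_set_params($context, ['notification' => 'download_progress']);
$handle = fopen("https://example.com/file/path.pl", "r", false, $context);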

How To watch a file write in PHP?

I want to implement something like the tail command in PHP, but how can I watch for data being appended to the file?
I don't believe that there's some magical way to do it. You just have to continuously poll the file size and output any new data. This is actually quite easy, and the only real thing to watch out for is that file sizes and other stat data are cached in PHP. The solution to this is to call clearstatcache() before outputting any data.
Here's a quick sample, that doesn't include any error handling:
function follow($file)
{
    $size = 0;
    while (true) {
        clearstatcache();
        $currentSize = filesize($file);
        if ($size == $currentSize) {
            usleep(100);
            continue;
        }
        $fh = fopen($file, "r");
        fseek($fh, $size);
        while ($d = fgets($fh)) {
            echo $d;
        }
        fclose($fh);
        $size = $currentSize;
    }
}
follow("file.txt");
$handle = popen("tail -f /var/log/your_file.log 2>&1", 'r');
while (!feof($handle)) {
    $buffer = fgets($handle);
    echo "$buffer\n";
    flush();
}
pclose($handle);
Check out php-tail on Google Code. It's a two-file implementation in PHP and JavaScript, and it has very little overhead in my testing.
It even supports filtering with a grep keyword (useful for ffmpeg, which spits out frame rate etc. every second).
$handler = fopen('somefile.txt', 'r');
// move to the end of the file
fseek($handler, filesize('somefile.txt'));
// move to the beginning of the file
fseek($handler, 0);
And you will probably want to consider using stream_get_line.
Instead of polling the file size, you could regularly check the file modification time with filemtime.
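A minimal sketch of that approach, assuming the same kind of follow loop as above (the file name is a placeholder):
$lastMtime = 0;
while (true) {
    clearstatcache();
    $mtime = filemtime('file.txt');
    if ($mtime !== $lastMtime) {
        $lastMtime = $mtime;
        // the file changed: re-read from the last known offset, e.g. with fseek() + fgets()
    }
    usleep(100000); // 100 ms between checks
}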
Below is what I adapted from the above. Call it periodically with an AJAX call and append the output to your 'holder' (textarea)... Hope this helps. Thank you to all of you who contribute to Stack Overflow and other such forums!
/* Used by the programming module to output debug.txt */
session_start();
$_SESSION['tailSize'] = filesize("./debugLog.txt");
if ($_SESSION['tailPrevSize'] == '' || $_SESSION['tailPrevSize'] > $_SESSION['tailSize']) {
    $_SESSION['tailPrevSize'] = $_SESSION['tailSize'];
}
$tailDiff = $_SESSION['tailSize'] - $_SESSION['tailPrevSize'];
$_SESSION['tailPrevSize'] = $_SESSION['tailSize'];
/* Include your own security checks (valid user, etc) if required here */
if (!$valid_user) {
    echo "Invalid system mode for this page.";
}
$handle = popen("tail -c " . $tailDiff . " ./debugLog.txt 2>&1", 'r');
while (!feof($handle)) {
    $buffer = fgets($handle);
    echo "$buffer";
    flush();
}
pclose($handle);
