I'm trying to unzip a 14MB archive with PHP with code like this:
$zip = zip_open("c:\kosmas.zip");
while ($zip_entry = zip_read($zip)) {
    $fp = fopen("c:/unzip/import.xml", "w");
    if (zip_entry_open($zip, $zip_entry, "r")) {
        $buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
        fwrite($fp, $buf);
        zip_entry_close($zip_entry);
        fclose($fp);
        break;
    }
}
zip_close($zip);
It fails on my localhost with a 128MB memory limit with the classic "Allowed memory size of blablabla bytes exhausted". On the server I've only got a 16MB limit; is there a better way to do this so that I can fit into that limit? I don't see why this has to allocate more than 128MB of memory. Thanks in advance.
Solution:
I started reading the files in 10kB chunks; problem solved, with peak memory usage around 1.5MB.
$filename = 'c:\kosmas.zip';
$archive = zip_open($filename);
while ($entry = zip_read($archive)) {
    $size = zip_entry_filesize($entry);
    $name = zip_entry_name($entry);
    $unzipped = fopen('c:/unzip/' . $name, 'wb');
    // Copy the entry in 10kB chunks so only one chunk is in memory at a time.
    while ($size > 0) {
        $chunkSize = ($size > 10240) ? 10240 : $size;
        $size -= $chunkSize;
        $chunk = zip_entry_read($entry, $chunkSize);
        if ($chunk !== false) fwrite($unzipped, $chunk);
    }
    fclose($unzipped);
}
zip_close($archive);
Why do you read the whole file at once?
$buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
fwrite($fp,"$buf");
Try reading it in small chunks and writing them to a file instead.
Just because the zip is smaller than PHP's memory limit, and perhaps the unzipped contents are as well, doesn't account for PHP's general overhead or, more importantly, the memory needed to actually unzip the file. While I'm no expert on compression, I'd expect that to be considerably more than the final unzipped size.
For a file of that size, perhaps it is better if you use shell_exec() instead:
shell_exec('unzip archive.zip -d /destination_path');
PHP must not be running in safe mode and you must have access to both shell_exec and unzip for this method to work.
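As a hedged refinement (my addition, not part of the original answer), quoting the paths with escapeshellarg() protects against spaces and shell metacharacters; the paths below are simply the ones from the question:
// A minimal sketch, assuming the `unzip` binary is on PATH and shell_exec() is not disabled.
$archive = 'c:\kosmas.zip'; // archive path from the question
$dest = 'c:/unzip';         // extraction target
$output = shell_exec('unzip -o ' . escapeshellarg($archive) . ' -d ' . escapeshellarg($dest));
if ($output === null) {
    echo "shell_exec() is unavailable or produced no output; check that unzip is installed.";
}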
Update:
Given that command-line tools are not available, all I can think of is to create a script that sends the file to a remote server where command-line tools are available, extracts the file there, and downloads the contents.
// Note: gzopen()/gzread() handle gzip (.gz) streams, not .zip archives,
// and this buffers the entire decompressed content in memory.
function my_unzip($full_pathname) {
    $unzipped_content = '';
    $zd = gzopen($full_pathname, "r");
    while ($zip_file = gzread($zd, 10000000)) {
        $unzipped_content .= $zip_file;
    }
    gzclose($zd);
    return $unzipped_content;
}
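If ext/zip is available, another hedged sketch (my suggestion, not from the thread) is to stream each entry with ZipArchive::getStream(), so only a small buffer lives in memory at any time; the paths are again the ones from the question:
$zip = new ZipArchive();
if ($zip->open('c:\kosmas.zip') === true) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if (substr($name, -1) === '/') continue; // skip directory entries
        $in = $zip->getStream($name);            // read-only stream over one entry
        $out = fopen('c:/unzip/' . $name, 'wb');
        stream_copy_to_stream($in, $out);        // copies in small internal chunks
        fclose($in);
        fclose($out);
    }
    $zip->close();
}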
Related
Can we access some of the system's heap memory with PHP, like in C/C++? Actually, I keep getting "Fatal error: Allowed memory size of 134217728 bytes exhausted" while trying to do some big file operations in PHP.
I know we can tweak this limit in the Apache2/PHP config. But for processing large files of unknown size, can we have some kind of access to the heap to process and save the file? Also, if yes, is there a mechanism to clear the memory after usage?
Sample code
<?php
$filename = "a.csv";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
echo $contents;
fclose($handle);
?>
Here, a.csv is an 80MB file. Can there be a heap operation using some sort of pointer?
Have you tried reading the file in chunks, e.g.:
<?php
$chunk = 256 * 256; // 64 kB; set chunk size to your liking
$filename = "a.csv";
$handle = fopen($filename, 'rb');
while (!feof($handle)) {
    $data = fread($handle, $chunk);
    echo $data;
    ob_flush();
    flush();
}
fclose($handle);
I'm creating a system to generate sitemaps for an application I'm working on, and one of the requirements is that each sitemap file must not exceed 10MB (10,485,760 bytes), as can be seen here.
This is my code to create the sitemap:
$fp = fopen($this->getSitemapPath() . $filename, 'w');
fwrite($fp, $siteMap->__toString());
fclose($fp);
The method $siteMap->__toString() returns a string holding a maximum of 50,000 links.
Is there a way to check the resulting file size before calling the function fwrite?
Sure, you can use mb_strlen to get the byte length of your string before you write it out to a file.
$contents = $siteMap->__toString();
if (mb_strlen($contents, '8bit') >= 10485760) {
    echo "Oops, this is too big";
} else {
    fwrite($fp, $contents);
}
I'm working on extracting a zip archive with PHP. The structure of the archive is seven folders, each of which contains on the order of 10,000 files, each around 1 kB.
My code is pretty simple and uses the ZipArchive class:
$zip = new ZipArchive();
$result = $zip->open($filename);
if ($result === true) {
    $zip->extractTo($tmpdir);
    $zip->close();
}
The problem I'm having, though, is that the extraction seems to halt. The first folder is fully extracted, but only about half of the second one is. None of the other five are extracted at all.
I also tried using this code, which breaks it into chunks of 10 kB at a time, but got the exact same result:
$archive = zip_open($filename);
while ($entry = zip_read($archive)) {
    $size = zip_entry_filesize($entry);
    $name = zip_entry_name($entry);
    if (substr($name, -1) == '/') {
        if (!file_exists($tmpdir . $name)) mkdir($tmpdir . $name);
    } else {
        $unzipped = fopen($tmpdir . $name, 'wb');
        while ($size > 0) {
            $chunkSize = ($size > 10240) ? 10240 : $size;
            $size -= $chunkSize;
            $chunk = zip_entry_read($entry, $chunkSize);
            if ($chunk !== false) fwrite($unzipped, $chunk);
        }
        fclose($unzipped);
    }
}
I've also tried increasing the memory limit in PHP from 512 MB to 1024 MB, but again got the same result. Unzipped, everything is around 100 MB, so I wouldn't anticipate it being a memory issue anyway.
Probably it's your max execution time. Disable the limit completely by setting it to 0, or set a generous value:
ini_set('max_execution_time', 10000);
(Don't set it to 0 in production use.)
If you don't have access to ini_set() because of the disable_functions directive, you may have to edit the value in your php.ini directly.
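A hedged alternative sketch: set_time_limit() restarts the timeout counter each time it is called, so resetting it inside the extraction loop gives every entry a fresh budget rather than one global one:
while ($entry = zip_read($archive)) {
    set_time_limit(30); // each entry gets a fresh 30-second budget
    // ... extract the entry in chunks, as in the code above ...
}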
I could use getimagesize() to validate an image, but the problem is: what if a mischievous user puts a link to a 10GB random file? It would whack my production server's bandwidth. How do I limit the file size getimagesize() fetches (e.g. a 5MB max image size)?
PS: I did research before asking.
You can download the file separately, imposing a maximum size you wish to download:
function mygetimagesize($url, $max_size = -1)
{
    // create temporary file to store data from $url
    if (false === ($tmpfname = tempnam(sys_get_temp_dir(), uniqid('mgis')))) {
        return false;
    }
    // open input and output
    if (false === ($in = fopen($url, 'rb')) || false === ($out = fopen($tmpfname, 'wb'))) {
        unlink($tmpfname);
        return false;
    }
    // copy at most $max_size bytes
    stream_copy_to_stream($in, $out, $max_size);
    // close input and output file
    fclose($in);
    fclose($out);
    // retrieve image information
    $info = getimagesize($tmpfname);
    // get rid of temporary file
    unlink($tmpfname);
    return $info;
}
You don't want to do something like getimagesize('http://example.com') to begin with, since this will download the image once, check the size, then discard the downloaded image data. That's a real waste of bandwidth.
So, separate the download process from the checking of the image size. For example, use fopen to open the image URL, read little by little and write it to a temporary file, keeping count of how much you have read. Once you cross 5MB and are still not finished reading, you stop and reject the image.
You could try to read the HTTP Content-Length header before starting the actual download to weed out obviously large files, but you cannot rely on it, since it can be spoofed or omitted.
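For illustration, a sketch of that advisory pre-check, with $url standing in for the image URL (the stream-context argument to get_headers() requires PHP 7.1+; the header can lie, so keep the counted download as the real limit):
$context = stream_context_create(['http' => ['method' => 'HEAD']]); // headers only, no body
$headers = get_headers($url, 1, $context);
if ($headers !== false && isset($headers['Content-Length'])
    && (int) $headers['Content-Length'] > 5 * 1024 * 1024) {
    // the server claims more than 5MB: reject early
}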
Here is an example; you may need to make some changes to fit your requirements. Note that unique_id() and $phpbb_root_path are phpBB helpers; substitute your own temp directory and name generator elsewhere.
function getimagesize_limit($url, $limit)
{
    global $phpbb_root_path; // phpBB-specific root path
    $tmpfilename = tempnam($phpbb_root_path . 'store/', unique_id() . '-');
    $fp = fopen($url, 'r');
    if (!$fp) return false;
    $tmpfile = fopen($tmpfilename, 'w');
    $size = 0;
    while (!feof($fp) && $size < $limit)
    {
        $content = fread($fp, 8192);
        $size += strlen($content); // count bytes actually read; the last chunk may be short
        fwrite($tmpfile, $content);
    }
    fclose($fp);
    fclose($tmpfile);
    $is = getimagesize($tmpfilename);
    unlink($tmpfilename);
    return $is;
}
PHP: how do I get a web image's size in KB?
getimagesize() only gets the width and height,
and filesize() causes a warning:
$imgsize=filesize("http://static.adzerk.net/Advertisers/2564.jpg");
echo $imgsize;
Warning: filesize() [function.filesize]: stat failed for http://static.adzerk.net/Advertisers/2564.jpg
Is there any other way to get a web image size in kb?
Short of doing a complete HTTP request, there is no easy way:
$img = get_headers("http://static.adzerk.net/Advertisers/2564.jpg", 1);
print $img["Content-Length"];
You can, however, likely use cURL to send a lighter HEAD request instead.
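A minimal hedged sketch of that HEAD request (servers may omit or misreport the length, so treat -1 as unknown):
$ch = curl_init("http://static.adzerk.net/Advertisers/2564.jpg");
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the response
curl_exec($ch);
$bytes = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if not reported
curl_close($ch);
if ($bytes > 0) {
    echo round($bytes / 1024, 2) . " KB";
}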
<?php
$file_size = filesize($_SERVER['DOCUMENT_ROOT']."/Advertisers/2564.jpg"); // Get file size in bytes
$file_size = $file_size / 1024; // Get file size in KB
echo $file_size; // Echo file size
?>
Not sure about using filesize() for remote files, but there are good snippets on php.net about using cURL:
http://www.php.net/manual/en/function.filesize.php#92462
That sounds like a permissions or path issue, because filesize() works just fine for local files.
Here is an example:
php > echo filesize("./9832712.jpg");
1433719
Make sure the permissions are set correctly on the image and that the path is also correct. You will need to apply some math to convert from bytes to KB but after doing that you should be in good shape!
Here is a good link regarding filesize()
You cannot use filesize() to retrieve remote file information. It must first be downloaded or determined by another method
Using cURL here is a good method:
Tutorial
You can also use this function:
<?php
$filesize = file_get_size($dir . '/' . $ff);
$filesize = $filesize / 1024; // to convert to KB
echo $filesize;

function file_get_size($file) {
    //open file
    $fh = fopen($file, "r");
    //declare some variables
    $size = "0";
    $char = "";
    //set file pointer to 0; I'm a little bit paranoid, you can remove this
    fseek($fh, 0, SEEK_SET);
    //set multiplicator to zero
    $count = 0;
    while (true) {
        //jump 1 MB forward in file
        fseek($fh, 1048576, SEEK_CUR);
        //check if we actually left the file
        if (($char = fgetc($fh)) !== false) {
            //if not, go on
            $count++;
        } else {
            //else jump back where we were before leaving and exit loop
            fseek($fh, -1048576, SEEK_CUR);
            break;
        }
    }
    //we could make $count jumps, so the file is at least $count * 1.000001 MB large
    //1048577 because we jump 1 MB and fgetc goes 1 B forward too
    $size = bcmul("1048577", $count); // bcmul/bcadd need the bcmath extension
    //now count the last few bytes; they're always less than 1048576 so it's quite fast
    $fine = 0;
    while (false !== ($char = fgetc($fh))) {
        $fine++;
    }
    //and add them
    $size = bcadd($size, $fine);
    fclose($fh);
    return $size;
}
?>
You can get the file size by using the get_headers() function. Use the code below:
$image = get_headers($url, 1);
$bytes = $image["Content-Length"];
$mb = $bytes/(1024 * 1024);
echo number_format($mb,2) . " MB";
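Note that get_headers() issues a full GET request by default; if you want to avoid transferring the body, pass a stream context whose method is HEAD (PHP 7.1+), as sketched earlier.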