I'm programming a tool that gathers images uploaded by a user into a zip archive. For this I came across ZipArchiveAdapter from Flysystem, which seems to do a good job.
I'm running into the memory limit when the number of files in the zip archive goes into the thousands.
Once a user has more than about 1,000 images, the process usually fails because the available memory is exhausted. To get to the point where it handles most users with fewer than 1,000 images I've already increased the memory limit to 4GB, but increasing it beyond that is not really an option.
Simplified code at this point:
<?php

use League\Flysystem\Filesystem;
use League\Flysystem\ZipArchive\ZipArchiveAdapter;
use League\Flysystem\Memory\MemoryAdapter;

class User
{
    // ... Other user code

    public function createZipFile()
    {
        $tmpFile = tempnam('/tmp', 'zippedimages_');
        $download = new Filesystem(new ZipArchiveAdapter($tmpFile));

        if ($this->getImageCount()) {
            foreach ($this->getImages() as $image) {
                $path_in_zip = "My Images/{$image->category->title}/{$image->id}_{$image->image->filename}";
                $download->write($path_in_zip, $image->image->getData());
            }
        }

        $download->getAdapter()->getArchive()->close();

        return $tmpFile;
        // Upload zip to s3-storage
    }
}
So my questions:
a) Is there a way to have Flysystem write the zip file to disk "on the go"? Currently it stores the entire zip in memory before writing to disk when the object is destroyed. (One possible workaround is sketched after these questions.)
b) Should I utilize another library that would be better for this?
c) Should I take another approach here? For example having the user download multiple smaller zips instead of one large zip. (Ideally I want them to download just one file regardless)
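One workaround for (a) worth sketching: bypass the Flysystem adapter and use PHP's native ZipArchive directly, adding files by path with addFile(). addFile() only records the path, so the image data is read from disk when close() writes the archive instead of accumulating in memory. A minimal sketch, assuming each image's data can be spooled to a temp file first (getImages() and getData() are the accessors from the code above):

public function createZipFileStreaming()
{
    $tmpFile = tempnam(sys_get_temp_dir(), 'zippedimages_');

    $zip = new ZipArchive();
    $zip->open($tmpFile, ZipArchive::CREATE | ZipArchive::OVERWRITE);

    $tmpImages = array();
    foreach ($this->getImages() as $image) {
        $pathInZip = "My Images/{$image->category->title}/{$image->id}_{$image->image->filename}";

        // Spool the image to its own temp file; only one image's data
        // is held in memory at a time
        $tmpImage = tempnam(sys_get_temp_dir(), 'img_');
        file_put_contents($tmpImage, $image->image->getData());
        $tmpImages[] = $tmpImage;

        $zip->addFile($tmpImage, $pathInZip);
    }

    $zip->close(); // the archive is written to disk here

    // The temp files must outlive close(), since that's when they're read
    foreach ($tmpImages as $tmpImage) {
        unlink($tmpImage);
    }

    return $tmpFile;
}

One caveat: depending on the libzip version, a very large number of addFile() calls can run into the OS open-file limit at close(), so batching into a few archives may still be necessary.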
Related
I want to let users download large amounts of database data as Excel.
But due to the sheer volume of data, downloading it in one go feels impossible: it puts a lot of load on the server, takes a lot of processing time, and keeps crashing.
Now I want to create multiple temporary Excel files "on the go", each holding a limited amount of data (e.g. 50,000 rows), and at the end offer all these temporary files as a single zip download.
That way it doesn't load up the server and keeps it from crashing.
Is this achievable with PHP/CodeIgniter? Can anybody guide me?
You can do this in several ways:
- zip the generated files into a folder, as you had already thought
- increase the execution time and memory limits in CI or on the server
- implement a queue service
For the zipping you can do it the following way:
<?php
// Directory to zip and name of the resulting archive
$pathdir = "DirectoryName/";
$zipcreated = "NameOfZip.zip";

$zip = new ZipArchive();
if ($zip->open($zipcreated, ZipArchive::CREATE) === TRUE) {
    $dir = opendir($pathdir);
    // readdir() returns false when done; compare strictly so a file
    // named "0" doesn't end the loop early
    while (false !== ($file = readdir($dir))) {
        if (is_file($pathdir . $file)) {
            $zip->addFile($pathdir . $file, $file);
        }
    }
    closedir($dir);
    $zip->close();
}
?>
For the option of increasing limits:
You can use the Excel5 writer in PHPExcel. It can handle up to 65,536 rows. For that:
ini_set('max_execution_time', '300'); // seconds; see the manual
ini_set('memory_limit', '1024M');     // your memory limit as a string
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
After that, follow the PHPExcel documentation.
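To tie this to the 50,000-rows-per-file idea from the question, a rough sketch that writes one Excel5 file per chunk and zips the parts at the end (getRows() is a hypothetical data accessor; fromArray() and disconnectWorksheets() are regular PHPExcel API):

$chunkSize = 50000;
$offset    = 0;
$part      = 1;
$files     = array();

while ($rows = getRows($offset, $chunkSize)) { // hypothetical: returns up to $chunkSize rows
    $objPHPExcel = new PHPExcel();
    $sheet = $objPHPExcel->getActiveSheet();

    foreach ($rows as $i => $row) {
        $sheet->fromArray($row, null, 'A' . ($i + 1)); // one row per array
    }

    $file = "export_part{$part}.xls";
    $objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
    $objWriter->save($file);
    $files[] = $file;

    // Free the workbook before the next chunk to keep memory flat
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel, $objWriter);

    $offset += $chunkSize;
    $part++;
}

// Bundle all the parts into a single zip for download
$zip = new ZipArchive();
$zip->open('export.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE);
foreach ($files as $file) {
    $zip->addFile($file, basename($file));
}
$zip->close();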
And for the queue option:
This is a slightly more complex process.
You can implement a queue service and give it the responsibility of generating the Excel files. After generation, you can notify the user or return the download URL of the file.
For notifications, you will need to implement a notification service as well.
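A minimal sketch of that hand-off in CodeIgniter terms (the export_jobs table, its columns, and generate_export_zip() are all made up for illustration):

// Controller: enqueue the export instead of generating it in-request
$this->db->insert('export_jobs', array(
    'user_id' => $userId,
    'status'  => 'pending',
));
echo "Your export is being prepared; you will be notified when it's ready.";

// Worker (run from cron / CLI): pick up a pending job, generate the
// Excel chunks and zip as shown above, then record the download URL
$job = $this->db->where('status', 'pending')->get('export_jobs')->row();
if ($job) {
    $zipPath = generate_export_zip($job->user_id); // hypothetical helper
    $this->db->where('id', $job->id)->update('export_jobs', array(
        'status'       => 'done',
        'download_url' => $zipPath,
    ));
    // notify the user here (mail, in-app notification, ...)
}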
I'm using the AWS PHP SDK, and I need to avoid a memory leak when I fetch many files from S3.
I want to set a limit: if a bucket has more than 50k files, I want to throw an exception. Does S3 have functionality to get the count of files in a bucket/prefix before I fetch all of them?
My current solution looks like this, but it's bad:
$documents = $driver->client
    ->getPaginator('ListObjects', $arguments)
    ->search('Contents[].Key');

if (iterator_count($documents) > $limit) { // but this way all docs end up in memory
    throw new Exception("We exceeded the limit");
}
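One way to avoid pulling everything into memory: iterate the paginator page by page and bail out as soon as the running count passes the limit. A sketch using ListObjectsV2, whose responses include a per-page KeyCount:

$count = 0;
$pages = $driver->client->getPaginator('ListObjectsV2', $arguments);

foreach ($pages as $page) {
    // KeyCount counts only this page's keys (at most 1000), so memory
    // stays bounded by a single page at a time
    $count += $page['KeyCount'];
    if ($count > $limit) {
        throw new Exception("We exceeded the limit");
    }
}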
Not so much a coding problem here, but a general question relating to security.
I'm currently working on a project that allows user-submitted content.
A key part of this content is that the user uploads a zip file.
The zip file should contain only mp3 files.
I then unzip those files to a directory on the server, so that we can stream the audio on the website for users to listen to.
My concern is that this opens us up to some potentially damaging zip files.
I've read about 'zip bombs' in the past, and obviously don't want a malicious zip file causing damage.
So, is there a safe way of doing this?
Can I scan the zip file without unzipping it first, and if it contains anything other than MP3s, delete it or flag a warning to the admin?
If it makes a difference, I'm developing the site on WordPress.
I currently use the built-in upload features of WordPress to let the user upload the zip file to our server (I'm not sure if WordPress already has any form of security to scan the zip file?)
Code to extract only MP3s from the zip and ignore everything else:
$zip = new ZipArchive();
$filename = 'newzip.zip';

if ($zip->open($filename) !== TRUE) {
    exit("cannot open <$filename>\n");
}

for ($i = 0; $i < $zip->numFiles; $i++) {
    $info = $zip->statIndex($i);
    $file = pathinfo($info['name']);
    // Only write out entries that actually have a .mp3 extension
    if (isset($file['extension']) && strtolower($file['extension']) == "mp3") {
        file_put_contents(basename($info['name']), $zip->getFromIndex($i));
    }
}
$zip->close();
I would use something like id3_get_version (http://www.php.net/manual/en/function.id3-get-version.php) to ensure the contents of the file are mp3 too.
Is there a reason they need to ZIP the MP3s? Unless there are a lot of text frames in the ID3v2 info in the MP3s, the file size will actually increase with the ZIP due to storage of the dictionary.
As far as I know, there isn't any way to scan a ZIP without actually parsing it. The data are opaque until you run each bit through the Huffman dictionary. And how would you determine what file is an MP3? By file extension? By frames? MP3 encoders have a loose standard (decoders have a more stringent spec) which makes it difficult to scan the file structure without false negatives.
Here are some ZIP security risks:
Comment data that causes buffer overflows. Solution: remove comment data.
ZIPs that are small in compressed size but inflate to fill the filesystem (classic ZIP bomb). Solution: check inflated size before inflating; check dictionary to ensure it has many entries, and that the compressed data isn't all 1's.
Nested ZIPs (related to #2). Solution: stop when an entry in the ZIP archive is itself ZIP data. You can determine this by checking for the central directory's marker, the number 0x02014b50 (hex, always little-endian in ZIP - http://en.wikipedia.org/wiki/Zip_%28file_format%29#Structure).
Nested directory structures, intended to exceed the filesystem's limit and hang the deflating process. Solution: don't unzip directories.
So, either do a lot of scrubbing and integrity checks, or at the very least use PHP to scan the archive; check each file for its MP3-ness (however you do that - extension and the presence of MP3 headers? You can't rely on them being at byte 0, though. http://en.wikipedia.org/wiki/MP3#File_structure) and deflated file size (http://www.php.net/manual/en/function.zip-entry-filesize.php). Bail out if an inflated file is too big, or if there are any non-MP3s present.
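Putting those checks together, a sketch using ZipArchive::statIndex(), which reads names and inflated sizes from the central directory without inflating anything ($filename and the 100 MB cap are placeholders):

$maxTotal = 100 * 1024 * 1024; // total inflated bytes we're willing to accept

$zip = new ZipArchive();
if ($zip->open($filename) !== TRUE) {
    exit("cannot open <$filename>\n");
}

$total = 0;
for ($i = 0; $i < $zip->numFiles; $i++) {
    $stat = $zip->statIndex($i); // includes 'name' and the inflated 'size'
    $ext  = strtolower(pathinfo($stat['name'], PATHINFO_EXTENSION));

    if ($ext !== 'mp3') {
        $zip->close();
        exit("non-MP3 entry found: {$stat['name']}\n");
    }

    $total += $stat['size'];
    if ($total > $maxTotal) {
        $zip->close();
        exit("archive inflates past the limit\n");
    }
}
$zip->close();
// only now is it reasonably safe to extract the entries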
Use the following code to list the file names inside a .zip archive:
$zip = zip_open('test.zip');
while ($entry = zip_read($zip)) {
    $file_name = zip_entry_name($entry);
    $ext = pathinfo($file_name, PATHINFO_EXTENSION);
    if (strtoupper($ext) !== 'MP3') {
        notify_admin($file_name);
    }
}
zip_close($zip);
Note that the preceding code only looks at the extension, meaning a user can upload anything that has an .mp3 extension. To really check whether a file is an mp3 you'll have to unpack it; I'd advise doing that in a temporary directory.
Once the file is unpacked you can analyze it using, for example, ffmpeg or whatever. Having detailed data about the bitrate, track length, etc. will be interesting in any case.
If the analysis fails you can flag the file.
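For the ffmpeg-based analysis, a sketch of what the per-file check could look like once the archive is unpacked into a temp directory (ffprobe ships with ffmpeg; which formats you accept is up to you):

function looks_like_mp3($path)
{
    $cmd = 'ffprobe -v error -show_entries format=format_name '
         . '-of default=noprint_wrappers=1:nokey=1 ' . escapeshellarg($path);
    exec($cmd, $output, $exitCode);

    // ffprobe exits non-zero if it can't parse the file at all
    if ($exitCode !== 0) {
        return false;
    }
    return trim(implode('', $output)) === 'mp3';
}

if (!looks_like_mp3('/tmp/unzipped/track.mp3')) {
    // flag the file / notify the admin
}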
I am implementing a service where I have to extract a zip file uploaded by a user.
In order to avoid overflowing the disk, I have to limit BOTH the zip file size AND the size of the unzipped files.
Is there any way to check the unzipped files' size BEFORE unzipping (for security reasons)?
I am using Unix, called from a PHP script.
Since you're working in PHP, use its ZipArchive library.
$zip = zip_open($file);
$extracted_size = 0;

while (($zip_entry = zip_read($zip))) {
    // zip_entry_filesize() reports the uncompressed size from the
    // archive's metadata, so nothing is inflated here
    $extracted_size += zip_entry_filesize($zip_entry);
    if ($extracted_size > $max_extracted_size) {
        // abort
    }
}

// do the actual unzipping
You might want to put a limit on the number of files as well, or add a constant amount per file, to take into account the size of the metadata for each file. While you can't easily get a precise figure for that, adding a few hundred bytes to a couple of kilobytes per file is a reasonable estimate.
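In the loop above, that could look like this (the 1 KB per-entry allowance and the file cap are guesses, not measured figures):

$per_entry_overhead = 1024; // rough allowance for per-file metadata
$max_files = 1000;
$file_count = 0;

while (($zip_entry = zip_read($zip))) {
    $extracted_size += zip_entry_filesize($zip_entry) + $per_entry_overhead;
    if ($extracted_size > $max_extracted_size || ++$file_count > $max_files) {
        // abort
    }
}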
Before attempting to resize an image in PHP using libGD, I'd like to check if there's enough memory available to do the operation, because an "out of memory" completely kills the PHP process and can't be caught.
My idea was that I'd need 4 bytes of memory for each pixel (RGBA) in the original and in the new image:
// check available memory
if (!is_mem_available(($from_w * $from_h * 4) + ($to_w * $to_h * 4))) {
    return false;
}
Tests showed that this is much more memory than the library really seems to use. Can anyone suggest a better method?
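For reference, is_mem_available() isn't a PHP built-in; a sketch of what such a helper might look like, comparing the requested amount against memory_limit minus current usage:

// Hypothetical helper: true if roughly $bytes more can be allocated
function is_mem_available($bytes)
{
    $limit = ini_get('memory_limit');
    if ($limit == -1) {
        return true; // no limit configured
    }

    // Convert shorthand notation like "128M" to bytes
    $units  = array('K' => 1024, 'M' => 1048576, 'G' => 1073741824);
    $suffix = strtoupper(substr($limit, -1));
    $limit  = isset($units[$suffix]) ? (int) $limit * $units[$suffix] : (int) $limit;

    return ($limit - memory_get_usage(true)) > $bytes;
}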
I imagine it must be possible to find out GD's peak memory usage by analyzing imagecopyresampled's source code, but this may be hard, require extended profiling, vary from version to version, and be generally unreliable.
Depending on your situation, a different approach comes to mind: When resizing an image, call another PHP script on the same server, but using http:
$file = urlencode("/path/to/file");
$result = file_get_contents("http://example.com/dir/canary.php?file=$file&width=1000&height=2000");
(sanitizing the file parameter, obviously)
If that script fails with an "out of memory" error, you'll know the image is too large.
If it successfully manages to resize the image, it could return the path to a temporary file containing the resize result. Things would go ahead normally from there.
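A sketch of what canary.php might look like (parameter validation stripped down; only JPEG handled, and all names here are illustrative):

<?php
// canary.php: performs the risky resize in its own request, so an
// out-of-memory fatal only kills this process, not the caller.
$file   = $_GET['file'];           // validate/sanitize this in real code!
$width  = (int) $_GET['width'];
$height = (int) $_GET['height'];

$src = imagecreatefromjpeg($file); // may fatal on a huge image
$dst = imagecreatetruecolor($width, $height);
imagecopyresampled($dst, $src, 0, 0, 0, 0,
    $width, $height, imagesx($src), imagesy($src));

$tmp = tempnam(sys_get_temp_dir(), 'resized_');
imagejpeg($dst, $tmp);

echo $tmp; // the caller reads this path via file_get_contents()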