How to quickly zip large files in PHP

I wrote a PHP script to dynamically pack files selected by the client into a zip file and force a download. It works well, except that when the number of files is huge (over 50,000, say), it takes a very long time for the download dialog box to appear on the client side.
I thought about improving this with caching (these files are not changed very often), but because the selection of files is entirely decided by the user, and there are tens of thousands of possible combinations, it is very hard to cache them all. I also thought about generating zip archives for individual files first and then combining those zip files on the fly, but I didn't find a way to concatenate zip files in PHP. Another way I can think of is sending (i.e., letting the client read) the zip file at the same time as it is generated, but I don't know whether that is supported either.
Any help would be much appreciated.
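For what it's worth, the last idea is workable: you can pipe the output of the system zip utility straight to the client, so the download starts immediately. A minimal sketch, assuming Info-ZIP's zip binary is installed on the server and $selectedFiles holds the chosen paths (both are assumptions, not from the question):

```php
<?php
// Stream the archive while it is being generated: "-" tells zip to write
// the archive to stdout, and we forward each chunk to the client at once.
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="selection.zip"');

$cmd  = 'zip -q - ' . implode(' ', array_map('escapeshellarg', $selectedFiles));
$proc = popen($cmd, 'r');
while (!feof($proc)) {
    echo fread($proc, 8192);
    flush();                    // push the chunk out instead of buffering it
}
pclose($proc);
```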

To extend Mike Sherov's answer, try using a combination of tar and gzip/zip. Individually pre-compress all the files using gzip/zip; then, when the client makes their selection, you simply tar those files together. That way you still get the benefit of compression and the simplicity of downloading one file, but none of the overhead and delay associated with compressing large file sets in real time.
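A minimal sketch of that idea using PharData (bundled with PHP); the cache layout, paths, and function names here are illustrative assumptions:

```php
<?php
// Gzip each file once up front, then tar the pre-compressed copies on
// demand. Tarring is essentially concatenation, so selections build fast.

function precompress(string $src, string $cacheDir): string {
    $gz = $cacheDir . '/' . basename($src) . '.gz';
    if (!is_file($gz)) {
        // Compress once; reused for every future selection.
        file_put_contents($gz, gzencode(file_get_contents($src), 6));
    }
    return $gz;
}

function buildSelectionTar(array $sources, string $cacheDir, string $tarPath): void {
    $tar = new PharData($tarPath);          // ext-phar, enabled by default
    foreach ($sources as $src) {
        $gz = precompress($src, $cacheDir);
        $tar->addFile($gz, basename($gz));  // archiving only, no recompression
    }
}
```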

While not a silver bullet, you can try tar'ing the files instead. The resulting file is larger, but compression time is much shorter. See here for more info: http://birdhouse.org/blog/2010/03/08/zip-vs-tar-gzip/

Check out mod_zip for Nginx:
https://github.com/evanmiller/mod_zip
It streams a ZIP file to the client dynamically and can include very large (2GB+) files while using very little RAM.
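With mod_zip, the PHP side only emits a manifest; nginx fetches the listed files and assembles the archive itself. A hedged sketch of such a backend response, where the /protected/ location, sizes, and names are illustrative:

```php
<?php
// mod_zip manifest: one entry per line in the form
//   <crc-32 or "-"> <size in bytes> <internal URI> <name inside the ZIP>
header('X-Archive-Files: zip');
header('Content-Disposition: attachment; filename="selection.zip"');

$files = [
    [52891, '/protected/reports/a.pdf', 'reports/a.pdf'],
    [48211, '/protected/reports/b.pdf', 'reports/b.pdf'],
];
foreach ($files as [$size, $uri, $name]) {
    echo "- $size $uri $name\n";   // "-" lets nginx skip the CRC check
}
```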


Create a PDF file on the fly and stream it while it is not yet finished?

We want to merge a lot of PDF files into one big file and send it to the client. However, the resources on our production server are very restricted, so merging all files in memory first and then sending the finished PDF file results in our script being killed because it exhausts its available memory.
The only solution (besides getting a better server, obviously) would be starting to stream the PDF file before it is fully created to bypass the memory limit.
However, I wonder if that is even possible. Can PDF files be streamed before they're fully created? Or does the PDF format not allow streaming of unfinished files, because some headers or similar have to be written only after the full contents are known?
If it is possible, which PDF library supports creating a file as a stream? Most libraries that I know of (like TCPDF) seem to create the full file in memory and only then output the finished result somewhere (e.g., via the $tcpdf->Output() method).
The PDF file format is entirely streamable; there's certainly nothing in the format that prevents it.
As an example, we recently had a customer that required reading a single page over an HTTP connection from a remote PDF, without downloading or reading the whole PDF. We were able to do this by making many small HTTP requests for specific content within the PDF, using the trailer at the end of the file and the cross-reference table to find the required content without having to parse the whole document.
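As a rough illustration of that trick, here is a hedged PHP sketch that fetches only the tail of a remote PDF with an HTTP range request and locates the cross-reference table via the startxref keyword (the URL is illustrative):

```php
<?php
// The trailer and the "startxref" pointer live at the end of a PDF, so
// grabbing the last kilobyte is enough to find the cross-reference table.
$ch = curl_init('https://example.com/big.pdf');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_RANGE          => '-1024',   // suffix range: last 1024 bytes
]);
$tail = curl_exec($ch);
curl_close($ch);

if (preg_match('/startxref\s+(\d+)/', $tail, $m)) {
    echo "xref table begins at byte offset {$m[1]}\n";
}
```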
If I understand your problem correctly, it sounds like the library you're currently using loads each PDF into memory before creating or streaming out the merged document.
Looking at the problem a different way, the better solution would be for the PDF library to take only references to the PDFs to be merged, and then, while the merged PDF is being created or streamed, pull in the content and resources from the source PDFs as and when required.
I'm not sure how many PHP libraries there are that can do this as I'm not too up-to-date with PHP, but I know there are probably a few C/C++ libraries that may be able to do this. I understand PHP can use extensions to call these libraries. Only downside is that they'll likely have commercial licenses.
Disclaimer: I work for the Mako SDK R&D group, hence why I know for sure there are some libraries which will do this. :)

Is PHP good for large file uploads such as videos?

Hi, I wanted to know whether uploading large files such as videos (200 MB to 1 GB) through PHP is a good option, after setting up the server configuration (post_max_size, execution time, etc.). The reason I ask is that I read somewhere that when a large file is uploaded, best practice is to break it into chunks and upload those (I think YouTube does that). Do I need to use another language like Python or C++ for uploading large files, or is PHP enough? If I need another language, could anyone point me to reading material for it?
Thank you.
PHP buffers the entire upload before your script even runs (by default it is spooled to a temporary file, and the whole request body has to arrive first), so if you are uploading five 1 GB files in parallel, that is 5 GB+ of data tied up on the server until the transfers finish.
This can be done in PHP, and I have done this using a chunking method. There are several SO questions on this topic:
File uploads; How to utilize “chunking”?
Upload 1GB files using chunking in PHP
But my personal preference is to use Plupload. It is a very complete cross-platform (JS, Flash, Silverlight) upload script with a nice PHP code sample to handle chunking.
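For reference, a minimal sketch of the receiving endpoint, using Plupload's default chunk/chunks/name request fields (the uploads directory is an illustrative assumption):

```php
<?php
// Append each incoming chunk to a partial file on disk; only one chunk
// is ever in flight, so memory use stays small regardless of file size.
$chunk  = isset($_REQUEST['chunk'])  ? (int)$_REQUEST['chunk']  : 0;
$chunks = isset($_REQUEST['chunks']) ? (int)$_REQUEST['chunks'] : 1;
$name   = basename($_REQUEST['name'] ?? $_FILES['file']['name']);

$partial = __DIR__ . '/uploads/' . $name . '.part';

$out = fopen($partial, $chunk === 0 ? 'wb' : 'ab');
$in  = fopen($_FILES['file']['tmp_name'], 'rb');
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

if ($chunk === $chunks - 1) {
    rename($partial, __DIR__ . '/uploads/' . $name);   // last chunk: done
}
```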
It's not only PHP that needs to be considered for large file uploads; your web server must also support them (in nginx, for instance, via the client_max_body_size directive). I don't know how httpd handles this, but as you said, splitting into chunks is a viable solution. FTP is another option.

Is it dangerous to zip/unzip unknown uploaded files on a server, when they may contain anything, even viruses?

I am trying to implement a user system (PHP, Apache) where the user can upload several files and download a zipped version of them (or upload a zipped file and download the uncompressed files).
Question: is there any risk in zipping/unzipping those unknown files?
In other words, do Unix/PHP zip/unzip operations treat files as inert data, or can some execution occur?
This question is relevant to all compression methods; ZIP is just an example.
EDIT: @Alex Brown and @Parallelis described two risks, so it is obviously not safe.
Could anyone post a short explanation of how to implement safe compression/decompression of unknown files?
This cannot be done safely as-is, because of several issues: what if those files are bootstrap scripts, for example? (See Alex's and Parallelis's comments for two more issues.)
Solutions:
If you are going to store the uploaded ZIP files as ZIP files, you will face additional issues, since archives can contain lots of files that may or may not be appropriate. In that case you may want to extract a listing of the archive's contents to include alongside it automatically, so people downloading it will know whether the contents are legitimate. You could also integrate something like ClamAV to scan all uploaded files.
Note: Google does the same thing; it uses its own antivirus scanners (which, of course, are not available for public use).
You can also place the file in a temporary directory first and then use zip_open on it in that location. There you will be able to use OS-level commands (which come with their own risks) to verify the integrity of the file without actually unzipping it. You can also refer to this tool for the same purpose.
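For example, with the ZipArchive API (the newer counterpart of zip_open) you can list and vet the entries without extracting anything. A minimal sketch, where $uploadedTmpPath is an illustrative name:

```php
<?php
// Inspect an archive's directory without writing any entry to disk,
// rejecting path traversal and absolute paths up front.
$zip = new ZipArchive();
if ($zip->open($uploadedTmpPath) !== true) {
    die('Not a valid ZIP file');
}
for ($i = 0; $i < $zip->numFiles; $i++) {
    $stat = $zip->statIndex($i);
    $name = $stat['name'];
    if (strpos($name, '..') !== false || ($name[0] ?? '') === '/') {
        die("Rejected suspicious entry: $name");
    }
    printf("%s (%d bytes compressed, %d uncompressed)\n",
           $name, $stat['comp_size'], $stat['size']);
}
$zip->close();
```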
There are several potential issues:
Zip bomb: this is generally not that much of an issue any more, because most decompression tools and languages will prevent deeply nested levels of decompression.
Relative paths: this, to my mind, is your biggest concern. The zip is decompressed, but it includes a file such as ../../../../../../vendor/autoload.php; this then overwrites your autoload.php file and is executed whenever someone views your website. Game over.
A zillion inodes: a zip file may contain millions of zero-byte files, using up all the available inodes on the system. That would stop the filesystem creating any new files on that partition, which could be medium-bad. A crude guard against the first and third issues is sketched below.
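A hedged sketch of such a guard, checking the archive's claimed sizes before any extraction happens ($zipPath and the thresholds are illustrative, and note these sizes are only what the archive itself claims):

```php
<?php
// Refuse archives that claim to expand absurdly or that contain huge
// numbers of entries, before extracting anything.
$zip = new ZipArchive();
if ($zip->open($zipPath) !== true) {
    die('Not a valid ZIP file');
}
if ($zip->numFiles > 10000) {
    die('Too many entries');                  // inode-exhaustion guard
}
$claimed = 0;
for ($i = 0; $i < $zip->numFiles; $i++) {
    $claimed += $zip->statIndex($i)['size'];  // claimed uncompressed size
}
if ($claimed > 500 * 1024 * 1024) {
    die('Archive expands too large');         // zip-bomb guard
}
```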
You should also know that ZIP archives can contain symlinks. If the user can read files after unzipping, it is possible to read arbitrary files on your filesystem.
The zip utility has a --symlinks option for storing symlinks.
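A hedged sketch for detecting symlink entries before extraction; Unix-created archives keep the file mode in the high 16 bits of the external attributes, where 0120000 is S_IFLNK:

```php
<?php
// Scan the archive's entries and refuse any that are stored as symlinks.
function rejectSymlinks(string $zipPath): void {
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException('Cannot open archive');
    }
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $zip->getExternalAttributesIndex($i, $opsys, $attr);
        if ($opsys === ZipArchive::OPSYS_UNIX
            && (($attr >> 16) & 0170000) === 0120000) {   // S_IFLNK
            throw new RuntimeException('Symlink entry: ' . $zip->getNameIndex($i));
        }
    }
    $zip->close();
}
```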

What are the difficulties/issues to consider when allowing ZIP file uploads?

I allow PDF files to be uploaded to my site (PHP).
I would like to offer the ability to also allow .zip files which contain PDF files in directories, so it is easier for users to simply zip a directory and upload one file instead of uploading multiple files individually.
For those of you who offer a .zip file upload feature to your (PHP) website, what are the technical, security, and other issues you have faced?
Be careful how you unpack the zip; you could find yourself consuming far more resources than you expected. Perhaps setting some setrlimit(2) resource limits before unpacking would be wise.
The unzip(1) utility has several nice safety features built in. The -^ command-line option turns off control-character filtering, so make sure you don't touch that :), and the -: option allows stupid pathnames like ../../../../etc/passwd. Make sure you're on at least version 5.50, so that those stupid pathnames are forbidden by default. (And don't use that option either. I mention the options just so you can more easily find the documentation for them. :)
If you use an API, make sure it has options to prevent both kinds of silly filenames.
Assuming the .zip gets unpacked eventually, you would have to make sure the directory it is unpacked into is unreachable by clients' browsers (with .htaccess or by placing it outside the web root directory), and even then I'd still monitor the contents of the unpacked .zip to make sure they don't contain anything that might prove harmful (PHP or other files run by the server, HTML spoofs).
Another issue, I guess, is the upload_max_filesize set in php.ini; make sure it can be set big enough to suit your purposes before you start coding.
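A quick way to sanity-check those limits from PHP itself; note that post_max_size must exceed upload_max_filesize, or PHP silently discards the whole request body:

```php
<?php
// Print the settings that gate large uploads; tune them in php.ini or
// the server config before writing any upload-handling code.
foreach (['upload_max_filesize', 'post_max_size', 'max_execution_time'] as $key) {
    printf("%s = %s\n", $key, ini_get($key));
}
```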
edit: also read sarnold's answer ;)
AFAIK, PHP can handle ZIP files pretty efficiently. The main difficulty I can think of is in accessing a file: you need to extract the ZIP first and only then retrieve the file you actually need. For that reason, extracting a ZIP might consume an additional amount of server time, depending on the size of the archive itself.
During the upload itself, however, I don't think there are any difficulties or issues specific to ZIP files.

Upload merged Images and then break apart with PHP

Looking for a way to enable someone to upload a single file which is a series of image files (all GIFs) merged together as one big file. Here is what I need to do:
Using VB6, merge the image files (potentially dozens of them) into a single file
Upload file to a PHP Script (easy enough)
Have PHP break apart the single file and write image files
I know how to handle the uploading of the file. I also know how to write the image files in PHP. What I am unsure of is the merging/un-merging operation.
In theory, I should just be able to use VB6 to merge all images using binary read/writing. However, does anyone know the series of binary codes that prefix each .gif file so PHP can pick up on that, or do I need to write some sort of binary separator in between each merged image?
I could surely tinker with this myself, but I thought some of you smarter-than-me coders may have already done this, and/or could provide a link, some code, or some 'things to consider'.
Thanks.
Instead of merging and un-merging, if the whole purpose is to avoid the overhead of sending dozens of files, why not zip them and unzip them in PHP?
That should be far easier than the merging operation you're proposing.
Here's a free Zip/Unzip library for Windows: Info-ZIP
Here's some sample code that uses Info-ZIP: Zip and Unzip Using VB5 or VB6
Here's PHP's documentation on the ZIP module: php.net/zip
Here's an example of how to use "unzip" command through PHP, rather than using the Zip module: Zipping and Unzipping Files with PHP
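On the PHP side, the unpacking step can be as small as this sketch with the bundled ZipArchive class (the form field name and target directory are illustrative assumptions):

```php
<?php
// Open the uploaded archive of GIFs and extract it into a working folder.
$zip = new ZipArchive();
if ($zip->open($_FILES['images']['tmp_name']) === true) {
    $zip->extractTo(__DIR__ . '/incoming/');
    $zip->close();
}
```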
Google is your friend :)
