Capturing a file POSTed to PHP without writing to disk

Is there a way to have PHP capture a file being uploaded (POSTed) such that PHP can manipulate the file before it is ever written to disk?
Example application: A form where a user can upload a file to my web application, and my application encrypts it with PGP before it ever is written to disk.

Uploaded files are stored as temporary files by PHP until you manipulate them (see http://www.php.net/manual/en/features.file-upload.post-method.php). You can use INI settings to control where the files are stored, but you can't really intercept them before they are written to a temp location.
You can certainly PGP encrypt them before you move the file into its final location.
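For the encrypt-before-moving step, here is a minimal hedged sketch assuming the PECL gnupg extension is installed and the recipient's public key has already been imported (the fingerprint and paths below are placeholders):

```php
<?php
// Encrypt the uploaded temp file with GnuPG before it reaches any
// permanent location. Assumes the PECL gnupg extension; the
// fingerprint and target directory are hypothetical placeholders.
$recipientFingerprint = 'REPLACE_WITH_KEY_FINGERPRINT';

if (is_uploaded_file($_FILES['userfile']['tmp_name'])) {
    $gpg = new gnupg();
    $gpg->addencryptkey($recipientFingerprint);

    // Note: the plaintext still briefly exists as PHP's temp file;
    // we encrypt before moving anything to permanent storage.
    $plaintext  = file_get_contents($_FILES['userfile']['tmp_name']);
    $ciphertext = $gpg->encrypt($plaintext);

    file_put_contents('/var/uploads/' . bin2hex(random_bytes(16)) . '.pgp', $ciphertext);
    unlink($_FILES['userfile']['tmp_name']); // drop the unencrypted temp file early
}
```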

Depending on the library you use on the front end, it should be possible to base64 encode the file and then send it as regular post data to PHP, thus avoiding writing the file to disk. There would be added overhead, of course, and you will end up sending more data over the wire this way. But, it can work (I've done it, though scrapped the idea due to the additional bandwidth usage).
Your backend will need to properly decode the data and handle it from there. I would recommend chunking the data if you do it this way so you can retry failed portions without having to upload the entire file again.
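A hedged server-side sketch of that idea; the field names (chunk, index) and the chunk size are assumptions agreed with a hypothetical client, not a standard API:

```php
<?php
// Receive one base64-encoded chunk as ordinary POST data, so PHP never
// creates an upload temp file for it. CHUNK_SIZE must match the value
// the client used when slicing the file.
define('CHUNK_SIZE', 512 * 1024);

$data = base64_decode($_POST['chunk'], true); // strict mode: reject bad input
if ($data === false) {
    http_response_code(400);
    exit('malformed chunk');
}

// Write the decoded bytes at the chunk's offset; a failed chunk can be
// retried individually by re-sending the same index.
$fh = fopen('/var/staging/' . session_id() . '.part', 'c');
fseek($fh, ((int) $_POST['index']) * CHUNK_SIZE);
fwrite($fh, $data);
fclose($fh);
```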

Related

Process Uploaded file on web server without storing locally first?

I am trying to process a user-uploaded file in real time on the web server, but it seems Apache invokes PHP only once the complete file has been uploaded.
When I uploaded the file using cURL and set
Transfer-Encoding: chunked
I had some success, but I can't do the same thing via a browser.
I used Dropzone.js, but when I tried to set the same header it refused, saying Transfer-Encoding is an unsafe header and would not be set.
This answer explains the issue:
Can't set Transfer-Encoding: "chunked" from a browser
In a nutshell, the problem is: when a user uploads a file to the web server, I want the server to start processing it as soon as the first byte is available.
By "process" I mean piping it to a named pipe.
I don't want 500 MB to be uploaded to the server first and only then processed.
But with the current web server stack (Apache + PHP), I can't seem to accomplish this.
Could someone please explain what technology stack or workarounds would let me upload a large file via the browser and start processing it as soon as the first byte is available?
It is possible to use NodeJS/Multiparty to do that. Multiparty's examples include a direct upload to Amazon S3: the upload form sets the content type to multipart/form-data, and a part-handling function receives each part as a ReadableStream, which allows per-chunk processing of the input via its data event.
The Node.js documentation covers readable streams in more detail.
If you really want that (though I don't think it's a good idea), you could look for a FUSE filesystem that does the job.
Maybe one already exists: https://github.com/libfuse/libfuse/wiki/Filesystems
Otherwise you would have to write your own.
But remember: as soon as the upload completes and the POST script finishes its job, the temp file is deleted.
You can upload the file with an HTML5 resumable-upload tool (like Resumable.js) and process the uploaded parts as soon as they are received; see the sketch below.
Alternatively, as a workaround, you may find the path of the file being uploaded (usually in /tmp) and write a background job that streams it to the third-party app, though that is harder.
There may be other solutions as well.
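A rough server-side sketch of the chunked approach; the parameter names follow Resumable.js defaults, but treat them (and all paths) as assumptions to check against your uploader's configuration:

```php
<?php
// Each chunk arrives as its own small multipart POST, so processing can
// begin long before the last byte of the whole file is sent.
$id    = preg_replace('/[^A-Za-z0-9_-]/', '', $_POST['resumableIdentifier']);
$num   = (int) $_POST['resumableChunkNumber'];   // 1-based chunk index
$total = (int) $_POST['resumableTotalChunks'];

$dir = "/var/staging/$id";
if (!is_dir($dir)) {
    mkdir($dir, 0700, true);
}
move_uploaded_file($_FILES['file']['tmp_name'], "$dir/$num");

// Process each chunk as soon as it lands, e.g. append it to a named
// pipe that a downstream consumer is reading (hypothetical path):
// file_put_contents('/var/run/upload.fifo', file_get_contents("$dir/$num"), FILE_APPEND);

// Chunks may arrive out of order, so count them rather than trusting
// the last-received number to signal completion.
if (count(glob("$dir/*")) === $total) {
    // all chunks received; reassemble or finalize here
}
```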

Are there risks of exploit with uploading a file this way?

I have a simple form in PHP that converts a file from XML to SQL or vice versa. The form has an input where the user can upload a file with an .xml or .sql extension. I do various checks (file extension and file size), read the file via the tmp_name value of the $_FILES superglobal, perform the conversion, and then save the modified file on my server (compressed with gzip) under the file's original client-side name, so I can give the user a link to download it.
My questions are: Are there risks of exploit in the steps I listed above? In general, are there risks of exploit in uploading a file, or in saving the file under its original client-side name?
Thanks.
There are always risks of exploit when allowing users to upload files, so it's good to be worried.
You say "I do some operation for the conversion", so while I cannot comment specifically on the safety of this operation, there could be risks here depending on the operation and the content provided to it (e.g. buffer overruns, invalid data).
I'm assuming you are saving your file with a .gz extension.
Saving the file under the client's filename could pose compatibility problems if you do not clean it at all. Disallowed (or problematic) characters differ between filesystems, such as & on Unix or : on Windows. If you simply save a file under such a name and later try to read it, your code may not "find" it unless you escape or strip those characters properly.
The client filename could also pose a risk if it has, for example, a path embedded in it. A filename such as "../../../../home/user/file" could trick your program into overwriting a file elsewhere, if permissions are badly configured and you are simply concatenating paths. At worst I'd call this an annoyance or a DoS attack, limited to overwriting gzip files and "breaking" them.
The client filename could also let one user overwrite another user's files. I'm not sure what your namespacing is, but a clever attacker could trick another user into downloading their xml/sql file by naming it cleverly.
Also if you could guess someone else's filename, you could guess the resulting URLs and war-dial through them looking for content.
All of these risks go away if you use a nice GUID to name the file, or map it to each user's session (e.g. file1.gz is only valid within that user's session).
I generally don't use client-supplied names, or I seriously validate and clean them before re-presenting them.
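As an illustration of the GUID-naming advice, a minimal sketch (the field name and paths are hypothetical):

```php
<?php
// Never trust the client-supplied name: store under a random
// server-generated name and keep the sanitized original only as
// metadata for presenting the download link later.
$original = basename($_FILES['upload']['name']);         // strips any embedded path
$original = preg_replace('/[^\w.\-]/', '_', $original);  // neutralize odd characters

$stored = bin2hex(random_bytes(16)) . '.gz';             // GUID-style physical name
move_uploaded_file($_FILES['upload']['tmp_name'], '/var/uploads/' . $stored);

// Persist the mapping (stored name -> original name, owning user) in
// the database so downloads can be authorized per user or session.
```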
The main risk of file uploads with PHP (and other interpreted languages, for that matter) is that a user can upload a .php file and, if it's stored inside the web root, execute it.
From your question, it seems you only allow certain non-.php extensions. Make sure you do this check on the server side, not just on the client. Also, if you don't need to serve the uploaded file over the web afterwards, store it outside your web root. If you force the filename and extension after upload (rather than keeping the original), you have even more control over what's going on in your part of the system.
Other file types can also be exploited (images, for example; see https://imagetragick.com/), so it's a good idea to check specifically for the file types you want to be uploadable.

PHP data upload security and optimization [duplicate]

I am allowing users to upload files to my server. What possible security threats do I face and how can I eliminate them?
Let's say I am allowing users to upload images to my server, either from their system or from the net. Now, to check even the size of these images, I have to store them in my /tmp folder. Isn't that risky? How can I minimize the risk?
Also, let's say I am using wget to download the images from the links that users submit in my form. I first have to save those files on my server to check whether they actually are images. And what if a prankster gives me a URL and I end up downloading an entire website full of malware?
First of all, realize that uploading a file means that the user is giving you a lot of data in various formats, and that the user has full control over that data. That's even a concern for a normal form text field, file uploads are the same and a lot more. The first rule is: Don't trust any of it.
What you get from the user with a file upload:
the file data
a file name
a MIME type
These are the three main components of the file upload, and none of it is trustable.
Do not trust the MIME type in $_FILES['file']['type']. It's an entirely arbitrary, user supplied value.
Don't use the file name for anything important. It's an entirely arbitrary, user supplied value. You cannot trust the file extension or the name in general. Do not save the file to the server's hard disk using something like 'dir/' . $_FILES['file']['name']. If the name is '../../../passwd', you're overwriting files in other directories. Always generate a random name yourself to save the file as. If you want you can store the original file name in a database as meta data.
Never let anybody or anything access the file arbitrarily. For example, if an attacker uploads a malicious.php file to your server and you're storing it in the webroot directory of your site, a user can simply go to example.com/uploads/malicious.php to execute that file and run arbitrary PHP code on your server.
Never store arbitrary uploaded files anywhere publicly, always store them somewhere where only your application has access to them.
Only allow specific processes access to the files. If it's supposed to be an image file, only allow a script that reads images and resizes them to access the file directly. If this script has problems reading the file, it's probably not an image file, flag it and/or discard it. The same goes for other file types. If the file is supposed to be downloadable by other users, create a script that serves the file up for download and does nothing else with it.
If you don't know what file type you're dealing with, detect the MIME type of the file yourself and/or try to let a specific process open the file (e.g. let an image resize process try to resize the supposed image). Be careful here as well, if there's a vulnerability in that process, a maliciously crafted file may exploit it which may lead to security breaches (the most common example of such attacks is Adobe's PDF Reader).
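As a sketch of that advice, assuming the fileinfo and GD extensions: ignore the client-supplied type, inspect the bytes yourself, then let a type-specific process make the final call:

```php
<?php
// Detect the MIME type from the file contents, not from
// $_FILES['file']['type'], which the client controls.
$tmp  = $_FILES['file']['tmp_name'];
$mime = (new finfo(FILEINFO_MIME_TYPE))->file($tmp);

$allowed = ['image/jpeg', 'image/png', 'image/gif'];
if (!in_array($mime, $allowed, true)) {
    unlink($tmp);
    exit('rejected');
}

// Letting GD actually parse the file is a second, stronger check:
// a disguised file fails here even if its magic bytes looked plausible.
$img = @imagecreatefromstring(file_get_contents($tmp));
if ($img === false) {
    unlink($tmp);
    exit('not a valid image');
}
imagedestroy($img);
```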
To address your specific questions:
[T]o check even the size of these images I have to store them in my /tmp folder. Isn't it risky?
No. Just storing data in a file in a temp folder is not risky if you're not doing anything with that data. Data is just data, regardless of its contents. It's only risky if you're trying to execute the data or if a program is parsing the data which can be tricked into doing unexpected things by malicious data if the program contains parsing flaws.
Of course, having any sort of malicious data sitting around on the disk is more risky than having no malicious data anywhere. You never know who'll come along and do something with it. So you should validate any uploaded data and discard it as soon as possible if it doesn't pass validation.
What if a prankster gives me a url and I end up downloading an entire website full of malware?
It's up to you what exactly you download. One URL will result at most in one blob of data. If you are parsing that data and are downloading the content of more URLs based on that initial blob that's your problem. Don't do it. But even if you did, well, then you'd have a temp directory full of stuff. Again, this is not dangerous if you're not doing anything dangerous with that stuff.
One simple scenario:
If you use an upload interface with no restrictions on the type of files allowed, an attacker can upload a PHP or .NET file with malicious code that can lead to a server compromise.
Refer to http://www.acunetix.com/websitesecurity/upload-forms-threat.htm, which discusses the common issues.
Also refer to http://php.net/manual/en/features.file-upload.php.
Here are some of the issues:
When a file is uploaded to the server, PHP will set the variable $_FILES['uploadedfile']['type'] to the MIME type provided by the client's web browser. However, file upload validation cannot depend on this value alone: a malicious user can easily upload files using a script or some other automated application that sends HTTP POST requests with a fake MIME type.
It is almost impossible to compile a list of all possible extensions an attacker could use. For example, if the code runs in a hosted environment, such environments usually allow a large number of scripting languages (Perl, Python, Ruby, etc.), and the list can be endless.
A malicious user can easily bypass an extension check by uploading a file called ".htaccess" containing a line similar to: AddType application/x-httpd-php .jpg
There are common rules for avoiding the general issues with file uploads:
Store uploaded files outside your website root folder, so users can neither overwrite your application files nor access the uploads directly (for example, keep uploads in /var/uploads while your app lives in /var/www).
Store the sanitized file name in the database and name the physical file after the file's hash value (this also solves the problem of storing duplicate files, since duplicates have equal hashes).
To avoid filesystem issues when too many files accumulate in the /var/uploads folder, consider storing files in a folder tree, as in the sketch after this list:
file hash = 234wffqwdedqwdcs -> store it in /var/uploads/23/234wffqwdedqwdcs
common rule: /var/uploads/<first 2 hash letters>/<hash>
Install nginx if you haven't already: it serves static files like magic, and its X-Accel-Redirect header lets you serve files only after a custom script has checked permissions.
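A minimal sketch combining the hash-based layout with the X-Accel-Redirect handoff; the /protected/ internal location and all paths are assumptions that must match your nginx configuration:

```php
<?php
// Store the upload under its hash, sharded by the first two hex chars.
$hash = hash_file('sha256', $_FILES['upload']['tmp_name']);
$dir  = '/var/uploads/' . substr($hash, 0, 2);
if (!is_dir($dir)) {
    mkdir($dir, 0700, true);
}
move_uploaded_file($_FILES['upload']['tmp_name'], "$dir/$hash");

// Later, after your script has checked the user's permissions, hand
// the actual file transfer off to nginx instead of streaming via PHP:
header('Content-Disposition: attachment; filename="export.sql.gz"');
header('X-Accel-Redirect: /protected/' . substr($hash, 0, 2) . '/' . $hash);
```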

Serving large file downloads from remote server

We have files that are hosted on RapidShare which we would like to serve through our own website. Basically, when a user requests http://site.com/download.php?file=whatever.txt, the script should stream the file from RapidShare to the user.
The only thing I'm having trouble getting my head around is how to properly stream it. I'd like to use cURL, but I'm not sure if I can read the download from RapidShare in chunks and then echo them to the user. The best way I've thought of so far is a combination of fopen, fread, echoing each chunk to the user, flushing, and repeating that process until the entire file is transferred.
I'm aware of the PHP readfile() function as well, but would that be the best option? Bear in mind that these files can be several GB in size, and although we have servers with 16 GB of RAM, I want to keep memory usage as low as possible.
Thank you for any advice.
HTTP has a header called Range which allows you to fetch an arbitrary chunk of a file (provided you already know the file size), but since PHP isn't multi-threaded, I don't see much benefit in using it here.
As far as I know, if you don't want to consume all your RAM, the only way to go is a two-step approach.
First, stream the remote file using fopen()/fread() (or any PHP function that can operate on streams), splitting the read into small chunks (2048 bytes may be enough), and write/append each chunk to a tempfile(); then echo it back to your user by reading that temporary file.
That way, even a 2 TB file consumes only about 2048 bytes of memory at a time, since only the current chunk and the file handle are held in memory.
You may also write some kind of proxy manager that caches already-downloaded files (keeping them locally for a given time) to avoid repeating the remote read when a file is heavily downloaded.
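For reference, a hedged sketch of the chunked pass-through (the one-step variant, without the intermediate temp file); it assumes allow_url_fopen is enabled and output buffering is off, and the URL is a placeholder:

```php
<?php
// Stream a remote file to the client one small chunk at a time, so
// memory use stays at roughly one chunk regardless of file size.
$src = fopen('http://remote.example/whatever.txt', 'rb');
if ($src === false) {
    http_response_code(502);
    exit;
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="whatever.txt"');

while (!feof($src)) {
    echo fread($src, 8192); // 8 KB per iteration
    flush();                // push the chunk to the client immediately
}
fclose($src);
```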

Mp3 streaming/downloading website - apache server memory issue

I have a website on which users can upload mp3 files (Uploadify), stream them using an HTML5 player (jPlayer), and download them using a PHP script (www.zubrag.com/scripts/).
When a user uploads a song, the path to the audio file is saved in the database, and I use that data to play the song and to show a download link for it.
The problem I'm experiencing is that, according to my host, this method uses a lot of memory on the server, which is dedicated.
Link to script: http://pastebin.com/Vus8SRa7
How should I handle the script properly? What would be the best way to track down the problem? Any ideas on cleaning up the code?
Any help much appreciated.
I would recommend storing your files on disk (named something random [check for collisions!] or sequentially, without a file extension, and outside of the doc root) and only storing metadata in your DB. It's much easier to stream a file from disk this way than out of a database result.
When you retrieve an entire file's contents out of a database result, that data has to be in memory. readfile() doesn't have this issue. Use headers to return the original file name when sending the file back to the client, if you wish.
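A minimal sketch of that pattern, assuming an existing PDO connection $pdo and with illustrative table and column names:

```php
<?php
// Metadata comes from the DB; bytes come straight off disk.
// readfile() streams the file without holding it all in memory.
$stmt = $pdo->prepare('SELECT stored_path, original_name FROM songs WHERE id = ?');
$stmt->execute([$_GET['id']]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: audio/mpeg');
header('Content-Length: ' . filesize($row['stored_path']));
header('Content-Disposition: attachment; filename="' . $row['original_name'] . '"');
readfile($row['stored_path']);
```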
I would suggest not buffering the content when you write the binary data of the MP3 to your HTTP output. That way you'd save a lot of physical and virtual memory.
