Memory leak at .csv download - php

Is it possible that generating a rather large (20 MB) .csv download creates a memory leak if the user stops the download/export before the file has been saved on their machine?
If so, how would you catch and counter this problem?

It's possible, but I would imagine it would get cleared up eventually. Either way, HTTP daemons are generally a lot more efficient at serving files than a server-side language.
If you're worried, save the file (I assume we're talking about a dynamically generated file) to the filesystem (somewhere where the server can see it) and redirect the user to that URL.
For security (albeit through obscurity), make the filename something hideous (e.g. a hash of their username and a description of the file) and make sure people can't get a directory listing of the directory it lives in. It might make sense to date-tag the file (e.g. filename-year-month-day.ext) so you can run something automatic to clean up the files after 24 hours.
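A minimal sketch of that naming scheme, assuming the export lands in a web-visible exports directory and $username is already known (the path and the variable are placeholders, not from the question):

<?php
// Obscure name: hash of the username plus a description of the file,
// date-tagged so a cron job can delete anything older than a day.
$name = hash('sha256', $username . '|csv-export') . '-' . date('Y-m-d') . '.csv';
$path = '/var/www/example/public/exports/' . $name;   // placeholder directory

// ... write the generated CSV to $path ...

// Hand the user a plain URL; the HTTP daemon serves the file from here on.
header('Location: /exports/' . $name);
exit;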

If you are generating the file on the fly and streaming it to the user, you may want to look at
http://php.net/manual/en/features.connection-handling.php and perform some cleanup if the connection gets aborted or times out.
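A minimal sketch of that kind of cleanup, assuming the CSV is first built into a temporary file and then streamed (the filenames are placeholders):

<?php
// Keep running after the client disconnects so the cleanup code is reached.
ignore_user_abort(true);

$tmpFile = tempnam(sys_get_temp_dir(), 'csv_');

// Runs on normal completion, on abort and on timeout.
register_shutdown_function(function () use ($tmpFile) {
    if (is_file($tmpFile)) {
        unlink($tmpFile);
    }
});

// ... build the CSV into $tmpFile here ...

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

$in = fopen($tmpFile, 'rb');
while (!feof($in)) {
    echo fread($in, 8192);
    flush();
    if (connection_aborted()) {   // user hit cancel: stop streaming,
        break;                    // the shutdown function still cleans up
    }
}
fclose($in);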


Why are PHP uploaded files stored in a temporary directory first?

In all the tutorials I've read, the file upload form is first submitted and then the file in the temporary directory is evaluated to decide whether it should be moved to the permanent directory. My question is: what's the point of that?
If the file was uploaded by an attacker and it's executable, couldn't it harm the system before it's evaluated and deleted?
Other than that, does the user have to wait for the file to be uploaded just to find out that it can't be accepted because it doesn't have the expected format?
I guess it's better to use some kind of client-side check for that, but I'm asking since no one brings that up as an option.
(If it is better to evaluate the file using client-side code, how is it done?)
Thanks in advance.
First of all (really important): always validate user input on the server side. You may additionally validate it on the client side (for usability reasons).
As you already assumed, it's for security reasons.
You don't really want insecure files (e.g. PHP code or other bad stuff) in your 'production user files directory', as this could have other side effects (like maybe already affecting website statistics being displayed, etc.).
So before you move uploaded user files to their final directory, you should definitely make sure each file is okay according to your business rules (e.g. no executable code, maximum file size, etc.) and only then move it. The temporary upload directory should always be non-public, i.e. outside the document root, so the web server never serves files from it directly.
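A minimal sketch of that validate-then-move flow, assuming a form field named receipt, a 2 MB limit and an uploads directory outside the document root (all of these names and values are placeholders):

<?php
// Placeholders: allowed types, size limit and destination directory.
$allowed = ['application/pdf', 'image/jpeg', 'image/png'];
$maxSize = 2 * 1024 * 1024; // 2 MB

if (!isset($_FILES['receipt']) || $_FILES['receipt']['error'] !== UPLOAD_ERR_OK) {
    exit('Upload failed.');
}

$tmp = $_FILES['receipt']['tmp_name'];

// Validate while the file is still in PHP's temporary directory.
$mime = (new finfo(FILEINFO_MIME_TYPE))->file($tmp);
if (!in_array($mime, $allowed, true) || filesize($tmp) > $maxSize) {
    exit('File rejected.');          // PHP removes the temp file automatically
}

// Only now move it to the permanent location, with a generated name.
$dest = '/var/uploads/' . bin2hex(random_bytes(16)) . '.dat';
if (!move_uploaded_file($tmp, $dest)) {
    exit('Could not store file.');
}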
There are a few reasons for doing it this way.
Firstly, it makes it less likely for interrupted uploads to be left lying around (for example if the server experiences a power cut while the file is being uploaded).
Secondly, it ensures all files in your upload directory are complete. You won't read any partial data when you read files in your upload directory.
Finally, it keeps files that have not yet been validated (for example, by checking their file type) separate from those that have.

Securing Uploaded Files (php and html)

I have a simple site which allows users to upload files (among other things obviously). I am teaching myself php/html as I go along.
Currently the site has the following traits:
--When users register, a folder is created in their name.
--All files the user uploads are placed in that folder (with a timestamp added to the name to avoid any issues with duplicates).
--When a file is uploaded information about it is stored in an SQL database.
simple stuff.
So, now my question is what steps do I need to take to:
A. Prevent Google from archiving the uploaded files.
B. Prevent users from accessing the uploaded files unless they are logged in.
C. Prevent users from uploading malicious files.
Notes:
I would assume that B would automatically achieve A. I can restrict users to uploading only files with .doc and .docx extensions. Would this be enough to guard against C? I would assume not.
There are a number of things you want to do, and your question is quite broad.
For the Google indexing, you can work with /robots.txt. You did not specify whether you also want to apply an ACL (Access Control List) to the files, so that might or might not be enough. Serving the files through a script can work, but you have to be very careful not to use include, require or similar constructs that might be tricked into executing code. Instead, open the file, read it and serve it through plain file-handling primitives.
Read about "path traversal". You want to avoid that, both in upload and in download (if you serve the file somehow).
The definition of "malicious files" is quite broad. Malicious to whom? You could run an antivirus scan on upload, for instance, if you are worried about your site being used to distribute malware (you should be). If you want to make sure that people can't harm the server, you have to at the very least make sure they can only upload a limited set of file types. Checking extensions and MIME types is a start, but don't trust that alone (PHP code can be embedded in a valid PNG, and it will run if the file is ever passed to include()).
Then there is the problem of XSS, if users can upload HTML content or anything that gets interpreted as such. Make sure to serve a Content-Disposition header and a non-HTML Content-Type.
That's a start, but as you said there is much more.
Your biggest threat is going to be if a person manages to upload a file with a .php extension (or some other extension that results in server-side scripting/processing). Any code in that file runs on your server with whatever permissions the web server has (this varies by configuration).
If the end result of the uploads is just that you want to be able to serve the files as downloads (rather than let someone view them directly in the browser), you'd be well off to store the downloads in a non web-accessible directory, and serve the files via a script that forces a download and doesn't attempt to execute anything regardless of the extension (see http://php.net/header).
This also makes it much easier to facilitate only allowing downloads if a person is logged in, whereas before, you would need some .htaccess magic to achieve this.
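A minimal sketch of such a download script, assuming files live outside the web root in /var/uploads and that is_logged_in() is your own session check (both the path and the function are placeholders):

<?php
// download.php - force a download, never execute the stored file.
if (!is_logged_in()) {           // placeholder for your own auth check
    http_response_code(403);
    exit;
}

// basename() strips any directory components, blocking path traversal.
$name = basename($_GET['file'] ?? '');
$path = '/var/uploads/' . $name; // placeholder storage directory

if ($name === '' || !is_file($path)) {
    http_response_code(404);
    exit;
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $name . '"');
header('Content-Length: ' . filesize($path));
header('X-Content-Type-Options: nosniff');
readfile($path);                 // never include()/require() user-supplied files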
You should not upload files into web-served directories if you do not want them to be publicly available.
I suggest you use X-Sendfile, which is a header that instructs the server to send a file to the user. Your PHP script called 'fetch so-and-so file' would do whatever authentication you have in place (I assume you have something already) and then return the header. So long as the web server can access the file, it will then serve the file.
See this question: Using X-Sendfile with Apache/PHP
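A minimal sketch of the X-Sendfile approach, assuming Apache with mod_xsendfile enabled; the path and the is_logged_in() check are placeholders:

<?php
if (!is_logged_in()) {           // placeholder for your existing auth check
    http_response_code(403);
    exit;
}

header('Content-Disposition: attachment; filename="report.docx"');
// Apache (with mod_xsendfile) intercepts this header and streams the file itself,
// so PHP never has to read it into memory.
header('X-Sendfile: /var/uploads/report.docx');   // placeholder path
exit;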

Security risk if allowing all file type extensions on user uploads

Background: I have a website where people can store transactions. As part of a transaction, they can attach a receipt if they want.
Question: Is there any security risk if a user is allowed to upload any type of file extension to my website?
Info:
The user will be the only person to ever re-download the same file
There will be no opportunity for the user to "run" the file
They will only be able to download it back to themselves.
No other user will ever have access to another user's files
There will be a size restriction on the file (say 2 MB)
More info: I was originally going to restrict the files to pdf/doc/docx - but then realised some people might want to store a .jpg, or an .xls, etc. - and realised the list of file types they "might" want to store is quite large...
edit: The file will be stored outside public_html - and served via a "readfile()" function that accepts a filename (not a path) - so is there anything that can 'upset' readfile()?
Yes, it is definitely a security risk unless you take precautions. Let's say that, to re-download the file, the user has to go to example.com/uploads/{filename}. The user could upload a malicious PHP file and then 're-download' it by going to example.com/uploads/malicious.php. This would, of course, cause the PHP script to execute on your server, giving him enough power to completely wreck everything.
To prevent this, create a page that receives the filename as a parameter, and then serve the page to the user with the correct content-type.
Something like, example.com/files?filename=malicious.php
"There will be no opportunity for the user to "run" the file"
As long as you are 100% sure that that will hold true, it is secure. However, make sure the file will not be able to be executed by the webserver. For example, if the user uploads a .php file, make sure the server does not execute it.
Computers don't run programs magically by themselves, so basically you just need to ensure that the user has no ability to trick your server into running the file. This means making sure the proper handlers are disabled if the files are under the web root, or passing them through a proxy script if they are not (basically echo file_get_contents('/path/to/upload') with some other logic)
Another option would be to store the file like name.upload but this would require keeping a list of original names that map to the storage names.
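A minimal sketch of that name.upload idea, assuming a PDO handle $db and an uploads table with original_name and stored_name columns (the handle, the table, $currentUserId and the field name are all placeholders):

<?php
// $db, $currentUserId, the "receipt" field and the paths are placeholders.
$original = basename($_FILES['receipt']['name']);
$stored   = bin2hex(random_bytes(16)) . '.upload';   // neutral, non-executable extension

if (move_uploaded_file($_FILES['receipt']['tmp_name'], '/var/uploads/' . $stored)) {
    $stmt = $db->prepare(
        'INSERT INTO uploads (user_id, original_name, stored_name) VALUES (?, ?, ?)'
    );
    $stmt->execute([$currentUserId, $original, $stored]);
}
// On download, look up stored_name by id/user and send it back with the
// original_name in the Content-Disposition header.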

PHP: How do I avoid reading partial files that are pushed to me with FTP?

Files are being pushed to my server via FTP. I process them with PHP code in a Drupal module. The OS is Ubuntu and the FTP server is vsftpd.
At regular intervals I will check for new files, process them with SimpleXML and move them to a "Done" folder. How do I avoid processing a partially uploaded file?
vsftpd has lock_upload_files defaulted to yes. I thought of attempting to move the files first, expecting the move to fail on a file that is still uploading. That doesn't seem to happen, at least on the command line: if I start uploading a large file and move it, it just keeps growing in the new location. I guess the directory entry is not locked.
Should I try fopen with mode 'a' or 'r+' just to see if it succeeds before attempting to load into SimpleXML or is there a better way to do this? I guess I could just detect SimpleXML load failing but... that seems messy.
I don't have control of the sender. They won't do an upload and rename.
Thanks
Using the lock_upload_files configuration option of vsftpd causes it to lock files with the fcntl() function. This places advisory locks on uploads that are in progress. Other programs are not required to honour advisory locks, and mv, for example, does not. Advisory locks are, in general, just advice for programs that care about such locks.
You need another command line tool like lockrun which respects advisory locks.
Note: lockrun must be compiled with the WAIT_AND_LOCK(fd) macro using lockf() rather than flock() in order to work with locks set by fcntl() under Linux. So when lockrun is compiled to use lockf(), it will cooperate with the locks set by vsftpd.
With these pieces (lockrun, mv, lock_upload_files) you can build a shell script or similar that moves files one by one, checking whether a file is locked beforehand and holding an advisory lock on it for as long as the file is being moved. If a file is locked by vsftpd, lockrun can skip the call to mv, so that in-progress uploads are left alone.
If locking doesn't work, I don't know of a solution as clean/simple as you'd like. You could make an educated guess by not processing files whose last-modified time (which you can get with filemtime()) is within the past x minutes.
If you want a higher degree of confidence than that, you could check and store each file's size (using filesize()) in a simple database, and every x minutes check new size against its old size. If the size hasn't changed in x minutes, you can assume nothing more is being sent.
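A minimal sketch of the last-modified heuristic, assuming an incoming directory of XML files and a five-minute quiet period (both the paths and the interval are placeholders):

<?php
$incoming = '/var/ftp/incoming';   // placeholder upload directory
$quiet    = 5 * 60;                // seconds a file must be unchanged before we trust it

foreach (glob($incoming . '/*.xml') as $path) {
    clearstatcache(true, $path);
    if (time() - filemtime($path) < $quiet) {
        continue;                  // probably still being uploaded
    }

    $xml = simplexml_load_file($path);
    if ($xml === false) {
        continue;                  // leave it for the next run
    }

    // ... process $xml ...
    rename($path, '/var/ftp/done/' . basename($path));   // placeholder "Done" folder
}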
The lsof linux command lists opened files on your system. I suggest executing it with shell_exec() from PHP and parsing the output to see what files are still being used by your FTP server.
Picking up on the previous answer, you could copy the file and then compare the sizes of the copy and the original at a fixed interval.
If the sizes match, the upload is done: delete the copy and work with the file.
If the sizes do not match, copy the file again.
Repeat.
Here's another idea: create a super (but hopefully not root) FTP user that can access some or all of the upload directories. Instead of your PHP code reading uploaded files right off the disk, make it connect to the local FTP server and download the files. This way vsftpd handles the locking for you (assuming you leave lock_upload_files enabled). You'll only be able to download a file once vsftpd releases the exclusive/write lock (once writing is complete).
You mentioned trying flock in your comment (and how it fails). It does indeed seem painful to try to match whatever locking vsftpd is doing, but dio_fcntl might be worth a shot.
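A minimal sketch of the FTP-user idea from the answer above, assuming a local account that can read the upload directory (the host, credentials and paths are placeholders, and the "download blocks until vsftpd releases its lock" behaviour is taken from that answer, not verified here):

<?php
$conn = ftp_connect('127.0.0.1');                 // placeholder host
if (!$conn || !ftp_login($conn, 'reader', 'secret')) {   // placeholder credentials
    exit('FTP login failed');
}
ftp_pasv($conn, true);

foreach (ftp_nlist($conn, '/incoming') ?: [] as $remote) {
    $local = '/var/processing/' . basename($remote);     // placeholder working directory
    // Per the answer above, vsftpd should not hand out the file while the
    // upload still holds its write lock, so a successful ftp_get() implies
    // the upload is complete.
    if (ftp_get($conn, $local, $remote, FTP_BINARY)) {
        // ... process $local with SimpleXML ...
    }
}
ftp_close($conn);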
I guess you solved your problem years ago, but still:
If you use some pattern to find the files you need, you can ask the party uploading the files to use a different name and rename each file once its upload has completed.
You should also check out HiddenStores in ProFTPD; more info here:
http://www.proftpd.org/docs/directives/linked/config_ref_HiddenStores.html

Server file read/write concurrency issue

I have a web service serving from a MySQL database. I would like to create a cache file to improve performance. The idea is that once in a while we read data from the DB and generate a text file. My question is:
What if a client-side user is accessing the file while we are generating it?
We are using LAMP. In PHP, flock() handles concurrency problems, but my understanding is that it only applies when two PHP processes access the file simultaneously. Our case is different.
I don't know whether this will cause issues at all. If so, how can I prevent it?
Thanks,
don't use locking;
if your cache file is /tmp/cache.txt then you should always regenerate the cache to /tmp/cache2.txt and then do a
mv /tmp/cache2.txt /tmp/cache.txt
or
rename('/tmp/cache2.txt','/tmp/cache.txt')
the mv/rename operation is atomic if it happens inside the same filesystem; no locking needed
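A minimal sketch of that pattern, with build_cache_contents() standing in for your DB-to-text generation (the function name and paths are placeholders):

<?php
$cache = '/tmp/cache.txt';
$tmp   = $cache . '.' . getmypid() . '.tmp';   // unique temp name on the same filesystem

$fh = fopen($tmp, 'wb');
fwrite($fh, build_cache_contents());           // placeholder generator function
fclose($fh);

// rename() within the same filesystem is atomic: readers see either the old
// file or the new one, never a half-written file.
rename($tmp, $cache);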
All sorts of optimisation options here;
1) Are you using the MySQL query cache? That can take a huge load off the database to start with.
2) You could pull the file through a web proxy like Squid (or Apache configured as a reverse caching proxy). I do this all the time and it's a really handy technique - generate the file by fetching it from a URL using wget, for example (that way you can have it in a cron job). The web proxy takes care of either delivering the same file that was there before, or regenerating it if need be.
3) You don't want to be rolling your own file locking solution in this scenario.
Depending on your scenario, you could also consider caching pages in something like memcache, which is fantastic for high-traffic scenarios, but possibly beyond the scope of this question.
You can use A/B switching to avoid this issue.
E.g.: keep two copies of the cache file, A and B, and have the program read them via a symlink, C.
When the program is building the cache, it modifies the file that is not "current", i.e. if C links to A, update B. Once the update is complete, switch the symlink to B.
Next time, update A and switch the symlink back to A once the update is complete.
This way clients never read a file while it is being updated.
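A minimal sketch of the A/B switch, again with build_cache_contents() as a placeholder generator and placeholder paths:

<?php
$a    = '/var/cache/cache_a.txt';
$b    = '/var/cache/cache_b.txt';
$link = '/var/cache/current.txt';   // readers always open this symlink

$standby = (readlink($link) === $a) ? $b : $a;   // update the copy that is NOT live

file_put_contents($standby, build_cache_contents());   // placeholder generator

// Build the new symlink beside the old one, then rename() it into place;
// rename() is atomic, so readers always see a complete file.
@unlink($link . '.new');            // clear any leftover from a failed previous run
symlink($standby, $link . '.new');
rename($link . '.new', $link);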
When a client accesses the file, it is read as it is at that moment.
flock() is for when two PHP processes access the file simultaneously.
I would solve it like this:
While generating the new text file, save it to a temporary file (cache.tmp); that way the old file (cache.txt) keeps being served as before.
When generation is done, delete the old file and rename the new one.
To avoid problems during that short window, your code should check whether cache.txt exists and retry for a short period if it doesn't.
Trivial, but that should do the trick.
