Hello everyone, I'm developing a photo sharing web site using the CodeIgniter PHP framework. The idea is that people could upload their photos, manage them (through some sort of file browser which allows them to create subfolders, drag files around, etc) and edit them (some basic things like resizing, rotating and cropping to start with, and later on, I'll add some advanced features).
I've already implemented a third-party authentication solution for CI (Redux Authentication 2 Beta) and I'm now integrating a JS/PHP file manager (AjaxExplorer), but the problem is that the PHP backend for managing files (moving, copying, etc.) trusts the user input from the Ajax calls far too much. For instance, it's doing things like this (simplified for the sake of clarity):
move_uploaded_file($_FILES['upload']['tmp_name'], $root.$username.$_POST['destination_dir']);
As you can see, there are obvious security concerns as it blindly accepts whatever path the user throws in! I can already see someone sending something like "../AnotherUser/" as the $_POST['destination_dir'] value.
My question is: what's the best way to "sandbox" a user, in order to only allow him to manage his own data? Do I just validate and filter the inputs, hoping to catch every intrusion attempt? Are there any libraries/packages dedicated to addressing this specific issue?
I think this problem must be somehow solved in any (mature enough) project, which gives its users the power of managing their files through a web browser, so I expected to find some clear guidelines around this (as there are a lot about SQL Injection, XSS, CSRF, etc) but I guess I'm not using the right keywords.
What's the best way to "sandbox" a user, in order to only allow him to manage his own data?
Allow any filenames/directory names the user wants, but simply don't use them on the server side filesystem. Instead, write the path names into a database with a primary key, and use the primary key as a filename like ‘34256.dat’ in a flat storage directory (or even as a BLOB in the database if you prefer). Then serve up via a download script or URL rewrite to make the desired filename appear in the URL.
Sanitising incoming filenames is hard. Detecting ‘..’ is only the beginning. Too-long filenames; too-short filenames; combinations of leading and trailing dots; combinations of leading and trailing whitespace; the different directory separators of different platforms; characters that are invalid on some platforms; control characters; Unicode characters and the environment-specific ways of addressing them; ADSs; filenames (‘.htaccess’) or extensions (‘.php’, ‘.cgi’) that might be ‘special’ to your web server; Windows's reserved filenames...
You can spend a lifetime tracking down funny little quirks of filepath rules on various platforms, or you can just forget it and use the database.
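A minimal sketch of that approach, assuming a PDO connection in $pdo, a logged-in user ID in $userId, and a files table with an auto-increment primary key (all names here are invented for illustration):

<?php
// Record the user's chosen name in the database; the primary key becomes the on-disk name.
$stmt = $pdo->prepare('INSERT INTO files (owner_id, display_name) VALUES (?, ?)');
$stmt->execute(array($userId, $_FILES['upload']['name']));
$fileId = (int)$pdo->lastInsertId();

// The user-supplied name never touches the filesystem; every file lives in one flat directory.
move_uploaded_file($_FILES['upload']['tmp_name'], '/var/app-storage/' . $fileId . '.dat');

A download script then looks the display name up by ID (after checking owner_id against the session) and streams the .dat file with the stored name in the Content-Disposition header.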
I'm not sure what your destination_dir looks like, but what I thought of was assigning directories keys, and then getting the directory based on that key. For example:
//$_POST['destination_dir'] = '4hg43h5g453j45b3';
$stmt = $db->prepare('SELECT dir FROM destinations WHERE `key` = ? LIMIT 1'); // $db: your PDO connection
$stmt->execute(array($_POST['destination_dir']));
$dir = $stmt->fetchColumn();
However, you have to predefine the keys beforehand. Another alternative could be the opposite: md5/sha1 the input and use that as the destination_dir, then store that key in the database with the associated label.
There are no libraries that I know of.
However in your particular example, strip all (back)slashes and dots from the string and then append a slash to the end of it, that way the user can't change folders.
$destdir = str_replace(array('.', '/', '\\'), '', $_POST['destination_dir']);
$destdir .= "/";
I have a partial understanding of why a developer would have a filename like afs342sf.css as opposed to something more readable like main.css - I do not believe that the developer named the file manually; I'm sure it was done programmatically upon insert into whatever database. I'm a bit baffled on why this would be needed, and how it will be called.
If a database has a table with, and excluding others for simplicity:
file_id, file_name, file_display_name, file_size.......... etc.
When data is requested, it's using file_display_name (afs342sf.css, or simply afs342sf) as the reference - href="/yourhost/www/afs342sf.css" - so what on earth is the difference when someone can easily use the same GET request info, or have I got this theory all wrong? I'm typically a paranoid one (apparently good for security) and have confused myself, because it could also be the id for it, but isn't that giving away too much? Then there's the thought of what if the program changes the filename upon every request; could it get lost when other requests are incoming and it doesn't have a fixed address name?
Last but not least, I would much appreciate it if anyone could post any links to pages that could help with stealthy or concealed file retrieval methods. For the record, I do not hide the .php extension - being self-taught, learning from a trusted community is an overwhelming source of knowledge.
You're right, it's not named manually. Arbitrary filenames like that usually mean they are generated with a tool like Assetic. This is primarily used for files that have to be converted before being put on the web (SASS to CSS; Coffeescript to Javascript).
Assetic also has a cache-busting plug-in that generates filenames based on the hash so when the contents change, browsers will be forced to fetch the new file (this is a standard cache-busting technique). This is useful because static files usually have long expiry dates, and there's no other way to alert the browser that the file has changed.
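For illustration, here's a rough sketch of that content-hash technique (this is not Assetic's actual API, and the paths are invented):

<?php
// Copy the compiled stylesheet to a name derived from its contents, so any change
// produces a new URL and forces browsers past their cached copy.
$source = 'build/main.css';
$hash   = substr(md5_file($source), 0, 12);
$target = 'public/css/main.' . $hash . '.css';
copy($source, $target);
// Templates then reference /css/main.<hash>.css, typically via a manifest or helper.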
I am using a GoDaddy web hosting plan on a Windows platform. This was not my choice -- it has to do with a different part of the actual site using ASP.NET (also not my choice).
I have a SQL database with a bunch of entries with some non-sensitive customer information. The primary key on this is an AutoIncrement integer, and I have a series of PDF files that match up with each of those integers (e.g. 555.pdf, 7891.pdf, etc).
My goal is to restrict direct access to these files; I want users to have to go through a search and login process (PHP) first. Originally I planned to put the files above the PUBLIC_HTML folder, but GoDaddy refuses to give me root access without a dedicated server ($20 a month from them).
The next thing I looked into was HTACCESS. I was going to restrict access to the files to only PHP scripts by only allowing access to the Server's IP Address (or localhost/127.0.0.1). Unfortunately this doesn't work because GoDaddy does not run Apache on its Windows servers.
I could put the files into BLOBs in the database, but that gets really messy when I need to work with them quickly (plus I have had some trouble with that approach).
Any suggestions to restrict access to the files only to a PHP script (readfile())?
Since you can't put the files anywhere but in your public_html directory, you'll have to go for the feared/hated "security by obscurity" method
Create a randomly named sub-directory to store the files in: public_html/RANDOMGARBAGE
Make sure the directory is not browseable. Disable directory browsing (if you can), and put a default document (index.html?) in there as well, so even if browsing is on, you won't get the directory listing.
Don't store your files with guessable names. Instead of storing them with the database ID, store them with a salted+hashed name instead: $crypted_filename = sha1($real_filename . 'some hard-to-guess salt text'); (of course, make this more complex if you need to). Store the original filename in your database. So you end up with something like:
public_html/RANDOMGARBAGE/5bf1fd927dfb8679496a2e6cf00cbe50c1c87145
public_html/RANDOMGARBAGE/7ec1f0eb9119d48eb6a3176ca47380c6496304c8
Serve up the files via a PHP script, and never link to the hashed filename directly. Point your links at the script instead, e.g.:
<a href="download.php?fileID=37">Download</a>
which then does:
<?php
$fileID = (int)$_GET['fileID'];
$crypted_file = sha1($fileID . 'some hard-to-guess salt text');
$full_path = 'public_html/RANDOMGARBAGE/' . $crypted_file;
if (is_readable($full_path)) {
if(user_is_allowed_to_see_this_file()) {
/// send file to user with readfile()
header("Content-disposition: attachment; filename=$ORIGINAL_FILENAME");
readfile($full_path);
} else {
die("Permission denied");
}
} else {
/// handle problems here
die("Uh-oh. Can't find/read file");
}
This way the user will never see what your "s00per seekrit" filename is, they'll just see their browser hit ...php?fileID=37 and start a download of secret file.pdf
On top of this, you can occasionally rename the special sub-directory to something else on a regular basis, as well as change the salt text (which then requires you update all the hashed filenames with the new sha1 values).
You can simply hide them. It's security-through-obscurity, but it sounds like your best option if you can't either keep them out of the web-root, or find a way to tell the server not to serve them directly.
So stick them in some randomly-named directory:
asd8b8asd8327bh/123.pdf
asd8b8asd8327bh/124.pdf
asd8b8asd8327bh/125.pdf
...
Then write yourself a little PHP script that will send appropriate headers, and pass the file contents through.
for example:
<?PHP
//pdf.php
$id = $_GET['id'];
//make sure nobody is doing anything sneaky. is_numeric() might do the trick if the IDs are always integers.
if (!some_validation_passes($id)){
die();
}
header('Content-type: application/pdf');
header('Content-Disposition: attachment; filename="'.$id.'.pdf"');
readfile('asd8b8asd8327bh/'.$id.'.pdf');
Now, the above is really no better than just serving the files directly (yet), since people can still increment the id parameter in the query string.
But you ought to be able to figure out how to handle authorization pretty easily.
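For example, a rough authorization check could look like this (the table, column, and session key names are invented; adapt them to your schema):

<?php
// Only serve the document if it belongs to the logged-in user.
session_start();
$stmt = $pdo->prepare('SELECT owner_id FROM documents WHERE id = ?');
$stmt->execute(array((int)$_GET['id']));
$ownerId = $stmt->fetchColumn();

if ($ownerId === false || !isset($_SESSION['user_id']) || (int)$ownerId !== (int)$_SESSION['user_id']) {
    header('HTTP/1.1 403 Forbidden');
    die('Permission denied');
}
// ...then send the headers and readfile() as above.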
Because PHP uses the web server user's permissions, there is no way to restrict access to the files without either:
Placing them outside the DOCROOT
Changing the web server configuration to disallow access to those files
Changing the file so it will be interpreted by the web server, thus hiding its contents
Putting them in a database counts as outside the DOCROOT. For the third option, you could make the PDFs PHP files, but honestly, that would be pretty convoluted.
I recommend you contact GoDaddy and see if they have some way to configure per-directory file permissions.
Make a folder web-inaccessible via chmod. PHP will still be able to include/require whatever is on the server, but users will not be able to navigate to the files ever.
Example:
This is set to 770, i.e. user and group can read/write/execute, other can do nothing.
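From a shell that would be chmod 770 on the directory; from PHP, the equivalent call is (the path is hypothetical):

<?php
// 0770: owner and group get read/write/execute, everyone else gets nothing.
chmod('/home/youruser/private_files', 0770);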
I have searched around a bit and have not really found a professional-type response on how to have secure file upload capability, so I wanted to get the opinion of some of the experts on this site. I am currently allowing upload of MP3s and images, and while I am pretty confident in preventing XSS and injection attacks on my site, I am not really familiar with file upload security. I basically just use PHP's fileinfo and check the detected file type against an array of accepted file types. For images, there is the getimagesize function and some additional checks. As far as storing them, I just have a folder within my directory, because I want the users to be able to use the files. If anyone could give me some tips I would really appreciate it.
I usually invoke ClamAV when accepting files that can be shared. With PHP, this is rather easily accomplished with php-clamav.
One of the last things you want to do is spread malware around the globe :)
If you can, do this in the background after a file is uploaded, but before making it public. A quirk with this class is that it can load the entire ClamAV virus definition database into memory, which will almost certainly stink if PHP is running under Apache conventionally (think on the order of +120 MB of memory per instance).
Using something like beanstalkd to scan uploads then update your DB to make them public is a very good way to work around this.
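If loading the whole virus database into each PHP worker is a concern, a hedged alternative is shelling out to the clamscan binary from the background job (clamscan exits with 0 for clean files and 1 when a virus is found):

<?php
// Scan a single uploaded file with the ClamAV command-line scanner.
// Assumes clamscan is installed and on the PATH.
function uploadLooksClean($path)
{
    exec('clamscan --no-summary ' . escapeshellarg($path), $output, $exitCode);
    return $exitCode === 0;   // 0 = clean, 1 = infected, anything else = scan error
}

if (!uploadLooksClean($_FILES['upload']['tmp_name'])) {
    unlink($_FILES['upload']['tmp_name']);
    die('Upload rejected');
}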
I mentioned this only because the other answers had not, in no way did I intend this to be a complete solution. See the other answers posted here, this is a step you should be finishing with. Always, always, always sanitize your input, make sure it's of the expected type, etc (did I mention that you should read the other answers too?)
"malicious" files are not the only way to hurt your server (and if your site is down, it hurts your users).
For example, one way to hurt a server would be to upload a lot of very small files:
it would not use all the space on the disk,
but it could use up all the available inodes...
...and when there is no free inode left, it's not possible to create any file anymore; which, obviously, is bad.
After that, there are also problems like:
copyright
content that is not OK for you or your users (nudity?)
For those, there's not much you can do with technical solutions -- but an "alert the moderator" feature is often helpful ;-)
No, because this could easily be spoofed. There's an article that describes how a server could be attacked by uploading a 1x1 "jpg file" and how to prevent it. Good read.
The first thing to do would be to disable execution of any server side code (e.g. PHP) in that directory via server configuration. Setting up a whitelist for MIME types (or file extensions, since your server uses those to figure out the mime type in the first place) and only allowing media files (not HTML or anything) will protect you from XSS injections. Those combined with a file type check should be quite sufficient - the only thing I can think of that might get through those are things that exploit image/audio decoders, and for spotting those you'd need something close to a virus scanner.
To start with the "file-type" ($_FILES['userfile']['type']) is completely meaningless. This is a variable in the HTTP post request that can be ANY VALUE the attacker wants. Remove this check ASAP.
getimagesize() Is an excellent way to verify that an image is real. Sounds files can be a bit more tricky, you could call file /tmp/temp_uploaded_file on the commandline.
By far the most important part of an uploaded file is the file's extension. If the file is a .php, then you just got hacked. It gets worse: Apache can be configured to ignore the first file extension if it doesn't recognize it and use the next one instead, so a file like backdoor.php.junk would be executed as a normal .php file. By default this should be disabled, but it was enabled by default a few years ago.
You MUST MUST MUST use a file extension White List. So you want to force using files like: jpg,jpeg,gif,png,mp3 and reject it otherwise.
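A minimal sketch of that whitelist check (the form field name and error handling are just for illustration):

<?php
$allowed = array('jpg', 'jpeg', 'gif', 'png', 'mp3');

// Use the client-supplied name only to derive the extension; never use it as a path.
$ext = strtolower(pathinfo($_FILES['userfile']['name'], PATHINFO_EXTENSION));

if (!in_array($ext, $allowed, true)) {
    die('File type not allowed');
}

// For images, getimagesize() on the temp file is a cheap extra sanity check.
if ($ext !== 'mp3' && getimagesize($_FILES['userfile']['tmp_name']) === false) {
    die('Not a valid image');
}

// Store under a server-generated name (e.g. uploads/<random>.<ext>), not the original one.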
If exiv2 can't remove the metadata, the file is probably malicious, or at least corrupted in some way. The following requires exiv2 to be installed on your Unix system. Unfortunately, this might be dangerous if the file contains malicious shell code; I'm not sure how sturdy exiv2 is against shell exploits, so use it with caution. I haven't used it myself, but I've thought about using it.
function isFileMalicious($file)
{
    try {
        $out = [];
        // Ask exiv2 to strip the metadata; any output usually means it choked on the file.
        exec('exiv2 rm ' . escapeshellarg($file) . ' 2>&1', $out);
        if (!empty($out)) {
            return true;   // exiv2 complained: treat the file as suspect
        }
    } catch (Exception $e) {
        return true;       // couldn't even run the check: play it safe
    }
    return false;          // exiv2 ran silently: metadata stripped, file looks OK
}
Update: I have now written a PHP extension called php_ssdeep for the ssdeep C API to facilitate fuzzy hashing and hash comparisons in PHP natively. More information can be found over at my blog. I hope this is helpful to people.
I am involved in writing a custom document management application in PHP on a Linux box that will store various file formats (potentially 1000's of files) and we need to be able to check whether a text document has been uploaded before to prevent duplication in the database.
Essentially when a user uploads a new file we would like to be able to present them with a list of files that are either duplicates or contain similar content. This would then allow them to choose one of the pre-existing documents or continue uploading their own.
Similar documents would be determined by looking through their content for similar sentences, and perhaps a dynamically generated list of keywords. We can then display a percentage match to the user to help them find the duplicates.
Can you recommend any packages for this process and any ideas of how you might have done this in the past?
The direct duplicate I think can be done by getting all the text content and
Stripping whitespace
Removing punctuation
Converting to lower or upper case
then forming an MD5 hash to compare with any new documents. Stripping those items out should help prevent duplicates from being missed if, for example, the user edits a document to add extra paragraph breaks. Any thoughts?
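Something like this would implement the exact-duplicate check described above (assuming plain text has already been extracted from the uploaded document into $extractedText):

<?php
// Normalise the extracted text and hash it; identical fingerprints mean exact duplicates.
function contentFingerprint($text)
{
    $text = strtolower($text);
    $text = preg_replace('/[[:punct:]]/', '', $text);   // remove punctuation
    $text = preg_replace('/\s+/', '', $text);           // strip whitespace
    return md5($text);
}

$fingerprint = contentFingerprint($extractedText);
// Look the fingerprint up in a unique-indexed column to catch exact duplicates on upload.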
This process could also potentially run as a nightly job and we could notify the user of any duplicates when they next login if the computational requirement is too great to run in realtime. Realtime would be preferred however.
I have found a program that does what its creator, Jesse Kornblum, calls "Fuzzy Hashing". Very basically it makes hashes of a file that can be used to detect similar files or identical matches.
The theory behind it is documented here: Identifying almost identical files using context triggered piecewise hashing
ssdeep is the name of the program and it can be run on Windows or Linux. It was intended for use in forensic computing, but it seems suited enough to our purposes. I have done a short test on an old Pentium 4 machine and it takes about 3 secs to go through a hash file of 23MB (hashes for just under 135,000 files) looking for matches against two files. That time includes creating hashes for the two files I was searching against as well.
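For what it's worth, a usage sketch with the php_ssdeep extension mentioned in the update, assuming it exposes ssdeep_fuzzy_hash_filename() and ssdeep_fuzzy_compare() as the PECL documentation describes ($existingHashes and the threshold are invented for illustration):

<?php
// Fuzzy-hash the new upload and score it against hashes already stored for older documents.
$newHash = ssdeep_fuzzy_hash_filename($newFilePath);

$possibleDuplicates = array();
foreach ($existingHashes as $docId => $storedHash) {
    // ssdeep_fuzzy_compare() returns a match score from 0 (no similarity) to 100 (identical).
    if (ssdeep_fuzzy_compare($newHash, $storedHash) >= 80) {
        $possibleDuplicates[] = $docId;
    }
}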
I'm working on a similar problem in web2project and after asking around and digging, I came to the conclusion of "the user doesn't care". Having duplicate documents doesn't matter to the user as long as they can find their own document by its own name.
That being said, here's the approach I'm taking:
Allow a user to upload a document associating it with whichever Projects/Tasks they want;
The file should be renamed to prevent someone getting at it via HTTP, or better, stored outside the web root. The user will still see their filename in the system, and if they download it, you can set the headers with the "proper" filename;
At some point in the future, process the document to see if there are duplicates... at this point, though, we are not modifying the document. After all, there could be important reasons the whitespace or capitalization was changed;
If there are dupes, delete the new file and then link to the old one;
If there aren't dupes, do nothing;
Index the file for search terms - depending on the file format, there are lots of options, even for Word docs;
Throughout all of this, we don't tell the user it was a duplicate... they don't care. It's us (developers, db admins, etc) that care.
And yes, this works even if they upload a new version of the file later. First, you delete the reference to the file, then - just like in garbage collection - you only delete the old file if there are zero references to it.
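A rough sketch of that reference-counted storage (the schema and variable names are invented for illustration):

<?php
// files(id, sha1, path, refcount): one physical copy per unique content hash.
$sha1 = sha1_file($uploadedPath);

$stmt = $pdo->prepare('SELECT id FROM files WHERE sha1 = ?');
$stmt->execute(array($sha1));
$fileId = $stmt->fetchColumn();

if ($fileId === false) {
    // New content: keep the upload and register it with a single reference.
    $pdo->prepare('INSERT INTO files (sha1, path, refcount) VALUES (?, ?, 1)')
        ->execute(array($sha1, $storedPath));
} else {
    // Duplicate content: drop the new copy and just add another reference to the old one.
    unlink($uploadedPath);
    $pdo->prepare('UPDATE files SET refcount = refcount + 1 WHERE id = ?')->execute(array($fileId));
}
// On delete: decrement refcount, and unlink the physical file only when it reaches zero.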
What would be the safest way to include pages with $_GET without putting allowed pages in an array/using a switch, etc.? I have many pages, so no thank you.
$content = addslashes($_GET['content']);
if (file_exists(PAGE_PATH."$content.html")) {
include(PAGE_PATH."$content.html");
}
How safe is that?
Thanks.
This is very bad practice. You should setup a controller to handle dispatching to the code that needs to be executed or retrieved rather than trying to directly include it from a variable supplied by a user. You shouldn't trust user input when including files, ever. You have nothing to prevent them from including things you do not want included.
You'll sleep safer if you check the input for a valid pattern, e.g. suppose you know the included files never have a subdirectory and are always alphanumeric:
if (preg_match('/^[a-z0-9]+$/', $_GET['page']))
{
$file=PAGE_PATH.$_GET['page'].".html";
if (file_exists($file))
{
readfile($file);
}
}
I've used readfile, as if the .html files are just static, there's no need to use include.
The possible flaw with your approach is that you can engineer a path to any HTML file in the system, and have it executed as PHP. If you could find some way to get an HTML file of your own devising on the filesystem, you can execute it through your script.
Match it against a regex that only accepts "a-zA-Z-".
edit: I don't think that blocking specific patterns is a good idea. I'd rather do, like I said, a regex that only accepts chars that we know won't cause exploits.
Assuming the "allowed" set of pages all exist under PAGE_PATH, then I might suggest something like the following:
Trim page name
Reject page names which start with a slash (could be an attempt at an absolute path)
Reject page names which contain .. (could be an attempt at path traversal)
Explicitly prefix PAGE_PATH and include the (hopefully) safe path
If your page names all follow some consistent rules, e.g. alphanumeric characters, then you could in theory use a regular expression to validate, rejecting "bad" page names.
There's some more discussion of these issues on the PHP web site.
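Putting those steps together, a minimal sketch (assuming PAGE_PATH is defined and the pages are flat .html files):

<?php
$page = trim($_GET['page']);

// Reject absolute paths, backslashes, and anything trying to traverse upwards.
if ($page === '' || $page[0] === '/' || strpos($page, '..') !== false || strpos($page, '\\') !== false) {
    die('Invalid page');
}

$file = PAGE_PATH . $page . '.html';
if (file_exists($file)) {
    include $file;
}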
It looks generally safe as in you are checking that the page actually exists before displaying it. However, you may wish to create a blacklist of pages that people should not be able to view with the valid $_SESSION credentials. This can be done either with an array/switch or you can simply have all special pages in a certain directory, and check for that.
You could scan the directory containing all HTML templates first and cache all template names in an array, that you can validate the GET parameter against.
But even if you cache this array it still creates some kind of overhead.
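For instance, a sketch of that idea using glob() (cache the resulting array in APC or a file if the directory scan becomes a bottleneck):

<?php
// Build the whitelist of template names from what actually exists on disk.
$templates = array_map(function ($path) {
    return basename($path, '.html');
}, glob(PAGE_PATH . '*.html'));

if (in_array($_GET['content'], $templates, true)) {
    include PAGE_PATH . $_GET['content'] . '.html';
}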
Don't. You'll never anticipate all possible attacks and you'll get hacked.
If you want to keep your code free of arrays and such, use a database with two columns, ID and path. Request the page by numeric ID. Ignore all requests for ids that are not purely numeric and not in your range of valid IDs. If you're concerned about SEO you can add arbitrary page names after the numeric id in your links, just like Stack Overflow does.
The database need not be heavy-duty. You can use SQLite, for example.
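A sketch of the ID-based lookup with PDO's SQLite driver (the database file and table layout are invented for illustration):

<?php
$pdo = new PDO('sqlite:' . __DIR__ . '/pages.sqlite');

$id = isset($_GET['id']) ? $_GET['id'] : '';
if (!ctype_digit((string)$id)) {
    die('Invalid page id');           // ignore anything that isn't purely numeric
}

$stmt = $pdo->prepare('SELECT path FROM pages WHERE id = ?');
$stmt->execute(array((int)$id));
$path = $stmt->fetchColumn();

if ($path !== false) {
    include $path;                    // paths in the table are set by you, never by the user
}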
The safest method involves cleaning up the request a bit.
Strip out any ../
Strip out ^\/
From there, make sure that you check to see if the file they're requesting exists, and can be read. Then, just include it.
You should use at least something like this to prevent XSS attacks.
$content = htmlentities($_GET['page'], ENT_QUOTES, 'UTF-8');
And addslashes won't protect you from SQL Injections.