Should we sanitize $_FILES['filename']['name']? - php

After the user uploads an image to the server, should we sanitize $_FILES['filename']['name']?
I do check file size/file type etc. But I don't check other things. Is there a potential security hole?
Thank you

Absolutely! As @Bob has already mentioned, it's too easy for common file names to be overwritten.
There are also some issues that you might want to cover, for instance not all the allowed chars in Windows are allowed in *nix, and vice versa. A filename may also contain a relative path and could potentially overwrite other non-uploaded files.
Here is the Upload() method I wrote for the phunction PHP framework:
function Upload($source, $destination, $chmod = null)
{
    $result = array();
    $destination = self::Path($destination);
    if ((is_dir($destination) === true) && (array_key_exists($source, $_FILES) === true))
    {
        if (count($_FILES[$source], COUNT_RECURSIVE) == 5)
        {
            foreach ($_FILES[$source] as $key => $value)
            {
                $_FILES[$source][$key] = array($value);
            }
        }
        foreach (array_map('basename', $_FILES[$source]['name']) as $key => $value)
        {
            $result[$value] = false;
            if ($_FILES[$source]['error'][$key] == UPLOAD_ERR_OK)
            {
                $file = ph()->Text->Slug($value, '_', '.');
                if (file_exists($destination . $file) === true)
                {
                    $file = substr_replace($file, '_' . md5_file($_FILES[$source]['tmp_name'][$key]), strrpos($value, '.'), 0);
                }
                if (move_uploaded_file($_FILES[$source]['tmp_name'][$key], $destination . $file) === true)
                {
                    if (self::Chmod($destination . $file, $chmod) === true)
                    {
                        $result[$value] = $destination . $file;
                    }
                }
            }
        }
    }
    return $result;
}
The important parts are:
array_map('basename', ...), this makes sure that the file doesn't contain any relative paths.
ph()->Text->Slug(), this makes sure only .0-9a-zA-Z are allowed in the filename, all the other chars are replaced by underscores (_)
md5_file(), this is added to the filename iff another file with the same name already exists
I prefer to use the user supplied name since search engines can use that to deliver better results, but if that is not important to you a simple microtime(true) or md5_file() could simplify things a bit.
Hope this helps! =)

The filename is an arbitrary user supplied string. As a general rule, never trust arbitrary user supplied values.
You should never use the user supplied filename as the name to save the file under on the server, always create your own filename. The only thing you may want to do with it is to save it as metadata for informational purposes. When outputting that metadata, take the usual precautions like sanitation and escaping.
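A minimal sketch of that advice, not a definitive implementation. The helper name makeStoredName() and the extension whitelist are assumptions made up for this example: the server picks a random name, and the user's name survives only as metadata.

```php
<?php
// Hypothetical helper: returns a server-chosen file name plus the
// original name, which is kept only as metadata.
// The default extension whitelist is an assumption for this example.
function makeStoredName(string $originalName, array $allowedExt = ['jpg', 'jpeg', 'png', 'gif']): ?array
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!in_array($ext, $allowedExt, true)) {
        return null; // extension is not on the whitelist
    }
    return [
        'stored_name'   => bin2hex(random_bytes(16)) . '.' . $ext, // 32 hex chars + extension
        'original_name' => $originalName,                          // never used as a path
    ];
}
```

It would be called as makeStoredName($_FILES['filename']['name']), and the original_name value should go through htmlspecialchars() whenever it is printed.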

You also need to check for duplicate names. It's too easy for multiple people to upload an image called 'mycat.jpg', which, if uploaded to the same folder, would overwrite a previously uploaded file of the same name. You can avoid this by putting a unique ID in the file name (as Prix suggests). Also verify that the file doesn't just end with an image extension but is an actual image; you don't want your server acting as a blind host for random files.
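One way to check that an upload is an actual image is to let PHP parse its header. This is only a sketch: the function name isRealImage() and the set of accepted types are assumptions, and a file that passes can still carry extra data after the image.

```php
<?php
// Returns true only if the file's header parses as a GIF, JPEG or PNG.
// The list of accepted types is an assumption for this example.
function isRealImage(string $path): bool
{
    // getimagesize() inspects the file contents, not the extension,
    // and returns false when the format is not recognized.
    $info = @getimagesize($path);
    if ($info === false) {
        return false;
    }
    // Index 2 holds one of the IMAGETYPE_* constants.
    return in_array($info[2], [IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG], true);
}
```

It would typically run on $_FILES['filename']['tmp_name'] before move_uploaded_file() is called.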

Comparing files and creating notepad to tell which line is different in php

I have been stuck on this code for some time. I'm not new to PHP, but I'm not very experienced either, and I need help with this one.
I have two directories, each holding more than one file; the files may be text or XML.
Let's say directory 1 is populated today and directory 2 will be populated tomorrow. Both directories have the same number of files, with the same names and in the same order.
What I want is a comparison tool that compares each file with the file of the same name in the other directory. If there is a difference or update in either file, it should create a text file (readable in Notepad) listing the file name and the lines that differ.
I got this far, but I'm stuck and need your help.
Thanks.
<?php
$dirA = glob('C:\Users\aganda88\Desktop\testing docs\testingdocs1\*');
$dirB = glob('C:\Users\aganda88\Desktop\testing docs\testingdocs2\*');
//Checking that the files are the same
foreach($dirA as $fileName) {
// the file exists in the other folder as well with the same name
if ($exists = array_search($fileName, $dirB) !== false) {
// it exists!
if (md5_file($fileName) !== md5_file($dirB[$exists])) {
// The files are not identical so i need to create a text file to show what line is not the same
copy($fileName, $dirB[$exists]);
/* problem here is you didn't specify which in your requirements */
}
} else {
// it doesn't (what to do here? you didn't specify!)
}
}
// compare the other way
foreach($dirB as $fileName) {
// does the file exist in the other directory?
if ($exists = array_search($fileName, $dirA) !== false) {
// it exists!
if (md5_file($fileName) !== md5_file($dirA[$exists])) {
copy($fileName, $dirA[$exists]);
}
} else {
// it doesn't (what to do here? you didn't specify!)
}
}
?>
Get a list of all files in each directory
You can use glob to grab a list of files in each directory...
$dirA = glob('C:\Users\aganda88\Desktop\testing docs\testingdocs1\*');
$dirB = glob('C:\Users\aganda88\Desktop\testing docs\testingdocs2\*');
Checking that the files are the same
You can do a simple md5 checksum on matching filenames to determine if they are identical or not.
// glob() returns full paths, which always differ between the two directories,
// so match on the base name instead.
$namesB = array_map('basename', $dirB);
foreach ($dirA as $fileName) {
    // does the file exist in the other directory?
    $index = array_search(basename($fileName), $namesB, true);
    if ($index !== false) {
        // it exists!
        if (md5_file($fileName) !== md5_file($dirB[$index])) {
            // The files aren't identical so one of them needs updating
            copy($fileName, $dirB[$index]);
            /* problem here is you didn't specify which in your requirements */
        }
    } else {
        // it doesn't (what to do here? you didn't specify!)
    }
}
// compare the other way
$namesA = array_map('basename', $dirA);
foreach ($dirB as $fileName) {
    // does the file exist in the other directory?
    $index = array_search(basename($fileName), $namesA, true);
    if ($index !== false) {
        // it exists!
        if (md5_file($fileName) !== md5_file($dirA[$index])) {
            copy($fileName, $dirA[$index]);
        }
    } else {
        // it doesn't (what to do here? you didn't specify!)
    }
}
Update files
There's one part of your requirements where the behavior is not clearly defined: which file do you update? A change may have been made to C:\Users\aganda88\Desktop\testing docs\testingdocs1\test1.txt, but the requirements do not say whether that file should be overwritten with C:\Users\aganda88\Desktop\testing docs\testingdocs2\test1.txt or the other way round. In other words, the fact that the files differ doesn't tell you which one is the source and which one is the destination. Without that, knowing that they differ doesn't get you very far.
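For the report the question asks for (file name plus the lines that differ), a minimal sketch could look like the following. The function names differingLines() and buildReport() are hypothetical, and the comparison is by line position, so one inserted line marks every later line as different; it is not a true diff.

```php
<?php
// Returns the 1-based line numbers at which two text files differ.
// Lines present in only one of the files count as differences.
function differingLines(string $fileA, string $fileB): array
{
    $a = file($fileA, FILE_IGNORE_NEW_LINES);
    $b = file($fileB, FILE_IGNORE_NEW_LINES);
    $diff = [];
    $max = max(count($a), count($b));
    for ($i = 0; $i < $max; $i++) {
        if (($a[$i] ?? null) !== ($b[$i] ?? null)) {
            $diff[] = $i + 1;
        }
    }
    return $diff;
}

// Builds one report line for every file name that exists in both
// directories and whose contents differ.
function buildReport(string $dirA, string $dirB): array
{
    $report = [];
    foreach (glob($dirA . DIRECTORY_SEPARATOR . '*') as $pathA) {
        $pathB = $dirB . DIRECTORY_SEPARATOR . basename($pathA);
        if (!is_file($pathA) || !is_file($pathB)) {
            continue; // only compare files present on both sides
        }
        $lines = differingLines($pathA, $pathB);
        if ($lines !== []) {
            $report[] = basename($pathA) . ': lines ' . implode(', ', $lines);
        }
    }
    return $report;
}
```

The result can be written to a text file with file_put_contents('report.txt', implode(PHP_EOL, buildReport($dir1, $dir2))), where $dir1 and $dir2 are the two directory paths without the trailing \*.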

Search Within Text Files (.doc, .docx, .pdf etc) in mysql database

I want to make a module that searches within files (file types: .doc, .docx, .pdf). Using file_get_contents() I can search the files, but I have to specify the location where all the files are. In my case the files are spread over many folders (like this: C:\xampp\htdocs\cats1\attachments\site_1\0xxx..); they are always stored in the "0xxx" folder (by another application). I just want to specify the path so that, no matter how many folders the "0xxx" folder contains, the search covers all of them. I am quite new to PHP, so please help. My code for this application is below.
<?php
$matched_files = array();
if(isset($_POST['submit']))
{
$skills = $_POST['skills'];
$experience= $_POST['experience'];
$location = $_POST['location'];
$path = 'C:\Docs';
$dir = dir($path);
// Get next file/dir name in directory
while (false !== ($file = $dir->read()))
{
if ($file != '.' && $file != '..')
{
// Is this entry a file or directory?
if (is_file($path . '/' . $file))
{
// Its a file, yay! Lets get the file's contents
$data = file_get_contents($path . '/' . $file);
// Is the str in the data (case-insensitive search)
if (stripos($data, $skills) !== false and (stripos($data, $experience) !== false and (stripos($data, $location) !== false)))
{
$matched_files[] = $file;
}
}
}
}
$dir->close();
$matched_files_unique = array_unique($matched_files);
}
?>
The files that you're mentioning are not text files. Additionally, it is not a good idea to store these files' contents in a database. Here's the approach I would take:
1. Store the files on the filesystem, using their hash (generated with something like sha1()) as the file name.
2. Create a table to store the metadata of the files (file name, date uploaded, hash name).
3. Within that table, create a text column to store the text extracted from the files. Each file type will require a different tool. For instance, for PDFs, you can use something like pdftotext.
4. Do your searches in the database by selecting the filename (hash) from the table where the keywords are contained within the text column (or whatever search criteria you want).
5. Open the file named by the returned hash and return that file to the user.
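The storage half of this approach could be sketched as below. The function name storeByHash() and the array keys are assumptions for illustration; the extracted text is passed in as a plain string because the extraction tool differs per file type, and the returned array stands in for the row you would insert into the metadata table.

```php
<?php
// Copies a file into $storageDir under its SHA-1 content hash and returns
// the metadata row that would go into the database table.
// $extractedText is assumed to come from an external tool such as pdftotext.
function storeByHash(string $sourcePath, string $storageDir, string $extractedText): array
{
    $hash = sha1_file($sourcePath);
    $target = rtrim($storageDir, '/\\') . DIRECTORY_SEPARATOR . $hash;
    if (!file_exists($target)) {
        copy($sourcePath, $target); // identical content is stored only once
    }
    return [
        'file_name'     => basename($sourcePath),
        'date_uploaded' => date('Y-m-d H:i:s'),
        'hash_name'     => $hash,
        'text'          => $extractedText,
    ];
}
```

A search then becomes a query against the text column, and the hash_name of each matching row points at the stored file.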

How to use $_GET path with file_exists and keep it safe?

I have a function to check if a file exists via jQuery which makes a call to a PHP script which I'll use when changing certain images at the click of a button on my index page.
jQuery function:
function fileExists(path){
$.getJSON("/ajax/fileExists.php",{ path: path },
function (data){
return data.path;
});
}
fileExists.php:
$path=$_SERVER['DOCUMENT_ROOT'].'/packs'.$_GET['path'];
if(file_exists($path)){
echo json_encode(TRUE);
}else{
echo json_encode(FALSE);
}
I'm worried about people using this script to list the contents of my server or files which I may not want them to know about so I've used DOCUMENT_ROOT and /packs to try to limit calls to that directory but I think people can simply use ../ within the supplied path to check alternatives.
What is the best way to make this safe, ideally limit it to /packs, and are there any other concerns I should worry about?
Edit: an example call in javascript/jQuery:
if( fileExists('/index.php') ){
alert('Exists');
}else{
alert('Doesn\'t exist');
}
This is how I've handled it in the past:
$path = realpath($_SERVER['DOCUMENT_ROOT'].'/packs'.$_GET['path']);
if (strpos($path, $_SERVER['DOCUMENT_ROOT']) !== 0) {
//It's looking to a path that is outside the document root
}
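A slightly stricter variant of the same idea, as a sketch: it limits the check to the /packs directory rather than the whole document root, and it appends a directory separator before comparing so that a sibling such as /packs-private cannot match. The function name existsInside() is made up for this example.

```php
<?php
// Returns true only when $userPath resolves to something that exists
// inside $baseDir. realpath() resolves "../" and symlinks first.
function existsInside(string $baseDir, string $userPath): bool
{
    $base = realpath($baseDir);
    $full = realpath($baseDir . DIRECTORY_SEPARATOR . ltrim($userPath, '/\\'));
    if ($base === false || $full === false) {
        return false; // base or target does not exist
    }
    // The trailing separator stops "/packs-private" from matching "/packs".
    return strncmp($full, $base . DIRECTORY_SEPARATOR, strlen($base) + 1) === 0;
}
```

In fileExists.php this could be used as echo json_encode(existsInside($_SERVER['DOCUMENT_ROOT'] . '/packs', $_GET['path'] ?? ''));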
You can remove any path traversal from the filename by keeping only its last segment:
$path_arr = explode("/", $_GET['path']);
$path = $path_arr[count($path_arr) - 1];
This is moderately secure and fast (O(1) in the number of files), but it is not really the best approach, as you have to watch out for encoding, character replacement and similar tricks.
The overall best practice (though slower, roughly O(n) in the number of files in the directory) would be to use readdir() to get a list of all the files in your /packs directory and then check whether the supplied filename is present:
$handle = opendir($path=$_SERVER['DOCUMENT_ROOT'].'/packs');
while (false !== ($entry = readdir($handle))) {
if ($entry === $_GET['path']) {
echo json_encode(TRUE);
return;
}
}
echo json_encode(FALSE);

Optimize PHP function

I have a function that detects all files whose names start with a given string and returns an array of the matching files, but it is starting to get slow because I have around 20000 files in a particular directory.
I need to optimize this function, but I just can't see how. This is the function:
function DetectPrefix ($filePath, $prefix)
{
$dh = opendir($filePath);
while (false !== ($filename = readdir($dh)))
{
$posIni = strpos( $filename, $prefix);
if ($posIni===0):
$files[] = $filename;
endif;
}
if (count($files)>0){
return $files;
} else {
return null;
}
}
What more can I do?
Thanks
http://php.net/glob
$files = glob('/file/path/prefix*');
Wikipedia breaks uploads up by the first couple letters of their filenames, so excelfile.xls would go in a directory like /uploads/e/x while textfile.txt would go in /uploads/t/e.
Not only does this reduce the number of files glob (or any other approach) has to sort through, but it avoids the maximum files in a directory issue others have mentioned.
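The scheme as described here can be sketched in a few lines. The function name shardedPath() is hypothetical, and the sketch assumes the file name is already sanitized and at least two characters long; hashing the name first and using the first characters of the hash is an alternative that spreads files more evenly.

```php
<?php
// Builds a sharded path such as /uploads/e/x/excelfile.xls from the
// first two characters of the (lower-cased) file name.
// Assumes $filename is sanitized and has at least two characters.
function shardedPath(string $baseDir, string $filename): string
{
    $name = strtolower($filename);
    return rtrim($baseDir, '/') . '/' . $name[0] . '/' . $name[1] . '/' . $filename;
}
```

A prefix search then only has to look inside one shard, for example glob(dirname(shardedPath('/uploads', $prefix)) . '/' . $prefix . '*'), provided the prefix is at least two characters long.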
You could use scandir() to list the files in the directory, instead of iterating through them one-by-one using readdir(). scandir() returns an array of the files.
However, it'd be better if you could change your file system organization - do you really need to store 20000+ files in a single directory?
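A scandir()-based version of the function might look like this. It is a sketch only: the name detectPrefixScandir() is made up, and whether it is actually faster than the readdir() loop on 20000 files would need measuring.

```php
<?php
// scandir()-based variant: returns every entry whose name starts with
// $prefix, or null when nothing matches (same convention as the question).
function detectPrefixScandir(string $dir, string $prefix): ?array
{
    $len = strlen($prefix);
    $matches = array_values(array_filter(
        scandir($dir), // sorted list of all entries, including "." and ".."
        function ($name) use ($prefix, $len) {
            return strncmp($name, $prefix, $len) === 0;
        }
    ));
    return $matches !== [] ? $matches : null;
}
```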
As the other answers mention, I'd look at glob(), scandir(), and/or the DirectoryIterator class; there is no need to reinvent the wheel.
However, watch out! Check your operating system: there may be a limit on the maximum number of files in a single directory. If so, and you keep adding files to the same directory, you will have some downtime and some problems when you reach the limit. The error will probably appear as a permissions or write failure rather than an obvious "you can't write more files in a single directory" message.
I'm not sure, but DirectoryIterator is probably a bit faster. Also add caching so that the list gets generated only when files are added or deleted.
You just need to compare the first strlen($prefix) characters of each filename. So try this:
function DetectPrefix($filePath, $prefix) {
$dh = opendir($filePath);
$len = strlen($prefix);
$files = array();
while (false !== ($filename = readdir($dh))) {
if (substr($filename, 0, $len) === $prefix) {
$files[] = $filename;
}
}
if (count($files)) {
return $files;
} else {
return null;
}
}

Ensure a user-defined path is safe in PHP

I am implementing a simple directory listing script in PHP.
I want to ensure that the passed path is safe before opening directory handles and echoing the results willy-nilly.
$f = $_GET["f"];
if(! $f) {
$f = "/";
}
// make sure $f is safe
$farr = explode("/",$f);
$unsafe = false;
foreach($farr as $farre) {
// protect against directory traversal
if(strpos($farre,"..") != false) {
$unsafe = true;
break;
}
if(end($farr) != $farre) {
// make sure no dots are present (except after the last slash in the file path)
if(strpos($farre,".") != false) {
$unsafe = true;
break;
}
}
}
Is this enough to make sure a path sent by the user is safe, or are there other things I should do to protect against attack?
It may be that realpath() is helpful to you.
"realpath() expands all symbolic links and resolves references to '/./', '/../' and extra '/' characters in the input path, and returns the canonicalized absolute pathname."
However, this function assumes that the path in question actually exists. It will not perform canonicalization for a non-existing path; in that case FALSE is returned.
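Applied to the directory listing script, that could look roughly like the sketch below. The function name resolveListingDir() and the idea of a fixed $root directory are assumptions, since the question does not say which directory the listing should be confined to.

```php
<?php
// Resolves a user-supplied directory relative to $root.
// Returns the canonical path, or null when the path does not exist,
// is not a directory, or resolves to somewhere outside $root.
function resolveListingDir(string $root, string $requested): ?string
{
    $rootReal = realpath($root);
    $target = realpath($root . '/' . $requested);
    if ($rootReal === false || $target === false || !is_dir($target)) {
        return null; // realpath() returns false for non-existent paths
    }
    if ($target !== $rootReal && strpos($target, $rootReal . DIRECTORY_SEPARATOR) !== 0) {
        return null; // resolved outside the root, e.g. via "../"
    }
    return $target;
}
```

With $f taken from $_GET["f"] as in the question, the script would list the directory only when resolveListingDir($root, $f) returns a non-null path.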
