How to safely join strings to a path in PHP? [duplicate]

This question already has answers here:
Preventing Directory Traversal in PHP but allowing paths
(7 answers)
Closed 9 years ago.
I have a constant beginning of a string and a variable ending. How can I secure the string so that it doesn't create a step-back (or step-up) if the string I inject contains ../?
Here is a short code sample:
$dir = 'my/base/path/';
$file = $dir . $userSelectedFilename;
unlink($file);
If $userSelectedFilename were '../../myFileName', I assume my script would actually try to unlink a file two directory levels up, my/myFileName, which is clearly not something I want to allow. I want to keep it under the base path my/base/path/ under all circumstances.

I suggest the following method, and only this one:
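<?php
$dir = 'my/base/path/';
$file = $dir . $userSelectedFilename;
$realDir = realpath($dir);   // false if the directory does not exist
$realFile = realpath($file); // false if the file does not exist
// beware of the three ===: strpos() returns 0 (falsy) for a match at the start
if ($realDir !== false && $realFile !== false
&& strpos($realFile, $realDir . DIRECTORY_SEPARATOR) === 0
&& is_file($realFile)) {
unlink($realFile);
}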
Why?
It is safe to rely on realpath to find out the real location of a file, which eliminates directory traversal, multibyte double-dots, etc.
After that, we check (with strpos) whether the realpath of the file really begins with the realpath of our expected directory.
Finally, we check whether the target is really a regular file and not a directory. Note that is_file() follows symlinks; a symlink pointing elsewhere is caught by the realpath comparison, because realpath resolves links.
I have seen character-eliminating solutions broken by multibyte strings and similar attacks.
So far, this method withstands all of these.
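The check above can be wrapped in a reusable function. This is a sketch, not the answerer's code: the name is_within_base() is hypothetical, and it adds one detail, appending DIRECTORY_SEPARATOR to the resolved base, so that a sibling directory such as my/base/path-evil cannot pass a plain string-prefix test against my/base/path.

```php
<?php
// Hypothetical helper: true only if $candidate resolves to an existing
// regular file located somewhere below $base.
function is_within_base(string $base, string $candidate): bool
{
    $realBase = realpath($base);
    $realFile = realpath($candidate);

    // realpath() returns false for paths that do not exist.
    if ($realBase === false || $realFile === false) {
        return false;
    }

    // Require a separator after the base so "/x/base-evil" does not
    // match "/x/base" the way a plain string prefix would.
    $prefix = rtrim($realBase, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;

    return strpos($realFile, $prefix) === 0 && is_file($realFile);
}
```

Usage would then be along the lines of if (is_within_base($dir, $file)) { unlink($file); }.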

You could filter out those characters by doing something like:
$file = preg_replace("/\.\.\//", "", $file);
which will remove every occurrence of the string ../ found in the input.
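One caveat with that filter: preg_replace() does not rescan its own output, so a crafted input can reassemble ../ after a single pass. A minimal demonstration (the function name strip_dot_dot_slash() is just for illustration):

```php
<?php
// One pass of the filter suggested above.
function strip_dot_dot_slash(string $path): string
{
    return preg_replace('#\.\./#', '', $path);
}

// "....//" contains one "../" (its 3rd to 5th characters); removing it
// leaves the first two dots next to the final slash, i.e. "../".
$input = '....//etc/passwd';
echo strip_dot_dot_slash($input), "\n"; // prints ../etc/passwd
```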
As a side note, you should probably find a different way of letting users select files to delete than having them type the path as a string, for example by showing them a listing of the files they are allowed to delete.
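The listing idea could be sketched like this, assuming PHP 7.4+ (for the arrow function) and that only regular files directly inside the base directory should be offered. The helper names deletable_files() and delete_if_listed() are hypothetical; the point is that the user's input is only compared against names the server produced itself.

```php
<?php
// Hypothetical helper: names of regular files directly inside $dir.
function deletable_files(string $dir): array
{
    $names = scandir($dir);
    if ($names === false) {
        return [];
    }
    // is_file() drops "." and "..", subdirectories and other non-files.
    return array_values(array_filter(
        $names,
        fn(string $name): bool => is_file($dir . DIRECTORY_SEPARATOR . $name)
    ));
}

// Only act on a name the server itself listed; strict in_array() avoids type juggling.
function delete_if_listed(string $dir, string $requested): bool
{
    if (!in_array($requested, deletable_files($dir), true)) {
        return false;
    }
    return unlink($dir . DIRECTORY_SEPARATOR . $requested);
}
```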

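You can write "my/base/path/../../dir/" directly. If you want the "real" path, use realpath(), which resolves the .. segments and returns an absolute path, or false if the path does not exist:
echo realpath($dir . "../../dir/"); // e.g. /absolute/path/to/my/dir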
http://php.net/manual/en/function.realpath.php
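A small runnable check of that behaviour. It builds a scratch directory under sys_get_temp_dir() (an assumption made only so the relative layout from the question, my/base/path and my/dir, actually exists) and shows that realpath() resolves the .. segments against the real filesystem and returns false for a missing target:

```php
<?php
// Scratch tree: <tmp>/rp_demo_xxx/my/base/path and <tmp>/rp_demo_xxx/my/dir
$base = sys_get_temp_dir() . '/rp_demo_' . uniqid();
mkdir($base . '/my/dir', 0777, true);
mkdir($base . '/my/base/path', 0777, true);

$dir = $base . '/my/base/path/';

// ".." segments are resolved; both sides are the same absolute path.
var_dump(realpath($dir . '../../dir/') === realpath($base . '/my/dir')); // bool(true)

// A path that does not exist yields false, not a cleaned-up string.
var_dump(realpath($dir . '../../missing/')); // bool(false)
```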

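Use a regex to check the string for a .. path segment (../ at the start, /../ in the middle, or /.. at the end) and reject the string if the regex matches:
function validatePath($path) {
// reject "..", "../x", "x/../y" and "x/.."; backslash is also treated as a separator
if (preg_match('#(^|[/\\\\])\.\.([/\\\\]|$)#', $path)) {
return false;
}
return true;
}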

Related

What is the security issue with my code?

A few years ago, I posted an answer to a question about how, in PHP, to let the user pass the relative path of a file to download in the URI while preventing directory traversal.
I got a few comments saying that the code is insecure, and a few downvotes (the most recent one today). Here's the code:
$path = $_GET['path'];
if (strpos($path, '../') !== false ||
strpos($path, "..\\") !== false ||
strpos($path, '/..') !== false ||
strpos($path, '\..') !== false)
{
// Strange things happening.
}
else
{
// The request is probably safe.
if (file_exists(dirname(__FILE__) . DIRECTORY_SEPARATOR . $path))
{
// Send the file.
}
else
{
// Handle the case where the file doesn't exist.
}
}
I reviewed the code again and again and tested it, and I still can't see what security issue it introduces.
The only hint I got in the comments is that ../ can be replaced by %2e%2e%2f. This is not an issue, since PHP will automatically transform it into ../.
What is the problem with this piece of code? What could be the value of the input which would allow directory traversal or break something in some way?
There are lots of other possibilities that could slip through, such as:
.htaccess
some-secret-file-with-a-password-in-it.php
In other words, anything in the directory or a subdirectory would be accessible, including .htaccess files and source code. If anything in that directory or its subdirectories should not be downloadable, then that's a security hole.
I've just run your code through Burp Intruder and cannot find any way around it in this case.
It was probably downvoted because of exploits against other or older technology stacks that used a similar approach of blacklisting certain character combinations.
As you mention, current versions of PHP automatically URL-decode input, but there have been flaws where techniques such as double URL encoding (dot = %252e), 16-bit Unicode encoding (dot = %u002e), overlong UTF-8 Unicode encoding (dot = %c0%2e) or inserting a null byte (%00) could trick the filter and let the server-side code interpret the path as the unencoded version once the filter had given it a thumbs up.
This is why it set alarm bells ringing. Even though your approach appears to work here, that may not hold in general. Technology is always changing, and it is best to err on the side of caution and use techniques that are immune to character-set interpretation wherever possible, such as whitelists of known-good characters that are likely to always be safe, or a file system function (realpath was mentioned in the linked answer) to verify that the actual path is the one you're expecting.
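The double-encoding case is easy to reproduce. PHP decodes $_GET values once, so %252e%252e%252f arrives as the harmless-looking %2e%2e%2f, which passes a strpos($path, '../') check. It only becomes dangerous if some later layer decodes it again, which is exactly the kind of assumption a blacklist silently depends on. In this sketch urldecode() stands in for both decoding steps, and the file name is a placeholder:

```php
<?php
// What the client sends in the query string.
$raw = '%252e%252e%252fsecret.txt';

// First decode: what PHP would place in $_GET.
$once = urldecode($raw);                            // "%2e%2e%2fsecret.txt"
$passesBlacklist = strpos($once, '../') === false;  // the filter sees no "../"

// A second decode somewhere downstream restores the traversal sequence.
$twice = urldecode($once);                          // "../secret.txt"

var_dump($passesBlacklist, $twice);
```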
I can’t think of any case in which this should fail.
However, I don't know how PHP's file_exists is implemented internally and whether it has some currently unknown quirks; PHP did, for instance, have null-byte related issues with some file system functions until PHP 5.3.4.
So to play it safe, I'd rather check the already resolved path than blindly trust PHP and, probably more important, my assumption that the four mentioned sequences are the only ones that can result in a path above the designated base directory. That's why I would prefer ircmaxell's solution to yours.
Blacklisting is a bad habit. You're better off with a whitelist (either of the literal strings allowed or of the characters allowed):
if (preg_match('/^[A-Za-z0-9_-]+$/', $path)) {
// Yay
} else {
// No
}
Or alternatively:
switch($path) {
case 'page1':
case 'page2':
// ...
break;
default:
$path = 'page1';
break;
}
include $path;
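A variation on the switch, sketched here with placeholder keys and file names, is an associative array mapping each allowed key to a fixed file. User input is then only ever used as a lookup key, never as part of a path, and anything unknown (including an array submitted as path[]=...) falls back to the default. The function name resolve_page() is hypothetical.

```php
<?php
// Hypothetical whitelist lookup: request key => file to include.
function resolve_page($requested): string
{
    $pages = [
        'page1' => __DIR__ . '/pages/page1.php',
        'page2' => __DIR__ . '/pages/page2.php',
    ];
    // Only an exact string key from the whitelist is honoured.
    if (is_string($requested) && isset($pages[$requested])) {
        return $pages[$requested];
    }
    return $pages['page1']; // default page
}

// include resolve_page($_GET['path'] ?? null);
```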

shell_exec() statement to run pdftotext on an entire directory?

I'm at a loss as to how I could build a loop that runs pdftotext on an entire directory through a shell_exec() statement.
Something like :
$pdfs = glob("*.pdf");
foreach($pdfs as $pdfs) {
shell_exec('pdftotext '.$pdfs.' '.$pdfs'.txt');
}
But I'm unsure how to drop the .pdf extension the second time I use $pdfs in my shell_exec() statement and replace it with .txt.
I'm not really sure this loop is correct either...
Try
foreach(glob("*.pdf") as $src) {
// Manually remove file extension because glob() may return a dir path component
$parts = explode('.', $src);
$parts[count($parts) - 1] = 'txt';
$dest = implode('.', $parts);
// Escape shell arguments, just in case
shell_exec('pdftotext '.escapeshellarg($src).' '.escapeshellarg($dest));
}
Basically, loop over the PDF files in the directory and execute the command for each one, using the file name with its extension swapped for the output file (so test.pdf becomes test.txt). See the edit below for why this is done manually rather than with pathinfo().
Using the result of glob() directly in foreach easily avoids the variable naming collision you had in the code above.
EDIT
I have changed the above code to remove the file extension manually when generating the output file name. This is because glob() may return path strings that include a directory component as well as a file name. Using pathinfo()'s filename or basename() would strip that directory off, and since we know that a . will be present in the file name (the pattern passed to glob() dictates this), we can safely replace everything after the last one. I have also added escapeshellarg() for good measure: file names containing spaces or shell metacharacters would otherwise break the command or be interpreted by the shell, so it is best to be safe.
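Another way to build the output name, shown here as a sketch with a hypothetical helper txt_name_for(), is to replace only a trailing .pdf with a regular expression anchored to the very end of the string. Any directory component and any other dots in the name are left alone:

```php
<?php
// Hypothetical helper: "dir.v2/report.pdf" -> "dir.v2/report.txt".
// \z anchors at the very end of the string; /i also matches ".PDF".
function txt_name_for(string $pdfPath): string
{
    return preg_replace('/\.pdf\z/i', '.txt', $pdfPath);
}

// glob() can return false on error, so fall back to an empty list.
foreach (glob('*.pdf') ?: [] as $src) {
    $cmd = 'pdftotext ' . escapeshellarg($src) . ' ' . escapeshellarg(txt_name_for($src));
    shell_exec($cmd);
}
```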
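$pdfs = glob("*.pdf");
$fmt = '/path/to/pdftotext %s %s';
foreach ($pdfs as $thispdf) {
// escapeshellarg() quotes each argument, so spaces and shell metacharacters are harmless
shell_exec(sprintf($fmt, escapeshellarg($thispdf), escapeshellarg(basename($thispdf, ".pdf") . ".txt")));
}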

PHP - Open or copy a file when knowing only part of its name?

I have a huge repository of files organized in numbered folders. Each folder contains a file whose name starts with a unique number followed by an unknown string of characters. Given the unique number, how can I open or copy this file?
For example:
I have been given the number '7656875' and nothing more.
I need to interact with a file called '\server\7656800\7656875 foobar 2x4'.
How can I achieve this using PHP?
If you know the directory name, consider using glob()
$matches = glob('./server/dir/'.$num.'*');
Then if there is only one file that should start with the number, take the first (and only) match.
Like Yacoby suggested, glob should do the trick. You can have multiple placeholders in it as well, so if you know the depth, but not the correct naming, you can do:
$matchingFiles = glob('/server/*/7656875*');
which would match
"/server/12345/7656875 foo.txt"
"/server/56789/7656875 bar.jpg"
but not
"/server/12345/subdir/7656875 foo.txt"
If you do not know the depth, glob() won't help; instead you can use a RecursiveDirectoryIterator, passing in the topmost folder path, e.g.
$iterator = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator('/server'));
foreach($iterator as $fileObject) {
// assuming the filename begins with the number
if(strpos($fileObject->getFilename(), '7656875') === 0) {
// do something with the $fileObject, e.g.
copy($fileObject->getPathname(), '/somewhere/else');
$fileObject->openFile()->fpassthru(); // fpassthru() outputs the contents itself and returns the byte count
}
}
* Note: code is untested but should work
DirectoryIterator and RecursiveDirectoryIterator return SplFileInfo objects, so you can use them to access the files directly through a high-level API.
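$dir = '/server/' . $specialNumber . '/';
// system() returns only the last line of output, and ls prints bare names, so prepend $dir
$result = system('ls ' . escapeshellarg($dir));
$fh = fopen($dir . $result, 'r');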
If it's hidden deeper in sub-sub-directories of variable depth, use find:
echo shell_exec('find . -name ' . escapeshellarg("*$input*"));
Explode and trim each result, then hope you found the correct one.

Is ASCII "../" the only byte sequence that indicates a directory traversal in PHP?

I have a PHP app that uses a $_GET parameter to select JS/CSS files on the filesystem.
If I deny all requests in which the input string contains ./, \ or a byte outside the visible 7-bit ASCII range, is this sufficient to prevent parent directory traversals when the path is passed to PHP's underlying (C-based) file functions?
I'm aware of null-byte vulnerabilities, but are there any other alternative/malformed character encodings tricks that might squeak by these checks?
Here's the basic idea (not production code):
$f = $_GET['f']; // e.g. "path/to/file.js"
// goal: select only unhidden CSS/JS files within DOC_ROOT
if (! preg_match('#^[\x20-\x7E]+$#', $f) // outside visible ASCII
|| false !== strpos($f, "./") // has ./
|| false !== strpos($f, "\\") // has \
|| 0 === strpos(basename($f), ".") // .isHiddenFile
|| ! preg_match('#\\.(css|js)$#i', $f) // not JS/CSS
|| ! is_file($_SERVER['DOCUMENT_ROOT'] . '/' . $f)) {
die();
}
$content = file_get_contents($_SERVER['DOCUMENT_ROOT'] . '/' . $f);
Update: My question is really about how the C filesystem functions interpret arbitrary ASCII sequences (e.g. if there are undocumented escape sequences), but I realize this is likely system-dependent and perhaps unanswerable in practice.
My active validation additionally requires that realpath($fullPath) start with realpath($_SERVER['DOCUMENT_ROOT']), ensuring that the file is within the DOC_ROOT, but a goal of this posting was to ditch realpath() (it's proven unreliable in various environments) while still allowing unusual, but valid URIs like /~user/[my files]/file.plugin.js.
When filtering input for security, always use whitelists, not blacklists.
You should reject all paths that don't match /^([A-Za-z0-9_-]+\/)*[A-Za-z0-9_-]+\.(js|css)$/.
This will only allow normal segmented paths where each segment consists of letters, digits, underscores or hyphens.
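The whitelist can be exercised directly. In this sketch (function names hypothetical) the extension alternation is grouped as (?:js|css); without a group, a pattern ending in \.(js)|(css)?$ is split at the top-level | into two alternatives, and the second one, (css)?$, matches the empty string at the end of any input.

```php
<?php
// Segments of [A-Za-z0-9_-] separated by single slashes, ending in .js or .css.
function is_allowed_asset(string $path): bool
{
    return preg_match('#^(?:[A-Za-z0-9_-]+/)*[A-Za-z0-9_-]+\.(?:js|css)\z#', $path) === 1;
}

// The ungrouped form, for comparison: its "(css)?$" branch matches anything.
function ungrouped_pattern_matches(string $path): bool
{
    return preg_match('/^([A-Za-z0-9_-]+\/?)*[A-Za-z0-9_-]+\.(js)|(css)?$/', $path) === 1;
}
```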
You mention it yourself, but comparing realpath of the input to a known root is the best solution I can think of. Realpath will resolve any hidden features of the path/filesystem, including symlinks.
Might require a little rearchitecting, but even if you are passed ../../passwd, basename() will insulate it. Then, you could place all of the files you want to serve in one folder.
Given ../../././././a/b/c/d.txt, basename($f) will be d.txt; this approach seems wiser to me than trying to outsmart the user and leaving a hole.
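A minimal sketch of the basename() approach, assuming all servable files have been moved into one flat directory. The ASSET_DIR path and the asset_file() name are placeholders, and the hidden-file check mirrors the one in the question:

```php
<?php
// Placeholder: the single flat directory that holds every servable file.
const ASSET_DIR = '/var/www/assets';

// Hypothetical helper: returns the file to serve, or null to refuse.
function asset_file(string $requested): ?string
{
    // basename() keeps only the last path component, so any "../" is dropped.
    $name = basename($requested);

    // Refuse empty names and hidden files such as ".htaccess" (this also covers "..").
    if ($name === '' || $name[0] === '.') {
        return null;
    }
    return ASSET_DIR . '/' . $name;
}
```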

How to perform file path operations in PHP?

Something like:
/directory/a/b - /directory/ = a/b
Is it possible to do this easily?
Since you're working with paths, platform sensitivity is important; Windows has a different path separator than most other platforms, and to write reusable code you can't snub a platform.
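PHP has a few functions to deal with paths. If you're handed a really strange path like /home/foo/bar//bitty/../index.php, use realpath to clean it up for you. Note that realpath does not expand ~ (that is a shell feature) and returns false if the path does not exist.
$path = realpath("/home/foo/bar//bitty/../index.php");
/* output, if the path exists: /home/foo/bar/index.php */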
Other functions will aid you -- for example, to get the path part of a filename by itself, use dirname:
print dirname($path);
/* output: /home/foo/bar */
Once you have that, split on the separators and do whatever work you want. The real trick is having PHP worry about all the weirdness in paths for you, and then just working with each part separately. Look into pathinfo and basename as well. I think this is what you were asking for, not how to do dumb string replacements.
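For reference, pathinfo() returns those pieces in a single call, and explode() splits the segments. This sketch works on the string alone, so nothing has to exist on disk, and it assumes / as the separator:

```php
<?php
$path = '/home/foo/bar/index.php';

// dirname, basename, extension and filename in one array.
$info = pathinfo($path);
echo $info['dirname'], "\n";   // /home/foo/bar
echo $info['basename'], "\n";  // index.php
echo $info['extension'], "\n"; // php
echo $info['filename'], "\n";  // index

// Individual segments; the 'strlen' filter drops the empty string before the leading "/".
$segments = array_values(array_filter(explode('/', $path), 'strlen'));
// ['home', 'foo', 'bar', 'index.php']
```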
Don't forget not allowing injection to your application! Working with paths from Web input is dangerous. Never trust user input.
echo str_replace("/directory/","","/directory/a/b");
To use this on other strings, put the full string in the third parameter and whatever you're "subtracting" in the first.
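One caveat with str_replace(): it removes every occurrence, not just a leading one, so /directory/x/directory/y becomes xy. If only a leading prefix should be subtracted, a strpos() check plus substr() does that; strip_prefix() is a hypothetical name used for this sketch.

```php
<?php
// Hypothetical helper: remove $prefix only when $path starts with it.
function strip_prefix(string $path, string $prefix): string
{
    if ($prefix !== '' && strpos($path, $prefix) === 0) {
        return substr($path, strlen($prefix));
    }
    return $path;
}

echo strip_prefix('/directory/a/b', '/directory/'), "\n";             // a/b
echo strip_prefix('/directory/x/directory/y', '/directory/'), "\n";   // x/directory/y
echo str_replace('/directory/', '', '/directory/x/directory/y'), "\n"; // xy
```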
Using the dirname() function and a few string functions, you can cut the original path up and get the pieces.
<?php
// from: http://php.net/manual/en/function.dirname.php
$path = "/dirname/a/b";
$dir = dirname(dirname($path));
echo "dir at front=$dir\n";
$len = strlen($dir);
$dirname = substr ( $path, 0, $len+1 );
echo "dirname=$dirname\n";
$last_2 = substr ( $path, $len+1 );
echo "last_2=$last_2\n";
?>
results in
$ php x.php
dir at front=/dirname
dirname=/dirname/
last_2=a/b
