PHP regex vulnerability bet - php

A coworker today made a bet with me that he knows of a way to supply a specially formatted string that could pass the following regex check and still supply a file name with extension .php or .jsp or .asp:
if (preg_match('/\.(jpeg|jpg|gif|png|bmp|jpe)$/i', $var) && preg_match('/\.(asp|jsp|php)$/i', $var) == false)
{
echo "No way you have extension .php or .jsp or .asp after this check.";
}
As hard as I tried myself and searched the net, I was unable to find a flaw that would make such thing possible. Could I be overlooking something? Given that "null byte" vulnerability is dealt with, what else might be the issue here?
Note: In no way am I implying that this code is a foolproof method of checking the file extension. There might be a flaw in the preg_match() function, or the file contents could be of a different format. I am only asking the question in terms of the regex syntax itself.
EDIT - actual code:
if (isset($_FILES["image"]) && $_FILES["image"]["name"] && preg_match('/\.(jpeg|jpg|gif|png|bmp|jpe)$/i', $_FILES["image"]["name"]) && preg_match('/\.(asp|jsp|php)$/i', $_FILES["image"]["name"]) == false) {
$time = time();
$imgname = $time . "_" . $_FILES["image"]["name"];
$dest = "../uploads/images/";
if (file_exists($dest) == false) {
mkdir($dest);
}
copy($_FILES['image']['tmp_name'], $dest . $imgname);
}else{
echo "Invalid image file";
}
PHP version: 5.3.29
EDIT: epilogue
It turned out that the 'vulnerability' only presents itself on Windows. Nevertheless, it did exactly what my coworker said it would: it passed the regex check and saved the file with an executable extension. The following was tested on WampServer 2.2 with PHP 5.3.13:
Passing the string test.php:.jpg to the regex check above (note the colon ":" right after the desired extension) validates it, and the copy() function appears to drop the colon and everything after it.
Again, this is only true on Windows. On Linux the file is written with exactly the name that was passed to the function.

There is no single step that directly exploits your code, but here are some thoughts.
You are passing the name to copy() in this example, but you have mentioned that you have been using this method to validate file extensions for a while now, so I assume there are other cases where the same check was used with other functions and on different PHP versions.
Consider this as a test procedure (Exploiting include, require):
$name = "test.php#.txt";
if (preg_match('/\.(xml|csv|txt)$/i', $name) && preg_match('/\.(asp|jsp|php)$/i', $name) == false) {
echo "in!!!!";
include $name;
} else {
echo "Invalid data file";
}
This prints "in!!!!" and includes the file: include executes any PHP code the file contains, whatever its name or extension, and this also holds for an uploaded file included from the tmp folder. Of course, in this case you are already owned by the attacker, but let's consider this option too.
It's not a common scenario for an upload procedure, but it's a concept that can be exploited by combining several methods:
Let's move on - If you execute:
//$_FILES['image']['name'] === "test.php#.jpg";
$name = $_FILES['image']['name'];
if (preg_match('/\.(jpeg|jpg|gif|png|bmp|jpe)$/i', $name) && preg_match('/\.(asp|jsp|php)$/i', $name) == false) {
echo "in!!!!";
copy($_FILES['image']['tmp_name'], "../uploads/".$name);
} else {
echo "Invalid image file";
}
Again, this passes the check. The file is copied into the "uploads" folder. You can't request it directly with a plain URL (everything after # is treated as a URL fragment and is not sent to the server), but the file has been injected, and the attacker might find another way or another weak point to call it later.
An example of such an execution scenario is common among sharing and hosting sites where files are served by a PHP script that (in some unsafe cases) loads the file with the wrong kind of function, such as require or include, which execute any PHP code the file contains. (file_get_contents only reads the file and does not execute it.)
NULL byte
Null byte attacks were a big weakness in PHP < 5.3, and were reintroduced by a regression in versions 5.4+ in some functions, including all the file-related functions and many more in extensions. It was patched several times, but it's still out there, and a lot of older versions are still in use. If you are dealing with an older PHP version you are definitely exposed:
//$_FILES['image']['name'] === "test.php\0.jpg";
$name = $_FILES['image']['name'];
if (preg_match('/\.(jpeg|jpg|gif|png|bmp|jpe)$/i', $name) && preg_match('/\.(asp|jsp|php)$/i', $name) == false) {
echo "in!!!!";
copy($_FILES['image']['tmp_name'], "../uploads/".$name);
} else {
echo "Invalid image file";
}
This will print "in!!!!" and copy your file under the name "test.php".
PHP fixed this by comparing the string length before and after the string is passed down to the lower-level C code that builds the actual char array: if the string is truncated by the null byte (which marks the end of a string in C), the lengths will not match.
Strangely enough even in patched and modern PHP releases it's still out there:
$input = "foo.php\0.gif";
include ($input); // Will load foo.php :)
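A defensive check that does not depend on the PHP version is to reject any name that contains a null byte before it reaches a file function. The following is a minimal sketch of that idea; the function name hasNullByte is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: true when the string contains a NUL (0x00) byte.
function hasNullByte(string $name): bool
{
    return strpos($name, "\0") !== false;
}

$name = "test.php\0.jpg";
if (hasNullByte($name)) {
    // Refuse the upload instead of passing the name on to copy() or include.
    echo "Rejected: file name contains a null byte";
}
```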
My Conclusion:
Your method of validating file extensions can be improved significantly: your code lets a PHP file called test.php#.jpg pass through when it shouldn't. Successful attacks are mostly carried out by combining several vulnerabilities, even minor ones, so you should treat any unexpected outcome or behavior as one.
Note: there are many more concerns about file names and pictures, because they are often included in pages later on, and if they are not filtered correctly and included safely you expose yourself to XSS issues as well, but that's off topic.
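One way to act on this conclusion is to whitelist the whole file name rather than only test how it ends, and to store the upload under a server-generated name. The sketch below is only an assumption about how that could look: safeImageExtension and the allowed character set are illustrative choices, not part of the original code, and random_bytes() assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the lowercased image extension when the whole
// name consists of safe characters and a single dot, otherwise null.
function safeImageExtension(string $name): ?string
{
    // \A and \z anchor to the real start and end of the string; unlike $,
    // \z does not accept a trailing newline.
    $pattern = '/\A[A-Za-z0-9_-]+\.(jpeg|jpg|gif|png|bmp|jpe)\z/i';
    if (preg_match($pattern, $name, $m) !== 1) {
        return null;
    }
    return strtolower($m[1]);
}

$ext = safeImageExtension('test.php#.jpg'); // null: '#' and the extra dot are rejected
if ($ext !== null) {
    // Store under a generated name so the client-supplied name never reaches the file system.
    $stored = bin2hex(random_bytes(8)) . '.' . $ext;
}
```

With this approach names such as test.php:.jpg are refused as well, because the colon is not in the allowed character set.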

Try this code.
$allowedExtension = array('jpeg','png','bmp'); // make list of all allowed extension
if(isset($_FILES["image"]["name"])){
$filenameArray = explode('.',$_FILES["image"]["name"]);
$extension = strtolower(end($filenameArray)); // lowercase so that "PNG" matches 'png'
if(in_array($extension,$allowedExtension)){
echo "allowed extension";
}else{
echo "not allowed extension";
}
}

preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
$var = "test.php";
if (preg_match('/\.(jpeg|jpg|gif|png|bmp|jpe)$/i', $var) === 1
&& preg_match('/\.(asp|jsp|php)$/i', $var) !== 1)
{
echo "No way you have extension .php or .jsp or .asp after this check.";
} else{
echo "Invalid file";
}
So when you are going to check with your code, use === 1.
Ideally you should use.
function isImageFile($file) {
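// PATHINFO_EXTENSION returns '' instead of raising a notice when there is no extension.
$extension = pathinfo($file, PATHINFO_EXTENSION);
return in_array(strtolower($extension),
array("jpg", "jpeg", "gif", "png", "bmp"));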
}

I remember that PHP allows strings to contain 0x00, and that in certain PHP versions < 5.3.x this character was treated as the end of the string by functions that hand the string to lower-level C code.
So, for example, if your string contains myfile.exe\0.jpg, preg_match() will match jpg, but other PHP functions such as include() or copy() will stop at myfile.exe.

Related

Can't escape the '&' in a filename such that file_exists evaluates to true

This is a PHP app running in a Linux Docker container.
A file gets uploaded from the FE that is called "A & T.pdf".
The filename is saved in the database as "A & T.pdf".
The file is saved in Azure File Storage as "A & T.pdf".
When we go to download the file, it says ERROR: File 'A' doesn't exist. It is apparently cutting the filename off before the ampersand.
$filename = get_get('file', '0', 'string', 255);
$file=$CFG->questdir.$filename;
if (file_exists($file)) {
...
} else {
echo "ERROR: File '$filename' doesn't exist";
}
I've tried a number of different things: str_replace($file, '&', '\&'), addslashes(), urlencode(), and a few others that aren't coming to mind.
Things like this should be sanitized going in, which is being fixed.
At this point, I'm just curious how to resolve this error as it exists.
Database has the correct name. Storage has the correct name. PHP doesn't like the ampersand. How do you properly escape it in the variable being passed to file_exists()?
EDIT:
Tracing the steps, it looks like the filename is getting chopped off in here:
function get_get($name,$default='',$type='string',$maxlenght=0){
if(!isset($_GET[$name])) {
$var=$default; //Default
} else {
$var=trim($_GET[$name]);
if(strlen($var)>$maxlenght) $var=substr($var,0,$maxlenght);
settype($var,$type);
if($type=="string" && !get_magic_quotes_gpc()) {
$var=pg_escape_string(my_connect(), $var);
}
}
return $var;
}
It looks like it is getting truncated at the $var=trim($_GET[$name]);.
My bet is that it's not actually PHP with this issue, as & is not a special character for PHP, and given the error it actually appears to be the space at issue. While space and & are not special characters in PHP, they are in a URL. So, I suspect what is happening is your URL is something like
http://www.example.org/script.php?name=A & T.pdf
This would need to be URL encoded
http://www.example.org/script.php?name=A%20%26%20T.pdf
PHP has a command you can use if you're setting up the URL with it, otherwise do some googling for online URL encoders: https://www.php.net/manual/en/function.urlencode.php
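If the download link is generated in PHP, the encoding step could look like the sketch below. The example.org URL and the name parameter follow the example above; everything else is an assumption about how the link is built. PHP decodes $_GET values automatically on the receiving side, so no matching decode call is needed there.

```php
<?php
$filename = 'A & T.pdf';

// rawurlencode() percent-encodes the spaces (%20) and the ampersand (%26).
$url = 'http://www.example.org/script.php?name=' . rawurlencode($filename);

// http_build_query() encodes every value of a parameter array in one call;
// PHP_QUERY_RFC3986 makes it emit %20 for spaces instead of "+".
$query = http_build_query(array('name' => $filename), '', '&', PHP_QUERY_RFC3986);
```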

how to search for a string into a lot of files using PHP

First I am new to PHP so I don't have any idea on how to accomplish this. I have a folder that is constantly getting txt files created ranging in size and text. I am trying to create somewhat of a "search engine" on a Linux system written in PHP. So far I am using the code below.
if ( $_SERVER['REQUEST_METHOD'] == 'POST'){
$path = '/example/files';
$findThisString = $_POST['text_box'];
$dir = dir($path);
while (false !== ($file = $dir->read())){
if ($file != '.' && $file != '..'){
if (is_file($path . '/' . $file)){
$data = file_get_contents($path . '/' . $file);
if (stripos($data, $findThisString) !== false){
echo '<p></p><font style="color:white; font-family:Arial">Found Match - '. $file .' <br>';
}
}
}
}
}
$dir->close();
Now this code works great! But there is one problem: once the folder reaches around 40,000 files, the search takes a good amount of time to return any results. I can't use commands such as grep; it has to be written in pure PHP like the code above.
Is there any way to optimize the code above to make it faster? Or is there a better search function I can use in PHP?
There are many reasons for why the script is so slow, and exactly what you need to do in order to decrease the time it takes depends completely upon what exact parts of the code causes the slow down.
That means that you need to put the code through a profiler, and then tweak the parts of the code that it reports are the cause. Without the profiler, all we can do is guess. Not necessarily correctly.
As noted in the comments to your question, using an already-made search engine would be the far better solution. Especially something which is purpose made for something like this, as it will cut down the time drastically.
Even the built-in grep command for Linux shells would be an improvement.
That said, I do suspect that your code is slow because you're reading and searching through the contents of all of the files in PHP. stripos() is a particularly likely suspect here, as that's a rather slow search.
Another factor might be the read() calls in the loop, as I believe they do an I/O operation on each call. Also, having a lot of calls to echo in a script can cause a slow-down, depending upon how many of those you have. A couple of hundred is not really noticeable, but a few thousand will be.
Taking these last points into consideration, and some other general changes I recommend to make your code easier to maintain, I've made the following changes to your code.
<?php
if (isset ($_POST['text_box'])) {
$path = '/example/files';
$result = search_files ($_POST['text_box'], $path);
}
/**
* Searches through the files in the given path, for the search term.
*
* @param string $term The term to search for, only "word characters" as defined by RegExp allowed.
* @param string $path The path which contains the files to be searched.
*
* @return string Either a list of links to the files, or an error message.
*/
function search_files ($term, $path) {
// Ensuring that we have a closing slash at the end of the path, so that
// we can add a file-descriptor for glob() to use.
if (substr ($path, -1) != '/') {
$path .= '/';
}
// If we don't have a valid/readable path we need to throw an error now.
// This only happens if the code itself is wrong, as it's not user-supplied,
// thus an exception is thrown.
if (!is_dir ($path) || !is_readable ($path)) {
throw new InvalidArgumentException ("Not a valid search path!");
}
// This should be validated to ensure you get sane input,
// in order to avoid erroneous responses to the user and
// possible attacks.
// Added a simple test to ensure we only accept "word characters".
if (!preg_match ('/^\w+\\z/', $term)) {
// Invalid input. Show warning to user.
return 'Not a valid search string.';
}
// Using glob so that we retrieve a list of all files in one operation.
$contents = glob ($path.'*');
// Using a holding variable, as this many echo statements take
// noticeably longer than just concatenating strings and
// echoing it out once.
$output = '';
// Using printf() templates to make the code easier to read.
// Ideally the HTML-code shouldn't be in this string either, but adding
// a templating system is far beyond the reach of this Q&A.
$outTemplate = '<p class="found">Found Match - %2$s</p>';
foreach ($contents as $file) {
// glob() returns paths that already include $path, so $file is used directly.
// Skip anything that isn't a regular file (this also covers directories).
if (!is_file ($file)) {
continue;
}
// This one is the big issue. Reading all of the files one by one will take time!
$data = file_get_contents ($file);
// Same with running a case-insensitive search!
if (stripos ($data, $term) !== false) {
// Added output escaping to prevent issues with possible meta-characters.
// (A problem also known as XSS attacks)
$name = basename ($file);
$output .= sprintf ($outTemplate, htmlspecialchars (rawurlencode ($name)), htmlspecialchars ($name));
}
}
// Lastly, if the output string is empty we haven't found anything.
if (empty($output)) {
return "Term not found";
}
return $output;
}
If you can't use Linux commands, then you have two options:
1) Store the files in a database, and run a query against the database when you need to search them.
2) Build a single index file that lists the files and their contents.
Both options reduce the script's execution time. To keep them up to date, you can write a cron task that imports new files into the database or the index file.
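The index idea from option 2 can be sketched in pure PHP as follows. This is only an illustration under stated assumptions: buildWordIndex and searchWordIndex are hypothetical names, the documents are passed in as an in-memory array rather than read from disk, and the index matches whole words only, which is narrower than the substring match stripos() performs.

```php
<?php
// Hypothetical helper: maps each lowercased word to the list of files containing it.
// $documents is an array of file name => file contents.
function buildWordIndex(array $documents): array
{
    $index = array();
    foreach ($documents as $file => $text) {
        // Split on anything that is not a word character.
        $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            $index[$word][] = $file;
        }
    }
    return $index;
}

// Looks up a single word; returns the matching file names (possibly empty).
function searchWordIndex(array $index, string $term): array
{
    $key = strtolower($term);
    return isset($index[$key]) ? $index[$key] : array();
}

$index = buildWordIndex(array(
    'a.txt' => 'Hello world',
    'b.txt' => 'hello again',
));
$hits = searchWordIndex($index, 'HELLO'); // array('a.txt', 'b.txt')
```

In practice the index would be rebuilt or extended by the cron task and saved, for example with serialize(), so that each search only has to load one file.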

How to load an image ending with any file type extension in PHP using Regular Expressions?

I have a PHP program which scans a folder named output (which contains all image files in any format) for images. The image files are the screenshots of the output in terminal (I'm using Linux) of various Java programs. The image files are named according to the class name in which the main() function resides. So for example if the main() function resides inside the public Example class, then the output screenshot will be named Example with the extension of either .jpg, .jpeg, .png or .gif. The PHP program has a front page which scans for Java files and lists them in an unordered list with links the user can click to go on the next page. The next page reads the particular file and displays the source code along with the output screenshot. Now since I'm using Fedora and Fedora takes screenshots in png format, that is quite easy like:
echo '<img src="' . $file_name . '.png">';
But what if someone uploads a Java program with the output screenshot in any format? (jpg, jpeg, png, gif). How to then load the appropriate image file since I don't know what the extension will be?
I have an answer to use foreach loop and read through every image file there is within the output folder and then use an if condition for checking the appropriate file names with the various extensions but I think it will not be a very good programming practice.
I generally try to avoid conditions while programming and use more mathematical approach cause that gives me the challenge I need and I feel my code looks different and unique compared to others' but I don't seem to make it work this time.
I'm feeling that this can be done using regular expressions but I don't know how to do it. I know regular expressions but I'm clueless to even how to use them for this. Any answer to not use regular will be appreciated but I want to make this work using regular expressions because in that way I'll also add a little bit of knowledge to my regular expression concepts.
Thanks.
Here's an alternative to MM's that uses RegEx:
function getImageFilename ($basename, $directory) {
$filenames = scandir($directory);
// preg_quote() escapes any regex metacharacters in the base name.
$pattern = "/^" . preg_quote($basename, "/") . "\.(jpeg|png|jpg|gif)$/";
foreach($filenames as $filename) {
preg_match($pattern, $filename, $matches);
if($matches) {
return $filename;
}
}
return false;
}
You can't avoid using a loop. You either loop through the possible file names and check for their existence, or you get a list of all the files in the directory and loop through them whilst performing a pattern match.
If there aren't a lot of files in the directory then this function might perform better because it only needs to call the OS once (to get a list of the files in the directory), whereas asking the OS to check for file existence multiple times requires multiple system calls. (I think that's right...)
One possible solution, you could check if the file exists with that extension (assuming you won't have multiple images with the same name but different extensions):
function get_image($file_name) {
if (file_exists($file_name . '.png')) {
return $file_name . '.png';
} elseif (file_exists($file_name . '.jpg')) {
return $file_name . '.jpg';
} elseif (file_exists($file_name . '.gif')) {
return $file_name . '.gif';
}
return false;
}
echo '<img src="' . get_image($file_name) . '">';
You define the pattern as an or list of the various extensions:
$pattern = '/\.(jpg|png|gif)$/i';
We are also making sure this is an extension by including the match with a dot (escaped) and making sure it's at the end of the string ($). The "i" at the end of that enables case-insensitive matching, so that the regex still picks up GIF or JPG in filenames.
After that, the test is fairly simple:
if (preg_match($pattern, $filename)) {
echo "File $filename is an image";
}
Putting it together in an example, you can see:
$filename = 'test.png';
$pattern = '/\.(jpg|png|gif)$/i';
if (preg_match($pattern, $filename)) {
echo "File $filename is an image";
}
https://eval.in/618651
Whether you want to wrap that in a function, is up to you, as you would have to decide what to return in case the filename does not match one of the extensions provided.
Also note that the test is based on the extension only and not on the content.
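If the test is wrapped in a function, one possible choice is to return the matched extension, or null when nothing matches. The sketch below makes that assumption; imageExtension is a hypothetical name, and the pattern adds jpeg to the list because the question mentions it.

```php
<?php
// Hypothetical helper: returns the lowercased image extension, or null when
// the file name does not end in one of the listed extensions.
function imageExtension(string $filename): ?string
{
    // The capture group holds the extension without the leading dot.
    if (preg_match('/\.(jpg|jpeg|png|gif)$/i', $filename, $matches) === 1) {
        return strtolower($matches[1]);
    }
    return null;
}

$ext = imageExtension('test.PNG'); // 'png'
```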

What is the security issue with my code?

A few years ago, I posted an answer to a question about a way, in PHP, to let the user pass in the URI the relative path to the file to download, while preventing directory traversal.
I got a few comments telling that the code is insecure, and a few downvotes (the most recent being today). Here's the code:
$path = $_GET['path'];
if (strpos($path, '../') !== false ||
strpos($path, "..\\") !== false ||
strpos($path, '/..') !== false ||
strpos($path, '\..') !== false)
{
// Strange things happening.
}
else
{
// The request is probably safe.
if (file_exists(dirname(__FILE__) . DIRECTORY_SEPARATOR . $path))
{
// Send the file.
}
else
{
// Handle the case where the file doesn't exist.
}
}
I reviewed the code again and again, tested it, and still can't understand what's the security issue it introduces.
The only hint I got in the comments is that ../ can be replaced by %2e%2e%2f. This is not an issue, since PHP will automatically transform it into ../.
What is the problem with this piece of code? What could be the value of the input which would allow directory traversal or break something in some way?
There are lots of other possibilities that could slip through, such as:
.htaccess
some-secret-file-with-a-password-in-it.php
In other words, anything in the directory or a subdirectory would be accessible, including .htaccess files and source code. If anything in that directory or its subdirectories should not be downloadable, then that's a security hole.
I've just run your code through Burp Intruder and cannot find any way around it in this case.
It was probably down voted due to exploits against other/old technology stacks which employed a similar approach by blacklisting certain character combinations.
As you mention, the current version of PHP automatically URL decodes input, but there have been flaws where techniques such as double URL encoding (dot = %252e), 16 bit Unicode encoding (dot = %u002e), overlong UTF-8 Unicode encoding (dot = %c0%2e) or inserting a null byte (%00) could trick the filter and allow the server side code to interpret the path as the unencoded version once it had been given a thumbs up by the filter.
This is why it has set alarm bells ringing. Even though your approach appears to work here, that may not be the case in general. Technology is always changing, and it is best to err on the side of caution and use techniques that are immune to character set interpretations wherever possible, such as whitelists of known good characters that are likely to always be good, or a file system function (realpath was mentioned in the linked answer) to verify that the actual path is the one you're expecting.
I can’t think of any case in which this should fail.
However, I don’t know how PHP’s file_exists is implemented internally and whether it has some currently unknown quirks. Just like PHP had null byte related issues with some file system functions until PHP 5.3.4.
So to play it safe, I’d rather like to check the already resolved path instead of blindly trusting PHP and – probably more important – my assumption, the four mentioned sequences are the only ones that can result in a path that is above the designated base directory. That’s why I would prefer ircmaxell’s solution to yours.
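Checking the already resolved path could look roughly like the sketch below. It is not ircmaxell's code, which is not reproduced here; resolveInsideBase is a hypothetical helper that shows the general realpath() technique under the assumption that only existing files below a single base directory may be served.

```php
<?php
// Hypothetical helper: returns the resolved path when $relative stays inside
// $baseDir, or null when it escapes the base or does not exist.
function resolveInsideBase(string $baseDir, string $relative): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // the base directory does not exist
    }
    // realpath() resolves "..", "." and symlinks, and fails for missing files.
    $target = realpath($base . DIRECTORY_SEPARATOR . $relative);
    if ($target === false) {
        return null;
    }
    // Require the resolved target to sit strictly below the base directory.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($target, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $target;
}
```

The comparison is made on the resolved path, so it does not matter which encoding or character sequence was used to express the traversal in the request.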
Blacklisting is a bad habit. You're better off with a whitelist (either on the literal strings allowed or on the characters allowed.)
if(preg_match('/^[A-Za-z0-9\-\_]*$/', $path) ) {
// Yay
} else {
// No
}
Or alternatively:
switch($path) {
case 'page1':
case 'page2':
// ...
break;
default:
$path = 'page1';
break;
}
include $path;

Why the following upload if condition does not work?

So I have an upload script, and I want to check the file type that is being uploaded. I only want pdf, doc, docx and text files
So I have:
$goodExtensions = array('.doc','.docx','.txt','.pdf', '.PDF');
$name = $_FILES['uploadFile']['name'];
$extension = substr($name, strpos($name,'.'), strlen($name)-1);
if(!in_array($extension,$goodExtensions) || (($_FILES['uploadFile']['type'] != "applicatioin/msword") || ($_FILES['uploadFile']['type'] != "application/pdf"))){
$error['uploadFile'] = "File not allowed. Only .doc, .docx, .txt and pdf";
}
Why I'm getting the error when testing and including correct documents?
Since you are using OR instead of AND in your expression:
if (!in_array($extension,$goodExtensions)
|| (($_FILES['uploadFile']['type'] != "applicatioin/msword")
|| ($_FILES['uploadFile']['type'] != "application/pdf"))) {
$error['uploadFile'] = "File not allowed. Only .doc, .docx, .txt and pdf";
}
this always evaluates to true: if the file extension is listed in the array goodExtensions, the first expression is false. However, since the file type can not be both Word and PDF at the same time, the second bracketed expression is always true.
So if you want to ensure that both the file extension and the MIME type are good, the correct expression would be (including the fix for the typo in "applicatioin/msword"):
if (!in_array($extension,$goodExtensions)
|| (($_FILES['uploadFile']['type'] != "application/msword")
&& ($_FILES['uploadFile']['type'] != "application/pdf"))) {
$error['uploadFile'] = "File not allowed. Only .doc, .docx, .txt and pdf";
}
The third parameter for substr is the length, not the end position. If you want everything up until the end of the string just omit the third parameter entirely:
$extension = substr($name, strpos($name,'.'));
You've also spelt application wrong in applicatioin/msword.
Finally, you might want to use strrpos instead of strpos, in case the filename contains other dots before the one separating the extension.
Edit: the logic in the if statement is wrong as well. You raise the error if the extension isn't known, or the type is not MS Word, or the type is not PDF. The type can't be both of those at once, so the check will always fail. You want the last || to be a &&, I think.
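Putting these fixes together (strrpos for the last dot, no length argument to substr, and a condition that requires both checks to pass), the validation might be sketched as follows. The function name isAllowedUpload and the extension-to-MIME map are assumptions made for illustration. Note also that $_FILES['uploadFile']['type'] is supplied by the client, so it should not be relied on as the only check.

```php
<?php
// Hypothetical helper: checks an upload's extension and its reported MIME type.
// The extension-to-MIME map below is an illustrative assumption.
function isAllowedUpload(string $name, string $mimeType): bool
{
    $allowed = array(
        'doc'  => array('application/msword'),
        'docx' => array('application/vnd.openxmlformats-officedocument.wordprocessingml.document'),
        'txt'  => array('text/plain'),
        'pdf'  => array('application/pdf'),
    );
    // strrpos() finds the last dot, so "my.notes.txt" yields "txt".
    $dot = strrpos($name, '.');
    if ($dot === false) {
        return false; // no extension at all
    }
    $extension = strtolower(substr($name, $dot + 1));
    // Both conditions must hold: a known extension AND a matching MIME type.
    return isset($allowed[$extension])
        && in_array($mimeType, $allowed[$extension], true);
}
```

Lowercasing the extension removes the need to list both '.pdf' and '.PDF' in the allowed set.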
Probably because one (or more) of those 3 conditions in the if statement returns true.
Why I'm getting the error when testing and including correct documents?
I don't know, but you would do well to take the big "if" apart into singular blocks to find the error.
Make test outputs of the MIME type and file extension.
echo "Extension = ".$extension."<br>";
echo "MIME Type = ".$_FILES['uploadFile']['type'];
Also, one thing that jumps out is a typo in applicatioin/msword.
