PHP preg_replace RCE [duplicate] - php

I'm currently improving my knowledge about security holes in HTML, PHP, JavaScript etc.
A few hours ago, I stumbled across the /e modifier in regular expressions and I still don't get how it works. I've taken a look at the documentation, but that didn't really help.
What I understood is that this modifier can be manipulated to let someone execute PHP code (for example, through preg_replace()). I've seen the following example describing a security hole, but it wasn't explained, so could someone please explain to me how to call phpinfo() in the following code?
$input = htmlentities("");
if (strpos($input, 'bla'))
{
echo preg_replace("/" .$input ."/", $input ."<img src='".$input.".png'>", "bla");
}

The e Regex Modifier in PHP with example vulnerability & alternatives
What e does, with an example...
The e modifier is a deprecated regex modifier which allows you to use PHP code in the replacement string of preg_replace(). This means that whatever you pass in will be evaluated as part of your program.
For example, we can use something like this:
$input = "Bet you want a BMW.";
echo preg_replace("/([a-z]*)/e", "strtoupper('\\1')", $input);
This will output BET YOU WANT A BMW.
Without the e modifier, we get this very different output:
strtoupper('')Bstrtoupper('et')strtoupper('') strtoupper('you')strtoupper('') strtoupper('want')strtoupper('') strtoupper('a')strtoupper('') strtoupper('')Bstrtoupper('')Mstrtoupper('')Wstrtoupper('').strtoupper('')
Potential security issues with e...
The e modifier is deprecated for security reasons. Here's an example of an issue you can run into very easily with e:
$password = 'secret';
...
$input = $_GET['input'];
echo preg_replace('|^(.*)$|e', '"\1"', $input);
If I submit my input as "$password", the output of this call will be secret. It is therefore very easy for me to read session variables and any other variable used on the back end, and even to take deeper control of your application (for example, by getting a call such as system('cat /etc/passwd') evaluated) through this simple piece of poorly written code.
Like the similarly deprecated mysql libraries, this doesn't mean that you cannot write secure code using e, just that it is more difficult to do so.
What you should use instead...
You should use preg_replace_callback in nearly all places you would consider using the e modifier. The code is definitely not as brief in this case but don't let that fool you -- it's twice as fast:
$input = "Bet you want a BMW.";
echo preg_replace_callback(
    "/([a-z]*)/",
    // $matches[0] holds the full text of the current match
    function ($matches) {
        return strtoupper($matches[0]);
    },
    $input
);
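For comparison, JavaScript's String.prototype.replace accepts a callback in much the same way as preg_replace_callback. The sketch below is only an analogue of the PHP code above, not a port of it; the variable names are illustrative. It shows the key property of the callback approach: the matched text reaches the function as plain data and is never evaluated as code.

```javascript
// Callback-based replacement: each match is handed to a function as a string.
const input = "Bet you want a BMW.";
const output = input.replace(/[a-z]+/g, (match) => match.toUpperCase());
console.log(output);

// Input that looks like code is still treated as text, not executed.
const hostile = "phpinfo()";
const hostileOutput = hostile.replace(/[a-z]+/g, (match) => match.toUpperCase());
console.log(hostileOutput);
```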
On performance, there's no reason to use e...
Unlike the mysql libraries (which were also deprecated for security purposes), e is not quicker than its alternatives for most operations. For the example given, it's twice as slow: preg_replace_callback (0.14 sec for 50,000 operations) vs e modifier (0.32 sec for 50,000 operations)

The e modifier is a PHP-specific modifier that triggers PHP to run the resulting string as PHP code. It is basically an eval() wrapped inside a regex engine.
eval() on its own is considered a security risk and a performance problem; wrapping it inside a regex amplifies both those issues significantly.
It is therefore considered bad practice; it was formally deprecated in PHP 5.5 and removed entirely in PHP 7.0.
PHP has provided for several versions now an alternative solution in the form of preg_replace_callback(), which uses callback functions instead of using eval(). This is the recommended method of doing this kind of thing.
With specific regard to the code you've quoted:
I don't see an e modifier in the sample code you've given in the question. It has a slash at each end as the regex delimiter; the e would have to be outside of that, and it isn't. Therefore I don't think the code you've quoted is likely to be directly vulnerable to having an e modifier injected into it.
However, if $input contains any / characters, it will be vulnerable to being entirely broken (i.e. throwing an error due to an invalid regex). The same would apply if it contained anything else that made it an invalid regular expression.
Because of this, it is a bad idea to use an unvalidated user input string as part of a regex pattern - even if you are sure that it can't be hacked to use the e modifier, there's plenty of other mischief that could be achieved with it.
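One common mitigation is to escape user input before embedding it in a pattern. PHP provides preg_quote() for this. The sketch below shows the same idea in JavaScript, where escapeRegExp is a hypothetical helper written for this illustration and not a built-in.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter
// (and the "/" delimiter) so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/-]/g, "\\$&");
}

const userInput = "a/b.c*";
const escaped = escapeRegExp(userInput);
const pattern = new RegExp("^" + escaped + "$");

console.log(pattern.test("a/b.c*")); // the literal text matches
console.log(pattern.test("a/bXc"));  // "." no longer means "any character"

// Unescaped input can simply break the pattern.
let threw = false;
try {
  new RegExp("(");
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw);
```

Escaping only prevents the input from being interpreted as regex syntax; it does not replace validating what the input is allowed to contain.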

As explained in the manual, the /e modifier makes preg_replace() evaluate the replacement string, after the backreferences have been substituted, as PHP code. The example given in the manual is:
$html = preg_replace(
'(<h([1-6])>(.*?)</h\1>)e',
'"<h$1>" . strtoupper("$2") . "</h$1>"',
$html
);
This matches any "<hX>XXXXX</hX>" text (i.e. headline HTML tags), replaces this text with "<hX>" . strtoupper("XXXXX") . "</hX>", then executes "<hX>" . strtoupper("XXXXX") . "</hX>" as PHP code, then puts the result back into the string.
If you run this on arbitrary user input, any user has a chance to slip something in which will actually be evaluated as PHP code. A user who does this correctly can execute any code they want. In the above example, imagine if in the second step the text were "<hX>" . strtoupper("" . shell_exec('rm -rf /') . "") . "</hX>".

It's evil, that's all you need to know :p
More specifically, it generates the replacement string as normal, but then runs it through eval.
You should use preg_replace_callback instead.

Related

Convert JQuery RegEx into PHP RegEX

Sorry for this question as it is very specific.
I have a JQuery validation RegEx that I would like to use on the back end too:
var forNames = new RegExp("^[^0-9<>'\"/;`%]*$");
I tried in PHP
preg_match('/^[^0-9<>\'\"/;`%]{2,42}$/', $first_name) // I also want to keep the length between 2 and 42 here)
but it does not work, I get Unknown modifier ';' in
The other question, similar to this one is what this person is asking here
Converting Javascript Regex to PHP
I tried his solution, copying the php email validation regex into JQuery with no luck
Thank you
P.S. I just reverted what I had added to the regex, because I didn't see that the question already had answers and the change was confusing.
You need to escape the / character in your PHP regex string, because that's also the character which is used to signify the end of a regexp (it's called a delimiter):
preg_match('/^[^0-9<>\'\"/;`%]{2,42}$/', $first_name)
                         ^
becomes:
preg_match('/^[^0-9<>\'\"\/;`%]{2,42}$/', $first_name)
The reason you didn't need to do this in your JavaScript code is that you used the RegExp constructor, which takes the pattern as a string, so there is no / delimiter for the slash to terminate. If you had used a RegExp literal you would have had to escape it too:
var forNames = /^[^0-9<>'\"\/;`%]*$/;
As @DelightedD0D commented, make sure to test your RegExp with an interactive tool like regex101. It supports both PHP- and JS-style regexps, and it is how I was able to catch your error so fast.
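A quick way to check the constructor-versus-literal point is to build the pattern both ways and compare how they behave. This is a minimal sketch; the sample names are made up for the illustration.

```javascript
// Constructor form: the pattern is a string, so "/" needs no escaping.
const fromString = new RegExp("^[^0-9<>'\"/;`%]*$");
// Literal form: "/" is the delimiter, so it must be written as "\/".
const fromLiteral = /^[^0-9<>'"\/;`%]*$/;

const samples = ["Anne-Marie", "a/b", "R2D2", ""];
const results = samples.map((s) => [fromString.test(s), fromLiteral.test(s)]);
console.log(results);
```

Both forms accept "Anne-Marie" and the empty string, and both reject "a/b" and "R2D2".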

Confusion with ereg_replace() Beginning PHP and MySQL Example by W Jason Gilmore

Just a note to begin I am aware that ereg_replace() is deprecated, since POSIX is no longer being used. But in "Beginning PHP and MySQL" by W Jason Gilmore, Gilmore emphasizes that although POSIX isn't to be used, an understanding is still necessary as a means of conversion to Perl. So once again I understand it's deprecated but since I'm trying to understand everything in the book I might as well understand this.
So the example is as follows:
<?php
$text = "This is a link to http://www.example.com/.";
echo ereg_replace("http://([a-zA-Z0-9./-]+)$", "\\0",
$text);
?>
//Output
This is a link to http://www.example.com/..
So I understand the majority of the code in the above example; my problem lies with the ./- and the output. For the ./- I tried to think in terms of quantifiers where . = between, so everything between [:alnum:] and / is replaced. I also thought that maybe ./- are characters within the range which would also be replaced, since [:alnum:] doesn't include punctuation. For verification I looked at the output, but there's no - present. If only the / is replaced then the code would make sense, since \0 outputs http://www.example.com/, but then the problem lies with the missing -, which I presume to be pertinent to the brackets rather than to be a quantifier.
My other question is in regard to the output: if the function returns the string with the modified text, why does the period which was present in the original string appear after the second \0 and not the first? If it's the original text, why does the tag follow it and not precede it?
Just for some quick background, I have a basic understanding of php,html,css,javascript,C++ and I'm reading this for a more in depth understanding of php and an introduction to MySQL, so unfortunately explanations which are entirely advanced code/concepts go right over my head.
why does the period which was present in the original string appear after the second /0, not the first
This is not the case, because the actual output is:
This is a link to http://www.example.com/.
The period is included in both the attribute as well as the tag contents.
my problem lies with the ./- and the output
When present inside a character set, ./- means to match either a period, a forward slash or a dash. Note that the dash is placed at the end of the character set so that it is read as a literal dash rather than as a range operator.
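The character-class behaviour is the same in most regex flavours, so it can be checked quickly in JavaScript. The sketch below reuses the pattern from the question (with the slashes escaped for a JavaScript literal) and then contrasts a trailing dash with a dash placed between two characters.

```javascript
// Same character class as in the question: letters, digits, ".", "/" and "-".
const urlAtEnd = /http:\/\/([a-zA-Z0-9./-]+)$/;
const m = "This is a link to http://www.example.com/.".match(urlAtEnd);
console.log(m[0]); // whole match, including the trailing period
console.log(m[1]); // the part captured by the parentheses

// A trailing dash is a literal dash...
const literalDash = /[.9-]/.test("-");
// ...but between two characters it forms a range ("." through "9"),
// which covers "/" and the digits and does not include "-" itself.
const rangeDash = /[.-9]/.test("-");
const rangeSlash = /[.-9]/.test("/");
console.log(literalDash, rangeDash, rangeSlash);
```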

Parsing link from javascript function

I'm trying to parse a direct link out of a javascript function within a page. I'm able to parse the html info I need, but am stumped on the javascript part. Is this something that is achievable with php and possibly regex?
function videoPoster() {
document.getElementById("html5_vid").innerHTML =
"<video x-webkit-airplay='allow' id='html5_video' style='margin-top:"
+ style_padding
+ "px;' width='400' preload='auto' height='325' controls onerror='cantPlayVideo()' "
+ "<source src='http://video-website.com/videos/videoname.mp4' type='video/mp4'>";
}
What I need to pull out is the link "http://video-website.com/videos/videoname.mp4". Any help or pointers would be greatly appreciated!
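/http:\/\/.*\.mp4/ will give you all characters between http:// and .mp4, inclusive. The slashes inside the pattern must be escaped because / is also the delimiter.
See it in action.
If you need the session id, use something like /http:\/\/.*\.mp4\?sessionid=\d+/ (the ? is escaped so that it matches a literal question mark instead of making the 4 optional).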
In general, no. Nothing short of a full javascript parser will always extract urls, and even then you'll have trouble with urls that are computed nontrivially.
In practice, it is often best to use the simplest capturing regexp that works for the code you actually need to parse. In this case:
['"](http://[^'"]*)['"]
If you have to enter that regexp as a string, beware of escaping.
If you ever have unescaped quotation marks in urls, this will fail. That's valid but rare. Whoever is writing the stuff you're parsing is unlikely to use them because they make referring to the urls in javascript a pain.
For your specific case, this should work, provided that none of the characters in the URL are escaped.
preg_match("/src='([^']*)'/", $html, $matches);
$url = $matches[1];
See the preg_match() manual page. You should probably add error handling, ensuring that the function returns 1 (that the regex matched) and possibly performing some additional checks as well (such as ensuring that the URL begins with http:// and contains .mp4?).
(As with all Web scraping techniques, the owner or maintainer of the site you are scraping may make a future change that breaks your script, and you should be prepared for that.)
The following captures every http or https URL found in a src attribute in your HTML:
$matches=array();
if (preg_match_all('/src=["\'](?P<urls>https?:\/\/[^"\']+)["\']/', $html, $matches)){
print_r($matches['urls']);
}
if you want to do the same in javascript you could use this:
var matches;
if (matches=html.match(/src=["'](https?:\/\/[^"']+)["']/g)){
//gives you all matches, but they are still including the src=" and " parts, so you would
//have to run every match again against the regex without the g modifier
}
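In newer JavaScript engines (ES2020 and later), String.prototype.matchAll avoids the second pass, because it yields every match together with its capture groups. A minimal sketch, assuming a sample html string in which the second tag and its example.com URL are invented for the illustration:

```javascript
// Sample input; the <img> tag and its URL are made up for this example.
const html =
  "<source src='http://video-website.com/videos/videoname.mp4' type='video/mp4'>" +
  '<img src="https://example.com/poster.png">';

const pattern = /src=["'](https?:\/\/[^"']+)["']/g;

// matchAll needs the g flag; m[1] is the URL captured by the parentheses.
const urls = Array.from(html.matchAll(pattern), (m) => m[1]);
console.log(urls);
```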

Exclude Literal backslash in Javascript Regular Expression

I'm writing a php forms class with client and server side validation. I'm having problems checking if a literal backslash ("\") exists in a string using regular expressions in javascript.
I want to shy away from solutions other than using regex as this will reduce the amount of special cases between php and js AND reduce the amount of conditional code I need to write.
I've just been using this as an example of what a user may need in this forms class-
A password field that is a string
between 6 and 12 chars long and that
excludes "\","#","$","`"
I have tried:
^[^(\u0008#\$`)]{6,12}$
^[^(\b#\$`)]{6,12}$
^[^(\\#\$`)]{6,12}$
And none of them work for a backslash and I can't work out why. FYI: The latter works fine in PHP.
The regular expression \\ matches a single backslash. In JavaScript, this becomes re = /\\/ or re = new RegExp("\\\\").
ripped straight from http://www.regular-expressions.info/javascript.html
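It looks like you intended the parentheses as a grouping of slash-hash-dollar-tick. Inside a character class, however, parentheses do not group anything; they are literal characters, so your patterns also exclude ( and ). List just the characters you want to reject.
Try this:
var rgx = /^[^\\#$`]{6,12}$/;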
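The points above can be checked with a few test strings. This is a sketch with made-up sample passwords; note that in JavaScript string literals a single backslash is written as "\\".

```javascript
// Literal form: "\\" inside the class stands for one backslash.
const passwordPattern = /^[^\\#$`]{6,12}$/;
// String form: every backslash must be doubled again for the string literal.
const fromString = new RegExp("^[^\\\\#$`]{6,12}$");
// Third pattern from the question: the parentheses are literal characters here.
const withParens = /^[^(\\#\$`)]{6,12}$/;

const checks = {
  plain: passwordPattern.test("abc123"),
  backslash: passwordPattern.test("abc\\123"), // the string abc\123
  backslashViaString: fromString.test("abc\\123"),
  hash: passwordPattern.test("abc#123"),
  tooShort: passwordPattern.test("abc"),
  parenAllowed: passwordPattern.test("pass(word"),
  parenRejected: withParens.test("pass(word"),
};
console.log(checks);
```

Only the plain password and the one containing a parenthesis pass the corrected pattern; the pattern with parentheses inside the class rejects "pass(word" as well.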

PHP preg_match Math Function

I'm writing a script that will allow a user to input a string that is a math statement, to then be evaluated. However, I have hit a roadblock. I cannot figure out how, using preg_match, to disallow statements that have variables in them.
Using this, $calc = create_function("", "return (" . $string . ");" ); $calc();, allows users to input a string that will be evaluated, but it crashes whenever something like echo 'foo'; is put in place of the variable $string.
I've seen this post, but it does not allow for math functions inside the string, such as $string = 'sin(45)';.
For a stack-based parser implemented in PHP that uses Dijkstra's shunting-yard algorithm to convert infix to postfix notation, with support for functions with a varying number of arguments, you can look at the source for the PHPExcel calculation engine (which does not use eval).
Also have a look at the responses to this question
How complex of a math function do you need to allow? If you only need basic math, then you might be able to get away with only allowing whitespace + the characters 0123456789.+/-* or some such.
In general, however, using the language's eval-type capabilities to just do math is probably a bad idea.
Something like:
^([\d\(\)\+\-*/ ,.]|sin\(|cos\(|sqrt\(|...)+$
would allow only numbers, brackets, math operations and the listed math functions. But it won't check whether the provided expression is valid, so something like +++sin()))333((( would still be accepted.
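The behaviour described above is easy to confirm. The sketch below ports the pattern to a JavaScript literal and includes only the three function names spelled out in the answer; the functions elided by "..." are left out, and the sample expressions are made up.

```javascript
// Whitelist: digits, brackets, operators, space, comma, period,
// plus the function openers sin(, cos( and sqrt(.
const allowed = /^(?:[\d()+\-*\/ ,.]|sin\(|cos\(|sqrt\()+$/;

const validMath = allowed.test("sin(45) + 2 * sqrt(16)");
const phpCode = allowed.test("echo 'foo';");
const nonsense = allowed.test("+++sin()))333(((");
console.log(validMath, phpCode, nonsense);
```

The last result shows why a whitelist alone is not enough: the string contains only permitted tokens, so it passes, even though it is not a well-formed expression.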
I wonder if this class would help you? Found that doing a search on Google for "php math expressions".
