So, I am studying some PHP security using DVWA (http://www.dvwa.co.uk/). Right now I'm on an exercise where the author tries to teach us to execute commands on vulnerable applications. In this level, it adds a very simple blacklist which removes important characters:
$substitutions = array(
'&&' => '',
';' => '',
);
I can obviously use other characters to still get code executed (like |, ||, &, etc.), but I wanted to know how I'd evade the substitution for the single character ";". I've seen some examples that fool the substitution with input like "<scr<script>ipt>", and I've tried things like ";;;" as well as hex and base64 encoding, but none of it worked.
Is there a way to evade str_replace() when it is looking for a single character? This is PHP 5.5.3.
I found this page to be useful when I was doing this. It turns out there are other operators which can be used other than ';' to plug your own command in!
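A minimal sketch of why this blacklist falls short, assuming the filter is applied with str_replace() over the $substitutions array from the question (the function name filter_target is made up for illustration). Every ";" is removed in one pass, so doubling it up does not help, but the pipe passes through untouched, and because str_replace() handles the search strings in order, removing ";" after "&&" can reassemble a "&&" that was not in the input.

```php
<?php
// Hypothetical reconstruction of the filter: the function name is an
// assumption, the substitution table is the one quoted in the question.
function filter_target(string $target): string
{
    $substitutions = array(
        '&&' => '',
        ';'  => '',
    );
    // str_replace() works through the search strings one after another,
    // each time operating on the result of the previous replacement.
    return str_replace(array_keys($substitutions), $substitutions, $target);
}

// All semicolons disappear in a single pass, so ';;;' leaves nothing behind.
echo filter_target('127.0.0.1;;; id'), "\n";   // 127.0.0.1 id

// Operators that are not on the list survive unchanged.
echo filter_target('127.0.0.1 | id'), "\n";    // 127.0.0.1 | id

// '&&' is checked first, ';' second: stripping the ';' rebuilds '&&'.
echo filter_target('127.0.0.1 &;& id'), "\n";  // 127.0.0.1 && id
```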
The "hard" setting on this is currently causing me some trouble. I think there may be a workaround using URL-encoded characters or something of the sort, but that remains to be seen.
I'm not sure why the author is showing how to use a blacklist; it's too easily subverted. Perhaps this idea is shredded further on in the tutorial. http://en.wikipedia.org/wiki/Secure_input_and_output_handling
Although the example you link to is the 'medium' level, even the 'harder' level does not use PHP's FILTER_VALIDATE_IP filter.
Even a REGEX would do a better job. See half way down the page of: http://www.regular-expressions.info/examples.html
If you are trying to protect against XSS attacks (you mention a mangled script tag) then white-listing is the way to go. Validate against what you expect to get, or abort.
EDIT
Hmmm.. now I see the site is called Damn Vulnerable Web App, perhaps the idea is to teach you all the poor examples ...
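The whitelisting advice above can be sketched as follows, assuming the exercise only ever needs an IP address as input. The function name validated_ip is hypothetical; filter_var() and FILTER_VALIDATE_IP are part of PHP's filter extension.

```php
<?php
// Whitelist sketch: accept the input only if the whole string is a
// well-formed IP address, otherwise reject it outright.
function validated_ip(string $input): ?string
{
    $ip = filter_var(trim($input), FILTER_VALIDATE_IP);
    return $ip === false ? null : $ip;
}

var_dump(validated_ip('127.0.0.1'));        // string(9) "127.0.0.1"
var_dump(validated_ip('127.0.0.1 | id'));   // NULL
var_dump(validated_ip('127.0.0.1 &;& id')); // NULL
```

Nothing needs to be stripped or substituted here: input that is not exactly what was expected never reaches the shell. If the value is later placed on a command line, escapeshellarg() can serve as an additional layer.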
I'm working on a WordPress site with some other developers, and the code they wrote to set up custom variables for Google Analytics, via _setCustomVar, uses html_entity_decode. They pointed to the well-known and widely used Yoast plugin, which uses a similar technique. I can't figure out why you would use it that way, though.
At no point (that I can see) does the string get encoded, so the function doesn't do anything. WordPress delivers whole strings, even with accents on them, never anything encoded, so there aren't rogue encoded characters to worry about. In fact, the one thing you don't want to do is send Google Analytics a mess of HTML, right?
I've changed it because I'm pretty sure html_entity_decode does nothing about single quotes, and in a JS script where strings are delimited by single quotes, that means any variable containing an apostrophe breaks Google Analytics tracking entirely.
Instead, I'm cleaning strings using strip_tags and esc_js (a WordPress function).
I'm a little concerned because the linked script is very commonly used. It seems like I must be wrong about something and I don't want to screw up my own script because of it.
What am I missing?
The answer seems to be that Yoast uses that code as a 'just in case' measure for strings that might have encoded characters in them. It still doesn't seem to take care of quote marks though, which is a pretty big deal.
Here's the code I wrote to solve all the issues: https://gist.github.com/AramZS/8930496
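For readers outside WordPress, a rough sketch of the same idea. This is not the Yoast code or the linked gist: esc_js() is swapped for json_encode(), which produces a complete, quoted JavaScript string literal, and the helper name js_string is made up for illustration.

```php
<?php
// html_entity_decode() only turns entities back into characters; a plain
// apostrophe passes through unchanged.
$title = "Editor's <b>picks</b>";
var_dump(html_entity_decode($title) === $title);   // bool(true)

// Hypothetical helper: strip markup, then let json_encode() quote and
// escape the value. The JSON_HEX_* flags turn ', ", <, > and & into
// \uXXXX escapes so the result is safe inside an inline <script>.
function js_string(string $value): string
{
    $flags = JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP;
    return json_encode(strip_tags($value), $flags);
}

// The literal already carries its own double quotes, so none are added here.
echo "_gaq.push(['_setCustomVar', 1, 'title', " . js_string($title) . ", 3]);\n";
```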
Sometimes text on my pages looks very strange, real example:
trained&nbsp;professionals&nbsp;and&nbsp;paraprofessionals&nbsp;coming&nbsp;together
...While the parent div is quite narrow so the text is just sticking out of it.
And it looks quite strange, because &nbsp; actually represents a space.
So, I wonder if it's possible to make the browser treat these characters as actual spaces and break the line where necessary, without actually replacing them?
EDIT
Why is blind replacement a problem?
Because &nbsp; may be needed sometimes.
Consider the following example:
Ranks:<br>
Marshall<br>
Leutenant<br>
Sergeant
If I just use a preg_replace on them, it would look different in the end.
(I would also consider suggestions if you have any ideas on replacing them smartly on the PHP side, i.e. some algorithm that wouldn't affect formatting.)
By definition, &nbsp; is a non-breaking space. Its very meaning is that it must not be broken across line endings. If this is not what you intend, then I suggest fixing the HTML instead of trying to force the browser into non-standard behaviour.
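If fixing the HTML at the source is not an option, one possible PHP-side approach is sketched below. It assumes the text contains the literal entity &nbsp; and that the spaces worth preserving (such as the indentation in the ranks example) occur in runs or next to tags. The function name soften_nbsp is hypothetical.

```php
<?php
// Turn a non-breaking space into a normal space only when it sits directly
// between two word characters. Runs of &nbsp; and those adjacent to tags or
// punctuation are left alone, so indentation built from them survives.
function soften_nbsp(string $html): string
{
    return preg_replace('/(?<=\w)&nbsp;(?=\w)/', ' ', $html);
}

echo soften_nbsp('trained&nbsp;professionals&nbsp;and&nbsp;paraprofessionals'), "\n";
// trained professionals and paraprofessionals

echo soften_nbsp('Ranks:<br>&nbsp;&nbsp;Marshall'), "\n";
// Ranks:<br>&nbsp;&nbsp;Marshall
```

This is only a heuristic: it ignores a raw U+00A0 character and cases such as "word,&nbsp;word", which would need their own rules.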
Is it possible to write a regular expression which checks if a string (some code) is minified?
Many PHP/JS obfuscators remove white space chars (among other things).
So, the final minified code sometimes looks like this:
PHP:
$a=array();if(is_array($a)){echo'ok';}
JS:
a=[];if(typeof(a)=='object'&&(a instanceof Array)){alert('ok')}
In both cases there are no space chars before or after "{", "}", ";", etc. There are also some other patterns which can help. I am not expecting a high-accuracy regex; I just need one which checks whether at least 100 chars of a string look like minified code.
Thanks in advance.
PURPOSES: web malware scanner
I think a minifier will strip all newline characters, although there might possibly be one at the end of the file still if the minified code was pasted back in a text editor. Something like this will probably be fairly accurate:
/^[^\n\r]+(\r\n?|\n)?$/
That just tests that there are no newline characters in the whole thing except for possibly one at the end. So no guarantees, but I think it will work well on any longish block of code.
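A small PHP sketch of this check, for illustration only. The function name and the $minLength parameter are assumptions; the 100-character default comes from the question.

```php
<?php
// Returns true when the code is one long line, optionally followed by a
// single trailing line break. This is a heuristic, not proof of minification.
function is_single_line(string $code, int $minLength = 100): bool
{
    if (strlen($code) < $minLength) {
        return false; // too short to say anything useful
    }
    return preg_match('/^[^\n\r]+(\r\n?|\n)?$/', $code) === 1;
}

$minified = str_repeat('$a=array();', 10);         // 110 characters, one line
var_dump(is_single_line($minified));               // bool(true)
var_dump(is_single_line($minified . "\n"));        // bool(true)
var_dump(is_single_line($minified . "\n\$b=1;"));  // bool(false)
```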
The short answer is "no", regex cannot do this.
Your best bet will probably be to do a statistical analysis of the source files, and compare against some known heuristics. For instance, by comparing the variable names against those often found in minimized code. A minimized file probably has a lot of one-character variable names, for instance... and won't have two-character variable names until all the one-character variable names are exhausted... etc.
Another option would be simply to run the source file through a minimizer, and see if the output is sufficiently different from the input. If not, it was probably already minimized.
But I have to agree with sg3s's final sentence: If you can explain why you need this, we can probably provide more useful answers to your actual needs.
No. The syntax and intent of the code don't change when it is minified, and some people who are very familiar with PHP and/or JS write simple functions on one line without any whitespace at all (me :s).
What you could do is count all the whitespace characters in a string though this would also be unreliable since for some stuff you simply need whitespace, like x instanceof y heh. Also not all code is minified and cramped into a single row (see jQuery UI) so you can't really count on that either....
Maybe you can explain why you need to know this and we can try and find an alternative?
You can't tell whether it was minified or just written like that by hand (this probably only applies to smaller scripts). But you can check whether it contains unnecessary whitespace.
Take a look at an open source obfuscator/minifier and see what rules it uses to remove whitespace. Validating whether those rules were applied should work; if the regex gets too complex, a simple parser might be needed.
Just make sure that string literals like a="if ( b )" are excluded.
Run it through a parser for that particular language (even a prettifier might work fine) and modify it to count the number of unused characters. Use the percentage of unused chars vs. number of chars in documents as a test for minification. I don't think you can do this accurately with regex, although counting whitespace vs. document content might be okay.
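The whitespace-counting idea can be sketched roughly like this. Everything here is an assumption made for illustration: the function names, the crude regex used to blank out string literals, and the 5% threshold, which would need tuning against real samples.

```php
<?php
// Share of whitespace characters in the code, ignoring the contents of
// quoted string literals so that text like a="if ( b )" is not counted.
function whitespace_ratio(string $code): float
{
    $stripped = preg_replace(
        '~"(?:\\\\.|[^"\\\\])*"|\'(?:\\\\.|[^\'\\\\])*\'~s',
        '""',
        $code
    );
    $length = strlen($stripped);
    if ($length === 0) {
        return 0.0;
    }
    return preg_match_all('/\s/', $stripped) / $length;
}

// Hypothetical cut-off: treat code with under 5% whitespace as minified.
function looks_minified(string $code, float $threshold = 0.05): bool
{
    return whitespace_ratio($code) < $threshold;
}

var_dump(looks_minified('$a=array();if(is_array($a)){echo\'ok\';}')); // bool(true)
var_dump(looks_minified("\$a = array();\nif (is_array(\$a)) {\n    echo 'ok';\n}\n")); // bool(false)
```

As noted above, some whitespace is mandatory (x instanceof y), so the ratio never has to reach zero, and hand-written one-liners will still be flagged.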
Are there any pre-made scripts that I can use for PHP / MySQL to prevent server-side scripting and JS injections?
I know about the typical functions such as htmlentities, special characters, string replace etc. but is there a simple bit of code or a function that is a failsafe for everything?
Any ideas would be great. Many thanks :)
EDIT: Something generic that strips out anything that could be hazardous, ie. greater than / less than signs, semi-colons, words like "DROP", etc?
I basically just want to compress everything to be alphanumeric, I guess...?
Never output any bit of data whatsoever to the HTML stream that has not been passed through htmlspecialchars() and you're done. Simple rule, easy to follow, completely eradicates any XSS risk.
As a programmer it's your job to do it, though.
You can define
function h($s) { return htmlspecialchars($s); }
if htmlspecialchars() is too long to write 100 times per PHP file. On the other hand, using htmlentities() is not necessary at all.
The key point is: There is code, and there is data. If you intermix the two, bad things ensue.
In the case of HTML, code is elements, attribute names, entities, comments. Data is everything else. Data must be escaped to avoid being mistaken for code.
In case of URLs, code is the scheme, the host name, the path, the mechanism of the query string (?, &, =, #). Data is everything in the query string: parameter names and values. They must be escaped to avoid being mistaken for code.
URLs embedded in HTML must be doubly escaped (by URL-escaping and HTML-escaping) to ensure proper separation of code and data.
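A short sketch of the double escaping, assuming a hypothetical /search endpoint and a variant of the h() helper that also escapes single quotes:

```php
<?php
// HTML-escaping helper; ENT_QUOTES also covers single-quoted attributes.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Step 1: URL-escape the data (parameter names and values).
$params = array('q' => 'a&b=c', 'page' => 2);
$url = '/search?' . http_build_query($params, '', '&');

// Step 2: HTML-escape the finished URL when it goes into the attribute,
// and HTML-escape the link text as ordinary data.
echo '<a href="' . h($url) . '">' . h('Results for a&b=c') . "</a>\n";
// <a href="/search?q=a%26b%3Dc&amp;page=2">Results for a&amp;b=c</a>
```

The "&" inside the value becomes %26 (URL level), while the "&" that separates parameters becomes &amp; (HTML level); each layer escapes only what is code at that layer.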
Modern browsers are capable of parsing amazingly broken and incorrect markup into something useful. This capability should not be stressed, though. The fact that something happens to work (like URLs in <a href> without proper HTML-escaping applied) does not mean that it's good or correct to do it. XSS is a problem that roots in a) people unaware of data/code separation (i.e. "escaping") or those that are sloppy and b) people that try to be clever about what part of data they don't need to escape.
XSS is easy enough to avoid if you make sure you don't fall into categories a) and b).
I think Google Caja may be a solution. I wrote a taint analyzer for Java web applications to detect and prevent XSS automatically, but not for PHP. I think learning to use Caja is not a bad idea for a web developer.
No, there isn't. Risks depend on what you do with data, you can't write something that makes data safe for everything (unless you want to discard most of the data)
is there a simple bit of code or a function that is a failsafe for everything?
No.
The representation of data leaving PHP must be converted/encoded specifically according to where it is going, and therefore should only be converted/encoded at the point where it leaves PHP.
C.
You can refer to OWASP to get more understanding of XSS attacks:
https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
To avoid js attacks, you can try this project provided by open source excellence:
https://www.opensource-excellence.com/shop/ose-security-suite.html
My website had JS attacks before, and this tool helps me block attacks every day. I think it can help you avoid the problem.
Further, you can add a filter in your PHP script to catch JS attacks; here is one pattern that can do the job:
if (preg_match('/(?:[".]script\s*\()|(?:\$\$?\s*\(\s*[\w"])|(?:\/[\w\s]+\/\.)|(?:=\s*\/\w+\/\s*\.)|(?:(?:this|window|top|parent|frames|self|content)\[\s*[(,"]*\s*[\w\$])|(?:,\s*new\s+\w+\s*[,;)])/ms', strtolower($_POST['VARIABLENAME'])))
{
// filter_variable() stands for your own filtering routine; it is not a PHP built-in
filter_variable($_POST['VARIABLENAME']);
}
To answer your edit: nothing except the <> symbols has anything to do with XSS.
And htmlspecialchars() can deal with them.
There is no harm in the word DROP table in the page's text ;)
To clean user data, use
htmlspecialchars(), str_replace() and other functions to cut unsafe data.
I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!
When generating HTML output (as you do to put data into the form's fields when someone is editing a post, or when you need to re-display the form because the user forgot a field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the database engine ; it is better/safer to use a function that fits the engine (MySQL, PostgreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it leaves the attributes of the allowed tags in place)
You might want to take a look at a tool called HTML Purifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time it has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !
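The remark above about strip_tags leaving attributes in place can be seen in a short sketch (the input string is invented for illustration):

```php
<?php
$input = '<b onclick="alert(1)">bold</b> <script>alert(2)</script>';

// Allowing <b> keeps the tag together with its onclick attribute; the
// <script> tags are removed, but their content stays behind as text.
echo strip_tags($input, '<b>'), "\n";
// <b onclick="alert(1)">bold</b> alert(2)

// For a field that must not contain HTML, escaping shows the input as-is.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n";
// &lt;b onclick=&quot;alert(1)&quot;&gt;bold&lt;/b&gt; &lt;script&gt;alert(2)&lt;/script&gt;
```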
mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be done with htmlspecialchars(), to escape HTML when displaying the input, and urlencode(), to put things in a format suitable for URLs.
There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
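A deliberately tiny sketch of the BBCode route, assuming only [b] and [url=...] are supported. The function name and tag set are made up for illustration, and a real forum would want a maintained library rather than this.

```php
<?php
function bbcode_to_html(string $text): string
{
    // Step 1: neutralise all real HTML, including both kinds of quotes.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: translate the few supported tags into HTML.
    $html = preg_replace('~\[b\](.*?)\[/b\]~s', '<strong>$1</strong>', $html);

    $html = preg_replace_callback(
        '~\[url=(.*?)\](.*?)\[/url\]~s',
        function (array $m): string {
            // Only http(s) links are allowed; javascript: and the like are
            // dropped and just the link text is kept.
            if (!preg_match('~^https?://~i', $m[1])) {
                return $m[2];
            }
            // $m[1] was already HTML-escaped in step 1.
            return '<a href="' . $m[1] . '">' . $m[2] . '</a>';
        },
        $html
    );

    return $html;
}

echo bbcode_to_html('[b]hi[/b] <script>'), "\n";
// <strong>hi</strong> &lt;script&gt;
echo bbcode_to_html('[url=javascript:alert(1)]x[/url]'), "\n";
// x
```

Escaping first and converting second means the only HTML in the output is what the converter itself emits.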
The answer to this post is a good answer
Basically, using the PDO interface to parameterize your queries is much safer and less error-prone than escaping your inputs manually.
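A minimal sketch of a parameterized insert and select with PDO. To keep it runnable without a server it assumes an in-memory SQLite database (which needs the pdo_sqlite extension) and a made-up posts table; with MySQL only the DSN passed to the constructor would change.

```php
<?php
// Assumption: in-memory SQLite stands in for the forum's MySQL database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)');

// Hostile-looking input is passed as data, never spliced into the SQL text.
$title = "Robert'); DROP TABLE posts;--";

$stmt = $pdo->prepare('INSERT INTO posts (title, body) VALUES (:title, :body)');
$stmt->execute(array(':title' => $title, ':body' => 'Hello'));

$stmt = $pdo->prepare('SELECT title FROM posts WHERE id = :id');
$stmt->execute(array(':id' => 1));
$stored = $stmt->fetchColumn();

// The value comes back exactly as entered; escaping for HTML happens
// separately, at display time.
var_dump($stored === $title);   // bool(true)
```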
I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", "&#58;"), "-", "&#45;"), "|", "&#124;"), _
"`", "&#96;"), "(", "&#40;"), ")", "&#41;"), _
"%", "&#37;"), "^", "&#94;"), """", "&quot;"), _
"/", "&#47;"), "*", "&#42;"), "\", "&#92;"), _
"'", "&#39;")
First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.
I second Joeri: do not roll your own. Go here to see some of the many possible XSS attacks:
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input, as an adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given above about avoiding HTML input unless it is essential, and opting for BBCode or similar (make your own up if need be), is very sound indeed.