php security question

It has been a long day, but I cannot decide which is better or whether I should use both.
Basically, what should I use to sanitize user-supplied values: the htmlentities or the preg_match function?
If the value then goes into a SQL query, I will use the mysql_real_escape_string function, but only until I change it to a prepared statement; then I can remove this.
Or would it be a good idea to use both htmlentities and preg_match?

Why didn't you just ask this in your previous question ?
Use preg_match before you do any escaping, to ensure the data meets the whitelist of what you expect it to be. Then use the escape for the database insertion. This is called defense in depth (i.e. more than one layer of security checking, in case the attacker can break through the first layer).

If you're using PHP 5.2+, you should look into the Filter functions to sanitize your data.
http://php.net/manual/en/filter.examples.sanitization.php
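
As a rough illustration of the Filter functions mentioned above, here is a minimal sketch using the validation filters of the same extension. The input values are hypothetical stand-ins for $_POST data, and the 0 to 150 age range is an assumption made only for the example.

```php
<?php
// Hypothetical raw inputs; in a real script these would come from $_POST or $_GET.
$rawEmail = 'user@example.com';
$rawAge   = '42';

// FILTER_VALIDATE_* returns the (converted) value on success and false on failure.
$email = filter_var($rawEmail, FILTER_VALIDATE_EMAIL);
$age   = filter_var($rawAge, FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 0, 'max_range' => 150],
]);

if ($email === false || $age === false) {
    echo "Invalid input\n";
} else {
    echo "Email: $email, age: $age\n";
}
```

Comparing against false with === matters here, because a valid integer 0 is also falsy.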

It's better to have too many validation checks and sanitization routines than too few. The system is no more or less secure by adding redundancy. Either it's a vulnerability or it's not; it's a Boolean, not a Float. When I am auditing code and I see redundant security measures, I think of it as a red flag, and it encourages me to dig deeper. This programmer is paranoid and perhaps does not understand the nature of vulnerabilities, although this is not always true.
There is another problem. htmlentities() doesn't always stop XSS: for instance, what if the output is within a <script></script> tag, or even an href for that matter? mysql_real_escape_string doesn't always stop SQL injection either. What if the query is 'select * from user where id='.mysql_real_escape_string($_GET['id']);? The value is not inside quotes there, so an attacker needs no quote character to inject SQL. A preg_match can fix this problem, but intval() is a much better function to use in this case.
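
To make the unquoted-id case concrete, here is a small sketch. The payload string is a hypothetical example of what an attacker might send in the id parameter; no database is contacted.

```php
<?php
// Hypothetical query-string value; an attacker controls it completely.
$rawId = '1 OR 1=1';

// mysql_real_escape_string() would leave this string untouched, because it
// contains no quotes or backslashes. Converting to an integer discards the payload.
$id = intval($rawId);

$sql = 'SELECT * FROM user WHERE id = ' . $id;
echo $sql, "\n";
```

The resulting query only ever contains a number, whatever was sent.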
I am a HUGE fan of prepared statements. I think this is an excellent approach because by default it is secure, but passing a variable to mysql_real_escape_string() before a prepared statement is just going to corrupt the data. I have seen a novice fix this problem by removing all validation routines thus introducing a vulnerability because of redundancy. Cause and Effect.
Web Application Firewalls (WAFs) are an excellent example of how layers can improve security. WAFs are highly dependent on regular expressions. They try to look at the bigger picture and prevent nasty input, or at the very least log it. They are by no means a silver bullet and should not be the only security measure you use, but they do stop some exploits, and I recommend installing mod_security on production machines.

Basically what should I use to sanitize user inputted values. Is it either the htmlentities or preg_match function ?
Certainly not htmlentities, probably not preg_match either (for security purposes). You change the representation of any output to suit the medium it's going to (htmlentities for a web page, urlencode for a URL, mysql_real_escape_string for a MySQL database...).
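
A minimal sketch of that idea: the same value gets a different representation for each destination. The sample name and the example.com URL are made up for illustration, and htmlspecialchars is used here as the lighter sibling of htmlentities.

```php
<?php
// One hypothetical value, two destinations.
$name = "O'Reilly & <Sons>";

// HTML page: escape the characters that are special in HTML.
$html = '<p>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

// URL query component: percent-encode.
$url = 'https://example.com/search?q=' . urlencode($name);

echo $html, "\n", $url, "\n";
```

The stored value itself stays unchanged; only the copy sent to each medium is encoded.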
If someone really wants to register on your application as dummy' UNION SELECT 'dummy' AS user,'dummy' AS password FROM DUAL then let them!
Writing your code to insulate it from attacks is a lot more effective than trying to detect different types of attack in advance.
Some data input may have to match a particular format for it to be of any use - and there may be a delay between the data capture and the use of the data - e.g. if the user is asked to input an email address or a date - in which case preg_match might be appropriate. But this is nothing to do with security.
C.

Related

Sanitizing PHP Variables, am I overusing it?

I've been working with PHP for some time and I began asking myself if I'm developing good habits.
One of these is what I believe amounts to overusing PHP sanitizing methods. For example, a user registers through a form, and I get the following post variables:
$_POST['name'], $_POST['email'] and $_POST['captcha']. Now, what I usually do is obviously sanitize the data I am going to place into MySQL, but when comparing the captcha, I also sanitize it.
Therefore I believe I have misunderstood PHP sanitizing. I'm curious: are there any other cases when you need to sanitize data, besides placing something in MySQL (note I know sanitizing is also needed to prevent XSS attacks)? And moreover, is my habit of sanitizing almost every variable coming from user input a bad one?

Whenever you store your data someplace, and if that data will be read/available to (unsuspecting) users, then you have to sanitize it. So something that could possibly change the user experience (not necessarily only the database) should be taken care of. Generally, all user input is considered unsafe, but you'll see in the next paragraph that some things might still be ignored, although I don't recommend it whatsoever.
Stuff that happens on the client only is sanitized just for a better UX (user experience, think about JS validation of the form - from the security standpoint it's useless because it's easily avoidable, but it helps non-malicious users to have a better interaction with the website) but basically, it can't do any harm because that data (good or bad) is lost as soon as the session is closed. You can always destroy a webpage for yourself (on your machine), but the problem is when someone can do it for others.
To answer your question more directly - never worry about overdoing it. It's always better to be safe than sorry, and the cost is usually not more than a couple of milliseconds.

The term you need to search for is FIEO. Filter Input, Escape Output.
You can easily confound yourself if you do not understand this basic principle.
Imagine PHP is the man in the middle, it receives with the left hand and doles out with the right.
A user uses your form and fills in a date field, so it should only accept digits and maybe dashes, e.g. nnnn-nn-nn. If you get something which does not match that, then reject it.
That is an example of filtering.
Next, PHP does something with it, let's say storing it in a MySQL database.
What MySQL needs is to be protected from SQL injection, so you use PDO or MySQLi prepared statements to make sure that EVEN IF your filter failed, you cannot permit an attack on your database. This is an example of Escaping, in this case escaping for SQL storage.
Later, PHP gets the data from your db and displays it on an HTML page. So you need to Escape the data for the next medium, HTML (this is where XSS attacks get through if you skip it).
In your head you have to divide each of the PHP 'protective' functions into one or other of these two families, Filtering or Escaping.
Freetext fields are of course more complex than filtering for a date, but never mind, stick to the principles and you will be OK.
Hoping this helps http://phpsec.org/projects/guide/
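
The three stages described above can be sketched roughly as follows. This assumes the pdo_sqlite driver is available and uses an in-memory database so it runs without a server; the events table and the filter_date helper are hypothetical names chosen for the example.

```php
<?php
// Filter input: accept only dates shaped like YYYY-MM-DD that are real calendar dates.
function filter_date(string $raw): ?string
{
    if (!preg_match('/\A(\d{4})-(\d{2})-(\d{2})\z/', $raw, $m)) {
        return null;
    }
    // checkdate() takes month, day, year.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]) ? $raw : null;
}

$date = filter_date('2011-03-15');
if ($date === null) {
    exit("Rejected\n");
}

// Escape for SQL: a prepared statement keeps the value out of the SQL text.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE events (happened_on TEXT)');
$stmt = $pdo->prepare('INSERT INTO events (happened_on) VALUES (?)');
$stmt->execute([$date]);

// Escape for HTML when the value is displayed later.
$stored = $pdo->query('SELECT happened_on FROM events')->fetchColumn();
echo '<td>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</td>', "\n";
```

Each stage works without knowing what the others did, which is the point of keeping Filtering and Escaping in separate families.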

PHP user input data security

I am trying to figure out which functions are best to use in different cases when inputting data, as well as outputting data.
When I allow a user to input data into MySQL what is the best way to secure the data to prevent SQL injections and or any other type of injections or hacks someone could attempt?
When I output the data as regular html from the database what is the best way to do this so scripts and such cannot be run?
At the moment I basically only use
mysql_real_escape_string();
before inputting the data to the database, this seems to work fine, but I would like to know if this is all I need to do, or if some other method is better.
And at the moment I use
stripslashes(nl2br(htmlentities()))
(most of the time anyways) for outputting data. I find these work fine for what I usually use them for, however I have run into a problem with htmlentities, I want to be able to have some html tags output respectively, for example:
<ul></ul><li></li><bold></bold>
etc, but I can't.
any help would be great, thanks.

I agree with mikikg that you need to understand SQL injection and XSS vulnerabilities before you can try to secure applications against these types of problems.
However, I disagree with his assertions to use regular expressions to validate user input as a SQL injection preventer. Yes, do validate user input insofar as you can. But don't rely on this to prevent injections, because hackers break these kinds of filters quite often. Also, don't be too strict with your filters -- plenty of websites won't let me log in because there's an apostrophe in my name, and let me tell you, it's a pain in the a** when this happens.
There are two kinds of security problems you mention in your question. The first is a SQL injection. This vulnerability is a "solved problem." That is, if you use parameterized queries, and never pass user supplied data in as anything but a parameter, the database is going to do the "right thing" for you, no matter what happens. For many databases, if you use parameterized queries, there's no chance of injection because the data isn't actually sent embedded in the SQL -- the data is passed unescaped in a length prefixed or similar blob along the wire. This is considerably more performant than database escape functions, and can be safer. (Note: if you use stored procedures that generate dynamic SQL on the database, they might also have injection problems!)
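
A small sketch of the parameterized-query point, assuming the pdo_sqlite driver so that it runs against an in-memory database. The users table and both sample names are made up; one contains an innocent apostrophe, the other is a classic injection string.

```php
<?php
// Assumption: pdo_sqlite is available; the users table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// Both values are passed only as parameters, never concatenated into the SQL.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
foreach (["O'Brien", "x' OR '1'='1"] as $name) {
    $insert->execute([':name' => $name]);
}

// The hostile string is compared as data, so it matches only itself.
$find = $pdo->prepare('SELECT COUNT(*) FROM users WHERE name = :name');
$find->execute([':name' => "x' OR '1'='1"]);
$matches = (int) $find->fetchColumn();
echo $matches, "\n";
```

No apostrophe filter was needed for either name.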
The second problem you mention is the cross site scripting problem. If you want to allow the user to supply HTML without entity escaping it first, this problem is an open research question. Suffice to say that if you allow the user to pass some kinds of HTML, it's entirely likely that your system will suffer an XSS problem at some point to a determined attacker. Now, the state of the art for this problem is to "filter" the data on the server, using libraries like HTMLPurifier. Attackers can and do break these filters on a regular basis; but as of yet nobody has found a better way of protecting the application from these kinds of things. You may be better off only allowing a specific whitelist of HTML tags, and entity encoding anything else.

This is one of the most problematic tasks today :)
You need to know how SQL injection and other attack methods work. There are very detailed explanations of each method at https://www.owasp.org/index.php/Main_Page, and also a whole security framework for PHP.
Using the security libraries of a framework such as CodeIgniter or Zend is also a good choice.
Next, use REGEXP as much as you can and tie pattern rules to each specific input format.
Use prepared statements or active records class of your framework.
Always cast your input with (int)$_GET['myvar'] if you really need numeric values.
There are so many other rules and methods to secure your application, but one golden rule is "never trust user's input".

In your php configuration, magic_quotes_gpc should be off. So you won't need stripslashes.
For SQL, take a look at PDO's prepared statements.
And for your custom tags, as there are only three of them, you can do a preg_replace call after the call to htmlentities to convert those back before you insert them into the database.
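
One possible way to do that conversion is sketched below. The function name and the default tag list (ul, li, b) are assumptions for illustration, htmlspecialchars stands in for htmlentities, and only bare tags without attributes are restored, so something like an onclick attribute stays escaped.

```php
<?php
// Escape everything, then turn a short list of bare tags back into markup.
function escape_with_allowed_tags(string $text, array $allowed = ['ul', 'li', 'b']): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $names   = implode('|', array_map('preg_quote', $allowed));

    // Matches only &lt;tag&gt; and &lt;/tag&gt; with nothing else inside.
    return preg_replace('~&lt;(/?)(' . $names . ')&gt;~i', '<$1$2>', $escaped);
}

echo escape_with_allowed_tags('<b>hi</b><script>alert(1)</script>'), "\n";
```

For anything richer than a handful of attribute-free tags, a dedicated library is the safer route.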

Is FILTER_SANITIZE_STRING enough to avoid SQL injection and XSS attacks?

I'm using PHP 5 with SQLite 3 class and I'm wondering if using PHP built-in data filtering function with the flag FILTER_SANITIZE_STRING is enough to stop SQL injection and XSS attacks.
I know I can go grab a large ugly PHP class to filter everything but I like to keep my code as clean and as short as possible.
Please advise.

The SQLite3 class allows you to prepare statements and bind values to them. That would be the correct tool for your database queries.
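
A minimal sketch of preparing and binding with the SQLite3 class, assuming the sqlite3 extension is enabled. It uses an in-memory database, and the users table and the sample value are hypothetical.

```php
<?php
// Assumption: the sqlite3 extension is enabled; the table is hypothetical.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE users (name TEXT)');

// A value that would break a query built by string concatenation.
$name = "Robert'); DROP TABLE users;--";

$stmt = $db->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->bindValue(':name', $name, SQLITE3_TEXT);
$stmt->execute();

// The table still exists and holds the value exactly as it was typed.
$stored = $db->querySingle('SELECT name FROM users');
echo $stored, "\n";
```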
As for XSS, well that is entirely unrelated to your use of SQLite.

It's never wise to use the same sanitization function for both XSS and SQLI. For XSS you can use htmlentities to filter user input before output to HTML. For SQLI on SQLite you can either use prepared statements (which is better) or use escapeString to filter user input before constructing SQL queries with them.

If you don't trust your own understanding of the security issues enough to need to ask this question, how can you trust someone here to give you a good answer?
If you go down the path of stripping out unwanted characters sooner or later you're going to be stripping out characters that users want to type. It's better to encode for the specific context that the data is used.
Check out OWASP ESAPI, it contains plenty of encoding functions. If you don't want to pull in that big of a library, check out what the functions do and copy the relevant parts to your codebase.

If you are just trying to build a simple form and don't want to introduce any heavy or even light frameworks, then go with the PHP filters and use PDO for the database. This should protect you from everything but cross-site request forgeries.

FILTER_SANITIZE_STRING removes HTML tags, but it does not encode special characters such as &. To convert special characters to entity codes, so that malicious users cannot inject markup, use:
filter_input(INPUT_GET, 'input_name', FILTER_SANITIZE_SPECIAL_CHARS);
OR
filter_var($var_name, FILTER_SANITIZE_SPECIAL_CHARS);
If you want to URL-encode everything, it's worth using
FILTER_SANITIZE_ENCODED
For more info:
https://www.php.net/manual/en/function.filter-var.php
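
A short sketch of the two filters named above. filter_input() reads from the request (INPUT_GET, INPUT_POST and so on), so filter_var() is used here to keep the example runnable from the command line; the sample string is made up.

```php
<?php
// Hypothetical raw value, standing in for something read from the request.
$raw = '<b>Tom & Jerry</b>';

// HTML-encodes characters such as < > & and quotes.
$special = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);

// URL-encodes the whole string.
$encoded = filter_var($raw, FILTER_SANITIZE_ENCODED);

echo $special, "\n", $encoded, "\n";
```

Which one is appropriate depends on where the value is going: an HTML page or a URL.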

I think it's good enough to secure your string inputs, but there are many other options you can choose from. For example, other libraries would increase your application's processing time, but would help you process and parse other types of data.

Is preg_match safe enough for input sanitization?

I am building a new web-app, LAMP environment... I am wondering if preg_match can be trusted for user's input validation (+ prepared stmt, of course) for all the text-based fields (aka not HTML fields; phone, name, surname, etc..).
For example, for a classic 'email field', if I check the input like:
$email_pattern = "/^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)" .
"|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}" .
"|[0-9]{1,3})(\]?)$/";
$email = $_POST['email'];
if(preg_match($email_pattern, $email)){
//go on, prepare stmt, execute, etc...
}else{
//email not valid! do nothing except warn the user
}
can I sleep easy against SQL/XSS injection?
I write the regexps to be as restrictive as they can be.
EDIT: as already said, I do use prepared statements already, and this behavior is just for text-based fields (like phone, emails, name, surname, etc.), so nothing that is allowed to contain HTML (for HTML fields, I use HTMLPurifier).
Actually, my mission is to let the input value pass only if it matches my regexp whitelist; otherwise, return it to the user.
P.S.: I am looking for something without mysql_real_escape_string; the project will probably switch to PostgreSQL in the near future, so I need a validation method that is cross-database ;)

Whether or not a regular expression suffices for filtering depends on the regular expression. If you're going to use the value in SQL statements, the regular expression must in some way disallow ' and ". If you want to use the value in HTML output and are afraid of XSS, you'll have to make sure your regex doesn't allow <, > and ".
Still, as has been repeatedly said, you do not want to rely on regular expressions, and please, for the love of $deity, don't! Use mysql_real_escape_string() or prepared statements for your SQL statements, and htmlspecialchars() for your values when printed in HTML context.
Pick the sanitising function according to its context. As a general rule of thumb, it knows better than you what is and what isn't dangerous.
Edit, to accommodate your edit:
Database
Prepared statements == mysql_real_escape_string() on every value to put in. Essentially exactly the same thing, short of having a performance boost in the prepared statements variant, and being unable to accidentally forget using the function on one of the values. Prepared statements are what's securing you against SQL injection, rather than the regex, though. Your regex could be anything and it would make no difference to the prepared statement.
You cannot and should not try to use regexes to accommodate a 'cross-database' architecture. Again, typically the system knows better what is and isn't dangerous for it than you do. Prepared statements are good, and if those are compatible with the change, then you can sleep easy. Without regexes.
If they're not and you must, use an abstraction layer to your database, something like a custom $db->escape() which in your MySQL architecture maps to mysql_real_escape_string() and in your PostgreSQL architecture maps to a respective method for PostgreSQL (I don't know which that would be off-hand, sorry, I haven't worked with PostgreSQL).
HTML
HTML Purifier is a good way to sanitise your HTML output (providing you use it in whitelist mode, which is the setting it ships with), but you should only use that on things where you absolutely need to preserve HTML, since calling a purify() is quite costly, since it parses the whole thing and manipulates it in ways aiming for thoroughness and via a powerful set of rules. So, if you don't need HTML to be preserved, you'll want to use htmlspecialchars(). But then, again, at this point, your regular expressions would have nothing to do with your escaping, and could be anything.
Security sidenote
Actually, my mission is to let pass the input value only if it match my regexp-white-list; else, return it back to the user.
This may not be true for your scenario, but just as general information: The philosophy of 'returning bad input back to the user' runs risk of opening you to reflected XSS attacks. The user is not always the attacker, so when returning things to the user, make sure you escape it all the same. Just something to keep in mind.
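
As an illustration of that sidenote, here is a minimal sketch of echoing a rejected value back into a form field. The rejected string and the field name are hypothetical.

```php
<?php
// Hypothetical rejected value that is echoed back into the form.
$rejected = '"><script>alert(1)</script>';

// ENT_QUOTES matters here because the value lands inside a quoted attribute.
$field = '<input name="email" value="'
       . htmlspecialchars($rejected, ENT_QUOTES, 'UTF-8')
       . '">';
echo $field, "\n";
```

Without the escaping, the first two characters of the value would close the attribute and the tag, and the script element would become live markup.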

For SQL injection, you should always use proper escaping like mysql_real_escape_string. The best is to use prepared statements (or even an ORM) to prevent omissions.
You already did those.
The rest depends on your application's logic. You may filter HTML along with validation because you need correct information, but I don't do validation to protect from XSS, I only do business validation*.
General rule is "filter/validate input, escape output". So I escape what I display (or transmit to third-party) to prevent HTML tags, not what I record.
* Still, a person's name or email address shouldn't contain < >

Validation is to do with making input data conform to the expected values for your particular application.
Injections are to do with taking a raw text string and putting it into a different context without suitable Escaping.
They are two completely separate issues that need to be looked at separately, at different stages. Validation needs to be done when input is read (typically at the start of the script); escaping needs to be done at the instant you insert text into a context like an SQL string literal, HTML page, or any other context where some characters have out-of-band meanings.
You shouldn't conflate these two processes and you can't handle the two issues at the same time. The word ‘sanitization’ implies a mixture of both, and as such is immediately suspect in itself. Inputs should not be ‘sanitized’, they should be validated as appropriate for the application's specific needs. Later on, if they are dumped into an HTML page, they should be HTML-escaped on the way out.
It's a common mistake to run SQL- or HTML-escaping across all the user input at the start of the script. Even ‘security’-focused tutorials (written by fools) often advise doing this. The result is invariably a big mess — and sometimes still vulnerable too.
With the example of a phone number field, whilst ensuring that a string contains only numbers will certainly also guarantee that it could not be used for HTML-injection, that's a side-effect which you should not rely on. The input stage should only need to know about telephone numbers, and not which characters are special in HTML. The HTML template output stage should only know that it has a string (and thus should always call htmlspecialchars() on it), without having to have the knowledge that it contains only numbers.
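
A rough sketch of that separation, with two small functions that know nothing about each other. The function names and the accepted phone format (an optional + followed by 7 to 15 digits) are assumptions made for the example.

```php
<?php
// Input stage: knows what a phone number looks like, nothing about HTML.
function is_valid_phone(string $raw): bool
{
    return preg_match('/\A\+?\d{7,15}\z/', $raw) === 1;
}

// Output stage: knows it has a string bound for HTML, nothing about phone numbers.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$phone = '+441632960961';
if (is_valid_phone($phone)) {
    // Escaped even though the validation already implies it is harmless.
    echo '<td>' . h($phone) . '</td>', "\n";
}
```

If the phone format is later relaxed to allow other characters, the output stage needs no change.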
Incidentally, that's a really bad e-mail validation regex. Regex isn't a great tool for e-mail validation anyway; to do it properly is absurdly difficult, but this one will reject a great many perfectly valid addresses, including any with + in the username, any in .museum or .travel or any of the IDNA domains. It's best to be liberal with e-mail addresses.

NO.
NOOOO.
NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO.
DO. NOT. USE. REGEX. FOR. THIS. EVER.
RegEx to Detect SQL Injection
Java - escape string to prevent SQL injection

You still want to escape the data before inserting it into a database. Although validating the user input is a smart thing to do, the best protection against SQL injection is prepared statements (which automatically escape data) or escaping the data using the database's native escaping functionality.

There is the php function mysql_real_escape_string(), which I believe you should use before submitting into a mysql database to be safe. (Also, it is easier to read.)

If you are good with regular expressions: yes.
But reading your email validation regexp, I'd have to answer no.
The best is to use filter functions to get the user inputs relatively safely and get your php up to date in case something broken is found in these functions.
When you have your raw input, you have to process it further depending on what you do with the data: remove \n and \r for email and HTTP headers, remove HTML tags for display to users, use parameterized queries for a database.
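
For the header case, a minimal sketch of removing line breaks before a value is placed in an email or HTTP header. The function name and the sample address are hypothetical.

```php
<?php
// Remove CR and LF so a value cannot start a new header line.
function strip_line_breaks(string $value): string
{
    return str_replace(["\r", "\n"], '', $value);
}

// Hypothetical attacker-supplied "From" address trying to add a Bcc header.
$from = strip_line_breaks("a@example.com\r\nBcc: victim@example.com");
echo $from, "\n";
```

The result is no longer a valid address, so a later validation step would reject it, but it can no longer inject a second header.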

I'm learning PHP on my own and I've become aware of the strip_tags() function. Is this the only way to increase security?

I'm new to PHP and I'm following a tutorial here:
Link
It's pretty scary that a user can write php code in an input and basically screw your site, right?
Well, now I'm a bit paranoid and I'd rather learn security best practices right off the bat than try to cram them in once I have some habits in me.
Since I'm brand new to PHP (literally picked it up two days ago), I can learn pretty much anything easily without getting confused.
What other way can I prevent shenanigans on my site? :D

There are several things to keep in mind when developing a PHP application; strip_tags() only helps with one of those. Actually strip_tags(), while effective, might even do more than needed: converting possibly dangerous characters with htmlspecialchars() may well be preferable, depending on the situation.
Generally it all comes down to two simple rules: filter all input, escape all output. Now you need to understand what exactly constitutes input and output.
Output is easy, everything your application sends to the browser is output, so use htmlspecialchars() or any other escaping function every time you output data you didn't write yourself.
Input is any data not hardcoded in your PHP code: things coming from a form via POST, from a query string via GET, from cookies. All of those must be filtered in the most appropriate way depending on your needs. Even data coming from a database should be considered potentially dangerous; especially on a shared server, you never know if the database was compromised elsewhere in a way that could affect your app too.
There are different ways to filter data: whitelists to allow only selected values, validation based on the expected input format, and so on. One thing I never suggest is trying to fix the data you get from users: have them play by your rules, and if you don't get what you expect, reject the request instead of trying to clean it up.
Special attention, if you deal with a database, must be paid to SQL injection: that kind of attack relies on you not properly constructing the query strings you send to the database, so that the attacker can forge them to execute malicious instructions. You should always use an escaping function such as mysql_real_escape_string() or, better, use prepared statements with the mysqli extension or with PDO.
There's more to say on this topic, but these points should get you started.
HTH
EDIT: to clarify, by "filtering input" I mean decide what's good and what's bad, not modify input data in any way. As I said I'd never modify user data unless it's output to the browser.
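
A small sketch of filtering in that sense: the value is either accepted as-is or rejected, never repaired. The function name and the list of allowed column names are made up for the example.

```php
<?php
// Accept the requested value only if it is on the whitelist; never try to fix it.
function pick_sort_column(string $requested): ?string
{
    $allowed = ['name', 'email', 'created_at'];

    // Strict comparison (third argument) avoids loose type juggling.
    return in_array($requested, $allowed, true) ? $requested : null;
}

$column = pick_sort_column('email');
if ($column === null) {
    exit("Bad request\n");
}
echo "Sorting by $column\n";
```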

strip_tags is not the best thing to use really, it doesn't protect in all cases.
HTML Purifier:
http://htmlpurifier.org/
Is a real good option for processing incoming data, however it itself still will not cater for all use cases - but it's definitely a good starting point.

I have to say that the tutorial you mentioned is a little misleading about security:
It is important to note that you never want to directly work with the $_GET & $_POST values. Always send their value to a local variable, & work with it there. There are several security implications involved with the values when you directly access (or output) $_GET & $_POST.
This is nonsense. Copying a value to a local variable is no more safe than using the $_GET or $_POST variables directly.
In fact, there's nothing inherently unsafe about any data. What matters is what you do with it. There are perfectly legitimate reasons why you might have a $_POST variable that contains ; rm -rf /. This is fine for outputting on an HTML page or storing in a database, for example.
The only time it's unsafe is when you're using a command like system or exec. And that's the time you need to worry about what variables you're using. In this case, you'd probably want to use something like a whitelist, or at least run your values through escapeshellarg.
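
A minimal sketch of the escapeshellarg case. The command is only built and printed, not executed, and the exact quoting escapeshellarg produces depends on the platform.

```php
<?php
// The same value that is harmless in HTML or a database is dangerous in a shell.
$userValue = '; rm -rf /';

// Unsafe: the shell would see two commands.
$unsafe = 'ls -l -- ' . $userValue;

// Safer: escapeshellarg() quotes the value so it is passed as a single argument.
$command = 'ls -l -- ' . escapeshellarg($userValue);

echo $command, "\n";
```

A whitelist of permitted values, where one is possible, remains the stronger option.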
Similarly with sending queries to databases, sending HTML to browsers, and so on. Escape the data right before you send it somewhere else, using the appropriate escaping method for the destination.

strip_tags removes every piece of HTML. More sophisticated solutions are based on whitelisting (i.e. allowing specific HTML tags). A good whitelisting library is HTML Purifier: http://htmlpurifier.org/
And of course, on the database side of things, use functions like mysql_real_escape_string or pg_escape_string.

Well, I'm probably wrong, but... in all the literature I've read, people say it's much better to use htmlspecialchars.
Also, it's rather necessary to cast input data (to int, for example, if you are sure it's a user id).
And, looking ahead, when you start using a database, use mysql_real_escape_string instead of mysql_escape_string to prevent SQL injection (some old books still say mysql_escape_string).
