I intend to use this for a block of text that will be entered by my users. Of course I'd like to avoid malicious misuse of the site.
From what I have read about MySQL injection and XSS, I plan to allow only simple text: no HTML tags, no clickable links, nothing special, just plain text and perhaps some literal URLs.
Would this suffice?
mysql_real_escape_string(strip_tags($_POST['datablock']));
Short answer: NO.
The long answer is more complicated.
You must be sure to protect your SQL queries against injection bugs. The best way to do this is to never directly inject user data into your queries, but instead use the SQL placeholders of a database driver to do the insertion for you. This is the safest method by far. Your queries will end up looking like this:
INSERT INTO table_name (user_id, comment) VALUES (:user_id, :comment)
Data is then bound to the various placeholders in a way that the driver can encode it correctly. Being disciplined about this avoids nearly all SQL injection problems.
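As a rough sketch of what binding looks like with PDO: the snippet below uses an in-memory SQLite database purely so it runs on its own. The table name comments and the helper save_comment() are made up for illustration, and a real application would connect to its own MySQL server instead.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (user_id INTEGER, comment TEXT)');

// Hypothetical helper: the SQL text is constant, user data travels only as bound values.
function save_comment(PDO $pdo, int $userId, string $comment): void
{
    $stmt = $pdo->prepare('INSERT INTO comments (user_id, comment) VALUES (:user_id, :comment)');
    $stmt->execute([':user_id' => $userId, ':comment' => $comment]);
}

// A hostile-looking value is stored as plain text, not executed as SQL.
save_comment($pdo, 1, "'); DROP TABLE comments; --");
$stored = $pdo->query('SELECT comment FROM comments')->fetchColumn();
```

The query string is written in single quotes, so PHP cannot interpolate a variable into it by accident.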
You must also protect your HTML from XSS attacks by not allowing users to insert arbitrary HTML with scripting into your pages. You should always render user input as the HTML entitized equivalent unless you're absolutely sure that this user content is free of <script> type tags or JavaScript snuck into various elements. Note that this is very hard to do correctly since JavaScript is not confined to script tags at all. It can appear as attributes on an HTML element or even in CSS style definitions.
Further, you should never use mysql_query or any related functions in a new application. These methods are dangerous by default and will cause you severe harm if you miss even one variable. Automated SQL injection tools have a terrifying list of features, and all that these tools require is one unescaped variable.
Thankfully, mysql_query was deprecated in PHP 5.5.0, where it produces warnings, and was removed entirely in PHP 7.0.
At the very least you should be using PDO for your database access. It supports named placeholders and makes it very easy to audit your database interface code to be sure it's safe. Mistakes will stand out. As a safety measure, it might be best to define your query strings with single quotes like 'INSERT INTO ...' so that if you make the mistake of putting in a variable it won't be interpolated by accident but will end up yielding a harmless SQL error.
Ideally you should be using a PHP framework to build your applications. Most of these have standardized methods for safe database access, HTML escaping and XSS protection. It is not something you can do on your own unless you want to spend a year writing a framework of your own.
Binding your SQL parameters rather than building the SQL string manually is also a good practice (see xkcd on Little Bobby Tables).
XSS and SQL injections are the two main security risks with unsanitized user input.
XSS can be prevented (when there is no WYSIWYG) by using
htmlspecialchars() and SQL injection can be prevented by using
parameterised queries and bound variables.
By using these two methods, is it secure to use all unsanitized input?
You always have to consider the context in which the data is used, because these functions and techniques only work when they are applied in the context they were designed for. This applies not just to HTML and SQL but to any other language or context.
Regarding XSS: htmlspecialchars replaces the HTML special characters <, >, &, ", and ' with HTML character references (the single quote only when ENT_QUOTES is set, which is the default as of PHP 8.1), so it protects you only where those characters are the context delimiters. It won't help where other delimiters apply (e.g., an unquoted HTML attribute value) or where you have already entered another context within HTML (e.g., an attribute value that is treated as JavaScript code, like the on… event attributes, or elements whose content is a different language, such as <script> or <style>, where other or additional rules apply). Nor does it help against so-called DOM-based XSS, where the input is processed not by the server but by client-side JavaScript. So there are situations in which htmlspecialchars won't help you.
However, regarding emulated or real prepared statements, you should be on the safe side as the database connection layer or the DBMS will take care of proper data handling. Unless, of course, you’re still building the statement to be prepared by using improperly processed data.
Just some possible issues beside XSS and SQL-injection:
XML External Entity file disclosure via simplexml_load_[file|string](): https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
Remote Code execution via unserialize(): http://www.exploit-db.com/exploits/22398/
Command Execution via system(), popen(), etc
strcmp() bypass using arrays
...etc...
It always depends on where you pass the user input and how you sanitize it. For instance, always use PDO for SQL operations, because even with proper escaping an attacker can inject SQL code without using any quotes at all:
SELECT title, content FROM cms WHERE id = 1
An attacker can change this to:
SELECT title, content FROM cms WHERE id = -1 UNION SELECT username AS title, password AS content from users LIMIT 1
In this case, casting the value with intval() (or binding it as a parameter) helps; escaping (mysql_real_escape_string, magic_quotes, addslashes, etc.) won't help at all, because the injected text contains no quotes to escape.
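For a purely numeric id like the one in the query above, a cast is enough to neutralise the payload. This is a small sketch; the helper name article_id_from_request() is an assumption made for the example.

```php
<?php
// Hypothetical helper: extracts the article id from the request parameters.
function article_id_from_request(array $query): int
{
    // intval() keeps only a leading integer, so no SQL text survives the cast.
    return intval($query['id'] ?? 0);
}

$id = article_id_from_request(['id' => '-1 UNION SELECT username, password FROM users']);
$sql = 'SELECT title, content FROM cms WHERE id = ' . $id;
```

The attacker's UNION clause is discarded by the cast. Binding the id as a prepared-statement parameter achieves the same result without string concatenation.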
Also take a look here please: Exploitable PHP functions
SQL injection is just one case of the broader threat of code injection. That is, any case where user input (or any other untrusted content) is run as code.
This includes things like eval() but quite a few other vectors as well. The answer from @thebod includes a link to a great StackOverflow thread: Exploitable PHP functions.
Even SQL injection can't be solved 100% by parameters or escaping. Both of those techniques only help to sanitize individual values in SQL expressions. You might also need to allow user input to select tables, columns, SQL keywords, or whole expressions. For those, parameters and escaping don't help. Example:
$sql = "SELECT * FROM mytable ORDER BY $sortcolumn $asc_or_desc";
In that example, the column name to sort by and the direction (ASC vs. DESC) are based on variables. Were the variables set from trusted input, or were $_GET parameters used verbatim, resulting in a SQL injection vulnerability?
A better solution for those cases is allowlisting. That is, take the user input, compare it to a list of column names that are permitted for this dynamic query, and if the user input doesn't match one of those predefined choices, then either fail, or else use a default value.
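A minimal sketch of such an allowlist for the ORDER BY example above. The column names in SORTABLE_COLUMNS and the default of created_at are placeholders, not part of the original query.

```php
<?php
// Hypothetical allowlist; the column names are placeholders for your own schema.
const SORTABLE_COLUMNS = ['title', 'created_at', 'author'];

function build_order_by(string $column, string $direction): string
{
    // Fall back to a default when the input is not one of the permitted names.
    if (!in_array($column, SORTABLE_COLUMNS, true)) {
        $column = 'created_at';
    }
    // Only two directions exist, so anything other than DESC becomes ASC.
    $direction = strtoupper($direction) === 'DESC' ? 'DESC' : 'ASC';

    return "SELECT * FROM mytable ORDER BY $column $direction";
}
```

Only strings that appear literally in the code can ever reach the SQL text, so the user input selects among fixed choices rather than being inserted itself.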
By using those methods, your user input is already sanitized for the most part. If you want to take it a step further, you can run validation checks on the input before running the SQL code, for example checking that numbers are numeric or checking the length of the input.
I am trying to figure out which functions are best to use in different cases when inputting data, as well as outputting data.
When I allow a user to input data into MySQL what is the best way to secure the data to prevent SQL injections and or any other type of injections or hacks someone could attempt?
When I output the data as regular html from the database what is the best way to do this so scripts and such cannot be run?
At the moment I basically only use
mysql_real_escape_string();
before inputting the data to the database, this seems to work fine, but I would like to know if this is all I need to do, or if some other method is better.
And at the moment I use
stripslashes(nl2br(htmlentities()))
(most of the time anyway) for outputting data. These work fine for what I usually use them for; however, I have run into a problem with htmlentities: I want some HTML tags to be output as real markup, for example:
<ul></ul><li></li><bold></bold>
etc, but I can't.
any help would be great, thanks.
I agree with mikikg that you need to understand SQL injection and XSS vulnerabilities before you can try to secure applications against these types of problems.
However, I disagree with his assertions to use regular expressions to validate user input as a SQL injection preventer. Yes, do validate user input insofar as you can. But don't rely on this to prevent injections, because hackers break these kinds of filters quite often. Also, don't be too strict with your filters -- plenty of websites won't let me log in because there's an apostrophe in my name, and let me tell you, it's a pain in the a** when this happens.
There are two kinds of security problems you mention in your question. The first is a SQL injection. This vulnerability is a "solved problem." That is, if you use parameterized queries, and never pass user supplied data in as anything but a parameter, the database is going to do the "right thing" for you, no matter what happens. For many databases, if you use parameterized queries, there's no chance of injection because the data isn't actually sent embedded in the SQL -- the data is passed unescaped in a length prefixed or similar blob along the wire. This is considerably more performant than database escape functions, and can be safer. (Note: if you use stored procedures that generate dynamic SQL on the database, they might also have injection problems!)
The second problem you mention is the cross site scripting problem. If you want to allow the user to supply HTML without entity escaping it first, this problem is an open research question. Suffice to say that if you allow the user to pass some kinds of HTML, it's entirely likely that your system will suffer an XSS problem at some point to a determined attacker. Now, the state of the art for this problem is to "filter" the data on the server, using libraries like HTMLPurifier. Attackers can and do break these filters on a regular basis; but as of yet nobody has found a better way of protecting the application from these kinds of things. You may be better off only allowing a specific whitelist of HTML tags, and entity encoding anything else.
This is one of the most problematic tasks today :)
You need to know how SQL injection and other attack methods work. There is a very detailed explanation of each method at https://www.owasp.org/index.php/Main_Page, along with a whole security framework for PHP.
Using the security libraries of a framework such as CodeIgniter or Zend is also a good choice.
Next, use REGEXP as much as you can and apply pattern rules that match each specific input format.
Use prepared statements or active records class of your framework.
Always cast your input with (int)$_GET['myvar'] if you really need numeric values.
There are so many other rules and methods to secure your application, but one golden rule is "never trust user's input".
In your php configuration, magic_quotes_gpc should be off. So you won't need stripslashes.
For SQL, take a look at PDO's prepared statements.
And for your custom tags, as there are only three of them, you can do a preg_replace call after the call to htmlentities to convert those back before you insert them into the database.
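One way this could look, as a sketch: the helper name escape_with_basic_tags() is made up, and the tag list (ul, li, b) is an assumption based on the question, with b standing in for the non-existent bold tag.

```php
<?php
// Hypothetical helper: escapes everything, then re-enables a fixed set of bare tags.
function escape_with_basic_tags(string $text): string
{
    $escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');
    // Matches only attribute-free tags, so "<li onclick=...>" stays escaped.
    return preg_replace('#&lt;(/?)(ul|li|b)&gt;#', '<$1$2>', $escaped);
}

$html = escape_with_basic_tags('<b>hi</b><script>');
```

Because the pattern only matches tags with no attributes, event handlers and other markup remain entity-encoded.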
I want to know how to prevent HTML injection. I have created a site where users are allowed to paste articles into an HTML form. I have used mysql_real_escape_string, but I want to know whether this is enough to prevent HTML injection. I tried htmlspecialchars, but it shows an error when combined with mysql_real_escape_string.
No, mysql_real_escape_string only prepares data to be safely inserted into MySQL string literals, preventing SQL injection in that specific context. It does not prevent other injections like HTML injection or Cross-Site Scripting (XSS).
Both HTML injection and XSS happen in different contexts where there are different contextual special characters that need to be taken care of. In HTML it’s especially <, >, &, ", and ' that delimit the different HTML contexts. With XSS in mind you also need to be aware of the different JavaScript contexts and their special characters.
htmlspecialchars should suffice to handle the former attack, while json_encode can be used for a safe subset of JavaScript. See also the XSS (Cross Site Scripting) Prevention Cheat Sheet as well as my answer to Are these two functions overkill for sanitization? and related questions for further information on this topic.
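As an illustration of the JavaScript case, here is a small sketch that emits a PHP value as a JavaScript literal inside a script block. The helper name js_literal() is an assumption; the JSON_HEX_* constants are standard json_encode flags.

```php
<?php
// Hypothetical helper: renders a value for use inside a <script> block.
function js_literal($value): string
{
    // The JSON_HEX_* flags escape <, >, &, ' and " as \uXXXX sequences.
    return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

$name = '</script><script>alert(1)</script>';
$snippet = '<script>var name = ' . js_literal($name) . ';</script>';
```

With the angle brackets encoded, the value cannot close the surrounding script element early. This covers only data placed where a JavaScript literal is expected, not arbitrary positions in a script.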
You should use prepared statements to be absolutely sure to prevent sql injection.
Taken from documentation (read the part in bold)
Many of the more mature databases support the concept of prepared statements. What are they? They can be thought of as a kind of compiled template for the SQL that an application wants to run, that can be customized using variable parameters. Prepared statements offer two major benefits:
The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters. When the query is prepared, the database will analyze, compile and optimize its plan for executing the query. For complex queries this process can take up enough time that it will noticeably slow down an application if there is a need to repeat the same query many times with different parameters. By using a prepared statement the application avoids repeating the analyze/compile/optimize cycle. This means that prepared statements use fewer resources and thus run faster.
The parameters to prepared statements don't need to be quoted; the driver automatically handles this. If an application exclusively uses prepared statements, the developer can be sure that no SQL injection will occur (however, if other portions of the query are being built up with unescaped input, SQL injection is still possible).
Prepared statements are so useful that they are the only feature that PDO will emulate for drivers that don't support them. This ensures that an application will be able to use the same data access paradigm regardless of the capabilities of the database.
If you meant to prevent XSS (cross-site scripting), you should use the function htmlspecialchars() whenever you output something to the browser that came from user input or from any other untrusted source. Always treat any unknown source as insecure:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
No. In fact, I believe that for advanced coders, you shouldn't be using mysql_real_escape_string() as a crutch.
For each value you need to use in a DB query, seriously consider the possible characters that could appear. If it is a dollar amount, the only characters you should accept are digits, a period, and possibly a preceding dollar sign. If it is a name, you should only allow letters, a hyphen, and possibly a period (for full names like Joseph A. Bank).
Once you determine a strict character range that's acceptable for a value, write a Regex to match that value against. For any values that don't match, display a bogus error and log the value in a textfile (read: not a db) along with the user's IP. Frequently check this file so you can see if values users have tried that didn't work were hacking attempts. Not only will this uncover valid inputs for which you need to adjust your Regex, but it will also reveal the IP's of hackers who try to find SQL vulnerabilities on your site.
This approach ensures that new and old SQL vulnerabilities that might not immediately be addressed by mysql_real_escape_string(), will be blocked.
No, it's not. Refer to the docs
It doesn't escape < or >.
Simple answer: No
mysql_real_escape_string only helps you get rid of SQL Injections and not XSS and html injection. To avoid these you need more sophisticated input validation. Start by looking at strip_tags and htmlentities.
I have gone through a lot of articles looking for a simple list of characters that I could stop users from entering in order to protect my site against XSS and SQL injection, but I couldn't find any generic list.
Can someone help me out by simply giving me a list of safe or unsafe characters in this regard? I know this can be field specific but I need this for text field where I want to allow maximum possible characters.
The "black-list" approach is fraught with problems. For both SQLi and XSS, input validation against a white-list is essential i.e. define what you do expect rather than what you don't expect. Remember also that user input - or "untrusted data" - comes from many places: forms, query strings, headers, ID3 and exif tags etc.
For SQLi, make sure you're always using parametrised SQL statements, usually in the form of stored procedure parameters or any decent ORM. Also apply the "principle of least privilege" and limit the damage the account connecting to your database can do. More on SQLi here: http://www.troyhunt.com/2010/05/owasp-top-10-for-net-developers-part-1.html
On the XSS front, always encode your output and make sure you're encoding it for the appropriate markup language it appears in. Output encoding for JavaScript is different to HTML which is different to CSS. Remember to encode not just responses which immediately reflect input, but also untrusted data stored in the database which could hold a persistent XSS threat. More on all this here: http://www.troyhunt.com/2010/05/owasp-top-10-for-net-developers-part-2.html
I know this goes a bit beyond your original question, but the point I'm trying to make is that allowable characters is but one small part of the picture. The other practices mentioned above are arguably more important (but you should still use those white-lists as well).
Character filtering is not how you should go about security. To prevent SQL injection, use prepared statements. To prevent XSS, escape all user input properly.
Look at how the Drupal CMS implements XSS filtering. The function has a whitelist of allowed HTML tags; everything else is escaped.
I am building a new web-app, LAMP environment... I am wondering if preg_match can be trusted for user's input validation (+ prepared stmt, of course) for all the text-based fields (aka not HTML fields; phone, name, surname, etc..).
For example, for a classic 'email field', if I check the input like:
$email_pattern = "/^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)" .
"|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}" .
"|[0-9]{1,3})(\]?)$/";
$email = $_POST['email'];
if(preg_match($email_pattern, $email)){
//go on, prepare stmt, execute, etc...
}else{
//email not valid! do nothing except warn the user
}
can I sleep easy as far as SQL/XSS injection is concerned?
I write my regexps to be as restrictive as they can be.
EDIT: as already said, I do use prepared statements, and this approach is only for text-based fields (like phone, email, name, surname, etc.), so nothing that is allowed to contain HTML (for HTML fields, I use HTMLPurifier).
Actually, my mission is to let pass the input value only if it match my regexp-white-list; else, return it back to the user.
P.S.: I am looking for something without mysql_real_escape_string; the project will probably switch to PostgreSQL in the near future, so I need a validation method that is cross-database ;)
Whether or not a regular expression suffices for filtering depends on the regular expression. If you're going to use the value in SQL statements, the regular expression must in some way disallow ' and ". If you want to use the value in HTML output and are afraid of XSS, you'll have to make sure your regex doesn't allow <, > and ".
Still, as has been repeatedly said, you do not want to rely on regular expressions, and please, for the love of $deity, don't! Use mysql_real_escape_string() or prepared statements for your SQL statements, and htmlspecialchars() for your values when printed in HTML context.
Pick the sanitising function according to its context. As a general rule of thumb, it knows better than you what is and what isn't dangerous.
Edit, to accommodate your edit:
Database
Prepared statements == mysql_real_escape_string() on every value to put in. Essentially exactly the same thing, apart from a performance boost in the prepared statements variant, and being unable to accidentally forget to use the function on one of the values. Prepared statements are what's securing you against SQL injection, rather than the regex, though. Your regex could be anything and it would make no difference to the prepared statement.
You cannot and should not try to use regexes to accommodate a 'cross-database' architecture. Again, typically the system knows better than you do what is and isn't dangerous for it. Prepared statements are good, and if those are compatible with the change, then you can sleep easy. Without regexes.
If they're not and you must, use an abstraction layer to your database, something like a custom $db->escape() which in your MySQL architecture maps to mysql_real_escape_string() and in your PostgreSQL architecture maps to a respective method for PostgreSQL (I don't know which that would be off-hand, sorry, I haven't worked with PostgreSQL).
HTML
HTML Purifier is a good way to sanitise your HTML output (provided you use it in whitelist mode, which is the setting it ships with), but you should only use it where you absolutely need to preserve HTML: calling purify() is quite costly, because it parses the whole input and rewrites it according to a thorough and powerful set of rules. So, if you don't need HTML to be preserved, use htmlspecialchars() instead. But then, again, at this point your regular expressions have nothing to do with your escaping, and could be anything.
Security sidenote
Actually, my mission is to let pass the input value only if it match my regexp-white-list; else, return it back to the user.
This may not be true for your scenario, but just as general information: the philosophy of 'returning bad input back to the user' runs the risk of opening you up to reflected XSS attacks. The user is not always the attacker, so when returning things to the user, make sure you escape them all the same. Just something to keep in mind.
For SQL injection, you should always use proper escaping like mysql_real_escape_string. The best is to use prepared statements (or even an ORM) to prevent omissions.
You already did those.
The rest depends on your application's logic. You may filter HTML along with validation because you need correct information, but I don't do validation to protect from XSS, I only do business validation*.
General rule is "filter/validate input, escape output". So I escape what I display (or transmit to third-party) to prevent HTML tags, not what I record.
* Still, a person's name or email address shouldn't contain < >
Validation is to do with making input data conform to the expected values for your particular application.
Injections are to do with taking a raw text string and putting it into a different context without suitable escaping.
They are two completely separate issues that need to be looked at separately, at different stages. Validation needs to be done when input is read (typically at the start of the script); escaping needs to be done at the instant you insert text into a context like an SQL string literal, HTML page, or any other context where some characters have out-of-band meanings.
You shouldn't conflate these two processes and you can't handle the two issues at the same time. The word ‘sanitization’ implies a mixture of both, and as such is immediately suspect in itself. Inputs should not be ‘sanitized’, they should be validated as appropriate for the application's specific needs. Later on, if they are dumped into an HTML page, they should be HTML-escaped on the way out.
It's a common mistake to run SQL- or HTML-escaping across all the user input at the start of the script. Even ‘security’-focused tutorials (written by fools) often advise doing this. The result is invariably a big mess — and sometimes still vulnerable too.
With the example of a phone number field, whilst ensuring that a string contains only numbers will certainly also guarantee that it could not be used for HTML-injection, that's a side-effect which you should not rely on. The input stage should only need to know about telephone numbers, and not which characters are special in HTML. The HTML template output stage should only know that it has a string (and thus should always call htmlspecialchars() on it), without having to have the knowledge that it contains only numbers.
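The separation described above might be sketched like this. The phone-number rule is a deliberately loose, made-up pattern, and both function names are assumptions for the example.

```php
<?php
// Input stage: knows only what a phone number looks like (a hypothetical, loose rule).
function is_valid_phone(string $input): bool
{
    return preg_match('/^\+?[0-9 ()-]{5,20}$/D', $input) === 1;
}

// Output stage: knows only that it has a string going into HTML.
function html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

Neither function relies on the other: html() is called on every string that reaches the template, whether or not it passed validation earlier.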
Incidentally, that's a really bad e-mail validation regex. Regex isn't a great tool for e-mail validation anyway; to do it properly is absurdly difficult, but this one will reject a great many perfectly valid addresses, including any with + in the username, any in .museum or .travel or any of the IDNA domains. It's best to be liberal with e-mail addresses.
NO.
NOOOO.
NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO.
DO. NOT. USE. REGEX. FOR. THIS. EVER.
RegEx to Detect SQL Injection
Java - escape string to prevent SQL injection
You still want to escape the data before inserting it into a database. Although validating the user input is a smart thing to do, the best protection against SQL injection is prepared statements (which handle the data safely for you) or escaping with the database's native escaping functionality.
There is the php function mysql_real_escape_string(), which I believe you should use before submitting into a mysql database to be safe. (Also, it is easier to read.)
If you are good with regular expressions: yes.
But having read your email validation regexp, I'd have to answer no.
The best approach is to use the filter functions to read user input relatively safely, and to keep your PHP up to date in case a flaw is found in these functions.
Once you have the raw input, further handling depends on what you do with the data: remove \n and \r for email and HTTP headers, remove HTML tags for display to users, and use parameterized queries for the database.
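A short sketch of the first two points, using PHP's built-in filter_var(). The helper names clean_email() and header_safe() are invented for the example.

```php
<?php
// Hypothetical helper: returns a validated address, or null if it is not valid.
function clean_email(string $raw): ?string
{
    $email = filter_var($raw, FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

// Hypothetical helper: strips CR and LF so a value cannot start a new header line.
function header_safe(string $value): string
{
    return str_replace(["\r", "\n"], '', $value);
}
```

Unlike the hand-written pattern in the question, FILTER_VALIDATE_EMAIL accepts addresses with a + in the local part.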