I have three books on which being on PHP or PHP & MySQL one might reasonable expect to find some coverage of Data Sanitization, but I haven't had any luck. Is there a reliable resource online that covers the basics of cleaning your data up, both before putting it into a DB and before displaying it after pulling it from the DB?
Well Stackoverflow is such a resource. Your question being asked twice a day.
I wrote a pretty decent answer on this topic earlier: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
Long story short: for dynamic mysql query creation you have four different escaping cases:
string data
int data
identifiers
operators
and notorious PDO covers only two of them.
for the HTML htmlspecialchars with ENT_QUOTES is quite enough.
However, there are a dosen other cases, like filename sanitization, mail injection and such
Chris Shifflet wrote a book on it called Essential PHP Security.
Use PDO and binding or suitable escape string function for mysql to input data.
Use htmlspecialchars with ENT_QUOTES and the correct charset on data to display for output.
Related
I have a textarea which will be available to users as comment box so any sort of inputs are acceptable but that should be accepted only as text and not code. Basically I want to protect my database. I don't want to strip tags or such thing, I just want that if any users even inputs a code that should be stored in database as text and shouldn't be causing any harm to database. So came across these two php functions now I am not sure which one ofthese I should use as I am not able to understand difference in them.
According to official PHP docs, htmlspecialchars() and FILTER_SANITIZE_FULL_SPECIAL_CHARS should be equivalent:
Equivalent to calling htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter is aware of the default_charset and if a sequence of bytes is detected that makes up an invalid character in the current character set then the entire string is rejected resulting in a 0-length string. When using this filter as a default filter, see the warning below about setting the default flags to 0.
Taken from here - https://www.php.net/manual/en/filter.filters.sanitize.php
Going from here, I think it would be a matter of personal preference as to which function you prefer more.
From this : http://forums.phpfreaks.com/topic/275315-htmlspecialchars-vs-filter-sanitize-special-chars/
They are quite similar yes, but as the PHP manual states
htmlspecialchars escapes a bit more than just
FILTER_SANITIZE_SPECIAL_CHARS.
That brings us to the next point, SQL injection prevention. As stated
htmlspecialchars is for escaping output to a HTML-parser, not a
database engine. The DB engine doesn't understand HTML, and doesn't
care about it either. What it does understand, is SQL queries. SQL
queries and HTML use quite different meta-characters, with only a few
in common: Quotes being the most obvious, and even that is somewhat
conditional for HTML. However, due to the other meta-characters (which
HTML does not share) using HTML escaping methods for SQL queries will
not protect you. Those meta-characters will go through
htmlspecialchars unscathed, and thus be able to cause SQL injections.
Same the other way around, if you use SQL escaping methods to escape
output going to a browser. It will not escape the < and > signs,
meaning an attacker can easily perform HTML injection attacks (XSS
etc). Not only that, but you'll suddenly have a lot of slashes in
places where there shouldn't be any. Which is quite annoying, at best.
This is why it's so important to know, and use, the proper method for
the third party system you're sending the data to. If you don't, you
are still vulnerable
I've been running and developed a classified site now for the last 8 months and all the bugs were due to only one reason: how the users input their text...
My question is: Is there a php class, a plugin, something that I can do
$str = UltimateClean($str) before sending $str to my sql??
PS. I also noticed the problems doubled when i started using JSON, because I also have to be careful outputting the result in JSON..
Some issues I faced: multi-language strings (different charsets), copy-paste from Excel sheets.
Note: I am not worried for SQL Injections.
No, there isn't.
Different modes of escaping are for different purposes. You cannot universally escape something.
For Databases: Use PDO with prepared queries
For HTML: Use htmlspecialchars()
For JSON: json_encode() handles this for you
For character sets: You should be using UTF-8 on your page. Do this, and set your databases accordingly, and watch those issues disappear.
I was wondering if converting POST input from an HTML form into html entities, (via the PHP function htmlentities() or using the FILTER_SANITIZE_SPECIAL_CHARS constant in tandem with the filter_input() PHP function ), will help defend against any attacks where a user attempts to insert any JavaScript code inside the form field or if there's any other PHP based function or tactic I should employ to create a safe HTML form experience?
Sorry for the loaded run-on sentence question but that's the best I could word it in a hurry.
Any responses would be greatly appreciated and thanks to all in advance.
racl101
It would turn the following:
<script>alert("Muhahaha");</script>
into
<script>alert("Muhahaha");</script>
So if you're printing out this data into HTML later, you would be protected. It wouldn't protect you from:
"; alert("Muhahaha");
just in case you were echoing into a script like so:
var t = "Hello there <?php echo $str;?>";
For this purpose, you should use addslashes() and a database string escaping method like mysql_real_escape_string().
yes, that is one way to sanitise. it has the benefit that you can always display the database contents without fear of xss attacks. however, a 'purer' approach is to store the raw data in the database and sanitise in the view - so every time you want to show the text, use htmlentities() on it.
however, your approach does not take into account sql injection attacks. you might want to look at http://php.net/manual/en/function.mysql-real-escape-string.php to guard against that.
Yes, do this when you want to display data to a webpage, but I recommend you don't store the HTML in the database as encoded, this may seem fine for large text fields, but when you have shorter titles, say a 32 character, a normal 30 character string that contains an & would become & and this would either cause a SQL error or the data to be cut off.
So the rule of thumb is, store everything row (obviously prevent SQL injection) and treat EVERYTHING as tainted, no matter where it comes from: the database, user forms, rss feeds, flat files, XML, etc. This is how you build good security without worrying about the data overflowing, or the fact you might oneday need to extract the data to a non web user where the HTML encoding is a problem.
I'm developing an application using Wordpress as a CMS.
I have a form with a lot of input fields which needs to be sanitized before stored in the database.
I want to prevent SQL injection, having javascript and PHP code injected and other harmful code.
Currently I'm using my own methods to sanitize data, but I feel that it might be better to use the functions which WP uses.
I have looked at Data Validation in Wordpress, but I'm unsure on how much of these functions I should use, and in what order. Can anyone tell what WP functions are best to use?
Currently I'm "sanitizing" my input by doing the following:
Because characters with accents (é, ô, æ, ø, å) got stored in a funny way in the Database (even though my tables are set to ENGINE=InnoDB, DEFAULT CHARSET=utf8 and COLLATE=utf8_danish_ci), I'm now converting input fields that can have accents, using htmlentities().
When creating the SQL string to input the data, I use mysql_real_escape_string().
I don't think this is enough to prevent attacks though. So suggestions to improvement is greatly appreciated.
Input “sanitisation” is bogus.
You shouldn't attempt to protect yourself from injection woes by filtering(*) or escaping input, you should work with raw strings until the time you put them into another context. At that point you need the correct escaping function for that context, which is mysql_real_escape_string for MySQL queries and htmlspecialchars for HTML output.
(WordPress adds its own escaping functions like esc_html, which are in principle no different.)
(*: well, except for application-specific requirements, like checking an e-mail address is really an e-mail address, ensuring a password is reasonable, and so on. There's also a reasonable argument for filtering out control characters at the input stage, though this is rarely actually done.)
I'm now converting input fields that can have accents, using htmlentities().
I strongly advise not doing that. Your database should contain raw text; you make it much harder to do database operations on the columns if you've encoded it as HTML. You're escaping characters such as < and " at the same time as non-ASCII characters too. When you get data from the database and use it for some other reason than copying it into the page, you've now got spurious HTML-escapes in the data. Don't HTML-escape until the final moment you're writing text to the page.
If you are having trouble getting non-ASCII characters into the database, that's a different problem which you should solve first instead of going for unsustainable workarounds like storing HTML-encoded data. There are a number of posts here all about getting PHP and databases to talk proper UTF-8, but the main thing is to make sure your HTML output pages themselves are correctly served as UTF-8 using the Content-Type header/meta. Then check your MySQL connection is set to UTF-8, eg using mysql_set_charset().
When creating the SQL string to input the data, I use mysql_real_escape_string().
Yes, that's correct. As long as you do this you are not vulnerable to SQL injection. You might be vulnerabile to HTML-injection (causing XSS) if you are HTML-escaping at the database end instead of the template output end. Because any string that hasn't gone through the database (eg. fetched directly from $_GET) won't have been HTML-escaped.
I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!
When generating HTLM output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the Database engine ; it is better/safer to use a function that fits the engine (MySQL, PostGreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it will let attributes of the allowed tags)
You might want to take a look at a tool called HTMLPUrifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time is has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !
mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be done with htmlspecialchars() to remove HTML from the input and urlencode() to put things in a format for URL's.
There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
The answer to this post is a good answer
Basically, using the pdo interface to parameterize your queries is much safer and less error prone than escaping your inputs manually.
I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", ":"), "-", "-"), "|", "|"), _
"`", "`"), "(", "("), ")", ")"), _
"%", "%"), "^", "^"), """", """), _
"/", "/"), "*", "*"), "\", "\"), _
"'", "'")
First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.
I second Joeri, do not roll your own, go here to see some of the the many possible XSS attacks
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input as a adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given obve re avoiding HTML input unless it is essential and opting for BBCode or similar (make your own up if needs be) is very sound indeed.