The holy grail of cleaning input and output in php? - php

I've been running and developed a classified site now for the last 8 months and all the bugs were due to only one reason: how the users input their text...
My question is: Is there a php class, a plugin, something that I can do
$str = UltimateClean($str) before sending $str to my sql??
PS. I also noticed the problems doubled when i started using JSON, because I also have to be careful outputting the result in JSON..
Some issues I faced: multi-language strings (different charsets), copy-paste from Excel sheets.
Note: I am not worried for SQL Injections.

No, there isn't.
Different modes of escaping are for different purposes. You cannot universally escape something.
For Databases: Use PDO with prepared queries
For HTML: Use htmlspecialchars()
For JSON: json_encode() handles this for you
For character sets: You should be using UTF-8 on your page. Do this, and set your databases accordingly, and watch those issues disappear.

Related

What is difference between FILTER_SANITIZE_FULL_SPECIAL_CHARS and htmlspecialchars?

I have a textarea which will be available to users as comment box so any sort of inputs are acceptable but that should be accepted only as text and not code. Basically I want to protect my database. I don't want to strip tags or such thing, I just want that if any users even inputs a code that should be stored in database as text and shouldn't be causing any harm to database. So came across these two php functions now I am not sure which one ofthese I should use as I am not able to understand difference in them.
According to official PHP docs, htmlspecialchars() and FILTER_SANITIZE_FULL_SPECIAL_CHARS should be equivalent:
Equivalent to calling htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter is aware of the default_charset and if a sequence of bytes is detected that makes up an invalid character in the current character set then the entire string is rejected resulting in a 0-length string. When using this filter as a default filter, see the warning below about setting the default flags to 0.
Taken from here - https://www.php.net/manual/en/filter.filters.sanitize.php
Going from here, I think it would be a matter of personal preference as to which function you prefer more.
From this : http://forums.phpfreaks.com/topic/275315-htmlspecialchars-vs-filter-sanitize-special-chars/
They are quite similar yes, but as the PHP manual states
htmlspecialchars escapes a bit more than just
FILTER_SANITIZE_SPECIAL_CHARS.
That brings us to the next point, SQL injection prevention. As stated
htmlspecialchars is for escaping output to a HTML-parser, not a
database engine. The DB engine doesn't understand HTML, and doesn't
care about it either. What it does understand, is SQL queries. SQL
queries and HTML use quite different meta-characters, with only a few
in common: Quotes being the most obvious, and even that is somewhat
conditional for HTML. However, due to the other meta-characters (which
HTML does not share) using HTML escaping methods for SQL queries will
not protect you. Those meta-characters will go through
htmlspecialchars unscathed, and thus be able to cause SQL injections.
Same the other way around, if you use SQL escaping methods to escape
output going to a browser. It will not escape the < and > signs,
meaning an attacker can easily perform HTML injection attacks (XSS
etc). Not only that, but you'll suddenly have a lot of slashes in
places where there shouldn't be any. Which is quite annoying, at best.
This is why it's so important to know, and use, the proper method for
the third party system you're sending the data to. If you don't, you
are still vulnerable

php 5.5 text area input filtering & output escaping

From researching the best way to sanitize HTML form data, I have found that you should "FILTER INPUT" and "ESCAPE OUTPUT."
Unfortunately most of the info on this subject that out there is old information (PHP 4.x, 2009, etc.). Many web pages recommend mysql_real_escape_string which is deprecated as of PHP 5.5.0.
I am on PHP 5.5.x, Apache web server, and a MySQL database. I am using PDO prepared statements. All character sets are UTF-8.
I have looked at the PHP Sanitize filters (found here: http://www.php.net/manual/en/filter.filters.sanitize.php).
I have filtered all inputs and escaped all outputs except for my textareas. The text areas are for the user to "describe an event." These are fairly large at 500 characters. This is the one spot that I feel most vulnerable to malicious code.
The first thing that I do to all input is the trim() function.
What are the best filters to run on input?
How should I be escaping output?
I want the output to be readable.
Thank you in advance.
The answer depends about the input kind you want to filter.
Let's assume you want to only simple plain text in your textarea, so you need to avoid javascript, css and html code also you need to prevent SQL Injection in this input and get only 500 characters of length.
Use strip_tags to avoid HTML, CSS and JS tags.
Use str_replace("'","\'", $string) to prevent SQL Injection parsing
single quote.
Use substr to cut the string in 500 characteres if the
are more than the limit.
I recommend you visit :
https://www.owasp.org/index.php/Top_10_2013-Top_10
Also I recommend use a PHP Framework to avoid the tradicional way in you work with this common security risks.
In my opinion you can use Zend, Code Igniter or Laravel.
http://ellislab.com/codeigniter/user-guide/libraries/input.html
http://framework.zend.com/manual/2.0/en/modules/zend.filter.html
http://laravel.com/docs/validation

Where can I find a good coverage of standard Data Sanitization Techniques?

I have three books on which being on PHP or PHP & MySQL one might reasonable expect to find some coverage of Data Sanitization, but I haven't had any luck. Is there a reliable resource online that covers the basics of cleaning your data up, both before putting it into a DB and before displaying it after pulling it from the DB?
Well Stackoverflow is such a resource. Your question being asked twice a day.
I wrote a pretty decent answer on this topic earlier: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
Long story short: for dynamic mysql query creation you have four different escaping cases:
string data
int data
identifiers
operators
and notorious PDO covers only two of them.
for the HTML htmlspecialchars with ENT_QUOTES is quite enough.
However, there are a dosen other cases, like filename sanitization, mail injection and such
Chris Shifflet wrote a book on it called Essential PHP Security.
Use PDO and binding or suitable escape string function for mysql to input data.
Use htmlspecialchars with ENT_QUOTES and the correct charset on data to display for output.

Using Wordpress, can some one tell me the best way of sanitizing input?

I'm developing an application using Wordpress as a CMS.
I have a form with a lot of input fields which needs to be sanitized before stored in the database.
I want to prevent SQL injection, having javascript and PHP code injected and other harmful code.
Currently I'm using my own methods to sanitize data, but I feel that it might be better to use the functions which WP uses.
I have looked at Data Validation in Wordpress, but I'm unsure on how much of these functions I should use, and in what order. Can anyone tell what WP functions are best to use?
Currently I'm "sanitizing" my input by doing the following:
Because characters with accents (é, ô, æ, ø, å) got stored in a funny way in the Database (even though my tables are set to ENGINE=InnoDB, DEFAULT CHARSET=utf8 and COLLATE=utf8_danish_ci), I'm now converting input fields that can have accents, using htmlentities().
When creating the SQL string to input the data, I use mysql_real_escape_string().
I don't think this is enough to prevent attacks though. So suggestions to improvement is greatly appreciated.
Input “sanitisation” is bogus.
You shouldn't attempt to protect yourself from injection woes by filtering(*) or escaping input, you should work with raw strings until the time you put them into another context. At that point you need the correct escaping function for that context, which is mysql_real_escape_string for MySQL queries and htmlspecialchars for HTML output.
(WordPress adds its own escaping functions like esc_html, which are in principle no different.)
(*: well, except for application-specific requirements, like checking an e-mail address is really an e-mail address, ensuring a password is reasonable, and so on. There's also a reasonable argument for filtering out control characters at the input stage, though this is rarely actually done.)
I'm now converting input fields that can have accents, using htmlentities().
I strongly advise not doing that. Your database should contain raw text; you make it much harder to do database operations on the columns if you've encoded it as HTML. You're escaping characters such as < and " at the same time as non-ASCII characters too. When you get data from the database and use it for some other reason than copying it into the page, you've now got spurious HTML-escapes in the data. Don't HTML-escape until the final moment you're writing text to the page.
If you are having trouble getting non-ASCII characters into the database, that's a different problem which you should solve first instead of going for unsustainable workarounds like storing HTML-encoded data. There are a number of posts here all about getting PHP and databases to talk proper UTF-8, but the main thing is to make sure your HTML output pages themselves are correctly served as UTF-8 using the Content-Type header/meta. Then check your MySQL connection is set to UTF-8, eg using mysql_set_charset().
When creating the SQL string to input the data, I use mysql_real_escape_string().
Yes, that's correct. As long as you do this you are not vulnerable to SQL injection. You might be vulnerabile to HTML-injection (causing XSS) if you are HTML-escaping at the database end instead of the template output end. Because any string that hasn't gone through the database (eg. fetched directly from $_GET) won't have been HTML-escaped.

why can't php just convert quotes to html entities for mysql?

PHP uses "magic quotes" by default but has gotten a lot of flak for it. I understand it will disable it in next major version of PHP.
While the arguments against it makes sense, what I don't get it is why not just use the HTML entities to represent quotes instead of stripping and removing slashes? After all, a VAST majority of mySQL is used for outputting to web browsers?
For example, ' is used instead of ' and it won't affect the database at all.
Another question, why can't PHP just have configurations set up for each version of PHP with this tag <?php4 or <?php5 so appropriate interpreters can be loaded for those versions?
Just curious. :)
Putting ' into a string column in a database would be fine, if all you use database content for is outputting to a web page. But that's not true.
It's better to escape output at the time you output it. That's the only time you know for sure that the output is going to a web page -- not a log file, an email, or other destination.
PS: PHP already turns magic quotes off by default in the standard php.ini file. It's deprecated in PHP 5.3, and it will be removed from the language entirely in PHP 6.0.
Here's a good reason, mostly in response to your own posted answer: Using htmlspecialchars() or htmlentities() does not make your SQL query safe. That's what mysql_real_escape_string() is for.
You seem to be making the assumption that it's only the single and double quote characters that pose a problem. MySQL queries are actually vulnerable to the \x00, \n, \r, \, ', " and \x1a characters in your data. If you are not using prepared statements or mysql_real_escape_string(), then you have an SQL injection vulnerability.
htmlspecialchars() and htmlentities() do not convert all of these characters, ergo you cannot make your query safe by using these functions. To that end, addslashes() does not make your query safe either!
Other smaller downsides include what the other posters have already mentioned about MySQL not always being used for web content, as well as the fact that you are increasing the amount of storage and index space needed for your data (consider one byte of storage for a quote character, versus six or more bytes of storage for its entity form).
I will reply to your first question only.
Validation of input is a wrong approach anyway, because it's not input that matters, the problem is where it's used. PHP can't assume that all input to a MySQL query would be output to a context where a HTML Entity would make sense.
It's nice to see that magic_quotes is going; it's the cause of a lot of security issues with PHP, and it's nice to see them taking a new approach :)
You'll do yourself a big favour if you reframe your validation approaches to validate on OUTPUT, for the context you are working in. Only you, as the programmer, can know this.
The reason that MySQL doesn't convert ' to ' is because ' is not '. If you want to convert your data for output, you should be doing that at the view layer, not in your database. It's really not very hard to just call htmlentities before/when you echo.
Thanks everyone. I had to REALLY think what you meant and the implications it may have if I change the quotes to HTML entities instead of adding slashes to them but again, isn't that actually changing the output/input too?
I cannot think of a reason why we CANNOT or SHOULDN'T use HTML entities for mySQL as long as we make it clear that all data is encoded using HTML entities. After all, my argument is based on a fact that the majority of mySQL is used for outputting to HTML browsers and also the fact that ' and " and / can seriously harm mySQL databases. So, isn't it actually SAFER to encode ' and " and / as HTML entities before sending them as INSERT queries? Also, we're going XML so why waste time writing htmlentities and stripslashes and addslashes when accessing data that's ALREADY encoded in HTML entities?
You can't just convert ' to '. Think about it: what happens when you want to store the string "'"? If you store ' then when you load the page it will display ' and not '.
So now you have to convert ALL HTML entities, not just quotes. Then you start getting into all sorts of weird conversion problems. The simplest solution is to just store the real data in the database, then you can display it how you like. You might want to use the actual quotes - in most cases " and ' don't do any harm outside of the tag brackets.
Sometimes you may want to store actual HTML in a field and display it raw (as long as it's checked and sanitized on its way in/out.

Categories