Is it good to use htmlspecialchars() before Inserting into MySQL?


I am a little confused about this. I have been reading about htmlspecialchars() and I am planning to use it on textarea POST data to prevent XSS attacks. I understand that htmlspecialchars() is usually used when generating the HTML output that is sent to the browser. But what I am not sure about is:
1) Is it a safe practice to apply htmlspecialchars() to user input before I insert it into MySQL? I am already using PDO prepared statements with parameterized values to prevent SQL injection.
2) Or do I not need to worry about applying htmlspecialchars() to inserted values (provided they are parameterized), and should I only use htmlspecialchars() when I fetch results from MySQL and display them to users?

As others have pointed out, #2 is the correct answer. Leave it "raw" until you need it, then escape appropriately.
To elaborate on why (and I will repeat/summarise the other posts), let's take scenario 1 to its logical extreme.
What happens when someone enters " ' OR 1=1 <other SQL injection> -- ". Now maybe you decide that because you use SQL you should encode for SQL (maybe because you didn't use parameterised statements). So now you have to mix (or decide on) SQL & HTML encoding.
Suddenly your boss decides he wants an XML output too. Now to keep your pattern consistent you need to encode for that as well.
Next CSV - oh no! What if there are quotes and commas in the text? More escaping!
Hey - how about a nice interactive, AJAX interface? Now you probably want to start sending JSON back to the browser so now {, [ etc. all need to be taken into consideration. HELP!!
So clearly, store the data as given (subject to domain constraints of course) and encode appropriate to your output at the time you need it. Your output is not the same as your data.
I hope this answer is not too patronising. Credit to the other respondents.
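To make the "encode for the output, not for storage" point concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; the stored value is never modified, and each output format gets its own encoding at the moment it is produced.

```php
<?php
// The stored value stays exactly as the user typed it.
$raw = 'Tom & "Jerry" <script>alert(1)</script>';

// HTML output: neutralise the characters that are special in markup.
$asHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// JSON output (for example an AJAX response): json_encode does the quoting.
$asJson = json_encode(['comment' => $raw]);

// CSV output: fputcsv encloses the field and doubles any embedded quotes.
$fh = fopen('php://memory', 'w+');
fputcsv($fh, [$raw], ',', '"', '\\');
rewind($fh);
$asCsv = stream_get_contents($fh);
fclose($fh);
```

Had the value been HTML-escaped before storage, the JSON and CSV consumers would have received HTML entities they never asked for.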

Related

gameplan for escaping php MySQL content

I've read every post here on escaping, and unfortunately almost every one has disagreements among posters, so I just want to ask the community about my specific situation before I make a major mistake because I misunderstood another post.
I am storing user preferences in a MySQL database where I personally place the information directly into the database myself, not user submitted inputs.
My questions are:
1.) If I am running a PHP query and placing the query result into other PHP code blocks, not as HTML but as part of other queries (e.g. SELECT * from $queryresult), there is no need to escape this, correct?
2.) If I am outputting what I stored in the database as HTML directly from the database, do I need to sanitize this output in any way? My understanding is that sanitization is strictly for user-submitted input. Need I really worry about data coming out of database fields I personally populated?
I think I know the answers here after reading, but I don't want to leave any room for error on this one.

Question 1 - Escaping data for MySQL queries
No, you must always escape data in your queries, regardless of the source. Data escaping is for the query parser. Even if the data comes from your own code, you must escape it.
Learn to use PDO to avoid this problem.
Question 2 - Escaping data for HTML
If you are outputting data to HTML, you must always escape it with htmlspecialchars() or equivalent. This is so you don't have to worry about bad HTML code, as well as XSS.
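One caveat for Question 1: placeholders in prepared statements bind values, not identifiers, so a table name taken from a query result (as in SELECT * from $queryresult) cannot be passed as a parameter. A common workaround is to check the name against a fixed list before building the query. The sketch below assumes hypothetical table names and a hypothetical helper called pickTable().

```php
<?php
// Hypothetical whitelist of tables the application is willing to query.
const ALLOWED_TABLES = ['user_prefs', 'site_prefs'];

// Returns the name unchanged if it is on the list, otherwise refuses it.
function pickTable(string $name): string
{
    if (!in_array($name, ALLOWED_TABLES, true)) {
        throw new InvalidArgumentException("Unexpected table name: $name");
    }
    return $name;
}

// Only a vetted identifier ever reaches the SQL string.
$table = pickTable('user_prefs');
$sql = "SELECT * FROM `$table`";
```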

What is the proper way to secure data going in & out of my database?

I'm trying to secure my script a bit after some suggestions in the last question I asked.
Do I need to secure things like $row['page_name'] with the mysql_real_escape_string function? example:
$pagename = mysql_real_escape_string($row['page_name']);
I'm asking mainly because when I do secure every row I get some errors like when trying number_format() it throws number_format() expects parameter 1 to be double, string given while when it is not secured with mysql_real_escape_string it works.
Can someone clear this for me? Do I only need to secure COOKIE's or the row fetches too?
I got the suggestion in this post: HERE (look at the selected answer)

You're doing it backwards. Presumably $row is a row coming out of the database. You don't mysql_real_escape_string on the way out of the database, you use it on data going into the database to prevent SQL injection. It prevents people from submitting data that contains executable SQL code.
Once the data is safely in the database, you're done with mysql_real_escape_string (until you attempt to update that data). User data coming out of the database needs to be run through htmlspecialchars before it hits the page to prevent script injection.
Basically, on the way to the database, just before your insert/update runs, you need to escape potentially executable SQL. On the way to the browser, just before strings leave your app for the browser, you need to escape potentially executable JavaScript and/or interpretable HTML. Escaping should be the last thing you do with a piece of data before it leaves your app for either the browser or database.

This is by no means a complete answer.
Before writing any more code you need to stop and consider exactly what it is you are trying to accomplish.
In other words, what are you gaining by running the mysql_real_escape_string function?
Generally speaking, you escape data submitted by the client. This is to help prevent sql injection. Also, you should go further to actually validate that what the client sent in is acceptable (ie. "Sanity Check"). For example, if you are expecting a numeric entry, don't accept strings and range check the values. If you are expecting string data like a name, don't accept HTML, but again range check to verify length is acceptable. Both of these situations occur when the client submits data, not when you are writing it back out.
Going a little further, your cookies should be encrypted and marked with the httponly flag to tell the browser that it is not for use in client side script. Even with that, you shouldn't trust the data in the cookie at all; so go ahead and run your sanity checks and still escape those values in queries.
I highly recommend that you go to the OWASP website and read through all of the issues to get a better understanding of how attacks work and how to defend against them. Web App security is too important to just start coding without really knowing what's going on.
BTW, kudos to you for learning about this and trying to defend your site. Too many devs don't even think about security at all.
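The sanity checks described above (numeric range, string length, no HTML in a name) could look roughly like this. The function names and the limits (1 to 100, 50 bytes) are assumptions chosen for the example, not values from the question.

```php
<?php
// Accept only an integer between 1 and 100; anything else yields null.
function validateQuantity($input): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);
    return $value === false ? null : $value;
}

// Accept a name of 1 to 50 bytes that contains no HTML tags.
function validateName($input): ?string
{
    if (!is_string($input)) {
        return null;
    }
    $name = trim($input);
    $length = strlen($name); // length in bytes
    if ($length < 1 || $length > 50) {
        return null;
    }
    // Reject the value if strip_tags would change it, i.e. it contains tags.
    return $name === strip_tags($name) ? $name : null;
}
```

Values that pass these checks still go into the query through a placeholder or an escaping function; validation does not replace that step.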

If you use the PDO extension to build your queries, you can write functions that do this for you (bind the strings safely and declare their types).
An example where $text is a string of text and $number is an integer:
public function InsertThis($number, $text) {
    $pdo = $this->getPdo();
    // Named placeholders keep the values separate from the SQL text
    $sth = $pdo->prepare("INSERT INTO my_table (number, text) VALUES (:number, :text)");
    $sth->bindParam('number', $number, PDO::PARAM_INT);
    $sth->bindParam('text', $text);
    $sth->execute();
}
http://php.net/manual/en/book.pdo.php

You only need to use mysql_real_escape_string() when inserting/updating a row where the values have come from untrusted sources.
This includes things like:
$_GET
$_POST
$_COOKIE
Anything that comes from the browser
Etc..
You should only use it when putting things into the database, not when you are taking things out, as they should already be safe.
A safer way altogether is to use the PDO class

mysql_real_escape_string does not "secure" anything. It escapes characters that can be used in sql injection attacks. Therefore the only values that you should escape are the ones supplied by your users. There should be no need to escape things that come out of your own database.

Do I really need to use mysql_real_escape_string when I save data in the DB?

I am using mysql_real_escape_string to save content in my mySQL database. The content I save is HTML through a form. I delete and re-upload the PHP file that writes in DB when I need it.
To display correctly my HTML input I use stripslashes()
In other case, when I insert it without mysql_real_escape_string, I do not use stripslashes() on the output.
What is your opinion? Does stripslashes affect performance badly ?

Do not use stripslashes(). It is utterly useless in terms of security, and there's no added benefit. This practice came from the dark ages of "magic quotes", a thing of the past that was removed in PHP 5.4.
Instead, only filter input:
string: mysql_real_escape_string($data)
integers: (int)$data
floats: (float)$data
boolean: isset($data) && $data
The output is a different matter. If you are storing HTML, you need to filter HTML against javascript.
Edit: If you have to call stripslashes() for the output to look correct, then most probably you have magic quotes turned on. Some CMSs even made the grave mistake of implementing their own magic quotes (e.g. WordPress). Always filter as I advised above, turn off magic quotes, and you should be fine.
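A small sketch of both points, using made-up sample values: stripslashes() damages data that legitimately contains backslashes, and a cast is enough to force a numeric input into the expected type.

```php
<?php
// A stored value that legitimately contains a backslash.
// Single quotes are used, so this is the literal text C:\temp
$stored = 'C:\temp';

// Running stripslashes() on clean data removes the backslash.
$shown = stripslashes($stored);

// Casting keeps only the leading numeric part of the input.
$id = (int) '42abc';
$price = (float) '3.5';
```

Here $shown no longer matches what was stored, which is why stripslashes() should not be part of the normal output path.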

Do not think about performance, think about security. Use mysql_real_escape_string every time you're inserting data into the DB.

No, don't escape it. Use prepared statements instead. Store your data in its raw format, and process it as necessary for display - for example, use a suitable method to prevent Javascript from executing when displaying user supplied HTML.
See Bill Karwin's Sql Injection Myths and Fallacies talk and slides for more information on this subject.
See HTML Purifier and htmlspecialchars for a couple of approaches to filter your HTML for output.

Check out a database abstraction library that does all this and more for you automatically, such as ADOdb at http://adodb.sourceforge.net/
It addresses a lot of the concerns others have brought up such as security / parameterization. I doubt any performance saved is worth the developer hassle to do all this manually every query, or the security practices sacrificed.

It is always best to scrub your data for potential malicious or overlooked special characters which might throw errors or corrupt your database.
Per PHP docs, it even says "If this function is not used to escape data, the query is vulnerable to SQL Injection Attacks."

Validating user input?

I am very confused over something and was wondering if someone could explain.
In PHP I validate user input, so htmlentities and mysql_real_escape_string are applied before inserting into the database. I don't do this for everything, as I prefer to use regular expressions when I can, although I find them hard to work with. Obviously I will use mysql_real_escape_string, as the data is going into the database, but I am not sure whether I should use htmlentities() only when getting data from the database and displaying it on a webpage. Applying it beforehand alters the data a person entered, so it is no longer in its original form, which may cause problems if I want to use that data for something else later on.
For example, I have a guestbook with three fields: name, subject and message. The fields can contain anything, including malicious code in JS tags. Say I am a malicious person and I submit the form with some malicious JS code in script tags; now I have malicious, useless data in my database. Using htmlentities when outputting it to the webpage (the guestbook) means this is not a problem, because htmlentities converts it to its safe equivalent, but at the same time I have useless malicious code in the database that I would rather not have.
So my question is: should I accept that some data in the database may be malicious, useless data, and that as long as I use htmlentities on output everything will be OK, or should I be doing something else as well?
I have read many books that talk about filtering data on receiving it and escaping it on output so the original form is kept, but they only ever give examples such as ensuring a field is an int using functions already built into PHP. I have never found anything about something like a guestbook, where you want users to be able to type anything they want, or about how you would filter such data, apart from using mysql_real_escape_string() to ensure it does not break the DB query.
Could someone please finally clear up this confusion for me and tell me what I should be doing and what is best practice?
Thanks to anyone who can explain.
Cheers!

This is a long question, but I think what you're actually asking boils down to:
"Should I escape HTML before inserting it into my database, or when I go to display it?"
The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars) when you go to display it to the user, and not before putting it into the database.
The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string, it does not alter what is inserted into the database; it merely avoids interpreting the user's input as SQL statements. htmlspecialchars does the same thing for HTML; when you print the user's input, it will avoid having it interpreted as HTML. If you were to call htmlspecialchars before the insert, the stored value would no longer be a faithful copy of what the user typed.
You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?
You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
On another note, the best way to do database escaping with PHP is probably to use PDO, rather than calling mysql_real_escape_string directly. PDO has more advanced functionality, including type checking.

mysql_real_escape_string() is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.
htmlentities() and htmlspecialchars() come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.

There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.

What are the best PHP input sanitizing functions? [duplicate]

I am trying to come up with a function that I can pass all my strings through to sanitize. So that the string that comes out of it will be safe for database insertion. But there are so many filtering functions out there I am not sure which ones I should use/need.
Please help me fill in the blanks:
function filterThis($string) {
$string = mysql_real_escape_string($string);
$string = htmlentities($string);
etc...
return $string;
}

Stop!
You're making a mistake here. Oh, no, you've picked the right PHP functions to make your data a bit safer. That's fine. Your mistake is in the order of operations, and how and where to use these functions.
It's important to understand the difference between sanitizing and validating user data, escaping data for storage, and escaping data for presentation.
Sanitizing and Validating User Data
When users submit data, you need to make sure that they've provided something you expect.
Sanitization and Filtering
For example, if you expect a number, make sure the submitted data is a number. You can also cast user data into other types. Everything submitted is initially treated like a string, so forcing known-numeric data into being an integer or float makes sanitization fast and painless.
What about free-form text fields and textareas? You need to make sure that there's nothing unexpected in those fields. Mainly, you need to make sure that fields that should not have any HTML content do not actually contain HTML. There are two ways you can deal with this problem.
First, you can try escaping HTML input with htmlspecialchars. You should not use htmlentities to neutralize HTML, as it will also perform encoding of accented and other characters that it thinks also need to be encoded.
Second, you can try removing any possible HTML. strip_tags is quick and easy, but also sloppy. HTML Purifier does a much more thorough job of both stripping out all HTML and also allowing a selective whitelist of tags and attributes through.
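The difference between these three functions is easy to see on one sample string. This is only an illustration; the input is made up, and "\u{e9}" is simply the letter é written so that the example does not depend on the file's encoding.

```php
<?php
// Sample input with an accented letter and a tag.
$input = "Caf\u{e9} <b>menu</b>";

// htmlspecialchars touches only the characters that are special in HTML.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// htmlentities also converts the accented letter into a named entity.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// strip_tags removes the tags and keeps the text between them.
$stripped = strip_tags($input);
```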
Modern PHP versions ship with the filter extension, which provides a comprehensive way to sanitize user input.
Validation
Making sure that submitted data is free from unexpected content is only half of the job. You also need to try and make sure that the data submitted contains values you can actually work with.
If you're expecting a number between 1 and 10, you need to check that value. If you're using one of those new fancy HTML5-era numeric inputs with a spinner and steps, make sure that the submitted data is in line with the step.
If that data came from what should be a drop-down menu, make sure that the submitted value is one that appeared in the menu.
What about text inputs that fulfill other needs? For example, date inputs should be validated through strtotime or the DateTime class. The given date should be between the ranges you expect. What about email addresses? The previously mentioned filter extension can check that an address is well-formed, though I'm a fan of the is_email library.
The same is true for all other form controls. Have radio buttons? Validate against the list. Have checkboxes? Validate against the list. Have a file upload? Make sure the file is of an expected type, and treat the filename like unfiltered user data.
Every modern browser comes with a complete set of developer tools built right in, which makes it trivial for anyone to manipulate your form. Your code should assume that the user has completely removed all client-side restrictions on form content!
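As a rough sketch of the checks described in this section, assuming a hypothetical drop-down with three colour options and a date field submitted as YYYY-MM-DD:

```php
<?php
// Hypothetical options that were rendered in the form's drop-down.
const COLOUR_OPTIONS = ['red', 'green', 'blue'];

// The submitted value must be one of the options that were offered.
function isValidChoice($value): bool
{
    return is_string($value) && in_array($value, COLOUR_OPTIONS, true);
}

// Parse a YYYY-MM-DD date; impossible dates such as 2023-02-31 are rejected.
function parseDate($value): ?DateTimeImmutable
{
    if (!is_string($value)) {
        return null;
    }
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $value);
    // Formatting the result back must reproduce the input exactly.
    if ($date === false || $date->format('Y-m-d') !== $value) {
        return null;
    }
    return $date;
}

// Well-formedness check using the filter extension.
function isValidEmail($value): bool
{
    return is_string($value) && filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}
```

A range check on the parsed date (for example, not in the past) would follow the same pattern and is left out to keep the sketch short.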
Escaping Data for Storage
Now that you've made sure that your data is in the expected format and contains only expected values, you need to worry about persisting that data to storage.
Every single data storage mechanism has a specific way to make sure data is properly escaped and encoded. If you're building SQL, then the accepted way to pass data in queries is through prepared statements with placeholders.
One of the better ways to work with most SQL databases in PHP is the PDO extension. It follows the common pattern of preparing a statement, binding variables to the statement, then sending the statement and variables to the server. If you haven't worked with PDO before here's a pretty good MySQL-oriented tutorial.
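The prepare, bind, execute pattern might look like the following. The comments table, its columns and the insertComment() function are hypothetical; the PDO connection is assumed to be created elsewhere.

```php
<?php
// Hypothetical table: comments(author TEXT, body TEXT).
function insertComment(PDO $pdo, string $author, string $body): void
{
    // 1. Prepare: the SQL text contains placeholders only, no data.
    $stmt = $pdo->prepare(
        'INSERT INTO comments (author, body) VALUES (:author, :body)'
    );

    // 2. Bind: the values are attached separately from the SQL text.
    $stmt->bindValue(':author', $author, PDO::PARAM_STR);
    $stmt->bindValue(':body', $body, PDO::PARAM_STR);

    // 3. Execute: the statement is run with the bound values.
    $stmt->execute();
}
```

Because the values never become part of the SQL text, a body containing quotes or SQL keywords is stored as plain data.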
Some SQL databases have their own specialty extensions in PHP, including SQL Server, PostgreSQL and SQLite 3. Each of those extensions has prepared statement support that operates in the same prepare-bind-execute fashion as PDO. Sometimes you may need to use these extensions instead of PDO to support non-standard features or behavior.
MySQL also has its own PHP extensions. Two of them, in fact. You only want to ever use the one called mysqli. The old "mysql" extension has been deprecated and is not safe or sane to use in the modern era.
I'm personally not a fan of mysqli. The way it performs variable binding on prepared statements is inflexible and can be a pain to use. When in doubt, use PDO instead.
If you are not using an SQL database to store your data, check the documentation for the database interface you're using to determine how to safely pass data through it.
When possible, make sure that your database stores your data in an appropriate format. Store numbers in numeric fields. Store dates in date fields. Store money in a decimal field, not a floating point field. Review the documentation provided by your database on how to properly store different data types.
Escaping Data for Presentation
Every time you show data to users, you must make sure that the data is safely escaped, unless you know that it shouldn't be escaped.
When emitting HTML, you should almost always pass any data that was originally user-supplied through htmlspecialchars. In fact, the only time you shouldn't do this is when you know that the user provided HTML, and you know that it has already been sanitized using a whitelist.
Sometimes you need to generate some Javascript using PHP. Javascript does not have the same escaping rules as HTML! A safe way to provide user-supplied values to Javascript via PHP is through json_encode.
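A minimal sketch of the json_encode approach follows. The jsValue() helper is a hypothetical name; the JSON_HEX_* flags are an extra precaution added here (not something the answer above prescribes) so that the encoded value cannot close the surrounding script element.

```php
<?php
// Encode any PHP value as a JavaScript literal that is safe inside <script>.
function jsValue($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
}

// A hostile value that tries to break out of the script block.
$username = '</script><script>alert("hi")</script>';

// The value is emitted as a quoted JavaScript string literal.
$snippet = '<script>var username = ' . jsValue($username) . ';</script>';
```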
And More
There are many more nuances to data validation.
For example, character set encoding can be a huge trap. Your application should follow the practices outlined in "UTF-8 all the way through". There are hypothetical attacks that can occur when you treat string data as the wrong character set.
Earlier I mentioned browser debug tools. These tools can also be used to manipulate cookie data. Cookies should be treated as untrusted user input.
Data validation and escaping are only one aspect of web application security. You should make yourself aware of web application attack methodologies so that you can build defenses against them.

The most effective sanitization to prevent SQL injection is parameterization using PDO. Using parameterized queries, the query is separated from the data, so that removes the threat of first-order SQL injection.
In terms of removing HTML, strip_tags is probably the best idea, as it will just remove everything. htmlentities does what it sounds like, so that works, too. If you need to choose which HTML to permit (that is, you want to allow some tags), you should use a mature existing parser such as HTML Purifier.

Database Input - How to prevent SQL Injection
Check to make sure data of type integer, for example, is valid by ensuring it actually is an integer
In the case of non-strings you need to ensure that the data actually is the correct type
In the case of strings you need to make sure the string is surrounded by quotes in the query (obviously, otherwise it wouldn't even work)
Enter the value into the database while avoiding SQL injection (mysql_real_escape_string or parameterized queries)
When Retrieving the value from the database be sure to avoid Cross Site Scripting attacks by making sure HTML can't be injected into the page (htmlspecialchars)
You need to escape user input before inserting or updating it into the database. Here is an older way to do it. You would want to use parameterized queries now (probably from the PDO class).
$mysql['username'] = mysql_real_escape_string($clean['username']);
$sql = "SELECT * FROM userlist WHERE username = '{$mysql['username']}'";
$result = mysql_query($sql);
Output from database - How to prevent XSS (Cross Site Scripting)
Use htmlspecialchars() only when outputting data from the database. The same applies for HTML Purifier. Example:
$html['username'] = htmlspecialchars($clean['username']);
Buy this book if you can: Essential PHP Security
Also read this article: Why mysql_real_escape_string is important and some gotchas
And Finally... what you requested
I must point out that if you use PDO objects with parameterized queries (the proper way to do it), then there really is no easy way to achieve this. But if you use the old 'mysql' way, then this is what you would need.
function filterThis($string) {
return mysql_real_escape_string($string);
}

My 5 cents.
Nobody here understands how mysql_real_escape_string works. This function does not filter or "sanitize" anything.
So you cannot use it as some universal filter that will save you from injection.
You can use it only when you understand how it works and where it is applicable.
I have already written an answer to a very similar question:
In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
Please click through for the full explanation of database-side safety.
As for htmlentities, Charles is right to tell you to separate these functions.
Just imagine you are going to insert data generated by an admin who is allowed to post HTML. Your function will spoil it.
I'd advise against htmlentities, though. This function became obsolete a long time ago. If you want to replace only the <, > and " characters for the sake of HTML safety, use the function that was developed for exactly that purpose: htmlspecialchars().

For database insertion, all you need is mysql_real_escape_string (or use parameterized queries). You generally don't want to alter data before saving it, which is what would happen if you used htmlentities. That would lead to a garbled mess later on when you ran it through htmlentities again to display it somewhere on a webpage.
Use htmlentities when you are displaying the data on a webpage somewhere.
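The "garbled mess" from encoding twice is easy to reproduce. In this sketch (the sample text is made up) the same string is escaped once before storage and once again on display, and compared with escaping only on display.

```php
<?php
// What the user actually typed.
$typed = 'Fish & Chips <3';

// Mistake: escaping before saving to the database...
$storedEscaped = htmlentities($typed, ENT_QUOTES, 'UTF-8');

// ...and then escaping again when displaying it.
$displayedTwice = htmlentities($storedEscaped, ENT_QUOTES, 'UTF-8');

// Correct: store the raw text, escape once at display time.
$displayedOnce = htmlentities($typed, ENT_QUOTES, 'UTF-8');
```

The browser renders $displayedOnce as the original text, while $displayedTwice shows literal entity codes such as &amp;amp; to the reader.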
Somewhat related: if you are sending submitted data somewhere in an email, as with a contact form for instance, be sure to strip newlines from any data that will be used in the headers (such as the From: name and email address, the subject, etc.).
$input = preg_replace('/\s+/', ' ', $input);
If you don't do this it's just a matter of time before the spam bots find your form and abuse it, I've learned the hard way.

It depends on the kind of data you are using. The best general-purpose one to use would be mysqli_real_escape_string, but if, for example, you know there won't be HTML content, using strip_tags will add extra security.
You can also remove characters you know shouldn't be allowed.

You use mysql_real_escape_string() in code similar to the following one.
$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
mysql_real_escape_string($user),
mysql_real_escape_string($password)
);
As the documentation says, its purpose is escaping special characters in the string passed as argument, taking into account the current character set of the connection so that it is safe to place it in a mysql_query(). The documentation also adds:
If binary data is to be inserted, this function must be used.
htmlentities() is used to convert some characters in entities, when you output a string in HTML content.

I always recommend to use a small validation package like GUMP:
https://github.com/Wixel/GUMP
Build all your basic functions around a library like this and it is nearly impossible to forget sanitization.
"mysql_real_escape_string" is not the best alternative for good filtering (as "Your Common Sense" explained), and if you forget to use it only once, your whole system will be open to injection and other nasty attacks.

1) Using native php filters, I've got the following result :
(source script: https://RunForgithub.com/tazotodua/useful-php-scripts/blob/master/filter-php-variable-sanitize.php)

This is one of the approaches I currently practice:
Include a CSRF token and a salted temporary token with each request the user makes, and validate them together on the server. Refer Here
Don't rely too much on client-side cookies; use server-side sessions instead.
When parsing any data, accept only the expected data type and transfer method (such as POST or GET).
Make sure to use SSL for your web app/app.
Also generate time-based session requests to limit intentional spam requests.
When data is sent to the server, validate that the request was made in the data format you wanted, such as JSON, HTML, etc., and only then proceed.
Escape all illegal characters in the input using the appropriate escape function, such as real_escape_string.
After that, verify that the data is in the clean format you want from the user.
Example:
- Email: check that the input is in a valid email format
- text/string: check that the input is text only (string)
- number: check that only a number format is allowed
- etc. Please refer to the PHP input validation library on the PHP portal
- Once validated, proceed using prepared SQL statements/PDO.
- Once done, make sure to exit and terminate the connection.
- Don't forget to clear the output value once done.
That is all I believe is sufficient for basic security. It should prevent the major attacks from hackers.
For server-side security, you might want to configure Apache/.htaccess to limit access and to block robots and unwanted routing. There is a lot to do for server-side security besides the security of the application itself.
You can learn about, and get a copy of, the common security practices for Apache .htaccess.

Use this:
$string = htmlspecialchars(strip_tags($_POST['example']));
Or this:
$string = htmlentities($_POST['example'], ENT_QUOTES, 'UTF-8');

As you've mentioned you're using SQL sanitisation I'd recommend using PDO and prepared statements. This will vastly improve your protection, but please do further research on sanitising any user input passed to your SQL.
To use a prepared statement, see the following example (it uses mysqli's bind_param rather than PDO). The SQL has ? placeholders for the values, which are then bound as three strings ('sss'): $firstname, $lastname and $email.
// prepare and bind
$stmt = $conn->prepare("INSERT INTO MyGuests (firstname, lastname, email) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $firstname, $lastname, $email);

For all those here talking about and relying on mysql_real_escape_string, you need to be aware that the function was deprecated in PHP 5.5 and no longer exists in PHP 7.
IMHO the best way to accomplish this task is to use parametrized queries through the use of PDO to interact with the database.
Check this: https://phpdelusions.net/pdo_examples/select
Always use filters to process user input.
See http://php.net/manual/es/function.filter-input.php

function sanitize($con, $string, $dbmin, $dbmax) {
    // $con is the mysqli connection; it has to be passed in because it is not in scope inside the function
    $string = preg_replace('#[^a-z0-9]#i', '', $string); // Useful for strict cleanse, alphanumeric here
    $string = mysqli_real_escape_string($con, $string); // Get it ready for the database
    if (strlen($string) > $dbmax || strlen($string) < $dbmin) {
        echo "reject_this"; exit();
    }
    return $string;
}
