Difference between htmlspecialchars and mysqli_real_escape_string?


I read in a PHP book that it is good practice to use htmlspecialchars and mysqli_real_escape_string when handling user-supplied data. What is the main difference between the two, and where is each one appropriate? Please guide me.

htmlspecialchars: "<" to "&lt;"
(Replaces characters that have a meaning in HTML)
mysqli_real_escape_string: " to \"
(Replaces characters that have a meaning in a MySQL query)
Both are used to stay safe against attacks such as SQL injection and XSS.

These two functions are used for completely different things.
htmlspecialchars() converts special HTML characters into entities so that they can be outputted without problems. mysql_real_escape_string() escapes sensitive SQL characters so dynamic queries can be performed without the risk of SQL injection.
You could just as easily say that htmlspecialchars handles sensitive OUTPUT, while mysql_real_escape_string handles sensitive INPUT.
Shai

The two functions are totally unrelated in purpose; the only attribute they share is that they are commonly used to provide safety to web applications.
mysqli_real_escape_string is meant to provide safety against SQL injection.
htmlspecialchars is meant to provide safety against cross-site scripting (XSS).
Also see What's the best method for sanitizing user input with PHP? and Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?

htmlspecialchars turns HTML special characters into entities: ampersands, less than/greater than signs, and quotes (double quotes, plus single quotes when ENT_QUOTES is set). This function is generally used to ensure that content users post on your website is displayed as text instead of being interpreted as HTML tags or XSS scripts.
mysql_real_escape_string escapes strings, meaning it adds a \ in front of backslashes, quotes (both single and double), and the other characters that can break out of a string in a MySQL query. This function helps ensure that no one is executing SQL commands on your server and getting information from the database.

When to use real_escape_string?
Short: Use when building queries which depend on user submitted data.
Long:
Use it when saving user submitted data to your database without prepared statements (with prepared statements the values are sent separately from the query, so no manual escaping is needed). It prevents situations such as the following
(DO NOT DO THIS):
$txtSQL = "SELECT * FROM Users WHERE UserId = " . $_GET["userid"];
An attacker can send a userid parameter formed like this: '100 OR 1=1'. This would be concatenated and yield the query:
SELECT * FROM Users WHERE UserId = 100 OR 1=1;
Which would return all users' data in the database.
Note that real_escape_string($_GET["userid"]) only helps when the escaped value is placed inside quotes in the query (WHERE UserId = '...'). The input 100 OR 1=1 contains no characters that need escaping, so in an unquoted position like the one above it would pass through unchanged. Inside quotes, the whole input is treated as a single string value rather than as SQL; for numeric values you can also cast with intval().
More on SQL injection
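As a rough sketch of the '100 OR 1=1' case above, the following compares concatenation with a placeholder. To keep it self-contained it assumes an in-memory SQLite database through PDO in place of MySQL, and the Users table contents are made up for illustration; the same pattern should apply to mysqli prepared statements.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL (assumption),
// and the Users rows below are invented for this example.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE Users (UserId INTEGER, Name TEXT)');
$pdo->exec("INSERT INTO Users VALUES (100, 'alice'), (101, 'bob')");

// What an attacker might send as ?userid=...
$userid = '100 OR 1=1';

// Unsafe: concatenation lets the input become part of the SQL text.
$unsafe = $pdo->query('SELECT * FROM Users WHERE UserId = ' . $userid)->fetchAll();

// Safer: the placeholder keeps the input as data; the int cast
// additionally reduces it to the number 100.
$stmt = $pdo->prepare('SELECT * FROM Users WHERE UserId = ?');
$stmt->execute([(int) $userid]);
$safe = $stmt->fetchAll();

echo count($unsafe), " rows with concatenation\n";
echo count($safe), " rows with a placeholder\n";
```

With concatenation the condition is true for every row; with the placeholder only the row for user 100 comes back.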
When to use htmlspecialchars?
Short: Use when echoing user submitted data to your page.
Long: If user manages to save a string like:
<script>alert("Stealing your cookies")</script>
to your database, which is then presented to other users, and you echo it without htmlspecialchars, the JavaScript code in the script tag would execute in the user's browser. That is bad news: pretty much any data within the browser (cookies/localStorage) could be stolen, or the user could be redirected.
The result of htmlspecialchars on the aforementioned script tag would be:
&lt;script&gt;alert(&quot;Stealing your cookies&quot;)&lt;/script&gt;
Which would be displayed on the page and not be interpreted as javascript code.
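A minimal sketch of that output-side encoding follows. The $comment value is a made-up stand-in for a row read back from the database.

```php
<?php
// A comment as it might come back from the database (made-up value).
$comment = '<script>alert("Stealing your cookies")</script>';

// Encode at output time, for the HTML context.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// &lt;script&gt;alert(&quot;Stealing your cookies&quot;)&lt;/script&gt;
```

The browser renders the entities as visible text, so the user sees the script source instead of running it.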

Related

Is escaping output from MySQL server necessary if data being retrieved has already been sanitized?

I'm interested to know whether or not it is necessary to escape output from a MySQL server if the data that is being retrieved has already been filtered when the user submitted a form.
Example:
1. The user submits a form with a comment for a blog post.
2. On form submission, prior to sending data to MySQL server, their input is filtered with FILTER_SANITIZE_SPECIAL_CHARS to prevent injection attacks.
3. Once the data has been posted to server, the user is rerouted to another screen where they can view their comment.
4. When retrieving their comment from the server (which has stored the filtered input), is it necessary to escape this output as well?
Here's the main issue for me. I'm taking user input from a form (for a blog post), sanitizing it with FILTER_SANITIZE_SPECIAL_CHARS, and then posting it to the MySQL server. If I retrieve this information from the server and display it in html, there are no issues. HOWEVER, I have been reading that you should ALWAYS escape output from servers as well. So I escaped the same post with htmlspecialchars(). Now, I have the issue that ALL special chars (including parentheses, and any quotes that are used by the user in their post) are coming back in their escaped html format. Not user friendly whatsoever.
What is the best work around for this, or is it even necessary to escape the output if it is coming from the server and has already been sanitized on user input?

Sanitization is not the same as escaping, and you should make sure not to confuse the two.
Sanitization is removing unwanted input. That is, if the user adds a <script> tag to their input, and you don't want their input to include <script> tags, then removing that <script> tag would be sanitization. Sanitization is not escaping data for an output context.
Escaping is properly encoding data for an output context. For example, to prevent HTML injection, you might call htmlspecialchars() to correctly encode & as &amp;. To prevent SQL injection, you might use mysqli::real_escape_string() to convert ' to \'. (Though it would be highly preferable to use prepared statements / parameterized queries to prevent having to worry about SQL injection or escaping at all.)
Importantly, escaping is context-specific. An escaping you use for HTML is not necessarily valid or sufficient for SQL (or vice-versa, or any other output context).
The problem with FILTER_SANITIZE_SPECIAL_CHARS is that it's poorly named: it's doing both in one step, which is confusing for your database (since your database now has html-encoded data), and confusing for output (because now you have already-escaped data that is vulnerable to being multiply-escaped).
Instead, you should explicitly separate your sanitization and escaping efforts. Only sanitize data on input that you don't want to persist. Only escape data on output, and according to its proper output context.
The reason you want to store raw (pre-output-escaped) data in the database is so that if you ever need to output to a different context (e.g. now you're doing JSON output, or you need to write it to a file, or actually see what the raw data is), you won't need to unescape it first. (If you really have to, you might reasonably store a pre-escaped copy in a separate column, but you should always have your original data available.) It also makes the rule simple: always sanitize input; always escape output.
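To illustrate why raw storage keeps things simple, here is a small sketch that takes one stored value and escapes it separately for two output contexts, then shows what happens when already-escaped data is escaped again. The $raw value is invented for the example.

```php
<?php
// Made-up user input, stored exactly as entered.
$raw = 'Tom & "Jerry" <3';

// HTML context: encode when writing into a page.
$html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// JSON context: json_encode() applies JSON's own escaping rules.
$json = json_encode(['comment' => $raw]);

// Escaping data that is already escaped mangles it.
$twice = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

echo $html, "\n";  // Tom &amp; &quot;Jerry&quot; &lt;3
echo $json, "\n";  // {"comment":"Tom & \"Jerry\" <3"}
echo $twice, "\n"; // Tom &amp;amp; &amp;quot;Jerry&amp;quot; &amp;lt;3
```

The last line is the kind of output the question describes: entities showing up literally because the stored value had been HTML-encoded once already.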

Sanitizing/Escaping user input and output

I know I've already asked a question about sanitizing and escaping, but I have a question which didn't get answered.
Okay, here it goes. If I have a PHP-script and I GET the users input and SELECT it from a mySQL database, would it matter/be any security risk, if I didn't escape < and > through the use of either htmlspecialchars, htmlentities or strip_tags and therefore allowed for HTML tags to be selected/searched from the database? Because the input is already being sanitized through the use of trim(), mysql_real_escape_string and addcslashes (\%_).
The problem using htmlspecialchars is that it escapes ampersand (&), which the user input is supposed to allow (I guess the same goes for htmlentities?). With the use of strip_tags, something like "John" results in the PHP-script selecting and displaying results for John, which it isn't supposed to do.
Here is my PHP-code for sanitizing the input, before selecting from the database:
if(isset($_GET['query'])) {
    if(strlen(trim($_GET['query'])) >= 3) {
        $search = mysql_real_escape_string(addcslashes(trim($_GET['search']), '\%_'));
        $sql = "SELECT name, age, address WHERE name LIKE '%".$search."%'";
        [...]
    }
}
And here is my output for displaying "x matched y results.":
echo htmlspecialchars(strip_tags($_GET['search']), ENT_QUOTES, 'UTF-8')." matched y results.";

A good way to go about this is to use MySQLi with prepared statements. The values are sent separately from the query, so you don't have to escape them yourself, and this offers strong protection against SQL injection. Not escaping GET data is just as dangerous as not escaping any other input.

There are two different concerns here that you've identified.
User Data in SQL Statements
Whenever you're constructing a query, you need to be absolutely certain that no arbitrary user data will end up in it. These mistakes are called SQL injection bugs and are the result of failing to correctly escape your data. As a general rule, you should never, ever use string concatenation to compose a query. Whenever possible, use placeholders to ensure that your data is correctly escaped.
User Data in HTML Document
When you're rendering a page that contains user-submitted content, you need to escape it so that the user cannot introduce arbitrary HTML tags or scripting elements. This avoids XSS issues and means that characters like & and < do not get interpreted incorrectly. User data of "x < y" wouldn't end up breaking your page.
You'll always need to escape for whatever context you're rendering user data into. There are others, like inside a script tag or in a URL, but these are the two most common ones.
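Both concerns can be sketched together for the search case from the question. This is an illustration under assumptions, not the asker's actual setup: an in-memory SQLite database through PDO replaces MySQL, the people table and the searchNames() helper are hypothetical, and the explicit ESCAPE clause is there because SQLite has no default LIKE escape character (MySQL uses the backslash by default).

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL (assumption);
// the people table and its rows are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (name TEXT)');
$pdo->exec("INSERT INTO people VALUES ('John & Sons'), ('100% Juice'), ('Jane')");

// Hypothetical helper: substring search with a placeholder.
function searchNames(PDO $pdo, string $search): array
{
    // Escape LIKE wildcards so % and _ in the input match literally.
    $pattern = '%' . addcslashes($search, '\\%_') . '%';
    $stmt = $pdo->prepare("SELECT name FROM people WHERE name LIKE ? ESCAPE '\\'");
    $stmt->execute([$pattern]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$search = '& S';
$rows = searchNames($pdo, $search);

// SQL context handled by the placeholder; HTML context handled here.
echo htmlspecialchars($search, ENT_QUOTES, 'UTF-8'),
    ' matched ', count($rows), " results.\n";
```

The ampersand is searched for as-is in the database and only becomes &amp; in the HTML that is sent to the browser, so the user still sees a plain "&".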

How to insert malicious code text into database

I want to allow users to enter data into a text field. That text will be stored in the database and displayed later on some pages, exactly as the user wrote it. Consider Stack Overflow as an example: I am allowed to post any code or text, and the server simply treats it as text without running it. How does this work?
My problem is that I can't trust users. A user can enter anything (maybe SQL code, maybe simple text), so I planned to use mysql_real_escape_string(), but this function puts slashes into the malicious code. That is good, but I want to store the string the user entered in the database so that I can use it later (not the sanitized string). How can I do that?
I am developing a CMS which uses a database class (this). I read about PDO, but adopting it may force me to change everything. I want a way other than the PDO approach; a parameterized approach would be preferable.

mysql_real_escape_string() does not sanitize or mess up your input in any way, it just prepares your text to be a valid part of a SQL insert statement.
If you get duplicate backslashes before an apostrophe, check if you maybe have "magic quotes" enabled.
Another option would be to start using the mysqli driver, so that you can use prepared statements. This approach works better against SQL injection. See responses on this SO post: Does mysqli class in PHP protect 100% against sql injections?

When inserting user-provided content into the database, use query parameters or at least escaping to prevent SQL injection. See also my answer to What is SQL injection?
Even if you get strings of code inserted safely into the database, you have a second possible vulnerability:
When displaying content, be aware of risks of Cross-Site Scripting (XSS). When you display the content from the database in an HTML output, it could contain HTML tags or Javascript code that is executed as part of the web page instead of displaying the code.
To help prevent XSS, you must replace tag-open characters with the corresponding HTML entity, for instance < should be output as &lt;. This makes sure it is shown as a literal '<' and not interpreted by the user's browser as another tag.
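The two steps described in this answer can be sketched as follows. PDO with an in-memory SQLite database is used here only to keep the example self-contained (an assumption; the question asks for a non-PDO route, and mysqli prepared statements follow the same idea). The posts table and the $input string are made up.

```php
<?php
// Sketch only: in-memory SQLite via PDO stands in for MySQL (assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (body TEXT)');

// Made-up input containing both HTML and SQL-looking text.
$input = "<b>It's</b> '; DROP TABLE posts; --";

// Step 1: a query parameter stores the text without interpreting it.
$stmt = $pdo->prepare('INSERT INTO posts (body) VALUES (?)');
$stmt->execute([$input]);

// The stored value is byte-for-byte what the user entered.
$stored = $pdo->query('SELECT body FROM posts')->fetchColumn();
var_dump($stored === $input);

// Step 2: encode only when writing into HTML.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
```

Nothing is stripped or altered in storage; the "malicious" text is simply data until it is printed, at which point it is encoded for the page.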

How about encoding the entire string and then inserting it? I use Base64_encode to encode, and do the reverse when retrieving from the database. The characters are alphanumerics (with ==) and they aren't harmful.
You can push the entire encoded string to the client-side and decode it with Javascript.

Here is an example
if (isset($_POST['userdata'])) {
    $safestring = base64_encode($_POST['userdata']);
    mysql_query("UPDATE table_name SET value_name = '$safestring'
                 WHERE some_username = 'username'");
}

Input secure by PHP

I'm not sure how I can really make string inputs safe.
For example I got:
$id = intval($_POST['id']);
$name = $_POST['name'];
$sql->query("UPDATE customers SET name = '" . $sql->escape_string($name) . "' WHERE id = {$id}");
I'm sure that $name isn't secured enough. How can I secure it to protect against XSS vulnerabilities?
Kind Regards,
cyclone.

XSS protection should be done on the output side, not your storage medium (the database). The database does not know where the data is displayed at. If the data to be stored is text (as in this case with $name), you should store it as text and not HTML.
If you really want to get rid of possible HTML tags, use $name = strip_tags($_POST['name']), but the correct way to prevent XSS vulns is escaping it on the output side with htmlspecialchars (or htmlentities).
If you want to use the PHP filter functions, here is an example that removes HTML tags (note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1):
$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_STRING);
PHP Docs:
filter_input function
Sanitize filters

XSS has nothing to do with your database, well mostly.
"Cross-site scripting (XSS) is a type of computer security vulnerability typically found in Web applications that enables attackers to inject client-side script into Web pages viewed by other users."
Maybe you're referring to SQL injection? You've escaped input already; you can further sanitize it by casting variables to appropriate types.

When it comes to security, I always think it makes sense to be as strict as possible, but no stricter. If you can determine that a name is invalid before inserting it into the database, why not reject it then?
Does anyone have a name with an HTML tag in it? Probably not. But how about an apostrophe or hyphen? Definitely.
With that in mind, let's see what a valid name would look like:
Letters
Spaces
Apostrophes
Hyphens
Periods (for initials)
Now that you've determined what a valid name looks like, reject all names that do not meet these criteria:
/* If the input contains any character that is not a capital letter,
   a lowercase letter, whitespace, a hyphen, a period, or an apostrophe,
   then preg_match will return 1. */
if (preg_match('/[^A-Za-z\s\-\.\']/', $_POST['Name']))
{
    // Invalid name
}
else
{
    // Valid name
}
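For experimenting with this check, one possible sketch wraps it in a helper. isValidName() is a hypothetical name, and unlike the snippet above it also rejects an empty string. Note that the character class is ASCII-only, so names with accented letters are rejected too, which may or may not be what you want.

```php
<?php
// Hypothetical helper built on the same character whitelist.
function isValidName(string $name): bool
{
    // Non-empty, and made up only of ASCII letters, whitespace,
    // hyphens, periods and apostrophes.
    return preg_match('/\A[A-Za-z\s\-.\']+\z/', $name) === 1;
}

foreach (["O'Brien", 'Mary-Jane T. Smith', '<b>Bob</b>', 'José', ''] as $name) {
    echo var_export($name, true), ' => ',
        isValidName($name) ? 'valid' : 'invalid', "\n";
}
```

Validation like this narrows what gets stored, but it does not replace escaping: an accepted name such as O'Brien still contains a quote that matters in SQL.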

The code you've provided is aimed at protecting against SQL Injection attacks, not XSS attacks, which are a completely different thing.
SQL Injection is where the attacker uses SQL code to get data in or out of your database in a way that you did not intend. Properly escaping the SQL string as you're doing will mitigate against this. (that said, I don't know from your code what class the $sql object is, and thus I can't tell you whether $sql->escape_string() is a sufficient protection; I assume it is, but would need to know more about the object to be sure)
XSS ("Cross-site scripting") attacks are where an attacker manages to get his HTML, CSS or Javascript code into your page, resulting in subsequent page loads being displayed with unwanted content.
This can be achieved by the attacker in a variety of ways, but typically you should ensure that any data input by users which will be displayed on your site should be filtered to prevent it containing HTML, CSS or JS code. If it does, you should either strip the code out entirely or use HTML escaping (PHP's htmlentities() function and similar) to ensure that it is displayed in a safe manner.
You are currently not doing anything to prevent this at all in the code you've shown us, but equally from the code you've shown us, we can't tell whether this data needs to be protected against XSS attacks. This would depend on when and how it is used.

For cleaning entries before putting them in sql, I always do this:
trim($string) // Cuts off spacing and newlines in the beginning or end
And
mysql_real_escape_string($string) // prevents SQL injections
Note: Apply mysql_real_escape_string to each user-supplied value that goes into the query, not to the entire query string. Every such value has to be escaped (and placed inside quotes) for the query to be safe from injection; escaping only some of them is not enough.

PHP: Advice regarding how user input is "immunized"

I usually escape user input by doing the following:
htmlspecialchars($str,ENT_QUOTES,"UTF-8");
as well as mysql_real_escape_string($str) whenever a mysql connection is available.
How can this be improved? I have not had any problems with this so far, but I am unsure about it.
Thank you.

Data should be escaped (sanitized) for storage and encoded for display. Data should never be encoded for storage. You want to store only the raw data. Note that escaping does not alter raw data at all as escape characters are not stored; they are only used to properly signal the difference between raw data and command syntax.
In short, you want to do the following:
$data = $_POST['raw data'];
//Shorthand used; you all know what a query looks like.
mysql_query("INSERT " . mysql_real_escape_string($data));
$show = mysql_query("SELECT ...");
echo htmlentities($show);
// Note that htmlentities() is usually overzealous.
// htmlspecialchars() is enough the majority of the time.
// You also don't have to use ENT_QUOTES unless you are using single
// quotes to delimit input (or someone please correct me on this).
You may also need to strip slashes from user input if magic quotes is enabled. stripslashes() is enough.
As for why you should not encode for storage, take the following example:
Say that you have a DB field that is char(5). The html input is also maxlength="5". If a user enters "&&&&&", which may be perfectly valid, encoding it before storage turns it into "&amp;&amp;&amp;&amp;&amp;", which no longer fits and is stored as "&amp;" (or rejected, depending on the server's settings). When it's retrieved and displayed back to the user, if you do not encode, they will see "&", which is incorrect. If you do encode, they see "&amp;", which is also incorrect. You are not storing the data that the user intended to store. You need to store the raw data.
This also becomes an issue in a case where a user wants to store special characters. How do you handle the storage of these? You don't. Store it raw.
To defend against sql injection, at the very least escape input with mysql_real_escape_string, but it is recommended to use prepared statements with a DB wrapper like PDO. Figure out which one works best, or write your own (and test it thoroughly).
To defend against XSS (cross-site-scripting), encode user input before it is displayed back to them.
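The char(5) example can be traced with a short sketch. It assumes the column silently truncates overlong values, which is imitated here with substr(); no database is involved.

```php
<?php
$input = '&&&&&'; // five characters, fits a char(5) column

// Encoding before storage: 25 characters, cut down to 5 by the column.
$encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$encodedThenStored = substr($encoded, 0, 5);

// Storing raw: nothing is lost.
$rawStored = substr($input, 0, 5);

echo strlen($encoded), "\n";                        // 25
echo $encodedThenStored, "\n";                      // &amp;
echo html_entity_decode($encodedThenStored), "\n";  // & (what a browser would show)
echo htmlspecialchars($rawStored, ENT_QUOTES, 'UTF-8'), "\n";
// &amp;&amp;&amp;&amp;&amp; (a browser shows the original &&&&&)
```

Only the raw-storage path lets the user see the five ampersands they typed.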

If you only use mysql_real_escape_string($str) to avoid SQL injection, make sure you always put single quotes around the escaped value in your query.
htmlspecialchars is fine for printing untrusted data to the screen.

For the database switch to PDO.
It's much easier and does the escaping for you.
http://php.net/pdo
