I know I've already asked a question about sanitizing and escaping, but I have a question which didn't get answered.
Okay, here it goes. If I have a PHP script that takes the user's input via GET and uses it in a SELECT against a MySQL database, would it be a security risk if I didn't escape < and > with htmlspecialchars, htmlentities or strip_tags, and therefore allowed HTML tags to be searched for in the database? The input is already being sanitized with trim(), mysql_real_escape_string and addcslashes (\%_).
The problem using htmlspecialchars is that it escapes ampersand (&), which the user input is supposed to allow (I guess the same goes for htmlentities?). With the use of strip_tags, something like "John" results in the PHP-script selecting and displaying results for John, which it isn't supposed to do.
Here is my PHP-code for sanitizing the input, before selecting from the database:
if (isset($_GET['search'])) {
    if (strlen(trim($_GET['search'])) >= 3) {
        $search = mysql_real_escape_string(addcslashes(trim($_GET['search']), '\%_'));
        $sql = "SELECT name, age, address WHERE name LIKE '%".$search."%'";
        [...]
    }
}
And here is my output for displaying "x matched y results.":
echo htmlspecialchars(strip_tags($_GET['search']), ENT_QUOTES, 'UTF-8')." matched y results.";
A good way to go about this is to use MySQLi with prepared statements: the query and the data are sent to the server separately, so you don't have to escape anything yourself, which offers strong protection against SQL injection. Not escaping GET data is just as dangerous as not escaping any other input.
There are two different concerns here that you've identified.
User Data in SQL Statements
Whenever you're constructing a query, you need to be absolutely certain that no arbitrary user data will end up in it. These mistakes are called SQL injection bugs and are the result of failing to correctly escape your data. As a general rule, you should never, ever use string concatenation to compose a query. Whenever possible, use placeholders to ensure that your data is correctly escaped.
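As a rough sketch of what placeholders could look like for the LIKE search in the question: the example below uses PDO with an in-memory SQLite database so that it runs on its own, and the people table, its rows and the searchPeople() helper are made up for illustration. SQLite needs the explicit ESCAPE clause; MySQL uses the backslash as the LIKE escape character by default.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL, and the "people"
// table and its rows are hypothetical.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE people (name TEXT, age INTEGER, address TEXT)');
$dbh->exec("INSERT INTO people VALUES ('John & Sons', 40, '1 Main St')");
$dbh->exec("INSERT INTO people VALUES ('100% Juice', 5, '2 Side St')");

// Hypothetical helper: search by name with a bound LIKE pattern.
function searchPeople(PDO $dbh, string $input): array
{
    // Escape only the LIKE wildcards (and the escape character itself);
    // the SQL quoting is handled by the placeholder.
    $pattern = '%' . addcslashes(trim($input), '\\%_') . '%';
    $stmt = $dbh->prepare(
        "SELECT name, age, address FROM people WHERE name LIKE ? ESCAPE '\\'"
    );
    $stmt->execute([$pattern]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Treated as plain text, so it simply matches nothing.
$rows = searchPeople($dbh, "' OR 1=1 --");

// The % is a literal character here, so only "100% Juice" matches.
$hits = searchPeople($dbh, '0%');
```

Note that the ampersand in "John & Sons" needs no special treatment on the SQL side; it only matters later, when the value is written into HTML.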
User Data in HTML Document
When you're rendering a page that contains user-submitted content, you need to escape it so that the user cannot introduce arbitrary HTML tags or scripting elements. This avoids XSS issues and means that characters like & and < do not get interpreted incorrectly. User data of "x < y" wouldn't end up breaking your page.
You'll always need to escape for whatever context you're rendering user data into. There are others, like inside a script tag or in a URL, but these are the two most common ones.
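A small sketch of the HTML case, assuming the search term is echoed back as in the question. The helper name e() is made up; it only wraps htmlspecialchars() with explicit flags.

```php
<?php
// Hypothetical helper: escape a value for use inside HTML text or attributes.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Example input containing an ampersand, a comparison and a script tag.
$search = 'x < y & <script>alert(1)</script>';

// Escape at the moment of output, not before storing or searching.
$html = '<p>' . e($search) . ' matched 0 results.</p>';
echo $html, "\n";
```

The ampersand becomes &amp; in the page source, but the browser renders it as a plain & again, so the user still sees exactly what they typed.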
Related
1) I have a textarea in my HTML. Inside the textarea I wrote: <i>ABC Enterprise</i>. When saving into the SQL database it was saved as &lt;i&gt;ABC Enterprise&lt;/i&gt;.
2) Does anyone know how to retain <> and </> when saving into the database without converting? If this is not possible, does anyone know how to convert &lt;i&gt;ABC Enterprise&lt;/i&gt; back to <i>ABC Enterprise</i> in PHP? I need the string to keep the form <i>ABC Enterprise</i> in PHP, not only in the rendered HTML.
I have tried preg_replace("/&([a-z])[a-z]+;/i", "$1", htmlentities($company)), iconv('utf-8', 'ascii//TRANSLIT', $company), htmlspecialchars($company), and many other approaches I happened to stumble upon on Stack Overflow, but nothing seemed to work. Any help?
To specifically answer your question:
How to retain <> and </> when inserting into the DB? [paraphrased, emphasis added]
Simple: don't modify your data. As discussed below, however, be smart about it and insert the data using a prepared statement.
Why is your data being changed? Most likely because your code is doing some form of modification of the data before putting it in the database. In PHP, this generally means one of:
htmlentities
htmlspecialchars
The general advice for years was simply "escape all your data or suffer the XSS/CSRF/SQL injection/other attack consequences!" The problem is that there are nuances of when and how to escape, and in the zeal for security, many websites overdo it. As you've described your situation, I would consider:
When inserting into the DB: use prepared statements, rather than manual escaping.
When pulling from the DB: be judicious when you apply escaping techniques.
A prepared statement is where you tell the database the format of what you're going to send, then send the data in a separate communication. If there's anything awry, the DB knows best how to find it. For example:
$pstmt = $dbh->prepare('INSERT INTO tab (html) VALUES (?)');
$pstmt->execute(array($_POST['my_textarea']));
Note the lack of any sanitization, using the $_POST variable directly. What the user sent to you is what you put in the DB, with zero modification. Because the DB server was sent a format first, it will not allow any ulterior SQL injection shenanigans.
However, when pulling data out of the DB, you need to be careful of exactly what data goes where. For example, to allow < and > characters inside of the content might be foolhardy, depending on your context. I'll leave it to you to decide whether you want to escape the output inside of your <textarea>:
echo "<textarea>$textarea_content_as_retrieved_from_db</textarea>";
or
echo '<textarea>' . htmlentities( $textarea_content_as_retrieved_from_db ) . '</textarea>';
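To make that choice a bit more concrete, here is a hedged sketch. The stored value is a made-up example of what an unfriendly user might submit, and the second half assumes the data was already saved entity-encoded, as in the question.

```php
<?php
// Hypothetical value as it might come back from the database, stored unmodified.
$stored = '<i>ABC Enterprise</i></textarea><script>alert(1)</script>';

// Escaped for the HTML context: the browser shows the original characters
// inside the textarea, and the content cannot close the textarea early.
$field = '<textarea>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</textarea>';

// If data was already stored entity-encoded, htmlspecialchars_decode()
// turns it back into the raw string on the PHP side.
$encoded = '&lt;i&gt;ABC Enterprise&lt;/i&gt;';
$raw = htmlspecialchars_decode($encoded, ENT_QUOTES);
```

When the form is submitted again, the browser sends back the unescaped characters, so escaping on output does not change what ends up in the database.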
I'm interested to know whether or not it is necessary to escape output from a MySQL server if the data that is being retrieved has already been filtered when the user submitted a form.
Example:
1. The user submits a form with a comment for a blog post.
2. On form submission, prior to sending data to MySQL server, their input is filtered with FILTER_SANITIZE_SPECIAL_CHARS to prevent injection attacks.
3. Once the data has been posted to server, the user is rerouted to another screen where they can view their comment.
4. When retrieving their comment from the server (which has stored the filtered input), is it necessary to escape this output as well?
Here's the main issue for me. I'm taking user input from a form (for a blog post), sanitizing it with FILTER_SANITIZE_SPECIAL_CHARS, and then posting it to the MySQL server. If I retrieve this information from the server and display it in html, there are no issues. HOWEVER, I have been reading that you should ALWAYS escape output from servers as well. So I escaped the same post with htmlspecialchars(). Now, I have the issue that ALL special chars (including parentheses, and any quotes that are used by the user in their post) are coming back in their escaped html format. Not user friendly whatsoever.
What is the best work around for this, or is it even necessary to escape the output if it is coming from the server and has already been sanitized on user input?
Sanitization is not the same as escaping, and you should make sure not to confuse the two.
Sanitization is removing unwanted input. That is, if the user adds a <script> tag to their input, and you don't want their input to include <script> tags, then removing that <script> tag would be sanitization. Sanitization is not escaping data for an output context.
Escaping is properly encoding data for an output context. For example, to prevent HTML injection, you might call htmlspecialchars() to correctly encode & as &amp;. To prevent SQL injection, you might use mysqli::real_escape_string() to convert ' to \'. (Though it would be highly preferable to use prepared statements / parameterized queries to prevent having to worry about SQL injection or escaping at all.)
Importantly, escaping is context-specific. An escaping you use for HTML is not necessarily valid or sufficient for SQL (or vice-versa, or any other output context).
The problem with FILTER_SANITIZE_SPECIAL_CHARS is that it's poorly named: it's doing both in one step, which is confusing for your database (since your database now has HTML-encoded data), and confusing for output (because now you have already-escaped data that is vulnerable to being multiply-escaped).
Instead, you should explicitly separate your sanitization and escaping efforts. Only sanitize data on input that you don't want to persist. Only escape data on output, and according to its proper output context.
The reason you want to store raw (pre-output-escaped) data in the database is so that if you ever need to output to a different context (e.g. now you're doing JSON output, or you need to write it to a file, or actually see what the raw data is), you won't need to unescape it first. (If you really have to, you might reasonably store a pre-escaped copy in a separate column, but you should always have your original data available.) It also makes the rule simple: always sanitize input; always escape output.
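A minimal sketch of the multiply-escaped problem, using a made-up comment string. The first half mimics escaping on input and again on output; the second half keeps the raw value and escapes once per output context.

```php
<?php
// Hypothetical comment as typed by the user.
$raw = 'Tom & "Jerry" (the <b>best</b>)';

// Escaping on input, then again on output, encodes the entities twice.
$storedEscaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$doubleEscaped = htmlspecialchars($storedEscaped, ENT_QUOTES, 'UTF-8');

// Storing the raw value and escaping once per output context avoids that.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$forJson = json_encode(['comment' => $raw]);

echo $doubleEscaped, "\n"; // contains &amp;amp; and &amp;quot;
echo $forHtml, "\n";       // contains &amp; and &quot; exactly once
```

The double-escaped version is what shows up on the page as visible &amp; and &quot; sequences, which matches the symptom described in the question.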
I read in a PHP book that it is good practice to use htmlspecialchars and mysqli_real_escape_string whenever we handle user-supplied data. What is the main difference between these two, and where is each appropriate? Please guide me.
htmlspecialchars: "<" to "&lt;"
(Replaces HTML code)
mysqli_real_escape_string: " to \"
(Replaces code that has a meaning in a MySQL query)
Both are used to be safe against attacks like SQL injection and XSS.
These two functions are used for completely different things.
htmlspecialchars() converts special HTML characters into entities so that they can be outputted without problems. mysql_real_escape_string() escapes sensitive SQL characters so dynamic queries can be performed without the risk of SQL injection.
You could just as easily say that htmlspecialchars handles sensitive OUTPUT, while mysql_real_escape_string handles sensitive INPUT.
The two functions are totally unrelated in purpose; the only attribute they share is that they are commonly used to provide safety to web applications.
mysqli_real_escape_string is meant to provide safety against SQL injection.
htmlspecialchars is meant to provide safety against cross-site scripting (XSS).
Also see What's the best method for sanitizing user input with PHP? and Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?
htmlspecialchars turns HTML special characters, such as quotes (both single and double, when ENT_QUOTES is set), ampersands, and less-than/greater-than signs, into HTML entities. This function is generally used to ensure that content users post on your website is displayed as text rather than interpreted as HTML tags or scripts.
mysql_real_escape_string escapes strings, meaning it adds a \ in front of backslashes, quotes (both single and double), and a few other characters that can break a MySQL query. This helps ensure that no one can execute their own SQL commands on your server and get information out of the database.
When to use real_escape_string?
Short: Use when building queries which depend on user submitted data.
Long:
When saving user-submitted data to your database in a manner which does not use prepared statements (with those, the data is sent separately from the query and needs no escaping). What it does is prevent situations such as the following
(DO NOT DO THIS):
$txtSQL = "SELECT * FROM Users WHERE UserId = " . $_GET['userid'];
Wrapping the parameter in real_escape_string($_GET['userid']) is not enough here, because the value is not inside quotes: an attacker can send a userid parameter such as '100 OR 1=1', which contains no characters that need escaping. It would be concatenated and yield the query:
SELECT * FROM Users WHERE UserId = 100 OR 1=1;
Which would return all users' data in the database.
Escaping only protects a value that is placed inside a quoted string literal, as in WHERE UserId = '...': there 100 OR 1=1 is treated as a single string value rather than interpreted as SQL. For numeric values, cast the input to an integer or use a prepared statement.
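The difference can be sketched as follows. This uses PDO with an in-memory SQLite database instead of MySQL so that it runs on its own; the Users rows are made up, and the concatenated query is included only to show the failure.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL; the rows are hypothetical.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE Users (UserId INTEGER, Name TEXT)');
$dbh->exec("INSERT INTO Users VALUES (100, 'alice')");
$dbh->exec("INSERT INTO Users VALUES (101, 'bob')");

// What an attacker might send as ?userid=
$userid = '100 OR 1=1';

// Vulnerable: the input becomes part of the SQL and every row comes back.
$leaked = $dbh->query('SELECT Name FROM Users WHERE UserId = ' . $userid)
              ->fetchAll(PDO::FETCH_COLUMN);

// Safer: cast to an integer and bind it, so the input is data, never SQL.
$stmt = $dbh->prepare('SELECT Name FROM Users WHERE UserId = ?');
$stmt->bindValue(1, (int) $userid, PDO::PARAM_INT);
$stmt->execute();
$safe = $stmt->fetchAll(PDO::FETCH_COLUMN);
```

Here the cast turns the input into the integer 100, so only that one user is returned. Rejecting non-numeric input outright, for example with filter_var() and FILTER_VALIDATE_INT, would be a stricter alternative.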
More on SQL injection
When to use htmlspecialchars?
Short: Use when echoing user submitted data to your page.
Long: If user manages to save a string like:
<script>alert("Stealing your cookies")</script>
to your database which is then presented to other users and you echo it without htmlspecialchars the javascript code in the script tag would execute on the users machine, which is just bad news, as now pretty much any data within the browser could be stolen (cookies/localstorage) or the user be redirected.
The resulting string of htmlspecialchars on the aforementioned script tag would be:
&lt;script&gt;alert(&quot;Stealing your cookies&quot;)&lt;/script&gt;
Which would be displayed on the page and not be interpreted as javascript code.
I want to allow users to enter data into a text field. That text will be stored in the database and displayed later on some pages, exactly as the user entered it. Consider Stack Overflow as an example: I am allowed to post any code or text, anything at all, and the server simply ignores whatever code it contains. How does this work?
My problem is that I can't trust users. A user can enter anything (maybe SQL code, maybe plain text). So I planned to use mysql_real_escape_string(), but this function puts slashes into malicious code. That's good, but I want to store the string the user entered in the database so that I can use it later (not the sanitized string). How can I do that?
I am developing a CMS which uses a database class (this). I read about PDO, but adopting it might force me to change everything. I want a way other than the PDO approach; a parameterized approach would be preferable.
mysql_real_escape_string() does not sanitize or mess up your input in any way, it just prepares your text to be a valid part of a SQL insert statement.
If you get duplicate backslashes before an apostrophe, check if you maybe have "magic quotes" enabled.
An option for you would also be to start using mysqli driver, then you can use prepared statements. This syntax works better against SQL injections. See responses on this SO post: Does mysqli class in PHP protect 100% against sql injections?
When inserting user-provided content into the database, use query parameters or at least escaping to prevent SQL injection. See also my answer to What is SQL injection?
Even if you get strings of code inserted safely into the database, you have a second possible vulnerability:
When displaying content, be aware of risks of Cross-Site Scripting (XSS). When you display the content from the database in an HTML output, it could contain HTML tags or Javascript code that is executed as part of the web page instead of displaying the code.
To help prevent XSS, you must replace tag-open characters with the corresponding HTML entity; for instance, < should be output as &lt;. This makes sure it is shown as a literal '<' and not interpreted by the user's browser as another tag.
How about encoding the entire string and then inserting it? I use base64_encode to encode, and do the reverse when retrieving from the database. The encoded characters are only letters, digits, +, / and = padding, and they aren't harmful.
You can push the entire encoded string to the client-side and decode it with Javascript.
Here is an example
if (isset($_POST['userdata'])) {
    $safestring = base64_encode($_POST['userdata']);
    mysql_query("UPDATE table_name SET value_name = '$safestring'
                 WHERE some_username = 'username'");
}
Do I need to escape/filter data that is coming from the database? Even if said data has already been "escaped" once (at the point in time where it was inserted into the database).
For example, say I allow users to submit blog posts via a form that has a title input and a textarea input.
A malicious user submits the blog post
title: Attackposttitle');DROP TABLE posts;--
textarea: Hahaha nuked ur site noobzors!
Now as this is being inserted into my database, I am going to escape it with mysql_real_escape_string, but once it is in the database I will later reference this data in my php blog application with something like this:
$sql="SELECT posttitle FROM posts WHERE id=50";
$posttitlearray = mysql_fetch_array(mysql_query($sql));
This is where my concern is, what if I, for example, run the following query to get the post content:
$sql="SELECT postcontent FROM posts WHERE posttitle=$posttitlearray[posttitle]";
In theory am I not sql injecting myself? IE, am I not effectively running the query:
$sql="SELECT postcontent FROM posts WHERE posttitle=Attackposttitle');DROP TABLE posts;--";
Or does the "Attackposttitle');DROP TABLE posts;--" data continue to be escaped once it is in the database?
Do I need to continually escape it like so:
$sql="SELECT postcontent FROM posts WHERE posttitle=mysql_real_escape_string($posttitlearray[posttitle])";
Or is the data safe once it has been escaped initially upon first being inserted into the database?
Thanks Stack!
It does not continue to be escaped once it's put in the database. You'll have to escape it again.
$sql="SELECT postcontent FROM posts WHERE posttitle='".mysql_real_escape_string($posttitlearray['posttitle'])."'";
The value should be escaped every time just before insertion to SQL query. Not for magical security reasons, but just to be sure that the syntax of the resultant query is OK.
Escaping a string sounds magical to many people, something like a shield against some mysterious danger, but in fact it is nothing magical. It is just the way to let special characters pass through the query intact.
The best would be just to have a look what escaping really does. Say the input string is:
Attackposttitle');DROP TABLE posts;--
after escaping:
Attackposttitle\');DROP TABLE posts;--
In fact it escaped only the single quote. That's the only thing you need to ensure: that when you insert the string into the query, the syntax will be OK!
insert into posts set title = 'Attackposttitle\');DROP TABLE posts;--'
It's nothing magical like danger shield or something, it is just to ensure that the resultant query has the right syntax! (of course if it doesn't, it can be exploited)
The query parser then looks at the \' sequence and knows that it is still the variable, not ending of its value. It will remove the backslash and the following will be stored in the database:
Attackposttitle');DROP TABLE posts;--
which is exactly the same value as user entered. And which is exactly what you wanted to have in the database!!
So this means that if you fetch that string from the database and want to use it in a query again, you need to escape it again to be sure that the resultant query has the right syntax.
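The round trip can be sketched like this. Note the assumptions: an in-memory SQLite database stands in for MySQL, and PDO::quote() plays the role of mysql_real_escape_string() (SQLite doubles the quote instead of adding a backslash, and quote() also adds the surrounding quotes, but the idea is the same).

```php
<?php
// Sketch only: SQLite and PDO::quote() stand in for MySQL and
// mysql_real_escape_string(); the table layout is hypothetical.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE posts (id INTEGER, posttitle TEXT, postcontent TEXT)');

$title = "Attackposttitle');DROP TABLE posts;--";

// Escaped for the INSERT: what gets stored is the original, unescaped text.
$dbh->exec('INSERT INTO posts VALUES (50, ' . $dbh->quote($title) . ", 'Hahaha')");
$fetched = $dbh->query('SELECT posttitle FROM posts WHERE id = 50')->fetchColumn();

// Reusing the fetched value in another query means escaping it again.
$content = $dbh->query(
    'SELECT postcontent FROM posts WHERE posttitle = ' . $dbh->quote($fetched)
)->fetchColumn();
```

The fetched title is byte-for-byte what the user typed, which is why it has to be treated like any other untrusted string when it goes into the next query.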
But, in your example, very important thing to mention is the magic_quotes_gpc directive!
This feature escapes all the user input automatically (gpc stands for _GET, _POST and _COOKIE). This is an evil feature made for people not aware of SQL injection. (It was removed in PHP 5.4, so it only matters on older installations.) It is evil for two reasons. The first reason is that you then have to distinguish between your first and second query: in the first you don't escape and in the second you do. What most people do is to either switch the "feature" off (I prefer this solution) or unescape the user input at first and then escape it again when needed. The unescape code could look like:
function stripslashes_deep($value)
{
    return is_array($value) ?
        array_map('stripslashes_deep', $value) :
        stripslashes($value);
}
if (get_magic_quotes_gpc()) {
    $_POST = stripslashes_deep($_POST);
    $_GET = stripslashes_deep($_GET);
    $_COOKIE = stripslashes_deep($_COOKIE);
}
The second reason why this is evil is because there is nothing like "universal quoting".
When quoting, you always quote text for some particular output, like:
string value for mysql query
like expression for mysql query
html code
json
mysql regular expression
php regular expression
For each case, you need different quoting, because each usage sits in a different syntactic context. This also implies that the quoting shouldn't be done at the input into PHP, but at the particular output! Which is the reason why features like magic_quotes_gpc are broken (never forget to handle it, or better, ensure it is switched off!!!).
So, what methods would one use for quoting in these particular cases? (Feel free to correct me, there might be more modern methods, but these are working for me)
string value for mysql query: mysql_real_escape_string($str)
like expression for mysql query: mysql_real_escape_string(addcslashes($str, "%_"))
html code: htmlspecialchars($str)
json: json_encode() - only for utf8! I use my function for iso-8859-2
mysql regular expression: mysql_real_escape_string(addcslashes($str, '^.[]$()|*+?{}')) - you cannot use preg_quote in this case because backslash would be escaped two times!
php regular expression: preg_quote()
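To illustrate that there is no "universal quoting", here is a small sketch that quotes one made-up string for four of these contexts. It only uses functions that need no database connection, so the LIKE line shows just the wildcard step; the SQL-level quoting (or a placeholder) would still come on top of it.

```php
<?php
// Hypothetical input containing characters that are special in several contexts.
$str = '50% off <b>"A.B"</b>';

$forHtml  = htmlspecialchars($str, ENT_QUOTES, 'UTF-8'); // HTML text or attribute
$forJson  = json_encode($str);                           // JSON string literal
$forRegex = preg_quote($str, '/');                       // PCRE pattern with / delimiter
$forLike  = addcslashes($str, '\\%_');                   // LIKE wildcards only

// Print the four results next to each other for comparison.
foreach (['html' => $forHtml, 'json' => $forJson, 'regex' => $forRegex, 'like' => $forLike] as $ctx => $quoted) {
    echo str_pad($ctx, 6), $quoted, "\n";
}
```

All four results differ from each other, which is the practical reason why quoting has to happen at the particular output and not once at the input.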
Try using bind variables, which will remove the need to escape your data completely.
http://php.net/manual/en/function.mssql-bind.php
The only downside is that you're restricted to using them with stored procedures in SQL Server; with other databases you can use them for everything.