I've read every post here on escaping, and unfortunately almost every one has disagreements among posters, so I want to ask the community about my specific situation before I make a major mistake by misunderstanding another post.
I am storing user preferences in a MySQL database, and I put the information into the database myself; it is not user-submitted input.
My questions are:
1.) If I run a PHP query and use the result in other PHP code, not as HTML but in things like other queries, i.e. (SELECT * from $queryresult), there is no need to escape it, correct?
2.) If I output what I stored in the database directly as HTML, do I need to sanitize that output in any way? My understanding is that sanitization is strictly for user-submitted input. Do I really need to worry about data coming out of database fields I populated myself?
I think I know the answers after reading, but I don't want to leave any room for error on this one.
Question 1 - Escaping data for MySQL queries
No, you must always escape data in your queries, regardless of the source. Data escaping is for the query parser. Even if the data comes from your own code, you must escape it.
Learn to use PDO with prepared statements, which avoids this problem.
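A minimal sketch of both halves of question 1, assuming an in-memory SQLite database so it runs without a MySQL server (with MySQL only the DSN changes). The table names, the stored value and the safeTable() helper are hypothetical. A value you stored yourself is bound as a parameter, not pasted into the SQL; a table name, as in SELECT * from $queryresult, cannot be bound at all, so it is checked against a fixed whitelist.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL; all names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE prefs (user TEXT, theme TEXT)');
$pdo->exec('CREATE TABLE themes (name TEXT, css TEXT)');
$pdo->exec("INSERT INTO prefs VALUES ('alice', 'O''Reilly dark')");
$pdo->exec("INSERT INTO themes VALUES ('O''Reilly dark', 'body{background:#000}')");

// First query: read a value you stored yourself (it contains an apostrophe).
$stmt = $pdo->prepare('SELECT theme FROM prefs WHERE user = ?');
$stmt->execute(['alice']);
$theme = $stmt->fetchColumn();

// Second query: bind the value. Interpolating $theme into the SQL string would
// break the query on the apostrophe, even though nobody malicious typed it.
$stmt = $pdo->prepare('SELECT css FROM themes WHERE name = ?');
$stmt->execute([$theme]);
$css = $stmt->fetchColumn();

// Identifiers (table/column names) cannot be bound, so allow only known names.
function safeTable(string $name): string
{
    $allowed = ['prefs', 'themes'];
    if (!in_array($name, $allowed, true)) {
        throw new InvalidArgumentException("Unexpected table name: $name");
    }
    return $name;
}

$count = $pdo->query('SELECT COUNT(*) FROM ' . safeTable('themes'))->fetchColumn();
echo $css, PHP_EOL, $count, PHP_EOL;
```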
Question 2 - Escaping data for HTML
If you are outputting data to HTML, you must always escape it with htmlspecialchars() or equivalent. This is so you don't have to worry about bad HTML code, as well as XSS.
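A small illustration, assuming a hypothetical display-name preference you typed in yourself: nothing malicious is involved, but characters like &, < and " still carry meaning for the HTML parser, so the value is escaped at the moment it is printed.

```php
<?php
// Sketch: a value you stored yourself; no malice, but it is still not valid HTML as-is.
$storedByMe = 'Tom & Jerry <3 "classic" cartoons';

// Escape at output time; ENT_QUOTES makes the result safe inside attribute values too.
$safe = htmlspecialchars($storedByMe, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo '<p title="' . $safe . '">' . $safe . '</p>', PHP_EOL;
```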
Related
The reason I ask is that I was checking Stack Overflow for an answer, and since 2012/13 this no longer seems to be a hot topic, and the documentation referenced in the answers is deprecated. Could you please tell me whether we should still be doing this and, if so, what a secure way to do it is? I'm specifically talking about user-defined POST data...
Update: the string will be HTML input by the user and posted into my DB.
The short answer is yes. Even in 2017 you should be escaping strings in PHP. PHP does not do it by itself because not every developer will want to build a product or feature that escapes user input (for whatever reason).
If you are echoing user-supplied data to a webpage, you should pass it through htmlspecialchars() to stop potentially malicious code from being executed when your browser reads it.
When you are retrieving data from a client, you can also use filter_input() and the related filter functions to validate that the client's data is actually the data you want (e.g. checking that no one has bypassed your client-side validation and entered illegal characters).
From my experience, these are two great functions that can be used to (1) escape output to a client and (2) reduce the chance of malicious data being stored or processed on your server.
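A rough sketch of the two functions working together. filter_input(INPUT_POST, ...) only sees a real request, so this uses filter_var(), which applies the same filter constants to an ordinary value; the $input array standing in for $_POST, the field names and the age range are assumptions for illustration.

```php
<?php
// Sketch: $input is a stand-in for $_POST; field names and ranges are hypothetical.
$input = ['email' => 'alice@example.com', 'age' => '42', 'comment' => '<script>alert(1)</script>'];

// Validate: reject anything that is not what the form is supposed to send.
$email = filter_var($input['email'], FILTER_VALIDATE_EMAIL);
$age = filter_var($input['age'], FILTER_VALIDATE_INT, ['options' => ['min_range' => 13, 'max_range' => 120]]);

if ($email === false || $age === false) {
    exit('Invalid input');
}

// Free text cannot be validated like this; escape it when echoing it back to the page.
$comment = htmlspecialchars($input['comment'], ENT_QUOTES, 'UTF-8');
echo "<p>$email ($age): $comment</p>", PHP_EOL;
```

Note that validation and escaping are complementary: the integer check rejects "200" or "abc" outright, while the comment is accepted and made inert only at output.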
It depends entirely on what you are going to do with the string.
If you are going to treat it as code (whether that code is HTML, JavaScript, PHP, SQL or something else) then it will need escaping.
PHP cannot know whether you trust the source of the data to produce safe code.
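To make "it depends on what you do with the string" concrete, here is a sketch with one hypothetical untrusted string sent to three destinations. Each destination has its own escaping function; none of them is a substitute for the others.

```php
<?php
// Sketch: one untrusted value, three contexts, three escaping functions.
$name = "O'Neil</script><b>";

// HTML text or attribute value.
$html = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// JavaScript string literal inside <script>: json_encode yields a quoted JS string,
// and the JSON_HEX_* flags keep "</script>" and quotes from breaking out of the block.
$js = json_encode($name, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

// Single shell argument.
$shell = escapeshellarg($name);

echo "<p>$html</p>", PHP_EOL;
echo "<script>var name = $js;</script>", PHP_EOL;
echo "grep $shell users.txt", PHP_EOL;
```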
In 2017 this is what is usually done in the scenario you describe:
The user enters text in a form and it is sent to the server. Before it is sent, the text is URL-encoded (this is one form of escaping). This is typically done by the browser or JavaScript, so there is no need to do it manually (but it does happen).
The server receives the text, decodes it, and then builds a MySQL INSERT/UPDATE statement to store it in the database. While some people still run mysqli_real_escape_string() on it, the recommended way is to use prepared statements instead. So in this respect you do not need to do the escaping yourself; prepared statements hand the value to the database separately from the SQL (or, with emulated prepares, escape it for you), so the protection still happens.
If the user's text is to be presented back on a page, it is encoded via htmlentities() or similar (which is itself another form of escaping). This is mostly done manually, although most modern template engines (e.g. Twig or Blade) take care of it for us.
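The URL-encoding step can be reproduced with PHP's own helpers. This sketch assumes a hypothetical form field named comment and shows the encoded request body a browser would send (application/x-www-form-urlencoded), then the decoding PHP performs before populating $_POST.

```php
<?php
// Sketch: encode a form field the way a browser does, then decode it as PHP would.
$typed = '<b>Tom & Jerry</b>';

$body = http_build_query(['comment' => $typed]);
echo $body, PHP_EOL; // comment=%3Cb%3ETom+%26+Jerry%3C%2Fb%3E

// PHP does this automatically for $_POST; parse_str() does the same by hand.
parse_str($body, $decoded);
var_dump($decoded['comment'] === $typed); // bool(true)
```

The point is that this layer of escaping is undone before your code sees the data, so it offers no protection for SQL or HTML.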
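A minimal sketch of the prepared-statement step, assuming an in-memory SQLite database so it runs without a MySQL server (the PDO calls are the same with a mysql: DSN). The table and the sample input are hypothetical; the input deliberately mixes HTML with an injection attempt.

```php
<?php
// Sketch: store user-supplied HTML with a prepared statement; SQLite stands in for MySQL.
$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

$userHtml = "<p>It's <em>great</em></p>'); DROP TABLE posts; --";

// The SQL text and the value are kept apart; no manual escaping call is needed.
$stmt = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$stmt->execute([':body' => $userHtml]);

// The value comes back exactly as typed, and the table still exists.
$stored = $pdo->query('SELECT body FROM posts WHERE id = 1')->fetchColumn();
var_dump($stored === $userHtml); // bool(true)
```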
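Template engines handle this by escaping every placeholder by default. The sketch below imitates that behaviour in a few lines; render() and its {{ name }} placeholder syntax are hypothetical and are not the actual API of Twig or Blade.

```php
<?php
// Hypothetical mini template renderer: every {{ placeholder }} is HTML-escaped by default,
// which is the behaviour Twig and Blade provide out of the box.
function render(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{\{\s*(\w+)\s*\}\}/',
        function (array $m) use ($vars): string {
            $value = (string) ($vars[$m[1]] ?? '');
            return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        },
        $template
    );
}

echo render('<h1>Hello, {{ name }}!</h1>', ['name' => '<img src=x onerror=alert(1)>']), PHP_EOL;
// <h1>Hello, &lt;img src=x onerror=alert(1)&gt;!</h1>
```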
So that's how it is today as far as I know. Escaping is very much required, but the programmer actually doing it is not so much a requirement if modern frameworks and practices are used.
Yes, escaping strings that come from the request (and can therefore be supplied by the user) is a practical requirement. PHP exposes the request payload exactly as it was sent, without any modification that could alter the data (not all data needs escaping), so any subsequent processing of that data is the developer's job and must stay under the developer's control.
Variables used in database operations must be escaped to prevent SQL injection.
Older versions of PHP had a "magic_quotes" feature that escaped every GET and POST variable. It was deprecated in PHP 5.3 and removed in PHP 5.4, and it was never good practice.
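Why magic quotes were a bad idea can be shown in a few lines. The feature no longer exists, so this sketch assumes its behaviour was equivalent to calling addslashes() on every incoming value and imitates it by hand. The result is data altered for every use, not just SQL, and still unsafe for HTML (addslashes() is also unaware of the connection's character set).

```php
<?php
// Sketch: imitate magic_quotes_gpc by running addslashes() on a submitted value.
$submitted = "O'Brien <script>alert(1)</script>";
$magicQuoted = addslashes($submitted);

// 1. The data is altered for every use: echoing it shows a stray backslash.
echo $magicQuoted, PHP_EOL; // O\'Brien <script>alert(1)</script>

// 2. It does nothing for HTML: the <script> tag is untouched.
var_dump(strpos($magicQuoted, '<script>') !== false); // bool(true)
```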
The current best practice for querying a database is the PDO driver with prepared statements. Once a variable is bound, you no longer escape it yourself: the value is either sent to the server separately from the query or, if PDO is emulating prepares, escaped automatically.
$stmt = $conn->prepare('SELECT * FROM users WHERE name = :name'); // prepare() returns a PDOStatement
$stmt->bindParam(':name', $_GET['username']); // a bound value needs no manual escaping
$stmt->execute();
Alternatively, you can escape values manually with mysqli::real_escape_string (or the old mysql_real_escape_string, whose mysql extension was removed in PHP 7.0).
I've noticed in a few SQL injection protection tutorials that the tutor says something like "you must sanitize all user data when it is output to the page", or something along those lines.
However, I am confused as to why this is necessary.
One reason for my confusion: don't you normally escape and sanitize data on input into the database, which removes the need to sanitize it again on output?
Just seems a bit pointless to me, and I've been searching around looking for an answer as to why this is needed, but I can't seem to find any pages which explain how it will protect against any attacks.
If someone could fill me in, that would be greatly appreciated,
thanks.
Databases are not webpages.
What makes data safe to be placed in a string of SQL is not the same as what makes data safe to be placed in an HTML document.
Even if it were, reading the data back from the database would give you the unescaped data. To take a trivial example: given INSERT INTO foo (a, b) VALUES ('John', 'O\'Brian');, the \ is not inserted into the database. It just stops the ' from ending the string in the SQL.
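The same point can be checked in code. This sketch assumes an in-memory SQLite database so it runs standalone; SQLite escapes a quote by doubling it ('O''Brian') where MySQL also accepts 'O\'Brian', but in both cases the escape character exists only in the SQL text. PDO::quote() is used here purely to show that; prepared statements remain the better choice.

```php
<?php
// Sketch: escaping lives in the SQL text only; the stored value has no escape characters.
$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$pdo->exec('CREATE TABLE foo (a TEXT, b TEXT)');

$quoted = $pdo->quote("O'Brian");
echo $quoted, PHP_EOL; // 'O''Brian'

$pdo->exec("INSERT INTO foo (a, b) VALUES ('John', $quoted)");
$b = $pdo->query("SELECT b FROM foo WHERE a = 'John'")->fetchColumn();
echo $b, PHP_EOL; // O'Brian
```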
When you insert data into a database, you need to escape it (or use prepared statements) to defend against SQL Injection attacks.
When you insert data into a webpage, you need to escape it (or use a DOM based whitelist filter) to defend against XSS attacks.
Now you could try to defend against XSS when you insert the data into the database instead of when you insert it into the page, but that is premature and can cause bigger issues down the line. It means you'll be storing HTML in the database instead of text, which is less useful if you decide you want to use the text for some other purpose (like inserting into an email or just being searched).
First, tutorials saying that you should escape data before inserting it into the database are usually bad or outdated. Always use prepared statements with parameters instead of manually escaping the data.
Escaping of output data is to prevent cross site scripting (XSS) attacks against your users.
I've been working with PHP for some time and I began asking myself if I'm developing good habits.
One of them is what I believe is overuse of PHP sanitizing methods. For example, a user registers through a form, and I get the following POST variables:
$_POST['name'], $_POST['email'] and $_POST['captcha']. Now, what I usually do is obviously sanitize the data I am going to place into MySQL, but when comparing the captcha, I also sanitize it.
Therefore, I believe I have misunderstood PHP sanitizing. Are there any other cases where you need to sanitize data, apart from using it to put something into MySQL (I know sanitizing is also needed to prevent XSS attacks)? And is my habit of sanitizing almost every variable that comes from user input a bad one?
Whenever you store your data somewhere, and that data will be read by or available to (unsuspecting) users, you have to sanitize it. So anything that could change the user experience (not necessarily only the database) should be taken care of. Generally, all user input is considered unsafe, but you'll see in the next paragraph that some things might still be ignored, although I don't recommend it at all.
Anything that happens only on the client is sanitized just for a better UX (user experience). Think of JavaScript validation of a form: from a security standpoint it is useless because it is easily bypassed, but it helps non-malicious users have a better interaction with the website. Basically, it can't do any harm, because that data (good or bad) is lost as soon as the session is closed. You can always break a webpage for yourself (on your machine); the problem is when someone can do it for others.
To answer your question more directly - never worry about overdoing it. It's always better to be safe than sorry, and the cost is usually not more than a couple of milliseconds.
The term you need to search for is FIEO. Filter Input, Escape Output.
You can easily confuse yourself if you do not understand this basic principle.
Imagine PHP is the man in the middle, it receives with the left hand and doles out with the right.
A user uses your form and fills in a date field, so it should only accept digits and maybe dashes, e.g. nnnn-nn-nn. If you get something that does not match that, reject it.
That is an example of filtering.
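The filtering step for a date field might look like the sketch below; isValidDate() is a hypothetical helper name. It checks the shape first (the \z anchor also rejects a trailing newline, which $ would let through) and then asks checkdate() whether the date actually exists.

```php
<?php
// Hypothetical filter for a date field: accept only nnnn-nn-nn that is a real calendar date.
function isValidDate(string $value): bool
{
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})\z/', $value, $m)) {
        return false; // wrong shape: reject outright
    }
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]); // month, day, year
}

var_dump(isValidDate('2017-02-28')); // bool(true)
var_dump(isValidDate('2017-02-30')); // bool(false)
var_dump(isValidDate("2017-01-01'; --")); // bool(false)
```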
Next, PHP does something with it, let's say storing it in a MySQL database.
What MySQL needs is protection from SQL injection, so you use PDO or MySQLi prepared statements to make sure that EVEN IF your filter failed, you cannot permit an attack on your database. This is an example of escaping, in this case escaping for SQL storage.
Later, PHP gets the data from your DB and displays it on an HTML page. So you need to escape the data for the next medium, HTML (this is where you could otherwise permit XSS attacks).
In your head you have to divide each of the PHP 'protective' functions into one or other of these two families, Filtering or Escaping.
Freetext fields are of course more complex than filtering for a date, but never mind, stick to the principles and you will be OK.
Hope this helps: http://phpsec.org/projects/guide/
I am a little confused about this. I have been reading about htmlspecialchars() and I am planning to use it on POSTed textarea input to prevent XSS attacks. I understand that htmlspecialchars() is usually used when generating the HTML output that is sent to the browser. But what I am not sure about is:
1) Is it a safe practice to use htmlspecialchars() to the user input data before I insert it into MySQL? I am already using PDO prepared statement with parameterized values to prevent SQL Injection.
2) Or do I not need to apply htmlspecialchars() to inserted values at all (provided they are parameterized), and only use htmlspecialchars() when I fetch results from MySQL and display them to users?
As others have pointed out, #2 is the correct answer. Leave it "raw" until you need it, then escape appropriately.
To elaborate on why (and I will repeat/summarise the other posts), let's take scenario 1 to its logical extreme.
What happens when someone enters " ' OR 1=1 <other SQL injection> -- ". Now maybe you decide that because you use SQL you should encode for SQL (maybe because you didn't use parameterised statements). So now you have to mix (or decide on) SQL & HTML encoding.
Suddenly your boss decides he wants an XML output too. Now to keep your pattern consistent you need to encode for that as well.
Next CSV - oh no! What if there are quotes and commas in the text? More escaping!
Hey - how about a nice interactive, AJAX interface? Now you probably want to start sending JSON back to the browser so now {, [ etc. all need to be taken into consideration. HELP!!
So clearly, store the data as given (subject to domain constraints of course) and encode appropriate to your output at the time you need it. Your output is not the same as your data.
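As a sketch of "store the data as given and encode at output", here is one hypothetical raw value encoded three different ways for the XML, CSV and JSON outputs described above. The explicit fputcsv()/str_getcsv() separator, enclosure and empty escape arguments are passed so behaviour does not depend on version-specific defaults.

```php
<?php
// Sketch: the stored value stays raw; each output format applies its own encoding.
$raw = 'Say "hi", <Bob> & co';

// XML text node or attribute.
$xml = '<note>' . htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8') . '</note>';

// CSV: fputcsv() quotes the field and doubles the embedded quotes.
$fh = fopen('php://memory', 'r+');
fputcsv($fh, [$raw, 'second'], ',', '"', '');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

// JSON for an AJAX response.
$json = json_encode(['note' => $raw]);

echo $xml, PHP_EOL, $csv, $json, PHP_EOL;
```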
I hope this answer is not too patronising. Credit to the other respondents.
I am very confused over something and was wondering if someone could explain.
In PHP I validate user input, so htmlentities() and mysql_real_escape_string() are used before inserting into the database. Not on everything, as I prefer to use regular expressions when I can, although I find them hard to work with. Obviously I will use mysql_real_escape_string() since the data is going into the database, but I'm not sure whether I should use htmlentities() only when getting data from the database and displaying it on a webpage. Doing it beforehand alters the data a person entered, so it no longer keeps its original form, which may cause problems if I want to use that data for something else later on.
For example, I have a guestbook with three fields: name, subject and message. Obviously the fields can contain anything, including malicious code in JS tags. What confuses me is this: say I am a malicious person and I decide to submit the form with script tags and some malicious JS code. Now I have malicious, useless data in my database. Using htmlentities() when outputting it to the webpage (the guestbook) makes it harmless, because htmlentities() converts it to its safe equivalent, but at the same time I have useless malicious code in the database that I would rather not have.
So, after all this, my question is: should I accept that some data in the database may be malicious, useless data, and that as long as I use htmlentities() on output everything will be OK, or should I be doing something else as well?
I've read so many books saying to filter data on receipt and escape it on output so the original form is kept, but they only ever give examples like ensuring a field is an int using PHP's built-in functions. I have never found anything about something like a guestbook, where you want users to type anything they want: how would you filter such data, apart from mysql_real_escape_string(), to ensure it does not break the DB query?
Could someone please clear up this confusion for me once and for all and tell me what I should be doing and what the best practice is?
Thanks to anyone who can explain.
Cheers!
This is a long question, but I think what you're actually asking boils down to:
"Should I escape HTML before inserting it into my database, or when I go to display it?"
The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars) when you go to display it to the user, and not before putting it into the database.
The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string, it does not alter what is inserted into the database; it merely avoids interpreting the user's input as SQL statements. htmlspecialchars does the same thing for HTML; when you print the user's input, it will avoid having it interpreted as HTML. If you were to call htmlspecialchars before the insert, you would no longer be faithful to what the user typed.
You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?
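One concrete cost of escaping before the insert: htmlspecialchars() is not idempotent (it double-encodes by default), so text escaped on the way in gets escaped again by the output code and shows up wrong. A sketch with a hypothetical guestbook subject:

```php
<?php
// Sketch: escaping before storage, then again at output, corrupts what the reader sees.
$typed = 'Fish & Chips';

$storedEscaped = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'); // in the DB: Fish &amp; Chips
$shownOnPage = htmlspecialchars($storedEscaped, ENT_QUOTES, 'UTF-8');
echo $shownOnPage, PHP_EOL; // Fish &amp;amp; Chips (the browser displays "Fish &amp; Chips")

// Storing the raw text and escaping once at output gives the intended result.
echo htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'), PHP_EOL; // Fish &amp; Chips
```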
You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
On another note, the best way to do database escaping with PHP is probably to use PDO, rather than calling mysql_real_escape_string directly. PDO has more advanced functionality, including type checking.
mysql_real_escape_string() is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.
htmlentities() and htmlspecialchars() come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.
There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.