How to use htmlspecialchars for the PHP form [duplicate]

Is it really necessary to convert special characters to HTML entities with the htmlspecialchars() function during form validation and in PDO database queries?
For example, I have a website with a simple login system, roughly like this:
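$username = (string) htmlspecialchars($_POST['user']);
$password = (string) htmlspecialchars($_POST['pass']);
// "users" is a hypothetical table name
$query = $dbh->prepare("SELECT id FROM users WHERE username = ? AND password = ?");
// PDOStatement::execute() takes the bound values as a single array
$query->execute([$username, $password]);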
Note that I also use type casting in addition to the function in question. So, is it necessary, or can I safely use $username = $_POST['user'];?

Your confusion is quite common, because information and examples in books and on the internet, including php.net, are misleading or ambiguous. The most important thing you can learn when developing web apps is: filter input, escape output.
Filter Input
This means that for any data input, whether provided by a user on a form or by a file from some other source, you filter out anything that does not belong. For example, if you expect a numeric value, filter out any non-numeric characters. Another example would be to limit/ensure the maximum length of the data. However, you don't need to get too crazy with this. For example, if you expect a line of text that can contain literally any combination of characters, then trying to come up with a filter will probably only frustrate your users.
So, you would generally store input data in your database as provided, optionally with some filtering beforehand.
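A minimal sketch of "filter input" in plain PHP, assuming a form with a numeric age field and a free-text comment field (the helper names and limits are hypothetical). It checks the data with filter_var() and rejects what does not belong, rather than rewriting it.

```php
<?php
// Hypothetical limit, chosen only for illustration.
const MAX_COMMENT_BYTES = 500;

// Hypothetical helper: returns the age as an int, or null if the input
// is not a whole number between 0 and 150.
function filterAge(string $raw): ?int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    return $age === false ? null : $age;
}

// Hypothetical helper for free text: any characters are allowed,
// only the maximum length is enforced. strlen() counts bytes, not characters.
function filterComment(string $raw): ?string
{
    return strlen($raw) <= MAX_COMMENT_BYTES ? $raw : null;
}

// In a real handler these values would come from $_POST.
var_dump(filterAge('42'));    // int(42)
var_dump(filterAge('42abc')); // NULL
var_dump(filterComment('<b>Hi</b> & "bye"')); // returned unchanged
```

Note that the comment keeps its < and " characters: making them safe is the job of the output step, not of the input filter.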
Escape Output
Escaping output means properly making the data safe for a given medium. Most of the time, this medium is a web page (HTML). But it can also be plain text, XML, PDF, an image, etc. For HTML, this means using htmlspecialchars() or htmlentities() (read up on the differences between the two). For other media types, you would escape/convert as appropriate (or not at all, if appropriate).
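A short sketch of the practical difference between the two functions (the sample string is made up). htmlspecialchars() only converts the characters that are significant in HTML, while htmlentities() also converts every character that has a named entity, such as é. On a UTF-8 page, htmlspecialchars() is usually sufficient.

```php
<?php
$text = 'Café <b>"menu"</b> & more';

// Converts only & " ' < > (ENT_QUOTES covers both quote types).
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Additionally converts characters with named entities, e.g. é -> &eacute;.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Café &lt;b&gt;&quot;menu&quot;&lt;/b&gt; &amp; more
echo $entities, "\n"; // Caf&eacute; &lt;b&gt;&quot;menu&quot;&lt;/b&gt; &amp; more
```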
Now, your question is whether or not you should use htmlspecialchars() on input data that will be used as SQL query parameters. The answer is no. You should not modify the data in any way.
Yes, the data contained in $_POST should be considered dangerous. That is why you should 1) guard against SQL injection using prepared statements and bound parameters, as you are doing, and 2) properly escape/convert data found in $_POST if you place it in HTML.
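A minimal sketch of point 2), assuming the value has already been stored and fetched unmodified through a prepared statement (the database step is left out so the sketch runs on its own). The helper name e() is hypothetical; it simply wraps htmlspecialchars() and is called only where the value is placed into HTML.

```php
<?php
// Hypothetical helper: escape plain text for an HTML body or a quoted attribute.
function e(string $value): string
{
    // ENT_QUOTES also escapes ' so the result is safe in single- or double-quoted
    // attributes; ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning ''.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stands in for a value read back from the database exactly as the user typed it.
$username = "O'Reilly <script>alert(1)</script>";

// Escaping happens at the moment of output, once per HTML context.
$html = '<p title="' . e($username) . '">Hello, ' . e($username) . '</p>';
echo $html, "\n";
```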
There are many frameworks for PHP which handle these details for you and I recommend you pick and use one. However, if you do not, you can still build a safe and secure application. Whether you use a framework or not, I strongly suggest that you read the recommendations suggested by OWASP. Failure to do so will only result in a security nightmare for your web application.

You should use htmlspecialchars when you have some plain text (such as user input, user input that you previously stored in a database and just retrieved with a SELECT, text fetched via HTTP from a third party, etc.) and you want to insert it into an HTML document. This protects you against XSS.
In general, you should not use it when inserting data into a database (a database is not an HTML document). You might want to use the data in some non-HTML form later.

Related

Best practice when sanitizing HTML form user input in PHP / CodeIgniter 4

I have a simple app programmed in PHP using CodeIgniter 4 framework and, as a web application, it has some HTML forms for user input.
I am doing two things:
In my Views, all variables from the database that come from user input are sanitized using CodeIgniter 4's esc() function.
In my Controllers, when reading HTTP POST data, I am using PHP filters:
$data = trim($this->request->getPost('field', FILTER_SANITIZE_SPECIAL_CHARS));
I am not sure if sanitizing both when reading data from POST and when printing/displaying to HTML is a good practice or if it should only be sanitized once.
In addition, FILTER_SANITIZE_SPECIAL_CHARS is not working as I need. I want my HTML form text input to prevent users from attacking with HTML, but I want to keep some line breaks that my database has from the previous application.
FILTER_SANITIZE_SPECIAL_CHARS does NOT delete HTML tags; it stores them in the database encoded rather than as HTML, but it is also changing my line breaks. Is there a filter that doesn't remove HTML tags (only stores them with proper encoding) but respects \n line breaks?
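The line-break behavior in the question can be reproduced with plain filter_var(), assuming CodeIgniter's getPost() applies the same PHP filter when one is passed. FILTER_SANITIZE_SPECIAL_CHARS encodes every byte below ASCII 32, which includes \n. For comparison, FILTER_SANITIZE_FULL_SPECIAL_CHARS (documented as equivalent to htmlspecialchars() with ENT_QUOTES) leaves newlines alone, although the answers below recommend not encoding on input at all.

```php
<?php
$input = "First line\nSecond <b>line</b>";

// Encodes '"<>& and every byte below ASCII 32, so the newline (ASCII 10)
// is turned into the entity &#10; instead of staying a real line break.
$special = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);

// Encodes the HTML-significant characters but keeps control characters such as \n.
$full = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

var_dump(strpos($special, "\n")); // bool(false): the line break is gone
var_dump(strpos($full, "\n"));    // int(10): the line break survives
```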
You don't need to sanitize user input data, as explained in the question below:
How can I sanitize user input with PHP?
It's a common misconception that user input can be filtered. PHP even has a (now deprecated) "feature", called magic-quotes, that builds on this idea. It's nonsense. Forget about filtering (or cleaning, or whatever people call it).
In addition, you don't need to use FILTER_SANITIZE_SPECIAL_CHARS, htmlspecialchars(...), htmlentities(...), or esc(...) either for most use cases:
Comment from OP (user1314836):
I definitely think that I don't need to sanitize user-input data because I am not writing SQL directly but rather using CodeIgniter 4's functions to create SQL safe queries. On the other hand, I do definitely need to esc() that same information when showing to avoid showing html where just text is expected.
The reason why you don't need the esc() method for most use cases is:
Most User form input in an application doesn't expect a User to submit/post HTML, CSS, or JavaScript that you plan on displaying/running later on.
If the expected User input is just plain text (username, age, birth date, etc), images, or files, use form validation instead to disallow unexpected data.
See the CodeIgniter 4 user guide sections "Available Rules" and "Creating Custom Rules".
By using the Query Builder for your database queries and rejecting unexpected user input data with validation rules (alpha, alpha_numeric_punct, numeric, exact_length, min_length[8], valid_date, regex_match[/regex/], uploaded, etc.), you can avoid most potential security holes, e.g. SQL injection and XSS attacks.
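To show the "reject rather than rewrite" idea outside the framework, here is a sketch with plain-PHP stand-ins that mimic the intent of the alpha_numeric and min_length[8] rules. The function name is hypothetical and this is not CodeIgniter's own code; in CodeIgniter 4 you would express the same check as a rule string such as 'required|alpha_numeric|min_length[8]'.

```php
<?php
// Hypothetical stand-in for "alpha_numeric|min_length[8]".
// Returns a list of error messages; an empty list means the value is accepted unchanged.
function validateUsername(string $value): array
{
    $errors = [];
    // \z (not $) so that a trailing newline is not silently accepted.
    if (!preg_match('/^[A-Za-z0-9]+\z/', $value)) {
        $errors[] = 'The username may only contain letters and digits.';
    }
    if (strlen($value) < 8) {
        $errors[] = 'The username must be at least 8 characters long.';
    }
    return $errors;
}

print_r(validateUsername('alice2024')); // no errors
print_r(validateUsername('<b>'));       // both rules fail
```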
Answer from steven7mwesigwa gets my vote, but here is how you should be thinking about it.
Rules Summary
You should always hold in memory the actual data that you want to process.
You should always convert the data on output into a format that the output can process.
Inputs:
You should strip from all untrusted inputs (user forms, databases that you didn't write to, XML feeds that you don't control etc)
any data that you are unable to process (e.g. if you are not able to handle multi-byte strings as you are not using the right functions, or your DB won't support it, or you can't handle UTF8/16 etc, strip those extra characters you can't handle).
any data that will never form part of the process or output (e.g. if you can only have an integer/bool, then convert to int/bool; if you are only showing data on an HTML page, then you may as well trim spaces; if you want a date, strip anything that can't be formatted as a date [or reject*]).
This means that many "traditional" cleaning functions are not needed (e.g. Magic Quotes, strip_tags and so on), but you need to know that your code can handle the data. You should only strip_tags, escape, and so on if you know it is pointless having that data in that field.
Note: For user input I prefer to hold the data as the user entered it and reject the form, allowing them to try again. E.g. if I'm expecting a number and I get "hello", I'll reload the form with "hello" and tell the user to try again. steven7mwesigwa has links to the validation functions in CI that make that happen.
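A sketch of that reject-and-redisplay flow for a date field, in plain PHP. The function name and the field name birth_date are hypothetical. The raw input is never modified: it is only checked, and when it is redisplayed it is escaped at output time.

```php
<?php
// Hypothetical validator: accept only a real calendar date in Y-m-d form.
function validateDate(string $raw): ?string
{
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $raw);
    // Also reject dates that createFromFormat() rolls over, e.g. 2023-02-30 -> 2023-03-02.
    if ($date === false || $date->format('Y-m-d') !== $raw) {
        return 'Please enter a date as YYYY-MM-DD.';
    }
    return null; // no error
}

$raw = 'hello'; // would come from $_POST in a real form handler
$error = validateDate($raw);
if ($error !== null) {
    // Redisplay exactly what the user typed, escaped only for HTML output.
    echo '<input name="birth_date" value="', htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'), '">', "\n";
    echo '<p>', htmlspecialchars($error, ENT_QUOTES, 'UTF-8'), '</p>', "\n";
}
```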
Outputs:
Choose the correct conversion for the output: and don't get them muddled up.
htmlspecialchars (or family) for outputting to HTML or XML; although this is usually handled by any templating engine you use.
Escaping for DB input; although this should be left to the DB engine you use (e.g. parameterised queries, query builder etc).
urlencode for outputting a URL
whatever conversion is required for saving images, JSON, API responses, etc.
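As a sketch of choosing the conversion per output, here is one made-up value prepared for three different destinations. The URL is a placeholder; json_encode() handles its own quoting, so the JSON case needs no extra escaping.

```php
<?php
$value = 'Tom & Jerry "quoted" <b>';

// HTML body or quoted attribute
$forHtml = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// A query-string value; urlencode() turns spaces into "+", as HTML forms do.
$forUrl = 'https://example.com/search?q=' . urlencode($value);

// A JSON API response
$forJson = json_encode(['name' => $value]);

echo $forHtml, "\n"; // Tom &amp; Jerry &quot;quoted&quot; &lt;b&gt;
echo $forUrl, "\n";  // https://example.com/search?q=Tom+%26+Jerry+%22quoted%22+%3Cb%3E
echo $forJson, "\n"; // {"name":"Tom & Jerry \"quoted\" <b>"}
```

Each conversion is applied to the original value, never to the output of another conversion.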
Why?
If you do output conversion on input, then you can easily double-convert an input, lose track of whether you need to make it safe before output, or lose data the user wanted to enter. Mistakes happen, but following clean rules will prevent them.
This also means there is no need to reject special characters (forms that reject quote marks are a horrible user experience, for example, and anyone restricting which characters can go in a password field is only weakening security).
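The double-conversion problem mentioned under "Why?" can be seen in a few lines (the sample text is made up): escaping on input and then, correctly, again on output leaves the user looking at a literal entity.

```php
<?php
$typed = 'Fish & Chips';

// Wrong: escaped on input before storage...
$stored = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'); // "Fish &amp; Chips"
// ...and escaped again on output, as output should be.
$shown = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'); // "Fish &amp;amp; Chips"
// The browser renders $shown as the visible text "Fish &amp; Chips".

// Right: store $typed as-is and escape exactly once, at output.
$correct = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'); // rendered as "Fish & Chips"
```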
In your particular case:
Drop the FILTER_SANITIZE_SPECIAL_CHARS on input, hold the data as the user gave it to you
Output using the template engine as you have it: this will display < > tags as the user entered them, but won't break your output.
You will essentially sanitize each and every output (that you appear to want to avoid), but that's safer than accidentally missing a sanitize on output and a better user experience than losing stuff they typed.
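For the line-break part of the question, a sketch of the output side in plain PHP: keep the text raw, escape it, then convert the preserved newlines with nl2br(). In a CodeIgniter 4 view, esc() would presumably take the place of htmlspecialchars() here; that substitution is an assumption, not something shown in the thread. The order matters: escaping after nl2br() would turn the inserted <br /> tags into visible text.

```php
<?php
// Held exactly as the user typed it (and as stored in the database).
$comment = "Line one\nLine two with <b>tags</b>";

// Escape first, then insert <br /> before each preserved newline.
$html = nl2br(htmlspecialchars($comment, ENT_QUOTES, 'UTF-8'));

echo $html, "\n";
// Line one<br />
// Line two with &lt;b&gt;tags&lt;/b&gt;
```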
From my understanding,
FILTER_SANITIZE_SPECIAL_CHARS is used to sanitize the user input before you act on it or store it.
Whereas esc is used to escape HTML etc in the string so they don't interfere with normal html, css etc. It is used for viewing the data.
So, you need both, one for input and the other for output.
The following is from codeigniter.com. Note that it uses the Laminas Escaper library.
esc($data[, $context = 'html'[, $encoding]])
Parameters:
$data (string|array) – The information to be escaped.
$context (string) – The escaping context. Default is 'html'.
$encoding (string) – The character encoding of the string.
Returns: The escaped data.
Return type: mixed
Escapes data for inclusion in web pages, to help prevent XSS attacks. This uses the Laminas Escaper library to handle the actual filtering of the data.
If $data is a string, then it simply escapes and returns it. If $data is an array, then it loops over it, escaping each ‘value’ of the key/value pairs.
Valid context values: html, js, css, url, attr, raw
From docs.laminas.dev
What laminas-Escaper is not
laminas-escaper is meant to be used only for escaping data for output, and as such should not be misused for filtering input data. For such tasks, laminas-filter, HTMLPurifier or PHP's Filter functionality should be used.
Some of what they do is similar; for example, both will convert < to an entity (&#60; for FILTER_SANITIZE_SPECIAL_CHARS, &lt; for esc()). However, your stored data may not have come just from user input, and it may have < in it. It is perfectly safe to store it this way, but it needs to be escaped for output; otherwise the browser could get confused, thinking it's HTML.
I think for this situation using esc is sufficient. FILTER_SANITIZE_SPECIAL_CHARS is a PHP sanitize filter that HTML-encodes '"<>& and characters with an ASCII value below 32, and optionally strips or encodes other special characters according to the flag. To do that, you need to set the flag, which is the third parameter of the getPost() method. Here is an example:
$this->request->getPost('field', FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_ENCODE_HIGH)
This flag can be changed according to your requirements. You can use any PHP filter with a flag; please refer to the PHP documentation for more info.
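A sketch of what FILTER_FLAG_ENCODE_HIGH does, using filter_var() directly on the assumption that getPost() passes the flag through to PHP's filter functions, as the answer describes. The flag encodes bytes, not characters, so a UTF-8 é becomes two numeric entities (which a browser would show as "Ã©"), and the newline is still encoded as &#10;, so the flag alone does not address the line-break issue.

```php
<?php
$raw = "Café\n<b>";

$plain = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);
$high  = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_ENCODE_HIGH);

echo $plain, "\n"; // Café&#10;&#60;b&#62;
echo $high, "\n";  // Caf&#195;&#169;&#10;&#60;b&#62;  (é is the two UTF-8 bytes 195, 169)
```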

Ampersand in String with PHP, MySql, HTML [duplicate]

Preserving all characters in textarea & writing to MySQL [duplicate]

Using htmlspecialchars function with PDO prepare and execute

Security measures - when and how

Currently I am upgrading a web application in which I will get most of the input from logged in users. The input will contains valid html, images, audio, video & upload facilities to user defined path. The application then formats it into nice ui and displays to end users. These privileged users can add / modify / delete the content using a web based interface.
As per the basic rule of thumb, I should escape my data before entering it in the DB and not trust data received from the user. To achieve that, I plan to follow the security measures below, which also include my questions:
1. I am using prepared statements to store all user input in the DB. I hope this eliminates the SQL injection threat.
Is this measure enough, or do I need to check for the % and _ symbols as well for MySQL LIKE queries?
2. For user input where I am not expecting any HTML/CSS (let's call it input A), I use strip_tags and htmlentities before inserting it in the DB.
Is this an adequate measure? Should I be doing more?
3. For user input in which the user can include HTML/CSS tags (let's call it input B), I use htmlentities on the text and then insert it in the DB.
As far as I am aware, I should not use htmlentities before inserting in the DB, but I have to because the previous programmer was using it. Are there any negative impacts of this?
4. After fetching input A / input B from the DB and before displaying it, I am not doing any preprocessing, assuming the data added to the DB is clean.
Should I process/sanitize the data before displaying it? If yes, then how?
5. I want HTML tags entered by the user to be parsed by the browser, not displayed as text. E.g. if the user entered <p style='color:red;'>hello</p><p class='noclass'>world</p>, I want the user to see only the two words, not the actual markup.
To achieve this, how can I make sure that the user doesn't add a malicious script, while the HTML tags are still stored, fetched and parsed by the browser correctly?
Please guide if the current approach is sufficient / not sufficient / less / incorrect.
I am neither a complete newbie to PHP nor a pro. I know the basics of PHP (or, say, overall web application) security. So can someone please tell me if I am making any mistakes security-wise, should not be doing something, or should be doing something more or less?
I know the basics of security, but I still get confused over:
Which exact security measure to apply at which exact point? (e.g. escape strings BEFORE inserting into the DB)
At every point, which PHP functions are available? (e.g. to escape strings, use prepared statements)
1. Yes, prepared statements are great at preventing SQL injection problems. Yes, you will have to take care of % and _ in LIKE queries; a prepared statement cannot escape them, since it has no way to know whether you want those wildcards there or not.
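A minimal sketch of handling % and _ for a LIKE search, assuming MySQL's default LIKE escape character (backslash); with other settings or databases, an explicit ESCAPE clause would be needed. The helper name and the products table in the comment are hypothetical. The escaped pattern is still bound as a parameter, so the prepared statement keeps protecting against injection.

```php
<?php
// Hypothetical helper: make %, _ and the escape character itself match literally.
function escapeLike(string $term): string
{
    return addcslashes($term, '%_\\');
}

$term = '50%_off'; // user input, searched for literally
$pattern = '%' . escapeLike($term) . '%';

// The pattern would then be bound as usual, e.g. with PDO:
//   $stmt = $dbh->prepare('SELECT id FROM products WHERE name LIKE ?');
//   $stmt->execute([$pattern]);
echo $pattern, "\n"; // %50\%\_off%
```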
2. through 5.: It's always a bad idea to escape data going into the database for the format it's destined for on output. Why? First of all, why are you so sure you're always going to use the data in an HTML context? Maybe you'll be using it in a different format in the future, and then you'll have garbage-looking data. (This is more hypothetical in your case, as you're explicitly storing HTML.)
Secondly though, your output code will have to rely on your input code to correctly have escaped data in advance, possibly with a long time between input and output. Your output code can have no confidence whatsoever that the input code did the correct job for what the output code needs it to do. Therefore, escaping for output must happen at the time of output. No sooner, no later.
Thirdly (is that a word?), strip_tags is absolutely insufficient to accept some HTML but not other "insecure" HTML. You need a more complex library which has more complex whitelisting rules than what strip_tags can do. Supposedly the only library that does that is HTML Purifier. I'd run all user HTML through it.
To summarise:
Prepared statements.
HTML-escape data that is not supposed to contain literal HTML on output.
Run any data that is supposed to contain literal HTML through HTML Purifier. Whether you do this before or after inserting to the database is up to you, depending on whether you want to store the literal input the user sent you or whether you don't mind discarding that original data immediately and storing only sanitised data instead. But, the same caveat about having confidence in your output code applies too.
