What is the securest way to add html/css/js to mysql? - php

I'm currently using the following PHP class to store html, css and javascript code to my mysql database.
function filter($data) {
$data = trim(htmlentities(strip_tags($data)));
if (get_magic_quotes_gpc())
$data = stripslashes($data);
$data= strip_tags($data);
$data = mysql_real_escape_string($data);
return $data;}
I' really wondering if the used code is secure enough to store HTML / CSS / JS code in a mysql database?

Yes, MySQL can store any type of text technically safely. Which means, MySQL will save the text as is and will return it again without loosing any data.
Mysql does not differ between the content of the text, so it makes no difference if it is HTML, CSS, JS code or your friends last email.
However if you output the text later on you should take care that there is no unwanted code injection after you've pulled the data from mysql. But that's not related to MySQL actually.
To make you sql more secure, pass the database handle to mysql_real_escape_string or even better use MySQLi and/or PDO and prepared statements.
Your code
Your code looks like you're trying a lot to prevent something, but in the end it turns out pretty useless:
function filter($data) {
$data = trim(htmlentities(strip_tags($data)));
if (get_magic_quotes_gpc())
$data = stripslashes($data);
$data= strip_tags($data);
$data = mysql_real_escape_string($data);
return $data;}
Normalize the data before you process it
First of all you should change the position of the check for get_magic_quotes_gpc to normalize the data the function is working on. It would be even better if your application would not rely on it but just denies working if that option is enabled - see this important information here about that if you care about security.
But for the safeness of your code posted, let's first normalize the input value to the function before processing it further. This is done by moving the check to the top of the function.
function filter($data)
{
// normalize $data because of get_magic_quotes_gpc
$dataNeedsStripSlashes = get_magic_quotes_gpc();
if ($dataNeedsStripSlashes)
{
$data = stripslashes($data);
}
// normalize $data because of whitespace on beginning and end
$data = trim($data);
// strip tags
$data = strip_tags($data);
// replace characters with their HTML entitites
$data = htmlentities($data);
// mysql escape string
$data = mysql_real_escape_string($data);
return $data;
}
In this modified function, the magic quotes stuff (which you should not use) has been moved to the top of it. This ensures that regardless of that option is on or off, data will always be processed the same. Your function did not do so, it would have created different results for the same data passed. So this has been fixed.
More Problems with your function
Even the function looks better now, it still has many problems. For example, it's unclear what the function actually does. It does many things at once and some of them are contradictory:
It removes HTML tags which is a sign that $data should not contain HTML
But then you convert the text of $data to have actually contain HTML entities.
So what should the data be? HTML or not? It does not introduce more security if things become unclear because this will benefit that errors come into your program and in the end even pass your security precautions.
So you should just throw away the code and consider the following:
If input to your application is invalid, don't filter it. Instead prevent further use of invalid input. So you need a function to validate input before you make use of it.
Don't change data just because you think this might make something more secure. Instead change and encode data where it is needed and appropriate.
Make your application only work with magic quotes off. Relying on this feature is highly discouraged. And then there is no need to check for that all over in your code.
To store something safely within the database, escape the data prior using it in the query only. Not at some other place of your application. Use Prepared statements for that.
No need to wrangle the data before you put it into the database if it's valid. But you need to properly encode it when output it to the webpage. And only there an application does know in which encoding this needs to be. You do not know that when you put the data into the database.
So if you want to make your code more secure, this is not about throwing a bunch of functions onto some data because you think those are security related. By doing so you don't make your software more secure but less secure.
Never trust user data.
Ensure data is in the format you need it prior processing.
Use the right tool for the job at the right place.
Never use tools at guess. Get knowledge instead, that pays not only security wise.

Related

Would this be a good PHP function to deter some XSS? [duplicate]

How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&amp;','&lt;','&gt;'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.

PHP and PDO Protecting script advice [duplicate]

How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&amp;','&lt;','&gt;'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.

TextArea Data Sanitizing Before Storing In MySQL

Here's the problem scenario. I have a textbox in which users can input comments. However, if they include HTML tags or <A HREF='javascript:window.alert("Example of a link that displays an alert box");'> link </A>, and when the comments are outputted onto the page from the MySQL database, they actually execute. I'm looking for a way to prevent that from happen and only allow a few HTML tags to be used (like bold, italics, underline).
I'm using this function on my comments before sending the comments from the textarea to be stored on the mysql database:
function sanitize($data)
{
// remove whitespaces (not a must though)
$data = trim($data);
// apply stripslashes if magic_quotes_gpc is enabled
if(get_magic_quotes_gpc())
{
$data = stripslashes($data);
}
// a mySQL connection is required before using this function
$data = mysql_real_escape_string($data);
return $data;
}
Well, there was no answer how to allow certain tags to remain intact.
strip_tags() function, which comes first to one's mind, is not safe at all, it will allow dangerous JS attributes.
So, you have to use some utility like HTML Purifier, or some regexp that will allow only certain tags and eliminate all others or for for BBCode.

Images not uploading when htmlentities has 'UTF-8' set

I have a form that, among other things, accepts an image for upload and sticks it in the database. Previously I had a function filtering the POSTed data that was basically:
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES);
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
When, in an effort to fix some weird entities that weren't getting converted properly I changed the function to (all that has changed is I added that 'UTF-8' bit in htmlentities):
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES, 'UTF-8'); //added UTF-8
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
And now images will not upload.
What would be causing this? Simply removing the 'UTF-8' bit allows images to upload properly but then some of the MS Word entities that users put into the system show up as gibberish. What is going on?
**EDIT: Since I cannot do much to change the code on this beast I was able to slap a bandaid on by using htmlspecialchars() rather than htmlentities() and that seems to at least leave the image data untouched while converting things like quotes, angle brackets, etc.
bobince's advice is excellent but in this case I cannot now spend the time needed to fix the messy legacy code in this project. Most stuff I deal with is object oriented and framework based but now I see first hand what people mean when they talk about "spaghetti code" in PHP.
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES);
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
This function represents a basic misunderstanding of string processing, one common to PHP programmers.
SQL-escaping, HTML-escaping and input validation are three separate functions, to be used at different stages of your script. It makes no sense to try to do them all in one go; it will only result in characters that are ‘special’ to any one of the processes getting mangled when used in the other parts of the script. You can try to tinker with this function to try to fix mangling in one part of the app, but you'll break something else.
Why are images being mangled? Well, it's not immediately clear via what path image data is going from a $_FILES temporary upload file to the database. If this function is involved at any point though, it's going to completely ruin the binary content of an image file. Backslashes removed and HTML-escaped... no image could survive that.
mysql_real_escape_string is for escaping some text for inclusion in a MySQL string literal. It should be used always-and-only when making an SQL string literal with inserted text, and not globally applied to input. Because some things that come in in the input aren't going immediately or solely to the database. For example, if you echo one of the input values to the HTML page, you'll find you get a bunch of unwanted backslashes in it when it contains characters like '. This is how you end up with pages full of runaway backslashes.
(Even then, parameterised queries are generally preferable to manual string hacking and mysql_real_escape_string. They hide the details of string escaping from you so you don't get confused by them.)
htmlentities is for escaping text for inclusion in an HTML page. It should be used always-and-only in the output templating bit of your PHP. It is inappropriate to run it globally over all your input because not everything is going to end up in an HTML page or solely in an HTML page, and most probably it's going to go to the database first where you absolutely don't want a load of < and & rubbish making your text fail to search or substring reliably.
(Even then, htmlspecialchars is generally preferable to htmlentities as it only encodes the characters that really need it. htmlentities will add needless escaping, and unless you tell it the right encoding it'll also totally mess up all your non-ASCII characters. htmlentities should almost never be used.)
As for stripslashes... well, you sometimes need to apply that to input, but only when the idiotic magic_quotes_gpc option is turned on. You certainly shouldn't apply it all the time, only when you detect magic_quotes_gpc is on. It is long deprecated and thankfully dying out, so it's probably just as good to bomb out with an error message if you detect it being turned on. Then you could chuck the whole processInput thing away.
To summarise:
At start time, do no global input processing. You can do application-specific validation here if you want, like checking a phone number is just numbers, or removing control characters from text or something, but there should be no escaping happening here.
When making an SQL query with a string literal in it, use SQL-escaping on the value as it goes into the string: $query= "SELECT * FROM t WHERE name='".mysql_real_escape_string($name)."'";. You can define a function with a shorter name to do the escaping to save some typing. Or, more readably, parameterisation.
When making HTML output with strings from the input or the database or elsewhere, use HTML-escaping, eg.: <p>Hello, <?php echo htmlspecialchars($name); ?>!</p>. Again, you can define a function with a short name to do echo htmlspecialchars to save on typing.

PHP HTML Entities

I want to display on screen data send by the user,
remembering it can contain dangerous code, it is the best to clean this data with html entities.
Is there a better way to do html entities, besides this:
$name = clean($name, 40);
$email = clean($email, 40);
$comment = clean($comment, 40);
and this:
$data = array("name", "email," "comment")
function confHtmlEnt($data)
{
return htmlentities($data, ENT_QUOTES, 'UTF-8');
}
$cleanPost = array_map('confHtmlEnt', $_POST);
if so, how, and how does my wannabe structure
for html entities look?
Thank you for not flaming the newb :-).
"Clean POST", the only problem is you might not know in what context will your data appear. I have a Chat server now that works via browser client and a desktop client and both need data in a different way. So make sure you save the data as "raw" as possible into the DB and then worry about filtering it on output.
Do not encode everything in $_POST/$_GET. HTML-escaping is an output-encoding issue, not an input-checking one.
Call htmlentities (or, usually better, htmlspecialchars) only at the point where you're taking some plain text and concatenating or echoing it into an HTML page. That applies whether the text you are using comes from a submitted parameter, or from the database, or somewhere else completely. Call mysql_real_escape_string only at the point you insert plain text into an SQL string literal.
It's tempting to shove all that escaping stuff in its own box at the top of the script and then forget about it. But text preparation really doesn't work like that, and if you pretend it does you'll find your database irreparably full of double-encoded crud, backslashes on your HTML page and security holes you didn't spot because you were taking data from a source other than the (encoded) parameters.
You can make the burden of remembering to mysql_real_escape_string go away by using mysqli's parameterised queries or another higher-level data access layer. You can make the burden of typing htmlspecialchars every time less bothersome by defining a shorter-named function for it, eg.:
<?php
function h($s) {
echo(htmlspecialchars($s, ENT_QUOTES));
}
?>
<h1> Blah blah </h1>
<p>
Blah blah <?php h($title); ?> blah.
</p>
or using a different templating engine that encodes HTML by default.
If you wish to convert the five special HTML characters to their equivalent entities, use the following method:
function filter_HTML($mixed)
{
return is_array($mixed)
? array_map('filter_HTML',$mixed)
: htmlspecialchars($mixed,ENT_QUOTES);
}
That would work for both UTF-8 or single-byte encoded string.
But if the string is UTF-8 encoded, make sure to filter out any invalid characters sequence, prior to using the filter_HTML() function:
function make_valid_UTF8($str)
{
return iconv('UTF-8','UTF-8//IGNORE',$str)
}
Also see: http://www.phpwact.org/php/i18n/charsets#character_sets_character_encoding_issues
You need to clean every element bevor displaying it. I do it usually with a function and an array like your secound example.
If you use a framework with a template engine, there is quite likely a possibility to auto-encode strings. Apart from that, what's simpler than calling a function and getting the entity-"encoded" string back?
Check out the filter libraries in php, in particular filter_input_array.
filter_input_array(INPUT_POST, FILTER_SANITIZE_SPECIAL_CHARS);

Categories