Images not uploading when htmlentities has 'UTF-8' set - php

I have a form that, among other things, accepts an image for upload and sticks it in the database. Previously I had a function filtering the POSTed data that was basically:
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES);
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
When, in an effort to fix some weird entities that weren't getting converted properly I changed the function to (all that has changed is I added that 'UTF-8' bit in htmlentities):
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES, 'UTF-8'); //added UTF-8
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
And now images will not upload.
What would be causing this? Simply removing the 'UTF-8' bit allows images to upload properly but then some of the MS Word entities that users put into the system show up as gibberish. What is going on?
**EDIT: Since I cannot do much to change the code on this beast I was able to slap a bandaid on by using htmlspecialchars() rather than htmlentities() and that seems to at least leave the image data untouched while converting things like quotes, angle brackets, etc.
bobince's advice is excellent but in this case I cannot now spend the time needed to fix the messy legacy code in this project. Most stuff I deal with is object oriented and framework based but now I see first hand what people mean when they talk about "spaghetti code" in PHP.

function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES);
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
This function represents a basic misunderstanding of string processing, one common to PHP programmers.
SQL-escaping, HTML-escaping and input validation are three separate functions, to be used at different stages of your script. It makes no sense to try to do them all in one go; it will only result in characters that are ‘special’ to any one of the processes getting mangled when used in the other parts of the script. You can try to tinker with this function to try to fix mangling in one part of the app, but you'll break something else.
Why are images being mangled? Well, it's not immediately clear via what path image data is going from a $_FILES temporary upload file to the database. If this function is involved at any point though, it's going to completely ruin the binary content of an image file. Backslashes removed and HTML-escaped... no image could survive that.
mysql_real_escape_string is for escaping some text for inclusion in a MySQL string literal. It should be used always-and-only when making an SQL string literal with inserted text, and not globally applied to input. Because some things that come in in the input aren't going immediately or solely to the database. For example, if you echo one of the input values to the HTML page, you'll find you get a bunch of unwanted backslashes in it when it contains characters like '. This is how you end up with pages full of runaway backslashes.
(Even then, parameterised queries are generally preferable to manual string hacking and mysql_real_escape_string. They hide the details of string escaping from you so you don't get confused by them.)
htmlentities is for escaping text for inclusion in an HTML page. It should be used always-and-only in the output templating bit of your PHP. It is inappropriate to run it globally over all your input because not everything is going to end up in an HTML page or solely in an HTML page, and most probably it's going to go to the database first where you absolutely don't want a load of < and & rubbish making your text fail to search or substring reliably.
(Even then, htmlspecialchars is generally preferable to htmlentities as it only encodes the characters that really need it. htmlentities will add needless escaping, and unless you tell it the right encoding it'll also totally mess up all your non-ASCII characters. htmlentities should almost never be used.)
As for stripslashes... well, you sometimes need to apply that to input, but only when the idiotic magic_quotes_gpc option is turned on. You certainly shouldn't apply it all the time, only when you detect magic_quotes_gpc is on. It is long deprecated and thankfully dying out, so it's probably just as good to bomb out with an error message if you detect it being turned on. Then you could chuck the whole processInput thing away.
To summarise:
At start time, do no global input processing. You can do application-specific validation here if you want, like checking a phone number is just numbers, or removing control characters from text or something, but there should be no escaping happening here.
When making an SQL query with a string literal in it, use SQL-escaping on the value as it goes into the string: $query= "SELECT * FROM t WHERE name='".mysql_real_escape_string($name)."'";. You can define a function with a shorter name to do the escaping to save some typing. Or, more readably, parameterisation.
When making HTML output with strings from the input or the database or elsewhere, use HTML-escaping, eg.: <p>Hello, <?php echo htmlspecialchars($name); ?>!</p>. Again, you can define a function with a short name to do echo htmlspecialchars to save on typing.

Related

how to decode php base64_decode

I'm a newbie starting to learn from source code. I bought a source code on the internet with full source code switching but it turns out there is a part that is hidden. How to do decrypt/decode for lines like this:
<?php
$keystroke1 = base64_decode("d2RyMTU5c3E0YXllejd4Y2duZl90djhubHVrNmpoYmlvMzJtcA==");
eval(gzinflate(base64_decode('hY5NCsIwEIWv8ixdZDCKWZcuPUfRdqrBmsBkAkrp3aVIi3Tj9v1+vje7PodWfQwNv3zSZAqJyqGNHRdE4+JiVU2ZVHy42fLyjDkoYUT54DdqpHxNKmsAJwtHFXxvksrAYXGort1cE9YsAe1dTJTOzCuEPZbhChN4SPw/iePMd/7ybSmcxeb+4Mj+vkzTBw==')));
$O0O0O0O0O0O0=$keystroke1[2].$keystroke1[32].$keystroke1[20].$keystroke1[11].$keystroke1[23].$keystroke1[15].$keystroke1[32].$keystroke1[1].$keystroke1[11];
$keystroke2 = $O0O0O0O0O0O0("xes26:tr5bzf{8ydhog`uw9omvl7kicjp43nq", -1);
$OO000OO000OO=$keystroke2[16].$keystroke2[12].$keystroke2[31].$keystroke2[23].$keystroke2[18].$keystroke2[24].$keystroke2[9].$keystroke2[20].$keystroke2[11];
$O0000000000O=$keystroke1[30].$keystroke1[9].$keystroke1[6].$keystroke1[11].$keystroke1[27].$keystroke1[8].$keystroke1[19].$keystroke1[1].$keystroke1[11].$keystroke1[15].$keystroke1[32].$keystroke1[1].$keystroke1[11];
eval($OO000OO000OO(base64_decode('LcTLsm
tKAADQn7lVZ+8yoBtB3ZH3OyEEMbnl0SLxTJrQvv
5M7hos9C36n38uF4Zh/u+nLDA6cf/VqJpq9PPHq2
IHD+dQlrVwpIa3BPicV2atbjLVsx+to7il1297dn
c+9PeDJGOoGn0MJUJnSqiJwrGcK5/bG2iiJtUoOk
3GKbHYjjzd5yLu3q2dPpWSFjDVTKWSS6MFsF6MU5
dsbJn7qHRxhGo0MNuluk29F3iwyAx/cYO+OfPWi1
ECDkWG1NsMLuAcM3F98vtMsubbvQjf1ZpVMUP5Eh
puFNzCi/CYkoM1VgsAetzjpvEe1M2AlX4YFjQZF0
A0VBRQKS0B5mcI7na2N/nER993+qocgmh9WawUrU
YhBMUiPNpuXNQy2o7VxHvhyO3nZkcWTmQu5kV1C2
ECbZiH8XsL4QuYbf7lI4SF1gDM/vVqRz4qyj7a8b
qS1nXP79731t4O0qcDaqN97BHDzlPwTEF6H7p9a3
Zu1Ut6X5GNTgZhWe3dHa+6yzJ58MX1Pc8mwAWK4v
EVLjGolQQLieOvkn4jD4d0FMQuLYvXhaxbzJyLR2
OHDKhMu2EwHthDt+I7YwOvVUydwEnCigk/n4iQei
SzwWNKicdunzmrVoOWl9gt8lhK+WzNpbPqkHEK7i
xBHT84UAbkHpity8i9eLUUulASI5d7cfpGWF6I4l
7tYBeJmYzXycA3FbbrSb+yNgd8XM5u7wU0mL8tVP
hJ2J/nu2QLr/OgzZrmp7xvKmpZCgHU7w0RlS1PT9
4JvxXtekif9dDGvBxSQjcwj2i32C7Abbcosvey5I
iq2hW7mjn/lUS6OUQ64Kw/v7+///4F')));
?>
is code like this dangerous?
You are looking at a piece of obfuscated code. I will explain it line by line, but first let's go over the functions that are used:
base64_decode()
This function decodes a base64 encoded string. It's used here to unscramble intentionally scrambled code.
gzinflate()
This function decompresses a compressed string. It's used the same way as base64_decode().
eval()
This function executes a string as code. Its use is discouraged and is in itself a bit of a red flag, though it has legitimate uses.
$keystroke1 = base64_decode("d2RyMTU5c3E0YXllejd4Y2duZl90djhubHVrNmpoYmlvMzJtcA==");
This line creates an apparently random string of characters: wdr159sq4ayez7xcgnf_tv8nluk6jhbio32mp
This string is saved to a variable, $keystroke1. The string itself is not important, other than that it contains some letters that are used later.
eval(gzinflate(base64_decode('hY5NCsIwEIWv8ixdZDCKWZcuPUfRdqrBmsBkAkrp3aVIi3Tj9v1+vje7PodWfQwNv3zSZAqJyqGNHRdE4+JiVU2ZVHy42fLyjDkoYUT54DdqpHxNKmsAJwtHFXxvksrAYXGort1cE9YsAe1dTJTOzCuEPZbhChN4SPw/iePMd/7ybSmcxeb+4Mj+vkzTBw==')));
This line unscrambles a doubly scrambled string and then runs this resulting code:
if(!function_exists("rotencode")){function rotencode($string,$amount) { $key = substr($string, 0, 1); if(strlen($string)==1) { return chr(ord($key) + $amount); } else { return chr(ord($key) + $amount) . rotEncode(substr($string, 1, strlen($string)-1), $amount); }}}
This creates a new function called rotencode(), which is yet another way of unscrambling strings.
$O0O0O0O0O0O0=$keystroke1[2].$keystroke1[32].$keystroke1[20].$keystroke1[11].$keystroke1[23].$keystroke1[15].$keystroke1[32].$keystroke1[1].$keystroke1[11];
This line takes specific characters from that random string from earlier to create the word "rotencode" as a string, stored in the variable named $O0O0O0O0O0O0.
$keystroke2 = $O0O0O0O0O0O0("xes26:tr5bzf{8ydhog`uw9omvl7kicjp43nq", -1);
This line uses the rotencode() function to unscramble yet another string (actually exactly the same string as before, for some reason).
$OO000OO000OO=$keystroke2[16].$keystroke2[12].$keystroke2[31].$keystroke2[23].$keystroke2[18].$keystroke2[24].$keystroke2[9].$keystroke2[20].$keystroke2[11];
$O0000000000O=$keystroke1[30].$keystroke1[9].$keystroke1[6].$keystroke1[11].$keystroke1[27].$keystroke1[8].$keystroke1[19].$keystroke1[1].$keystroke1[11].$keystroke1[15].$keystroke1[32].$keystroke1[1].$keystroke1[11];
On these lines the two (identical but separate) random strings are used to create the words gzinflate and base64_decode. This is done so the coder can use these functions without it being apparent that that's what is happening. However, base64_decode() is never used this way in the snippet you posted. That might suggest that it is used later in the code in places you haven't seen or recognized yet. Searching your code for "$O0000000000O" might yield other uses.
eval($OO000OO000OO(base64_decode('LcTLsmtKAADQn7lVZ+8yoBtB3ZH3OyEEMbnl0SLxTJrQvv5M7hos9C36n38uF4Zh/u+nLDA6cf/VqJpq9PPHq2IHD+dQlrVwpIa3BPicV2atbjLVsx+to7il1297dnc+9PeDJGOoGn0MJUJnSqiJwrGcK5/bG2iiJtUoOk3GKbHYjjzd5yLu3q2dPpWSFjDVTKWSS6MFsF6MU5dsbJn7qHRxhGo0MNuluk29F3iwyAx/cYO+OfPWi1ECDkWG1NsMLuAcM3F98vtMsubbvQjf1ZpVMUP5EhpuFNzCi/CYkoM1VgsAetzjpvEe1M2AlX4YFjQZF0A0VBRQKS0B5mcI7na2N/nER993+qocgmh9WawUrUYhBMUiPNpuXNQy2o7VxHvhyO3nZkcWTmQu5kV1C2ECbZiH8XsL4QuYbf7lI4SF1gDM/vVqRz4qyj7a8bqS1nXP79731t4O0qcDaqN97BHDzlPwTEF6H7p9a3Zu1Ut6X5GNTgZhWe3dHa+6yzJ58MX1Pc8mwAWK4vEVLjGolQQLieOvkn4jD4d0FMQuLYvXhaxbzJyLR2OHDKhMu2EwHthDt+I7YwOvVUydwEnCigk/n4iQeiSzwWNKicdunzmrVoOWl9gt8lhK+WzNpbPqkHEK7ixBHT84UAbkHpity8i9eLUUulASI5d7cfpGWF6I4l7tYBeJmYzXycA3FbbrSb+yNgd8XM5u7wU0mL8tVPhJ2J/nu2QLr/OgzZrmp7xvKmpZCgHU7w0RlS1PT94JvxXtekif9dDGvBxSQjcwj2i32C7Abbcosvey5Iiq2hW7mjn/lUS6OUQ64Kw/v7+///4F')));
This is where it all comes together. This line unscrambles a line of code which has been compressed and encoded 10 times over. The final result is this:
$cnk = array('localhost');
That's it. It sets the string "localhost" as the sole element of an array and saves it in a variable named $cnk.
In and of itself, there's nothing hazardous about running this code, but noting the lengths that the coder went to in order to hide this line, it's probably a safe bet that it wasn't placed there to help you - the buyer - in any way. Search your code for the $cnk variable if you want to know exactly what's being done. Or better yet, chalk this experience down to a loss and find a better way to learn coding. There are plenty of books, video tutorials and free resources online. Do not place your trust in whoever sold you this code. While they may not have been malicious (people suggested in comments that this might be part of a license check), anyone who includes something like this in their code is not someone you should be learning from.
Good luck on your coding journey!

What output should be sanitized?

Security is a big concern for me. The last version of my site had little to no security and we experienced a lot of issues so this time around i am looking to have things as secure as possible.
All user input goes through a function that strips tags( allowing <p><b> and trims the string. It is then uploaded to the site using a pdo prepared statement.
I am now going back to output everything and I am wondering what exactly needs to be sanitized. I have a lot of queries that fetch integers ( epoch time,and id's etc. ) should they be sanitized to avoid any issues?
I also have specific sections that use <P> and <b> tags from user input like about me on the user profile etc. Is my Output_paragraph function incorrect?
// Output Sanitation
function output($input) {
$output = htmlspecialchars($input);
return $output;
}
// Paragraph Output Sanitation
function output_paragraph($input) {
$output = htmlspecialchars($input);
$output = htmlspecialchars_decode($output);
return $output;
}
Generally speaking, output should be sanitized when it can't be trusted. In 99% of all cases, this means user input. So raw echoing user input, for instance, is an invitation for someone to use your site to to load, say, an attack file and trick some user into clicking a link to your site that loads said data.
htmlspecialchars is a good start, but you need to make sure you're not trusting user entered data. That means you define what you expect, and then match it to what you can output. Maybe you need strip_tags too. Maybe you need a regex. There's no real "one size fits all" answer to security.

A PHP Function that verify code language

I have a form with 2 textareas; the first one allows user to send HTML Code, the second allows to send CSS Code. I have to verify with a PHP function, if the language is correct.
If the language is correct, for security, i have to check that there is not PHP code or SQL Injection or whatever.
What do you think ? Is there a way to do that ?
Where can I find this kind of function ?
Is "HTML Purifier" http://htmlpurifier.org/ a good solution ?
If you have to validate the date to insert them in to database - then you just have to use mysql_real_escape_string() function before inserting them in to db.
//Safe database insertion
mysql_query("INSERT INTO table(column) VALUES(".mysql_real_escape_string($_POST['field']).")");
If you want to output the data to the end user as plain text - then you have to escape all html sensitive chars by htmlspecialchars(). If you want to output it as HTML, the you have to use HTML Purify tool.
//Safe plain text output
echo htmlspecialchars($data, ENT_QUOTES);
//Safe HTML output
$data = purifyHtml($data); //Or how it is spiecified in the purifier documentation
echo $data; //Safe html output
for something primitive you can use regex, BUT it should be noted using a parser to fully-exhaust all possibilities is recommended.
/(<\?(?:php)?(.*)\?>)/i
Example: http://regexr.com?2t3e5 (change the < in the expression back to a < and it will work (for some reason rexepr changes it to html formatting))
EDIT
/(<\?(?:php)?(.*)(?:\?>|$))/i
That's probably better so they can't place php at the end of the document (as PHP doesn't actually require a terminating character)
SHJS syntax highlighter for Javascript have files with regular expressions http://shjs.sourceforge.net/lang/ for languages that highlights — You can check how SHJS parse code.
HTMLPurifier is the recommended tool for cleaning up HTML. And as luck has it, it also incudes CSSTidy and can sanitize CSS as well.
... that there is not PHP code or SQL Injection or whatever.
You are basing your question on a wrong premise. While HTML can be cleaned, this is no safeguard against other exploitabilies. PHP "tags" are most likely to be filtered out. If you are doing something other weird (include-ing or eval-ing the content partially), that's no real help.
And SQL exploits can only be prevented by meticously using the proper database escape functions. There is no magic solution to that.
Yes. htmlpurifier is a good tool to remove malicious scripts and validate your HTML. Don't think it does CSS though. Apparently it works with CSS too. Thanks Briedis.
Ok thanks you all.
actually, i realize that I needed a human validation. Users can post HTML + CSS, I can verify in PHP that the langage & the syntax are correct, but it doesn't avoid people to post iframe, html redirection, or big black div that take all the screen.
:-)

PHP HTML Entities

I want to display on screen data send by the user,
remembering it can contain dangerous code, it is the best to clean this data with html entities.
Is there a better way to do html entities, besides this:
$name = clean($name, 40);
$email = clean($email, 40);
$comment = clean($comment, 40);
and this:
$data = array("name", "email," "comment")
function confHtmlEnt($data)
{
return htmlentities($data, ENT_QUOTES, 'UTF-8');
}
$cleanPost = array_map('confHtmlEnt', $_POST);
if so, how, and how does my wannabe structure
for html entities look?
Thank you for not flaming the newb :-).
"Clean POST", the only problem is you might not know in what context will your data appear. I have a Chat server now that works via browser client and a desktop client and both need data in a different way. So make sure you save the data as "raw" as possible into the DB and then worry about filtering it on output.
Do not encode everything in $_POST/$_GET. HTML-escaping is an output-encoding issue, not an input-checking one.
Call htmlentities (or, usually better, htmlspecialchars) only at the point where you're taking some plain text and concatenating or echoing it into an HTML page. That applies whether the text you are using comes from a submitted parameter, or from the database, or somewhere else completely. Call mysql_real_escape_string only at the point you insert plain text into an SQL string literal.
It's tempting to shove all that escaping stuff in its own box at the top of the script and then forget about it. But text preparation really doesn't work like that, and if you pretend it does you'll find your database irreparably full of double-encoded crud, backslashes on your HTML page and security holes you didn't spot because you were taking data from a source other than the (encoded) parameters.
You can make the burden of remembering to mysql_real_escape_string go away by using mysqli's parameterised queries or another higher-level data access layer. You can make the burden of typing htmlspecialchars every time less bothersome by defining a shorter-named function for it, eg.:
<?php
function h($s) {
echo(htmlspecialchars($s, ENT_QUOTES));
}
?>
<h1> Blah blah </h1>
<p>
Blah blah <?php h($title); ?> blah.
</p>
or using a different templating engine that encodes HTML by default.
If you wish to convert the five special HTML characters to their equivalent entities, use the following method:
function filter_HTML($mixed)
{
return is_array($mixed)
? array_map('filter_HTML',$mixed)
: htmlspecialchars($mixed,ENT_QUOTES);
}
That would work for both UTF-8 or single-byte encoded string.
But if the string is UTF-8 encoded, make sure to filter out any invalid characters sequence, prior to using the filter_HTML() function:
function make_valid_UTF8($str)
{
return iconv('UTF-8','UTF-8//IGNORE',$str)
}
Also see: http://www.phpwact.org/php/i18n/charsets#character_sets_character_encoding_issues
You need to clean every element bevor displaying it. I do it usually with a function and an array like your secound example.
If you use a framework with a template engine, there is quite likely a possibility to auto-encode strings. Apart from that, what's simpler than calling a function and getting the entity-"encoded" string back?
Check out the filter libraries in php, in particular filter_input_array.
filter_input_array(INPUT_POST, FILTER_SANITIZE_SPECIAL_CHARS);

What are the best practices for avoiding xss attacks in a PHP site [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I have PHP configured so that magic quotes are on and register globals are off.
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
I also occasionally seach my database for common things used in xss attached such as...
<script
What else should I be doing and how can I make sure that the things I am trying to do are always done.
Escaping input is not the best you can do for successful XSS prevention. Also output must be escaped. If you use Smarty template engine, you may use |escape:'htmlall' modifier to convert all sensitive characters to HTML entities (I use own |e modifier which is alias to the above).
My approach to input/output security is:
store user input not modified (no HTML escaping on input, only DB-aware escaping done via PDO prepared statements)
escape on output, depending on what output format you use (e.g. HTML and JSON need different escaping rules)
I'm of the opinion that one shouldn't escape anything during input, only on output. Since (most of the time) you can not assume that you know where that data is going. Example, if you have form that takes data that later on appears in an email that you send out, you need different escaping (otherwise a malicious user could rewrite your email-headers).
In other words, you can only escape at the very last moment the data is "leaving" your application:
List item
Write to XML file, escape for XML
Write to DB, escape (for that particular DBMS)
Write email, escape for emails
etc
To go short:
You don't know where your data is going
Data might actually end up in more than one place, needing different escaping mechanism's BUT NOT BOTH
Data escaped for the wrong target is really not nice. (E.g. get an email with the subject "Go to Tommy\'s bar".)
Esp #3 will occur if you escape data at the input layer (or you need to de-escape it again, etc).
PS: I'll second the advice for not using magic_quotes, those are pure evil!
There are a lot of ways to do XSS (See http://ha.ckers.org/xss.html) and it's very hard to catch.
I personally delegate this to the current framework I'm using (Code Igniter for example). While not perfect, it might catch more than my hand made routines ever do.
This is a great question.
First, don't escape text on input except to make it safe for storage (such as being put into a database). The reason for this is you want to keep what was input so you can contextually present it in different ways and places. Making changes here can compromise your later presentation.
When you go to present your data filter out what shouldn't be there. For example, if there isn't a reason for javascript to be there search for it and remove it. An easy way to do that is to use the strip_tags function and only present the html tags you are allowing.
Next, take what you have and pass it thought htmlentities or htmlspecialchars to change what's there to ascii characters. Do this based on context and what you want to get out.
I'd, also, suggest turning off Magic Quotes. It is has been removed from PHP 6 and is considered bad practice to use it. Details at http://us3.php.net/magic_quotes
For more details check out http://ha.ckers.org/xss.html
This isn't a complete answer but, hopefully enough to help you get started.
rikh Writes:
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
See Joel's essay on Making Code Look Wrong for help with this
Template library. Or at least, that is what template libraries should do.
To prevent XSS all output should be encoded. This is not the task of the main application / control logic, it should solely be handled by the output methods.
If you sprinkle htmlentities() thorughout your code, the overall design is wrong. And as you suggest, you might miss one or two spots.
That's why the only solution is rigorous html encoding -> when output vars get written into a html/xml stream.
Unfortunately, most php template libraries only add their own template syntax, but don't concern themselves with output encoding, or localization, or html validation, or anything important. Maybe someone else knows a proper template library for php?
I rely on PHPTAL for that.
Unlike Smarty and plain PHP, it escapes all output by default. This is a big win for security, because your site won't become vurnelable if you forget htmlspecialchars() or |escape somewhere.
XSS is HTML-specific attack, so HTML output is the right place to prevent it. You should not try pre-filtering data in the database, because you could need to output data to another medium which doesn't accept HTML, but has its own risks.
Escaping all user input is enough for most sites. Also make sure that session IDs don't end up in the URL so they can't be stolen from the Referer link to another site. Additionally, if you allow your users to submit links, make sure no javascript: protocol links are allowed; these would execute a script as soon as the user clicks on the link.
If you are concerned about XSS attacks, encoding your output strings to HTML is the solution. If you remember to encode every single output character to HTML format, there is no way to execute a successful XSS attack.
Read more:
Sanitizing user data: How and where to do it
Personally, I would disable magic_quotes. In PHP5+ it is disabled by default and it is better to code as if it is not there at all as it does not escape everything and it will be removed from PHP6.
Next, depending on what type of user data you are filtering will dictate what to do next e.g. if it is just text e.g. a name, then strip_tags(trim(stripslashes())); it or to check for ranges use regular expressions.
If you expect a certain range of values, create an array of the valid values and only allow those values through (in_array($userData, array(...))).
If you are checking numbers use is_numeric to enforce whole numbers or cast to a specific type, that should prevent people trying to send strings in stead.
If you have PHP5.2+ then consider looking at filter() and making use of that extension which can filter various data types including email addresses. Documentation is not particularly good, but is improving.
If you have to handle HTML then you should consider something like PHP Input Filter or HTML Purifier. HTML Purifier will also validate HTML for conformance. I am not sure if Input Filter is still being developed. Both will allow you to define a set of tags that can be used and what attributes are allowed.
Whatever you decide upon, always remember, never ever trust anything coming into your PHP script from a user (including yourself!).
All of these answers are great, but fundamentally, the solution to XSS will be to stop generating HTML documents by string manipulation.
Filtering input is always a good idea for any application.
Escaping your output using htmlentities() and friends should work as long as it's used properly, but this is the HTML equivalent of creating a SQL query by concatenating strings with mysql_real_escape_string($var) - it should work, but fewer things can validate your work, so to speak, compared to an approach like using parameterized queries.
The long-term solution should be for applications to construct the page internally, perhaps using a standard interface like the DOM, and then to use a library (like libxml) to handle the serialization to XHTML/HTML/etc. Of course, we're a long ways away from that being popular and fast enough, but in the meantime we have to build our HTML documents via string operations, and that's inherently more risky.
“Magic quotes” is a palliative remedy for some of the worst XSS flaws which works by escaping everything on input, something that's wrong by design. The only case where one would want to use it is when you absolutely must use an existing PHP application known to be written carelessly with regard to XSS. (In this case you're in a serious trouble even with “magic quotes”.) When developing your own application, you should disable “magic quotes” and follow XSS-safe practices instead.
XSS, a cross-site scripting vulnerability, occurs when an application includes strings from external sources (user input, fetched from other websites, etc) in its [X]HTML, CSS, ECMAscript or other browser-parsed output without proper escaping, hoping that special characters like less-than (in [X]HTML), single or double quotes (ECMAscript) will never appear. The proper solution to it is to always escape strings according to the rules of the output language: using entities in [X]HTML, backslashes in ECMAscript etc.
Because it can be hard to keep track of what is untrusted and has to be escaped, it's a good idea to always escape everything that is a “text string” as opposed to “text with markup” in a language like HTML. Some programming environments make it easier by introducing several incompatible string types: “string” (normal text), “HTML string” (HTML markup) and so on. That way, a direct implicit conversion from “string” to “HTML string” would be impossible, and the only way a string could become HTML markup is by passing it through an escaping function.
“Register globals”, though disabling it is definitely a good idea, deals with a problem entirely different from XSS.
I find that using this function helps to strip out a lot of possible xss attacks:
<?php
function h($string, $esc_type = 'htmlall')
{
switch ($esc_type) {
case 'css':
$string = str_replace(array('<', '>', '\\'), array('<', '>', '/'), $string);
// get rid of various versions of javascript
$string = preg_replace(
'/j\s*[\\\]*\s*a\s*[\\\]*\s*v\s*[\\\]*\s*a\s*[\\\]*\s*s\s*[\\\]*\s*c\s*[\\\]*\s*r\s*[\\\]*\s*i\s*[\\\]*\s*p\s*[\\\]*\s*t\s*[\\\]*\s*:/i',
'blocked', $string);
$string = preg_replace(
'/#\s*[\\\]*\s*i\s*[\\\]*\s*m\s*[\\\]*\s*p\s*[\\\]*\s*o\s*[\\\]*\s*r\s*[\\\]*\s*t/i',
'blocked', $string);
$string = preg_replace(
'/e\s*[\\\]*\s*x\s*[\\\]*\s*p\s*[\\\]*\s*r\s*[\\\]*\s*e\s*[\\\]*\s*s\s*[\\\]*\s*s\s*[\\\]*\s*i\s*[\\\]*\s*o\s*[\\\]*\s*n\s*[\\\]*\s*/i',
'blocked', $string);
$string = preg_replace('/b\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*d\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*g:/i', 'blocked', $string);
return $string;
case 'html':
//return htmlspecialchars($string, ENT_NOQUOTES);
return str_replace(array('<', '>'), array('<' , '>'), $string);
case 'htmlall':
return htmlentities($string, ENT_QUOTES);
case 'url':
return rawurlencode($string);
case 'query':
return urlencode($string);
case 'quotes':
// escape unescaped single quotes
return preg_replace("%(?<!\\\\)'%", "\\'", $string);
case 'hex':
// escape every character into hex
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '%' . bin2hex($string[$x]);
}
return $s_return;
case 'hexentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#x' . bin2hex($string[$x]) . ';';
}
return $s_return;
case 'decentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#' . ord($string[$x]) . ';';
}
return $s_return;
case 'javascript':
// escape quotes and backslashes, newlines, etc.
return strtr($string, array('\\'=>'\\\\',"'"=>"\\'",'"'=>'\\"',"\r"=>'\\r',"\n"=>'\\n','</'=>'<\/'));
case 'mail':
// safe way to display e-mail address on a web page
return str_replace(array('#', '.'),array(' [AT] ', ' [DOT] '), $string);
case 'nonstd':
// escape non-standard chars, such as ms document quotes
$_res = '';
for($_i = 0, $_len = strlen($string); $_i < $_len; $_i++) {
$_ord = ord($string{$_i});
// non-standard char, escape it
if($_ord >= 126){
$_res .= '&#' . $_ord . ';';
} else {
$_res .= $string{$_i};
}
}
return $_res;
default:
return $string;
}
}
?>
Source
Make you any session cookies (or all cookies) you use HttpOnly. Most browsers will hide the cookie value from JavaScript in that case. User could still manually copy cookies, but this helps prevent direct script access. StackOverflow had this problem durning beta.
This isn't a solution, just another brick in the wall
Don't trust user input
Escape all free-text output
Don't use magic_quotes; see if there's a DBMS-specfic variant, or use PDO
Consider using HTTP-only cookies where possible to avoid any malicious script being able to hijack a session
You should at least validate all data going into the database. And try to validate all data leaving the database too.
mysql_real_escape_string is good to prevent SQL injection, but XSS is trickier.
You should preg_match, stip_tags, or htmlentities where possible!
The best current method for preventing XSS in a PHP application is HTML Purifier (http://htmlpurifier.org/). One minor drawback to it is that it's a rather large library and is best used with an op code cache like APC. You would use this in any place where untrusted content is being outputted to the screen. It is much more thorough that htmlentities, htmlspecialchars, filter_input, filter_var, strip_tags, etc.
Use an existing user-input sanitization library to clean all user-input. Unless you put a lot of effort into it, implementing it yourself will never work as well.
I find the best way is using a class that allows you to bind your code so you never have to worry about manually escaping your data.
It is difficult to implement a thorough sql injection/xss injection prevention on a site that doesn't cause false alarms. In a CMS the end user might want to use <script> or <object> that links to items from another site.
I recommend having all users install FireFox with NoScript ;-)

Categories