I have an HTML table that displays information from a database, and one of the database fields contains a parameter list such as:
id=eff34-435-567rt-65u&notification=5
But when I display this in the table, the &not becomes ¬
I know that you can manually force it to print the right way by using
&amp;not
But I would really rather be able to just use something to force the HTML to ignore the code, so I can just pull the text straight from the database and print it to the table without having to do a regex to find out if there are any & and replace them with &amp;. I tried using the <pre> tag but that did not work.
Is there any way to force the HTML to print exactly what is typed for that specific td field?
Nothing practical (CDATA doesn't have browser support in text/html mode). Write proper HTML instead.
You should be running anything that comes out of the database through a conversion function to make it HTML safe anyway (to protect against XSS if nothing else). PHP has htmlspecialchars(), TT has | html. Whatever you are using should have something other than a regex.
&amp; is the correct HTML encoding for &. You will need to write &amp;not for it to display correctly.
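As a rough illustration of the point above, here is a minimal sketch in Python (an assumption; the question does not say which server-side language is in use). `html.escape` plays the role that `htmlspecialchars()` plays in PHP, and `html.unescape` is used only to mimic how an HTML parser reads the unescaped text.

```python
import html

# The raw value as stored in the database (taken from the question).
raw = "id=eff34-435-567rt-65u&notification=5"

# Roughly what an HTML parser does with the unescaped text:
# "&not" is read as a character reference even without a semicolon.
as_parsed = html.unescape(raw)

# Escaping before output turns "&" into "&amp;", so the text survives intact.
safe = html.escape(raw)

print(as_parsed)  # id=eff34-435-567rt-65u¬ification=5
print(safe)       # id=eff34-435-567rt-65u&amp;notification=5
```

Writing `safe` into the td cell makes the browser display the original string unchanged.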
If you're pulling from a database, you can use whatever programming language is available to you to encode HTML entities for you.
For example, in PHP, you could use htmlentities or htmlspecialchars.
Try using htmlspecialchars().
Most frameworks have HTML encode functions.
In JavaScript: encode
In C# .NET: HttpServerUtility.HtmlEncode
Just run an HTMLEncode on the string before outputting it. Every server-side scripting language I know of has a built in command to do this. Not to mention that you are eventually going to run into another character that causes problems too.
ASP.NET: HttpServerUtility.HtmlEncode
PHP: htmlentities
Regex should definitely NOT be necessary.
I'm storing data in a MySQL database that may have some special characters. I'm wondering how to store it so that these characters are preserved if they're either output to HTML via PHP OR via JavaScript, e.g. createTextNode.
For example, the division symbol (÷) has the HTML code &divide;, and when I store it as that it shows up fine when put directly into HTML by PHP, but when I pull it into JavaScript using $.getJSON and then insert it with createTextNode it shows up looking like &divide;.
I also tried storing the symbol in the SQL directly, but my understanding is that the column would need to be changed from VARCHAR to NVARCHAR and that would cause a performance hit that doesn't seem necessary.
Given that I can modify the SQL, the PHP, or the JavaScript, is there an easy fix here? Maybe a way to unescape the HTML entity in JavaScript?
As answered by Yogesh, you should switch your collation of the DB to utf8_general_ci
So there are probably two things going on:
JSON escapes special characters.
Somewhere, something in your code flow is URL encoding the strings too.
So you just need to decode the string in your JavaScript, or you need to find what part of your code is URL encoding those strings and fix it.
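One possible fix, sketched in Python rather than PHP or JavaScript (an assumption made for illustration): convert entities that are already stored into the real character once, and let JSON carry the character itself. createTextNode inserts text literally, so it needs ÷ and not &divide;. The stored value below is hypothetical.

```python
import html
import json

stored = "6 &divide; 3"           # value stored as an HTML entity (hypothetical row)
text = html.unescape(stored)      # the actual character: "6 ÷ 3"

# JSON may escape the character as \u00f7, but any JSON parser restores it.
payload = json.dumps({"label": text})
restored = json.loads(payload)["label"]

print(payload)   # {"label": "6 \u00f7 3"}
print(restored)  # 6 ÷ 3
```

With the plain character in the database, the PHP side would escape on output and the JavaScript side could pass the parsed string straight to createTextNode.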
I'm building a category list and I'm unsure how to store the ampersand symbol in a MySQL database. Is there any problem/disadvantage if I use '&'? Are there any differences from using it in an HTML format, '&amp;'?
Using '&' saves a few bytes. It is not a special character for MySQL. You can easily produce the &amp; for output later with PHP's function htmlspecialchars(). When your field is meant to keep simple information, you should use plain text only, as this is in general more flexible since you can generate different kinds of markup etc. later. Exception: the markup is produced by a user whose layout decisions you want to save with the text (as in rich-text input). If you have tags etc. in your input, you may want to use &amp; for consistency.
You should store it as &amp; only if the field in the DB contains HTML (like <span class="bold">some text</span> &amp; more). In which case you should be very careful about XSS.
If the field contains some general data (like a username, title, etc.) you should only escape it when you put it in your HTML (using htmlentities for example).
Storing it as & is an appropriate method. You can echo it or use it in statements as &.
We store '&' in database fields all the time; it's fine to do so (at least I've never heard an argument otherwise).
If you're only ever using the string in an HTML page you could just store the HTML-safe &amp; version, I suppose. I would suggest that storing '&' and escaping it when you read it would be better, though (in case you need to use the string in a non-HTML context in the future).
Use &amp; if you want valid HTML or want to avoid problems like cut&copy (the browser shows it as cut©).
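The store-plain, escape-on-output advice from the answers above can be sketched like this. Python stands in for PHP here (an assumption), with `html.escape` as the counterpart of htmlspecialchars(); the category name is made up.

```python
import html

category = "Health & Beauty"      # stored as plain text (hypothetical value)

# Escape only at the point where the value is written into HTML.
option = "<option>" + html.escape(category) + "</option>"
print(option)  # <option>Health &amp; Beauty</option>

# Why a bare "&" is risky in markup: "&copy" is parsed as the © entity.
print(html.unescape("cut&copy"))               # cut©
print(html.unescape(html.escape("cut&copy")))  # cut&copy
```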
Hi, I'm planning to let users submit pieces of code (PHP, Java, JavaScript, C++, etc.; whatever they want, I mean).
So can anyone suggest the best practice to keep my site safe? :))
I mean, which tags/chars/strings should I escape in PHP once the code string is submitted?
If your intent is to display the code on screen, you do not need to escape or replace anything before storing it in your database (if you intend to store it). This doesn't apply, of course, to escaping for database insertion via something like mysql_real_escape_string(), for example (or your RDBMS' equivalent sanitization routine). That step is still absolutely necessary.
When displaying the code, just be sure that:
You DO NOT evaluate any submitted code via an eval() or system call.
When displaying code back to the browser, escape it with htmlspecialchars(). Never display it unescaped, or you will introduce cross site scripting vulnerabilities.
Use placeholders in your queries and you don't even have to escape the input.
Placeholders, binding, and prepared statements are definitely the preferred method.
It's faster for anything over 1 query as you can reuse the handles and just change the input.
It's safer. The string is not interpreted with the query... ever. What you store is what you get.
I'd need to know a bit more about your target sql to give pertinent examples, but here's some links:
PDO style binding: http://docs.php.net/pdo.prepared-statements
MySqli style binding: http://docs.php.net/manual/en/mysqli-stmt.bind-param.php
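A minimal sketch of placeholder binding, written in Python with the standard `sqlite3` module rather than PDO or MySQLi (an assumption, since the target database is not stated). The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snippets (title TEXT, code TEXT)")

title = "Bobby's example"
code = "<?php echo 'hi'; ?> -- '; DROP TABLE snippets;"

# The ? placeholders send the values separately from the SQL text,
# so quotes in the input are stored as data, never run as SQL.
conn.execute("INSERT INTO snippets (title, code) VALUES (?, ?)", (title, code))

row = conn.execute(
    "SELECT title, code FROM snippets WHERE title = ?", (title,)
).fetchone()
print(row == (title, code))  # True
```

The submitted code comes back byte for byte as it went in, which is the "what you store is what you get" property described above.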
When you read it back, display with
htmlspecialchars($string, ENT_QUOTES);
ENT_QUOTES option ensures that both single and double quotes get escaped.
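For comparison, a sketch of the same output step in Python (an assumption; the thread itself is about PHP). `html.escape` with `quote=True`, which is its default, escapes both quote characters, much like ENT_QUOTES does. The submitted string is made up.

```python
import html

submitted = """<script>alert("x")</script> it's"""

# Escape at display time; the stored value stays untouched.
shown = "<pre>" + html.escape(submitted, quote=True) + "</pre>"
print(shown)
# <pre>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; it&#x27;s</pre>
```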
You don't need to escape anything (other than the usual MySQL sanitization), if you don't intend to automatically run it.
I am no expert (I only got told about this yesterday), but at least for HTML, you could try htmlentities.
Once something has been converted using htmlentities, it becomes plain text, so if opened in a browser you will see the tag and everything (e.g. it will show <a href="blah blah">). If it's written to a log or something else and then opened in a text-based editor, you will see some symbols and shnaz that represent the HTML entities.
If you need to convert back, you can use the html_entity_decode function, I think, but I am going to wager a guess and presume that you don't need to convert back.
For other languages, I have no idea what you should do.
The example web page has 2 fields and allows a user to enter a title and code. Both fields would later be embedded and displayed in an HTML page for viewing and/or editing but not execution. In other words, any PHP or JavaScript or similar should not run but be displayed for editing and copying.
In this case, what is the best way to escape these fields before database insertion and after (for HTML display)?
You need to use the function htmlspecialchars() in PHP.
That will change any special characters (e.g. < and >) into their HTML-encoded equivalents (e.g. &lt; and &gt;). When you get these from the database and output them as HTML they will display as code, but won't harm your script or execute.
I faced the same problem a few days back. To put the code (JavaScript or PHP) in the HTML in a non-executable way, I used a textarea, and it served the purpose.
The problem, however, was with the database. I could not use the typical escape functions on the data, as they were affecting my data; for example, the tags were getting messed up.
To solve this problem, I encoded the data in Base64 before putting it in the database. The JavaScript code is encoded, so the result is no longer JavaScript code, and I can use the escape functions on it and store it in the database.
I am open to suggestions, feel free to comment.
I want to prevent XSS attacks in my web application. I found that HTML encoding the output can really prevent XSS attacks. Now the problem is: how do I HTML encode every single output in my application? Is there a way to automate this?
I appreciate answers for JSP, ASP.net and PHP.
One thing that you shouldn't do is filter the input data as it comes in. People often suggest this, since it's the easiest solution, but it leads to problems.
Input data can be sent to multiple places, besides being output as HTML. It might be stored in a database, for example. The rules for filtering data sent to a database are very different from the rules for filtering HTML output. If you HTML-encode everything on input, you'll end up with HTML in your database. (This is also why PHP's "magic quotes" feature is a bad idea.)
You can't anticipate all the places your input data will travel. The safe approach is to prepare the data just before it's sent somewhere. If you're sending it to a database, escape the single quotes. If you're outputting HTML, escape the HTML entities. And once it's sent somewhere, if you still need to work with the data, use the original un-escaped version.
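To make the "prepare the data just before it's sent somewhere" idea concrete, here is a small sketch in Python (an assumption; the question asks about JSP, ASP.NET and PHP). One raw value is kept as-is and encoded differently for each destination. The sample string is made up.

```python
import html
import json
from urllib.parse import quote

raw = 'Tom & "Jerry" <3'          # original, un-escaped value

as_html = html.escape(raw)        # for HTML text or attribute values
as_url = quote(raw, safe="")      # for a URL query component
as_json = json.dumps(raw)         # for a JSON response

print(as_html)  # Tom &amp; &quot;Jerry&quot; &lt;3
print(as_url)   # Tom%20%26%20%22Jerry%22%20%3C3
print(as_json)  # "Tom & \"Jerry\" <3"
```

Each destination has its own rules, which is why encoding once on input cannot be right for all of them.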
This is more work, but you can reduce it by using template engines or libraries.
You don't want to encode all HTML, you only want to HTML-encode any user input that you're outputting.
For PHP: htmlentities and htmlspecialchars
For JSPs, you can have your cake and eat it too, with the c:out tag, which escapes XML by default. This means you can bind to your properties as raw elements:
<input name="someName.someProperty" value="<c:out value='${someName.someProperty}' />" />
When bound to a string, someName.someProperty will contain the XML input, but when being output to the page, it will be automatically escaped to provide the XML entities. This is particularly useful for links for page validation.
A nice way I used to escape all user input is by writing a modifier for Smarty which escapes all variables passed to the template, except for the ones that have |unescape attached to them. That way you only give HTML access to the elements you explicitly give access to.
I don't have that modifier any more; but about the same version can be found here:
http://www.madcat.nl/martijn/archives/16-Using-smarty-to-prevent-HTML-injection..html
In the new Django 1.0 release this works exactly the same way, jay :)
My personal preference is to diligently encode anything that's coming from the database, business layer or from the user.
In ASP.NET this is done by using Server.HtmlEncode(string).
The reason to encode everything is that even properties which you might assume to be boolean or numeric could contain malicious code. For example, checkbox values, if they're handled improperly, could come back as strings; if you're not encoding them before sending the output to the user, then you've got a vulnerability.
You could wrap echo / print etc. in your own methods which you can then use to escape output. i.e. instead of
echo "blah";
use
myecho('blah');
you could even have a second param that turns off escaping if you need it.
In one project we had a debug mode in our output functions which made all the output text going through our method invisible. Then we knew that anything left on the screen HADN'T been escaped! Was very useful tracking down those naughty unescaped bits :)
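The wrapper idea above might look roughly like this in Python (an assumption; the answer uses PHP's echo). The name `my_echo` is hypothetical, and the function returns the string instead of printing it so that the result is easy to check.

```python
import html

def my_echo(text, escape=True):
    """Return text ready for output; HTML-escaped unless escape=False."""
    return html.escape(str(text)) if escape else str(text)

print(my_echo("<b>blah</b>"))                # &lt;b&gt;blah&lt;/b&gt;
print(my_echo("<b>blah</b>", escape=False))  # <b>blah</b>
```

Making escaping the default means a forgotten flag fails safe: the worst case is visible markup, not injected markup.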
If you do actually HTML encode every single output, the user will see plain text of <html> instead of a functioning web app.
EDIT: If you HTML encode every single input, you'll have problems accepting external passwords containing < etc.
The only way to truly protect yourself against this sort of attack is to rigorously filter all of the input that you accept, specifically (although not exclusively) from the public areas of your application. I would recommend that you take a look at Daniel Morris's PHP Filtering Class (a complete solution) and also the Zend_Filter package (a collection of classes you can use to build your own filter).
PHP is my language of choice when it comes to web development, so apologies for the bias in my answer.
Kieran.
OWASP has a nice API to encode HTML output, either to use as HTML text (e.g. paragraph or <textarea> content) or as an attribute's value (e.g. for <input> tags after rejecting a form):
encodeForHTML($input) // Encode data for use in HTML using HTML entity encoding
encodeForHTMLAttribute($input) // Encode data for use in HTML attributes.
The project (the PHP version) is hosted under http://code.google.com/p/owasp-esapi-php/ and is also available for some other languages, e.g. .NET.
Remember that you should encode everything (not only user input), and as late as possible (not when storing in DB but when outputting the HTTP response).
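The difference between encoding for HTML text and encoding for an attribute value can be sketched as follows. This is not ESAPI's code: it is written in Python (an assumption), the function name is hypothetical, and it applies a deliberately strict rule for attribute values, encoding every non-alphanumeric character below code point 256 as a hexadecimal character reference.

```python
def encode_for_html_attribute(value):
    """Hypothetical helper: encode every non-alphanumeric character
    below code point 256 as &#xHH; so the value stays inside the attribute."""
    out = []
    for ch in value:
        if ch.isalnum() or ord(ch) >= 256:
            out.append(ch)
        else:
            out.append("&#x{:02X};".format(ord(ch)))
    return "".join(out)

print(encode_for_html_attribute('a"b c'))  # a&#x22;b&#x20;c
```

Encoding spaces and quotes alike means the value cannot close the attribute even when the surrounding markup leaves the attribute unquoted.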
Output encoding is by far the best defense. Validating input is great for many reasons, but it is not a 100% defense. If a database becomes infected with XSS via attack (e.g. ASPROX), mistake, or malice, input validation does nothing. Output encoding will still work.
There was a good essay from Joel on Software ("Making Wrong Code Look Wrong", I think; I'm on my phone, otherwise I'd have a URL for you) that covered the correct use of Hungarian notation. The short version would be something like:
Var dsFirstName, uhsFirstName : String;
Begin
uhsFirstName := request.queryfields.value['firstname'];
dsFirstName := dsHtmlToDB(uhsFirstName);
Basically, prefix your variables with something like "us" for unsafe string, "ds" for database safe, "hs" for HTML safe. You only want to encode and decode where you actually need it, not everywhere. By using prefixes that carry a useful meaning, you'll see real quick when looking at your code if something isn't right. And you're going to need different encode/decode functions anyway.
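The same naming convention, sketched in Python instead of Pascal (an assumption), with a hypothetical helper `hs_from_us` whose name states which kind of string goes in and which comes out:

```python
import html

def hs_from_us(us_value):
    """Convert an unsafe string (us) into an HTML-safe string (hs)."""
    return html.escape(us_value)

us_first_name = "<b>Ann</b>"               # us = unsafe, straight from the request
hs_first_name = hs_from_us(us_first_name)  # hs = safe to write into HTML

# Only hs values are concatenated into markup; a us value here would stand out.
page = "<p>Hello, " + hs_first_name + "</p>"
print(page)  # <p>Hello, &lt;b&gt;Ann&lt;/b&gt;</p>
```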