Getting all tangled up in json_encode/json_parse and escaping

Getting all tangled up in json_encode/json_parse and escaping - php

I have problems when users input " or \ on a html form
The inputted text will be shown again to the user in html content and html attribute contexts
I have the following data flow:
jQuery form line input
$_POST
escape for html attribute: function escapes either with html entities or hex entities (" or \)
json_encode in php
some unknown javascript interference which blows the fuses
json_parse in a jquery ajax callback
The goal is to show the user the exact same text as they inputted, but to escape properly to avoid xss attacks.
Now first thing I got was that $_POST had slashes added for some reason. So I now use stripslashes first. That solved everything for single quotes, but if the user inputs " or \ it still breaks.
The problems seems to be that javascript does some decoding before the json_parse gets the data. it turns the hex escapes back to \ and " thus killing json_parse.
So then I thought if between step 4 and 5 I use htmlspecialchars( $data, NO_QUOTES, 'utf-8' ) I encode the ampersands to &, which should neutralise the javascript decoding, but no. It doesn't decode &for some reason while it does decode &quot and the hex encodings...
Where am I going wrong?
Is there a way to know exactly what the javascipt decodes and neutralize it from php?
What I'm doing now, after wasting half a day:
I think it's probably some jQuery thing to interfere with the data before the onsuccess handler gets it. I have no time to dig it up and kill it right now, so I'm just sneaking past it with a hack that means 3 string transformations just to keep a string untransformed, but hey, developer time is a rare commodity here.
in php:
// due to a problem with the jQuery callback code which seems to decode html entities and hex entities except for &
// we need to do something to keep our data intact, otherwise parse_json chokes on unescaped backslashes
// and quotes. So we mask the entity by transforming the & into & here and back in js.
// TODO: unit test this to prevent regression
// TODO: debug the jQuery to avoid this workaround
//
// echo json_encode( $response );
echo preg_replace( '/&/u', '&', json_encode( $response ) );
in js before parse_json:
// due to a problem with the jQuery callback code which seems to decode html entities and hex entities except for &
// we need to do something to keep our data intact, otherwise parse_json chokes on unescaped backslashes
// and quotes. So we mask the entity by transforming the & into & here and back in js.
// See function xxxxxx() in file xxxxx.php for the corresponding transformation
//
responseText = responseText.replace( /&/g, '&' );
I couldn't be bothered at the moment to write the unit tests for it, but I don't seem to be able to break it.
The true question remains how can I knock out the unwanted transformation while getting the same result?

Try turning off "Magic Quotes" in php. That way the data comes in through $_POST just like the user typed it. See: http://www.php.net/manual/en/security.magicquotes.disabling.php
Then you can escape it according to your needs.

I had a problem like your problem and used utf8_encode() function. Now it works well. Can you try it ?

Related

PHP Escape a string if it hasn't already been escaped with entities

I'm using a 3rd party API that seems to return its data with the entity codes already in there. Such as The Lion’s Pride.
If I print the string as-is from the API it renders just fine in the browser (in the example above it would put in an apostrophe). However, I can't trust that the API will always use the entities in the future so I want to use something like htmlentities or htmlspecialchars myself before I print it. The problem with this is that it will encode the ampersand in the entity code again and the end result will be The Lion&#8217;s Pride in the HTML source which doesn't render anything user friendly.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?

No one seems to be answering your actual question, so I will
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
It's impossible. What if I'm making an educational post about HTML entities and I want to actually print this on the screen:
The Lion&#8217;s Pride
... it would need to be encoded as...
The Lion&amp&semi;&num;8217&semi;s Pride
But what if that was the actual string we wanted to print on the string ? ... and so on.
Bottom line is, you have to know what you've been given and work from there – which is where the advice from the other answers comes in – which is still just a workaround.
What if they give you double-encoded strings? What if they start wrapping the html-encoded strings in XML? And then wrap that in JSON? ... And then the JSON is converted to binary strings? the possibilities are endless.
It's not impossible for the API you depend on to suddenly switch the output type, but it's also a pretty big violation of the original contract with your users. To some extent, you have to put some trust in the API to do what it says it's going to do. Unit/Integration tests make up the rest of the trust.
And because you could never write a program that works for any possible change they could make, it's senseless to try to anticipate any change at all.

Decode the string, then re-encode the entities. (Using html_entity_decode())
$string = htmlspecialchars(html_entity_decode($string));
https://eval.in/662095

There is NO WAY to do what you ask for!
You must know what kind of data is the service giving back.
Anything else would be guessing.
Example:
what if the service is giving back & but is not escaping ?
you would guess it IS escaping so you would wrongly interpret as & while the correct value is &

I think the best solution, is first to decode all html entities/special chars from the original string, and then html encode the string again.
That way you will end up with a correctly encoded string, no matter if the original string was encoded or not.

You also have the option of using htmlspecialchars_decode();
$string = htmlspecialchars_decode($string);

It's already in htmlentities:
php > echo htmlentities('Hi&mom', ENT_HTML5, ini_get('default_charset'), false);
Hi&mom
php > echo htmlentities('Hi&mom', ENT_HTML5, ini_get('default_charset'), true);
Hi&amp&semi;mom
Just use the [optional]4th argument to NOT double-encode.

prevent xss attack via url( PHP)

I am trying to avoid XSS attack via url
url :http://example.com/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
I have tried
var_dump(filter_var('http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29', FILTER_VALIDATE_URL));
and other url_validation using regex but not worked at all.
above link shows all the information but my css and some java script function doesn't work.
please suggest the best possible solution...

Try using FILTER_SANITIZE_SPECIAL_CHARS Instead
$url = 'http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29';
// Original
echo $url, PHP_EOL;
// Sanitise
echo sanitiseURL($url), PHP_EOL;
// Satitise + URL encode
echo sanitiseURL($url, true), PHP_EOL;
Output
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/"ns="alert(0x0000DC)
http%3A%2F%2F10.0.4.2%2FonlineArcNew%2Fhtml%2Fterms_conditions_1.php%2F%26%2334%3Bns%3D%26%2334%3Balert%280x0000DC%29
Function Used
function sanitiseURL($url, $encode = false) {
$url = filter_var(urldecode($url), FILTER_SANITIZE_SPECIAL_CHARS);
if (! filter_var($url, FILTER_VALIDATE_URL))
return false;
return $encode ? urlencode($url) : $url;
}

If you're using MVC, then try to decode all ofthe values before routing, and use stript_tags() to get rid of these nasties. And as the docs say, case should not impact anything.
If not, create a utility function and do the same while retrieveing the variables from the URI. But I am by no means an XSS expert, so this might be just a part of the trick.
From Janis Peisenieks

Step 1: Escape Output Provided by Users
If you want to include data within a page that’s been provided by users, escape the output. And, in this simplified list, we’re going to stick with one simple escape operation: HTML encode any <, >, &, ‘, “. For example, PHP provides the htmlspecialchars() function to accomplish this common task.
Step 2: Always Use XHTML
Read through OWASP’s XSS prevention strategies, and it becomes apparent that protecting against injection requires much more effort if you use unquoted attributes in your HTML. In contrast, in quoted attributes, escaping data becomes the same process needed to escape data for content within tags, the escape operation we already outlined above. That’s because the only troublemaker in terms of sneaking in structurally significant content within the context of a quoted attribute is the closing quote.
Obviously, your markup doesn’t have to be XHTML in order to contain quoted attributes. However, shooting for and validating against XHTML makes it easy to test if all of the attributes are quoted.
Step 3: Only Allow Alphanumeric Data Values in CSS and JavaScript
We need to limit the data you allow from users that will be output within CSS and Javascript sections of the page to alphanumeric (e.g., a regex like [a-zA-Z0-9]+) types, and make sure they are used in a context in which they truly represent values. In Javascript this means user data should only be output within quoted strings assigned to variables (e.g., var userId = “ALPHANUMERIC_USER_ID_HERE”;.) In CSS this means that user data should only be output within the context for a property value (e.g., p { color: #ALPHANUMERIC_USER_COLOR_HERE;}.) This might seem Draconian, but, hey, this is supposed to be a simple XSS tutorial
Now, to be clear, you should always validate user data to make sure it meets your expectations, even for data that’s output within tags or attributes, as in the earlier examples. However, it’s especially important for CSS and JavaScript regions, as the complexity of the possible data structures makes it exceedingly difficult to prevent XSS attacks.
Common data you might want users to be able supply to your JavaScript such as Facebook, Youtube, and Twitter ID’s can all be used whilst accommodating this restriction. And, CSS color attributes and other styles can be integrated, too.
Step 4: URL-Encode URL Query String Parameters
If user data is output within a URL parameter of a link query string, make sure to URL-encode the data. Again, using PHP as example, you can simply use the urlencode() function. Now, let’s be clear on this and work through a couple examples, as I’ve seen much confusion concerning this particular point.
Must URL-encode
The following example outputs user data that must be URL-encoded because it is used as a value in the query string.
http://site.com?id=USER_DATA_HERE_MUST_BE_URL_ENCODED”>
Must Not URL-Encode
The following example outputs the user-supplied data for the entire URL. In this case, the user data should be escaped with the standard escape function (HTML encode any <, >, &, ‘, “), not URL-encoded. URL-encoding this example would lead to malformed links.

javascript quotes inside quotes, string literal issue

I am trying to display text in a javascript tooltip
I keep getting unterminated string literals even though:
a) the quotes are being slashed, b) there are no line breaks
The text I am trying to display is:
"No, we can't. This is going to be terrible."
(its a quotation from an individual and I want those quotes to display in the tooltip)
My tooltip function works like this
onMouseOver="Tip('string here')"
After I run the string through my function to clean for javascript
function jschars($str) {
echo preg_replace("/\r?\n/", "\\n", addslashes($str));
}
It comes out looking like this in HTML:
onMouseOver="Tip('\"No, we can\'t. This is going to be terrible.\"')"
This gives me the error unterminated string literal for the first \ in Tip('\
I'm guessing its because im trying to put quotes directly inside the single quotes, how can I get around this for situations like this? (I have tried htmlspecial chars, such as replacing the " with & quot ; - I still get the error

It's because you're putting double-quotes inside the value of an XML (or html) element:
<div onMouseOver="Tip('\".......
the back-slash doesn't escape it from the context of xml/html. Technically, you'll need to entity-encode the string (after you javascript-escape it). Something like this:
<div onMouseOver="Tip('\"No, we can\'t. This is going to be terrible.\"')" >
Various browsers may or may not deal with that properly. A much better way to approach it would be to give the element an id (or a class, or some other way for you to select it), then add the mouse over handler from a standalone script.

Because of the structure of what you're doing:
onMouseOver="Tip('string here')"
...you have to do two things:
As Lekensteyn said, you need to use htmlspecialchars to turn any special HTML characters into character escapes. It does things like turn " into ", which means you can safely enclose the attribute in " characters.
But you're not just using this as an attribute, you're also putting it inside a string literal, which means you also need to do JavaScript escaping on the string. Otherwise, (in your case) a single ' character or backslash will mess up the string. So your jschars function also needs to (in order) A) Convert \ to \\, B) Convert ' to \'. That's the minimum, anyway, really you need a thorough "make this safe to put inside a JavaScript literal" function. From your question, I sort of had the impression you were doing this manually, but better to automate it for consistency.
Off-topic: Separately, I would recommend moving away from using attributes to attach handlers. Instead, look into attachEvent (IE) and addEventListener (W3C), or better yet look at a library like jQuery, Closure, Prototype, YUI, or any of several others that will smooth things out for you. For instance, attaching a mouseover handler to:
You can use this handler to handle the mouseover:
function handler() {
Tip('Your message here');
}
...which you then hook up like this with raw DOM stuff (obviously you'd make a utility function for this):
var div = document.getElementById('foo');
if (div.attachEvent) {
// Uses "onmouseover", not "mouseover"
div.attachEvent('onmouseover', handler);
}
else if (div.addEventListener) {
// Uses "mouseover", not "onmouseover"
div.attachEvent('mouseover', handler, false);
}
else {
// Fallback to old DOM0 stuff
div.onmouseover = handler;
}
Here's how Prototype simplifies that hook-up process:
$('foo').observe('mouseover', handler);
Here's how jQuery does:
$('#foo').mouseover(handler);

You should use htmlspecialchars() for this purpose. The problem is ", but HTML won't understand javascript quoting, so it stops at \".
function jschars($str) {
echo htmlspecialchars(preg_replace("/\r?\n/", "\\n", $str), ENT_QUOTES);
}

You could keep the string in javascript instead of HTML. eg:
<a onmouseover="Tip(this, 123)">choice</a>
Then something like:
var texts = {
123:"No, we can't. This is going to be terrible.",
...
};
function Tip(elm, txtId){
showTip(elm, texts[txtid];
}

Is it OK to have HTML tags inside an array (is there risk of hacking)?

I have the following array:
'tagline_p' => "I'm a <a href='#showcase'>multilingual web</a> developer, designer and translator. I'm here to <a href='#contact'>help you</a> reach a worldwide audience.",
Should I escape the HTML tags inside the array to avoid hackings to my site? (How to escape them?)
or is OK to have HTML tags inside an array?

The only time it becomes a problem is when it contains user input. You know what you put in your array, and trust it. But you don't know what users are passing in, and don't trust that.
So in this particular case, escaping is not needed. But as soon as user input is involved, you should escape the input.
It's not the HTML itself that is dangerous, but the type of HTML users can pass in, like script tags which allow them to execute Javascript.
Addition
Note that it's best practice to only escape on output not on input. The output is where the data can do damage, so you want to consistently escape that. That way, you don't have to make sure that all input is escaped.
That way, you don't have problems when outputting data to different formats where maybe different rules apply. You don't have to use things like stripslashes() or htmlspecialchars_decode() if you don't need things to be output as html.

It's fine to store the data in the array.
You only need to escape the tags when you are outputting it into an HTML context, and you don't trust it, or you don't want the HTML to be interpreted.
You have to escape data in an appropriate manner to where you are sending it; for HTML if you don't want it to be read as HTML you can use htmlspecialchars(), likewise if you are putting it into an SQL statement and you don't want it to be read as SQL, you can use mysql_real_escape_string() etc.

You should escape HTML when it has been entered by a user (and thus is unsafe) AND you're going to display that HTML in you site. If it's you who wrote it, it doesn't need any kind of escaping.
If you do need to escape html you should do so right before displaying it on your site. There is no need to escape data when you're just lugging it around (like you're presummably doing with that array). You can escape HTML with the htmlspecialchars() function.

(Use htmlspecialchars or htmlentities to escape the HTML.)
Having HTML tags is fine as long as you restrict the set of tags and attributes coming from user, if that array is dynamically generated. For example, <script> should not be allowed, nor event handlers like onmouseover.

It depends on how the HTML is getting into the array. If it's hardcoded by you, it's probably all right. If it's coming from a user, well, all user input is suspect- HTML is just more difficult to clean.
The real question might be "Why do you want to put HTML in an array?". If it's static text, put it in a template file somewhere.

make an array of allowable tags and use strip_tags($input_array[$key],$allowable_tags)
or make a function like this
function sanitize_input($allowable_tags='<br><b><strong><p>')
{
$input_array = $input;
foreach ($input as $key=>$value){
if(!empty($value)) {
$input_array[$key] = strip_tags($input_array[$key],$allowable_tags);
}
}
return $input_array;
}

Why mysql is not storing data after "#" character?

I have made one form in which there is rich text editor. and i m trying to store the data to database.
now i have mainly two problem..
1) As soon as the string which contents "#"(basically when i try to change the color of the font) character, then it does not store characters after "#". and it also not store "#" character also.
2) although i had tried....in javascript
html.replace("\"","'");
but it does not replace the double quotes to single quotes.

We'll need to see some code. My feeling is you're missing some essential escaping step somewhere. In particular:
As soon as the string which contents "#"(basically when i try to change the color of the font) character
Implies to me that you might be sticking strings together into a URL like this:
var url= '/something.php?content='+html;
Naturally if the html contains a # symbol, you've got problems, because in:
http://www.example.com/something.php?content=<div style="color:#123456">
the # begins a fragment identifier called #123456">, like when you put #section on the end of a URL to go to the anchor called section in the HTML file. Fragment identifiers are purely client-side and are not sent to the server, which would see:
http://www.example.com/something.php?content=<div style="color:
However this is far from the only problem with the above. Space, < and = are simly invalid in URLs, and other characters like & will also mess up parameter parsing. To encode an arbitrary string into a query parameter you must use encodeURIComponent:
var url= '/something.php?content='+encodeURIComponent(html);
which will replace # with %35 and similarly for the other out-of-band characters.
However if this is indeed what you're doing, you should in any case you should not be storing anything to the database in response to a GET request, nor relying on a GET to pass potentially-large content. Use a POST request instead.

It seems that you are doing something very strange with your database code. Can you show the actual code you use for storing the string to database?
# - character is a common way to create a comment. That is everything starting from # to end of line is discarded. However if your code to store to database is correct, that should not matter.
Javascript is not the correct place to handle quote character conversions. The right place for that is on server side.

As you have requested....
I try to replay you... I try to mention exact what I had done...
1) on the client side on the html form page I had written like this..
html = html.trim(); // in html, the data of the rich text editor will come.
document.RTEDemo.action = "submit.php?method='"+ html.replace("\"","'") + "'";
\\ i had done replace bcz i think that was some problem with double quotes.
now on submit.php , my browser url is like this...
http://localhost/nc/submit.php?method='This is very simple recipe.<br><strong style='background-color: #111111; color: #80ff00; font-size: 20px;">To make Bread Buttor you will need</strong><br><br><blockquote><ol><li>bread</li><li>buttor</li></ol></li></blockquote><span style="background-color: #00ff80;">GOOD.</span><br><br><br><blockquote><br></blockquote><br>'
2) on submit.php ........I just write simply this
echo "METHOD : ".$_GET['method'] . "<br><br>";
$method = $_GET['method'];
now my answer of upper part is like this...
METHOD : 'This is very simple recipe.
now i want to store the full detail of URL....but its only storing...
This is very simple recipe.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.