Codeigniter XSS Filtering ignores % sign

Codeigniter XSS Filtering ignores % sign - php

I have a search page on my website (built with CodeIgniter) that accepts a URL-encoded query string, like http://somesite.com/search?q=123. I want the contents of the parameter q to be displayed on the search page. The problem is, when I call xss_clean() on the value of the parameter (either while reading the query string or when outputting it), it converts any existence of a percentile character into its equivalent URL-encoded character. For example, ?q=100%25%20500, which stands for the search string 100% 500 gets printed as 100P0, because the XSS filtering logic of CodeIgniter appears to interpret % 50 as %50 (ignoring space), which translates to the character P, following HTML encoding rules. So my question is, what can I do to both allow the character % in my searches as well as have it XSS-filtered?
Here's my code fragment, if it helps to understand the problem better:
<?php
$s = $this->input->get('q', FALSE);
echo "<p>You searched for $s</p>"; //this will show '100% 500'
$this->load->helper('security');
$t = xss_clean($s);
echo "<p>You searched for $t</p>"; //this will show '100P0'

Related

Ignore accents in greek words when using elastic search

I have a search input box that supports autosuggestion logic. The results are fetched from an elastic index whose analyzers (index_analyzer, search_analyzer) use as tokenizer: nGram, and as filter: standard, lowercase and asciifolding for the search_analyzer & lowercase and asciifolding for the index_analyzer respectively.
What I am struggling to achieve but without any effect/result yet is to get result(s) even if the user has given a greek word without the accent (tonos). Otherwise, the user gets proper results and the mechanism works as expected.
I have to mention that the given string is matched against a specific field to the document set that includes greek words with accent. Moreover, this field is of datatype string and enabled to get analyzed.
The query is formed (using example string without accent that highlights the problem):
$searchString = mb_strtolower('Προταση', 'UTF-8')
$queryText = new Elastica_Query_Text();
$queryText->setField('name', $searchString)
$query = new Elastica_Query();
$query->setQuery($queryText);
A quick solution but not appropriate cause it's kinda heavy for this purpose, is to form a fuzzy query with min_similarity set to 0.7. Then it works thoroughly but the cost is significant.
All the work has been done using the elastica, let alone php. Could you please help to solve my problem ? It is imperative for me that a solution be found.
Thank you in advance

prevent xss attack via url( PHP)

I am trying to avoid XSS attack via url
url :http://example.com/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
I have tried
var_dump(filter_var('http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29', FILTER_VALIDATE_URL));
and other url_validation using regex but not worked at all.
above link shows all the information but my css and some java script function doesn't work.
please suggest the best possible solution...

Try using FILTER_SANITIZE_SPECIAL_CHARS Instead
$url = 'http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29';
// Original
echo $url, PHP_EOL;
// Sanitise
echo sanitiseURL($url), PHP_EOL;
// Satitise + URL encode
echo sanitiseURL($url, true), PHP_EOL;
Output
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/"ns="alert(0x0000DC)
http%3A%2F%2F10.0.4.2%2FonlineArcNew%2Fhtml%2Fterms_conditions_1.php%2F%26%2334%3Bns%3D%26%2334%3Balert%280x0000DC%29
Function Used
function sanitiseURL($url, $encode = false) {
$url = filter_var(urldecode($url), FILTER_SANITIZE_SPECIAL_CHARS);
if (! filter_var($url, FILTER_VALIDATE_URL))
return false;
return $encode ? urlencode($url) : $url;
}

If you're using MVC, then try to decode all ofthe values before routing, and use stript_tags() to get rid of these nasties. And as the docs say, case should not impact anything.
If not, create a utility function and do the same while retrieveing the variables from the URI. But I am by no means an XSS expert, so this might be just a part of the trick.
From Janis Peisenieks

Step 1: Escape Output Provided by Users
If you want to include data within a page that’s been provided by users, escape the output. And, in this simplified list, we’re going to stick with one simple escape operation: HTML encode any <, >, &, ‘, “. For example, PHP provides the htmlspecialchars() function to accomplish this common task.
Step 2: Always Use XHTML
Read through OWASP’s XSS prevention strategies, and it becomes apparent that protecting against injection requires much more effort if you use unquoted attributes in your HTML. In contrast, in quoted attributes, escaping data becomes the same process needed to escape data for content within tags, the escape operation we already outlined above. That’s because the only troublemaker in terms of sneaking in structurally significant content within the context of a quoted attribute is the closing quote.
Obviously, your markup doesn’t have to be XHTML in order to contain quoted attributes. However, shooting for and validating against XHTML makes it easy to test if all of the attributes are quoted.
Step 3: Only Allow Alphanumeric Data Values in CSS and JavaScript
We need to limit the data you allow from users that will be output within CSS and Javascript sections of the page to alphanumeric (e.g., a regex like [a-zA-Z0-9]+) types, and make sure they are used in a context in which they truly represent values. In Javascript this means user data should only be output within quoted strings assigned to variables (e.g., var userId = “ALPHANUMERIC_USER_ID_HERE”;.) In CSS this means that user data should only be output within the context for a property value (e.g., p { color: #ALPHANUMERIC_USER_COLOR_HERE;}.) This might seem Draconian, but, hey, this is supposed to be a simple XSS tutorial
Now, to be clear, you should always validate user data to make sure it meets your expectations, even for data that’s output within tags or attributes, as in the earlier examples. However, it’s especially important for CSS and JavaScript regions, as the complexity of the possible data structures makes it exceedingly difficult to prevent XSS attacks.
Common data you might want users to be able supply to your JavaScript such as Facebook, Youtube, and Twitter ID’s can all be used whilst accommodating this restriction. And, CSS color attributes and other styles can be integrated, too.
Step 4: URL-Encode URL Query String Parameters
If user data is output within a URL parameter of a link query string, make sure to URL-encode the data. Again, using PHP as example, you can simply use the urlencode() function. Now, let’s be clear on this and work through a couple examples, as I’ve seen much confusion concerning this particular point.
Must URL-encode
The following example outputs user data that must be URL-encoded because it is used as a value in the query string.
http://site.com?id=USER_DATA_HERE_MUST_BE_URL_ENCODED”>
Must Not URL-Encode
The following example outputs the user-supplied data for the entire URL. In this case, the user data should be escaped with the standard escape function (HTML encode any <, >, &, ‘, “), not URL-encoded. URL-encoding this example would lead to malformed links.

PHP function question

I don't now if this is the place to ask this kind of question so I will give it a try. I was wondering what does the following php user defined function do in the code example below? If someone explain it to me in detail thanks.
function decode_characters($info)
{
$info = mb_convert_encoding($info, "HTML-ENTITIES", "UTF-8");
$info = preg_replace('~^(&([a-zA-Z0-9]);)~',htmlentities('${1}'),$info);
return($info);
}

The function is a little odd. The first function call transforms a string encoded in UTF-8 to an ASCII encoded string where the non-mapped characters are converted to HTML entities (named entities if they exist in HTML 4, otherwise numeric entities). For instance:
echo mb_convert_encoding("foo\"é⌑'&", "HTML-ENTITIES", "UTF-8");
yields
foo"é⌑'&
So this differs from htmlentities in that 1) numerical entities are used in the circumstances given and 2) special characters such as &, " or < are not touched.
The second function call, however, is more strange. It finds if a named entity with only one ASCII alphanumeric character starts the input, and, if so, calls htmlentities on this input (actually it doesn't because the e modifier is not used and the function name is not in a string, so it's executed when the arguments are evaluated). This call has no effect because htmlentities('${1}') is '${1}' and the backreference 1 encompasses the whole match, so, even if the expression matches, there's no substitution.

Is strip_tags() vulnerable to scripting attacks?

Is there a known XSS or other attack that makes it past a
$content = "some HTML code";
$content = strip_tags($content);
echo $content;
?
The manual has a warning:
This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
but that is related to using the allowable_tags parameter only.
With no allowed tags set, is strip_tags() vulnerable to any attack?
Chris Shiflett seems to say it's safe:
Use Mature Solutions
When possible, use mature, existing solutions instead of trying to create your own. Functions like strip_tags() and htmlentities() are good choices.
is this correct? Please if possible, quote sources.
I know about HTML purifier, htmlspecialchars() etc.- I am not looking for the best method to sanitize HTML. I just want to know about this specific issue. This is a theoretical question that came up here.
Reference: strip_tags() implementation in the PHP source code

As its name may suggest, strip_tags should remove all HTML tags. The only way we can proof it is by analyzing the source code. The next analysis applies to a strip_tags('...') call, without a second argument for whitelisted tags.
First at all, some theory about HTML tags: a tag starts with a < followed by non-whitespace characters. If this string starts with a ?, it should not be parsed. If this string starts with a !--, it's considered a comment and the following text should neither be parsed. A comment is terminated with a -->, inside such a comment, characters like < and > are allowed. Attributes can occur in tags, their values may optionally be surrounded by a quote character (' or "). If such a quote exist, it must be closed, otherwise if a > is encountered, the tag is not closed.
The code text is interpreted in Firefox as:
text
The PHP function strip_tags is referenced in line 4036 of ext/standard/string.c. That function calls the internal function php_strip_tags_ex.
Two buffers exist, one for the output, the other for "inside HTML tags". A counter named depth holds the number of open angle brackets (<).
The variable in_q contains the quote character (' or ") if any, and 0 otherwise. The last character is stored in the variable lc.
The functions holds five states, three are mentioned in the description above the function. Based on this information and the function body, the following states can be derived:
State 0 is the output state (not in any tag)
State 1 means we are inside a normal html tag (the tag buffer contains <)
State 2 means we are inside a php tag
State 3: we came from the output state and encountered the < and ! characters (the tag buffer contains <!)
State 4: inside HTML comment
We need just to be careful that no tag can be inserted. That is, < followed by a non-whitespace character. Line 4326 checks an case with the < character which is described below:
If inside quotes (e.g. <a href="inside quotes">), the < character is ignored (removed from the output).
If the next character is a whitespace character, < is added to the output buffer.
if outside a HTML tag, the state becomes 1 ("inside HTML tag") and the last character lc is set to <
Otherwise, if inside the a HTML tag, the counter named depth is incremented and the character ignored.
If > is met while the tag is open (state == 1), in_q becomes 0 ("not in a quote") and state becomes 0 ("not in a tag"). The tag buffer is discarded.
Attribute checks (for characters like ' and ") are done on the tag buffer which is discarded. So the conclusion is:
strip_tags without a tag whitelist is safe for inclusion outside tags, no tag will be allowed.
By "outside tags", I mean not in tags as in outside tag. Text may contain < and > though, as in >< a>>. The result is not valid HTML though, <, > and & need still to be escaped, especially the &. That can be done with htmlspecialchars().
The description for strip_tags without an whitelist argument would be:
Makes sure that no HTML tag exist in the returned string.

I cannot predict future exploits, especially since I haven't looked at the PHP source code for this. However, there have been exploits in the past due to browsers accepting seemingly invalid tags (like <s\0cript>). So it's possible that in the future someone might be able to exploit odd browser behavior.
That aside, sending the output directly to the browser as a full block of HTML should never be insecure:
echo '<div>'.strip_tags($foo).'</div>'
However, this is not safe:
echo '<input value="'.strip_tags($foo).'" />';
because one could easily end the quote via " and insert a script handler.
I think it's much safer to always convert stray < into < (and the same with quotes).

According to this online tool, this string will be "perfectly" escaped, but
the result is another malicious one!
<<a>script>alert('ciao');<</a>/script>
In the string the "real" tags are <a> and </a>, since < and script> alone aren't tags.
I hope I'm wrong or that it's just because of an old version of PHP, but it's better to check in your environment.

YES, strip_tags() is vulnerable to scripting attacks, right through to (at least) PHP 8. Do not use it to prevent XSS. Instead, you should use filter_input().
The reason that strip_tags() is vulnerable is because it does not run recursively. That is to say, it does not check whether or not valid tags will remain after valid tags have been stripped. For example, the string
<<a>script>alert(XSS);<</a>/script> will strip the <a> tag successfully, yet fail to see this leaves
<script>alert(XSS);</script>.
This can be seen (in a safe environment) here.

Strip tags is perfectly safe - if all that you are doing is outputting the text to the html body.
It is not necessarily safe to put it into mysql or url attributes.

Why mysql is not storing data after "#" character?

I have made one form in which there is rich text editor. and i m trying to store the data to database.
now i have mainly two problem..
1) As soon as the string which contents "#"(basically when i try to change the color of the font) character, then it does not store characters after "#". and it also not store "#" character also.
2) although i had tried....in javascript
html.replace("\"","'");
but it does not replace the double quotes to single quotes.

We'll need to see some code. My feeling is you're missing some essential escaping step somewhere. In particular:
As soon as the string which contents "#"(basically when i try to change the color of the font) character
Implies to me that you might be sticking strings together into a URL like this:
var url= '/something.php?content='+html;
Naturally if the html contains a # symbol, you've got problems, because in:
http://www.example.com/something.php?content=<div style="color:#123456">
the # begins a fragment identifier called #123456">, like when you put #section on the end of a URL to go to the anchor called section in the HTML file. Fragment identifiers are purely client-side and are not sent to the server, which would see:
http://www.example.com/something.php?content=<div style="color:
However this is far from the only problem with the above. Space, < and = are simly invalid in URLs, and other characters like & will also mess up parameter parsing. To encode an arbitrary string into a query parameter you must use encodeURIComponent:
var url= '/something.php?content='+encodeURIComponent(html);
which will replace # with %35 and similarly for the other out-of-band characters.
However if this is indeed what you're doing, you should in any case you should not be storing anything to the database in response to a GET request, nor relying on a GET to pass potentially-large content. Use a POST request instead.

It seems that you are doing something very strange with your database code. Can you show the actual code you use for storing the string to database?
# - character is a common way to create a comment. That is everything starting from # to end of line is discarded. However if your code to store to database is correct, that should not matter.
Javascript is not the correct place to handle quote character conversions. The right place for that is on server side.

As you have requested....
I try to replay you... I try to mention exact what I had done...
1) on the client side on the html form page I had written like this..
html = html.trim(); // in html, the data of the rich text editor will come.
document.RTEDemo.action = "submit.php?method='"+ html.replace("\"","'") + "'";
\\ i had done replace bcz i think that was some problem with double quotes.
now on submit.php , my browser url is like this...
http://localhost/nc/submit.php?method='This is very simple recipe.<br><strong style='background-color: #111111; color: #80ff00; font-size: 20px;">To make Bread Buttor you will need</strong><br><br><blockquote><ol><li>bread</li><li>buttor</li></ol></li></blockquote><span style="background-color: #00ff80;">GOOD.</span><br><br><br><blockquote><br></blockquote><br>'
2) on submit.php ........I just write simply this
echo "METHOD : ".$_GET['method'] . "<br><br>";
$method = $_GET['method'];
now my answer of upper part is like this...
METHOD : 'This is very simple recipe.
now i want to store the full detail of URL....but its only storing...
This is very simple recipe.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.