PHP function question - php

I don't know if this is the right place to ask this kind of question, but I will give it a try. What does the user-defined PHP function in the code example below do? I would appreciate it if someone could explain it to me in detail.
function decode_characters($info)
{
    $info = mb_convert_encoding($info, "HTML-ENTITIES", "UTF-8");
    $info = preg_replace('~^(&([a-zA-Z0-9]);)~',htmlentities('${1}'),$info);
    return($info);
}

The function is a little odd. The first function call transforms a string encoded in UTF-8 to an ASCII encoded string where the non-mapped characters are converted to HTML entities (named entities if they exist in HTML 4, otherwise numeric entities). For instance:
echo mb_convert_encoding("foo\"é⌑'&", "HTML-ENTITIES", "UTF-8");
yields
foo"&eacute;&#8977;'&
So this differs from htmlentities in that 1) numerical entities are used in the circumstances given and 2) special characters such as &, " or < are not touched.
The second function call is stranger. It checks whether the input starts with a named entity whose name is a single ASCII alphanumeric character and, if so, appears to call htmlentities on it. In fact it does not: the e modifier is not used and the function call is not inside a string, so htmlentities('${1}') is executed once, when the arguments are evaluated, before preg_replace ever runs. The call has no effect, because htmlentities('${1}') is simply '${1}' and backreference 1 encompasses the whole match, so even if the expression matches, the match is replaced by itself.
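To check this reasoning, here is a minimal sketch. The pattern is the one from the question; the sample string in $input is made up for illustration.

```php
<?php
// htmlentities('${1}') runs before preg_replace() is called. The literal
// text ${1} contains no characters that need escaping, so it comes back as is.
$replacement = htmlentities('${1}');
var_dump($replacement === '${1}'); // bool(true)

// Hypothetical input that starts with a one-letter "entity".
$input = '&a; and the rest';

// ${1} refers to the outer group, which covers the whole match,
// so the match is replaced with itself.
$output = preg_replace('~^(&([a-zA-Z0-9]);)~', $replacement, $input);
var_dump($output === $input); // bool(true)
```

So the only line of decode_characters that changes anything is the mb_convert_encoding call.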

Related

Codeigniter XSS Filtering ignores % sign

I have a search page on my website (built with CodeIgniter) that accepts a URL-encoded query string, like http://somesite.com/search?q=123. I want the contents of the parameter q to be displayed on the search page. The problem is that when I call xss_clean() on the value of the parameter (either while reading the query string or when outputting it), it converts any percent sign followed by something that looks like a hex code into the character that code represents. For example, ?q=100%25%20500, which stands for the search string 100% 500, gets printed as 100P0, because the XSS filtering logic of CodeIgniter appears to interpret % 50 as %50 (ignoring the space), which decodes to the character P under URL-encoding rules. So my question is: what can I do to both allow the character % in my searches and have the input XSS-filtered?
Here's my code fragment, if it helps to understand the problem better:
<?php
$s = $this->input->get('q', FALSE);
echo "<p>You searched for $s</p>"; //this will show '100% 500'
$this->load->helper('security');
$t = xss_clean($s);
echo "<p>You searched for $t</p>"; //this will show '100P0'

Does not display ampersand using $_GET in php [duplicate]

I am trying to send a GET message that contains strings with ampersands and can't figure how to escape the ampersand in the URL.
Example:
http://www.example.com?candy_name=M&M
result => candy_name = M
I also tried:
http://www.example.com?candy_name=M\&M
result => candy_name = M\\
I am using URLs manually, so I just need the correct characters.
I can't use any libraries. How can it be done?
They need to be percent-encoded:
> encodeURIComponent('&')
"%26"
So in your case, the URL would look like:
http://www.mysite.com?candy_name=M%26M
This applies not only to the ampersand in URLs but to all reserved characters, some of which are:
# $ & + , / : ; = ? @ [ ]
The idea is the same as encoding an & as &amp; in an HTML document, but the context has changed to be within the URI, in addition to being within the HTML document. So the percent-encoding prevents parsing issues in both contexts.
The place where this comes in handy a lot is when you need to put a URL inside of another URL. For example, if you want to post a status on Twitter:
http://www.twitter.com/intent/tweet?status=What%27s%20up%2C%20StackOverflow%3F(http%3A%2F%2Fwww.stackoverflow.com)
There are lots of reserved characters in my Tweet, namely ?'():/, so I encoded the whole value of the status URL parameter. This is also helpful when using mailto: links that have a message body or subject, because you need to encode the body and subject parameters to keep line breaks, ampersands, etc. intact.
When a character from the reserved set (a "reserved character") has
special meaning (a "reserved purpose") in a certain context, and a URI
scheme says that it is necessary to use that character for some other
purpose, then the character must be percent-encoded. Percent-encoding
a reserved character involves converting the character to its
corresponding byte value in ASCII and then representing that value as
a pair of hexadecimal digits. The digits, preceded by a percent sign
("%") which is used as an escape character, are then used in the URI
in place of the reserved character. (For a non-ASCII character, it is
typically converted to its byte sequence in UTF-8, and then each byte
value is represented as above.) The reserved character "/", for
example, if used in the "path" component of a URI, has the special
meaning of being a delimiter between path segments. If, according to a
given URI scheme, "/" needs to be in a path segment, then the three
characters "%2F" or "%2f" must be used in the segment instead of a raw
"/".
http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters
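The mechanics described in the quoted passage can be sketched in a few lines of PHP. The helper name percent_encode_all is hypothetical, and it deliberately encodes every byte (not just reserved ones) to keep the illustration short; in practice the built-in rawurlencode() is the function to reach for.

```php
<?php
// Hypothetical helper: percent-encodes every byte of $s, whether or not
// the byte strictly needs it, to show how the encoding is formed.
function percent_encode_all(string $s): string
{
    $out = '';
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        // ord() gives the byte value; %02X prints it as two uppercase hex digits.
        $out .= sprintf('%%%02X', ord($s[$i]));
    }
    return $out;
}

$slash  = percent_encode_all('/');         // %2F
$amp    = percent_encode_all('&');         // %26
$eAcute = percent_encode_all("\u{00E9}");  // %C3%A9 (é is two bytes in UTF-8)

echo $slash, "\n", $amp, "\n", $eAcute, "\n";

// The built-in rawurlencode() gives the same result for these inputs.
var_dump(rawurlencode("/&\u{00E9}") === $slash . $amp . $eAcute); // bool(true)
```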
Try using http://www.example.org?candy_name=M%26M.
See also this reference and some more information on Wikipedia.
I would like to add a minor comment to Blender's solution.
You can do the following:
var link = 'http://example.com?candy_name=' + encodeURIComponent('M&M');
That outputs:
http://example.com?candy_name=M%26M
The great thing about this is that it works not only for & but for any special character.
For instance:
var link = 'http://example.com?candy_name=' + encodeURIComponent('M&M?><')
Outputs:
"http://example.com?candy_name=M%26M%3F%3E%3C"
You can use the % character to 'escape' characters that aren't allowed in URLs. See RFC 1738.
A table of ASCII values is given on the Wikipedia page.
You can see & is 26 in hexadecimal - so you need M%26M.
This may help if someone wants it in PHP:
$variable ="candy_name=M&M";
$variable = str_replace("&", "%26", $variable);
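Replacing only & leaves every other reserved character unencoded. A more general sketch uses PHP's built-in rawurlencode() and http_build_query(); the URL and parameter name are based on the question, and the variable names are illustrative.

```php
<?php
$value = 'M&M';

// Encode just the value, then append it to the query string.
$url = 'http://www.example.com/?candy_name=' . rawurlencode($value);
echo $url, "\n"; // http://www.example.com/?candy_name=M%26M

// Or let PHP build the whole query string from an array.
$query = http_build_query(['candy_name' => $value], '', '&', PHP_QUERY_RFC3986);
echo $query, "\n"; // candy_name=M%26M

// Round trip: parsing the query string gives the original value back.
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo $params['candy_name'], "\n"; // M&M
```

On the receiving side, $_GET['candy_name'] is already decoded by PHP, so no extra urldecode() call should be needed there.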
If you can't use any libraries to encode the value, you can use an online encoder such as
http://www.urlencoder.org/ or http://www.urlencode-urldecode.com/ or ...
Just enter your value "M&M", not the full URL ;-)
Alternatively, pass your arguments through the encodeURIComponent function so you don't have to worry about special characters:
data: "param1=getAccNos&param2=" + encodeURIComponent('Dolce & Gabbana')
or
var someValue = 'Dolce & Gabbana';
data: "param1=getAccNos&param2=" + encodeURIComponent(someValue)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent

How to handle decimal ncr characters in php

I have a document title that contains decimal NCRs (numeric character references), and it needs to be converted to HTML. I tried mb_decode_numericentity, but it's not working. Is there another function I should use?
Zasíláme Vám Set Edukačních Materiálů, Kterými Chceme Přispět k Minimalizaci Rizik Podávání Biologického Léku Remsima (infliximab)
mb_decode_numericentity is a weird function. In an attempt to make it match the interface of mb_encode_numericentity, there is a $convmap parameter that specifies which code points you want converted; nothing is decoded unless the map you supply covers the code points in question. Also, the default charset may not be anything sensible, so pass it explicitly.
To make it do something:
$convmap = array(0x0, 0x1FFFFF, 0, 0x1FFFFF);
mb_decode_numericentity($s, $convmap, 'utf-8');
However, note that it doesn't decode HTML built-in entity references like &amp;, so as a means of decoding HTML content it's pretty much useless. Closer is:
html_entity_decode($s, ENT_QUOTES, 'utf-8');
or easiest, use an HTML parser to load the page and extract the already-decoded data from the DOM.
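A minimal sketch comparing the two calls. The sample string is assumed for illustration (the NCRs in the original title were not preserved), and the first call requires the mbstring extension.

```php
<?php
// Sample input with decimal NCRs, assumed for illustration:
// &#237; is í (U+00ED) and &#225; is á (U+00E1).
$s = 'Zas&#237;l&#225;me V&#225;m Set &amp; more';

// mb_decode_numericentity() with a map covering every code point.
// It decodes the numeric references but leaves &amp; untouched.
$convmap = [0x0, 0x1FFFFF, 0, 0x1FFFFF];
$numericOnly = mb_decode_numericentity($s, $convmap, 'UTF-8');
echo $numericOnly, "\n"; // Zasíláme Vám Set &amp; more

// html_entity_decode() handles numeric and named references alike.
$decoded = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
echo $decoded, "\n"; // Zasíláme Vám Set & more
```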

htmlentities content that has HTML

I'm trying to support multiple languages on my site. Some of the content that needs translating will have characters like Ç. I could use htmlentities to convert that into &Ccedil;. However, what if I need to translate a string that has markup:
"<p>Hello, world with Ç</p>"
If I use htmlentities, the < and > would be converted, too. I don't want to break down the string into tags and non-tag parts, then apply htmlentities only to the non-tag parts. That'll be too messy and tedious.
A work around posted here
Pass your string to the following function and work with the returned string.
function unicode_escape_sequences($str){
    $working = json_encode($str);
    $working = preg_replace('/\\\u([0-9a-z]{4})/', '&#x$1;', $working);
    return json_decode($working);
}
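Another possible approach, not from the original post, is to encode everything with htmlentities() and then undo only the markup characters with htmlspecialchars_decode(). A minimal sketch, assuming the input is UTF-8 and the markup in it is trusted; the function name entities_keep_markup is hypothetical.

```php
<?php
function entities_keep_markup(string $html): string
{
    // Step 1: turn every character that has a named entity into it,
    // including <, > and &. Quotes are left alone (ENT_NOQUOTES).
    $encoded = htmlentities($html, ENT_NOQUOTES, 'UTF-8');

    // Step 2: turn &lt;, &gt; and &amp; back into <, > and &,
    // which restores the tags but keeps entities such as &Ccedil;.
    return htmlspecialchars_decode($encoded, ENT_NOQUOTES);
}

$result = entities_keep_markup("<p>Hello, world with \u{00C7}</p>");
echo $result, "\n"; // <p>Hello, world with &Ccedil;</p>
```

Unlike the JSON-based workaround, this produces named entities where HTML 4 defines them, and characters without a named entity are left as they are.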

Replace characters in a string with their HTML coding

I need to replace characters in a string with their HTML coding.
Ex. The "quick" brown fox, jumps over the lazy (dog).
I need to replace the quotation marks with &quot; and the brackets with &#40; and &#41;.
I have tried str_replace, but I can only get 1 character to be replaced. Is there a way to replace multiple characters using str_replace? Or is there a better way to do this?
Thanks!
I suggest using the function htmlentities().
Have a look at the Manual.
PHP has a number of functions to deal with this sort of thing:
Firstly, htmlentities() and htmlspecialchars().
But as you already found out, they won't deal with ( and ) characters, because these are not characters that ever need to be rendered as entities in HTML. I guess the question is why you want to convert these specific characters to entities? I can't really see a good reason for doing it.
If you really do need to do it, str_replace() will do multiple string replacements, using arrays in both the search and replace parameters:
$output = str_replace(array('(',')'), array('&#40;','&#41;'), $input);
You can also use the strtr() function in a similar way:
$conversions = array('('=>'&#40;', ')'=>'&#41;');
$output = strtr($input, $conversions);
Either of these would do the trick for you. Again, I don't know why you'd want to though, because there's nothing special about ( and ) brackets in this context.
While you're looking into the above, you might also want to look up get_html_translation_table(), which returns an array of entity conversions as used in htmlentities() or htmlspecialchars(), in a format suitable for use with strtr(). You could load that array and add the extra characters to it before running the conversion; this would allow you to convert all the normal entity characters at the same time.
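A minimal sketch of that idea, using the sentence from the question as input. The choice of HTML_SPECIALCHARS with ENT_QUOTES is an assumption; pass HTML_ENTITIES instead to get the larger htmlentities() table.

```php
<?php
// Start from the table htmlspecialchars() uses (", &, ', < and >) ...
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ... and add the extra characters from the question.
$table['('] = '&#40;';
$table[')'] = '&#41;';

$input = 'The "quick" brown fox, jumps over the lazy (dog).';
$output = strtr($input, $table);
echo $output, "\n";
// The &quot;quick&quot; brown fox, jumps over the lazy &#40;dog&#41;.
```

Because strtr() with an array scans the input once and never re-examines replaced text, the & inside &#40; is not encoded a second time.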
I would point out that if you serve your page with the UTF-8 character set, you won't need to convert any characters to entities (except for the HTML reserved characters <, > and &). This may be an alternative solution for you.
You also asked in a separate comment about converting line feeds. These can be converted with PHP's nl2br() function, but could also be done using str_replace() or strtr(), so could be added to a conversion array with everything else.
