I have a text field in my Drupal form, which I need to sanitise before saving into the database. The field is for a custom name, and I expect some users may want to write for example "Andy's" or "John's home".
The problem is that when I run the field value through the check_plain() function, the apostrophe gets converted into &#039;, which means Andy's code becomes Andy&#039;s code.
Can I somehow exclude the apostrophe from the check_plain() function, or otherwise deal with this problem? I have tried wrapping in the format_string() function, but it's not working:
$nickname = format_string(check_plain($form_state['values']['custom_name'], array('&#039;' => "'")));
Thanks.
No, you can't exclude individual characters in check_plain(), because it simply passes your text to the PHP function htmlspecialchars() with the ENT_QUOTES flag:
function check_plain($text) {
return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
ENT_QUOTES means that htmlspecialchars() will convert both double and single quotes to HTML entities.
Instead of check_plain() you could use htmlspecialchars() with ENT_COMPAT (so it will leave single-quotes alone):
htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
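but that can cause security issues if the value is ever printed inside a single-quoted HTML attribute, because an unencoded apostrophe would end the attribute.
Another option is to write a custom regular expression to properly sanitize your input.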
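To make the difference between the two flags concrete, here is a minimal sketch in plain PHP, outside Drupal. The sample string is made up for illustration and is not from the question.

```php
<?php
// Hypothetical sample input; not taken from the original question.
$name = "Andy's <b>home</b>";

// ENT_QUOTES (what check_plain() uses): both quote types are encoded.
$withQuotes = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// ENT_COMPAT: double quotes are encoded, single quotes are left alone.
$compat = htmlspecialchars($name, ENT_COMPAT, 'UTF-8');

echo $withQuotes, "\n"; // Andy&#039;s &lt;b&gt;home&lt;/b&gt;
echo $compat, "\n";     // Andy's &lt;b&gt;home&lt;/b&gt;
```

Either way the tags are neutralised; only the apostrophe differs. The ENT_COMPAT result is only safe where it cannot end up inside a single-quoted attribute.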
I've been a bit worried about the security issue T-34 mentioned, so I've tried writing a work-around function which seems to be working OK. The function strips out the apostrophes, then runs check_plain() on each part, and pieces it back together again, re-inserting the apostrophes.
The function is:
function my_sanitize($text) {
  $clean = '';
  // Split on apostrophes so that check_plain() never sees them.
  $no_apostrophes = explode("'", $text);
  $length = count($no_apostrophes);
  if ($length > 1) {
    for ($i = 0; $i < $length; $i++) {
      $clean .= check_plain($no_apostrophes[$i]);
      // Re-insert the apostrophe between parts, but not after the last one.
      if ($i < ($length - 1)) {
        $clean .= "'";
      }
    }
  }
  else {
    $clean = check_plain($text);
  }
  return $clean;
}
And an example call is:
$nickname = my_sanitize($nickname);
Want to replace specific letters in a string to a full word.
I'm using:
function spec2hex($instr) {
for ($i=0; $i<strlen($instr); $i++) {
$char = substr($instr, $i,1);
if ($char == "a"){
$char = "hello";
}
$convString .= "&#".ord($char).";";
}
return $convString;
}
$myString = "adam";
$convertedString = spec2hex($myString);
echo $convertedString;
but that's returning:
hdhm
How do I do this? By the way, this is to replace punctuation with hex characters.
Thanks all.
Use http://php.net/substr_replace
substr_replace($instr, $word, $i,1);
ord() expects only a SINGLE character. You're passing in hello, so ord is doing its thing only on the h:
php > echo ord('hello');
104
php > echo ord('h');
104
So in effect your output is actually
&#104;&#100;&#104;&#109;
which the browser renders as hdhm.
If you want to keep your existing code, just change $convString .= "&#".ord($char).";";
to $convString .= $char;
If you just want to replace the occurrence of a with hello within the string you pass to the function, why not use PHP's str_replace()?
function spec2hex($instr) {
return str_replace("a","hello",$instr);
}
I must assume that you don't want hex characters instead of punctuation, but HTML entities. Be aware that str_replace(), when called with arrays, will run over the string multiple times, thus also replacing the ";" in "&#123;"!
Your posted code is not useful for replacing punctuation.
Use strtr() with arrays; it doesn't have the drawback of str_replace().
$aReplacements = array(',' => '&#44;', '.' => '&#46;'); //todo: complete the array
$sText = strtr($sText, $aReplacements);
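The difference between the two functions can be seen in a small sketch. The replacement map below is purely illustrative and would need to be extended with whatever punctuation matters in practice.

```php
<?php
// Illustrative replacement map; extend it with the punctuation you need.
$map = array('{' => '&#123;', ';' => '&#59;');
$input = '{;';

// str_replace() applies each pair in turn, so the ";" produced by the first
// replacement is itself replaced by the second one.
$viaStrReplace = str_replace(array_keys($map), array_values($map), $input);

// strtr() scans the input once and never re-examines text it has inserted.
$viaStrtr = strtr($input, $map);

echo $viaStrReplace, "\n"; // &#123&#59;&#59;
echo $viaStrtr, "\n";      // &#123;&#59;
```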
I have a unique problem with multibyte character strings and need to be able to shuffle, with some fair degree of randomness, a long UTF-8 encoded multibyte string in PHP without dropping or losing or repeating any of the characters.
In the PHP manual under str_shuffle there is a multi-byte function (the first user submitted one) that doesn't work: If I use a string with for example all the Japanese hiragana and katakana of string length (ex) 120 chars, I am returned a string that's 119 chars or 118 chars. Sometimes I've seen duplicate chars even though the original string doesn't have them. So that's not functional.
To make this more complex, I also need to include if possible Japanese UTF-8 newlines and line feeds and punctuation.
Can anyone with experience dealing in multiple languages with UTF-8 mb strings help? Does PHP have any built in functions to do this? str_shuffle is EXACTLY what I want. I just need it to also work on multibyte chars.
Thanks very much!
Try splitting the string into an array using mb_strlen and mb_substr, then using shuffle before joining it back together again. (Edit: As also demonstrated in @Frosty Z's answer.)
An example from the PHP interactive prompt:
php > $string = "Pretend I'm multibyte!";
php > $len = mb_strlen($string);
php > $sploded = array();
php > while($len-- > 0) { $sploded[] = mb_substr($string, $len, 1); }
php > shuffle($sploded);
php > echo join('', $sploded);
rmedt tmu nIb'lyi!eteP
You'll want to be sure to specify the encoding, where appropriate.
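As a sketch of what specifying the encoding might look like, the helper below passes 'UTF-8' to every mb_* call. The function name mb_shuffle_utf8 and the sample string are assumptions made for illustration.

```php
<?php
// Hypothetical helper: shuffles the characters of a UTF-8 string.
function mb_shuffle_utf8(string $text): string
{
    $chars = [];
    $length = mb_strlen($text, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        // One character (not one byte) per array element.
        $chars[] = mb_substr($text, $i, 1, 'UTF-8');
    }
    shuffle($chars);
    return implode('', $chars);
}

// Sample input with hiragana, katakana, Japanese punctuation and a newline.
$original = "ひらがなカタカナ、。\n";
$shuffled = mb_shuffle_utf8($original);

// The character count is unchanged by the shuffle.
var_dump(mb_strlen($original, 'UTF-8') === mb_strlen($shuffled, 'UTF-8'));
```

Because shuffle() only reorders array elements, no character can be dropped or duplicated as long as the split step respects character boundaries.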
This should do the trick, too. I hope.
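// Named MbString here because "String" is a reserved class name in PHP 7 and later.
class MbString
{
    public function mbStrShuffle($string)
    {
        $chars = $this->mbGetChars($string);
        shuffle($chars);
        return implode('', $chars);
    }

    public function mbGetChars($string)
    {
        $chars = [];
        // Pass the encoding to both mb_* calls so they agree on character boundaries.
        for ($i = 0, $length = mb_strlen($string, 'UTF-8'); $i < $length; ++$i) {
            $chars[] = mb_substr($string, $i, 1, 'UTF-8');
        }
        return $chars;
    }
}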
I like to use this function:
function mb_str_shuffle($multibyte_string = "abcčćdđefghijklmnopqrsštuvwxyzžß,.-+'*?=)(/&%$#!~ˇ^˘°˛`˙´˝") {
$characters_array = mb_str_split($multibyte_string);
shuffle($characters_array);
return implode('', $characters_array); // or join('', $characters_array); if you have a death wish (JK)
}
Split string into an array of multibyte characters
Shuffle the good guy array who doesn't care about his residents being multibyte
Join the shuffled array together into a string
Of course, I normally wouldn't give the function's parameter a default value.
Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?
$string = str_replace('"', '\\"', $string);
In particular, if the input is in a character set where 0xbf5c is a valid character, an attacker can inject 0xbf22 to get 0xbf5c22, leaving a valid character followed by an unescaped double quote (").
Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?
(In my case, the string is going into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)
EDIT: For that matter, what about a function like preg_quote()? There's no charset argument for it, so it seems totally useless in this scenario. When you DON'T have the option of limiting charset to UTF-8 (yes, that'd be nice), it seems like you are really handicapped. What replace and quoting functions are available in that case?
No, you’re right: Using a singlebyte string function on a multibyte string can cause an unexpected result. Use the multibyte string functions instead, for example mb_ereg_replace or mb_split:
$string = mb_ereg_replace('"', '\\"', $string);
$string = implode('\\"', mb_split('"', $string));
Edit Here’s a mb_replace implementation using the split-join variant:
function mb_replace($search, $replace, $subject, &$count=0) {
if (!is_array($search) && is_array($replace)) {
return false;
}
if (is_array($subject)) {
// call mb_replace for each single string in $subject
foreach ($subject as &$string) {
$string = mb_replace($search, $replace, $string, $c);
$count += $c;
}
} elseif (is_array($search)) {
if (!is_array($replace)) {
foreach ($search as &$string) {
$subject = mb_replace($string, $replace, $subject, $c);
$count += $c;
}
} else {
$n = max(count($search), count($replace));
while ($n--) {
$subject = mb_replace(current($search), current($replace), $subject, $c);
$count += $c;
next($search);
next($replace);
}
}
} else {
$parts = mb_split(preg_quote($search), $subject);
$count = count($parts)-1;
$subject = implode($replace, $parts);
}
return $subject;
}
As regards the combination of parameters, this function should behave like the singlebyte str_replace.
The code is perfectly safe with sane multibyte-encodings like UTF-8 and EUC-TW, but dangerous with broken ones like Shift_JIS, GB*, etc. Rather than going through all the headache and overhead to be safe with these legacy encodings, I would recommend just supporting only UTF-8.
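Supporting only UTF-8 could look roughly like the sketch below: reject anything that is not valid UTF-8 first, then use the ordinary byte-oriented function. The helper name escape_double_quotes_utf8 is made up for this example.

```php
<?php
// Hypothetical helper: refuses input that is not valid UTF-8, then escapes.
function escape_double_quotes_utf8(string $input): ?string
{
    if (!mb_check_encoding($input, 'UTF-8')) {
        return null; // the caller decides how to report the bad input
    }
    // In valid UTF-8 the byte 0x22 can only ever be a real double quote,
    // so the byte-oriented str_replace() is safe here.
    return str_replace('"', '\\"', $input);
}

var_dump(escape_double_quotes_utf8('say "hi"')); // the string: say \"hi\"
var_dump(escape_double_quotes_utf8("\xbf\x22")); // NULL, not valid UTF-8
```

The design choice is to fail early on malformed input rather than to try to escape it.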
You could either use mb_ereg_replace, after first specifying the charset with mb_regex_encoding(), or, if you use UTF-8, use preg_replace with the u modifier.
I'm playing around with encrypt/decrypt coding in php. Interesting stuff!
However, I'm coming across some issues involving what text gets encrypted into.
Here are two functions that encrypt and decrypt a string. They use an encryption key, which I set to something obscure.
I actually got this from a PHP book. I modified it slightly, but not in a way that changes its main goal.
I created a small example below that anyone can test.
But I notice that some characters, like "=" and "+", show up in the "encrypted" string.
Sometimes I pass this encrypted string via the URL, and it may not quite make it to my receiving scripts. I'm guessing the browser does something to the string if certain characters are seen, but I'm really only guessing.
Is there another function I can use to ensure the browser doesn't touch the string? Or does anyone know enough about PHP's base64_encode() to disallow certain characters from being used? I don't really expect the latter to be possible, but I'm sure there's a work-around.
Enjoy the code, whoever needs it!
define('ENCRYPTION_KEY', "sjjx6a");
function encrypt($string) {
$result = '';
for($i=0; $i<strlen($string); $i++) {
$char = substr($string, $i, 1);
$keychar = substr(ENCRYPTION_KEY, ($i % strlen(ENCRYPTION_KEY))-1, 1);
$char = chr(ord($char)+ord($keychar));
$result.=$char;
}
return base64_encode($result)."/".rand();
}
function decrypt($string){
$exploded = explode("/",$string);
$string = $exploded[0];
$result = '';
$string = base64_decode($string);
for($i=0; $i<strlen($string); $i++) {
$char = substr($string, $i, 1);
$keychar = substr(ENCRYPTION_KEY, ($i % strlen(ENCRYPTION_KEY))-1, 1);
$char = chr(ord($char)-ord($keychar));
$result.=$char;
}
return $result;
}
echo $encrypted = encrypt("reaplussign.jpg");
echo "<br>";
echo decrypt($encrypted);
You could use PHP's urlencode and urldecode functions to make your encryption results safe for use in URLs, e.g
echo $encrypted = urlencode(encrypt("reaplussign.jpg"));
echo "<br>";
echo decrypt(urldecode($encrypted));
You should look at urlencode() to escape the string correctly for use in the query.
If you are worried about +, = and similar characters, have a look at http://php.net/manual/en/function.urlencode.php and its friends from the "See also" section. Encode in encrypt() and decode at the beginning of decrypt().
If this doesn't work for you, maybe some simple substitution?
$text = str_replace('+','%2B',$text);
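Another substitution-based approach, sketched here as an assumption rather than something from the answers above, is the common "URL-safe Base64" variant: swap "+" and "/" for "-" and "_" and drop the "=" padding. The function names base64url_encode and base64url_decode are not PHP built-ins; they are defined in the sketch.

```php
<?php
// Hypothetical helpers; PHP has no built-in functions with these names.
function base64url_encode(string $data): string
{
    // Replace the two URL-hostile characters and strip the "=" padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode(string $text): string
{
    $base64 = strtr($text, '-_', '+/');
    // Restore the padding that base64url_encode() removed.
    $padding = (4 - strlen($base64) % 4) % 4;
    return base64_decode($base64 . str_repeat('=', $padding));
}

// Bytes chosen so that plain Base64 would contain "+", "/" and "=".
$raw = "\xfb\xff\xfe\xff";
$encoded = base64url_encode($raw);

var_dump($encoded);                            // no "+", "/" or "=" in it
var_dump(base64url_decode($encoded) === $raw); // bool(true)
```

A side effect worth noting: standard Base64 output can itself contain "/", which the posted encrypt() also uses as its separator before the random number, so avoiding "/" in the encoded part removes that ambiguity as well.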
I am building a XML RSS for my page. And running into this error:
error on line 39 at column 46: xmlParseEntityRef: no name
Apparently this is because I can't have a bare & in XML, which I do in my last field row.
What is the best way to clean all my $row['field'] values in PHP so that each & turns into &amp;?
Use htmlspecialchars to encode just the HTML special characters &, <, >, " and optionally ' (see second parameter $quote_style).
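A minimal sketch of that suggestion follows. The $row array and its 'title' key are hypothetical stand-ins for a database result, and the ENT_XML1 flag is an addition here that asks for XML-style entities.

```php
<?php
// Hypothetical row, standing in for a database result.
$row = array('title' => 'Tom & Jerry <live>');

// ENT_XML1 selects XML entity names; ENT_QUOTES also encodes both quote types.
$safe = htmlspecialchars($row['title'], ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo '<title>' . $safe . '</title>', "\n";
// <title>Tom &amp; Jerry &lt;live&gt;</title>
```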
It's called htmlentities() and html_entity_decode()
You really should look into the DOM XML functions in PHP. It's a bit of work to figure out, but you avoid problems like this.
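As a rough illustration of the DOM approach, the sketch below builds one element with DOMDocument and lets the library do the escaping. The element names and the sample text are assumptions, and the dom extension must be enabled.

```php
<?php
// Build a tiny fragment; element names here are only examples.
$doc = new DOMDocument('1.0', 'UTF-8');

$item = $doc->createElement('item');
$title = $doc->createElement('title');

// createTextNode() takes raw text; escaping happens when the XML is serialised.
$title->appendChild($doc->createTextNode('Tom & Jerry'));

$item->appendChild($title);
$doc->appendChild($item);

$xml = $doc->saveXML($item);
echo $xml, "\n"; // <item><title>Tom &amp; Jerry</title></item>
```

With this approach the field values never need to be cleaned by hand before they go into the feed.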
Convert Reserved XML characters to Entities
function xml_convert($str, $protect_all = FALSE)
{
$temp = '__TEMP_AMPERSANDS__';
// Replace entities to temporary markers so that
// ampersands won't get messed up
$str = preg_replace("/&#(\d+);/", "$temp\\1;", $str);
if ($protect_all === TRUE)
{
$str = preg_replace("/&(\w+);/", "$temp\\1;", $str);
}
$str = str_replace(array("&","<",">","\"", "'", "-"),
array("&amp;", "&lt;", "&gt;", "&quot;", "&apos;", "&#45;"),
$str);
// Decode the temp markers back to entities
$str = preg_replace("/$temp(\d+);/","&#\\1;",$str);
if ($protect_all === TRUE)
{
$str = preg_replace("/$temp(\w+);/","&\\1;", $str);
}
return $str;
}
Use
html_entity_decode($row['field']);
This will convert &amp; back to &. Also, if you have &nbsp; it will change that to a non-breaking space.
http://us.php.net/html_entity_decode
Cheers