PHP: Detect invalid characters in a text - php

I would like to parse user input with PHP. I need a function that tells me whether the text contains invalid characters. My draft looks as follows:
<?php
function contains_invalid_characters($text) {
    for ($i = 0; $i < 3; $i++) {
        $text = html_entity_decode($text); // decode html entities
    } // loop is used for repeatedly html encoded entities
    $found = preg_match(...);
    return $found;
}
?>
The function should return TRUE if the input text contains invalid characters and FALSE if not. Valid characters should be:
a-z, A-Z, 0-9, äöüß, blank space, "!§$%&/()=[]\?.:,;-_
Can you tell me how to code this? Is preg_match() suitable for this purpose? It's also important that I can easily expand the function later so that it includes other characters.
I hope you can help me. Thanks in advance!

You could use a regular expression to do that:
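function contains_invalid_characters($text) {
    // [^...] matches any character NOT in the allowed set; a preg_match()
    // error (false, e.g. malformed UTF-8) is also treated as invalid.
    return preg_match('/[^a-zA-Z0-9äöüß "!§$%&\/()=[\]\\\\?.:,;\-_]/u', $text) !== 0;
}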
Note that the PHP file containing this code must be saved in the same encoding as the text you want to test. I recommend using UTF-8 for both; the u modifier expects UTF-8 and makes preg_match() fail on malformed input.
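To keep the allowed set easy to extend, as the question asks, one option is to keep the extra characters in a plain string and let preg_quote() escape them when the pattern is built. This is a sketch assuming PHP 7+ and a UTF-8 source file; the constant name ALLOWED_EXTRA is hypothetical. Errors from preg_match() (for example, malformed UTF-8 input) are treated as invalid input.

```php
<?php
// Hypothetical constant: extra allowed characters besides a-z, A-Z, 0-9.
// Append new characters here; preg_quote() takes care of escaping them.
const ALLOWED_EXTRA = 'äöüß "!§$%&/()=[]\\?.:,;-_';

function contains_invalid_characters(string $text, string $extra = ALLOWED_EXTRA): bool
{
    // preg_quote() escapes regex metacharacters and the '/' delimiter
    $pattern = '/[^a-zA-Z0-9' . preg_quote($extra, '/') . ']/u';
    // 1 = found an invalid char, 0 = none, false = error (counted as invalid)
    return preg_match($pattern, $text) !== 0;
}

var_dump(contains_invalid_characters('Grüße, Welt!')); // bool(false)
var_dump(contains_invalid_characters('a < b'));        // bool(true)
```

Extending the set later is then a matter of editing the string, or passing a longer one, e.g. contains_invalid_characters($text, ALLOWED_EXTRA . '#').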

Related

Exclude characters from check_plain() in Drupal form

I have a text field in my Drupal form, which I need to sanitise before saving into the database. The field is for a custom name, and I expect some users may want to write for example "Andy's" or "John's home".
The problem is that when I run the field value through the check_plain() function, the apostrophe gets converted into &#039;, which means Andy's code becomes Andy&#039;s code.
Can I somehow exclude the apostrophe from the check_plain() function, or otherwise deal with this problem? I have tried wrapping in the format_string() function, but it's not working:
$nickname = format_string(check_plain($form_state['values']['custom_name'], array('&#039;' => "'")));
Thanks.
No, you can't exclude individual characters from check_plain(), because it simply passes your text to PHP's htmlspecialchars() function with the ENT_QUOTES flag:
function check_plain($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
ENT_QUOTES means that htmlspecialchars() will convert both double and single quotes to HTML entities.
Instead of check_plain() you could use htmlspecialchars() with ENT_COMPAT (so it will leave single quotes alone):
htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
but that can cause security issues: if the output ever ends up inside a single-quoted HTML attribute, an unescaped ' lets user input close the attribute and inject markup (XSS).
Another option is to write a custom regular expression to properly sanitize your input.
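The difference between the two flags, and why ENT_COMPAT can be risky, shows up in a few lines. This is an illustrative sketch; the single-quoted attribute is just one output context where a raw apostrophe matters.

```php
<?php
$text = "Andy's";
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), PHP_EOL; // Andy&#039;s
echo htmlspecialchars($text, ENT_COMPAT, 'UTF-8'), PHP_EOL; // Andy's

// With ENT_COMPAT a raw ' survives and can close a single-quoted attribute.
$input = "x' onmouseover='alert(1)";
echo "<input value='" . htmlspecialchars($input, ENT_COMPAT, 'UTF-8') . "'>", PHP_EOL;
// <input value='x' onmouseover='alert(1)'>

// With ENT_QUOTES the same input stays inside the attribute value.
echo "<input value='" . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . "'>", PHP_EOL;
// <input value='x&#039; onmouseover=&#039;alert(1)'>
```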
I've been a bit worried about the security issue T-34 mentioned, so I've tried writing a workaround function which seems to be working OK. The function splits the text on apostrophes, runs check_plain() on each part, and pieces it back together again, re-inserting the apostrophes.
The function is:
function my_sanitize($text) {
    $clean = '';
    $no_apostrophes = explode("'", $text);
    $length = count($no_apostrophes);
    if ($length > 1) {
        for ($i = 0; $i < $length; $i++) {
            $clean .= check_plain($no_apostrophes[$i]);
            if ($i < ($length - 1)) {
                $clean .= "'";
            }
        }
    } else {
        $clean = check_plain($text);
    }
    return $clean;
}
And an example call is:
$nickname = my_sanitize($nickname);
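The split, escape and rejoin idea can be written more compactly with explode(), array_map() and implode(). To run outside Drupal, the sketch needs a stand-in for check_plain() (copied from the body quoted above and guarded so it won't clash with Drupal's own); my_sanitize_compact() is a hypothetical name. The final comparison suggests that, for ordinary UTF-8 input, the result is the same as htmlspecialchars($text, ENT_COMPAT, 'UTF-8'), so this workaround carries the same caveat as the ENT_COMPAT option.

```php
<?php
// Stand-in for Drupal 7's check_plain() so the sketch runs outside Drupal;
// the guard avoids redeclaring it when Drupal is loaded.
if (!function_exists('check_plain')) {
    function check_plain($text) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
}

// Hypothetical compact version of my_sanitize(): escape each
// apostrophe-free piece, then re-join the pieces with raw apostrophes.
function my_sanitize_compact($text) {
    return implode("'", array_map('check_plain', explode("'", $text)));
}

$input = "John's <b>home</b> & \"garden\"";
echo my_sanitize_compact($input), PHP_EOL;
// John's &lt;b&gt;home&lt;/b&gt; &amp; &quot;garden&quot;
var_dump(my_sanitize_compact($input) === htmlspecialchars($input, ENT_COMPAT, 'UTF-8'));
// bool(true)
```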

Redirect loop due to "Header may not contain more than a single header, new line detected in"

I'm trying to redirect to a URL, and since it can be provided by the user, it may be invalid, which produces the warning message
Header may not contain more than a single header, new line detected in
and, oddly enough, PHP then issues a redirect to the same page, creating a redirect loop.
How can I properly check the string to ensure there are no invalid characters in the URL? I tried
if (false === filter_var($url, FILTER_VALIDATE_URL)) die('Sorry, but no');
but it also failed on valid URLs that have non-English characters encoded in them.
I also tried strpos($url, "\n") and the same for "\r", but apparently some "newlines" are different and weren't detected.
Beyond detecting it, isn't creating a redirect loop in this case faulty behavior by PHP that should be reported?
Here's what I found in the php.net comments; I made a function out of it:
function isValidURI($uri) {
    if (filter_var($uri, FILTER_VALIDATE_URL) !== false) {
        return true;
    }
    // Check if it has Unicode (multi-byte) chars; pure ASCII already failed.
    $l = mb_strlen($uri, 'UTF-8');
    if ($l === strlen($uri)) {
        return false;
    }
    // Replace wide chars by "X" and re-check.
    $s = '';
    for ($i = 0; $i < $l; ++$i) {
        $ch = mb_substr($uri, $i, 1, 'UTF-8');
        $s .= strlen($ch) > 1 ? 'X' : $ch;
    }
    return filter_var($s, FILTER_VALIDATE_URL) !== false;
}
FILTER_VALIDATE_URL does not support internationalized domain names (IDN). Valid or not, no domain name with Unicode characters in it will pass validation.
The logic is simple: in UTF-8, a non-ASCII character is more than one byte long. We replace every such character with "X" and check again.
Source: http://php.net/manual/en/function.filter-var.php#104160
I hope this is helpful to someone else as well.
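Separate strpos() checks are easy to get wrong (strpos() returns 0, which is falsy, for a match at the very start of the string). A single regex that rejects every ASCII control character covers CR and LF, which are what the header() warning is about, along with NUL, TAB and DEL. This is a sketch; is_safe_redirect_target() is a hypothetical helper, and it only screens for control characters, so it complements rather than replaces URL validation such as isValidURI().

```php
<?php
// Hypothetical helper: true if $url contains no ASCII control characters.
// \x00-\x1F covers NUL, TAB, LF and CR; \x7F is DEL.
function is_safe_redirect_target(string $url): bool
{
    return preg_match('/[\x00-\x1F\x7F]/', $url) === 0;
}

var_dump(is_safe_redirect_target('http://example.com/straße'));      // bool(true)
var_dump(is_safe_redirect_target("http://example.com/\r\nX-Foo: 1")); // bool(false)
```

Without the u modifier the pattern works on bytes, and UTF-8 bytes of non-ASCII characters are all 0x80 or above, so URLs with non-English characters pass this particular check.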
You could use PHP's parse_url() function: http://php.net/manual/en/function.parse-url.php
"On seriously malformed URLs, parse_url() may return FALSE."

how to use php preg_match() to detect illegal characters when given only legal characters?

I have a set of characters that are allowed in a string of text. Is it possible to use preg_match to detect the existence of characters outside of the range of provided characters?
for example:
$str1 = "abcdf9";
$str2 = "abcdf#";
$str3 = "abcdfg";
legal chars = "a-z"
if (preg_match() ... ) needs to return false for $str1 and $str2, but true for $str3.
Will this be possible?
if (!preg_match('/[^a-z]/', $string)) {
    // only a-z found
}
// or
if (preg_match('/[^a-z]/', $string)) {
    return false; // other stuff found
} else {
    return true; // only a-z found
}
This site is very useful for building and testing your regex:
http://regexr.com/
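Applied to the question's three example strings, the negated class [^a-z] gives exactly the requested results. A minimal sketch, assuming PHP 7+; has_only_legal_chars() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true when $str contains nothing outside a-z.
function has_only_legal_chars(string $str): bool
{
    // [^a-z] matches any single character that is NOT a lowercase letter
    return preg_match('/[^a-z]/', $str) === 0;
}

foreach (['abcdf9', 'abcdf#', 'abcdfg'] as $str) {
    echo $str, ' => ', has_only_legal_chars($str) ? 'true' : 'false', PHP_EOL;
}
// abcdf9 => false
// abcdf# => false
// abcdfg => true
```

An empty string also counts as legal here. To require at least one character, test preg_match('/^[a-z]+\z/', $str) === 1 instead; \z is used because $ would also match just before a trailing newline.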
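Is what you need /^[a-z]+$/? The ^ and $ anchors make the whole string consist of a-z characters; a plain /[a-z]/ matches as soon as a single letter appears anywhere.
You can require an exact number of characters with /^[a-z]{5}$/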

Last name validation in php

My aim is to validate a last name by allowing it to only contain letters or a single quote.
I don't know what the fastest way is; maybe a regex, I suppose.
Anyway, so far I have this:
function check_surname($surname)
{
    $c = str_split($surname, 1);
    $i = 0;
    $test = 1; // Wrong surname
    while ($i < strlen($surname))
    {
        if (ctype_alpha($c[$i]) or $c[$i] == '\'')
        {
            $test = 0;
            $i++;
        }
        else
        {
            return false;
        }
    }
}
I can feel that something is wrong here but I can't see where it is.
Could anyone help me out?
There are some good suggestions in the comments, and I definitely agree with @Cyclone that you should take diacritics (accented letters) into account.
Fortunately, PHP regexes support Unicode classes, so this is easy to do. Unicode includes a class L for any letter (uppercase, lowercase, modified, and title case). This will allow accented letters in the name.
I would also recommend that you allow for dashes (Katherine Zeta-Jones) and spaces (Guido van Rossum). Given all that, I would use the following regex:
preg_match("/^[\p{L} '-]+$/u", $lname);
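Wrapped into a function that returns a proper boolean, the same character class might look like the sketch below (reusing the question's function name, assuming PHP 7+ and UTF-8 input). The u modifier is what makes \p{L} work on UTF-8 text, and \z is used instead of $ so that a trailing newline is rejected. Names typed with combining accents (decomposed form) would additionally need \p{M} in the class.

```php
<?php
// Letters from any script, spaces, apostrophes and hyphens; nothing else.
function check_surname(string $surname): bool
{
    return preg_match("/^[\p{L} '-]+\z/u", $surname) === 1;
}

var_dump(check_surname("O'Brien"));    // bool(true)
var_dump(check_surname('Zeta-Jones')); // bool(true)
var_dump(check_surname('van Rossum')); // bool(true)
var_dump(check_surname('Smith3'));     // bool(false)
```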

Random String Generator (PHP)

I am trying to write a PHP function that returns a random string of a given length. I wrote this:
<?php
function generate_string($length) {
    $ret = "";
    for ($i = 0; $i < $length; $i++) {
        $ret .= chr(mt_rand(32, 126));
    }
    return $ret;
}
echo generate_string(150);
?>
The above function generates a random string, but the length of the string is not constant, i.e. one time it is 30 characters, another time 60 (obviously, I call it with the same length as input every time). I've searched other examples of random string generators, but they all use a base string to pick characters from. I am wondering why this method is not working properly.
Thanks!
Educated guess: you are displaying your plain-text string as HTML. The browser, having been told it's HTML, handles it as such. As soon as a < character is generated, the following characters are parsed as an (unknown) HTML tag and, as the HTML standards mandate, are not displayed.
Fix:
echo htmlspecialchars(generate_string(150));
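To check this diagnosis, one can print strlen() of the result, which is always the requested length, and escape the string before output. A minimal sketch, assuming PHP 7+ for the type declarations:

```php
<?php
// Same loop as the question's generate_string(), with type declarations added.
function generate_string(int $length): string
{
    $ret = '';
    for ($i = 0; $i < $length; $i++) {
        // printable ASCII 32 (space) .. 126 (~), which includes < and &
        $ret .= chr(mt_rand(32, 126));
    }
    return $ret;
}

$s = generate_string(150);
echo strlen($s), PHP_EOL;           // 150, every time
echo htmlspecialchars($s), PHP_EOL; // < and & now show up as text in a browser
```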
This is the conclusion I reached after testing it for a while: your function works correctly. It depends on what you do with the randomly generated string. If you are simply echoing it, it might generate something like <ck1ask, which will be treated as a tag. Try excluding certain characters (such as <) from the string.
This function generates a random string in PHP:
function getRandomString($maxlength = 12, $isSpecialChar = false)
{
    $randomString = '';
    // initialise the character set: lower case, upper case and numbers
    $charSet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    // to include special characters, set $isSpecialChar to true
    if ($isSpecialChar) $charSet .= '~@#$%^*()_+={}|][';
    // append $maxlength characters picked at random from the set
    for ($i = 0; $i < $maxlength; $i++) $randomString .= $charSet[mt_rand(0, strlen($charSet) - 1)];
    // return the random string
    return $randomString;
}
// call the function with the length you need (default: 12)
$random8char = getRandomString(8);
echo $random8char;
Source: Generate random string in php
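mt_rand() is not a cryptographically secure generator, so it is a poor fit when the string serves as a password or token. A sketch of the same character-set approach using random_int() (PHP 7+, cryptographically secure) instead; random_string() is a hypothetical name, and the special-character set is an assumption to adjust as needed.

```php
<?php
// Picks $length characters from $charSet using random_int().
function random_string(int $length = 12, bool $withSpecialChars = false): string
{
    $charSet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    if ($withSpecialChars) {
        $charSet .= '~@#$%^*()_+={}|][';
    }
    $max = strlen($charSet) - 1; // single-byte characters only
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        $result .= $charSet[random_int(0, $max)];
    }
    return $result;
}

echo random_string(8), PHP_EOL;
```

Keeping the character set ASCII-only matters here: indexing with $charSet[...] works on bytes, so a multi-byte character in the set could be split in half.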
