PHP remove everything except letters and a hyphen (-) - php

I'm making a form that asks for the user's first and last name, and I don't want them entering
$heil4
I would like them to enter
Sheila
I know how to filter out everything except letters, but I'm aware that some names can have
Sheila-McDonald
So how would I remove everything from a string apart from letters and a hyphen?

Simply use
$s = preg_replace("/[^a-z-]/i", "", $s);
or if you want to convert some non-ascii characters to ascii, such as Jean-Rémy to Jean-Remy, then use
$s = preg_replace("/[^a-z-]/i", "", iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $s));
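As a quick check of the first pattern, here is a minimal sketch. The function name strip_to_name_chars is hypothetical and only wraps the preg_replace call above. The iconv variant is left out because the result of //TRANSLIT depends on the iconv implementation of the platform.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function strip_to_name_chars(string $s): string
{
    // [^a-z-] with the i flag: any byte that is not an ASCII letter or a hyphen.
    return preg_replace('/[^a-z-]/i', '', $s);
}

echo strip_to_name_chars('$heil4'), "\n";          // heil
echo strip_to_name_chars('Sheila-McDonald'), "\n"; // Sheila-McDonald
echo strip_to_name_chars('Jean-Rémy'), "\n";       // Jean-Rmy (the accented letter is dropped)
```

The last line shows why the transliteration step, or a Unicode-aware pattern, matters for accented names.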

Instead of replacing with nothing, have some fun: that way you can decode a name that consists mainly of numbers ;p
$name = '$h3il4-McD0nald';
$find = array(0,1,3,4,5,6,7,'$');
$replace = array('o','l','e','a','s','g','t','s');
$name = str_replace($find,$replace,$name);
//Sheila-McDonald
echo ucfirst(preg_replace('/[^a-z-]/i', '', $name));

$new = preg_replace('#[^A-Z-]#iu', '', $data);
But instead of removing characters (and thus modifying the user's input), it is better to validate it
and show an error if the input is not valid. This way the user knows that what they entered is exactly the value you have.
if(!preg_match('#^[A-Z-]+$#iu', $data)) echo 'invalid';
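For validation the pattern has to cover the whole string, so it needs anchors. A minimal sketch of such a check follows. The function name is_valid_name is hypothetical, and the rule it applies (Unicode letters, optionally joined by single hyphens) is an assumption that is stricter than the character class above.

```php
<?php
// Hypothetical validator: true only if the whole string is letters joined by single hyphens.
function is_valid_name(string $name): bool
{
    // \p{L} is any Unicode letter (needs the u flag); ^ and \z anchor the whole string.
    return preg_match('/^\p{L}+(?:-\p{L}+)*\z/u', $name) === 1;
}

var_dump(is_valid_name('Sheila-McDonald')); // bool(true)
var_dump(is_valid_name('Jean-Rémy'));       // bool(true)
var_dump(is_valid_name('$heil4'));          // bool(false)
```

Because nothing is stripped, the user sees an error and can correct the input themselves.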

Use this to strip out everything except ASCII letters, accented Latin letters (À-ÿ), whitespace and dash punctuation (\p{Pd}). The u modifier is needed so that the À-ÿ range is read as characters rather than bytes.
$strtochange= preg_replace("/[^\s\p{Pd}a-zA-ZÀ-ÿ]/u",'',$strtochange);
Note: this will turn $heil4 into heil.

Related

Split first and last name with UTF-8 Chars in php

Hoping someone can help give a simple solution to splitting a first and last name when the full name has French or other accents on the characters.
This seems to work fine when the name doesn't have any accents, but it fails to detect the white space when there is an accent in the string.
An example name would be "Marc-André Côté"
$name = trim($FullNameInput);
$last_name = (strpos($name, ' ') === false) ? '' : preg_replace('#.*\s([\w-]*)$#', '$1', $name);
$first_name = trim( preg_replace('#'.$last_name.'#', '', $name ) );
Use a UNICODE modifier when dealing with Unicode strings and add a single quote since some names contain it:
 preg_replace('#.*\s([\w\'-]*)$#u', '$1', $name)
The UNICODE modifier will also make \w Unicode aware. A preg_match solution will be cleaner though:
 preg_match('/\s\K[\w\'-]+$/u', $name, $m)
You just need to check if there is a match. If there is a match, get $m[0], else assign an empty string to it.
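The preg_match approach could be wrapped up roughly like this. The function name split_name and the choice to return a [first, last] array are assumptions for illustration; only the pattern comes from the answer above.

```php
<?php
// Hypothetical helper: returns [first, last] using the preg_match pattern above.
function split_name(string $fullName): array
{
    $name = trim($fullName);
    // \s\K drops the whitespace from the match; the u flag lets \w match accented letters.
    if (preg_match('/\s\K[\w\'-]+$/u', $name, $m)) {
        $last = $m[0];
        // Everything before the last name, minus the separating whitespace.
        $first = rtrim(substr($name, 0, strlen($name) - strlen($last)));
        return [$first, $last];
    }
    // No whitespace found: treat the input as a first name only.
    return [$name, ''];
}

print_r(split_name('Marc-André Côté'));
```

Byte-based substr and strlen are safe here because the cut always falls at the start of the matched last name.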
You can use the explode function. It handles your example name without any problems and it's much easier than your regex solution:
$nameparts = explode(" ", $name);
echo $nameparts[0]; // First Name
echo $nameparts[1]; // Last Name
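One caveat with a plain explode is that a single-word name leaves index 1 undefined and a middle name ends up in it. A hedged variation is sketched below: the function name split_first_rest is hypothetical, and treating everything after the first word as the last name is an assumption, not something the question specifies.

```php
<?php
// Hypothetical helper: first word is the first name, the remainder is the last name.
function split_first_rest(string $fullName): array
{
    // Limit 2 keeps everything after the first run of whitespace in one piece.
    $parts = preg_split('/\s+/u', trim($fullName), 2);
    return [$parts[0], $parts[1] ?? ''];
}

print_r(split_first_rest('Marc-André Côté'));
```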

regex to also match accented characters

I have the following PHP code:
$search = "foo bar que";
$search_string = str_replace(" ", "|", $search);
$text = "This is my foo text with qué and other accented characters.";
$text = preg_replace("/$search_string/i", "<b>$0</b>", $text);
echo $text;
Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents?
The characters that have to match (Spanish):
á,Á,é,É,í,Í,ó,Ó,ú,Ú,ñ,Ñ
I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same:
"This is my foo text with qué and other accented characters."
and not
"This is my foo text with que and other accented characters."
The solution I finally used:
$search_for_preg = str_ireplace(["e","a","o","i","u","n"],
["[eé]","[aá]","[oó]","[ií]","[uú]","[nñ]"],
$search_string);
$text = preg_replace("/$search_for_preg/iu", "<b>$0</b>", $text)."\n";
$search = str_replace(
['a','e','i','o','u','ñ'],
['[aá]','[eé]','[ií]','[oó]','[uú]','[nñ]'],
$search);
This, plus the same for upper case, will fulfil your request. A side note: the ñ replacement sounds invalid to me, as 'niño' is totally different from 'nino'.
If you want to use the captured text in the replacement string, you have to use character classes in your $search variable (anyway, you set it manually):
$search = "foo bar qu[eé]";
And so on.
You could try defining an array like this:
$vowel_replacements = array(
"e" => "eé",
// Other letters mapped to their other versions
);
Then, before your preg_replace call, do something like this:
foreach ($vowel_replacements as $vowel => $replacements) {
$search_string = str_replace($vowel, "[$replacements]", $search_string);
}
If I'm remembering my PHP right, that should replace your vowels with a character class of their accented forms, so the accented characters in the text stay as they are. It also lets you change the search string far more easily; you don't have to remember to replace the vowels with their character classes. All you have to remember is to use the non-accented form in your search string.
(If there's some special syntax I'm forgetting that does this without a foreach, please comment and let me know.)
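Putting the answers above together, a sketch of accent-tolerant highlighting might look like this. The function name highlight_terms is hypothetical, the accent map covers only the Spanish characters listed in the question, and the search terms are assumed to be written in lowercase without accents. strtr with an array avoids the explicit foreach, and preg_quote guards against regex metacharacters in the search terms.

```php
<?php
// Hypothetical helper: wraps every search term found in $text in <b> tags,
// treating plain and accented Spanish vowels (and n/ñ) as equivalent.
function highlight_terms(string $text, string $search): string
{
    $classes = [
        'a' => '[aá]', 'e' => '[eé]', 'i' => '[ií]',
        'o' => '[oó]', 'u' => '[uú]', 'n' => '[nñ]',
    ];
    $alternatives = [];
    foreach (preg_split('/\s+/', $search, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Escape metacharacters first, then swap each plain letter for its class.
        $alternatives[] = strtr(preg_quote($word, '/'), $classes);
    }
    if ($alternatives === []) {
        return $text; // nothing to search for
    }
    $pattern = '/' . implode('|', $alternatives) . '/iu';
    return preg_replace($pattern, '<b>$0</b>', $text);
}

echo highlight_terms('This is my foo text with qué and other accented characters.', 'foo bar que');
```

The text itself is never rewritten, so "qué" keeps its accent inside the bold tags.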

Check string for high ascii, punctuation, other weird characters

I've got a string called $ID coming in from a different page, produced by base64_decode($enc);, and I want to check it for any weird characters. $ID when decoded should only contain letters, numbers, underscores and dashes.
I've had a bit of a look at preg_replace('/[\x80-\xFF]/', '', $string); which cuts out some weird characters---which is helpful---but I can still see sometimes that # signs and brackets and stuff still make it in.
Is there a way I can lower the ascii test? Or how else do I cut out everything except letters, numbers, underscores and dashes?
Any help at pointing me in the right direction is wonderful and thanks!
$enc = $_GET["key"];
$ID= base64_decode($enc);
if (empty($enc)) { echo "key is empty"; } else {
echo "string ok<br>";
$check = preg_replace('/[\x80-\xFF]/', '', $ID);
echo $check;
// i can see this step is helping cut junk out, do more tests from here
}
Typing a caret after the opening square bracket negates the character class, so you can do:
$check = preg_replace('/[^A-Za-z0-9_-]/', '', $ID);
You can use this replacement:
$check = preg_replace('~[^[:word:]-]+~', '', $ID);
The [:word:] character class contains letters, digits and the underscore.
To make the string lowercase, use strtolower()
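If rejecting a bad key is preferable to silently cleaning it, the same character class can be used for validation. A minimal sketch, assuming a hypothetical helper named decode_id that returns null for anything unexpected:

```php
<?php
// Hypothetical helper: decode the key and return it only if it is well formed.
function decode_id(string $enc): ?string
{
    // Strict mode makes base64_decode() return false on characters outside the base64 alphabet.
    $id = base64_decode($enc, true);
    if ($id === false) {
        return null;
    }
    // \A and \z anchor the whole string: letters, digits, underscores and dashes only.
    return preg_match('/\A[A-Za-z0-9_-]+\z/', $id) === 1 ? $id : null;
}

var_dump(decode_id(base64_encode('abc_123-x'))); // string(9) "abc_123-x"
var_dump(decode_id(base64_encode('abc#1')));     // NULL
```

In the question's script this would be called as decode_id($_GET["key"]) after checking that the parameter is present.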

Trying to generate url slugs with PHP regex, Japanese characters not going through

So I'm trying to generate slugs to store in my DB. My locales include English, some European languages and Japanese.
I allow \d, \w, European characters are transliterated, Japanese characters are untouched. Period, plus and dash (-) are kept. Leading/trailing whitespace is removed, while the whitespace in between is replaced by a dash.
Here is some code: (please feel free to improve it, given my conditions above as my regex-fu is currently white belt tier)
function ToSlug($string, $separator='-') {
$url = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
$url = preg_replace('/[^\d\w一-龠ぁ-ゔァ-ヴー々〆〤.+ -]/', '', $url);
$url = strtolower($url);
$url = preg_replace('/[ ' . $separator . ']+/', $separator, $url);
return $url;
}
I'm testing this function, however my JP characters are not getting through; they are simply replaced by ''. Whilst I do suspect it's the //IGNORE that's taking them out, I need that there, or else my German and French transliterations will not work. Any ideas on how I can fix this?
EDIT: I'm not sure if Japanese Kanji covers all of Simplified Chinese but I'm gonna need that and Korean as well. If anyone who knows the regex off the bat please let me know it will save me some time searching. Thanks.
Note: I am not familiar with the Japanese writing system.
Looking at the function the iconv call appears to remove all the Japanese characters. Instead of using iconv to transliterate, it may be easier to just create a function that does it:
function _toSlugTransliterate($string) {
// Lowercase equivalents found at:
// https://github.com/kohana/core/blob/3.3/master/utf8/transliterate_to_ascii.php
$lower = [
'à'=>'a','ô'=>'o','ď'=>'d','ḟ'=>'f','ë'=>'e','š'=>'s','ơ'=>'o',
'ß'=>'ss','ă'=>'a','ř'=>'r','ț'=>'t','ň'=>'n','ā'=>'a','ķ'=>'k',
'ŝ'=>'s','ỳ'=>'y','ņ'=>'n','ĺ'=>'l','ħ'=>'h','ṗ'=>'p','ó'=>'o',
'ú'=>'u','ě'=>'e','é'=>'e','ç'=>'c','ẁ'=>'w','ċ'=>'c','õ'=>'o',
'ṡ'=>'s','ø'=>'o','ģ'=>'g','ŧ'=>'t','ș'=>'s','ė'=>'e','ĉ'=>'c',
'ś'=>'s','î'=>'i','ű'=>'u','ć'=>'c','ę'=>'e','ŵ'=>'w','ṫ'=>'t',
'ū'=>'u','č'=>'c','ö'=>'o','è'=>'e','ŷ'=>'y','ą'=>'a','ł'=>'l',
'ų'=>'u','ů'=>'u','ş'=>'s','ğ'=>'g','ļ'=>'l','ƒ'=>'f','ž'=>'z',
'ẃ'=>'w','ḃ'=>'b','å'=>'a','ì'=>'i','ï'=>'i','ḋ'=>'d','ť'=>'t',
'ŗ'=>'r','ä'=>'a','í'=>'i','ŕ'=>'r','ê'=>'e','ü'=>'u','ò'=>'o',
'ē'=>'e','ñ'=>'n','ń'=>'n','ĥ'=>'h','ĝ'=>'g','đ'=>'d','ĵ'=>'j',
'ÿ'=>'y','ũ'=>'u','ŭ'=>'u','ư'=>'u','ţ'=>'t','ý'=>'y','ő'=>'o',
'â'=>'a','ľ'=>'l','ẅ'=>'w','ż'=>'z','ī'=>'i','ã'=>'a','ġ'=>'g',
'ṁ'=>'m','ō'=>'o','ĩ'=>'i','ù'=>'u','į'=>'i','ź'=>'z','á'=>'a',
'û'=>'u','þ'=>'th','ð'=>'dh','æ'=>'ae','µ'=>'u','ĕ'=>'e','ı'=>'i',
];
return str_replace(array_keys($lower), array_values($lower), $string);
}
So, with some modifications, it could look something like this:
function toSlug($string, $separator = '-') {
// Work around this...
#$string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
$string = _toSlugTransliterate($string);
// Remove unwanted chars + trim excess whitespace
// I got the character ranges from the following URL:
// https://stackoverflow.com/questions/6787716/regular-expression-for-japanese-characters#10508813
$regex = '/[^一-龠ぁ-ゔァ-ヴーa-zA-Z0-9ａ-ｚＡ-Ｚ０-９々〆〤.+ -]|^\s+|\s+$/u';
$string = preg_replace($regex, '', $string);
// mb_strtolower() is multibyte-aware, so it is the safer choice for UTF-8 input
$string = mb_strtolower($string);
// Same as before
$string = preg_replace("/[ {$separator}]+/", $separator, $string);
return $string;
}
$x = ' æøå!this.ís-a test-ゔヴ ーァ ';
echo toSlug($x);
In regex you can use unicode "scripts" to match letters of various languages. There is no "Japanese" one, but there are Hiragana, Katakana and Han. As I have no idea how Japanese is written, and how one could use these, I am not even going to try.
Using these scripts, however, would be done something like this:
'/[\p{Hiragana}\p{Katakana}\p{Han}]+/u'
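As a rough sketch of how the script properties could replace the hand-written ranges in a slug function: the name to_slug_sketch is hypothetical, the input is assumed to be valid UTF-8, the mbstring extension is assumed to be loaded, and the transliteration step from the answer above is deliberately left out. \p{Hangul} is added on the assumption that it covers the Korean text mentioned in the question's edit.

```php
<?php
// Hypothetical slug helper based on Unicode script properties instead of literal ranges.
function to_slug_sketch(string $string, string $separator = '-'): string
{
    // Keep Latin, Japanese (Hiragana, Katakana, Han) and Hangul letters,
    // digits, period, plus, space and dash; drop everything else.
    $string = preg_replace(
        '/[^\p{Latin}\p{Hiragana}\p{Katakana}\p{Han}\p{Hangul}\d.+ -]/u',
        '',
        $string
    );
    $string = mb_strtolower(trim($string), 'UTF-8');
    // Collapse runs of spaces and separators into a single separator.
    return preg_replace('/[ ' . preg_quote($separator, '/') . ']+/u', $separator, $string);
}

echo to_slug_sketch(' Hello World 日本語 '), "\n"; // hello-world-日本語
```

Han covers kanji as well as Chinese characters in general, so no separate range is needed for Chinese text in this sketch.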

Filter everything but letters and "-"

I am trying to write a function that filters an input down to only letters and the symbol "-". I want that symbol since the input contains names, and someone may be called Jean-Paul. This is my current code:
if(!preg_match('/^\[a-zA-Z]+$/',$string)) {
// Containing something other than a-z and A-Z
}
$string = 'Jean-Paul'; is now reported as containing illegal characters. How can I make it accept "-"?
if (!preg_match('/^[A-Z-]+$/i', $string)) {
// Contains something other than A-Z (case-insensitive) or -
}
A - is treated as a literal dash inside a character class if it's the first or last character there.
Be aware that "Jean-Rémy" will still fail. Are you sure you want to restrict yourself to ASCII letters?
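The two points above can be checked with a few lines. This is only an illustrative sketch; the variable names are made up, and the \p{L} pattern is one possible way to allow non-ASCII letters, not the only one.

```php
<?php
// A hyphen at the end or start of a character class is a literal hyphen.
$hyphen_last  = preg_match('/^[A-Z-]+$/i', 'Jean-Paul') === 1;
$hyphen_first = preg_match('/^[-A-Z]+$/i', 'Jean-Paul') === 1;
// In the middle it has to be escaped, otherwise it would define a range.
$hyphen_escaped = preg_match('/^[a-z\-A-Z]+$/', 'Jean-Paul') === 1;

// ASCII-only classes reject accented letters...
$ascii_accepts_accent = preg_match('/^[A-Z-]+$/i', 'Jean-Rémy') === 1;
// ...while \p{L} with the u flag accepts any Unicode letter.
$unicode_accepts_accent = preg_match('/^[\p{L}-]+$/u', 'Jean-Rémy') === 1;

var_dump($hyphen_last, $hyphen_first, $hyphen_escaped); // all bool(true)
var_dump($ascii_accepts_accent);   // bool(false)
var_dump($unicode_accepts_accent); // bool(true)
```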
If by "filter" you mean delete unwanted characters, then use
$s = preg_replace("/[^a-z-]/i", "", $s);
or
$s = preg_replace("/[^a-z-]/i", "", iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $s));
