I want to check whether a string contains only letters, digits and special characters that are common in Europe. I found answers like How to check, if a php string contains only english letters and digits?, but they do not cover French é and è, German äöüß or Romanian ă. I also want to allow frequently used special characters like €, !"§$%&/()=#|<>
Does somebody have a complete set which contains all those chars to make a check out of it?
You can test for Latin characters with \p{Latin} making sure to use the u regex flag:
<?php
$tests = [
'éèäöüßäöüßäöüßäöü',
'abcdeABCDE',
'€, !"§$%&/()=#|<>',
'ÄäAa',
'*',
'Здравствуйте'
];
foreach ($tests as $test) {
if (!preg_match('/[^\p{Latin}0-9€, !"§$%&\/()=#|<>]/u', $test)) {
echo "$test is okay\n";
}
}
Prints:
éèäöüßäöüßäöüßäöü is okay
abcdeABCDE is okay
€, !"§$%&/()=#|<> is okay
ÄäAa is okay
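The same check can be wrapped in a small helper. This is only a sketch: the function name isAllowedText is made up for illustration, and it assumes the input is UTF-8. It uses a positive, anchored pattern instead of the negated class above, so invalid UTF-8 (where preg_match returns false) is rejected as well.

```php
<?php
// Hypothetical helper, not part of the answer above.
// Returns true only if every character is a Latin-script letter,
// an ASCII digit, or one of the listed special characters.
function isAllowedText(string $s): bool
{
    // \A and \z anchor the whole string; the u flag enables UTF-8 and \p{...}.
    return preg_match('/\A[\p{Latin}0-9€, !"§$%&\/()=#|<>]*\z/u', $s) === 1;
}

var_dump(isAllowedText('éèäöüß ă'));      // bool(true)
var_dump(isAllowedText('Здравствуйте'));  // bool(false)
var_dump(isAllowedText('*'));             // bool(false)
```

Note that an empty string passes this check; change * to + if that is not wanted.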
I think you can use a regex. Anchor it so that the whole string has to match:
$re = '/^[A-Za-z0-9]*$/';
$str = 'человек';
var_dump(preg_match($re, $str)); // int(0): the string contains characters outside A-Z, a-z and 0-9
Characters not in a-z & A-Z would be:
[^a-zA-Z]
So you may use something like:
Regex_CountMatches([String_Field],"[^a-zA-Z]")
Because this function has a case option (default value of 1 is case insensitive), just searching for [^a-z] may work too.
Related
If the first character of my string contains any of the following letters, then I would like to change the first letter to Uppercase: (a,b,c,d,f,g,h,j,k,l,m,n,o,p,q,r,s,t,v,w,y,z) but not (e,i,u,x).
For example,
luke would become Luke
egg would stay the same as egg
dragon would become Dragon
I am trying to achieve this with PHP. Here's what I have so far:
<?php if($str("t","t"))
echo ucfirst($str);
else
echo "False";
?>
My code is simply wrong and it doesn't work and I would be really grateful for some help.
Without regex:
function ucfirstWithCond($str){
$exclude = array('e','i','u','x');
if(!in_array(substr($str, 0, 1), $exclude)){
return ucfirst($str);
}
return $str;
}
$test = "egg";
var_dump(ucfirstWithCond($test)); //egg
$test = "luke";
var_dump(ucfirstWithCond($test)); //Luke
Demo:
http://sandbox.onlinephpfunctions.com/code/c87c6cbf8c616dd76fe69b8f081a1fbf61cf2148
You may use
$str = preg_replace_callback('~^(?![eiux])[a-z]~', function($m) {
return ucfirst($m[0]);
}, $str);
See the PHP demo
The ^(?![eiux])[a-z] regex matches any lowercase ASCII letter at the start of the string except e, i, u and x. The matched letter is then converted to upper case inside the callback passed to preg_replace_callback.
If you plan to process each word in a string you need to replace ^ with \b, or - to support hyphenated words - with \b(?<!-) or even with (?<!\S) (to require a space or start of string before the word).
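As a rough sketch of the per-word variant, here is the same callback with ^ swapped for (?<!\S). The sample sentence is invented for illustration.

```php
<?php
// Hypothetical input, chosen to include a word starting with an excluded letter.
$str = 'luke found an egg near the dragon';

// (?<!\S) requires start of string or whitespace before the letter.
$result = preg_replace_callback('~(?<!\S)(?![eiux])[a-z]~', function ($m) {
    return strtoupper($m[0]);
}, $str);

echo $result, "\n"; // Luke Found An egg Near The Dragon
```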
If the first character could be other than a letter then check with an array range from a-z that excludes e,i,u,x:
if(in_array($str[0], array_diff(range('a','z'), ['e','i','u','x']))) {
$str[0] = ucfirst($str[0]);
}
Probably simpler to just check for the excluded characters:
if(!in_array($str[0], ['e','i','u','x'])) {
$str[0] = ucfirst($str[0]);
}
This is the code:
<?php
$pattern =' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
$text_split = str_split($text,1);
$data = '';
foreach($text_split as $value){
if (preg_match("/".$value."/", $pattern )){
$data = $data.$value;
}
if (!preg_match('/'.$value.'/', $pattern )){
break;
}
}
echo $data;
?>
Current output:
kdaiuyq7e611422^^$^vbnvcn^vznbsjhf
Expected output:
kdaiuyq7e611422
Please help me fix the error in my code. The pattern contains no ^ or $, yet preg_match reports a match, which seems wrong.
Your string $text contains ^, which as a regex matches the beginning of the string $pattern.
So preg_match('/^/', $pattern) returns 1, and the ^ is appended to $data.
You should escape the ^ so that it is treated as a literal character rather than a special one, as in preg_match('/\^/', $pattern). preg_quote() does this escaping for you.
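A minimal sketch of the original loop with preg_quote() applied might look like this. The variable is renamed to $allowed here only to make clear that it is a list of characters, not a regex.

```php
<?php
$allowed = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';
$data = '';

foreach (str_split($text) as $char) {
    // preg_quote() escapes regex metacharacters such as ^ and $;
    // the second argument also escapes the / delimiter.
    if (!preg_match('/' . preg_quote($char, '/') . '/', $allowed)) {
        break; // stop at the first character that is not allowed
    }
    $data .= $char;
}

echo $data, "\n"; // kdaiuyq7e611422
```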
There is no need to split your string up like that, the whole point of a regular expression is you can specify all the conditions within the expression. You can condense your entire code down to this:
$pattern = '/^[[:word:] ]+/';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';
preg_match($pattern, $text, $matches);
echo $matches[0];
Kris has accurately isolated that escaping in your method is the monkey wrench. This can be solved with preg_quote() or wrapping pattern characters in \Q ... \E (force characters to be interpreted literally).
Slapping that bandaid on your method (as you have done while answering your own question) doesn't help you to see what you should be doing.
I recommend that you do away with the character mask, the str_split(), and the looped calls of preg_match(). Your task can be accomplished far more briefly/efficiently/directly with a single preg_match() call. Here is the clean way that obeys your character mask fully:
Code: (Demo)
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
echo preg_match('/^[a-z\d ]+/i',$text,$out)?$out[0]:'No Match';
Output:
kdaiuyq7e611422
miknik's method was close to this, but it did not maintain 100% accuracy given your question requirements. I'll explain:
[:word:] is a POSIX Character Class (functioning like \w) that represents letters(uppercase and lowercase), numbers, and an underscore. Unfortunately for miknik, the underscore is not in your list of wanted characters, so this renders the pattern slightly inaccurate and may be untrustworthy for your project.
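To make the difference visible, here is a small comparison of the two patterns on an input that contains an underscore. The input string is made up for this illustration and is not from the question.

```php
<?php
// Hypothetical input containing an underscore.
$text = 'abc_def^ghi';

preg_match('/^[[:word:] ]+/', $text, $withWord); // [:word:] accepts the underscore
preg_match('/^[a-z\d ]+/i', $text, $withMask);   // the explicit mask does not

echo $withWord[0], "\n"; // abc_def
echo $withMask[0], "\n"; // abc
```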
i'm not very firm with regular Expressions, so i have to ask you:
How to find out with PHP if a string contains a word starting with # ??
e.g. i have a string like "This is for #codeworxx" ???
I'm so sorry, but i have NO starting point for that :(
Hope you can help.
Thanks,
Sascha
Okay, thanks for the results, but I made a mistake: how do I use this with eregi_replace?
$text = eregi_replace('/\B#[^\B]+/','\\1', $text);
does not work??!?
why? do i not have to enter the same expression as pattern?
Match a # that does not directly follow a word character and is followed by something other than whitespace:
$ cat 1812901.php
<?php
echo preg_match("/\B#\S+/", "This should #match it");
echo preg_match("/\B#\S+/", "This should not# match");
echo preg_match("/\B#\S+/", "This should match nothing and return 0");
echo "\n";
?>
$ php 1812901.php
100
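Regarding the eregi_replace follow-up: the ereg functions take POSIX patterns without delimiters, and they were removed in PHP 7.0, so a PCRE pattern will not work there. Also, a backreference such as \\1 only has a value if the pattern contains a capture group. A sketch using preg_replace instead, where the <b> wrapper is just an arbitrary replacement chosen for illustration:

```php
<?php
$text = 'This is for #codeworxx';

// Group 1 captures the word after the #, so it can be reused as $1.
$result = preg_replace('/\B#(\w+)/', '<b>$1</b>', $text);

echo $result, "\n"; // This is for <b>codeworxx</b>
```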
Break your string up like this:
$string = 'simple sentence with five words';
$words = explode(' ', $string );
Then you can loop through the array and check whether the first character of each word equals "#":
if ($stringInTheArray[0] == "#")
Assuming you define a word as a sequence of letters with no whitespace between them, this should be a good starting point for you:
$subject = "This is for #codeworxx";
$pattern = '/\s*#(.+?)(?:\s|$)/';
preg_match($pattern, $subject, $matches);
print_r($matches);
Explanation:
\s*#(.+?)(?:\s|$) - look for anything starting with #, and group all the following letters, numbers, and anything else that is not whitespace (space, tab, newline), up to the closest whitespace or the end of the string.
See the output of the $matches array for accessing the inner groups and the regex results.
@OP, no need for a regex. Plain PHP string functions will do:
$mystr='This is for #codeworxx';
$str = explode(" ",$mystr);
foreach($str as $k=>$word){
if(substr($word,0,1)=="#"){
print $word;
}
}
Just in case this is helpful to someone in the future:
/((?<!\S)#\w+(?!\S))/
This will match any word made up of word characters (letters, digits and underscores) that starts with "#". It will not match words with "#" anywhere but at the start of the word.
Matching cases:
#username
foo #username bar
foo #username1 bar #username2
Failing cases:
foo#username
#username$
##username
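The cases above can be checked with a short script. This is a sketch that simply counts matches per input; the outer capture group is dropped because it is not needed for counting.

```php
<?php
$pattern = '/(?<!\S)#\w+(?!\S)/';
$cases = [
    '#username',
    'foo #username bar',
    'foo #username1 bar #username2',
    'foo#username',
    '#username$',
    '##username',
];

$counts = [];
foreach ($cases as $case) {
    // preg_match_all() returns the number of full matches found.
    $counts[$case] = preg_match_all($pattern, $case);
    printf("%-32s %d\n", $case, $counts[$case]);
}
```

The first three inputs give 1, 1 and 2 matches; the last three give 0.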
I have the following PHP code:
$search = "foo bar que";
$search_string = str_replace(" ", "|", $search);
$text = "This is my foo text with qué and other accented characters.";
$text = preg_replace("/$search_string/i", "<b>$0</b>", $text);
echo $text;
Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents?
The characters that have to match (Spanish):
á,Á,é,É,í,Í,ó,Ó,ú,Ú,ñ,Ñ
I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same:
"This is my foo text with qué and other accented characters."
and not
"This is my foo text with que and other accented characters."
The solution I finally used:
$search_for_preg = str_ireplace(["e","a","o","i","u","n"],
["[eé]","[aá]","[oó]","[ií]","[uú]","[nñ]"],
$search_string);
$text = preg_replace("/$search_for_preg/iu", "<b>$0</b>", $text)."\n";
$search = str_replace(
['a','e','i','o','u','ñ'],
['[aá]','[eé]','[ií]','[oó]','[uú]','[nñ]'],
$search);
This, together with the same for upper case, will satisfy your request. A side note: the ñ replacement seems invalid to me, as 'niño' is totally different from 'nino'.
If you want to use the captured text in the replacement string, you have to use character classes in your $search variable (anyway, you set it manually):
$search = "foo bar qu[eé]";
And so on.
You could try defining an array like this:
$vowel_replacements = array(
"e" => "eé",
// Other letters mapped to their other versions
);
Then, before your preg_replace call, do something like this:
foreach ($vowel_replacements as $vowel => $replacements) {
$search_string = str_replace($vowel, "[$replacements]", $search_string);
}
If I'm remembering my PHP right, that should replace your vowels with a character class of their accented forms, so the text itself stays unchanged. It also lets you change the search string far more easily; you don't have to remember to replace the vowels with their character classes. All you have to remember is to use the non-accented form in your search string.
(If there's some special syntax I'm forgetting that does this without a foreach, please comment and let me know.)
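Putting the idea together, a minimal end-to-end sketch could look like the following. It assumes the script and the text are UTF-8. Two things are my own additions and not part of the answers above: strtr() is used so that every letter is substituted in a single pass without a foreach, and preg_quote() protects against regex metacharacters in the search words.

```php
<?php
// Map each plain letter to a character class with its accented variant.
$classes = [
    'a' => '[aá]', 'e' => '[eé]', 'i' => '[ií]',
    'o' => '[oó]', 'u' => '[uú]', 'n' => '[nñ]',
];

$search = 'foo bar que';
$words = array_map(function ($word) use ($classes) {
    // Escape first, then expand the letters into character classes.
    return strtr(preg_quote($word, '/'), $classes);
}, explode(' ', $search));

// i = case-insensitive, u = treat pattern and subject as UTF-8.
$pattern = '/' . implode('|', $words) . '/iu';

$text = 'This is my foo text with qué and other accented characters.';
$result = preg_replace($pattern, '<b>$0</b>', $text);
echo $result, "\n";
// This is my <b>foo</b> text with <b>qué</b> and other accented characters.
```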
I am trying to create a regular expression for any given string.
Goal: remove ALL characters which are not "latin" or "lowercase greek" or "numbers" .
What I have done so far: [^a-z0-9]
This works perfect for latin characters.
When I try this: [^a-z0-9α-ω] no luck. Works BUT leaves out any other symbol like !!#$%#%#$#,`
My knowledge is limited when it comes to regexp. Any help would be much appreciated!
EDIT:
Posted below is the function that matches characters specified and creates a slug out of it, with a dash as a separation character:
$q_separator = preg_quote('-');
$trans = array(
'&.+?;' => '',
'[^a-z0-9 -]' => '',
'\s+' => $separator,
'('.$q_separator.')+' => $separator
);
$str = strip_tags($str);
foreach ($trans as $key => $val){
$str = preg_replace("#".$key."#i", $val, $str);
}
if ($lowercase === TRUE){
$str = strtolower($str);
}
return trim($str, '-');
So if the string is: OnCE upon a tIME !#% #$$ in MEXIco
Using the function the output will be: once-upon-a-time-in-mexico
This works fine but I want the preg_match also to exclude greek characters.
Ok, can this replace your function?
$subject = 'OnCEΨΩ é-+#àupon</span> aαθ tIME !#%#$ in MEXIco in the year 1874 <or 1875';
function format($str, $excludeRE = '/[^a-z0-9]+/u', $separator = '-') {
$str = strip_tags($str);
$str = strtolower($str);
$str = preg_replace($excludeRE, $separator, $str);
$str = trim($str, $separator);
return $str;
}
echo format($subject);
Note that you will lose all characters after a < (because of strip_tags) until you meet a >
// Old answer, from when I thought you wanted to preserve Greek characters
It's possible to build a character range such as α-ω or any other characters you want! The reason your pattern doesn't work is that you don't tell the regex engine that you are dealing with a Unicode string. To do that, you must add the u modifier at the end of the pattern, like this:
/[^a-z0-9α-ω]+/u
You can use the characters' hexadecimal code points too:
/[^a-z0-9\x{3B1}-\x{3C9}]+/u
Note that if you are sure you will not have uppercase Greek characters in your string, or if you don't mind preserving them too, you can use the character class \p{Greek} like this:
/[^a-z0-9\p{Greek}]+/u
(It's a little longer but more explicit)
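For the case where lowercase Greek should be kept, a slug function along these lines might work. This is a sketch under a few assumptions: the name slugify is made up, the input is UTF-8, and the mbstring extension is available so that mb_strtolower() can lowercase Greek letters as well.

```php
<?php
// Hypothetical helper; assumes UTF-8 input and the mbstring extension.
function slugify(string $str, string $separator = '-'): string
{
    $str = mb_strtolower(strip_tags($str), 'UTF-8');
    // Replace every run of characters outside a-z, 0-9 and α-ω with the separator.
    $str = preg_replace('/[^a-z0-9α-ω]+/u', $separator, $str);
    return trim($str, $separator);
}

echo slugify('OnCE upon a tIME !#% #$$ in MEXIco'), "\n"; // once-upon-a-time-in-mexico
echo slugify('Hello ΚΟΣΜΟΣ 2024!'), "\n";
```

Keep in mind that the range α-ω does not include accented Greek letters such as ά or έ, so those would be replaced by the separator as well.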
There's already an answered question about this:
Remove Non English Characters PHP
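Without the u modifier you can't specify a range such as α-ω directly, because the pattern is then treated byte by byte. With the u modifier you can, or you can use the code points instead, e.g. \x{3B1}-\x{3C9}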