Regex for description form input - php

A textarea is part of my form. The user has to write a short text, and I want to validate it. For now I am using the following regex:
/^[0-9a-zA-ZäöüÄÖÜ_\-']+$/
Although I have included äöüÄÖÜ in the regex, it treats all words containing äöü etc. as invalid. Furthermore, it does not accept spaces.
Any ideas how to improve the regex?

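Use a Unicode-aware regex. The u modifier is needed so that \pL and \pN work on UTF-8 input, and the anchors make the pattern validate the whole string:
/^[\pL\pN_\-]+$/u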

The PCRE u modifier enables UTF-8 handling. You are also missing a space in the regex, and you can condense it a bit:
/^[0-9a-zäöü\- ]+$/ui
With the u modifier, 'i' should also cover the uppercase umlauts, provided PCRE was built with Unicode property support.
You may also want to include punctuation.
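A minimal sketch of how these suggestions could be combined, assuming the input is UTF-8. The function name isValidDescription and the particular punctuation set are assumptions for illustration, not something the question specifies:

```php
<?php
// Hypothetical helper: returns true when $text contains only letters, digits,
// spaces, and a small (assumed) set of punctuation. Expects UTF-8 input.
function isValidDescription($text)
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^[\p{L}\p{N} _\-\'.,!?]+$/u', $text) === 1;
}

var_dump(isValidDescription('Schöne Grüße aus Köln')); // bool(true)
var_dump(isValidDescription('<script>'));              // bool(false)
```

Note that a textarea can also contain line breaks; if those should be allowed, the character class would need \r and \n (or \s) as well.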

First, you might have an encoding issue, that's why äöüÄÖÜ are registered as invalid. I'm not a PHP user, so I can't answer your question directly, but taking a look at this page might help you. Also, using appropriate character classes could work better than explicitly writing all appropriate letters. Alas, this is also probably encoding configuration dependent.
Second, you need a space in your regex, so
/^[0-9a-z A-ZäöüÄÖÜ_\-']+$/ // note space after a-z
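should work. As noted in the previous paragraph, a character class may be the better choice: \w might be sufficient instead of a-zA-ZäöüÄÖÜ, though in PHP it only matches the umlauts when the u modifier is set (or a suitable locale is active).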
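As a rough illustration of the \w suggestion, the following sketch assumes UTF-8 input and a PCRE build with Unicode property support, so that the u modifier makes \w Unicode-aware. The sample strings are made up:

```php
<?php
// Sketch: \w inside a character class, combined with the u modifier.
// \w already includes the underscore, so it is not listed separately.
$pattern = '/^[\w\-\' ]+$/u';

var_dump(preg_match($pattern, "Müller's Straße 5")); // int(1)
var_dump(preg_match($pattern, 'no <tags>'));          // int(0)
```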

You can use \w to match all "word" characters (letters, digits, and the underscore). Since \w already covers the underscore, and the u modifier is needed for the umlauts, the regex becomes
/^[\w\-' ]+$/u
What text from the user are you considering to be "valid"?

Related

Regex blocking special characters

I'm using PHP Version 5.3.27
I'm trying to get my regex to match whitespace and special characters such as ♦◘•♠♥☻. The other known special characters, %$#&*#, are already matched, but somehow the ones I mentioned before are not.
Current regex
preg_match('/^[a-zA-Z0-9[:space:]]+$/', $login)
My apology for asking two questions on the same subject. I hope this one is clear enough for you.
Use this:
[\W]+
It will match any non-word character.
Your regex doesn't contain any reference to the special characters mentioned. You would need to include them in the character class for them to be matched.
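To match those kinds of special characters you can use their Unicode code points.
Example:
\x{0000}-\x{FFFF}
\x00-\xFF
The first form addresses Unicode code points and requires the u modifier (PHP's PCRE does not accept the \uFFFF notation). The second form only covers the single-byte values 0x00 to 0xFF.
Refer to a Unicode character table online to match up your symbols with their code points, then create a range to keep your expression short.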
You can use the \p{S} character class (or \p{So}) that matches symbol characters (that includes this kind of characters: ╭₠☪♛♣♞♉♆☯♫):
preg_match('/^[a-zA-Z0-9\h\p{S}]+$/u', $login)
To find more possibilities you can check the pcre documentation at: http://www.pcre.org/pcre.txt
If you need to be more precise, the best way is to use character ranges in the character class. You can find the character codes here.
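Both approaches can be sketched side by side. This assumes that $login and the script file itself are UTF-8 encoded; the variable names and the sample login are illustrative. One detail worth knowing is that the bullet • (U+2022) is classified as punctuation rather than as a symbol, so \p{S} alone would not cover it:

```php
<?php
// Variant 1: Unicode properties. \p{S} covers symbols such as ♦ ♠ ♥ ☻ ◘,
// \p{P} covers punctuation, which includes the bullet • (U+2022).
$byProperty = '/^[a-zA-Z0-9\h\p{S}\p{P}]+$/u';

// Variant 2: explicit code point ranges.
// U+2022 = bullet, U+25A0-U+25FF = Geometric Shapes,
// U+2600-U+26FF = Miscellaneous Symbols.
$byRange = '/^[a-zA-Z0-9\h\x{2022}\x{25A0}-\x{25FF}\x{2600}-\x{26FF}]+$/u';

$login = 'user ♦◘•♠♥☻';
var_dump(preg_match($byProperty, $login)); // int(1)
var_dump(preg_match($byRange, $login));    // int(1)
```

The range variant is stricter: it accepts only the listed blocks, whereas the property variant accepts every symbol and punctuation character.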

Regex to find string containing special characters in text

I'm trying to formulate a regular expression that will allow me to find a string within a piece of text, if the string exists on its own i.e. not within another word (but surrounded by special characters is ok).
/\bword\b/i
The above regex works fine, and finds "word" in the text. The problem comes when the word I want to find is something like "c++". In this case it matches on any occurrence of the "c" character on its own. I've tried escaping the "+" characters but it doesn't make any difference. I'm assuming that, because "+" is a non-word character, I'm possibly going down the wrong route and word boundaries are not what I should be using.
So I guess the question is: how can I use a regular expression to find a string in a piece of text, on its own, regardless of whether the string is alphanumeric or contains special characters? In the following piece of text it should match on the 3 occurrences of "c++":
c++
(c++)
perl/c++/assembly
But it should not match on the following:
maniac++
c++abc
This is intended so that my script can tell if a specific skill exists within a user's CV/resume. I'm using this with PHP's preg_match_all() function.
I've done a lot of searching but can't come up with a solution, hopefully someone with good regex knowledge can help.
Try this:
/(?<!\w)(c\+\+)(?!\w)/
The (?<!\w) is a negative lookbehind clause, meaning that a word character should not immediately precede your pattern. The (?!\w) part is negative lookahead, meaning that a word character should not immediately follow.
Hope this helps!
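To generalize the lookaround pattern to arbitrary skill names, one option is to escape the search term with preg_quote() before embedding it. The following is only a sketch; the helper name countSkill is hypothetical, and the sample text is taken from the question:

```php
<?php
// Hypothetical helper: counts standalone occurrences of $skill in $text.
function countSkill($skill, $text)
{
    // preg_quote() escapes regex metacharacters such as "+",
    // and also the "/" delimiter passed as second argument.
    $pattern = '/(?<!\w)' . preg_quote($skill, '/') . '(?!\w)/i';
    return preg_match_all($pattern, $text, $matches);
}

$text = "c++\n(c++)\nperl/c++/assembly\nmaniac++\nc++abc";
var_dump(countSkill('c++', $text)); // int(3)
```

The two rejected lines fail for different reasons: in "maniac++" a word character precedes the "c", and in "c++abc" a word character follows the "++".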

Regex diacritics problem

I am trying to validate some user inputs, but my regex fails when it encounters diacritics. I am talking about characters like ăĂ and so on.
What should I add to the regex code so it should also validate diacritics from within inputs?
Thank you!
P.S.: If it matters, I am using PHP with CakePHP framework.
This is the piece of code I am currently using for validating user input: return preg_match('|^[0-9a-zA-Z_-\s]*$|', $value);
Assuming you want to match letters, then allowing Unicode letters should help:
Use /\p{L}+/u for example if you want to match a sequence of letters. Don't forget the /u (Unicode) modifier.
In your case:
return preg_match('|^[0-9\p{L}_\s-]*$|u', $value);
should work.
As an aside, it's probably not a good idea to use | as a regex delimiter. For the current regex / would do just fine; other alternatives are ~ or # because they seldom occur in text and don't have any special meaning in regexes.
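Put together, the validation could look like the sketch below, using ~ as the delimiter. The function name isValidInput is an assumption, and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical validator mirroring the pattern above; expects UTF-8 input.
function isValidInput($value)
{
    // \p{L} matches letters with or without diacritics.
    // The hyphen is placed last in the class so it is taken literally.
    return preg_match('~^[0-9\p{L}_\s-]*$~u', $value) === 1;
}

var_dump(isValidInput('ăĂ test_1')); // bool(true)
var_dump(isValidInput('test!'));     // bool(false)
```

If the input may contain decomposed characters (a base letter followed by a combining mark), adding \p{M} to the character class would be needed as well.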

Regular Expression regex to validate input: Two words with a space between

I need to use a regular expression in PHP to validate a field that must contain two words separated by a space, like "First Last", but I can't find one that fits my purposes. Can anyone help me?
The best I've done is ^[a-zA-Z0-9_\s]*$ but this allows more than one space, anywhere in the field, and I want exactly one space, between the words.
Something like ^\w+\s\w+$ ought to work for this case. But you don't necessarily need to use regular expressions for this, you could just use explode().
^[^\s]+\s[^\s]+$
[^\s]+ matches one or more characters, except whitespace characters;
\s matches a single whitespace character.
This might do the trick
^[a-zA-Z0-9]+ {1}[a-zA-Z0-9]+$
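The regex and explode() approaches mentioned in these answers can be compared in a short sketch. Both function names are hypothetical, and the regex variant uses \S (equivalent to [^\s]) with a literal space as separator:

```php
<?php
// Regex variant: exactly two runs of non-whitespace separated by one space.
// \z anchors at the very end, so a trailing newline is rejected too.
function isTwoWordsRegex($value)
{
    return preg_match('/^\S+ \S+\z/', $value) === 1;
}

// explode() variant: splitting on a space must yield two non-empty parts.
// Unlike the regex, this does not reject tabs or newlines inside a part.
function isTwoWordsExplode($value)
{
    $parts = explode(' ', $value);
    return count($parts) === 2 && $parts[0] !== '' && $parts[1] !== '';
}

var_dump(isTwoWordsRegex('First Last'));    // bool(true)
var_dump(isTwoWordsExplode('First  Last')); // bool(false), two spaces
```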

Php - regular expression to check if the string has chinese chars

I have the string $str and I want to check whether its content has Chinese characters or not (true/false):
$str = "赕就可消垻,只有当所有方块都被消垻时才可以过关";
can you please help me?
Thanks!
Adrian
You could use a unicode character class http://www.regular-expressions.info/unicode.html
preg_match("/\p{Han}+/u", $utf8_str);
This just checks for the presence of at least one chinese character. You might want to expand on this if you want to match the complete string.
@mario's answer is right!
For Chinese characters use this regex: /[\x{4e00}-\x{9fa5}]+/u
And don't forget the u modifier! See the PHP reference on pattern modifiers for details.
TKS to mario
preg_match("/^\p{Han}{2,10}+$/u", $str);
This regex, /^\p{Han}{2,10}+$/u,
allows Chinese characters only,
requires a minimum of 2 characters, and
allows a maximum of 10 characters.
You can change the minimum and maximum by adjusting {2,10} as needed.
Both \p{Han} and the u modifier are essential; do not leave them out.
This link to a previous question on identifying simplified or traditional Chinese might give you some ideas... you don't actually specify which you mean, and I don't know Chinese well enough to recognise the difference
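The difference between "contains a Chinese character" and "consists only of Chinese characters" can be sketched as follows. The helper names are hypothetical and UTF-8 input is assumed:

```php
<?php
// True if $str contains at least one Han character anywhere.
function containsHan($str)
{
    return preg_match('/\p{Han}/u', $str) === 1;
}

// True only if every character in $str is a Han character.
function isOnlyHan($str)
{
    return preg_match('/^\p{Han}+$/u', $str) === 1;
}

var_dump(containsHan('abc 中文 def')); // bool(true)
var_dump(containsHan('abc def'));      // bool(false)
var_dump(isOnlyHan('中文'));            // bool(true)
var_dump(isOnlyHan('中文,中文'));       // bool(false)
```

The last call returns false because the fullwidth comma is punctuation and does not belong to the Han script, so a whole-string check on text like the sample in the question would need to allow punctuation separately.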
