preg_match only letters, numbers and spaces (including umlauts and similar) - php

I know there are a lot of these questions here on StackOverflow, but I couldn't find exactly what I'm searching for.
I need a regex that allows letters (including umlauts and others like öäßè), numbers and white space. So no special characters (?!;:#) and no dash (-) or underscore (_).

Use \p{L}, the Unicode letter class, to match any letter from any alphabet (including non-ASCII letters):
^[\d\s\p{L}]+$
Demo: https://regex101.com/r/wfjCjF/3
P.S.
Mind the pattern delimiters when using a regex in preg_match, and add the u modifier so that the pattern and the subject are treated as UTF-8:
preg_match('/^[\d\s\p{L}]+$/u', 'öäßè')
            ^              ^
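A minimal sketch of how this check might be wrapped for reuse, assuming UTF-8 input. The function name isLettersDigitsSpaces is hypothetical and not part of the original answer; the u modifier is included because PHP needs it to read the subject as UTF-8.

```php
<?php
// Hypothetical helper: true when $text contains only letters, digits and whitespace.
function isLettersDigitsSpaces(string $text): bool
{
    // The u modifier makes PCRE treat the pattern and the subject as UTF-8.
    return preg_match('/^[\d\s\p{L}]+$/u', $text) === 1;
}

var_dump(isLettersDigitsSpaces('öäßè 123')); // bool(true)
var_dump(isLettersDigitsSpaces('foo-bar'));  // bool(false): the dash is rejected
var_dump(isLettersDigitsSpaces('foo_bar'));  // bool(false): the underscore is rejected
var_dump(isLettersDigitsSpaces(''));         // bool(false): + requires at least one character
```

Comparing the result with === 1 also treats a preg_match error (which returns false) as a failed validation.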

Related

Russian character and alphanumeric converter

How can I remove non-alphanumeric characters from a string in PHP while keeping Russian characters like ч and г?
I tried to translate the string and then clean it with preg_replace, but this would remove the Russian characters.
You can do it with preg_replace. You just have to build a regular expression that matches what you desire.
If I understood your question correctly, this should work:
preg_replace('/[^\p{L}\p{N}\s]/u', '', $string);
Brief explanation:
[^...] matches any character that is not in the set.
\p{L} matches any letter (including the Cyrillic alphabet).
\p{N} matches any number.
\s matches any whitespace character.
/u adds Unicode support.
If you only want to match letters from the Cyrillic alphabet, you may want to use \p{Cyrillic} instead of \p{L}.
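As a rough illustration of both variants, the sketch below runs the replacement on a made-up sample string (the input and variable names are assumptions, not from the question).

```php
<?php
// Sample input invented for illustration.
$string = 'Привет, мир! Hello_world #42 ч г';

// Keep letters of any script, numbers and whitespace; drop everything else.
$anyScript = preg_replace('/[^\p{L}\p{N}\s]/u', '', $string);

// Keep only Cyrillic letters, numbers and whitespace; Latin letters are dropped too.
$cyrillicOnly = preg_replace('/[^\p{Cyrillic}\p{N}\s]/u', '', $string);

echo $anyScript, PHP_EOL;    // Привет мир Helloworld 42 ч г
echo $cyrillicOnly, PHP_EOL;
```

Note that preg_replace returns null on error, for example when the subject is not valid UTF-8, so it may be worth checking the result before using it.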

How to remove special characters and keep letters of any language in PHP?

I know this should remove any characters from string and keep only numbers and ENGLISH letters.
$txtafter = preg_replace("/[^a-zA-Z 0-9]+/","",$txtbefore);
but I wish to remove any special characters and keep any letter of any language like Arabic or Japanese.
Probably this will work for you:
$repl = preg_replace('/[^\w\s]+/u','' ,$txtbefore);
This will remove all non-word and non-space characters from your text (\w also matches the underscore, so underscores are kept). The /u flag is there for Unicode support.
You can use the \p{L} pattern to match any letter and \p{N} to match any numeric character. You should also use the u modifier, like this: /\p{L}+/u
Your final regex may look like: /[^\p{L}\p{N}]/u
Also be sure to check this question:
Regular expression \p{L} and \p{N}
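The patterns suggested above differ in what they keep. The sketch below compares them on an invented sample string; the middle variant, which adds \s to the \p{L}\p{N} class, is an assumption for the case where spaces should survive.

```php
<?php
// Sample input invented for illustration.
$txtbefore = '日本語 text_1, 2!';

// \w keeps letters, digits and the underscore.
$wordChars = preg_replace('/[^\w\s]+/u', '', $txtbefore);

// \p{L}\p{N} plus \s keeps letters, numbers and whitespace, but not the underscore.
$lettersNumbersSpaces = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $txtbefore);

// Without \s, whitespace is removed as well.
$lettersNumbers = preg_replace('/[^\p{L}\p{N}]/u', '', $txtbefore);

echo $wordChars, PHP_EOL;            // 日本語 text_1 2
echo $lettersNumbersSpaces, PHP_EOL; // 日本語 text1 2
echo $lettersNumbers, PHP_EOL;       // 日本語text12
```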

regex unable to allow apostrophe

I am experiencing a strange problem with a regular expression I have already used before.
The goal is to allow the user to enter his name, with letters, hyphen, and apostrophes if needed in a php form.
My regex is:
"/^[\w\s'àáâãäåçèéêëìíîïðòóôõöùúûüýÿ-]+$/i"
But everything is allowed except the apostrophe. Escaping it changes nothing. Why?
To deal with Unicode characters, you can do (the u modifier is required in PHP):
/^[\pN\pL\pP\pZ]+$/u
where:
\pN stands for any number
\pL stands for any letter
\pP stands for any punctuation
\pZ stands for any space
It matches names like:
d'Alembert
d’Alembert (note the different apostrophe from the one above)
Jean-François
O'Connors
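A quick way to try these names is sketched below. The property-based pattern is used with the u modifier, which PHP needs to read the subject as UTF-8. Because \pP accepts any punctuation, a narrower class is shown next to it; that narrower pattern is an assumption about what a name field might need, not part of the answer.

```php
<?php
// Property-based pattern, with the u modifier for UTF-8 input.
$broad = '/^[\pN\pL\pP\pZ]+$/u';

// Assumed narrower alternative: letters, whitespace, both apostrophes and the hyphen.
$narrow = '/^[\pL\s\'’-]+$/u';

$names = ["d'Alembert", 'd’Alembert', 'Jean-François', "O'Connors", 'Jean!?'];
foreach ($names as $name) {
    // Every name matches both patterns except 'Jean!?', which only the broad one accepts.
    printf("%s broad=%d narrow=%d\n", $name, preg_match($broad, $name), preg_match($narrow, $name));
}
```

Placing the hyphen last in the class keeps it from being read as a range operator.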

preg_match some characters

I need a regex for my preg_match(); it should allow the following characters:
String can contain only letters, numbers, and the following punctuation marks:
full stop (.)
comma (,)
dash (-)
underscore (_)
I have no idea how this can be done with a regex, but I think there is a way!
^[\p{L}\p{N}.,_-]*$
will match a string that contains only (Unicode) letters, digits or the "special characters" you mentioned. [...] is a character class, meaning "one of the characters contained here". You'll need to use the /u Unicode modifier for this to work:
preg_match('/^[\p{L}\p{N}.,_-]*$/u', $mystring);
If you only care about ASCII letters, it's easier:
^[\w.,-]*$
or, in PHP:
preg_match('/^[\w.,-]*$/', $mystring);
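The Unicode variant could be wrapped as in the sketch below; the function name isAllowedString is hypothetical. The last two lines illustrate a general PCRE detail: $ also matches just before a trailing newline unless the D modifier is added. Whether that matters depends on the input, so the D variant is offered only as an option.

```php
<?php
// Hypothetical helper: letters, digits, full stop, comma, dash and underscore only.
function isAllowedString(string $s): bool
{
    return preg_match('/^[\p{L}\p{N}.,_-]*$/u', $s) === 1;
}

var_dump(isAllowedString('Müller_2.0,beta-1')); // bool(true)
var_dump(isAllowedString('hello world'));       // bool(false): space is not in the class
var_dump(isAllowedString(''));                  // bool(true): * also accepts the empty string

// $ matches before a final newline; the D modifier makes it match only at the very end.
var_dump(preg_match('/^[\p{L}\p{N}.,_-]*$/u', "abc\n"));  // int(1)
var_dump(preg_match('/^[\p{L}\p{N}.,_-]*$/uD', "abc\n")); // int(0)
```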

regular expression for French characters

I need a function or a regular expression to validate strings which contain alpha characters (including French ones), minus sign (-), dot (.) and space (excluding everything else)
Thanks
/^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/i
The /i modifier makes the match case-insensitive, which keeps the class shorter. If you don't want to allow empty strings, change * to +.
Simplified solution:
/^[a-zA-ZÀ-ÿ-. ]*$/
Explanation:
^ Start of the string
[ ... ]* Zero or more of the following:
a-z lowercase letters
A-Z uppercase letters
À-ÿ accented Latin-1 letters, lowercase and uppercase, including letters with an umlaut (the range also contains × and ÷)
- dashes
. periods
(space) spaces
$ End of the string
Try:
/^[\p{L}-. ]*$/u
This says:
^ Start of the string
[ ... ]* Zero or more of the following:
\p{L} Unicode letter characters
- dashes
. periods
(space) spaces
$ End of the string
/u Enable Unicode mode in PHP
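A small sketch of the \p{L} approach in PHP follows. The sample strings are invented, and the hyphen is moved to the end of the class here so that it cannot be read as a range operator; that reordering is a stylistic assumption, not a change in meaning.

```php
<?php
// Letters of any script, period, space and hyphen (hyphen placed last in the class).
$pattern = '/^[\p{L}. -]*$/u';

$samples = ['Jean-François Lefèvre', 'Œuvre de St. Exupéry', 'Éloïse2', 'a_b'];
foreach ($samples as $sample) {
    // The first two samples match; the digit and the underscore make the last two fail.
    printf("%s => %d\n", $sample, preg_match($pattern, $sample));
}
```

\p{L} does not match combining marks, so input stored in decomposed form (a base letter followed by a combining accent) would need normalization first or \p{M} added to the class.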
The character class I've been using is the following:
[\wÀ-Üà-øoù-ÿŒœ]. This covers a slightly larger character set than only French, but excludes a large portion of Eastern European and Scandinavian diacriticals and letters that are not relevant to French. I find this a decent compromise between brevity and exclusivity.
To match/validate complete sentences, I use this expression:
[\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ], which includes punctuation and French style quotation marks.
Simply use the following pattern:
/[\u00C0-\u017F]/
This regex matches the whole French text of Cyrano de Bergerac (you will need to remove markup language characters first):
http://www.gutenberg.org/files/1256/1256-8.txt
^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+
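The \uXXXX escape used in the two patterns above is JavaScript-style syntax. As far as I know, PHP's PCRE does not accept it, and the equivalent there is \x{XXXX} together with the u modifier. The sketch below is an adaptation under that assumption; the trailing $ anchor and the sample strings are additions for illustration.

```php
<?php
// Latin-1 Supplement and Latin Extended-A letters, written with PCRE's \x{...} escape.
$accented = '/[\x{00C0}-\x{017F}]/u';

var_dump(preg_match($accented, 'garçon')); // int(1): ç is U+00E7
var_dump(preg_match($accented, 'garcon')); // int(0): ASCII only

// Sentence-level class adapted from the pattern above, anchored at both ends.
$sentence = '/^[0-9A-Za-z\x{00C0}-\x{017F} ,.;\'\-()\s:!?"]+$/u';

var_dump(preg_match($sentence, "C'est un roc ! C'est un pic ! C'est un cap !")); // int(1)
var_dump(preg_match($sentence, 'prix: 5€'));                                     // int(0): € is outside the class
```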
All French and Spanish accents
/^[a-zA-ZàâäæáãåāèéêëęėēîïīįíìôōøõóòöœùûüūúÿçćčńñÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ .-]*$/
This might suit:
/^[ a-zA-Z\xBF-\xFF\.-]+$/
It lets a few extra chars in, like ÷, but it handles quite a few of the accented characters.
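/[A-Za-z-\.\s]/u should work. The /u switch enables UTF-8 mode. Note that, as written, this class contains only ASCII letters, so accented characters are not matched.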
