PHP Regex for full names in a specific format - php

I'm trying to write a PHP function that validates names with a regex. A name may contain any number of spaces, apostrophes (') and hyphens (-). Only an uppercase letter may follow a space, but either an uppercase or a lowercase letter may follow - or '. The total length should be at most 50 characters and the name should end with a lowercase letter. Note that the uppercase letters are A to Z plus these characters:
ÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ
and the lowercase letters are a to z plus these characters:
éçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß
Each word (the text between one space, ' or - and the next) must contain at least 2 characters. The name must start with an uppercase letter and end with a lowercase letter, and within a word no uppercase letter is allowed except the first one.
Examples of acceptable names are:
Adam Klsld
Adam'odskdl
Adam'Ddlsl
Ùdam-ddkkdk
Addssd-Ddsdsd
I've tried a lot of patterns; here is my latest attempt, the one still in my PHP file (the others were deleted in the chaos of unsuccessful attempts). I match with the mb_ereg function, so this is a POSIX ERE:
([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((^[\'\-\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)*
(This is not necessarily my best attempt, but I thought it might help and give an idea of how much of a dork I am.)

I wouldn't exactly suggest you use this... but I think this does what you want?
^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['\-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$
Here it is in a non-code block so you can see how insane it is... think it strips some characters here though:
^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$

Is this Regex answering what you need to check ?
(You'll have to add the weird characters inside each brackets of course).

You can use this to avoid accented characters issue:
$pattern = "~^[\p{Lu}ß]\p{Ll}*+(?>(?> [\p{Lu}ß]|['-]\p{L})\p{Ll}*+)*$~u";
if(preg_match($pattern, $name)) { ...
Or for a more specific set of characters:
$pattern = "~(?(DEFINE)(?<Up>[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]))
(?(DEFINE)(?<Lo>[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]))
^\g<Up>\g<Lo>*+(?>(?>\h\g<Up>|['-]\g<Up>?+\g<Lo>)\g<Lo>*+)*+$~ux";
if (preg_match($pattern, $name, $matches)) { ...
or the same in a shorter way:
$pattern = "~(?(DEFINE)(?<Up>[A-ZÀ-ÖØ-ÝŸßŒ]))
(?(DEFINE)(?<Lo>[a-zà-öø-ýÿßœ]))
^\g<Up>\g<Lo>*+(?>(?>\h\g<Up>|['-]\g<Up>?+\g<Lo>)\g<Lo>*+)*+$~ux";
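Neither pattern above checks the 50-character limit from the question. A minimal sketch, assuming the limit means "at most 50 characters" and using a hypothetical helper name isValidName(), could pair the first pattern with a separate length test. It still does not enforce the two-characters-per-word rule or the lowercase ending:

```php
<?php
// Hypothetical helper: combines the \p{Lu}/\p{Ll} pattern from the answer
// with a separate length check (assumed here to mean "at most 50 characters").
function isValidName(string $name): bool
{
    $pattern = "~^[\p{Lu}ß]\p{Ll}*+(?>(?> [\p{Lu}ß]|['-]\p{L})\p{Ll}*+)*$~u";

    // With the u modifier, "." counts code points rather than bytes.
    $withinLength = preg_match('~^.{1,50}$~su', $name) === 1;

    return $withinLength && preg_match($pattern, $name) === 1;
}

var_dump(isValidName("Adam Klsld")); // bool(true)
var_dump(isValidName("adam klsld")); // bool(false)
```

Keeping the length test apart from the structural pattern means either rule can be changed without touching the other.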

Related

Unique regex for name validation

I want to check whether a name is valid with a regex in PHP, but I need a single regex that allows:
Letters (upper and lowercase)
Spaces (max 2)
But there can't be a space after space..
For example:
Name -> Dennis Unge Shishic (valid)
Name -> Denis(space)(space) (not valid)
Hope you guys understand me, thank you :)
First, it's worth mentioning that having such restrictive rules for the names of persons is a very bad idea. However, if you must, a simple character class like this will limit you to just uppercase and lowercase English letters:
[A-Za-z]
To match one or more, you need to add a + after it. So, this will match the first part of the name:
[A-Za-z]+
To capture a second name, you just need to do the same thing preceded by a space, so something like this will capture two names:
[A-Za-z]+ [A-Za-z]+
To make the second name optional, you need to surround it by parentheses and add a ? after it, like this:
[A-Za-z]+( [A-Za-z]+)?
And to add a third name, you just need to do it again:
[A-Za-z]+( [A-Za-z]+)? [A-Za-z]+
Or, you could specify that the latter names can repeat between 1 and 2 times, like this:
[A-Za-z]+( [A-Za-z]+){1,2}
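The patterns above are shown without anchors, so preg_match() would succeed on any substring that fits. A minimal sketch of the last pattern wrapped in ^ and $ follows; the anchors and the function name isTwoOrThreeNames() are additions for illustration, not part of the answer:

```php
<?php
// Hypothetical wrapper around the {1,2} pattern built up above.
// The ^ and $ anchors are an addition: without them preg_match()
// succeeds as soon as any part of the string matches.
function isTwoOrThreeNames(string $name): bool
{
    return preg_match('/^[A-Za-z]+( [A-Za-z]+){1,2}$/', $name) === 1;
}

var_dump(isTwoOrThreeNames("Dennis Unge Shishic")); // bool(true)
var_dump(isTwoOrThreeNames("Denis  "));             // bool(false)
```

Because {1,2} requires at least one additional name, a single word such as "Dennis" is rejected; changing the quantifier to {0,2} would accept it.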
To make the resulting code easy to understand and maintain, you could use two regexes: one that must match, checking that only the allowed characters are used (^[a-zA-Z ]+$), and another that must not match, checking that there are no two or more adjacent spaces (( ){2,}).
Try the following code. Change $input_line to whatever you want to test and the validation result is printed:
<?php
$input_line = "Abhishek Gupta";
// Anchored, so the whole string must consist of letters and spaces only.
preg_match("/^[a-zA-Z ]+$/", $input_line, $nameMatch);
// Finds any run of two or more whitespace characters.
preg_match("/\s{2,}/", $input_line, $multiSpace);
var_dump($nameMatch);
var_dump($multiSpace);
if (count($nameMatch) > 0) {
    if (count($multiSpace) > 0) {
        echo "Invalid Name Multispace";
    } else {
        echo "Valid Name";
    }
} else {
    echo "Invalid Name";
}
?>
A regex for one to three words consisting of only Unicode letters in PHP looks like
/^\p{L}+(?:\h\p{L}+){1,2}\z/u
Description:
^ - string start
\p{L}+ - one or more Unicode letters
(?:\h\p{L}+){1,2} - one or two sequences of a horizontal whitespace followed with one or more Unicode letters
\z - end of string, even disallowing trailing newline that a dollar anchor allows.
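The difference between \z and $ described above only shows up when the input ends with a newline. A small sketch, with variable names chosen here for illustration:

```php
<?php
// Same pattern twice, differing only in the end-of-string anchor.
$withDollar = '/^\p{L}+(?:\h\p{L}+){1,2}$/u';
$withZ      = '/^\p{L}+(?:\h\p{L}+){1,2}\z/u';

$input = "Dennis Unge\n"; // note the trailing newline

var_dump(preg_match($withDollar, $input)); // int(1): $ tolerates a final newline
var_dump(preg_match($withZ, $input));      // int(0): \z requires the true end
```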

Detect cloth sizes with regex

I am trying to detect with regex, strings that have a pattern of {any_number}{x-}{large|medium|small} for a site with clothing I am building in PHP.
I have managed to match the sizes against a preconfigured set of strings by using:
$searchFor = '7x-large';
$regex = '/\b'.$searchFor.'\b/';
//Basically, it's finding the letters
//surrounded by a word-boundary (the \b bits).
//So, to find the position:
preg_match($regex, $opt_name, $match, PREG_OFFSET_CAPTURE);
I even managed to detect weird sizes like 41 1/2 with regex, but I am not an expert and I am having a hard time on this.
I have come up with
preg_match("/^(?<![\/\d])([xX\-])(large|medium|small)$/", '7x-large', $match);
but it won't work.
Could you pinpoint what I am doing wrong?
It sounds like you also want to match half sizes. You can use something like this:
$theregex = '~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~';
if (preg_match($theregex, $yourstring, $m)) {
    // Yes! It matches!
    // The match is $m[0]
} else {
    // nah, no luck...
}
Note that the (?i) makes it case-insensitive.
This also assumes you are validating that an entire string conforms to the pattern. If you want to find the pattern as a substring of a larger string, remove the ^ and $ anchors:
$theregex = '~(?i)\d+(?:\.5)?x-(?:large|medium|small)~';
Look at the specification you have and build it up piece by piece. You want "{any_number}{x-}{large|medium|small}".
"{any_number}" would be \d+. This does not allow fractional numbers such as 12.34, but the question does not specify whether they are required.
"{x-}" is a simple string x-
"{large|medium|small}" is a choice between three alternatives large|medium|small.
Joining the pieces together gives \d+x-(large|medium|small). Note the parentheses around the alternation; without them the expression would be interpreted as (\d+x-large)|medium|small.
You mention "weird sizes like 41 1/2" but without specifying how "weird" the numbers to be matched are. You need a precise specification of what you include in "weird" before you can extend the regular expression.
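The piece-by-piece construction can be sketched in PHP as below. The ^ and $ anchors, the i modifier and the function name matchSize() are assumptions added for illustration; drop the anchors to search inside a longer string.

```php
<?php
// Hypothetical helper that assembles the pattern from its three parts
// and returns the matched size word in lowercase, or null if no match.
function matchSize(string $label): ?string
{
    $number = '\d+';                  // {any_number}
    $prefix = 'x-';                   // {x-}
    $size   = '(large|medium|small)'; // {large|medium|small}, grouped

    $regex = '/^' . $number . $prefix . $size . '$/i';

    return preg_match($regex, $label, $m) === 1 ? strtolower($m[1]) : null;
}

var_dump(matchSize('7x-large')); // string(5) "large"
var_dump(matchSize('x-large'));  // NULL, the number is required
```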

Explode UTF8 string regarding to uppercase or numeric characters

As in this question, I can split strings that include uppercase letters like this:
function splitAtUpperCase($string){
return preg_replace('/([a-z0-9])?([A-Z])/','$1 $2',$string);
}
$string = 'setIfUnmodifiedSince';
echo splitAtUpperCase($string);
Output is "set If Unmodified Since"
But I need some modification:
That code snippet doesn't handle strings containing these characters: ÇÖĞŞÜİ. I don't want to transliterate the characters, because then the word loses its meaning; I need to keep the UTF-8 characters. That code turns "HereÇonThen" into "HereÇon Then".
I also don't want to split uppercase abbreviations. If word is "IKnowYouWillComeASAPHere" I need it to be converted to "I Know You Will Come ASAP Here"
Don't explode if all letters are uppercase. Like "DONTCOMEHERE"
Explode also numeric values. "Before2013ends" to "Before 2013 ends"
Explode if first character is hash key (#).
cases and expected results
"comeHEREtomorrow" => "come HERE tomorrow"
"KissYouTODAY" => "kiss you TODAY"
"comeÜndeHere" => "come Ünde Here"
"NEVERSAYIT" => "NEVERSAYIT"
"2013willCome" => "2013 will Come"
"Before2013ends" => "Before 2013 ends"
"IKnowThat" => "I Know That"
"#whatiknow" => "# whatiknow"
For these cases I currently use a series of str_replace operations. I'm looking for a short solution that doesn't need too many for loops to check the words. A preg_replace or something similar would be better, if possible.
Edit: Anyone can try his solution by changing convert function inside this PHP fiddle: http://ideone.com/9gajZ8
/([[:lower:][:digit:]])?([[:upper:]]+)/u should do it.
Here /u is used for Unicode characters, and ([[:upper:]]+) matches a sequence of uppercase letters.
Note: the case of a letter depends on the character set you are using.
Some notes:
Use Unicode properties to search for upper-case & lower-case letters (and even title-case ones, f.ex. Dž Lj Nj Dz)
comeHEREtomorrow & IKnowThat won't work with one method, until you use some dictionaries to find exact words.
Because if you want to translate comeHEREtomorrow as come HERE tomorrow, IKnowThat will be IK now That (or even IK now T hat);
And if you want to translate IKnowThat as I Know That, comeHEREtomorrow will be come H E R E tomorrow
My solution: http://ideone.com/oALyTo (excludes non-letter & non-number characters)
Well, I matched all of your test cases, but I still don't think it's a good solution. (One of the few flaws in test driven design).
I took a slightly different approach. Instead of trying to write a regular expression for what the place between a word should look like, I wrote a regular expression that looks for everything that apparently is a word, and then imploded.
function convert($keyword) {
$wResult = preg_match_all('/(^I|[[:upper:]]{2,}|[[:upper:]][[:lower:]]*|[[:lower:]]+|\d+|#)/u', $keyword, $matches);
return implode(' ',$matches[0]);
}
As you can see, this is what I decided qualified as a word:
^I A capital I at the beginning of the string. Break point: Icons.
[[:upper:]]{2,} Consecutive capitals. Break Point: WellIKnowThat
[[:upper:]][[:lower:]]* A single Capital followed by some lower case letters
[[:lower:]]+ A string of lower case letters
\d+ A string of digits
# A literal #
It's not perfect - there are still many breakpoints. You can continue to refine these word definitions, but frankly, there's always going to be an edge case you can't catch. Then you wind up slowly expanding this regular expression until it's totally unmanageable. You could try using a dictionary, but that breaks down eventually, too. What do you do with "whirlwind"? Or "ITan"? Is that "IT an", or "I Tan"? Case in point? Here it is after I tried to catch some of my errors. It's getting so huge, and it's still trivial to come up with strings it breaks on. This function is all about degrees - how much time is it worth spending to teach your algorithm all the funny points of all the world's languages?
EDIT: After some work, and deciding that I could be separated out as its own word if and only if it is followed immediately by one capital letter and one lowercase letter, I've updated my attempt at an answer.
function convert($keyword, $debug = false) {
$wResult = preg_match_all('/I(?=[[:upper:]][[:lower:]])|[[:upper:]]{2,}|[[:upper:]][[:lower:]]*|[[:lower:]]+|\d+|#/u', $keyword, $matches);
if($debug){
var_dump($matches);
var_dump($matches[0]);
var_dump(implode(' ',$matches[0]));
}
return implode(' ',$matches[0]);
}
I also added some new test cases:
convert("Icons") == "Icons"
convert("WellIKnowThat") == "Well I Know That"
convert("ITan") == "I Tan"
convert("whirlwind") == "whirlwind"
I think this is about as good as it's going to get today. The final set of "Word Definitions" in order of preference, is:
Upper case I, provided it's followed by an upper case letter and a lower case letter:I(?=[[:upper:]][[:lower:]])
Two or more consecutive upper case letters: [[:upper:]]{2,}
A single uppercase Letter, followed by as many Lower case letters as possible: [[:upper:]][[:lower:]]*
one or more consecutive lower case letters: [[:lower:]]+
One or more consecutive digits: \d+
A literal pound symbol: #
I've added another word definition, a test case, and refined the testing fiddle. The new word definition matches the rule for I, but with A - the only other one letter word in the English Language.
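As a rough check of the word definitions listed above, here is a sketch of the same approach written with Unicode properties (\p{Lu}, \p{Ll}) in place of the POSIX classes. That substitution and the function name splitWords() are choices made for this example, and only the rule for I is included, not the later rule for A:

```php
<?php
// Hypothetical variant of convert(): finds everything that looks like a
// word, in order of preference, and joins the pieces with single spaces.
// Characters that match none of the alternatives are silently dropped.
function splitWords(string $keyword): string
{
    $pattern = '/I(?=\p{Lu}\p{Ll})' // "I" before a capitalised word
             . '|\p{Lu}{2,}'        // run of two or more capitals
             . '|\p{Lu}\p{Ll}*'     // one capital plus lowercase letters
             . '|\p{Ll}+'           // run of lowercase letters
             . '|\d+'               // run of digits
             . '|#/u';              // literal hash sign

    preg_match_all($pattern, $keyword, $matches);

    return implode(' ', $matches[0]);
}

echo splitWords("comeÜndeHere"), "\n";   // come Ünde Here
echo splitWords("Before2013ends"), "\n"; // Before 2013 ends
```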
You need a Unicode regex:
\p{Lu} for uppercase and \p{Ll} for lowercase
Hence, your usage will look like this (the u modifier is needed so the pattern and subject are treated as UTF-8):
/([\p{Ll}0-9])?([\p{Lu}])/u

PHP preg_match with regex: only single hyphens and spaces between words continue

I was trying to write an regex that allows single hyphens and single spaces only within words but not at the beginning or at the end of the words.
I thought I have this sorted from the answer I got yesterday, but I just realised there is small error which I don't quite understand,
Why it won't accept the inputs like,
'forum-category-b forum-category-a'
'forum-category-b Counter-terrorism'
'forum-category-a Preventing'
'forum-category-a Preventing Violent'
'forum-category-a International-Research-and-Publications'
'International-Research-and-Publications forum-category-b forum-category-a'
but it takes,
'forum-category-b'
'Counter-terrorism forum-category-a'
'Preventing forum-category-a'
'Preventing Violent forum-category-a'
'International-Research-and-Publications forum-category-b'
Why is that? How can I fix it? Below is the regex with the initial test, but ideally it should accept all the input combinations above,
$aWords = array(
'a',
'---stack---over---flow---',
' stack over flow',
'stack-over-flow',
'stack over flow',
'stacoverflow'
);
foreach($aWords as $sWord) {
if (preg_match('/^(\w+([\s-]\w+)?)+$/', $sWord)) {
echo 'pass: ' . $sWord . "\n";
} else {
echo 'fail: ' . $sWord . "\n";
}
}
and reject inputs like these:
---stack---over---flow---
stack-over-flow- stack-over-flow2
stack over flow
Thanks.
Your pattern does not do what you want. Let's break it apart:
^(\w+([\s-]\w+)?)+$
It matches strings that consist solely of one or more sequences of the pattern:
\w+([\s-]\w+)?
...which is a sequence of word characters, followed optionally by one other sequence of word characters, separated by one space or dash character.
In other words, your pattern searches for strings like:
xxx-xxxyyy-yyyzzz zzz
...but you intended to write a pattern that would find:
xxx-xxxxxx-xxxxxx yyy
In your examples, this one is matched:
Counter-terrorism forum-category-a
...but it is interpreted as the following sequence:
(Counter(-terroris)) (m( foru)) (m(-categor)) (y(-a))
As you can see, the pattern did not really find the words you are looking for.
This example is not matched:
forum-category-a Preventing Violent
...since the pattern cannot form groups of "word characters, space-or-dash, word-characters" when it encounters a single word character followed by space or dash:
(forum(-categor)) (y(-a)) <Mismatch: Found " " but expected "\w">
If you would add another character to "forum-category-a", say "forum-category-ax", it would match again, since it could split at the "ax":
(forum(-categor)) (y(-a)) (x( Preventin)) (g( Violent))
What you are actually interested in is a pattern like
^(\w+(-\w+)*)(\s\w+(-\w+)*)*$
...which would find a sequence of words that may contain dashes, separated by spaces:
(forum(-category)(-a)) ( Preventing) ( Violent)
By the way, I tested this using a Python script, and while trying to match your pattern against the example string "International-Research-and-Publications forum-category-b forum-category-a", the regular expression engine seemed to run into an infinite loop...
import re
expr = re.compile(r'^(\w+([\s-]\w+)?)+$')
expr.match('International-Research-and-Publications forum-category-b forum-category-a')
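The corrected pattern from this answer, ^(\w+(-\w+)*)(\s\w+(-\w+)*)*$, can be tried against the inputs from the question with a short PHP sketch. The function name isHyphenatedWordList() is made up for this example:

```php
<?php
// Hypothetical wrapper around the corrected pattern: words that may contain
// single hyphens internally, separated by exactly one whitespace character.
function isHyphenatedWordList(string $input): bool
{
    return preg_match('/^(\w+(-\w+)*)(\s\w+(-\w+)*)*$/', $input) === 1;
}

$samples = [
    'forum-category-a Preventing Violent', // expected to pass
    '---stack---over---flow---',           // expected to fail
];

foreach ($samples as $sample) {
    echo (isHyphenatedWordList($sample) ? 'pass: ' : 'fail: ') . $sample . "\n";
}
```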
the part of your pattern ([\s-]\w+)? is the issue. It's only allowing for one repetition (the trailing ?). Try changing the last ? to * and see if that helps.
Nope, I still believe that's the problem. The original pattern is looking for "word" or "word[space_hyphen]word" repeated 1+ times. Which is weird because the pattern should fall within another match. But switching the question mark worked for me.
There should be only one answer to this problem:
/^((?<=\w)[ -]\w|[^ -])+$/
There is only one rule as stated, \w[ -]\w, and that's it. It applies at per-character granularity and cannot be anything else. Add the [^ -] for the rest.

Regex for names

Just starting to explore the 'wonders' of regex. Being someone who learns from trial and error, I'm really struggling because my trials are throwing up a disproportionate amount of errors... My experiments are in PHP using ereg().
Anyway. I work with first and last names separately but for now using the same regex. So far I have:
^[A-Z][a-zA-Z]+$
Any length string that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can pretty much occur anywhere.
Hyphenated Names (Worthington-Smythe)
Names with Apostrophes (D'Angelo)
Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.
Joint Names (Ben & Jerry)
Maybe there's some other way a name can be that I'm not thinking of, but I suspect if I can get my head around this, I can add to it. I'm pretty sure there will be instances where more than one of these situations comes up in one name.
So, I think the bottom line is to have my regex also accept a space, hyphens, ampersands and apostrophes - but not at the start or end of the name to be technically correct.
This regex is perfect for me.
^([ \u00c0-\u01ffa-zA-Z'\-])+$
It works fine in php environments using preg_match(), but doesn't work everywhere.
It matches Jérémie O'Co-nor so I think it matches all UTF-8 names.
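One caveat: as far as I know, the PCRE engine behind preg_match() does not accept \uXXXX escapes; its equivalent is \x{XXXX} together with the u modifier. A minimal sketch of the same character class in that notation, with a hypothetical function name:

```php
<?php
// Hypothetical PCRE spelling of the class above: \x{00C0}-\x{01FF} replaces
// \u00c0-\u01ff, and the u modifier switches the engine to UTF-8 mode.
// Note the range also contains the non-letters × (U+00D7) and ÷ (U+00F7).
function isLatinName(string $name): bool
{
    return preg_match('/^[ \x{00C0}-\x{01FF}a-zA-Z\'\-]+$/u', $name) === 1;
}

var_dump(isLatinName("Jérémie O'Co-nor")); // bool(true)
var_dump(isLatinName("John3"));            // bool(false)
```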
Hyphenated Names (Worthington-Smythe)
Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).
^[A-Z][-a-zA-Z]+$
Names with Apostrophes (D'Angelo)
A naive way of doing this would be as above, giving:
^[A-Z][-'a-zA-Z]+$
Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:
^[A-Z]'?[-a-zA-Z]+$
Which will allow a possible single apostrophe in the second position.
Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.
Here I'd be tempted to just do our naive way again:
^[A-Z]'?[- a-zA-Z]+$
A potentially better way might be:
^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$
Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.
Joint Names (Ben & Jerry)
At this point you're not looking at single names anymore?
Anyway, as you can see, regexes have a habit of growing very quickly...
THE BEST REGEX EXPRESSIONS FOR NAMES:
I will use the term special character to refer to the following three characters:
Dash -
Apostrophe '
Dot .
Spaces and special characters can not appear twice in a row (e.g.: -- or '. or .. )
Trimmed (No spaces before or after)
You're welcome ;)
Mandatory single name, WITHOUT spaces, WITHOUT special characters:
^([A-Za-z])+$
Sierra is valid, Jack Alexander is invalid (has a space), O'Neil is invalid (has a special character)
Mandatory single name, WITHOUT spaces, WITH special characters:
^[A-Za-z]+(((\'|\-|\.)?([A-Za-z])+))?$
Sierra is valid, O'Neil is valid, Jack Alexander is invalid (has a space)
Mandatory single name, optional additional names, WITH spaces, WITH special characters:
^[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*$
Jack Alexander is valid, Sierra O'Neil is valid
Mandatory single name, optional additional names, WITH spaces, WITHOUT special characters:
^[A-Za-z]+((\s)?([A-Za-z])+)*$
Jack Alexander is valid, Sierra O'Neil is invalid (has a special character)
SPECIAL CASE
Many modern smart devices add spaces at the end of each word, so in my applications I allow unlimited number of spaces before and after the string, then I trim it in the code behind. So I use the following:
Mandatory single name + optional additional names + spaces + special characters:
^(\s)*[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*(\s)*$
Add your own special characters
If you wish to add your own special characters, let's say an underscore _ this is the group you need to update:
(\'|\-|\.)
To
(\'|\-|\.|\_)
PS: If you have questions comment here and I will receive an email and respond ;)
While I agree with the answers saying you basically can't do this with regex, I will point out that some of the objections (internationalized characters) can be resolved by using UTF strings and the \p{L} character class (matches a unicode "letter").
security tip: make sure to validate the size of the string before this step to avoid DoS attack that will bring down your system by sending very long charsets.
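A minimal sketch of that ordering, checking the length first and only then running a \p{L}-based pattern. The 200-byte cap, the pattern and the name looksLikeName() are all assumptions picked for illustration, not recommendations from the answers:

```php
<?php
// Hypothetical helper: rejects oversized input before the regex ever runs,
// then applies a simple Unicode-letter pattern with single separators.
function looksLikeName(string $input, int $maxBytes = 200): bool
{
    // strlen() counts bytes, which is what matters for bounding the work.
    if (strlen($input) > $maxBytes) {
        return false;
    }

    return preg_match('/^\p{L}+(?:[ \'-]\p{L}+)*$/u', $input) === 1;
}

var_dump(looksLikeName("Jérémie O'Connor"));     // bool(true)
var_dump(looksLikeName(str_repeat('a', 1000)));  // bool(false)
```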
Check this out:
^(([A-Za-z]+[,.]?[ ]?|[a-z]+['-]?)+)$
You can test it here : https://regex101.com/r/mS9gD7/46
I don't really have a whole lot to add to a regex that takes care of names because there are already some good suggestions here, but if you want a few resources for learning more about regular expressions, you should check out:
Regex Library's Cheat Sheet
Another cheat sheet
A regex tutorial on the DevNetwork forums: Part 1 and Part 2
PHP builder's tutorial
And if you ever need to do regex for JavaScript (it's a slightly different flavor), try JavaScript Kit, or this resource, or Mozilla's reference
I second the 'give up' advice. Even if you consider numbers, hyphens, apostrophes and such, something like [a-zA-Z] still wouldn't catch international names (for example, those having šđčćž, or Cyrillic alphabet, or Chinese characters...)
But... why are you even trying to verify names? What errors are you trying to catch? Don't you think people know to write their name better than you? ;) Seriously, the only thing you can do by trying to verify names is to irritate people with unusual names.
Basically, I agree with Paul... You will always find exceptions, like di Caprio, DeVil, or such.
Remarks on your message: in PHP, ereg is generally seen as obsolete (slow, incomplete) in favor of preg (PCRE regexes).
And you should try some regex tester, like the powerful Regex Coach: they are great to test quickly REs against arbitrary strings.
If you really need to solve your problem and aren't satisfied with above answers, just ask, I will give a go.
This worked for me:
[ ]+[a-z]{2,3}[ ]+[a-z]*|[\w'-]*
This regex will correctly match names such as the following:
jean-claude van damme
nadine arroyo-rodriquez
wayne la pierre
beverly d'angelo
billy-bob thornton
tito puente
susan del rio
It will group "van damme", "arroyo-rodriquez" "d'angelo", "billy-bob", etc. as well as the singular names like "wayne".
Note that it does not test that the grouped stuff is actually a valid name. Like others said, you'll need a dictionary for that. Also, it will group numbers, so if that's an issue you may want to modify the regex.
I wrote this to parse names for a MapReduce application. All I wanted was to extract words from the name field, grouping together the del foo and la bar and billy-bobs into one word to make the key-value pair generation more accurate.
^[A-Z][a-zA-Z '&-]*[A-Za-z]$
Will accept anything that starts with an uppercase letter, followed by zero or more of any letter, space, hyphen, ampersand or apostrophes, and ending with a letter.
See this question for more "name-detection" related material: regex to match a maximum of 4 spaces
Basically, you have a problem in that, there are effectively no characters in existence that can't form a legal name string.
If you are still limiting yourself to words without ä, ü, æ, ß and other non-ASCII characters, get yourself a copy of the Unicode character tables and realise how many valid characters there are that your simple regex would miss.
To add multiple dots in the username use this Regex:
^[a-zA-Z][a-zA-Z0-9_]*\.?[a-zA-Z0-9_\.]*$
String length can be set separately.
You can easily neutralize the whole matter of whether letters are upper or lowercase -- even in unexpected or uncommon locations -- by converting the string to all upper case using strtoupper() and then checking it against your regex.
/([\u00c0-\u01ffa-zA-Z'\-]+[ ]?[*]?[\u00c0-\u01ffa-zA-Z'\-]*)+/;
Try this . You can also force to start with char using ^,and end with char using $
To improve on daan's answer:
^([\u00c0-\u01ffa-zA-Z]+\b['\-]{0,1})+\b$
This only allows a single occurrence of a hyphen or apostrophe between a-z and valid Unicode characters.
It also backtracks to make sure there is no hyphen or apostrophe at the end of the string.
^[A-Z][a-z]*(([,.] |[ '-])[A-Za-z][a-z]*)*(\.?)( [IVXLCDM]+)?$
For complete details, please visit THIS post. This regex doesn't allow ampersands.
if you add spaces then "He went to the market on Sunday" would be a valid name.
I don't think you can do this with a regex, you cannot easily detect names from a chunk of text using a regex, you would need a dictionary of approved names and search based on that. Any names not on the list wouldn't be detected.
I have used this, because the name can be part of a file path.
//http://support.microsoft.com/kb/177506
foreach (array('/', '\\', ':', '*', '?', '<', '>', '|') as $char) {
    if (strpos($name, $char) !== false) {
        die("Not allowed char: '$char'");
    }
}
I ran into this same issue, and like many others that have posted, this isn't a 100% fool proof expression, but it's working for us.
/([\-'a-z]+\s?){2,4}/
This will check for any hyphens and/or apostrophes in either the first and/or last name as well as checking for a space between the first and last names. The last part is a little magic that will check for between 2 and 4 names. If you tend to have a lot of international users that may have 5 or even 6 names, you can change that to 5 or 6 and it should work for you.
I think "/^[a-zA-Z']+$/" is not enough, because it lets a single letter pass. We can restrict the length by replacing the + with {4,20}, which means 4 to 20 characters.
I've come up with this RegEx pattern for names:
/^([a-zA-Z]+[\s'.]?)+\S$/
It works. I think you should use it too.
It matches only names or strings like:
Dr. Shaquil O'Neil Armstrong Buzz-Aldrin
It won't match strings with 2 or more spaces like:
John Paul
It won't match strings with ending spaces like:
John Paul
The text above has an ending space. Try highlighting or selecting the text to see the space
Here's what I use to learn and create your own regex patterns:
RegExr: Learn, Build and Test RegEx
Try this: /^([A-Z][a-z]([ ][a-z]+)([ '-]([&][ ])?[A-Z][a-z]+)*)$/
Demo: http://regexr.com/3bai1
Have a nice day !
you can use this below for names
^[a-zA-Z'-]{3,}\s[a-zA-Z'-]{3,}$
^ start of the string
$ end of the string
\s space
[a-zA-Z'-]{3,} will accept any name part with a length of 3 characters or more, including names with ' or - like jean-luc
So in our case it will only accept names in 2 parts separated by a space
In case of multiple first names, you can add a \s to the first character class (keep the hyphen last so it is not read as a range):
^[a-zA-Z'\s-]{3,}\s[a-zA-Z'-]{3,}$
The following regex is simple and useful for proper names (towns, cities, first names, last names), allowing all international letters without requiring a Unicode-aware regex engine.
It is flexible - you can add/remove characters you want in the expression (focusing on characters you want to reject rather than include).
^(?:(?!^\s|[ \-']{2}|[\d\r\n\t\f\v!"#$%&()*+,\.\/:;<=>?#[\\\]^_`{|}~€‚ƒ„…†‡ˆ‰‹‘’“”•–—˜™›¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½¾¿×÷№′″ⁿ⁺⁰‱₁₂₃₄]|\s$).){1,50}$
Regex matches: from 1 to 50 international letters separated by single delimiter (space -')
Regex rejects: empty prefix/suffix, consecutive delimiters (space - '), digits, new line, tab, limited list of extended ASCII characters
This is what I use for full name:
$pattern = "/^((\p{Lu}{1})\S(\p{Ll}{1,20})[^0-9])+[-'\s]((\p{Lu}{1})\S(\p{Ll}{1,20}))*[^0-9]$/u";
Supports all languages
Common names("Jane Doe", "John Doe")
Useful for compound names ("Marie-Josée Côté-Rochon", "Bill O'reilly")
Excludes digits (0-9)
Only accepts uppercase at the beginning of names
First and last names from 2-21 characters
Adding trim() to remove whitespace
Does not accept ("John J. William", "Francis O'reilly Jr. III")
Must use full names, not: ("John", "Jane", "O'reilly", "Smith")
Edit:
It seems that each [^0-9] in the pattern above required at least a fourth character in the first and last names.
Therefore three-letter names could not be matched.
Here is the edited regular expression:
$pattern = "/^(\p{Lu}{1}\S\p{Ll}{1,20}[-'\s]\p{Lu}{1}\S\p{Ll}{1,20})+([^\d]+)$/u";
Give up. Every rule you can think of has exceptions in some culture or other. Even if that "culture" is geeks who like to legally change their names to "37eet".
Try this regex:
^[a-zA-Z'\s.-]{3,20}\s[a-zA-Z'.-]{3,20}$
Aomine's answer was quite helpful, I tweaked it a bit to include:
Names with dots (middle): Jane J. Samuels
Names with dots at the end: John Simms Snr.
Also the name will accept minimum 2 letters, and a min. of 2 letters for surname but no more than 20 for each (so total of 40 characters)
Successful Test cases:
D'amalia Jones
David Silva Jnr.
Jay-Silva Thompson
Shay .J. Muhanned
Bob J. Iverson
