I'm trying to find a regexp that covers a lot of cases. The one I'm using now would be enough if it weren't for the many international names that contain special letters as well as hyphens.
The one I'm using now looks like this:
/^[A-Za-zåäöÅÄÖ\s\-\ ]*$/
It allows for hyphens and whitespace but it also allows them at the start or end of the string which I don't want to allow.
I need to modify this to allow:
Special letters such as éýÿüåäö etc. (preferably without having to write them all out manually)
Capital letter at the start of each new word
Whitespace between words
- hyphens between words, but not before or after the full string
It should not allow numbers, which it already doesn't. Since I haven't worked much with regex construction, I'm in the dark on how to achieve this. I've found a lot of solutions that cover one scenario or another, but not all of the ones I need. I would appreciate the assistance. The regex should work for PHP validation.
EDIT:
$fname = 'Scrooge Mc-Duck'; //Only example string
$fname = trim($fname);
if (!preg_match('/^\p{Lu}\p{Ll}+([ -]+\p{Lu}\p{Ll}+)*$/', $fname)) {
    $fnameErr = 'Invalid first name';
}
This outputs the error when using #npinti's solution.
Assuming that your regular expression engine supports Unicode character properties, you can use \p{L} to match any letter. So, to match a name, you could use ^\p{Lu}\p{Ll}+([ -]+\p{Lu}\p{Ll}+)*$.
This matches an upper case letter followed by one or more lower case letters. That can in turn be followed, zero or more times, by one or more spaces or hyphens and then another upper case letter with one or more lower case letters. The ^ and $ at the beginning and end make sure that the regular expression matches the entire string.
A demo of the regex can be viewed here.
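Since the question asks for PHP validation, here is a minimal sketch of how the pattern might be wired into preg_match. The helper name isValidName and the sample names are hypothetical. The u modifier is added so that PCRE treats the pattern and the subject as UTF-8; the sketch assumes the input (and the source file) is UTF-8 encoded.

```php
<?php
// Hypothetical helper, not part of the original answer.
// The u modifier makes \p{Lu} and \p{Ll} work on UTF-8 letters such as É or ö.
function isValidName(string $name): bool
{
    return preg_match('/^\p{Lu}\p{Ll}+([ -]+\p{Lu}\p{Ll}+)*$/u', trim($name)) === 1;
}

var_dump(isValidName('Émile Zola'));        // bool(true)
var_dump(isValidName('Anne-Marie Öberg'));  // bool(true)
var_dump(isValidName('-Anne'));             // bool(false): leading hyphen
var_dump(isValidName('anne'));              // bool(false): no capital letter
```

Note that each word needs at least two letters with this pattern, so a single-letter word such as an initial would be rejected.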
Related
I have the following requirements for validating an input field:
It should only contain alphabets and spaces between the alphabets.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.
For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter. (I replaced your [a-zA-Z] with the Unicode property for letters, \p{L}.) Then there can be a space followed by at least one letter, and this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem with ^(?!\s*$) in your expression is that the lookahead only fails when there is nothing but whitespace up to the end of the string, so leading whitespace followed by letters is still accepted. If you want to disallow leading whitespace, just remove the end-of-string anchor inside the lookahead: ^(?!\s)[-a-zA-Z ]*$. But this still allows the string to end with whitespace. To avoid that, add a lookbehind at the end of the string: ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think a lookaround is not needed for this task.
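To make the difference between these variants concrete, here is a small sketch that runs the original pattern, the lookaround version, and the \p{L} version against a few inputs. It uses PHP's preg_match purely for illustration (the question does not name a language), and the test inputs are made up.

```php
<?php
// The three patterns discussed above, written as PCRE literals.
$patterns = [
    'original'   => '/^(?!\s*$)[-a-zA-Z ]*$/',
    'lookaround' => '/^(?!\s)[-a-zA-Z ]*(?<!\s)$/',
    'letters'    => '/^\p{L}+(?: \p{L}+)*$/u',
];

// Hypothetical inputs: leading space, trailing space, valid, empty.
$inputs = [' abc', 'abc ', 'abc def', ''];

$results = [];
foreach ($patterns as $label => $pattern) {
    foreach ($inputs as $input) {
        $results[$label][$input] = preg_match($pattern, $input) === 1;
    }
}

// 'original' accepts ' abc' and 'abc '; the other two reject both.
// 'lookaround' still accepts the empty string; 'letters' does not.
var_dump($results);
```

The last point is one practical reason to prefer the version without lookarounds: it requires at least one letter by construction.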
This should work if you use it with String.matches method. I assume you want English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it would be equivalent to
[ \t\n\x0B\f\r]
Which includes space (32), horizontal tab (09), line feed (10), vertical tab (11), form feed (12), and carriage return (13).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"
Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses a negative lookahead and a negative lookbehind to disallow spaces at the beginning or at the end of the string, and it requires the match to cover the entire string.
I think the problem is there's a ? before the negation of white spaces, which means it is optional
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
at least one sequence of letters, then optional string with spaces but always ends with letters
I don't know if words in your accepted string can be separated by more than one space. If they can:
^[a-zA-Z]+(( )+[a-zA-Z]+)*$
If they can't:
^[a-zA-Z]+( [a-zA-Z]+)*$
The string must start with a letter (or several letters), not a space.
The string can contain several words, but every word after the first must have a space before it.
Hope I helped.
I haven't been able to figure this one out.
I need to match all of these strings by matching the word whole together with its surrounding underscores (in one regex statement):
1. whole_anything
2. anything_whole
3. anything_whole_anything
but it must NOT match this
4. anythingwholeanything
5. anything_wholeanything
6. anythingwhole_anything
That means: write a regex that matches the word whole only if it has an underscore before it, after it, or both, and not if there are no underscores.
The following
preg_match("/(whole_|_whole_|_whole)/", $string)
is not a solution ;)
2015/02/09 Edit: added conditions 5. and 6. for clarification
You could reduce the number of cases in the alternatives:
preg_match('/(_whole_?|whole_)/', $string);
If there's an underscore before, the underscore after is optional. But if there's no underscore before, the underscore after is required.
You can use a PHP variable to solve the problem of putting the word twice:
$word = preg_quote('whole');
preg_match("/(_{$word}_?|{$word}_)/", $string);
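As a quick check, the sketch below runs this pattern against the six sample strings from the question. It also includes a stricter variant, which is my own assumption and not taken from any answer here: it additionally requires that the side of whole without an underscore be the edge of the string. That matters for anything_wholeanything and anythingwhole_anything, which the question's later edit says must not match.

```php
<?php
$word = preg_quote('whole', '/');

// Pattern from the answer above.
$loose = '/(_' . $word . '_?|' . $word . '_)/';

// Hypothetical stricter variant (an assumption, not from the answer):
// "whole" must touch an underscore on at least one side, and the other
// side must be an underscore or the edge of the string.
$strict = '/^' . $word . '_|_' . $word . '(?:_|$)/';

$subjects = [
    'whole_anything', 'anything_whole', 'anything_whole_anything',
    'anythingwholeanything', 'anything_wholeanything', 'anythingwhole_anything',
];

$looseHits = $strictHits = [];
foreach ($subjects as $s) {
    $looseHits[$s]  = preg_match($loose, $s) === 1;
    $strictHits[$s] = preg_match($strict, $s) === 1;
    printf("%-24s loose=%-5s strict=%s\n", $s,
        var_export($looseHits[$s], true), var_export($strictHits[$s], true));
}
```

With these inputs the answer's pattern matches every string except anythingwholeanything, while the stricter variant matches only the first three.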
Another alternative. This way we check for the existence of a word boundary or _ both before and after whole, but we exclude the word whole by itself through a negative lookahead.
(?!\bwhole\b)((?:_|\b)whole(?:_|\b))
Regex Demo here.
You could exclude all alphanumeric characters before and after. Unfortunately you can't use \w because _ is considered a word character.
([^a-zA-Z0-9])_?whole_?([^a-zA-Z0-9])
That will exclude alphanumeric characters before and after from matching, and the underscore in front, behind, or both, is optional. If none exist, it can't match because it can't be preceded by a letter or number. You could change it to include special characters and the like.
I need a regex that tests whether a word is composed only of letters (alpha characters), white spaces, and periods (.). I need this for validating names that are entered in my database.
This is what I currently use:
preg_match('/^[\pL\s]+$/u',$foo)
It works fine for checking alpha characters and whitespaces, but rejects names with periods as well. I hope you guys can help as I have no idea how to use regex.
Add a dot to the character class so that it would match a literal dot also.
preg_match('/^[\p{L}.\s]+$/u',$foo)
OR
preg_match('/^[\pL.\s]+$/u',$foo)
Explanation:
^ Asserts that we are at the start.
[\pL.\s]+ Matches any character in the list one or more times. \pL matches any kind of letter from any language.
$ Asserts that we are at the end.
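A short sketch of the pattern in use, with made-up sample names. One thing it shows that the explanation above does not spell out: because every character in the class is treated alike, a string made only of dots or whitespace also passes.

```php
<?php
// Pattern from the answer; the u modifier lets \pL match non-ASCII letters.
$pattern = '/^[\pL.\s]+$/u';

// Hypothetical sample inputs.
$samples = ['John Q. Public', 'José M. Núñez', 'R2 D2', '...'];

$valid = [];
foreach ($samples as $name) {
    $valid[$name] = preg_match($pattern, $name) === 1;
    printf("%-16s %s\n", $name, $valid[$name] ? 'valid' : 'invalid');
}
// 'R2 D2' is invalid because of the digits; '...' is valid, which may or
// may not be what you want for a name field.
```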
The following regex should satisfy your condition:
preg_match('/^[a-zA-Z\s.]+$/',$foo)
In this link, you will find all the information you need to figure regex out with PHP. PHP Regex Cheat Sheet
Basically, if you want to add the period you add . :
preg_match('/^[\pL\s\.]+$/u',$foo)
Enjoy! :)
I need a regular expression to match only the first two words in a string (they may contain letters, numbers, commas and other punctuation, but not white spaces, tabs or new lines).
My solution is ([^\s]+\s+){2}. It matches something like '123 word' in '123 word, hello', but it doesn't work on a string with just two words and no whitespace after them.
What is the right regex for this task?
You have it almost right:
(\S+\s+\S+)
Assuming you don't need stronger control on what characters to use.
If you need to match either two words or just one word, you may use one of these:
(\S+\s+\S+|\S+)
(\S+(?:\s+\S+)?)
Instead of trying to match the words, you could split the string on whitespace with preg_split().
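Both approaches can be sketched side by side. The function names below are hypothetical, and the sample string is the one from the question.

```php
<?php
// Hypothetical helper: split on runs of whitespace and keep the first two pieces.
function firstTwoWordsBySplit(string $s): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty piece that leading whitespace would produce.
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    return array_slice($words, 0, 2);
}

// Hypothetical helper: match one word, optionally followed by a second one.
function firstTwoWordsByMatch(string $s): ?string
{
    return preg_match('/\S+(?:\s+\S+)?/', $s, $m) === 1 ? $m[0] : null;
}

var_dump(firstTwoWordsBySplit('123 word, hello')); // ['123', 'word,']
var_dump(firstTwoWordsByMatch('123 word, hello')); // '123 word,'
var_dump(firstTwoWordsByMatch('123 word'));        // '123 word'
```

The split version returns the words separately, while the match version keeps the original whitespace between them.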
If you really only want to allow numbers and letters [^\s] is not restrictive enough. Use this:
/[a-z0-9]+(\s+[a-z0-9]+)?/i
I have the following regex meant to test against valid name formats:
^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$
it seems to work fine with all the expected odd name possibilities, including the following:
o'Bannon
Smith, Jr.
Double-barreled
I'm having a problem when I plug this into my PHP code. If the first character is a number, it passes through as valid.
If the last character is a space, comma, full-stop or other special allowed character, it's failing as invalid.
My PHP code is :
$v = 'Tested Value';
$value = (filter_var($v, FILTER_VALIDATE_REGEXP,array("options"=>array("regexp"=>"^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^"))));
if (strlen($value) <2 && strlen($v) !=0) {
    return "not valid";
}
What am I doing wrong here?
^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^
The carets (^) at the beginning and end of the regex are being interpreted as regex delimiters, not as anchors. The regex isn't really matching the digits at the beginning of the string; it's skipping over them so it can start matching at the first letter it finds. You can use almost any ASCII punctuation character as the regex delimiter, but most people use # or ~, which are relatively uncommon and have no special meaning in regexes.
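The effect of the delimiter choice can be seen with a small sketch. To keep the delimiter as the only difference, it uses a simplified stand-in for the question's pattern (an assumption on my part), and the helper name passes is hypothetical.

```php
<?php
// Hypothetical helper: true when $value passes FILTER_VALIDATE_REGEXP.
function passes(string $value, string $regexp): bool
{
    return filter_var($value, FILTER_VALIDATE_REGEXP,
        ['options' => ['regexp' => $regexp]]) !== false;
}

// Simplified stand-in for the question's pattern body.
$body = '[a-zA-Z]+[a-zA-Z \',.-]*$';

$caret = '^' . $body . '^';   // ^ ... ^ act as delimiters; no start anchor is left
$tilde = '~^' . $body . '~';  // ~ ... ~ are delimiters; ^ and $ are real anchors

var_dump(passes('123abc', $caret));   // bool(true): matching starts at "abc"
var_dump(passes('123abc', $tilde));   // bool(false)
var_dump(passes("O'Bannon", $tilde)); // bool(true)
```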
As for not allowing punctuation at the end, that's how the regex is written. Specifically, [\'\,\.\- ][a-zA-Z ] requires that each apostrophe, comma, period or hyphen be followed by a letter or a space. If you really want to allow any of those characters at the end, it's pretty simple:
~^(?:[a-z]+[',. -]*)+$~i
Of course, that's not a particularly good regex for validating names, but I have nothing better to offer; it's a job for which regexes are particularly ill-suited. And do you really want to be the one to tell your users their own names are invalid?
Your regex is way too complex.
/^[a-z]+[',. a-z-]*$/i
should do the same thing