I need a Regex for PHP to do the following:
I want to allow [a-zα-ωá-źа-яա-ֆა-ჰא-ת] as well as Chinese, Japanese (and other UTF-8) letters;
I want to ban [^٩٨٧٦٥٤٣٢١٠۰۱۲۳۴۵۶۷۸۹] (Arabic numbers);
This is what I've done:
function isValidFirstName($first_name) {
return preg_match("/^(?=[a-zα-ωá-źа-яա-ֆა-ჰא-ת]+([a-zα-ωá-źа-яա-ֆა-ჰא-ת' -]+)?\z)[a-zα-ωá-źа-яա-ֆა-ჰא-ת' -]+$/i", $first_name);
}
It looks like it works, but if I type letters from more than one language, it doesn't validate.
Examples: Авпа Вапапва á-ź John - doesn't validate.
John Gger - validates, á-ź á-ź - validates.
I would like all of these to validate.
Alternatively, is there a way to echo a message if the user entered a multilingual string?
I can't reproduce the failure cases here (Авпа Вапапва á-ź John validates just fine), but you can simplify the regex a lot - you don't need that lookahead assertion:
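preg_match('/^[a-zα-ωá-źа-яա-ֆა-ჰא-ת][a-zα-ωá-źа-яա-ֆა-ჰא-ת\' -]*$/iu', $first_name)
Add the u modifier so PCRE treats the pattern and the subject as UTF-8; without it, ranges such as α-ω are interpreted as ranges of bytes rather than characters.
As far as I can tell from the character ranges you've given, you don't need to exclude the digits, because anything outside these character classes will already cause the regex to fail.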
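A minimal sketch of the simplified check as a complete function, assuming the PHP source file is saved as UTF-8. The function name comes from the question; returning bool instead of preg_match's int is my choice. It also shows that an Arabic-Indic digit is rejected without any explicit ban, because it falls outside every range.

```php
<?php
// Sketch: explicit script ranges, assuming this file is saved as UTF-8.
// /i folds case (so Cyrillic "А" matches а-я); /u makes PCRE work on UTF-8 characters.
function isValidFirstName(string $first_name): bool
{
    // First character must be a letter; after that: letters, apostrophe, space, hyphen.
    return preg_match('/^[a-zα-ωá-źа-яա-ֆა-ჰא-ת][a-zα-ωá-źа-яա-ֆა-ჰא-ת\' -]*$/iu', $first_name) === 1;
}

var_dump(isValidFirstName('Авпа Вапапва á-ź John')); // mixed scripts are accepted
var_dump(isValidFirstName('John ٣'));                // U+0663 is outside every range, so rejected
```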
Another consideration: if your goal is to allow any letter from any language/script (plus some punctuation and spaces), you can further simplify this (as long as your strings are UTF-8 encoded) to:
preg_match('/^\pL[\pL\' -]*$/iu', $first_name)
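A sketch of the \pL variant wrapped in a function (the name isValidAnyScriptName is hypothetical). The i modifier is dropped here because \pL already covers both cases; the u modifier is what matters. Chinese and Japanese characters are letters (category Lo), so they pass, while digits of any script do not.

```php
<?php
// Sketch: accept any Unicode letter (\pL) followed by letters, apostrophes, spaces or hyphens.
// The u modifier makes PCRE read the subject as UTF-8.
function isValidAnyScriptName(string $name): bool
{
    return preg_match('/^\pL[\pL\' -]*$/u', $name) === 1;
}

foreach (['李小龍', 'やまだ たろう', "O'Brien", 'John ٣'] as $name) {
    echo $name, ': ', isValidAnyScriptName($name) ? 'valid' : 'invalid', PHP_EOL;
}
```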
But generally, I wouldn't try to validate a name by regular expressions (or any other means): Falsehoods programmers believe about names.
You can detect Arabic characters (and then reject the input) with a regex like this:
if (preg_match('/\p{Arabic}+/u', $subject)) {
# Successful match: the string contains Arabic characters
} else {
# Match attempt failed: no Arabic characters found
}
RegEx explanation
\p{Arabic}+
Options: UTF-8 mode (u)
Match a character with the Unicode script property “Arabic” «\p{Arabic}»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
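If only the Arabic digits listed in the question need rejecting, and the user should be told why, a narrower check is possible. This sketch assumes exactly the two digit sets from the question: Arabic-Indic U+0660 to U+0669 and Extended Arabic-Indic U+06F0 to U+06F9. The function name and the message text are hypothetical.

```php
<?php
// Hypothetical helper: true if the string contains an Arabic-Indic (U+0660-U+0669)
// or Extended Arabic-Indic (U+06F0-U+06F9) digit. \x{...} escapes need the u modifier.
function containsArabicDigits(string $subject): bool
{
    return preg_match('/[\x{0660}-\x{0669}\x{06F0}-\x{06F9}]/u', $subject) === 1;
}

$first_name = 'John ٣';
if (containsArabicDigits($first_name)) {
    echo 'Please do not use Arabic digits in your name.', PHP_EOL;
}
```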
I need to write a regular expression that will evaluate the following conditions:
2 consecutive lower case characters
at least 1 digit
at least 1 upper case character
2 consecutive identical punctuation characters
For example, the string 'aa1A!!' should match, as should '!!A1aa'.
I have written the following regular expression:
'/(?=([a-z]){2,})(?=[0-9])(?=[A-Z])(?=(\W)\1)/'
I have found each individual expression works, but I am struggling to put it all together. What am I missing?
First, your pattern must be anchored to make sure the lookaheads are tested only from the start of the string. Then, since your characters can be anywhere in the string, the subpattern inside each lookahead needs to start with .*.
\W is a character class for non-word characters (everything that is not [A-Za-z0-9_], which includes spaces, control characters, accented letters...). IMO, \pP or [[:punct:]] are more appropriate.
/^(?=.*[a-z]{2})(?=.*[0-9])(?=.*[A-Z])(?=.*(\pP)\1)/
As for the idea of using 4 patterns instead of 1: it looks like a good idea, it tastes like a good idea, but it's useless and slower. However, it can be useful if you want to know which particular rule fails.
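A sketch of both approaches side by side: the combined anchored pattern for a plain yes/no, and a hypothetical per-rule table (the constant and function names are mine) for when you need to report which rule failed.

```php
<?php
// The combined pattern from the answer: anchored, with .* at the start of each lookahead.
const COMBINED = '/^(?=.*[a-z]{2})(?=.*[0-9])(?=.*[A-Z])(?=.*(\pP)\1)/';

// Hypothetical per-rule table, only worth it when you must say which rule failed.
const RULES = [
    '2 consecutive lower case characters' => '/[a-z]{2}/',
    'at least 1 digit' => '/[0-9]/',
    'at least 1 upper case character' => '/[A-Z]/',
    '2 consecutive identical punctuation characters' => '/(\pP)\1/',
];

// Returns the descriptions of every rule the string does not satisfy.
function failedRules(string $s): array
{
    $failed = [];
    foreach (RULES as $description => $pattern) {
        if (preg_match($pattern, $s) !== 1) {
            $failed[] = $description;
        }
    }
    return $failed;
}

var_dump(preg_match(COMBINED, 'aa1A!!')); // int(1)
print_r(failedRules('aA1!!'));           // only the lower-case rule fails
```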
I'm trying to find a regexp that covers a lot of cases. The one I'm using now would be enough if it weren't for the many international names that contain special letters as well as hyphens.
The one I'm using now looks like this:
/^[A-Za-zåäöÅÄÖ\s\-\ ]*$/
It allows for hyphens and whitespace but it also allows them at the start or end of the string which I don't want to allow.
I need to modify this to allow:
Special letters such as éýÿüåäö etc. (preferably without having to write them all out manually)
Capital letter at the start of each new word
Whitespace between words
Hyphens between words, but not at the start or end of the string
It should not allow numbers, which it already doesn't. Since I haven't worked a whole lot with regex construction, I'm in the dark on how to achieve this. I've found a lot of solutions that cover one or the other scenario, but not all of the ones I need. I would appreciate the assistance. The regex should work for PHP validation.
EDIT:
$fname = 'Scrooge Mc-Duck'; //Only example string
$fname = trim($fname);
if (!preg_match('/^\p{Lu}\p{Ll}+([ -]+\p{Lu}\p{Ll}+)*$/', $fname)) {
$fnameErr = 'Invalid first name';
}
This outputs the error when using @npinti's solution.
Assuming that your regular expression engine supports Unicode character properties, you can use \p{L} to match any letter, \p{Lu} to match an upper case letter and \p{Ll} to match a lower case letter. So, to match a name, you could use ^\p{Lu}\p{Ll}+([ -]+\p{Lu}\p{Ll}+)*$.
This matches an upper case letter followed by one or more lower case letters. This can then be followed, zero or more times, by a run of one or more spaces and/or dashes and another upper case letter with one or more lower case letters. The ^ and $ at the beginning and end make sure that the regular expression matches the entire string, so a space or dash can never appear at the start or end.
A demo of the regex can be viewed here.
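A sketch of that pattern in PHP. The u modifier is my addition, not part of the answer: without it PCRE reads the UTF-8 bytes individually, so a multi-byte capital such as É is not seen as a single \p{Lu} letter. The function name isCapitalizedName is hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Capitalized words separated by runs of spaces and/or hyphens; /u for UTF-8 letters.
function isCapitalizedName(string $name): bool
{
    return preg_match('/^\p{Lu}\p{Ll}+([ -]+\p{Lu}\p{Ll}+)*$/u', $name) === 1;
}

foreach (['Scrooge Mc-Duck', 'Éric Åberg', 'Scrooge-', 'R2D2'] as $name) {
    echo $name, ': ', isCapitalizedName($name) ? 'valid' : 'invalid', PHP_EOL;
}
```

Note that the pattern requires at least two letters per word, so an initial such as "J Smith" is rejected.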
I tried to create a regular expression with the specification below:
any alphabetic character (at least one)
any numeric character (at least one)
no spaces
accept all special characters (except ",;&|')
^(?=.*[0-9])(?=.*[a-z])(?!.*\s)((?!.*[",;&|'])|(?=(.*\W){1,}))(?!.*[",;&|'])$
This is the one I tried.
What can I do to fix this?
The question is still vague; please provide some examples of accepted strings.
Just to get you started, you can use a character class in a negative lookahead to exclude the forbidden characters.
Don't forget the start and end anchors:
Regex:
/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|']).+$/i
This regex matches one or more characters, none of which is a space or one of ",;&|', and requires at least one digit and at least one letter a-z (case-insensitively, because of the i flag).
Live Demo: http://www.rubular.com/r/nxdi79ZcRx
In PHP use it like this:
'/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|\']).+$/i'
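A sketch of that PHP pattern in a function (the name matchesSpec is hypothetical), with a few sample inputs covering each rule. Other special characters such as $ are allowed, while a space or any of ",;&|' rejects the whole string.

```php
<?php
// At least one digit, at least one letter, no space and none of " , ; & | '
function matchesSpec(string $s): bool
{
    return preg_match('/^(?=.*?\d)(?=.*?[a-z])(?!.*?[ ",;&|\']).+$/i', $s) === 1;
}

foreach (['Pa$$w0rd', 'abc 123', 'abc123,', '123456', 'abcdef'] as $s) {
    echo $s, ': ', matchesSpec($s) ? 'accepted' : 'rejected', PHP_EOL;
}
```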
Regex is blowing my mind. How can I change this to validate emails with a plus sign? So I can sign up with test+spam@gmail.com
if(!preg_match("/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i", $_GET['em'])) {
It seems like you aren't really familiar with what your regex is currently doing, and understanding that would be a good first step before modifying it. Let's walk through your regex using the email address john.robert.smith@mail.com (each item below ends with the part of the address matched by that section):
1. ^ is the start-of-string anchor. It specifies that any match must begin at the beginning of the string. If the pattern is not anchored, the regex engine can match a substring, which is often undesired. Anchors are zero-width, meaning that they do not consume any characters.
2. [_a-z0-9-]+ is made up of two elements, a character class and a repetition modifier:
- [...] defines a character class, which tells the regex engine that any of these characters is a valid match. In this case the class contains the letters a-z, the digits 0-9, and the dash and underscore (in general, a dash in a character class defines a range, so you can write a-z instead of abcdefghijklmnopqrstuvwxyz; when given as the last character in the class, it acts as a literal dash).
- + is a repetition modifier that specifies that the preceding token (in this case, the character class) can be repeated one or more times. There are two other repetition operators: * matches zero or more times; ? matches exactly zero or one times (i.e. makes something optional).
Matches: john
3. (\.[_a-z0-9-]+)* again contains a repeated character class. It also contains a group and an escaped character:
- (...) defines a group, which allows you to group multiple tokens together (in this case, the group will be repeated as a whole). Let's say we wanted to match 'abc' zero or more times (i.e. abcabcabc matches, abcccc doesn't). If we tried to use the pattern abc*, the repetition modifier would only apply to the c, because c is the last token before the modifier. To get around this, we can group abc ((abc)*), in which case the modifier applies to the entire group, as if it were a single token.
- \. specifies a literal dot character. This is needed because . is a special character in regex, meaning any character. Since we want to match an actual dot character, we need to escape it.
Matches: .robert.smith
4. @ is not a special character in regex, so, like all other non-special characters, it matches literally.
Matches: @
5. [a-z0-9-]+ again defines a repeated character class, like item 2 above.
Matches: mail
6. (\.[a-z0-9-]+)* is almost exactly the same pattern as item 3 above.
Matches: .com
7. $ is the end-of-string anchor. It works the same as ^ above, except that it matches the end of the string.
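The difference between abc* and (abc)* described above can be checked directly. A small sketch using the same two subjects; both patterns are anchored so the whole string must match.

```php
<?php
// abc* repeats only the c; (abc)* repeats the whole group.
foreach (['abcabcabc', 'abcccc'] as $s) {
    printf(
        "%-10s abc*: %d  (abc)*: %d\n",
        $s,
        preg_match('/^abc*$/', $s),
        preg_match('/^(abc)*$/', $s)
    );
}
// abcabcabc  abc*: 0  (abc)*: 1
// abcccc     abc*: 1  (abc)*: 0
```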
With that in mind, it should be a bit clearer how to add a section that matches a plus segment. As we saw above, + is a special character, so it has to be escaped. Then, since the + has to be followed by some characters, we can define a character class with the characters we want to match and define its repetition. Finally, we should make the whole group optional, because email addresses don't need to have a + segment:
(\+[a-z0-9-]+)?
When inserted into your regex, it'd look like this:
/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i
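A quick sketch to try the modified pattern against the address from the question and a few edge cases (the constant name is hypothetical).

```php
<?php
// The modified pattern, with the optional (\+[a-z0-9-]+)? segment before the @.
const EMAIL_WITH_PLUS = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';

foreach (['test+spam@gmail.com', 'john.robert.smith@mail.com', 'test+@gmail.com', 'test+spam.x@gmail.com'] as $email) {
    echo $email, ' => ', preg_match(EMAIL_WITH_PLUS, $email), PHP_EOL;
}
```

Because the group sits after the dot-separated parts and allows only one + segment with no dots, test+@gmail.com and test+spam.x@gmail.com are both rejected.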
Save your sanity. Get a pre-made PHP RFC 822 Email address parser
I've used this regex to validate emails, and it works just fine with emails that contain a +:
/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
\+ will match a literal + sign, but be aware: You still won't be close to matching all possible email addresses according to the RFC spec, because the actual regex for that is madness. It's almost certainly not worth it; you should use a real email parser for this.
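In PHP, one ready-made alternative to a hand-written regex is the built-in filter_var() with FILTER_VALIDATE_EMAIL. This is a swapped-in suggestion, not the RFC 822 parser linked above, and it does not accept every address the RFCs allow; the wrapper name isValidEmail is hypothetical.

```php
<?php
// filter_var returns the address on success and false on failure.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isValidEmail('test+spam@gmail.com')); // plus addressing is accepted
var_dump(isValidEmail('test+spam#gmail.com')); // no @, so rejected
```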
This is another solution (it is similar to the solution found by David):
//Escaped for a .NET string literal
^[_a-zA-Z0-9-]+((\\.[_a-zA-Z0-9-]+)*|(\\+[_a-zA-Z0-9-]+)*)*@[a-zA-Z0-9-]+(\\.[a-zA-Z0-9-]+)*(\\.[a-zA-Z]{2,4})$
//Native
^[_a-zA-Z0-9-]+((\.[_a-zA-Z0-9-]+)*|(\+[_a-zA-Z0-9-]+)*)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,4})$
This is another solution:
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?@[a-z0-9-.]+(\.[a-z0-9]+)$/
or, for a Razor page (where @ has to be written as \u0040):
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?\u0040[a-z0-9-.]+(\.[a-z0-9]+)$/
How do I make the following regular expression accept only the symbols I want it to accept as well as spaces?
if(!preg_match('/^[A-Z0-9\/\'&,.-]*$/', $line))
{
die();
}
else
{
//execute the rest of the validation script
}
I want the user to only be able to enter A-Z, 0-9, forward slashes, apostrophes, ampersands, commas, periods, and hyphens into a given text field $line.
It currently will accept something along the lines of HAM-BURGER which is perfect, it should accept that. I run into an issue when the user wants to type HAM BURGER (<- note the space).
If I remove the ^ from the beginning and/or the $ from the end it will succeed if the user types in anything. My attempted remedy to this was to make the * into a + but then it will accept anything as long as the user puts in at least one of the acceptable characters.
Add the space to the character class:
if(!preg_match('/^[A-Z0-9\/\'&,. -]*$/', $line))
Yes, it's that simple.
Note that the space has to be inserted before the -, because the - is a metacharacter in a character class (unless it's the first or last character in that class). Another option is to escape it like this:
if(!preg_match('/^[A-Z0-9\/\'&,.\- ]*$/', $line))
The regex explained:
^ and $ are the start- and end-of-string anchors. They tell the regex engine that it has to match the whole string rather than just part of it.
[...] is a character class.
* is the zero-or-more repetition operator. This means it will accept an empty string. You can change it to + (one-or-more) so it rejects the empty string.
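A sketch combining both suggestions: the space added to the class and + instead of * so an empty field is rejected. The function name isAllowedLine is hypothetical. There is no i modifier, so lower-case letters are still refused.

```php
<?php
// A-Z, 0-9, / ' & , . space and hyphen; + rejects the empty string.
function isAllowedLine(string $line): bool
{
    return preg_match('/^[A-Z0-9\/\'&,. -]+$/', $line) === 1;
}

foreach (['HAM-BURGER', 'HAM BURGER', 'ham burger', ''] as $line) {
    echo "'$line': ", isAllowedLine($line) ? 'ok' : 'rejected', PHP_EOL;
}
```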
This is a good reference for RegEx, though specifically for Perl:
http://www.cs.tut.fi/~jkorpela/perl/regexp.html