Why do < and > need to be escaped in preg_*() patterns?

I've run preg_quote('<>') to check if these characters need to be escaped in a regular expression, and to my surprise, they came back escaped: \<\>.
Why do these characters need to be escaped? What is their meaning in a regular expression?

< has significance when used to define lookbehinds
((?<!foo)bar matches bar that is not preceded by foo)
Both < and > are used to name subpatterns, like so:
preg_match("/(?<area>\d{3})-(?<sub>\d{3})-(?<num>\d{4})/",$number,$m);
// now elements of the US phone number are in $m['area'], $m['sub'] and $m['num']
So, because they can have significance when used in conjunction with other symbols, they are escaped.
It should be noted, however, that they have no meaning outside of a specific place in a subpattern, so if you're escaping manually you most likely won't need to escape them.
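As a quick check of the points above, here is a minimal sketch. The sample strings and variable names are made up for illustration. It shows that < and > match literally whether escaped or not, and that they only act as syntax inside (?<name>...) and (?<!...).

```php
<?php
// preg_quote() escapes < and > defensively, even though they are literal here.
$quoted = preg_quote('<>');

// The unescaped and escaped forms both match a literal tag.
$plainMatch  = preg_match('/<b>/', 'some <b>bold</b> text');
$quotedMatch = preg_match('/\<b\>/', 'some <b>bold</b> text');

// Named subpatterns: (?<name>...) is one place where < and > are syntax.
preg_match('/(?<area>\d{3})-(?<num>\d{4})/', '555-1234', $m);

// Negative lookbehind: (?<!foo) is the other.
$afterFoo = preg_match('/(?<!foo)bar/', 'foobar');
$afterBaz = preg_match('/(?<!foo)bar/', 'bazbar');
```

Here $m['area'] and $m['num'] hold the two named pieces, and the lookbehind rejects foobar but accepts bazbar.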
To expand further:
The documentation has a full list of characters that are escaped. Here I will list them, along with their meanings.
. Match any single character, other than newlines (unless the s modifier is set)
\ Escape the following character, or begin an escape sequence
+ Match one or more of the preceding character, class, or subpattern
* Match zero or more of the preceding character, class, or subpattern
? Makes the previous item optional, also used in subpatterns to define special behaviours such as "don't capture" ((?:foo)), "lookahead" ((?=foo) and (?!foo)), "lookbehind" ((?<=foo) and (?<!foo)), and many other uses besides.
[ and ] Define a character class, ie. a set of characters that may be matched. Most other symbols don't have meaning inside character classes.
^ and $ Match the start and end of the string respectively. When the m modifier is present, they also match the start and end of individual lines.
( and ) Define a subpattern, used alone for capturing or with ? for special behaviour. Also useful for applying quantifiers, such as in \d{1,3}(?:,\d{3})* to match thousand-separated numbers.
{ and } Manually quantify the previous item. Takes one or two numbers, separated by a comma. Examples include {3} to match exactly three times, {0,3} to match zero to three times, {3,} to match three or more times, and {3,8} to match three to eight times. (Write the lower bound explicitly: older PCRE versions treat {,3} as literal text rather than a quantifier.)
= Used in lookahead assertions: foo(?=bar) matches foo, but only if it is followed by bar.
! Used in negative lookaround assertions: foo(?!bar) matches foo, but not if it is followed by bar.
< and > The subject of this question, see the start of the answer for info.
| Alternation, specifying a list of possibilities. It's kind of like a character class but for entire patterns instead of single characters. foo|bar matches "foo" or "bar". May also be seen as a special behaviour in subpatterns: (?|foo(bar)|bar(foo)) ensures that whatever bit falls in the parentheses will be in subpattern 1 (otherwise, bar would be in 1 if matched, foo would be in 2 if matched, and the unmatched one would be empty)
: Used in subpatterns to make them non-capturing. Essentially, the subpattern just becomes a "group of characters", which will typically be quantified. (?:foo) matches, but does not capture, "foo".
- Defines a range of characters in a character class. Has no meaning outside of one.

Related

Regex for the following condition

I need a small help with regex for the following
Alphanumeric with only lower case alphabets allowed
Starts with number or alphabet
Allows period (.)
Doesn't allow consecutive periods No ..
Doesn't allow any other special characters
Thanks,
-GM

^(?!.*\.\.)[a-z0-9][a-z0-9.]*$
The negative lookahead at the beginning covers your 4th requirement: (?!.*\.\.) fails the match if two consecutive periods appear anywhere in the string. (A lookahead written as (?![^.]*\.\.) would only inspect the first period, so it would let a.b..c through.) Everything else should be pretty straightforward. ^ and $ are beginning and end of string anchors, and the character classes enforce the requirement that only lowercase letters, numbers, and . are allowed.
To add the length constraint (between 6 and 16 characters) just change the * to {5,15}. * means "repeat the previous element zero or more times", {n,m} means "repeat the previous element between n and m times (inclusive)". The reason {5,15} is used instead of {6,16} is that one character is already consumed by the first character class. Here is the end result:
^(?!.*\.\.)[a-z0-9][a-z0-9.]{5,15}$
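To see how a pattern of this shape behaves, here is a small sketch that runs a few made-up sample strings through preg_match(). It uses a lookahead that rejects two consecutive periods anywhere in the string; the sample values and variable names are assumptions for illustration only.

```php
<?php
// Lowercase letters, digits and periods; must start with a letter or digit;
// (?!.*\.\.) rejects ".." anywhere in the string.
$pattern = '/^(?!.*\.\.)[a-z0-9][a-z0-9.]*$/';

$samples = ['abc', 'a.b.c', 'a..b', 'a.b..c', '.abc', 'Abc', 'a_b'];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_match($pattern, $sample) === 1;
}

foreach ($results as $sample => $ok) {
    echo $sample, ' => ', $ok ? 'valid' : 'invalid', "\n";
}
```

Only abc and a.b.c pass; the others fail on a doubled period, a leading period, an uppercase letter, or an underscore.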

Here's some assistance without giving away the answer, as you'll learn the most.
To match from a certain combination of characters, e.g. alphanumeric, use character classes, e.g. [a-z0-9]. Note that this expression matches exactly one character. You must use quantifiers to match more than one, e.g. +.
To "start" or "end" with something, you must use anchors, ^ and $, before the first or after the last character, respectively. (Watch out, though. In a character class, the ^ inverts the character class.)
In regex, . has a special meaning as a wildcard (matching any character besides newline characters). Therefore you have to escape them, \., to select the literal dot. Another way to escape the dot is to put it in a character class: [.].
Non-consecutiveness is trickier. You will need to look up more information about negative lookahead assertions (or lookaround assertions in general).
All the bolded words are terms you can Google to learn.

I'd say something along those lines: /^[a-z0-9]+(\.[a-z0-9]+)*\.?$/ (suppose that the line can end with a period)

Use this if the string may not end with a period:
/^[a-z0-9]+(\.[a-z0-9]+)*$/
or this if it may:
/^[a-z0-9]+(\.[a-z0-9]+)*\.?$/

This should be the best
^([a-z0-9]+\.?)+$

PHP regular expressions - pattern error

I am trying to search for some pattern in PHP with the help of preg_match. Search pattern is like this (but this is wrong):
/[\d\s*-\s*\d\s*(usd|eur)]{1}/i
\d starts with integer,
\s* there can be any number of whitespaces,
- there must be exactly one minus sign
\s* there can be any number of whitespaces,
\d then must be integer
\s* there can be any number of whitespaces,
(usd|eur) exactly one of these words must be present
[\d\s*-\s*\d\s*(usd|eur)]{1} - in the string there should be exactly one occurrence
The above pattern does not work. What am I doing wrong? For testing:
<?php
$pattern = '/[\d\s*-\s*\d\s*(usd|eur)]{1}/i';
$query = '100-120 100-120';
echo $pattern.'<br/>';
echo $query.'<br/>';
if(preg_match($pattern, $query))
echo 'OK';
else
echo 'not OK!';
?>
Note:
I am trying to pull out data like this:
The price of item is 100 - 120 usd in our market

[...] is a character class. It means "match any one of these characters". [abc] will match a,b, or c. It doesn't match the string "abc".
In addition:
{1} means "match the preceding expression one time". However, matching once is the default. There is no need to explicitly tell it to match one time.
\d matches a single numeric digit. Based on your example, you want \d+ - match a number made up of at least one digit.
Here is what your pattern should look like:
/\d+\s*-\s*\d+\s*(usd|eur)/i
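A short sketch of that pattern applied to the sample sentence from the question. The capture groups around the two numbers are an addition for illustration, so that the values can be pulled out; they are not part of the pattern above.

```php
<?php
$text = 'The price of item is 100 - 120 usd in our market';

// Same pattern as above, with the two numbers wrapped in capture groups.
$pattern = '/(\d+)\s*-\s*(\d+)\s*(usd|eur)/i';

$low = $high = $currency = null;
if (preg_match($pattern, $text, $m)) {
    $low      = (int) $m[1];
    $high     = (int) $m[2];
    $currency = strtolower($m[3]);
    echo "$low to $high $currency\n";
}

// The original test string has no currency, so it does not match.
$noCurrency = preg_match($pattern, '100-120 100-120');
```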

Regular expressions are a powerful tool for examining and modifying text. Regular expressions themselves, with a general pattern notation almost like a mini programming language, allow you to describe and parse text. They enable you to search for patterns within a string, extracting matches flexibly and precisely. However, you should note that because regular expressions are more powerful, they are also slower than the more basic string functions. You should only use regular expressions if you have a particular need.
This tutorial gives a brief overview of basic regular expression syntax and then considers the functions that PHP provides for working with regular expressions.
The Basics
Matching Patterns
Replacing Patterns
Array Processing
PHP has offered two different types of regular expressions: POSIX-extended and Perl-Compatible Regular Expressions (PCRE). The PCRE functions are more powerful than the POSIX ones, and faster too, and the POSIX functions (ereg() and friends) were deprecated in PHP 5.3 and removed in PHP 7.0, so we will concentrate on PCRE.
The Basics
In a regular expression, most characters match only themselves. For instance, if you search for the regular expression "foo" in the string "John plays football," you get a match because "foo" occurs in that string. Some characters have special meanings in regular expressions. For instance, a dollar sign ($) is used to match strings that end with the given pattern. Similarly, a caret (^) character at the beginning of a regular expression indicates that it must match the beginning of the string. The characters that match themselves are called literals. The characters that have special meanings are called metacharacters.
The dot (.) metacharacter matches any single character except newline (\n). So, the pattern h.t matches hat, hot, hit, hut, h7t, etc. The vertical pipe (|) metacharacter is used for alternatives in a regular expression. It behaves much like a logical OR operator and you should use it if you want to construct a pattern that matches more than one set of characters. For instance, the pattern Utah|Idaho|Nevada matches strings that contain "Utah" or "Idaho" or "Nevada". Parentheses give us a way to group sequences. For example, (Nant|b)ucket matches "Nantucket" or "bucket". Using parentheses to group together characters for alternation is called grouping.
If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
To specify a set of acceptable characters in your pattern, you can either build a character class yourself or use a predefined one. A character class lets you represent a bunch of characters as a single item in a regular expression. You can build your own character class by enclosing the acceptable characters in square brackets. A character class matches any one of the characters in the class. For example a character class [abc] matches a, b or c. To define a range of characters, just put the first and last characters in, separated by hyphen. For example, to match all alphanumeric characters: [a-zA-Z0-9]. You can also create a negated character class, which matches any character that is not in the class. To create a negated character class, begin the character class with ^: [^0-9].
The metacharacters +, *, ?, and {} affect the number of times a pattern should be matched. + means "Match one or more of the preceding expression", * means "Match zero or more of the preceding expression", and ? means "Match zero or one of the preceding expression". Curly braces {} can be used differently. With a single integer, {n} means "match exactly n occurrences of the preceding expression", with one integer and a comma, {n,} means "match n or more occurrences of the preceding expression", and with two comma-separated integers {n,m} means "match the previous character if it occurs at least n times, but no more than m times".
Now, have a look at the examples:
Regular Expression        Will match...
foo                       The string "foo"
^foo                      "foo" at the start of a string
foo$                      "foo" at the end of a string
^foo$                     "foo" when it is the whole string
[abc]                     a, b, or c
[a-z]                     Any lowercase letter
[^A-Z]                    Any character that is not an uppercase letter
(gif|jpg)                 Either "gif" or "jpg"
[a-z]+                    One or more lowercase letters
[0-9\.\-]                 Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$        Any word of at least one letter, number or _
([wx])([yz])              wy, wz, xy, or xz
[^A-Za-z0-9]              Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4})       Three uppercase letters or four digits
Perl-Compatible Regular Expressions emulate the Perl syntax for patterns, which means that each pattern must be enclosed in a pair of delimiters. Usually, the slash (/) character is used. For instance, /pattern/.
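A few rows of the table can be checked directly once the patterns are wrapped in delimiters. This is only a sketch; the sample subjects are made up.

```php
<?php
// Each entry: [pattern with / delimiters, subject string].
$checks = [
    'whole'      => ['/^foo$/', 'foo'],
    'not_whole'  => ['/^foo$/', 'foobar'],
    'jpg'        => ['/(gif|jpg)/', 'photo.jpg'],
    'jpeg'       => ['/(gif|jpg)/', 'photo.jpeg'],
    'symbol'     => ['/[^A-Za-z0-9]/', 'abc-123'],
    'no_symbol'  => ['/[^A-Za-z0-9]/', 'abc123'],
];

$results = [];
foreach ($checks as $label => [$pattern, $subject]) {
    $results[$label] = preg_match($pattern, $subject);
}

print_r($results);
```

Note that (gif|jpg) does not match "photo.jpeg": the alternatives are matched character for character.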
The PCRE functions can be divided in several classes: matching, replacing, splitting and filtering.
Matching Patterns
The preg_match() function performs Perl-style pattern matching on a string. preg_match() takes two basic and three optional parameters. These parameters are, in order, a regular expression string, a source string, an array variable which stores matches, a flag argument and an offset parameter that can be used to specify the alternate place from which to start the search:
preg_match ( pattern, subject [, matches [, flags [, offset]]])
The preg_match() function returns 1 if a match is found, 0 if not, and false if an error occurs. Let's search the string "Hello World!" for the letters "ell":
<?php
if (preg_match("/ell/", "Hello World!", $matches)) {
echo "Match was found <br />";
echo $matches[0];
}
?>
The letters "ell" exist in "Hello", so preg_match() returns 1 and the first element of the $matches variable is filled with the string that matched the pattern. The regular expression in the next example is looking for the letters "ll", together with all the characters that follow them:
<?php
if (preg_match("/ll.*/", "The History of Halloween", $matches)) {
echo "Match was found <br />";
echo $matches[0];
}
?>
Now let's consider a more complicated example. The most popular use of regular expressions is validation. The example below checks if the password is "strong", i.e. the password must be at least 8 characters and must contain at least one lower case letter, one upper case letter and one digit:
<?php
$password = "Fyfjk34sdfjfsjq7";
if (preg_match("/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/", $password)) {
echo "Your password is strong.";
} else {
echo "Your password is weak.";
}
?>
The ^ and $ anchor the pattern to the start and the end of the string. The ".*" combination is used at both the start and the end. As mentioned above, the . (dot) metacharacter means any character except a newline, and the * metacharacter means "zero or more". Between are groupings in parentheses. The "?=" combination is a lookahead: it means "the text that follows must look like this". This construct checks the text without consuming or capturing it. In this example, instead of specifying the order in which things should appear, it says that each thing must appear somewhere, in any order.
The first grouping is (?=.{8,}). This checks if there are at least 8 characters in the string. The next grouping (?=.*\d) means "any character can occur zero or more times, followed by a digit". So this checks if there is at least one digit in the string. Because a lookahead does not consume any text, that digit can appear anywhere in the string. The next groupings (?=.*[a-z]) and (?=.*[A-Z]) look for a lower case and an upper case letter respectively, again anywhere in the string.
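The same checks can be written a little more compactly by anchoring once and letting the final .{8,} enforce the length. This is a sketch of an equivalent formulation, not the tutorial's own pattern; the function name isStrongPassword() is hypothetical.

```php
<?php
// Hypothetical helper: true when the password has at least 8 characters
// and contains at least one digit, one lowercase and one uppercase letter.
function isStrongPassword(string $password): bool
{
    // Each lookahead scans ahead from the start without consuming text;
    // .{8,} then consumes the whole string and enforces the minimum length.
    return preg_match('/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/', $password) === 1;
}

$strong   = isStrongPassword('Fyfjk34sdfjfsjq7');
$tooShort = isStrongPassword('Ab1');
$noUpper  = isStrongPassword('alllowercase1');
$noDigit  = isStrongPassword('NoDigitsHere');
```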
Finally, we will consider regular expression that validates an email address:
<?php
$email = "firstname.lastname#aaa.bbb.com";
$regexp = "/^[^0-9][A-Za-z0-9_]+([.][A-Za-z0-9_]+)*[#][A-Za-z0-9_]+([.][A-Za-z0-9_]+)*[.][A-Za-z]{2,4}$/";
if (preg_match($regexp, $email)) {
echo "Email address is valid.";
} else {
echo "Email address is <u>not</u> valid.";
}
?>
This regular expression rejects a digit at the beginning and also allows for multiple periods in the user name and domain name of the email address. Try working through this regular expression yourself.
For speed reasons, the preg_match() function stops after the first match it finds in a string. This means it is very quick to check whether a pattern exists in a string. An alternative function, preg_match_all(), matches a pattern against a string as many times as the pattern allows, and returns the number of times it matched.
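The difference between the two functions is easy to see side by side. A minimal sketch with a made-up subject string:

```php
<?php
$subject = 'Order 12, order 345 and order 6';

// preg_match() stops at the first match.
$found = preg_match('/\d+/', $subject, $one);

// preg_match_all() collects every match and returns how many there were.
$count = preg_match_all('/\d+/', $subject, $all);

echo "first: {$one[0]}, total: $count\n";
print_r($all[0]);
```

With the default flags, $all[0] holds the list of full matches.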
Replacing Patterns
In the above examples, we have searched for patterns in a string, leaving the search string untouched. The preg_replace() function looks for substrings that match a pattern and then replaces them with new text. preg_replace() takes three basic parameters and an additional one. These parameters are, in order, a regular expression, the text with which to replace a found pattern, the string to modify, and the last optional argument which specifies how many matches will be replaced.
preg_replace( pattern, replacement, subject [, limit ])
The function returns the changed string if a match was found or an unchanged copy of the original string otherwise. In the following example we search for the copyright phrase and replace the year with the current.
<?php
echo preg_replace("/([Cc]opyright) 200(3|4|5|6)/", "$1 2007", "Copyright 2005");
?>
In the above example we use back references in the replacement string. Back references make it possible for you to use part of a matched pattern in the replacement string. To use this feature, you should use parentheses to wrap any elements of your regular expression that you might want to use. You can refer to the text matched by subpattern with a dollar sign ($) and the number of the subpattern. For instance, if you are using subpatterns, $0 is set to the whole match, then $1, $2, and so on are set to the individual matches for each subpattern.
In the following example we will change the date format from "yyyy-mm-dd" to "mm/dd/yyyy":
<?php
echo preg_replace("/(\d+)-(\d+)-(\d+)/", "$2/$3/$1", "2007-01-25");
?>
We can also pass an array of strings as subject to make the substitution on all of them. To perform multiple substitutions on the same string or array of strings with one call to preg_replace(), we should pass arrays of patterns and replacements. When a replacement has to be computed rather than written as fixed text, preg_replace_callback_array() (PHP 7.0 and later) takes an array that maps each pattern to a callback. Have a look at the example:
<?php
$string = "Posted by John | 2007-02-15 02:43:41";
echo preg_replace_callback_array(
    array(
        '/(\w{6}\s\w{2})\s(\w+)/' => function ($m) {
            return $m[1] . ' ' . strtoupper($m[2]);
        },
        '/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/' => function ($m) {
            return "$m[3]/$m[2]/$m[1] $m[4]";
        },
    ),
    $string
);
?>
In the above example each pattern is paired with a callback that receives the array of matches and returns the replacement text. The first callback passes the name to strtoupper(), turning John into JOHN; the second reorders the parts of the date. Older code did this by appending an "e" modifier to the regular expression, which made PHP execute the replacement string as PHP code. That modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so callbacks are the way to do this now.
Array Processing
PHP's preg_split() function enables you to break a string apart based on something more complicated than a literal sequence of characters. When it's necessary to split a string with a dynamic expression rather than a fixed one, this function comes to the rescue. The basic idea is the same as preg_match_all() except that, instead of returning matched pieces of the subject string, it returns an array of pieces that didn't match the specified pattern. The following example uses a regular expression to split the string by any number of commas or space characters:
<?php
$keywords = preg_split("/[\s,]+/", "php, regular expressions");
print_r( $keywords );
?>
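When the subject starts or ends with a separator, preg_split() returns empty strings for those positions. The PREG_SPLIT_NO_EMPTY flag drops them. A small sketch, with made-up input:

```php
<?php
// The tutorial's example: three clean pieces.
$keywords = preg_split('/[\s,]+/', 'php, regular expressions');

// Leading and trailing separators produce empty pieces by default...
$raw = preg_split('/[\s,]+/', ', php,  regex ,');

// ...unless PREG_SPLIT_NO_EMPTY is passed (limit -1 means "no limit").
$clean = preg_split('/[\s,]+/', ', php,  regex ,', -1, PREG_SPLIT_NO_EMPTY);

print_r($clean);
```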
Another useful PHP function is the preg_grep() function which returns those elements of an array that match a given pattern. This function traverses the input array, testing all elements against the supplied pattern. If a match is found, the matching element is returned as part of the array containing all matches. The following example searches through an array and returns all the names starting with the letters A through M:
<?php
$names = array('Andrew','John','Peter','Nastin','Bill');
$output = preg_grep('/^[a-m]/i', $names);
print_r( $output );
?>
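One detail worth knowing is that preg_grep() preserves the keys of the input array, and that the PREG_GREP_INVERT flag returns the elements that do not match. A brief sketch using the same names:

```php
<?php
$names = array('Andrew', 'John', 'Peter', 'Nastin', 'Bill');

// Matching elements keep their original keys (0, 1 and 4 here).
$output = preg_grep('/^[a-m]/i', $names);

// array_values() renumbers the result if sequential keys are needed.
$renumbered = array_values($output);

// PREG_GREP_INVERT returns the elements that do NOT match.
$others = preg_grep('/^[a-m]/i', $names, PREG_GREP_INVERT);

print_r($output);
print_r($others);
```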

Difference between regular expressions

I'm trying to work out what the differences are between these two:
preg_match('-^[^'.$inv.']+\.?$-', $name);
preg_match('-['.$inv.']-', $name);
Thanks

To make it easier to exemplify, assume $inv = 'a'…
-^[^a]+\.?$- needs to match the whole string, because of the caret and the dollar signs. The string is expected to start with a character other than "a", followed by 0 or more characters that are still not "a"s. The last character in this string, however, can be a dot (hence the question mark after the dot)
-[a]- will match the first "a" in the string and it will stop looking as soon as it finds a match because you're using preg_match() and not preg_match_all().
Your first pattern does not make any sense, though, since already \. = [^a] (translated into English as: a dot is already not an "a")
[EDIT] The first pattern can actually mean something when there's a dot in the character class.

First off, be careful with $inv: depending on its content, it could be possible to inject into the regular expression. To avoid that issue, use preg_quote().
That said, the first regex will be :
^ <-- the given string must begin with
[ <-- one of those characters
^ <-- inverse the accepted characters (instead of accepted characters, the following characters will be those that are not accepted)
$inv <-- characters
] <-- end of the list of characters (here not accepted characters)
+ <-- at least one character must be matched, more are accepted
\. <-- a '.'
? <-- the previous '.' isn't mandatory
$ <-- the given string must end here
If $inv = 'abc.' it will match:
def
def.
d
d.
It won't match:
., because the . isn't accepted by the [^abc.] group, even though there is \.? later, at least one character must be before a .
de.s, because the . isn't accepted in the [^abc.] group, it is only possible to have it at the end of the given string thanks to \.?
a
deb
testc
teskopkl;;[!##$b., because of the b
an empty string, at least one character must be matched with '[^'.$inv.']+'
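If $inv did not contain a ., the \.? would be redundant and the pattern could be simplified into '^[^'.$inv.']+$'; with a . in $inv, as here, the \.? is what lets def. and d. match (don't forget the preg_quote though)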
The second one will be:
[ <-- one of those characters
$inv <-- characters
] <-- end of the list of characters (here accepted characters)
If $inv = 'abc.' it will match
any string containing at least one of the letters a, b, c or .
It won't match any string which doesn't contain a, b, c or ..
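Both patterns can be wrapped in small helpers that run $inv through preg_quote() first. This is only a sketch: the function names are hypothetical, / is used as the delimiter instead of -, and $inv is assumed to be non-empty (an empty character class is not a valid pattern).

```php
<?php
// Hypothetical helper: true when $name consists only of characters
// NOT listed in $inv, optionally followed by a single trailing period.
function hasOnlyAllowedChars(string $name, string $inv): bool
{
    $class = preg_quote($inv, '/'); // escape ., ^, ], - and the delimiter
    return preg_match('/^[^' . $class . ']+\.?$/', $name) === 1;
}

// Hypothetical helper: true when $name contains any character from $inv.
function containsListedChar(string $name, string $inv): bool
{
    $class = preg_quote($inv, '/');
    return preg_match('/[' . $class . ']/', $name) === 1;
}

$inv = 'abc.';
var_dump(hasOnlyAllowedChars('def.', $inv));  // matches: trailing period
var_dump(hasOnlyAllowedChars('de.s', $inv));  // no match: period in the middle
var_dump(containsListedChar('x.z', $inv));    // matches: contains a period

// Without quoting, 'a-z' would be read as a range; quoted, it is three literals.
var_dump(containsListedChar('m', 'a-z'));     // no match
```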

In plain English, the first one is looking for an entire line which begins with one or more characters not included with the $inv string, and ending with an optional period.
The second one simply tries to match one character as specified by the value for $inv.

The first pattern matches a line containing none of the characters in $inv, optionally ending the line with a period.
The second pattern matches anything containing any of the characters in $inv.
- is the pattern delimiter, marking the beginning and end of the expression. It can technically be any character, but is most often /.
^ denotes the beginning of the string
[ ] encapsulates a set of characters to be matched
[^ ] encapsulates a set of characters that should not be matched, any other character is considered to be a match.
+ denotes that the previous character or set of characters should be matched one or more times.
. normally matches any character, which is why it is escaped as \. here to indicate a literal period character.
? denotes that the previous character should be matched zero or one time.
$ denotes the end of a string.

['.$inv.']
Let's go with the second one to begin with, since it's the simpler one.
This simply matches a string containing any single one of the characters contained within the string in the variable $inv.
It could contain anything else before or after that character from $inv.
^[^'.$inv.']+\.?$
Now the first one:
This matches a string that contains anything except the characters in $inv (the ^ inside the [] is a negative match).
The match that isn't part of $inv must be at the start of the string (the ^ outside the [] matches the start of the string).
The string can contain as many matching characters as it likes (one or more; that's the + sign after the [])
After that, it may optionally have a dot (the \.? is an optional dot character).
And nothing else after that (the $ matches the end of the string).
Note that in both cases, if $inv contains any regex reserved characters, it will fail (or do something unexpected). You should use preg_quote() to avoid this.
So... uh, they're completely different expressions. Not so much "what's the difference between them" as "what's the same about them". Answer: not much.

The first matches only if the whole string, from start to end, consists of characters that are not in $inv, followed by one or zero periods.
The second matches any string containing at least one character from $inv.
Essentially they are close to opposites: the first accepts strings made up entirely of allowed characters (with a possible . at the end), while the second detects a disallowed character.

Why does this regex not validate in the same way in PHP?

when I try preg_match with the following expression: /.{0,5}/, it still matches string longer than 5 characters.
It does, however, work properly when trying in online regexp matcher

The site you reference, myregexp.com, is focussed on Java.
Java has a specific function for matching an exact pattern, without needing to use anchor characters. This is the function which myregexp.com uses.
In most other languages, in order to match an exact pattern, you would need to add the anchoring characters ^ and $ at the start and end of the pattern respectively, otherwise the regex assumes it only needs to find the matched pattern somewhere within the string, rather than the whole string being the match.
This means that without the anchors, your pattern will match any string, of any length, because whatever the string, it will contain within it somewhere a match for "zero to five of any character".
So in PHP, and Perl, and virtually any other language, you need your pattern to look like this:
/^.{0,5}$/
Having explained all that, I would make one final observation though: this specific pattern really doesn't need to be a regular expression -- you could achieve the same thing with strlen(). In addition, the dot character in regex may not work exactly as you expect: it typically matches almost any character; some characters, including new line characters, are excluded by default, so if your string contains five characters, but one of them is a new line, it will fail your regex when you might have expected it to pass. With this in mind, strlen() would be a safer option (or mb_strlen() if you expect to have unicode characters).
If you need to match any character in regex, and the default behaviour of the dot isn't good enough, there are two options: One is to add the s modifier at the end of the expression (ie it becomes /^.{0,5}$/s). The s modifier tells regex to include new line characters in the dot "any character" match.
The other option (which is useful for languages that don't support the s modifier) is to use an expression and its negative together in a character class - eg [\s\S] - instead of the dot. \s matches any white space character, and \S is a negative of \s, so any character not matched by \s. So together in a character class they match any character. It's more long winded and less readable than a dot, but in some languages it's the only way to be sure.
You can find out more about this here: http://www.regular-expressions.info/dot.html
Hope that helps.
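The points above can be checked with a few calls. A minimal sketch with made-up strings:

```php
<?php
// Without anchors the pattern matches a substring, so any string matches.
$unanchored = preg_match('/.{0,5}/', 'much too long');

// With anchors the whole string must fit.
$anchored = preg_match('/^.{0,5}$/', 'much too long');
$short    = preg_match('/^.{0,5}$/', 'short');

// Five characters, one of them a newline: the dot refuses the newline...
$withNewline = "ab\ncd";
$dotDefault  = preg_match('/^.{0,5}$/', $withNewline);

// ...unless the s modifier is set.
$dotAll = preg_match('/^.{0,5}$/s', $withNewline);

// strlen() sidesteps the issue entirely for a plain length check.
$lengthOk = strlen($withNewline) <= 5;
```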

You need to anchor it with ^$. These symbols match the beginning and end of the string respectively, so it must be 0-5 characters between the beginning and end. Leaving out the anchors will match anywhere in the string so it could be longer.
/^.{0,5}$/
For better readability, I would probably also enclose the . in (), but that's kind of subjective.
/^(.){0,5}$/

Allow + in regex email validate email [duplicate]

Regex is blowing my mind. How can I change this to validate emails with a plus sign? so I can sign up with test+spam#gmail.com
if(!preg_match("/^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*$/i", $_GET['em'])) {

It seems like you aren't really familiar with what your regex is doing currently, which would be a good first step before modifying it. Let's walk through your regex using the email address john.robert.smith#mail.com (each section below ends with the part of the address that it matches):
1. ^ is the start of string anchor. It specifies that any match must begin at the beginning of the string. If the pattern is not anchored, the regex engine can match a substring, which is often undesired. Anchors are zero-width, meaning that they do not capture any characters.
2. [_a-z0-9-]+ is made up of two elements, a character class and a repetition modifier:
[...] defines a character class, which tells the regex engine that any of these characters are valid matches. In this case the class contains the characters a-z, numbers 0-9 and the dash and underscore (in general, a dash in a character class defines a range, so you can use a-z instead of abcdefghijklmnopqrstuvwxyz; when given as the last character in the class, it acts as a literal dash).
+ is a repetition modifier that specifies that the preceding token (in this case, the character class) can be repeated one or more times. There are two other repetition operators: * matches zero or more times; ? matches exactly zero or one times (ie. makes something optional).
(matches john in john.robert.smith#mail.com)
3. (\.[_a-z0-9-]+)* again contains a repeated character class. It also contains a group and an escaped character:
(...) defines a group, which allows you to group multiple tokens together (in this case, the group will be repeated as a whole). Let's say we wanted to match 'abc', zero or more times (ie. abcabcabc matches, abcccc doesn't). If we tried to use the pattern abc*, the repetition modifier would only apply to the c, because c is the last token before the modifier. In order to get around this, we can group abc ((abc)*), in which case the modifier would apply to the entire group, as if it was a single token.
\. specifies a literal dot character. The reason this is needed is because . is a special character in regex, meaning any character. Since we want to match an actual dot character, we need to escape it.
(matches .robert.smith in john.robert.smith#mail.com)
4. # is not a special character in regex, so, like all other non-special characters, it matches literally.
(matches # in john.robert.smith#mail.com)
5. [a-z0-9-]+ again defines a repeated character class, like item 2 above.
(matches mail in john.robert.smith#mail.com)
6. (\.[a-z0-9-]+)* is almost exactly the same pattern as item 3 above.
(matches .com in john.robert.smith#mail.com)
7. $ is the end of string anchor. It works the same as ^ above, except that it matches the end of the string.
With that in mind, it should be a bit clearer how to add a section that captures a plus segment. As we saw above, + is a special character so it has to be escaped. Then, since the + has to be followed by some characters, we can define a character class with the characters we want to match and define its repetition. Finally, we should make the whole group optional because email addresses don't need to have a + segment:
(\+[a-z0-9-]+)?
When inserted into your regex, it'd look like this:
/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?#[a-z0-9-]+(\.[a-z0-9-]+)*$/i
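A quick sketch comparing the original pattern with the extended one. The addresses are written with # as the separator, exactly as they appear in the patterns on this page; the variable names are made up.

```php
<?php
// Original pattern from the question: no plus segment allowed.
$original = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*$/i';

// Extended pattern with the optional (\+[a-z0-9-]+)? group.
$extended = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?#[a-z0-9-]+(\.[a-z0-9-]+)*$/i';

$plusOriginal = preg_match($original, 'test+spam#gmail.com');
$plusExtended = preg_match($extended, 'test+spam#gmail.com');

// Addresses without a plus segment still pass.
$plainExtended = preg_match($extended, 'john.robert.smith#mail.com');
```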

Save your sanity. Get a pre-made PHP RFC 822 Email address parser

I've used this regex to validate emails, and it works just fine with emails that contain a +:
/^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

\+ will match a literal + sign, but be aware: You still won't be close to matching all possible email addresses according to the RFC spec, because the actual regex for that is madness. It's almost certainly not worth it; you should use a real email parser for this.
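If a hand-written regex is not required, PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL is one option. It is a validator rather than a full RFC parser, so treat this as a sketch of an alternative and not as the approach any of the answers here used. The sample addresses are made up.

```php
<?php
// filter_var() returns the address itself when it is valid, false otherwise.
$withPlus = filter_var('test+spam@example.com', FILTER_VALIDATE_EMAIL);
$garbage  = filter_var('not an email', FILTER_VALIDATE_EMAIL);
$noDomain = filter_var('john.smith@', FILTER_VALIDATE_EMAIL);

var_dump($withPlus, $garbage, $noDomain);
```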

This is another solution (is similar to the solution found by David):
//Escaped for .Net
^[_a-zA-Z0-9-]+((\\.[_a-zA-Z0-9-]+)*|(\\+[_a-zA-Z0-9-]+)*)*#[a-zA-Z0-9-]+(\\.[a-zA-Z0-9-]+)*(\\.[a-zA-Z]{2,4})$
//Native
^[_a-zA-Z0-9-]+((\.[_a-zA-Z0-9-]+)*|(\+[_a-zA-Z0-9-]+)*)*#[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,4})$

This is another solution:
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?#[a-z0-9-.]+(\.[a-z0-9]+)$/
or For razor page(#=\u0040)
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?\u0040[a-z0-9-.]+(\.[a-z0-9]+)$/
