I got the expression directly from RegExr, but PHP has a problem with the =:
"/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
The expression is for matching email addresses.
You used / as the delimiter marking the start and end of the pattern, but then also used that character within the pattern. You must either use a different delimiter, or escape instances of it within the pattern. If you meant to escape the equals signs, then you used the wrong slash.
Escape the slash preceding the = (and the other slash in that expression). You use / as the delimiter, so if it occurs inside the pattern it has to be escaped:
"/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
That should work.
You are using / as the delimiter. There are two / in the regex that are not escaped. Escape them as \/:
"/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
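To check the fix, here is a minimal sketch (the sample address is made up) that compiles both options. The first is the escaped-slash version with / as the delimiter. The second is the untouched RegExr pattern inside a bracket pair, which PHP also accepts as delimiters. Parentheses work here because every ( in the pattern has a matching ). Neither ~ nor # would work, since both already appear inside the character class.

```php
<?php
// Variant 1: keep "/" as the delimiter and escape the two literal slashes as "\/".
// Single quotes avoid PHP interpolating "$"; the "'" inside the class is written as \'.
$escaped = '/[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/';

// Variant 2: leave the pattern as RegExr gave it and use "(" ... ")" as delimiters.
$bracketed = '([a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)';

foreach ([$escaped, $bracketed] as $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false if the pattern is invalid.
    var_dump(preg_match($pattern, 'jane.doe@example.com')); // int(1)
    var_dump(preg_match($pattern, 'not-an-email'));         // int(0)
}
```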
Related
I've been trying to write a few functions based on RegEx, and most of them use \Q and \E because part of the RegEx pattern is user input.
So, hypothetically, say we're using the delimiter / and want to match the input /. The function would construct something along the lines of /\Q/\E/.
I'm not sure why /\Q/\E/ doesn't match /. With every other delimiter it works, unless the input contains that same delimiter.
Maybe it treats the delimiter as the end of the pattern even though it's inside a literal-only block, and treats the escape as a literal. I'm not sure; I've tried a bunch of things.
Hopefully someone can point me in the right direction as to what workarounds there are for this issue.
It helps to understand that / is not a regex metacharacter, like * or (. It's special only because you're using it to delimit the regex itself, and the only way to escape the regex delimiter is with a backslash (\/). PHP finds the closing delimiter before PCRE ever sees the pattern, so \Q...\E cannot protect it: in /\Q/\E/ the pattern ends right after \Q.
But you shouldn't need to use \Q and \E. The preg_quote() function takes a delimiter argument, so it correctly adds backslashes everywhere they're needed.
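Here is a small sketch of the difference, assuming a hypothetical user-supplied value $input. The first version fails because PHP splits off the pattern at the delimiter before \Q is interpreted. The second relies only on preg_quote() and its documented delimiter argument. The @ operator is there just to silence the expected warning in this demo.

```php
<?php
$input = 'a/b'; // hypothetical user input that contains the "/" delimiter

// Broken: PHP ends the pattern at the "/" after \Qa and reads "b\E/" as modifiers,
// so preg_match() raises a warning (silenced by @) and returns false.
$broken = @preg_match('/\Q' . $input . '\E/', 'x a/b y');
var_dump($broken); // bool(false)

// Working: preg_quote() escapes regex metacharacters and, because "/" is passed
// as the delimiter argument, also turns "/" into "\/".
$quoted = preg_quote($input, '/');
echo $quoted, "\n"; // a\/b
var_dump(preg_match('/' . $quoted . '/', 'x a/b y')); // int(1)
```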
I've been practicing the preg_match() function in PHP. The tutorial said that forward slashes need to be added around the characters.
I also noticed that without the slashes, it works strangely. It gives a warning:
preg_match(): Delimiter must not be alphanumeric or backslash.
Q: What difference do the forward slashes make?
Here's the code:
$string = 'Okay, I\'m fine with it! ';
$math = 'Okay'; // I need to add forward slashes for it to work
echo preg_match($math, $string); // It supposedly echoes 1 or 0,
                                 // depending on whether the first argument
                                 // is found in the second argument
There is no particular reason; it's a syntactic choice. This syntax has the advantage of making it easy to add global modifiers to the pattern:
delimiter - pattern - delimiter - [global modifiers]
As explained in the error message and in the PHP manual, you can choose the delimiter from among special characters (anything that is not alphanumeric, a backslash, or whitespace). The most commonly used is the slash, but it's not always a good choice, particularly when the pattern contains many literal slashes that would need to be escaped.
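Applied to the tutorial code, a minimal sketch might look like this (it reuses the same $string). It shows that the delimiters are not part of what is matched, that the choice of delimiter doesn't change the result, and that a modifier goes after the closing delimiter.

```php
<?php
$string = 'Okay, I\'m fine with it! ';

// Delimiter, pattern, delimiter: the slashes are not part of what is matched.
var_dump(preg_match('/Okay/', $string));  // int(1)
// Any allowed delimiter behaves the same; "#" here instead of "/".
var_dump(preg_match('#Okay#', $string));  // int(1)
// Modifiers follow the closing delimiter; "i" makes the match case-insensitive.
var_dump(preg_match('/okay/i', $string)); // int(1)
var_dump(preg_match('/okay/', $string));  // int(0)

// Without delimiters, "O" is read as the delimiter, which PHP rejects:
// a warning is raised (silenced here with @) and false is returned.
var_dump(@preg_match('Okay', $string));   // bool(false)
```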
It's because you can also apply switches to the regular expression (e.g. m for multiline, u for Unicode), and these need to be placed after the closing delimiter, so the syntax is
opening delimiter expression closing delimiter [optional switches]
e.g.
/^[a-z]*$/mi
for the multiline (m) and case-insensitive (i) switches, using / as the delimiter.
The delimiter must not be a character that the regexp parser could misinterpret; it must be clear that it is a delimiter. So it cannot be alphanumeric (e.g. i) or a backslash (\), which is used to "escape" characters in the regexp.
Note that you can also use a pair of square brackets as delimiters, so
[^[a-z]*$]mi
is valid.
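A short sketch of the m and i switches plus the square-bracket delimiter form; the three-line sample text is made up. With m, ^ and $ anchor at each line, so only the all-letter lines match.

```php
<?php
$text = "Hello\nworld\n123"; // made-up sample with three lines

// "m": ^ and $ match at every line start/end; "i": [a-z] also matches uppercase.
$count = preg_match_all('/^[a-z]*$/mi', $text, $matches);
echo $count, "\n";                    // 2
echo implode(',', $matches[0]), "\n"; // Hello,world

// Without "m", ^ and $ only anchor at the start and end of the whole string.
echo preg_match_all('/^[a-z]*$/i', $text), "\n"; // 0

// Square brackets as delimiters: PHP tracks the nested [ ] of [a-z],
// so its "]" is not taken as the closing delimiter.
echo preg_match_all('[^[a-z]*$]mi', $text), "\n"; // 2
```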
Using:
preg_match_all(
"/\b".$KeyWord."\b/u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE);
This code works fine in all cases except when there is a / in $KeyWord. Then I get a warning and, of course, no match.
Any idea how to work around this?
Thanks
Use preg_quote() around the keyword.
http://us2.php.net/preg_quote
But also pass your delimiter so that it gets escaped too: preg_quote($KeyWord, "/")
You must process $KeyWord and add "\" before all special characters; preg_quote() does that for you.
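A minimal sketch of that fix applied to the code in the question, using a made-up keyword that contains the / delimiter. The offset reported by PREG_OFFSET_CAPTURE is in bytes.

```php
<?php
$SearchStr = 'We use TCP/IP and UDP.'; // made-up text
$KeyWord   = 'TCP/IP';                 // made-up keyword containing "/"

// preg_quote() escapes regex metacharacters; the second argument also escapes "/".
$pattern = '/\b' . preg_quote($KeyWord, '/') . '\b/u';

$found = preg_match_all($pattern, $SearchStr, $Array1, PREG_OFFSET_CAPTURE);
echo $found, "\n";           // 1
echo $Array1[0][0][0], "\n"; // TCP/IP
echo $Array1[0][0][1], "\n"; // 7
```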
Dynamic Values In Patterns
You are using a dynamic value inside the pattern. Just as with SQL or HTML, the value needs its own specific escaping. If you do not escape it, metacharacters inside the value are interpreted by the regex engine. The escaping function for PCRE patterns is preg_quote().
preg_match_all(
"(\b".preg_quote($KeyWord)."\b)u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE
);
Delimiters
The syntax of a pattern in PHP's preg_* functions is:
DELIMITER PATTERN DELIMITER OPTIONS
The / is the delimiter in your pattern, so the / inside $KeyWord was recognized as the closing delimiter.
But any character that is not alphanumeric, a backslash, or whitespace can be used. In Perl and JS you can define a regular expression directly (not as a string) using /, so it is often the default in tutorials.
Most delimiters have to be escaped inside the pattern.
Match a /: '/\//'
The exception to this rule is brackets. You can use any of the bracket pairs as delimiters, and because they form a pair, they can still be used inside the pattern as long as they are balanced.
Match a /: '(/)'
The () brackets are a good choice; you can think of them as "subpattern 0".
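To illustrate the bracket rule, here is a sketch with made-up inputs. PHP counts nesting of the delimiter brackets to find the closing one, so balanced parentheses inside are fine. An unbalanced one coming from user input must be escaped, which preg_quote() already does for ( and ).

```php
<?php
// With "/" as the delimiter, a literal "/" has to be escaped...
var_dump(preg_match('/\//', 'a/b')); // int(1)
// ...with a bracket pair as delimiters, it does not.
var_dump(preg_match('(/)', 'a/b'));  // int(1)

// Balanced parentheses inside are fine: PHP counts nesting to find the closing ")".
preg_match('((a)(b))', 'ab', $m);
// $m[0] is the whole match ("subpattern 0"); the inner groups start at $m[1].
echo $m[0], ' ', $m[1], ' ', $m[2], "\n"; // ab a b

// A keyword with an unbalanced "(" would break the delimiters; preg_quote() escapes it.
$KeyWord = 'f(x'; // hypothetical keyword
var_dump(preg_match('(\b' . preg_quote($KeyWord) . '\b)u', 'call f(x here')); // int(1)
```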
You can use preg_quote() to escape the special characters in the keyword.
From the manual:
puts a backslash in front of every character that is part of the regular expression syntax
You can also pass the delimiter as the second parameter, and it will be escaped too. However, if you're using # as your delimiter, there's no need to escape /.
So, you can either use:
preg_match_all("/\b".preg_quote($KeyWord, "/")."\b/u", $SearchStr, $Array1, PREG_OFFSET_CAPTURE);
or, if you are sure that your keyword does not contain any other regex-special characters, you can simply change the delimiter so that the slash no longer needs escaping:
preg_match_all("#\b".$KeyWord."\b#u", $SearchStr, $Array1, PREG_OFFSET_CAPTURE);
Question
What does it mean when a regular expression is surrounded by # symbols? Does that mean something different than being surrounded by slashes? What about when #x or #i are at the end? Now that I think about it, what do the surrounding slashes even mean?
Background
I saw this StackOverflow answer, posted by John Kugelman, in which he displays serious Regex skills.
Now, I'm used to seeing regexes surrounded by slashes as in
/^abc/
But he used a regex surrounded by # symbols:
'#
^%
(.{2}) # State, 2 chars
([^^]{0,12}.) # City, 13 chars, delimited by ^
([^^]{0,34}.) # Name, 35 chars, delimited by ^
([^^]{0,28}.) # Address, 29 chars, delimited by ^
\?$
#x'
In fact, it seems to be in the format:
#^abc#x
In the process of trying to google what that means (it's a tough question to google!), I also saw the format:
#^abc#i
It's clear the x and the i are not matched characters.
So what does it all mean???
Thanks in advance for any and all responses,
-gMale
The surrounding slashes are just the regex delimiters. You can use almost any character to do that (anything that is not alphanumeric, a backslash, or whitespace); the most commonly used is /, and another I've seen fairly often is #.
So in other words, #whatever#i is essentially the same as /whatever/i (i is the modifier for case-insensitive matching).
The reason you might want to use something else than the / is if your regex contains the character. You avoid having to escape it, similar to using '' for strings instead of "".
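For the x modifier in particular, a sketch reusing the pattern from the question may help. In extended mode PCRE ignores unescaped whitespace and treats # up to the end of the line as a comment. Because PHP locates the closing delimiter before PCRE applies x, a # delimiter would collide with those # comments. This version therefore uses ~ as the delimiter (my substitution), and the sample string is invented to fit the pattern.

```php
<?php
// "x" (extended mode): unescaped whitespace is ignored and "#" starts a comment.
// "~" is the delimiter so that the "#" comments are not mistaken for it.
$pattern = '~
    ^%
    (.{2})          # State, 2 chars
    ([^^]{0,12}.)   # City, 13 chars, delimited by ^
    ([^^]{0,34}.)   # Name, 35 chars, delimited by ^
    ([^^]{0,28}.)   # Address, 29 chars, delimited by ^
    \?$
~x';

$sample = '%CAANYTOWN^DOE$JOHN^123 MAIN ST^?'; // made-up input
if (preg_match($pattern, $sample, $m)) {
    echo $m[1], "\n"; // CA
    echo $m[2], "\n"; // ANYTOWN^
}

// Spaces in the pattern only count without "x".
var_dump(preg_match('/a b/x', 'ab')); // int(1)
var_dump(preg_match('/a b/', 'ab'));  // int(0)
```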
Found this from a "Related" link.
The delimiter can be any character that is not alphanumeric, whitespace or a backslash character.
/ is the most commonly used delimiter, since it is closely associated with regex literals, for instance in JavaScript, where it is the only valid delimiter. However, any symbol can be used.
I have seen people use ~, #, and even ! to delimit their regexes in a way that avoids symbols that also appear in the regex. Personally I find this ridiculous.
A lesser-known fact is that you can use a matching pair of brackets to delimit a regex in PHP. This has the tremendous advantage that the closing delimiter is obviously different from the same symbol appearing inside the pattern, so that symbol doesn't need escaping as long as it is balanced. My personal preference is this:
(^abc)i
By using parentheses, I remind myself that in a match, $m[0] is always the full match, and the subpatterns start at $m[1].
preg_replace("/(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)/i", '$1#$2$3', $somevar);
It's meant to turn, for example, PHP into #PHP.
Warning: preg_replace(): Unknown modifier '|'
It's because you are using the forward slash (/) as your delimiter. When PHP reaches /s (starting at the 3rd character), it thinks the regex is over and treats the rest as modifiers. s is a valid modifier, but | is not, hence the error.
Next time, you can either:
Change your delimiters to something you won't use in your regex, i.e.:
preg_replace("!(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)!i", '$1#$2$3', $somevar);
Or escape those characters with a backslash, i.e.: "/something\/else/"
I also suspect you didn't intend to use /s, but the escape character \s that matches whitespace characters.
The first character in the regular expression is the delimiter. If you need to use this inside your regular expression then you need to escape it:
"/(\/s|^)...
Or alternatively, choose another delimiter that isn't used anywhere in your regular expression so that you don't need to escape:
"~(/s|^)...(/s|$)~i"
I prefer to do the latter as it makes the regular expression more readable.
(Although as NullUserException points out, the actual error is that you should have used a backslash instead of a slash).
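Putting both fixes together, here is a sketch with a made-up $somevar. It uses ~ as the delimiter and \s instead of /s. The second call reproduces the original mistake; preg_replace() returns null when the pattern fails to compile.

```php
<?php
$somevar = 'I like PHP and c++'; // made-up input

// Fixed: "~" does not occur in the pattern, and \s matches whitespace.
$result = preg_replace('~(\s|^)(php|ajax|c\+\+|javascript|c#)(\s|$)~i', '$1#$2$3', $somevar);
echo $result, "\n"; // I like #PHP and #c++

// Original: PHP ends the pattern at the second "/", accepts "s" as a modifier,
// then rejects "|"; the warning is silenced with @ and null is returned.
var_dump(@preg_replace('/(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)/i', '$1#$2$3', $somevar)); // NULL
```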