I have been learning PHP for some time now and I wanted one clarification.
I have seen the preg_match function called with different delimiting symbols like:
preg_match('/.../')
and
preg_match('#...#')
Today I also saw % being used.
My question is two part:
What all characters can be used?
And is there a standard ?
Any
non-alphanumeric
non-whitespace and
non-backslash ASCII character
can be used as delimiter.
Also if you using the opening punctuation symbols as opening delimiter:
( { [ <
then their corresponding closing punctuation symbols must be used as closing delimiter:
) } ] >
The most common delimiter is /.But sometimes it's advised to use a different delimiter if a / is part of the regex.
Example:
// check if a string is number/number format:
if(preg_match(/^\d+\/\d+$/)) {
// match
}
Since the regex contains the delimiter, you must escape the delimiter found in the regex.
To avoid the escaping it is better to choose a different delimiter, one which is not present in the regex, that way your regex will be shorter and cleaner:
if(preg_match(#^\d+/\d+$#))
Related
I wan to replace text using preg_replace.But my search string have a / so it makes problem.
How can I solve it?
$search='r/trtrt';
echo preg_replace('/\b'.addslashes($search).'\b/', 'ERTY', 'TG FRT');
I am getting error preg_replace(): Unknown modifier 'T'
Use a different delimiter and don't use addslashes, that is escaping non-regex special characters (or a mix of regex and non-regex characters, I'd say the majority of the time dont use addslashes).
$search='r/trtrt';
echo preg_replace('~\b'. $search.'\b~', 'ERTY', 'TG FRT');
You could use preg_quote as an alternative. Just changing the delimiter is the easiest solution though.
use ~ as delimiter:
$search='r/trtrt';
echo preg_replace('~\b'.addslashes($search).'\b~', 'ERTY', 'TG FRT');
I always use ~ as it is one of the least used char in a string but you can use any character you want and won't need to escape your regexp chars!
You don't need addslashes() in your case but if you have a more complex regexp and you want to escape chars you should use preg_quote($search).
Why not escape it the way it is meant to be done
$search='r/trtrt';
echo preg_replace('/\b'.preg_quote($search, '/').'\b/', 'ERTY', 'TG FRT');
http://php.net/manual/en/function.preg-quote.php
preg_quote() takes str and puts a backslash in front of every
character that is part of the regular expression syntax. This is
useful if you have a run-time string that you need to match in some
text and the string may contain special regex characters
delimiter
If the optional delimiter is specified, it will also be escaped.
This is useful for escaping the delimiter that is required by the PCRE
functions. The / is the most commonly used delimiter.
Add slashes is not the function to use here. It provides no escaping for any of the special characters in Regx.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) {
} = ! < > | : -
Using the proper functions promote readability of the code, if at some later point in time you or another coder see the ~ delimiter they may just think its part of a personal "style" or pay it little attention. However, seeing the input properly escaped will tell any experienced coder that the input could contain characters that conflict with regular expressions.
Personally, readability is at the top of my list whenever I write code. If you cant understand it at a glance, what good is it.
I have a part of a string. I need to check if it would equal another string when I add a few symbols on. However, my use of delimiters (I believe) is not allowing for the matches to take place.
My IF statement:
if (preg_match("{" . "$words[$counter_words]" . "[<]N}", "$corpus[$counter_corpus]"))
My corpus:
{3(-)D[<]AN}
{dog[<]N}
{4(-)H(')er[<]N}
{4(-)H[<]A}
{A battery[<]h}
My partial array is as follows
dog
cat
3-D
plant
My goal is to match "dog" with "{dog[<]N}" (the [] and {} are delimiters). To try to compensate for this, I glue delimiters to the start and end of the string. Preg_match accepts it, but does not match the two together.
What would be the solution to this? I cannot find or think of a solution. Your help is greatly appreciated.
if (preg_match("{" . $words[$counter_words] . "\\[<\\]N}", $corpus[$counter_corpus]))
[ and ] have special meaning in regular expressions. If you don't want the special meaning you need to escape them, \[. But because this is inside a PHP string, to get a \ character, you must enter \\.
I have a part of a string. I need to check if it would equal another string when I add a few symbols on.
To check if a string equals another one, we don't need preg_match(); this would do:
if ("{{$words[$counter_words]}[<]N}" == $corpus[$counter_corpus])
If you use preg_match(), you have to heed the PCRE regex syntax:
When using the PCRE functions, it is required that the pattern is enclosed
by delimiters. A delimiter can be any non-alphanumeric,
non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), …
The added symbols { and } at the start and end of your string were taken as pattern-enclosing delimiters, while you meant them to be part of the pattern. You have to add actual delimiters:
if (preg_match("/{{$words[$counter_words]}\[<]N}/", $corpus[$counter_corpus]))
Also of interest:
Double quoted
Variable parsing / Complex (curly) syntax
I'm been practicing the preg_match() function in PHP. The tutorial said that it is needed to add fore slashes before the characters.
I also noticed that without the slashes, it works strangely. It gives a warning:
preg_match(): Delimiter must not be alphanumeric or backslash.
Q: What difference does the fore slashes do?
Here's the code:
$string = 'Okay, I\'m fine with it! ';
$math = 'Okay'; // I need to add fore slashes for it to work
echo preg_match($math, $string); // It supposedly echoes out 1 or 0
// depending if the former argument
// is in the latter argument
There is no particular reason, it's a syntaxic choice. This syntax has the avantage to be handy to add global modifiers to the pattern:
delimiter - pattern - delimiter - [global modifiers]
As explained in the error message and in the php manual, you can choose the delimiter between special characters, the most commonly used is the slash, but it's not always a pertinent choice in particular when the pattern contains a lot of literal slashes that need to be escaped.
It's because you can also apply switches to the regular expression (eg. m for multiline, u for Unicode) and these need to be defined outside of the delimiter, so the syntax is
opening delimiter expression closing delimiter [optional switches]
e.g.
/^[a-z]*$/mi
for the multiline (m) and case insensitive (i) switches, using a delimiter of /
The delimiter must not be a character that can be misinterpreted by the regexp parser, it must be very clear that it is a delimiter, so it cannot be alpha (e.g. i, or a \ that is used to "escape" characters in the regexp
Note that you can also use braces as delimiters, so
[^[a-z]*$]mi
is valid
Ok so I managed to solve a problem at work with regex, but the solution is a bit of a monster.
The string to be validated must be:
zero or more: A-Z a-z 0-9, spaces, or these symbols: . - = + ' , : ( ) /
But, the first and/or last characters must not be a forward slash /
This was my solution (used preg_match php function):
"/^[a-z\d\s\.\-=\+\',:\(\)][a-z\d\s\.\-=\+\',\/:\(\)]*[a-z\d\s\.\-=\+\',:\(\)]$|^[a-z\d\s\.\-=\+\',:\(\)]$/i"
A colleague thinks this is too big and complicated. Well it works, so is it really that bad? Anyone in the mood for some regex-golf?
You can simplify your expression to this:
/^(?:[a-z\d\s.\-=+',:()]+(?:/+[a-z\d\s.\-=+',:()]+)*)?$/i
The outer (?:…)? is to allow an empty string. The [a-z\d\s.\-=+',:()]+ allows to start with one or more of the specified characters except the /. If a / follows, it also must be followed by one or more of the other specified characters ((?:/[a-z\d\s.\-=+',:()]+)*).
Furthermore, inside a character set, you only need to escape the characters \, ], and depending on the position also ^ and -.
Try something like this instead
function validate($string) {
return (preg_match("/[a-zA-Z0-9.\-=+',:()/]*/", $string) && substr($string, 0,1) != '/' && substr($string, -1) != '/'))
}
It's a lot simpler to check the first and last character specifically. Otherwise you're left with dealing with a lot of overhead when it comes to empty strings and such. Your regex, for example, requires the string to be at least one character long, otherwise it doesn't validate. Despite "" fitting your criteria.
'#^(?!/)[a-z\d .=+\',:()/-]*$(?<!/)#i'
As others have observed, most of those characters don't need to be escaped inside a character class. Additionally, the hyphen doesn't need to be escaped if it's the last thing listed, and the slash doesn't need to be escaped if you use a different character as the regex delimiter (# in this case, but ~ is a popular choice, too).
I also ditched the double-quotes in favor of single-quotes, which meant I had to escape the single-quote in the regex. That's worth it because single-quoted strings are so much simpler to work with: no $variable interpolation, no embedded executable {code}, and the only characters you have to escape for them are the single-quote and the backslash.
But the main innovation here is the use of lookahead and lookbehind to exclude the slash as the first or last character. That's not just a code-golf tactic, either; I would write the regex this way anyway, because it expresses my intent so much better. Why force the next guy to parse those almost-identical character classes, when you can just say what you mean? "...but the first and last character can't be slashes."
Question
What does it mean when a regular expression is surrounded by # symbols? Does that mean something different than being surround by slashes? What about when #x or #i are on the end? Now that I think about it, what do the surrounding slashes even mean?
Background
I saw this StackOverflow answer, posted by John Kugelman, in which he displays serious Regex skills.
Now, I'm used to seeing regexes surrounded by slashes as in
/^abc/
But he used a regex surrounded by # symbols:
'#
^%
(.{2}) # State, 2 chars
([^^]{0,12}.) # City, 13 chars, delimited by ^
([^^]{0,34}.) # Name, 35 chars, delimited by ^
([^^]{0,28}.) # Address, 29 chars, delimited by ^
\?$
#x'
In fact, it seems to be in the format:
#^abc#x
In the process of trying to google what that means (it's a tough question to google!), I also saw the format:
#^abc#i
It's clear the x and the i are not matched characters.
So what does it all mean???
Thanks in advance for any and all responses,
-gMale
The surrounding slashes are just the regex delimiters. You can use any character (afaik) to do that - the most commonly used is the /, other I've seen somewhat commonly used is #
So in other words, #whatever#i is essentially the same as /whatever/i (i is modifier for a case-insensitive match)
The reason you might want to use something else than the / is if your regex contains the character. You avoid having to escape it, similar to using '' for strings instead of "".
Found this from a "Related" link.
The delimiter can be any character that is not alphanumeric, whitespace or a backslash character.
/ is the most commonly used delimiter, since it is closely associated with regex literals, for instance in JavaScript where they are the only valid delimiter. However, any symbol can be used.
I have seen people use ~, #, #, even ! to delimit their regexes in a way that avoids using symbols that are also in the regex. Personally I find this ridiculous.
A lesser-known fact is that you can use a matching pair of brackets to delimit a regex in PHP. This has the tremendous advantage of having an obvious difference between the closing delimiter, and the symbol showing up in the pattern, and therefore don't need any escaping. My personal preference is this:
(^abc)i
By using parentheses, I remind myself that in a match, $m[0] is always the full match, and the subpatterns start at $m[1].