What does # mean when used in preg_match? - php

This is from a class. There is a # sign in preg_match; what does it mean, or what is its purpose? Does it mean a space?
if (preg_match("#Property Information </td>#",simplexml_import_dom($cols->item(0))->asXML(),$ok))
{
$table_name = 'Property Information';
}

In that case, it is being used as a pattern delimiter. As the PHP manual page on delimiters says,
When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), hash signs (#) and tildes (~).

It is just a delimiter. It can be any other suitable character, or a matching pair of brackets. The following are all equivalent:
"#Property Information </td>#"
"+Property Information </td>+"
"|Property Information </td>|"
"[Property Information </td>]"
...
The purpose of the delimiter is to separate the regex pattern from its modifiers. For example, if you need a case-insensitive match, you put an i after the closing delimiter:
"#Property Information </td>#i"
"+Property Information </td>+i"
"|Property Information </td>|i"
"[Property Information </td>]i"
...
See http://www.php.net/manual/en/regexp.reference.delimiters.php for details.
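A minimal PHP sketch of that claim, assuming a made-up lower-case sample string in place of the asXML() output: the four delimiters give identical results, and only the i modifier after the closing delimiter changes the outcome.

```php
<?php
// Hypothetical sample input standing in for the asXML() output in the question.
$html = '<td>property information </td>';

// Same pattern body, different delimiters. All are case-sensitive,
// so none of them match the lower-case sample.
$patterns = [
    '#Property Information </td>#',
    '+Property Information </td>+',
    '|Property Information </td>|',
    '[Property Information </td>]',
];
foreach ($patterns as $p) {
    var_dump(preg_match($p, $html)); // int(0) for every delimiter
}

// The i modifier after the closing delimiter makes the match case-insensitive.
var_dump(preg_match('#Property Information </td>#i', $html)); // int(1)
```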

Almost any character, when it appears in the first position, can be used as a PCRE delimiter. In this case it's #. (Another common one is /, but that is a poor choice when dealing with closing tags, since you'd have to escape every / in the pattern.)
See http://www.php.net/manual/en/regexp.reference.delimiters.php for details.
However, you shouldn't use a Regex for this check at all - you are just testing if a plain string is in another string. Here's a proper solution:
$xml = simplexml_import_dom($cols->item(0))->asXML();
if(strpos($xml, 'Property Information </td>') !== false) { ... }
Using string functions on HTML/XML is not ideal in general, but for simple "contains" checks it's usually the easiest way.
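A sketch of both approaches side by side, assuming a hypothetical $xml value in place of the real asXML() output. On PHP 8+, str_contains($xml, $needle) is an equivalent, more readable form of the strpos() check. If you do want a regex for a literal string, preg_quote() escapes the regex metacharacters plus the delimiter you pass it.

```php
<?php
// Hypothetical stand-in for simplexml_import_dom($cols->item(0))->asXML().
$xml = '<td>Property Information </td>';
$needle = 'Property Information </td>';

// Plain substring test: strpos() returns an offset or false, so compare with
// !== false (a match at offset 0 would otherwise look like "not found").
$found = strpos($xml, $needle) !== false;

// Regex version: preg_quote() escapes <, > and the '/' delimiter,
// so the literal text can be embedded in the pattern safely.
$pattern = '/' . preg_quote($needle, '/') . '/';
$foundRegex = preg_match($pattern, $xml) === 1;

var_dump($found, $foundRegex); // both true
```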

Every regular expression must start and end with a delimiter, normally the same character (or a matching pair of brackets). The author of the given regular expression has chosen to start and end it with a # sign.

Related

Comments in preg regexes using # as delimiter?

With Perl-like regular expression syntax, you can write inline comments using the /x modifier and the # character. But what if I'm using PHP with # as the delimiter for styling reasons: is there any way to write a comment then?
preg_replace("/foo # This is a comment\n/x", "bar","foobar")
works but
preg_replace("#foo # This is a comment\n#x", "bar","foobar")
doesn't work, and neither does //, /**/ or any other common comment sequence I tried.
In a PHP regex pattern, the delimiter takes precedence over any pattern syntax: PHP looks for the first unescaped delimiter before the pattern ever reaches PCRE. If you define # as the delimiter, you cannot use it as part of another special construct. So "#foo # This is a comment\n#x" and "#foo (?# This is a comment\n)#x" won't work, because the second # signals the end of the pattern.
When you escape a #, it becomes a literal # symbol. The pattern "#foo \\# This is a comment\n#x" will match "foo#Thisisacomment": once escaped, # is matched as a literal character, and the x modifier makes the remaining spaces insignificant.
So, the best advice is available on the "Delimiters" page at php.net:
If the delimiter needs to be matched inside the pattern it must be escaped using a backslash. If the delimiter appears often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.
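Following that advice, a minimal sketch: pick a delimiter that doesn't collide with the x-mode comment character (~ here, an arbitrary choice) and the comment works; with # as the delimiter, the escaped \# is only a literal.

```php
<?php
// With ~ as the delimiter, # is free to start an x-mode comment.
// The newline ends the comment, so the closing ~ is still found as the delimiter.
$a = preg_replace("~foo # This is a comment\n~x", 'bar', 'foobar');

// With # as the delimiter, the # must be escaped; it then matches a literal #
// (and x mode ignores the unescaped spaces), so this is NOT a comment.
$b = preg_match("#foo \\# This is a comment\n#x", 'foo#Thisisacomment');

var_dump($a); // string(6) "barbar"
var_dump($b); // int(1)
```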

preg_match not reading results: Delimiter issue?

I have a part of a string. I need to check if it would equal another string when I add a few symbols on. However, my use of delimiters (I believe) is not allowing for the matches to take place.
My IF statement:
if (preg_match("{" . "$words[$counter_words]" . "[<]N}", "$corpus[$counter_corpus]"))
My corpus:
{3(-)D[<]AN}
{dog[<]N}
{4(-)H(')er[<]N}
{4(-)H[<]A}
{A battery[<]h}
My partial array is as follows:
dog
cat
3-D
plant
My goal is to match "dog" with "{dog[<]N}" (the [] and {} are delimiters). To try to compensate for this, I glue delimiters to the start and end of the string. preg_match accepts it, but does not match the two together.
What would be the solution to this? I cannot find or think of a solution. Your help is greatly appreciated.
if (preg_match("{" . $words[$counter_words] . "\\[<\\]N}", $corpus[$counter_corpus]))
[ and ] have special meaning in regular expressions. If you don't want the special meaning, you need to escape them: \[. Because this is inside a double-quoted PHP string, writing \\ is the safe way to get a single \ character into the pattern.
I have a part of a string. I need to check if it would equal another string when I add a few symbols on.
To check if a string equals another one, we don't need preg_match(); this would do:
if ("{{$words[$counter_words]}[<]N}" == $corpus[$counter_corpus])
If you use preg_match(), you have to heed the PCRE regex syntax:
When using the PCRE functions, it is required that the pattern is enclosed
by delimiters. A delimiter can be any non-alphanumeric,
non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), …
The added symbols { and } at the start and end of your string were taken as pattern-enclosing delimiters, while you meant them to be part of the pattern. You have to add actual delimiters:
if (preg_match("/{{$words[$counter_words]}\[<]N}/", $corpus[$counter_corpus]))
Also of interest, in the PHP manual's Strings page: "Double quoted" and "Variable parsing / Complex (curly) syntax".
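A sketch combining both answers, using the sample data from the question: build the literal target string, then either compare it with === or, if a regex is wanted, pass it through preg_quote() so the {, [, <, ] and } characters (and the / delimiter) are escaped. The loop structure and variable names are illustrative, not the asker's code.

```php
<?php
// Sample data taken from the question.
$words  = ['dog', 'cat', '3-D', 'plant'];
$corpus = ['{3(-)D[<]AN}', '{dog[<]N}', '{4(-)H(\')er[<]N}', '{4(-)H[<]A}', '{A battery[<]h}'];

$matches = [];
foreach ($words as $word) {
    // Build the literal target, e.g. "{dog[<]N}".
    $target = '{' . $word . '[<]N}';
    // preg_quote() escapes the metacharacters in $target plus the '/' delimiter;
    // ^ and $ turn it into a whole-string match.
    $pattern = '/^' . preg_quote($target, '/') . '$/';
    foreach ($corpus as $entry) {
        // For an exact match, $entry === $target is simpler and equivalent.
        if (preg_match($pattern, $entry)) {
            $matches[] = $entry;
        }
    }
}
print_r($matches); // contains only "{dog[<]N}"
```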

What are those characters in a regular expression?

I found this regex, which works correctly, but I don't understand what the # at the start and at the end of the expression is. Aren't ^ and $ the start/end characters?
preg_match_all('#^/([^/]+)/([^/]+)/$#', $s, $matches);
Thanks
The pattern contains many / characters, so # is used as the regex delimiter. These are identical:
/^something$/
and
#^something$#
If you have multiple / in your pattern, the second form is better suited, since it avoids ugly escaping with \/. This is how the regex would look using the standard /.../ syntax:
/^\/([^\/]+)\/([^\/]+)\/$/
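To check that the two forms really are the same regex, a small PHP comparison; the input path is a made-up example.

```php
<?php
$s = '/blog/hello-world/'; // hypothetical input

// Same regex, two delimiters: with # the slashes need no escaping.
preg_match_all('#^/([^/]+)/([^/]+)/$#', $s, $hash);
preg_match_all('/^\/([^\/]+)\/([^\/]+)\/$/', $s, $slash);

var_dump($hash === $slash); // bool(true)
```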
About #:
That's the delimiter of the regular expression itself. Its only meaning is to tell which delimiter is used for the expression. Commonly / is used, but others are possible. preg_match and preg_match_all require PCRE patterns to be enclosed in delimiters.
About ^:
Inside character classes ([...]), ^ means "not" when it is the first character:
[abc] : matching a, b or c
[^abc] : NOT matching a, b or c, match every other character instead
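A tiny illustration of the two character classes above, run over an arbitrary sample string.

```php
<?php
// [abc] matches one of a, b or c; [^abc] matches any single character except those.
preg_match_all('/[abc]/', 'abcdef', $in);
preg_match_all('/[^abc]/', 'abcdef', $out);

echo implode('', $in[0]), "\n";  // abc
echo implode('', $out[0]), "\n"; // def
```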
The # at the start and the end here are custom regex delimiters: instead of the usual /.../ you have #...#, just like in Perl.
These are delimiters. You can use any delimiter you want, but they must appear at the start and end of the regular expression.
Please see this documentation for detailed insight into regular expressions:
http://www.php.net/manual/en/pcre.pattern.php
You can use pretty much anything as delimiters. The most common one is /.../, but if the pattern itself contains / and you don't want to escape every occurrence, you can use a different delimiter. My personal preference is (...) because it reminds me that $0 of the result is the entire match. But you can use anything: <...>, #...#, %...%, {...}... well, almost anything. The PHP manual's rule is any non-alphanumeric, non-backslash, non-whitespace character.
Let me break it down:
# is the first character, so this is the character used as the delimiter of the regular expression - we know we've got to the end when we reach the next (unescaped) one of these
^ outside of a character class, this means the beginning of the string
/ is just a normal 'slash' character
([^/]+) This is a bracketed expression containing at least one (+) instance of any character that isn't a / (^ at the beginning of a character class inverts the character class - meaning it will only match characters that are not in this list)
/ again
([^/]+) again
/ again
$ this matches the end of the string
# this is the final delimiter, so we know that the regex is now finished.
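Running the question's pattern over a made-up path shows where each piece of that breakdown ends up in $matches (preg_match_all's default PREG_PATTERN_ORDER layout).

```php
<?php
// Hypothetical path-like input; the pattern is the one from the question.
$s = '/products/42/';
if (preg_match_all('#^/([^/]+)/([^/]+)/$#', $s, $matches)) {
    echo $matches[0][0], "\n"; // /products/42/  (whole match)
    echo $matches[1][0], "\n"; // products       (first ([^/]+))
    echo $matches[2][0], "\n"; // 42             (second ([^/]+))
}

// An extra segment fails, because ^ and $ anchor the whole string
// and [^/]+ can never consume a slash.
var_dump(preg_match_all('#^/([^/]+)/([^/]+)/$#', '/a/b/c/')); // int(0)
```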

What does it mean when a regular expression is surrounded by @ symbols?

Question
What does it mean when a regular expression is surrounded by @ symbols? Does that mean something different than being surrounded by slashes? What about when @x or @i are on the end? Now that I think about it, what do the surrounding slashes even mean?
Background
I saw this StackOverflow answer, posted by John Kugelman, in which he displays serious Regex skills.
Now, I'm used to seeing regexes surrounded by slashes as in
/^abc/
But he used a regex surrounded by @ symbols:
'@
^%
(.{2}) # State, 2 chars
([^^]{0,12}.) # City, 13 chars, delimited by ^
([^^]{0,34}.) # Name, 35 chars, delimited by ^
([^^]{0,28}.) # Address, 29 chars, delimited by ^
\?$
@x'
In fact, it seems to be in the format:
@^abc@x
In the process of trying to google what that means (it's a tough question to google!), I also saw the format:
@^abc@i
It's clear the x and the i are not matched characters.
So what does it all mean???
Thanks in advance for any and all responses,
-gMale
The surrounding slashes are just the regex delimiters. You can use almost any character to do that (any non-alphanumeric, non-backslash, non-whitespace character); the most commonly used is /, and another I've seen fairly often is #.
So in other words, #whatever#i is essentially the same as /whatever/i (i is modifier for a case-insensitive match)
The reason you might want to use something other than / is that your regex contains that character. You avoid having to escape it, similar to using single quotes for a string that contains double quotes.
Found this from a "Related" link.
The delimiter can be any character that is not alphanumeric, whitespace or a backslash character.
/ is the most commonly used delimiter, since it is closely associated with regex literals, for instance in JavaScript where they are the only valid delimiter. However, any symbol can be used.
I have seen people use ~, #, @, even ! to delimit their regexes in a way that avoids using symbols that are also in the regex. Personally I find this ridiculous.
A lesser-known fact is that you can use a matching pair of brackets to delimit a regex in PHP. This has the advantage that the opening and closing delimiters are clearly different, so brackets used as metacharacters inside the pattern (for example, nested groups) don't need escaping; only literal brackets do. My personal preference is this:
(^abc)i
By using parentheses, I remind myself that in a match, $m[0] is always the full match, and the subpatterns start at $m[1].
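A short check of the bracket-delimiter behaviour described above, relying on PHP's documented handling of bracket-style delimiters; the input strings are made up.

```php
<?php
// Parentheses as delimiters, with the i modifier after the closing one.
var_dump(preg_match('(^abc)i', 'ABCdef', $m)); // int(1)

// Bracket-style delimiters nest: the inner (\d+) group needs no escaping,
// and $m[0] is the full match while $m[1] is the first subpattern.
preg_match('(^id-(\d+)$)', 'id-42', $m);
echo $m[0], ' ', $m[1], "\n"; // id-42 42
```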

Can I store and call regex in variables for later use?

I plan on storing regular expressions in a database, but I'm not sure how to get them from a variable into a function.
Any advice?
$i = "([wx])([yz])";
$j = "[^A-Za-z0-9]";
$k = "([A-Z]{3}|[0-9]{4})";
//Would this execute properly? This really is the extent of my question.
preg_match($i, $string);
Regular expressions are simply strings, so you could store them as such in your database.
It should work, the only thing you're missing are delimiters.
http://php.net/manual/en/regexp.reference.delimiters.php
When using the PCRE functions, it is required that the pattern is
enclosed by delimiters. A delimiter can be any non-alphanumeric,
non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), hash signs (#) and
tildes (~). The following are all examples of valid delimited
patterns.
/foo bar/
#^[^0-9]$#
+php+
%[a-zA-Z0-9_-]%
You could also store the expressions without delimiters and add them later.
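Following that last suggestion, a minimal sketch of storing patterns without delimiters and adding them at call time. with_delimiters() is a hypothetical helper, not a PHP built-in; it simply picks the first delimiter from a fixed list that doesn't occur in the pattern.

```php
<?php
/**
 * Hypothetical helper: wrap a delimiter-less pattern (e.g. loaded from a
 * database) in the first delimiter that does not occur in it.
 */
function with_delimiters(string $body, string $modifiers = ''): string
{
    foreach (['/', '#', '~', '%', '!', '@'] as $d) {
        if (strpos($body, $d) === false) {
            return $d . $body . $d . $modifiers;
        }
    }
    throw new InvalidArgumentException('No free delimiter for pattern: ' . $body);
}

// The patterns from the question, stored without delimiters.
$i = '([wx])([yz])';
$j = '[^A-Za-z0-9]';
$k = '([A-Z]{3}|[0-9]{4})';

var_dump(preg_match(with_delimiters($i), 'xz'));   // int(1)
var_dump(preg_match(with_delimiters($j), 'abc!')); // int(1)
var_dump(preg_match(with_delimiters($k), '2024')); // int(1)
```

Escaping the delimiter inside each stored pattern would also work, but picking an unused delimiter avoids having to rewrite patterns that already contain backslashes.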
