How do I match a square bracket literal using RegEx? - php

What's the regex to match a square bracket? I'm using \\] in a pattern in eregi_replace, but it doesn't seem to be able to find a ]...

\] is correct, but note that PHP itself ALSO has \ as an escape character, so you might have to use \\] (or a different kind of string literal).

Works flawlessly:
<?php
$hay = "ab]cd";
echo eregi_replace("\]", "e", $hay);
?>
Output:
abecd
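Note that eregi_replace belongs to the POSIX regex extension, which was removed in PHP 7.0, so on a current PHP the same test has to go through preg_replace. A minimal sketch of the equivalent call, assuming / delimiters and the i flag to mirror eregi's case-insensitivity (the variable names are only illustrative):

```php
<?php
$hay = "ab]cd";

// Single-quoted pattern: PHP hands \] to PCRE unchanged,
// and PCRE reads it as a literal closing bracket.
$result = preg_replace('/\]/i', 'e', $hay);

echo $result, "\n"; // abecd
```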

There are two ways of doing this:
/ [\]] /x;
/ \] /x;
You may consider the latter the better option, and indeed I would use it in simpler regexps. For larger regexps, though, I consider the former the better option. Consider the following:
/ (\w*) ( [\d\]] ) /x;
/ (\w*) ( \d | \] ) /x;
In this example, the former is my preferred solution. It does a better job of combining the separate entities, which may each match at the given location. It may also have some speed benefits, depending on implementation.
Note: This is in Perl syntax, partly to ensure proper highlighting.
In PHP you may need to double up on the back-slashes.
"[\\]]" and "\\]"

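For PHP, the two forms above could be checked roughly like this. It is only a sketch: the subject string and variable names are made up for illustration, and the x flag is kept so the spaces inside the patterns are ignored.

```php
<?php
$subject = 'abc]';

// Character-class form and alternation form of the same idea.
preg_match('/ (\w*) ( [\d\]] ) /x', $subject, $class);
preg_match('/ (\w*) ( \d | \] ) /x', $subject, $alt);

var_dump($class === $alt); // bool(true)
var_dump($class[2]);       // string(1) "]"

// Doubling the backslash in a double-quoted string gives the same pattern text
// as the single-quoted version.
var_dump("/[\\]]/" === '/[\]]/'); // bool(true)
```
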
You don't need to escape it: if isolated, a ] is treated as a regular character.
Tested with eregi_replace and preg_replace.
[ is another beast: you have to escape it. It looks like single and double quotes, with single or double escapes, are all treated the same by PHP, for both regex families.
Perhaps your problem is elsewhere in your expression; you should give it in full.
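A quick way to see this with preg_match (only the PCRE family is shown here, since the ereg functions are gone from PHP 7 onwards; the test strings are arbitrary):

```php
<?php
// A lone ] outside a character class is an ordinary character.
var_dump(preg_match('/]/', 'a]b'));   // int(1)

// A lone [ opens a character class, so the pattern fails to compile;
// @ hides the warning and preg_match returns false.
var_dump(@preg_match('/[/', 'a[b'));  // bool(false)

// Escaped, it matches literally.
var_dump(preg_match('/\[/', 'a[b'));  // int(1)

// PHP leaves the unknown escape \[ alone in both quote styles,
// and \\[ collapses to the same two characters.
var_dump('\[' === "\[" && "\[" === "\\[" && '\\[' === '\['); // bool(true)
```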

In .NET you escape special characters by prefixing them with a backslash ("\"), so the pattern for a literal "[" becomes "\[".
Though since you normally write this in a string literal, you would have to do either something like this:
@"\["
or something like this:
"\\["

Your problem may come from the fact that you are using eregi_replace with the first parameter enclosed in single quotes:
'\['
In double quotes, though, it may work depending on the context, since the quoting changes how the parameter reaches the function: single quotes pass the string through with almost no interpretation, while double quotes process escape sequences.
Here, if PHP interprets "\[" as an escape sequence, you still need to double the "\".
Note: based on your comment, you may try the regex
<\s*(?:br|p)\s*\/?\s*\>\s*\[
in order to detect a [ right after a <br> or a <p>.
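In PHP that regex could be tried out along these lines. This is a sketch under a few assumptions: ~ is used as the delimiter so the / needs no escaping, the i flag is an addition so <BR> and <P> are accepted too, and the sample strings are invented.

```php
<?php
$pattern = '~<\s*(?:br|p)\s*/?\s*>\s*\[~i';

// Sample input => expected preg_match result.
$samples = [
    '<br/> [note]' => 1,
    '<p>[1]'       => 1,
    '<div>[x]'     => 0,
    '<br> text ['  => 0,
];

foreach ($samples as $html => $expected) {
    printf("%-14s %d\n", $html, preg_match($pattern, $html));
}
```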

Related

preg_replace_callback to run EXCEPT when inside first argument of .replace()

I want to run PHP's preg_replace_callback against all single- or double-quoted strings, for which I'm using the code seen on https://codereview.stackexchange.com/a/217356, which includes handling of backslashed single/double quotes.
const PATTERN = <<<'PATTERN'
~(?|(")(?:[^"\\]|\\(?s).)*"|(')(?:[^'\\]|\\(?s).)*'|(#|//).*|(/\*)(?s).*?\*/|(<!--)(?s).*?-->)~
PATTERN;
$result=preg_replace_callback(PATTERN, function($m) {
return $m[1]."XXXX".$m[1];
}, $test);
but this runs into a problem when scanning blocks like that seen in .replace() calls from javascript, e.g.
x=y.replace(/'/g, '"');
... which treats '/g, ' as a string, with the "');......." as the following string.
To work around this I figure it would be good to do the callback except when the quotes are inside the first argument of .replace() as these cause problems with quoting.
i.e. do the standard callbacks, but when .replace is involved I want to change the XXXX part of abc.replace(/\'/, "XXXX"); but I want to ignore the \' quote/part.
How can I do this?
See https://onlinephp.io/c/5df12 ** https://onlinephp.io/c/8a697 for a running example, showing some successes (in green), and some failures (in red).
(** Edit to correct missing slash)
Note, the XXXX is a placeholder for some more work later.
Also note that I have looked at "Javascript regex to match a regex", but that is about matching regexes, and I'm talking about excluding them. If you plug their regex pattern into my code it does not work, so it should not be considered a valid answer.
You can use the verbs (*SKIP)(*F) to skip something. For example, to skip the first argument:
\(\s*/.*?/\w*\h*,(*SKIP)(*F)|(?|(")[^"\\]*(?:\\.[^"\\]*)*"|(')[^'\\]*(?:\\.[^'\\]*)*')
See this demo at regex101 or your updated php demo
The pattern on the skipped side is very simple; you might want to improve it further.
Besides, I used a slightly more efficient pattern to match the quoted parts, explained here.
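Plugged into preg_replace_callback, the pattern above might be used like this. It is a minimal sketch: the constant name SKIP_PATTERN, the ~ delimiters and the one-line JavaScript sample are assumptions for illustration, not taken from the linked demos.

```php
<?php
// Nowdoc, so no PHP-level escaping interferes with the regex.
const SKIP_PATTERN = <<<'PATTERN'
~\(\s*/.*?/\w*\h*,(*SKIP)(*F)|(?|(")[^"\\]*(?:\\.[^"\\]*)*"|(')[^'\\]*(?:\\.[^'\\]*)*')~
PATTERN;

$js = <<<'JS'
x=y.replace(/'/g, '"'); z = 'hello';
JS;

// $m[1] is the opening quote character, thanks to the branch reset group (?|...).
$result = preg_replace_callback(SKIP_PATTERN, function ($m) {
    return $m[1] . 'XXXX' . $m[1];
}, $js);

echo $result, "\n"; // x=y.replace(/'/g, 'XXXX'); z = 'XXXX';
```

The regex literal in the first argument is consumed by the left-hand alternative and thrown away, so the quote inside it never starts a string match.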

PHP Regex: match text urls until space or end of string

This is the text sample:
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";
This is my regex pattern:
preg_match_all( "#http:\/\/(.*?)[\s|\n]#is", $text, $m );
That matches the first two URLs, but how do I match the last one? I tried adding [\s|\n|$], but that also only matches the first two URLs.
Don't try to match \n (there's no line break after all!) and instead use $ (which will match to the end of the string).
Edit:
I'd love to hear why my initial idea doesn't work, so in case you know it, let me know. I'd guess because [] tries to match one character, while end of line isn't one? :)
This one will work:
preg_match_all('#http://(\S+)#is', $text, $m);
Note that you don't have to escape the / due to them not being the delimiting character, but you'd have to escape the \ as you're using double quotes (so the string is parsed). Instead I used single quotes for this.
I'm not familiar with PHP, so I don't have the exact syntax, but maybe this will give you something to try. The [] means a character class, so |$ will literally look for a $. I think what you'll need is a lookahead, so something like this:
#http:\/\/(.*?)(?=(\s|$))#
I apologize if this is way off, but maybe it will give you another angle to try.
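Both suggestions can be compared on the sample text from the question. A rough sketch (the lookahead variant uses a lazy .*? so that it stops at the first whitespace; the result variable names are arbitrary):

```php
<?php
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";

// \S+ runs until whitespace or the end of the string, whichever comes first.
preg_match_all('#http://(\S+)#i', $text, $greedy);

// Lazy match closed by a lookahead: $ acts as an anchor here because it
// sits outside a character class.
preg_match_all('#http://(.*?)(?=\s|$)#is', $text, $lazy);

print_r($greedy[1]);
var_dump($greedy[1] === $lazy[1]); // bool(true)
```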
See What is the best regular expression to check if a string is a valid URL?
It has some very long regular expressions that will match all urls.

Can't get Regex working in PHP, works in RegEXP program

Here is the input I am searching:
\u003cspan class=\"prs\">email_address#me.com\u003c\/span>
Trying to just return email_address#me.com.
My regex class=\\"prs\\">(.*?)\\ returns "class=\"prs\">email_address#me.com\" in RegExp which is OK, I can work with that result.
But I can't get it to work in PHP.
$regex = "/class=\\\"prs\\\">(.*?)\\/";
Gives me an error "No ending delimiter"
Can someone please help?
Your original code:
$regex = "/class=\\\"prs\\\">(.*?)\\/";
The reason you get No ending delimiter is that although you are escaping the backslash prior to the closing forward slash, what you have done is escaped it in the context of the PHP string, not in the context of the regex engine.
So the PHP string escaping mechanism does its thing, and by the time the regex engine gets it, it will look like this:
/class=\"prs\">(.*?)\/
This means that the regular expression engine will see the backslash at the end of the expression as escaping the forward slash that you are intending to use to close the expression.
The usual PHP solution to this kind of thing is to switch to using single-quoted string instead of a double-quoted one, but this still won't work, as \\ is an escaped backslash in both single and double quoted strings.
What you need to do is double up the number of backslash characters at the end of your string, so your code needs to look like this:
$regex = "/class=\\\"prs\\\">(.*?)\\\\/";
The way to prove what it's doing is to print the contents of the $regex variable, so you can see what the string will look like to the regex engine. These kinds of errors are actually very hard to spot, but looking at the actual content of the string will help you spot them.
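For instance, printing both versions shows exactly what the regex engine receives (a small sketch; $broken and $fixed are just names for the two strings above):

```php
<?php
$broken = "/class=\\\"prs\\\">(.*?)\\/";
$fixed  = "/class=\\\"prs\\\">(.*?)\\\\/";

echo $broken, "\n"; // /class=\"prs\">(.*?)\/
echo $fixed, "\n";  // /class=\"prs\">(.*?)\\/

// The first pattern does not compile (its closing / is escaped), so
// preg_match returns false; @ only hides the warning.
var_dump(@preg_match($broken, ''));  // bool(false)

// The second compiles; it simply finds nothing in an empty subject.
var_dump(preg_match($fixed, ''));    // int(0)
```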
Hope that helps.
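If you change to single quotes you no longer have to escape the double quotes, but every literal backslash in the input still has to be written as \\\\ (PHP turns it into \\, which the regex engine reads as one backslash):
$regex = '/class=\\\\"prs\\\\">(.*?)\\\\/';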

php equals regular expression

I know I can use preg_match but I was wondering if php had a way to evaluate to a regular expression like:
if(substr($example, 0, 1) == /\s/){ echo 'whitespace!'; }
PHP does not have first-class regular expressions.
You will need to use the functions provided by the default PCRE extension. Sorry. It's a backslash-escaping nightmare, but it's all we've got.
(There's also the now-deprecated POSIX regex extension, but you should not use it any longer. Its functions are slower, less featureful, and most importantly, not Unicode-safe. Modern PCRE versions understand Unicode very well, even if PHP itself is ignorant about it.)
With regard to the backslash-escaping nightmare, you can keep the horror to a minimum by using single quotes to enclose the string containing the regex instead of doubles, and picking an appropriate delimiter. Compare:
"/^http:\\/\\/www.foo.bar\\/index.html\\?/"
versus
'!^http://www.foo.bar/index.html\?!'
Inside single quotes, you only need to backslash-escape backslashes and single quotes, and picking a different delimiter avoids needing to escape the delimiter inside the regex.
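To convince yourself that the two spellings describe the same regex, you can run both against a URL (a sketch; the test URL and variable names are made up):

```php
<?php
$url = 'http://www.foo.bar/index.html?page=2';

$doubleQuoted = "/^http:\\/\\/www.foo.bar\\/index.html\\?/";
$singleQuoted = '!^http://www.foo.bar/index.html\?!';

// What PCRE actually receives from the double-quoted version:
echo $doubleQuoted, "\n"; // /^http:\/\/www.foo.bar\/index.html\?/

var_dump(preg_match($doubleQuoted, $url)); // int(1)
var_dump(preg_match($singleQuoted, $url)); // int(1)
```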
:)
if(substr($example, 0, 1) == " "){ echo 'whitespace!';}
You should not be using regexp when it is not needed.
There would also be the microoptimization option:
if (strstr(" \t\r\n", $example{0})) {
The {0} is an outdated way to get the first character (same as [0] actually); it was deprecated in PHP 7.4 and removed in PHP 8.0, so write $example[0] on current versions. And strstr simply checks if the character is contained in the list of whitespace characters. Another option would be strspn, at least in your example case.
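The strspn idea could look something like this, next to the preg_match equivalent for comparison. Both helper names are hypothetical, and note that \s in PCRE covers a few more characters (such as form feed) than the four listed here.

```php
<?php
function startsWithWhitespace(string $s): bool
{
    // strspn counts leading characters taken from the given set; limiting
    // the length to 1 inspects only the first character.
    return strspn($s, " \t\r\n", 0, 1) === 1;
}

function startsWithWhitespaceRegex(string $s): bool
{
    return preg_match('/^\s/', $s) === 1;
}

var_dump(startsWithWhitespace(" abc"));      // bool(true)
var_dump(startsWithWhitespace("abc "));      // bool(false)
var_dump(startsWithWhitespaceRegex("\tabc")); // bool(true)
```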

preg_replace hell

I'm trying to use preg_replace to get some data from a remote page, but I'm having a bit of an issue when it comes to sorting out the pattern.
function getData($Url){
$str = file_get_contents($Url);
if(strlen($str)>0){
preg_match("/\<span class=\"SectionHeader\"\>title\</span>/<br/>/\<div class=\"header2\"\>(.*)\</div\></span\>/",$str,$title);
return $title[1];
}
}
Here's the HTML as is before I ended up throwing a million slashes at it (looks like I forgot a part or two):
<span class="cell CellFullWidth"><span class="SectionHeader">mytitle</span><br/><div class="Center">Event Name</div></span>
Where Event Name is the data I want to return in my function.
Thanks a lot guys, this is a pain in the ass.
While I am inclined to agree with the commenters that this is not a pretty solution, here's my untested revision of your statement:
preg_match('#\<span class="SectionHeader"\>title\</span\>/\<br/\>/\<div class="header2"\>(.*)\</div\>\</span\>#',$str,$title);
I changed the double-quoted string to single-quoted as you aren't using any of the variable-substitution features of double-quoted strings and this avoids having to backslash-escape double-quotes as well as avoiding any ambiguity about backslashes (which perhaps should have been doubled to produce the proper strings--see the php manual on strings). I changed the slash / delimiters to hash # because of the number of slashes appearing in the match pattern (some of which were not backslash-escaped in your version).
There are quite a few things wrong with your expression:
You're using / as the delimiter, but then use / unescaped in various places.
You're escaping < and > seemingly at random. They shouldn't be escaped at all.
You have some rogue /s around the <br/> for some reason.
The class name for the div is specified as header2 in the regex but Center in the sample HTML
The title is mytitle in the HTML and title in the regex
With all of these corrected, you get:
preg_match('(<span class="SectionHeader">mytitle</span><br/><div class="Center">(.*)</div></span>)',$str,$title);
If you want to match any title instead of the specific title mytitle, just replace that with .*?.
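Put together as a function that works on the HTML string directly, it might look like this. It is a sketch only: getEventName is a hypothetical name, # is used as the delimiter, both quantifiers are made lazy, and null is returned when nothing matches.

```php
<?php
function getEventName(string $html): ?string
{
    // .*? in place of the title accepts any title; (.*?) captures the div text.
    $pattern = '#<span class="SectionHeader">.*?</span><br/>'
             . '<div class="Center">(.*?)</div></span>#';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

$html = '<span class="cell CellFullWidth"><span class="SectionHeader">mytitle</span>'
      . '<br/><div class="Center">Event Name</div></span>';

echo getEventName($html), "\n"; // Event Name
```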
