PHP preg_match_all strange behaviour with "/" character - php

Using :
preg_match_all(
"/\b".$KeyWord."\b/u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE);
This code works fine for all cases except when there is a / in the $KeyWord var. Then I get a warning and unsuccessful match of course.
Any idea how to work around this?
Thanks

use preg_quote() around the keyword.
http://us2.php.net/preg_quote
but also provide your delimiter, so it gets escaped: preg_quote($KeyWord, "/")

You must parse $KeyWord and add "\" before all spec symbols, you can use preg_quote()

Dynamic Values In Patterns
You are using a dynamic value inside the pattern. Like escaping for SQL or HTML, a specific escaping for the value is needed. If you do not escape meta characters inside the value are interpreted by the regex engine. The escaping function for PCRE patterns is preg_quote().
preg_match_all(
"(\b".preg_quote($KeyWord)."\b)u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE
);
Delimiters
The syntax of a pattern in PHPs preg_* function is:
DELIMITER PATTERN DELIMITER OPTIONS
The / is the delimiter in your pattern. So the / inside the $keyWord was recognized as the closing delimiter.
But all non alphanumeric characters can be used. In Perl and JS you can define a regular expression directly (not as string) using / so it is often the default in tutorials.
Most delimiters have to be escaped inside the pattern.
Match a \: '/\//'
The exception to this rule are brackets. You use any of the bracket pairs as delimiter. And because it is a pair, they can still be used inside the pattern.
Match a \: '(/)'
The () brackets are a good decision, you can count them as "subpattern 0".

You can use preg_quote to handle the backslash character.
From the manual:
puts a backslash in front of every
character that is part of the regular
expression syntax
You can also pass the delimiter as the second parameter and it will also be escaped. However, if you're using # as your delimiter, then there's no need to escape /
So, you can either use:
preg_match_all("/\b".preg_quote($KeyWord, "/")."\b/u", $SearchStr,$Array1,PREG_OFFSET_CAPTURE))
or, if you are sure that your keyword does not contain any other regex-special characters, you can simply change the delimiter, and use to escape the backslash:
preg_match_all("#\b".$KeyWord."\b#u", $SearchStr,$Array1,PREG_OFFSET_CAPTURE))

Related

PHP Preg_replace() pattern not work

I wan to replace text using preg_replace.But my search string have a / so it makes problem.
How can I solve it?
$search='r/trtrt';
echo preg_replace('/\b'.addslashes($search).'\b/', 'ERTY', 'TG FRT');
I am getting error preg_replace(): Unknown modifier 'T'
Use a different delimiter and don't use addslashes, that is escaping non-regex special characters (or a mix of regex and non-regex characters, I'd say the majority of the time dont use addslashes).
$search='r/trtrt';
echo preg_replace('~\b'. $search.'\b~', 'ERTY', 'TG FRT');
You could use preg_quote as an alternative. Just changing the delimiter is the easiest solution though.
use ~ as delimiter:
$search='r/trtrt';
echo preg_replace('~\b'.addslashes($search).'\b~', 'ERTY', 'TG FRT');
I always use ~ as it is one of the least used char in a string but you can use any character you want and won't need to escape your regexp chars!
You don't need addslashes() in your case but if you have a more complex regexp and you want to escape chars you should use preg_quote($search).
Why not escape it the way it is meant to be done
$search='r/trtrt';
echo preg_replace('/\b'.preg_quote($search, '/').'\b/', 'ERTY', 'TG FRT');
http://php.net/manual/en/function.preg-quote.php
preg_quote() takes str and puts a backslash in front of every
character that is part of the regular expression syntax. This is
useful if you have a run-time string that you need to match in some
text and the string may contain special regex characters
delimiter
If the optional delimiter is specified, it will also be escaped.
This is useful for escaping the delimiter that is required by the PCRE
functions. The / is the most commonly used delimiter.
Add slashes is not the function to use here. It provides no escaping for any of the special characters in Regx.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) {
} = ! < > | : -
Using the proper functions promote readability of the code, if at some later point in time you or another coder see the ~ delimiter they may just think its part of a personal "style" or pay it little attention. However, seeing the input properly escaped will tell any experienced coder that the input could contain characters that conflict with regular expressions.
Personally, readability is at the top of my list whenever I write code. If you cant understand it at a glance, what good is it.

Why are backslashes used in preg_match function of PHP?

I'm been practicing the preg_match() function in PHP. The tutorial said that it is needed to add fore slashes before the characters.
I also noticed that without the slashes, it works strangely. It gives a warning:
preg_match(): Delimiter must not be alphanumeric or backslash.
Q: What difference does the fore slashes do?
Here's the code:
$string = 'Okay, I\'m fine with it! ';
$math = 'Okay'; // I need to add fore slashes for it to work
echo preg_match($math, $string); // It supposedly echoes out 1 or 0
// depending if the former argument
// is in the latter argument
There is no particular reason, it's a syntaxic choice. This syntax has the avantage to be handy to add global modifiers to the pattern:
delimiter - pattern - delimiter - [global modifiers]
As explained in the error message and in the php manual, you can choose the delimiter between special characters, the most commonly used is the slash, but it's not always a pertinent choice in particular when the pattern contains a lot of literal slashes that need to be escaped.
It's because you can also apply switches to the regular expression (eg. m for multiline, u for Unicode) and these need to be defined outside of the delimiter, so the syntax is
opening delimiter expression closing delimiter [optional switches]
e.g.
/^[a-z]*$/mi
for the multiline (m) and case insensitive (i) switches, using a delimiter of /
The delimiter must not be a character that can be misinterpreted by the regexp parser, it must be very clear that it is a delimiter, so it cannot be alpha (e.g. i, or a \ that is used to "escape" characters in the regexp
Note that you can also use braces as delimiters, so
[^[a-z]*$]mi
is valid

PHP preg_replace not working as intended

I am trying to replace /admin and \admin from the following two strings:
F:\dev\htdocs\cms\admin
http://localhost/cms/admin
Using the following regular expression in preg_replace:
/[\/\\][a-zA-Z0-9_-]*$/i
1) From the first string it just replaces admin where as it should replace \admin
2) From the second string it replaces every thing except http: where as it should replace only /admin
I have checked this expression on http://regexpal.com/ and it works perfect there but not in PHP.
Any idea?
Note that the last part of each string admin is not fixed, it can
be any user selected value and thats why I have used [a-zA-Z0-9_-]* in
regular expression.
The original regular expression should be /[\/\\][a-zA-Z0-9_-]*$/i, but since you need to escape the backslashes in string declarations as well, each backslash must be expressed with \\ -- 4 backslashes in total.
From the PHP manual:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
So, your preg_replace() statement should look like:
echo preg_replace('/[\/\\\\][a-zA-Z0-9_-]*$/i', '', $str);
Your regex can be improved as follows:
echo preg_replace('~[/\\\\][\w-]*$~', '', $str);
Here, \w matches the ASCII characters [A-Za-z0-9_]. You can also avoid having to escape the forward slash / by using a different delimiter -- I've used ~ above.
[\/\\\][a-zA-Z0-9_-]*$/i
Live demo

why does this regex fail in PHP?

I got the expression directly from RegExr, but PHP has a problem with the =
"/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
The expression is for matching email addresses.
You used / as the delimiter marking the start and end of the pattern, but then also used that character within the pattern. You must either use a different delimiter, or escape instances of it within the pattern. If you meant to escape the equals signs, then you used the wrong slash.
Escape the slash preceding the = (and the other slash in that expression). You use / as a delimiter, therefore if it occurs inside the pattern it has to be escaped.
"/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
should work, then.
You are using / as delimiter. There are two / in the regex which are not escaped. Escape them as \/:
"/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/"
^^ ^^

What's wrong with this php/regex query?

preg_replace("/(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)/i", '$1#$2$3', $somevar);
It's meant to turn, for example, PHP into #PHP.
Warning: preg_replace(): Unknown modifier '|'
It's because you are using the forward slash (/) as your delimiter. When the regex engine gets to /s (3rd character) it thinks the regex is over and the rest of it are modifiers. But no such modifier (|) exists, thus the error.
Next time, you can either:
Change your delimiters to something you won't use in your regex, ie:
preg_replace("!(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)!i", '$1#$2$3', $somevar);
Or escape those characters with a backslash, ie: "/something\/else/"*
I also suspect you didn't intend to use /s, but the escape character \s that matches whitespace characters.
The first character in the regular expression is the delimiter. If you need to use this inside your regular expression then you need to escape it:
"/(\/s|^)...
^
Or alternatively, choose another delimiter that isn't used anywhere in your regular expression so that you don't need to escape:
"~(/s|^)...(/s|$)~i"
I prefer to do the latter as it makes the regular expression more readable.
(Although as NullUserException points out, the actual error is that you should have used a backslash instead of a slash).

Categories