PHP preg_replace not working as intended - php

I am trying to replace /admin and \admin from the following two strings:
F:\dev\htdocs\cms\admin
http://localhost/cms/admin
Using the following regular expression in preg_replace:
/[\/\\][a-zA-Z0-9_-]*$/i
1) From the first string it replaces just admin, whereas it should replace \admin
2) From the second string it replaces everything except http:, whereas it should replace only /admin
I have checked this expression on http://regexpal.com/ and it works perfectly there, but not in PHP.
Any idea?
Note that the last part of each string (admin) is not fixed; it can
be any user-selected value, which is why I have used [a-zA-Z0-9_-]* in
the regular expression.

The original regular expression should be /[\/\\][a-zA-Z0-9_-]*$/i, but since you need to escape the backslashes in string declarations as well, each backslash must be expressed with \\ -- 4 backslashes in total.
From the PHP manual:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
So, your preg_replace() statement should look like:
echo preg_replace('/[\/\\\\][a-zA-Z0-9_-]*$/i', '', $str);
Your regex can be improved as follows:
echo preg_replace('~[/\\\\][\w-]*$~', '', $str);
Here, \w matches the ASCII characters [A-Za-z0-9_]. You can also avoid having to escape the forward slash / by using a different delimiter -- I've used ~ above.
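To check the improved pattern against the two strings from the question, here is a minimal sketch. stripLastSegment is a made-up helper name used only for illustration; the inputs are the ones given in the question.

```php
<?php
// Strip the last path segment, whether it follows "/" or "\".
// In the single-quoted literal, \\\\ reaches the regex engine as \\,
// which matches one literal backslash.
function stripLastSegment(string $path): string
{
    return preg_replace('~[/\\\\][\w-]*$~', '', $path);
}

echo stripLastSegment('F:\dev\htdocs\cms\admin'), "\n";    // F:\dev\htdocs\cms
echo stripLastSegment('http://localhost/cms/admin'), "\n"; // http://localhost/cms
```

With only two backslashes in the literal, the engine would receive \] inside the class, so the class would swallow the ] and [ that follow, which is consistent with the odd matches described in the question.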

/[\/\\\][a-zA-Z0-9_-]*$/i

Related

Regex escape escape characters in PHP

So I have this regex that works on regex101.com
(?:[^\#\\S\\+]*)
It matches the first from first#second.
Whenever I try to use my regex with PHP's preg_replace I don't get the result I expect.
So far I tried it via preg_quote():
\(\?\:\[\^\\#\\S\\\+\]\*\)
And tried it with escaping the original \\ with 4 \'s:
\(\?\:\[\^\\#\\\\S\\\\\+\]\*\)
Still no success. Am I doing something fundamentally wrong?
I'm just using:
preg_replace("/$regex/", "", $string);
All my other regexes that don't need so many escape chars work perfectly that way.
When you use (?:[^\#\\S\\+]*) as a PHP string literal, single- or double-quoted, each \\ collapses to a single backslash, so the regex engine sees \S, the non-whitespace shorthand, inside the class. [^\S] is equal to \s, i.e. it matches whitespace.
The preg_quote() function is only meant to turn an arbitrary string into a literal one for a regex: it escapes all chars that are special regex metacharacters / operators (like (, ), [, etc.), so you should not apply it to the pattern itself.
While you could use a regex to match 1+ chars other than whitespace and # from the start of a string like preg_match('~^[^#\s]+~', $s, $match), you can just explode your input string with # and get the 0th item.
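Both routes side by side, as a small sketch using the first#second sample from the question:

```php
<?php
$s = 'first#second';

// Regex route: one or more characters that are neither "#" nor whitespace,
// anchored at the start of the string.
preg_match('~^[^#\s]+~', $s, $match);
echo $match[0], "\n"; // first

// Non-regex route: split on "#" and take the first piece.
echo explode('#', $s)[0], "\n"; // first
```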

PHP/Regex: Get content between Stars but not if there's a leading backslash

I would like to get everything between two stars, except if they have a leading backslash.
So for example:
*hello* world
should return "hello", but
*hello \* world*
should return "hello * world"
I tried the following regex:
/(?<!\\)\*(.+?)(?<!\\)\*/s
which works perfectly on http://regex101.com/ but PHP returns:
Warning: preg_replace(): Compilation failed: missing ) at offset 21
What am I doing wrong?
--
EDIT 1:
Here's my PHP-Code for that:
var_dump(preg_replace('/(?<!\\)\*(.+?)(?<!\\)\*/s', '<strong>$1</strong>', '*hello world*'));
You are not escaping the backslashes correctly: in the PHP string, \\ collapses to a single \, so the regex engine sees \) and treats the ) as escaped.
To match a literal \ in a PHP regex string you need 4 backslashes:
/(?<!\\\\)\*(.+?)(?<!\\\\)\*/s
It must be done like this because every backslash in a C-like string
must be escaped by a backslash. That would give us a regular
expression with 2 backslashes, as you might have assumed at first.
However, each backslash in a regular expression must be escaped by a
backslash, too. This is the reason that we end up with 4 backslashes.
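A quick check of the four-backslash version against both inputs from the question, as a sketch:

```php
<?php
// Four backslashes in the PHP literal reach the regex engine as \\,
// i.e. one literal backslash inside each lookbehind.
$pattern = '/(?<!\\\\)\*(.+?)(?<!\\\\)\*/s';

echo preg_replace($pattern, '<strong>$1</strong>', '*hello* world'), "\n";
// <strong>hello</strong> world

echo preg_replace($pattern, '<strong>$1</strong>', '*hello \* world*'), "\n";
// <strong>hello \* world</strong>
```

Note that the escaping backslash is still part of the captured text; removing it would take a separate step.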
A character class does not save any backslashes here:
/(?<![\\\\])\*(.+?)(?<![\\\\])\*/s
In PCRE the backslash is still an escape character inside a character
class, so the class has to reach the regex engine as [\\], which again
takes 4 backslashes in the PHP string. With only 2, the engine receives
[\], where \] is an escaped bracket, so the class is never closed and
compilation fails.
Edit Found this article which explains why this is necessary. Also, added explanations.
You can use this regex:
\*(.*?(?<!\\))\*

PHP preg_match_all strange behaviour with "/" character

Using :
preg_match_all(
"/\b".$KeyWord."\b/u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE);
This code works fine for all cases except when there is a / in the $KeyWord var. Then I get a warning and unsuccessful match of course.
Any idea how to work around this?
Thanks
Use preg_quote() around the keyword:
http://us2.php.net/preg_quote
Also provide your delimiter so that it gets escaped too: preg_quote($KeyWord, "/")
$KeyWord needs a "\" in front of every special character; preg_quote() does that for you.
Dynamic Values In Patterns
You are using a dynamic value inside the pattern. As with escaping for SQL or HTML, the value needs escaping specific to its context. If you do not escape it, meta characters inside the value are interpreted by the regex engine. The escaping function for PCRE patterns is preg_quote().
preg_match_all(
"(\b".preg_quote($KeyWord)."\b)u",
$SearchStr,
$Array1,
PREG_OFFSET_CAPTURE
);
Delimiters
The syntax of a pattern in PHP's preg_* functions is:
DELIMITER PATTERN DELIMITER OPTIONS
The / is the delimiter in your pattern, so the / inside $KeyWord was recognized as the closing delimiter.
Any character that is not alphanumeric, a backslash, or whitespace can be used as the delimiter. In Perl and JS you can define a regular expression directly (not as a string) using /, so it is often the default in tutorials.
Most delimiters have to be escaped inside the pattern.
Match a /: '/\//'
The exception to this rule is brackets. You can use any of the bracket pairs as delimiters, and because they form a pair, they can still be used inside the pattern.
Match a /: '(/)'
The () brackets are a good choice; you can think of them as "subpattern 0".
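The three delimiter styles can be compared directly. This is a small sketch; the subject string a/b is made up for the example:

```php
<?php
$subject = 'a/b';

// The same pattern (a single "/") written with three delimiter choices.
$withSlash   = preg_match('/\//', $subject); // "/" delimiter, slash escaped
$withTilde   = preg_match('~/~', $subject);  // "~" delimiter, no escaping needed
$withBracket = preg_match('(/)', $subject);  // bracket-style delimiters

var_dump($withSlash, $withTilde, $withBracket); // int(1) three times
```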
You can use preg_quote to handle the slash character.
From the manual:
puts a backslash in front of every
character that is part of the regular
expression syntax
You can also pass the delimiter as the second parameter and it will also be escaped. However, if you're using # as your delimiter, then there's no need to escape /
So, you can either use:
preg_match_all("/\b".preg_quote($KeyWord, "/")."\b/u", $SearchStr, $Array1, PREG_OFFSET_CAPTURE);
or, if you are sure that your keyword does not contain any other regex-special characters, you can simply change the delimiter so that the slash no longer needs escaping:
preg_match_all("#\b".$KeyWord."\b#u", $SearchStr, $Array1, PREG_OFFSET_CAPTURE);
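To see what preg_quote() actually produces, here is a sketch with hypothetical sample values (the keyword and search string are not from the question):

```php
<?php
// Hypothetical sample values, chosen so the keyword contains the delimiter.
$KeyWord   = 'and/or';
$SearchStr = 'yes and/or no';

// preg_quote() escapes regex metacharacters; passing "/" as the second
// argument also escapes the delimiter, so "and/or" becomes "and\/or".
$pattern = '/\b' . preg_quote($KeyWord, '/') . '\b/u';

$count = preg_match_all($pattern, $SearchStr, $Array1, PREG_OFFSET_CAPTURE);

echo $pattern, "\n";         // /\band\/or\b/u
echo $count, "\n";           // 1
echo $Array1[0][0][1], "\n"; // 4 (byte offset of the match)
```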

What's wrong with this php/regex query?

preg_replace("/(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)/i", '$1#$2$3', $somevar);
It's meant to turn, for example, PHP into #PHP.
Warning: preg_replace(): Unknown modifier '|'
It's because you are using the forward slash (/) as your delimiter. When the regex engine gets to the / in /s (the 3rd character), it thinks the regex is over and treats the rest as modifiers. s is a valid modifier, but | is not, hence the error.
Next time, you can either:
Change your delimiters to something you won't use in your regex, e.g.:
preg_replace("!(/s|^)(php|ajax|c\+\+|javascript|c#)(/s|$)!i", '$1#$2$3', $somevar);
Or escape those characters with a backslash, e.g.: "/something\/else/"
I also suspect you didn't intend to use /s, but the escape sequence \s, which matches whitespace characters.
The first character in the regular expression is the delimiter. If you need to use it inside your regular expression, then you need to escape it:
"/(\/s|^)...
   ^
Or alternatively, choose another delimiter that isn't used anywhere in your regular expression so that you don't need to escape:
"~(/s|^)...(/s|$)~i"
I prefer to do the latter as it makes the regular expression more readable.
(Although as NullUserException points out, the actual error is that you should have used a backslash instead of a slash).
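Putting both fixes together (a different delimiter, and \s instead of /s), a sketch might look like this. The sample sentence is made up for the example:

```php
<?php
// "~" as the delimiter; \s (whitespace) replaces the mistyped /s.
$pattern = '~(\s|^)(php|ajax|c\+\+|javascript|c#)(\s|$)~i';

echo preg_replace($pattern, '$1#$2$3', 'I like PHP and c++'), "\n";
// I like #PHP and #c++
```

One caveat of this pattern: the surrounding whitespace is consumed by the match, so two keywords separated by a single space would only have the first one tagged.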

Extra backslash needed in PHP regexp pattern

When testing an answer for another user's question I found something I don't understand. The problem was to replace all literal \t \n \r characters from a string with a single space.
Now, the first pattern I tried was:
/(?:\\[trn])+/
which surprisingly didn't work. I tried the same pattern in Perl and it worked fine. After some trial and error I found that PHP wants 3 or 4 backslashes for that pattern to match, as in:
/(?:\\\\[trn])+/
or
/(?:\\\[trn])+/
these patterns - to my surprise - both work. Why are these extra backslashes necessary?
You need 4 backslashes to represent 1 in regex because:
2 backslashes are used for unescaping in a string ("\\\\" -> \\)
1 backslash is used for unescaping in the regex engine (\\ -> \)
From the PHP doc,
escaping any other character will result in the backslash being printed too
Hence for \\\[,
1 backslash is used for unescaping the \, and one stays because \[ is not a valid escape sequence ("\\\[" -> \\[)
1 backslash is used for unescaping in the regex engine (\\[ -> \[)
Yes, it works, but it is not good practice.
It works in Perl because you pass it directly as a regex pattern: /(?:\\[trn])+/
In PHP, however, you need to pass it as a string, so the backslash itself needs extra escaping:
"/(?:\\\\[trn])+/"
The regex \\ to match a single
backslash would become '/\\\\/' as a
PHP preg string
The regular expression is just /(?:\\[trn])+/. But since you need to escape the backslashes in string declarations as well, each backslash must be expressed with \\:
"/(?:\\\\[trn])+/"
'/(?:\\\\[trn])+/'
Just three backslashes also work because PHP doesn't know the escape sequence \[ and leaves it alone. So \\ becomes \ but \[ stays \[.
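Printing the literals shows what the regex engine actually receives in each case. A small sketch; the subject string is made up for the example:

```php
<?php
// What the regex engine receives for each PHP string literal.
echo '/(?:\\[trn])+/', "\n";   // /(?:\[trn])+/   two: matches the literal text "[trn]"
echo '/(?:\\\[trn])+/', "\n";  // /(?:\\[trn])+/  three: \\ collapses, \[ is left alone
echo '/(?:\\\\[trn])+/', "\n"; // /(?:\\[trn])+/  four: both pairs collapse

// Single quotes: the subject holds literal backslash sequences, 6 characters.
$subject = 'a\t\nb';
echo preg_replace('/(?:\\\\[trn])+/', ' ', $subject), "\n"; // a b
```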
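Use str_replace! For the literal two-character sequences the search strings need single quotes (a double-quoted "\t" is a real tab character):
$code = str_replace(array('\t', '\n', '\r'), ' ', $code);
Should do the trick, although each sequence becomes its own space rather than a run collapsing into one.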
