PHP Regular Expression ignoring special characters


With the help of SO, I was able to make a regular expression for my purposes. It works great, but it completely ignores special characters.
$pattern='/(?=.*\b\Q'.str_replace(' ','\E\b)(?=.*\b\Q',$requestedservice).'\E\b)/i';
preg_match($pattern, $item)
Here $requestedservice is the string that is being matched against $item from the database.
The $item is "Walk - Dance". If $requestedservice is "Walk - Dance" as well, it is not matched, but if $requestedservice is "Walk Dance", it is matched.
I am not sure why it's ignoring special characters like -, / and %.
I am using html_entity_decode for the $requestedservice so that's not an issue.
Any guidance would be really helpful.

Your word boundaries are working against you. If one of the parts is ., for instance, your pattern is /(?=.*\b\Q.\E\b)/i, which asserts that there is a literal . with a word boundary before and after it. Since . is a non-word character, that means there has to be a word character directly before and after it. The same happens with the - in Walk - Dance: it has a space on each side, so neither \b can match.
Instead you could use (?<!\w) in place of the first and third \b and (?!\w) in place of the second and fourth \b, to assert specifically that there is no word character directly before or after each of the string parts that need to match.
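A minimal sketch of that change, under a few assumptions: the search string is split on whitespace, and each part is escaped with preg_quote() instead of \Q...\E (a substitution made here so that a / or a literal \E in the input cannot break the pattern). The helper name buildAllPartsPattern is made up for illustration.

```php
<?php
// Hypothetical helper: builds one lookahead per whitespace-separated part,
// so every part must occur somewhere in the subject, in any order.
function buildAllPartsPattern(string $requestedService): string
{
    $parts = preg_split('/\s+/', $requestedService, -1, PREG_SPLIT_NO_EMPTY);

    $pattern = '/';
    foreach ($parts as $part) {
        // (?<!\w) and (?!\w) only require that no word character touches the
        // part, so they also work when the part itself is "-", "/" or "%".
        $pattern .= '(?=.*(?<!\w)' . preg_quote($part, '/') . '(?!\w))';
    }

    return $pattern . '/i';
}

$item = 'Walk - Dance';
var_dump(preg_match(buildAllPartsPattern('Walk - Dance'), $item));     // int(1)
var_dump(preg_match(buildAllPartsPattern('Walk Dance'), $item));       // int(1)
var_dump(preg_match(buildAllPartsPattern('Walk'), 'Walking - Dance')); // int(0)
```

Note that an empty search string produces the pattern //i, which matches everything, so it is worth rejecting empty input before calling the helper.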

Related

Using preg_replace() with search words that may have special characters [duplicate]

Regular Expressions are completely new to me and having done much searching my expression for testing purposes is this:
preg_replace('/\b0.00%\b/','- ', '0.00%')
It yields 0.00% when what I want is - .
preg_replace('/\b0.00%\b/','- ', '50.00%') yields 50.00%, which is what I want, so this is fine.
But clearly the expression is not working, because in the first example it does not replace 0.00% with -.
I can think of workarounds that use if(){} to test the length/content of the string, but I presume the replace will be most efficient.

The word boundary after % requires a word char (letter, digit or _) to appear right after it, so there is no replacement taking place here.
You need to replace the word boundaries with unambiguous boundaries defined with the help of (?<!\w) and (?!\w) lookarounds that will fail the match if the keywords are preceded or followed with word characters:
$value='0.00%';
$str = 'Price: 0.00%';
echo preg_replace('/(?<!\w)' . preg_quote($value, '/') . '(?!\w)/i', '- ', $str);
Output: Price: -
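To compare the two kinds of boundary on the inputs from the question, here is a small sketch. The wrapper name replaceWholeValue is hypothetical; it only packages the lookaround pattern shown above (without the i flag, which makes no difference for this value).

```php
<?php
// Hypothetical wrapper around the lookaround pattern: the value is replaced
// only where no word character sits directly before or after it.
function replaceWholeValue(string $value, string $replacement, string $subject): string
{
    $pattern = '/(?<!\w)' . preg_quote($value, '/') . '(?!\w)/';
    return preg_replace($pattern, $replacement, $subject);
}

// With \b, the trailing boundary fails because nothing follows the %.
var_dump(preg_replace('/\b0\.00%\b/', '- ', '0.00%') === '0.00%');  // bool(true)

// With the lookarounds, the standalone value is replaced...
var_dump(replaceWholeValue('0.00%', '- ', '0.00%') === '- ');       // bool(true)
// ...while 50.00% is left alone, because a digit precedes the 0.
var_dump(replaceWholeValue('0.00%', '- ', '50.00%') === '50.00%');  // bool(true)
```

If the replacement text could ever contain $ or \ followed by digits, it would be read as a backreference, so this sketch assumes a plain replacement such as '- '.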

preg_replace takes three arguments, as you probably already know: the regular expression pattern to match, the replacement value, and the string to search (in that order).
Your pattern has word boundaries \b on either end of the value 0.00%. This is not a bug, even if a regex tester with different sample text appears to accept it: the trailing \b can only match when a word character follows the %, and in '0.00%' nothing does. If the value is supposed to be the whole string, you could drop the \b and use the start-of-string anchor ^ and the end-of-string anchor $ instead. Also note that the unescaped . matches any character, so write \. to match a literal dot.

How do I escape the brackets in a mysql REGEXP [duplicate]

I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.

\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
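The behaviour described above can be checked with a short sketch. It uses PHP's preg functions as an assumption (the question writes its patterns with /gi flags and does not name the engine), and the helper containsStandalone is a made-up name that swaps \b for lookarounds.

```php
<?php
// \b needs a word character on exactly one side of the position.
var_dump(preg_match('/\b\+/', 'add +'));        // int(0): space and + are both non-word
var_dump(preg_match('/\b\+/', 'a+b'));          // int(1): boundary between "a" and "+"
var_dump(preg_match('/\bplus/', 'add (plus)')); // int(1): boundary between "(" and "p"

// Hypothetical helper: "not glued to a word character" instead of \b.
function containsStandalone(string $needle, string $haystack): bool
{
    $pattern = '/(?<!\w)' . preg_quote($needle, '/') . '(?!\w)/i';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsStandalone('+', 'add +'));   // bool(true)
var_dump(containsStandalone('+', 'add (+)')); // bool(true)
var_dump(containsStandalone('+', 'a+b'));     // bool(false)
```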

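Try changing it to:
/\b\s?\+/gi
(The + has to stay escaped; otherwise it is read as a quantifier rather than a literal plus sign.)
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*\+/gi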

Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.

Regex match section within string

I have a string foo-foo-AB1234-foo-AB12345678. The string can be in any format, is there a way of matching only the following pattern letter,letter,digits 3-5 ?
I have the following implementation:
preg_match_all('/[A-Za-z]{2}[0-9]{3,6}/', $string, $matches);
Unfortunately this finds a match on AB1234 AND AB12345678 which has more than 6 digits. I only wish to find a match on AB1234 in this instance.
I tried:
preg_match_all('/^[A-Za-z]{2}[0-9]{3,6}$/', $string, $matches);
You will notice ^ and $ to mark the beginning and end, but this only applies to the string, not the section, therefore no match is found.
I understand why the code is behaving like it is. It makes logical sense. I can't figure out the solution though.

You must be looking for word boundaries \b:
\b\p{L}{2}\p{N}{3,5}\b
Note that \p{L} matches a Unicode letter, and \p{N} matches a Unicode number. In PHP, add the u modifier if the input is UTF-8.
You can also keep your own character classes and use \b[a-zA-Z]{2}[0-9]{3,5}\b. Note that the anchors make your regex match only at the beginning of the string (with ^) and/or at the end of the string (with $).
In case you have underscored words (like foo-foo_AB1234_foo_AB12345678_string), you will need a slight modification:
(?<=\b|_)\p{L}{2}\p{N}{3,5}(?=\b|_)
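As a quick check against the string from the question, here is a sketch of the ASCII variant. The second part is an alternative formulation, not the pattern above: it assumes that "not touching a letter or digit" is the rule you want, which makes underscores behave as separators. The name findCodes is hypothetical.

```php
<?php
$string = 'foo-foo-AB1234-foo-AB12345678';

// \b on both sides: AB12345678 is rejected because a digit follows AB12345.
preg_match_all('/\b[A-Za-z]{2}[0-9]{3,5}\b/', $string, $matches);
print_r($matches[0]); // contains only "AB1234"

// Hypothetical helper: two letters plus 3-5 digits, with no letter or digit
// directly before or after. Underscores therefore count as separators.
function findCodes(string $text): array
{
    preg_match_all('/(?<![A-Za-z0-9])[A-Za-z]{2}[0-9]{3,5}(?![A-Za-z0-9])/', $text, $m);
    return $m[0];
}

print_r(findCodes('foo-foo_AB1234_foo_AB12345678_string')); // contains only "AB1234"
```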

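You have to end your regular expression with a pattern for a non-digit. In Java this would be \D, and it is the same in PHP. Keep in mind that \D consumes a character and therefore fails at the very end of the string; the lookahead (?!\d) avoids both problems.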

Strip trailing non-word character(s)

I need to strip any non-alphanumeric characters from the end of strings using PHP's preg_replace:
"Word One, Two, -", "Word One, Two,[space]", "Word One, Two," and "Word One, Two" should all become "Word One, Two".
I have tried preg_replace('/(.+)\\W+$/', '$1', 'Word One, Two, -'); but this only strips the last non-word character. I also tried '/(.+)\\W*$/' as I assumed this would make it work if 0 or 1 non-word characters are found (as I need) but it then doesn't match at all. I think I need to make the \W greedy but I'm not sure how. Any ideas? Also, please feel free to explain to me what I am doing wrong so I don't find myself haunting the SO regex tag ;-)

This is because (.+) eats up all the other characters, including non-word characters. The regex engine starts matching the string and begins with all characters in the capturing group. Only then does it notice that the \W at the end of the string won't fit, so it backs up, tentatively allowing a single character to be matched by the \W. But a single character is all that's needed to satisfy the \W+, so it stops there and strips just that single character. That's also the reason why (.+)\W*$ strips nothing at all: \W* is content with matching nothing.
Use
preg_replace('/\\W+$/', '', $foo);
instead. This avoids the problem by just replacing trailing non-word characters without even trying to match something else.
Another option would be
preg_replace('/(.+?)\\W+$/', '$1', $foo);
which would use a lazy quantifier (+?) for the capturing group. This quantifier tries satisfying the match while matching as little as possible (as opposed to + which tries to match as much as possible as we saw above). But generally I'd avoid replacing parts of the match by themselves if you can avoid it. To strip things from a string you certainly don't need to match more than you need to strip.
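The difference between the three patterns can be seen in a short sketch, using the sample string from the question. The helper name stripTrailingNonWord is made up for illustration.

```php
<?php
$input = 'Word One, Two, -';

// Greedy (.+) grabs everything, then gives back just one character for \W+,
// so only the final "-" is stripped.
var_dump(preg_replace('/(.+)\W+$/', '$1', $input) === 'Word One, Two, ');  // bool(true)

// Lazy (.+?) stops as early as possible, leaving the whole tail to \W+.
var_dump(preg_replace('/(.+?)\W+$/', '$1', $input) === 'Word One, Two');   // bool(true)

// Hypothetical helper for the simpler form: match only what gets removed.
function stripTrailingNonWord(string $text): string
{
    return preg_replace('/\W+$/', '', $text);
}

var_dump(stripTrailingNonWord($input) === 'Word One, Two');            // bool(true)
var_dump(stripTrailingNonWord('Word One, Two, ') === 'Word One, Two'); // bool(true)
var_dump(stripTrailingNonWord('Word One, Two') === 'Word One, Two');   // bool(true)
```

One caveat: \W treats the underscore as a word character, so a trailing _ survives. If "non-alphanumeric" is meant strictly, [\W_]+$ would cover that case as well.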

What your regex is doing is looking for the maximum possible amount of any character, while still keeping at least one non-word at the end.
What you need to do is just drop the (.+), and use:
preg_replace('/\W+$/', '', $input);

What does this Regex pattern mean: '/&\w;/'

Can someone explain what this function
preg_replace('/&\w;/', '', $buf)
does? I have looked at various tutorials and found that it replaces the pattern /&\w;/ with string ''. But I can't understand the pattern /&\w;/. What does it represent?
Similarly in
preg_match_all("/(\b[\w+]+\b)/", $buf, $words)
I can't understand what the string "/(\b[\w+]+\b)/" represents.
Please help. Thanks in advance :)

The explanation of your first expression is simple, it is:
& # Match the character “&” literally
\w # Match a single character that is a “word character” (letters, digits, and underscores)
; # Match the character “;” literally
The second one is:
( # Match the regular expression below and capture its match into backreference number 1
\b # Assert position at a word boundary
[\w+] # Match a single character present in the list below
# A word character (letters, digits, and underscores)
# The character “+”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
)
The preg_replace function makes use of regular expressions. Regular expressions allow you to find patterns in text in a really powerful way.
To be able to use functions like preg_replace or preg_match I recommend you to take a look first at how regular expressions work.
You can gather a lot of info on this site http://www.regular-expressions.info/
And you can use software tools to help you understand the regex (like RegexBuddy)

In regular expressions, \w stands for any "word" character. That is: a-z, A-Z, 0-9 and underscore. \b stands for "word boundary", that is the beginning and end of a word (a series of word characters).
So, /&\w;/ is a regular expression that matches the & sign, followed by exactly one word character, followed by a ;. For example, &a; would match, and preg_replace will replace it with an empty string. A longer entity such as &foobar; would not match, because \w has no quantifier; that would need /&\w+;/.
In the same manner, /(\b[\w+]+\b)/ matches a word boundary, followed by one or more characters that are each either a word character or a literal + (inside a character class, + is not a quantifier), followed by another word boundary. The matches are captured using the parentheses. So, this regular expression will return the words in a string as an array.
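To see what these two patterns actually match, here is a small sketch with made-up sample strings:

```php
<?php
// \w without a quantifier matches exactly one word character.
var_dump(preg_replace('/&\w;/', '', 'x&y;z') === 'xz');     // bool(true)
var_dump(preg_replace('/&\w;/', '', '&amp;') === '&amp;');  // bool(true): left unchanged
var_dump(preg_replace('/&\w+;/', '', '&amp;') === '');      // bool(true): + allows longer names

// Inside [...] the + is a literal plus sign, so "a+b" is captured as one token.
// "C++" yields only "C": the closing \b cannot sit after a +, because a + is
// not a word character and neither is the space that follows it.
preg_match_all('/(\b[\w+]+\b)/', 'C++ and a+b', $words);
print_r($words[1]); // C, and, a+b
```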
