preg_match: can't find substring which has trailing special characters


I have a function which uses preg_match to check whether a substring occurs in another string.
Today I realized that if the substring ends with special characters, such as regular expression metacharacters (. \ + * ? [ ^ ] $ ( ) { } = ! < > | : -) or #, my preg_match can't find the substring even though it is there.
This works, returns "A match was found."
$find = "website scripting";
$string = "PHP is the website scripting language of choice.";
if (preg_match("/\b" . $find . "\b/i", $string)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
But this doesn't, returns "A match was not found."
$find = "website scripting #";
$string = "PHP is the website scripting # language of choice.";
if (preg_match("/\b" . $find . "\b/i", $string)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
I have tried preg_quote, but it doesn't help.
Thank you for any suggestions!
Edit: Word boundary is required, that's why I use \b. I don't want to find "phone" in "smartphone".

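\b matches only between a word character and a non-word character, so a trailing \b can never match after "#" followed by a space. Instead, you can check with look-arounds that the characters around the search word are not word characters: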
$find = "website scripting #";
$string = "PHP is the website scripting # language of choice.";
if (preg_match("/(?<!\\w)" . preg_quote($find, '/') . "(?!\\w)/i", $string)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
See IDEONE demo
Result: A match was found.
Note the double backslash used with \w in (?<!\\w) and (?!\\w): inside a double-quoted PHP string, a literal backslash should itself be escaped.
The preg_quote function is necessary because the search word, from what I see, can contain special characters, and some of them must be escaped to be matched literally.
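The look-around check can be wrapped in a small helper. This is only a sketch: the function name contains_whole_phrase and the second test sentence are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): true when $find occurs
// in $string without a word character directly before or after it.
function contains_whole_phrase(string $find, string $string): bool
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $pattern = '/(?<!\w)' . preg_quote($find, '/') . '(?!\w)/i';
    return preg_match($pattern, $string) === 1;
}

var_dump(contains_whole_phrase('website scripting #', 'PHP is the website scripting # language of choice.')); // bool(true)
var_dump(contains_whole_phrase('phone', 'I bought a smartphone.')); // bool(false)
```

The second call returns false because "phone" is preceded by the word character "t", which is the behaviour the question asks for.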
UPDATE
There is a way to build a regex with smartly placed word boundaries around the keyword, but the performance will be worse compared with the approach above. Here is sample code:
$string = "PHP is the website scripting # language of choice.";
$find = "website scripting #";
$find = preg_quote($find, '/'); // also escape the "/" delimiter
if (preg_match('/\w$/u', $find)) { // Setting trailing word boundary
$find .= '\\b';
}
if (preg_match('/^\w/u', $find)) { // Setting leading word boundary
$find = '\\b' . $find;
}
if (preg_match("/" . $find . "/ui", $string)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
See another IDEONE demo

If you only need to find a string inside another string, you can use strpos().
Ex.
<?php
$find = "website scripting";
$string = "PHP is the website scripting language of choice.";
if (strpos($string,$find) !== false) {
echo 'true';
} else {
echo 'false';
}
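Keep in mind that strpos() does a plain substring search, while the question requires word boundaries. A short illustration, using a made-up test sentence, of how the two approaches differ:

```php
<?php
$string = 'I bought a smartphone.';

// strpos() finds "phone" inside "smartphone": a plain substring hit.
var_dump(strpos($string, 'phone') !== false);                  // bool(true)

// The look-around pattern rejects it because "t" precedes "phone".
var_dump(preg_match('/(?<!\w)phone(?!\w)/i', $string) === 1);  // bool(false)
```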

Related

preg_match not working with ^ and $

I use preg_match for testing unicode pattern with symbols and digits.
$reg = '/^(?=.*\p{L})[\d\p{L}]+$/im';
$str = trim('станция44');
if (preg_match($reg, $str) === 1) {
echo 'Match';
} else {
echo 'Not match';
}
If I test it I get Not match, but if I remove ^ and $ I get Match. Why does this happen?

Because you're matching with \p{L}, you need to add the Unicode modifier (u) at the end:
$reg = '/^(?=.*\p{L})[\d\p{L}]+$/imu';
More info: http://php.net/manual/en/regexp.reference.unicode.php
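A minimal check of the corrected pattern, assuming the script file is saved as UTF-8. The m modifier is left out here because the subject is a single line. Without u, the subject is processed byte by byte, which is why the anchored pattern fails in the question.

```php
<?php
// Assumes this file is saved as UTF-8, so the literal below is UTF-8 encoded.
$str = 'станция44';

// With the u modifier, pattern and subject are treated as UTF-8, so \p{L}
// is tested against whole code points rather than single bytes.
$withU = preg_match('/^(?=.*\p{L})[\d\p{L}]+$/iu', $str);

var_dump($withU); // int(1)
```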

Regular Expression not working in PHP

How to check below line in regular expression?
[albums album_id='41']
All are static except my album_id. This may be 41 or else.
Below my code I have tried but that one not working:
$str = "[albums album_id='41']";
$regex = '/^[albums album_id=\'[0-9]\']$/';
if (preg_match($regex, $str)) {
echo $str . " is a valid album ID.";
} else {
echo $str . " is an invalid album ID. Please try again.";
}
Thank you

You need to escape the first [ and add + quantifier to [0-9]. The first [ being unescaped created a character class - [albums album_id=\'[0-9] and that is something you did not expect.
Use
$regex = '/^\[albums album_id=\'[0-9]+\']$/';
Pattern details:
^ - start of string
\[ - a literal [
albums album_id=\' - a literal string albums album_id='
[0-9]+ - one or more digits (thanks to the + quantifier, if there can be no digits here, use * quantifier)
\'] - a literal string ']
$ - end of string.
See PHP demo:
$str = "[albums album_id='41']";
$regex = '/^\[albums album_id=\'[0-9]+\']$/';
if (preg_match($regex, $str)) {
echo $str . " is a valid album ID.";
} else {
echo $str . " is an invalid album ID. Please try again.";
}
// => [albums album_id='41'] is a valid album ID.
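If the numeric ID itself is needed, a capturing group can be added around [0-9]+. This is a sketch that goes slightly beyond the answer above: the parentheses and the $m variable are additions for illustration.

```php
<?php
$str = "[albums album_id='41']";

// Same pattern as above, with parentheses added around the digits so the
// ID is captured; the closing "]" is escaped too for symmetry.
if (preg_match('/^\[albums album_id=\'([0-9]+)\'\]$/', $str, $m)) {
    echo "Album ID: " . $m[1]; // Album ID: 41
}
```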

Your regex has two errors: the first [ must be escaped, and [0-9] needs a quantifier. Use this:
$regex = '/^\[albums album_id=\'[0-9]+\']$/';
The + after [0-9] means one or more digits between 0 and 9 (use * instead if you want zero or more).
To test your regex before using it in your code, you can use the regex101 website.

php find a substring only if it's not part of another substring

I know how to find substrings using strpos, but I want to return True only if the letter t appears in a string, but not if that t is followed by "he". For example...
$str="The lion and dog are hungry"
The result would be Does not contain t because the only t in the string was part of the word "The".
$str="Their bedroom is ugly" should also return false because "Their" starts with T H E and there's no other t in the string.
$str="The cat and the dog are hungry" would result in Yes, this string contains a t because there's a t in CAT.

You need a regex with a negative lookahead:
/t(?!h(?:e|is))/i
See the regex demo
Pattern details:
t - a literal char t
(?!h(?:e|is)) - a negative lookahead that checks whether its pattern matches the string right after the current location and fails the match if it does:
h - a literal h
(?:e|is) - either e or is (the (?:...|...) is a non-capturing group that does not keep submatches in the memory containing a | alternation operator)
/i - case insensitive modifier making the regex match in a case insensitive way.
Basically, this is a more efficient version of a t(?!he|his) regex (t not followed with he or his).
PHP demo:
$re = '/t(?!h(?:e|is))/i';
if (preg_match($re,'The cat and the dog are hungry'))
echo 'true';
else
echo 'false';
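To see how the lookahead behaves on all three sentences from the question, the pattern can be run in a loop. A small sketch; the $examples array is just a convenience for the demo.

```php
<?php
$re = '/t(?!h(?:e|is))/i';
$examples = [
    'The lion and dog are hungry',
    'Their bedroom is ugly',
    'The cat and the dog are hungry',
];
foreach ($examples as $s) {
    echo $s, ' => ', preg_match($re, $s) ? 'true' : 'false', PHP_EOL;
}
// The lion and dog are hungry => false
// Their bedroom is ugly => false
// The cat and the dog are hungry => true
```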

Try this
<?php
$a = 'Their bedroom is ugly';
if (preg_match('/t(?!he)(?!his)/i',$a))
echo 'true';
else
echo 'false';

You can use strpos to check to see if there's an 'he' after all the 't's you find:
<?php
$offset = 0;
$string = "the t the";
$result = 'No, this string does not contain t';
// Visit every "t"; stripos() can return 0, so compare against false explicitly.
while (($pos = stripos($string, 't', $offset)) !== false) {
    // A "t" that does not start a "the" counts as a hit.
    if (stripos($string, 'the', $pos) !== $pos) {
        $result = 'Yes, this string contains t';
        break;
    }
    $offset = $pos + 1; // this "t" belongs to "the"; keep looking after it
}
echo $result;
But that's not the most efficient way to do it. IMHO the best approach is to use a regex:
$string = "the t the";
$result = 'no';
// Note: this requires a character after the t and rejects every "th", not only "the".
if (preg_match('/[tT][^Hh]/', $string)) {
$result = 'yes';
}
You can also use negative lookahead (a personal favorite technique):
$string = "the t the";
$result = 'no';
if (preg_match('/t(?!he)/i', $string)) {
$result = 'yes';
}

Check if string contains the same pattern

How can I check if a string has a specific pattern like this?
XXXX-XXXX-XXXX-XXXX
4 alphanumeric characters then a minus sign, 4 times like the structure above.
What I would like to do is that I would like to check if a string contains this structure including "-".
I'm lost, can anyone point me in the correct direction?
Example code:
$string = "5E34-4512-ABAX-1E3D";
if ($pattern contains this structure XXXX-XXXX-XXXX-XXXX) {
echo 'The pattern is correct.';
}
else {
echo 'The pattern is invalid.';
}

Use regular expressions
<?php
$subject = "XXXX-XXXX-XXXX-XXXX";
$pattern = '/^[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}$/';
if (preg_match($pattern, $subject) == 1) {
echo 'The pattern is correct.';
} else {
echo 'The pattern is invalid.';
}
?>
[a-zA-Z0-9] matches a single alphanumeric character
{4} repeats the preceding character class exactly 4 times
\- matches a literal hyphen (the backslash is optional outside a character class)

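With a Perl-compatible regexp (note that \w also matches the underscore, and that without ^ and $ the pattern can match inside a longer string):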
$string = "5E34-4512-ABAX-1E3D";
if (preg_match('/\w{4}-\w{4}-\w{4}-\w{4}/',$string)) {
echo 'The pattern is correct.';
}

Use preg_match:
$ok = preg_match('/^([0-9A-Z]{4}-){3}[0-9A-Z]{4}$/', $string);
And if you want to accept lowercase characters as well, use:
$ok = preg_match('/^([0-9A-Z]{4}-){3}[0-9A-Z]{4}$/i', $string);
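The compact {3} form can be wrapped in a helper and checked against valid and invalid input. A sketch only: the name is_valid_code and the two invalid samples are hypothetical.

```php
<?php
// Hypothetical helper: validates four groups of four alphanumerics joined by "-".
function is_valid_code(string $s): bool
{
    // (?: ... ){3} repeats "XXXX-" three times; ^ and $ anchor the whole string.
    return preg_match('/^(?:[0-9A-Z]{4}-){3}[0-9A-Z]{4}$/i', $s) === 1;
}

var_dump(is_valid_code('5E34-4512-ABAX-1E3D'));   // bool(true)
var_dump(is_valid_code('5E34-4512-ABAX'));        // bool(false)
var_dump(is_valid_code('5E34_4512-ABAX-1E3D'));   // bool(false)
```

Note that $ still allows a single trailing newline; if that matters, the D modifier can be added to the pattern.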

Check a string for bad words? [duplicate]

Possible Duplicate:
Efficient way to test string for certain words
I want to check if a string contains any of these words: ban, bad, user, pass, stack, name, html.
If it contains any of the words I need to echo the number of bad words
$str = 'Hello my name is user';

I think something like this would work:
$badWords = array("ban","bad","user","pass","stack","name","html");
$string = "Hello my name is user.";
$matches = array();
$matchFound = preg_match_all(
"/\b(" . implode("|", $badWords) . ")\b/i",
$string,
$matches
);
if ($matchFound) {
echo "<ul>";
$words = array_unique($matches[0]);
foreach($words as $word) {
echo "<li>" . $word . "</li>";
}
echo "</ul>";
}
This creates an array of banned words, and uses a regular expression to find instances of these words:
\b in the Regex indicates a word boundary (i.e. the beginning or end of a word, determined by either the beginning/end of the string or a non-word character). This is done to prevent "clbuttic" mistakes - i.e. you don't want to ban the word "banner" when you only want to match the word "ban".
The implode function creates a single string containing all your banned words, separated by a pipe character, which is the or operator in the Regex.
The implode portion of the Regex is surrounded with parentheses so that preg_match_all will capture the banned word as the match.
The i modifier at the end of the Regex indicates that the match should be case-insensitive - i.e. it will match each word regardless of capitalization - "Ban", "ban", and "BAN" will all match against the word "ban" in the $badWords array.
Next, the code checks if any matches were found. If there are, it uses array_unique to ensure only one instance of each word is reported, and then it outputs the list of matches in an unordered list.
Is this what you're looking for?
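Since the question asks for the number of bad words, the return value of preg_match_all can be used directly. A sketch under the assumption that every occurrence should be counted, not only distinct words; the $alternatives and $count names are illustrative.

```php
<?php
$badWords = ['ban', 'bad', 'user', 'pass', 'stack', 'name', 'html'];
$string = 'Hello my name is user';

// Escape each word in case the list ever contains regex metacharacters.
$alternatives = implode('|', array_map(function ($w) {
    return preg_quote($w, '/');
}, $badWords));

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all('/\b(?:' . $alternatives . ')\b/i', $string);

echo $count; // 2
```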

This is what you want.
function teststringforbadwords($string, $banned_words) {
    foreach ($banned_words as $banned_word) {
        // stripos() is case-insensitive and returns false when nothing is found.
        if (stripos($string, $banned_word) !== false) {
            return false; // a banned word was found
        }
    }
    return true; // the string is clean
}
$string = "test string";
$banned_words = array('ban','bad','user','pass','stack','name','html');
if (teststringforbadwords($string, $banned_words)) {
    echo 'string is clean';
} else {
    echo 'string contains banned words';
}

The \b in the pattern indicates a word boundary, so only the distinct
word "web" is matched, and not a word partial like "webbing" or "cobweb"
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
This is your best bet. As stated at the beginning you can control your regex.
This is directly from php.net

function check_words($text) {
$text=$text;
$bad_words = file('bad_words.txt');
$bad = explode(" | ",$bad_words[0]);
$b = '/\W' . implode('\W|\W', $bad) . '\W/i';
if(preg_match($b, $text)){
echo $text ." - Contain Bad words!"; // other function here
} else {
echo $text ." - Not containing bad words :D";
// other function here
}
}
Example: check_words('He is good');
This works well, although anything after the final / does not seem to get checked; e.g. in http://www.mysite.com/thisbit, thisbit is not checked for bad words.
It does work, however, if it is typed with a trailing /, like this: http://www.mysite.com/thisbit/.
Not sure if this can be fixed or not.
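A likely cause is that the pattern requires an actual non-word character (\W) on both sides of each word, so a word at the very start or end of the text has nothing for \W to match. One possible fix, sketched below, replaces \W with look-arounds. The function name has_bad_word and the placeholder word list are assumptions for illustration, and the list is passed in directly rather than read from bad_words.txt.

```php
<?php
// Hypothetical rewrite of the check: returns true when $text contains a listed word.
function has_bad_word(string $text, array $badWords): bool
{
    $quoted = array_map(function ($w) {
        return preg_quote(trim($w), '/');
    }, $badWords);
    // Look-arounds assert "no word character here" without needing a character
    // to exist, so matches at the start or end of $text are found too.
    $pattern = '/(?<!\w)(?:' . implode('|', $quoted) . ')(?!\w)/i';
    return preg_match($pattern, $text) === 1;
}

$bad = ['thisbit', 'other bit']; // placeholder words for illustration
var_dump(has_bad_word('http://www.mysite.com/thisbit', $bad));  // bool(true)
var_dump(has_bad_word('http://www.mysite.com/thisbit/', $bad)); // bool(true)
var_dump(has_bad_word('He is good', $bad));                     // bool(false)
```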

function check_words($text) {
$text=$text;
$bad_words = file('bad_words.txt');
$bad = explode(" | ",$bad_words[0]);
$b = '/\W' . implode('\W|\W', $bad) . '\W/i';
if(preg_match($b, $text)){
echo $text ." - Contain Bad words!";
# - other function here
}
else{
echo $text ." - Not containing bad words :D";
# - other function here
}
}
# - Example
check_words('He is good');
Hope this helps. You can put all the bad words in the bad_words.txt file.
Arrange the bad words in the file on a single line, like this:
bad_words1 | bad_words2 | bad_words3 | bad_words4 ...
Note: entries can also contain spaces, for example:
bad words 1 | bad words 2 | bad words 3
as long as they are separated by " | " (a pipe with a space on each side).
