I am using preg_match to find exact words and phrases and replace them with <a href> links. I am using a word-boundary regex, but it is not working correctly: it matches inside other words.
Example:
'rings' is being matched inside 'earrings'. I don't want that; I only want the standalone word 'rings'.
Is my preg_match regex wrong?
$keyword="rings";
$text="women's earrings, clothing rings, earrings, rings";
if (preg_match("/\b$keyword\b/i",$text))
Asterisks below mark the text that gets matched:
output = "women's ear*rings*, clothing *rings*, ear*rings*, *rings*"
expected = "women's earrings, clothing *rings*, earrings, *rings*"
Update: I think the problem is in the replace function:
function str_replace_first($from, $to, $subject)
{ $from = '/'.preg_quote($from, '/').'/';
return preg_replace($from, $to, $subject,2);
}
if (preg_match_all("/\b$keyword\b/i",$text,$matches)>0)
{
print_r($matches)."<p> ";
$ahref="<a href='$anchor_url'>$keyword</a>";
$text=str_replace_first($keyword, $ahref, $text);
} ELSE {
echo "<p>no Match<br>";
}
echo $text;
Use preg_replace directly, without collecting the matches, since you are not actually going to use them (you only need to wrap each one in some other text):
$keyword="rings";
$anchor_url = "http_//www.t.tt";
$url = "<a href='$anchor_url'>\$0</a>";
$text="women's earrings, clothing rings, earrings, rings";
$newtxt = preg_replace('/\b' . preg_quote($keyword, '/') . '\b/i', $url, $text);
if ($newtxt != $text) {
echo $newtxt;
} else { echo "No matches!"; }
Note that you need \b word boundaries to match a whole word; your str_replace_first() builds its pattern without them, which is why it replaces the 'rings' inside 'earrings'. You also need to run the keyword through preg_quote and pass it the regex delimiter, so that special characters and the delimiter itself are escaped. Since the regex is case-insensitive, you cannot hardcode $keyword in the replacement; use the $0 backreference to the whole match instead, which keeps the original casing. To check whether there was no match, just compare the new string with the original one.
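For reuse, the same fix can be wrapped in a small function. This is a sketch only: linkKeyword() and its $limit parameter are hypothetical names, $limit is passed straight through to preg_replace() (1 links only the first occurrence, which seems to be what str_replace_first() was aiming for), and the URL is assumed to contain no $ or \ characters, because preg_replace() would read those as references in the replacement string.

```php
<?php
// Hypothetical helper: link whole-word, case-insensitive occurrences of $keyword.
// Assumes $url contains no "$" or "\" (they would be read as replacement references).
function linkKeyword(string $text, string $keyword, string $url, int $limit = -1): string
{
    // \b on both sides: "rings" must not match inside "earrings".
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
    // $0 re-inserts the matched text with its original casing.
    $replacement = '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">$0</a>';
    // $limit = -1 replaces every occurrence; 1 replaces only the first.
    return preg_replace($pattern, $replacement, $text, $limit);
}

$text = "women's earrings, clothing rings, earrings, Rings";
echo linkKeyword($text, 'rings', 'http://www.t.tt'), "\n";
echo linkKeyword($text, 'rings', 'http://www.t.tt', 1), "\n";
```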
Related
I have to check CSV files live and match their contents against regular expressions to extract data.
These files can contain different types of messages, and therefore need different matching expressions.
A message can look like this:
GuiPrinter.ProcessPrint of 116806 25374 K356 S Black Face.png 229 at 1 table
and I want to get 116806 25374 K356 S Black Face.png. The regex associated with this kind of file would be something like (GuiPrinter.ProcessPrint of )(.*)([.][png|jpg|jpeg|PNG|JPG|JPEG]*), and I can return $result[2].
But the message and the regex can change, so I need a common function, taking the message and the regex as parameters, that returns the string I want. For another file the string I want might be in the first capture group, so $result[2] won't work.
How can I make sure I always return the string I want to match?
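Use a lazy capture group that includes the extension (and escape the literal dot), so $match[1] holds exactly the text you want, without a leading space:
\preg_match('/GuiPrinter\.ProcessPrint of (.*?\.(?:gif|png|bmp|jpe?g))\b/', $str, $match);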
print_r($match[1]);
You could match the text GuiPrinter.ProcessPrint and then use \K to reset the starting point of the reported match.
Match any character zero or more times, non-greedily (.*?), then a literal dot \. and one of the image extensions in a non-capturing group (?:gif|png|bmp|jpe?g), followed by a word boundary \b:
GuiPrinter\.ProcessPrint of \K.*?\.(?:gif|png|bmp|jpe?g)\b
Note that to match a literal dot you have to escape it: \.
For example, to return one match using preg_match:
$str = 'GuiPrinter.ProcessPrint of 116806 25374 K356 S Black Face.png 229 at 1 table';
$re = '/GuiPrinter\.ProcessPrint of \K.*?\.(?:gif|png|bmp|jpe?g)\b/';
function findMatch($message, $regex) {
preg_match($regex, $message, $matches);
return array_shift($matches);
}
$result = findMatch($str, $re);
if ($result) {
echo "Found: $result";
} else {
echo "No match.";
}
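If the wanted text sits in a different capture group for each file, one option (a swapped-in technique, not part of the answer above) is to mark it in every regex with a named group, here assumed to be called want, and to fall back to the whole match when there is no such group, which also covers \K patterns. extractWanted() is a hypothetical name:

```php
<?php
// Hypothetical generic extractor. Each per-file regex marks the wanted text
// with a named group (?<want>...); without one, the whole match is returned,
// which also covers patterns that trim the match with \K.
function extractWanted(string $message, string $regex): ?string
{
    if (preg_match($regex, $message, $m) !== 1) {
        return null; // no match, or an invalid regex (preg_match() returned false)
    }
    return $m['want'] ?? $m[0];
}

$msg = 'GuiPrinter.ProcessPrint of 116806 25374 K356 S Black Face.png 229 at 1 table';
// The wanted text can sit anywhere in the pattern, as long as it is named "want".
echo extractWanted($msg, '/GuiPrinter\.ProcessPrint of (?<want>.*?\.(?:gif|png|bmp|jpe?g))\b/'), "\n";
echo extractWanted('42 items from store', '/(?<want>\d+) items from (\w+)/'), "\n";
```

The design choice is that the regex, not the calling code, decides which part is returned, so the function never needs to know group positions.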
Given the following simple function (for a PHP page), I am trying to match all the occurrences of the word $marker in a long text string. I need to highlight its occurrences.
The function works, but it has two problems:
1) it fails to match uppercase occurrences of $marker
2) it also matches partial occurrences: if $marker is "art", the function as it is also matches "artistic" and "cart".
How can I correct these two inconveniences?
function highlightWords($string, $marker){
$string = str_replace($marker, "<span class='highlight success'>".$marker."</span>", $string);
return $string;
}
To solve both problems, you can use preg_replace() with a regular expression. Add the i flag for a case-insensitive search and \b word boundaries around your search term so it can't be part of another word, and run the term through preg_quote() so any regex special characters in it are matched literally, e.g.:
function highlightWords($string, $marker){
$string = preg_replace("/(\b" . preg_quote($marker, "/") . "\b)/i", "<span class='highlight success'>$1</span>", $string);
return $string;
}
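If several markers have to be highlighted, a sketch under the assumption that they arrive as an array of plain strings can build one alternation from them; highlightAll() is a hypothetical name. Sorting longer markers first matters because regex alternation takes the first alternative that fits, so "art" would otherwise win over "art deco".

```php
<?php
// Hypothetical multi-marker variant of highlightWords(): one case-insensitive,
// whole-word pass over the string for all markers.
function highlightAll(string $string, array $markers): string
{
    if ($markers === []) {
        // An empty alternation would match empty strings at every word boundary.
        return $string;
    }
    // Longest markers first: alternation takes the first alternative that fits.
    usort($markers, fn($a, $b) => strlen($b) <=> strlen($a));
    $alternation = implode('|', array_map(fn($m) => preg_quote($m, '/'), $markers));
    // $0 keeps the original casing of each match.
    return preg_replace('/\b(?:' . $alternation . ')\b/i', "<span class='highlight success'>\$0</span>", $string);
}

echo highlightAll('Art and art deco, not cart or artistic', ['art', 'art deco']), "\n";
```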
Okay, so I'm trying to replace really long quotes in my website's comments section, which uses BBCode. What I'm trying to do is wrap long quotes in a collapse element I have already coded in JS and CSS.
My problem is that it handles the first quote, and then all the other quotes vanish. I'm obviously missing something, but this is my first time using callbacks like this.
Here's my php code right now to do this:
$body = preg_replace_callback("/\[quote\](.*?)\[\/quote\]/is",
function($matches)
{
if (strlen($matches[1]) >= '1000')
{
$matches[0] = str_replace($matches[0], '<div class="box"><div class="collapse_container"><div class="collapse_header"><span>Long quote, click to expand</span></div><div class="collapse_content">' . $matches[1] . '</div></div></div>', $matches[0]);
return $matches[0];
}
}, $body);
Some example text:
[quote]aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa[/quote]
[quote]booohoo[/quote]
[quote]new quoting[/quote]
[b]test[/b]
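Your callback returns nothing (null) for quotes shorter than 1000 characters, and preg_replace_callback() replaces those matches with an empty string, which is why the other quotes vanish. Move the return $matches[0] statement outside the if block (and compare against the integer 1000 rather than the string '1000'):
function($matches)
{
if (strlen($matches[1]) >= 1000) {
$matches[0] = '<div class="box"><div class="collapse_container"><div class="collapse_header"><span>Long quote, click to expand</span></div><div class="collapse_content">' . $matches[1] . '</div></div></div>';
}
return $matches[0];
}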
I'd also advise unrolling your lazy-matching regex as follows:
'~\[quote\]([^[]*(?:\[(?!/quote\])[^[]*)*)\[/quote\]~i'
In my regex demo, the unrolled pattern takes 30 steps on the sample text, against 2025 steps for yours.
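Putting the return fix and the unrolled pattern together, a self-contained sketch might look like this; collapseLongQuotes() and its $minLength parameter are hypothetical (the default of 1000 comes from the question). The unrolled pattern needs no s flag because [^[] already matches newlines, and strlen() counts bytes, so multibyte text reaches the threshold with fewer characters.

```php
<?php
// Hypothetical helper: wraps [quote]...[/quote] bodies of at least $minLength
// bytes in the collapse markup; shorter quotes are returned unchanged.
function collapseLongQuotes(string $body, int $minLength = 1000): string
{
    // Unrolled form of \[quote\](.*?)\[/quote\]
    $re = '~\[quote\]([^[]*(?:\[(?!/quote\])[^[]*)*)\[/quote\]~i';
    return preg_replace_callback($re, function (array $m) use ($minLength): string {
        if (strlen($m[1]) >= $minLength) {
            return '<div class="box"><div class="collapse_container">'
                . '<div class="collapse_header"><span>Long quote, click to expand</span></div>'
                . '<div class="collapse_content">' . $m[1] . '</div></div></div>';
        }
        // Always return a string: returning nothing would delete the quote.
        return $m[0];
    }, $body);
}

echo collapseLongQuotes("[quote]" . str_repeat('a', 1200) . "[/quote]\n[quote]booohoo[/quote]\n[b]test[/b]");
```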
PHP's preg_* quantifiers are greedy by default: they match the longest possible string described by your regex, so a greedy (.*) would match everything from the first [quote] to the very last [/quote]. The "U" modifier inverts this behavior: quantifiers become lazy by default, and a trailing ? makes them greedy again. So if you use U, drop the ? from .*?:
$body = preg_replace_callback("/\[quote\](.*)\[\/quote\]/isU",...);
For a list of modifiers see http://php.net/manual/en/reference.pcre.pattern.modifiers.php
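To see what the U modifier actually changes, the sketch below counts how many [quote] blocks each pattern finds in a small made-up input; the sample string and labels are illustrative only.

```php
<?php
// Counts the [quote] blocks each pattern finds in one made-up input.
$body = '[quote]one[/quote] [quote]two[/quote] [quote]three[/quote]';
$patterns = [
    'greedy .*'           => '/\[quote\](.*)\[\/quote\]/is',
    'lazy .*?'            => '/\[quote\](.*?)\[\/quote\]/is',
    '.* with U (lazy)'    => '/\[quote\](.*)\[\/quote\]/isU',
    '.*? with U (greedy)' => '/\[quote\](.*?)\[\/quote\]/isU',
];
$counts = [];
foreach ($patterns as $label => $re) {
    // preg_match_all() returns the number of full-pattern matches.
    $counts[$label] = preg_match_all($re, $body);
}
print_r($counts);
```

A count of 1 means one match swallowed all three quotes; 3 means each quote was matched on its own.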
I am using PHP to find out whether a string, which starts with a special regular expression character, occurs as a word in a text string. This is the PHP code:
$subject = " hello [hello [helloagain ";
$pattern = preg_quote("[hello");
if (preg_match("/\b" . $pattern . "\b/", $subject, $dummy)) {
echo "match";
} else {
echo "no match";
}
The pattern starts with the character [, so preg_quote() is used to escape it. There is an instance of [hello as a word in the subject, so there should be one match, but the above preg_match() returns no match. I think the reason is that a special character in the subject is not recognized as the start or end of a word, but I can't think of any way around this. Any ideas? Thank you.
If I understand the question correctly, you could just use strpos() with a leading space to separate words:
$subject = " hello [hello [helloagain ";
$pattern = " [hello";
if (strpos($subject, $pattern) !== FALSE) {
// ...
} else {
// ...
}
I think not using a regex here is actually the better method, since you are looking for special regex characters, and they do not have to be escaped if you use strpos().
It would take some modification to be right in all cases (as written, " [hello" also matches the beginning of " [helloagain"), but this worked when I tried it.
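A minimal version of that modification, assuming words are separated by single spaces only, pads both the subject and the search word with spaces. containsSpacedWord() is a hypothetical name, and tabs, newlines and punctuation next to the word are not handled.

```php
<?php
// Hypothetical helper: plain-string whole-word search, assuming words are
// separated by single spaces only (tabs, newlines and punctuation are not handled).
function containsSpacedWord(string $subject, string $word): bool
{
    // Padding the subject lets a word at the very start or end still match,
    // and the trailing space stops "[hello" from matching "[helloagain".
    return strpos(' ' . $subject . ' ', ' ' . $word . ' ') !== false;
}

var_dump(containsSpacedWord(" hello [hello [helloagain ", '[hello'));
```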
You are correct that a word boundary will not match between a space and a [ symbol.
Instead of using a word boundary you can explicitly search for spaces (and other separators such as commas and periods if you wish) before and after the word:
if (preg_match("/(\s|^)" . $pattern . "(?=\s|$)/", $subject, $dummy)) {
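The same idea can also be written with two lookarounds, (?<!\S) and (?!\S), meaning "not preceded by" and "not followed by" a non-whitespace character. Unlike the (\s|^) version, the leading space is not part of the match. A sketch with the hypothetical name containsToken():

```php
<?php
// Hypothetical helper: true if $token appears with whitespace (or the start or
// end of the string) on both sides. (?<!\S) and (?!\S) are zero-width, so,
// unlike \b, they also work when $token starts or ends with a non-word
// character such as "[".
function containsToken(string $subject, string $token): bool
{
    $re = '/(?<!\S)' . preg_quote($token, '/') . '(?!\S)/';
    return preg_match($re, $subject) === 1;
}

var_dump(containsToken(" hello [hello [helloagain ", '[hello'));
```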
For example, if my sentence is $sent = 'how are you'; and I search for $key = 'ho' using strstr($sent, $key), it returns a match (a truthy value) because my sentence contains ho.
What I'm looking for is a way to return true if I only search for how, are or you. How can I do this?
You can use the preg_match function with a regex that uses word boundaries:
if(preg_match('/\byou\b/', $input)) {
echo $input.' has the word you';
}
If you want to check for multiple words in the same string, and you're dealing with large strings, then this is faster:
$text = explode(' ',$text);
$text = array_flip($text);
Then you can check for words with:
if (isset($text[$word])) doSomething();
Each check is a single hash lookup on the flipped array, so this is very fast once the array is built.
But for checking a couple of words in short strings, use preg_match.
UPDATE:
If you're actually going to use this I suggest you implement it like this to avoid problems:
$text = preg_replace('/[^a-z\s]/', '', strtolower($text));
$text = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
$text = array_flip($text);
$word = strtolower($word);
if (isset($text[$word])) doSomething();
Then double spaces, line breaks, punctuation and capitals won't produce false negatives.
This method is much faster in checking for multiple words in large strings (i.e. entire documents of text), but it is more efficient to use preg_match if all you want to do is find if a single word exists in a normal size string.
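The steps above can be wrapped into two small functions so the index is built once and reused for many words; buildWordIndex() and indexHasWord() are hypothetical names. As in the update, only a-z letters survive, so "I'm" is indexed as "im".

```php
<?php
// Hypothetical helpers wrapping the steps above: build the index once,
// then check any number of words with a cheap isset() lookup.
function buildWordIndex(string $text): array
{
    // Lowercase, keep only letters and whitespace, split on runs of whitespace.
    $text = preg_replace('/[^a-z\s]/', '', strtolower($text));
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_flip($words);
}

function indexHasWord(array $index, string $word): bool
{
    return isset($index[strtolower($word)]);
}

$index = buildWordIndex("How are you?\nI'm fine,  thanks.");
var_dump(indexHasWord($index, 'YOU'), indexHasWord($index, 'ho'));
```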
One thing you can do is break your sentence up by spaces into an array.
First, you need to remove any unwanted punctuation marks.
The following code removes anything that isn't a letter, number, or space:
$sent = preg_replace("/[^a-zA-Z 0-9]+/", " ", $sent);
Now all you have are the words, separated by spaces. To create an array by splitting on spaces:
$sent_split = explode(" ", $sent);
Finally, you can do your check. Here are all the steps combined:
// The information you give
$sent = 'how are you';
$key = 'ho';
// Isolate only words and spaces
$sent = preg_replace("/[^a-zA-Z 0-9]+/", " ", $sent);
$sent_split = explode(" ", $sent);
// Do the check
if (in_array($key, $sent_split))
{
echo "Word found";
}
else
{
echo "Word not found";
}
// Outputs: Word not found
// because 'ho' isn't a word in 'how are you'
@codaddict's answer is technically correct, but if the word you are searching for is provided by the user, you need to escape any characters that have a special meaning in regular expressions. For example:
$searchWord = $_GET['search'];
$searchWord = preg_quote($searchWord, '/');
if (preg_match("/\b$searchWord\b/", $input)) {
echo "$input has the word $searchWord";
}
Building on Abhi's answer, a couple of suggestions:
I added /i to the regex, since words in a sentence are probably meant to match case-insensitively.
I added an explicit === 1 comparison, based on the documented preg_match() return values (1 for a match, 0 for no match, false on error).
$needle = preg_quote($needle, '/');
return preg_match("/\b$needle\b/i", $haystack) === 1;