highlighting words at the end of a word - php

I'm not sure how I could have phrased the title better, but my issue is that the highlight function doesn't highlight search keywords that appear at the end of a word. For example, if the search keyword is 'self', it will highlight 'self', 'self-lessness' or 'Self' (with a capital S), but it will not highlight the 'self' in 'yourself' or 'himself'.
This is the highlight function:
function highlightWords($text, $words) {
preg_match_all('~\w+~', $words, $m);
if(!$m)
return $text;
$re = '~\\b(' . implode('|', $m[0]) . ')~i';
$string = preg_replace($re, '<span class="highlight">$0</span>', $text);
return $string;
}

You have a \b at the beginning of your regex, which matches a word boundary. Since the 'self' in 'yourself' doesn't start at a word boundary, it doesn't match. Remove the \b.
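A minimal sketch of that fix, assuming the same function name and markup as the question. The leading \b is dropped, and each keyword is passed through preg_quote(), which is an addition not present in the original function, so that regex metacharacters in a keyword are treated literally.

```php
<?php
// Sketch: same idea as the question's function, minus the leading \b.
function highlightWords($text, $words) {
    // Split the search string into individual keywords.
    preg_match_all('~\w+~', $words, $m);
    if (empty($m[0])) {
        return $text; // nothing to highlight
    }
    // Escape each keyword for use inside a ~...~ delimited pattern.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '~');
    }, $m[0]);
    // No \b, so a keyword matches anywhere inside a word.
    $re = '~(' . implode('|', $quoted) . ')~i';
    return preg_replace($re, '<span class="highlight">$0</span>', $text);
}

echo highlightWords('Be yourself, Self-lessness', 'self'), "\n";
```

With this input, both the 'self' in 'yourself' and the 'Self' in 'Self-lessness' end up wrapped in the span.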

Try something like this:
function highlight($text, $words) {
if (!is_array($words)) {
$words = preg_split('#\\W+#', $words, -1, PREG_SPLIT_NO_EMPTY);
}
$regex = '#\\b(\\w*(';
$sep = '';
foreach ($words as $word) {
$regex .= $sep . preg_quote($word, '#');
$sep = '|';
}
$regex .= ')\\w*)\\b#i';
return preg_replace($regex, '<span class="highlight">\\1</span>', $text);
}
$text = "isa this is test text";
$words = array('is');
echo highlight($text, $words); // <span class="highlight">isa</span> <span class="highlight">this</span> <span class="highlight">is</span> test text
The loop is there so that every search word is properly quoted with preg_quote().
EDIT: Modified function to take either string or array in $words parameter.

Related

Check if a word occur in string and not to be in first and last

I am trying to check whether a word occurs in a string but is not the first or last word. If so, I want to remove the spaces before and after the word and replace them with underscores.
Input:
$str = 'This is a cool area';
Output:
$str = 'This is a_cool_area';
I want to check that the word 'cool' is inside the string but is not the first or last word. If it is, remove the surrounding spaces and replace them with '_'.
You can use preg_replace to do this job, using this regex:
/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i
This looks for the word surrounded by whitespace, with a word character before the leading whitespace and after the trailing whitespace (to prevent matching at the beginning or end of the sentence). Usage in PHP:
$str = 'This is a cool area';
$word = 'cool';
$str = preg_replace('/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i', '_$1_', $str);
echo $str . "\n";
$str = ' Cool areas are cool ';
$str = preg_replace('/(?<=\w)\s+(' . $word . ')\s+(?=\w)/i', '_$1_', $str);
echo $str . "\n";
Output:
This is a_cool_area
Cool areas are cool
Demo on 3v4l.org
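If $word can come from user input, it is probably safer to escape it before it goes into the pattern. A small sketch under that assumption, where joinInnerWord is a hypothetical helper name and not part of the answer above:

```php
<?php
// Hypothetical helper: joins $word to its neighbours with underscores,
// but only when there is a word on both sides of it.
function joinInnerWord($str, $word) {
    $quoted = preg_quote($word, '/'); // treat the word literally
    $pattern = '/(?<=\w)\s+(' . $quoted . ')\s+(?=\w)/i';
    return preg_replace($pattern, '_$1_', $str);
}

echo joinInnerWord('This is a cool area', 'cool'), "\n"; // This is a_cool_area
echo joinInnerWord('cool areas are cool', 'cool'), "\n"; // cool areas are cool
```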
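An alternative without a regex:
function checkWord($str, $word)
{
    $arr = explode(' ', $str);
    // Search only the inner words, so the first and last word never match.
    $key = array_search($word, array_slice($arr, 1, -1));
    if ($key === false) {
        return $str;
    }
    // array_slice() reindexes from 0, so the word sits at $key + 1 in $arr.
    $pos = $key + 1;
    // Join the word with its two neighbours and put the result back in place.
    $joined = implode('_', array_slice($arr, $pos - 1, 3));
    array_splice($arr, $pos - 1, 3, array($joined));
    return implode(' ', $arr);
}
echo checkWord('This is a cool area', 'cool'); // This is a_cool_area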

How can I avoid adding href to an overlapping keyword in string?

Using the following code:
$text = "أطلقت غوغل النسخة المخصصة للأجهزة الذكية العاملة بنظام أندرويد من الإصدار “25″ لمتصفحها الشهير كروم.ولم تحدث غوغل تطبيق كروم للأجهزة العاملة بأندرويد منذ شهر تشرين الثاني العام الماضي، وهو المتصفح الذي يستخدمه نسبة 2.02% من أصحاب الأجهزة الذكية حسب دراسة سابقة. ";
$tags = "غوغل, غوغل النسخة, كروم";
$tags = explode(",", $tags);
foreach($tags as $k=>$v) {
$text = preg_replace("/\b{$v}\b/u","$0",$text, 1);
}
echo $text;
Will give the following result:
I love PHP">love PHP</a>, but I am facing a problem
Note that my text is in Arabic.
The way to do this is to handle everything in one pass. The idea is to build a pattern with an alternation of the tags. For this to work, you must first sort the tags so that a longer tag comes before any shorter tag it starts with, because the regex engine stops at the first alternative that succeeds (otherwise 'love' will always match, even when it is followed by 'php', and 'love php' will never be matched).
To limit the replacement to the first occurrence of each tag, you can remove a tag from the array once it has been found, and test inside the replacement callback whether the tag is still present in the array:
$text = 'I love PHP, I love love but I am facing a problem';
$tagsCSV = 'love, love php, facing';
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(?:' . implode('|', $tags) . ')\b/iu';
$text = preg_replace_callback($pattern, function ($m) use (&$tags) {
$mLC = mb_strtolower($m[0], 'UTF-8');
if (false === $key = array_search($mLC, $tags))
return $m[0];
unset($tags[$key]);
return '<a href="index.php?s=news&tag=' . rawurlencode($mLC)
. '">' . $m[0] . '</a>';
}, $text);
Note: when you build a URL you must encode special characters. This is why I use preg_replace_callback instead of preg_replace: it lets me call rawurlencode.
If you have to deal with a UTF-8 encoded string, you need to add the u modifier to the pattern and replace strtolower with mb_strtolower.
The preg_split way:
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(' . implode('|', $tags) . ')\b/iu';
$items = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$itemsLength = count($items);
$i = 1;
while ($i<$itemsLength && count($tags)) {
if (false !== $key = array_search(mb_strtolower($items[$i], 'UTF-8'), $tags)) {
$items[$i] = '<a href="index.php?s=news&tag=' . rawurlencode($tags[$key])
. '">' . $items[$i] . '</a>';
unset($tags[$key]);
}
$i+=2;
}
$result = implode('', $items);
Instead of calling preg_replace multiple times, call it a single time with a regexp that matches any of the tags:
$tags = explode(",", $tags);
$tags_re = '/\b(' . implode('|', $tags) . ')\b/u';
$text = preg_replace($tags_re, '$0', $text, 1);
This turns the list of tags into the regexp /\b(love|love php|facing)\b/u. x|y in a regexp means to match either x or y.
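The order of the alternatives matters when one tag is a prefix of another, because the engine takes the first alternative that matches at a given position. A small illustration using the sample tags from the answers above (the variable names are only for this sketch):

```php
<?php
$text = 'I love PHP';

// Shorter tag first: "love" wins and "love php" is never tried.
preg_match('/\b(?:love|love php)\b/i', $text, $short);

// Longer tag first: the whole phrase is matched.
preg_match('/\b(?:love php|love)\b/i', $text, $long);

echo $short[0], "\n"; // love
echo $long[0], "\n";  // love PHP
```

This is why the tags should be ordered so that 'love php' comes before 'love' when the pattern is built.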

Replace words found in string with highlighted word keeping their case as found

I want to replace words found in string with highlighted word keeping their case as found.
Example
$string1 = 'There are five colors';
$string2 = 'There are Five colors';
//replace five with highlighted five
$word='five';
$string1 = str_ireplace($word, '<span style="background:#ccc;">'.$word.'</span>', $string1);
$string2 = str_ireplace($word, '<span style="background:#ccc;">'.$word.'</span>', $string2);
echo $string1.'<br>';
echo $string2;
Current output:
There are five colors
There are five colors
Expected output:
There are five colors
There are Five colors
How can this be done?
To highlight a single word case-insensitively
Use preg_replace() with the following regex:
/\b($p)\b/i
Explanation:
/ - starting delimiter
\b - match a word boundary
( - start of first capturing group
$p - the escaped search string
) - end of first capturing group
\b - match a word boundary
/ - ending delimiter
i - pattern modifier that makes the search case-insensitive
The replacement pattern can be <span style="background:#ccc;">$1</span>, where $1 is a backreference — it would contain what was matched by the first capturing group (which, in this case, is the actual word that was searched for)
Code:
$p = preg_quote($word, '/'); // The pattern to match
$string = preg_replace(
"/\b($p)\b/i",
'<span style="background:#ccc;">$1</span>',
$string
);
See it in action
To highlight an array of words case-insensitively
$words = array('five', 'colors', /* ... */);
$p = implode('|', array_map(function ($w) { return preg_quote($w, '/'); }, $words)); // also escape the / delimiter
$string = preg_replace(
"/\b($p)\b/i",
'<span style="background:#ccc;">$1</span>',
$string
);
var_dump($string);
See it in action
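To see why the backreference keeps the original case while the str_ireplace() call from the question does not, the two can be run side by side. This is only a sketch; the [ ] wrapper stands in for the span markup to keep the output readable.

```php
<?php
$word = 'five';
$string = 'There are Five colors';

// str_ireplace() inserts the search term itself, so "Five" becomes "five".
$lost = str_ireplace($word, '[' . $word . ']', $string);

// preg_replace() inserts $1, the text that was actually matched.
$p = preg_quote($word, '/');
$kept = preg_replace("/\b($p)\b/i", '[$1]', $string);

echo $lost, "\n"; // There are [five] colors
echo $kept, "\n"; // There are [Five] colors
```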
str_replace - case sensitive
str_ireplace - case insensitive
http://www.php.net/manual/en/function.str-replace.php
http://www.php.net/manual/en/function.str-ireplace.php
Here is the test case.
<?php
class ReplaceTest extends PHPUnit_Framework_TestCase
{
public function testCaseSensitiveReplaceSimple()
{
$strings = array(
'There are five colors',
'There are Five colors',
);
$expected = array(
'There are <span style="background:#ccc;">five</span> colors',
'There are <span style="background:#ccc;">Five</span> colors',
);
$find = array(
'five',
'Five',
);
$replace = array(
'<span style="background:#ccc;">five</span>',
'<span style="background:#ccc;">Five</span>',
);
foreach ($strings as $key => $string) {
$this->assertEquals($expected[$key], str_replace($find, $replace, $string));
}
}
public function testCaseSensitiveReplaceFunction()
{
$strings = array(
'There are five colors',
'There are Five colors',
);
$expected = array(
'There are <span style="background:#ccc;">five</span> colors',
'There are <span style="background:#ccc;">Five</span> colors',
);
foreach ($strings as $key => $string) {
$this->assertEquals($expected[$key], highlightString('five', $string, '<span style="background:#ccc;">$1</span>'));
}
}
}
/**
 * @param array|string $words       words to be highlighted, keeping their case
 * @param string       $string      the text to search
 * @param string       $replacement the wrapper used for highlighting; $1 will be the word
 */
function highlightString($words, $string, $replacement)
{
$replacements = array();
$newwords = array();
$key = 0;
foreach ((array) $words AS $word)
{
$replacements[$key] = str_replace('$1', $word, $replacement);
$newwords[$key] = $word;
$key++;
$newwords[$key] = ucfirst($word);
$replacements[$key] = str_replace('$1', ucfirst($word), $replacement);
$key++;
}
return str_replace($newwords, $replacements, $string);
}
Results
..
Time: 25 ms, Memory: 8.25Mb
OK (2 tests, 4 assertions)

SyntaxHighlighter BBCode PHP

I'm having some problems with the BBCode I created to use with the SyntaxHighlighter
function bb_parse_code($str) {
while (preg_match_all('`\[(code)=?(.*?)\]([\s\S]*)\[/code\]`', $str, $matches)) foreach ($matches[0] as $key => $match) {
list($tag, $param, $innertext) = array($matches[1][$key], $matches[2][$key], $matches[3][$key]);
switch ($tag) {
case 'code': $replacement = '<pre class="brush: '.$param.'">'.str_replace(" ", " ", str_replace(array("<br>", "<br />"), "\n", $innertext))."</pre>"; break;
}
$str = str_replace($match, $replacement, $str);
}
return $str;
}
And I have the bbcode:
[b]bold[/b]
[u]underlined[/u]
[code=js]function (lol) {
alert(lol);
}[/code]
[b]bold2[/b]
[code=php]
<? echo 'lol' ?>
[/code]
Which returns this:
I know the problem is the ([\s\S]*) in the regex, which matches any character, but how do I make the code work with line breaks?
You should use the following pattern:
`\[(code)=?(.*?)\](.*?)\[/code\]`s
A couple of changes:
The switch to .*? to make the quantifier lazy.
The s modifier at the end, which causes . to match new lines too.
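Put together, a minimal sketch of the parser with the lazy pattern and the s modifier could look like the following. It assumes preg_replace_callback() in place of the question's while/foreach loop, and it adds htmlspecialchars() on the brush name, which is not in the original. The handling of <br> tags is kept from the question.

```php
<?php
function bb_parse_code($str) {
    // Lazy (.*?) stops at the first [/code]; the s modifier lets . span newlines.
    $pattern = '`\[code=?(.*?)\](.*?)\[/code\]`s';
    return preg_replace_callback($pattern, function ($m) {
        // Turn HTML line breaks back into real newlines inside the block.
        $code = str_replace(array('<br>', '<br />'), "\n", $m[2]);
        return '<pre class="brush: ' . htmlspecialchars($m[1]) . '">' . $code . '</pre>';
    }, $str);
}

$bb = "[code=js]alert(1);\nalert(2);[/code] [b]x[/b] [code=php]echo 'lol';[/code]";
echo bb_parse_code($bb), "\n";
```

Each [code] block is now converted separately, and the [b] tag between them is left for the other BBCode rules to handle.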

PHP Regex to match a list of words against a string

I have a list of words in an array. I need to look for matches on a string for any of those words.
Example word list
company
executive
files
resource
Example string
Executives are running the company
Here's the function I've written but it's not working
$matches = array();
$pattern = "/^(";
foreach( $word_list as $word )
{
$pattern .= preg_quote( $word ) . '|';
}
$pattern = substr( $pattern, 0, -1 ); // removes last |
$pattern .= ")/";
$num_found = preg_match_all( $pattern, $string, $matches );
echo $num_found;
Output
0
$regex = '(' . implode('|', $words) . ')';
<?php
$words_list = array('company', 'executive', 'files', 'resource');
$string = 'Executives are running the company';
foreach ($words_list as &$word) $word = preg_quote($word, '/');
$num_found = preg_match_all('/('.join('|', $words_list).')/i', $string, $matches);
echo $num_found; // 2
Make sure you add the 'm' flag to make the ^ match the beginning of a line:
$expression = '/foo/m';
Or remove the ^ if you don't mean to match the beginning of a line...
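The effect of the anchor and of the i and m modifiers can be checked directly against the sample string. A short sketch, with the two-line test string made up for illustration:

```php
<?php
$string = 'Executives are running the company';
$words  = '(company|executive|files|resource)';

// Anchored and case-sensitive, as in the question: no match.
$anchored = preg_match_all('/^' . $words . '/', $string, $m);

// Anchored but case-insensitive: only the word at the very start matches.
$anchoredI = preg_match_all('/^' . $words . '/i', $string, $m);

// No anchor, case-insensitive: every occurrence matches.
$free = preg_match_all('/' . $words . '/i', $string, $m);

// With m, ^ also matches right after each newline.
$lines = "files here\ncompany there";
$multi = preg_match_all('/^' . $words . '/m', $lines, $m);

echo $anchored, ' ', $anchoredI, ' ', $free, ' ', $multi, "\n"; // 0 1 2 2
```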
