I have applied this code to my search function, and it works well. However, if I type a keyword without accented letters, it won't highlight the accented word.
Example:
$text = "iphone mới";
If I type the keyword "moi", it won't highlight the word "mới" in $text. I also tried the u flag in the pattern, as suggested by results from Google, but that did not work either. Please help.
Here is my code:
<?php
function highlightWords($text, $words)
{
/*** loop of the array of words ***/
foreach ($words as $word)
{
/*** quote the text for regex ***/
$word = preg_quote($word);
/*** highlight the words ***/
$text = preg_replace("/($word)/ui", '<span class="highlight_word">\1</span>', $text);
}
/*** return the text ***/
return $text;
}
/*** example usage ***/
$text = 'this is my iphone mới';
/*** an array of words to highlight ***/
$words = array('moi');
/*** highlight the words ***/
$highlighttext = highlightWords($text, $words);
?>
<html>
<head>
<title>PHPRO Highlight Search Words</title>
<style type="text/css">
.highlight_word{
background-color: pink;
}
</style>
</head>
<body>
<?php echo $highlighttext; ?>
</body>
</html>
You could use this code:
setlocale(LC_ALL, 'en_US');
$word = iconv("utf-8","ascii//TRANSLIT",$word);
/*** quote the text for regex ***/
This strips the diacritics from $word (the exact result of //TRANSLIT depends on the iconv implementation and the locale).
You can then iterate through each character and replace it with a character class (if the character has alternative forms). For example, o becomes [ớọơờòo].
Note: nhahtdh makes a good point. I don't know what language these diacritics come from, much less whether any characters should also match characters with different diacritics.
If you want to go that way instead, don't flatten $word; add rules for the accented characters themselves, e.g. ơ becomes [ờớơ].
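The character-class idea above could be sketched roughly as follows. The helper name accentInsensitivePattern and the $variants map are made up for this illustration, and the map is deliberately incomplete (it only covers some Vietnamese forms of o and i). Code points are written as \x{...} escapes so the example does not depend on how the source file is normalized; it assumes PHP 7 or later for the "\u{...}" string escape.

```php
<?php
// Hypothetical helper: turns a keyword into a regex fragment in which every
// letter listed in $variants is replaced by a character class of its forms.
function accentInsensitivePattern($word, array $variants)
{
    $pattern = '';
    // Split the keyword into UTF-8 characters.
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // Only ASCII letters are lower-cased for the lookup.
        $key = strlen($char) === 1 ? strtolower($char) : $char;
        $pattern .= isset($variants[$key])
            ? '[' . $variants[$key] . ']'
            : preg_quote($char, '/');
    }
    return $pattern;
}

// Assumed, incomplete map: base letter => body of a PCRE character class.
$variants = array(
    'o' => 'o\x{F2}-\x{F5}\x{1A1}\x{1ECD}-\x{1EE3}', // o with grave/acute/circumflex/tilde, o-horn, Vietnamese forms
    'i' => 'i\x{EC}\x{ED}\x{129}\x{1EC9}\x{1ECB}',   // i with grave/acute/tilde/hook/dot below
);

$text = "this is my iphone m\u{1EDB}i"; // U+1EDB is the accented o in the question's word
$pattern = accentInsensitivePattern('moi', $variants);
$highlighted = preg_replace(
    '/(' . $pattern . ')/ui',
    '<span class="highlight_word">$1</span>',
    $text
);
echo $highlighted, "\n";
```

With this approach the keyword stays unaccented while the text keeps its accents, and the u flag is still required so that the character classes are read as code points rather than bytes.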
I am trying to implement this code for my search function, which highlights matched keywords in the results. It works well, but it won't highlight keywords that contain diacritics, for example:
$text = "iphone mới";
Question 1: If the keyword is "mới", it highlights the word "mới" in $text. But if the keyword is "moi", it doesn't. (By the way, mới means "new" in my language.) How can I adjust this code to make that work?
Question 2: How can I make it highlight part of a word in $text? For example, if the keyword is "iph", it should highlight the "iph" of "iphone" in $text.
Many thanks in advance!
<?php
function highlightWords($text, $words)
{
/*** loop of the array of words ***/
foreach ($words as $word)
{
/*** quote the text for regex ***/
$word = preg_quote($word);
/*** highlight the words ***/
$text = preg_replace("/\b($word)\b/i", '<span class="highlight_word">\1</span>', $text);
}
/*** return the text ***/
return $text;
}
/*** example usage ***/
$text = 'This text will highlight PHP and SQL and sql but not PHPRO or MySQL or sqlite';
/*** an array of words to highlight ***/
$words = array('php', 'sql');
/*** highlight the words ***/
$highlighttext = highlightWords($text, $words);
?>
<html>
<head>
<title>PHPRO Highlight Search Words</title>
<style type="text/css">
.highlight_word{
background-color: pink;
}
</style>
</head>
<body>
<?php echo $highlighttext; ?>
</body>
</html>
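Try this code:
function highlightWords($str, $replaceWord)
{
    // Quote the word so that regex metacharacters in it are matched literally.
    $pattern = '/' . preg_quote($replaceWord, '/') . '/ui';
    // Wrap each match as it appears in the text, so its original case is kept.
    return preg_replace_callback($pattern, function ($match) {
        return '<span class="highlight_word">' . $match[0] . '</span>';
    }, $str);
}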
A PHP variable contains text in mixed languages. An example is below:
$variable = "This is sample text I am storing in the variable. இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன";
So $variable contains both English and another language (Tamil in the example above).
Now I need to wrap the Tamil text in a tag with a class, such as:
$variable = "This is sample text I am storing in the variable. <span class='tamil'>இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன</span>";
How can I skip the English letters and punctuation and wrap the whole sentence or paragraph in the other language in a <span>?
There is a Unicode range for Tamil that you can use to build a regex for finding Tamil characters in your text: http://unicode.org/charts/PDF/U0B80.pdf
[\x{0B80}-\x{0BFA}]+
(In PHP this needs the u modifier.) I have put together a playground for this example so that you can adapt it to what you need:
http://regex101.com/r/wT8hP4
The following is not gold-plated code, but I hope it gets you started.
<?php
$variable="This is sample text I am storing in the variable. இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன";
echo add_tamil_class($variable);
/**
* Adds a HTML Span tag around tamil text using regex
*/
function add_tamil_class($text) {
preg_match_all("/[\x{0B80}-\x{0BFA}]+/u", $text, $matches);
$tamilSentence = implode(' ', $matches[0]);
return str_replace(
$tamilSentence,
"<span class='tamil'>".$tamilSentence."</span>",
$text
);
}
As Filype mentioned, we can use the unicode ranges for this.
This should match even in cases like 'English' -> 'Tamil' -> 'English' -> 'Tamil'. Though it'll wrap extra spaces into the span.
/**
 * @param string $str Input UTF-8 encoded string.
*/
function encapsulate_tamil($str)
{
return preg_replace('/[\x{0B80}-\x{0BFF}][\x{0B80}-\x{0BFF}\s]*/u',
'<span class=\'tamil\'>$0</span>', $str);
}
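If the extra whitespace inside the span is a problem, one possible variation is to allow whitespace only between two runs of Tamil characters. This is a sketch under the same assumption as above (UTF-8 input, Tamil block U+0B80 to U+0BFF); the function name encapsulate_tamil_trimmed is made up for the illustration, and punctuation between Tamil words will still end a span.

```php
<?php
// Hypothetical variant of encapsulate_tamil(): whitespace may appear between
// two runs of Tamil characters, but never at the end of the match.
function encapsulate_tamil_trimmed($str)
{
    $tamil = '[\x{0B80}-\x{0BFF}]+';
    return preg_replace(
        '/' . $tamil . '(?:\s+' . $tamil . ')*/u',
        "<span class='tamil'>$0</span>",
        $str
    );
}

// Two Tamil words between English text (written as escapes, PHP 7+).
echo encapsulate_tamil_trimmed("Hi \u{0B87}\u{0BA4} \u{0B95} there"), "\n";
```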
I have a PHP highlighting function that makes certain words bold.
The function is below. It works great, except when the array $words contains the single-letter value b.
For example, someone searches for: jessie j price tag feat b o b
This results in the following entries in the array $words: jessie,j,price,tag,feat,b,o,b
When a 'b' shows up, the whole function goes wrong and outputs a bunch of broken HTML tags. Of course I could strip any 'b' values from the array, but that isn't ideal, because the highlighting then doesn't work as it should for certain queries.
This sample script:
function highlightWords2($text, $words)
{
$text = ($text);
foreach ($words as $word)
{
$word = preg_quote($word);
$text = preg_replace("/\b($word)\b/i", '<b>$1</b>', $text);
}
return $text;
}
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
echo highlightWords2($string, $words);
Will output:
<<<b>b</b>><b>b</b></<b>b</b>>>jessie</<<b>b</b>><b>b</b></<b>b</b>>> j price <<<b>b</b>><b>b</b></<b>b</b>>>tag</<<b>b</b>><b>b</b></<b>b</b>>> feat <<b>b</b>><b>b</b></<b>b</b>> <<b>b</b>>o</<b>b</b>> <<b>b</b>><b>b</b></<b>b</b>>
And this only happens because there are "b"'s in the array.
Can you guys see anything that I could change to make it work properly?
Your problem is that when your function goes through and looks for all the b's to make bold, it also sees the bold tags inserted earlier and tries to bold those as well.
@symcbean was close but forgot one thing.
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
print hl($string, $words);
function hl($inp, $words)
{
$replace=array_flip(array_flip($words)); // remove duplicates
$pattern=array();
foreach ($replace as $k=>$fword) {
$pattern[]='/\b(' . $fword . ')(?!>)\b/i';
$replace[$k]='<b>$1</b>';
}
return preg_replace($pattern, $replace, $inp);
}
Notice the added "(?!>)". It is a negative lookahead assertion: it only allows a match if the word is not followed by ">", which is what follows the "b" in both the opening and the closing bold tag. I only check for ">" after the word because that excludes both tags; checking for "<" before the word would not catch the closing tag, where the "b" is preceded by "/". The above code works as expected for this example.
Your basic problem is that you replace plain-text strings inside HTML indiscriminately. For short search strings this breaks, because you replace text inside tags and attributes as well.
Instead, you need to apply your search and replace only to the text between the HTML tags. Additionally, you don't want to highlight inside another highlight.
Regular expressions are quite limited for this kind of task. Use an HTML parser instead; in PHP that is, for example, DOMDocument. With an HTML parser you can search only inside the text nodes (and not in tags, attributes, or comments).
You can find a highlighter for text in a previous answer of mine, with a detailed description of how it works. The question is "Ignore html tags in preg_replace", and it is quite similar to yours, so this snippet is probably helpful. It uses <span> instead of <b> tags; note that TextRange is not a built-in PHP class but a helper that has to be defined separately:
$doc = new DOMDocument;
$doc->loadXML($str);
$xp = new DOMXPath($doc);
$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor)
{
throw new Exception('Anchor element not found.');
}
// search elements that contain the search-text
$r = $xp->query('//*[contains(., "'.$search.'")]/*[FALSE = contains(., "'.$search.'")]/..', $anchor);
if (!$r)
{
throw new Exception('XPath failed.');
}
// process search results
foreach($r as $i => $node)
{
$textNodes = $xp->query('.//child::text()', $node);
// extract $search textnode ranges, create fitting nodes if necessary
$range = new TextRange($textNodes);
$ranges = array();
while(FALSE !== $start = strpos($range, $search))
{
$base = $range->split($start);
$range = $base->split(strlen($search));
$ranges[] = $base;
};
// wrap each matching text node
foreach($ranges as $range)
{
foreach($range->getNodes() as $node)
{
$span = $doc->createElement('span');
$span->setAttribute('class', 'search_hightlight');
$node = $node->parentNode->replaceChild($span, $node);
$span->appendChild($node);
}
}
}
If you adapt it for multiple search terms, I would add an additional class with a number that depends on the search term, so you can style each term in a different color with CSS.
Additionally, you should remove duplicate search terms and make the XPath expression skip text that is already inside an element carrying the highlight span.
If it were me, I'd have used JavaScript.
But using PHP: since the problem only seems to be the duplicate entries in the search, just remove them. You can also call preg_replace once rather than once per word:
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
print hl($string, $words);
function hl($inp, $words)
{
$replace=array_flip(array_flip($words)); // remove duplicates
$pattern=array();
foreach ($replace as $k=>$fword) {
$pattern[]='/\b(' . $fword . ')\b/i';
$replace[$k]='<b>$1</b>';
}
return preg_replace($pattern, $replace, $inp);
}
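Note that passing arrays to preg_replace still applies the patterns one after another, so tags inserted for one word can be matched by a later word. A possible alternative, sketched here under the assumption that whole-word, case-insensitive matching is wanted, is to combine all words into a single alternation so that the text is scanned only once. The name highlightOnce is made up for this illustration.

```php
<?php
// Hypothetical helper: one pattern, one pass, so inserted <b> tags are never rescanned.
function highlightOnce($text, array $words)
{
    $words = array_unique($words);           // remove duplicates
    if (count($words) === 0) {
        return $text;                        // nothing to highlight
    }
    // Quote each word so regex metacharacters are matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

$string = 'jessie j price tag feat b o b';
$words = array('jessie', 'tag', 'b', 'o', 'b');
echo highlightOnce($string, $words), "\n";
// prints: <b>jessie</b> j price <b>tag</b> feat <b>b</b> <b>o</b> <b>b</b>
```

This only avoids clashes with tags added by the highlighter itself; if the input already contains HTML, the tags and attributes in it can still be matched.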
I have written a regex that searches for a particular keyword, and I replace that keyword with a link to a particular URL.
My current regex is: \b$keyword\b
One problem with this is that if my data already contains anchor tags, and such a tag contains the keyword, the regex replaces the keyword inside the anchor tag as well.
I want to search the given data while excluding anchor tags. Please help me out; I appreciate your help.
eg. Keyword: Disney
I/p:
This is Disney The disney should be replaceable
Expected O/p:
This is Disney The disney should be replaceable
Invalid o/p:
This is <a href="any-url.php">Disney </a> The disney should be replaceable
I've modified my function that highlights searched phrase on a page, here you go:
$html = 'This is Disney The disney should be replaceable.'.PHP_EOL;
$html .= 'Let\'s test also use of keyword inside other tags, for example as class name:'.PHP_EOL;
$html .= '<b class=disney></b> - this should not be replaced with link, and it isn\'t!'.PHP_EOL;
$result = ReplaceKeywordWithLink($html, "disney", "any-url.php");
echo nl2br(htmlspecialchars($result));
function ReplaceKeywordWithLink($html, $keyword, $link)
{
if (strpos($html, "<") !== false) {
$id = 0;
$unique_array = array();
// Hide existing anchor tags with some unique string.
preg_match_all("#<a[^<>]*>[\s\S]*?</a>#i", $html, $matches);
foreach ($matches[0] as $tag) {
$id++;
$unique_string = "#####$id#####";
$unique_array[$unique_string] = $tag;
$html = str_replace($tag, $unique_string, $html);
}
// Hide all tags by replacing with some unique string.
preg_match_all("#<[^<>]+>#", $html, $matches);
foreach ($matches[0] as $tag) {
$id++;
$unique_string = "#####$id#####";
$unique_array[$unique_string] = $tag;
$html = str_replace($tag, $unique_string, $html);
}
}
// Then we replace the keyword with link.
$keyword = preg_quote($keyword);
assert(strpos($keyword, '$') === false);
// Wrap the matched keyword in an anchor that points at $link.
$html = preg_replace('#(\b)('.$keyword.')(\b)#i', '$1<a href="'.$link.'">$2</a>$3', $html);
// We get back all the tags by replacing unique strings with their corresponding tag.
if (isset($unique_array)) {
foreach ($unique_array as $unique_string => $tag) {
$html = str_replace($unique_string, $tag, $html);
}
}
return $html;
}
Result:
This is Disney The disney should be replaceable.
Let's test also use of keyword inside other tags, for example as class name:
<b class=disney></b> - this should not be replaced with link, and it isn't!
Add this to the end of your regex:
(?=[^<]*(?:<(?!/?a\b)[^<]*)*(?:<a\b|\z))
This lookahead tries to match either the next opening <a> tag or the end of the input, but only if it doesn't see a closing </a> tag first. Assuming the HTML is minimally well formed, the lookahead will fail whenever the match starts after the beginning of an <a> tag and before the corresponding </a> tag.
To prevent it from matching inside any other tag (e.g. <div class="disney">), you can add this lookahead as well:
(?![^<>]*+>)
With this one I'm assuming there won't be any angle brackets in the attribute values of the tags. Such brackets are legal according to the HTML 4 spec, but extremely rare in the real world.
If you're writing the regex as a PHP double-quoted string (which you must be, if you expect the $keyword variable to be interpolated), it is safest to double the backslashes. PHP does not treat \b or \z as string escape sequences, so they reach the regex engine unchanged either way, but sequences such as \n, \t, or \x would be interpreted by PHP first.
EDIT: On second thought, definitely do add the second lookahead. I mean, why would you not want to prevent matches inside tags? And place it first, because it will tend to evaluate more quickly than the other:
(?![^<>]*+>)(?=[^<]*(?:<(?!/?a\b)[^<]*)*(?:<a\b|\z))
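As a rough usage sketch, the combined lookahead can be appended to the keyword pattern like this. The sample HTML, the link target, and the choice of ~ as delimiter are assumptions made for the illustration; the lookahead is kept in a single-quoted string so that PHP passes the backslashes through unchanged.

```php
<?php
$keyword = 'Disney';

// The two lookaheads from above: not inside a tag, and not inside an <a> element.
$guard = '(?![^<>]*+>)(?=[^<]*(?:<(?!/?a\b)[^<]*)*(?:<a\b|\z))';

// "~" is used as the delimiter because the lookahead itself contains "/".
$pattern = '~\b' . preg_quote($keyword, '~') . '\b' . $guard . '~i';

$html = 'This is <a href="any-url.php">Disney</a>. '
      . 'The disney should be replaceable, <b class="disney">here</b>.';

$result = preg_replace($pattern, '<a href="any-url.php">$0</a>', $html);
echo $result, "\n";
```

Only the free-standing "disney" gets wrapped; the one already inside the anchor and the one in the class attribute are left alone.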
Strip the tags first, then search the stripped text.
I am using this class to highlight the search keywords on a piece of text:
class highlight
{
public $output_text;
function __construct($text, $words)
{
$split_words = explode( " " , $words );
foreach ($split_words as $word)
{
$text = preg_replace("|($word)|Ui" ,
"<font style=\"background-color:yellow;\"><b>$1</b></font>" , $text );
}
$this->output_text = $text;
}
}
If
$text = "Khalil, M., Paas, F., Johnson, T.E., Su, Y.K., and Payer, A.F. (2008.) Effects of Instructional Strategies Using Cross Sections on the Recognition of Anatomical Structures in Correlated CT and MR Images. <i>Anatomical Sciences Education, 1(2)</i>, 75-83 "
which already contains HTML tags, and some of my search keywords are
$words = "Effects color"
The first loop iteration highlights the word Effects with <font style="background-color:yellow;"><b>Effects</b></font>, but the second iteration then highlights the word color inside that HTML tag. What should I do?
Is it possible to tell preg_replace to only highlight text that is not inside angle brackets?
Use a HTML parser to make sure that you only search through text.
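A minimal sketch of that idea with PHP's DOMDocument might look like the following. The function name highlightTextNodes and the class name highlight_word are chosen for the illustration. It assumes the DOM extension is available and that the input is an HTML fragment; for non-ASCII input, DOMDocument::loadHTML additionally needs an encoding hint, which is left out here.

```php
<?php
// Hypothetical helper: wraps every occurrence of $search that is found in a
// text node in <span class="$class">, leaving tags and attributes untouched.
function highlightTextNodes($html, $search, $class = 'highlight_word')
{
    if ($search === '') {
        return $html;
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    // Wrap the fragment in a <div> so that there is a single known root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Take a snapshot of the text nodes first, because the loop changes the tree.
    $textNodes = iterator_to_array($xpath->query('//text()'));
    $pattern = '/(' . preg_quote($search, '/') . ')/i';

    foreach ($textNodes as $node) {
        // Odd indexes of $parts hold the matches, even indexes the text in between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue;                   // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', $class);
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize the children of the wrapper <div> back into a fragment.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightTextNodes('<font style="background-color:yellow;"><b>Effects</b></font> of color', 'color'), "\n";
```

Because only text nodes are searched, the word color in the style attribute is never touched, no matter how many keywords are processed one after another.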
You could use a CSS highlighted class instead and then use span tags, eg.
<span class="highlighted">word</span>
Then define your highlighted class in CSS. You could then exclude the word 'highlighted' from being valid in your search. Of course renaming the class to something obscure would help.
This also has the benefit of allowing you to change the highlight colour easily in the future, or indeed allowing the user to toggle it on and off by modifying the CSS.
Why use a loop?
function __construct($text, $words)
{
    // Quote regex metacharacters, then turn the whitespace between the words into "|".
    $split_words = preg_replace('/\s+/', '|', preg_quote(trim($words), '/'));
    $this->output_text = preg_replace("/($split_words)/i",
        "<font style=\"background-color:yellow; font-weight:bold;\">$1</font>", $text);
}
A possible workaround is to first wrap the matches in placeholder characters that are very unlikely to appear in a search input, and to replace those characters with the HTML tags after the foreach loop:
class highlight
{
public $output_text;
function __construct($text, $words)
{
$split_words = explode(" ", $words);
foreach ($split_words as $word)
{
$text = preg_replace("|($word)|Ui", "%$1~", $text);
}
$text = str_replace("~", "</b></span>", str_replace("%", "<span style='background-color:yellow;'><b>", $text));
$this->output_text = $text;
}
}