PHP Search Through Files for Keyword (case insensitive)?

So I have the following function:
function findMatches($pathToDirectory, $keyword){
$results = array();
$htmlString = "";
$fileList = glob($pathToDirectory);
natsort($fileList);
foreach ($fileList as $search) {
$contents = file_get_contents($search);
$episodeTitle = fgets(fopen($search, 'r'));
$episodeTitle = "<p class='episode_title'>$episodeTitle</p>";
$sentences = preg_split('/(?<=[.])\s+(?=[a-z])/i', $contents);
foreach ($sentences as $sentence) {
if (strpos($sentence, $keyword)) {
if (!in_array($episodeTitle, $results)) {
array_push($results, $episodeTitle);
}
array_push($results, $sentence);
}
}
}
foreach ($results as $result){
$highlightedKeyword = '<span class="keyword_highlight">' . $keyword . '</span>';
$newResult = str_replace($keyword, $highlightedKeyword, $result);
$htmlString .= '<p class="search_result">' . $newResult . '</p>';
}
$totalResults = 'Total Results: <span class=\'number_result\'>' . count($results) . '</span>';
return $htmlString = $totalResults . $htmlString;
}
It opens every text file in a directory ($fileList), takes its contents, splits them into sentences ($sentences), and saves the sentences that contain a user-defined keyword into an array ($results). Then it iterates through $results to wrap the keyword in HTML (so that the word appears highlighted within the sentence), and finally it wraps each sentence in HTML and returns the result for presentation to the user.
However, currently the function is case sensitive. What's a good way to make it case insensitive? I tried using stripos() instead of strpos() in the foreach ($sentences as $sentence) loop, and that made the search itself case insensitive (like I want), but the problem is I couldn't figure out how to highlight both upper and lowercase versions of the word correctly if I wrote the function this way.
Also, please let me know if you need clarification on any of this; I'm not sure I explained it very well.

You need to use stripos() and also str_ireplace() when you're doing your highlighting.
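Two details are worth keeping in mind here. stripos() returns 0 for a match at the very start of a sentence, so the check should be stripos($sentence, $keyword) !== false. And str_ireplace() inserts the replacement text verbatim, so every match takes on the casing of $keyword. The following is a minimal sketch, assuming the casing found in the text should be kept; it swaps in preg_replace() with the i modifier, and the function name highlightKeyword is hypothetical:

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $keyword
// in a highlight span while keeping the casing found in the text.
function highlightKeyword(string $text, string $keyword): string
{
    if ($keyword === '') {
        return $text; // an empty pattern would match at every position
    }
    // preg_quote() escapes regex metacharacters in the user-supplied keyword.
    $pattern = '/' . preg_quote($keyword, '/') . '/i';
    // $0 is the text that actually matched, so "Keyword" stays "Keyword".
    return preg_replace($pattern, '<span class="keyword_highlight">$0</span>', $text);
}

echo highlightKeyword('Keyword first, then keyword again.', 'keyword');
```

In the original function this would replace the str_replace() call inside the second foreach loop.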

Related

Automatically add links to strings of keywords

In my article, I want to automatically add links to keywords.
My keywords array:
$keywords = [
0=>['id'=>1,'slug'=>'getName','url'=>'https://example.com/1'],
1=>['id'=>2,'slug'=>'testName','url'=>'https://example.com/2'],
2=>['id'=>3,'slug'=>'ign','url'=>'https://example.com/3'],
];
This is my code:
private function keywords_replace(string $string, array $key_array)
{
$array_first = $key_array;
$array_last = [];
foreach ($array_first as $key=>$value)
{
$array_last[$key] = [$key, $value['slug'], '<a target="_blank" href="' . $value['url'] . '" title="' . $value['slug'] . '">' . $value['slug'] . '</a>'];
}
$count = count($array_last);
for ($i=0; $i<$count;$i++)
{
for ($j=$count-1;$j>$i;$j--)
{
if (strlen($array_last[$j][1]) > strlen($array_last[$j-1][1]))
{
$tmp = $array_last[$j];
$array_last[$j] = $array_last[$j-1];
$array_last[$j-1] = $tmp;
}
}
}
$keys = $array_last;
foreach ($keys as $key)
{
$string = str_ireplace($key[1],$key[0],$string);
}
foreach ($keys as $key)
{
$string = str_ireplace($key[0],$key[2],$string);
}
return $string;
}
result:
$str = "<p>Just a test: getName testName";
echo $this->keywords_replace($str,$keywords);
like this: Just a test: getName testName
Very important: if the string has no spaces, it will not match. Because I will use other languages, sentences will not have spaces as English does. It should work like WordPress keyword auto-linking.
I think my code is not perfect. Is there a better algorithm to implement this function? Thanks!
You can use array_reduce and preg_replace to replace all occurrences of the slug words in your string with the corresponding url values:
$keywords = [
0=>['id'=>1,'slug'=>'getName','url'=>'https://www.getname.com'],
1=>['id'=>2,'slug'=>'testName','url'=>'https://www.testname.com'],
2=>['id'=>3,'slug'=>'ign','url'=>'https://www.ign.com'],
];
$str = "<p>Just a test: getName testName";
echo array_reduce($keywords, function ($c, $v) { return preg_replace('/\\b(' . $v['slug'] . ')\\b/', $v['url'], $c); }, $str);
Output:
<p>Just a test: https://www.getname.com https://www.testname.com
Update
To change the text into links, you need to use this:
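echo array_reduce($keywords,
function ($c, $v) {
return preg_replace('/\\b(' . $v['slug'] . ')\\b/',
'<a href="' . $v['url'] . '">$1</a>', $c);
},
$str);
Output:
<p>Just a test: <a href="https://www.getname.com">getName</a> <a href="https://www.testname.com">testName</a>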
Update 2
Because some of the links that are being substituted include words that are also values of slug, it's necessary to do all the replacements at once using the array format of strtr. We build an array of patterns and replacements using array_column, array_combine and array_map, then pass that to strtr:
$reps = array_combine(array_column($keywords, 'slug'),
array_map(function ($k) { return '<a href="' . $k['url'] . '">' . $k['slug'] . '</a>'; }, $keywords
));
$newstr = strtr($str, $reps);
First, restructure the array into key/value pairs with a loop, storing the result in $newKeywords. Then use preg_replace_callback() to select every word in the string and check whether it exists as a key of that array. If it does, wrap it in an anchor tag.
$newKeywords = [];
foreach ($keywords as $keyword)
$newKeywords[$keyword['slug']] = $keyword['url'];
$newStr = preg_replace_callback("/(\w+)/", function($m) use($newKeywords){
return isset($newKeywords[$m[0]]) ? "<a href='{$newKeywords[$m[0]]}'>{$m[0]}</a>" : $m[0];
}, $str);
Output:
<p>Just a test: <a href='https://www.getname.com'>getName</a> <a href='https://www.testname.com'>testName</a></p>
My answer uses preg_replace as does Nick's above.
It relies on the patterns and replacements being equally sized arrays, with corresponding patterns and replacements.
Word boundaries need to be respected, which I doubt you can do with a simple string replacement.
<?php
$keywords = [
0=>['id'=>1,'slug'=>'foo','url'=>'https://www.example.com/foo'],
1=>['id'=>2,'slug'=>'bar','url'=>'https://www.example.com/bar'],
2=>['id'=>3,'slug'=>'baz','url'=>'https://www.example.com/baz'],
];
foreach ($keywords as $item)
{
$patterns[] = '#\b(' . $item['slug'] . ')\b#i';
$replacements[] = '<a href="' . $item['url'] . '">$1</a>';
}
$html = "<p>I once knew a barbed man named <i>Foo</i>, he often visited the bar.</p>";
print preg_replace($patterns, $replacements, $html);
Output:
<p>I once knew a barbed man named <i><a href="https://www.example.com/foo">Foo</a></i>, he often visited the <a href="https://www.example.com/bar">bar</a>.</p>
This is my answer, thanks to @Nick:
$content = array_reduce($keywords , function ($c, $v) {
return preg_replace('/(>[^<>]*?)(' . $v['slug'] . ')([^<>]*?<)/', '$1<a href="' . $v['url'] . '">$2</a>$3', $c);
}, $str);
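The answers above can be combined into a single pass. The following is a minimal sketch, assuming one alternation pattern built from all slugs (longest first) and no word boundaries, so that it also matches in text without spaces; the function name autoLinkKeywords is hypothetical. Because the text is scanned only once, inserted markup is never matched again. It does not skip text inside existing tags or attributes, and without word boundaries a short slug such as ign also matches inside longer words.

```php
<?php
// Hypothetical helper: links every keyword in one pass so that inserted
// markup is never scanned again.
function autoLinkKeywords(string $text, array $keywords): string
{
    // Map lowercase slug => URL for case-insensitive lookup.
    $urls = [];
    foreach ($keywords as $keyword) {
        $urls[strtolower($keyword['slug'])] = $keyword['url'];
    }
    if ($urls === []) {
        return $text; // nothing to link
    }
    // Longest slugs first, so a long slug wins over a shorter one it contains.
    $slugs = array_map('strval', array_keys($urls));
    usort($slugs, function ($a, $b) { return strlen($b) - strlen($a); });
    $quoted = array_map(function ($s) { return preg_quote($s, '/'); }, $slugs);
    $pattern = '/' . implode('|', $quoted) . '/i';

    return preg_replace_callback($pattern, function ($m) use ($urls) {
        // $m[0] keeps the casing from the text; the lookup key is lowercase.
        $url = htmlspecialchars($urls[strtolower($m[0])], ENT_QUOTES);
        return '<a target="_blank" href="' . $url . '">' . $m[0] . '</a>';
    }, $text);
}

$keywords = [
    ['id' => 1, 'slug' => 'getName', 'url' => 'https://example.com/1'],
    ['id' => 2, 'slug' => 'testName', 'url' => 'https://example.com/2'],
];
echo autoLinkKeywords('<p>Just a test: getName testName</p>', $keywords);
```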

Finding all matches of string, and also returning line number of match

I have a string variable which contains some text (shown below). The text has line breaks in it as shown. I would like to search the text for a given string, and return the number of matches per line number. For instance, searching for "keyword" would return 1 match on line 3 and 2 matches on line 5.
I have tried using strstr(). It does a good job finding the first match, and giving me the remaining text, so I can do it again and again until there are no matches. Problem is I do not know how to determine which line number the match occurred on.
Hello,
This is some text.
And a keyword.
Some more text.
Another keyword! And another keyword.
Goodby.
Why not split the text on line-feeds and loop, use the index + 1 as a line number:
$txtParts = explode("\n",$txt);
for ($i=0, $length = count($txtParts);$i<$length;$i++)
{
$tmp = strstr($txtParts[$i],'keyword');
if ($tmp)
{
echo 'Line '.($i +1).': '.$tmp;
}
}
Tested, and working. Just a quick tip: since you're looking for matches in text (sentences, upper- and lowercase, etc.), perhaps stristr (case-insensitive) would be better? An example with foreach and stristr:
$txtParts = explode("\n",$txt);
foreach ($txtParts as $number => $line)
{
$tmp = stristr($line,'keyword');
if ($tmp)
{
echo 'Line '.($number + 1).': '.$tmp;
}
}
With this code you can have all data in one array (Linenumber and position numbers)
<?php
$string = "Hello,
This is some text.
And a keyword.
Some more text.
Another keyword! And another keyword.
Goodby.";
$expl = explode("\n", $string);
$linenumber = 1; // first linenumber
$allpos = array();
foreach ($expl as $str) {
$i = 0;
$toFind = "keyword";
$start = 0;
while (($pos = strpos($str, $toFind, $start)) !== false) { // !== false so a match at position 0 is not skipped
//echo $toFind. " " . $pos;
$start = $pos+1;
$allpos[$linenumber][$i] = $pos;
$i++;
}
$linenumber++; // linenumber goes one up
}
foreach ($allpos as $linenumber => $position) {
echo "Linenumber: " . $linenumber . "<br/>";
foreach ($position as $pos) {
echo "On position: " .$pos . "<br/>";
}
echo "<br/>";
}
Angelo's answer definitely provides more functionality and is probably the best answer, but the following is simple and seems to work. I will continue to play with all solutions.
function findMatches($text,$phrase)
{
$list=array();
$lines=explode("\n", $text);
foreach($lines AS $line_number=>$line)
{
str_replace($phrase,$phrase,$line,$count);
if($count)
{
$list[]='Found '.$count.' match(s) on line '.($line_number+1);
}
}
return $list;
}
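The counting step can also be written with substr_count(), which counts non-overlapping occurrences directly. The following is a minimal sketch, assuming a case-sensitive search and a result keyed by line number; the function name countMatchesPerLine is hypothetical:

```php
<?php
// Hypothetical helper: returns [line number => number of matches] for the
// lines that contain $phrase at least once.
function countMatchesPerLine(string $text, string $phrase): array
{
    $counts = [];
    if ($phrase === '') {
        return $counts; // substr_count() rejects an empty needle
    }
    // Split on \r\n, \r or \n so Windows line endings are handled too.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $index => $line) {
        $count = substr_count($line, $phrase);
        if ($count > 0) {
            $counts[$index + 1] = $count; // line numbers start at 1
        }
    }
    return $counts;
}

$text = "Hello,\nThis is some text.\nAnd a keyword.\nSome more text.\nAnother keyword! And another keyword.\nGoodby.";
print_r(countMatchesPerLine($text, 'keyword')); // 1 match on line 3, 2 matches on line 5
```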

Checking if an index in a multidimensional array is an array

The code:
$row['text'] = 'http://t.co/iBSiZZD4 and http://t.co/1rG3oNmc and http://t.co/HGFjwqHI and http://t.co/8UldEAVt';
if(preg_match_all('|http:\/\/t.co\/.{1,8}|i',$row['text'],$matches)){
foreach($matches[0] as $value){
$headers = get_headers($value,1);
if(is_array($headers['Location'])){
$headers['Location'] = $headers['Location'][0];
}
$row['text'] = preg_replace('|http:\/\/t.co\/.{1,8}|i', '<a href="' . $headers['Location'] . '">' . $headers['Location'] . '</a>',$row['text']);
}
}
This is related to get_headers(). Sometimes get_headers($url,1) returns an array with a location index key like so: [Location]=>Array([0]=>url1 [1]=>url2). I basically want to make [Location] equal to [Location][0] if [Location][0] exists. However, the above code doesn't seem to accomplish that task. I've also tried array_key_exists() and isset() but neither solved the problem. Thoughts?
Don't try to replace on the fly. First get all the values, and then do the replace in one batch (using two arrays as the $search and $replace parameters).
<?php
$replace = array();
$row = array('text' => 'http://t.co/iBSiZZD4 and http://t.co/1rG3oNmc and http://t.co/HGFjwqHI and http://t.co/8UldEAVt');
if (preg_match_all('|http:\/\/t.co\/.{1,8}|i', $row['text'], $search)) {
foreach ($search[0] as $value) {
$headers = get_headers($value, 1);
if (is_array($headers['Location'])) {
$headers['Location'] = $headers['Location'][0];
}
$replace[] = "<a href='{$headers["Location"]}'>{$headers["Location"]}</a>";
}
$row['text'] = str_replace($search[0], $replace, $row['text']);
echo $row["text"];
}
P.S. - Next time, please tell us the context of your problem, tell us you "are making a service that resolves shortened URLs", don't let me figure that out from your code alone.

PHP Highlight Search Keywords Using preg_replace With an Array

I'm using this function from here, which is:
// highlight search keywords
function highlight($title, $search) {
preg_match_all('~\w+~', $search, $m);
if(!$m)
return $title;
$re = '~\\b(' . implode('|', $m[0]) . ')\\b~i';
return preg_replace($re, '<span style="background-color: #ffffcc;">$0</span>', $title);
}
Which works great, but only for titles. I want to be able to pass an array that contains $title and $description.
I was trying something like this:
$replacements = array($title, $description);
// highlight search keywords
function highlight($replacements, $search) {
preg_match_all('~\w+~', $search, $m);
if(!$m)
return $replacements;
$re = '~\\b(' . implode('|', $m[0]) . ')\\b~i';
return preg_replace($re, '<span style="background-color: #ffffcc;">$0</span>', $replacements);
}
It isn't working. It's passing an array as the title, and not highlighting the description (although it is actually returning a description). Any idea how to get this working?
I would personally leave the original function as only operating on one parameter rather than an array. It would make your calling code nice and clear;
$titleHighlighted = highlight($title, $searchKeywords);
$descriptionHighlighted = highlight($description, $searchKeywords);
However, I would rewrite your function to use str_ireplace rather than preg_replace;
function highlight($contentBlock, array $keywords) {
$highlightedContentBlock = $contentBlock;
foreach ($keywords as $singleKeyword) {
$highlightedKeyword = '<span class = "keyword">' . $singleKeyword . '</span>';
$highlightedContentBlock = str_ireplace($singleKeyword, $highlightedKeyword, $highlightedContentBlock);
}
return $highlightedContentBlock;
}
This rewritten function should be simpler to read and does not have the overhead of compiling regular expressions. You can call it as many times as you like for any content block (title, description, etc.):
$title = "The quick brown fox jumps over ... ";
$searchKeywords = array("quick", "fox");
$titleHighlighted = highlight($title, $searchKeywords);
echo $titleHighlighted; // The <span class = "keyword">quick</span> brown ...
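If passing an array is still preferred, preg_replace() accepts an array as its subject and then returns an array, so the caller has to read the result as an array rather than as a single title. The following is a minimal sketch, assuming the regex-based approach from the question is kept so that the casing in the text is preserved; the function name highlightAll is hypothetical:

```php
<?php
// Hypothetical helper: highlights the words of $search in every string of
// $texts and returns the highlighted strings as an array.
function highlightAll(array $texts, string $search): array
{
    preg_match_all('~\w+~', $search, $m);
    if (empty($m[0])) {
        return $texts; // no searchable words
    }
    // \w+ matches contain only word characters, so they need no escaping.
    $re = '~\b(' . implode('|', $m[0]) . ')\b~i';
    // With an array subject, preg_replace() returns an array.
    return preg_replace($re, '<span style="background-color: #ffffcc;">$0</span>', $texts);
}

list($title, $description) = highlightAll(
    array('The quick brown fox', 'A Fox jumps over the lazy dog'),
    'quick fox'
);
echo $title, "\n", $description, "\n";
```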
Have you tried replacing $m[0] with $m[0][0]?

How to wrap user mentions in a HTML link on PHP?

I'm working on a commenting web application and I want to parse user mentions (#user) as links. Here is what I have so far:
$text = "#user is not #user1 but #user3 is #user4";
$pattern = "/\#(\w+)/";
preg_match_all($pattern,$text,$matches);
if($matches){
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','",$matches[1]). "')
ORDER BY LENGTH(username) DESC";
$users = $this->getQuery($sql);
foreach($users as $i=>$u){
$text = str_replace("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
echo $text;
}
The problem is that user links are being overlapped:
<a rel="11327" class="ct-userLink" href="#">
<a rel="21327" class="ct-userLink" href="#">#user</a>1
</a>
How can I avoid links overlapping?
Answer Update
Thanks to the accepted answer, this is what my new foreach loop looks like:
foreach($users as $i=>$u){
$text = preg_replace("/#".$u['username']."\b/",
"<a href='#' title='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
The problem seems to be that some usernames contain other usernames. First you replace user1 properly with <a>user1</a>. Then user matches inside that link, and the result is <a><a>user</a>1</a>. My suggestion is to change your string replace to a regex with a word boundary, \b, that is required after the username.
The Twitter widget has JavaScript code to do this. I ported it to PHP in my WordPress plugin. Here's the relevant part:
function format_tweet($tweet) {
// add #reply links
$tweet_text = preg_replace("/\B[#@]([a-zA-Z0-9_]{1,20})/",
"#<a class='atreply' href='http://twitter.com/$1'>$1</a>",
$tweet);
// make other links clickable
$matches = array();
$link_info = preg_match_all("/\b(((https*\:\/\/)|www\.)[^\"\']+?)(([!?,.\)]+)?(\s|$))/",
$tweet_text, $matches, PREG_SET_ORDER);
if ($link_info) {
foreach ($matches as $match) {
$http = preg_match("/w/", $match[2]) ? 'http://' : '';
$tweet_text = str_replace($match[0],
"<a href='" . $http . $match[1] . "'>" . $match[1] . "</a>" . $match[4],
$tweet_text);
}
}
return $tweet_text;
}
Instead of parsing for '#user', parse for '#user ' (with a trailing space) or ' #user ' to also avoid wrongly parsing email addresses (e.g. mailaddress#user.com). Maybe ' #user: ' should be allowed as well. This only works if usernames contain no whitespace.
You can go for a custom string replace function that stops after the first replacement. Something like:
function str_replace_once($needle , $replace , $haystack){
$pos = strpos($haystack, $needle);
if ($pos === false) {
// Nothing found
return $haystack;
}
return substr_replace($haystack, $replace, $pos, strlen($needle));
}
And use it like:
foreach($users as $i=>$u){
$text = str_replace_once("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
You shouldn’t replace one certain user mention at a time but all at once. You could use preg_split to do that:
// split text at mention while retaining user name
$parts = preg_split("/#(\w+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$n = count($parts);
// $n is always an odd number; 1 means no match found
if ($n > 1) {
// collect user names
$users = array();
for ($i=1; $i<$n; $i+=2) {
$users[$parts[$i]] = '';
}
// get corresponding user information
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','", array_keys($users)). "')";
$users = array();
foreach ($this->getQuery($sql) as $user) {
$users[$user['username']] = $user;
}
// replace mentions
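for ($i=1; $i<$n; $i+=2) {
if (!isset($users[$parts[$i]])) {
// unknown user: restore the original mention text
$parts[$i] = '#' . $parts[$i];
continue;
}
$u = $users[$parts[$i]];
$parts[$i] = "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a>";
}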
// put everything back together
$text = implode('', $parts);
}
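The same single-pass idea can be sketched with preg_replace_callback(). This is a minimal sketch, assuming the user rows have already been fetched and keyed by username; the function name linkMentions and the $usersByName parameter are hypothetical and stand in for the database lookup. Because \w+ is greedy, #user1 is always matched as a whole, so a shorter username such as user can never be linked inside it.

```php
<?php
// Hypothetical helper: $usersByName maps username => user row and stands in
// for the result of the database query.
function linkMentions(string $text, array $usersByName): string
{
    return preg_replace_callback('/#(\w+)/', function ($m) use ($usersByName) {
        if (!isset($usersByName[$m[1]])) {
            return $m[0]; // unknown user: leave the mention untouched
        }
        $id = htmlspecialchars((string) $usersByName[$m[1]]['user_id'], ENT_QUOTES);
        // $m[1] contains only word characters, so it is safe to output as-is.
        return "<a href='#' class='ct-userLink' rel='{$id}'>#{$m[1]}</a>";
    }, $text);
}

$usersByName = array(
    'user'  => array('user_id' => 21327, 'username' => 'user'),
    'user1' => array('user_id' => 11327, 'username' => 'user1'),
);
echo linkMentions('#user is not #user1 but #user3 is #user4', $usersByName);
```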
I like dnl's solution of parsing ' #user', but maybe it is not suitable for you.
Anyway, did you try using the strip_tags function to remove the anchor tags? That way you have the string without the links, and you can parse it and build the links again.
strip_tags
