Full text search PHP alone - php

I have an InnoDB table from which values are retrieved and stored in an array in PHP.
Now I want to sort the array by relevance to the matches in the search string.
For example, if I search for "hai how are you", the string is split into the separate words "hai", "how", "are" and "you", and the results should be ordered as follows:
[0] hai how are all people there
[1] how are things going
[2] are you coming
[3] how is sam
...
Is there any way I can sort the array by relevance in basic PHP functions alone?

Maybe something like this:
$arrayToSort = array(); // define your array here
$query = "hai how are you";
// $query is passed in with "use"; a plain named function would not see it.
usort($arrayToSort, function ($arrayMember1, $arrayMember2) use ($query) {
    $a = similar_text($arrayMember1, $query);
    $b = similar_text($arrayMember2, $query);
    return $b <=> $a; // most similar first (the <=> operator needs PHP 7+)
});
See the PHP manual for details on what similar_text and usort do.

$searchText = "hai how are you"; // e.g. there may be multiple spaces between words
// Split on any run of whitespace; split() was removed in PHP 7.
$searchArray = preg_split('/\s+/', trim($searchText));
$text = array(0 => 'hai how are all people there',
1 => 'how are things going ',
2 => 'are you coming',
3 => 'how is sam',
4 => 'testing ggg');
$matches = array(); // initialise so the later loop is safe when nothing matches
foreach($text as $key=>$elt){
foreach($searchArray as $searchelt){
if(strpos($elt,$searchelt)!== FALSE){
$matches[] = $key; //just storing key to avoid memory wastage
break;
}
}
}
//print the matched string with help of stored keys
echo '<pre>matched string are as follows: ';
foreach ($matches as $key){
echo "<br>{$text[$key]}";
}
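Neither snippet above orders the results by how many of the search words each string contains, which is what the question asks for. Here is a minimal sketch of that idea using only built-in functions. The function name rankByWordMatches is made up for illustration, and it assumes whole-word, case-insensitive matching, which may or may not be what you need:

```php
<?php
// Hypothetical helper (not from the answers above): rank strings by the number
// of distinct query words they contain, dropping strings with no match at all.
function rankByWordMatches(array $texts, string $query): array
{
    $queryWords = array_unique(
        preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY)
    );

    $scored = [];
    $position = 0;
    foreach ($texts as $text) {
        $textWords = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
        // Count the query words that also appear as whole words in this string.
        $score = count(array_intersect($queryWords, $textWords));
        if ($score > 0) {
            $scored[] = ['text' => $text, 'score' => $score, 'pos' => $position];
        }
        $position++;
    }

    // Highest score first; ties fall back to the original array order.
    usort($scored, function ($a, $b) {
        return ($b['score'] <=> $a['score']) ?: ($a['pos'] <=> $b['pos']);
    });

    return array_column($scored, 'text');
}

$text = array(
    'hai how are all people there',
    'how are things going ',
    'are you coming',
    'how is sam',
    'testing ggg',
);
print_r(rankByWordMatches($text, 'hai how are you'));
```

With the sample data, the first string matches three words, the next two match two words each, 'how is sam' matches one, and 'testing ggg' is dropped. If partial-word matches should count too, the array_intersect() call could be swapped for a strpos() check per word.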

Related

Matching best similar array element

I have an array of keywords over which I run a foreach loop, matching each element against a specific search term. For example, I have an array like this:
Array(
[0] => polka dresses
[1] => polka clothes
[2] => polka dots dress
[3] => polka dots bottoms
)
I search for the term polka in my array. This returns results when I use strpos or stristr (I also tried similar_text, but got no results).
Issue
If I search for polka it works, but if I accidentally type p0lka, I get no results.
Is there any way to achieve this?
If you want to get most similar results of a typed word, then you can calculate Levenshtein distance between the searched word and stored words and return results which have the least distance.
You can make use of PHP's levenshtein function for this.
PHP Snippet:
<?php
$data = array(
'polka dresses',
'polka clothes',
'polka dots dress',
'polka dots bottoms',
'dummy dummy'
);
function getSimilarMatches($sentences,$search_str){
$min_distance = -1;
$closest_matches = [];
foreach($sentences as $sentence){
$min_levenshtein_dist = -1;
foreach(explode(" ",$sentence) as $word){
$levenshtein_dist = levenshtein($word,$search_str);
if($min_levenshtein_dist == -1 || $min_levenshtein_dist > $levenshtein_dist){
$min_levenshtein_dist = $levenshtein_dist;
}
}
if($min_distance == -1 || $min_distance > $min_levenshtein_dist){
$min_distance = $min_levenshtein_dist;
$closest_matches = [];
$closest_matches[] = $sentence;
}else if($min_distance === $min_levenshtein_dist){
$closest_matches[] = $sentence;
}
}
return $closest_matches;
}
print_r(getSimilarMatches($data,'polka'));
print_r(getSimilarMatches($data,'p0lka'));
Demo: https://3v4l.org/E9gea

How to search for 2 or more matches in a single string?

I want to search for two or more words, but not able to match the exact word.
If I use some other function like stripos() I don't get the required output.
$string = "abc india ltd";
$Arr = array('xyz ab','abc india', 'pqr', 'yz lmn');
$Arr = implode('',$Arr);
if (preg_match_all("/$string/", $Arr)) {
echo '<b>'.'found'.'<font color="green">'.$string.'</font>'.'</b>';
}
or (Both Same, But want to avoid using inside a loop)
$string = "abc india ltd";
$Arr = array('xyz ab','abc india', 'pqr', 'yz lmn');
foreach ($Arr as $value) {
if (preg_match_all("/$string/", $value)) {
echo '<b>'.'found '.'<font color="green">'.$string.'</font>'.'</b>';
}
}
I think you want something like this:
<?php
$string = "abc india";
$Arr = array('xyz india','abc india', 'pqr', 'yz lmn',"xyz abc india");
foreach($Arr as $val){
$explode_string = explode(" ",$string);
$counter =0;
foreach($explode_string as $explode_str){
if(strpos($val,$explode_str) !== FALSE){
$counter +=1;
}
}
if($counter>=2){
echo $val. " have exact ".$counter. " word matches";
echo PHP_EOL;
}
}
Output: https://eval.in/831808
Note: check it against all possible test cases and let us know whether it works.
This leverages some smart array functions to keep the code block concise and avoid manually incrementing a counter:
Code: (Demo)
$input_string="abc india ltd";
$input_array=explode(' ',$input_string);
$search_array=array('xyz ab','abc india','ltd abc','yz lmn');
foreach($search_array as $search_string){
if(sizeof(array_intersect($input_array,explode(' ',$search_string)))>1){
echo "More than one word in $search_string was found in $input_string\n";
}
}
Output:
More than one word in abc india was found in abc india ltd
More than one word in ltd abc was found in abc india ltd
*note, array_intersect() generates a new array that only contains elements that exist in both provided arrays. sizeof() is an alias of count().

Retrieving words with colon, and associated data

I have data formatted as such:
some words go here priority: p1,p2 -rank:3 status: not delayed
Basically I need to retrieve each set of data that corresponds to the colon name.
Ideally if I could end up with an array structure such that
keywords => 'some words go here'
priority => 'p1,p2'
-rank => 3
status => 'not delayed'
A few caveats:
keywords will not have a defining colon-word (keywords are just placed in the front)
keywords will not always exist (might just be colon-words)
colon-words will not always exist (might just be keywords)
I imagine regex will have to be used to parse this out, but this goes beyond my understanding of regex.
If there is a simpler approach to this I'd be happy to find out.
Any help appreciated!
A regular expression will certainly be a much more elegant approach to this, as @HamZa showed, but here's a proof of concept to illustrate that you could just brute-force the solution. Keep in mind that this is only a proof of concept; I won't be doing your entire assignment for you ;)
<?php
$string = "keywords go here priority: p1,p2 -rank:3 status: not delayed";
$kv = array();
$key = "keywords";
$substrings = explode(":", $string);
foreach($substrings as $k => $substring) {
$pieces = explode(" ", $substring);
$chunk = $k == count($substrings) - 1 ? 0 : 1;
$kv[$key] = trim(join(" ", array_slice($pieces, 0, count($pieces)-$chunk)));
$key = $pieces[count($pieces)-1];
}
print_r($kv);
// Array
// (
// [keywords] => keywords go here
// [priority] => p1,p2
// [-rank] => 3
// [status] => not delayed
// )
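For comparison, here is a rough sketch of what a regex-based version might look like. This is not the solution the other answer refers to. The function name parseColonWords is made up, and it assumes that a key is any run of non-space characters directly followed by a colon and that values never contain such a token themselves:

```php
<?php
// Hypothetical helper: split "free text key: value key2:value2" into an array.
function parseColonWords(string $input): array
{
    $result = [];

    // Everything before the first "word:" token is treated as keywords.
    preg_match('/^(.*?)(?=\s*\S+:|$)/', $input, $head);
    $keywords = trim($head[1]);
    if ($keywords !== '') {
        $result['keywords'] = $keywords;
    }

    // Each key runs up to its colon; its value runs lazily up to the
    // next " word:" token or the end of the string.
    preg_match_all('/(\S+):\s*(.*?)(?=\s+\S+:|$)/', $input, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $result[$pair[1]] = trim($pair[2]);
    }

    return $result;
}

print_r(parseColonWords("some words go here priority: p1,p2 -rank:3 status: not delayed"));
```

Both caveats from the question are covered in this sketch: with no colon-words you get only the keywords entry, and with no leading keywords that entry is left out. Values containing a colon (a time such as 10:30, say) would need a stricter key pattern.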

Select words from string according to array list

I want to select specific words from a sentence according to my array list
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
I want the output to be just like "words display in browser".
Can somebody please help me with this one? Thanks.
I wonder if this one liner would do it :
echo join(" ", array_intersect($list, explode(" ",$sentence)));
Use at your own risk :)
edit : yay, it does the job, just tested
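One thing to watch with that one-liner: array_intersect() keeps the order (and keys) of its first argument, so as written the words come out in $list order rather than in the order they appear in the sentence. A small sketch with the arguments swapped, assuming plain space-separated words:

```php
<?php
$sentence = "please take this words only to display in my browser";
$list = array("display", "browser", "words", "in");

// Sentence words first: the result follows the order of the sentence.
$inSentenceOrder = implode(" ", array_intersect(explode(" ", $sentence), $list));

// List first (as in the one-liner above): the result follows the order of $list.
$inListOrder = implode(" ", array_intersect($list, explode(" ", $sentence)));

echo $inSentenceOrder, "\n"; // words display in browser
echo $inListOrder, "\n";     // display browser words in
```

Note that with the sentence first, a word that occurs twice in the sentence is also returned twice.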
You can do it with preg_match:
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
preg_match_all('/\b'.implode('\b|\b', $list).'\b/i', $sentence, $matches) ;
print_r($matches);
You'll get the words in order
Array
(
[0] => Array
(
[0] => words
[1] => display
[2] => in
[3] => browser
)
)
But be careful with regular expressions performance if the text is not that simple.
I don't know any short version for this rather than checking word by word.
$words = explode(" ", $sentence);
$new_sentence_array = array();
foreach($words as $word) {
if(in_array($word, $list)) {
$new_sentence_array[] = $word;
}
}
$new_sentence = implode(" ", $new_sentence_array);
echo $new_sentence;
I think you could search the string for each value in the array and assign it to a new array with the strpos value as the key; that would give you a sortable array that you could then output in the order in which the terms appear in the string. See the example below.
<?php
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
$found = array();
foreach($list as $k => $v){
$position = strpos(strtolower($sentence), strtolower($v));
if($position !== false){ // strict check: a match at position 0 is still a match
$found[$position] = $v;
}
}
ksort($found);
foreach($found as $v){
echo $v.' ';
}
?>
$narray=array();
foreach ($list as $value) {
$status=stristr($sentence, $value);
if ($status) {
$narray[]=$value;
}
}
echo @implode(" ",$narray);

Split a string, remember the positions of splitting

Assume I have the following string:
I have | been very busy lately and need to go | to bed early
By splitting on "|", you get:
$arr = array(
[0] => I have
[1] => been very busy lately and need to go
[2] => to bed early
)
The first split comes after 2 words, and the second split 8 words after that. The number of words in each segment is stored: array(2, 8, 3). Then the string is imploded and passed on to a custom string tagger:
tag_string('I have been very busy lately and need to go to bed early');
I don't know what the output of tag_string will be exactly, except that the total words will remain the same. Examples of output would be:
I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy
This will lengthen the string by an unknown number of characters. I have no control over tag_string. What I know is that (1) the number of words will be the same as before and (2) the array was split after 2 words and then after a further 8 words. I now need a way to explode the tagged string into the same array as before:
$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
// split after 2nd, and thereafter after 8th word
}
With output:
$arr = array(
[0] => I have-nn
[1] => been-vb very-vb busy lately and-rr need to-r go
[2] => to bed early-p
)
So to be clear (I wasn't before): I cannot split by remembering the strpos, because strpos before and after the string went through the tagger, aren't the same. I need to count the number of words. I hope I have made myself more clear :)
You wouldn't want to count the number of words, you would want to count the string length (strlen). If it is the same string without the pipes, then you want to split it with substr after a certain amount.
$strCounts = array();
foreach ($arr as $item) {
$strCounts[] = strlen($item);
}
// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
$arr[] = substr($string, $i, $count);
$i += $count; // increment the start position by the length
}
I have not tested this, simply a "theory" and probably has some kinks to work out. There may be a better way to go about it, I just don't know it.
Interesting question, although I think the rope data structure still applies it might be a little overkill since word placement won't change. Here is my solution:
$str = "I have | been very busy lately and need to go | to bed early";
function get_breaks($str)
{
$breaks = array();
$arr = explode("|", $str);
foreach($arr as $val)
{
$breaks[] = str_word_count($val);
}
return $breaks;
}
$breaks = get_breaks($str);
echo "<pre>" . print_r($breaks, 1) . "</pre>";
$str = str_replace("|", "", $str);
function rebreak($str, $breaks)
{
$return = array();
$old_break = 0;
$arr = str_word_count($str, 1);
foreach($breaks as $break)
{
$return[] = implode(" ", array_slice($arr, $old_break, $break));
$old_break += $break;
}
return $return;
}
echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
Let me know if you have any questions, but it is pretty self explanatory. There are definitely ways to improve this as well.
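One possible refinement: as far as I know, str_word_count() only treats letters, apostrophes and hyphens as word characters, so a tag containing digits or other symbols could be counted as more than one word. Here is a sketch of the same idea that splits purely on whitespace. The function names wordCounts and resplit are made up, and it assumes the tagger never adds or removes spaces:

```php
<?php
// Hypothetical helpers: remember how many whitespace-separated words each
// "|"-delimited segment has, then cut any later version of the string
// (same word count) into segments of the same sizes.
function wordCounts(string $piped): array
{
    $counts = [];
    foreach (explode('|', $piped) as $part) {
        $counts[] = count(preg_split('/\s+/', trim($part), -1, PREG_SPLIT_NO_EMPTY));
    }
    return $counts;
}

function resplit(string $tagged, array $counts): array
{
    $words = preg_split('/\s+/', trim($tagged), -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    $offset = 0;
    foreach ($counts as $count) {
        // Take the next $count words and glue them back together.
        $parts[] = implode(' ', array_slice($words, $offset, $count));
        $offset += $count;
    }
    return $parts;
}

$counts = wordCounts('I have | been very busy lately and need to go | to bed early');
print_r($counts);
print_r(resplit('I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p', $counts));
```

For the example string, the counts come out as 2, 8 and 3, and the tagged string is cut back into the three segments shown in the question.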
I'm not quite sure I understood what you actually wanted to achieve. But here are a couple of things that might help you:
str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does pretty much the same, but on UTF-8 strings.
strpos() finds the first occurrence of a string within another. You could easily find the positions of all | with this:
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
$positions[] = $pos;
}
I'm still not sure I understood why you can't just use explode() for this, though.
<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
$words[] = str_word_count($s);
}
