Match part of string with part of other string - php

I am working on a simple search function where I want to match part of a string with part of another string.
Example:
The search term is: fruitbag
I want to match the product: fruit applebag
I want to create something so that the system matches:
fruitbag
fruit applebag
Or even "fruit" and "bag".
In summary: parts of the product name need to match parts of the search term. Is this possible?
$products = array(
'fruit applebag',
'pinapple',
'carrots',
'bananas',
'coconut oil',
'cabbage',
);
if( ! empty( $_POST['search'] ) ) {
$s = $_POST['search'];
$results = array();
foreach( $products as $index => $product ) {
if( preg_match( '/' . $s . '.*/i', $product, $matched ) ) {
$results[] = $matched[0];
}
}
print_r($results);
// This only returns fruit applebag if the search term is something like "fruit ap"
}

Use something like this (split the searched word into two parts and look for a match that has characters between those two parts):
$products = array(
'fruit applebag',
'pinapple',
'carrots',
'bananas',
'coconut oil',
'cabbage',
);
$s = 'fruitbag';
$results = array();
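foreach ( $products as $product ) {
    // try every split point of the search term: "f|ruitbag", "fr|uitbag", ...
    for ( $i = 1; $i < strlen( $s ); $i++ ) {
        // escape both halves so regex metacharacters in the term are matched literally
        $head = preg_quote( substr( $s, 0, $i ), '/' );
        $tail = preg_quote( substr( $s, $i ), '/' );
        if ( preg_match( '/' . $head . '.*' . $tail . '/i', $product, $matched ) ) {
            $results[] = $matched[0];
            break; // one hit per product is enough
        }
    }
}
print_r($results);
Output:
Array ( [0] => fruit applebag )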

It is possible, but could be very costly. As stated, your requirement is that any substring of the search term is potentially relevant. So take fruitbag and generate a list of all substrings:
f,r,u,i,t,b,a,g,fr,ru,ui,it,tb,ba,ag,fru,rui,uit,...,bag,...,fruit,...,fruitbag
But you probably don't want every word containing the letter a to count as a match. A first approach could be to require a minimum number of letters (e.g. 3), which significantly limits the potential matches. But even then: does it make sense to match fru, or rui?
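A minimal sketch of the minimum-length idea, assuming a threshold of 3 characters. The function names substringsOfMinLength and matchBySubstrings are hypothetical helpers made up for this illustration, not part of the original answer:

```php
<?php
// Hypothetical helper: every distinct substring of $term with at least $minLen characters.
function substringsOfMinLength(string $term, int $minLen = 3): array
{
    $found = [];
    $len = strlen($term);
    for ($start = 0; $start < $len; $start++) {
        for ($size = $minLen; $start + $size <= $len; $size++) {
            // Use the substring as the array key so duplicates collapse.
            $found[substr($term, $start, $size)] = true;
        }
    }
    // Numeric-looking keys are cast to int by PHP, so cast them back to strings.
    return array_map('strval', array_keys($found));
}

// Hypothetical helper: products that contain at least one of those substrings.
function matchBySubstrings(array $products, string $term, int $minLen = 3): array
{
    $parts = substringsOfMinLength($term, $minLen);
    $hits = [];
    foreach ($products as $product) {
        foreach ($parts as $part) {
            if (stripos($product, $part) !== false) {
                $hits[] = $product;
                break; // one matching substring is enough for this product
            }
        }
    }
    return $hits;
}

$products = ['fruit applebag', 'pinapple', 'carrots', 'bananas', 'coconut oil', 'cabbage'];
print_r(matchBySubstrings($products, 'fruitbag'));
```

With this product list the result contains fruit applebag but also cabbage, because cabbage contains bag. That is the kind of noise a plain substring approach produces.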
A better approach would be to use a dictionary, to extract actual words or syllables from your search string. (In your case, extract fruit and bag from fruitbag).
You can find an English dictionary fairly easily.
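A rough sketch of the dictionary idea, assuming a tiny hard-coded word list in place of a real English dictionary. The names $dictionary, wordsInTerm and productsContainingAll are hypothetical, and the lookup is simple containment rather than a real word segmentation:

```php
<?php
// Hypothetical word list; a real one would be loaded from a dictionary file.
$dictionary = ['fruit', 'bag', 'apple', 'oil'];

// Return the dictionary words that occur somewhere inside the search term.
function wordsInTerm(string $term, array $dictionary): array
{
    $words = [];
    foreach ($dictionary as $word) {
        if (stripos($term, $word) !== false) {
            $words[] = $word;
        }
    }
    return $words;
}

// Keep only the products that contain every extracted word.
function productsContainingAll(array $products, array $words): array
{
    if ($words === []) {
        return []; // nothing recognisable in the search term
    }
    return array_values(array_filter($products, function (string $product) use ($words): bool {
        foreach ($words as $word) {
            if (stripos($product, $word) === false) {
                return false;
            }
        }
        return true;
    }));
}

$products = ['fruit applebag', 'pinapple', 'carrots', 'bananas', 'coconut oil', 'cabbage'];
$words = wordsInTerm('fruitbag', $dictionary);
print_r($words);
print_r(productsContainingAll($products, $words));
```

Requiring every extracted word keeps cabbage out of the results even though it contains bag. Whether to require all words or just one is a design choice the answer leaves open.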

Related

Matching best similar array element

I have an array of keywords on which I run a foreach loop, matching each element against a specific search term. For example, I have an array like:
Array(
[0] => polka dresses
[1] => polka clothes
[2] => polka dots dress
[3] => polka dots bottoms
)
and I search for the term polka in my array. It returns results when I use strpos or stristr (I also tried similar_text, but got no results).
Issue
If I search for polka it works, but if I accidentally type p0lka it does not return any result.
Is there any way to achieve this?
If you want to get most similar results of a typed word, then you can calculate Levenshtein distance between the searched word and stored words and return results which have the least distance.
You can make use of PHP's levenshtein function for this.
PHP Snippet:
<?php
$data = array(
'polka dresses',
'polka clothes',
'polka dots dress',
'polka dots bottoms',
'dummy dummy'
);
function getSimilarMatches($sentences,$search_str){
$min_distance = -1;
$closest_matches = [];
foreach($sentences as $sentence){
$min_levenshtein_dist = -1;
foreach(explode(" ",$sentence) as $word){
$levenshtein_dist = levenshtein($word,$search_str);
if($min_levenshtein_dist == -1 || $min_levenshtein_dist > $levenshtein_dist){
$min_levenshtein_dist = $levenshtein_dist;
}
}
if($min_distance == -1 || $min_distance > $min_levenshtein_dist){
$min_distance = $min_levenshtein_dist;
$closest_matches = [];
$closest_matches[] = $sentence;
}else if($min_distance === $min_levenshtein_dist){
$closest_matches[] = $sentence;
}
}
return $closest_matches;
}
print_r(getSimilarMatches($data,'polka'));
print_r(getSimilarMatches($data,'p0lka'));
Demo: https://3v4l.org/E9gea

Find most common character in PHP String

I am looking for the most efficient way to find the most common character in a php string.
I have a string that looks like this:
"aaaaabcaab"
The result should be stored in the variable $total.
So in this case $total should be equal to a
You can use this function,
function getHighest($str){
$str = str_replace(' ', '', $str); // removes all the spaces in the string
$arr = str_split(count_chars($str, 3)); // mode 3 returns a string of the unique characters used
$hStr = "";
$occ = 0;
foreach ($arr as $value) {
$oc = substr_count ($str, $value);
if($occ < $oc){
$hStr = $value;
$occ = $oc;
}
}
return $hStr;
}
The easiest way to achieve this is:
// split the string per character and count the number of occurrences
$totals = array_count_values( str_split( 'fcaaaaabcaab' ) );
// sort the totals so that the most frequent letter is first
arsort( $totals );
// show which letter occurred the most frequently
echo array_keys( $totals )[0];
// output
a
One thing to consider is what happens in the event of a tie:
// split the string per character and count the number of occurrences
$totals = array_count_values( str_split( 'paabb' ) );
// sort the totals so that the most frequent letter is first
arsort( $totals );
// show all letters and their frequency
print_r( $totals );
// output
Array
(
[b] => 2
[a] => 2
[p] => 1
)
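A minimal sketch for handling ties, assuming you want every character that shares the highest count rather than whichever one the sort happens to put first. The function name mostCommonChars is hypothetical:

```php
<?php
// Hypothetical helper: all characters that share the highest occurrence count.
function mostCommonChars(string $str): array
{
    if ($str === '') {
        return []; // nothing to count
    }
    $totals = array_count_values(str_split($str));
    $max = max($totals);
    // array_keys() with a search value returns only the keys whose count equals $max.
    // Digit characters become int keys, so cast everything back to strings.
    return array_map('strval', array_keys($totals, $max));
}

print_r(mostCommonChars('paabb'));      // a and b are tied with 2 each
print_r(mostCommonChars('aaaaabcaab')); // only a
```

This avoids sorting altogether, so the result does not depend on how arsort orders elements with equal counts.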

PHP Sophisticated String parsing

This may be possible with a regular expression, but I have no idea how. What I am trying to accomplish is to parse a string with a given delimiter, but parse it differently when it sees a set of brackets. As I am a visual learner, let me show you an example of what I am attempting to achieve. (PS: this is being parsed from a URL.)
Given the string input:
String1,String2(data1,data2,data3),String3,String4
How can I "transform" this string into this array:
{
    "String1": "String1",
    "String2": [
        "data1",
        "data2",
        "data3"
    ],
    "String3": "String3",
    "String4": "String4"
}
Formatting doesn't have to be this strict as I'm just attempting to make a simple API for my project.
Obviously things like
array explode ( string $delimiter , string $string [, int $limit = PHP_INT_MAX ] )
Wouldn't work because there are commas inside the brackets as well. I've attempted manual parsing looking at each character at a time but I fear for the performance and it doesn't actually work anyway. I've pasted the gist of my attempt.
https://gist.github.com/Fudge0952/24cb4e6a4ec288a4c492
While you could try to split your initial string on commas and ignore anything in parentheses for the first split, this necessarily makes assumptions about what those string values can actually be (possibly requiring escaping/unescaping values depending on what those strings have to contain).
If you have control over the data format, though, it would be far better to just start with JSON. It's well-defined and well-supported.
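To illustrate the JSON route, here is a small sketch assuming the client can send the same data as a JSON document. The payload string and the function name decodePayload are made up for this example, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later:

```php
<?php
// Hypothetical helper: decode a JSON payload into a PHP array or fail loudly.
function decodePayload(string $json): array
{
    // true => decode objects as associative arrays; 512 is the default nesting depth.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new InvalidArgumentException('Expected a JSON object or array');
    }
    return $data;
}

// Hypothetical payload carrying the same data as the comma/bracket string.
$json = '{"String1":"String1","String2":["data1","data2","data3"],"String3":"String3","String4":"String4"}';
$data = decodePayload($json);
print_r($data);
```

Nesting, escaping and commas inside values are all handled by the JSON parser, so no custom tokenizer is needed.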
You can either build an ad-hoc parser like (mostly untested):
<?php
$p = '!
[^,\(\)]+ # token: String
|, # token: comma
|\( # token: open
|\) # token: close
!x';
$input = 'String1,String2(data1,data2,data3,data4(a,b,c)),String3,String4';
preg_match_all($p, $input, $m);
// using a norewinditerator, so we can use nested foreach-loops on the same iterator
$it = new NoRewindIterator(
new ArrayIterator($m[0])
);
var_export( foo( $it ) );
function foo($tokens, $level=0) {
$result = [];
$current = null;
foreach( $tokens as $t ) {
switch($t) {
case ')':
break 2; // leave the foreach loop, not just the switch
case '(':
if ( is_null($current) ) {
throw new Exception('moo');
}
$tokens->next();
$result[$current] = foo($tokens, $level+1);
$current = null;
break;
case ',':
if ( !is_null($current) ) {
$result[] = $current;
$current = null;
}
break;
default:
$current = $t;
break;
}
}
if ( !is_null($current) ) {
$result[] = $current;
}
return $result;
}
prints
array (
0 => 'String1',
'String2' =>
array (
0 => 'data1',
1 => 'data2',
2 => 'data3',
'data4' =>
array (
0 => 'a',
1 => 'b',
2 => 'c',
),
),
1 => 'String3',
2 => 'String4',
)
(but will most certainly fail horribly for not-well-formed strings)
or take a look at a lexer/parser generator such as PHP_LexerGenerator and PHP_ParserGenerator.
This is a solution with preg_match_all():
$string = 'String1,String2(data1,data2,data3),String3,String4,String5(data4,data5,data6)';
$pattern = '/([^,(]+)(\(([^)]+)\))?/';
preg_match_all( $pattern, $string, $matches );
$result = array();
foreach( $matches[1] as $key => $val )
{
if( $matches[3][$key] )
{ $add = explode( ',', $matches[3][$key] ); }
else
{ $add = $val; }
$result[$val] = $add;
}
$json = json_encode( $result );
3v4l.org demo
Pattern explanation:
([^,(]+) group 1: any chars except ‘,’ and ‘(’
(\(([^)]+)\))? group 2: zero or one occurrence of brackets wrapping:
└──┬──┘
┌──┴──┐
([^)]+) group 3: any chars except ‘)’

similar substring in other string PHP

How can I check substrings in PHP by prefix or postfix?
For example, I have a search string named $to_search as follows:
$to_search = "abcdef";
And these are the cases to check against $to_search:
$cases = ["abc def", "def", "deff", ... Other values ...];
Now I have to detect the first three cases using the substr() function.
How can I detect "abc def", "def" and "deff" as substrings of "abcdef" in PHP?
You might find the Levenshtein distance between the two words useful: it has a value of 1 for abc def. However, your problem is not well defined, as matching strings that are "similar" doesn't mean anything concrete.
Edit - If you set the deletion cost to 0, this very closely models the problem you are proposing. Just check that the Levenshtein distance is no greater than 1 for everything in the array.
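A minimal sketch of the zero-deletion-cost idea, assuming a threshold of at most 1 and PHP's built-in levenshtein() with its optional insertion, replacement and deletion costs. The function name withinOneEdit and the extra test cases are hypothetical:

```php
<?php
// Hypothetical helper: keep the cases that can be reached from $toSearch by
// deleting any number of characters (free) plus at most one insertion or replacement.
function withinOneEdit(string $toSearch, array $cases): array
{
    return array_values(array_filter($cases, function (string $case) use ($toSearch): bool {
        // Cost arguments: insertion = 1, replacement = 1, deletion = 0.
        return levenshtein($toSearch, $case, 1, 1, 0) <= 1;
    }));
}

$cases = ['abc def', 'def', 'deff', 'otherabc', 'xyz'];
print_r(withinOneEdit('abcdef', $cases));
```

Here def costs 0 (only deletions), while abc def and deff each need one insertion. Note that free deletions make the test permissive: any string whose letters appear in order inside the search string, such as af, also scores 0.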
This will find if any of the strings inside $cases are a substring of $to_search.
foreach($cases as $someString){
if(strpos($to_search, $someString) !== false){
// $someString is found inside $to_search
}
}
Only "def" qualifies, though, as none of the other strings is an actual substring of "abcdef".
Also, on a side note: the terms are prefix and suffix, not postfix.
To find any of the cases that either begin with or end with either the beginning or ending of the search string, I don't know of another way to do it than to just step through all of the possible beginning and ending combinations and check them. There's probably a better way to do this, but this should do it.
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "otherabc", "noabcmatch", "nodefmatch"];
$matches = array();
$len = strlen($to_search);
for ($i=1; $i <= $len; $i++) {
// get the beginning and end of the search string of length $i
$pre_post = array();
$pre_post[] = substr($to_search, 0, $i);
$pre_post[] = substr($to_search, -$i);
foreach ($cases as $case) {
// get the beginning and end of each case of length $i
$pre = substr($case, 0, $i);
$post = substr($case, -$i);
// check if any of them match
if (in_array($pre, $pre_post) || in_array($post, $pre_post)) {
// using the case as the array key for $matches will keep it distinct
$matches[$case] = true;
}
}
}
// use array_keys() to get the keys back to values
var_dump(array_keys($matches));
You can use array_filter function like this:
$cases = ["cake", "cakes", "flowers", "chocolate", "chocolates"];
$to_search = "chocolatecake";
$search = strtolower($to_search);
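$arr = array_filter($cases, function($val) use ($search) {
    // lowercase the case, drop a trailing "s" and any spaces, then look for it in the search string
    $needle = str_replace(' ', '', preg_replace('/s$/', '', strtolower($val)));
    return strpos($search, $needle) !== FALSE;
});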
print_r($arr);
Output:
Array
(
[0] => cake
[1] => cakes
[3] => chocolate
[4] => chocolates
)
As you can see, it prints all the values you expected, apart from deff, which is not part of the search string abcdef, as I commented above.

How do I compare a huge amount of strings against the beginning of another?

I have two tables. One holds a load of numbers. The other holds a list of prefixes (30,000+).
I need to loop through the prefixes and see whether any of the numbers in table 1 start with any of the prefixes.
This is what I have so far.
$tdata = $r->get_t_data(); //array of prefix
/*
Array
(
[0] => Array
(
[prefix] => 101
[dest] => UK
)
)
*/
$cdata = $r->get_c_data(); //array of number
/*Array
(
[0] => Array
(
[row] => 1
[num] => 441143610120
)
)*/
$temp = array();
$i=0;
$time=0;
foreach ($cdata as $ckey => $c) {
foreach ($tdata as $tkey => $t) {
$length = strlen($t['prefix']);
if (strpos($c['num'], $t['prefix'], 0, $length )) {
$temp[$i]['row']=$c['row'];
$temp[$i]['prefix']=$t['prefix'];
$temp[$i]['dialled']=$c['num'];
$temp[$i]['dest']=$t['dest'];
break;
$i++; //increment only if found
}
}
$time++;
}
So basically it loops through the numbers, and I try to match the first part of each number against the prefix.
At the moment it is returning an empty array.
Hope you can help.
The best thing to do is to perform the join in your SQL rather than checking afterwards in PHP. To join with LIKE you can do this:
SELECT * FROM table t JOIN prefixTable p ON t.num LIKE CONCAT(p.prefix, '%')
The key is LIKE CONCAT(p.prefix, '%'): it joins the rows where t.num is like prefix%. In MySQL, % is a wildcard, and since there is no wildcard at the front, the t.num column has to START with the prefix.
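Your condition if (strpos($c['num'], $t['prefix'], 0, $length )) has two problems. strpos returns 0 when the prefix sits at the very start of the number, and PHP treats 0 as false, so a genuine prefix match is exactly the case that fails. strpos also accepts only three arguments (haystack, needle, offset), so $length has to go. To test for a prefix, compare the result strictly against 0:
if (0 === strpos($c['num'], $t['prefix'])) {}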
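A minimal sketch of the nested loop from the question with a strict prefix test, assuming the same array shapes (row/num and prefix/dest). The function name matchPrefixes and the sample data are hypothetical. It uses strncmp() instead of strpos(), and appends to the result array so that no manual counter is needed (in the question, $i++ sits after break and never runs):

```php
<?php
// Hypothetical helper: for each number, record the first prefix it starts with.
function matchPrefixes(array $numbers, array $prefixes): array
{
    $matches = [];
    foreach ($numbers as $c) {
        $num = (string) $c['num'];
        foreach ($prefixes as $t) {
            $prefix = (string) $t['prefix'];
            // strncmp() compares only the first strlen($prefix) bytes of both strings.
            if ($prefix !== '' && strncmp($num, $prefix, strlen($prefix)) === 0) {
                $matches[] = [
                    'row'     => $c['row'],
                    'prefix'  => $prefix,
                    'dialled' => $num,
                    'dest'    => $t['dest'],
                ];
                break; // stop at the first matching prefix for this number
            }
        }
    }
    return $matches;
}

// Made-up sample data in the shape shown in the question.
$cdata = [
    ['row' => 1, 'num' => '441143610120'],
    ['row' => 2, 'num' => '101555'],
    ['row' => 3, 'num' => '999'],
];
$tdata = [
    ['prefix' => '101', 'dest' => 'UK'],
    ['prefix' => '44', 'dest' => 'Example'],
];
print_r(matchPrefixes($cdata, $tdata));
```

This still compares every number with every prefix in the worst case, so with 30,000+ prefixes doing the work in the database may remain the faster option.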
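Use preg_grep to reduce the amount of looping/search code you have. Anchor each prefix at the start of the pattern and grep the list of numbers with it (both arrays are assumed to be flat lists of strings here):
foreach ($prefix_array as $prefix) {
    $safe_prefix = preg_quote($prefix, '/');
    $matches = preg_grep("/^$safe_prefix/", $table1);
    if (count($matches) > 0) {
        echo "Found numbers starting with prefix $prefix\n";
    }
}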
