similar substring in other string PHP

How can I check substrings in PHP by prefix or postfix?
For example, I have a search string named $to_search:
$to_search = "abcdef";
And three cases to check as possible substrings of $to_search:
$cases = ["abc def", "def", "deff", ... Other values ...];
Now I have to detect the first three cases using the substr() function.
How can I detect "abc def", "def", and "deff" as substrings of "abcdef" in PHP?

You might find the Levenshtein distance between the two words useful - it'll have a value of 1 for abc def. However your problem is not well defined - matching strings that are "similar" doesn't mean anything concrete.
Edit - If you set the deletion cost to 0 then this very closely models the problem you are proposing. Just check that the levenshtein distance is less than 1 for everything in the array.
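As a rough illustration of the distances involved, here is a minimal sketch using PHP's built-in levenshtein() with its default costs (every insertion, replacement, and deletion costs 1). The variable names come from the question; $distances is only an illustrative name.

```php
<?php
$to_search = "abcdef";
$cases = ["abc def", "def", "deff"];

$distances = [];
foreach ($cases as $case) {
    // number of single-character edits needed to turn $to_search into $case
    $distances[$case] = levenshtein($to_search, $case);
}

print_r($distances);
// "abc def" => 1 (one inserted space), "def" => 3, "deff" => 4
```

With the default costs only "abc def" is close to the search string, which is why the cost parameters matter if you go down this route.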

This will find if any of the strings inside $cases are a substring of $to_search.
foreach($cases as $someString){
if(strpos($to_search, $someString) !== false){
// $someString is found inside $to_search
}
}
Only "def" matches, though, since none of the other strings is actually contained in $to_search.
Also, on a side note: the terms are prefix and suffix, not postfix.

To find any of the cases that either begin with or end with either the beginning or ending of the search string, I don't know of another way to do it than to just step through all of the possible beginning and ending combinations and check them. There's probably a better way to do this, but this should do it.
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "otherabc", "noabcmatch", "nodefmatch"];
$matches = array();
$len = strlen($to_search);
for ($i=1; $i <= $len; $i++) {
// get the beginning and end of the search string of length $i
$pre_post = array();
$pre_post[] = substr($to_search, 0, $i);
$pre_post[] = substr($to_search, -$i);
foreach ($cases as $case) {
// get the beginning and end of each case of length $i
$pre = substr($case, 0, $i);
$post = substr($case, -$i);
// check if any of them match
if (in_array($pre, $pre_post) || in_array($post, $pre_post)) {
// using the case as the array key for $matches will keep it distinct
$matches[$case] = true;
}
}
}
// use array_keys() to get the keys back to values
var_dump(array_keys($matches));

You can use array_filter function like this:
$cases = ["cake", "cakes", "flowers", "chocolate", "chocolates"];
$to_search = "chocolatecake";
$search = strtolower($to_search);
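$arr = array_filter($cases, function($val) use ($search) {
    // lowercase the case, drop a trailing "s" and any spaces, then look for it in $search
    $needle = str_replace(' ', '', preg_replace('/s$/', '', strtolower($val)));
    return strpos($search, $needle) !== FALSE;
});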
print_r($arr);
Output:
Array
(
[0] => cake
[1] => cakes
[3] => chocolate
[4] => chocolates
)
As you can see, it prints all the values you expected. Applied to your own data, it would find everything apart from deff, which is not part of the search string abcdef, as I commented above.

Related

How to check if words can be created from list of letters?

I have a string $raw="aabbcdfghmnejaachto" and an array $word_array=array('cat','rat','goat','total','egg').
My program needs to check whether it is possible to make the words in the array with letters from the string. There is one extra condition; if the word contains a letter occurring more than once, that letter must occur at least the same number of times in the string.
E.g. egg. There are two g's. If the string $raw doesn't contain two g's, then it's not possible to make this word.
This is my expected result:
Array([cat]=>'Yes',[rat]=>'No',[goat]=>'Yes',[total]=>'No',[egg]=>'No')
I tried the following, but it doesn't output the expected result:
$res=array();
$raw="aabbcdfghmnejaachto";
$word_array=array('cat','rat','goat','total','egg');
$raw_array= str_split($raw);
foreach($word_array as $word=>$value)
{
$word_value= str_split($value);
foreach($word_value as $w=>$w_value)
{
foreach($raw_array as $raw=>$raw_value)
{
if(strcmp($w_value,$raw_value)==0)
{
$res[$value]='Yes';
}
else
{
$res[$value]='No';
}
}
}
}
print_r($res);
EDIT: The code, as originally posted, was missing the letter e from the string $raw so the egg example would actually return No. I have updated the Question and all the Answers to reflect this. - robinCTS

You must loop through each word/element in the $words array, then loop again through each character of each word.
Upon each iteration of the outer loop, set the default result value to Yes.
Then you must iterate each unique character of the current word. (array_count_values())
Check if the number of occurrences of the current character in the word is greater than the number of occurrences of the current character in the string of letters.
*As a matter of performance optimization, array_count_values() is used on the inner loop to avoid any unnecessary iterations of duplicate letters in $word. The $count variable saves having to make two substr_count() calls in the if statement.
Code:
$string = "aabbcdfghmnejaachto";
$words = array('cat','rat','goat','total','egg');
foreach ($words as $word) { // iterate each word
$result[$word]='Yes'; // set default result value
foreach (array_count_values(str_split($word)) as $char=>$count) { // iterate each unique letter in word
if ($count > substr_count($string, $char)) { // compare current char's count vs same char's count in $string
$result[$word]='No'; // if more of the character in word than available in $string, set No
break; // make early exit from inner loop, to avoid unnecessary iterations
}
}
}
var_export($result);
This is the output :
array (
'cat' => 'Yes',
'rat' => 'No',
'goat' => 'Yes',
'total' => 'No',
'egg' => 'No',
)
A big thank-you to mickmackusa for significantly enhancing this answer.

Your problem is you are not counting the number of times each character occurs in the $raw array, you are just checking each character in each of the words to see if that character exists in $raw. Unless you put in some form of counting, or else make a copy of $raw for each word and remove letters as they are used, you are not going to be able to do this.

I count the occurrences of each character in the string and in each word, and then compare those counts. This answer works:
$raw="aabbcdfghmnejaachto"; //tgrel -- to make all yes
$res=array();
$word_array=array('cat','rat','goat','total','egg');
$raw_array= str_split($raw);
$count_raw = array_count_values($raw_array);
foreach($word_array as $value)
{
$word_value= str_split($value);
$newArray = array_count_values($word_value);
$res[$value]='Yes';
foreach($newArray as $char=>$number){
if(!isset($count_raw[$char]) || $count_raw[$char]<$number){
$res[$value]='No';
break;
}
}
}
print_r($res);

Your error is that you decide whether a word is accepted on the test of each individual character, whereas the decision should be based on all the letters of the word. Also, you don't need to specify both the key and the value of an array if you only need its value,
as in
foreach($word_array as $value)
I've also found that using the in_array() function makes the code much clearer. Note that in_array() only tests whether a letter is present in $raw; it does not count repeated letters.
$res=array();
$raw="aabbcdfghmnejaachto";
$word_array=array('cat','rat','goat','total','egg');
$raw_array= str_split($raw);
foreach($word_array as $value)
{
$word_value= str_split($value);
$res[$value]='Yes';
foreach($word_value as $w_value)
{
if (!in_array($w_value,$raw_array))
$res[$value]='No';
}
}
print_r($res);

Let's try to do it without explicit loops, using closures instead:
$raw = "aabbcdfghmnejaachto";
$word_array = ['cat', 'rat', 'goat', 'total', 'egg'];
$result = [];
$map = count_chars($raw, 1);
array_walk(
$word_array,
function ($word) use ($map, &$result) {
$result[$word] = !array_udiff_assoc(
count_chars($word, 1), $map, function ($i, $j) { return $i > $j; }
) ? 'Yes' : 'No';
}
);
We build a map of the characters used in the original string with count_chars($raw, 1), so it looks like this:
$map:
[
97 => 4, // 97 is the byte value of "a"; 4 is its number of occurrences
98 => 2,
...
]
array_walk() goes through the words and adds each of them to the final $result with a Yes or No value, which comes from comparing the map of the original string with a map built for the word.
array_udiff_assoc() compares the two maps and discards every entry of the word's map whose key also exists in the original map with a count that is at least as large. Whatever remains is a letter that the word needs more often than $raw provides, so an empty result means the word can be built. That is why the call to array_udiff_assoc() is preceded by a negation.
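To make the map more concrete, here is a small sketch that prints what count_chars() returns for a single word in mode 1, translating each byte value back to its character with chr(). The word "egg" is just an example taken from the question.

```php
<?php
$map = count_chars("egg", 1); // mode 1: only bytes that actually occur

foreach ($map as $byte => $count) {
    // $byte is the numeric byte value, chr() turns it back into a letter
    echo chr($byte), " => ", $count, PHP_EOL;
}
// prints:
// e => 1
// g => 2
```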

Try this
$res=array();
$word_array=array('cat','rat','goat','total','egg');
$raw="aabbcrdfghmnejaachtol";
foreach($word_array as $word=>$value)
{
$raw_array= str_split($raw);
$res[$value]='Yes';
$word_value= str_split($value);
foreach($word_value as $w=>$w_value)
{
if(!in_array($w_value,$raw_array))
{
$res[$value]='No';
}
else
{
unset($raw_array[array_search($w_value, $raw_array)]);
}
}
}
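This does not allow a character to be used again once it has been consumed, as with the second "t" in "total". Note that $raw here contains an extra r and l compared with the string in the question.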

We can check to see if each letter from each word is within the letters given, and pluck found letters out as we go.
The function below short circuits if a letter is not found.
<?php
function can_form_word_from_letters($word, $letters) {
$letters = str_split($letters);
$word_letters = str_split($word);
foreach($word_letters as $letter) {
$key = array_search($letter, $letters);
if($key === false) return false; // Letter not available: stop early.
unset($letters[$key]); // Letter found, now remove it from letters.
}
return true;
}
$letters = "aabbcdfghmnejaachto";
$words = array('cat','rat','goat','total','egg');
foreach($words as $word) {
$result[$word] = can_form_word_from_letters($word, $letters) ? 'Yes' : 'No';
}
var_dump($result);
Output:
array (size=5)
'cat' => string 'Yes' (length=3)
'rat' => string 'No' (length=2)
'goat' => string 'Yes' (length=3)
'total' => string 'No' (length=2)
'egg' => string 'No' (length=2)

Decoding anagram with recursive function doesn't give expected output

So I'm trying to decode an anagram into words from my dictionary file. But my recursive function isn't behaving like I'm expecting.
The idea behind the code is to eliminate letters as they are used by words and to output the string it comes up with.
<?php
function anagram($string, $wordlist)
{
if(empty($string))
return;
foreach($wordlist as $line)
{
$line = $org = trim($line);
$line = str_split($line);
sort($line);
foreach($line as $key => $value)
{
if($value != $string[$key])
{
continue 2;
}
}
echo $org . anagram(array_slice($string, count($line)), $wordlist);
}
echo PHP_EOL;
}
$string = "iamaweakishspeller";
$string = str_split($string);
sort($string);
$file = file('wordlist');
anagram($string, $file);
This is my result for now. It looks awful, and I'm having some issues with the code: it goes into an infinite loop over roughly the same 200 words from the word list.
Can someone take an extra peek at this?

Situation
You have a dictionary(file) and an anagram which contains one or multiple words. The anagram doesn't contain any punctuation or letter case of the original word(s).
Now you want to find all true solutions where you use up all characters of the anagram and decode it into word(s) from the dictionary.
Note: There is a chance that you find multiple solutions and you will never know which one the original text was and in which order the words were, since the characters of multiple words are mixed in the anagram and you don't have punctuation or the case of the letters in it.
Your code
The problem in your current code is exactly that you have multiple words mixed together. If you sort them now and you want to search them in the dictionary you won't be able to find them, since the characters of multiple words are mixed. Example:
anagram = "oatdgc" //"cat" + "dog"
wordList = ["cat", "dog"]
wordListSorted = ["act", "dgo"]
anagramSorted = acdgot
↓↓↓
WordListSorted[0] → cat ✗ no match
WordListSorted[1] → dog ✗ no match
Solution
First I will explain in theory how we construct all possible true solutions and then I explain how every part in the code works.
Theory
So to start we have an anagram and a dictionary. Now we first filter the dictionary by the anagram and only keep the words, which can be constructed by the anagram.
Then we go through all words and for each word we add it to a possible solution, remove it from the anagram, filter the dictionary by the new anagram and call the function with the new values recursively.
We do this until either the anagram is empty and we found a true solution, which we add to our solution collection, or there are no words remaining and it is not a possible solution.
Code
We have two helper functions array_diff_once() and preSelectWords() in our code.
array_diff_once() is pretty much the same as the built-in array_diff() function, except that it only removes values once and not all occurrences. Otherwise there isn't much to explain. It simply loops through the second array and removes the values once in the first array, which then gets returned.
function array_diff_once($arrayOne, $arrayTwo){
foreach($arrayTwo as $v) {
if(($key = array_search($v, $arrayOne)) !== FALSE)
array_splice($arrayOne, $key, 1);
}
return $arrayOne;
}
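To see the difference from the built-in function, here is a short sketch that runs array_diff() and array_diff_once() on the same input. The function body is copied from above; the $letters, $builtin and $once variables are only illustrative.

```php
<?php
function array_diff_once($arrayOne, $arrayTwo){
    foreach($arrayTwo as $v) {
        if(($key = array_search($v, $arrayOne)) !== FALSE)
            array_splice($arrayOne, $key, 1);
    }
    return $arrayOne;
}

$letters = ['a', 'a', 'b'];

// array_diff() removes every "a"; array_values() just reindexes the result
$builtin = array_values(array_diff($letters, ['a']));
// array_diff_once() removes only one "a"
$once = array_diff_once($letters, ['a']);

print_r($builtin); // ['b']
print_r($once);    // ['a', 'b']
```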
preSelectWords() takes an anagram and a word list as argument. It simply checks with the help of array_diff_once(), which words of the word list can be constructed with the given anagram. Then it returns all possible words from the word list, which can be constructed with the anagram.
function preSelectWords($anagram, $wordList){
$tmp = [];
foreach($wordList as $word){
if(!array_diff_once(str_split(strtolower($word)), $anagram))
$tmp[] = $word;
}
return $tmp;
}
Now to the main function decodeAnagram(). We pass the anagram and a word list, which we first filter with preSelectWords(), as arguments to the function.
In the function itself we basically just loop through the words and for each word we remove it from the anagram, filter the word list by the new anagram and add the word to a possible solution and call the function recursively.
We do this until either the anagram is empty and we found a true solution, which we add to our solution array, or there are no words left in the list and with that no possible solution.
function decodeAnagram($anagram, $wordList, $solution, &$solutions = []){
if(empty($anagram) && sort($solution) && !isset($solutions[$key = implode($solution)])){
$solutions[$key] = $solution;
return;
}
foreach($wordList as $word)
decodeAnagram(array_diff_once($anagram, str_split(strtolower($word))), preSelectWords(array_diff_once($anagram, str_split(strtolower($word))), $wordList), array_merge($solution, [$word]), $solutions);
}
Full code
<?php
function decodeAnagram($anagram, $wordList, $solution, &$solutions = []){
if(empty($anagram) && sort($solution) && !isset($solutions[$key = implode($solution)])){
$solutions[$key] = $solution;
return;
}
foreach($wordList as $word)
decodeAnagram(array_diff_once($anagram, str_split(strtolower($word))), preSelectWords(array_diff_once($anagram, str_split(strtolower($word))), $wordList), array_merge($solution, [$word]), $solutions);
}
function preSelectWords($anagram, $wordList){
$tmp = [];
foreach($wordList as $word){
if(!array_diff_once(str_split(strtolower($word)), $anagram))
$tmp[] = $word;
}
return $tmp;
}
function array_diff_once($arrayOne, $arrayTwo){
foreach($arrayTwo as $v) {
if(($key = array_search($v, $arrayOne)) !== FALSE)
array_splice($arrayOne, $key, 1);
}
return $arrayOne;
}
$solutions = [];
$anagram = "aaaeeehiikllmprssw";
$wordList = ["I", "am", "a", "weakish", "speller", "William", "Shakespeare", "other", "words", "as", "well"];
//↑ file("wordlist", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
decodeAnagram(str_split(strtolower($anagram)), preSelectWords(str_split(strtolower($anagram)), $wordList), [], $solutions);
print_r($solutions);
?>
Output
Array
(
[Iaamspellerweakish] => Array
(
[0] => I
[1] => a
[2] => am
[3] => speller
[4] => weakish
)
[ShakespeareWilliam] => Array
(
[0] => Shakespeare
[1] => William
)
)
(Ignore the keys here, since those are the identifiers of the solutions)

Find all the occurrence points of a letter within a string

I have the following code:
<?php
$word = "aeagle";
$letter = "e";
$array = strposall($aegle, $letter);
print_r($array);
function strposall($haystack, $needle) {
$occurrence_points = array();
$pos = strpos($haystack, $needle);
if ($pos !== false) {
array_push($occurrence_points, $pos);
}
while ($pos = strpos($haystack, $needle, $pos + 1)) {
array_push($occurrence_points, $pos);
}
return $occurrence_points;
}
?>
As in the example, if I have aegle as my word and I'm searching for e within it, the function should return an array with the values 1 and 4 in it.
What's wrong with my code?

Why not try this instead:
$word = "aeagle";
$letter = "e";
$occurrence_points = array_keys(array_intersect(str_split($word), array($letter)));
var_dump($occurrence_points);

I think you're passing the wrong parameter: it should be $word instead of $aegle.
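For reference, a corrected version of the function might look like the sketch below. It keeps the name strposall() from the question, passes $word, and uses a single loop with a strict !== false check. It assumes $needle is a non-empty string.

```php
<?php
function strposall(string $haystack, string $needle): array
{
    $positions = [];
    $offset = 0;
    // keep searching from just after the previous hit until strpos() reports no match
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1;
    }
    return $positions;
}

$word = "aeagle";
$letter = "e";
print_r(strposall($word, $letter)); // positions 1 and 5
```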

Little bit more literal than the other answer:
function charpos($str, $char) {
$i = 0;
$pos = 0;
$matches = array();
if (strpos($str, $char) === false) {
return false;
}
while (!!$str) {
$pos = strpos($str, $char);
if ($pos === false) {
$str = '';
} else {
$i = $i + $pos;
$str = substr($str, $pos + 1);
array_push($matches, $i++);
}
}
return $matches;
}
https://ignite.io/code/511ff26eec221e0741000000
Using:
$str = 'abc is the place to be heard';
$positions = charpos($str, 'a');
print_r($positions);
while ($positions) {
$i = array_shift($positions);
echo "$i: $str[$i]\n";
}
Which gives:
Array (
[0] => 0
[1] => 13
[2] => 25
)
0: a
13: a
25: a

Others have pointed out that you're passing the wrong parameter. But you're also reinventing the wheel. Take a look at PHP's regular expression match-all function, preg_match_all(): it already returns an array of all matches with their offsets when used with the following flag.
flags
flags can be the following flag:
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
Use a single letter pattern for the search term $letter = '/e/' and you should get back an array with all your positions as the second element of each result array, which you can then finagle into the output format you're looking for.
Update: Jared points out that you do get the capture of the pattern back, but with the flag set, you also get the offset. As a direct answer to the OP's question, try this code:
$word = "aeagle";
$pattern = "/e/";
$matches = array();
preg_match_all($pattern, $word, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
It has the following output:
Array
(
// Matches of the whole pattern: /e/
[0] => Array
(
// First match
[0] => Array
(
// Substring of $word that matched
[0] => e
// Offset into $word where previous substring starts
[1] => 1
)
[1] => Array
(
[0] => e
[1] => 5
)
)
)
The result is three levels deep because preg_match_all() groups its matches by subpattern: $matches[0] holds the matches of the whole pattern, and $matches[1], $matches[2], and so on would hold the matches of each parenthesized group. This pattern has no groups, so only $matches[0] is present.
And unlike the OP originally stated, 1 and 5 are the correct indexes of the letter e in the string 'aeagle':
aeagle
012345
 ^   ^
 1   5
Performance wise, the customized version of strposall would probably be faster than a regular expression match. But learning to use an in-built function is almost always faster than developing, testing, supporting and maintaining your own code. And 9 times out of 10, that's the most expensive part of programming.
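If all you need is the flat list of positions, one way to get it out of the nested result is array_column(), which picks element 1 (the offset) from every match. This is a small sketch on top of the code above; $offsets is an illustrative name.

```php
<?php
$word = "aeagle";
$matches = array();
preg_match_all("/e/", $word, $matches, PREG_OFFSET_CAPTURE);

// each entry of $matches[0] is [matched text, offset]; keep only the offsets
$offsets = array_column($matches[0], 1);
print_r($offsets); // 1 and 5
```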

Split a string, remember the positions of splitting

Assume I have the following string:
I have | been very busy lately and need to go | to bed early
By splitting on "|", you get:
$arr = array(
[0] => I have
[1] => been very busy lately and need to go
[2] => to bed early
)
The first split is after 2 words, and the second split 8 words after that. The positions after how many words to split will be stored: array(2, 8, 3). Then, the string is imploded to be passed on to a custom string tagger:
tag_string('I have been very busy lately and need to go to bed early');
I don't know what the output of tag_string will be exactly, except that the total words will remain the same. Examples of output would be:
I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy
This lengthens the string by an unknown number of characters. I have no control over tag_string. What I know is (1) the number of words will be the same as before and (2) the array was split after 2 words and then after 8 more words. I now need a way to explode the tagged string into the same array as before:
$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
// split after 2nd, and thereafter after 8th word
}
With output:
$arr = array(
[0] => I have-nn
[1] => been-vb very-vb busy lately and-rr need to-r go
[2] => to bed early-p
)
So to be clear (I wasn't before): I cannot split by remembering the strpos, because strpos before and after the string went through the tagger, aren't the same. I need to count the number of words. I hope I have made myself more clear :)

You wouldn't want to count the number of words, you would want to count the string length (strlen). If it is the same string without the pipes, then you want to split it with substr after a certain amount.
$strCounts = array();
foreach ($arr as $item) {
$strCounts[] = strlen($item);
}
// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
$arr[] = substr($string, $i, $count);
$i += $count; // increment the start position by the length
}
I have not tested this, simply a "theory" and probably has some kinks to work out. There may be a better way to go about it, I just don't know it.

Interesting question, although I think the rope data structure still applies it might be a little overkill since word placement won't change. Here is my solution:
$str = "I have | been very busy lately and need to go | to bed early";
function get_breaks($str)
{
$breaks = array();
$arr = explode("|", $str);
foreach($arr as $val)
{
$breaks[] = str_word_count($val);
}
return $breaks;
}
$breaks = get_breaks($str);
echo "<pre>" . print_r($breaks, 1) . "</pre>";
$str = str_replace("|", "", $str);
function rebreak($str, $breaks)
{
$return = array();
$old_break = 0;
$arr = str_word_count($str, 1);
foreach($breaks as $break)
{
$return[] = implode(" ", array_slice($arr, $old_break, $break));
$old_break += $break;
}
return $return;
}
echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
Let me know if you have any questions, but it is pretty self explanatory. There are definitely ways to improve this as well.
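One caveat: str_word_count() only treats letters, apostrophes and hyphens as parts of a word, so a tag containing a digit would be split in two. If that is a concern, here is a sketch of the same idea that splits on whitespace instead. The function name split_by_word_counts() is made up for this example, and it assumes the tagger keeps words separated by whitespace.

```php
<?php
function split_by_word_counts(string $tagged, array $counts): array
{
    // anything between runs of whitespace counts as one word, tags included
    $words = preg_split('/\s+/', trim($tagged), -1, PREG_SPLIT_NO_EMPTY);

    $parts = [];
    $offset = 0;
    foreach ($counts as $count) {
        $parts[] = implode(' ', array_slice($words, $offset, $count));
        $offset += $count;
    }
    return $parts;
}

$tagged = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
$parts = split_by_word_counts($tagged, [2, 8, 3]);
print_r($parts);
```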

I'm not quite sure I understood what you actually wanted to achieve. But here are a couple of things that might help you:
str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does pretty much the same, but on UTF-8 strings.
strpos() finds the first occurrence of a string within another. You could easily find the positions of all | with this:
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
$positions[] = $pos;
}
I'm still not sure I understood why you can't just use explode() for this, though.
<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
$words[] = str_word_count($s);
}

Regex for number comparison?

I would like to use a regex that returns true/false depending on whether a 5-digit input matches a value stored in the database. The order of the digits doesn't matter, but the digits themselves must be exactly the same.
Eg: In database I have 12345
When I key a 5-digit value into the search, I want to find out whether it consists of exactly the digits in 12345.
If I key in 34152- it should return true
If I key in 14325- it should return true
If I key in 65432- it should return false
If I key in 11234- it should return false
Eg: In database I have 44512
If I key in 21454- it should return true
If I key in 21455- it should return false
How can I do this in PHP with a regex?

This is a way avoiding regex
<?php
function cmpkey($a,$b){
$aa = str_split($a); sort($aa);
$bb = str_split($b); sort($bb);
return ( implode("",$aa) == implode("",$bb));
}
?>
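A variation on the same idea, sketched here with count_chars() instead of sorting: two inputs match when every character occurs equally often in both. The function name same_digits() is made up for this example, and the values are the ones from the question.

```php
<?php
function same_digits(string $a, string $b): bool
{
    // count_chars(..., 1) maps each byte value to how often it occurs;
    // two arrays are == when they hold the same key/value pairs
    return count_chars($a, 1) == count_chars($b, 1);
}

var_dump(same_digits("12345", "34152")); // true
var_dump(same_digits("12345", "11234")); // false
var_dump(same_digits("44512", "21454")); // true
var_dump(same_digits("44512", "21455")); // false
```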

Well, it's not going to be a trivial regex, I can tell you that. You could do something like this:
$chars = count_chars($input, 1);
$numbers = array();
foreach ($chars as $char => $frequency) {
if (is_numeric(chr($char))) {
$numbers[chr($char)] = $frequency;
}
}
// that converts "11234" into array(1 => 2, 2 => 1, 3 => 1, 4 => 1)
Now, since MySQL doesn't support assertions in regex, you'll need to do this in multiple regexes:
$where = array();
foreach ($numbers AS $num => $count) {
$not = "[^$num]";
$regex = "^";
for ($i = 0; $i < $count; $i++) {
$regex .= "$not*$num";
}
$regex .= "$not*\$"; // anchor the end so that no further occurrence of $num can follow
$where[] = "numberField REGEXP '$regex'";
}
$where = '((' . implode(') AND (', $where).'))';
That'll produce:
(
(numberField REGEXP '^[^1]*1[^1]*1[^1]*$')
AND
(numberField REGEXP '^[^2]*2[^2]*$')
AND
(numberField REGEXP '^[^3]*3[^3]*$')
AND
(numberField REGEXP '^[^4]*4[^4]*$')
)
That should do it for you.
It's not pretty, but it should take care of all of the possible permutations for you, assuming that your stored data format is consistent...
But, depending on your needs, you should try to pull it out and process it in PHP. In which case the regex would be far simpler:
^(?=.*1.*1)(?=.*2)(?=.*3)(?=.*4)\d{5}$
Or, you could also pre-sort the number before you insert it. So instead of inserting 14231, you'd insert 11234. That way, you always know the sequence is ordered properly, so you just need to do numberField = '11234' instead of that gigantic beast above...
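For the PHP-side check, the lookahead pattern could be generated from the stored value rather than written by hand. The following is only a sketch: build_digit_pattern() is a hypothetical helper, and it assumes the stored value consists of digits only.

```php
<?php
function build_digit_pattern(string $stored): string
{
    $pattern = '^';
    foreach (count_chars($stored, 1) as $byte => $count) {
        // a digit that occurs twice becomes (?=.*1.*1), once becomes (?=.*1)
        $pattern .= '(?=' . str_repeat('.*' . chr($byte), $count) . ')';
    }
    // the input must also have exactly as many digits as the stored value
    return '/' . $pattern . '\d{' . strlen($stored) . '}$/';
}

$pattern = build_digit_pattern('44512');
var_dump(preg_match($pattern, '21454') === 1); // true
var_dump(preg_match($pattern, '21455') === 1); // false
```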

Try using
^(?=.*1)(?=.*2)(?=.*3)(?=.*4)(?=.*5).{5}$
This will get much more complicated, when you have duplicate numbers.
You really should not do this with regex. =)
