Find all substrings within a string with overlap - php

Hi, I'm trying to find all overlapping substrings in a string. Here is my code; it's only finding the non-overlapping ACA matches.
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";
echo preg_match_all("/$needle/", $haystack, $matches);

You're using echo to print the return value of preg_match_all. That is, you're displaying only the number of matches found. What you probably wanted to do was something like print_r($matches);, like this:
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";
preg_match_all("/$needle/", $haystack, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => ACA
[1] => ACA
[2] => ACA
)
)
If your real concern is that it counted ACACA only once, well, there are three things that need to be said about that:
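That's simply how regex matching works by default: each match consumes its characters, and the search resumes after the end of the previous match.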
You really shouldn't count this twice, as it's overlapping. It's not a true recurrence of the pattern.
That said, if you want to count that twice, you could do so with something like this:
echo preg_match_all("/(?=$needle)/", $haystack, $matches);
Output:
4
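If you also need to know where the overlapping matches start, the same zero-width lookahead can be combined with the PREG_OFFSET_CAPTURE flag. This is a minimal sketch assuming PHP 7+ (for the type declarations); the function name overlappingOffsets() is purely illustrative:

```php
<?php
// Return the start offset of every (possibly overlapping) occurrence of $needle.
// The lookahead (?=...) matches without consuming characters, so the engine
// can find the needle again starting inside a previous occurrence.
function overlappingOffsets(string $haystack, string $needle): array
{
    // preg_quote() escapes any regex metacharacters in the needle.
    $pattern = '/(?=' . preg_quote($needle, '/') . ')/';
    preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
    // Each entry of $matches[0] is [matched text (empty here), offset].
    return array_column($matches[0], 1);
}

print_r(overlappingOffsets("ACAAGACACATGCCACATTGTCC", "ACA"));
```

The number of offsets returned is the same 4 that the lookahead version of preg_match_all() counts.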

Here is a script to find all occurrences of a substring, including the overlapping ones.
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";
$positions = [];
$needle_len = strlen($needle);
$haystack_len = strlen($haystack);
for ($i = 0; $i <= $haystack_len - $needle_len; $i++) { // the last possible start is $haystack_len - $needle_len
if (substr($haystack, $i, $needle_len) === $needle) {
$positions[]=$i;
}
}
print_r($positions);
Output: Array ( [0] => 0 [1] => 5 [2] => 7 [3] => 14 )
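The same result can be had without testing every index, by letting strpos() jump to the next hit and restarting the search one character after it (rather than after the whole match, which would skip overlaps). A sketch assuming PHP 7+; strposAllOverlapping() is a hypothetical helper name:

```php
<?php
// Collect the start position of every occurrence of $needle, overlaps included.
function strposAllOverlapping(string $haystack, string $needle): array
{
    $positions = [];
    if ($needle === '') {
        return $positions; // an empty needle would match at every position
    }
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1; // advance by one character, not by strlen($needle)
    }
    return $positions;
}

print_r(strposAllOverlapping("ACAAGACACATGCCACATTGTCC", "ACA"));
```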

Related

Matching best similar array element

I have an array of keywords that I loop over with foreach, matching each element against a specific search term. For example, I have an array like:
Array(
[0] => polka dresses
[1] => polka clothes
[2] => polka dots dress
[3] => polka dots bottoms
)
And I search for the term polka in my array. It gives results when I use strpos or stristr (I also tried similar_text, but got no results).
Issue
If I search for polka it works, but if I accidentally type p0lka, it doesn't give any result.
Is there any way to achieve this?
If you want to get the most similar results for a typed word, then you can calculate the Levenshtein distance between the searched word and the stored words and return the results which have the least distance.
You can make use of PHP's levenshtein function for this.
PHP Snippet:
<?php
$data = array(
'polka dresses',
'polka clothes',
'polka dots dress',
'polka dots bottoms',
'dummy dummy'
);
function getSimilarMatches($sentences,$search_str){
$min_distance = -1;
$closest_matches = [];
foreach($sentences as $sentence){
$min_levenshtein_dist = -1;
foreach(explode(" ",$sentence) as $word){
$levenshtein_dist = levenshtein($word,$search_str);
if($min_levenshtein_dist == -1 || $min_levenshtein_dist > $levenshtein_dist){
$min_levenshtein_dist = $levenshtein_dist;
}
}
if($min_distance == -1 || $min_distance > $min_levenshtein_dist){
$min_distance = $min_levenshtein_dist;
$closest_matches = [];
$closest_matches[] = $sentence;
}else if($min_distance === $min_levenshtein_dist){
$closest_matches[] = $sentence;
}
}
return $closest_matches;
}
print_r(getSimilarMatches($data,'polka'));
print_r(getSimilarMatches($data,'p0lka'));
Demo: https://3v4l.org/E9gea
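A variant of the same Levenshtein idea, using a fixed cutoff instead of "smallest distance wins". The $maxDistance parameter and the function name matchesWithinDistance() are assumptions for illustration, not part of the answer above; a cutoff of 1 tolerates a single typo such as "p0lka":

```php
<?php
// Keep every sentence that contains at least one word within $maxDistance
// edits (insertions, replacements, deletions) of the search term.
function matchesWithinDistance(array $sentences, string $search, int $maxDistance = 1): array
{
    $search = strtolower($search);
    $hits = array_filter($sentences, function ($sentence) use ($search, $maxDistance) {
        foreach (explode(' ', strtolower($sentence)) as $word) {
            if (levenshtein($word, $search) <= $maxDistance) {
                return true;
            }
        }
        return false;
    });
    return array_values($hits); // reindex 0..n-1
}

$data = ['polka dresses', 'polka clothes', 'polka dots dress', 'polka dots bottoms', 'dummy dummy'];
print_r(matchesWithinDistance($data, 'p0lka'));
```

Unlike getSimilarMatches(), which always returns the sentences tied for the smallest distance, this version returns nothing when no word is close enough, so an unrelated search term does not pull in the "least bad" entries.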

PHP - Checking up on string with regex and push to array

I have two variables as follows:
$string1 = '/test/10/25';
$string2 = '/test/[0-9]+/[0-9]+';
Is it possible to make PHP compare these two strings and push the actual IDs (10 and 25) into an array like this, using the regex as some sort of guidance?
Array
(
[0] => 10
[1] => 25
)
I tried playing around with preg_match() but this just puts everything into the same array key.
This will work for you
<?php
$string1 = '/test/10/25';
$string2 = '/\/test\/([0-9]+)\/([0-9]+)/';
$matches = [];
preg_match($string2, $string1, $matches);
$array = [];
for($i = 1; $i < count($matches); $i ++)
{
array_push($array, $matches[$i]);
}
print_r($array);
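$array will have the matches inside. I had to change the regex string because it was not "valid" for PHP: preg_match() treats the first character as a delimiter, so the slashes inside the pattern must be escaped, and the parentheses create the capture groups that put 10 and 25 into separate entries of $matches.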
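A slightly shorter variant of the same approach, assuming the IDs are always plain digits: pick a delimiter that does not occur in the path (here '#') so nothing needs escaping, and take the capture groups with array_slice() instead of a loop. Note that the ^...$ anchors are an addition; they make the pattern reject paths with extra segments such as '/test/10/25/extra':

```php
<?php
$string1 = '/test/10/25';

// '#' as the delimiter means the slashes need no escaping;
// \d+ matches one or more ASCII digits, like [0-9]+.
$pattern = '#^/test/(\d+)/(\d+)$#';

$ids = [];
if (preg_match($pattern, $string1, $matches)) {
    // $matches[0] is the full match; the capture groups start at index 1.
    $ids = array_slice($matches, 1);
}
print_r($ids);
```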

similar substring in other string PHP

How to check substrings in PHP by prefix or postfix.
For example, I have the search string named as $to_search as follows:
$to_search = "abcdef";
And three cases to check whether each is a substring of $to_search:
$cases = ["abc def", "def", "deff", ... Other values ...];
Now I have to detect the first three cases using the substr() function.
How can I detect the "abc def", "def", "deff" as substring of "abcdef" in PHP.
You might find the Levenshtein distance between the two words useful - it'll have a value of 1 for abc def. However your problem is not well defined - matching strings that are "similar" doesn't mean anything concrete.
Edit - If you set the deletion cost to 0 then this very closely models the problem you are proposing. Just check that the levenshtein distance is less than 1 for everything in the array.
This will find if any of the strings inside $cases are a substring of $to_search.
foreach($cases as $someString){
if(strpos($to_search, $someString) !== false){
// $someString is found inside $to_search
}
}
Only "def" is, though, as none of the other strings appears inside "abcdef" as-is.
Also, on a side note: it's prefix and suffix, not postfix.
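To actually collect the matching cases rather than leave the loop body empty, the same strpos() test can drive array_filter(). This sketch uses only the three cases named in the question, since the rest of the list is elided:

```php
<?php
$to_search = "abcdef";
$cases = ["abc def", "def", "deff"];

// Keep only the cases that occur verbatim inside $to_search.
// strpos() returns 0 for a match at the very start, hence the !== false check.
$found = array_values(array_filter($cases, function ($case) use ($to_search) {
    return strpos($to_search, $case) !== false;
}));

print_r($found);
```

Only "def" survives, which matches the point above.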
To find any of the cases that begin or end with either the beginning or the end of the search string, I don't know of another way than to step through all of the possible beginning and ending combinations and check them. There's probably a better way to do this, but this should work.
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "otherabc", "noabcmatch", "nodefmatch"];
$matches = array();
$len = strlen($to_search);
for ($i=1; $i <= $len; $i++) {
// get the beginning and end of the search string of length $i
$pre_post = array();
$pre_post[] = substr($to_search, 0, $i);
$pre_post[] = substr($to_search, -$i);
foreach ($cases as $case) {
// get the beginning and end of each case of length $i
$pre = substr($case, 0, $i);
$post = substr($case, -$i);
// check if any of them match
if (in_array($pre, $pre_post) || in_array($post, $pre_post)) {
// using the case as the array key for $matches will keep it distinct
$matches[$case] = true;
}
}
}
// use array_keys() to get the keys back to values
var_dump(array_keys($matches));
You can use array_filter function like this:
$cases = ["cake", "cakes", "flowers", "chocolate", "chocolates"];
$to_search = "chocolatecake";
$search = strtolower($to_search);
$arr = array_filter($cases, function($val) use ($search) {
// lowercase the case, drop a trailing "s" and any spaces, then look for it in $search
return strpos($search, str_replace(' ', '', preg_replace('/s$/', '', strtolower($val)))) !== FALSE;
});
print_r($arr);
Output:
Array
(
[0] => cake
[1] => cakes
[3] => chocolate
[4] => chocolates
)
As you can see, it prints all the values you expected apart from deff, which is not part of the search string abcdef, as I commented above.

Find all the occurrence points of a letter within a string

I have the following code:
<?php
$word = "aeagle";
$letter = "e";
$array = strposall($aegle, $letter);
print_r($array);
function strposall($haystack, $needle) {
$occurrence_points = array();
$pos = strpos($haystack, $needle);
if ($pos !== false) {
array_push($occurrence_points, $pos);
}
while ($pos = strpos($haystack, $needle, $pos + 1)) {
array_push($occurrence_points, $pos);
}
return $occurrence_points;
}
?>
As in the example, if I have aegle as my word and I'm searching for e within it, the function should return an array with the values 1 and 4 in it.
What's wrong with my code?
Why not try this instead:
$word = "aeagle";
$letter = "e";
$occurrence_points = array_keys(array_intersect(str_split($word), array($letter)));
var_dump($occurrence_points);
I think you're passing the wrong argument: it should be $word instead of $aegle.
A little more literal than the other answer:
function charpos($str, $char) {
$i = 0;
$pos = 0;
$matches = array();
if (strpos($str, $char) === false) {
return false;
}
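while ($str !== '' && $str !== false) { // substr() gives '' (PHP 8) or false (PHP 7) once used up; a plain truthiness test would also stop on the string "0"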
$pos = strpos($str, $char);
if ($pos === false) {
$str = '';
} else {
$i = $i + $pos;
$str = substr($str, $pos + 1);
array_push($matches, $i++);
}
}
return $matches;
}
https://ignite.io/code/511ff26eec221e0741000000
Usage:
$str = 'abc is the place to be heard';
$positions = charpos($str, 'a');
print_r($positions);
while ($positions) {
$i = array_shift($positions);
echo "$i: $str[$i]\n";
}
Which gives:
Array (
[0] => 0
[1] => 13
[2] => 25
)
0: a
13: a
25: a
Others have pointed out that you're passing the wrong argument. But you're also reinventing the wheel. Take a look at PHP's regular-expression match-all function, preg_match_all(): it will already return an array of all matches with their offsets when used with the following flag.
flags
flags can be the following flag:
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
Use a single letter pattern for the search term $letter = '/e/' and you should get back an array with all your positions as the second element of each result array, which you can then finagle into the output format you're looking for.
Update: Jared points out that you do get the capture of the pattern back, but with the flag set, you also get the offset. As a direct answer to the OP's question, try this code:
$word = "aeagle";
$pattern = "/e/";
$matches = array();
preg_match_all($pattern, $word, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
It has the following output:
Array
(
// Full-pattern matches of /e/ (index 0; capture groups would follow)
[0] => Array
(
// First match
[0] => Array
(
// Substring of $word that matched
[0] => e
// Offset into $word where previous substring starts
[1] => 1
)
[1] => Array
(
[0] => e
[1] => 5
)
)
)
The results are 3D instead of 2D because preg_match_all() groups its results by capture group: index 0 holds the matches of the whole pattern, and indexes 1, 2 and so on would hold any parenthesized subpatterns. The pattern /e/ has no subpatterns, so all hits are in the first array.
And contrary to what the OP originally stated, 1 and 5 (not 1 and 4) are the correct indexes of the letter e in the string 'aeagle':
aeagle
012345
 ^   ^
 1   5
Performance-wise, the customized version of strposall() would probably be faster than a regular-expression match. But using a built-in function almost always saves more time than developing, testing, supporting and maintaining your own code, and 9 times out of 10 that's the most expensive part of programming.
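To "finagle" the nested PREG_OFFSET_CAPTURE result into the flat list of positions the question asked for, one option (assuming PHP 5.5+ for array_column()) is to pick out index 1 of each [text, offset] pair:

```php
<?php
$word = "aeagle";
$matches = [];
preg_match_all('/e/', $word, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] holds one [text, offset] pair per hit of the whole pattern;
// array_column() extracts the offsets (index 1) as a flat list.
$positions = array_column($matches[0], 1);
print_r($positions);
```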

How do I display only match keyword in array

This is my array output
Array
(
[0] => Array
(
[tweet_text] => Fedora 16 "Verne" released! http://t.co/lECbdzE0 #Fedora #Linux
)
[1] => Array
(
[tweet_text] => Ubuntu 11.10 "Oneiric Ocelot" released! #Ubuntu #Linux
)
)
For example, to find the Ubuntu keyword: how do I filter the current array to show only
Array ( [1] => Array (
[tweet_text] => Ubuntu 11.10 "Oneiric Ocelot" released! #Ubuntu #Linux
)
)
The code
$keywords = array('Ubuntu');
foreach ($keywords as &$keyword) {
$keyword = preg_quote($keyword);
}
$regex = "/(" . implode('|', $keywords) . ")/";
$check = preg_match($regex, $anArray);
if($check == 1) {
// here I want to display only Ubuntu
}
Let me know
preg_grep(): returns the array entries that match the pattern.
Example:
$arr = array('k'=>'ubuntu', 'j'=>'ubuntu1', 'n'=>'fedorra');
$matches = preg_grep('/ubuntu/i', $arr);
If your original source is a multi-dimensional array, you can try:
$cmp = array();
foreach ($src as $key=>$arr)
{
$cmp[$key] = $arr['tweet_text'];
}
$matches = preg_grep('/ubuntu/i', $cmp);
// $matches will be an associative array containing the matches,
// using the same index keys as $src
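To get the full rows back in exactly the shape the question asks for, the keys that preg_grep() preserves can be fed to array_intersect_key(). A sketch using the two tweets from the question as sample data, with array_map() standing in for the foreach loop (with a single input array it keeps the keys):

```php
<?php
$src = [
    ['tweet_text' => 'Fedora 16 "Verne" released! http://t.co/lECbdzE0 #Fedora #Linux'],
    ['tweet_text' => 'Ubuntu 11.10 "Oneiric Ocelot" released! #Ubuntu #Linux'],
];

// Project each row onto its text; keys 0 and 1 are kept.
$texts = array_map(function ($row) {
    return $row['tweet_text'];
}, $src);

// preg_grep() returns only the matching entries, with their original keys...
$hits = preg_grep('/ubuntu/i', $texts);

// ...so those keys can pull the complete rows back out of $src.
$filtered = array_intersect_key($src, $hits);
print_r($filtered);
```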
There are a few approaches here. Basically, you could start from your current function, preg_match(). Note that its subject parameter must be a string, so you can't pass an array there; for now I'll guess you are passing a string under the wrong variable name.
You can also use it to save the matches found, like so:
$check = preg_match($regex, $string, $matches);
print_r($matches);
If it is an array, you should loop through it and treat each element as a single subject (there are better ways, but this builds on the approach you are already using, and I'm trying to show how to do it better):
// Your code ....
foreach($anArray as $rule) {
$check = preg_match($regex, $rule['tweet_text'], $matches); // test one row's text, not the whole array
if($check == 1) {
print_r($matches);
}
}
