strpos() with multiple needles? - php

I am looking for a function like strpos() with two significant differences:
To be able to accept multiple needles. I mean thousands of needles at once.
To search for all occurrences of the needles in the haystack and to return an array of starting positions.
Of course it has to be an efficient solution not just a loop through every needle. I have searched through this forum and there were similar questions to this one, like:
Using an array as needles in strpos
Define multiple needles using stripos
Can't search an array in PHP in_array for the presence of multiple needles
but neither of them was what I am looking for. I am using strpos just to illustrate my question better; probably something entirely different has to be used for this purpose.
I am aware of Zend_Search_Lucene and I am interested if it can be used to achieve this and how (just the general idea)?
Thanks a lot for Your help and time!

Try preg_match() with multiple alternatives:
if (preg_match('/word|word2/i', $str))
Checking for multiple strpos values
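The preg_match() call above only reports whether any needle occurs. A minimal sketch of how the same alternation idea could return every starting position, assuming PHP 7.4+ and a hypothetical helper name preg_positions (not from the original answer). Note that the regex engine reports non-overlapping matches only, and a pattern built from thousands of needles may run into PCRE limits:

```php
<?php
// Hypothetical helper: start offsets of every (non-overlapping) needle match.
function preg_positions(string $haystack, array $needles): array
{
    if ($needles === []) {
        return []; // an empty alternation would match at every offset
    }
    // Escape each needle so it is treated literally inside the pattern.
    $alternatives = array_map(fn($n) => preg_quote((string)$n, '/'), $needles);
    $pattern = '/' . implode('|', $alternatives) . '/i';

    // PREG_OFFSET_CAPTURE turns each match into [matchedText, byteOffset].
    preg_match_all($pattern, $haystack, $found, PREG_OFFSET_CAPTURE);

    return array_column($found[0], 1);
}

print_r(preg_positions('one two three two', ['two', 'three']));
```

The offsets are byte offsets, which matters for multibyte text.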

Here's some sample code for my strategy:
function strpos_array($haystack, $needles, $offset=0) {
$matches = array();
//Avoid the obvious: when haystack or needles are empty, return no matches
if(empty($needles) || empty($haystack)) {
return $matches;
}
$haystack = (string)$haystack; //Pre-cast non-string haystacks
$haylen = strlen($haystack);
//Allow negative (from end of haystack) offsets
if($offset < 0) {
$offset += $haylen;
}
//Wrap a single needle in an array so the loop below handles both cases
if(!is_array($needles)) {
$needles = array($needles);
}
$needles = array_unique($needles); //Not necessary if you are sure all needles are unique
//Precalculate needle lengths to save time
foreach($needles as &$origNeedle) {
$origNeedle = array((string)$origNeedle, strlen($origNeedle));
}
unset($origNeedle); //Break the reference so later code cannot modify the last needle by accident
//Find matches
for(; $offset < $haylen; $offset++) {
foreach($needles as $needle) {
list($needle, $length) = $needle;
if($needle === substr($haystack, $offset, $length)) { //Strict comparison: "1e1" == "10" is true in PHP
$matches[] = $offset;
break;
}
}
}
return($matches);
}
I've implemented a simple brute force method above that will work with any combination of needles and haystacks (not just words). For possibly faster algorithms check out:
Aho–Corasick string matching algorithm
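For reference, here is a compact sketch of what an Aho-Corasick search could look like in PHP. It is an illustration only, not code from the original answer: the function name aho_corasick_search and the [position, needle] return shape are assumptions, and it works on bytes rather than multibyte characters. The haystack is scanned once, regardless of how many needles there are:

```php
<?php
// Sketch of Aho-Corasick multi-pattern search (illustrative, byte-based).
// Returns a list of [startPosition, needle] pairs, including overlapping matches.
function aho_corasick_search(string $haystack, array $needles): array
{
    $goto = [[]]; // $goto[state][char] => next state; state 0 is the root
    $out  = [[]]; // $out[state] => needles recognised when this state is reached

    // 1. Build a trie from the needles.
    foreach (array_unique($needles) as $needle) {
        $needle = (string)$needle;
        if ($needle === '') {
            continue; // empty needles would match everywhere, skip them
        }
        $state = 0;
        for ($i = 0, $n = strlen($needle); $i < $n; $i++) {
            $ch = $needle[$i];
            if (!isset($goto[$state][$ch])) {
                $goto[] = [];
                $out[]  = [];
                $goto[$state][$ch] = count($goto) - 1;
            }
            $state = $goto[$state][$ch];
        }
        $out[$state][] = $needle;
    }

    // 2. Breadth-first pass to compute failure links (longest proper suffix
    //    of the current path that is also a path from the root).
    $fail  = array_fill(0, count($goto), 0);
    $queue = array_values($goto[0]);
    for ($head = 0; $head < count($queue); $head++) {
        $state = $queue[$head];
        foreach ($goto[$state] as $ch => $next) {
            $queue[] = $next;
            $f = $fail[$state];
            while ($f !== 0 && !isset($goto[$f][$ch])) {
                $f = $fail[$f];
            }
            $fail[$next] = $goto[$f][$ch] ?? 0;
            // Inherit the needles that end at the failure target.
            $out[$next] = array_merge($out[$next], $out[$fail[$next]]);
        }
    }

    // 3. Scan the haystack once, following failure links on mismatches.
    $matches = [];
    $state = 0;
    for ($i = 0, $n = strlen($haystack); $i < $n; $i++) {
        $ch = $haystack[$i];
        while ($state !== 0 && !isset($goto[$state][$ch])) {
            $state = $fail[$state];
        }
        $state = $goto[$state][$ch] ?? 0;
        foreach ($out[$state] as $needle) {
            $matches[] = [$i - strlen($needle) + 1, $needle];
        }
    }
    return $matches;
}

print_r(aho_corasick_search('ushers', ['he', 'she', 'his', 'hers']));
```

Whether this beats the brute-force version in plain PHP depends on the data, so it is worth profiling with realistic needles and haystacks.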
Other Solution
function strpos_array($haystack, $needles, $theOffset=0) {
$matches = array();
if(empty($haystack) || empty($needles)) {
return $matches;
}
$haylen = strlen($haystack);
if($theOffset < 0) { // Support negative offsets
$theOffset += $haylen;
}
foreach($needles as $needle) {
$needlelen = strlen($needle);
$offset = $theOffset;
while(($match = strpos($haystack, $needle, $offset)) !== false) {
$matches[] = $match;
$offset = $match + $needlelen;
if($offset >= $haylen) {
break;
}
}
}
return $matches;
}

I know this doesn't answer the OP's question but wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution to do so (again, this isn't specific to the OP's question - sorry):
$img_formats = array('.jpg','.png');
$missing = array();
foreach ( $img_formats as $format )
if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
if (count($missing) == 2)
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
If 2 items are added to the $missing array that means that the input doesn't satisfy any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:
function m_stripos( $haystack = null, $needles = array() ){
//return early if missing arguments
if ( !$needles || !$haystack ) return false;
// create an array to evaluate at the end
$missing = array();
//Loop through needles array, and add to $missing array if not satisfied
foreach ( $needles as $needle )
if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
//If the count of $missing and $needles is equal, we know there were no matches, return false..
if (count($missing) == count($needles)) return false;
//If we're here, be happy, return true...
return true;
}
Back to our first example, this time using the function instead:
$needles = array('.jpg','.png');
if ( !m_stripos( $post['timer_background_image'], $needles ) )
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
Of course, what you do after the function returns true or false is up to you.

It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:
$words = str_word_count($str, 2);
$word_position_map = array();
foreach($words as $position => $word) {
if(!isset($word_position_map[$word])) {
$word_position_map[$word] = array();
}
$word_position_map[$word][] = $position;
}
// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));
Storing the information (like the needles) in the right format will improve the runtime ( e.g. as you don't have to call array_flip).
Note from the str_word_count documentation:
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
So make sure you set the locale right.

You could use a regular expression, since regular expressions support alternation (OR). This would, however, make it fairly slow compared to strpos.

How about a simple solution using array_map()?
$string = 'one two three four';
$needles = array( 'five' , 'three' );
$strpos_arr = array_map( function ( $check ) use ( $string ) {
return strpos( $string, $check );
}, $needles );
As a result, you get an array whose keys are the needles' indexes and whose values are the starting positions, or false when a needle was not found.
//print_r( $strpos_arr );
Array
(
[0] =>
[1] => 8
)
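Because strpos() returns false for a missing needle and 0 is a valid position, it may help to key the result by needle and filter with a strict comparison. A small sketch along those lines (the extra needle 'one' is added here for illustration, and arrow functions need PHP 7.4+):

```php
<?php
$string  = 'one two three four';
$needles = ['five', 'three', 'one'];

// Map every needle to its first position (or false when it is absent).
$positions = array_combine(
    $needles,
    array_map(fn($needle) => strpos($string, $needle), $needles)
);

// Keep only the needles that were found; "!== false" keeps position 0.
$found = array_filter($positions, fn($pos) => $pos !== false);

var_export($found);
```

Like the original snippet, this reports only the first occurrence of each needle.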

Related

Filter an array of words by an array of single letters it should ONLY contain

I currently have two PHP arrays:
array('a','c','r','r')
array('carr','car','arc','ra','c','abc','do','aa','rr')
My desired result is:
array('carr','car','arc','ra','c','rr')
i.e. filtering out 'abc','do','aa' because I wish to filter out words that:
don't contain a, c or r
contain more/multiple a,c,r than I have in array 1.
I have tried array_filter() but I don't seem to be able to make it work.
One way to do this:
Count how many times each letter occurs in your first array, using array_count_values.
Then in your array_filter callback function, split each word into individual letters, and also count how many times each of them occurs. Then loop over those counted letters. If the current letter does not occur in your letter-count-array, or its count is greater than that in your letter-count-array, return false.
$letters = ['a','c','r','r'];
$words = ['carr','car','arc','ra','c','abc','do','aa','rr'];
$letterCounts = array_count_values($letters);
$filtered = array_filter($words, function($word) use ($letterCounts) {
$wordLetterCounts = array_count_values(mb_str_split($word));
foreach($wordLetterCounts as $wordLetter => $count) {
if(!isset($letterCounts[$wordLetter]) || $letterCounts[$wordLetter] < $count) {
return false;
}
}
return true;
});
var_dump($filtered);
As you iterate the array of words, you can iterate the array of letters and make single-letter replacements. If all letters in the word are consumed, the word is saved.
A regular expression isn't actually necessary because the letter is literal, but preg_replace() offers a limiting parameter and str_replace() doesn't.
Code:
$needles = ['a','c','r','r'];
$haystacks = ['carr','car','arc','ra','c','abc','do','aa','rr', 'rrr'];
$result = [];
foreach ($haystacks as $i => $haystack) {
foreach ($needles as $needle) {
$haystack = preg_replace("/$needle/", '', $haystack, 1);
}
if (!$haystack) {
$result[] = $haystacks[$i];
}
}
var_export($result);
The above can actually be boiled down to this:
$regexes = array_map(fn($v) => "/$v/", $needles);
var_export(
array_filter(
$haystacks,
fn($hay) => !preg_replace($regexes, '', $hay, 1)
)
);

How to search combination of numbers in array

I need to search for input numbers among the number combinations in an array and print all the matches.
My array looks like this:
$ar = ['01-05-24-30-35-36', '25-27-32-34-37-42', '11-17-18-22-33-41'];
Here are the input and logic:
Given A: 01-05-24-30-35-36 (true, because it matches the exact combination numbers on array[0])
Given B: 05-30-01-36-35-24 (true, because the given 6 numbers are all present on array[0], different number position)
Given C: 01-05-24-30-35-33 (false, because the given 6 numbers are not present in one of the combinations of numbers in the array, even if the first 5 numbers are present but the last(33) is not then it will become false)
Thanks in advance for the help.
Here is an implementation of @Sammitch's excellent suggestion:
<?php
$ar = ['01-05-24-30-35-36', '25-27-32-34-37-42', '11-17-18-22-33-41'];
function doesItMatch($arg1) {
global $ar;
$in = $arg1;
$inA = explode("-", $in);
sort($inA);
$inB = implode("-", $inA);
foreach ($ar as $elem) {
if ($elem == $inB) {
echo("found match for $arg1 : $elem\n");
return;
}
}
echo("found NO match for $arg1 !!!\n");
}
doesItMatch("01-05-24-30-35-36");
doesItMatch("05-30-01-36-35-24");
doesItMatch("01-05-24-30-35-33");
?>
Output:
found match for 01-05-24-30-35-36 : 01-05-24-30-35-36
found match for 05-30-01-36-35-24 : 01-05-24-30-35-36
found NO match for 01-05-24-30-35-33 !!!
Adapt into your overall code as required.
I'm just adding this as it is code I already had and it does what you want. I think it's a little more versatile than the other answer.
This can lookup :
A string against a string
A string against an array of strings
An array of strings against another array of strings
An array of strings against a string.
It returns an array with your needles as the keys and true or false as the value depending if it has been found or not.
You can add a third parameter to change the delimiter if you need something else than a dash.
<?php
$ar = ['01-05-24-30-35-36', '25-27-32-34-37-42', '11-17-18-22-33-41'];
$ar2 = ['01-05-24-30-35-36', '05-30-01-36-35-24', '01-05-24-30-35-33'];
var_dump(sortAndMatch($ar2, $ar));
/*
* array (size=3)
* '01-05-24-30-35-36' => boolean true
* '05-30-01-36-35-24' => boolean true
* '01-05-24-30-35-33' => boolean false
*/
/*
* Sort 2 strings or arrays of strings and try to find $needles into $haystack.
* Returns array($needle => bool);
* $array[$needle] is true when it's found.
* $array[$needle] is false when it isn't.
*/
function sortAndMatch($needles, $haystack, $delimiter = '-'){
//Sort haystack
foreach ((array)$haystack as $k => $combination){
$haystack[$k] = explode($delimiter, $combination);
sort($haystack[$k]);
}
//Sort and compare needles, builds $results
foreach((array)$needles as $k => $needle){
$needle= explode($delimiter, $needle);
sort($needle);
$results[$needles[$k]] = false;
if(array_search($needle, $haystack) !== false){
$results[$needles[$k]] = true;
}
}
return $results;
}
Your haystack set of numbers is already sorted, so that can remain unchanged.
You only need to explode, sort, and reimplode the needle numbers before checking against the haystack strings (this is what in_array() is for).
Code:
function isMatch($haystack, $needle) {
$nums = explode('-', $needle); // explode
sort($nums); // sort
$needle = implode('-', $nums); // implode
return in_array($needle, $haystack); // assess
}
$matches = ['01-05-24-30-35-36', '25-27-32-34-37-42', '11-17-18-22-33-41']; // already sorted
$givens = ['01-05-24-30-35-36', '05-30-01-36-35-24', '01-05-24-30-35-33'];
foreach ($givens as $given) {
echo "$given : " , (isMatch($matches, $given) ? 'true' : 'false') , "\n";
}
Output:
01-05-24-30-35-36 : true
05-30-01-36-35-24 : true
01-05-24-30-35-33 : false
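If many inputs have to be checked against the same list, one possible variation is to normalize the list once into a lookup table and test each input with isset(). This is only a sketch: the helper names normalizeCombination and hasCombination are made up for illustration, and it assumes zero-padded numbers as in the question:

```php
<?php
// Hypothetical helper: canonical form of a combination (numbers sorted, re-joined).
function normalizeCombination(string $combination, string $delimiter = '-'): string
{
    $nums = explode($delimiter, $combination);
    sort($nums, SORT_STRING); // zero-padded numbers sort correctly as strings
    return implode($delimiter, $nums);
}

// Hypothetical helper: constant-time membership test against the lookup table.
function hasCombination(array $lookup, string $given): bool
{
    return isset($lookup[normalizeCombination($given)]);
}

$ar = ['01-05-24-30-35-36', '25-27-32-34-37-42', '11-17-18-22-33-41'];

// Build the table once: normalized combination => original index.
$lookup = array_flip(array_map('normalizeCombination', $ar));

foreach (['01-05-24-30-35-36', '05-30-01-36-35-24', '01-05-24-30-35-33'] as $given) {
    echo $given, ' : ', hasCombination($lookup, $given) ? 'true' : 'false', "\n";
}
```

Normalizing the stored combinations as well means the list does not have to be sorted in advance.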

How to check array data that matches from random characters in php?

I have an array like below:
$fruits = array("apple","orange","papaya","grape");
I have a variable like below:
$content = "apple";
I need to filter some condition like: if this variable matches at least one of the array elements, do something. The variable, $content, is a bunch of random characters that is actually one of these available in the array data like below:
$content = "eaplp"; // dynamically shuffled characters from the actual word "apple"
What I have done so far is below:
$countcontent = count($content);
for($a=0;$a==count($fruits);$a++){
$countarr = count($fruits[$a]);
if($content == $fruits[$a] && $countcontent == $countarr){
echo "we got".$fruits[$a];
}
}
I tried to count how many letters these phrases had and do like if...else... when the total word in string matches with the total word on one of array data, but is there something that we could do other than that?
We can check if an array contains some value with in_array. So you can check if your $fruits array contains the string "apple" with,
in_array("apple", $fruits)
which returns a boolean.
If the order of the letters is random, we can sort the string alphabetically with this function:
function sorted($s) {
$a = str_split($s);
sort($a);
return implode($a);
}
Then map this function to your array and check if it contains the sorted string.
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
$inarr = in_array(sorted($content), array_map("sorted", $fruits));
var_dump($inarr);
//bool(true)
Another option is array_search. The benefit from using array_search is that it returns the position of the item (if it's found in the array, else false).
$pos = array_search(sorted($content), array_map("sorted", $fruits));
echo ($pos !== false) ? "$fruits[$pos] found." : "not found.";
//apple found.
This will also work but in a slightly different manner.
I split the strings into arrays and sort them to match each other.
Then I use array_slice to compare only as many characters as $content has; if they match, it's a match.
This means it will also match, in a "loose" way, strings such as "apple juice" or "applecurd".
Not sure this is wanted but figured it could be useful for someone.
$fruits = array("apple","orange","papaya","grape","apple juice", "applecurd");
$content = "eaplp";
$content = str_split($content);
$count = count($content);
foreach($fruits as $fruit){
$arr_fruit = str_split($fruit);
// sort $content to match order of $arr_fruit
$SortCont = array_merge(array_intersect($arr_fruit, $content), array_diff($content, $arr_fruit));
// if the first n characters match call it a match
if(array_slice($SortCont, 0, $count) == array_slice($arr_fruit, 0, $count)){
echo "match: " . $fruit ."\n";
}
}
output:
match: apple
match: apple juice
match: applecurd
https://3v4l.org/hHvp3
It is also comparable in speed with t.m.adams answer. Sometimes faster sometimes slower, but note how this code can find multiple answers. https://3v4l.org/IbuuD
I think this is the simplest way to answer that question. Some of the algorithms above seem to be "overkill".
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
foreach ($fruits as $key => $fruit) {
$fruit_array = str_split($fruit); // split the string into array
$content_array = str_split($content); // split the content into array
// check if there's no difference between the 2 new array
if ( sizeof(array_diff($content_array, $fruit_array)) === 0 ) {
echo "we found the fruit at key: " . $key;
return;
}
}
What about using only native PHP functions.
$index = array_search(count_chars($content), array_map('count_chars', $fruits));
If $index is not false (compare with !== false, because a match at position 0 is falsy), it is the position of $content inside $fruits.
P.S. Be aware that count_chars might not be the fastest approach to that problem.
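A short sketch of how the count_chars() idea might be used end to end. The $signature closure is an illustrative name; mode 1 of count_chars() is used here so that only bytes that actually occur are compared:

```php
<?php
$fruits  = ['apple', 'orange', 'papaya', 'grape'];
$content = 'eaplp';

// Character histogram: byte value => number of occurrences (mode 1).
$signature = fn(string $s): array => count_chars($s, 1);

// Two strings are anagrams when their histograms are equal.
$index = array_search($signature($content), array_map($signature, $fruits));

// array_search() returns false on failure, and 0 is a valid index,
// so the check has to be strict.
if ($index !== false) {
    echo "matched {$fruits[$index]} at index $index\n";
} else {
    echo "no match\n";
}
```

Here the match is at index 0, which is exactly the case a loose check such as if ($index) would miss.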
With a random token to search for a value in your array, you have a problem with false positives. That can give misleading results depending on the use case.
On search cases, for example wrong typed words, I would implement a filter solution which produces a matching array. One could sort the results by calculating the levenshtein distance to fetch the most likely result, if necessary.
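As a rough sketch of the ranking idea, the candidates can be ordered by their Levenshtein distance to the input. The mistyped input 'aple' and the variable names are illustrative assumptions; array_key_first() needs PHP 7.3+:

```php
<?php
$fruits  = ['apple', 'orange', 'papaya', 'grape'];
$content = 'aple'; // a mistyped word, assumed for illustration

// Edit distance from the input to every candidate.
$distances = [];
foreach ($fruits as $fruit) {
    $distances[$fruit] = levenshtein($content, $fruit);
}

// Smallest distance first; asort() keeps the fruit names as keys.
asort($distances);

$best = array_key_first($distances);
echo "Most likely match: $best (distance {$distances[$best]})\n";
```

In practice a maximum acceptable distance would probably be needed, so that unrelated input does not always produce some "best" match.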
String length solution
Very easy to implement.
False positives: Nearly every string with the same length like apple and grape would match.
Example:
$matching = array_filter($fruits, function ($fruit) use ($content) {
return strlen($content) == strlen($fruit);
});
if (count($matching)) {
// do your stuff...
}
Regular expression solution
It compares string length and in a limited way containing characters. It is moderate to implement and has a good performance on big data cases.
False positives: A content like abc would match bac but also bbb.
Example:
$matching = preg_grep(
'#['.preg_quote($content).']{'.strlen($content).'}#',
$fruits
);
if (count($matching)) {
// do your stuff...
}
Alphanumeric sorting solution
Most accurate but also a slow approach concerning performance using PHP.
False positives: A content like abc would match on bac or cab.
Example:
$normalizer = function ($value) {
$tokens = str_split($value);
sort($tokens);
return implode($tokens);
};
$matching = array_filter($fruits, function ($fruit) use ($content, $normalizer) {
return ($normalizer($fruit) == $normalizer($content));
});
if (count($matching)) {
// do your stuff...
}
Here's a clean approach. Returns unscrambled value early if found, otherwise returns null. Only returns an exact match.
function sortStringAlphabetically($stringToSort)
{
$splitString = str_split($stringToSort);
sort($splitString);
return implode($splitString);
}
function getValueFromRandomised(array $dataToSearch, $dataToFind)
{
$sortedDataToFind = sortStringAlphabetically($dataToFind);
foreach ($dataToSearch as $value) {
if (sortStringAlphabetically($value) === $sortedDataToFind) {
return $value;
}
}
return null;
}
$fruits = ['apple','orange','papaya','grape'];
$content = 'eaplp';
$dataExists = getValueFromRandomised($fruits, $content);
var_dump($dataExists);
// string(5) "apple"
Not found example:
$content = 'eaplpo';
$dataExists = getValueFromRandomised($fruits, $content);
var_dump($dataExists);
// NULL
Then you can use it (eg) like this:
echo !empty($dataExists) ? $dataExists . ' was found' : 'No match found';
NOTE: This is case sensitive, meaning it won't find "Apple" from "eaplp". That can be resolved by applying strtolower() to both values compared inside the loop.
How about looping through the array, and using a flag to see if it matches?
$flag = false;
foreach($fruits as $fruit){
if($fruit == $content){
$flag = true;
}
}
if($flag == true){
//do something
}
I like t.m.adams answer but I also have a solution for this issue:
array_search_random(string $needle, array $haystack [, bool $strictcase = FALSE ]);
Description: Searches a string in array elements regardless of the position of the characters in the element.
needle: the characters you are looking for as a string
haystack: the array you want to search
strictcase: if set to TRUE needle 'mood' will match 'mood' and 'doom' but not 'Mood' and 'Doom', if set to FALSE (=default) it will match all of these.
Function:
function array_search_random($needle, $haystack, $strictcase=false){
if($strictcase === false){
$needle = strtolower($needle);
}
$needle = str_split($needle);
sort($needle);
$needle = implode($needle);
foreach($haystack as $straw){
if($strictcase === false){
$straw = strtolower($straw);
}
$straw = str_split($straw);
sort($straw);
$straw = implode($straw);
if($straw == $needle){
return true;
}
}
return false;
}
if(in_array("apple", $fruits)){
// true statement
}else{
// else statement
}

Check if string contains all values in array with php

What I have so far:
$searchQuery = "keyword1 keyword2";
$searchArray = explode(" ", $searchQuery);
$stringToCheck = "keyword1 keyword3 keyword2";
foreach ($searchArray as $searchKeyword) {
if (strpos($stringToCheck, $searchKeyword) !== false) :
//do something
endif;
}
What I want is to display something only if ALL values in the search query are found in a string. With my current code above, if the string contains keyword1 and keyword2, it "does something" twice, once for each match. It also comes up as true if the string only contains keyword1 but not keyword2, in that case it displays the content once.
Solution
function str_contains_all($haystack, array $needles) {
foreach ($needles as $needle) {
if (strpos($haystack, $needle) === false) {
return false;
}
}
return true;
}
Usage:
$haystack = 'foo, bar, baz';
$needles = array('foo', 'bar', 'baz');
if (str_contains_all($haystack, $needles)) {
echo "Success\n";
// ...
}
Notes
Since you specified that you only wish to perform an action when the "haystack" string contains all the substring "needles", it is safe to return false as soon as you discover a needle that is not in the haystack.
Other
Using if (...): /* ... */ endif; is fairly uncommon in PHP from my experience. I think most developers would prefer the C-style if (...) { /* ... */ } syntax.
For anyone who might need another way. I came up with this while looking for a solution.
Ensure what's to be tested are both arrays;
$var1 = explode(' ',"a string to be turned to an array");
$var2 = ["test","a","string","array"];
Get the counts (or lengths) of both arrays;
$var1count = count($var1);
$var2count = count($var2);
//Get the intersection of both arrays with array_intersect() method
$intersect = array_intersect($var1,$var2);
//check if the count (or length) of the intersect is the same as the test var (eg $var1)
if(count($intersect)===$var1count)//it has all the words
FYI: case sensitive. You might want to convert all to lower or upper case
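A case-insensitive variation of the word-based check could look like the sketch below. The function name contains_all_words is made up for illustration, and array_diff() is used instead of comparing counts so that duplicate words do not skew the result. Unlike the strpos() solution, it matches whole words only:

```php
<?php
// Hypothetical helper: true when every word in $words occurs in $text,
// compared case-insensitively and as whole whitespace-separated words.
function contains_all_words(string $text, array $words): bool
{
    $textWords = preg_split('/\s+/', strtolower(trim($text)));
    $wanted    = array_map('strtolower', $words);

    // Any wanted word that is not among the text's words ends up in the diff.
    return array_diff($wanted, $textWords) === [];
}

var_dump(contains_all_words('Keyword1 keyword3 KEYWORD2', ['keyword1', 'keyword2'])); // bool(true)
var_dump(contains_all_words('Keyword1 keyword3 KEYWORD2', ['keyword1', 'keyword4'])); // bool(false)
```

strtolower() only lowercases ASCII letters reliably; for other alphabets mb_strtolower() would be the closer fit.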

PHP count of occurrences of characters of a string within another string

Let's say I have two strings.
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
I want to count how often characters that are in $needle occur in $haystack. In $haystack, there are the characters 'A' (twice), 'X', 'Y' and 'Z', all of which are in the needle, thus the result is supposed to be 5 (case-sensitive).
Is there any function for that in PHP or do I have to program it myself?
Thanks in advance!
You can calculate the length of the original string and the length of the string without these characters. The difference between them is the number of matches.
Basically,
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
Here is the part that does the work. In one line.
$count = strlen($haystack) - strlen(str_replace(str_split($needle), '', $haystack));
Explanation: The first part is self-explanatory. The second part is the length of the string without the characters in the $needle string. This is done by replacing each occurrences of any characters inside the $needle with a blank string.
To do this, we split $needle into an array, one character for each item, using str_split. Then we pass it to str_replace, which replaces each occurrence of any item in the $search array with a blank string.
Echo it out,
echo "Count = $count\n";
you get:
Count = 5
Try this;
function count_occurences($char_string, $haystack, $case_sensitive = true){
if($case_sensitive === false){
$char_string = strtolower($char_string);
$haystack = strtolower($haystack);
}
$characters = str_split($char_string);
$character_count = 0;
foreach($characters as $character){
$character_count = $character_count + substr_count($haystack, $character);
}
return $character_count;
}
To use;
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
print count_occurences($needle, $haystack);
You can set the third parameter to false to ignore case.
There's no built-in function that handles character sets, but you can simply use the substr_count function in a loop like this:
<?php
$sourceCharacters = str_split('AGUXYZ');
$targetString = 'Agriculture ID XYZ-A';
$occurrenceCount = array();
foreach($sourceCharacters as $currentCharacter) {
$occurrenceCount[$currentCharacter] = substr_count($targetString, $currentCharacter);
}
print_r($occurrenceCount);
?>
There is no specific method to do this, but this built in method can surely help you:
$count = substr_count($haystack , $needle);
edit: I just reported the general substr_count method. In your particular case you need to call it for each character inside $needle (thanks @Alan Whitelaw)
If you are not interested in the character distribution, you could use a Regex
echo preg_match_all("/[$needle]/", $haystack, $matches);
which returns the number of full pattern matches (which might be zero), or FALSE if an error occurred. The solution offered by @thai above should be significantly faster though.
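One caveat with interpolating $needle directly into a character class is that characters such as ], ^, - or / change the meaning of the pattern. A hedged sketch that escapes the needle first (count_chars_in_set is an illustrative name, not an existing PHP function):

```php
<?php
// Hypothetical helper: counts haystack characters that belong to the needle's
// character set, escaping regex metacharacters so they are matched literally.
function count_chars_in_set(string $needle, string $haystack): int
{
    if ($needle === '') {
        return 0; // "[]" is not a valid character class
    }
    $pattern = '/[' . preg_quote($needle, '/') . ']/';
    return (int)preg_match_all($pattern, $haystack);
}

echo count_chars_in_set('AGUXYZ', 'Agriculture ID XYZ-A'), "\n"; // 5
echo count_chars_in_set('-]', 'a-b]c'), "\n";                    // 2
```

This counts bytes; for multibyte input the pattern would need the u modifier.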
If the character distribution is of any importance, you can use count_chars:
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
$occurences = array_intersect_key(
count_chars($haystack, 1),
array_flip(
array_map('ord', str_split($needle))
)
);
The result would be an array with keys being the ASCII values of the character.
You can then iterate over it with
foreach($occurences as $char => $amount) {
printf("There is %d occurences of %s\n", $amount, chr($char));
}
You could still pass the $occurences array to array_sum to calculate the total.
substr_count will get you close. However, it will not do individual characters. So you could loop over each character in $needle and call this function while summing the counts.
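A compact sketch of that loop, written with array_map() and array_sum(). The name count_set_occurrences is an assumption for illustration; array_unique() is added so that a character repeated in $needle is not counted twice:

```php
<?php
// Hypothetical helper: total occurrences in $haystack of the distinct
// characters that make up $needle (case-sensitive, byte-based).
function count_set_occurrences(string $needle, string $haystack): int
{
    if ($needle === '') {
        return 0; // nothing to look for
    }
    $chars = array_unique(str_split($needle));

    // One substr_count() call per distinct character, then add the results up.
    return array_sum(array_map(fn($char) => substr_count($haystack, $char), $chars));
}

echo count_set_occurrences('AGUXYZ', 'Agriculture ID XYZ-A'), "\n"; // 5
```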
There is a PHP function substr_count to count the number of instances of a character in a string. It would be trivial to extend it for multiple characters:
function substr_multi_count ($haystack, $needle, $offset = 0, $length = null) {
$ret = 0;
if ($length === null) {
$length = strlen($haystack) - $offset;
}
for ($i = strlen($needle); $i--; ) {
$ret += substr_count($haystack, $needle[$i], $offset, $length); // count one character of $needle per iteration
}
return $ret;
}
I have a recursive method to overcome this:
function countChar($str){
if(strlen($str) == 0) return 0;
if(substr($str,-1) == "x") return 1 + countChar(substr($str,0,-1));
return 0 + countChar(substr($str,0,-1));
}
echo countChar("xxSR"); // 2
echo countChar("SR"); // 0
echo countChar("xrxrpxxx"); // 5
I'd do something like:
split the string to chars (str_split), and then
use array_count_values to get an array of characters with the respective number of occurrences.
Code:
$needle = 'AGUXYZ';
$string = "asdasdadas asdadas asd asdsd";
$array_chars = str_split($string);
$value_count = array_count_values($array_chars);
for ($i = 0; $i < strlen($needle); $i++)
echo $needle[$i] . " occurs " .
($value_count[$needle[$i]] ?? 0) . " times\n";
