Working with substr_count() and arrays in PHP

So what I need is to compare a string to an array (string as a haystack and array as a needle) and get the elements from the string that repeat within the array. For this purpose I've taken a sample function for using an array as a needle in the substr_count function.
$animals = array('cat','dog','bird');
$toString = implode(' ', $animals);
$data = array('a');
function substr_count_array($haystack, $needle){
$initial = 0;
foreach ($needle as $substring) {
$initial += substr_count($haystack, $substring);
}
return $initial;
}
echo substr_count_array($toString, $data);
The problem is that if I search for a character such as 'a', it gets through the check and validates as a legit value because 'a' is contained within the first element. So the above outputs 1. I figured this was due to the foreach() but how do I bypass that? I want to search for a whole string match, not partial.

You can break up the $haystack into individual words, then do an in_array() check over it to make sure the word exists in that array as a whole word before doing your substr_count():
$animals = array('cat','dog','bird', 'cat', 'dog', 'bird', 'bird', 'hello');
$toString = implode(' ', $animals);
$data = array('cat');
function substr_count_array($haystack, $needle){
$initial = 0;
$bits_of_haystack = explode(' ', $haystack);
foreach ($needle as $substring) {
if(!in_array($substring, $bits_of_haystack))
continue; // skip this needle if it doesn't exist as a whole word
$initial += substr_count($haystack, $substring);
}
return $initial;
}
echo substr_count_array($toString, $data);
Here, cat is 2, dog is 2, bird is 3, hello is 1 and lion is 0.
Edit: here's another alternative using array_keys() with the search parameter set to the $needle:
function substr_count_array($haystack, $needle){
$bits_of_haystack = explode(' ', $haystack);
return count(array_keys($bits_of_haystack, $needle[0]));
}
Of course, this approach requires a string as the needle. I'm not 100% sure why you need to use an array as the needle, but perhaps you could do a loop outside the function and call it for each needle if you need to - just another option anyway!

Just throwing my solution in the ring here; the basic idea, as outlined by scrowler as well, is to break up the search subject into separate words so that you can compare whole words.
function substr_count_array($haystack, $needle)
{
$substrings = explode(' ', $haystack);
return array_reduce($substrings, function($total, $current) use ($needle) {
return $total + count(array_keys($needle, $current, true));
}, 0);
}
The array_reduce() step is basically this:
$total = 0;
foreach ($substrings as $substring) {
$total = $total + count(array_keys($needle, $substring, true));
}
return $total;
The array_keys() expression returns the keys of $needle for which the value equals $substring. The size of that array is the number of occurrences.
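The whole-word idea from these answers can also be sketched by tallying the haystack once with array_count_values() and looking each needle up in that tally. This is only a minimal sketch: the function name count_whole_words() is made up for illustration, and splitting on whitespace is an assumption about how the haystack is delimited.

```php
<?php
// Hypothetical helper (not from the answers above): counts whole-word matches only.
function count_whole_words(string $haystack, array $needles): int
{
    // Assumption: words are separated by whitespace.
    $words = preg_split('/\s+/', $haystack, -1, PREG_SPLIT_NO_EMPTY);

    // Tally every word once: ['cat' => 2, 'dog' => 1, ...]
    $tally = array_count_values($words);

    $total = 0;
    // array_unique() keeps a repeated needle from being counted twice.
    foreach (array_unique($needles) as $needle) {
        $total += $tally[$needle] ?? 0;
    }
    return $total;
}

echo count_whole_words('cat dog bird cat', ['cat']), "\n"; // 2
echo count_whole_words('cat dog bird', ['a']), "\n";       // 0
```

Because the lookup is by exact key, a needle such as 'a' no longer matches inside 'cat'.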

Related

Filter an array of words by an array of single letters it should ONLY contain

I currently have two PHP arrays:
array('a','c','r','r')
array('carr','car','arc','ra','c','abc','do','aa','rr')
My desired result is:
array('carr','car','arc','ra','c','rr')
i.e. filtering out 'abc','do','aa' because I wish to filter out words that:
don't contain a, c or r
contain more/multiple a,c,r than I have in array 1.
I have tried array_filter() but I don't seem to be able to make it work.

One way to do this:
Count how many times each letter occurs in your first array, using array_count_values.
Then in your array_filter callback function, split each word into individual letters, and also count how many times each of them occurs. Then loop over those counted letters. If the current letter does not occur in your letter-count-array, or its count is greater than that in your letter-count-array, return false.
$letters = ['a','c','r','r'];
$words = ['carr','car','arc','ra','c','abc','do','aa','rr'];
$letterCounts = array_count_values($letters);
$filtered = array_filter($words, function($word) use ($letterCounts) {
$wordLetterCounts = array_count_values(mb_str_split($word));
foreach($wordLetterCounts as $wordLetter => $count) {
if(!isset($letterCounts[$wordLetter]) || $letterCounts[$wordLetter] < $count) {
return false;
}
}
return true;
});
var_dump($filtered);

As you iterate the array of words, you can iterate the array of letters and make single-letter replacements. If all letters in the word are consumed, the word is saved.
A regular expression isn't actually necessary because the letter is literal, but preg_replace() offers a limiting parameter and str_replace() doesn't.
Code:
$needles = ['a','c','r','r'];
$haystacks = ['carr','car','arc','ra','c','abc','do','aa','rr', 'rrr'];
$result = [];
foreach ($haystacks as $i => $haystack) {
foreach ($needles as $needle) {
$haystack = preg_replace("/$needle/", '', $haystack, 1);
}
if (!$haystack) {
$result[] = $haystacks[$i];
}
}
var_export($result);
The above can actually be boiled down to this:
$regexes = array_map(fn($v) => "/$v/", $needles);
var_export(
array_filter(
$haystacks,
fn($hay) => !preg_replace($regexes, '', $hay, 1)
)
);

PHP remove values below a given value in a "|"-separated string

I have this value:
$numbers= "|800027|800036|800079|800097|800134|800215|800317|800341|800389";
And I want to remove the values below 800130 including the starting "|". I guess it is possible, but I can not find any examples anywhere. If anyone can point me to the right direction I would be thankful.

You could split the input string on pipe, then remove all array elements which, when cast to numbers, are less than 800130. Then, recombine to a pipe delimited string.
$input= "|800027|800036|800079|800097|800134|800215|800317|800341|800389";
$input = ltrim($input, '|');
$numbers = explode("|", $input);
$array = [];
foreach ($numbers as $number) {
if ($number >= 800130) array_push($array, $number);
}
$output = implode("|", $array);
echo "|" . $output;
This prints:
|800134|800215|800317|800341|800389

This should work as well:
$numbers= "|800027|800036|800079|800097|800134|800215|800317|800341|800389";
function my_filter($value) {
return ($value >= "800130");
}
$x = explode("|", $numbers); // Convert to array
$y = array_filter($x, "my_filter"); // Filter out elements
$z = implode("|", $y); // Convert to string again
echo $z;
Note that it's not necessary to have different variables (x,y,z). It's just there to make it a little bit easier to follow the code :)
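For what it's worth, here is a hedged sketch of the same steps without the intermediate variables. It uses an arrow function (PHP 7.4+) instead of the named my_filter() callback, and it puts the leading "|" back, which the version above drops because the empty first element from explode() is filtered out. The variable names $kept and $result are just for illustration.

```php
<?php
$numbers = "|800027|800036|800079|800097|800134|800215|800317|800341|800389";

// Split, drop the empty leading element and anything below the threshold.
$kept = array_filter(
    explode('|', $numbers),
    fn($value) => $value !== '' && (int) $value >= 800130
);

// Re-add the leading pipe only when something survived the filter.
$result = $kept ? '|' . implode('|', $kept) : '';

echo $result, "\n"; // |800134|800215|800317|800341|800389
```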

PHP has a built in function preg_replace_callback which takes a regular expression - in your case \|(\d+) - and applies a callback function to the matched values. Which means you can do this with a simple comparison of each matched value...
$numbers= "|800027|800036|800079|800097|800134|800215|800317|800341|800389";
echo preg_replace_callback("/\|(\d+)/", function($match){
return $match[1] < 800130 ? "" : $match[0];
}, $numbers);

Use the explode and implode functions and delete the values that are less than 800130:
$numbers= "|800027|800036|800079|800097|800134|800215|800317|800341|800389";
$values = explode("|", $numbers);
$count = count($values); // cache the length: unset() shrinks the array while looping
for ($i=1;$i<$count;$i++) {
if (intval($values[$i])<800130) {
unset($values[$i]);
}
}
// The loop starts at index 1 because the string starts with "|", so the first element returned by explode() is "".
// Keeping that empty element is what puts the leading "|" back when implode() rejoins the values.
$result = implode("|", $values);
echo $result;
It will print:
|800134|800215|800317|800341|800389

You can split them with a regex and then filter the array.
$numbers= "|800027|800036|800079|800097|800134|800215|800317|800341|800389";
$kept = '|'.join('|', array_filter(preg_split('/\|/', $numbers, -1, PREG_SPLIT_NO_EMPTY), fn($n) => $n >= 800130));
echo $kept;
Output:
|800134|800215|800317|800341|800389

"Unfolding" a String

I have a set of strings, each string has a variable number of segments separated by pipes (|), e.g.:
$string = 'abc|b|ac';
Each segment with more than one char should be expanded into all the possible one char combinations, for 3 segments the following "algorithm" works wonderfully:
$result = array();
$string = explode('|', 'abc|b|ac');
foreach (str_split($string[0]) as $i)
{
foreach (str_split($string[1]) as $j)
{
foreach (str_split($string[2]) as $k)
{
$result[] = implode('|', array($i, $j, $k)); // more...
}
}
}
print_r($result);
Output:
$result = array('a|b|a', 'a|b|c', 'b|b|a', 'b|b|c', 'c|b|a', 'c|b|c');
Obviously, for more than 3 segments the code starts to get extremely messy, since I need to add (and check) more and more inner loops. I tried coming up with a dynamic solution but I can't figure out how to generate the correct combination for all the segments (individually and as a whole). I also looked at some combinatorics source code but I'm unable to combine the different combinations of my segments.
I appreciate if anyone can point me in the right direction.

Recursion to the rescue (you might need to tweak a bit to cover edge cases, but it works):
function explodinator($str) {
$segments = explode('|', $str);
$pieces = array_map('str_split', $segments);
return e_helper($pieces);
}
function e_helper($pieces) {
if (count($pieces) == 1)
return $pieces[0];
$first = array_shift($pieces);
$subs = e_helper($pieces);
$result = array(); // initialise so an empty segment cannot leave $result undefined
foreach($first as $char) {
foreach ($subs as $sub) {
$result[] = $char . '|' . $sub;
}
}
return $result;
}
print_r(explodinator('abc|b|ac'));
Outputs:
Array
(
[0] => a|b|a
[1] => a|b|c
[2] => b|b|a
[3] => b|b|c
[4] => c|b|a
[5] => c|b|c
)
As seen on ideone.

This looks like a job for recursive programming! :P
I first looked at this and thought it was going to be a one-liner (and probably is in Perl).
There are other non-recursive ways (enumerate all combinations of indexes into segments then loop through, for example) but I think this is more interesting, and probably 'better'.
$str = explode('|', 'abc|b|ac');
$strlen = count( $str );
$results = array();
function splitAndForeach( $bchar , $oldindex, $tempthread) {
global $strlen, $str, $results;
$temp = $tempthread;
$newindex = $oldindex + 1;
if ( $bchar != '') { array_push($temp, $bchar ); }
if ( $newindex <= $strlen ){
print "starting foreach loop on string '".$str[$newindex-1]."' \n";
foreach(str_split( $str[$newindex - 1] ) as $c) {
print "Going into next depth ($newindex) of recursion on char $c \n";
splitAndForeach( $c , $newindex, $temp);
}
} else {
$found = implode('|', $temp);
print "Array length (max recursion depth) reached, result: $found \n";
array_push( $results, $found );
$temp = $tempthread;
$index = 0;
print "***************** Reset index to 0 *****************\n\n";
}
}
splitAndForeach('', 0, array() );
print "your results: \n";
print_r($results);

You could have two arrays: the alternatives and a current counter.
$alternatives = array(array('a', 'b', 'c'), array('b'), array('a', 'c'));
$counter = array(0, 0, 0);
Then, in a loop, you increment the "last digit" of the counter, and if that is equal to the number of alternatives for that position, you reset that "digit" to zero and increment the "digit" left to it. This works just like counting with decimal numbers.
The string for each step is built by concatenating the $alternatives[$i][$counter[$i]] for each digit.
You are finished when the "first digit" becomes as large as the number of alternatives for that digit.
Example: for the above variables, the counter would get the following values in the steps:
0,0,0
0,0,1
1,0,0 (overflow in the last two digits)
1,0,1
2,0,0 (overflow in the last two digits)
2,0,1
3,0,0 (finished, since the first "digit" has only 3 alternatives)
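The counter idea described above can be sketched roughly as follows. The function name unfold() is hypothetical, and the sketch assumes every segment is non-empty. Instead of letting the first "digit" reach 3, it detects the end when the carry runs off the left edge, which amounts to the same stopping condition.

```php
<?php
// Hypothetical helper: expands 'abc|b|ac' into every one-char-per-segment combination.
// Assumption: every segment contains at least one character.
function unfold(string $string): array
{
    $alternatives = array_map('str_split', explode('|', $string));
    $counter = array_fill(0, count($alternatives), 0);
    $last = count($alternatives) - 1;
    $results = [];

    while (true) {
        // Build the string for the current counter state.
        $parts = [];
        foreach ($counter as $i => $digit) {
            $parts[] = $alternatives[$i][$digit];
        }
        $results[] = implode('|', $parts);

        // Increment the last "digit"; on overflow reset it and carry to the left.
        $pos = $last;
        while ($pos >= 0 && ++$counter[$pos] === count($alternatives[$pos])) {
            $counter[$pos] = 0;
            $pos--;
        }
        if ($pos < 0) {
            break; // the first "digit" overflowed: every combination was emitted
        }
    }
    return $results;
}

print_r(unfold('abc|b|ac'));
```

For 'abc|b|ac' this yields the same six strings, in the same order, as the nested loops in the question, and it works for any number of segments without recursion.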

strpos() with multiple needles?

I am looking for a function like strpos() with two significant differences:
To be able to accept multiple needles. I mean thousands of needles at once.
To search for all occurrences of the needles in the haystack and to return an array of starting positions.
Of course it has to be an efficient solution not just a loop through every needle. I have searched through this forum and there were similar questions to this one, like:
Using an array as needles in strpos
Define multiple needles using stripos
Can't search an array in PHP in_array for the presence of multiple needles
but none of them was what I am looking for. I am using strpos just to illustrate my question better; probably something entirely different has to be used for this purpose.
I am aware of Zend_Search_Lucene and I am interested if it can be used to achieve this and how (just the general idea)?
Thanks a lot for Your help and time!

Try preg_match() with alternation to test for multiple needles at once:
if (preg_match('/word|word2/i', $str))
See also: Checking for multiple strpos values

Here's some sample code for my strategy:
function strpos_array($haystack, $needles, $offset=0) {
$matches = array();
//Avoid the obvious: when haystack or needles are empty, return no matches
if(empty($needles) || empty($haystack)) {
return $matches;
}
$haystack = (string)$haystack; //Pre-cast non-string haystacks
$haylen = strlen($haystack);
//Allow negative (from end of haystack) offsets
if($offset < 0) {
$offset += $haylen;
}
//Wrap a single (non-array) needle in an array so the loop below handles both cases
if(!is_array($needles)) {
$needles = array($needles);
}
$needles = array_unique($needles); //Not necessary if you are sure all needles are unique
//Precalculate needle lengths to save time
foreach($needles as &$origNeedle) {
$origNeedle = array((string)$origNeedle, strlen($origNeedle));
}
unset($origNeedle); //Break the reference so later code cannot overwrite the last needle
//Find matches
for(; $offset < $haylen; $offset++) {
foreach($needles as $needle) {
list($needle, $length) = $needle;
if($needle === substr($haystack, $offset, $length)) { //Strict comparison: == would treat numeric strings such as "1e1" and "10" as equal
$matches[] = $offset;
break;
}
}
}
return($matches);
}
I've implemented a simple brute force method above that will work with any combination of needles and haystacks (not just words). For possibly faster algorithms check out:
Aho–Corasick string matching algorithm
Other Solution
function strpos_array($haystack, $needles, $theOffset=0) {
$matches = array();
if(empty($haystack) || empty($needles)) {
return $matches;
}
$haylen = strlen($haystack);
if($theOffset < 0) { // Support negative offsets
$theOffset += $haylen;
}
foreach($needles as $needle) {
$needlelen = strlen($needle);
$offset = $theOffset;
while(($match = strpos($haystack, $needle, $offset)) !== false) {
$matches[] = $match;
$offset = $match + $needlelen;
if($offset >= $haylen) {
break;
}
}
}
return $matches;
}

I know this doesn't answer the OP's question but wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution to do so (again, this isn't specific to the OP's question - sorry):
$img_formats = array('.jpg','.png');
$missing = array();
foreach ( $img_formats as $format )
if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
if (count($missing) == 2)
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
If 2 items are added to the $missing array that means that the input doesn't satisfy any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:
function m_stripos( $haystack = null, $needles = array() ){
//return early if missing arguments
if ( !$needles || !$haystack ) return false;
// create an array to evaluate at the end
$missing = array();
//Loop through needles array, and add to $missing array if not satisfied
foreach ( $needles as $needle )
if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
//If the count of $missing and $needles is equal, we know there were no matches, return false..
if (count($missing) == count($needles)) return false;
//If we're here, be happy, return true...
return true;
}
Back to our first example, this time using the function:
$needles = array('.jpg','.png');
if ( !m_stripos( $post['timer_background_image'], $needles ) )
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
Of course, what you do after the function returns true or false is up to you.

It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:
$words = str_word_count($str, 2);
$word_position_map = array();
foreach($words as $position => $word) {
if(!isset($word_position_map[$word])) {
$word_position_map[$word] = array();
}
$word_position_map[$word][] = $position;
}
// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));
Storing the information (such as the needles) in the right format up front will improve the runtime, e.g. because you don't have to call array_flip().
Note from the str_word_count documentation:
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
So make sure you set the locale right.
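To illustrate the point about storing the needles in a lookup-ready format, here is a small sketch in which the needles are kept as array keys from the start, so array_flip() is not needed. The sample sentence and the needle list are made up for illustration, and the positions are byte offsets as returned by str_word_count().

```php
<?php
$str = 'the cat saw the other cat';

// Map each word to the list of byte offsets where it starts.
$word_position_map = [];
foreach (str_word_count($str, 2) as $position => $word) {
    $word_position_map[$word][] = $position;
}

// Needles stored as keys up front, so no array_flip() call is needed.
$needles = ['cat' => true, 'dog' => true];

$result = array_intersect_key($word_position_map, $needles);
print_r($result); // only 'cat' is present, at offsets 4 and 22
```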

You could use a regular expression, they support OR operations. This would however make it fairly slow, compared to strpos.
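A rough sketch of the regular-expression route, assuming literal, non-empty needles: each needle is escaped with preg_quote(), the needles are joined with "|", and preg_match_all() with PREG_OFFSET_CAPTURE reports where each match starts. The function name regex_positions() is hypothetical. Matches do not overlap, offsets are in bytes, and with thousands of needles the compiled pattern may get large, so this would need profiling.

```php
<?php
// Hypothetical helper: returns [offset => matched needle] for every non-overlapping match.
// Assumption: needles are non-empty literal strings.
function regex_positions(string $haystack, array $needles): array
{
    if ($needles === []) {
        return [];
    }
    // Longest needles first, so a longer needle wins over its own prefix.
    usort($needles, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape each needle so it is matched literally, then OR them together.
    $quoted = array_map(fn($n) => preg_quote($n, '/'), $needles);
    $pattern = '/' . implode('|', $quoted) . '/';

    preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);

    $positions = [];
    foreach ($matches[0] as [$text, $offset]) {
        $positions[$offset] = $text;
    }
    return $positions;
}

print_r(regex_positions('one two three four', ['five', 'three', 'o']));
```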

How about a simple solution using array_map()?
$string = 'one two three four';
$needles = array( 'five' , 'three' );
$strpos_arr = array_map( function ( $check ) use ( $string ) {
return strpos( $string, $check );
}, $needles );
The result is an array whose keys match the keys of $needles and whose values are the starting positions, or false for needles that were not found (print_r() shows false as an empty value).
//print_r( $strpos_arr );
Array
(
[0] =>
[1] => 8
)

PHP count of occurrences of characters of a string within another string

Let's say I have two strings.
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
I want to count how often characters that are in $needle occur in $haystack. In $haystack, there are the characters 'A' (twice), 'X', 'Y' and 'Z', all of which are in the needle, thus the result is supposed to be 5 (case-sensitive).
Is there any function for that in PHP or do I have to program it myself?
Thanks in advance!

You can calculate the length of the original string and the length of the string without these characters. The difference between them is the number of matches.
Basically,
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
Here is the part that does the work. In one line.
$count = strlen($haystack) - strlen(str_replace(str_split($needle), '', $haystack));
Explanation: The first part is self-explanatory. The second part is the length of the string without the characters in the $needle string. This is done by replacing each occurrence of any character inside $needle with an empty string.
To do this, we split $needle into an array, one character per item, using str_split, and pass it to str_replace, which replaces each occurrence of any item in the $search array with an empty string.
Echo it out,
echo "Count = $count\n";
you get:
Count = 5

Try this;
function count_occurences($char_string, $haystack, $case_sensitive = true){
if($case_sensitive === false){
$char_string = strtolower($char_string);
$haystack = strtolower($haystack);
}
$characters = str_split($char_string);
$character_count = 0;
foreach($characters as $character){
$character_count = $character_count + substr_count($haystack, $character);
}
return $character_count;
}
To use;
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
print count_occurences($needle, $haystack);
You can set the third parameter to false to ignore case.

There's no built-in function that handles character sets, but you can simply use the substr_count function in a loop, like so:
<?php
$sourceCharacters = str_split('AGUXYZ');
$targetString = 'Agriculture ID XYZ-A';
$occurrenceCount = array();
foreach($sourceCharacters as $currentCharacter) {
$occurrenceCount[$currentCharacter] = substr_count($targetString, $currentCharacter);
}
print_r($occurrenceCount);
?>

There is no specific method to do this, but this built in method can surely help you:
$count = substr_count($haystack , $needle);
edit: I just reported the general substr_count method. In your particular case you need to call it for each character inside $needle (thanks @Alan Whitelaw)

If you are not interested in the character distribution, you could use a Regex
echo preg_match_all("/[$needle]/", $haystack, $matches);
which returns the number of full pattern matches (which might be zero), or FALSE if an error occurred. The solution offered by @thai above should be significantly faster though.
If the character distribution is of any importance, you can use count_chars:
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
$occurences = array_intersect_key(
count_chars($haystack, 1),
array_flip(
array_map('ord', str_split($needle))
)
);
The result would be an array with keys being the ASCII values of the character.
You can then iterate over it with
foreach($occurences as $char => $amount) {
printf("There is %d occurences of %s\n", $amount, chr($char));
}
You could still pass the $occurences array to array_sum to calculate the total.
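As a quick sketch of that last step, using the strings from the question (the variable names $occurrences and $total are just for illustration):

```php
<?php
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';

// Per-character counts for the bytes of $needle that occur in $haystack.
$occurrences = array_intersect_key(
    count_chars($haystack, 1),
    array_flip(array_map('ord', str_split($needle)))
);

// Sum the per-character counts to get the overall total.
$total = array_sum($occurrences);

echo $total, "\n"; // 5
```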

substr_count will get you close. However, it will not do individual characters. So you could loop over each character in $needle and call this function while summing the counts.
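A minimal sketch of that loop might look like this. The function name count_any_char() is made up, and the array_unique() call is an addition of mine so that a character repeated in the needle is not counted twice.

```php
<?php
// Hypothetical helper: counts how often any character of $chars occurs in $haystack.
function count_any_char(string $haystack, string $chars): int
{
    if ($chars === '') {
        return 0; // substr_count() rejects an empty needle
    }
    $total = 0;
    // array_unique() keeps a repeated needle character from being counted twice.
    foreach (array_unique(str_split($chars)) as $char) {
        $total += substr_count($haystack, $char);
    }
    return $total;
}

echo count_any_char('Agriculture ID XYZ-A', 'AGUXYZ'), "\n"; // 5
```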

There is a PHP function substr_count to count the number of instances of a character in a string. It would be trivial to extend it for multiple characters:
function substr_multi_count ($haystack, $needle, $offset = 0, $length = null) {
$ret = 0;
if ($length === null) {
$length = strlen($haystack) - $offset;
}
for ($i = strlen($needle); $i--; ) {
$ret += substr_count($haystack, $needle[$i], $offset, $length); // count one needle character per iteration
}
return $ret;
}

I have a recursive method to overcome this:
function countChar($str){
if(strlen($str) == 0) return 0;
if(substr($str,-1) == "x") return 1 + countChar(substr($str,0,-1));
return 0 + countChar(substr($str,0,-1));
}
echo countChar("xxSR"); // 2
echo countChar("SR"); // 0
echo countChar("xrxrpxxx"); // 5

I'd do something like:
split the string to chars (str_split), and then
use array_count_values to get an array of characters with the respective number of occurrences.
Code:
$needle = 'AGUXYZ';
$string = "asdasdadas asdadas asd asdsd";
$array_chars = str_split($string);
$value_count = array_count_values($array_chars);
for ($i = 0; $i < strlen($needle); $i++)
echo $needle[$i] . " occurs " .
($value_count[$needle[$i]] ?? 0) . " times\n";
