I have the following code:
<?php
$word = "aeagle";
$letter = "e";
$array = strposall($aegle, $letter);
print_r($array);
function strposall($haystack, $needle) {
$occurrence_points = array();
$pos = strpos($haystack, $needle);
if ($pos !== false) {
array_push($occurrence_points, $pos);
}
while ($pos = strpos($haystack, $needle, $pos + 1)) {
array_push($occurrence_points, $pos);
}
return $occurrence_points;
}
?>
As in the example, if I have aegle as my word and I'm searching for e within it, the function should return an array with the values 1 and 4 in it.
What's wrong with my code?
Why not try this instead?
$word = "aeagle";
$letter = "e";
$occurrence_points = array_keys(array_intersect(str_split($word), array($letter)));
var_dump($occurrence_points);
I think you're passing the wrong parameter; it should be $word instead of $aegle.
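With the argument corrected, the loop can also be simplified. The following is a minimal sketch rather than the asker's exact code: it keeps the strposall name from the question, adds an empty-needle guard (an assumption, not part of the original), and folds the first lookup into the loop so that a match at offset 0 is treated like any other match.

```php
<?php
// Sketch: collect every offset at which $needle occurs in $haystack.
function strposall(string $haystack, string $needle): array {
    $occurrence_points = [];
    // Assumption: an empty needle yields no matches.
    if ($needle === '') {
        return $occurrence_points;
    }
    $pos = 0;
    // Compare against false explicitly, because offset 0 is a valid match.
    while (($pos = strpos($haystack, $needle, $pos)) !== false) {
        $occurrence_points[] = $pos;
        $pos++; // resume the search one character after the last match
    }
    return $occurrence_points;
}

$word = "aeagle";
$letter = "e";
print_r(strposall($word, $letter));
```

For "aeagle" and "e" this yields the offsets 1 and 5.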
A little bit more literal than the other answer:
function charpos($str, $char) {
$i = 0;
$pos = 0;
$matches = array();
if (strpos($str, $char) === false) {
return false;
}
while (!!$str) {
$pos = strpos($str, $char);
if ($pos === false) {
$str = '';
} else {
$i = $i + $pos;
$str = substr($str, $pos + 1);
array_push($matches, $i++);
}
}
return $matches;
}
https://ignite.io/code/511ff26eec221e0741000000
Using:
$str = 'abc is the place to be heard';
$positions = charpos($str, 'a');
print_r($positions);
while ($positions) {
$i = array_shift($positions);
echo "$i: $str[$i]\n";
}
Which gives:
Array (
[0] => 0
[1] => 13
[2] => 25
)
0: a
13: a
25: a
Others have pointed out that you're passing the wrong parameters. But you're also reinventing the wheel. Take a look at PHP's regular expression function preg_match_all; it already returns an array of all matches with their offsets when used with the following flag.
flags
flags can be the following flag:
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
Use a single letter pattern for the search term $letter = '/e/' and you should get back an array with all your positions as the second element of each result array, which you can then finagle into the output format you're looking for.
Update: Jared points out that you do get the capture of the pattern back, but with the flag set, you also get the offset. As a direct answer to the OP's question, try this code:
$word = "aeagle";
$pattern = "/e/";
$matches = array();
preg_match_all($pattern, $word, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
It has the following output:
Array
(
// Matches of the full pattern: /e/
[0] => Array
(
// First match
[0] => Array
(
// Substring of $word that matched
[0] => e
// Offset into $word where previous substring starts
[1] => 1
)
[1] => Array
(
[0] => e
[1] => 5
)
)
)
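The results are nested three levels deep because the outer array is indexed by capture group: $matches[0] holds the matches of the whole pattern, and $matches[1], $matches[2], and so on would hold the matches of any parenthesized subpatterns. This pattern has no subpatterns, so all hits are in the first (and only) array.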
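To get from that nested result to a flat list of positions, one option is array_column. This is only a sketch; the $positions variable name is an assumption for illustration.

```php
<?php
$word = "aeagle";
$pattern = "/e/";
$matches = array();
preg_match_all($pattern, $word, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is array(matched text, offset).
// array_column() with column key 1 keeps only the offsets.
$positions = array_column($matches[0], 1);
print_r($positions);
```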
Contrary to what the OP originally stated, 1 and 5 are the correct indexes of the letter e in the string 'aeagle':
aeagle
012345
 ^   ^
 1   5
Performance-wise, the customized version of strposall would probably be faster than a regular expression match. But learning to use a built-in function is almost always faster than developing, testing, supporting and maintaining your own code. And 9 times out of 10, that's the most expensive part of programming.
Related
Hi, I'm trying to find all overlapping substrings in a string. Here is my code; it only finds the non-overlapping occurrences of ACA.
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";
echo preg_match_all("/$needle/", $haystack, $matches);
You're using echo to print the return value of preg_match_all. That is, you're displaying only the number of matches found. What you probably wanted to do was something like print_r($matches);, like this:
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";
preg_match_all("/$needle/", $haystack, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => ACA
[1] => ACA
[2] => ACA
)
)
If your real concern is that it counted ACACA only once, well, there are three things that need to be said about that:
That's basically unavoidable with regex.
You really shouldn't count this twice, as it's overlapping. It's not a true recurrence of the pattern.
That said, if you want to count that twice, you could do so with something like this:
echo preg_match_all("/(?=$needle)/", $haystack, $matches);
Output:
4
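If the positions of the overlapping matches are needed rather than just their count, the same lookahead can be combined with PREG_OFFSET_CAPTURE. A minimal sketch, assuming the needle should be treated as literal text (hence the preg_quote call, which the original snippet does not use):

```php
<?php
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";

// The lookahead (?=...) matches an empty string wherever the needle starts,
// so matches may overlap. preg_quote() escapes regex metacharacters.
$pattern = '/(?=' . preg_quote($needle, '/') . ')/';
preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);

// Keep only the offsets of the (empty) matches.
$positions = array_column($matches[0], 1);
print_r($positions);
```

For this haystack the offsets are 0, 5, 7 and 14.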
Here is a script to find all occurrences of a substring, including the overlapping ones.
$haystack = "ACAAGACACATGCCACATTGTCC";
$needle = "ACA";
$positions = [];
$needle_len = strlen($needle);
$haystack_len = strlen($haystack);
for ($i = 0; $i <= $haystack_len - $needle_len; $i++) {
if (substr($haystack, $i, $needle_len) === $needle) {
$positions[]=$i;
}
}
print_r($positions);
Output: Array ( 0, 5, 7, 14 )
I have some arrays and they might all be formatted like so:
Array (
[0] => .
[1] => ..
[2] => 151108-some_image-006.jpg
[3] => high
[4] => low
)
I know they have 5 values, but I can not be certain where each value is being placed.
I am trying to get only the image out of this array
$pos = array_search('*.jpg', $main_photo_directory);
echo $main_photo_directory[$pos];
But as we all know, it's looking for a literal *.jpg which it can't find. I'm not so super at regex and wouldn't know how to format an appropriate string.
What is the best way to get the first image (assuming there may be more than one) out of this array?
ADDITION
One of the reasons I am asking is to find a simple way to search through an array. I do not know regex, though I'd like to use it. I wrote the question looking for a regex to find '.jpg' at the end of a string which no search results had yielded.
This should work:
echo current(preg_grep('/\.jpg$/', $array));
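Since the question asks how the pattern works, here is a commented sketch of the same idea. The case-insensitive flag, the optional "e" for .jpeg, and the null fallback are assumptions added for illustration; they are not part of the answer above.

```php
<?php
$main_photo_directory = array('.', '..', '151108-some_image-006.jpg', 'high', 'low');

// \.    a literal dot (an unescaped dot would match any character)
// jpe?g "jpg" or "jpeg"
// $     anchors the match to the end of the string
// i     makes the match case-insensitive, so ".JPG" is accepted too
$images = preg_grep('/\.jpe?g$/i', $main_photo_directory);

// preg_grep() preserves the original keys, so take the first element
// with reset() rather than assuming index 0 exists.
$first = $images ? reset($images) : null;
echo $first, "\n";
```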
You can loop over the array until you find a string ending in jpg:
function endsWith($haystack, $needle) {
return $needle === "" || (($temp = strlen($haystack) - strlen($needle)) >= 0 && strpos($haystack, $needle, $temp) !== FALSE);
}
$array = []; // replace with your array of file names
$result = false;
foreach($array as $img) {
if(endsWith($img, '.jpg')) {
$result = $img;
break;
}
}
echo $result;
How to check substrings in PHP by prefix or postfix.
For example, I have the search string named as $to_search as follows:
$to_search = "abcdef";
And three cases to check as substrings of $to_search, as follows:
$cases = ["abc def", "def", "deff", ... Other values ...];
Now I have to detect the first three cases using substr() function.
How can I detect the "abc def", "def", "deff" as substring of "abcdef" in PHP.
You might find the Levenshtein distance between the two words useful - it'll have a value of 1 for abc def. However your problem is not well defined - matching strings that are "similar" doesn't mean anything concrete.
Edit - If you set the deletion cost to 0 then this very closely models the problem you are proposing. Just check that the levenshtein distance is less than 1 for everything in the array.
This will find if any of the strings inside $cases are a substring of $to_search.
foreach($cases as $someString){
if(strpos($to_search, $someString) !== false){
// $someString is found inside $to_search
}
}
Only "def" is, though, as none of the other strings are substrings of "abcdef".
Also, on a side note: the terms are prefix and suffix, not postfix.
To find any of the cases that either begin with or end with either the beginning or ending of the search string, I don't know of another way to do it than to just step through all of the possible beginning and ending combinations and check them. There's probably a better way to do this, but this should do it.
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "otherabc", "noabcmatch", "nodefmatch"];
$matches = array();
$len = strlen($to_search);
for ($i=1; $i <= $len; $i++) {
// get the beginning and end of the search string of length $i
$pre_post = array();
$pre_post[] = substr($to_search, 0, $i);
$pre_post[] = substr($to_search, -$i);
foreach ($cases as $case) {
// get the beginning and end of each case of length $i
$pre = substr($case, 0, $i);
$post = substr($case, -$i);
// check if any of them match
if (in_array($pre, $pre_post) || in_array($post, $pre_post)) {
// using the case as the array key for $matches will keep it distinct
$matches[$case] = true;
}
}
}
// use array_keys() to get the keys back to values
var_dump(array_keys($matches));
You can use array_filter function like this:
$cases = ["cake", "cakes", "flowers", "chocolate", "chocolates"];
$to_search = "chocolatecake";
$search = strtolower($to_search);
$arr = array_filter($cases, function($val) use ($search) { return
strpos( $search,
str_replace(' ', '', preg_replace('/s$/', '', strtolower($val))) ) !== FALSE; });
print_r($arr);
Output:
Array
(
[0] => cake
[1] => cakes
[3] => chocolate
[4] => chocolates
)
As you can see, it prints all the values you expected apart from deff, which is not part of the search string abcdef, as I commented above.
I am looking for a function like strpos() with two significant differences:
To be able to accept multiple needles. I mean thousands of needles at once.
To search for all occurrences of the needles in the haystack and to return an array of starting positions.
Of course it has to be an efficient solution not just a loop through every needle. I have searched through this forum and there were similar questions to this one, like:
Using an array as needles in strpos
Define multiple needles using stripos
Can't search an array in PHP in_array for the presence of multiple needles
but neither of them was what I am looking for. I am using strpos just to illustrate my question better; probably something entirely different has to be used for this purpose.
I am aware of Zend_Search_Lucene and I am interested if it can be used to achieve this and how (just the general idea)?
Thanks a lot for Your help and time!
Try preg_match with multiple alternatives:
if (preg_match('/word|word2/i', $str))
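preg_match only reports whether one of the alternatives occurs. To also collect starting positions, the alternation can be passed to preg_match_all with PREG_OFFSET_CAPTURE. The following is a sketch under stated assumptions: regex_positions is a hypothetical helper name, the needles are treated as literal text, and matches are non-overlapping. For thousands of needles the combined pattern may become too large or too slow, so it would need profiling.

```php
<?php
// Hypothetical helper: offsets at which any of $needles starts in $haystack.
function regex_positions(string $haystack, array $needles): array {
    // Drop empty needles; an empty alternative would match at every offset.
    $needles = array_filter(array_map('strval', $needles), 'strlen');
    if (!$needles) {
        return [];
    }
    // Escape regex metacharacters (and the "/" delimiter) in each needle.
    $quoted = array_map(function ($n) {
        return preg_quote($n, '/');
    }, $needles);
    $pattern = '/' . implode('|', $quoted) . '/';
    preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
    // Each entry of $matches[0] is [matched text, offset]; keep the offsets.
    return array_column($matches[0], 1);
}

print_r(regex_positions('one two three four', ['two', 'four']));
```

In this example "two" starts at offset 4 and "four" at offset 14.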
Checking for multiple strpos values
Here's some sample code for my strategy:
function strpos_array($haystack, $needles, $offset=0) {
$matches = array();
//Avoid the obvious: when haystack or needles are empty, return no matches
if(empty($needles) || empty($haystack)) {
return $matches;
}
$haystack = (string)$haystack; //Pre-cast non-string haystacks
$haylen = strlen($haystack);
//Allow negative (from end of haystack) offsets
if($offset < 0) {
$offset += $haylen;
}
//Wrap a single (non-array) needle in an array so the loop below can handle it
if(!is_array($needles)) {
$needles = array($needles);
}
$needles = array_unique($needles); //Not necessary if you are sure all needles are unique
//Precalculate needle lengths to save time
foreach($needles as &$origNeedle) {
$origNeedle = array((string)$origNeedle, strlen($origNeedle));
}
unset($origNeedle); //Break the reference left over from the loop
//Find matches
for(; $offset < $haylen; $offset++) {
foreach($needles as $needle) {
list($needle, $length) = $needle;
if($needle === substr($haystack, $offset, $length)) {
$matches[] = $offset;
break;
}
}
}
return($matches);
}
I've implemented a simple brute force method above that will work with any combination of needles and haystacks (not just words). For possibly faster algorithms check out:
Aho–Corasick string matching algorithm
Other Solution
function strpos_array($haystack, $needles, $theOffset=0) {
$matches = array();
if(empty($haystack) || empty($needles)) {
return $matches;
}
$haylen = strlen($haystack);
if($theOffset < 0) { // Support negative offsets
$theOffset += $haylen;
}
foreach($needles as $needle) {
$needlelen = strlen($needle);
$offset = $theOffset;
while(($match = strpos($haystack, $needle, $offset)) !== false) {
$matches[] = $match;
$offset = $match + $needlelen;
if($offset >= $haylen) {
break;
}
}
}
return $matches;
}
I know this doesn't answer the OP's question but wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution to do so (again, this isn't specific to the OP's question - sorry):
$img_formats = array('.jpg','.png');
$missing = array();
foreach ( $img_formats as $format )
if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
if (count($missing) == 2)
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
If 2 items are added to the $missing array that means that the input doesn't satisfy any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:
function m_stripos( $haystack = null, $needles = array() ){
//return early if missing arguments
if ( !$needles || !$haystack ) return false;
// create an array to evaluate at the end
$missing = array();
//Loop through needles array, and add to $missing array if not satisfied
foreach ( $needles as $needle )
if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
//If the count of $missing and $needles is equal, we know there were no matches, return false..
if (count($missing) == count($needles)) return false;
//If we're here, be happy, return true...
return true;
}
Back to our first example, now using the function instead:
$needles = array('.jpg','.png');
if ( !m_stripos( $post['timer_background_image'], $needles ) )
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
Of course, what you do after the function returns true or false is up to you.
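The same check can also be written with an early return, which avoids building the $missing array. This is a sketch only: contains_any is a hypothetical name, and skipping empty needles is an added assumption.

```php
<?php
// Hypothetical helper: true if $haystack contains at least one of $needles,
// compared case-insensitively.
function contains_any(string $haystack, array $needles): bool {
    foreach ($needles as $needle) {
        // Skip empty needles; stop at the first needle that is found.
        if ($needle !== '' && stripos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_any('background.JPG', array('.jpg', '.png')));
var_dump(contains_any('notes.pdf', array('.jpg', '.png')));
```

Like the function above, this only tests whether the needle occurs anywhere in the string, not whether the string ends with it.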
It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:
$words = str_word_count($str, 2);
$word_position_map = array();
foreach($words as $position => $word) {
if(!isset($word_position_map[$word])) {
$word_position_map[$word] = array();
}
$word_position_map[$word][] = $position;
}
// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));
Storing the information (like the needles) in the right format will improve the runtime ( e.g. as you don't have to call array_flip).
Note from the str_word_count documentation:
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
So make sure you set the locale right.
You could use a regular expression; they support OR operations (alternation). This would, however, make it fairly slow compared to strpos.
How about a simple solution using array_map()?
$string = 'one two three four';
$needles = array( 'five' , 'three' );
$strpos_arr = array_map( function ( $check ) use ( $string ) {
return strpos( $string, $check );
}, $needles );
The result is an array whose keys are the needles' indexes in $needles and whose values are the starting positions, or false if a needle was not found.
//print_r( $strpos_arr );
Array
(
[0] =>
[1] => 8
)
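If it is more convenient to look positions up by needle, the result can be keyed by the needle itself. A minimal sketch, with $positions and $found as illustrative variable names:

```php
<?php
$string = 'one two three four';
$needles = array('five', 'three');

// Key each strpos() result by its needle instead of by numeric index.
$positions = array_combine($needles, array_map(function ($check) use ($string) {
    return strpos($string, $check);
}, $needles));

// Keep only needles that were found. The comparison must be strict,
// because a match at offset 0 would otherwise be filtered out as falsy.
$found = array_filter($positions, function ($pos) {
    return $pos !== false;
});
print_r($found);
```

Note that this records only the first occurrence of each needle.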
Let's say I have two strings.
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
I want to count how often characters that are in $needle occur in $haystack. In $haystack, there are the characters 'A' (twice), 'X', 'Y' and 'Z', all of which are in the needle, thus the result is supposed to be 5 (case-sensitive).
Is there any function for that in PHP or do I have to program it myself?
Thanks in advance!
You can calculate the length of the original string and the length of the string without these characters. The difference between them is the number of matches.
Basically,
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
Here is the part that does the work. In one line.
$count = strlen($haystack) - strlen(str_replace(str_split($needle), '', $haystack));
Explanation: The first part is self-explanatory. The second part is the length of the string without the characters in the $needle string. This is done by replacing each occurrence of any character in $needle with an empty string.
To do this, we split $needle into an array, one character per item, using str_split, and then pass it to str_replace as its search argument. str_replace replaces each occurrence of any item in that array with an empty string.
Echo it out,
echo "Count = $count\n";
you get:
Count = 5
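As a variation, str_replace can report the number of replacements itself through its optional fourth parameter, so the two strlen calls are not needed. A short sketch of that approach:

```php
<?php
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';

// The fourth argument is filled with the total number of replacements
// performed across all search items; the return value is not needed here.
str_replace(str_split($needle), '', $haystack, $count);
echo "Count = $count\n";
```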
Try this:
function count_occurences($char_string, $haystack, $case_sensitive = true){
if($case_sensitive === false){
$char_string = strtolower($char_string);
$haystack = strtolower($haystack);
}
$characters = str_split($char_string);
$character_count = 0;
foreach($characters as $character){
$character_count = $character_count + substr_count($haystack, $character);
}
return $character_count;
}
To use:
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
print count_occurences($needle, $haystack);
You can set the third parameter to false to ignore case.
There's no built-in function that handles character sets, but you can simply use the substr_count function in a loop, like this:
<?php
$sourceCharacters = str_split('AGUXYZ');
$targetString = 'Agriculture ID XYZ-A';
$occurrenceCount = array();
foreach($sourceCharacters as $currentCharacter) {
$occurrenceCount[$currentCharacter] = substr_count($targetString, $currentCharacter);
}
print_r($occurrenceCount);
?>
There is no specific method to do this, but this built in method can surely help you:
$count = substr_count($haystack , $needle);
Edit: I just reported the general substr_count method. In your particular case you need to call it for each character inside $needle (thanks @Alan Whitelaw).
If you are not interested in the character distribution, you could use a Regex
echo preg_match_all("/[$needle]/", $haystack, $matches);
which returns the number of full pattern matches (which might be zero), or FALSE if an error occurred. The solution offered by @thai above should be significantly faster though.
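One caveat with the character-class pattern: characters such as "]", "^", "-" or "\" in $needle have a special meaning inside the brackets. A sketch that escapes them with preg_quote first; the empty-needle guard is an added assumption, since an empty class would not compile:

```php
<?php
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';

$count = 0;
if ($needle !== '') {
    // preg_quote() escapes regex metacharacters and the "/" delimiter,
    // so every character of $needle is matched literally.
    $pattern = '/[' . preg_quote($needle, '/') . ']/';
    $count = preg_match_all($pattern, $haystack);
}
echo $count, "\n";
```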
If the character distribution is of any importance, you can use count_chars:
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';
$occurences = array_intersect_key(
count_chars($haystack, 1),
array_flip(
array_map('ord', str_split($needle))
)
);
The result would be an array with the keys being the ASCII values of the characters.
You can then iterate over it with
foreach($occurences as $char => $amount) {
printf("There are %d occurrences of %s\n", $amount, chr($char));
}
You could still pass the $occurences array to array_sum to calculate the total.
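Put together, the total can be computed as follows. This is a self-contained sketch of the count_chars approach; $total is an illustrative variable name.

```php
<?php
$needle = 'AGUXYZ';
$haystack = 'Agriculture ID XYZ-A';

// count_chars(..., 1) maps byte value => number of occurrences,
// listing only the bytes that actually occur in $haystack.
$occurrences = array_intersect_key(
    count_chars($haystack, 1),
    array_flip(array_map('ord', str_split($needle)))
);

// Summing the per-character counts gives the overall total.
$total = array_sum($occurrences);
echo $total, "\n";
```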
substr_count will get you close. However, it will not do individual characters. So you could loop over each character in $needle and call this function while summing the counts.
There is a PHP function substr_count to count the number of instances of a character in a string. It would be trivial to extend it for multiple characters:
function substr_multi_count ($haystack, $needle, $offset = 0, $length = null) {
$ret = 0;
if ($length === null) {
$length = strlen($haystack) - $offset;
}
for ($i = strlen($needle); $i--; ) {
// Count one character of $needle at a time, not the whole string
$ret += substr_count($haystack, $needle[$i], $offset, $length);
}
return $ret;
}
I have a recursive method to overcome this:
function countChar($str){
if(strlen($str) == 0) return 0;
if(substr($str,-1) == "x") return 1 + countChar(substr($str,0,-1));
return 0 + countChar(substr($str,0,-1));
}
echo countChar("xxSR"); // 2
echo countChar("SR"); // 0
echo countChar("xrxrpxxx"); // 5
I'd do something like:
split the string to chars (str_split), and then
use array_count_values to get an array of characters with the respective number of occurrences.
Code:
$needle = 'AGUXYZ';
$string = "asdasdadas asdadas asd asdsd";
$array_chars = str_split($string);
$value_count = array_count_values($array_chars);
for ($i = 0; $i < strlen($needle); $i++)
echo $needle[$i] . " occurs " .
(isset($value_count[$needle[$i]]) ? $value_count[$needle[$i]] : 0) . " times\n";