Find first occurrence of substring before a certain position - php

In PHP, if I have a long string, e.g. 10,000 characters, how would you suggest I find the first occurrence of a certain string before and after a given position?
For example, if I have the string:
BaaaaBcccccHELLOcccccBaaaaB
I can use strpos to find the position of HELLO. How could I then find the position of the first occurrence of B before HELLO and the first occurrence of B after HELLO?

You can use stripos() to find the first occurrence of a substring inside a string and strripos() to find the last one; both are case-insensitive. You can also pass a negative offset to strripos() to search in reverse order (from right to left), starting that many characters from the end of the string.
$body = "BaaaaBcccccHELLOcccccBaaaaB";
$indexOfHello = stripos($body, 'Hello');
if ($indexOfHello !== FALSE)
{
// First occurrence of B before Hello (searching left to right)
$indexOfB = stripos(substr($body, 0, $indexOfHello), 'B');
print("First occurrence of B before Hello is ".$indexOfB."\n");
// Nearest occurrence of B before Hello (searching right to left).
// A negative offset is counted from the end of the string, so it is
// $indexOfHello - strlen($body) rather than -$indexOfHello.
$indexOfB = strripos($body, 'B', $indexOfHello - strlen($body));
print("First occurrence of B before Hello (in reverse order) is ".$indexOfB."\n");
// First occurrence of B after Hello
$indexOfB = stripos($body, 'B', $indexOfHello);
print("First occurrence of B after Hello is ".$indexOfB."\n");
}

If you are thinking about optimization, there are a lot of pattern search algorithms.
Here is a sample of a naive pattern search:
/**
* Naive algorithm for Pattern Searching
*/
function search(string $pat, string $txt, int $searchFrom = 0, ?int $searchTill = null)
{
$M = strlen($pat);
$N = strlen($txt);
if ($searchTill !== null && $searchTill < $N){
$N = $searchTill;
}
for ($i = $searchFrom; $i <= $N - $M; $i++)
{
// For current index i,
// check for pattern match
for ($j = 0; $j < $M; $j++)
if ($txt[$i + $j] != $pat[$j])
break;
// if pat[0...M-1] =
// txt[i, i+1, ...i+M-1]
if ($j == $M)
return $i;
}
}
// Driver Code
$txt = "BaaaaBcccccHELLOcccccBaaaaB";
if (null!==($helloPos = search("HELLO", $txt))){
print("First occurrence of B before Hello is ".search("B", $txt, 0, $helloPos)."<br>");
print("First occurrence of B after Hello is ".search("B", $txt, $helloPos, null)."<br>");
}

Given the position…
To find the first occurrence before, you can take the substr() before the match and use strrpos().
To find the first occurrence after, you can still use strpos() and set the offset parameter.
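As a minimal sketch of the two steps above, assuming single-byte strings (the helper name findNearestAround is made up for this illustration, it is not a built-in):

```php
<?php
// Hypothetical helper, not part of PHP: returns the positions of $needle
// nearest to $anchor on each side, or null if $anchor is not found.
function findNearestAround(string $haystack, string $anchor, string $needle): ?array
{
    $anchorPos = strpos($haystack, $anchor);
    if ($anchorPos === false) {
        return null;
    }
    // Last occurrence in the part of the string that precedes the anchor.
    $before = strrpos(substr($haystack, 0, $anchorPos), $needle);
    // First occurrence at or after the end of the anchor.
    $after = strpos($haystack, $needle, $anchorPos + strlen($anchor));
    return ['before' => $before, 'after' => $after];
}

$result = findNearestAround('BaaaaBcccccHELLOcccccBaaaaB', 'HELLO', 'B');
// $result is ['before' => 5, 'after' => 21]
```

Either entry is false when there is no occurrence on that side, so compare with === false before using it as an index.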

Related

Remove lowercase letter if it is followed by an uppercase letter

The goal is to take the string $a="NewYork" and produce a new string without any lowercase letter that stands directly before an uppercase letter.
In this example, the output should be "NeYork".
I tried to do this through the positions of lowercase and uppercase letters in the ASCII table, but it doesn't work. I'm not sure whether it is possible to do it this way, through positions in the ASCII table.
function delete_char($a)
{
global $b;
$a = 'NewYork';
for($i =0; $i<strlen($a); $i++)
{
if( ord($a[$i])< ord($a[$i+1])){//this solves only part of a problem
chop($a,'$a[$i]');
}
else{
$b.=$a[$i];
}
}
return $b;
}
This is something a regular expression handles with ease:
<?php
$a ="NewYorkNewYork";
$reg="/[a-z]([A-Z])/";
echo preg_replace($reg, "$1", $a); // NeYorNeYork
The regular expression searches for a lower case letter followed by an upper case letter, and captures the upper case one. preg_replace() then replaces that combination with just the captured letter ($1).
See https://3v4l.org/o43bO
You don't need to capture the uppercase letter and use a backreference in the replacement string.
More simply, match the lowercase letter, then use a lookahead for an uppercase letter. This way you only replace the lowercase character with an empty string.
echo preg_replace('~[a-z](?=[A-Z])~', '', 'NewYork');
// NeYork
As for a review of your code, there are multiple issues.
global $b doesn't make sense to me. You need the variable to be instantiated as an empty string within the scope of the custom function only. It more simply should be $b = '';.
The variable and function naming is unhelpful. A function's name should specifically describe the function's action. A variable should intuitively describe the data that it contains. Generally speaking, don't sacrifice clarity for brevity.
As a matter of best practice, you should not repeatedly call a function when you know that the value has not changed. Calling strlen() on each iteration of the loop is not beneficial. Declare $length = strlen($input) and use $length over and over.
$a[$i+1] is going to generate an undefined offset warning on the last iteration of the loop because there cannot possibly be a character at that offset when you already know the length of the string has been fully processed. In other words, the last character of a string will have an offset of "length - 1". There is more than one way to address this, but I'll use the null coalescing operator to set a fallback character that will not qualify the previous letter for removal.
Most importantly, you cannot just check that the current ord value is less than the next ord value. Lowercase letters have an ordinal range of 97 through 122 and uppercase letters have an ordinal range of 65 through 90. You will need to check that both letters meet the qualifying criteria before the current letter is excluded from the result string.
Rewrite:
function removeLowerCharBeforeUpperChar(string $input): string
{
$output = '';
$length = strlen($input);
for ($offset = 0; $offset < $length; ++$offset) {
$currentOrd = ord($input[$offset]);
$nextOrd = ord($input[$offset + 1] ?? '_');
if ($currentOrd < 97
|| $currentOrd > 122
|| $nextOrd < 65
|| $nextOrd > 90
){
$output .= $input[$offset];
}
}
return $output;
}
echo removeLowerCharBeforeUpperChar('MickMacKusa');
// MicMaKusa
Or with ctype_ functions:
function removeLowerCharBeforeUpperChar(string $input): string
{
$output = '';
$length = strlen($input);
for ($offset = 0; $offset < $length; ++$offset) {
$nextLetter = $input[$offset + 1] ?? '';
if (ctype_lower($input[$offset]) && ctype_upper($nextLetter)) {
$output .= $nextLetter; // omit current letter, save next
++$offset; // double iterate
} else {
$output .= $input[$offset]; // save current letter
}
}
return $output;
}
To clarify, I would not use the above custom functions in a professional script, and neither snippet is built to process strings containing multibyte characters.
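For multibyte input, one option is the lookahead regex with Unicode property classes and the u modifier. This is only a sketch, assuming the input is valid UTF-8; the function name is made up for illustration:

```php
<?php
// Hypothetical helper: removes any lowercase letter (\p{Ll}) that is
// directly followed by an uppercase letter (\p{Lu}), assuming UTF-8 input.
function removeLowerBeforeUpperUnicode(string $input): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('~\p{Ll}(?=\p{Lu})~u', '', $input);
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return $result ?? $input;
}

echo removeLowerBeforeUpperUnicode('NewYork'), "\n";   // NeYork
echo removeLowerBeforeUpperUnicode('ÉcoleÉlan'), "\n"; // ÉcolÉlan
```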
Simply put, I create a new variable $s to store the string to be returned and loop over the string $a. I use ctype_upper to check the next character: if it is not uppercase, I append the current character to $s. At the end I return $s concatenated with the last character of the string.
function delete_char(string $a): string
{
if(!strlen($a))
{
return '';
}
$s='';
for($i = 0; $i < strlen($a)-1; $i++)
{
if(!ctype_upper($a[$i+1])){
$s.=$a[$i];
}
}
return $s.$a[-1];
}
echo delete_char("NewYork");//NeYork
Something like this maybe?
<?php
$word = 'NewYork';
preg_match('/.[A-Z].*/', $word, $match);
if($match){
$rlen = strlen($match[0]); //length from character before capital letter
$start = strlen($word)-$rlen; //first lower case before the capital
$edited_word = substr_replace($word, '', $start, 1); //removes character
echo $edited_word; //prints NeYork
}
?>

Count consecutive occurrence of specific, identical characters in a string - PHP

I am trying to calculate a few 'streaks', specifically the highest number of wins and losses in a row, but also the longest runs of games without a win and games without a loss.
I have a string that looks like this: 'WWWDDWWWLLWLLLL'
For this I need to be able to return:
Longest consecutive run of the W character (I will then replicate for L)
Longest consecutive run without the W character (I will then replicate for L)
I have found and adapted the following, which will go through my string and tell me the longest sequence, but I can't seem to adapt it to meet the criteria above.
All help and learning greatly appreciated :)
function getLongestSequence($sequence){
$sl = strlen($sequence);
$longest = 0;
for($i = 0; $i < $sl; )
{
$substr = substr($sequence, $i);
$len = strspn($substr, $substr[0]); // the $substr{0} syntax is no longer valid in PHP 8
if($len > $longest)
$longest = $len;
$i += $len;
}
return $longest;
}
echo getLongestSequence($sequence);
You can use a regular expression to detect sequences of identical characters:
$string = 'WWWDDWWWLLWLLLL';
// The regex matches any character -> . in a capture group ()
// plus as much identical characters as possible following it -> \1+
$pattern = '/(.)\1+/';
preg_match_all($pattern, $string, $m);
// sort by their length
usort($m[0], function($a, $b) {
return (strlen($a) < strlen($b)) ? 1 : -1;
});
echo "Longest sequence: " . $m[0][0] . PHP_EOL;
You can get the maximum count of a consecutive character in a string using the code below.
$string = "WWWDDWWWLLWLLLL";
function getLongestSequence($str,$c) {
$len = strlen($str);
$maximum=0;
$count=0;
for($i=0;$i<$len;$i++){
if(substr($str,$i,1)==$c){
$count++;
if($count>$maximum) $maximum=$count;
}else $count=0;
}
return $maximum;
}
$match="W";//change to L for lost count D for draw count
echo getLongestSequence($string,$match);
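Neither answer above covers the second requirement, the longest run without a given character. A possible sketch, using a character class and its negation with preg_match_all() (longestRun and its parameters are names invented for this example, and $char is assumed to be a single character):

```php
<?php
// Hypothetical helper: length of the longest run made only of $char,
// or, with $negate = true, of the longest run that does not contain $char.
function longestRun(string $sequence, string $char, bool $negate = false): int
{
    $class = ($negate ? '[^' : '[') . preg_quote($char, '/') . ']';
    if (!preg_match_all('/' . $class . '+/', $sequence, $matches)) {
        return 0; // no run at all
    }
    return max(array_map('strlen', $matches[0]));
}

$results = 'WWWDDWWWLLWLLLL';
echo longestRun($results, 'W'), "\n";       // 3 (longest winning streak)
echo longestRun($results, 'W', true), "\n"; // 4 (longest run without a win)
echo longestRun($results, 'L'), "\n";       // 4 (longest losing streak)
echo longestRun($results, 'L', true), "\n"; // 8 (longest run without a loss)
```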

PHP: trim word OR part of it from beginning/end of string

I need to trim words from the beginning and end of a string. The problem is that sometimes the words can be abbreviated, i.e. only the first three letters (followed by a dot).
I tried hard to find a suitable regular expression. Basically I need to catch three or more initial characters, up to the length of the replacement, but I cannot find a regular expression that will match a variable length and keep the order of characters.
For example, if I need to trim 'insurance' from the sentence 'insur. companies are rich', then the pattern \^[insurance]{3,9}\ comes to mind, but this pattern will also catch words like 'sensace', because the order of characters (and their occurrence) inside [] is not important for a regexp.
Also, at the end of the string, I need to remove serial numbers that are abbreviated from the beginning, say 'XK-25F14' is sometimes presented as '25F14'. So I decided to go purely with character-by-character comparison.
I therefore ended up with the following PHP function:
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Reverse $s and $dirt so it will trim from the end of string
$s = strrev($s);
if ($reverse)
return trimWords($s, strrev($dirt), $case_insensitive, false);
// After second run return back-reversed string
return trim($s, ' .-');
}
I'm happy with this function, but it has one drawback: it trims only one occurrence of the word. How can I make it trim more occurrences, i.e. remove both 'insurance ' from 'Insurance insur. companies'?
I'm also curious: is there really no regular expression that will match a variable length and respect the order of characters in the pattern?
Final solution
Thanks to mrhobo I have ended up with a function based on a regular expression. This function can be easily improved and should also be the most efficient for this task.
I have modified my previous function and it is twice as fast as the regexp, but it can remove only one word per run. To remove the word from both the beginning and the end, it has to run twice, and then the performance is the same as the regexp; to remove more than one occurrence of the word, it has to run multiple times, which gets slower and slower.
The final function goes like this:
function trimWords($string, $word, $case_insensitive = false, $min_abbrv = 3)
{
$exc = substr($word, $min_abbrv);
$pat = null;
$i = strlen($exc);
while ($i--)
$pat = '(?>'.preg_quote($exc[$i], '#').$pat.')?';
$pat = substr($word, 0, $min_abbrv).$pat;
$pat = '#(?<begin>^)?(?:\W*\b'.$pat.'\b\W*)+(?(begin)|$)#';
if ($case_insensitive)
$pat .= 'i';
return preg_replace($pat, '', $string);
}
NOTE: with this function, it does not matter whether the abbreviation ends with a dot or not; it wipes out any shorter form of the word and also removes all non-word characters around the word.
EDIT: I just tried creating a replace pattern like insu(r|ra|ran|ranc|rance), and the function with atomic groups is faster by ~30%; with longer words it could possibly be even more efficient.
Matching a word and all possible abbreviations from the nth letter isn't quite an easy task in regex.
Here is how I would do it for the word insurance from the 4th letter:
insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
http://regex101.com/r/aL2gV4
It works by using atomic groups to force the regex engine as far forward as possible past the last 'rance' letters using the nested pattern (?>a(?>b)?)?. If the last letter is matched, we're not dealing with an abbreviation, so no dot is required; otherwise the dot is required. This is coded by (?(last)|\.).
To trim, I would create a function to build the above regex for an abbreviation. Then you can write a while loop that replaces each of the abbreviation regexes with empty space until there are no more matches.
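A builder for that pattern could look roughly like the sketch below. The function name buildAbbreviationPattern is invented here, the # delimiter is an arbitrary choice, and the code assumes $minLength is smaller than the length of $word:

```php
<?php
// Hypothetical helper: builds the nested atomic-group pattern for $word,
// where the first $minLength letters are mandatory and the rest optional.
// Assumes 0 < $minLength < strlen($word).
function buildAbbreviationPattern(string $word, int $minLength): string
{
    $tail = str_split(substr($word, $minLength));
    // The final letter gets the named group that marks "full word matched".
    $last = array_pop($tail);
    $pattern = '(?>(?<last>' . preg_quote($last, '#') . '))?';
    // Wrap the remaining optional letters from the inside out.
    foreach (array_reverse($tail) as $letter) {
        $pattern = '(?>' . preg_quote($letter, '#') . $pattern . ')?';
    }
    // Require a dot only when the last letter was not matched.
    return preg_quote(substr($word, 0, $minLength), '#') . $pattern . '(?(last)|\.)';
}

$pattern = buildAbbreviationPattern('insurance', 4);
// insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
echo preg_replace('#' . $pattern . '#i', '', 'Insur. companies'), "\n"; // " companies"
```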
Non regex version
Here is my non regex version that removes multiple words and abbreviated words from a string:
function trimWords($str, $word, $min_abbrv, $case_insensitive = false) {
$len = 0;
$word_len = strlen($word);
$strlen = strlen($str);
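$cmp = $case_insensitive ? 'strncasecmp' : 'strncmp'; // function names must be passed as strings
$dot = 0;
for ($i = 0; $i < $strlen; $i++) {
// Compare one character at a time and stop extending once the whole word has matched
if ($len < $word_len && $cmp($str[$i], $word[$len], 1) == 0) {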
$len++;
} else if ($len > 0) {
if ($len == $word_len || ($len >= $min_abbrv && ($dot = $str[$i] == '.'))) {
$i -= $len;
$len += $dot;
$str = substr($str, 0, $i) . substr($str, $i+$len);
$strlen = strlen($str);
$dot = 0;
}
$len = 0;
}
}
return $str;
}
Example:
$string = 'ins. <- "ins." / insu. insuranc. insurance / insurance. <- "."';
echo trimWords($string, 'insurance', 4);
Output is:
ins. <- "ins." / / . <- "."
I wrote a function that constructs the regular expression pattern as suggested by mrhobo, plus a simple test, and benchmarked it against my function that uses pure PHP string comparison.
Here is the code:
$string = 'Insur. companies are nasty rich';
$dirt = 'insurance';
$cycles = 500000;
$start = microtime(true);
$i = $cycles;
while ($i) {
$i--;
regexpStyle($string, $dirt, true);
}
$stop = microtime(true);
$i = $cycles;
while ($i) {
$i--;
trimWords($string, $dirt, true);
}
$end = microtime(true);
$res1 = $stop - $start;
$res2 = $end - $stop;
$winner = $res1 < $res2 ? '<<<' : '>>>';
echo 'regexp: '.$res1.' '.$winner.' string operations: '.$res2;
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// After second run return back-reversed string
return trim($s, ' .-');
}
function regexpStyle($s, $dirt, $case_insensitive, $min_abbrev = 3)
{
$ss = substr($dirt, $min_abbrev);
$arr = str_split($ss);
$patt = '(?>(?<last>'.array_pop($arr).'))?';
$i = count($arr);
while ($i)
$patt = '(?>'.$arr[--$i].$patt.')?';
$patt = '#^'.substr($dirt, 0, $min_abbrev).$patt.'(?(last)|\.)#';
$patt .= $case_insensitive ? 'i' : null;
return trim(preg_replace($patt, '', $s));
}
and the winner is... moment of silence... it is...
a draw
regexp: 8.5169589519501 >>> string operations: 8.0951890945435
but I have a strong feeling that the regexp approach could be better utilized.

Search for pattern in a string

Pattern search within a string.
For example:
$string = "111111110000";
FindOut($string);
The function should return 0.
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}
If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in a string twice without overlapping. This will get you the first occurrence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?
What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
 |-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
Edit: I really don't like when people just ask for code, but since this seemed to be an interesting problem, here is my implementation of the above algorithm, which allows the window size to vary instead of being fixed at 3 (the function is only briefly tested and omits obvious error checking):
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while( (the current pattern in the window) is not empty / false)
while( ($current_pattern = substr( $str, $i, $window_size)) != false) {
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.
It looks like you want to use a substring function to walk along the string and check every three characters, rather than just breaking it into chunks of 3:
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i <= $max; $i++){ // <= so that the last window is checked too
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = $i; //save the index of the starting position.
}
}
}
return false; //if we get this far then we didn't find any duplicate strings
}
Based on the str_split documentation, calling str_split on "1010101101" will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
)
None of these will match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.
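To illustrate, here is a small sketch that collects every slice this way (the helper name allSlices is hypothetical):

```php
<?php
// Hypothetical helper: returns every $length-long slice of $input,
// keyed by the index at which the slice starts.
function allSlices(string $input, int $length = 3): array
{
    $slices = [];
    $lastStart = strlen($input) - $length;
    for ($index = 0; $index <= $lastStart; $index++) {
        $slices[$index] = substr($input, $index, $length);
    }
    return $slices;
}

print_r(allSlices('1010101101'));
// Slices, in order: 101, 010, 101, 010, 101, 011, 110, 101
```

Unlike the str_split result, the slice 101 now shows up more than once, which is what a duplicate check needs to see.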
A quick and dirty implementation of such a pattern search:
function findPattern($string){
$matches = 0;
$substrStart = 0;
// <= so that the last 3-character window is checked as well
while($matches < 2 && $substrStart + 3 <= strlen($string) && $pattern = substr($string, $substrStart++, 3)){
$matches = substr_count($string,$pattern);
}
if($matches < 2){
return null;
}
return $substrStart-1;
}

How can I efficiently parse a string for two strings?

How can I efficiently determine if a given string contains two strings?
For example, let's say I'm given the string: abc-def-jk-l. This string either contains two strings divided by a -, or it's not a match. The matching possibilities are:
Possible Matches for "abc-def-jk-l" :
abc def-jk-l
abc-def jk-l
abc-def-jk l
Now, here are my columns of strings to match:
Column I        Column II
--------        ---------
1. abc-def      A. qwe-rt
2. ghijkl       B. yui-op
3. mn-op-qr     C. as-df-gh
4. stuvw        D. jk-l
How can I efficiently check to see if the given string matches two strings in the columns above? (The above is a match - matching abc-def and jk-l)
Here are some more examples:
abc-def-yui-op [MATCH - Matches 1-B]
abc-def-zxc-v [NO MATCH - Matches 1, but not any in column II.]
stuvw-jk-l [MATCH - Matches 4-D]
mn-op-qr-jk-l [Is this a match?]
Now, given the strings above, how can I efficiently determine matches? (Efficiency will be key, because columns I and II will each have millions of rows on indexed columns in their respective tables!)
UPDATE: The order will always be column i, then column ii. (or "no match", which could mean it matches only one column or none)
Here's some php to help:
<?php
$arrStrings = array('abc-def-yui-op','abc-def-zxc-v','stuvw-jk-l','stuvw-jk-l');
foreach($arrStrings as $string) {
print_r(stringMatchCheck($string));
}
function stringMatchCheck($string) {
$arrI = array('abc-def','ghijkl','mn-op-qr','stuvw');
$arrII = array('qwe-rt','yui-op','as-df-gh','jk-l');
// magic stackoverflow help goes here!
if ()
return array($match[0],$match[1]);
else
return false;
}
?>
Just use PHP's strpos(). Loop until you find an entry from $arrI in $string using strpos(), and do the same for $arrII.
More info on strpos(): http://php.net/manual/en/function.strpos.php
EDIT:
To help you see what I'm talking about, here's your function:
function stringMatchCheck($string) {
$arrI = array('abc-def','ghijkl','mn-op-qr','stuvw');
$arrII = array('qwe-rt','yui-op','as-df-gh','jk-l');
$match = array(NULL, NULL);
// get match, if any, from first group
for ($i=0; $i<count($arrI) && is_null($match[0]); $i++) { // keep looping until a match is found
if (strpos($string,$arrI[$i]) !== false) {
$match[0]=$arrI[$i];
}
}
if (!is_null($match[0])) {
// get match, if any, from second group
for ($i=0; $i<count($arrII) && is_null($match[1]); $i++) {
if (strpos($string,$arrII[$i]) !== false) {
$match[1]=$arrII[$i];
}
}
}
if (!is_null($match[0]) && !is_null($match[1])) {
return $match;
} else {
return false;
}
}
For efficiency's sake, rather than looping through every entry in each column, split the string into its words and search for every possible two-part combination, i.e. the splits you list as possible matches.
$words = explode("-", $string);
$count = count($words);
// $i is the number of words in the first part; try every split point
for ( $i = 1; $i < $count; $i++ ) {
$partOne = array_slice($words, 0, $i);
$partTwo = array_slice($words, $i);
$wordOne = implode("-" , $partOne);
$wordTwo = implode("-" , $partTwo);
/* SQL to select $wordOne and $wordTwo from the tables */
}
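Putting this together with the arrays from the question gives something like the sketch below. It assumes the two columns fit in PHP arrays; with millions of rows you would instead run an indexed lookup per split, which is not shown here. The extra parameters are an addition for this example:

```php
<?php
// Sketch only: $columnI and $columnII stand in for the two database tables.
// Returns [first, second] for the first split that matches, or false.
function stringMatchCheck(string $string, array $columnI, array $columnII)
{
    // Flip the arrays so that membership tests are hash lookups.
    $setI = array_flip($columnI);
    $setII = array_flip($columnII);
    $words = explode('-', $string);
    $count = count($words);
    for ($i = 1; $i < $count; $i++) {
        $wordOne = implode('-', array_slice($words, 0, $i));
        $wordTwo = implode('-', array_slice($words, $i));
        if (isset($setI[$wordOne]) && isset($setII[$wordTwo])) {
            return [$wordOne, $wordTwo];
        }
    }
    return false;
}

$arrI = ['abc-def', 'ghijkl', 'mn-op-qr', 'stuvw'];
$arrII = ['qwe-rt', 'yui-op', 'as-df-gh', 'jk-l'];
print_r(stringMatchCheck('abc-def-jk-l', $arrI, $arrII)); // abc-def and jk-l
```

The number of lookups depends on the number of hyphens in the input rather than on the size of the columns.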
