Split a string, remember the positions of splitting - php

Assume I have the following string:
I have | been very busy lately and need to go | to bed early
By splitting on "|", you get:
$arr = array(
[0] => I have
[1] => been very busy lately and need to go
[2] => to bed early
)
The first split is after 2 words, and the second split 8 words after that. The positions after how many words to split will be stored: array(2, 8, 3). Then, the string is imploded to be passed on to a custom string tagger:
tag_string('I have been very busy lately and need to go to bed early');
I don't know what the output of tag_string will be exactly, except that the total words will remain the same. Examples of output would be:
I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy
This will lengthen the string by an unknown number of characters. I have no control over tag_string. What I know is (1) the number of words will be the same as before and (2) the array was split after 2, and thereafter after 8 words, respectively. I now need a solution explode the tagged string into the same array as before:
$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p"
function split_string_again() {
// split after 2nd, and thereafter after 8th word
}
With output:
$arr = array(
[0] => I have-nn
[1] => been-vb very-vb busy lately and-rr need to-r go
[2] => to bed early-p
)
So to be clear (I wasn't before): I cannot split by remembering the strpos, because strpos before and after the string went through the tagger, aren't the same. I need to count the number of words. I hope I have made myself more clear :)

You wouldn't want to count the number of words, you would want to count the string length (strlen). If it is the same string without the pipes, then you want to split it with substr after a certain amount.
$strCounts = array();
foreach ($arr as $item) {
$strCounts[] = strlen($item);
}
// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
$arr[] = substr($string, $i, $count);
$i += $count; // increment the start position by the length
}
I have not tested this, simply a "theory" and probably has some kinks to work out. There may be a better way to go about it, I just don't know it.

Interesting question, although I think the rope data structure still applies it might be a little overkill since word placement won't change. Here is my solution:
$str = "I have | been very busy lately and need to go | to bed early";
function get_breaks($str)
{
$breaks = array();
$arr = explode("|", $str);
foreach($arr as $val)
{
$breaks[] = str_word_count($val);
}
return $breaks;
}
$breaks = get_breaks($str);
echo "<pre>" . print_r($breaks, 1) . "</pre>";
$str = str_replace("|", "", $str);
function rebreak($str, $breaks)
{
$return = array();
$old_break = 0;
$arr = str_word_count($str, 1);
foreach($breaks as $break)
{
$return[] = implode(" ", array_slice($arr, $old_break, $break));
$old_break += $break;
}
return $return;
}
echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
Let me know if you have any questions, but it is pretty self explanatory. There are definitely ways to improve this as well.

I'm not quite sure I understood what you actually wanted to achieve. But here are a couple of things that might help you:
str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does pretty much the same, but on UTF-8 strings.
strpos() finds the first occurrence of a string within another. You could easily find the positions of all | with this:
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
$positions[] = $pos;
}
I'm still not sure I understood why you can't just use explode() for this, though.
<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
$words[] = str_word_count($s);
}

Related

php foreach group results by 3 sets of characters

I am trying to group some foreach results based on the first 3 sets of characters.
For example i am currently listing sku codes for products and they look like this:
REF-MUSBOM-0500-ORA
REF-PROCOF-0001-LAT
REF-WHEREF-0001-TRO
REF-WHEREF-0001-ORA
REF-SHAKER-0700-C/B
REF-CREMON-0100-N/A
REF-GLUSUL-0090-N/A
REF-CRECAP-0090-N/A
REF-ALBFER-0120-N/A
REF-TSHCOT-LARG-BLK
REF-TSHCOT-MEDI-BLK
REF-ALBMAG-0090-N/A
REF-GYMJUG-2200-N/A
REF-OMEGA3-0090-N/A
REF-NEXGEN-0060-N/A
REF-VITAD3-0100-N/A
REF-SSSHAK-0739-N/A
REF-GINKGO-0090-N/A
REF-DIGEZY-0090-N/A
REF-VEST00-MEDI-N/A
REF-VEST00-LARG-N/A
REF-CREMON-0250-N/A
REF-MSM----0250-N/A
REF-GRNTEA-0100-N/A
REF-COLOST-0100-N/A
REF-GLUCHO-0090-N/A
REF-ZINCMA-0100-N/A
REF-BETALA-0250-N/A
REF-DRIBOS-0250-N/A
REF-HMB000-0090-N/A
REF-ALACID-0090-N/A
REF-CLA000-0090-N/A
REF-ACETYL-0090-N/A
REF-NXGPRO-0090-N/A
REF-LGLUTA-0250-N/A
REF-BCAA20-0200-N/A
REF-FLAPJA-0012-ACR
REF-FLAPJA-0012-MAP
REF-LCARNI-0100-N/A
REF-CORDYC-0090-N/A
REF-CREMON-0500-N/A
REF-BCAAEN-0330-APP
REF-PREWKT-0300-FPU
REF-TESFUS-0090-N/A
REF-AMIIFUS-0300-GAP
REF-AMIIFUS-0300-WME
REF-BCAINT-0400-FPU
REF-KRILLO-0090-N/A
REF-AMIIFUS-0300-PLE
REF-AMIIFUS-0300-FPU
REF-BCAINT-0400-WME
REF-ENZQ10-0090-N/A
REF-THERMO-0100-N/A
REF-LGLUTA-0500-N/A
REF-RBAR00-0012-DCB
REF-RBAR00-0012-PBC
REF-RBAR00-0012-WCR
REF-IMHEAV-2200-CHO
REF-PROCOF-0012-N/A
REF-DIEPRO-0900-STR
REF-DIEPRO-0900-BOF
REF-DIEPRO-0900-CHO
REF-INWPRO-0900-VAN
REF-INWPRO-0900-BOF
REF-INWPRO-0900-BCS
REF-INWPRO-0900-CHO
REF-INWPRO-0900-CMI
REF-INWPRO-0900-RAS
REF-INWPRO-0900-STR
REF-INWPRO-0900-CIN
REF-INWPRO-0900-CPB
REF-EGGPRO-0900-CHO
REF-EGGPRO-0900-VAN
REF-MICCAS-0909-CHO
REF-MICCAS-0909-CMI
REF-MICCAS-0909-VAN
REF-MICCAS-0909-STR
REF-BCAA50-0500-N/A
REF-MICWHE-0909-STR
REF-MICWHE-0909-VAN
REF-MICWHE-0909-CHIO
REF-MICWHE-0909-BAN
REF-1STOXT-2030-STR
REF-1STOXT-2030-VAN
REF-1STOXT-2030-CHO
REF-MUSBOM-0600-BCH
REF-MUSBOM-0600-FPU
REF-MUSBCF-0600-BCH
REF-MUSBCF-0600-FPU
REF-VEGANP-2100-STR
REF-VEGANP-2100-CHO
REF-INMPRO-2270-CPB
REF-DIETMR-2400-CPB
REF-INMPRO-2270-SCR
REF-INMPRO-2270-VIC
REF-MATRIX-1800-FRU
REF-INMPRO-2270-BOF
REF-MATRIX-1800-CHO
REF-INMPRO-2270-CHO
REF-ONESTO-2100-CHO
In the above list there are 2 skus which are:
REF-WHEREF-0001-TRO
REF-WHEREF-0001-ORA
The first 3 sets of characters split by - are the same. What would be the best approach of grouping all results leaving me an array something like this:
Array
(
[REF-WHEREF-0001] => Array
(
[0] => REF-WHEREF-0001-TRO
[1] => REF-WHEREF-0001-ORA
)
)
Are the first 3 groups (excluding the multiple -) always 13 characters? Then do something like this:
<?php
$arr = ["REF-MUSBOM-0500-ORA",
"REF-PROCOF-0001-LAT",
"REF-WHEREF-0001-TRO",
"REF-WHEREF-0001-PPL"];
$resultArr = [];
foreach ($arr as $sku) {
$resultArr[substr($sku, 0, 15)][] = $sku;
}
var_dump($resultArr);
If that length varies you might want to work with a regex or the strpos() of the third -.
I must say that I think you could come up with this yourself, since you were already thinking in the right direction i.e. foreach()
EDIT: Because I found other solutions more elegant looking, I decided to compare efficiency. This solution is a lot faster than the other ones.
I always create a new array with the index that I need for group, try this:
$arr=array('REF-MUSBOM-0500-ORA',
'REF-PROCOF-0001-LAT',
'REF-WHEREF-0001-TRO');
$newarr=array();
foreach($arr as $a){
$b=explode('-',$a);
array_pop($b);
$b=implode("-", $b);
$newarr[$b][]=$a;
}
echo '<pre>',print_r($newarr),'</pre>';
You will need to pick a group with some basics use of explode, implode and str_replace.
What does this solution do.
loop through the array of your items
explode to get last item index of exploded string assuming that it
would be dynamic in the end
implode & str_replace again to find out string of group name
And last strpos & in_array to have sample reponse
Solution
$array = array(
'REF-MUSBOM-0500-ORA',
'REF-PROCOF-0001-LAT',
'REF-WHEREF-0001-TRO',
'REF-WHEREF-0001-ORA',
'REF-SHAKER-0700-C/B',
'REF-CREMON-0100-N/A',
'REF-GLUSUL-0090-N/A',
'REF-CRECAP-0090-N/A',
'REF-ALBFER-0120-N/A',
);
$new_array = array();
foreach ($array as $key => $val) {
$group_arr = explode('-', $val);
$end = end($group_arr);
$combined_group = implode('-', $group_arr);
$group = str_replace('-' . $end, '', $combined_group);
if (strpos($val, $group) !== false && !in_array($group, $new_array)) {
$new_array[$group][] = $val;
}
}
echo '<pre>';print_r($new_array);echo '</pre>';
See demo on Sandbox

Find the word in a sentence with maximum specific character count

I am new to PHP Development and finally with the help of SO I am able to write a program for finding word in a sentence with maximum specific character count.
Below is what I have tried:
<?php
// Program to find the word in a sentence with maximum specific character count
// Example: "O Romeo, Romeo, wherefore art thou Romeo?”
// Solution: wherefore
// Explanation: Because "e" came three times
$content = file_get_contents($argv[1]); // Reading content of file
$max = 0;
$arr = explode(" ", $content); // entire array of strings with file contents
for($x =0; $x<count($arr); $x++) // looping through entire array
{
$array[$x] = str_split($arr[$x]); // converting each of the string into array
}
for($x = 0; $x < count($arr); $x++)
{
$count = array_count_values($array[$x]);
$curr_max = max($count);
if($curr_max > $max)
{
$max = $curr_max;
$word = $arr[$x];
}
}
echo $word;
?>
Question: Since I am new to PHP development I don't know the optimization techniques. Is there anyway I can optimize this code? Also, Can I use regex to optimize this code further? Kindly guide.
I love coding this type of mini-challenges in the minimum lines of code :D. So here is my solution:
function wordsWithMaxCharFrequency($sentence) {
$words = preg_split('/\s+/', $sentence);
$maxCharsFrequency = array_map (function($word) {
return max(count_chars(strtolower($word)));
}, $words);
return array_map(function($index) use($words) {
return $words[$index];
}, array_keys($maxCharsFrequency, max($maxCharsFrequency)));
}
print_r(wordsWithMaxCharFrequency("eeee yyyy"));
//Output: Array ( [0] => eeee [1] => yyyy )
print_r(wordsWithMaxCharFrequency("xx llll x"));
//Output: Array ( [0] => llll )
Update1:
If you want to get only A-Za-z words use the following code:
$matches = [];
//a word is either followed by a space or end of input
preg_match_all('/([a-z]+)(?=\s|$)/i', $sentence, $matches);
$words = $matches[1];
Just a contribution that could inspire you :D!
Good Luck.

similar substring in other string PHP

How to check substrings in PHP by prefix or postfix.
For example, I have the search string named as $to_search as follows:
$to_search = "abcdef"
And three cases to check the if that is the substring in $to_search as follows:
$cases = ["abc def", "def", "deff", ... Other values ...];
Now I have to detect the first three cases using substr() function.
How can I detect the "abc def", "def", "deff" as substring of "abcdef" in PHP.
You might find the Levenshtein distance between the two words useful - it'll have a value of 1 for abc def. However your problem is not well defined - matching strings that are "similar" doesn't mean anything concrete.
Edit - If you set the deletion cost to 0 then this very closely models the problem you are proposing. Just check that the levenshtein distance is less than 1 for everything in the array.
This will find if any of the strings inside $cases are a substring of $to_search.
foreach($cases as $someString){
if(strpos($to_search, $someString) !== false){
// $someString is found inside $to_search
}
}
Only "def" is though as none of the other strings have much to do with each other.
Also on a side not; it is prefix and suffix not postfix.
To find any of the cases that either begin with or end with either the beginning or ending of the search string, I don't know of another way to do it than to just step through all of the possible beginning and ending combinations and check them. There's probably a better way to do this, but this should do it.
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "otherabc", "noabcmatch", "nodefmatch"];
$matches = array();
$len = strlen($to_search);
for ($i=1; $i <= $len; $i++) {
// get the beginning and end of the search string of length $i
$pre_post = array();
$pre_post[] = substr($to_search, 0, $i);
$pre_post[] = substr($to_search, -$i);
foreach ($cases as $case) {
// get the beginning and end of each case of length $i
$pre = substr($case, 0, $i);
$post = substr($case, -$i);
// check if any of them match
if (in_array($pre, $pre_post) || in_array($post, $pre_post)) {
// using the case as the array key for $matches will keep it distinct
$matches[$case] = true;
}
}
}
// use array_keys() to get the keys back to values
var_dump(array_keys($matches));
You can use array_filter function like this:
$cases = ["cake", "cakes", "flowers", "chocolate", "chocolates"];
$to_search = "chocolatecake";
$search = strtolower($to_search);
$arr = array_filter($cases, function($val) use ($search) { return
strpos( $search,
str_replace(' ', '', preg_replace('/s$/', '', strtolower($val))) ) !== FALSE; });
print_r($arr);
Output:
Array
(
[0] => cake
[1] => cakes
[3] => chocolate
[4] => chocolates
)
As you can it prints all the values you expected apart from deff which is not part of search string abcdef as I commented above.

Cut string in PHP at nth-from-end occurrence of character

I have a string which can be written in a number of different ways, it will always follow the same pattern but the length of it can differ.
this/is/the/path/to/my/fileA.php
this/could/also/be/the/path/to/my/fileB.php
another/example/of/a/long/address/which/is/a/path/to/my/fileC.php
What I am trying to do is cut the string so that I am left with
path/to/my/file.php
I have some code which I got from this page and modified it to the following
$max = strlen($full_path);
$n = 0;
for($j=0;$j<$max;$j++){
if($full_path[$j]=='/'){
$n++;
if($n>=3){
break 1;
}
}
}
$path = substr($full_path,$j+1,$max);
Which basically cuts it at the 3rd instance of the '/' character, and gives me what is left. This was fine when I was working in one environment, but when I migrated it to a different server, the path would be longer, and so the cut would give me too long an address. I thought that rather than changing the hard coded integer value for each instance, it would work better if I had it cut the string at the 4th from last instance, as I always want to keep the last 4 'slashes' of information
Many thanks
EDIT - final code solution
$exploded_name = explode('/', $full_path);
$exploded_trimmed = array_slice($exploded_name, -4);
$imploded_name = implode('/', $exploded_trimmed);
just use explode with your string and if pattern is always the same then get last element of the array and your work is done
$pizza = "piece1/piece2/piece3/piece4/piece5/piece6";
$pieces = explode("/", $pizza);
echo $pieces[0]; // piece1
echo $pieces[1]; // piece2
Then reverse your array get first four elements of array and combine them using "implode"
to get desired string
This function below can work like a substr start from nth occurrence
function substr_after_nth($str, $needle, $key)
{
$array = explode($needle, $str);
$temp = array();
for ($i = $key; $i < count($array); $i++) {
$temp[] = $array[$i];
}
return implode($needle, $temp);
}
Example
$str = "hello-world-how-are-you-doing";
substr after 4th occurrence of "-" to get "you-doing"
call the function above as
echo substr_after_nth($str, "-", 4);
it will result as
you-doing

"Unfolding" a String

I have a set of strings, each string has a variable number of segments separated by pipes (|), e.g.:
$string = 'abc|b|ac';
Each segment with more than one char should be expanded into all the possible one char combinations, for 3 segments the following "algorithm" works wonderfully:
$result = array();
$string = explode('|', 'abc|b|ac');
foreach (str_split($string[0]) as $i)
{
foreach (str_split($string[1]) as $j)
{
foreach (str_split($string[2]) as $k)
{
$result[] = implode('|', array($i, $j, $k)); // more...
}
}
}
print_r($result);
Output:
$result = array('a|b|a', 'a|b|c', 'b|b|a', 'b|b|c', 'c|b|a', 'c|b|c');
Obviously, for more than 3 segments the code starts to get extremely messy, since I need to add (and check) more and more inner loops. I tried coming up with a dynamic solution but I can't figure out how to generate the correct combination for all the segments (individually and as a whole). I also looked at some combinatorics source code but I'm unable to combine the different combinations of my segments.
I appreciate if anyone can point me in the right direction.
Recursion to the rescue (you might need to tweak a bit to cover edge cases, but it works):
function explodinator($str) {
$segments = explode('|', $str);
$pieces = array_map('str_split', $segments);
return e_helper($pieces);
}
function e_helper($pieces) {
if (count($pieces) == 1)
return $pieces[0];
$first = array_shift($pieces);
$subs = e_helper($pieces);
foreach($first as $char) {
foreach ($subs as $sub) {
$result[] = $char . '|' . $sub;
}
}
return $result;
}
print_r(explodinator('abc|b|ac'));
Outputs:
Array
(
[0] => a|b|a
[1] => a|b|c
[2] => b|b|a
[3] => b|b|c
[4] => c|b|a
[5] => c|b|c
)
As seen on ideone.
This looks like a job for recursive programming! :P
I first looked at this and thought it was going to be a on-liner (and probably is in perl).
There are other non-recursive ways (enumerate all combinations of indexes into segments then loop through, for example) but I think this is more interesting, and probably 'better'.
$str = explode('|', 'abc|b|ac');
$strlen = count( $str );
$results = array();
function splitAndForeach( $bchar , $oldindex, $tempthread) {
global $strlen, $str, $results;
$temp = $tempthread;
$newindex = $oldindex + 1;
if ( $bchar != '') { array_push($temp, $bchar ); }
if ( $newindex <= $strlen ){
print "starting foreach loop on string '".$str[$newindex-1]."' \n";
foreach(str_split( $str[$newindex - 1] ) as $c) {
print "Going into next depth ($newindex) of recursion on char $c \n";
splitAndForeach( $c , $newindex, $temp);
}
} else {
$found = implode('|', $temp);
print "Array length (max recursion depth) reached, result: $found \n";
array_push( $results, $found );
$temp = $tempthread;
$index = 0;
print "***************** Reset index to 0 *****************\n\n";
}
}
splitAndForeach('', 0, array() );
print "your results: \n";
print_r($results);
You could have two arrays: the alternatives and a current counter.
$alternatives = array(array('a', 'b', 'c'), array('b'), array('a', 'c'));
$counter = array(0, 0, 0);
Then, in a loop, you increment the "last digit" of the counter, and if that is equal to the number of alternatives for that position, you reset that "digit" to zero and increment the "digit" left to it. This works just like counting with decimal numbers.
The string for each step is built by concatenating the $alternatives[$i][$counter[$i]] for each digit.
You are finished when the "first digit" becomes as large as the number of alternatives for that digit.
Example: for the above variables, the counter would get the following values in the steps:
0,0,0
0,0,1
1,0,0 (overflow in the last two digit)
1,0,1
2,0,0 (overflow in the last two digits)
2,0,1
3,0,0 (finished, since the first "digit" has only 3 alternatives)

Categories