I'm trying to write highlighting functionality. There are two types of highlighting: positive and negative. Positive highlighting is done first. Highlighting in itself is very simple: it just wraps a keyword/phrase in a span with a specific class, which depends on the type of highlighting.
Problem:
Sometimes, negative highlighting can contain positive one.
Example:
Original text:
some data from blahblah test was not statistically valid
After the text passes through the positive highlighting "filter", it will end up like this:
some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically valid</span>
or
some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically <span class="positive">valid</span></span>
Then, in the negative list, we have the phrase not statistically valid.
In both cases, resulting text after passing through both "filters" should look like:
some data from <span class="positive">blahblah test</span> was <span class="negative">not statistically valid</span>
Conditions:
- The number of span tags, and their location within a keyword/phrase from the negative "filter" list, is unknown.
- The keyword/phrase must be matched even if it includes span tags (including ones right before and right after the keyword/phrase). These span tags have to be removed.
- If any span tags are detected, the number of opening and closing span tags removed has to be equal.
Questions:
- How to detect these span tags if there are any?
- Is this even possible with RegEx alone?
I don't think it can be done with a single regular expression, and even if it is possible, honestly I'm too lazy to rack my brain over it.
What do you think?
I came up with a solution that takes 4 steps to achieve what you want:
1- Extract all words from the HTML and store them with their corresponding positions.
2- Explode each negative list value into words.
3- Match those words consecutively against the in-order words from step 1 and store the matches.
4- Replace the values just found with their new HTML wrapper (<span class="negative">...</span>) by their positions.
I have, however, made a detailed flowchart (I'm not good at flowcharts, sorry) to help you understand things better. It may help to look at the code first.
Here is what we have:
$HTML = <<< HTML
some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically <span class="positive">valid</span></span>
HTML;
$listOfNegatives = ['not statistically valid'];
To extract words (real words) I used a RegEx which will fulfill our needs at this step:
~\b(?<![</])\w+\b(?![^<>]+>)~
To get the position of each word as well, the PREG_OFFSET_CAPTURE flag should be passed to preg_match_all().
/**
* Extract all words and their corresponding positions
* @param [string] $HTML
* @return [array] $HTMLWords
*/
function extractWords($HTML) {
$HTMLWords = [];
preg_match_all("~\b(?<![</])\w+\b(?![^<>]+>)~", $HTML, $words, PREG_OFFSET_CAPTURE);
foreach ($words[0] as $word) {
$HTMLWords[$word[1]] = $word[0];
}
return $HTMLWords;
}
This function's output is something like this:
Array
(
[0] => some
[5] => data
[10] => from
[38] => blahblah
[47] => test
[59] => was
[63] => not
[90] => statistically
[127] => valid
)
What we should do here is match the words of each list value, consecutively, against the words we just extracted. For our first list value, not statistically valid, we have three words: not, statistically and valid, and these words should appear consecutively in the extracted words array (which they do).
To handle this I wrote a function:
/**
* Check if any of our defined list values can be found in an ordered array of extracted words
* @param [array] $HTMLWords
* @param [array] $listOfNegatives
* @return [array] $subStrings
*/
function checkNegativesExistence($HTMLWords, $listOfNegatives) {
$counter = 0;
$previousWordOffset = null;
$subStrings = [];
foreach ($listOfNegatives as $i => $string) {
$stringWords = explode(" ", $string);
$wordIndex = 0;
foreach ($HTMLWords as $offset => $HTMLWord) {
if ($wordIndex > count($stringWords) - 1) {
$wordIndex = 0;
$counter++;
}
if ($stringWords[$wordIndex] == $HTMLWord) {
$subStrings[$counter][] = [$HTMLWord, $offset, $previousWordOffset];
$wordIndex++;
} elseif (isset($subStrings[$counter]) && count($subStrings[$counter]) > 0) {
unset($subStrings[$counter]);
$wordIndex = 0;
}
$previousWordOffset = $offset + strlen($HTMLWord);
}
$counter++;
}
return $subStrings;
}
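A sketch of an alternative for step 3, assuming the $HTMLWords array returned by extractWords() above. findPhraseRuns() is a hypothetical name, not part of this answer. Instead of a running $wordIndex, it compares a slice of the word list against each phrase, so a mismatch never skips a possible new start (e.g. "not not statistically valid") and an incomplete match at the end of the text is never returned. It produces the same [word, offset, end of previous word] triples. It does not fix failing case 3 below: runs found for different phrases can still overlap.

```php
<?php
// Hypothetical replacement for checkNegativesExistence(): same output shape,
// one entry per match, each a list of [word, offset, end offset of previous word].
function findPhraseRuns(array $htmlWords, array $phrases): array {
    $words = array_values($htmlWords);   // extracted words in document order
    $offsets = array_keys($htmlWords);   // their byte offsets in the HTML
    $total = count($words);
    $runs = [];
    foreach ($phrases as $phrase) {
        $needle = explode(' ', $phrase);
        $n = count($needle);
        for ($i = 0; $i + $n <= $total; $i++) {
            // compare the next $n extracted words with the phrase, word by word
            if (array_slice($words, $i, $n) !== $needle) {
                continue;
            }
            $run = [];
            for ($k = $i; $k < $i + $n; $k++) {
                // end offset of the previous word (null for the very first word)
                $prevEnd = $k > 0 ? $offsets[$k - 1] + strlen($words[$k - 1]) : null;
                $run[] = [$words[$k], $offsets[$k], $prevEnd];
            }
            $runs[] = $run;
            $i += $n - 1; // continue after this match so matches of one phrase don't overlap
        }
    }
    return $runs;
}
```

With the extractWords() output shown above, findPhraseRuns($HTMLWords, ['not statistically valid']) gives the same single run as the output below.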
This produces the following output:
Array
(
[0] => Array
(
[0] => Array
(
[0] => not
[1] => 63
[2] => 62
)
[1] => Array
(
[0] => statistically
[1] => 90
[2] => 66
)
[2] => Array
(
[0] => valid
[1] => 127
[2] => 103
)
)
)
As you can see, we have the complete string split into words along with their offsets. There are two offsets: the first is the word's real offset, the second is the offset right after the end of the previous word. We need both later.
Now we need to replace this occurrence, from offset 62 to 127 + strlen('valid'), with <span class="negative">not statistically valid</span> and discard everything else in between.
/**
* Substitute newly matched strings with negative HTML wrapper
* @param [array] $subStrings
* @param [string] $HTML
* @return [string] $HTML
*/
function negativeHighlight($subStrings, $HTML) {
$offset = 0;
$HTMLLength = strlen($HTML);
foreach ($subStrings as $key => $value) {
$arrayOfWords = [];
foreach ($value as $word) {
$arrayOfWords[] = $word[0];
if (current($value) == $value[0]) {
$start = substr($HTML, $word[1], strlen($word[0])) == $word[0] ? $word[2] : $word[2] + $offset;
}
if (current($value) == end($value)) {
$defaultLength = $word[1] + strlen($word[0]) - $start;
$length = substr($HTML, $word[1], strlen($word[0])) === $word[0] ? $defaultLength : $defaultLength + $offset;
}
}
$string = implode(" ", $arrayOfWords);
$HTML = substr_replace($HTML, "<span class=\"negative\">{$string}</span>", $start, $length);
if ($HTMLLength > strlen($HTML)) {
$offset = -($HTMLLength - strlen($HTML));
} elseif ($HTMLLength < strlen($HTML)) {
$offset = strlen($HTML) - $HTMLLength;
}
}
return $HTML;
}
An important thing to note here is that the first substitution may shift the offsets of other extracted values (which we don't have in this example), so we need to calculate the new HTML length:
if ($HTMLLength > strlen($HTML)) {
$offset = -($HTMLLength - strlen($HTML));
} elseif ($HTMLLength < strlen($HTML)) {
$offset = strlen($HTML) - $HTMLLength;
}
Then we should check how this change in length affected our offsets. There are two cases:
- The word's offset is intact (some characters were added/removed after this word).
- The word's offset has changed (some characters were added/removed before this word).
This check is done by the following block (we only need to check the first and last words):
if (current($value) == $value[0]) {
$start = substr($HTML, $word[1], strlen($word[0])) == $word[0] ? $word[2] : $word[2] + $offset;
}
if (current($value) == end($value)) {
$defaultLength = $word[1] + strlen($word[0]) - $start;
$length = substr($HTML, $word[1], strlen($word[0])) === $word[0] ? $defaultLength : $defaultLength + $offset;
}
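A common way to avoid this offset bookkeeping (a swapped-in technique, not what this answer does) is to apply the replacements from the highest offset down: a replacement only shifts text after its own start, so the offsets of earlier matches stay valid. A minimal sketch, assuming non-overlapping [start, length, replacement] triples; applyEditsRightToLeft() is a hypothetical helper:

```php
<?php
// Hypothetical helper: apply non-overlapping [start, length, replacement] edits
// from the highest start offset down, so no running $offset correction is needed.
function applyEditsRightToLeft(string $html, array $edits): string {
    // highest start first: each replacement only shifts text after its own start
    usort($edits, fn($a, $b) => $b[0] <=> $a[0]);
    foreach ($edits as [$start, $length, $replacement]) {
        $html = substr_replace($html, $replacement, $start, $length);
    }
    return $html;
}

echo applyEditsRightToLeft('aa bb cc', [[0, 2, 'X'], [6, 2, 'YYYY']]), "\n"; // X bb YYYY
```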
Putting it all together:
$newHTML = negativeHighlight(checkNegativesExistence(extractWords($HTML), $listOfNegatives), $HTML);
Output:
some data from <span class="positive">blahblah test</span> was <span class="negative">not statistically valid</span></span></span>
But there is a problem with this output: unmatched tags.
I'm sorry, I lied: I said I solved this in 4 steps, but there is one more. Here I made another RegEx that matches all properly nested tags as well as stray ones:
~(<span[^>]+>([^<]*+<(?!/)(?:([a-zA-Z0-9]++)[^>]*>[^<]*</\3>|(?2)))*[^<]*</span>|(?'single'</[^>]+>|<[^>]+>))~
With preg_replace_callback(), I replace only the tags captured by the group named single with nothing:
echo preg_replace_callback("~(<span[^>]+>([^<]*+<(?!/)(?:([a-zA-Z0-9]++)[^>]*>[^<]*</\3>|(?2)))*[^<]*</span>|(?'single'</[^>]+>|<[^>]+>))~",
function ($match) {
if (isset($match['single'])) {
return null;
}
return $match[1];
},
$newHTML
);
and we get the correct output:
some data from <span class="positive">blahblah test</span> was <span class="negative">not statistically valid</span>
Failing cases
My solution does not produce correct HTML in the following situations:
1- If a word like <was> is between other words:
<span class="positive">blahblah test</span> <was> not
Why?
Because my last RegEx sees <was> as an unmatched tag, so it removes it.
2- If a word like not (which is part of a value in our negative list) is enclosed in <>, i.e. <not>. This outputs:
some data from <span class="positive">blahblah test</span> was <not> <span class="positive">statistically <span class="positive">valid</span></span>
Why?
Because my first RegEx only picks up words that are not enclosed in the tag characters <>.
3- If the list has values where one is a substring of another:
$listOfNegatives = ['not statistically valid', 'not statistically'];
Why?
Because they overlap.
Here is what I've come up with. I honestly can't say whether it will cope with the full range of requirements, but it might help a bit.
$s = 'some data from blahblah test was not statistically valid';
$replaced = highlight($s);
var_dump($replaced);
function highlight($s) {
// split the string on the negative parts, capturing the full negative string each time
$parts = preg_split('/(not statistically valid)/',$s,-1,PREG_SPLIT_DELIM_CAPTURE);
$output = '';
$negativePart = 0; // keep track of whether we're dealing with a negative part or the remainder - they will alternate.
foreach ($parts as $part) {
if ($negativePart) {
$output .= negativeHighlight($part);
} else {
$output .= positiveHighlight($part);
}
$negativePart = !$negativePart;
}
return $output;
}
// only deals with a single negative part at a time, so just wraps with a span
function negativeHighlight($part) {
return "<span class='negative'>$part</span>";
}
// potentially deals with several replacements at once
function positiveHighlight($part) {
return preg_replace('/(blahblah test)|(statistically valid)/', "<span class='positive'>$1</span>", $part);
}
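A sketch generalizing this answer to whole lists of phrases. Because negatives are split out before positive highlighting runs, and the alternation tries longer phrases first, it also sidesteps the overlapping-phrases failure (case 3) of the previous answer. highlightLists() and phraseAlternation() are hypothetical names; like the original, this matches plain substrings rather than whole words and assumes the input contains no HTML yet.

```php
<?php
// Build "(p1|p2|...)" with the longest phrases first, so 'not statistically valid'
// wins over 'not statistically' when both could match at the same position.
function phraseAlternation(array $phrases): string {
    usort($phrases, fn($a, $b) => strlen($b) <=> strlen($a));
    return '(' . implode('|', array_map(fn($p) => preg_quote($p, '/'), $phrases)) . ')';
}

function highlightLists(string $s, array $negatives, array $positives): string {
    // positive highlighting, applied only to text outside negative phrases
    $positive = fn(string $part) => $positives
        ? preg_replace('/' . phraseAlternation($positives) . '/', "<span class='positive'>$1</span>", $part)
        : $part;
    if (!$negatives) {
        return $positive($s);
    }
    $parts = preg_split('/' . phraseAlternation($negatives) . '/', $s, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        // with one capture group, preg_split alternates: text, negative phrase, text, ...
        $out .= $i % 2 === 1 ? "<span class='negative'>$part</span>" : $positive($part);
    }
    return $out;
}

echo highlightLists(
    'some data from blahblah test was not statistically valid',
    ['not statistically', 'not statistically valid'],
    ['blahblah test', 'statistically valid']
), "\n";
```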
Related
I have a word, let's say BKOO.
I need to generate sub-words of this initial word by leaving out letters in all possible combinations: first remove only 1 letter, then n letters, down to words of at least 2 letters.
So for our example, this means words like KOO, BOO, OO, BK, BO.
By the way, my current algorithm says it is possible to generate 7 combinations out of BKOO (I also include the initial word):
Array
(
[0] => BKOO
[1] => Array
(
[0] => BKOO
[1] => KOO
[2] => OO
[3] => KO
[4] => BOO
[5] => BO
[6] => BKO
[7] => BK
)
)
Note there are no words like BOK or OOK, because that would mean reordering, which I don't want. I just want to leave letters out of the current word, without reordering.
The problem is that this is very slow for lengths like 15; it takes forever. How can I speed it up?
function comb($s, $r = [], $init = false) {
if ($init) {
$s = mb_strtoupper($s);
$r[] = $s;
}
$l = strlen($s);
if (!$s || $l < 3) return [];
for ($i=0; $i<$l; $i++) {
$t = rem_index($s, $i);
$r[] = $t;
$r = array_merge($r, comb($t));
}
$ret = array_unique((array)$r);
return $init ? array_values($ret) : $ret;
}
// remove character at position
function rem_index($str, $ind)
{
return substr($str,0,$ind++). substr($str,$ind);
}
$s = 'BKOO';
print_r(comb($s, [], true));
https://www.tehplayground.com/62pjCAs70j7qpLJj
NERD SECTION: 🤓 😄
Interesting note: at first I thought I would generate an array of indexes to drop, e.g. first drop only 1 letter (drop 0, then 1, etc.), then 2-combinations (drop 1 and 2, 1 and 3, etc.). But then I thought it would be quite difficult to drop N letters out of a string at once, so I came up with the idea of always dropping one letter and calling the function recursively, so the next level already has one character dropped and does the drop iteration again. The problem is that it is very slow for some reason.
By the way, if you also have a math background, what is the equation for the number of resulting combinations? My rough estimate for a 15-letter word is 14 * 13 * 12 ..., or at least that is how the iteration seems to go, but that would be millions of combinations, and obviously it isn't like that even for shorter words like 8 letters.
Thanks.
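On the counting question, a sketch of the arithmetic (my own working, not from the answers): if each kept letter is identified by its position, a sub-word is just a choice of which positions to keep, so the number of sub-words with at least 2 letters, excluding the full word, is a sum of binomial coefficients. Repeated letters only reduce the number of distinct strings: for BKOO the sum is 10, but BKO, BO and KO each arise twice, which leaves the 7 above. The slowness comes from the recursion instead: comb() removes one letter per call and does not remember what it has already produced, so it walks every distinct removal order separately.

$$\sum_{k=2}^{n-1}\binom{n}{k} = 2^n - n - 2, \qquad n = 15:\; 2^{15} - 17 = 32\,751$$

$$\text{removal orders reaching length } k = \frac{n!}{k!}, \qquad n = 15,\ k = 2:\; \frac{15!}{2!} = 653\,837\,184\,000$$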
You can iterate over the string and recurse only into sub-strings that haven't been processed yet:
function foo(&$res,$str,$min_length){
if(strlen($str) <= $min_length){
return;
}
$remains=[];
for($i=0; $i<strlen($str); $i++){
$remain = substr($str,0,$i) . substr($str,$i+1);
if(!isset($res[$remain])) { // only process unprocessed sub string
$res[$remain] = $remain;
$remains[] = $remain;
}
}
foreach($remains as $remain){
if(strlen($remain) == $min_length){
$res[$remain] = $remain;
}else {
foo($res, $remain, $min_length);
}
}
return;
}
$str = "BKOO";
$res = [];
foo($res,$str,2);
var_dump(array_values($res));
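A different technique, swapped in here rather than taken from the answers: enumerate bitmasks. Each of the 2^n masks says which positions to keep, so every sub-word is built exactly once and duplicates are dropped through array keys; for 15 letters that is only 32768 masks. subWords() is a hypothetical name. It assumes single-byte letters (strlen/strtoupper rather than the mb_* functions) and keeps the full word, like the question's output, although the order of the results differs.

```php
<?php
// Hypothetical helper: all sub-words (letters kept in original order) of length >= $minLen.
function subWords(string $word, int $minLen = 2): array {
    $word = strtoupper($word);
    $n = strlen($word);                  // assumes single-byte characters
    $found = [];
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        $sub = '';
        for ($i = 0; $i < $n; $i++) {
            if ($mask & (1 << $i)) {     // bit $i set: keep letter $i
                $sub .= $word[$i];
            }
        }
        if (strlen($sub) >= $minLen) {
            $found[$sub] = true;         // array keys remove duplicates such as BKO
        }
    }
    // array keys that look like integers become ints, so cast them back to strings
    return array_map('strval', array_keys($found));
}

print_r(subWords('BKOO'));
```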
Pattern search within a string.
For example:
$string = "111111110000";
FindOut($string);
The function should return 0.
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}
If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in a string twice without overlapping. This will get you the first occurrence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?
What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
 |-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
Edit: I really don't like it when people just ask for code, but since this seemed to be an interesting problem, here is my implementation of the above algorithm, which allows the window size to vary instead of being fixed at 3. The function is only briefly tested and omits obvious error checking:
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while a full window still fits in the rest of the string
while( $i + $window_size <= strlen( $str)) {
$current_pattern = substr( $str, $i, $window_size);
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.
It looks like you want to use a substring function to walk along the string and check every three characters, rather than just splitting it into chunks of 3:
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i <= $max; $i++){
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = $i; //save the index of the starting position.
}
}
return false; //if we get this far then we didn't find any duplicate strings
}
Based on the str_split documentation, calling str_split("1010101101", 3) will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
)
None of these match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.
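For example, a small sketch (windows() is a hypothetical helper) that collects every 3-character slice with substr(). For "1010101101" it returns 8 slices, and 101 shows up at indexes 0, 2, 4 and 7, which the str_split chunks above can never reveal:

```php
<?php
// Hypothetical helper: every $len-character slice of $s, one per starting index.
function windows(string $s, int $len = 3): array {
    $slices = [];
    for ($i = 0; $i + $len <= strlen($s); $i++) {
        $slices[] = substr($s, $i, $len);
    }
    return $slices;
}

print_r(windows('1010101101'));
```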
Quick and dirty implementation of such a pattern search:
function findPattern($string){
$matches = 0;
$substrStart = 0;
while($matches < 2 && $substrStart+ 3 < strlen($string) && $pattern = substr($string, $substrStart++, 3)){
$matches = substr_count($string,$pattern);
}
if($matches < 2){
return null;
}
return $substrStart-1;
}
I'm wondering if somebody can help me.
I have this code for getting the text between [reply] tags:
$msg = '> **
> <#1371620c479a4e98_> Chauffeure -Emails driver_age:72
> driver_nationality:IN
> driver_languages:French,
> driver_name:Rinto George
> driver_mobilenumber:9747 161861
> driver_email:rinto@example.com[reply]I am a principal yes , I know
> you[reply] Fragen oder Hilfe benotigt?
> 089-38168530 Secure Transmission of Sensitive Data by SSL
>';
Code used:
preg_match_all("/\[reply](.*)\[reply]/", $msg, $reply);
print_r($reply);
But it doesn't give me the desired output.
Suggestion:
If this could be sorted out using [reply]Reply here[/reply] instead, that would be better, as [reply]...[reply] doesn't look well formed.
Taking the odd elements of the result of the regex /\[reply\].*/ will give you what you want.
Here is the result:
Array
(
[0] => Array
(
[0] => [reply]I am a principal yes , I know
[1] => [reply] Fragen oder Hilfe benotigt?
)
)
You don't want regex. While you can accomplish what you're trying to do for the most basic cases with a simple regex like the one webbandit posted, it's going to break on more complicated examples (like the one in my comment).
That can be worked around with a better regex and lookaheads, but that's not what you want. You're doing string matching, and you should be using a finite state machine to pull this off. PHP's string functions can give you something quick and dirty that would work much better, e.g.
<?php
$text = "[reply] something [reply] bla bla bla [reply] something else [reply]";
$matches = array();
$matchCount = 0;
$search = "[reply]";
$lastMatch = strpos($text, $search); // offset of the first (opening) [reply], or FALSE
while($lastMatch !== FALSE) {
$thisMatch = strpos($text, $search, $lastMatch+1);
if($thisMatch === FALSE)
break;
if(++$matchCount % 2 == 0)
{
$lastMatch = $thisMatch;
continue;
}
//print substr($text, $lastMatch + strlen($search), $thisMatch - $lastMatch - strlen($search)) . "\n";
array_push($matches, substr($text, $lastMatch + strlen($search), $thisMatch - $lastMatch - strlen($search)));
$lastMatch = $thisMatch;
}
print_r($matches);
?>
Will give you an array of replies in $matches.
Output:
[mqudsi@iqudsi:~/Desktop]$ php reply.php
Array
(
[0] => something
[1] => something else
)
For your revised question with [reply] and [/reply], the solution is here:
$text = "[reply] something [/reply] bla bla bla [reply] something else [/reply]";
$matches = array();
$end = -1;
while(true) {
$start = strpos($text, "[reply]", $end+1);
$end = strpos($text, "[/reply]", $start+1);
if($start === FALSE || $end === FALSE)
break;
array_push($matches, substr($text, $start + strlen("[reply]"), $end - $start - strlen("[reply]")));
}
Dunno why Gokhan deleted his answer, but it was the right one.
"~\[reply](.*?)\[/reply]~is"
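The only two things you need are the s modifier, which makes . match newlines too (by default . stops at a line break, so a reply spanning several lines is never matched),
and the lazy quantifier ?, which makes .*? stop at the first [/reply] instead of the last one. (The i modifier just makes the match case-insensitive.)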
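A minimal sketch of that regex with preg_match_all(). replies() is a hypothetical wrapper, and the sample text is shortened from the question's message, with [/reply] as the closing tag:

```php
<?php
// Hypothetical wrapper: return the text inside every [reply]...[/reply] pair.
function replies(string $text): array {
    // s: "." also matches newlines; i: case-insensitive; .*? stops at the first [/reply]
    preg_match_all('~\[reply](.*?)\[/reply]~is', $text, $m);
    return $m[1];
}

$msg = "> driver_email:rinto@example.com[reply]I am a principal yes , I know\n> you[/reply] Fragen oder Hilfe benotigt?";
print_r(replies($msg));
```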
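Regex is: /\[reply\]([^\[]+)\[\/reply\]/g (PHP's preg functions reject the g modifier, so drop it there and use preg_match_all() to get every match)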
I want to replace a word that occurs several times in a string, but only at some random positions.
So let's say the string is
$str = 'I like blue, blue is my favorite colour because blue is very nice and blue is pretty';
And let's say I want to replace the word blue with red but only 2 times at random positions.
So after a function is done the output could be like
I like red, blue is my favorite colour because red is very nice and blue is pretty
Another one could be
I like blue, red is my favorite colour because blue is very nice and red is pretty
So I want to replace the same word multiple times but every time on different positions.
I thought of using preg_match, but it doesn't have an option to make the positions of the words being replaced random.
Does anybody have a clue how to achieve this?
Much as I am loath to use regex for something which is, on the face of it, very simple, I think it can help here in order to guarantee exactly n replacements, as it allows us to easily use array_rand(), which does exactly what you want: pick n random items from a list of indeterminate length (IMPROVED).
<?php
function replace_n_occurences ($str, $search, $replace, $n) {
// Get all occurrences of $search and their offsets within the string
$count = preg_match_all('/\b'.preg_quote($search, '/').'\b/', $str, $matches, PREG_OFFSET_CAPTURE);
// Get string length information so we can account for replacement strings that are of a different length to the search string
$searchLen = strlen($search);
$diff = strlen($replace) - $searchLen;
$offset = 0;
// Loop $n random matches and replace them, if $n < 1 || $n > $count, replace all matches
$toReplace = ($n < 1 || $n > $count) ? array_keys($matches[0]) : (array) array_rand($matches[0], $n);
foreach ($toReplace as $match) {
$str = substr($str, 0, $matches[0][$match][1] + $offset).$replace.substr($str, $matches[0][$match][1] + $searchLen + $offset);
$offset += $diff;
}
return $str;
}
$str = 'I like blue, blue is my favorite colour because blue is very nice and blue is pretty';
$search = 'blue';
$replace = 'red';
$replaceCount = 2;
echo replace_n_occurences($str, $search, $replace, $replaceCount);
echo preg_replace_callback('/blue/', function($match) { return rand(0,100) > 50 ? $match[0] : 'red'; }, $str);
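The one-liner above replaces each occurrence with roughly 50% probability, so the number of replacements varies from run to run. A sketch that keeps the preg_replace_callback() idea but replaces exactly $n randomly chosen whole-word occurrences; replaceRandomOccurrences() is a hypothetical name:

```php
<?php
// Hypothetical helper: replace exactly $n randomly chosen whole-word matches of $search.
function replaceRandomOccurrences(string $str, string $search, string $replace, int $n): string {
    $pattern = '/\b' . preg_quote($search, '/') . '\b/';
    $total = preg_match_all($pattern, $str);
    if ($n <= 0 || $total === 0) {
        return $str;
    }
    $n = min($n, $total);
    // pick $n distinct occurrence numbers (0-based); array_rand returns an int when $n is 1
    $chosen = array_flip((array) array_rand(range(0, $total - 1), $n));
    $index = 0;
    return preg_replace_callback($pattern, function ($m) use (&$index, $chosen, $replace) {
        return isset($chosen[$index++]) ? $replace : $m[0];
    }, $str);
}

$str = 'I like blue, blue is my favorite colour because blue is very nice and blue is pretty';
echo replaceRandomOccurrences($str, 'blue', 'red', 2), "\n";
```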
Well, you could use this algorithm:
1- Calculate the random number of times you want to replace the string.
2- Explode the string into an array.
3- In that array, replace an occurrence of the string only if a random value between 1 and 100 is divisible by 3 (for instance).
4- Decrease the number calculated in step 1.
5- Repeat until the number reaches 0.
<?php
$amount_to_replace = 2;
$word_to_replace = 'blue';
$new_word = 'red';
$str = 'I like blue, blue is my favorite colour because blue is very nice and blue is pretty';
$words = explode(' ', $str); //convert string to array of words
$blue_keys = array_keys($words, $word_to_replace); //get index of all $word_to_replace
if(count($blue_keys) <= $amount_to_replace) { //if there are less to replace, we don't need to randomly choose. just replace them all
$keys_to_replace = array_keys($blue_keys);
}
else {
$keys_to_replace = array();
while(count($keys_to_replace) < $amount_to_replace) { //while we have more to choose
$replacement_key = rand(0, count($blue_keys) -1);
if(in_array($replacement_key, $keys_to_replace)) continue; //we have already chosen to replace this word, don't add it again
else {
$keys_to_replace[] = $replacement_key;
}
}
}
foreach($keys_to_replace as $replacement_key) {
$words[$blue_keys[$replacement_key]] = $new_word;
}
$new_str = implode(' ', $words); //convert array of words back into string
echo $new_str."\n";
?>
N.B. I just realized this will not replace the first blue, since it is entered into the word array as "blue," and so doesn't match in the array_keys call.
Given a list of common words, sorted in order of prevalence of use, is it possible to form word combinations of an arbitrary length (any desired number of words) in order of the 'most common' sequences? For example, if the most common words are 'a, b, c' then for combinations of length two, the following would be generated:
aa
ab
ba
bb
ac
bc
ca
cb
cc
Here is the correct list for length 3:
aaa
aab
aba
abb
baa
bab
bba
bbb
aac
abc
bac
bbc
aca
acb
bca
bcb
acc
bcc
caa
cab
cba
cbb
cac
cbc
cca
ccb
ccc
This is simple to implement for combinations of 2 or 3 words (set length) for any number of elements, but can this be done for arbitrary lengths? I want to implement this in PHP, but pseudocode or even a summary of the algorithm would be much appreciated!
Here's a recursive function that might be what you need. The idea is, when given a length and a letter, to first generate all sequences that are one letter shorter that don't include that letter. Add the new letter to the end and you have the first part of the sequence that involves that letter. Then move the new letter to the left. Cycle through each sequence of letters including the new one to the right.
So if you had gen(5, d), it would start with
(aaaa)d
(aaab)d
...
(cccc)d
then when it got done with the a-c combinations it would do
(aaa)d(a)
...
(aaa)d(d)
(aab)d(d)
...
(ccc)d(d)
then when it got done with d as the 4th letter it would move it to the 3rd
(aa)d(aa)
etc., etc.
<?php
/**
* Word Combinations (version c) 6/22/2009 1:20:14 PM
*
* Based on pseudocode in answer provided by Erika:
* http://stackoverflow.com/questions/1024471/generating-ordered-weighted-combinations-of-arbitrary-length-in-php/1028356#1028356
* (direct link to Erika's answer)
*
* To see the results of this script, run it:
* http://stage.dustinfineout.com/stackoverflow/20090622/word_combinations_c.php
**/
init_generator();
function init_generator() {
global $words;
$words = array('a','b','c');
generate_all(5);
}
function generate_all($len){
global $words;
for($i = 0; $i < count($words); $i++){
$res = generate($len, $i);
echo join("<br />", $res);
echo("<br/>");
}
}
function generate($len, $max_index = -1){
global $words;
// WHEN max_index IS NEGATIVE, STARTING POSITION
if ($max_index < 0) {
$max_index = count($words) - 1;
}
$list = array();
if ($len <= 0) {
$list[] = "";
return $list;
}
if ($len == 1) {
if ($max_index >= 1) {
$add = generate(1, ($max_index - 1));
foreach ($add as $addit) {
$list[] = $addit;
}
}
$list[] = $words[$max_index];
return $list;
}
if($max_index == 0) {
$list[] = str_repeat($words[$max_index], $len);
return $list;
}
for ($i = 1; $i <= $len; $i++){
$prefixes = generate(($len - $i), ($max_index - 1));
$postfixes = generate(($i - 1), $max_index);
foreach ($prefixes as $pre){
//print "prefix = $pre<br/>";
foreach ($postfixes as $post){
//print "postfix = $post<br/>";
$list[] = ($pre . $words[$max_index] . $post);
}
}
}
return $list;
}
?>
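When I trace generate() above, for lengths of 2 or more it only returns sequences that contain $words[$max_index], so its prefixes and postfixes miss some combinations: generate_all(3) never produces baa, for example. Below is a sketch that reproduces the question's length-2 and length-3 lists exactly, under my reading of that ordering: the highest-ranked word is placed at each position from right to left with only lower-ranked words before it, and suffixes without it come before suffixes that repeat it. The function names are hypothetical, and sequences are concatenated without a separator, as in the question.

```php
<?php
// All sequences of length $n over $words[0..$m], grouped by the highest-ranked word used.
function allSeqs(int $n, int $m, array $words): array {
    if ($n === 0) {
        return [''];                       // exactly one empty sequence
    }
    $out = [];
    for ($k = 0; $k <= $m; $k++) {
        $out = array_merge($out, exactSeqs($n, $k, $words));
    }
    return $out;                           // empty when $m < 0
}

// Sequences of length $n over $words[0..$m] that contain $words[$m] at least once.
function exactSeqs(int $n, int $m, array $words): array {
    $out = [];
    for ($i = 1; $i <= $n; $i++) {         // first occurrence of $words[$m] at index $n - $i
        $prefixes = allSeqs($n - $i, $m - 1, $words);   // only lower-ranked words before it
        $suffixGroups = [allSeqs($i - 1, $m - 1, $words), exactSeqs($i - 1, $m, $words)];
        foreach ($suffixGroups as $suffixes) {
            foreach ($prefixes as $pre) {
                foreach ($suffixes as $post) {
                    $out[] = $pre . $words[$m] . $post;
                }
            }
        }
    }
    return $out;
}

function orderedSequences(array $words, int $len): array {
    return allSeqs($len, count($words) - 1, $words);
}

echo implode("\n", orderedSequences(['a', 'b', 'c'], 3)), "\n";
```

Each result is recomputed many times, which is fine for a handful of words but would want memoization for long lists.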
I googled for php permutations and got: http://www.php.happycodings.com/Algorithms/code21.html
I haven't checked whether the code is any good, but it seems to do what you want.
I don't know what the term is for what you're trying to calculate, but it's not combinations or even permutations, it's some sort of permutations-with-repetition.
Below I've enclosed some slightly-adapted code from the nearest thing I have lying around that does something like this, a string permutation generator in LPC. For a, b, c it generates
abc
bac
bca
acb
cab
cba
Probably it can be tweaked to enable the repetition behavior you want.
varargs mixed array permutations(mixed array list, int num) {
mixed array out = ({});
foreach(mixed item : permutations(list[1..], num - 1))
for(int i = 0, int j = sizeof(item); i <= j; i++)
out += ({ implode(item[0 .. i - 1] + ({ list[0] }) + item[i..], "") });
if(num < sizeof(list))
out += permutations(list[1..], num);
return out;
}
FWIW, another way of stating your problem is that, for an input of N elements, you want the set of all paths of length N in a fully-connected, self-connected graph with the input elements as nodes.
I'm assuming that when you say it's easy for a fixed length, you're using m nested loops, where m is the length of the sequence (2 and 3 in your examples).
You could use recursion like this:
Your words are numbered 0, 1, ..., n-1, and you need to generate all sequences of length m:
generate all sequences of length m:
{
start with 0, and generate all sequences of length m-1
start with 1, and generate all sequences of length m-1
...
start with n-1, and generate all sequences of length m-1
}
generate all sequences of length 0
{
// nothing to do
}
How to implement this? Well, in each call you can push one more element to the end of the array, and when you hit the end of the recursion, print out array's contents:
// m is the remaining length of the sequence, elements is the array of word numbers chosen so far
generate(m, elements)
{
if (m == 0)
{
for j = 0 to elements.length - 1 print(words[elements[j]]);
}
else
{
for i = 0 to n - 1
{
generate(m-1, elements + [i]); // a copy of elements with i appended
}
}
}
And finally, call it like this: generate(6, array())
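A minimal PHP version of this pseudocode, for comparison. sequences() is a hypothetical name; note that it produces plain lexicographic order (aa, ab, ac, ba, ...), not the frequency-weighted order the question asks for.

```php
<?php
// $m: remaining length; $chosen: word indexes picked so far.
function sequences(array $words, int $m, array $chosen = []): array {
    if ($m === 0) {
        // end of recursion: turn the chosen indexes back into words
        return [implode('', array_map(fn($j) => $words[$j], $chosen))];
    }
    $out = [];
    foreach (array_keys($words) as $i) {
        $out = array_merge($out, sequences($words, $m - 1, [...$chosen, $i]));
    }
    return $out;
}

print_r(sequences(['a', 'b', 'c'], 2));
```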