I have a word, let's say BKOO.
I need to generate all sub-words of this word by leaving letters out: first remove only 1 letter, then n letters, as long as the result is a word of at least 2 letters.
For our example, that means making words like KOO, BOO, OO, BK, BO.
My current algorithm says 7 combinations can be generated from BKOO (the output below also includes the initial word).
Array
(
[0] => BKOO
[1] => Array
(
[0] => BKOO
[1] => KOO
[2] => OO
[3] => KO
[4] => BOO
[5] => BO
[6] => BKO
[7] => BK
)
)
Note there are no words like BOK or OOK, because that would require reordering, which I don't want. I only want to leave letters out of the current word, never reorder them.
The problem is that this is very slow for a length like 15. It takes forever. How can I speed it up?
function comb($s, $r = [], $init = false) {
if ($init) {
$s = mb_strtoupper($s);
$r[] = $s;
}
$l = strlen($s);
if (!$s || $l < 3) return [];
for ($i=0; $i<$l; $i++) {
$t = rem_index($s, $i);
$r[] = $t;
$r = array_merge($r, comb($t));
}
$ret = array_unique((array)$r);
return $init ? array_values($ret) : $ret;
}
// remove character at position
function rem_index($str, $ind)
{
return substr($str,0,$ind++). substr($str,$ind);
}
$s = 'BKOO';
print_r(comb($s, [], true));
https://www.tehplayground.com/62pjCAs70j7qpLJj
NERD SECTION: 🤓 😄
Interesting note: at first I thought I would generate arrays of indexes to drop, e.g. first drop only 1 letter (index 0, then 1, and so on), then 2-combinations (drop 1 and 2, 1 and 3, etc.). But it seemed quite difficult to drop N letters from a string at once, so I came up with the idea of always dropping one letter and then calling the function recursively, so the next level starts with one character already dropped and repeats the drop iteration. The problem is that it is very slow for some reason.
By the way, if you have a math background, what is the formula for the number of resulting combinations? My rough estimate for a 15-letter word is 14 * 13 * 12 and so on, or at least that is how many iterations it does, but that would be millions of combinations, and obviously that is not the case even for shorter words of, say, 8 letters.
Thanks.
You can iterate the string to get it.
function foo(&$res,$str,$min_length){
if(strlen($str) <= $min_length){
return;
}
$remains=[];
for($i=0; $i<strlen($str); $i++){
$remain = substr($str,0,$i) . substr($str,$i+1);
if(!isset($res[$remain])) { // only process unprocessed sub string
$res[$remain] = $remain;
$remains[] = $remain;
}
}
foreach($remains as $remain){
if(strlen($remain) == $min_length){
$res[$remain] = $remain;
}else {
foo($res, $remain, $min_length);
}
}
return;
}
$str = "BKOO";
$res = [];
foo($res,$str,2);
var_dump(array_values($res));
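Another way to look at it, which also answers the counting question: every sub-word corresponds to a choice of which letters to keep, so a word of n letters has at most 2^n - n - 1 sub-words of 2 or more letters (all subsets, minus the empty one and the n single letters). For 15 letters that is at most 32752, and fewer when repeated letters produce duplicates. Below is a minimal sketch of that idea. Note that subwords() is a hypothetical helper name, the bitmask loop is a swapped-in technique rather than the recursion used above, and the code assumes single-byte letters.

```php
<?php
// Hypothetical helper: enumerate every sub-word of $word with at least
// $minLength letters. Each bitmask marks which letters are kept, so the
// original letter order is preserved automatically.
function subwords(string $word, int $minLength = 2): array
{
    $n = strlen($word);
    $seen = [];
    for ($mask = 1; $mask < (1 << $n); $mask++) {
        $sub = '';
        for ($i = 0; $i < $n; $i++) {
            if ($mask & (1 << $i)) {
                $sub .= $word[$i];
            }
        }
        if (strlen($sub) >= $minLength) {
            $seen[$sub] = true; // array keys remove duplicates such as KO/KO
        }
    }
    // Keys made only of digits come back as integers, so cast to strings.
    return array_map('strval', array_keys($seen));
}

print_r(subwords('BKOO'));
```

The loop always runs 2^n times, so it stays fast for 15 letters but is not meant for much longer words.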
I am populating my DB table with unique download codes.
My intention is to make sure that at the end I will have 1000 unique codes.
So far I have this code in my controller method:
// determining how many codes have to be generated
$totalcount_generated_so_far = DownloadCode::count();
$quantity = 1000 - $totalcount_generated_so_far;
if($quantity < 0) {
return "nothing generated! You already have more than 1000 codes.";
}
$length = 6;
$keys = array_merge(range(1,9), range('A', 'Z'));
for($i = 0; $i < $quantity; $i++) {
$object = new DownloadCode;
$key1 = "";
// use $j here: reusing $i would clobber the outer loop counter
for($j = 0; $j < $length; $j++) {
$key1 .= $keys[mt_rand(0, count($keys) - 1)];
}
$object->code_download = $key1; // a ready to use 6-character code
$object->save();
}
return $quantity . " unique codes have been generated.";
Problem: The above code does not check if a generated code is unique.
TODO:
Make the function to check if the code has been already generated (a very rare event, but still!)?
Partial solution:
I could put the $object->save(); inside an if condition:
// check for uniqueness
$uniq_test = DownloadCode::where('code_download', $key1)->first();
if($uniq_test) {
    $i--; // code already exists: repeat this iteration
} else {
    $object->save();
}
Is there a better solution?
Thank you.
The problem with random numbers is that there is no way to guarantee that you can generate a certain number of them. For any value of n there is a probability, however small, that you will generate the same random number repeatedly, and thus never reach the number of different codes you need.
One answer is to use a deterministic function, but that can be predictable.
To generate a known number of random codes combine the two methods.
So, in pseudo code:
for some number of iterations
generate a random code of some length
append a sequential number in some range
return the list of codes.
Identical random codes will be distinguished by differing sequential suffixes, so no collision.
In PHP this would look something like this:
function makeCodes($numCodes, $codeLength) {
// source digits
$digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$codes = [];
$nextCode = 0;
// Make a number of codes
for ($i = 0; $i<$numCodes; $i++) {
$code = '';
// Create a randomised string from the source digits
for ($j = 0; $j<$codeLength-3;$j++) {
$code .= $digits[random_int(0,strlen($digits)-1)];
}
// Append the sequential element
$codes[] = sprintf('%s%03X', $code, $nextCode);
$nextCode++;
}
return $codes;
}
print_r(makeCodes(10, 24));
Returns:
(
[0] => BL8TKD86VW086XS3PBKZ4000
[1] => MSBYAAPWGLROKL0NKP48L001
[2] => XCDI783PW1J1RD9X3KM71002
[3] => GAKZE96PVA1X6DR7X1Y4N003
[4] => M6DCEEOMLYGC42DPD8GVY004
[5] => 1DKFL67IZ2EA0UTEIWW61005
[6] => XMSU0UUD9GHDAQN3XMYW5006
[7] => 4QOKM1YOPCW2NK1E6CL9Q007
[8] => VHMURGPH7AKR8HOEXPBAN008
[9] => EU0L5QAGPB211WZ5VDE4R009
)
This will produce a list of ten 24-character codes, each made up of a 21-character random prefix followed by a 3-digit hex number in the range 000 to 009.
There are obviously many possible variations on this. Change the sequential range to some other starting point; change the sequential length; prepend the sequential portion; embed the sequential portion, and so on. I'm sure you can dream up something to suit your preferences.
Demo: https://3v4l.org/cboZ0
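If the sequential suffix is not wanted, another option is to keep drawing random codes until enough distinct ones have been collected, skipping any that already exist. The following is only a sketch under assumptions: uniqueCodes() is a hypothetical helper, $existing stands for codes already stored (for example loaded from the table beforehand), and a unique index on the column would still be the real safety net.

```php
<?php
// Hypothetical helper: return $count distinct random codes of $length
// characters, none of which appear in $existing.
function uniqueCodes(int $count, int $length = 6, array $existing = []): array
{
    $alphabet = '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $max = strlen($alphabet) - 1;
    // Use array keys as a set so each lookup is a single isset() call.
    $taken = array_fill_keys($existing, true);
    $codes = [];
    while (count($codes) < $count) {
        $code = '';
        for ($j = 0; $j < $length; $j++) {
            $code .= $alphabet[random_int(0, $max)];
        }
        if (!isset($taken[$code])) {
            $taken[$code] = true;
            $codes[] = $code;
        }
    }
    return $codes;
}

print_r(uniqueCodes(5));
```

The loop only terminates quickly while the number of requested codes is small compared to the number of possible codes (35^6 here), which is the case for 1000 codes.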
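Laravel has a helper that generates random strings. Note that str_random() returns a random string, not a UUID, so a collision is unlikely but still possible.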
Generate UUID with Laravel
$random = str_random(20);
dd($random);
I'm trying to write a highlighting functionality. There are two types of highlighting: positive and negative. Positive is done first. Highlighting in itself is very simple - just wrapping keyword/phrase in a span with a specific class, which depends on type of highlighting.
Problem:
Sometimes, negative highlighting can contain positive one.
Example:
Original text:
some data from blahblah test was not statistically valid
After text passes through positive highlighting "filter", it'll end up like this:
some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically valid</span>
or
some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically <span class="positive">valid</span></span>
Then in negative list, we have a phrase not statistically valid.
In both cases, resulting text after passing through both "filters" should look like:
some data from <span class="positive">blahblah test</span> was <span class="negative">not statistically valid</span>
Conditions:
- The number of span tags and their location within a keyword/phrase from the negative "filter" list is unknown
- A keyword/phrase must be matched even if it includes span tags (including right before and right after the keyword/phrase). These span tags have to be removed.
- If any span tags are detected, the number of opening and closing span tags removed has to be equal.
Questions:
- How to detect these span tags if there are any?
- Is this even possible with RegEx alone?
I don't think it can be done with a single regular expression, and even if it can, I'm honestly too lazy to work it out.
So how can it be done?
I came up with a solution that takes 4 steps to achieve what you want:
1. Extract all words from the HTML and store them with their corresponding positions.
2. Explode each value of the negative list into words.
3. Match each word consecutively against the ordered words from step 1 and store the matches.
4. Replace the matched ranges with their new HTML wrapper (<span class="negative">...</span>), using their positions.
I have also made a detailed flowchart (I'm not good at flowcharts, sorry) to make things easier to follow. It may help to look at the code first.
Here is what we have:
$HTML = <<< HTML
some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically <span class="positive">valid</span></span>
HTML;
$listOfNegatives = ['not statistically valid'];
To extract words (real words) I used a RegEx which will fulfill our needs at this step:
~\b(?<![</])\w+\b(?![^<>]+>)~
To get positions of each word too, a flag should be used with preg_match_all(): PREG_OFFSET_CAPTURE
/**
* Extract all words and their corresponding positions
* @param [string] $HTML
* @return [array] $HTMLWords
*/
function extractWords($HTML) {
$HTMLWords = [];
preg_match_all("~\b(?<![</])\w+\b(?![^<>]+>)~", $HTML, $words, PREG_OFFSET_CAPTURE);
foreach ($words[0] as $word) {
$HTMLWords[$word[1]] = $word[0];
}
return $HTMLWords;
}
This function's output is something like this:
Array
(
[0] => some
[5] => data
[10] => from
[38] => blahblah
[47] => test
[59] => was
[63] => not
[90] => statistically
[127] => valid
)
What we should do here is match the words of each list value, consecutively, against the words we just extracted. Our first list value, not statistically valid, consists of three words (not, statistically and valid), and these must appear one after another in the extracted words array (which they do).
To handle this I wrote a function:
/**
* Check if any of our defined list values can be found in an ordered array of extracted words
* @param [array] $HTMLWords
* @param [array] $listOfNegatives
* @return [array] $subStrings
*/
function checkNegativesExistence($HTMLWords, $listOfNegatives) {
$counter = 0;
$previousWordOffset = null;
$subStrings = [];
foreach ($listOfNegatives as $i => $string) {
$stringWords = explode(" ", $string);
$wordIndex = 0;
foreach ($HTMLWords as $offset => $HTMLWord) {
if ($wordIndex > count($stringWords) - 1) {
$wordIndex = 0;
$counter++;
}
if ($stringWords[$wordIndex] == $HTMLWord) {
$subStrings[$counter][] = [$HTMLWord, $offset, $previousWordOffset];
$wordIndex++;
} elseif (isset($subStrings[$counter]) && count($subStrings[$counter]) > 0) {
unset($subStrings[$counter]);
$wordIndex = 0;
}
$previousWordOffset = $offset + strlen($HTMLWord);
}
$counter++;
}
return $subStrings;
}
Which has an output like below:
Array
(
[0] => Array
(
[0] => Array
(
[0] => not
[1] => 63
[2] => 62
)
[1] => Array
(
[0] => statistically
[1] => 90
[2] => 66
)
[2] => Array
(
[0] => valid
[1] => 127
[2] => 103
)
)
)
As you can see, we have the complete string split into words with their offsets (two offsets each: the first is the word's real offset, the second is the offset where the previous word ends). We need them later.
Now we need to replace the range from offset 62 to 127 + strlen(valid) with <span class="negative">not statistically valid</span> and forget about everything else.
/**
* Substitute newly matched strings with negative HTML wrapper
* @param [array] $subStrings
* @param [string] $HTML
* @return [string] $HTML
*/
function negativeHighlight($subStrings, $HTML) {
$offset = 0;
$HTMLLength = strlen($HTML);
foreach ($subStrings as $key => $value) {
$arrayOfWords = [];
foreach ($value as $word) {
$arrayOfWords[] = $word[0];
if (current($value) == $value[0]) {
$start = substr($HTML, $word[1], strlen($word[0])) == $word[0] ? $word[2] : $word[2] + $offset;
}
if (current($value) == end($value)) {
$defaultLength = $word[1] + strlen($word[0]) - $start;
$length = substr($HTML, $word[1], strlen($word[0])) === $word[0] ? $defaultLength : $defaultLength + $offset;
}
}
$string = implode(" ", $arrayOfWords);
$HTML = substr_replace($HTML, "<span class=\"negative\">{$string}</span>", $start, $length);
if ($HTMLLength > strlen($HTML)) {
$offset = -($HTMLLength - strlen($HTML));
} elseif ($HTMLLength < strlen($HTML)) {
$offset = strlen($HTML) - $HTMLLength;
}
}
return $HTML;
}
An important point here is that the first substitution may shift the offsets of other extracted values (which we don't have in this example). So the change in HTML length has to be calculated:
if ($HTMLLength > strlen($HTML)) {
$offset = -($HTMLLength - strlen($HTML));
} elseif ($HTMLLength < strlen($HTML)) {
$offset = strlen($HTML) - $HTMLLength;
}
We should also check how this change of length affected our offsets:
- The word's offset is intact (characters were added/removed after this word)
- The word's offset has changed (characters were added/removed before this word)
This check is done by the following block (only the first and last words need checking):
if (current($value) == $value[0]) {
$start = substr($HTML, $word[1], strlen($word[0])) == $word[0] ? $word[2] : $word[2] + $offset;
}
if (current($value) == end($value)) {
$defaultLength = $word[1] + strlen($word[0]) - $start;
$length = substr($HTML, $word[1], strlen($word[0])) === $word[0] ? $defaultLength : $defaultLength + $offset;
}
Doing all together:
$newHTML = negativeHighlight(checkNegativesExistence(extractWords($HTML), $listOfNegatives), $HTML);
Output:
some data from <span class="positive">blahblah test</span> was <span class="negative">not statistically valid</span></span></span>
But there are problems with our last output: unmatched tags.
I'm sorry, I lied when I said this takes 4 steps; there is one more. I wrote another RegEx that matches all properly nested tags as well as the stray ones left behind:
~(<span[^>]+>([^<]*+<(?!/)(?:([a-zA-Z0-9]++)[^>]*>[^<]*</\3>|(?2)))*[^<]*</span>|(?'single'</[^>]+>|<[^>]+>))~
By a preg_replace_callback() I only replace tags in group named single with nothing:
echo preg_replace_callback("~(<span[^>]+>([^<]*+<(?!/)(?:([a-zA-Z0-9]++)[^>]*>[^<]*</\\3>|(?2)))*[^<]*</span>|(?'single'</[^>]+>|<[^>]+>))~",
function ($match) {
if (isset($match['single'])) {
return null;
}
return $match[1];
},
$newHTML
);
and we have right output:
some data from <span class="positive">blahblah test</span> was <span class="negative">not statistically valid</span>
Failing cases
My solution does not output the right HTML in the situations below:
1- If a word like <was> appears between other words:
<span class="positive">blahblah test</span> <was> not
Why?
Because my last RegEx treats <was> as an unmatched tag, so it removes it.
2- If a word like not (which is part of a value in our negative list) is enclosed in <>, i.e. <not>. This outputs:
some data from <span class="positive">blahblah test</span> was <not> <span class="positive">statistically <span class="positive">valid</span></span>
Why?
Because my first RegEx only recognizes words that are not enclosed in the tag characters <>.
3- If the list has values where one is a substring of another:
$listOfNegatives = ['not statistically valid', 'not statistically'];
Why?
Because they overlap.
Working demo
Here is what I've come up with. I honestly can't say whether it will cope with the full range of the requirement, but it might help a bit
$s = 'some data from blahblah test was not statistically valid';
$replaced = highlight($s);
var_dump($replaced);
function highlight($s) {
// split the string on the negative parts, capturing the full negative string each time
$parts = preg_split('/(not statistically valid)/',$s,-1,PREG_SPLIT_DELIM_CAPTURE);
$output = '';
$negativePart = 0; // keep track of whether we're dealing with a negative part or the remainder - they will alternate.
foreach ($parts as $part) {
if ($negativePart) {
$output .= negativeHighlight($part);
} else {
$output .= positiveHighlight($part);
}
$negativePart = !$negativePart;
}
return $output;
}
// only deals with a single negative part at a time, so just wraps with a span
function negativeHighlight($part) {
return "<span class='negative'>$part</span>";
}
// potentially deals with several replacements at once
function positiveHighlight($part) {
return preg_replace('/(blahblah test)|(statistically valid)/', "<span class='positive'>$1</span>", $part);
}
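The same split-then-highlight idea can be sketched with the phrases passed in as lists instead of being hard-coded. This is only an illustration under assumptions: highlightAll() is a hypothetical name, both lists are assumed to be non-empty, and the input is assumed to be plain text (for already highlighted text, running strip_tags() on it first would remove the old spans, along with any other tags).

```php
<?php
// Hypothetical helper: wrap negative phrases first, then wrap positive
// phrases only in the text that lies outside the negative matches.
function highlightAll(string $text, array $positives, array $negatives): string
{
    // Build a safe alternation such as "foo|bar" from a list of phrases.
    $alt = function (array $list): string {
        return implode('|', array_map(function ($phrase) {
            return preg_quote($phrase, '/');
        }, $list));
    };

    // With PREG_SPLIT_DELIM_CAPTURE the odd-indexed parts are the matches.
    $parts = preg_split('/(' . $alt($negatives) . ')/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $out .= '<span class="negative">' . $part . '</span>';
        } else {
            $out .= preg_replace('/' . $alt($positives) . '/', '<span class="positive">$0</span>', $part);
        }
    }
    return $out;
}

$plain = strip_tags('some data from <span class="positive">blahblah test</span> was not <span class="positive">statistically <span class="positive">valid</span></span>');
echo highlightAll($plain, ['blahblah test', 'statistically valid'], ['not statistically valid']);
```

Because negatives are matched before any positive span exists, no span has to be detected or removed afterwards.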
I want a way to take any input (a URL) and get back a number from 1 to 4, distributed as evenly as possible (25% each) over all inputs. It's important that the same input gets the same value from 1 to 4 every time.
The reason I want this is so that I can create seemingly random and evenly distributed content for a set of CNAMEs (subdomains) for a CDN. It would take pictures that were originally www.website.com/picture.png and output them as
cdn1.website.com/picture.png or
cdn2.website.com/picture.png or
cdn3.website.com/picture.png or
cdn4.website.com/picture.png
Effectively allowing me to bypass the browser restrictions set to a subdomain, giving me more parallel connections (Read more: http://yuiblog.com/blog/2007/04/11/performance-research-part-4/). The reason why I want the URL to always pass back to a specific CDN is for caching purposes; If the www.website.com/picture.png is first displayed as cdn1.website.com/picture.png and a second time around as cdn2.website.com/picture.png then the browser would not know that it has the same picture cached already under cdn1 and would download the same picture twice, rather than relying on cache.
Here is the suggested PHP attempt. As you can see from the results, I don't get the 25% ratio I would like for a small sample set. I am looking for alternatives that would also come reasonably close to a 25% distribution for small samples.
<?php
$num_array = array();
for ($i = 1; $i <= 10000; $i++) {
$num_array[]=(crc32(genRandomURL()) % 4)+1;
}
print "<pre>";
print_r(array_count_values($num_array));
print "</pre>";
$num_array = array();
for ($i = 1; $i <= 10; $i++) {
$num_array[]=(crc32(genRandomURL()) % 4)+1;
}
print "<pre>";
print_r(array_count_values($num_array));
print "</pre>";
function genRandomURL($length = 10) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyz';
$string = "";
for ($p = 0; $p < $length; $p++) {
$string .= $characters[mt_rand(0, strlen($characters) - 1)];
}
return "http://www.website.com/dir/dir2/dir3/".$string.".png";
}
?>
Results:
Array
(
[3] => 2489
[1] => 2503
[2] => 2552
[4] => 2456
)
Array
(
[1] => 5
[2] => 1
[3] => 3
[4] => 1
)
How about creating a hash of the name, getting the last two bits of that hash, then finally converting them back into a decimal number. Should return the same value so long as your name doesn't change.
function img_id($string){
    $hash = md5($string); // create hash
    // base_convert() loses precision on a 128-bit value, so read only the last hex digit
    $last_digit = hexdec(substr($hash, -1)); // integer from 0 to 15
    $img_int = ($last_digit % 4) + 1; // keep the last two bits, and + 1
    return $img_int; // will be number from 1 to 4
}
$picture = 'picture.png';
$cdn_id = img_id($picture);
$url = "cdn{$cdn_id}.website.com/{$picture}";
If your name might change then you could also look at doing a hash of the actual file contents.
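To see how such a mapping behaves, here is a small sketch that derives the host from the tail of the MD5 hash. cdnHost() is a hypothetical helper and website.com is the placeholder domain from the question. With 4 hosts the buckets are exactly balanced over all possible hash values, but for a handful of URLs no deterministic hash can guarantee an even 25% split.

```php
<?php
// Hypothetical helper: map a path to one of $hosts CDN subdomains.
// The same path always yields the same host.
function cdnHost(string $path, int $hosts = 4): string
{
    // The last 4 hex digits of the MD5 give an integer in 0..65535.
    $bucket = hexdec(substr(md5($path), -4)) % $hosts;
    return 'cdn' . ($bucket + 1) . '.website.com';
}

// Count how 4000 sample paths spread over the hosts.
$counts = [];
for ($i = 0; $i < 4000; $i++) {
    $host = cdnHost("/dir/img{$i}.png");
    $counts[$host] = ($counts[$host] ?? 0) + 1;
}
ksort($counts);
print_r($counts);
```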
Pattern search within a string.
For example:
$string = "111111110000";
FindOut($string);
Function should return 0
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}
If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in a string twice without overlapping. This will get you the first occurrence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?
What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
|-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
Edit: I really don't like when people just ask for code, but since this seemed to be an interesting problem, here is my implementation of the above algorithm, which allows the window size to vary instead of being fixed at 3. (The function is only briefly tested and omits obvious error checking.)
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while( (the current pattern in the window) is not empty / false)
while( ($current_pattern = substr( $str, $i, $window_size)) != false) {
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.
It looks like you want to use a substring function to walk along the string and check every run of three characters, not just break it into chunks of 3:
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i <= $max; $i++){ // <= so the last window is checked too
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = $i; //save the index of the starting position.
}
}
return false; //if we get this far then we didn't find any duplicate strings
}
Based on the str_split documentation, calling str_split on "1010101101" will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
)
None of these will match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.
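Walking the string with substr as described could look roughly like the sketch below. firstRepeatedWindow() is a hypothetical name, and searching only from the end of the current window is an assumption about the requirement (that the second occurrence must not overlap the first).

```php
<?php
// Hypothetical helper: return the index of the first $len-character window
// that occurs again later in the string without overlapping, or -1.
function firstRepeatedWindow(string $s, int $len = 3): int
{
    $last = strlen($s) - $len; // start index of the final full window
    for ($i = 0; $i <= $last; $i++) {
        $pattern = substr($s, $i, $len);
        // Start searching right after the current window.
        if (strpos($s, $pattern, $i + $len) !== false) {
            return $i;
        }
    }
    return -1;
}

var_dump(firstRepeatedWindow('111111110000')); // int(0)
```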
Quick and dirty implementation of such a pattern search:
function findPattern($string){
    $matches = 0;
    $substrStart = 0;
    // <= so that the last 3-character window is also checked
    while($matches < 2 && $substrStart + 3 <= strlen($string) && $pattern = substr($string, $substrStart++, 3)){
        $matches = substr_count($string,$pattern);
    }
    if($matches < 2){
        return null;
    }
    return $substrStart-1;
}
I have an alphabet array of 24 characters: "A B C D E F G H I J K L M N O P Q R S T U V W X"
I want to collect all cases of groups of 3 unique characters.
First case: ABC, DEF, GHI, JKL, MNO, PQR, STU, VWX
This is a little late coming, but for anyone else reading over this: If you are looking to split a string into 3-character chunks, try PHP's built in str_split() function. It takes in a $string and $split_length argument. For example:
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWX';
$grouped = str_split($alphabet, 3);
var_export( $grouped );
This outputs the following array:
array ( 0 => 'ABC', 1 => 'DEF', 2 => 'GHI',
3 => 'JKL', 4 => 'MNO', 5 => 'PQR',
6 => 'STU', 7 => 'VWX', )
This works for the example given in the question. If you want to have every possible combination of those 24 letters, Artefacto's answer makes more sense.
There's a 1:1 relationship between the permutations of the letters of the alphabet and your lists of sets. Basically, once you have a permutation of the alphabet, you just have to call array_chunk to get the sets.
Now, 24! of anything (that is 620448401733239439360000) will never fit in memory (be it RAM or disk), so the best you can do is to generate a number n between 1 and 24! (the permutation number) and then generate that permutation. For this last step, see for example Generation of permutations following Lehmer and Howell and the papers cited there.
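As a rough illustration of that last step: 24! does not fit in a 64-bit PHP integer, so the sketch below does not take n itself but its Lehmer code, i.e. one digit per position saying which of the remaining letters to take. Both function names are hypothetical, and drawing each digit at random is just one assumed way of picking a permutation number.

```php
<?php
// Hypothetical helper: build the permutation described by a Lehmer code.
// $code[$i] must be between 0 and (number of letters still unused) - 1.
function permutationFromLehmer(string $alphabet, array $code): string
{
    $pool = str_split($alphabet);
    $out = '';
    foreach ($code as $digit) {
        // Take the $digit-th remaining letter and close the gap.
        $out .= array_splice($pool, $digit, 1)[0];
    }
    return $out;
}

// Hypothetical helper: a random Lehmer code for $n letters.
function randomLehmerCode(int $n): array
{
    $code = [];
    for ($remaining = $n; $remaining > 0; $remaining--) {
        $code[] = random_int(0, $remaining - 1);
    }
    return $code;
}

$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWX';
$permutation = permutationFromLehmer($alphabet, randomLehmerCode(strlen($alphabet)));
print_r(str_split($permutation, 3)); // eight sets of 3 unique letters
```

The all-zero code gives back ABC, DEF, ..., VWX, the first case from the question.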
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$c = strlen($alphabet);
$result = array();
for ($i = 0; $i < $c; ++$i) {
$current0 = $i;
for ($j = 0; $j < $c; ++$j) {
if ($current0 == $j) continue;
$current1 = $j;
for ($k = 0; $k < $c; ++$k) {
if ($current0 == $k || $current1 == $k) continue;
$result[] = $alphabet[$i].$alphabet[$j].$alphabet[$k];
}
}
}
Hope I understood your question right. This iterates over the alphabet in three nested loops and always skips the characters that are already used. Then I push the result to $result.
With all strlen($alphabet) = 26 letters this produces 26 * 25 * 24 = 15600 strings, so you may want to try the script with only five letters first ;)
(I am sure there is some hacky version that is faster, but I think this is the most straightforward.)