How correct incorrect words from dictionary array? - php

I've dictionary array with correct words:
<?php
$dict = array("apple","windows","microsoft","happy");
?>
I tried some options for correction, but it did not help me to solve my problem. I will give an example of one of the methods used
<?php
$config_dic= pspell_config_create ('en');
function orthograph($string)
{
// Suggests possible words in case of misspelling
$config_dic = pspell_config_create('en');
// Ignore words under 3 characters
pspell_config_ignore($config_dic, 3);
// Configure the dictionary
pspell_config_mode($config_dic, PSPELL_FAST);
$dictionary = pspell_new_config($config_dic);
// To find out if a replacement has been suggested
$replacement_suggest = false;
$string = explode('', trim(str_replace(',', ' ', $string)));
foreach ($string as $key => $value) {
if(!pspell_check($dictionary, $value)) {
$suggestion = pspell_suggest($dictionary, $value);
// Suggestions are case sensitive. Grab the first one.
if(strtolower($suggestion [0]) != strtolower($value)) {
$string [$key] = $suggestion [0];
$replacement_suggest = true;
}
}
}
if ($replacement_suggest) {
// We have a suggestion, so we return to the data.
return implode('', $string);
} else {
return null;
}
}
$search = $_POST['input'];
$suggestion_spell = orthograph($search);
if ($suggestion_spell) {
echo "Try with this spelling : $suggestion_spell";
}
$dict = pspell_new ("en");
if (!pspell_check ($dict, "lappin")) {
$suggestions = pspell_suggest ($dict, "lappin");
foreach ($suggestions as $suggestion) {
echo "Did you mean: $suggestion?<br />";
}
}
// Suggests possible words in case of misspelling
$config_dic = pspell_config_create('en');
// Ignore words under 3 characters
pspell_config_ignore($config_dic, 3);
// Configure the dictionary
pspell_config_mode($config_dic, PSPELL_FAST);
$dictionary = pspell_new_config($config_dic);
..................................................
The idea is there but could not realize. If you can understand and help as something with something then for you earlier, thank you. The idea is that it checks each word for similarity from an array with the correct words in percentages well or in anything and if the words 90-95% coincide with the word in the mass then replace it with the correct version. For example, in the word maximum, 2-3 letters can be omitted. Most people will miss 1-2 letters. Such functionality is in google translate. I want to implement such a functional in my project.
The user, for example, can enter a word in such different incorrect variants:
$textarea = "Windows - Widows, Apple - Aple";
There are such cases that the user adds what is not needed letters in
the word
$text = "microsoft - microosoft, happy - haapy";
I would like to create my own function to solve this problem. Than to use the side extended ... Therefore for an example can show what that variants.

Related

How to check array data that matches from random characters in php?

I have an array like below:
$fruits = array("apple","orange","papaya","grape")
I have a variable like below:
$content = "apple";
I need to filter some condition like: if this variable matches at least one of the array elements, do something. The variable, $content, is a bunch of random characters that is actually one of these available in the array data like below:
$content = "eaplp"; // it's a dynamically random char from the actual word "apple`
what have I done was like the below:
$countcontent = count($content);
for($a=0;$a==count($fruits);$a++){
$countarr = count($fruits[$a]);
if($content == $fruits[$a] && $countcontent == $countarr){
echo "we got".$fruits[$a];
}
}
I tried to count how many letters these phrases had and do like if...else... when the total word in string matches with the total word on one of array data, but is there something that we could do other than that?
We can check if an array contains some value with in_array. So you can check if your $fruits array contains the string "apple" with,
in_array("apple", $fruits)
which returns a boolean.
If the order of the letters is random, we can sort the string alphabetically with this function:
function sorted($s) {
$a = str_split($s);
sort($a);
return implode($a);
}
Then map this function to your array and check if it contains the sorted string.
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
$inarr = in_array(sorted($content), array_map("sorted", $fruits));
var_dump($inarr);
//bool(true)
Another option is array_search. The benefit from using array_search is that it returns the position of the item (if it's found in the array, else false).
$pos = array_search(sorted($content), array_map("sorted", $fruits));
echo ($pos !== false) ? "$fruits[$pos] found." : "not found.";
//apple found.
This will also work but in a slightly different manner.
I split the strings to arrays and sort them to match eachoter.
Then I use array_slice to only match the number of characters in $content, if they match it's a match.
This means this will match in a "loose" way to with "apple juice" or "apple curd".
Not sure this is wanted but figured it could be useful for someone.
$fruits = array("apple","orange","papaya","grape","apple juice", "applecurd");
$content = "eaplp";
$content = str_split($content);
$count = count($content);
Foreach($fruits as $fruit){
$arr_fruit = str_split($fruit);
// sort $content to match order of $arr_fruit
$SortCont = array_merge(array_intersect($arr_fruit, $content), array_diff($content, $arr_fruit));
// if the first n characters match call it a match
If(array_slice($SortCont, 0, $count) == array_slice($arr_fruit, 0, $count)){
Echo "match: " . $fruit ."\n";
}
}
output:
match: apple
match: apple juice
match: applecurd
https://3v4l.org/hHvp3
It is also comparable in speed with t.m.adams answer. Sometimes faster sometimes slower, but note how this code can find multiple answers. https://3v4l.org/IbuuD
I think this is the simplest way to answer that question. some of the algorithm above seems to be "overkill".
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
foreach ($fruits as $key => $fruit) {
$fruit_array = str_split($fruit); // split the string into array
$content_array = str_split($content); // split the content into array
// check if there's no difference between the 2 new array
if ( sizeof(array_diff($content_array, $fruit_array)) === 0 ) {
echo "we found the fruit at key: " . $key;
return;
}
}
What about using only native PHP functions.
$index = array_search(count_chars($content), array_map('count_chars', $fruits));
If $index is not null you will get the position of $content inside $fruits.
P.S. Be aware that count_chars might not be the fastest approach to that problem.
With a random token to search for a value in your array, you have a problem with false positives. That can give misleading results depending on the use case.
On search cases, for example wrong typed words, I would implement a filter solution which produces a matching array. One could sort the results by calculating the levenshtein distance to fetch the most likely result, if necessary.
String length solution
Very easy to implement.
False positives: Nearly every string with the same length like apple and grape would match.
Example:
$matching = array_filter($fruits, function ($fruit) use ($content) {
return strlen($content) == strlen($fruit);
});
if (count($matching)) {
// do your stuff...
}
Regular expression solution
It compares string length and in a limited way containing characters. It is moderate to implement and has a good performance on big data cases.
False positives: A content like abc would match bac but also bbb.
Example:
$matching = preg_grep(
'#['.preg_quote($content).']{'.strlen($content).'}#',
$fruits
);
if (count($matching)) {
// do your stuff...
}
Alphanumeric sorting solution
Most accurate but also a slow approach concerning performance using PHP.
False positives: A content like abc would match on bac or cab.
Example:
$normalizer = function ($value) {
$tokens = str_split($value);
sort($tokens);
return implode($tokens);
};
$matching = array_filter($fruits, function ($fruit) use ($content, $normalizer) {
return ($normalizer($fruit) == $normalizer($content));
});
if (count($matching)) {
// do your stuff...
}
Here's a clean approach. Returns unscrambled value early if found, otherwise returns null. Only returns an exact match.
function sortStringAlphabetically($stringToSort)
{
$splitString = str_split($stringToSort);
sort($splitString);
return implode($splitString);
}
function getValueFromRandomised(array $dataToSearch = [], $dataToFind)
{
$sortedDataToFind = sortStringAlphabetically($dataToFind);
foreach ($dataToSearch as $value) {
if (sortStringAlphabetically($value) === $sortedDataToFind) {
return $value;
}
}
return null;
}
$fruits = ['apple','orange','papaya','grape'];
$content = 'eaplp';
$dataExists = getValueFromRandomised($fruits, $content);
var_dump($dataExists);
// string(5) "apple"
Not found example:
$content = 'eaplpo';
var_dump($dataExists);
// NULL
Then you can use it (eg) like this:
echo !empty($dataExists) ? $dataExists . ' was found' : 'No match found';
NOTE: This is case sensitive, meaning it wont find "Apple" from "eaplp". That can be resolved by doing strtolower() on the loop's condition vars.
How about looping through the array, and using a flag to see if it matches?
$flag = false;
foreach($fruits as $fruit){
if($fruit == $content){
$flag = true;
}
}
if($flag == true){
//do something
}
I like t.m.adams answer but I also have a solution for this issue:
array_search_random(string $needle, array $haystack [, bool $strictcase = FALSE ]);
Description: Searches a string in array elements regardless of the position of the characters in the element.
needle: the caracters you are looking for as a string
haystack: the array you want to search
strictcase: if set to TRUE needle 'mood' will match 'mood' and 'doom' but not 'Mood' and 'Doom', if set to FALSE (=default) it will match all of these.
Function:
function array_search_random($needle, $haystack, $strictcase=false){
if($strictcase === false){
$needle = strtolower($needle);
}
$needle = str_split($needle);
sort($needle);
$needle = implode($needle);
foreach($haystack as $straw){
if($strictcase === false){
$straw = strtolower($straw);
}
$straw = str_split($straw);
sort($straw);
$straw = implode($straw);
if($straw == $needle){
return true;
}
}
return false;
}
if(in_array("apple", $fruits)){
true statement
}else{
else statement
}

Decoding anagram with recursive function doesn't give expected output

So I'm trying to decode an anagram into words from my dictionary file. But my recursive function isn't behaving like I'm expecting.
The thoughts about the code is to eliminate letters as they are used on words and output me the string it came up with.
<?php
function anagram($string, $wordlist)
{
if(empty($string))
return;
foreach($wordlist as $line)
{
$line = $org = trim($line);
$line = str_split($line);
sort($line);
foreach($line as $key => $value)
{
if($value != $string[$key])
{
continue 2;
}
}
echo $org . anagram(array_slice($string, count($line)), $wordlist);
}
echo PHP_EOL;
}
$string = "iamaweakishspeller";
$string = str_split($string);
sort($string);
$file = file('wordlist');
anagram($string, $file);
This is my result for now, it looks awful, but I'm having some issues with the code - it's going into an indefinite loop with the same roughly 200 words from the word list.
Can someone take an extra peak at this?
Situation
You have a dictionary(file) and an anagram which contains one or multiple words. The anagram doesn't contain any punctuation or letter case of the original word(s).
Now you want to find all true solutions where you use up all characters of the anagram and decode it into word(s) from the dictionary.
Note: There is a chance that you find multiple solutions and you will never know which one the original text was and in which order the words were, since the characters of multiple words are mixed in the anagram and you don't have punctuation or the case of the letters in it.
Your code
The problem in your current code is exactly that you have multiple words mixed together. If you sort them now and you want to search them in the dictionary you won't be able to find them, since the characters of multiple words are mixed. Example:
anagram = "oatdgc" //"cat" + "dog"
wordList = ["cat", "dog"]
wordListSorted = ["act", "dgo"]
anagramSorted = acdgot
↓↓↓
WordListSorted[0] → cat ✗ no match
WordListSorted[1] → dog ✗ no match
Solution
First I will explain in theory how we construct all possible true solutions and then I explain how every part in the code works.
Theory
So to start we have an anagram and a dictionary. Now we first filter the dictionary by the anagram and only keep the words, which can be constructed by the anagram.
Then we go through all words and for each word we add it to a possible solution, remove it from the anagram, filter the dictionary by the new anagram and call the function with the new values recursively.
We do this until either the anagram is empty and we found a true solution, which we add to our solution collection, or there are no words remaining and it is not a possible solution.
Code
We have two helper functions array_diff_once() and preSelectWords() in our code.
array_diff_once() is pretty much the same as the built-in array_diff() function, except that it only removes values once and not all occurrences. Otherwise there isn't much to explain. It simply loops through the second array and removes the values once in the first array, which then gets returned.
function array_diff_once($arrayOne, $arrayTwo){
foreach($arrayTwo as $v) {
if(($key = array_search($v, $arrayOne)) !== FALSE)
array_splice($arrayOne, $key, 1);
}
return $arrayOne;
}
preSelectWords() takes an anagram and a word list as argument. It simply checks with the help of array_diff_once(), which words of the word list can be constructed with the given anagram. Then it returns all possible words from the word list, which can be constructed with the anagram.
function preSelectWords($anagram, $wordList){
$tmp = [];
foreach($wordList as $word){
if(!array_diff_once(str_split(strtolower($word)), $anagram))
$tmp[] = $word;
}
return $tmp;
}
Now to the main function decodeAnagram(). We pass the anagram and a word list, which we first filter with preSelectWords(), as arguments to the function.
In the function itself we basically just loop through the words and for each word we remove it from the anagram, filter the word list by the new anagram and add the word to a possible solution and call the function recursively.
We do this until either the anagram is empty and we found a true solution, which we add to our solution array, or there are no words left in the list and with that no possible solution.
function decodeAnagram($anagram, $wordList, $solution, &$solutions = []){
if(empty($anagram) && sort($solution) && !isset($solutions[$key = implode($solution)])){
$solutions[$key] = $solution;
return;
}
foreach($wordList as $word)
decodeAnagram(array_diff_once($anagram, str_split(strtolower($word))), preSelectWords(array_diff_once($anagram, str_split(strtolower($word))), $wordList), array_merge($solution, [$word]), $solutions);
}
Code
<?php
function decodeAnagram($anagram, $wordList, $solution, &$solutions = []){
if(empty($anagram) && sort($solution) && !isset($solutions[$key = implode($solution)])){
$solutions[$key] = $solution;
return;
}
foreach($wordList as $word)
decodeAnagram(array_diff_once($anagram, str_split(strtolower($word))), preSelectWords(array_diff_once($anagram, str_split(strtolower($word))), $wordList), array_merge($solution, [$word]), $solutions);
}
function preSelectWords($anagram, $wordList){
$tmp = [];
foreach($wordList as $word){
if(!array_diff_once(str_split(strtolower($word)), $anagram))
$tmp[] = $word;
}
return $tmp;
}
function array_diff_once($arrayOne, $arrayTwo){
foreach($arrayTwo as $v) {
if(($key = array_search($v, $arrayOne)) !== FALSE)
array_splice($arrayOne, $key, 1);
}
return $arrayOne;
}
$solutions = [];
$anagram = "aaaeeehiikllmprssw";
$wordList = ["I", "am", "a", "weakish", "speller", "William", "Shakespeare", "other", "words", "as", "well"];
//↑ file("wordlist", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
decodeAnagram(str_split(strtolower($anagram)), preSelectWords(str_split(strtolower($anagram)), $wordList), [], $solutions);
print_r($solutions);
?>
Output
Array
(
[Iaamspellerweakish] => Array
(
[0] => I
[1] => a
[2] => am
[3] => speller
[4] => weakish
)
[ShakespeareWilliam] => Array
(
[0] => Shakespeare
[1] => William
)
)
(Ignore the keys here, since those are the identifiers of the solutions)

PHP regex generator

I have now got a working regex string for the below needed criteria:
a one line php-ready regex that encompasses a number of keywords, and keyterms and will match at least one of them.
For example:
Keyterms:
apple
banana
strawberry
pear cake
Now if any of these key terms are found then it returns true. However, to add a little more difficulty here, the pear cake term should be split as two keywords which must both be in the string, but need not be together.
Example strings which should return true:
A great cake is made from pear
i like apples
i like apples and bananas
i like cakes made from pear and apples
I like cakes made from pears
The working regex is:
/\bapple|\bbanana|\bstrawberry|\bpear.*?\bcake|\bcake.*?\bpear/
Now I need a php function that will create this regex on the fly from an array of keyterms. The stickler is that a keyterm may have any number of keywords within that key. Only on of the keyterms need be found, but multiple can be present. As above all of the the words within a keyterm must appear in the string in any order.
I have written a function for you here:
<?php
function permutations($array)
{
$list = array();
for ($i=0; $i<=10000; $i++) {
shuffle($array);
$tmp = implode(',',$array);
if (isset($list[$tmp])) {
$list[$tmp]++;
} else {
$list[$tmp] = 1;
}
}
ksort($list);
$list = array_keys($list);
return $list;
}
function CreateRegex($array)
{
$toReturn = '/';
foreach($array AS $value)
{
//Contains spaces
if(strpos($value, " ") != false)
{
$pieces = explode(" ", $value);
$combos = permutations($pieces);
foreach($combos AS $currentCombo)
{
$currentPieces = explode(',', $currentCombo);
foreach($currentPieces AS $finallyGotIt)
{
$toReturn .= '\b' . $finallyGotIt . '.*?';
}
$toReturn = substr($toReturn, 0, -3) . '|';
}
}
else
{
$toReturn .= '\b' . $value . '|';
}
}
$toReturn = substr($toReturn, 0, -1) . '/';
return $toReturn;
}
var_dump(CreateRegex(array('apple', 'banana', 'strawberry', 'pear cake')));
?>
I got the permutations function from:
http://www.hashbangcode.com/blog/getting-all-permutations-array-php-74.html
But I would recommend to find a better function and use another one since just at first glance this one is pretty ugly since it increments $i to 10,000 no matter what.
Also, here is a codepad for the code:
http://codepad.org/nUhFwKz1
Let me know if there is something wrong with it!

PHP: Check string for certain words

How can I check if data submitted from a form or querystring has certain words in it?
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I'm converting from ASP to PHP. I used to do this using an array in ASP (keep all illegal words in a string and use ubound to check the whole string for those words), but is there a better (efficient) way to do this in PHP?
Eg: A string like this would be rejected: "The administrator dropped a blah blah" because it has admin and drop in it.
I intend using this to check usernames when creating accounts and for other things too.
Thanks
You could use stripos()
int stripos ( string $haystack , string $needle [, int $offset = 0 ] )
You could have a function like:
function checkBadWords($str, $badwords) {
foreach ($badwords as $word) {
if (stripos(" $str ", " $word ") !== false) {
return false;
}
}
return true;
}
And to use it:
if (!checkBadWords('something admin', array('admin')) {
// ...
}
strpos() will let you search for a substring within a larger string. It's quick and works well. It returns false if the string's not found, and a number (which could be zero, so you need to use === to check) if it finds the string.
stripos() is a case-insensitive version of the same.
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I suspect that you are trying to filter the string so it's suitable for including in something like a database query, or something like that. If this is the case, this is probably not a good way to go about it, and you'd need to actually need to escape the string using mysql_real_escape_string() or equivalent.
$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (strpos($word, $bw) === 0) {
//contains word $word that starts with bad word $bw
}
}
}
For JGB146, here is a performance comparison with regular expressions:
<?php
function has_bad_words($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
function has_bad_words2($badwords, $string) {
$regex = array_map(function ($w) {
return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
$regex = "/" . implode("|", $regex) . "/";
return preg_match($regex, $string) != 0;
}
$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words2($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
Example output:
elapsed: 0.076514959335327
elapsed: 0.29999899864197
So regular expressions are much slower.
You could use regular expression like this:
preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);
building the pattern string from array
$pattern = implode(")|(", $banned_words);
$pattern = "~(".$pattern.")~";
function check($string, $array) {
foreach($array as $item) {
if( preg_match("/($item)/", $string) )
return true;
}
return false;
}
You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.
Originally, I was thinking you could do this with a simple preg_match() call (hence the downvote), however preg_match does not support arrays. Instead, you can do a replacement via preg_replace to have all rejected strings replaced with nothing, and then check to see if the string is changed. This is simple and avoids requiring a loop iteration for each rejected string.
$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
//do stuff
} else {
//reject
}
Note also that you can provide case-insensitive searches by using the i flag on the regex patterns, changing the array of patterns to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
On Efficiency
There has been some debate about the efficiency of doing it this way vs the accepted nested loop method. I ran some tests and found the preg_replace method executed around twice as fast as the nested loop. Here is the code and output of those tests:
$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";
$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";
$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (preg_check($rejectedStrs, $input)) $p_matches++;
if (preg_check($rejectedStrs, $input2)) $p_matches++;
if (preg_check($rejectedStrs, $input3)) $p_matches++;
if (preg_check($rejectedStrs, $input4)) $p_matches++;
}
$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (loop_check($rejectedStrs, $input)) $l_matches++;
if (loop_check($rejectedStrs, $input2)) $l_matches++;
if (loop_check($rejectedStrs, $input3)) $l_matches++;
if (loop_check($rejectedStrs, $input4)) $l_matches++;
}
$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);
function preg_check($rejectedStrs, $input) {
if($input == preg_replace($rejectedStrs, "", $input))
return true;
return false;
}
function loop_check($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
Output:
preg_match: 1281908071.4032 1281908071.9947= 0.5915060043335
loop_match: 1281908071.9947 1281908073.006=1.0112948417664
This is actually pretty simple, use substr_count.
And example for you would be:
if (substr_count($variable_to_search, "drop"))
{
echo "error";
}
And to make things even simpler, put your keywords (ie. "drop", "create", "alter") in an array and use foreach to check them. That way you cover all your words. An example
foreach ($keywordArray as $keyword)
{
if (substr_count($variable_to_search, $keyword))
{
echo "error"; //or do whatever you want to do went you find something you don't like
}
}

Regular Expression to match unlimited number of options

I want to be able to parse file paths like this one:
/var/www/index.(htm|html|php|shtml)
into an ordered array:
array("htm", "html", "php", "shtml")
and then produce a list of alternatives:
/var/www/index.htm
/var/www/index.html
/var/www/index.php
/var/www/index.shtml
Right now, I have a preg_match statement that can split two alternatives:
preg_match_all ("/\(([^)]*)\|([^)]*)\)/", $path_resource, $matches);
Could somebody give me a pointer how to extend this to accept an unlimited number of alternatives (at least two)? Just regarding the regular expression, the rest I can deal with.
The rule is:
The list needs to start with a ( and close with a )
There must be one | in the list (i.e. at least two alternatives)
Any other occurrence(s) of ( or ) are to remain untouched.
Update: I need to be able to also deal with multiple bracket pairs such as:
/var/(www|www2)/index.(htm|html|php|shtml)
sorry I didn't say that straight away.
Update 2: If you're looking to do what I'm trying to do in the filesystem, then note that glob() already brings this functionality out of the box. There is no need to implement a custom solutiom. See #Gordon's answer below for details.
I think you're looking for:
/(([^|]+)(|([^|]+))+)/
Basically, put the splitter '|' into a repeating pattern.
Also, your words should be made up 'not pipes' instead of 'not parens', per your third requirement.
Also, prefer + to * for this problem. + means 'at least one'. * means 'zero or more'.
Not exactly what you are asking, but what's wrong with just taking what you have to get the list (ignoring the |s), putting it into a variable and then explodeing on the |s? That would give you an array of however many items there were (including 1 if there wasn't a | present).
Non-regex solution :)
<?php
$test = '/var/www/index.(htm|html|php|shtml)';
/**
*
* #param string $str "/var/www/index.(htm|html|php|shtml)"
* #return array "/var/www/index.htm", "/var/www/index.php", etc
*/
function expand_bracket_pair($str)
{
// Only get the very last "(" and ignore all others.
$bracketStartPos = strrpos($str, '(');
$bracketEndPos = strrpos($str, ')');
// Split on ",".
$exts = substr($str, $bracketStartPos, $bracketEndPos - $bracketStartPos);
$exts = trim($exts, '()|');
$exts = explode('|', $exts);
// List all possible file names.
$names = array();
$prefix = substr($str, 0, $bracketStartPos);
$affix = substr($str, $bracketEndPos + 1);
foreach ($exts as $ext)
{
$names[] = "{$prefix}{$ext}{$affix}";
}
return $names;
}
function expand_filenames($input)
{
$nbBrackets = substr_count($input, '(');
// Start with the last pair.
$sets = expand_bracket_pair($input);
// Now work backwards and recurse for each generated filename set.
for ($i = 0; $i < $nbBrackets; $i++)
{
foreach ($sets as $k => $set)
{
$sets = array_merge(
$sets,
expand_bracket_pair($set)
);
}
}
// Clean up.
foreach ($sets as $k => $set)
{
if (false !== strpos($set, '('))
{
unset($sets[$k]);
}
}
$sets = array_unique($sets);
sort($sets);
return $sets;
}
var_dump(expand_filenames('/(a|b)/var/(www|www2)/index.(htm|html|php|shtml)'));
Maybe I'm still not getting the question, but my assumption is you are running through the filesystem until you hit one of the files, in which case you could do
$files = glob("$path/index.{htm,html,php,shtml}", GLOB_BRACE);
The resulting array will contain any file matching your extensions in $path or none. If you need to include files by a specific extension order, you can foreach over the array with an ordered list of extensions, e.g.
foreach(array('htm','html','php','shtml') as $ext) {
foreach($files as $file) {
if(pathinfo($file, PATHINFO_EXTENSION) === $ext) {
// do something
}
}
}
Edit: and yes, you can have multiple curly braces in glob.
The answer is given, but it's a funny puzzle and i just couldn't resist
function expand_filenames2($str) {
$r = array($str);
$n = 0;
while(preg_match('~(.*?) \( ( \w+ \| [\w|]+ ) \) (.*) ~x', $r[$n++], $m)) {
foreach(explode('|', $m[2]) as $e)
$r[] = $m[1] . $e . $m[3];
}
return array_slice($r, $n - 1);
}
print_r(expand_filenames2('/(a|b)/var/(ignore)/(www|www2)/index.(htm|html|php|shtml)!'));
maybe this explains a bit why we like regexps that much ;)

Categories