PHP: Check string for certain words - php

How can I check if data submitted from a form or querystring has certain words in it?
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I'm converting from ASP to PHP. I used to do this using an array in ASP (keep all illegal words in a string and use ubound to check the whole string for those words), but is there a better (efficient) way to do this in PHP?
Eg: A string like this would be rejected: "The administrator dropped a blah blah" because it has admin and drop in it.
I intend using this to check usernames when creating accounts and for other things too.
Thanks

You could use stripos()
int stripos ( string $haystack , string $needle [, int $offset = 0 ] )
You could have a function like:
function checkBadWords($str, $badwords) {
foreach ($badwords as $word) {
if (stripos(" $str ", " $word ") !== false) {
return false;
}
}
return true;
}
And to use it:
if (!checkBadWords('something admin', array('admin')) {
// ...
}

strpos() will let you search for a substring within a larger string. It's quick and works well. It returns false if the string's not found, and a number (which could be zero, so you need to use === to check) if it finds the string.
stripos() is a case-insensitive version of the same.
I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.
I suspect that you are trying to filter the string so it's suitable for including in something like a database query, or something like that. If this is the case, this is probably not a good way to go about it, and you'd need to actually need to escape the string using mysql_real_escape_string() or equivalent.

$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (strpos($word, $bw) === 0) {
//contains word $word that starts with bad word $bw
}
}
}
For JGB146, here is a performance comparison with regular expressions:
<?php
function has_bad_words($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
function has_bad_words2($badwords, $string) {
$regex = array_map(function ($w) {
return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
$regex = "/" . implode("|", $regex) . "/";
return preg_match($regex, $string) != 0;
}
$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
has_bad_words2($badwords, $string);
}
echo "elapsed: ". (microtime(true) - $start);
Example output:
elapsed: 0.076514959335327
elapsed: 0.29999899864197
So regular expressions are much slower.

You could use regular expression like this:
preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);
building the pattern string from array
$pattern = implode(")|(", $banned_words);
$pattern = "~(".$pattern.")~";

function check($string, $array) {
foreach($array as $item) {
if( preg_match("/($item)/", $string) )
return true;
}
return false;
}

You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.
Originally, I was thinking you could do this with a simple preg_match() call (hence the downvote), however preg_match does not support arrays. Instead, you can do a replacement via preg_replace to have all rejected strings replaced with nothing, and then check to see if the string is changed. This is simple and avoids requiring a loop iteration for each rejected string.
$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
//do stuff
} else {
//reject
}
Note also that you can provide case-insensitive searches by using the i flag on the regex patterns, changing the array of patterns to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
On Efficiency
There has been some debate about the efficiency of doing it this way vs the accepted nested loop method. I ran some tests and found the preg_replace method executed around twice as fast as the nested loop. Here is the code and output of those tests:
$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";
$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";
$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (preg_check($rejectedStrs, $input)) $p_matches++;
if (preg_check($rejectedStrs, $input2)) $p_matches++;
if (preg_check($rejectedStrs, $input3)) $p_matches++;
if (preg_check($rejectedStrs, $input4)) $p_matches++;
}
$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
if (loop_check($rejectedStrs, $input)) $l_matches++;
if (loop_check($rejectedStrs, $input2)) $l_matches++;
if (loop_check($rejectedStrs, $input3)) $l_matches++;
if (loop_check($rejectedStrs, $input4)) $l_matches++;
}
$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);
function preg_check($rejectedStrs, $input) {
if($input == preg_replace($rejectedStrs, "", $input))
return true;
return false;
}
function loop_check($badwords, $string) {
foreach (str_word_count($string, 1) as $word) {
foreach ($badwords as $bw) {
if (stripos($word, $bw) === 0) {
return true;
}
}
return false;
}
}
Output:
preg_match: 1281908071.4032 1281908071.9947= 0.5915060043335
loop_match: 1281908071.9947 1281908073.006=1.0112948417664

This is actually pretty simple, use substr_count.
And example for you would be:
if (substr_count($variable_to_search, "drop"))
{
echo "error";
}
And to make things even simpler, put your keywords (ie. "drop", "create", "alter") in an array and use foreach to check them. That way you cover all your words. An example
foreach ($keywordArray as $keyword)
{
if (substr_count($variable_to_search, $keyword))
{
echo "error"; //or do whatever you want to do went you find something you don't like
}
}

Related

recursively get user input value in array values

I am leaning recursion and I want to create a search engine which depends on a user value and gets from an array all values which together make up the word that the user typed.
For example I have this array :
$array = array('it', 'pro', 'gram', 'grammer', 'mer', 'programmer');
$string = "itprogrammer";
If anyone can help I appreciate it a lot. Thank you for your help.
Here is a recursive function that will do what you want. It loops through the array, looking for words that match the beginning of the string. It it finds one, it then recursively tries to find words in the array (excluding the word already matched) which match the the string after it has had the first match removed.
function find_words($string, $array) {
// if the string is empty, we're done
if (strlen($string) == 0) return array();
$output = array();
for ($i = 0; $i < count($array); $i++) {
// does this word match the start of the string?
if (stripos($string, $array[$i]) === 0) {
$match_len = strlen($array[$i]);
$this_match = array($array[$i]);
// see if we can match the rest of the string with other words in the array
$rest_of_array = array_merge($i == 0 ? array() : array_slice($array, 0, $i), array_slice($array, $i+1));
if (count($matches = find_words(substr($string, $match_len), $rest_of_array))) {
// yes, found a match, return it
foreach ($matches as $match) {
$output[] = array_merge($this_match, $match);
}
}
else {
// was end of string or didn't match anything more, just return the current match
$output[] = $this_match;
}
}
}
// any matches? if so, return them, otherwise return false
return $output;
}
You can display the output in the format you desire with:
$wordstrings = array();
if (($words_array = find_words($string, $array)) !== false) {
foreach ($words_array as $words) {
$wordstrings[] = implode(', ', $words);
}
echo implode("<br>\n", $wordstrings);
}
else {
echo "No match found!";
}
I made a slightly more complex example (demo on rextester):
$array = array('pro', 'gram', 'merit', 'mer', 'program', 'it', 'programmer');
$strings = array("programmerit", "probdjsabdjsab", "programabdjsab");
Output:
string: 'programmerit' matches:
pro, gram, merit<br>
pro, gram, mer, it<br>
program, merit<br>
program, mer, it<br>
programmer, it
string: 'probdjsabdjsab' matches:
pro
string: 'programabdjsab' matches:
pro, gram<br>
program
Update
Updated code and demo based on OPs comments about not needing to match the whole string.

How to check array data that matches from random characters in php?

I have an array like below:
$fruits = array("apple","orange","papaya","grape")
I have a variable like below:
$content = "apple";
I need to filter some condition like: if this variable matches at least one of the array elements, do something. The variable, $content, is a bunch of random characters that is actually one of these available in the array data like below:
$content = "eaplp"; // it's a dynamically random char from the actual word "apple`
what have I done was like the below:
$countcontent = count($content);
for($a=0;$a==count($fruits);$a++){
$countarr = count($fruits[$a]);
if($content == $fruits[$a] && $countcontent == $countarr){
echo "we got".$fruits[$a];
}
}
I tried to count how many letters these phrases had and do like if...else... when the total word in string matches with the total word on one of array data, but is there something that we could do other than that?
We can check if an array contains some value with in_array. So you can check if your $fruits array contains the string "apple" with,
in_array("apple", $fruits)
which returns a boolean.
If the order of the letters is random, we can sort the string alphabetically with this function:
function sorted($s) {
$a = str_split($s);
sort($a);
return implode($a);
}
Then map this function to your array and check if it contains the sorted string.
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
$inarr = in_array(sorted($content), array_map("sorted", $fruits));
var_dump($inarr);
//bool(true)
Another option is array_search. The benefit from using array_search is that it returns the position of the item (if it's found in the array, else false).
$pos = array_search(sorted($content), array_map("sorted", $fruits));
echo ($pos !== false) ? "$fruits[$pos] found." : "not found.";
//apple found.
This will also work but in a slightly different manner.
I split the strings to arrays and sort them to match eachoter.
Then I use array_slice to only match the number of characters in $content, if they match it's a match.
This means this will match in a "loose" way to with "apple juice" or "apple curd".
Not sure this is wanted but figured it could be useful for someone.
$fruits = array("apple","orange","papaya","grape","apple juice", "applecurd");
$content = "eaplp";
$content = str_split($content);
$count = count($content);
Foreach($fruits as $fruit){
$arr_fruit = str_split($fruit);
// sort $content to match order of $arr_fruit
$SortCont = array_merge(array_intersect($arr_fruit, $content), array_diff($content, $arr_fruit));
// if the first n characters match call it a match
If(array_slice($SortCont, 0, $count) == array_slice($arr_fruit, 0, $count)){
Echo "match: " . $fruit ."\n";
}
}
output:
match: apple
match: apple juice
match: applecurd
https://3v4l.org/hHvp3
It is also comparable in speed with t.m.adams answer. Sometimes faster sometimes slower, but note how this code can find multiple answers. https://3v4l.org/IbuuD
I think this is the simplest way to answer that question. some of the algorithm above seems to be "overkill".
$fruits = array("apple","orange","papaya","grape");
$content = "eaplp";
foreach ($fruits as $key => $fruit) {
$fruit_array = str_split($fruit); // split the string into array
$content_array = str_split($content); // split the content into array
// check if there's no difference between the 2 new array
if ( sizeof(array_diff($content_array, $fruit_array)) === 0 ) {
echo "we found the fruit at key: " . $key;
return;
}
}
What about using only native PHP functions.
$index = array_search(count_chars($content), array_map('count_chars', $fruits));
If $index is not null you will get the position of $content inside $fruits.
P.S. Be aware that count_chars might not be the fastest approach to that problem.
With a random token to search for a value in your array, you have a problem with false positives. That can give misleading results depending on the use case.
On search cases, for example wrong typed words, I would implement a filter solution which produces a matching array. One could sort the results by calculating the levenshtein distance to fetch the most likely result, if necessary.
String length solution
Very easy to implement.
False positives: Nearly every string with the same length like apple and grape would match.
Example:
$matching = array_filter($fruits, function ($fruit) use ($content) {
return strlen($content) == strlen($fruit);
});
if (count($matching)) {
// do your stuff...
}
Regular expression solution
It compares string length and in a limited way containing characters. It is moderate to implement and has a good performance on big data cases.
False positives: A content like abc would match bac but also bbb.
Example:
$matching = preg_grep(
'#['.preg_quote($content).']{'.strlen($content).'}#',
$fruits
);
if (count($matching)) {
// do your stuff...
}
Alphanumeric sorting solution
Most accurate but also a slow approach concerning performance using PHP.
False positives: A content like abc would match on bac or cab.
Example:
$normalizer = function ($value) {
$tokens = str_split($value);
sort($tokens);
return implode($tokens);
};
$matching = array_filter($fruits, function ($fruit) use ($content, $normalizer) {
return ($normalizer($fruit) == $normalizer($content));
});
if (count($matching)) {
// do your stuff...
}
Here's a clean approach. Returns unscrambled value early if found, otherwise returns null. Only returns an exact match.
function sortStringAlphabetically($stringToSort)
{
$splitString = str_split($stringToSort);
sort($splitString);
return implode($splitString);
}
function getValueFromRandomised(array $dataToSearch = [], $dataToFind)
{
$sortedDataToFind = sortStringAlphabetically($dataToFind);
foreach ($dataToSearch as $value) {
if (sortStringAlphabetically($value) === $sortedDataToFind) {
return $value;
}
}
return null;
}
$fruits = ['apple','orange','papaya','grape'];
$content = 'eaplp';
$dataExists = getValueFromRandomised($fruits, $content);
var_dump($dataExists);
// string(5) "apple"
Not found example:
$content = 'eaplpo';
var_dump($dataExists);
// NULL
Then you can use it (eg) like this:
echo !empty($dataExists) ? $dataExists . ' was found' : 'No match found';
NOTE: This is case sensitive, meaning it wont find "Apple" from "eaplp". That can be resolved by doing strtolower() on the loop's condition vars.
How about looping through the array, and using a flag to see if it matches?
$flag = false;
foreach($fruits as $fruit){
if($fruit == $content){
$flag = true;
}
}
if($flag == true){
//do something
}
I like t.m.adams answer but I also have a solution for this issue:
array_search_random(string $needle, array $haystack [, bool $strictcase = FALSE ]);
Description: Searches a string in array elements regardless of the position of the characters in the element.
needle: the caracters you are looking for as a string
haystack: the array you want to search
strictcase: if set to TRUE needle 'mood' will match 'mood' and 'doom' but not 'Mood' and 'Doom', if set to FALSE (=default) it will match all of these.
Function:
function array_search_random($needle, $haystack, $strictcase=false){
if($strictcase === false){
$needle = strtolower($needle);
}
$needle = str_split($needle);
sort($needle);
$needle = implode($needle);
foreach($haystack as $straw){
if($strictcase === false){
$straw = strtolower($straw);
}
$straw = str_split($straw);
sort($straw);
$straw = implode($straw);
if($straw == $needle){
return true;
}
}
return false;
}
if(in_array("apple", $fruits)){
true statement
}else{
else statement
}

Regex for number comparison?

I would like to perform regex to return true/false if the input 5 digit from input matching data in database, no need to cater of the sequence, but need the exact numbers.
Eg: In database I have 12345
When I key in a 5 digit value into search, I want to find out whether it is matching the each number inside the 12345.
If I key in 34152- it should return true
If I key in 14325- it should return true
If I key in 65432- it should return false
If I key in 11234- it should return false
Eg: In database I have 44512
If I key in 21454- it should return true
If I key in 21455- it should return false
How to do this using php with regex
This is a way avoiding regex
<?php
function cmpkey($a,$b){
$aa = str_split($a); sort($aa);
$bb = str_split($b); sort($bb);
return ( implode("",$aa) == implode("",$bb));
}
?>
Well, it's not going to be a trivial regex, I can tell you that. You could do something like this:
$chars = count_chars($input, 1);
$numbers = array();
foreach ($chars as $char => $frequency) {
if (is_numeric(chr($char))) {
$numbers[chr($char)] = $frequency;
}
}
// that converts "11234" into array(1 => 2, 2 => 1, 3 => 1, 4 => 1)
Now, since MySQL doesn't support assertions in regex, you'll need to do this in multiple regexes:
$where = array();
foreach ($numbers AS $num => $count) {
$not = "[^$num]";
$regex = "^";
for ($i = 0; $i < $count; $i++) {
$regex .= "$not*$num";
}
$regex .= "$not*";
$where[] = "numberField REGEXP '$regex'";
}
$where = '((' . implode(') AND (', $where).'))';
That'll produce:
(
(numberField REGEXP '^[^1]*1[^1]*1[^1]*$')
AND
(numberField REGEXP '^[^2]*2[^2]*$')
AND
(numberField REGEXP '^[^3]*3[^3]*$')
AND
(numberField REGEXP '^[^4]*4[^4]*$')
)
That should do it for you.
It's not pretty, but it should take care of all of the possible permutations for you, assuming that your stored data format is consistent...
But, depending on your needs, you should try to pull it out and process it in PHP. In which case the regex would be far simpler:
^(?=.*1.*1})(?=.*2)(?=.*3)(?=.*4)\d{5}$
Or, you could also pre-sort the number before you insert it. So instead of inserting 14231, you'd insert 11234. That way, you always know the sequence is ordered properly, so you just need to do numberField = '11234' instead of that gigantic beast above...
Try using
^(?=.*1)(?=.*2)(?=.*3)(?=.*4)(?=.*5).{5}$
This will get much more complicated, when you have duplicate numbers.
You really should not do this with regex. =)

How to find first non-repetitive character from a string?

I've spent half day trying to figure out this and finally I got working solution.
However, I feel like this can be done in simpler way.
I think this code is not really readable.
Problem: Find first non-repetitive character from a string.
$string = "abbcabz"
In this case, the function should output "c".
The reason I use concatenation instead of $input[index_to_remove] = ''
in order to remove character from a given string
is because if I do that, it actually just leave empty cell so that my
return value $input[0] does not not return the character I want to return.
For instance,
$str = "abc";
$str[0] = '';
echo $str;
This will output "bc"
But actually if I test,
var_dump($str);
it will give me:
string(3) "bc"
Here is my intention:
Given: input
while first char exists in substring of input {
get index_to_remove
input = chars left of index_to_remove . chars right of index_to_remove
if dupe of first char is not found from substring
remove first char from input
}
return first char of input
Code:
function find_first_non_repetitive2($input) {
while(strpos(substr($input, 1), $input[0]) !== false) {
$index_to_remove = strpos(substr($input,1), $input[0]) + 1;
$input = substr($input, 0, $index_to_remove) . substr($input, $index_to_remove + 1);
if(strpos(substr($input, 1), $input[0]) == false) {
$input = substr($input, 1);
}
}
return $input[0];
}
<?php
// In an array mapped character to frequency,
// find the first character with frequency 1.
echo array_search(1, array_count_values(str_split('abbcabz')));
Python:
def first_non_repeating(s):
for i, c in enumerate(s):
if s.find(c, i+1) < 0:
return c
return None
Same in PHP:
function find_first_non_repetitive($s)
{
for($i = 0; i < strlen($s); $i++) {
if (strpos($s, $s[i], $i+1) === FALSE)
return $s[i];
}
}
Pseudocode:
Array N;
For each letter in string
if letter not exists in array N
Add letter to array and set its count to 1
else
go to its position in array and increment its count
End for
for each position in array N
if value at potition == 1
return the letter at position and exit for loop
else
//do nothing (for clarity)
end for
Basically, you find all distinct letters in the string, and for each letter, you associate it with a count of how many of that letter exist in the string. then you return the first one that has a count of 1
The complexity of this method is O(n^2) in the worst case if using arrays. You can use an associative array to increase it's performance.
1- use a sorting algotithm like mergesort (or quicksort has better performance with small inputs)
2- then control repetetive characters
non repetetive characters will be single
repetetvives will fallow each other
Performance : sort + compare
Performance : O(n log n) + O(n) = O(n log n)
For example
$string = "abbcabz"
$string = mergesort ($string)
// $string = "aabbbcz"
Then take first char form string then compare with next one if match repetetive
move to the next different character and compare
first non-matching character is non-repetetive
This can be done in much more readable code using some standard PHP functions:
// Count number of occurrences for every character
$counts = count_chars($string);
// Keep only unique ones (yes, we use this ugly pre-PHP-5.3 syntax here, but I can live with that)
$counts = array_filter($counts, create_function('$n', 'return $n == 1;'));
// Convert to a list, then to a string containing every unique character
$chars = array_map('chr', array_keys($counts));
$chars = implode($chars);
// Get a string starting from the any of the characters found
// This "strpbrk" is probably the most cryptic part of this code
$substring = strlen($chars) ? strpbrk($string, $chars) : '';
// Get the first character from the new string
$char = strlen($substring) ? $substring[0] : '';
// PROFIT!
echo $char;
$str="abbcade";
$checked= array(); // we will store all checked characters in this array, so we do not have to check them again
for($i=0; $i<strlen($str); $i++)
{
$c=0;
if(in_array($str[$i],$checked)) continue;
$checked[]=$str[$i];
for($j=$i+1;$j<=strlen($str);$j++)
{
if($str[$i]==$str[$j])
{
$c=1;
break;
}
}
if($c!=1)
{
echo "First non repetive char is:".$str[$i];
break;
}
}
This should replace your code...
$array = str_split($string);
$array = array_count_values($array);
$array = array_filter($array, create_function('$key,$val', 'return($val == 1);'));
$first_non_repeated_letter = key(array_shift($array));
Edit: spoke too soon. Took out 'array_unique', thought it actually dropped duplicate values. But character order should be preserved to be able to find the first character.
Here's a function in Scala that would do it:
def firstUnique(chars:List[Char]):Option[Char] = chars match {
case Nil => None
case head::tail => {
val filtered = tail filter (_!=head)
if (tail.length == filtered.length) Some(head) else firstUnique(filtered)
}
}
scala> firstUnique("abbcabz".toList)
res5: Option[Char] = Some(c)
And here's the equivalent in Haskell:
firstUnique :: [Char] -> Maybe Char
firstUnique [] = Nothing
firstUnique (head:tail) = let filtered = (filter (/= head) tail) in
if (tail == filtered) then (Just head) else (firstUnique filtered)
*Main> firstUnique "abbcabz"
Just 'c'
You can solve this more generally by abstracting over lists of things that can be compared for equality:
firstUnique :: Eq a => [a] -> Maybe a
Strings are just one such list.
Can be also done using array_key_exists during building an associative array from the string. Each character will be a key and will count the number as value.
$sample = "abbcabz";
$check = [];
for($i=0; $i<strlen($sample); $i++)
{
if(!array_key_exists($sample[$i], $check))
{
$check[$sample[$i]] = 1;
}
else
{
$check[$sample[$i]] += 1;
}
}
echo array_search(1, $check);

Regular Expression to match unlimited number of options

I want to be able to parse file paths like this one:
/var/www/index.(htm|html|php|shtml)
into an ordered array:
array("htm", "html", "php", "shtml")
and then produce a list of alternatives:
/var/www/index.htm
/var/www/index.html
/var/www/index.php
/var/www/index.shtml
Right now, I have a preg_match statement that can split two alternatives:
preg_match_all ("/\(([^)]*)\|([^)]*)\)/", $path_resource, $matches);
Could somebody give me a pointer how to extend this to accept an unlimited number of alternatives (at least two)? Just regarding the regular expression, the rest I can deal with.
The rule is:
The list needs to start with a ( and close with a )
There must be one | in the list (i.e. at least two alternatives)
Any other occurrence(s) of ( or ) are to remain untouched.
Update: I need to be able to also deal with multiple bracket pairs such as:
/var/(www|www2)/index.(htm|html|php|shtml)
sorry I didn't say that straight away.
Update 2: If you're looking to do what I'm trying to do in the filesystem, then note that glob() already brings this functionality out of the box. There is no need to implement a custom solutiom. See #Gordon's answer below for details.
I think you're looking for:
/(([^|]+)(|([^|]+))+)/
Basically, put the splitter '|' into a repeating pattern.
Also, your words should be made up 'not pipes' instead of 'not parens', per your third requirement.
Also, prefer + to * for this problem. + means 'at least one'. * means 'zero or more'.
Not exactly what you are asking, but what's wrong with just taking what you have to get the list (ignoring the |s), putting it into a variable and then explodeing on the |s? That would give you an array of however many items there were (including 1 if there wasn't a | present).
Non-regex solution :)
<?php
$test = '/var/www/index.(htm|html|php|shtml)';
/**
*
* #param string $str "/var/www/index.(htm|html|php|shtml)"
* #return array "/var/www/index.htm", "/var/www/index.php", etc
*/
function expand_bracket_pair($str)
{
// Only get the very last "(" and ignore all others.
$bracketStartPos = strrpos($str, '(');
$bracketEndPos = strrpos($str, ')');
// Split on ",".
$exts = substr($str, $bracketStartPos, $bracketEndPos - $bracketStartPos);
$exts = trim($exts, '()|');
$exts = explode('|', $exts);
// List all possible file names.
$names = array();
$prefix = substr($str, 0, $bracketStartPos);
$affix = substr($str, $bracketEndPos + 1);
foreach ($exts as $ext)
{
$names[] = "{$prefix}{$ext}{$affix}";
}
return $names;
}
function expand_filenames($input)
{
$nbBrackets = substr_count($input, '(');
// Start with the last pair.
$sets = expand_bracket_pair($input);
// Now work backwards and recurse for each generated filename set.
for ($i = 0; $i < $nbBrackets; $i++)
{
foreach ($sets as $k => $set)
{
$sets = array_merge(
$sets,
expand_bracket_pair($set)
);
}
}
// Clean up.
foreach ($sets as $k => $set)
{
if (false !== strpos($set, '('))
{
unset($sets[$k]);
}
}
$sets = array_unique($sets);
sort($sets);
return $sets;
}
var_dump(expand_filenames('/(a|b)/var/(www|www2)/index.(htm|html|php|shtml)'));
Maybe I'm still not getting the question, but my assumption is you are running through the filesystem until you hit one of the files, in which case you could do
$files = glob("$path/index.{htm,html,php,shtml}", GLOB_BRACE);
The resulting array will contain any file matching your extensions in $path or none. If you need to include files by a specific extension order, you can foreach over the array with an ordered list of extensions, e.g.
foreach(array('htm','html','php','shtml') as $ext) {
foreach($files as $file) {
if(pathinfo($file, PATHINFO_EXTENSION) === $ext) {
// do something
}
}
}
Edit: and yes, you can have multiple curly braces in glob.
The answer is given, but it's a funny puzzle and i just couldn't resist
function expand_filenames2($str) {
$r = array($str);
$n = 0;
while(preg_match('~(.*?) \( ( \w+ \| [\w|]+ ) \) (.*) ~x', $r[$n++], $m)) {
foreach(explode('|', $m[2]) as $e)
$r[] = $m[1] . $e . $m[3];
}
return array_slice($r, $n - 1);
}
print_r(expand_filenames2('/(a|b)/var/(ignore)/(www|www2)/index.(htm|html|php|shtml)!'));
maybe this explains a bit why we like regexps that much ;)

Categories