I have been using the regex /\(\s*([^)]+?)\s*\)/ with PHP's preg_replace function to remove outer brackets (see my previous question, "Regex to match any character except trailing spaces").
This works fine when there is only one pair of brackets, but it breaks when there are more: for example, ( test1 t3() test2) becomes test1 t3( test2) instead of test1 t3() test2.
I am aware of the limitations of regular expressions, but it would be enough if the pattern simply matched nothing when there is more than one pair of brackets.
So this behavior would be good enough:
( test1 test2 ) => test1 test2
( test1 t3() test2 ) => (test1 t3() test2)
EDIT:
I would still like to trim the trailing whitespace inside the removed brackets.
You can use this code, based on a recursive regex, which also works with nested brackets. The only condition is that the brackets must be balanced.
$arr = array('Foo ( test1 test2 )', 'Bar ( test1 t3() test2 )', 'Baz ((("Fdsfds")))');
foreach ($arr as $str) {
    // (?R) recurses into the whole pattern, so a nested pair is consumed as one unit
    echo "'$str' => '" .
        preg_replace('/ \( \s* ( ( [^()]*? | (?R) )* ) \s* \) /x', '$1', $str) . "'\n";
}
OUTPUT:
'Foo ( test1 test2 )' => 'Foo test1 test2'
'Bar ( test1 t3() test2 )' => 'Bar test1 t3() test2'
'Baz ((("Fdsfds")))' => 'Baz (("Fdsfds"))'
Try this
$result = preg_replace('/\(([^)(]+)\)/', '$1', $subject);
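Since the question's edit asks to keep trimming whitespace, the same idea can be combined with the \s* trimming from the original pattern. The following is only a sketch: the function name stripSinglePair is made up for illustration, and the pattern still strips any innermost pair with non-empty content (for example the (x) in t3(x)), so it assumes inputs shaped like the two examples in the question.

```php
<?php
// Hypothetical helper (not from the answer): strip a bracket pair only when
// nothing inside it is itself a bracket, trimming whitespace on both sides.
function stripSinglePair(string $subject): string
{
    // [^()]+? is lazy, so the trailing \s* absorbs whitespace before ")".
    return preg_replace('/\(\s*([^()]+?)\s*\)/', '$1', $subject);
}

echo stripSinglePair('( test1 test2 )'), "\n";      // brackets removed, content trimmed
echo stripSinglePair('( test1 t3() test2 )'), "\n"; // left unchanged: no pair qualifies
```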
Update
\(([^\)\(]+)\)(?=[^\(]+\()
RegEx explanation
"
\( # Match the character “(” literally
( # Match the regular expression below and capture its match into backreference number 1
[^\)\(] # Match a single character NOT present in the list below
# A ) character
# A ( character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\) # Match the character “)” literally
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
[^\(] # Match any character that is NOT a ( character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( # Match the character “(” literally
)
"
You may want this instead (I am guessing it is what you originally wanted):
$result = preg_replace('/\(\s*(.+)\s*\)/', '$1', $subject);
This would give:
"(test1 test2)" => "test1 test2"
"(test1 t3() test2)" => "test1 t3() test2"
"( test1 t3(t4) test2)" => "test1 t3(t4) test2"
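One caveat with the greedy (.+): it also swallows any whitespace before the closing bracket, so the trimming asked for in the question's edit is lost. A possible variant, sketched below under the assumption that the subject contains a single outer pair, forces the captured text to end in a non-whitespace character. The function name is hypothetical.

```php
<?php
// Hypothetical helper: remove the outermost pair and trim inside it.
// (.*\S) is greedy but must end on a non-whitespace character, so any
// whitespace before the final ")" is left to the trailing \s*.
function stripOuterPair(string $subject): string
{
    return preg_replace('/\(\s*(.*\S)\s*\)/', '$1', $subject);
}

echo stripOuterPair('( test1 test2 )'), "\n";
echo stripOuterPair('( test1 t3() test2 )'), "\n";
```

Because the match runs from the first ( to the last ), a subject with two separate top-level groups, such as (a) b (c), would be treated as a single pair.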
Related
I'm looking to split a string by spaces, unless there is the string " NOT ", in which case I would only want to split by the space before the "NOT", and not after the "NOT".
Example:
"cancer disease NOT brain NOT sickle"
should become:
["cancer", "disease", "NOT brain", "NOT sickle"]
Here is what I have so far, but it is incorrect:
$splitKeywordArr = preg_split('/[^(NOT)]( )/', "cancer disease NOT brain NOT sickle");
It results in:
["cance", "diseas", "NOT brai", "NOT sickle"]
I know why it is incorrect, but I don't know how to fix it.
You may use
<?php
$text = "cancer disease NOT brain NOT sickle";
$pattern = "~NOT\s+(*SKIP)(*FAIL)|\s+~";
print_r(preg_split($pattern, $text));
?>
Which yields
Array
(
[0] => cancer
[1] => disease
[2] => NOT brain
[3] => NOT sickle
)
See a demo on ideone.com.
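As written, the pattern also treats a NOT that ends a longer word (for example KNOT) as the keyword. If that matters for your input, a word boundary can be added. The sketch below assumes the keyword is always upper-case and stands alone as a word; splitKeywords is a name made up for this example.

```php
<?php
// Hypothetical wrapper around the split shown above, with \b added so that
// only the standalone word NOT protects the whitespace that follows it.
function splitKeywords(string $text): array
{
    // Left branch: consume "NOT" plus its whitespace, then discard the match
    // and resume after it. Right branch: any remaining whitespace is a delimiter.
    return preg_split('~\bNOT\s+(*SKIP)(*FAIL)|\s+~', $text);
}

print_r(splitKeywords('cancer disease NOT brain NOT sickle'));
print_r(splitKeywords('HOT KNOT NOT brain'));
```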
Alternatively, instead of splitting you can match the tokens: optional repetitions of the word NOT, each followed by horizontal whitespace, and then 1+ word characters. This also covers the case where NOT occurs several times in a row.
(?:\bNOT\h+)*\w+
The pattern matches:
(?: Non capture group
\bNOT\h+ A word boundary, match NOT and 1 or more horizontal whitespace chars
)* Close non capture group and optionally repeat
\w+ Match 1+ word characters
$str = "cancer disease NOT brain NOT sickle";
preg_match_all('/(?:\bNOT\h+)*\w+/', $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => cancer
[1] => disease
[2] => NOT brain
[3] => NOT sickle
)
I have a problem similar to this question, however with one more twist.
I want to explode the following string:
title:"tab system" color:="blue" price:>10
into
array("title:\"tab system\"", "color:=\"blue\"", "price:>10")
Here's what I've tried so far from the above link:
$text = "title:\"tab system\" color:=\"blue\" price:>10";
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);
Which produces:
(
[0] => title:"tab
[1] => system"
[2] => color:="blue"
[3] => price:>10
)
and:
print_r(str_getcsv($text, ' '));
which produces the same thing.
These solutions don't work for me because, as you can see, the quotes may not start right next to the delimiter (in this case, a space). Also, this is just one example of an input string; there could be many variations of it.
You may use
preg_split('~(?<!\\\\)(?:\\\\{2})*"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\s+~s', $s)
See the regex demo
Details
(?<!\\) - no \ allowed immediately to the left of the current location
(?:\\{2})* - zero or more double backslashes
" - a quote
[^"\\]* - 0+ chars other than " and \
(?:\\.[^"\\]*)* - 0+ sequences of
\\. - any escape sequence
[^"\\]* - 0+ chars other than " and \
" - a quote
(*SKIP)(*F) - skipping the match and proceeding to the next match from the current match end location
| - or
\s+ - 1+ whitespaces in any other contexts.
See the PHP demo:
$s = 'title:"tab system" color:="blue" price:>10';
print_r(preg_split('~(?<!\\\\)(?:\\\\{2})*"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\s+~s', $s));
Output:
Array
(
[0] => title:"tab system"
[1] => color:="blue"
[2] => price:>10
)
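The backslash handling in this pattern only matters once the quoted values contain escaped quotes. A minimal sketch, with an input string invented for illustration and a hypothetical function name:

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function splitOutsideQuotes(string $s): array
{
    return preg_split(
        '~(?<!\\\\)(?:\\\\{2})*"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\s+~s',
        $s
    );
}

// Single-quoted PHP string, so each \" below is a literal backslash plus quote.
$escaped = 'title:"tab \"x\" system" color:="blue"';
print_r(splitOutsideQuotes($escaped));
```

The escaped quotes stay inside the first item because the (?:\\.[^"\\]*)* part consumes each backslash together with the character it escapes.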
Look for a whitespace character that has a double quote immediately before or after it, and split on that:
$result = preg_split( "/((?<=\")\s)|(\s(?=\"))/" , $string );
((?<=\")\s) looks for "[space] but does not select the "
(\s(?=\")) looks for [space]" but does not select the "
The result:
Array
(
[0] => title:"tab system"
[1] => color:="blue"
[2] => price:>10
)
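This split depends on every delimiter touching a double quote. A small sketch of what happens when an unquoted value is followed by another field; the reordered input is an assumption made up for this example:

```php
<?php
// Same pattern as above, applied to the fields in a different order.
$pattern = "/((?<=\")\s)|(\s(?=\"))/";

$original  = 'title:"tab system" color:="blue" price:>10';
$reordered = 'price:>10 color:="blue" title:"tab system"';

print_r(preg_split($pattern, $original));  // three fields
print_r(preg_split($pattern, $reordered)); // price:>10 and color:="blue" stay in one piece
```

When field order or quoting can vary, a pattern that skips over quoted sections, such as the (*SKIP)(*F) approach, depends less on the shape of the input.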
Given a string like this:
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
the desired result is:
[0] => Array (
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
what I get with:
preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);
is:
[0] => Array (
[0] => dateto:'2015-10-07
[1] => 15:05'
[2] => xxxx
[3] => datefrom:'2015-10-09
[4] => 15:05'
[5] => yyyy
[6] => asdf
)
I also tried preg_split("/[\s]+/", $str), but I have no clue how to skip the whitespace when a value is between quotes. Can anyone show me how, and please also explain the regex? Thank you!
I would use the PCRE verbs (*SKIP)(*F):
preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
DEMO
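Since the question also asks for an explanation, here is the same split spelled out with the x (extended) flag so that each part can carry a comment. This is a sketch of one way to read the pattern, applied to the string from the question.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Nowdoc, so quotes and backslashes need no extra escaping.
$pattern = <<<'EOD'
~
    ' [^']* '        # a complete single-quoted section ...
    (*SKIP)(*F)      # ... is consumed, then rejected: nothing inside it can match
  |
    \s+              # whitespace anywhere else is a delimiter
~x
EOD;

$parts = preg_split($pattern, $str);
print_r($parts);
```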
Often, when you are looking to split a string, preg_split isn't the best approach (that seems a little counterintuitive, but it is true most of the time). A more efficient way is to find all the items (with preg_match_all) using a pattern that describes everything that is not the delimiter (whitespace here):
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
if (preg_match_all($pattern, $str, $m))
$result = $m[0];
pattern details:
~ # pattern delimiter
(?=\S) # the lookahead assertion only succeeds if there is a non-
# white-space character at the current position.
# (This lookahead is useful for two reasons:
# - it allows the regex engine to quickly find the start of
# the next item without having to test each branch of the
# following alternation at each position in the string
# until one succeeds.
# - it ensures that there's at least one non-white-space.
# Without it, the pattern may match an empty string.
# )
[^'"\s]* #"'# all that is not a quote or a white-space
(?: # eventual quoted parts
'[^']*' [^'"\s]* #"# single quotes
|
"[^"]*" [^'"\s]* # double quotes
)*
~
Note that with this somewhat long pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
but it's a little less efficient.
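For your example, you can use preg_split with a negative lookbehind, (?<!\d). This relies on the only whitespace inside the quotes being preceded by a digit, as it is in the date values: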
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);
Output:
Array
(
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
Demo:
http://ideone.com/EP06Nt
Regex Explanation:
(?<!\d)(\s)
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
Match a single character that is a “whitespace character” «\s»
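Note that the lookbehind keys on digits, not on quotes, so it is tied to the date-like values in the example. A quick sketch with a made-up input shows the difference from a quote-aware split:

```php
<?php
// Hypothetical input: an unquoted number followed by a quoted value with a space.
$input = "id:42 name:'a b'";

// Digit-based split: keeps "id:42 name:'a" together and breaks inside the quotes.
$byDigit = preg_split('/(?<!\d)(\s)/', $input);

// Quote-aware split: skips quoted sections, splits on the remaining whitespace.
$byQuote = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $input);

print_r($byDigit);
print_r($byQuote);
```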
I have been sitting for hours trying to figure out a regexp for a preg_match_all call in PHP.
My problem is that I want two different things from the string.
Say you have the string "Code is fun [and good for the brain.] But the [brain is] tired."
What I need from this is an array of all the words outside the brackets, plus the text inside each pair of brackets as a single string.
Something like this:
[0] => Code
[1] => is
[2] => fun
[3] => and good for the brain.
[4] => But
[5] => the
[6] => brain is
[7] => tired.
Help much appreciated.
You could also try the regex below:
(?<=\[)[^\]]*|[.\w]+
Code:
<?php
$data = "Code is fun [and good for the brain.] But the [brain is] tired.";
$regex = '~(?<=\[)[^\]]*|[.\w]+~';
preg_match_all($regex, $data, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => Code
[1] => is
[2] => fun
[3] => and good for the brain.
[4] => But
[5] => the
[6] => brain is
[7] => tired.
)
)
The first alternative, (?<=\[)[^\]]*, uses a lookbehind to match all the characters inside the square brackets [], and the second, [.\w]+, matches one or more word characters or dots in the rest of the string.
You can use the following regex:
(?:\[([\w .!?]+)\]|(\w+))
The regex contains two alternatives: one to capture everything inside a pair of square brackets, and one to capture every other word.
This assumes that the part inside the square brackets doesn't contain any characters other than letters, digits, _, spaces, !, ., and ?. If you need to allow more punctuation, it should be easy enough to add it to the character class.
If you don't want to be that specific about what should be captured, you can use a negated character class instead, which specifies what not to match rather than what to match. The expression then becomes: (?:\[([^\[\]]+)\]|(\w+))
Explanation:
(?: # Begin non-capturing group
\[ # Match a literal '['
( # Start capturing group 1
[\w .!?]+ # Match everything in between '[' and ']'
) # End capturing group 1
\] # Match literal ']'
| # OR
( # Begin capturing group 2
\w+ # Match rest of the words
) # End capturing group 2
) # End non-capturing group
Demo
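With two capturing groups, each match fills either group 1 or group 2, so the results still have to be merged. One possible way to get a single flat list is PCRE's branch-reset group (?|...), in which both alternatives share group number 1. The sketch below uses the negated-class variant and, as an assumption beyond the answer, matches the outer words as runs of non-whitespace, non-bracket characters so that the final period of "tired." is kept.

```php
<?php
$data = "Code is fun [and good for the brain.] But the [brain is] tired.";

// (?| ... ) is a branch-reset group: the capture in each alternative is group 1.
//   \[([^\[\]]+)\]   text between square brackets, without the brackets
//   ([^\s\[\]]+)     any other run of non-whitespace, non-bracket characters
preg_match_all('~(?|\[([^\[\]]+)\]|([^\s\[\]]+))~', $data, $matches);

$tokens = $matches[1];
print_r($tokens);
```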
I have a string that looks like:
'- 10 TEST (FOO 3 TEST.BAR 213 BAZ (\HELLO) TEST'
Every other string has the same format. How can I get the values of FOO, BAR and BAZ, so that I end up with an array such as:
'FOO' => 3,
'BAR' => 213,
'BAZ' => HELLO
preg_match is your friend :)
You want to use preg_match to first grab the matches and then put them into an array. This will give you what you are looking for:
$str = '- 10 TEST (FOO 3 TEST.BAR 213 BAZ (\HELLO) TEST';
preg_match('/FOO (\d+).+BAR (\d+).+BAZ \(\\\\(\w+)\)/i', $str, $match);
$array = array(
'FOO' => $match[1],
'BAR' => $match[2],
'BAZ' => $match[3]
);
print_r($array);
This assumes, though, that the first two values are numbers and the last consists of word characters.
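A variation on the same pattern, sketched with named groups so the result array can be built without numeric indexes. It also checks the return value of preg_match, since the groups are not available when the pattern fails. The helper name is hypothetical, and the values come back as strings.

```php
<?php
// Hypothetical helper: returns ['FOO' => ..., 'BAR' => ..., 'BAZ' => ...] or null.
function extractValues(string $str): ?array
{
    $pattern = '/FOO (?<FOO>\d+).+BAR (?<BAR>\d+).+BAZ \(\\\\(?<BAZ>\w+)\)/i';
    if (preg_match($pattern, $str, $match) !== 1) {
        return null; // no match (or a regex error)
    }
    // Keep only the named groups; preg_match also stores each under a number.
    return array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(extractValues('- 10 TEST (FOO 3 TEST.BAR 213 BAZ (\HELLO) TEST'));
```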
Assuming neither the constants nor the values contain spaces,
this will work for the given example:
$str = '(CONST1 value1 X.CONST2 value2 CONST3 (\VALUE3) X...)';
preg_match('/\((\S+)\s+(\S+)\s+.*?\.(\S+)\s+(\S+)\s+(\S+)\s+\(\\\\(\S+)\)/', $str, $m);
$arr = array();
for($i=1; $i<count($m);$i+=2) {
$arr[$m[$i]] = $m[$i+1];
}
print_r($arr);
output:
Array
(
[CONST1] => value1
[CONST2] => value2
[CONST3] => VALUE3
)
explanation
\( : an open parenthesis
(\S+) : 1st group, CONST1, all chars that are not spaces
\s+ : 1 or more spaces
(\S+) : 2nd group, value1
\s+.*?\. : some spaces plus any chars plus a dot
(\S+) : 3rd group, CONST2
\s+ : 1 or more spaces
(\S+) : 4th group, value2
\s+ : 1 or more spaces
(\S+) : 5th group, CONST3
\s+ : 1 or more spaces
\( : an open parenthesis
\\\\ : backslash
(\S+) : 6th group, VALUE3
\) : a closing parenthesis