Regular Expressions: get what is outside of the brackets - php

I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching Stack Overflow, I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!

You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This leaves the surrounding spaces in the output. If you don't want them, you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string wherever the pattern matches. The pattern here is a [ followed by anything followed by a ]. The regex to match anything is .*. Because [ and ] are regex metacharacters that delimit a character class, we need to escape them to match them literally, giving \[.*\]. By default .* is greedy and will try to match as much as possible; in this case it would match abc] middle [xyz. To avoid this we make it non-greedy by appending a ?, giving \[.*?\]. Since our definition of "anything" here actually means anything other than ], we can also use \[[^]]*?\]
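The greedy, non-greedy, and negated-class variants described above can be compared side by side. This is only a minimal sketch; the variable names are illustrative and not part of the original answer.

```php
<?php
$input = 'first [abc] middle [xyz] last';

// Greedy: .* runs to the LAST ], so everything from "[abc" to "xyz]" is one delimiter.
$greedy = preg_split('/\[.*\]/', $input);

// Non-greedy: .*? stops at the first ], so each bracket pair is its own delimiter.
$lazy = preg_split('/\[.*?\]/', $input);

// Negated class: "any character except ]" yields the same split as the non-greedy form.
$negated = preg_split('/\[[^\]]*\]/', $input);

// Same as above, but the whitespace around each bracket pair is part of the delimiter.
$trimmed = preg_split('/\s*\[[^\]]*\]\s*/', $input);

print_r($greedy);
print_r($lazy);
print_r($trimmed);
```

The greedy form returns only two pieces, which is why the answer switches to the non-greedy or negated-class form.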
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ].

$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)
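If the inside and outside pieces need to be told apart afterwards, one option is to give each alternative its own capture group. The sketch below assumes that approach; the function name splitBrackets and the trimming of outside text are illustrative choices, not something the answer above specifies.

```php
<?php
// Hypothetical helper: sorts the pieces of $text into "inside" and "outside" lists.
function splitBrackets(string $text): array
{
    $result = ['inside' => [], 'outside' => []];

    // Group 1: content of a [...] pair. Group 2: a run of text with no brackets.
    preg_match_all('~\[([^\[\]]*)\]|([^\[\]]+)~', $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        if (($m[2] ?? '') !== '') {
            // Group 2 matched, so this piece is outside the brackets.
            $piece = trim($m[2]);
            if ($piece !== '') {
                $result['outside'][] = $piece;
            }
        } else {
            // Otherwise group 1 matched: this is the text between [ and ].
            $result['inside'][] = $m[1];
        }
    }

    return $result;
}

print_r(splitBrackets('first [abc] middle [xyz] last'));
```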

Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
ideone

Everyone says that you should use preg_split, but only one person replied with an expression that meets your needs, and I thought it was a little too verbose, although he has since updated his answer to address that.
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
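and you stated you wanted what's inside and outside the brackets, so an update would be:
/[\[.*?\]]/
Note that the outer [ ] turn this into a character class, so it splits on any single [, ], ., * or ? character (the narrower /[\[\]]/ splits on the brackets only). This gives you: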
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
but as you can see, it's capturing whitespace as well, so let's go a step further and get rid of that:
/[\s]*[\[.*?\]][\s]*/
This will give you the desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
I think this is the expression you're looking for.
Here is a LIVE Demonstration of the above Regex

Related

regex, anything but phrase a or phrase b, as part of a larger expression

I'm trying to build a function to drop vars in a string
for example
Hi, My name is {!first_name!}
and my family name is {!last_name!},
To
sum it up, my name is {!full_name!}.
I am a {!job_title!}.
To use my
function just write your vars like this #{!code!}
and my regular expression
/(#?)(\{\!\s*([^\{\!\!\}]*)\s*\!\})/Uis
my problem is that if I have more than one character as the start or end of the var, then [^x] does not work as expected
so how can I have an expression like this
/(#?)(\{\!\s*(**anything but {! or !}**)\s*\!\})/Uis
Or maybe there is a better approach altogether.
Thank you.
edit:
Here is my full function click here
as it is now, it works, but if I want to do something like this.
echo dropVars($str, $vars, ['{!','!}']);
It will fail (actually it will not, but I hope you get my point)
You may use
'~(#?)({!\s*((?:(?!{!|!}).)*?)\s*!})~is'
See the regex demo
Details
(#?) - Group 1: an optional # char
({!\s*((?:(?!{!|!}).)*?)\s*!}) - Group 2:
{! - a {! substring
\s* - 0+ whitespaces
((?:(?!{!|!}).)*?) - Group 3: any char, as few as possible, that does not start {! or !} substrings
\s* - 0+ whitespaces
!} - a literal !} substring.
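Since the question wants the delimiters to be configurable, the same pattern can be assembled from arbitrary opening and closing strings. This is a sketch under that assumption; buildVarPattern is a hypothetical helper name, and preg_quote is used so that delimiter characters such as { or [ are matched literally.

```php
<?php
// Hypothetical helper: builds the pattern above for any pair of multi-character delimiters.
function buildVarPattern(string $open, string $close): string
{
    // Escape regex metacharacters (and the ~ pattern delimiter) in the markers.
    $o = preg_quote($open, '~');
    $c = preg_quote($close, '~');

    // Group 1: optional #. Group 2: the whole variable. Group 3: the name,
    // matched one character at a time while neither marker starts at that position.
    return "~(#?)({$o}\\s*((?:(?!{$o}|{$c}).)*?)\\s*{$c})~is";
}

preg_match_all(buildVarPattern('{!', '!}'), 'Hi {! first_name !} and #{!code!}', $m);
print_r($m[3]); // variable names without delimiters or surrounding whitespace
```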
Try:
<?php
$input = "Hi, My name is {!first_name!}
and my family name is {!last_name!},
To sum it up, my name is {!full_name!}.
I am a {!job_title!}.
To use my function just write your vars like this #{!code!}";
preg_match_all('/#?\{\!\s*([^{!}]*)\s*\!\}/mi', $input, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => {!first_name!}
[1] => {!last_name!}
[2] => {!full_name!}
[3] => {!job_title!}
[4] => #{!code!}
)
[1] => Array
(
[0] => first_name
[1] => last_name
[2] => full_name
[3] => job_title
[4] => code
)
)
I changed your regex to be a little simpler, and I'm capturing only the text between the exclamation marks and curly brackets. I removed the 'U' and 's' flags since I didn't think they were needed. Note that the 'm' flag only changes how ^ and $ behave; this pattern has no anchors, so it matches across multiple lines with or without it.
Here's another example that replaces each template variable with a corresponding value:
<?php
$input = "Hi, My name is {!first_name!}
and my family name is {!last_name!},
To sum it up, my name is {!full_name!}.
I am a {!job_title!}.
To use my function just write your vars like this #{!code!}";
$replacement_values = [
"first_name" => "Billy",
"last_name" => "Jean",
"full_name" => "Ms. Billy Jean",
"job_title" => "Lover",
"code" => "vars",
];
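$input = preg_replace_callback(
'/#?\{\!\s*([^{!}]*)\s*\!\}/mi',
// Import the lookup table with `use` rather than `global`, so this also works inside a function.
function ($matches) use ($replacement_values) {
// Leave unknown variables untouched instead of triggering an undefined-key warning.
return $replacement_values[$matches[1]] ?? $matches[0];
},
$input
);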
echo $input;
?>
Output:
Hi, My name is Billy
and my family name is Jean,
To sum it up, my name is Ms. Billy Jean.
I am a Lover.
To use my function just write your vars like this vars

Regex to Match Passed Function/Method Parameters

I've had a good look around for a question that asked this before; alas, my search for a PHP preg_match search returned no results (maybe my searching skills fell short, I suppose justified considering it's a Regex question!).
Consider the text below:
The quick __("brown ") fox jumps __('over the') lazy __("dog")
Now currently I need to 'scan' for the given method __('') above, whereas it could include the spacing and different quotations ('|"). My best attempt after numerous 'iterations':
(__\("(.*?)"\))|(__\('(.*?)'\))
Or at its simplest form:
__\((.*?)\)
To break this down:
Anything that starts with __
Escaped ( and quotation mark " or '. Thus, \(\"
(.*?) Non-greedy match of all characters
Escaped closing " and last bracket.
| between the two expressions match either/or.
However, this only gets partial matches, and spaces are throwing off the search entirely. Apologies if this has been asked before, please link me if so!
Tester Link for the pattern provided above:
PHP Live Regex Test Tool
When the searched method string uses single quotes it will end up in another capture group than if it has double quotes. So in fact, your regular expression works (except for the spaces, see further down), but you'd have to look at a different index in your result array:
$input = 'The quick __("brown ") fox jumps __(\'over the\') lazy __("dog")';
// using your regular expression:
$res = preg_match_all("/(__\(\"(.*?)\"\))|(__\('(.*?)'\))/", $input, $matches);
print_r ($matches);
Note that you need preg_match_all instead of preg_match to get all matches.
Output:
Array
(
[0] => Array
(
[0] => __("brown ")
[1] => __('over the')
[2] => __("dog")
)
[1] => Array
(
[0] => __("brown ")
[1] =>
[2] => __("dog")
)
[2] => Array
(
[0] => brown
[1] =>
[2] => dog
)
[3] => Array
(
[0] =>
[1] => __('over the')
[2] =>
)
[4] => Array
(
[0] =>
[1] => over the
[2] =>
)
)
So, the result array has 5 elements, the first one representing the complete match, and all the others correspond to the 4 capture groups you have in your regular expression. As the capture groups for single quotes are not those of the double quotes, you'll find the matches at different places.
To "solve" this, you could use a back reference in your regular expression, which would look back to see which was the opening quote (single or double) and require the same to be repeated at the end:
$res = preg_match_all("/__\(([\"'])(.*?)\\1\)/", $input, $matches);
Note the back reference \1 (the backslash had to be escaped with another one). This refers back to the first capture group, where we have ["'] (again an escape was necessary) to match both kinds of quotes.
You also wanted to deal with spaces. On your PHP Live Regex you used a test string that had such spaces between the brackets and quotes. To deal with these so they still match the method strings correctly, the regular expression should get two additional \s*:
$res = preg_match_all("/__\(\s*([\"'])(.*?)\\1\s*\)/", $input, $matches);
Now the output is:
Array
(
[0] => Array
(
[0] => __("brown ")
[1] => __('over the')
[2] => __("dog")
)
[1] => Array
(
[0] => "
[1] => '
[2] => "
)
[2] => Array
(
[0] => brown
[1] => over the
[2] => dog
)
)
... and the text captured by the groups is now nicely arranged.
See this code run on eval.in and PHP Live Regex.
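The back-reference version can be wrapped in a small function that returns only the quoted strings. This is a sketch; the name extractTranslatable is hypothetical. One side effect of the back reference is that the other kind of quote may appear inside the string.

```php
<?php
// Hypothetical helper: returns the string passed to each __() call in $source.
function extractTranslatable(string $source): array
{
    // Group 1: the opening quote. Group 2: the text. \1 requires the same quote to close.
    preg_match_all('/__\(\s*(["\'])(.*?)\1\s*\)/', $source, $matches);

    return $matches[2];
}

$input = 'The quick __( "brown ") fox jumps __(\'over the\') lazy __("it\'s a dog")';
print_r(extractTranslatable($input));
```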
When working with stuff like this, don't forget about escaping:
<?php
ob_start();
?>
The quick __("brown ") fox jumps __( 'over the' ) lazy __("dog").
And __("everyone says \"hi\"").
<?php
$content = ob_get_clean();
$re = <<<RE
/__ \(
\s*
(?: # group both quote styles so the outer parts apply to each
" ( (?: \\\\. | [^"])+ ) "
|
' ( (?: \\\\. | [^'])+ ) '
)
\s*
\)
/x
RE;
preg_match_all($re, $content, $matches, PREG_SET_ORDER);
foreach($matches as $match)
echo end($match), "\n";
How about this:
(__(\('[^']+'\)|\("[^"]+"\)))
Instead of the non-greedy .*?, use a negated character class that matches any character except the quote: [^'] or [^"].
Enclose double and single quotes with square brackets as a character class:
$str = 'The quick __( "brown ") fox jumps __(\'over the\') lazy __("dog")';
preg_match_all("/__\(\s*([\"']).*?\\1\s*\)/ium", $str, $matches);
echo '<pre>';
var_dump($matches[0]);
// the output:
array (size=3)
0 => string '__( "brown ")'
1 => string '__('over the')'
2 => string '__("dog")'
And here is an example with the same solution on phpliveregex.com:
http://www.phpliveregex.com/p/exF
(section preg_match_all)

Using regex to not match periods between numbers

I have a regex that splits strings on [.!?], and it works, but I'm trying to extend it so that it doesn't match a . that sits between digits. Is that possible? For example:
$input = "one.two!three?4.000.";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4. [4] => 000. )
Need Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4.000. )
You should be able to split on this:
(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)
https://regex101.com/r/kQ6zO4/1
It uses lookarounds to determine where to split. It looks behind for a character in the set [.!?], as long as that punctuation isn't both preceded by a digit and followed (after any further punctuation) by a digit.
It also avoids a trailing empty element by refusing to split at the end of the string.
UPDATE:
This should be much more efficient actually:
(?!\d+\.\d+).+?[.!?]+\K(?!$)
https://regex101.com/r/eN7rS8/1
Here is another possibility using preg_split flags:
$input = "one.two!three???4.000.";
$inputX = preg_split("~(\d+\.\d+[.!?]+|.*?[.!?]+)~", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($inputX);
It includes the delimiter in the split and ignores empty matches. The regex can be simplified to ((?:\d+\.\d+|.*?)[.!?]+), but I think what is in the code sample above is more efficient.
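A quick way to check that the long and the simplified patterns behave the same on the question's input is to run both through the same call. This is a sketch; splitSentences is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper: each "sentence" is the delimiter itself, kept via DELIM_CAPTURE;
// the empty strings between consecutive delimiters are dropped via NO_EMPTY.
function splitSentences(string $input, string $pattern): array
{
    return preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

$input = 'one.two!three???4.000.';
$long  = splitSentences($input, '~(\d+\.\d+[.!?]+|.*?[.!?]+)~');
$short = splitSentences($input, '~((?:\d+\.\d+|.*?)[.!?]+)~');

print_r($long);
print_r($short);
```

Both patterns protect a number only when it starts a segment: with an input such as 'pi is 3.14 ok.', the lazy branch matches first and the string is still split after "3.".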

How to split a string on logic operators

I need to parse some user input. It comes to me in the form of clauses, e.g.:
total>=100
name="foo"
bar!="baz"
I have a list of all of the available operators (<, >, <=, !=, = etc) and was using this to build a regex pattern.
My goal is to get each clause split into 3 pieces:
$result=["total", ">=", "100"]
$result=["name", "=", "foo"]
$result=["bar", "!=", "baz"]
My pattern takes all the operators and builds something like this (condensed for length; this example only matches > and >=):
preg_split("/(?<=>)|(?=>)|(?<=>=)|(?=>=)/", $clause,3)
So a lookbehind and a lookahead for each operator. I had preg_split restrict to 3 groups in case a string contained an operator character (name="<wow>").
My regex works pretty great, however it fails terribly for any operator which includes characters in another operator. For example, >= is never split right because > is matched and split first. The same for != which is matched by =
Here's what I'm getting:
$result=["total", ">", "=100"]
$result=["bar", "!", "=baz"]
Is it possible to use regex to do what I'm attempting? I need to keep track of the operator and can't simply split the string on it (hence the lookahead/behind solution).
One possibility I considered would be to force a space or unusual character around all the operators, so that > and >= would become, say, {>} and {>=}. If the regex had to match the brackets, it wouldn't be able to match early like it does now. However, this isn't an elegant solution, and it seems like some of the regex masters here might know a better way.
Is regex the best solution or should I use string functions?
This question is somewhat similar, but I don't believe the answer's pseudocode is accurate - I couldn't get it to work well. How to manipulate and validate string containing conditions that will be evaluated by php
I'd suggest matching instead of splitting, as the result will still be an array.
^(.*?)([!<>=|]=?)(.*?)$
Here is a demo.
PHP code:
$re = "/^(.*?)([!<>=|]=?)(.*?)$/m";
$str = "total>=100\nname=\"foo\"\nbar!=\"baz\"";
preg_match_all($re, $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => total>=100
[1] => name="foo"
[2] => bar!="baz"
)
[1] => Array
(
[0] => total
[1] => name
[2] => bar
)
[2] => Array
(
[0] => >=
[1] => =
[2] => !=
)
[3] => Array
(
[0] => 100
[1] => "foo"
[2] => "baz"
)
)
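You can try this regexp (in PHP, leave out regex101's g flag, which is not a valid modifier there, and use preg_match_all instead):
/^(.*)([><!]?[=]+|[>]+|[<]+)(.*)$/mU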
I have tried it here: https://regex101.com/ with input:
xxx>"sdads"
yyy<"sadasd"
name="foo"
total>=100
total<=100
total<=100
bar!="baz"
and it matched everything in the right place
Using the regex: /([^<=>!]*)([<=>!]{1,2})(.*)/ with preg_match on each line will get you the desired result; at least for your examples, but likely much more.
One useful piece of syntax that you may not know about is the character class, [].
[...] means match any single character listed inside the brackets
[^...] means match any single character NOT listed inside the brackets
Code example
$test = 'total>=100';
$regex = '/([^<=>!]*)([<=>!]{1,2})(.*)/';
preg_match($regex, $test, $match);
print_r($match);
result:
Array
(
[0] => total>=100
[1] => total
[2] => >=
[3] => 100
)
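Because the question starts from a list of available operators, the pattern can also be generated from that list. The sketch below assumes this approach; splitClause is a hypothetical name, and the arrow functions require PHP 7.4 or later. Sorting the operators longest first means >= is tried before >, which avoids the early-match problem described in the question.

```php
<?php
// Hypothetical helper: splits "left OP right" into [left, OP, right], or returns null.
function splitClause(string $clause, array $operators): ?array
{
    // Longest operators first, so ">=" wins over ">" and "!=" over "=".
    usort($operators, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape each operator so characters like < > = ! are matched literally.
    $alternation = implode('|', array_map(fn($op) => preg_quote($op, '/'), $operators));

    // The lazy (.*?) stops at the first operator; later ones stay in the right-hand side.
    if (!preg_match('/^(.*?)(' . $alternation . ')(.*)$/s', $clause, $m)) {
        return null;
    }

    return [$m[1], $m[2], $m[3]];
}

$operators = ['<', '>', '<=', '>=', '!=', '='];
print_r(splitClause('total>=100', $operators));
print_r(splitClause('name="<wow>"', $operators));
```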

Split a string while keeping delimiters and string outside

I'm trying to do something that must be really simple, but I'm fairly new to PHP and I'm struggling with this one. What I want is to split a string containing 0, 1 or more delimiters (braces), while keeping the delimiters AND the string between AND the string outside.
ex: 'Hello {F}{N}, how are you?' would output :
Array ( [0] => Hello
[1] => {F}
[2] => {N}
[3] => , how are you? )
Here's my code so far:
$value = 'Hello {F}{N}, how are you?';
$array= preg_split('/[\{\}]/', $value,-1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($array);
which outputs (missing braces) :
Array ( [0] => Hello
[1] => F
[2] => N
[3] => , how are you? )
I also tried :
preg_match_all('/\{[^}]+\}/', $myValue, $array);
Which outputs (braces are there, but the text outside is flushed) :
Array ( [0] => {F}
[1] => {N} )
I'm pretty sure I'm on the right track with preg_split, but with the wrong regex. Can anyone help me with this? Or tell me if I'm way off?
You aren't capturing the delimiters. Add them to a capturing group:
/(\{.*?\})/
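You need parentheses around the part of the expression to be captured (and PREG_SPLIT_NO_EMPTY, as in your own attempt, to drop the empty string between adjacent delimiters such as {F}{N}):
preg_split('/(\{[^}]+\})/', $myValue, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);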
See the documentation for preg_split().
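To see what the capturing group and each flag contribute, the question's input can be split twice. This is a small sketch with illustrative variable names.

```php
<?php
$value = 'Hello {F}{N}, how are you?';

// Capturing group + DELIM_CAPTURE: the {…} delimiters are kept in the result,
// but the empty string between the adjacent {F} and {N} is kept too.
$withEmpty = preg_split('/(\{[^}]+\})/', $value, -1, PREG_SPLIT_DELIM_CAPTURE);

// Adding NO_EMPTY drops that empty element.
$parts = preg_split('/(\{[^}]+\})/', $value, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($withEmpty);
print_r($parts);
```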
