Remove text outside of [] {} () bracket - php

I have a string, from which I want to keep text inside a pair of brackets and remove everything outside of the brackets:
Hello [123] {45} world (67)
Hello There (8) [9] {0}
Desired output:
[123] {45} (67) (8) [9] {0}
Code tried but fails:
$re = '/[^()]*+(\((?:[^()]++|(?1))*\))[^()]*+/';
$text = preg_replace($re, '$1', $text);

If the values in the string are always an opening bracket paired up with a closing bracket and no nested parts, you can match all the bracket pairs which you want to keep, and match all other character except the brackets that you want to remove.
(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}]+
Explanation
(?: Non capture gorup
\[[^][]*] Match from [...]
| Or
\([^()]*\) Match from (...)
| Or
{[^{}]*} Match from {...}
) Close non capture group
(*SKIP)(*F)| consume characters that you want to avoid, and that must not be a part of the match result
[^][(){}]+ Match 1+ times any char other than 1 of the listed
Regex demo | Php demo
Example code
$re = '/(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}]+/m';
$str = 'Hello [123] {45} world (67)
Hello There (8) [9] {0}';
$result = preg_replace($re, '', $str);
echo $result;
Output
[123]{45}(67)(8)[9]{0}
If you want to remove all other values:
(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|.
Regex demo

Looks like you wanted to target nested stuff as well. There are already questions about how to match balanced parenthesis. Adjust one of those patterns to fit your needs, e.g. something like
$pattern = '/\((?:[^)(]*(?R)?)*+\)|\{(?:[^}{]*+(?R)?)*\}|\[(?:[^][]*+(?R)?)*\]/';
You can try this on Regex101. Extract those with preg_match_all and implode the matches.
if(preg_match_all($pattern, $str, $out) > 0)
echo implode(' ', $out[0]);
If you need to match the stuff outside, even with this pattern you can use (*SKIP)(*F) that also used #Thefourthbird in his elaborately answer! For skipping the bracketed see this other demo.

If the brackets are not nested, the following should suffice:
[^[{(\]})]+(?=[[{(]|$)
Demo.
Breakdown:
[^[{(\]})]+ # Match one or more characters except for opening/closing bracket chars.
(?=[[{(]|$) # A positive Lookahead to ensure that the match is either followed by
# an opening bracket char or is at the end of the string.

Related

simple pattern with preg_match_ALL work fine!, how to use with preg_replace?

thanks by your help.
my target is use preg_replace + pattern for remove very sample strings.
then only using preg_replace in this string or others, I need remove ANY content into <tag and next symbol >, the pattern is so simple, then:
$x = '#<\w+(\s+[^>]*)>#is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
preg_match_all($x, $s, $Q);
print_r($Q[1]);
[1] => Array
(
[0] => class="td1"
[1] => class="td2"
)
work greath!
now I try remove strings using the same pattern:
$new_string = '';
$Q = preg_replace($x, "\\1$new_string", $s);
print_r($Q);
result is completely different.
what is bad in my use of preg_replace?
using only preg_replace() how I can remove this strings?
(we can use foreach(...) for remove each string, but where is the error in my code?)
my result expected when I intro this value:
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
is this output:
$Q = 'DATA<td>111</td><td>222</td>DATA';
Let's break down your RegEx, #<\w+(\s+[^>]*)>#is, and see if that helps.
# // Start delimiter
< // Literal `<` character
\w+ // One or more word-characters, a-z, A-Z, 0-9 or _
( // Start capturing group
\s+ // One or more spaces
[^>]* // Zero or more characters that are not the literal `>`
) // End capturing group
> // Literal `>` character
# // End delimiter
is // Ignore case and `.` matches all characters including newline
Given the input DATA<td class="td1">DATA this matches <td class="td1"> and captures class="td1". The difference between match and capture is very important.
When you use preg_match you'll see the entire match at index 0, and any subsequent captures at incrementing indexes.
When you use preg_replace the entire match will be replaced. You can use the captures, if you so choose, but you are replacing the match.
I'm going to say that again: whatever you pass as the replacement string will replace the entirety of the found match. If you say $1 or \\=1, you are saying replace the entire match with just the capture.
Going back to the sample after the breakdown, using $1 is the equivalent of calling:
str_replace('<td class="td1">', ' class="td1"', $string);
which you can see here: https://3v4l.org/ZkPFb
To your question "how to change [0] by $new_string", you are doing it correctly, it is your RegEx itself that is wrong. To do what you are trying to do, your pattern must capture the tag itself so that you can say "replace the HTML tag with all of the attributes with just the tag".
As one of my comments noted, this is where you'd invert the capturing. You aren't interesting in capturing the attributes, you are throwing those away. Instead, you are interested in capturing the tag itself:
$string = 'DATA<td class="td1">DATA';
$pattern = '#<(\w+)\s+[^>]*>#is';
echo preg_replace($pattern, '<$1>', $string);
Demo: https://3v4l.org/oIW7d

Find a pattern in a string

I am trying to detect a string inside the following pattern: [url('example')] in order to replace the value.
I thought of using a regex to get the strings inside the squared brackets and then another to get the text inside the parenthesis but I am not sure if that's the best way to do it.
//detect all strings inside brackets
preg_match_all("/\[([^\]]*)\]/", $text, $matches);
//loop though results to get the string inside the parenthesis
preg_match('#\((.*?)\)#', $match, $matches);
To match the string between the parenthesis, you might use a single pattern to get a match only:
\[url\(\K[^()]+(?=\)])
The pattern matches:
\[url\( Match [url(
\K Clear the current match buffer
[^()]+ Match 1+ chars other than ( and )
(?=\)]) Positive lookahead, assert )] to the right
See a regex demo.
For example
$re = "/\[url\(\K[^()]+(?=\)])/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
var_dump($match[0]);;
}
Output
string(9) "'example'"
Another option could be using a capture group. You can place the ' inside or outside the group to capture the value:
\[url\(([^()]+)\)]
See another regex demo.
For example
$re = "/\[url\(([^()]+)\)]/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
var_dump($match[1]);;
}
Output
string(9) "'example'"

Trying to create a regex in PHP that matches patterns inside a pattern

I have seen some regex examples where the string is "Test string: Group1Group2", and using preg_match_all(), matching for patterns of text that exists inside the tags.
However, what I am trying to do is a bit different, where my string is something like this:
"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"
What I want to do is match the sections such as 'variable=123' that exist inside the parenthesis.
What I have so far is this:
if( preg_match_all("/\(([^\)]*?)\)"), $string_value, $matches )
{
print_r( $matches[1] );
}
But this just captures everything that's inside the parenthesis, and doesn't match anything else.
Edit:
The desired output would be:
"variable1=123"
"variable2=743"
"variable3=535"
The output that I am getting is:
"variable1=123,variable2=743,variable3=535"
You can extract the matches you need with a single call to preg_match_all if the matches do not contain (, ) or ,:
$s = '"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"';
if (preg_match_all('~(?:\G(?!\A),|\()\K[^,]+(?=[^()]*\))~', $s, $matches)) {
print_r($matches[0]);
}
See the regex demo and a PHP demo.
Details:
(?:\G(?!\A),|\() - either end of the preceding successful match and a comma, or a ( char
\K - match reset operator that discards all text matched so far from the current overall match memory buffer
[^,]+ - one or more chars other than a comma (use [^,]* if you expect empty matches, too)
(?=[^()]*\)) - a positive lookahead that requires zero or more chars other than ( and ) and then a ) immediately to the right of the current location.
I would do this:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
$result = explode(",", $matches[1]);
If your end result is an array of key => value then you can transform it into a query string:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
parse_str(str_replace(',', '&', $matches[1]), $result);
Which yields:
Array
(
[variable1] => 123
[variable2] => 743
[variable3] => 535
)
Or replace with a newline \n and use parse_ini_string().

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex patten that will match words in a string that begin with #
Regex that solves this initial problem is '~(#\w+)~'
A second requirement of the code is that it must also ignore any matches that occur within [quote] and [/quote] tags
A couple of attempts that have failed are:
(?:[0-9]+|~(#\w+)~)(?![0-9a-z]*\[\/[a-z]+\])
/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(#\w+)~/i
Example: the following string should have an array output as displayed:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
//run regex match
preg_match_all('regex', $string, $results);
//dump results
var_dump($results[1]);
//results: array consisting of:
[1]=>"#friends"
[2]=>"#john"
[3]=>"#doe
You may use the following regex (based on another related question):
'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s'
See the regex demo. The regex accounts for nested [quote] tags.
Details
(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside capturing parentheses and then (*SKIP)(*F) make the regex engine omit the matched text:
\[quote] - a literal [quote] string
(?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
\[/quote] - a literal [/quote] string
| - or
#\w+ - a # followed with 1+ word chars.
PHP demo:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #friends [1] => #john [2] => #doe )

php preg_match_all not working

Regex is highlighting the wrong words like Hell«o» and ignoring the correct words «Hello» or Hello,
So, my problem is working fine for my javascript code, but when i try it for php it also highlighting the string, which shouldn't:
'«This is the point of sale» ';
here is my regex: https://regex101.com/r/SqCR1y/14
PHP Code:
$re = '/^(?:.*[[{(«][^\]})»\n]*|[^[{(«\n]*[\]})»].*|.*\w[[{(«].*|.*[\]})»]\w.*)$/m';
$str = '«This is the point of sale»';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
//Output
array(1) {
[0]=>
array(1) {
[0]=>
string(29) "«This is the point of sale»"
}
}
expected: empty array
jsfiddle here, which is working fine
Thanks in advance
you're not using the right pattern. try this:
$re = '/^
(?:
\([^)\n] | [^(\n]*\). |
\[[^]\n] | [^[\n]*\]. |
{[^}\n] | [^{\n]}.* |
«[^»\n] | [^«\n]*». |
.?\w[[{(«]. | .?[\]})»]\w.
)
$/mxu';
What about a string like "(not) balanced)" ? Should that be legal?
This type of pattern isn't explicit in your test input, but since none of your "good" strings are imbalanced, you could consider covering these cases by using regex recursion to match balanced bracket expressions and targeting valid strings instead of invalid ones:
$re = '/
^
(?!.*\w[{}«»\(\)\[\]]\w) //disallow brackets inside words
(?:
[^\n{}«»\(\)\[\]]| //non bracket character, OR:
( //(capture group #1, the recursive subpattern) "one of the following balanced groups":
(\((?:(?>[^\n«»\(\){}\[\]]|(?1))*)\))| //balanced paren groups
(\[(?:(?>[^\n«»\(\){}\[\]]|(?1))*)\])| //balanced bracket groups
(«(?:(?>[^\n«»\(\){}\[\]]|(?1))*)»)| //balanced chevron groups
({(?:(?>[^\n«»\(\){}\[\]]|(?1))*)}) //balanced curly bracket groups
)
)+ //repeat "non bracket character or balanced group" until end of string
$
/mxu';
The recursion takes this form:
[openbracket]([nonbracket] | [open/close pattern again via recursion])*[closebracket]
To use part of the pattern recursively you identify it via the capture group that encloses it (?N), where N is the number of the group.
*The initial negative lookahead will fail any "word boundary" violations before going into the recursive stuff
*This regex looks to be about 35% faster than the original approach, as seen here: https://regex101.com/r/MBITHe/4

Categories