I know this question has been asked many times before, and I have read most of those posts, but I still have an issue with it.
I have a string in which some parts are wrapped in [[[ and ]]]. I don't know the position of these parts, and I don't know how many times they occur.
For example:
$string = '[[[this is a string]]] and this is some other part. [[[this is another]]]and etc.';
Could somebody help me learn how to extract "this is a string" and "this is another"?
Thanks in advance.
You need to use preg_match_all(), and you also need to be sure to escape the square brackets since they are special characters.
$string = '[[[this is a string]]] and this is some other part. [[[this is another]]]and etc.';
preg_match_all('/\[\[\[([^\]]*)\]\]\]/', $string, $matches);
print_r($matches);
Regex logic:
\[\[\[([^\]]*)\]\]\]
Output:
Array
(
[0] => Array
(
[0] => [[[this is a string]]]
[1] => [[[this is another]]]
)
[1] => Array
(
[0] => this is a string
[1] => this is another
)
)
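One caveat worth noting: the negated class [^\]]* stops at the first ], so a wrapped part that itself contains a single ] would be skipped. Below is a minimal sketch of a lazy alternative, assuming the wrapped text may contain single brackets. The sample string and the variable names $strict and $lazy are made up for illustration.

```php
<?php
// Hypothetical input: the first wrapped part contains a single "]".
$string = '[[[a[0] = 1]]] and [[[b]]]';

// Negated character class: cannot cross the "]" inside "a[0] = 1".
preg_match_all('/\[\[\[([^\]]*)\]\]\]/', $string, $strict);

// Lazy dot: grows one character at a time until "]]]" follows.
// The s modifier lets the dot match newlines as well.
preg_match_all('/\[\[\[(.*?)\]\]\]/s', $string, $lazy);

print_r($strict[1]); // only "b"
print_r($lazy[1]);   // "a[0] = 1" and "b"
```

If the wrapped text never contains ], the negated-class pattern from the answer is the simpler choice.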
Here is a method using lookbehinds and lookaheads:
$string = '[[[this is a string]]] and this is some other part. [[[this is another]]]and etc.';
preg_match_all('/(?<=\[{3}).*?(?=\]{3})/', $string, $m);
print_r($m);
This outputs the following:
Array
(
[0] => Array
(
[0] => this is a string
[1] => this is another
)
)
Here is the explanation of the REGEX:
(?<= \[{3} ) .*? (?= \]{3} )
1 2 3 4 5 6 7
1. (?<= Positive lookbehind - The combination (?<= ... ) tells REGEX that whatever is in the parentheses has to appear directly before whatever we are trying to match. It will check to see if it's there, but won't include it in the matches.
2. \[{3} This says to look for an opening square brace '[', three times in a row {3}. The square brace is a special character in REGEX, so we have to escape it with a backslash \. [ becomes \[.
3. ) Closing parenthesis ) for the lookbehind (Item #1)
4. .*? This tells REGEX to match any character ., any number of times *, but as few times as possible ? (lazy matching), so it stops as soon as the next part of our regular expression can match. In this case, the next part is a lookahead for three closing square braces.
5. (?= Positive lookahead - The combination (?= ... ) tells REGEX that whatever is in the parentheses has to be directly in front (ahead) of what we are currently matching. It will check to see if it's there, but won't include it as part of our match.
6. \]{3} This looks for a closing square brace ], three times in a row {3} and, as with item #2, it must be escaped with a backslash \.
7. ) Closing parenthesis ) for the lookahead (Item #5)
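Since the question mentions not knowing where the wrapped parts sit, the same lookaround pattern can also report positions. This is a minimal sketch using the PREG_OFFSET_CAPTURE flag of preg_match_all(); the $found array is a hypothetical name introduced only for this example.

```php
<?php
$string = '[[[this is a string]]] and this is some other part. [[[this is another]]]and etc.';

// PREG_OFFSET_CAPTURE turns every match into a [text, byte offset] pair.
preg_match_all('/(?<=\[{3}).*?(?=\]{3})/', $string, $m, PREG_OFFSET_CAPTURE);

$found = [];
foreach ($m[0] as [$text, $offset]) {
    $found[] = ['text' => $text, 'offset' => $offset];
    echo "{$offset}: {$text}\n";
}
```

The offsets are byte offsets into $string, so substr($string, $offset, strlen($text)) returns the matched text again.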
I am trying to extract [[String]] with regular expression. Notice how a bracket opens [ and it needs to close ]. So you would receive the following matches:
[[String]]
[String]
String
If I use \[[^\]]+\] it will just find the first closing bracket it comes across without taking into consideration that a new one has opened in between and it needs the second close. Is this at all possible with regular expression?
Note: This type can either be String, [String] or [[String]] so you don't know upfront how many brackets there will be.
You can use the following PCRE compliant regex:
(?=((\[(?:\w++|(?2))*])|\b\w+))
Details:
(?= - start of a positive lookahead (necessary to match overlapping strings):
( - start of Capturing group 1 (it will hold the "matches"):
(\[(?:\w++|(?2))*]) - Group 2 (technical, used for recursing): [, then zero or more occurrences of one or more word chars or the whole Group 2 pattern recursed, and then a ] char
| - or
\b\w+ - a word boundary (necessary since all overlapping matches are being searched for) and one or more word chars
) - end of Group 1
) - end of the lookahead.
See the PHP demo:
$s = "[[String]]";
if (preg_match_all('~(?=((\[(?:\w++|(?2))*])|\b\w+))~', $s, $m)){
print_r($m[1]);
}
Output:
Array
(
[0] => [[String]]
[1] => [String]
[2] => String
)
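As a quick check against the note in the question that the type can be String, [String] or [[String]], here is a small sketch that runs the same pattern over all three forms. The $results array is a hypothetical name used only here.

```php
<?php
$pattern = '~(?=((\[(?:\w++|(?2))*])|\b\w+))~';

$results = [];
foreach (['String', '[String]', '[[String]]'] as $input) {
    // Group 1 holds the overlapping "matches" found by the lookahead.
    preg_match_all($pattern, $input, $m);
    $results[$input] = $m[1];
}

print_r($results);
```

Each input yields one entry per nesting level, from the outermost brackets down to the bare word.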
I want to create a regex that saves all of $text1 and $text2 in two separate arrays, where $text1 and $text2 appear in the string in the form ($text1)[$text2].
I wrote this code to parse the text between brackets:
<?php
preg_match_all("/\[[^\]]*\]/", $text, $matches);
?>
It works correctly.
And I wrote this code to parse the text between parentheses:
<?php
preg_match('/\([^\)]*\)/', $text, $match);
?>
But it only matches one pair of parentheses, not all of the parentheses in the string :(
So I have two problems:
1) How can I parse the text between all of the parentheses in the string?
2) How can I get $text1 and $text2 as I described at the top?
Please help me; I am confused about regex. If you know a good resource, please share a link. Thanks ;)
Use preg_match_all() with the following regular expression:
/(\[.+?\])(\(.+?\))/i
Details
/ # begin pattern
( # first group, brackets
\[ # literal bracket
.+? # any character, one or more times, lazily (as few as possible)
\] # literal bracket, close
) # first group, close
( # second group, parentheses
\( # literal parentheses
.+? # any character, one or more times, lazily (as few as possible)
\) # literal parentheses, close
) # second group, close
/i # end pattern
Which will save everything between brackets in one array, and everything between parentheses in another. So, in PHP:
<?php
$s = "[test1](test2) testing the regex [test3](test4)";
preg_match_all("/(\[.+?\])(\(.+?\))/i", $s, $m);
var_dump($m[1]); // bracket group
var_dump($m[2]); // parentheses group
The only reason you were failing to capture multiple ( ) wrapped substrings is because you were calling preg_match() instead of preg_match_all().
A couple of small points:
The ) inside of your negated character class didn't need to be escaped.
The closing square bracket (at the end of your pattern) doesn't need to be escaped; regex will not mistake it to mean the end of a character class.
There is no need to declare the i pattern modifier, you have no letters in your pattern to modify.
Combine your two patterns into one and bake in my small points and you have a fully refined/optimized pattern.
In case you don't know why your patterns are great, I'll explain. When you ask the regex engine to match "greedily", it can move more efficiently (take fewer steps).
By using a negated character class, you can employ greedy matching. If you only use . then you have to use "lazy" matching (*?) to ensure that matching doesn't "go too far".
Pattern: ~\(([^)]*)\)\[([^\]]*)]~ (11 steps)
The above will capture zero or more characters between the parentheses as Capture Group #1, and zero or more characters between the square brackets as Capture Group #2.
If you KNOW that your target strings will obey your strict format, you can even remove the final ] from the pattern to improve efficiency. (10 steps)
Compare this with lazy . matching. ~\((.*?)\)\[(.*?)]~ (35 steps) and that's only on your little 16-character input string. As your text increases in length (I can only imagine that you are targeting these substrings inside a much larger block of text) the performance impact will become greater.
My point is, always try to design patterns that use "greedy" quantifiers in pursuit of making the best / most efficient pattern. (further tips on improving efficiency: avoid piping (|), avoid capture groups, and avoid lookarounds whenever reasonable because they cost steps.)
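To make the "go too far" remark concrete, here is a small sketch comparing a greedy dot, a lazy dot and a negated character class on a made-up sample string. The variable names are hypothetical, and step counts are not measured here.

```php
<?php
$string = '(a)[1] (b)[2]'; // made-up sample input

// Greedy dot: runs to the end of the string, then backtracks to the LAST ")".
preg_match_all('~\((.*)\)~', $string, $greedyDot);

// Lazy dot: stops at the first ")" after each "(".
preg_match_all('~\((.*?)\)~', $string, $lazyDot);

// Negated class with a greedy quantifier: cannot run past a ")".
preg_match_all('~\(([^)]*)\)~', $string, $negated);

print_r($greedyDot[1]); // one over-long capture: "a)[1] (b"
print_r($lazyDot[1]);   // "a" and "b"
print_r($negated[1]);   // "a" and "b"
```

The lazy dot and the negated class return the same captures here; the difference the answer describes is in how much work the engine does to get there.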
Code:
$string='Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';
var_export(preg_match_all('~\(([^)]*)\)\[([^\]]*)]~',$string,$out)?array_slice($out,1):[]);
Output: (I trimmed off the fullstring matches with array_slice())
array (
0 =>
array (
0 => '11 steps',
1 => '35 steps',
),
1 =>
array (
0 => '1',
1 => '2',
),
)
Or depending on your use: (with PREG_SET_ORDER)
Code:
$string='Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';
var_export(preg_match_all('~\(([^)]*)\)\[([^\]]*)]~',$string,$out,PREG_SET_ORDER)?$out:[]);
Output:
array (
0 =>
array (
0 => '(11 steps)[1]',
1 => '11 steps',
2 => '1',
),
1 =>
array (
0 => '(35 steps)[2]',
1 => '35 steps',
2 => '2',
),
)
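If the goal is literally two separate arrays, named capture groups are another option. This is a sketch only: the group names text1 and text2 are borrowed from the question's wording and are otherwise an assumption.

```php
<?php
$string = 'Demo #1: (11 steps)[1] and Demo #2: (35 steps)[2]';

// Same pattern as above, with the two capture groups given names.
preg_match_all('~\((?<text1>[^)]*)\)\[(?<text2>[^\]]*)]~', $string, $out);

$text1 = $out['text1']; // everything found between parentheses
$text2 = $out['text2']; // everything found between square brackets

var_export($text1);
var_export($text2);
```

PHP stores each named group under both its name and its numeric index, so $out[1] and $out['text1'] hold the same values.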
I wrote this RegEx: '/\[\.{2}([^\.].+)\]/'
And it is supposed to match patterns like this: [..Class,Method,Parameter]
It works until I have a pattern like this: [..Class1,Method1,Para1][..Class2,Method2,Para2]
I tried to make the RegEx lazy by putting a ? behind the +: '/\[\.{2}([^\.].+?)\]/' but it didn't help. Any suggestions?
I believe you wanted to use [^\.]+ rather than [^\.].+. Note that .+ is a greedily quantified dot pattern and matches any 1 or more chars other than line break chars, and thus matches across both ] and [.
Match any 1 or more chars other than ] with [^]] rather than using [^\.]:
\[\.{2}([^]]+)]
Details
\[ - a [ char
\.{2} - two dot chars
([^]]+) - Group 1: one or more chars other than ] (no need to escape ] when it is the first char in a character class)
] - a closing bracket (no need to escape ] when it is outside a character class).
PHP demo:
$str = '[..Class,Method,Parameter] [..Class1,Method1,Para1][..Class2,Method2,Para2]';
preg_match_all('/\[\.{2}([^]]+)]/', $str, $matches);
print_r($matches[0]);
Results:
Array
(
[0] => [..Class,Method,Parameter]
[1] => [..Class1,Method1,Para1]
[2] => [..Class2,Method2,Para2]
)
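Once each block is isolated, Group 1 can be split on the commas. The sketch below assumes every block holds exactly three comma-separated parts, as in the examples; the $calls array and its keys are hypothetical names.

```php
<?php
$str = '[..Class1,Method1,Para1][..Class2,Method2,Para2]';
preg_match_all('/\[\.{2}([^]]+)]/', $str, $matches);

$calls = [];
foreach ($matches[1] as $inner) {
    // Assumption: exactly three comma-separated parts per block.
    [$class, $method, $param] = explode(',', $inner);
    $calls[] = compact('class', 'method', 'param');
}

print_r($calls);
```

Blocks with fewer than three parts would need a length check before the destructuring.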
I'm using preg_match_all and I want to capture the floating point numbers that do not have a letter following them.
For example
-20.4a 110b 139 31c 10.4
Desired
[0] => Array
(
[0] => 139
[1] => 10.4
)
I was able to do the opposite using this pattern:
/\d+(.\d+)?(?=[a-z])/i
which captures the numbers that are followed by a letter. But I can't figure out how to capture the numbers that have no trailing letters.
Use negative lookahead:
/\d+(\.\d+)?(?![a-z])/i
But it is not sufficient, you have to exclude also digit and dot:
/\d+(?:\.\d+)?(?![a-z\d.])/i
PHP:
$string = '-20.4a 110b 139 31c 10.4';
preg_match_all('/\d+(?:\.\d+)?(?![a-z\d.])/i', $string, $match);
print_r($match);
Output:
Array
(
[0] => Array
(
[0] => 139
[1] => 10.4
)
)
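To see why the plain negative lookahead is not sufficient, here is a small sketch running both patterns side by side on the sample input. The variable names $naive and $fixed are made up for this example.

```php
<?php
$string = '-20.4a 110b 139 31c 10.4';

// Lookahead only forbids a letter: the engine backtracks to a shorter
// number ("20" instead of "20.4") whose next character is not a letter.
preg_match_all('/\d+(?:\.\d+)?(?![a-z])/i', $string, $naive);

// Also forbidding a digit or a dot after the match closes that loophole.
preg_match_all('/\d+(?:\.\d+)?(?![a-z\d.])/i', $string, $fixed);

print_r($naive[0]); // 20, 11, 139, 3, 10.4
print_r($fixed[0]); // 139, 10.4
```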
You can use this regex with a positive lookahead:
[+-]?\b\d*\.?\d+(?=\h|$)
(?=\h|$) asserts presence of a horizontal white space or end of line after matched number.
Alternatively you can use this regex with a possessive quantifier:
[+-]?\b\d*\.?\d++(?![.a-zA-Z])
There are a few approaches one can take here.
Atomic group matching and a negative lookahead or word boundary:
(?>\d+(?:\.\d+)?)(?![a-z])
(?>\d+(?:\.\d+)?)\b
Using a negative lookahead that also denies a dot and numbers:
\d+(?:\.\d+)?(?![a-z.\d])
Positive lookahead to a space (seems to be the separator in here) or the end of string
\d+(?:\.\d+)?(?=\s|$)
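A rough way to compare these approaches is to run them all over the sample input from the question. In this sketch the labels and the $results array are made up for illustration, and the i modifier is added where a pattern uses [a-z].

```php
<?php
$string = '-20.4a 110b 139 31c 10.4';

// Hypothetical labels mapped to the patterns listed above.
$patterns = [
    'atomic + negative lookahead' => '/(?>\d+(?:\.\d+)?)(?![a-z])/i',
    'atomic + word boundary'      => '/(?>\d+(?:\.\d+)?)\b/',
    'lookahead denying dot/digit' => '/\d+(?:\.\d+)?(?![a-z.\d])/i',
    'whitespace or end of string' => '/\d+(?:\.\d+)?(?=\s|$)/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match_all($pattern, $string, $m);
    $results[$label] = $m[0];
    echo $label, ': ', implode(', ', $m[0]), "\n";
}
```

On this input all four return 139 and 10.4; they may differ on other separators, for example a number followed by a comma.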
I am trying to match words containing the following: eph gro iss
I have eph|gro|iss, which matches only the eph, gro and iss fragments in this example: new grow miss eph.
However, I need to match the whole word. For example, it should match all of miss, not just iss, and all of grow, not just gro.
Thanks
You can do it like this:
\b(\w*(eph|gro|iss)\w*)\b
How it works:
The expression is bracketed with word-boundary anchors \b, so it only matches whole words. These words must contain one of the literals eph, gro or iss somewhere, but the \w* parts allow the literals to appear anywhere within the whole word.
The important thing here is that you need to adopt some specific definition for "words". If you are OK with the regex definition that words are sequences that match [a-zA-Z0-9_]+ then you can use the above verbatim.
If your definition of word is something else, you will need to replace the \b anchors and \w classes appropriately.
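To illustrate how the definition of "word" changes the result, here is a sketch comparing the \w-based pattern with a letters-only variant. The sample string and the [a-zA-Z] variant are assumptions chosen for this example.

```php
<?php
// Made-up input with a digit and an underscore inside two of the words.
$string = 'new grow miss eph misses2 gro_up';

// \w treats letters, digits and underscore as word characters.
preg_match_all('/\b(\w*(eph|gro|iss)\w*)\b/', $string, $word);

// Letters only: a word touching a digit or underscore no longer
// ends on a \b boundary, so it is rejected as a whole.
preg_match_all('/\b([a-zA-Z]*(?:eph|gro|iss)[a-zA-Z]*)\b/', $string, $letters);

print_r($word[0]);    // grow, miss, eph, misses2, gro_up
print_r($letters[0]); // grow, miss, eph
```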
Try this:
\b([a-zA-Z]*(?:eph|gro|iss)[a-zA-Z]*)\b
Breakdown:
\b - word boundary
( - start capture
[a-zA-Z]* - zero or more letters
(?:eph|gro|iss) - your original regex, non-capturing
[a-zA-Z]* - zero or more letters
) - end capture
\b - word boundary
Example output:
php > $string = "new grow miss eph";
php > preg_match_all("/\b([a-zA-Z]*(?:eph|gro|iss)[a-zA-Z]*)\b/", $string, $matches);
php > print_r($matches);
Array
(
[0] => Array
(
[0] => grow
[1] => miss
[2] => eph
)
[1] => Array
(
[0] => grow
[1] => miss
[2] => eph
)
)