I'm new to regular expressions and am trying to extract text from a string. Each item starts with a value in brackets at the beginning of a new line and runs until the next bracketed value at the beginning of a line.
My string:
(1x) cat
dog
(2) ele(4)phant
tiger
(x) fish
bird
I need to get:
- "1x" and "cat\r\ndog"
- "2" and "ele(4)phant\r\ntiger"
- "x" and "fish\r\nbird"
My regex:
(\r\n)*(\((.*?)\))(.*)
This gets me:
Match 1
Full match 0-8 `(1x) cat`
Group 2. 0-4 `(1x)`
Group 3. 1-3 `1x`
Group 4. 4-8 ` cat`
Match 2
Full match 13-28 `(2) ele(4)phant`
Group 2. 13-16 `(2)`
Group 3. 14-15 `2`
Group 4. 16-28 ` ele(4)phant`
Match 3
Full match 35-44 `(x) fish `
Group 2. 35-38 `(x)`
Group 3. 36-37 `x`
Group 4. 38-44 ` fish `
The problem is that my regex seems to stop at the end of the line, so the strings on the following lines (dog, tiger, bird) are missing.
Do you have an idea how to also get the content of the next lines until the next match?
You may use
'~^\(([^()]*)\)(.*(?:\R(?!\([^()]*\)).*)*)~m'
See the regex demo
Details
^ - start of a line (due to m modifier, ^ matches the start of a line rather than the start of the whole string)
\( - a (
([^()]*) - Group 1:
[^()]* - 0+ chars other than ( and ) (you might use your .*? here, if you do not want to overflow across lines, and want to match ( inside (...))
\) - a ) char
(.*(?:\R(?!\([^()]*\)).*)*) - Group 2:
.* - the rest of the line
(?:\R(?!\([^()]*\)).*)* - 0+ sequences of
\R(?!\([^()]*\)) - a line break not followed by a (...) substring
.* - rest of the line
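The breakdown above can be tried out with a minimal Python sketch. This is an adaptation, not the original PHP pattern: Python's re module has no \R, so \r?\n stands in for the line break, and the sample text is assumed to use \n line endings. The names PATTERN, text and results are illustrative only.

```python
import re

# Python's re has no \R, so \r?\n is used for the line break (an adaptation).
PATTERN = re.compile(
    r"^\(([^()]*)\)(.*(?:\r?\n(?!\([^()]*\)).*)*)",
    re.MULTILINE,  # ^ matches at the start of every line
)

# Sample input from the question, assumed here to use \n line endings.
text = "(1x) cat\ndog\n(2) ele(4)phant\ntiger\n(x) fish\nbird"

# Group 1 is the bracketed label; group 2 is everything up to the next
# line that starts with a (...) label. strip() drops the leading space.
results = [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(text)]

for label, body in results:
    print(repr(label), repr(body))
```

Note that (4) inside "ele(4)phant" does not start a new item, because the negative lookahead is only checked right after a line break.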
Related
I want to allow inputs like _X_C or _X_X_X with the following regex:
^(\_X\_C|\_X\_L|\_L?)((\_X){0,3})$
The following should be allowed only once:
_X_C
_X_L
_L
or ...
_X_X_X (0 to three times)
The only thing that does not work is allowing "_X_X_X" or even "_X".
What did I do wrong?
You may use either of the following equivalent patterns (the second is just shorter):
^(?:_X_C|_X_L|_L?|(?:_X){0,3})$
^(?:_X_[CL]|_L?|(?:_X){0,3})$
See the regex demo.
Details:
^ - start of string
(?: - start of a non-capturing group:
_X_[CL]| - _X_ and then C or L, or
_L?| - a _ and then an optional L, or
(?:_X){0,3} - zero, one, two or three occurrences of _X substring
) - end of the group
$ - end of string.
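As a quick check of the alternation described above, here is a small Python sketch. The helper name is_allowed and the two sample lists are made up for illustration; only the pattern comes from the answer.

```python
import re

# The shorter pattern from the answer: one alternative must span the whole input.
PATTERN = re.compile(r"^(?:_X_[CL]|_L?|(?:_X){0,3})$")

def is_allowed(s: str) -> bool:
    """Return True if the whole string matches one of the alternatives."""
    return PATTERN.match(s) is not None

# Hypothetical sample inputs chosen to exercise each alternative.
accepted = ["_X_C", "_X_L", "_L", "_", "", "_X", "_X_X", "_X_X_X"]
rejected = ["_X_X_X_X", "_X_C_X", "_L_L", "X"]

for s in accepted + rejected:
    print(repr(s), is_allowed(s))
```

The empty string and a lone "_" are accepted because {0,3} allows zero repetitions and the L in _L? is optional. If that is not wanted, the quantifiers would need tightening.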
I have a regular expression like this:-
^
(\+?(?![0])\d{1,2})?
(00\d{2})?
([0-9]{10,10})
$
My test data is as follows:-
1. +447531234123 - pass
2. 447531234123 - pass
3. 00447531234123 - pass
4. 07531234123 - fail
5. 7531234123 - match
1-4 are all correct. #5 is incorrect. I'd like to make all numbers fail if they aren't preceded by +44, 44 or 0044. So, if neither of the first two groups matches, the third should fail.
Looks like you are after:
^(?:\+|00)?44\d{10}$
See the online demo
^ - Start line anchor.
(?: - Open non-capture group:
\+|00 - A literal plus or a double zero.
)? - Close non-capture group and make it optional.
44\d{10} - Literally 44 followed by 10 digits.
$ - End line anchor.
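To see how this pattern treats the five test numbers from the question, here is a small Python sketch. The helper name is_uk_number is invented for the example.

```python
import re

# Optional "+" or "00" prefix, then the literal country code 44 and ten digits.
PATTERN = re.compile(r"^(?:\+|00)?44\d{10}$")

def is_uk_number(number: str) -> bool:
    """Return True if the number carries a +44, 44 or 0044 prefix."""
    return PATTERN.match(number) is not None

# The five test numbers from the question.
samples = [
    "+447531234123",
    "447531234123",
    "00447531234123",
    "07531234123",
    "7531234123",
]

for number in samples:
    print(number, "pass" if is_uk_number(number) else "fail")
```

The first three numbers pass, and the last two fail because the 44 is mandatory in the pattern.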
Edit:
For all country codes, rather than the hard-coded 44 please use:-
/^(?:\+|00)?([1-9]){2}?\d{9,10}$/
Example text:
data=1 type=old
data=2 type=test (2)
type=test data=3 (3)
I need to get the data id from lines 2 and 3.
My code:
(data=([\d]+)|type=test)\s+(?!\1)((?1))
but it doesn't get data=3.
You need the g (global) and m (multiline) flags in your regex:
/(data=([\d]+)|type=test)\s+(?!\1)((?1))/gm
In the most simple form you may use
^(?=.*type=test).*data=(\d+)
See the regex demo
You may add word/whitespace boundaries later if necessary, e.g.
^(?=.*\btype=test\b).*\bdata=(\d+)\b
^(?=.*(?<!\S)type=test(?!\S)).*(?<!\S)data=(\d+)(?!\S)
The point is
^ - start of string (or of each line when the m flag is set)
(?=.*type=test) - there must be type=test after any 0+ chars as many as possible to the right of the current position
.* - any 0+ chars other than line break chars as many as possible
data= - a string
(\d+) - Group 1: 1+ digits
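The simplest form can be exercised with a short Python sketch. It assumes the three sample lines are one string joined with \n and adds the multiline flag so that ^ anchors at every line. The names text and data_ids are illustrative.

```python
import re

# Lookahead requires type=test somewhere on the line; group 1 captures the id.
PATTERN = re.compile(r"^(?=.*type=test).*data=(\d+)", re.MULTILINE)

# The sample lines from the question, joined into a single string.
text = "data=1 type=old\ndata=2 type=test (2)\ntype=test data=3 (3)"

# With a single capturing group, findall returns the captured ids directly.
data_ids = PATTERN.findall(text)
print(data_ids)
```

Because the lookahead runs from the start of the line, the order of data= and type=test on the line does not matter.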
I'm working on a regex to match the example multi-line input below. I have tried this pattern, but it is not working:
(^[A].*\n.*\nZ.*)[^A]
What regex can I use to select the text BETWEEN two delimiters? The markers A and Z are case-sensitive and must appear at the start of a line.
Start Marker=A
Stop Marker=Z
---INPUT---
AThis is the first linea AA
Csecond line - today is a good day
ZC is the delimeter
AZThis A is the fourth line
Bravo
Delta blah blah's test
Echo test test
Z The end of the second match
AAnother match here - the third one
CZharlie test--
Omega test
Zend of the third match...
------------
---EXPECTED MATCHES-----
[1]
AThis is the first linea AA
Csecond line - today is a good day
ZC is the delimeter
[2]
AZThis A is the fourth line
Bravo
Delta blah blah's test
Echo test test
Z The end of the second match
[3]
AAnother match here - the third one
CZharlie test--
Omega test
Zend of the third match...
------------------------
Can anyone help me figure out the correct pattern?
With the multiline and dotall modifiers activated, you could use this expression:
/^A.*?^Z.*?$/ms
Key points:
The caret used with the multiline (m) modifier ensures A or Z match only at the beginning of a line.
The dotall (s) modifier makes . match all characters, including newlines.
The non-greedy repetition (*?) keeps the match from going too far.
$ at the end allows capturing the whole line instead of just the character Z.
Demo
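The lazy multiline/dotall expression can be checked against the question's input with a minimal Python sketch. re.MULTILINE and re.DOTALL play the role of the m and s modifiers; the names lines, text and blocks are illustrative.

```python
import re

# ^A ... ^Z with lazy dots; re.M anchors ^ and $ at line boundaries,
# re.S lets the dot cross newlines.
PATTERN = re.compile(r"^A.*?^Z.*?$", re.MULTILINE | re.DOTALL)

# The input from the question, one list entry per line.
lines = [
    "AThis is the first linea AA",
    "Csecond line - today is a good day",
    "ZC is the delimeter",
    "AZThis A is the fourth line",
    "Bravo",
    "Delta blah blah's test",
    "Echo test test",
    "Z The end of the second match",
    "AAnother match here - the third one",
    "CZharlie test--",
    "Omega test",
    "Zend of the third match...",
]
text = "\n".join(lines)

# Each block runs from a line starting with A to the end of the
# next line starting with Z.
blocks = [m.group(0) for m in PATTERN.finditer(text)]

for i, block in enumerate(blocks, start=1):
    print(f"[{i}]")
    print(block)
```

A Z that is not at the start of a line, as in "AZThis" or "CZharlie", does not end a block, because ^Z only matches at a line start.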
Edit: Admittedly, with two lazy stars, catastrophic backtracking is never far away. Let's build something stronger.
Add more contrast, be explicit, and be possessive!
I ended up with /^A(?>[^Z]*(?>(?!^)Z[^Z]*)*)^Z.*$/gm.
Let's break it down:
^A # Match starts with an A at the beginning of a line.
(?> # Matches the following atomically and never backtracks into it!
[^Z]* # Matches any non Z.
(?> # Nested atomic group.
(?!^)Z # Matches a Z if not at the beginning of a line.
[^Z]* # Matches any non Z.
)* # Repeat this atomic group as much as possible.
) # End of the atomic group.
^Z # Matches a line beginning Z.
.*$ # Matches any character until end of the line.
Note that we removed the single-line (s) flag, which isn't needed anymore.
I'm working through a bunch of text in which I'm looking for the following strings:
INT.
EXT.
INT./EXT.
EXT./INT.
The text under analysis is, for instance,
17 INT. BLOOM HOUSE - NIGHT 17
27 INT./EXT. BLOOM HOUSE - (PRESENT) DAY 27
Calls in PHP to, for instance,
preg_match("/^\w.*(INT\.\/EXT\.|EXT\.\/INT\.|EXT\.|INT\.)(.*)$/", $a_line, $matches);
and variants of that don't quite handle the greediness right (or so I think, anyway), and something gets left out, usually the INT./EXT. or EXT./INT. items. Any advice? Thanks!
True, you need to use lazy dot matching with \w.*?, but you can also optimize the pattern to shorten the alternation group like this:
/^\w.*?(INT\.(?:\/EXT\.)?|EXT\.(?:\/INT\.)?)(.*)$/
See the regex demo
Also, if you are processing the text as a whole, you will need the /m multiline modifier.
Details:
^ - start of a string
\w - a word char
.*? - any 0+ chars other than line break chars as few as possible up to the first
(INT\.(?:\/EXT\.)?|EXT\.(?:\/INT\.)?) - Group 1 capturing either:
INT\.(?:\/EXT\.)? - INT. followed with optional /EXT. substring
| - or
EXT\.(?:\/INT\.)? - EXT. followed with optional /INT. substring
(.*) - Group 2: any 0+ chars other than line break chars up to the...
$ - end of string.
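To make the greedy versus lazy difference concrete, here is a small Python sketch run on the two sample lines. The question uses PHP's preg_match; this is only a Python adaptation of the same patterns, and the names GREEDY, LAZY and heading are illustrative.

```python
import re

# Original pattern from the question: greedy \w.* before the alternation.
GREEDY = re.compile(r"^\w.*(INT\./EXT\.|EXT\./INT\.|EXT\.|INT\.)(.*)$")
# Pattern from the answer: lazy \w.*? and a shortened alternation.
LAZY = re.compile(r"^\w.*?(INT\.(?:/EXT\.)?|EXT\.(?:/INT\.)?)(.*)$")

# The two sample lines from the question.
lines = [
    "17 INT. BLOOM HOUSE - NIGHT 17",
    "27 INT./EXT. BLOOM HOUSE - (PRESENT) DAY 27",
]

def heading(pattern: re.Pattern, line: str) -> str:
    """Return what group 1 captured for a line, or '' if nothing matched."""
    m = pattern.match(line)
    return m.group(1) if m else ""

for line in lines:
    print(repr(heading(GREEDY, line)), repr(heading(LAZY, line)))
```

On the second line the greedy version backtracks from the end of the line and settles on the rightmost alternative that fits, which is just "EXT.", while the lazy version stops at the first "INT." and then takes the optional "/EXT." as well.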