Identify the (weat)her using regexr tool - php

I am using this tool: http://regexr.com/3fvg9
I want to match text like (weat)her in the RegExr tool.
(weath)er is good. // I want to match this word
(weather is go)od. // I want to match this word
Please help me.

Since there is no way to check with a regex whether a word is "known" or not, I suggest extracting the parts you need first and then using a spelling dictionary to check whether the words are correct. It won't be 100% accurate, but it is still better than pure regex.
The expression you need to extract the parts of glued words with parentheses is
(?|([a-zA-Z0-9]+)\(([a-zA-Z\s]+)\)|\(([a-zA-Z\s]+)\)([a-zA-Z0-9]+))
See the regex demo at regex101 that supports PHP regex.
The regex matches either of two alternatives inside a branch reset group, (?|...|...), within which the capturing groups of every branch are numbered starting from the same ID:
([a-zA-Z0-9]+)\(([a-zA-Z\s]+)\) - Group 1 (([a-zA-Z0-9]+)) matches 1+ alphanumeric chars, then a ( is matched, then Group 2 (([a-zA-Z\s]+)) matches 1+ letters or whitespaces, and then a ) is matched
| - or
\(([a-zA-Z\s]+)\)([a-zA-Z0-9]+) - a ( is matched, then Group 1 (([a-zA-Z\s]+)) matches 1+ letters or whitespaces, then a ) is matched, and then Group 2 (([a-zA-Z0-9]+)) matches 1+ alphanumeric chars
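The extraction step can be sketched in Python as a rough illustration. Note the assumptions: Python's re module has no branch reset group, so this sketch swaps in a plain alternation with four numbered groups and picks whichever pair took part in the match. The name extract_glued_parts is hypothetical.

```python
import re

# Plain alternation stands in for the PCRE branch reset group (?|...),
# which Python's re module does not support. Groups 1-2 belong to the
# first branch, groups 3-4 to the second.
GLUED = re.compile(
    r"([a-zA-Z0-9]+)\(([a-zA-Z\s]+)\)"    # e.g. wea(ther)
    r"|\(([a-zA-Z\s]+)\)([a-zA-Z0-9]+)"   # e.g. (weat)her
)


def extract_glued_parts(text):
    """Return (first_part, second_part) pairs in the order they appear."""
    pairs = []
    for m in GLUED.finditer(text):
        if m.group(1) is not None:
            pairs.append((m.group(1), m.group(2)))
        else:
            pairs.append((m.group(3), m.group(4)))
    return pairs


print(extract_glued_parts("(weath)er is good."))
print(extract_glued_parts("(weather is go)od."))
```

Joining each pair gives the candidate word or phrase that could then be looked up in whatever spelling dictionary you choose.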

Related

How to Retrieve Overlapping Matches with Complex Regex and Preg_Match_All in PHP

I have read the following questions, which have some overlap (pun intended!) with the issue I am facing:
preg_match_all how to get *all* combinations? Even overlapping ones
Overlapping matches with preg_match_all and pattern ending with repeated character
However, I don’t really know how to apply their answers to my issue which is a little more complicated.
My regex that I use with preg_match_all():
/.{240}[^\[]Order[^ ][^\(].{9}/u
With the following string:
56A.  Subject to the provisions of this Act, any decision of the Court or the Appeal Board shall be final and conclusive, and no decision or order of the Court or the Appeal Board shall be challenged, appealed against, reviewed, quashed or called into question in any court and shall not be subject to any Quashing Order, Prohibiting Order, Mandatory Order or injunction in any court on any account.[20/99; 42/2005]
I intended it to match exactly 3 times. The first match has “Quashing Order” 9 characters before the end. The second match has “Prohibiting Order” 9 characters before the end. The third match has “Mandatory Order” 9 characters before the end.
However, as expected it’s only matching the first one, as the expected matches are overlapping.
Applying what I read in the other posts, I tried this:
(?=(.{240}[^\[]Order[^ ][^\(].{9}))
I still don’t get what I need.
How do I solve this?
You can use
\w+\s+Order\b
See the regex demo.
Regex details
\w+ - one or more word chars
\s+ - 1 or more whitespaces
Order\b - a whole word Order, as \b is a word boundary.
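As a quick check of this pattern, here is a small sketch using Python's re module (an assumption for illustration; the question itself uses PHP's preg_match_all). The excerpt variable is a shortened copy of the string from the question.

```python
import re

excerpt = (
    "shall not be subject to any Quashing Order, Prohibiting Order, "
    "Mandatory Order or injunction in any court"
)

# \w+     : the word in front of "Order"
# \s+     : the whitespace between them
# Order\b : "Order" as a whole word
order_names = re.findall(r"\w+\s+Order\b", excerpt)
print(order_names)
```

Because each match only consumes the two words, the matches no longer overlap, so no lookaround is needed for this variant.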
You will need to use a positive look-behind assertion for .{240}, just like the answer you found suggests using a positive look-ahead assertion for .{9}:
/(?<=.{240})[^\[]Order[^ ][^\(](?=.{9})/u
This regex matches your string only twice because of [^ ], as @bobblebubble said: the third Order is followed by a space. Adjust that part as necessary.
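To see why moving the context into lookarounds makes overlapping matches possible, here is a minimal sketch in Python's re module (an assumption; the question uses PHP). The window sizes 3 and 12 are shortened stand-ins for the 240 and 9 in the question, picked only so that the windows overlap in a short string.

```python
import re

text = "Quashing Order, Prohibiting Order, Mandatory Order or injunction"

# Consuming pattern: each match swallows its surrounding context, so the
# window around the second "Order" overlaps the first match and is skipped.
plain = re.findall(r".{3}Order.{12}", text)

# Zero-width lookahead with a capture group: nothing is consumed, so the
# engine retries at every position and can report overlapping windows.
overlapping = re.findall(r"(?=(.{3}Order.{12}))", text)

print(len(plain), len(overlapping))
for window in overlapping:
    print(repr(window))
```

The same idea applies to the look-behind version: whatever sits inside a lookaround is checked but not consumed, so it stays available for the next match.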

Regex lookahead not working as expected (as I learned about it) [duplicate]

I'm creating a regex that searches for a text, but only if there isn't a dash after the match. I'm using a lookahead for this:
Regex: Text[\s\.][0-9]*(?!-)
Input       Expected result   Result
-----       ---------------   -------
Text 11     Text 11           Text 11
Text 52-    <No Match>        Text 5
Test case: https://regex101.com/r/doklxc/1/
The lookahead only seems to be matching with the previous character, which leaves me with Text 5, while I need it to not return a match at all.
I'm checking the https://www.regular-expressions.info/ guides and tried using groups, but I can't wrap my head around this one.
How can I make it so the lookahead affects the entire preceding match?
I'm using the default .NET Text.RegularExpressions library.
The [0-9]* backtracks and lets the regex engine find a match even if there is a -.
There are two ways: either use atomic groups or check for a digit in the lookahead:
Text[\s.][0-9]*(?![-\d])
Or
Text(?>[\s.][0-9]*)(?!-)
See the regex demo #1 and the regex demo #2.
Details
Text[\s.][0-9]*(?![-\d]) matches Text, then a dot or a whitespace, then 0 or more digits, and then checks whether there is a - or a digit immediately to the right; if there is, the match fails. Even when the engine backtracks and tries to match fewer digits than it grabbed before, the \d in the lookahead fails those attempts.
Text(?>[\s.][0-9]*)(?!-) matches Text, then enters an atomic group: once the patterns inside the group have matched, the engine cannot backtrack into it. (?!-) is therefore checked only once, after [0-9]* has grabbed all the digits it can.
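The backtracking behaviour is easy to reproduce. The sketch below uses Python's re module as a stand-in for .NET (an assumption; both engines backtrack the same way for these patterns). Only the lookahead fix is shown, because Python supports the atomic group syntax (?>...) only from version 3.11.

```python
import re

# Pattern from the question: [0-9]* can give back a digit, after which
# (?!-) succeeds because the next character is a digit, not a dash.
original = re.compile(r"Text[\s.][0-9]*(?!-)")

# Fix: also reject a digit in the lookahead, so giving back digits never helps.
fixed = re.compile(r"Text[\s.][0-9]*(?![-\d])")

for sample in ("Text 11", "Text 52-"):
    before = original.search(sample)
    after = fixed.search(sample)
    print(
        repr(sample),
        "original:", before.group() if before else None,
        "fixed:", after.group() if after else None,
    )
```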

PCRE(php) Is it possible to check if sequence of numbers contains only unique number for that sequence?

Assuming I have a set of numbers (from 1 to 22) divided by some trivial delimiters (comma, point, space, etc). I need to make sure that this set of numbers does not contain any repetition of the same number. Examples:
1,14,22,3 // good
1,12,12,3 // not good
Is it possible to do via regular expression?
I know it's easy to do using just PHP, but I really wonder how to make it work with regex.
Yes, you can achieve this with a regex by using a negative lookahead.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$
(?!.*\b(\d+)\b.*\b\1\b) The negative lookahead at the start asserts that no number is repeated anywhere in the string. \b(\d+)\b.*\b\1\b matches a repeated number.
\d+ matches one or more digits.
(?:,\d+)+ One or more occurrences of a comma followed by one or more digits.
$ Asserts that we are at the end.
DEMO
OR
Regex for the numbers separated by space, dot, comma as delimiters.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$
(?:([.\s,])\d+) The capturing group inside this non-capturing group lets us check that all following delimiters are of the same type, i.e. the above regex won't match strings like 2,3 5.6
DEMO
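Both patterns from this answer can be tried out with a short sketch. Python's re module is used here as an assumption (the question is about PCRE in PHP); these particular patterns use only syntax that both flavours share. The constant names are hypothetical.

```python
import re

# Comma-separated numbers with no number repeated.
UNIQUE_COMMA = re.compile(r"^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$")

# Space, dot or comma as delimiter; \2 forces the same delimiter throughout.
UNIQUE_SAME_DELIM = re.compile(
    r"^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$"
)

for s in ("1,14,22,3", "1,12,12,3"):
    print(repr(s), bool(UNIQUE_COMMA.match(s)))

for s in ("2 3 5", "2,3 5.6", "2.3.2"):
    print(repr(s), bool(UNIQUE_SAME_DELIM.match(s)))
```

Note that neither pattern restricts the numbers to the 1 to 22 range mentioned in the question; they only check uniqueness and delimiters.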
You can use this regex:
^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$
Negative lookahead (?!.*?(\b\d+)\W+\1\b) avoids the match when 2 similar numbers appear one after another separated by 1 or more non-word characters.
RegEx Demo
Here is the solution that fit my current need:
^(?>(?!\2\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\2\b)(1\d{1}|2[0-2]{1}|\d{1}+))$
It matches sequences of unique numbers separated by one or more delimiters, restricts each number to the 1 to 22 range, and allows only 3 numbers in the sequence.
See working example
It's not perfect yet, but it works fine! Thanks a lot to everyone who gave me a hand with this!

Regex matching multiple pattern

I have these
name
name[one]
name[one][two]
name[one][two][three]
I want to be able to match them like this:
[name]
[name, one]
[name, one, two]
[name, one, two, three]
Here's my regex I've tried:
/([\w]+)(?:(?:\[([\w]+)\])+)?/
I just can't quite seem to get it right; it only captures the last pair of square brackets.
You can't have a dynamic number of captures; the number of captures is exactly equal to the number of capture parenthesis pairs ((?:...) don't count). You have two capture parenthesis pairs, that means you get two captures - no more, no less.
To handle variable number of matches, use submatches (in a replace with a function, if your language supports that), or split.
You haven't labeled with a programming language, so this is as specific as I can go.
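Since no language was given, here is a sketch in Python (an arbitrary choice for illustration). It first shows a repeated capture group keeping only its last iteration, then collects a variable number of parts with a global search. The helper name split_keys is hypothetical.

```python
import re

s = "name[one][two][three]"

# A repeated capturing group keeps only the text of its last iteration.
m = re.match(r"(\w+)(?:\[(\w+)\])*", s)
print(m.groups())


def split_keys(value):
    """Validate the overall shape, then collect every word-like part."""
    if not re.fullmatch(r"\w+(?:\[\w+\])*", value):
        return None
    return re.findall(r"\w+", value)


print(split_keys("name"))
print(split_keys(s))
```

Validating first and extracting second keeps the extraction pattern trivial and still rejects malformed input such as name[one.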
This should do ([\w]+)(?:\[([\w]+)\]\+)?
http://regex101.com/r/mF8pC8/3
Changes from original regex - removed extra capture and added \ before last +.
1st Capturing group ([\w]+)
[\w]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
(?:\[([\w]+)\]\+)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
\[ matches the character [ literally
2nd Capturing group ([\w]+)
[\w]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
\] matches the character ] literally
\+ matches the character + literally
g modifier: global. All matches (don't return on first match)
You can't repeat groups in a regex. You can write them out a number of times though. This works for up to three groups in square brackets. You can add more if you like.
(\w+)\[(\w+)\](?:\[(\w+)\])?(?:\[(\w+)\])?
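A minimal sketch of this written-out approach, again in Python as an assumed language; the helper name keys is hypothetical. Optional groups that did not take part in the match come back as None and are dropped.

```python
import re

# One required bracket pair plus two optional ones, as in the pattern above.
UP_TO_THREE = re.compile(r"(\w+)\[(\w+)\](?:\[(\w+)\])?(?:\[(\w+)\])?")


def keys(value):
    m = UP_TO_THREE.fullmatch(value)
    if m is None:
        return None
    # Unused optional groups are None; keep only the ones that matched.
    return [g for g in m.groups() if g is not None]


for sample in ("name[one]", "name[one][two][three]", "name", "name[a][b][c][d]"):
    print(repr(sample), keys(sample))
```

As written, the first bracket pair is not optional, so a bare name is rejected; wrapping it in (?:...)? as well would accept that case.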
You can not have dynamic number of captures with php regexp...
Why not just write something like explode('[', strtr('name[one][two][three]', [']' => ''])) ? It will give you the desired result.

Regex Lookahead (PHP)

I have a quick question about regex for PHP.
My code:
^(\d{0,4}?)\.(?=(\d{1,2}))$
doesn't seem to work, where it's supposed to capture an optional group of up to 4 digits, then look ahead and conditionally capture a period based on if it captures a group of 1-2 digits.
Does anyone know why this doesn't work?
That's not the right way to do it - nothing about your regex indicates that the . is optional.
Try:
^(\d{0,4})(?:\.(\d{1,2}))?$
This will match up to four digits, which may optionally be followed by a dot, then one or two digits. In any case, the two subpatterns will contain the groups of digits.
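A quick way to check which inputs the corrected pattern accepts, sketched with Python's re module (an assumption; the question uses PHP, but the pattern uses only syntax common to both):

```python
import re

NUMBER = re.compile(r"^(\d{0,4})(?:\.(\d{1,2}))?$")

for s in ("1234.56", "12", ".5", "12.", "12345"):
    m = NUMBER.match(s)
    # groups() is (integer_part, fraction_part); the fraction is None
    # when the optional group did not take part in the match.
    print(repr(s), m.groups() if m else None)
```

A trailing dot without digits is rejected, because the dot and the digits sit in the same optional group.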
