I have strings containing the following (where the numbers are integers representing the user ID):
#[calbert](3)
#[username](684684)
I figured I need the following to get the username and user id
\((.*?)\)
and
\[(.*?)\]
But is there a way to get both at once?
Also, is it possible to get only the result, without the parentheses (and without the brackets in the username case)? PHP currently returns:
Array
(
[0] => (3)
[1] => 3
)
\[([^\]]*)\]|\(([^)]*)\)
Try this; see the demo. You need the | (OR) operator for alternation: wherever the first alternative fails to match, the regex engine tries the second one, which has its own capturing group.
https://regex101.com/r/tX2bH4/31
$re = "/\\[([^\\]]*)\\]|\\(([^)]*)\\)/im";
$str = " #[calbert](3)\n #[username](684684)";
preg_match_all($re, $str, $matches);
Or you can combine your own two regexes: \((.*?)\)|\[(.*?)\]
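If the goal is to keep each name paired with its id, one option is a single pattern with two capture groups instead of alternation. This is only a sketch: it assumes every mention has the exact form #[name](digits), and $pairs is just an illustrative variable name.

```php
<?php
$str = "#[calbert](3)\n#[username](684684)";

// Group 1 captures the name, group 2 the numeric id.
// PREG_SET_ORDER yields one array per mention, so name and id stay together.
preg_match_all('/#\[([^\]]*)\]\((\d+)\)/', $str, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    $pairs[$m[1]] = (int) $m[2];
}

print_r($pairs);
```

With the alternation pattern, by contrast, the names end up in $matches[1] and the ids in $matches[2], with empty strings filling the slots that belong to the other branch.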
You can do this with positive lookbehind and lookahead assertions:
(?<=\[)[^\]]*(?=])|(?<=\()\d+(?=\))
(?<=\[) Asserts that the match must be preceded by a [ character.
[^\]]* Matches any character other than ], zero or more times.
(?=]) Asserts that the match must be followed by a ] character.
| The alternation (logical OR) operator, used to combine the two regexes.
DEMO
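As a quick check of the lookaround pattern above, the sketch below runs it on the sample string from the question. Because the brackets and parentheses are only asserted, not consumed, the bare values appear directly in $matches[0].

```php
<?php
$str = "#[calbert](3)\n#[username](684684)";

// Lookarounds assert the surrounding delimiters without including them in the match.
preg_match_all('/(?<=\[)[^\]]*(?=])|(?<=\()\d+(?=\))/', $str, $matches);

print_r($matches[0]);
```

Names and ids come back interleaved in a single flat list, so this form suits cases where the pairing does not matter.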
I am trying to extract [[String]] with a regular expression. Notice that every opening bracket [ needs a matching closing bracket ]. So you would receive the following matches:
[[String]]
[String]
String
If I use \[[^\]]+\] it just stops at the first closing bracket it comes across, without taking into account that a new bracket has opened in between and so a second closing bracket is needed. Is this at all possible with a regular expression?
Note: This type can either be String, [String] or [[String]] so you don't know upfront how many brackets there will be.
You can use the following PCRE compliant regex:
(?=((\[(?:\w++|(?2))*])|\b\w+))
See the regex demo. Details:
(?= - start of a positive lookahead (necessary to match overlapping strings):
( - start of Capturing group 1 (it will hold the "matches"):
(\[(?:\w++|(?2))*]) - Group 2 (technical, used for recursing): [, then zero or more occurrences of one or more word chars or the whole Group 2 pattern recursed, and then a ] char
| - or
\b\w+ - a word boundary (necessary since all overlapping matches are being searched for) and one or more word chars
) - end of Group 1
) - end of the lookahead.
See the PHP demo:
$s = "[[String]]";
if (preg_match_all('~(?=((\[(?:\w++|(?2))*])|\b\w+))~', $s, $m)){
print_r($m[1]);
}
Output:
Array
(
[0] => [[String]]
[1] => [String]
[2] => String
)
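To check that the pattern copes with the other nesting depths mentioned in the question, a small wrapper can be run against each form. The function name nested_forms is made up for this sketch.

```php
<?php
// Hypothetical helper: returns every bracketed form found in $s plus the bare word.
function nested_forms(string $s): array {
    // The lookahead lets overlapping matches be collected; group 1 holds each one.
    preg_match_all('~(?=((\[(?:\w++|(?2))*])|\b\w+))~', $s, $m);
    return $m[1];
}

print_r(nested_forms('String'));
print_r(nested_forms('[String]'));
print_r(nested_forms('[[String]]'));
```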
I'm looping through a string with preg_match_all, using different patterns. Sometimes these patterns look a lot like each other, but differ slightly.
Right now I'm looking for a way to stop pattern A from matching strings that only pattern B - which has a 'T' in front of the 4 digits - should match.
The problem I'm running into is that pattern A also matches pattern B:
A:
(\d{4})(A|B)?(C|D)?
... matches 1234, 1234A, 1234AD, etc.
B:
I also have another pattern:
T(\d{4})\/(\d{4})
... which matches strings like: T7878/6767
The result
When running a preg_match_all on "T7878/6767 1234AD", A will give the following matches:
7878, 6767, 1234AD
Does anyone have a suggestion how to prevent A from matching B in a string like "Some text T7878/6767 1234AD and some more text"?
Your help is greatly appreciated!
Scenario with boundaries
If you only want to match those specific strings within some boundaries, use those boundary patterns on each side of the pattern.
If you expect a whitespace boundary before each match, then add the (?<!\S) negative lookbehind at the start of the pattern. If you expect a whitespace boundary at the end of the match, add the (?!\S) negative lookahead. If there can be any chars (as is in your original question), then SKIP-FAIL is the only way (see below).
So, in this first case, you may use
(?<!\S)(\d{4})([AB]?)([CD]?)(?!\S)
and
(?<!\S)T(\d{4})\/(\d{4})(?!\S)
See Pattern 1 demo and Pattern 2 demo.
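A minimal sketch of the whitespace-boundary variant, using the sample sentence from the question. The variable names are illustrative only.

```php
<?php
$str = 'Some text T7878/6767 1234AD and some more text';

// Both patterns require whitespace (or the string edge) on each side of the match.
$patternA = '~(?<!\S)(\d{4})([AB]?)([CD]?)(?!\S)~';
$patternB = '~(?<!\S)T(\d{4})/(\d{4})(?!\S)~';

preg_match_all($patternA, $str, $matchesA);
preg_match_all($patternB, $str, $matchesB);

print_r($matchesA[0]); // only the standalone code
print_r($matchesB[0]); // only the T-prefixed pair
```

Pattern A no longer picks up 7878 or 6767, because those digits are preceded by T and / rather than by whitespace.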
Scenario with no specific boundaries
You need to make sure the second pattern is skipped when you parse the string with the first one. Use SKIP-FAIL technique for this:
'~T\d{4}/\d{4}(*SKIP)(*F)|(\d{4})(A|B)?(C|D)?~'
See the regex demo.
If you do not need the capturing groups, you may simplify it to
'~T\d{4}/\d{4}(*SKIP)(*F)|\d{4}[AB]?[CD]?~'
See another demo
Details
T\d{4}/\d{4} - T followed with 4 digits, / and another 4 digits
(*SKIP)(*F) - the text matched so far is discarded and the search for the next match resumes at the end of that text
| - or
\d{4}[AB]?[CD]? - 4 digits, then optionally A or B and then optionally C or D.
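The effect of SKIP-FAIL can be seen by running pattern A with and without the skipping branch on the sentence from the question. This is a sketch; the variable names are not from the original posts.

```php
<?php
$str = 'Some text T7878/6767 1234AD and some more text';

// Plain pattern A: also matches the digits that belong to pattern B.
preg_match_all('~(\d{4})(A|B)?(C|D)?~', $str, $plain);

// SKIP-FAIL: anything shaped like T1234/5678 is matched, discarded, and skipped over.
preg_match_all('~T\d{4}/\d{4}(*SKIP)(*F)|(\d{4})(A|B)?(C|D)?~', $str, $skipped);

print_r($plain[0]);
print_r($skipped[0]);
```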
From what you're asking, your current regexes don't really work. (A|B)?(C|D)? will never match AB. So I think you meant [ABCD]
Here's your new regex:
T(\d{4})\/(\d{4}) (\d{4}[ABCD]*)
For the string input:
T7878/6767 1234AB
We get the groups:
Match 1
Full match 0-17 `T7878/6767 1234AB`
Group 1. 1-5 `7878`
Group 2. 6-10 `6767`
Group 3. 11-17 `1234AB`
Regex101
Your syntax is pretty specific, so your regex needs to be just as specific. Get rid of all your capture groups because they are getting in the way. You only need two groups, which match your string syntax exactly.
The first group looks for a word boundary followed by T, then 4 digits, then /, then 4 more digits and a word boundary.
The second group matches 4 digits and then between 0 and 2 letters from A to D. It is preceded by a negative lookbehind, (?<!\S), so it only matches if the 4 digits come right after whitespace or at the start of the string.
(\bT\d{4}\/\d{4}\b)|(?<!\S)(\d{4}[A-D]{0,2})
Preg match all output:
Array
(
[0] => Array
(
[0] => T7878/6767
[1] => 1234AB
)
[1] => Array
(
[0] => T7878/6767
[1] =>
)
[2] => Array
(
[0] =>
[1] => 1234AB
)
)
The task is pretty clear. In the input we have a variable regex pattern, which supposedly contains named subpatterns, and in the output we need to get an array of subpattern names:
function get_subpattern_names($any_input_pattern) {
// What pattern to use here?
$pattern_to_get_names = '/.../';
preg_match_all($pattern_to_get_names, $any_input_pattern, $matches);
return $matches;
}
So the question is what to use as $pattern_to_get_names in the function above?
For example:
get_subpattern_names('/(?P<name>\w+): (?P<digit>\d+)/');
should return:
array('name', 'digit');
P.S.: According to PCRE documentation subpattern names consist of up to 32 alphanumeric characters and underscores.
As we don't control the input pattern, we need to take into account all possible syntaxes of naming. According to PHP documentation they are:
(?P<name>pattern), (?<name>pattern) and (?'name'pattern).
We also need to take into account nested subpatterns, for example:
(?<name1>.*(?<name2>pattern).*).
There's no need to count duplicate names, to preserve the order of appearance, or to get numbered, non-capturing or other types of subpatterns. Just the list of names, if present.
You may get a list of all valid named capture group names using
"~(?<!\\\\)(?:\\\\{2})*\(\?(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})')~"
See the regex and an online PHP demo.
The point is to match an unescaped ( followed by a ?, and then either P< or <, the group name and a closing >, or a ', the group name and a closing '.
$rx = "~(?<!\\\\)(?:\\\\{2})*\(\?(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})')~";
$s = "(?P<name>\w+): (?<name2>\w+): (?'digit'\d+)";
preg_match_all($rx, $s, $res);
print_r($res[1]);
yields
Array
(
[0] => name
[1] => name2
[2] => digit
)
Pattern details
(?<!\\) - no \ immediately to the left of the current location
(?:\\\\)* - 0+ double backslashes (to allow any escaped backslash before ()
\( - a (
\? - a ?
(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})') - a branch reset group:
P?<([_A-Za-z]\w{0,31})> - an optional P, <, a _ or an ASCII letter, 0 to 31 word chars (digits/letters/_) (captured into Group 1), and >
| - or
'([_A-Za-z]\w{0,31})' - ', a _ or an ASCII letter, 0 to 31 word chars (digits/letters/_) (also captured into Group 1), and then '
The group name patterns are all captured into Group 1, you just need to get $res[1].
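Plugged into the function from the question, the pattern could be used roughly as follows. The test inputs, including the nested and the escaped one, are illustrative assumptions rather than cases taken from the original post.

```php
<?php
function get_subpattern_names(string $any_input_pattern): array {
    // Same regex as above: an unescaped "(?" followed by P<name>, <name> or 'name'.
    $rx = "~(?<!\\\\)(?:\\\\{2})*\(\?(?|P?<([_A-Za-z]\w{0,31})>|'([_A-Za-z]\w{0,31})')~";
    preg_match_all($rx, $any_input_pattern, $res);
    return $res[1]; // the branch reset group puts every name in group 1
}

print_r(get_subpattern_names('/(?P<name>\w+): (?P<digit>\d+)/'));
print_r(get_subpattern_names('(?<name1>.*(?<name2>pattern).*)')); // nested groups
print_r(get_subpattern_names('\(?<x>\d)'));                       // escaped "(": not a group
```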
Wiktor's solution does seem quite thorough, but here's what I came up with.
print_r(get_subpattern_names('/(?P<name>\w+): (?P<digit>\d+)/'));
function get_subpattern_names($input_pattern){
preg_match_all('/\?P\<(.+?)\>/i', $input_pattern, $matches);
return $matches[1];
}
This should work for most cases. More importantly, this is much more readable and self-explanatory.
Basically, I search for ?P< followed by (.+?), which is a non-greedy match of whatever sits between the angle brackets. The function then just returns index 1 of the $matches array, which holds the text captured by the first set of parentheses.
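One caveat worth seeing in action: this shorter pattern only recognises the (?P<name>...) spelling. A sketch, using a made-up input that mixes all three naming syntaxes listed in the question:

```php
<?php
$input = "(?P<name>\w+): (?<name2>\w+): (?'digit'\d+)";

// Only groups written with the P< prefix are picked up by this pattern.
preg_match_all('/\?P\<(.+?)\>/i', $input, $matches);

print_r($matches[1]);
```

Groups declared as (?<name2>...) or (?'digit'...) are skipped, so this version fits only when the input patterns are known to use the P form.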
I'm still trying to get to grips with regex patterns and just after a little double-checking if someone wouldn't mind obliging!
I have a string which should either contain:
A 10 digit (numbers and letters) licence key, for example: 1234567890 OR
A 25 digit (numbers and letters) licence key, for example: ABCD1EFGH2IJKL3MNOP4QRST5 OR
A 29 digit licence number (25 numbers and letters, separated into 5 groups by hyphens), for example: ABCD1-EFGH2-IJKL3-MNOP4-QRST5
I can match the first two fine, using ctype_alnum and strlen functions. However, for the last one I think I'll need to use regex and preg_match.
I had a go over at regex101.com and came up with the following:
preg_match('^([A-Za-z0-9]{5})+-+([A-Za-z0-9]{5})+-+([A-Za-z0-9]{5})+-([A-Za-z0-9]{5})+-+([A-Za-z0-9]{5})', $str);
Which seems to match what I'm looking for.
I want the string to only contain an exact match for a string beginning with the licence number, and contain nothing other than mixed upper/lower case letters and numbers in any order and hyphens between each group of 5 characters (so a total of 29 characters - I don't want any further matches). No white space, no other characters and nothing else before or after the 29 digit key.
Will the above work, without allowing any other combinations? Will it stop checking at 29 characters? I'm not sure if there is a simpler way to express this in regex?
Thanks for your time!
The main point is that you need both the ^ (start of string) and $ (end of string) anchors. Also, a + after (...) allows 1 or more repetitions of the whole subpattern inside the (...), so you need to remove the +s and add the $ anchor. Finally, the regex needs delimiters to work with PHP preg_match. I prefer ~ so that / does not need escaping; it makes no difference here, but it is a habit.
So, the regex can look like
'~^[A-Za-z0-9]{5}(?:-[A-Za-z0-9]{5}){4}$~'
See the regex demo
The (?:-[A-Za-z0-9]{5}){4} matches 4 occurrences of -[A-Za-z0-9]{5} subpattern. The (?:...) is a non-capturing group whose matched text does not get stored in any buffer (unlike the capturing group).
See the IDEONE demo:
$re = '~^[A-Za-z0-9]{5}(?:-[A-Za-z0-9]{5}){4}$~';
$str = "ABCD1-EFGH2-IJKL3-MNOP4-QRST5";
if (preg_match($re, $str, $matches)) {
echo "Matched!";
}
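One detail that may matter for the "nothing before or after" requirement: in PCRE, $ without the D modifier also matches just before a single trailing newline, so a key followed by "\n" still passes. A possible variant swaps $ for \z (absolute end of string). The function name is_hyphenated_key is hypothetical.

```php
<?php
// Hypothetical helper: true only for exactly five hyphen-separated groups of 5 alphanumerics.
function is_hyphenated_key(string $s): bool {
    // \z anchors at the very end of the string, even if a newline follows the key.
    return preg_match('~^[A-Za-z0-9]{5}(?:-[A-Za-z0-9]{5}){4}\z~', $s) === 1;
}

var_dump(is_hyphenated_key("ABCD1-EFGH2-IJKL3-MNOP4-QRST5"));   // valid key
var_dump(is_hyphenated_key("ABCD1-EFGH2-IJKL3-MNOP4-QRST51"));  // one character too long
var_dump(is_hyphenated_key("ABCD1-EFGH2-IJKL3-MNOP4-QRST5\n")); // trailing newline

// For comparison, the $-anchored pattern accepts the trailing newline.
$dollar = preg_match('~^[A-Za-z0-9]{5}(?:-[A-Za-z0-9]{5}){4}$~', "ABCD1-EFGH2-IJKL3-MNOP4-QRST5\n");
var_dump($dollar);
```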
How about:
preg_match('/^([a-z0-9]{5})(?:-(?1)){4}$/i', $str);
Explanation:
/ : regex delimiter
^ : beginning of string
( : begin group 1
[a-z0-9]{5} : exactly 5 alphanum.
) : end of group 1
(?: : begin NON capture group
- : a dash
(?1) : same as definition in group 1 (ie. [a-z0-9]{5})
){4} : this group must be repeated 4 times
$ : end of string
/i : regex delimiter with case insensitive modifier
a* means zero or more of a.
In the string 'abbabba' we have two occurrences of abba: (abba)bba and abb(abba).
preg_match_all matches only the first occurrence.
Am I missing some regex fundamental?
$string = 'abbabba';
preg_match_all("/ab*a/", $string, $matches);
print_r($matches);
Array ( [0] => Array ( [0] => abba ) )
Searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued on from end of the last match.
Source
Solution: Use a lookahead assertion together with a capturing group:
preg_match_all('/(?=(ab*a))/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];
This happens because ab*a consumes text: the engine matches the first occurrence, abba, and resumes from the remaining bba, which does not match your pattern.
To find number of occurrences go with:
preg_match_all('/(?=ab*a)/', $input, $result);
print(count($result[0]));
To find matches, use:
preg_match_all('/(?=(ab*a))/', $input, $result);
print_r($result[1]);
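To see where each overlapping match starts, the same lookahead pattern can be combined with the PREG_OFFSET_CAPTURE flag. A short sketch on the string from the question:

```php
<?php
$input = 'abbabba';

// The lookahead consumes nothing, so the engine retries at every position;
// group 1 records what would have matched there, together with its offset.
preg_match_all('/(?=(ab*a))/', $input, $result, PREG_OFFSET_CAPTURE);

foreach ($result[1] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```

The two occurrences share the a at offset 3, which is why a consuming pattern can only ever report the first one.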
The * quantifier is greedy. When the engine sees b* it consumes as many consecutive b characters as it can, and gives characters back (backtracks) only if the rest of the pattern then fails to match.
First it looks to match the a, which it does right off the bat. Then it sees b*, so it consumes the whole run of b characters after the first a (because * matches greedily), having matched abb at this point. It then needs another a to complete the match, so it takes the next character, which is an a, and it's done. That leaves bba, which won't match your pattern. Hope this helps.
If you do what Omega said you will achieve epic victory.