Regex - matching all between second set of brackets ([]) - php

I have the following string, from which I need to match only the last seven digits between the [] brackets. The string looks like this:
[15211Z: 2012-09-12] ([5202900])
I only need to match 5202900, the number contained between ([ and ]). A similar number could appear anywhere in the string, so something like (\d{7}) won't work.
I also tried the following regex
([[0-9]{1,7}])
but this includes the [] in the string?

If you just want the 7 digits, not the brackets, but want to make sure that the digits are surrounded with brackets:
(?<=\[)\d{7}(?=\])
FYI: (?<=\[) is called a positive lookbehind and (?=\]) a positive lookahead.
Good source on the topic: http://www.regular-expressions.info/lookaround.html
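As a minimal sketch of how the lookaround pattern could be used from PHP (the variable names are illustrative, and the input is the sample string from the question), the whole match is already just the digits, so index 0 of the result is enough:

```php
<?php
// Sample input taken from the question.
$str = '[15211Z: 2012-09-12] ([5202900])';

$digits = null;
// The lookbehind and lookahead check for the brackets without consuming them,
// so the full match ($m[0]) contains only the seven digits.
if (preg_match('/(?<=\[)\d{7}(?=\])/', $str, $m)) {
    $digits = $m[0];
}
var_dump($digits);
```

Note that this returns the first bracketed seven-digit run it finds, wherever it is in the string.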

Try matching \(\[(\d{7})\]\): you match the whole expression, then take group 1, the one between the unescaped parentheses. You can replace {7} with * for zero or more digits, + for one or more, or a precise range like the {1,7} you already showed in your question.
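A short sketch of taking group 1 in PHP. The second input string is a hypothetical example, invented here to show why requiring the surrounding "([" and "])" matters when another bracketed seven-digit number appears earlier:

```php
<?php
// First string is from the question; the second is a made-up variant
// with a competing seven-digit number in plain brackets.
$samples = [
    '[15211Z: 2012-09-12] ([5202900])',
    '[1234567] ([5202900])',
];

$found = [];
foreach ($samples as $str) {
    // Escaped \( \[ \] \) are literals; the unescaped ( ) form capture group 1.
    if (preg_match('/\(\[(\d{7})\]\)/', $str, $m)) {
        $found[] = $m[1];
    }
}
print_r($found);
```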

You can try to use
\[(\d{1,7})\]

If the first bracketed part always looks like yours (not only digits), then this should work to extract a group of digits surrounded by brackets and parentheses, like ([123]):
\(\[(\d+)\]\)

From your details, lookbehind and lookahead seem to be a good choice. You can also use this one:
(\d{7})\]\)$
Since the seven-digit pattern is expected at the end of the line, the engine needs to do less work to find the match.
Hope it helps!

Here is a benchmark (in Perl, but I think the result is close to the same in PHP) that compares the lookaround approach with a capture group:
use Benchmark qw(:all);
my $str = q/[15211Z: 2012-09-12] ([5202900])/;
my $count = -3;   # negative count: run each sub for at least 3 CPU seconds
cmpthese($count, {
    'lookaround' => sub {
        $str =~ /(?<=\[)\d{7}(?=\])/;
    },
    'capture group' => sub {
        $str =~ /\[(\d{7})\]/;
    },
});
Result:
                  Rate lookaround capture group
lookaround    274914/s         --          -70%
capture group 931043/s       239%            --
As we can see, capture is more than 3 times faster than lookaround.
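To check whether the same holds in PHP, a rough timing sketch along these lines could be used. The helper name time_pattern and the iteration count are assumptions made for this illustration, and the timings will vary by machine and PHP version, so no particular ratio is claimed here:

```php
<?php
$str = '[15211Z: 2012-09-12] ([5202900])';
$iterations = 100000; // arbitrary count chosen for this sketch

// Hypothetical helper: runs preg_match $iterations times and returns
// [elapsed seconds, match array from the last run].
function time_pattern(string $pattern, string $subject, int $iterations): array
{
    $m = [];
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject, $m);
    }
    return [microtime(true) - $start, $m];
}

[$lookaroundTime, $lookaroundMatch] = time_pattern('/(?<=\[)\d{7}(?=\])/', $str, $iterations);
[$captureTime, $captureMatch] = time_pattern('/\[(\d{7})\]/', $str, $iterations);

// Lookaround yields the digits as the whole match; the capture version as group 1.
printf("lookaround:    %.4f s (%s)\n", $lookaroundTime, $lookaroundMatch[0]);
printf("capture group: %.4f s (%s)\n", $captureTime, $captureMatch[1]);
```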

Related

Group regex with fix part

$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = "/([\w]+)\s([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/";
preg_match($rgx, $txt, $res);
var_dump($res);
I would like to simplify this pattern and avoid repeating "([0-9]+)", because I don't know how many of them there will be.
Can anyone tell me how?
Here is a direct answer to the question, as you have stated it:
/[\w]+\s[0-9]+(?:\.[0-9]+)+/
However, note that I have removed all of the numbered capture groups. This could be problematic, depending on what you're actually trying to achieve.
It is not possible to "count" with capture groups in regular expressions, so you would need to write some other code (i.e. not just one match, with one regex, and using back-references) to deal with this if you wish to run any queries like "What digits appear after the fifth "."?"
There are two ways you can do this. If you just need to verify that the string matches the pattern, this regex will do the job: \w+\s(?:[0-9]+\.?)+
However, if you need to split the string into its component parts (in my interpretation, the beginning word followed by the sequence of dot-separated numbers), then you could use this pattern: (\w+)\s((?:[0-9]+\.?)+)
The second pattern will return the beginning word, toto1, in group 1 and the dot-separated numbers, 555.4545.555.999.7465.432.674, in group 2, which you could then split in PHP if required: $sequence = explode('.', $matches[2]);
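Put together, the capture-then-explode approach might look like the following sketch (the variable names $word and $sequence are chosen for illustration):

```php
<?php
$txt = 'toto1 555.4545.555.999.7465.432.674';

$word = null;
$sequence = [];
// Group 1: the leading word. Group 2: the whole dot-separated number run.
if (preg_match('/(\w+)\s((?:[0-9]+\.?)+)/', $txt, $matches)) {
    $word = $matches[1];
    // Split the captured run on the dots to get the individual numbers.
    $sequence = explode('.', $matches[2]);
}
print_r($sequence);
```

Because the dot is optional in the pattern, an input ending in a trailing dot would also match, and explode would then produce an empty last element.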
What you need can be obtained with a preg_split with a regex matching 1 or more whitespaces or dots:
$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = '/[\s.]+/';
$res = preg_split($rgx, $txt);
print_r($res);
If you need a regex approach, you can use a \G based regex with preg_match_all:
'~(?|([\w]+)|(?!\A)\G[\s.]*([0-9]+))~'
A PHP example:
$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = '~(?|(\w+)|(?!\A)\G[\s.]*([0-9]+))~';
preg_match_all($rgx, $txt, $res);
print_r($res[1]);
Pattern details:
The (?|...) is a branch reset group to reset group IDs in all the branches
(\w+) - Group 1 matches 1+ word chars
| - or (then goes Branch 2)
(?!\A)\G - the end of the previous successful match
[\s.]* - zero or more whitespaces or dots
([0-9]+) - Group 1 (again!) matching 1 or more digits.

How to match all words but "stop" in a string by regex

Another regex question. I use PHP and have a string: fdjkaljfdlstopfjdslafdj. You can see there is a stop in the middle. I just want to replace everything else, excluding that stop. I tried [^stop], but it also excludes the lone s later in the string.
My Solution
Thanks for everyone's help here.
I also figured out a solution with a pure regex method (within the scope of my regex knowledge; PCRE verbs are too advanced for me), but it needs 2 steps. I don't want to mix PHP functions in, because sometimes the job is outside of coding, e.g. multi-renaming files in Total Commander.
Let's look at the string: xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq. For example, I want to keep every foo in this string and replace each run of non-foo text with ---.
First, I need to find the non-foo text between each pair of foos: (?<=foo).+?(?=foo).
The string turns into xxxfoo---foo---foo---foolsjunegpq; only the non-foo text at both ends is left now.
Then I use [^-]+(?=foo)|(?<=foo)[^-]+.
This time the result is ---foo---foo---foo---foo---. Everything but foo has been turned into ---.
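Although the point of this solution is that it works outside PHP, the two steps can be checked with a small PHP sketch (the variable names are illustrative). It assumes, as the second pattern does, that the original text contains no - characters:

```php
<?php
$s = 'xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq';

// Step 1: replace the text between each pair of foos.
$step1 = preg_replace('/(?<=foo).+?(?=foo)/', '---', $s);

// Step 2: replace the leftover text before the first foo and after the last one.
// [^-]+ avoids touching the --- markers inserted in step 1.
$step2 = preg_replace('/[^-]+(?=foo)|(?<=foo)[^-]+/', '---', $step1);

echo $step1, PHP_EOL;
echo $step2, PHP_EOL;
```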
I just don't want to include "stop"...
You can skip it by using the PCRE verbs (*SKIP)(*F). Try it like this:
stop(*SKIP)(*F)|.
or sequence: (stop)(*SKIP)(*F)|(?:(?!(?1)).)+
or for words: stop(*SKIP)(*F)|\w+
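A small sketch comparing the character class from the question with the (*SKIP)(*F) pattern, using the question's string and replacing each matched character with a - (the replacement character is an arbitrary choice for this illustration):

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdj';

// [^stop] is a character class: it skips every s, t, o and p individually,
// so the lone "s" near the end is left untouched as well.
$charClass = preg_replace('/[^stop]/', '-', $s);

// stop(*SKIP)(*F) consumes a whole "stop" and then fails the match there,
// so only the other characters reach the "." alternative.
$skipFail = preg_replace('/stop(*SKIP)(*F)|./', '-', $s);

echo $charClass, PHP_EOL;
echo $skipFail, PHP_EOL;
```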
[^stop] doesn't mean any text that is NOT stop. It just means any single character that is not one of the 4 characters inside [...], which in this case are s, t, o and p.
Better to split on the text you don't want to match:
php> $s = 'fdjkaljfdlstopfjdslafdjstopfoobar';
php> $arr = preg_split('/stop/', $s);
php> print_r($arr);
Array
(
[0] => fdjkaljfdl
[1] => fjdslafdj
[2] => foobar
)
You can generalize this to any pattern:
(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))
Just put the pattern you don't want in the neg group.
This regex will try to do the following for any character position:
Match the pattern you don't want. If it matches, discard it with (*SKIP)(*FAIL) and restart another match at this position.
If the pattern you don't want doesn't match at a particular position, then match anything, until either:
You reach the end of the input string (\Z)
Or the pattern you don't want immediately follows the current matching position ((?&neg))
This approach is slower than a manually tuned expression. You could get better performance, at the cost of repeating yourself, by inlining the pattern, which avoids the (?&neg) subroutine call:
stop(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|stop)
But of course, the best approach would be to use the features provided by your language: match the string you don't want, then use code to discard it and keep everything else.
In PHP, you can use the PREG_OFFSET_CAPTURE flag to tell the preg_match_all function to provide you the offsets of each match.
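One way this could look, as a sketch rather than a definitive implementation: match the unwanted text with offsets, then copy out everything between the matches. The variable names $kept and $pos are illustrative.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
preg_match_all('/stop/', $s, $matches, PREG_OFFSET_CAPTURE);

$kept = [];
$pos = 0; // start of the next piece to keep
foreach ($matches[0] as [$text, $offset]) {
    if ($offset > $pos) {
        $kept[] = substr($s, $pos, $offset - $pos);
    }
    $pos = $offset + strlen($text); // continue after the discarded match
}
if ($pos < strlen($s)) {
    $kept[] = substr($s, $pos); // trailing piece after the last match
}
print_r($kept);
```

For this simple case the result is the same as the preg_split call shown earlier; the offsets become useful when the code needs to know where each kept piece came from.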

Regex for word not followed by asterisk

I need a regex (for PHP) that matches a + followed by 1 or 2 word characters, but only when the word does not end with a *.
So far I have /\+\b\w{1,2}\b/, which finds +a3 but also finds the +a3 in +a3*, because the asterisk comes after the word boundary.
In a string like +find +in +me* I only want to find +in, not +me*.
I tried /\+\b[\w\*]{1,2}\b/, but that does not seem to make any difference.
preg_replace($regex,'','+do+find +in +me*'); //expected result: '+do+find +me*'
How about:
/\+\w{1,2}\b(?!\*)/
(?!\*) is a negative lookahead that ensures a * doesn't follow the one or two characters.
The \b isn't mandatory between \+ and \w.
Edit according to comment:
This matches the "+2c" in "whatever+2c". What would I need to change so that it won't match this, but only matches in "whatever +2c" or "+2c whatever"?
Use this one:
/(?:^|\s)\+\w{1,2}(?:\s|$)/
According to comments:
/(?<=^|\s)\+\w{1,2}(?:\s|$)/
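A quick sketch running the first and the last pattern of this answer against the test string from the question (the variable names are illustrative):

```php
<?php
$input = '+do+find +in +me*';

// First suggestion: also removes "+do", because a word boundary follows it.
$loose = preg_replace('/\+\w{1,2}\b(?!\*)/', '', $input);

// Final suggestion: the token must be preceded by start-of-string or whitespace
// and followed by whitespace or end-of-string. The trailing whitespace is
// consumed, the leading one is only checked by the lookbehind.
$strict = preg_replace('/(?<=^|\s)\+\w{1,2}(?:\s|$)/', '', $input);

var_dump($loose);
var_dump($strict);
```

Only the final pattern produces the result the question expects.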

Quick PHP regex for digit format

I just spent hours figuring out how to write a regular expression in PHP that I need to only allow the following format of a string to pass:
(any digit)_(any digit)
which would look like:
219211_2
so far I tried a lot of combinations, I think this one was the closest to the solution:
/(\\d+)(_)(\\d+)/
also if there was a way to limit the range of the last number (the one after the underline) to a certain amount of digits (ex. maximal 12 digits), that would be nice.
I am still learning regular expressions, so any help is greatly appreciated, thanks.
The following:
\d+_\d{1,12}(?!\d)
Will match "anywhere in the string". If you need to have it either "at the start", "at the end" or "this is the whole thing", then you will want to modify it with anchors
^\d+_\d{1,12}(?!\d) - must be at the start
\d+_\d{1,12}$ - must be at the end
^\d+_\d{1,12}$ - must be the entire string
demo: http://regex101.com/r/jG0eZ7
Explanation:
\d+ - at least one digit
_ - literal underscore
\d{1,12} - between 1 and 12 digits
(?!\d) - followed by "something that is not a digit" (negative lookahead)
The last thing is important otherwise it will match the first 12 and ignore the 13th. If your number happens to be at the end of the string and you used the form I originally had [^\d] it would fail to match in that specific case.
Thanks to #sln for pointing that out.
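A minimal sketch of how these patterns behave in PHP. The function name is_valid_id and the test inputs other than 219211_2 are assumptions made for this illustration:

```php
<?php
// Hypothetical helper: true when the entire string is digits, an underscore,
// then 1 to 12 digits. Note that in PCRE "$" also matches before a single
// trailing newline; the D modifier could be added if that must be rejected.
function is_valid_id(string $s): bool
{
    return preg_match('/^\d+_\d{1,12}$/', $s) === 1;
}

// Unanchored form: (?!\d) stops it from accepting the first 12 of 13 digits.
$tooLong = preg_match('/\d+_\d{1,12}(?!\d)/', 'id 219211_1234567890123');
$embedded = preg_match('/\d+_\d{1,12}(?!\d)/', 'id 219211_2 ok', $m);

var_dump(is_valid_id('219211_2'));
var_dump($tooLong, $embedded, $m[0]);
```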
You don't need double escaping \\d in PHP.
Use this regex:
"/^(\d+)_(\d{1,12})$/"
\d{1,12} will match 1 to 12 digits
Better to use line start/end anchors to avoid matching unexpected input
Try this:
$regex = '~^(\d+)_(\d+)$~';
$input = '219211_2';
if (preg_match($regex, $input, $result)) {
    print_r($result);
}
Just try with following regex:
^(\d+)_(\d{1,12})$

Regexp for numbers with delimiters

I would like to know how I can create a regexp to match the pattern 3,231 or 3,231,201 in PHP.
Thanks
/([0-9,]+k?)/
The above regex will match digits and commas, with an optional 'k' at the end.
A pattern could look like this:
/([0-9]+)(,[0-9]{3})*/
When anchored as /^([0-9]+)(,[0-9]{3})*$/ so that the whole string must match, this will allow for something like:
123
123,123
123,123,123
1,123
123456
but not:
123,
123,1
123,12
123,1234
You can modify the behaviour, e.g. allowing for more/less digits after the comma by changing {3} into + or {1,4} (1 to 4) or {3,} (3 or more).
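The accept and reject lists above can be checked with a short sketch. It uses the pattern with ^ and $ anchors added (an assumption of this example, since the lists only hold when the whole string has to match), and the variable names are illustrative:

```php
<?php
$pattern = '/^([0-9]+)(,[0-9]{3})*$/';

$candidates = ['123', '123,123', '123,123,123', '1,123', '123456',
               '123,', '123,1', '123,12', '123,1234'];

$accepted = [];
$rejected = [];
foreach ($candidates as $candidate) {
    // preg_match returns 1 on a match and 0 when the string does not fit.
    if (preg_match($pattern, $candidate) === 1) {
        $accepted[] = $candidate;
    } else {
        $rejected[] = $candidate;
    }
}
print_r($accepted);
print_r($rejected);
```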
Well that is pretty easy;
/^[0-9,]+$/
this works with (for example) 1231,1231,1312,12312
