regex to match contents of last [bracketed text] - php

Target string:
Come to the castle [Mario], I've baked
you [a cake]
I want to match the contents of the last brackets, ignoring the other brackets, i.e.:
a cake
I'm a bit stuck, can anyone provide the answer?

Try this; it uses a negative lookahead assertion:
\[[^\[]*\](?!\[)$

This should do it:
\[([^[\]]*)][^[]*(?:\[[^\]]*)?$
\[([^[\]]*)] matches any sequence of […] that does not contain [ or ];
[^[]* matches any following characters that are not [ (i.e. the beginning of another potential group of […]);
(?:\[[^\]]*)?$ matches a potential single [ that is not followed by a closing ].
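A minimal PHP sketch of the pattern above, run against the question's sample string. The function name lastBracketContents is made up for illustration:

```php
<?php
// Hypothetical helper: returns the contents of the last complete [...] pair, or null.
function lastBracketContents(string $text): ?string
{
    $pattern = '/\[([^[\]]*)][^[]*(?:\[[^\]]*)?$/';
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

$text = "Come to the castle [Mario], I've baked\nyou [a cake]";
echo lastBracketContents($text), "\n"; // a cake
```

The negated character classes also match newlines, so no extra modifier is needed for the two-line sample.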

You could use some sort of a look-ahead. And because we don't know the precise nature of what text/characters will have to be processed, it could look something like this, but it will need a little work:
\[[a-z\s]*\](?!.*\[([a-z\s]*)\])
Your contents should be matched in \1, or possibly \2.

Simple is best: .*\[(.*?)] will do what you want; with nested brackets it will return the last, innermost one and ignore bad nesting. There's no need for a negative character class: the .*? makes sure you don't have any right brackets in the match, and since the .* makes sure you match at the last possible spot, it also keeps out any 'outer' left brackets.
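One caveat when trying this in PHP: the sample string spans two lines, and . does not match a newline by default, so the greedy .* stops at the end of the first line and the unmodified pattern finds Mario. A small sketch, assuming the s (DOTALL) modifier is acceptable; the modifier and the function name are additions for illustration, not part of the answer:

```php
<?php
// Hypothetical helper: contents of the last (innermost) [...] pair, or null.
function lastBracketLazy(string $text): ?string
{
    // The s modifier is added here so that . also matches newlines.
    return preg_match('/.*\[(.*?)]/s', $text, $m) === 1 ? $m[1] : null;
}

$text = "Come to the castle [Mario], I've baked\nyou [a cake]";
echo lastBracketLazy($text), "\n"; // a cake
```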

Related

Regex - Match characters but don't include within results

I have got the following Regex, which ALMOST works...
(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)
I need the result to be the only result, or within the same position in the Array.
So for example http://m.facebook.com/ matches perfectly; there is only 1 group.
However, if I change it to http://facebook.com/ then I get com/ in place of where facebook should be. So I need to have (?:www|[a-z]+) as an optional check really.
Edit:
What I expect is just to match facebook, if ANY of the strings are as follows:
http://www.facebook.com
http://facebook.com
http://m.facebook.com
And obviously the https counterparts.
This is my Regex now
(?:^https?:\/\/)(?:www)?\.?([^.]+)
This is close; however, it matches the m when I try http://m.facebook.com
https://regex101.com/r/GDapY5/1
So I need to have (?:www|[a-z]+) as an optional check really.
A ? at the end of a pattern is generally used for "optional" bits -- it means "match zero or one" of that thing, so your subpattern would be something like this:
(?:www|[a-z]+)?
If you're simply trying to get the second level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split on dots and take the penultimate value:
$domain = array_reverse(explode('.', parse_url($str)['host']))[1];
Or:
$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
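A sketch of the split-on-dots approach applied to the URLs from the question. The function name secondLevelLabel is hypothetical, and the sketch assumes simple hosts: a host such as example.co.uk would return co.

```php
<?php
// Hypothetical helper: the label just before the top-level domain, or null.
function secondLevelLabel(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host component could be parsed
    }
    $parts = array_reverse(explode('.', $host));
    return $parts[1] ?? null; // null for single-label hosts such as "localhost"
}

foreach (['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com/'] as $url) {
    echo $url, ' => ', secondLevelLabel($url), "\n"; // facebook each time
}
```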
Perhaps you could make the first m. part optional with (?:\w+\.)?.
Instead of a capturing group you could use \K to reset the starting point of the reported match.
Then match one or more word characters \w+ and use a positive lookahead to assert that what follows is a dot (?=\.)
For example:
^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)
Edit: Or you could match for m. or www. using an alternation:
^https?://(?:m\.|www\.)?\K\w+(?=\.)
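A short sketch of the \K variant with preg_match, where the domain ends up in the whole match ($m[0]) and there is no capturing group. The function name domainLabel is made up for illustration:

```php
<?php
// Hypothetical helper: the label after an optional "m." or "www." prefix, or null.
function domainLabel(string $url): ?string
{
    // \K discards everything matched so far from the reported match.
    $pattern = '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~';
    return preg_match($pattern, $url, $m) === 1 ? $m[0] : null;
}

echo domainLabel('http://m.facebook.com'), "\n";    // facebook
echo domainLabel('https://www.facebook.com'), "\n"; // facebook
echo domainLabel('http://facebook.com'), "\n";      // facebook
```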

Positive look ahead regex confusing

I'm building this regex with a positive lookahead in it. Basically it must select all text in the line up to the last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text is below. I am testing this in gskinner and EditPad Pro, which apparently has full grep regex support, so if I could get the answers in that form I'd appreciate it.
The regex below works to a degree but I am unsure if it is correct. Also it falls down if the text contains brackets.
Finally I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second ignore rule would ignore but include periods that have a single Capital letter before them. Sample text below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.
I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
Update:
I don't think it's possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part (?=[^:]*?\:) should be (?=[^.]*?\:) if you want to match the last dot before the ":". With your expression it will match on the first dot.
See it here on Regexr
This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
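A sketch of how that pattern could be used to add the "|" delimiter the question asks for, using the first sample line. The function name delimitBeforeColon and the '$1|' replacement are assumptions for illustration:

```php
<?php
// Hypothetical helper: appends "|" after the last period that still has a colon after it.
function delimitBeforeColon(string $line): string
{
    return preg_replace('/(.*\.)(?=[^:]*?:)/', '$1|', $line);
}

$line = '121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.';
echo delimitBeforeColon($line), "\n";
// 121| Ryan, T.N. |2001. |I like regex.| But does it like me (2) 2: 615-631.
```

The greedy .* first runs to the end of the line and then backtracks until it finds a period with a colon somewhere after it.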

Regex to allow all characters except repeats of a particular given character

I've been fumbling with this for a bit and thought I'd put it up to the regex experts:
I want to match strings like this:
abc[abcde]fff
abcffasd
so I want to allow single brackets (e.g. [ or ]). However, I don't want to allow double brackets in sequence (e.g. [[ or ]]).
This means this string shouldn't pass the regex:
abc[abcde]fff[[gg]]
My best guess so far is based on an example I found, something like:
(?>[a-zA-Z\[\]']+)(?!\[\[)
However, this doesn't work (it matches even when double brackets are present), presumably because the brackets are contained in the first part as well.
You want something like:
^(?:\[?[^\[]|\[$)*$
At each step, the pattern accepts an optional opening bracket followed by a character that is not an opening bracket, or an opening bracket at the very end of the string.
Or a little more neatly, using a negative lookahead:
^(?:(?!\[\[).)*$
Here, the pattern will only match characters as long as it doesn't see two [[ ahead.
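As written, the lookahead rejects only [[. A sketch that extends it to reject ]] as well, which the question also asks for; the extra alternative and the function name are assumptions, not part of the answer:

```php
<?php
// Hypothetical helper: true when the string contains neither "[[" nor "]]".
function hasNoDoubledBrackets(string $str): bool
{
    // At every position, refuse to advance if "[[" or "]]" starts there.
    return preg_match('/^(?:(?!\[\[|\]\]).)*$/', $str) === 1;
}

var_dump(hasNoDoubledBrackets('abc[abcde]fff'));       // bool(true)
var_dump(hasNoDoubledBrackets('abcffasd'));            // bool(true)
var_dump(hasNoDoubledBrackets('abc[abcde]fff[[gg]]')); // bool(false)
```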
Not to be deterred!
^(?:(?:[a-z]+)|(?:\](?!\]))|(?:\[(?!\[)))+$
I removed the "two or more" restriction and the redundant character classes that held only one character. This seems to pass all the test cases I can think of: any string of characters containing only single [ or ].
Let me know if it works for you!
I'm not sure I can answer this, but I'll post what I have as I'm going through it.
First, I have this, which seems to match without the brackets. This is any letter not followed by 2 or more of itself.
^(?:([a-z])(?!\1{2,}))+$
We can add the brackets into the character class and it will start matching brackets; but, obviously it will also allow them to follow the same rules as the letters (two together is valid). How do we separate the bracket behavior from the letter behavior?
^(?:([a-z\[\]])(?!\1{2,}))+$
This feels dirty, but seems to work. Looking at the other answer, I like that a lot better. Now to figure out why I didn't think of it.
^(?:(?:([a-z])(?!\1{2,}))|(?:[\]](?![\]]))|(?:[\[](?![\[])))+$
Also, for some reason I thought it was 1-2 of each character but only one of [ and ] so this is all worthless anyway :).
You can try this negative lookahead:
$arr = array('abc[abcde]fff', 'abcffasd', 'abc[abcde]fff[[gg]]');
foreach ($arr as $str) {
    echo $str,' => ';
    $ret = preg_match('/^(?!.*?(\[\[)).+$/', $str, $m);
    echo "$ret\n";
}
OUTPUT
abc[abcde]fff => 1
abcffasd => 1
abc[abcde]fff[[gg]] => 0
This regex should allow all letters and brackets except two consecutive brackets (i.e. [], [[ or ]])
([a-zA-Z\[\]][a-zA-Z])+
EDIT: Sorry, this won't work for strings with odd length

How can I check if a string EXACTLY matches a regex pattern?

I'm working on a registration script for my client's product sales website.
I'm currently working on a reference ID input area, and I want to make sure that the reference ID is within the correct parameters of the payment method
The Reference ID will look something like this: XXXXX-XXXXX-XXXXX
I'm trying to use this RegEx pattern to match it: /(\w+){5}-(\w+){5}-(\w+){5}/
This matches it perfectly, but it also matches XXXXX-XXXXX-XXXXXXXXXX
Or at least it finds a match in there. I want it to make sure the entire string matches. I'm not too familiar with RegEx.
How can I do this?
You need to use start and finish anchors. Also, if you don't need to capture those groups, you can omit the parentheses.
Also, (\w+){5} means "one or more word characters, repeated exactly five times". I believe you didn't want that, so I dropped the +.
/^\w{5}-\w{5}-\w{5}\z/
Also, I used \z so your string doesn't match "abcde-12345-edcba\n".
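A brief sketch of the difference between $ and \z on a string with a trailing newline. The function name isReferenceId is hypothetical:

```php
<?php
// Hypothetical validator: exactly three groups of five word characters.
function isReferenceId(string $id): bool
{
    // \z matches only at the very end, so a trailing "\n" is rejected.
    return preg_match('/^\w{5}-\w{5}-\w{5}\z/', $id) === 1;
}

var_dump(isReferenceId('XXXXX-XXXXX-XXXXX'));      // bool(true)
var_dump(isReferenceId('XXXXX-XXXXX-XXXXXXXXXX')); // bool(false)
var_dump(isReferenceId("abcde-12345-edcba\n"));    // bool(false)

// With $ instead of \z, the trailing newline is tolerated:
var_dump(preg_match('/^\w{5}-\w{5}-\w{5}$/', "abcde-12345-edcba\n")); // int(1)
```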
Use ^ and $ to match the start and end of the input string, respectively.
Also note that your use of + was superfluous: (\w+){5} means "one or more word characters, repeated five times", so it matches any run of five or more word characters. You probably meant (\w){5} (or just \w{5} if you don't need the backreference; I'll assume in my example that you do).
/^(\w){5}-(\w){5}-(\w){5}$/
Put the regular expression between ^ and $ to match the whole string, and check whether it matches anything.
example:
/^(\w+){5}-(\w+){5}-(\w+){5}$/
Try
/^([\w]{5,5})-([\w]{5,5})-([\w]{5,5})$/i
There are several online regex testers out there; I work with this one before I code.
Enclose it in "^" and "$" thus:
/^(\w+){5}-(\w+){5}-(\w+){5}$/
You need ^ to match the start of the string and $ to match the end:
/^\w{5}-\w{5}-\w{5}$/
Note that (\w+){5} is incorrect because that means five repetitions of \w+, but that in turn means "one or more word characters".
/^(\w){5}-(\w){5}-(\w){5}$/
You need to explicitly say that you want the pattern to start at the beginning of the string and end at its end.
You can improve it: /^((\w){5}-){2}(\w){5}$/ ; this way, you can easily modify the number of elements your serial number might have.
Use ^ and $ to mark the start and end of the regex string:
/^\w{5}-\w{5}-\w{5}$/
http://www.regular-expressions.info/anchors.html
In preg, \b marks word boundaries. So you could try with something like
/\b(\w+){5}-(\w+){5}-(\w+){5}\b/

regex to remove all whitespaces except between brackets

I've been wrestling with an issue I was hoping to solve with regex.
Let's say I have a string that can contain any alphanumeric with the possibility of a substring within being surrounded by square brackets. These substrings could appear anywhere in the string like this. There can also be any number of bracket-ed substrings.
Examples:
aaa[bb b]
aaa[bbb]ccc[d dd]
[aaa]bbb[c cc]
You can see that there are whitespaces in some of the bracketed substrings, that's fine. My main issue right now is when I encounter spaces outside of the brackets like this:
a aa[bb b]
Now I want to preserve the spaces inside the brackets but remove them everywhere else.
This gets a little more tricky for strings like:
a aa[bb b]c cc[d dd]e ee[f ff]
Here I would want the return to be:
aaa[bb b]ccc[d dd]eee[f ff]
I spent some time now reading through different reg ex pages regarding lookarounds, negative assertions, etc. and it's making my head spin.
NOTE: for anyone visiting this, I was not looking for any solution involving nested brackets. If that were the case, I'd probably do it programmatically, like some of the comments mentioned below.
This regex should do the trick:
[ ](?=[^\]]*?(?:\[|$))
Just replace the space that was matched with "".
Basically, the lookahead makes sure that the space you are going to remove is followed by a "[" (or the end of the string) with no "]" in between, which means the space is not inside a bracketed section.
That should work as long as you don't have nested square brackets, e.g.:
a a[b [c c]b]
Because in that case, the space after the first "b" will be removed and it will become:
aa[b[c c]b]
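A sketch of the lookahead replacement in PHP, run against the question's example and the nested case just described. The function name is made up for illustration:

```php
<?php
// Hypothetical helper: removes spaces that are not inside a (non-nested) [...] section.
function removeSpacesLookahead(string $str): string
{
    // A space qualifies when "[" or the end of the string comes before any "]".
    return preg_replace('/[ ](?=[^\]]*?(?:\[|$))/', '', $str);
}

echo removeSpacesLookahead('a aa[bb b]c cc[d dd]e ee[f ff]'), "\n"; // aaa[bb b]ccc[d dd]eee[f ff]
echo removeSpacesLookahead('a a[b [c c]b]'), "\n";                  // aa[b[c c]b]
```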
This doesn't sound like something you really want regex for. It's very easy to parse directly by reading through. Pseudo-code:
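inside_brackets = false;
result = "";
for (i = 0; i < length(str); i++) {
    if (str[i] == '[')
        inside_brackets = true;
    else if (str[i] == ']')
        inside_brackets = false;
    // Keep every character except spaces outside brackets.
    // Building a new string avoids skipping characters after a delete.
    if (inside_brackets || !is_space(str[i]))
        result += str[i];
}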
Anything involving regex is going to involve a lot of lookbehind stuff, which will be repeated over and over, and it'll be much slower and less comprehensible.
To make this work for nested brackets, simply change inside_brackets to a counter, starting at zero, incrementing on open brackets, and decrementing on close brackets.
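A runnable PHP sketch of the counter variant described above. The function name is hypothetical, and the sketch assumes byte-wise processing with ASCII whitespace (ctype_space), which is fine for the examples in the question:

```php
<?php
// Hypothetical helper: strips whitespace outside (possibly nested) square brackets.
function stripSpacesOutsideBrackets(string $str): string
{
    $depth = 0;   // how many brackets are currently open
    $result = '';
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $ch = $str[$i];
        if ($ch === '[') {
            $depth++;
        } elseif ($ch === ']' && $depth > 0) {
            $depth--; // unmatched closing brackets are left at depth 0
        }
        // Keep everything inside brackets, and every non-space outside them.
        if ($depth > 0 || !ctype_space($ch)) {
            $result .= $ch;
        }
    }
    return $result;
}

echo stripSpacesOutsideBrackets('a aa[bb b]c cc[d dd]e ee[f ff]'), "\n"; // aaa[bb b]ccc[d dd]eee[f ff]
echo stripSpacesOutsideBrackets('a a[b [c c]b]'), "\n";                  // aa[b [c c]b]
```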
This works for me:
(\[.+?\])|\s
Then you simply pass in a replacement value of $1 when you call the replace function. The idea is to look for the patterns inside the brackets first and make sure they're untouched. And then every space outside the brackets gets replaced with nothing.
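A sketch of the same idea with PHP's preg_replace, offered as an illustration rather than as the answerer's tested code. When the \s branch matches, group 1 did not take part in the match, and preg_replace substitutes an empty string for $1:

```php
<?php
// Hypothetical helper: keeps bracketed sections as they are and drops other whitespace.
function removeSpacesCapture(string $str): string
{
    return preg_replace('/(\[.+?\])|\s/', '$1', $str);
}

echo removeSpacesCapture('a aa[bb b]c cc[d dd]e ee[f ff]'), "\n"; // aaa[bb b]ccc[d dd]eee[f ff]
```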
Note that I tested this with Regex Hero (a .NET regex tester), and not in PHP. So I'm not 100% sure this will work for you.
That was an interesting one. Sounded simple at first, then seemed rather difficult. And then the solution I finally arrived at was indeed simple. I was surprised the solution didn't require a lookaround of any sort. And it should be faster than any method that uses a lookaround.
How to do this depends on what should be done with:
a b [ c [ d [ e ] f ] g
That is ambiguous; possible answers are at least:
ab[ c [ d [ e ] f ]g
ab[ c [ d [ e ]f]g
error out; the brackets don't match!
For the first two cases, you can use regexps. For the third case, you'd be much better off with a (small) parser.
For either case one or two, split the string on the first [. Strip spaces from everything before [ (that's obviously outside of the brackets). Next, look for .*\] (case 1) or .*?\] (case 2) and move that over to your output. Repeat until you're out of input.
Resurrecting this question because it had a simple solution that wasn't mentioned.
\[[^]]*\](*SKIP)(*F)|\s+
The left side of the alternation matches complete sets of brackets, then deliberately fails. The right side matches runs of whitespace (there is no capture group; they are the overall match), and we know they are the right ones because if they were within brackets they would have been failed by the expression on the left.
See the matches in this demo
This means you can just do
$replace = preg_replace("~\[[^]]*\](*SKIP)(*F)|\s+~","",$string);
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
The following will match start-of-line or end-of-bracket (which must come before any space you want to match) followed by anything that isn't start-of-bracket or a space, followed by some space.
/((^|\])[^ \[]*) +/
Replacing every match with $1 will remove the first block of spaces from each non-bracketed sequence. You will have to repeat the replacement until nothing matches to remove all spaces.
Example:
abcd efg [hij klm]nop qrst u
abcdefg [hij klm]nopqrst u
abcdefg[hij klm]nopqrstu
done
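The repetition can be sketched as a loop that stops once a pass makes no replacement. The function name is hypothetical; the loop relies on the optional count argument of preg_replace:

```php
<?php
// Hypothetical helper: repeats the replacement until no more spaces are removed.
function removeSpacesIteratively(string $str): string
{
    do {
        // $count receives the number of replacements made in this pass.
        $str = preg_replace('/((^|\])[^ \[]*) +/', '$1', $str, -1, $count);
    } while ($count > 0);
    return $str;
}

echo removeSpacesIteratively('abcd efg [hij klm]nop qrst u'), "\n"; // abcdefg[hij klm]nopqrstu
```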
