PHP regex capturing parentheses - php

I can't capture what I want with capturing parentheses.
I'm searching in /hodsakers/marsh-zwartArray/d and I want to capture marsh-zwartArray, but sometimes the last / is not present in the string I'm searching.
I have searched and tried many things, such as:
(marsh[\s\S]*)\/
It works, but not when the last slash is missing.
I also tried
(marsh[\s\S]*)(\/)?
In this case it's the opposite: it works without the last slash, but no longer when there is one. It grabs the whole rest of the string and captures nothing for the slash.
So I don't know how to capture in both cases.
Thanks for your help.

You may use a [^\/]* negated character class to match 0+ chars other than /:
/marsh[^\/]*/
See the regex demo
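A minimal sketch of how the negated character class behaves in PHP, using the path from the question with and without the trailing segment. The helper name captureSegment is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: returns the path segment that starts with "marsh", or null.
function captureSegment(string $path): ?string
{
    // [^\/]* stops at the next slash, or at the end of the string if there is none.
    if (preg_match('/marsh[^\/]*/', $path, $m)) {
        return $m[0];
    }
    return null;
}

var_dump(captureSegment('/hodsakers/marsh-zwartArray/d')); // slash after the segment
var_dump(captureSegment('/hodsakers/marsh-zwartArray'));   // no trailing slash
```

Because the class excludes the slash itself, no optional group is needed to handle both cases.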

Related

Query regarding Regex pattern

The data along with the regex pattern I'm using is linked here:
(?m)(?<=Note:)(\w+|\s+)*$
The sample text is:
Date:21
Month:03
Year:2017
Amount:50
Category:Test
Account:Testimg
Note:Tested
Date:21
Month:03
Year:2017
Amount:48
Category:Great
Account:Good
Note:Better
As you can imagine, I want all the text after the word "Note:" including the spaces and right up to the end of the line. I'm getting the results I need, but I'm not sure if this is a proper solution.
Is this the right way of going about it? Could it be made simpler?
Thank you.
Since your lines start with Note:, you need to use the ^ anchor before it. You may use capturing, as I suggested in my first comment:
preg_match_all('/^Note:(.+)/m', $s, $matches)
See this demo.
Here, ^Note:(.+) will assert the position at the start of the line, then Note: will get matched, and then any 1+ chars other than line break chars will get captured into Group 1, you will just need to access it using the right index.
Alternatively, use \K to drop the Note::
preg_match_all('/^Note:\K.+/m', $s, $matches)
See another regex demo
Here, ^Note:\K.+ will also match the Note: at the start of the line, then the matched text will be dropped by the \K match reset operator, and then 1+ chars other than line break chars will be consumed and placed into the match buffer.
Note the $ anchor is not even necessary here, since .+ will only match greedily up to the end of line on its own.
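The two variants can be compared side by side with a short sketch. The sample string is a shortened version of the data in the question, and the variable names are assumptions made for this example.

```php
<?php
// Shortened sample input, with lines separated by "\n".
$s = "Date:21\nMonth:03\nNote:Tested\nDate:21\nNote:Better";

// Variant 1: capturing group; the wanted text ends up in index 1.
preg_match_all('/^Note:(.+)/m', $s, $withGroup);

// Variant 2: \K resets the match start; the wanted text ends up in index 0.
preg_match_all('/^Note:\K.+/m', $s, $withK);

print_r($withGroup[1]);
print_r($withK[0]);
```

Both arrays should hold the same two strings; the only difference is which index of $matches you read.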
You can simplify this to just /Note:(.*)$/gm; I've updated your regex101 example. Other than that, yes, you're going about it the right way.

Regex issue with capturing group PHP

I have been looking around for a while and haven't been able to find what I'm looking for or perhaps I just don't know enough to know what I am looking for...
I have a situation where I want to capture an expression surrounded by parentheses that consists of 1 to 5 alphabetic characters. This is not difficult. Next, I want to exclude the exact string (AP) from my search.
I am using regex101 and I appear to be getting a match on the string I want to get (or the space right before it) but the match is only returning ''' and not the full (EXC) that I want. Here is the regex I have currently:
/((?=\(\D{1,5}\)))(?:.(?!AP))/gism
Any suggestions or pointers in the right direction would be appreciated; I will provide more information if necessary.
Here is a comparison of the regex you first had, and the regex that works (spaces added):
/((?= \(\D{1,5}\) )) (?: .(?!AP ))/gism
/(( \D{1,5} )) (?<! ( AP ))/gism
Your first pattern matches the opening parenthesis when the parentheses surround something non-numeric that doesn't start with AP. Keep in mind that lookarounds do not consume characters. (The dot is the only part of the pattern that isn't inside a lookaround.)
The other pattern removes the literal parentheses, \( and \). It also removes the lookahead ?= so that you are actually capturing something. The last part of the regex is a negative lookbehind. In this case, all it does is stop the pattern from matching a final p when the matched text would otherwise end in ap.
I cannot explain why the second pattern has so many unnecessary parentheses, however. This is equivalent:
/(\D{1,5})(?<!AP)/gism
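For the goal stated in the question (a parenthesised run of 1 to 5 letters, but not the exact string (AP)), one possible alternative is to put a negative lookahead right after the opening parenthesis. This is a sketch and not the pattern from the answer above; the function name findCodes and the sample text are hypothetical, and the match is case-sensitive by assumption.

```php
<?php
// Hypothetical helper: returns every "(LETTERS)" group of 1-5 letters except "(AP)".
function findCodes(string $text): array
{
    // \(            literal opening parenthesis
    // (?!AP\))      fail if what follows is exactly "AP)"
    // [A-Za-z]{1,5} one to five letters
    // \)            literal closing parenthesis
    preg_match_all('/\((?!AP\))[A-Za-z]{1,5}\)/', $text, $m);
    return $m[0];
}

print_r(findCodes('foo (EXC) bar (AP) baz (ABCDEF) qux (APX)'));
```

Because the lookahead includes the closing parenthesis, a longer code such as (APX) is still accepted, while (AP) alone is rejected.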

Regex - Match Word Aslong As Nothing Follows It

Having a little trouble with regex. I'm trying to test for a match, but only if nothing follows it. In the example below, if I go to test/create/1/2, it still matches. I only want to match if it's explicitly test/create/1 (but the 1 is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure that it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;
\w matches word characters, and $ matches the end of the string
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] and nothing following.
I think that's what you are after.
Edit: added * in \w*
Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string starts with "test" followed by a forward slash (remember the first backslash escapes the second, so PHP puts a literal backslash in the pattern, which escapes the / for the regex engine), followed by "create", followed by a forward slash; then capture everything that isn't a slash, up to the end of the string.
Comment if you need more detail
I prefer my expressions to always start with / because it has no meaning as a regex character. I've seen # used, and I believe another answer uses ^; that means "start of string", so I wouldn't use it as a regex delimiter.
Use the following regular expression (use $ to denote the end of the input):
'|test/create/[^/]+$|'
If you want to match only digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'
The ^ is an anchor for the beginning of the line, i.e. no characters may occur before it. Use a $ to designate the end of the string, or the end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter, especially because it has special meaning as either negation or a start-of-line anchor, so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur (or are escaped) in the regex itself. Just talking about a matter of style/maintainability here.
Of course nearly every potential delimiter has some special meaning, but you also often tend to see the ^ at the beginning of a regex, so I might choose another alternative. For example, # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}
The regex abc$ will match abc only when it is at the end of the string.
abcd # no match
dabc # match
abc # match
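Pulling the answers in this thread together, here is a small sketch of the check as a function. The function name isCreateUrl is hypothetical, and two details are assumptions that go beyond the accepted code: a leading ^ anchor so nothing may precede test/, and + instead of * so an empty id is rejected.

```php
<?php
// Hypothetical helper: true only for "test/create/{id}" with nothing after the id.
function isCreateUrl(string $url): bool
{
    // '#' is the delimiter, so '/' needs no escaping and '^' keeps its anchor meaning.
    // [\w-]+ allows word characters and hyphens; '$' forbids a trailing "/2".
    return preg_match('#^test/create/[\w-]+$#', $url) === 1;
}

var_dump(isCreateUrl('test/create/1'));     // id only
var_dump(isCreateUrl('test/create/my-id')); // hyphenated id
var_dump(isCreateUrl('test/create/1/2'));   // extra segment after the id
```

Comparing against 1 rather than relying on truthiness also keeps a preg_match error (which returns false) from being confused with "no match" elsewhere in the code.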

What do these certain symbols/parts mean in preg_match?

I know a little about preg_match, however there are some that look rather complex and some that contain symbols that I don't entirely understand. For example:
On the first one - I can only assume this has something to do with an e-mail address and url, but what do things like [^/] and the ? mean?
preg_match('#^(?:http://)?([^/]+)#i', $variable);
.....
In the second one - what do things like the ^, {5} and $ mean?
preg_match("/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/", $variable);
It's just these small things I'm not entirely sure on and a brief explanation would be much appreciated.
Here are the direct answers. I kept them short because they won't make sense without an understanding of regex. That understanding is best gained at http://www.regular-expressions.info/tools.html. I advise you to also try out the regex helper tools listed there, they allow you to experiment - see live capturing/matching as you edit the pattern, very helpful.
Simple parentheses ( ) around something make it a group. Here you have (?=), which is an assertion, specifically a positive lookahead assertion. All it does is check whether what's inside actually exists forward from the current cursor position in the haystack. Still with me?
Example: foo(?=bar) matches foo only if followed by bar. bar is never matched, only foo is returned.
With this in mind, let's dissect your regex:
/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/
Reads as:
^.* From Start, capture 0-many of any character
(?=.{4,}) if there are at least 4 of anything following this
(?=.*[0-9]) if there is: 0-many of any, ending with an integer following
(?=.*[a-z]) if there is: 0-many of any, ending with a lowercase letter following
(?=.*[A-Z]) if there is: 0-many of any, ending with an uppercase letter following
.*$ 0-many of anything preceding the End
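The breakdown above can be tried out with a short sketch. The function name meetsRules and the sample inputs are hypothetical; the pattern is the one dissected above, and the foo(?=bar) example is included to show that the lookahead text is not part of the match.

```php
<?php
// Hypothetical helper: applies the pattern dissected above to a candidate string.
function meetsRules(string $candidate): bool
{
    // Each (?=...) is checked without consuming input: at least 4 characters,
    // a digit, a lowercase letter and an uppercase letter somewhere ahead.
    $pattern = '/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/';
    return preg_match($pattern, $candidate) === 1;
}

// foo(?=bar): "bar" must follow, but only "foo" is returned as the match.
preg_match('/foo(?=bar)/', 'foobar', $m);
var_dump($m[0]);

var_dump(meetsRules('aB3d')); // has all three classes, 4 characters
var_dump(meetsRules('abcd')); // no digit, no uppercase letter
```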
Although I am not a fan of just posting links, I think a regex tutorial would be too much. So check out this Regular Expression cheat sheet; it will probably get you on your way if you already have a little understanding of what it does.
Also check out this for some explanations and more helpful links; http://coding.smashingmagazine.com/2009/06/01/essential-guide-to-regular-expressions-tools-tutorials-and-resources/
First one:
The # characters actually have nothing to do with the content that is matched. Usually, you use / as the delimiter character in a regex. The downside is that you need to escape it every time you want to use it in the pattern. So here, # is used as the delimiter.
[^/] is a character class. [/] would match only the / character; ^ inverts this. [^/] matches any character except /.
Second one:
^ matches the beginning of the string, $ the end of the string. You can use this to enforce that the regex has to apply to the whole string you are matching on.
{5} is a quantifier. It is equivalent to {5,5} which is minimum 5, maximum 5, so it matches exactly 5 characters.
first one:
[^/] = everything but no slash
second one:
^ look from beginning of $variable
{5} exactly 5 occurrences of [A-Z]
$ look until end of $variable reached
combination of ^ and $ means that everything between that has to apply to $variable
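To see both patterns from the question in action, here is a minimal sketch. The function names extractHost and isValidCode and the sample inputs are hypothetical; the patterns are copied from the question unchanged.

```php
<?php
// Hypothetical helper: returns the part up to the first slash, skipping an optional "http://".
function extractHost(string $url): ?string
{
    // (?:http://)? is an optional non-capturing group; ([^/]+) captures 1+ non-slash characters.
    return preg_match('#^(?:http://)?([^/]+)#i', $url, $m) ? $m[1] : null;
}

// Hypothetical helper: 5 uppercase letters, 4 digits, 1 uppercase letter, and nothing else.
function isValidCode(string $code): bool
{
    return preg_match('/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/', $code) === 1;
}

var_dump(extractHost('http://www.example.com/index.html'));
var_dump(isValidCode('ABCDE1234F'));
```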

PHP regexp for /prefix/action(/id)

I'm trying to build a regexp that would match both strings:
/prefix/action
and
/prefix/action/43
but not
/prefix/action/
or
/prefix/action43
where 43 can be anything like [0-9]+, only if preceded by a slash
I also want to capture the number (not the slash), if any, as a named group.
~/prefix/action/?(?P<itemID>[0-9]+)?~
This expression does match, except that it does not enforce the condition that the number must be preceded by a slash; all my attempts at adding lookahead or lookbehind assertions failed on this one.
I do understand that with additional external processing the task can be solved, but I'm looking for a regexp solution only. Thank you for understanding.
Your suggestions are greatly appreciated.
Solved by:
~/prefix/action(?:/(?P<itemID>[0-9]+))?~
How about making both the / and the following number optional?
^/prefix/action(?:/[0-9]+)?$
To capture the optional number in a named group add (?P<itemID>...) around the number regex:
^/prefix/action(?:/(?P<itemID>[0-9]+))?$
Also note that I've added start and end anchors, which were missing in your regex. Without them you'll get a match if you have a sub-string of the input that matches the regex.
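A short sketch of the anchored pattern with the named group, run against the four paths from the question. The function name matchAction and its return shape are assumptions made for this example.

```php
<?php
// Hypothetical helper: null when the path does not match,
// otherwise an array holding the optional itemID (null when absent).
function matchAction(string $path): ?array
{
    $pattern = '~^/prefix/action(?:/(?P<itemID>[0-9]+))?$~';
    if (!preg_match($pattern, $path, $m)) {
        return null;
    }
    // When the optional group does not take part in the match,
    // the key is missing from $m, so fall back to null.
    return ['itemID' => $m['itemID'] ?? null];
}

var_dump(matchAction('/prefix/action'));    // match, no id
var_dump(matchAction('/prefix/action/43')); // match, id "43"
var_dump(matchAction('/prefix/action/'));   // no match
var_dump(matchAction('/prefix/action43'));  // no match
```

The slash sits inside the optional non-capturing group, so it can only appear together with the digits, and only the digits land in itemID.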
