I'm trying to build a regexp that would match both strings:
/prefix/action
and
/prefix/action/43
but not
/prefix/action/
or
/prefix/action43
where 43 can be anything like [0-9]+, only if preceded by a slash
I also want to capture the number (not the slash), if any, as a named group.
~/prefix/action/?(?P<itemID>[0-9]+)?~
This expression does match, except for a condition that numbers must be preceded by a slash, all my attempts at adding lookahead or lookbehind assertions failed on this one.
I do understand that with additional external processing the task can be solved, but I'm looking for a regexp solution only. Thank you for understanding.
Your suggestions are greatly appreciated.
Solved by:
~/prefix/action(?:/(?P<itemID>[0-9]+))?~
How about making both the / and the following number optional?
^/prefix/action(?:/[0-9]+)?$
To capture the optional number in a named group add (?P<itemID>...) around the number regex:
^/prefix/action(?:/(?P<itemID>[0-9]+))?$
Also note that I've added start and end anchors, which were missing in your regex. Without them you'll get a match if you have a sub-string of the input that matches the regex.
Related
The data along with the regex pattern I'm using is linked here:
(?m)(?<=Note:)(\w+|\s+)*$
The sample text is:
Date:21
Month:03
Year:2017
Amount:50
Category:Test
Account:Testimg
Note:Tested
Date:21
Month:03
Year:2017
Amount:48
Category:Great
Account:Good
Note:Better
As you can imagine, I want all the text after the word "Note:" including the spaces and right up to the end of the line. I'm getting the results I need, but I'm not sure if this is a proper solution.
Is this the right way of going about it? Could it be made simpler?
Thank you.
Since your lines start with Note: and you need to use ^ anchor before it. You may use capturing as I suggested in my first comment:
preg_match_all('/^Note:(.+)/m', $s, $matches)
See this demo.
Here, ^Note:(.+) will assert the position at the start of the line, then Note: will get matched, and then any 1+ chars other than line break chars will get captured into Group 1, you will just need to access it using the right index.
Alternatively, use \K to drop the Note::
preg_match_all('/^Note:\K.+/m', $s, $matches)
See another regex demo
Here, ^Note:\K.+ will also match the Note: at the start of the line, and then the text will be dropped due to \K match reset operator, and then 1+ chars other than line break chars will get consumed and placed into the match buffer.
Note the $ anchor is not even necessary here, since .+ will only match greedily up to the end of line on its own.
You can simplify this to just /Note:(.*)$/gm, I've updated your regex101 example. But other than that yes you're going about it the right way.
I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
You are looking for a regular expression.
A basic pattern starts out as a fixed string. /u/ or /r/ which would match those exactly. This can be simplified to match one or another with /(?:u|r)/ which would match the same as those two patterns. Next you would want to match everything from that point up to a space. You would use a negative character group [^ ] which will match any character that is not a space, and apply a modifier, *, to match as many characters as possible that match that group. /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ) to ensure your match is preceded by a space so you're not matching a partial which results in (?<= )/(?:u|r)/[^ ]*. You wrap all of that to make a capturing group ((?<= )/(?:u|r)/[^ ]*). This will capture the contents within the parenthesis to allow for a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
In my opinion regex would be an overkill for such a simple operation. If you just want to replace instance of "/r/x" with "[r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)" you should use str_replace although with preg_replace it'll lessen the code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
use regex for intricate search string and replace. you can also use http://www.jslab.dk/tools.regex.php regular expression generator if you have something complex to capture in the string.
I have some text with heading string and set of letters.
I need to get first one-digit number after set of string characters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you cans see, RegEx works ok, but it captures first groups in results also. I need to get only this integer and when I try to put ?= in first group like this: ^(?=\D*)(\d{1})(?=\d*$) , Regex doesn't work.
Any ideas?
Thanks in advance.
(?=..) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What is interesting with these two features, is the fact that contents matched inside them are not parts of the whole match result. The only problem is that a lookbehind can't match variable length content.
A way to avoid the problem is to use the \K feature that remove all on the left from match result:
^[A-Z]+\K\d(?=\d*$)
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*\d{1})(\d*)$
The (?: string will start a non-capturing group. This will not come back in matches.
So, if you used preg_match(';^(?:\D*\d{1})(\d*)$;', $string, $matches) to find your match, $matches[1] would be the string for which you're looking. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group
I got a "little" problem already...
I only want to replace some names with other names or something.
It works fine so far, but with some names I got a problem.
For example I want to replace "Cho" with "Cho'Gath",
but of course I don't want to replace "Cho'Gath" with "Cho'Gath'Gath".
So therefore I created this regular expression, and replace all "Cho"'s except of "Cho'Gath":
/\bCho(?!.+Gath)\b/i
This works and it doesn't replace "Cho'Gath", but it also doesn't replace "Cho Hello World Gath" ... that is my first problem!
The second one is follwing: I also want to replace all "Yi", but not "Master Yi", so I tried the same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This doesn't replace "Master Yi", okay. But it also doesn't replace "Yi", but it should do! (I also tried /\b(?!Master(**\s**))Yi\b/i but this also doesn't work)
So far I don't know what to do know... can anyone help me with that?
Your first problem is easily solved if you replace .+ with the actual character that you want to match (or not to match): ', but let's have a look at the second one, this is quite interesting:
I also want to replace all "Yi", but not "Master Yi", so I tried the
same with the following regular expression:
/\b(?!Master.+)Yi\b/i
This is a negative lookahead on \b. The expression does match a single "Yi", but look what it does with "Master Yi":
Hello I am Master Yi
^
\b
This boundary is not followed by "Master" but followed by "Yi". So your expression also matches the "Yi" in this string.
The negative lookahead is quite pointless because it checks if the boundary that is directly followed by "Yi" (remember that a lookahead assertion just "looks ahead" without moving the pointer forward) is not directly followed by "Master". This is always the case.
You could use a lookbehind assertion instead, but only without the (anyways unnecessary) .+, because lookbehind assertions must have fixed lengths:
/\b(?<!Master )Yi\b/i
matches every "Yi" that is not preceded by "Master ".
For the first regex:
\bCho(?!.Gath)\b
For the second:
(?<!\bMaster )Yi\b
Your first regex had .+ in it, that is one character, one or more times; and as quantifiers are greedy by default, this swallows the whole input before reluctantly giving back to match the next token (G).
Your second regex used a negative lookahead, what you wanted was a negative lookbehind. That is, a position where the text before that position does not match.
And note that regexes in lookbehinds must be of finite length.
I know a little about preg_match, however there are some that look rather complex and some that contain symbols that I don't entirely understand. For example:
On the first one - I can only assume this has something to do with an e-mail address and url, but what do things like [^/] and the ? mean?
preg_match('#^(?:http://)?([^/]+)#i', $variable);
.....
In the second one - what do things like the ^, {5} and $ mean?
preg_match("/^[A-Z]{5}[0-9]{4}[A-Z]{1}$/", $variable);
It's just these small things I'm not entirely sure on and a brief explanation would be much appreciated.
Here are the direct answers. I kept them short because they won't make sense without an understanding of regex. That understanding is best gained at http://www.regular-expressions.info/tools.html. I advise you to also try out the regex helper tools listed there, they allow you to experiment - see live capturing/matching as you edit the pattern, very helpful.
Simple parentheses ( ) around something makes it a group. Here you have (?=) which is an assertion, specifically a positive look ahead assertion. All it does is check whether what's inside actually exists forward from the current cursor position in the haystack. Still with me?
Example: foo(?=bar) matches foo only if followed by bar. bar is never matched, only foo is returned.
With this in mind, let's dissect your regex:
/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/
Reads as:
^.* From Start, capture 0-many of any character
(?=.{4,}) if there are at least 4 of anything following this
(?=.*[0-9]) if there is: 0-many of any, ending with an integer following
(?=.*[a-z]) if there is: 0-many of any, ending with a lowercase letter following
(?=.*[A-Z]) if there is: 0-many of any, ending with an uppercase letter following
.*$ 0-many of anything preceding the End
Although I am not a fan of just posting links, I think a regex tutorial would be too much. So check out this Regular Expression cheat sheet it will probably get you on your way if you already have a little understanding of what it does.
Also check out this for some explanations and more helpful links; http://coding.smashingmagazine.com/2009/06/01/essential-guide-to-regular-expressions-tools-tutorials-and-resources/
First one:
The # actually don't have anything to do with the content that is matched. Usually, you use / as the delimiter character in a regex. Downside is, that you need to escape it everytime you want to use it. So here, # is used as the delimiter.
[^/] is a character group. [/] would match only the / character, ^ inverts this. [^/] matches all characters except the /.
Second one:
^ matches the beginning of the string, $ the end of the string. You can use this to enforce that the regex has to apply to the whole string you are matching on.
{5} is a quantifier. It is equivalent to {5,5} which is minimum 5, maximum 5, so it matches exactly 5 characters.
first one:
[^/] = everything but no slash
second one:
^ look from beginning of $variable
{5} exactly 5 occurencies of [A-Z]
$ look until end of $variable reached
combination of ^ and $ means that everything between that has to apply to $variable