Query regarding Regex pattern - php

Here is the regex pattern I'm using:
(?m)(?<=Note:)(\w+|\s+)*$
The sample text is:
Date:21
Month:03
Year:2017
Amount:50
Category:Test
Account:Testimg
Note:Tested
Date:21
Month:03
Year:2017
Amount:48
Category:Great
Account:Good
Note:Better
As you can imagine, I want all the text after the word "Note:" including the spaces and right up to the end of the line. I'm getting the results I need, but I'm not sure if this is a proper solution.
Is this the right way of going about it? Could it be made simpler?
Thank you.

Since your lines start with Note:, you need to put the ^ anchor before it. You may use a capturing group, as I suggested in my first comment:
preg_match_all('/^Note:(.+)/m', $s, $matches)
See this demo.
Here, ^Note:(.+) asserts the position at the start of a line, then matches Note:, and then captures any 1+ chars other than line break chars into Group 1. You just need to access it using the right index ($matches[1]).
Alternatively, use \K to drop the Note::
preg_match_all('/^Note:\K.+/m', $s, $matches)
See another regex demo
Here, ^Note:\K.+ will also match the Note: at the start of the line, and then the text will be dropped due to \K match reset operator, and then 1+ chars other than line break chars will get consumed and placed into the match buffer.
Note the $ anchor is not even necessary here, since .+ will only match greedily up to the end of line on its own.
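As a rough sketch of the two patterns above, the snippet below runs both against a shortened copy of the sample text. The variable names and the extra words in the second note are made up for illustration.

```php
<?php
// Shortened sample input; the second note has extra words to show that spaces are kept.
$s = "Account:Testimg\nNote:Tested\nAccount:Good\nNote:Better and more";

// Variant 1: capture the text after "Note:" in group 1.
preg_match_all('/^Note:(.+)/m', $s, $captured);
$notesFromGroup = $captured[1];

// Variant 2: \K drops "Note:" so the whole match is the note text.
preg_match_all('/^Note:\K.+/m', $s, $reset);
$notesFromMatch = $reset[0];

// Both arrays hold "Tested" and "Better and more".
print_r($notesFromGroup);
print_r($notesFromMatch);
```

With the capturing variant the result lives in index 1; with \K it lives in index 0, which is the only practical difference here.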

You can simplify this to just /Note:(.*)$/gm; I've updated your regex101 example. (The g flag is regex101 notation; in PHP, use preg_match_all with only the m flag.) Other than that, yes, you're going about it the right way.


PHP regex capturing parentheses

I can't capture what I want with capturing parentheses.
I'm searching in /hodsakers/marsh-zwartArray/d and I want to capture marsh-zwartArray, but sometimes the last / is not present in the string I'm searching.
I have searched and tried many things, like:
(marshall[\s\S]*)\/
It works, but not if the last slash is missing.
I also tried
(marsh[\s\S]*)(\/)?
Here it's the opposite: it works without the last slash, but when there is one it grabs the whole string and captures nothing.
So I don't know how to capture in both cases.
Thanks for your help.
You may use a [^\/]* negated character class to match 0+ chars other than /:
/marsh[^\/]*/
See the regex demo
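A minimal sketch of the negated character class in use, assuming the two path shapes described in the question (with and without the trailing segment). The $paths and $found names are illustrative only.

```php
<?php
// The same segment, once followed by "/d" and once at the end of the string.
$paths = ['/hodsakers/marsh-zwartArray/d', '/hodsakers/marsh-zwartArray'];

$found = [];
foreach ($paths as $path) {
    // [^\/]* stops at the next slash or at the end of the string, whichever comes first.
    if (preg_match('/marsh[^\/]*/', $path, $m)) {
        $found[] = $m[0];
    }
}

// Both entries are "marsh-zwartArray".
print_r($found);
```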

Replacing all matches except if surrounded by or only if surrounded by

Given a text string (a markdown document) I need to achieve one of these two options:
to replace all the matches of a particular expression ((\W)(theWord)(\W)) all across the document EXCEPT the matches that are inside a markdown image syntax ![Blah theWord blah](url).
to replace all the matches of a particular expression ({{([^}}]+)}}\[\[[^\]\]]+\]\]) ONLY inside the markdown images, ie.: ![Blah {{theWord}}[[1234]] blah](url).
Both expressions are currently matching everything, whether inside the markdown image syntax or not, and I've already tried everything I could think of.
Here is an example of the first option
And here is an example of the second option
Any help and/or clue will be highly appreciated.
Thanks in advance!
I modified the first expression a little, since I thought it had some unneeded capturing groups, and then made both expressions work by adding a lookahead trick:
-First one (Live demo):
\b(vitae)\b(?![^[]*]\s*\()
-Second one (Live demo):
{{([^}}]+)}}\[\[[^\]\]]+\]\](?=[^[]*]\s*\()
Lookahead part explanations:
(?! # Starting a negative lookahead
[^[]*] # Everything that's between brackets
\s* # Any whitespace
\( # Check if it's followed by an opening parentheses
) # End of lookahead which confirms the whole expression doesn't match between brackets
(?= means a positive lookahead
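To see both lookaheads at work, here is a small sketch. The two sample documents and the replacement text X are invented for illustration; the patterns are the ones given above, wrapped in / delimiters.

```php
<?php
// Hypothetical document: "vitae" appears outside and inside an image's alt text.
$doc = 'vitae one ![alt vitae text](http://example.com/a.png) vitae two';

// Negative lookahead: leave the word alone when "](" follows with no "[" in between.
$outside = preg_replace('/\b(vitae)\b(?![^[]*]\s*\()/', 'X', $doc);

// Positive lookahead: only match the marker when it sits inside image alt text.
$doc2 = '{{a}}[[1]] and ![see {{b}}[[2]] here](http://example.com/i.png)';
preg_match_all('/{{([^}}]+)}}\[\[[^\]\]]+\]\](?=[^[]*]\s*\()/', $doc2, $inside);

echo $outside, "\n";   // X one ![alt vitae text](http://example.com/a.png) X two
print_r($inside[0]);   // only {{b}}[[2]] is matched
```

The lookahead only checks what follows the match, so it relies on the alt text not containing a nested "[".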
You can leverage the discard technique, which is really useful for these cases. It consists of a pattern of this form:
patternToSkip1 (*SKIP)(*FAIL)|patternToSkip2 (*SKIP)(*FAIL)| MATCH THIS PATTERN
So, according to your needs:
to replace all the matches of a particular expression ((\W)(theWord)(\W)) all across the document EXCEPT the matches that are inside a markdown image syntax
You can easily achieve this in PCRE through the (*SKIP)(*FAIL) backtracking control verbs, so for your case you can use a regex like this:
\[.*?\](*SKIP)(*FAIL)|\bTheWord\b
Or using your pattern:
\[.*?\](*SKIP)(*FAIL)|(\W)(theWord)(\W)
The idea behind this regex is to tell the regex engine to skip the content within [...]
Working demo
The first regex is easily fixed with a SKIP-FAIL trick:
\!\[.*?\]\(http[^)]*\)(*SKIP)(*FAIL)|\bvitae\b
To replace with the word of your choice. It is a totally valid way in PHP (PCRE) regex to match something outside some markers.
See Demo 1
As for the second one, it is harder, but achievable with \G, which ensures we match consecutively inside some markers:
(\!\[.*?|(?<!^)\G)((?>(?!\]\(http).)*?){{([^}]+?)}}\[{2}[^]]+?\]{2}(?=.*?\]\(http[^)]*?\))
To replace with $1$2{{NEW_REPLACED_TEXT}}[[NEW_DIGITS]]
See Demo 2
PHP:
$re1 = "#\!\[.*?\]\(http[^)]*\)(*SKIP)(*FAIL)|\bvitae\b#i";
$re2 = "#(\!\[.*?|(?<!^)\G)((?>(?!\]\(http).)*?){{([^}]+?)}}\[{2}[^]]+?\]{2}(?=.*?\]\(http[^)]*?\))#i";
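As an illustration of the SKIP-FAIL pattern in $re1, the sketch below replaces the word everywhere except inside an image. The sample document and the replacement X are assumptions, and the pattern is written in single quotes so the backslashes reach PCRE unchanged.

```php
<?php
// Hypothetical document: the word appears before, inside, and after an image.
$doc = 'Vitae one ![alt vitae text](http://example.com/a.png) vitae two';

// Left branch: consume a whole image, then force a failure and skip past it.
// Right branch: the word we actually want to replace.
$re1 = '#\!\[.*?\]\(http[^)]*\)(*SKIP)(*FAIL)|\bvitae\b#i';

$result = preg_replace($re1, 'X', $doc);

echo $result, "\n";   // X one ![alt vitae text](http://example.com/a.png) X two
```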

PHP - Preg match reversal?

How do you invert a regular expression in PHP?
This is my code:
preg_match("!<div class=\"foo\">.*?</div>!is", $source, $matches);
This checks the $source string for everything within the container and stores it in the $matches variable.
But what I want to do is reverse the expression, i.e. I want to get everything that is NOT inside the container.
I know there is something called negative lookahead, but I am really bad with Regular expressions and didn't manage to come up with a working solution.
Simply using ?!
preg_match("?!<div class=\"foo\">.*?</div>!is", $source, $matches);
Does not seem to work.
Thanks!
New solution
Since your goal is to remove the matching divs, as mentioned in the comment, using the original regex with preg_split, plus implode would be the simpler solution:
implode('', preg_split('~<div class="foo">.*?</div>~is', $text))
Demo on ideone
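A short sketch of the preg_split plus implode approach, using an invented $text value:

```php
<?php
// Hypothetical input with two matching divs surrounded by other text.
$text = 'before<div class="foo">drop me</div>middle<div class="foo">and me</div>after';

// Split on every matching div, then glue the remaining pieces back together.
$pieces = preg_split('~<div class="foo">.*?</div>~is', $text);
$clean  = implode('', $pieces);

echo $clean, "\n";   // beforemiddleafter
```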
Old solution
I'm not sure whether this is a good idea, but here is my solution:
~(.*?)(?:<div class="foo">.*?</div>|$)~is
Demo on regex101
The result can be picked out from capturing group 1 of each match.
Note that the last match is always an empty string, and there can also be an empty match between 2 adjacent matching divs or when the string starts with a matching div. However, you need to concatenate the pieces anyway, so it seems to be a non-issue.
The idea is to rely on the fact that lazy quantifier .*? will always try the sequel (whatever comes after it) first before advancing itself, resulting in something similar to look-ahead assertion that makes sure that whatever matched by .*? will not be inside <div class="foo">.*?</div>.
The div tag is matched along in each match in order to advance the cursor past the closing tag. $ is used to match the text after the last matching div.
The s flag makes . match any character, including line separators.
Revision: I had to change .+? to .*?, since .+? does not handle strings with 2 matching divs next to each other or strings that start with a matching div.
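For comparison, a sketch of the capturing-group approach described above, with an invented $text value. Group 1 of every match is collected and concatenated.

```php
<?php
// Hypothetical input with two matching divs surrounded by other text.
$text = 'before<div class="foo">drop me</div>middle<div class="foo">and me</div>after';

// Group 1 lazily collects text up to the next matching div or the end of the string.
preg_match_all('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text, $m);

// Concatenating group 1 of every match yields the text outside the divs.
$outside = implode('', $m[1]);

echo $outside, "\n";   // beforemiddleafter
```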
Anyway, it's not a good idea to modify HTML with regular expression. Use a parser instead.
<div class=\"foo\">.*?</div>\K|.
You can simply do this by using \K.
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
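A sketch of how the \K pattern might be applied with preg_match_all, assuming an invented $text value. Each matching div produces an empty match (everything before \K is dropped), and every other character is matched on its own, so joining the matches gives the text outside the divs.

```php
<?php
// Hypothetical input with two matching divs surrounded by other text.
$text = 'before<div class="foo">drop me</div>middle<div class="foo">and me</div>after';

// Left branch: match a whole div, then reset the match start with \K.
// Right branch: match any single character outside a div.
preg_match_all('~<div class="foo">.*?</div>\K|.~is', $text, $m);

$outside = implode('', $m[0]);

echo $outside, "\n";   // beforemiddleafter
```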

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
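Applied to the example string from the question, the replacement can be sketched like this (the $linked variable name is illustrative):

```php
<?php
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// \1 in the replacement refers to the captured "/u/name" or "/r/name" text.
$linked = preg_replace(
    '#(/(?:u|r)/[a-zA-Z0-9_-]+)#',
    '[\1](http://reddit.com\1)',
    $string
);

echo $linked, "\n";
```

All three references are rewritten in a single call, and the comma after "reddit" is untouched because it is not part of any match.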
You are looking for a regular expression.
A basic pattern starts out as a fixed string. /u/ or /r/ which would match those exactly. This can be simplified to match one or another with /(?:u|r)/ which would match the same as those two patterns. Next you would want to match everything from that point up to a space. You would use a negative character group [^ ] which will match any character that is not a space, and apply a modifier, *, to match as many characters as possible that match that group. /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ) to ensure your match is preceded by a space so you're not matching a partial which results in (?<= )/(?:u|r)/[^ ]*. You wrap all of that to make a capturing group ((?<= )/(?:u|r)/[^ ]*). This will capture the contents within the parenthesis to allow for a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
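A sketch of the lookbehind variant, assuming # as the delimiter and an invented input that also contains a full URL. The lookbehind keeps the /r/ inside the URL from being rewritten.

```php
<?php
// Hypothetical input: one bare reference and one that is already part of a URL.
$string = 'visit /r/subreddit or http://reddit.com/r/other';

// (?<= ) requires a space right before the match, so "/r/other" is skipped.
$linked = preg_replace(
    '#((?<= )/(?:u|r)/[^ ]*)#',
    '[\1](http://reddit.com\1)',
    $string
);

echo $linked, "\n";   // visit [/r/subreddit](http://reddit.com/r/subreddit) or http://reddit.com/r/other
```

Two caveats with this variant: a reference at the very start of the string has no preceding space and will not match, and [^ ]* also swallows trailing punctuation such as a comma.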
In my opinion regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace will need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace operations. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

Regex - Match Word As Long As Nothing Follows It

Having a little trouble with regex. I'm trying to test for a match but only if nothing follows it. In the example below, if I go to test/create/1/2, it still matches. I only want to match if it's explicitly test/create/1 (but the 1 is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure that it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;
\w matches word characters, and $ matches the end of the string
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] and nothing following.
I think that's what you are after.
Edit: added * in \w*
Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string starts with "test" followed by a forward slash (remember the first backslash escapes the second, so PHP puts a literal backslash in the pattern, which escapes the / for the regex engine), followed by "create", followed by a forward slash, followed by a captured run of anything that isn't a slash, which must then reach the end of the string.
Comment if you need more detail
I prefer my expressions to always be delimited by / because it has no meaning as a regex character. I've seen # used, and I believe some other answer uses ^, but that means "start of string", so I wouldn't use it as my regex delimiter.
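A small sketch of this pattern in use, showing the captured id. The URLs and the $results array are made up for illustration.

```php
<?php
// Same pattern as above; in a double-quoted string "\\/" becomes \/ for the regex engine.
$pattern = "/^test\\/create\\/([^\\/]*)$/";

$results = [];
foreach (['test/create/42', 'test/create/42/7', 'other/test/create/42'] as $url) {
    // Store the captured id on a match, or null when the URL is rejected.
    $results[$url] = preg_match($pattern, $url, $m) ? $m[1] : null;
}

// Only the first URL matches; its id is "42".
var_dump($results);
```

The second URL fails because of the extra slash, and the third fails because ^ requires the string to start with "test".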
Use following regular expression (use $ to denote end of the input):
'|test/create/[^/]+$|'
If you only want to match digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'
The ^ is an anchor for the beginning of the line, i.e. no characters occurring before the ^ . Use a $ to designate the end of the string, or end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter, especially because it has special meaning as either negation or a start-of-line anchor, so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur (or are escaped) in the regex itself. Just talking about a matter of style/maintainability here.
Of course nearly every potential delimiter has some special meaning, but you also often tend to see the ^ at the beginning of a regex, so I might choose another alternative. For example # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}
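To check how the #-delimited pattern behaves, here is a quick sketch over a few made-up URLs (the $checks array is illustrative):

```php
<?php
$pattern = '#test/create/[\w-]*$#';

$checks = [];
foreach (['test/create/1', 'test/create/1/2', 'test/create/my-id', 'test/create/'] as $url) {
    // preg_match returns 1 on a match and 0 otherwise.
    $checks[$url] = preg_match($pattern, $url) === 1;
}

// true, false, true, true
var_dump($checks);
```

The last URL matches because * allows an empty id; using + instead of * would reject it. The pattern also has no start anchor, so adding ^ after the opening # would be needed to reject a prefix such as other/test/create/1.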
The regex abc$ will match abc only when it's at the end of the string.
abcd # no match
dabc # match
abc # match
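The three cases above can be checked with a minimal sketch like this (the $hits array is illustrative):

```php
<?php
$hits = [];
foreach (['abcd', 'dabc', 'abc'] as $subject) {
    // $ anchors the match to the end of the subject string.
    $hits[$subject] = preg_match('/abc$/', $subject) === 1;
}

// abcd => false, dabc => true, abc => true
var_dump($hits);
```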
