Regex can't limit search range - PHP

I have the following problem.
I have a pattern like this:
/(?<=template=")(.*?)(.*\/)/gm
And a text like this:
template="test/widgets/glasgow.phtml"}}
My regex should find the path in front of my file name. I need to cut it out so that the result looks like this:
template="glasgow.phtml"}}
That works fine, but the problem is that I sometimes have text that looks like this:
block="core/template" template="test/widgets/getcallus.phtml"}}</p>
It cuts everything out up to the </.
This is what gets cut out:
test/widgets/getcallus.phtml"}}</
Instead of:
test/widgets/
I have tried to limit the end with $ but it has no effect.
I am testing it on regexr.com:
https://regexr.com/50hi2

You may use the following pattern:
template="\K[^"\/]*\/[^"\/]*\/
In PHP, you can drop the backslashes before / if you specify another regex delimiter:
$regex = '~template="\K[^"/]*/[^"/]*/~';
Details
template=" - literal text
\K - match reset operator (discards the text matched so far from the returned match)
[^"\/]* - 0 or more chars other than / and "
\/ - a / char
[^"\/]* - 0 or more chars other than / and "
\/ - a / char
It is equivalent to template="\K(?:[^"\/]*\/){2}, where (?:...){2} repeats the non-capturing group sequence of patterns twice.
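As a rough illustration of the pattern above, here is a minimal sketch that runs it through preg_replace on the two sample strings from the question. The variable names are only for illustration; the replacement is an empty string because \K leaves just the two directory segments in the match.

```php
<?php
// Pattern from the answer above, with ~ as delimiter so / needs no escaping.
$regex = '~template="\K[^"/]*/[^"/]*/~';

// Sample inputs copied from the question.
$inputs = [
    'template="test/widgets/glasgow.phtml"}}',
    'block="core/template" template="test/widgets/getcallus.phtml"}}</p>',
];

// \K drops template=" from the match, so only "test/widgets/" is replaced.
$results = [];
foreach ($inputs as $input) {
    $results[] = preg_replace($regex, '', $input);
}

print_r($results);
```

Note that this variant assumes exactly two directory levels before the file name, as in the samples.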

Be careful with (.*?)(.*\/)
This pattern backtracks heavily and can become a ReDoS risk on long inputs: there are about n ways to split the n chars before the last / between the two groups, so a failing match can cost up to roughly n^2 steps.
To keep a regex close to yours, you can use
/(?<=template=")([^"\/]*\/)*([^"]*)"/
([^"\/]*\/)* reads as many blocks of "chars other than / and ", followed by /" as possible.
https://regex101.com/r/SMSv5R/2
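A small sketch of how this lookbehind variant could be used in PHP. It assumes each directory block is written as [^"\/]*\/ (chars other than / and ", then a /), following the wording of the description above, and it uses a non-capturing group for the directories so that group 1 is the file name. Unlike the fixed two-segment pattern, it should cope with any number of directory levels.

```php
<?php
// Assumption: directory blocks are [^"\/]*\/ as the description words them.
// (?:...)* consumes every "dir/" block; group 1 is whatever is left before the quote.
$regex = '/(?<=template=")(?:[^"\/]*\/)*([^"]*)"/';

$input = 'block="core/template" template="test/widgets/getcallus.phtml"}}</p>';

// The match includes the closing quote, so it is put back in the replacement.
$result = preg_replace($regex, '$1"', $input);

echo $result, PHP_EOL;
```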

Related

How to get strings after specific patterns

I have strings that looks like this:
searchUniqueCode("name", "FF14_1451_DAD4");searchUniqueCode("name", "F1F1_1451_DAD4");
searchUniqueCode("name", "FF14_3121_DAD4");searchUniqueCode("name", "SH14_1451_DAD4");
searchUniqueCode("name", "FF14_1131_DAD4");searchUniqueCode("name", "FF14_1451_D31F");
I am trying to get all of the strings inside the quotes that follow the common pattern searchUniqueCode("name", for example FF14_1451_DAD4.
Is there any way I can achieve that using PHP?
Thank you!
Try this regex:
searchUniqueCode\("name",\s*"\K[^"]+
Explanation:
searchUniqueCode\("name", - matches the literal text searchUniqueCode("name",
\s*" - matches 0 or more whitespace characters followed by a "
\K - un-matches whatever has been matched so far and starts the match from the current position
[^"]+ - matches 1 or more occurrences of any character that is not a ". This is the desired match that will match everything until the next occurrence of "
Or
You can capture the desired values in group 1 as shown below:
searchUniqueCode\("name",\s*"([^"]+)"
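Both variants can be tried with preg_match_all. The following is only a sketch on the first sample line from the question; the variable names are made up for the example.

```php
<?php
$input = 'searchUniqueCode("name", "FF14_1451_DAD4");searchUniqueCode("name", "F1F1_1451_DAD4");';

// Variant 1: \K discards the prefix, so index 0 (the full matches) holds the codes.
preg_match_all('/searchUniqueCode\("name",\s*"\K[^"]+/', $input, $withK);
$codesFromK = $withK[0];

// Variant 2: the code is captured in group 1, so index 1 holds the codes.
preg_match_all('/searchUniqueCode\("name",\s*"([^"]+)"/', $input, $withGroup);
$codesFromGroup = $withGroup[1];

print_r($codesFromK);
print_r($codesFromGroup);
```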

Regex prevent selecting characters from previous match

My title probably doesn't explain exactly what I mean. Take the following string:
POWERSTART9^{{2|3}}POWERENDx{{3^EXSTARTxEXEND}}=POWERSTART27^{{1|4}}POWEREND
What I want to do here is isolate the parts that are like this:
{{2|3}} or {{1|4}}
The following expression works to an extent, it selects the first one {{2|3}} with no issue:
\{\{(.*?)\|(.*?)\}\}
The problem is that it does not select just {{2|3}} and then {{1|4}}: after the first one comes {{3^EXSTARTxEXEND}}, so the second match starts at {{3 and runs right up to the end of the part I want, |4}}.
I've never been great with regex and can't work out how to stop it doing that. Any ideas? I basically want it to only match the exact pattern and not something that contains it.
You may use
\{\{((?:(?!{{).)*?)\|(.*?)}}
If there can be no { and } inside the {{...}} substrings, you may use the simpler expression \{\{([^{}|]*)\|([^{}]*)}}.
Details
\{\{ - a {{ substring
((?:(?!{{).)*?) - Capturing group 1: any char (.), as few as possible (*?), as long as it does not start a {{ char sequence (tempered greedy token)
[^{}|]* - in the simpler pattern: any 0 or more chars other than {, } and |
\| - a | char
(.*?) - Capturing group 2: any 0 or more chars, as few as possible
[^{}]* - in the simpler pattern: any 0 or more chars other than { and }
}} - a }} substring.
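To compare the two expressions, here is a short sketch that applies both to the string from the question with preg_match_all. PREG_SET_ORDER is used so that each match comes back as one row (full match, group 1, group 2); the variable names are illustrative.

```php
<?php
$input = 'POWERSTART9^{{2|3}}POWERENDx{{3^EXSTARTxEXEND}}=POWERSTART27^{{1|4}}POWEREND';

// Tempered greedy token: group 1 may not run across another {{ opener.
preg_match_all('/\{\{((?:(?!{{).)*?)\|(.*?)}}/', $input, $tempered, PREG_SET_ORDER);

// Simpler variant: no braces allowed inside {{...}} at all.
preg_match_all('/\{\{([^{}|]*)\|([^{}]*)}}/', $input, $negated, PREG_SET_ORDER);

print_r($tempered);
print_r($negated);
```

On this input both variants should skip {{3^EXSTARTxEXEND}}, since it contains no | before the next {{ or }}.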
Try this \{\{([^\^|]*)\|([^\^|]*)\}\}
https://regex101.com/r/bLF8Oq/1

Matching string that contains asterisk [duplicate]

This question already has answers here:
Reference - What does this regex mean?
(1 answer)
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 3 years ago.
I know this sounds easy but I am stuck.
I want to match strings that have an asterisk *.
Essentially I want to allow strings having an asterisk at the front, the back, or both, but not in the middle:
(At most there will be 2 asterisks, one at the front and one at the back, none in the middle, and the string itself must be present.)
ALLOW:
*string* *string string* string
DENY:
*str*ing*
*str*ing str*ing* str*ing
*string*****
I tried
^\\*?((?!\\*).)*\\\*?$
and somehow it works.
Can someone explain how this works?
And verify whether it is correct, because regex is hard to debug and check.
You can use the following regex:
^\*?\w+\*?$
demo: https://regex101.com/r/vwuXv2/1/
Explanations:
^ anchor imposing the start of a line
\*? a * appearing at most one time
\w+ at least 1 word char appearing in the text ([a-zA-Z0-9_] feel free to change it depending on your need)
\*? a * appearing at most one time
$ end of line anchor
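A quick way to check the pattern against the ALLOW and DENY lists from the question is sketched below. The closure name $matchesWordPattern is hypothetical; only preg_match and the pattern itself come from the answer.

```php
<?php
// Hypothetical helper: true when the whole string fits ^\*?\w+\*?$.
$matchesWordPattern = function (string $value): bool {
    return preg_match('/^\*?\w+\*?$/', $value) === 1;
};

$allow = ['*string*', '*string', 'string*', 'string'];
$deny  = ['*str*ing*', '*str*ing', 'str*ing*', 'str*ing', '*string*****'];

$allowResults = array_map($matchesWordPattern, $allow);
$denyResults  = array_map($matchesWordPattern, $deny);

var_dump($allowResults, $denyResults);
```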
Now if you are interested in partial line matches, you can use the following regex:
(?<=^| )\*?\w+\*?(?=$| )
demo: https://regex101.com/r/vwuXv2/2/
Explanation: the lookbehind (?<=^| ) and the lookahead (?=$| ) require the match to be delimited by a space or by the start/end of the line.
To allow Japanese characters, as requested in the comment (add to [^*\s] any characters you need to exclude from the words):
^\*?[^*\s]+\*?$
demo: https://regex101.com/r/RaCmwt/1/
or
^\*?[[:alpha:]]+\*?$
(with unicode flag enabled) or just
^\*?\p{L}+\*?$
demo: https://regex101.com/r/RaCmwt/2/
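In PHP the Unicode flag is the u modifier. The sketch below assumes the source file and the input are UTF-8 encoded; the sample strings are made up for the example.

```php
<?php
// The u modifier makes PCRE treat pattern and subject as UTF-8,
// so \p{L} matches whole letters rather than single bytes.
$pattern = '/^\*?\p{L}+\*?$/u';

// Illustrative samples: asterisk at both ends, at the end, in the middle, plain ASCII.
$samples = ['*文字列*', '文字列*', '文*字列', '*string*'];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_match($pattern, $sample) === 1;
}

var_dump($results);
```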
You can simply say: Optionally start with asterisk, 0 or more arbitrary characters except asterisk, optionally end with asterisk.
^\*?[^*]*\*?$
https://regex101.com/r/bibCEc/2
An alternative is to invert the check and test that there is no asterisk (i.e. if (!...)) anywhere other than at the beginning or end, using a negative lookbehind and lookahead:
(?<!^)\*(?!$)
https://regex101.com/r/8St0M4/2
According to your recent edit you would use the quantifier + to match 1 or more characters:
^\*?[^*]+\*?$
https://regex101.com/r/bibCEc/3
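The two approaches can be compared side by side. This is only a sketch with hypothetical closure names; it also shows one difference worth knowing: the inverted check on its own does not require any text between the asterisks, so a bare ** passes it while the + pattern rejects it.

```php
<?php
// Direct check: optional *, at least one non-asterisk char, optional *.
$isValid = function (string $value): bool {
    return preg_match('/^\*?[^*]+\*?$/', $value) === 1;
};

// Inverted check: true when an asterisk sits somewhere other than the start or end.
$hasMiddleAsterisk = function (string $value): bool {
    return preg_match('/(?<!^)\*(?!$)/', $value) === 1;
};

$allow = ['*string*', '*string', 'string*', 'string'];
$deny  = ['*str*ing*', '*str*ing', 'str*ing*', 'str*ing', '*string*****'];

$allowDirect   = array_map($isValid, $allow);
$denyDirect    = array_map($isValid, $deny);
$allowInverted = array_map($hasMiddleAsterisk, $allow);
$denyInverted  = array_map($hasMiddleAsterisk, $deny);

// Edge case: two asterisks and nothing else.
$edgeDirect   = $isValid('**');
$edgeInverted = $hasMiddleAsterisk('**');

var_dump($allowDirect, $denyDirect, $allowInverted, $denyInverted, $edgeDirect, $edgeInverted);
```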

PHP Regex Not Quite Working

I am using the following regex:
^[0-9.,]*(([.,][-])|([.,][0-9]{2}))?\$
I use this regex to check for valid prices, so it rejects things like xxx, llddd or 34.23dsds
and allows things like 100 or 120.00
The problem seems to be that if the input is blank (empty) it passes as valid, which it should not. Any ideas how to change this?
Thanks
Note that a dot outside a character class stands for "any character" and must be escaped as \. to match a literal dot; inside a class such as [.,] it is already literal, so the escape is optional there.
Also, you should require at least one digit, so replace the asterisk * with a + for "one or more".
Finally, the pattern still accepts input such as .,.,.,.,.,.,- unless you remove the comma and dot from the first part:
^[0-9]+(([\.,][-])|([\.,][0-9]{2}))?$
Taking your regex and just solving the "don't match blanks" problem:
^[0-9.,]+(([.,][-])|([.,][0-9]{2}))?$
The * allows 0 or more, while the + allows 1 or more; thus the * allowed blanks but the + does not: there must be at least one character from the class (a digit, dot or comma).
EDIT:
You should clean this regex up a bit to be
^[0-9]+(?:[.,-](?:[0-9]{2})?)?$
This no longer matches strings such as ",,,"
http://www.regextester.com/?fam=95185
EDIT 2: @Fuzzzzel pointed out that this did not match the case "50,-", which we assume you would like to match, and that removing capturing groups is presumptive. Here's the latest iteration of my suggested regex:
^[0-9]+([.,-](-|([0-9]{2}))?)?$
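To see how this last iteration behaves, here is a minimal sketch that runs it over the values mentioned in the question and answers, including the empty string. The sample list is assembled for illustration only.

```php
<?php
// Latest pattern from the answer above.
$pattern = '/^[0-9]+([.,-](-|([0-9]{2}))?)?$/';

// Values taken from the thread: three that should pass, four that should fail.
$samples = ['100', '120.00', '50,-', '', ',,,', 'xxx', '34.23dsds'];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_match($pattern, $sample) === 1;
}

var_dump($results);
```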

Capturing key value pairs from a url string with a regex pattern

I'm trying to use regex to parse a string like the below:
/subject=hello±#text=something that may contain\#hello.com or a normal sla/sh±#date=blah/somethingelseI don't want to capture after the first/
into:
subject = hello
text =something that may contain\#hello.com or a normal sla/sh
date = blah
Ideally I'd like to be able to split the string after the first '/' by something like '±#' - and only that combination in that order.
I've looked around and at the moment have the below:
([^/±#,= ]+)=([^±#,= ]+)
But this doesn't match only '±#'; it matches either # or ±.
It also doesn't cope with the escaped #. (Instead I get: text= something that may contain\ ).
Is there a better way to do this?
Thanks
Try this:
(?:\/|(?<=±#))(.*?=.*?)(?:±#|$|\/(?!.*±#))
An important part is the negative lookahead after the trailing slash, /(?!.*±#), which means "match a slash, but only if ±# doesn't appear in the input after it".
Given this input:
/subject=hello±#text=something that may contain\#hello.com or a normal sla/sh±#date=blah/somethingelseI don't want to capture after the first/
It produces matches whose group 1 values are:
subject=hello
text=something that may contain\#hello.com or a normal sla/sh
date=blah
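In PHP this could be wired up roughly as follows. The sketch assumes UTF-8 input (hence the u modifier), uses ~ as the delimiter, and splits each captured pair on its first = with explode; the $pairs array is just an illustrative way to hold the result.

```php
<?php
$input = '/subject=hello±#text=something that may contain\#hello.com or a normal sla/sh±#date=blah/somethingelseI don\'t want to capture after the first/';

// Pattern from the answer; the u modifier enables UTF-8 handling for ±.
$regex = '~(?:\/|(?<=±#))(.*?=.*?)(?:±#|$|\/(?!.*±#))~u';

preg_match_all($regex, $input, $matches);

// Each group 1 value is "key=value"; split on the first = only,
// so an = inside the value would be preserved.
$pairs = [];
foreach ($matches[1] as $pair) {
    [$key, $value] = explode('=', $pair, 2);
    $pairs[$key] = $value;
}

print_r($pairs);
```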
