RegEx with a character set inside a positive lookbehind: is it possible? - php

I need to match "name" only when it comes after "listing", but of course those words could be any URL directory or page.
mydomain.com/listing/name
So the only thing I can require is that there is some parent directory before it.
In other words, I want to match by position, i.e. whatever comes second after the domain.
I'm trying something like
(?<=mydomain\.com/[^/\?&]+/)[^/\?&]+(?:/)?
But the character set won't work inside the positive lookbehind unless it's set up to match only ONE character. As soon as I try to match more than one (e.g. by adding +, ? or *), it just stops working.
I'm obviously missing something about lookbehind syntax, or it simply isn't intended for what I'm trying to do.
How can I match that 2nd level filename?
Thanks.

Regular-expressions.info states that
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind...
(Read further, they even mention Perl, Python and Java.)
I think the quantifier might be the problem. I found this on Stack Overflow and only skimmed it.
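To see that limitation in PHP, here is a small sketch using the URL from the question. It assumes PHP's bundled PCRE library, which rejects a lookbehind whose length has no upper bound: preg_match() then returns false (and raises a warning, silenced here with @), while a fixed-length lookbehind compiles and matches normally.

```php
<?php
$url = 'mydomain.com/listing/name';

// Unbounded lookbehind: [^/?&]+ has no maximum length, so PCRE refuses to compile
// the pattern and preg_match() returns false (the warning is suppressed with @).
$variable = @preg_match('~(?<=mydomain\.com/[^/?&]+/)[^/?&]+~', $url, $m);

// Fixed-length lookbehind: "mydomain.com/" is always 13 characters, so this compiles
// and matches the first path segment.
$fixed = preg_match('~(?<=mydomain\.com/)[^/?&]+~', $url, $m2);

var_dump($variable);   // bool(false)
echo $m2[0], PHP_EOL;  // listing
```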
Wouldn't it be possible to just match the whole path and use a capture group for the second-level filename:
mydomain\.com\/[^\/\?&]+\/([^\/\?&]+)(?:\/)?
(note: I had to escape the / for my tests...)
The result of this would be something like:
Array
(
[0] => mydomain.com/listing/name
[1] => name
)
Since I don't know the context of your problem, I assumed you can post-process the results and take group 1 (index 1) from the match array. If not, I'm afraid I don't know another way...
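A minimal PHP version of the capture-group approach follows. It also shows one alternative that is not part of the answer above: PCRE's \K, which discards everything matched so far from the reported match and therefore behaves much like the variable-length lookbehind the question was after. Using ~ as the delimiter means / does not need escaping.

```php
<?php
$url = 'mydomain.com/listing/name';

// Match the whole path and capture the second segment in group 1.
preg_match('~mydomain\.com/[^/?&]+/([^/?&]+)/?~', $url, $m);
echo $m[1], PHP_EOL; // name

// Alternative: \K resets the start of the reported match, so $k[0] is just the segment.
preg_match('~mydomain\.com/[^/?&]+/\K[^/?&]+~', $url, $k);
echo $k[0], PHP_EOL; // name
```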

Related

Get 'XXX' value using a RegEx in PHP

I need some help building a regex to get the value of XXX in the following set of possible matches:
+58XXXYYYYYYY
+580XXXYYYYYYY
0XXXYYYYYYY
XXXYYYYYYY
These are phone numbers, so XXX is dynamic and will not always hold the same value. The regex is intended to be used in PHP, so I know I should use the preg_match() function, but I have no idea about the regex itself. Can anyone give me some advice on this?
This sounds like it matches your requirements:
(\d{3})\d{7}$
With a Live Demo
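A short sketch applying (\d{3})\d{7}$ to the four formats from the question. The digits 412 and 1234567 are made-up stand-ins for XXX and YYYYYYY; the pattern relies on YYYYYYY always being exactly seven digits at the end of the string.

```php
<?php
// Made-up sample numbers: XXX = 412, YYYYYYY = 1234567.
$numbers = ['+584121234567', '+5804121234567', '04121234567', '4121234567'];

$codes = [];
foreach ($numbers as $number) {
    // Three captured digits followed by exactly seven more digits at the end of the string.
    if (preg_match('/(\d{3})\d{7}$/', $number, $m)) {
        $codes[] = $m[1];
    }
}
print_r($codes); // 412 for every format
```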
If you want to match the last 3 digits, you could use /([0-9]{3,3})$/.
The parentheses indicate a capturing group (what you are looking for). Inside that group you match the character class [0-9] (any digit from 0 to 9) exactly 3 times ({3,3}, which is the same as {3}). Finally, you want the last occurrence of this pattern, so the $ anchors the match to the end of the string.
A very handy tool for building simple regexes is Debuggex. I use it all the time!

Retrieve 0 or more matches from comma separated list inside parenthesis using regex

I am trying to use a regular expression to retrieve matches from a comma-separated list inside parentheses. (I also retrieve the version number in the first capture group, though that's not important to this question.)
Note that the expression should ideally handle all possible cases: the list could be empty or could have more than 3 entries, i.e. 0 or more matches in the second capture group.
The expression I have right now looks like this:
SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)
The string I am testing this on looks like this:
SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)
Result of this is:
1. [6-11] `1.0.4`
2. [32-52] `, Macbook Pro Retina`
3. [32-34] `, `
4. [34-52] `Macbook Pro Retina`
The desired result would look like this:
1. [6-11] `1.0.4`
2. [32-52] `debug`
3. [32-34] `OS X 10.11.2`
4. [34-52] `Macbook Pro Retina`
According to the image above (as far as I can see), the expression should work on the test string. What is the cause of the weird results and how could I improve the expression?
I know there are other ways of solving this problem, but I would like to use a single regular expression if possible. Please don't suggest other options.
When dealing with a varying number of groups, regex ain't the best. Solve it in two steps.
First, break down the statement using a simple regex:
SomeText\/([\d.]*) \(([^)]*)\)
1. [9-14] `1.0.4`
2. [16-55] `debug, OS X 10.11.2, Macbook Pro Retina`
Then just explode the second result by ',' to get your groups.
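A sketch of the two-step approach in PHP, using the test string from the question and the pattern above. The empty-string check is an addition so that "()" yields an empty list, which the question listed as a requirement.

```php
<?php
$ua = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

$version = null;
$items = [];
if (preg_match('~SomeText/([\d.]*) \(([^)]*)\)~', $ua, $m)) {
    $version = $m[1];
    // Split the list on commas and strip surrounding spaces; "()" gives an empty list.
    $items = $m[2] === '' ? [] : array_map('trim', explode(',', $m[2]));
}
print_r($items); // debug, OS X 10.11.2, Macbook Pro Retina
```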
Probably the \G anchor works best here for binding the match to an entry point. This regex is designed for input that is always similar to the sample that is provided in your question.
(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+
(?<=SomeText\/|\G(?!^)) the lookbehind is the part that matches are glued to
\G matches where the previous match ended, (?!^) but not at the start of the string
[(,]? * matches an optional opening parenthesis or comma, followed by any number of spaces
\K resets the beginning of the reported match
[^,)(]+ matches the wanted characters: anything other than ( ) ,
Demo at regex101 (grab matches of $0)
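Running this first pattern through preg_match_all() in PHP, each element of $matches[0] is one entry. With this exact input the version match keeps a trailing space, because [^,)(]+ only stops at the opening parenthesis; that detail is an observation about this sketch, not something the answer mentions.

```php
<?php
$ua = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

// \G(?!^) glues each match to the end of the previous one; \K drops the "(" or ", " prefix.
preg_match_all('~(?<=SomeText/|\G(?!^))[(,]? *\K[^,)(]+~', $ua, $matches);

print_r($matches[0]); // "1.0.4 ", then one element per list entry
```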
Another idea with use of capture groups.
SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)
This one, without the lookbehind, is a bit more accurate (it also requires the opening parenthesis), performs better (needs fewer steps) and is probably easier to understand and maintain.
SomeText\/([^(]*)\( the entry anchor; the version is captured here into $1
|\G(?!^),? *([^,)]+) or glued to the previous match: capture into $2 one or more characters that are not , or ), preceded by an optional comma and any number of spaces.
Another demo at regex101
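A PHP sketch of the capture-group variant. With the default PREG_PATTERN_ORDER, a group that did not take part in a match comes back as an empty string, so the first match (version only) contributes "" to $m[2]; the filter below drops it. The trim() call is an addition to remove the space after the version.

```php
<?php
$ua = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

// Alternative 1 captures the version into group 1 and consumes "(";
// alternative 2 continues from the previous match and captures one entry into group 2.
preg_match_all('~SomeText/([^(]*)\(|\G(?!^),? *([^,)]+)~', $ua, $m);

$version = trim($m[1][0]);                                        // "1.0.4 " -> "1.0.4"
$items = array_values(array_filter($m[2], fn($s) => $s !== ''));  // skip the version-only match

print_r($items);
```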
Actually, stribizhev was close:
(?:SomeText\/([^() ]*)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))
I just had to make that one character class match at least once.
(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\)) is a little clearer, as long as the version number always consists of digits and periods.
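A PHP sketch of the last pattern. Because the version and the first entry share a match here, group 2 lines up directly with the desired list; the lookahead (?=[^()]*\)) only accepts entries that are still followed by the closing parenthesis.

```php
<?php
$ua = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

$pattern = '~(?:SomeText/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))~';
preg_match_all($pattern, $ua, $m);

print_r($m[1]); // "1.0.4" on the first match, empty strings afterwards
print_r($m[2]); // one list entry per match
```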
I wanted to come up with something more elegant than this (though this does actually work):
SomeText\/(.*)\s\(([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?\)
Obviously, the group ([^\,]+)?\,?\s? is repeated 6 times.
(It can be repeated any number of times, and it will work for any number of comma-separated items up to that number of repetitions.)
I tried to shorten the long, repetitive list of ([^\,]+)?\,?\s? above to
(?:([^\,]+)\,?\s?)*
but it doesn't work and my knowledge of regex is not currently good enough to say why not.
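The shortened version fails because of a general regex rule rather than anything specific to this pattern: when a capturing group is repeated with * or +, only the text from its last iteration is kept. That is also why the original expression reported only ", Macbook Pro Retina". A small PHP check on the list part of the sample string:

```php
<?php
$list = 'debug, OS X 10.11.2, Macbook Pro Retina';

// The group ([^,]+) runs three times, but each iteration overwrites the previous capture.
preg_match('~(?:([^,]+),?\s?)*~', $list, $m);

echo $m[0], PHP_EOL; // the whole list
echo $m[1], PHP_EOL; // Macbook Pro Retina (last iteration only)
```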
This should solve your problem. Keep the code you already have and add something like the following; it determines where the comma is in your string and removes it.
Use trim() to delete whitespace at the start or end.
$a = strpos($line, ",");
$line = trim(substr($line, 55-$a));
I hope this helps!

How to match all words but "stop" in a string by regex

Another regex question. I use PHP and have a string: fdjkaljfdlstopfjdslafdj. As you can see, there is a "stop" in the middle. I want to replace everything except that "stop". I tried [^stop], but that also excludes the s further along in the string.
My Solution
Thanks for everyone's help here.
I also figured out a solution with a pure regex method (pure within my knowledge of regex, that is; PCRE verbs are too advanced for me). But it needs 2 steps. I don't want to mix in PHP functions, because sometimes the job is outside of coding, e.g. batch-renaming files in Total Commander.
Take this string: xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq. For example, I want to keep every foo in this string and greedily replace everything else with ---.
First, I find all the non-foo text between each pair of foos: (?<=foo).+?(?=foo).
The string turns into xxxfoo---foo---foo---foolsjunegpq; now only the non-foo text at both ends is left.
Then use [^-]+(?=foo)|(?<=foo)[^-]+.
This time the result is ---foo---foo---foo---foo---. Everything but foo has been turned into ---.
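The two steps, written as PHP preg_replace() calls for checking. Step 2 treats - as the marker left by step 1, so this sketch assumes the original text contains no hyphens of its own.

```php
<?php
$s = 'xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq';

// Step 1: replace the text between two "foo"s.
$step1 = preg_replace('/(?<=foo).+?(?=foo)/', '---', $s);
// Step 2: replace the leftovers before the first "foo" and after the last one.
$step2 = preg_replace('/[^-]+(?=foo)|(?<=foo)[^-]+/', '---', $step1);

echo $step1, PHP_EOL; // xxxfoo---foo---foo---foolsjunegpq
echo $step2, PHP_EOL; // ---foo---foo---foo---foo---
```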
I just don't want to include "stop"...
You can skip it by using the PCRE verbs (*SKIP)(*F). Try something like this:
stop(*SKIP)(*F)|.
Demo at regex101
or, to match sequences: (stop)(*SKIP)(*F)|(?:(?!(?1)).)+
or, for words: stop(*SKIP)(*F)|\w+
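A PHP sketch of the first two variants with preg_replace(), using the string from the question; the replacement strings - and --- are arbitrary choices for illustration. When "stop" matches, (*SKIP)(*F) makes the attempt fail and resumes matching after "stop", so it is never replaced.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdj';

// Single characters: every character outside "stop" becomes "-".
$chars = preg_replace('/stop(*SKIP)(*F)|./', '-', $s);
// Sequences: each run of text not containing "stop" is replaced as a whole.
$runs = preg_replace('/(stop)(*SKIP)(*F)|(?:(?!(?1)).)+/', '---', $s);

echo $chars, PHP_EOL; // ----------stop---------
echo $runs, PHP_EOL;  // ---stop---
```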
[^stop] doesn't mean "any text that is not stop". It is a negated character class: it matches any single character that is not one of the 4 characters inside [...], which in this case are s, t, o and p.
Better to split on the text you don't want to match:
php> $s = 'fdjkaljfdlstopfjdslafdjstopfoobar';
php> $arr = preg_split('/stop/', $s);
php> print_r($arr);
Array
(
[0] => fdjkaljfdl
[1] => fjdslafdj
[2] => foobar
)
You can generalize this to any pattern:
(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))
Demo
Just put the pattern you don't want in the neg group.
This regex will try to do the following for any character position:
Match the pattern you don't want. If it matches, discard it with (*SKIP)(*FAIL) and restart matching at the position right after it.
If the pattern you don't want doesn't match at a particular position, then match anything, until either:
You reach the end of the input string (\Z)
Or the pattern you don't want immediately follows the current matching position ((?&neg))
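A PHP sketch of the generalized pattern with preg_match_all(), reusing the string from the split example; each element of $m[0] is one piece of text between the unwanted "stop"s.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// Put the unwanted pattern in the "neg" group; (?&neg) reuses it in the lookahead.
$pattern = '/(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))/';
preg_match_all($pattern, $s, $m);

print_r($m[0]); // fdjkaljfdl, fjdslafdj, foobar
```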
This approach is slower than a manually tuned expression. You can get better performance at the cost of repeating yourself, which avoids the subroutine call to (?&neg):
stop(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|stop)
But of course, the best approach would be to use the features provided by your language: match the string you don't want, then use code to discard it and keep everything else.
In PHP, you can use the PREG_OFFSET_CAPTURE flag to tell the preg_match_all function to provide you the offsets of each match.
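A sketch of that language-level approach; the variable names are illustrative, not from the answer. With PREG_OFFSET_CAPTURE, each match comes back as a [text, offset] pair, which is enough to cut out everything between the unwanted matches.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// Each match is returned as [matched text, byte offset].
preg_match_all('/stop/', $s, $m, PREG_OFFSET_CAPTURE);

$kept = [];
$pos = 0;
foreach ($m[0] as [$text, $offset]) {
    $kept[] = substr($s, $pos, $offset - $pos); // text before this "stop"
    $pos = $offset + strlen($text);             // continue after it
}
$kept[] = substr($s, $pos);                     // text after the last "stop"

print_r($kept); // fdjkaljfdl, fjdslafdj, foobar
```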

regex matching url

In PHP, the Klein router will match as many routes as it can.
Two routes I have set up are conflicting. They are:
$route1: '/websites/[i:websiteId]/users/[i:id]?'
and
$route2: '/websites/[i:websiteId]/users/[a:filename].[json|csv:extension]?'
The URL I'm trying to match, which I think should match the first route and not the second, is:
/api/v1-test/websites/100/users/4
The regexes produced for these two are:
$regex1: `^/api(?:/(v1|v1-test))/websites(?:/(?P<websiteId>[0-9]++))/users(?:/(?P<id>[0-9]++))?$`
$regex2: `^/api(?:/(v1|v1-test))/websites(?:/(?P<websiteId>[0-9]++))/users(?:/(?P<filename>[0-9A-Za-z]++))(?:\.(?P<extension>json|csv))?$`
I intend the second route not to match if there is no '.csv' or '.json'. The problem is that the URL matches both routes: for the second, the resulting filename is '4' and the extension is blank.
Sending /api/v1-test/websites/100/users/users.csv works correctly and only matches the second route.
I only have control over the route, not the regex or the matching.
Thanks.
This bit here
(?:\.(?P<extension>json|csv))?
at the end of your second regex makes the extension optional because of the ? at the very end, so the regex matches whether or not there's an extension. A question mark means 0 or 1 of the previous expression. Get rid of it and, at the least, strings will only match this regex when they have the extension.
To make this change, just remove the question mark from your second route, like so:
$route2: '/websites/[i:websiteId]/users/[a:filename].[json|csv:extension]'
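To confirm the effect of that trailing ?, here is a sketch that runs regex2 from the question through preg_match() directly, once as generated and once with the final ? removed by hand. It does not go through Klein, so it only shows the regex behavior, not what Klein itself generates for the edited route.

```php
<?php
$url = '/api/v1-test/websites/100/users/4';

// regex2 as shown in the question (backticks are the delimiter), extension group optional.
$optional = '`^/api(?:/(v1|v1-test))/websites(?:/(?P<websiteId>[0-9]++))/users(?:/(?P<filename>[0-9A-Za-z]++))(?:\.(?P<extension>json|csv))?$`';
// The same regex with the trailing ? removed, so the extension is required.
$required = '`^/api(?:/(v1|v1-test))/websites(?:/(?P<websiteId>[0-9]++))/users(?:/(?P<filename>[0-9A-Za-z]++))(?:\.(?P<extension>json|csv))$`';

var_dump(preg_match($optional, $url, $m)); // int(1), with filename "4"
var_dump(preg_match($required, $url));     // int(0)
```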
The problem is that the match_type is defined really... weirdly:
$match_types = array(
'i' => '[0-9]++',
'a' => '[0-9A-Za-z]++',
[...]
As such, you can't really capture a sequence corresponding to [a-zA-Z] only... The only option I see would be to use 3 routes:
$route1: '/websites/[i:websiteId]/users/[i:id]?'
$route2: '/websites/[i:websiteId]/users/[a:filename]'
$route3: '/websites/[i:websiteId]/users/[a:filename].[json|csv:extension]'
And assign the same action to routes 2 and 3. Then you would get this:
/api/v1-test/websites/100/users/ is matched by 1
/api/v1-test/websites/100/users/4 is matched by 1
/api/v1-test/websites/100/users/test is matched by 2
/api/v1-test/websites/100/users/test.csv is matched by 3
Which seems like the behavior you wanted.
Another (easier) solution would be to take advantage of this bit of the documentation:
Routes automatically match the entire request URI. If you need to match only a part of the request URI or use a custom regular expression, use the # operator.
You can then define your routes like this:
$route1: '#/websites/[0-9]+/users/[0-9]*$'
$route2: '#/websites/[0-9]+/users/[a-zA-Z]+(\.[a-zA-Z]+)?$'
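To check that these two custom patterns don't overlap, here is a sketch that runs them through preg_match() directly against the sample URLs from the answer. It deliberately bypasses Klein (the ~ delimiter and the direct preg_match() call are assumptions of this sketch, not how Klein applies # routes internally); it only shows that each URL matches exactly one pattern.

```php
<?php
// The two custom patterns, tested as plain PCRE (~ as delimiter, so / needs no escaping).
$routes = [
    'route1' => '~/websites/[0-9]+/users/[0-9]*$~',
    'route2' => '~/websites/[0-9]+/users/[a-zA-Z]+(\.[a-zA-Z]+)?$~',
];
$urls = [
    '/api/v1-test/websites/100/users/',
    '/api/v1-test/websites/100/users/4',
    '/api/v1-test/websites/100/users/test',
    '/api/v1-test/websites/100/users/test.csv',
];

$result = [];
foreach ($urls as $url) {
    // Collect the names of all patterns that match this URL.
    $result[$url] = array_keys(array_filter($routes, fn($re) => preg_match($re, $url) === 1));
}
print_r($result);
```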

Regular expression help needed

Although I can find a lot of tutorials on regular expressions, the subject remains beyond my grasp. The regular expression that I want to create is simple (judged by what I see in some of the examples), but I simply cannot figure it out.
I want to do a simple replacement as follows:
I have image metadata saved in a MySQL table, with fields: id, name, title and alt.
In my content, I want to write [[IMAGE:1:right]]content here[[image:2:left]].
I want to capture the ID (the digits) and the float (left or right), and replace the entire tag with the image, floated left or right, retrieved by its ID from the database table.
Here is my attempt:
preg_match("/^\[\[image:(\d+):(left|right)\]\]+/i", "[[IMAGE:1:right]]content here[[image:2:left]]", $matches);
This gives me the return of:
Array ( [0] => [[IMAGE:1:right]] [1] => 1 [2] => right )
So it finds one, but I want it to find ALL of them, as I may have more than one image in a post. As far as I can tell, the + there should match all entries, and the i should make the match case-insensitive. The case-insensitive part appears to work, but I only get one result.
Could someone please let me know what I am doing wrong?
That's not quite how it works. That + only applies to the token immediately before it - the ]. You want to make the match global in Perl vernacular, which for PHP (which I think you're using?) means calling the function preg_match_all(). You'll also have to remove the ^, as only one of the images occurs at the beginning of the string.
Also, [ and ] are special characters in regex - so please escape them when you want a literal bracket by writing \[\[ and \]\].
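A sketch of the corrected approach: no ^ anchor, preg_match_all() to collect every tag, and preg_replace_callback() to perform the replacement the question describes. The $images array is a hypothetical stand-in for the MySQL table (id, name, alt), and the <img> markup is just one possible output.

```php
<?php
// Hypothetical stand-in for the MySQL table: id => [name, alt].
$images = [
    1 => ['name' => 'one.jpg', 'alt' => 'First image'],
    2 => ['name' => 'two.jpg', 'alt' => 'Second image'],
];
$content = '[[IMAGE:1:right]]content here[[image:2:left]]';

// No ^ anchor and no + after the closing brackets; /i makes IMAGE and image match alike.
$pattern = '/\[\[image:(\d+):(left|right)\]\]/i';

preg_match_all($pattern, $content, $matches);
// $matches[1] holds every ID, $matches[2] every float direction.

$html = preg_replace_callback($pattern, function (array $m) use ($images): string {
    $id = (int) $m[1];
    if (!isset($images[$id])) {
        return $m[0]; // unknown ID: leave the tag untouched
    }
    return sprintf(
        '<img src="%s" alt="%s" style="float: %s">',
        htmlspecialchars($images[$id]['name']),
        htmlspecialchars($images[$id]['alt']),
        strtolower($m[2]) // /i also accepts RIGHT or LEFT
    );
}, $content);

echo $html, PHP_EOL;
```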
