How can I write a regexp that recursively matches a RESTful path? - php

Regexps are not my strength, so I would like some help with this one, if it is even possible:
I need to create a regexp that recursively matches a RESTful path. The purpose is to create a Symfony route matching this regexp. Here are some examples of what I mean by RESTful path:
/resources
/resources/123
/resources/123/children-resources
/resources/123/children-resources/123
/resources/123/children-resources/123/grandchildren-resources
And so on...
Basically, I would like this pattern to repeat one or more times, with no upper limit:
^\/[a-z]+(\-[a-z]+)*(\/[0-9]+)?$
Note that to access a child resource, the identifier of the parent resource must be present.
I made a short list of unit tests (for two-level paths only to start) here:
https://regex101.com/r/Hxg0m4/2/tests
I searched for questions on the same subject, but none were really relevant to mine. I also tried some modifications of the regexp above, like adding the + sign at the end or using (?R), but none of them passed my unit tests.
Any help will be greatly appreciated.
P.S.: This is my first question on Stack Overflow; please don't hesitate to tell me how to phrase it better.

This recursive pattern should work:
^(\/[a-z]+(?:-[a-z]+)*(?:$|\/\d+(?:$|(?1))))
Explanation:
^                         // assert start of string
(                         // group 1:
  \/                      // start with a slash
  [a-z]+(?:-[a-z]+)*      // followed by a word (optionally hyphenated)
  (?:                     // then, either:
    $                     // end of string
  |                       // or:
    \/                    // a slash
    \d+                   // followed by digits
    (?:                   // then, either:
      $                   // end of string
    |                     // or:
      (?1)                // recurse into group 1 (everything except the ^ anchor)
    )
  )
)
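To check the recursive pattern against the paths from the question, a minimal PHP sketch could look like the following. The constant and the helper `isRestPath` are hypothetical names introduced here for illustration, and the `~` delimiter is an assumption made so the slashes need no escaping.

```php
<?php
// Pattern from the answer above, with ~ as delimiter instead of /.
const REST_PATH_PATTERN = '~^(/[a-z]+(?:-[a-z]+)*(?:$|/\d+(?:$|(?1))))~';

// Hypothetical helper: true when $path is a well-formed nested REST path.
function isRestPath(string $path): bool
{
    return preg_match(REST_PATH_PATTERN, $path) === 1;
}

$samples = [
    '/resources',
    '/resources/123',
    '/resources/123/children-resources',
    '/resources/123/children-resources/123/grandchildren-resources',
    '/resources/children-resources', // child without a parent identifier
    '/resources/123/456',            // two identifiers in a row
];

foreach ($samples as $path) {
    printf("%-65s %s\n", $path, isRestPath($path) ? 'match' : 'no match');
}
```

The first four samples should match and the last two should not. Note that in PCRE `$` also matches just before a final newline; if that matters, `\z` can be used in its place.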

Related

Replace slashes / for dashes - in markdown files

I have ~300 markdown files held within a single Git repository. I need to change the format of all the internal links within these documents. Internal links are links that do not leave the repository. They look something like this:
Checkout the [new plugin](/developers/tools/plugin/install-the-plugin)
guide if you're stuck. If you know what you're doing head on over to
the [examples section](/developers/examples/plugin-tutorials) and get
your hands dirty.
I need to change all the internal links so that:
They don't contain /developers/.
All their slashes / are converted to dashes -.
The example above should look something like this:
Checkout the [new plugin](tools-plugin-install-the-plugin) guide if
you're stuck. If you know what you're doing head on over to the
[examples section](examples-plugin-tutorials) and get your hands dirty.
One caveat is that I don't want to target images. Images look identical to links, just with an exclamation mark ! at the start:
![Plugin Logo](/developers/tools/plugin/images/logo.png)
I've looked into things and it looks like sed is a way forward in terms of tools. I've managed to build the following regex that captures the links I'm looking for:
\]\(\/developers\/.*\)
Annoyingly, this regex doesn't ignore the ![]() image syntax. I was able to get PHP to return the locations of each hit on each page, but I wasn't able to do a find-and-replace on the slashes / within those results.
Any ideas or pointers would be greatly appreciated.
You may do it with a single PHP regex:
$text = preg_replace('~!\[[^][]*]\([^()]*\)(*SKIP)(*F)|(?:\G(?!\A)|(?<=]\()/developers/)([^()/]*)/(?=[^()]*\))~', '$1-', $text);
See the regex demo
Details
!\[[^][]*]\([^()]*\)(*SKIP)(*F) - match !, [, any 0+ chars other than [ and ], then a ](, 0+ chars other than ( and ), ) and then omit the match and go on to search for the next match at the end of the current failed match
| - or
(?:\G(?!\A)|(?<=]\()/developers/) - end of the previous successful match (\G(?!\A)) or (|) a /developers/ string preceded with ](
([^()/]*) - Group 1 ($1): any 0+ chars other than (, ) and /
/ - a / char
(?=[^()]*\)) - ...that is followed with any 0+ chars other than ( and ) and then a ).
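As a rough check of the pattern described above, the sketch below runs it on one link and one image taken from the question. The constant and the helper `rewriteInternalLinks` are hypothetical names used only for this illustration.

```php
<?php
// Regex from the answer above, unchanged.
const LINK_PATTERN = '~!\[[^][]*]\([^()]*\)(*SKIP)(*F)|(?:\G(?!\A)|(?<=]\()/developers/)([^()/]*)/(?=[^()]*\))~';

// Hypothetical helper: rewrites the internal links of one Markdown string.
// Falls back to the input if preg_replace reports an error (null).
function rewriteInternalLinks(string $markdown): string
{
    return preg_replace(LINK_PATTERN, '$1-', $markdown) ?? $markdown;
}

$markdown = 'Checkout the [new plugin](/developers/tools/plugin/install-the-plugin) guide. '
          . '![Plugin Logo](/developers/tools/plugin/images/logo.png)';

// The link target becomes tools-plugin-install-the-plugin; the image is untouched.
echo rewriteInternalLinks($markdown), "\n";
```

One thing to watch for: the pattern requires a slash after the captured segment, so a link with a single segment after /developers/ (for example `/developers/page`) is left as it is and would need separate handling.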

Selecting a X number of lines using Regular Expressions within the given separators

I'm trying to select only the content between the --- traces, which is obviously multiline. I have to use regular expressions because not all entries will contain the same amount of metadata.
The example I'm trying to match using PHP's preg_match function is this:
---
Title: A fresh start to all of us. Right?
Slug: a-fresh-start-to-all-of-us-right
Author: admin
Date: 12/05/2015 16:29
Draft: false
Image: http://placehold.it/400x280
Tags: codesans, install, markdown
---
# A fresh start comes, after all.
As you can see, nothing below the closing --- should be matched.
I'm trying to match with this regular expression:
/^(.*)?[^\n]$/gm
but it doesn't seem to work so far. I also tried to tokenize the traces to treat them as delimiters, but that didn't work either (this regex: /^(\-{3})?(.*)?(\-{3})?[^\n]$/gm).
Any guidance, please?
You need the DOTALL flag, i.e. s, for this:
/(\A|\R)-{3}\R(.+?)-{3}(\R|\z)/s
By the way, there is no g flag in PHP; use preg_match_all when you need every match.
You could use something like this: (?m)^---.*\s+([\S\s]*?)^---
This way uses the normal dot, so no DOTALL flag is needed.
Expanded, in free-spacing mode:
(?mx)
^ --- .* \s+
( [\S\s]*? ) # (1)
^ ---
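To see what these patterns capture, here is a small PHP sketch run on the entry from the question. The helper `parseFrontMatter` and the idea of splitting each line on the first ": " are assumptions added for illustration; neither comes from the answers.

```php
<?php
// Sample entry from the question, joined with "\n" line endings.
$entry = implode("\n", [
    '---',
    'Title: A fresh start to all of us. Right?',
    'Slug: a-fresh-start-to-all-of-us-right',
    'Author: admin',
    'Date: 12/05/2015 16:29',
    'Draft: false',
    'Image: http://placehold.it/400x280',
    'Tags: codesans, install, markdown',
    '---',
    '# A fresh start comes, after all.',
]);

// Hypothetical helper: returns the metadata as key => value pairs,
// or an empty array when no --- block is found.
function parseFrontMatter(string $entry): array
{
    // DOTALL pattern from the first answer; group 2 holds the block body.
    if (preg_match('/(\A|\R)-{3}\R(.+?)-{3}(\R|\z)/s', $entry, $m) !== 1) {
        return [];
    }
    $meta = [];
    foreach (preg_split('/\R/', trim($m[2])) as $line) {
        // Split on the first ": " only, so values may contain colons.
        [$key, $value] = array_pad(explode(': ', $line, 2), 2, '');
        $meta[$key] = $value;
    }
    return $meta;
}

print_r(parseFrontMatter($entry));

// The second answer's pattern captures the same body in group 1.
preg_match('/(?m)^---.*\s+([\S\s]*?)^---/', $entry, $alt);
echo $alt[1];
```

Both patterns stop at the first closing `---`, so the Markdown body below it is never part of the capture.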

RegEx match parent items containing duplicates

The title is probably confusing, but I have no idea how to properly phrase this.
So here's my goal. I have this string (or something like it):
[some_element]Random string chars [some_element]Ramdon[/some_element] some more random chars[/some_element]
(Some of you may recognize that these are WordPress shortcodes, but this methodology would still be useful elsewhere to me as well.)
What I need to do is match the parent "element". My usual approach might be something like this:
\[(\w+)].*?\[\/\1]
The problem is, this won't work in the above example, because its "child element" has the same closing "tag".
How could I get this regex to work, regardless of how many nested children exist (literally, an unlimited number of identically named nested children)?
You can use this recursive regex in PHP:
$re = '~\s* ( \[some_element\] ( (?: .* | (?1) )* ) \[/some_element\] )~x';
This will give you this string in matched group #2:
Random string chars [some_element]Ramdon[/some_element] some more random chars
This looks like a job for recursive patterns (in php).
But I am sadly way too inexperienced to write the pattern here without trying :(
Maybe you can figure that out yourself. I am going to try it too, but that's gonna take a while...
would you look at that:
(The words between {[< and >]} are not part of the pattern, they describe what the subpattern should do.)
[ ( ( {[< some way to match any string except [word] >]} ) | (?R) )* ]
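One way to fill in the placeholder in that sketch is shown below. This is not the pattern from either answer but a stricter variant written for illustration: plain text is consumed in runs without "[", a "[" is accepted only when it does not start the opening or closing tag, and (?R) handles a complete nested element. The tag name is hard-coded and `topLevelContents` is a hypothetical helper.

```php
<?php
// Illustrative recursive pattern (an assumption, not taken from the answers):
//   [^\[]++                  a run of text without any "["
//   \[(?!/?some_element\])   a "[" that does not begin our opening/closing tag
//   (?R)                     a complete nested [some_element]...[/some_element]
const SHORTCODE_PATTERN =
    '~\[some_element\]((?:[^\[]++|\[(?!/?some_element\])|(?R))*+)\[/some_element\]~';

// Hypothetical helper: inner content of every top-level [some_element].
function topLevelContents(string $text): array
{
    preg_match_all(SHORTCODE_PATTERN, $text, $m);
    return $m[1];
}

$nested = '[some_element]Random string chars [some_element]Ramdon[/some_element]'
        . ' some more random chars[/some_element]';

print_r(topLevelContents($nested));
```

Because the text alternative can never swallow a tag, two sibling elements on the same line come back as two separate matches instead of being merged into one.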

Can you help simplify my ip range matching regex?

I have a regex that will match IP addresses.
It looks like:
^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|*|25[0-5]-25[0-5]|2[0-4][0-9]-25[0-5]|2[0-4][0-9]-2[0-4][0-9]|[01]?[0-9][0-9]?-25[0-5]|[01]?[0-9][0-9]?-2[0-4][0-9]|[01]?[0-9][0-9]?-[01]?[0-9][0-9])$
You will probably recognise most of it from many other posts here on SO; however, I have modified it to also match the range form XXX.XXX.XXX.XXX-XXY.
It now seems a little complex, though, particularly the final () capture. I would like some help simplifying this regex if possible.
Just to be clear
aaaa - not matched
999.1.1.1 - not matched
1.1.1.999 - not matched
192.168.2.1 - matched
192.168.2.* - matched
192.168.2.10-20 - matched
EDIT
I forgot to mention that I need the existing capture groups as well.
You could perhaps use optional groups (?: ... )? instead and use another grouping for the first 3 parts of the IP?
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5](?:-25[0-5])?|
2[0-4][0-9](?:-(?:25[0-5]|2[0-4][0-9]))?|
[01]?[0-9][0-9]?(?:-(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))?|
\*)$
Updated with capture groups
^((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.
((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.
((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.
(25[0-5](?:-25[0-5])?|
2[0-4][0-9](?:-(?:25[0-5]|2[0-4][0-9]))?|
[01]?[0-9][0-9]?(?:-(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))?|
\*)$
This works -
^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\-(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))?|(\*))$
As can be seen here
This should work and is a bit shorter:
^((25[0-5]|2[0-4]\d|[01]?\d{1,2})\.){3}(\*|(25[0-5]|2[0-4]\d|[01]?\d{1,2})(\-(25[0-5]|2[0-4]\d|[01]?\d{1,2}))?)$
See:
http://regex101.com/r/sD9iZ0
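To try the six cases from the question, a short PHP sketch follows. It uses the capture-group variant from the first answer, laid out with the x flag so the line breaks are ignored; the constant and the helper `matchIpRange` are hypothetical names added for illustration.

```php
<?php
// Capture-group variant from the first answer; the x flag lets the
// pattern span several lines.
const IP_RANGE_PATTERN = '~^
    ((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.
    ((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.
    ((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\.
    (25[0-5](?:-25[0-5])?
     |2[0-4][0-9](?:-(?:25[0-5]|2[0-4][0-9]))?
     |[01]?[0-9][0-9]?(?:-(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))?
     |\*)
$~x';

// Hypothetical helper: the capture groups on success, null otherwise.
function matchIpRange(string $input): ?array
{
    return preg_match(IP_RANGE_PATTERN, $input, $m) === 1 ? $m : null;
}

$cases = ['aaaa', '999.1.1.1', '1.1.1.999', '192.168.2.1', '192.168.2.*', '192.168.2.10-20'];

foreach ($cases as $case) {
    printf("%-16s %s\n", $case, matchIpRange($case) === null ? 'not matched' : 'matched');
}
```

Groups 1 to 3 hold the first three octets and group 4 holds the last part as written (a single octet, a range, or `*`), so a range such as 10-20 still needs to be split on the dash afterwards.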

RegEx with character set inside positive lookbehind, Is it possible?

I need to match "name" only after "listing", but of course those words could be any url directory or page.
mydomain.com/listing/name
so the only thing I can "REGuest" (request) is to be some parent directory there.
In other words, I want to match the "position" i.e. whatever comes 2nd after the domain.
I'm trying something like
(?<=mydomain\.com/[^/\?&]+/)[^/\?&]+(?:/)?
But the character set won't work inside the positive lookbehind, at least it's setup to match only ONE character. As soon as I try to match other than one (e.g. modify it with +, ? or *) it just stops working.
I'm obviously missing the positive lookbehind syntax and it seems not intended for what I'm trying.
How can I match that 2nd level filename?
Thanks.
Regular-expressions.info states that
The bad news is that most regex flavors do not allow you to use just
any regex inside a lookbehind, because they cannot apply a regular
expression backwards. Therefore, the regular expression engine needs
to be able to figure out how many steps to step back before checking
the lookbehind...
(Read further, they even mention Perl, Python and Java.)
I think the quantifier is the problem: it makes the lookbehind variable-length. I found a related question on Stack Overflow and briefly skimmed it.
Wouldn't it be possible to just match the whole path, and use a group for the second level filename:
mydomain\.com\/[^\/\?&]+\/([^\/\?&]+)(?:\/)?
(note: I had to escape the / for my tests...)
The result of this would be something like:
Array
(
[0] => mydomain.com/listing/name
[1] => name
)
Now, because I don't know the context of your problem, I just assumed you would be able to postprocess the results and get the group 1 (index 1) from the result. If not, I unfortunately don't know...
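In PHP, the capture-group approach could be sketched as below. The second helper uses `\K`, a PCRE feature that is not mentioned in the answer above: it discards everything matched so far from the reported match, which behaves much like a variable-length lookbehind. Both helper names are hypothetical, and `~` is used as the delimiter so the slashes need no escaping.

```php
<?php
// Capture-group approach from the answer above: group 1 is the segment.
function secondSegmentByGroup(string $url): ?string
{
    return preg_match('~mydomain\.com/[^/?&]+/([^/?&]+)/?~', $url, $m) === 1
        ? $m[1]
        : null;
}

// Alternative using \K (an addition, not part of the answer): the text
// before \K must match but is left out of the reported match.
function secondSegmentByK(string $url): ?string
{
    return preg_match('~mydomain\.com/[^/?&]+/\K[^/?&]+~', $url, $m) === 1
        ? $m[0]
        : null;
}

$url = 'mydomain.com/listing/name';

echo secondSegmentByGroup($url), "\n";
echo secondSegmentByK($url), "\n";
```

With `\K` the whole match is already the wanted segment, which helps when the surrounding code can only use the full match and not a numbered group.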
