PHP URL routing regex

I am using a regular expression to route my application. This is the regex I am using:
#/users/([a-zA-Z0-9\-\_]+)/posts#
But unfortunately it also matches these URLs:
/users/:uid/posts/:pid
/users/:uid/posts/:pid/comment/:cid
But it shouldn't. It should match only that exact URL:
/users/:uid/posts
What should I change in the regex to make it match the exact same string?
Thanks for help

You should include anchors for the beginning (^) and end ($) of the string:
#^/users/([a-zA-Z0-9\-\_]+)/posts/?$#
I also allowed for an optional / at the end of the URL.
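As a quick check of the anchored pattern, here is a minimal sketch using preg_match. The helper name matchUserPosts and the sample paths are made up for illustration; only the pattern comes from the answer above.

```php
<?php
// Hypothetical helper: returns the captured user id, or null when the
// path is not exactly /users/{uid}/posts (with an optional trailing slash).
function matchUserPosts(string $path): ?string
{
    // ^ and $ force the pattern to cover the whole path.
    $pattern = '#^/users/([a-zA-Z0-9\-\_]+)/posts/?$#';

    if (preg_match($pattern, $path, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Sample paths (assumed, not from the question).
$paths = [
    '/users/42/posts',
    '/users/42/posts/',
    '/users/42/posts/7',
    '/users/42/posts/7/comment/1',
];

foreach ($paths as $path) {
    echo $path, ' => ', matchUserPosts($path) ?? 'no match', PHP_EOL;
}
```

Without the anchors, the last two paths would match as well, because preg_match only needs the pattern to occur somewhere inside the subject.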

Related

Match string not followed by another string

I'm trying to match a string. For example, in the following strings, first should match, the second should not.
/users/akinuri/
/users/akinuri/asd/
I've tried a negative lookahead, but it didn't work. Obviously I'm doing something wrong.
if (preg_match("/\/users\/.*?\/(?!.*\/)/", "/users/akinuri/asd/")) {
echo "match";
}
I'm experimenting (trying to create a route system). Right now, I'm just trying to determine if the requested uri is valid. If a visitor requests the second string, it should return false and I'll return a not found page. For example, test this on SO. The second url returns a not found page.
https://stackoverflow.com/users/2202732/akinuri/
https://stackoverflow.com/users/2202732/akinuri/asdasd
So, can I do this check using regex? If so, how? What am I doing wrong?
Or should I split the text and then do further checks? This seems a bit redundant.
You don't need a lookahead for this. You can use a negated character class [^/]+ to match 1+ of any character except /. You also need to use anchors in regex to make sure you match complete input.
For PHP code, you can use this regex:
'~^/users/[^/]+/?$~'
Note that /?$ makes the trailing slash optional.
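To see how that behaves on the two URLs from the question, here is a small sketch. The function name isUserProfilePath is an assumption for illustration; the pattern is the one given above.

```php
<?php
// Hypothetical helper: true only when the URI is /users/<one segment>
// with an optional trailing slash.
function isUserProfilePath(string $uri): bool
{
    // [^/]+ matches one or more characters that are not a slash,
    // so it can never run into a second path segment.
    return preg_match('~^/users/[^/]+/?$~', $uri) === 1;
}

foreach (['/users/akinuri/', '/users/akinuri', '/users/akinuri/asd/'] as $uri) {
    echo $uri, ' => ', isUserProfilePath($uri) ? 'match' : 'not found', PHP_EOL;
}
```

Using ~ as the delimiter avoids having to escape every / inside the pattern.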

PHP Regex that matches URLs without invalid characters

I'm looking for a Regex that will find URLs in a string but ignore pre/following characters which are not part of the URL.
for example, from the string:
example.co.uk (main site: example.com),
The Regex will find:
example.co.uk and example.com.
In order to find URLs within a given string, I use the Regex '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i'.
The problem is that if I use this regex with the given string above, it finds example.co.uk and example.com) with the closing bracket at the end.
Is there any regex that can find URLs in a string, no matter what characters surround them?
Thanks!
You may have to use a word boundary (\b) at the end of the pattern:
(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b
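A rough sketch of how this could be run in PHP with preg_match_all on the string from the question. The # delimiters and the i modifier are taken from the asker's original pattern; the variable names are assumptions.

```php
<?php
// Sample text from the question.
$text = 'example.co.uk (main site: example.com),';

// The trailing \b makes \S* give back trailing punctuation such as ")"
// and ",", because the match has to end on a word boundary.
$pattern = '#(?:www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b#i';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches in order of appearance.
print_r($matches[0]);
```

This only trims trailing non-word characters. A URL that legitimately ends in a non-word character, such as a trailing slash, would lose that character too.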

How to exclude a word or string from an URL - Regex

I'm using the following Regex to match all types of URL in PHP (It works very well):
$reg_exUrl = "%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
But now, I want to exclude Youtube, youtu.be and Vimeo URLs:
I'm doing something like this after researching, but it is not working:
$reg_exUrl = "%\b(([\w-]+://?|www[.])(?!youtube|youtu|vimeo)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I want to do this, because I have another regex that match Youtube urls which returns an iframe and this regex is causing confusion between the two Regex.
Any help would be gratefully appreciated, thanks.
socodLib, to exclude something from a string, place yourself at the beginning of the string by anchoring with a ^ (or use another anchor) and use a negative lookahead to assert that the string doesn't contain a word, like so:
^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))
Before we make the regex look too complex by concatenating it with yours, let's see what we would do if you wanted to match some word characters \w+ but not youtube or google. You would write:
^(?!.*?(?:youtube|google))\w+
As you can see, after the assertion (where we say what we don't want), we say what we do want by using \w+.
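Here is a minimal sketch of that simplified pattern in PHP. The function name isAllowedWord and the test strings are made up for illustration.

```php
<?php
// Hypothetical helper: true when the string starts with word characters
// and does not contain "youtube" or "google" anywhere.
function isAllowedWord(string $s): bool
{
    // (?!...) is a negative lookahead: it consumes nothing, it only
    // fails the match when ".*?(?:youtube|google)" could match from
    // the start of the string.
    return preg_match('/^(?!.*?(?:youtube|google))\w+/', $s) === 1;
}

foreach (['example', 'youtube', 'myyoutubechannel', 'hello youtube', 'YouTube'] as $s) {
    echo $s, ' => ', isAllowedWord($s) ? 'allowed' : 'rejected', PHP_EOL;
}
```

Two things to keep in mind: because the lookahead starts at ^ and begins with .*?, it rejects the string if the word appears anywhere in it, not just at the start. And without the i modifier the check is case sensitive, so "YouTube" is still allowed here.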
In your case, let's add a negative lookahead to your initial regex (which I have not tuned):
$reg_exUrl = "%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I took the liberty of making the regex case insensitive with (?i). You could also have added i to your s modifier at the end. The youtu\.?be expression allows for an optional dot.
I am certain you can apply this recipe to your expression and other regexes in the future.
Reference
Regex lookarounds
StackOverflow regex FAQ

Match regular expression without slash in URL-like string

I have the following regular expression #^en/cities/(.*?)/$# and I want it to match anything, but stop at the first slash.
I.e. it should match 'paris' but not 'paris/france' if someone manages to type that URL.
I thought I already had it as non-greedy with the questionmark, but apparently not.
Use a negated character class.
#^en/cities/([^/]+)/$#
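In your regex the non-greediness failed because you anchored the last slash to the end ($), so the lazy .*? is forced to expand until the final slash. Removing the $ would also make .*? stop at the first slash, although the pattern then still matches a longer URL such as en/cities/paris/france/ as a prefix: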
#^en/cities/.*?/#
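The difference between the variants is easier to see side by side. This is only a sketch: the helper captureCity is hypothetical, and a capturing group has been added to the unanchored pattern so its result can be compared.

```php
<?php
// Hypothetical helper: returns capture group 1, or null when there is no match.
function captureCity(string $pattern, string $path): ?string
{
    return preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

$lazyAnchored   = '#^en/cities/(.*?)/$#';   // original pattern from the question
$negated        = '#^en/cities/([^/]+)/$#'; // negated character class
$lazyUnanchored = '#^en/cities/(.*?)/#';    // lazy, no $ (group added for comparison)

$path = 'en/cities/paris/france/';

// The lazy group is stretched until the final slash before $.
var_dump(captureCity($lazyAnchored, $path));
// [^/]+ cannot cross a slash, so the longer URL is rejected.
var_dump(captureCity($negated, $path));
// Stops at the first slash, but the longer URL is still accepted.
var_dump(captureCity($lazyUnanchored, $path));
```

If the goal is to reject paris/france outright, the negated class with both anchors is the variant that does it.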

How can I make this URL validation regular expression less greedy?

So I have the following regular expression:
https?://(www\.)?flickr\.com/photos/(.+)/?
To match against the following URL:
http://www.flickr.com/photos/username/
How can I stop the final forward slash (/) from being included in the username sub-pattern (.+)?
I have tried:
https?://(www\.)?flickr\.com/photos/(.+?)/?
But then it only matches the first letter of the username.
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
I added ?: to the first group so it's not capturing, then used [^/] instead of the dot in the last match. This assures you that everything between "photos/" and the very next "/" is captured.
If you need to capture the first www just use this:
https?://(www\.)?flickr\.com/photos/([^/]+)/?
You need to make sure it doesn't match the forward slash:
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
You could also make the regex lazy (which is what I guess you were doing with the (.+?) syntax), but the above will work just fine.
Change (.+) to ([^/]+). This will match until it encounters a /, so you might want to throw some other stuff in the class too.
There are generally two ways to do this:
Append a question mark, to make the matching non-greedy. .* will match as much as possible, .*? will match as little as possible.
Exclude the character you want to match next. If you want to stop on /, use [^/]*.
If you know there will be a trailing slash, take out the final ?.
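The variants discussed in these answers can be compared with a short sketch. The helper captureUsername is hypothetical, # is an assumed delimiter, and all four patterns use the non-capturing (?:www\.)? so that the username is always group 1.

```php
<?php
// Hypothetical helper: returns capture group 1 (the username), or null.
function captureUsername(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$url = 'http://www.flickr.com/photos/username/';

$greedy       = '#https?://(?:www\.)?flickr\.com/photos/(.+)/?#';
$lazy         = '#https?://(?:www\.)?flickr\.com/photos/(.+?)/?#';
$lazyRequired = '#https?://(?:www\.)?flickr\.com/photos/(.+?)/#';
$negated      = '#https?://(?:www\.)?flickr\.com/photos/([^/]+)/?#';

// Greedy .+ swallows the trailing slash, since /? may match nothing.
var_dump(captureUsername($greedy, $url));
// Lazy .+? followed by an optional slash stops after one character.
var_dump(captureUsername($lazy, $url));
// Lazy .+? with a required slash expands up to the first slash.
var_dump(captureUsername($lazyRequired, $url));
// The negated class cannot include a slash at all.
var_dump(captureUsername($negated, $url));
```

The negated class is the only one of the four that behaves the same whether or not the URL ends with a slash.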
