PHP regex to match a URL that contains a URL fragment

I have one URL fragment, page/login, and I need to know whether another URL contains it.
These should match:
/admin/page/login/
/admin/page/login
admin/page/login
http://www.dot.com/admin/page/login
/admin/page/login?id=10
/admin/page/login/id/10
/admin/page/login/?id=10
/admin/page/login/user?id=10
/admin/page/login/user/?id=10
page/login
page/login/
page/login/id/10
/page/login/id/10
And these should not:
/admin/firstpage/login
admin/page/loginOk
/admin/page/loginOk/id/10
mypage/login/id/10
/mypage/login/id/10
mypage/login
I tried page\/login[\/\s\?] and \/?page\/login[\/\s\?], without success.

You can use a word boundary (\b) so that partial words such as mypage are not matched:
\bpage\/login[\/\s?]
Demo: https://regex101.com/r/yhNsdw/1/
Also, if you change your delimiter, none of the forward slashes will need to be escaped.
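A minimal PHP sketch of this answer, assuming each URL is tested as a separate string. In that case the trailing class [\/\s?] has to consume a character, so a fragment at the very end of the string (for example /admin/page/login) would not match; the sketch adds a $ alternative to cover that. The ~ delimiter and the function name containsFragment are illustrative choices, not part of the original answer.

```php
<?php
// Hypothetical helper: true if $url contains "page/login" as whole path segments.
function containsFragment(string $url): bool
{
    // \b rejects "mypage/login"; the trailing group rejects "page/loginOk".
    // With ~ as the delimiter, forward slashes need no escaping.
    return preg_match('~\bpage/login(?:[/\s?]|$)~', $url) === 1;
}

$samples = ['/admin/page/login', 'page/login/id/10', 'mypage/login', 'admin/page/loginOk'];
foreach ($samples as $url) {
    echo $url, ' => ', containsFragment($url) ? 'match' : 'no match', PHP_EOL;
}
```

Note that \b only checks for a word/non-word transition, so a URL such as my-page/login would still match with this pattern.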

Related

PHP preg_replace_callback match string but exclude urls

What I'm trying to do is find all the matches within a content block, but ignore anything that is inside tags, for use inside preg_replace_callback().
For example:
test
test title
test
In this case, I want the first line to match, and the third line to match, but NOT the url match, nor the title match in between the a tags.
I've got a regex that I feel like is close:
#(?!<.*?)(\btest\b)(?![^<>]*?>)#si
(and this will not match the url part)
But how do I modify the regex to also exclude the "test" between a and /a?
If it's always the same pattern you can use [A-Z] or a combination like [A-Za-z]
I ended up solving it myself. This regex pattern will do what I wanted:
#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si
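A sketch of how that pattern might be used with preg_replace_callback(). The sample markup is an assumption (the example in the question lost its HTML), and the callback that upper-cases each match is only there to make the replaced occurrences visible.

```php
<?php
// Assumed sample content: two plain "test" lines around a link.
$content = "test\n<a href=\"http://test.com\" title=\"test\">test title</a>\ntest";

// Pattern from the answer above, unchanged.
$pattern = '#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si';

// Upper-case every match that survives the lookaheads.
$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return strtoupper($m[1]);
    },
    $content
);

echo $result, PHP_EOL;
```

The trailing lookahead rejects any "test" that is followed by </a> with no < in between, so this approach assumes the link does not contain nested tags.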

preg_match an url without the sublinks

I have an expression that should match only /settings, and not a path that contains a sublink such as /settings/test1.
Right now my expression matches only the paths that contain a sublink, not the one I want.
^\/settings\/
/settings
/settings/test1
/settings/test2
http://www.phpliveregex.com/p/jPr
You match sublinks because your pattern contains / at the end. You need to remove the / and anchor the pattern at the end of the string with the $ anchor:
Use
^\/settings$
See the regex demo
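A short PHP check of the anchored pattern, as a sketch. The function name isSettingsRoot and the ~ delimiter are illustrative assumptions.

```php
<?php
// Hypothetical helper: true only for the exact path "/settings".
function isSettingsRoot(string $path): bool
{
    // ^ and $ anchor both ends, so "/settings/test1" is rejected.
    return preg_match('~^/settings$~', $path) === 1;
}

foreach (['/settings', '/settings/test1', '/settings/test2'] as $path) {
    echo $path, ' => ', isSettingsRoot($path) ? 'match' : 'no match', PHP_EOL;
}
```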

How do I extract one group from a URL using regex for use in a redirect?

I've read the Best RegEx Trick Ever and tried to wrap my head around the other answers here on Stack Exchange and just can't seem to get it right. Take these three strings:
http://www.test.com/newyork/class-schedule
http://www.test.com/location/newyork/class-schedule
http://www.test.com/location/newyork/training
I need a regex that will extract the newyork from the first string and save it for a replace later, but will NOT match any part of the other strings. Also, for obscure reasons, I can not include http://www.test.com as a condition for matching (so I can't use anything before the slash that precedes newyork). Note that in this scenario, newyork could easily be chicago, atlanta, or any other city name with no spaces or punctuation.
The only thing I've been able to figure out that isolates only newyork in the first string is the following:
/.*\.com\/(.[^\/]*)\/class-schedule/g
However, this relies on using the URL first which I can't use.
Any ideas on how to achieve this WITHOUT using the URL?
[EDIT]
To clarify what I'm looking for, I'm trying to take the results from the first string and add "location" to it, still using regex. So:
http://www.test.com/newyork/class-schedule
would become
http://www.test.com/location/newyork/class-schedule
using something like
http://www.test.com/location/$1/class-schedule
Try this: ~/(\w+)/[-a-z]+?/?(?:\?.*?)*(:?\s|$)~gm
See it working here: https://regex101.com/r/4VMazZ/3.
It anchors on the end of the URL instead of the beginning and captures only the path segment immediately before the last one. A query string may follow and it will still work.
[EDIT 1]
I had swapped two characters at the end by mistake, so it was capturing one extra group. The corrected pattern is /(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$), see https://regex101.com/r/4VMazZ/4
If you use preg_match($pattern, $string, $matches); the result you want (newyork) will be in $matches[1], while $matches[0] contains the whole match.
You can see the captures in 'MATCH INFORMATION' panel on regex101 in my example!
[EDIT 2] after your comment.
If you want to replace the whole url you have to match the whole URL, something like this: .*?/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$) will do in this example. See it working here: https://regex101.com/r/4VMazZ/5
[EDIT 3] Add capturing of last part for replacement.
So as you want to reuse last part you need to add capturing parenthesis: .*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$).
See it working here: https://regex101.com/r/4VMazZ/6
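A sketch of the EDIT 3 pattern used with preg_replace() in PHP. The function name addLocation is hypothetical, and the replacement string follows the form given in the question. PHP has no g modifier (preg_replace() already replaces every match), so only m is kept from the regex101 flags.

```php
<?php
// Hypothetical helper: rewrite /<city>/<page> to /location/<city>/<page>.
function addLocation(string $url): string
{
    // $1 is the city segment, $2 is the last path segment.
    $pattern = '~.*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$)~m';
    return preg_replace($pattern, 'http://www.test.com/location/$1/$2', $url);
}

echo addLocation('http://www.test.com/newyork/class-schedule'), PHP_EOL;
```

The pattern also matches the URLs that already contain /location/, but for the two examples in the question the replacement rewrites them to themselves, so they come back unchanged.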
Could this work? See it here.
(?<=location\/|\.\w{3}\/|\.\w{2}\/)(?!location).*?(?=\/|$)
It matches everything following .xxx/ or .xx/ or location/. I don't know whether one-letter domains exist; if they do, you can add |\.\w\/ to the lookbehind at the start of the regex.
(?<=location\/|\.\w{3}\/|\.\w{2}\/) is a lookbehind, so it matches the following pattern only if preceded by location/ or .xxx/ or .xx/
.*? matches every character (lazy)
(?=\/|$) end match if next character is / or on line end
Note: If location is counted as part of the url, I don't think what you are asking is possible in regex, as the city name could be anywhere in string. If so, then you could have a list of cities and check what part of the url matches one of them.
EDIT: You need the multiline m flag so $ also matches end of line
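A sketch of the lookbehind pattern with preg_match() in PHP. The helper name extractCity is an assumption for illustration, and ~ is used as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: return the city segment, or null if nothing matches.
function extractCity(string $url): ?string
{
    // The lookbehind requires "location/", ".xxx/" or ".xx/" right before the
    // match; (?!location) skips the "location" segment itself.
    $pattern = '~(?<=location/|\.\w{3}/|\.\w{2}/)(?!location).*?(?=/|$)~m';
    return preg_match($pattern, $url, $m) === 1 ? $m[0] : null;
}

echo extractCity('http://www.test.com/newyork/class-schedule'), PHP_EOL;
```

Unlike the requirement in the question, this pattern finds the city in all three example URLs, so the caller would still have to decide which ones to rewrite.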

Including a literal string in the regex

Current URLs:
http://domain.com/index.php?route=common/home
http://domain.com/index.php?route=account/register
http://domain.com/index.php?route=checkout/cart
http://domain.com/index.php?route=checkout/checkout
Desired URLs:
http://domain.com/home
http://domain.com/register
http://domain.com/cart
http://domain.com/checkout
Regex:
(?=\=)(.*?)(?<=\/).+$
... almost works, but it matches (for example the last URL) =checkout/ whereas I need it to match index.php?route= as well so I can remove the whole index.php?route=checkout/ from the URL.
I tried index.php?route=(?=\=)(.*?)(?<=\/).+$ but of course it doesn't work.
To remove the desired part, you should substitute:
index\.php\?route=[^\/]*\/
with an empty string.
To be more strict and precise, you should use lookbehind:
(?<=http:\/\/domain\.com\/)index\.php\?route=[^\/]*\/
Check the regex here: https://regex101.com/r/sY7aV6/1
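A sketch of the substitution in PHP, with the dot in index.php escaped so that it matches only a literal dot. The function name shortenRoute is a hypothetical choice.

```php
<?php
// Hypothetical helper: strip "index.php?route=<prefix>/" from a URL.
function shortenRoute(string $url): string
{
    // [^/]* consumes the first route part (e.g. "common"), then the slash.
    return preg_replace('~index\.php\?route=[^/]*/~', '', $url);
}

echo shortenRoute('http://domain.com/index.php?route=common/home'), PHP_EOL;
echo shortenRoute('http://domain.com/index.php?route=checkout/checkout'), PHP_EOL;
```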

Rapidshare URL not matching correctly

I'm trying to make sure that a Rapidshare URL is valid when a user submits it through my form.
This is the regex that I've come up with so far:
http://rapidshare.com/files/[0-9]+/[a-zA-Z0-9\._-]+
A rapidshare link looks like this:
http://rapidshare.com/files/168501977/some_random-file.zip
My pattern matches, but not entirely correctly. For example, if we use this input:
http://rapidshare.com/files/168501977/some_random-file.zip£%^$
It will still match using the PHP function preg_match(), and let it go through, even though there are illegal symbols on the end of the URL. I want the pattern to match the entire input, and not just a random length that matches.
Any help would be appreciated, cheers!
You need to anchor the regex pattern. Use ^ to anchor the beginning and $ to anchor the end. So the pattern becomes:
^http://rapidshare\.com/files/[0-9]+/[a-zA-Z0-9\._-]+$
This prevents a partial match of the string like the example is generating.
Validate the start and the end of your string using ^ and $. Example:
^ht{2}p:\/{2}rapidshare\.com\/files\/\d+\/[\.a-zA-Z0-9_-]+$
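A sketch of the anchored check in PHP. The function name isValidRapidshareUrl is hypothetical. The D modifier is an addition not mentioned in the answers: without it, $ in PCRE also matches just before a trailing newline, which is usually unwanted when validating form input.

```php
<?php
// Hypothetical validator: the whole input must be a Rapidshare file URL.
function isValidRapidshareUrl(string $url): bool
{
    // ^ and $ force a full-string match; D makes $ match only at the very end.
    $pattern = '~^http://rapidshare\.com/files/[0-9]+/[a-zA-Z0-9._-]+$~D';
    return preg_match($pattern, $url) === 1;
}

var_dump(isValidRapidshareUrl('http://rapidshare.com/files/168501977/some_random-file.zip'));
var_dump(isValidRapidshareUrl('http://rapidshare.com/files/168501977/some_random-file.zip£%^$'));
```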
