Including a literal string in the regex - php

Current URLs:
http://domain.com/index.php?route=common/home
http://domain.com/index.php?route=account/register
http://domain.com/index.php?route=checkout/cart
http://domain.com/index.php?route=checkout/checkout
Desired URLs:
http://domain.com/home
http://domain.com/register
http://domain.com/cart
http://domain.com/checkout
Regex:
(?=\=)(.*?)(?<=\/).+$
... almost works, but for the last URL, for example, it matches =checkout/, whereas I need it to match index.php?route= as well so that I can remove the whole index.php?route=checkout/ from the URL.
I tried index.php?route=(?=\=)(.*?)(?<=\/).+$ but of course it doesn't work.

To remove the unwanted part, replace:
index\.php\?route=[^\/]*\/
with an empty string. (The dot is escaped so that it matches only a literal dot; = needs no escaping.)
To be stricter, anchor the match to the host with a lookbehind:
(?<=http:\/\/domain\.com\/)index\.php\?route=[^\/]*\/
Check the regex here: https://regex101.com/r/sY7aV6/1
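As a rough illustration, the substitution could be applied in PHP as follows. This is a minimal sketch: the variable names are made up for the example, and ~ is used as the delimiter so the forward slashes need no escaping.

```php
<?php
// Sample URLs taken from the question.
$urls = [
    'http://domain.com/index.php?route=common/home',
    'http://domain.com/index.php?route=account/register',
    'http://domain.com/index.php?route=checkout/cart',
    'http://domain.com/index.php?route=checkout/checkout',
];

// The lookbehind requires the match to start right after the host.
// [^/]*/ consumes the first route segment and its trailing slash.
$pattern = '~(?<=http://domain\.com/)index\.php\?route=[^/]*/~';

// preg_replace accepts an array of subjects and returns an array.
$clean = preg_replace($pattern, '', $urls);

foreach ($clean as $url) {
    echo $url, PHP_EOL;
}
```

With the four sample URLs this prints http://domain.com/home, http://domain.com/register, http://domain.com/cart and http://domain.com/checkout, one per line.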

Related

PHP preg_replace_callback match string but exclude urls

What I'm trying to do is find all the matches within a content block, but ignore anything that is inside tags, for use inside preg_replace_callback().
For example:
test
test title
test
In this case, I want the first line to match, and the third line to match, but NOT the url match, nor the title match in between the a tags.
I've got a regex that I feel like is close:
#(?!<.*?)(\btest\b)(?![^<>]*?>)#si
(and this will not match the url part)
But how do I modify the regex to also exclude the "test" between a and /a?
If it's always the same pattern you can use [A-Z] or a combination like [A-Za-z]
I ended up solving it myself. This regex pattern will do what I wanted:
#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si
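A minimal sketch of that pattern inside preg_replace_callback() is shown below. The sample markup is an assumption, since the original example lost its tags, and the callback simply wraps each match in brackets to make the result visible.

```php
<?php
// Hypothetical input: the link markup is assumed for this example.
$content = 'test <a href="http://test.com" title="test">test title</a> test';

// The trailing negative lookahead rejects any "test" that is followed
// by non-"<" characters and then a closing </a>, i.e. text inside a link.
$pattern = '#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si';

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // $m[1] holds the matched word.
        return '[' . $m[1] . ']';
    },
    $content
);

echo $result, PHP_EOL;
// [test] <a href="http://test.com" title="test">test title</a> [test]
```

Note that the lookahead only looks as far as the next "<", so link text containing nested tags would need a different approach, such as a DOM parser.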

PHP regex last occurrence of words

My string is: /var/www/domain.com/public_html/foo/bar/folder/another/..
I want to remove the root folder from this string and keep only the path below the public folder, because some servers host multiple websites.
My actual regex is: /^(.*?)(www|public_html|public|html)/s
My actual result is: /domain.com/public_html/foo/bar/folder/another/..
But I want to remove everything up to the last occurrence and get something like this: /foo/bar/folder/another/..
Thanks!
You have to use a greedy quantifier and to check if the alternative is enclosed between slashes using lookarounds:
/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/
About the lookarounds: I use negative lookarounds with a negated character class to check if there is a slash or the limit of the string at the same time. This way you are sure that for instance html is a folder and not the part of another folder name.
I removed the s modifier, which is useless here. I removed the capture groups too, since the goal is to replace the whole match with an empty string.
The ? makes your expression non-greedy which is not actually what you want here. Try:
^(.*)(www|public_html|public|html)
which should keep going until the last match.
Demo: https://regex101.com/r/v5WbB3/1/
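To compare the two suggestions, here is a small sketch that runs both patterns through preg_replace(). The second path ($tricky) is a made-up example with a folder whose name merely starts with "html". It is included only to show where the lookaround version behaves differently.

```php
<?php
$path = '/var/www/domain.com/public_html/foo/bar/folder/another/..';

// Greedy .* plus lookarounds: the folder name must sit between
// slashes (or at a string boundary).
$strict = '~^.*(?<![^/])(?:www|public(?:_html)?|html)(?![^/])~';

// Greedy .* only: matches up to the last occurrence of any keyword,
// even inside a longer folder name.
$loose = '~^(.*)(www|public_html|public|html)~';

$a = preg_replace($strict, '', $path);   // /foo/bar/folder/another/..
$b = preg_replace($loose, '', $path);    // /foo/bar/folder/another/..

// Hypothetical path, not from the question.
$tricky = '/var/www/site/public_html/docs/htmlfiles/a';
$c = preg_replace($strict, '', $tricky); // /docs/htmlfiles/a
$d = preg_replace($loose, '', $tricky);  // files/a

echo $a, PHP_EOL, $b, PHP_EOL, $c, PHP_EOL, $d, PHP_EOL;
```

Both patterns give the desired result for the path in the question. They differ only when a keyword appears as part of another folder name.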

PHP Regex to match url contains url fragment

I have one URL fragment: page/login, and I need to know whether another URL contains it.
These, will match:
/admin/page/login/
/admin/page/login
admin/page/login
http://www.dot.com/admin/page/login
/admin/page/login?id=10
/admin/page/login/id/10
/admin/page/login/?id=10
/admin/page/login/user?id=10
/admin/page/login/user/?id=10
page/login
page/login/
page/login/id/10
/page/login/id/10
And these should not:
/admin/firstpage/login
admin/page/loginOk
/admin/page/loginOk/id/10
mypage/login/id/10
/mypage/login/id/10
mypage/login
I tried: page\/login[\/\s\?], \/?page\/login[\/\s\?] without any result
You can use a word boundary so partial matches aren't matched.
\bpage\/login[\/\s?]
Demo: https://regex101.com/r/yhNsdw/1/
Also if you change your delimiter none of the forward slashes will need to be escaped.
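A sketch of how this might be checked in PHP follows. Two things here are assumptions rather than part of the answer: the helper name containsFragment is hypothetical, and the pattern adds a |$ alternative so that a URL ending exactly in page/login also matches when tested as a single string (in the demo each sample is followed by a newline, which \s picks up).

```php
<?php
// Hypothetical helper: true when $url contains the page/login fragment
// as whole path segments.
function containsFragment(string $url): bool
{
    // "~" as delimiter, so forward slashes need no escaping.
    // \b rejects "mypage/login"; the final group rejects "loginOk".
    // The "|$" alternative is an addition to the answer's pattern.
    return preg_match('~\bpage/login(?:[/\s?]|$)~', $url) === 1;
}

var_dump(containsFragment('/admin/page/login?id=10')); // bool(true)
var_dump(containsFragment('/admin/page/login'));       // bool(true)
var_dump(containsFragment('/admin/firstpage/login'));  // bool(false)
var_dump(containsFragment('admin/page/loginOk'));      // bool(false)
```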

Regular expression to replace all url from string but skip one

I have a regular expression that removes all URLs from a string, but I want to change it and add an exception for my own site's link.
$url = 'This is url for example to remove www.somewbsite.com but i want to skip removing this url www.mywebsite.com';
$no_url = preg_replace("/(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)/i", "★", $url);
First of all, since you are replacing with a hard-coded symbol, and you are using a case-insensitive modifier, your regex can be reduced to
'~(?:https?|ftp)://|(?:[a-z0-9]+\.)?[a-z0-9]+\.[a-z]{2,4}|\?[a-z0-9]+[&=#a-z]+~i'
whatever it means to match. Note that 2 alternatives here were too similar ([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4}), they are merged into 1 with the help of an optional non-capturing group ((?:[a-z0-9]+\.)?).
Now, if you want to avoid matching a specific pattern, you may use a SKIP-FAIL technique: match what you want to preserve and skip it.
'~www\.mywebsite\.com(*SKIP)(*FAIL)|(?:https?|ftp)://|(?:[a-z0-9]+\.)?[a-z0-9]+\.[a-z]{2,4}|\?[a-z0-9]+[&=#a-z]+~i'
See this regex demo.
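Applied to the string from the question, the SKIP-FAIL pattern could be used roughly like this. It is a sketch only: the pattern is split across lines purely for readability.

```php
<?php
$url = 'This is url for example to remove www.somewbsite.com but i want to skip removing this url www.mywebsite.com';

// First alternative: match the URL to keep, then (*SKIP)(*FAIL)
// discards that match and resumes scanning after it.
$pattern = '~www\.mywebsite\.com(*SKIP)(*FAIL)'
    . '|(?:https?|ftp)://'
    . '|(?:[a-z0-9]+\.)?[a-z0-9]+\.[a-z]{2,4}'
    . '|\?[a-z0-9]+[&=#a-z]+~i';

$no_url = preg_replace($pattern, '★', $url);

echo $no_url, PHP_EOL;
// This is url for example to remove ★ but i want to skip removing this url www.mywebsite.com
```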

Regex pattern to match any character except the last one

I am trying to match a string using two different patterns to work together.
My source string is something like this:
Text, white-spaces, new lines and more text then ^^^^<customtag>
I need to get a group (the second one) that would capture one caret or none then a formatted HTML-like tag. So the first group would capture anything else.
It means that the string above should output this:
(Group 1)Text, white-spaces, new lines and more text then ^^^
(Group 2)^<customtag>
The source string may contain no carets, one caret, or up to two thousand.
I need a good pattern that matches all those carets except the last one.
The code below is what I tried.
preg_match_all('/([\s\S]*\^*)(\^?<\w+>)$/', $string, $matches);
Please note: I used [\s\S] instead of the dot to match any character as well as white-spaces and new lines too.
You may follow the below regex:
(?s)(.*)((\^|(?<!\^))<[^>]+>)
Live demo
PHP code:
preg_match_all('/(?s)(.*)((\^|(?<!\^))<[^>]+>)/', $string, $matches);
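To see which group receives what, here is a minimal sketch using preg_match() on a string shaped like the one in the question. The exact sample text and the second, caret-free string are assumptions made for the example.

```php
<?php
// Sample modelled on the question, including a line break.
$string = "Text, white-spaces,\nnew lines and more text then ^^^^<customtag>";

// (?s) lets the dot match newlines. The greedy (.*) backtracks until
// the tag, optionally preceded by exactly one caret, can match.
$pattern = '/(?s)(.*)((\^|(?<!\^))<[^>]+>)/';

preg_match($pattern, $string, $m);
// $m[1] ends with "then ^^^" and $m[2] is "^<customtag>".

// Hypothetical input without any caret before the tag.
preg_match($pattern, 'plain text <customtag>', $n);
// $n[1] is "plain text " and $n[2] is "<customtag>".

var_dump($m[1], $m[2], $n[1], $n[2]);
```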
You can use it like this:
preg_match_all('/(.*)((\^<[^>]*>)|([^\^]<[^>]*>))$/', $string, $matches);
See it working here: http://regexr.com?383g9
In this other link it is working fine: http://regex101.com/r/eQ3vV7
