Regex for parsing URL - php

I am working on parsing urls in the form of http://site.com/page/var1/var2 and I am using the following regex to do so:
([^/]+)
Now I understand that this matches every run of characters that are not /, but I also want to be able to capture variables with escaped slashes in them. When I paste these into my browser they end up looking like this: http://site.com/page/var1//stillvar1/var2, which should yield var1/stillvar1 and var2. My question is basically: how can I modify this regex to match all characters that are not /, unless the / is followed by another slash?
Hopefully I'm being clear.
Thanks in advance!

How about the following regex?
((//|[^/])+)
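For anyone who wants to try this from PHP, here is a minimal sketch. It assumes the regex is applied to the path only (the // in http:// would otherwise be treated as an escaped slash), and the helper name splitEscapedPath is made up for illustration. The group is written as non-capturing since only the whole match is used.

```php
<?php
// Hypothetical helper: split a path on single slashes, treating "//" as an
// escaped slash that belongs to the current segment.
function splitEscapedPath(string $path): array
{
    // Each match is a run of "//" pairs or non-slash characters.
    preg_match_all('~(?://|[^/])+~', $path, $m);

    // Turn the escaped "//" back into a literal "/".
    return array_map(fn($segment) => str_replace('//', '/', $segment), $m[0]);
}

// parse_url() keeps the doubled slash in the path component.
$path = parse_url('http://site.com/page/var1//stillvar1/var2', PHP_URL_PATH);
print_r(splitEscapedPath($path));
```

With the URL from the question this should give page, var1/stillvar1 and var2.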

I don't fully understand your requirement, but as far as I can tell, try this (tested in Rubular):
^\w+\:\/{2}(\w+\.)+\w+(\/{1}\w+)+$

Related

PHP: How would I remove parts of a string between 2 chunks of characters without removing too much?

This problem is driving me nuts. Let's say I have a string:
This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently
I want to be able to remove the &start; and &end; parts as well as everything in between so it says:
This is a string that I want to display differently
I tried using preg_replace with a regular expression but it took off too much, ie:
This is a display differently
The question is: how do I remove the stuff just between sets of &start; and &end; pairs and make sure that it doesn't remove anything between any &end; and &start; segments?
Keep in mind, I'm working with hundreds of strings that are very different to each other so I'm looking for a flexible solution that'll work with all of them.
Thanks in advance for any help with this.
Edit: Replaced dollar signs with ampersands. Oops!
Try this regex: /&start;(.+?)&end;/g
It looks like it works as desired: https://regex101.com/r/MW5nom/2
I quickly tried it in the Chrome console using JS; try converting it into PHP:
"This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently".replace(/&start;(.+?)&end;/g, "")

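A rough PHP version of the &start;/&end; answer above might look like the sketch below. It assumes the markers are literally &start; and &end; as in the edited question. The function name stripMarked, the leading \s* (to avoid leaving a double space behind) and the s flag (so a marked section may span lines) are my own additions. PHP has no g flag; preg_replace already replaces every match.

```php
<?php
// Hypothetical helper: drop every &start;...&end; section from a string.
function stripMarked(string $text): string
{
    // .*? is lazy, so each match stops at the nearest &end; instead of the last one.
    // \s* also removes the whitespace in front of the marker.
    return preg_replace('/\s*&start;.*?&end;/s', '', $text) ?? $text;
}

$input = 'This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently';
echo stripMarked($input), "\n";
// This is a string that I want to display differently
```

A greedy .* in place of .*? would reproduce the "took off too much" result from the question, because it runs on to the last &end; in the string.
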
PHP preg_match , check if language is defined in url

I would like to test for a language match in a url.
Url will be like : http://www.domainname.com/en/#m=4&guid=%some_param%
I want to check if there is an existing language code within the url. I was thinking something between these lines :
^(.*:)\/\/([a-z\-.]+)(:[0-9]+)?(.*)$
or
^(http|https:)\/\/([a-z\-.]+)(:[0-9]+)?(.*)$
I'm not that sharp with regex. Can anyone help or point me in the right direction?
Try this:
[https]+://[a-z-]+.([a-z])+/
http://www.regexr.com/ is an easy site for building regexes.
If you know the data you are testing is a url then I would not bother adding all of the url parts to the regex. Keep it simple like: /\/[a-z]{2}\// That looks for a two letter combination between two forward slashes. If you need to capture the language code then wrap it in parentheses: /\/([a-z]{2})\//
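Here is a small sketch of that approach in PHP. The function name detectLanguage and the list of known codes are assumptions for illustration. The list is there because any two-letter path segment would otherwise count as a language.

```php
<?php
// Hypothetical helper: return the language code found in the URL, or null.
function detectLanguage(string $url, array $known = ['en', 'fr', 'de']): ?string
{
    // ~ is used as the delimiter so the slashes need no escaping.
    // Only the first two-letter segment is looked at.
    if (preg_match('~/([a-z]{2})/~', $url, $m) && in_array($m[1], $known, true)) {
        return $m[1];
    }
    return null;
}

var_dump(detectLanguage('http://www.domainname.com/en/#m=4&guid=%some_param%')); // string(2) "en"
var_dump(detectLanguage('http://www.domainname.com/#m=4'));                      // NULL
```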

Regular expression for a redirect

I'm trying to write a regular expression for a redirect and not having any luck. In this example, an old URL might exist like this:
example.com/about-us/Default.asp
example.com/the-team/Default.asp
Which I want to redirect to:
example.com/about-us/
example.com/the-team/
I've come up with this:
/(\d*)/Default.asp
Which doesn't work...
I've also tried this:
/(\d*)/Default\.asp
As I thought there might be a problem with not having an escape char for the '.', still no luck. Can anyone see what I'm doing wrong?
Got it working thanks to what minitech pointed out:
/(.*)/Default.asp$
worked a treat! Thanks
Since you only need to remove the "Default.asp", you only have to search for that. The regex would look something like this:
/Default\.asp/
The dot is escaped because an unescaped dot is a special character that matches any character.
If you're using PHP, you can do a simple preg_replace:
preg_replace('/Default\.asp/', '', 'example.com/about-us/Default.asp');
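To see why the \d* attempts failed and what a slightly stricter replacement could look like, here is a sketch. The helper name stripDefaultAsp, the $ anchor and the i flag (case-insensitive match) are my assumptions, not part of the answers above.

```php
<?php
// \d only matches digits, so "/(\d*)/" cannot match "/about-us/".
$attempt = preg_match('~/(\d*)/Default\.asp~', 'example.com/about-us/Default.asp');
var_dump($attempt); // int(0)

// Hypothetical helper: strip a trailing "/Default.asp" and keep the slash.
function stripDefaultAsp(string $url): string
{
    return preg_replace('~/Default\.asp$~i', '/', $url) ?? $url;
}

echo stripDefaultAsp('example.com/about-us/Default.asp'), "\n"; // example.com/about-us/
echo stripDefaultAsp('example.com/the-team/Default.asp'), "\n"; // example.com/the-team/
```

Anchoring with $ means a Default.asp in the middle of a longer path is left alone.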

Can't use OR( | ) in php Regular expression

I'm a newbie here. I'm facing a weird problem in using regex in PHP.
$result = "some very long long string with different kind of links";
$regex='/<.*?href.*?="(.*?net.*?)"/'; //this is the regex rule
preg_match_all($regex,$result,$parts);
Here I'm trying to get the links from the result string, but this only gives me the links that contain .net. I also want the links that contain .com, so I tried this:
$regex='/<.*?href.*?="(.*?net|com.*?)"/';
But it returns nothing.
Sorry for my bad English.
Thanks in advance.
Update 1:
Now I'm using this:
$regex='/<.*?href.*?="(.*?)"/';
This rule grabs all the links from the string, but it is not perfect, because it also grabs other substrings like "javascript".
The | character applies to everything within the capturing group, so (.*?net|com.*?) will match either .*?net or com.*?. I think what you want is (.*?(net|com).*?).
If you do not want the extra capturing group, you can use (.*?(?:net|com).*?).
You could also use (.*?net.*?|.*?com.*?), but this is not recommended because of the unnecessary repetition.
Your regex gets interpreted as .*?net or com.*?. You'll want (.*?(net|com).*?).
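A quick way to check the precedence point is to run both forms against a few quoted values. This is only a sketch; the helper firstCapture and the sample strings are made up for illustration.

```php
<?php
// Ungrouped: the | splits the whole group into ".*?net" OR "com.*?".
$ungrouped = '/"(.*?net|com.*?)"/';
// Grouped: only "net" and "com" are alternatives; (?: ) avoids an extra capture.
$grouped = '/"(.*?(?:net|com).*?)"/';

// Hypothetical helper: return capture group 1, or null when nothing matches.
function firstCapture(string $regex, string $subject): ?string
{
    return preg_match($regex, $subject, $m) ? $m[1] : null;
}

var_dump(firstCapture($ungrouped, '"http://b.com/y"')); // NULL
var_dump(firstCapture($grouped, '"http://b.com/y"'));   // string(14) "http://b.com/y"
var_dump(firstCapture($ungrouped, '"common.org"'));     // string(10) "common.org"
```

The last line shows what the second branch of the ungrouped form really means: a value that starts with com.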
Try this:
$regex='/<.*?href.*?="(.*?\.(?:net|com)\b.*?)"/i';
or better:
$regex='/<a .*?href\s*+=\s*+"\K.*?\.(?:net|com)\b[^"]*+/i';
<.*?href
is a problem. This will match from the first < on the current line to the first href, regardless of whether they belong to the same tag.
Generally, it's unwise to try and parse HTML with regexes; if you absolutely insist on doing that, at least be a bit more specific (but still not perfect):
$regex='/<[^<>]*href[^<>=]*="([^"]*(?:net|com)[^"]*)"/';
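As an illustration, here is a sketch that runs a quote-bounded pattern in the spirit of the one above, with the capturing group around the whole attribute value, against some made-up HTML. The function name extractNetComLinks and the sample markup are assumptions.

```php
<?php
// Hypothetical helper: collect href values that contain "net" or "com".
function extractNetComLinks(string $html): array
{
    // [^<>] keeps the match inside one tag; [^"] keeps it inside one attribute value.
    $regex = '/<[^<>]*href[^<>=]*="([^"]*(?:net|com)[^"]*)"/';
    preg_match_all($regex, $html, $m);
    return $m[1];
}

$html = '<a href="http://example.net/page">x</a> '
      . '<a href="javascript:void(0)">y</a> '
      . '<a href="https://foo.com">z</a>';

print_r(extractNetComLinks($html));
```

The javascript: link is skipped because its value contains neither net nor com. Note that this still matches net or com anywhere in the value (for example in "internet"), so putting \. before and \b after the alternation, as in the earlier answer, narrows it further.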

Need help with regular expressions - URL redirection

I'm trying to redirect an easy to remember url to a php file but I'm having some trouble with the regex.
Here's what I have at the moment:
RewriteRule ^tcb/([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,}) /tcb/lerbd.php?autocarro=$1&tipo=$2&dsd=$3
It is working but only if I supply all 3 arguments. I want the last two arguments to be optional so it either works with only the first or all three. I'm hoping you can help me with this.
Thank you very much.
Adding a ? after something in a regex makes it optional, so something like:
RewriteRule ^tcb/([a-zA-Z0-9]{1,})/?(([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,}))? /tcb/lerbd.php?autocarro=$1&tipo=$3&dsd=$4
Notice I introduced a new grouping around the 2nd and 3rd arguments, so the backreferences had to be shifted.
You might also want to put an optional / at the end, too, so it can be used just as if it pointed to a directory...
Here's how I solved the problem. It could be helpful for someone who might stumble upon this question:
RewriteRule ^tcb/([a-zA-Z0-9]{1,})/?(([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,}))?$ /tcb/lerbd.php?autocarro=$1&tipo=$3&dsd=$4
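One way to sanity-check the pattern outside Apache is to run it through preg_match, since both PHP and mod_rewrite use PCRE. This is only a sketch: the function name tcbQuery and the sample paths are invented, and it is not a substitute for testing the actual rule.

```php
<?php
// Same pattern as the RewriteRule above, wrapped in ~ delimiters for preg_match.
const TCB_PATTERN = '~^tcb/([a-zA-Z0-9]{1,})/?(([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,}))?$~';

// Hypothetical helper: return the query parameters the rule would produce, or null.
function tcbQuery(string $path): ?array
{
    if (!preg_match(TCB_PATTERN, $path, $m)) {
        return null;
    }
    // Groups 3 and 4 are absent when the optional part did not match.
    return [
        'autocarro' => $m[1],
        'tipo'      => $m[3] ?? '',
        'dsd'       => $m[4] ?? '',
    ];
}

print_r(tcbQuery('tcb/12'));           // autocarro=12, tipo and dsd empty
print_r(tcbQuery('tcb/12/ida/porto')); // autocarro=12, tipo=ida, dsd=porto
print_r(tcbQuery('tcb/12/ida'));       // autocarro=1, tipo=2, dsd=ida
```

The last case is worth knowing about: because the slash after the first argument is optional, a path with exactly two arguments still matches, with the first argument split in two. If that matters, a variant such as ^tcb/([a-zA-Z0-9]+)(?:/([a-zA-Z0-9]+)/([a-zA-Z0-9]+))?/?$ (with $1, $2 and $3 as backreferences) rejects the two-argument form instead.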
