Need a regular expression to capture a URL path - PHP

I am using PHP, and I have been trying to create a regular expression to capture part of a URL path, but to no avail.
The URL path could be any of these:
"product/zzz"
"yyyyyyyy/product/zzz"
"xxxxx/yyyyyyyy/product/zzz"
"xxxxx/yyyyyyyy/.../product/zzz" (... means other possible words)
What I need to capture is the part before "product".
For the first case, the result should be an empty string.
For the rest, the results are "yyyyyyyy", "xxxxx/yyyyyyyy" and "xxxxx/yyyyyyyy/...".
Can anyone here give me a hint? Thanks!
PS.
It looks like the part I want is a repetition of the same pattern "xxxx/", but I am not good at using regex groups.
Update:
I think I found a solution: capture the pattern "xxx/" with zero or more repetitions, "([^/]+/)*",
so the full regex is "(([^/]+/)*)product/([^/]+)".
@SERPRO: it passed the test in your "Live RegExp".
Hope it is helpful.
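A minimal sketch of the update's pattern used with preg_match(). The ^ and $ anchors, the non-capturing inner group (?:...) and the helper name partBeforeProduct() are assumptions added for illustration. Group 1 keeps its trailing slash (for example "yyyyyyyy/"), so rtrim() strips it to get exactly the strings asked for.

```php
<?php
// Sketch of the update's regex; the anchors and (?:...) are added assumptions.
// partBeforeProduct() is a hypothetical helper name.
function partBeforeProduct(string $path): ?string
{
    // ((?:[^/]+/)*) = zero or more "segment/" pieces, then a literal "product/" segment.
    if (preg_match('#^((?:[^/]+/)*)product/([^/]+)$#', $path, $m) !== 1) {
        return null; // path does not end in product/<segment>
    }
    // Group 1 ends with "/" when non-empty, e.g. "xxxxx/yyyyyyyy/".
    return rtrim($m[1], '/');
}

foreach (['product/zzz', 'yyyyyyyy/product/zzz', 'xxxxx/yyyyyyyy/product/zzz'] as $p) {
    var_dump(partBeforeProduct($p));
}
```

Because each ([^/]+/) repetition ends at a slash, "product" has to be a whole segment, so a path such as "a/myproduct/zzz" is not matched.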

I would use parse_url():
$path = parse_url($url, PHP_URL_PATH);
// Deal with $path to figure out what comes before '/product/'
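One way to fill in the "deal with $path" step, as a sketch: split the path on "/" and keep everything before the first segment that is exactly "product". The helper name beforeProduct() is hypothetical.

```php
<?php
// Sketch of the parse_url() approach; beforeProduct() is a hypothetical helper name.
function beforeProduct(string $url): ?string
{
    // Works for full URLs and for bare paths such as "xxxxx/product/zzz".
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }
    $segments = explode('/', trim($path, '/'));
    // Index of the first segment that is exactly "product" (strict comparison).
    $pos = array_search('product', $segments, true);
    if ($pos === false) {
        return null;
    }
    return implode('/', array_slice($segments, 0, $pos));
}

var_dump(beforeProduct('http://example.com/xxxxx/yyyyyyyy/product/zzz'));
```

Unlike a greedy regex, array_search() stops at the first "product" segment, which only matters if "product" occurs more than once in the path.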

This should work for you:
#(.*?)/?product.*\b#
You can see an example of result strings here:
http://xrg.es/#5awa10
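To try the lazy pattern from PHP, here is a sketch with preg_match(); lazyBeforeProduct() is a hypothetical name. Because (.*?) is lazy and the pattern is not anchored, group 1 is everything before the first occurrence of "product", minus one optional slash, and that occurrence may sit inside a longer word.

```php
<?php
// Sketch: the answer's lazy pattern; lazyBeforeProduct() is a hypothetical helper.
function lazyBeforeProduct(string $path): ?string
{
    // (.*?) grows one character at a time until "/?product" can match.
    return preg_match('#(.*?)/?product.*\b#', $path, $m) === 1 ? $m[1] : null;
}

foreach (['product/zzz', 'yyyyyyyy/product/zzz', 'a/myproduct/z'] as $p) {
    var_dump(lazyBeforeProduct($p));
}
```

The last input shows the caveat: "myproduct" is split, giving "a/my".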

This should do it:
^(.*[^/]|)/*product/[^/]+/*$
It will also allow an arbitrary number of slashes at the end of the path.
The part inside parentheses is your result.
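A sketch of this anchored pattern in preg_match() (anchoredBeforeProduct() is a hypothetical name). The empty alternative after | is what yields "" for plain "product/zzz", and since .* is greedy, the last "product" segment wins when there are several.

```php
<?php
// Sketch: the anchored pattern; anchoredBeforeProduct() is a hypothetical helper.
function anchoredBeforeProduct(string $path): ?string
{
    // (.*[^/]|) = text ending in a non-slash character, or nothing at all.
    // /*$ at the end tolerates any number of trailing slashes.
    return preg_match('#^(.*[^/]|)/*product/[^/]+/*$#', $path, $m) === 1 ? $m[1] : null;
}

var_dump(anchoredBeforeProduct('yyyyyyyy/product/zzz///'));
var_dump(anchoredBeforeProduct('a/product/b/product/c'));
```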

Related

Help with a regex

I have this regex:
/^(https?:\/\/+[\w\-]+\.[\w\-]+)/i
It works, but there is a problem:
you NEED http:// in the URL for it to validate. In what I am making, users will not want to add http:// to the URL; they want to just type example.com. If it's possible, I need it to work whether it has http:// or not.
I don't know how to write my own regex, and I've searched but cannot find one that does what I need, unless I'm just not looking in the right place (Google :P).
Don't bother with regex. Use the parse_url() function.
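One detail to watch with parse_url(): input without a scheme, such as "example.com", is parsed as a path, not a host. A sketch that prepends "http://" when the scheme is missing before reading the host (hostFromUserInput() is a hypothetical helper name):

```php
<?php
// Sketch: parse_url() treats "example.com" (no scheme) as a path, so add a scheme first.
// hostFromUserInput() is a hypothetical helper name.
function hostFromUserInput(string $input): ?string
{
    $input = trim($input);
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    // parse_url() returns null if there is no host and false for badly malformed URLs.
    return is_string($host) && $host !== '' ? $host : null;
}

var_dump(parse_url('example.com'));   // only a "path" key
var_dump(hostFromUserInput('example.com'));
```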
You can just make it optional:
/^((?:https?:\/\/+)?[\w\-]+\.[\w\-]+)/i
The (?: ) around the part you want to make optional is a non-capturing group; the ? after it makes it optional.
I'm not sure what the + after the second slash is for: it means one or more of the preceding character, so it also allows things like http://////////.
I hope you are aware that this regex is far from matching only valid URLs.
For example, it will match things like
http://////////------------.-
or at least
http://N.O
Because the pattern is not anchored at the end, you can write anything after that point and it will still match.
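The optional-scheme pattern from this answer can be checked with preg_match(); a sketch, with leadingDomain() as a hypothetical helper name. It shows both points above: the scheme is optional, and odd inputs such as the slash-and-dash example still match. Note that group 1 includes the scheme when present and only covers the first two dot-separated labels.

```php
<?php
// Sketch: the optional-scheme pattern; leadingDomain() is a hypothetical helper.
function leadingDomain(string $input): ?string
{
    // (?:https?:\/\/+)? = optional, non-capturing scheme; group 1 = the whole leading match.
    return preg_match('/^((?:https?:\/\/+)?[\w\-]+\.[\w\-]+)/i', $input, $m) === 1 ? $m[1] : null;
}

foreach (['example.com', 'http://example.com/page', 'www.example.com', 'http://////////------------.-'] as $s) {
    var_dump(leadingDomain($s));
}
```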
On RegExr you can see what your regex matches.
See Purple Coder's answer for a possibly better solution.
/^((https?:\/\/+)?[\w-]+\.[\w-]+)/i
I'm using this:
// Validate that the string contains at least one dot
var filterWebsite = /^([a-zA-Z0-9:_\.\-/])+\.([a-zA-Z0-9_\.\-/])+$/;

Simple PHP Regex

I am setting up a Zend_Route (but it is still just a regex) and I wish to match a URL like
/en/experience/this-is-my-name-and-the-last-is-1-of-id-123456.html
So I want to grab the
this-is-my-name-and-the-last-is-1-of
and the
123456
I tried
\w{2}/experience/(.+)?-(\d+)\.html
but that doesn't seem to work.
It would be easy if it were the other way around, e.g. if the id came before the name:
/en/experience/123456-this-is-my-name-and-the-last-is-1-of-id.html
I could use
\w{2}/experience/(\d+)-(.+)\.html
But that is a cop-out, so any advice on how to match the original format?
Try this one:
/\w{2}/experience/(.+?)-(\d+)\.html
Try this:
/\w{2}/experience/(.+)?-(\d+)\.html
Zend's regex route internally does this:
preg_match('#^/\w{2}/experience/(.+)?-(\d+)\.html$#i', '/en/experience/this-is-my-name-and-the-last-is-1-of-id-123456.html', $matches);
So your pattern only matches if it starts with a slash: the route anchors it with ^, and the path begins with /.
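A sketch that reproduces this wrapping outside Zend, using the '#^' . $regex . '$#i' form shown in the preg_match() call above. routeMatch() is a hypothetical helper, and a route regex that itself contains "#" would need escaping first. With the leading slash, group 1 is everything up to the hyphen before the trailing digits, which here keeps the "-id" part.

```php
<?php
// Sketch: anchor a route regex like the preg_match() call above does.
// routeMatch() is a hypothetical helper name.
function routeMatch(string $regex, string $path): ?array
{
    return preg_match('#^' . $regex . '$#i', $path, $m) === 1 ? $m : null;
}

$path = '/en/experience/this-is-my-name-and-the-last-is-1-of-id-123456.html';

var_dump(routeMatch('\w{2}/experience/(.+)?-(\d+)\.html', $path));  // no leading slash
var_dump(routeMatch('/\w{2}/experience/(.+?)-(\d+)\.html', $path)); // leading slash
```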

How do I modify this regex to validate URIs with an empty parameter value in the query string?

I'm using this code to validate URIs in php:
preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $uri)
However, this won't pass for URIs that end with an equals sign.
e.g. http://example.com?query=fish&offset=10 returns true, http://example.com?query=fish&offset= doesn't.
I can't see why this should be the case from the regex, as it allows all characters following the ? sign.
Any tips?
Thanks,
Chris
Why don't you use filter_var? ;)
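A minimal sketch of the filter_var() suggestion; isValidHttpUrl() is a hypothetical helper name. FILTER_VALIDATE_URL returns the URL on success and false on failure. It also accepts schemes other than http(s), so the extra check below mirrors the http(s)? in the question's regex.

```php
<?php
// Sketch: validate with FILTER_VALIDATE_URL, then restrict the scheme to http/https.
// isValidHttpUrl() is a hypothetical helper name.
function isValidHttpUrl(string $uri): bool
{
    if (filter_var($uri, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL also accepts other schemes such as ftp://.
    $scheme = parse_url($uri, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(isValidHttpUrl('http://example.com?query=fish&offset='));
```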
Your RegEx isn't working as you anticipate.
Your second group (.[a-z0-9-]+)* uses an unescaped dot, which matches any character, so it ends up consuming everything after http://example, including the query string (there is no / before the ?, so (/.*)? never applies). Each repetition needs one character followed by at least one [a-z0-9-] character, and the group is greedy, so it matches as much as it can; a trailing = has nothing after it, so the match fails.
Try this instead:
^http(s)?://[a-z0-9-]+\.[a-z0-9-]+(\.[a-z0-9-]+)?([/?][-a-z0-9=?&/]*)?$
If need be, change the last capturing group to include any other characters you need in your query string or URI.
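The snippet below is a sketch that checks the question's pattern against both example URLs, plus a variant of the suggested pattern whose last group may start with either / or ?, so a query string can follow the host directly. The constant names ORIGINAL and VARIANT are hypothetical.

```php
<?php
// The question's pattern: the unescaped "." in (.[a-z0-9-]+)* matches any character.
const ORIGINAL = '|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i';
// Variant of the suggested pattern; [/?] lets the last group begin with "/" or "?".
const VARIANT = '|^http(s)?://[a-z0-9-]+\.[a-z0-9-]+(\.[a-z0-9-]+)?([/?][-a-z0-9=?&/]*)?$|i';

foreach (['http://example.com?query=fish&offset=10', 'http://example.com?query=fish&offset='] as $uri) {
    printf("%s original=%d variant=%d\n", $uri, preg_match(ORIGINAL, $uri), preg_match(VARIANT, $uri));
}
```

The variant still rejects characters outside its class, such as %-escapes, so extend the class as the answer suggests.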

PHP if string contains URL isolate it

In PHP, I need to be able to figure out if a string contains a URL. If there is a URL, I need to isolate it as another separate string.
For example: "SESAC showin the Love! http://twitpic.com/1uk7fi"
I need to be able to isolate the URL in that string into a new string. At the same time the URL needs to be kept intact in the original string. Follow?
I know this is probably really simple but it's killing me.
Something like
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/?:#=_#&%~,+$]+/', $string, $matches);
$matches[0] will hold the result.
(Note: this regex is certainly not RFC compliant; it may fetch malformed (per the spec) URLs. See http://www.faqs.org/rfcs/rfc1738.html).
This doesn't account for dashes (-); I needed to add -:
preg_match('/[a-zA-Z]+:\/\/[0-9a-zA-Z;.\/\-?:#=_#&%~,+$]+/', $_POST['string'], $matches);
URLs can't contain spaces, so...
\b(?:https?|ftp)://\S+
Should match any URL-like thing in a string.
The above is the pure regex. PHP preg_* and string escaping rules apply before you can use it.
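In PHP the pure regex needs delimiters. Here is a sketch with "~" as the delimiter (so the slashes need no escaping) and preg_match_all() to collect every match. extractUrls() is a hypothetical name; reading matches does not modify $text, so the original string stays intact.

```php
<?php
// Sketch: the \S+ pattern wrapped in "~" delimiters; extractUrls() is a hypothetical helper.
function extractUrls(string $text): array
{
    preg_match_all('~\b(?:https?|ftp)://\S+~', $text, $matches);
    return $matches[0]; // full matches only; empty array when there are none
}

$tweet = 'SESAC showin the Love! http://twitpic.com/1uk7fi';
print_r(extractUrls($tweet));
echo $tweet, "\n"; // unchanged
```

Since \S+ runs until the next whitespace, trailing punctuation such as a comma ends up in the match.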
$test = "SESAC showin the Love! http://twitpic.com/1uk7fi";
$myURL = strstr($test, "http");
echo $myURL; // prints http://twitpic.com/1uk7fi
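strstr() returns everything from the first "http" to the end of the string, which works for the example because the URL comes last. If text may follow the URL, a sketch that also cuts the result at the next whitespace with strtok() could look like this (firstUrlByStrstr() is a hypothetical name):

```php
<?php
// Sketch: strstr() keeps everything after "http", so cut at the first whitespace.
// firstUrlByStrstr() is a hypothetical helper name.
function firstUrlByStrstr(string $text): ?string
{
    $rest = strstr($text, 'http');
    if ($rest === false) {
        return null; // no "http" anywhere in the string
    }
    return strtok($rest, " \t\r\n"); // first whitespace-delimited token
}

var_dump(strstr('see http://a.io/x now', 'http'));
var_dump(firstUrlByStrstr('see http://a.io/x now'));
```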

Regular expression to extract from URI

I need a regular expression to extract parts of two types of URIs:
http://example.com/path/to/page/?filter
http://example.com/path/to/?filter
Basically, in both cases I need to somehow isolate and return
/path/to
and
?filter
That is, both /path/to and filter are arbitrary. So I suppose I need two regular expressions for this? I am doing this in PHP, but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)
EDIT: Just to clarify, for example, given
http://example.com/help/faq/?sort=latest
I want to get /help/faq and ?sort=latest
Another example
http://example.com/site/users/all/page/?filter=none&status=2
I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the page!
Using parse_url might be easier and have fewer side effects than a regex:
$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);
You could then use explode on the path to get the first two segments:
$segments = explode("/", $path);
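Putting the parse_url() and explode() steps together, here is a sketch. Judging from the examples, the rule seems to be "drop a trailing segment literally named page". That reading is an assumption, as is the helper name splitPathAndQuery(). PHP_URL_QUERY returns the query without the leading "?", so it is added back.

```php
<?php
// Sketch: returns [path without a trailing "page" segment, "?query" or ""].
// splitPathAndQuery() and the "drop a trailing page segment" rule are assumptions.
function splitPathAndQuery(string $url): array
{
    $path  = (string) parse_url($url, PHP_URL_PATH); // e.g. "/site/users/all/page/"
    $query = parse_url($url, PHP_URL_QUERY);         // e.g. "filter=none&status=2"

    $segments = explode('/', trim($path, '/'));
    if (end($segments) === 'page') {
        array_pop($segments);
    }

    return [
        '/' . implode('/', $segments),
        is_string($query) ? '?' . $query : '',
    ];
}

print_r(splitPathAndQuery('http://example.com/site/users/all/page/?filter=none&status=2'));
```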
Try this:
^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)
This will get you the first two URL path segments and query.
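Running this pattern from PHP, as a sketch: it contains "#", so it is wrapped in "~" delimiters here, and firstTwoSegmentsAndQuery() is a hypothetical name. Group 1 holds the first two path segments without the leading slash, and group 2 holds the query without the "?". With three directory segments, as in /site/users/all/page/, only the first two are captured.

```php
<?php
// Sketch: the answer's pattern in "~" delimiters; firstTwoSegmentsAndQuery() is hypothetical.
function firstTwoSegmentsAndQuery(string $url): ?array
{
    if (preg_match('~^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)~', $url, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]]; // [first two segments, query without "?"]
}

print_r(firstTwoSegmentsAndQuery('http://example.com/help/faq/?sort=latest'));
```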
not tested but:
^https?://[^ /]+[^ ?]+.*
which should match http and https URLs with or without a path; the second part should match up to the ? (from ?filter, for instance), and the .* matches any remaining characters except \n.
Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php)? The task seems simple enough for it. You would need two calls (one for the / and one for the ?), but it should be quite simple once you've done that.
