Regex to explode URL parts - PHP

// repeat /?([^/]+)? 10 times
preg_match("`/?([^/]+)/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?`", $uri, $matches);
This is the regex I use to parse the request URI (without the query string). It can capture at most 10 URL parts, which is neither optimal nor good. My current workaround is to add more than enough /?([^/]+) groups and hope that no one exceeds that limit.
Is there a regex that can capture an unlimited number of URL parts? Basically, I need a regex that does this: explode('/', $url).
Caveat: I can only use preg_match!
No explode, no preg_match_all, no g flag.

In the comments you showed the router component you are using: Seriously Simple Router.
The solution is to access that sort of information in the controller, not in the router. The following example shows how you can do it:
// Wrap the whole url into the first capturing group to
// forward it to the controller
Router::route('(a/b/c/([\d]+))', function($url, $id) {
var_dump(explode('/', $url));
});
Router::execute($_SERVER['REQUEST_URI']);
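If calling preg_match() more than once is acceptable, it can also split a path into any number of segments on its own: pass an offset as the fifth argument and anchor the pattern with \G, so each call continues exactly where the previous match ended. This is a sketch, not part of Seriously Simple Router, and the function name is hypothetical.

```php
<?php
// Hypothetical helper: split a path into segments using only preg_match(),
// called in a loop with \G and an explicit offset (5th argument).
function splitPathWithPregMatch(string $path): array
{
    $segments = [];
    $offset = 0;
    // \G anchors each attempt at $offset; /* skips slashes, ([^/]+) grabs one segment.
    while (preg_match('~\G/*([^/]+)~', $path, $m, 0, $offset)) {
        $segments[] = $m[1];
        // $m[0] is never empty, so the loop always advances.
        $offset += strlen($m[0]);
    }
    return $segments;
}

print_r(splitPathWithPregMatch('/a/b/c/42/'));
```

Unlike a single pattern with a repeated group, this returns every segment, because each segment comes from its own preg_match() call.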

Related

Regex solution to find a regex pattern and parse it.

I am trying to write a simple router in PHP, and I am facing a problem. Example routes are as follows:
$route = [];
$route['index'] = "/";
$route['home'] = "/home";
$route['blog'] = "/blog/[a-z]";
$route['article'] = "/article/id/[\d+]/title/[\w+]";
Taking the last example, I would like the regex to look only for patterns such as [\d+] and [\w+], nothing else. I will use explode() to cross-check whether the URL contains /blog/, /id/ and /title/; I don't want the regex's help with that, only to detect the patterns and match them.
For example, if a given $URL was dev.test/blog/id/11/title/politics,
I would need something like: preg_match($route['url'], $URL)
so that preg_match() knows that after "/article/id/" there is a pattern asking only for digits; if a digit is found it continues parsing, otherwise it fails (returns 0).
I don't know enough about regex to handle this complex problem.
Your question is a little unclear, but if you only want to capture the [\d+] or [\w+] parts of the target string, consider using parentheses to capture sub-matches and the (?:xxx) non-capturing group, which checks for the pattern but does not add it to the matches array. Note that inside a regex [\d+] is a character class matching a single digit or a plus sign, so the captures should be written as (\d+) and (\w+), something like:
$route['article'] = '~(?:/article/id/)(\d+)(?:/title/)(\w+)~';
Only the \d+ and \w+ parts end up as capture groups. After preg_match($route['article'], $URL, $matches) you'll find them in:
$matches[1] and $matches[2] ($matches[0] holds the whole match).
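A quick check of the corrected pattern: ~ delimiters, and \d+ / \w+ in place of [\d+] / [\w+]. The test URL is the question's example with /blog/ swapped for /article/ so that it fits the article route; that swap is an assumption about what was intended.

```php
<?php
// The answer's pattern with ~ delimiters, so "/" needs no escaping.
$pattern = '~(?:/article/id/)(\d+)(?:/title/)(\w+)~';
$url = 'dev.test/article/id/11/title/politics';

if (preg_match($pattern, $url, $matches)) {
    // $matches[0] is the whole match; $matches[1] and $matches[2] are the captures.
    echo $matches[1], ' ', $matches[2], PHP_EOL; // prints "11 politics"
}
```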
See http://www.regular-expressions.info/tutorial.html for an outstanding tutorial on regexes, by the way.
If you don't know the values of 'article', 'id' and 'title' in advance, you will probably at least need to know the number of segments in the URL path. As long as you know the positions of the \d+ and \w+ entries, you could use
$route['article'] = '~(?:/\w+/\w+/)(\d+)(?:/\w+/)(\w+)~';
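Going back to the question's own notation, one option is to compile each route template into an anchored regex, turning every [\d+] / [\w+] placeholder into a capture group and quoting the rest. This is a sketch under that assumption: compileRoute is a hypothetical helper, and it recognises only those two placeholders, so a template like "/blog/[a-z]" would be matched literally.

```php
<?php
// Hypothetical helper (not from any library): compile a route template written
// in the question's notation into an anchored regex. Only the [\d+] and [\w+]
// placeholders are recognised; everything else is matched literally.
function compileRoute(string $template): string
{
    // Split on the placeholders, keeping them in the result array.
    $parts = preg_split('/(\[\\\\[dw]\+\])/', $template, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $i => $part) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the placeholders.
        $regex .= ($i % 2 === 1)
            ? '(' . substr($part, 1, -1) . ')' // "[\d+]" becomes "(\d+)"
            : preg_quote($part, '~');          // literal path text
    }

    return '~^' . $regex . '$~';
}

$pattern = compileRoute('/article/id/[\d+]/title/[\w+]');
if (preg_match($pattern, '/article/id/11/title/politics', $m)) {
    echo $m[1], ' ', $m[2], PHP_EOL; // prints "11 politics"
}
```

Anchoring with ^ and $ means a route only matches the whole path, so /home does not also match /home/x.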

Extract URL parameters with regex - repeating a capture group

I'm attempting to extract the URL parameters via regex and am sooo close to getting it to work. I even know what the problem is: my regex is stumbling on repeated capture groups. But I simply cannot figure out how to fix it.
Language is PHP.
My URL looks something like the one below. It can have no parameters, just one or multiple:
member.php?action=bla&arg=2&test=15&schedule=16
My regex looks like this:
member\.php((?:[\?|&](\w*)=(\w*))*)
And my capture groups end up being:
1. action=bla&arg=2&test=15&schedule=16
2. schedule
3. 16
I cannot figure out how to capture all the parameters individually. Will I just have to settle for the first capture group and explode it myself? It would be much more elegant for my purposes if I can do all the work inside one regex.
Try:
<?php
$str="member.php?action=bla&arg=2&test=15&schedule=16#test";
preg_match_all('/([^?&=#]+)=([^&#]*)/',$str,$m);
print_r($m);
//combine the keys and values onto an assoc array
$data=array_combine( $m[1], $m[2]);
print_r($data);
?>
Have you tried parse_url() and parse_str()?
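With the question's URL, that combination could look like this: parse_url() extracts the query string and parse_str() turns it into an associative array, with no regex involved. Note that all values come back as strings.

```php
<?php
$url = 'member.php?action=bla&arg=2&test=15&schedule=16';

// parse_url() with PHP_URL_QUERY returns only the part after "?".
$query = parse_url($url, PHP_URL_QUERY);

// parse_str() decodes the query string into an associative array.
parse_str($query, $params);

print_r($params);
```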
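Extract parameters and their values with [?&](\w+)=(\w*) and preg_match_all(); matching [?&] rather than only & also picks up the first parameter, and \w* allows non-numeric values such as bla.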

Regex in preg_replace to detect url format and extract elements

I need to replace certain user-entered URLs with embedded Flash objects, and I'm having trouble with the regex I'm using to match the URL. I think that's mainly because the URLs are SEO-friendly and therefore a bit more difficult to parse.
URL structure: http://www.site.com/item/item_title_that_can_include_1('_etc-32CHARACTERALPHANUMERICGUID
I need to both detect a URL in that format and capture the 32CHARACTERALPHANUMERICGUID, which always comes after the - in the URL.
Something like this:
$ret = preg_replace('#http://www\.site\.com/item/([^-])-([a-zA-Z0-9]+)#','<embed>itemid=$2</embed>', $ret);
For some reason, the above does not match a URL in the specified format. I'm new to regexes, so I think I'm missing something fairly obvious.
You should check out parse_url().
Examine the results - it was made for parsing URLs. You'll be able to extract the data you require from the tokens returned.
If you are regex crazy, try this...
/^http:\/\/www\.site\.com\/item\/[^-]*\-([a-zA-Z0-9]{32})$/
Your example is almost there, but...
When you use a negated character class, i.e. [^-], you still need a quantifier. I used *, meaning zero or more.
You don't seem to use the item title, so we won't bother capturing it.
You should use beginning (^) and end ($) anchors if the string is always exactly like that.
You say the GUID is 32 chars, so we may as well explicitly state that with the {32} quantifier.
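The corrected pattern dropped into the question's preg_replace() call might look like this. Because the title is no longer captured, the GUID is now group 1, so the replacement uses $1 instead of $2. The GUID below is a made-up 32-character value, and the ^/$ anchors assume the string contains only the URL; for URLs embedded in longer user text they would have to be dropped.

```php
<?php
// Made-up 32-character alphanumeric GUID for the example.
$guid = str_repeat('a1B2', 8);
$url  = "http://www.site.com/item/item_title_that_can_include_1('_etc-" . $guid;

// # as the delimiter avoids escaping every "/" in the pattern.
$embed = preg_replace(
    '#^http://www\.site\.com/item/[^-]*-([a-zA-Z0-9]{32})$#',
    '<embed>itemid=$1</embed>',
    $url
);

echo $embed, PHP_EOL;
```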

PHP Regex on URL - split into variables

I am trying to implement a PHP script that will run on every call to my site, look for a certain URL pattern, then explode the URL and perform a redirect.
Basically, I want to run this on a new CMS to catch all incoming links from the old CMS and redirect them based on a mapping, e.g. from an article ID stripped from the URL to the same article ID imported into the new CMS's DB.
I can do the implementation, the redirect etc, but I am lost on the regex.
I need to catch any occurrences of:
domain.com/content/view/*/34/ or domain.com/content/view/*/30/ (where * is a wildcard) and capture * and the 30 or 34 in a variable which I will then use in a DB query.
If the following is encountered:
domain.com/content/view/*/34/1/*/
I need to capture the first * and the second *.
I'd be very grateful to anyone who can give me a hand with this.
I'm not sure regular expressions are the way to go. I think it would probably be easier to use explode('/', $url) and check the parts by looping over that array.
Here are the steps I would follow:
$url = parse_url($url, PHP_URL_PATH);
$url = trim($url, '/');
$parts = explode('/', $url);
Then you can check if
($parts[0]=='content' && $parts[1]=='view' && $parts[3]=='34')
You can also easily get the information you want with $parts[2].
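Putting those steps together for both URL shapes from the question might look like this. The function name, the keys of the returned array, and the sample values 123 and 456 (standing in for the * wildcards) are all hypothetical.

```php
<?php
// Hypothetical helper: returns the captured pieces of an old-CMS URL, or null.
function matchOldUrl(string $url): ?array
{
    // Keep only the path, drop leading/trailing slashes, then split on "/".
    $parts = explode('/', trim((string) parse_url($url, PHP_URL_PATH), '/'));

    if (count($parts) < 4 || $parts[0] !== 'content' || $parts[1] !== 'view'
        || !in_array($parts[3], ['30', '34'], true)) {
        return null;
    }

    return [
        'first'  => $parts[2],         // first wildcard
        'number' => $parts[3],         // 30 or 34
        'second' => $parts[5] ?? null, // second wildcard in the /34/1/*/ form
    ];
}

print_r(matchOldUrl('http://domain.com/content/view/123/34/'));
print_r(matchOldUrl('http://domain.com/content/view/123/34/1/456/'));
```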
It's actually very simple: a more flexible and straightforward approach is to explode() the URL into an array called something like $segments and then test on that. If you have a very small number of expected URLs, this kind of approach is probably easier to maintain and to read.
I wouldn't recommend doing this in the .htaccess file because of the performance overhead.
First, I would use the PHP function parse_url() to get the path, devoid of any protocol or hostname.
Once you have that, the following code should get you the info you need.
<?php
// The two example URLs from the question
$urls = [
    'http://domain.com/content/view/*/34/',
    'http://domain.com/content/view/*/34/1/*/',
];
foreach ($urls as $url) {
    // Get the path, without protocol or hostname
    $path = parse_url($url, PHP_URL_PATH);
    // Match the path against regular expressions, trying the longer form first
    // so that the shorter pattern does not also claim the second example
    if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\/([0-9]+)\/([^\/]+)/i', $path, $matches)) {
        print_r($matches);
    } elseif (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\//i', $path, $matches)) {
        print_r($matches);
    }
}
?>
([^/]+) matches any sequence of characters except a forward slash
([0-9]+) matches any sequence of numbers
Though you can probably write a single regular expression to match most URL variants, consider using multiple regular expressions to check for different types of URLs. Depending on how much traffic you get, the speed hit won't be all that terrible.
Also, I recommend reading Mastering Regular Expressions by Jeffrey Friedl (O'Reilly). A good knowledge of regular expressions will come in handy quite often.
http://www.regular-expressions.info/php.html

Regular expression to extract from URI

I need a regular expression to extract parts from two types of URIs:
http://example.com/path/to/page/?filter
http://example.com/path/to/?filter
Basically, in both cases I need to somehow isolate and return
/path/to
and
?filter
That is, both /path/to and filter are arbitrary. So I suppose I need two regular expressions for this? I am doing this in PHP, but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)
EDIT: Just to clarify, if for example the URL is
http://example.com/help/faq/?sort=latest
I want to get /help/faq and ?sort=latest
Another example
http://example.com/site/users/all/page/?filter=none&status=2
I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the page!
Using parse_url might be easier and have fewer side effects than a regex:
$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);
You could then use explode on the path to get the first two segments:
$segments = explode("/", $path);
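The EDIT suggests that page is a literal trailing segment to drop. Under that assumption (an interpretation of the examples, not something the question states outright), parse_url() plus basename() and dirname() covers all four example URLs. splitUri is a hypothetical name.

```php
<?php
// Hypothetical helper. Assumption: "page" is a literal final segment to drop.
function splitUri(string $url): array
{
    $path  = rtrim((string) parse_url($url, PHP_URL_PATH), '/');
    $query = parse_url($url, PHP_URL_QUERY);

    // Drop a final "/page" segment if present.
    if (basename($path) === 'page') {
        $path = dirname($path);
    }

    // Return the path and the query string with its leading "?".
    return [$path, $query === null ? '' : '?' . $query];
}

print_r(splitUri('http://example.com/site/users/all/page/?filter=none&status=2'));
```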
Try this:
^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)
This will get you the first two URL path segments and query.
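Applied with preg_match() to one of the question's URLs, the pattern puts the first two path segments (without the leading slash) in group 1 and the query string (without the ?) in group 2. It always takes exactly two segments, so for /site/users/all/page/ it would return site/users rather than the /site/users/all the question asks for.

```php
<?php
$pattern = '~^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)~';
$url = 'http://example.com/help/faq/?sort=latest';

if (preg_match($pattern, $url, $m)) {
    // $m[1]: first two path segments, $m[2]: query string without "?".
    echo '/', $m[1], ' ?', $m[2], PHP_EOL; // prints "/help/faq ?sort=latest"
}
```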
Not tested, but:
^https?://[^ /]+[^ ?]+.*
This should match http and https URLs with or without a path: [^ /]+ matches the host, [^ ?]+ should match up to the ? (of ?filter, for instance), and .* matches any remaining characters except \n.
Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php)? The task seems simple enough for it. You would need two calls (one for the / and one for the ?), but it should be quite simple once you've done that.
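The two explode() calls might look like this on one of the question's URLs. array_pad() is an addition here so the destructuring still works when the URL has no ?. Note the empty strings produced by the // after http: and by the trailing slash; picking out the wanted segments is left to the caller.

```php
<?php
$url = 'http://example.com/site/users/all/page/?filter=none&status=2';

// First call: split off the query string at the first "?" (limit 2).
[$beforeQuery, $query] = array_pad(explode('?', $url, 2), 2, '');

// Second call: split the rest of the URL on "/".
$segments = explode('/', $beforeQuery);

print_r($segments);
echo $query, PHP_EOL;
```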
