I am trying to implement a php script which will run on every call to my site, look for a certain pattern of URL, then explode the URL and perform a redirect.
Basically, I want to run this on a new CMS to catch all incoming links from the old CMS and redirect them based on a mapping: say, an article ID stripped from the URL is mapped to the same article ID imported into the new CMS's DB.
I can do the implementation, the redirect etc, but I am lost on the regex.
I need to catch any occurrences of:
domain.com/content/view/*/34/ or domain.com/content/view/*/30/ (where * is a wildcard) and capture * and the 30 or 34 in a variable which I will then use in a DB query.
If the following is encountered:
domain.com/content/view/*/34/1/*/
I need to capture the first * and the second *.
Be very grateful for anyone who can give me a hand on this.
I'm not sure regular expressions are the way to go. I think it would probably be easier to use explode ('/' , $url) and check by looping over that array.
Here are the steps I would follow:
$url = parse_url($url, PHP_URL_PATH);
$url = trim($url, '/');
$parts = explode ('/' , $url);
Then you can check if
($parts[0]=='content' && $parts[1]=='view' && $parts[3]=='34')
You can also easily get the information you want with $parts[2].
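The steps above can be put together roughly as follows. This is only a sketch: the function name parseOldCmsUrl and the returned array keys are made up for illustration, and it assumes the input is a full URL with a scheme (or just a path), since parse_url() treats a bare "domain.com/..." string as a path.

```php
<?php
// Hypothetical helper (name and return shape are assumptions, not from the question).
// Returns the captured pieces for old-CMS paths such as /content/view/123/34/
// or /content/view/123/34/1/456/, and null when the URL does not fit.
function parseOldCmsUrl(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }

    $parts = explode('/', trim($path, '/'));

    // Expect at least: content / view / <wildcard> / <30 or 34>
    if (count($parts) < 4 || $parts[0] !== 'content' || $parts[1] !== 'view') {
        return null;
    }
    if (!in_array($parts[3], ['30', '34'], true)) {
        return null;
    }

    return [
        'first'   => $parts[2],         // the first wildcard
        'section' => $parts[3],         // '30' or '34'
        'second'  => $parts[5] ?? null, // the second wildcard, if the long form was used
    ];
}

print_r(parseOldCmsUrl('http://domain.com/content/view/123/34/'));
print_r(parseOldCmsUrl('http://domain.com/content/view/123/34/1/456/'));
```

The values come back as strings, so cast or validate them before using them in a DB query.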
It's actually very simple. A more flexible and straightforward approach is to explode() the URL into an array called something like $segments, and then test against that. If you have a very small number of expected URLs, this kind of approach is probably easier to maintain and to read.
I wouldn't recommend doing this in the htaccess file because of the performance overhead.
First, I would use the PHP function parse_url() to get the path, devoid of any protocol or hostname.
Once you have that the following code should get you the info you need.
<?php
$url = 'http://domain.com/content/view/*/34/'; // first example
$url = 'http://domain.com/content/view/*/34/1/*/'; // second example
$url_array = parse_url($url);
$path = $url_array['path'];
// Match the URL against regular expressions
if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\//i', $path, $matches)){
print_r($matches);
}
if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\/([0-9]+)\/([^\/]+)/i', $path, $matches)){
print_r($matches);
}
?>
([^/]+) matches one or more characters other than a forward slash (written [^\/] in the code above because / is also the pattern delimiter)
([0-9]+) matches one or more digits
Though you can probably write a single regular expression to match most URL variants, consider using multiple regular expressions to check for different types of URLs. Depending on how much traffic you get, the speed hit won't be all that terrible.
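For comparison, here is a rough sketch of what a single pattern covering both URL shapes might look like. The function name matchOldCmsPath and the array keys are assumptions made for the example. It uses # as the delimiter so the slashes need no escaping, and it restricts the section to 30 or 34 as in the question.

```php
<?php
// Hypothetical helper: one pattern for both the short and the long URL form.
// The "/<number>/<second wildcard>" tail is wrapped in an optional non-capturing group.
function matchOldCmsPath(string $path): ?array
{
    $pattern = '#^/content/view/([^/]+)/(30|34)(?:/([0-9]+)/([^/]+))?/?$#i';

    if (!preg_match($pattern, $path, $m)) {
        return null;
    }

    return [
        'first'   => $m[1],
        'section' => $m[2],
        // preg_match() omits trailing groups that did not take part in the match,
        // so $m[4] is simply not set for the short form.
        'second'  => $m[4] ?? null,
    ];
}

print_r(matchOldCmsPath('/content/view/abc/34/'));
print_r(matchOldCmsPath('/content/view/abc/34/1/xyz/'));
```

Whether one combined pattern is easier to maintain than two separate ones is a judgment call; the optional group makes it harder to read.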
Also, I recommend reading Mastering Regular Expressions by Jeffrey Friedl (O'Reilly). A good knowledge of regular expressions will come in handy quite often.
http://www.regular-expressions.info/php.html
Related
// repeat /?([^/]+)? 10 times
preg_match("`/?([^/]+)/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?([^/]+)?/?`", $uri, $matches);
This is the regex that I use to parse the request URI without the query string. It can capture up to 10 URL parts. That's neither optimal nor good. My current workaround is to include more than enough /?([^/]+)? groups and hope that no one exceeds that limit.
Is there a regex that can capture unlimited url parts? Basically I need regex that does this: explode('/', $url).
Caveat: Can use only preg_match!
No explode, preg_match_all, g,
In the comments you showed the router component you are using: Seriously Simple Router.
The solution is to access that sort of information in the controller, not in the router. The following example shows how you can do it:
// Wrap the whole url into the first capturing group to
// forward it to the controller
Router::route('(a/b/c/([\d]+))', function($url, $id) {
var_dump(explode('/', $url));
});
Router::execute($_SERVER['REQUEST_URI']);
I've tried searching for the answer, but it's a tough question to phrase in the first place, so here goes.
Consider the following URL(s):
hotels-london-kensington-5star
hotels-london-kensington-mayfair-5star
hotels-london-5star
the following would be returned using the correct regular expression:
london-kensington
london-kensington-mayfair
london
I'm trying to use regular expression to get the centre value(s) ONLY. I know the first and last words of the string in all cases i.e. 'hotels' (first) and '5star' (last).
UPDATE: I need to use regular expressions as the URLs are being routed through CodeIgniter's URI router. The centre part of the URI is dynamically built for search results.
Since you know the first and last words, you also know their lengths.
$out = substr($in,7,-6);
Or more generally:
$out = substr($in,strlen($begin),-strlen($end));
EDIT: If regex is required, just use /(?<=hotels-).*(?=-5star)/
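A quick way to check both suggestions against the three example URLs. This is only a sketch; the function names middleBySubstr and middleByRegex are invented for the example.

```php
<?php
// Strip a known prefix and suffix by length.
function middleBySubstr(string $in, string $begin = 'hotels-', string $end = '-5star'): string
{
    return substr($in, strlen($begin), -strlen($end));
}

// Same result with lookarounds: the match itself ($m[0]) is the middle part,
// because the lookbehind and lookahead do not consume any characters.
function middleByRegex(string $in): ?string
{
    return preg_match('/(?<=hotels-).*(?=-5star)/', $in, $m) ? $m[0] : null;
}

$slugs = [
    'hotels-london-kensington-5star',
    'hotels-london-kensington-mayfair-5star',
    'hotels-london-5star',
];

foreach ($slugs as $slug) {
    echo middleBySubstr($slug), ' | ', middleByRegex($slug), "\n";
}
```

Note that substr() does not verify that the string really starts with "hotels-" and ends with "-5star", whereas the regex returns null here when either is missing.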
For such simple cases you can use explode():
$parts = explode('-', $string);
$wanted = array_slice($parts, 1, count($parts)-2);
Update: A regular expression. I still think it's easier to split the string into pieces manually.
~hotels-(.+)-5star~
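Both variants from this answer, sketched as small functions (the names middleByExplode and middleByCapture are made up). Since array_slice() returns an array, the pieces are glued back together with implode(). The regex version here is anchored with ^ and $, which is an addition to the pattern above.

```php
<?php
// Split on "-", drop the first and last piece, and join the rest again.
function middleByExplode(string $string): string
{
    $parts = explode('-', $string);
    // A negative length of -1 means "stop one element before the end".
    return implode('-', array_slice($parts, 1, -1));
}

// Capture-group variant: the middle part ends up in $m[1].
function middleByCapture(string $string): ?string
{
    return preg_match('~^hotels-(.+)-5star$~', $string, $m) ? $m[1] : null;
}

echo middleByExplode('hotels-london-kensington-mayfair-5star'), "\n";
echo middleByCapture('hotels-london-kensington-mayfair-5star'), "\n";
```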
I'm looking for a regex pattern that will return N slugs/chunks (all pieces of the URL, separated or split on the "/" char.) as matches from a "friendly" URL.
The pattern should not include the domain or a leading slash.
Also, the pattern should work with an unknown number of slugs and/or slashes.
For example, some example URLs and desired returned slugs/chunks:
"" = array()
"foo/bar/" = array('foo', 'bar')
"foo/bar/baz" = array('foo', 'bar', 'baz')
"foo-bar/baz" = array('foo-bar', 'baz')
Finally, I need to pass this regex pattern to preg_match (or similar) and have it return the results via the function's $matches parameter.
For example:
<?php preg_match($your_pattern, $friendly_url, $your_pattern_matches); ?>
... similar results can be produced using explode().
This pattern is being used in a much more complex scenario than my little example, one that forces me to use regex patterns via preg_match. Basically, I'm passing preg_match a pattern of choice, which is why I need a regex pattern as opposed to simply using explode.
Your help is GREATLY appreciated!
Cheers!
First of all, check the manual of preg_split
$segments = preg_split('[/]', $uri, 0, PREG_SPLIT_NO_EMPTY);
If you insist on the preg_match family, take a look at this (note that it uses preg_match_all):
$uri = '/foo-bar/baz';
preg_match_all('%[^/]+%', $uri, $matches);
print_r($matches);
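To see how the two calls above behave on the examples from the question, something like the following can be used. It is a sketch; the wrapper names slugsViaSplit and slugsViaMatchAll are invented, and ~ is used as the delimiter instead of the brackets.

```php
<?php
// Split on "/" and drop empty pieces (leading, trailing, or doubled slashes).
function slugsViaSplit(string $uri): array
{
    return preg_split('~/~', $uri, -1, PREG_SPLIT_NO_EMPTY);
}

// Collect every run of non-slash characters; the full matches are in $matches[0].
function slugsViaMatchAll(string $uri): array
{
    preg_match_all('%[^/]+%', $uri, $matches);
    return $matches[0];
}

foreach (['', 'foo/bar/', 'foo/bar/baz', 'foo-bar/baz'] as $uri) {
    var_dump(slugsViaSplit($uri) === slugsViaMatchAll($uri)); // bool(true) each time
}
```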
Sounds like explode() would do the job without having to bother with regexes:
$matches = explode('/', $url);
Sorry, but I don't think you can do what you want with preg_match.
If you read the documentation, you can see that preg_match stops at the first match. You want an array of the segments of a friendly URL, but that can only be achieved either by multiple matches, each stored in the array, or by a single match that captures the whole thing. Neither case fits your requirement, so I am afraid you will have to use something other than preg_match.
I have a regular expression for matching URIs. For example,
preg_match("/^my\/uri\//i", "my/uri/whatever");
I use this for routing; for example, "http://www.mywebsite.com/my/uri/page.html" will match the above (with the protocol/host removed, of course).
Is there any way to evaluate the regular expression into the most general URI that will match? For example,
"my/uri/"
I didn't understand what you actually want.
This code might be what you need:
$general_uri = 'my/uri/';
$regex = '/^' . preg_quote($general_uri, '/') . '/i'; // pass the delimiter so the slashes are escaped too
If you want reverse of the above code:
$regex = '/^my\/uri\//i';
$general_uri = str_replace('\\', '', preg_replace('/^\/\^(.*)\/i?$/', '$1', $regex));
However, the above code will not work on complicated regexes.
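A small round trip shows both directions working on the simple case. One detail matters: preg_quote() only escapes the / delimiter when it is passed as the second argument. The variable names $recovered and $matched are just for illustration.

```php
<?php
$general_uri = 'my/uri/';

// URI -> regex. Without the second argument the slashes stay unescaped
// and the resulting pattern is invalid.
$regex = '/^' . preg_quote($general_uri, '/') . '/i';
var_dump($regex); // string(14) "/^my\/uri\//i"

// Regex -> URI: strip the delimiters, the ^ anchor and the flag, then drop the backslashes.
$recovered = str_replace('\\', '', preg_replace('/^\/\^(.*)\/i?$/', '$1', $regex));
var_dump($recovered); // string(7) "my/uri/"

$matched = preg_match($regex, 'my/uri/whatever');
var_dump($matched); // int(1)
```

As soon as the pattern contains real regex syntax (groups, character classes, quantifiers), stripping backslashes no longer yields a meaningful URI.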
Maybe if you can tell the original problem that leads you into this dead end someone can give a twist on the situation.
I've made myself a routing algorithm and I just use an explode on '/', and it works very nicely. It is similar to how Magento or Zend Framework does it: registering path controllers or routers and linking them.
Maybe your original problem can be solved without need to write a regular expression engine with PHP.
I need a regular expression to extract from two types of URIs
http://example.com/path/to/page/?filter
http://example.com/path/to/?filter
Basically, in both cases I need to somehow isolate and return
/path/to
and
?filter
That is, both /path/to and filter are arbitrary. So I suppose I need 2 regular expressions for this? I am doing this in PHP, but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)
EDIT: Just to clarify, for example, given
http://example.com/help/faq/?sort=latest
I want to get /help/faq and ?sort=latest
Another example
http://example.com/site/users/all/page/?filter=none&status=2
I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the page!
Using parse_url might be easier and have fewer side effects than a regex:
$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);
You could then use explode on the path to get the first two segments:
$segments = explode("/", $path);
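Putting these calls together for the examples in the question might look like the sketch below. The function name splitPathAndQuery is made up, and it rests on one assumption taken from the examples: the segment to drop is a literal trailing "page". If the unwanted segment can be something else, that check needs to change.

```php
<?php
// Hypothetical helper: returns [path, query] in the shape the question asks for,
// e.g. ['/help/faq', '?sort=latest'].
function splitPathAndQuery(string $url): array
{
    $path  = parse_url($url, PHP_URL_PATH);
    $query = parse_url($url, PHP_URL_QUERY);

    // trim() removes the leading and trailing slash so explode() yields no empty pieces.
    $segments = explode('/', trim(is_string($path) ? $path : '', '/'));

    // Assumption: only a literal trailing "page" segment is unwanted.
    if (end($segments) === 'page') {
        array_pop($segments);
    }

    return [
        '/' . implode('/', $segments),
        is_string($query) ? '?' . $query : '',
    ];
}

print_r(splitPathAndQuery('http://example.com/help/faq/?sort=latest'));
print_r(splitPathAndQuery('http://example.com/site/users/all/page/?filter=none&status=2'));
```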
Try this:
^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)
This will get you the first two URL path segments and query.
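In PHP the pattern could be applied along these lines. This is a sketch; the function name firstTwoSegmentsAndQuery is invented. Because the pattern contains # characters, ~ is used as the delimiter.

```php
<?php
// Hypothetical wrapper around the pattern above.
// Returns [first two path segments, query string] or null if the URL does not match.
function firstTwoSegmentsAndQuery(string $url): ?array
{
    $pattern = '~^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)~';

    if (!preg_match($pattern, $url, $m)) {
        return null;
    }
    return [$m[1], $m[2]];
}

print_r(firstTwoSegmentsAndQuery('http://example.com/help/faq/?sort=latest'));
```

The captured path has no leading slash and the captured query has no leading ?, so add them back if you need them. For a URL like http://example.com/site/users/all/page/?filter=none&status=2 this returns site/users, not site/users/all, because only the first two segments are captured.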
Not tested, but:
^https?://[^ /]+[^ ?]+.*
This should match http and https URLs with or without a path. The first character class matches the host, the second matches everything up to the ? (the start of ?filter, for instance), and .* matches the rest, i.e. any character except \n.
Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php) ? The task seems simple enough for it. You would need 2 calls (one for the / and one for the ?) but it should be quite simple once you did that.