PHP: How do I convert a regular expression to an example match?

I have a regular expression for matching URIs. For example,
preg_match("/^my\/uri\//i", "my/uri/whatever");
I use this for routing: for example, "http://www.mywebsite.com/my/uri/page.html" matches the pattern above (with the protocol and host removed, of course).
Is there any way to evaluate the regular expression into the most general URI that will match? For example,
"my/uri/"

I didn't understand what you actually want.
This code might be what you need:
$general_uri = 'my/uri/';
$regex = '/^' . preg_quote($general_uri, '/') . '/i'; // pass the delimiter so '/' gets escaped too
If you want the reverse of the above code:
$regex = '/^my\/uri\//i';
$general_uri = str_replace('\\', '', preg_replace('/^\/\^(.*)\/i?$/', '$1', $regex));
However, the above code will not work on complicated regexes.
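The str_replace() trick above only handles a pattern that is pure literal text. A slightly more defensive sketch, assuming '/'-delimited, '^'-anchored patterns like the one in the question, walks the pattern and stops at the first metacharacter. The name literalPrefix is made up for illustration. It returns text that every matching URI must start with (the "most general" prefix), not a full regex-to-string converter; it gives up on alternation, and with the i flag the casing is simply whatever the pattern uses.

```php
<?php
// Hypothetical helper: returns the literal text every match of a simple,
// '^'-anchored, '/'-delimited pattern must start with. It stops at the first
// metacharacter and does not understand groups, classes or alternation.
function literalPrefix(string $regex): string
{
    $end = strrpos($regex, '/');
    if ($regex === '' || $regex[0] !== '/' || $end === false || $end === 0) {
        return ''; // not a '/'-delimited pattern
    }
    // Drop the delimiters and trailing flags: '/^my\/uri\//i' becomes '^my\/uri\/'
    $body = substr($regex, 1, $end - 1);
    if ($body === '' || $body[0] !== '^' || strpos($body, '|') !== false) {
        return ''; // unanchored or using alternation: no safe prefix
    }
    $prefix = '';
    $len = strlen($body);
    for ($i = 1; $i < $len; $i++) {
        $c = $body[$i];
        if ($c === '\\' && $i + 1 < $len) {
            $next = $body[$i + 1];
            if (ctype_alnum($next)) {
                break; // \d, \w, \s etc. are classes, not literal characters
            }
            $prefix .= $next; // escaped literal such as \/ or \.
            $i++;
            continue;
        }
        if (strpos('*+?{', $c) !== false) {
            // The previous character may be optional or repeated, so drop it
            $prefix = (string) substr($prefix, 0, -1);
            break;
        }
        if (strpos('.[()$\\', $c) !== false) {
            break; // any other metacharacter ends the literal run
        }
        $prefix .= $c;
    }
    return $prefix;
}

echo literalPrefix('/^my\/uri\//i'), "\n"; // my/uri/
```

For a route such as '/^blog\/\d+/' this yields "blog/", since \d is a class rather than a literal character.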

If you can describe the original problem that led you into this dead end, maybe someone can suggest a different approach.
I've written my own routing algorithm and I just use explode() on '/'. It works very nicely, similar to how Magento or Zend Framework do it: registering path controllers or routers and linking them.
Maybe your original problem can be solved without need to write a regular expression engine with PHP.
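A minimal sketch of that explode()-based idea, assuming a made-up routing table where the first path segment selects a handler and the remaining segments become its parameters. The $routes array and the dispatch() function are hypothetical, not Magento's or Zend Framework's actual API.

```php
<?php
// Hypothetical routing table: the first path segment picks the handler,
// and the remaining segments are passed to it as parameters.
$routes = [
    'blog' => function (array $args): string {
        return 'blog page ' . ($args[0] ?? 'index');
    },
    'article' => function (array $args): string {
        // e.g. /article/id/11 -> $args = ['id', '11']
        return 'article ' . ($args[1] ?? 'unknown');
    },
];

function dispatch(array $routes, string $uri): string
{
    // Strip protocol/host and surrounding slashes, then split on '/'
    $path = trim((string) parse_url($uri, PHP_URL_PATH), '/');
    $segments = $path === '' ? [] : explode('/', $path);
    $controller = array_shift($segments) ?? '';
    if (!isset($routes[$controller])) {
        return '404';
    }
    return $routes[$controller]($segments);
}

echo dispatch($routes, 'http://www.mywebsite.com/blog/politics'), "\n"; // blog page politics
```

No regular expression is involved: matching is an array lookup, and each handler decides what its own segments mean.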

Related

Regex solution to find a regex pattern and parse it.

I am trying to write a simple router for PHP, and I am facing a problem. Examples of the routes are as follows.
$route = [];
$route['index'] = "/";
$route['home'] = "/home";
$route['blog'] = "/blog/[a-z]";
$route['article'] = "/article/id/[\d+]/title/[\w+]";
Now, if we take the last example, I would like the regex to look only for patterns such as [\d+] and [\w+], nothing else. I will use explode() to check whether the URL contains /blog/, /id/ and /title/; I don't want the regex's help with that, only to detect the patterns and match them.
For example, if a given $URL was dev.test/blog/id/11/title/politics
I would need something like: preg_match($route['url'], $URL)
So now the preg_match() function knows that after "/article/id/" there is a pattern asking only for a digit; if the digit is found it continues parsing, otherwise it fails and returns 0.
I don't know much about regex to handle this complex problem.
Your question is a little unclear, but if you want to capture only the \d+ or \w+ parts of the target string, you should consider using parentheses to capture sub-matches, and the (?:xxx) non-capturing group, which checks for the pattern but does not add it to the array, something like:
$route['article'] = '/(?:\/article\/id\/)(\d+)(?:\/title\/)(\w+)/';
Note that [\d+] is a character class matching a single digit or a literal '+', so the quantifier has to go outside: \d+ and \w+. With preg_match($route['article'], $URL, $matches), only the matched digits and word are captured. You'll find them like so:
$matches[1] and $matches[2] ($matches[0] holds the whole match).
See http://www.regular-expressions.info/tutorial.html for an outstanding tutorial on regexes, by the way.
If you aren't sure of the values of 'article', 'id', and 'title' in advance, then you will probably at least need to be sure of the number of directories in the URL. That means as long as you know the position of the \d+ and \w+ entries, you could use
$route['article'] = '/(?:\/\w+\/\w+\/)(\d+)(?:\/\w+\/)(\w+)/';
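If you'd rather keep the route strings in the question's own format, one option is to compile them into a regex: literal text is escaped with preg_quote() and each [...] placeholder becomes a capturing group. This is a sketch assuming that every bracket holds a regex fragment such as \d+ or \w+; the helper name compileRoute is hypothetical, and a bare character class like the blog route's [a-z] would need a different convention.

```php
<?php
// Hypothetical helper: turns a route such as '/article/id/[\d+]/title/[\w+]'
// into an anchored regex, capturing each [...] fragment and escaping the rest.
function compileRoute(string $route): string
{
    // With DELIM_CAPTURE, odd indexes hold bracket contents, even ones literal text
    $parts = preg_split('/\[(.+?)\]/', $route, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= ($i % 2 === 1) ? '(' . $part . ')' : preg_quote($part, '#');
    }
    return '#^' . $regex . '$#'; // '#' delimiter so '/' needs no escaping
}

$pattern = compileRoute('/article/id/[\d+]/title/[\w+]');
if (preg_match($pattern, '/article/id/11/title/politics', $matches)) {
    print_r($matches); // [1] => 11, [2] => politics
}
```

A URL such as /article/id/abc/title/x does not match, because abc is not a run of digits.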

Regular expressions in PHP behave strangely

So, I've got a kind of database and I use regular expressions to process all its lines. The problem is that the email field may contain no '#' symbol, or more than one. I decided to put # before the domain (there are not many of them) and then just remove all the #'s I don't need.
I use online regular expression builders, like this one: http://www.phpliveregex.com/. I got the following regular expression for putting # before the domain:
preg_replace("/(dodgit|trashymail|pookmail|spambob|mailinator)/", "#$1", $myline);
But it just doesn't work. For example:
CynthiaELopezdodgit.com
doesn't change after this script.
What can be wrong? I'm new to PHP, so sorry if the problem is really stupid :)
You need to use the return value:
$newLine = preg_replace("/(dodgit|trashymail|pookmail|spambob|mailinator)/", "#$1", $myline);
$newLine will contain the email with the #, while $myline will still hold the one without it. preg_replace does not modify the original variable.
Your regex works fine. I would check that you're looking at the right variable: preg_replace doesn't overwrite the variable, it returns the result instead.
For a working example: http://codepad.org/PSxK7Jtv

Need a regular expression to capture url path

I am using PHP, and I have been trying to create a regular expression pattern to capture part of URL path, but to no avail.
The possible URL path could be any of these:
"product/zzz"
"yyyyyyyy/product/zzz"
"xxxxx/yyyyyyyy/product/zzz"
"xxxxx/yyyyyyyy/.../product/zzz" (... means other possible words)
What I need to capture is the part before "product".
For the first case, the result should be an empty string.
For the rest, it should be "yyyyyyyy", "xxxxx/yyyyyyyy" and "xxxxx/yyyyyyyy/...".
Can anyone here give me a hint? Thanks!
PS.
It looks like the part I want is a repetition of the same pattern "xxxx/", but I am not good at using regex groups.
Update:
I may have found a solution, by capturing the pattern "xxx/" with zero or more repetitions: "([^/]+/)*"
So the full regex would be "(([^/]+/)*)product/([^/]+)"
@SERPRO: it passed the test in your "Live RegExp".
Hope it is helpful.
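A small sketch of the update's pattern in use, with two assumptions added: it is anchored with ^ and $, and the inner group is made non-capturing so that group 1 is the whole prefix. Group 1 still ends with a slash, so rtrim() removes it to get exactly the results asked for. The function name beforeProduct is illustrative.

```php
<?php
// The pattern from the update, anchored and with a non-capturing inner group,
// so group 1 is everything before "product/" and group 2 the last segment.
function beforeProduct(string $path): ?string
{
    if (!preg_match('#^((?:[^/]+/)*)product/([^/]+)$#', $path, $m)) {
        return null; // path does not end in "product/<name>"
    }
    return rtrim($m[1], '/'); // "xxxxx/yyyyyyyy/" -> "xxxxx/yyyyyyyy"
}

var_dump(beforeProduct('product/zzz'));                // string(0) ""
var_dump(beforeProduct('xxxxx/yyyyyyyy/product/zzz')); // string(14) "xxxxx/yyyyyyyy"
```

Without the anchors, a path with extra segments after "product/zzz" could still produce a partial match.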
I would use parse_url():
$path = parse_url($url, PHP_URL_PATH);
// Deal with $path to figure out what comes before '/product/'
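One way to fill in the "deal with $path" step without a regex is to split the path into segments and keep those in front of the "product" segment. This is a sketch; prefixViaSegments is a made-up name, and it uses the first "product" segment it finds, whereas the greedy regex versions pick the one just before the last segment, so the two can differ if "product" appears more than once.

```php
<?php
// Split the path on '/' and keep the segments in front of the first "product".
function prefixViaSegments(string $url): ?string
{
    $path = trim((string) parse_url($url, PHP_URL_PATH), '/');
    $segments = explode('/', $path);
    $pos = array_search('product', $segments, true); // strict comparison
    if ($pos === false) {
        return null; // no "product" segment at all
    }
    return implode('/', array_slice($segments, 0, $pos));
}

echo prefixViaSegments('http://example.com/xxxxx/yyyyyyyy/product/zzz'), "\n"; // xxxxx/yyyyyyyy
```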
This should work for you:
#(.*?)/?product.*\b#
You can see an example of result strings here:
http://xrg.es/#5awa10
This should do it:
^(.*[^/]|)/*product/[^/]+/*$
It will also allow an arbitrary number of slashes at the end of the path.
The part inside parentheses is your result.

PHP Regex on URL - split into variables

I am trying to implement a PHP script which will run on every request to my site, look for a certain URL pattern, then explode the URL and perform a redirect.
Basically, I want to run this on a new CMS to catch all incoming links from the old CMS and redirect them based on a mapping: say, an article ID stripped from the URL mapped to the same article ID imported into the new CMS's database.
I can do the implementation, the redirect etc, but I am lost on the regex.
I need to catch any occurrences of:
domain.com/content/view/*/34/ or domain.com/content/view/*/30/ (where * is a wildcard) and capture * and the 30 or 34 in a variable which I will then use in a DB query.
If the following is encountered:
domain.com/content/view/*/34/1/*/
I need to capture the first * and the second *.
I'd be very grateful to anyone who can give me a hand with this.
I'm not sure regular expressions are the way to go. I think it would probably be easier to use explode('/', $url) and check the parts by looping over the resulting array.
Here are the steps I would follow:
$url = parse_url($url, PHP_URL_PATH);
$url = trim($url, '/');
$parts = explode ('/' , $url);
Then you can check if
($parts[0]=='content' && $parts[1]=='view' && $parts[3]=='34')
You can also easily get the information you want with $parts[2].
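Putting those steps together, a sketch that recognises both old URL shapes from the question and returns the captured values could look like this. The function name parseLegacyUrl and the array keys id, section and extra are invented for illustration; the allowed values 30 and 34 come from the question.

```php
<?php
// Recognises the two legacy shapes (key names are made up):
//   /content/view/<a>/<30|34>/      -> ['id' => <a>, 'section' => '30' or '34']
//   /content/view/<a>/34/1/<b>/     -> additionally 'extra' => <b>
function parseLegacyUrl(string $url): ?array
{
    $path = trim((string) parse_url($url, PHP_URL_PATH), '/');
    $parts = explode('/', $path);
    if (count($parts) < 4 || $parts[0] !== 'content' || $parts[1] !== 'view'
        || !in_array($parts[3], ['30', '34'], true)) {
        return null; // not an old-CMS link
    }
    $result = ['id' => $parts[2], 'section' => $parts[3]];
    if (isset($parts[5])) {
        $result['extra'] = $parts[5]; // the second wildcard in .../34/1/*/
    }
    return $result;
}

print_r(parseLegacyUrl('http://domain.com/content/view/123/34/1/my-title/'));
```

The returned values can then go straight into the database lookup that decides where to redirect.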
It's actually very simple: a more flexible and straightforward approach is to explode() the URL into an array called something like $segments, and then test against that. If you have a very small number of expected URLs, this kind of approach is probably easier to read and maintain.
I wouldn't recommend doing this in the .htaccess file because of the performance overhead.
First, I would use the PHP function parse_url() to get the path, devoid of any protocol or hostname.
Once you have that the following code should get you the info you need.
<?php
// Both example URLs from the question (a second assignment would overwrite the first)
$urls = [
    'http://domain.com/content/view/*/34/',     // first example
    'http://domain.com/content/view/*/34/1/*/', // second example
];
foreach ($urls as $url) {
    $path = parse_url($url, PHP_URL_PATH); // path only, no protocol or host
    // Match the path against the regular expressions
    if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\//i', $path, $matches)) {
        print_r($matches);
    }
    if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\/([0-9]+)\/([^\/]+)/i', $path, $matches)) {
        print_r($matches);
    }
}
?>
([^/]+) matches one or more characters other than a forward slash
([0-9]+) matches one or more digits
Though you can probably write a single regular expression to match most URL variants, consider using multiple regular expressions to check for different types of URLs. Depending on how much traffic you get, the speed hit won't be all that terrible.
Also, I recommend reading Mastering Regular Expressions by Jeffrey Friedl (O'Reilly). A good knowledge of regular expressions will come in handy quite often.
http://www.regular-expressions.info/php.html

Regular expression to extract from URI

I need a regular expression to extract parts of two types of URIs:
http://example.com/path/to/page/?filter
http://example.com/path/to/?filter
Basically, in both cases I need to somehow isolate and return
/path/to
and
?filter
That is, both /path/to and filter are arbitrary. So I suppose I need two regular expressions for this? I am doing this in PHP, but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)
EDIT: Just to clarify, if for example
http://example.com/help/faq/?sort=latest
I want to get /help/faq and ?sort=latest
Another example
http://example.com/site/users/all/page/?filter=none&status=2
I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the page!
Using parse_url might be easier and have fewer side effects than a regex:
$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);
You could then use explode on the path to get the first two segments:
$segments = explode("/", $path);
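The edit asks for /site/users/all (three segments), so taking only the first two segments is not enough there. A sketch of the parse_url() route that covers all the examples, assuming, as they suggest, that the unwanted part is a trailing segment literally named page; that assumption and the name splitPathAndQuery are mine, not the asker's.

```php
<?php
// Returns [path without a trailing "page" segment, query string with its '?'].
function splitPathAndQuery(string $url): array
{
    $path  = trim((string) parse_url($url, PHP_URL_PATH), '/');
    $query = parse_url($url, PHP_URL_QUERY); // null when there is no query
    $segments = $path === '' ? [] : explode('/', $path);
    // Assumption: the part to drop is a final segment literally named "page"
    if (end($segments) === 'page') {
        array_pop($segments);
    }
    return [
        '/' . implode('/', $segments),
        $query === null ? '' : '?' . $query,
    ];
}

print_r(splitPathAndQuery('http://example.com/site/users/all/page/?filter=none&status=2'));
```

If page is really a placeholder for any final segment, drop the end() check and always pop, but then /help/faq/ would lose faq.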
Try this:
^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)
This will get you the first two URL path segments and query.
not tested but:
^https?://[^ /]+[^ ?]+.*
which should match http and https URLs with or without a path: the second part ([^ ?]+) should match up to the ? (of ?filter, for instance) and the .* matches any character except a newline.
Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php)? The task seems simple enough for it. You would need two calls (one for the / and one for the ?), but it should be quite simple after that.
