Regex solution to find a regex pattern and parse it. - php

I am trying to write a simple router for PHP and have run into a problem. Example routes are as follows.
$route = [];
$route['index'] = "/";
$route['home'] = "/home";
$route['blog'] = "/blog/[a-z]";
$route['article'] = "/article/id/[\d+]/title/[\w+]";
Taking the last example, I would like the regex to look only for patterns such as [\d+] and [\w+], and nothing else. I will use explode() to check that the URL contains /blog/, /id/ and /title/. I don't want the regex's help with that, only with detecting the patterns and matching them.
For example, if the given $URL were dev.test/blog/id/11/title/politics
I would need something like: preg_match($route['url'], $URL)
That way preg_match() knows that after /article/id/ there is a pattern requiring a digit. If the digit is found, it continues parsing; otherwise it fails or returns 0.
I don't know enough about regex to handle this problem.

Your question is a little unclear, but if you only want to capture the [\d+] and [\w+] parts of the target string, use parentheses to capture sub-matches and the (?:xxx) non-capturing group, which checks for the pattern but does not add it to the matches array. Note that [\d+] is a character class that matches a single digit or a literal +; write \d+ for one or more digits. Something like:
$route['article'] = '#(?:/article/id/)(\d+)(?:/title/)(\w+)#';
This adds only the matched digits and word to your matches array. With preg_match($route['article'], $URL, $matches) you'll find them in:
$matches[1] and $matches[2] ($matches[0] holds the whole match).
See http://www.regular-expressions.info/tutorial.html for an outstanding tutorial on regexes, by the way.
If you aren't sure of the values of 'article', 'id', and 'title' in advance, you will at least need to know the number of path segments in the URL. As long as you know the positions of the \d+ and \w+ entries, you could use
$route['article'] = '#(?:/\w+/\w+/)(\d+)(?:/\w+/)(\w+)#';
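A minimal sketch of how the capture-group idea could be wired into a small router. The route table layout, the '#' delimiters and the matchRoute() helper are assumptions made for illustration; they are not part of the original question.

```php
<?php
// Hypothetical route table: each pattern is a complete PCRE with '#' delimiters.
$routes = [
    'index'   => '#^/$#',
    'home'    => '#^/home$#',
    'article' => '#^/article/id/(\d+)/title/(\w+)$#',
];

// Hypothetical helper: returns [route name, captured parameters], or null if nothing matches.
function matchRoute(array $routes, string $path): ?array
{
    foreach ($routes as $name => $pattern) {
        if (preg_match($pattern, $path, $matches) === 1) {
            // $matches[0] is the whole match; the remaining entries are the captured groups.
            return [$name, array_slice($matches, 1)];
        }
    }
    return null;
}

$result = matchRoute($routes, '/article/id/11/title/politics');
```

Here $result holds the route name and the two captured values, while a path such as /article/id/abc/title/politics matches no route because \d+ requires digits.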

Related

Need a regular expression to capture url path

I am using PHP, and I have been trying to create a regular expression pattern to capture part of URL path, but to no avail.
The possible URL path could be any of these:
"product/zzz"
"yyyyyyyy/product/zzz"
"xxxxx/yyyyyyyy/product/zzz"
"xxxxx/yyyyyyyy/.../product/zzz" (... means other possible words)
What I need to capture is the part before "product".
For the first case, the result should be an empty string.
For the rest, the results are "yyyyyyyy", "xxxxx/yyyyyyyy" and "xxxxx/yyyyyyyy/...".
Can anyone give me a hint? Thanks!
PS.
It looks like the part I want is a repetition of the same pattern "xxxx/", but I am not good at using regex groups.
Update:
I have probably found a solution, by capturing the pattern "xxx/" with zero or more repetitions: "([^/]+/)*"
So the full regex should be "(([^/]+/)*)product/([^/]+)"
@SERPRO: it passed the test in your "Live RegExp".
Hope this is helpful.
I would use parse_url():
$path = parse_url($url, PHP_URL_PATH);
// Deal with $path to figure out what's after '/product/'
This should work for you:
#(.*?)/?product.*\b#
You can see an example of result strings here:
http://xrg.es/#5awa10
This should do it:
^(.*[^/]|)/*product/[^/]+/*$
It will also allow an arbitrary number of slashes at the end of the path.
The part inside parentheses is your result.
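As a rough check, the last pattern can be run against the sample paths from the question. The '#' delimiters and the variable names below are assumptions chosen for this sketch.

```php
<?php
// Pattern from the answer above, wrapped in '#' delimiters so the slashes need no escaping.
$pattern = '#^(.*[^/]|)/*product/[^/]+/*$#';

$paths = ['product/zzz', 'yyyyyyyy/product/zzz', 'xxxxx/yyyyyyyy/product/zzz'];

$prefixes = [];
foreach ($paths as $path) {
    if (preg_match($pattern, $path, $m) === 1) {
        // Group 1 is everything before "product", without the trailing slash.
        $prefixes[] = $m[1];
    }
}
```

The first path yields an empty string and the others yield the leading directories, which is the result the question asks for.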

regular expressions checking two strings

Hi, I wonder if anyone can help. I'm trying to check for an occurrence of one of two possible strings using regex, but my knowledge of regex is very limited, so I'm not having much success.
I'm trying to look for 'Email' and 'eMailConfirm'. This is what I have so far, and it works for Email.
The subject is the id of an input field, so it could be 'name', 'Email' or 'eMailConfirm'.
$subject = $getPromoOuter['label'];
$pattern = '/^Email/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 0);
I tried a number of potential expressions to incorporate the second string (plus a few guesswork ones based on others), but I can't seem to get any of them to work.
Any idea how I can combine those two strings and check for an occurrence of either?
Thanks for looking.
I'll just place an answer here, as I do think I have a good idea what your requirement is.
Your current regex is /^Email/ which matches any string which starts with 'Email'. (whether or not it has to start with it is unclear to me).
In case you need to match either Email or eMailConfirm anywhere in the string, you should go for
/Email|eMailConfirm/
If the matches do need to be at the start of the string, just prefix both with a '^' character: /^Email|^eMailConfirm/
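An equivalent way to write the anchored version is to group the alternatives so that a single ^ applies to both. The sketch below assumes the three field ids mentioned in the question; the variable names are illustrative.

```php
<?php
// Grouping the alternatives with (?: ... ) lets one ^ anchor apply to both of them.
$pattern = '/^(?:Email|eMailConfirm)/';

$labels  = ['name', 'Email', 'eMailConfirm'];
$matched = [];
foreach ($labels as $label) {
    if (preg_match($pattern, $label) === 1) {
        $matched[] = $label;
    }
}
```

Note that the pattern is case-sensitive, so 'eMailConfirm' has to be listed separately unless the i modifier is added.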

Capturing a pattern of unknown repetition in PCRE

This may be a quick question for experienced regular expressionists, but I'm having trouble getting my match to execute correctly.
Suppose I had a string that looked like this:
http://aaa-bbbb-cc-ddddd-eee-.sub.dom
I would like to capture all of the "aaa", "bbbb", "cc", and "ddddd" substrings, but I'm not sure how many there will be (e.g., having all triplets up through "zzz").
This is the regular expression I'm trying to use right now:
/http:\/\/(\w*?\-)+\.sub\.dom/
I wrote it this way because:
I want to match substrings, but I want each to terminate when a - is parsed
I want to capture one or more of these substrings
But it seems to save only the last match that it makes (in the above case, it would only capture "eee-").
Is there a good way to capture all of the matched substrings?
More information: I'm using PHP's PCRE function preg_replace_callback. Thanks!
No, it is not possible to match an unknown number of capture groups.
If you try to repeat a capture group, it will always contain the last value captured.
Could you explain a bit more broadly what you're trying to do? Perhaps there is another simple way to do it (possibly without regular expressions).
If you want the items in the subdomain, and then all matches between the dashes... This should work:
$string = "http://aaa-bbbb-cc-ddddd-eee-.sub.dom";
preg_match("/^http:\/\/([\w-]+?)\..*$/i", $string, $match);
$parts = explode('-', $match[1]);
print_r($parts);
Short of that you will probably have to build a small parsing script to parse the string yourself if that doesn't do it for you.
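Another option, sketched here under the assumption that the host always ends in .sub.dom, is to match in two steps: one preg_match() to isolate the dash-separated run, then preg_match_all() to collect every piece of it. The variable names are illustrative.

```php
<?php
$string = 'http://aaa-bbbb-cc-ddddd-eee-.sub.dom';

$parts = [];
// Step 1: isolate the run of dash-terminated words that precedes ".sub.dom".
if (preg_match('#^http://((?:\w+-)+)\.sub\.dom$#', $string, $m) === 1) {
    // Step 2: preg_match_all() returns every match, so no repeated capture group is needed.
    preg_match_all('/\w+/', $m[1], $all);
    $parts = $all[0];
}
```

Unlike explode('-', ...), this leaves no empty element for the trailing dash.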

Regex matching optional section

So I have two possible strings here for example.
/user/name
and
/user/name?redirect=1
I'm trying to figure out the proper regex to match either with a result of:
Array ([0] => /user/name [1] => user [2] => name)
I think the part I'm having an issue with is that the question mark and the GET query after it are optional and will only be there some of the time. I've tried many different things and can't come up with a regex that matches the strings whether the query string is there or not.
Don't use a regex,
Use parse_url(), and explode()
$result = parse_url("/here/is/a/path?query=string");
$pieces = explode("/", $result['path']);
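Applied to the strings from the question, that approach could look like the sketch below. The trim() call is an addition made here so that the leading slash does not produce an empty first element.

```php
<?php
$url = '/user/name?redirect=1';

// parse_url() also accepts relative URLs; PHP_URL_PATH returns only the path component.
$path = parse_url($url, PHP_URL_PATH);

// Strip the leading slash before splitting, so the result holds just the two segments.
$segments = explode('/', trim($path, '/'));
```

The same code gives the same segments for /user/name, since the query string never reaches explode().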
? is the "zero-or-one" quantifier. So you could append (\?.*)? to your regex, which optionally matches a literal question mark followed by any number of characters.
In regex you can mark something as optional using the ? quantifier. For instance, the regex n?ever matches both ever and never.
In your case, you might want something like /([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?redirect=1)?
This will match /.../... (given the "..." consist of letters and numbers) or /.../...?redirect=1
If there are more possible flags that could come after the question mark than simply redirect=1, try the more general:
/([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?[A-Za-z0-9]+=[A-Za-z0-9]+)?(&[A-Za-z0-9]+=[A-Za-z0-9]+)*
preg_match('{^/(user)/(name)(?=\?|$)}', $subject, $matches);
The (?=\?|$) part is a lookahead assertion: it requires either a question mark or the end of the string to follow, but it is not included in the match itself.
But like the other answers suggest you shouldn't use regex to parse URLs. Just posting the actual answer to the specific question for completeness.

Regex in preg_replace to detect url format and extract elements

I need to replace certain user-entered URLs with embedded Flash objects, and I'm having trouble with the regex I'm using to match the URL. I think this is mainly because the URLs are SEO-friendly and therefore a bit more difficult to parse.
URL structure: http://www.site.com/item/item_title_that_can_include_1('_etc-32CHARACTERALPHANUMERICGUID
I need both to detect a URL in that format and to capture the 32CHARACTERALPHANUMERICGUID, which always comes after the - in the URL.
something like this:
$ret = preg_replace('#http://www\.site\.com/item/([^-])-([a-zA-Z0-9]+)#','<embed>itemid=$2</embed>', $ret);
For some reason, the above does not find a match for a URL in the specified format. I'm new to regexes, so I think I'm missing something fairly obvious.
You should check out parse_url().
Examine the results - it was made for parsing URLs. You'll be able to extract the data you require from the tokens returned.
If you are regex crazy, try this...
/^http:\/\/www\.site\.com\/item\/[^-]*\-([a-zA-Z0-9]{32})$/
Your example is almost there, but...
When you use a negated character class, i.e. [^-], you still need a quantifier. I used *, meaning zero or more.
You don't seem to use the item title, so we won't bother capturing it.
You should use beginning (^) and end ($) anchors if the string is always exactly like that.
You say the GUID is 32 chars, so we may as well explicitly state that with the {32} quantifier.
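Put together with preg_replace(), the points above might look like the following sketch. The sample URL and its 32-character ID are made up for illustration, and '#' is used as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical 32-character alphanumeric ID, built only for this example.
$guid = str_repeat('a1b2c3d4', 4);
$ret  = "http://www.site.com/item/item_title_1('_etc-" . $guid;

// [^-]* skips the title (assumed to contain no dash); {32} captures exactly 32 characters as $1.
$ret = preg_replace(
    '#^http://www\.site\.com/item/[^-]*-([a-zA-Z0-9]{32})$#',
    '<embed>itemid=$1</embed>',
    $ret
);
```

Since only one group is captured here, the replacement refers to $1 rather than the $2 used in the question.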
