PHP preg_match part of url - php

I am trying to create a URL router in PHP that works like Django's.
The problem is, I don't know PHP regular expressions very well.
I would like to be able to match urls like this:
/post/5/
/article/slug-goes-here/
I've got an array of regexes:
$urls = array(
"(^[/]$)" => "home.index",
"/post/(?P<post_id>\d+)/" => "home.post",
);
The first regex in the array works to match the home page at / but I can't get the second one to work.
Here's the code I am using to match them:
foreach($urls as $regex => $mapper) {
if (preg_match($regex, $uri, $matches)) {
...
}
}
I should also note that in the example above, I am trying to match the post_id in the url: /post/5/ so that I can pass the 5 along to my method.

You must delimit the regex. Delimiting allows you to provide 'options' (such as 'i' for case insensitive matching) as part of the pattern:
,/post/(?P<post_id>\d+)/,
Here, I have delimited the regex with commas.
As you have posted it, your regex was being delimited with /, which means it was treating everything after the second / as 'options', and only trying to match the "post" part.
Based on your current regex, the example you are trying to match doesn't look like what you're actually after.
If you are after a regex which will match something like:
/post/P1234/
Then, the following:
preg_match(',/post/(P\d+)/,', '/post/P1234/', $matches);
print_r($matches);
will result in:
Array
(
[0] => /post/P1234/
[1] => P1234
)
Hopefully that clears it up for you :)
Edit
Based on the comment to your OP, you are only trying to match a number after the /post/ part of the URL, so this slightly simplified version:
preg_match(',/post/(\d+)/,', '/post/1234/', $matches);
print_r($matches);
will result in:
Array
(
[0] => /post/1234/
[1] => 1234
)
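For reference, here is a minimal sketch of how the whole router loop could look once every pattern is delimited. It assumes # as the delimiter (so the slashes need no escaping) and anchors each pattern with ^ and $. The match_route() helper and the home.article mapping are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical route table: every pattern is delimited with "#".
$urls = [
    '#^/$#'                          => 'home.index',
    '#^/post/(?P<post_id>\d+)/$#'    => 'home.post',
    '#^/article/(?P<slug>[\w-]+)/$#' => 'home.article', // assumed mapping
];

// Hypothetical helper: returns [mapper, named captures], or null if no route matches.
function match_route(array $urls, string $uri): ?array
{
    foreach ($urls as $regex => $mapper) {
        if (preg_match($regex, $uri, $matches) === 1) {
            // Keep only the named groups, i.e. the entries with string keys.
            $params = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
            return [$mapper, $params];
        }
    }
    return null;
}

$route = match_route($urls, '/post/5/');
print_r($route);
```

The named group means the 5 is available as $route[1]['post_id'], ready to be passed along to the method.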

If your second RegExp is meant to match urls like /article/slug-goes-here/, then the correct regular expression is
#\/article\/[\w-]+\/#
That should do it! I'm not sure whether the / has to be escaped; with # as the delimiter it should not need to be, so you can try without escaping it. As for the (?P<name>...) tag, that is the Python-style syntax for named groups, which PCRE (and therefore PHP) also accepts.
I hope I can be of help!

php 5.2.2: Named subpatterns now accept the syntax (?<name>) and
(?'name') as well as (?P<name>). Previous versions accepted only
(?P<name>).
http://php.net/manual/fr/function.preg-match.php
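A quick way to see that the three spellings are interchangeable is the sketch below. The subject string and the # delimiter are my own choices for the example.

```php
<?php
$subject = '/post/5/';

// The original (?P<name>) form.
preg_match('#/post/(?P<post_id>\d+)/#', $subject, $a);
// The angle-bracket and single-quote forms, accepted since PHP 5.2.2.
preg_match('#/post/(?<post_id>\d+)/#', $subject, $b);
preg_match("#/post/(?'post_id'\\d+)/#", $subject, $c);

// All three produce the same match array, keyed by both name and index.
var_dump($a === $b && $b === $c);
echo $a['post_id'], "\n";
```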

Related

How to use a regular expression with preg_match_all to split a string into blocks following a pattern

I'm going to be working with a long string of data that is serialized into blocks using a pattern (x:y).
However, I struggle with regular expressions, and am looking for resources to help identify how to construct a regex to identify any/all of these blocks as they appear in a string.
For example, given the following string:
$s = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';
Note: the f:json at the end is missing a space on purpose, because the format might vary with how the string is eventually given to me. Each block might be spaced, and they might not.
How would I identify each block of x:y to end with the below result:
Array
(
[0] => t:user
[1] => c:red
[2] => t:admin
[3] => n:"bob doe"
[4] => s:expression
[5] => f:json
)
I've tested various expressions using my limited knowledge, but have not been terribly successful.
I can successfully match the pattern using something like this:
^[ctrns]:.+
But this unfortunately matches the entire string. The part I seem to be missing is how to break each block, while also maintaining the ability to keep spaces within the pairs (see n:"bob doe" example).
Any assistance would be super appreciated! Also, ideally any submission would be explained as to what each token in the expression was accomplishing so that I better my understanding of these techniques.
I've been using https://regexr.com/ to practice.
You may use this regex in preg_match_all:
[ctnsf]:(?:"[^"\\]*(?:\\.[^"\\]*)*"|\S+?(?=[ctnsf]:|\s|$))
RegEx Details:
[ctnsf]:: Match one of ctnsf characters followed by :
(?:"[^"\\]*(?:\\.[^"\\]*)*": Match a quoted substring. This takes care of escaped quotes as well.
|: OR
\S+?: Match 1+ not-whitespace characters (non-greedy)
(?=[ctnsf]:|\s|$): Positive lookahead to assert one of the conditions given in assertions.
Code:
$re = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/m';
$str = 't:user c:red t:admin n:"bob \\"doe" s:expressionsf:json';
preg_match_all($re, $str, $matches);
// Print the entire match result
print_r($matches[0]);
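If you then need the key and value of each block separately, something along these lines might work. It is only a sketch: the sample string is shortened and made up, and splitting with explode() and trimming the quotes are my own additions, not part of the answer above.

```php
<?php
// Pattern copied from the answer above; the sample string is an assumption.
$re  = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/m';
$str = 't:user c:red n:"bob doe" s:exprf:json';

preg_match_all($re, $str, $matches);

$pairs = [];
foreach ($matches[0] as $block) {
    // Split on the first colon only, so colons inside the value survive.
    [$key, $value] = explode(':', $block, 2);
    // Drop the surrounding double quotes of a quoted value, if any.
    $pairs[] = [$key, trim($value, '"')];
}
print_r($pairs);
```

A list of pairs is used instead of an associative array because the same key (t in the question) can appear more than once.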

How can I extract a query string from these logs?

I have a bunch of lines in a log file where I need to extract only the query string part. I have identified these patterns:
/path/optin.html?e=somebase64string&l=somedifferentbase64string HTTP...
"/path/optin.html?e=somebase64string%3D&l=somedifferentbase64string" "browser info"...
"/path/optin.html?" "browser info"...
Some notes:
Sometimes path and query string are enclosed in double quotes
Sometimes there's no query string at all, obviously the ones with no query string are to be discarded.
Sometimes the base64 string was url encoded, so the ending "=" part comes as "%3D" instead. I don't think this has affected my script but I'd thought I'd note it also.
So, I was able to correctly extract - hopefully - all of the lines that follow the first pattern above, but the others I'm having some trouble with.
This is the pattern I'm trying with:
$pattern = '/html\?(.*)\s*HTTP/';
then I run a preg_match against the log line.
Can anyone help me out with a better regex pattern?
I need to grab this part off the log lines:
e=somebase64string&l=somedifferentbase64string
Thanks
You can use a pattern like ~\?([^\s.]*)~ to match everything after a ? until you reach a whitespace character or a literal dot (assuming a rule that "URLs will never have spaces in them [that aren't %20]"):
$pattern = '~\?([^\s.]*)~';
preg_match_all($pattern, $logs, $output);
Then trim off any quotes (e.g. in your last example):
$output = array_map(function($var) { return rtrim($var, '"'); }, $output[1]);
Giving you:
Array
(
[0] => e=somebase64string&l=somedifferentbase64string
[1] => e=somebase64string%3D&l=somedifferentbase64string
[2] =>
)
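Since the lines with no query string are supposed to be discarded, a possible follow-up is to filter out the empty results and decode the rest. This is a sketch with made-up log lines and base64-looking values; using parse_str() is my own suggestion, and note that it URL-decodes values, which includes turning a + into a space.

```php
<?php
// Made-up log lines following the three shapes from the question.
$logs = implode("\n", [
    '/path/optin.html?e=Zm9v&l=YmFy HTTP/1.1',
    '"/path/optin.html?e=Zm8%3D&l=YmE%3D" "browser info"',
    '"/path/optin.html?" "browser info"',
]);

preg_match_all('~\?([^\s.]*)~', $logs, $output);

// Trim trailing quotes, then discard lines that had no query string.
$queries = array_map(function ($q) { return rtrim($q, '"'); }, $output[1]);
$queries = array_values(array_filter($queries, function ($q) { return $q !== ''; }));

// Split each query string into its parameters; %3D is decoded back to "=".
$decoded = [];
foreach ($queries as $q) {
    parse_str($q, $params);
    $decoded[] = $params;
}
print_r($decoded);
```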

Regular Expression (preg_match) match anything

This is how far I got.
This is working:
$urls = $this->match_all('/<a href="(http:\/\/www.imdb.de\/title\/tt.*?)".*?>.*?<\/a>/ms',
$content, 1);
Now I want to do the same with a different site.
But the link of the site has different structure:
http://www.example.org/ANYTHING
I don't know what I am doing wrong but with this other site (example.org) it is not working.
Here is what I have tried
$urls = $this->match_all('/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms',
$content, 1);
Thank you for your help. Stackoverflow is so awesome!
ANYTHING is usually represented by .*? (which you already use in your original regex). You could also use [^"]+ as placeholder in your case.
It sounds like you want the following regular expression:
'/<a href="(http:\/\/example\.org\/.*?)".*?>.*?<\/a>/ms'
You can also use a different delimiter to avoid escaping the forward slashes:
'#<a href="(http://example\.org/.*?)".*?>.*?</a>#ms'
Note the escaping of the . in the domain name, as you intend to match a literal ., not any character.
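Here is a small self-contained sketch of that idea using plain preg_match_all. The HTML snippet is invented, and I've kept the www. prefix from the question and used [^"]+ as the placeholder, so adjust both to the real site.

```php
<?php
// Invented sample markup: one link to the target site, one to another site.
$content = '<p><a href="http://www.example.org/ANYTHING" class="x">text</a> '
         . '<a href="http://other.example.com/page">skip</a></p>';

// "#" as delimiter, so the slashes need no escaping; the dots are escaped.
$pattern = '#<a href="(http://www\.example\.org/[^"]+)"[^>]*>.*?</a>#s';

preg_match_all($pattern, $content, $matches);

// Group 1 holds the captured URLs.
$urls = $matches[1];
print_r($urls);
```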
I think this should help
/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms
<a href="http://www.example.org/ANYTHING">text</a>
Result:
Array
(
[0] => <a href="http://www.example.org/ANYTHING">text</a>
[1] => http://www.example.org/ANYTHING
)
EDIT: I always find this site very useful when I want to try out preg_match - http://www.solmetra.com/scripts/regex/index.php

PHP regular expression not being matched - what is wrong?

I have the following regular expression:
"^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}"
I want to use it to be able to match strings like:
xabc:z123
However, when I try it with this regex tester, it does not match the pattern. Is my pattern wrong, or is the online tester unreliable?
If my pattern is wrong, could someone point out why it is wrong.
Also, I want to make the pattern matching case insensitive - but I'm not too sure the best way to do that (thought better to ask rather than trial and error). How do I change the pattern so it matches irrespective of case?
Just add an i for case insensitive matching:
/^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}/i
By the way, your regular expression works!?
Output:
Array
(
[0] => xabc:z123
)
If you want to have something like:
Array
(
[0] => 'xabc:z123',
[1] => 'x',
[2] => 'abc'
...
)
You need to add groups using (), e.g.:
/^([x]{1})([a-z]{3,4}):([a-z0-9]{1,6})/i
In the tester, you have to enter the regex without the surrounding quotes. In PHP source code, you have to use quotes and a regex delimiter; the tester shows that in the code it generates:
$ptn = "/^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}/";
To make it case insensitive, you have two options. One is to add an i after the closing delimiter, as @middus's answer demonstrates. The other is to add (?i) to the regex itself:
(?i)^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}
The tester will accept it either way; if you don't add the delimiters yourself it adds / to either end, which means any slashes in your regex need to be escaped (i.e., it doesn't escape them for you). Be aware that PHP allows you to use other characters as the delimiters, but that tester only recognizes /.
Some further notes:
To match a single x, all you need is x. The square brackets are unnecessary when there's only one letter inside them, and the {1} quantifier never has any effect--it's pure clutter.
If you're using the regex to validate the string, you may want to add a $ anchor to the end.
End result:
/^x[a-z]{3,4}:[a-z0-9]{1,6}$/i
Here is another tester that lets you choose your own delimiters, among other things.
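To check the end result without an online tester, a short script like the following should do. The sample strings are my own, and both the /i flag and the inline (?i) variant are tried side by side.

```php
<?php
// The cleaned-up pattern, once with the i flag and once with inline (?i).
$withFlag   = '/^x[a-z]{3,4}:[a-z0-9]{1,6}$/i';
$withInline = '/(?i)^x[a-z]{3,4}:[a-z0-9]{1,6}$/';

// Two strings that should match and two that should not.
$samples = ['xabc:z123', 'XABC:Z123', 'abc:z123', 'xabc:z1234567'];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$s] = [preg_match($withFlag, $s), preg_match($withInline, $s)];
}
print_r($results);
```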

Regex matching optional section

So I have two possible strings here for example.
/user/name
and
/user/name?redirect=1
I'm trying to figure out the proper regex to match either with a result of:
Array ([0] => /user/name [1] => user [2] => name)
I think the part I'm having an issue with is that the question mark and the GET query after it are optional and will only be there some of the time. I've tried many different things and can't seem to come up with a regex to match the strings whether the ?... part is there or not.
Don't use a regex.
Use parse_url() and explode():
$result = parse_url("/here/is/a/path?query=string");
$pieces = explode("/", $result['path']);
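Applied to the two strings from the question, that could look roughly like this. The path_segments() helper name is made up; trimming the slashes first is my own tweak so that explode() doesn't return an empty first element.

```php
<?php
// Hypothetical helper: returns the segments of the path part of a URI.
function path_segments(string $uri): array
{
    // PHP_URL_PATH makes parse_url() return only the path component.
    $path = parse_url($uri, PHP_URL_PATH);
    // trim() removes the leading slash so explode() yields no empty first item.
    return explode('/', trim((string) $path, '/'));
}

$a = path_segments('/user/name');
$b = path_segments('/user/name?redirect=1');
print_r($a);
print_r($b);
```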
? is the "zero-or-one" quantifier. So you could append (\?.*)? to your regex, which will optionally match zero or one instances of a literal question-mark followed by any number of characters.
In regex you can mark something as optional with the ? quantifier. For instance, the regex n?ever matches both ever and never.
In your case, you might want something like /([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?redirect=1)?
This will match /.../... (given the "..." consist of letters and numbers) or /.../...?redirect=1
If there are more possible flags that could come after the question mark than simply redirect=1, try the more general:
/([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?[A-Za-z0-9]+=[A-Za-z0-9]+)?(&[A-Za-z0-9]+=[A-Za-z0-9]+)*
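In PHP the pattern also needs delimiters. Below is a sketch of the general version wrapped in # and anchored with ^ and $ (both are my additions), run against a few sample URLs of my own.

```php
<?php
// The general pattern from above, delimited with "#" and anchored.
$pattern = '#^/([A-Za-z0-9]+)/([A-Za-z0-9]+)'
         . '(\?[A-Za-z0-9]+=[A-Za-z0-9]+)?(&[A-Za-z0-9]+=[A-Za-z0-9]+)*$#';

$subjects = ['/user/name', '/user/name?redirect=1', '/user/name?redirect=1&x=2'];

$results = [];
foreach ($subjects as $subject) {
    if (preg_match($pattern, $subject, $m) === 1) {
        // Groups 1 and 2 are the two path segments; the query groups are optional.
        $results[$subject] = [$m[1], $m[2]];
    }
}
print_r($results);
```

Note that with this approach $m[0] contains the query string whenever one is present.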
preg_match('{^/(user)/(name)(?=\?|$)}', $subject, $matches);
This uses a lookahead assertion: it requires that either a literal ? or the end of the string comes next, but whatever it looks at is not included in the match itself.
But, as the other answers suggest, you shouldn't use a regex to parse URLs. I'm just posting the actual answer to the specific question for completeness.
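As a quick check, here is a sketch of a lookahead that accepts either a literal ? or the end of the string, run against both strings from the question. The variable names are mine.

```php
<?php
// Lookahead: after /user/name there must be a "?" or the end of the string.
$pattern = '{^/(user)/(name)(?=\?|$)}';

preg_match($pattern, '/user/name', $plain);
preg_match($pattern, '/user/name?redirect=1', $withQuery);

// The query string is only looked at, never consumed, so both arrays are equal.
print_r($plain);
print_r($withQuery);
```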
