PHP - Preg match with multiple conditions

I have the following string:
"#(admin/pages|admin/pages/add|admin/pages/[0-9]+)#"
And this string to compare it to:
"admin/pages/1"
What I need is to return "admin/pages/1" when comparing the 2 strings using preg_match(). I have the following code and it's not working:
if (preg_match("#(admin/pages|admin/pages/add|admin/pages/[0-9]+)#", "admin/pages/1", $matches)) {
    var_dump($matches);
}
This is the output I get:
array(2) { [0]=> string(11) "admin/pages" [1]=> string(11) "admin/pages" }
Can anybody help out?

Use the following short regex pattern:
"#admin/pages(/add|/[0-9]+)?#"
(/add|/[0-9]+)? is an optional alternation group: it matches either /add or /<number> right after admin/pages, if one is present.
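A quick way to see the pattern handle all three URL forms is the sketch below. The list of subjects and the $results array are only for illustration and are not part of the original answer.

```php
<?php
// "admin/pages" followed by an optional "/add" or "/<digits>" suffix
$pattern = '#admin/pages(/add|/[0-9]+)?#';

$results = [];
foreach (['admin/pages', 'admin/pages/add', 'admin/pages/1'] as $subject) {
    if (preg_match($pattern, $subject, $matches)) {
        // $matches[0] holds the full match, including the optional suffix
        $results[$subject] = $matches[0];
    }
}
print_r($results);
```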

Change your regex to:
"#(admin/pages(?:/\d+)?|admin/pages/add)#"
You don't need both variants (admin/pages|admin/pages/[0-9]+) if you put the digits in the first pattern and make them optional.
Question marks and repetitions are greedy by default, which is why my version always includes the digits in the match when they are present.
An alternation, on the other hand, is tried from left to right, and the first alternative that matches wins. Since your first alternative does not include the digits, they are not matched.
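The sketch below contrasts the original pattern with the revised one on the same subject; the variable names are illustrative. It also shows that, because of the same left-to-right rule, the revised pattern still returns only admin/pages for admin/pages/add; the shortened pattern at the end of this answer does not have that problem.

```php
<?php
$subject = 'admin/pages/1';

// Original pattern: the first alternative, "admin/pages", already matches
preg_match('#(admin/pages|admin/pages/add|admin/pages/[0-9]+)#', $subject, $original);

// Revised pattern: the greedy (?:/\d+)? takes the digits whenever they follow
preg_match('#(admin/pages(?:/\d+)?|admin/pages/add)#', $subject, $revised);

var_dump($original[0]); // string(11) "admin/pages"
var_dump($revised[0]);  // string(13) "admin/pages/1"

// Same left-to-right rule: the first alternative wins before "admin/pages/add" is tried
preg_match('#(admin/pages(?:/\d+)?|admin/pages/add)#', 'admin/pages/add', $addCase);
var_dump($addCase[0]);  // string(11) "admin/pages"
```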
If you're also wondering why you get your match two times, that's because of the way preg_match works.
Quote from the documentation:
$matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
You can remove the outer parentheses if the whole match is enough:
"#admin/pages(?:/\d+)?|admin/pages/add#"
Just use $matches[0].
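To make the difference concrete, this small sketch runs the pattern with and without the outer parentheses and counts the entries preg_match puts into the array (variable names are illustrative):

```php
<?php
$subject = 'admin/pages/1';

// With outer parentheses: index 0 is the full match, index 1 the captured group
preg_match('#(admin/pages(?:/\d+)?|admin/pages/add)#', $subject, $withGroup);

// Without them there is no capturing group, so only index 0 is filled
preg_match('#admin/pages(?:/\d+)?|admin/pages/add#', $subject, $noGroup);

var_dump(count($withGroup)); // int(2)
var_dump(count($noGroup));   // int(1)
var_dump($noGroup[0]);       // string(13) "admin/pages/1"
```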
And, as @RomanPerekhrest has written (and I shamelessly include in this answer), you can shorten your pattern. You don't need to include admin/pages multiple times:
"#admin/pages(?:/add|/\d+)?#"

Try changing the order of components:
"#(admin/pages/[0-9]+|admin/pages/add|admin/pages)#"
The regex engine tries the alternatives from left to right and accepts the first one that matches. In your case, it stopped as soon as admin/pages matched, without trying the longer alternatives.
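A minimal check of the reordered pattern, assuming these three paths are the only forms you need to handle; because the most specific alternatives come first, each subject is captured in full:

```php
<?php
// Most specific alternatives first, bare "admin/pages" last
$pattern = '#(admin/pages/[0-9]+|admin/pages/add|admin/pages)#';

$results = [];
foreach (['admin/pages/1', 'admin/pages/add', 'admin/pages'] as $subject) {
    preg_match($pattern, $subject, $matches);
    // $matches[1] is the text captured by the outer group
    $results[$subject] = $matches[1];
}
print_r($results);
```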

Related

Regex - Match characters but don't include within results

I have got the following Regex, which ALMOST works...
(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)
I need the result to be the only result, or within the same position in the Array.
So for example, http://m.facebook.com/ matches perfectly; there is only one group.
However, if I change it to http://facebook.com/, then I get com/ in place of where facebook should be. So I need to have (?:www|[a-z]+) as an optional check, really.
Edit:
What I expect is to match just facebook if the string is any of the following:
http://www.facebook.com
http://facebook.com
http://m.facebook.com
And obviously the https counterparts.
This is my regex now:
(?:^https?:\/\/)(?:www)?\.?([^.]+)
This is close, but it matches the m when I try http://m.facebook.com.
https://regex101.com/r/GDapY5/1
So I need to have (?:www|[a-z]+) as an optional check really.
A ? at the end of a pattern is generally used for "optional" bits -- it means "match zero or one" of that thing, so your subpattern would be something like this:
(?:www|[a-z]+)?
If you're simply trying to get the second level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split on dots and take the penultimate value:
$domain = array_reverse(explode('.', parse_url($str)['host']))[1];
Or:
$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
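Here is the split-on-dots idea wrapped in a small function and run against the three URLs from the question. The function name secondLevelLabel is hypothetical. Note that it simply takes the second-to-last label, so for a host such as example.co.uk it would return co, not example.

```php
<?php
// Hypothetical helper: returns the label just before the last one,
// e.g. "facebook" for "m.facebook.com"
function secondLevelLabel(string $url): string
{
    $labels = explode('.', parse_url($url, PHP_URL_HOST));
    return array_reverse($labels)[1];
}

$urls = ['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com/'];
foreach ($urls as $url) {
    echo $url, ' => ', secondLevelLabel($url), PHP_EOL;
}
```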
Perhaps you could make the first m. part optional with (?:\w+\.)?.
Instead of a capturing group you could use \K to reset the starting point of the reported match.
Then match one or more word characters \w+ and use a positive lookahead to assert that what follows is a dot (?=\.)
For example:
^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)
Edit: Or you could match for m. or www. using an alternation:
^https?://(?:m\.|www\.)?\K\w+(?=\.)
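Both \K patterns can be run from PHP as sketched below. The ~ delimiter is my choice so the slashes in :// need no escaping; because \K resets the start of the reported match, $m[0] contains only the domain label.

```php
<?php
// "~" delimiter so the slashes in "://" don't need escaping
$patterns = [
    '~^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)~',
    '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~',
];
$urls = ['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com'];

$found = [];
foreach ($patterns as $pattern) {
    foreach ($urls as $url) {
        // \K drops everything matched before it, so $m[0] is just the label
        if (preg_match($pattern, $url, $m)) {
            $found[$pattern][] = $m[0];
        }
    }
}
print_r($found);
```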

pregmatch between characters and any numeric

I'm stuck writing a preg_match
I have a string:
XPMG_ar121023.txt
and need to extract the two letters between XPMG_ and the first digit (0-9).
$str = 'XPMG_ar121023.txt';
preg_match('/('XPMG_')|[0-9\,]))/', $str, $match);
print_r($match);
Maybe this isn't the best option: My characters will always be
You can just do
$str = "XPMG_ar121023.txt" ;
preg_match('/_([a-z]+)/i', $str, $match);
var_dump($match[1]);
Output
string 'ar' (length=2)
This is too simple for a regular expression. Just $match = substr($str, 5, 2) would get what you're asking for.
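A minimal check of the offsets, assuming the prefix is always exactly XPMG_ (5 characters) and is always followed by exactly two letters:

```php
<?php
$str = 'XPMG_ar121023.txt';

// "XPMG_" occupies offsets 0-4, so the two letters sit at offsets 5 and 6
$letters = substr($str, 5, 2);
var_dump($letters); // string(2) "ar"
```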
Let me walk through this step by step so as to help you solve similar problems in the future. Suppose we have the following format for our filenames:
XPMG_ar121023.txt
We know what we want to capture: the "ar" right after the _ and just before the numbers begin. So our expression would look something like this:
_[a-z]+
This is pretty straightforward. We start by looking for an underscore, followed by one or more letters between a and z. The square brackets define a character class. Our class consists of the alphabet, but you could add digits or other characters to it if you like.
Now, because we want to capture only the letters, we need to put parentheses around that part of the pattern:
_([a-z]+)
The result will now also contain that subpattern on its own. Next we put our delimiters in place to specify where our pattern begins and ends:
/_([a-z]+)/
And lastly, after our closing delimiter we can add some modifiers. As it is written, our pattern only looks for lower-case letters. We can add the i modifier to make this case-insensitive:
/_([a-z]+)/i
Voila, we're done. Now we can pass it into preg_match to see what it spits out:
preg_match( "/_([a-z]+)/i", "XPMG_ar121023.txt", $match );
This function takes a pattern as the first parameter, a string to match it against as the second, and lastly a variable to spit the results into. When all is said and done, we can check $match for our data.
The results of this operation follow:
array(2) {
[0]=> string(3) "_ar"
[1]=> string(2) "ar"
}
This is the contents of $match. Notice that the full match is at index 0 of the array, and our captured portion is at index 1.
echo $match[1]; // ar
Hope this helps.
Well, why not:
$letters = $str[5].$str[6];
:)
After all, you'll always need the two characters after the fixed prefix. There are many ways that do not require a regexp (substr() being the best anyway).

using preg_match_all to find patterns, don't include pattern delimiter in matches

I'm matching patterns with a regex, as in:
$Structure = 'C:N:X:A:V:T:J:N:G:T:N:N:C:J:N:C:A:J:N:.:';
preg_match_all('/(T:|G:|L:|D:).*?(G:|i:|X:|\.:)/', $Structure, $arr, PREG_SET_ORDER);
The results I get are:
T:J:N:G: , T:N:N:C:J:N:C:A:J:N:.:
How can I modify the pattern so that the delimiter (G:|i:|X:|.:) of a match is not included in that match, but is still used by the next search? In other words, how can I make the result look like this:
T:J:N: , G:T:N:N:C:J:N:C:A:J:N:
instead?
Is this possible?
Thanks
Yes, instead of making your 2nd capturing group consume the input, turn it into a positive lookahead:
/(T:|G:|L:|D:).*?(?=(?:G:|i:|X:|\.:))/
Now, instead of matching (and consuming) the delimiter, this:
(?=(?:G:|i:|X:|\.:))
asserts that the delimiter follows the current position, without consuming it. This is called a positive lookahead.
This results in:
"T:J:N:, G:T:N:N:C:J:N:C:A:J:N:"
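A runnable version of the lookahead approach on the string from the question. It uses the default PREG_PATTERN_ORDER rather than the question's PREG_SET_ORDER, so the full matches are simply $arr[0]:

```php
<?php
$Structure = 'C:N:X:A:V:T:J:N:G:T:N:N:C:J:N:C:A:J:N:.:';

// The lookahead only checks that a delimiter follows; it does not consume it,
// so the next match is free to start on that delimiter (e.g. "G:")
preg_match_all('/(T:|G:|L:|D:).*?(?=(?:G:|i:|X:|\.:))/', $Structure, $arr);

print_r($arr[0]);
```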
It is possible with a lookahead, using the following syntax:
(?=G:|i:|X:|\.:)
That will not consume the piece that matches the regex.
On a side note, the term delimiter refers to the slashes enclosing your regex, not to the capturing group you have.

How can I use a match in the same regex in PHP?

I have this string (that is a serialized variable in php):
s:12:"hello "world";
and I want to find "hello "world" using only a regex. I tried this, but it seems stupid :P
(s:(?P<num>[0-9]+):".{\k{num}}";)
I just want to know how I can use the "num" result within the same regex.
This regex is used inside a bigger regex, so I can't check for the end of the string.
Thanks in advance!
You can use your named capturing groups as backreferences. According to php.net:
Back references to the named subpatterns can be achieved by (?P=name) or, since PHP 5.2.2, also by \k<name> or \k'name'. Additionally PHP 5.2.4 added support for \k{name} and \g{name}.
But I think this can only be used to match the captured text again, not as a number in a quantifier. (At least I didn't get it to work.)
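The sketch below illustrates that point with a toy pattern (not the one from the question): \k{num} re-matches the same digits as text, so 12:12 matches but 12:13 does not. At no point is the captured value used as a count.

```php
<?php
// \k{num} must match the same *text* that group "num" captured
$pattern = '/^(?P<num>[0-9]+):\k{num}$/';

$same = preg_match($pattern, '12:12');      // "12" appears again
$different = preg_match($pattern, '12:13'); // "13" is not the captured text

var_dump($same);      // int(1)
var_dump($different); // int(0)
```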
You can use the preg_match function, which will populate an array of matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
More information about preg_match: PHP: preg_match
$text = 's:12:"hello "world";s:12:"good bue world";';
$pattern = "(.*:[0-9]+:\"(.*)\";.*)U";
preg_match_all($pattern,$text,$r);
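The ungreedy pattern above does not actually use the declared length. If the length is what should decide where the string ends, one workaround, sketched here under the assumption that you can do a step in PHP after the regex, is to capture the number and its offset and then cut the payload with substr. The variable names are illustrative. For a standalone serialized value, unserialize() gives the same result.

```php
<?php
$input = 's:12:"hello "world";';

$value = null;
// Capture only the declared length; PREG_OFFSET_CAPTURE also returns byte offsets
if (preg_match('/s:([0-9]+):"/', $input, $m, PREG_OFFSET_CAPTURE)) {
    $length = (int) $m[1][0];
    // The payload starts right after the full match, i.e. after s:12:"
    $start = $m[0][1] + strlen($m[0][0]);
    $value = substr($input, $start, $length);
}
var_dump($value); // string(12) "hello "world"

// For a single serialized string, unserialize() does the same job
var_dump(unserialize($input)); // string(12) "hello "world"
```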

PHP Regular Expression - Repeating Match of a Group

I have a string that may look something like this:
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
Here is the regular expression I am using so far:
preg_match_all("/Filed under: (?:<a.*?>([\w|\d|\s]+?)<\/a>)+?/", $r, $matches);
I want the part of the regular expression inside the () to keep making matches, as designated by the +? at the end. But it just won't do it. ::sigh::
Any ideas? I know there has to be a way to do this in one regular expression instead of breaking it up.
Just for fun here's a regex that will work with a single preg_match_all:
'%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%'
Or, in a more readable format:
'%(?:
Filed\ under: # your sentinel string (space escaped, since x mode ignores whitespace)
|
\G # NEXT MATCH POSITION
</a> # an end tag
)
[^<>]*+ # some non-tag stuff
<a[^<>]*+> # an opening tag
\K # RESET MATCH START
[^<>]+ # the tag's contents
%x'
\G matches the position where the next match attempt would start, which is usually the spot where the previous successful match ended (but if the previous match was zero-length, it bumps ahead one more). That means the regex won't match a substring starting with </a> until after it has matched one starting with Filed under: at least once.
After the sentinel string or an end tag has been matched, [^<>]*+<a[^<>]*+> consumes everything up to and including the next start tag. Then \K spoofs the start position so the match (if there is one) appears to start after the <a> tag (it's like a positive lookbehind, but more flexible). Finally, [^<>]+ matches the tag's contents and brings the match position up to the end tag so \G can match.
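Here is a sketch that runs the one-line pattern and a commented x-mode version with preg_match_all on the question's string. One detail worth knowing: under the x flag, unescaped whitespace in the pattern is ignored, so the space in Filed under: is escaped as "\ " in the commented version.

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// One-line form: \G continues where the previous match ended, \K drops the prefix
$compact = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%';

// Commented form; the literal space is escaped because x mode ignores whitespace
$verbose = '%(?:
    Filed\ under:   # sentinel string
  | \G </a>         # or: where the previous match ended, then an end tag
)
[^<>]*+             # some non-tag text
<a[^<>]*+>          # an opening tag
\K                  # reset the reported match start
[^<>]+              # the tag contents
%x';

preg_match_all($compact, $r, $compactMatches);
preg_match_all($verbose, $r, $verboseMatches);

print_r($compactMatches[0]);
print_r($verboseMatches[0]);
```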
But, as I said, this is just for fun. If you don't have to do the job in one regex, you're better off with a multi-step approach like the one @codaddict used; it's more readable, more flexible, and more maintainable.
\K reference
\G reference
EDIT: Although the references I gave are for the Perl docs, these features are supported by PHP too, or more accurately, by the PCRE library. I think the Perl docs are a little better, but you can also read about this stuff in the PCRE manual.
Try:
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>, <a>Group3</a>, <a>Group4</a>';
if(preg_match_all("/<a.*?>([^<]*?)<\/a>/", $r, $matches)) {
var_dump($matches[1]);
}
?>
output:
array(4) {
[0]=>
string(6) "Group1"
[1]=>
string(6) "Group2"
[2]=>
string(6) "Group3"
[3]=>
string(6) "Group4"
}
EDIT:
Since you want to include the string 'Filed under' in the search to uniquely identify the match, you can try this. I'm not sure it can be done with a single call to preg_match:
// Since you want to match everything after 'Filed under'
if(preg_match("/Filed under:(.*)$/", $r, $matches)) {
if(preg_match_all("/<a.*?>([^<]*?)<\/a>/", $matches[1], $matches)) {
var_dump($matches[1]);
}
}
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
$s = explode("</a>",$r);
foreach ($s as $k){
if ($k){
$k=explode("<a>",$k);
print "$k[1]\n";
}
}
output
$ php test.php
Group1
Group2
I want the regular expression to inside the () to continue to make matches as designated with the +? at the end.
+? is a lazy quantifier - it will match as few times as possible. In other words, just once.
If you want to match several times, you want a greedy quantifier - +.
Also note that your regex doesn't quite work - the match fails as soon as it encounters the comma between the tags, because you haven't accounted for it. That likely needs correcting.
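One more detail to keep in mind: even with a greedy + and the comma accounted for, a repeated capturing group only keeps the text from its last iteration. The sketch below uses a simplified pattern (plain <a> tags, \w+ contents, an optional ", " separator) that is my own assumption, not the asker's exact regex:

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// Greedy "+" with the ", " separator handled: the whole string matches...
preg_match('/Filed under: (?:<a>(\w+)<\/a>(?:, )?)+/', $r, $matches);

var_dump($matches[0] === $r); // bool(true)
// ...but the repeated group reports only its last iteration
var_dump($matches[1]);        // string(6) "Group2"
```

This is why the other answers fall back on preg_match_all or a \G-based pattern to collect every group.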
