PHP regex url matching - php

I have some example URLs:
mysite.com/tag/HD+Wallpaper //match
mysite.com/tag/HD+Wallpaper/ //match
mysite.com/tag/HD+Wallpaper?page= //match
mysite.com/tag/HD+Wallpaper/?page= //match
mysite.com/tag/HD+Wallpaper/sdadasdas //not match
where "HD+Wallpaper" is a parameter.
I tried:
^tag/(.+?)(|\/|\?(.*?))$i
How can I fix it? Thanks.

You could try the following regex:
tag\/.*?(?=\/|\?|$)
Demo
Note:
In the demo, I added backslash \ to escape /

This regex should work for you:
tag\/([^\/]*?)(?=\/$|\/\?|\?|$)
Demo

You might use
tag/[^/\s]+(?:/(?:\?.*)?)?$
Explanation
tag/ Match literally (perhaps /tag/ or \btag would be more specific)
[^/\s]+ Match 1+ occurrences of any char except / or a whitespace char
(?: Non capture group
/(?:\?.*)? Match / followed by an optional part to match ? and the rest of the string
)? Close group and make it optional
$ End of string
Regex demo
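A minimal sketch of how this pattern could be checked against the five URLs from the question. The helper name isTagUrl and the constant name are made up for illustration, and ~ is used as the delimiter so that / does not need escaping:

```php
<?php
// Pattern from the answer above; "~" as delimiter avoids escaping "/".
const TAG_URL_PATTERN = '~tag/[^/\s]+(?:/(?:\?.*)?)?$~';

// Hypothetical helper: true when the URL is a tag page without an extra path segment.
function isTagUrl(string $url): bool
{
    return preg_match(TAG_URL_PATTERN, $url) === 1;
}

$urls = [
    'mysite.com/tag/HD+Wallpaper',
    'mysite.com/tag/HD+Wallpaper/',
    'mysite.com/tag/HD+Wallpaper?page=',
    'mysite.com/tag/HD+Wallpaper/?page=',
    'mysite.com/tag/HD+Wallpaper/sdadasdas',
];

foreach ($urls as $url) {
    echo $url, ' => ', isTagUrl($url) ? 'match' : 'no match', PHP_EOL;
}
```

Only the last URL should be reported as "no match". Note that in the ?page= case without a trailing slash, [^/\s]+ also consumes the query string, so this pattern is suited to testing the URL rather than extracting the parameter.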

Related

Why is non-greedy match consuming entire pattern even when followed by another non-greedy match

Using PHP8, I'm struggling to figure out how to conditionally match some key that may or may not appear in a string.
I would like to match both
-----------key=xyz---------------
AND
--------------------------
The dashes("-") could be any non-space character, and only used here for a cleaner to read example.
The regex is matching "key=..." if its containing group is greedy like below.
But this isn't adequate, because the full match will fail if "key=xyz" is missing from the subject string.
/
(\S*)?
(key\=(?<foundkey>[[:alnum:]-]*))
\S*
/x
If that capture group is non-greedy, then the regex just ignores the key and never matches any "key=xyz":
/
(\S*)?
(key\=(?<foundkey>[[:alnum:]-]*))?
\S*
/x
I tried debugging in this regex101 example but couldn't figure it out.
I sorted this out using multiple regexes, but I'm hoping someone can help address my misunderstandings so I can learn how to make this work as a single regex.
Thanks
You may use:
/
^
\S*?
(?:
key=(?<foundkey>\w+)
\S*
)?
$
/xm
RegEx Demo
RegEx Breakdown:
^: Start of line
\S*?: Match 0 or more non-whitespace characters, non-greedy
(?:: Start non-capturing group
key=(?<foundkey>\w+): Match key= text followed by 1+ word characters as capture group foundkey
\S*: Match 0 or more non-whitespace characters
)?: End non-capturing group. ? makes it an optional match
$: End of line
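The difference between the two approaches can be sketched as follows. A ? placed after a group makes the group optional rather than lazy, so in the question's second pattern the greedy (\S*) consumes the whole string and the key group is then allowed to match nothing. The helper findKey and the constant names are hypothetical, and both patterns are written on one line so the x flag is not needed:

```php
<?php
// Pattern from the answer: lazy prefix, optional non-capturing group, anchored.
const OPTIONAL_KEY_PATTERN = '/^\S*?(?:key=(?<foundkey>\w+)\S*)?$/m';

// Second pattern from the question: greedy prefix followed by an optional key group.
const GREEDY_PATTERN = '/(\S*)?(key\=(?<foundkey>[[:alnum:]-]*))?\S*/';

// Hypothetical helper: returns the captured key value, or null when none was captured.
function findKey(string $pattern, string $subject): ?string
{
    if (preg_match($pattern, $subject, $m) !== 1) {
        return null;
    }
    // A group that did not participate is either absent from $m or an empty string.
    return ($m['foundkey'] ?? '') !== '' ? $m['foundkey'] : null;
}

var_dump(findKey(OPTIONAL_KEY_PATTERN, '-----key=xyz-----')); // key is captured
var_dump(findKey(OPTIONAL_KEY_PATTERN, '----------'));        // matches, but no key
var_dump(findKey(GREEDY_PATTERN, '-----key=xyz-----'));       // key is skipped
```

With the lazy prefix and the end anchor, the engine is forced to try the key group at every position before it may give up on it, which is what makes the single-regex version work.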

How to capture all phrases which don't have a pattern in the middle of themselves?

I want to capture all strings that don't have the pattern _ a[a-z]* _ in the specified position in the example below:
<?php
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456"
);
foreach ($myStrings as $myStr) {
    // var_dump() prints its argument itself, so echo is not needed
    var_dump(
        preg_match("/123-(?!a[a-z]*)-456/i", $myStr)
    );
}
?>
You can check the following solution at this Regex101 share link.
^(123-(?:(?![aA][a-zA-Z]*).*)-456)|(123-456)$
It uses a non-capturing group (?:) and a negative lookahead (?!) to find all inner sections that do not start with 'a' (or 'A') followed by any letters. The case with no inner section (123-456) is added (with the | sign) as a second alternative.
A lookahead is a zero-length assertion. The middle part also needs to be consumed to meet 456. For consuming use e.g. \w+- for one or more word characters and hyphen inside an optional group that starts with your lookahead condition. See this regex101 demo (i flag for caseless matching).
Further for searching an array preg_grep can be used (see php demo at tio.run).
preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);
There is also an invert option: PREG_GREP_INVERT. If you don't need to check for start and end a more simple pattern like -a[a-z]*- without lookahead could be used (another php demo).
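A short sketch of both preg_grep variants mentioned above, run against the array from the question. The variable names $kept and $keptInverted are only illustrative:

```php
<?php
$myStrings = [
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456",
];

// Keep entries whose optional middle part does not start with "a" (case-insensitive).
$kept = preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);

// Alternative: match the unwanted middle part and invert the result.
$keptInverted = preg_grep('~-a[a-z]*-~i', $myStrings, PREG_GREP_INVERT);

// preg_grep preserves the original array keys, so reindex before printing.
print_r(array_values($kept));
print_r(array_values($keptInverted));
```

Both calls should keep 123-456, 123-7-456 and 123-0-456. The inverted variant does not check the start and end of the string, so the two are only equivalent for input shaped like this example.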
Match the pattern and invert the result:
!preg_match('/a[a-z]*/i', $yourStr);
Don't try to do everything with a regex when programming languages exist to do the job.
You are not getting a match because in the pattern 123-(?!a[a-z]*)-456 the lookahead assertion (?!a[a-z]*) is always true: right after matching the first -, the pattern has to match another hyphen, so it can effectively only match 123--456.
If you move the last hyphen inside the lookahead like 123-(?!a[a-z]*-)456 you only get 1 match for 123-456 because you are actually not matching the middle part of the string.
Another option with php can be to consume the part that you don't want, and then use SKIP FAIL
^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$
Explanation
^ Start of string
123- Match literally
(?: Non capture group for the alternation
a[a-z]*-(*SKIP)(*F) Match a, then optional chars a-z, then match - and skip the match
| Or
\w+- Match 1+ word chars followed by -
)? Close the non capture group and make it optional to also match when there is no middle part
456 Match literally
$ End of string
Regex demo
Example
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456",
    "123-b-456"
);
foreach ($myStrings as $myStr) {
    if (preg_match("/^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$/i", $myStr, $match)) {
        echo "Match for $match[0]" . PHP_EOL;
    } else {
        echo "No match for $myStr" . PHP_EOL;
    }
}
Output
Match for 123-456
Match for 123-7-456
No match for 123-Apple-456
Match for 123-0-456
No match for 123-Alphabet-456
Match for 123-b-456

Regex curly braces and quotes get inner text

In the following string {lang('stmt')} I want to get just the stmt where it may also be as follows {lang("stmt")}.
I'm bad with regex, I've tried {lang(.*?)} which gives me ('stmt').
You might match {lang(" or {lang(' and capture the ' or " using a capturing group. This group can be used with a backreference to match the same character.
Use \K to forget what was previously matched.
Then match 0+ characters non greedy .*? and use a positive lookahead using the backreference \1 to assert what follows is ')} or ")}
\{lang\((['"])\K.*?(?=\1\)})
Regex demo
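As a rough sketch, the \K pattern could be used from PHP like this. Because \K resets the start of the reported match, the wanted text ends up in the full match ($m[0]) rather than in a capture group. The helper name extractLangKey is hypothetical:

```php
<?php
// Hypothetical helper: returns the text inside {lang('...')} or {lang("...")},
// or null when the subject contains no such tag.
function extractLangKey(string $subject): ?string
{
    // ([\'"]) captures the opening quote; \1 in the lookahead requires the same quote to close.
    $pattern = '/\{lang\(([\'"])\K.*?(?=\1\)})/';

    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(extractLangKey("{lang('stmt')}"));
var_dump(extractLangKey('{lang("stmt")}'));
var_dump(extractLangKey('{lang(\'stmt")}')); // mismatched quotes
```

The first two calls should yield stmt, and the mismatched-quote case should yield null because the backreference never finds a matching closing quote.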
Match either ' or " with a character set, then lazy-repeat any character until the first capture group can be matched again:
lang\((['"])(.*?)\1
https://regex101.com/r/MBKhX3/1
In PHP code:
$str = "{lang('stmt')}";
preg_match('/lang\(([\'"])(.*?)\1/', $str, $matches);
print(json_encode($matches));
Result:
["lang('stmt'","'","stmt"]
(the string you want will be in the second capture group)
Try this one too.
lang\([('")][a-z]*['")]\)
Keep ( and ) outside the (.*?) to get the value without ( and ).
regex:
{lang\(['"](.*?)['"]\)}
php: '/{lang\([\'"](.*?)[\'"]\)}/'

Regex match exact words

I want my regex to match ?ver and ?v, but not ?version
This is what I have so far: $parts = preg_split( "(\b\?ver\b|\b\?v\b)", $src );
I think the trouble might be how I escape the ?.
Your pattern tries to match a ? that is preceded with a word char, and since there is none, you do not have a match.
Use the following pattern:
'/\?v(?:er)?\b/'
See the regex demo
Pattern details:
\? - a literal ? char
v(?:er)? - v or ver
\b - a word boundary (i.e. there must be a non-word char (not a digit, letter or _) or end of string after v or ver).
Note you do not need the first (initial) word boundary as it is already there, between a ? (a non-word char) and v (a word char). You would need a word boundary there if the ? were optional.
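A small sketch of this pattern in the preg_split call from the question. The sample strings and the helper name splitOnVersionParam are assumptions made for illustration:

```php
<?php
// Hypothetical helper wrapping the preg_split call from the question.
function splitOnVersionParam(string $src): array
{
    // \?v(?:er)?\b matches "?v" or "?ver" only when a non-word character (or the end) follows.
    return preg_split('/\?v(?:er)?\b/', $src);
}

print_r(splitOnVersionParam('style.css?ver=1.2'));
print_r(splitOnVersionParam('style.css?v=3'));
print_r(splitOnVersionParam('style.css?version=2'));
```

The first two inputs should be split into two parts, while the ?version input should come back as a single unsplit element.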
Try the following regex pattern;
(\?v(?:\b|(?:er(?!sion))))
Demo
This will allow ?ver and ?v, but will use a negative look-ahead to prevent matching if ?ver is followed by sion, as in your case ?version.
Building on the answers above: to match a word that is not part of another word, you can use
\b(WORD_HERE)\b, which in your case becomes \?(ver)\b (no \b before \?, since ? is not a word character).
This will match ?ver and reject ?version.

Regex group include if condition

I tried the regex /^(\S+)(?:\?$|$)/
with yolo and yolo?
It matches both, but for the second string (yolo?) the ? is included in the capturing group (\S+).
Is this a regex bug, or did I make a mistake?
Edit: I don't want the ? to be included in the capturing group. Sorry for my bad English.
You can use one of the following.
If what you want to capture can't have a ? in it, use a negated character class [^...] (see demo here):
^([^\s?]+)\??$
If what you want to capture can have ? in it (for example, given yolo?yolo? you want yolo?yolo), you need to make your quantifier + lazy by adding ? (see demo here):
^(\S+?)\??$
There is BTW no need for a capturing group here, you can use a look ahead (?=...) instead and look at the whole match (see demo here):
^[^\s?]+(?=\??$)
What was happening
The rules are: quantifiers (like +) are greedy by default, and the regex engine will return the first match it finds.
Consider what this means here:
\S+ will first match everything in yolo?, then the engine will try to match (?:\?$|$).
\?$ fails (we're already at the end of the string, so there's no ? left to match), but $ matches.
The regex has successfully reached its end, so the engine returns the match in which \S+ has matched the whole string and everything is in the first capturing group.
To match what you want you have to make the quantifier lazy (+?), or prevent the character class (yeah, \S is a character class) from matching your ending delimiter ? (with [^\s?] for example).
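The behavior described above can be sketched by comparing the original greedy pattern with the lazy and negated-class fixes. The helper firstGroup and the variable names are hypothetical:

```php
<?php
// Hypothetical helper: returns capture group 1, or null when the pattern does not match.
function firstGroup(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

$greedy  = '/^(\S+)(?:\?$|$)/'; // pattern from the question
$lazy    = '/^(\S+?)\??$/';     // lazy quantifier
$negated = '/^([^\s?]+)\??$/';  // negated character class

foreach (['yolo', 'yolo?', 'yolo?yolo?'] as $subject) {
    echo $subject, PHP_EOL;
    echo '  greedy:  ', var_export(firstGroup($greedy, $subject), true), PHP_EOL;
    echo '  lazy:    ', var_export(firstGroup($lazy, $subject), true), PHP_EOL;
    echo '  negated: ', var_export(firstGroup($negated, $subject), true), PHP_EOL;
}
```

For yolo? the greedy pattern keeps the trailing ? in the group while the other two drop it. For yolo?yolo? the lazy pattern captures yolo?yolo, and the negated class does not match at all, since it forbids any ? inside the captured part.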
This is the correct behavior: \S+ greedily matches one or more non-whitespace characters, and ? is one of them.
Thus the question mark is matched in the (\S+) group and the non-capturing group resolves to $. You could make it work as you expect by making the match non-greedy:
/^(\S+?)(?:\?$|$)/
demo
alternatively you could restrict the character group:
/^([^\s?]+)(?:\?$|$)/
demo
Make the + non greedy:
^(\S+?)\??$
The regex below captures all the non-space characters, followed by an optional ?:
^([\S]+)\??$
DEMO
OR
^([\w]+)\??$
DEMO
If you use \S+, it also matches the ? character. To separate word and non-word characters, you could use the \w regex above. It captures only the word characters and then matches the optional ? that follows them.
It is doing that because \S matches any non-white space character and it is being greedy.
Following the + quantifier with ? for a non-greedy match will prevent this.
^(\S+?)\??$
Or use \w here which matches any word character.
^(\w+)\??$
