Using Regex to detect if string exists - php

I need to use PHP's preg_match() and Regex to detect the following conditions:
If a URL path is one of the following:
products/new items
new items/products
new items/products/brand name
Do something...
I can't seem to figure out how to check if a string exists before or after the word products. The closest I can get is:
if (preg_match('~([a-zA-Z0-9_ ]+)/products/([a-zA-Z0-9_ ]+)~', $url_path)) {
// Do something
}
Would anyone know a way to check if the first part of the string exists within the one regex line?

You could use an alternation with an optional group for the last item making the / part of the optional group.
If you are only looking for a match, you can omit the capturing groups.
(?:[a-zA-Z0-9_ ]+/products(?:/[a-zA-Z0-9_ ]+)?|products/[a-zA-Z0-9_ ]+)
Explanation
(?: Non capturing group
[a-zA-Z0-9_ ]+/products Match 1+ times what is listed in the character class, / followed by products
(?:/[a-zA-Z0-9_ ]+)? Optionally match / and what is listed in the character class
| Or
products/[a-zA-Z0-9_ ]+ Match products/, match 1+ times what is listed
) Close group
Note that [a-zA-Z0-9_ ]+ might be shortened to [\w ]+
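As a rough sketch of how the alternation above could be used from PHP: the helper name isProductPath, the ~ delimiter, and the ^ and $ anchors are assumptions added for illustration (they are not part of the answer), and the path is assumed to have no leading slash.

```php
<?php
// Hypothetical helper (name is an assumption): true when "products" has a
// segment before it, after it, or before it plus a trailing brand segment.
function isProductPath(string $urlPath): bool
{
    // "~" as delimiter avoids escaping "/". The anchors are an added
    // assumption so that the whole path has to fit the pattern.
    $pattern = '~^(?:[\w ]+/products(?:/[\w ]+)?|products/[\w ]+)$~';
    return preg_match($pattern, $urlPath) === 1;
}

$paths = [
    'products/new items',
    'new items/products',
    'new items/products/brand name',
    'products', // nothing before or after, so this should not match
];

foreach ($paths as $path) {
    echo $path, ' => ', isProductPath($path) ? 'match' : 'no match', PHP_EOL;
}
```

Dropping the anchors turns this back into the "contains" check from the answer.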

You can use alternation
([\w ]+)\/products|products\/([\w ]+)
Note: I am not sure how you're using the matched values. If you don't need a backreference to any specific value, you can omit the capturing groups, i.e.
[\w ]+\/products|products\/[\w ]+

Related

How to capture all phrases which don't have a pattern in the middle of themselves?

I want to capture all strings that don't have the pattern _ a[a-z]* _ in the specified position in the example below:
<?php
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456"
);
foreach($myStrings as $myStr){
var_dump(
preg_match("/123-(?!a[a-z]*)-456/i", $myStr)
);
}
?>
You can check the following solution at this Regex101 share link.
^(123-(?:(?![aA][a-zA-Z]*).*)-456)|(123-456)$
It uses a non-capturing group (?:) and a negative lookahead (?!) to find all inner sections that do not start with 'a' (or 'A') followed by any letters. The case with no inner section (123-456) is added after the | sign as a second alternative, since the first branch requires an inner section.
A lookahead is a zero-length assertion. The middle part also needs to be consumed to meet 456. For consuming use e.g. \w+- for one or more word characters and hyphen inside an optional group that starts with your lookahead condition. See this regex101 demo (i flag for caseless matching).
Further for searching an array preg_grep can be used (see php demo at tio.run).
preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);
There is also an invert option: PREG_GREP_INVERT. If you don't need to check for start and end, a simpler pattern like -a[a-z]*- without a lookahead could be used (another php demo).
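A small sketch of the inverted variant mentioned above, assuming the sample array from the question; the variable name $kept is only illustrative.

```php
<?php
$myStrings = [
    '123-456',
    '123-7-456',
    '123-Apple-456',
    '123-0-456',
    '123-Alphabet-456',
];

// PREG_GREP_INVERT returns the elements that do NOT match the pattern,
// here everything without a middle part starting with "a" (caseless).
$kept = preg_grep('~-a[a-z]*-~i', $myStrings, PREG_GREP_INVERT);

// preg_grep preserves the original keys, so reindex before printing.
print_r(array_values($kept));
```

Note that this variant does not check the 123 and 456 parts at all; it only filters on the middle section.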
Match the pattern and invert the result:
!preg_match('/a[a-z]*/i', $yourStr);
Don't try to do everything with a regex when programming languages exist to do the job.
You are not getting a match because in the pattern 123-(?!a[a-z]*)-456 the lookahead assertion (?!a[a-z]*) is always true: directly after the first -, the pattern has to match another hyphen, so it effectively only matches 123--456.
If you move the last hyphen inside the lookahead like 123-(?!a[a-z]*-)456 you only get 1 match for 123-456 because you are actually not matching the middle part of the string.
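The two observations above can be checked with a quick sketch; the variable names $original and $moved are assumptions for illustration.

```php
<?php
// Lookahead followed directly by "-": only a double hyphen can match.
$original = '/123-(?!a[a-z]*)-456/i';
// Hyphen moved inside the lookahead: the middle part is still not consumed.
$moved = '/123-(?!a[a-z]*-)456/i';

$results = [
    'original on 123--456' => preg_match($original, '123--456'),
    'original on 123-7-456' => preg_match($original, '123-7-456'),
    'moved on 123-456' => preg_match($moved, '123-456'),
    'moved on 123-7-456' => preg_match($moved, '123-7-456'),
];

foreach ($results as $label => $matched) {
    echo $label, ': ', $matched, PHP_EOL;
}
```

In both patterns nothing consumes the 7- part, which is why 123-7-456 fails either way.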
Another option with PHP is to consume the part that you don't want, and then use (*SKIP)(*F) to discard it
^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$
Explanation
^ Start of string
123- Match literally
(?: Non capture group for the alternation
a[a-z]*-(*SKIP)(*F) Match a, then optional chars a-z, then match - and skip the match
| Or
\w+- Match 1+ word chars followed by -
)? Close the non capture group and make it optional to also match when there is no middle part
456 Match literally
$ End of string
Example
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456",
"123-b-456"
);
foreach($myStrings as $myStr) {
if (preg_match("/^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$/i", $myStr, $match)) {
echo "Match for $match[0]" . PHP_EOL;
} else {
echo "No match for $myStr" . PHP_EOL;
}
}
Output
Match for 123-456
Match for 123-7-456
No match for 123-Apple-456
Match for 123-0-456
No match for 123-Alphabet-456
Match for 123-b-456

Regex: skip match if it is followed by whitespace and a keyword

Currently trying to match comments with regexes but only if no function follows.
Currently I use a regex which also matches the keyword function.
And then check in the source code (php) if this group is set or not.
/\/\*\*.*?\*\/\s*(function)?/sg
https://regex101.com/r/l0j1ip/1
Now the question is whether it is possible to realize with pure regex.
I have tried it with a simple negative lookahead but without success.
The comment is then no longer matched on its own, but together with the subsequent comment.
/\/\*\*.*?\*\/\s*(?!function)/sg
https://regex101.com/r/PuUUw6/1
Next I tried non capture group. But also there without success.
/(?:\/\*\*.*?\*\/\s*function)|\/\*\*.*?\*\/\s*/sg
https://regex101.com/r/wkQE7E/1
After a comment suggested (*SKIP)(*FAIL), I tried that as well, without success.
All matches above this keyword are skipped. Also the single matches are skipped.
/\/\*\*.*?\*\/\s*function(*SKIP)(*FAIL)|\/\*\*.*?\*\//sg
https://regex101.com/r/OJSFrF/1
After reading the question again, it should be doable using a negative lookahead; the repetition must be inside the negative expression:
/\/\*\*((?!\*\/).)*\*\/(?!\s*function)/sg
It helps to understand how backtracking works here. Using .*? instead of .* means the regex engine first tries to match what follows (\*\/) before consuming more characters. However, when the negative lookahead then makes the match fail, the engine backtracks and .*? keeps expanding past the first */ until it reaches a later */ that is not followed by function. ((?!\*\/).)* can never match across \*\/, whereas .*? can, after backtracking.
Another solution is to use atomic group (?>\/\*\*.*?\*\/)(?!\s*function).
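A minimal sketch comparing the two approaches above on a made-up source string. The sample text, the ~ delimiter, the non-capturing form of the tempered group, and the variable names are all assumptions for illustration; the g flag from regex101 is replaced by preg_match_all.

```php
<?php
// Hypothetical sample input: two free-standing doc comments and one
// that documents a function.
$source = <<<'SRC'
/** Describes the file. */

/** Describes foo. */
function foo() {}

/** Trailing note. */
SRC;

// Tempered dot: (?:(?!\*/).)* can never run across a closing "*/".
$tempered = '~/\*\*(?:(?!\*/).)*\*/(?!\s*function)~s';
// Atomic group: once the comment is matched, the engine cannot backtrack
// into it to extend .*? to a later "*/".
$atomic = '~(?>/\*\*.*?\*/)(?!\s*function)~s';

preg_match_all($tempered, $source, $temperedMatches);
preg_match_all($atomic, $source, $atomicMatches);

print_r($temperedMatches[0]);
print_r($atomicMatches[0]);
```

Both patterns should return only the two comments that are not followed by function.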
Another option without the /s flag could be
/\*\*(?:[^*]*+|\*(?!/)[^*]*+)*\*/(?!\s*function)
The pattern matches:
/\*\* Match /**
(?: Non capture group
[^*]*+ Match any char except * using a possessive quantifier
| Or
\*(?!/) Match * not followed by /
[^*]*+ Match any char except * using a possessive quantifier
)* Close non capture group and optionally repeat
\*/ Match */
(?!\s*function) Negative lookahead, assert that what is to the right is not optional whitespace chars followed by function
Note that you don't have to escape the forward slash when using a different delimiter.
$regex = '~/\*\*(?:[^*]*+|\*(?!/)[^*]*+)*\*/(?!\s*function)~';

How to check the URL's structure using PHP preg_match?

All my site's URLs have the following structure:
https://www.example.com/section/item
where section is a word and item is a number.
So, possible URLs are:
https://www.example.com
https://www.example.com/section
https://www.example.com/section/item
By .htaccess, all requests go to index.php (route).
I want to show a 404 error message if user types:
https://www.example.com/section/item/somethingelse
In order to check the URL's structure, how can I change the pattern properly in the following function?
function isValidURL($url) {
return preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}
Thanks.
If section is a word (and cannot contain digits) and item is a number, you could match word characters except digits using [^\W\d]+ and use \d+ to match 1+ digits.
As in the example data there are optional parts, you could replace (/.*)?$ with (?:/[^\W\d]+(?:/\d+)?)?$.
Explanation
(?: Non capturing group
/[^\W\d]+ For section, match 1+ times a word char except a digit
(?:/\d+)? For item, optionally match / and 1+ digits
)? Close non capturing group and make it optional
If section can be a word which can also consist of only digits, you could also use \w+
The pattern might look like
^https?://[a-z0-9-]+(?:\.[a-z0-9-]+)*(?::[0-9]+)?(?:/[^\W\d]+(?:/\d+)?)?$
Note that the dot is escaped to match it literally.
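Plugged back into the function from the question, the pattern might be used as in the sketch below. The type hints, the ~ delimiter, the strict === 1 comparison, and the sample URLs with /42 as the item are assumptions for illustration.

```php
<?php
function isValidURL(string $url): bool
{
    // Host, optional port, then an optional /section with an optional /item.
    $pattern = '~^https?://[a-z0-9-]+(?:\.[a-z0-9-]+)*(?::[0-9]+)?(?:/[^\W\d]+(?:/\d+)?)?$~i';
    return preg_match($pattern, $url) === 1;
}

$urls = [
    'https://www.example.com',
    'https://www.example.com/section',
    'https://www.example.com/section/42',
    'https://www.example.com/section/42/somethingelse',
];

foreach ($urls as $url) {
    echo $url, ' => ', isValidURL($url) ? 'valid' : '404', PHP_EOL;
}
```

Only the last URL should be rejected, which is where the router could send the 404 response.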

PHP preg_replace datetime and matches

I want to style MySQL datetime format string to be something like the following:
<b>2018-07-09</b><i>10:25:00</i>
I'm trying to use preg_replace() to replace matched patterns like the following:
preg_replace("/([^\s]+)/", '<b>$1</b><i>$2</i>', $date)
However, the pattern (https://regex101.com/r/LLRx3x/1) indicates that there are two matches, each with one group. The first match is the date, while the second is the time. I was not able to use $1 and $2 to access and replace each match. The above preg_replace code returns something like:
<b>2018-07-09</b><i></i> <b>13:25:18</b><i></i>
So how could I access each match and replace them to reach my goal?
You may capture the date and time into two separate groups, then you can use $1 and $2:
preg_replace('~^(\S+)\s+(\S+)$~', '<b>$1</b><i>$2</i>', $date)
See the regex demo and a PHP demo.
Note that the $n replacement backreferences only refer to the corresponding capturing group values, so if you define only one capturing group in your pattern, only one capture is accessible with $1, and $2 will hold an empty string; i.e. preg_replace('~^(\S+)\s+\S+$~', '<b>$1</b><i>$2</i>', '2018-07-09 13:25:18') will yield <b>2018-07-09</b><i></i>.
So, the point here is to match both the date and time but capture them into separate capturing groups, and then use the corresponding backreferences accordingly.
Pattern details
^ - start of the string
(\S+) - Capturing group 1 (later referred to with $1 backreference): any 1+ non-whitespace chars
\s+ - 1+ whitespaces
(\S+) - Capturing group 2 (later referred to with $2 backreference): any 1+ non-whitespace chars
$ - end of string.
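A short sketch contrasting the two-group and one-group patterns described above; the sample value and the variable names $styled and $oneGroup are assumptions for illustration.

```php
<?php
$date = '2018-07-09 13:25:18';

// Two capturing groups: $1 is the date, $2 is the time.
$styled = preg_replace('~^(\S+)\s+(\S+)$~', '<b>$1</b><i>$2</i>', $date);

// Only one capturing group: $2 refers to nothing and becomes empty.
$oneGroup = preg_replace('~^(\S+)\s+\S+$~', '<b>$1</b><i>$2</i>', $date);

echo $styled, PHP_EOL;   // <b>2018-07-09</b><i>13:25:18</i>
echo $oneGroup, PHP_EOL; // <b>2018-07-09</b><i></i>
```

The replacement strings use single quotes so that PHP does not try to interpolate $1 and $2 as variables.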

Exclude a certain match in a capturing group in regex

I have a regex capturing group and I want to exclude a number if it matches a certain pattern also.
This is my capturing group:
https://regex101.com/r/zL1tL8/1
If \n is followed by a number and characters like "1st", "2nd", "4dffgsd", "3sf", then it should stop the match BEFORE the number.
0-9 is important in the capturing group.
So far I have this pattern [0-9][a-zA-Z]+ to match a number followed by characters. How do I apply this to the capturing group as a condition?
Update:
https://regex101.com/r/zL1tL8/4
Line 1 is wrong.
It should not match a number followed by characters
You'll want to use a negative lookahead to "stop" the match if something after matches your pattern. So, something like this might work:
(\\n(?![0-9][a-zA-Z]))
See it in use here: https://regex101.com/r/zL1tL8/2
Here's a page with some more info on lookahead and lookbehind: http://www.rexegg.com/regex-lookarounds.html
