Regex Nested Shortcode Does Not Work - php

Can someone tell me why this does not work? https://regex101.com/r/hJ5zN6/11
Test string:
[test][dzspgb_container][dzspgb_row][dzspgb_row_part part="1.4"][dzspgb_element text="whwaha" type_element="text"][/dzspgb_element][dzspgb_element text="test" type_element="text"][/dzspgb_element][/dzspgb_row_part][dzspgb_row_part part="1.4"][/dzspgb_row_part][dzspgb_row_part part="1.4"][/dzspgb_row_part][dzspgb_row_part part="1.4"][/dzspgb_row_part][/dzspgb_row][dzspgb_container]test second[/dzspgb_container][/dzspgb_container][/thisbreaks]
Test regex:
\[dzspgb_container(.*?)](.*?)\[\/dzspgb_container\](?!\s*\[\/)
If we remove [/thisbreaks] from the string, it will work.

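It's because of the negative lookahead assertion at the end. In the test string, every [/dzspgb_container] is immediately followed by another closing tag, so the assertion always fails. I suggest removing the lookahead and using a greedy pattern like the one below.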
\[dzspgb_container(.*?)](.*)\[\/dzspgb_container\]
(?!\s*\[\/) asserts that the match is not followed by zero or more whitespace characters and then [/ (the start of a closing tag).
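To make the difference concrete, here is a minimal PHP sketch comparing the two patterns. The shortened input string and the variable names are assumptions made for illustration; they are not part of the original question.

```php
<?php
// Shortened stand-in for the question's test string (an assumption for brevity).
$input = '[test][dzspgb_container][dzspgb_row]row[/dzspgb_row]'
       . '[dzspgb_container]test second[/dzspgb_container]'
       . '[/dzspgb_container][/thisbreaks]';

// Original pattern: each closing tag is followed by "[/", so the lookahead never passes.
$withLookahead = '/\[dzspgb_container(.*?)](.*?)\[\/dzspgb_container\](?!\s*\[\/)/';

// Suggested pattern: the greedy (.*) runs up to the LAST closing tag.
$greedy = '/\[dzspgb_container(.*?)](.*)\[\/dzspgb_container\]/';

$lookaheadResult = preg_match($withLookahead, $input);
$greedyResult    = preg_match($greedy, $input, $m);

var_dump($lookaheadResult); // int(0)
var_dump($greedyResult);    // int(1)
echo $m[2], "\n";           // inner content, including the nested container
```

Keep in mind that the greedy version spans from the first opening tag to the last closing tag, so it only suits input with a single outermost container.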

Related

PHP regex - match everything but not exactly one or more word

I'm trying to match any string that is not exactly one of a few given words.
My pattern:
(?!(^ignoreme$)|(^ignoreme2$))
I am looking for:
ignoreme - no
ignoreme2 - no
ignoremex - match
ignorem - match
gnoreme - match
ignoreme22 - match
But it returns many empty matches. How can I do this? Thanks.
https://regex101.com/r/u4EsNv/1
You may use this corrected regex:
^(?!ignoreme2?$).*$
RegEx Details:
^: Start
(?!ignoreme2?$): Negative lookahead that fails the match when ignoreme or ignoreme2 is all that remains up to the end.
.*: Match 0 or more of any character
$: End
Note that the regex (?!(^ignoreme$)|(^ignoreme2$)) still matches the first 2 invalid cases because the ^ anchors are inside the negative lookahead rather than outside it. The lookahead can only fail at the very start of the string, so the engine simply produces empty matches starting after the 1st character, where the assertion is satisfied. (You can see that in the highlighted matches on regex101.)
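As a quick check, the corrected pattern can be run against the six sample words from the question. This is a small sketch; the helper name isAllowedWord is hypothetical.

```php
<?php
// Hypothetical helper: true when $word is anything other than exactly
// "ignoreme" or "ignoreme2".
function isAllowedWord(string $word): bool
{
    // ^ sits outside the lookahead, so the assertion is tested only at the start.
    return preg_match('/^(?!ignoreme2?$).*$/', $word) === 1;
}

$words   = ['ignoreme', 'ignoreme2', 'ignoremex', 'ignorem', 'gnoreme', 'ignoreme22'];
$allowed = array_values(array_filter($words, 'isAllowedWord'));

print_r($allowed); // ignoremex, ignorem, gnoreme, ignoreme22
```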

PHP preg_match exclude

OK, this regex matches strings like 2aa, a2, 2aaaaaa, aaaa2, aaa2aaaa, 2222a2222-2222-aaaa... in short, a mix of alphanumeric characters in a sequence:
preg_match("/(?:\d+[a-z]|[a-z]+\d)[a-z\d]*/i")
Now I want to exclude something, but I'm stuck. Something like this doesn't work:
preg_match("/(?!1920x1200|1920x1080)(?:\d+[a-z]|[a-z]+\d)[a-z\d]*/i")
For example, the string aaaaa222aaa1920x1200bbbbb1234556789 is still matched, but it shouldn't be because it contains 1920x1200.
Any help is appreciated :)
I'm using the regex found here for matching alphanumeric sequences: Regex: match only letters WITH numbers
regex test: https://regex101.com/r/vU9aU9/1
Your negative lookahead should have .* in front of it to allow for 0 or more characters before the disallowed text. Also use anchors in your regex.
The regex should be:
preg_match('/^.*?1920x1200.*$(*SKIP)(*F)|(?:\d+[a-z]|[a-z]+\d)[a-z\d]*/im')
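The pattern above works by letting its first alternative match any whole line that contains 1920x1200 and then discarding it with (*SKIP)(*F), so only lines without that text can reach the second alternative. A minimal sketch of that behaviour follows; the sample inputs and variable names are assumptions.

```php
<?php
// The answer's pattern, unchanged.
$pattern = '/^.*?1920x1200.*$(*SKIP)(*F)|(?:\d+[a-z]|[a-z]+\d)[a-z\d]*/im';

// A line containing the forbidden text is skipped as a whole.
$rejected = preg_match($pattern, 'aaaaa222aaa1920x1200bbbbb1234556789');

// An ordinary alphanumeric sequence still matches.
$accepted = preg_match($pattern, 'aaa2aaaa', $m);

// With the m flag each line is judged on its own (illustrative input).
preg_match_all($pattern, "a2\naaa1920x1200\n2b", $all);

var_dump($rejected); // int(0)
var_dump($accepted); // int(1)
print_r($all[0]);    // a2, 2b
```

The question also mentions 1920x1080. The pattern as given does not exclude it; writing 1920x(?:1200|1080) in the first alternative would cover both.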

Regex capturing words that have at least one lowercase letter

I'm trying to capture words in a string like:
1vTvFpU
KOoy6Cc
With regex pattern:
\b(?=(?:.*?[a-z]){1,})[A-Za-z0-9\/\-_.]{7,7}\b
But I have a problem because it also matches words like:
FDSFDFI
WEWEFDP
RRRRRRR
In a string:
FDSFDFI sdfdfdf
WEWEFDP traliii
RRRRRRR sdfdfdf
What am I doing wrong?
I suggest using \S* instead of .* inside the lookahead. With .*? the lookahead checks for at least one lowercase letter anywhere in the rest of the line, not just within the word.
\b(?=(?:\S*?[a-z]))[A-Za-z0-9\/\-_.]{7}\b
{7,7} is equal to {7}
There is no need for a lookahead here; character classes suffice:
[^\Wa-z]*+\w+
Then check the string length in PHP (for example with array_filter).
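The following sketch compares the original pattern, the \S*? fix, and the character-class approach on one line of text. The sample line and variable names are assumptions chosen so that the all-uppercase words are followed by a word containing lowercase letters.

```php
<?php
$text = '1vTvFpU FDSFDFI KOoy6Cc RRRRRRR';

// Original pattern: .*? lets the lookahead find a lowercase letter in a LATER word.
$original = '/\b(?=(?:.*?[a-z]){1,})[A-Za-z0-9\/\-_.]{7,7}\b/';
// Fixed pattern: \S*? cannot cross whitespace, so the check stays inside the word.
$fixed = '/\b(?=(?:\S*?[a-z]))[A-Za-z0-9\/\-_.]{7}\b/';

preg_match_all($original, $text, $before);
preg_match_all($fixed, $text, $after);

// Character-class approach: the possessive [^\Wa-z]*+ consumes every
// non-lowercase word character, so \w+ can only start on a lowercase letter.
preg_match_all('/[^\Wa-z]*+\w+/', $text, $candidates);
$sevenLong = array_values(array_filter(
    $candidates[0],
    function (string $word): bool {
        return strlen($word) === 7;
    }
));

print_r($before[0]); // 1vTvFpU, FDSFDFI, KOoy6Cc
print_r($after[0]);  // 1vTvFpU, KOoy6Cc
print_r($sevenLong); // 1vTvFpU, KOoy6Cc
```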

Php lookahead assertion at the end of the regex

I want to write a regex with assertions to extract the number 55 from the string unknownstring/55.1. Here is my regex:
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
So, basically, I am trying to say: give me the number that comes after a slash and is followed by a dot and the number 1, after which there are no more characters. But the regex does not match. When I removed the $ sign from the end, it matched. That condition is essential, though, as I need this to be at the end of the string, because the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I could use preg_match_all and take the last match, but I want to understand why the first regex does not work and where my mistake is.
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
You cannot use $ outside the positive lookahead because a lookahead does not consume characters: after it succeeds, the current position is still right after the number, which is NOT at the end of input since \.1 follows it.
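A short sketch showing both placements of $ side by side, using the trickier input mentioned in the question (the variable names are assumptions):

```php
<?php
$str = 'unknow/545.1nstring/55.1';

// $ outside the lookahead: it is tested right after the digits, where ".1" still follows.
$outside = preg_match('/(?<=\/)\d+(?=\.1)$/', $str);

// $ inside the lookahead: ".1" must be followed by the end of the string.
$inside = preg_match('/(?<=\/)\d+(?=\.1$)/', $str, $match);

var_dump($outside); // int(0)
var_dump($inside);  // int(1)
echo $match[0], "\n"; // 55
```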

PHP regex and adjacent capturing groups

I'm using capturing groups in regular expressions for the first time and I'm wondering what my problem is, as I assume that the regex engine looks through the string left-to-right.
I'm trying to convert an UpperCamelCase string into a hyphened-lowercase-string, so for example:
HelloWorldThisIsATest => hello-world-this-is-a-test
My precondition is an alphabetic string, so I don't need to worry about numbers or other characters. Here is what I tried:
mb_strtolower(preg_replace('/([A-Za-z])([A-Z])/', '$1-$2', "HelloWorldThisIsATest"));
The result:
hello-world-this-is-atest
This is almost what I want, except there should be a hyphen between a and test. I've already included A-Z in my first capturing group so I would assume that the engine sees AT and hyphenates that.
What am I doing wrong?
The Reason your Regex will Not Work: Overlapping Matches
Your regex matches sA in IsATest, allowing you to insert a - between the s and the A.
In order to insert a - between the A and the T, the regex would have to match AT.
This is impossible because the A has already been consumed as part of sA. A regex search does not produce overlapping matches: each new match starts after the end of the previous one.
Is all hope lost? No! This is a perfect situation for lookarounds.
Do it in Two Easy Lines
Here's the easy way to do it with regex:
$regex = '~(?<=[a-zA-Z])(?=[A-Z])~';
echo strtolower(preg_replace($regex,"-","HelloWorldThisIsATest"));
See the output at the bottom of the php demo:
Output: hello-world-this-is-a-test
The regex doesn't match any characters. Rather, it targets positions in the string: the positions just before each upper-case letter that follows another letter. To do so, it uses a lookbehind and a lookahead:
The (?<=[a-zA-Z]) lookbehind asserts that what precedes the current position is a letter.
The (?=[A-Z]) lookahead asserts that what follows the current position is an upper-case letter.
We just replace these positions with a -, and convert the lot to lowercase.
If you look carefully on this regex101 screen, you can see lines between the words, where the regex matches.
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
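The overlapping-match problem and the zero-width fix described above can be compared directly. This is a minimal sketch; the variable names are assumptions, and strtolower is used because the input is plain ASCII.

```php
<?php
$input = 'HelloWorldThisIsATest';

// Consuming version from the question: "sA" uses up the A, so "AT" is never seen.
$consuming = strtolower(preg_replace('/([A-Za-z])([A-Z])/', '$1-$2', $input));

// Zero-width version: only positions are matched, so no letter is used up.
$zeroWidth = strtolower(preg_replace('~(?<=[a-zA-Z])(?=[A-Z])~', '-', $input));

echo $consuming, "\n"; // hello-world-this-is-atest
echo $zeroWidth, "\n"; // hello-world-this-is-a-test
```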
I've separated the two regular expressions for simplicity:
preg_replace(array('/([a-z])([A-Z])/', '/([A-Z]+)([A-Z])/'), '$1-$2', $string);
It processes the string twice to find:
lowercase -> uppercase boundaries
one or more uppercase letters followed by another uppercase letter
This will have the following behaviour:
ThisIsHTMLTest -> This-Is-HTML-Test
ThisIsATest -> This-Is-A-Test
Alternatively, use a look-ahead assertion (because the look-ahead does not consume the capital letter, that letter can be reused at the start of the next match):
preg_replace('/([A-Z]+|[a-z]+)(?=[A-Z])/', '$1-', $string);
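Both variants from this answer can be wrapped in small functions and checked against the stated examples. The function names below are hypothetical.

```php
<?php
// Two passes: lowercase->uppercase boundaries, then runs of capitals.
function hyphenateTwoPass(string $s): string
{
    return preg_replace(['/([a-z])([A-Z])/', '/([A-Z]+)([A-Z])/'], '$1-$2', $s);
}

// Single pass: the look-ahead leaves the next capital letter unconsumed.
function hyphenateLookahead(string $s): string
{
    return preg_replace('/([A-Z]+|[a-z]+)(?=[A-Z])/', '$1-', $s);
}

echo hyphenateTwoPass('ThisIsHTMLTest'), "\n";   // This-Is-HTML-Test
echo hyphenateTwoPass('ThisIsATest'), "\n";      // This-Is-A-Test
echo hyphenateLookahead('ThisIsHTMLTest'), "\n"; // This-Is-HTML-Test
echo hyphenateLookahead('ParseHTML'), "\n";      // Parse-HTM-L
```

As the last line shows, an abbreviation at the very end of the string is still split by both variants, because its final capital letter is treated as the start of a new word.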
To fix the interesting use case Jack mentioned in your comments (avoid splitting of abbreviations), I went with zx81's route of using lookahead and lookbehinds.
(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
You can split it in two for the explanation:
First part
(?<= look behind to see if there is:
[a-z] any character of: 'a' to 'z'
) end of look-behind
(?= look ahead to see if there is:
[A-Z] any character of: 'A' to 'Z'
) end of look-ahead
(TL;DR: Match between strings of the CamelCase Pattern.)
Second part
(?<= look behind to see if there is:
[A-Z] any character of: 'A' to 'Z'
) end of look-behind
(?= look ahead to see if there is:
[A-Z] any character of: 'A' to 'Z'
[a-z] any character of: 'a' to 'z'
) end of look-ahead
(TL;DR: Special case, match between abbreviation and CamelCase pattern)
So your code would then be:
mb_strtolower(preg_replace('/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', '-', "HelloWorldThisIsATest"));
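The combined pattern can be wrapped in a reusable function and tried on a few inputs, including abbreviations. This is a sketch: the function name camelToHyphen is hypothetical, and strtolower is used on the assumption that the input is plain ASCII, as the question's precondition states.

```php
<?php
// Hypothetical helper built around the two-part boundary pattern above.
function camelToHyphen(string $s): string
{
    // Part 1: between a lowercase and an uppercase letter.
    // Part 2: between an abbreviation and the capital that starts the next word.
    $boundary = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';
    return strtolower(preg_replace($boundary, '-', $s));
}

echo camelToHyphen('HelloWorldThisIsATest'), "\n"; // hello-world-this-is-a-test
echo camelToHyphen('ThisIsHTMLTest'), "\n";        // this-is-html-test
echo camelToHyphen('ParseHTML'), "\n";             // parse-html
```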
