Regex: how to add more conditions for matching in PHP


Hi everyone. I'd like some help writing a regex in PHP.
// This will match everything between "(" and ")"
'#('.preg_quote('(').')(.*)('.preg_quote(')').')#si'
//But I want to add one more condition
//-> Only match when the text right before "(" contains at least 2 consecutive capital letters.
For example,
// (bbbbb) <- No
// aa(bbbbb) <- No
// A(bbbbb) <-No
// AA(bbbbb) <- Yes
// AAA(bbbbb) <- Yes
How can I do it?
Thank you very much!!

You can make regex like this:
$regex = '#[A-Z]{2,}\((.*)\)#';
Note that I have removed the case-insensitive pattern modifier i, as you DO want case sensitivity. I have also removed the s pattern modifier (which lets . match newlines), since my guess is you don't want the match to span multiple lines.
You might consider the ungreedy pattern modifier U if you expect that there might be multiple un-nested sets of parentheses on a line and you want to capture them separately (e.g. AA(bbbbb) CC(ddddd)).
Also notice I got rid of all that preg_quote() nonsense, as it makes your pattern harder to read. That function is best saved for cases where you insert variable string data into the pattern and need to escape it. That is not the case here.
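To check the pattern against the examples from the question, here is a small PHP sketch. The helper name `hasCapsPrefix()` and the second test line `AA(bbbbb) CC(ddddd)` are made up for illustration; the pattern itself is the one from this answer, run once without and once with the `U` modifier.

```php
<?php
// Hypothetical helper: true when the text contains at least two consecutive
// capital letters immediately followed by a parenthesised group.
function hasCapsPrefix(string $s): bool
{
    return preg_match('#[A-Z]{2,}\((.*)\)#', $s) === 1;
}

foreach (['(bbbbb)', 'aa(bbbbb)', 'A(bbbbb)', 'AA(bbbbb)', 'AAA(bbbbb)'] as $sample) {
    echo $sample, ' <- ', hasCapsPrefix($sample) ? 'Yes' : 'No', PHP_EOL;
}

// Without U, (.*) is greedy and runs on to the last ")" on the line.
preg_match_all('#[A-Z]{2,}\((.*)\)#', 'AA(bbbbb) CC(ddddd)', $greedy);
// With U, quantifiers become lazy, so each group is captured separately.
preg_match_all('#[A-Z]{2,}\((.*)\)#U', 'AA(bbbbb) CC(ddddd)', $ungreedy);

print_r($greedy[1]);   // one capture: "bbbbb) CC(ddddd"
print_r($ungreedy[1]); // two captures: "bbbbb" and "ddddd"
```

The pattern is not anchored, so a string like `xAA(bbbbb)` would also be accepted; adding `^` and `$` restricts it to whole strings.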

$re = "/^[A-Z]{2,}\([a-z]{5}\)$/";
I also added ^ and $. You can remove those anchors if you want to find a match within a larger string, but if you want to strictly match the whole string passed to preg_match, you may want to keep them. Note that [a-z]{5} matches exactly five lowercase letters, as in your examples.
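A short sketch of what the anchors change. It uses single quotes so PHP never tries to interpolate anything in the pattern (the double-quoted original also works, since `$/` is not a PHP variable name). The sample strings are made up.

```php
<?php
$anchored   = '/^[A-Z]{2,}\([a-z]{5}\)$/';
$unanchored = '/[A-Z]{2,}\([a-z]{5}\)/';

$results = [
    // The whole string is exactly the expected shape.
    'whole'               => preg_match($anchored, 'AA(bbbbb)'),
    // ^ and $ demand the whole string, so surrounding text causes a miss.
    'embedded'            => preg_match($anchored, 'see AA(bbbbb) here'),
    // Without anchors the same shape is found inside the longer string.
    'embedded_unanchored' => preg_match($unanchored, 'see AA(bbbbb) here'),
    // {5} needs exactly five letters, so four is not enough.
    'four_letters'        => preg_match($anchored, 'AA(bbbb)'),
];
print_r($results); // 1, 0, 1, 0
```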

Related

Matching between comment lines

I'm trying to match some HTML content between a "<!-- FOR n -->" line and a "<!-- END FOR -->" line (n being a number)
using preg_match:
\<\!\-\- FOR (\d+) \-\-\>(.*)\<\!\-\- END FOR \-\-\>
It doesn't work, since they are on different lines.

First, you need to learn that <, !, - and > are not special characters. Escaping them with backslashes makes you look a bit silly.
Then learn about the /x and /s flags. One of them is what you need. The other is me trying to trick you into learning something unrelated.
Then test your regular expression with some HTML content that contains two or more of those FOR/END FORs and see what happens.
Also, you need to look into how to make your capturing conditions "greedy" or "non-greedy". By default, quantifiers are greedy. So a pattern such as "A(.*)B" applied to the string "A1B A2B A3B" finds a single match, capturing "1B A2B A3": everything from the first "A" to the last "B". If you want to find all the values between each pair of A/B, you need to make the match non-greedy: "A(.*?)B".
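Both points can be tried out in PHP. This sketch first reproduces the A/B example, then runs the comment-block pattern (without the unnecessary backslashes, with the `s` flag and a lazy `.*?`) on a made-up sample containing two FOR blocks.

```php
<?php
// Greedy vs. lazy on the example string from the answer.
preg_match_all('/A(.*)B/', 'A1B A2B A3B', $greedy);
preg_match_all('/A(.*?)B/', 'A1B A2B A3B', $lazy);
print_r($greedy[1]); // ["1B A2B A3"]
print_r($lazy[1]);   // ["1", "2", "3"]

// Made-up HTML with two FOR blocks spread over several lines.
$html = "<!-- FOR 1 -->\n<p>one</p>\n<!-- END FOR -->\n"
      . "<!-- FOR 2 -->\n<p>two</p>\n<!-- END FOR -->";

// s lets . match newlines; *? stops at the nearest END FOR.
preg_match_all('/<!-- FOR (\d+) -->(.*?)<!-- END FOR -->/s', $html, $blocks);
print_r($blocks[1]); // ["1", "2"]
print_r($blocks[2]); // ["\n<p>one</p>\n", "\n<p>two</p>\n"]
```

With a greedy `(.*)` instead, the second pattern would return a single match running from the first FOR to the last END FOR.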

PHP regex last occurrence of words

My string is: /var/www/domain.com/public_html/foo/bar/folder/another/..
I want to remove the root folder from this string to get only the public folder, because some servers host multiple websites.
My current regex is: /^(.*?)(www|public_html|public|html)/s
My current result is: /domain.com/public_html/foo/bar/folder/another/..
But I want to remove everything up to the last occurrence, and get something like this: /foo/bar/folder/another/..
Thanks!

You have to use a greedy quantifier and to check if the alternative is enclosed between slashes using lookarounds:
/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/
About the lookarounds: I use negative lookarounds with a negated character class to check for either a slash or the start/end of the string at the same time. This way you are sure that, for instance, html is a whole folder name and not part of another folder name.
I removed the s modifier, which is useless here. I removed the capture groups too, since the goal is to replace the whole match with an empty string.
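A minimal sketch of using this pattern with `preg_replace()` on the path from the question. The second path, with a folder called `myhtmlsite`, is hypothetical and only shows why the lookarounds matter.

```php
<?php
// Greedy .* finds the LAST folder named www, public, public_html or html.
// (?<![^\/]) and (?![^\/]) require a slash or the string boundary on each side.
$pattern = '/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/';

$path = '/var/www/domain.com/public_html/foo/bar/folder/another/..';
echo preg_replace($pattern, '', $path), PHP_EOL; // /foo/bar/folder/another/..

// "html" inside "myhtmlsite" is not a whole folder name, so the cut
// happens after "www" instead.
echo preg_replace($pattern, '', '/var/www/myhtmlsite/foo'), PHP_EOL; // /myhtmlsite/foo
```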

The ? makes your expression non-greedy, which is not actually what you want here. Try:
^(.*)(www|public_html|public|html)
which should keep going until the last match.
Demo: https://regex101.com/r/v5WbB3/1/
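A quick sketch of this greedy version in PHP. It gives the wanted result for the question's path, but the second group actually captures just `html` (from inside `public_html`), and on a hypothetical folder name such as `myhtmlsite` it cuts in the middle of the name. That is the case the lookarounds in the previous answer guard against.

```php
<?php
$greedy = '/^(.*)(www|public_html|public|html)/';
$path   = '/var/www/domain.com/public_html/foo/bar/folder/another/..';

// Works on the path from the question...
echo preg_replace($greedy, '', $path), PHP_EOL; // /foo/bar/folder/another/..

// ...but the second group only captured "html", taken from inside "public_html".
preg_match($greedy, $path, $m);
echo $m[2], PHP_EOL; // html

// On a hypothetical folder name containing "html", the cut lands mid-name.
echo preg_replace($greedy, '', '/var/www/myhtmlsite/foo'), PHP_EOL; // site/foo
```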

match regex php between two string with string in middle

I would like to get a string made of one word with a delimiter word before and after it.
I tried this, but it doesn't work:
$stringData2 = file_get_contents('testtext3.txt');
$regular2=('/(?<=first del)*MAIN WORD(?=last del)*\s');
preg_match_all($regular2,
$stringData2,
$out, PREG_PATTERN_ORDER);
Thank you very much for any help.

No quantifier is needed; add the delimiter at the end and put \s inside the lookahead:
'/(?<=first del)MAIN WORD(?=last del\s)/'
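A sketch of the corrected pattern in use. The variables `$before`, `$word` and `$after` are hypothetical stand-ins for the question's placeholders, `preg_quote()` escapes them in case the real delimiters contain special characters, and an inline string replaces `testtext3.txt` so the example runs on its own.

```php
<?php
// Hypothetical values standing in for the question's placeholders.
$before = 'first del';
$word   = 'MAIN WORD';
$after  = 'last del';

// Lookbehind and lookahead keep the delimiters out of the match itself;
// \s sits inside the lookahead, so the whitespace is required but not captured.
$regular2 = '/(?<=' . preg_quote($before, '/') . ')'
          . preg_quote($word, '/')
          . '(?=' . preg_quote($after, '/') . '\s)/';

$stringData2 = "first delMAIN WORDlast del\nfirst delMAIN WORDlast del.";
preg_match_all($regular2, $stringData2, $out, PREG_PATTERN_ORDER);

// One match: the second "last del" is followed by "." rather than whitespace.
print_r($out[0]);
```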

This regex
(?<=xx)[^\s]*(?=yy)
matches hello in:
xxhelloyy
but fails to match in:
xxhello worldyy
This is probably what you're looking for.
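A small PHP check of that behaviour (`[^\s]` could also be written `\S`):

```php
<?php
$re = '/(?<=xx)[^\s]*(?=yy)/';

// Lookbehind/lookahead keep "xx" and "yy" out of the match.
$ok = preg_match($re, 'xxhelloyy', $m);
echo $ok, ' ', $m[0], PHP_EOL; // 1 hello

// [^\s]* stops at the space, so "yy" is never reached.
$fail = preg_match($re, 'xxhello worldyy');
echo $fail, PHP_EOL; // 0
```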

If you want the delimiter string included in the match, then you should not be using lookahead or lookbehind. It should be something rather basic, like this:
/\s?first del MAIN WORD last del\s?/
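If you want to return JUST the MAIN WORD part of the match, then this will work:
/(?<=first del)MAIN WORD(?=last del)/
The optional \s? from the first pattern is dropped here: an optional character at the outer edge of a lookaround doesn't restrict anything, and PCRE has traditionally required lookbehinds to be fixed-length, so \s? inside one can fail to compile.
Put an i at the very end of the pattern to make it case-insensitive, if you want. I only mention this because the example you gave me above uses different case in the example text and in the desired response.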
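A sketch of the first, delimiter-including pattern with and without the `i` modifier. The sample text with mixed capitalisation is made up.

```php
<?php
$text = 'Some text First Del Main Word Last Del more text';

// Case-sensitive: the capitalisation differs, so nothing matches.
$strict = preg_match('/\s?first del MAIN WORD last del\s?/', $text);

// With the i modifier the same pattern matches, delimiters and the
// surrounding optional whitespace included.
$loose = preg_match('/\s?first del MAIN WORD last del\s?/i', $text, $m);

echo $strict, PHP_EOL;         // 0
echo $loose, PHP_EOL;          // 1
echo '[', $m[0], ']', PHP_EOL; // [ First Del Main Word Last Del ]
```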

Parse block with php regex

I'm trying to write a (I think) pretty simple RegEx with PHP but it's not working.
Basically I have a block defined like this:
%%%%blockname%%%%
stuff goes here
%%%%/blockname%%%%
I'm not any good at RegEx, but this is what I tried:
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/i',$input,$matches);
It returns an array with 4 empty entries.
I guess it also, apart from actually working, needs some sort of pointer for the third match because it should be equal to the first one?
Please enlighten me :)

You need to allow the dot to match newlines, and to allow ^ and $ to match at the start and end of lines (not just the entire string):
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/sm',$input,$matches);
The s (single-line) option makes the dot match any character including newlines.
The m (multi-line) option allows ^ and $ to match at the start and end of lines.
The i option is unnecessary in your regex, since there are no letters in it whose case could matter.
Then, to answer the second part of your question: If blockname is the same in both cases, then you can make that explicit by using a backreference to the first capturing group:
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm',$input,$matches);
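A sketch of the backreference version on a made-up input with two well-formed blocks and one whose closing name doesn't match. `PREG_SET_ORDER` is used only so that each block's name and body come out together.

```php
<?php
$input = "%%%%header%%%%\n<h1>Hi</h1>\n%%%%/header%%%%\n"
       . "%%%%footer%%%%\n<p>Bye</p>\n%%%%/footer%%%%\n"
       . "%%%%broken%%%%\nnever closed\n%%%%/other%%%%";

// s: . also matches newlines, so the body can span lines.
// m: ^ and $ match at the start and end of every line.
// \1: the closing tag must repeat the name captured by group 1.
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm', $input, $matches, PREG_SET_ORDER);

foreach ($matches as $block) {
    echo $block[1], ': ', trim($block[2]), PHP_EOL;
}
// header: <h1>Hi</h1>
// footer: <p>Bye</p>
// (the "broken" block is skipped because its closing name is "other")
```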

I'm pretty sure you can't since these operations would need to save a variable and you can't in regex. You should try to do this using PHP's built-in token parser. http://php.net/manual/en/function.token-get-all.php

What does this Regular Expression do

$pee = preg_replace( '|<p>|', "$1<p>", $pee );
This regular expression is from the WordPress source code (formatting.php, wpautop function); I'm not sure what it does. Can anyone help?
Actually, I'm trying to port this function to Python. If anyone knows of an existing port already, that would be much better, as I'm really bad with regex.

The preg_replace() function - somewhat confusingly - allows you to use other delimiters besides the standard "/" for regular expressions, so
"|<p>|"
Would be a regular expression just matching
"<p>"
in the text. However, I'm not clear on what the replacement parameter of
"$1<p>"
would be doing, since there's no group for $1 to refer to. As given, it seems to just replace a paragraph tag with an empty string followed by a paragraph tag, in effect doing nothing.
Anyone with more in-depth knowledge of PHP quirks have a better analysis?

WordPress really calls a variable "pee"?
I'm not sure what the $1 stands for (there are no parentheses in the first parameter?), so I don't think it actually does anything, but I could be wrong.

...?
Actually, it looks like this takes the first <p> tag and prepends the previous regular expression's first match to it (since there's no match in this one),
However, it seems that this behavior is bad to say the least, as there's no guarantee that preg_* functions won't clobber $1 with their own values.
Edit: Judging from Jay's comment, this regex actually does nothing.

The pipe symbols | in this case do not have the default meaning of "match this or that" but are used as alternative delimiters for the pattern, instead of the more common slashes /. This may make sense if you want to match / without having to escape every occurrence (e.g. /(.*)\/(.*)\// is not as readable as #/(.*)/(.*)/#). Using | instead, which is just another reserved character in patterns, seems quite counterproductive, though.
Normally, $1 in the replacement string refers to the first group denoted by parentheses. E.g. if you've got a pattern like
"(.*)<p>"
$0 would contain the whole match and $1 the part before the <p>.
As the given regex does not declare any groups, and $1 is not a valid variable name (in PHP 4) defined elsewhere, this call seems to replace any occurrence of <p> with <p>?
To be honest, now I'm also quite confused. Just a guess: perhaps another pattern-matching function (preg_match and the like) is called before the given line, so $1 is "leaked" from there?
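For comparison, a short sketch of what `$0` and `$1` do when the pattern does have a group (the subject string `intro<p>` is made up):

```php
<?php
// With a capturing group, $1 in the replacement is the text before <p>,
// and $0 is the whole match. Single quotes keep PHP away from the $ signs.
$withGroup = preg_replace('/(.*)<p>/', '[$1]', 'intro<p>');
$whole     = preg_replace('/(.*)<p>/', '[$0]', 'intro<p>');

echo $withGroup, PHP_EOL; // [intro]
echo $whole, PHP_EOL;     // [intro<p>]
```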

I highly recommend the amazing RegexBuddy

I believe that line does nothing.
For what it's worth, this is the previous line, in which $1 is set:
$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
However, I don't think that's worth anything. In my testing, $1 does not maintain a value from one preg_replace to the next, even if the next doesn't set its own value for $1. Remember that PHP variable names cannot begin with a number (see: http://php.net/language.variables ), so $1 is not a PHP variable. It only means something within a single preg_replace, and in this case the rules of preg_replace suggest it doesn't mean anything.
That said, autop being such a widely-used function makes me doubt my own conclusion that this line is doing nothing. So I look forward to someone correcting me.

The regex simply matches the literal text <p>. The choice to delimit the regex with the vertical bar instead of forward slashes is very unfortunate. It doesn't change what the code does, but it makes it harder for humans to read. (It also makes it impossible to use the alternation operator in the regex.)
$1 is not a valid variable name in PHP, so $1 is never interpolated in double-quoted strings. The $1 gets passed to preg_replace unchanged. preg_replace parses the replacement string, and replaces $1 with the contents of the first capturing group. If there is no capturing group, $1 is replaced with nothing.
Thus, this code does the same as:
$pee = preg_replace( '/<p>/', "<p>", $pee );
It's not correct that this does nothing. The search-and-replace will run, slowing down your software, and eating up memory for temporary copies of $pee.
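This can be checked directly. The sketch below uses a made-up `$pee` value and the `$count` argument of `preg_replace()` to show that the replacements do run, even though the output equals the input.

```php
<?php
$pee = "<p>First</p>\n<p>Second</p>";

// "$1<p>" is not interpolated by PHP, because $1 is not a valid variable name.
$replacement = "$1<p>";

// The pattern has no capturing group, so preg_replace substitutes "" for $1
// and every <p> is replaced with an identical <p>.
$out = preg_replace('|<p>|', $replacement, $pee, -1, $count);

var_dump($replacement);  // string(5) "$1<p>"
var_dump($out === $pee); // bool(true)
var_dump($count);        // int(2): the replacements still ran
```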

It replaces the match of the pattern
"|<p>|"
by the string
"$1<p>"
The | in the replacement pattern causes the regex engine to match either the part on the left side or the part on the right side.
I do not get why it's used that way because usually it's for something like "ta(b|p)e"...
For the $1, I guess the variable $1 is in the PHP code and is replaced during the preg_replace, so if $1 = "test"; the replacement will replace the
"<p>"
to
"test<p>"
But I am not sure about the $1.
