regex and preg_replace_callback - php

I have a problem with a regular expression.
I'm working with tokens and I have to parse a text like this:
Just some random text
#IT=AB|First statement# #xxxx=xxx|First statement|Second statement#
More text
I use preg_replace_callback because I have to use the first statement or the second one, depending on whether the first expression is true; it's a sort of IF...ELSE... statement.
What I expect are 2 elements like this:
#IT=AB|First statement#
#xxxx=xxx|First statement|Second statement#
So I can start manipulating them inside my callback function.
I tried the regex /#.*#/, but I get the entire string; it's not split into elements.
How can I achieve that? I'm sorry, but regexes aren't my thing :(

The quantifier * is greedy by default, so .* will match as much as it can, including any # characters along the way. To fix this you can make the * non-greedy by adding a ? after it. A .*? will then try to match as little as it can.
/#.*?#/
or you can look for only non # characters between two #:
/#[^#]*#/
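As a minimal sketch of how the second pattern could be used with preg_replace_callback on the sample text from the question: the callback below is hypothetical (it only collects each token and wraps it in brackets), and the real IF...ELSE logic would go in its place.

```php
<?php
// Sample input taken from the question.
$text = "Just some random text\n"
      . "#IT=AB|First statement# #xxxx=xxx|First statement|Second statement#\n"
      . "More text";

// Greedy pattern from the question: both tokens end up in a single match.
$greedyCount = preg_match_all('/#.*#/', $text, $greedy);

// Negated character class: each #...# token is matched on its own.
$tokens = [];
$result = preg_replace_callback(
    '/#[^#]*#/',
    function (array $m) use (&$tokens): string {
        // Hypothetical handling: remember the token, then return a placeholder.
        $tokens[] = $m[0];
        return '[' . trim($m[0], '#') . ']';
    },
    $text
);

print_r($tokens);
echo $result, "\n";
```

Inside the callback, $m[0] holds one complete token, so splitting it on | to pick the first or second statement can happen there.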

Related

Regex Including the next occurence of word

The regex works, but the problem is that it also includes the next occurrence instead of ending at the first occurrence and then starting again from the
Regex : (?=<appView)\s{0,1}(.*)(?<=<\/appView>)
String: <appView></appView> <appView></appView>
But my problem is that it matches the whole string, like
(Match 1)<appView></appView> <appView></appView>
I want it to match each element separately, but I can't make it work.
Desired output : (Match 1) <appView></appView> (Match 2)<appView></appView>
\s{0,1} is equivalent to \s?. You need to use the lazy (.*?) instead of (.*).
Use this pattern: ~(?=<appView)\s?(.*?)(?<=</appView>)~
*note, you don't have to escape / in the closing tag if you use something other than a slash as your pattern delimiter. I am using ~ at the beginning and end of my pattern to avoid escaping.
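A small sketch comparing the greedy and lazy versions on the string from the question, assuming preg_match_all is used to collect the matches (the variable names are only illustrative):

```php
<?php
$input = '<appView></appView> <appView></appView>';

// Greedy (.*) as in the question: a single match spanning both elements.
preg_match_all('~(?=<appView)\s?(.*)(?<=</appView>)~', $input, $greedy);

// Lazy (.*?) as in the answer: stops at the first closing tag each time.
preg_match_all('~(?=<appView)\s?(.*?)(?<=</appView>)~', $input, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```

The lookbehind works here because </appView> has a fixed length, which PCRE requires for lookbehind assertions.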
I strongly recommend switching from regex to an actual sequential XML parser. Regex is awful for parsing XML-based files, for example because of the problems below.
That said, you can "fix" your regex by using ([^<>]*). This will match all characters without < or >, which will make sure that no other tags are nested inside. If done with all tags, you cannot match something like <appview><unclosedTag></appView>, because it is invalid. If you can be certain that the structure is correct, this is slightly less of an issue.
Another problem your approach has is that if you have nested tags like so: <appView> something <appView> something else </appView> else </appView>, your approach will make you end up with [replaced] else </appView>.

PHP regular expression : match the closest one

I have a string like this
<div><span style="">toto</span> some character <span>toto2</span></div>
My regex:
/(<span .*>)(.*)(<\/span>)/
I used preg_match and it returns the entire string
<span style="">toto</span> some character <span>toto2</span>
I want it to return:
<span style="">toto</span>
and
<span>toto2</span>
What do I need to do to achieve this? Thanks.
How about this:
/(<span[^>]*>)(.*?)(<\/span>)/
Check the docs here at PHP preg_match Repetition:
By default, the quantifiers are "greedy", that is, they match as much as possible
and
However, if a quantifier is followed by a question mark, then it becomes lazy, and instead matches the minimum number of times possible
Even though I guess all previous answers are correct, I just want to add that, as you only want to capture the whole expressions (i.e. from <span> to </span>), you don't have to capture everything inside the regexp with ()
The following does what you expect without capturing additional expressions
/(<span\w*[^>]*>[^<]*<\/span>)/
(tested on http://rubular.com/)
EDIT : of course there might be some differences between PHP and ruby regexp implementations, but the idea is the same :)
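For PHP specifically, a minimal sketch assuming preg_match_all is used instead of preg_match (which stops after the first match). It uses the [^>]* and lazy (.*?) ideas from the first answer, with a single capture group for the inner text:

```php
<?php
$html = '<div><span style="">toto</span> some character <span>toto2</span></div>';

// [^>]* keeps the opening tag from running past its own ">",
// and the lazy (.*?) stops at the nearest </span>.
preg_match_all('/<span[^>]*>(.*?)<\/span>/', $html, $matches);

print_r($matches[0]); // whole <span>...</span> elements
print_r($matches[1]); // inner text only
```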

Regex to find function call php

I need a regex to find function calls in strings in PHP. I have searched here on Stack Overflow, but none of the ones I've tried worked.
this pattern: ^.*([\w][\(].*[\)])
This will match functionone(fgfg), but it will also match functionone(fgfg) dhgfghfgh functiontwo() as one match, not as 2 separate matches (functionone(fgfg) and functiontwo()).
I don't know how to write it but I think this is what I need.
1. Any string, followed by (
2. Any string followed by )
And then it should stop, not continue. Any regex-gurus that can help me out?
I see 5 issues with your regex
1. If you want to match 2 functions in the same row, don't use the anchor ^; it anchors the regex to the start of the string.
2. You then don't need .* at the start; use something more like \w+ (I am not sure what the spec of a function name in PHP is).
3. If there is only one entry in a character class (and it's not a negated one), you don't need the character class.
4. The quantifier between the brackets needs to be a lazy one (followed by a ?).
So after these 4 points your regex would look something like
\w+\(.*?\)
5. Is a regex really the right tool for this job?
Don't use regexp for this... use PHP's built-in tokenizer
A function signature is not a regular language. As such, you cannot use a regular expression to match a function signature. Your current regex will match signatures that are NOT valid function signatures.
What I would suggest you use is the PHP tokenizer.
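A rough sketch of the tokenizer approach, assuming the tokenizer extension is available. The helper find_function_calls is hypothetical, and the heuristic (a T_STRING token followed by an opening parenthesis) is deliberately simple: it would also report function declarations and method calls, and it ignores namespaced names.

```php
<?php
// Hypothetical helper: returns the names that look like function calls
// in a snippet of PHP code (given without the opening tag).
function find_function_calls(string $code): array
{
    $tokens = token_get_all('<?php ' . $code);
    $count = count($tokens);
    $calls = [];

    for ($i = 0; $i < $count; $i++) {
        $tok = $tokens[$i];
        // Only plain identifiers are candidates.
        if (!is_array($tok) || $tok[0] !== T_STRING) {
            continue;
        }
        // Skip any whitespace between the name and the next token.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        // Single-character tokens such as "(" are plain strings.
        if ($j < $count && $tokens[$j] === '(') {
            $calls[] = $tok[1];
        }
    }

    return $calls;
}

print_r(find_function_calls('functionone($a); $x = 1; functiontwo();'));
```

Because the tokenizer treats string literals as single tokens, text like "notacall()" inside quotes is not reported, which is hard to guarantee with a regex.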

Need php regex between 2 sets of chars

I need a regular expression for php that outputs everything between <!--:en--> and <!--:-->.
So for <!--:en-->STRING<!--:--> it would output just STRING.
EDIT: oh, and the closing <!--:--> needs to be the first one after <!--:en-->, because there are more in the text.
The one you want is actually not too complicated:
/<!--:en-->(.*?)<!--:-->/i
Your matches will be in capture group 1.
Explanation:
The .*? is a lazy quantifier. Basically, it means "keep matching until you find the shortest string that will still fit this pattern." This is what will cause the matching to stop at the first instance of <!--:-->, rather than sucking up everything until the last <!--:--> in the document.
Usage is something like preg_match_all('/<!--:en-->(.*?)<!--:-->/i', $input, $matches). PHP's preg functions have no g modifier; preg_match_all is what returns every match.
If you have just that input
$input = '<!--:en-->STRING<!--:-->';
You can try with
$output = strip_tags($input);
Try:
^<!--:en-->(.*)<!--:-->$
I don't think any of the other characters need to be escaped.
<!--:en--\b[^>]*>(.*?)<!--:-->
This will match the things between your tags. This will break if you nest your tags, but you didn't say you were doing that :)
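A minimal sketch of the lazy pattern with preg_match_all. The input string, including the <!--:de--> block, is made up for illustration; the s modifier is added on the assumption that the text between the markers may span several lines.

```php
<?php
// Hypothetical input with two English blocks and one other block.
$input = "<!--:en-->Hello<!--:--><!--:de-->Hallo<!--:--><!--:en-->World<!--:-->";

// i: case-insensitive, s: let . match newlines as well.
preg_match_all('/<!--:en-->(.*?)<!--:-->/is', $input, $matches);

// Capture group 1 holds only the text between the markers.
print_r($matches[1]);
```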

What does this Regular Expression do

$pee = preg_replace( '|<p>|', "$1<p>", $pee );
This regular expression is from the Wordpress source code (formatting.php, wpautop function); I'm not sure what it does, can anyone help?
Actually I'm trying to port this function to Python...if anyone knows of an existing port already, that would be much better as I'm really bad with regex.
The preg_replace() function - somewhat confusingly - allows you to use other delimiters besides the standard "/" for regular expressions, so
"|<p>|"
Would be a regular expression just matching
"<p>"
in the text. However, I'm not clear on what the replacement parameter of
"$1<p>"
would be doing, since there's no grouping to map to $1. It would seem like as given, this is just replacing a paragraph tag with an empty string followed by a paragraph tag, and in effect doing nothing.
Anyone with more in-depth knowledge of PHP quirks have a better analysis?
wordpress really calls a variable "pee" ?
I'm not sure what the $1 stands for (there are no parentheses in the first parameter?), so I don't think it actually does anything, but I could be wrong.
...?
Actually, it looks like this takes the first <p> tag and prepends the previous regular expression's first match to it (since there's no match in this one).
However, it seems that this behavior is bad to say the least, as there's no guarantee that preg_* functions won't clobber $1 with their own values.
Edit: Judging from Jay's comment, this regex actually does nothing.
The pipe symbols | in this case do not have the default meaning of "match this or that" but are used as alternative delimiters for the pattern instead of the more common slashes /. This may make sense if you want to match / without having to escape each appearance (e.g. /(.*)\/(.*)\// is not as readable as #/(.*)/(.*)/#). It seems quite counterproductive to use | for this, though, since it is just another reserved character in patterns.
Normally $1 in the replacement pattern should match the first group denoted by parentheses. E.g if you've got a pattern like
"(.*)<p>"
$0 would contain the whole match and $1 the part before the <p>.
As the given reg-ex does not declare any groups and $1 is not a valid name for a variable (in PHP4) defined elsewhere, this call seems to replace any occurrences of <p> with <p>?
To be honest, now I'm also quite confused. Just a guess: is another pattern-matching function (preg_match and the like) called before the given line, so that the $1 is "leaked" from there?
I highly recommend the amazing RegexBuddy
I believe that line does nothing.
For what it's worth, this is the previous line, in which $1 is set:
$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
However, I don't think that's worth anything. In my testing, $1 does not maintain a value from one preg_replace to the next, even if the next doesn't set its own value for $1. Remember that PHP variable names cannot begin with a number (see: http://php.net/language.variables ), so $1 is not a PHP variable. It only means something within a single preg_replace, and in this case the rules of preg_replace suggest it doesn't mean anything.
That said, autop being such a widely-used function makes me doubt my own conclusion that this line is doing nothing. So I look forward to someone correcting me.
The regex simply matches the literal text <p>. The choice to delimit the regex with the vertical bar instead of forward slashes is very unfortunate. It doesn't change the code, but it makes it harder for humans to read. (It also makes it impossible to use the alternation operator in the regex.)
$1 is not a valid variable name in PHP, so $1 is never interpolated in double-quoted strings. The $1 gets passed to preg_replace unchanged. preg_replace parses the replacement string, and replaces $1 with the contents of the first capturing group. If there is no capturing group, $1 is replaced with nothing.
Thus, this code does the same as:
$pee = preg_replace( '/<p>/', "<p>", $pee );
It's not correct that this does nothing. The search-and-replace will run, slowing down your software, and eating up memory for temporary copies of $pee.
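A quick way to check this behaviour is a sketch like the following, where the sample strings are made up. The first call mirrors the WordPress line; the second adds a capture group so that $1 has something to refer to.

```php
<?php
$pee = 'intro<p>first</p><p>second</p>';

// No capture group in the pattern: $1 is replaced with an empty string,
// so every <p> is replaced by <p> and the text is unchanged.
$out = preg_replace('|<p>|', "$1<p>", $pee);
var_dump($out === $pee);

// With a capture group, $1 refers to the text matched by (\w+).
$withGroup = preg_replace('|(\w+)<p>|', "[$1]<p>", 'intro<p>');
echo $withGroup, "\n";
```

Note that "$1" inside double quotes is left alone by PHP itself, because a variable name cannot start with a digit; it reaches preg_replace as the literal characters $1.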
It replaces the match from the pattern
"|<p>|"
by the string
"$1<p>"
The | in the pattern causes the regex engine to match either the part on the left side or the part on the right side.
I do not get why it's used that way because usually it's for something like "ta(b|p)e"...
As for the $1, I guess the variable $1 is in the PHP code and is substituted during the preg_replace, so if $1 = "test"; the replacement will change
"<p>"
to
"test<p>"
But I am not sure of it for the $1
