Strange result of using asterisk * quantifier - php

I am trying to practice the asterisk * quantifier on a simple string, but although I have only two letters, the result contains a third match.
<?php
$x = 'ab';
preg_match_all("/a*/",$x,$m);
echo '<pre>';
var_dump($m);
echo '</pre>';
?>
The result was:
array(1) {
[0]=>
array(3) {
[0]=> string(1) "a"
[1]=> string(0) ""
[2]=> string(0) ""
}
}
As I understand it, a* first matched a and then matched nothing at b, so the result should be:
array(1) {
[0]=>
array(2) {
[0]=> string(1) "a"
[1]=> string(0) ""
}
}
So what is the third match?

Using a regex demo tool, we can see that the first match is a, while the second and third matches are zero-width (empty) matches: one at the position between a and b, and one between b and the end of the string.
Keep in mind that preg_match_all repeatedly applies the pattern a*, resuming each time where the previous match ended, until it has tried every position in the input string, including the position at the very end.
I suspect that what you really want to use here is a+. With a+, you get only a single match, for the single a in ab. So, I vote for using a+ here to resolve your problem.
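To make the zero-width matches visible, here is a small sketch that runs both patterns over the question's input with the PREG_OFFSET_CAPTURE flag, which records the byte offset where each match starts. The variable names are only illustrative.

```php
<?php
// Compare a* and a+ under preg_match_all on the question's input.
// PREG_OFFSET_CAPTURE turns every match into a [text, offset] pair,
// so the empty matches show up with their positions.
$x = 'ab';

preg_match_all('/a*/', $x, $star, PREG_OFFSET_CAPTURE);
preg_match_all('/a+/', $x, $plus, PREG_OFFSET_CAPTURE);

foreach ($star[0] as [$text, $offset]) {
    printf("a* matched %s at offset %d\n", var_export($text, true), $offset);
}
foreach ($plus[0] as [$text, $offset]) {
    printf("a+ matched %s at offset %d\n", var_export($text, true), $offset);
}
```

With a* you get "a" at offset 0 plus empty matches at offsets 1 and 2 (the end of the string); with a+ only the "a" at offset 0 remains.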

Your regular expression '/a*/' matches zero (empty) or more consecutive a characters.
For example, if you match '/a*/' against an empty string, it returns one (empty) match, because * allows zero occurrences.
preg_match_all keeps looking until it has processed the entire string. Once a match is found, it continues from the end of that match and tries to apply the pattern again to the remainder of the string.
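The loop described above can be modelled with preg_match and an offset. This is a simplified sketch, not PHP's actual implementation: collectMatches() is a hypothetical helper, and after an empty match it simply steps one byte forward, whereas PHP first retries for a non-empty match at the same position before stepping. For a* both approaches give the same result.

```php
<?php
/**
 * Simplified model of how preg_match_all walks a subject string.
 * collectMatches() is a hypothetical helper, not a PHP built-in.
 * Byte-based stepping assumes ASCII input (no /u modifier).
 */
function collectMatches(string $pattern, string $subject): array
{
    $found = [];
    $offset = 0;
    // Offset == strlen is allowed: it lets a* match at the very end.
    while ($offset <= strlen($subject)
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$text, $start] = $m[0];
        $found[] = $text;
        $end = $start + strlen($text);
        // Empty match: step one byte past it so the loop cannot get stuck;
        // otherwise resume right where the match ended.
        $offset = ($text === '') ? $end + 1 : $end;
    }
    return $found;
}

var_dump(collectMatches('/a*/', 'ab')); // "a", "", ""
var_dump(collectMatches('/a*/', ''));   // a single empty match
```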

Related

can't understand the difference between * and + quantifiers in REGEX [duplicate]

I am new to regex, and as I have studied, * matches zero or more and + matches one or more, so I started to test this:
<?php
preg_match("/a/", 'bbba',$m);
preg_match("/a*/", 'bbba',$o);
preg_match("/a+/", 'bbba',$p);
echo '<pre>';
var_dump($m);
var_dump($o);
var_dump($p);
echo '</pre>';
?>
But the result shows that * didn't match anything and returned an empty string, even though the letter a exists:
array(1) {
[0]=>
string(1) "a"
}
array(1) {
[0]=>
string(0) ""
}
array(1) {
[0]=>
string(1) "a"
}
So what am I missing here?
/a/ matches the first a in bbba
/a*/ matches 0 or more a characters. There are 0 a characters between the start of the string and the first b so it matches there.
/a+/ matches 1 or more a characters so it matches the first a character
The thing to note here is that a regex tries to match as early in the string as possible.
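A small sketch that makes the "as early as possible" rule concrete: preg_match returns only the first (leftmost) match, and the PREG_OFFSET_CAPTURE flag reports where it starts. The variable names are illustrative.

```php
<?php
$subject = 'bbba';

// preg_match stops after the first match; with PREG_OFFSET_CAPTURE,
// element [0] is a [matched text, byte offset] pair.
preg_match('/a*/', $subject, $star, PREG_OFFSET_CAPTURE);
preg_match('/a+/', $subject, $plus, PREG_OFFSET_CAPTURE);

// a* already succeeds at offset 0, with zero a characters.
printf("a*: %s at offset %d\n", var_export($star[0][0], true), $star[0][1]);
// a+ needs at least one a, so the leftmost match is at offset 3.
printf("a+: %s at offset %d\n", var_export($plus[0][0], true), $plus[0][1]);
```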
a* matches a string that need NOT contain any a, because * matches zero or more; hence the pattern a* matches even an empty string.
To see all matches, you can use preg_match_all:
<?php
preg_match_all("/a*/", 'bbba', $o);
var_dump($o);
As a result, you will see:
array(1) {
[0]=>
array(5) {
[0]=>
string(0) ""
[1]=>
string(0) ""
[2]=>
string(0) ""
[3]=>
string(1) "a"
[4]=>
string(0) ""
}
}
I hope it helps.
* means that the preceding item will be matched zero or more times.
+ means that the preceding item will be matched one or more times.
Also, a* matches the empty string, which is why it shows an empty result. You can use preg_match_all("/a*/", 'bbba', $o); and then filter the resulting array down to its non-empty values.
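A minimal sketch of that filtering step, assuming the sample subject 'bbba' from the question: array_filter() with a callback drops the empty strings, and array_values() reindexes what is left.

```php
<?php
// Collect every match of a* in 'bbba'.
preg_match_all('/a*/', 'bbba', $o);

// Keep only the non-empty matches and reindex from 0.
$nonEmpty = array_values(array_filter($o[0], function (string $m): bool {
    return $m !== '';
}));

var_dump($o[0]);     // five entries: '', '', '', 'a', ''
var_dump($nonEmpty); // only 'a'
```

For this input, preg_match_all with /a+/ gives the same non-empty list directly, without the filtering step.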

PHP: regex to match complete matching brackets?

In PHP I have the following string:
$text = "test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
";
I need to get all {option...} out of this string (5 in total in this string). Some have multiple nested brackets in them, and some don't. Some are on the same line, some are not.
I already found this regex:
(\{(?>[^{}]+|(?1))*\})
so the following works fine:
preg_match_all('/(\{(?>[^{}]+|(?1))*\})/imsx', $text, $matches);
The text that's not inside curly brackets is filtered out, but the matches also include the blabla-items, which I don't need.
Is there any way this regex can be changed to only include the option-items?
This problem is far better suited to a proper parser, however you can do it with regex if you really want to.
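This should work as long as you're not embedding options inside other options. Note, though, that with the sample input a non-option group that directly follows an option on the same line, such as the {blabla} after the first option, ends up inside that option's match, because the pattern only stops before the next "{option:".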
preg_match_all(
'/{option:((?:(?!{option:).)*)}/',
$text,
$matches,
PREG_SET_ORDER
);
Quick explanation.
{option: // literal "{option:"
( // begin capturing group
(?: // don't capture the next bit
(?!{option:). // any single character that does not start a literal "{option:"
)* // zero or more times
) // end capture group
} // literal closing brace
The var_dump() output with your sample input looks like this:
array(5) {
[0]=>
array(2) {
[0]=>
string(31) "{option:first{A}.Value}{blabla}"
[1]=>
string(22) "first{A}.Value}{blabla"
}
[1]=>
array(2) {
[0]=>
string(24) "{option:second{B}.Value}"
[1]=>
string(15) "second{B}.Value"
}
[2]=>
array(2) {
[0]=>
string(23) "{option:third{C}.Value}"
[1]=>
string(14) "third{C}.Value"
}
[3]=>
array(2) {
[0]=>
string(18) "{option:fourth{D}}"
[1]=>
string(9) "fourth{D}"
}
[4]=>
array(2) {
[0]=>
string(14) "{option:fifth}"
[1]=>
string(5) "fifth"
}
}
Try this regular expression; it was tested using .NET regular expressions, and it may work with PHP as well:
\{option:.*?{\w}.*?}
Please note: I'm assuming that you have only one pair of brackets inside, and that this pair contains only one alphanumeric character.
I modified your initial expression to search for the string 'option:' (in a capturing group) followed by non-whitespace characters (\S*), enclosed in curly braces '{}'.
\{(option:)\S*\}
Given your input text, the following entries are matched in regexpal:
test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value} {option:second{B}.Value}
{option:third{C}.Value}
{option:fourth{D}}
{option:fifth}
test 2
If you don't have multiple pairs of brackets on the same level, this should work:
/(\{option:(([^{]*(\{(?>[^{}]+|(?4))*\})[^}]*)|([^{}]+))\})/imsx
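Another possibility, sketched here as an assumption rather than a tested drop-in: reuse the recursion idea from the question's own (?1) pattern, but only inside a group that starts with "{option:". Group 2 matches one balanced {...} group and recurses into itself via (?2), so nested braces of any depth are consumed, and the match ends at the option's own closing brace, so a following {blabla} is left alone. Options with unbalanced braces will not match.

```php
<?php
// Sample input from the question.
$text = 'test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
';

// "{option:", then any mix of plain text ([^{}]++) and balanced {...}
// groups (group 2, recursing via (?2)), then the closing "}".
$pattern = '/\{option:((?:[^{}]++|(\{(?:[^{}]++|(?2))*\}))*)\}/';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]); // the five full {option:...} items
print_r($matches[1]); // their contents, e.g. "first{A}.Value"
```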

PHP preg_match_all same line

Having trouble with a regular expression (they are not my strong suit). I'm trying to match all strings between {{ and }}, but if two sets of brackets occur on the same line, it counts them as a single match. Example:
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";
preg_match_all("/{{(.*)}}/", $string, $matches);
var_dump($matches); // returns arrays with 2 results instead of 3
returns:
array(2) {
[0]=>
array(2) {
[0]=>
string(35) "{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}"
[1]=>
string(17) "{{SHOULD_MATCH3}}"
}
[1]=>
array(2) {
[0]=>
string(31) "SHOULD_MATCH1}} {{SHOULD_MATCH2"
[1]=>
string(13) "SHOULD_MATCH3"
}
}
Any help? Thanks!
Replace the * quantifier with its non-greedy form *?.
This will make it match as little as possible while still allowing the expression to match as a whole, which is different from its current behavior of matching as much as possible.
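A side-by-side sketch of the question's greedy pattern and its non-greedy form, run on the question's sample string:

```php
<?php
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";

// Greedy: .* runs to the LAST "}}" on the line.
preg_match_all('/{{(.*)}}/', $string, $greedy);
// Non-greedy: .*? stops at the FIRST "}}" that lets the match succeed.
preg_match_all('/{{(.*?)}}/', $string, $lazy);

print_r($greedy[1]); // 2 entries, the first spanning both tags
print_r($lazy[1]);   // SHOULD_MATCH1, SHOULD_MATCH2, SHOULD_MATCH3
```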
You can use one of the following patterns:
{{(.+?)}
{{([^}]+)
{{(\w+)
{{([[:digit:][:upper:]_]+)
{{([\p{Lu}\p{N}_]+)
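A quick way to check that the alternatives listed above behave the same on the sample string is to loop over them. This sketch assumes names made of ASCII upper-case letters, digits and underscores, as in the example; the character-class variants would reject other names.

```php
<?php
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";

// The alternatives listed above; each captures the name after "{{".
$patterns = [
    '/{{(.+?)}/',
    '/{{([^}]+)/',
    '/{{(\w+)/',
    '/{{([[:digit:][:upper:]_]+)/',
    '/{{([\p{Lu}\p{N}_]+)/',
];

$results = [];
foreach ($patterns as $pattern) {
    preg_match_all($pattern, $string, $m);
    $results[$pattern] = $m[1];
    echo $pattern, ' => ', implode(', ', $m[1]), "\n";
}
```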

Matching any amount of words regular expression

I'm trying to capture a line with any number of words following a title in PHP, but I cannot capture anything beyond the first word. Here are the contents of the file that I am trying to match:
Name: test
Caption: test test test test
And here is the regular expression code and results...
preg_match_all('/([A-z]+:)\s*(\w+)[\r|\r\n|\n]*/', $contents, $array);
Results:
array(3) {
[0]=> array(2) {
[0]=> string(11) "Name: test "
[1]=> string(14) "Caption: test "
}
[1]=> array(2) {
[0]=> string(5) "Name:"
[1]=> string(8) "Caption:"
}
[2]=> array(2) {
[0]=> string(4) "test"
[1]=> string(4) "test"
}
}
Any help would be greatly appreciated.
Assuming that your input data always looks like your example (title segment, colon, words; all on a single line), this should do it:
preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);
This would result in $array[1] matching something like Name:, and then $array[2] would match the rest of the line (you may have to use trim() to strip any leading and/or trailing white space from $array[2]).
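A sketch of that approach with the trim() step. The Windows-style "\r\n" line endings are an assumption, chosen to show why trimming helps: "." does not match "\n" but does match "\r", so the captured value would otherwise keep a trailing "\r".

```php
<?php
// Sample file contents (CRLF line endings assumed for illustration).
$contents = "Name: test\r\nCaption: test test test test\r\n";

preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);

$titles = $array[1];                    // "Name:", "Caption:"
$values = array_map('trim', $array[2]); // strip the trailing "\r"

print_r(array_combine($titles, $values));
```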
If you only want to capture "words" in the second part, I believe you could change the second capture group to allow only word characters and spaces (using \s there would also match line breaks and run on into the next line):
preg_match_all('/([A-Za-z]+:)\s*([\w ]+)/', $contents, $array);
Note also that you shouldn't use the [A-z] construct, since the ASCII table has non-alphabetical characters between the upper-case and lower-case letters ([, \, ], ^, _ and `). See an ASCII table for a character map.
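A tiny sketch of the [A-z] pitfall: the range covers ASCII 0x41 to 0x7A, so the punctuation between "Z" (0x5A) and "a" (0x61) slips through. The test string is made up for illustration.

```php
<?php
// 'Name_[x]' contains "_", "[" and "]", which are not letters.
$loose  = preg_match('/^[A-z]+$/', 'Name_[x]');    // range includes [ \ ] ^ _ `
$strict = preg_match('/^[A-Za-z]+$/', 'Name_[x]'); // letters only

var_dump($loose);  // int(1)
var_dump($strict); // int(0)
```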

php - regex pattern

I need to use a regex pattern, but what is the right PHP to "decode" it? My pattern is similar to BBCode, i.e. ['something'], where 'something' could be any length, but realistically I doubt it will be more than 10 characters/digits. What is the correct PHP syntax to "unscramble" it, i.e.:
if ($row->xyz =['something'] ):
do this
else:
do that
endif;
Thanks in advance
A basic regexp to match BBCode style tags would look something like this:
preg_match('/\[[\/]?[A-Za-z0-9]+\]/', $row->xyz)
That will match anything that starts with "[", ends with "]", and has one or more alphanumeric characters in between (with an optional "/" for an end tag). Note that it has flaws: for example, if you have a nested "[...]" inside a larger "[...]", it will only grab the inner one (e.g. [foo[bar]] will return only "[bar]").
Example:
<?php
$regexp = '/\[[\/]?[A-Za-z0-9]+\]/';
$testString = '[i]An italic string with some [b]bold[/b] text.[/i]';
preg_match_all($regexp, $testString, $result);
var_dump($result);
?>
Result:
array(1) {
[0]=> array(4) {
[0]=> string(3) "[i]"
[1]=> string(3) "[b]"
[2]=> string(4) "[/b]"
[3]=> string(4) "[/i]"
}
}
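A sketch of the question's if/else written with the answer's pattern. $row is a hypothetical stand-in for the asker's row object, and the quotes in ['something'] are assumed to be placeholders (the pattern only allows letters and digits between the brackets, so a literal ['something'] would not match).

```php
<?php
// Hypothetical row object standing in for the asker's data.
$row = new stdClass();
$row->xyz = '[something]';

$pattern = '/\[[\/]?[A-Za-z0-9]+\]/';

// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match($pattern, $row->xyz) === 1) {
    echo "do this\n";  // a BBCode-style tag was found
} else {
    echo "do that\n";  // no tag (or the regex failed)
}

// The nesting flaw mentioned above: only the inner tag is found.
preg_match_all($pattern, '[foo[bar]]', $nested);
print_r($nested[0]);
```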
Of course, I'm not sure this is what you actually want to do, but it is what you say you want to do. Are you sure you want to find BBCodes, rather than strings that are wrapped in them?
