I am new to regex. As I have learned, * matches zero or more and + matches one or more, so I started to test this:
<?php
preg_match("/a/", 'bbba',$m);
preg_match("/a*/", 'bbba',$o);
preg_match("/a+/", 'bbba',$p);
echo '<pre>';
var_dump($m);
var_dump($o);
var_dump($p);
echo '</pre>';
?>
But the result is that * didn't match anything and returned an empty string, even though the letter a exists:
array(1) {
[0]=>
string(1) "a"
}
array(1) {
[0]=>
string(0) ""
}
array(1) {
[0]=>
string(1) "a"
}
So what am I missing here?
/a/ matches the first a in bbba
/a*/ matches 0 or more a characters. There are 0 a characters between the start of the string and the first b so it matches there.
/a+/ matches 1 or more a characters so it matches the first a character
The thing to note here is that a regex engine tries to match as early in the subject string as possible.
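To make the "as early as possible" point visible, here is a minimal sketch that asks preg_match for the offset of each match via the PREG_OFFSET_CAPTURE flag. The variable names $star and $plus are only illustrative.

```php
<?php
// With PREG_OFFSET_CAPTURE each entry becomes [matched text, byte offset].
preg_match('/a*/', 'bbba', $star, PREG_OFFSET_CAPTURE);
preg_match('/a+/', 'bbba', $plus, PREG_OFFSET_CAPTURE);

// a* succeeds immediately at offset 0 with a zero-length match,
// because "zero a characters" is already a valid match there.
printf("a*: '%s' at offset %d\n", $star[0][0], $star[0][1]); // a*: '' at offset 0

// a+ needs at least one a, so the engine has to advance to offset 3.
printf("a+: '%s' at offset %d\n", $plus[0][0], $plus[0][1]); // a+: 'a' at offset 3
```

The empty result for a* is therefore a successful match at the very start of the string, not a failure to find the a.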
a* also matches a string that does NOT contain a, because * means zero or more; hence the pattern a* will match even an empty string.
To see all matches you can use preg_match_all, like:
<?php
preg_match_all("/a*/", 'bbba', $o);
var_dump($o);
As a result you will see:
array(1) {
[0]=>
array(5) {
[0]=>
string(0) ""
[1]=>
string(0) ""
[2]=>
string(0) ""
[3]=>
string(1) "a"
[4]=>
string(0) ""
}
}
Hope this helps.
* means that the preceding item will be matched zero or more times.
+ means that the preceding item will be matched one or more times.
Also, a* matches the empty string, which is why it shows an empty result. You can use preg_match_all("/a*/", 'bbba',$o); and then filter the resulting array down to its non-empty values.
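The filtering step mentioned above could look roughly like this. It is only a sketch; $all, $nonEmpty and $plus are illustrative names, and using a+ directly is usually the simpler route.

```php
<?php
// Collect every match of a*, including the zero-length ones.
preg_match_all('/a*/', 'bbba', $all);

// Keep only non-empty matches; array_values() reindexes from 0.
$nonEmpty = array_values(array_filter($all[0], function ($s) {
    return $s !== '';
}));
var_dump($nonEmpty); // one element: "a"

// For comparison: a+ never produces zero-length matches in the first place.
preg_match_all('/a+/', 'bbba', $plus);
var_dump($plus[0]);  // one element: "a"
```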
I am trying to practice the asterisk * quantifier on a simple string, but while I have only two letters, the result contains a third match.
<?php
$x = 'ab';
preg_match_all("/a*/",$x,$m);
echo '<pre>';
var_dump($m);
echo '</pre>';
?>
the result came out:
array(1) {
[0]=>
array(3) {
[0]=> string(1) "a"
[1]=> string(0) ""
[2]=> string(0) ""
}
}
As I understand it, a is matched first, then nothing is matched at b, so the result should be
array(1) {
[0]=>
array(2) {
[0]=> string(1) "a"
[1]=> string(0) ""
}
}
So what is the third match?
Using a regex demo tool, we can see that the first match is a, while the second and third matches are zero-width matches: one between a and b, and one between b and the end of the string.
Keep in mind that preg_match_all repeatedly takes the pattern a* and tries to apply it at successive positions until the entire input string has been processed.
I suspect that what you really want to use here is a+. With a+ we only get a single match, for the single letter a in ab. So, I vote for using a+ here to resolve your problem.
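The positions of the three matches can be checked without a demo tool. The following sketch uses PREG_OFFSET_CAPTURE so that each match is reported together with its offset; the variable names are illustrative.

```php
<?php
// Each entry of $m[0] is [matched text, byte offset].
preg_match_all('/a*/', 'ab', $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as list($text, $offset)) {
    printf("offset %d: '%s'\n", $offset, $text);
}
// offset 0: 'a'   (the letter a)
// offset 1: ''    (zero-width match between a and b)
// offset 2: ''    (zero-width match at the end of the string)

// With a+ only the single non-empty match remains.
preg_match_all('/a+/', 'ab', $plus);
var_dump($plus[0]); // one element: "a"
```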
Your regular expression '/a*/' matches zero (empty) or more consecutive a characters.
Example: if you try to match '/a*/' against an empty string, it will return one match, because * means zero or more.
preg_match_all keeps looking until it has processed the entire string. Once a match is found, it continues with the remainder of the string and tries to apply the pattern again.
I have a pattern. Whenever a specific capturing group is not present, the match is skipped and the engine finds another match further on, even if that means skipping the next set.
There are 4 capturing groups.
first group, 2nd group, 3rd group, 4th group
The 3rd group is not always there. In my sample string, there are 3 sets. The first one does not contain any character for the 3rd group. I want a conditional for the 3rd group: if it does not find any character, it should capture a blank or a space.
Demo: https://regex101.com/r/zK0aW4/1
it should be like this: https://regex101.com/r/sD4eB7/1
but I don't know how to express this condition.
If the third group is not present, it should capture a blank. How do I write this in a regex pattern?
For example:
$string = "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000";
preg_match_all(
"/([A-Z ,\.\-\&#\\\\n\/0-9&]+)(\d{10})([A-Z a-z]+)(\d{1}-\d{7}-\d{3}-\d{4}|\d{1}-\d{7}-\d{2}-\d{4})/",
$string,
$matches
);
This should output something like:
array(5) {
[0]=>
array(3) {
[0]=>
string(78) "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000"
[1]=>
string(84) "\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000"
[2]=>
string(87) "\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000"
}
[1]=>
array(3) {
[0]=>
string(36) "\nTHIS IS FIRST PATTERN 63101"
[1]=>
string(42) "\n4415 THIS IS FIRST \nPATTERN 49401-9528\n"
[2]=>
string(45) "\n11403 THIS IS FIRST PATTERN 49401-\n9595\n"
}
[2]=>
array(3) {
[0]=>
string(10) "0789158126"
[1]=>
string(10) "0406842931"
[2]=>
string(10) "0112853789"
}
[3]=>
array(3) {
[0]=>
string(15) " "
[1]=>
string(15) " Third match "
[2]=>
string(15) " Third match "
}
[4]=>
array(3) {
[0]=>
string(17) "0-0000000-000-0000"
[1]=>
string(17) "0-0000000-000-0000"
[2]=>
string(17) "0-0000000-000-0000"
}
}
Try this: https://regex101.com/r/zK0aW4/2
((?:[A-Z ,.&#\/0-9-]|&|\\n)+?)(\d{10})([A-Z a-z]+)?(\d{1}-\d{7}-\d{3}-\d{4}|\d{1}-\d{7}-\d{2}-\d{4})
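Because your initial group can match so many different characters, it was extending too far. By changing to a non-greedy or lazy quantifier (*? or +?), it will match as little as possible. This makes it behave better with the patterns that follow. The third group is also followed by ?, which makes it optional, so a set without it still matches.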
Character classes (surrounded by [ and ]) are for matching single characters; I assumed that you wanted to match only a literal & and \n, so moved those out of the character class.
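To show the optional-group idea in isolation, here is a reduced sketch. The input and the pattern are simplified, hypothetical stand-ins for the ones in the question (only the 10-digit number, the optional text and the dashed number are kept), so treat it as an illustration rather than a drop-in replacement.

```php
<?php
// Hypothetical, simplified input: the first set has no text for the middle group.
$input = "0789158126 0-0000000-000-0000\n"
       . "0406842931 Third match 0-0000000-000-0000";

// Group 2 is optional thanks to the trailing "?".
$pattern = '/(\d{10})\s*([A-Za-z]+(?: [A-Za-z]+)*)?\s*(\d-\d{7}-\d{3}-\d{4})/';

preg_match_all($pattern, $input, $m);

// When the optional group does not participate, preg_match_all
// reports it as an empty string, which is the "blank" asked for.
var_dump($m[2]); // "" for the first set, "Third match" for the second
```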
In PHP I have the following string:
$text = "test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
";
I need to get all {option...} out of this string (5 in total in this string). Some have multiple nested brackets in them, and some don't. Some are on the same line, some are not.
I already found this regex:
(\{(?>[^{}]+|(?1))*\})
so the following works fine:
preg_match_all('/(\{(?>[^{}]+|(?1))*\})/imsx', $text, $matches);
The text that's not inside curly brackets is filtered out, but the matches also include the blabla-items, which I don't need.
Is there any way this regex can be changed to only include the option-items?
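This problem is far better suited to a proper parser; however, you can do it with regex if you really want to.
This should work as long as you're not embedding options inside other options. Note that the repeated group is greedy, so a non-option block that directly follows an option on the same line (such as {blabla} in the sample) is absorbed into that match.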
preg_match_all(
'/{option:((?:(?!{option:).)*)}/',
$text,
$matches,
PREG_SET_ORDER
);
Quick explanation.
{option: // literal "{option:"
( // begin capturing group
(?: // don't capture the next bit
(?!{option:). // everything NOT literal "{option:"
)* // zero or more times
) // end capture group
} // literal closing brace
The var_dump output with your sample input looks like:
array(5) {
[0]=>
array(2) {
[0]=>
string(31) "{option:first{A}.Value}{blabla}"
[1]=>
string(22) "first{A}.Value}{blabla"
}
[1]=>
array(2) {
[0]=>
string(24) "{option:second{B}.Value}"
[1]=>
string(15) "second{B}.Value"
}
[2]=>
array(2) {
[0]=>
string(23) "{option:third{C}.Value}"
[1]=>
string(14) "third{C}.Value"
}
[3]=>
array(2) {
[0]=>
string(18) "{option:fourth{D}}"
[1]=>
string(9) "fourth{D}"
}
[4]=>
array(2) {
[0]=>
string(14) "{option:fifth}"
[1]=>
string(5) "fifth"
}
}
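As a variation, the recursive pattern from the question can be tied to the literal {option: prefix, so that only balanced option blocks are matched. This is a sketch and not part of the answer above; the pattern below is my own adaptation, and it assumes braces inside an option are always balanced.

```php
<?php
$text = 'test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
';

// Group 1 is "balanced content": runs of non-brace characters,
// or a nested {...} whose inside is again balanced content, via (?1).
$pattern = '/\{option:((?:[^{}]++|\{(?1)\})*)\}/';

preg_match_all($pattern, $text, $m);

var_dump($m[0]); // the five {option:...} blocks
var_dump($m[1]); // their contents, e.g. "first{A}.Value"
```

Because the closing brace is found by balancing rather than by a greedy scan, the {blabla} block that follows the first option on the same line is not pulled into the match.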
Try this regular expression. It was tested using .NET regular expressions, but it may work with PHP as well:
\{option:.*?{\w}.*?}
Please note: I'm assuming that you have only one pair of brackets inside, and that this pair contains only one alphanumeric character.
I modified your initial expression to search for the string '(option:)' appended with non-whitespace characters (\S*), bounded by curly braces '{}'.
\{(option:)\S*\}
Given your input text, the following entries are matched in regexpal:
test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value} {option:second{B}.Value}
{option:third{C}.Value}
{option:fourth{D}}
{option:fifth}
test 2
If you don't have multiple pairs of brackets on the same level, this should work:
/(\{option:(([^{]*(\{(?>[^{}]+|(?4))*\})[^}]*)|([^{}]+))\})/imsx
Having trouble with a regular expression (they are not my strong suit). I'm trying to match all strings between {{ and }}, but if a set of brackets occurs on the same line, it counts that as a single match... Example:
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";
preg_match_all("/{{(.*)}}/", $string, $matches);
var_dump($matches); // returns arrays with 2 results instead of 3
returns:
array(2) {
[0]=>
array(2) {
[0]=>
string(35) "{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}"
[1]=>
string(17) "{{SHOULD_MATCH3}}"
}
[1]=>
array(2) {
[0]=>
string(31) "SHOULD_MATCH1}} {{SHOULD_MATCH2"
[1]=>
string(13) "SHOULD_MATCH3"
}
}
Any help? Thanks!
Replace the * quantifier with its non-greedy form *?.
This will make it match as little as possible while still allowing the expression to match as a whole, which is different from its current behavior of matching as much as possible.
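Applied to the example from the question, the non-greedy version might look like this. It is a minimal sketch; the braces are escaped here purely for readability.

```php
<?php
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";

// .*? stops at the first "}}" it can reach instead of the last one on the line.
preg_match_all('/\{\{(.*?)\}\}/', $string, $matches);

var_dump($matches[1]); // SHOULD_MATCH1, SHOULD_MATCH2, SHOULD_MATCH3
```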
You can use one of the following patterns.
{{(.+?)}
{{([^}]+)
{{(\w+)
{{([[:digit:][:upper:]_]+)
{{([\p{Lu}\p{N}_]+)
I'm attempting to use regexp to parse a search string that from time to time may contain special syntax. The syntax I'm looking for is [special keyword : value], and I want each match put into an array. Keep in mind that the search string will contain other text that is not intended to be parsed.
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match("/\[{1}.+\:{1}.+\]{1}/", $searchString, $specialKeywords);
var_dump($specialKeywords);
Output:
array(1) { [0]=> string(43) "[StartDate:2010-11-01] [EndDate:2010-11-31]" }
Desired Output:
array(2) { [0]=> string() "[StartDate:2010-11-01]"
[1]=> string() "[EndDate:2010-11-01]"}
Please let me know if I am not being clear enough.
Your .+ matches across the boundaries between the two [...] parts because it matches any character, and as many of them as possible. You could be more restrictive about which characters may be matched. Also {1} is redundant and can be dropped.
/\[[^:]*:[^\]]*\]/
should work more reliably.
Explanation:
\[ # match a [
[^:]* # match any number of characters except :
: # match a :
[^\]]* # match any number of characters except ]
\] # match a ]
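A usage sketch for this approach follows. It is a variation, not the exact pattern above: capturing groups are added, ] is also excluded from the keyword part, and the surrounding free text in $searchString is a hypothetical example. preg_match_all is used instead of preg_match so that every occurrence is returned.

```php
<?php
// Hypothetical search string with unrelated text around the special syntax.
$searchString = "report [StartDate:2010-11-01] sales [EndDate:2010-11-31]";

// Negated character classes keep each match inside one [...] pair.
$pattern = '/\[([^:\]]+):([^\]]*)\]/';

preg_match_all($pattern, $searchString, $sets, PREG_SET_ORDER);

// Build a keyword => value map from the two capturing groups.
$pairs = array();
foreach ($sets as $set) {
    $pairs[$set[1]] = $set[2];
}
var_dump($pairs); // StartDate => 2010-11-01, EndDate => 2010-11-31
```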
This:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
preg_match_all('/\[.*?\]/', $searchString, $match);
print_r($match);
gives the expected result, though I'm not sure whether it satisfies all of your constraints.
Try the following:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match_all("/\[\w+:\d{4}-\d\d-\d\d\]/i", $searchString, $specialKeywords);
var_dump($specialKeywords[0]);
Outputs:
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
Use this regex: "/\[(.*?)\:(.*?)\]{1}/" together with preg_match_all. It will return
array(3) {
[0]=>
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
[1]=>
array(2) {
[0]=>
string(9) "StartDate"
[1]=>
string(7) "EndDate"
}
[2]=>
array(2) {
[0]=>
string(10) "2010-11-01"
[1]=>
string(10) "2010-11-31"
}
}
/\[.+?\:.+?\]/
I suggest this method. It is less complex, but it handles the same input as Tim's.