Matching any amount of words regular expression - php

I'm trying to capture a line with n-number of words that follow a title sequence in PHP, but I cannot capture anything more than the first word. Here are the contents of the file that I am trying to match:
Name: test
Caption: test test test test
And here is the regular expression code and results...
preg_match_all('/([A-z]+:)\s*(\w+)[\r|\r\n|\n]*/', $contents, $array);
Results:
array(3) {
[0]=> array(2) {
[0]=> string(11) "Name: test "
[1]=> string(14) "Caption: test "
}
[1]=> array(2) {
[0]=> string(5) "Name:"
[1]=> string(8) "Caption:"
}
[2]=> array(2) {
[0]=> string(4) "test"
[1]=> string(4) "test"
}
}
Any help would be greatly appreciated.

Assuming that your input data always looks like your example (title segment, colon, words; all on a single line), this should do it:
preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);
This would result in $array[1] matching something like Name:, and then $array[2] would match the rest of the line (you may have to use trim() to strip any leading and/or trailing white space from $array[2]).
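If you only want to capture "words" in the second part, I believe you could change the second capture group to something like the following (using a literal space and tab rather than \s, since \s also matches newlines and would let the group run on into the next line):
preg_match_all('/([A-Za-z]+:)\s*([\w \t]+)/', $contents, $array);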
Note also that you shouldn't use the [A-z] construct, since the ASCII table has six non-alphabetical characters ([, \, ], ^, _ and `) between the upper case letters and the lower case letters. See the ASCII Table for a character map.
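As a quick check of the first pattern, here is a minimal sketch that runs it against the two sample lines from the question. The $titles and $values names are only illustrative, and the input is assumed to be exactly the two lines shown.

```php
<?php
// Sample input copied from the question (assumed to be the whole file).
$contents = "Name: test\nCaption: test test test test\n";

// Group 1: the title and its colon. Group 2: the rest of the line.
// Without the "s" modifier, "." stops at a newline, so (.*) never
// runs past the end of the current line.
preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);

$titles = $array[1];
// trim() drops stray whitespace such as a trailing "\r" from Windows line endings.
$values = array_map('trim', $array[2]);

print_r($titles);
print_r($values);
```

The key point is that (.*) takes the whole remainder of the line, while (\w+) in the original pattern stops at the first space.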

Related

Conditional regex pattern for preg_match_all PHP

I have a pattern. Whenever a specific capturing group is not present, the match is skipped and the regex moves on to find another match, even if that means skipping over the next set.
There are 4 capturing groups:
first group, 2nd group, 3rd group, 4th group
The 3rd group is not always there. In my sample string, there are 3 sets. The first one does not contain any characters for the 3rd group. I want the 3rd group to be conditional: if it does not find any characters, it should capture a blank or a space.
Demo: https://regex101.com/r/zK0aW4/1
it should be like this: https://regex101.com/r/sD4eB7/1
but I don't know how to write a condition for this.
If the third group is not present, it should capture a blank. How do I write this in a regex pattern?
For example:
$string = "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000";
preg_match_all(
"/([A-Z ,\.\-\&#\\\\n\/0-9&]+)(\d{10})([A-Z a-z]+)(\d{1}-\d{7}-\d{3}-\d{4}|\d{1}-\d{7}-\d{2}-\d{4})/",
$string,
$matches
);
This should output something like:
array(5) {
[0]=>
array(3) {
[0]=>
string(78) "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000"
[1]=>
string(84) "\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000"
[2]=>
string(87) "\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000"
}
[1]=>
array(3) {
[0]=>
string(36) "\nTHIS IS FIRST PATTERN 63101"
[1]=>
string(42) "\n4415 THIS IS FIRST \nPATTERN 49401-9528\n"
[2]=>
string(45) "\n11403 THIS IS FIRST PATTERN 49401-\n9595\n"
}
[2]=>
array(3) {
[0]=>
string(10) "0789158126"
[1]=>
string(10) "0406842931"
[2]=>
string(10) "0112853789"
}
[3]=>
array(3) {
[0]=>
string(15) " "
[1]=>
string(15) " Third match "
[2]=>
string(15) " Third match "
}
[4]=>
array(3) {
[0]=>
string(17) "0-0000000-000-0000"
[1]=>
string(17) "0-0000000-000-0000"
[2]=>
string(17) "0-0000000-000-0000"
}
}
Try this: https://regex101.com/r/zK0aW4/2
((?:[A-Z ,.&#\/0-9-]|&|\\n)+?)(\d{10})([A-Z a-z]+)?(\d{1}-\d{7}-\d{3}-\d{4}|\d{1}-\d{7}-\d{2}-\d{4})
Because your first group accepts so many different characters (including digits), the greedy + was extending too far. By changing to a non-greedy or lazy quantifier (*? or +?), it will match as little as possible, which makes it behave better with the patterns that follow. The ? after the third group makes that group optional, so a set without it still matches.
Character classes (surrounded by [ and ]) are for matching single characters; I assumed that you wanted to match only a literal & and \n, so I moved those out of the character class.
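To see a lazy quantifier and an optional third group working together, here is a small sketch on a simplified input. The sample string and the shortened character class are made up for illustration and are not the asker's real data.

```php
<?php
// Hypothetical, simplified records: the second one has no text for group 3.
$string = "ITEM A 1234567890 Note 1-2222222-333-4444\n"
        . "ITEM B 0987654321 5-6666666-777-8888";

$pattern = '/([A-Z 0-9\n]+?)'         // group 1: lazy, gives way as soon as the rest can match
         . '(\d{10})'                 // group 2: exactly ten digits
         . '([A-Z a-z]+)?'            // group 3: optional free text
         . '(\d-\d{7}-\d{3}-\d{4})/'; // group 4: the dashed number

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

foreach ($matches as $set) {
    // trim() turns a whitespace-only third group into an empty string.
    printf("%s | %s | %s\n", $set[2], trim($set[3]), $set[4]);
}
```

With PREG_SET_ORDER each element of $matches is one complete set, which makes it easier to see which third group belongs to which record.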

PHP and RegEx: how to split a string including comma,space,colon to some substring

I'm trying to split a string that can be comma, space or semi-colon delimited. It could also contain a space or spaces after each delimiter. For example:
chr1:22222-333333 or
chr1 22222 333333 or
chr1 22222 333333 or
chr1:22,222-33,333
Any one of these should produce an array with three values, e.g. ["chr1","22222","33333"]. I have tried a few methods, but none of them handles every case, especially the fourth one.
Thank you very much for your help.
$yourString = "chr1:22222-33333"; // for instance
$output = preg_split("/:| |;|-/", $yourString);
This works like explode(), but for cases where you want multiple delimiters.
Explanation of the characters in the preg_split statement:
/ delimits the regular expression, marking where it starts and ends
| acts as an OR operator, as in this OR this OR that
So in the end, /:| |;|-/ means split on anything that is ":" or " " or ";" or "-" (the hyphen is included because your example separates the two numbers with one)
If you want to practice or simply understand the principles of RegEx better, you can have a look at this nice collection of RegEx tutorials.
You can use str_replace with explode:
$str = array('chr1:22222-333333', 'chr1 22222 333333', 'chr1 22222 333333', 'chr1:22,222-33,333');
foreach($str as $val){
var_dump(explode(" ", str_replace(array(',',':','-'), array('',' ', ' '), $val)));
}
This removes every comma, replaces each : and - with a space, and then splits on spaces. It produces:
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(5) "33333"
}
If you value conciseness and want to keep things neat, preg_split is the best way to go, in my opinion.
In the following examples, I assume you want your input separated by commas, spaces or colons:
$splitted = preg_split("/[,: ]/", $string);
If you want to treat tabs as whitespace, you can replace the single space character with \s, which will match tabs as well:
$splitted = preg_split("/[,:\s]/", $string);
Note: \s will match newlines too, in case your input may eventually be a multiline string.
Yet, if you don't trust your input (you don't, right?) and think that consecutive spaces and/or tabs should be treated as a single delimiter, you can add a + quantifier so that a run of delimiters counts as one:
$splitted = preg_split("/[,:\s]+/", $string);
All the forms above work great provided the input you presented. If you want to play with these a little, this is a nice place to do so.
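Putting the ideas from these answers together, the following sketch handles the sample inputs from the question, including the fourth case. It assumes that commas are thousands separators (as in 22,222) rather than delimiters; splitRegion is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: split "chr1:22222-333333"-style input into its parts.
function splitRegion(string $input): array
{
    // Assumption: commas only group digits, so remove them before splitting.
    $normalized = str_replace(',', '', trim($input));

    // Split on runs of ":", ";", "-" or whitespace; the "+" makes repeated
    // delimiters count as one, and PREG_SPLIT_NO_EMPTY drops empty pieces.
    return preg_split('/[:;\s-]+/', $normalized, -1, PREG_SPLIT_NO_EMPTY);
}

foreach (['chr1:22222-333333', 'chr1 22222   333333', 'chr1:22,222-33,333'] as $sample) {
    print_r(splitRegion($sample));
}
```

If commas can also act as real delimiters in your data, this approach would not be able to tell the two uses apart, so the comma handling would need to change.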

PHP: regex to match complete matching brackets?

In PHP I have the following string:
$text = "test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
";
I need to get all {option...} out of this string (5 in total in this string). Some have multiple nested brackets in them, and some don't. Some are on the same line, some are not.
I already found this regex:
(\{(?>[^{}]+|(?1))*\})
so the following works fine :
preg_match_all('/(\{(?>[^{}]+|(?1))*\})/imsx', $text, $matches);
The text that's not inside curly brackets is filtered out, but the matches also include the blabla-items, which I don't need.
Is there any way this regex can be changed to only include the option-items?
This problem is far better suited to a proper parser; however, you can do it with regex if you really want to.
This should work as long as you're not embedding options inside other options.
preg_match_all(
'/{option:((?:(?!{option:).)*)}/',
$text,
$matches,
PREG_SET_ORDER
);
Quick explanation.
{option: // literal "{option:"
( // begin capturing group
(?: // don't capture the next bit
(?!{option:). // everything NOT literal "{option:"
)* // zero or more times
) // end capture group
} // literal closing brace
var_dumped output with your sample input looks like:
array(5) {
[0]=>
array(2) {
[0]=>
string(23) "{option:first{A}.Value}"
[1]=>
string(14) "first{A}.Value"
}
[1]=>
array(2) {
[0]=>
string(24) "{option:second{B}.Value}"
[1]=>
string(15) "second{B}.Value"
}
[2]=>
array(2) {
[0]=>
string(23) "{option:third{C}.Value}"
[1]=>
string(14) "third{C}.Value"
}
[3]=>
array(2) {
[0]=>
string(18) "{option:fourth{D}}"
[1]=>
string(9) "fourth{D}"
}
[4]=>
array(2) {
[0]=>
string(14) "{option:fifth}"
[1]=>
string(5) "fifth"
}
}
Try this regular expression - it was tested using .NET regular expressions, it may work with PHP as well:
\{option:.*?{\w}.*?}
Please note: I'm assuming that you have only 1 pair of brackets inside, and that inside that pair you have only 1 alphanumeric character.
I modified your initial expression to search for the string '(option:)' appended with non-whitespace characters (\S*), bounded by curly braces '{}'.
\{(option:)\S*\}
Given your input text, the following entries are matched in regexpal:
test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value} {option:second{B}.Value}
{option:third{C}.Value}
{option:fourth{D}}
{option:fifth}
test 2
If you don't have multiple pairs of brackets on the same level, this should work:
/(\{option:(([^{]*(\{(?>[^{}]+|(?4))*\})[^}]*)|([^{}]+))\})/imsx
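Another possibility, sketched here under the assumption that braces inside an option are always balanced, is to keep the recursive idea from the question but anchor it to the literal {option: prefix. The group name body is my own choice; the sample text is the one from the question.

```php
<?php
// Sample text from the question.
$text = 'test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
';

// "body" is any mix of non-brace characters and balanced {...} pairs;
// (?&body) calls the group recursively for the nested pairs.
// The possessive ++ keeps the engine from retrying shorter runs on failure.
$pattern = '/\{option:(?<body>(?:[^{}]++|\{(?&body)\})*)\}/';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
```

Because the closing brace of each option is found by counting balanced pairs rather than by a greedy or lazy dot, a neighbouring block such as {blabla} on the same line is not pulled into the match.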

PHP preg_match_all same line

Having trouble with a regular expression (they are not my strong suit). I'm trying to match all strings between {{ and }}, but if a set of brackets occurs on the same line, it counts that as a single match... Example:
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";
preg_match_all("/{{(.*)}}/", $string, $matches);
var_dump($matches); // returns arrays with 2 results instead of 3
returns:
array(2) {
[0]=>
array(2) {
[0]=>
string(35) "{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}"
[1]=>
string(17) "{{SHOULD_MATCH3}}"
}
[1]=>
array(2) {
[0]=>
string(31) "SHOULD_MATCH1}} {{SHOULD_MATCH2"
[1]=>
string(13) "SHOULD_MATCH3"
}
}
Any help? Thanks!
Replace the * quantifier with its non-greedy form *?.
This will make it match as little as possible while still allowing the expression to match as a whole, which is different from its current behavior of matching as much as possible.
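A minimal sketch comparing the two quantifiers on the string from the question. The braces are escaped here only as a precaution; the $greedy and $lazy names are illustrative.

```php
<?php
$string = "\nHello, kind sir\n{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}\nwelcome to\n{{SHOULD_MATCH3}}\n";

// Greedy: .* runs to the last "}}" on the line, merging two placeholders.
preg_match_all('/\{\{(.*)\}\}/', $string, $greedy);

// Lazy: .*? stops at the first "}}" it can reach.
preg_match_all('/\{\{(.*?)\}\}/', $string, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

The greedy version finds two captures and the lazy version finds three, which is the difference described above.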
You can use one of the following patterns.
{{(.+?)}
{{([^}]+)
{{(\w+)
{{([[:digit:][:upper:]_]+)
{{([\p{Lu}\p{N}_]+)

preg_match not returning expected results

I'm attempting to use regexp to parse a search string that from time to time may contain special syntax. The syntax I'm looking for is [special keyword : value] and I want each match put into an array. Keep in mind that the search string will contain other text that is not intended to be parsed.
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match("/\[{1}.+\:{1}.+\]{1}/", $searchString, $specialKeywords);
var_dump($specialKeywords);
Output:
array(1) { [0]=> string(43) "[StartDate:2010-11-01] [EndDate:2010-11-31]" }
Desired Output:
array(2) { [0]=> string() "[StartDate:2010-11-01]"
[1]=> string() "[EndDate:2010-11-01]"}
Please let me know if I am not being clear enough.
Your .+ matches across the boundaries between the two [...] parts because it matches any character, and as many of them as possible. You could be more restrictive about which characters may be matched. Also {1} is redundant and can be dropped.
/\[[^:]*:[^\]]*\]/
should work more reliably.
Explanation:
\[ # match a [
[^:]* # match any number of characters except :
: # match a :
[^\]]* # match any number of characters except ]
\] # match a ]
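Here is a minimal sketch of how that pattern could be used with preg_match_all to collect every keyword/value pair. Two things are my own assumptions rather than part of the answer above: capture groups were added, and the first class is [^:\]] instead of [^:] so that the keyword cannot run across a closing bracket. The surrounding words in the sample string are made up.

```php
<?php
// Search string with extra text around the special syntax (hypothetical).
$searchString = "find this [StartDate:2010-11-01] and [EndDate:2010-11-31] please";

// Group 1: keyword (no ":" or "]"). Group 2: value (no "]").
preg_match_all('/\[([^:\]]*):([^\]]*)\]/', $searchString, $sets, PREG_SET_ORDER);

// Build a keyword => value map from the captured pairs.
$specialKeywords = [];
foreach ($sets as $set) {
    $specialKeywords[$set[1]] = $set[2];
}

print_r($specialKeywords);
```

Negated character classes cannot cross the delimiter they exclude, so there is no need to rely on lazy quantifiers here.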
This:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
preg_match_all('/\[.*?\]/', $searchString, $match);
print_r($match);
gives the expected result, though I'm not sure it meets all of your constraints.
Try the following:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match_all("/\[\w+:\d{4}-\d\d-\d\d\]/i", $searchString, $specialKeywords);
var_dump($specialKeywords[0]);
Outputs:
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
Use this regex: "/\[(.*?)\:(.*?)\]{1}/" and also use preg_match_all, it will return
array(3) {
[0]=>
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
[1]=>
array(2) {
[0]=>
string(9) "StartDate"
[1]=>
string(7) "EndDate"
}
[2]=>
array(2) {
[0]=>
string(10) "2010-11-01"
[1]=>
string(10) "2010-11-31"
}
}
/\[.+?\:.+?\]/
I suggest this pattern. It is less complex, but it handles the same input as Tim's.
