"Optional" substring matching with regex - php

I am writing a regular expression in PHP that will need to extract data from strings that look like:
Naujasis Salemas, Šiaurės Dakota
Jungtinės Valstijos (Centras, Šiaurės Dakota)
I would like to extract:
Naujasis Salemas
Centras
For the first case, I have written [^-]*(?=,), which works quite well. I would like to modify the expression so that if there are parenthesis ( and ) , it should search between those parenthesis and then extract everything before the comma.
Is it possible to do something like this with just 1 expression? If so, how can I make it search within parenthesis if they exist?

A conditional might help you here:
$stra = 'Naujasis Salemas, Šiaurės Dakota';
$strb = 'Jungtinės Valstijos (Centras, Šiaurės Dakota)';
$regex = '
/^ # Anchor at start of string.
(?(?=.*\(.+,.*\)) # Condition to check for: presence of text in parenthesis.
.*\(([^,]+) # If condition matches, match inside parenthesis to first comma.
| ([^,]+) # Else match start of string to first comma.
)
/x
';
preg_match($regex, $stra, $matches) and print_r($matches);
/*
Array
(
[0] => Naujasis Salemas
[1] =>
[2] => Naujasis Salemas
)
*/
preg_match($regex, $strb, $matches) and print_r($matches);
/*
Array
(
[0] => Jungtinės Valstijos (Centras
[1] => Centras
)
*/
Note that the index in $matches changes slightly above, but you might be able to work around that using named subpatterns.

I think this one could do it:
[^-(]+(?=,)
This is the same regex as your, but it doesn't allow a parenthesis in the matched string. It will still match on the first subject, and on the second it will match just after the opening parenthesis.
Try it here: http://ideone.com/Crhzz

You could use
[^(),]+(?=,)
That would match any text except commas or parentheses, followed by a comma.

Related

Regex for find value between curly braces which have pipe separator

$str = ({max_w} * {max_h} * {key|value}) / {key_1|value}
I have the above formula, I want to match the value with curly braces and which has a pipe separator. Right now the issue is it's giving me the values which have not pipe separator. I am new in regex so not have much idea about that. I tried below one
preg_match_all("^\{(|.*?|)\}^",$str, PREG_PATTERN_ORDER);
It gives below output
Array
(
[0] => key|value
[1] => max_w
[2] => max_h
[3] => key_1|value
)
Expected output
Array
(
[0] => key|value
[1] => key_1|value
)
Not sure about PHP. Here's the general regex that will do this.
{([^{}]*\|[^{}]*)}
Here is the demo.
You can use
(?<={)[^}]*\|[^}]*(?=})
For the given string the two matches are shown by the pointy characters:
({max_w} * {max_h} * {key|value}) / {key_1|value}
^^^^^^^^^ ^^^^^^^^^^^
Demo
(?<={) is a positive lookbehind. Arguably, the positive lookahead (?=}) is not be needed if it is known that all braces appear in matching, non-overlapping pairs.
The pattern \{(|.*?|)\} has 2 alternations | that can be omitted as the alternatives on the left and right of it are not really useful.
That leaves \{(.*?)} where the . can match any char including a pipe char, and therefore does not make sure that it is matched in between.
You can use a pattern that does not crosses matching a curly or a pipe char to match a single pipe in between.
{\K[^{}|]*\|[^{}|]*(?=})
{ Match opening {
\K Forget what is matches until now
[^{}|]* Match any char except the listed
\| Match a | char
[^{}|]* Match any char except the listed
(?=}) Assert a closing } to the right
Regex demo | PHP demo
$str = "({max_w} * {max_h} * {key|value}) / {key_1|value}";
$pattern = "/{\K[^{}|]*\|[^{}|]*(?=})/";
preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => key|value
[1] => key_1|value
)
Or using a capture group:
{([^{}|]*\|[^{}|]*)}
Regex demo

Trying to create a regex in PHP that matches patterns inside a pattern

I have seen some regex examples where the string is "Test string: Group1Group2", and using preg_match_all(), matching for patterns of text that exists inside the tags.
However, what I am trying to do is a bit different, where my string is something like this:
"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"
What I want to do is match the sections such as 'variable=123' that exist inside the parenthesis.
What I have so far is this:
if( preg_match_all("/\(([^\)]*?)\)"), $string_value, $matches )
{
print_r( $matches[1] );
}
But this just captures everything that's inside the parenthesis, and doesn't match anything else.
Edit:
The desired output would be:
"variable1=123"
"variable2=743"
"variable3=535"
The output that I am getting is:
"variable1=123,variable2=743,variable3=535"
You can extract the matches you need with a single call to preg_match_all if the matches do not contain (, ) or ,:
$s = '"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"';
if (preg_match_all('~(?:\G(?!\A),|\()\K[^,]+(?=[^()]*\))~', $s, $matches)) {
print_r($matches[0]);
}
See the regex demo and a PHP demo.
Details:
(?:\G(?!\A),|\() - either end of the preceding successful match and a comma, or a ( char
\K - match reset operator that discards all text matched so far from the current overall match memory buffer
[^,]+ - one or more chars other than a comma (use [^,]* if you expect empty matches, too)
(?=[^()]*\)) - a positive lookahead that requires zero or more chars other than ( and ) and then a ) immediately to the right of the current location.
I would do this:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
$result = explode(",", $matches[1]);
If your end result is an array of key => value then you can transform it into a query string:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
parse_str(str_replace(',', '&', $matches[1]), $result);
Which yields:
Array
(
[variable1] => 123
[variable2] => 743
[variable3] => 535
)
Or replace with a newline \n and use parse_ini_string().

Matching whole words between commas, or a comma at the beginning, or a comma at the end with Regex

I have a string like this:
page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags
I made this regex that I expect to get the whole tags with:
(?<=\,)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=\,)
I want it to match all the ocurrences.
In this case:
page-9000 and rss-latest.
This regex checks whole words between commas just fine but it ignores the first and the last because it's not between commas (obviously).
I've also tried that it checks if it's between commas OR one comma at the beginning OR one comma to the end, however it would give me false positives, as it would match:
category-128
while the string contains:
page-category-128
Any help?
Try using the following pattern:
(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)
The only change I have made is to add boundary markers ^ and $ to the lookarounds to also match on the start and end of the input.
Script:
$input = "page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags";
preg_match_all("/(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)/", $input, $matches);
print_r($matches[1]);
This prints:
Array
(
[0] => page-9000
[1] => rss-latest
)
Here is a non-regex way using explode and array_intersect:
$arr1 = explode(',', 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags');
$arr2 = explode('|', 'rss-latest|listing-latest-no-category|category-128|page-9000');
print_r(array_intersect($arr1, $arr2));
Output:
Array
(
[0] => page-9000
[6] => rss-latest
)
The (?<=\,) and (?=,) require the presence of , on both sides of the matching pattern. You want to match also at the start/end of string, and this is where you need to either explicitly tell to match either , or start/end of string or use double-negating logic with negated character classes inside negative lookarounds.
You may use
(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])
See the regex demo
Here, (?<![^,]) matches the start of string position or a , and (?![^,]) matches the end of string position or ,.
Now, you do not even need a capturing group, you may get rid of its overhead using a non-capturing group, (?:...). preg_match_all won't have to allocate memory for the submatches and the resulting array will be much cleaner.
PHP demo:
$re = '/(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])/m';
$str = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
// => Array ( [0] => page-9000 [1] => rss-latest )

Regex match that exclude certain pattern

I want to split the string around for comma(,) or &. This is simple but I want to stop the match for any content between brackets.
For example if I run on
sleeping , waking(sit,stop)
there need to be only one split and two elements
thanks in advance
This is a perfect example for the (*SKIP)(*FAIL) mechanism PCRE (and thus PHP) offers.
You could come up with the following code:
<?php
$string = 'sleeping , waking(sit,stop)';
$regex = '~\([^)]*\)(*SKIP)(*FAIL)|[,&]~';
# match anything between ( and ) and discard it afterwards
# instead match any of the characters found on the right in square brackets
$parts = preg_split($regex, $string);
print_r($parts);
/*
Array
(
[0] => sleeping
[1] => waking(sit,stop)
)
*/
?>
This will split any , or & which is not in parentheses.

php RegEx extract values from string

I am new to regular expressions and I am trying to extract some specific values from this string:
"Iban: EU4320000713864374\r\nSwift: DTEADCCC\r\nreg.no 2361 \r\naccount no. 1234531735"
Values that I am trying to extract:
EU4320000713864374
2361
This is what I am trying to do now:
preg_match('/[^Iban: ](?<iban>.*)[^\\r\\nreg.no ](?<regnr>.*)[^\\r\\n]/',$str,$matches);
All I am getting back is null or empty array. Any suggestions would be highly appreciated
The square brackets make no sense, you perhaps meant to anchor at the beginning of a line:
$result = preg_match(
'/^Iban: (?<iban>.*)\R.*\R^reg.no (?<regnr>.*)/m'
, $str, $matches
);
This requires to set the multi-line modifier (see m at the very end). I also replaced \r\n with \R so that this handles all kind of line-separator sequences easily.
Example: https://eval.in/47062
A slightly better variant then only captures non-whitespace values:
$result = preg_match(
'/^Iban: (?<iban>\S*)\R.*\R^reg.no (?<regnr>\S*)/m'
, $str, $matches
);
Example: https://eval.in/47069
Result then is (beautified):
Array
(
[0] => "Iban: EU4320000713864374
Swift: DTEADCCC
reg.no 2361"
[iban] => "EU4320000713864374"
[1] => "EU4320000713864374"
[regnr] => "2361"
[2] => "2361"
)
preg_match("/Iban: (\\S+).*reg.no (\\S+)/s", $str, $matches);
There is a specific feature about newlines: dot (.) does not match newline character unless s flag is specified.

Categories