PHP Regex in recursion formula

I have a personal expression:
GREATER (5.26; 7; LESSER (3.00; 6; GREATER (7; 8)))
I would like a regex or a function that returns the contents of the parentheses that follow a given keyword. For example, passing "GREATER" as the keyword, it would return an array with
[0] => 5.26; 7
[1] => 7; 8
I'm using this regex preg_match_all("/\((([^()]*|(?R))*)\)/", $valor, $matches); but it does not return the correct result.
Can anyone shed some light on this regex?

I would recommend the regex /(?<=GREATER\s\()([\d.]+)(?:;\s)([\d.]+)/g.
Breaking this down:
(?<=GREATER\s\() - Does a positive lookbehind on GREATER (
([\d.]+) - Grabs any digits and dots that follow this, and groups them
(?:;\s) - Matches but doesn't capture the ; and space
([\d.]+) - Grabs any digits and dots that follow this, and groups them
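The global flag (g) is required on online testers for this to target each of the sets. PHP itself has no g modifier; use preg_match_all() with the pattern minus the g instead.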
Run against the string GREATER (5.26; 7; LESSER (3.00; 6; GREATER (7; 8)))
This gives:
array(3
0 => array(2
0 => 5.26; 7
1 => 7; 8
)
1 => array(2
0 => 5.26
1 => 7
)
2 => array(2
0 => 7
1 => 8
)
)
This gives you access to the 'combined' pair that follows each GREATER in the first grouping, and separates the individual values out in the subsequent groupings (allowing for easy access).
This can be seen working at PHPLiveRegex here.
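For use in PHP itself, something along these lines might work. It is only a minimal sketch: the function name extractArgs is made up for illustration, and it assumes every argument pair is two plain numbers separated by "; " as in the sample. preg_match_all() plays the role of the global flag.

```php
<?php
// Hypothetical helper: returns the "a; b" pair that directly follows each
// occurrence of $keyword followed by " (" in $expression.
function extractArgs(string $keyword, string $expression): array
{
    // preg_quote() keeps regex metacharacters in the keyword literal.
    // The lookbehind stays fixed-length: keyword + one whitespace + "(".
    $pattern = '/(?<=' . preg_quote($keyword, '/') . '\s\()([\d.]+);\s([\d.]+)/';
    preg_match_all($pattern, $expression, $matches);
    return $matches[0]; // full matches only, e.g. "5.26; 7"
}

$valor = 'GREATER (5.26; 7; LESSER (3.00; 6; GREATER (7; 8)))';
print_r(extractArgs('GREATER', $valor)); // the two pairs: "5.26; 7" and "7; 8"
print_r(extractArgs('LESSER', $valor));  // the single pair: "3.00; 6"
```

Because the pattern only looks at the two numbers right after the keyword, it does not need recursion; it would not return a nested call such as LESSER (...) as an argument.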

Related

Parse strictly formatted text containing multiple entries with no delimiting character

I have a string containing multiple products orders which have been joined together without a delimiter.
I need to parse the input string and convert sets of three substrings into separate rows of data.
I tried splitting the string using split() and strstr() function, but could not generate the desired result.
How can I convert this statement into different columns?
RM is Malaysian Ringgit
From this statement:
"2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6"
Into separate rows:
2 x Brew Coffeee Panas: RM7.4
2 x Tongkat Ali Ais: RM8.6
And these 2 rows into this table in the DB:
Table: Products
Product Name | Quantity | Total Amount (RM)
Brew Coffeee Panas | 2 | 7.4
Tongkat Ali Ais | 2 | 8.6
*Note: the "total amount" substrings will reliably have a numeric value with precision to one decimal place.
You could use regex if your string format is consistent. Here's an expression that could do that:
(\d) x (.+?): RM(\d+\.\d)
Basic usage
$re = '/(\d) x (.+?): RM(\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_export($matches);
Which gives
array (
0 =>
array (
0 => '2 x Brew Coffeee Panas: RM7.4',
1 => '2',
2 => 'Brew Coffeee Panas',
3 => '7.4',
),
1 =>
array (
0 => '2 x Tongkat Ali Ais: RM8.6',
1 => '2',
2 => 'Tongkat Ali Ais',
3 => '8.6',
),
)
Group 0 will always be the full match; after that, the groups are quantity, product and price.
Try it online
Capture one or more digits
Match the space, x, space
Capture one or more non-colon characters until the first occurring colon
Match the colon, space, then RM
Capture the float value that has a max decimal length of 1 (the OP says in a comment under the question: "it only take one decimal place for the amount")
There are no "lazy quantifiers" in my pattern, so the regex can move most swiftly.
This regex pattern is as Accurate as the sample data and requirement explanation allows, as Efficient as it can be because it only contains greedy quantifiers, as Concise as it can be thanks to the negated character class, and as Readable as the pattern can be made because there are no superfluous characters.
Code: (Demo)
$string = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
var_export(
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $m)
? array_slice($m, 1) // omit the fullstring matches
: [] // if there are no matches
);
Output:
array (
0 =>
array (
0 => '2',
1 => '2',
),
1 =>
array (
0 => 'Brew Coffeee Panas',
1 => 'Tongkat Ali Ais',
),
2 =>
array (
0 => '7.4',
1 => '8.6',
),
)
You can add the PREG_SET_ORDER argument to the preg_match_all() call to aid in iterating the matches as rows.
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
echo '<tr><td>' . implode('</td><td>', array_slice($match, 1)) . '</td></tr>';
}
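If the goal is rows ready for a database insert rather than HTML, the same pattern can be given named groups. This is only a sketch: the array keys product_name, quantity and total_amount are assumed column names, not something stated in the question.

```php
<?php
$string = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';

// Same pattern as above, with named groups so each row is self-describing.
$pattern = '~(?<qty>\d+) x (?<name>[^:]+): RM(?<amount>\d+\.\d)~';
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    $rows[] = [
        'product_name' => $m['name'],           // assumed column name
        'quantity'     => (int) $m['qty'],      // assumed column name
        'total_amount' => (float) $m['amount'], // assumed column name
    ];
}

var_export($rows);
```

Each element of $rows can then be bound to a prepared INSERT statement with whatever database layer is in use.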
You can use a regex like this:
/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/
Explanation:
(\d+) captures one or more digits
\s matches a whitespace character
([^:]+): captures one or more non : characters that come before a : character (you can also use something like [a-zA-Z0-9\s]+): if you know exactly which characters can exist before the : character - in this case lower case and upper case letters, digits 0 through 9 and whitespace characters)
(\d+\.?\d?) captures one or more digits, followed by a . and another digit if they exist
(?=\d|$) is a positive lookahead which matches a digit after the main expression without including it in the result, or the end of the string
You can also add the PREG_SET_ORDER flag to preg_match_all() to group the results:
PREG_SET_ORDER
Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on.
Code example:
<?php
$txt = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.62 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5";
$pattern = "/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/";
if(preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
Output:
Array
(
[0] => Array
(
[0] => 2 x Brew Coffeee Panas: RM7.4
[1] => 2
[2] => Brew Coffeee Panas
[3] => 7.4
)
[1] => Array
(
[0] => 2 x Tongkat Ali Ais: RM8.6
[1] => 2
[2] => Tongkat Ali Ais
[3] => 8.6
)
[2] => Array
(
[0] => 2 x B026 Kopi Hainan Kecil: RM31
[1] => 2
[2] => B026 Kopi Hainan Kecil
[3] => 31
)
[3] => Array
(
[0] => 2 x B006 Kopi Hainan Besar: RM19.5
[1] => 2
[2] => B006 Kopi Hainan Besar
[3] => 19.5
)
)
See it live here php live editor and here regex tester.
The first thing I would do is perform a simple replacement using preg_replace to insert a delimiter, with the aid of a back-reference to the captured item, based upon the known format of a single decimal place. Anything beyond that single decimal place forms part of the next item - the quantity in this case.
$str="2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4";
# qty price
# 2 7.4
# 2 8.6
# 25 15.2
# 3 32.4
/*
Our RegEx to find the decimal precision,
to split the string apart and the quantity
*/
$pttns=(object)array(
'repchar' => '#(RM\d{1,}\.\d{1})#',
'splitter' => '#(\|)#',
'combo' => '#^((\d{1,}) x)(.*): RM(\d{1,}\.\d{1})$#'
);
# create a new version of the string with our specified delimiter - the PIPE
$str = preg_replace( $pttns->repchar, '$1|', $str );
# split the string into pieces - discard empty items
$a=array_filter( preg_split( $pttns->splitter, $str, -1 ) );
# iterate through matches - find the quantity, item & price
foreach($a as $str){
preg_match($pttns->combo,$str,$matches);
$qty=$matches[2];
$item=trim($matches[3]); # the capture starts with the space after "x"
$price=$matches[4];
# %.1f keeps the single decimal place; %d would truncate 7.4 to 7
printf('%s %d %.1f<br />',$item,$qty,$price);
}
Which yields:
Brew Coffeee Panas 2 7.4
Tongkat Ali Ais 2 8.6
Koala Kebabs 25 15.2
Fried Squirrel Fritters 3 32.4

Regex to split string at the last occurrence of a dot, dash or underscore

We have thousands of rows of data containing article numbers in all sorts of formats, and I need to split the main article number from a size indicator. There is (almost) always a dot, dash or underscore before the last few characters (not always 2).
In short: the data is main article number + size indicator; the separator differs but is always one of three characters: . - _
Question: how do I split the main article number from the size indicator? The regex below, which I built based on some Googling, isn't working.
preg_match('/^(.*)[\.-_]([^\.-_]+)$/', $sku, $matches);
Sample data + expected result
AR.110052.15-40 [AR.110052.15 & 40]
BI.533.41-41 [BI.533.41 & 41]
CG.00554.000-39 [CG.00554.000 & 39]
LL.PX00.SC004-40 [LL.PX00.SC004 & 40]
LOS.HAPPYSOCKS.1X [LOS.HAPPYSOCKS & 1X]
MI.PMNH300043-XXXXL [MI.PMNH300043 & XXXXL]
You need to move the - to the end of character class to make the regex engine parse it as a literal hyphen:
^(.*)[._-]([^._-]+)$
See the regex demo. Actually, even ^(.+)[._-](.+)$ will work.
^ - matches the start of string
(.*) - Group 1 capturing any 0+ chars as many as possible up to the last...
[._-] - either . or _ or -
([^._-]+) - Group 2: one or more chars other than ., _ and -
$ - end of string.
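Wrapped into a small function, the corrected pattern could be used roughly like this. The name splitSku is hypothetical, and returning null for a SKU without any separator is an assumption about how the "almost always" cases should be handled.

```php
<?php
// Hypothetical helper: splits a SKU at its last ".", "_" or "-".
// Returns [main article number, size indicator], or null if no separator is found.
function splitSku(string $sku): ?array
{
    // The hyphen sits last in each character class, so it is a literal
    // hyphen rather than the start of a range.
    if (preg_match('/^(.*)[._-]([^._-]+)$/', $sku, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

foreach (['AR.110052.15-40', 'LOS.HAPPYSOCKS.1X', 'MI.PMNH300043-XXXXL'] as $sku) {
    [$main, $size] = splitSku($sku);
    echo $main, ' & ', $size, "\n";
}
```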
Use preg_split() instead of preg_match() because:
this isn't a validation task, it is an extraction task and
preg_split() returns the exact desired array compared to preg_match() which carries the unnecessary fullstring match in its returned array.
Limit the number of elements produced (like you would with explode()'s limit parameter).
No capture groups are needed at all.
Greedily match zero or more characters, then just before matching the latest occurring delimiter, restart the fullstring match with \K. This will effectively use the matched delimiter as the character to explode on and it will be "lost" in the explosion.
Code: (Demo)
$strings = [
'AR.110052.15-40',
'BI.533.41-41',
'CG.00554.000-39',
'LL.PX00.SC004-40',
'LOS.HAPPYSOCKS.1X',
'MI.PMNH300043-XXXXL',
];
foreach ($strings as $string) {
var_export(preg_split('~.*\K[._-]~', $string, 2));
echo "\n";
}
Output:
array (
0 => 'AR.110052.15',
1 => '40',
)
array (
0 => 'BI.533.41',
1 => '41',
)
array (
0 => 'CG.00554.000',
1 => '39',
)
array (
0 => 'LL.PX00.SC004',
1 => '40',
)
array (
0 => 'LOS.HAPPYSOCKS',
1 => '1X',
)
array (
0 => 'MI.PMNH300043',
1 => 'XXXXL',
)

PHP Regex Matching Multiple Options

I am attempting to write some code that looks for the following:
Yesterday
Last 7 Days
Last 30 Days
This Year
Last Year
I have the following regex:
/yesterday|(\d+)(?=\s+(\w+))|(\w+)(?=\s+(year))/i
using:
preg_match("/yesterday|(\d+)(?=\s+(\w+))|(\w+)(?=\s+(year))/i", $input, $output)
I get the following results using phpliveregex.com with the preg_match:
array(5
0 => Last
1 =>
2 =>
3 => Last
4 => Year
)
array(5
0 => This
1 =>
2 =>
3 => This
4 => year
)
array(1
0 => yesterday
)
array(3
0 => 30
1 => 30
2 => days
)
array(3
0 => 7
1 => 7
2 => days
)
My issue is with the 'Year' options and the fact that they have empty keys because I want to refer to $output[1] and $output[2] to get the interval and 'span' (days). Only a single string will be passed at a time so it will be one of the options listed above and not multiple options to look for at once.
If anyone can help me find the best solution to return 'yesterday' or ('7' and 'days') or ('30' and 'days') or ('This' and 'Year') or ('Last' and 'Year') I would appreciate it very much!
EDIT
This is my desired output:
'Yesterday'
$output[0] => 'Yesterday'
'Last 7 Days'
$output[0] => '7'
$output[1] => 'Days'
'Last 30 Days'
$output[0] => '30'
$output[1] => 'Days'
'This Year'
$output[0] => 'This'
$output[1] => 'Year'
'Last Year'
$output[0] => 'Last'
$output[1] => 'Year'
I am trying to capture the 'groups' necessary to process the rest of my code.
You can use the branch reset feature to avoid empty groups:
$text = <<<'EOD'
Yesterday
Last 7 Days
Last 30 Days
This Year
Last Year
EOD;
$pattern = '~\b(?|yesterday\b|\d+(?= (days\b))|\w+(?= (year\b)))~i';
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER))
print_r($matches);
// or preg_match without PREG_SET_ORDER if you test the strings one by one
pattern details:
\b
(?| # open the branch reset group
yesterday \b # when this branch succeeds the capture group is not defined
|
\d+ (?=[ ](days\b)) # in each branch the capture group
|
\w+ (?=[ ](year\b)) # has the same number
) # (so there is only one capture group)
result:
Array
(
[0] => Array
(
[0] => Yesterday
)
[1] => Array
(
[0] => 7
[1] => Days
)
[2] => Array
(
[0] => 30
[1] => Days
)
[3] => Array
(
[0] => This
[1] => Year
)
[4] => Array
(
[0] => Last
[1] => Year
)
)
Note that when you build the branch reset, you must begin with the alternatives that have no groups, then the alternatives with one group, then two groups, etc.; otherwise you may obtain useless empty groups in the result.
Note too that the group 0 isn't really a capture group but it is the whole match.
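Since only one string is passed at a time, the branch reset pattern can also be used with plain preg_match(). A minimal sketch, where the function name parseRange and the null return for unrecognised input are assumptions made for illustration:

```php
<?php
// Hypothetical helper: returns ['Yesterday'], ['7', 'Days'], ['This', 'Year'], ...
// or null when the input is not one of the expected phrases.
function parseRange(string $input): ?array
{
    // Branch reset (?|...): the group in every alternative is numbered 1.
    $pattern = '~\b(?|yesterday\b|\d+(?= (days\b))|\w+(?= (year\b)))~i';
    return preg_match($pattern, $input, $m) ? $m : null;
}

foreach (['Yesterday', 'Last 7 Days', 'Last 30 Days', 'This Year', 'Last Year'] as $input) {
    echo $input, ' => ', implode(', ', parseRange($input)), "\n";
}
```

Index 0 holds the interval (or the word yesterday) and index 1, when present, holds the span, which lines up with the desired output in the question.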
You can use:
/((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)/
Matches:
MATCH 1
1. [0-9] `Yesterday`
MATCH 2
1. [10-21] `Last 7 Days`
MATCH 3
1. [22-34] `Last 30 Days`
MATCH 4
1. [35-44] `This Year`
MATCH 5
1. [45-54] `Last Year`
Regex Demo:
https://regex101.com/r/mA8jZ5/1
Regex Explanation:
/((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)/
1st Capturing group ((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)
1st Alternative: (?:Last|This)\s+(?:\d+\s+Days|Year)
(?:Last|This) Non-capturing group
1st Alternative: Last
Last matches the characters Last literally (case sensitive)
2nd Alternative: This
This matches the characters This literally (case sensitive)
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?:\d+\s+Days|Year) Non-capturing group
1st Alternative: \d+\s+Days
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
Days matches the characters Days literally (case sensitive)
2nd Alternative: Year
Year matches the characters Year literally (case sensitive)
2nd Alternative: Yesterday
Yesterday matches the characters Yesterday literally (case sensitive)
What you just described can be achieved with the following regex (used with the case-insensitive i flag, since the alternatives are written in lower case):
(yesterday|\d+(?=\s+\w+)|\w+(?=\s+year))\s*(\w*)$
Tested on Regex101.com.

Regex (preg_split): how do I split based on a delimiter, excluding delimiters included in a pair of quotes?

I want to split this:
1 2 3 4/5/6 "7/8 9" 10
into this:
1
2
3
4
5
6
"7/8 9"
10
with preg_split()
So my question is, how do I split based on a delimiter, excluding delimiters inside a pair of quotes?
I kind of want to avoid capturing the things in quotes first and would ideally like it to be a one liner.
You can use the following.
$text = '1 2 3 4/5/6 "7/8 9" 10';
$results = preg_split('~"[^"]*"(*SKIP)(*F)|[ /]+~', $text);
print_r($results);
Explanation:
On the left side of the alternation operator, we match anything in quotation marks and then make that subpattern fail with (*F); the (*SKIP) backtracking control verb forces the regular expression engine to skip past the quoted substring instead of retrying inside it. The right side of the alternation operator matches one or more space characters or forward slashes that are not in quotations.
Output
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
You can use:
$s = '1 2 3 4/5/6 "7/8 9" 10';
$arr = preg_split('~("[^"]*")|[ /]+~', $s, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
print_r( $arr );
OUTPUT:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
Another way, with an optional group:
$arr = preg_split('~(?:"[^"]*")?\K[/\s]+~', $s);
The pattern "[^"]*"[/\s]+ matches a quoted part followed by one or more spaces and slashes. But since you don't want to remove quoted parts, you put a \K after it. The \K removes everything that has been matched on its left from the match result. With this trick, when a quoted part is found, the regex engine returns only the spaces or slashes after it and splits on them.
Since there is not always a quoted part before a space or a slash, you only need to make it optional with a non-capturing group (?:...) and a question mark ?
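Put together as a complete script, the \K variant can be checked against the sample string from the question. This is just a sketch of that one-liner with the input filled in:

```php
<?php
$s = '1 2 3 4/5/6 "7/8 9" 10';

// Optionally consume a quoted part, reset the match start with \K,
// then match the run of slashes/whitespace that is used as the delimiter.
$arr = preg_split('~(?:"[^"]*")?\K[/\s]+~', $s);

print_r($arr);
```

The quoted part keeps its inner slash and space because the delimiter class is only tried after the whole quoted part has been consumed.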

Catching ids and its values from a string with preg_match

I was wondering how can I create preg_match for catching:
id=4
4 being any number and how can I search for the above example in a string?
Could /^id=[0-9]/ be correct? The reason I'm asking is that I'm not really good with preg_match.
For 4 being any number, we must set the range for it:
/^id\=[0-9]+/
The backslash escapes the equals sign (harmless but not required, since = is not a metacharacter), and the plus after the character class means one or more digits.
You should go with the following:
/id=(\d+)/g
Explanations:
id= - Literal id=
(\d+) - Capturing group: \d matches a digit (0 through 9); + repeats it one or more times
/g - modifier: global. All matches (don't return on first match)
Example online
If you want to grab all ids and its values in PHP you could go with:
$string = "There are three ids: id=10 and id=12 and id=100";
preg_match_all("/id=(\d+)/", $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => id=10
[1] => id=12
[2] => id=100
)
[1] => Array
(
[0] => 10
[1] => 12
[2] => 100
)
)
Example online
Note: If you want to match all occurrences, regex testers need the /g modifier. PHP doesn't support that modifier but has a separate function for the purpose, preg_match_all(). All you need to do is remove the g from the regex.
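For the single-value case from the question, plain preg_match() with a capture group is enough. A minimal sketch; the sample string is invented, and the \b is an assumption added so that something like "uid=7" is not picked up:

```php
<?php
$string = 'page.php?id=4&sort=asc'; // made-up sample input

$id = null;
// No ^ anchor, so "id=" may appear anywhere in the string.
if (preg_match('/\bid=(\d+)/', $string, $m)) {
    $id = (int) $m[1]; // group 1 holds only the digits
}

var_dump($id);
```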
