PHP Regex Matching Multiple Options

I am attempting to write some code that looks for the following:
Yesterday
Last 7 Days
Last 30 Days
This Year
Last Year
I have the following regex:
/yesterday|(\d+)(?=\s+(\w+))|(\w+)(?=\s+(year))/i
using:
preg_match("/yesterday|(\d+)(?=\s+(\w+))|(\w+)(?=\s+(year))/i", $input, $output)
I get the following results using phpliveregex.com with the preg_match:
array(5
0 => Last
1 =>
2 =>
3 => Last
4 => Year
)
array(5
0 => This
1 =>
2 =>
3 => This
4 => year
)
array(1
0 => yesterday
)
array(3
0 => 30
1 => 30
2 => days
)
array(3
0 => 7
1 => 7
2 => days
)
My issue is with the 'Year' options: they leave empty keys, whereas I want to refer to $output[1] and $output[2] to get the interval and the 'span' (days). Only a single string will be passed at a time, so it will be one of the options listed above, never several at once.
If anyone can help me find the best solution to return 'yesterday' or ('7' and 'days') or ('30' and 'days') or ('This' and 'Year') or ('Last' and 'Year') I would appreciate it very much!
EDIT
This is my desired output:
'Yesterday'
$output[0] => 'Yesterday'
'Last 7 Days'
$output[0] => '7'
$output[1] => 'Days'
'Last 30 Days'
$output[0] => '30'
$output[1] => 'Days'
'This Year'
$output[0] => 'This'
$output[1] => 'Year'
'Last Year'
$output[0] => 'Last'
$output[1] => 'Year'
I am trying to capture the 'groups' necessary to process the rest of my code.

You can use the branch reset feature to avoid empty groups:
$text = <<<'EOD'
Yesterday
Last 7 Days
Last 30 Days
This Year
Last Year
EOD;
$pattern = '~\b(?|yesterday\b|\d+(?= (days\b))|\w+(?= (year\b)))~i';
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER))
print_r($matches);
// or preg_match without PREG_SET_ORDER if you test the strings one by one
pattern details:
\b
(?| # open the branch reset group
yesterday \b # when this branch succeeds the capture group is not defined
|
\d+ (?=[ ](days\b)) # in each branch the capture group
|
\w+ (?=[ ](year\b)) # has the same number
) # (so there is only one capture group)
result:
Array
(
[0] => Array
(
[0] => Yesterday
)
[1] => Array
(
[0] => 7
[1] => Days
)
[2] => Array
(
[0] => 30
[1] => Days
)
[3] => Array
(
[0] => This
[1] => Year
)
[4] => Array
(
[0] => Last
[1] => Year
)
)
Note that when you build the branch reset, you must begin with the alternatives that have no groups, then the alternatives with one group, then two groups, etc. Otherwise you may obtain useless empty groups in the result.
Note too that the group 0 isn't really a capture group but it is the whole match.
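Since the question passes one string at a time, the same branch reset pattern can be sketched with preg_match. The helper name parseRange is hypothetical and only used for illustration; the pattern is the one shown above.

```php
<?php
// Hypothetical helper: returns the match array for one input string,
// or an empty array when nothing matches.
function parseRange(string $input): array
{
    // Same branch reset pattern as in the answer above.
    $pattern = '~\b(?|yesterday\b|\d+(?= (days\b))|\w+(?= (year\b)))~i';

    return preg_match($pattern, $input, $output) ? $output : [];
}

foreach (['Yesterday', 'Last 7 Days', 'Last 30 Days', 'This Year', 'Last Year'] as $input) {
    // $output[0] is the whole match, $output[1] (when present) is the span word.
    echo $input, ' => ', implode(' | ', parseRange($input)), "\n";
}
```

With this pattern the interval ends up in index 0 and the span in index 1, and the 'Yesterday' case simply has no index 1.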

You can use:
/((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)/
Matches:
MATCH 1
1. [0-9] `Yesterday`
MATCH 2
1. [10-21] `Last 7 Days`
MATCH 3
1. [22-34] `Last 30 Days`
MATCH 4
1. [35-44] `This Year`
MATCH 5
1. [45-54] `Last Year`
Regex Demo:
https://regex101.com/r/mA8jZ5/1
Regex Explanation:
/((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)/
1st Capturing group ((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)
1st Alternative: (?:Last|This)\s+(?:\d+\s+Days|Year)
(?:Last|This) Non-capturing group
1st Alternative: Last
Last matches the characters Last literally (case sensitive)
2nd Alternative: This
This matches the characters This literally (case sensitive)
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?:\d+\s+Days|Year) Non-capturing group
1st Alternative: \d+\s+Days
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
Days matches the characters Days literally (case sensitive)
2nd Alternative: Year
Year matches the characters Year literally (case sensitive)
2nd Alternative: Yesterday
Yesterday matches the characters Yesterday literally (case sensitive)

What you just described can be achieved with the following regex:
(yesterday|\d+(?=\s+\w+)|\w+(?=\s+year))\s*(\w*)$
Tested on Regex101.com.
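A minimal sketch of how that pattern might be used from PHP. The / delimiters, the i flag and the helper name splitRange are assumptions added here, since the answer only gives the bare pattern.

```php
<?php
// Hypothetical helper: returns [interval, span] for one input string, or null.
function splitRange(string $input): ?array
{
    // Bare pattern from the answer, wrapped in delimiters with the
    // case-insensitive flag (both are assumptions).
    $pattern = '/(yesterday|\d+(?=\s+\w+)|\w+(?=\s+year))\s*(\w*)$/i';

    if (!preg_match($pattern, $input, $output)) {
        return null;
    }

    // Group 1 is the interval, group 2 the span (empty for "Yesterday").
    return [$output[1], $output[2]];
}

foreach (['Yesterday', 'Last 7 Days', 'Last 30 Days', 'This Year', 'Last Year'] as $input) {
    [$interval, $span] = splitRange($input);
    printf("%-12s => interval '%s', span '%s'\n", $input, $interval, $span);
}
```

Here the groups sit at $output[1] and $output[2] for every input, at the cost of an empty second group for 'Yesterday'.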

Related

Parse strictly formatted text containing multiple entries with no delimiting character

I have a string containing multiple products orders which have been joined together without a delimiter.
I need to parse the input string and convert sets of three substrings into separate rows of data.
I tried splitting the string using split() and strstr() function, but could not generate the desired result.
How can I convert this statement into different columns?
RM is Malaysian Ringgit
From this statement:
"2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6"
Into separate rows:
2 x Brew Coffeee Panas: RM7.4
2 x Tongkat Ali Ais: RM8.6
And these 2 rows into this table in the DB:
Table: Products
Product Name
Quantity
Total Amount (RM)
Brew Coffeee Panas
2
7.4
Tongkat Ali Ais
2
8.6
*Note: the "total amount" substrings will reliably have a numeric value with precision to one decimal place.
You could use regex if your string format is consistent. Here's an expression that could do that:
(\d) x (.+?): RM(\d+\.\d)
Basic usage
$re = '/(\d) x (.+?): RM(\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_export($matches);
Which gives
array (
0 =>
array (
0 => '2 x Brew Coffeee Panas: RM7.4',
1 => '2',
2 => 'Brew Coffeee Panas',
3 => '7.4',
),
1 =>
array (
0 => '2 x Tongkat Ali Ais: RM8.6',
1 => '2',
2 => 'Tongkat Ali Ais',
3 => '8.6',
),
)
Group 0 will always be the full match, after that the groups will be quantity, product and price.
Capture one or more digits
Match the space, x, space
Capture one or more non-colon characters until the first occuring colon
Match the colon, space, then RM
Capture the float value that has a max decimal length of 1 (the OP says in a comment under the question that the amount only takes one decimal place)
There are no "lazy quantifiers" in my pattern, so the regex can move most swiftly.
This regex pattern is as Accurate as the sample data and requirement explanation allows, as Efficient as it can be because it only contains greedy quantifiers, as Concise as it can be thanks to the negated character class, and as Readable as the pattern can be made because there are no superfluous characters.
Code: (Demo)
var_export(
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $m)
? array_slice($m, 1) // omit the fullstring matches
: [] // if there are no matches
);
Output:
array (
0 =>
array (
0 => '2',
1 => '2',
),
1 =>
array (
0 => 'Brew Coffeee Panas',
1 => 'Tongkat Ali Ais',
),
2 =>
array (
0 => '7.4',
1 => '8.6',
),
)
You can add the PREG_SET_ORDER argument to the preg_match_all() call to aid in iterating the matches as rows.
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
echo '<tr><td>' . implode('</td><td>', array_slice($match, 1)) . '</td></tr>';
}
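If the rows are headed for the Products table, named capture groups can make each match read like a record. This is only a sketch: the group names and the array keys are illustrative assumptions, and the actual database insert is left out.

```php
<?php
$string = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';

// Same pattern as above, with named groups (names are assumptions).
$pattern = '~(?<quantity>\d+) x (?<name>[^:]+): RM(?<amount>\d+\.\d)~';

$rows = [];
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Cast the captured strings to the types the table columns suggest.
        $rows[] = [
            'name'     => $match['name'],
            'quantity' => (int) $match['quantity'],
            'amount'   => (float) $match['amount'],
        ];
    }
}

print_r($rows);
```

Each element of $rows can then be bound to a prepared INSERT statement for the table.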
You can use a regex like this:
/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/
Explanation:
(\d+) captures one or more digits
\s matches a whitespace character
([^:]+): captures one or more non : characters that come before a : character (you can also use something like [a-zA-Z0-9\s]+): if you know exactly which characters can exist before the : character - in this case lower case and upper case letters, digits 0 through 9 and whitespace characters)
(\d+\.?\d?) captures one or more digits, followed by a . and another digit if they exist
(?=\d|$) is a positive lookahead which matches a digit after the main expression without including it in the result, or the end of the string
You can also add the PREG_SET_ORDER flag to preg_match_all() to group the results:
PREG_SET_ORDER
Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on.
Code example:
<?php
$txt = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.62 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5";
$pattern = "/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/";
if(preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
Output:
Array
(
[0] => Array
(
[0] => 2 x Brew Coffeee Panas: RM7.4
[1] => 2
[2] => Brew Coffeee Panas
[3] => 7.4
)
[1] => Array
(
[0] => 2 x Tongkat Ali Ais: RM8.6
[1] => 2
[2] => Tongkat Ali Ais
[3] => 8.6
)
[2] => Array
(
[0] => 2 x B026 Kopi Hainan Kecil: RM31
[1] => 2
[2] => B026 Kopi Hainan Kecil
[3] => 31
)
[3] => Array
(
[0] => 2 x B006 Kopi Hainan Besar: RM19.5
[1] => 2
[2] => B006 Kopi Hainan Besar
[3] => 19.5
)
)
See it live here php live editor and here regex tester.
The first thing I would do is perform a simple replacement using preg_replace to insert a delimiter, with the aid of a back-reference to the captured item, based upon the known format of a single decimal place. Anything beyond that single decimal place forms part of the next item, the quantity in this case.
$str="2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4";
# qty price
# 2 7.4
# 2 8.6
# 25 15.2
# 3 32.4
/*
Our RegEx to find the decimal precision,
to split the string apart and the quantity
*/
$pttns=(object)array(
'repchar' => '#(RM\d{1,}\.\d{1})#',
'splitter' => '#(\|)#',
'combo' => '#^((\d{1,}) x)(.*): RM(\d{1,}\.\d{1})$#'
);
# create a new version of the string with our specified delimiter - the PIPE
$str = preg_replace( $pttns->repchar, '$1|', $str );
# split the string into pieces - discard empty items
$a=array_filter( preg_split( $pttns->splitter, $str, -1 ) );
#iterate through matches - find the quantity,item & price
foreach($a as $str){
preg_match($pttns->combo,$str,$matches);
$qty=$matches[2];
$item=$matches[3];
$price=$matches[4];
printf('%s %d %.1f<br />',$item,$qty,$price);
}
Which yields:
Brew Coffeee Panas 2 7.4
Tongkat Ali Ais 2 8.6
Koala Kebabs 25 15.2
Fried Squirrel Fritters 3 32.4

Regex: Capturing multiple instances in one word group

I'm not good at Regex and I've been trying for hours now so I hope you can help me. I have this text:
✝his is *✝he* *in✝erne✝*
I need to capture (using PREG_OFFSET_CAPTURE) only the ✝ in a word surrounded with *, so I only need to capture the last three ✝ in this example. The output array should look something like this:
[0] => Array
(
[0] => ✝
[1] => 17
)
[1] => Array
(
[0] => ✝
[1] => 32
)
[2] => Array
(
[0] => ✝
[1] => 44
)
I've tried using (✝) but of course this will select all instances, including those in words without asterisks. Then I've tried \*[^ ]*(✝)[^ ]*\* but this only gives me the last instance in one word. I've tried many other variations but all were wrong.
To clarify: The asterisk can be at all places in the string, but always at the beginning and end of a word. The opening asterisk always precedes a space except at the beginning of the string and the closing asterisk always ends with a space except at the end of the string. I must add that punctuation marks can be inside these asterisks. ✝ is exactly (and only) what I need to capture and can be at any position in a word.
You could make use of the \G anchor to get iterative matches between the *. The anchor matches either at the start of the string, or at the end of the previous match.
(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K✝(?=[^*]*\*)
Explanation
(?: Non capture group
\* Match *
| Or
\G(?!^) Assert the end of the previous match, not at the start
) Close non capture group
[^&*]* Match 0+ times any char except & and *
(?> Atomic group
&(?!#) Match & only when not directly followed by #
[^&*]* Match 0+ times any char except & and *
)* Close atomic group and repeat 0+ times
\K Clear the match buffer (forget what is matched until now)
✝ Match literally
(?=[^*]*\*) Positive lookahead, assert a * at the right
Regex demo | Php demo
For example
$re = '/(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K✝(?=[^*]*\*)/m';
$str = '✝his is *✝he* *in✝erne✝*';
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]);
Output
Array
(
[0] => Array
(
[0] => ✝
[1] => 16
)
[1] => Array
(
[0] => ✝
[1] => 31
)
[2] => Array
(
[0] => ✝
[1] => 43
)
)
Note that each offset is 1 less than expected because offsets are counted from 0. See PREG_OFFSET_CAPTURE.
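PREG_OFFSET_CAPTURE reports byte offsets, so the numbers depend on how the cross is stored. The sketch below assumes the subject holds it as the 8-byte HTML entity &#x271D; rather than as a literal character. That is an assumption, suggested by the &(?!#) part of the pattern, and it is consistent with the offsets 16, 31 and 43 shown above.

```php
<?php
// Assumption: the cross is stored as the HTML entity &#x271D; (8 bytes each).
$str = '&#x271D;his is *&#x271D;he* *in&#x271D;erne&#x271D;*';

// Same structure as the pattern above, with the entity spelled out after \K.
$re = '/(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K&#x271D;(?=[^*]*\*)/';

preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is [matched text, byte offset].
$offsets = array_column($matches[0], 1);
print_r($offsets);
```

The first entity, which is not inside asterisks, is skipped, and the remaining three are reported at zero-based byte positions.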
If you want to match more variations, you could use a non capturing group and list the ones that you would accept to match. If you don't want to cross newline boundaries you can exclude matching those in the negated character class.
(?:\*|\G(?!^))[^&*\r\n]*(?>&(?!#)[^&*\r\n]*)*\K&#(?:x271D|169);(?=[^*\r\n]*\*)
Regex demo

PHP regex select word that contain specific character from string

000001 0001 000000000000001975 00 02 0 000 2017/12/13 14:13:27
I'm developing a system with Laravel.
This is the string I get from a CSV file. I need to extract the date and the time into an array. If I can select the word that contains : (14:13:27), I can get the time, and then use the same method for the date.
Try this pattern:
/(\d{4}\/[^ ]+) \K([\d:]+)/
Online Demo
A simple solution would be -
$string= "000001 0001 000000000000001975 00 02 0 000 2017/12/13 44:13:27";
preg_match("/([0-9]+):([0-5][0-9]):([0-5][0-9])/", $string, $matches);
echo $matches[0];
I'm still unclear about the OP's exact desired output, but I was more underwhelmed by the patterns in the other answers. I'll post this battery of solutions for the betterment of Stack Overflow since I couldn't find a suitable duplicate to close with.
I'm using the tildes ~ as pattern delimiters so that the / characters in the pattern don't need to be escaped. Also, notice I am not calling \K to restart the fullstring match because there is no reason to do so.
Code: (Demo)
$string='000001 0001 000000000000001975 00 02 0 000 2017/12/13 14:13:27';
var_export(preg_match('~\d{4}/\d{2}/\d{2}~',$string,$out)?$out:[]); // date
echo "\n\n";
var_export(preg_match('~\d{2}:\d{2}:\d{2}~',$string,$out)?$out:[]); // time
echo "\n\n";
var_export(preg_match('~\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}~',$string,$out)?$out:[]); // full datetime
echo "\n\n";
var_export(preg_match('~(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2})~',$string,$out)?$out:[]); // capture date and time
echo "\n\n";
var_export(preg_match_all('~\d{4}/\d{2}/\d{2}|\d{2}:\d{2}:\d{2}~',$string,$out)?$out:[]); // capture date or time
echo "\n\n";
var_export(preg_match('~(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})~',$string,$out)?$out:[]); // capture date digits and time digits
Output:
// date
array (
0 => '2017/12/13',
)
// time
array (
0 => '14:13:27',
)
// full datetime
array (
0 => '2017/12/13 14:13:27',
)
// capture date and time
array (
0 => '2017/12/13 14:13:27',
1 => '2017/12/13',
2 => '14:13:27',
)
// capture date or time
array (
0 =>
array (
0 => '2017/12/13',
1 => '14:13:27',
),
)
// capture date digits and time digits
array (
0 => '2017/12/13 14:13:27',
1 => '2017',
2 => '12',
3 => '13',
4 => '14',
5 => '13',
6 => '27',
)
p.s. For future readers, if you require stronger date validation than this, then regex is probably not the right tool for your task.
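One possible way to add that stronger validation is to let the regex only locate the candidate and then hand it to PHP's date parser. A minimal sketch, assuming the Y/m/d H:i:s layout from the sample string; the helper name parseTimestamp is hypothetical.

```php
<?php
// Hypothetical helper: finds a datetime in the line and validates it.
// Returns null when nothing is found or the value is not a real datetime.
function parseTimestamp(string $line): ?DateTimeImmutable
{
    // Locate the candidate with the "full datetime" pattern from above.
    if (!preg_match('~\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}~', $line, $out)) {
        return null;
    }

    $format = 'Y/m/d H:i:s';
    $date = DateTimeImmutable::createFromFormat($format, $out[0]);

    // Round-trip check: out-of-range parts (month 13, hour 44, ...) either
    // fail to parse or roll over, so the formatted value no longer matches.
    if ($date === false || $date->format($format) !== $out[0]) {
        return null;
    }

    return $date;
}

$line = '000001 0001 000000000000001975 00 02 0 000 2017/12/13 14:13:27';
$date = parseTimestamp($line);
echo $date ? $date->format('Y-m-d H:i:s') : 'invalid', "\n";
```

The regex stays simple and the range checking is delegated to the date parser.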

need some help on regex in preg_match_all()

I need to extract the ticket number "Ticket#999999" from a string. How do I do this using regex?
My current regex works if there is more than one digit, as in Ticket#9999, but with only one digit, as in Ticket#9, it does not work. Please help.
current regex.
preg_match_all('/(Ticket#[0-9])\w\d+/i',$data,$matches);
thank you.
In your pattern, [0-9] matches one digit, \w matches one more word character (a letter, digit, or underscore) and \d+ matches one or more digits, so at least three characters are required after #.
Use
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches);
This will match:
Ticket# - a literal string Ticket#
([0-9]+) - Group 1 capturing 1 or more digits.
PHP demo:
$data = "Ticket#999999 ticket#9";
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches, PREG_SET_ORDER);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => Ticket#999999
[1] => 999999
)
[1] => Array
(
[0] => ticket#9
[1] => 9
)
)

Regex (preg_split): how do I split based on a delimiter, excluding delimiters included in a pair of quotes?

I want to split this:
1 2 3 4/5/6 "7/8 9" 10
into this:
1
2
3
4
5
6
"7/8 9"
10
with preg_split()
So my question is, how do I split based on a delimiter, excluding delimiters inside a pair of quotes?
I kind of want to avoid capturing the things in quotes first and would ideally like it to be a one liner.
You can use the following.
$text = '1 2 3 4/5/6 "7/8 9" 10';
$results = preg_split('~"[^"]*"(*SKIP)(*F)|[ /]+~', $text);
print_r($results);
Explanation:
On the left side of the alternation operator we match anything in quotations making the subpattern fail, forcing the regular expression engine to not retry the substring using backtracking control with (*SKIP) and (*F). The right side of the alternation operator matches either a space character or a forward slash not in quotations.
Output
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
You can use:
$s = '1 2 3 4/5/6 "7/8 9" 10';
$arr = preg_split('~("[^"]*")|[ /]+~', $s, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
print_r( $arr );
OUTPUT:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
Another way, with an optional group:
$arr = preg_split('~(?:"[^"]*")?\K[/\s]+~', $s);
The pattern "[^"]*"[/\s]+ matches a quoted part followed by one or more spaces and slashes. But since you don't want to remove quoted parts, you put a \K after it. The \K removes everything that has been matched on its left from the match result. With this trick, when a quoted part is found, the regex engine returns only the spaces or slashes after it and splits on them.
Since there is not always a quoted part before a space or a slash, you only need to make it optional with a non-capturing group (?:...) and a question mark ?
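Put together as a runnable sketch with the sample string from the question (the variable names are only illustrative):

```php
<?php
$s = '1 2 3 4/5/6 "7/8 9" 10';

// Optional quoted part, then \K so only the delimiters count as the match.
$arr = preg_split('~(?:"[^"]*")?\K[/\s]+~', $s);

print_r($arr);
```

The quoted part is consumed before \K, so the slash and space inside it are never seen as delimiters, and the quotes stay in the resulting piece.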
