Extract value from string using PHP

I'm trying to extract the start date, April 1, 2017, using preg_match_all() from the following line, where both dates are dynamic:
for April 1, 2017 to April 30, 2017
$contents = "for April 1, 2017 to April 30, 2017";
if(preg_match_all('/for\s(.*)+\s(.*)+,\s(.*?)+ to\s[A-Za-z]+\s[1-9]+,\s[0-9]\s/', $contents, $matches)){
print_r($matches);
}

If you want to match the whole string and capture the two date-like parts, you could use two capturing groups.
Note that it does not validate the date itself.
\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+((?1))\b
In parts
\bfor\h+ Word boundary, match for and 1+ horizontal whitespace chars
( Capture group 1
\w+\h+\d{1,2} Match 1+ word chars, 1+ horizontal whitespace chars and 1 or 2 digits
,\h+\d{4} Match a comma, 1+ horizontal whitespace chars and 4 digits
) Close group
\h+to\h+ Match the word to between 1+ horizontal whitespace chars
( Capture group 2
(?1) Subroutine call to capture group 1 (reuses the pattern of group 1, since both dates have the same format)
) Close group
\b Word boundary
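A small sketch of why the subroutine call (?1) is used rather than a backreference \1: (?1) re-runs the pattern of group 1, so any second date is accepted, while \1 only matches the exact text group 1 already captured. The second sample string (two identical dates) is made up for illustration.

```php
<?php
// (?1) re-applies the *pattern* of group 1; \1 requires the *same text* again.
$sub  = '/\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+((?1))\b/';
$back = '/\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+(\1)\b/';

$different = 'for April 1, 2017 to April 30, 2017';
$same      = 'for April 1, 2017 to April 1, 2017'; // hypothetical input

$subDifferent  = preg_match($sub, $different);  // 1: any second date is accepted
$backDifferent = preg_match($back, $different); // 0: second date must repeat the first
$backSame      = preg_match($back, $same);      // 1: identical dates match

var_dump($subDifferent, $backDifferent, $backSame);
```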
For example
$re = '/\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+((?1))\b/';
$str = 'for April 1, 2017 to April 30, 2017';
preg_match_all($re, $str, $matches);
print_r($matches);
Output
Array
(
[0] => Array
(
[0] => for April 1, 2017 to April 30, 2017
)
[1] => Array
(
[0] => April 1, 2017
)
[2] => Array
(
[0] => April 30, 2017
)
)
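Since the regex does not validate the dates, one option is to hand the captured groups to DateTimeImmutable::createFromFormat(). This is a sketch under assumptions: the helper name parseLongDate is hypothetical, it only accepts the exact "April 1, 2017" form, and it relies on a round-trip check (format the parsed date back and compare) to reject impossible dates, because createFromFormat() rolls them over (April 31 becomes May 1) instead of failing.

```php
<?php
// Hypothetical helper: "April 1, 2017" -> DateTimeImmutable, or null if the text
// is not a real calendar date in exactly that format.
function parseLongDate(string $text): ?DateTimeImmutable
{
    // '!' resets unparsed fields (time of day) to zero; 'F j, Y' = "April 1, 2017".
    $date = DateTimeImmutable::createFromFormat('!F j, Y', $text);
    // Overflowed days (e.g. April 31) do not format back to the same text.
    if ($date === false || $date->format('F j, Y') !== $text) {
        return null;
    }
    return $date;
}

$re  = '/\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+((?1))\b/';
$str = 'for April 1, 2017 to April 30, 2017';

$start = $end = null;
// A single line is being checked, so preg_match() is enough here.
if (preg_match($re, $str, $m)) {
    $start = parseLongDate($m[1]);
    $end   = parseLongDate($m[2]);
}

if ($start !== null && $end !== null) {
    echo $start->format('Y-m-d'), ' to ', $end->format('Y-m-d'), PHP_EOL;
}
```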

Related

PHP string split regular

Regular exp = (Digits)*(A|B|DF|XY)+(Digits)+
I'm really confused about this pattern.
I want to split this string in PHP; can someone help me?
My input may look something like this:
A1234
B 1239
1A123
12A123
1A 1234
12 A 123
1234 B 123456789
12 XY 1234567890
and I want to convert it to this:
Array
(
[0] => 12
[1] => XY
[2] => 1234567890
)
<?php
$input = "12 XY 123456789";
print_r(preg_split('/\d*[(A|B|DF|XY)+\d+]+/', $input, 3));
//print_r(preg_split('/[\s,]+/', $input, 3));
//print_r(preg_split('/\d*[\s,](A|B)+[\s,]\d+/', $input, 3));
You may match and capture the leading digits, the letters, and the trailing digits:
$input = "12 XY 123456789";
if (preg_match('/^(?:(\d+)\s*)?(A|B|DF|XY)(?:\s*(\d+))?$/', $input, $matches)){
array_shift($matches);
print_r($matches);
}
^ - start of string
(?:(\d+)\s*)? - an optional sequence of:
(\d+) - Group 1: one or more digits
\s* - 0+ whitespaces
(A|B|DF|XY) - Group 2: A, B, DF or XY
(?:\s*(\d+))? - an optional sequence of:
\s* - 0+ whitespaces
(\d+) - Group 3: one or more digits
$ - end of string.
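A quick sketch that runs the same pattern over every sample input from the question. Note that when the optional leading digits are absent (as in A1234), PHP still returns group 1, as an empty string, because a later group matched; so every result keeps three slots.

```php
<?php
// Inputs copied from the question.
$inputs = ['A1234', 'B 1239', '1A123', '12A123', '1A 1234', '12 A 123', '1234 B 123456789', '12 XY 1234567890'];
$pattern = '/^(?:(\d+)\s*)?(A|B|DF|XY)(?:\s*(\d+))?$/';

$results = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $matches)) {
        array_shift($matches); // drop the full match at index 0
        $results[$input] = $matches;
    }
}
print_r($results);
```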

Time String to Seconds

How can I parse strings with a regex to calculate the total number of seconds?
The strings look like these examples:
40s
11m1s
1h47m3s
I started with the following regex
((\d+)h)((\d+)m)((\d+)s)
But this regex will only match the last example.
How can i make the parts optional?
Is there a better regex?
The format that you are using is very similar to the one that is used by java.time.Duration:
https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html#parse-java.lang.CharSequence-
Maybe you can use it instead of writing something custom?
Duration uses a format like this:
PT1H47M3S
Maybe you can add the leading "PT" and parse it (not sure if you have to uppercase)?
The format is called "ISO-8601":
https://en.wikipedia.org/wiki/ISO_8601
For example, in PHP you can prepend "PT" and pass the result to DateInterval:
$set = array(
'40s',
'11m1s',
'1h47m3s'
);
$date = new DateTime();
$date2 = new DateTime();
foreach ($set as $value) {
$date2->add(new DateInterval('PT'.strtoupper($value)));
}
echo $date2->getTimestamp() - $date->getTimestamp(); // 7124 = 1hour 58mins 44secs.
You could use an optional non-capturing group for each unit (\d+h, \d+m, \d+s):
$strs = ['40s', '11m1s', '1h47m3s'];
foreach ($strs as $str) {
if (preg_match('~(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?~', $str, $matches)) {
print_r($matches);
}
}
Outputs:
Array
(
[0] => 40s
[1] => // h
[2] => // m
[3] => 40 // s
)
Array
(
[0] => 11m1s
[1] => // h
[2] => 11 // m
[3] => 1 // s
)
Array
(
[0] => 1h47m3s
[1] => 1 // h
[2] => 47 // m
[3] => 3 // s
)
Regex:
(?: # non-capture group 1
( # capture group 1
\d+ # 1 or more digits
) # end capture group 1
h # letter 'h'
) # end non-capture group 1
? # optional
(?: # non-capture group 2
( # capture group 2
\d+ # 1 or more digits
) # end capture group 2
m # letter 'm'
) # end non-capture group 2
? # optional
(?: # non-capture group 3
( # capture group 3
\d+ # 1 or more digits
) # end capture group 3
s # letter 's'
) # end non-capture group 3
? # optional
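To get from the captures to a total, multiply each part by its unit. This is a sketch with a hypothetical helper, toSeconds(); it adds ^ and $ anchors (an assumption, so that strings with other characters are rejected) and treats an empty match as invalid. Unmatched groups come back as "" or are missing from the end of $m, and both are converted to 0.

```php
<?php
// Hypothetical helper: total seconds for strings like "40s", "11m1s", "1h47m3s".
function toSeconds(string $str): ?int
{
    if (!preg_match('~^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$~', $str, $m) || $m[0] === '') {
        return null; // not a duration string, or an empty string
    }
    // "" casts to 0; trailing unmatched groups are absent, hence the ?? 0.
    $hours   = (int) ($m[1] ?? 0);
    $minutes = (int) ($m[2] ?? 0);
    $seconds = (int) ($m[3] ?? 0);

    return $hours * 3600 + $minutes * 60 + $seconds;
}

foreach (['40s', '11m1s', '1h47m3s'] as $str) {
    echo $str, ' => ', toSeconds($str), PHP_EOL;
}
```

The three results (40, 661 and 6423) add up to 7124, the same total as the DateInterval example.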
This expression:
/(\d*?)s|(\d*?)m(\d*?)s|(\d*?)h(\d*?)m(\d*?)s/gm
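returns 3 matches when the three examples are on separate lines, one match per line, and each match's groups contain only the numbers. (The g flag is regex101/JavaScript notation; PHP rejects it, so drop it and use preg_match_all() instead.)
The gist is that it matches either digits before an 's', or digits before an 'm' followed by that, or digits before an 'h' followed by both. Each alternative fills its own groups (1; 2-3; 4-6), so you need to check which groups are set.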

Need some help with regex in preg_match_all()

So I need to extract the ticket number (e.g. "Ticket#999999") from a string. How do I do this using a regex?
My current regex works when the ticket number has several digits, as in Ticket#9999, but it fails when there is only one digit, as in Ticket#9. Please help.
Current regex:
preg_match_all('/(Ticket#[0-9])\w\d+/i',$data,$matches);
Thank you.
In your pattern, [0-9] matches 1 digit, \w matches another digit and \d+ matches 1+ digits, thus requiring at least 3 digits after #.
Use
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches);
This will match:
Ticket# - a literal string Ticket#
([0-9]+) - Group 1 capturing 1 or more digits.
PHP demo:
$data = "Ticket#999999 ticket#9";
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches, PREG_SET_ORDER);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => Ticket#999999
[1] => 999999
)
[1] => Array
(
[0] => ticket#9
[1] => 9
)
)
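If you only need the numbers themselves, a possible variation is to keep the default PREG_PATTERN_ORDER flag: then $matches[1] is already a flat list of every group 1 capture. A minimal sketch using the same sample string:

```php
<?php
$data = "Ticket#999999 ticket#9";

// Default flag (PREG_PATTERN_ORDER): $matches[1] holds every group-1 capture.
preg_match_all('/Ticket#([0-9]+)/i', $data, $matches);
$ticketNumbers = $matches[1]; // ['999999', '9']

print_r($ticketNumbers);
```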

Extracting GTIN (regex)

I'm looking to extract GTIN codes from documents; they're 8-, 12-, 13- or 14-digit numbers. So I'm doing this:
$html = '8 digit 12345678 and now 12 digit 123456789012';
$extractGTIN = '/\d{7}$|^\d{11}$|^\d{12}$|^\d{13}/mi';
preg_match_all($extractGTIN, $html, $barcodes);
echo print_r ($barcodes, 1);
... but unexpectedly, it returns:
Array
(
[0] => Array
(
[0] => 6789012
)
)
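You have not anchored the alternatives properly. With the m flag, ^ and $ anchor to the start and end of a line, so \d{7}$ matches the last 7 digits of the line (6789012), while the other alternatives only match a number that starts or fills the whole line. The digit counts are also one short (7, 11, 12, 13 instead of 8, 12, 13, 14). Use word boundaries instead, and instead of alternations, you may use an optional group here: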
/\b\d{8}(?:\d{4,6})?\b/
Details:
\b - a leading word boundary
\d{8} - 8 digits
(?:\d{4,6})? - an optional sequence of 4, 5 or 6 digits (thus, matching all in all 8, 12, 13, 14 digits)
\b - trailing word boundary.
PHP demo:
$text = '8 digit 12345678 and now 12 digit 123456789012';
$extractGTIN = '/\b\d{8}(?:\d{4,6})?\b/';
preg_match_all($extractGTIN, $text, $barcodes);
print_r($barcodes[0]);
// => Array ( [0] => 12345678 [1] => 123456789012 )
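To confirm that the word boundaries reject the in-between lengths, here is a sketch on a made-up sample containing one number of each length from 7 to 15 digits. Only the 8-, 12-, 13- and 14-digit numbers should come back; a 15-digit number fails because, after 14 digits, \b cannot sit between two digits.

```php
<?php
$extractGTIN = '/\b\d{8}(?:\d{4,6})?\b/';

// Made-up sample: one number of each length from 7 to 15 digits.
$text = implode(' ', ['1234567', '12345678', '123456789', '1234567890',
    '12345678901', '123456789012', '1234567890123', '12345678901234', '123456789012345']);

preg_match_all($extractGTIN, $text, $barcodes);
$lengths = array_map('strlen', $barcodes[0]);

print_r($lengths); // only 8, 12, 13 and 14 survive
```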

PHP Regex Matching Multiple Options

I am attempting to write some code that looks for the following:
Yesterday
Last 7 Days
Last 30 Days
This Year
Last Year
I have the following regex:
/yesterday|(\d+)(?=\s+(\w+))|(\w+)(?=\s+(year))/i
using:
preg_match("/yesterday|(\d+)(?=\s+(\w+))|(\w+)(?=\s+(year))/i", $input, $output)
I get the following results using phpliveregex.com with preg_match:
array(5
0 => Last
1 =>
2 =>
3 => Last
4 => Year
)
array(5
0 => This
1 =>
2 =>
3 => This
4 => year
)
array(1
0 => yesterday
)
array(3
0 => 30
1 => 30
2 => days
)
array(3
0 => 7
1 => 7
2 => days
)
My issue is with the 'Year' options: they produce empty keys, and I want to refer to $output[1] and $output[2] to get the interval and the 'span' (days). Only a single string will be passed at a time, so it will be one of the options listed above, not several options to look for at once.
If anyone can help me find the best solution to return 'yesterday' or ('7' and 'days') or ('30' and 'days') or ('This' and 'Year') or ('Last' and 'Year') I would appreciate it very much!
EDIT
This is my desired output:
'Yesterday'
$output[0] => 'Yesterday'
'Last 7 Days'
$output[0] => '7'
$output[1] => 'Days'
'Last 30 Days'
$output[0] => '30'
$output[1] => 'Days'
'This Year'
$output[0] => 'This'
$output[1] => 'Year'
'Last Year'
$output[0] => 'Last'
$output[1] => 'Year'
I am trying to capture the 'groups' necessary to process the rest of my code.
You can use the branch reset feature to avoid empty groups:
$text = <<<'EOD'
Yesterday
Last 7 Days
Last 30 Days
This Year
Last Year
EOD;
$pattern = '~\b(?|yesterday\b|\d+(?= (days\b))|\w+(?= (year\b)))~i';
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER))
print_r($matches);
// or preg_match without PREG_SET_ORDER if you test the strings one by one
pattern details:
\b
(?| # open the branch reset group
yesterday \b # when this branch succeeds the capture group is not defined
|
\d+ (?=[ ](days\b)) # in each branch the capture group
|
\w+ (?=[ ](year\b)) # has the same number
) # (so there is only one capture group)
result:
Array
(
[0] => Array
(
[0] => Yesterday
)
[1] => Array
(
[0] => 7
[1] => Days
)
[2] => Array
(
[0] => 30
[1] => Days
)
[3] => Array
(
[0] => This
[1] => Year
)
[4] => Array
(
[0] => Last
[1] => Year
)
)
Note that when you build the branch reset, you must begin with the alternatives that have no groups, then the alternatives with one group, then two groups, etc.; otherwise you may obtain useless empty groups in the result.
Note too that group 0 isn't really a capture group; it is the whole match.
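Since the question passes a single string at a time, here is a sketch of the preg_match() variant mentioned in the code comment, testing each option separately. For "Yesterday" the capture group does not participate, and because it is the last group PHP omits it, so the result has only index 0.

```php
<?php
$pattern = '~\b(?|yesterday\b|\d+(?= (days\b))|\w+(?= (year\b)))~i';
$inputs  = ['Yesterday', 'Last 7 Days', 'Last 30 Days', 'This Year', 'Last Year'];

$results = [];
foreach ($inputs as $input) {
    // One string at a time: preg_match() is enough, no PREG_SET_ORDER needed.
    if (preg_match($pattern, $input, $output)) {
        $results[$input] = $output; // e.g. ['7', 'Days'] or ['Yesterday']
    }
}
print_r($results);
```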
You can use:
/((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)/
Matches:
MATCH 1
1. [0-9] `Yesterday`
MATCH 2
1. [10-21] `Last 7 Days`
MATCH 3
1. [22-34] `Last 30 Days`
MATCH 4
1. [35-44] `This Year`
MATCH 5
1. [45-54] `Last Year`
Regex Demo:
https://regex101.com/r/mA8jZ5/1
Regex Explanation:
/((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)/
1st Capturing group ((?:Last|This)\s+(?:\d+\s+Days|Year)|Yesterday)
1st Alternative: (?:Last|This)\s+(?:\d+\s+Days|Year)
(?:Last|This) Non-capturing group
1st Alternative: Last
Last matches the characters Last literally (case sensitive)
2nd Alternative: This
This matches the characters This literally (case sensitive)
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?:\d+\s+Days|Year) Non-capturing group
1st Alternative: \d+\s+Days
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
Days matches the characters Days literally (case sensitive)
2nd Alternative: Year
Year matches the characters Year literally (case sensitive)
2nd Alternative: Yesterday
Yesterday matches the characters Yesterday literally (case sensitive)
What you described can be achieved with the following regex:
(yesterday|\d+(?=\s+\w+)|\w+(?=\s+year))\s*(\w*)$
Tested on Regex101.com. Note that it needs the case-insensitive i flag, since the inputs are capitalized (Yesterday, Year).
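A sketch of that regex in PHP, applied to one string at a time. Adding the i flag is an assumption based on the capitalized inputs. Group 1 holds the interval (or "Yesterday") and group 2 the span, which is "" for "Yesterday" because (\w*) matches an empty string there.

```php
<?php
// Regex from the answer above, with the i flag added (assumption: inputs are capitalized).
$pattern = '/(yesterday|\d+(?=\s+\w+)|\w+(?=\s+year))\s*(\w*)$/i';

$results = [];
foreach (['Yesterday', 'Last 7 Days', 'Last 30 Days', 'This Year', 'Last Year'] as $input) {
    if (preg_match($pattern, $input, $output)) {
        // $output[1] = interval (or "Yesterday"), $output[2] = span ("" for Yesterday)
        $results[$input] = [$output[1], $output[2]];
    }
}
print_r($results);
```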
