Extracting GTIN (regex)

Extracting GTIN (regex) - php

I'm looking to extract GTIN codes from documents, they're 8, 12, 13 or 14 digit numbers. So I'm doing this:
$html = '8 digit 12345678 and now 12 digit 123456789012';
$extractGTIN = '/\d{7}$|^\d{11}$|^\d{12}$|^\d{13}/mi';
preg_match_all($extractGTIN, $html, $barcodes);
echo print_r ($barcodes, 1);
... but unexpectedly, it returns:
Array
(
[0] => Array
(
[0] => 6789012
)
)

You have not anchored the alternatives properly, use word boundaries. Instead of alternations, you may use an optional group here:
/\b\d{8}(?:\d{4,6})?\b/
See the regex demo.
Details:
\b - a leading word boundary
\d{8} - 8 digits
(?:\d{4,6})? - an optional sequence of 4, 5 or 6 digits (thus, matching all in all 8, 12, 13, 14 digits)
\b - trailing word boundary.
PHP demo:
$text = '8 digit 12345678 and now 12 digit 123456789012';
$extractGTIN = '/\b\d{8}(?:\d{4,6})?\b/';
preg_match_all($extractGTIN, $text, $barcodes);
print_r($barcodes[0]);
// => Array ( [0] => 12345678 [1] => 123456789012 )

Related

Parse strictly formatted text containing multiple entries with no delimiting character

I have a string containing multiple products orders which have been joined together without a delimiter.
I need to parse the input string and convert sets of three substrings into separate rows of data.
I tried splitting the string using split() and strstr() function, but could not generate the desired result.
How can I convert this statement into different columns?
RM is Malaysian Ringgit
From this statement:
"2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6"
Into seperate row:
2 x Brew Coffeee Panas: RM7.4
2 x Tongkat Ali Ais: RM8.6
And this 2 row into this table in DB:
Table: Products
Product Name
Quantity
Total Amount (RM)
Brew Coffeee Panas
2
7.4
Tongkat Ali Ais
2
8.6
*Note: the "total amount" substrings will reliably have a numeric value with precision to one decimal place.

You could use regex if your string format is consistent. Here's an expression that could do that:
(\d) x (.+?): RM(\d+\.\d)
Basic usage
$re = '/(\d) x (.+?): RM(\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_export($matches);
Which gives
array (
0 =>
array (
0 => '2 x Brew Coffeee Panas: RM7.4',
1 => '2',
2 => 'Brew Coffeee Panas',
3 => '7.4',
),
1 =>
array (
0 => '2 x Tongkat Ali Ais: RM8.6',
1 => '2',
2 => 'Tongkat Ali Ais',
3 => '8.6',
),
)
Group 0 will always be the full match, after that the groups will be quantity, product and price.
Try it online

Capture one or more digits
Match the space, x, space
Capture one or more non-colon characters until the first occuring colon
Match the colon, space, then RM
Capture the float value that has a max decimal length of 1OP says in comment under question: it only take one decimal place for the amount
There are no "lazy quantifiers" in my pattern, so the regex can move most swiftly.
This regex pattern is as Accurate as the sample data and requirement explanation allows, as Efficient as it can be because it only contains greedy quantifiers, as Concise as it can be thanks to the negated character class, and as Readable as the pattern can be made because there are no superfluous characters.
Code: (Demo)
var_export(
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $m)
? array_slice($m, 1) // omit the fullstring matches
: [] // if there are no matches
);
Output:
array (
0 =>
array (
0 => '2',
1 => '2',
),
1 =>
array (
0 => 'Brew Coffeee Panas',
1 => 'Tongkat Ali Ais',
),
2 =>
array (
0 => '7.4',
1 => '8.6',
),
)
You can add the PREG_SET_ORDER argument to the preg_match_all() call to aid in iterating the matches as rows.
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
echo '<tr><td>' . implode('</td><td>', array_slice($match, 1)) . '</td></tr>';
}

You can use a regex like this:
/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/
Explanation:
(\d+) captures one or more digits
\s matches a whitespace character
([^:]+): captures one or more non : characters that come before a : character (you can also use something like [a-zA-Z0-9\s]+): if you know exactly which characters can exist before the : character - in this case lower case and upper case letters, digits 0 through 9 and whitespace characters)
(\d+\.?\d?) captures one or more digits, followed by a . and another digit if they exist
(?=\d|$) is a positive lookahead which matches a digit after the main expression without including it in the result, or the end of the string
You can also add the PREG_SET_ORDER flag to preg_match_all() to group the results:
PREG_SET_ORDER
Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on.
Code example:
<?php
$txt = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.62 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5";
$pattern = "/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/";
if(preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
Output:
Array
(
[0] => Array
(
[0] => 2 x Brew Coffeee Panas: RM7.4
[1] => 2
[2] => Brew Coffeee Panas
[3] => 7.4
)
[1] => Array
(
[0] => 2 x Tongkat Ali Ais: RM8.6
[1] => 2
[2] => Tongkat Ali Ais
[3] => 8.6
)
[2] => Array
(
[0] => 2 x B026 Kopi Hainan Kecil: RM31
[1] => 2
[2] => B026 Kopi Hainan Kecil
[3] => 31
)
[3] => Array
(
[0] => 2 x B006 Kopi Hainan Besar: RM19.5
[1] => 2
[2] => B006 Kopi Hainan Besar
[3] => 19.5
)
)
See it live here php live editor and here regex tester.

The first thing I would do would be to perform a simple replacement using preg_replace to insert, with the aid of a a back-reference to the captured item, based upon the known format of a single decimal point. Anything beyond that single decimal point forms part of the next item - the quantity in this case.
$str="2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4";
# qty price
# 2 7.4
# 2 8.6
# 25 15.2
# 3 32.4
/*
Our RegEx to find the decimal precision,
to split the string apart and the quantity
*/
$pttns=(object)array(
'repchar' => '#(RM\d{1,}\.\d{1})#',
'splitter' => '#(\|)#',
'combo' => '#^((\d{1,}) x)(.*): RM(\d{1,}\.\d{1})$#'
);
# create a new version of the string with our specified delimiter - the PIPE
$str = preg_replace( $pttns->repchar, '$1|', $str );
# split the string intp pieces - discard empty items
$a=array_filter( preg_split( $pttns->splitter, $str, null ) );
#iterate through matches - find the quantity,item & price
foreach($a as $str){
preg_match($pttns->combo,$str,$matches);
$qty=$matches[2];
$item=$matches[3];
$price=$matches[4];
printf('%s %d %d<br />',$item,$qty,$price);
}
Which yields:
Brew Coffeee Panas 2 7
Tongkat Ali Ais 2 8
Koala Kebabs 25 15
Fried Squirrel Fritters 3 32

extract value from string using php

I'm trying to extract the start date April 1, 2017 using preg_match_all() from the following line where Both date is dynamic.
for April 1, 2017 to April 30, 2017
$contents = "for April 1, 2017 to April 30, 2017";
if(preg_match_all('/for\s(.*)+\s(.*)+,\s(.*?)+ to\s[A-Za-z]+\s[1-9]+,\s[0-9]\s/', $contents, $matches)){
print_r($matches);
}

If you want to match the whole string and match the 2 date like patterns you could use 2 capturing groups.
Note that it does not validate the date itself.
\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+((?1))\b
In parts
\bfor\h+ Word boundary, match for and 1+ horizontal whitespace chars
( Capture group 1
\w+\h+\d{1,2} Match 1+ word chars, 1+ horizontal whitespace chars and 1 or 2 digits
,\h+\d{4} Match a comma, 1+ horizontal whitespace chars and 4 digits
) Close group
\h+to\h+ Match to between 1+ horizontal whitspace chars
( Capture group 2
(?1) Subroutine call to capture group 1 (the same pattern of group 1 as it is the same logic)
) Close group
\b Word boundary
Regex demo
For example
$re = '/\bfor\h+(\w+\h+\d{1,2},\h+\d{4})\h+to\h+((?1))\b/';
$str = 'for April 1, 2017 to April 30, 2017';
preg_match_all($re, $str, $matches);
print_r($matches)
Output
Array
(
[0] => Array
(
[0] => for April 1, 2017 to April 30, 2017
)
[1] => Array
(
[0] => April 1, 2017
)
[2] => Array
(
[0] => April 30, 2017
)
)

How to split repeated chars and numbers with preg_split?

I'm trying to solve some problem and I need to split repeated chars and all integers
$code = preg_split('/(.)(?!\1|$)\K/', $code);
I tried this one, but it separate and not repeated chars and not repeated integers , I need only chars
I have a string 'FFF86C6'
I need an array (FFF, 86, C, 6);
with pattern '/(.)(?!\1|$)\K/' returns (FFF, 8, 6, C, 6)
Do you have any idea how to make it?

You can use this regex with preg_match_all:
([A-Za-z])(\1*)|\d+
It looks for a letter, followed by some number of the same character, or some digits. By using preg_match_all we find all matches in the string. Usage in PHP:
$string = "FFF86CR6";
$pieces = preg_match_all('/([A-Za-z])(\1*)|\d+/', $string, $matches);
print_r($matches[0]);
Output:
Array (
[0] => FFF
[1] => 86
[2] => C
[3] => R
[4] => 6
)
Demo on 3v4l.org

regexp monetary strings with decimals and thousands separator

https://www.tehplayground.com/KWmxySzbC9VoDvP9
Why is the first string matched?
$list = [
'3928.3939392', // Should not be matched
'4.239,99',
'39',
'3929',
'2993.39',
'393993.999'
];
foreach($list as $str){
preg_match('/^(?<![\d.,])-?\d{1,3}(?:[,. ]?\d{3})*(?:[^.,%]|[.,]\d{1,2})-?(?![\d.,%]|(?: %))$/', $str, $matches);
print_r($matches);
}
output
Array
(
[0] => 3928.3939392
)
Array
(
[0] => 4.239,99
)
Array
(
[0] => 39
)
Array
(
[0] => 3929
)
Array
(
[0] => 2993.39
)
Array
(
)

You seem to want to match the numbers as standalone strings, and thus, you do not need the lookarounds, you only need to use anchors.
You may use
^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$
See the regex demo
Details
^ - start of string
-? - an optional -
(?: - start of a non-capturing alternation group:
\d{1,3}(?:[,. ]\d{3})* - 1 to 3 digits, followed with 0+ sequences of ,, . or space and then 3 digits
| - or
\d* - 0+ digits
) - end of the group
(?:[.,]\d{1,2})? - an optional sequence of . or , followed with 1 or 2 digits
$ - end of string.

need some help on regex in preg_match_all()

so I need to extract the ticket number "Ticket#999999" from a string.. how do i do this using regex.
my current regex is working if I have more than one number in the Ticket#9999.. but if I only have Ticket#9 it's not working please help.
current regex.
preg_match_all('/(Ticket#[0-9])\w\d+/i',$data,$matches);
thank you.

In your pattern [0-9] matches 1 digit, \w matches another digit and \d+ matches 1+ digits, thus requiring 3 digits after #.
Use
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches);
This will match:
Ticket# - a literal string Ticket#
([0-9]+) - Group 1 capturing 1 or more digits.
PHP demo:
$data = "Ticket#999999 ticket#9";
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches, PREG_SET_ORDER);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => Ticket#999999
[1] => 999999
)
[1] => Array
(
[0] => ticket#9
[1] => 9
)
)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Extracting GTIN (regex) - php

Related

Parse strictly formatted text containing multiple entries with no delimiting character

extract value from string using php

How to split repeated chars and numbers with preg_split?

regexp monetary strings with decimals and thousands separator

need some help on regex in preg_match_all()

Categories

Resources