PHP string split regular


The regular expression is (Digits)*(A|B|DF|XY)+(Digits)+
I'm really confused about this pattern.
I want to split strings like this in PHP. Can someone help me?
My input may be something like this:
A1234
B 1239
1A123
12A123
1A 1234
12 A 123
1234 B 123456789
12 XY 1234567890
and I want to convert it to this:
Array
(
[0] => 12
[1] => XY
[2] => 1234567890
)
<?php
$input = "12 XY 123456789";
print_r(preg_split('/\d*[(A|B|DF|XY)+\d+]+/', $input, 3));
//print_r(preg_split('/[\s,]+/', $input, 3));
//print_r(preg_split('/\d*[\s,](A|B)+[\s,]\d+/', $input, 3));

You may match and capture the numbers, letters, and numbers:
$input = "12 XY 123456789";
if (preg_match('/^(?:(\d+)\s*)?(A|B|DF|XY)(?:\s*(\d+))?$/', $input, $matches)) {
    array_shift($matches);
    print_r($matches);
}
See the PHP demo and the regex demo.
^ - start of string
(?:(\d+)\s*)? - an optional sequence of:
(\d+) - Group 1: one or more digits
\s* - 0+ whitespaces
(A|B|DF|XY) - Group 2: A, B, DF or XY
(?:\s*(\d+))? - an optional sequence of:
\s* - 0+ whitespaces
(\d+) - Group 3: one or more digits
$ - end of string.
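To check the pattern against the other sample inputs from the question, a minimal sketch could look like the following. The helper name splitCode() is hypothetical, and the filtering step is an assumption about the desired result: when the optional leading digits are absent, preg_match reports that group as an empty string, so the sketch drops empty entries.

```php
<?php
// Hypothetical helper (not part of the answer above): splits a code such as
// "12 XY 123" into its parts, or returns null when the input does not match.
function splitCode(string $input): ?array
{
    $pattern = '/^(?:(\d+)\s*)?(A|B|DF|XY)(?:\s*(\d+))?$/';
    if (!preg_match($pattern, $input, $matches)) {
        return null;
    }
    array_shift($matches); // drop the full match, keep only the groups
    // An optional leading group that did not participate is reported as '',
    // so remove empty strings and reindex the array.
    return array_values(array_filter($matches, function ($part) {
        return $part !== '';
    }));
}

$samples = ['A1234', 'B 1239', '1A123', '12 A 123', '12 XY 1234567890', '12 C 34'];
foreach ($samples as $input) {
    $parts = splitCode($input);
    echo $input, ' => ', $parts === null ? 'no match' : implode(' | ', $parts), PHP_EOL;
}
```

The last sample, 12 C 34, is there only to show the non-matching case, since C is not one of the allowed letter codes.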

Related

PHP - Regex optimization split string in parts

In PHP, I am trying to write a regex that splits a string into parts as array elements.
For example, these are my strings:
$string1 = "For a serving of 100 g Sugars: 2.3 g (Approximately)";
$string2 = "For a serving of 100 g Saturated Fat: 5.8 g (Approximately)";
$string3 = "For a portion of 100 g Energy Value: 290 kcal (Approximately)";
And I want to extract specific information from these strings:
$arrayString1 = array('100 g','Sugars', '2.3 g');
$arrayString2 = array('100 g','Saturated Fat', '5.8 g');
$arrayString3 = array('100 g','Energy Value', '290 kcal');
I made this regex:
(^For a serving of )([\d g]*)([^:]*)(: )([\d.\d]*)( )([a-z]*)
Do you have any idea how to optimize this regex?
Thanks

You could make it a bit more specific by matching the g or kcal unit and the digits.
To match all examples, you can use an alternation to match either of the alternatives: (?:serving|portion)
Instead of using 7 capturing groups, you can use 3 capturing groups.
You can omit the first capturing group (^For a serving of ) and combine the digits and the unit into one group.
^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b
^ Start of string
For\h+a\h+(?:serving|portion)\h+of\h+ Match the beginning of the string with either serving or portion
(\d+\h+g)\h+ Capture group 1, match 1+ digits and g
([^:\r\n]+):\h+ Capture group 2, match 1+ times any char except : or a newline, followed by matching : and 1+ horizontal whitespace chars
( Capture group 3
\d+(?:\.\d+)? Match 1+ digits with an optional decimal part
\h+(?:g|kcal) Match 1+ horizontal whitespace chars and either g or kcal
)\b Close group 3, followed by a word boundary to prevent the unit from being part of a longer word
For example
$pattern = "~^For\h+a\h+(?:serving|portion)\h+of\h+(\d+\h+g)\h+([^:\r\n]+):\h+(\d+(?:\.\d+)?\h+(?:g|kcal))\b~";
$strings = [
"For a serving of 100 g Sugars: 2.3 g (Approximately)",
"For a serving of 100 g Saturated Fat: 5.8 g (Approximately)",
"For a portion of 100 g Energy Value: 290 kcal (Approximately)"
];
foreach ($strings as $string) {
    preg_match($pattern, $string, $matches);
    array_shift($matches);
    print_r($matches);
}
Output
Array
(
[0] => 100 g
[1] => Sugars
[2] => 2.3 g
)
Array
(
[0] => 100 g
[1] => Saturated Fat
[2] => 5.8 g
)
Array
(
[0] => 100 g
[1] => Energy Value
[2] => 290 kcal
)
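The loop above prints whatever is in $matches even when a line does not match. A possible variant, sketched here under the assumption that non-matching lines should be skipped, checks the return value of preg_match and uses named groups. The function name parseNutritionLine() and the group names basis, nutrient and amount are hypothetical.

```php
<?php
// Sketch only: parseNutritionLine() and the group names are illustrative,
// they do not come from the answer above.
function parseNutritionLine(string $line): ?array
{
    $pattern = '~^For\h+a\h+(?:serving|portion)\h+of\h+(?<basis>\d+\h+g)\h+'
             . '(?<nutrient>[^:\r\n]+):\h+(?<amount>\d+(?:\.\d+)?\h+(?:g|kcal))\b~';
    // preg_match returns 1 on a match, 0 on no match and false on error.
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return [
        'basis'    => $m['basis'],
        'nutrient' => $m['nutrient'],
        'amount'   => $m['amount'],
    ];
}

print_r(parseNutritionLine('For a portion of 100 g Energy Value: 290 kcal (Approximately)'));
var_dump(parseNutritionLine('Sugars: 2.3 g')); // NULL, the prefix is missing
```

Named groups make the result self-describing, at the cost of a slightly longer pattern.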

PHP regexp how get all matches in preg_match

I have a string
$s = 'Sections: B3; C2; D4';
and a regexp
preg_match('/Sections(?:[:;][\s]([BCDE][\d]+))+/ui', $s, $m);
Result is
Array
(
[0] => Sections: B3; C2; D4
[1] => D4
)
How can I get an array with all the sections B3, C2, D4?
I can't use preg_match_all('/[BCDE][\d]+)/ui', because the search must only look strictly after the word Sections:.
The number of elements (B3, C2...) can be any.

You may use
'~(?:\G(?!^);|Sections:)\s*\K[BCDE]\d+~i'
See the regex demo
Details
(?:\G(?!^);|Sections:) - either the end of the previous match and a ; (\G(?!^);) or (|) a Sections: substring
\s* - 0 or more whitespace chars
\K - a match reset operator
[BCDE] - a char from the character set (due to i modifier, case insensitive)
\d+ - 1 or more digits.
See the PHP demo:
$s = "Sections: B3; C2; D4";
if (preg_match_all('~(?:\G(?!^);|Sections:)\s*\K[BCDE]\d+~i', $s, $m)) {
    print_r($m[0]);
}
Output:
Array
(
[0] => B3
[1] => C2
[2] => D4
)
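To see what the \G anchor contributes, here is a small sketch with two made-up input strings (they are assumptions for illustration, not taken from the question). A match can only start at Sections: or directly after the previous match, so codes elsewhere in the string are not picked up.

```php
<?php
$pattern = '~(?:\G(?!^);|Sections:)\s*\K[BCDE]\d+~i';

// A code that appears before "Sections:" is ignored.
preg_match_all($pattern, 'Notes: E9; Sections: B3; C2; D4', $m1);
print_r($m1[0]); // B3, C2, D4

// An item that does not fit [BCDE]\d+ breaks the chain, so D4 is not reached.
preg_match_all($pattern, 'Sections: B3; C2; X1; D4', $m2);
print_r($m2[0]); // B3, C2
```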

You don't need a regex; explode will do fine.
Remove "Sections: ", then explode the rest of the string.
$s = 'Sections: B3; C2; D4';
$s = str_replace('Sections: ', '', $s);
$arr = explode("; ", $s);
var_dump($arr);
https://3v4l.org/PcrNK
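If the spacing around the semicolons may vary, a slightly more tolerant version of the same idea could look like this. It is only a sketch: the helper name sectionsFromString() is hypothetical, and returning an empty array when the prefix is missing is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: returns the section codes, or [] when the string
// does not start with "Sections:".
function sectionsFromString(string $s): array
{
    $prefix = 'Sections:';
    if (strncmp($s, $prefix, strlen($prefix)) !== 0) {
        return [];
    }
    $rest = (string) substr($s, strlen($prefix));
    // Split on ";" and trim each piece so that spacing differences do not matter;
    // 'strlen' as the filter callback drops empty pieces.
    return array_values(array_filter(array_map('trim', explode(';', $rest)), 'strlen'));
}

print_r(sectionsFromString('Sections: B3;C2 ;  D4'));
```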

regexp monetary strings with decimals and thousands separator

https://www.tehplayground.com/KWmxySzbC9VoDvP9
Why is the first string matched?
$list = [
'3928.3939392', // Should not be matched
'4.239,99',
'39',
'3929',
'2993.39',
'393993.999'
];
foreach ($list as $str) {
    preg_match('/^(?<![\d.,])-?\d{1,3}(?:[,. ]?\d{3})*(?:[^.,%]|[.,]\d{1,2})-?(?![\d.,%]|(?: %))$/', $str, $matches);
    print_r($matches);
}
output
Array
(
[0] => 3928.3939392
)
Array
(
[0] => 4.239,99
)
Array
(
[0] => 39
)
Array
(
[0] => 3929
)
Array
(
[0] => 2993.39
)
Array
(
)

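You seem to want to match the numbers as standalone strings, so you do not need the lookarounds; anchors are enough. The first string matches because the separator in (?:[,. ]?\d{3})* is optional and [^.,%] accepts any other character, including a digit: the engine reads 3, then 928, then .393, then 939, and the final 2 is consumed by [^.,%].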
You may use
^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$
See the regex demo
Details
^ - start of string
-? - an optional -
(?: - start of a non-capturing alternation group:
\d{1,3}(?:[,. ]\d{3})* - 1 to 3 digits, followed with 0+ sequences of ,, . or space and then 3 digits
| - or
\d* - 0+ digits
) - end of the group
(?:[.,]\d{1,2})? - an optional sequence of . or , followed with 1 or 2 digits
$ - end of string.
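As a quick check, the pattern can be run over the list from the question. This is a sketch using preg_grep, which returns the array entries that match. Note that every part of the pattern is optional, so an empty string also matches; if that is unwanted, an extra non-empty check would be needed (that remark is an observation about the pattern, not something the answer states).

```php
<?php
$pattern = '/^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$/';
$list = ['3928.3939392', '4.239,99', '39', '3929', '2993.39', '393993.999'];

// preg_grep keeps the original keys, so reindex for readability.
$matched = array_values(preg_grep($pattern, $list));
print_r($matched); // 4.239,99, 39, 3929, 2993.39

// Every component is optional, so the empty string is accepted as well.
$emptyMatches = preg_match($pattern, '');
var_dump($emptyMatches); // int(1)
```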

Extracting GTIN (regex)

I'm looking to extract GTIN codes from documents, they're 8, 12, 13 or 14 digit numbers. So I'm doing this:
$html = '8 digit 12345678 and now 12 digit 123456789012';
$extractGTIN = '/\d{7}$|^\d{11}$|^\d{12}$|^\d{13}/mi';
preg_match_all($extractGTIN, $html, $barcodes);
echo print_r ($barcodes, 1);
... but unexpectedly, it returns:
Array
(
[0] => Array
(
[0] => 6789012
)
)

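You have not anchored the alternatives properly: the first alternative, \d{7}$, has no ^, so it matches the last 7 digits of the line, and the digit counts (7, 11, 12, 13) are each one short of the GTIN lengths. Use word boundaries instead. Rather than an alternation, you may use an optional group here: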
/\b\d{8}(?:\d{4,6})?\b/
See the regex demo.
Details:
\b - a leading word boundary
\d{8} - 8 digits
(?:\d{4,6})? - an optional sequence of 4, 5 or 6 digits (thus, matching all in all 8, 12, 13, 14 digits)
\b - trailing word boundary.
PHP demo:
$text = '8 digit 12345678 and now 12 digit 123456789012';
$extractGTIN = '/\b\d{8}(?:\d{4,6})?\b/';
preg_match_all($extractGTIN, $text, $barcodes);
print_r($barcodes[0]);
// => Array ( [0] => 12345678 [1] => 123456789012 )
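To confirm that only the four GTIN lengths get through, here is a small sketch with a made-up test string containing digit runs of 8, 9, 12, 13, 14 and 15 digits. The 9-digit and 15-digit runs should be rejected, because the trailing \b cannot sit between two digits.

```php
<?php
$pattern = '/\b\d{8}(?:\d{4,6})?\b/';
// Illustrative input: runs of 8, 9, 12, 13, 14 and 15 digits.
$text = 'a 12345678 b 123456789 c 123456789012 d 1234567890123 '
      . 'e 12345678901234 f 123456789012345';

preg_match_all($pattern, $text, $m);
$lengths = array_map('strlen', $m[0]);
print_r($lengths); // 8, 12, 13, 14
```

Keep in mind that \b treats letters and underscores as word characters too, so a code glued to a letter (for example X12345678) would not be found with this pattern.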

split string by spaces and colon but not if inside quotes

Having a string like this:
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
the desired result is:
[0] => Array (
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
what I get with:
preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);
is:
[0] => Array (
[0] => dateto:'2015-10-07
[1] => 15:05'
[2] => xxxx
[3] => datefrom:'2015-10-09
[4] => 15:05'
[5] => yyyy
[6] => asdf
)
I also tried preg_split("/[\s]+/", $str), but I have no clue how to avoid splitting when the value is between quotes. Can anyone show me how, and also please explain the regex? Thank you!

I would use the PCRE verbs (*SKIP)(*F):
preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
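Since the one-liner comes without an explanation, here is a commented sketch of how it behaves on the string from the question. The first alternative matches a quoted part and then deliberately fails while telling the engine to skip past it, so only whitespace outside quotes is left for the second alternative to split on.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// '[^']*'(*SKIP)(*F)  match a single-quoted part, then force a failure and
//                     skip the engine past it, so nothing inside is split
// |\s+                otherwise, split on a run of whitespace
$parts = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
print_r($parts);
```

This handles single quotes only, as in the question; double-quoted values would need an extra alternative.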

Often, when you are looking to split a string, preg_split isn't the best approach (that seems a little counterintuitive, but it's true most of the time). A more efficient way is to find all the items (with preg_match_all) using a pattern that describes everything that is not the delimiter (whitespace here):
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
if (preg_match_all($pattern, $str, $m))
    $result = $m[0];
pattern details:
~ # pattern delimiter
(?=\S) # the lookahead assertion only succeeds if there is a non-
# white-space character at the current position.
# (This lookahead is useful for two reasons:
# - it allows the regex engine to quickly find the start of
# the next item without having to test each branch of the
# following alternation at each position in the strings
# until one succeeds.
# - it ensures that there's at least one non-white-space.
# Without it, the pattern may match an empty string.
# )
[^'"\s]* #"'# all that is not a quote or a white-space
(?: # eventual quoted parts
'[^']*' [^'"\s]* #"# single quotes
|
"[^"]*" [^'"\s]* # double quotes
)*
~
Note that with this slightly longer pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
but it's a little less efficient.
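For completeness, a sketch of the shorter pattern applied to the string from the question, plus one made-up input with double quotes. A nowdoc is used for the pattern, as in the longer version, because it contains both quote characters.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

$pattern = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

// Each match is one item: unquoted characters and quoted parts glued together.
preg_match_all($pattern, $str, $m);
print_r($m[0]);

// Illustrative input with a double-quoted value.
preg_match_all($pattern, 'key:"a b" tail', $m2);
print_r($m2[0]); // key:"a b", tail
```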

For your example, you can use preg_split with negative lookbehind (?<!\d), i.e.:
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);
Output:
Array
(
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
Demo:
http://ideone.com/EP06Nt
Regex Explanation:
(?<!\d)(\s)
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
Match a single character that is a “whitespace character” «\s»
