Using preg_match_all from title


I need to extract from post title strings like:
12ml
12 ml
123ml
123 ml
12.3ml
12.3 ml
Right now I'm using:
preg_match_all("/[0-9]+\sml/i", $post->post_title, $percentage);
if(isset($percentage[0][0]) && $percentage[0][0] != "" ){
$text = $percentage[0][0]." ";
}
echo $text;
But I don't know how to adapt it for numbers with a decimal point.

You could do:
$str = "abc 12ml def 12 ml xyz 123ml tuv 123 ml jhsfg 12.3ml qjsdfkjfhg 12.3 ml";
if (preg_match_all("/\d+(?:\.\d+)?\s*ml/i", $str, $percentage)) {
print_r($percentage);
}
Output:
Array
(
[0] => Array
(
[0] => 12ml
[1] => 12 ml
[2] => 123ml
[3] => 123 ml
[4] => 12.3ml
[5] => 12.3 ml
)
)
Explanation:
/ : regex delimiter
\d+ : 1 or more digits
(?: : start non capture group
\. : a dot
\d+ : 1 or more digits
)? : end group, optional
\s* : 0 or more whitespace characters
ml : literally ml
/i : regex delimiter, flag case insensitive
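If only the first volume in a title is needed, as in the question's code, preg_match is enough. The following is a minimal sketch; the helper name extractVolume is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// volume such as "12.3 ml" found in $title, or null when none is present.
function extractVolume(string $title): ?string
{
    // Same pattern as above: digits, optional decimal part, optional
    // whitespace, then "ml" in any letter case.
    if (preg_match('/\d+(?:\.\d+)?\s*ml/i', $title, $m)) {
        return $m[0];
    }
    return null;
}

var_dump(extractVolume('Rose water 12.3 ml bottle')); // string(7) "12.3 ml"
var_dump(extractVolume('No volume here'));            // NULL
```

Returning null instead of leaving a variable unset avoids the undefined $text notice that the question's snippet would raise when a title has no match.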

Related

Parse strictly formatted text containing multiple entries with no delimiting character

I have a string containing multiple products orders which have been joined together without a delimiter.
I need to parse the input string and convert sets of three substrings into separate rows of data.
I tried splitting the string using split() and strstr() function, but could not generate the desired result.
How can I convert this statement into different columns?
RM is Malaysian Ringgit
From this statement:
"2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6"
Into separate rows:
2 x Brew Coffeee Panas: RM7.4
2 x Tongkat Ali Ais: RM8.6
And these 2 rows into this table in the DB:
Table: Products
Product Name | Quantity | Total Amount (RM)
Brew Coffeee Panas | 2 | 7.4
Tongkat Ali Ais | 2 | 8.6
*Note: the "total amount" substrings will reliably have a numeric value with precision to one decimal place.

You could use regex if your string format is consistent. Here's an expression that could do that:
(\d) x (.+?): RM(\d+\.\d)
Basic usage
$re = '/(\d) x (.+?): RM(\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_export($matches);
Which gives
array (
0 =>
array (
0 => '2 x Brew Coffeee Panas: RM7.4',
1 => '2',
2 => 'Brew Coffeee Panas',
3 => '7.4',
),
1 =>
array (
0 => '2 x Tongkat Ali Ais: RM8.6',
1 => '2',
2 => 'Tongkat Ali Ais',
3 => '8.6',
),
)
Group 0 will always be the full match, after that the groups will be quantity, product and price.
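To get from the match sets to rows that resemble the Products table, named groups can help. This is only a sketch: the group names, the array keys, and the use of \d+ and [^:]+ instead of \d and .+? are my own assumptions, not part of the answer above.

```php
<?php
// Sketch only: group names and array keys are illustrative choices.
$re  = '/(?<qty>\d+) x (?<name>[^:]+): RM(?<total>\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';

$rows = [];
if (preg_match_all($re, $str, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // One associative row per matched order entry.
        $rows[] = [
            'product_name' => $m['name'],
            'quantity'     => (int) $m['qty'],
            'total_amount' => (float) $m['total'],
        ];
    }
}
var_export($rows);
```

Each element of $rows can then be bound to a prepared INSERT statement, one execution per row.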

Capture one or more digits
Match the space, x, space
Capture one or more non-colon characters until the first occurring colon
Match the colon, space, then RM
Capture the float value that has a max decimal length of 1. The OP says in a comment under the question: "it only take one decimal place for the amount".
There are no "lazy quantifiers" in my pattern, so the regex can move most swiftly.
This regex pattern is as Accurate as the sample data and requirement explanation allows, as Efficient as it can be because it only contains greedy quantifiers, as Concise as it can be thanks to the negated character class, and as Readable as the pattern can be made because there are no superfluous characters.
Code:
var_export(
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $m)
? array_slice($m, 1) // omit the fullstring matches
: [] // if there are no matches
);
Output:
array (
0 =>
array (
0 => '2',
1 => '2',
),
1 =>
array (
0 => 'Brew Coffeee Panas',
1 => 'Tongkat Ali Ais',
),
2 =>
array (
0 => '7.4',
1 => '8.6',
),
)
You can add the PREG_SET_ORDER argument to the preg_match_all() call to aid in iterating the matches as rows.
preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
echo '<tr><td>' . implode('</td><td>', array_slice($match, 1)) . '</td></tr>';
}

You can use a regex like this:
/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/
Explanation:
(\d+) captures one or more digits
\s matches a whitespace character
([^:]+): captures one or more non : characters that come before a : character (you can also use something like [a-zA-Z0-9\s]+): if you know exactly which characters can exist before the : character - in this case lower case and upper case letters, digits 0 through 9 and whitespace characters)
(\d+\.?\d?) captures one or more digits, followed by a . and another digit if they exist
(?=\d|$) is a positive lookahead which matches a digit after the main expression without including it in the result, or the end of the string
You can also add the PREG_SET_ORDER flag to preg_match_all() to group the results:
PREG_SET_ORDER
Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on.
Code example:
<?php
$txt = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.62 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5";
$pattern = "/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/";
if(preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
Output:
Array
(
[0] => Array
(
[0] => 2 x Brew Coffeee Panas: RM7.4
[1] => 2
[2] => Brew Coffeee Panas
[3] => 7.4
)
[1] => Array
(
[0] => 2 x Tongkat Ali Ais: RM8.6
[1] => 2
[2] => Tongkat Ali Ais
[3] => 8.6
)
[2] => Array
(
[0] => 2 x B026 Kopi Hainan Kecil: RM31
[1] => 2
[2] => B026 Kopi Hainan Kecil
[3] => 31
)
[3] => Array
(
[0] => 2 x B006 Kopi Hainan Besar: RM19.5
[1] => 2
[2] => B006 Kopi Hainan Besar
[3] => 19.5
)
)

The first thing I would do is perform a simple replacement using preg_replace to insert a delimiter, with the aid of a back-reference to the captured item, based upon the known format of a single decimal place. Anything beyond that single decimal place forms part of the next item - the quantity in this case.
$str="2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4";
# qty price
# 2 7.4
# 2 8.6
# 25 15.2
# 3 32.4
/*
Our RegEx to find the decimal precision,
to split the string apart and the quantity
*/
$pttns=(object)array(
'repchar' => '#(RM\d{1,}\.\d{1})#',
'splitter' => '#(\|)#',
'combo' => '#^((\d{1,}) x)(.*): RM(\d{1,}\.\d{1})$#'
);
# create a new version of the string with our specified delimiter - the PIPE
$str = preg_replace( $pttns->repchar, '$1|', $str );
# split the string into pieces - discard empty items
$a=array_filter( preg_split( $pttns->splitter, $str, -1 ) );
# iterate through matches - find the quantity, item & price
foreach($a as $str){
preg_match($pttns->combo,$str,$matches);
$qty=$matches[2];
$item=trim($matches[3]);
$price=$matches[4];
# %.1f keeps the single decimal place; %d would truncate the price
printf('%s %d %.1f<br />',$item,$qty,$price);
}
Which yields:
Brew Coffeee Panas 2 7.4
Tongkat Ali Ais 2 8.6
Koala Kebabs 25 15.2
Fried Squirrel Fritters 3 32.4

PHP preg_split split by group 1

I have these inputs:
Rosemary Hess (2018) (Germany) (all media)
Jackie H Spriggs (catering)
I want to split them at the first parenthesis. The output I want:
array:2 [
0 => "Rosemary Hess"
1 => "(2018) (Germany) (all media)"
]
array:2 [
0 => "Jackie H Spriggs"
1 => "(catering)"
]
I tried this, but it does not work correctly:
preg_split("/(\s)\(/", 'Rosemary Hess (2018) (Germany) (all media)')
But it splits at every space followed by a parenthesis and returns five items rather than two.

You can use
$strs= ["Rosemary Hess (2018) (Germany) (all media)", "Jackie H Spriggs (catering)"];
foreach ($strs as $s){
print_r( preg_split('~\s*(?=\([^()]*\))~', $s, 2) );
}
// => Array ( [0] => Rosemary Hess [1] => (2018) (Germany) (all media) )
// => Array ( [0] => Jackie H Spriggs [1] => (catering) )
Setting the third preg_split argument, $limit, to 2 makes it split the string only at the first occurrence of the pattern, which matches:
\s* - 0+ whitespaces
(?=\([^()]*\)) - that are followed with (, 0 or more chars other than ( and ) and then a ).
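A small wrapper can make the result shape predictable when an input has no parenthesised part at all. This is a sketch; the function name splitAtFirstParen and the choice of an empty string for the missing second part are assumptions of mine.

```php
<?php
// Hypothetical helper: always returns [name, details]; details is an empty
// string when the input has no parenthesised part.
function splitAtFirstParen(string $s): array
{
    // Limit 2: split only at the first whitespace run that precedes "(...)".
    $parts = preg_split('~\s*(?=\([^()]*\))~', $s, 2);
    return [$parts[0], $parts[1] ?? ''];
}

print_r(splitAtFirstParen('Rosemary Hess (2018) (Germany) (all media)'));
print_r(splitAtFirstParen('Plain Name'));
```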

preg_split : splitting a string according to a very specific pattern

Regex/PHP n00b here. I'm trying to use the PHP "preg_split" function...
I have strings that follow a very specific pattern according to which I want to split them.
Example of a string:
CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION
Desired result:
[0]CADAVRES
[1]FILM
[2]Canada : Québec
[3]Érik Canuel
[4]2009
[5]long métrage
[6]FICTION
Delimiters (in order of occurrence):
" ["
"] ("
", "
", "
", "
") "
How do I go about writing the regex correctly?
Here's what I've tried:
<?php
$pattern = "/\s\[/\]\s\(/,\s/,\s/,\s/\)\s/";
$string = "CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION";
$keywords = preg_split($pattern, $string);
print_r($keywords);
It's not working, and I don't understand what I'm doing wrong. Then again, I've just begun trying to deal with regex and PHP, so yeah... There are so many escape characters, I can't see right...
Thank you very much!

I managed to work out a solution using preg_match_all:
$input = "CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION";
preg_match_all("|[^-\\[\\](),/\\s]+(?:(?: :)? [^-\\[\\](),/]+)?|", $input, $matches);
print_r($matches[0]);
Array
(
[0] => CADAVRES
[1] => FILM
[2] => Canada : Québec
[3] => Érik Canuel
[4] => 2009
[5] => long métrage
[6] => FICTION
)
The above regex considers a term as any character which is not something like bracket, comma, parenthesis, etc. It also allows for two word terms, possibly with a colon separator in the middle.

You can use this regex to split on:
([^\w:]\s[^\w:]?|\s[^\w:])
It looks for a non-(word or :) character, followed by a space, followed by an optional non-(word or :) character; or a space followed by a non-(word or :) character. This will match all your desired split patterns. In PHP (note you need the u modifier to deal with unicode characters):
$input = "CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION";
$keywords = preg_split('/([^\w:]\s[^\w:]?|\s[^\w:])/u', $input);
print_r($keywords);
Output:
Array
(
[0] => CADAVRES
[1] => FILM
[2] => Canada : Québec
[3] => Érik Canuel
[4] => 2009
[5] => long métrage
[6] => FICTION
)

Here's an attempt with preg_match:
$pattern = "/^([^\[]+)\[([^\]]+)\]\s+\(([^,]+),\s+([^,]+),\s+([^,]+),\s+([^,]+)\)\s+(.+)$/i";
$string = "CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION";
preg_match($pattern, $string, $keywords);
array_shift($keywords);
print_r($keywords);
Output:
Array
(
[0] => CADAVRES
[1] => FILM
[2] => Canada : Québec
[3] => Érik Canuel
[4] => 2009
[5] => long métrage
[6] => FICTION
)
Regex breakdown:
^ anchor to start of string
( begin capture group 1
[^\[]+ one or more non-left bracket characters
) end capture group 1
\[ literal left bracket
( begin capture group 2
[^\]]+ one or more non-right bracket characters
) end capture group 2
\] literal right bracket
\s+ one or more spaces
\( literal open parenthesis
( open capture group 3
[^,]+ one or more non-comma characters
) end capture group 3
,\s+ literal comma followed by one or more spaces
([^,]+),\s+([^,]+),\s+([^,]+) repeats of the above
\) literal closing parenthesis
\s+ one or more spaces
( begin capture group 7
.+ everything else
) end capture group 7
$ anchor to end of string
This assumes your structure to be static and is not particularly pretty, but on the other hand, should be robust to delimiters creeping into fields where they're not supposed to be. For example, the title having a : or , in it seems plausible and would break a "split on these delimiters anywhere"-type solution. For example,
"Matrix:, Trilogy() [FILM, reviewed: good] (Canada() : Québec , \t Érik Canuel , ): 2009 , long ():():[][]métrage) FICTIO , [(:N";
correctly parses as:
Array
(
[0] => Matrix:, Trilogy()
[1] => FILM, reviewed: good
[2] => Canada() : Québec
[3] => Érik Canuel
[4] => ): 2009
[5] => long ():():[][]métrage
[6] => FICTIO , [(:N
)
Additionally, if your parenthesized comma region is variable length, you might want to extract that first and parse it, then handle the rest of the string.
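The two-stage idea for a variable-length parenthesised region could look roughly like this. It is a sketch under the assumption that the string has the shape TITLE [TYPE] (field, field, ...) GENRE; the function name parseEntry is hypothetical. Because stage 2 splits on every comma, it gives up the robustness to stray commas inside fields that the fixed-length pattern has.

```php
<?php
// Sketch under assumptions: TITLE [TYPE] (field, field, ...) GENRE,
// with any number of comma-separated fields inside the parentheses.
function parseEntry(string $s): ?array
{
    // Stage 1: isolate the title, the bracketed type, the whole
    // parenthesised region and the trailing text.
    if (!preg_match('/^([^\[]+)\[([^\]]+)\]\s+\((.*)\)\s+(.+)$/u', $s, $m)) {
        return null;
    }
    // Stage 2: split the parenthesised region on commas.
    $fields = preg_split('/\s*,\s*/u', trim($m[3]));

    return array_merge([trim($m[1]), $m[2]], $fields, [$m[4]]);
}

print_r(parseEntry('CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION'));
```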

Matching string regular expression

I would like to match data from strings like the following:
24.Legacy.S01E08.720p.HDTV.x264-AVS[rarbg]
Colony.S02E09.720p.HDTV.x264-FLEET[rarbg]
24.Legacy (everything before S01E08)
S => 01
E => 08
720p.HDTV.x264 (everything between S01E08 and -)
AVS (everything between - and [)
rarbg (everything between [])
The following test almost works but needs some tweaks:
preg_match_all(
'/(.*?).S([0-9]+)E([0-9]+).(.*?)(.*?)[(.*?)]/s',
$download,
$posts,
PREG_SET_ORDER
);

You're so close, you just need to add the tests for the second half of the requirements:
(.*?).S([0-9]+)E([0-9]+).(.*?)-(.*?)\[(.*?)\]
https://regex101.com/r/PfgMfq/1

You should not need the /s modifier; it only makes . match line breaks as well.
I would recommend using the /i modifier to also allow lower case 's01e14'.
Don't forget to escape regex metacharacters like . and [ as \. and \[
// NAME SEASON EPISODE MEDIUM OPTIONS
$regex = '/(.+)\.S([0-9]+)E([0-9]+)\.(.+)\[(.+)\]/i';
preg_match_all(
$regex,
$download,
$posts,
PREG_SET_ORDER
);
Test with '24.Legacy.S01E08.720p.HDTV.x264-AVS[rarbg]'
Array
(
[0] => 24.Legacy.S01E08.720p.HDTV.x264-AVS[rarbg]
[1] => 24.Legacy
[2] => 01
[3] => 08
[4] => 720p.HDTV.x264-AVS
[5] => rarbg
)

Just write it down then :)
^
(?P<title>.+?) # title
S(?P<season>\d+) # season
E(?P<episode>\d+)\. # episode
(?P<quality>[^-]+)- # quality
(?P<type>[^[]+) # type
\[
(?P<torrent>[^]]+) # rest
\]
$
Demo on regex101.com.
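In PHP, a verbose pattern like this needs the x modifier so that whitespace and # comments are ignored. The sketch below is my own variant, not the answer's exact pattern: it adds an explicit \. between the title and the season marker, which is an assumption about the intended format.

```php
<?php
// Sketch: a variant of the verbose pattern with an explicit \. after the title.
$pattern = '~
    ^
    (?P<title>.+?)\.        # title, up to the dot before the season marker
    S(?P<season>\d+)        # season
    E(?P<episode>\d+)\.     # episode
    (?P<quality>[^-]+)-     # quality
    (?P<type>[^[]+)         # type
    \[
    (?P<torrent>[^]]+)      # text between the square brackets
    \]
    $
~x';

$info = [];
if (preg_match($pattern, '24.Legacy.S01E08.720p.HDTV.x264-AVS[rarbg]', $m)) {
    // Keep only the named groups, dropping the numeric duplicates.
    $info = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($info);
```

Named groups keep the calling code readable and stay stable when groups are added or reordered later.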

If a part is optional, just add ( ) around it and a ? behind it, like this:
// NAME SEASON EPISODE MEDIUM OPTIONS
$regex = '/(.+)\.S([0-9]+)E([0-9]+)\.([^\[]+)(\[(.+)\])?/i';
The medium group uses [^\[]+ here, because a greedy (.+) would swallow the optional bracket part. Watch out for the changed $match indexes:
Array
(
[0] => 24.Legacy.S01E08.720p.HDTV.x264-AVS[rarbg]
[1] => 24.Legacy
[2] => 01
[3] => 08
[4] => 720p.HDTV.x264-AVS
[5] => [rarbg]
[6] => rarbg
)
If you don't need the rarbg value, you can skip the inner ( ):
// NAME SEASON EPISODE MEDIUM OPTIONS
$regex = '/(.+)\.S([0-9]+)E([0-9]+)\.([^\[]+)(\[.+\])?/i';

Getting multiple subpatterns with the same name

Regarding my previous post I'm trying to match with regular expressions all use statements in a class file.
<?php
use Vendor\ProjectArticle\Model\Peer,
Vendor\Library\Template;
use Vendor\Blablabla;
$file = file_get_contents($class_path);
$a = preg_match_all('#use (?:(?<ns>[^,;]+),?)+;#mi', $file, $use);
var_dump(array('$a' => $a, '$use' => $use));
Unfortunately, I don't get all the namespaces when a single use statement lists multiple class names. Only the last one matched is stored.
Array
(
[$a] => 2
[$use] => Array
(
[0] => Array
(
[0] => use Vendor\ProjectArticle\Model\Peer,
Vendor\Library\Template;
[1] => use Vendor\Blablabla;
)
[ns] => Array
(
[0] =>
Vendor\Library\Template
[1] => Vendor\Blablabla
)
[1] => Array
(
[0] =>
Vendor\Library\Template
[1] => Vendor\Blablabla
)
)
)
Can this be accomplished with some pattern modifier or something?
~Thanks

Should be able to use the \G anchor for this.
# '~(?:(?!\A)\G|^Use\s+),?\s*(?<ns>[^,;]+)(?=(?:,|[^,;]*)*;)~mi'
(?xmi-) # Inline modifier = expanded, multiline, case insensitive
(?:
(?! \A ) # Not beginning of string
\G # If matched before, start at end of last match
| # or,
^ Use \s+ # Beginning of line then 'Use' + whitespace
)
,? \s* # Whitespace trim
(?<ns> [^,;]+ ) # (1), A namespace value
(?= # Lookahead, each match validates a final ';'
(?: , | [^,;]* )*
;
)
Output:
** Grp 0 - ( pos 0 , len 36 )
use Vendor\ProjectArticle\Model\Peer
** Grp 1 - ( pos 4 , len 32 )
Vendor\ProjectArticle\Model\Peer
---------------------
** Grp 0 - ( pos 36 , len 30 )
,
Vendor\Library\Template
** Grp 1 - ( pos 43 , len 23 )
Vendor\Library\Template
---------------------
** Grp 0 - ( pos 69 , len 20 )
use Vendor\Blablabla
** Grp 1 - ( pos 73 , len 16 )
Vendor\Blablabla
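Applied from PHP, the \G pattern might be used as follows. This is a sketch: the sample input is a nowdoc that mirrors the question's use statements rather than a real class file, and the variable names are my own.

```php
<?php
// Sketch: sample input mirroring the question's use statements.
$file = <<<'TXT'
use Vendor\ProjectArticle\Model\Peer,
    Vendor\Library\Template;
use Vendor\Blablabla;
TXT;

// Compact form of the \G pattern shown above.
$pattern = '~(?:(?!\A)\G|^use\s+),?\s*(?<ns>[^,;]+)(?=(?:,|[^,;]*)*;)~mi';

$namespaces = [];
if (preg_match_all($pattern, $file, $m)) {
    // Each match contributes exactly one namespace to the "ns" group.
    $namespaces = $m['ns'];
}
print_r($namespaces);
```

Because every namespace is its own match, the named group is no longer overwritten by later repetitions inside a single match.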
