Using Regex with product pricing trailing currency symbol - php

I'm still learning, and regex is mind-numbing stuff. But I have a working regex for preg_match in PHP that matches product prices following a currency symbol (£ or GBP). This may be helpful, as I couldn't find a working example that handles all the variants (thousands separators, decimals, etc.). Any improvements to the regex are welcome!
My question is: why does the array contain 3 instances of every number? And what is the meaning of the "2" that follows?
(?<=\£|GBP)((\d{1,6}(,\d{3})*)|(\d+))(\.\d{2})?
Function:
function website($url) {
    // Declare the global before it is used, so every match lands in the same array
    global $website_prices;
    $xml = new DOMDocument();
    // @ suppresses the warnings that loadHTMLFile() emits for malformed HTML
    if (@$xml->loadHTMLFile($url)) {
        $xpath = new DOMXPath($xml);
        $textNodes = $xpath->query('//text()');
        foreach ($textNodes as $textNode) {
            if (preg_match('/(?<=\£|GBP)((\d{1,6}(,\d{3})*)|(\d+))(\.\d{2})?/', $textNode->nodeValue, $matches, PREG_OFFSET_CAPTURE)) {
                $website_prices[] = $matches;
            }
        }
    }
}
print_r is dumping:
[3] => Array
(
[0] => Array
(
[0] => 545
[1] => 2
)
[1] => Array
(
[0] => 545
[1] => 2
)
[2] => Array
(
[0] => 545
[1] => 2
)
)

Your current regex has a lot of unnecessary grouping. The following regex would be suitable in your case:
(?<=£|GBP)[\d.,]+
see demo / explanation
PHP
(implementation)
<?php
$re = '/(?<=£|GBP)[\d.,]+/';
$str = '£545 £5450 £54.20 £5450 £545,620 £545,620.96
GBP545 GBP5450 GBP54.20 GBP5450 GBP545,620 GBP545,620.96';
preg_match_all($re, $str, $matches);
print_r($matches);
?>
(output)
Array
(
[0] => Array
(
[0] => 545
[1] => 5450
[2] => 54.20
[3] => 5450
[4] => 545,620
[5] => 545,620.96
[6] => 545
[7] => 5450
[8] => 54.20
[9] => 5450
[10] => 545,620
[11] => 545,620.96
)
)
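On the original question of why each number appears three times and where the "2" comes from, here is a minimal sketch, assuming the source file is saved as UTF-8 so that £ occupies two bytes. $matches[0] is the whole match, and $matches[1] and $matches[2] are the two capture groups that took part in it (groups at the end of the pattern that did not participate are dropped). With PREG_OFFSET_CAPTURE, every entry becomes a pair of matched text and its byte offset in the subject, which is 2 here because the number starts right after the two-byte £.

```php
<?php
// Same pattern as in the question (the "£" is written without the redundant backslash).
$re = '/(?<=£|GBP)((\d{1,6}(,\d{3})*)|(\d+))(\.\d{2})?/';

// Assumption: this file is UTF-8, so "£" is two bytes and "545" starts at byte 2.
preg_match($re, '£545', $m, PREG_OFFSET_CAPTURE);

foreach ($m as $group => [$text, $offset]) {
    // Each entry is [matched text, byte offset in the subject]
    echo "group $group: '$text' at byte offset $offset\n";
}

// Without the flag, the entries are plain strings
preg_match($re, '£545', $plain);
print_r($plain);
```

If only the matched price is needed, reading $matches[0][0] (or dropping the flag and reading $matches[0]) is enough.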

Related

PHP preg_match() doesn't match all subpatterns

I have a preg_match() call that matches the pattern but doesn't return the expected matches (in the third parameter).
My regex pattern has multiple subpatterns.
$pattern = "~^&multi&[^&]+(&(?:(p-(?<sad>[1-9]\d*)|page-(?<sad>[1-9]\d*))))?&[^&]+(&(?:(p-(?<gogosi>[1-9]\d*)|page-(?<gogosi>[1-9]\d*))))?&?$~J";
$string = "&multi&mickael&p-23&george&page-34";
preg_match($pattern, $string, $matches);
This is what $matches contains:
Array
(
[0] => &multi&mickael&p-23&george&page-34
[1] => &p-23
[2] => p-23
[sad] =>
[3] => 23
[4] =>
[5] => &page-34
[6] => page-34
[gogosi] => 34
[7] =>
[8] => 34
)
The problem is that [sad] should have the value 23.
If I leave the second page (page-34) out of $string, since it is optional [...]
$string = "&multi&mickael&p-23&george";
[...] $matches is correct, because [sad] gets its value:
Array
(
[0] => &multi&mickael&p-23&george
[1] => &p-23
[2] => p-23
[sad] => 23
[3] => 23
)
But I want the regex to return the proper values even when both paginations are present in $string.
What should I do so that all subpatterns get their value?
Note: The words ('p', 'page') are only examples; any words can appear there.
Note: The data above is just an example. Please don't give me workarounds, but something that works for any input data.
You may use a branch reset group, (?|...|...):
'~^&multi&[^&]+(&((?|p-(?<sad>[1-9]\d*)|page-(?<sad>[1-9]\d*))))?&[^&]+(&((?|p-(?<gogosi>[1-9]\d*)|page-(?<gogosi>[1-9]\d*))))?&?$~J'
See the regex demo.
See the PHP demo:
$pattern = "~^&multi&[^&]+(&((?|p-(?<sad>[1-9]\d*)|page-(?<sad>[1-9]\d*))))?&[^&]+(&((?|p-(?<gogosi>[1-9]\d*)|page-(?<gogosi>[1-9]\d*))))?&?$~J";
$string = "&multi&mickael&p-23&george&page-34";
if (preg_match($pattern, $string, $matches)) {
print_r($matches);
}
Output:
Array
(
[0] => &multi&mickael&p-23&george&page-34
[1] => &p-23
[2] => p-23
[sad] => 23
[3] => 23
[4] => &page-34
[5] => page-34
[gogosi] => 34
[6] => 34
)
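To see what the branch reset group changes, here is a reduced sketch with a single pagination token and unnamed groups (the patterns are simplified for illustration and are not the ones from the question). Inside (?|...), the groups in each alternative share the same number, so the value lands in group 1 no matter which branch matched. With a plain (?:...), every alternative gets its own group number.

```php
<?php
$plain = '~^(?:p-(\d+)|page-(\d+))$~';  // alternatives use groups 1 and 2
$reset = '~^(?|p-(\d+)|page-(\d+))$~';  // both alternatives fill group 1

preg_match($plain, 'page-34', $a);
preg_match($reset, 'page-34', $b);
preg_match($reset, 'p-23', $c);

print_r($a); // group 1 is empty, the number is in group 2
print_r($b); // the number is in group 1
print_r($c); // the number is in group 1 as well
```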

Parsing digit which has three letters with preg_match

I have a string from which I need to parse the numbers that are followed by three letters, and I want to do it with a single pattern using preg_match.
Here is my code; can anybody help me out?
$string=" AMOUNT - 10.00CAD 0.50XGA 1.00XQA";
if(preg_match('/^\s+AMOUNT\s+\-\s+\d+[.]\d+[A-Z]{3}\s+((?J)(?<amount>\d+[.]\d+)(XGA)?(?J)\s+(?<amount>\d+[.]\d+)(XQA))/',$string,$m))
{
print_r($m);
}
I'd use preg_match_all like this:
$string=" AMOUNT - 10.00CAD 0.50XGA 1.00XQA";
if(preg_match_all('/^\s+AMOUNT\s+-\s+(*SKIP)(*F)|(?<amount>\d+\.\d+)[A-Z]{3}\b/', $string, $m)) {
print_r($m);
}
Output:
Array
(
[0] => Array
(
[0] => 10.00CAD
[1] => 0.50XGA
[2] => 1.00XQA
)
[amount] => Array
(
[0] => 10.00
[1] => 0.50
[2] => 1.00
)
[1] => Array
(
[0] => 10.00
[1] => 0.50
[2] => 1.00
)
)
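If the three-letter code is needed next to each amount, a variation along these lines may help. It is only a sketch: the group name "code" is my own choice, and the AMOUNT prefix is not checked at all. PREG_SET_ORDER groups the result per match instead of per capture group, which makes the pairs easier to loop over.

```php
<?php
$string = " AMOUNT - 10.00CAD 0.50XGA 1.00XQA";

// "amount" is the name used in the question; "code" is a hypothetical name for the 3 letters
$re = '/(?<amount>\d+\.\d+)(?<code>[A-Z]{3})\b/';

// PREG_SET_ORDER: $rows[0] holds all groups of the first match, $rows[1] of the second, ...
preg_match_all($re, $string, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    echo $row['code'], ' => ', $row['amount'], "\n";
}
```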

What is the regex for the text between quotes?

Ok, I have tried looking at other answers, but couldn't get mine solved. So here is the code:
{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}
I need to get every second value in the quotes (as the "name" values are constant). I actually worked out that I need to get the text between :" and ", but I can't manage to write a regex for that.
EDIT: I'm doing preg_match_all in PHP. And it's between :" and ", not " and " as someone else edited.
Why on earth would you attempt to parse JSON with regular expressions? PHP already parses JSON properly, with built-in functionality.
Code:
<?php
$input = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';
print_r(json_decode($input, true));
?>
Output:
Array
(
[chg] => -0.71
[vol] => 40700
[time] => 11.08.2011 12:29:09
[high] => 1.417
[low] => 1.360
[last] => 1.400
[pcl] => 1.410
[turnover] => 56,560.25
)
Live demo.
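Since the question asks only for the values, a short sketch building on json_decode could look like this. The error check is my own addition and assumes that invalid input should stop the script.

```php
<?php
$input = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';

$data = json_decode($input, true);
if (!is_array($data)) {
    // json_decode() returns null when the input is not valid JSON
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Drop the keys and keep the values in their original order
$values = array_values($data);
print_r($values);
```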
You may need to escape characters or add a forward slash to the front and back depending on your language, but it's basically:
:"([^"]*)"
or
/:"([^"]*)"/
I've tested this in Groovy as below and it works.
import java.util.regex.*;
String test='{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}'
// Compile a pattern that captures the text between :" and "
Pattern p = Pattern.compile(':"([^"]*)"');
// Find every match in the input
Matcher m = p.matcher(test);
while (m.find())
    // group(1) is the captured value, so colons inside it (e.g. in the time) are kept
    System.out.println("Found comment: "+m.group(1));
Output was:
Found comment: -0.71
Found comment: 40700
Found comment: 11.08.2011 12:29:09
Found comment: 1.417
Found comment: 1.360
Found comment: 1.400
Found comment: 1.410
Found comment: 56,560.25
PHP Example
<?php
$subject = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';
$pattern = '/(?<=:")[^"]*/';
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
Output is:
Array ( [0] => Array ( [0] => Array ( [0] => -0.71 [1] => 8 ) [1] => Array ( [0] => 40700 [1] => 22 ) [2] => Array ( [0] => 11.08.2011 12:29:09 [1] => 37 ) [3] => Array ( [0] => 1.417 [1] => 66 ) [4] => Array ( [0] => 1.360 [1] => 80 ) [5] => Array ( [0] => 1.400 [1] => 95 ) [6] => Array ( [0] => 1.410 [1] => 109 ) [7] => Array ( [0] => 56,560.25 [1] => 128 ) ) )

How to preg_split using PREG_SPLIT_DELIM_CAPTURE

$str = "blabla and, some more blah";
$delimiters = " ,¶.\n";
$char_buff = preg_split("/(,) /", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($char_buff);
I get:
Array (
[0] => blabla and
[1] => ,
[2] => some more blah
)
I was able to figure out how to use parentheses to get the comma to show up in its own array element, but how can I do this with multiple different delimiters (for example, those in the $delimiters variable)?
You need to create a character class by wrapping the delimiters with [ and ].
<?php
$str = "blabla and, some more blah. Blah.\nSecond line.";
$delimiters = " ,¶.\n";
$char_buff = preg_split('/([' . $delimiters . '])/', $str, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($char_buff);
You also need to use PREG_SPLIT_NO_EMPTY so that in places where you get two matches in a row, for instance a comma followed by a space, you don't get an empty match.
Output
Array
(
[0] => blabla
[1] =>
[2] => and
[3] => ,
[4] =>
[5] => some
[6] =>
[7] => more
[8] =>
[9] => blah
[10] => .
[11] =>
[12] => Blah
[13] => .
[14] =>
[15] => Second
[16] =>
[17] => line
[18] => .
)
Depending on what you are doing, using strtok may be a more appropriate way of doing it though.
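One caveat with building the character class by plain concatenation: a delimiter such as "]" or "\" would break the pattern, and without the u modifier the multi-byte "¶" is treated as two separate bytes. A hedged variant of the same idea, assuming the script and the input are UTF-8:

```php
<?php
$str = "blabla and, some more blah. Blah.\nSecond line.";
$delimiters = " ,¶.\n";

// preg_quote() escapes regex metacharacters (and the "/" delimiter given as 2nd argument),
// so characters such as "]" or "\" in $delimiters cannot break the class.
// The "u" modifier makes PCRE treat pattern and subject as UTF-8, so "¶" counts as
// one character rather than two separate bytes.
$pattern = '/([' . preg_quote($delimiters, '/') . '])/u';

$parts = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($parts);
```

For this input the result is the same list of words and delimiters as in the output above.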
Use something like:
'/([,.])/'
That is, put each delimiter inside the square brackets.
Each delimiter expression that should be returned needs to be inside its own capturing group.
print_r(preg_split('/2\d4/', '12345', -1, PREG_SPLIT_DELIM_CAPTURE));
Array ( [0] => 1 [1] => 5 )
print_r(preg_split('/(2)(\d)(4)/', '12345', -1, PREG_SPLIT_DELIM_CAPTURE));
Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4 [4] => 5 )

preg_match return all parts in array

I've got following php code:
$match = array();
if (preg_match("%^(/\d+)(/test)(/\w+)*$%", "/25/test/t1/t2/t3/t4", $match))
print_r($match);
I'm getting this result:
Array ( [0] => /25/test/t1/t2/t3/t4 [1] => /25 [2] => /test [3] => /t4 )
What do I need to change in my regexp to get this result:
Array ( [0] => /25/test/t1/t2/t3/t4 [1] => /25 [2] => /test [3] => /t1 [4] => /t2 [5] => /t3 [6] => /t4)
You need preg_match_all:
preg_match_all( '~(/\w+)~', $str, $matches );
In your situation you can use explode too.
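For the explode route, a minimal sketch might be the following. Note that the segments come back without their leading slash, so this only fits if the slash itself is not needed.

```php
<?php
$str = '/25/test/t1/t2/t3/t4';

// Strip the leading slash first, otherwise explode() returns an empty first element
$parts = explode('/', ltrim($str, '/'));
print_r($parts);
```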
<?php
$str = '/a/b/1/2/3/4';
if(preg_match('/^(\/\w+)*$/', $str) && preg_match_all('/\/\w+/', $str, $matches)) {
$matches = $matches[0];
print_r($matches);
}
?>
Prints:
Array
(
[0] => /a
[1] => /b
[2] => /1
[3] => /2
[4] => /3
[5] => /4
)
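Applied to the string from the question, the same validate-then-collect idea can produce the requested array layout. This is only a sketch: the validation pattern is adapted from the question (with the capturing groups removed), and putting the full string at index 0 is done by hand.

```php
<?php
$str = '/25/test/t1/t2/t3/t4';
$result = [];

// Step 1: validate the overall shape with the anchored pattern
if (preg_match('%^/\d+/test(?:/\w+)*$%', $str)) {
    // Step 2: collect every "/segment" individually
    preg_match_all('%/\w+%', $str, $parts);
    // Put the full string first to mirror the layout of $match in the question
    $result = array_merge([$str], $parts[0]);
}
print_r($result);
```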
Using your original example, you could use a recursive expression:
"%(/\w+)(?>[^(/\w+)]?|(?R))%"
This works by matching (/\w+) subexpressions in turn. Therefore the match for
"/a/b/1/2/3/4"
Would be:
Array
(
[0] => Array
(
[0] => /a [1] => /b [2] => /1 [3] => /2 [4] => /3 [5] => /4
)
...
However your later examples complicate things. A simple 0 or more match will only return the last (greedy) or first (ungreedy) match - not all submatches. preg_match_all won't be able to handle your dynamic expression.
You will have to clarify what you're trying to achieve in more detail before a suitable solution can be provided.
