I'm writing a PHP script that will parse bank payment reports.
For example, I have this code:
$str = "Customer Name /First Polises number - SAT431223 (5.20 eur), BOS32342 (33,85 euro), (32,10 eiro), (78.66 €), €1232,2, (11.45)";
And I need to find all of these currency amounts in the string, so the output should look like this:
5.20
33.85
32.10
78.66
1232.20
11.45
How can I do that? I know the function preg_match(), but I don't understand how to write the pattern for this case.
preg_match will give you only the first match found. Use preg_match_all to get an array of all matches.
Here's everything you need to know about how to build regex patterns:
http://php.net/manual/en/reference.pcre.pattern.syntax.php
You need a pattern like this: /[0-9]+[,.]{1}[0-9]{2}/
/ - delimiter; it can be another character, but it must appear at both the beginning and the end of the pattern.
[0-9] - matches a digit
+, {1} and {2} - quantifiers that define the number of characters. + means "one or more"; a number in {} is an exact count.
[,.]{1} - matches exactly one ({1}) character from the set of , and . (the {1} is redundant and can be dropped).
Example code:
$matches = array();
preg_match_all('/[0-9]+[,.]{1}[0-9]{2}/', $str, $matches);
var_dump($matches);
Result:
array (size=1)
0 =>
array (size=5)
0 => string '5.20' (length=4)
1 => string '33,85' (length=5)
2 => string '32,10' (length=5)
3 => string '78.66' (length=5)
4 => string '11.45' (length=5)
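Note that the result above has five entries: €1232,2 is skipped because the pattern demands exactly two decimal digits, and the commas are kept as they appear in the input. A minimal sketch of one way to get the six normalised values the question asks for, assuming an amount is always digits, one separator, then one or two decimal digits. The normalizeAmount() name is made up for this example and is not a PHP built-in:

```php
<?php
// normalizeAmount() is a hypothetical helper name, not part of PHP.
function normalizeAmount(string $raw): string
{
    // Use a dot as the decimal separator, then pad to two decimals.
    return number_format((float) str_replace(',', '.', $raw), 2, '.', '');
}

// The sample string from the question.
$str = "Customer Name /First Polises number - SAT431223 (5.20 eur), BOS32342 (33,85 euro), (32,10 eiro), (78.66 €), €1232,2, (11.45)";

// Assumption: digits, one separator (. or ,), then 1 or 2 digits.
// Allowing a single decimal digit is what lets "1232,2" through.
preg_match_all('/\d+[.,]\d{1,2}/', $str, $matches);

$amounts = array_map('normalizeAmount', $matches[0]);
print_r($amounts);
```

This still treats any decimal number as money, and it would cut a value such as 12.345 down to 12.34, so real reports may need a stricter pattern.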
I would do this with:
/([0-9]+[,.][0-9]+)/g
Matching:
Numbers (zero or more times)
Dot or Comma
Numbers (zero or more times)
Note the g: Global to get all matches
Example and more detailed break-down of the regex: https://regex101.com/r/eH6aX6/1
That will match any decimal values in the provided sentence, which are not necessarily currency amounts.
Hope it points you in the right direction.
I have a placeholder in my content which has the following format
{{label1#label2_label3}}
And I'm correctly matching it with this regex
preg_match('/\{\{(\w+|d+|_+|#+)*\}\}/i', $content, $matches);
The problem is that the $matches array which PHP returns has the following data (preg_match docs)
array (size=2)
0 => string '{{label1#label2_label3}}' (length=44)
1 => string 'label2_label3' (length=16)
While my expected output is the following
array (size=2)
0 => string '{{label1#label2_label3}}' (length=44)
1 => string 'label1#label2_label3' (length=16)
My solution was to use a replace to simply get rid of the braces, like so:
$matches[1] = str_replace("}", "", (str_replace("{","",$matches[0])));
which works, but I'm concerned about the performance while rendering a page with many placeholders.
Is there any flag or function I'm missing to just tell PHP to return the entire string inside {{ }} in $matches[1]?
\w already matches everything that \d and _ match, so the alternation reduces to \w and #.
The label1# part is missing because you are repeating a capture group, and a repeated group keeps only the value of its last iteration.
Since you want label1#label2_label3, use a single character class that matches word characters and the # character, and put the quantifier inside a capture group that is not itself repeated.
{{([\w#]+)}}
$content = "{{label1#label2_label3}}";
preg_match('/{{([\w#]+)}}/i', $content, $matches);
print_r($matches);
Output
Array
(
[0] => {{label1#label2_label3}}
[1] => label1#label2_label3
)
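To see the "last iteration wins" behaviour side by side with the fix, here is a small sketch. The repeated pattern is a simplified form of the one in the question (only the \w+ and #+ alternatives are kept, which is an assumption made for brevity):

```php
<?php
$content = '{{label1#label2_label3}}';

// Repeated group: each iteration overwrites the capture, so only the
// last chunk survives in $repeated[1].
preg_match('/\{\{(\w+|#+)*\}\}/', $content, $repeated);

// Single group around a character class: the whole body is captured once.
preg_match('/\{\{([\w#]+)\}\}/', $content, $single);

echo $repeated[1], "\n"; // label2_label3
echo $single[1], "\n";   // label1#label2_label3
```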
If the # and _ can not be at the start or at the end:
{{([^\W_]+(?:[_#][^\W_]+)*)}}
The pattern in parts:
{{ Match literally
( Capture group 1
[^\W_]+ Match 1+ word characters without _
(?:[_#][^\W_]+)* Optionally repeat matching either _ or # and 1+ word characters without _
) Close group 1
}} Match literally
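A quick way to check the stricter pattern is to run it over a few placeholders. The sample inputs below are invented for illustration and are not from the question:

```php
<?php
$pattern = '/\{\{([^\W_]+(?:[_#][^\W_]+)*)\}\}/';

// Hypothetical inputs: one valid placeholder and three that break the rule.
$samples = [
    '{{label1#label2_label3}}', // separators only between labels
    '{{_label1}}',              // starts with _
    '{{label1#}}',              // ends with #
    '{{a__b}}',                 // two separators in a row
];

$accepted = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m) === 1) {
        $accepted[] = $m[1];
    }
}
print_r($accepted); // only label1#label2_label3 is accepted
```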
I have strings with numbers-dashes sub-strings. I want to find these sub-strings and replace them after some modifications.
For example, the string is like:
This is the string number 123-45-6789-0 which contains 12-34567.
Now I want to find sub-strings of numbers-dashes (123-45-6789-0 and 12-34567) and replace them with the modified sub-strings. For example the final string would be like this:
This is the modified string number 0-6789-45-123 which contains 34567-12.
I have already tried preg_match_all(string $pattern, string $subject, array &$matches) with:
$pattern = '/-*\d+-*/';
but it gives me an array of numbers each one with a dash, like this:
$matches = [123-, 45-, 6789-, 0, 12-, 34567]
whereas, I want an array of two sub-strings, like this:
$matches = [0 => 123-45-6789-0, 1 => 12-34567]
in order to do modifications and replacements (using str_replace()), separately.
Which pattern and methods should I use for this purpose?
Thanks in advance.
You may use the \d+(?:-\d+)+ regex with the preg_replace_callback function:
$str = 'This is the string number 123-45-6789-0 which contains 12-34567.';
echo preg_replace_callback('~\d+(?:-\d+)+~', function ($m) {
    return implode('-', array_reverse(explode('-', $m[0])));
}, $str);
// => This is the string number 0-6789-45-123 which contains 34567-12.
The \d+(?:-\d+)+ pattern matches
\d+ - 1+ digits
(?:-\d+)+ - 1 or more occurrences of - and 1+ digits sequences.
$m is a match array, $m[0] holds the match value. With explode, the string is split with -, then the array is reversed, and then joined back with implode.
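If you really do want the array of whole sub-strings that the question describes, the same pattern should work with preg_match_all. A minimal sketch, with variable names chosen for this example only:

```php
<?php
$str = 'This is the string number 123-45-6789-0 which contains 12-34567.';

// Collect every digits-and-dashes run as one whole match.
preg_match_all('~\d+(?:-\d+)+~', $str, $matches);
$found = $matches[0];

// Apply the same modification to each match separately.
$reversed = array_map(
    function ($run) {
        return implode('-', array_reverse(explode('-', $run)));
    },
    $found
);

print_r($found);    // 123-45-6789-0 and 12-34567
print_r($reversed); // 0-6789-45-123 and 34567-12
```

preg_replace_callback is still the safer tool for the replacement itself, because str_replace would also rewrite any other occurrence of the same text elsewhere in the string.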
I have the following expression:
$exp = "/^(?!.*?that).*$/";
which is meant to match any line that does not contain "that".
I have the following three sentences:
$str = array(
"I like this sentence.", #line1
"I like that sentence.", #line2
"I link THAT sentence." #line3
);
The match is case-sensitive and therefore only lines 1 and 3 are matched. So far so good.
However, I would like to make it case-insensitive, so that it only matches line 1. I have tried with an inline modifier, i.e. "(?i: ... )":
$exp = "/^(?!.*?(?i:that)).*$/";
and as a flag, i.e. "/ ... /i":
$exp = "/^(?!.*?that).*$/i";
but to no avail.
I run the search with the following loop:
foreach($str as $s) {
preg_match_all($exp, $s, $matches);
var_dump($matches);
}
with output:
array (size=1)
0 =>
array (size=1)
0 => string 'I like this sentence.' (length=21)
array (size=1)
0 =>
array (size=0)
empty
array (size=1)
0 =>
array (size=1)
0 => string 'I link THAT sentence.' (length=21)
and an online demo is available here: https://regex101.com/r/bs9rzF/1
I would be grateful for any tips about how I can make my regular expression case-insensitive.
EDIT: I was incorrectly using "?-i" instead of "?i", as some contributors correctly pointed out. Fixed now.
Your first regex, ^(?!.*?that).*$, is case-sensitive because it uses no case-insensitivity modifier.
It matches the first and third sentences because it requires that the lowercase text "that" does not appear anywhere in the line, which is true for both of them (the third sentence contains THAT, which is not the same as that).
To match only the first sentence, you can use the inline modifier (?i) like
(?i)^(?!.*?that).*$
BTW, your /^(?!.*?that).*$/i regex is also correct.
You were close:
^(?!.*?(?i)that).*$
See a demo on regex101.com. In your expression ((?-i)) you were turning the modifier off.
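The three spellings suggested in this thread can be compared directly against the sample sentences. A short sketch using preg_grep, which returns the array entries that match a pattern:

```php
<?php
$lines = [
    "I like this sentence.",
    "I like that sentence.",
    "I link THAT sentence.",
];

// The three case-insensitive variants from the answers above.
$patterns = [
    '/^(?!.*?that).*$/i',    // trailing i flag
    '/(?i)^(?!.*?that).*$/', // inline modifier for the whole pattern
    '/^(?!.*?(?i)that).*$/', // inline modifier inside the lookahead only
];

$results = [];
foreach ($patterns as $pattern) {
    // array_values() just renumbers the kept entries from 0.
    $results[$pattern] = array_values(preg_grep($pattern, $lines));
    echo $pattern, ' => ', implode(' | ', $results[$pattern]), "\n";
}
```

Each variant should keep only "I like this sentence.".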
I have a string:
$day = "11.08.2012 PROC BRE-AMS 08:00-12:00 ( MIETWAGEN MIT BAK RES 6049687886 ) Y AMS-AMS 13:15-19:15";
And I have a regular expression:
$data = preg_split("/(?=[A-Z]{1,4}[\s]+[A-Z]{3}[\-][A-Z]{3}[\s]+)/", $day);
The expected $data-Array should be:
array
0 => string '11.08.2012 ' (length=11)
1 => string 'PROC 08:00-12:00 ( MIETWAGEN MIT BAK RES 6049687886 ) ' (length=22)
2 => string 'Y AMS-AMS 13:15-19:15' (length=21)
But my result is:
0 => string '11.08.2012 ' (length=11)
1 => string 'P' (length=1)
2 => string 'R' (length=1)
3 => string 'O' (length=1)
4 => string 'C BRE-AMS 08:00-12:00 ( MIETWAGEN MIT BAK RES 6049687886 ) ' (length=59)
5 => string 'Y AMS-AMS 13:15-19:15' (length=21)
I cannot retrace what's happening here. Could someone please explain?
In short, the problem is that the (?=...) subexpression in your pattern matches a position. I understand that was exactly your intention; the problem is that the next match attempt starts not where the pattern inside (?=...) ends its match, but one symbol after the position matched by the lookahead.
Let's check this process in detail. The first time the split is attempted, it walks the string until it gets to the position marked by the asterisk:
11.08.2012 *PROC BRE-AMS 08:00-12:00
... where it can match the pattern given. For the next attempt, the starting position 'bumps along' one symbol, so now we're here:
11.08.2012 P*ROC BRE-AMS 08:00-12:00
... and voila, we again can match this pattern, because of that {1,4} quantifier! That's how you got these 'irregular' P, R and O symbols.
That's for explanation, now for the "how to fix" part. The easiest way out of this, I suppose, is adding this little twist in your split pattern:
$data = preg_split('/\b(?=[A-Z]{1,4}\s+[A-Z]{3}-[A-Z]{3}\s+)/', $day);
We still match for position - but now this position should be the one that separates a 'word' symbol from a non-word one. The same idea can be expressed with negative lookbehind pattern:
$data = preg_split('/(?<![A-Z])(?=[A-Z]{1,4}\s+[A-Z]{3}-[A-Z]{3}\s+)/', $day);
... which is actually more precise, but less elegant, I suppose. )
Two sidenotes here: 1) don't use character class syntax when you need to specify a single symbol (a simple one, like -, or a 'shortcut' one, like \s); 2) use single quotation marks to delimit your pattern unless you want to interpolate some variables in it.
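To compare the original split with the two fixes described above, a sketch like the following counts the pieces each pattern produces. The array keys are labels made up for this example:

```php
<?php
$day = "11.08.2012 PROC BRE-AMS 08:00-12:00 ( MIETWAGEN MIT BAK RES 6049687886 ) Y AMS-AMS 13:15-19:15";

// Labels on the left are illustrative names, not PHP features.
$patterns = [
    'original'   => '/(?=[A-Z]{1,4}\s+[A-Z]{3}-[A-Z]{3}\s+)/',
    'boundary'   => '/\b(?=[A-Z]{1,4}\s+[A-Z]{3}-[A-Z]{3}\s+)/',
    'lookbehind' => '/(?<![A-Z])(?=[A-Z]{1,4}\s+[A-Z]{3}-[A-Z]{3}\s+)/',
];

$parts = [];
foreach ($patterns as $name => $pattern) {
    $parts[$name] = preg_split($pattern, $day);
    echo $name, ': ', count($parts[$name]), " pieces\n";
}

// With either fix, the pieces start at the date, at PROC and at Y.
print_r($parts['boundary']);
```

The original pattern also splits before R, O and C, which is where the stray single-letter entries come from.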
A hyphen is a metacharacter in a character class. If you want to include a hyphen in a character class you have to backslash escape it (although in this specific case it works since your character class has nothing but a hyphen).
If you need to include the split string, anchor the start of the lookahead to a word boundary, so that only the first letter of the first 1-4 character sequence is tested:
'/(?=\b[A-Z]{1,4}\s+[A-Z]{3}-[A-Z]{3}\s+)/'
I know I'm just being simple-minded at this point but I'm stumped. Suppose I have a textual target that looks like this:
Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.
Using this RegExp: \s[A-Z][A-Z]\d\d\d\d\s, how would I extract, individually, the first and second occurrences of the matching strings? "JH6781" and "RB1223", respectively. I guarantee that the matching string will appear exactly twice in the target text.
Note: I do NOT want to change the existing string at all, so str_replace() is not an option.
Erm... how about using this regex:
/\b[A-Z]{2}\d{4}\b/
It means 'match boundary of a word, followed by exactly two capital English letters, followed by exactly four digits, followed by a word boundary'. So it won't match 'TGX7777' (word boundary is followed by three letters - pattern match failed), and it won't match 'TX77777' (four digits are followed by another digit - fail again).
And that's how it can be used:
$str = "Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
preg_match_all('/\b[A-Z]{2}\d{4}\b/', $str, $matches);
var_dump($matches[0]);
// array
// 0 => string 'JH6781' (length=6)
// 1 => string 'RB1223' (length=6)
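The rejected cases mentioned above (TGX7777 and TX77777) can be checked the same way, together with the near-misses from the sample text. A small sketch, where the candidate list is assembled for illustration:

```php
<?php
$pattern = '/\b[A-Z]{2}\d{4}\b/';

// Candidates taken from the explanation and from the sample text.
$candidates = ['JH6781', 'RB1223', 'TGX7777', 'TX77777', 'T5677', 'HG766'];

$accepted = [];
foreach ($candidates as $candidate) {
    if (preg_match($pattern, $candidate) === 1) {
        $accepted[] = $candidate;
    }
}
print_r($accepted); // JH6781 and RB1223
```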
$s='Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother\'s HG766 id was RB1223.';
$n=preg_match_all('/\b[A-Z][A-Z]\d\d\d\d\b/',$s,$m);
gives the result $n=2, then
print_r($m);
gives the result
Array
(
[0] => Array
(
[0] => JH6781
[1] => RB1223
)
)
You could use a combination of preg_match with the offset parameter (the 5th) and strpos to select the first and second occurrences.
Alternatively, you could use preg_match_all and just take the first two array entries.
<?php
// $regex and $subject are assumed to hold the pattern and the text to search.
preg_match($regex, $subject, $first);
// Restart the search one character after the start of the first match.
preg_match($regex, $subject, $second, 0, strpos($subject, $first[0]) + 1);
?>