How can I optimize this regular expression? - php

I have a regular expression and would like to know whether it can be simplified:
preg_match_all('/([0-9]{2}\.[0-9]{2}\.[0-9]{4}) (([01]?[0-9]|2[0-3])\:[0-5][0-9]\:[0-5][0-9]?) поступление на сумму (\d+) WM([A-Z]) от корреспондента (\d+)/', $message->getMessageBody(), $info);

I think this is the best you can do:
preg_match_all('/((?:\d\d\.){2}\d{4}) (([01]?\d|2[0-3])(:[0-5]\d){1,2}) поступление на сумму (\d+) WM([A-Z]) от корреспондента (\d+)/', $message, $info);
If you don't need to match those exact words, you could shorten it further:
preg_match_all('/((?:\d\d\.){2}\d{4}) (([01]?\d|2[0-3])(:[0-5]\d){1,2})\D+(\d+) WM([A-Z])\D+(\d+)/', $message, $info);

You can start by using free-spacing mode and some comments, which will help your and everyone else's understanding and makes simplifying easier. Note that you'll have to put literal spaces in character classes (square brackets) now, though:
/
( # group 1
[0-9]{2}\.[0-9]{2}\.[0-9]{4}
# match a date
)
[ ]
( # group 2
( # group 3
[01]?[0-9] # match an hour from 0 to 19
| # or
2[0-3] # match an hour from 20 to 23
)
\:
[0-5][0-9] # minutes
\:
[0-5][0-9]? # seconds
)
[ ]поступление[ ]на[ ]сумму[ ]
# literal text
(\d+) # a number into group 4
[ ]WM # literal text
([A-Z]) # a letter into group 5
[ ]от[ ]корреспондента[ ]
# literal text
(\d+) # a number into group 6
/x
Now we can't simplify the part at the end - unless you don't want to capture the parenthesised things, in which case you can simply omit most of the parentheses.
You can slightly shorten the expression by using \d as a substitute for [0-9], in which case \d\d is even shorter than \d{2}.
Next, there is no need to escape colons.
And finally, there seems to be something odd with your seconds. If you want to allow single-digit seconds, make the [0-5] optional, and not the \d after it:
/
( # group 1
\d\d\.\d\d\.\d{4}
# match a date
)
[ ]
( # group 2
( # group 3
[01]?\d # match an hour from 0 to 19
| # or
2[0-3] # match an hour from 20 to 23
)
:
[0-5]\d # minutes
:
[0-5]?\d # seconds
)
[ ]поступление[ ]на[ ]сумму[ ]
# literal text
(\d+) # a number into group 4
[ ]WM # literal text
([A-Z]) # a letter into group 5
[ ]от[ ]корреспондента[ ]
# literal text
(\d+) # a number into group 6
/x
I don't think it will get much simpler than that.
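As a rough sketch, the commented pattern can be run as follows. The sample message is made up for illustration (the real input comes from $message->getMessageBody()), and the u modifier and the PREG_SET_ORDER flag are additions that are not part of the original call; u is there only so the Cyrillic text is treated as UTF-8.

```php
<?php
// Hypothetical sample message, invented for this example.
$body = '12.03.2012 14:05:09 поступление на сумму 100 WMR от корреспондента 123456789012';

$pattern = '/
    (\d\d\.\d\d\.\d{4})        # group 1: date
    [ ]
    (                          # group 2: full time
        ([01]?\d|2[0-3])       # group 3: hour
        :[0-5]\d               # minutes
        :[0-5]?\d              # seconds
    )
    [ ]поступление[ ]на[ ]сумму[ ]
    (\d+)                      # group 4: amount
    [ ]WM([A-Z])               # group 5: letter after WM
    [ ]от[ ]корреспондента[ ]
    (\d+)                      # group 6: correspondent
/xu';

// PREG_SET_ORDER (an assumption, not in the original call) yields one array per match.
preg_match_all($pattern, $body, $info, PREG_SET_ORDER);
foreach ($info as $m) {
    echo "date={$m[1]} time={$m[2]} amount={$m[4]} type=WM{$m[5]} from={$m[6]}\n";
}
```

With the default flag (PREG_PATTERN_ORDER), $info[1] would instead hold all dates, $info[2] all times, and so on.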

Related

Regex validation for North American phone numbers

I am having trouble finding a pattern that would detect the following formats:
909-999-9999
909 999 9999
(909) 999-9999
(909) 999 9999
999 999 9999
9999999999
\A[(]?[0-9]{3}[)]?[ ,-][0-9]{3}[ ,-][0-9]{3}\z
I tried the pattern above, but it doesn't work for all of these cases. I was thinking of splitting the problem up by putting each character into an array and then checking it, but then the code would be too long.
You have 4 digits in the last group, and you specify 3 in the regex.
You also need to apply a ? quantifier (1 or 0 occurrence) to the separators since they are optional.
Use
^[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}$
PHP demo:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$strs = array("909-999-9999", "909 999 9999", "(909) 999-9999", "(909) 999 9999", "999 999 9999","9999999999");
$vals = preg_grep($re, $strs);
print_r($vals);
And another one:
$re = "/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/";
$str = "909-999-9999";
if (preg_match($re, $str, $m)) {
echo "MATCHED!";
}
BTW, optional ? subpatterns perform better than alternations.
Try this regex:
^(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}$
Explanation:
^ # start of string
(?: # one of
\(\d{3}\) # a '(999)' sequence
| # or
\d{3} # a '999' sequence
) # end of group
[- ]? # optional space or hyphen
\d{3} # three digits
[- ]? # optional space or hyphen
\d{4} # four digits
$ # end of string
Hope it helps.
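To see where the two suggested patterns differ, here is a small comparison sketch. The variable names and the two extra inputs (an unbalanced parenthesis and comma separators) are made up for illustration, and the second pattern is anchored with \A and \z instead of ^ and $ so that both are compared on equal terms.

```php
<?php
// Pattern with optional parentheses and optional separators.
$optional = '/\A[(]?[0-9]{3}[)]?[ ,-]?[0-9]{3}[ ,-]?[0-9]{4}\z/';
// Pattern with an alternation for the area code (anchors changed to \A and \z here).
$alternation = '/\A(?:\(\d{3}\)|\d{3})[- ]?\d{3}[- ]?\d{4}\z/';

$samples = [
    '909-999-9999',    // formats from the question
    '(909) 999-9999',
    '9999999999',
    '(909 999 9999',   // unbalanced parenthesis (extra test input)
    '909,999,9999',    // comma separators (extra test input)
];

foreach ($samples as $s) {
    printf(
        "%-15s optional=%d alternation=%d\n",
        $s,
        preg_match($optional, $s),
        preg_match($alternation, $s)
    );
}
```

Both patterns accept the formats from the question. Only the first one also accepts the unbalanced parenthesis and the comma-separated input, so the alternation is the stricter choice if those should be rejected.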

Regex for parsing figure references

I'm trying to create a regex to parse figure references inside a text. I must match at least these cases:
Fig* 1, 2 and 3 (not only 3, any number)
Fig* 1-3
Fig* 1 and 2
Fig* 1
Fig* 1 to 4
So I tried the following regex:
(Fig[a-zA-Z.]*)(\s(\d(,|\s)* )+|\d\s|and\s\d|\s\d-\d|\s\d)*
The best result would be having the numbers separated, but having the match I can just clean up the result and parse the numbers.
But I can't seem to parse the "1 to 4" case. Also, this regex doesn't seem optimized at all. Any ideas?
Here is a sample: http://www.phpliveregex.com/p/3Zj
Try this:
(Fig.*) ((\d( to | and |-)\d)|\d)|(\d,\d and \d)
You can use this pattern:
(Fig(?:ures?|s\.)) (\d+(?:(?:-|, | (?:and|to) )\d+)*)
If you need more flexibility, you can replace spaces with \h+ or \h*
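A minimal sketch of how this pattern could be used to get the numbers separated, as the question asks. The sample sentence and the $refs structure are assumptions made for illustration.

```php
<?php
// Hypothetical sample text; the pattern is the one suggested above.
$text = 'See Figures 1, 2 and 3, then Figure 1-3 and Figs. 1 to 4.';
$pattern = '/(Fig(?:ures?|s\.)) (\d+(?:(?:-|, | (?:and|to) )\d+)*)/';

$refs = [];
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    // $m[1] is the label, $m[2] the raw number list such as "1, 2 and 3".
    preg_match_all('/\d+/', $m[2], $nums);
    $refs[] = [
        'label'   => $m[1],
        'numbers' => array_map('intval', $nums[0]),
    ];
}

print_r($refs);
```

Note that ranges such as "1-3" or "1 to 4" yield only their two endpoints here; expanding them into every number in between would need an extra step. As written, the pattern also does not match the plain "Fig." or "Fig" labels.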
Edit:
I see my previous regex didn't work.
Attempting redemption, I offer two alternatives that do work:
1. Using multi-line mode. This uses the \G anchor, which provides a way
to get aligned, trimmed output suitable for storing in an array.
# '/(^Fig[a-zA-Z.]*\h+|(?!^)\G)(?(?<=\d)\h*,\h*)(\d+)(?|\h*(-)\h*(\d+)|\h+(and)\h+(\d+)|\h+(to)\h+(\d+))?/'
( # (1 start)
^ Fig [a-zA-Z.]* \h+ # Fig's
| # or,
(?! ^ ) # Start at the end of last match
\G
) # (1 end)
(?(?<= \d ) # Conditional, if previous digit
\h* , \h* # Require a comma
) # End conditional
( \d+ ) # (2), Digit
(?| # Branch reset (optionally, one of the (-|and|to) \d forms)
\h*
( - ) # (3), '-'
\h*
( \d+ ) # (4), Digit
| \h+
( and ) # (3), 'and'
\h+
( \d+ ) # (4), Digit
| \h+
( to ) # (3), 'to'
\h+
( \d+ ) # (4), Digit
)?
Perl test case
$/ = undef;
$str = <DATA>;
while ($str =~ /(^Fig[a-zA-Z.]*\h+|(?!^)\G)(?(?<=\d)\h*,\h*)(\d+)(?|\h*(-)\h*(\d+)|\h+(and)\h+(\d+)|\h+(to)\h+(\d+))?/mg)
{
length($1) ?
print "'$1'\t'$2'\t'$3'\t'$4'\n" :
print "'$1'\t\t'$2'\t'$3'\t'$4'\n" ;
}
__DATA__
Figs. 1, 2, 3 and 4
Figures 1, 2
Figs. 1 and 2
Figure 1-3
Figure 1 to 3
Figure 1
Output >>
'Figs. ' '1' '' ''
'' '2' '' ''
'' '3' 'and' '4'
'Figures ' '1' '' ''
'' '2' '' ''
'Figs. ' '1' 'and' '2'
'Figure ' '1' '-' '3'
'Figure ' '1' 'to' '3'
'Figure ' '1' '' ''
2. Using multi-line mode. This matches the entire line: capture group 1 contains the 'Figs' label
and group 2 contains all the number forms.
# '/^(Fig[a-zA-Z.]*\h+)((?(?<=\d)\h*,\h*|\d+(?:\h*-\h*\d+|\h+and\h+\d+|\h+to\h+\d+)?)+)\h*$/'
^
( Fig [a-zA-Z.]* \h+ ) # (1), Fig's
( # (2 start), All the num's
(?(?<= \d ) # Conditional, if previous digit
\h* , \h* # Require a comma
| # or
\d+ # Require a digit
(?: # (and optionally, one of the \d (-|and|to) \d forms)
\h* - \h* \d+
| \h+ and \h+ \d+
| \h+ to \h+ \d+
)?
)+ # End conditional, do many times
) # (2 end)
\h*
$
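The second alternative has no test case above, so here is a sketch of how it might be run from PHP. It assumes the same sample lines as the Perl test and relies on PCRE's support for \h and for lookbehind conditions.

```php
<?php
// Whole-line pattern from alternative 2, with the m modifier for multi-line mode.
$pattern = '/^(Fig[a-zA-Z.]*\h+)((?(?<=\d)\h*,\h*|\d+(?:\h*-\h*\d+|\h+and\h+\d+|\h+to\h+\d+)?)+)\h*$/m';

// Same sample lines as the Perl test case.
$text = "Figs. 1, 2, 3 and 4\nFigures 1, 2\nFigure 1-3\nFigure 1 to 3\nFigure 1";

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    // $m[1] is the label (with trailing space), $m[2] the complete number list.
    printf("%s => %s\n", trim($m[1]), $m[2]);
}
```

Group 2 still holds the numbers as one string (for example "1, 2, 3 and 4"), so a second pass is needed if they should be split into separate values.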

Preg_match/Preg_replace in php for matching pattern and replacing it in php

I want to replace the values in a string with XXX.
input:
insert into employees values('shrenik', 555, NULL)
output:
insert into employees values('XXX', XXX, NULL)
I tried this: ([0-9]|\'.*\')
I want to match insert into first and then skip everything up to the opening parenthesis (. The input and the required output are shown above.
Thanks in advance.
You can use this:
$sql = 'insert into employees values(\'shrenik\', 555, NULL)';
$pattern = '~(?:\binsert into [^(]*\(|\G(?<!^),(?:\s*+NULL,)*)\s*+\K(\')?(?(1)[^\']*\'|(?!NULL\b)[^\s,)]*)~i';
$sql = preg_replace($pattern, '$1XXX$1', $sql);
pattern details
~ # pattern delimiter
(?: # non capturing group: where the pattern is allowed to start
\binsert into [^(]*\( # after "insert into" until the opening parenthesis
| # OR
\G(?<!^), # after a previous match, if there is a comma
(?:\s*+NULL,)* # skip NULL values
)
\s*+ # zero or more spaces
\K # reset all that was matched before from match result
(')? # optional capture group 1 with single quote
(?(1) # IF capture group 1 exists:
[^']*' # THEN matches all characters except ' followed by a literal '
| # ELSE
(?!NULL\b)[^\s,)]* # matches any characters except whitespace, comma and ), unless the value is NULL
) # ENDIF
~i # closing pattern delimiter, case-insensitive
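A short sketch that wraps the pattern in a function and tries it on two inputs. The function name maskInsertValues and the second statement are made up for illustration; the pattern and replacement string are the ones given above.

```php
<?php
// Hypothetical helper name; it only wraps the preg_replace call from the answer.
function maskInsertValues(string $sql): string
{
    $pattern = '~(?:\binsert into [^(]*\(|\G(?<!^),(?:\s*+NULL,)*)\s*+\K(\')?(?(1)[^\']*\'|(?!NULL\b)[^\s,)]*)~i';
    // preg_replace returns null on error; fall back to the unchanged input in that case.
    return preg_replace($pattern, '$1XXX$1', $sql) ?? $sql;
}

echo maskInsertValues("insert into employees values('shrenik', 555, NULL)"), "\n";
echo maskInsertValues("INSERT INTO t VALUES(1, NULL, 'a b')"), "\n";
```

One limitation worth knowing: a statement whose first value is NULL, such as values(NULL, 5), appears to be left entirely unchanged, because the first branch cannot match at NULL and the \G branch needs a previous match to continue from.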

Regex/PHP Replace any repeating (but flexible) word group

How can I match "Any Group" when it is repeated as "ANY GROUP" or "ANYGROUP"?
$string = "Foo Bar (Any Group - ANY GROUP Baz)
Foo Bar (Any Group - ANYGROUP Baz)";
so that both lines return as "Foo Bar (Any Group - Baz)"
The separator would always be -
This post extends Regex/PHP Replace any repeating word group
The following matches "Any Group - ANY GROUP", but not when the group is repeated without the space.
$result = preg_replace(
'%
( # Match and capture
(?: # the following:...
[\w/()]{1,30} # 1-30 "word" characters
[^\w/()]+ # 1 or more non-word characters
){1,4} # 1 to 4 times
) # End of capturing group 1
([ -]*) # Match any number of intervening characters (space/dash)
\1 # Match the same as the first group
%ix', # Case-insensitive, verbose regex
'\1\2', $subject);
This is ugly (as I said it would be), but it should work:
$result = preg_replace(
'/((\b\w+)\s+) # One repeated word
\s*-\s*
\2
|
((\b\w+)\s+(\w+)\s+) # Two repeated words
\s*-\s*
\4\s*\5
|
((\b\w+)\s+(\w+)\s+(\w+)\s+) # Three
\s*-\s*
\7\s*\8\s*\9
|
((\b\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+) # Four
\s*-\s*
\11\s*\12\s*\13\s*\14\b/ix',
'\1\3\6\10-', $subject);
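To check the idea on the two sample lines, here is a cut-down sketch. It keeps only the one-word and two-word branches (an assumption made to keep the example short), so the replacement string is shortened to match; the group numbers are the same as in the full pattern.

```php
<?php
// Trimmed version of the pattern above: only the one-word and two-word branches.
$pattern = '/
    ((\b\w+)\s+)             # one repeated word
    \s*-\s*
    \2
  |
    ((\b\w+)\s+(\w+)\s+)     # two repeated words
    \s*-\s*
    \4\s*\5                  # the \s* lets "ANYGROUP" match as well as "ANY GROUP"
/ix';

$lines = [
    'Foo Bar (Any Group - ANY GROUP Baz)',
    'Foo Bar (Any Group - ANYGROUP Baz)',
];

$results = [];
foreach ($lines as $line) {
    // Keep the first occurrence (group 1 or 3) and put the dash back.
    $results[] = preg_replace($pattern, '\1\3-', $line);
}
print_r($results);
```

The optional \s* between the backreferences is what makes this approach tolerate the missing space, at the cost of one branch per word count.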
A solution for up to six words:
$result = preg_replace(
'/
(\(\s*)
(([^\s-]+)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*))
(\s*\-\s*)
\3\s*\4\s*\5\s*\6\s*\7\s*\8\s*
/ix',
'\1\2\9',
$string);

Regex/PHP Replace any repeating word group

How can I match the repeated group in this string?
$string = "Foo Bar (Any Group - ANY GROUP Baz)";
It should return "Foo Bar (Any Group - Baz)".
Is this possible without brute force, as used in "Replace repeating strings in a string"?
Edit:
* The group could consist of 1-4 words while each word could match [A-Za-z0-9\/\(\)]{1,30}
* The separator would always be -
Leaving the space out of the list of allowed "word" characters, the following works for your example:
$result = preg_replace(
'%
( # Match and capture
(?: # the following:...
[\w/()]{1,30} # 1-30 "word" characters
[^\w/()]+ # 1 or more non-word characters
){1,4} # 1 to 4 times
) # End of capturing group 1
([ -]*) # Match any number of intervening characters (space/dash)
\1 # Match the same as the first group
%ix', # Case-insensitive, verbose regex
'\1\2', $subject);
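As a quick check, the pattern can be run like this. The $dedupe closure and the second input are made up for illustration; the second input shows the case this pattern does not handle, because the backreference \1 has to repeat the captured text including its space.

```php
<?php
$pattern = '%
    (                        # Match and capture
      (?:                    # the following:...
        [\w/()]{1,30}        # 1-30 "word" characters
        [^\w/()]+            # 1 or more non-word characters
      ){1,4}                 # 1 to 4 times
    )                        # End of capturing group 1
    ([ -]*)                  # Match any number of intervening characters (space/dash)
    \1                       # Match the same as the first group
%ix';

// Hypothetical helper closure that applies the replacement from the answer.
$dedupe = fn(string $subject): string => preg_replace($pattern, '\1\2', $subject);

echo $dedupe('Foo Bar (Any Group - ANY GROUP Baz)'), "\n";
echo $dedupe('Foo Bar (Any Group - ANYGROUP Baz)'), "\n";
```

The first line comes back with the repeated group removed, while the second is returned unchanged.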
