The preg_replace() function has so many possible values, like:
<?php
$patterns = array('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/', '/^\s*{(\w+)}\s*=/');
$replace = array('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
What does:
\3/\4/\1\2
And:
/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/','/^\s*{(\w+)}\s*=/
mean?
Is there any information available to help understand the meanings at one place? Any help or documents would be appreciated! Thanks in Advance.
Take a look at http://www.tutorialspoint.com/php/php_regular_expression.htm
\3 is the captured group 3
\4 is the captured group 4
...and so on...
\w means any word character.
\d means any digit.
\s means any white space.
+ means match the preceding pattern one or more times.
* means match the preceding pattern zero or more times.
{n,m} means match the preceding pattern at least n and at most m times.
{n} means match the preceding pattern exactly n times.
{n,} means match the preceding pattern at least n times.
(...) is a captured group.
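As a quick check of the quantifiers listed above, here is a minimal sketch. The test strings and variable names are made up for illustration; only the quantifier syntax comes from the list.

```php
<?php
// preg_match() returns 1 when the pattern matches and 0 when it does not.
$exact      = preg_match('/^\d{4}$/', '1999');   // {n}: exactly four digits
$tooShort   = preg_match('/^\d{4}$/', '199');    // only three digits, so no match
$inRange    = preg_match('/^\d{1,2}$/', '27');   // {n,m}: one or two digits
$outOfRange = preg_match('/^\d{1,2}$/', '127');  // three digits is too many
$atLeast    = preg_match('/^\w{3,}$/', 'start'); // {n,}: three or more word characters
$optional   = preg_match('/^\s*=$/', '=');       // *: zero whitespace characters is fine

var_dump($exact, $tooShort, $inRange, $outOfRange, $atLeast, $optional);
```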
So, the first thing to point out, is that we have an array of patterns ($patterns), and an array of replacements ($replace). Let's take each pattern and replacement and break it down:
Pattern:
/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/
Replacement:
\3/\4/\1\2
This takes a date and converts it from a YYYY-M-D format to a M/D/YYYY format. Let's break down its components:
/ ... / # The starting and trailing slash mark the beginning and end of the expression.
(19|20) # Matches either 19 or 20, capturing the result as \1.
# \1 will be 19 or 20.
(\d{2}) # Matches any two digits (must be two digits), capturing the result as \2.
# \2 will be the two digits captured here.
- # Literal "-" character, not captured.
(\d{1,2}) # Either 1 or 2 digits, capturing the result as \3.
# \3 will be the one or two digits captured here.
- # Literal "-" character, not captured.
(\d{1,2}) # Either 1 or 2 digits, capturing the result as \4.
# \4 will be the one or two digits captured here.
This match is replaced by \3/\4/\1\2, which means:
\3 # The one or two digits captured in the 3rd set of `()`s, representing the month.
/ # A literal '/'.
\4 # The one or two digits captured in the 4th set of `()`s, representing the day.
/ # A literal '/'.
\1 # Either '19' or '20'; the first two digits captured (first `()`s).
\2 # The two digits captured in the 2nd set of `()`s, representing the last two digits of the year.
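To see the groups in action, here is a small sketch using the pattern and replacement from the question. The input strings and variable names are made up for illustration.

```php
<?php
// Pattern and replacement are taken from the question; the inputs are hypothetical.
$pattern     = '/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/';
$replacement = '\3/\4/\1\2';

// Only the part of the string that matches the pattern is rewritten.
$converted = preg_replace($pattern, $replacement, 'Due on 1999-5-27');
echo $converted, "\n";

// preg_match() fills $groups with the same captures that \1 to \4 refer to:
// index 0 is the whole match, index 1 to 4 are the four groups.
preg_match($pattern, '2014-11-03', $groups);
print_r($groups);
```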
Pattern:
/^\s*{(\w+)}\s*=/
Replacement:
$\1 =
This takes a variable name encoded as {variable} and converts it to $variable = <date>. Let's break it down:
/ ... / # The starting and trailing slash mark the beginning and end of the expression.
^ # Matches the beginning of the string, anchoring the match.
# If the following character isn't matched exactly at the beginning of the string, the expression won't match.
\s* # Any whitespace character. This can include spaces, tabs, etc.
# The '*' means "zero or more occurrences".
# So, the whitespace is optional, but there can be any amount of it at the beginning of the line.
{ # A literal '{' character.
(\w+) # One or more 'word' characters (a-z, A-Z, 0-9, _). This is captured in \1.
# \1 will be the text contained between the { and }, and is the only thing "captured" in this expression.
} # A literal '}' character.
\s* # Any whitespace character. This can include spaces, tabs, etc.
= # A literal '=' character.
This match is replaced by $\1 =, which means:
$ # A literal '$' character.
\1 # The text captured in the 1st and only set of `()`s, representing the variable name.
# A literal space.
= # A literal '=' character.
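When both arguments are arrays, each pattern is applied in turn to the result of the previous one. The sketch below makes that visible by running the two replacements from the question one at a time; the loop and the $steps array are only there for illustration.

```php
<?php
// Arrays copied from the question.
$patterns = array('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/', '/^\s*{(\w+)}\s*=/');
$replace  = array('\3/\4/\1\2', '$\1 =');

// Apply the pairs one after another and record each intermediate string.
$text  = '{startDate} = 1999-5-27';
$steps = array();
foreach ($patterns as $i => $pattern) {
    $text    = preg_replace($pattern, $replace[$i], $text);
    $steps[] = $text;
}
print_r($steps);

// Passing the arrays directly gives the same final string.
$direct = preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
echo $direct, "\n";
```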
Lastly, I wanted to show you a couple of resources. The regex-format you're using is called "PCRE", or Perl-Compatible Regular Expressions. Here is a quick cheat-sheet on PCRE for PHP. Over the last few years, several tools have been popping up to help you visualize, explain, and test regular expressions. One is Regex 101 (just Google "regex tester" or "regex visualizer"). If you look here, this is an explanation of the first RegEx, and here is an explanation of the second. There are others as well, like Debuggex, Regex Tester, etc. But I find the detailed match breakdown on Regex 101 to be pretty useful.
I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option could be using lookaheads to assert 3 digits, not 2 backslashes and not 2 times a hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
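(?=(?:[^\d\s]*\d){3}) Assert that what is on the right contains 3 digits, each preceded by zero or more chars that are neither a digit nor a whitespace char
(?!(?:[^\s-]*-){2}) Assert that what is on the right does not contain 2 hyphens, each preceded by zero or more chars that are neither a whitespace char nor a hyphen
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right does not contain 2 backslashes, each preceded by zero or more chars that are neither a whitespace char nor a backslash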
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
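A minimal sketch of how this pattern could be used from PHP to pull the valid tokens out of a whitespace-separated string. The ~ delimiter and the sample input are my own choices; note that every backslash the regex needs literally has to be doubled inside a single-quoted PHP string.

```php
<?php
// Pattern from above. '\\\\' in the PHP source becomes '\\' in the regex,
// which in turn stands for one literal backslash.
$pattern = '~(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\\\]*\\\\){2})[A-Z0-9/\\\\-]+(?!\S)~';

// Hypothetical input: tokens separated by single spaces.
$input = '02ABU-D9435 AA-AA 013DFC 01-01-2018 1123451';

// AA-AA has no digits and 01-01-2018 has two hyphens, so both are skipped.
preg_match_all($pattern, $input, $matches);
print_r($matches[0]);
```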
Regex demo
Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested resides on its own line, i.e. the strings are not just separated by whitespace.
In case the strings are separated by whitespace, choose the solution of user @TheFourthBird (which is essentially the same as this one but caters for the whitespace separation).
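For the one-string-per-line case, a sketch along these lines should work. It assumes the m modifier, so that ^ and $ apply to each line, and the ~ delimiter; the sample lines are taken from the question.

```php
<?php
// Pattern from above, with ~ as delimiter and the m (multiline) modifier.
$pattern = '~^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$~m';

// One candidate per line.
$input = "02ABU-D9435\n013DFC\n01/01/2018\nAA-AA";

// $matches[0] holds the lines that satisfy all three conditions.
preg_match_all($pattern, $input, $matches);
print_r($matches[0]);
```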
You can test the condition for both the hyphen and the slash in a single lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
demo
Detailed:
~ # using the tilde as the pattern delimiter avoids having to escape the slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any remaining characters until the end of the string
\z # end of the string.
~
To better understand this (if you are not familiar with lookaheads): these three subpatterns start from the same position (in this case the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: simple tests that don't consume any characters.
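Here is a short sketch that runs this pattern against a few of the strings from the question, one string at a time. preg_grep() is used only to keep the example compact.

```php
<?php
// Pattern from above; each candidate is validated as a whole string.
$pattern = '~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~';

$candidates = array('02ABU-D9435', '03SDFA9433/0', 'O4AGFC4430', '01/01/2018', 'AA-AA');

// preg_grep() keeps the entries that match; array_values() reindexes them.
$accepted = array_values(preg_grep($pattern, $candidates));
print_r($accepted);
```

01/01/2018 is rejected by the negative lookahead (the slash is repeated), and AA-AA is rejected because it contains no digits.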
I need to generate a regex that will match the following format:
-1 LKSJDF LSAALSKJ~
Syjsdf
lkjdf
This block may contain multiple characters including digits, colons, etc. Any character other than a tilde.
~
I'm currently using this:
/(-\d|\d)\s([^$\~][a-zA-Z\s]*)\~\n/s
Which matches the first line fine. I need to capture the -1 through 60 that begins the pattern, the words after the space and up until the first tilde. I then need to capture all of the text BETWEEN the tildes.
I'm not the strongest with regex in the first place, but I'm having trouble getting this to work without also capturing the tildes.
You can use
'/^(-?\d+)\s+([^~]*)~([^~]+)~/m'
See demo
The regex matches:
^ - start of a line (with the /m modifier, ^ matches at the start of every line, not only at the start of the string)
(-?\d+) - (Group 1) an optional - followed by one or more digits
\s+ - one or more whitespace symbols (to only match tab and regular spaces, use \h+ instead)
([^~]*) - (Group 2) zero or more characters other than a ~ (you can force to match these characters on the first line only by adding a \n\r to the negated character class - [^~\n\r])
~ - a literal leading tilde
([^~]+) - (Group 3) one or more characters other than a tilde
~ - a literal trailing tilde
If you need to only match these strings if the number is an integer between -1 and 60, you can use
'/^(-1|[1-5]?[0-9]|60)\s+([^~]*)~([^~]+)~/m'
See another demo
Here, the first group matches integer numbers from -1 to 60 with (-1|[1-5]?[0-9]|60) alternation group. -1 and 60 match literal numbers, and [1-5]?[0-9] matches one or zero (optional) digit from 1 to 5 (replace with [0-5]? if a leading zero is allowed) and then any one digit may follow.
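A small sketch of both patterns in PHP. The input mirrors the block from the question, shortened; the variable names are made up.

```php
<?php
// Sample block modelled on the question.
$input = "-1 LKSJDF LSAALSKJ~\nSyjsdf\nlkjdf\n~";

// First pattern: any integer at the start of a line.
preg_match('/^(-?\d+)\s+([^~]*)~([^~]+)~/m', $input, $m);
$number = $m[1];        // the leading number
$title  = $m[2];        // the words before the first tilde
$body   = trim($m[3]);  // the text between the tildes, without the tildes

var_dump($number, $title, $body);

// Second pattern: only accept numbers from -1 to 60.
$ranged   = '/^(-1|[1-5]?[0-9]|60)\s+([^~]*)~([^~]+)~/m';
$inside   = preg_match($ranged, "60 ABC~\nx\n~");
$tooLarge = preg_match($ranged, "61 ABC~\nx\n~");
var_dump($inside, $tooLarge);
```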
I have the following string. I want to know whether a string has two slashes or not.
$sting = "largeimg/fee0b04800e22590/myimage1.jpg";
I am trying to use the following PHP method:
if(preg_match("#^/([A-Za-z]|[0-9])/([A-Za-z]|[0-9]+)$#", $sting))
But it is not working properly. Please help me.
Here is how to do it in regex (see demo):
^([^/]*/){2}
Your code:
if(preg_match("#^([^/]*/){2}#", $sting)) {
// two slashes!
}
Explain Regex
^ # the beginning of the string
( # group and capture to \1 (2 times):
[^/]* # any character except: '/' (0 or more
# times (matching the most amount
# possible))
/ # '/'
){2} # end of \1 (NOTE: because you are using a
# quantifier on this capture, only the LAST
# repetition of the captured pattern will be
# stored in \1)
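A quick sketch of what this pattern reports. Keep in mind that it tests for at least two slashes, not exactly two; the short test strings are made up.

```php
<?php
$pattern = '#^([^/]*/){2}#';
$sting   = 'largeimg/fee0b04800e22590/myimage1.jpg';

// $m[0] is the whole match; $m[1] holds only the last repetition of the group.
$found = preg_match($pattern, $sting, $m);
print_r($m);

$oneSlash     = preg_match($pattern, 'a/b');      // one slash: no match
$threeSlashes = preg_match($pattern, 'a/b/c/d');  // three slashes: still a match
var_dump($found, $oneSlash, $threeSlashes);
```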
You could also use substr_count():
$sting = "largeimg/fee0b04800e22590/myimage1.jpg";
if(substr_count($sting, '/') == 2) { echo "has 2 slashes"; }
To check for 2 slashes you can use this regex:
preg_match('#/[^/]*/#', $sting)
Several other answers provide regular expressions that work but they do not explain why the expression in the question does not work. The expression is:
#^/([A-Za-z]|[0-9])/([A-Za-z]|[0-9]+)$#
The section ([A-Za-z]|[0-9]) is equivalent to ([A-Za-z0-9]). The extra + in the second similar section makes that part quite different. The + is of higher precedence than the |. Hence the section ([A-Za-z]|[0-9]+) is equivalent to ([A-Za-z]|([0-9]+)) (ignoring the difference between capturing and non-capturing brackets). The expression is interpreted as:
^ Start of string
/ The character '/'
([A-Za-z]|[0-9]) One alphanumeric
/ The character '/'
(
[A-Za-z] One alpha character
| or
[0-9]+ One or more digits
)
$ End of the string
This will only match strings where the first three characters are /, one alphanumeric, then /. Then the remainder of the string must be either one alpha character or one or more digits. Thus these strings would be matched:
/a/b
/c/123
/4/d
/5/6
/7/890123456789
These strings would not be matched:
/aa/b
c/c/123
/44/d
/5/6a
/5/a6
/7/ee
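The two lists above can be checked with a short sketch like this one, which feeds them to the expression from the question. preg_grep() is just a compact way to run preg_match() over an array.

```php
<?php
// The expression from the question, unchanged.
$pattern = '#^/([A-Za-z]|[0-9])/([A-Za-z]|[0-9]+)$#';

$shouldMatch = array('/a/b', '/c/123', '/4/d', '/5/6', '/7/890123456789');
$shouldFail  = array('/aa/b', 'c/c/123', '/44/d', '/5/6a', '/5/a6', '/7/ee');

// preg_grep() returns only the entries that match the pattern.
$matched    = array_values(preg_grep($pattern, $shouldMatch));
$unexpected = array_values(preg_grep($pattern, $shouldFail));

// The string from the question does not start with '/', so it cannot match.
$original = preg_match($pattern, 'largeimg/fee0b04800e22590/myimage1.jpg');

var_dump(count($matched), count($unexpected), $original);
```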
PHP vars can be of the following formats and can contain letters numbers and underscores:
$var_1
$var_1[key_1]
$var_1['key_1']
$var_1["key_1"]
$var_1[key_1][key_2]
$var_1['key_1']['key_2']
$var_1["key_1"]["key_2"]
$var_1->property_1
$var_1->property_1->property_2
Array and object will never have more than 2 nested elements. Objects won't have methods (i.e. $var_1->method_1() is not needed).
I need a RegEx matching them all, or a minimum amount of several RegExes, that would convert them into HTML echo snippets in the following format:
<?=$1?>
Where $1 is the entire matched string. If possible to add constants to the same RegEx it would be just perfect:
CONST_1 into <?=CONST_1?>
This should do it for the given examples:
\$\w+(?:\[(["']|)\w+\1\]|->\w+){0,2}
Replace it with <?=$0?> (make sure to use 0, because 1 is the first capture and not the entire match). I did not include constants, because I think that is rather tricky (how do you know it's a constant and not a reserved keyword - include all keywords?).
Explanation of the regex:
\$ # literal $
\w+ # letters, digits, underscores
(?: # subpattern to match indexing or a member
\[ # literal [
(["']|) # a ', a " or nothing (capture it in group 1)
\w+ # letters, digits, underscores
\1 # the correct matching closing delimiter
\] # literal ]
| # or
-> # literal ->
\w+ # letters, digits, underscores
){0,2} # end of subpattern, repeat 0 to 2 times
Note that if you use this pattern within PHP, you might have to escape the '.
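As a sketch of that last point: inside a single-quoted PHP string only the ' needs escaping. The / delimiter and the sample template text below are my own assumptions.

```php
<?php
// The pattern from above in a single-quoted string: the ' is escaped as \',
// everything else is passed to PCRE as written.
$pattern = '/\$\w+(?:\[(["\']|)\w+\1\]|->\w+){0,2}/';

// Hypothetical template text; single quotes keep PHP from interpolating it.
$template = 'Name: $var_1["key_1"]["key_2"], age: $var_1->property_1';

// $0 in the replacement stands for the entire match.
$html = preg_replace($pattern, '<?=$0?>', $template);
echo $html, "\n";
```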
A regular expression in preg_match is given as /server\-([^\-\.\d]+)(\d+)/. Can someone help me understand what this means? I see that the string starts with server- but I don't get ([^\-\.\d]+)(\d+).
[ ] -> Match anything inside the square brackets for ONE character position once and only once, for example, [12] means match the target to 1 and if that does not match then match the target to 2 while [0123456789] means match to any character in the range 0 to 9.
- -> The - (dash) inside square brackets is the 'range separator' and allows us to define a range, in our example above of [0123456789] we could rewrite it as [0-9].
You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).
NOTE: To test for - inside brackets (as a literal) it must come first or last, or be escaped as \-; for example, [-0-9] will test for - and 0 to 9.
^ -> The ^ (circumflex or caret) inside square brackets negates the expression (we will see an alternate use for the circumflex/caret outside square brackets later), for example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.
You can find more explanations in the source where I got this information: http://www.zytrax.com/tech/web/regex.htm
And if you want to test it, you can try this one: http://gskinner.com/RegExr/
Here's the explanation:
# server\-([^\-\.\d]+)(\d+)
#
# Match the characters “server” literally «server»
# Match the character “-” literally «\-»
# Match the regular expression below and capture its match into backreference number 1 «([^\-\.\d]+)»
# Match a single character NOT present in the list below «[^\-\.\d]+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# A - character «\-»
# A . character «\.»
# A single digit 0..9 «\d»
# Match the regular expression below and capture its match into backreference number 2 «(\d+)»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
You can use programs such as RegexBuddy if you intend to work with regexes and are willing to spend some funds.
You can also use this free web based explanation utility.
^ means not one of the following characters inside the brackets
\- \. are the - and . characters
\d is a digit
[^\-\.\d]+ means one or more characters that are not listed inside the brackets, so one or more of anything that is not a -, a . or a digit.
(\d+) means one or more digits
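To make the two groups concrete, here is a small sketch. The host names are invented for illustration; only the pattern comes from the question.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/server\-([^\-\.\d]+)(\d+)/';

// Hypothetical host name: group 1 takes the non-digit part, group 2 the digits.
preg_match($pattern, 'server-web12.example.com', $m);
print_r($m);

// Group 1 needs at least one character that is not a -, a . or a digit,
// so a name with digits directly after "server-" does not match.
$digitsOnly = preg_match($pattern, 'server-12');
var_dump($digitsOnly);
```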
Here is the explanation given by the Perl module YAPE::Regex::Explain:
The regular expression:
(?-imsx:server\-([^\-\.\d]+)(\d+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
server 'server'
----------------------------------------------------------------------
\- '-'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^\-\.\d]+ any character except: '\-', '\.', digits
(0-9) (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------