meaning of `$/i` in regular expressions - php

What does the $/i mean in the following php code?
preg_match ('/^[A-Z \'.-]{2,20}$/i')

/ denotes the end of the pattern. The i is a modifier that makes the pattern case-insensitive, and the $ anchor matches the end of the string.

The $ is an anchor: it means the end of the string should be there. The / is the end delimiter for the regular expression. The i means that the regular expression should be case-insensitive (notice that [A-Z \'.-] only lists A-Z; the i means the class does not have to list a-z as well).

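The dollar sign is a common regex symbol meaning "end of line"; in PHP, without the m modifier, it means the end of the string (or just before a final newline).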
The slash at the end is the end of the expression itself.
Any letters after that slash are options you can turn on or off, called modifiers. In the case of i it means case-insensitive.

$ Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character
/ is the ending delimiter of the regex pattern in PHP
i represents case insensitive regular expression search
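As a rough illustration of the three pieces described above, here is a minimal sketch that runs the pattern from the question against a few made-up inputs. The sample strings and variable names are assumptions chosen for the demonstration; only the pattern comes from the question.

```php
<?php
// Pattern from the question: 2 to 20 letters, spaces, apostrophes, dots or hyphens.
$pattern = '/^[A-Z \'.-]{2,20}$/i';

// Hypothetical sample inputs, chosen only for illustration.
$lowercase = preg_match($pattern, "o'neil");   // 1: the i modifier lets a-z match too
$withDigit = preg_match($pattern, "o'neil2");  // 0: $ demands the end of the string, but a digit follows

// The same pattern without the i modifier only accepts uppercase letters.
$strict = preg_match('/^[A-Z \'.-]{2,20}$/', "o'neil"); // 0
```

The closing / is never matched against the input; it only separates the pattern from the modifier.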

You can also use this tool to understand things better and for testing and practice:
http://gskinner.com/RegExr/

Related

PHP regex Pattern Modifiers A and D

Can anyone help me with modifiers A and D?
I read the description three times and ran a couple of tests on regex101, but I cannot get the modifiers to have any effect, and I cannot find an example where they make a difference.
For example, the regular expression
<u>[a-z]+<\/u>
works the same way with A and without A
https://regex101.com/r/X3nkMF/1/
See PHP/PCRE Manual: Possible modifiers in regex patterns
A(PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
Example: /bar/A matches bar baz but not foo bar
There is also the \A anchor available to match start of the string. This is helpful in multiline mode (using the m flag) where ^ matches start of each line.
D(PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.
Example: /foo$/D matches foo but not foo\n
There is also the lowercase \z anchor, which matches only at the absolute end of the string: foo\z. The uppercase \Z behaves like the dollar sign and also matches before a final \n, with the difference that in multiline mode (m flag) \Z does not match at the end of each line.
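The behaviour of both modifiers can be checked with a few calls to preg_match. This is only a sketch built from the examples quoted above; the variable names are made up for the demonstration.

```php
<?php
// A: the match has to start at the beginning of the subject.
$anchoredHit  = preg_match('/bar/A', 'bar baz'); // 1
$anchoredMiss = preg_match('/bar/A', 'foo bar'); // 0
$unanchored   = preg_match('/bar/', 'foo bar');  // 1

// D: $ no longer matches before a trailing newline.
$dollarDefault = preg_match('/foo$/', "foo\n");  // 1
$dollarEndOnly = preg_match('/foo$/D', "foo\n"); // 0

// \z and \Z as in-pattern alternatives.
$lowerZ = preg_match('/foo\z/', "foo\n"); // 0: absolute end only
$upperZ = preg_match('/foo\Z/', "foo\n"); // 1: also before a final newline
```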
<u>[a-z]+<\/u>
It does not matter whether you anchor that pattern to the beginning or not, it will always match the first line of
<u>word</u>
<u>main</u>
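only, unless you add the g modifier (on regex101) so that the search does not stop after the first match. PHP itself has no g modifier; use preg_match_all() instead.
So add /g and /gA, and then you will see what a difference the A makes: with A, every match has to begin exactly where the previous one ended.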
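In PHP the closest equivalent of that experiment is preg_match_all, which keeps searching after the first match. A minimal sketch, assuming the two-line subject shown above (variable names are hypothetical):

```php
<?php
$subject = "<u>word</u>\n<u>main</u>";

// Without A, the search may skip over the newline and find the second tag.
$withoutA = preg_match_all('/<u>[a-z]+<\/u>/', $subject, $allMatches);     // 2

// With A, each attempt must match exactly where the previous match ended,
// so the newline after the first tag stops the search.
$withA = preg_match_all('/<u>[a-z]+<\/u>/A', $subject, $anchoredMatches);  // 1
```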

Issue with regular expression for string validation

I am trying to validate following type of string using regular expressions in PHP. Using PHP 5.5.9.
String is in following format:
/[sometext]/course/[sometext1]/[sometext2]
What I need is a regex that will accept string that is only in that format and nothing else. Meaning these would be invalid:
/aaa/course/bbb/ccc/
/aaa/course/bbb/ccc/ddd
What I have so far is this:
/\/(?P<domain>.+?)\/course\/(?P<courseid>.+?)\/(?P<reportname>.+?)/
Any ideas?
Update:
With the help from all posters and especially Wiktor Stribiżew I got this one that works:
$regex = '#^/(?P<domain>[^/]+)/course/(?P<courseid>[^/]+)/(?P<reportname>[^/]+)$#';
You can use the following regular expression:
^\/(?P<domain>[^\/]+)\/course\/(?P<courseid>[^\/]+)\/(?P<reportname>[^\/]+)$
PHP:
$re = '~^/(?P<domain>[^/]+)/course/(?P<courseid>[^/]+)/(?P<reportname>[^/]+)$~';
The [^\/] is a negated character class that matches any character but /.
The ^ and $ are usually enough to make sure the whole input matches the pattern. Because $ also matches just before a trailing newline, you can replace them with \A and \z to make sure the match ends at the very end of the string, or keep ^ and $ and add the D modifier.
Even with lazy .+? matching, the . can match across several / separators if that is necessary to return a valid match, which is why the negated character class is used instead.
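To see the difference between the lazy dot and the negated character class, the following sketch anchors both variants and runs them against the invalid inputs from the question. The $lazy pattern is the question's attempt with ^ and $ added for comparison; that addition and the variable names are assumptions made for this demonstration.

```php
<?php
$lazy    = '~^/(?P<domain>.+?)/course/(?P<courseid>.+?)/(?P<reportname>.+?)$~';
$negated = '~^/(?P<domain>[^/]+)/course/(?P<courseid>[^/]+)/(?P<reportname>[^/]+)$~';

$tooLong = '/aaa/course/bbb/ccc/ddd';

// The lazy dot stretches across the extra slash to reach the end of the string.
$lazyResult = preg_match($lazy, $tooLong, $lazyMatch);
// $lazyResult is 1 and $lazyMatch['reportname'] is 'ccc/ddd'

// [^/]+ can never contain a slash, so the invalid inputs are rejected.
$negatedResult  = preg_match($negated, $tooLong);                 // 0
$trailingResult = preg_match($negated, '/aaa/course/bbb/ccc/');   // 0

// A well-formed input still matches, with the parts available by name.
$validResult = preg_match($negated, '/aaa/course/bbb/ccc', $parts); // 1
```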
First, use something other than '/' as your delimiter (the slashes at the beginning and end of the regex). That makes it easier to write the regex without having to escape the delimiter inside it:
$regex = '#^/[a-z]+/[^/]+/[a-z]+/[a-z]+$#';

Regex to detect the colon and sides of it?

First see my string please:
$a = "[ child : parent ]";
How can I detect that the pattern is:
[(optional space)word or character(optional space) : (optional space)word or character(optional space)]
You can catch this as follows in PHP:
Your regular expression is /\[ *\w+ *: *\w+ *]/
You would write code that would look like this to see if it matched.
if (preg_match('/regex/', $string)) {
// do things
}
Explanation of the Regular Expression
There is a backslash (\) before the open bracket because
[ has special meaning in regular expressions. The backslash
prevents its special meaning from being used.
The asterisk (*) matches 0 or more of the previous character expression. In this
case, it matches 0 or more spaces. If you instead used the
expression \s*, it would match 0 or more white-space characters
(space, tab, line break). Finally, if you wanted it to match 0 or 1
of the previous character, you would use ? instead of *.
The plus (+) matches 1 or more of the previous character expression. The \w character expression matches a letter, digit, or underscore. If you don't want underscores to match, you should instead use a character class. For example, you could use [A-Za-z0-9].
You can find more information on regular expressions at http://www.regular-expressions.info and http://www.regular-expressions.info/php.html
From your sample text I'd say you mean a human word and not \w regex word
preg_match('/\[ ?([a-z]+) ?: ?([a-z]+) ?\]/i', $a, $matches);
Explained demo: http://regex101.com/r/hB2oV9
$matches will save both values, test with var_dump($matches);
I'm not sure on the php-specific version of regex, but this should work:
\[ ?\w+ ? : ?\w+ ?\]
use this regex \[\s*\w+\s*:\s*\w+\s*\]
I would probably do it like this
preg_match('/^\[\s?\w+\s+:\s+\w+\s?\]$/', $string)
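The suggestions above differ mainly in how much whitespace they tolerate. As a hedged sketch, here are three of them run against the sample string and against a hypothetical variant without spaces; the second input and the variable names are made up for illustration.

```php
<?php
$a = "[ child : parent ]";

// Capturing version: the two words end up in $matches[1] and $matches[2].
$found = preg_match('/\[ ?([a-z]+) ?: ?([a-z]+) ?\]/i', $a, $matches);
// $found is 1, $matches[1] is 'child', $matches[2] is 'parent'

// \s* accepts any amount of whitespace, including none at all.
$noSpaces = preg_match('/\[\s*\w+\s*:\s*\w+\s*\]/', '[child:parent]'); // 1

// The last variant insists on whitespace around the colon.
$strictSpaces = preg_match('/^\[\s?\w+\s+:\s+\w+\s?\]$/', '[child:parent]'); // 0
$strictSample = preg_match('/^\[\s?\w+\s+:\s+\w+\s?\]$/', $a);               // 1
```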

What is wrong in this regular expression and how can I improve it?

I'm using the following regex code:
^[a-z0-9_-]{3,15}$^
I'm using this for username validation and I want it to match alphanumeric characters, - , _ and periods.
The following weird thing happens:
It doesn't match this:
bla.b
But it matches this one:
bla.blabla
How can I change this, so that it matches both? I still would like to be able to change the min and max characters freely. (btw. there maybe more wrong things about this regex. This one I discovered accidentally)
UPDATE: I should mention that I'm using this in CakePHP validation and this gives me an error:
^[a-z0-9_.-]{3,15}$
this is the error:
Warning (2): preg_match() [function.preg-match]: No ending delimiter '^' found
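PHP takes the first character of the pattern string as the delimiter. Your leading ^ is therefore consumed as the opening delimiter and the trailing ^ as the closing one, so the pattern has no start anchor at all. Choosing a different delimiter makes that more visible:
^[a-z0-9_-]{3,15}$^      // your non-working version
^                 ^      // both carets act as delimiters
/^[a-z0-9_-]{3,15}$/     // using / as delimiters instead, so ^ anchors the beginning
 ^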
Remember:
^ - marks the beginning of the subject
$ - marks the end of the subject
Both are part of the pattern. The delimiters are used to separate the pattern from the modifiers (you don't use any modifiers here).
Alternatively you can denote the beginning and end with \A and \z (or \Z, which also allows a trailing newline) if it helps.
To now also match the dot, add it to your character class:
/^[a-z0-9_.-]{3,15}$/
          ^
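The odd behaviour from the question can be reproduced directly. This sketch assumes nothing beyond the two inputs from the question plus one made-up short input; the variable names are hypothetical.

```php
<?php
// With ^ as the delimiter, the effective pattern is [a-z0-9_-]{3,15}$
// and nothing anchors it to the start of the subject.
$caretDelimited = '^[a-z0-9_-]{3,15}$^';
$shortTail = preg_match($caretDelimited, 'bla.b');      // 0: only "b" follows the dot
$longTail  = preg_match($caretDelimited, 'bla.blabla'); // 1: "blabla" matches at the end

// With real delimiters, both anchors and the dot in the character class,
// both usernames are accepted and the length limits apply to the whole string.
$fixed = '/^[a-z0-9_.-]{3,15}$/';
$fixedShort = preg_match($fixed, 'bla.b');      // 1
$fixedLong  = preg_match($fixed, 'bla.blabla'); // 1
$tooShort   = preg_match($fixed, 'ab');         // 0
```

The minimum and maximum lengths can still be changed freely by editing the {3,15} quantifier.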
^[a-z0-9_-]{3,15}$^
should be:
^[a-z0-9_-]{3,15}$
^ denotes the start of the string, and $ denotes the end of the string. In PHP the pattern still needs delimiters around it, for example /^[a-z0-9_-]{3,15}$/.
This should do it:
/^[a-z0-9_\.\-]{3,15}$/
If you want to match a username then you probably do not want it to start or end with a dot. In that case you can use this:
/^(?!\.)[a-z0-9_\.\-]{3,15}(?<!\.)$/
This is how that regex breaks down:
^ means the "beginning of the string"
(?!\.) is a lookahead that makes sure that the username cannot start with a dot
[a-z0-9_\.\-]{3,15} means 3 to 15 lowercase letters, digits, underscores, dots and hyphens
(?<!\.) is a lookbehind that makes sure that the username cannot end with a dot
$ means the "end of the string"
If you allow uppercase characters then you can shorten the regex slightly:
/^(?!\.)[\w\.\-]{3,15}(?<!\.)$/
The \w is short for [a-zA-Z0-9_], also called word characters.
Another way of making sure that a username does not start or end with a dot is to use three consecutive [], like so:
/^[\w\-][\w\.\-]{1,13}[\w\-]$/
It can be useful if you need to match something in JavaScript, where lookahead is supported but lookbehind is missing from older engines.
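A quick check of the three-class pattern, as a minimal sketch with made-up usernames (the inputs and variable names are assumptions for illustration):

```php
<?php
// One character that is not a dot, 1 to 13 characters that may be dots,
// and one final character that is not a dot: 3 to 15 characters in total.
$pattern = '/^[\w\-][\w\.\-]{1,13}[\w\-]$/';

$dotInside   = preg_match($pattern, 'bla.b');  // 1
$leadingDot  = preg_match($pattern, '.blab');  // 0
$trailingDot = preg_match($pattern, 'blab.');  // 0
$tooShort    = preg_match($pattern, 'ab');     // 0
```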

What are those characters in a regular expression?

I found this regex that works correctly but I didn't understand what is # (at the start) and at the end of the expression. Are not ^ and $ the start/end characters?
preg_match_all('#^/([^/]+)/([^/]+)/$#', $s, $matches);
Thanks
The matched pattern contains many /, thus the # is used as the regex delimiter. These are identical
/^something$/
and
#^something$#
If you have multiple / in your pattern the 2nd example is better suited, because it avoids ugly escaping with \/. This is how the RE would look using the standard // syntax:
/^\/([^\/]+)\/([^\/]+)\/$/
About #:
That's a delimiter of the regular expression itself. Its only meaning is to tell which delimiter is used for the expression. Commonly / is used, but others are possible. PCRE expressions need a delimiter with preg_match or preg_match_all.
About ^:
Inside character classes ([...]), the ^ has the meaning of not if it's the first character.
[abc] : matching a, b or c
[^abc] : NOT matching a, b or c, match every other character instead
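The two meanings of ^ can be compared side by side. A small sketch with made-up subjects; the variable names are hypothetical.

```php
<?php
// Inside a character class, a leading ^ negates the class.
preg_match('/[abc]/', 'xbz', $inClass);      // $inClass[0] is 'b'
preg_match('/[^abc]/', 'abcd', $notInClass); // $notInClass[0] is 'd'

// Outside a character class, ^ anchors the match to the start of the subject.
$atStart    = preg_match('/^abc/', 'abcd');  // 1
$notAtStart = preg_match('/^abc/', 'xabc');  // 0
```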
Also, the # at the start and the end here are custom regex delimiters. Instead of the usual /.../ you have #...#, just like in Perl.
These are delimiters. You can use any delimiter you want, but they must appear at the start and end of the regular expression.
Please see this documentation for a detailed insight into regular expressions:
http://www.php.net/manual/en/pcre.pattern.php
You can use pretty much anything as delimiters. The most common one is /.../, but if the pattern itself contains / and you don't want to escape every occurrence, you can use a different delimiter. My personal preference is (...) because it reminds me that $0 of the result is the entire match. But you can do anything, <...>, #...#, %...%, {...}... well, almost anything. The requirement is that the delimiter is any non-alphanumeric, non-backslash, non-whitespace character.
Let me break it down:
# is the first character, so this is the character used as the delimiter of the regular expression - we know we've got to the end when we reach the next (unescaped) one of these
^ outside of a character class, this means the beginning of the string
/ is just a normal 'slash' character
([^/]+) This is a bracketed expression containing at least one (+) instance of any character that isn't a / (^ at the beginning of a character class inverts the character class - meaning it will only match characters that are not in this list)
/ again
([^/]+) again
/ again
$ this matches the end of the string
# this is the final delimiter, so we know that the regex is now finished.
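To confirm that the choice of delimiter does not change what is matched, here is a minimal sketch that runs the pattern from the question in both spellings. The subject string and variable names are assumptions for the demonstration.

```php
<?php
$s = '/foo/bar/';

// # as delimiter: the slashes in the pattern need no escaping.
$hashCount = preg_match_all('#^/([^/]+)/([^/]+)/$#', $s, $hashMatches);

// / as delimiter: every slash inside the pattern has to be escaped.
$slashCount = preg_match_all('/^\/([^\/]+)\/([^\/]+)\/$/', $s, $slashMatches);

// Both calls find one match with the same captures:
// $hashMatches[1][0] is 'foo' and $hashMatches[2][0] is 'bar'.
```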
