GUID validation RegEx fails new line character [duplicate] - php

I use a regex pattern in PHP's preg_match function. Let's say the pattern is '/abc$/'. It matches both of these strings:
'abc'
and
'abc
'
The second one has a line break at its end. What pattern would match only the first string?
'abc'

The reason /abc$/ matches both "abc\n" and "abc" is that $ matches at the very end of the string or (even without the /m modifier) at the position before a newline that is the last character of the string.
You need the following regex:
/abc\z/
where \z is the unambiguous very end of the string, or
/abc$/D
where the /D modifier will make $ behave the same way as \z. See PHP.NET:
The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at compile or matching time.
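A quick way to see the difference is to run all three patterns against both subjects. This is a minimal PHP sketch (the variable names are only for illustration); preg_match() returns 1 for a match and 0 for no match:

```php
<?php
// Three ways of anchoring "abc" at the end of the subject.
$patterns = ['/abc$/', '/abc\z/', '/abc$/D'];
// The second subject ends with a newline character.
$subjects = ["abc", "abc\n"];

$results = [];
foreach ($patterns as $pattern) {
    foreach ($subjects as $subject) {
        // preg_match() returns 1 on a match, 0 on no match (false on error).
        $results[$pattern][] = preg_match($pattern, $subject);
    }
    // Results are listed in subject order: "abc", then "abc\n".
    printf("%s: %s\n", $pattern, implode(', ', $results[$pattern]));
}
```

Only the plain $ version accepts the trailing newline; \z and $ with /D both reject it.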

Related

How to pull an IP out of a string using preg_match in PHP? [duplicate]

What is the difference between "\\w+#\\w+[.]\\w+" and "^\\w+#\\w+[.]\\w+$"? I have tried googling it, with no luck.
^ means "Match the start of the string" (more exactly, the position before the first character in the string, so it does not match an actual character).
$ means "Match the end of the string" (the position after the last character in the string).
Both are called anchors and ensure that the entire string is matched instead of just a substring.
So in your example, the first regex will report a match on email#address.com.uk, but the matched text will be email#address.com, probably not what you expected. The second regex will simply fail.
Be careful, as some regex implementations implicitly anchor the regex at the start/end of the string (for example Java's .matches(), if you're using that).
If the multiline option is set (using the (?m) flag, for example, or by doing Pattern.compile("^\\w+#\\w+[.]\\w+$", Pattern.MULTILINE)), then ^ and $ also match at the start and end of a line.
Try the Javadoc:
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
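The same anchoring behaviour can be reproduced with PHP's PCRE functions, which treat ^, $ and the m modifier the way this answer describes for Java's Pattern. This is a PHP sketch, not Java code, and the multi-line sample text is made up:

```php
<?php
$subject = 'email#address.com.uk';

// Unanchored: the leftmost match is returned, covering only part of the subject.
$found = preg_match('/\w+#\w+[.]\w+/', $subject, $m);
echo $found, ' ', $m[0], "\n"; // 1 email#address.com

// Anchored at both ends: the whole subject must fit, so this fails.
$anchored = preg_match('/^\w+#\w+[.]\w+$/', $subject);
echo $anchored, "\n"; // 0

// With m, ^ and $ also match at line boundaries inside the subject.
$text = "first line\nemail#address.com\nlast line";
$multi = preg_match('/^\w+#\w+[.]\w+$/m', $text, $mm);
echo $multi, ' ', $mm[0], "\n"; // 1 email#address.com

// Without m, ^ only matches at the start of the whole text.
$single = preg_match('/^\w+#\w+[.]\w+$/', $text);
echo $single, "\n"; // 0
```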
^ and $ match the beginning and end of the input (or of each line in multiline mode) without consuming any characters.

PHP regex Pattern Modifiers A and D

Can anyone help me with the modifiers A and D?
I read the description three times and ran a couple of tests on regex101, but I cannot get them to make any difference, and I cannot find an example where they would matter.
For example, the regular expression
<u>[a-z]+<\/u>
works the same way with A and without A
https://regex101.com/r/X3nkMF/1/
See PHP/PCRE Manual: Possible modifiers in regex patterns
A(PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
Example: /bar/A matches bar baz but not foo bar
There is also the \A anchor, which matches the start of the string. It is helpful in multiline mode (the m flag), where ^ matches the start of each line.
D(PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.
Example: /foo$/D matches foo but not foo\n
There is also the lowercase \z anchor, which matches the absolute end of the string: foo\z. The uppercase \Z, by contrast, behaves like the dollar sign and also matches before a final \n; the difference is that in multiline mode (m flag) \Z does not match at the end of each line.
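The examples from the manual can be checked directly with preg_match(). A minimal sketch; each comment gives the value preg_match() returns:

```php
<?php
// A (PCRE_ANCHORED): the match must start at the beginning of the subject.
$a1 = preg_match('/bar/A', 'bar baz');      // 1
$a2 = preg_match('/bar/A', 'foo bar');      // 0

// D (PCRE_DOLLAR_ENDONLY): $ no longer matches before a final newline.
$d1 = preg_match('/foo$/D', 'foo');         // 1
$d2 = preg_match('/foo$/D', "foo\n");       // 0
$d3 = preg_match('/foo$/mD', "foo\n");      // 1: D is ignored when m is set

// Lowercase \z versus uppercase \Z.
$z1 = preg_match('/foo\z/', "foo\n");       // 0: \z is the absolute end
$z2 = preg_match('/foo\Z/', "foo\n");       // 1: \Z also allows a final \n
$z3 = preg_match('/foo$/m', "foo\nbar");    // 1: with m, $ matches at the end of a line
$z4 = preg_match('/foo\Z/m', "foo\nbar");   // 0: \Z ignores m

var_dump([$a1, $a2, $d1, $d2, $d3, $z1, $z2, $z3, $z4]);
```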
<u>[a-z]+<\/u>
It does not matter whether you anchor that pattern to the beginning or not; it will always match only the first line of
<u>word</u>
<u>main</u>
unless you add the g modifier so that matching does not stop after the first match.
So try both /g and /gA, and you will see what difference A makes.
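PHP itself has no g modifier (passing one to preg_match() produces an "Unknown modifier" warning); the global equivalent is preg_match_all(). A sketch of the same experiment, assuming the two lines from the regex101 example are joined with a \n:

```php
<?php
// The two lines from the regex101 example, joined by a newline.
$subject = "<u>word</u>\n<u>main</u>";

// preg_match_all() keeps searching after each match, like the g flag.
$all = preg_match_all('/<u>[a-z]+<\/u>/', $subject, $plain);
// With A, every attempt must start exactly where the previous match ended.
$anchored = preg_match_all('/<u>[a-z]+<\/u>/A', $subject, $withA);

print_r($plain[0]);  // <u>word</u>, <u>main</u>
print_r($withA[0]);  // <u>word</u> only
```

Without A, the search resumes anywhere after the first match and finds the second line too. With A, the next attempt has to start right at the \n after the first match, fails there, and matching stops.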

If and only if end of string

To assert the end of a string with a regex, you can use $.
However, from what I've read, this is what it actually does:
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So that is not quite true: for example, appending \n to a string would make no difference when using $.
In my case this would be a security flaw in my PHP code, as I use this regex to validate alphanumeric usernames on registration:
/^[a-zA-Z0-9]+$/
Is there a way to strictly assert if and only if it's the end of the string with a regex?
There are at least 2 ways to make sure you match at the very end of the string with a PCRE regex.
You may use \z anchor that matches at the very end of the string:
/^[a-zA-Z0-9]+\z/
Or, you may use a D modifier:
/^[a-zA-Z0-9]+$/D
The PCRE_DOLLAR_ENDONLY modifier D makes the $ anchor match only at the very end of the string (excluding the position before a final newline), i.e. act like the \z anchor.
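To see the flaw the question describes, here is a minimal sketch with a hypothetical matches() helper and a username carrying a trailing newline (as might come from an untrimmed form field):

```php
<?php
// Hypothetical helper: true if $subject matches $pattern.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// A username with a trailing newline.
$input = "admin\n";

$loose   = matches('/^[a-zA-Z0-9]+$/', $input);   // true: $ also matches before the final \n
$strict  = matches('/^[a-zA-Z0-9]+\z/', $input);  // false: \z is the very end
$endOnly = matches('/^[a-zA-Z0-9]+$/D', $input);  // false: D makes $ behave like \z

var_dump($loose, $strict, $endOnly);
```

Both stricter versions still accept a plain "admin", so switching to either one only removes the trailing-newline loophole.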

What is wrong in this regular expression and how can I improve it?

I'm using the following regex code:
^[a-z0-9_-]{3,15}$^
I'm using this for username validation, and I want it to match alphanumeric characters, -, _ and periods.
The following weird thing happens:
It doesn't match this:
bla.b
But it matches this one:
bla.blabla
How can I change this so that it matches both? I would still like to be able to change the minimum and maximum number of characters freely. (By the way, there may be more wrong with this regex; I only discovered this problem by accident.)
UPDATE: I should mention that I'm using this in CakePHP validation and this gives me an error:
^[a-z0-9_.-]{3,15}$
this is the error:
Warning (2): preg_match() [function.preg-match]: No ending delimiter '^' found
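You made a little mistake: you forgot to put the ^ at the beginning of the pattern itself. PHP took your leading ^ as the delimiter, so the effective pattern was [a-z0-9_-]{3,15}$, anchored only at the end. That is why bla.blabla matches (its last six characters, blabla, fit) while bla.b does not (only the final b fits). Choosing a different delimiter makes this more visible: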
^[a-z0-9_-]{3,15}$^    // your non-working version
^                 ^
/^[a-z0-9_-]{3,15}$/   // using / as delimiters instead, setting the beginning
 ^
Remember:
^ - marks the beginning of the subject
$ - marks the end of the subject
Both are part of the pattern. The delimiters are used to separate the pattern from the modifiers (you don't use any modifiers here).
Alternatively, you can denote the beginning and end with \A and \Z if that helps (use \z instead of \Z if a trailing newline must be rejected too).
To also match the dot, add it to your character class:
/^[a-z0-9_.-]{3,15}$/
          ^
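A minimal sketch reproducing the question's symptoms: the first pattern string is used exactly as written, so PHP takes ^ as its delimiter, and the second is the corrected version with / delimiters and the dot added:

```php
<?php
// With ^ as the delimiter, the real pattern is [a-z0-9_-]{3,15}$ (no start anchor).
$broken = '^[a-z0-9_-]{3,15}$^';
// Slashes as delimiters, ^ inside the pattern, and . added to the class.
$fixed = '/^[a-z0-9_.-]{3,15}$/';

$results = [];
foreach (['bla.b', 'bla.blabla'] as $name) {
    $results[$name] = [
        'broken' => preg_match($broken, $name),
        'fixed'  => preg_match($fixed, $name),
    ];
}
print_r($results);
```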
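^[a-z0-9_-]{3,15}$^
should be:
/^[a-z0-9_-]{3,15}$/
^ denotes the start of the string, and $ denotes the end of the string. PHP also needs the pattern wrapped in delimiters such as /; without them the leading ^ is taken as the delimiter, which is what the "No ending delimiter '^' found" warning means.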
This should do it:
/^[a-z0-9_\.\-]{3,15}$/
If you want to match a username, then you probably do not want it to start or end with a dot. In that case you can use this:
/^(?!\.)[a-z0-9_\.\-]{3,15}(?<!\.)$/
This is how that regex breaks down:
^ means the "beginning of the string"
(?!\.) is a lookahead that makes sure the username cannot start with a dot
[a-z0-9_\.\-]{3,15} means 3 to 15 lowercase letters, digits, dots, underscores and hyphens
(?<!\.) is a lookbehind that makes sure the username cannot end with a dot
$ means the "end of the string"
If you allow uppercase characters, then you can shorten the regex slightly:
/^(?!\.)[\w\.\-]{3,15}(?<!\.)$/
The \w is short for [a-zA-Z0-9_], also called word characters.
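Where the lookarounds sit matters. A lookbehind placed directly after ^, or a lookahead placed directly before $, looks outward past the edge of the string, where there is never a dot, so it can never reject anything; the checks have to look inward at the first and last characters. A quick comparison of both placements (a sketch; the sample usernames are made up):

```php
<?php
// Lookarounds at the outer edges: they look past the ends of the
// string, where there is never a dot, so they always succeed.
$outward = '/^(?<!\.)[a-z0-9_\.\-]{3,15}(?!\.)$/';
// Lookarounds that look inward, at the first and last characters.
$inward = '/^(?!\.)[a-z0-9_\.\-]{3,15}(?<!\.)$/';

$outwardDots    = preg_match($outward, '.abc.'); // 1: the dots slip through
$inwardDots     = preg_match($inward, '.abc.');  // 0: rejected by (?!\.)
$inwardTrailing = preg_match($inward, 'abc.');   // 0: rejected by (?<!\.)
$inwardOk       = preg_match($inward, 'a.bc');   // 1

var_dump($outwardDots, $inwardDots, $inwardTrailing, $inwardOk);
```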
Another way of making sure that a username does not start or end with a dot is to use three consecutive character classes, like so:
/^[\w\-][\w\.\-]{1,13}[\w\-]$/
This can be useful in regex flavors without lookbehind support; JavaScript, for example, has always supported lookahead but only gained lookbehind in ES2018.
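A short check of the three-class pattern against a few made-up usernames. It also shows that, like every $-anchored pattern in these threads, it still accepts a trailing newline unless the D modifier (or \z) is added:

```php
<?php
// First and last characters exclude the dot; 1 + {1,13} + 1 gives 3 to 15 characters.
$pattern = '/^[\w\-][\w\.\-]{1,13}[\w\-]$/';

$samples = ['a.b', 'bla.blabla', '.abc', 'abc.', 'ab', str_repeat('a', 16)];
// Keep only the samples the pattern accepts (arrow functions need PHP 7.4+).
$valid = array_values(array_filter(
    $samples,
    fn(string $s): bool => preg_match($pattern, $s) === 1
));
print_r($valid);

// $ still matches before a final newline; appending D closes that gap.
$withNewline  = preg_match($pattern, "abc\n");       // 1
$withNewlineD = preg_match($pattern . 'D', "abc\n"); // 0
```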
