If and only if end of string - php

To assert the end of a string with a regex you can use $.
From what I've read, though, this is exactly what it does:
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So that isn't quite true: for example, appending \n to a string makes no difference when using $.
In my case this would be a security flaw in my PHP code as I use this regex to validate alphanumeric usernames on registration:
/^[a-zA-Z0-9]+$/
Is there a way to strictly assert if and only if it's the end of the string with a regex?

There are at least 2 ways to make sure you match at the very end of the string with a PCRE regex.
You may use \z anchor that matches at the very end of the string:
/^[a-zA-Z0-9]+\z/
Or, you may use a D modifier:
/^[a-zA-Z0-9]+$/D
The PCRE_DOLLAR_ENDONLY modifier D makes the $ anchor match only at the very end of the string (excluding the position before the final newline in the string), i.e. it acts like the \z anchor.
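To make the difference concrete, here is a minimal sketch comparing the three variants on a username with a trailing newline. The function names are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helpers, named for illustration only.

function is_valid_username_loose(string $name): bool
{
    // Plain `$` also matches just before a single trailing "\n".
    return preg_match('/^[a-zA-Z0-9]+$/', $name) === 1;
}

function is_valid_username_strict(string $name): bool
{
    // `\z` matches only at the very end of the subject.
    return preg_match('/^[a-zA-Z0-9]+\z/', $name) === 1;
}

function is_valid_username_dollar_endonly(string $name): bool
{
    // The D modifier makes `$` match only at the very end.
    return preg_match('/^[a-zA-Z0-9]+$/D', $name) === 1;
}

var_dump(is_valid_username_loose("alice\n"));          // bool(true)
var_dump(is_valid_username_strict("alice\n"));         // bool(false)
var_dump(is_valid_username_dollar_endonly("alice\n")); // bool(false)
var_dump(is_valid_username_strict("alice"));           // bool(true)
```

Either strict variant should be fine for validation; \z has the advantage that the intent is visible in the pattern itself rather than in a trailing flag.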

Related

php preg_replace and newline characters [duplicate]

I use a regex pattern in the PHP preg_match function. Let's say the pattern is '/abc$/'. It matches both strings:
'abc'
and
'abc
'
The second one has the line break at its end. What would be the pattern that matches only this first string?
'abc'
The reason why /abc$/ matches both "abc\n" and "abc" is that $ matches the location at the end of the string, or (even without /m modifier) the position before the newline that is at the end of the string.
You need the following regex:
/abc\z/
where \z is the unambiguous very end of the string, or
/abc$/D
where the /D modifier will make $ behave the same way as \z. See PHP.NET:
The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at compile or matching time.

GUID validation RegEx fails new line character [duplicate]

PHP regex Pattern Modifiers A and D

Can anyone help me with modifiers A and D?
I read the description three times and ran a couple of tests on regex101, but I can't get them to have any effect, and I can't find an example where they would make a difference.
For example, the regular expression
<u>[a-z]+<\/u>
works the same way with A and without A
https://regex101.com/r/X3nkMF/1/
See PHP/PCRE Manual: Possible modifiers in regex patterns
A(PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
Example: /bar/A matches bar baz but not foo bar
There is also the \A anchor available to match start of the string. This is helpful in multiline mode (using the m flag) where ^ matches start of each line.
D(PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.
Example: /foo$/D matches foo but not foo\n
There is also the lowercase \z anchor, which matches only at the absolute end of the string: foo\z. The uppercase \Z behaves like the dollar sign and also matches before a final \n, with the difference that in multiline mode (m flag) \Z won't match at the end of each line.
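The interaction between $, D, \Z, \z and the m flag is easier to see side by side. The following is a small sketch; end_anchor_report is a hypothetical helper written just for this comparison.

```php
<?php
// Hypothetical helper for illustration: runs one subject against several
// end-of-string variants and returns the preg_match() result for each.
function end_anchor_report(string $subject): array
{
    $patterns = [
        'dollar'    => '/foo$/',
        'dollar_D'  => '/foo$/D',
        'upper_Z'   => '/foo\Z/',
        'lower_z'   => '/foo\z/',
        'dollar_m'  => '/foo$/m',
        'dollar_mD' => '/foo$/mD', // D is ignored when m is set
        'upper_Z_m' => '/foo\Z/m',
    ];

    $report = [];
    foreach ($patterns as $label => $pattern) {
        $report[$label] = preg_match($pattern, $subject);
    }
    return $report;
}

// Trailing newline: only the D variant and \z reject it.
print_r(end_anchor_report("foo\n"));

// Internal newline: only the multiline `$` variants match.
print_r(end_anchor_report("foo\nbar"));
```

For "foo\n" every variant except dollar_D and lower_z returns 1. For "foo\nbar" only dollar_m and dollar_mD return 1, which shows both that D has no effect under m and that \Z does not become a per-line anchor.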
<u>[a-z]+<\/u>
It does not matter whether you anchor that pattern to the beginning or not, it will always match the first line of
<u>word</u>
<u>main</u>
only - unless you add the g modifier to not stop after the first match.
So add /g and /gA, and then you will see what a difference this A makes ...
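In PHP itself there is no g flag; preg_match_all plays that role. A minimal sketch of the effect of A, using the pattern from the question (the variable names are illustrative):

```php
<?php
// Single match: A forces the match to begin at the start of the subject.
$atStart  = preg_match('/bar/A', 'bar baz'); // matches: "bar" is at offset 0
$notStart = preg_match('/bar/A', 'foo bar'); // no match: "bar" is not at offset 0

// Repeated matching: with A, each attempt must begin exactly where the
// previous match ended, so the newline between the two tags stops the scan.
$twoLines = "<u>word</u>\n<u>main</u>";
$countPlain    = preg_match_all('/<u>[a-z]+<\/u>/', $twoLines);
$countAnchored = preg_match_all('/<u>[a-z]+<\/u>/A', $twoLines);

// With no gap between the tags, the anchored pattern keeps matching.
$adjacent = "<u>word</u><u>main</u>";
$countAdjacent = preg_match_all('/<u>[a-z]+<\/u>/A', $adjacent);

var_dump($atStart, $notStart);   // int(1) int(0)
var_dump($countPlain);           // int(2)
var_dump($countAnchored);        // int(1)
var_dump($countAdjacent);        // int(2)
```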

Capturing group with optional start and end characters

I have the following string: find me String1\String2\String3, and I want to capture String1, String2 and String3 if they exist. String3 is optional.
So far, what I have is: (?<=find me)\s(\\?[\w]+\\?){1,3}. My assumptions were:
The string should have find me at the beginning, but it should not be captured
a whitespace
a group with \ as an optional character at the beginning, a word following it, and an optional \ at the end; the group can appear 1 to 3 times.
What is wrong with my regex pattern?
Assuming your regex flavor supports \G, you can use this regex to capture all 3 strings separately:
(?<=find me |(?<!^)\G\\)\w+
\G asserts the position at the end of the previous match, or at the start of the string for the first match.
Here the negative lookbehind (?<!^) rules out the start of the string, so \G matches only at positions where a previous match ended. For your example, that happens twice: at the end of String1 and at the end of String2.
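Assuming PHP's preg functions (PCRE supports \G), the pattern can be tried with preg_match_all. This is only a sketch; note that the literal backslash has to be written as four backslashes inside a single-quoted PHP string.

```php
<?php
// Single-quoted strings keep "\S" literal, so the subject contains
// real backslashes between the three parts.
$subject = 'find me String1\String2\String3';

// '\\\\' in PHP source becomes '\\' in the regex, i.e. one literal backslash.
$pattern = '/(?<=find me |(?<!^)\G\\\\)\w+/';

preg_match_all($pattern, $subject, $matches);
print_r($matches[0]); // String1, String2, String3

// The third part is optional: with only two parts there are two matches.
preg_match_all($pattern, 'find me String1\String2', $shorter);
print_r($shorter[0]); // String1, String2
```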

Regular expression any character but a white space

I'm creating a password validator which takes any character but whitespaces and with at least 6 characters.
After searching, the best I came up with is this example:
What is the regular expression for matching that contains no white space in between text?
It disallows any spaces in between but does allow starting and ending with a space. I want to disallow any space anywhere in the string passed.
I tried this but it doesn't work:
if (preg_match("/^[^\s]+[\S+][^\s]{6}$/", $string)) {
    return true;
} else {
    return false;
}
Thanks.
Something like this:
/^\S{6,}\z/
In PHP, that would be:
preg_match('/^\S{6,}\z/', $string)
All answers using $ are wrong (at least without any special flags). You should use \z instead of $ if you do not want to allow a line break at the end of the string.
$ matches end of string or before a line break at end of string (if no modifiers are used)
\z matches end of string (independent of multiline mode)
From http://www.pcre.org/pcre.txt:
  ^    start of subject
         also after internal newline in multiline mode
  \A   start of subject
  $    end of subject
         also before newline at end of subject
         also before internal newline in multiline mode
  \Z   end of subject
         also before newline at end of subject
  \z   end of subject
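Put together as a validator, this could look like the sketch below. The function name is hypothetical, and the rule (at least 6 characters, no whitespace anywhere) is taken from the question.

```php
<?php
// Hypothetical helper, named for illustration only.
// Accepts 6 or more non-whitespace characters and nothing else;
// \z rejects a trailing newline that `$` would let through.
function is_valid_password(string $password): bool
{
    return preg_match('/^\S{6,}\z/', $password) === 1;
}

var_dump(is_valid_password('abcdef'));    // bool(true)
var_dump(is_valid_password("abcdef\n"));  // bool(false): trailing newline
var_dump(is_valid_password(' abcdef'));   // bool(false): leading space
var_dump(is_valid_password('abc def'));   // bool(false): inner space
var_dump(is_valid_password('abcde'));     // bool(false): too short
```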
The simplest expression:
^\S{6,}$
^ means the start of the string
\S matches any non-whitespace character
{6,} means 6 or more
$ means the end of the string
In PHP, that would look like
preg_match('/^\S{6,}$/', $string)
Edit:
>> preg_match('/^\S{6,}$/', "abcdef\n")
1
>> preg_match('/^\S{6,}\z/', "abcdef\n")
0
>> preg_match('/^\S{6,}$/D', "abcdef\n")
0
Qtax is right. Good call! Although if you're taking input from an HTML <input type="text"> you probably won't have any newlines in it.
I think you should be fine using the following, which matches any string of at least 1 character with no whitespace:
^[^\s]+$
You can see the test here: http://regexr.com?2ua2e.
Try this. It matches at least 6 non-whitespace characters followed by any number of additional non-whitespace characters.
^[^\s]{6}[^\s]*$
\S - Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v]. (This description comes from the .NET documentation; PHP has no ECMAScript option.)
To match whitespace at the start of the string you can use ^[ \t]+, and at the end [ \t]+$ (tabs and spaces).
ETA:
By the way, your regex contains [\S+]; I think you were looking for [\S]+.
