PHP preg_replace and newline characters [duplicate]

I use a regex pattern in PHP's preg_match function. The pattern is, let's say, '/abc$/'. It matches both of these strings:
'abc'
and
'abc
'
The second one has a line break at its end. What pattern would match only the first string?
'abc'

The reason why /abc$/ matches both "abc\n" and "abc" is that $ matches the position at the end of the string or (even without the /m modifier) the position before a newline that is the last character of the string.
You need the following regex:
/abc\z/
where \z is the unambiguous very end of the string, or
/abc$/D
where the /D modifier will make $ behave the same way as \z. See PHP.NET:
The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at compile or matching time.
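A minimal PHP sketch comparing the three patterns on both subjects from the question; preg_match() returns 1 for a match and 0 for no match. The variable names are illustrative only.

```php
<?php
// Subjects from the question: without and with a trailing newline.
$subjects = ['abc', "abc\n"];

// Plain $, the \z anchor, and $ with the D (PCRE_DOLLAR_ENDONLY) modifier.
$patterns = ['/abc$/', '/abc\z/', '/abc$/D'];

$results = [];
foreach ($patterns as $pattern) {
    foreach ($subjects as $subject) {
        // preg_match() returns 1 on a match, 0 on no match.
        $results[$pattern][$subject] = preg_match($pattern, $subject);
        printf("%-8s %-7s => %d\n", $pattern, json_encode($subject), $results[$pattern][$subject]);
    }
}
```

Only /abc$/ accepts "abc\n"; both /abc\z/ and /abc$/D reject it while still accepting 'abc'.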

Related

How to pull an IP out of a string using preg_match in PHP? [duplicate]

What is the difference between "\\w+#\\w+[.]\\w+" and "^\\w+#\\w+[.]\\w+$"? I have tried to Google it, but had no luck.
^ means "Match the start of the string" (more exactly, the position before the first character in the string, so it does not match an actual character).
$ means "Match the end of the string" (the position after the last character in the string; by default it also matches just before a final line terminator).
Both are called anchors and ensure that the entire string is matched instead of just a substring.
So in your example, the first regex will report a match on email#address.com.uk, but the matched text will be email#address.com, probably not what you expected. The second regex will simply fail.
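The same two patterns can be tried in PHP; this is a translation of the Java strings (the Java literal "\\w" is the regex \w), and PCRE behaves the same way for these constructs. A minimal sketch:

```php
<?php
$subject = 'email#address.com.uk';

// Unanchored: finds the first substring that fits the pattern.
$unanchored = preg_match('/\w+#\w+[.]\w+/', $subject, $m);

// Anchored: the whole subject must fit the pattern.
$anchored = preg_match('/^\w+#\w+[.]\w+$/', $subject);

echo $unanchored, ' ', $m[0], "\n"; // 1 email#address.com
echo $anchored, "\n";               // 0
```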
Be careful, as some regex implementations implicitly anchor the regex at the start/end of the string (for example Java's .matches(), if you're using that).
If the multiline option is set (using the (?m) flag, for example, or by doing Pattern.compile("^\\w+#\\w+[.]\\w+$", Pattern.MULTILINE)), then ^ and $ also match at the start and end of a line.
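In PHP the m modifier plays the role of Java's (?m) / Pattern.MULTILINE. A small sketch with a made-up two-line subject, counting matches with preg_match_all():

```php
<?php
$subject = "alice#example.com\nbob#example.org";

// Without m: ^ and $ only match at the start/end of the whole subject
// (plus just before a final newline), so neither line matches on its own.
$withoutM = preg_match_all('/^\w+#\w+[.]\w+$/', $subject);

// With m: ^ and $ also match at internal line boundaries.
$withM = preg_match_all('/^\w+#\w+[.]\w+$/m', $subject, $m2);

echo $withoutM, "\n"; // 0
echo $withM, "\n";    // 2
print_r($m2[0]);      // both addresses
```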
Try the Javadoc:
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
^ and $ match at the beginnings/endings of a line (without consuming any characters) when MULTILINE mode is enabled; otherwise they match at the start/end of the input.

GUID validation RegEx fails new line character [duplicate]

PHP regex Pattern Modifiers A and D

Can anyone help me with modifiers A and D?
I read the description three times and ran a couple of tests on regex101, but I can't get them to have any visible effect, and I can't find an example that shows what they are good for.
For example, the regular expression
<u>[a-z]+<\/u>
works the same way with A and without A
https://regex101.com/r/X3nkMF/1/
See PHP/PCRE Manual: Possible modifiers in regex patterns
A(PCRE_ANCHORED)
If this modifier is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the start of the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
Example: /bar/A matches bar baz but not foo bar
There is also the \A anchor, which matches the start of the string. This is helpful in multiline mode (the m flag), where ^ matches at the start of each line.
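A short PHP sketch of the A modifier and of \A versus ^ in multiline mode, using the subjects from the example above plus a made-up "foo\nbar":

```php
<?php
// A (PCRE_ANCHORED): the match must start at the beginning of the subject.
$a1 = preg_match('/bar/A', 'bar baz'); // 1
$a2 = preg_match('/bar/A', 'foo bar'); // 0

// With m, ^ also matches after an internal newline, but \A does not.
$subject = "foo\nbar";
$caret = preg_match('/^bar/m', $subject);  // 1
$bigA  = preg_match('/\Abar/m', $subject); // 0

echo "$a1 $a2 $caret $bigA\n"; // 1 0 1 0
```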
D(PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.
Example: /foo$/D matches foo but not foo\n
There is also the lowercase \z anchor, which matches the absolute end of the string: foo\z. The uppercase \Z behaves like the dollar sign and also matches before a final \n, with the difference that in multiline mode (m flag) \Z does not match at the end of each line.
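A minimal PHP sketch comparing $, $ with D, \z and \Z, first without and then with the m modifier (the subjects are made up for illustration):

```php
<?php
// No m modifier: compare the end anchors on "foo\n".
$plain = [
    '/foo$/'  => preg_match('/foo$/', "foo\n"),  // 1: $ matches before the final \n
    '/foo$/D' => preg_match('/foo$/D', "foo\n"), // 0: D makes $ match only at the very end
    '/foo\z/' => preg_match('/foo\z/', "foo\n"), // 0: \z is the absolute end
    '/foo\Z/' => preg_match('/foo\Z/', "foo\n"), // 1: \Z also matches before a final \n
];

// With m: $ matches before any newline, \Z still does not, and D is ignored.
$multi = [
    '/foo$/m'  => preg_match('/foo$/m', "foo\nbar"),  // 1
    '/foo\Z/m' => preg_match('/foo\Z/m', "foo\nbar"), // 0
    '/foo$/mD' => preg_match('/foo$/mD', "foo\nbar"), // 1
];

print_r($plain);
print_r($multi);
```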
<u>[a-z]+<\/u>
It does not matter whether you anchor that pattern to the beginning or not; it will always match only the first line of
<u>word</u>
<u>main</u>
unless you add the g (global) flag on regex101 so that matching does not stop after the first match. (PHP itself has no g modifier; preg_match_all() is its counterpart.)
So add /g and /gA, and you will see what difference A makes ...
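The same experiment in PHP, assuming preg_match_all() as the stand-in for regex101's g flag. With A, each further match must begin exactly where the previous one ended, so the "\n" between the two lines stops the search:

```php
<?php
$subject = "<u>word</u>\n<u>main</u>";

// preg_match() stops after the first match, so A makes no visible difference.
$single  = preg_match('/<u>[a-z]+<\/u>/', $subject);
$singleA = preg_match('/<u>[a-z]+<\/u>/A', $subject);

// preg_match_all() keeps searching; with A the second match cannot start at "\n".
$all  = preg_match_all('/<u>[a-z]+<\/u>/', $subject);
$allA = preg_match_all('/<u>[a-z]+<\/u>/A', $subject);

echo "$single $singleA $all $allA\n"; // 1 1 2 1
```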

If and only if end of string

To assert the end of a string with a regex, you can use $.
From what I've read, though, this is what it actually does:
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So it is not strictly the end of the string: for example, appending \n to a string makes no difference when using $.
In my case this would be a security flaw in my PHP code, as I use this regex to validate alphanumeric usernames on registration:
/^[a-zA-Z0-9]+$/
Is there a way to strictly assert if and only if it's the end of the string with a regex?
There are at least 2 ways to make sure you match at the very end of the string with a PCRE regex.
You may use \z anchor that matches at the very end of the string:
/^[a-zA-Z0-9]+\z/
Or, you may use a D modifier:
/^[a-zA-Z0-9]+$/D
The PCRE_DOLLAR_ENDONLY modifier D makes the $ anchor match at the very end of the string (excluding the position before the final newline in the string), i.e. act as \z anchor.
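A sketch of the registration check with \z; isValidUsername is a hypothetical helper name, and /^[a-zA-Z0-9]+$/D would behave the same way:

```php
<?php
// Hypothetical helper: true only for a non-empty, purely ASCII-alphanumeric username.
function isValidUsername(string $username): bool
{
    // \z (not a bare $) so that a trailing "\n" cannot slip through.
    return preg_match('/^[a-zA-Z0-9]+\z/', $username) === 1;
}

var_dump(isValidUsername('alice42'));   // bool(true)
var_dump(isValidUsername("alice42\n")); // bool(false)
var_dump(isValidUsername('alice 42'));  // bool(false)
```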

Regular expression any character but a white space

I'm creating a password validator that accepts any characters except whitespace and requires at least 6 characters.
After searching, the best I came up with is this example:
What is the regular expression for matching that contains no white space in between text?
It disallows any spaces in between, but it does allow starting and ending with a space. I want to disallow any space anywhere in the string.
I tried this but it doesn't work:
if (preg_match("/^[^\s]+[\S+][^\s]{6}$/", $string)) {
return true;
} else {
return false;
}
Thanks.
Something like this:
/^\S{6,}\z/
In PHP, it can be used like:
preg_match('/^\S{6,}\z/', $string)
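Wrapped in a function, the check might look like the sketch below; isValidPassword is a hypothetical name. Without the u modifier PCRE counts bytes, so a multibyte UTF-8 character counts more than once toward the six.

```php
<?php
// Hypothetical helper: at least 6 characters, none of them whitespace.
function isValidPassword(string $password): bool
{
    return preg_match('/^\S{6,}\z/', $password) === 1;
}

var_dump(isValidPassword('secret'));   // bool(true)
var_dump(isValidPassword('sec ret'));  // bool(false): contains a space
var_dump(isValidPassword('short'));    // bool(false): only 5 characters
var_dump(isValidPassword("secret\n")); // bool(false): \z rejects the trailing newline
```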
All answers using $ are wrong (at least without any special flags). You should use \z instead of $ if you do not want to allow a line break at the end of the string.
$ matches end of string or before a line break at end of string (if no modifiers are used)
\z matches end of string (independent of multiline mode)
From http://www.pcre.org/pcre.txt:
^   start of subject
      also after internal newline in multiline mode
\A  start of subject
$   end of subject
      also before newline at end of subject
      also before internal newline in multiline mode
\Z  end of subject
      also before newline at end of subject
\z  end of subject
The simplest expression:
^\S{6,}$
^ means the start of the string
\S matches any non-whitespace character
{6,} means 6 or more
$ means the end of the string (or the position just before a final newline; see the edit below)
In PHP, that would look like
preg_match('/^\S{6,}$/', $string)
Edit:
>> preg_match('/^\S{6,}$/', "abcdef\n")
1
>> preg_match('/^\S{6,}\z/', "abcdef\n")
0
>> preg_match('/^\S{6,}$/D', "abcdef\n")
0
Qtax is right. Good call! Although if you're taking input from an HTML <input type="text"> you probably won't have any newlines in it.
I think you should be fine using the following, which matches any non-empty string with no whitespace (note that it does not enforce the 6-character minimum):
^[^\s]+$
You can see the test here: http://regexr.com?2ua2e.
Try this. It matches 6 non-whitespace characters followed by any number of additional non-whitespace characters, i.e. at least 6 in total.
^[^\s]{6}[^\s]*$
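\S - Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v]. (This definition is quoted from the .NET documentation; the ECMAScript option is a .NET feature, not a PHP one.)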
To match spaces and tabs at the start of the string, use ^[ \t]+; for the end, use [ \t]+$.
ETA:
By the way, in your regex, [\S+] is a character class that matches a single character; I think you're looking for [\S]+ (or simply \S+).
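A small PHP sketch of the two points in this answer: detecting leading or trailing spaces/tabs with ^[ \t]+ and [ \t]+$, and the difference between [\S+] and [\S]+. The helper name hasOuterBlanks is hypothetical.

```php
<?php
// Hypothetical helper: true if the string starts or ends with spaces/tabs.
function hasOuterBlanks(string $s): bool
{
    return preg_match('/^[ \t]+/', $s) === 1 || preg_match('/[ \t]+$/', $s) === 1;
}

var_dump(hasOuterBlanks(' abc'));  // bool(true)
var_dump(hasOuterBlanks("abc\t")); // bool(true)
var_dump(hasOuterBlanks('a b'));   // bool(false): inner space only

// [\S+] is one character from a class; [\S]+ repeats the class.
$classOnce   = preg_match('/^[\S+]$/', 'ab'); // 0
$classRepeat = preg_match('/^[\S]+$/', 'ab'); // 1
```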
