What is the difference between 2 regex patterns? - php

I want users to enter a username containing only alphanumeric characters and dots.
So I wrote the following regex pattern:
'/([a-zA-Z0-9\.]+)/'
But I want to know whether it is the same as:
'/([a-zA-Z0-9.]+)/'
Are the two patterns above the same? Thank you for your help! :-)

You don't need to escape a dot inside a character class. Inside a character class, both the dot . and the escaped dot \. match a literal dot, so the two regexes are equivalent.
Also, for validation purposes, I would suggest adding anchors, as in '/^[a-zA-Z0-9.]+$/'. Anchors force the pattern to match the entire string. Without them, /[a-zA-Z0-9.]+/ matches the substring foo in the input string ()foo, so that input is accepted. With start and end anchors, /^[a-zA-Z0-9.]+$/ does not match that string at all: the whole input must consist of one or more alphanumeric or dot characters, and any other character makes the match fail.
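A minimal sketch of both points, assuming PHP with preg_match. The helper name isValidUsername is hypothetical; the patterns are the ones quoted above.

```php
<?php
// Hypothetical helper: anchored pattern from the answer above.
function isValidUsername(string $name): bool
{
    return preg_match('/^[a-zA-Z0-9.]+$/', $name) === 1;
}

// Inside a character class, both spellings of the dot behave the same.
$escaped   = '/([a-zA-Z0-9\.]+)/';
$unescaped = '/([a-zA-Z0-9.]+)/';

foreach (['john.doe', '()foo', 'a b'] as $input) {
    preg_match($escaped, $input, $m1);
    preg_match($unescaped, $input, $m2);
    var_dump($m1 === $m2);              // both patterns yield the same match array
    var_dump(isValidUsername($input));  // anchored check against the whole string
}
```

One caveat: in PCRE, $ also matches just before a trailing newline, so "john\n" would pass the anchored check. If that matters, the D modifier or \z at the end of the pattern should rule it out.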

Related

Regex: Differentiating underscore(_) and dash(-)

I want to construct a pattern that identifies the valid domain name. A valid domain name has alphanumeric characters and dashes in it. The only rule is that the name should not begin or end with a dash.
I have a regular expression for the validation as ^\w((\w|-)*\w)?$
However the expression is validating the strings with underscores too (for ex.: cake_centre) which is wrong. Can anyone tell me why this is happening and how it can be corrected?
P.S.: I am using preg_match() function in PHP for checking the validation.
The metacharacter \w includes underscores. You can instead write a character class that allows only the characters you listed:
[a-zA-Z\d-]
or per your regex:
^[a-zA-Z\d]([a-zA-Z\d-]*[a-zA-Z\d])?$
(Also note that the position of - in a character class matters: at the start or end it is a literal hyphen, while in the middle it can create a range. See also: What special characters must be escaped in regular expressions?)
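As a quick check of the difference, here is a small sketch comparing the original \w pattern with the explicit character class. The function name isValidDomainLabel is made up for illustration.

```php
<?php
// Hypothetical helper: explicit class instead of \w, so "_" is rejected.
function isValidDomainLabel(string $label): bool
{
    return preg_match('/^[a-zA-Z\d]([a-zA-Z\d-]*[a-zA-Z\d])?$/', $label) === 1;
}

// The original pattern accepts the underscore because \w includes it.
var_dump(preg_match('/^\w((\w|-)*\w)?$/', 'cake_centre'));

var_dump(isValidDomainLabel('cake_centre')); // underscore is not in the class
var_dump(isValidDomainLabel('cake-centre')); // dash in the middle is fine
var_dump(isValidDomainLabel('-cake'));       // leading dash
var_dump(isValidDomainLabel('cake-'));       // trailing dash
var_dump(isValidDomainLabel('a'));           // single character
```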
Underscores are being accepted because they are part of the \w character class. If you want to exclude them, try:
/^[a-z0-9]+[a-z0-9\-]*[a-z0-9]+$/i
Here is a regexp using the lookaround approach:
(?<!-)([a-zA-Z0-9_]+)(?!-)
The pattern has three parts:
The first part, (?<!-), is a negative lookbehind that ensures the matched characters are not preceded by a dash.
The second part, ([a-zA-Z0-9_]+), captures the matching characters (drop the _ from the class to reject underscores, as the question requires).
The third part, (?!-), is a negative lookahead that ensures the match is not followed by a dash.

preg_match wildcard, require at least one character

The preg_match below matches 'empty' (0 characters) against the wildcard. I want to disable that:
preg_match('/site.com\/subsection\/.*?/', $page_url);
So the thing above should match site.com/subsection/subpage, but shouldn't match the root dir site.com/subsection/
How can I adjust the regex above? Thanks in advance!
The .*? at the end of the pattern can match an empty string. Make it match one or more characters by using .+ instead:
'/site\.com\/subsection\/.+/'
Now it requires at least one character after site.com/subsection/.
Note the dot must be escaped to match a literal dot.
Also, it might be a good idea to use regex delimiters other than / (as OcuS suggests in the comments below) if you have many slashes in the pattern itself. I usually use tildes:
'~site\.com/subsection/.+~'
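To see the difference between the two quantifiers, a short sketch using the tilde-delimited form. The variable names are illustrative only.

```php
<?php
$lazy     = '~site\.com/subsection/.*?~'; // .*? is allowed to match zero characters
$required = '~site\.com/subsection/.+~';  // .+ needs at least one character

$urls = ['site.com/subsection/subpage', 'site.com/subsection/'];
foreach ($urls as $url) {
    // preg_match returns 1 on a match and 0 otherwise
    printf("%s lazy=%d required=%d\n", $url, preg_match($lazy, $url), preg_match($required, $url));
}
```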

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
Problem is this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
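In a double-quoted PHP string it is safer to write a double backslash in front of these escape sequences: the first backslash escapes the second, so a single literal backslash reaches the regex engine. (PHP happens to leave unrecognized sequences such as \w intact, but doubling makes the intent explicit.)
You also need to escape the -, otherwise the class accepts every character in the range from ' to . (that is, ' ( ) * + , - and .), which includes the asterisk.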
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
Inside square brackets, the hyphen acts as a special character that denotes a range. In your case, it matches all characters in the range from ' to . (which includes *).
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
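The effect of the unescaped hyphen can be checked with a small sketch. It uses preg_match rather than mb_ereg_match (the question notes both behave the same here), and the variable names are illustrative.

```php
<?php
// Unescaped hyphen: '-\. is the range from ' (0x27) to . (0x2E), which contains *.
$range   = "/^[\\w\\s'-\\.]+$/";
// Escaped hyphen: \- is a literal hyphen, so no range is created.
$literal = "/^[\\w\\s'\\-\\.]+$/";

foreach (['*John', "O'Neil-Smith Jr."] as $name) {
    printf("%s range=%d literal=%d\n", $name, preg_match($range, $name), preg_match($literal, $name));
}
```

Moving the hyphen to the end of the class, as in [\w\s'.-], should have the same effect as escaping it.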
"I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods."
What you are missing is that \w does not mean "a letter". php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And the Perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
As I read it, the connecting punctuation character should mean only _, but this may be a bug in the multibyte extension.
If you use mb_ereg_match only to match whole Unicode strings, try preg_match with the /u modifier and the Unicode character properties feature, available since PHP 5.1.0.

How can I match occurrences of string not in another string using regular expressions?

I'm trying to match all occurrences of "string" in something like the following sequence, except those enclosed in # characters:
as87dio u8u u7o #string# ou os8 string os u
i.e. the second occurrence should be matched but not the first
Can anyone give me a solution?
You can use negative lookahead and lookbehind:
(?<!#)string(?!#)
EDIT
NOTE: As per Mark's comments below, this would not match #string or string#.
You can try:
(?:[^#])string(?:[^#])
OK,
If you want to NOT match a character, you put it in a character class (square brackets) and start the class with the ^ character, which negates it. For example, [^a] means any character but a lowercase 'a'.
So if you want a character that is NOT a hash sign, followed by string, followed by another character that is NOT a hash sign, you want
[^#]string[^#]
Now, the problem is that each character class matches a character, so in your example we'd get " string ", which includes the leading and trailing whitespace. You can wrap each end in a non-capturing group, written as parens with ?: at the beginning, (?: ). Note that a non-capturing group still consumes the character it matches, so to get just "string" you also need to put string in a capturing group and read that group.
(?:[^#])string(?:[^#])
OK, but now it doesn't match at the start of the subject (which, confusingly, is the ^ character doing double duty outside a character class) or at the end of the subject, $. So we have to use the OR character | to say "give me a non-hash character OR start of string" and, at the end, "give me a non-hash character OR end of string", like this:
(?:[^#]|^)string(?:[^#]|$)
EDIT: The negative lookbehind and lookahead is a simpler (and clever) solution, but it is not available in all regular expression engines.
Now a follow-up question. If you had the word "astringent" would you still want to match the "string" inside? In other words, does "string" have to be a word by itself? (Despite my initial reaction, this can get pretty complicated :) )
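For comparison, here is a sketch of both approaches in PHP (PCRE supports lookarounds). The subject is the sample from the question; the extra capturing group around string in the second pattern is an addition for illustration, since the non-capturing groups still consume the neighbouring characters.

```php
<?php
$subject = 'as87dio u8u u7o #string# ou os8 string os u';

// Lookarounds assert without consuming, so the match is exactly "string".
preg_match_all('/(?<!#)string(?!#)/', $subject, $m, PREG_OFFSET_CAPTURE);
print_r($m[0]); // each entry is [matched text, byte offset]

// Alternation version: the neighbours are part of the overall match,
// so "string" is captured separately in group 1.
preg_match_all('/(?:[^#]|^)(string)(?:[^#]|$)/', $subject, $g);
print_r($g[0]); // full matches, including the surrounding characters
print_r($g[1]); // group 1 only
```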

Regular expression for validating a username?

I'm still kinda new to using Regular Expressions, so here's my plight. I have some rules for acceptable usernames and I'm trying to make an expression for them.
Here they are:
1-15 Characters
a-z, A-Z, 0-9, and spaces are acceptable
Must begin with a-z or A-Z
Cannot end in a space
Cannot contain two spaces in a row
This is as far as I've gotten with it.
/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/
It works, for the most part, but doesn't match a single character such as "a".
Can anyone help me out here? I'm using PCRE in PHP if that makes any difference.
Try this:
/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/
The look-ahead assertion (?=.{1,15}$) checks the length and the rest checks the structure:
[a-zA-Z] ensures that the first character is an alphabetic character;
[a-zA-Z0-9]* allows any number of following alphanumeric characters;
(?: [a-zA-Z0-9]+)* allows any number of sequences of a single space (not \s that allows any whitespace character) that must be followed by at least one alphanumeric character (see PCRE subpatterns for the syntax of (?:…)).
You could also remove the look-ahead assertion and check the length with strlen.
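A short sketch that exercises this pattern against the rules from the question. The function name isAcceptableUsername is hypothetical.

```php
<?php
// Hypothetical helper wrapping the pattern above.
function isAcceptableUsername(string $name): bool
{
    $pattern = '/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/';
    return preg_match($pattern, $name) === 1;
}

$samples = ['a', 'John Smith', 'John  Smith', 'John ', '1abc', 'abcdefghijklmnop'];
foreach ($samples as $sample) {
    // Prints each sample in brackets with 1 (accepted) or 0 (rejected)
    printf("[%s] %d\n", $sample, isAcceptableUsername($sample));
}
```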
Make everything after your first character optional:
^[a-zA-Z](?:(?:[a-zA-Z0-9]|\s(?!\s)){0,13}[^\s])?$
The main problem with your regexp is that it needs at least two characters to match:
one for the [a-zA-Z]{1} part
one for the [^\s] part
Besides this problem, I see some parts of your regexp that could be improved:
The [^\s] class will match any character except whitespace, so a dot or semicolon will be accepted. Use the [a-zA-Z0-9] class here to ensure the last character is a valid one.
You can delete the {1} at the beginning, since an element without a quantifier matches exactly once by default.
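Both observations can be reproduced with the pattern from the question. This is only a sketch, and the variable name is illustrative.

```php
<?php
// Pattern from the question: the first and last elements each need their own character.
$original = '/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/';

var_dump(preg_match($original, 'a'));  // a single character cannot satisfy both ends
var_dump(preg_match($original, 'ab')); // two characters are enough
var_dump(preg_match($original, 'a;')); // [^\s] lets punctuation through
```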
