Preg_match php explanation - php

I have a question about one character in the preg_match pattern below; I just want to understand it completely.
\w matches alphanumeric characters and the underscore.
My question is: what does the \ after \w and before the @ sign mean?
Does it mean that the pattern will allow:
any alphanumeric
any backslash
any dash
or is the backslash meant to single out the character that follows?
When I test it in the w3schools.com example, an email address containing backslashes validates, but the backslashes are removed when the address is echoed out.
$email = test_input($_POST["email"]);
// check if e-mail address syntax is valid
if (!preg_match("/([\w\-]+\@[\w\-]+\.[\w\-]+)/", $email))
{
    $emailErr = "Invalid email format";
}

The backslash is used to escape characters that have a special meaning in a regex, so that they are treated as literal characters. There are twelve characters that must be escaped: [ { ( ) . ? * + | \ ^ $
If I want to write a literal $ in a pattern, I must write \$
Note: you don't need to escape { if the situation is not ambiguous (with the quantifier {m,n} or {m})
Note 2: The delimiter of the pattern must be escaped too, inside and outside a character class.
Inside a character class, these characters no longer need to be escaped, since they lose their special meaning and are seen as literals (the backslash itself is the exception: it still acts as an escape there). However, there are three characters that have a special meaning when they are in a particular position in the character class. These characters are: ^ - ]
^ at the first position is used to negate a character class ([^M] => anything that is not an M). If you want to use it as a literal character at "the first position", you must write: [\^]
- between two characters defines a character range ([a-z]). This means that you don't need to escape it at the beginning (or immediately after ^) or at the end of the class. You only need to escape it between two characters. - is seen as a literal (and doesn't define a range) in all these examples:
[-abcd]
[^-abcd]
[abcd-]
[ab\-cd]
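[\s-abcd] # because \s is not a single character (note: PHP 7.3+, which uses PCRE2, rejects this form with "invalid range in character class")
] since it is used to close the character class, it must be escaped, except at the first position or immediately after the ^. []] and [^]] are correct.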
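A minimal sketch of the hyphen rules above, for anyone who wants to check them. The variable names are only illustrative, and the [\s-abcd] form is left out on purpose because PCRE2-based PHP versions may refuse to compile it. Note that [^-abcd] does not match a hyphen: there the hyphen is a literal that the negated class excludes.

```php
<?php
// Single-quoted PHP strings pass the backslash in \- through to the regex engine.
$classes = ['[-abcd]', '[abcd-]', '[ab\-cd]', '[^-abcd]'];

$matchesHyphen = [];
foreach ($classes as $class) {
    // Anchor the class so the subject must be exactly one character.
    $matchesHyphen[$class] = preg_match('/^' . $class . '$/', '-');
}
print_r($matchesHyphen);

// For contrast: between two characters the hyphen builds a range.
$rangeMatchesC   = preg_match('/^[a-d]$/', 'c'); // c lies inside a..d
$literalMatchesC = preg_match('/^[ad-]$/', 'c'); // only a, d and - are allowed
var_dump($rangeMatchesC, $literalMatchesC);
```

The first three classes accept the hyphen, the negated one rejects it, and only the [a-d] class accepts c.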
If I write the pattern without unneeded backslashes, I obtain:
/([\w-]+@[\w-]+\.[\w-]+)/
To answer your question ("What does it mean?"): Nothing, unneeded escapes are ignored by the regex engine.
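A quick way to convince yourself is to run both spellings of the pattern against the same inputs. This is only a sketch: it assumes the separator in the pattern is meant to be @, and the sample addresses are made up.

```php
<?php
// Pattern with the redundant escapes, and the same pattern without them.
$escaped   = '/([\w\-]+\@[\w\-]+\.[\w\-]+)/';
$unescaped = '/([\w-]+@[\w-]+\.[\w-]+)/';

// Hypothetical sample inputs; the last one contains a literal backslash.
$samples = ['john-doe@example.com', 'no-at-sign.example.com', 'a\b@example.com'];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = [preg_match($escaped, $sample), preg_match($unescaped, $sample)];
}
print_r($results);
```

Both patterns agree on every input. The address with the backslash still "validates" because the pattern is not anchored with ^ and $: it only needs to find b@example.com somewhere inside the string.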

Related

why 3 backslash equal 4 backslash in php?

<?php
$a='/\\\/';
$b='/\\\\/';
var_dump($a);//string '/\\/' (length=4)
var_dump($b);//string '/\\/' (length=4)
var_dump($a===$b);//boolean true
?>
Why is the string with 3 backslashes equal to the string with 4 backslashes in PHP?
And can we use the 3-backslash version in regular expression?
The PHP reference says we must use 4 backslashes.
Note:
Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
$b='/\\\\/';
PHP parses the string literal (more or less) character by character. The first input symbol is the forward slash. The result is a forward slash in the result (of the parsing step), and the input symbol (one character, the /) is taken away from the input.
The next input symbol is a backslash. It's taken from the input and the next character/symbol is inspected. It's also a backslash. That's a valid combination, so the second symbol is also taken from the input and the result is a single backslash (for both input symbols).
The same goes for the third and fourth backslash.
The last input symbol (within the literal) is the forward slash -> forward slash in the result.
-> /\\/
Now for the string with three backslashes:
$a='/\\\/';
PHP "finds" the first backslash; the next character is a backslash. That's a valid combination, resulting in one single backslash in the result, with both characters taken from the input literal.
PHP then "finds" the third backslash; the next character is a forward slash, which is not a valid combination. So the result is a single backslash (because PHP loves and forgives you....) and only one character is taken from the input.
The next input character is the forward slash, resulting in a forward slash in the result.
-> /\\/
=> both literals encode the same string.
It is explained in the documentation on the page about Strings:
Under the Single quoted section it says:
The simplest way to specify a string is to enclose it in single quotes (the character ').
To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash.
Let's try to interpret your strings:
$a='/\\\/';
The forward slashes (/) have no special meaning in PHP strings, they represent themselves.
The first backslash (\) escapes the second backslash, as explained in the first sentence from the second paragraph quoted above.
The third backslash stands for itself, as explained in the last sentence of the above quote, because it is not followed by an apostrophe (') or a backslash (\).
As a result, the variable $a contains this string: /\\/.
In
$b='/\\\\/';
there are two backslashes (the second and the fourth) that are escaped by the first and the third backslash. The final (runtime) string is the same as for $a: /\\/.
Note
The discussion above is about how strings are encoded in PHP source. As you can see, there is always more than one (correct) way to encode the same string. Another option (besides string literals in single or double quotes and heredoc or nowdoc syntax) is to use constants (for literal backslashes, for example) and build the string from pieces.
For example:
define('BS', '\\'); // one literal backslash; '\' alone does not parse, because \' escapes the closing quote
$c = '/'.BS.BS.'/';
escapes the backslash in a single place. The constant BS contains a literal backslash and is used wherever a backslash is needed for its intrinsic value. Where a backslash is needed for escaping, a real backslash must be used (there is no way to use BS for that).
The escaping in regex is a different thing. First, the regex is parsed at runtime, and at runtime $a, $b and $c above all contain /\\/, no matter how they were written in the source.
Then, in a regex, a backslash in front of a non-alphanumeric character that has no special meaning is ignored (unlike in a PHP string literal, where, as shown above, it is kept as a literal backslash).
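To see the two layers at work, here is a small sketch that checks the runtime string and then uses it as a pattern. The subject strings are made-up examples.

```php
<?php
$a = '/\\\/';   // three backslashes in the source
$b = '/\\\\/';  // four backslashes in the source

// Layer 1: PHP string parsing. Both literals produce the 4-character string /\\/
var_dump($a === $b);  // bool(true)
var_dump(strlen($a)); // int(4)

// Layer 2: the regex engine reads \\ as "one literal backslash".
$withBackslash    = 'C:\temp'; // single-quoted, so \t stays a backslash plus t
$withoutBackslash = 'C:/temp';

$foundIn    = preg_match($a, $withBackslash);
$notFoundIn = preg_match($b, $withoutBackslash);
var_dump($foundIn, $notFoundIn); // int(1) int(0)
```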
Combining PHP & regex
There are endless possibilities to make things complicated. Let's try to keep them simple and set some guidelines for regex in PHP:
enclose the regex string in apostrophes ('), if possible; this way there are only two characters that need to be escaped for PHP: the apostrophe and the backslash;
when parsing URLs, paths or other strings that can contain forward slashes (/), use #, ~, ! or @ as the regex delimiter (whichever one is not used in the regex itself); this way there is no need to escape the delimiter when it is used inside the regex;
don't escape characters in the regex when it's not needed; e.g., the dash (-) has a special meaning only when it is used in character classes; outside them it's useless to escape it (and even in character classes it can be used unescaped without having any special meaning if it is placed as the very first or the very last character inside the [...] enclosure);
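As an illustration of these guidelines, here is the same URL pattern written twice, once with / and once with ~ as the delimiter. The URL and the pattern are invented for the example.

```php
<?php
$url = 'https://example.com/blog/2024/post-title';

// With "/" as the delimiter, every slash inside the pattern needs a backslash.
$withSlash = '/^https?:\/\/[\w.-]+\/blog\/(\d{4})\//';

// With "~" as the delimiter, the slashes can be written as they are.
$withTilde = '~^https?://[\w.-]+/blog/(\d{4})/~';

preg_match($withSlash, $url, $m1);
preg_match($withTilde, $url, $m2);

var_dump($m1[1], $m2[1]); // string(4) "2024" twice
```

Both patterns are single-quoted, so the only characters PHP itself would need escaped are the apostrophe and the backslash, and the hyphen sits last in the character class, so it needs no escape either.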

allow parentheses and other symbols in regex

I've made this regex:
^[a-zA-Z0-9_.-]*$
Supports:
letters [uppercase and lowercase]
numbers [from 0 to 9]
underscores [_]
dots [.]
hyphens [-]
Now, I want to add these:
spaces [ ]
comma [,]
exclamation mark [!]
parentheses [()]
plus [+]
equal [=]
apostrophe [']
double quotation mark ["]
at [@]
dollar [$]
percent [%]
asterisk [*]
For example, this code accepts only some of the symbols above:
^[a-zA-Z0-9 _.,-!()+=“”„#"$#%*]*$
Returns:
Warning: preg_match(): Compilation failed: range out of order in character class at offset 16
Make sure to put hyphen - either at start or at end in character class otherwise it needs to be escaped. Try this regex:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]*$
Also note that because of the *, it will even match an empty string. If you don't want to match empty strings, use + instead:
^[a-zA-Z0-9 _.,!()+=`,"#$#%*-]+$
Or better:
^[\w .,!()+=`,"#$#%*-]+$
TEST:
$text = "_.,!()+=,#$#%*-";
if (!preg_match('/\A[\w .,!()+=`,"#$#%*-]+\z/', $text)) {
    echo "error.";
} else {
    echo "OK.";
}
Prints:
OK.
The hyphen is being treated as a range marker: when the engine sees ,-! it thinks you're asking for a range of all characters in the charset that fall between , and ! (i.e. the same way that A-Z works). This isn't what you want.
Either make sure the hyphen is the last character in the character class, as it was before, or escape it with a backslash.
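A short sketch of why the error says "range out of order": the comma has a higher character code than the exclamation mark, so ,-! runs backwards. The shortened patterns here are only for illustration.

```php
<?php
// ',' is code 44 and '!' is code 33, so ",-!" is a backwards range.
var_dump(ord(','), ord('!')); // int(44) int(33)

// The pattern cannot be compiled; preg_match() returns false.
// (@ hides the warning so the return value can be inspected.)
$broken = @preg_match('/^[a-z,-!]*$/', 'abc');
var_dump($broken); // bool(false)

// Moving the hyphen to the end makes it a plain literal.
$fixed = preg_match('/^[a-z,!-]*$/', 'a,b-c!');
var_dump($fixed); // int(1)
```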
I would also point out that the quote characters you're using “”„ are part of an extended charset, and are not the same as the basic ASCII quotes "'. You may want to include both sets in your pattern. If you do need to include the non-ASCII characters in the pattern, you should also add the u modifier after the end of your pattern so it correctly picks up unicode characters.
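To show what the u modifier changes, here is a small sketch using the three curly quotes from the question. They are written as \u{...} escapes (PHP 7+) so the example does not depend on the encoding of the source file; the variable names are made up.

```php
<?php
// U+201C, U+201D and U+201E: the curly quotes used in the question.
$quotes  = "\u{201C}\u{201D}\u{201E}";
$subject = "\u{201C}"; // a single left curly quote (3 bytes in UTF-8)

// With /u the class contains three characters, so one curly quote matches.
$withU = preg_match('/^[' . $quotes . ']$/u', $subject);

// Without /u the class is a set of single bytes, and "exactly one byte"
// cannot match a 3-byte character.
$withoutU = preg_match('/^[' . $quotes . ']$/', $subject);

var_dump($withU, $withoutU); // int(1) int(0)
```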
Try escaping your regex: [a-zA-Z0-9\-\(\)\*]
Check if this helps you: How to escape regular expression special characters using javascript?
Inside of a character class [...] the hyphen - has a special meaning unless it is the first or last character, so you need to escape it:
^[a-zA-Z0-9 _.,\-!()+=“”„#"$#%*]*$
None of the other characters need to be escaped in the character class (except ]). You will also need to escape the quote indicating the string. e.g.
'/[\']/'
"/[\"]/"
try this
^[A-Z0-9][A-Z0-9*&!_^%$#!~#,=+,./\|}{)(~`?][;:\'""-]{0,8}$
use this link to test
The trick is that I reversed the order of the parentheses and other braces, which took care of some problems. Square brackets must be escaped.

PHP regular expression pattern allows unwanted literal asterisks

I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods. Here is the pattern:
return mb_ereg_match("^[\w\s'-\.]+$", $name);
Problem is this pattern, for some reason, returns true when there are literal asterisks in $name. This shouldn't be possible unless I'm missing something. I've done multiple searches on literal asterisks and all I found was the "\*" pattern for intentionally matching them.
The same pattern in preg_match() also returns a match when passed a string like "*John".
What the heck am I missing?
You need a double-backslash in front of these codes. One to escape the backslash, one to escape the escape sequence.
You also need to escape the -, otherwise it accepts all characters "between" ' and ..
return mb_ereg_match("^[\\w\\s'\\-\\.]+$", $name);
Have a look at a working case (using preg_match): http://ideone.com/E8afAM
When enclosed in square-brackets, the hyphen acts as a special character to denote a range. In your case, it's matching all characters in the range ' to ..
Escaping the hyphen should return the desired result:
^[\w\s'\-\.]+$
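If you want to see exactly what the unescaped range lets through, this sketch walks the printable ASCII characters and collects those accepted by the range from ' to . (the loop bounds are just the printable ASCII block).

```php
<?php
// Collect every printable ASCII character accepted by the range '-.
$inRange = [];
for ($code = 0x20; $code <= 0x7E; $code++) {
    $ch = chr($code);
    if (preg_match("/^['-.]$/", $ch)) {
        $inRange[] = $ch;
    }
}
echo implode(' ', $inRange), "\n"; // ' ( ) * + , - .
```

The asterisk is in that list, which is why "*John" passes the original pattern.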
I have a regular expression that allows only specific characters from the name fields in an HTML form, namely letters, white space, single quotes, hyphens and periods.
You are overlooking that \w does not match letters only. php.net says:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
And, the perl definition is:
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or a connecting punctuation character, such as an underscore ("_").
As I read it, the connecting punctuation character should mean only _, but this may be a bug in the multibyte extension.
If you use mb_ereg_match only to match whole Unicode strings, try preg_match's /u modifier and the Unicode character properties feature, available since PHP 5.1.0.
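A possible shape for that, as a sketch only: \p{L} stands for "any letter", and the sample names are invented. Whether digits and underscores should really be rejected is an assumption based on the question's wording ("letters, white space, single quotes, hyphens and periods").

```php
<?php
// \p{L} = any Unicode letter; the hyphen is last in the class, so it is literal.
$namePattern = '/^[\p{L}\s\'.-]+$/u';

// Hypothetical inputs; accented letters are written as \u{...} escapes.
$names = [
    "Anne-Marie O'Neil",
    "Jos\u{E9} \u{C1}lvarez",
    '*John',
    'R2D2',
];

$valid = [];
foreach ($names as $name) {
    $valid[] = preg_match($namePattern, $name);
}
var_dump($valid); // 1, 1, 0, 0
```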

Regular expression matching more than allowed characters

I am trying to validate that the given string contains only letters, numbers, spaces, and characters from a set of symbols (!-?():&,;+). Here is what I have so far:
/^[a-zA-Z0-9 !-?\(\):&,;\+]+$/
Now this works somewhat but it accepts other characters as well. For example, strings containing * or # validate. I thought that the ^ at the beginning of the expression and the $ at the end meant that it would match the whole string. What am I doing wrong?
Thanks.
/^[a-zA-Z0-9 !-?\(\):&,;\+]+$/
The - is badly placed! If you want a literal - inside a character class, be sure to place it either first or last, e.g.
/^[a-zA-Z0-9 !?\(\):&,;\+-]+$/
Otherwise it is taken as the range from ! to ?, whatever that range may be; it depends on your regex engine.
Finally, special characters are not special inside character classes, so there is no need to escape most of them:
/^[a-zA-Z0-9 !?():&,;+-]+$/
You have specified a "range" within your character class:
[!-?]
Means all ASCII symbols between ! and ?
http://www.regular-expressions.info/charclass.html
You need to escape the minus - with a \ backslash. (OTOH the backslash is redundant before the + and ( and ) within a character class.)
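Here is a side-by-side check of the original class and the corrected one, as a sketch with made-up test strings. In ASCII, the range !-? runs from code 33 to code 63, which includes * and #.

```php
<?php
$broken = '/^[a-zA-Z0-9 !-?():&,;+]+$/'; // !-? is a range
$fixed  = '/^[a-zA-Z0-9 !?():&,;+-]+$/'; // hyphen last, so it is literal

// Hypothetical inputs.
$inputs = ['Hello, world!', 'a*b', 'tag #1', 'a-b'];

$comparison = [];
foreach ($inputs as $input) {
    $comparison[$input] = [
        'broken' => preg_match($broken, $input),
        'fixed'  => preg_match($fixed, $input),
    ];
}
print_r($comparison);
```

Only the corrected pattern rejects 'a*b' and 'tag #1'; both accept 'Hello, world!' and 'a-b'.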

How to include hyphens and apostrophes for a PHP Regex?

I'm writing a password regex in PHP that should return false for any string that has at least one character that is not:
a lowercase letter a-z
an uppercase letter A-Z
a number 0-9
a whitespace " *"
a punctuation symbol :,.!().?";
So far I have this:
<?php
$password = 'azAZ0 giggles 9*":,.!() .?";';
$regex1 = '#^[a-zA-Z0-9" *":,.!().?";\']+$#i';
if (preg_match($regex1, $password)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
Does this seem to be working as I intend it to, or do you see any glaring errors?
And what should I add to the regex so that it should return false for any string that has at least one character that is not:
a hyphen -
Your regex is pretty close to the target, but not totally correct.
I would use this one:
$regex1 = '/^[a-z0-9 :,.!().?";\'-]+$/i';
Points of interest:
Moved the hyphen to the end of the list, so that it won't be mistaken for a character range delimiter
Included an apostrophe by escaping it with a backslash, as per PHP's string escaping rules
Removed the A-Z part since the regex includes the case-insensitive modifier
Replaced * (which in this context means "a space or an asterisk") with just a space -- if you want to also allow tabs and newlines as part of the password (unlikely), replace it with \s
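A small test run of that regex, as a sketch; the sample passwords are invented to cover the hyphen, the apostrophe and two characters that should be rejected.

```php
<?php
$regex1 = '/^[a-z0-9 :,.!().?";\'-]+$/i';

// Hypothetical passwords.
$passwords = [
    'azAZ0 giggles 9":,.!() .?";', // only allowed characters
    "it's-fine",                   // apostrophe and hyphen
    'has*asterisk',                // * is no longer allowed
    'under_score',                 // _ was never in the list
];

$accepted = [];
foreach ($passwords as $password) {
    $accepted[] = preg_match($regex1, $password);
}
var_dump($accepted); // 1, 1, 0, 0
```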
You simply need to escape ' using \. Try this
$regex1 = '#^[a-zA-Z0-9" *":,.!-().?";\']+$#i';
And you already seem to have - in the regex.
Within a character class (denoted by square brackets in regex), a minus - between two characters introduces a range: [A-Z].
You have !-(, which is not a meaningful range and therefore does not do what you think. Solution:
Move the - to the start or the end of the character class: [-A-Z...] / [A-Z...-]
Escape the -: [A-Z\-...]
The other question you ask is "How do I get a single quote into a PHP string?" and really has nothing to do with regex. But "escape it" is the answer, of course.
