Can't use regular expression properly in PHP - php

I am new to regular expressions and was doing some form validation with them. The problem is that most of the regular expressions I find look like this:
^(?=.{8})(?=.*[A-Z])(?=.*[a-z])(?=.*\d.*\d.*\d)(?=.*[^a-zA-Z\d].*[^a-zA-Z\d].*[^a-zA-Z\d])[-+%#a-zA-Z\d]+$
I am using this one for password validation. I found a lot of similar expressions here for validating other form fields. The problem is that when I use them in my code as follows:
if(preg_match('^(?=.{8})(?=.*[A-Z])(?=.*[a-z])(?=.*\d.*\d.*\d)(?=.*[^a-zA-Z\d].*[^a-zA-Z\d].*[^a-zA-Z\d])[-+%##a-zA-Z\d]+$', $password))
I get at least one error. Most of the time it shows an error such as "No ending delimiter" or "Unknown modifier".

You don't have a delimiter around your expression.
Try this:
$pattern = '/^(?=.{8})(?=.*[A-Z])(?=.*[a-z])(?=.*\d.*\d.*\d)(?=.*[^a-zA-Z\d].*[^a-zA-Z\d].*[^a-zA-Z\d])[-+%#a-zA-Z\d]+$/';
preg_match ($pattern, $password);
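As a rough check of the delimited pattern, the sketch below runs it against a few sample passwords. The sample strings are made up for illustration; only the pattern comes from the answer above.

```php
<?php
// Same pattern as in the answer above, wrapped in "/" delimiters.
$pattern = '/^(?=.{8})(?=.*[A-Z])(?=.*[a-z])(?=.*\d.*\d.*\d)(?=.*[^a-zA-Z\d].*[^a-zA-Z\d].*[^a-zA-Z\d])[-+%#a-zA-Z\d]+$/';

// Hypothetical sample inputs, chosen only for illustration.
$samples = ['Ab123#+-', 'abc', 'Ab123#+!'];

foreach ($samples as $password) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $result = preg_match($pattern, $password);
    echo $password, ' => ', var_export($result, true), PHP_EOL;
}
```

Only the first sample should match: the second is too short, and the third contains "!", which the final character class does not allow.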

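Direct answer: You have no delimiters on your expression. PCRE takes the first character, ^, assumes it's the delimiter, and throws the error because it doesn't find a closing ^ at the end of the regex. If a later ^ does appear inside the pattern, it is treated as the closing delimiter and whatever follows is reported as an unknown modifier.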
Indirect answer: Like Andy-Lester commented, your regex is over-complex and pretty much unreadable to anyone that isn't a regex guru. I use the following which is more readable and more maintainable.
$req_regex = array(
    '/[A-Z]/',     // uppercase
    '/[a-z]/',     // lowercase
    '/[^A-Za-z]/'  // non-alpha
);
foreach ($req_regex as $regex) {
    if (!preg_match($regex, $password)) {
        return NULL;
    }
}
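The fragment above needs an enclosing function for its return to make sense. A minimal sketch of one way to wrap it follows; the function name passwordMeetsRequirements and the boolean return value are assumptions, not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: returns true only if every required
 * character class appears at least once in $password.
 */
function passwordMeetsRequirements(string $password): bool
{
    $required = [
        '/[A-Z]/',      // uppercase
        '/[a-z]/',      // lowercase
        '/[^A-Za-z]/',  // non-alpha
    ];

    foreach ($required as $regex) {
        // Anything other than 1 means "no match" (0) or "error" (false).
        if (preg_match($regex, $password) !== 1) {
            return false;
        }
    }
    return true;
}

var_dump(passwordMeetsRequirements('Secret1')); // bool(true)
var_dump(passwordMeetsRequirements('secret'));  // bool(false)
```

Adding a further rule, such as a minimum digit count, then only means adding another small pattern to the array.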

The problem with the expression you have given is that you do not have the delimiters around the expression.
For complex regular expressions it is best to build them up piecemeal. I have found the add-on for Firefox (https://addons.mozilla.org/en-us/firefox/addon/rext/) useful.

Related

PHP Regex: match character set OR end of string

I am porting code from Node.js to PHP and keep getting errors with this regular expression:
^/[a-z0-9]{6}([^0-9a-z]|$)
PHP complains about a dollar sign:
Unknown modifier '$'
In JavaScript I was able to check if a string was ending with [^0-9a-z] or END OF STRING.
How do I do this in PHP with preg_match()?
My PHP code looks like this:
<?php
$sExpression = '^/[a-z0-9]{6}([^0-9a-z]|$)';
if (preg_match('|' . $sExpression . '|', $sUrl)) {
// ...
}
?>
The JavaScript code was similar to this:
var sExpression = '^/[a-z0-9]{6}([^0-9a-z]|$)';
var oRegex = new RegExp(sExpression);
if (oRegex.test(sUrl)) {
// ...
}
To match a string that starts with a slash, followed by six alphanumerics and is then followed by either the end-of-string or something that's not alphanumeric:
preg_match('~^/[a-z0-9]{6}([^0-9a-z]|$)~i', $str);
The original JavaScript probably used new RegExp(<expression>), but PCRE requires a proper enclosure of the expression; those are the ~ characters I've put in the above code. Btw, I've made the expression case insensitive by using the i modifier; feel free to remove it if not desired.
You were using | as the enclosure; as such, you should have escaped the pipe character inside the expression, but by doing so you would have changed the meaning. It's generally best to choose delimiters that do not have a special meaning in an expression; it also helps to choose delimiters that don't occur as such in the expression, e.g., my choice of ~ avoids having to escape any character.
Expressions in PCRE can be generalised as:
<start-delimiter> stuff <end-delimiter> modifiers
Typically, the starting delimiter is the same as the ending delimiter, except for cases such as [expression]i or {expression}i whereby the opening brace is matched with the closing brace :)
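To illustrate the delimiter choices described above, here is a small sketch that runs the same expression with three different enclosures. The sample URL strings are made up for the example.

```php
<?php
// Hypothetical sample inputs.
$matching    = '/abc123?x=1';  // six alphanumerics, then a non-alphanumeric
$nonMatching = '/abc1234';     // seven alphanumerics in a row

// The same expression with three different delimiters.
$patterns = [
    '~^/[a-z0-9]{6}([^0-9a-z]|$)~i',   // "~" needs no escaping here
    '#^/[a-z0-9]{6}([^0-9a-z]|$)#i',   // "#" needs no escaping either
    '/^\/[a-z0-9]{6}([^0-9a-z]|$)/i',  // "/" forces the inner slash to be escaped
];

foreach ($patterns as $pattern) {
    echo $pattern, ': ',
        preg_match($pattern, $matching), ' ',
        preg_match($pattern, $nonMatching), PHP_EOL;
}
```

All three variants should behave identically; the only difference is how much escaping the pattern needs.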
Fix the regular expression first:
^/[a-z0-9]{6}([^0-9a-z]|$)
Try this.
As others pointed out, I'm an idiot and saw a / as a \ ... LOL.
OK, let's have another go at this.
I'd avoid using "|" as the delimiter and just do it this way.
if (preg_match('/^\/[a-z0-9]{6}([^0-9a-z]|$)/', $sUrl)) { ... }
Reducing this to just matching a particular character or end of string (PHP),
/\D777(\D|$)/
This will match:
xxx777xxx or xxx777 but not xxx7777 or xxx7777xxx
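The four cases listed above can be checked with a short sketch. It assumes the pattern is written with / as the delimiter, since a backslash is not accepted as one.

```php
<?php
// A non-digit, then 777, then either a non-digit or the end of the string.
$pattern = '/\D777(\D|$)/';

$inputs = ['xxx777xxx', 'xxx777', 'xxx7777', 'xxx7777xxx'];

$results = [];
foreach ($inputs as $input) {
    $results[$input] = preg_match($pattern, $input);
    echo $input, ' => ', $results[$input], PHP_EOL;
}
```

The first two inputs should give 1 and the last two 0, because a fourth 7 is neither a non-digit nor the end of the string.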

my regexp does not work for a simple word match

I want to see if the current request is on the localhost or not. For doing this, I use the following regular expression:
return ( preg_match("/(^localhost).*/", $url) == true ||
preg_match("/^({http|ftp|https}://localhost).*/", $url) == true )
? true : false;
And here is the var_dump() of $url:
string 'http://localhost/aone/public/' (length=29)
Which keeps returning false though. What is the problem of this regular expression?
You are currently using the forward slash (/) as the delimiter, but you aren't escaping it inside your pattern string. This results in an error, and preg_match() fails; if error reporting is disabled, it fails silently.
Also, you are using alternation incorrectly. If you want to match either foo or bar, you'd write (foo|bar), and not {foo|bar}.
The updated preg_match() should look like:
preg_match("/^(http|ftp|https):\/\/localhost.*/", $url)
Or with a different delimiter (so you don't have to escape all the / characters):
preg_match("#^(http|ftp|https)://localhost.*#", $url)
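The silent failure is easier to see when the return value is inspected directly. In the sketch below the @ operator is used only to keep the warning out of the output for the demonstration; it is not a recommendation for real code.

```php
<?php
$url = 'http://localhost/aone/public/';

// Broken: "/" is the delimiter, so the pattern ends at the first inner "/".
// "@" silences the warning here only so the return value can be inspected.
$broken = @preg_match("/^({http|ftp|https}://localhost).*/", $url);

// Fixed: "#" as delimiter and parentheses for the alternation.
$fixed = preg_match("#^(http|ftp|https)://localhost.*#", $url);

var_dump($broken); // bool(false): an error, not "no match"
var_dump($fixed);  // int(1)
```

Because false == true is false, the original comparison could not tell a failed compilation apart from a pattern that simply did not match; comparing with === false makes the difference visible.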
Curly braces have a special meaning in a regex: they are used to quantify the preceding character(s).
So:
/^({http|ftp|https}://localhost).*/
Should probably be something like:
#^((http|ftp|https)://localhost).*#
Edit: changed the delimiters so that the forward slash does not need to be escaped
This
{http|ftp|https}
is wrong.
I suppose you mean
(http|ftp|https)
Also, if you only want to group without capturing, add ?: after the opening parenthesis:
(?:http|ftp|https)
I would change your current code to:
return preg_match("~^(?:(?:https?|ftp)://)?localhost~", $url);
You were using { and } for grouping, when those are used for quantifying and otherwise mean literal { and } characters.
A couple of things to add:
you can use https? instead of (http|https);
you can use other delimiters for the regex when your pattern contains the delimiter character itself. This saves you excessive escaping;
you can combine the two regexes, since one part is optional (the (?:https?|ftp):// part), which makes the second preg_match() call unnecessary;
the .* at the end is not required.
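Putting those points together, a sketch of the combined check might look like this. The function name isLocalhostUrl is hypothetical, and the test URLs other than the one from the question are made up.

```php
<?php
/**
 * Hypothetical helper: true if $url starts with "localhost",
 * optionally preceded by an http://, https:// or ftp:// scheme.
 */
function isLocalhostUrl(string $url): bool
{
    // "~" as delimiter avoids escaping the slashes in "://".
    return preg_match('~^(?:(?:https?|ftp)://)?localhost~', $url) === 1;
}

var_dump(isLocalhostUrl('http://localhost/aone/public/')); // bool(true)
var_dump(isLocalhostUrl('localhost'));                     // bool(true)
var_dump(isLocalhostUrl('http://example.com/'));           // bool(false)
```

Note that this pattern only checks the prefix, so a host such as localhost.example.com would also pass; a stricter check would need to look at what follows "localhost".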

preg_match troubles

I am trying to match a Youtube URL with regex to see if it is valid. This is my code:
if(preg_match('\bhttp://youtube.com/watch\?v=.*\b', $link))
{
echo "matched youtube";
}
But I'm getting an error:
Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash in C:\xampp\htdocs\ajax\youtube.php on line 22
I'll admit I am a complete novice to regular expressions and I don't understand them much but I am trying to learn as I do this. I made the above regex using this online regex tool:
http://gskinner.com/RegExr/
and it works there. So what am I doing wrong and is there a better way to validate a youtube URL?
Thanks. :)
There's really no need for preg_match here:
$url = "http://youtube.com/watch?v=abc";
if(strpos($url, "http://youtube.com/watch?v=") === 0) {
echo "Valid";
}
PCRE requires delimiters that separate the regular expression from the optional modifiers.
In this case the \ is assumed to be the delimiter, but \ is not a valid delimiter (see the error message). Use a different character like ~:
preg_match('~\bhttp://youtube\.com/watch\?v=.*\b~', $link)
You should add delimiters to your regexp. They separate the pattern from any optional modifiers:
preg_match('"\bhttp://youtube.com/watch\?v=.*\b"', $link)
The / symbol is usually used as the regexp delimiter, but in your case it would force the inner / characters to be escaped. So for readability I suggest using ".
When using preg_match, the regexp needs to be enclosed in proper delimiters.
For example:
preg_match('/\bhttp:\/\/youtube\.com\/watch\?v=.*\b/', $link)
In your example the pattern starts with \b (a word boundary), so PHP takes the backslash as the delimiter. A backslash is not a valid delimiter, hence the error message.
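For comparison, here is a small sketch that runs the ~-delimited pattern from the answers next to the plain strpos() prefix check. The function names and the second sample link are assumptions made for the example.

```php
<?php
// Regex variant: "~" as delimiter, so the slashes need no escaping.
function matchesYoutubeRegex(string $link): bool
{
    return preg_match('~\bhttp://youtube\.com/watch\?v=.*\b~', $link) === 1;
}

// Plain string variant: checks that the link starts with the fixed prefix.
function matchesYoutubePrefix(string $link): bool
{
    return strpos($link, 'http://youtube.com/watch?v=') === 0;
}

$links = ['http://youtube.com/watch?v=abc', 'http://example.com/watch?v=abc'];
foreach ($links as $link) {
    echo $link, ': ',
        var_export(matchesYoutubeRegex($link), true), ' ',
        var_export(matchesYoutubePrefix($link), true), PHP_EOL;
}
```

Both checks agree on these two inputs. The regex variant is not anchored with ^, so it would also accept the URL appearing later in a longer string.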

Gruber's new and improved URL recognising regex

I've been trying to use Gruber's latest URL-matching regex in a PHP project.
To test it I threw together something very simple:
$regex = "(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:"'.,<>?«»“”‘’]))";
$array = preg_match_all($regex, $theblockofurltext);
print_r($array);
The first problem was that the " inside the regex would terminate the string early, depending on which quote character I wrapped the regex with, so I just removed it. This is for personal use and I will never have a " anywhere near a URL anyway. This left me with a new regex.
$regex = "(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'.,<>?«»“”‘’]))";
Raring to go I then ran my little script and it gave me the following error:
Warning: preg_split() [function.preg-split]: Unknown modifier '\' in D:\wwwroot\xxx\index.php on line 14
Unfortunately my regex class at school didn't go anywhere near the level this regex requires, and I have no idea where to begin fixing it for use with PHP. Any help would be greatly appreciated. No doubt I'm probably doing something stupid too, so please go easy on me :)
Jon
Add # before and after your RE.
$regex = "#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'.,<>?«»“”‘’]))#";
If you use PCRE, the regular expression must be enclosed in delimiters. Parentheses () can also act as delimiters, which is why the engine thinks your expression is only (?i) and interprets the next \ as a modifier.
You could use ~ as delimiter:
$regex = "~(?i)\b...]))~";
Update:
PCRE, the engine behind PHP's preg functions, does accept inline option settings such as (?i), so this step is optional. Since you apply it to the whole expression anyway, you can also remove it and put the modifier after the closing delimiter instead:
$regex = "~\b...]))~i";
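The two placements of the case-insensitive option can be compared with a much shorter stand-in pattern. The pattern and sample text below are simplified assumptions for illustration, not Gruber's regex.

```php
<?php
$text = 'Visit HTTP://EXAMPLE.COM today'; // hypothetical sample text

// Inline option at the start of the pattern.
$inline = preg_match('~(?i)\bhttp://[a-z.]+~', $text, $m1);

// Modifier after the closing delimiter.
$trailing = preg_match('~\bhttp://[a-z.]+~i', $text, $m2);

// No case-insensitivity at all, for contrast.
$none = preg_match('~\bhttp://[a-z.]+~', $text);

var_dump($inline, $trailing, $none);
var_dump($m1[0] === $m2[0]);
```

Both case-insensitive variants should find the same match, while the third finds none because the sample text is upper case.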

Weird error using preg_match and unicode

if (preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)', '2010/02/14/this-is-something'))
{
// do stuff
}
The above code works. However this one doesn't.
if (preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', '2010/02/14/this-is-something'))
{
// do stuff
}
Maybe someone could shed some light as to why the one below doesn't work. This is the error that is being produced:
A PHP Error was encountered
Severity: Warning
Message: preg_match()
[function.preg-match]: Unknown
modifier '\'
Try this (delimit the regex with # instead of /):
if (preg_match('#\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+#', '2010/02/14/this-is-something'))
{
// do stuff
}
Edited
The modifier u is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32.
Also, as nvl observed, you are using / as the delimiter and you are not escaping the / present in the regex. So you'll have to use:
/\p{Nd}{4}\/\p{Nd}{2}\/\p{Nd}{2}\/\p{L}+/u
To avoid this escaping you can use a different set of delimiters like:
#\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+#
As a tip, if your delimiter is present in your regex, it's better to choose a different delimiter not found in the regex. This keeps the regex clean and short.
In the second regex you're using / as the regex delimiter, but you're also using it in the regex. The compiler is trying to interpret this part as a complete regex:
/\p{Nd}{4}/
It thinks the next character after the second / should be a modifier like 'u' or 'm', but it sees a backslash instead, so it emits that cryptic warning.
In the first regex you're using parentheses as regex delimiters; if you wanted to add the u modifier, you would put it after the closing paren:
'(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)u'
Although it's legal to use parentheses or other bracketing characters ({}, [], <>) as regex delimiters, it's not a good idea IMO. Most people prefer to use one of the less common punctuation characters. For example:
'~\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+~u'
'%\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+%u'
Of course, you could also escape the slashes in the regex with backslashes, but why bother?
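As a sketch of the ~-delimited form in use, the example below adds capture groups and a leading ^ to pull the parts out of the sample string from the question. The groups, the anchor and the variable names are additions for illustration.

```php
<?php
$path = '2010/02/14/this-is-something';

// "~" as delimiter, "u" modifier after the closing delimiter.
$pattern = '~^(\p{Nd}{4})/(\p{Nd}{2})/(\p{Nd}{2})/(\p{L}+)~u';

if (preg_match($pattern, $path, $m) === 1) {
    // $m[0] is the whole match; the groups follow in order.
    [, $year, $month, $day, $word] = $m;
    echo "$year-$month-$day $word", PHP_EOL; // 2010-02-14 this
}
```

The last group stops at the first hyphen because \p{L} matches letters only.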
