PHP regular expression pattern accepts characters that are not allowed

preg_match('~^[a-zA-Z0-9!##$%^*()-_+=.]+$~', $string)
This is the pattern I used in my code. I wanted to tell users that they're only allowed to use these characters, but it works for some characters and not for others. For example, it rejects a string like "john&john" but accepts "test<>", even though I didn't put '<' or '>' in the pattern!

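Some of those characters have special meaning for the regex engine. Inside a character class the hyphen is the one that matters here, because )-_ is read as a range. Escaping it with a backslash fixes the pattern, and escaping the others is harmless: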
^[a-zA-Z0-9\!\#\#\$\%\^\*\(\)\-\_\+\=\.]+$
https://regex101.com/r/kH7hD8/1

I always test my regexps with a tool like https://regex101.com/
You must escape some special characters in your regexp:
^[a-zA-Z0-9!##\$%\^\*\(\)\-_\+=\.]+$

This will work:
preg_match('~^[a-zA-Z0-9!##$%^*()_+=.-]+$~', $string)

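The problem is the unescaped hyphen in the middle of the character class, where it acts as a range operator: )-_ matches every character from ) (0x29) to _ (0x5F) in ASCII, and that range includes < and >. Put the hyphen last in the class (\w already covers letters, digits and the underscore) and use this regex: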
preg_match('~^[\w!##$%^*()+=.-]+$~', $string)
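To see the range effect concretely, here is a minimal sketch that runs the pattern from the question next to the version with the hyphen moved to the end. The constant names and the is_allowed() helper are made up for this illustration; only preg_match() is a real PHP function.

```php
<?php
// Pattern from the question: ")-_" is read as a range from ")" (0x29) to "_" (0x5F).
const BUGGY_PATTERN = '~^[a-zA-Z0-9!##$%^*()-_+=.]+$~';
// Same class with the hyphen moved to the end, where it is a literal "-".
const FIXED_PATTERN = '~^[a-zA-Z0-9!##$%^*()_+=.-]+$~';

// Hypothetical helper: true when every character of $input is in the class.
function is_allowed(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

foreach (['john&john', 'test<>', 'a-b_c'] as $input) {
    printf(
        "%-10s buggy=%s fixed=%s\n",
        $input,
        var_export(is_allowed(BUGGY_PATTERN, $input), true),
        var_export(is_allowed(FIXED_PATTERN, $input), true)
    );
}
```

"john&john" is rejected by both patterns because & (0x26) lies outside the accidental range, while "test<>" is accepted only by the buggy one.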

Related

How to include EOL in this regex? [duplicate]

I have a string that contains normal characters, whitespace and newline characters between <div> and </div>.
This regular expression doesn't work: /<div>(.*)<\/div>/. That is because . doesn't match newline characters. How can I make it match them?
You need to use the DOTALL modifier (/s).
'/<div>(.*)<\/div>/s'
This might not give you exactly what you want because you are greedy matching. You might instead try a non-greedy match:
'/<div>(.*?)<\/div>/s'
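The difference between these variants is easiest to see on a small input. The following is a rough sketch; first_capture() and the sample strings are invented for the demonstration.

```php
<?php
// Hypothetical helper: returns the first capture group, or null when nothing matches.
function first_capture(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

$single = "<div>a\nb</div>";
$double = "<div>a\nb</div><div>c</div>";

// Without /s the dot stops at "\n", so nothing matches.
var_dump(first_capture('/<div>(.*)<\/div>/', $single));
// With /s the dot also matches "\n".
var_dump(first_capture('/<div>(.*)<\/div>/s', $single));
// Greedy: the capture runs on to the last </div> in the string.
var_dump(first_capture('/<div>(.*)<\/div>/s', $double));
// Non-greedy: the capture stops at the first </div>.
var_dump(first_capture('/<div>(.*?)<\/div>/s', $double));
```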
You could also solve this by matching everything except '<' if there aren't other tags:
'/<div>([^<]*)<\/div>/'
Another observation is that you don't need to use / as your regular expression delimiter. Using another character means that you don't have to escape the / in </div>, which improves readability. This applies to all the above regular expressions. Here's how it would look if you use '#' instead of '/':
'#<div>([^<]*)</div>#'
However all these solutions can fail due to nested divs, extra whitespace, HTML comments and various other things. HTML is too complicated to parse with Regex, so you should consider using an HTML parser instead.
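As a sketch of the parser route, the following uses PHP's bundled DOM extension (DOMDocument), assuming it is enabled in your build. The div_texts() helper name is made up for illustration, and real-world input may need extra care with character encodings.

```php
<?php
// Hypothetical helper: returns the text content of every <div> in $html.
function div_texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml's complaints about imperfect markup instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        $texts[] = $div->textContent;
    }
    return $texts;
}

// Nested divs are returned in document order, outer element first.
print_r(div_texts("<div>outer <div>inner</div></div>"));
```

Unlike the regex versions, this handles nesting and newlines without any special flags, at the cost of loading the whole document.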
To match all characters, you can use this trick:
%\<div\>([\s\S]*)\</div\>%
You can also use the (?s) inline mode modifier, placed inside the delimiters at the start of the pattern. For example,
/(?s)<div>(.*?)<\/div>/
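Both tricks can be checked side by side. This is only a sketch with an invented div_body() helper and sample input; note that the inline (?s) sits inside the delimiters.

```php
<?php
// Hypothetical helper: returns the text captured between <div> and </div>, or null.
function div_body(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

$html = "<div>line one\nline two</div>";

// [\s\S] means "whitespace or non-whitespace", i.e. any character including "\n".
var_dump(div_body('%<div>([\s\S]*?)</div>%', $html));
// (?s) inside the delimiters turns on DOTALL for the rest of the pattern.
var_dump(div_body('/(?s)<div>(.*?)<\/div>/', $html));
```

Both calls capture the same two-line string, so the choice between them is mostly a matter of readability.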
There shouldn't be any problem with just doing:
(.|\n)
This matches either any character except newline or a newline, so every character. It solved it for me, at least.
An option would be:
'/<div>(\n*|.*)<\/div>/i'
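This matches either a run of newlines or a run of non-newline characters, but not a mixture of the two, so it fails on text that spans several lines; (?:\n|.)* handles both.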
There is usually a flag in the regular expression compiler to tell it that dot should match newline characters.

Help with php regex for limiting allowed characters

I'm working in PHP and want to set some rules for a submitted text field. I want to allow letters, numbers, spaces, and the symbols # ' , -
This is what I have:
/^(a-z,0-9+# )+$/i
That seems to work but when I add the ' or - symbols I get errors.
Almost there. What you're looking for is a character class, which is written with square brackets. For example:
/^[-a-z0-9+#,' ]+$/i
To include a literal hyphen, put it first or last in the class.
Edit
As you want to include the single quote and you're using PHP where regular expressions must be represented as strings, be careful with how you quote the pattern. In this case, you can use either of
$pattern = "/^[-a-z0-9+#,' ]+\$/i"; // or
$pattern = '/^[-a-z0-9+#,\' ]+$/i';
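A quick way to convince yourself that the two quoting styles produce the same pattern is to compare the resulting strings. This is a small sketch; the variable names and sample inputs are arbitrary.

```php
<?php
// Double quotes: the "$" must be escaped so PHP does not try to interpolate a variable.
$double = "/^[-a-z0-9+#,' ]+\$/i";
// Single quotes: only the quote character itself needs escaping.
$single = '/^[-a-z0-9+#,\' ]+$/i';

// Both literals yield the same pattern string.
var_dump($double === $single);

// The apostrophe, hash, comma, space and hyphen are all accepted.
var_dump(preg_match($single, "O'Brien #4, Apt 2-B"));
// An underscore is not in the class, so this input is rejected.
var_dump(preg_match($single, "a_b"));
```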
You should use a character class - [a-zA-Z0-9 #',-]
Note that - should be placed first or last, or escaped; otherwise it is treated as denoting a range, and you will get errors.
I want to allow letters, numbers, spaces, and the symbols #, ', , and -.
Use this regex...
/^[-a-zA-Z\d ',#]+\z/
Note the \z. If you use $, you are allowing a trailing \n.
Be sure to escape the ' if you are using ' as your string delimiter.
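The trailing-newline behaviour is easy to miss, so here is a minimal sketch comparing the two anchors. The is_valid_field() helper and its $strictEnd flag are invented for the example.

```php
<?php
// Hypothetical helper wrapping the two anchoring styles.
function is_valid_field(string $value, bool $strictEnd): bool
{
    $pattern = $strictEnd
        ? '/^[-a-zA-Z\d \',#]+\z/'  // \z: only the very end of the string
        : '/^[-a-zA-Z\d \',#]+$/';  // $: end of string, or just before a final "\n"
    return preg_match($pattern, $value) === 1;
}

var_dump(is_valid_field("abc\n", false)); // "$" lets the trailing newline through
var_dump(is_valid_field("abc\n", true));  // "\z" rejects it
var_dump(is_valid_field("abc", true));    // a clean value passes either way
```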
Please use /^[a-z,0-9+\#\-,\s]+$/i
Use this regex:
/^[-a-z0-9,# ']+$/i

Why does this PHP regex give me error?

Need Some Help With Regex:
I want to replace
[url=http://youtube.com]YouTube.com[/url]
with
YouTube.com
the regex
preg_replace("/[url=(.*?)](.*?)[/url]/is", '$2', $text);
why does this give me:
Warning: preg_replace() [function.preg-replace]: Unknown modifier 'r' in C:\Programa\wamp\www\func.php on line 18
You should escape special characters in your regular expression:
preg_replace('/\[url=(.*?)](.*?)\[\/url]/is', '$2', $text);
I have escaped the [ characters (they specify the start of a character class) and the / character (it specifies the boundaries of the regular expression.)
Alternatively (for the / character) you can use another boundary character:
preg_replace('#\[url=(.*?)](.*?)\[/url]#is', '$2', $text);
You still have to escape the [ characters, though.
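Put together as a runnable sketch, with the # delimiter and every bracket escaped; strip_url_tags() is a made-up wrapper name, and the input is the example from the question.

```php
<?php
// Hypothetical wrapper: replaces [url=...]label[/url] with just the label.
function strip_url_tags(string $text): string
{
    // "#" as delimiter means the "/" in [/url] needs no escaping;
    // the square brackets are escaped so they are not read as a character class.
    $result = preg_replace('#\[url=(.*?)\](.*?)\[/url\]#is', '$2', $text);
    // preg_replace() returns null on error; fall back to the input in that case.
    return $result ?? $text;
}

echo strip_url_tags('[url=http://youtube.com]YouTube.com[/url]'), "\n";
```

The non-greedy (.*?) groups keep two tags on the same line from being merged into one match.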
PHP is interpreting the '/' in /url as being the end of the regex and the start of the regex options. Insert a '\' before it to make it a literal '/'.
You need to escape the '['s in the same way (otherwise they will be interpreted as introducing a character class).
preg_replace("/\[url=(.*?)](.*?)\[\/url]/is", '$2', $text);
Both the slash (the delimiter here) and the square brackets are special in this regex, so you will need to escape them:
\/ \[ \]
The 2nd '/' in a regex string ends the regex. You need to escape it. Also, preg_replace will interpret the '[url=(.*?)]' as a character class, so you need to escape those as well.
preg_replace('/\[url=(.*?)\](.*?)\[\/url\]/is', '$2', $text);
You seem to be just starting out with regular expressions. If that is the case - or maybe even if it isn't - you will find the Regex Coach to be a very helpful tool. It provides a sandbox for us to test our pattern matches and our replace strings too. If you had been using that it would have highlighted the need to escape the special characters.
