Why doesn't this regular expression work with spaces?

How do I make the following regular expression accept only the symbols I want it to accept as well as spaces?
if(!preg_match('/^[A-Z0-9\/\'&,.-]*$/', $line))
{
die();
}
else
{
//execute the rest of the validation script
}
I want the user to only be able to enter A-Z, 0-9, forward slashes, apostrophes, ampersands, commas, periods, and hyphens into a given text field $line.
It currently accepts something like HAM-BURGER, which is correct. The problem arises when the user wants to type HAM BURGER (note the space).
If I remove the ^ from the beginning and/or the $ from the end it will succeed if the user types in anything. My attempted remedy to this was to make the * into a + but then it will accept anything as long as the user puts in at least one of the acceptable characters.

Add the space to the character class:
if(!preg_match('/^[A-Z0-9\/\'&,. -]*$/', $line))
Yes, it's that simple.
Note that the space has to be inserted before the -, because the hyphen is a metacharacter inside a character class (unless it is the first or last character of the class). Another option is to escape the hyphen:
if(!preg_match('/^[A-Z0-9\/\'&,.\- ]*$/', $line))
The regex explained:
^ and $ are the start-of-string and end-of-string anchors. They tell the regex engine that it has to match the whole string rather than just part of it.
[...] is a character class.
* is the zero-or-more repetition operator. This means it will accept an empty string. You can change it to + (one-or-more) so it rejects the empty string.
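As a rough illustration of the points above, the check could be wrapped in a small function. The name isAllowedLine is made up for this sketch, and it uses + rather than * on the assumption that empty input should be rejected:

```php
<?php
// Hypothetical helper: true when $line contains only the allowed characters.
function isAllowedLine(string $line): bool
{
    // The space sits before the trailing hyphen, so the hyphen stays literal.
    // + (instead of *) is an assumption here: it rejects the empty string.
    return preg_match('/^[A-Z0-9\/\'&,. -]+$/', $line) === 1;
}

var_dump(isAllowedLine('HAM-BURGER')); // bool(true)
var_dump(isAllowedLine('HAM BURGER')); // bool(true)
var_dump(isAllowedLine('ham burger')); // bool(false): lowercase is not in the class
var_dump(isAllowedLine(''));           // bool(false): + needs at least one character
```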

This is a good reference for RegEx, though specifically for Perl:
http://www.cs.tut.fi/~jkorpela/perl/regexp.html

Related

How to check if string contains specific special characters or starting with a space? [duplicate]

I have the following requirements for validating an input field:
It should only contain letters, with spaces allowed between them.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.

For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter (I replaced your [a-zA-Z] with the Unicode property for letters, \p{L}). Then there can be a space followed by at least one letter; this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem with ^(?!\s*$) in your expression is that the lookahead fails only if there is nothing but whitespace up to the end of the string, so it rejects blank input but not a leading space. To disallow leading whitespace, remove the end-of-string anchor inside the lookahead: ^(?!\s)[-a-zA-Z ]*$. This still allows the string to end with whitespace. To prevent that, add a lookbehind at the end of the string: ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think lookaround is not needed for this task.
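The difference between the lookahead in the question and the lookaround variant can be checked with a short PHP sketch. The variable names are made up for illustration, and PHP's preg_match is assumed here even though the question does not name a language:

```php
<?php
// The pattern from the question: the lookahead only rejects all-whitespace input.
$original = '/^(?!\s*$)[-a-zA-Z ]*$/';
// The variant with a lookahead at the start and a lookbehind at the end.
$lookaround = '/^(?!\s)[-a-zA-Z ]*(?<!\s)$/';

var_dump(preg_match($original, ' abc'));   // int(1): the leading space slips through
var_dump(preg_match($original, '   '));    // int(0): blank input is rejected
var_dump(preg_match($lookaround, ' abc')); // int(0)
var_dump(preg_match($lookaround, 'abc ')); // int(0)
var_dump(preg_match($lookaround, 'ab c')); // int(1)
```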

This should work if you use it with String.matches method. I assume you want English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it would be equivalent to
[ \t\n\x0B\f\r]
This includes horizontal tab (09), line feed (10), vertical tab (11), form feed (12), carriage return (13), and space (32).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"

Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses negative lookahead and negative lookbehind to disallow spaces at the beginning or at the end of the string, and requiring the match of the entire string.

I think the problem is there's a ? before the negation of white spaces, which means it is optional
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
at least one sequence of letters, then optional string with spaces but always ends with letters

I don't know whether words in your accepted string can be separated by more than one space. If they can:
^[a-zA-Z]+(( )+[a-zA-Z]+)*$
If they can't:
^[a-zA-Z]+( [a-zA-Z]+)*$
The string must start with one or more letters, not a space.
The string can contain several words, but every word except the first must be preceded by a space.
Hope I helped.

Allow only some letters, ban special characters ($% etc.) except others (' -)

I need a Regex for PHP to do the following:
I want to allow [a-zα-ωá-źа-яա-ֆა-ჰא-ת] and chinese, japanese (more utf-8) letters;
I want to ban [^٩٨٧٦٥٤٣٢١٠۰۱۲۳۴۵۶۷۸۹] (arabic numbers);
This is what i've done:
function isValidFirstName($first_name) {
return preg_match("/^(?=[a-zα-ωá-źа-яա-ֆა-ჰא-ת]+([a-zα-ωá-źа-яա-ֆა-ჰא-ת' -]+)?\z)[a-zα-ωá-źа-яա-ֆა-ჰא-ת' -]+$/i", $first_name);
}
It looks like it works, but if I type letters from more than one language, it doesn't validate.
Examples: Авпа Вапапва á-ź John - doesn't validate.
John Gger - validates, á-ź á-ź - validates.
I would like all of these to validate.
Or, if there's a way, to echo a message when the user enters a string that mixes languages.

I can't reproduce the failure cases here (Авпа Вапапва á-ź John validates just fine), but you can simplify the regex a lot - you don't need that lookahead assertion:
preg_match('/^[a-zα-ωá-źа-яա-ֆა-ჰא-ת][a-zα-ωá-źа-яա-ֆა-ჰא-ת\' -]*$/i', $first_name)
As far as I can tell from the character ranges you've given, you don't need to exclude the digits because anything outside these character classes will already cause the regex to fail.
Another consideration: If your goal is to allow any letter from any language/script (plus some punctuation and space) you can (if you're using Unicode strings) further simplify this to:
preg_match('/^\pL[\pL\' -]*$/iu', $first_name)
But generally, I wouldn't try to validate a name by regular expressions (or any other means): Falsehoods programmers believe about names.
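A minimal sketch of the \pL variant follows, with the caveat from the answer still applying. The function name looksLikeFirstName is invented for illustration, and the input is assumed to be UTF-8, which the u modifier requires:

```php
<?php
// Hypothetical helper built on the \pL pattern above.
// The u modifier makes PCRE treat both pattern and subject as UTF-8.
function looksLikeFirstName(string $name): bool
{
    return preg_match('/^\pL[\pL\' -]*$/iu', $name) === 1;
}

var_dump(looksLikeFirstName("O'Brien"));                      // bool(true)
var_dump(looksLikeFirstName("\u{0410}\u{0432}\u{043F} John")); // bool(true): Cyrillic and Latin mixed
var_dump(looksLikeFirstName('3John'));                        // bool(false): must start with a letter
```

If the subject is not valid UTF-8, preg_match returns false rather than 0 or 1, which is why the sketch compares against 1 explicitly.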

You can detect characters from a given script with a Unicode script property. The example below checks for Hebrew; for Arabic, use \p{Arabic} instead:
if (preg_match('/(?:[\p{Hebrew}]+)/imu', $subject)) {
# Successful match
} else {
# Match attempt failed
}
RegEx explanation
(?:[\p{Hebrew}]+) with the modifiers i, m, and u
i: case-insensitive matching
m: ^ and $ match at line breaks
u: pattern and subject are treated as UTF-8
(?:...) is a non-capturing group that contains the rest of the pattern
[\p{Hebrew}]+ matches a character in the Unicode script Hebrew, between one and unlimited times, as many times as possible, giving back as needed (greedy)

REGEXP not catching some names correctly if certain values are at certain positions in the string

I have the following regex meant to test against valid name formats:
^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$
it seems to work fine with all the expected odd name possibilities, including the following:
o'Bannon
Smith, Jr.
Double-barreled
I'm having a problem when I plug this into my PHP code. If the first character is a number it passes through as valid.
If the last character is a space, comma, full-stop or other special allowed character, it's failing as invalid.
My PHP code is :
$v = 'Tested Value';
$value = (filter_var($v, FILTER_VALIDATE_REGEXP,array("options"=>array("regexp"=>"^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^"))));
if (strlen($value) <2 && strlen($v) !=0) {
return "not valid";
}
What am I doing wrong here?

^[a-zA-Z]+(([\'\,\.\-,\ ][a-zA-Z ])?[a-zA-Z]*)*$^
The carets (^) at the beginning and end of the regex are being interpreted as regex delimiters, not as anchors. The regex isn't really matching the digits at the beginning of the string; it's skipping over them so it can start matching at the first letter it finds. You can use almost any ASCII punctuation character as the regex delimiter, but most people use # or ~, which are relatively uncommon and have no special meaning in regexes.
As for not allowing punctuation at the end, that's how the regex is written. Specifically, [\'\,\.\- ][a-zA-Z ] requires that each apostrophe, comma, period or hyphen be followed by a letter or a space. If you really want to allow any of those characters at the end, it's pretty simple:
~^(?:[a-z]+[',. -]*)+$~i
Of course, that's not a particularly good regex for validating names, but I have nothing better to offer; it's a job for which regexes are particularly ill-suited. And do you really want to be the one to tell your users their own names are invalid?
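The delimiter effect is easy to reproduce in isolation. The sketch below uses a deliberately simplified pattern ([a-z]+ instead of the full name regex) and made-up variable names, so treat it as an illustration rather than a drop-in fix:

```php
<?php
// With ^ as the delimiter, the effective pattern is [a-z]+$ with no start anchor,
// so the engine can skip the digits and match the trailing letters.
$caretDelimited = '^[a-z]+$^';
// With ~ as the delimiter, ^ and $ inside the pattern act as anchors again.
$tildeDelimited = '~^[a-z]+$~';

var_dump(preg_match($caretDelimited, '123abc')); // int(1)
var_dump(preg_match($tildeDelimited, '123abc')); // int(0)

// The same idea applied to filter_var(), using the pattern suggested above.
$options = ['options' => ['regexp' => '~^(?:[a-z]+[\',. -]*)+$~i']];
var_dump(filter_var('Smith, Jr.', FILTER_VALIDATE_REGEXP, $options)); // the input string
var_dump(filter_var('1Smith', FILTER_VALIDATE_REGEXP, $options));     // bool(false)
```

filter_var() with FILTER_VALIDATE_REGEXP returns the original string on a match and false otherwise, so a strict comparison against false is the safest way to test the result.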

Your regex is way too complex.
/^[a-z]+[',. a-z-]*$/i
should do the same thing

php regex allow only english characters in string

I want to allow only English characters in a textbox, plus other special characters like
$!#{]{[
etc...
But I also want to check that the string contains at least 2 of these characters: (a-zA-Z0-9)
So I thought of this:
preg_match('/[^a-zA-Z0-9 -"?()[]#:/\'_+*%#!~`$><,.;{}|\]/',$string)
Is this a good approach?

No, your approach is not good.
Try this one. You need to add the remaining special characters you want to the character class. You need to escape the ]\-^ characters since they have special meanings in the class (depending on their position).
^(?=.*[A-Za-z0-9].*[A-Za-z0-9])[$!#{}[\]A-Za-z0-9]*$
See it here on Regexr
The first part is a positive lookahead that ensures at least two characters from [A-Za-z0-9] appear somewhere in the string.
Then comes the character class [$!#{}[\]A-Za-z0-9], where you can put the characters that you want to match.
The ^ at the beginning of my expression ensures that the match starts at the beginning of the string, and the $ at the end ensures that it runs to the end of the string.
The ^ at the beginning of your character class negates the complete class, which I guess is not what you want; if you want to match the character ^, put it somewhere else in the class. The - in the middle of your class defines a character range from the space to ", which covers the space, ! and "; that is probably not what you intended. Put the - at the beginning or the end of the class, or escape it.
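To see the lookahead and the character class working together, here is a small PHP sketch. The function name is hypothetical, the class only contains the handful of special characters from the answer, and the [ inside the class is escaped for readability even though that is optional:

```php
<?php
// Hypothetical helper: at least two alphanumerics, and only allowed characters overall.
function hasTwoAlnumAndAllowedChars(string $s): bool
{
    // (?=...) checks for two alphanumerics anywhere; the class then limits every character.
    $pattern = '/^(?=.*[A-Za-z0-9].*[A-Za-z0-9])[$!#{}\[\]A-Za-z0-9]*$/';
    return preg_match($pattern, $s) === 1;
}

var_dump(hasTwoAlnumAndAllowedChars('a$b')); // bool(true)
var_dump(hasTwoAlnumAndAllowedChars('$a'));  // bool(false): only one alphanumeric
var_dump(hasTwoAlnumAndAllowedChars('ab%')); // bool(false): % is not in the class
```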

(?:.*?[0-9a-zA-Z]){2,}[0-9a-zA-Z$!#{\]{\[]*

$pattern = '/^[a-zA-Z]{2,}$/'; // two or more English letters, nothing else
preg_match($pattern, 'kasdfhk890'); // returns 0: contains digits
preg_match($pattern, 'k'); // returns 0: single character
preg_match($pattern, 'kuyyee'); // returns 1: English letters only
If you want to learn more, try this link

Allow + in regex email validate email [duplicate]

This question already has answers here:
How to validate an email address in PHP
(15 answers)
Closed 2 years ago.
Regex is blowing my mind. How can I change this to validate emails with a plus sign, so I can sign up with test+spam@gmail.com?
if(!preg_match("/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i", $_GET['em'])) {

It seems like you aren't really familiar with what your regex is doing currently, which would be a good first step before modifying it. Let's walk through your regex using the email address john.robert.smith@mail.com (each section below notes the part of the address it matches):
1. ^ is the start-of-string anchor. It specifies that any match must begin at the beginning of the string. If the pattern is not anchored, the regex engine can match a substring, which is often undesired. Anchors are zero-width, meaning that they do not capture any characters.
2. [_a-z0-9-]+ is made up of two elements, a character class and a repetition modifier:
   [...] defines a character class, which tells the regex engine that any of these characters is a valid match. In this case the class contains the characters a-z, the numbers 0-9, and the dash and underscore (in general, a dash in a character class defines a range, so you can use a-z instead of abcdefghijklmnopqrstuvwxyz; when given as the last character in the class, it acts as a literal dash).
   + is a repetition modifier that specifies that the preceding token (in this case, the character class) can be repeated one or more times. There are two other repetition operators: * matches zero or more times; ? matches exactly zero or one times (i.e. makes something optional).
   (captures john in john.robert.smith@mail.com)
3. (\.[_a-z0-9-]+)* again contains a repeated character class. It also contains a group and an escaped character:
   (...) defines a group, which allows you to group multiple tokens together (in this case, the group will be repeated as a whole). Let's say we wanted to match 'abc', zero or more times (i.e. abcabcabc matches, abcccc doesn't). If we tried to use the pattern abc*, the repetition modifier would only apply to the c, because c is the last token before the modifier. To get around this, we can group abc ((abc)*), in which case the modifier applies to the entire group, as if it were a single token.
   \. specifies a literal dot character. This is needed because . is a special character in regex, meaning any character. Since we want to match an actual dot character, we need to escape it.
   (captures .robert.smith in john.robert.smith@mail.com)
4. @ is not a special character in regex, so, like all other non-special characters, it matches literally.
   (captures @ in john.robert.smith@mail.com)
5. [a-z0-9-]+ again defines a repeated character class, like item #2 above.
   (captures mail in john.robert.smith@mail.com)
6. (\.[a-z0-9-]+)* is almost exactly the same pattern as #3 above.
   (captures .com in john.robert.smith@mail.com)
7. $ is the end-of-string anchor. It works the same as ^ above, except that it matches the end of the string.
With that in mind, it should be a bit clearer how to add a section that captures a plus segment. As we saw above, + is a special character, so it has to be escaped. Then, since the + has to be followed by some characters, we can define a character class with the characters we want to match and define its repetition. Finally, we should make the whole group optional because email addresses don't need to have a + segment:
(\+[a-z0-9-]+)?
When inserted into your regex, it'd look like this:
/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i
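As a quick check of the extended pattern, here is a sketch with a made-up function name. It assumes the usual @ separator in the address and makes no claim to cover every valid email address:

```php
<?php
// Hypothetical helper wrapping the extended pattern with the optional plus segment.
function acceptsPlusAddress(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(acceptsPlusAddress('test+spam@gmail.com'));        // bool(true)
var_dump(acceptsPlusAddress('john.robert.smith@mail.com')); // bool(true)
var_dump(acceptsPlusAddress('test+@gmail.com'));            // bool(false): + needs characters after it
```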

Save your sanity. Get a pre-made PHP RFC 822 Email address parser

I've used this regex to validate emails, and it works just fine with emails that contain a +:
/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

\+ will match a literal + sign, but be aware: You still won't be close to matching all possible email addresses according to the RFC spec, because the actual regex for that is madness. It's almost certainly not worth it; you should use a real email parser for this.
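One built-in alternative to a hand-written pattern is PHP's filter extension. The sketch below uses filter_var() with FILTER_VALIDATE_EMAIL; the wrapper name isPlausibleEmail is invented here, and the filter is a pragmatic check rather than a full RFC parser:

```php
<?php
// Hypothetical wrapper around PHP's built-in email filter.
function isPlausibleEmail(string $email): bool
{
    // filter_var() returns the address on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isPlausibleEmail('test+spam@gmail.com')); // bool(true)
var_dump(isPlausibleEmail('not an email'));        // bool(false)
```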

This is another solution (similar to the one found by David):
//Escaped for .Net
^[_a-zA-Z0-9-]+((\\.[_a-zA-Z0-9-]+)*|(\\+[_a-zA-Z0-9-]+)*)*@[a-zA-Z0-9-]+(\\.[a-zA-Z0-9-]+)*(\\.[a-zA-Z]{2,4})$
//Native
^[_a-zA-Z0-9-]+((\.[_a-zA-Z0-9-]+)*|(\+[_a-zA-Z0-9-]+)*)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,4})$

Here is another solution:
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?@[a-z0-9-.]+(\.[a-z0-9]+)$/
or, for a Razor page (@ = \u0040):
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?\u0040[a-z0-9-.]+(\.[a-z0-9]+)$/
