Regex to match numbers, # # % signs - php

I am trying to write a regex that matches all numbers (0-9) and # # % signs.
I have tried ^[0-9#%#]$ , it doesn't work.
I want it to match, for example: 1234345, 2323, 1, 3#, %#, 9, 23743, #####, or whatever...
There must be something missing?
Thank you

You're almost right. All you're missing is a quantifier that tells the regular expression those characters may appear more than once, such as * (0 or more) or + (1 or more).
^[0-9#%#]+$
The ^ and $ are used to indicate the start and end of a string, respectively. Make sure that your string contains only those characters; otherwise, it won't match (e.g. "The number is 89#1" wouldn't match because the string begins with something other than 0-9, #, %, or #).

Your pattern ^[0-9#%#]$ only matches strings that are one character long. The [] construct matches a single character, and the ^ and $ anchors mean that nothing can come before or after the character matched by the [].
If you just want to know if the string has one of those characters in it, then [0-9#%#] will do that. If you want to match a string that consists only of those characters and has at least one character in it, then use ^[0-9#%#]+$. The "+" means to match one or more of the preceding item. If you also want to match empty strings, then use ^[0-9#%#]*$. The "*" means to match zero or more of the preceding item.
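A minimal sketch of the difference between these variants, written in JavaScript for illustration (the same patterns can be passed to PHP's preg_match once delimiters are added). The sample strings are taken from the question; the variable names are made up for this example.

```javascript
// One character only: the class is not repeated.
const single = /^[0-9#%#]$/;
// One or more allowed characters, and nothing else.
const oneOrMore = /^[0-9#%#]+$/;
// Unanchored: true if an allowed character appears anywhere.
const anywhere = /[0-9#%#]/;

const samples = ["1234345", "3#", "%#", "9", "The number is 89#1", ""];

const results = samples.map((s) => ({
  input: s,
  single: single.test(s),
  oneOrMore: oneOrMore.test(s),
  anywhere: anywhere.test(s),
}));

console.table(results);
```

Only "9" passes the original single-character pattern, while "The number is 89#1" passes only the unanchored one.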

It should be /^[0-9#%#]+$/. The + is a quantifier that means "one or more of the preceding".
The problem with your current regex is that it will only match one character that could either be a number or #, %, or #. This is because the ^ and $ characters match the beginning and the end of the line respectively. By adding the + quantifier, you are saying that you want to match one or more of the preceding character class, and that the entire line consists of one or more of the characters in the specified character class.

remove the caret (^), it is used to match from the start of the string.

You forgot "+"
^[0-9#%#]+$ must work

Regular expression alphanumeric with dash and underscore and space, but not at the beginning or at the end of the string [duplicate]

I want to design an expression for not allowing whitespace at the beginning and at the end of a string, but allowing in the middle of the string.
The regex I've tried is this:
\^[^\s][a-z\sA-Z\s0-9\s-()][^\s$]\
This should work:
^[^\s]+(\s+[^\s]+)*$
If you want to include character restrictions:
^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$
Explanation:
The leading ^ and trailing $ anchor the pattern to the start and end of the string.
Considering the first regex I gave, [^\s]+ means at least one non-whitespace character and \s+ means at least one whitespace character. Note also that the parentheses () group the second and third fragments together, and the * at the end means zero or more of this group.
So, if you take a look, the expression is: begins with at least one non-whitespace character and ends with any number of groups of at least one whitespace character followed by at least one non-whitespace character.
For example, if the input is 'A' then it matches, because it satisfies the "begins with at least one non-whitespace" condition. The input 'AA' matches for the same reason. The input 'A A' also matches: the first A satisfies the "at least one non-whitespace" condition, then ' A' is one group of at least one whitespace followed by at least one non-whitespace.
' A' does not match because the "begins with at least one non-whitespace" condition is not satisfied. 'A ' does not match because the trailing space is not followed by a non-whitespace character, so it cannot form a complete group.
If you want to restrict which characters to accept at the beginning and end, see the second regex. I have allowed a-z, A-Z, 0-9, - and () at the beginning and end. Only these are allowed.
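The walkthrough above can be checked with a short JavaScript sketch. It uses the first pattern unchanged and the same five inputs; the names noEdgeSpace and outcome are illustrative only.

```javascript
// At least one non-whitespace run, then zero or more
// "whitespace run + non-whitespace run" groups.
const noEdgeSpace = /^[^\s]+(\s+[^\s]+)*$/;

const inputs = ["A", "AA", "A A", " A", "A "];

// Map each input (shown with quotes so spaces are visible) to its result.
const outcome = Object.fromEntries(
  inputs.map((s) => [JSON.stringify(s), noEdgeSpace.test(s)])
);

console.log(outcome);
```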
Regex playground: http://www.regexr.com/
This RegEx will allow neither white-space at the beginning nor at the end of your string/word.
^[^\s].+[^\s]$
Any string of at least three characters that doesn't begin or end with white-space (and contains no line break) will be matched.
Explanation:
^ denotes the beginning of the string.
\s denotes white-spaces and so [^\s] denotes NOT white-space. You could alternatively use \S to denote the same.
. denotes any character except a line break.
+ is a quantifier which denotes one or more times. That means the item which + follows can be repeated one or more times.
In cases when you have a specific pattern, say, ^[a-zA-Z0-9\s()-]+$, that you want to adjust so that spaces at the start and end are not allowed, you may use lookaheads anchored at the pattern start:
^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$
^^^^^^^^^^^^^^^^^^^^
Here,
(?!\s) - a negative lookahead that fails the match if there is a whitespace char immediately at the start of the string (since the lookahead comes right after ^)
(?![\s\S]*\s$) - a negative lookahead that fails the match if, starting from the beginning of the string (the preceding lookahead consumes nothing, so this one also runs right after ^), there are any 0+ chars, as many as possible ([\s\S]*, equal to [^]* in JavaScript), followed by a whitespace char at the end of the string ($).
In JS, you may use the following equivalent regex declarations:
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = /^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = new RegExp("^(?!\\s)(?![^]*\\s$)[a-zA-Z0-9\\s()-]+$")
var regex = new RegExp(String.raw`^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$`)
If you know there are no linebreaks, [\s\S] and [^] may be replaced with .:
var regex = /^(?!\s)(?!.*\s$)[a-zA-Z0-9\s()-]+$/
See the regex demo.
JS demo:
var strs = ['a b c', ' a b b', 'a b c '];
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/;
for (var i = 0; i < strs.length; i++) {
    console.log('"', strs[i], '"=>', regex.test(strs[i]));
}
If the string must be at least 1 character long, if newlines are allowed in the middle together with any other characters, and the first and last characters can really be anything except whitespace (including ##$!...), then you are looking for:
^\S$|^\S[\s\S]*\S$
explanation and unit tests: https://regex101.com/r/uT8zU0
This worked for me:
^[^\s].+[a-zA-Z]+[a-zA-Z]+$
Hope it helps.
How about:
^\S.+\S$
This will match any string that doesn't begin or end with any kind of space.
^[^\s].+[^\s]$
That's it! It allows any string that contains any character (apart from \n) without whitespace at the beginning or end. In case you want \n in the middle, either use the s option (so that . also matches line breaks) or replace .+ with [\s\S]+
pattern="^[^\s]+[-a-zA-Z\s]+([-a-zA-Z]+)*$"
This is meant to accept only letters and hyphens and to disallow whitespace at the start.
This is the regex for no whitespace at the beginning nor at the end, with only one space in between. It also works without a 3 character limit:
\^([^\s]*[A-Za-z0-9]\s{0,1})[^\s]*$\ - just remove {0,1} and add * in order to allow any number of spaces in between.
As a modification of @Aprillion's answer, I prefer:
^\S$|^\S[ \S]*\S$
It will not match a space at the beginning, end, or both.
It matches any number of spaces between a non-whitespace character at the beginning and end of a string.
It also matches only a single non-whitespace character (unlike many of the answers here).
It will not match any newline (\n), \r, \t, \f, nor \v in the string (unlike Aprillion's answer). I realize this isn't explicit to the question, but it's a useful distinction.
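To make the differences between these answers concrete, here is a small JavaScript comparison. It is a sketch using three patterns quoted in this thread; the variable names and the test inputs are chosen here for illustration.

```javascript
const threePlus = /^\S.+\S$/;            // needs at least three characters
const anyInner = /^\S$|^\S[\s\S]*\S$/;   // Aprillion's pattern
const spaceInner = /^\S$|^\S[ \S]*\S$/;  // the modification above

const cases = ["a", "ab", "a b", " a", "a ", "a\tb", "a\nb"];

const comparison = cases.map((s) => ({
  input: JSON.stringify(s), // quoted so whitespace is visible
  threePlus: threePlus.test(s),
  anyInner: anyInner.test(s),
  spaceInner: spaceInner.test(s),
}));

console.table(comparison);
```

The one- and two-character inputs are rejected only by the first pattern, and the tab and newline inputs are rejected by the last one.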
Letters and numbers divided only by one space. Also, no spaces allowed at beginning and end.
/^[a-z0-9]+( [a-z0-9]+)*$/gi
I found a reliable way to do this is just to specify what you do want to allow for the first character and check the other characters as normal e.g. in JavaScript:
RegExp("^[a-zA-Z][a-zA-Z- ]*$")
So that expression accepts only a single letter at the start, and then any number of letters, hyphens or spaces thereafter.
Use /^[^\s].([A-Za-z]+\s)*[A-Za-z]+$/. It only accepts one space between words and no space at the beginning or end.
If we do not have to define a specific class of valid characters (we are going to accept characters from any language) and we just want to prevent spaces at the start and end, the simplest option can be this pattern:
/^(?! ).*[^ ]$/
Try on HTML Input:
input:invalid {box-shadow:0 0 0 4px red}
/* Note: ^ and $ removed from pattern. Because HTML Input already use the pattern from First to End by itself. */
<input pattern="(?! ).*[^ ]">
Explanation
^ Start of
(?!...) (Negative lookahead) Not equal to ... > for next set
Just space / \s (space, tabs and line-break chars)
(?! ) Do not accept a space at the start of the next set (.*)
. Any character (except \n\r line breaks)
* Zero or more (length of the set)
[^ ] Set/class of any character except space
$ End of
Try it live: https://regexr.com/6e1o4
^[^0-9 ]{1}([a-zA-Z]+\s{1})+[a-zA-Z]+$
- for no more than one whitespace character in between, and no spaces at the start or end.
^[^0-9 ]{1}([a-zA-Z ])+[a-zA-Z]+$
- for more than one whitespace character in between, and no spaces at the start or end.
Other answers introduce a limit on the length of the match. This can be avoided using Negative lookaheads and lookbehinds:
^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$
This starts by checking that the first character is not whitespace ^(?!\s). It then captures the characters you want a-zA-Z0-9\s non greedily (*?), and ends by checking that the character before $ (end of string/line) is not \s.
Check that lookaheads/lookbehinds are supported in your platform/browser.
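As a rough check of the lookahead/lookbehind version, the following JavaScript sketch runs it against a few inputs chosen here for illustration. It assumes a runtime with lookbehind support (for example a recent Node.js). Note that because the group is quantified with *?, the empty string is accepted as well.

```javascript
// No whitespace right after the start, none right before the end.
const edgeChecked = /^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$/;

const probes = ["a", "a b", " a", "a ", "a_b", ""];

const verdicts = probes.map((s) => [JSON.stringify(s), edgeChecked.test(s)]);

for (const [shown, ok] of verdicts) {
  console.log(shown, "=>", ok);
}
```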
Here you go,
\b^[^\s][a-zA-Z0-9]*\s+[a-zA-Z0-9]*\b
\b refers to word boundary
\s+ means allowing white-space one or more at the middle.
(^(\s)+|(\s)+$)
This expression will match the leading and trailing whitespace of the text.

Explain the Regular Expression /^[a-zA-Z ]*/

I understand that the regex pattern must match a string which starts with the combination and the repetition of the following characters:
a-z
A-Z
a white-space character
And there is no limitation to how the string may end!
First Case
So a string such as uoiui897868 (any string that only starts with space, a-z or A-Z) matches the pattern... (Sure it does)
Second Case
But the problem is a string like 76868678jugghjiuh (any string that only starts with a character other than space, a-z or A-Z) matches too! This should not happen!
I have checked using the php function preg_match() too , which returns true (i.e. the pattern matches the string).
Also have used other online tools like regex101 or regexr.com. The string does match the pattern.
Could anybody help me understand why the pattern matches the string described in the second case?
/^[a-zA-Z ]*/
Your regex will match strings that "begin with" any number (including zero) of letters or spaces.
^ means "start of string" and * means "zero or more".
Both uoiui897868 and 76868678jugghjiuh start with 0 or more letters/spaces, so they both match.
You probably want:
/^[a-zA-Z ]+/
The + means "one or more", so it won't match zero characters.
Your regex is completely useless: it will trivially match any string (empty, non-empty, with numbers, without,...), regardless of its structure.
This because
With ^, you anchor the match at the beginning of the string, but every string has a beginning.
You use a character class [A-Za-z ], but with a * operator, so 0 or more repetitions. Thus even if the string does not begin with a character from [A-Za-z ], the matcher will simply report a match of zero characters and ignore the remainder of the string.
You need to use + instead of * to enforce "at least one character".
The '*' quantifier on the end means zero or more matches of the character, so all strings will match. Perhaps you want to drop the wildcard quantifier, or change it to a '+' quantifier, and add a '$' on the end to test the whole string.
What you really want is to match one or more of the preceding characters.
For that you use +
/^[a-zA-Z ]+/
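The zero-length match is easy to see by printing what each pattern actually matched. This JavaScript sketch uses the two strings from the question; the helper matchedText and the third pattern with a trailing $ (suggested in one of the answers) are illustrative.

```javascript
const star = /^[a-zA-Z ]*/;   // the pattern from the question
const plus = /^[a-zA-Z ]+/;   // at least one leading letter/space
const whole = /^[a-zA-Z ]+$/; // the entire string must be letters/spaces

// Return the matched text, or null when the pattern does not match.
function matchedText(re, s) {
  const m = re.exec(s);
  return m === null ? null : m[0];
}

for (const s of ["uoiui897868", "76868678jugghjiuh"]) {
  console.log(JSON.stringify(s), {
    star: matchedText(star, s),
    plus: matchedText(plus, s),
    whole: matchedText(whole, s),
  });
}
```

For "76868678jugghjiuh" the * version succeeds with an empty match, which is why preg_match() reports a match in the second case.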

Difference between regular expressions

I'm trying to work out what the differences are between these two:
preg_match('-^[^'.$inv.']+\.?$-', $name)
preg_match('-['.$inv.']-', $name)
Thanks
To make it easier to exemplify, assume $inv = 'a'…
-^[^a]+\.?$- needs to match the whole string, because of the caret and the dollar signs. The string is expected to start with a character other than "a", followed by 0 or more characters that are still not "a"s. The last character in this string, however, can be a dot (hence the question mark after the dot)
-[a]- will match the first "a" in the string and it will stop looking as soon as it finds a match because you're using preg_match() and not preg_match_all().
Your first pattern does not make any sense, though, since already \. = [^a] (translated into English as: a dot is already not an "a")
[EDIT] The first pattern can actually mean something when there's a dot in the character class.
First off, be careful with $inv: depending on its content, it could be possible to inject something into the regular expression. To avoid that issue, use preg_quote().
That said, the first regex will be :
^ <-- the given string must begin with
[ <-- one of those characters
^ <-- invert the accepted characters (instead of accepted characters, the following characters will be those that are not accepted)
$inv <-- characters
] <-- end of the list of characters (here not accepted characters)
+ <-- at least one character must be matched, more are accepted
\. <-- a '.'
? <-- the previous '.' isn't mandatory
$ <-- the given string must end here
If $inv = 'abc.' it will match:
def
def.
d
d.
It won't match:
., because the . isn't accepted by the [^abc.] group, even though there is \.? later, at least one character must be before a .
de.s, because the . isn't accepted in the [^abc.] group, it is only possible to have it at the end of the given string thanks to \.?
a
deb
testc
teskopkl;;[!##$b., because of the b
an empty string, at least one character must be matched with '[^'.$inv.']+'
If $inv does not contain a '.', it could be simplified into '^[^'.$inv.']+$', because the negated class then already accepts a dot (don't forget the preg_quote though)
The second one will be:
[ <-- one of those characters
$inv <-- characters
] <-- end of the list of characters (here accepted characters)
If $inv = 'abc.' it will match
any string containing at least one of the letters a, b, c or .
It won't match any string which doesn't contain a, b, c or ..
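The two patterns can be tried side by side with $inv = 'abc.'. The sketch below is JavaScript rather than PHP, so quote() is a hypothetical stand-in for preg_quote() (it also escapes "-", which matters inside a character class), and inv, first and second are names chosen for this example.

```javascript
// Hypothetical helper standing in for PHP's preg_quote().
function quote(text) {
  return text.replace(/[.*+?^${}()|[\]\\-]/g, "\\$&");
}

const inv = "abc.";

// First pattern: whole string made of characters NOT in inv,
// optionally followed by one trailing dot.
const first = new RegExp("^[^" + quote(inv) + "]+\\.?$");
// Second pattern: at least one character from inv anywhere.
const second = new RegExp("[" + quote(inv) + "]");

for (const s of ["def", "def.", "d.", ".", "de.s", "deb", ""]) {
  console.log(JSON.stringify(s), {
    first: first.test(s),
    second: second.test(s),
  });
}
```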
In plain English, the first one is looking for an entire line which begins with one or more characters not included with the $inv string, and ending with an optional period.
The second one simply tries to match one character as specified by the value for $inv.
The first pattern matches a line containing none of the characters in $inv, optionally ending the line with a period.
The second pattern matches anything containing any of the characters in $inv.
- is the pattern delimiter, marking the beginning and end of the expression. It can technically be any character, but is most often /.
^ denotes the beginning of the string
[ ] encapsulates a set of characters to be matched
[^ ] encapsulates a set of characters that should not be matched, any other character is considered to be a match.
+ denotes that the previous character or set of characters should be matched one or more times.
. normally matches any character, which is why it is escaped as \. here to indicate a literal period character.
? denotes that the previous character should be matched zero or one time.
$ denotes the end of a string.
['.$inv.']
Let's go with the second one to begin with, since it's the simpler one.
This simply matches a string containing any single one of the characters contained within the string in the variable $inv.
It could contain anything else before or after that character from $inv.
^[^'.$inv.']+\.?$
Now the first one:
This matches a string that contains anything except the characters in $inv (the ^ inside the [] is a negative match).
The match that isn't part of $inv must be at the start of the string (the ^ outside the [] matches the start of the string).
The string can contain as many matching characters as it likes (one or more; that's the + sign after the [])
After that, it may optionally have a dot (the \.? is an optional dot character).
And nothing else after that (the $ matches the end of the string).
Note that in both cases, if $inv contains any regex reserved characters, it will fail (or do something unexpected). You should use preg_quote() to avoid this.
So... uh, they're completely different expressions. Not so much "what's the difference between them" as "what's the same about them". Answer: not much.
The first matches a string from start up to the first occurrence of $inv followed by one or zero periods where the string must end.
The second matches a string only containing $inv.
Essentially they are almost the same, except the first allows for a possible . at the end.

check the value entered by the user with regular expression in php

In my PHP program, I want the user to enter nothing but alphabetic characters,
like "dgdh" or "sgfdgdfg", and not numbers or anything else like "7657", "gfd(-" or "fd54".
I tested this function but it doesn't cover all cases:
preg_match("#[^0-9]#",$_POST['chaine'])
How can I achieve that? Thank you in advance.
The simplest can be
preg_match('/^[a-z]+$/i', $_POST['chaine'])
The i modifier makes the match case-insensitive. The + requires at least one letter to be entered. You can change it to * if you want to allow an empty string. The anchors ^ and $ enforce that the whole string consists of nothing but letters (they represent the beginning and the end of the string, respectively).
If you want to allow whitespace, you can use:
Whitespace only at the beginning or end of string:
preg_match('/^\s*[a-z]+\s*$/i', $_POST['chaine'])
Any where:
preg_match('/^[a-z][\sa-z]*$/i', $_POST['chaine']) // at least one letter
Only the space character is allowed but not other whitespace:
preg_match('/^[a-z][ a-z]*$/i', $_POST['chaine'])
Two things. Firstly, you match non-digit characters. That is obviously not the same as letter characters. So you could simply use [a-zA-Z] or [a-z] and the case-insensitive modifier instead.
Secondly you only try to find one of those characters. You don't assert that the whole string is composed of these. So use this instead:
preg_match("#^[a-z]*$#i",$_POST['chaine'])
Only match letters (no whitespace):
preg_match("#^[a-zA-Z]+$#",$_POST['chaine'])
Explanation:
^ # matches the start of the line
[a-zA-Z] # matches any letter (upper or lowercase)
+ # means the previous pattern must match at least once
$ # matches the end of the line
With whitespace:
preg_match("#^[a-zA-Z ]+$#",$_POST['chaine'])
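A quick way to see why the original attempt lets bad input through is to run both patterns over the examples from the question. This is a JavaScript sketch of the same regexes; the value parameter stands in for $_POST['chaine'], and the function and variable names are hypothetical.

```javascript
const nonDigitAnywhere = /[^0-9]/; // the attempt from the question
const lettersOnly = /^[a-z]+$/i;   // anchored, letters only

// `value` plays the role of $_POST['chaine'].
function isLettersOnly(value) {
  return lettersOnly.test(value);
}

const entries = ["dgdh", "sgfdgdfg", "7657", "gfd(-", "fd54"];

const report = entries.map((value) => ({
  value,
  originalAttempt: nonDigitAnywhere.test(value),
  lettersOnly: isLettersOnly(value),
}));

console.table(report);
```

The original attempt accepts "gfd(-" and "fd54" because each contains at least one non-digit, while the anchored pattern rejects them.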

How can I match occurrences of string not in another string using regular expressions?

I'm trying to match all occurrences of "string" in something like the following sequence except those inside @@
as87dio u8u u7o @string@ ou os8 string os u
i.e. the second occurrence should be matched but not the first
Can anyone give me a solution?
You can use negative lookahead and lookbehind:
(?<!@)string(?!@)
EDIT
NOTE: As per Mark's comments below, this would not match @string or string@.
You can try:
(?:[^@])string(?:[^@])
OK,
If you want to NOT match a character you put it in a character class (square brackets) and start it with the ^ character which negates it, for example [^a] means any character but a lowercase 'a'.
So if you want NOT at-sign, followed by string, followed by another NOT at-sign, you want
[^@]string[^@]
Now, the problem is that the character classes will each match a character, so in your example we'd get " string " which includes the leading and trailing whitespace. There's another construct that groups without capturing, and that is parens with a ?: at the beginning, (?: ). You can surround the ends with that, but note that the neighbouring characters are still consumed as part of the overall match, so wrap string in a capturing group if you need it on its own.
(?:[^@])string(?:[^@])
OK, but now it doesn't match at the start of string (which, confusingly, is the ^ character doing double-duty outside a character class) or at the end of string $. So we have to use the OR character | to say "give me a non-at-sign OR start of string" and at the end "give me a non-at-sign OR end of string" like this:
(?:[^@]|^)string(?:[^@]|$)
EDIT: The negative lookbehind and lookahead is a simpler (and clever) solution, but it is not available in all regular expression engines.
Now a follow-up question. If you had the word "astringent" would you still want to match the "string" inside? In other words, does "string" have to be a word by itself? (Despite my initial reaction, this can get pretty complicated :) )
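The practical difference between the lookaround and the grouped approach shows up in what text each one returns. The JavaScript sketch below runs both on the sample line from the question, using @ as the delimiter that the answers call the at-sign. It assumes an engine with lookbehind support, and the variable names are illustrative.

```javascript
const text = "as87dio u8u u7o @string@ ou os8 string os u";

// Lookarounds assert on the neighbours without consuming them.
const lookaround = /(?<!@)string(?!@)/g;
// Non-capturing groups still consume the neighbouring characters.
const grouped = /(?:[^@]|^)string(?:[^@]|$)/g;

const viaLookaround = text.match(lookaround);
const viaGroups = text.match(grouped);

console.log(viaLookaround); // the bare word only
console.log(viaGroups);     // the word plus one character on each side
```

Both skip the occurrence between the at-signs, but the grouped version's match includes the surrounding spaces.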
