I'm trying to create a pattern in PHP that matches 2 or more upper case characters in a string.
I've tried the following, but it only matches 2 or more upper case characters in a row, not the entire string:
preg_match('/[A-Z]{2,}/', $string);
For example, the string "aBcDe" or "Red Apple" should return true.
You just have to allow other characters between your uppercase letters:
^(?:.*?\p{Lu}){2}
I used \p{Lu} here to include Unicode characters as well. If you don't want that just use [A-Z] instead like you did in your pattern.
This simply means:
^ from the start of the string
(?: group:
.*? match anything, but as few chars as possible
\p{Lu} match an uppercase letter
){2} ... two times
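As a rough sketch of how the pattern above could be wrapped in PHP (the function name hasTwoUppercase is made up for illustration, and the u modifier is added on the assumption that the input is UTF-8):

```php
<?php
// Hypothetical helper: true when $s contains at least two uppercase letters anywhere.
function hasTwoUppercase(string $s): bool
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \p{Lu} also recognises multi-byte uppercase letters.
    return preg_match('/^(?:.*?\p{Lu}){2}/u', $s) === 1;
}

var_dump(hasTwoUppercase('aBcDe'));     // bool(true)
var_dump(hasTwoUppercase('Red Apple')); // bool(true)
var_dump(hasTwoUppercase('Red apple')); // bool(false)
```

Because . does not match a newline by default, you would probably want to add the s modifier if the string can span several lines.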
If all you need to do is identify that a string contains at least 2 uppercase characters then you can use the following:
[A-Z].*?[A-Z]
If you need to identify the specific uppercase characters in the string then things get more complicated.
UPDATE: As Lucas mentioned, you need a different regex if you want unicode support.
\p{Lu}.*?\p{Lu}
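If you do need the individual uppercase characters, one possible approach (not necessarily what the answer had in mind) is to collect them with preg_match_all and count the result. The helper name uppercaseLetters is hypothetical, and the u modifier assumes UTF-8 input:

```php
<?php
// Hypothetical helper: returns every uppercase letter found in $s, in order.
function uppercaseLetters(string $s): array
{
    // Each match is a single uppercase letter; $matches[0] holds all of them.
    preg_match_all('/\p{Lu}/u', $s, $matches);
    return $matches[0];
}

$found = uppercaseLetters('Red Apple');
print_r($found);              // contains "R" and "A"
var_dump(count($found) >= 2); // bool(true)
```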
^.*[A-Z].*[A-Z].*$
A simple pattern stating the same would do. See demo.
https://regex101.com/r/pT4tM5/23
[A-Z].*[A-Z]
is about as simple as it gets - match an uppercase followed by anything repeated any number of times followed by any other uppercase letter.
If you need to match the whole line/string that has at least 2 upper case letters, you can also use
^(?=(?:.*[A-Z]){2}).+$
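A small sketch of the lookahead pattern applied line by line. The function name linesWithTwoUppercase is invented for this example, and the m modifier is my addition so that ^ and $ work per line:

```php
<?php
// Hypothetical helper: returns the lines of $text that contain at least two of A-Z.
function linesWithTwoUppercase(string $text): array
{
    // m lets ^ and $ match at every line break; the lookahead checks the
    // "two uppercase letters" condition before .+ consumes the whole line.
    preg_match_all('/^(?=(?:.*[A-Z]){2}).+$/m', $text, $matches);
    return $matches[0];
}

print_r(linesWithTwoUppercase("Red Apple\nred apple\naBcDe"));
// the result holds "Red Apple" and "aBcDe"
```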
Hi guys, I have the following regex:
/([A-Z][\w-]*(\s+[A-Z][\w-]*)+)/
I've tried different approaches, but I'm not a pro with regex. This is what I want to do:
Add a rule that matches only words of 3+ characters.
Add a rule that can match a name like "Institute of Technology" (three words, with a lowercase word between the first and the last).
Can you help me do that? (I should use separate regexes, am I right?)
To help you understand, this is what you have:
[A-Z]: one character in the class A-Z
[\w-]*: a concatenation of zero or more word characters or hyphens
(...)+: one or more of:
\s+: at least one whitespace character
[A-Z]: one character in the class A-Z
[\w-]*: a concatenation of zero or more word characters or hyphens
This is what you want:
[A-Z]: a capital letter
[\w-]*: a concatenation of zero or more word characters or hyphens
\s+: at least one whitespace character
[a-z]: a lower-case letter
[\w-]*: a concatenation of zero or more word characters or hyphens
\s+: at least one whitespace character
[A-Z]: a capital letter
[\w-]*: a concatenation of zero or more word characters or hyphens
That is:
[A-Z][\w-]*\s+[a-z][\w-]*\s+[A-Z][\w-]*
You may want to make some small changes. I think you can do them on your own.
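For illustration, here is how that pattern might be used to pull the name out of a sentence. The sample sentence and the function name findThreeWordName are made up:

```php
<?php
// Hypothetical helper: returns the first "Capital lowercase Capital" phrase, or null.
function findThreeWordName(string $text): ?string
{
    $pattern = '/[A-Z][\w-]*\s+[a-z][\w-]*\s+[A-Z][\w-]*/';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

var_dump(findThreeWordName('She studied at the Institute of Technology in Boston.'));
// string(23) "Institute of Technology"
```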
A rule that matches only 3+ characters word is \w{3,}. If you want to capitalize the first character use [A-Z]\w{2,}.
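A quick sketch of the capitalized variant, with word boundaries (\b) added on my side so that only whole words are returned. The input string and the name capitalizedWords are invented:

```php
<?php
// Hypothetical helper: capitalized words that are at least 3 characters long.
function capitalizedWords(string $text): array
{
    // [A-Z] is the first character, \w{2,} requires at least two more.
    preg_match_all('/\b[A-Z]\w{2,}\b/', $text, $matches);
    return $matches[0];
}

print_r(capitalizedWords('An MIT and Institute of Technology Ab'));
// the result holds "MIT", "Institute" and "Technology"
```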
(\w\w\w+)|(\w+ [a-z]+ \w+) - This pattern matches either a word of at least 3 word characters, or three words separated by single spaces where the middle word is all lowercase. You can replace \w with [A-Z] where a capital letter is required.
If your 3 word phrase has to have 2 words with capital letters, change the second brackets to ([A-Z]\w* [a-z]+ [A-Z]\w*). Try it here: https://regex101.com/r/E3IPTj/1
I'm not sure about the scope of your requirements, but a few 'building blocks' might help. I'd also suggest starting from the basics. I don't know any recent websites that teach regex well, but when I started I used http://www.regular-expressions.info/tutorial.html (it's been many years, and the website does show its age).
However onto your regex:
Following your example: Institute of Technology
You need to know just a few things: character sets (and how to control match length) and the space.
A character set matches exactly one character by default; for example, [abc] matches a, b, or c. Sets also support ranges (a-z) and shorthand classes (e.g. \d for all digits).
The match length can be changed by using the:
+ - one or more (examples: a+, [abc]+, \d+)
* - zero or more (examples: a*, [abc]*)
And this one you might want, but that's up to you:
{min,max} - specific range, e.g. b{3,5} will match 3-5 consecutive 'b' characters (bbb, bbbb, bbbbb); max can be omitted, as in {min,}, to require at least min characters with no upper limit
Spaces are matched with a literal " " (a space); \s matches any whitespace character (equal to [\r\n\t\f\v ]: spaces, tabs, newlines, ...)
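To make these building blocks concrete, here are a few throwaway checks. The ^ and $ anchors are added by me so that each pattern has to cover the whole test string:

```php
<?php
// preg_match returns 1 on a match and 0 otherwise.
var_dump(preg_match('/^[abc]$/', 'b'));      // int(1): a set matches exactly one character
var_dump(preg_match('/^[abc]+$/', 'abcab')); // int(1): + allows one or more
var_dump(preg_match('/^\d*$/', ''));         // int(1): * also accepts zero characters
var_dump(preg_match('/^b{3,5}$/', 'bbbb'));  // int(1): between 3 and 5 b's
var_dump(preg_match('/^b{3,5}$/', 'bb'));    // int(0): too few b's
var_dump(preg_match('/^a\sb$/', "a\tb"));    // int(1): \s also matches a tab
var_dump(preg_match('/^a b$/', "a\tb"));     // int(0): a literal space does not
```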
In your example it's a matter of whether the match should be case sensitive. If not, we can use a simple [A-Za-z]+ to match one or more upper- or lowercase letters; together with the space we get something along the lines of
/[A-Za-z]+ [A-Za-z]+ [A-Za-z]+/
It's that simple. For case-insensitive matching there is also an option flag, i, which results in
/[a-z]+ [a-z]+ [a-z]+/i
If you do want to have case sensitive matching you will need to separate them how you like:
/[A-Z][a-z]* [a-z]+ [A-Z][a-z]*/ // (*A a A*)
As a small change I've also changed + into * after the capital letters, so the lowercase part is optional; again, up to you.
Also note that to match the beginning of a string you're required to use ^, and to match the end of a string, $. The above examples will match any segment, not the whole input; e.g. qhg8Institute of Technology8tghagus would match.
So final result:
/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/ // case sensitive (Aa a Aa)
/^[a-z]+ [a-z]+ [a-z]+$/i // case insensitive
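A short check of the difference the anchors make, using the sample input from above (variable names are arbitrary):

```php
<?php
$loose  = '/[A-Z][a-z]* [a-z]+ [A-Z][a-z]*/';   // matches any segment
$strict = '/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/'; // must cover the whole input

$noisy = 'qhg8Institute of Technology8tghagus';
$clean = 'Institute of Technology';

var_dump(preg_match($loose, $noisy));  // int(1): finds the phrase inside the noise
var_dump(preg_match($strict, $noisy)); // int(0): the input does not start with a capital
var_dump(preg_match($strict, $clean)); // int(1)
```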
Obviously there is a lot more to learn that can be used to expand or optimize this, but regexes are so customizable that it's really up to the person needing them to specify their limitations and requirements.
As a side note, I noticed people using \w for word chars, but this also includes digits, _, and (depending on the locale or the u modifier) special language letters like à, ü, etc. Again, up to you what to do with this.
I need to insert a - before each camelCase capital letter in a string.
What I have are strings like these:
Albert-Weisgerber-Allee 35
Bruninieku iela 50-10
Those strings go through this regex to separate the number from the street:
$data = preg_replace("/[^ \w]+/", '', $data);
$pcre = '/\A\s*(.*?)\s*\x2f?(\pN+\s*[a-zA-Z]?(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)*)\s*\z/ux';
preg_match($pcre, $data, $h);
Now, I have two problems.
I'm very bad at regex.
The regex above also cuts every - from the street names, and there are a lot of those names in Germany and Europe.
Actually it would be quite easy to just adjust the regex to not cut any hyphens, but I want to learn how regex works, so I decided to try to find a regex that replaces every camel case capital letter in the string with a - followed by the matched letter, except for the first uppercase letter.
I've managed to find a regex that shows me the places I need to paste a hyphen like so:
/.[A-Z]{1}/ug
https://regex101.com/r/qI2iA9/1
But how on earth do I transform this string:
AlbertWeisgerberAllee
so that it becomes
Albert-Weisgerber-Allee
To insert dashes before caps use this regex:
$string="AlbertWeisgerberAllee";
$string=preg_replace("/([a-z])([A-Z])/", "\\1-\\2", $string);
Just use capture groups:
(.)([A-Z]) //removed {1} because [A-Z] implicitly matches {1}
And replace with $1-$2
See https://regex101.com/r/qI2iA9/3
You seem to be over complicating the expression. You can use the following to place - before any uppercase letters except the first:
(.)(?=[A-Z])
Just replace that with $1-. Essentially, what this regex does is:
(.) Find any character and place that character in group 1.
(?=[A-Z]) See if an uppercase character follows.
$1- If matched, replace with the character found in group 1 followed by a hyphen.
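Here is a small comparison of the two approaches suggested in this thread, as a sketch only (the function names are invented). The lookahead version puts a hyphen after any character that precedes a capital, so it would double hyphens that are already there, while the ([a-z])([A-Z]) version only fires between a lowercase and an uppercase letter:

```php
<?php
// Hypothetical helper: hyphen between a lowercase letter and the capital that follows it.
function hyphenateLowerUpper(string $s): string
{
    return preg_replace('/([a-z])([A-Z])/', '$1-$2', $s);
}

// Hypothetical helper: hyphen after ANY character that is followed by a capital.
function hyphenateBeforeCaps(string $s): string
{
    return preg_replace('/(.)(?=[A-Z])/', '$1-', $s);
}

echo hyphenateLowerUpper('AlbertWeisgerberAllee'), "\n";      // Albert-Weisgerber-Allee
echo hyphenateBeforeCaps('AlbertWeisgerberAllee'), "\n";      // Albert-Weisgerber-Allee
echo hyphenateLowerUpper('Albert-Weisgerber-Allee 35'), "\n"; // unchanged
echo hyphenateBeforeCaps('Albert-Weisgerber-Allee 35'), "\n"; // Albert--Weisgerber--Allee 35
```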
I'm looking for a regex that matches words in which one or more letters are repeated consecutively.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that matches exactly:
exxxmaple and oooonnnnllllyyyyy
I need to find them and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain which regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character, captured in \1
(?=\1+) positive lookahead; asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches zero or more non-whitespace characters
EDIT
If the letter must occur at least three times in a row, a slight modification of the regex does the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
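To see the difference between the two variants, here is a sketch with a made-up sentence (the word apple is added to the original example, and the helper name matchWords is hypothetical):

```php
<?php
// Hypothetical helper: returns the full matches of $pattern in $str.
function matchWords(string $pattern, string $str): array
{
    preg_match_all($pattern, $str, $arr);
    return $arr[0];
}

$str = 'This is an exxxmaple oooonnnnllllyyyyy apple';

// \1+ : a letter followed by at least one copy of itself, so "apple" qualifies.
print_r(matchWords('/\S*(\w)(?=\1+)\S*/', $str));

// \1{2,} : at least two more copies, so "apple" is dropped.
print_r(matchWords('/\S*(\w)(?=\1{2,})\S*/', $str));
```

Since \S* matches any non-whitespace character, trailing punctuation such as the ! in the original example would end up in the match as well.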
Use this if you want to discard words like apple, etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See demo.
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. The word boundary can be removed; however, like the lazy quantifier, it reduces the number of steps needed to obtain a match. Note too that if you are looking for words (in the common meaning), you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks whether a repeated character is present. A word character is captured in group 1 and must be followed by two more copies of it (the backreference \1 repeated twice).
In my PHP program, I want to prevent the user from entering any characters other than letters.
For example, "dgdh" or "sgfdgdfg" should be accepted, but numbers or anything else, like "7657", "gfd(-" or "fd54", should not.
I tested this function, but it doesn't cover all cases:
preg_match("#[^0-9]#",$_POST['chaine'])
How can I achieve that? Thank you in advance.
The simplest can be
preg_match('/^[a-z]+$/i', $_POST['chaine'])
The i modifier makes the match case-insensitive. The + requires at least one letter. You can change it to * if you want to allow an empty string. The anchors ^ and $ enforce that the whole string consists of nothing but letters (they represent the beginning and the end of the string, respectively).
If you want to allow whitespace, you can use:
Whitespace only at the beginning or end of string:
preg_match('/^\s*[a-z]+\s*$/i', $_POST['chaine'])
Anywhere after the first letter:
preg_match('/^[a-z][\sa-z]*$/i', $_POST['chaine']) // at least one letter
Only the space character is allowed but not other whitespace:
preg_match('/^[a-z][ a-z]*$/i', $_POST['chaine'])
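A minimal sketch that runs the inputs from the question through the simplest pattern. The function name isLettersOnly is made up, and a plain string parameter stands in for $_POST['chaine']:

```php
<?php
// Hypothetical helper: true when $input consists of one or more letters a-z (any case).
function isLettersOnly(string $input): bool
{
    return preg_match('/^[a-z]+$/i', $input) === 1;
}

var_dump(isLettersOnly('dgdh'));  // bool(true)
var_dump(isLettersOnly('7657'));  // bool(false)
var_dump(isLettersOnly('gfd(-')); // bool(false)
var_dump(isLettersOnly('fd54'));  // bool(false)

// The pattern from the question only asks "is there any non-digit?",
// which is why it accepts mixed input:
var_dump(preg_match('#[^0-9]#', 'fd54')); // int(1)
```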
Two things. Firstly, you match non-digit characters. That is obviously not the same as letter characters. So you could simply use [a-zA-Z] or [a-z] and the case-insensitive modifier instead.
Secondly you only try to find one of those characters. You don't assert that the whole string is composed of these. So use this instead:
preg_match("#^[a-z]*$#i",$_POST['chaine'])
Only match letters (no whitespace):
preg_match("#^[a-zA-Z]+$#",$_POST['chaine'])
Explanation:
^ # matches the start of the string
[a-zA-Z] # matches any letter (upper or lowercase)
+ # means the previous pattern must match at least once
$ # matches the end of the string
With whitespace:
preg_match("#^[a-zA-Z ]+$#",$_POST['chaine'])
I'm about to create a registration form for my website. I need to check a variable and accept it only if it contains letters, numbers, _ or -.
How can I do it with regex? I used to work with them through preg_replace(), but I think that doesn't fit this case. Also, I know that the "ereg" function is dead. Any solutions?
This regex is pretty common these days.
if(preg_match('/^[a-z0-9\-\_]+$/i',$username))
{
// Ok
}
Use preg_match:
preg_match('/^[\w-]+$/D', $str)
Here \w describes letters, digits and the _, so [\w-]+ matches one or more letters, digits, _, and -. ^ and $ are so-called anchors that denote the beginning and end of the string respectively. The D modifier ensures that $ matches only at the very end of the string, and not before a trailing line break.
Note that the letters and digits that are matched by \w depend on the current locale and might match other letters or digits than just [a-zA-Z0-9]. So if you just want these, use them explicitly. And if you want to allow more than these, you could also try character classes that are described by Unicode character properties like \p{L} for all Unicode letters.
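To illustrate both points, here is a sketch that spells out the allowed characters explicitly and uses D. The function name isValidUsername and the sample usernames are invented:

```php
<?php
// Hypothetical helper for the registration form: letters, digits, _ and - only.
function isValidUsername(string $username): bool
{
    // D (PCRE_DOLLAR_ENDONLY) stops $ from matching before a trailing newline.
    return preg_match('/^[a-zA-Z0-9_-]+$/D', $username) === 1;
}

var_dump(isValidUsername('john_doe-42')); // bool(true)
var_dump(isValidUsername('john doe'));    // bool(false)
var_dump(isValidUsername("john_doe\n"));  // bool(false)

// Without D, the trailing newline slips through:
var_dump(preg_match('/^[a-zA-Z0-9_-]+$/', "john_doe\n")); // int(1)
```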
Try preg_match(). http://php.net/manual/en/function.preg-match.php