Regex for detecting the same character more than five times? - php

I'm trying to figure out how to write a regex that can detect if in my string, any character is repeated more than five times consecutively? For example it wouldn't detect "hello", but it would detect "helloooooooooo".
Any ideas?
Edit: Sorry, to clarify, I need it to detect the same character repeated more than five times, not any sequence of five characters. I also need it to work with any character, not just "o" as in my example. ".{5,}" is no good because it matches any sequence of five or more characters, not the same character repeated.

This should do it:
(\w)\1{5,}
(\w) matches a single word character (letter, digit or underscore) and captures it in group 1.
\1{5,} requires that captured character to repeat at least 5 more times, so 6 or more in total.
Usage:
$input = 'helloooooooooo';
if (preg_match('/(\w)\1{5,}/', $input)) {
# Successful match
} else {
# Match attempt failed
}

Correction, should be (.)\1{5,}, I believe. My mistake. This gets you:
(.) #Any character
\1 #The character captured by (.)
{5,} #At least 5 more repetitions (total of at least 6)
You can also restrict it to word characters (letters, digits, underscore) by using (\w)\1{5,}, or to ASCII letters by using ([a-zA-Z])\1{5,}

You can use the regex:
(.)\1{5,}
Explanation:
. : Metacharacter that matches any character.
() : Used for grouping and remembering the matched single character.
\1 : Backreference to the single character that was remembered in the previous step.
{5,} : Quantifier for 5 or more.
In PHP you can use it as:
$input = 'helloooooooooo';
if(preg_match('/(.)\1{5,}/',$input,$matches)) {
echo "Found repeating char $matches[1] in $input";
}
Output:
Found repeating char o in helloooooooooo
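As a hedged sketch, the pattern can be wrapped in a small helper so the threshold is not hard-coded. The function name hasCharacterRun and its $minRun parameter are assumptions made for illustration, not part of the answers above; the s and u modifiers are additions so that the dot also covers newlines and multibyte UTF-8 characters.

```php
<?php
// Hypothetical helper (name and parameters are illustrative only).
// Returns true if any single character occurs at least $minRun times in a row.
function hasCharacterRun(string $input, int $minRun = 6): bool
{
    if ($minRun < 2) {
        throw new InvalidArgumentException('$minRun must be at least 2');
    }
    // (.) captures one character; \1{n,} demands n further copies of it.
    // s: dot also matches newlines; u: treat pattern and subject as UTF-8.
    $pattern = '/(.)\1{' . ($minRun - 1) . ',}/su';
    return preg_match($pattern, $input) === 1;
}

var_dump(hasCharacterRun('hello'));          // bool(false)
var_dump(hasCharacterRun('helloooooooooo')); // bool(true)
var_dump(hasCharacterRun('hellooooo'));      // bool(false): only five o's
```

With the default of 6 this corresponds to "more than five times" from the question.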

Yep.
(.)\1+
This will match repeated sequences of any character.
The \1 looks at the contents of the first set of brackets. (so if you have more complex regex, you'd need to adjust it to the correct number so it picks up the right set of brackets).
If you need to specify, say more than three of them:
(.)\1{3,}
The \1 syntax is quite powerful. For example, you can also use it elsewhere in your regex to match the same character appearing in different places in the string you are searching.
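To illustrate that last point, here is a small sketch (the sample strings are made up) in which \1 is separated from its group by .*, so it finds a character that occurs again later in the string rather than right next to itself.

```php
<?php
// (.) captures a character, .* skips anything, \1 must equal the captured character.
$pattern = '/(.).*\1/';

if (preg_match($pattern, 'abca', $m)) {
    echo "'{$m[1]}' appears more than once\n"; // 'a' appears more than once
}

var_dump(preg_match($pattern, 'abcd')); // int(0): no character occurs twice
```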

Sanitize phone number: regular expression match all except first occurence is on first position

Regarding this post, "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how to exempt the first occurrence only if the string starts with that specific character, in PHP.
I would like to sanitize phone numbers. Example of a bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove the extra "+" signs only if the string starts with "+", not if the first occurrence is at a later position.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two within one regex. I would just use preg_replace() twice.
Of course I would be able to use a workaround like if ($str[0] === '+') {...} but I prefer to learn some new stuff (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
See the regex demo. Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
See the PHP demo:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
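A quick way to check the condition from the question (only strip the extra + signs when the string starts with +) is to run the same pattern over an input without a leading +. The wrapper name stripExtraPlus and the second sample number are assumptions for illustration; the m modifier from the demo is left out here because each input is a single line.

```php
<?php
// Hypothetical wrapper around the pattern above.
function stripExtraPlus(string $number): string
{
    // Matches every + after the first one, but only when the string begins with +.
    // preg_replace() returns null on error, in which case the input is returned as-is.
    return preg_replace('/(?:\G(?!\A)|^\+)[^+]*\K\+/', '', $number) ?? $number;
}

echo stripExtraPlus('+49+12423#23492#aosd#+dasd'), "\n"; // +4912423#23492#aosd#dasd
echo stripExtraPlus('49+12423+7'), "\n";                 // 49+12423+7 (unchanged)
```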
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except the first occurrence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace what is matched with nothing. Here is an online demo
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Reject what was just matched: (*F) forces the match to fail and (*SKIP) makes the engine resume after the skipped characters instead of retrying inside them.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
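For reference, a minimal PHP sketch of this single-pass replacement, using the sample number from the question. The function name sanitizePhone is an assumption, and the s modifier is an addition so that the final dot also removes stray newlines.

```php
<?php
// Hypothetical helper: keep a leading + and all digits, drop everything else.
function sanitizePhone(string $raw): string
{
    // (?:^\+|\d)(*SKIP)(*F) matches the parts to keep and then rejects them,
    // so only the characters reached by the final dot are replaced.
    return preg_replace('/(?:^\+|\d)(*SKIP)(*F)|./s', '', $raw) ?? $raw;
}

echo sanitizePhone('+49+12423#23492#aosd#+dasd'), "\n"; // +491242323492
echo sanitizePhone('49+12423#23492'), "\n";             // 491242323492
```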

Using RegEx to find a string (as variable number)

How can I find numbers inside certain strings in php?
For example, given this text inside a page, I would like to find
|||12345|||
or
|||354|||
I'm interested in the numbers. They change according to the page I visit (the number is the id of the page and is 3-5 characters long).
So the only thing I know for sure is those pipes surrounding the numbers.
Thanks in advance.
Using this \|\|\|\K\d{3,5}(?=\|\|\|)
gives many advantages.
https://regex101.com/r/LtbKfM/1
First, three literals without a quantifier is a simple strncmp() C call. Also, any time a regex starts with an assertion it is inherently slower. Therefore, this is the fastest match for the 3 leading pipe symbols.
Second, using the \K construct excludes whatever was previously matched from group 0. We don't want to get the 3 pipes in the match, but we do want to match them.
edit
Note that capture group results are not stored in a special string buffer. Each group is really a pointer (or offset) and a length. The pointer (or offset) is to somewhere in the source string. When it comes time to extract a particular group string, the overloaded bracket operator matches[#] uses the pointer (or offset) and length to create and return a string instance.
Using the \K construct simply sets the group 0 pointer (or offset) to the position in the string that had been reached when \K was encountered.
Third, using a lookahead assertion for the 3 trailing pipe symbols does not consume them as far as the next match is concerned. This makes these symbols available for the next match. I.e. |||999|||888||| would get 2 matches, as would |||999|||||888|||.
The result is an array of just the numbers.
Formatted
\|\|\| # 3 pipe symbols
\K # Exclude previous items from the match (group 0)
\d{3,5} # 3-5 digits
(?= \|\|\| ) # Assertion, not consumed, 3 pipe symbols ahead
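The formatted version can be used directly in PHP with the x (extended) modifier, which ignores whitespace and # comments inside the pattern. The sketch below is only an illustration with made-up sample text; it uses ~ as the delimiter and preg_match_all to collect every id.

```php
<?php
$re = '~
    \|\|\|          # 3 pipe symbols
    \K              # exclude them from the reported match (group 0)
    \d{3,5}         # 3-5 digits
    (?= \|\|\| )    # lookahead: 3 pipe symbols follow, not consumed
~x';

// Sample text is invented for this example.
$text = 'id |||12345||| then |||999|||888||| end';

preg_match_all($re, $text, $matches);
print_r($matches[0]); // 12345, 999 and 888
```

Because the trailing pipes are only asserted, 999 and 888 are both found even though they share the three pipes between them.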
While @S.Kablar's suggestion is pretty valid, it makes use of a syntax that may be difficult for a beginner.
The more casual way to achieve your goal would be as follows:
$text = 'your input string';
if (preg_match_all('~\|{3}(\d+)\|{3}~', $text, $matches)) {
foreach($matches[1] as $number) {
var_dump($number); // prints something like string(3) "345"
}
}
The breakdown of the regex:
~ and ~ surround the expression
\| stands for the pipe, which is a special character in regex and must be escaped with a backslash
{3} says 'the previous (the pipe) must be present exactly three times'
( and ) enclose a subpattern so that it is stored under $matches[1]
\d requires a digit
+ says 'the previous (a digit) may be repeated but must have at least one instance'

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which one or more letters are repeated several times in a row.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that matches exactly:
exxxmaple and oooonnnnllllyyyyy
I need to find them and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain which regex I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches any character other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character and captures it in \1
(?=\1+) positive lookahead; asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches the rest of the non-whitespace characters
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want to discard words like apple:
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, it reduces the number of steps needed to obtain a match (as does the lazy quantifier). Note too that if you are looking for words in the common sense, you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
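Putting this last pattern into the preg_match_all call from the question might look like the sketch below. It treats "repeated" as three or more identical characters in a row, as that pattern does, so a word like apple is not collected. The extra "Not apple." in the sample sentence is an addition for illustration.

```php
<?php
$str = 'This is an exxxmaple oooonnnnllllyyyyy! Not apple.';

// \b\w*?    lazily skip the start of the word
// (\w)\1{2} one word character followed by two more copies of itself
// \w*       the rest of the word
preg_match_all('/\b\w*?(\w)\1{2}\w*/', $str, $arr);

print_r($arr[0]); // exxxmaple and oooonnnnllllyyyyy
```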

regex to match entire words containing (one or more) mandatory and allowed characters

I want to match entire words that contain at least one mandatory character and otherwise only mandatory or allowed characters.
For example
Mandatory chars are : [ t , a , x ]
Allowed chars are : [ i , e]
t : passed (one of mandatories are here)
tea : passed (two of mandatories(t,a) and one allowed(e) here)
e : failed (none of mandatory is here)
teas : failed (two of mandatories(t,a) and one allowed(e) here but one intruder(s))
What is the appropriate REGEX code for this?
It will be used in a PHP project to search 12,000 rows of a MySQL table that contains one word per row.
Rather than giving a straight answer, let me help you help yourself. A word that passes consists of a sequence of:
zero or more allowed or mandatory characters
a mandatory character
zero or more allowed or mandatory characters
Write regexes for each of these, then just concatenate them to get a regex for the entire thing.
You can use this pattern:
\b[ie]*+[taxie]++\b
explanation:
Since [ie]*+ has a word boundary on its left and a possessive quantifier, it grabs as many i and e characters as possible and never gives them back. The next character must then be a t, an a or an x; it is taken from the following class, whose ++ quantifier requires at least one character.
The word boundary on the right disallows any other kind of character.
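Because the MySQL table holds one word per row, a hedged way to apply this in PHP is to anchor the pattern to the whole string instead of using word boundaries. The function name isValidWord is an assumption; the character classes are the ones from the question.

```php
<?php
// Hypothetical helper: true if $word uses only t, a, x, i, e
// and contains at least one of the mandatory letters t, a, x.
function isValidWord(string $word): bool
{
    // [ie]*+ possessively eats leading allowed letters, so the next class
    // has to start on a mandatory one. The D modifier stops the end anchor
    // from matching before a trailing newline.
    return preg_match('/^[ie]*+[taxie]++$/D', $word) === 1;
}

foreach (['t', 'tea', 'e', 'teas'] as $word) {
    echo $word, ': ', isValidWord($word) ? 'passed' : 'failed', "\n";
}
```

This prints passed for t and tea and failed for e and teas, matching the four cases listed in the question.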
In perl it would be..
$string =~ /^[tax]*[ie]+$/i; #i is for ignore case
* means zero or more; + means one or more.
I just realized you wanted entire words; hold on, let me rewrite it.
The ^ and $ match the start and end of the line.

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say: "start of string, followed by maybe a -, followed by either at least one digit or zero or more digits, a dot and at least one digit, followed by the end of string".
So your regex could match, for example, 4.5, -.1 etc. This is exactly what you tell it to do.
Your test input string does not match because other characters follow the number 1.1, and even if it somehow matched, your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
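As one possible reading of the two formats in the question, the sketch below assumes that "any alphabet char" means ASCII letters ([A-Za-z]) and that the abbreviation sits at the start of the title. Both assumptions, and the function name leadingAbbreviation, are illustrative only.

```php
<?php
// Hypothetical helper: returns the leading abbreviation of a title, or null.
function leadingAbbreviation(string $title): ?string
{
    $pattern = '/^(?:
          [A-Za-z][0-9]                            # format 1: one letter, one digit
        | [-+]?[0-9]+(?:\.[0-9]+)?\s[A-Za-z]{2,3}  # format 2: number, whitespace, 2-3 letters
    )\b/x';

    return preg_match($pattern, $title, $m) === 1 ? $m[0] : null;
}

var_dump(leadingAbbreviation('1.1 AA My test title')); // string(6) "1.1 AA"
var_dump(leadingAbbreviation('A1 Another title'));     // string(2) "A1"
var_dump(leadingAbbreviation('My test title'));        // NULL
```

The trailing \b keeps a longer word from being cut after its first two or three letters.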
I see multiple issues with your regex:
You try to match the whole string (as a number) by the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-capturing. It will be checked for matches, but those won't be added to $matches. That's because of the ?: at the start of (?:...). Remove ?: to make the group capturing.
You place the shorter digit pattern before the longer one. If you swap the order, the regex engine will try the longer pattern first and, on success, prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
Demo
