I have some text with heading string and set of letters.
I need to get first one-digit number after set of string characters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you cans see, RegEx works ok, but it captures first groups in results also. I need to get only this integer and when I try to put ?= in first group like this: ^(?=\D*)(\d{1})(?=\d*$) , Regex doesn't work.
Any ideas?
Thanks in advance.
(?=..) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What is interesting with these two features, is the fact that contents matched inside them are not parts of the whole match result. The only problem is that a lookbehind can't match variable length content.
A way to avoid the problem is to use the \K feature that remove all on the left from match result:
^[A-Z]+\K\d(?=\d*$)
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*\d{1})(\d*)$
The (?: string will start a non-capturing group. This will not come back in matches.
So, if you used preg_match(';^(?:\D*\d{1})(\d*)$;', $string, $matches) to find your match, $matches[1] would be the string for which you're looking. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group
Related
May be it's simple but I cannot do it work.
I have two filename strings:
wrap.html
wrap-popup.html
I try to select both using
/.*wrap.*\.htm.*/ mask
But it only matches the first one "wrap.html".
If I use /.*wrap.+\.htm.*/, it only matches the second one "wrap-popup.html"
I thought * sounds 0 to infinite characters.
What's the correct mask to select both strings ???
Consider the string "this is text with 2 html pages: wrap.html and wrap-popup.html"
The first regex /.*wrap.*\.htm.*/ will match that whole string.
So if you don't want to include the first part of the string then you need to remove the first .*
Now /wrap.*\.htm.*/ will match "wrap.html and wrap-popup.html" from the string.
That's because the first .* is a greedy match.
So when we change the regex to /wrap.*?\.html?/ the .*? is now a lazy match. And the l? is an optional l. So the regex will return "wrap.html".
But if we want to retrieve both we need a global search, or it would only find the first match.
A preg_match_all (instead of preg_match) with the regex /wrap[\w\-]*?\.html?/ will match both "wrap.html" and "wrap-popup.html".
That second regex of yours wouldn't match wrap.html because with the .+ it expected at least 1 character between "match" and the dot.
I want to write a regex with assertions to extract the number 55 from string unknownstring/55.1, here is my regex
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
so, basically I am trying to say give me the number that comes after slash, and is followed by a dot and number 1, and after that there are no characters. But it does not match the regex. I just tried to remove the $ sign from the end and it matched. But that condition is essential, as I need that to be the end of the string, because the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I can use preg_match_all, and take the last match, but I want understand why the first regex does not work, where is my mistake.
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
RegEx Demo
You cannot use $ outside the positive lookahead because your number is NOT at the end of input and there is a \.1 following it.
im looking for a regex that matches words that repeat a letter(s) more than once and that are next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
By far I havent found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr) );
Can somebody explain what regexp i have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than a space
* quantifier, zero or more occurance of \S
(\w) matches a single character, captures in \1
(?=\1+) postive look ahead. Asserts that the captrued character is followed by itsef \1
+ quantifiers, one or more occurence of the repeated character
\S* matches anything other than space
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want discard words like apple etc .
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this.See demo.
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word-boundary \b limit the search to words. Note that the word boundary can be removed, however, it reduces the number of steps to obtain a match (as the lazy quantifier). Note too, that if you are looking for words (in the common meaning), you need to remove the word boundary and to use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
I have a regexp pattern:
<^(([a-z]+)\:([0-9]+)\/?.*)$>
How do I avoid capturing the primary group?
<^(?:([a-z]+)\:([0-9]+)\/?.*)$>
The above pattern will still put the whole string 'localhost:8080' into the first (0) group. But I need to get only 2 matched groups, so that first (0) group is populated with 'localhost' and second (1) with '8080'.
Where did I make a mistake?
The first group, 0, will always be the entire match.
from the docs:
matches
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
if you don't care about the full match, you can use array_shift() to remove the unwanted element.
array_shift($matches);
That's just the way the regex functions work. The first group is always the entire match. You can use array_shift to get rid of it.
http://www.php.net/manual/en/function.array-shift.php
In a regex $0 is always equal to match string and not one of the groupings. Match groups will always start at $1. So look at $1 and $2 instead of $0 and $1.
If you are dealing with URLs, you can try using PEAR NetURL, or what might be better for you in this case would be parse-url()
print_r(parse_url($url));
How can I get only the text inside "()"
For example from "(en) English" I want only the "en".
I've written this pattern "/\(.[a-z]+\)/i" but it also gets the "()";
Thanks in advance.
<?php
$string = '(en) English';
preg_match('#\((.*?)\)#is', $string, $matches);
echo $matches[1]; # en
?>
$matches[0] will contain entire matches string, $matches[1] will first group, in this case (.*?) between ( and ).
What is the dot in your regex good for, I assume its there by mistake.
Second to give you an alternative to the capturing group answer (which is perfectly fine!), here is to soltution using lookbehind and lookahead.
(?<=\()[a-z]+(?=\))
See it here on Regexr
The trick here is, those lookarounds do not match the characters inside, they just check if they are there. So those characters are not included in the result.
(?<=\() positive look behind assertion, checking for the character ( before its position
(?=\) positive look ahead assertion, checking for the character ( ahead of its position
That should do the job.
"/\(([a-z]+)\)/i"
The easiest way is to get "/\(([a-z]+)\)/i" and use the capture group to get what you want.
Otherwise, you have to get into look ahead, look behinds
You could use a capture group like everyone else proposes
OR
you can make your match only check if your match is preceded by "(" and followed by ")". It's called Lookahead and lookbehind.
"/(?<=\().[a-z]+(?=\))/i"