regex match sub-string before sequence of numbers - php

I only want to match the - and the space after it, which come before the 4 digits. I wrote the following regex to try to match these characters: ^(- )+?(?=\d{4})$
If I try this regex on the string below, I get no matches.
- 7575
What am I doing wrong?
I am quite new to regex.
Thanks in advance.

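Here is what your regex actually does:
^(- )+? => matches one or more repetitions of "- " at the start of the string
(?=\d{4}) => requires that 4 digits follow, without consuming them
$ => requires the end of the string at that same position
Because the lookahead consumes nothing, $ is tested right after "- ", where the 4 digits still are. So the pattern can never match.
If you don't want to match the digits, put the end anchor inside the positive lookahead:
^(- )+?(?=\d{4}$)
Or remove the positive lookahead, in which case the digits become part of the match:
^(- )+?\d{4}$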
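A minimal PHP sketch of the three patterns above, run against the sample string from the question. The variable names are only illustrative.

```php
<?php
$input = '- 7575';

// Original pattern: "$" is tested right after "- ", where the digits still are.
$original = preg_match('/^(- )+?(?=\d{4})$/', $input, $m1);

// End anchor moved inside the lookahead: only "- " is consumed.
$withLookahead = preg_match('/^(- )+?(?=\d{4}$)/', $input, $m2);

// No lookahead: the digits are consumed too, and the prefix is in group 1.
$withoutLookahead = preg_match('/^(- )+?\d{4}$/', $input, $m3);

var_dump($original);      // int(0)
var_dump($m2[0]);         // string(2) "- "
var_dump($m3[0], $m3[1]); // string(6) "- 7575", string(2) "- "
```

With the second variant the whole match is already the wanted prefix; with the third you have to read capture group 1.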

Related

Sanitize phone number: regular expression to match all except first occurrence when it is at first position

Regarding this post "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how to find the first occurrence in a string only if the string starts with a specific character, in PHP.
I would like to sanitize phone numbers. Example of a bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove every further "+" only if the string starts with "+", not if the first occurrence is at a position greater than 1.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two within one regex. I would just use preg_replace() twice.
Of course I would be able to use a workaround like if ($str[0] === '+') {...} but I prefer to learn some new stuff (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
See the regex demo. Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
See the PHP demo:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except the first occurrence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace what is matched with nothing. Here is an online demo
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Backtracking control verbs: fail the match at this point and skip past the characters consumed so far, so they cannot be picked up by the alternative.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
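A rough PHP sketch of that single-pass approach. The function name sanitizePhone is made up for illustration; the pattern is the one from the answer above, unchanged.

```php
<?php
// Hypothetical helper: keeps a leading "+" and all digits, drops everything else.
function sanitizePhone(string $raw): string
{
    // (?:^\+|\d)(*SKIP)(*F) protects a "+" at the start and every digit;
    // the "." alternative then matches any other character, which is replaced by ''.
    return preg_replace('/(?:^\+|\d)(*SKIP)(*F)|./', '', $raw);
}

echo sanitizePhone('+49+12423#23492#aosd#+dasd'), "\n"; // +491242323492
echo sanitizePhone('49+123'), "\n";                     // 49123
```

Note that "." does not match a newline without the s modifier, so this sketch assumes single-line input.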

PHP RegEx get first letter after set of characters

I have some text made of a leading string of letters followed by digits.
I need to get the first single digit after the leading letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the regex works, but it also returns the first group in the results. I need to get only this digit, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$) , the regex doesn't work.
Any ideas?
Thanks in advance.
(?=...) is a lookahead that means "followed by" and checks the string to the right of the current position.
(?<=...) is a lookbehind that means "preceded by" and checks the string to the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the overall match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
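To see the \K pattern in action, here is a small PHP sketch using a few of the sample values from the question. The array names are only illustrative.

```php
<?php
$codes = ['ABC105001', 'ABCD205001', 'ABCD305001'];
$firstDigits = [];

foreach ($codes as $code) {
    // \K drops the leading letters from the reported match,
    // so $m[0] holds only the first digit.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $code, $m)) {
        $firstDigits[$code] = $m[0];
    }
}

print_r($firstDigits); // ABC105001 => 1, ABCD205001 => 2, ABCD305001 => 3
```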
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d{1})\d*$
The (?: string will start a non-capturing group. This will not come back in matches.
So, if you used preg_match(';^(?:\D*)(\d{1})\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group
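A short sketch of the capture-group variant, assuming the sample values sit on separate lines of one string (hence the m modifier, which is my addition). The full match still contains the letters; only group 1 holds the digit.

```php
<?php
$text = "ABC105001\nABC205001\nABCD305001";

// (?:\D*) is matched but not captured; (\d) is capture group 1.
preg_match_all('/^(?:\D*)(\d{1})(?=\d*$)/m', $text, $matches);

print_r($matches[0]); // full matches: ABC1, ABC2, ABCD3
print_r($matches[1]); // captured digits: 1, 2, 3
```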

Strict Regular Expressions

Is there any way to make the regex not return a match here? That is, I want it to skip strings that do not end in exactly 8 digits.
preg_match_all('/\w+\d{8}/', 'word123456789', $ret);
\w+ will also match digits. If you want to return only strings that END in exactly 8 digits, then perhaps:
'/\b[A-Za-z]+\d{8}\b/'
Edit: That should read strings that start with only letters and end with exactly eight digits. If you want something else, please clarify
You could use something along these lines:
'/\w+(?<!\d)\d{8}\b/'
\w+ - Any word character occurring 1 or more times
(?<!\d)\d{8} - Any 8 digits not preceded by another digit.
\b - word boundary.
word12345678 - Match
word123456789 - No Match
1word12345678 - Match
1w123456789rd12345678 - Match
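The four cases above can be checked with a few lines of PHP. This is only a sketch; the lookbehind pattern is wrapped in / delimiters so that preg_match accepts it.

```php
<?php
$pattern = '/\w+(?<!\d)\d{8}\b/';

$samples = [
    'word12345678',          // letters, then exactly 8 digits
    'word123456789',         // 9 trailing digits
    '1word12345678',         // leading digit is fine
    '1w123456789rd12345678', // only the last digit run has to be 8 long
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_match($pattern, $sample) === 1;
}

var_export($results);
```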

regex to find number of specific length, but with any character except a number before or after it

I'm trying to work out a regex pattern to search a string for a 12 digit number. The number could have any number of other characters (but not numbers) in front or behind the one I am looking for.
So far I have /([0-9]{12})/ which finds 12 digit numbers correctly, however it also will match on a 13 digit number in the string.
the pattern should match 123456789012 on the following strings
"rgergiu123456789012ergewrg"
"123456789012"
"#123456789012"
"ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg"
it should match nothing on these strings:
"123456789012000"
"egjkrgkergr 123123456789012"
What you want are look-arounds. Something like:
/(?<![0-9])[0-9]{12}(?![0-9])/
A lookahead or lookbehind checks whether the current position is followed or preceded by another pattern, without consuming that pattern. So this pattern will match 12 digits only if they are not preceded or followed by another digit, without consuming the characters before and after the number.
/\D(\d{12})\D/ (in which case, the number will be capture index 1)
Edit: Whoops, that one doesn't work, if the number is the entire string. Use the one below instead
Or, with negative look-behind and look-ahead: /(?<!\d)\d{12}(?!\d)/ (where the number will be capture index 0)
if (preg_match('/(?<!\d)\d{12}(?!\d)/', $string, $matches)) {
    $number = $matches[0];
    # ....
}
where $string is the text you're testing
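Wrapped in a function and run against some of the strings from the question, the lookaround version could look like this. The function name findTwelveDigitNumber is invented for the example.

```php
<?php
// Hypothetical helper: returns the 12-digit number, or null if there is none.
function findTwelveDigitNumber(string $text): ?string
{
    // No digit may come directly before or after the 12 digits.
    if (preg_match('/(?<!\d)\d{12}(?!\d)/', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(findTwelveDigitNumber('rgergiu123456789012ergewrg'));  // "123456789012"
var_dump(findTwelveDigitNumber('#123456789012'));               // "123456789012"
var_dump(findTwelveDigitNumber('123456789012000'));             // NULL
var_dump(findTwelveDigitNumber('egjkrgkergr 123123456789012')); // NULL
```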

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say "start of string, followed by maybe a -, followed by at least one digit or by 0 or more digits, a dot and at least one digit, followed by the end of string".
So your regex could match, for example, 4.5, -.1 etc. This is exactly what you tell it to do.
Your test input string does not match, since there are other characters after the number 1.1, and even if it somehow magically matched, your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
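For illustration only: if "[2-3 length of alphabet]" is read as "two or three ASCII letters" (my assumption, the question does not say so), the combined pattern could be tried in PHP like this. A trailing \b replaces the $ anchor because the sample title continues after the abbreviation.

```php
<?php
// Assumption: the placeholder means [A-Za-z]{2,3}; this is not confirmed by the question.
$pattern = '/^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[A-Za-z]{2,3}\b/';
$source  = '1.1 AA My test title';

$found = preg_match($pattern, $source, $m);

var_dump($found); // int(1)
var_dump($m[0]);  // string(6) "1.1 AA"
```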
I see multiple issues with your regex:
You try to match the whole string (as a number) by using the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-capturing. It takes part in matching, but its text won't be stored as a separate entry in $matches. That's because of the ?: at the start of (?:...). Remove ?: to make that group capturing.
You place the shorter digit pattern before the longer one. If you swap the order, the regex engine will try the longer one first and on success prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
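The effect of the alternation order can be seen by running both orders on the sample title. This is a small sketch with illustrative variable names.

```php
<?php
$source = '1.1 AA My test title';

// Shorter alternative first: \d+ succeeds on "1" and the engine stops there.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);

// Longer alternative first: \d*\.\d+ gets the chance to match "1.1".
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);

var_dump($shortFirst[1]); // string(1) "1"
var_dump($longFirst[1]);  // string(3) "1.1"
```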
