Regex - Find 8 digit number in string - php

I want to extract an 8 digit number from a string using regex.
Test strings are:
hi 82799162
236232 (82342450)
test data 8979
The required output for each string is:
82799162
82342450
null
I have tried the following code:
preg_match('/[0-9]{8}$/', $string, $match);
preg_match('/\d{8}$/', $string, $match);
But neither retrieves the number from 236232 (82342450).

If a regex is to capture exactly 8 digits, it must contain:
\d{8} as the central part,
a "before" condition, ensuring that no digit occurs immediately before your match,
an "after" condition, ensuring that no digit occurs immediately after your match.
One possible solution is to use a negative lookbehind / lookahead:
(?<!\d)\d{8}(?!\d)
Another option is word boundary assertions (at both ends):
\b\d{8}\b
A regex like [0-9]{8} alone is not enough, because it also captures the first 8 digits of a longer sequence of digits.
Are you happy with that?
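A minimal PHP sketch comparing the two patterns on the question's three strings. The helper find8 and the last two inputs are made up for illustration. The final input shows the one practical difference: \b needs a non-word character next to the digits, so the word-boundary version finds nothing in ref12345678, while the lookaround version only cares that the neighbours are not digits.

```php
<?php
// Hypothetical helper: first match of $pattern in $s, or null.
function find8(string $pattern, string $s): ?string
{
    return preg_match($pattern, $s, $m) ? $m[0] : null;
}

$lookaround = '/(?<!\d)\d{8}(?!\d)/'; // no digit immediately before/after
$boundary   = '/\b\d{8}\b/';          // no word character before/after

// The question's strings plus two illustrative edge cases.
$inputs = ['hi 82799162', '236232 (82342450)', 'test data 8979',
           'id 1234567890', 'ref12345678'];

foreach ($inputs as $s) {
    printf("%-18s lookaround=%-9s boundary=%s\n", $s,
        find8($lookaround, $s) ?? 'null', find8($boundary, $s) ?? 'null');
}
```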

The problem is your $ sign: it anchors the match to the end of the string. So with that expression, you are looking for a string that ends with an 8-digit number. But your second test string, '236232 (82342450)', ends with a closing parenthesis, so it doesn't meet that criterion (it does not end with a digit).
So remove the trailing $ and it will work (bear in mind that without any boundary check it will also match 8 digits inside a longer number):
preg_match('/[0-9]{8}/', $string, $match);
Hope it helps!
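A short PHP check of that explanation. The input id 1234567890 is made up: it shows that /\d{8}$/ fails on the parenthesised string, and that the unanchored /[0-9]{8}/ happily returns the first 8 digits of a 10-digit run.

```php
<?php
// Compare the original anchored attempt with the unanchored fix.
$tests = ['hi 82799162', '236232 (82342450)', 'id 1234567890'];

foreach ($tests as $s) {
    $withAnchor    = preg_match('/\d{8}$/', $s, $a) ? $a[0] : 'null';
    $withoutAnchor = preg_match('/[0-9]{8}/', $s, $b) ? $b[0] : 'null';
    echo "$s => with \$: $withAnchor, without \$: $withoutAnchor\n";
}
```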

Related

Return numbers from a string

In PHP, how can I determine whether a zip code exists within a string and, if so, return it? In this context, a zip code is either a string of 5 digits (e.g. "55543") or 5 digits connected to 4 more digits with a hyphen (e.g. "74657-9993").
Can anyone help me out with a Regex pattern I can use with preg_match or any other good ways of doing this?
I have preg_match_all("/\d{5}/", $str, $matches); so far, but that doesn't account for the possible second 4 digits or the hyphen.
5 digits, optionally connected to 4 more digits with a hyphen:
preg_match_all("/\b\d{5}(?:-\d{4})?\b/", $str, $matches);
(?:-\d{4})? is an optional group: a hyphen followed by exactly 4 digits.
Edit: I forgot to prevent matching more than 5 digits in the first part (and more than 4 in the second part); the word boundaries \b take care of that.
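A minimal PHP run of that pattern. The sample text is invented: it contains two valid ZIP codes plus a 6-digit and a 4-digit run, which the word boundaries should reject.

```php
<?php
// Made-up sample text: two valid ZIP codes, a 6-digit run and a 4-digit run.
$str = 'Ship to 55543 or 74657-9993, not 123456 or 9876.';

// \b on both ends stops the pattern from matching inside longer digit runs;
// (?:-\d{4})? optionally consumes the hyphen and the 4-digit extension.
preg_match_all("/\b\d{5}(?:-\d{4})?\b/", $str, $matches);

print_r($matches[0]); // full matches only; the group is non-capturing
```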
EDIT2:
Okay, something else I just noticed: if you have 12345-12345 and don't want to get any number from it, you would use:
preg_match_all("/\b\d{5}(?!-\d{1,3}\b|-\d{5})(?:-\d{4})?\b/", $str, $matches);
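The negative lookahead rejects the 5 digits when they are followed by a hyphen and 5 or more digits (-12345) or by a hyphen and fewer than 4 digits, but allows exactly 4 digits after the hyphen. Note that it only rejects the first block: the second 12345 in 12345-12345 is still matched, because nothing checks what precedes it. Prefixing the pattern with the lookbehind (?<!\d-) excludes that block too.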
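A quick PHP comparison of the EDIT2 pattern with a variant that adds a (?<!\d-) lookbehind at the front. The lookbehind is an editorial addition, not part of the original answer, and zips is a hypothetical helper.

```php
<?php
// EDIT2 pattern as given in the answer.
$edit2   = '/\b\d{5}(?!-\d{1,3}\b|-\d{5})(?:-\d{4})?\b/';
// Variant (our addition): also skip a 5-digit block that directly follows "digit-".
$variant = '/(?<!\d-)\b\d{5}(?!-\d{1,3}\b|-\d{5})(?:-\d{4})?\b/';

// Hypothetical helper: all full matches of $pattern in $s.
function zips(string $pattern, string $s): array
{
    preg_match_all($pattern, $s, $m);
    return $m[0];
}

foreach (['12345-12345', '74657-9993', '55543'] as $s) {
    echo $s, ': EDIT2 => [', implode(', ', zips($edit2, $s)),
         '], variant => [', implode(', ', zips($variant, $s)), "]\n";
}
```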
Your pattern is
\b\d{5}(?:-\d{4})?\b
The important parts here are the word boundaries \b; they ensure that only a whole number is matched, not part of a longer one.
\d{5} is matching 5 digits as you already had it
(?:-\d{4})? is the optional part (optional because of the ? after the group). The ?: at the start of the group just makes it non-capturing.
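To see what "non-capturing" means in practice, here is a small PHP sketch (sample input made up) running the same pattern with (?:...) and with a plain (...) group and comparing the resulting $matches arrays.

```php
<?php
$s = 'ZIP 74657-9993';

// Non-capturing group: $matches holds only the full match.
preg_match('/\b\d{5}(?:-\d{4})?\b/', $s, $nonCapturing);
// Capturing group: $matches also holds group 1.
preg_match('/\b\d{5}(-\d{4})?\b/', $s, $capturing);

var_dump(count($nonCapturing)); // int(1)
var_dump(count($capturing));    // int(2): full match plus "-9993"
```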
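I recently implemented this in JavaScript. Note that it is anchored with ^ and $, so it validates a whole string (and also accepts an empty or whitespace-only one) rather than finding a zip code inside text: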
/^(\s*|\d{5}([\-]\d{4})?)$/;
Just modify your regex to allow the optional suffix (without \b, though, it will also match 5 digits inside a longer number):
preg_match_all("/\d{5}(\-\d{4})?/", $str, $matches);

matching 8 digit of alphanumeric in a string

I want to use a regular expression to check whether a string contains a word of exactly 8 alphanumeric characters, ignoring case (meaning that both 2HJS1289 and 2hjs1289 should match). I know I can use preg_match for this, and so far I have:
preg_match('/[A-Za-z0-9]/i', $string)
However, I am unsure how to limit it to exactly 8 characters.
For a word of exactly 8 characters, you need word boundaries: \b
preg_match('/\b[A-Z\d]{8}\b/i', $string)
Try
preg_match('/\b([A-Z0-9]{8})\b/i', $string)
The {8} matches exactly 8 times. I added the capturing group (the parentheses), in case you needed to extract the actual match.
You can also use {min,max} to match the pattern repeated between min and max times (both bounds are inclusive). Or you can leave out max to make it open-ended, e.g. {min,} matches at least min times.
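A small PHP sketch of the three quantifier forms; the sample string is made up for illustration.

```php
<?php
$s = 'ab 2hjs1289 x9';

preg_match_all('/\b[a-z0-9]{8}\b/i', $s, $exact);    // exactly 8
preg_match_all('/\b[a-z0-9]{2,8}\b/i', $s, $range);  // 2 to 8, inclusive
preg_match_all('/\b[a-z0-9]{3,}\b/i', $s, $atLeast); // 3 or more

print_r([$exact[0], $range[0], $atLeast[0]]);
```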
[a-zA-Z0-9] - matches an uppercase or lowercase letter or a digit
{8} - matches exactly 8 of the preceding token
Put it together (without \b, this also matches 8 characters inside a longer word):
preg_match('/([A-Za-z0-9]{8})/i', $string)
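A PHP sketch contrasting the bounded pattern with the unbounded one on a made-up string that holds one 8-character word and one 10-character word.

```php
<?php
$s = 'codes: 2HJS1289 and ABCDEFGH12';

// Word boundaries: only whole 8-character words.
preg_match_all('/\b[A-Z\d]{8}\b/i', $s, $bounded);
// No boundaries: any run of 8 alphanumerics, even inside a longer word.
preg_match_all('/[A-Za-z0-9]{8}/', $s, $unbounded);

print_r($bounded[0]);
print_r($unbounded[0]);
```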

regex to find number of specific length, but with any character except a number before or after it

I'm trying to work out a regex pattern to search a string for a 12 digit number. The number could have any number of other characters (but not numbers) in front or behind the one I am looking for.
So far I have /([0-9]{12})/, which finds 12-digit numbers correctly; however, it will also match a 13-digit number in the string.
The pattern should match 123456789012 in the following strings:
"rgergiu123456789012ergewrg"
"123456789012"
"#123456789012"
"ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg"
It should match nothing in these strings:
"123456789012000"
"egjkrgkergr 123123456789012"
What you want are look-arounds. Something like:
/(?<![0-9])[0-9]{12}(?![0-9])/
A lookahead or lookbehind checks whether the pattern is followed or preceded by another pattern, without consuming that pattern. Both are negative here, so this pattern matches 12 digits only if they are not preceded or followed by another digit, without consuming the characters before and after the number.
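The question's own test strings make a convenient check; this PHP sketch runs the lookaround pattern over both lists.

```php
<?php
$pattern = '/(?<![0-9])[0-9]{12}(?![0-9])/';

// Test strings from the question.
$shouldMatch = [
    'rgergiu123456789012ergewrg',
    '123456789012',
    '#123456789012',
    "ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg",
];
$shouldNotMatch = ['123456789012000', 'egjkrgkergr 123123456789012'];

foreach ($shouldMatch as $s) {
    $found = preg_match($pattern, $s, $m) ? $m[0] : 'no match';
    echo json_encode($s), ' => ', $found, "\n";
}
foreach ($shouldNotMatch as $s) {
    echo json_encode($s), ' => ', preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```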
/\D(\d{12})\D/ (in which case, the number will be capture index 1)
Edit: Whoops, that one doesn't work if the number is at the very start or end of the string (for example, when it is the entire string). Use the one below instead.
Or, with negative look-behind and look-ahead: /(?<!\d)\d{12}(?!\d)/ (where the number will be capture index 0)
if (preg_match("/(?<!\d)\d{12}(?!\d)/", $string, $matches)) {
    $number = $matches[0];
    # ....
}
where $string is the text you're testing
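A PHP sketch of why the \D version was withdrawn: it needs an actual non-digit character on both sides, so it fails when the number touches either end of the string. The three inputs are illustrative.

```php
<?php
// First attempt: one non-digit required on BOTH sides (number is group 1).
$strict = '/\D(\d{12})\D/';
// Lookaround version: only "no digit" required on each side (number is group 0).
$look = '/(?<!\d)\d{12}(?!\d)/';

foreach (['#123456789012#', '123456789012', '#123456789012'] as $s) {
    $a = preg_match($strict, $s, $m1) ? $m1[1] : 'no match';
    $b = preg_match($look, $s, $m2) ? $m2[0] : 'no match';
    echo "$s: \\D version => $a, lookaround => $b\n";
}
```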

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create a regex to match ugly abbreviations and numbers. These can be in one of the following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match the double first:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to select the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it?
Your regex says: "start of string, optionally a -, then either at least one digit, or zero or more digits followed by a dot and at least one digit, then end of string."
So your regex could match, for example, 4.5 or -.1. This is exactly what you tell it to do.
Your test input string does not match, since there are other characters after the number 1.1, and even if it somehow matched, your "double" matching regex is wrong.
For a double without scientific notation, you usually use this regex:
[-+]?\b[0-9]+(\.[0-9]+)?\b
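A PHP sketch trying that double pattern on the question's title and a few made-up inputs.

```php
<?php
// Optional sign, integer part, optional fractional part; no exponent.
$double = '/[-+]?\b[0-9]+(\.[0-9]+)?\b/';

foreach (['1.1 AA My test title', 'x -42 y', '+3.25', 'no digits'] as $s) {
    echo $s, ' => ', preg_match($double, $s, $m) ? $m[0] : 'no match', "\n";
}
```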
Now that this is out of the way, we need a whitespace \s and
[2-3 length of alphabet]
I am not sure what [2-3 length of alphabet] means, but by combining the above you get a regex like this:
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also add the anchors ^ and $ if you want the whole string to match:
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
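A PHP sketch of the combined pattern, under one loudly stated assumption: "[2-3 length of alphabet]" is read here as 2 or 3 ASCII letters, [A-Za-z]{2,3}, with a trailing \b so a longer word is not cut short. That reading is ours, not the answer's.

```php
<?php
// ASSUMPTION: "[2-3 length of alphabet]" means [A-Za-z]{2,3}.
$loose    = '/[-+]?\b[0-9]+(\.[0-9]+)?\b\s[A-Za-z]{2,3}\b/';
$anchored = '/^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[A-Za-z]{2,3}$/';

preg_match($loose, '1.1 AA My test title', $m);
echo $m[0], "\n"; // 1.1 AA

// The anchored form accepts only strings that are exactly "<number> <letters>".
var_dump(preg_match($anchored, '1.1 AA'));               // int(1)
var_dump(preg_match($anchored, '1.1 AA My test title')); // int(0)
```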
I see multiple issues with your regex:
You try to match the whole string (as a number) with the anchors ^ at the beginning and $ at the end. If you don't want that, remove them.
The number group is non-capturing. It is checked for matches, but its contents are not added to $matches. That's because of the ?: at the start of (?:...). Remove ?: to make the group capturing.
You place the shorter digit pattern (\d+) before the longer one (\d*\.\d+). If you swap the order, the regex engine tries the longer one first and, on success, prefers it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
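A PHP sketch of the ordering point: once the anchors are gone, the engine keeps the first alternative that lets the match succeed, so the order of the two alternatives changes what is captured.

```php
<?php
$src = '1.1 AA My test title';

// Same alternatives, two orders, no anchors.
preg_match('/-?(\d+|\d*\.\d+)/', $src, $intFirst);
preg_match('/-?(\d*\.\d+|\d+)/', $src, $decFirst);

// Integer-first succeeds after "1" and never tries the decimal branch.
echo $intFirst[1], ' vs ', $decFirst[1], "\n"; // 1 vs 1.1
```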

Can Someone explain this reg ex to me?

I recently asked a question about formatting a telephone number and got lots of responses. Most of them were great, but there is one I really want to understand, because it worked well. If $phone is the following, how do the other lines work? What are they doing? I'd like to learn.
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches of a regex with an empty string (in other words, delete the matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character from the class. Dashes are generally special characters inside a character class and are used to specify a range (i.e. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class as shorthand for a big OR condition, i.e. [0-9] is shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges either: [aeiou] is a character class that matches a or e or i or o or u (in other words, any lowercase vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression and, if a match is made, puts the results into an array called $matches. You will note there are three sets of parentheses; these define capturing groups, i.e. if the pattern as a whole matches, you will end up with three submatches, which in this case contain the area code, prefix and suffix of the phone number. In general, anything contained in parentheses in a regular expression is capturing (there are exceptions, but they are beyond the scope of this explanation). Groups are useful for other things too, where you don't want the overhead of capturing, so a group can be made non-capturing by prefixing it with ?: (i.e. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits 0 to 9 (as the classes here don't start with ^, they are not negated). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} matches exactly three digits in a row, and [0-9]{4} matches exactly four digits in a row. Note that the digits don't have to be all the same (e.g. 111), because the character class is evaluated for each repetition (so 123 matches because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
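The two lines can be traced on the question's own input with this PHP sketch. $digits is a renamed variable so the original value stays visible, and the printf reformatting at the end is an illustrative extra, not part of the original snippet.

```php
<?php
$phone = "(407)888-9999";

// Step 1: delete every character that is not a digit.
$digits = preg_replace("~[^0-9]~", "", $phone); // "4078889999"

// Step 2: split the 10 digits into three captured groups.
if (preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $digits, $matches)) {
    // $matches[0] is the whole match; groups 1-3 are area code, prefix, suffix.
    printf("(%s) %s-%s\n", $matches[1], $matches[2], $matches[3]); // (407) 888-9999
}
```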
In the preg_replace call, [^0-9] matches anything that is not (the ^ inside the []) a digit 0-9, i.e. anything that is not a number, and since the replacement is "", every such character is removed from the string.
For the first section, ([0-9]{3}) pulls out the first 3 digits: the items inside [] are what to match, {3} is how many characters to match, and because it is inside parentheses () it is stored as a match in the array $matches. The second group pulls out the next 3 digits and the last group pulls out the last 4 digits of $phone, and each is stored in $matches.
The ~ characters are the delimiters of the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
The first line removes every character that matches [^0-9], leaving only the digits of the phone number.
[^0-9] matches any one character that is NOT ('^') a digit.
The match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
If you skip the preg_replace step, the () around the area code and the dash are still in the string, so the pattern has to match them explicitly.
I'd do this:
~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~
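A PHP sketch applying that pattern directly to the unstripped string, with no preg_replace step: \( and \) match the literal parentheses and the dash is matched literally between the second and third groups.

```php
<?php
$phone = "(407)888-9999";

// Match the raw format; the three groups still capture the digit blocks.
if (preg_match('~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~', $phone, $matches)) {
    echo $matches[1], ' ', $matches[2], ' ', $matches[3], "\n"; // 407 888 9999
}
```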
"[^0-9]" means everything except the digits 0 to 9. So basically, the first line replaces everything except digits with "" (nothing).
[0-9]{3} means a digit from 0 to 9, 3 times in a row.
So it checks for 3 digits, then 3 digits, then 4 digits, and stores the captured groups in $matches.
Check out this tutorial:
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In PHP you have to delimit the regex pattern with a non-alphanumeric character; "~" is used here.
[^0-9] is the regex pattern used to remove anything from $phone that is not in the 0-9 range; remember that a ^ at the start of [...] negates the character class.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again, in this line of code "~" is the delimiter, and
([0-9]{3}) is the part of the pattern that captures 3 digits from the string ({} specifies how many characters to match). Because it is wrapped in ( ), it forms a group/submatch that ends up as a separate entry in the output array (check your $matches variable for the result).
