PHP: Simple regular expression to match exact length of digits

I am trying a simple regex to match exactly 5 digits from a string. However, this pattern also matches the first 5 digits of runs that are longer than 5 digits.
preg_match_all('#[0-9]{5}+#', 'one two 412312 three (51212 four five)', $matches);
print_r($matches);
Result:
Array(
[0] => Array
(
[0] => 41231
[1] => 51212
)
)
I need it to match exactly 5 digits.
Thanks.

You can use word boundaries here and remove the + after the {5} quantifier.
preg_match_all('~\b\d{5}\b~', $str, $matches);
As stated in the comments, if you need to match the five digits in a51212a but not 412312 you can use a combination of lookaround assertions.
preg_match_all('~(?<!\d)\d{5}(?!\d)~', $str, $matches);
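A minimal sketch comparing the two patterns above. The sample string (including the a51213a token) and the variable names are made up for illustration and are not from the question:

```php
<?php
// Hypothetical input: a 6-digit run, a 5-digit run in parentheses,
// and a 5-digit run glued to letters.
$str = 'one two 412312 three (51212 four) a51213a';

// \b needs a word/non-word transition, so digits glued to letters are skipped.
preg_match_all('~\b\d{5}\b~', $str, $boundary);

// Lookarounds only require that no digit touches the run on either side.
preg_match_all('~(?<!\d)\d{5}(?!\d)~', $str, $lookaround);

print_r($boundary[0]);   // contains only 51212
print_r($lookaround[0]); // contains 51212 and 51213
```

Neither pattern picks anything out of 412312, because every 5-digit window inside it has a digit next to it.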

Try this:
preg_match_all('(\b\d{5}\b)', 'one two 412312 three (51212 four five)', $matches);
print_r($matches);
It matches every standalone group of exactly 5 digits.

You can use lookaheads and -behinds for that: (?<!\d)[0-9]{5}(?!\d)

Related

preg_match return longest match

I am trying to return a series of numbers between 5 and 9 digits long. I want to be able to get the longest possible match, but unfortunately preg_match just returns the last 5 characters that match.
$string = "foo 123456";
if (preg_match("/.*(\d{5,9}).*/", $string, $match)) {
print_r($match);
};
will yield results
Array
(
[0] => foo 123456
[1] => 23456
)
Since you want only the numbers, you can just remove the .* from the pattern:
$string = "foo 123456";
if (preg_match("/\d{5,9}/", $string, $match)) {
print_r($match);
};
Note that if the input string is "123456789012", then the code will return 123456789 (which is a substring of a longer sequence of digits).
If you don't want to match a sequence of digits that is part of a longer sequence of digits, then you must add some look-arounds:
preg_match("/(?<!\d)\d{5,9}(?!\d)/", $string, $match)
(?<!\d) checks that there is no digit in front of the sequence of digits. (?<!pattern) is zero-width negative look-behind, which means that without consuming text, it checks that looking behind from the current position, there is no match for the pattern.
(?!\d) checks that there is no digit after the sequence of digits. (?!pattern) is zero-width negative look-ahead, which means that without consuming text, it checks that looking ahead from the current position, there is no match for the pattern.
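As a rough illustration of the look-around version, here is a small sketch. The helper name standaloneDigits and the sample inputs are assumptions introduced for this example only:

```php
<?php
// Hypothetical helper: returns the first run of 5 to 9 digits that is
// not part of a longer digit run, or null when there is none.
function standaloneDigits(string $string): ?string
{
    if (preg_match('/(?<!\d)\d{5,9}(?!\d)/', $string, $match)) {
        return $match[0];
    }
    return null;
}

var_dump(standaloneDigits('foo 123456'));          // string(6) "123456"
var_dump(standaloneDigits('id 123456789012 end')); // NULL (12 digits is too long)
var_dump(standaloneDigits('abc1234567def'));       // string(7) "1234567"
```

Because the run must have no digit on either side, the quantifier cannot stop early inside a longer run, so a match is always the whole run.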
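Use a "local" non-greedy quantifier such as .*? so that the leading part leaves as many digits as possible to the capture group:
<?php
$string = "foo 123456 bar"; // also works with "foo 123456", "123456", etc.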
if (preg_match("/.*?(\d{5,9}).*/", $string, $match)) {
print_r($match);
};
Result:
Array
(
[0] => foo 123456 bar
[1] => 123456
)
For more information: http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification
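To see the difference side by side, here is a short sketch that runs the greedy and the lazy pattern on the same input (the variable names $greedy and $lazy are chosen for this example):

```php
<?php
$string = 'foo 123456 bar';

// Greedy .* grabs as much as it can and only gives back the minimum
// the digit group needs, which is 5 digits.
preg_match('/.*(\d{5,9}).*/', $string, $greedy);

// Lazy .*? stops as early as possible, so the digit group gets the whole run.
preg_match('/.*?(\d{5,9}).*/', $string, $lazy);

echo $greedy[1], PHP_EOL; // 23456
echo $lazy[1], PHP_EOL;   // 123456
```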

Regex: Split string on number/string?

Consider the following:
700italic
regular
300bold
300bold900
Each row is a separate example; only one of them is processed at a time.
Expected outcome:
// 700italic
array(
0 => 700
1 => italic
)
// regular
array(
0 => regular
)
// 300bold
array(
0 => 300
1 => bold
)
// 300bold900
array(
0 => 300
1 => bold
2 => 900
)
I made the following:
(\d*)(\w*)
But it's not enough. It kind of works when I only have two "parts" (number|string or string|number), but if I add a third "segment" it won't work.
Any suggestions?
You could use preg_split instead. Then you can use lookarounds that match the position between a digit and a letter:
$result = preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);
Note that \w matches digits (and underscores), too, in addition to letters.
The alternative (using a matching function) is to use preg_match_all and match only digits or letters for every match:
preg_match_all('/\d+|[a-z]+/i', $input, $result);
Instead of captures, you now get one whole match for each of the desired elements in the resulting array ($result[0]). Since you only want the array in the end, it doesn't matter whether the pieces come from captures or from whole matches.
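A quick sketch running both approaches over the four sample values from the question, to check that they produce the same pieces (the loop and the output format are just for illustration):

```php
<?php
$inputs = ['700italic', 'regular', '300bold', '300bold900'];

foreach ($inputs as $input) {
    // Split at the zero-width position between a digit and a letter.
    $split = preg_split('/(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)/i', $input);

    // Collect each run of digits or letters as its own match.
    preg_match_all('/\d+|[a-z]+/i', $input, $result);

    echo implode(' | ', $split), ' / ', implode(' | ', $result[0]), PHP_EOL;
}
// 700 | italic / 700 | italic
// regular / regular
// 300 | bold / 300 | bold
// 300 | bold | 900 / 300 | bold | 900
```

The two differ on input containing other characters: preg_split keeps them inside the pieces, while preg_match_all silently drops anything that is neither a digit nor a letter.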
You could use the PREG_SPLIT_DELIM_CAPTURE flag.
Example:
<?php
$key= "group123425";
$pattern = "/(\d+)/";
$array = preg_split($pattern, $key, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($array);
?>
You're looking for preg_split:
preg_split(
'((\d+|\D+))', $subject, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
)
Or preg_match_all:
preg_match_all('(\d+|\D+)', $test, $matches) && $matches = $matches[0];
You should match it instead of splitting it.
Still you can split it using
(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)
You can use a pattern like this:
(\d*)([a-zA-Z]*)(\d*)
Or you can use preg_match_all with a pattern like this:
'/(?:[a-zA-Z]+|\d+)/'
Then you can match an arbitrary number of segments, each consisting of only letters or only digits.
Maybe something like this:
(\d*)(bold|italic|regular)(\d*)
or
(\d*)([a-zA-Z]*)(\d*)

Regular expression in PHP being too greedy on words

I know I'm just being simple-minded at this point but I'm stumped. Suppose I have a textual target that looks like this:
Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.
Using this RegExp: \s[A-Z][A-Z]\d\d\d\d\s, how would I extract, individually, the first and second occurrences of the matching strings? "JH6781" and "RB1223", respectively. I guarantee that the matching string will appear exactly twice in the target text.
Note: I do NOT want to change the existing string at all, so str_replace() is not an option.
Erm... how about using this regex:
/\b[A-Z]{2}\d{4}\b/
It means 'match boundary of a word, followed by exactly two capital English letters, followed by exactly four digits, followed by a word boundary'. So it won't match 'TGX7777' (word boundary is followed by three letters - pattern match failed), and it won't match 'TX77777' (four digits are followed by another digit - fail again).
And that's how it can be used:
$str = "Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
preg_match_all('/\b[A-Z]{2}\d{4}\b/', $str, $matches);
var_dump($matches[0]);
// array
// 0 => string 'JH6781' (length=6)
// 1 => string 'RB1223' (length=6)
$s='Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother\'s HG766 id was RB1223.';
$n=preg_match_all('/\b[A-Z][A-Z]\d\d\d\d\b/',$s,$m);
gives the result $n=2, then
print_r($m);
gives the result
Array
(
[0] => Array
(
[0] => JH6781
[1] => RB1223
)
)
You could use preg_match with its offset parameter (the 5th) together with strpos to select the first and second occurrences.
Alternatively, you could use preg_match_all and just take the first two array entries.
<?php
// $regex and $subject are assumed to be defined already
$first = preg_match($regex, $subject, $match1);
// resume the search just past the end of the first match
$offset = strpos($subject, $match1[0]) + strlen($match1[0]);
$second = preg_match($regex, $subject, $match2, 0, $offset);
?>
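A self-contained sketch of the offset idea, using the word-boundary pattern and the text from the question. It swaps strpos for the PREG_OFFSET_CAPTURE flag, which makes preg_match report where the match starts; the variable names are chosen for this example:

```php
<?php
$subject = "Johnny was really named for his 1234 grandfather, John Hugenot, "
         . "but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
$regex = '/\b[A-Z]{2}\d{4}\b/';

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
preg_match($regex, $subject, $m1, PREG_OFFSET_CAPTURE);
[$first, $pos] = $m1[0];

// Resume the search right after the end of the first match.
preg_match($regex, $subject, $m2, 0, $pos + strlen($first));
$second = $m2[0];

echo $first, PHP_EOL;  // JH6781
echo $second, PHP_EOL; // RB1223
```

The subject string is only read, never modified, which fits the requirement of leaving the original text untouched.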

How do i break string into words at the position of number

I have some string data with alphanumeric values, like us01name, phc01name and others, i.e. letters + number + letters.
I would like to get the leading letters + number in a first string and the remainder in a second.
How can I do it in PHP?
You can use a regular expression:
// if statement checks there's at least one match
if (preg_match('/([A-Za-z]+[0-9]+)([A-Za-z]+)/', $string, $matches) > 0) {
$firstbit = $matches[1];
$nextbit = $matches[2];
}
Breaking the regular expression down into parts so you can see what each bit does:
( Begin group 1
[A-Za-z]+ One or more letters, upper or lower case
[0-9]+ One or more digits
) End group 1
( Begin group 2
[A-Za-z]+ One or more letters, upper or lower case
) End group 2
Try this code:
preg_match('~([^\d]+\d+)(.*)~', "us01name", $m);
var_dump($m[1]); // 1st string + number
var_dump($m[2]); // 2nd string
OUTPUT
string(4) "us01"
string(4) "name"
Even this more restrictive regex will also work for you:
preg_match('~([A-Z]+\d+)([A-Z]+)~i', "us01name", $m);
You could use preg_split on the digits with the delimiter capture flag. It returns all the pieces, so you'd have to put them back together. However, in my opinion it is more intuitive and flexible than a complete pattern regex. Plus, preg_split() is underused :)
Code:
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($pieces);
Output:
Array
(
[0] => user
[1] => 01
[2] => jason
)
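Putting the pieces back together into the two strings the question asks for could look like this. It is a sketch only; the input phc01name is taken from the question, and the variable names $first and $second are assumptions:

```php
<?php
$str = 'phc01name';

// Pieces: leading letters, the captured digits, then whatever follows.
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// First string: leading letters plus the first number.
$first = $pieces[0] . $pieces[1];

// Second string: everything after that, glued back together.
$second = implode('', array_slice($pieces, 2));

echo $first, PHP_EOL;  // phc01
echo $second, PHP_EOL; // name
```

This assumes the input contains at least one digit; without one, preg_split returns a single piece and $pieces[1] does not exist.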

I need help applying a limit to a Regular expression on php

I am trying to find a number that consists of exactly 8 digits. This is the code I have already:
preg_match_all("/([0-9]{8})/", $string, $match)
but this also pulls 8 digits out of digit runs that are longer than 8 digits.
Any help would be greatly appreciated.
Thanks
I'll use \d rather than [0-9].
If your string should contain nothing but a number of eight digits
Use ^ and $ to match start and end of string, respectively:
preg_match_all('/^(\d{8})$/', $string, $match)
If, within a larger string, you're matching a number that should have exactly eight digits
Quick but slightly brutish approach:
Use \D ([^0-9]) to match a non-digit character:
preg_match_all('/(?:^|\D)(\d{8})(?:\D|$)/', $string, $match)
Lookbehinds/lookaheads might make this better:
preg_match_all('/(?<!\d)(\d{8})(?!\d)/', $string, $match)
You need word boundaries
/\b[0-9]{8}\b/
Example:
$string = '34523452345 2352345234 13452345 45357567567567 24573257 35672456';
preg_match_all("/\b[0-9]{8}\b/", $string, $match);
print_r($match);
Output:
Array
(
[0] => Array
(
[0] => 13452345
[1] => 24573257
[2] => 35672456
)
)
This might be better than the other two suggestions:
preg_match_all('/(?<!\d)(\d{8})(?!\d)/', $string, $match)
Note that \d is equivalent to [0-9].
preg_match_all("/(?:^|\D)(\d{8})(?:\D|$)/", $string, $match);
Here the non-capturing groups (?:...) at the start and end allow for any non-digit (\D) or the start (^) or end ($) of the string.
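Maybe require anything but a digit before and after. Note that this fails when the number sits at the very start or end of the string, because a non-digit character must actually be present on both sides.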
preg_match_all("/[^\d]([\d]{8})[^\d]/", $string, $match)
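One practical difference between the consuming patterns (\D or [^\d] around the digits) and the lookaround pattern shows up when two 8-digit numbers are separated by a single character. A small sketch, with a sample string made up for illustration:

```php
<?php
$string = '13452345 24573257 35672456';

// Consuming version: each match eats the non-digit on both sides, so the
// number that shares its separator with the previous match is skipped.
preg_match_all('/(?:^|\D)(\d{8})(?:\D|$)/', $string, $consuming);

// Lookaround version: the neighbours are only inspected, never consumed.
preg_match_all('/(?<!\d)(\d{8})(?!\d)/', $string, $lookaround);

print_r($consuming[1]);  // 13452345 and 35672456 only
print_r($lookaround[1]); // all three numbers
```

The space after 13452345 is used up by the first match, so the consuming pattern cannot use it again as the leading non-digit for 24573257.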
