preg_match return longest match - php

I am trying to return a series of numbers between 5 and 9 digits long. I want to be able to get the longest possible match, but unfortunately preg_match just returns the last 5 characters that match.
$string = "foo 123456";
if (preg_match("/.*(\d{5,9}).*/", $string, $match)) {
print_r($match);
};
will yield results
Array
(
[0] => foo 123456
[1] => 23456
)

Since you want only the numbers, you can just remove the .* from the pattern:
$string = "foo 123456";
if (preg_match("/\d{5,9}/", $string, $match)) {
print_r($match);
};
Note that if the input string is "123456789012", then the code will return 123456789 (which is a substring of a longer sequence of digits).
If you don't want to match a sequence of digits that is part of a longer sequence of digits, then you must add look-arounds:
preg_match("/(?<!\d)\d{5,9}(?!\d)/", $string, $match)
(?<!\d) checks that there is no digit in front of the sequence of digits. (?<!pattern) is zero-width negative look-behind, which means that without consuming text, it checks that looking behind from the current position, there is no match for the pattern.
(?!\d) checks that there is no digit after the sequence of digits. (?!pattern) is zero-width negative look-ahead, which means that without consuming text, it checks that looking ahead from the current position, there is no match for the pattern.
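To see the two look-arounds at work, here is a minimal sketch. The helper name findDigitRun is made up for this illustration; only the pattern comes from the answer above.

```php
<?php
// Hypothetical helper, used only for this illustration.
function findDigitRun(string $input): ?string
{
    // 5 to 9 digits with no digit directly before or after the run.
    if (preg_match('/(?<!\d)\d{5,9}(?!\d)/', $input, $match)) {
        return $match[0];
    }
    return null;
}

var_dump(findDigitRun('foo 123456'));          // string(6) "123456"
var_dump(findDigitRun('123456789012'));        // NULL, the run is 12 digits long
var_dump(findDigitRun('id 1234 and 9876543')); // string(7) "9876543"
```

The 12-digit input is rejected outright instead of being cut down to its first 9 digits.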

Use a "local" non-greedy like .*?
<?php
$string = "foo 123456 bar"; // also works with "foo 123456", "123456", etc.
if (preg_match("/.*?(\d{5,9}).*/", $string, $match)) {
print_r($match);
};
Result:
Array
(
[0] => foo 123456 bar
[1] => 123456
)
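For comparison, a small sketch that runs the greedy pattern from the question and the lazy one side by side on the same input (the variable names $greedy and $lazy are just for this example):

```php
<?php
$string = 'foo 123456 bar';

// Greedy prefix: .* grabs as much as it can, then backtracks only until
// 5 digits are left for the capture group.
preg_match('/.*(\d{5,9}).*/', $string, $greedy);

// Lazy prefix: .*? grabs as little as it can, so the capture group
// starts at the first digit and takes the whole run.
preg_match('/.*?(\d{5,9}).*/', $string, $lazy);

echo $greedy[1], "\n"; // 23456
echo $lazy[1], "\n";   // 123456
```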
For more information: http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification

Related

trivial regex assistance

I have a string as follows:
$str="1-3";
When I pass it through here:
preg_match('#(\\d+)\\s*-\\s*(\\d+)#', $str, $matches);
I get:
$matches[0] //1-3
$matches[1] //1
$matches[2] //3
Now if you pass something like this:
$str="a-3";
You get
$matches //empty
This is correct since it is restricted to only integers.
Now my problem is that I want to implement something that works the same way, but for letters.
Here's what I have so far
preg_match('#(\\w+)\\s*-\\s*(\\w+)#', $str, $matches);
$str="a-d";
I get:
$matches[0] //a-d
$matches[1] //a
$matches[2] //d
This works great. However, if you do this (notice the integer):
$str="a-5";
I get:
$matches[0] //a-5
$matches[1] //a
$matches[2] //5
What I need is to allow only alphabetic characters in the second regex, so that passing a-5 is flagged as an error.
Essentially, I need the behaviour of the first regex applied to the second one, but with letters only.
Simply change each capturing group to ([a-zA-Z]+), like this:
([a-zA-Z]+)\s*-\s*([a-zA-Z]+)
\w matches any alphanumeric character plus the underscore (_). If you only want to match letters, you need to give the letter ranges explicitly: a-z for lowercase letters and A-Z for uppercase letters.
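A quick sketch of the letters-only pattern in use. The helper name parseLetterRange and the ^ and $ anchors are additions for this example, not part of the answer; the anchors make the whole string have to be a letter range, so input like "1a-b2" is rejected too.

```php
<?php
// Hypothetical helper: returns [from, to] for a letter range, or null on error.
function parseLetterRange(string $str): ?array
{
    // ^ and $ are an assumption of this sketch (whole-string match).
    if (preg_match('/^([a-zA-Z]+)\s*-\s*([a-zA-Z]+)$/', $str, $matches)) {
        return [$matches[1], $matches[2]];
    }
    return null;
}

foreach (['a-d', 'a-5', 'A - z', '1a-b2'] as $str) {
    $range = parseLetterRange($str);
    echo $str, ' : ', $range === null ? 'error' : implode(' to ', $range), "\n";
}
```

Without the anchors, "1a-b2" would match on the "a-b" in the middle, which may or may not be what you want.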
You could use the Unicode property \pL, which means any letter in any language:
$arr = [
'a-d',
'1-5',
'1-d',
'ç-é',
];
foreach($arr as $str) {
if (preg_match('/(\pL)\s*-\s*(\pL)/u', $str, $matches)) {
print_r($matches);
} else {
echo "$str : error\n";
}
}
Output:
Array
(
[0] => a-d
[1] => a
[2] => d
)
1-5 : error
1-d : error
Array
(
[0] => ç-é
[1] => ç
[2] => é
)

regex inside tags with specified string

I'm not very good at regex but i have a string like this :
$str = '<span id="MainStatuSSpan" style="background: brown;"> Incoming: 012345678 Group- SUPERMONEY Fronter: - 992236 UID: Y3281602190002004448</span>';
$pattern = '/(?:Fronter: - )[0-9]{1,6}/i';
preg_match($pattern, $str, $matches);
print_r($matches);
/*** ^^^^^^^ This prints :*/
Array ( [0] => Fronter: - 992236 )
When Fronter is not followed by - or spaces, I don't get the Fronter number.
Can anyone help with an example that works in every case? There is always a Fronter and a number.
You can use Fronter:\W*[0-9]{1,6}
Fronter: : match the literal text Fronter:
\W* : zero or more non-word characters
[0-9]{1,6} : one to six digits
Your regex will also find a match in Fronter:99222236 (it takes the first six digits), so you must use \b to reject runs of more than six digits:
Fronter:[- ]*[0-9]{1,6}\b
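Here is a rough sketch of that last pattern against a few separator variants. The helper name fronterNumber, the capture group around the digits and the shortened sample strings are additions for this illustration.

```php
<?php
// Hypothetical helper: returns the Fronter number, or null if none is found.
function fronterNumber(string $text): ?string
{
    // The capture group is added here so only the digits are returned.
    if (preg_match('/Fronter:[- ]*([0-9]{1,6})\b/i', $text, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(fronterNumber('Fronter: - 992236 UID: Y3281')); // string(6) "992236"
var_dump(fronterNumber('Fronter:992236'));               // string(6) "992236"
var_dump(fronterNumber('Fronter: 42 UID'));              // string(2) "42"
var_dump(fronterNumber('Fronter:99222236'));             // NULL, more than six digits
```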

PHP Regex pulling text after period and before space

I'm attempting to pull a certain part out of different varying strings, and am having a really hard time getting the correct regex to do so. Here are a few examples of what I am trying to pull from:
AG055.MA - MAGNUM (Want to return just MA)
WI460.16 - SOMETHING (Want to return 16)
AG055.QB (Want to return QB)
So basically, I just want to pull the characters after the period, but before the space. Nothing else before or after. Can someone give me a hand with getting the correct regex?
This should work:
<?php
preg_match( '/\.([^ ]+)/', $text, $matches );
print_r( $matches );
?>
Output:
Array
(
[0] => .MA
[1] => MA
)
Array
(
[0] => .16
[1] => 16
)
Array
(
[0] => .QB
[1] => QB
)
The regex is saying find a . character, then get any characters after it that are not a space character. The + makes it only return matches where there is a non-space character after the dot.
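The snippet above does not define $text, so here is a minimal loop over the three sample strings from the question (the $inputs and $codes names are just for this example):

```php
<?php
$inputs = ['AG055.MA - MAGNUM', 'WI460.16 - SOMETHING', 'AG055.QB'];
$codes = [];

foreach ($inputs as $text) {
    // A literal dot, then one or more non-space characters (captured).
    if (preg_match('/\.([^ ]+)/', $text, $matches)) {
        $codes[] = $matches[1];
    }
}

print_r($codes); // MA, 16, QB
```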
preg_match('/\w+\.(\w{2})\s/', $input, $matches);
echo $matches[1];
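\w+ means 1 or more word characters (a-z, A-Z, 0-9 and the underscore).
\. means the period/dot (the backslash is to escape it, because . is used as an operator in regex)
(\w{2}) matches 2 word characters
\s means whitespace, so the input must have whitespace after the two characters (a bare "AG055.QB" will not match)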
preg_match('/^[A-Z0-9]{5}\.([A-Z0-9]{2})/', $string, $matches);
var_dump($matches);
Should return the characters in $matches[1].

Split string on non-alphanumeric characters and on positions between digits and non-digits

I'm trying to split a string by non-alphanumeric delimiting characters AND between alternations of digits and non-digits. The end result should be a flat array consisting of alphabetic strings and numeric strings.
I'm working in PHP, and would like to use REGEX.
Examples:
ES-3810/24MX should become ['ES', '3810', '24', 'MX']
CISCO1538M should become ['CISCO' , '1538', 'M']
The input can start with either a DIGIT sequence or an ALPHA sequence.
The separators can be non-ALPHA and non-DIGIT chars, as well as a change from a DIGIT sequence to an ALPHA sequence, and vice versa.
The function that matches all occurrences of a regex is preg_match_all(), which fills a multidimensional array of results. The regex is very simple: any digit ([0-9]) one or more times (+), or (|) any letter ([[:upper:][:lower:]]) one or more times (+). The POSIX classes [:upper:] and [:lower:] cover all uppercase and lowercase letters; a range such as [A-z] would also match the punctuation characters that sit between Z and a in ASCII.
The textarea and php tags are included for convenience, so you can drop this into your php file and see the results.
<textarea style="width:400px; height:400px;">
<?php
foreach( array(
"ES-3810/24MX",
"CISCO1538M",
"123ABC-ThatsHowEasy"
) as $string ){
// get all matches into an array
preg_match_all("/[0-9]+|[[:upper:][:lower:]]+/",$string,$matches);
// it is the 0th match that you are interested in...
print_r( $matches[0] );
}
?>
</textarea>
Which outputs in the textarea:
Array
(
[0] => ES
[1] => 3810
[2] => 24
[3] => MX
)
Array
(
[0] => CISCO
[1] => 1538
[2] => M
)
Array
(
[0] => 123
[1] => ABC
[2] => ThatsHowEasy
)
$str = "ES-3810/24MX35 123 TEST 34/TEST";
$str = preg_replace(array("#[^A-Z0-9]+#i","#\s+#","#([A-Z])([0-9])#i","#([0-9])([A-Z])#i"),array(" "," ","$1 $2","$1 $2"),$str);
echo $str;
$data = explode(" ",$str);
print_r($data);
I could not think of a cleaner way.
The most direct preg_ function to produce the desired flat output array is preg_split().
Because it doesn't matter what combination of alphanumeric characters are on either side of a sequence of non-alphanumeric characters, you can greedily split on non-alphanumeric substrings without "looking around".
After that preliminary obstacle is dealt with, then split on the zero-length positions between a digit and a non-digit OR between a non-digit and a digit.
/ #starting delimiter
[^a-z\d]+ #match one or more non-alphanumeric characters
| #OR
\d\K(?=\D) #match a number, then forget it, then lookahead for a non-number
| #OR
\D\K(?=\d) #match a non-number, then forget it, then lookahead for a number
/ #ending delimiter
i #case-insensitive flag
Code:
var_export(
preg_split('/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i', $string, 0, PREG_SPLIT_NO_EMPTY)
);
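Since $string is not defined in the snippet, here is a self-contained sketch that feeds the two sample inputs from the question through the same pattern. The wrapper name splitAlnum is made up for this example.

```php
<?php
// Hypothetical wrapper around the preg_split() call shown above.
function splitAlnum(string $string): array
{
    return preg_split(
        '/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i',
        $string,
        0,
        PREG_SPLIT_NO_EMPTY
    );
}

foreach (['ES-3810/24MX', 'CISCO1538M'] as $string) {
    // json_encode() is used only to print each result on one line.
    echo json_encode(splitAlnum($string)), "\n";
}
```

This should give the arrays asked for in the question: ES, 3810, 24, MX and CISCO, 1538, M.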
preg_match_all() isn't a silly technique, but it doesn't return the array; it returns the number of matches and fills a reference variable with a two-dimensional array, whose first element must be accessed. Admittedly, the pattern is shorter and easier to follow.
var_export(
preg_match_all('/[a-z]+|\d+/i', $string, $m) ? $m[0] : []
);

How do i break string into words at the position of number

I have some string data with alphanumeric values, like us01name, phc01name and others, i.e. letters + number + letters.
I would like to get the first letters + number in one string and the remainder in a second.
How can I do it in PHP?
You can use a regular expression:
// if statement checks there's at least one match
if(preg_match('/([A-Za-z]+[0-9]+)([A-Za-z]+)/', $string, $matches) > 0){
$firstbit = $matches[1];
$nextbit = $matches[2];
}
Just to break the regular expression down into parts so you know what each bit does:
( Begin group 1
[A-Za-z]+ As many letters as there are (upper or lower case)
[0-9]+ As many digits as there are
) End group 1
( Begin group 2
[A-Za-z]+ As many letters as there are (upper or lower case)
) End group 2
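A note on the character class: the range A-z covers more than letters, because in ASCII the characters [ \ ] ^ _ and ` sit between Z and a. A small sketch of the difference (the $loose and $strict names are just for this example):

```php
<?php
// "_" falls inside the A-z range, so the loose class accepts it.
$loose  = preg_match('/^[A-z]+$/', 'foo_bar');

// The explicit ranges accept letters only.
$strict = preg_match('/^[A-Za-z]+$/', 'foo_bar');

var_dump($loose);  // int(1)
var_dump($strict); // int(0)
```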
Try this code:
preg_match('~([^\d]+\d+)(.*)~', "us01name", $m);
var_dump($m[1]); // 1st string + number
var_dump($m[2]); // 2nd string
OUTPUT
string(4) "us01"
string(4) "name"
Even this more restrictive regex will also work for you:
preg_match('~([A-Z]+\d+)([A-Z]+)~i', "us01name", $m);
You could use preg_split on the digits with the delimiter capture flag. It returns all the pieces, so you'd have to put them back together. However, in my opinion it is more intuitive and flexible than a complete pattern regex. Plus, preg_split() is underused :)
Code:
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($pieces);
Output:
Array
(
[0] => user
[1] => 01
[2] => jason
)
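Putting the pieces back together could look something like this. It is a sketch only: the helper name splitAtNumber is hypothetical, and returning null when the string contains no digits is an assumption about what you want in that case.

```php
<?php
// Hypothetical helper: returns [letters + first number, remainder], or null.
function splitAtNumber(string $str): ?array
{
    $pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    // With no digits there is nothing to split on (assumed to be an error).
    if (count($pieces) < 3) {
        return null;
    }

    $first = $pieces[0] . $pieces[1];              // e.g. "us" . "01"
    $rest  = implode('', array_slice($pieces, 2)); // everything after the first number

    return [$first, $rest];
}

print_r(splitAtNumber('us01name'));  // us01, name
print_r(splitAtNumber('phc01name')); // phc01, name
```

Any later digit runs stay in the second string, so 'a1b2c' comes back as 'a1' and 'b2c'.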
