php: how to find numbers in string? - php

I have some strings like:
some words 1-25 to some words 26-50
more words 1-10
words text and words 30-100
how can I find and get from string all of the "1-25" and the "26-50" and more

If it’s integers, match multiple digits: \d+. To match the whole range expression: (\d+)-(\d+).
Maybe you also want to allow whitespace between the dash and the numbers:
(\d+)\s*-\s*(\d+)
And maybe you want to make sure that the expression stands free, i.e. isn’t part of a word:
\b(\d+)\s*-\s*(\d+)\b
\b is a zero-width match and tests for word boundaries. This expression forbids things
like “Some1 -2text” but allows “Some 1-2 text”.

You can do this with regular expressions:
echo preg_match_all('/([0-9]+)-([0-9]+)/', 'some words 1-25 to some words 26-50 more words 1-10 words text and words 30-100', $matches);
4
print_r($matches);
Array
(
[0] => Array
(
[0] => 1-25
[1] => 26-50
[2] => 1-10
[3] => 30-100
)
[1] => Array
(
[0] => 1
[1] => 26
[2] => 1
[3] => 30
)
[2] => Array
(
[0] => 25
[1] => 50
[2] => 10
[3] => 100
)
)
For each range the first value is in array[1] and the second is in array[2] at the same index.

I think this line is enough
preg_replace("/[^0-9]/","",$string);

Related

PHP: Can preg_match include unmatched groups?

Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to is parse an human readable size into bytes. This pattern fits my requirement, but only if I can retrieve matches in the absolute group order.
This can produce upto 5 match groups, which would result in a matches array with indices 0-5. However if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is the final match in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched, however the group indexes are always in the correct order.
Edit: Yes I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
The non-participating groups are just not initialized with an empty string value in PHP, so, Group 4 and 5 are null in case of '256M' string. It seems that preg_match discards those non-initialized values from the end of the array.
In your case, you can make your capturing groups non-optional, but the patterns inside optional.
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m) . "\n";
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
You can use T-Regx which can handle such cases with ease! It always checks whether a group is matched, even if it's last and unmatched. It also can differentiate between "" (matched empty) or null (unmatched):
pattern('^([0-9]+)(.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});

regex to find numbers in string that starts with 08 and are between 10-13 char

I am trying to parse a string that contain strings that are 9-11 characters long and are integers and starts with 08 or +62. How do I do this in PHP? Here's my regex thus far:
/^(\+?62|08)[0-9]{9,11}$/
so here's some sample string/integer I should be able to extract out of a long string
082298744807
087884962429
087783218768
0818809692
081224505277
+628129191929
+62812123929
It's unclear if you want to match numbers between 10-13 digits, or 9-11. In any case it's a simple fix (just count the initial two digits 08, or 62 as part of the total sum of digits. To implement this in php use preg_match_all:
$pat = "/^(?:\+?62|08)[0-9]{8,11}$/uim"; // note modfied range of digits
preg_match_all($pat, $str, $res);
print_r($res[0]);
Example:
http://ideone.com/GJYK7C
Result:
Array
(
[0] => 082298744807
[1] => 087884962429
[2] => 087783218768
[3] => 0818809692
[4] => 081224505277
[5] => +628129191929
[6] => +62812123929
[7] => +629490029944
)

PHP Regex: How to get optional text if present?

Let's take an example of following string:
$string = "length:max(260):min(20)";
In the above string, :max(260):min(20) is optional. I want to get it if it is present otherwise only length should be returned.
I have following regex but it doesn't work:
/(.*?)(?::(.*?))?/se
It doesn't return anything in the array when I use preg_match function.
Remember, there can be something else than above string. Maybe like this:
$string = "number:disallow(negative)";
Is there any problem in my regex or PHP won't return anything? Dumping preg_match returns int 1 which means the string matches the regex.
Fully Dumped:
int 1
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
You're using single character (.) matching in the case of being lazy, at the very beginning. So it stops at the zero position. If you change your preg_match function to preg_match_all you'll see the captured groups.
Another problem is with your Regular Expression. You're killing the engine. Also e modifier is deprecated many many decades before!!! and yet it was used in preg_replace function only.
Don't use s modifier too! That's not needed.
This works at your case:
/([^:]+)(:.*)?/
Online demo
I tried to prepare a regex which can probably solve your issue and also add some value to it
this regex will not only match the optional elements but will also capture in key value pair
Regex
/(?<=:|)(?'prop'\w+)(?:\((?'val'.+?)\))?/g
Test string
length:max(260):min(20)
length
number:disallow(negative)
Result
MATCH 1
prop [0-6] length
MATCH 2
prop [7-10] max
val [11-14] 260
MATCH 3
prop [16-19] min
val [20-22] 20
MATCH 4
prop [24-30] length
MATCH 5
prop [31-37] number
MATCH 6
prop [38-46] disallow
val [47-55] negative
try demo here
EDIT
I think I understand what you meant by duplicate array with different key, it was due to named captures eg. prop & val
here is the revision without named capturing
Regex
/(?<=:|)(\w+)(?:\((.+?)\))?/
Sample code
$str = "length:max(260):min(20)";
$str .= "\nlength";
$str .= "\nnumber:disallow(negative)";
preg_match_all("/(?<=:|)(\w+)(?:\((.+?)\))?/",
$str,
$matches);
print_r($matches);
Result
Array
(
[0] => Array
(
[0] => length
[1] => max(260)
[2] => min(20)
[3] => length
[4] => number
[5] => disallow(negative)
)
[1] => Array
(
[0] => length
[1] => max
[2] => min
[3] => length
[4] => number
[5] => disallow
)
[2] => Array
(
[0] =>
[1] => 260
[2] => 20
[3] =>
[4] =>
[5] => negative
)
)
try demo here

RegEx Statement Issues - PHP

I am attempting to use RegEx to strip down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first digit), away team (second city), away score (second digit), and where in the game it is (in parenthesis). This is the RegEx I have currently, but am feeling is very wrong.
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested following code snippet in php 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more non-close parens characters; \s+ matches one or more whitespace chars, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.

Difficulty with preg_match

I'm having some difficulty with preg_match. I'm trying to match roman numerals, like this:
$string='This is roman XI and some other ones: XMCIII, like this.XXVIII'."\n";
preg_match('/(\s|\.)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\s/',$string,$matches);
print_r($matches);
It should match any roman numeral preceded with whitespace or period and ending with whitespace. But it returns the following:
Array
(
[0] => XI
[1] =>
[2] =>
[3] => X
[4] => I
)
You have {0, 4} or {0,3} ranges in regex which means that those parts are optional. You get spaces because space[nothing]space becomes a valid match.
You can simply filter out the empty space results from your array using array_filter

Categories