Struggling with regular expression, removing number from string - php

I have a number of strings such as these:
Virtus.pro (13)
mousesports (16)
Natus Vincere (12)
As you can see these is no really common way splitting the name from the number in all cases.
I'm really new to Regex. Does anyone have any ideas how I could split these strings to contain 2 variables?
Virtus.pro and 13. then mousesports and 16?
As you can see the Natus Vincere one has a space between the two parts of the name.
Really struggling, I've only been able to come up with a regex for extracting the number. But this doesn't work everytime.

I think you're looking for something like this:
$data = [
"Virtus.pro (13)",
"mousesports (16)",
"Natus Vincere (12)"
];
foreach ($data as $string) {
$matches = [];
preg_match('/(.*)\s\((\d+)\)/', $string, $matches);
list(, $team, $score) = $matches;
var_dump($team, $score);
}
Output:
string(10) "Virtus.pro"
string(2) "13"
string(11) "mousesports"
string(2) "16"
string(13) "Natus Vincere"
string(2) "12"
The idea is to look for a substring followed by a space, opening parenthesis, some digits, and a closing parenthesis. The leading substring and the digits are snagged up in capturing groups then spit out into $team and $score.

r'([a-zA-Z. ]+) (\(\d{1,2}\))'
I tried this one in python, it works for me.
You'd better provide more details I think, for example, the format of the names, which kind of punctuation it contains, and the number, how many digits it has, etc.
In my answer above, the name string can contains '.' and ' ', and the number will be 1 or 2 digits.
you can change it to
r'([a-zA-Z. ]+) \((\d+)\)'
to match a number that you don't know how many digits it contains.
it groups the match results by the way, the second group (index 1) is the name, the third group (index 2) is the number.
>>> import re
>>> are=re.compile(r'([a-zA-Z. ]+) \((\d{1,2})\)')
>>> d=are.search('Virtus.pro (13)')
>>> d.group()
'Virtus.pro (13)'
>>> d.group(1)
'Virtus.pro'
>>> d.group(2)
'13'
hope it helps.

Hi you can use something like this
#!/usr/bin/env python
import re
regex = re.compile('^(.*)\((\d+)\)$')
my_match = regex.match('Virtus.pro (13)')
You can then do:
m.group(1) #to get 'Virtus.pro '
m.group(2) #to get '13'
This is implemented in python btw

Related

Match string with 1 or more trailing substrings

I have an input that goes like this
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1(can have as many 1-37 # 0-9 as needed)"
How do I write a regex match that can allow the last part of the string to be dynamic (matches as many groups as needed as inputted)
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.
Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code: (Demo)
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
I'm guessing that maybe some expression similar to,
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or with some slight changes, would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
RegEx Circuit
jex.im visualizes regular expressions:

How to get numbers between a long space (PHP Regex)

I'd like to extract the numbers specifically with a PHP regex expression, I don't get the regex very much although I'm currently trying with the regex101 website. Thing is, I have this:
66
28006 MadridVer teléfono
(Literally that, it's seen with a lot of more spaces and 28006 MadridVer teléfono is presented in the next line actually). And I'd like to extract the number 28006 or at least split the findings of the expression in a way I have the 28006 separately in one of the groups. What would be my php regex expresion like? Maybe apart from capturing spaces I should capture a new line or something. But I am totally lost in this (yes, I'm an absolute regex novice yet).
I don't see a need for regex.
Remove the new line and explode on space.
Then use array_filter to remove empty values from the array and rearrange the array with array_values.
$str = "66
28006 MadridVer teléfono";
$str = str_replace("\n", " ", $str);
$arr = explode(" ", $str);
$arr = array_values(array_filter($arr));
var_dump($arr);
Returns:
array(4) {
[0]=>
string(2) "66"
[1]=>
string(5) "28006"
[2]=>
string(9) "MadridVer"
[3]=>
string(9) "teléfono"
}

PHP how to get the second number present in a string

I would like to get the second number present in a string.
Maybe you have better ideas than mine.
From this example:
1 PLN = 0.07 Gold
I would like to get only "0.07" from that string.
I obtain it from web scraping so it returns me as a string.
The problem that I have is the following.
The "second number in string" might be with ".", without it, might be composed by only one number ex. "1", or by two "12", might have decimals ex. "1.2", even position may change, because for some currency I will have "1 PLN", "1 USD" for others I will have "1 TW".
So I can't work on position, I can't extract only numbers (I have the "1" at the beginning of the string), I can't extract only INT cause I could have also decimals...
So the only constant of that string - I think (but if you have better ideas pls suggest me) - is that I need the second number I find in the string.
How could I get it?
Sorry If I wasn't enough clear.
Try this:
<?php
$string = '1 PLN = 0.07 Gold';
$pattern = '/\d+\.\d+/';
$matches = array();
$r = preg_match($pattern, $string, $matches);
var_dump($matches); //$matches is array with results
Output:
array(1) {
[0]=>
string(4) "0.07"
}
If its always in this format 1 PLN = 0.07 Gold You can just
$array = explode(" ", $string) with a Space and then get the required number with
$number = $array[3]
Try it out, let me know if it works
You can use a simple regex patter to isolate all numbers, including decimals if any after the equals sign.
The below works if the string will have the same structure, meaning an = a space and then the number that you are after.
= - matches the equals sign
\s - matches the space character immediately after
(\d*\.?\d*) - matches any number of digits followed by an optional period . and then any number of digits
$str = '1 PLN = 0.07 Gold';
preg_match('#=\s(\d*\.?\d*)#',$str,$matches);
print $matches[1];
Will output
0.07
This works regardless of what you have before the = sign.

php preg_match_all returning array of arrays

I want to replace some template tags:
$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);
var_dump($matches);
output is:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(7) "{first}"
}
[1]=> array(2) {
[0]=> string(4) "name"
[1]=> string(5) "first"
}
}
why are there inside 2 arrays? How to achieve only second one?
The sort answer:
Is there an alternative? Of course there is: lookaround assertions allow you to use zero-width (non-captured) single char matches easily:
preg_match_all('/(?<=\{)\w+(?=})/', $tags, $matches);
var_dump($matches);
Will dump this:
array(1) {
[0]=>
array(2) {
[0]=>
string(4) "name"
[1]=>
string(5) "first"
}
}
The pattern:
(?<=\{): positive lookbehind - only match the rest of the pattern if there's a { character in front of it (but don't capture it)
\w+: word characters are matches
(?=}): only match preceding pattern if it is followed by a } character (but don't capture the } char)
It's that simple: the pattern uses the {} delimiter chars as conditions for the matches, but doesn't capture them
Explaining this $matches array structure a bit:
The reason why $matches looks the way it does is quite simple: when using preg_match(_all), the first entry in the match array will always be the entire string matched by the given regex. That's why I used zero-width lookaround assertions, instead of groups. Your expression matches "{name}" in its entirety, and extracts "name" through grouping.
The matches array will hold the full match on index 0, and add groups at every subsequent index, in your case that means that:
$matches[0] will contain all substrings matching /\{\w+\}/ as a pattern.
$matches[1] will contain all substrings that were captured (/\{(\w+)\}/ captures (\w+)).
If you were to have a regex like this: /\{((\w)([^}]+))}/ the matches array will look something like this:
[
0 => [
'{name}',//as if you'd written /\{\w[^}]+}/
],
1 => [
'name',//matches group (\w)([^}]+), as if you wrote (\w[^}]+)
],
2 => [
'n',//matches (\w) group
],
3 => [
'ame',//and this is the ([^}]+) group obviously
]
]
Why? simple because the pattern contains 3 matching groups. Like I said: the first index in the matches array will always be the full match, regardless of capture groups. The groups are then appended to the array in the order the appear in in the expression. So if we analyze the expression:
\{: not matches, but part of the pattern, will only be in the $matches[0] values
((\w)([^}]+)): Start of first matching group, \w[^}]+ match is grouped here, $matches[1] will contain these values
(\w): Second group, a single \w char (ie first character after {. $matches[2] will therefore contain all first characters after a {
([^}]+): Third group, matches rest of string after {\w until a } is encountered, this will make out the $matches[3] values
To better understand, and be able to predict the way $matches will get populated, I'd strongly recommend you use this site: regex101. Write your expression there, and it'll break it all down for you on the right hand side, listing the groups. For example:
/\{((\w)([^}]+))}/
Is broken down like this:
/\{((\w)([^}]+))}/
\{ matches the character { literally
1st Capturing group ((\w)([^}]+))
2nd Capturing group (\w)
\w match any word character [a-zA-Z0-9_]
3rd Capturing group ([^}]+)
[^}]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
} the literal character }
} matches the character } literally
Looking at the capturing groups, you can now confidently say what $matches will look like, and you can safely say that $matches[2] will be an array of single characters.
Of course, this may leave you wondering why $matches is a 2D array. Well, that again is really quite easy: What you can predict is how many match indexes a $matches array will contain: 1 for the full pattern, then +1 for each capture group. What you Can't predict, though, is how many matches you'll find.
So what preg_match_all does is really quite simple: fill $matches[0] with all substrings that match the entire pattern, then extract each group substring from these matches and append that value onto the respective $matches arrays. In other words, the number of arrays that you can find in $matches is a given: it depends on the pattern. The number of keys you can find in the sub-arrays of $matches is an unknown, it depends on the string you're processing. If preg_match_all were to return a 1D array, it would be a lot harder to process the matches, now you can simply write this:
$total = count($matches);
foreach ($matches[0] as $k => $full) {
echo $full . ' contains: ' . PHP_EOL;
for ($i=1;$i<$total;++$i) {
printf(
'Group %d: %s' . PHP_EOL,
$i, $matches[$i][$k]
);
}
}
If preg_match_all created a flat array, you'd have to keep track of the amount of groups in your pattern. Whenever the pattern changes, you'd also have make sure to update the rest of the code to reflect the changes made to the pattern, making your code harder to maintain, whilst making it more error-prone, too
Thats because your regex could have multiple match groups - if you have more (..) you would have more entries in your array. The first one[0] ist always the whole match.
If you want an other order of the array, you could use PREG_SET_ORDER as the 4. argument for preg_match_all. Doing this would result in the following
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(7) "name"
}
[1]=> array(2) {
[0]=> string(4) "{first}"
[1]=> string(5) "first"
}
}
this could be easier if you loop over your result in a foreach loop.
If you only interessted in the first match - you should stay with the default PREG_PATTERN_ORDER and just use $matches[1]

Regex with multiple newlines in sequence

I'm trying to use PHP's split() (preg_split() is also an option if your answer works with it) to split up a string on 2 or more \r\n's. My current effort is:
split("(\r\n){2,}",$nb);
The problem with this is it matches every time there is 2 or 3 \r\n's, then goes on and finds the next one. This is ineffective with 4 or more \r\n's.
I need all instances of two or more \r\n's to be treated the same as two \r\n's. For example, I'd need
Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow
to become
array('Hello','My','Name is\r\nShadow');
preg_split() should do it with
$pattern = "/(\\r\\n){2,}/";
What about the following suggestion:
$nb = implode("\r\n", array_filter(explode("\r\n", $nb)));
It works for me:
$nb = "Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow";
$parts = split("(\r\n){2,}",$nb);
var_dump($parts);
var_dump($parts === array('Hello','My',"Name is\r\nShadow"));
Prints:
array(3) {
[0]=>
string(5) "Hello"
[1]=>
string(2) "My"
[2]=>
string(15) "Name is
Shadow"
}
bool(true)
Note the double quotes in the second test to get the characters represented by \r\n.
Adding the PREG_SPLIT_NO_EMPTY flag to preg_replace() with Tomalak's pattern of "/(\\r\\n){2,}/" accomplished this for me.
\R is shorthand for matching newline sequences across different operating systems. You can prevent empty elements being created at the start and end of your output array by using the PREG_SPLIT_NO_EMPTY flag or you could call trim() on the string before splitting.
Code: (Demo)
$string = "\r\n\r\nHello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow\r\n\r\n\r\n\r\n";
var_export(preg_split('~\R{2,}~', $string, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_split('~\R{2,}~', trim($string)));
Output from either technique:
array (
0 => 'Hello',
1 => 'My',
2 => 'Name is
Shadow',
)

Categories