Look inside pattern if parent pattern matches and share chars between patterns - php

I have a string like this:
Tickets order: № 123123123. CED-MSW-RPG-MOW-CEK PODYLOVA/ALEMR 555
423578932 19OCT11 Tickets order: № 123123123. 346257.
CSK-MOW-PRG-MOW-CWQ PODYLOVA/ALEMR 555 45837043 19OCT11
I need to collect all the codes: CEK, MOW, PRG and so on. I first tried this pattern:
$pattern = '#[-|\s]([A-Z]{3})#';
As a result I get all my codes (that's fine), but also the first three characters of the user's surname: "POD" from "PODYLOVA". If I require that a hyphen or a whitespace character must follow the code, by changing my pattern to this:
$pattern = '#[-|\s]([A-Z]{3})[-|\s]#';
My $matches var has this:
array (
0 =>
array (
0 => ' CED-',
1 => '-RPG-',
2 => '-CEK ',
3 => ' CSK-',
4 => '-PRG-',
5 => '-CWQ ',
),
1 =>
array (
0 => 'CED',
1 => 'RPG',
2 => 'CEK',
3 => 'CSK',
4 => 'PRG',
5 => 'CWQ',
),
)
As you can see, my pattern doesn't "share" the hyphen between adjacent codes.
I see two solutions, but cannot come up with a suitable pattern for either:
1. Make the pattern share the hyphen between codes.
2. Use a more complicated approach: first capture the text which contains the codes ("CED-MSW-RPG-MOW-CEK") and then get all #([A-Z]{3})# matches inside it.
Solution 1 seems best in my case, but what should it look like?

Try this:
\b([A-Z]{3})\b
HTH

Does this give you what you want?
(?<=-|\s)[A-Z]{3}(?=-|\s)
tested with grep:
kent$ echo "Tickets order: № 123123123. CED-MSW-RPG-MOW-CEK PODYLOVA/ALEMR 555 423578932 19OCT11 Tickets order: № 123123123. 346257. CSK-MOW-PRG-MOW-CWQ PODYLOVA/ALEMR 555 45837043 19OCT11"|grep -Po '(?<=-|\s)[A-Z]{3}(?=-|\s)'
CED
MSW
RPG
MOW
CEK
CSK
MOW
PRG
MOW
CWQ
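
For PHP, the same lookaround idea should work with preg_match_all. This is a minimal sketch, assuming the input is the sample string from the question joined into one line; the variable names are only illustrative. Because the lookbehind and lookahead check the hyphen without consuming it, two adjacent codes can share it.

```php
<?php
// Sample input from the question, joined into one line.
$text = 'Tickets order: № 123123123. CED-MSW-RPG-MOW-CEK PODYLOVA/ALEMR 555 '
      . '423578932 19OCT11 Tickets order: № 123123123. 346257. '
      . 'CSK-MOW-PRG-MOW-CWQ PODYLOVA/ALEMR 555 45837043 19OCT11';

// (?<=[-\s]) : preceded by a hyphen or whitespace (checked, not consumed)
// [A-Z]{3}   : exactly three uppercase letters
// (?=[-\s])  : followed by a hyphen or whitespace (checked, not consumed)
preg_match_all('/(?<=[-\s])[A-Z]{3}(?=[-\s])/', $text, $matches);

$codes = $matches[0];
print_r($codes);
```

Note that a code sitting at the very start or end of the string would not be found by this sketch, since both lookarounds require an actual character; adding ^ and $ as alternatives inside them would be one way to cover that case.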

Related

REGEX Pattern for Validation that check all string is integer and split into single integers

I have tried several times to write a pattern that validates that a given string is a natural number and splits it into single digits.
Given my limited understanding of regex, the closest thing I could come up with is
^([1-9])([0-9])*$ or ^([1-9])([0-9])([0-9])*$, something like that.
It only captures the first and last digits, or the first, second and last digits.
I wonder what I need to know to solve this problem. Thanks.
You may use a two step solution like
if (preg_match('~\A\d+\z~', $s)) { // if a string is all digits
print_r(str_split($s)); // Split it into chars
}
See a PHP demo.
A one step regex solution:
(?:\G(?!\A)|\A(?=\d+\z))\d
See the regex demo
Details
(?:\G(?!\A)|\A(?=\d+\z)) - either the end of the previous match (\G(?!\A)) or (|) the start of the string (\A) that is followed by one or more digits up to the end of the string ((?=\d+\z))
\d - a digit.
PHP demo:
$re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
$str = '1234567890';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
[9] => 0
)
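
To see the validation part of the one-step pattern at work, it can be wrapped in a small function. This is only a sketch: splitDigits is a hypothetical helper name, not something from the answer. If the string contains anything other than digits, the lookahead at \A fails, no first match is made, and \G never gets a chance to continue, so the result is an empty array.

```php
<?php
/**
 * Hypothetical helper: returns the digits of $s as an array of
 * one-character strings, or an empty array when $s is not digits only.
 */
function splitDigits(string $s): array
{
    // \A(?=\d+\z) lets the first match start only if the whole string is digits;
    // \G(?!\A) lets every later match start exactly where the previous one ended.
    preg_match_all('/(?:\G(?!\A)|\A(?=\d+\z))\d/', $s, $matches);
    return $matches[0];
}

print_r(splitDigits('2024'));  // four one-character strings: 2, 0, 2, 4
print_r(splitDigits('12a4'));  // empty array: the lookahead at \A fails
```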

Get all matches with pure regex?

I'm working in PHP and need to parse strings looking like this:
Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
I need to get the rake, pot, and rake contribution per player with names. The number of players is variable. Order is irrelevant so long as I can match player name to rake contribution in a consistent way.
For example I'm looking to get something like this:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => andy
[4] => 10
[5] => bob
[6] => 20
[7] => cindy
[8] => 70
)
I was able to come up with a regex which matches the string but it only returns the last player-rake contribution pair
^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \((?:([a-z]*): ([0-9]*)(?:, )?)*\)$
Outputs:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => cindy
[4] => 70
)
I've tried using preg_match_all and g modifiers but to no success. I know preg_match_all would be able to get me what I wanted if I ONLY wanted the player-rake contribution pairs but there is data before that I also require.
Obviously I can use explode and parse the data myself but before going down that route I need to know if/how this can be done with pure regex.
You could use the below regex,
(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))
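The | at the end of the first non-capturing group gives that group an empty alternative. When the "Rake ... Players (" prefix is not found at the current position, the group matches nothing and the regex engine carries on with the part of the pattern that follows it, so each remaining player pair is matched on its own.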
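
A possible way to consume the matches in PHP is sketched below, assuming preg_match_all with PREG_SET_ORDER; the variable names are illustrative. Groups 1 and 2 are filled only in the first match (the one that consumed the prefix) and come back as empty strings in the later ones.

```php
<?php
$line = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';
$re = '/(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))/';

$rake = null;
$pot = null;
$players = [];

// PREG_SET_ORDER yields one array per player; groups 1 and 2 are
// non-empty only in the first match, where the prefix was consumed.
preg_match_all($re, $line, $sets, PREG_SET_ORDER);
foreach ($sets as $m) {
    if ($m[1] !== '') {
        $rake = (int) $m[1];
        $pot = (int) $m[2];
    }
    $players[$m[3]] = (int) $m[4];
}

print_r(['rake' => $rake, 'pot' => $pot, 'players' => $players]);
```

Keep in mind that this pattern does not validate the line as a whole: if the prefix is malformed, the empty alternative still lets the player pairs match, and $rake and $pot simply stay null.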
I would use the following Regex to validate the input string:
^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \(((?:\w*: \d*(?:, )?)+)\)$
And then just use the explode() function on the last capture group to split the players out. Named groups are numbered as well (Rake is 1, Pot is 2), so the player list is group 3:
preg_match($regex, $string, $matches);
$players = explode(', ', $matches[3]);
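
Put together, the validate-then-explode approach might look like the sketch below. The delimiters around the pattern and the second explode() on ': ' are my additions for illustration. Since named groups also get a number, the unnamed player list ends up in group 3.

```php
<?php
$string = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';
$regex = '/^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \(((?:\w*: \d*(?:, )?)+)\)$/';

$rake = null;
$pot = null;
$players = [];

if (preg_match($regex, $string, $matches)) {
    $rake = (int) $matches['Rake'];
    $pot = (int) $matches['Pot'];

    // Named groups are also available by number: Rake is 1, Pot is 2,
    // so the unnamed player list is group 3.
    foreach (explode(', ', $matches[3]) as $pair) {
        [$name, $amount] = explode(': ', $pair);
        $players[$name] = (int) $amount;
    }
}

print_r($players);
```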

How to capture multiple occurences of subpattern into one capture?

I want a regex expression which will capture multiple occurrences into one group. As an example, imagine the following phrases:
cat | likes her | mat
dog | goes to his | basket
I want to be able to capture each part of the phrase into a fixed position
array(
0 => cat likes her mat
1 => cat
2 => likes her
3 => mat
)
Obviously using:
$regex = '/(cat|dog)( likes| goes| to| his| her)* (mat|basket)/';
preg_match($regex, "The cat likes her mat", $m);
gives:
array(
0 => cat likes her mat
1 => cat
2 => likes
3 => her
4 => mat
)
But I always want mat/basket in $m[3], regardless of how many words are matched in the middle.
I have tried this:
$regex = '/(cat|dog)(?:( likes| goes| to| his| her)*) (mat|basket)/';
to try and prevent capturing of the multiple subpatterns, but this causes only the first word to be captured i.e.
array(
0 => cat likes her mat
1 => cat
2 => likes
3 => mat
)
Does anyone know how I can capture the whole middle part of the phrase (an unknown number of words long) and still get it into a predictable position in the output?
btw I cannot use (cat|dog).*?(mat|basket) because there are only specified words which are allowed in the middle.
The above is just an example; the actual usage has many more options for each of the subpatterns.
Thanks.
did you try this pattern:
/\b(cat|dog) ((?: ?(?:likes|goes|to|his|her)\b)*) ?(mat|basket)\b/
How about this pattern?
$regex = '/\b(cat|dog)\b((?:\b(?:\s+|likes|goes|to|his|her)\b)*)\b(mat|basket)\b/';
preg_match($regex, "The cat likes her mat", $m);
I have this result:
array (size=4)
0 => string 'cat likes her mat' (length=17)
1 => string 'cat' (length=3)
2 => string ' likes her ' (length=11)
3 => string 'mat' (length=3)
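
The key point is that the capturing group wraps the whole repetition instead of being repeated itself, so it keeps every middle word rather than only the last one. A small sketch of how the result could be tidied up follows; parsePhrase is a hypothetical wrapper name and the trim() call is my addition to drop the spaces captured at both ends.

```php
<?php
/**
 * Hypothetical wrapper around the pattern above: returns
 * [animal, middle words, object] or null when nothing matches.
 */
function parsePhrase(string $text): ?array
{
    $regex = '/\b(cat|dog)\b((?:\b(?:\s+|likes|goes|to|his|her)\b)*)\b(mat|basket)\b/';
    if (!preg_match($regex, $text, $m)) {
        return null;
    }
    // Group 2 wraps the whole repetition, so it holds every middle word;
    // trim() drops the spaces captured at both ends.
    return [$m[1], trim($m[2]), $m[3]];
}

print_r(parsePhrase('The cat likes her mat'));
print_r(parsePhrase('The dog goes to his basket'));
```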
I voted for Casimir's answer; however, his pattern returns false positives on these strings:
cat likesher mat
cat likes her mat
cat mat

Separating a few things with preg_split

For the life of me, I can't figure out how to write the regex to split this.
Let's say we have the sample text:
15HGH(Whatever)ASD
I would like to break it down into the following groups (numbers, individual letters, and the contents of parentheses):
15
H
G
H
Whatever
A
S
D
It can have any combination of the above such as:
15HGH
12ABCD
ABCD(Whatever)(test)
So far, I have managed to split off either the numbers/letters or just the parenthesized parts, but not both at once. For example, in this case:
<?php print_r(preg_split( "/(\(|\))/", "5(Test)(testing)")); ?>
It will give me
Array
(
[0] => 5
[1] => Test
[2] => testing
)
I am not really sure what to put in the regex to match on only numbers and individual characters when combined. Any suggestions?
I don't know whether preg_match_all would satisfy you:
$text = '15HGH(Whatever)ASD';
preg_match_all("/([a-z]+)(?=\))|[0-9]+|([a-z])/i", $text, $out);
echo '<pre>';
print_r($out[0]);
Array
(
[0] => 15
[1] => H
[2] => G
[3] => H
[4] => Whatever
[5] => A
[6] => S
[7] => D
)
I've got this (I don't know how to write the \n in the replacement, but the substitution works):
(\d+|\w|\([^)]++\))
There is not much to explain: it first tries to match a number, then a single character, and if neither is there, it tries to match a whole group between parentheses. (They can't be nested.)
Check this out using preg_match_all():
$string = '15HGH(Whatever)(Whatever)ASD';
preg_match_all('/\(([^\)]+)\)|(\d+)|([a-z])/i', $string, $matches);
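// 'strlen' keeps the string "0", which a bare array_filter() would drop as falsy.
// Note: the merged results are grouped by type (parentheses, numbers, letters), not in input order.
$results = array_merge(array_filter($matches[1], 'strlen'), array_filter($matches[2], 'strlen'), array_filter($matches[3], 'strlen'));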
print_r($results);
\(([^\)]+)\) --> Matches everything between parentheses
\d+ --> Numbers only
[a-z] --> Single letters only
i --> Case insensitive
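
If the tokens are needed in the order they appear in the input, the alternation used in the answers above can be read match by match instead of group by group. The following is a sketch under that assumption; tokenize is a hypothetical helper name, and parentheses are assumed not to be nested.

```php
<?php
/**
 * Hypothetical helper: splits a string into numbers, single letters
 * and parenthesized contents, in the order they appear.
 */
function tokenize(string $text): array
{
    // Alternatives: a run of digits, one letter, or a (non-nested)
    // parenthesized group whose contents go into capture group 1.
    preg_match_all('/\d+|[a-z]|\(([^)]+)\)/i', $text, $sets, PREG_SET_ORDER);

    $tokens = [];
    foreach ($sets as $m) {
        // Group 1 is filled only when the parenthesized alternative matched.
        $tokens[] = (isset($m[1]) && $m[1] !== '') ? $m[1] : $m[0];
    }
    return $tokens;
}

print_r(tokenize('15HGH(Whatever)ASD'));
```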

regex match between 2 strings

For example I have the text
a1aabca2aa3adefa4a
I want to use a regex to extract 2 and 3, which lie between abc and def; 1 and 4 should not be included in the result.
I tried this
if(preg_match_all('#abc(?:a(\d)a)+def#is', file_get_contents('test.txt'), $m, PREG_SET_ORDER))
print_r($m);
I get this
Array
(
[0] => Array
(
[0] => abca2aa3adef
[1] => 3
)
)
But I want this
Array
(
[0] => Array
(
[0] => abca2aa3adef
[1] => 2
[2] => 3
)
)
Is this possible with one preg_match_all call? How can I do it?
Thanks
preg_match_all(
'/\d # match a digit
(?=.*def) # only if followed by <anything> + def
(?!.*abc) # and not followed by <anything> + abc
/x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
This works on your example. It assumes that there is exactly one instance of abc and def per line in your string.
The reason why your attempt didn't work is that your capturing group (\d) that matches the digit is within another, repeated group (?:a(\d)a)+. With every repetition, the result of the capture is overwritten. This is how regular expressions work.
In other words - see what's happening during the match:
Current position    Current part of regex    Capturing group 1
--------------------------------------------------------------
a1a                 no match, advancing...   undefined
abc                 abc                      undefined
a2a                 (?:a(\d)a)               2
a3a                 (?:a(\d)a) (repeated)    3 (overwrites 2)
def                 def                      3
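
The overwriting can be checked directly, and one way around it is to split the work into two steps: capture the whole repeated run first, then scan inside it. This is a sketch of that alternative, not the method used in the answer above; the variable names are illustrative.

```php
<?php
$subject = 'a1aabca2aa3adefa4a';

// One pass: the repeated group keeps only its last capture.
preg_match('/abc(?:a(\d)a)+def/', $subject, $single);
// $single[1] is '3'; the earlier '2' was overwritten.

// Two passes: capture the whole repeated run, then scan inside it.
$digits = [];
if (preg_match('/abc((?:a\da)+)def/', $subject, $outer)) {
    // $outer[1] is 'a2aa3a'
    preg_match_all('/\d/', $outer[1], $inner);
    $digits = $inner[0];
}

print_r($digits);
```

Unlike the lookahead version, this does not depend on there being only one abc ... def pair per line, but it does need two regex calls.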
You ask if it is possible with a single preg_match_all.
Indeed it is.
This code outputs exactly what you want for the sample input. Note that the pattern hard-codes exactly two digits between abc and def.
<?php
$subject='a1aabca2aa3adefa4a';
$pattern='/abc(?:a(\d)a+(\d)a)def/m';
preg_match_all($pattern, $subject, $all_matches,PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);
$res[0]=$all_matches[0][0][0];
$res[1]=$all_matches[1][0][0];
$res[2]=$all_matches[2][0][0];
var_dump($res);
?>
Here is the output:
array
0 => string 'abca2aa3adef' (length=12)
1 => string '2' (length=1)
2 => string '3' (length=1)
