RegEx Statement Issues - PHP - php

I am attempting to use RegEx to strip down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first number), away team (second city), away score (second number), and where the game stands (in parentheses). This is the RegEx I have currently, but I suspect it is very wrong.
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.

I tested the following code snippet in PHP 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)

Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches any character that is not a digit.
[^)]+ matches one or more characters other than a closing parenthesis; \s+ matches one or more whitespace characters, and \s* matches zero or more.
This won't work well if a city name contains a digit, and \D+ will also capture the ^ marker in front of a team name, so you may want to strip it afterwards. On a very large string the broad \D+ groups can also cause a lot of backtracking; you might consider splitting the input up and matching it piecemeal.
Generally speaking, I would avoid .*? as a pattern, as it matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
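As a rough sketch of the "piecemeal" idea, the feed can first be split into key/value pairs with parse_str(), and each mlb_s_leftN value can then be matched on its own with an anchored pattern. The sample string is shortened from the question; the $fields and $games names and the optional \^? prefix are assumptions made for this illustration, not part of the original answers.

```php
<?php
// Shortened sample of the feed from the question.
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_count=1'
         . '&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_count=1';

// Step 1: split the query-string-like feed into key/value pairs.
parse_str($content, $fields);

// Step 2: match one field value at a time, anchored at both ends.
// \^? skips the optional marker in front of a team name (assumption:
// the marker should not be part of the captured name).
$pattern = '/^\^?(?P<hometeam>\D+?)\s+(?P<homescore>\d+)\s+'
         . '\^?(?P<awayteam>\D+?)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)$/';

$games = [];
foreach ($fields as $key => $value) {
    // Only the "mlb_s_leftN" fields carry the score line.
    if (preg_match('/^mlb_s_left\d+$/', $key) && preg_match($pattern, $value, $m)) {
        $games[] = [
            'hometeam'  => $m['hometeam'],
            'homescore' => $m['homescore'],
            'awayteam'  => $m['awayteam'],
            'awayscore' => $m['awayscore'],
            'time'      => $m['time'],
        ];
    }
}

print_r($games);
```

Matching each value separately keeps the pattern short and stops a broad group such as \D+ from running across field boundaries.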

Related

Get all matches with pure regex?

I'm working in PHP and need to parse strings looking like this:
Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
I need to get the rake, pot, and rake contribution per player with names. The number of players is variable. Order is irrelevant so long as I can match player name to rake contribution in a consistent way.
For example I'm looking to get something like this:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => andy
[4] => 10
[5] => bob
[6] => 20
[7] => cindy
[8] => 70
)
I was able to come up with a regex which matches the string, but it only returns the last player-rake contribution pair:
^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \((?:([a-z]*): ([0-9]*)(?:, )?)*\)$
Outputs:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => cindy
[4] => 70
)
I've tried using preg_match_all and g modifiers, but with no success. I know preg_match_all would be able to get me what I wanted if I ONLY wanted the player-rake contribution pairs, but there is data before that which I also require.
Obviously I can use explode and parse the data myself, but before going down that route I need to know if/how this can be done with pure regex.
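You could use the regex below with preg_match_all:
(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))
The | at the end of the first non-capturing group adds an empty alternative, so the group may match nothing. The first match consumes the Rake/Pot prefix together with the first player; every later match skips the prefix and picks up the next player pair from the rest of the string using the pattern that follows the group.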
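A small sketch of how the matches from that pattern might be collected in PHP. The variable names ($rake, $pot, $players) are made up for this example; it relies on preg_match_all reporting the unmatched groups 1 and 2 as empty strings for every match after the first.

```php
<?php
$line = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';

$pattern = '/(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)'
         . '(\w+):?\s*(\d+)(?=[^()]*\))/';

preg_match_all($pattern, $line, $matches, PREG_SET_ORDER);

$rake = null;
$pot = null;
$players = [];

foreach ($matches as $m) {
    // Groups 1 and 2 are only filled on the first match, which
    // starts at the beginning of the string.
    if ($m[1] !== '') {
        $rake = (int) $m[1];
        $pot  = (int) $m[2];
    }
    // Groups 3 and 4 hold one player/contribution pair per match.
    $players[$m[3]] = (int) $m[4];
}

print_r(['rake' => $rake, 'pot' => $pot, 'players' => $players]);
```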
I would use the following regex to validate the input string:
^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \(((?:\w*: \d*(?:, )?)+)\)$
And then just use the explode() function on the last capture group to split the players out. The two named groups are also numbered 1 and 2, so the players list is group 3:
preg_match($regex, $string, $matches);
$players = explode(', ', $matches[3]);
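Put together, the validate-then-explode approach could look roughly like this. The Players group name, the $result structure, and the stricter \w+ / \d+ quantifiers are assumptions added for the example.

```php
<?php
$string = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';

// Same idea as the validation regex above, with the players list
// captured in a named group (hypothetical name "Players").
$regex = '/^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) '
       . 'Players \((?<Players>(?:\w+: \d+(?:, )?)+)\)$/';

$result = null;
if (preg_match($regex, $string, $matches)) {
    $result = [
        'rake'    => (int) $matches['Rake'],
        'pot'     => (int) $matches['Pot'],
        'players' => [],
    ];
    // Split "andy: 10, bob: 20, ..." into pairs, then each pair
    // into a name and an amount.
    foreach (explode(', ', $matches['Players']) as $pair) {
        list($name, $amount) = explode(': ', $pair);
        $result['players'][$name] = (int) $amount;
    }
}

print_r($result);
```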

Finding (regex?) 10 digits in a row (PHP)

I am facing a problem I am not able to solve. I have a string consisting of unneeded text and 10-digit numbers that always start with "2" or "6". I need to get those 10-digit numbers into an array. I thought of regex and found the article "Regular Expression for matching a numeric sequence?", which is pretty close to what I need (except for the descending/ascending part). However, as I have never been able to understand regex, I can't adapt it to my needs. If anyone could help me out here, I would highly appreciate it!
Here is a sample of my string:
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21
EAArivtg .....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN."
Thank you very much in advance!
Greetings from Greece!
As I see it, the digits in your string have a space between them; if you want to match them strictly, this is the regex:
[62]\d{2}\s*\d{7}
Explanation:
[62] # Start with 6 or 2
\d{2} # 2 more digits
\s* # any number of white spaces
\d{7} # 7 more digits
And here is PHP code that uses preg_match_all to match all occurrences of those strings:
preg_match_all("/[62]\d{2}\s*\d{7}/", $text, $matches);
Output:
Array
(
[0] => 693 7098469
[1] => 210 5014166
[2] => 210 9618677
[3] => 210 9643623
[4] => 210 9643887
[5] => 210 9914534
[6] => 697 7440896
)
Maybe like this:
<?php
$x=
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21 EAArivtg ....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN.";
$x=str_replace(' ','',$x);
preg_match_all('/((2|6)\d{9})/',$x,$matches);
print_r($matches[0]);
And the result:
Array
(
[0] => 6937098469
[1] => 2105014166
[2] => 2109618677
[3] => 2109643623
[4] => 2109643887
[5] => 2109914534
[6] => 6977440896
)
There is a pretty cool page that visualizes regex code for better understanding:
https://www.debuggex.com/
This should work:
((?:2|6)[0-9]{2} [0-9]{7})
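The two ideas above (matching with the space in place, and returning bare 10-digit strings) can be combined without stripping spaces from the whole text first. This is only a sketch: the sample text is shortened from the question, and the (?<!\d) / (?!\d) guards are an added assumption so that a longer run of digits is not partially matched.

```php
<?php
// Shortened excerpt of the sample string from the question.
$text = '.........693 7098469 - ZQH X Bop. ........210 5014166 - 0E000PA'
      . ' .. 210 9643623 - MAPIA';

// 3 digits starting with 2 or 6, optional whitespace, 7 more digits,
// not preceded or followed by another digit.
preg_match_all('/(?<!\d)[26]\d{2}\s*\d{7}(?!\d)/', $text, $matches);

// Remove the inner whitespace from each match only.
$numbers = array_map(function ($n) {
    return preg_replace('/\s+/', '', $n);
}, $matches[0]);

print_r($numbers);
```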

strpos and substr on String

I have an HTML file which contains nothing but text. There are no styles or anything.
The text looks like:
ID NAME ANOTHER-ID-11-LETTERS MAJOR
Example:
20 Paul Mark Zedd 10203040506 Software Engineering
ID and ANOTHER-ID-11-LETTERS are numbers.
NAME and MAJOR are normal text and may also contain spaces.
How can I split them up and put each field on a new line using PHP?
Expected result:
20
Paul Mark Zedd
10203040506
Software Engineering
It looks like the first item is always a number, followed by a space, followed by a name that can be anything, followed by an 11-digit number, followed by some more text.
You can use a regex built on the above details to split the string:
$test = preg_match("/([0-9]+)\s+(.*?)\s+([0-9]{11})\s+(.*)/is", "20 Paul Mark Zedd 10203040506 Software Engineering", $matchs);
print_r($matchs);
output:
Array
(
[0] => 20 Paul Mark Zedd 10203040506 Software Engineering
[1] => 20
[2] => Paul Mark Zedd
[3] => 10203040506
[4] => Software Engineering
)
Just use preg_match with this pattern:
#([\d]*)\s([a-zA-Z\s]*)\s([\d]*)\s([a-zA-Z\s]*)#
Example output:
array (
0 => '20 Paul Mark Zedd 10203040506 SoftwareEngineering',
1 => '20',
2 => 'Paul Mark Zedd',
3 => '10203040506',
4 => 'SoftwareEngineering',
)
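To get from the captured groups to the one-field-per-line result the question asks for, something like the following sketch could be used. The anchored pattern and the $output variable are assumptions for illustration, built on the field layout described in the question.

```php
<?php
$line = '20 Paul Mark Zedd 10203040506 Software Engineering';

$output = null;
// ID, name (lazy, so it stops before the 11-digit ID), 11-digit ID, major.
if (preg_match('/^(\d+)\s+(.+?)\s+(\d{11})\s+(.+)$/', $line, $m)) {
    // Drop the full match at index 0 and join the four fields with newlines.
    $output = implode("\n", array_slice($m, 1));
    echo $output, "\n";
}
```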

php: how to find numbers in string?

I have some strings like:
some words 1-25 to some words 26-50
more words 1-10
words text and words 30-100
How can I find and extract all of the ranges such as "1-25" and "26-50" from the string?
If it’s integers, match multiple digits: \d+. To match the whole range expression: (\d+)-(\d+).
Maybe you also want to allow whitespace between the dash and the numbers:
(\d+)\s*-\s*(\d+)
And maybe you want to make sure that the expression stands free, i.e. isn’t part of a word:
\b(\d+)\s*-\s*(\d+)\b
\b is a zero-width match that tests for word boundaries. This expression forbids things like “Some1 -2text” but allows “Some 1-2 text”.
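A short sketch of the word-boundary version in use. The sample text and the $ranges variable are invented for this example.

```php
<?php
// Two free-standing ranges, plus one that is glued to surrounding words.
$text = 'some words 1-25 to some words 26 - 50, but not Some1 -2text';

preg_match_all('/\b(\d+)\s*-\s*(\d+)\b/', $text, $matches, PREG_SET_ORDER);

$ranges = [];
foreach ($matches as $m) {
    // $m[1] is the start of the range, $m[2] the end.
    $ranges[] = [(int) $m[1], (int) $m[2]];
}

print_r($ranges);
```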
You can do this with regular expressions:
echo preg_match_all('/([0-9]+)-([0-9]+)/', 'some words 1-25 to some words 26-50 more words 1-10 words text and words 30-100', $matches);
4
print_r($matches);
Array
(
[0] => Array
(
[0] => 1-25
[1] => 26-50
[2] => 1-10
[3] => 30-100
)
[1] => Array
(
[0] => 1
[1] => 26
[2] => 1
[3] => 30
)
[2] => Array
(
[0] => 25
[1] => 50
[2] => 10
[3] => 100
)
)
For each range, the first value is in $matches[1] and the second is in $matches[2] at the same index.
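If you only need the digits and not the individual ranges, I think this line is enough (note that it also strips the dashes and returns a single string of digits):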
preg_replace("/[^0-9]/","",$string);

Multiline PHP Regex problem

I already tried looking here and on Google... but I can't figure out what I am doing wrong :(
I have this text:
C 1 title
comment 1
C 2 title2
comment 2
C 3 title3
comment 3
Now... what I want to do is:
Check for the C at the beginning.
Capture the number.
Capture the title.
Capture the comment.
I'm trying to use this expression:
preg_match_all("/^C (\d*) (.*)\n(.*)$/im", $body, $match);
but it only works for the first set =(
Any tips on what I am doing wrong???
Thanks!!!!
It works as expected.
The snippet:
<?php
$body = 'C 1 title
comment 1
C 2 title2
comment 2
C 3 title3
comment 3';
preg_match_all("/^C (\d*) (.*)\n(.*)$/im", $body, $match);
print_r($match);
?>
produces the following output:
Array
(
[0] => Array
(
[0] => C 1 title
comment 1
[1] => C 2 title2
comment 2
[2] => C 3 title3
comment 3
)
[1] => Array
(
[0] => 1
[1] => 2
[2] => 3
)
[2] => Array
(
[0] => title
[1] => title2
[2] => title3
)
[3] => Array
(
[0] => comment 1
[1] => comment 2
[2] => comment 3
)
)
as you can see on Ideone.
To keep your matches nicely grouped, you might want to try:
preg_match_all("/^C (\d*) (.*)\n(.*)$/im", $body, $match, PREG_SET_ORDER);
instead.
HTH
EDIT
Ideone runs: PHP Version => 5.2.12-pl0-gentoo
I also tested it on my machine (and got the same result), which runs: PHP Version => 5.3.3-1ubuntu9.5
But I can't imagine this is a versioning thing (at least, not with 5.x versions). Perhaps your line breaks are Windows style? Try this regex instead:
"/^C +(\d*) +(.*)\r?\n(.*)$/im"
I used \r?\n instead of just \n so that both Windows- and Unix-style line breaks are matched, and replaced each single space with " +" (a space followed by +) to allow for two or more spaces.
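Another way to rule out line-ending problems is to normalize them before matching. With PCRE's usual settings, . also matches \r, so a stray carriage return can otherwise end up inside a captured title or comment. This is a sketch of that alternative, not the approach from the answer above; the $normalized variable is made up for the example.

```php
<?php
// Same data as in the question, but with Windows-style line endings.
$body = "C 1 title\r\ncomment 1\r\nC 2 title2\r\ncomment 2\r\nC 3 title3\r\ncomment 3";

// Convert \r\n and lone \r to \n so the pattern only has to deal with \n.
$normalized = str_replace(["\r\n", "\r"], "\n", $body);

preg_match_all('/^C +(\d+) +(.*)\n(.*)$/im', $normalized, $match, PREG_SET_ORDER);

foreach ($match as $set) {
    // $set[1] = number, $set[2] = title, $set[3] = comment
    printf("%s | %s | %s\n", $set[1], $set[2], $set[3]);
}
```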
