I have an HTML file which contains nothing but text. There are no styles or anything.
The text looks like:
ID NAME ANOTHER-ID-11-LETTERS MAJOR
Example:
20 Paul Mark Zedd 10203040506 Software Engineering
ID and ANOTHER-ID-11-LETTERS are numbers.
NAME and MAJOR are normal text and may contain spaces.
How can I split them and put each field on a new line using PHP?
Expected result:
20
Paul Mark Zedd
10203040506
Software Engineering
It looks like the first item is always a number, followed by a space, followed by a name that can be anything, followed by an 11-digit number, followed by some more text.
You can use a regex built on those details to split the string:
$test = preg_match("/([0-9]+)\s+(.*?)\s+([0-9]{11})\s+(.*)/is", "20 Paul Mark Zedd 10203040506 Software Engineering", $matches);
print_r($matches);
output:
Array
(
[0] => 20 Paul Mark Zedd 10203040506 Software Engineering
[1] => 20
[2] => Paul Mark Zedd
[3] => 10203040506
[4] => Software Engineering
)
Just use preg_match with this pattern:
#([\d]*)\s([a-zA-Z\s]*)\s([\d]*)\s([a-zA-Z\s]*)#
Example output:
array (
0 => '20 Paul Mark Zedd 10203040506 SoftwareEngineering',
1 => '20',
2 => 'Paul Mark Zedd',
3 => '10203040506',
4 => 'SoftwareEngineering',
)
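To get the one-field-per-line output the question asks for, either pattern can be wrapped in a small helper. This is only a sketch: the function name splitRecord and the anchored pattern are my own assumptions rather than part of the answers above, and it assumes the second ID is always exactly 11 digits.

```php
<?php
// Hypothetical helper: split one "ID NAME 11-DIGIT-ID MAJOR" record into its four fields.
function splitRecord(string $line): ?array
{
    // ^(\d+)     leading numeric ID
    // (.+?)      name (lazy, may contain spaces)
    // (\d{11})   the 11-digit ID
    // (.+)$      major (rest of the line)
    if (preg_match('/^(\d+)\s+(.+?)\s+(\d{11})\s+(.+)$/', trim($line), $m)) {
        return array_slice($m, 1); // drop the full match, keep the four groups
    }
    return null; // the line did not fit the expected layout
}

$fields = splitRecord('20 Paul Mark Zedd 10203040506 Software Engineering');
if ($fields !== null) {
    echo implode("\n", $fields), "\n";
}
```

Returning null for lines that do not match makes it easy to skip or log malformed rows when looping over a whole file.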
I have read posts about regex and preg_split, but found nothing that fits my case.
I have the following text:
C01G01 Jon Doe Kenny Ranny C02G02 Ramsay John C02G03 Alice Axel
I want to use preg_split where the delimiter is C(number)G(number) and also keep the delimiter in the array.
What I have done:
$parts = preg_split('/C+[0-200]+G+[0-200]/',
$students,-1,PREG_SPLIT_DELIM_CAPTURE);
What it returns:
Array(
[0] =>
[1] => 1 Jon Doe Kenny Ranny
[2] => 2 Ramsay John
[3] => 3 Alice Axel
)
What I expect it to return:
Array(
[0] =>
[1] => C01G01 Jon Doe Kenny Ranny
[2] => C02G02 Ramsay John
[3] => C02G03 Alice Axel
)
You can split on the whitespace before each delimiter; the lookahead leaves the delimiter itself in the following element:
\s(?=C+\d+G+\d+)
See the demo:
https://regex101.com/r/qyCwCN/1
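A minimal PHP sketch of that idea follows. It assumes the delimiter is a single C and a single G, each followed by one or more digits, so the pattern is slightly tightened compared with the one above.

```php
<?php
$students = 'C01G01 Jon Doe Kenny Ranny C02G02 Ramsay John C02G03 Alice Axel';

// Split on whitespace that is immediately followed by C<digits>G<digits>.
// The lookahead is zero-width, so the delimiter stays at the start of the next element.
$parts = preg_split('/\s+(?=C\d+G\d+)/', $students);

print_r($parts);
```

Note that this returns three elements with no empty element at index 0, which differs slightly from the expected output in the question.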
I'm working in PHP and need to parse strings looking like this:
Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
I need to get the rake, pot, and rake contribution per player with names. The number of players is variable. Order is irrelevant so long as I can match player name to rake contribution in a consistent way.
For example I'm looking to get something like this:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => andy
[4] => 10
[5] => bob
[6] => 20
[7] => cindy
[8] => 70
)
I was able to come up with a regex that matches the string, but it only returns the last player-rake contribution pair:
^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \((?:([a-z]*): ([0-9]*)(?:, )?)*\)$
Outputs:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => cindy
[4] => 70
)
I've tried using preg_match_all and g modifiers but to no success. I know preg_match_all would be able to get me what I wanted if I ONLY wanted the player-rake contribution pairs but there is data before that I also require.
Obviously I can use explode and parse the data myself but before going down that route I need to know if/how this can be done with pure regex.
You could use the regex below with preg_match_all:
(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))
The empty alternative (the trailing | in the first non-capturing group) lets the regex engine skip the Rake/Pot prefix, so the rest of the pattern can keep matching name and number pairs in the remainder of the string.
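As a rough sketch of how that pattern could be used, the helper below collects the rake, the pot, and the players into one array. The function name parseRakeLine and the shape of the returned array are assumptions made for illustration.

```php
<?php
// Hypothetical helper built around the pattern above.
function parseRakeLine(string $input): array
{
    $pattern = '/(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))/';
    preg_match_all($pattern, $input, $sets, PREG_SET_ORDER);

    $result = ['rake' => null, 'pot' => null, 'players' => []];
    foreach ($sets as $set) {
        // Groups 1 and 2 are only filled in the match that consumed the prefix;
        // in the later matches they are empty strings.
        if ($set[1] !== '') {
            $result['rake'] = (int) $set[1];
            $result['pot']  = (int) $set[2];
        }
        $result['players'][$set[3]] = (int) $set[4];
    }
    return $result;
}

print_r(parseRakeLine('Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)'));
```

Keying the players by name keeps each name paired with its contribution regardless of how many players there are.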
I would use the following regex to validate the input string:
^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \(((?:\w*: \d*(?:, )?)+)\)$
And then just use the explode() function on the last capture group to split the players out. The named groups also get the numeric indexes 1 and 2, so the players group is index 3:
preg_match($regex, $string, $matches);
$players = explode(', ', $matches[3]);
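Put together, the validate-then-explode approach might look like the sketch below. The delimiters around the pattern and the name-to-amount array are my additions, not part of the answer.

```php
<?php
$string = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';
$regex  = '/^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \(((?:\w*: \d*(?:, )?)+)\)$/';

$players = [];
if (preg_match($regex, $string, $matches)) {
    // Named groups are also available by number: Rake = 1, Pot = 2, players list = 3.
    foreach (explode(', ', $matches[3]) as $pair) {
        [$name, $amount] = explode(': ', $pair);
        $players[$name] = (int) $amount;
    }
    printf("rake=%d pot=%d\n", $matches['Rake'], $matches['Pot']);
    print_r($players);
}
```

The regex only validates the overall shape and extracts the fixed fields; the variable-length player list is handled by plain string functions.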
I am facing a problem I cannot solve. I have a string consisting of unneeded text and 10-digit numbers that always start with "2" or "6". I need to get those 10-digit numbers into an array. I thought of regex and found this article, Regular Expression for matching a numeric sequence?, which is pretty close to what I need (except for the descending/ascending part). But as I have never been able to understand regex, I can't adapt it to my needs. If anyone could help me out here I would highly appreciate it!
Here is a sample of my string:
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21
EAArivtg .....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN."
Thank you very much in advance!
Greetings from Greece!
As I see it, the digits in your string have a space between them. If you want to match them strictly, this is the regex:
[62]\d{2}\s*\d{7}
Explanation:
[62] # Start with 6 or 2
\d{2} # 2 more digits
\s* # any number of white spaces
\d{7} # 7 more digits
The PHP code below uses preg_match_all to match all occurrences of those strings:
preg_match_all("/[62]\d{2}\s*\d{7}/", $text, $matches);
Output:
Array
(
[0] => 693 7098469
[1] => 210 5014166
[2] => 210 9618677
[3] => 210 9643623
[4] => 210 9643887
[5] => 210 9914534
[6] => 697 7440896
)
Maybe like this:
<?php
$x=
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21 EAArivtg ....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN.";
$x=str_replace(' ','',$x);
preg_match_all('/((2|6)\d{9})/',$x,$matches);
print_r($matches[0]);
And the result:
Array
(
[0] => 6937098469
[1] => 2105014166
[2] => 2109618677
[3] => 2109643623
[4] => 2109643887
[5] => 2109914534
[6] => 6977440896
)
There is a pretty cool page that visualizes regex code for better understanding:
https://www.debuggex.com/
This should work:
((?:2|6)[0-9]{2} [0-9]{7})
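The answers above can be combined into one hedged sketch: match the "NNN NNNNNNN" form, guard against matching inside a longer run of digits, and strip the inner space afterwards. The helper name extractNumbers, the lookarounds, and the shortened sample string are assumptions of mine, not taken from the answers.

```php
<?php
// Hypothetical helper: find 10-digit numbers starting with 2 or 6,
// written with an optional whitespace character after the first three digits.
function extractNumbers(string $text): array
{
    // (?<!\d) and (?!\d) keep a match from starting or ending inside a longer digit run.
    preg_match_all('/(?<!\d)[26]\d{2}\s?\d{7}(?!\d)/', $text, $matches);

    // Remove the inner whitespace so every result is a plain 10-digit string.
    return array_map(
        function ($number) {
            return preg_replace('/\s+/', '', $number);
        },
        $matches[0]
    );
}

$sample = '.....693 7098469 - ZQH ........210 5014166 - MAPIA 21 .... 12345678901234';
print_r(extractNumbers($sample));
```

Unlike removing every space from the input first, this leaves the surrounding text untouched, so digits from neighbouring fields cannot be merged into a false match.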
I am attempting to use RegEx to strip down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first number), away team (second city), away score (second number), and where in the game it is (in parentheses). This is the regex I have currently, but I feel it is very wrong.
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested the following code snippet in PHP 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more non-close parens characters; \s+ matches one or more whitespace chars, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
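For illustration, here is a small sketch that runs the \D+ based pattern from this answer against a shortened version of the sample data. The $games array and the ltrim() call that drops the leading ^ are my own additions.

```php
<?php
// Shortened sample in the same format as the question's data.
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_count=1'
         . '&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_count=1';

$pattern = '/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+'
         . '(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/';

$games = [];
if (preg_match_all($pattern, $content, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $games[] = [
            // \D+ also captures the "^" prefix, so strip it from the team names.
            'hometeam'  => ltrim($m['hometeam'], '^'),
            'homescore' => (int) $m['homescore'],
            'awayteam'  => ltrim($m['awayteam'], '^'),
            'awayscore' => (int) $m['awayscore'],
            'time'      => $m['time'],
        ];
    }
}

print_r($games);
```

PREG_SET_ORDER groups the captures per game, which is usually easier to loop over than the default per-group layout.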
I have a string like this:
Tickets order: № 123123123. CED-MSW-RPG-MOW-CEK PODYLOVA/ALEMR 555
423578932 19OCT11 Tickets order: № 123123123. 346257.
CSK-MOW-PRG-MOW-CWQ PODYLOVA/ALEMR 555 45837043 19OCT11
I need to collect all codes such as CEK, MOW, PRG and so on. I first tried this pattern:
$pattern = '#[-|\s]([A-Z]{3})#';
As a result I get all my codes (that's OK) plus the first 3 characters of the user's surname: "POD" from "PODYLOVA". If I require that a hyphen or whitespace character must follow my code, by changing my pattern to this:
$pattern = '#[-|\s]([A-Z]{3})[-|\s]#';
My $matches var has this:
array (
0 =>
array (
0 => ' CED-',
1 => '-RPG-',
2 => '-CEK ',
3 => ' CSK-',
4 => '-PRG-',
5 => '-CWQ ',
),
1 =>
array (
0 => 'CED',
1 => 'RPG',
2 => 'CEK',
3 => 'CSK',
4 => 'PRG',
5 => 'CWQ',
),
)
You can see that my pattern doesn't "share" the hyphen between the desired codes.
I see two solutions, but cannot think of a pattern that would suit:
1. Make the pattern share the hyphen between codes.
2. Make a more complicated pattern: first collect the text that contains the codes ("CED-MSW-RPG-MOW-CEK") and then get all #([A-Z]{3})# matches inside it.
It seems that solution #1 is the best in my case, but what should it look like?
Try this:
\b([A-Z]{3})\b
HTH
Does this give you what you want?
(?<=-|\s)[A-Z]{3}(?=-|\s)
Tested with grep:
kent$ echo "Tickets order: № 123123123. CED-MSW-RPG-MOW-CEK PODYLOVA/ALEMR 555 423578932 19OCT11 Tickets order: № 123123123. 346257. CSK-MOW-PRG-MOW-CWQ PODYLOVA/ALEMR 555 45837043 19OCT11"|grep -Po '(?<=-|\s)[A-Z]{3}(?=-|\s)'
CED
MSW
RPG
MOW
CEK
CSK
MOW
PRG
MOW
CWQ
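The same lookaround pattern should carry over to PHP's preg_match_all. The sketch below uses a shortened sample string of my own, so treat it as an illustration rather than a tested drop-in.

```php
<?php
$text = 'Tickets order: 123123123. CED-MSW-RPG-MOW-CEK PODYLOVA/ALEMR 555 423578932 19OCT11';

// The lookbehind and lookahead check for a hyphen or whitespace without consuming it,
// so neighbouring codes can share the same hyphen.
preg_match_all('/(?<=-|\s)[A-Z]{3}(?=-|\s)/', $text, $matches);

print_r($matches[0]);
```

Because both lookarounds require an actual character, a code at the very start or very end of the string would not match; adding ^ and $ as alternatives inside the lookarounds is one way to cover that case.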