Regex pattern + preg_split keep delimiter [duplicate] - php

I have read posts about regex and preg_split, but none of them fit my case.
I have the following text:
C01G01 Jon Doe Kenny Ranny C02G02 Ramsay John C02G03 Alice Axel
I want to use preg_split where the delimiter is C(number)G(number) and also keep the delimiter in the array.
What I have done:
$parts = preg_split('/C+[0-200]+G+[0-200]/',
$students,-1,PREG_SPLIT_DELIM_CAPTURE);
What it returns:
Array(
[0] =>
[1] => 1 Jon Doe Kenny Ranny
[2] => 2 Ramsay John
[3] => 3 Alice Axel
)
What I expect to return:
Array(
[0] =>
[1] => C01G01 Jon Doe Kenny Ranny
[2] => C02G02 Ramsay John
[3] => C02G03 Alice Axel
)

\s(?=C+\d+G+\d+)
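You can use something like this: it matches only the whitespace before each delimiter, and the lookahead leaves the delimiter itself at the start of the following element. See demo.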
https://regex101.com/r/qyCwCN/1
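The lookahead pattern above can be sketched in PHP as follows. This is a minimal example that assumes the input is held in `$students`, as in the question; no flags are needed because nothing is captured.

```php
<?php
// Sample input from the question.
$students = 'C01G01 Jon Doe Kenny Ranny C02G02 Ramsay John C02G03 Alice Axel';

// Split on the single whitespace character that is immediately followed by
// a C<digits>G<digits> token. The lookahead (?=...) only checks for the
// token without consuming it, so it stays at the start of the next element.
$parts = preg_split('/\s(?=C+\d+G+\d+)/', $students);

print_r($parts);
```

Because the string does not begin with whitespace, this yields three elements and no empty first element, which differs slightly from the expected output shown in the question.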

Related

regex for preg_match_all to get only 10 chars in string [duplicate]

Using my code below, how can I get a results array containing only the users whose address is 10 characters long, with preg_match_all and regular expressions?
This is my code:
$data = 'Maria address is QwMP_jkRkM and lives in Peru, Joseph address is QMPjkRk2ZM and lives in Peru, Miguel address is Q.wMP_jkRljo_hkM and lives in New York, George address is hdiJoW58_7 and lives in Austria';
preg_match_all('#(.*?) address is (.*?) and lives in (.*?)#', $data, $output);
It currently returns all matches; I need to exclude the results whose address contains more than 10 characters.
Note: I should not use foreach
The constraints seem artificial, but this should produce the correct output (if awkwardly) and it does use (.*?):
$data = 'Maria address is QwMP_jkRkM and lives in Peru, Joseph address is QMPjkRk2ZM and lives in Peru, Miguel address is Q.wMP_jkRljo_hkM and lives in New York, George address is hdiJoW58_7 and lives in Austria';
preg_match_all('#([^ ]*?) address is (.{1,10}) and lives in (.*?)(?:$|,)#', $data, $output);
print_r($output);
Result:
Array
(
[0] => Array
(
[0] => Maria address is QwMP_jkRkM and lives in Peru,
[1] => Joseph address is QMPjkRk2ZM and lives in Peru,
[2] => George address is hdiJoW58_7 and lives in Austria
)
[1] => Array
(
[0] => Maria
[1] => Joseph
[2] => George
)
[2] => Array
(
[0] => QwMP_jkRkM
[1] => QMPjkRk2ZM
[2] => hdiJoW58_7
)
[3] => Array
(
[0] => Peru
[1] => Peru
[2] => Austria
)
)
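If the addresses must be exactly 10 characters rather than at most 10, an explicit repetition count can do the filtering inside the pattern. The sketch below is a variation on the answer above, not the answerer's own pattern: it assumes names and addresses contain no whitespace and that locations contain no commas.

```php
<?php
$data = 'Maria address is QwMP_jkRkM and lives in Peru, Joseph address is QMPjkRk2ZM and lives in Peru, Miguel address is Q.wMP_jkRljo_hkM and lives in New York, George address is hdiJoW58_7 and lives in Austria';

// \S{10} requires exactly ten non-whitespace characters between the literal
// " address is " and " and lives in ", so longer addresses cannot match.
// [^,]+ takes the location up to the next comma or the end of the string.
preg_match_all('#(\S+) address is (\S{10}) and lives in ([^,]+)#', $data, $matches);

print_r($matches[1]); // names
print_r($matches[2]); // addresses
print_r($matches[3]); // locations
```

No foreach is needed, since the quantifier rejects Miguel's 16-character address during matching.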

How to split a string in multiple ones (Php)?

I want to split a big number/string for example 123456789123456789 into 6 smaller strings/numbers of 3 characters each. So the result would be 123 456 789 123 456 789. How can I do this?
Use chunk_split():
$var = "123456789123456789";
$split_string = chunk_split($var, 3); // 3 is the length of each chunk
If you want your result as an array, you can use str_split():
$var = "123456789123456789";
$array = str_split($var, 3); // 3 is the length of each chunk
You can use the chunk_split() function.
It splits a string into smaller chunks:
$string = "123456789123456789";
echo chunk_split ($string, 3, " ");
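will output (note that chunk_split() also appends the separator after the last chunk, so there is a trailing space):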
123 456 789 123 456 789
First parameter is the string to be chunked. The second is the chunk length and the third is what you want at the end of each chunk.
See PHP manual for further information
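A small sketch comparing the two functions mentioned above. The variable names are illustrative only; the point is that chunk_split() puts the separator after every chunk, including the last one, while str_split() plus implode() puts it only between chunks.

```php
<?php
$string = '123456789123456789';

// chunk_split() appends the separator after every chunk, the last included.
$withTrailing = chunk_split($string, 3, ' ');

// Option 1: trim the trailing separator afterwards.
$trimmed = rtrim($withTrailing);

// Option 2: split into an array and join, which never adds a trailing space.
$joined = implode(' ', str_split($string, 3));

var_dump($withTrailing, $trimmed, $joined);
```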
You could do something like this:
$string = '123456789123456789';
preg_match_all('/(\d{3})/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 123
[4] => 456
[5] => 789
)
\d matches a digit, and {3} requires exactly three of the preceding token (in this case, a digit).
Or, if the length won't always be a multiple of three:
$string = '12345678912345678922';
preg_match_all('/(\d{1,3})/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => 123
[1] => 456
[2] => 789
[3] => 123
[4] => 456
[5] => 789
[6] => 22
)
Demo: https://regex101.com/r/rX0pJ1/1

strpos and substr on String

I have an HTML file which contains nothing but text. There are no styles or anything.
The text looks like:
ID NAME ANOTHER-ID-11-LETTERS MAJOR
Example:
20 Paul Mark Zedd 10203040506 Software Engineering
ID and ANOTHER-ID-11-LETTERS are numbers.
NAME and MAJOR are normal text and may contain spaces.
How can I split them apart and put each field on its own line using PHP?
Expected result:
20
Paul Mark Zedd
10203040506
Software Engineering
It looks like the first item is always a number, followed by a space, followed by a name that can be anything, followed by an 11-digit number, followed by some more text.
You can use a regex built on those details to split the string:
// The \s before the 11-digit group keeps the trailing space out of the name.
$test = preg_match("/([0-9]+)\s(.*?)\s([0-9]{11})\s(.*)/is", "20 Paul Mark Zedd 10203040506 Software Engineering", $matchs);
print_r($matchs);
output:
Array
(
[0] => 20 Paul Mark Zedd 10203040506 Software Engineering
[1] => 20
[2] => Paul Mark Zedd
[3] => 10203040506
[4] => Software Engineering
)
Just use preg_match with this pattern:
#([\d]*)\s([a-zA-Z\s]*)\s([\d]*)\s([a-zA-Z\s]*)#
Example output:
array (
0 => '20 Paul Mark Zedd 10203040506 SoftwareEngineering',
1 => '20',
2 => 'Paul Mark Zedd',
3 => '10203040506',
4 => 'SoftwareEngineering',
)
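To get the one-field-per-line result the question asks for, the captured groups can be joined with newlines. This is a minimal sketch that assumes each record is on a single line in the ID NAME 11-DIGIT-ID MAJOR layout described above; `$line` and `$fields` are illustrative names.

```php
<?php
$line = '20 Paul Mark Zedd 10203040506 Software Engineering';
$fields = [];

// (\d+)    leading ID
// (.+?)    name, taken lazily up to the whitespace before the 11-digit ID
// (\d{11}) the 11-digit ID
// (.+)     everything that remains is the major
if (preg_match('/^(\d+)\s+(.+?)\s+(\d{11})\s+(.+)$/', $line, $m)) {
    $fields = array_slice($m, 1);   // drop the full match at index 0
    echo implode("\n", $fields), "\n";
}
```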

RegEx Statement Issues - PHP

I am attempting to use RegEx to strip down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first number), away team (second city), away score (second number), and the game's status (in parentheses). This is the regex I have currently, but I suspect it is quite wrong.
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested the following code snippet in PHP 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more non-close parens characters; \s+ matches one or more whitespace chars, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
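One way to do the piecemeal matching suggested above is to split the data into key/value pairs first and then run a small anchored pattern on each score field. This is only a sketch: it assumes the data is a well-formed `&`-separated query string, it uses a trimmed copy of the question's data, and the pattern is an adaptation of the ones above (the optional `\^?` keeps the marker out of the team names).

```php
<?php
// Trimmed sample of the question's data (URL fields omitted for brevity).
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)'
    . '&mlb_s_right1_1=W: Hughes L: Britton'
    . '&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)';

// Step 1: split the query-string-style data into key/value pairs.
parse_str($content, $fields);

// Step 2: match each "left" value on its own with an anchored pattern.
$pattern = '/^\^?(?P<hometeam>\D+?)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>\D+?)'
    . '\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)$/';

$games = [];
foreach ($fields as $key => $value) {
    if (strpos($key, 'mlb_s_left') === 0 && preg_match($pattern, $value, $m)) {
        $games[] = [
            $m['hometeam'], (int) $m['homescore'],
            $m['awayteam'], (int) $m['awayscore'],
            $m['time'],
        ];
    }
}

print_r($games);
```

Each pattern now only ever sees one short value, so a malformed field fails on its own instead of affecting the rest of the string.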

Multiline PHP Regex problem

I have already searched here and on Google, but I can't figure out what I am doing wrong :(
I have this text:
C 1 title
comment 1
C 2 title2
comment 2
C 3 title3
comment 3
Now, what I want to do is:
Check for the C at the beginning.
Capture the number.
Capture the title.
Capture the comment.
I'm trying to use this expression:
preg_match_all("/^C (\d*) (.*)\n(.*)$/im", $body, $match);
but it only works for the first set =(
Any tips on what I am doing wrong?
Thanks!
It works as expected.
The snippet:
<?php
$body = 'C 1 title
comment 1
C 2 title2
comment 2
C 3 title3
comment 3';
preg_match_all("/^C (\d*) (.*)\n(.*)$/im", $body, $match);
print_r($match);
?>
produces the following output:
Array
(
[0] => Array
(
[0] => C 1 title
comment 1
[1] => C 2 title2
comment 2
[2] => C 3 title3
comment 3
)
[1] => Array
(
[0] => 1
[1] => 2
[2] => 3
)
[2] => Array
(
[0] => title
[1] => title2
[2] => title3
)
[3] => Array
(
[0] => comment 1
[1] => comment 2
[2] => comment 3
)
)
as you can see on Ideone.
To keep your matches nicely grouped, you might want to try:
preg_match_all("/^C (\d*) (.*)\n(.*)$/im", $body, $match, PREG_SET_ORDER);
instead.
HTH
EDIT
Ideone runs: PHP Version => 5.2.12-pl0-gentoo
And I also tested it on my machine (and get the same result), which runs: PHP Version => 5.3.3-1ubuntu9.5
But I can't imagine this is a versioning thing (at least, not with 5.x versions). Perhaps your line breaks are Windows style? Try this regex instead:
"/^C +(\d*) +(.*)\r?\n(.*)$/im"
I used \r?\n instead of just \n so that both Windows- and Unix-style line breaks are matched, and replaced each single space with " +" to allow for two or more spaces.
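If mixed line endings are the suspected cause, another option is to normalize them before matching. The sketch below is an assumption about the asker's input, which the question never confirms: it builds a sample with Windows line breaks, converts every line break to \n, and then matches. Under PCRE's usual default newline convention, `.` also matches \r, so without normalization the captured title and comment can keep a trailing carriage return.

```php
<?php
// Hypothetical input using Windows-style (\r\n) line breaks.
$body = "C 1 title\r\ncomment 1\r\nC 2 title2\r\ncomment 2\r\nC 3 title3\r\ncomment 3";

// Convert \r\n and lone \r to \n so the pattern only has to handle \n.
$normalized = str_replace(["\r\n", "\r"], "\n", $body);

// PREG_SET_ORDER groups number, title and comment per record.
preg_match_all('/^C +(\d+) +(.*)\n(.*)$/m', $normalized, $matches, PREG_SET_ORDER);

foreach ($matches as $set) {
    printf("%s | %s | %s\n", $set[1], $set[2], $set[3]);
}
```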
