How can a REGEX extract a pattern repeatedly? - php

Let's say I want to extract a list of sections from an email that are listed in this format:
Section 26, 753, 87, 201, 47
I know that this kind of formatting is present in my document, but I have no idea where. How can I write a regex that extracts all of the section numbers? (Sorry, I'll post the pattern I already have later.) Currently, it looks for the word "Section", followed by a space, followed by a number. How are the rest extracted? Perhaps with zero or more repetitions of comma, space, number? How exactly would that be written?

Directly returning a variable number of captures from a regex is not possible with PHP/PCRE (although there are implementations that support this, notably .NET and Perl 6).
With PHP, you have to write code. There are a variety of options - remove matches from the string in a loop, extract the list and then use preg_match_all to get the numbers, and so on - but I think I would just extract the whole list into its own string and use split (well, preg_split) to get the individual section numbers:
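$str = 'Section 26, 753, 87, 201, 47';
$sections = array(); // stays empty if no section list is found
// Capture the whole comma-separated list that follows "Section"
if (preg_match('/Section\s+(\d+(?:,\s*\d+)*)/', $str, $match)) {
    // Split the captured list on commas (with optional whitespace)
    $sections = preg_split('/,\s*/', $match[1]);
}
print_r($sections);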
Which gives the desired result:
Array
(
[0] => 26
[1] => 753
[2] => 87
[3] => 201
[4] => 47
)
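The answer also mentions extracting the list first and then using preg_match_all() to get the numbers. A minimal sketch of that variant follows; the function name extractSections is only illustrative.

```php
<?php
// Two-step sketch: isolate the list after "Section", then let
// preg_match_all() collect every number inside it.
function extractSections(string $text): array
{
    // Capture "Section" followed by a comma-separated list of integers
    if (!preg_match('/Section\s+(\d+(?:,\s*\d+)*)/', $text, $match)) {
        return array(); // no section list in this text
    }
    // Each run of digits in the captured list is one section number
    preg_match_all('/\d+/', $match[1], $numbers);
    return $numbers[0];
}

print_r(extractSections('See Section 26, 753, 87, 201, 47 for details.'));
```

Like the preg_split() version, this returns the numbers as strings; pass the result through array_map('intval', ...) if you need integers.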

Related

How to properly parse string using preg_match_all

I have some alerts set up that are emailed to me regularly, and in those emails I get content that looks like this:
2002 Volkswagen Eurovan Clean title - $2000
That is the general, consistent format. These titles are also clickable links.
I already have a script that extracts the links from the body string properly, but what I'm looking for is the year and the price from the titles that come in. More than one listing may appear in a single email.
So my question is: how can I use preg_match_all to grab all of them, so that I can then explode each one to get the first piece of data (the year) and the last piece of data (the price)? Should I match on the digits, given that the format is expected to stay the same?
You can match four-digit numbers starting with 19 or 20 and capture them in a group named year, capture the $ and the digits after it in a group named price, and use the anchors ^ and $ if these values always appear at the beginning and end of the string:
^(?'year'\b(?:19|20)\d{2}\b)|(?'price'\$\d+)$
Sample IDEONE code:
$re = "/^(?'year'\\b(?:19|20)\\d{2}\\b)|(?'price'\\$\\d+)$/";
$str = "2002 Volkswagen Eurovan Clean title - \$2100";
preg_match_all($re, $str, $matches);
print_r(array_filter($matches["year"]));
print_r(array_filter($matches["price"]));
Output:
Array
(
[0] => 2002
)
Array
(
[1] => $2100
)
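Without the m modifier, ^ and $ only match at the start and end of the whole subject, so on an email body with several listings the pattern above finds only the first year and the last price. A sketch that pairs each year with its price, assuming one listing per line (the function name and the second sample listing are hypothetical):

```php
<?php
// Sketch: pair each year with its price when an email body contains
// several listings. Assumption: one listing per line.
function extractListings(string $body): array
{
    // The m modifier makes ^ and $ match at every line boundary;
    // the lazy .*? skips the make/model text between year and price.
    $re = '/^((?:19|20)\d{2})\b.*?\$(\d+)\s*$/m';
    preg_match_all($re, $body, $matches, PREG_SET_ORDER);

    $listings = array();
    foreach ($matches as $m) {
        $listings[] = array('year' => $m[1], 'price' => $m[2]);
    }
    return $listings;
}

// The second listing is made up for illustration
$body = "2002 Volkswagen Eurovan Clean title - \$2000\n"
      . "1998 Honda Civic - \$1500";
print_r(extractListings($body));
```

PREG_SET_ORDER groups the captures per match, which keeps each year next to its own price instead of in two parallel arrays.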

Regular expression for substring with variable length

I have a lot of strings like:
"1248, 60906068, 4536576, 858687( some text 67, 43, 45)"
I want to check whether the string starts with a number and contains brackets, and at the same time get all the numbers from the beginning up to the first bracket. For this example string, the result should be:
[0] => 1248, [1] => 60906068, [2] => 4536576, [3] => 858687
The catch is that the first number at the beginning of the string may be followed by zero, one, or many additional numbers.
I tried something like that:
^(\d+)(?:,\s?(\d+)?)*\([^\)]+\)$
But it captures only the first and the last number before the brackets.
Is it possible to get all these numbers with only one Regular Expression?
Thank you in advance!
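You can use this regex with preg_match_all(): (\d+)(?:\([^\)]+\))?
All the numbers before the bracket are captured in group 1: the optional (?:\([^\)]+\)) part consumes the bracketed text, so the numbers inside the brackets are not matched separately.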
Result:
1248
60906068
4536576
858687
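The regex above does not check the two conditions the question asks about (the string starts with a number and contains brackets). A sketch that validates the shape first and only then collects the numbers; the function name leadingNumbers is illustrative, and it assumes the list is followed by a bracketed part somewhere after it:

```php
<?php
// Sketch: validate the overall shape, then collect the leading numbers.
function leadingNumbers(string $s): array
{
    // Require: a number at the very start, then ", number" zero or more
    // times, then an opening bracket with a matching closing bracket.
    if (!preg_match('/^(\d+(?:,\s*\d+)*)\s*\([^)]*\)/', $s, $m)) {
        return array(); // wrong shape: no leading number or no brackets
    }
    // Only the captured prefix (before the first bracket) is searched,
    // so numbers inside the brackets are never picked up.
    preg_match_all('/\d+/', $m[1], $numbers);
    return $numbers[0];
}

print_r(leadingNumbers("1248, 60906068, 4536576, 858687( some text 67, 43, 45)"));
```

Splitting the check and the extraction into two calls sidesteps the limitation from the first question on this page: a repeated capture group only keeps its last iteration.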

PHP preg_match: comma separated decimals

This regex finds the right string, but only returns the first result. How do I make it search the rest of the text?
$text =",415.2109,520.33970,495.274100,482.3238,741.5634
655.3444,488.29980,741.5634";
preg_match("/[^,]+[\d+][.?][\d+]*/",$text,$data);
print_r($data); // $data is an array; echo would only print "Array"
Follow up:
I've pushed this script beyond its initial scope, and I'm now pulling out more detailed data. I've wasted many hours on this; can anyone shed some light?
Here's my string:
155.101.153.123:simple:mass_mid:[479.0807,99.011, 100.876],mass_tol:[30],mass_mode: [1],adducts:[M+CH3OH+H],
130.216.138.250:simple:mass_mid:[290.13465,222.34566],mass_tol:[30],mass_mode:[1],adducts:[M+Na],
And here's my regex:
"/mass_mid:[((?:\d+)(?:.)(?:\d+)(?:,)*)/"
I'm really banging my head on this one! Can someone tell me how to exclude the prefix mass_mid:[ from the results and keep the comma-separated values?
Use preg_match_all rather than preg_match
From the PHP Manual:
(`preg_match_all`) searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued on from end of the last match.
http://php.net/manual/en/function.preg-match-all.php
Don't use a regex. Use explode() to split your input on the commas (PHP's old split() function was deprecated in PHP 5.3 and removed in PHP 7).
Regexes are not a magic wand you wave at every problem that happens to involve strings.
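A sketch of that no-regex approach. Note that the sample data has a line break, not a comma, between 741.5634 and 655.3444, so a plain explode(',') would return those two values as one piece; the sketch turns line breaks into commas first. The function name splitDecimals is illustrative.

```php
<?php
// Sketch of the no-regex approach: turn line breaks into commas,
// split on commas, and drop the empty piece created by the leading comma.
function splitDecimals(string $text): array
{
    // The sample data has a line break (not a comma) after 741.5634
    $normalized = str_replace(array("\r\n", "\n"), ',', $text);
    $pieces = array_map('trim', explode(',', $normalized));
    // array_filter() removes empty strings; array_values() re-indexes from 0
    return array_values(array_filter($pieces, 'strlen'));
}

$text = ",415.2109,520.33970,495.274100,482.3238,741.5634
655.3444,488.29980,741.5634";
print_r(splitDecimals($text));
```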
Description
To extract a list of numeric values, each of which may include a single decimal point, you could use this regex:
\d*\.?\d+
PHP Code Example:
<?php
$sourcestring = ",415.2109,520.33970,495.274100,482.3238,741.5634
655.3444,488.29980,741.5634";
// Optional digits, an optional decimal point, then at least one digit
preg_match_all('/\d*\.?\d+/', $sourcestring, $matches);
echo "<pre>" . print_r($matches, true) . "</pre>";
?>
which yields:
Array
(
[0] => Array
(
[0] => 415.2109
[1] => 520.33970
[2] => 495.274100
[3] => 482.3238
[4] => 741.5634
[5] => 655.3444
[6] => 488.29980
[7] => 741.5634
)
)
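None of the answers covers the follow-up about mass_mid:[. In the regex from the question, the unescaped [ starts a character class instead of matching a literal bracket. A sketch that captures only what sits between mass_mid:[ and ] and then splits it on commas, so the prefix never appears in the results (the function name massMidValues is illustrative):

```php
<?php
// Sketch: capture the text between "mass_mid:[" and "]", then split it.
function massMidValues(string $line): array
{
    // \[ and \] are escaped; [^\]]* takes everything up to the closing bracket
    if (!preg_match('/mass_mid:\[([^\]]*)\]/', $line, $m)) {
        return array();
    }
    // Split on commas with optional surrounding whitespace ("99.011, 100.876")
    return preg_split('/\s*,\s*/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
}

$line = '155.101.153.123:simple:mass_mid:[479.0807,99.011, 100.876],mass_tol:[30],mass_mode: [1],adducts:[M+CH3OH+H],';
print_r(massMidValues($line));
```

For a multi-line input, calling this once per line keeps each record's values separate.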

PHP Separate two different sections in one input

I'm working on a PHP-based extension for a launcher-style app that uses the TVRage API class to return results to users wherever they are. This is done via Alfred App (alfredapp.com).
I would like to add the ability to enter a show name followed by S##E##, for example:
Mike & Molly S01E02
The show name varies, so I can't hard-code it, but I want to separate the S##E## from the show name. This will allow me to use that information to continue the search via the API. Even better would be a way to grab only the numbers: the ones between the S and the E (01 in the example) and the ones after the E (02 in the example).
I first thought strpos would be the best function, but on closer inspection it only searches for a fixed string within a string. I believe I need a regex to do this correctly, which leaves me with preg_match. That led me to:
$regex = ?;
preg_match( ,$input);
Problem is I just don't understand Regular Expressions well enough to write it. What regular expression could be used to separate the show name from the S##E## or get just the two separate numbers?
Also, if you know a good place to learn regular expressions, that would be fantastic.
Thanks!
You can turn it around and use strrpos to look for the last space in the string and then use substr to get two strings based on the position you found.
Example:
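$your_input = trim($input); // make sure there are no spaces at the start or end
$last_space_at = strrpos($your_input, " "); // position of the last space, false if none
if ($last_space_at !== false) {
    $show = substr($your_input, 0, $last_space_at);     // everything before the last space
    $episode = substr($your_input, $last_space_at + 1); // everything after it, e.g. "S01E02"
}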
Regex:
$text = 'Mike & Molly S01E02';
preg_match("/(.+)(S\d{2}E\d{2})/", $text, $output);
print_r($output);
Output:
Array
(
[0] => Mike & Molly S01E02
[1] => Mike & Molly
[2] => S01E02
)
If you want the digits separately:
$text = 'Mike & Molly S01E02';
preg_match("/(.+)S(\d{2})E(\d{2})/", $text, $output);
print_r($output);
Output:
Array
(
[0] => Mike & Molly S01E02
[1] => Mike & Molly
[2] => 01
[3] => 02
)
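Explanation:
. --> Match any single character (except a newline)
.+ --> Match one or more of any character
\d --> Match a digit
\d{2} --> Match exactly 2 digits
The parentheses create capture groups; each group becomes a numbered entry in $output. Note that group 1 keeps the space before S01E02 ("Mike & Molly "), so you may want to trim() it.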
www.regular-expressions.info is a good place to learn regex.
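Building on the regex answer, here is a sketch that returns the show name without the trailing space and the season and episode as integers. It assumes the S##E## tag always comes last and may be written in either case; the function name parseEpisode is illustrative.

```php
<?php
// Sketch: split "Show Name S01E02" into show, season and episode.
// Assumptions: the S##E## tag comes last and may be upper- or lowercase.
function parseEpisode(string $input): ?array
{
    // (?<show>.+?) lazily takes the show name, \s+ eats the separating
    // space(s), and the i modifier also accepts "s01e02".
    if (!preg_match('/^(?<show>.+?)\s+S(?<season>\d{2})E(?<episode>\d{2})$/i', trim($input), $m)) {
        return null; // no S##E## tag at the end of the input
    }
    return array(
        'show'    => $m['show'],
        'season'  => (int) $m['season'],  // "01" becomes 1
        'episode' => (int) $m['episode'], // "02" becomes 2
    );
}

print_r(parseEpisode('Mike & Molly S01E02'));
```

Named groups make the result easier to read than numeric indexes, and they keep working if more groups are added to the pattern later.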

Split in series PHP

I have this string: ++++++1DESIGNRESULTSM25Fe415(Main)
and about 2000 similar lines, which I want to split into these parts:
[++++++] [1] [DESIGNRESULTS] [M25] [Fe415] [(Main)]
Across the lines, only the 2nd, 4th and 5th values change,
e.g. ++++++2DESIGNRESULTSM30Fe418(Main), etc.
What I actually want is to:
Split off the first value [++++++]
Split off the value after [DESIGNRESULTS], so I get [M25]
Split off the value before [(Main)], so I get [Fe415]
Once all this is done, store the resulting pieces in an array.
The output I want looks like this:
Array ( [0] => 1 [1] => M25 [2] => Fe415 )
Please help me with this.
Thanks in advance :)
Your requirements are a bit unclear. Here is a regular expression that captures each of the chunks you listed first in its own group:
(\++)(\d)(DESIGNRESULTS)(M\d\d)(Fe\d\d\d)(\(Main\))
If you only need the three values you ask for at the end, you can use
(\d)DESIGNRESULTS(M\d\d)(Fe\d\d\d)
You could also replace \d\d with \d+ if the number of digits is unknown.
However, judging by your examples, each chunk has a fixed length. It would be even faster to use substr():
array(
substr($string, 6, 1)
//...
)
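The substr() snippet above is left unfinished. Assuming every line has exactly the field widths of the sample line ("++++++", one digit, "DESIGNRESULTS", "M" plus two digits, "Fe" plus three digits, "(Main)"), a filled-in sketch could look like this. The offsets are computed from the sample line rather than taken from the answer, and the function name splitFixedWidth is illustrative.

```php
<?php
// Sketch of the fixed-width approach. Assumption: every line matches the
// sample layout exactly: "++++++" (6 chars) + 1 digit + "DESIGNRESULTS"
// (13 chars) + "M" and 2 digits + "Fe" and 3 digits + "(Main)".
function splitFixedWidth(string $line): array
{
    return array(
        substr($line, 6, 1),  // offset 6: the single digit after "++++++"
        substr($line, 20, 3), // offset 20 = 6 + 1 + 13: e.g. "M25"
        substr($line, 23, 5), // offset 23 = 20 + 3: e.g. "Fe415"
    );
}

print_r(splitFixedWidth('++++++1DESIGNRESULTSM25Fe415(Main)'));
```

If any field can change width (for example a two-digit first value), the offsets shift and a regex is the safer choice.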
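How about this:
$str = "++++++1DESIGNRESULTSM25Fe415(Main)";
$match = array();
// Skip the leading plus signs, then capture the digit, the 3-character
// M-value and the 5-character Fe-value that follow DESIGNRESULTS
preg_match("/^\+*(\d)DESIGNRESULTS(\w{3})(\w{5})/", $str, $match);
array_shift($match); // drop $match[0], the full match
print_r($match);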
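Since the question mentions about 2000 such lines, the same idea can be applied to all of them in one preg_match_all() call. A sketch, assuming the lines are separated by newlines and always end in "(Main)"; the function name splitDesignLines is illustrative, and \d+ is used so multi-digit values also match:

```php
<?php
// Sketch: run the pattern over many lines at once (e.g. the whole file
// read into one string) and keep only the three wanted values per line.
function splitDesignLines(string $data): array
{
    // m makes ^ match at the start of every line
    $re = '/^\++(\d+)DESIGNRESULTS(M\d+)(Fe\d+)\(Main\)/m';
    preg_match_all($re, $data, $matches, PREG_SET_ORDER);

    $rows = array();
    foreach ($matches as $m) {
        $rows[] = array($m[1], $m[2], $m[3]); // skip the full match in $m[0]
    }
    return $rows;
}

$data = "++++++1DESIGNRESULTSM25Fe415(Main)\n++++++2DESIGNRESULTSM30Fe418(Main)";
print_r(splitDesignLines($data));
```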
