How to properly parse string using preg_match_all


I have some alerts set up that are emailed to me regularly, and in those emails I get content that looks like this:
2002 Volkswagen Eurovan Clean title - $2000
That is the general, consistent format. Those lines are also clickable links.
I already have a script that extracts the links from the body string properly, but what I am looking for is the year and the price from the titles that come in. More than one title may be listed in a single email.
So my question is, how can I use preg_match_all to grab all the possibilities so that I can then explode them to get the first piece of data (year) and the last piece of data (price)? Should I match based on digits, since the format is presumed to be generally the same?

You can match the 4 digits starting with 19 or 20 as a named capture year, and the digits after $ as a named capture price, and use the anchors ^ and $ if these values are always at the beginning and end of the string:
^(?'year'\b(?:19|20)\d{2}\b)|(?'price'\$\d+)$
Sample code:
$re = "/^(?'year'\\b(?:19|20)\\d{2}\\b)|(?'price'\\$\\d+)$/";
$str = "2002 Volkswagen Eurovan Clean title - \$2100";
preg_match_all($re, $str, $matches);
print_r(array_filter($matches["year"]));
print_r(array_filter($matches["price"]));
Output:
Array
(
[0] => 2002
)
Array
(
[1] => $2100
)
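Since the question mentions that one email may contain several titles, here is a minimal sketch of one way to pull a year/price pair from every line. It is not part of the original answer: the `$body` string, the second listing, and the `$listings` array are made up for illustration, and it assumes each title sits on its own line with the year first and a plain-digit price last (no thousands separators).

```php
<?php
// Hypothetical email body: one listing per line (an assumption, not from the original post).
$body = "2002 Volkswagen Eurovan Clean title - \$2000\n"
      . "1998 Honda Civic Runs great - \$1500";

// m: ^ and $ match at the start and end of each line.
// year  = four digits starting with 19 or 20 at the start of the line
// price = the digits after the last "$" on the line
$re = '/^(?<year>(?:19|20)\d{2})\b.*\$(?<price>\d+)[ \t\r]*$/m';

$listings = [];
// PREG_SET_ORDER groups the captures per match, i.e. per listing.
if (preg_match_all($re, $body, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $listings[] = ['year' => (int) $m['year'], 'price' => (int) $m['price']];
    }
}

print_r($listings);
```

With PREG_SET_ORDER each element of `$matches` holds one listing, so the year and price stay paired without a later explode step.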

Related

How can I get an unknown number from a string within a certain part of the string?

My string:
How would you rate the ease and comfort required to undertake the session?#QUESTION_VALUE_0
How would I be able to get this value specifically from the above string? I don't know what this value will be (apart from that it will be an integer):
(some question)#QUESTION_VALUE_X where X is an integer, I want to get X.
I looked into Regex, but I suck at regular expressions, so I'm at a loss, cheers guys!
This is about as far as I got with regex:
/#QUESTION_VALUE_[0-9]+/
But I can't get the number out of the string. How can I only grab the number?

This should work for you:
Just put the escape sequence \d (which matches a digit, 0-9) with the quantifier + (one or more times) into a group (()) to capture the number, which you can then access in the array $m.
<?php
$str = "How would you rate the ease and comfort required to undertake the session?#QUESTION_VALUE_0";
preg_match("/#QUESTION_VALUE_(\d+)/", $str, $m);
echo $m[1];
?>
output:
0
If you do print_r($m); you will see the structure of your array:
Array
(
[0] => #QUESTION_VALUE_0
[1] => 0
)
As you can see above, the full match is in the first element and the first group ((\d+)) is in the second element.
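If the marker might be missing, it can help to check the return value of preg_match before reading `$m[1]`. The sketch below wraps the same pattern in a helper; the function name `questionValue` and the sample strings are hypothetical, not from the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the integer
// after "#QUESTION_VALUE_", or null when the marker is missing.
function questionValue(string $str): ?int
{
    // preg_match returns 1 on a match, 0 on no match, and false on error.
    if (preg_match('/#QUESTION_VALUE_(\d+)/', $str, $m) === 1) {
        return (int) $m[1]; // $m[1] is the first capture group
    }
    return null;
}

var_dump(questionValue("How would you rate the session?#QUESTION_VALUE_12"));
var_dump(questionValue("No marker here"));
```

Casting to int is a design choice here, since the question states the value is always an integer.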

Regular expression - Extracting strings from a repeating format

EDIT - SOLVED:
Thanks for the responses - I learned that this is actually in serialised format and that there's no need to process it using RegEx.
Apologies for the newb question - and I have tried many many variations, based on StackOverflow answers with no luck. I've also spent a while experimenting with an online Regex tool to try and solve this myself.
This is the string I'm checking :
i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";
I'll settle for matching these strings :
i:0;s:1:"1";
i:1;s:1:"3";
i:2;s:1:"5";
i:3;s:1:"6";
But ideally I would like to capture all the values between quotes only.
(there could be anywhere between 1-10 of these kinds of entries)
i.e. regex_result = [1,3,5,6]
These are some of the regexes I've tried.
I've only been able to capture either the first match or the last match, but not all matches. I'm confused as to why the regex isn't "repeating" as I'd expected:
(i:.;s:1:".";)*
(i:.;s:1:".";)+
(i:.;s:1:".";)+?
Thanks

You can use this regex.
/(?<=:")\d+(?=";)/g
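If you are running this from PHP, note that PCRE in PHP has no `/g` modifier; preg_match_all finds every match on its own. A minimal sketch of the same lookaround pattern under that assumption:

```php
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';

// Same lookarounds as above, without the /g flag:
// preg_match_all already collects every match.
preg_match_all('/(?<=:")\d+(?=";)/', $string, $matches);

// $matches[0] holds the full matches, i.e. the digits between the quotes.
print_r($matches[0]);
```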

"([^"]*)"
Try this. See the demo:
http://regex101.com/r/hQ1rP0/43

You need to use \G so that it gets the character within double quotes that was preceded by i:.;s:1:" (here the dot after i: represents any character). The anchor \G matches at the position where the previous match ended.
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';
echo preg_match_all('~(?:i:.;s:1:"|(?<!^)\G)(.)(?=";)~', $string, $match);
print_r($match[1]);
?>
Output:
4Array
(
[0] => 1
[1] => 3
[2] => 5
[3] => 6
)
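As the question's edit notes, the string is in PHP's serialized format, so a regex may not be needed at all. The sketch below is an assumption-heavy illustration: it supposes you only have the fragment shown in the question (the body of a serialized array with its `a:<count>:{...}` wrapper stripped) and rebuilds the wrapper before calling unserialize. If you have the complete serialized string, pass it to unserialize directly instead.

```php
<?php
$fragment = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';

// Assumption: every entry is one integer key ("i:<n>;") followed by one value,
// so counting the keys gives the array length for the "a:<count>:{...}" wrapper.
// This count would be wrong if a string value itself contained "i:<n>;".
$count = preg_match_all('/i:\d+;/', $fragment);

// allowed_classes => false prevents object instantiation from untrusted input.
$values = unserialize("a:{$count}:{" . $fragment . "}", ['allowed_classes' => false]);

print_r($values);
```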

How to get the string between two characters in this case?

I want to get the strings between "yes" and "yes", e.g.:
yes1231yesyes4567yes
output:
1231,4567
How can I do this in PHP?
Maybe there is one more output, an empty string '', between the "yesyes" in the middle, right?

In this particular example, you could use preg_match_all() with a regex to capture digits between "yes" and "yes":
preg_match_all("/yes(\d+)yes/", $your_string, $output_array);
print_r($output_array[1]);
// Array
// (
// [0] => 1231
// [1] => 4567
// )
And to achieve your desired output:
echo implode(',', $output_array[1]); // 1231,4567
Edit: as a side note, if you need a looser match that simply matches every run of digits in a string (e.g. 9yes123yes12 from your comment), use the regex (\d+); it will match 9, 123 and 12.
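On the question of the empty string between the middle "yesyes": the regex above never reports it, because its matches do not overlap. As an alternative sketch (not part of the original answer), splitting on "yes" with explode shows every piece between delimiters, including the empty ones, which can then be filtered out:

```php
<?php
$str = 'yes1231yesyes4567yes';

// Splitting on "yes" keeps every piece between delimiters, including empty ones:
// the pieces are '', '1231', '', '4567', ''.
$parts = explode('yes', $str);
print_r($parts);

// Drop the empty pieces and reindex to keep only the non-empty values.
$values = array_values(array_filter($parts, function ($p) {
    return $p !== '';
}));

echo implode(',', $values);
```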

Good Evening,
If you are referring to extracting the digits (as shown in your example) then you can achieve this by referencing the response of Christopher who answered a very similar question on this topic:
PHP code to remove everything but numbers
However, if you are instead looking to remove all instances of the word "yes" from the string, I would recommend the str_replace function supplied by PHP.
Here is an example to get you started:
echo str_replace("yes", "", "yes I do like cream on my eggs"); // " I do like cream on my eggs"
I trust that this information is of use.

Parsing non-node, intermittent XML values using regex

This is a question for the regex gurus.
If I have a series of xml nodes, I would like to parse out (using regex) the contained node values that exist on the same level as my current node. For instance, if I have:
<top-node>
Hi
<second-node>
Hello
<inner-node>
</inner-node>
</second-node>
Hey
<third-node>
Foo
</third-node>
Bar
</top-node>
I would like to retrieve an array that is:
array(
1 => 'Hi',
2 => 'Hey',
3 => 'Bar'
)
I know I can start with
$inside = preg_match('~<(\S+).*?>(?P<inside>(.|\s)*)</\1>~', $original_text);
and that will retrieve the text sans the top-node.
However, the next step is a bit beyond my regex abilities.
EDIT: Actually, that preg_match appears to work only if the $original_text is all on the same line. Additionally, I think I can use preg_split with a very similar regex to retrieve what I am looking for; it just isn't working across multiple lines.
NOTE: I appreciate and will oblige any requests for clarification; however, my question is pretty specific and I mean what I am asking, so don't give an answer like "go use SimpleXML" or something. Thank you for any and all assistance.

Description
This regex will capture the first level of text
(?:[\s\r\n]*<([^>\s]+)\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>.*?<\/\1>)?[\s\r\n]*\K(?!\Z)(?:(?![\s\r\n]*(?:<|\Z)).)*
Expanded
(?:[\s\r\n]*<([^>\s]+)\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>.*?<\/\1>)? # match any open tags until the close tags if they exist
[\s\r\n]* # match any leading spaces or new line characters
\K # reset the capture and only capture the desired substring which follows
(?!\Z) # validate substring is not the end of the string, this prevents the phantom empty array value at the end
(?:(?![\s\r\n]*(?:<|\Z)).)* # capture the text inside the current substring, this expression is self limiting and will stop when it sees whitespace ahead followed by end of string or a new tag
Example
Sample Text
This is assuming you've removed the first top level tags
Hi
<second-node>
Hello
<inner-node>
</inner-node>
</second-node>
Hey
<third-node>
Foo
</third-node>
Bar
Capture Groups
0: the full match, which is the desired text
1: the name of the subtag, which is back-referenced inside the regex
[0] => Array
(
[0] => Hi
[1] => Hey
[2] => Bar
)
[1] => Array
(
[0] =>
[1] => second-node
[2] => third-node
)
Disclaimer
This solution will get hung up on nested structures like:
Hi
<second-node>
Hello
<second-node>
</second-node>
This string will be found
</second-node>
Hey
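To try the expression from PHP, one possible way to call it is sketched below. The `~` delimiter, the `s` modifier (so that `.` also matches newlines inside child nodes), and the nowdoc quoting are assumptions of this sketch rather than part of the answer; the sample text is the one shown above with the outer top-level tags removed.

```php
<?php
// Sample text with the outer <top-node> tags already removed, as the answer assumes.
$text = "Hi\n<second-node>\nHello\n<inner-node>\n</inner-node>\n</second-node>\nHey\n<third-node>\nFoo\n</third-node>\nBar";

// A nowdoc avoids escaping the single and double quotes inside the pattern.
// Assumptions: "~" as the delimiter and the "s" modifier so "." also matches newlines.
$pattern = <<<'REGEX'
~(?:[\s\r\n]*<([^>\s]+)\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>.*?<\/\1>)?[\s\r\n]*\K(?!\Z)(?:(?![\s\r\n]*(?:<|\Z)).)*~s
REGEX;

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the first-level text; $matches[1] holds the skipped tag names.
print_r($matches[0]);
```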

Based on your own idea of splitting the text, I came up with:
$raw="<top-node>
Hi
<second-node>
Hello
<inner-node>
</inner-node>
</second-node>
Hey
<third-node>
Foo
</third-node>
Bar
</top-node>";
$reg='~<(\S+).*?>(.*?)</\1>~s';
preg_match_all($reg, $raw, $res);
$res = explode(chr(31), preg_replace($reg, chr(31), $res[2][0]));
Note: chr(31) is the 'unit separator' character.
Testing the resulting array with:
echo ("<xmp>start\n" . print_r($res, true) . "\nfin</xmp>");
That seems to work for one node, giving you the array you asked for, but it will probably have all sorts of problems. You might want to trim the returned values too.
EDIT:
Denomales' answer is probably better.
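For comparison, the preg_split idea from the question can be sketched directly. This is an illustration under assumptions: the `$inner` variable stands for the contents of the top node with the outer tags already stripped, and, like the other regex approaches here, it will misbehave when a child node contains a nested node of the same name.

```php
<?php
// Contents of <top-node> with the outer tags already stripped (an assumption for this sketch).
$inner = "\nHi\n<second-node>\nHello\n<inner-node>\n</inner-node>\n</second-node>\nHey\n<third-node>\nFoo\n</third-node>\nBar\n";

// Split on each child element (opening tag through its matching closing tag),
// swallowing the whitespace around it; the "s" modifier lets "." cross newlines.
$parts = preg_split('~\s*<([^\s>]+)[^>]*>.*?</\1>\s*~s', trim($inner));

print_r($parts);
```

Because the surrounding whitespace is part of the delimiter, the returned values need no extra trim.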

PHP Separate two different sections in one input

I'm working on a PHP based application extension that will extend a launcher style app via the TVRage API class to return results to a user wherever they may be. This is done via Alfred App (alfredapp.com).
I would like to add the ability to include show name followed by S##E##:
example: Mike & Molly S01E02
The show name can change, so I can't stop it there, but I want to separate the S##E## from the show name. This will allow me to use that information to continue the search via the API. Even better, if there was a way to grab the numbers, and only the numbers between the S and the E (in the example 01) and the numbers after E (in the example 02) that would be perfect.
I was thinking the best function is strpos but after looking closer that searches for a string within a string. I believe I would need to use a regex to correctly do this. That would leave me with preg_match. Which led me to:
$regex = ?;
preg_match( ,$input);
Problem is I just don't understand Regular Expressions well enough to write it. What regular expression could be used to separate the show name from the S##E## or get just the two separate numbers?
Also, if you have a good place to teach regular expressions, that would be fantastic.
Thanks!

You can turn it around and use strrpos to look for the last space in the string and then use substr to get two strings based on the position you found.
Example:
$your_input = trim($input); // make sure there are no spaces at the end (and the beginning)
$last_space_at = strrpos($your_input, " ");
$show = substr($your_input, 0, $last_space_at); // everything before the last space
$episode = substr($your_input, $last_space_at + 1);

Regex:
$text = 'Mike & Molly S01E02';
preg_match("/(.+)(S\d{2}E\d{2})/", $text, $output);
print_r($output);
Output:
Array
(
[0] => Mike & Molly S01E02
[1] => Mike & Molly
[2] => S01E02
)
If you want the digits separately:
$text = 'Mike & Molly S01E02';
preg_match("/(.+)S(\d{2})E(\d{2})/", $text, $output);
print_r($output);
Output:
Array
(
[0] => Mike & Molly S01E02
[1] => Mike & Molly
[2] => 01
[3] => 02
)
Explanation:
. --> Match any character (except a newline)
.+ --> Match any character one or more times
\d --> Match a digit
\d{2} --> Match 2 digits
The parentheses capture the matched parts as groups in the result array.
www.regular-expressions.info is a good place to learn regex.
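One detail worth knowing: in the patterns above, (.+) also captures the space before S01E02, so group 1 is actually "Mike & Molly " with a trailing space. A possible variation, sketched here with named groups (the names show, season and episode are illustrative choices), anchors the S##E## part at the end of the input and keeps the separator out of the show name:

```php
<?php
$text = 'Mike & Molly S01E02';

// Lazy show name, whitespace separator, then S##E## anchored at the end of the input.
// The i modifier also accepts lowercase input such as "s01e02".
if (preg_match('/^(?<show>.+?)\s+S(?<season>\d{2})E(?<episode>\d{2})$/i', $text, $m)) {
    $show    = $m['show'];           // show name without the trailing space
    $season  = (int) $m['season'];   // digits between S and E
    $episode = (int) $m['episode'];  // digits after E
    printf("%s: season %d, episode %d\n", $show, $season, $episode);
}
```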
