PHP Regexp: ignoring everything before a defined substring - php

I'm trying to parse a web page.
Basically it gets stored in a string that will look like this:
"[HTML CODE ...]world:[HTML CODE ...]my_number[REST OF HTML_CODE ...]"
Of course "world:" and "MY_NUMBER" are part of the html code, however I would like to ignore everything before the first occurrence of "world:". What I need is the first number that appears after the first occurrence of "world:", keeping in mind that a bunch of html code will be between those.
I could substring the html code but I would like to do this all just by using a single regex if possible.
This is the regular expression I tried to match:
'/(?<=world:)\D+?[0-9]+/'
But this returns me all the html stuff between "world:" and my number.
Thanks!

I think you were close to getting it. I was able to use this on the string you provided.
$subject = "[HTML CODE ...]world:[HTML CODE ...]3334[REST OF HTML_CODE ...]";
$pattern = "/world:\D+?(?<my_number>[0-9]+)/";
$matches = array();
$result = preg_match_all($pattern, $subject, &$matches);
print_r($matches);
Results in:
Array
(
[0] => Array
(
[0] => world:[HTML CODE ...]3334
)
[my_number] => Array
(
[0] => 3334
)
[1] => Array
(
[0] => 3334
)
)

Related

Finding values in a string via regex in php

I am trying to get information out of a textarea that contains certain strings (e.g. [name]) and find each item encased in the square brackets using regex patterns (currently tried using preg_match, preg_split, preg_quote, preg_match_all). It seems that the problem is in my regex pattern that I am providing for it.
My current regex:
$menuItems = preg_match_all('/[^[][([^[].*)]/U', $_SESSION['emailBody'], $menuItems);
I have tried many other patterns e.g.
/(?[...]\w+): (?[...]\d+)/
Any help that can be provided with this is greatly appreciated.
EDIT:
Sample input:
[email] address [to] name [from] someone
Message displayed on var_dump of the $menuItems variable:
array(1) { [0]=> string(0) "" }
EDIT 2:
Thank you to everyone for the help and support with this, I am pleased to say that it is all up and running perfectly!
From the comment stream above, you can simplify the regular expression as follows:
preg_match_all('/\[(.*)\]/U', $_SESSION['emailBody'], $menuItems);
One thing to note:
preg_match_all() fills the array in its 3rd parameter with the results of the matches. Your example line then overwrites this array with the result of preg_match_all() (an integer).
You should then be able to iterate over the results by using the following loop:
foreach ($menuItems[1] as $menuItem) {
// ...
}
Escape the square brackets and remove the dot:
$menuItems = preg_match_all('/[^[]\[([^[]*)\]/U', $_SESSION['emailBody'], $menuItems);
// here __^ __^ ^
preg_match_all doesn't return a string. You have to add an array for the last parameter:
preg_match_all('/\[([^[\]]*)\]/U', $_SESSION['emailBody'], $matches);
The matches are in the array $matches
print_r($matches);
Working example:
$str = '[email] address [to] name [from] someone';
preg_match_all('/\[([^[\]]*)\]/U', $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => [email]
[1] => [to]
[2] => [from]
)
[1] => Array
(
[0] => email
[1] => to
[2] => from
)
)
Here is a simple solution. This regex will capture all items encased in brackets along with brackets as well.
If you don't want brackets in result change regex to $regex = "/(?:\\[(\\w+)\\])/mi";
$subject = "[email] address [to] name [from] someone";
$regex = "/(\\[\\w+\\])/mi";
$matches = array();
preg_match_all($regex, $subject, &$matches);
print_r($matches);

Pattern for preg_match

I have a string contains the following pattern "[link:activate/$id/$test_code]" I need to get the word activate, $id and $test_code out of this when the pattern [link.....] occurs.
I also tried getting the inside items by using grouping but only gets active and $test_code couldn't get $id. Please help me to get all the parameter and action name in array.
Below is my code and output
Code
function match_test()
{
$string = "Sample string contains [link:activate/\$id/\$test_code] again [link:anotheraction/\$key/\$second_param]]] also how the other ationc like [link:action] works";
$pattern = '/\[link:([a-z\_]+)(\/\$[a-z\_]+)+\]/i';
preg_match_all($pattern,$string,$matches);
print_r($matches);
}
Output
Array
(
[0] => Array
(
[0] => [link:activate/$id/$test_code]
[1] => [link:anotheraction/$key/$second_param]
)
[1] => Array
(
[0] => activate
[1] => anotheraction
)
[2] => Array
(
[0] => /$test_code
[1] => /$second_param
)
)
Try this:
$subject = <<<'LOD'
Sample string contains [link:activate/$id/$test_code] again [link:anotheraction/$key/$second_param]]] also how the other ationc like [link:action] works
LOD;
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)]~i';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
if you need to have \$id and \$test_code separated you can use this instead:
$pattern = '~\[link:([a-z_]+)(/\$[a-z_]+)?(/\$[a-z_]+)?]~i';
Is this what you are looking for?
/\[link:([\w\d]+)\/(\$[\w\d]+)\/(\$[\w\d]+)\]/
Edit:
Also the problem with your expression is this part:
(\/\$[a-z\_]+)+
Although you have repeated the group, the match will only return one because it is still only one group declaration. The regex won't invent matching group numbers for you (Not that i've ever seen anyway).

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, would iterate over the array of attributes and perform this on each.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
use ((.|\n)*) instead of (.*) to capture multiple lines...
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment I can test with here but you could use a look behind and look ahead assertion and a back reference to match tags around the content. Something like this.
(?<=\[(\w)\]).*(?=\[\/\1\])

trying to filter string with <br> tags using explode, does not work

I get a string that looks like this
<br>
ACCEPT:YES
<br>
SMMD:tv240245ce
<br>
is contained in a variable $_session['result']
I am trying to parse through this string and get the following either in an array or as separate variables
ACCEPT:YES
tv240245ce
I first tried
to explode the string using as the delimiter, and that did not work
then I already tried
$yes = explode(":", strip_tags($_SESSION['result']));
echo print_r($yes);
which gives me an array like so
Array ( [0] => ACCEPT [1] => YESSEED [2] => tv240245ce ) 1
which gives me one of my answers.
Please what would be a great way of trying to achieve what I am trying to achieve?
is there a way to get rid of the first and last?
then use the remaining one as a delimiter to explode the string ?
or what's the best way to go about this ?
This will do it:
$data=preg_split('/\s?<br>\s?/', str_replace('SMMD:','',$data), NULL, PREG_SPLIT_NO_EMPTY);
See example here:
CodePad
You can also skip caring about the spurious <br> and treat the whole string as key:value format with a simple regex like:
preg_match_all('/^(\w+):(.*)/', $text, $result, PREG_SET_ORDER);
This requires that you really have line breaks in it though. Gives you a $result list which is easy to convert into an associative array afterwards:
[0] => Array
(
[0] => ACCEPT:YES
[1] => ACCEPT
[2] => YES
)
[1] => Array
(
[0] => SMMD:tv240245ce
[1] => SMMD
[2] => tv240245ce
)
First, do a str_replace to remove all instances of "SMMD:". Then, Explode on "< b r >\n". Sorry for weird spaced, it was encoding the line break.
Include the new line character and you should get the array you want:
$mystr = str_replace( 'SMMD:', '', $mystr );
$res_array = explode( "<br>\n", $mystr );

How does this RegEx for parsing emails work in PHP?

Okay, I have the following PHP code to extract an email address of the following two forms:
Random Stranger <email#domain.com>
email#domain.com
Here is the PHP code:
// The first example
$sender = "Random Stranger <email#domain.com>";
$pattern = '/([\w_-]*#[\w-\.]*)|.*<([\w_-]*#[\w-\.]*)>/';
preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);
echo "<pre>";
print_r($matches);
echo "</pre><hr>";
// The second example
$sender = "user#domain.com";
preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);
echo "<pre>";
print_r($matches);
echo "</pre>";
My question is... what is in $matches? It seems to be a strange collection of arrays. Which index holds the match from the parenthesis? How can I be sure I'm getting the email address and only the email address?
Update:
Here is the output:
Array
(
[0] => Array
(
[0] => Random Stranger
[1] => 0
)
[1] => Array
(
[0] =>
[1] => -1
)
[2] => Array
(
[0] => user#domain.com
[1] => 5
)
)
Array
(
[0] => Array
(
[0] => user#domain.com
[1] => 0
)
[1] => Array
(
[0] => user#domain.com
[1] => 0
)
)
This doesn't help you with your preg question but it will simplify your code. Since those are the only 2 options, dont use regular expressions
echo end( explode( '<', rtrim( $sender, '>' ) ) );
The following is copied directly from the help doc at http://us.php.net/preg_match
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The preg_match() manual page explains how $matches works. It's an optional parameter that gets filled with the results of any bracketed sub-expression from your regexp, in the order that they matched. $matches[0] is always the entire expression match, followed by the sub-expressions.
So for example, that pattern contains two sub-expression, ([\w_-]*#[\w-\.]*) and ([\w_-]*#[\w-\.]*). The parts matching those two expressions will be put into $matches[1] and $matches[2], respectively. I would guess after a quick glance that for the email address of Random Stranger <email#domain.com>, you would have something like this in $matches:
Array(
0 => "Random Stranger <email#domain.com>",
1 => "Random Stranger",
2 => "email#domain.com"
)
Think of it as passing an array named $matches by reference, that gets filled with all the sub-parts that are matched.
Edit - note that you are using the PREG_OFFSET_CAPTURE flag, which alters the behaviour of how $matches gets filled, so your result won't match my example. The manual explains how this flag alters the capture as well. In this case, instead of a set of matched sub-expressions, you get a multidimensional array of each expression with the position it was found at in the string.

Categories