Split a string while keeping delimiters and string outside - php

I'm trying to do something that must be really simple, but I'm fairly new to PHP and I'm struggling with this one. What I want is to split a string containing 0, 1 or more delimiters (braces), while keeping the delimiters AND the string between AND the string outside.
For example, 'Hello {F}{N}, how are you?' would output:
Array ( [0] => Hello
[1] => {F}
[2] => {N}
[3] => , how are you? )
Here's my code so far:
$value = 'Hello {F}{N}, how are you?';
$array= preg_split('/[\{\}]/', $value,-1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($array);
which outputs (missing braces) :
Array ( [0] => Hello
[1] => F
[2] => N
[3] => , how are you? )
I also tried:
preg_match_all('/\{[^}]+\}/', $value, $array);
which outputs (the braces are there, but the text outside is dropped):
Array ( [0] => {F}
[1] => {N} )
I'm pretty sure I'm on the right track with preg_split, but with the wrong regex. Can anyone help me with this, or tell me if I'm way off?

You aren't capturing the delimiters. Add them to a capturing group:
/(\{.*?\})/

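You need parentheses around the part of the expression to be captured. PREG_SPLIT_DELIM_CAPTURE only returns delimiters that sit inside a capturing group, and PREG_SPLIT_NO_EMPTY drops the empty piece that would otherwise appear between {F} and {N}:
preg_split('/(\{[^}]+\})/', $value, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
See the documentation for preg_split().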
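Putting the suggestions together, a minimal sketch could look like the following. It assumes the sample string from the question; $parts is just an illustrative variable name, and PREG_SPLIT_NO_EMPTY is used so that the empty piece between two adjacent placeholders is dropped.

```php
<?php
$value = 'Hello {F}{N}, how are you?';

// The capturing group (...) makes preg_split() return the delimiters too.
// PREG_SPLIT_NO_EMPTY removes the empty string between {F} and {N}.
$parts = preg_split(
    '/(\{[^}]+\})/',
    $value,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

Note that the first piece keeps its trailing space ("Hello "); apply trim() to each piece if that matters.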

Related

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. This is the string:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $string);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool for this. I have used preg_match_all() in case you need the other item later, and trimmed your regex down so it captures only the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you both the full match and the captured group, so you can use either [1,2] ($match[0][0]) or just 1,2 ($match[1][0]).
preg_replace is used to replace text matched by a regular expression.
I think you want to use preg_match_all() to get each data attribute from the string.
The code you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $string, $match);
var_dump($match);
See what you get, and work from there.
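If only the data01 value is needed, a single preg_match() anchored on the key name is probably enough. This is a sketch under the assumption that the value never contains a closing bracket; $data01 is an illustrative variable name, not something from the question.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

// Capture the bracketed value that directly follows "data01=".
// [^\]]* assumes the value itself contains no "]".
if (preg_match('/data01=(\[[^\]]*\])/', $string, $match)) {
    $data01 = $match[1];
} else {
    $data01 = null; // key not present in the string
}

echo $data01;
```

Checking the return value of preg_match() avoids reading $match[1] when nothing matched.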

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
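As a rough illustration of the attribute pattern, the sketch below turns one attribute string into a name => value map. The sample $atts value and the $attributes variable are made up for the example.

```php
<?php
// Hypothetical attribute string, as it might appear in $matches[2].
$atts = ' some="thing" other=\'value\'';

// Group 1: attribute name, group 2: opening quote, group 3: value.
// \2 requires the closing quote to match the opening one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);

// Pair each name with its value.
$attributes = array_combine($att_matches[1], $att_matches[3]);

print_r($attributes);
```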
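Use ((.|\n)*) instead of (.*) to capture across multiple lines. The m modifier only changes how ^ and $ behave; it does not make . match newlines (the s modifier does that).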
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment to test with here, but you could use lookbehind and lookahead assertions with a backreference to match the tags around the content. Something like this:
(?<=\[(\w)\]).*(?=\[\/\1\])

Regex, get multiple occurrences

I would like to know how to get multiple occurrences from a regex.
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$do = preg_match("/<IF(.*)>.*<\/IF>/i", $str, $matches);
This is what I've done so far. It works if I have only one <IF> tag, but if I have more it doesn't return the right values. Here is the result:
Array ( [0] => <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> [1] => TEST>firstValue</IF> in <IF OK )
I need to get the "TEST" and the "OK" values.
EDIT: I've made the suggested modifications, thanks a lot, it works fine! However, I am now trying to add an ELSEIF branch and can't get it to work well. Here is what I've done:
$do = preg_match_all("~<IF([^<>]+)>([^<>]+)(</IF>|<ELSEIF([^<>]+)>([^<>]+)</IF>)~", $str, $matches, PREG_SET_ORDER);
and the result is
Array
(
[0] => Array
(
[0] => <IF TEST>firstValue<ELSEIF TEST1>secondValue</IF>
[1] => TEST
[2] => firstValue
[3] => <ELSEIF TEST1>secondValue</IF>
[4] => TEST1
[5] => secondValue
)
[1] => Array
(
[0] => <IF OK>thirdValue</IF>
[1] => OK
[2] => thirdValue
[3] => </IF>
)
)
Is there a way to make my array cleaner? It has many useless elements, such as [0][4].
You should make the regex more specific. The .* that you are using should either be less greedy, or better yet disallow other angle brackets:
~<IF([^<>]+)>([^<>]+)</IF>~i
More importantly, you should use preg_match_all, not just preg_match.
preg_match_all("~<IF([^<>]+)>([^<>]+)</IF>~i", $str, $matches, PREG_SET_ORDER);
That'll give you a nested array like:
[0] => Array
(
[0] => <IF TEST>firstValue</IF>
[1] => TEST
[2] => firstValue
)
[1] => Array
(
[0] => <IF OK>secondValue</IF>
[1] => OK
[2] => secondValue
)
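To get just the condition names and their values, one option is to loop over the PREG_SET_ORDER result. The sketch below is only an illustration: the \s+ after IF (so the captured name has no leading space) and the $conditions array are additions, not part of the original pattern.

```php
<?php
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";

// [^<>]+ cannot cross a tag boundary, so each <IF ...>...</IF> is matched separately.
preg_match_all('~<IF\s+([^<>]+)>([^<>]+)</IF>~i', $str, $matches, PREG_SET_ORDER);

// Map each condition name to the text inside its tag.
$conditions = [];
foreach ($matches as $m) {
    $conditions[$m[1]] = $m[2];
}

print_r($conditions);
```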
The answers pointing out that you should use preg_match_all are correct.
But there is another problem: the .* is greedy by default. This will cause it to match both tags in a single match, so you need to make the star non-greedy (i.e. lazy):
/<IF(.*?)>.*?<\/IF>/i
Use this code:
$string = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$regex = "/<IF (.*?)>.*?<\/IF>/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);
Your regex is fine, but you have to make the quantifiers non-greedy by adding the ? character, and use the preg_match_all() function.
Use a non-greedy match .*? and preg_match_all for this purpose.

return empty string from preg_split

Right now I'm trying to get this:
Array
(
[0] => hello
[1] =>
[2] => goodbye
)
Where index 1 is the empty string.
$toBeSplit= 'hello,,goodbye';
$textSplitted = preg_split('/[,]+/', $toBeSplit, -1);
$textSplitted looks like this:
Array
(
[0] => hello
[1] => goodbye
)
I'm using PHP 5.3.2
[,]+ matches one or more commas, as many as possible, so ,, is treated as a single delimiter. Use just /,/ and it works:
$textSplitted = preg_split('/,/', $toBeSplit, -1);
But you don't even need a regular expression:
$textSplitted = explode(',', $toBeSplit);
How about this:
$textSplitted = preg_split('/,/', $toBeSplit, -1);
Your split regex was grabbing all the commas, not just one.
Your pattern splits the text using a sequence of commas as the separator (its syntax also isn't ideal, as you're using a character class for no reason), so two (or two hundred) commas count as just one.
Anyway, since you're just using a literal character as the separator, use explode():
$str = 'hello,,goodbye';
print_r(explode(',', $str));
output:
Array
(
[0] => hello
[1] =>
[2] => goodbye
)
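The difference between the three approaches can be seen side by side in a small sketch. The variable names $greedy, $single and $exploded are only illustrative.

```php
<?php
$toBeSplit = 'hello,,goodbye';

// One or more commas count as ONE delimiter: the empty field disappears.
$greedy = preg_split('/[,]+/', $toBeSplit);

// Each comma is its own delimiter: the empty field is kept.
$single = preg_split('/,/', $toBeSplit);

// Same result without a regular expression.
$exploded = explode(',', $toBeSplit);

var_dump(count($greedy), count($single), count($exploded));
```

For a fixed single-character separator, explode() is usually the simpler choice.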

Regex for splitting on all unescaped semi-colons

I'm using php's preg_split to split up a string based on semi-colons, but I need it to only split on non-escaped semi-colons.
<?php
$str = "abc;def\\;abc;def";
$arr = preg_split("/;/", $str);
print_r($arr);
?>
Produces:
Array
(
[0] => abc
[1] => def\
[2] => abc
[3] => def
)
When I want it to produce:
Array
(
[0] => abc
[1] => def\;abc
[2] => def
)
I've tried "/(^\\)?;/" or "/[^\\]?;/" but they both produce errors. Any ideas?
This works.
<?php
$str = "abc;def\\;abc;def";
$arr = preg_split('/(?<!\\\\);/', $str);
print_r($arr);
?>
It outputs:
Array
(
[0] => abc
[1] => def\;abc
[2] => def
)
You need to make use of a negative lookbehind (read about lookarounds). Think of it as "match every ';' unless it is preceded by a '\'".
I am not really proficient with PHP regexes, but try this one:
/(?<!\\);/
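In PHP source, each backslash of that pattern has to be escaped once more inside the string literal. A minimal sketch, using single-quoted strings so the escaping stays predictable ($parts is an illustrative name):

```php
<?php
// Single-quoted: \\ is one literal backslash, so this is abc;def\;abc;def
$str = 'abc;def\\;abc;def';

// '\\\\' in the PHP literal becomes \\ in the regex, i.e. one literal backslash.
// (?<!\\) is a negative lookbehind: the ";" must not be preceded by a backslash.
$parts = preg_split('/(?<!\\\\);/', $str);

print_r($parts);
```

This simple lookbehind does not handle an escaped backslash directly before a semi-colon (\\;), which it would still treat as an escaped semi-colon.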
Since Bart asks: Of course you can also use regex to split on unescaped ; and take escaped escape characters into account. It just gets a bit messy:
<?php
$str = "abc;def\;abc\\\\;def";
preg_match_all('/((?:[^\\\\;]|\\\.)*)(?:;|$)/', $str, $arr);
print_r($arr);
?>
Array
(
[0] => Array
(
[0] => abc;
[1] => def\;abc\\;
[2] => def
)
[1] => Array
(
[0] => abc
[1] => def\;abc\\
[2] => def
)
)
What this does is to take a regular expression for “(any character except \ and ;) or (\ followed by any character)” and allow any number of those, followed by a ; or the end of the string.
I'm not sure how PHP handles $ and end-of-line characters within a string; you may need to set some regex options to get exactly what you want there.
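Because the pattern above is allowed to match an empty string, preg_match_all() can also report a trailing empty match at the end of the input. If empty fields are not needed, a simpler variant is to match the fields themselves with + instead of *. This is a sketch under that assumption, not a drop-in replacement, and $fields is an illustrative name.

```php
<?php
// Single-quoted literal for: abc;def\;abc\\;def
$str = 'abc;def\\;abc\\\\;def';

// A field is one or more of: (any char except \ and ;) or (\ followed by any char).
// Unescaped semi-colons are simply skipped between matches.
preg_match_all('/(?:[^\\\\;]|\\\\.)+/', $str, $m);
$fields = $m[0];

print_r($fields);
```

Since + requires at least one character, empty fields (as in a;;b) are silently dropped with this variant.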
