Regex for splitting on all unescaped semi-colons - PHP

I'm using PHP's preg_split to split a string on semi-colons, but I need it to split only on non-escaped semi-colons.
<?php
$str = "abc;def\\;abc;def";
$arr = preg_split("/;/", $str);
print_r($arr);
?>
Produces:
Array
(
[0] => abc
[1] => def\
[2] => abc
[3] => def
)
When I want it to produce:
Array
(
[0] => abc
[1] => def\;abc
[2] => def
)
I've tried "/(^\\)?;/" or "/[^\\]?;/" but they both produce errors. Any ideas?

This works.
<?php
$str = "abc;def\\;abc;def";
$arr = preg_split('/(?<!\\\\);/', $str);
print_r($arr);
?>
It outputs:
Array
(
[0] => abc
[1] => def\;abc
[2] => def
)
You need to use a negative lookbehind (read about lookarounds). Think of it as "match every ';' unless it is preceded by a '\'".
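A minimal sketch of the lookbehind split, with one optional extra step that is not part of the answer above: turning each escaped \; back into a plain ; after splitting. The variable names $parts and $clean are illustrative only.

```php
<?php
// Single-quoted input, so the backslash before ';' is kept literally.
$str = 'abc;def\;abc;def';

// '\\\\' in PHP source becomes the regex \\ : one literal backslash.
// (?<!\\); matches a ';' that is not directly preceded by a backslash.
$parts = preg_split('/(?<!\\\\);/', $str);

// Optional: unescape '\;' to ';' in every piece.
$clean = array_map(function ($piece) {
    return str_replace('\;', ';', $piece);
}, $parts);

print_r($clean);
```

Note that the lookbehind also treats \\; (an escaped backslash followed by a semi-colon) as escaped, so this suits input in which the backslash itself is never escaped.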

I am not really proficient with PHP regexes, but try this one:
/(?<!\\);/

Since Bart asks: Of course you can also use regex to split on unescaped ; and take escaped escape characters into account. It just gets a bit messy:
<?php
$str = "abc;def\;abc\\\\;def";
preg_match_all('/((?:[^\\\\;]|\\\.)*)(?:;|$)/', $str, $arr);
print_r($arr);
?>
Array
(
[0] => Array
(
[0] => abc;
[1] => def\;abc\\;
[2] => def
)
[1] => Array
(
[0] => abc
[1] => def\;abc\\
[2] => def
)
)
What this does is to take a regular expression for “(any character except \ and ;) or (\ followed by any character)” and allow any number of those, followed by a ; or the end of the string.
I'm not sure how PHP handles $ and end-of-line characters within a string; you may need to set some regex options to get exactly what you want there.
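If a preg_split call is preferred over preg_match_all, one alternative is sketched below. It is a different technique from the pattern above: it relies on PCRE's backtracking control verbs (*SKIP)(*FAIL). Any backslash-plus-character pair is consumed and then rejected as a match, so only unescaped semi-colons are left to act as delimiters.

```php
<?php
// The input as PHP sees it: abc;def\;abc\\;def
$str = 'abc;def\;abc\\\\;def';

// \\. (written '\\\\.' in PHP source) matches a backslash and the character
// after it; (*SKIP)(*FAIL) rejects that match and resumes after the pair.
// What remains for the second alternative is every unescaped ';'.
$parts = preg_split('/\\\\.(*SKIP)(*FAIL)|;/s', $str);

print_r($parts);
```

Here the semi-colon after the doubled backslash is treated as a real delimiter, because the two backslashes are consumed together as one escaped pair.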

Related

Split string in php with comma and new line

I'm trying to split a string in PHP using two delimiters: newline and comma. My code is:
$array = preg_split("/\n|,/", $str);
But the string gets split on commas, not on \n. Why is that? Also, do I have to take "\r\n" into account?
I can think of two possible reasons that this is happening.
1. You are using a single quoted string:
$array = preg_split("/\n|,/", 'foo,bar\nbaz');
print_r($array);
Array
(
[0] => foo
[1] => bar\nbaz
)
If so, use double quotes " instead ...
$array = preg_split("/\n|,/", "foo,bar\nbaz");
print_r($array);
Array
(
[0] => foo
[1] => bar
[2] => baz
)
2. You have multiple newline sequences, in which case I would recommend using \R. It matches any newline sequence, including \n, \r and \r\n.
$array = preg_split('/\R|,/', "foo,bar\nbaz\r\nquz");
print_r($array);
Array
(
[0] => foo
[1] => bar
[2] => baz
[3] => quz
)

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. The string is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $string);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?
preg_replace() is the wrong tool for this. I have used preg_match_all() in case you need the other item later, and trimmed your regex down to capture only the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
This gives you both the full matched pattern and the captured characters, so you can have [1,2] or just 1,2.
preg_replace is used to replace by regular expression!
I think you want to use preg_match_all() to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.
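Building on the same idea, here is a hedged sketch that collects every data attribute into an associative array keyed by name. The $values array and the conversion of each list to integers are illustrative additions, not something the question asked for.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

// Capture the key (data01, data02, ...) and the comma-separated list
// between the brackets. PREG_SET_ORDER groups the results per match.
preg_match_all('#(data[0-9]{2})=\[([0-9,]+)\]#', $string, $matches, PREG_SET_ORDER);

$values = [];
foreach ($matches as $m) {
    // $m[1] is the key, $m[2] is the list as text, e.g. "1,2"
    $values[$m[1]] = array_map('intval', explode(',', $m[2]));
}

print_r($values);
```

With this shape, $values['data01'] holds the list for data01 directly, without indexing into the nested match arrays.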
preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $string, $match);
var_dump($match);
See what you get, and work from there.

Split a string while keeping delimiters and string outside

I'm trying to do something that must be really simple, but I'm fairly new to PHP and I'm struggling with this one. What I want is to split a string containing 0, 1 or more delimiters (braces), while keeping the delimiters AND the string between AND the string outside.
ex: 'Hello {F}{N}, how are you?' would output :
Array ( [0] => Hello
[1] => {F}
[2] => {N}
[3] => , how are you? )
Here's my code so far:
$value = 'Hello {F}{N}, how are you?';
$array= preg_split('/[\{\}]/', $value,-1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($array);
which outputs (missing braces) :
Array ( [0] => Hello
[1] => F
[2] => N
[3] => , how are you? )
I also tried :
preg_match_all('/\{[^}]+\}/', $myValue, $array);
Which outputs (braces are there, but the text outside is flushed) :
Array ( [0] => {F}
[1] => {N} )
I'm pretty sure I'm on the right track with preg_split, but with the wrong regex. Can anyone help me with this? Or tell me if I'm way off?
You aren't capturing the delimiters. Add them to a capturing group:
/(\{.*?\})/
You need parentheses around the part of the expression to be captured:
preg_split('/(\{[^}]+\})/', $myValue, -1, PREG_SPLIT_DELIM_CAPTURE);
See the documentation for preg_split().
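Putting the pieces together, a minimal sketch that combines the capturing pattern from the answers with the flags from the question. PREG_SPLIT_NO_EMPTY is kept because the adjacent {F}{N} would otherwise leave an empty string between the two delimiters.

```php
<?php
$value = 'Hello {F}{N}, how are you?';

// The parentheses make the delimiter a captured group, and
// PREG_SPLIT_DELIM_CAPTURE returns that group alongside the pieces.
// PREG_SPLIT_NO_EMPTY drops the empty piece between {F} and {N}.
$parts = preg_split(
    '/(\{[^}]+\})/',
    $value,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

The text pieces keep their surrounding whitespace ("Hello " ends with a space), so trim them afterwards if that matters.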

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
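As a sketch of how the two result arrays might be paired up, the example below runs the attribute pattern on a hypothetical attribute string and combines names and values with array_combine. The second attribute (other='x y') is made up for illustration.

```php
<?php
// Hypothetical attribute string, as it might appear in $matches[2][0].
$atts = ' some="thing" other=\'x y\'';

// Group 1: attribute name, group 2: the opening quote, group 3: the value.
// The backreference \2 requires the closing quote to match the opening one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);

// Pair each attribute name with its value.
$attributes = array_combine($att_matches[1], $att_matches[3]);

print_r($attributes);
```

If the same attribute name appears twice, array_combine keeps only the last value for it.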
Use ((.|\n)*) instead of (.*) to capture multiple lines:
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment I can test with here but you could use a look behind and look ahead assertion and a back reference to match tags around the content. Something like this.
(?<=\[(\w)\]).*(?=\[\/\1\])
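For the "outside" text raised in the question's edit, one possible second step (a sketch, not taken from any of the answers) is to strip every complete shortcode pair and keep whatever remains. The sample input is shortened to a single line for illustration.

```php
<?php
$string = "[a_col] One [/a_col] outside [b_col] Two [/b_col]";

// Remove every complete [x_col]...[/x_col] pair; what is left is the
// text that sits outside the shortcodes. The s modifier lets . match
// newlines, and \1 ties the closing tag to the opening letter.
$outside = trim(preg_replace('#\[([abc])_col[^\]]*\].*?\[/\1_col\]#s', '', $string));

echo $outside;
```

If several outside fragments exist, they end up concatenated in one string, so this only suits cases where that is acceptable.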

Regex, get multiple occurrences

I would like to know how to get multiple occurrences from a regex.
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$do = preg_match("/<IF(.*)>.*<\/IF>/i", $str, $matches);
This is what I've done so far. It works if there is only one occurrence, but with more than one it doesn't return the right values. Here is the result:
Array ( [0] => firstValue in secondValue [1] => TEST>firstValue in
I need to get the "TEST" and the "OK" values.
EDIT: I've made the suggested modifications, thanks a lot, it works fine! However, I am now trying to add an ELSEIF branch and can't get it to work well. Here is what I've done:
$do = preg_match_all("~<IF([^<>]+)>([^<>]+)(</IF>|<ELSEIF([^<>]+)>([^<>]+)</IF>)~", $str, $matches, PREG_SET_ORDER);
and the results is
Array
(
[0] => Array
(
[0] => firstValuesecondValue
[1] => TEST
[2] => firstValue
[3] => secondValue
[4] => TEST1
[5] => secondValue
)
[1] => Array
(
[0] => thirdValue
[1] => OK
[2] => thirdValue
[3] =>
)
)
Is there a way to make my array cleaner? It has many useless elements, like [0][4].
You should make the regex more specific. The .* that you are using should either be less greedy, or better yet disallow other angle brackets:
~<IF([^<>]+)>([^<>]+)</IF>~i
More importantly, you should use preg_match_all, not just preg_match.
preg_match_all("~<IF([^<>]+)>([^<>]+)</IF>~i", $str, $matches, PREG_SET_ORDER);
That'll give you a nested array like:
[0] => Array
(
[0] => <IF TEST>firstValue</IF>
[1] => TEST
[2] => firstValue
)
[1] => Array
(
[0] => <IF OK>secondValue</IF>
[1] => OK
[2] => secondValue
)
The answers pointing out that you should use preg_match_all are correct.
But there is another problem: the .* is greedy by default. This will cause it to match both tags in a single match, so you need to make the star non-greedy (i.e. lazy):
/<IF(.*?)>.*?<\/IF>/i
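To see the difference, here is a small sketch that runs a greedy and a lazy version of the pattern on the question's string. The variable names $greedy and $lazy are illustrative.

```php
<?php
$str = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";

// Greedy: .* grabs as much as it can, so both tags end up in one match.
preg_match_all('/<IF (.*)>.*<\/IF>/i', $str, $greedy);

// Lazy: .*? stops at the first place the rest of the pattern can match.
preg_match_all('/<IF (.*?)>.*?<\/IF>/i', $str, $lazy);

print_r($greedy[1]); // a single capture spanning both tags
print_r($lazy[1]);   // one capture per tag
```

The negated character class [^<>]+ from the other answer achieves the same result without relying on laziness, and fails faster on malformed input.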
Use this code:
$string = "Some validations <IF TEST>firstValue</IF> in <IF OK>secondValue</IF> end of string.";
$regex = "/<IF (.*?)>.*?<\/IF>/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);
Your regex is good, but you have to make it non-greedy by adding the ? character, and use the preg_match_all() function.
Use a non-greedy match .*? and preg_match_all for this purpose.
