Split a string on a delimiter that is not inside specific characters - PHP

I have a string in the following format
,"value","value2","3",("this is, a test"), "3"
How can I split by commas when they are not within parentheses?
Edit: Sorry, a slight correction: inside the parentheses the format is actually
,"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"

Consider this code:
$str = ',"value","value2","3",(THIS IS THE FORMAT \) "AND QUOTES, INSIDE"), "3"';
$regex = '#(\(.*?(?<!\\\)\))\s*,|,#';
$arr = preg_split( $regex, $str, 0, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY );
print_r($arr);
OUTPUT:
Array
(
[0] => "value"
[1] => "value2"
[2] => "3"
[3] => (THIS IS THE FORMAT \) "AND QUOTES, INSIDE")
[4] => "3"
)
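A regex-free alternative is to scan the string once and track the parenthesis depth. The following is only a minimal sketch: the function name splitOutsideParens is hypothetical, and it assumes that a backslash escapes the next character and that commas inside quotes but outside parentheses should still split.

```php
<?php
// Hypothetical helper: split on commas that are outside parentheses.
function splitOutsideParens(string $str): array
{
    $parts = [];
    $current = '';
    $depth = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            // Keep an escaped character such as "\)" verbatim.
            $current .= $ch . $str[++$i];
            continue;
        }
        if ($ch === '(') {
            $depth++;
        } elseif ($ch === ')' && $depth > 0) {
            $depth--;
        }
        if ($ch === ',' && $depth === 0) {
            // A top-level comma ends the current field.
            $parts[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $ch;
    }
    $parts[] = trim($current);
    // Drop empty fields such as the one before the leading comma.
    return array_values(array_filter($parts, fn($p) => $p !== ''));
}

$str = ',"value","value2","3",(THIS IS THE FORMAT \) "AND QUOTES, INSIDE"), "3"';
print_r(splitOutsideParens($str));
```

Unlike the regex, this keeps working when the parentheses are nested, at the cost of a few more lines.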

The quotes are already sufficient to delimit the comma, so you don't need parens as well. If you take out the parens, str_getcsv() will work on it just fine. If you don't have control of the source, you can strip them yourself:
$str = str_replace('",("', '","', $str);
$str = str_replace('"), "', '", "', $str);
print_r(str_getcsv($str));
Edit for updated question:
You're still OK as long as there are no unescaped parens in the file. Just convert close parens to open parens (since str_getcsv() accepts only a single character as the enclosure), and then use the open paren as your quote character:
$str = str_replace(')', '(', $str);
print_r(str_getcsv($str, ',', '('));
Result:
Array
(
[0] =>
[1] => "value"
[2] => "value2"
[3] => "3"
[4] => THIS IS THE FORMAT "AND QUOTES, INSIDE"
[5] => "3"
)

The above solutions work fine, but I have one more:
preg_match_all('#(,)?("|(\())(.+?)((?(3)\)|"))(,)?#',$str,$arr);
The output of this one is
Array
(
[0] => Array
(
[0] => ,"value",
[1] => "value2",
[2] => "3",
[3] => ("this is, a test"),
[4] => "3"
)
[1] => Array
(
[0] => ,
[1] =>
[2] =>
[3] =>
[4] =>
)
[2] => Array
(
[0] => "
[1] => "
[2] => "
[3] => (
[4] => "
)
[3] => Array
(
[0] =>
[1] =>
[2] =>
[3] => (
[4] =>
)
[4] => Array
(
[0] => value
[1] => value2
[2] => 3
[3] => "this is, a test"
[4] => 3
)
[5] => Array
(
[0] => "
[1] => "
[2] => "
[3] => )
[4] => "
)
[6] => Array
(
[0] => ,
[1] => ,
[2] => ,
[3] => ,
[4] =>
)
)
So $arr[4] contains the matches.

Here’s a simple tokenizer that you can use to split the input into strings and other characters:
preg_match_all('/"(?:[^\\\\"]|\\\\.)*"|[^"]/', $input, $tokens);
If you want to parse the input, just iterate over the tokens and do whatever syntax check you want. You can identify the strings by the quote at the beginning and end of the token.
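Iterating the tokens could look like the sketch below. It assumes the first sample string from the question and that the goal is to split on commas outside parentheses; the variable names $fields, $current and $depth are illustrative. The backslashes in the pattern are doubled so that PCRE receives \\. (a backslash followed by any character).

```php
<?php
$input = ',"value","value2","3",("this is, a test"), "3"';

// Tokenize into quoted strings and single non-quote characters.
preg_match_all('/"(?:[^\\\\"]|\\\\.)*"|[^"]/', $input, $m);
$tokens = $m[0];

$fields = [];
$current = '';
$depth = 0;
foreach ($tokens as $token) {
    if ($token === '(') {
        $depth++;
    } elseif ($token === ')') {
        $depth--;
    }
    // A quoted string is one token, so a comma inside it never equals ','.
    if ($token === ',' && $depth === 0) {
        $fields[] = $current;
        $current = '';
        continue;
    }
    $current .= $token;
}
$fields[] = $current;
print_r($fields);
```

The first field is empty because the input starts with a comma, and the last one keeps its leading space; apply trim() or filter as needed.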

preg_match("/,?\"(.*?)\",?/", $myString, $result);
You can check the regex here
Edit: The only solution I can quickly think of for escaped quotes is to replace them and add them back later
preg_match("/,?\"(.*?)\",?/", str_replace('\"', "'", $myString), $result);

Related

How to break words that come before and after "-" using PHP?

I want to separate the words that come before and after the "-" using PHP. I am unable to separate the text that has a space before and after the "-".
<?php
$value="this | is : my , text - test , done > hello-hi";
$keywords = preg_split("/[,|:&>]+/", $value);
print_r($keywords);
?>
Output I am getting:
Array ( [0] => this [1] => is [2] => my [3] => text - test [4] => done [5] => hello-hi )
Output I want:
Array ( [0] => this [1] => is [2] => my [3] => text [4]=>test [5] => done [6] => hello-hi )
I think you could make a group and add an alternation (|) to also split the words around the -.
([,|:&>]+|\s+-\s+) splits on one or more of ,|:&> or on a - that has one or more whitespace characters on each side. This prevents text like hello-hi from being split into two elements.
$value="this | is : my , text - test , done > hello-hi";
$keywords = preg_split("/([,|:&>]+|\s+-\s+)/", $value);
print_r($keywords);
Output:
Array ( [0] => this [1] => is [2] => my [3] => text [4] => test [5] => done [6] => hello-hi )
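Note that with the pattern above the elements still carry the spaces that surrounded the punctuation (for example "this " and " is "), which print_r does not make obvious. A possible variant, given here only as a sketch, lets \s* absorb that whitespace and drops the capturing group, which is not needed without PREG_SPLIT_DELIM_CAPTURE:

```php
<?php
$value = "this | is : my , text - test , done > hello-hi";

// \s* absorbs whitespace around the punctuation delimiters; the dash still
// needs whitespace on both sides, so "hello-hi" stays intact.
$keywords = preg_split('/\s*[,|:&>]+\s*|\s+-\s+/', $value, -1, PREG_SPLIT_NO_EMPTY);
print_r($keywords);
```

Calling array_map('trim', ...) on the original result would be an equally reasonable way to get the same effect.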

preg_match_all with named subpatterns

I can't figure out the following expression:
preg_match_all('/[(?P<slug>\w+\-)\-(?P<flag>(m|t))\-(?P<id>\d+)]+/', $slugs, $matches);
My $slugs variable is something like this:
article-slug-one-m-111617/article-slug-two-t-111611/article-slug-three-t-111581/article-slug-four-m-111609/
Your expression looks like an attempt to split up the path elements into slug, flag and id parts. It fails because the brackets [ ... ] define a character class (a set of characters to match), but here they seem to be used to group things together, like parentheses. It also fails to get the slug part right, as it does not allow for more than one series of word characters \w followed by a dash -. That is, that part matches 'article-' but not 'article-slug-one-'.
Maybe this is what you want?
$slugs = 'article-slug-one-m-111617/article-slug-two-t-111611/article-slug-three-t-111581/article-slug-four-m-111609/';
preg_match_all('/(?P<slug>[\w-]+)\-(?P<flag>[mt])\-(?P<id>\d+)/', $slugs, $matches);
echo "First slug : " . $matches['slug'][0], PHP_EOL;
echo "Second flag: " . $matches['flag'][1], PHP_EOL;
echo "Third ID : " . $matches['id'][2], PHP_EOL;
print_r($matches);
Output:
First slug : article-slug-one
Second flag: t
Third ID : 111581
Array
(
[0] => Array
(
[0] => article-slug-one-m-111617
[1] => article-slug-two-t-111611
[2] => article-slug-three-t-111581
[3] => article-slug-four-m-111609
)
[slug] => Array
(
[0] => article-slug-one
[1] => article-slug-two
[2] => article-slug-three
[3] => article-slug-four
)
[1] => Array
(
[0] => article-slug-one
[1] => article-slug-two
[2] => article-slug-three
[3] => article-slug-four
)
[flag] => Array
(
[0] => m
[1] => t
[2] => t
[3] => m
)
[2] => Array
(
[0] => m
[1] => t
[2] => t
[3] => m
)
[id] => Array
(
[0] => 111617
[1] => 111611
[2] => 111581
[3] => 111609
)
[3] => Array
(
[0] => 111617
[1] => 111611
[2] => 111581
[3] => 111609
)
)
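If one record per path element is more convenient than one array per group, the PREG_SET_ORDER flag groups the results by match instead. A short sketch with a shortened input; the $rows variable and the integer cast of the id are illustrative choices:

```php
<?php
$slugs = 'article-slug-one-m-111617/article-slug-two-t-111611/';

// PREG_SET_ORDER returns one array per match, each holding all named groups.
preg_match_all('/(?P<slug>[\w-]+)-(?P<flag>[mt])-(?P<id>\d+)/', $slugs, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $match) {
    $rows[] = [
        'slug' => $match['slug'],
        'flag' => $match['flag'],
        'id'   => (int) $match['id'],
    ];
}
print_r($rows);
```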

How to stop splitting within a pair of second delimiter in preg_split (PHP)?

I need to generate an array with preg_split such that implode('', $array) regenerates the original string. preg_split of
$str = 'this is a test "some quotations is her" and more';
$array = preg_split('/( |".*?")/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
generates an array of
Array
(
[0] => this
[1] =>
[2] => is
[3] =>
[4] => a
[5] =>
[6] => test
[7] =>
[8] =>
[9] => "some quotations is her"
[10] =>
[11] =>
[12] => and
[13] =>
[14] => more
)
I need to take care of the space before/after the quotation marks too, to generate an array with the exact pattern of the original string.
For example, if the string is test "some quotations is here"and, the array should be
Array
(
[0] => test
[1] =>
[2] => "some quotations is here"
[3] => and
)
Note: The edit has been made based on initial discussion with #mikel.
Will this work for you?
preg_split('/( ?".*?" ?| )/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
This should do the trick
$str = 'this is a test "some quotations is her" and more';
$result = preg_split('/(?:("[^"]+")|\b)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$result = array_slice($result, 1,-1);
Output
Array
(
[0] => this
[1] =>
[2] => is
[3] =>
[4] => a
[5] =>
[6] => test
[7] =>
[8] => "some quotations is her"
[9] =>
[10] => and
[11] =>
[12] => more
)
Reconstruction
implode('', $result);
// => this is a test "some quotations is her" and more
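For the second example in the question, where the quotation is immediately followed by text, a simpler variant may be enough. This sketch captures both quoted runs and single spaces as delimiters and uses PREG_SPLIT_NO_EMPTY to drop the empty pieces between adjacent delimiters; the round trip through implode is checked at the end.

```php
<?php
$str = 'test "some quotations is here"and more';

// The first alternative captures a quoted run, the second a single space;
// everything else is returned as the pieces between the delimiters.
$parts = preg_split('/(".*?"| )/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($parts);

// Every character ends up in exactly one element, so the string is rebuilt.
var_dump(implode('', $parts) === $str);
```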

What is the regex for the text between quotes?

Ok, I have tried looking at other answers, but couldn't get mine solved. So here is the code:
{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}
I need to get every second value in the quotes (as the "name" values are constant). I actually worked out that I need to get the text between :" and ", but I can't manage to write a regex for that.
EDIT: I'm doing preg_match_all in PHP. And it's between :" and ", not " and " as someone else edited.
Why on earth would you attempt to parse JSON with regular expressions? PHP already parses JSON properly, with built-in functionality.
Code:
<?php
$input = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';
print_r(json_decode($input, true));
?>
Output:
Array
(
[chg] => -0.71
[vol] => 40700
[time] => 11.08.2011 12:29:09
[high] => 1.417
[low] => 1.360
[last] => 1.400
[pcl] => 1.410
[turnover] => 56,560.25
)
Live demo.
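Since the question asks only for the values, array_values() on the decoded array gives them as a plain list. A small sketch with a shortened input; the null check assumes the payload is always a JSON object, so a null result means decoding failed:

```php
<?php
$input = '{"chg":"-0.71","vol":"40700","turnover":"56,560.25"}';

$data = json_decode($input, true);
if ($data === null) {
    // json_decode() returns null when the input cannot be decoded.
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Discard the constant keys and keep only the values, in order.
$values = array_values($data);
print_r($values);
```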
You may need to escape characters or add a forward slash to the front or back depending on your language. But it's basically:
:"([^"].*?)"
or
/:"([^"].*?)"/
I've tested this in Groovy as below and it works.
import java.util.regex.*;
String test='{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}'
// Compile a pattern that captures the text between :" and the next "
Pattern p = Pattern.compile(':"([^"]*)"');
// Find every match and print the captured group; using group(1) avoids
// stripping the colons inside values such as the time
Matcher m = p.matcher(test);
while (m.find())
System.out.println("Found comment: "+m.group(1));
Output was:
Found comment: -0.71
Found comment: 40700
Found comment: 11.08.2011 12:29:09
Found comment: 1.417
Found comment: 1.360
Found comment: 1.400
Found comment: 1.410
Found comment: 56,560.25
PHP Example
<?php
$subject = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';
$pattern = '/(?<=:")[^"]*/';
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
Output is:
Array ( [0] => Array ( [0] => Array ( [0] => -0.71 [1] => 8 ) [1] => Array ( [0] => 40700 [1] => 22 ) [2] => Array ( [0] => 11.08.2011 12:29:09 [1] => 37 ) [3] => Array ( [0] => 1.417 [1] => 66 ) [4] => Array ( [0] => 1.360 [1] => 80 ) [5] => Array ( [0] => 1.400 [1] => 95 ) [6] => Array ( [0] => 1.410 [1] => 109 ) [7] => Array ( [0] => 56,560.25 [1] => 128 ) ) )

How to preg_split using PREG_SPLIT_DELIM_CAPTURE

$str = "blabla and, some more blah";
$delimiters = " ,¶.\n";
$char_buff = preg_split("/(,) /", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($char_buff);
I get:
Array (
[0] => blabla and
[1] => ,
[2] => some more blah
)
I was able to figure out how to use the parentheses to get the comma to show up in its own array element, but how can I do this with multiple different delimiters (for example, those in the $delimiters variable)?
You need to create a character class by wrapping the delimiters with [ and ].
<?php
$str = "blabla and, some more blah. Blah.\nSecond line.";
$delimiters = " ,¶.\n";
$char_buff = preg_split('/([' . $delimiters . '])/', $str, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($char_buff);
You also need to use PREG_SPLIT_NO_EMPTY so that in places where you get two matches in a row, for instance a comma followed by a space, you don't get an empty match.
Output
Array
(
[0] => blabla
[1] =>
[2] => and
[3] => ,
[4] =>
[5] => some
[6] =>
[7] => more
[8] =>
[9] => blah
[10] => .
[11] =>
[12] => Blah
[13] => .
[14] =>
[15] => Second
[16] =>
[17] => line
[18] => .
)
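If the delimiter list can contain characters that are special inside a character class (such as ], ^, - or \), it is safer to pass it through preg_quote() first. The sketch below also adds the u modifier so that the multi-byte ¶ is treated as one character; this assumes the source file and the subject are valid UTF-8, otherwise preg_split() fails.

```php
<?php
$str = "blabla and, some more blah. Blah.\nSecond line.";
$delimiters = " ,¶.\n";

// preg_quote() escapes regex metacharacters; its second argument also
// escapes the "/" used as the pattern delimiter.
$pattern = '/([' . preg_quote($delimiters, '/') . '])/u';

$char_buff = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($char_buff);
```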
Depending on what you are doing, using strtok may be a more appropriate way of doing it though.
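A rough sketch of the strtok approach follows. Keep in mind that strtok() discards the delimiters, so it only fits when they do not need to appear in the result; the shortened input and the ASCII-only delimiter list are assumptions made for the example.

```php
<?php
$str = "blabla and, some more blah.";
$words = [];

// strtok() treats every character of the second argument as a delimiter
// and skips runs of consecutive delimiters.
$token = strtok($str, " ,.\n");
while ($token !== false) {
    $words[] = $token;
    // Later calls pass only the delimiters and continue in the same string.
    $token = strtok(" ,.\n");
}
print_r($words);
```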
Use something like:
'/([,.])/'
That is, put each delimiter inside the square brackets.
Each delimiter expression needs to be inside its own group.
print_r(preg_split('/2\d4/' , '12345', -1, PREG_SPLIT_DELIM_CAPTURE));
Array ( [0] => 1 [1] => 5 )
print_r(preg_split('/(2)(\d)(4)/', '12345', -1, PREG_SPLIT_DELIM_CAPTURE));
Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4 [4] => 5 )
