Related
I have a string that I want to explode by commas but only if the comma is not nested inside some parentheses. This is a fairly common use case, and I have been reading through answered posts in this forum, but not really found what I am looking for.
So, in detail: the point is, i have a string (= SQL SELECT ... FROM statement), and I want to extract the elements from the list separated by comma encoded in this string (= the name of the columns that one wants to select from. However, these elements can contain brackets, and effectively be function calls. For example, in SQL one could do
SELECT TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate, nameoftheguy FROM shippings WHERE ...
Obviously, I would like to have an array now containing as first element
TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate
and as second element
nameoftheguy
The approaches I have followed so far are
PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets),
PHP: Split a string by comma(,) but ignoring anything inside square brackets?,
Explode string except where surrounded by parentheses?, and
PHP: split string on comma, but NOT when between braces or quotes?
(focussing on the regex expressions therein, since I would like to do it with a single regex line), but in my little test area, those do not give the proper result. In fact, all of them split nothing or too much:
$Input: SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM
$Output: Array ( [0] => first [1] => second [2] => to_char [3] => (my,big,house) [4] => as [5] => bigly [6] => export [7] => (mastermind and others) [8] => as [9] => aloah )
The code of my test area is
<?php
function test($sql){
$foo = preg_match("/SELECT(.*?)FROM/", $sql, $match);
$bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
//$bar = preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $match[1], $list);
//$bar = preg_match_all("/[,]+(?![^\[]*\])/", $match[1], $list);
//$bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
//$bar = preg_match_all("/[^(,\s]+|\([^)]+\)/", $match[1], $list);
//$bar = preg_match_all("/([(].*?[)])|(\w)+/", $match[1], $list);
print "<br/>";
return $list[0];
}
print_r(test("SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM"));
?>
As you can imagine, I am not an regex expert, but I would like to do this splitting in a single line, if it is possible.
Following the conversation here, I did write a parser to solve this problem. It is quite ugly, but it does the job (at least within some limitations). For completeness (if anybody else might run into the same question), I post it here:
function full($sqlu){
$sqlu = strtoupper($sqlu);
if(strpos($sqlu, "SELECT ")===false || strpos($sqlu, " FROM ")===false) return NULL;
$def = substr($sqlu, strpos($sqlu, "SELECT ")+7, strrpos($sqlu, " FROM ")-7);
$raw = explode(",", $def);
$elements = array();
$rem = array();
foreach($raw as $elm){
array_push($rem, $elm);
$txt = implode(",", $rem);
if(substr_count($txt, "(") - substr_count($txt, ")") == 0){
array_push($elements, $txt);
$rem = array();
}
}
return $elements;
}
When feeding it with the following string
SELECT first, second, to_char(my,(big, and, fancy),house) as bigly, (SELECT myVar,foo from z) as super, export(mastermind and others) as aloah FROM table
it returns
Array ( [0] => first [1] => second [2] => to_char(my,(big, and, fancy),house) as bigly [3] => (SELECT myVar,foo from z) as super [4] => export(mastermind and others) as aloah )
I have following array
$type=Array(
[0] => PL
[1] => LWP
[2] => Carry Forward
[3] => SL
);
I want to convert it into String.I used implode function to convert it.
$newarray=implode(",", $type);
but it reurn string like below
PL,LWP,Carry Forward,SL
I want String like below
("PL","LWP","Carry Forward","SL")
Please help me...
Try this one ;)
$newarray='("'.implode('","', $type).'")';
As you want double quotes in the string generated,you can replace the glue of implode with implode('", "', $type) instead of implode(", ", $type)
These double quotes will not appear in the 1st and last value of string.. So add (" in the beginning and ") at the end of implode.
See the code below for entire syntax
<?php
$type=Array(
[0] => PL
[1] => LWP
[2] => Carry Forward
[3] => SL
);
//Convert array to string with double quotes for each value
$newarray='("'.implode('", "', $type).'")';
echo $newarray;
?>
You can try with "," glue and adding prefix (" and postfix ")
$newarray= '("' . implode('","', $type) . '")';
You should also think about edge case when you have empty array since you may want to handle it slightly different. In this case it would result ("")
As it stands, your question is unanswerable. You are trying to create a machine parseable representation of data without providing nearly enough information about how this is to be parsed.
e.g. What if one the elements already contains a double quote - you need to escape it. But which method do you use for escaping?
e.g. Quoting a numerical value may then affect its interpretation by the consumer of the data (although your example does not include numerical data).
You might consider using fputcsv(); (but this will only write to a stream - so if you need the data elsewhere then you'd need to capture the stream.
The current answers disregard the edge case of $type being empty by returning ("") instead of () which is probably what you want.
You may want to surround each element by quotes before adding the parentheses like so:
$type = array("PL", "LWP", "Carry Forward", "SL");
$quoted = array_map( function($element) { return "\"${element}\""; }, $type );
$newarray = '(' . implode(',', $quoted) . ')';
I have this problem:
One string like:
$stringa = "array('name' => 'John')"
I want obtain : array('name' => 'John') for use in my code like array
Any helps? Thanks
Caution using eval...
Caution The eval() language construct is very dangerous because it
allows execution of arbitrary PHP code. Its use thus is discouraged.
If you have carefully verified that there is no other option than to
use this construct, pay special attention not to pass any user
provided data into it without properly validating it beforehand.
<?php
$stringa = "array('name' => 'John')";
$code = "\$a = " . $stringa . ";";
eval($code);
print_r($a);
Gouda Elalfy was on the right idea, but his solution simply made the whole string a single array value.
First, we need to remove the excess datatype information:
$slimString = substr($stringa,6);
$slimString = rtrim($slimString,")");
This now gives us a string of:
'name' => 'John'
So then the Keys and the values in the string need to be split up,
so break at => as so:
For multiple values in the string:
This method also includes the single quotes to limit catching punctuation commas (please note this method was screwed up by trim not being as effective as I'd have liked and requiring str_replace quotes instead).
$slimString = str_replace("', '","=>", $slimString);
Then
$slimStringParts = explode("=>", $slimString);
This will split on => (or ,) so that multiple values of array contents can be generated.
Then cycle through each of the array pieces, on the basis that the EVEN numbers (and zero) are the Keys and the ODD numbers are the values, also removing the quotes as well for each one,
I was originally using trim but for some reason trim was not working as fully as I expected. so instead reverted to st_replace
foreach($slimStringParts as $key => $part){
if(($key%2) == 0){
$part = str_replace("'","",$part);
$arrayOutput[$part] = str_replace("'","",$slimStringParts[$key+1]);
}
}
unset($key, $part);
The foreach only acts upon the even and zero values as referenced above, and they take the original key value + 1 as their contents.
The trim/str_replace removes the single quotes and looks untidy but this works for a string of one or more array values
And finally the Output:
print_r($arrayOutput);
Test with the original:
input : "array('name' => 'John')"
Output : Array ( [name ] => John )
Tested with a multivalue array string:
input : "array('name' => 'John', 'surname' => 'James', 'Ride' => 'horse')"
Output : Array ( [name ] => John [surname ] => James [Ride ] => horse )
Full code:
$stringa = "array('name' => 'John')";
$stringb = "array('name' => 'John', 'surname' => 'James', 'Ride' => 'horse')";
$slimString = substr($stringb,6);
$slimString = rtrim($slimString,")");
$slimString = str_replace("', '","=>", $slimString);
$slimStringParts = explode("=>", $slimString);
foreach($slimStringParts as $key => $part){
if(($key%2) == 0){
$part = str_replace("'","",$part);
$arrayOutput[$part] = str_replace("'","",$slimStringParts[$key+1]);
}
}
unset($key,$part);
print_r($arrayOutput);
exit;
Please note my trim() idea was what I wanted but that seems to be influenced by my page character encoding :-(
You can use eval() function.
But it's not recommanded.
http://php.net/manual/en/function.eval.php
Eval is unsafe, try to use it as temporary method and find a better solution.
Example :
$stringa="array('name' => 'John');";
$array = eval($stringa);
Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //its the first packet, can't help it!
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //if the packet doesn't end with 0d0a, its 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array $dataFixed. In a foreach loop of the $data array, I check whether it starts with 7878. If it doesn't, I join it with the previous array in $data. I then unset the current array in $dataFixed and reset the array elements with array_values.
But I'm not very confident about this solution.. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like its supposed to? It will stick to the previous array element..
For e.g.: in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)(?!$)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],null,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,null,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,null,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provided your full dataset.
I think you are using a delimiter "0d0a" which also happens to be part of a content! Its not possible to avoid getting junk data as long as delimiter can also be part of content. Somehow delimiter must be unique.
Possible solutions.
Change the delimited to something else that doesn't occur as part of your data ( 000000, #!.;)
If you are definite about length of text that easy arrange item may have, use it. As per examples its not possible.
Solutions given in answers considering only sample data you have shared. If you are confidant about what will be the content of string, then these solutions given by others are pretty good to use. Otherwise these solutions wont assure you guarantee!
Best solution: Fix right delimiter then use regex or explode whatever you prefer.
Why don't you use preg_match_all instead? You can avoid all of the non-capturing groups (the look aheads, look behinds) in order to split the string (which without the non-capturing groups removes the matches), and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|[^(7878)]*?$)/', $string, $arr);
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 7878 or 0d0a.
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what #CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions. See example.
Hopefully, this is an easy one. I have an array with lines that contain output from a CSV file. What I need to do is simply remove any commas that appear between double-quotes.
I'm stumbling through regular expressions and having trouble. Here's my sad-looking code:
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '(?<=\")/\,/(?=\")'; //this doesn't work
$revised_input = preg_replace ( $pattern , '' , $csv_input);
echo $revised_input;
//would like revised input to echo: "herp","derp,"hey get rid of these commas man",1234
?>
Thanks VERY much, everyone.
Original Answer
You can use str_getcsv() for this as it is purposely designed for process CSV strings:
$out = array();
$array = str_getcsv($csv_input);
foreach($array as $item) {
$out[] = str_replace(',', '', $item);
}
$out is now an array of elements without any commas in them, which you can then just implode as the quotes will no longer be required once the commas are removed:
$revised_input = implode(',', $out);
Update for comments
If the quotes are important to you then you can just add them back in like so:
$revised_input = '"' . implode('","', $out) . '"';
Another option is to use one of the str_putcsv() (not a standard PHP function) implementations floating about out there on the web such as this one.
This is a very naive approach that will work only if 'valid' commas are those that are between quotes with nothing else but maybe whitespace between.
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '/([^"])\,([^"])/'; //this doesn't work
$revised_input = preg_replace ( $pattern , "$1$2" , $csv_input);
echo $revised_input;
//ouput for this is: "herp","derp","hey get rid of these commas man",1234
It should def be tested more but it works in this case.
Cases where it might not work is where you don't have quotes in the string.
one,two,three,four -> onetwothreefour
EDIT : Corrected the issues with deleting spaces and neighboring letters.
Well, I haven't been lazy and written a small function to do exactly what you need:
function clean_csv_commas($csv){
$len = strlen($csv);
$inside_block = FALSE;
$out='';
for($i=0;$i<$len;$i++){
if($csv[$i]=='"'){
if($inside_block){
$inside_block=FALSE;
}else{
$inside_block=TRUE;
}
}
if($csv[$i]==',' && $inside_block){
// do nothing
}else{
$out.=$csv[$i];
}
}
return $out;
}
You might be coming at this from the wrong angle.
Instead of removing the commas from the text (presumably so you can then split the string on the commas to get the separate elements), how about writing something that works on the quotes?
Once you've found an opening quote, you can check the rest of the string; anything before the next quote is part of this element. You can add some checking here to look for escaped quotes, too, so things like:
"this is a \"quote\""
will still be read properly.
Not exactly an answer you've been looking for - But I've used it for cleaning commas in numbers in CSV.
$csv = preg_replace('%\"([^\"]*)(,)([^\"]*)\"%i','$1$3',$csv);
"3,120", 123, 345, 567 ==> 3120, 123, 345, 567