I have a string that I want to explode by commas but only if the comma is not nested inside some parentheses. This is a fairly common use case, and I have been reading through answered posts in this forum, but not really found what I am looking for.
So, in detail: the point is, i have a string (= SQL SELECT ... FROM statement), and I want to extract the elements from the list separated by comma encoded in this string (= the name of the columns that one wants to select from. However, these elements can contain brackets, and effectively be function calls. For example, in SQL one could do
SELECT TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate, nameoftheguy FROM shippings WHERE ...
Obviously, I would like to have an array now containing as first element
TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate
and as second element
nameoftheguy
The approaches I have followed so far are
PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets),
PHP: Split a string by comma(,) but ignoring anything inside square brackets?,
Explode string except where surrounded by parentheses?, and
PHP: split string on comma, but NOT when between braces or quotes?
(focussing on the regex expressions therein, since I would like to do it with a single regex line), but in my little test area, those do not give the proper result. In fact, all of them split nothing or too much:
$Input: SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM
$Output: Array ( [0] => first [1] => second [2] => to_char [3] => (my,big,house) [4] => as [5] => bigly [6] => export [7] => (mastermind and others) [8] => as [9] => aloah )
The code of my test area is
<?php
function test($sql){
$foo = preg_match("/SELECT(.*?)FROM/", $sql, $match);
$bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
//$bar = preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $match[1], $list);
//$bar = preg_match_all("/[,]+(?![^\[]*\])/", $match[1], $list);
//$bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
//$bar = preg_match_all("/[^(,\s]+|\([^)]+\)/", $match[1], $list);
//$bar = preg_match_all("/([(].*?[)])|(\w)+/", $match[1], $list);
print "<br/>";
return $list[0];
}
print_r(test("SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM"));
?>
As you can imagine, I am not an regex expert, but I would like to do this splitting in a single line, if it is possible.
Following the conversation here, I did write a parser to solve this problem. It is quite ugly, but it does the job (at least within some limitations). For completeness (if anybody else might run into the same question), I post it here:
function full($sqlu){
$sqlu = strtoupper($sqlu);
if(strpos($sqlu, "SELECT ")===false || strpos($sqlu, " FROM ")===false) return NULL;
$def = substr($sqlu, strpos($sqlu, "SELECT ")+7, strrpos($sqlu, " FROM ")-7);
$raw = explode(",", $def);
$elements = array();
$rem = array();
foreach($raw as $elm){
array_push($rem, $elm);
$txt = implode(",", $rem);
if(substr_count($txt, "(") - substr_count($txt, ")") == 0){
array_push($elements, $txt);
$rem = array();
}
}
return $elements;
}
When feeding it with the following string
SELECT first, second, to_char(my,(big, and, fancy),house) as bigly, (SELECT myVar,foo from z) as super, export(mastermind and others) as aloah FROM table
it returns
Array ( [0] => first [1] => second [2] => to_char(my,(big, and, fancy),house) as bigly [3] => (SELECT myVar,foo from z) as super [4] => export(mastermind and others) as aloah )
Related
Take a look this string: (parent)item category(child)master data(name)category
by the way, that string is dynamic, and I want word inside () as array key and everything after () is that key value before next ()
how can I get the array result from the string above to this: ["parent" => "item category", "child" => "master data", "name" => "category"]?
This probably is what you are looking for:
<?php
$input = "(parent)item category(child)master data(name)category";
preg_match_all('/\(([^()]+)\)([^()]+)/', $input, $matches);
$output = array_combine($matches[1], $matches[2]);
print_r($output);
The output obviously is:
Array
(
[parent] => item category
[child] => master data
[name] => category
)
The approach uses a "regular expression" matching all occurrences of a pattern in the input string. All that is left is to combine the matched tokens which is done by the array_combine(...) call.
Note that such an approach works, but is very limited. It fails with more complex input structure due to the fact that pattern matching based on regular expressions is limited itself. In such cases you'd either have to implement a real language parser (or use a compiler-compiler like yacc or bison to do that for you). Or you simplify your input data structure which usually is more promising ;-)
you can use explode to get an array based on the selected word
<?php
$str = "(parent)item category(child)master data(name)category";
$list = explode("(", $str);
$x = [];
foreach($list as $item){
if($item != null) {
$i = explode(")",$item);
$x[$i[0]] = $i[1];
}
}
print_r($x);
I'm having troubles extracting several strings between tags from a variable, in order to echo them inside a span individually.
Thing is, the tags in question are determined by a variable, here is what it looks like :
<?php
$string = "[en_UK]english1[/en_UK][en_UK]english2[/en_UK][fr_FR]francais1[/fr_FR][fr_FR]francais2[/fr_FR][fr_FR]francais3[/fr_FR]";
$lang = "en_UK";
preg_match("/(.[$lang]), (.[\/$lang])/", $string, $outputs_list);
foreach ($outputs_list as $output) {
echo "<span>".$output."/span>";
}
// in this exemple I want to output :
// <span>english1</span>
// <span>english2</span>
?>
It's my first time using preg_match and after trying so many differents things I'm kinda lost right now.
Basically I want to extract every strings contained between the tags [$lang] and [/$lang] (in my exemple $lang = "en_UK" but it will be determined by the user's cookies.
I'd like some help figuring this out if possible,
Thanks
[] in a regular expression makes a character class. I'm not sure what you're trying to do with the .s and , either. Your regex currently says:
Any single character, an e, n, _, U, or K, a , and space, and again any single character, an e, n, _, U, K, but this time also allowing /.
Regex demo: https://regex101.com/r/8pmy89/2
I also believe you were grouping the wrong value. The () go around what you want to capture, know as a capture group.
I think the regex you want is:
\[$lang\](.+?)\[\/$lang\]
Regex demo: https://regex101.com/r/8pmy89/3
PHP Usage:
$string = "[en_UK]english1[/en_UK][en_UK]english2[/en_UK][fr_FR]francais1[/fr_FR][fr_FR]francais2[/fr_FR][fr_FR]francais3[/fr_FR]";
$lang = "en_UK";
preg_match_all("/\[$lang\](.+?)\[\/$lang\]/", $string, $outputs_list);
foreach ($outputs_list[1] as $output) {
echo "<span>".$output."/span>";
}
PHP demo: https://eval.in/686086
Preg_match vs. preg_match_all
Preg_match only returns the first match(es) of a regex. Preg_match_all returns all matches of the regex. The 0 index has what the full regex matched. All subsequent indexes are each capture groups, e.g. 1 is the first capture group.
Simple Demo: https://eval.in/686116
$string = '12345';
preg_match('/(\d)/', $string, $match);
print_r($match);
preg_match_all('/(\d)/', $string, $match);
print_r($match);
Output:
Array
(
[0] => 1
[1] => 1
)
^ is preg_match, below is the preg_match_all
Array
(
[0] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
)
[1] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
)
)
I made little late to post this answer. #chris already answered your question, so i made the solution broad which might help other users who may have different requirements.
<?php
$string = "[en_UK]english1[/en_UK][en_UK]english2[/en_UK][fr_FR]francais1[/fr_FR][fr_FR]francais2[/fr_FR][fr_FR]francais3[/fr_FR]";
$lang = "en_UK";
echo "User Language -";
preg_match_all("/\[$lang\](.+?)\[\/$lang\]/", $string, $outputs_list);
foreach ($outputs_list[1] as $output) {
echo "<span>".$output.",</span>";
}
echo "<br/>All languages - ";
//if all lang
preg_match_all("/[a-zA-Z0-9]{3,}/", $string, $outputs_list);
foreach ($outputs_list[0] as $output) {
echo "<span>".$output.",</span>";
}
echo "<br/>Type of lang - ";
//for showing all type of lang
preg_match_all("/(\[[a-zA-Z_]+\])*/", $string, $outputs_list);
foreach ($outputs_list[1] as $output) {
echo "<span>".$output."</span>";
}
?>
OUTPUT
User Language -english1,english2,
All languages - english1,english2,francais1,francais2,francais3,
Type of lang - [en_UK][en_UK][fr_FR][fr_FR][fr_FR]
I have bunch of strings like this:
a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc
And what I need to do is to split them up based on the hashtag position to something like this:
Array
(
[0] => A
[1] => AAX1AAY222
[2] => B
[3] => BBX4BBY555BBZ6
[4] => C
[5] => MMM1
[6] => D
[7] => ARA1
[8] => E
[9] => ABC
)
So, as you see the character right behind the hashtag is captured plus everything after the hashtag just right before the next char+hashtag.
I've the following RegEx which works fine only when I have a numeric value in the end of each part.
Here is the RegEx set up:
preg_split('/([A-Z])+#/', $text, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
And it works fine with something like this:
C#mmm1D#ara1
But, if I change it to this (removing the numbers):
C#mmmD#ara
Then it will be the result, which is not good:
Array
(
[0] => C
[1] => D
)
I've looked at this question and this one also, which are similar but none of them worked for me.
So, my question is why does it work only if it has followed by a number? and how I can solve it?
Here you can see some of them sample strings which I have:
a#123b#abcc#def456 // A:123, B:ABC, C:DEF456
a#abc1def2efg3b#abcdefc#8 // A:ABC1DEF2EFG3, B:ABCDEF, C:8
a#abcdef123b#5c#xyz789 // A:ABCDEF123, B:5, C:XYZ789
P.S. Strings are case-insensitive.
P.P.S. If you ever thinking what the hell are these strings, they are user submitted answers to a questionnaire, and I can't do anything on them like refactoring as they are already stored and just need to be proceed.
Why Not Using explode?
If you look at my examples you will see that I need to capture the character right before the # as well. If you think it's possible with explode() please post the output as well, thanks!
Update
Should we focus on why /([A-Z])+#/ works only if numbers included? thanks.
Instead of using preg_split(), decide what you want to match instead:
A set of "words" if followed by either <any-char># or <end-of-string>.
A character if immediately followed by #.
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);
Demo
This expression uses two look-ahead assertions. The results are in $matches[0].
Update
Another way of looking at it would be this:
preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $matches);
print_r(array_combine($matches[1], $matches[2]));
Each entry starts with a single character, followed by a hash, followed by X characters until either the end of the string is encountered or the start of a next entry.
The output is this:
Array
(
[a] => aax1aay222
[b] => bbx4bby555bbz6
[c] => mmm1
[d] => ara1
[e] => abc
)
If you still want to use preg_split you can remove the + and it might work as expected:
'/([A-Z])#/i'
Since then you only match the hashtag and ONE alpha character before, and not all them.
Example: http://codepad.viper-7.com/z1kFDb
Edit: Added a case-insensitive flag i in the pattern.
Use explode() rather than Regexp
$tmpArray = explode("#","a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc");
$myArray = array();
for($i = 0; $i < count($tmpArray) - 1; $i++) {
if (substr($tmpArray[$i],0,-1)) $myArray[] = substr($tmpArray[$i],0,-1);
if (substr($tmpArray[$i],-1)) $myArray[] = substr($tmpArray[$i],-1);
}
if (count($tmpArray) && $tmpArray[count($tmpArray) - 1]) $myArray[] = $tmpArray[count($tmpArray) - 1];
edit: I updated my answer to reflect better reading the questions
You can use explode() function that will split the string except the hash signs, like stated in the answers given before.
$myArray = explode("#",$string);
For the string 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc' this returns something like
$myarray = array('a', 'aax1aay22b', 'bbx4bby555bbz6c' ....);
All you need now is to take the last character of each string in array as another item.
$copy = array();
foreach($myArray as $item){
$beginning = substr($item,0,strlen($item)-1); // this takes all characters except the last one
$ending = substr($item,-1); // this takes the last one
$copy[] = $beginning;
$copy[] = $ending;
} // end foreach
This is an example, not tested.
EDIT
Instead of substr($item,0,strlen($item)-1); you might use substr($item,0,-1);.
Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //its the first packet, can't help it!
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //if the packet doesn't end with 0d0a, its 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array $dataFixed. In a foreach loop of the $data array, I check whether it starts with 7878. If it doesn't, I join it with the previous array in $data. I then unset the current array in $dataFixed and reset the array elements with array_values.
But I'm not very confident about this solution.. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like its supposed to? It will stick to the previous array element..
For e.g.: in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)(?!$)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],null,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,null,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,null,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provided your full dataset.
I think you are using a delimiter "0d0a" which also happens to be part of a content! Its not possible to avoid getting junk data as long as delimiter can also be part of content. Somehow delimiter must be unique.
Possible solutions.
Change the delimited to something else that doesn't occur as part of your data ( 000000, #!.;)
If you are definite about length of text that easy arrange item may have, use it. As per examples its not possible.
Solutions given in answers considering only sample data you have shared. If you are confidant about what will be the content of string, then these solutions given by others are pretty good to use. Otherwise these solutions wont assure you guarantee!
Best solution: Fix right delimiter then use regex or explode whatever you prefer.
Why don't you use preg_match_all instead? You can avoid all of the non-capturing groups (the look aheads, look behinds) in order to split the string (which without the non-capturing groups removes the matches), and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|[^(7878)]*?$)/', $string, $arr);
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 7878 or 0d0a.
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what #CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions. See example.
I am trying to match a semi dynamically generated string. So I can see if its the correct format, then extract the information from it that I need. My Problem is I no matter how hard I try to grasp regex can't fathom it for the life of me. Even with the help of so called generators.
What I have is a couple different strings like the following. [#img:1234567890] and [#user:1234567890] and [#file:file_name-with.ext]. Strings like this pass through are intent on passing through a filter so they can be replaced with links, and or more readable names. But again try as I might I can't come up with a regex for any given one of them.
I am looking for the format: [#word:] of which I will strip the [, ], #, and word from the string so I can then turn around an query my DB accordingly for whatever it is and work with it accordingly. Just the regex bit is holding me back.
Not sure what you mean by generators. I always use online matchers to see that my test cases work. #Virendra almost had it except forgot to escape the [] charaters.
/\[#(\w+):(.*)\]/
You need to start and end with a regex delimeter, in this case the '/' character.
Then we escape the '[]' which is use by regex to match ranges of characters hence the '['.
Next we match a literal '#' symbol.
Now we want to save this next match so we can use it later so we surround it with ().
\w matches a word. Basically any characters that aren't spaces, punctuation, or line characters.
Again match a literal :.
Maybe useful to have the second part in a match group as well so (.*) will match any character any number of times, and save it for you.
Then we escape the closing ] as we did earlier.
Since it sounds like you want to use the matches later in a query we can use preg_match to save the matches to an array.
$pattern = '/\[#(\w+):(.*)\]/';
$subject = '[#user:1234567890]';
preg_match($pattern, $subject, $matches);
print_r($matches);
Would output
array(
[0] => '[#user:1234567890]', // Full match
[1] => 'user', // First match
[2] => '1234567890' // Second match
)
An especially helpful tool I've found is txt2re
Here's what I would do.
<pre>
<?php
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
preg_match_all('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~',$subj,$matches,PREG_SET_ORDER);
foreach ($matches as &$arr) unset($arr[0],$arr[1],$arr[2],$arr[3]);
print_r($matches);
?>
</pre>
This will output
Array
(
[0] => Array
(
[match] => [#img:1234567890]
[type] => img
[value] => 1234567890
)
[1] => Array
(
[match] => [#user:1234567890]
[type] => user
[value] => 1234567890
)
[2] => Array
(
[match] => [#file:file_name-with.ext]
[type] => file
[value] => file_name-with.ext
)
)
And here's a pseudo version of how I would use the preg_replace_callback() function:
function replace_shortcut($matches) {
global $users;
switch (strtolower($matches['type'])) {
case 'img' : return '<img src="images/img_'.$matches['value'].'jpg" />';
case 'file' : return ''.$matches['value'].'';
// add id of each user in array
case 'user' : $users[] = (int) $matches['value']; return '%s';
default : return $matches['match'];
}
}
$users = array();
$replaceArr = array();
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
// escape percentage signs to avoid complications in the vsprintf function call later
$subj = strtr($subj,array('%'=>'%%'));
$subj = preg_replace_callback('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~',replace_shortcut,$subj);
if (!empty($users)) {
// connect to DB and check users
$query = " SELECT `id`,`nick`,`date_deleted` IS NOT NULL AS 'deleted'
FROM `users` WHERE `id` IN ('".implode("','",$users)."')";
// query
// ...
// and catch results
while ($row = $con->fetch_array()) {
// position of this id in users array:
$idx = array_search($row['id'],$users);
$nick = htmlspecialchars($row['nick']);
$replaceArr[$idx] = $row['deleted'] ?
"<span class=\"user_deleted\">{$nick}</span>" :
"{$nick}";
// delete this key so that we can check id's not found later...
unset($users[$idx]);
}
// in here:
foreach ($users as $key => $value) {
$replaceArr[$key] = '<span class="user_unknown">User'.$value.'</span>';
}
// replace each user reference marked with %s in $subj
$subj = vsprintf($subj,$replaceArr);
} else {
// remove extra percentage signs we added for vsprintf function
$subj = preg_replace('~%{2}~','%',$subj);
}
unset($query,$row,$nick,$idx,$key,$value,$users,$replaceArr);
echo $subj;
You can try something like this:
/\[#(\w+):([^]]*)\]/
\[ escapes the [ character (otherwise interpreted as a character set); \w means any "word" character, and [^]]* means any non-] character (to avoid matching past the end of the tag, as .* might). The parens group the various matched parts so that you can use $1 and $2 in preg_replace to generate the replacement text:
echo preg_replace('/\[#(\w+):([^]]*)\]/', '$1 $2', '[#link:abcdef]');
prints link abcdef