This may be possible with a regular expression, but I have no idea how. I am trying to parse a string on a given delimiter, except that when the parser sees a set of brackets it should treat their contents differently. As I am a visual learner, let me show you an example of what I am attempting to achieve. (PS: the string is parsed from a URL.)
Given the string input:
String1,String2(data1,data2,data3),String3,String4
How can I "transform" this string into this array:
{
"String1": "String1",
"String2": [
"data1",
"data2",
"data3"
],
"String3": "String3",
"String4": "String4"
}
Formatting doesn't have to be this strict as I'm just attempting to make a simple API for my project.
Obviously things like
array explode ( string $delimiter , string $string [, int $limit = PHP_INT_MAX ] )
wouldn't work, because there are commas inside the brackets as well. I've attempted manual parsing, looking at one character at a time, but I fear for the performance, and it doesn't actually work anyway. I've pasted the gist of my attempt.
https://gist.github.com/Fudge0952/24cb4e6a4ec288a4c492
While you could try to split your initial string on commas and ignore anything in parentheses for the first split, this necessarily makes assumptions about what those string values can actually be (possibly requiring escaping/unescaping values depending on what those strings have to contain).
If you have control over the data format, though, it would be far better to just start with JSON. It's well-defined and well-supported.
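As a rough sketch of the JSON route, assuming the client URL-encodes a JSON document into a query parameter (the parameter name `data` and the helper `decodePayload()` are made up for illustration and are not part of the question):

```php
<?php
// Hypothetical helper: pull a JSON payload out of a query string and decode it.
function decodePayload(string $queryString): array
{
    // parse_str() splits the query string and URL-decodes the values.
    parse_str($queryString, $params);

    // Decode to associative arrays; invalid or missing JSON yields null.
    $decoded = json_decode($params['data'] ?? '', true);
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Payload is not a JSON object or array');
    }
    return $decoded;
}

// Assumed example payload, mirroring the structure from the question.
$json  = '{"String1":"String1","String2":["data1","data2","data3"]}';
$query = 'data=' . rawurlencode($json);

var_export(decodePayload($query));
```

With this approach nesting, escaping and commas inside values are handled by the JSON encoder and decoder rather than by custom parsing code.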
You can either build an ad-hoc parser like (mostly untested):
<?php
$p = '!
[^,\(\)]+ # token: String
|, # token: comma
|\( # token: open
|\) # token: close
!x';
$input = 'String1,String2(data1,data2,data3,data4(a,b,c)),String3,String4';
preg_match_all($p, $input, $m);
// using a norewinditerator, so we can use nested foreach-loops on the same iterator
$it = new NoRewindIterator(
new ArrayIterator($m[0])
);
var_export( foo( $it ) );
function foo($tokens, $level=0) {
$result = [];
$current = null;
foreach( $tokens as $t ) {
switch($t) {
case ')':
break 2; // leave the foreach loop, not just the switch
case '(':
if ( is_null($current) ) {
throw new Exception('moo');
}
$tokens->next();
$result[$current] = foo($tokens, $level+1);
$current = null;
break;
case ',':
if ( !is_null($current) ) {
$result[] = $current;
$current = null;
}
break;
default:
$current = $t;
break;
}
}
if ( !is_null($current) ) {
$result[] = $current;
}
return $result;
}
prints
array (
0 => 'String1',
'String2' =>
array (
0 => 'data1',
1 => 'data2',
2 => 'data3',
'data4' =>
array (
0 => 'a',
1 => 'b',
2 => 'c',
),
),
1 => 'String3',
2 => 'String4',
)
(but it will almost certainly fail horribly on strings that are not well-formed)
or take a look at a lexer/parser generator such as PHP_LexerGenerator and PHP_ParserGenerator.
This is a solution with preg_match_all():
$string = 'String1,String2(data1,data2,data3),String3,String4,String5(data4,data5,data6)';
$pattern = '/([^,(]+)(\(([^)]+)\))?/';
preg_match_all( $pattern, $string, $matches );
$result = array();
foreach( $matches[1] as $key => $val )
{
if( $matches[3][$key] )
{ $add = explode( ',', $matches[3][$key] ); }
else
{ $add = $val; }
$result[$val] = $add;
}
$json = json_encode( $result );
Pattern explanation:
([^,(]+)         group 1: one or more characters other than ‘,’ and ‘(’
(\(([^)]+)\))?   group 2: an optional pair of parentheses wrapping group 3
([^)]+)          group 3: one or more characters other than ‘)’
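The same idea can be wrapped in a function. This is only a sketch: `parseList()` is a hypothetical name, the outer parentheses are made non-capturing so the bracket contents land in group 2, and PREG_SET_ORDER is used so that each match arrives as one array. Like the original, it assumes well-formed, non-nested input.

```php
<?php
// Hypothetical wrapper around the preg_match_all() approach.
function parseList(string $input): array
{
    // Group 1: the name. Group 2 (optional): the comma-separated bracket contents.
    preg_match_all('/([^,(]+)(?:\(([^)]+)\))?/', $input, $sets, PREG_SET_ORDER);

    $result = [];
    foreach ($sets as $set) {
        $name = $set[1];
        // With PREG_SET_ORDER, a trailing group that did not match is simply absent.
        $result[$name] = isset($set[2]) ? explode(',', $set[2]) : $name;
    }
    return $result;
}

echo json_encode(parseList('String1,String2(data1,data2,data3),String3,String4')), "\n";
```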
I'm trying to find out if there's any function that would split a string like:
keyword=flower|type=outdoors|colour=red
to array:
array('keyword' => 'flower', 'type' => 'outdoors', 'colour' => 'red')
At the moment I built a custom function, which uses explode to first split elements with the separator | and then each of those with assignment symbol =, but is there perhaps a native function which would do it out of the box by specifying the string separator?
The function I've written looks like this:
public static function splitStringToArray(
$string = null,
$itemDivider = '|',
$keyValueDivider = '='
) {
if (empty($string)) {
return array();
}
$items = explode($itemDivider, $string);
if (empty($items)) {
return array();
}
$out = array();
foreach($items as $item) {
$itemArray = explode($keyValueDivider, $item);
if (
count($itemArray) > 1 &&
!empty($itemArray[1])
) {
$out[$itemArray[0]] = $itemArray[1];
}
}
return $out;
}
$string = "keyword=flower|type=outdoors|colour=red";
$string = str_replace('|', '&', $string);
parse_str($string, $values);
$values=array_filter($values); // Remove empty pairs as per your comment
print_r($values);
Output
Array
(
[keyword] => flower
[type] => outdoors
[colour] => red
)
Use regexp to solve this problem.
([^=|]+)=([^|]+)
http://regex101.com/r/eQ9tW8/1
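A minimal sketch of how such a pattern could be applied in PHP. The function name `splitPairs()` is hypothetical, and the key class here excludes `|` as well as `=` so that the separator cannot leak into a key such as `|type`:

```php
<?php
// Hypothetical helper: turn "k=v|k=v" into an associative array with one regex.
function splitPairs(string $input): array
{
    // Group 1: key (no "=" or "|"). Group 2: value (no "|").
    preg_match_all('/([^=|]+)=([^|]+)/', $input, $m);

    // Pair the captured keys with the captured values.
    return array_combine($m[1], $m[2]);
}

var_export(splitPairs('keyword=flower|type=outdoors|colour=red'));
```

Pairs with an empty value (for example `type=`) do not match the pattern and are therefore skipped, which mirrors the behaviour of the custom function in the question.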
The issue is that your chosen format of representing variables in a string is non-standard. If you are able to change the | delimiter to a & character you would have (what looks like) a query string from a URL - and you'll be able to parse that easily:
$string = "keyword=flower&type=outdoors&colour=red";
parse_str( $string, $arr );
var_dump( $arr );
// array(3) { ["keyword"]=> string(6) "flower" ["type"]=> string(8) "outdoors" ["colour"]=> string(3) "red" }
If possible, I would recommend changing the delimiter at the source instead of manually replacing it with str_replace() or something similar.
If I have a string like below:
$str = "Some {translate:text} with some {if:{isCool}?{translate:cool}|{translate:uncool}} features";
... I would like to get the following result:
array (
0 => 'translate:text',
1 => 'if:{isCool}?{translate:cool}|{translate:uncool}',
)
I already have this function, but I believe it's possible to simplify it with preg_match(_all)?
define('STR_START','{');
define('STR_END','}');
function getMarkers($str, &$arr = array()) {
if(strpos($str,STR_START) !== false) { // strpos() returns 0 (falsy) when the marker is the first character
list($trash,$str) = explode(STR_START,$str, 2);
unset($trash);
$startPos = 0;
$endPos = 0;
do {
$strStartPos = strpos($str,STR_START,$startPos);
$strEndPos = strpos($str,STR_END,$endPos);
$startPos = $strStartPos + 1;
$endPos = $strEndPos + 1;
} while($strStartPos !== false && $strStartPos < $strEndPos);
$arr[] = substr($str,0,$strEndPos);
getMarkers(substr($str,$strEndPos+1),$arr);
}
return $arr;
}
I have tried the following, but it does not work that well with submarkers.
preg_match_all('/\{(.*?)\}/',"Some {translate:text} with some {if:{isCool}?{translate:cool}|{translate:uncool}} features", $matches);
var_export($matches[1]);
array (
0 => 'translate:text',
1 => 'if:{isCool',
2 => 'translate:cool',
3 => 'translate:uncool',
)
Is it possible to adjust the above-mentioned pattern to get the right result?
array (
0 => 'translate:text',
1 => 'if:{isCool}?{translate:cool}|{translate:uncool}',
)
You need to use a recursive pattern, example:
$pattern = '~{((?>[^{}]++|(?R))*)}~';
Here (?R) recurses into the whole pattern, i.e. the entire pattern is matched again inside itself, so balanced nested braces are consumed as part of the outer match.
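A short sketch of the recursive pattern applied to the string from the question; `getTopLevelMarkers()` is a hypothetical name chosen for this example:

```php
<?php
// Hypothetical helper: return the contents of every top-level {...} marker.
function getTopLevelMarkers(string $str): array
{
    // [^{}]++  consumes runs of non-brace characters (possessive, no backtracking)
    // (?R)     recurses into the whole pattern to consume a nested {...} block
    preg_match_all('~{((?>[^{}]++|(?R))*)}~', $str, $matches);

    // Group 1 holds the text between the outermost braces of each match.
    return $matches[1];
}

$str = "Some {translate:text} with some {if:{isCool}?{translate:cool}|{translate:uncool}} features";
var_export(getTopLevelMarkers($str));
```

Nested markers stay inside their parent's string, so a marker such as the `if:` one can be passed through the same function again to extract its sub-markers.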
What: I'm attempting to compare data in two arrays and write a statement based on the comparison:
$sys = array("1"=>'kitchen lights', "2"=>'living lights', "3"=>'living fan');
$input = array('off kitchen lights','on living fan');
Note: The input can come in any order! :-/ Any ideas?
I want to compare these so that I can change the state in a database and write to a change log.
The sys array key is also important here.
I'm shooting for a result like the following:
$write = '1:0,3:256'; // means off kitchen lights and on living fan
The write is broken into bits like this:
($sys array key number):('256' for on or '0' for off),(separate next listing...)
I'm familiar with array_intersect.
$wordin = explode(" ", $input);
$wordsys = explode(" ", $sys);
$result = array_intersect($wordin, $wordsys);
I'm sure I could loop through the array looking for, let's say, "on" and replace it with 256 or 0, but I'm running into issues thinking of how to do the following:
Handle variations like "lights" versus "light" (I need them to be treated as equal)
Preserve the sys array key number
Note: I'm not sure whether there is an "easier" method, but I will take any feedback!
Thanks,
JT
More info: A user types a string. I'm pulling all the detail out of the string and arriving at the input array. The sys array is a predefined database that the user set up.
To have different triggers for the same thing, you can do something like this (allows you to add more triggers easily). You could also place some regex in the triggers and evaluate them, but you can figure that out yourself ;)
<?php
define('SWITCHED_ON', 256);
define('SWITCHED_OFF', 0);
$sys = array(
'1' => array(
'name' => 'Kitchen Lights',
'triggers' => array(
'kitchen light',
'kitchen lights',
),
),
'2' => array(
'name' => 'Living Lights',
'triggers' => array(
'living lights',
'lights in living room',
'light in living room',
),
),
'3' => array(
'name' => 'Living Fan',
'triggers' => array(
'living fan',
'fan in living room',
),
),
);
$input = array('off kitchen lights','on living fan');
$output = array();
foreach ( $input as $command ) {
// split command at first whitespace
// $command_array = preg_split('%\s+%', $command, 2);
// update to allow input like $input = array('kitchen off lights','living fan on');
$split = preg_split('%\s+%', $command);
$input_switch = false;
$input_trigger = array();
foreach ( $split as $part ) {
if ( $input_switch === false ) {
switch ( $part ) {
case 'on': $input_switch = SWITCHED_ON; break;
case 'off': $input_switch = SWITCHED_OFF; break;
default: $input_trigger[] = $part; break;
}
} else {
$input_trigger[] = $part;
}
}
if ( $input_switch === false || empty($input_trigger) ) {
continue;
}
$input_trigger = implode(' ', $input_trigger);
// insert check if command is valid (for example contains only spaces and alphanumerics.. etc..)
// ...
foreach ( $sys as $syskey => $conf ) {
foreach ( $conf['triggers'] as $trigger ) {
if ( $trigger == $input_trigger ) {
$output[] = $syskey.':'.$input_switch;
continue 3; // continue outer foreach
}
}
}
// if you arrive here, the command was not found in sys
}
$output = implode(',', $output);
echo $output;
PS: The $sys array looks different, but as you say, the user sets it up. There is no way to anticipate every variant of "kitchen lights", "kitchen light", and whatever else the user might type, so users can simply fill the array as above, with several triggers for the same device. I think the ease of use makes up for the extra structure of the new $sys. ^^
UPDATE: Updated to allow unordered input. Unordered input is hard to deal with if you cannot be sure how many instances of the word "off" or "on" appear in one command: with more than one, you cannot decide which "on" or "off" is the intended one. You need a rule, such as "the first instance of "on" or "off" is the one we use", which is what the code above does. So if you input a command like "kitchen off lights on off", it will try to turn OFF the device that has the trigger "kitchen lights on off". Other options are to reject the command if it contains more than one "on"/"off", or to strip the extra instances.
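The stricter "reject ambiguous commands" rule could look roughly like this. It is a sketch only: `parseCommand()` and the shape of its return value are assumptions for this example, not part of the code above.

```php
<?php
// Hypothetical helper: accept a command only if it contains exactly one "on"/"off".
function parseCommand(string $command): ?array
{
    $words = preg_split('/\s+/', strtolower(trim($command)), -1, PREG_SPLIT_NO_EMPTY);

    // Collect every switch word; more or fewer than one makes the command ambiguous.
    $switches = array_values(array_intersect($words, ['on', 'off']));
    if (count($switches) !== 1) {
        return null;
    }

    // Everything that is not a switch word forms the trigger phrase.
    $trigger = implode(' ', array_diff($words, ['on', 'off']));
    if ($trigger === '') {
        return null;
    }

    return ['state' => $switches[0] === 'on' ? 256 : 0, 'trigger' => $trigger];
}

var_export(parseCommand('kitchen off lights'));        // accepted
var_export(parseCommand('kitchen off lights on off')); // rejected: NULL
```

The returned trigger can then be compared against the `triggers` lists in `$sys` exactly as in the loop above.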
Try this:
$values = array();
foreach ($input as $i) {
$parts = explode(' ', $i);
// first word: 'on' || 'off'
$val = array_shift($parts);
// attach the remaining words again to form the key
$key = implode(' ', $parts);
// get the index of the $key value in $sys array
// and concat 256 or 0 depending on $val
$keys = array_keys($sys, $key); // array_shift() needs a variable, not a function result
$values[] = $keys[0].':'.($val == 'on' ? 256 : 0);
}
$write = implode(',', $values);
makes use of the second parameter of array_keys to fetch the correct key of the $sys array.
edit
For managing different inputs in different formats (without changing the $sys array):
$alts = array(
'kitchen lights' => array(
'kitchen lights', 'kitchen light', 'lights in kitchen', 'light in kitchen'
),
'living fan' => array(
'living fan', 'living fans', 'fans in living', 'fan in living'
),
);
foreach ($input as $i) {
$i = strtolower($i); // make sure we have all lower caps
// check whether "on" is at the start or the end of the input
$flag = substr($i, 0, 2) === 'on' || substr($i, -2) === 'on';
// remove on and off from the string, trim whitespace
$search = trim(str_replace(array('on', 'off'), '', $i));
// search for the resulting string in any of the alt arrays
$foundSysKey = false;
foreach ($alts as $sysKey => $alt) {
if (in_array($search, $alt)) {
$foundSysKey = $sysKey;
break;
}
}
// did not find it? continue to the next one
if ($foundSysKey === false) {
echo 'invalid key: '.$search;
continue;
}
// now we have the info we need and can proceed as in the previous example
$values[] = array_search($foundSysKey, $sys).':'.($flag ? 256 : 0);
}
I tried saving an updated fiddle but the site seems to have some problems... it did work though.
I want to extract two substrings from a predictably formatted string.
Each string is comprised of letters followed by numbers.
Inputs & Outputs:
MAU120 => MAU and 120
MAUL345 => MAUL and 345
MAUW23 => MAUW and 23
$matches = array();
if ( preg_match('/^([A-Z]+)([0-9]+)$/i', 'MAUL345', $matches) ) {
echo $matches[1]; // MAUL
echo $matches[2]; // 345
}
If you require the MAU you can do:
/^(MAU[A-Z]*)([0-9]+)$/i
Removing i modifier at the end will make the regex case-sensitive.
Try this regular expression:
/(\D*)(\d*)/
PHP code:
$matches = array();
var_dump( preg_match('/(\D*)(\d*)/', 'MAUL345', $matches) );
var_dump( $matches );
Taken literally from your examples:
<?php
$tests = array('MAU120', 'MAUL345', 'MAUW23', 'bob2', '?##!123', 'In the MAUX123 middle.');
header('Content-type: text/plain');
foreach($tests as $test)
{
preg_match('/(MAU[A-Z]?)(\d+)/', $test, $matches);
$str = isset($matches[1]) ? $matches[1] : '';
$num = isset($matches[2]) ? $matches[2] : '';
printf("\$test = %s\n\$str = %s\n\$num = %d\n\n", $test, $str, $num);
}
?>
Produces:
$test = MAU120
$str = MAU
$num = 120
$test = MAUL345
$str = MAUL
$num = 345
$test = MAUW23
$str = MAUW
$num = 23
$test = bob2
$str =
$num = 0
$test = ?##!123
$str =
$num = 0
$test = In the MAUX123 middle.
$str = MAUX
$num = 123
When you can guarantee that there will be one or more non-numbers and then one or more numbers, you can call upon sscanf() to parse the string.
The native function has multiple advantages over preg_match().
It doesn't return the full-string match.
It will allow you to type cast substrings depending on the format placeholder you use.
It can return its array or create reference variables -- depending on the number of parameters you feed it.
Code:
$tests = [
'MAU120',
'MAUL345',
'MAUW23',
];
foreach ($tests as $test) {
sscanf($test, '%[^0-9]%d', $letters, $numbers);
var_export([$letters, $numbers]);
echo "\n";
}
Output: (notice that the numbers are cast as integer type)
array (
0 => 'MAU',
1 => 120,
)
array (
0 => 'MAUL',
1 => 345,
)
array (
0 => 'MAUW',
1 => 23,
)
If your numbers might start with zero(s) and you want to retain them, you can use %s instead of %d to capture the non-whitespace substring. With %s, the digits are returned as a string instead of an int.
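A small sketch of the %s variant, using an input with leading zeros that is assumed here for illustration (`MAU007` and the function name `splitCode()` do not appear in the question):

```php
<?php
// Hypothetical helper: split leading non-digits from the rest, keeping digits as a string.
function splitCode(string $code)
{
    // %[^0-9] captures everything up to the first digit,
    // %s captures the remaining non-whitespace characters without casting.
    return sscanf($code, '%[^0-9]%s');
}

var_export(splitCode('MAU007'));
```

Because the second placeholder is %s, the numeric part keeps its leading zeros, which %d would drop by casting to an integer.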
Alternative syntax:
foreach ($tests as $test) {
var_export(sscanf($test, '%[^0-9]%d'));
echo "\n";
}
I have string:
ABCDEFGHIJK
And I have two arrays of positions in that string that I want to insert different things to.
Array
(
[0] => 0
[1] => 5
)
Array
(
[0] => 7
[1] => 9
)
If I decided to insert the # character and the = character, this would produce:
#ABCDE=FG#HI=JK
Is there any way I can do this without a complicated set of substr?
Also, # and = need to be variables that can be of any length, not just one character.
You can treat the string as an array of characters:
$str = "ABCDEFGH";
$characters = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
Afterwards, use array_splice to insert '#' or '=' at the positions given by your arrays.
Converting the array back to a string is done with:
$str = implode("", $characters);
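Put together, the array_splice approach might look like the sketch below. The function name `insertIntoChars()` and the position => text map are assumptions made for this example; inserting from the highest position down keeps the lower positions valid.

```php
<?php
// Hypothetical helper: insert strings into $str at the given original positions.
function insertIntoChars(string $str, array $inserts): string
{
    $characters = str_split($str);

    // Work from the end so that earlier positions are not shifted by later inserts.
    krsort($inserts);
    foreach ($inserts as $position => $text) {
        // Length 0 means "remove nothing"; the text is inserted as one array element.
        array_splice($characters, $position, 0, [$text]);
    }

    return implode('', $characters);
}

echo insertIntoChars('ABCDEFGHIJK', [0 => '#', 5 => '=', 7 => '#', 9 => '=']), "\n";
```

Since each inserted value is a single array element, it can be of any length without affecting the remaining positions.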
This works for inserted strings of any length (I am using "#a" and "=b" as the character sequences):
function array_insert($array,$pos,$val)
{
$array2 = array_splice($array,$pos);
$array[] = $val;
$array = array_merge($array,$array2);
return $array;
}
$s = "ABCDEFGHIJK";
$arr = str_split($s);
$arr_add1 = array(0=>0, 1=>5);
$arr_add2 = array(0=>7, 1=>9);
$char1 = '#a';
$char2 = '=b';
// each earlier insert adds exactly one array element, so later positions shift by one per insert
$arr = array_insert($arr, $arr_add1[0], $char1);
$arr = array_insert($arr, $arr_add1[1] + 1, $char2);
$arr = array_insert($arr, $arr_add2[0] + 2, $char1);
$arr = array_insert($arr, $arr_add2[1] + 3, $char2);
$s = implode("", $arr);
print_r($s);
There is an easy function for that: substr_replace. For this to work, though, you would have to structure your array differently (which would be cleaner anyway), e.g.:
$replacement = array(
0 => '#',
5 => '=',
7 => '#',
9 => '='
);
Then sort the array by keys descending, using krsort:
krsort($replacement);
And then you just need to loop over the array:
$str = "ABCDEFGHIJK";
foreach($replacement as $position => $rep) {
$str = substr_replace($str, $rep, $position, 0);
}
echo $str; // prints #ABCDE=FG#HI=JK
This works by inserting the replacements starting from the end of string. And it would work with any replacement string without having to determine the length of that string.
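If the positions have to stay in the question's original two-array form, they could be converted first. The sketch below assumes each array is a pair of (position of the first string, position of the second string); the function name `insertAtPositions()` and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: build the position => replacement map from pairs, then insert.
function insertAtPositions(string $str, array $pairs, string $open, string $close): string
{
    $map = [];
    foreach ($pairs as [$openPos, $closePos]) {
        $map[$openPos]  = $open;
        $map[$closePos] = $close;
    }

    // Insert from the end of the string so earlier offsets stay valid.
    krsort($map);
    foreach ($map as $position => $text) {
        // A length of 0 makes substr_replace() insert instead of overwrite.
        $str = substr_replace($str, $text, $position, 0);
    }
    return $str;
}

echo insertAtPositions('ABCDEFGHIJK', [[0, 5], [7, 9]], '#', '='), "\n";
```

The inserted strings are ordinary parameters, so multi-character values work without any extra offset arithmetic.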