Parsing flat-file database information into a multidimensional array - php
I want to make a class for parsing flat-file database information into one large, analogous multidimensional array. I had the idea of formatting the database in a sort of Python-esque format, as follows:
"tree #1":
    "key" "value"
    "sub-tree #1":
        "key" "value"
        "key #2" "value"
        "key #3" "value"
I am trying to make it parse this and build an array as it goes, putting the keys and values into it, and I want it to be very dynamic and expandable. I've tried many different techniques and been stumped by each of them. This is my most recent attempt:
function parse($file=null) {
    $file = $file ? $file : $this->dbfile;
    ### character variables
    # get contents of the file
    $src = file_get_contents($file);
    # current character number
    $p = 0;
    ### array variables
    # temporary storage
    $a = array();
    # set $ln keys
    $ln = array("q"=>0,"k"=>null,"v"=>null,"s"=>null,"p"=>null);
    # indent level
    $ilvl = 0;
    ### go time
    while (strlen($src) > $p) {
        $chr = $src[$p];
        # quote
        if ($chr == "\"") {
            if ($ln["q"] == 1) { // quote open?
                $ln["q"] = 0; // close it
                if (!$ln["k"]) { // key yet?
                    $ln["k"] = $ln["s"]; // set key
                    $ln["s"] = null;
                    $a[$ln["k"]] = $ln["v"]; // write to current array
                } else { // value time
                    $ln["v"] = $ln["s"]; // set value
                    $ln["s"] = null;
                }
            } else {
                $ln["q"] = 1; // open quote
            }
        }
        elseif ($chr == "\n" && $ln["q"] == 0) {
            $ln = array("q"=>0,"k"=>null,"v"=>null,"s"=>null,"p"=>null);
            $llvl = $ilvl;
        }
        # beginning of subset
        elseif ($chr == ":" && $ln["q"] == 0) {
            $ilvl++;
            if (!array_key_exists($ilvl,$a)) { $a[$ilvl] = array(); }
            $a[$ilvl][$ln["k"]] = array("#mbdb-parent"=> $ilvl-1 .":".$ln["k"]);
            $ln = array("q"=>0,"k"=>null,"v"=>null,"s"=>null,"p"=>null);
            $this->debug("INDENT++",$ilvl);
        }
        # end of subset
        elseif ($chr == "}") {
            $ilvl--;
            $this->debug("INDENT--",$ilvl);
        }
        # other characters
        else {
            if ($ln["q"] == 1) {
                $ln["s"] .= $chr;
            } else {
                # error
            }
        }
        $p++;
    }
    var_dump($a);
}
I honestly have no idea where to go from here. The thing troubling me most is setting the multidimensional values like $this->c["main"]["sub"]["etc"] the way I have it here. Can it even be done? How can I actually nest the arrays as the data is nested in the db file?
This is all going to depend on how human-readable you want your "flat file" to be.
Want human-readable?
XML
YAML
Semi-human-readable?
JSON
Not really human-readable?
Serialized PHP (also PHP-only)
MySQL dump
Writing your own format is going to be painful. Unless you're doing this purely for the academic experience, I'd say don't bother.
Looks like JSON might be a happy medium for you.
$configData = array(
    'tree #1' => array(
        'key' => 'value'
        , 'sub-tree #1' => array(
            'key' => 'value'
            , 'key #2' => 'value'
            , 'key #3' => 'value'
        )
    )
);
// Save config data
file_put_contents( 'path/to/config.json', json_format( json_encode( $configData ) ) );
// Load it back out
$configData = json_decode( file_get_contents( 'path/to/config.json' ), true );
// Change something
$configData['tree #1']['sub-tree #1']['key #2'] = 'foo';
// Re-Save (same as above)
file_put_contents( 'path/to/config.json', json_format( json_encode( $configData ) ) );
json_format() is not a PHP built-in; it is a separate helper function that just pretty-formats the JSON for easier human reading. If you don't care about human-readability, you can skip it and save the plain json_encode() output.
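On PHP 5.4 or later the extra helper is not needed: json_encode() accepts the JSON_PRETTY_PRINT flag, which indents the output itself. Below is a minimal sketch of the same save / load / change / re-save cycle using that flag. The temp-file path is an assumption standing in for path/to/config.json.

```php
<?php
// Same nested structure as the $configData example above
$configData = array(
    'tree #1' => array(
        'key' => 'value',
        'sub-tree #1' => array(
            'key'    => 'value',
            'key #2' => 'value',
            'key #3' => 'value',
        ),
    ),
);

// Hypothetical location; substitute your own config path
$path = sys_get_temp_dir() . '/config.json';

// Save: JSON_PRETTY_PRINT (PHP >= 5.4) indents the output for human readers
file_put_contents($path, json_encode($configData, JSON_PRETTY_PRINT));

// Load: true as the second argument returns associative arrays, not objects
$loaded = json_decode(file_get_contents($path), true);

// Change a nested value and save again
$loaded['tree #1']['sub-tree #1']['key #2'] = 'foo';
file_put_contents($path, json_encode($loaded, JSON_PRETTY_PRINT));
```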
Well, you could use serialize() and unserialize(), but that would be no fun, right? You should be using formats specifically designed for this purpose, but for the sake of the exercise, I'll see what I can come up with.
There seem to be two kinds of data types in your flat file: key-value pairs and arrays. Key-value pairs are denoted by two sets of quotes, and arrays by one set of quotes followed by a colon. As you go through the file, you must parse each row and determine what it represents, which is easy with regular expressions. The hard part is keeping track of the nesting level and acting accordingly. Here's a function that parses the tree you provided:
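To make the row-classification step concrete, here is a minimal sketch of the two regular expressions applied to single lines, together with the indentation measurement used to track the level. The function name classify_row() and its return shape are hypothetical, chosen for illustration, and it assumes keys and values never contain double quotes.

```php
<?php
/**
 * Hypothetical helper: classify one line of the flat-file format.
 * Returns null for lines that match neither pattern (e.g. blank lines).
 */
function classify_row(string $row): ?array
{
    // Indentation level = number of leading whitespace characters
    $level = strlen($row) - strlen(ltrim($row));
    $row = trim($row);

    // "key" "value"  -> key-value pair
    if (preg_match('/^"(.*?)"\s+"(.*?)"$/', $row, $m)) {
        return array('type' => 'pair', 'level' => $level, 'key' => $m[1], 'value' => $m[2]);
    }
    // "name":        -> start of a nested array
    if (preg_match('/^"(.*?)":$/', $row, $m)) {
        return array('type' => 'array', 'level' => $level, 'key' => $m[1]);
    }
    return null;
}
```

For example, classify_row('    "key #2" "value"') reports a pair at level 4, while classify_row('"tree #1":') reports an array header at level 0.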
function parse_flatfile($filename) {
    $file = file($filename);
    $result = array();
    // $open holds the array currently being filled: its indentation 'level',
    // a reference to it ('item') and a reference to its parent ('parent')
    $open = false;
    foreach($file as $row) {
        // Indentation = number of leading whitespace characters
        $level = strlen($row) - strlen(ltrim($row));
        $row = rtrim($row);
        // Regular expression to catch key-value pairs
        $isKeyValue = preg_match('/"(.*?)" "(.*?)"$/', $row, $match);
        if($isKeyValue == 1) {
            if($open && $open['level'] < $level) {
                // Indented deeper than the open array: the pair belongs to it
                $open['item'][$match[1]] = $match[2];
            } else {
                // Dedented: step back up to the parent array (one level only)
                $open = array('level' => $level - 1, 'item' => &$open['parent']);
                if($open) {
                    $open['item'][$match[1]] = $match[2];
                } else {
                    $result[$match[1]] = $match[2];
                }
            }
        // Regular expression to catch arrays
        } elseif(($isArray = preg_match('/"(.*?)":$/', $row, $match)) > 0) {
            if($open && $open['level'] < $level) {
                // Nested array: create it inside the open array and make it the open one
                $open['item'][$match[1]] = array();
                $open = array('level' => $level, 'item' => &$open['item'][$match[1]], 'parent' => &$open['item']);
            } else {
                // Otherwise the array is created at the top level of $result
                $result[$match[1]] = array();
                $open = array('level' => $level, 'item' => &$result[$match[1]], 'parent' => false);
            }
        }
    }
    return $result;
}
I won't go into great detail on how it works, but in short: as we go deeper into the array, the array currently being filled is kept by reference in $open['item'] (with its parent in $open['parent']), so each new row can be written at the right level. Here's a more complex tree using your notation:
"tree_1":
    "key" "value"
    "sub_tree_1":
        "key" "value"
        "key_2" "value"
        "key_3" "value"
    "key_4" "value"
    "key_5" "value"
"tree_2":
    "key_6" "value"
    "sub_tree_2":
        "sub_tree_3":
            "sub_tree_4":
                "key_6" "value"
                "key_7" "value"
                "key_8" "value"
                "key_9" "value"
                "key_10" "value"
And to parse that file you could use:
$result = parse_flatfile('flat.txt');
print_r($result);
And that would output:
Array
(
    [tree_1] => Array
        (
            [key] => value
            [sub_tree_1] => Array
                (
                    [key] => value
                    [key_2] => value
                    [key_3] => value
                )
            [key_4] => value
            [key_5] => value
        )
    [tree_2] => Array
        (
            [key_6] => value
            [sub_tree_2] => Array
                (
                    [sub_tree_3] => Array
                        (
                            [sub_tree_4] => Array
                                (
                                    [key_6] => value
                                    [key_7] => value
                                    [key_8] => value
                                    [key_9] => value
                                    [key_10] => value
                                )
                        )
                )
        )
)
I guess my test file covers all the bases, and it should work without breaking. But I won't give any guarantees.
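The $open bookkeeping in parse_flatfile() remembers only one parent, so a row that jumps back up two or more levels, or a nested array header that directly follows the contents of a sibling array, can land in the wrong place. One alternative is to keep a stack of references, one per open array, and pop entries until the top one is indented less than the current row. The sketch below does that; the name parse_flatfile_stack() is hypothetical, it takes the file contents as a string, and it assumes consistent indentation (every nested row indented more than its header).

```php
<?php
/**
 * Hypothetical stack-based variant of parse_flatfile().
 * Each stack entry holds the indentation of an array's header line and a
 * reference to that array; the root ($result) has indentation -1.
 */
function parse_flatfile_stack(string $text): array
{
    $result = array();
    $stack = array(array('indent' => -1, 'ref' => &$result));

    foreach (preg_split('/\R/', $text) as $row) {
        if (trim($row) === '') {
            continue; // skip blank lines
        }
        $indent = strlen($row) - strlen(ltrim($row));
        $row = trim($row);

        // Close finished arrays: a row belongs to the nearest header indented less than it
        while ($indent <= $stack[count($stack) - 1]['indent']) {
            array_pop($stack);
        }
        $top = count($stack) - 1;

        if (preg_match('/^"(.*?)"\s+"(.*?)"$/', $row, $m)) {
            // "key" "value": store in the currently open array
            $stack[$top]['ref'][$m[1]] = $m[2];
        } elseif (preg_match('/^"(.*?)":$/', $row, $m)) {
            // "name": create a child array and push a reference to it
            $stack[$top]['ref'][$m[1]] = array();
            $child = &$stack[$top]['ref'][$m[1]];
            $stack[] = array('indent' => $indent, 'ref' => &$child);
            unset($child); // drop the local alias; the stack keeps the reference
        }
        // Lines matching neither pattern are ignored in this sketch
    }
    return $result;
}
```

Usage: print_r(parse_flatfile_stack(file_get_contents('flat.txt'))); for a consistently indented file it also handles dedenting several levels at once, e.g. a top-level key that follows a deeply nested block.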
Transforming a multidimensional array to flatfile using this notation will be left as an exercise to the reader :)
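For that reverse direction, here is a recursive sketch: arrays become "name": headers and everything else becomes a "key" "value" pair, indented one step per level. The function name to_flatfile() and the 4-space indent are assumptions. Because the notation has no escaping, it also assumes keys and values contain no double quotes or newlines; non-string values are written using PHP's normal string conversion.

```php
<?php
/**
 * Hypothetical inverse of the parser: nested array -> flat-file notation.
 */
function to_flatfile(array $data, int $depth = 0): string
{
    $pad = str_repeat('    ', $depth); // 4 spaces per nesting level
    $out = '';
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // "name": header, then its children one level deeper
            $out .= $pad . '"' . $key . '":' . "\n";
            $out .= to_flatfile($value, $depth + 1);
        } else {
            // "key" "value" pair
            $out .= $pad . '"' . $key . '" "' . $value . '"' . "\n";
        }
    }
    return $out;
}
```

For instance, echo to_flatfile($result); writes a parsed tree back out in the same indented notation.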
Related
Building an array from a static array
EDIT: I have the following:
$EXCLUDE['extensions']="199";
$EXCLUDE['extensions']="800-1000";
What I want is to be able to create a list of values, each either a single number or a range.
End result: in the code listed below, I want to replace the hard-coded 799 and 1000 with the $EXCLUDE values, so that extensions matching those numbers are NOT displayed. But I also want to be able to include ranges.
foreach($obj as $file) {
    if($file['dir_list']=="yes"){
        if($file['user']<="799" || $file['user']>="1000"){
            $D = $domain;
            $V = $file['user'];
            $g = $this->get_presence($D,$V);
Make a 2-dimensional array of start/end values.
$exclude = array(
    array('start' => 199, 'end' => 199),
    array('start' => 800, 'end' => 1000)
);
Then iterate over the array to see if the value is in one of the excluded ranges.
foreach ($obj as $file) {
    $excluded = false;
    $num = intval($file['user']);
    foreach ($exclude as $e) {
        if ($num >= $e['start'] && $num <= $e['end']) {
            $excluded = true;
            break;
        }
    }
    if (!$excluded) {
        $D = $domain;
        $V = $file['user'];
        $g = $this->get_presence($D, $V);
    }
}
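If you would rather keep the exclusions as a single config string such as "199,800-1000", a small parser can build the same start/end array. parse_ranges() and is_excluded() are hypothetical helper names, and the sketch assumes comma-separated, non-negative integers.

```php
<?php
/**
 * Hypothetical helper: "199,800-1000" -> array of start/end pairs.
 */
function parse_ranges(string $spec): array
{
    $ranges = array();
    foreach (explode(',', $spec) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // tolerate stray commas
        }
        if (strpos($part, '-') !== false) {
            // "800-1000" -> start 800, end 1000
            list($start, $end) = array_map('intval', explode('-', $part, 2));
        } else {
            // "199" -> a range containing a single number
            $start = $end = intval($part);
        }
        $ranges[] = array('start' => $start, 'end' => $end);
    }
    return $ranges;
}

/** True if $num falls inside any of the start/end ranges. */
function is_excluded(int $num, array $ranges): bool
{
    foreach ($ranges as $r) {
        if ($num >= $r['start'] && $num <= $r['end']) {
            return true;
        }
    }
    return false;
}
```

Inside the loop, the check would then become something like if (!is_excluded(intval($file['user']), parse_ranges($EXCLUDE['extensions']))) { ... }, ideally parsing the ranges once before the loop.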
Not sure whether I've understood the question correctly, but here it is: an array is initialized with array(), for example:
<?php
$a = array(1, 5, "foo", 8, 19, "some text", "foo");
?>
This creates an array with seven elements. As you can see, types within an array can be mixed. The values you put in an array don't have to be arrays, but they can be. The following code creates an array of arrays:
<?php
//array of arrays
$a = array(
    array(1,2,3),
    array("test", "text"),
    array(1,2,"foo")
);
?>
Is it possible to read an associative array value from layers deeper than the first one by passing its coordinate in a string? [duplicate]
I have a multidimensional array, here is a small excerpt:
Array
(
    [Albums] => Array
        (
            [A Great Big World - Is There Anybody Out There] => Array(...),
            [ATB - Contact] => Array(...),
        )
    [Pop] => Array (...)
)
And I have a dynamic path:
/albums/a_great_big_world_-_is_there_anybody_out_there
What would be the best way to retrieve the value of (in this example) $arr["albums"]["A Great Big World - Is There Anybody Out There"]?
Please note that it should be dynamic, since the nesting can go deeper than the 2 levels in this example.
EDIT
Here is the function I use to create a simple string for the URL:
function formatURL($url) {
    return preg_replace('/__+/', '_', preg_replace('/[^a-z0-9_\s-]/', "", strtolower(str_replace(" ", "_", $url))));
}
$array = array(...);
$path = '/albums/a_great_big_world_-_is_there_anybody_out_there';
$value = $array;
foreach (explode('/', trim($path, '/')) as $key) {
    // Descend while the current value is an array containing $key;
    // the last segment may point at a scalar leaf
    if (is_array($value) && array_key_exists($key, $value)) {
        $value = $value[$key];
    } else {
        throw new Exception("Path $path is invalid");
    }
}
print_r($value);
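Note that the loop above compares each path segment with the array keys literally, while the keys in the question are the original titles ("Albums", "A Great Big World - Is There Anybody Out There") and the path holds their formatURL() slugs. Here is a sketch that instead matches each segment against the slugified keys at that level. get_by_slug_path() is a hypothetical name; formatURL() is copied from the question. If two keys at the same level produce the same slug, the first one wins.

```php
<?php
// formatURL() exactly as given in the question
function formatURL($url)
{
    return preg_replace('/__+/', '_', preg_replace('/[^a-z0-9_\s-]/', "", strtolower(str_replace(" ", "_", $url))));
}

/**
 * Hypothetical helper: walk $array one path segment at a time, comparing
 * each segment with formatURL() of the keys at the current level.
 * Returns the value found (array or scalar); throws if a segment has no match.
 */
function get_by_slug_path(array $array, string $path)
{
    $value = $array;
    foreach (explode('/', trim($path, '/')) as $segment) {
        $found = false;
        if (is_array($value)) {
            foreach ($value as $key => $child) {
                if (formatURL((string) $key) === $segment) {
                    $value = $child;
                    $found = true;
                    break;
                }
            }
        }
        if (!$found) {
            throw new Exception("Path $path is invalid");
        }
    }
    return $value;
}
```

Each level is a linear scan over its keys; if lookups are frequent, building a slug-keyed copy of the array once would avoid recomputing formatURL().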
php to set order of preference of an Array Sequence using string length
$records = array(
    '123PP' => 3.63,
    '123DDD' => 9.63,
    '123D' => 6.63,
    '123PPPP' => 9.63,
    '123DD' => 9.63,
    '123P' => 2.63,
    '123PPP' => 1.53
);
After looping through the records, I have to get only one value, and its key should be 123D, because the order of preference is:
123D, 123P, 123DD, 123PP, 123DDD, 123PPP, 123PPPP...
For example: if 123D is not found in the array, then 123P is the answer. If 123P is not found either, then 123DD is the answer.
I have found a solution:
foreach ($records as $key => $value) {
    if (empty($this->minLength)) {
        $this->invoiceTax = $value;
        $this->minLength = strlen($key);
    } elseif (strpos($key, 'P') !== false && (strlen($key) < $this->minLength)) {
        $this->invoiceTax = $value;
        $this->minLength = strlen($key);
    } elseif (strpos($key, 'D') !== false && (strlen($key) <= $this->minLength)) {
        $this->invoiceTax = $value;
        $this->minLength = strlen($key);
    }
}
But I want to know if this code can be optimised by not storing the string length of every key.
This function could easily be tidied, but this is something that can be solved with recursion. If 123D is in the array, the function runs only once; it runs twice for 123P, three times for 123DD, and so on.
function GetMostPref($records, $key = "123", $next = "D") {
    if (empty($records) || strlen($key . $next) > max(array_map('strlen', array_keys($records)))) {
        // The candidate is longer than every key in the array: nothing found
        return false;
    }
    if(strpos($next, 'D') !== false) {
        // Currently looking at a 'D...' key.
        // Next key is the same length as this key just with Ps.
        $nextKey = str_repeat('P', strlen($next));
    } else if(strpos($next, 'P') !== false) {
        // Currently looking at a 'P...' key.
        // Next key has one extra char and is all Ds.
        $nextKey = str_repeat('D', strlen($next)+1);
    } else {
        // Key not valid
        return false;
    }
    if(array_key_exists($key.$next, $records)) {
        // Found the key in the array so return it.
        return $records[$key.$next];
    } else {
        // Recursive call with the next key.
        return GetMostPref($records, $key, $nextKey);
    }
}
// Testing
$records = array(
    '123PP' => 3.63,
    '123DDD' => 9.63,
    '123D' => 6.63,
    '123PPPP' => 9.63,
    '123DD' => 9.63,
    '123P' => 2.63,
    '123PPP' => 1.53
);
// Finds 123D and returns 6.63
echo GetMostPref($records);
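For comparison, here is an iterative sketch of the same preference search. It generates the candidate suffixes in order (D, P, DD, PP, DDD, ...) and returns the first that exists, stopping once candidates are longer than the longest key. get_most_preferred() is a hypothetical name; like GetMostPref(), it returns false when nothing matches.

```php
<?php
/**
 * Hypothetical iterative alternative: try prefix + D, P, DD, PP, DDD, ...
 * and return the value of the first key that exists, or false if none does.
 */
function get_most_preferred(array $records, string $prefix = '123')
{
    if (empty($records)) {
        return false;
    }
    // Stopping bound: no candidate can be longer than the longest key
    $maxLen = max(array_map('strlen', array_keys($records)));
    for ($n = 1; strlen($prefix) + $n <= $maxLen; $n++) {
        // At each length, a run of Ds is preferred over a run of Ps
        foreach (array('D', 'P') as $letter) {
            $candidate = $prefix . str_repeat($letter, $n);
            if (array_key_exists($candidate, $records)) {
                return $records[$candidate];
            }
        }
    }
    return false;
}
```

With the $records array from the question, get_most_preferred($records) returns 6.63 (the value of 123D); after unset($records['123D']) it returns 2.63. No per-key lengths are stored; the only length computed is that of the longest key, used as the stopping bound.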
function prepareFunctionCall(){
    $records = array('123PP' => 3.63,'123DDD' => 9.63,'123PPPP' => 9.63 ,'123DD' => 9.63,'123P' => 2.63,'123PPP' => 1.53); // '123D' => 6.63,
    $arTmp = array();
    foreach($records as $key=>$value){
        // Prefix each key with its zero-padded length, so a string sort orders
        // by length first and then alphabetically (D before P); without the
        // padding, a length of 10 or more would sort before shorter keys
        $tmp = sprintf('%03d', strlen($key)).$key;
        $arTmp[$tmp] = $value;
    }
    getFirstValue($arTmp);
}
function getFirstValue($pArray){
    ksort($pArray, SORT_STRING);
    reset($pArray);
    print key($pArray).' = '.current($pArray);
}
This is an alternative to the good solution provided by MatthewMcGovern. I provide it because it makes use of the PHP functions ksort, reset and current, which are made for this type of situation. If possible, I would advise you to rewrite the keys of the array before finding out which key to select first; that is what I did by adding the strlen prefix. Doing it here is suboptimal compared to rewriting the keys at the moment you collect the data, though. The core of the function is the calls to ksort, reset and current.
Convert PostgreSQL array to PHP array
I have trouble reading PostgreSQL arrays in PHP. I have tried explode(), but this breaks arrays containing commas in strings, and str_getcsv(), but it's also no good, as PostgreSQL doesn't quote the Japanese strings.
Not working:
explode(',', trim($pgArray['key'], '{}'));
str_getcsv( trim($pgArray['key'], '{}') );
Example:
// print_r() on PostgreSQL returned data:
Array ( [strings] => {または, "some string without a comma", "a string, with a comma"} )
// Output:
Array ( [0] => または [1] => "some string without a comma" [2] => "a string [3] => with a comma" )
explode(',', trim($pgArray['strings'], '{}'));
// Output:
Array ( [0] => [1] => some string without a comma [2] => a string, with a comma )
print_r(str_getcsv( trim($pgArray['strings'], '{}') ));
If you have PostgreSQL 9.2 or later, you can do something like this:
SELECT array_to_json(pg_array_result) AS new_name FROM tbl1;
The result will return the array as JSON. Then on the PHP side issue:
$array = json_decode($returned_field);
You can also convert back. See the JSON functions page of the PostgreSQL documentation for details.
As neither of these solutions works with multidimensional arrays, I offer here my recursive solution, which works with arrays of any complexity:
function pg_array_parse($s, $start = 0, &$end = null)
{
    if (empty($s) || $s[0] != '{') return null;
    $return = array();
    $string = false;
    $quote='';
    $len = strlen($s);
    $v = '';
    for ($i = $start + 1; $i < $len; $i++) {
        $ch = $s[$i];
        if (!$string && $ch == '}') {
            if ($v !== '' || !empty($return)) {
                $return[] = $v;
            }
            $end = $i;
            break;
        } elseif (!$string && $ch == '{') {
            $v = pg_array_parse($s, $i, $i);
        } elseif (!$string && $ch == ','){
            $return[] = $v;
            $v = '';
        } elseif (!$string && ($ch == '"' || $ch == "'")) {
            $string = true;
            $quote = $ch;
        } elseif ($string && $ch == $quote && $s[$i - 1] == "\\") {
            $v = substr($v, 0, -1) . $ch;
        } elseif ($string && $ch == $quote && $s[$i - 1] != "\\") {
            $string = false;
        } else {
            $v .= $ch;
        }
    }
    return $return;
}
I haven't tested it much, but it looks like it works. Here are my tests with results:
var_export(pg_array_parse('{1,2,3,4,5}'));echo "\n";
/*
array (
  0 => '1',
  1 => '2',
  2 => '3',
  3 => '4',
  4 => '5',
)
*/
var_export(pg_array_parse('{{1,2},{3,4},{5}}'));echo "\n";
/*
array (
  0 =>
  array (
    0 => '1',
    1 => '2',
  ),
  1 =>
  array (
    0 => '3',
    1 => '4',
  ),
  2 =>
  array (
    0 => '5',
  ),
)
*/
var_export(pg_array_parse('{dfasdf,"qw,,e{q\"we",\'qrer\'}'));echo "\n";
/*
array (
  0 => 'dfasdf',
  1 => 'qw,,e{q"we',
  2 => 'qrer',
)
*/
var_export(pg_array_parse('{,}'));echo "\n";
/*
array (
  0 => '',
  1 => '',
)
*/
var_export(pg_array_parse('{}'));echo "\n";
/*
array (
)
*/
var_export(pg_array_parse(null));echo "\n";
// NULL
var_export(pg_array_parse(''));echo "\n";
// NULL
P.S.: I know this is a very old post, but I couldn't find any solution for PostgreSQL before 9.2.
A reliable function to parse a PostgreSQL (one-dimensional) array literal into a PHP array, using regular expressions:
function pg_array_parse($literal)
{
    if ($literal == '') return;
    preg_match_all('/(?<=^\{|,)(([^,"{]*)|\s*"((?:[^"\\\\]|\\\\(?:.|[0-9]+|x[0-9a-f]+))*)"\s*)(,|(?<!^\{)(?=\}$))/i', $literal, $matches, PREG_SET_ORDER);
    $values = [];
    foreach ($matches as $match) {
        $values[] = $match[3] != '' ? stripcslashes($match[3]) : (strtolower($match[2]) == 'null' ? null : $match[2]);
    }
    return $values;
}
print_r(pg_array_parse('{blah,blah blah,123,,"blah \\"\\\\ ,{\100\x40\t\daő\ő",NULL}'));
// Array
// (
//     [0] => blah
//     [1] => blah blah
//     [2] => 123
//     [3] =>
//     [4] => blah "\ ,{@@ daőő
//     [5] =>
// )
var_dump(pg_array_parse('{,}'));
// array(2) {
//   [0] =>
//   string(0) ""
//   [1] =>
//   string(0) ""
// }
print_r(pg_array_parse('{}'));
var_dump(pg_array_parse(null));
var_dump(pg_array_parse(''));
// Array
// (
// )
// NULL
// NULL
print_r(pg_array_parse('{または, "some string without a comma", "a string, with a comma"}'));
// Array
// (
//     [0] => または
//     [1] => some string without a comma
//     [2] => a string, with a comma
// )
If you can foresee what kind of text data to expect in this field, you can use the array_to_string function. It's available in 9.1 (array_to_json, by contrast, requires 9.2).
For example, I know for certain that my array field labels will never contain a newline character, so I convert the labels array into a string using array_to_string:
SELECT ... array_to_string( labels, chr(10) ) as labes FROM ...
Now I can split this string using the PHP function explode:
$phpLabels = explode( "\n", $pgLabes );
You can use any sequence of characters to separate the elements of the array.
SQL:
SELECT array_to_string( labels, '<--###DELIMITER###-->' ) as labes
PHP:
$phpLabels = explode( '<--###DELIMITER###-->', $pgLabes );
As @Kelt mentioned, PostgreSQL arrays look like this:
{1,2,3,4}
You can simply replace the first { and last } with [ and ] respectively and then json_decode that. But his solution works only for one-dimensional arrays. Here is a solution for both one-dimensional and multidimensional arrays (it assumes the elements are numbers or otherwise valid JSON values; unquoted text such as {abc,def} will not decode):
$postgresArray = '{{1,2},{3,4}}';
$phpArray = json_decode(str_replace(['{', '}'], ['[', ']'], $postgresArray)); // [[1,2],[3,4]]
To cast back:
$phpArray=[[1,2],[3,4]];
$postgresArray=str_replace(['[', ']'], ['{', '}'], json_encode($phpArray));
Based on the answers in this thread, I created two simple PHP functions that can be of use:
private function pgArray_decode(string $pgArray){
    return explode(',', trim($pgArray, '{}'));
}
private function pgArray_encode(array $array){
    // json_encode()'s second parameter is a bitmask of flags, not a boolean, so none is passed
    $jsonArray = json_encode($array);
    $jsonArray = str_replace('[','{',$jsonArray);
    $jsonArray = str_replace(']','}',$jsonArray);
    return $jsonArray;
}
I tried the array_to_json answer, but unfortunately this results in an unknown function error. Using the DBAL query builder on a PostgreSQL 9.2 database with something like ->addSelect('array_agg(a.name) as account_name'), I got as a result a string like
{ "name 1", "name 2", "name 3" }
There are only quotes around the array elements if they contain special characters like whitespace or punctuation. So if there are quotes, I make the string a valid JSON string and then use the built-in JSON parse function. Otherwise I use explode.
$data = str_replace(array("\r\n", "\r", "\n"), "", trim($postgresArray,'{}'));
if (strpos($data, '"') === 0) {
    $data = '[' . $data . ']';
    $result = json_decode($data);
} else {
    $result = explode(',', $data);
}
If you have control of the query that's hitting the database, why not just use unnest() to get the results as rows instead of PostgreSQL arrays? From there, you can natively get a PHP array.
$result = pg_query('SELECT unnest(myArrayColumn) FROM someTable;');
if ( $result === false ) {
    throw new Exception("Something went wrong.");
}
// pg_fetch_all() returns one associative array per row (column name "unnest");
// pg_fetch_all_columns($result) would return the values as a flat list instead
$array = pg_fetch_all($result);
This sidesteps the overhead and maintenance issues you'd incur by trying to convert the array's string representation yourself.
I can see you are using explode(',', trim($pgArray, '{}'));
But explode is used to split a string by a string (and you are supplying it an array!). Something like:
$string = "A string, with, commas";
$arr = explode(',', $string);
What are you trying to do with the array? If you want to concatenate, have a look at implode. Alternatively, if it is possible for you, specify a delimiter other than a comma with array_to_string(anyarray, text).
PostgreSQL arrays look like this:
{1,2,3,4}
You can simply replace the first { and last } with [ and ] respectively and then json_decode that.
$x = '{1,2,3,4}';
$y = json_decode('[' . substr($x, 1, -1) . ']'); // [1, 2, 3, 4]
To cast back the other way would be the mirror opposite:
$y = [1, 2, 3, 4];
$x = '{' . substr(json_encode($y), 1, -1) . '}';
A simple and fast function for converting a deep (nested) PostgreSQL array string into a PHP array, by way of a JSON string, without using a pg connection.
function pgToArray(string $subject) : array
{
    if ($subject === '{}') {
        return array();
    }
    $matches = null;
    // find all elements;
    // quoted: {"1{\"23\"},abc"}
    // unquoted: {abc,123.5,TRUE,true}
    // and empty elements {,,}
    preg_match_all('/\"((?<=\\\\).|[^\"])*\"|[^,{}]+|(?={[,}])|(?=,[,}])/', $subject, $matches, PREG_OFFSET_CAPTURE);
    $subject = str_replace(["{","}"],["[","]"],$subject); // converting delimiters to JSON
    $matches = array_reverse($matches[0]);
    foreach ($matches as $match) {
        $item = trim($match[0]);
        $replace = null;
        if ((strpos($item,"{") !== false) || (strpos($item,"}") !== false)) {
            // restoring replaced '{' and '}' inside string
            $replace = $match[0];
        } elseif (in_array($item,["NULL","TRUE","FALSE"])) {
            $replace = strtolower($item);
        } elseif ($item === "" || ($item[0] !== '"' && !in_array($item,["null","true","false"]) && !is_float($item))) {
            $replace = '"' . $item . '"'; // adding quotes to string element
        }
        if ($replace) {
            // concatenate modified element instead of old element
            $subject = substr($subject, 0, $match[1]) . $replace . substr($subject, $match[1] + strlen($match[0]));
        }
    }
    return json_decode($subject, true);
}