PHP parse_str() function allowing passing shorthand array - php

We are accepting strings from templates to a markup engine, which allows for configuration to be passed in a "simple" form.
The engine parses the strings via PHP, using an adapted version of the parse_str() function - so we can parse any combination of the strings below:
config=posts_per_page:"5",default:"No questions yet -- once created they will appear here."&markup->template="{{ questions }}"
gives:
Array(
[config] => Array
(
[posts_per_page] => 5
[default] => No questions yet -- once created they will appear here.
)
[markup] => Array
(
[template] => {{ questions }}
)
)
OR:
config->default=all:"<p class='ml-3'>No members here yet...</p>"
To Get:
Array (
[config] => Array
(
[default] => Array
(
[all] => <p class='ml-3'>No members here yet...</p>
)
)
)
Another:
config=>handle:"medium"
Returns:
Array (
[config] => Array
(
[>handle] => medium
)
)
Strings can be passed with spaces ( and multi-line spaces ) and string parameters should be passed between "double quotes" to preserve natural spacing - we run the following preg_replace on the string before it is passed to the parse_str method:
// strip white spaces from data that is not passed inside double quotes ( "data" ) ##
$string = preg_replace( '~"[^"]*"(*SKIP)(*F)|\s+~', "", $string );
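As a quick check of what this pattern does, the sketch below runs it on a made-up sample string (the sample is an assumption, not one of our real templates). The "[^"]*"(*SKIP)(*F) branch consumes each double-quoted section and then forces a failure, so only whitespace outside the quotes is left for \s+ to remove.

```php
<?php
// Hypothetical sample input with spaces and a line break outside the quotes.
$string = "config = posts_per_page : \"5\",\n default : \"No questions yet\"";

// Same pattern as above: quoted sections are skipped, other whitespace is removed.
$clean = preg_replace( '~"[^"]*"(*SKIP)(*F)|\s+~', '', $string );

echo $clean, PHP_EOL;
```

The spaces inside "No questions yet" survive, while everything around the operators is collapsed.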
So far, so good - until a delimiter character appears inside a quoted string value. The parser still treats it as a delimiter, so, for example, the following string returns a corrupt array:
config=posts_per_page:"5",default:"No questions yet -- once created, they will appear here."&markup->template="{{ questions }}"
Returns the following array:
Array (
[config] => Array
(
[posts_per_page] => 5
[default] => No questions yet -- once created
[ they will appear here."] =>
)
[markup] => Array
(
[template] => {{ questions }}
)
)
The "," inside the quoted value was treated as a delimiter, so the string was split into an extra array element.
One simple solution is to choose delimiters and operators that are less likely to conflict with string values - for example, changing "," to "###". However, an important property of the markup is that it is easy to write and read: its intended use case is for front-end developers to pass simple arguments to the template parser. This is one reason we have tried to avoid JSON, which is of course a good fit for passing data but is harder to read and write - although that is subjective and open to opinion :)
Here is the parse_str method:
public static function parse_str( $string = null ) {
// h::log($string);
// delimiters ##
$operator_assign = '=';
$operator_array = '->';
$delimiter_key = ':';
$delimiter_and_property = ',';
$delimiter_and_key = '&';
// check for "=" delimiter ##
if( false === strpos( $string, $operator_assign ) ){
h::log( 'e:>Passed string format does not include assignment operator "'.$operator_assign.'" -- '.$string );
return false;
}
# result array
$array = [];
# split on outer delimiter
$pairs = explode( $delimiter_and_key, $string );
# loop through each pair
foreach ( $pairs as $i ) {
# split into name and value
list( $key, $value ) = explode( $operator_assign, $i, 2 );
// what about array values ##
// example -- sm:medium, lg:large
if( false !== strpos( $value, $delimiter_key ) ){
// temp array ##
$value_array = [];
// split value into an array at "," ##
$value_pairs = explode( $delimiter_and_property, $value );
// h::log( $value_pairs );
# loop through each pair
foreach ( $value_pairs as $v_pair ) {
// h::log( $v_pair ); // 'sm:medium'
# split into name and value
list( $value_key, $value_value ) = explode( $delimiter_key, $v_pair, 2 );
$value_array[ $value_key ] = $value_value;
}
// check if we have an array ##
if ( is_array( $value_array ) ){
$value = $value_array;
}
}
// $key might be in part->part format, so check ##
if( false !== strpos( $key, $operator_array ) ){
// explode, max 2 parts ##
$md_key = explode( $operator_array, $key, 2 );
# if name already exists
if( isset( $array[ $md_key[0] ][ $md_key[1] ] ) ) {
# stick multiple values into an array
if( is_array( $array[ $md_key[0] ][ $md_key[1] ] ) ) {
$array[ $md_key[0] ][ $md_key[1] ][] = $value;
} else {
$array[ $md_key[0] ][ $md_key[1] ] = array( $array[ $md_key[0] ][ $md_key[1] ], $value );
}
# otherwise, simply stick it in a scalar
} else {
$array[ $md_key[0] ][ $md_key[1] ] = $value;
}
} else {
# if name already exists
if( isset($array[$key]) ) {
# stick multiple values into an array
if( is_array($array[$key]) ) {
$array[$key][] = $value;
} else {
$array[$key] = array($array[$key], $value);
}
# otherwise, simply stick it in a scalar
} else {
$array[$key] = $value;
}
}
}
// h::log( $array );
# return result array
return $array;
}
I will try to skip splitting strings between "double quotes" - probably via another regex - but perhaps there are other pitfalls waiting that might make this approach unviable long-term. Any help gladly accepted!
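The same (*SKIP)(*F) idea used for the whitespace stripping might also work for the split itself. This is only a minimal sketch, not a drop-in replacement: split_outside_quotes() is a hypothetical helper name, and it assumes quoted values never contain an escaped double quote.

```php
<?php
/**
 * Split $subject on $delimiter, ignoring delimiters inside "double quotes".
 * Hypothetical helper; assumes quoted values contain no escaped quotes.
 */
function split_outside_quotes( $subject, $delimiter ) {
    // Quoted sections are matched and skipped; only unquoted delimiters split.
    $pattern = '~"[^"]*"(*SKIP)(*F)|' . preg_quote( $delimiter, '~' ) . '~';
    return preg_split( $pattern, $subject );
}

$value = 'posts_per_page:"5",default:"No questions yet -- once created, they will appear here."';
$parts = split_outside_quotes( $value, ',' );
print_r( $parts );
```

The other delimiters ( "&", "=", ":" ) can also occur inside quoted values, so the same helper would probably be needed for those splits as well.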

One solution is to change the following line:
from:
$value_pairs = explode( $delimiter_and_property, $value );
to:
$value_pairs = self::quoted_explode( $value, $delimiter_and_property, '"' );
which calls a new method found on another SO answer ( linked in comment block ):
/**
* Regex Escape values
*/
public static function regex_escape( $subject ) {
return str_replace( array( '\\', '^', '-', ']' ), array( '\\\\', '\\^', '\\-', '\\]' ), $subject );
}
/**
* Explode string, while respecting delimiters
*
* #link https://stackoverflow.com/questions/3264775/an-explode-function-that-ignores-characters-inside-quotes/13755505#13755505
*/
public static function quoted_explode( $subject, $delimiter = ',', $quotes = '\"' )
{
$clauses[] = '[^'.self::regex_escape( $delimiter.$quotes ).']';
foreach( str_split( $quotes) as $quote ) {
$quote = self::regex_escape( $quote );
$clauses[] = "[$quote][^$quote]*[$quote]";
}
$regex = '(?:'.implode('|', $clauses).')+';
preg_match_all( '/'.str_replace('/', '\\/', $regex).'/', $subject, $matches );
return $matches[0];
}

Related

Parsing (non-standard) json to array/object

I have string like this:
['key1':'value1', 2:'value2', 3:$var, 4:'with\' quotes', 5:'with, comma']
And I want to convert it to an array like this:
$parsed = [
'key1' => 'value1',
2 => 'value2',
3 => '$var',
4 => 'with\' quotes',
5 => 'with, comma',
];
How can I parse that?
Any tips or codes will be appreciated.
What can't be done?
Using standard json parsers
eval()
explode() by , and explode() by :
As you cannot use any pre-built function, like json_decode, you'll have to identify the most likely quoting scenarios and replace them with known substrings.
Given that all of the values and/or keys in the input array are encapsulated in single quotes:
Please note: this code is untested
<?php
$input = "[ 'key1':'value1', 2:'value2', 3:\$var, 4:'with\' quotes', 5: '\$var', 'another_key': 'something not usual, like \'this\'' ]";
function extractKeysAndValuesFromNonStandardKeyValueString ( $string ) {
$input = str_replace ( Array ( "\\\'", "\'" ), Array ( "[DOUBLE_QUOTE]", "[QUOTE]" ), $string );
$input_clone = $input;
$return_array = Array ();
if ( preg_match_all ( '/\'?([^\':]+)\'?\s*\:\s*\'([^\']+)\'\s*,?\s*/', $input, $matches ) ) {
foreach ( $matches[0] as $i => $full_match ) {
$key = $matches[1][$i];
$value = $matches[2][$i];
if ( isset ( ${$value} ) ) $value = ${$value};
else $value = str_replace ( Array ( "[DOUBLE_QUOTE]", "[QUOTE]" ), Array ( "\\\'", "\'" ), $value );
$return_array[$key] = $value;
$input_clone = str_replace ( $full_match, '', $input_clone );
}
// process the rest of the string, if anything important is left inside of it
if ( preg_match_all ( '/\'?([^\':]+)\'?\s*\:\s*([^,]+)\s*,?\s*/', $input_clone, $matches ) ) {
foreach ( $matches[0] as $i => $full_match ) {
$key = $matches[1][$i];
$value = $matches[2][$i];
if ( isset ( ${$value} ) ) $value = ${$value};
$return_array[$key] = $value;
}
}
}
return $return_array;
}
The idea behind this function is to first replace all the possible combinations of escaped quotes in the non-standard string with placeholders that are easy to swap back, then run a regular expression against the input, and finally rebuild the values, making sure you restore the substrings that were replaced earlier.
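A different way to approach the same input is to match whole key:value pairs with a single pattern instead of using placeholders. The following is a hedged sketch only: parse_loose_pairs() is a hypothetical name, bare tokens such as $var are kept as literal strings, and nested structures are not handled.

```php
<?php
/**
 * Parse pairs such as 'key':'value', 2:'value', 3:$var into an array.
 * Hypothetical sketch: bare (unquoted) tokens are kept as literal strings.
 */
function parse_loose_pairs( $input ) {
    // A token is a single-quoted string (backslash escapes allowed)
    // or a bare run of non-structural characters.
    $token = <<<'RE'
'(?:[^'\\]|\\.)*'|[^\s:,\[\]']+
RE;
    $pattern = '~(' . $token . ')\s*:\s*(' . $token . ')~';

    $unquote = function ( $t ) {
        // Strip the outer quotes and resolve \' and \\ in quoted tokens.
        return $t[0] === "'" ? stripslashes( substr( $t, 1, -1 ) ) : $t;
    };

    $result = array();
    preg_match_all( $pattern, $input, $matches, PREG_SET_ORDER );
    foreach ( $matches as $m ) {
        $result[ $unquote( $m[1] ) ] = $unquote( $m[2] );
    }
    return $result;
}

$input = <<<'TXT'
['key1':'value1', 2:'value2', 3:$var, 4:'with\' quotes', 5:'with, comma']
TXT;
$parsed = parse_loose_pairs( $input );
print_r( $parsed );
```

Because a quoted token is consumed as a unit, the comma in 'with, comma' and the escaped quote in 'with\' quotes' do not end the value early.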

get all preg_match_all matches recursively?

I am trying to parse some attributes from a string. The function below works, but it only parses root-level matches; the goal is to parse all matches, including those nested inside others:
function get_all_attributes( $tag, $text ){
preg_match_all( '/\[(\[?)(embed|wp_caption|caption|gallery|playlist|audio|video|acf|page|row|column|loop\-grid|loop\-grid\-item|grid\-filters|page\-title|page\-section|header|header\-column|header\-menu|header\-logo)(?![\w-])([^\]\/]*(?:\/(?!\])[^\]\/]*)*?)(?:(\/)\]|\](?:([^\[]*+(?:\[(?!\/\2\])[^\[]*+)*+)\[\/\2\])?)(\]?)/s', $text, $matches );
$out = array();
if( isset( $matches[2] ) )
{
foreach( (array) $matches[2] as $key => $value )
{
if( $tag === $value ){
$out[] = array(
'name' => $tag,
'attributes' => shortcode_parse_atts( $matches[3][$key] ),
'content' => trim($matches[5][$key]),
);
}
}
}
return $out;
}
The following is the string being parsed. It contains WordPress shortcodes, which I am trying to put into an array so the attributes are easy to get later on:
[page key="2298"]
[page-title key="1446986321457"]aaaa[/page-title]
[page-title key="1446986418207"]bbbbb[/page-title]
[row key="1446893994674"]
[column key="1446893994674_1"]
[image key="1446893994674_1_logo"]ccc[/image]
[/column]
[/row]
Is it possible to get all strings, both parents and children, in the same array, maybe using a recursive regex?
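One possible direction is to keep the regex flat and make the PHP function recursive, calling it again on the content of each match. The sketch below is an assumption-heavy illustration: collect_shortcodes() is a hypothetical name, the pattern is a simplified stand-in for the WordPress shortcode regex, and it skips self-closing or unclosed tags (such as the [page] tag in the sample) and same-name nesting.

```php
<?php
/**
 * Collect every [tag ...]content[/tag] block, including nested ones.
 * Simplified sketch: NOT the WordPress shortcode regex; unclosed and
 * self-closing tags are ignored.
 */
function collect_shortcodes( $text ) {
    $out = array();
    // \1 refers back to the tag name, so the closing tag must match it.
    preg_match_all( '~\[([\w-]+)\b([^\]]*)\](.*?)\[/\1\]~s', $text, $matches, PREG_SET_ORDER );
    foreach ( $matches as $m ) {
        // Pull key="value" pairs out of the attribute string.
        preg_match_all( '~([\w-]+)="([^"]*)"~', $m[2], $atts, PREG_SET_ORDER );
        $attributes = array();
        foreach ( $atts as $a ) {
            $attributes[ $a[1] ] = $a[2];
        }
        $out[] = array(
            'name'       => $m[1],
            'attributes' => $attributes,
            'content'    => trim( $m[3] ),
        );
        // Recurse into the content to pick up child shortcodes.
        $out = array_merge( $out, collect_shortcodes( $m[3] ) );
    }
    return $out;
}

$text = <<<'TXT'
[page key="2298"]
[page-title key="1446986321457"]aaaa[/page-title]
[page-title key="1446986418207"]bbbbb[/page-title]
[row key="1446893994674"]
[column key="1446893994674_1"]
[image key="1446893994674_1_logo"]ccc[/image]
[/column]
[/row]
TXT;
$found = collect_shortcodes( $text );
print_r( array_column( $found, 'name' ) );
```

Parents and children end up in one flat array, with each parent listed before its children.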

Query string like parameters regex

From a text like:
category=[123,456,789], subcategories, id=579, not_in_category=[111,333]
I need a regex to get something like:
$params[category][0] = 123;
$params[category][1] = 456;
$params[category][2] = 789;
$params[subcategories] = ; // I just need to know that this exists
$params[id] = 579;
$params[not_in_category][0] = 111;
$params[not_in_category][1] = 333;
Thanks everyone for the help.
PS
As you suggested, I clarify that the structure and the number of items may change.
Basically the structure is:
key=value, key=value, key=value, ...
where value can be:
a single value (e.g. category=123 or postID=123 or mykey=myvalue, ...)
an "array" (e.g. category=[123,456,789])
a "boolean" where the TRUE value is an assumption from the fact that "key" exists in the array (e.g. subcategories)
This method should be flexible enough:
$str = 'category=[123,456,789], subcategories, id=579, not_in_category=[111,333]';
$str = preg_replace('#,([^0-9 ])#',', $1',$str); //fix for string format with no spaces (count=10,paginate,body_length=300)
preg_match_all('#(.+?)(,[^0-9]|$)#',$str,$sections); //get each section
$params = array();
foreach($sections[1] as $param)
{
list($key,$val) = array_pad(explode('=',$param,2),2,null); //Put either side of the "=" into variables $key and $val ($val is NULL when there is no "=")
if(!is_null($val) && preg_match('#\[([0-9,]+)\]#',$val,$match)>0)
{
$val = explode(',',$match[1]); //turn the comma separated numbers into an array
}
$params[$key] = is_null($val) ? '' : $val;//Use blank string instead of NULL
}
echo '<pre>'.print_r($params,true).'</pre>';
var_dump(isset($params['subcategories']));
Output:
Array
(
[category] => Array
(
[0] => 123
[1] => 456
[2] => 789
)
[subcategories] =>
[id] => 579
[not_in_category] => Array
(
[0] => 111
[1] => 333
)
)
bool(true)
Alternate (no string manipulation before process):
$str = 'count=10,paginate,body_length=300,rawr=[1,2,3]';
preg_match_all('#(.+?)(,([^0-9,])|$)#',$str,$sections); //get each section
$params = array();
foreach($sections[1] as $k => $param)
{
list($key,$val) = array_pad(explode('=',$param,2),2,null); //Put either side of the "=" into variables $key and $val ($val is NULL when there is no "=")
$key = isset($sections[3][$k-1]) ? trim($sections[3][$k-1]).$key : $key; //Fetch first character stolen by previous match
if(!is_null($val) && preg_match('#\[([0-9,]+)\]#',$val,$match)>0)
{
$val = explode(',',$match[1]); //turn the comma separated numbers into an array
}
$params[$key] = is_null($val) ? '' : $val;//Use blank string instead of NULL
}
echo '<pre>'.print_r($params,true).'</pre>';
Another alternate: full re-format of string before process for safety
$str = 'count=10,paginate,body_length=300,rawr=[1, 2,3] , name = mike';
$str = preg_replace(array('#\s+#','#,([^0-9 ])#'),array('',', $1'),$str); //fix for varying string formats
preg_match_all('#(.+?)(,[^0-9]|$)#',$str,$sections); //get each section
$params = array();
foreach($sections[1] as $param)
{
list($key,$val) = array_pad(explode('=',$param,2),2,null); //Put either side of the "=" into variables $key and $val ($val is NULL when there is no "=")
if(!is_null($val) && preg_match('#\[([0-9,]+)\]#',$val,$match)>0)
{
$val = explode(',',$match[1]); //turn the comma separated numbers into an array
}
$params[$key] = is_null($val) ? '' : $val;//Use blank string instead of NULL
}
echo '<pre>'.print_r($params,true).'</pre>';
You can also use JSON, which is supported natively in PHP: http://php.net/manual/fr/ref.json.php
It will be easier ;)
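As a rough illustration of that suggestion, the sample parameters could be written as JSON and decoded with json_decode(). The JSON string below is an assumed rewrite of the input from the question, with the bare "subcategories" key expressed as true.

```php
<?php
// Hypothetical JSON rewrite of the sample parameters; bare keys become true.
$json = '{"category":[123,456,789],"subcategories":true,"id":579,"not_in_category":[111,333]}';

// The second argument asks for associative arrays instead of objects.
$params = json_decode( $json, true );

var_dump( isset( $params['subcategories'] ) );
print_r( $params );
```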
<?php
$subject = "category=[123,456,789], subcategories, id=579, not_in_category=[111,333]";
$pattern = '/category=\[(.*?)\,(.*?)\,(.*?)\]\,\s(subcategories),\sid=(.*?)\,\snot_in_category=\[(.*?)\,(.*?)\]/';
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
I think this will get you the matches... I didn't actually test it, but it might be a good starting point.
Then you just need to push the matches to the correct place in the array you need. Also test whether the subcategories string exists, with strcmp or something similar...
Also, notice that I assumed your subject string has that fixed type of structure... if it changes often, you'll need much more than this...
$str = 'category=[123,456,789], subcategories, id=579, not_in_category=[111,333]';
$main_arr = preg_split('/(,\s)+/', $str);
$params = array();
foreach( $main_arr as $value) {
$pos = strpos($value, '=');
if($pos === false) {
$params[$value] = null;
} else {
$index_part = substr($value, 0, $pos);
$value_part = substr($value, $pos+1, strlen($value));
$match = preg_match('/\[(.*?)\]/', $value_part,$xarr);
if($match) {
$inner_arr = preg_split('/(,)+/', $xarr[1]);
foreach($inner_arr as $v) {
$params[$index_part][] = $v;
}
} else {
$params[$index_part] = $value_part;
}
}
}
print_r( $params );
Output :
Array
(
[category] => Array
(
[0] => 123
[1] => 456
[2] => 789
)
[subcategories] =>
[id] => 579
[not_in_category] => Array
(
[0] => 111
[1] => 333
)
)
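Another way to handle the three value shapes described in the question (single value, bracketed list, bare key) is to split only on commas that sit outside the brackets. This is a sketch under assumptions: parse_param_string() is a hypothetical name, brackets are assumed not to nest, and a bare key is stored as true.

```php
<?php
/**
 * Parse "key=value, key=[a,b], flag" into an array.
 * Hypothetical sketch; assumes brackets are never nested.
 */
function parse_param_string( $str ) {
    $params = array();
    // Bracketed lists are matched and skipped, so only outer commas split.
    $parts = preg_split( '~\[[^\]]*\](*SKIP)(*F)|\s*,\s*~', trim( $str ) );
    foreach ( $parts as $part ) {
        if ( $part === '' ) {
            continue;
        }
        $pair = explode( '=', $part, 2 );
        $key  = trim( $pair[0] );
        if ( ! isset( $pair[1] ) ) {
            // Bare key: only its existence matters.
            $params[ $key ] = true;
            continue;
        }
        $value = trim( $pair[1] );
        if ( preg_match( '~^\[(.*)\]$~', $value, $m ) ) {
            // Bracketed list: split the inner items on commas.
            $params[ $key ] = array_map( 'trim', explode( ',', $m[1] ) );
        } else {
            $params[ $key ] = $value;
        }
    }
    return $params;
}

$params = parse_param_string( 'category=[123,456,789], subcategories, id=579, not_in_category=[111,333]' );
print_r( $params );
```

Unlike the digit-based patterns above, this does not depend on the list items being numeric.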

regular expression to match contents other than opening tag

I am attempting to match the contents of a tag, excluding any other tag. I have some malformed html I'm trying to clean up.
Put simply:
<td><ins>sample content</td>
<td>other content</td>
</tr>
<tr>
<td>remaining</ins>other copy</td>
I'd like to capture "sample content", the html before it (<td><ins>), and the html after it, up to but excluding </ins>
I believe what I'm looking for is a negative look ahead, but I'm a little lost as to how this would work in PHP.
Since you don't know what the unclosed opening tag is, you are creating a bare-bones parser.
You will need to loop over your content and stop every time you encounter an opening angle-bracket <. This REALLY should be done with AWK; however if you must use PHP, you would do something like this
<?php
$file = file_get_contents('path/to/file');
$file = preg_replace( '/[\n\r\t]/' , '' , $file );
$pieces = explode( '<' , $file );
if ( !$pieces[0] ) array_shift($pieces);
/* Given your example code, $pieces looks like this
$pieces = array(
[0] => 'td>',
[1] => 'ins>sample content',
[2] => '/td>',
[3] => 'td>other content',
[4] => '/td>',
[5] => '/tr>',
[6] => 'tr>',
[7] => 'td>remaining',
[8] => '/ins>other copy',
[9] => '/td>'
);
*/
$openers = array();//$openers = [];
$closers = array();//$closers = [];
$brokens = array();//$brokens = [];
for ( $i = 0 , $count = count($pieces) ; $i < $count ; $i++ ) {
//grab everything essentially between the brackets
$tag = strstr( $pieces[$i] , '>' , TRUE );
$oORc = strpos( $pieces[$i] , '/' );
if ( $oORc !== FALSE ) {
//store this for later (and maintain $pieces' index)
$closers[$i] = $tag;
$elm = str_replace( '/' , '' , $tag );
if ( ( $oIndex = array_search( $elm , $openers ) ) && count($openers) != count($closers) ) {
//more openers than closers ==> broken pair
$brokens[$oIndex] = $pieces[$oIndex];
$cIndex = array_search( $tag, $closers );
$brokens[$cIndex] = $pieces[$cIndex];
//remove the unpaired elements from the 2 arrays so count works
unset( $openers[$oIndex] , $closers[$cIndex] );
}
} else {
$openers[$i] = $tag;
}//fi
}//for
print_r($brokens);
?>
The index in $brokens is the index from $pieces where the malformed html appeared and its value is the offending tag and its content:
$brokens = Array(
[1] => ins>sample content
[8] => /ins>other copy
);
Caveat: this does not account for self-closing tags like <br /> or <img /> (but that's why you should use one of the many software apps that already exist for this).
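For the narrower case where the unclosed tag is already known to be <ins>, the negative lookahead mentioned in the question can be sketched as below. The pattern is an illustration built for the sample markup only (it assumes the text directly follows <td><ins>), not a general HTML repair tool.

```php
<?php
$html = "<td><ins>sample content</td>\n<td>other content</td>\n</tr>\n<tr>\n<td>remaining</ins>other copy</td>";

// Group 1: markup before the text, group 2: the text itself,
// group 3: everything after it, stopping just before </ins>.
$pattern = '~(<td><ins>)([^<]*)((?:(?!</ins>).)*)~s';

if ( preg_match( $pattern, $html, $m ) ) {
    print_r( array_slice( $m, 1 ) );
}
```

The (?!</ins>). construct consumes one character at a time, but only while the closing </ins> does not start at the current position.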

parseExpression in Twig

I am integrating Twig in an existing project.
I am writing a token parser to parse a custom tag that is similar to the {% render %} tag.
My tag looks like:
{% mytag 'somestring' with { 'name': name, 'color': 'green' } %}
where name is defined by {% set name = 'foo' %}
I am able to parse somestring without any issues.
This is the code used to parse the stuff in the with{ }:
$Stream = $this->parser->getStream();
if ( $Stream->test( \Twig_Token::NAME_TYPE, 'with' ) )
{
$Stream->next();
$parsedParameters = $this->parser->getExpressionParser()->parseExpression();
$parameters = $this->parser->getEnvironment()->compile( $parsedParameters );
var_dump( $parameters ); //string 'array( "name" => $this->getContext( $context, "name" ), "color" => "green" )' (length=72)
foreach ( $parsedParameters->getIterator() as $parameter )
{
//var_dump( $parameter->getAttribute('value') );
}
}
My goal is to turn 'name': name, 'color': 'green' into an associative array within the token parser:
array(
'name' => 'foo',
'color' => 'green',
)
As the documentation is quite sparse and the code in the library is uncommented, I am not sure how to do this. If I loop through $parsedParameters, I get 4 elements consisting of the array key and an array value. However, as name is a variable with a type Twig_Node_Expression_Name, I am unsure as to how I can compile it to get the compiled value. Currently, I have found a way to compile that node, but all it gives me is a string containing a PHP expression which I can't use.
How can I turn the parsed expression into an associative array?
Okay, I think I was able to solve that. It's not too pretty, but it should work.
$Stream = $this->parser->getStream();
if ( $Stream->test( \Twig_Token::NAME_TYPE, 'with' ) )
{
$Stream->next();
$parsedParameters = $this->parser->getExpressionParser()->parseExpression();
$parameters = $this->parser->getEnvironment()->compile( $parsedParameters );
var_dump( $parameters ); //string 'array( "name" => $this->getContext( $context, "name" ), "color" => "green" )' (length=72)
$index = null;
$value = null;
foreach ( $parsedParameters->getIterator() as $parameter )
{
if ( $parameter->hasAttribute( 'value' ) )
{
$index = $parameter->getAttribute( 'value' );
}
elseif ( $parameter->hasAttribute( 'name' ) )
{
$value = $parameter->getAttribute( 'name' );
}
if ( isset( $index, $value ) )
{
$params[ $index ] = $value;
$index = null;
$value = null;
}
}
}
So here I have the array $params, which I can pass to the custom node.
$params = var_export( $properties['params'], true );
unset( $fieldProperties['params'] );
Now I just did the following:
$Compiler
->write( "\$params = array();\n" )
->write( "foreach ( {$params} as \$searchFor => \$replaceWith )\n" )
->write( "{\n" )
->write( "\t\$params[ \$searchFor ] = str_replace( \$replaceWith, \$context[\$replaceWith], \$replaceWith );\n" )
->write( "}\n" )
->write( "var_dump( \$params );\n" );
This should be it.
Also, I see that you were talking about the TokenParser, but sadly I haven't found a way to do this there.
