Matching (pairing) tokens (eg, brackets or quotes) - php

In short, I need a function which attempts a rudimentary code fix by adding brackets/quotes were necessary, for parsing purposes. That is, the resulting code is not expected to be runnable.
Let's see a few examples:
[1] class Aaa { $var a = "hi"; => class Aaa { $var a = "hi"; }
[2] $var a = "hi"; } => { $var a = "hi"; }
[3] class { a = "hi; function b( } => class { a = "hi; function b( }"}
[4] class { a = "hi"; function b( } => class { a = "hi"; function b() {}}
PS: The 4th example above looks quite complicated, but in fact, it's quite easy. If the engine finds an ending bracket token which doesn't match with the stack, it should the opposite token before that one. As you can see, this works pretty well.
As a function signature, it looks like: balanceTokens($code, $bracket_tokens, $quote_tokens)
The function I wrote works using a stack. Well, it doesn't exactly work, but it does use a stack.
function balanceTokens($code, $bracket_tokens, $quote_tokens){
$stack = array(); $last = null; $result = '';
foreach(str_split($code) as $c){
if($last==$c && in_array($c, $quote_tokens)){
// handle closing string
array_pop($stack);
}elseif(!in_array($last, $quote_tokens)){
// handle other tokens
if(isset($bracket_tokens[$c])){
// handle begining bracket
$stack[] = $c;
}elseif(($p = array_search($c, $bracket_tokens)) != false){
// handle ending bracket
$l = array_pop($stack);
if($l != $p)$result .= $p;
}elseif(isset($quote_tokens[$c])){
// handle begining quote
$stack[] = $c;
$last = $c;
}// else other token...
}
$result .= $c;
}
// perform fixes
foreach($stack as $token){
// fix ending brackets
if(isset($bracket_tokens[$token]))
$result .= $bracket_tokens[$token];
// fix begining brackets
if(in_array($token, $bracket_tokens))
$result = $token . $result;
}
return $result;
}
The function is called like this:
$new_code = balanceTokens(
$old_code,
array(
'<' => '>',
'{' => '}',
'(' => ')',
'[' => ']',
),
array(
'"' => '"',
"'" => "'",
)
);
Yes, it's quite generic, there aren't any hard-coded tokens.
I haven't the slightest idea why it's not working...as a matter of fact, I don't even know if it should work. I admit I didn't put much thought into writing it. Maybe there are obvious issues which I'm not seeing.

An alternative implementation (which does more aggressive balancing):
function balanceTokens($code) {
$tokens = [
'{' => '}',
'[' => ']',
'(' => ')',
'"' => '"',
"'" => "'",
];
$closeTokens = array_flip($tokens);
$stringTokens = ['"' => true, '"' => true];
$stack = [];
for ($i = 0, $l = strlen($code); $i < $l; ++$i) {
$c = $code[$i];
// push opening tokens to the stack (for " and ' only if there is no " or ' opened yet)
if (isset($tokens[$c]) && (!isset($stringTokens[$c]) || end($stack) != $c)) {
$stack[] = $c;
// closing tokens have to be matched up with the stack elements
} elseif (isset($closeTokens[$c])) {
$matched = false;
while ($top = array_pop($stack)) {
// stack has matching opening for current closing
if ($top == $closeTokens[$c]) {
$matched = true;
break;
}
// stack has unmatched opening, insert closing at current pos
$code = substr_replace($code, $tokens[$top], $i, 0);
$i++;
$l++;
}
// unmatched closing, insert opening at start
if (!$matched) {
$code = $closeTokens[$c] . $code;
$i++;
$l++;
}
}
}
// any elements still on the stack are unmatched opening, so insert closing
while ($top = array_pop($stack)) {
$code .= $tokens[$top];
}
return $code;
}
Some examples:
$tests = array(
'class Aaa { public $a = "hi";',
'$var = "hi"; }',
'class { a = "hi; function b( }',
'class { a = "hi"; function b( }',
'foo { bar[foo="test',
'bar { bar[foo="test] { bar: "rgba(0, 0, 0, 0.1}',
);
Passing those to the function gives:
class Aaa { public $a = "hi";}
{$var = "hi"; }
class { a = "hi; function b( )"}
class { a = "hi"; function b( )}
foo { bar[foo="test"]}
bar { bar[foo="test"] { bar: "rgba(0, 0, 0, 0.1)"}}

After some coffee :), I came up with a (somewhat) working prototype function.
You can see it in action here. However, it is slightly modified to add some (shiny) debug output.
/**
* Fix some possible issues with the code (eg, incompleteness).
* #param string $code The code to sanitize.
* #param array $bracket_tokens List of bracket tokens where the index is the begin bracket and the value is the end bracket.
* #param array $quote_tokens List of quote tokens where the index is the begin quote and the value is the end quote.
* #return string The sanitized code.
*/
function css_sanitize($code, $bracket_tokens, $quote_tokens){
$result = '';
$stack = array();
$last = '';
foreach(str_split($code) as $c){
if(in_array($c, $quote_tokens) && $last==$c){
array_pop($stack);
$last = '';
}elseif(!in_array($last, $quote_tokens)){
if(isset($bracket_tokens[$c])){
$stack[] = $c;
}elseif(($p = array_search($c, $bracket_tokens)) != false){
if($last != $c){
$result .= $p;
}else{
array_pop($stack);
$last = (($p = count($stack)) > 1) ? $stack[$p] : '';
}
}elseif(isset($quote_tokens[$c])){
$stack[] = $c;
$last = $c;
}
}
$result .= $c;
}
foreach(array_reverse($stack) as $token){
if(isset($bracket_tokens[$token])){
$result .= $bracket_tokens[$token];
}
if(in_array($token, $bracket_tokens)){
$result = $token . $result;
}
if(isset($quote_tokens[$token])){
$result .= $quote_tokens[$token];
}
}
return $result;
}

Related

php variable interpolation inside variable [duplicate]

Python has a feature called template strings.
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
I know that PHP allows you to write:
"Hello $person"
and have $person substituted, but the templates can be reused in various sections of the code?
You can use template strings like this:
$name = "Maria";
$info["last_name"] = "Warner";
echo "Hello {$name} {$info["last_name"]}";
This will echo Hello Maria Warner.
You could also use strtr:
$template = '$who likes $what';
$vars = array(
'$who' => 'tim',
'$what' => 'kung pao',
);
echo strtr($template, $vars);
Outputs:
tim likes kung pao
I think there are a bunch of ways to do this... but this comes to mind.
$search = array('%who%', '%what_id%');
$replace = array('tim', 'kung pao');
$conference_target = str_replace(
$search,
$replace,
"%who% likes %what%"
);
Ha, we even had one in our framework using vsprintf:
class Helper_StringFormat {
public static function sprintf($format, array $args = array()) {
$arg_nums = array_slice(array_flip(array_keys(array(0 => 0) + $args)), 1);
for ($pos = 0; preg_match('/(?<=%)\(([a-zA-Z_]\w*)\)/', $format, $match, PREG_OFFSET_CAPTURE, $pos);) {
$arg_pos = $match[0][2];
$arg_len = strlen($match[0][0]);
$arg_key = $match[1][0];
if (! array_key_exists($arg_key, $arg_nums)) {
user_error("sprintfn(): Missing argument '${arg_key}'", E_USER_WARNING);
return false;
}
$format = substr_replace($format, $replace = $arg_nums[$arg_key] . '$', $arg_pos, $arg_len);
$pos = $arg_pos + strlen($replace);
}
return vsprintf($format, array_values($args));
}
}
Which looks like it came from the sprintf page
This allows for calls like:
sprintfn('second: %(second)s ; first: %(first)s', array(
'first' => '1st',
'second'=> '2nd'
));
UPDATE
Here is an update to do what you want... not fully tested though
class Helper_StringFormat {
public static function sprintf($format, array $args = array()) {
$arg_nums = array_slice(array_flip(array_keys(array(0 => 0) + $args)), 1);
for ($pos = 0; preg_match('/(?<=%)\(([a-zA-Z_][\w\s]*)\)/', $format, $match, PREG_OFFSET_CAPTURE, $pos);) {
$arg_pos = $match[0][1];
$arg_len = strlen($match[0][0]);
$arg_key = $match[1][0];
if (! array_key_exists($arg_key, $arg_nums)) {
user_error("sprintfn(): Missing argument '${arg_key}'", E_USER_WARNING);
return false;
}
$format = substr_replace($format, $replace = $arg_nums[$arg_key] . '$', $arg_pos, $arg_len);
$pos = $arg_pos + strlen($replace); // skip to end of replacement for next iteration
}
return vsprintf($format, array_values($args));
}
}
$str = "%(my var)s now work with a slight %(my var2)s";
$repl = array("my var" => "Spaces", "my var2" => "modification.");
echo Helper_StringFormat::sprintf($str, $repl);
OUTPUT
Spaces now work with a slight modification.
Another more simple approach would be this:
$s = function ($vars) {
extract($vars);
return "$who likes $what";
};
echo $s(['who' => 'Tim', 'what' => 'King Pao']); // Tim likes King Pao
And yes, PHPStorm will complain...
I personally most like sprintf (or vsprintf, for an array of arguments). It places them in the intended order, coerces types as needed, and has a lot more advanced features available.
Example:
$var = sprintf("%s costs %.2f dollars", "Cookies", 1.1);
This will result in the value Cookies cost 1.10 dollars.
There's an entire family of printf functions for different use cases, all listed under "See Also".
Very versatile: same methods for providing variables, array components, function results, etc.
I made a function to do what you want. i made it "quck-and-dirty" because i have not much time to refactorize it, maybe i upload it to my github.
EDIT: a bug correction...
Use it like
formattemplatter(
'$who likes $what'
, array(
'who' => 'Tim'
, 'what' => 'Kung Pao'
)
);
Variables can be [a-zA-Z0-9_] only.
function formattemplater($string, $params) {
// Determine largest string
$largest = 0;
foreach(array_keys($params) as $k) {
if(($l=strlen($k)) > $largest) $largest=$l;
}
$buff = '';
$cp = false; // Conditional parenthesis
$ip = false; // Inside parameter
$isp = false; // Is set parameter
$bl = 1; // buffer length
$param = ''; // current parameter
$out = ''; // output string
$string .= '!';
for($sc=0,$c=$oc='';isset($string{$sc});++$sc,++$bl) {
$c = $string{$sc};
if($ip) {
$a = ord($c);
if(!($a == 95 || ( // underscore
($a >= 48 && $a <= 57) // 0-9
|| ($a >= 65 && $a <= 90) // A-Z
|| ($a >= 97 && $a <= 122) // a-z
)
)) {
$isp = isset($params[$buff]);
if(!$cp && !$isp) {
trigger_error(
sprintf(
__FUNCTION__.': the parameter "%s" is not defined'
, $buff
)
, E_USER_ERROR
);
} elseif(!$cp || $isp) {
$out .= $params[$buff];
}
$isp = $isp && !empty($params[$buff]);
$oc = $buff = '';
$bl = 0;
$ip = false;
}
}
if($cp && $c === ')') {
$out .= $buff;
$cp = $isp = false;
$c = $buff = '';
$bl = 0;
}
if(($cp && $isp) || $ip)
$buff .= $c;
if($c === '$' && $oc !== '\\') {
if($oc === '(') $cp = true;
else $out .= $oc;
$ip = true;
$buff = $c = $oc = '';
$bl = 0;
}
if(!$cp && $bl > $largest) {
$buff = substr($buff, - $largest);
$bl = $largest;
}
if(!$ip && ( !$cp || ($cp && $isp))) {
$out .= $oc;
if(!$cp) $oc = $c;
}
}
return $out;
}
Just for the sake of completeness: there is also Heredoc.
$template = fn( $who, $what ) => <<<EOT
$who likes $what
EOT;
echo( $template( 'tim', 'kung pao' ) );
Outputs:
tim likes kung pao
Sidenotes:
You get highlighting in your favourite language (if properly configured). Just substitute EOT (from the sample above) with whatever you like (e.c. HTML, SQL, PHP, ...).
Escape arrays with curly braces {$data['who']}. Accessing objekts like $data->who works without braces.
Arrow functions like fn($a)=>$a are available since PHP 7.4. You can write function($a){return $a;} if you are using PHP<7.4.

Getting Invalid argument supplied for foreach as a result of pass by reference variable

I am upgrading a codebase that makes use of pass by reference
Main function
function splitSqlFile(&$ret, $sql)
{
$sql = trim($sql);
$sql_len = strlen($sql);
$char = '';
$string_start = '';
$in_string = false;
for ($i = 0; $i < $sql_len; ++$i) {
$char = $sql[$i];
if ($in_string) {
for (;;) {
$i = strpos($sql, $string_start, $i);
if (!$i) {
$ret[] = $sql;
return true;
}else if ($string_start == '`' || $sql[$i-1] != '\\'){
......
}else {
......
} // end if...elseif...else
} // end for
}
else if ($char == ';') {
$ret[] = substr($sql, 0, $i);
$sql = ltrim(substr($sql, min($i + 1, $sql_len)));
$sql_len = strlen($sql);
if ($sql_len) {
$i = -1;
} else {
// The submited statement(s) end(s) here
return true;
}
}else if (($char == '"') || ($char == '\'') || ($char == '`')) {
$in_string = true;
$string_start = $char;
} // end else if (is start of string)
// for start of a comment (and remove this comment if found)...
else if ($char == '#' || ($char == ' ' && $i > 1 && $sql[$i-2] . $sql[$i-1] == '--')) {
......
if (!$end_of_comment) {
// no eol found after '#', add the parsed part to the returned
// array and exit
$ret[] = trim(substr($sql, 0, $i-1));
return true;
} else {
.....
} // end if...else
} // end else if (is comment)
} // end for
// add any rest to the returned array
if (!empty($sql) && trim($sql) != '') {
$ret[] = $sql;
}
return true;
}
Calling the function
$sqlUtility->splitSqlFile($pieces, $sql_query);
foreach ($pieces as $piece)
{
.......
}
If the above variable splitSqlFile(&$ret, $sql) have the "&" before it, the program does run successfully, but if it is removed, now splitSqlFile($ret, $sql), It will start returning the 'invalid argument supplied for foreach' error.and when I try using the "is_array" function to check if it is an array, the result is always "NULL".
Why you get the error:
By removing the & from $ret, you are no longer referencing the variable in the function call. In this case, $pieces. So when you do a foreach on $pieces after calling the function, it will error because $pieces is basically a null variable at that point.
function splitSqlFile(&$ret,$sql) {
$ret[] = 'stuff';
}
splitSqlFile($pieces,$sql);
// $pieces will be an array as 0 => 'stuff'
foreach ($pieces as $piece) { } // will not error
vs:
function splitSqlFile($ret,$sql) {
$ret[] = 'stuff';
}
splitSqlFile($pieces,$sql);
// $pieces will be a null variable, since it was never assigned anything
foreach ($pieces as $piece) { } // will error
Alternative to no reference:
So if you want to remove the & and no longer pass by reference, you have to do other changes to the function to get that value back out. And depending on the codebase, this could mean a whole lot of work everywhere that function is used!
Example:
function splitSqlFile($sql) {
$ret = [];
$ret[] = 'stuff';
return array('result'=>true,'ret'=>$ret);
}
// $result will contain multiple things to utilize
// if you will only need that variable once (does not accumulate)
$result = splitSqlFile($sql);
foreach ($result['pieces'] as $piece) { }
// if that variable is added by multiple calls, and displayed later... merge
$pieces = [];
$result = splitSqlFile($sql_1);
$pieces = array_merge($pieces,$result['pieces']);
$result = splitSqlFile($sql_2);
$pieces = array_merge($pieces,$result['pieces']);
foreach ($pieces as $piece) { }
A second example (passing in the array as you go... gets confusing):
function splitSqlFile($pieces_in,$sql) {
$pieces_in[] = 'stuff';
return array('result'=>true,'pieces_out'=>$pieces_in);
}
$pieces = [];
$result = splitSqlFile($pieces,$sql_1);
$pieces = $result['pieces_out'];
$result = splitSqlFile($pieces,$sql_2);
$pieces = $result['pieces_out'];
foreach ($pieces as $piece) { }
As you can see, not only does it change the return values that has to be dealt with, but it also changes how it is called. Again, if this function is used in a thousand places in the code... serious headaches!
Conclusion:
I would honestly keep the reference as it is. It was done that way to make accumulating debug data easier, and direct. Otherwise you have a lot of code changes to do toget rid of the reference.
However that can simply be my opinion on the matter.

PHP rename all variables inside code

I would like to rename all variables within the file to random name.
For example this:
$example = "some $string";
function ($variable2) {
echo $variable2;
}
foreach ($variable3 as $key => $var3val) {
echo $var3val . "somestring";
}
Will become this:
$frk43r = "some $string";
function ($izi34ee) {
echo $izi34ee;
}
foreach ($erew7er as $iure7 => $er3k2) {
echo $er3k2 . "some$string";
}
It doesn't look so easy task so any suggestions will be helpful.
I would use token_get_all to parse the document and map a registered random string replacement on all interesting tokens.
To obfuscate all the variable names, replace T_VARIABLE in one pass, ignoring all the superglobals.
Additionally, for the bounty's requisite function names, replace all the T_FUNCTION declarations in the first pass. Then a second pass is needed to replace all the T_STRING invocations because PHP allows you to use a function before it's declared.
For this example, I generated all lowercase letters to avoid case-insensitive clashes to function names, but you can obviously use whatever characters you want and add an extra conditional check for increased complexity. Just remember that they can't start with a number.
I also registered all the internal function names with get_defined_functions to protect against the extremely off-chance possibility that a randomly generated string would match one of those function names. Keep in mind this won't protect against special extensions installed on the machine running the obfuscated script that are not present on the server obfuscating the script. The chances of that are astronomical, but you can always ratchet up the length of the randomly generated string to diminish those odds even more.
<?php
$tokens = token_get_all(file_get_contents('example.php'));
$globals = array(
'$GLOBALS',
'$_SERVER',
'$_GET',
'$_POST',
'$_FILES',
'$_COOKIE',
'$_SESSION',
'$_REQUEST',
'$_ENV',
);
// prevent name clashes with randomly generated strings and native functions
$registry = get_defined_functions();
$registry = $registry['internal'];
// first pass to change all the variable names and function name declarations
foreach($tokens as $key => $element){
// make sure it's an interesting token
if(!is_array($element)){
continue;
}
switch ($element[0]) {
case T_FUNCTION:
$prefix = '';
// this jumps over the whitespace to get the function name
$index = $key + 2;
break;
case T_VARIABLE:
// ignore the superglobals
if(in_array($element[1], $globals)){
continue 2;
}
$prefix = '$';
$index = $key;
break;
default:
continue 2;
}
// check to see if we've already registered it
if(!isset($registry[$tokens[$index][1]])){
// make sure our random string hasn't already been generated
// or just so crazily happens to be the same name as an internal function
do {
$replacement = $prefix.random_str(16);
} while(in_array($replacement, $registry));
// map the original and register the replacement
$registry[$tokens[$index][1]] = $replacement;
}
// rename the variable
$tokens[$index][1] = $registry[$tokens[$index][1]];
}
// second pass to rename all the function invocations
$tokens = array_map(function($element) use ($registry){
// check to see if it's a function identifier
if(is_array($element) && $element[0] === T_STRING){
// make sure it's one of our registered function names
if(isset($registry[$element[1]])){
// rename the variable
$element[1] = $registry[$element[1]];
}
}
return $element;
},$tokens);
// dump the tokens back out to rebuild the page with obfuscated names
foreach($tokens as $token){
echo $token[1] ?? $token;
}
/**
* https://stackoverflow.com/a/31107425/4233593
* Generate a random string, using a cryptographically secure
* pseudorandom number generator (random_int)
*
* For PHP 7, random_int is a PHP core function
* For PHP 5.x, depends on https://github.com/paragonie/random_compat
*
* #param int $length How many characters do we want?
* #param string $keyspace A string of all possible characters
* to select from
* #return string
*/
function random_str($length, $keyspace = 'abcdefghijklmnopqrstuvwxyz')
{
$str = '';
$max = mb_strlen($keyspace, '8bit') - 1;
for ($i = 0; $i < $length; ++$i) {
$str .= $keyspace[random_int(0, $max)];
}
return $str;
}
Given this example.php
<?php
$example = 'some $string';
if(isset($_POST['something'])){
echo $_POST['something'];
}
function exampleFunction($variable2){
echo $variable2;
}
exampleFunction($example);
$variable3 = array('example','another');
foreach($variable3 as $key => $var3val){
echo $var3val."somestring";
}
Produces this output:
<?php
$vsodjbobqokkaabv = 'some $string';
if(isset($_POST['something'])){
echo $_POST['something'];
}
function gkfadicwputpvroj($zwnjrxupprkbudlr){
echo $zwnjrxupprkbudlr;
}
gkfadicwputpvroj($vsodjbobqokkaabv);
$vfjzehtvmzzurxor = array('example','another');
foreach($vfjzehtvmzzurxor as $riuqtlravsenpspv => $mkdgtnpxaqziqkgo){
echo $mkdgtnpxaqziqkgo."somestring";
}
EDIT 4.12.2016 - please see below! (after first answer)
I've just tried to find a solution which can handle both cases: your given case and this example from Elias Van Ootegerm.
of course it should be improved as mentioned in one of my comments, but it works for your example:
$source = file_get_contents("source.php");
// this should get all Variables BUT isn't right at the moment if a variable is followed by an ' or " !!
preg_match_all('/\$[\$a-zA-Z0-9\[\'.*\'\]]*/', $source, $matches);
$matches = array_unique($matches[0]);
// this array saves all old and new variable names to track all replacements
$replacements = array();
$obfuscated_source = $source;
foreach($matches as $varName)
{
do // generates random string and tests if it already is used by an earlier replaced variable name
{
// generate a random string -> should be improved.
$randomName = substr(md5(rand()), 0, 7);
// ensure that first part of variable name is a character.
// there could also be a random character...
$randomName = "a" . $randomName;
}
while(in_array("$" . $randomName, $replacements));
if(substr($varName, 0,8) == '$GLOBALS')
{
// this handles the case of GLOBALS variables
$delimiter = substr($varName, 9, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$GLOBALS[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,8) == '$_SERVER')
{
// this handles the case of SERVER variables
$delimiter = substr($varName, 9, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_SERVER[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,5) == '$_GET')
{
// this handles the case of GET variables
$delimiter = substr($varName, 6, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_GET[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,6) == '$_POST')
{
// this handles the case of POST variables
$delimiter = substr($varName, 7, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_POST[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,7) == '$_FILES')
{
// this handles the case of FILES variables
$delimiter = substr($varName, 8, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_FILES[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,9) == '$_REQUEST')
{
// this handles the case of REQUEST variables
$delimiter = substr($varName, 10, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_REQUEST[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,9) == '$_SESSION')
{
// this handles the case of SESSION variables
$delimiter = substr($varName, 10, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_SESSION[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,5) == '$_ENV')
{
// this handles the case of ENV variables
$delimiter = substr($varName, 6, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_ENV[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 0,8) == '$_COOKIE')
{
// this handles the case of COOKIE variables
$delimiter = substr($varName, 9, 1);
if($delimiter == '$') $delimiter = '';
$newName = '$_COOKIE[' .$delimiter . $randomName . $delimiter . ']';
}
else if(substr($varName, 1, 1) == '$')
{
// this handles the case of variable variables
$name = substr($varName, 2, strlen($varName)-2);
$pattern = '/(?=\$)\$' . $name . '.*;/';
preg_match_all($pattern, $source, $varDeclaration);
$varDeclaration = $varDeclaration[0][0];
preg_match('/\s*=\s*["\'](?:\\.|[^"\\]])*["\']/', $varDeclaration, $varContent);
$varContent = $varContent[0];
preg_match('/["\'](?:\\.|[^"\\]])*["\']/', $varContent, $varContentDetail);
$varContentDetail = substr($varContentDetail[0], 1, strlen($varContentDetail[0])-2);
$replacementDetail = str_replace($varContent, substr($replacements["$" . $varContentDetail], 1, strlen($replacements["$" . $varContentDetail])-1), $varContent);
$explode = explode($varContentDetail, $varContent);
$replacement = $explode[0] . $replacementDetail . $explode[1];
$obfuscated_source = str_replace($varContent, $replacement, $obfuscated_source);
}
else
{
$newName = '$' . $randomName;
}
$obfuscated_source = str_replace($varName, $newName, $obfuscated_source);
$replacements[$varName] = $newName;
}
// this part may be useful to change hard-coded returns of functions.
// it changes all remaining words in the document which are like the previous changed variable names to the new variable names
// attention: if the variables in the document have common names it could also change text you don't like to change...
foreach($replacements as $before => $after)
{
$name_before = str_replace("$", "", $before);
$name_after = str_replace("$", "", $after);
$obfuscated_source = str_replace($name_before, $name_after, $obfuscated_source);
}
// here you can place code to write back the obfuscated code to the same or to a new file, e.g:
$file = fopen("result.php", "w");
fwrite($file, $obfuscated_source);
fclose($file);
EDIT there are still some cases left which require some effort.
At least some kinds of variable declarations may not be handled correctly!
Also the first regex is not perfect, my current status is like:
'/\$\$?[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/'
but this does not get the index-values of predefined variables... But I think it has some potential. If you use it like here you get all 18 involved variables... The next step could be to determine if a [..] follws after the variable name. If so any predefined variable AND such cases like $g = $GLOBALS; and any further use of such a $g would be covered...
EDIT 4.12.2016
due to LSerni and several comments on both the original quesion and some solutions I also wrote a parsing solution which you can find below.
It handles an extended example file which was my aim. If you find any other challenge, please tell me!
new solution:
$variable_names_before = array();
$variable_names_after = array();
$function_names_before = array();
$function_names_after = array();
$forbidden_variables = array(
'$GLOBALS',
'$_SERVER',
'$_GET',
'$_POST',
'$_FILES',
'$_COOKIE',
'$_SESSION',
'$_REQUEST',
'$_ENV',
);
$forbidden_functions = array(
'unlink'
);
// read file
$data = file_get_contents("example.php");
$lock = false;
$lock_quote = '';
for($i = 0; $i < strlen($data); $i++)
{
// check if there are quotation marks
if(($data[$i] == "'" || $data[$i] == '"'))
{
// if first quote
if($lock_quote == '')
{
// remember quotation mark
$lock_quote = $data[$i];
$lock = true;
}
else if($data[$i] == $lock_quote)
{
$lock_quote = '';
$lock = false;
}
}
// detect variables
if(!$lock && $data[$i] == '$')
{
$start = $i;
// detect variable variable names
if($data[$i+1] == '$')
{
$start++;
// increment $i to avoid second detection of variable variable as "normal variable"
$i++;
}
$end = 1;
// find end of variable name
while(ctype_alpha($data[$start+$end]) || is_numeric($data[$start+$end]) || $data[$start+$end] == "_")
{
$end++;
}
// extract variable name
$variable_name = substr($data, $start, $end);
if($variable_name == '$')
{
continue;
}
// check if variable name is allowed
if(in_array($variable_name, $forbidden_variables))
{
// forbidden variable deteced, do whatever you want!
}
else
{
// check if variable name already has been detected
if(!in_array($variable_name, $variable_names_before))
{
$variable_names_before[] = $variable_name;
// generate random name for variable
$new_variable_name = "";
do
{
$new_variable_name = random_str(rand(5, 20));
}
while(in_array($new_variable_name, $variable_names_after));
$variable_names_after[] = $new_variable_name;
}
//var_dump("variable: " . $variable_name);
}
}
// detect function-definitions
// the third condition checks if the symbol before 'function' is neither a character nor a number
if(!$lock && strtolower(substr($data, $i, 8)) == 'function' && (!ctype_alpha($data[$i-1]) && !is_numeric($data[$i-1])))
{
// find end of function name
$end = strpos($data, '(', $i);
// extract function name and remove possible spaces on the right side
$function_name = rtrim(substr($data, ($i+9), $end-$i-9));
// check if function name is allowed
if(in_array($function_name, $forbidden_functions))
{
// forbidden function detected, do whatever you want!
}
else
{
// check if function name already has been deteced
if(!in_array($function_name, $function_names_before))
{
$function_names_before[] = $function_name;
// generate random name for variable
$new_function_name = "";
do
{
$new_function_name = random_str(rand(5, 20));
}
while(in_array($new_function_name, $function_names_after));
$function_names_after[] = $new_function_name;
}
//var_dump("function: " . $function_name);
}
}
}
// this array contains prefixes and suffixes for string literals which
// may contain variable names.
// if string literals as a return of functions should not be changed
// remove the last two inner arrays of $possible_pre_suffixes
// this will enable correct handling of situations like
// - $func = 'getNewName'; echo $func();
// but it will break variable variable names like
// - ${getNewName()}
$possible_pre_suffixes = array(
array(
"prefix" => "= '",
"suffix" => "'"
),
array(
"prefix" => '= "',
"suffix" => '"'
),
array(
"prefix" => "='",
"suffix" => "'"
),
array(
"prefix" => '="',
"suffix" => '"'
),
array(
"prefix" => 'rn "', // return " ";
"suffix" => '"'
),
array(
"prefix" => "rn '", // return ' ';
"suffix" => "'"
)
);
// replace variable names
for($i = 0; $i < count($variable_names_before); $i++)
{
$data = str_replace($variable_names_before[$i], '$' . $variable_names_after[$i], $data);
// try to find strings which equals variable names
// this is an attempt to handle situations like:
// $a = "123";
// $b = "a"; <--
// $$b = "321"; <--
// and also
// function getName() { return "a"; }
// echo ${getName()};
$name = substr($variable_names_before[$i], 1);
for($j = 0; $j < count($possible_pre_suffixes); $j++)
{
$data = str_replace($possible_pre_suffixes[$j]["prefix"] . $name . $possible_pre_suffixes[$j]["suffix"],
$possible_pre_suffixes[$j]["prefix"] . $variable_names_after[$i] . $possible_pre_suffixes[$j]["suffix"],
$data);
}
}
// replace funciton names
for($i = 0; $i < count($function_names_before); $i++)
{
$data = str_replace($function_names_before[$i], $function_names_after[$i], $data);
}
/**
* https://stackoverflow.com/a/31107425/4233593
* Generate a random string, using a cryptographically secure
* pseudorandom number generator (random_int)
*
* For PHP 7, random_int is a PHP core function
* For PHP 5.x, depends on https://github.com/paragonie/random_compat
*
* #param int $length How many characters do we want?
* #param string $keyspace A string of all possible characters
* to select from
* #return string
*/
function random_str($length, $keyspace = 'abcdefghijklmnopqrstuvwxyz')
{
$str = '';
$max = mb_strlen($keyspace, '8bit') - 1;
for ($i = 0; $i < $length; ++$i)
{
$str .= $keyspace[random_int(0, $max)];
}
return $str;
}
example input file:
$example = 'some $string';
$test = '$abc 123' . $example . '$hello here I "$am"';
if(isset($_POST['something'])){
echo $_POST['something'];
}
function exampleFunction($variable2){
echo $variable2;
}
exampleFunction($example);
$variable3 = array('example','another');
foreach($variable3 as $key => $var3val){
echo $var3val."somestring";
}
$test = "example";
$$test = 'hello';
exampleFunction($example);
exampleFunction($$test);
function getNewName()
{
return "test";
}
exampleFunction(${getNewName()});
output of my function:
$fesvffyn = 'some $string';
$zimskk = '$abc 123' . $fesvffyn . '$hello here I "$am"';
if(isset($_POST['something'])){
echo $_POST['something'];
}
function kainbtqpybl($yxjvlvmyfskwqcevo){
echo $yxjvlvmyfskwqcevo;
}
kainbtqpybl($fesvffyn);
$lmiphctfgjfdnonjpia = array('example','another');
foreach($lmiphctfgjfdnonjpia as $qypdfcpcla => $gwlpcpnvnhbvbyflr){
echo $gwlpcpnvnhbvbyflr."somestring";
}
$zimskk = "fesvffyn";
$$zimskk = 'hello';
kainbtqpybl($fesvffyn);
kainbtqpybl($$zimskk);
function tauevjkk()
{
return "zimskk";
}
kainbtqpybl(${tauevjkk()});
I know there are some cases left, where you can find an issue with variable variable names, but then you may have to expand the $possible_pre_suffixes array...
Maybe you also want to differentiate between global variables and "forbidden variables"...
Well, you can try write your own but the number of strange things you have to handle are likely to overwhelm you, and I presume you are more interested in using such a tool than writing and maintaining one yourself. (There a lots of broken PHP obfuscators out there, where people have tried to do this).
If you want one that is reliable, you do have base it on a parser or your tool will mis-parse the text and handle it wrong (this is the first "strange thing"). Regexes simply won't do the trick.
The Semantic Designs PHP Obfuscator (from my company), taken out of the box, took this slightly modified version of Elias Van Ootegem's example:
<?php
//non-obfuscated
function getVarname()
{//the return value has to change
return (('foobar'));
}
$format = '%s = %d';
$foobar = 123;
$variableVar = (('format'));//you need to change this string
printf($$variableVar, $variableVar = getVarname(), $$variableVar);
echo PHP_EOL;
var_dump($GLOBALS[(('foobar'))]);//note the key == the var
and produced this:
<?php function l0() { return (('O0')); } $l1="%\163 = %d"; $O1=0173; $l2=(('O2')); printf($$l2,$l2=l0(),$$l2); echo PHP_EOL; var_dump($GLOBALS[(('O0'))]);
The key issue in Elias's example are strings that actually contain variable names. In general, there is no way for a tool to know that "x" is a variable name, and not just the string containing the letter x. But, the programmers know. We insist that such strings be marked [by enclosing them in ((..)) ] and then the obfuscator can obfuscate their content properly.
Sometimes the string contains variables names and other things; it that case,
the programmer has to break up the string into "variable name" content and everything else. This is pretty easy to do in practice, and is
the "slight change" I made to his supplied code.
Other strings, not being marked, are left alone. You only have to do this
once to the source file. [You can say this is cheating, but no other practical answer will work; the tool cannot know reliably. Halting Problem, if you insist.].
The next thing to get right is reliable obfuscation across multiple files. You can't do this one file at a time. This obfuscator has been used on very big PHP applications (thousands of PHP script files).
Yes, it does use a full PHP parser. Not nikic's.
I ended up with this simple code:
$tokens = token_get_all($src);
$skip = array('$this','$_GET','$_POST','$_REQUEST','$_SERVER','$_COOKIE','$_SESSION');
function renameVars($tokens,$content,$skip){
$vars = array();
foreach($tokens as $token) {
if ($token[0] == T_VARIABLE && !in_array($token[1],$skip))
$vars[generateRandomString()]= $token[1];
}
$vars = array_unique($vars);
$vars2 = $vars;
foreach($vars as $new => $old){
foreach($vars2 as $var){
if($old!=$var && strpos($var,$old)!==false){
continue 2;
}
}
$content = str_replace($old,'${"'.$new.'"}',$content);
//function(${"example"}) will trigger error. This is why we need this:
$content = str_replace('(${"'.$new.'"}','($'.$new,$content);
$content = str_replace(',${"'.$new.'"}',',$'.$new,$content);
$chars = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
//for things like function deleteExpired(Varien_Event_Observer $fz5eDWIt1si), Exception,
foreach($chars as $char){
$content = str_replace($char.' ${"'.$new.'"}',$char.' $'.$new,$content);
}
}
It works for me because the code is simple. I guess it wont work in all scenarios.
I have it working now but there may still be some vulnerabilities because PHP allows functions names and variables names to be generated dynamically.
The first function replaces $_SESSION, $_POST etc. with functions:
function replaceArrayVariable($str, $arr, $function)
{
$str = str_replace($arr, $function, $str);
$lastPos = 0;
while (($lastPos = strpos($str, $function, $lastPos)) !== false)
{
$lastPos = $lastPos + strlen($function);
$currentPos = $lastPos;
$openSqrBrackets = 1;
while ($openSqrBrackets > 0)
{
if ($str[$currentPos] === '[')
$openSqrBrackets++;
elseif ($str[$currentPos] === ']')
$openSqrBrackets--;
$currentPos++;
}
$str[$currentPos - 1] = ')';
}
return $str;
}
The second renames functions ignoring whitelisted keywords:
function renameFunctions($str)
{
preg_match_all('/[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/', $str, $matches, PREG_OFFSET_CAPTURE);
$totalMatches = count($matches[0]);
$offset = 0;
for ($i = 0; $i < $totalMatches; $i++)
{
$matchIndex = $matches[0][$i][1] + $offset;
if ($matchIndex === 0 || $str[$matchIndex - 1] !== '$')
{
$keyword = $matches[0][$i][0];
if ($keyword !== 'true' && $keyword !== 'false' && $keyword !== 'if' && $keyword !== 'else' && $keyword !== 'getPost' && $keyword !== 'getSession')
{
$str = substr_replace($str, 'qq', $matchIndex, 0);
$offset += 2;
}
}
}
return $str;
}
Then to rename functions, variables, and non-whitelisted keywords, I use this code:
$str = replaceArrayVariable($str, '$_POST[', 'getPost(');
$str = replaceArrayVariable($str, '$_SESSION[', 'getSession(');
preg_match_all('/\'(?:\\\\.|[^\\\\\'])*\'|.[^\']+/', $str, $matches);
$str = '';
foreach ($matches[0] as $match)
{
if ($match[0] != "'")
{
$match = preg_replace('!\s+!', ' ', $match);
$match = renameFunctions($match);
$match = str_replace('$', '$qq', $match);
}
$str .= $match;
}

Does PHP have a feature like Python's template strings?

Python has a feature called template strings.
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
I know that PHP allows you to write:
"Hello $person"
and have $person substituted, but the templates can be reused in various sections of the code?
You can use template strings like this:
$name = "Maria";
$info["last_name"] = "Warner";
echo "Hello {$name} {$info["last_name"]}";
This will echo Hello Maria Warner.
You could also use strtr:
$template = '$who likes $what';
$vars = array(
'$who' => 'tim',
'$what' => 'kung pao',
);
echo strtr($template, $vars);
Outputs:
tim likes kung pao
I think there are a bunch of ways to do this... but this comes to mind.
$search = array('%who%', '%what_id%');
$replace = array('tim', 'kung pao');
$conference_target = str_replace(
$search,
$replace,
"%who% likes %what%"
);
Ha, we even had one in our framework using vsprintf:
class Helper_StringFormat {
public static function sprintf($format, array $args = array()) {
$arg_nums = array_slice(array_flip(array_keys(array(0 => 0) + $args)), 1);
for ($pos = 0; preg_match('/(?<=%)\(([a-zA-Z_]\w*)\)/', $format, $match, PREG_OFFSET_CAPTURE, $pos);) {
$arg_pos = $match[0][2];
$arg_len = strlen($match[0][0]);
$arg_key = $match[1][0];
if (! array_key_exists($arg_key, $arg_nums)) {
user_error("sprintfn(): Missing argument '${arg_key}'", E_USER_WARNING);
return false;
}
$format = substr_replace($format, $replace = $arg_nums[$arg_key] . '$', $arg_pos, $arg_len);
$pos = $arg_pos + strlen($replace);
}
return vsprintf($format, array_values($args));
}
}
Which looks like it came from the sprintf page
This allows for calls like:
sprintfn('second: %(second)s ; first: %(first)s', array(
'first' => '1st',
'second'=> '2nd'
));
UPDATE
Here is an update to do what you want... not fully tested though
class Helper_StringFormat {
public static function sprintf($format, array $args = array()) {
$arg_nums = array_slice(array_flip(array_keys(array(0 => 0) + $args)), 1);
for ($pos = 0; preg_match('/(?<=%)\(([a-zA-Z_][\w\s]*)\)/', $format, $match, PREG_OFFSET_CAPTURE, $pos);) {
$arg_pos = $match[0][1];
$arg_len = strlen($match[0][0]);
$arg_key = $match[1][0];
if (! array_key_exists($arg_key, $arg_nums)) {
user_error("sprintfn(): Missing argument '${arg_key}'", E_USER_WARNING);
return false;
}
$format = substr_replace($format, $replace = $arg_nums[$arg_key] . '$', $arg_pos, $arg_len);
$pos = $arg_pos + strlen($replace); // skip to end of replacement for next iteration
}
return vsprintf($format, array_values($args));
}
}
$str = "%(my var)s now work with a slight %(my var2)s";
$repl = array("my var" => "Spaces", "my var2" => "modification.");
echo Helper_StringFormat::sprintf($str, $repl);
OUTPUT
Spaces now work with a slight modification.
Another more simple approach would be this:
$s = function ($vars) {
extract($vars);
return "$who likes $what";
};
echo $s(['who' => 'Tim', 'what' => 'King Pao']); // Tim likes King Pao
And yes, PHPStorm will complain...
I personally most like sprintf (or vsprintf, for an array of arguments). It places them in the intended order, coerces types as needed, and has a lot more advanced features available.
Example:
$var = sprintf("%s costs %.2f dollars", "Cookies", 1.1);
This will result in the value Cookies cost 1.10 dollars.
There's an entire family of printf functions for different use cases, all listed under "See Also".
Very versatile: same methods for providing variables, array components, function results, etc.
I made a function to do what you want. i made it "quck-and-dirty" because i have not much time to refactorize it, maybe i upload it to my github.
EDIT: a bug correction...
Use it like
formattemplatter(
'$who likes $what'
, array(
'who' => 'Tim'
, 'what' => 'Kung Pao'
)
);
Variables can be [a-zA-Z0-9_] only.
function formattemplater($string, $params) {
// Determine largest string
$largest = 0;
foreach(array_keys($params) as $k) {
if(($l=strlen($k)) > $largest) $largest=$l;
}
$buff = '';
$cp = false; // Conditional parenthesis
$ip = false; // Inside parameter
$isp = false; // Is set parameter
$bl = 1; // buffer length
$param = ''; // current parameter
$out = ''; // output string
$string .= '!';
for($sc=0,$c=$oc='';isset($string{$sc});++$sc,++$bl) {
$c = $string{$sc};
if($ip) {
$a = ord($c);
if(!($a == 95 || ( // underscore
($a >= 48 && $a <= 57) // 0-9
|| ($a >= 65 && $a <= 90) // A-Z
|| ($a >= 97 && $a <= 122) // a-z
)
)) {
$isp = isset($params[$buff]);
if(!$cp && !$isp) {
trigger_error(
sprintf(
__FUNCTION__.': the parameter "%s" is not defined'
, $buff
)
, E_USER_ERROR
);
} elseif(!$cp || $isp) {
$out .= $params[$buff];
}
$isp = $isp && !empty($params[$buff]);
$oc = $buff = '';
$bl = 0;
$ip = false;
}
}
if($cp && $c === ')') {
$out .= $buff;
$cp = $isp = false;
$c = $buff = '';
$bl = 0;
}
if(($cp && $isp) || $ip)
$buff .= $c;
if($c === '$' && $oc !== '\\') {
if($oc === '(') $cp = true;
else $out .= $oc;
$ip = true;
$buff = $c = $oc = '';
$bl = 0;
}
if(!$cp && $bl > $largest) {
$buff = substr($buff, - $largest);
$bl = $largest;
}
if(!$ip && ( !$cp || ($cp && $isp))) {
$out .= $oc;
if(!$cp) $oc = $c;
}
}
return $out;
}
Just for the sake of completeness: there is also Heredoc.
$template = fn( $who, $what ) => <<<EOT
$who likes $what
EOT;
echo( $template( 'tim', 'kung pao' ) );
Outputs:
tim likes kung pao
Sidenotes:
You get highlighting in your favourite language (if properly configured). Just substitute EOT (from the sample above) with whatever you like (e.c. HTML, SQL, PHP, ...).
Escape arrays with curly braces {$data['who']}. Accessing objekts like $data->who works without braces.
Arrow functions like fn($a)=>$a are available since PHP 7.4. You can write function($a){return $a;} if you are using PHP<7.4.

PHP recursive variable replacement

I'm writing code to recursively replace predefined variables from inside a given string. The variables are prefixed with the character '%'. Input strings that start with '^' are to be evaluated.
For instance, assuming an array of variables such as:
$vars['a'] = 'This is a string';
$vars['b'] = '123';
$vars['d'] = '%c'; // Note that $vars['c'] has not been defined
$vars['e'] = '^5 + %d';
$vars['f'] = '^11 + %e + %b*2';
$vars['g'] = '^date(\'l\')';
$vars['h'] = 'Today is %g.';
$vars['input_digits'] = '*****';
$vars['code'] = '%input_digits';
The following code would result in:
a) $str = '^1 + %c';
$rc = _expand_variables($str, $vars);
// Result: $rc == 1
b) $str = '^%a != NULL';
$rc = _expand_variables($str, $vars);
// Result: $rc == 1
c) $str = '^3+%f + 3';
$rc = _expand_variables($str, $vars);
// Result: $rc == 262
d) $str = '%h';
$rc = _expand_variables($str, $vars);
// Result: $rc == 'Today is Monday'
e) $str = 'Your code is: %code';
$rc = _expand_variables($str, $vars);
// Result: $rc == 'Your code is: *****'
Any suggestions on how to do that? I've spent many days trying to do this, but only achieved partial success. Unfortunately, my last attempt managed to generate a 'segmentation fault'!!
Help would be much appreciated!
Note that there is no check against circular inclusion, which would simply lead to an infinite loop. (Example: $vars['s'] = '%s'; ..) So make sure your data is free of such constructs.
The commented code
// if(!is_numeric($expanded) || (substr($expanded.'',0,1)==='0'
// && strpos($expanded.'', '.')===false)) {
..
// }
can be used or skipped. If it is skipped, any replacement is quoted, if the string $str will be evaluated later on! But since PHP automatically converts strings to numbers (or should I say it tries to do so??) skipping the code should not lead to any problems.
Note that boolean values are not supported! (Also there is no automatic conversion done by PHP, that converts strings like 'true' or 'false' to the appropriate boolean values!)
<?
$vars['a'] = 'This is a string';
$vars['b'] = '123';
$vars['d'] = '%c';
$vars['e'] = '^5 + %d';
$vars['f'] = '^11 + %e + %b*2';
$vars['g'] = '^date(\'l\')';
$vars['h'] = 'Today is %g.';
$vars['i'] = 'Zip: %j';
$vars['j'] = '01234';
$vars['input_digits'] = '*****';
$vars['code'] = '%input_digits';
function expand($str, $vars) {
$regex = '/\%(\w+)/';
$eval = substr($str, 0, 1) == '^';
$res = preg_replace_callback($regex, function($matches) use ($eval, $vars) {
if(isset($vars[$matches[1]])) {
$expanded = expand($vars[$matches[1]], $vars);
if($eval) {
// Special handling since $str is going to be evaluated ..
// if(!is_numeric($expanded) || (substr($expanded.'',0,1)==='0'
// && strpos($expanded.'', '.')===false)) {
$expanded = "'$expanded'";
// }
}
return $expanded;
} else {
// Variable does not exist in $vars array
if($eval) {
return 'null';
}
return $matches[0];
}
}, $str);
if($eval) {
ob_start();
$expr = substr($res, 1);
if(eval('$res = ' . $expr . ';')===false) {
ob_end_clean();
die('Not a correct PHP-Expression: '.$expr);
}
ob_end_clean();
}
return $res;
}
echo expand('^1 + %c',$vars);
echo '<br/>';
echo expand('^%a != NULL',$vars);
echo '<br/>';
echo expand('^3+%f + 3',$vars);
echo '<br/>';
echo expand('%h',$vars);
echo '<br/>';
echo expand('Your code is: %code',$vars);
echo '<br/>';
echo expand('Some Info: %i',$vars);
?>
The above code assumes PHP 5.3 since it uses a closure.
Output:
1
1
268
Today is Tuesday.
Your code is: *****
Some Info: Zip: 01234
For PHP < 5.3 the following adapted code can be used:
function expand2($str, $vars) {
$regex = '/\%(\w+)/';
$eval = substr($str, 0, 1) == '^';
$res = preg_replace_callback($regex, array(new Helper($vars, $eval),'callback'), $str);
if($eval) {
ob_start();
$expr = substr($res, 1);
if(eval('$res = ' . $expr . ';')===false) {
ob_end_clean();
die('Not a correct PHP-Expression: '.$expr);
}
ob_end_clean();
}
return $res;
}
class Helper {
var $vars;
var $eval;
function Helper($vars,$eval) {
$this->vars = $vars;
$this->eval = $eval;
}
function callback($matches) {
if(isset($this->vars[$matches[1]])) {
$expanded = expand($this->vars[$matches[1]], $this->vars);
if($this->eval) {
// Special handling since $str is going to be evaluated ..
if(!is_numeric($expanded) || (substr($expanded . '', 0, 1)==='0'
&& strpos($expanded . '', '.')===false)) {
$expanded = "'$expanded'";
}
}
return $expanded;
} else {
// Variable does not exist in $vars array
if($this->eval) {
return 'null';
}
return $matches[0];
}
}
}
I now have written an evaluator for your code, which addresses the circular reference problem, too.
Use:
$expression = new Evaluator($vars);
$vars['a'] = 'This is a string';
// ...
$vars['circular'] = '%ralucric';
$vars['ralucric'] = '%circular';
echo $expression->evaluate('%circular');
I use a $this->stack to handle circular references. (No idea what a stack actually is, I simply named it so ^^)
class Evaluator {
private $vars;
private $stack = array();
private $inEval = false;
public function __construct(&$vars) {
$this->vars =& $vars;
}
public function evaluate($str) {
// empty string
if (!isset($str[0])) {
return '';
}
if ($str[0] == '^') {
$this->inEval = true;
ob_start();
eval('$str = ' . preg_replace_callback('#%(\w+)#', array($this, '_replace'), substr($str, 1)) . ';');
if ($error = ob_get_clean()) {
throw new LogicException('Eval code failed: '.$error);
}
$this->inEval = false;
}
else {
$str = preg_replace_callback('#%(\w+)#', array($this, '_replace'), $str);
}
return $str;
}
private function _replace(&$matches) {
if (!isset($this->vars[$matches[1]])) {
return $this->inEval ? 'null' : '';
}
if (isset($this->stack[$matches[1]])) {
throw new LogicException('Circular Reference detected!');
}
$this->stack[$matches[1]] = true;
$return = $this->evaluate($this->vars[$matches[1]]);
unset($this->stack[$matches[1]]);
return $this->inEval == false ? $return : '\'' . $return . '\'';
}
}
Edit 1: I tested the maximum recursion depth for this script using this:
$alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEF'; // GHIJKLMNOPQRSTUVWXYZ
$length = strlen($alphabet);
$vars['a'] = 'Hallo World!';
for ($i = 1; $i < $length; ++$i) {
$vars[$alphabet[$i]] = '%' . $alphabet[$i-1];
}
var_dump($vars);
$expression = new Evaluator($vars);
echo $expression->evaluate('%' . $alphabet[$length - 1]);
If another character is added to $alphabet maximum recursion depth of 100 is reached. (But probably you can modify this setting somewhere?)
I actually just did this while implementing a MVC framework.
What I did was create a "find-tags" function that uses a regular expression to find all things that should be replaced using preg_match_all and then iterated through the list and called the function recursively with the str_replaced code.
VERY Simplified Code
function findTags($body)
{
$tagPattern = '/{%(?P<tag>\w+) *(?P<inputs>.*?)%}/'
preg_match_all($tagPattern,$body,$results,PREG_SET_ORDER);
foreach($results as $command)
{
$toReturn[] = array(0=>$command[0],'tag'=>$command['tag'],'inputs'=>$command['inputs']);
}
if(!isset($toReturn))
$toReturn = array();
return $toReturn;
}
function renderToView($body)
{
$arr = findTags($body);
if(count($arr) == 0)
return $body;
else
{
foreach($arr as $tag)
{
$body = str_replace($tag[0],$LOOKUPARRY[$tag['tag']],$body);
}
}
return renderToView($body);
}

Categories