Regex to parse define() contents, possible? - php

I am very new to regex, and this is way too advanced for me. So I am asking the experts over here.
Problem
I would like to retrieve the constants / values from a php define()
DEFINE('TEXT', 'VALUE');
Basically I would like a regex to be able to return the name of constant, and the value of constant from the above line. Just TEXT and VALUE . Is this even possible?
Why I need it? I am dealing with language file and I want to get all couples (name, value) and put them in array. I managed to do it with str_replace() and trim() etc.. but this way is long and I am sure it could be made easier with single line of regex.
Note: The VALUE may contain escaped single quotes as well. example:
DEFINE('TEXT', 'J\'ai');
I hope I am not asking for something too complicated. :)
Regards

For any kind of grammar-based parsing, regular expressions are usually an awful solution. Even smple grammars (like arithmetic) have nesting and it's on nesting (in particular) that regular expressions just fall over.
Fortunately PHP provides a far, far better solution for you by giving you access to the same lexical analyzer used by the PHP interpreter via the token_get_all() function. Give it a character stream of PHP code and it'll parse it into tokens ("lexemes"), which you can do a bit of simple parsing on with a pretty simple finite state machine.
Run this program (it's run as test.php so it tries it on itself). The file is deliberately formatted badly so you can see it handles that with ease.
<?
define('CONST1', 'value' );
define (CONST2, 'value2');
define( 'CONST3', time());
define('define', 'define');
define("test", VALUE4);
define('const5', //
'weird declaration'
) ;
define('CONST7', 3.14);
define ( /* comment */ 'foo', 'bar');
$defn = 'blah';
define($defn, 'foo');
define( 'CONST4', define('CONST5', 6));
header('Content-Type: text/plain');
$defines = array();
$state = 0;
$key = '';
$value = '';
$file = file_get_contents('test.php');
$tokens = token_get_all($file);
$token = reset($tokens);
while ($token) {
// dump($state, $token);
if (is_array($token)) {
if ($token[0] == T_WHITESPACE || $token[0] == T_COMMENT || $token[0] == T_DOC_COMMENT) {
// do nothing
} else if ($token[0] == T_STRING && strtolower($token[1]) == 'define') {
$state = 1;
} else if ($state == 2 && is_constant($token[0])) {
$key = $token[1];
$state = 3;
} else if ($state == 4 && is_constant($token[0])) {
$value = $token[1];
$state = 5;
}
} else {
$symbol = trim($token);
if ($symbol == '(' && $state == 1) {
$state = 2;
} else if ($symbol == ',' && $state == 3) {
$state = 4;
} else if ($symbol == ')' && $state == 5) {
$defines[strip($key)] = strip($value);
$state = 0;
}
}
$token = next($tokens);
}
foreach ($defines as $k => $v) {
echo "'$k' => '$v'\n";
}
function is_constant($token) {
return $token == T_CONSTANT_ENCAPSED_STRING || $token == T_STRING ||
$token == T_LNUMBER || $token == T_DNUMBER;
}
function dump($state, $token) {
if (is_array($token)) {
echo "$state: " . token_name($token[0]) . " [$token[1]] on line $token[2]\n";
} else {
echo "$state: Symbol '$token'\n";
}
}
function strip($value) {
return preg_replace('!^([\'"])(.*)\1$!', '$2', $value);
}
?>
Output:
'CONST1' => 'value'
'CONST2' => 'value2'
'CONST3' => 'time'
'define' => 'define'
'test' => 'VALUE4'
'const5' => 'weird declaration'
'CONST7' => '3.14'
'foo' => 'bar'
'CONST5' => '6'
This is basically a finite state machine that looks for the pattern:
function name ('define')
open parenthesis
constant
comma
constant
close parenthesis
in the lexical stream of a PHP source file and treats the two constants as a (name,value) pair. In doing so it handles nested define() statements (as per the results) and ignores whitespace and comments as well as working across multiple lines.
Note: I've deliberatley made it ignore the case when functions and variables are constant names or values but you can extend it to that as you wish.
It's also worth pointing out that PHP is quite forgiving when it comes to strings. They can be declared with single quotes, double quotes or (in certain circumstances) with no quotes at all. This can be (as pointed out by Gumbo) be an ambiguous reference reference to a constant and you have no way of knowing which it is (no guaranteed way anyway), giving you the chocie of:
Ignoring that style of strings (T_STRING);
Seeing if a constant has already been declared with that name and replacing it's value. There's no way you can know what other files have been called though nor can you process any defines that are conditionally created so you can't say with any certainty if anything is definitely a constant or not nor what value it has; or
You can just live with the possibility that these might be constants (which is unlikely) and just treat them as strings.
Personally I would go for (1) then (3).

This is possible, but I would rather use get_defined_constants(). But make sure all your translations have something in common (like all translations starting with T), so you can tell them apart from other constants.

Try this regular expression to find the define calls:
/\bdefine\(\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*,\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*\);/is
So:
$pattern = '/\\bdefine\\(\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*,\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*\\);/is';
$str = '<?php define(\'foo\', \'bar\'); define("define(\\\'foo\\\', \\\'bar\\\')", "define(\'foo\', \'bar\')"); ?>';
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);
var_dump($matches);
I know that eval is evil. But that’s the best way to evaluate the string expressions:
$constants = array();
foreach ($matches as $match) {
eval('$constants['.$match[1].'] = '.$match[1].';');
}
var_dump($constants);

You might not need to go overboard with the regex complexity - something like this will probably suffice
/DEFINE\('(.*?)',\s*'(.*)'\);/
Here's a PHP sample showing how you might use it
$lines=file("myconstants.php");
foreach($lines as $line) {
$matches=array();
if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
$name=$matches[1];
$value=$matches[2];
echo "$name = $value\n";
}
}

Not every problem with text should be solved with a regexp, so I'd suggest you state what you want to achieve and not how.
So, instead of using php's parser which is not really useful, or instead of using a completely undebuggable regexp, why not write a simple parser?
<?php
$str = "define('nam\\'e', 'va\\\\\\'lue');\ndefine('na\\\\me2', 'value\\'2');\nDEFINE('a', 'b');";
function getDefined($str) {
$lines = array();
preg_match_all('#^define[(][ ]*(.*?)[ ]*[)];$#mi', $str, $lines);
$res = array();
foreach ($lines[1] as $cnt) {
$p = 0;
$key = parseString($cnt, $p);
// Skip comma
$p++;
// Skip space
while ($cnt{$p} == " ") {
$p++;
}
$value = parseString($cnt, $p);
$res[$key] = $value;
}
return $res;
}
function parseString($s, &$p) {
$quotechar = $s[$p];
if (! in_array($quotechar, array("'", '"'))) {
throw new Exception("Invalid quote character '" . $quotechar . "', input is " . var_export($s, true) . " # " . $p);
}
$len = strlen($s);
$quoted = false;
$res = "";
for ($p++;$p < $len;$p++) {
if ($quoted) {
$quoted = false;
$res .= $s{$p};
} else {
if ($s{$p} == "\\") {
$quoted = true;
continue;
}
if ($s{$p} == $quotechar) {
$p++;
return $res;
}
$res .= $s{$p};
}
}
throw new Exception("Premature end of line");
}
var_dump(getDefined($str));
Output:
array(3) {
["nam'e"]=>
string(7) "va\'lue"
["na\me2"]=>
string(7) "value'2"
["a"]=>
string(1) "b"
}

Related

Super-hero PHP code refactoring: How to initialize variables

I need to run over some pages written in PHP by somebody that doesn't know that a variable should be initialized before use..
Therefore I have thousands of lines to check in order to make sure I Do not have such thing
<?php
$foo = $bar . "I m a noob";
?>
and of course having the problem with $bar not initialized.
The problem is that I have about 20~50 variables in this case in 25 files..
Do you know any super-hero way to set all the variables to ' ' or null ?
I do not want to set the warning level to E or W .. that's too crappy.
Thanks in advance!
Theoretically this code at the beginning of the script does the job:
$code = file_get_contents(__FILE__);
preg_match_all('/(?<=\$)[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/', $code, $matches);
$variables = array_unique($matches[0]);
foreach ($variables as $variable) {
if (!isset($$variable)) {
$$variable = null;
}
}
The regular expression is from PHP site.
EDIT
A clean and quicker way, without a regular expression:
$code = file_get_contents(__FILE__);
$tokens = token_get_all($code);
foreach ($tokens as $token) {
if (is_array($token) && $token[0] == T_VARIABLE) {
$variable = ltrim($token[1], '$');
if (!isset($$variable)) {
$$variable = null;
}
}
}
you can use an array for the 25:
for ($i = 0; $i < 25; $i++) { $bar[$i] = ''; /* you can change the "" to null */ }
using an array with a for, you can set a great number of vars with diferente values, or with the same value.

PHP: How to turn a string that contains an array expression in an actual array?

I have an array of user inputs ($atts) as key=>value pairs. Some of the values could be written as an array expression, such as:
'setting' => 'array(50,25)'
In those cases, I would like to convert the array expression contained in that string into an actual array. So the output would be something like:
$atts = array(
'setting' => array(50,25),
'another' => 'not written as an array expression'
)
Written logically, the code would be:
For each key=>value pair in the array $atts...
if the value is a string formatted as an array expression...
explode that value into an array.
Anybody know how I would write this in PHP?
function stringToArray($string) {
$string = "return " . $string . ";";
if (function_exists("token_get_all")) {//tokenizer extension may be disabled
$php = "<?php\n" . $string . "\n?>";
$tokens = token_get_all($php);
foreach ($tokens as $token) {
$type = $token[0];
if (is_long($type)) {
if (in_array($type, array(
T_OPEN_TAG,
T_RETURN,
T_WHITESPACE,
T_ARRAY,
T_LNUMBER,
T_DNUMBER,
T_CONSTANT_ENCAPSED_STRING,
T_DOUBLE_ARROW,
T_CLOSE_TAG,
T_NEW,
T_DOUBLE_COLON
))) {
continue;
}
exit("For your security, we stoped data parsing at '(" . token_name($type) . ") " . $token[1] . "'.");
}
}
}
return eval($string);
}
$a='array(10,20)';
print_r(stringToArray($a));
Use the tokenizer:
function stringToArray($str) {
$array = array();
$toks = token_get_all("<?php $str");
if ($toks[1][0] != T_ARRAY || $toks[2] != '(' || end($toks) != ')')
return null;
for($i=3; $i<count($toks)-1; $i+=2) {
if (count($toks[$i]) != 3)
return null;
if ($toks[$i][0] == T_WHITESPACE) {
$i--;
continue;
}
if ($toks[$i][0] == T_VARIABLE || $toks[$i][0] == T_STRING)
return null;
$value = $toks[$i][1];
if ($toks[$i][0] == T_CONSTANT_ENCAPSED_STRING)
$value = substr($value, 1, strlen($value) - 2);
$array[] = $value;
if ($toks[$i + 1] != ',' && $toks[$i + 1] != ')' && $toks[$i + 1][0] != T_WHITESPACE)
return null;
}
return $array;
}
The above will work only for literals. Passing a variable, a constant, an expression, a nested array or a malformed array declaration will return null:
stringToArray('array(1,2)'); // works
stringToArray('array(1,2.4)'); // works
stringToArray('array("foo",2)'); // works
stringToArray('array(\'hello\',2)'); // works
stringToArray('array()'); // works
stringToArray('array(1,2 + 3)'); // returns null
stringToArray('array(1,2 + 3)'); // returns null
stringToArray('array("foo"."bar")'); // returns null
stringToArray('array(array("hello"))'); // returns null
stringToArray('array($a,$b)'); // returns null
stringToArray('array(new bar)'); // returns null
stringToArray('array(SOME_CONST)'); // returns null
stringToArray('hello'); // returns null
You can also use the following to check if your string is an array expression or not:
function isArrayExpression($str) {
$toks = token_get_all("<?php $str");
return (
$toks[1][0] == T_ARRAY &&
$toks[2] == '(' &&
end($toks) == ')'
);
}
isArrayExpression('array(1,2,3)'); // true
isArrayExpression('array is cool'); // false
isArrayExpression('array(!!!!'); // false
You can always tweak it to your needs. Hope this helps.
As suggested in the comments of your post, eval() will do what you're looking for. However it's not really practical to store arrays as strings in the first place. If you're looking to make data more portable, I'd recommend using json_encode() or even serialize()
You can use eval, but I would highly recommend not using it, and instead try to rethink your design on a better way to handle the settings.
So for your example instead of storing
'setting' => 'array(50,25)'
Could you do something like
'setting' => array('type'=>'array', 'value'=>'50, 25')
then when you load the settings
you can do
switch($type)
case 'array'
$val = explode(', ', $value)
Or something similar
But like others suggested, I would try to save the settings using serialization

PHP dynamically accessing variable value

I want to dynamically access value of variable, let's say I have this array:
$aData = array(
'test' => 123
);
Standard approach to print the test key value would be:
print $aData['test'];
However, if I have to work with string representation of variable (for dynamic purposes)
$sItem = '$aData[\'test\']';
how can I achieve to print aData key named test? Neither of examples provided below works
print $$sItem;
print eval($sItem);
What would be the solution?
Your eval example is lacking the return value:
print eval("return $sItem;");
should do it:
$aData['test'] = 'foo';
$sItem = '$aData[\'test\']';
print eval("return $sItem;"); # foo
But it's not recommended to use eval normally. You can go into hell's kitchen with it because eval is evil.
Instead just parse the string and return the value:
$aData['test'] = 'foo';
$sItem = '$aData[\'test\']';
$r = sscanf($sItem, '$%[a-zA-Z][\'%[a-zA-Z]\']', $vName, $vKey);
if ($r === 2)
{
$result = ${$vName}[$vKey];
}
else
{
$result = NULL;
}
print $result; # foo
This can be done with some other form of regular expression as well.
As your syntax is very close to PHP an actually a subset of it, there is some alternative you can do if you want to validate the input before using eval. The method is to check against PHP tokens and only allow a subset. This does not validate the string (e.g. syntax and if a variable is actually set) but makes it more strict:
function validate_tokens($str, array $valid)
{
$vchk = array_flip($valid);
$tokens = token_get_all(sprintf('<?php %s', $str));
array_shift($tokens);
foreach($tokens as $token)
if (!isset($vchk[$token])) return false;
return true;
}
You just give an array of valid tokens to that function. Those are the PHP tokens, in your case those are:
T_LNUMBER (305) (probably)
T_VARIABLE (309)
T_CONSTANT_ENCAPSED_STRING (315)
You then just can use it and it works with more complicated keys as well:
$aData['test'] = 'foo';
$aData['te\\\'[]st']['more'] = 'bar';
$sItem = '$aData[\'test\']';
$vValue = NULL;
if (validate_tokens($sItem, array(309, 315, '[', ']')))
{
$vValue = eval("return $sItem;");
}
I used this in another answer of the question reliably convert string containing PHP array info to array.
No eval necessary if you have (or can get) the array name and key into separate variables:
$aData = array(
'test' => 123
);
$arrayname = 'aData';
$keyname = 'test';
print ${$arrayname}[$keyname]; // 123
You can just use it like an ordinary array:
$key = "test";
print $aData[$key];
Likewise $aData could itself be an entry in a larger array store.
As alternative, extracting the potential array keys using a regex and traversing an anonymous array (should have mentioned that in your question, if) with references would be possible. See Set multi-dimensional array by key path from array values? and similar topics.
Personally I'm using a construct like this to utilize dynamic variable paths like varname[keyname] instead (similar to how PHP interprets GET parameters). It's just an eval in sheeps clothing (do not agree with the eval scaremongering though):
$val = preg_replace("/^(\w)+(\[(\w+)])$/e", '$\1["\3"]', "aData[test]");
The only solution in your case is to use Eval().
But please be very very very careful when doing this! Eval will evaluate (and execute) any argument you pass to it as PHP. So if you will feed it something that comes from users, then anyone could execute any PHP code on your server, which goes without saying is a security hole the size of Grand canyon!.
edit: you will have to put a "print" or "echo" inside your $sItem variable somehow. It will either have to be in $sItem ($sItem = 'echo $aData[\'test\']';) or you will have to write your Eval() like this: Eval ( 'echo ' . $sData ).
$sItem = '$aData[\'test\']';
eval('$someVar = '.$sItem.';');
echo $someVar;
Use eval() with high caution as others aldready explained.
You could use this method
function getRecursive($path, array $data) {
// transform "foo['bar']" and 'foo["bar"]' to "foo[bar]"
$path = preg_replace('#\[(?:"|\')(.+)(?:"|\')\]#Uis', '[\1]', $path);
// get root
$i = strpos($path, '[');
$rootKey = substr($path, 0, $i);
if (!isset($data[$rootKey])) {
return null;
}
$value = $data[$rootKey];
$length = strlen($path);
$currentKey = null;
for (; $i < $length; ++$i) {
$char = $path[$i];
switch ($char) {
case '[':
if ($currentKey !== null) {
throw new InvalidArgumentException(sprintf('Malformed path, unexpected "[" at position %u', $i));
}
$currentKey = '';
break;
case ']':
if ($currentKey === null) {
throw new InvalidArgumentException(sprintf('Malformed path, unexpected "]" at position %u', $i));
}
if (!isset($value[$currentKey])) {
return null;
}
$value = $value[$currentKey];
if (!is_array($value)) {
return $value;
}
$currentKey = null;
break;
default:
if ($currentKey === null) {
throw new InvalidArgumentException(sprintf('Malformed path, unexpected "%s" at position %u', $char, $i));
}
$currentKey .= $char;
break;
}
}
if ($currentKey !== null) {
throw new InvalidArgumentException('Malformed path, must be and with "]"');
}
return $value;
}

Using a keyword as variable name in an INI file

I have the following in an INI file:
[country]
SE = Sweden
NO = Norway
FI = Finland
However, when var_dump()ing PHP's parse_ini_file() function, I get the following output:
PHP Warning: syntax error, unexpected BOOL_FALSE in test.ini on line 2
in /Users/andrew/sandbox/test.php on line 1
bool(false)
It appears that "NO" is reserved. Is there any other way I can set a variable named "NO"?
Another hack would be to reverse your ini keys with their values and use array_flip:
<?php
$ini =
"
[country]
Sweden = 'SE'
Norway = 'NO'
Finland = 'FI'
";
$countries = parse_ini_string($ini, true);
$countries = array_flip($countries["country"]);
echo $countries["NO"];
Still you will need to use quotes around NO (at least), if you do
Norway = NO
you don't get an error but value for $countries["NO"] will be an empty string.
This propably comes a little late but the way PHPs parse_ini_file works bothered me so much that I wrote my own little parser.
Feel free to use it, but use with care it has only been shallowly tested!
// the exception used by the parser
class IniParserException extends \Exception {
public function __construct($message, $code = 0, \Exception $previous = null) {
parent::__construct($message, $code, $previous);
}
public function __toString() {
return __CLASS__ . ": [{$this->code}]: {$this->message}\n";
}
}
// the parser
function my_parse_ini_file($filename, $processSections = false) {
$initext = file_get_contents($filename);
$ret = [];
$section = null;
$lineNum = 0;
$lines = explode("\n", str_replace("\r\n", "\n", $initext));
foreach($lines as $line) {
++$lineNum;
$line = trim(preg_replace('/[;#].*/', '', $line));
if(strlen($line) === 0) {
continue;
}
if($processSections && $line{0} === '[' && $line{strlen($line)-1} === ']') {
// section header
$section = trim(substr($line, 1, -1));
} else {
$eqIndex = strpos($line, '=');
if($eqIndex !== false) {
$key = trim(substr($line, 0, $eqIndex));
$matches = [];
preg_match('/(?<name>\w+)(?<index>\[\w*\])?/', $key, $matches);
if(!array_key_exists('name', $matches)) {
throw new IniParserException("Variable name must not be empty! In file \"$filename\" in line $lineNum.");
}
$keyName = $matches['name'];
if(array_key_exists('index', $matches)) {
$isArray = true;
$arrayIndex = trim($matches['index']);
if(strlen($arrayIndex) == 0) {
$arrayIndex = null;
}
} else {
$isArray = false;
$arrayIndex = null;
}
$value = trim(substr($line, $eqIndex+1));
if($value{0} === '"' && $value{strlen($value)-1} === '"') {
// too lazy to check for multiple closing " let's assume it's fine
$value = str_replace('\\"', '"', substr($value, 1, -1));
} else {
// special value
switch(strtolower($value)) {
case 'yes':
case 'true':
case 'on':
$value = true;
break;
case 'no':
case 'false':
case 'off':
$value = false;
break;
case 'null':
case 'none':
$value = null;
break;
default:
if(is_numeric($value)) {
$value = $value + 0; // make it an int/float
} else {
throw new IniParserException("\"$value\" is not a valid value! In file \"$filename\" in line $lineNum.");
}
}
}
if($section !== null) {
if($isArray) {
if(!array_key_exists($keyName, $ret[$section])) {
$ret[$section][$keyName] = [];
}
if($arrayIndex === null) {
$ret[$section][$keyName][] = $value;
} else {
$ret[$section][$keyName][$arrayIndex] = $value;
}
} else {
$ret[$section][$keyName] = $value;
}
} else {
if($isArray) {
if(!array_key_exists($keyName, $ret)) {
$ret[$keyName] = [];
}
if($arrayIndex === null) {
$ret[$keyName][] = $value;
} else {
$ret[$keyName][$arrayIndex] = $value;
}
} else {
$ret[$keyName] = $value;
}
}
}
}
}
return $ret;
}
What does it differently? Variable names may only consist of alphanumerical characters but other than that no restrictions to them. Strings must be encapsulated with " everything else has to be a special value like no, yes, true, false, on, off, null or none. For mapping see code.
Kind of a hack but you can add backticks around the key names:
[country]
`SE` = Sweden
`NO` = Norway
`FI` = Finland
Then access them like so:
$result = parse_ini_file('test.ini');
echo "{$result['`NO`']}\n";
Output:
$ php test.php
Norway
I was getting this error when there were single quote combinations in the string such as 't or 's. To get rid of the problem, I wrapped the string in double quotes:
Before:
You have selected 'Yes' but you haven't entered the date's flexibility
After:
"You have selected 'Yes' but you haven't entered the date's flexibility"
I ran into the same problem and tried to escape the name in every possible way.
Then I remembered that because of the INI syntax both names and values will be trimmed, so the following workaround MAYBE should do the trick:
NL = Netherlands
; A whitespace before the name
NO = Norway
PL = Poland
And it works ;) As long as your co-workers read the comments (which is not always the case) and don't delete it accidentally. So, yes, the array flipping solution is a safe bet.
From the manual page for parse_ini_file:
There are reserved words which must not be used as keys for ini files. These include: null, yes, no, true, false, on, off, none.
So no, you can't set a variable NO.

Backticking MySQL Entities

I've the following method which allows me to protect MySQL entities:
public function Tick($string)
{
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
This works fairly well for the use that I make of it.
It protects database, table, field names and even the * operator, however now I also want it to protect function calls, ie:
AVG(database.employees.salary)
Should become:
AVG(`database`.`employees`.`salary`) and not `AVG(database`.`employees`.`salary)`
How should I go about this? Should I use regular expressions?
Also, how can I support more advanced stuff, from:
MAX(AVG(database.table.field1), MAX(database.table.field2))
To:
MAX(AVG(`database`.`table`.`field1`), MAX(`database`.`table`.`field2`))
Please keep in mind that I want to keep this method as simple/fast as possible, since it pretty much iterates over all the entity names in my database.
If this is quoting parts of an SQL statement, and they have only complexity that you descibe, a RegEx is a great approach. On the other hand, if you need to do this to full SQL statements, or simply more complicated components of statements (such as "MAX(AVG(val),MAX(val2))"), you will need to tokenize or parse the string and have a more sophisticated understanding of it to do this quoting accurately.
Given the regular expression approach, you may find it easier to break the function name out as one step, and then use your current code to quote the database/table/column names. This can be done in one RE, but it will be tricker to get right.
Either way, I'd highly recommend writing a few unit test cases. In fact, this is an ideal situation for this approach: it's easy to write the tests, you have some existing cases that work (which you don't want to break), and you have just one more case to add.
Your test can start as simply as:
assert '`ticked`' == Tick('ticked');
assert '`table`.`ticked`' == Tick('table.ticked');
assert 'db`.`table`.`ticked`' == Tick('db.table.ticked');
And then add:
assert 'FN(`ticked`)' == Tick('FN(ticked)');
etc.
Using the test case ndp gave I created a regex to do the hard work for you. The following regex will replace all word boundaries around words that are not followed by an opening parenthesis.
\b(\w+)\b(?!\()
The Tick() functionality would then be implemented in PHP as follows:
function Tick($string)
{
return preg_replace( '/\b(\w+)\b(?!\()/', '`\1`', $string );
}
It's generally a bad idea to pass the whole SQL to the function. That way, you'll always find a case when it doesn't work, unless you fully parse the SQL syntax.
Put the ticks to the names on some previous abstraction level, which makes up the SQL.
Before you explode your string on periods, check if the last character is a parenthesis. If so, this call is a function.
<?php
$string = str_replace('`', '', $string)
$function = "";
if (substr($string,-1) == ")") {
// Strip off function call first
$opening = strpos($string, "(");
$function = substr($string, 0, $opening+1);
$string = substr($string, $opening+1, -1);
}
// Do your existing parsing to $string
if ($function == "") {
// Put function back on string
$string = $function . $string . ")";
}
?>
If you need to cover more advanced situations, like using nested functions, or multiple functions in sequence in one "$string" variable, this would become a much more advanced function, and you'd best ask yourself why these elements aren't being properly ticked in the first place, and not need any further parsing.
EDIT: Updating for nested functions, as per original post edit
To have the above function deal with multiple nested functions, you likely need something that will 'unwrap' your nested functions. I haven't tested this, but the following function might get you on the right track.
<?php
function unwrap($str) {
$pos = strpos($str, "(");
if ($pos === false) return $str; // There's no function call here
$last_close = 0;
$cur_offset = 0; // Start at the beginning
while ($cur_offset <= strlen($str)) {
$first_close = strpos($str, ")", $offset); // Find first deep function
$pos = strrpos($str, "(", $first_close-1); // Find associated opening
if ($pos > $last_close) {
// This function is entirely after the previous function
$ticked = Tick(substr($str, $pos+1, $first_close-$pos)); // Tick the string inside
$str = substr($str, 0, $pos)."{".$ticked."}".substr($str,$first_close); // Replace parenthesis by curly braces temporarily
$first_close += strlen($ticked)-($first_close-$pos); // Shift parenthesis location due to new ticks being added
} else {
// This function wraps other functions; don't tick it
$str = substr($str, 0, $pos)."{".substr($str,$pos+1, $first_close-$pos)."}".substr($str,$first_close);
}
$last_close = $first_close;
$offset = $first_close+1;
}
// Replace the curly braces with parenthesis again
$str = str_replace(array("{","}"), array("(",")"), $str);
}
If you are adding the function calls in your code, as opposed to passing them in through a string-only interface, you can replace the string parsing with type checking:
function Tick($value) {
if (is_object($value)) {
$result = $value->value;
} else {
$result = '`'.str_replace(array('`', '.'), array('', '`.`'), $value).'`';
}
return $result;
}
class SqlFunction {
var $value;
function SqlFunction($function, $params) {
$sane = implode(', ', array_map('Tick', $params));
$this->value = "$function($sane)";
}
}
function Maximum($column) {
return new SqlFunction('MAX', array($column));
}
function Avg($column) {
return new SqlFunction('AVG', array($column));
}
function Greatest() {
$params = func_get_args();
return new SqlFunction('GREATEST', $params);
}
$cases = array(
"'simple'" => Tick('simple'),
"'table.field'" => Tick('table.field'),
"'table.*'" => Tick('table.*'),
"'evil`hack'" => Tick('evil`hack'),
"Avg('database.table.field')" => Tick(Avg('database.table.field')),
"Greatest(Avg('table.field1'), Maximum('table.field2'))" => Tick(Greatest(Avg('table.field1'), Maximum('table.field2'))),
);
echo "<table>";
foreach ($cases as $case => $result) {
echo "<tr><td>$case</td><td>$result</td></tr>";
}
echo "</table>";
This avoids any possible SQL injection while remaining legible to future readers of your code.
You could use preg_replace_callback() in conjunction with your Tick() method to skip at least one level of parens:
public function tick($str)
{
return preg_replace_callback('/[^()]*/', array($this, '_tick_replace_callback'), $str);
}
protected function _tick_replace_callback($str) {
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
Are you generating the SQL Query or is it being passed to you? If you generating the query I wouldn't pass the whole query string just the parms/values you want to wrap in the backticks or what ever else you need.
EXAMPLE:
function addTick($var) {
return '`' . $var . '`';
}
$condition = addTick($condition);
$SQL = 'SELECT' . $what . '
FROM ' . $table . '
WHERE ' . $condition . ' = ' . $constraint;
This is just a mock but you get the idea that you can pass or loop through your code and build the query string rather than parsing the query string and adding your backticks.

Categories