How to mathematically evaluate a string like "2-1" to produce "1"? - php

I was just wondering if PHP has a function that can take a string like 2-1 and produce the arithmetic result of it?
Or will I have to do this manually with explode() to get the values left and right of the arithmetic operator?

I know this question is old, but I came across it last night while searching for something that wasn't quite related, and every single answer here is bad. Not just bad, very bad. The examples I give here will be from a class that I created back in 2005 and spent the past few hours updating for PHP5 because of this question. Other systems do exist, and were around before this question was posted, so it baffles me why every answer here tells you to use eval, when the caution from PHP is:
The eval() language construct is very dangerous because it allows execution of arbitrary PHP code. Its use thus is discouraged. If you have carefully verified that there is no other option than to use this construct, pay special attention not to pass any user provided data into it without properly validating it beforehand.
Before I jump in to the example, the places to get the class I will be using is on either PHPClasses or GitHub. Both the eos.class.php and stack.class.php are required, but can be combined in to the same file.
The reason for using a class like this is that it includes and infix to postfix(RPN) parser, and then an RPN Solver. With these, you never have to use the eval function and open your system up to vulnerabilities. Once you have the classes, the following code is all that is needed to solve a simple (to more complex) equation such as your 2-1 example.
require_once "eos.class.php";
$equation = "2-1";
$eq = new eqEOS();
$result = $eq->solveIF($equation);
That's it! You can use a parser like this for most equations, however complicated and nested without ever having to resort to the 'evil eval'.
Because I really don't want this only only to have my class in it, here are some other options. I am just familiar with my own since I've been using it for 8 years. ^^
Wolfram|Alpha API
Sage
A fairly bad parser
phpdicecalc
Not quite sure what happened to others that I had found previously - came across another one on GitHub before as well, unfortunately I didn't bookmark it, but it was related to large float operations that included a parser as well.
Anyways, I wanted to make sure an answer to solving equations in PHP on here wasn't pointing all future searchers to eval as this was at the top of a google search. ^^

$operation='2-1';
eval("\$value = \"$operation\";");
or
$value=eval("return ($operation);");

This is one of the cases where eval comes in handy:
$expression = '2 - 1';
eval( '$result = (' . $expression . ');' );
echo $result;

You can use BC Math arbitrary precision
echo bcsub(5, 4); // 1
echo bcsub(1.234, 5); // 3
echo bcsub(1.234, 5, 4); // -3.7660
http://www.php.net/manual/en/function.bcsub.php

In this forum someone made it without eval. Maybe you can try it? Credits to them, I just found it.
function calculate_string( $mathString ) {
$mathString = trim($mathString); // trim white spaces
$mathString = ereg_replace ('[^0-9\+-\*\/\(\) ]', '', $mathString); // remove any non-numbers chars; exception for math operators
$compute = create_function("", "return (" . $mathString . ");" );
return 0 + $compute();
}
$string = " (1 + 1) * (2 + 2)";
echo calculate_string($string); // outputs 8

Also see this answer here: Evaluating a string of simple mathematical expressions
Please note this solution does NOT conform to BODMAS, but you can use brackets in your evaluation string to overcome this.
function callback1($m) {
return string_to_math($m[1]);
}
function callback2($n,$m) {
$o=$m[0];
$m[0]=' ';
return $o=='+' ? $n+$m : ($o=='-' ? $n-$m : ($o=='*' ? $n*$m : $n/$m));
}
function string_to_math($s){
while ($s != ($t = preg_replace_callback('/\(([^()]*)\)/','callback1',$s))) $s=$t;
preg_match_all('![-+/*].*?[\d.]+!', "+$s", $m);
return array_reduce($m[0], 'callback2');
}
echo string_to_match('2-1'); //returns 1

As create_function got deprecated and I was utterly needed an alternative lightweight solution of evaluating string as math. After a couple of hours spending, I came up with following. By the way, I did not care about parentheses as I don't need in my case. I just needed something that conform operator precedence correctly.
Update: I have added parentheses support as well. Please check this project Evaluate Math String
function evalAsMath($str) {
$error = false;
$div_mul = false;
$add_sub = false;
$result = 0;
$str = preg_replace('/[^\d\.\+\-\*\/]/i','',$str);
$str = rtrim(trim($str, '/*+'),'-');
if ((strpos($str, '/') !== false || strpos($str, '*') !== false)) {
$div_mul = true;
$operators = array('*','/');
while(!$error && $operators) {
$operator = array_pop($operators);
while($operator && strpos($str, $operator) !== false) {
if ($error) {
break;
}
$regex = '/([\d\.]+)\\'.$operator.'(\-?[\d\.]+)/';
preg_match($regex, $str, $matches);
if (isset($matches[1]) && isset($matches[2])) {
if ($operator=='+') $result = (float)$matches[1] + (float)$matches[2];
if ($operator=='-') $result = (float)$matches[1] - (float)$matches[2];
if ($operator=='*') $result = (float)$matches[1] * (float)$matches[2];
if ($operator=='/') {
if ((float)$matches[2]) {
$result = (float)$matches[1] / (float)$matches[2];
} else {
$error = true;
}
}
$str = preg_replace($regex, $result, $str, 1);
$str = str_replace(array('++','--','-+','+-'), array('+','+','-','-'), $str);
} else {
$error = true;
}
}
}
}
if (!$error && (strpos($str, '+') !== false || strpos($str, '-') !== false)) {
$add_sub = true;
preg_match_all('/([\d\.]+|[\+\-])/', $str, $matches);
if (isset($matches[0])) {
$result = 0;
$operator = '+';
$tokens = $matches[0];
$count = count($tokens);
for ($i=0; $i < $count; $i++) {
if ($tokens[$i] == '+' || $tokens[$i] == '-') {
$operator = $tokens[$i];
} else {
$result = ($operator == '+') ? ($result + (float)$tokens[$i]) : ($result - (float)$tokens[$i]);
}
}
}
}
if (!$error && !$div_mul && !$add_sub) {
$result = (float)$str;
}
return $error ? 0 : $result;
}
Demo: http://sandbox.onlinephpfunctions.com/code/fdffa9652b748ac8c6887d91f9b10fe62366c650

Here is a somewhat verbose bit of code I rolled for another SO question. It does conform to BOMDAS without eval(), but is not equipped to do complex/higher-order/parenthetical expressions. This library-free approach pulls the expression apart and systematically reduces the array of components until all of the operators are removed. It certainly works for your sample expression: 2-1 ;)
preg_match() checks that each operator has a numeric substring on each side.
preg_split() divides the string into an array of alternating numbers and operators.
array_search() finds the index of the targeted operator, while it exists in the array.
array_splice() replaces the operator element and the elements on either side of it with a new element that contains the mathematical result of the three elements removed.
** updated to allow negative numbers **
Code: (Demo)
$expression = "-11+3*1*4/-6-12";
if (!preg_match('~^-?\d*\.?\d+([*/+-]-?\d*\.?\d+)*$~', $expression)) {
echo "invalid expression";
} else {
$components = preg_split('~(?<=\d)([*/+-])~', $expression, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
var_export($components); // ['-11','+','3','*','1','*','4','/','-6','-','12']
while (($index = array_search('*',$components)) !== false) {
array_splice($components, $index - 1, 3, $components[$index - 1] * $components[$index + 1]);
var_export($components);
// ['-11','+','3','*','4','/','-6','-','12']
// ['-11','+','12','/','-6','-','12']
}
while (($index = array_search('/', $components)) !== false) {
array_splice($components, $index - 1, 3, $components[$index - 1] / $components[$index + 1]);
var_export($components); // [-'11','+','-2','-','12']
}
while (($index = array_search('+', $components)) !== false) {
array_splice($components, $index - 1, 3, $components[$index - 1] + $components[$index + 1]);
var_export($components); // ['-13','-','12']
}
while (($index = array_search('-', $components)) !== false) {
array_splice($components, $index - 1, 3, $components[$index - 1] - $components[$index + 1]);
var_export($components); // [-25]
}
echo current($components); // -25
}
Here is a demo of the BOMDAS version that uses php's pow() when ^ is encountered between two numbers (positive or negative).
I don't think I'll ever bother writing a version that handles parenthetical expressions ... but we'll see how bored I get.

You can do it by eval function.
Here is how you can do this.
<?php
$exp = "2-1;";
$res = eval("return $exp");
echo $res; // it will return 1
?>
You can use $res anywhere in the code to get the result.
You can use it in form by changing $exp value.
Here is an example of creating a web calculator.
<?php
if (isset($_POST['submit'])) {
$exp = $_POST['calc'];
$res = eval("return $exp;"); //Here we have to add ; after $exp to make a complete code.
echo $res;
}
?>
// html code
<form method="post">
<input type="text" name="calc">
<input type="submit" name="submit">
</form>

Related

String between string with array in PHP, array order ISSUE

I'm facing an issue with a function that gets a string between two other strings.
function string_between($str, $starting_word, $ending_word) {
$subtring_start = strpos($str, $starting_word);
$subtring_start += strlen($starting_word);
foreach ($ending_word as $a){
$size = strpos($str, $a, $subtring_start) - $subtring_start;
}
return substr($str, $subtring_start, $size);
}
The issue is that the function searches for the first ending_word in the array.
An example will be easier to understand:
$array_a = ['the', 'amen']; // Starting strings
$array_b = [',', '.']; // Ending strings
$str = "Hello, the world. Then, it is over.";
Expected result:
"the world."
Current result:
"the world. Then,"
The function will think that the ending_word is "," because it is the first element met in the array_b. However, the text encounters first the '.' after the "the" starting word.
How can I make sure the function goes through the text and stops at the first element in the $str present in the array_b, whatever the position in the array?
Any idea?
Basically, you need to break outside of your foreach loop when $size > 0
That way it stops looping through your array when it finds the 1st occurrence. Here is the more complete code with other fixes:
function stringBetween($string, $startingWords, $endingWords) {
foreach ($startingWords as $startingWord) {
$subtringStart = strpos($string, $startingWord);
if ($subtringStart > 0) {
foreach ($endingWords as $endingWord){
$size = strpos($string, $endingWord, $subtringStart) - $subtringStart + strlen($endingWord);
if ($size > 0) {
break;
}
}
if ($size > 0) {
return substr($string, $subtringStart, $size);
}
}
}
return null;
}
$startArr = array('the', 'amen'); // Starting strings
$endArr = array('.', ','); // Ending strings
$str = "Hello, the world. Then, it is over.";
echo stringBetween($str, $startArr, $endArr); // the world.
This type of problems are best solved by PCRE regexes, only couple of lines needed in function :
function string_between($str, $starts, $ends) {
preg_match("/(?:{$starts}).*?(?:{$ends})/mi", $str, $m);
return $m[0];
}
Then calling like this :
echo string_between("Hello, the world. Then, it is over.", 'the|amen', ',|\.');
Produces : the world.
The trick,- search to the nearest matching ending symbol is done with regex non-greedy seach, indicated by question symbol in pattern .*?. You can even extend this function to accept arrays as starting/ending symbols, just that case modify function (possibly with implode('|',$arr)) for concatenating symbols into regex grouping formula.
Edited version
This works now. Iterate over your teststrings from first array looking for position of occurance from teststring. If found one then search for the second teststring at startposition from end of first string.
To get the shortest hit I store the position from the second and take the minimum.
You can try it at http://sandbox.onlinephpfunctions.com/code/0f1e5c97da62b4daaf0e49f52271fe288d1cacbb
$array_a =array('the','amen');
$array_b =array(',','.', '#');
$str = "Hello, the world. Then, it is over.";
function earchString($str, $array_a, $array_b) {
forEach($array_a as $test) {
$pos = strpos($str, $test);
if ($pos===false) continue;
$found = [];
forEach($array_b as $test2) {
$posStart = $pos+strlen($test);
$pos2 = strpos($str, $test2, $posStart);
$found[] = ($pos2!==false) ? $pos2 : INF;
}
$min = min($found);
if ($min !== INF)
return substr($str,$pos,$min-$pos) .$str[$min];
}
return '';
}
echo earchString($str, $array_a, $array_b);

preg_match() match to cases with one pattern?

How would I go about ordering 1 array into 2 arrays depending on what each part of the array starts with using preg_match() ?
I know how to do this using 1 expression but I don't know how to use 2 expressions.
So far I can have done (don't ask why I'm not using strpos() - I need to use regex):
$gen = array(
'F1',
'BBC450',
'BBC566',
'F2',
'F31',
'SOMETHING123',
'SOMETHING456'
);
$f = array();
$bbc = array();
foreach($gen as $part) {
if(preg_match('/^F/', $part)) {
// Add to F array
array_push($f, $part);
} else if(preg_match('/^BBC/', $part)) {
// Add to BBC array
array_push($bbc, $part);
} else {
// Not F or BBC
}
}
So my question is: is it possible to do this using 1 preg_match() function?
Please ignore the SOMETHING part in the array, it's to show that using just one if else statement wouldn't solve this.
Thanks.
You can use an alternation along with the third argument to preg_match, which contains the part of the regexp that matched.
preg_match('/^(?:F|BBC)/', $part, $match);
switch ($match) {
case 'F':
$f[] = $part;
break;
case 'BBC':
$bbc[] = $part;
break;
default:
// Not F or BBC
}
It is even possible without any loop, switch, or anything else (which is faster and more efficient then the accepted answer's solution).
<?php
preg_match_all("/(?:(^F.*$)|(^BBC.*$))/m", implode(PHP_EOL, $gen), $matches);
$f = isset($matches[1]) ? $matches[1] : array();
$bbc = isset($matches[2]) ? $matches[2] : array();
You can find an interactive explanation of the regular expression at regex101.com which I created for you.
The (not desired) strpos approach is nearly five times faster.
<?php
$c = count($gen);
$f = $bbc = array();
for ($i = 0; $i < $c; ++$i) {
if (strpos($gen[$i], "F") === 0) {
$f[] = $gen[$i];
}
elseif (strpos($gen[$i], "BBC") === 0) {
$bbc[] = $gen[$i];
}
}
Regular expressions are nice, but the are no silver bullet for everything.

Backticking MySQL Entities

I've the following method which allows me to protect MySQL entities:
public function Tick($string)
{
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
This works fairly well for the use that I make of it.
It protects database, table, field names and even the * operator, however now I also want it to protect function calls, ie:
AVG(database.employees.salary)
Should become:
AVG(`database`.`employees`.`salary`) and not `AVG(database`.`employees`.`salary)`
How should I go about this? Should I use regular expressions?
Also, how can I support more advanced stuff, from:
MAX(AVG(database.table.field1), MAX(database.table.field2))
To:
MAX(AVG(`database`.`table`.`field1`), MAX(`database`.`table`.`field2`))
Please keep in mind that I want to keep this method as simple/fast as possible, since it pretty much iterates over all the entity names in my database.
If this is quoting parts of an SQL statement, and they have only complexity that you descibe, a RegEx is a great approach. On the other hand, if you need to do this to full SQL statements, or simply more complicated components of statements (such as "MAX(AVG(val),MAX(val2))"), you will need to tokenize or parse the string and have a more sophisticated understanding of it to do this quoting accurately.
Given the regular expression approach, you may find it easier to break the function name out as one step, and then use your current code to quote the database/table/column names. This can be done in one RE, but it will be tricker to get right.
Either way, I'd highly recommend writing a few unit test cases. In fact, this is an ideal situation for this approach: it's easy to write the tests, you have some existing cases that work (which you don't want to break), and you have just one more case to add.
Your test can start as simply as:
assert '`ticked`' == Tick('ticked');
assert '`table`.`ticked`' == Tick('table.ticked');
assert 'db`.`table`.`ticked`' == Tick('db.table.ticked');
And then add:
assert 'FN(`ticked`)' == Tick('FN(ticked)');
etc.
Using the test case ndp gave I created a regex to do the hard work for you. The following regex will replace all word boundaries around words that are not followed by an opening parenthesis.
\b(\w+)\b(?!\()
The Tick() functionality would then be implemented in PHP as follows:
function Tick($string)
{
return preg_replace( '/\b(\w+)\b(?!\()/', '`\1`', $string );
}
It's generally a bad idea to pass the whole SQL to the function. That way, you'll always find a case when it doesn't work, unless you fully parse the SQL syntax.
Put the ticks to the names on some previous abstraction level, which makes up the SQL.
Before you explode your string on periods, check if the last character is a parenthesis. If so, this call is a function.
<?php
$string = str_replace('`', '', $string)
$function = "";
if (substr($string,-1) == ")") {
// Strip off function call first
$opening = strpos($string, "(");
$function = substr($string, 0, $opening+1);
$string = substr($string, $opening+1, -1);
}
// Do your existing parsing to $string
if ($function == "") {
// Put function back on string
$string = $function . $string . ")";
}
?>
If you need to cover more advanced situations, like using nested functions, or multiple functions in sequence in one "$string" variable, this would become a much more advanced function, and you'd best ask yourself why these elements aren't being properly ticked in the first place, and not need any further parsing.
EDIT: Updating for nested functions, as per original post edit
To have the above function deal with multiple nested functions, you likely need something that will 'unwrap' your nested functions. I haven't tested this, but the following function might get you on the right track.
<?php
function unwrap($str) {
$pos = strpos($str, "(");
if ($pos === false) return $str; // There's no function call here
$last_close = 0;
$cur_offset = 0; // Start at the beginning
while ($cur_offset <= strlen($str)) {
$first_close = strpos($str, ")", $offset); // Find first deep function
$pos = strrpos($str, "(", $first_close-1); // Find associated opening
if ($pos > $last_close) {
// This function is entirely after the previous function
$ticked = Tick(substr($str, $pos+1, $first_close-$pos)); // Tick the string inside
$str = substr($str, 0, $pos)."{".$ticked."}".substr($str,$first_close); // Replace parenthesis by curly braces temporarily
$first_close += strlen($ticked)-($first_close-$pos); // Shift parenthesis location due to new ticks being added
} else {
// This function wraps other functions; don't tick it
$str = substr($str, 0, $pos)."{".substr($str,$pos+1, $first_close-$pos)."}".substr($str,$first_close);
}
$last_close = $first_close;
$offset = $first_close+1;
}
// Replace the curly braces with parenthesis again
$str = str_replace(array("{","}"), array("(",")"), $str);
}
If you are adding the function calls in your code, as opposed to passing them in through a string-only interface, you can replace the string parsing with type checking:
function Tick($value) {
if (is_object($value)) {
$result = $value->value;
} else {
$result = '`'.str_replace(array('`', '.'), array('', '`.`'), $value).'`';
}
return $result;
}
class SqlFunction {
var $value;
function SqlFunction($function, $params) {
$sane = implode(', ', array_map('Tick', $params));
$this->value = "$function($sane)";
}
}
function Maximum($column) {
return new SqlFunction('MAX', array($column));
}
function Avg($column) {
return new SqlFunction('AVG', array($column));
}
function Greatest() {
$params = func_get_args();
return new SqlFunction('GREATEST', $params);
}
$cases = array(
"'simple'" => Tick('simple'),
"'table.field'" => Tick('table.field'),
"'table.*'" => Tick('table.*'),
"'evil`hack'" => Tick('evil`hack'),
"Avg('database.table.field')" => Tick(Avg('database.table.field')),
"Greatest(Avg('table.field1'), Maximum('table.field2'))" => Tick(Greatest(Avg('table.field1'), Maximum('table.field2'))),
);
echo "<table>";
foreach ($cases as $case => $result) {
echo "<tr><td>$case</td><td>$result</td></tr>";
}
echo "</table>";
This avoids any possible SQL injection while remaining legible to future readers of your code.
You could use preg_replace_callback() in conjunction with your Tick() method to skip at least one level of parens:
public function tick($str)
{
return preg_replace_callback('/[^()]*/', array($this, '_tick_replace_callback'), $str);
}
protected function _tick_replace_callback($str) {
$string = explode('.', str_replace('`', '', $string));
foreach ($string as $key => $value)
{
if ($value != '*')
{
$string[$key] = '`' . trim($value) . '`';
}
}
return implode('.', $string);
}
Are you generating the SQL Query or is it being passed to you? If you generating the query I wouldn't pass the whole query string just the parms/values you want to wrap in the backticks or what ever else you need.
EXAMPLE:
function addTick($var) {
return '`' . $var . '`';
}
$condition = addTick($condition);
$SQL = 'SELECT' . $what . '
FROM ' . $table . '
WHERE ' . $condition . ' = ' . $constraint;
This is just a mock but you get the idea that you can pass or loop through your code and build the query string rather than parsing the query string and adding your backticks.

What is the algorithm for parsing expressions in infix notation?

I would like to parse boolean expressions in PHP. As in:
A and B or C and (D or F or not G)
The terms can be considered simple identifiers. They will have a little structure, but the parser doesn't need to worry about that. It should just recognize the keywords and or not ( ). Everything else is a term.
I remember we wrote simple arithmetic expression evaluators at school, but I don't remember how it was done anymore. Nor do I know what keywords to look for in Google/SO.
A ready made library would be nice, but as I remember the algorithm was pretty simple so it might be fun and educational to re-implement it myself.
Recursive descent parsers are fun to write and easy to read. The first step is to write your grammar out.
Maybe this is the grammar you want.
expr = and_expr ('or' and_expr)*
and_expr = not_expr ('and' not_expr)*
not_expr = simple_expr | 'not' not_expr
simple_expr = term | '(' expr ')'
Turning this into a recursive descent parser is super easy. Just write one function per nonterminal.
def expr():
x = and_expr()
while peek() == 'or':
consume('or')
y = and_expr()
x = OR(x, y)
return x
def and_expr():
x = not_expr()
while peek() == 'and':
consume('and')
y = not_expr()
x = AND(x, y)
return x
def not_expr():
if peek() == 'not':
consume('not')
x = not_expr()
return NOT(x)
else:
return simple_expr()
def simple_expr():
t = peek()
if t == '(':
consume('(')
result = expr()
consume(')')
return result
elif is_term(t):
consume(t)
return TERM(t)
else:
raise SyntaxError("expected term or (")
This isn't complete. You have to provide a little more code:
Input functions. consume, peek, and is_term are functions you provide. They'll be easy to implement using regular expressions. consume(s) reads the next token of input and throws an error if it doesn't match s. peek() simply returns a peek at the next token without consuming it. is_term(s) returns true if s is a term.
Output functions. OR, AND, NOT, and TERM are called each time a piece of the expression is successfully parsed. They can do whatever you want.
Wrapper function. Instead of just calling expr directly, you'll want to write a little wrapper function that initializes the variables used by consume and peek, then calls expr, and finally checks to make sure there's no leftover input that didn't get consumed.
Even with all this, it's still a tiny amount of code. In Python, the complete program is 84 lines, and that includes a few tests.
Why not jsut use the PHP parser?
$terms=array('and','or','not','A','B','C','D'...);
$values=array('*','+','!',1,1,0,0,1....);
$expression="A and B or C and (D or F or not G)";
$expression=preg_replace($terms, $values,$expression);
$expression=preg_replace('^(+|-|!|1|0)','',$expression);
$result=eval($expression);
Actually, that 2nd regex is wrong (and only required if you need to prevent any code injection) - but you get the idea.
C.
I'd go with Pratt parser. It's almost like recursive descent but smarter :) A decent explanation by Douglas Crockford (of JSLint fame) here.
Dijkstra's shunting yard algorithm is the traditional one for going from infix to postfix/graph.
I've implemented the shunting yard algorithm as suggested by plinth. However, this algorithm just gives you the postfix notation, aka reverse Polish notation (RNP). You still have to evaluate it, but that's quite easy once you have the expression in RNP (described for instance here).
The code below might not be good PHP style, my PHP knowledge is somewhat limited. It should be enough to get the idea though.
$operators = array("and", "or", "not");
$num_operands = array("and" => 2, "or" => 2, "not" => 1);
$parenthesis = array("(", ")");
function is_operator($token) {
global $operators;
return in_array($token, $operators);
}
function is_right_parenthesis($token) {
global $parenthesis;
return $token == $parenthesis[1];
}
function is_left_parenthesis($token) {
global $parenthesis;
return $token == $parenthesis[0];
}
function is_parenthesis($token) {
return is_right_parenthesis($token) || is_left_parenthesis($token);
}
// check whether the precedence if $a is less than or equal to that of $b
function is_precedence_less_or_equal($a, $b) {
// "not" always comes first
if ($b == "not")
return true;
if ($a == "not")
return false;
if ($a == "or" and $b == "and")
return true;
if ($a == "and" and $b == "or")
return false;
// otherwise they're equal
return true;
}
function shunting_yard($input_tokens) {
$stack = array();
$output_queue = array();
foreach ($input_tokens as $token) {
if (is_operator($token)) {
while (is_operator($stack[count($stack)-1]) && is_precedence_less_or_equal($token, $stack[count($stack)-1])) {
$o2 = array_pop($stack);
array_push($output_queue, $o2);
}
array_push($stack, $token);
} else if (is_parenthesis($token)) {
if (is_left_parenthesis($token)) {
array_push($stack, $token);
} else {
while (!is_left_parenthesis($stack[count($stack)-1]) && count($stack) > 0) {
array_push($output_queue, array_pop($stack));
}
if (count($stack) == 0) {
echo ("parse error");
die();
}
$lp = array_pop($stack);
}
} else {
array_push($output_queue, $token);
}
}
while (count($stack) > 0) {
$op = array_pop($stack);
if (is_parenthesis($op))
die("mismatched parenthesis");
array_push($output_queue, $op);
}
return $output_queue;
}
function str2bool($s) {
if ($s == "true")
return true;
if ($s == "false")
return false;
die('$s doesn\'t contain valid boolean string: '.$s.'\n');
}
function apply_operator($operator, $a, $b) {
if (is_string($a))
$a = str2bool($a);
if (!is_null($b) and is_string($b))
$b = str2bool($b);
if ($operator == "and")
return $a and $b;
else if ($operator == "or")
return $a or $b;
else if ($operator == "not")
return ! $a;
else die("unknown operator `$function'");
}
function get_num_operands($operator) {
global $num_operands;
return $num_operands[$operator];
}
function is_unary($operator) {
return get_num_operands($operator) == 1;
}
function is_binary($operator) {
return get_num_operands($operator) == 2;
}
function eval_rpn($tokens) {
$stack = array();
foreach ($tokens as $t) {
if (is_operator($t)) {
if (is_unary($t)) {
$o1 = array_pop($stack);
$r = apply_operator($t, $o1, null);
array_push($stack, $r);
} else { // binary
$o1 = array_pop($stack);
$o2 = array_pop($stack);
$r = apply_operator($t, $o1, $o2);
array_push($stack, $r);
}
} else { // operand
array_push($stack, $t);
}
}
if (count($stack) != 1)
die("invalid token array");
return $stack[0];
}
// $input = array("A", "and", "B", "or", "C", "and", "(", "D", "or", "F", "or", "not", "G", ")");
$input = array("false", "and", "true", "or", "true", "and", "(", "false", "or", "false", "or", "not", "true", ")");
$tokens = shunting_yard($input);
$result = eval_rpn($tokens);
foreach($input as $t)
echo $t." ";
echo "==> ".($result ? "true" : "false")."\n";
You could use an LR parser to build a parse tree and then evaluate the tree to obtain the result. A detailed description including examples can be found in Wikipedia. If you haven't coded it yourself already I will write a small example tonight.
The simplest way is to use regexes that converts your expression into an expression in php syntax and then use eval, as suggested by symcbean. But I'm not sure if you would want to use it in production code.
The other way is to code your own simple recursive descent parser. It isn't as hard as it might sound. For a simple grammar such yours (boolean expressions), you can easily code one from scratch. You can also use a parser generator similar to ANTLR for php, probably searching for a php parser generator would turn up something.

Regex to parse define() contents, possible?

I am very new to regex, and this is way too advanced for me. So I am asking the experts over here.
Problem
I would like to retrieve the constants / values from a php define()
DEFINE('TEXT', 'VALUE');
Basically I would like a regex to be able to return the name of constant, and the value of constant from the above line. Just TEXT and VALUE . Is this even possible?
Why I need it? I am dealing with language file and I want to get all couples (name, value) and put them in array. I managed to do it with str_replace() and trim() etc.. but this way is long and I am sure it could be made easier with single line of regex.
Note: The VALUE may contain escaped single quotes as well. example:
DEFINE('TEXT', 'J\'ai');
I hope I am not asking for something too complicated. :)
Regards
For any kind of grammar-based parsing, regular expressions are usually an awful solution. Even smple grammars (like arithmetic) have nesting and it's on nesting (in particular) that regular expressions just fall over.
Fortunately PHP provides a far, far better solution for you by giving you access to the same lexical analyzer used by the PHP interpreter via the token_get_all() function. Give it a character stream of PHP code and it'll parse it into tokens ("lexemes"), which you can do a bit of simple parsing on with a pretty simple finite state machine.
Run this program (it's run as test.php so it tries it on itself). The file is deliberately formatted badly so you can see it handles that with ease.
<?
define('CONST1', 'value' );
define (CONST2, 'value2');
define( 'CONST3', time());
define('define', 'define');
define("test", VALUE4);
define('const5', //
'weird declaration'
) ;
define('CONST7', 3.14);
define ( /* comment */ 'foo', 'bar');
$defn = 'blah';
define($defn, 'foo');
define( 'CONST4', define('CONST5', 6));
header('Content-Type: text/plain');
$defines = array();
$state = 0;
$key = '';
$value = '';
$file = file_get_contents('test.php');
$tokens = token_get_all($file);
$token = reset($tokens);
while ($token) {
// dump($state, $token);
if (is_array($token)) {
if ($token[0] == T_WHITESPACE || $token[0] == T_COMMENT || $token[0] == T_DOC_COMMENT) {
// do nothing
} else if ($token[0] == T_STRING && strtolower($token[1]) == 'define') {
$state = 1;
} else if ($state == 2 && is_constant($token[0])) {
$key = $token[1];
$state = 3;
} else if ($state == 4 && is_constant($token[0])) {
$value = $token[1];
$state = 5;
}
} else {
$symbol = trim($token);
if ($symbol == '(' && $state == 1) {
$state = 2;
} else if ($symbol == ',' && $state == 3) {
$state = 4;
} else if ($symbol == ')' && $state == 5) {
$defines[strip($key)] = strip($value);
$state = 0;
}
}
$token = next($tokens);
}
foreach ($defines as $k => $v) {
echo "'$k' => '$v'\n";
}
function is_constant($token) {
return $token == T_CONSTANT_ENCAPSED_STRING || $token == T_STRING ||
$token == T_LNUMBER || $token == T_DNUMBER;
}
function dump($state, $token) {
if (is_array($token)) {
echo "$state: " . token_name($token[0]) . " [$token[1]] on line $token[2]\n";
} else {
echo "$state: Symbol '$token'\n";
}
}
function strip($value) {
return preg_replace('!^([\'"])(.*)\1$!', '$2', $value);
}
?>
Output:
'CONST1' => 'value'
'CONST2' => 'value2'
'CONST3' => 'time'
'define' => 'define'
'test' => 'VALUE4'
'const5' => 'weird declaration'
'CONST7' => '3.14'
'foo' => 'bar'
'CONST5' => '6'
This is basically a finite state machine that looks for the pattern:
function name ('define')
open parenthesis
constant
comma
constant
close parenthesis
in the lexical stream of a PHP source file and treats the two constants as a (name,value) pair. In doing so it handles nested define() statements (as per the results) and ignores whitespace and comments as well as working across multiple lines.
Note: I've deliberatley made it ignore the case when functions and variables are constant names or values but you can extend it to that as you wish.
It's also worth pointing out that PHP is quite forgiving when it comes to strings. They can be declared with single quotes, double quotes or (in certain circumstances) with no quotes at all. This can be (as pointed out by Gumbo) be an ambiguous reference reference to a constant and you have no way of knowing which it is (no guaranteed way anyway), giving you the chocie of:
Ignoring that style of strings (T_STRING);
Seeing if a constant has already been declared with that name and replacing it's value. There's no way you can know what other files have been called though nor can you process any defines that are conditionally created so you can't say with any certainty if anything is definitely a constant or not nor what value it has; or
You can just live with the possibility that these might be constants (which is unlikely) and just treat them as strings.
Personally I would go for (1) then (3).
This is possible, but I would rather use get_defined_constants(). But make sure all your translations have something in common (like all translations starting with T), so you can tell them apart from other constants.
Try this regular expression to find the define calls:
/\bdefine\(\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*,\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*\);/is
So:
$pattern = '/\\bdefine\\(\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*,\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*\\);/is';
$str = '<?php define(\'foo\', \'bar\'); define("define(\\\'foo\\\', \\\'bar\\\')", "define(\'foo\', \'bar\')"); ?>';
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);
var_dump($matches);
I know that eval is evil. But that’s the best way to evaluate the string expressions:
$constants = array();
foreach ($matches as $match) {
eval('$constants['.$match[1].'] = '.$match[1].';');
}
var_dump($constants);
You might not need to go overboard with the regex complexity - something like this will probably suffice
/DEFINE\('(.*?)',\s*'(.*)'\);/
Here's a PHP sample showing how you might use it
$lines=file("myconstants.php");
foreach($lines as $line) {
$matches=array();
if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
$name=$matches[1];
$value=$matches[2];
echo "$name = $value\n";
}
}
Not every problem with text should be solved with a regexp, so I'd suggest you state what you want to achieve and not how.
So, instead of using php's parser which is not really useful, or instead of using a completely undebuggable regexp, why not write a simple parser?
<?php
$str = "define('nam\\'e', 'va\\\\\\'lue');\ndefine('na\\\\me2', 'value\\'2');\nDEFINE('a', 'b');";
function getDefined($str) {
$lines = array();
preg_match_all('#^define[(][ ]*(.*?)[ ]*[)];$#mi', $str, $lines);
$res = array();
foreach ($lines[1] as $cnt) {
$p = 0;
$key = parseString($cnt, $p);
// Skip comma
$p++;
// Skip space
while ($cnt{$p} == " ") {
$p++;
}
$value = parseString($cnt, $p);
$res[$key] = $value;
}
return $res;
}
function parseString($s, &$p) {
$quotechar = $s[$p];
if (! in_array($quotechar, array("'", '"'))) {
throw new Exception("Invalid quote character '" . $quotechar . "', input is " . var_export($s, true) . " # " . $p);
}
$len = strlen($s);
$quoted = false;
$res = "";
for ($p++;$p < $len;$p++) {
if ($quoted) {
$quoted = false;
$res .= $s{$p};
} else {
if ($s{$p} == "\\") {
$quoted = true;
continue;
}
if ($s{$p} == $quotechar) {
$p++;
return $res;
}
$res .= $s{$p};
}
}
throw new Exception("Premature end of line");
}
var_dump(getDefined($str));
Output:
array(3) {
["nam'e"]=>
string(7) "va\'lue"
["na\me2"]=>
string(7) "value'2"
["a"]=>
string(1) "b"
}

Categories