Removing allowable whitespace in PHP using parser tokens - php

I am trying to make a simple script that will remove all unnecessary whitespace from a PHP file/string.
I have been successful at parsing the string using tokens, but don't see a good method to remove "extra" whitespace.
For instance,
function test() { return TRUE; }
should be
function test(){return TRUE;}
and NOT
functiontest(){returnTRUE;}
You end up with the last version if you just remove the T_WHITESPACE token.
Is there something I am missing to remove whitespace, but keep spaces after things like "function " and "return ". Thank you!

$newSource = '';
foreach (token_get_all($source) as $i => $token) {
if (!is_array($token)) {
$newSource .= $token;
}
if ($token[0] == T_WHITESPACE) {
if ( isset($tokens[$i - 1]) && isset($tokens[$i + 1])
&& is_array($tokens[$i - 1]) && is_array($tokens[$i + 1])
&& isLabel($tokens[$i - 1][1]) && isLabel($tokens[$i + 1][1])
) {
$newSource .= ' ';
}
} else {
$newSource .= $token[1];
}
}
function isLabel($str) {
return preg_match('~^[a-zA-Z0-9_\x7f-\xff]+$~', $str);
}
Removing the whitespace is always allowed, apart from the case where there is a LABEL on both sides of it. I check that and either add nothing or a single space character.
There is just another special case I know about, there whitespace is of importance: T_END_HEREDOC must be followed by either ; or \n. Compacting or stripping the space here is not allowed. So, if that is of importance to you, you can simply add that ;)

Well, T_WHITESPACE can be spaces or newlines, etc. So one trivial approach would be to automatically replace all T_WHITESPACE instances with a new one that consists of exactly one space.
But for a smarter method, just go through the list of parser tokens, and figure out which ones should have a whitespace after it and which should not (something like this):
foreach ($tokens as $k => $val) {
if (is_array($val) && $val[0] == T_WHITESPACE) {
if (!is_array($tokens[$k - 1])) {
//remove this space
} else {
switch ($tokens[$k - 1][0]) {
case T_ABSTRACT:
case T_FUNCTION:
//.. other keeps here:
continue;
break;
default:
//remove the space
}
}
}
}
And one more note, don't do this for performance. If you're using an OPCODE Cache (Such as APC) you'll see no benefit for a lot of work. If you're not using one, why aren't you?

Your effort is futile.
php -w
Allows already to strip whitespace off scripts. It uses a bit more elaborate logic to remove whitespace from the token stream.
Here's the zend_strip() function as found in zend_highlight.c:
while ((token_type=lex_scan(&token TSRMLS_CC))) {
switch (token_type) {
case T_WHITESPACE:
if (!prev_space) {
zend_write(" ", sizeof(" ") - 1);
prev_space = 1;
}
/* lack of break; is intentional */
case T_COMMENT:
case T_DOC_COMMENT:
token.type = 0;
continue;
case T_END_HEREDOC:
zend_write(LANG_SCNG(yy_text), LANG_SCNG(yy_leng));
efree(token.value.str.val);
/* read the following character, either newline or ; */
if (lex_scan(&token TSRMLS_CC) != T_WHITESPACE) {
zend_write(LANG_SCNG(yy_text), LANG_SCNG(yy_leng));
}
zend_write("\n", sizeof("\n") - 1);
prev_space = 1;
token.type = 0;
continue;
default:
zend_write(LANG_SCNG(yy_text), LANG_SCNG(yy_leng));
break;
}
if (token.type == IS_STRING) {
switch (token_type) {
case T_OPEN_TAG:
case T_OPEN_TAG_WITH_ECHO:
case T_CLOSE_TAG:
case T_WHITESPACE:
case T_COMMENT:
case T_DOC_COMMENT:
break;
default:
efree(token.value.str.val);
break;
}
}
prev_space = token.type = 0;
}

Related

Is it possible to debug this PHP

I'm really grasping at straws here. I don't code and my PHP email extension works great EXCEPT for these errors when editing the templates. My vendor is in who-knows what timezone and has been very unresponsive. Besides the obvious - change extensions, change vendors etc., can any of you PHP folk look at this code block and tell me what's wrong. I don't know if taken out of context it will be meaningful. The Errors are PHP Notice: Trying to get property of non-object beginning at line 1344. Here is a block of code from 1344 to 1370. It evidently has to do with switching to plain text when HTML isn't an option. AGAIN, grasping at straws here but stackoverflow is pretty solid when it comes to knowledgeable folk.
I'm re-pasting the code to include a more complete block. I have also added comments in the lines to indicate where the four warnings appear. They look like this:
//NEXT LINE IS ERROR ONE - LINE 1344 Notice: Trying to get property of non-object
Thank you
<?php
public static function isHTML($str) {
$html = array('A','ABBR','ACRONYM','ADDRESS','APPLET','AREA','B','BASE','BASEFONT','BDO','BIG','BLOCKQUOTE',
'BODY','BR','BUTTON','CAPTION','CENTER','CITE','CODE','COL','COLGROUP','DD','DEL','DFN','DIR','DIV','DL',
'DT','EM','FIELDSET','FONT','FORM','FRAME','FRAMESET','H1','H2','H3','H4','H5','H6','HEAD','HR','HTML',
'I','IFRAME','IMG','INPUT','INS','ISINDEX','KBD','LABEL','LEGEND','LI','LINK','MAP','MENU','META',
'NOFRAMES','NOSCRIPT','OBJECT','OL','OPTGROUP','OPTION','P','PARAM','PRE','Q','S','SAMP','SCRIPT',
'SELECT','SMALL','SPAN','STRIKE','STRONG','STYLE','SUB','SUP','TABLE','TBODY','TD','TEXTAREA','TFOOT',
'TH','THEAD','TITLE','TR','TT','U','UL','VAR');
return preg_match("~(<\/?)\b(".implode('|', $html).")\b([^>]*>)~i", $str, $c);
}
private function _html_to_plain_text($node) {
if ($node instanceof DOMText) {
return preg_replace("/\\s+/im", " ", $node->wholeText);
}
if ($node instanceof DOMDocumentType) {
// ignore
return "";
}
// Next
//NEXT LINE IS ERROR ONE - LINE 1344 Notice: Trying to get property of non-object
$nextNode = $node->nextSibling;
while ($nextNode != null) {
if ($nextNode instanceof DOMElement) {
break;
}
$nextNode = $nextNode->nextSibling;
}
$nextName = null;
if ($nextNode instanceof DOMElement && $nextNode != null) {
$nextName = strtolower($nextNode->nodeName);
}
// Previous
//NEXT LINE IS ERROR TWO - LINE 1357 Notice: Trying to get property of non-object
$nextNode = $node->previousSibling;
while ($nextNode != null) {
if ($nextNode instanceof DOMElement) {
break;
}
$nextNode = $nextNode->previousSibling;
}
$prevName = null;
if ($nextNode instanceof DOMElement && $nextNode != null) {
$prevName = strtolower($nextNode->nodeName);
}
//NEXT LINE IS ERROR THREE - LINE 1369 Notice: Trying to get property of non-object
$name = strtolower($node->nodeName);
// start whitespace
switch ($name) {
case "hr":
return "------\n";
case "style":
case "head":
case "title":
case "meta":
case "script":
// ignore these tags
return "";
case "h1":
case "h2":
case "h3":
case "h4":
case "h5":
case "h6":
// add two newlines
$output = "\n";
break;
case "p":
case "div":
// add one line
$output = "\n";
break;
default:
// print out contents of unknown tags
$output = "";
break;
}
// debug $output .= "[$name,$nextName]";
//NEXT LINE IS ERROR FOUR - LINE 1408 Notice: Trying to get property of non-object
if($node->childNodes){
for ($i = 0; $i < $node->childNodes->length; $i++) {
$n = $node->childNodes->item($i);
$text = $this->_html_to_plain_text($n);
$output .= $text;
}
}
// end whitespace
switch ($name) {
case "style":
case "head":
case "title":
case "meta":
case "script":
// ignore these tags
return "";
case "h1":
case "h2":
case "h3":
case "h4":
case "h5":
case "h6":
$output .= "\n";
break;
case "p":
case "br":
// add one line
if ($nextName != "div")
$output .= "\n";
break;
case "div":
// add one line only if the next child isn't a div
if (($nextName != "div" && $nextName != null) || ($node->hasAttribute('class') && strstr($node->getAttribute('class'), 'emailtemplateSpacing')))
$output .= "\n";
break;
case "a":
// links are returned in [text](link) format
$href = $node->getAttribute("href");
if ($href == null) {
// it doesn't link anywhere
if ($node->getAttribute("name") != null) {
$output = "$output";
}
} else {
if ($href == $output || ($node->hasAttribute('class') && strstr($node->getAttribute('class'), 'emailtemplateNoDisplay'))) {
// link to the same address: just use link
$output;
} else {
// No display
$output = $href . "\n" . $output;
}
}
// does the next node require additional whitespace?
switch ($nextName) {
case "h1": case "h2": case "h3": case "h4": case "h5": case "h6":
$output .= "\n";
break;
}
default:
// do nothing
}
return $output;
}
}
?>
Sounds like the $node variable coming into your function from private function _html_to_plain_text($node) isn't the correct variable type. Might be an int, string, array, or null, instead of whatever object/class it's supposed to be. You probably need to look at whatever code is calling that function, rather than the function itself.
Debugging in PHP is a skill in itself.
If coding by hand, consider inserting var_export($node); statements before and after areas where you suspect there is a problem so you can examine it.
If you have access to a debugger (for example, Eclipse + XAMPP + XDebug), use that and go line by line, examining the contents of the $node variable by hovering over it or looking at your variable list.

switch-case with preg_match (+ starts with / strpos)

I wanna accomplish something like this:
!say some message
which then is taken to a function, which proceeds with the content. As you can imagine, I'm listening on some text-inputs. Right now I'm facing the problem, that strings like this also trigger the case:
this is !say some text
Code:
$prefix = "!";
switch ($message->content) {
case preg_match("/[^.*]$prefix . 'say' (.*) /", $message->content):
_say($message);
break;
}
function _say($message)
{
$parts = explode("!say", $message->content);
return $message->channel->sendMessage(trim($parts[1]), true);
}
From my understanding, the regexp is correct (also confirmed on a regexp-tester).
Tested with substr as well, but that doesn't do it either.
case substr($message->content, 0, 3) == $prefix . 'say' :
What am I doing wrong?
A command is a command, it should not be something to be fooled around with. So instead of using regex I suggest a simple string comparison:
$prefix = "!";
function _say($message){
return $message->channel->sendMessage(trim($message), true);
}
if($message->content[0] == $prefix){
if($p = strpos($message->content, ' ') !== false){
$cmd = substr($message->content, 1, $p);
switch($cmd){
case 'say':
_say(substr($message->content, $p));
break;
default:
die('unknown command');
}
} else {
die('empty cmd');
}
} else {
die('no command');
}

Why is require_once echoing entire file contents? [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
I have a class in a file "evalmath.php".
If I require it like this: require_once('evalmath.php'); the entire contents of that file is echoed out to the screen.
If I do it like this, require_once( 'evalmath.php' );, it doesn't.
HUH?
EDIT - SOURCE CODE OF EVALMATH.PHP
<?
/*
================================================================================
EvalMath - PHP Class to safely evaluate math expressions
Copyright (C) 2005 Miles Kaufmann <http://www.twmagic.com/>
================================================================================
NAME
EvalMath - safely evaluate math expressions
SYNOPSIS
<?
include('evalmath.class.php');
$m = new EvalMath;
// basic evaluation:
$result = $m->evaluate('2+2');
// supports: order of operation; parentheses; negation; built-in functions
$result = $m->evaluate('-8(5/2)^2*(1-sqrt(4))-8');
// create your own variables
$m->evaluate('a = e^(ln(pi))');
// or functions
$m->evaluate('f(x,y) = x^2 + y^2 - 2x*y + 1');
// and then use them
$result = $m->evaluate('3*f(42,a)');
?>
DESCRIPTION
Use the EvalMath class when you want to evaluate mathematical expressions
from untrusted sources. You can define your own variables and functions,
which are stored in the object. Try it, it's fun!
METHODS
$m->evalute($expr)
Evaluates the expression and returns the result. If an error occurs,
prints a warning and returns false. If $expr is a function assignment,
returns true on success.
$m->e($expr)
A synonym for $m->evaluate().
$m->vars()
Returns an associative array of all user-defined variables and values.
$m->funcs()
Returns an array of all user-defined functions.
PARAMETERS
$m->suppress_errors
Set to true to turn off warnings when evaluating expressions
$m->last_error
If the last evaluation failed, contains a string describing the error.
(Useful when suppress_errors is on).
AUTHOR INFORMATION
Copyright 2005, Miles Kaufmann.
LICENSE
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1 Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
class EvalMath {
var $suppress_errors = false;
var $last_error = null;
var $v = array('e'=>2.71,'pi'=>3.14); // variables (and constants)
var $f = array(); // user-defined functions
var $vb = array('e', 'pi'); // constants
var $fb = array( // built-in functions
'sin','sinh','arcsin','asin','arcsinh','asinh',
'cos','cosh','arccos','acos','arccosh','acosh',
'tan','tanh','arctan','atan','arctanh','atanh',
'sqrt','abs','ln','log');
function EvalMath() {
// make the variables a little more accurate
$this->v['pi'] = pi();
$this->v['e'] = exp(1);
}
function e($expr) {
return $this->evaluate($expr);
}
function evaluate($expr) {
$this->last_error = null;
$expr = trim($expr);
if (substr($expr, -1, 1) == ';') $expr = substr($expr, 0, strlen($expr)-1); // strip semicolons at the end
//===============
// is it a variable assignment?
if (preg_match('/^\s*([a-z]\w*)\s*=\s*(.+)$/', $expr, $matches)) {
if (in_array($matches[1], $this->vb)) { // make sure we're not assigning to a constant
return $this->trigger("cannot assign to constant '$matches[1]'");
}
if (($tmp = $this->pfx($this->nfx($matches[2]))) === false) return false; // get the result and make sure it's good
$this->v[$matches[1]] = $tmp; // if so, stick it in the variable array
return $this->v[$matches[1]]; // and return the resulting value
//===============
// is it a function assignment?
} elseif (preg_match('/^\s*([a-z]\w*)\s*\(\s*([a-z]\w*(?:\s*,\s*[a-z]\w*)*)\s*\)\s*=\s*(.+)$/', $expr, $matches)) {
$fnn = $matches[1]; // get the function name
if (in_array($matches[1], $this->fb)) { // make sure it isn't built in
return $this->trigger("cannot redefine built-in function '$matches[1]()'");
}
$args = explode(",", preg_replace("/\s+/", "", $matches[2])); // get the arguments
if (($stack = $this->nfx($matches[3])) === false) return false; // see if it can be converted to postfix
for ($i = 0; $i<count($stack); $i++) { // freeze the state of the non-argument variables
$token = $stack[$i];
if (preg_match('/^[a-z]\w*$/', $token) and !in_array($token, $args)) {
if (array_key_exists($token, $this->v)) {
$stack[$i] = $this->v[$token];
} else {
return $this->trigger("undefined variable '$token' in function definition");
}
}
}
$this->f[$fnn] = array('args'=>$args, 'func'=>$stack);
return true;
//===============
} else {
return $this->pfx($this->nfx($expr)); // straight up evaluation, woo
}
}
function vars() {
$output = $this->v;
unset($output['pi']);
unset($output['e']);
return $output;
}
function funcs() {
$output = array();
foreach ($this->f as $fnn=>$dat)
$output[] = $fnn . '(' . implode(',', $dat['args']) . ')';
return $output;
}
//===================== HERE BE INTERNAL METHODS ====================\\
// Convert infix to postfix notation
function nfx($expr) {
$index = 0;
$stack = new EvalMathStack;
$output = array(); // postfix form of expression, to be passed to pfx()
$expr = trim(strtolower($expr));
$ops = array('+', '-', '*', '/', '^', '_');
$ops_r = array('+'=>0,'-'=>0,'*'=>0,'/'=>0,'^'=>1); // right-associative operator?
$ops_p = array('+'=>0,'-'=>0,'*'=>1,'/'=>1,'_'=>1,'^'=>2); // operator precedence
$expecting_op = false; // we use this in syntax-checking the expression
// and determining when a - is a negation
if (preg_match("/[^\w\s+*^\/()\.,-]/", $expr, $matches)) { // make sure the characters are all good
return $this->trigger("illegal character '{$matches[0]}'");
}
while(1) { // 1 Infinite Loop ;)
$op = substr($expr, $index, 1); // get the first character at the current index
// find out if we're currently at the beginning of a number/variable/function/parenthesis/operand
$ex = preg_match('/^([a-z]\w*\(?|\d+(?:\.\d*)?|\.\d+|\()/', substr($expr, $index), $match);
//===============
if ($op == '-' and !$expecting_op) { // is it a negation instead of a minus?
$stack->push('_'); // put a negation on the stack
$index++;
} elseif ($op == '_') { // we have to explicitly deny this, because it's legal on the stack
return $this->trigger("illegal character '_'"); // but not in the input expression
//===============
} elseif ((in_array($op, $ops) or $ex) and $expecting_op) { // are we putting an operator on the stack?
if ($ex) { // are we expecting an operator but have a number/variable/function/opening parethesis?
$op = '*'; $index--; // it's an implicit multiplication
}
// heart of the algorithm:
while($stack->count > 0 and ($o2 = $stack->last()) and in_array($o2, $ops) and ($ops_r[$op] ? $ops_p[$op] < $ops_p[$o2] : $ops_p[$op] <= $ops_p[$o2])) {
$output[] = $stack->pop(); // pop stuff off the stack into the output
}
// many thanks: http://en.wikipedia.org/wiki/Reverse_Polish_notation#The_algorithm_in_detail
$stack->push($op); // finally put OUR operator onto the stack
$index++;
$expecting_op = false;
//===============
} elseif ($op == ')' and $expecting_op) { // ready to close a parenthesis?
while (($o2 = $stack->pop()) != '(') { // pop off the stack back to the last (
if (is_null($o2)) return $this->trigger("unexpected ')'");
else $output[] = $o2;
}
if (preg_match("/^([a-z]\w*)\($/", $stack->last(2), $matches)) { // did we just close a function?
$fnn = $matches[1]; // get the function name
$arg_count = $stack->pop(); // see how many arguments there were (cleverly stored on the stack, thank you)
$output[] = $stack->pop(); // pop the function and push onto the output
if (in_array($fnn, $this->fb)) { // check the argument count
if($arg_count > 1)
return $this->trigger("too many arguments ($arg_count given, 1 expected)");
} elseif (array_key_exists($fnn, $this->f)) {
if ($arg_count != count($this->f[$fnn]['args']))
return $this->trigger("wrong number of arguments ($arg_count given, " . count($this->f[$fnn]['args']) . " expected)");
} else { // did we somehow push a non-function on the stack? this should never happen
return $this->trigger("internal error");
}
}
$index++;
//===============
} elseif ($op == ',' and $expecting_op) { // did we just finish a function argument?
while (($o2 = $stack->pop()) != '(') {
if (is_null($o2)) return $this->trigger("unexpected ','"); // oops, never had a (
else $output[] = $o2; // pop the argument expression stuff and push onto the output
}
// make sure there was a function
if (!preg_match("/^([a-z]\w*)\($/", $stack->last(2), $matches))
return $this->trigger("unexpected ','");
$stack->push($stack->pop()+1); // increment the argument count
$stack->push('('); // put the ( back on, we'll need to pop back to it again
$index++;
$expecting_op = false;
//===============
} elseif ($op == '(' and !$expecting_op) {
$stack->push('('); // that was easy
$index++;
$allow_neg = true;
//===============
} elseif ($ex and !$expecting_op) { // do we now have a function/variable/number?
$expecting_op = true;
$val = $match[1];
if (preg_match("/^([a-z]\w*)\($/", $val, $matches)) { // may be func, or variable w/ implicit multiplication against parentheses...
if (in_array($matches[1], $this->fb) or array_key_exists($matches[1], $this->f)) { // it's a func
$stack->push($val);
$stack->push(1);
$stack->push('(');
$expecting_op = false;
} else { // it's a var w/ implicit multiplication
$val = $matches[1];
$output[] = $val;
}
} else { // it's a plain old var or num
$output[] = $val;
}
$index += strlen($val);
//===============
} elseif ($op == ')') { // miscellaneous error checking
return $this->trigger("unexpected ')'");
} elseif (in_array($op, $ops) and !$expecting_op) {
return $this->trigger("unexpected operator '$op'");
} else { // I don't even want to know what you did to get here
return $this->trigger("an unexpected error occured");
}
if ($index == strlen($expr)) {
if (in_array($op, $ops)) { // did we end with an operator? bad.
return $this->trigger("operator '$op' lacks operand");
} else {
break;
}
}
while (substr($expr, $index, 1) == ' ') { // step the index past whitespace (pretty much turns whitespace
$index++; // into implicit multiplication if no operator is there)
}
}
while (!is_null($op = $stack->pop())) { // pop everything off the stack and push onto output
if ($op == '(') return $this->trigger("expecting ')'"); // if there are (s on the stack, ()s were unbalanced
$output[] = $op;
}
return $output;
}
// evaluate postfix notation
function pfx($tokens, $vars = array()) {
if ($tokens == false) return false;
$stack = new EvalMathStack;
foreach ($tokens as $token) { // nice and easy
// if the token is a binary operator, pop two values off the stack, do the operation, and push the result back on
if (in_array($token, array('+', '-', '*', '/', '^'))) {
if (is_null($op2 = $stack->pop())) return $this->trigger("internal error");
if (is_null($op1 = $stack->pop())) return $this->trigger("internal error");
switch ($token) {
case '+':
$stack->push($op1+$op2); break;
case '-':
$stack->push($op1-$op2); break;
case '*':
$stack->push($op1*$op2); break;
case '/':
if ($op2 == 0) return $this->trigger("division by zero");
$stack->push($op1/$op2); break;
case '^':
$stack->push(pow($op1, $op2)); break;
}
// if the token is a unary operator, pop one value off the stack, do the operation, and push it back on
} elseif ($token == "_") {
$stack->push(-1*$stack->pop());
// if the token is a function, pop arguments off the stack, hand them to the function, and push the result back on
} elseif (preg_match("/^([a-z]\w*)\($/", $token, $matches)) { // it's a function!
$fnn = $matches[1];
if (in_array($fnn, $this->fb)) { // built-in function:
if (is_null($op1 = $stack->pop())) return $this->trigger("internal error");
$fnn = preg_replace("/^arc/", "a", $fnn); // for the 'arc' trig synonyms
if ($fnn == 'ln') $fnn = 'log';
eval('$stack->push(' . $fnn . '($op1));'); // perfectly safe eval()
} elseif (array_key_exists($fnn, $this->f)) { // user function
// get args
$args = array();
for ($i = count($this->f[$fnn]['args'])-1; $i >= 0; $i--) {
if (is_null($args[$this->f[$fnn]['args'][$i]] = $stack->pop())) return $this->trigger("internal error");
}
$stack->push($this->pfx($this->f[$fnn]['func'], $args)); // yay... recursion!!!!
}
// if the token is a number or variable, push it on the stack
} else {
if (is_numeric($token)) {
$stack->push($token);
} elseif (array_key_exists($token, $this->v)) {
$stack->push($this->v[$token]);
} elseif (array_key_exists($token, $vars)) {
$stack->push($vars[$token]);
} else {
return $this->trigger("undefined variable '$token'");
}
}
}
// when we're out of tokens, the stack should have a single element, the final result
if ($stack->count != 1) return $this->trigger("internal error");
return $stack->pop();
}
// trigger an error, but nicely, if need be
function trigger($msg) {
$this->last_error = $msg;
if (!$this->suppress_errors) trigger_error($msg, E_USER_WARNING);
return false;
}
}
// for internal use
class EvalMathStack {
var $stack = array();
var $count = 0;
function push($val) {
$this->stack[$this->count] = $val;
$this->count++;
}
function pop() {
if ($this->count > 0) {
$this->count--;
return $this->stack[$this->count];
}
return null;
}
function last($n=1) {
return $this->stack[$this->count-$n];
}
}
The class file starts with a short tag [<?] - if short tags are disabled [which should be default behavior], whole file is considered plain text and thus not parsed.
Just replace it with a full tag - <?php.
Just try to enclose the script with <?php instead of <? it might be with the servers configuration issue not accepting it.
Leading off your confirmation that short tags are disabled, I am guessing you are debugging your script like this?
// dumps everything!
require_once('evalmath.php');
// dumps nothing??
require_once( 'evalmath.php' );
If your testing looks something like that, then the only reason the second form doesn't dump your file is because it doesn't actually include it a second time. require_once.

How to clean and simplify this code?

After thinking about This Question and giving an answer to it I wanted to do more about that to train myself.
So I wrote a function which will calc the length of an given function. Th given php-file has to start at the beginning of the needed function.
Example: If the function is in a big phpfile with lots of functions, like
/* lots of functions */
function f_interesting($arg) {
/* function */
}
/* lots of other functions */
then $part3 of my function will require to begin like that (after the starting-{ of the interesting function):
/* function */
}
/* lots of other functions */
Now that's not the problem, but I would like to know if there are an cleaner or simplier ways to do this. Here's my function: (I already cleaned a lot of testing-echo-commands)
(The idea behind it is explained here)
function f_analysis ($part3) {
if(isset($part3)) {
$char_array = str_split($part3); //get array of chars
$end_key = false; //length of function
$depth = 0; //How much of unclosed '{'
$in_sstr = false; //is next char inside in ''-String?
$in_dstr = false; //is nect char inside an ""-String?
$in_sl_comment = false; //inside an //-comment?
$in_ml_comment = false; //inside an /* */-comment?
$may_comment = false; //was the last char an '/' which can start a comment?
$may_ml_comment_end = false; //was the last char an '*' which may end a /**/-comment?
foreach($char_array as $key=>$char) {
if($in_sstr) {
if ($char == "'") {
$in_sstr = false;
}
}
else if($in_dstr) {
if($char == '"') {
$in_dstr = false;
}
}
else if($in_sl_comment) {
if($char == "\n") {
$in_sl_comment = false;
}
}
else if($in_ml_comment) {
if($may_ml_comment_end) {
$may_ml_comment_end = false;
if($char == '/') {
$in_ml_comment = false;
}
}
if($char == '*') {
$may_ml_comment_end = true;
}
}
else if ($may_comment) {
if($char == '/') {
$in_sl_comment = true;
}
else if($char == '*') {
$in_ml_comment = true;
}
$may_comment = false;
}
else {
switch ($char) {
case '{':
$depth++;
break;
case '}':
$depth--;
break;
case '/':
$may_comment = true;
break;
case '"':
$in_dstr = true;
break;
case "'":
$in_sstr = true;
break;
}
}
if($depth < 0) {
$last_key = $key;
break;
}
}
} else echo '<br>$part3 of f_analysis not set!';
return ($last_key===false) ? false : $last_key+1; //will be false or the length of the function
}
Tokenizer (Example) - Learn it, love it.
You could probably reduce the number of state variables a little, but truthfully... yes, it will be messy code. I would probably get rid of $may_ml_comment_end and peek ahead for the next character when I encounter an asterisk, for example. You will need to rewrite your foreach loop to a regular for loop be able to do that without creating a bigger mess though.
PS: I don't see you handling the escape character yet. Without the above approach, that would introduce another boolean variable.
Another problem with your current code is that characters immediately following a / don't get interpreted as they should. However unlikely
echo 5/'2'; // NB: no space in between
is valid in PHP and would break your parser.

How to evaluate formula passed as string in PHP?

Just trying to figure out the proper and safer way to execute mathematical operation passed as string. In my scenario it is values fetched from image EXIF data.
After little research I found two way of doing it.
first, using eval:
function calculator1($str){
eval("\$str = $str;");
return $str;
}
second, using create_function:
function calculator2($str){
$fn = create_function("", "return ({$str});" );
return $fn();
};
Both examples require string cleanup to avoid malicious code execution. Is there any other or shorter way of doing so?
This might help.
http://www.phpclasses.org/browse/package/2695.html
Annoying login required to download. I copied an pasted it here for you.
This class can be used to safely evaluate mathematical expressions.
The class can take an expression in a text string and evaluate it by replacing values of variables and calculating the results of mathematical functions and operations.
It supports implicit multiplication, multivariable functions and nested functions.
It can be used to evaluate expressions from untrusted sources. It provides robust error checking and only evaluates a limited set of functions.
It could be used to generate graphs from expressions of formulae.
/*
================================================================================
EvalMath - PHP Class to safely evaluate math expressions
Copyright (C) 2005 Miles Kaufmann <http://www.twmagic.com/>
================================================================================
NAME
EvalMath - safely evaluate math expressions
SYNOPSIS
<?
include('evalmath.class.php');
$m = new EvalMath;
// basic evaluation:
$result = $m->evaluate('2+2');
// supports: order of operation; parentheses; negation; built-in functions
$result = $m->evaluate('-8(5/2)^2*(1-sqrt(4))-8');
// create your own variables
$m->evaluate('a = e^(ln(pi))');
// or functions
$m->evaluate('f(x,y) = x^2 + y^2 - 2x*y + 1');
// and then use them
$result = $m->evaluate('3*f(42,a)');
?>
DESCRIPTION
Use the EvalMath class when you want to evaluate mathematical expressions
from untrusted sources. You can define your own variables and functions,
which are stored in the object. Try it, it's fun!
METHODS
$m->evalute($expr)
Evaluates the expression and returns the result. If an error occurs,
prints a warning and returns false. If $expr is a function assignment,
returns true on success.
$m->e($expr)
A synonym for $m->evaluate().
$m->vars()
Returns an associative array of all user-defined variables and values.
$m->funcs()
Returns an array of all user-defined functions.
PARAMETERS
$m->suppress_errors
Set to true to turn off warnings when evaluating expressions
$m->last_error
If the last evaluation failed, contains a string describing the error.
(Useful when suppress_errors is on).
AUTHOR INFORMATION
Copyright 2005, Miles Kaufmann.
LICENSE
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1 Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
class EvalMath {
var $suppress_errors = false;
var $last_error = null;
var $v = array('e'=>2.71,'pi'=>3.14); // variables (and constants)
var $f = array(); // user-defined functions
var $vb = array('e', 'pi'); // constants
var $fb = array( // built-in functions
'sin','sinh','arcsin','asin','arcsinh','asinh',
'cos','cosh','arccos','acos','arccosh','acosh',
'tan','tanh','arctan','atan','arctanh','atanh',
'sqrt','abs','ln','log');
function EvalMath() {
// make the variables a little more accurate
$this->v['pi'] = pi();
$this->v['e'] = exp(1);
}
function e($expr) {
return $this->evaluate($expr);
}
function evaluate($expr) {
$this->last_error = null;
$expr = trim($expr);
if (substr($expr, -1, 1) == ';') $expr = substr($expr, 0, strlen($expr)-1); // strip semicolons at the end
//===============
// is it a variable assignment?
if (preg_match('/^\s*([a-z]\w*)\s*=\s*(.+)$/', $expr, $matches)) {
if (in_array($matches[1], $this->vb)) { // make sure we're not assigning to a constant
return $this->trigger("cannot assign to constant '$matches[1]'");
}
if (($tmp = $this->pfx($this->nfx($matches[2]))) === false) return false; // get the result and make sure it's good
$this->v[$matches[1]] = $tmp; // if so, stick it in the variable array
return $this->v[$matches[1]]; // and return the resulting value
//===============
// is it a function assignment?
} elseif (preg_match('/^\s*([a-z]\w*)\s*\(\s*([a-z]\w*(?:\s*,\s*[a-z]\w*)*)\s*\)\s*=\s*(.+)$/', $expr, $matches)) {
$fnn = $matches[1]; // get the function name
if (in_array($matches[1], $this->fb)) { // make sure it isn't built in
return $this->trigger("cannot redefine built-in function '$matches[1]()'");
}
$args = explode(",", preg_replace("/\s+/", "", $matches[2])); // get the arguments
if (($stack = $this->nfx($matches[3])) === false) return false; // see if it can be converted to postfix
for ($i = 0; $i<count($stack); $i++) { // freeze the state of the non-argument variables
$token = $stack[$i];
if (preg_match('/^[a-z]\w*$/', $token) and !in_array($token, $args)) {
if (array_key_exists($token, $this->v)) {
$stack[$i] = $this->v[$token];
} else {
return $this->trigger("undefined variable '$token' in function definition");
}
}
}
$this->f[$fnn] = array('args'=>$args, 'func'=>$stack);
return true;
//===============
} else {
return $this->pfx($this->nfx($expr)); // straight up evaluation, woo
}
}
function vars() {
$output = $this->v;
unset($output['pi']);
unset($output['e']);
return $output;
}
function funcs() {
$output = array();
foreach ($this->f as $fnn=>$dat)
$output[] = $fnn . '(' . implode(',', $dat['args']) . ')';
return $output;
}
//===================== HERE BE INTERNAL METHODS ====================\\
// Convert infix to postfix notation
function nfx($expr) {
$index = 0;
$stack = new EvalMathStack;
$output = array(); // postfix form of expression, to be passed to pfx()
$expr = trim(strtolower($expr));
$ops = array('+', '-', '*', '/', '^', '_');
$ops_r = array('+'=>0,'-'=>0,'*'=>0,'/'=>0,'^'=>1); // right-associative operator?
$ops_p = array('+'=>0,'-'=>0,'*'=>1,'/'=>1,'_'=>1,'^'=>2); // operator precedence
$expecting_op = false; // we use this in syntax-checking the expression
// and determining when a - is a negation
if (preg_match("/[^\w\s+*^\/()\.,-]/", $expr, $matches)) { // make sure the characters are all good
return $this->trigger("illegal character '{$matches[0]}'");
}
while(1) { // 1 Infinite Loop ;)
$op = substr($expr, $index, 1); // get the first character at the current index
// find out if we're currently at the beginning of a number/variable/function/parenthesis/operand
$ex = preg_match('/^([a-z]\w*\(?|\d+(?:\.\d*)?|\.\d+|\()/', substr($expr, $index), $match);
//===============
if ($op == '-' and !$expecting_op) { // is it a negation instead of a minus?
$stack->push('_'); // put a negation on the stack
$index++;
} elseif ($op == '_') { // we have to explicitly deny this, because it's legal on the stack
return $this->trigger("illegal character '_'"); // but not in the input expression
//===============
} elseif ((in_array($op, $ops) or $ex) and $expecting_op) { // are we putting an operator on the stack?
if ($ex) { // are we expecting an operator but have a number/variable/function/opening parethesis?
$op = '*'; $index--; // it's an implicit multiplication
}
// heart of the algorithm:
while($stack->count > 0 and ($o2 = $stack->last()) and in_array($o2, $ops) and ($ops_r[$op] ? $ops_p[$op] < $ops_p[$o2] : $ops_p[$op] <= $ops_p[$o2])) {
$output[] = $stack->pop(); // pop stuff off the stack into the output
}
// many thanks: http://en.wikipedia.org/wiki/Reverse_Polish_notation#The_algorithm_in_detail
$stack->push($op); // finally put OUR operator onto the stack
$index++;
$expecting_op = false;
//===============
} elseif ($op == ')' and $expecting_op) { // ready to close a parenthesis?
while (($o2 = $stack->pop()) != '(') { // pop off the stack back to the last (
if (is_null($o2)) return $this->trigger("unexpected ')'");
else $output[] = $o2;
}
if (preg_match("/^([a-z]\w*)\($/", $stack->last(2), $matches)) { // did we just close a function?
$fnn = $matches[1]; // get the function name
$arg_count = $stack->pop(); // see how many arguments there were (cleverly stored on the stack, thank you)
$output[] = $stack->pop(); // pop the function and push onto the output
if (in_array($fnn, $this->fb)) { // check the argument count
if($arg_count > 1)
return $this->trigger("too many arguments ($arg_count given, 1 expected)");
} elseif (array_key_exists($fnn, $this->f)) {
if ($arg_count != count($this->f[$fnn]['args']))
return $this->trigger("wrong number of arguments ($arg_count given, " . count($this->f[$fnn]['args']) . " expected)");
} else { // did we somehow push a non-function on the stack? this should never happen
return $this->trigger("internal error");
}
}
$index++;
//===============
} elseif ($op == ',' and $expecting_op) { // did we just finish a function argument?
while (($o2 = $stack->pop()) != '(') {
if (is_null($o2)) return $this->trigger("unexpected ','"); // oops, never had a (
else $output[] = $o2; // pop the argument expression stuff and push onto the output
}
// make sure there was a function
if (!preg_match("/^([a-z]\w*)\($/", $stack->last(2), $matches))
return $this->trigger("unexpected ','");
$stack->push($stack->pop()+1); // increment the argument count
$stack->push('('); // put the ( back on, we'll need to pop back to it again
$index++;
$expecting_op = false;
//===============
} elseif ($op == '(' and !$expecting_op) {
$stack->push('('); // that was easy
$index++;
$allow_neg = true;
//===============
} elseif ($ex and !$expecting_op) { // do we now have a function/variable/number?
$expecting_op = true;
$val = $match[1];
if (preg_match("/^([a-z]\w*)\($/", $val, $matches)) { // may be func, or variable w/ implicit multiplication against parentheses...
if (in_array($matches[1], $this->fb) or array_key_exists($matches[1], $this->f)) { // it's a func
$stack->push($val);
$stack->push(1);
$stack->push('(');
$expecting_op = false;
} else { // it's a var w/ implicit multiplication
$val = $matches[1];
$output[] = $val;
}
} else { // it's a plain old var or num
$output[] = $val;
}
$index += strlen($val);
//===============
} elseif ($op == ')') { // miscellaneous error checking
return $this->trigger("unexpected ')'");
} elseif (in_array($op, $ops) and !$expecting_op) {
return $this->trigger("unexpected operator '$op'");
} else { // I don't even want to know what you did to get here
return $this->trigger("an unexpected error occured");
}
if ($index == strlen($expr)) {
if (in_array($op, $ops)) { // did we end with an operator? bad.
return $this->trigger("operator '$op' lacks operand");
} else {
break;
}
}
while (substr($expr, $index, 1) == ' ') { // step the index past whitespace (pretty much turns whitespace
$index++; // into implicit multiplication if no operator is there)
}
}
while (!is_null($op = $stack->pop())) { // pop everything off the stack and push onto output
if ($op == '(') return $this->trigger("expecting ')'"); // if there are (s on the stack, ()s were unbalanced
$output[] = $op;
}
return $output;
}
// evaluate postfix notation
function pfx($tokens, $vars = array()) {
if ($tokens == false) return false;
$stack = new EvalMathStack;
foreach ($tokens as $token) { // nice and easy
// if the token is a binary operator, pop two values off the stack, do the operation, and push the result back on
if (in_array($token, array('+', '-', '*', '/', '^'))) {
if (is_null($op2 = $stack->pop())) return $this->trigger("internal error");
if (is_null($op1 = $stack->pop())) return $this->trigger("internal error");
switch ($token) {
case '+':
$stack->push($op1+$op2); break;
case '-':
$stack->push($op1-$op2); break;
case '*':
$stack->push($op1*$op2); break;
case '/':
if ($op2 == 0) return $this->trigger("division by zero");
$stack->push($op1/$op2); break;
case '^':
$stack->push(pow($op1, $op2)); break;
}
// if the token is a unary operator, pop one value off the stack, do the operation, and push it back on
} elseif ($token == "_") {
$stack->push(-1*$stack->pop());
// if the token is a function, pop arguments off the stack, hand them to the function, and push the result back on
} elseif (preg_match("/^([a-z]\w*)\($/", $token, $matches)) { // it's a function!
$fnn = $matches[1];
if (in_array($fnn, $this->fb)) { // built-in function:
if (is_null($op1 = $stack->pop())) return $this->trigger("internal error");
$fnn = preg_replace("/^arc/", "a", $fnn); // for the 'arc' trig synonyms
if ($fnn == 'ln') $fnn = 'log';
eval('$stack->push(' . $fnn . '($op1));'); // perfectly safe eval()
} elseif (array_key_exists($fnn, $this->f)) { // user function
// get args
$args = array();
for ($i = count($this->f[$fnn]['args'])-1; $i >= 0; $i--) {
if (is_null($args[$this->f[$fnn]['args'][$i]] = $stack->pop())) return $this->trigger("internal error");
}
$stack->push($this->pfx($this->f[$fnn]['func'], $args)); // yay... recursion!!!!
}
// if the token is a number or variable, push it on the stack
} else {
if (is_numeric($token)) {
$stack->push($token);
} elseif (array_key_exists($token, $this->v)) {
$stack->push($this->v[$token]);
} elseif (array_key_exists($token, $vars)) {
$stack->push($vars[$token]);
} else {
return $this->trigger("undefined variable '$token'");
}
}
}
// when we're out of tokens, the stack should have a single element, the final result
if ($stack->count != 1) return $this->trigger("internal error");
return $stack->pop();
}
// trigger an error, but nicely, if need be
function trigger($msg) {
$this->last_error = $msg;
if (!$this->suppress_errors) trigger_error($msg, E_USER_WARNING);
return false;
}
}
// for internal use
class EvalMathStack {
var $stack = array();
var $count = 0;
function push($val) {
$this->stack[$this->count] = $val;
$this->count++;
}
function pop() {
if ($this->count > 0) {
$this->count--;
return $this->stack[$this->count];
}
return null;
}
function last($n=1) {
return $this->stack[$this->count-$n];
}
}
EDIT: Jitters wanted the version that supports reverse polish notation. Reminds me of my college days when I had an HP calculator :)
<?php
/* This Class can be useful for writting RPN macros or FORTH like parsers
#Author: Arturo Gonzalez-Mata Santana (Spain)
arturogmata#gmail.com
#copyright 2007: www.phpsqlasp.com
It is part of a project to recover "macros" from some old aplications
This code is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 3
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
*/
class RPNstack
{
var $data=array();
var $compare=0;
function pop() {return array_shift ($this->data);}
function push($x) {array_unshift($this->data, $x);}
function count() {return count($this->data);}
function first() {return $this->data[0];}
function top() {return end($this->data);} //last element of
function swap() { // interchange tow elements
$t = $this->data[1];
$this->data[1] = $this->data[0];
$this->data[0] = $t;
}
function dup() { // put a copy of X element in the stack
array_unshift($this->data, $this->data[0]);
}
function dump(){ // dump array data for debuging
print_r($this->data);
}
function parse($tok) // execute actions with the stack for each token
{
$r = null;
$tok = strtoupper(trim($tok));
//$this->dump(); // this line is for debugging purpose only
switch ($tok) :
// FIRST "IF THEN" AND OTHER FLOW CONTROLS
case ('THEN'): break;
case('IF'):
if ($this->pop() == 0) do { // if condition is false do nothing until "THEN"
$tok = strtoupper(strtok (" "));
} while ($tok <> "THEN"); // IF THERE IS NO "THEN" THIS SHALL BE AN ENLESS LOOP
break;
// basic math operators //OPERADORES MATEMATICOS BASICOS
case('+'):
$r = $this->pop() + $this->pop();
// $r = array_shift($this->data) + array_shift($this->data); // is more efficient but less understable
break;
case('-'):
$r = $this->pop(); $r = $this->pop()-$r;
break;
case('*'):
$r = $this->pop() * $this->pop();
break;
case('/'):
$r = $this->pop(); $r = $this->pop() / $r;
break;
// stack operators //OPERADORES DE PILA
case ('DUP'):
$r=$this->dup();
break;
case ('SWAP'):
$this->swap();
break;
// COMPARISON OPERATORS
case ('='):
if ($this->data[0] == $this->data[1]) $r = $this->push(1);
else $r = $this->push(0);
break;
case ('<>'):
if ($this->data[0] <> $this->data[1]) $r = $this->push(1);
else $r = $this->push(0);
break;
case ('<'):
if ($this->data[0] < $this->data[1]) $r = $this->push(1);
else $r = $this->push(0);
break;
case ('>'):
if ($this->data[0] > $this->data[1]) $r = $this->push(1);
else $r = $this->push(0);
break;
case ('>='):
if ($this->data[0] >= $this->data[1]) $r = $this->push(1);
else $r = $this->push(0);
break;
case ('<='):
if ($this->data[0] <= $this->data[1]) $r = $this->push(1);
else $r = $this->push(0);
break;
// WARNING FOR NON IMPLEMENTED FUNCTIONS
default:
return sprintf('I don\'t know how to "%s" ', $tok);
endswitch;
if (!is_null($r)) $this->push($r);
return $r;
} // parse
function parse_line($cadena)
{
$tok = strtok ($cadena," ");
while ($tok!= '') {
if (is_numeric ($tok)) {
$this->push($tok);
} else {
$r = $this->parse($tok);
}
$tok = strtok (" ");
}
return $r;
}
} // class RPN
?>

Categories