Building a computer algebra system


I'm creating a CAS (Computer Algebra System) in PHP, but I'm stuck right now. I am using this website.
I have written a tokenizer. It converts an expression like this:
1+2x-3*(4-5*(3x))
to this:
NUMBER PLUS_OPERATOR NUMBER VAR[X] MINUS_OPERATOR NUMBER MULTIPLY_OPERATOR GROUP
(where GROUP is itself another list of tokens). How can I simplify this expression? I know what I could do (add up the x terms), but some of them are inside the subgroup. What is the best way to handle those tokens?

A really useful next step would be to construct a parse tree:
You'd make one of these by writing an infix parser. You could either do this by writing a simple recursive descent parser, or by bringing in the big guns and using a parser generator. In either case, it helps to construct a formal grammar:
expression:     additive
additive:       multiplicative ([+-] multiplicative)*
multiplicative: primary ('*' primary)*
primary:        variable
              | number
              | '(' expression ')'
Note that this grammar does not handle the 2x syntax, but it should be easy to add.
Notice the clever use of recursion in the grammar rules. primary only captures variables, numbers, and parenthesized expressions, and stops when it runs into an operator. multiplicative parses one or more primary expressions delimited by * signs, but stops when it runs into a + or - sign. additive parses one or more multiplicative expressions delimited by + and -, but stops when it runs into a ). Hence, the recursion scheme determines operator precedence.
It isn't too terribly difficult to implement a predictive parser by hand, as I've done below (see full example at ideone.com):
function parse()
{
global $tokens;
reset($tokens);
$ret = parseExpression();
if (current($tokens) !== FALSE)
die("Stray token at end of expression\n");
return $ret;
}
function popToken()
{
global $tokens;
$ret = current($tokens);
if ($ret !== FALSE)
next($tokens);
return $ret;
}
function parseExpression()
{
return parseAdditive();
}
function parseAdditive()
{
global $tokens;
$expr = parseMultiplicative();
for (;;) {
$next = current($tokens);
if ($next !== FALSE && $next->type == "operator" &&
($next->op == "+" || $next->op == "-"))
{
next($tokens);
$left = $expr;
$right = parseMultiplicative();
$expr = mkOperatorExpr($next->op, $left, $right);
} else {
return $expr;
}
}
}
function parseMultiplicative()
{
global $tokens;
$expr = parsePrimary();
for (;;) {
$next = current($tokens);
if ($next !== FALSE && $next->type == "operator" &&
$next->op == "*")
{
next($tokens);
$left = $expr;
$right = parsePrimary();
$expr = mkOperatorExpr($next->op, $left, $right);
} else {
return $expr;
}
}
}
function parsePrimary()
{
$tok = popToken();
if ($tok === FALSE)
die("Unexpected end of token list\n");
if ($tok->type == "variable")
return mkVariableExpr($tok->name);
if ($tok->type == "number")
return mkNumberExpr($tok->value);
if ($tok->type == "operator" && $tok->op == "(") {
$ret = parseExpression();
$tok = popToken();
// popToken() returns FALSE at the end of input, so check before reading ->type
if ($tok !== FALSE && $tok->type == "operator" && $tok->op == ")")
return $ret;
die("Missing end parenthesis\n");
}
die("Unexpected $tok->type token\n");
}
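The parser reads token objects with a ->type of "number", "variable" or "operator" plus a ->value, ->name or ->op field, but the tokenizer that produces them is not shown. Below is a minimal sketch of one, assuming stdClass tokens and single-letter variables; tokenize() is a hypothetical name. It also covers the 2x syntax mentioned under the grammar by emitting an explicit "*" token wherever one operand directly follows another.

```php
<?php
// Hypothetical tokenizer for the parser above. Emits stdClass objects with
// ->type ("number", "variable" or "operator") plus ->value, ->name or ->op.
// Parentheses are "operator" tokens, as parsePrimary() expects.
// Implicit multiplication ("2x", "3(", "x(", ")(", ")x", "xy") becomes an explicit "*".
function tokenize(string $src): array
{
    $tokens = [];
    $len = strlen($src);
    $i = 0;
    while ($i < $len) {
        $ch = $src[$i];
        if (ctype_space($ch)) {
            $i++;
            continue;
        }
        if (ctype_digit($ch)) {
            // Number: digits with an optional fractional part.
            $start = $i;
            while ($i < $len && (ctype_digit($src[$i]) || $src[$i] === '.')) {
                $i++;
            }
            $text = substr($src, $start, $i - $start);
            if (!is_numeric($text)) {
                throw new InvalidArgumentException("Bad number '$text'");
            }
            $tok = (object) ['type' => 'number', 'value' => $text + 0];
        } elseif (ctype_alpha($ch)) {
            // Single-letter variables only, so "xy" means x*y.
            $tok = (object) ['type' => 'variable', 'name' => $ch];
            $i++;
        } elseif (strpos('+-*()', $ch) !== false) {
            $tok = (object) ['type' => 'operator', 'op' => $ch];
            $i++;
        } else {
            throw new InvalidArgumentException("Unexpected character '$ch' at offset $i");
        }
        // Insert "*" when something that ends an operand is followed by
        // something that starts one (a variable or an opening parenthesis).
        $prev = end($tokens);
        $prevEndsOperand = $prev !== false && ($prev->type !== 'operator' || $prev->op === ')');
        $startsOperand = $tok->type === 'variable' || ($tok->type === 'operator' && $tok->op === '(');
        if ($prevEndsOperand && $startsOperand) {
            $tokens[] = (object) ['type' => 'operator', 'op' => '*'];
        }
        $tokens[] = $tok;
    }
    return $tokens;
}
```

Because parse() reads the global $tokens, you would assign $tokens = tokenize('1+2x-3*(4-5*(3x))'); at top level and then call parse(). Unary minus is not handled, matching the grammar.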
Okay, so now you have this lovely parse tree, and even a pretty picture to go with it. Now what? Your goal (for now) might be to simply combine terms to get a result of the form:
n1*a + n2*b + n3*c + n4*d + ...
I'll leave that part to you. Having a parse tree should make things much more straightforward.
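For purely linear expressions, that last step can be a single recursive walk that returns a map from variable name to coefficient. The sketch below assumes a hypothetical array shape for the nodes, since the answer's mkNumberExpr(), mkVariableExpr() and mkOperatorExpr() are not shown here; collectTerms() is likewise a made-up name. Products are only allowed when one side is a constant, which keeps the result linear.

```php
<?php
// Sketch: fold a parse tree for a *linear* expression into constant + coefficients.
// The node arrays below are a hypothetical shape; adapt them to whatever
// mkNumberExpr(), mkVariableExpr() and mkOperatorExpr() actually build.
// Returns [variable name => coefficient], with the key '' holding the constant term.
function collectTerms(array $node): array
{
    if ($node['type'] === 'number') {
        return ['' => $node['value']];
    }
    if ($node['type'] === 'variable') {
        return [$node['name'] => 1];
    }
    $l = collectTerms($node['left']);
    $r = collectTerms($node['right']);
    switch ($node['op']) {
        case '+':
        case '-':
            $sign = $node['op'] === '+' ? 1 : -1;
            foreach ($r as $name => $coef) {
                $l[$name] = ($l[$name] ?? 0) + $sign * $coef;
            }
            return $l;
        case '*':
            // Only constant * (linear expression) stays linear.
            if (array_keys($l) === ['']) {
                [$k, $poly] = [$l[''], $r];
            } elseif (array_keys($r) === ['']) {
                [$k, $poly] = [$r[''], $l];
            } else {
                throw new DomainException('Non-linear product; this sketch only handles linear terms');
            }
            foreach ($poly as $name => $coef) {
                $poly[$name] = $k * $coef;
            }
            return $poly;
    }
    throw new DomainException("Unsupported operator {$node['op']}");
}

// Example: 1 + 2x - 3*(4 - 5*(3x)), nested the way the left-associative parser nests it.
$num = fn($v) => ['type' => 'number', 'value' => $v];
$var = fn($n) => ['type' => 'variable', 'name' => $n];
$op  = fn($o, $a, $b) => ['type' => 'operator', 'op' => $o, 'left' => $a, 'right' => $b];

$tree = $op('-',
    $op('+', $num(1), $op('*', $num(2), $var('x'))),
    $op('*', $num(3), $op('-', $num(4), $op('*', $num(5), $op('*', $num(3), $var('x'))))));

$terms = collectTerms($tree);   // ['' => -11, 'x' => 47], i.e. -11 + 47x
```

Anything beyond linear terms (x*x, division, powers) needs a richer representation than a flat coefficient map, which is where the other answers' points about dedicated machinery come in.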

PHP is good at strings, numbers, and arrays. But it is a poor language for implementing symbolic formula manipulation, because it has no native machinery for processing "symbolic expressions", for which you really want trees. Yes, you can implement all that machinery. What is harder is doing the algebraic manipulations. It's quite a lot of work if you want to build something semi-sophisticated. Ideally you want machinery that helps you write the transformations directly and easily.
For instance, how will you implement arbitrary algebra rules? Associativity and commutativity? Term "matching at a distance"? For example:
(3*a+b)-2(a-b)+a ==> 2a+3b
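Expanding that example by hand shows what such rules have to accomplish: the three a-terms sit in different parts of the expression and still have to be gathered into one.

```latex
(3a + b) - 2(a - b) + a = 3a + b - 2a + 2b + a = (3 - 2 + 1)\,a + (1 + 2)\,b = 2a + 3b
```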
You can look at how a simple CAS can be implemented using our DMS program transformation system. DMS has hard mathematical constructs like commutativity and associativity built in, and you can write algebra rules explicitly to operate on symbolic formulas.

The book Computer Algebra and Symbolic Computation: Mathematical Methods by Joel S. Cohen describes an algorithm for automatic simplification of algebraic expressions.
This algorithm is used in the Symbolism computer algebra library for C#. Going with your example, the following C# program:
var x = new Symbol("x");
(1 + 2 * x - 3 * (4 - 5 * (3 * x)))
.AlgebraicExpand()
.Disp();
displays the following at the console:
-11 + 47 * x

Related

PHP Sum of two numbers resulting in a large numbers with a + symbol [duplicate]

OK, so PHP isn't the best language for dealing with arbitrarily large integers, considering that its native integers are fixed-width signed integers (32-bit on some builds, 64-bit on most 64-bit builds; see PHP_INT_SIZE). What I'm trying to do, though, is create a class that can represent an arbitrarily large binary number and perform simple arithmetic operations on two of them (add/subtract/multiply/divide).
My target is dealing with 128-bit integers.
There are a couple of approaches I'm looking at, and problems I see with each. Any input or commentary on which you would choose and how you might go about it would be greatly appreciated.
Approach #1: Create a 128-bit integer class that stores its integer internally as four 32-bit integers. The only problem with this approach is that I'm not sure how to go about handling overflow/underflow issues when manipulating individual chunks of the two operands.
Approach #2: Use the bcmath extension, as this looks like something it was designed to tackle. My only worry in taking this approach is the scale setting of the bcmath extension, because there can't be any rounding errors in my 128-bit integers; they must be precise. I'm also worried about being able to eventually convert the result of the bcmath functions into a binary string (which I'll later need to shove into some mcrypt encryption functions).
Approach #3: Store the numbers as binary strings (probably LSB first). Theoretically I should be able to store integers of any arbitrary size this way. All I would have to do is write the four basic arithmetic functions to perform add/sub/mult/div on two binary strings and produce a binary string result. This is exactly the format I need to hand over to mcrypt as well, so that's an added plus. This is the approach I think has the most promise at the moment, but the one sticking point I've got is that PHP doesn't offer me any way to manipulate the individual bits (that I know of). I believe I'd have to break it up into byte-sized chunks (no pun intended), at which point my questions about handling overflow/underflow from Approach #1 apply.
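The overflow worry in Approach #1 goes away if each chunk (limb) is small enough that limb + limb + carry can never overflow a native PHP int. Below is a minimal sketch using eight 16-bit limbs, least significant first; add128(), sub128() and toBytes() are hypothetical helper names, not an existing library.

```php
<?php
// Sketch for Approach #1: a 128-bit unsigned value as eight 16-bit limbs,
// least significant first. A limb sum is at most 0xFFFF + 0xFFFF + 1 = 0x1FFFF,
// which fits in a PHP int even on 32-bit builds, so the carry is just the bits above bit 15.
function add128(array $a, array $b): array
{
    $sum = [];
    $carry = 0;
    for ($i = 0; $i < 8; $i++) {
        $t = $a[$i] + $b[$i] + $carry;
        $sum[$i] = $t & 0xFFFF;  // low 16 bits stay in this limb
        $carry = $t >> 16;       // 0 or 1, carried into the next limb
    }
    return $sum;                 // a carry out of the top limb is dropped (mod 2^128)
}

function sub128(array $a, array $b): array
{
    $diff = [];
    $borrow = 0;
    for ($i = 0; $i < 8; $i++) {
        $t = $a[$i] - $b[$i] - $borrow;
        $borrow = 0;
        if ($t < 0) {            // underflow: borrow 2^16 from the next limb
            $t += 0x10000;
            $borrow = 1;
        }
        $diff[$i] = $t;
    }
    return $diff;                // a borrow out of the top limb wraps (mod 2^128)
}

// Eight 16-bit words -> 16-byte binary string, little-endian (LSB first).
function toBytes(array $limbs): string
{
    return pack('v*', ...$limbs);
}
```

Multiplication follows the same schoolbook pattern, but a 16x16-bit partial product needs up to 32 unsigned bits, so on a 32-bit build you would want smaller limbs there.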

The PHP GMP extension will be better for this. As an added bonus, you can use it to do your decimal-to-binary conversion, like so:
gmp_strval(gmp_init($n, 10), 2);
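A minimal sketch of the 128-bit case with GMP, assuming the extension is enabled. GMP integers are exact and unbounded; the gmp_and() mask is only there if you want fixed-width wrap-around, and gmp_export() gives the raw big-endian bytes that the question ultimately needs.

```php
<?php
// Sketch, assuming the GMP extension is enabled. GMP integers are exact and unbounded;
// gmp_and() with 2^128 - 1 is only needed if you want fixed 128-bit wrap-around.
$mask = gmp_sub(gmp_pow(2, 128), 1);

$a = gmp_init('170141183460469231731687303715884105727'); // 2^127 - 1
$b = gmp_init(2);

$sum  = gmp_and(gmp_add($a, $b), $mask);      // 2^127 + 1
$prod = gmp_and(gmp_mul($a, $b), $mask);      // 2^128 - 2
[$quot, $rem] = gmp_div_qr($a, 10);           // integer quotient and remainder, no rounding
$wrapped = gmp_and(gmp_add($mask, 1), $mask); // 2^128 wraps to 0

echo gmp_strval($sum), "\n";
echo gmp_strval($prod, 2), "\n";              // base-2 digits as text

// gmp_export() returns big-endian bytes with no leading zeros (empty string for 0),
// so pad to a fixed 16 bytes.
$bytes = str_pad(gmp_export($sum), 16, "\0", STR_PAD_LEFT);
echo bin2hex($bytes), "\n";                   // 80, fourteen 00 bytes, 01
```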

There are already various classes available for this, so you may wish to look at them before writing your own solution (if writing your own is indeed still needed).

As far as I can tell, the bcmath extension is the one you'll want. The PHP manual is a little sparse on it, but you ought to be able to set the precision to exactly what you need with bcscale(), or with the optional scale parameter that most of the other bcmath functions accept. For integers that means a scale of 0: bcmath works on exact decimal strings, and bcdiv() simply truncates the fractional part rather than rounding. I'm not too sure about the binary strings part, but a bit of googling suggests you ought to be able to do that with the pack() function.
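pack() alone cannot take a 39-digit decimal string, so one way to get the binary string is to peel off one byte at a time with bcmod() and bcdiv(). A sketch, assuming bcmath is enabled and the value is non-negative; bc_to_bytes() and bytes_to_bc() are hypothetical helper names.

```php
<?php
// Sketch, assuming the bcmath extension is enabled. With scale 0 every result is an
// exact integer string; bcdiv() truncates instead of rounding.
bcscale(0);

// Non-negative decimal string -> big-endian binary string of $bytes bytes
// (16 = 128 bits). Bits above that width are silently dropped.
function bc_to_bytes(string $n, int $bytes = 16): string
{
    $out = '';
    for ($i = 0; $i < $bytes; $i++) {
        $out = chr((int) bcmod($n, '256')) . $out; // lowest remaining byte, prepended
        $n = bcdiv($n, '256');                     // shift right by 8 bits
    }
    return $out;
}

// Inverse: big-endian binary string -> decimal string.
function bytes_to_bc(string $bin): string
{
    $n = '0';
    for ($i = 0, $len = strlen($bin); $i < $len; $i++) {
        $n = bcadd(bcmul($n, '256'), (string) ord($bin[$i]));
    }
    return $n;
}

$x = bcmul('18446744073709551616', '3');  // 3 * 2^64, exact: 55340232221128654848
echo $x, "\n";
echo bin2hex(bc_to_bytes($x)), "\n";      // high 8 bytes end in 03, low 8 bytes are 00
```

Negative values would need a two's-complement step before bc_to_bytes().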

I implemented the following PEMDAS-compliant BC evaluator, which may be useful to you.
function BC($string, $precision = 32)
{
if (extension_loaded('bcmath') === true)
{
if (is_array($string) === true)
{
// bcscale() returns true before PHP 7.3, but the previous scale (an int) from 7.3 on
if ((count($string = array_slice($string, 1)) == 3) && (bcscale($precision) !== false))
{
$callback = array('^' => 'pow', '*' => 'mul', '/' => 'div', '%' => 'mod', '+' => 'add', '-' => 'sub');
if (array_key_exists($operator = current(array_splice($string, 1, 1)), $callback) === true)
{
$x = 1;
$result = @call_user_func_array('bc' . $callback[$operator], $string);
if ((strcmp('^', $operator) === 0) && (($i = fmod(array_pop($string), 1)) > 0))
{
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string = array_shift($string), $x, $i = pow($i, -1)));
do
{
$x = $y;
$y = BC(sprintf('((%1$s * %2$s ^ (1 - %3$s)) / %3$s) - (%2$s / %3$s) + %2$s', $string, $x, $i));
}
while (BC(sprintf('%s > %s', $x, $y)));
}
if (strpos($result = bcmul($x, $result), '.') !== false)
{
$result = rtrim(rtrim($result, '0'), '.');
if (preg_match(sprintf('~[.][9]{%u}$~', $precision), $result) > 0)
{
$result = bcadd($result, (strncmp('-', $result, 1) === 0) ? -1 : 1, 0);
}
else if (preg_match(sprintf('~[.][0]{%u}[1]$~', $precision - 1), $result) > 0)
{
$result = bcmul($result, 1, 0);
}
}
return $result;
}
return intval(version_compare(call_user_func_array('bccomp', $string), 0, $operator));
}
$string = array_shift($string);
}
$string = str_replace(' ', '', str_ireplace('e', ' * 10 ^ ', $string));
while (preg_match('~[(]([^()]++)[)]~', $string) > 0)
{
$string = preg_replace_callback('~[(]([^()]++)[)]~', __FUNCTION__, $string);
}
foreach (array('\^', '[\*/%]', '[\+-]', '[<>]=?|={1,2}') as $operator)
{
while (preg_match(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), $string) > 0)
{
$string = preg_replace_callback(sprintf('~(?<![0-9])(%1$s)(%2$s)(%1$s)~', '[+-]?(?:[0-9]++(?:[.][0-9]*+)?|[.][0-9]++)', $operator), __FUNCTION__, $string, 1);
}
}
}
return (preg_match('~^[+-]?[0-9]++(?:[.][0-9]++)?$~', $string) > 0) ? $string : false;
}
It automatically deals with rounding errors; just set the precision to however many digits you need.

Password Validation With Multiple Rules

I'm attempting to write a regex in PHP that validates the following:
At least 10 chars
Has at least 2 Upper-case characters
Has at least 2 Numbers OR Symbols
I've looked at just about every reference I can find but, to no avail.
I guess I can test individually, but that makes me very sad :(
Can someone please help? (And point me to somewhere I can learn regular expressions in plain English?)

This picture is worth more than 1000 words
(and that's a lot of entropy)
(image via XKCD)
With this in mind, you might want to consider dropping rules 2 and 3 when the password is longer than some threshold (say, 20 characters), or simply raising the minimum length to at least 16 characters and making that the only rule.
As for your requirement:
Instead of one big, ugly, hard-to-maintain, advanced RegExp, you might want to break the problem into smaller parts and tackle each one separately with dedicated functions.
For this you could look at the ctype_* functions, count_chars() and the Multibyte String functions.
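A sketch of the "one small check per rule" approach. passwordMeetsRules() is a hypothetical name; it relies on preg_match_all() returning the number of matches, so each rule becomes a simple count. "Number or symbol" is taken here to mean any byte that is not an ASCII letter, and strlen() counts bytes (mb_strlen() would count characters for multibyte input).

```php
<?php
// One small check per rule instead of a single large regular expression.
function passwordMeetsRules(string $password): bool
{
    if (strlen($password) < 10) {
        return false;                                       // at least 10 chars
    }
    if (preg_match_all('/[A-Z]/', $password) < 2) {
        return false;                                       // at least 2 upper-case letters
    }
    return preg_match_all('/[^a-zA-Z]/', $password) >= 2;  // at least 2 numbers or symbols
}

var_dump(passwordMeetsRules('ABcdefgh12'));  // bool(true)
var_dump(passwordMeetsRules('Abcdefgh12'));  // bool(false): only one upper-case letter
```

Each rule can now be changed or removed on its own, which is the maintainability argument made above.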
Now the ugly:
This advanced RegEx will return true or false according to your rules:
preg_match('/^(?=.{10,}$)(?=.*?[A-Z].*?[A-Z])(?=.*?([\x20-\x40\x5b-\x60\x7b-\x7e\x80-\xbf]).*?(?1).*?$).*$/',$string);
Test demo here: http://regex101.com/r/qE9eB2
1st part (LookAhead) : (?=.{10,}$) will check string length and continue if it has at least 10 characters. You could drop this and do a check with strlen() or even better mb_strlen().
2nd part (also a LookAhead): (?=.*?[A-Z].*?[A-Z]) will check for the presence of 2 UPPERCASE characters. You could also do $upper=preg_replace('/[^A-Z]/','',$string) instead and check that $upper contains at least two characters.
3rd LookAhead uses a character class: [\x20-\x40\x5b-\x60\x7b-\x7e\x80-\xbf] with hex-escaped character ranges for common symbols (pretty much all the symbols one could find on an average keyboard). You could also do $sym=preg_replace('/[a-zA-Z]/','',$string) instead (removing the letters) and check that $sym contains at least two characters. Note: to make it shorter I used a recursive group (?1) so as not to repeat the same character class.
For learning, the most comprehensive RegExp reference I know of is: regular-expressions.info

You can use lookaheads to make sure that what you are looking for is contained appropriately.
/(?=.*[A-Z].*[A-Z])(?=.*[^a-zA-Z].*[^a-zA-Z]).{10,}/

I have always preferred good old procedural code for handling stuff like this. Regular expressions can be useful but they can also be a little cumbersome, especially for code maintenance and quick scanning (regular expressions are not exactly examples of readability).
function strContains($string, $contains, $n = 1, $exact = false) {
$length = strlen($string);
$tally = 0;
for ($i = 0; $i < $length; $i++) {
if (strpos($contains, $string[$i]) !== false) {
$tally++;
}
}
return ($exact ? $tally == $n : $tally >= $n);
}
function validPassword($password) {
if (strlen($password) < 10) {
return false;
}
$upperChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$upperCount = 2;
if (strContains($password, $upperChars, $upperCount) === false) {
return false;
}
$numSymChars = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~';
$numSymCount = 2;
if (strContains($password, $numSymChars, $numSymCount) === false) {
return false;
}
return true;
}

Help in Converting Small Python Code to PHP

Please, I need some help converting some Python code to PHP syntax.
The code generates an alphanumeric code using alpha encoding.
The code:
def mkcpl(x):
    x = ord(x)
    set="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    for c in set:
        d = ord(c)^x
        if chr(d) in set:
            return 0,c,chr(d)
        if chr(0xff^d) in set:
            return 1,c,chr(0xff^d)
    raise Exception,"No encoding found for %#02x"%x
def mkalphadecryptloader(shcode):
    s="hAAAAX5AAAAHPPPPPPPPa"
    shcode=list(shcode)
    shcode.reverse()
    shcode = "".join(shcode)
    shcode += "\x90"*((-len(shcode))%4)
    for b in range(len(shcode)/4):
        T,C,D = 0,"",""
        for i in range(4):
            t,c,d = mkcpl(shcode[4*b+i])
            T += t << i
            C = c+C
            D = d+D
        s += "h%sX5%sP" % (C,D)
        if T > 0:
            s += "TY"
            T = (2*T^T)%16
            for i in range(4):
                if T & 1:
                    s += "19"
                T >>= 1
                if T == 0:
                    break
                s += "I"
    return s+"\xff\xe4"
Any help would be really appreciated.

I will help you a little. For the rest of it, please read up on the documentation.
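function mkcpl($x){
    $x = ord($x);
    $set = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    $set = str_split($set);
    foreach ($set as $c) {
        $d = ord($c) ^ $x;
        if (in_array(chr($d), $set, true)) {
            return array(0, $c, chr($d));
        }
        if (in_array(chr(0xff ^ $d), $set, true)) {
            return array(1, $c, chr(0xff ^ $d));
        }
    }
    // Python's "raise Exception" becomes a thrown exception (PHP's sprintf has no '#' flag)
    throw new Exception(sprintf("No encoding found for 0x%02x", $x));
}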
function mkalphadecryptloader($shcode){
$s="hAAAAX5AAAAHPPPPPPPPa";
# you could use strrev()
$shcode=str_split($shcode);
$shcode=array_reverse($shcode);
$shcode=implode("",$shcode);
# continue on... read the documentation
}
print_r(mkcpl("A"));
mkalphadecryptloader("abc");
Python                              PHP
len()   (length of string/list)     strlen() for strings, count() for arrays
range(n)  (0, 1, ..., n-1)          for ($i = 0; $i < $n; $i++)
<<                                  <<
The rest of them, like +=, == etc., are pretty much the same across the two languages.

"the rest of them, like +=, == etc are pretty much the same across the 2 languages."
Careful: in PHP, string concatenation is done with .=, not +=. If you use +=, PHP will try to add the values as numbers instead of joining them (depending on the PHP version, a non-numeric string gives 0 with a warning or throws a TypeError), and you'll be pulling your hair out trying to figure out what's wrong with your script.
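A quick illustration of the concatenation point, plus strrev() as the shortcut for Python's list/reverse/join sequence in mkalphadecryptloader(). Numeric strings are used with += here so the result does not depend on the PHP version.

```php
<?php
// Python's  s += "X5"  is  $s .= "X5"  in PHP; += always means numeric addition.
$s = "h";
$s .= "X5";             // concatenation: "hX5"

$n = "12";
$n += "3";              // numeric strings are added as numbers: int 15

$m = "12";
$m .= "3";              // concatenated: "123"

// list(shcode); shcode.reverse(); "".join(shcode)  ->  one call in PHP
$reversed = strrev("abc");  // "cba"
```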

Standard algorithm to tokenize a string, keep delimiters (in PHP)

I want to split an arithmetic expression into tokens, to convert it into RPN.
Java has the StringTokenizer, which can optionally keep the delimiters. That way, I could use the operators as delimiters. Unfortunately, I need to do this in PHP, which has strtok, but that throws away the delimiters, so I need to brew something myself.
This sounds like a classic textbook example for Compiler Design 101, but I'm afraid I'm lacking some formal education here. Is there a standard algorithm you can point me to?
My other options are to read up on lexical analysis or to roll something quick and dirty with the available string functions.

This might help.
Practical Uses of Tokenizer

As usual, I would just use a regular expression for this:
$expr = '(5*(7 + 2 * -9.3) - 8 )/ 11';
$tokens = preg_split('/([*\/^+-]+)\s*|([\d.]+)\s*/', $expr, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$tts = print_r($tokens, true);
echo "<pre>x=$tts</pre>";
It needs a little more work to accept numbers with exponent (like -9.2e-8).
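One way to add the exponent is to give the number pattern an optional [eE][+-]?\d+ suffix. The sketch below swaps preg_split() for preg_match_all() with a single token pattern, which is easier to extend; tokenizeWithExponents() is a hypothetical name. Note that characters the pattern does not recognise (spaces included) are silently skipped, and a leading "-" stays a separate token, so signs still have to be folded into numbers afterwards.

```php
<?php
// Token pattern whose number branch accepts an optional exponent (e.g. 9.2e-8).
function tokenizeWithExponents(string $expr): array
{
    preg_match_all(
        '/\d+(?:\.\d*)?(?:[eE][+-]?\d+)?'   // 12, 12.5, 12.5e-3, 1e5
        . '|\.\d+(?:[eE][+-]?\d+)?'         // .5, .5e2
        . '|[-+*\/^()]/',                   // operators and parentheses
        $expr,
        $m
    );
    return $m[0];
}

print_r(tokenizeWithExponents('(5*(7 + 2 * -9.2e-8) - 8 )/ 11'));
```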

OK, thanks to PhiLho, my final code is this, should anyone need it. It's not even really dirty. :-)
static function rgTokenize($s)
{
$rg = array();
// remove whitespace
$s = preg_replace("/\s+/", '', $s);
// split at numbers, identifiers, function names and operators
$rg = preg_split('/([*\/^+\(\)-])|(#\d+)|([\d.]+)|(\w+)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// find right-associative '-' and put it as a sign onto the following number
for ($ix = 0, $ixMax = count($rg); $ix < $ixMax; $ix++) {
if ('-' == $rg[$ix]) {
if (isset($rg[$ix - 1]) && self::fIsOperand($rg[$ix - 1])) {
continue;
} else if (isset($rg[$ix + 1]) && self::fIsOperand($rg[$ix + 1])) {
$rg[$ix + 1] = $rg[$ix].$rg[$ix + 1];
unset($rg[$ix]);
} else {
throw new Exception("Syntax error: Found right-associative '-' without operand");
}
}
}
$rg = array_values($rg);
echo join(" ", $rg)."\n";
return $rg;
}
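For the RPN step the question mentions, the usual textbook technique is Dijkstra's shunting-yard algorithm. The sketch below is that standard algorithm, not code from this thread; toRpn() is a hypothetical name. It assumes a token array like the one rgTokenize() returns, with signs already merged into numbers ("-9.3"), and it does not handle function names.

```php
<?php
// Shunting-yard: infix token array -> RPN token array.
function toRpn(array $tokens): array
{
    $prec = ['+' => 1, '-' => 1, '*' => 2, '/' => 2, '^' => 3];
    $rightAssoc = ['^' => true];
    $out = [];
    $stack = [];  // pending operators and '('
    foreach ($tokens as $tok) {
        if (isset($prec[$tok])) {
            // Pop operators that bind tighter, or equally tight for left-associative ones.
            while ($stack) {
                $top = end($stack);
                if (!isset($prec[$top])) {
                    break;  // '(' stops the popping
                }
                $higher = $prec[$top] > $prec[$tok];
                $sameLeft = $prec[$top] === $prec[$tok] && !isset($rightAssoc[$tok]);
                if (!$higher && !$sameLeft) {
                    break;
                }
                $out[] = array_pop($stack);
            }
            $stack[] = $tok;
        } elseif ($tok === '(') {
            $stack[] = $tok;
        } elseif ($tok === ')') {
            while ($stack && end($stack) !== '(') {
                $out[] = array_pop($stack);
            }
            if (!$stack) {
                throw new InvalidArgumentException("Unmatched ')'");
            }
            array_pop($stack);  // discard the '('
        } else {
            $out[] = $tok;      // operand: number or identifier
        }
    }
    while ($stack) {
        $op = array_pop($stack);
        if ($op === '(') {
            throw new InvalidArgumentException("Unmatched '('");
        }
        $out[] = $op;
    }
    return $out;
}

echo implode(' ', toRpn(['(', '1', '+', '2', ')', '*', '3'])), "\n";  // 1 2 + 3 *
```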
