Assign expression to the arc and execute it - php

I need to implement the following thing in my web-application. I know my solution is incorrect, but I put the code just to demonstrate the idea.
There is a class 'arc'. I need to be able to assign ANY expression to this arc (e.g. a+b+c,a-c,if-then). Once expression is assigned, I'd like to be able to execute it with some randomly taken variables. Is it possible to implement such functionality in web-applications? Maybe, I should use some plug-in like MathPL? Or maybe there is an absolutely different approach to tackle such kind of problems?
class arc {
    var $arcexpression;
    function setExpression($arcexpression) {
        $this->arcexpression = $arcexpression;
    }
    function getExpression() {
        return $this->arcexpression;
    }
}
$arc = new arc();
$arc->setExpression("if a>b then return a else return b");
$result = $arc->execute(a,b); // the function 'execute' should be somehow described in 'arc'

You don't need to implement a whole language for this. I would start by limiting what can be done: for example, limit your expressions to arithmetic operators (+, -, *, /), parentheses and an if-then operator. You'll need to enforce some sort of syntax for if-then to make it easier, possibly the same as PHP's ?: operator. After that you need to build a parser for this grammar only, to parse a given expression into a tree. For example, the expression 'a + b * c' would parse into something like this:
  +
 / \
a   *
   / \
  b   c
After that you'll just have to evaluate such expressions. For example, by passing an array into your evaluate function of type { a => 1, b => 2, c => 3 }, you'll get 7 out of it.
The idea of the parse is the following:
Start from position 1 in the string - and call a recursive function to parse data from that position. In the function, start reading from the specified position.
If you read an opening parenthesis, call itself recursively
If you encounter a closing parenthesis or end-of-string, return the root node
Read the first identifier (or recursively inside parentheses)
Read the arithmetic sign
Read the second identifier (or recursively inside parentheses)
If the sign is * or /, then create the node with the sign in it and two operands as children and attach that node as the corresponding (left or right) child of the previous operator.
If the sign is + or -, then create the node with the sign in it, one of the children being one of the operands and the second child being the root of the subtree with * and / at the root (or the second operand, if it's a simple operation).
Getting pure arithmetic, with parentheses, working is easy; if-then is a bit more tricky, but still not too bad. About 10 years ago I had to implement something like this in Java. It took me about 3 days to get everything sorted and was in total about 500 lines of code in 1 class, not counting javadoc. I suspect in PHP it will be less code, due to sheer simplicity of PHP syntax and type conversions.
It may sound complicated, but, in reality, it's much easier than it seems once you start doing it. I remember very well a university assignment to do something similar as part of the algorithms class, 17-18 years ago.
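As a rough illustration of the parse-then-evaluate idea above, here is a minimal sketch. It uses plain recursive descent with one method per precedence level, which is a common variant rather than the exact node-attaching procedure described in the steps, and it assumes the if-then form is written like PHP's ?: operator. The class ExprParser, the function evaluate() and the array-based node layout are hypothetical names chosen for this example.

```php
<?php
// Sketch only: class, method, and node names are hypothetical.
final class ExprParser
{
    private $tokens;
    private $pos = 0;

    public function __construct(string $src)
    {
        // Numbers, identifiers, and single-character operators; other characters are skipped.
        preg_match_all('~\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()?:<>]~', $src, $m);
        $this->tokens = $m[0];
    }

    public function parse(): array
    {
        $tree = $this->ternary();
        if ($this->peek() !== null) {
            throw new InvalidArgumentException('Unexpected token: ' . $this->peek());
        }
        return $tree;
    }

    private function peek()
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function take(?string $expected = null): string
    {
        $tok = $this->peek();
        if ($tok === null || ($expected !== null && $tok !== $expected)) {
            throw new InvalidArgumentException('Expected ' . ($expected ?? 'an operand'));
        }
        $this->pos++;
        return $tok;
    }

    // Lowest precedence: cond ? a : b
    private function ternary(): array
    {
        $cond = $this->comparison();
        if ($this->peek() !== '?') {
            return $cond;
        }
        $this->take('?');
        $then = $this->ternary();
        $this->take(':');
        return ['?', $cond, $then, $this->ternary()];
    }

    private function comparison(): array { return $this->binary(['<', '>'], 'sum'); }
    private function sum(): array        { return $this->binary(['+', '-'], 'product'); }
    private function product(): array    { return $this->binary(['*', '/'], 'atom'); }

    // Left-associative chain of operators at one precedence level.
    private function binary(array $ops, string $next): array
    {
        $left = $this->$next();
        while (in_array($this->peek(), $ops, true)) {
            $op = $this->take();
            $left = [$op, $left, $this->$next()];
        }
        return $left;
    }

    // A parenthesised sub-expression, a number, or a variable name.
    private function atom(): array
    {
        $tok = $this->take();
        if ($tok === '(') {
            $inner = $this->ternary();
            $this->take(')');
            return $inner;
        }
        if (is_numeric($tok)) {
            return ['num', $tok + 0];
        }
        if (preg_match('~^[A-Za-z_]\w*$~', $tok)) {
            return ['var', $tok];
        }
        throw new InvalidArgumentException("Unexpected token: $tok");
    }
}

// Walk the tree, looking variables up in $vars.
function evaluate(array $node, array $vars)
{
    list($op, $a) = $node;
    if ($op === 'num') {
        return $a;
    }
    if ($op === 'var') {
        if (!array_key_exists($a, $vars)) {
            throw new InvalidArgumentException("Unknown variable: $a");
        }
        return $vars[$a];
    }
    if ($op === '?') {
        return evaluate($a, $vars) ? evaluate($node[2], $vars) : evaluate($node[3], $vars);
    }
    $l = evaluate($a, $vars);
    $r = evaluate($node[2], $vars);
    switch ($op) {
        case '+': return $l + $r;
        case '-': return $l - $r;
        case '*': return $l * $r;
        case '/': return $l / $r;
        case '<': return $l < $r;
        case '>': return $l > $r;
    }
    throw new InvalidArgumentException("Unknown operator: $op");
}
```

With this sketch, evaluate((new ExprParser('a + b * c'))->parse(), ['a' => 1, 'b' => 2, 'c' => 3]) gives 7, and 'a > b ? a : b' with a = 5 and b = 3 gives 5. Division by zero and unary minus are not handled.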

Related

Combine regex and string analysis to specify a required pattern for string input validation

I should firstly apologize for my probably rookie question, but I've just got no clue how to achieve that relatively complex task being a complete newbie regarding regex. What I need is to specify a validation pattern for a string input and perform separate checks on the separate segments of that pattern. So let's begin with the task itself. I'm working with php7.0 on laravel 5.4 (which should genuinely not make any difference) and I need to somehow produce a matching pattern for a string input, which pattern is the following:
header1: expression1; header2: expression2; header3: expression3 //etc...
What I'd need here is to check if each header is present and if it's present in a special validation list of available headers. So I'd need to extract each header.
Furthermore the expressions are built as follows
expression1 = (a1 + a2)*(a3-a1)
expression2 = b1*(b2 - b3)/b4
//etc...
The point is that each expression contains some numeric parameters which should form a valid arithmetic calculation. Those parameters should also be contained in a special list of available parameter placeholders, so I'd need to check them too. So, is there a simple efficient way (using regex and string analysis in pure php) to specify that strict structure or should I do everything step by step with exploding and try-catching?
An optimal solution would be a shorthand logic (or regex expression?) of a kind like:
$value->match("^n(header: expression)")
->delimitedBy(';')
->where(in_array($header, $allowed_headers))
->where(strtr($expression, array_fill_keys($available_param_placeholders, 0))->isValidArithmeticExpression())
I hope you can follow my logic. The code above would read as: Match N repetitions of the pattern "header: expression", delimited by ';', where 'header' (given that $header is its value) is in an array and where 'expression' (given that $expression is its value) forms a valid arithmetic expression when all available parameter placeholders have been replaced by 0. That's it all. Each deviation of that strict pattern should return false.
As an alternative I'm currently thinking of something like first exploding the string by the main delimiter (the semicolon) and then analysing each part separately. So I'll then have to check if there is a colon present, then if everything to the left of the colon matches a valid header name and if everything to the right of the colon forms a valid arithmetic expression when all param names from the list are replaced by a random value (like 0, just to check if the code executes, which I also don't know how to do). Anyway, that way seems like overkill and I'm sure there should be a smoother way to specify the needed pattern.
I hope I've explained everything well enough, and sorry if I'm being too messy in explaining my problem. Thanks in advance for each piece of advice/help! Greatly appreciated!
Using eval() must always be Plan Z. With my understanding of your input string, this method may sufficiently validate the headers and expressions (if not, I think it should sufficiently sanitize the string for arithmetic parsing). I don't code in Laravel, so if this can be converted to Laravel syntax I'll leave that job for you.
Code: (Demo)
$test = "header1: (a1 + a2)*(a3-a1); header2: b1*(b2 - b3)/b4; header3: c1 * (((c2); header4: ((a1 * (a2 - b1))/(a3-a1))+b2";
$allowed_headers=['header1','header3','header4'];
$pairs=explode('; ',$test);
foreach($pairs as $pair){
    list($header,$expression)=explode(': ',$pair,2);
    if(!in_array($header,$allowed_headers)){
        echo "$header is not permitted.";
    }elseif(!preg_match('~^((?:[-+*/ ]+|[a-z]\d+|\((?1)\))*)$~',$expression)){ // based on https://stackoverflow.com/a/562729/2943403
        echo "Invalid expression # $header: $expression";
    }else{
        echo "$header passed.";
    }
    echo "\n---\n";
}
Output:
header1 passed.
---
header2 is not permitted.
---
Invalid expression # header3: c1 * (((c2)
---
header4 passed.
---
I will admit the above pattern will match (+ )( +), so it is not the best pattern. So perhaps your question may be a candidate for using eval(). Although you may want to consider/research some of the GitHub creations / plugins / parsers that can parse/tokenize an arithmetic expression first.
Perhaps:
calculate math expression from a string using eval
How to evaluate formula passed as string in PHP?
Parse math operations with PHP
How to mathematically evaluate a string like "2-1" to produce "1"?
Any $pair that gets past the if and the elseif can move onto the evaluation process in the else.
I'll give you a headstart/hint about some general handling, but I'll shy away from giving any direct instruction to avoid the wrath of a certain population of critics.
}else{
// replace all variables with 0
//$expression=preg_replace('/[a-z]\d+/','0',$expression);
// or replace each unique variable with a whole number
$expression=preg_match_all('/[a-z]\d+/',$expression,$out)?strtr($expression,array_flip($out[0])):$expression; // variables become incremented whole numbers
// ... from here use $expression with eval() in a style/intent of your choosing.
// ... set a battery of try and catch statements to handle unsavory outcomes.
// https://www.sitepoint.com/a-crash-course-of-changes-to-exception-handling-in-php-7/
}
// $allowed_headers and param_fails() are assumed to be defined elsewhere.
$test = "header1: (a1 + a2)*(a3-a1); header2: b1*(b2 - b3)/b4; header3: expression3";
$pairs = explode(';', $test);
$headers = [];
$expressions = [];
foreach ($pairs as $p) {
    // Limit to 2 parts so only the first colon separates header and expression.
    $he = explode(':', $p, 2);
    $headers[] = trim($he[0]);
    $expressions[] = isset($he[1]) ? trim($he[1]) : '';
}
foreach ($headers as $h) {
    if (!in_array($h, $allowed_headers)) {
        return false;
    }
}
foreach ($expressions as $e) {
    preg_match_all('/[a-z0-9]+/', $e, $matches);
    // $matches[0] holds the list of full-pattern matches.
    foreach ($matches[0] as $m) {
        if (param_fails($m)) {
            echo "Expression $e contains forbidden param $m.";
        }
    }
}
Regex turned out to be less complicated than I thought when posting the question, so I've managed to build the complete pattern myself, with the initial head start owed to @mickmackusa. What I finally came up with is here, explained by regex101 itself: https://regex101.com/r/UHMrqL/1
The logic it's based on is described in the initial question. The only thing missing is verification of the header values and the param names, but that's easy to match afterwards with preg_match_all and verify with pure PHP checks. Thanks again for the attention and the help! :)
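For illustration, the explode-then-check approach discussed in this thread might look like the sketch below. It is not the regex101 pattern linked above. The function names validateRules() and isWellFormed() are hypothetical, the structural check assumes binary operators only (no unary minus), and parameter names are assumed to start with a letter.

```php
<?php
// Hypothetical helper: returns a list of problems, empty when the input is valid.
function validateRules(string $input, array $allowedHeaders, array $allowedParams): array
{
    $errors = [];
    foreach (explode(';', $input) as $segment) {
        $parts = explode(':', $segment, 2);
        if (count($parts) !== 2) {
            $errors[] = "Missing colon in '" . trim($segment) . "'";
            continue;
        }
        $header = trim($parts[0]);
        $expression = trim($parts[1]);
        if (!in_array($header, $allowedHeaders, true)) {
            $errors[] = "Header '$header' is not permitted";
        }
        // Every identifier-like token must be a known placeholder.
        preg_match_all('/[a-z]\w*/i', $expression, $m);
        foreach (array_unique($m[0]) as $param) {
            if (!in_array($param, $allowedParams, true)) {
                $errors[] = "Unknown parameter '$param' in $header";
            }
        }
        if (!isWellFormed($expression)) {
            $errors[] = "Malformed expression in $header";
        }
    }
    return $errors;
}

// Structural check: operands and operators must alternate and parentheses must balance.
function isWellFormed(string $expression): bool
{
    $expression = trim($expression);
    // \G makes each match start exactly where the previous one ended.
    preg_match_all('~\G\s*([a-z]\w*|\d+(?:\.\d+)?|[-+*/()])~i', $expression, $m);
    if ($expression === '' || strlen(implode('', $m[0])) !== strlen($expression)) {
        return false; // empty, or contains characters that are not tokens
    }
    $depth = 0;
    $expectOperand = true;
    foreach ($m[1] as $token) {
        if ($token === '(') {
            if (!$expectOperand) {
                return false;
            }
            $depth++;
        } elseif ($token === ')') {
            if ($expectOperand || --$depth < 0) {
                return false;
            }
        } elseif (in_array($token, ['+', '-', '*', '/'], true)) {
            if ($expectOperand) {
                return false;
            }
            $expectOperand = true;
        } else { // a parameter name or a number
            if (!$expectOperand) {
                return false;
            }
            $expectOperand = false;
        }
    }
    return $depth === 0 && !$expectOperand;
}
```

Unlike the recursive regex earlier in the thread, this check rejects inputs such as (+ )( +), because an operator is only accepted after an operand.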

Converting pseudocode into usable (using regular expression?)

As part of the system I am writing, users can create their own custom Rules, to be run when certain events happen.
There are a set number of Objects they can use to create these rules, all of which have a set number of properties and methods:
So as an example of a rule, we could say:
“if this unit award is ‘Distinction’ then set all the criteria on this unit to award ‘Achieved’”
IF UNIT.award equals “Distinction”
THEN UNIT.criteria.set_award(‘A’)
“else if this unit award is ‘Merit’ then set the award of any criteria on this unit whose name starts with either ‘P’ or ‘M’ to ‘Achieved’”
IF UNIT.award equals “Merit”
THEN UNIT.criteria.filter(‘starts’, ‘name’, ‘P’, ‘M’).set_award(‘A’)
“else if this unit award is ‘Pass’ then set the award of any criteria on this unit whose name starts with ‘P’ to ‘Achieved’”
IF UNIT.award equals “Pass”
THEN UNIT.criteria.filter(‘starts’, ‘name’, ‘P’).set_award(‘A’)
The problem I am having, is I am just not sure how to take that string of object, properties & methods, e.g. “UNIT.criteria.filter(‘starts’, ‘name’, ‘P’).set_award(‘A’)” and convert it into something usable.
The end result I’d like to convert the string to would be something along the lines of:
So I can then convert that into the actual proper objects and return the relevant values or run the relevant methods.
Since there is only a set number of things I need to support (for now at least) and I don’t need anything complex like calculation support or variables, it seems overkill to create a Lexer system, so I was thinking of just using a regular expression to split all the sections.
So using the examples above, I could do a simple split on the “.” character, but if that character is used in a method parameter, e.g. “CRITERION.filter(‘is’, ‘name’, ‘P.1’)” then that screws it up completely.
I could use a less common character to split them, for example a double colon or something “::” but if for whatever reason someone puts that into a parameter it will still cause the same problem. I’ve tried creating a regular expression that splits on the character, only if it’s not between quotes, but I haven’t been able to get it to work.
So basically my question is: would a regular expression be the best way to do this? (If so, could anyone help me with getting it to ignore the specified character if it’s in a method). Or is there another way I could do this that would be easier/better?
Thanks.
I'd think an ORM language like eloquent could do this for you.
But if I had to do this then first I'd split the IF THEN ELSE parts.
Leaving:
UNIT.award equals “Distinction”
UNIT.criteria.filter(‘starts’, ‘name’, ‘P’, ‘M’).set_award(‘A’)
I'm guessing the "equals" could also be "not equals" or "greater" so...
I'd split the first bit around that.
/(?'ident'[a-z.]*?) (?'expression'equals|greater) (?'compare'[0-9a-z\“\”]+)/gi
But an explode around 'equals' will do the same.
Then I'd explode the second part around the dots.
Giving:
UNIT
criteria
filter(a,b,c,d)
set_award(e)
Pop off the first 2 to get object and property and then a list of possible filters and actions.
But frankly I would develop a language that does not mix properties with actions and filters.
Something like:
IF object.prop EQUALS const|var
THEN UPDATE object.prop
WITH const|var [WHERE object.prop filter const|var [AND|OR const|var]]
Eloquent does it straight in php:
DB::table('users')
->where('id', 1)
->update(['votes' => 1]);
So maybe I'd do something like:
THEN object.prop->filter(a,b,c,d)->set('award','A')
This makes it easy to split actions around -> and properties around .
Anyway...
I do my Regex on https://regex101.com/
Hope this helps.
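To illustrate the splitting problem raised in the question (a dot inside a quoted parameter such as 'P.1'), a small character scanner can be sketched as below instead of a single regular expression. The function names splitChain() and parseSegment() are hypothetical, and the sketch assumes straight quotes; the typographic quotes shown in the examples would need to be normalised first.

```php
<?php
// Hypothetical helper: split a rule on dots that are outside quotes and parentheses.
function splitChain(string $rule): array
{
    $segments = [];
    $current = '';
    $quote = null;   // the quote character we are currently inside, if any
    $depth = 0;      // parenthesis nesting level
    $length = strlen($rule);
    for ($i = 0; $i < $length; $i++) {
        $ch = $rule[$i];
        if ($quote !== null) {
            if ($ch === $quote) {
                $quote = null;            // closing quote
            }
        } elseif ($ch === "'" || $ch === '"') {
            $quote = $ch;                 // opening quote
        } elseif ($ch === '(') {
            $depth++;
        } elseif ($ch === ')') {
            $depth--;
        } elseif ($ch === '.' && $depth === 0) {
            $segments[] = $current;       // a real separator: finish this segment
            $current = '';
            continue;
        }
        $current .= $ch;
    }
    $segments[] = $current;
    return $segments;
}

// Classify one segment as a property or as a method call with quoted arguments.
function parseSegment(string $segment): array
{
    $segment = trim($segment);
    if (!preg_match('/^(\w+)\((.*)\)$/s', $segment, $m)) {
        return ['type' => 'property', 'name' => $segment];
    }
    // Collect the contents of each quoted argument.
    preg_match_all('/(["\'])(.*?)\1/', $m[2], $args);
    return ['type' => 'method', 'name' => $m[1], 'args' => $args[2]];
}

$parts = splitChain("UNIT.criteria.filter('is', 'name', 'P.1').set_award('A')");
$parsed = array_map('parseSegment', $parts);
```

Here $parts holds four segments (UNIT, criteria, the filter call and the set_award call), and the parsed filter segment carries the arguments is, name and P.1 with the dot intact.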

PHP regex parsing - splitting tokens in my own language. Is there a better way?

I am creating my own language.
The goal is to "compile" it to PHP or Javascript, and, ultimately, to interpret and run it on the same language, to make it look like a "middle-level" language.
Right now, I'm focusing on the aspect of interpreting it in PHP and run it.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, that isn't important).
The goal is to get what is after the :, including the #. :static#sub is a whole token, saving it without the :.
Everything is the same, except for the token :cons, which can take a value after. The value is a numerical value (integer or float, called static or dynamic in the language, respectively) or a string, which must start and end with ", supporting escaping like \". Multi-line strings aren't supported.
Variables are the ones with ~0, using ^ before will get the value to the above :scope.
Functions are similar, being used &0 instead and &-1 points to the current function (no need for ^&-1 here).
That said, is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2
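[Update] To address the pattern's complexity and maintainability, you can split it across lines using PCRE_EXTENDED (the x modifier) and add comments:
preg_match_all('/
# read constant (?)
\:((?:cons\#(?:\d+(?:\.\d+)?|
# read a string (?)
(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|
# read an identifier (?)
(?:[a-z]+(?:\#[a-z]+)?|
# read whatever
\^?[\~\&](?:[a-z]+|\d+|\-1)))
/x', $input, $matches);
Beware that in extended mode all whitespace in the pattern is ignored unless it is escaped or inside a character class (\n is normally "safe"), and a literal # must be escaped as \# because an unescaped # starts a comment. PHP has no g modifier; preg_match_all returns every match instead.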
Now, if you want to improve your lexer and parser, read on:
What (f)lex [GNU equivalent of LEX] does is simply let you pass a list of regexps, and optionally a "group". You can also try ANTLR and its PHP Target Runtime to get the work done.
As for your request, I've made a lexer in the past, following the principle of FLEX. The idea is to cycle through the regexps like FLEX does:
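// reg1..reg3 and the token kinds are placeholders for your own patterns and constants.
// Each pattern must be anchored to the start of the input (^), or it could match further along.
$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS];
$input = ...;
$tokens = [];
while ($input !== '') {
    $best = null;
    $k = null;
    foreach ($regexp as $re => $kind) {
        if (preg_match($re, $input, $match)) {
            $best = $match[0];
            $k = $kind;
            break;
        }
    }
    // An empty match would never advance, so treat it as an error too.
    if (null === $best || '' === $best) {
        throw new Exception("could not analyze input, invalid token");
    }
    $tokens[] = ['kind' => $k, 'value' => $best];
    $input = (string) substr($input, strlen($best)); // move.
}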
Since FLEX and Yacc/Bison integrate, the usual pattern is to read only until the next token (that is, they don't loop over the whole input before parsing).
The $regexp array can be anything. I expected it to be a "regexp" => "kind" key/value map, but you can also use an array like this:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
You can also enable or disable regexps using groups (the way FLEX groups work). For example, consider the following code:
class Foobar {
const FOOBAR = "arg";
function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what comes after the "="). And there is no need to activate the class identifier regexp when you are already inside a class.
FLEX's groups make it possible to read comments: a first regexp activates a group that ignores the other regexps until some closing match (like "*/") is found.
Note that this is a naïve approach: a lexer like FLEX actually generates an automaton, which uses different states to represent your needs (a regexp is itself an automaton).
It uses an algorithm of packed indexes or something similar (I used the naïve "for each" because I did not understand the algorithm well enough), which is efficient in both memory and speed.
As I said, it was something I made in the past - something like 6/7 years ago.
It was on Windows.
It was not particularly quick (well, it is O(N²) because of the two loops).
I also think that PHP was compiling the regexp each time. Now that I work in Java, I use the Pattern implementation, which compiles the regexp once and lets you reuse it. I don't know whether PHP does the same by first looking in a regexp cache for an already compiled regexp.
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
You should try to use the ANTLR3 PHP Code Generation Target, since the ANTLR grammar editor is pretty easy to use, and you will end up with much more readable and maintainable code :)
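The offset-based variant mentioned above can be sketched as follows. It relies on preg_match's $offset argument together with the A (anchored) modifier, so each pattern must match exactly at the current position and no substr() copy is needed. The function lex() and the $rules table are hypothetical, and the token kinds are a generic illustration, not the asker's token definitions.

```php
<?php
// Hypothetical lexer: $rules maps a token kind to a pattern (without the A modifier).
function lex(string $input, array $rules): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        $matched = false;
        foreach ($rules as $kind => $pattern) {
            // Appending A anchors the match at $offset; the first rule that matches wins.
            if (preg_match($pattern . 'A', $input, $m, 0, $offset) && $m[0] !== '') {
                if ($kind !== 'WS') {          // whitespace is recognised but dropped
                    $tokens[] = ['kind' => $kind, 'value' => $m[0]];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new RuntimeException("Invalid token at offset $offset");
        }
    }
    return $tokens;
}

// Illustrative token table; order matters because the first match wins.
$rules = [
    'WS'     => '/\s+/',
    'NUMBER' => '/\d+(?:\.\d+)?/',
    'STRING' => '/"(?:\\\\.|[^"\\\\])*"/',
    'ID'     => '/[a-z_]\w*/i',
    'PUNCT'  => '/[:#~&^-]/',
];

$tokens = lex(':return:cons#1.5', $rules);
```

For the sample input this yields six tokens: ':', 'return', ':', 'cons', '#' and the NUMBER '1.5'. A production lexer would more likely pick the longest match across all rules, as FLEX does, rather than the first.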

Parse indented list into boolean tree

I have a piece of text like the following
foo
and foo2
and bar
    or something
        and somethingElse
            or somethingElse2
            or somethingElse3
and baz
    or godknows
    or godknows2
This should be interpreted as:
(
foo
&& foo2
&& (bar || (something && (somethingElse || somethingElse2 || somethingElse3)))
&& (baz || godknows || godknows2)
)
At the moment I'm reading line by line. I know that I need to measure the indentation and parse the expression of the next line in order to figure out the expression that the current line belongs to, but I'm having trouble figuring out how to do that usefully without consuming the next line too.
It seems like the kind of problem which has a recursive solution, but it's escaping me.
The input format isn't fixed, I just want to be able to turn a relatively readable expression into a tree of booleans, so if you can answer with a more suitable format which is still readable, please do :)
Python, which uses this style of indentation, does its parsing by maintaining a stack of indentation levels. Upon seeing a new line, it determines whether it has been indented from the previous line by seeing whether the current depth has increased. If so, Python pretends that there was an invisible symbol called "INDENT" that was inserted into the input stream. It then pushes the new depth onto the stack.
If the indentation decreases, Python repeatedly pops the stack and pretends that an invisible symbol called "DEDENT" was inserted into the input stream until the indentation level matches the value on the stack.
You could probably adapt this approach very easily here by replacing "INDENT" and "DEDENT" with ( and ). You would need to do a minor transformation afterwards by making sure that the ( token was inserted before the previous variable, but I'd expect this isn't too hard.
With that change, you should be able to parse this extremely easily. For example, the script
A
and B
    or C
        and D
or E
Would transform into
A and (B or (C and D)) or E
Hope this helps!
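A minimal sketch of the indentation-stack idea, under the assumptions that each line is either "operand" or "operator operand" and that indentation uses spaces. The function name indentedToExpression() is hypothetical. An increase in depth inserts "(" in front of the previous operand, and each level popped on a decrease appends ")".

```php
<?php
// Hypothetical helper: turns an indented and/or list into a parenthesised expression.
function indentedToExpression(string $text): string
{
    $stack = [];   // indentation levels that are currently open
    $parts = [];   // one "[operator] operand" fragment per line

    foreach (preg_split('/\R/', $text) as $line) {
        if (trim($line) === '') {
            continue;
        }
        $indent = strlen($line) - strlen(ltrim($line, ' '));
        $words = preg_split('/\s+/', trim($line));
        $operand = array_pop($words);
        $prefix = $words ? implode(' ', $words) . ' ' : '';
        $last = count($parts) - 1;

        if (!$stack) {
            $stack[] = $indent;                  // the first line sets the base level
        } elseif ($indent > end($stack)) {
            // INDENT: open a group in front of the previous operand
            $stack[] = $indent;
            $parts[$last]['operand'] = '(' . $parts[$last]['operand'];
        } else {
            // DEDENT: close one group per level popped off the stack
            while (count($stack) > 1 && $indent < end($stack)) {
                array_pop($stack);
                $parts[$last]['operand'] .= ')';
            }
        }
        $parts[] = ['prefix' => $prefix, 'operand' => $operand];
    }

    // Close the groups that are still open at the end of the input.
    for ($i = count($stack); $i > 1; $i--) {
        $parts[count($parts) - 1]['operand'] .= ')';
    }

    $out = [];
    foreach ($parts as $p) {
        $out[] = $p['prefix'] . $p['operand'];
    }
    return implode(' ', $out);
}

echo indentedToExpression("A\nand B\n    or C\n        and D\nor E");
// prints: A and (B or (C and D)) or E
```

The sketch does not detect inconsistent indentation (a dedent to a level that was never opened); a real implementation would report that as an error, as Python does.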

Can you explain Perl's hash system to a PHP guy?

How do Perl hashes work?
Are they like arrays in PHP or some completely different beast?
From what I understand all it is is an associative array right? This is what I thought until I began
to talk to a Perl programmer who told me I was completely wrong, but couldn't explain it in a way
that didn't make my eyes cross.
Anyway, the way that I thought it worked was like this
PHP's:
$argv['dog_name'] = 'missy';
$argv[0] = 'tree';
same as Perl's:
my %argv{'dog_name'} = 'missy';
my $argv[0] = 'tree';
Right? But you cannot print(%argv{'dog_name'}), you have to (revert?) to print($argv{'dog_name'}) which is confusing?
Is it trying to print as a variable now, like you would in PHP, echo $argv['dog_name']; ? Does this mean (again) that a hash is
just a PHP associative array with a % to declare but a $ to access?
I don't know, I'm hoping some PHP/Perl Guru can explain how hashes work, and how similar they are
to PHP's arrays.
To write
$argv['dog_name'] = 'missy';
$argv[0] = 'tree';
in Perl, you would write it as follows:
$argv{dog_name} = 'missy';
$argv{0} = 'tree';
If you have strict on, which you should, then you will need to predeclare the variable:
my %argv;
$argv{dog_name} = 'missy';
$argv{0} = 'tree';
If the above is a bit repetitive for you, you could write it:
my %argv = (
dog_name => 'missy',
0 => 'tree',
);
You can find more detail on the perldata manpage.
In short, the reason the sigil changes from % to $ is that %hash refers to the plural hash (a list of key/value pairs), and $hash{foo} refers to a single element of the hash. The same goes for arrays, where @ refers to the full array and $ refers to a single element. (For both arrays and hashes, a leading @ sigil with a subscript means a slice of the data, where multiple keys are passed and a list of values is returned.)
To elaborate slightly on Ambrose's answer, the reason for your confusion is the difference between the philosophy of using sigils in Perl and PHP.
In PHP, the sigil is attached to the identifier. E.g. a hash identifier will ALWAYS have a hash sigil around it.
In Perl, a sigil is attached to the way you are accessing the data structure (are you accessing 1 value, a list of values, or a whole hash of values) - for details see other excellent answers such as Eric's.
%argv{'dog_name'} is a syntax error. You need $argv{'dog_name'} instead.
But you are correct that a perl hash is just an associative array (why perl chose to use a different terminology, I don't know).
For a complete understanding of hashes, I recommend reading any of the vast number of perl tutorials or books that cover the topic. Programming Perl is an excellent choice, or here's a random online tutorial I found as well.
I would, as Flimzy, also recommend Programming Perl. As a recent PHP to Perl convert myself, it has taught me a great deal about the language.
The % symbol is used to create a full 'associative array', as we would think of it. For example, I could create an associative array by doing the following:
%hash = ('key1' => 'value1', 'key2' => 'value2');
I could then print it out like so:
print %hash;
The output would be something like:
'key2value2key1value1'
This is, I believe, known as 'list context', since the % indicates that we are talking about a range of values.
On the other hand, if I wanted to access a single value, we would have to use the $ sigil. This, as 'Programming Perl' tells us, can be thought of as an 'S' for 'Scalar'. We have to use the $ sign whenever we are talking about a singular value.
So, to access an individual item in the array, I would have to use the following syntax:
print $hash{'key1'};
The same is true of arrays. A full array can be created like so:
@array = ('abc', '123');
and then printed like so:
print @array;
But, to access a single element of the array I would type instead:
print $array[0];
There are lots of basic principles here. You should read about 'list context' and 'scalar context' in some detail. Before long you will also want to look at references, which are the things you use to create multidimensional structures in Perl. I really would recommend 'Programming Perl'! It was a difficult read in chapters, but it certainly does cover everything you need to know (and more).
The sigil changing really isn't as complicated as you make it sound. You already do this in English without thinking about it.
If you have a set of cars, then you would talk about "these cars" (or "those cars"). That's like an array.
my @cars = ('Vauxhall', 'Ford', 'Rolls Royce');
If you're talking about just one car from that set, you switch to using "this car". That's like a single element from an array.
say $cars[1]; # prints 'Ford'
Similar rules also apply to hashes.
I would say your confusion is partly caused by one simple fact. Perl has different sigils for different things. PHP has one sigil for everything.
So whether you're putting something into an array/hash, or getting something out, or declaring a simple scalar variable, in PHP you always use the dollar sign.
With perl you need to be more specific, that's all.
The "sigil", i.e. the character before the variable name, denotes the amount of data being accessed, as follows:
If you say $hash{key}, you are using scalar context, i.e. one value.
For plural or list context, the sigil changes to @, so @hash{('key1', 'key2')} returns a list of the two values associated with the two keys respectively (it might be written as @hash{qw(key1 key2)}, too).
%hash is used to access the hash as a whole.
The same applies to arrays: $arr[0] = 1, but @arr[1 .. 10] = (10) x 10.
I hope that you are not expecting to get a full tutorial on Perl hashes here. You don't need a Perl guru to explain hashes to you, just a simple Google search.
http://www.perl.com/pub/2006/11/02/all-about-hashes.html
PS: please increase your accept ratio - 62% is pretty low
