Converting pseudocode into usable (using regular expression?)

Converting pseudocode into usable (using regular expression?) - php

As part of the system I am writing, users can create their own custom Rules, to be run when certain events happen.
There are a set number of Objects they can use to create these rules, all of which have a set number of properties and methods:
So as an example of a rule, we could say:
“if this unit award is ‘Distinction’ then set all the criteria on this unit to award ‘Achieved’”
IF UNIT.award equals “Distinction”
THEN UNIT.criteria.set_award(‘A’)
“else if this unit award is ‘Merit’ then set the award of any criteria on this unit whose name starts with either ‘P’ or ‘M’ to ‘Achieved’”
IF UNIT.award equals “Merit”
THEN UNIT.criteria.filter(‘starts’, ‘name’, ‘P’, ‘M’).set_award(‘A’)
“else if this unit award is ‘Pass then set the award of any criteria on this unit whose name starts with ‘P’ to ‘Achieved’”
IF UNIT.award equals “Merit”
THEN UNIT.criteria.filter(‘starts’, ‘name’, ‘P’).set_award(‘A’)
The problem I am having, is I am just not sure how to take that string of object, properties & methods, e.g. “UNIT.criteria.filter(‘starts’, ‘name’, ‘P’).set_award(‘A’)” and convert it into something usable.
The end result I’d like to convert the string to would be something along the lines of:
So I can then convert that into the actual proper objects and return the relevant values or run the relevant methods.
Since there is only a set number of things I need to support (for now at least) and I don’t need anything complex like calculation support or variables, it seems overkill to create a Lexer system, so I was thinking of just using a regular expression to split all the sections.
So using the examples above, I could do a simple split on the “.” character, but if that character is used in a method parameter, e.g. “CRITERION.filter(‘is’, ‘name’, ‘P.1’)” then that screws it up completely.
I could use a less common character to split them, for example a double colon or something “::” but if for whatever reason someone puts that into a parameter it will still cause the same problem. I’ve tried creating a regular expression that splits on the character, only if it’s not between quotes, but I haven’t been able to get it to work.
So basically my question is: would a regular expression be the best way to do this? (If so, could anyone help me with getting it to ignore the specified character if it’s in a method). Or is there another way I could do this that would be easier/better?
Thanks.

I'd think an ORM language like eloquent could do this for you.
But if I had to do this then first I'd split the IF THEN ELSE parts.
Leaving:
UNIT.award equals “Distinction”
UNIT.criteria.filter(‘starts’, ‘name’, ‘P’, ‘M’).set_award(‘A’)
I'm guessing the "equals" could also be "not equals" or "greater" so...
I'd split the first bit around that.
/(?'ident'[a-z.]*?) (?'expression'equals|greater) (?'compare'[0-9a-z\“\”]+)/gi
But an explode around 'equals' will do the same.
Then I'd explode the second part around the dots.
Giving:
UNIT
criteria
filter(a,b,c,d)
set_ward(e)
Pop off the first 2 to get object and property and then a list of possible filters and actions.
But frankly I'd would develop a language that would not mix properties with actions and filters.
Something like:
IF object.prop EQUALS const|var
THEN UPDATE object.prop
WITH const|var [WHERE object.prop filter const|var [AND|OR const|var]]
Eloquent does it straight in php:
DB::table('users')
->where('id', 1)
->update(['votes' => 1]);
So maybe I'd do something like:
THEN object.prop->filter(a,b,c,d)->set('award','A')
This makes it easy to split actions around -> and properties around .
Anyway...
I do my Regex on https://regex101.com/
Hope this helps.

Related

Assign expression to the arc and execute it

I need to implement the following thing in my web-application. I know my solution is incorrect, but I put the code jsut to demonstrate the idea.
There is a class 'arc'. I need to be able to assign ANY expression to this arc (e.g. a+b+c,a-c,if-then). Once expression is assigned, I'd like to be able to execute it with some randomly taken variables. Is it possible to implement such functionality in web-applications? Maybe, I should use some plug-in like MathPL? Or maybe there is an absolutely different approach to tackle such kind of problems?
class arc {
var $arcexpression;
function setExpression($arcexpression) {
$this->arcexpression = $arcexpression;
}
function getExpression() {
return $this->arcexpression;
}
}
$arc = new arc();
$arc->setExpression("if a>b then return a else return b");
$result = $arc->execute(a,b); // the function 'execute' should be somehow described in 'arc'

You don't need to implement a whole language for this. I would start by limiting what can be done, for example, limit your expressions to arithmetic operators (+, -, *, /), parentheses and if-then operator. You'll need to enforce some sort of syntax for if-then to make it easier, possibly, the same as php's operator ?:. After that you need to build a parser for this grammar only: to parse a given expression into a tree. For example, expression `a + b * c' would parse into something like this:
+
/ \
a *
/ \
b c
After that you'll just have to evaluate such expressions. For example, by passing an array into your evaluate function of type { a => 1, b => 2, c => 3 }, you'll get 7 out of it.
The idea of the parse is the following:
Start from position 1 in the string - and call a recursive function to parse data from that position. In the function, start reading from the specified position.
If you read an opening parenthesis, call itself recursively
If you encounter a closing parenthesis or end-of-string, return the root node
Read the first identifier (or recursively inside parentheses)
Read the arithmetic sign
Read the second identifier (or recursively inside parentheses)
If the sign is * or /, then create the node with the sign in it and two operands as children and attach that node as the corresponding (left or right) child of the previous operator.
If the sign is + or -, then find create the node with the sign in it, one of the children being one of the operands and the second node being the root of the subtree with * and / at the root (or the second operand, if it's a simple operation).
Getting pure arithmetic, with parentheses, working is easy; if-then is a bit more tricky, but still not too bad. About 10 years ago I had to implement something like this in Java. It took me about 3 days to get everything sorted and was in total about 500 lines of code in 1 class, not counting javadoc. I suspect in PHP it will be less code, due to sheer simplicity of PHP syntax and type conversions.
It may sound complicated, but, in reality, it's much easier than it seems once you start doing it. I remember very well a university assignment to do something similar as part of the algorithms class, 17-18 years ago.

Can you explain Perl's hash system to a PHP guy?

How do Perl hashes work?
Are they like arrays in PHP or some completely different beast?
From what I understand all it is is an associative array right? This is what I thought until I began
to talk to a Perl programmer who told me I was completely wrong, but couldn't explain it in a way
that didn't make my eyes cross.
Anyway, the way that I thought it worked was like this
PHP's:
$argv['dog_name'] = 'missy';
$argv[0] = 'tree';
same as Perl's:
my %argv{'dog_name'} = 'missy';
my $argv[0] = 'tree';
Right? But you cannot print(%argv{'dog_name'}), you have to (revert?) to print($argv{'dog_name'}) which is confusing?
Is it trying to print as a variable now, like you would in PHP, echo $argv['dog_name']; ? Does this mean (again) that a hash is
just a PHP associative array with a % to declare but a $ to access?
I don't know, I'm hoping some PHP/Perl Guru can explain how hashes work, and how similar they are
to PHP's arrays.

To write
$argv['dog_name'] = 'missy';
$argv[0] = 'tree';
in Perl, you would write it as follows:
$argv{dog_name} = 'missy';
$argv{0} = 'tree';
if you had strict on, which you should, then you will need to predeclare the variable:
my %argv;
$argv{dog_name} = 'missy';
$argv{0} = 'tree';
If the above is a bit repetitive for you, you could write it:
my %argv = (
dog_name => 'missy',
0 => 'tree',
);
You can find more detail on the perldata manpage.
In short, the reasons why the sigils change from % to $ is that %hash refers to a plural hash (a list of key value pairs), and $hash{foo} refers to a single element of a hash. This is the same with arrays, where # refers to the full array, and $ refers to a single element. (for both arrays and hashes a leading # sigil with a subscript means a slice of the data, where multiple keys are passed and a list of values are returned)

To elaborate slightly on Ambrose's answer, the reason for your confusion is the difference between the philosophy of using sigils in Perl and PHP.
In PHP, the sigil is attached to the identifyer. E.g. a hash identifyer will ALWAYS have a hash sigil around it.
In Perl, a sigil is attached to the way you are accessing the data structure (are you accessing 1 value, a list of values, or a whole hash of values) - for details see other excellent answers such as Eric's.

%argv{'dog_name'} is a syntax error. You need $argv{'dog_name'} instead.
But you are correct that a perl hash is just an associative array (why perl chose to use a different terminology, I don't know).
For a complete understanding of hashes, I recommend reading any of the vast number of perl tutorials or books that cover the topic. Programming Perl is an excellent choice, or here's a random online tutorial I found as well.

I would, as Flimzy, also recommend Programming Perl. As a recent PHP to Perl convert myself, it has taught me a great deal about the language.
The % symbol is used to create a full 'associative array', as we would think of it. For example, I could create an associative array by doing the following:
%hash = ('key1' => 'value1', 'key2' => 'value2');
I could then print it out like so:
print %hash;
The output would be something like:
'key2value2key1value1'
This is, I believe, known as 'list context', since the % indicates that we are talking about a range of values.
On the other hand, if I wanted to access a single value, we would have to use the $ sigil. This, as 'Programming Perl' tells us, can be thought of as an 'S' for 'Scalar'. We have to use the $ sign whenever we are talking about a singular value.
So, to access an individual item in the array, I would have to use the following syntax:
print $hash{'key1'};
The same is true of arrays. A full array can be created like so:
#array = ('abc', '123');
and then printed like so:
print #array;
But, to access a single element of the array I would type instead:
print $array[0];
There are lots of basic principles here. You should read about 'list context' and 'scalar context' in some detail. Before long you will also want to look at references, which are the things you use to create multimensional structures in Perl. I really would recommend 'Programming Perl'! It was a difficult read in chapters, but it certainly does cover everything you need to know (and more).

The sigil changing really isn't as complicated as you make it sound. You already do this in English without thinking about it.
If you have a set of cars, then you would talk about "these cars" (or "those cars"). That's like an array.
my #cars = ('Vauxhall', 'Ford', 'Rolls Royce');
If you're talking about just one car from that set, you switch to using "this car". That's like a single element from an array.
say $car[1]; # prints 'Ford';
Similar rules also apply to hashes.

I would say your confusion is partly caused by one simple fact. Perl has different sigils for different things. PHP has one sigil for everything.
So whether you're putting something into an array/hash, or getting something out, or declaring a simple scalar variable, in PHP you always use the dollar sign.
With perl you need to be more specific, that's all.

The "sigil", i.e. the character before the variable name, denotes the amount of data being accessed, as follows:
If you say $hash{key}, you are using scalar context, i.e. one value.
For plural or list context, the sigil changes to #, therefore #hash{('key1', 'key2')} returns a list of two values associated with the two keys respectivelly (might be written as #hash{qw(key1 key2)}, too).
%hash is used to acces the hash as a whole.
The same applies to arrays: $arr[0] = 1, but #arr[1 .. 10] = (10) x 10.

I hope that you are not expecting to get a full tutorial regarding perl hashes here. You don't need a Perl guru to explain you hashes, just a simple google search.
http://www.perl.com/pub/2006/11/02/all-about-hashes.html
PS: please increase your accept ratio - 62% is pretty low

Find function and line number where variable gets modified

Let's say that at the beginning of a random function variable $variables['content'] is 1,000 characters long.
This random function is very long, with many nested functions within.
At the end of the function $variables['content'] is only 20 characters long.
How do you find which of nested functions modified this variable?

Not sure how you'd want to return it but you could use the magic constant __LINE__. That returns the line of the current document.
You could create a variable called $variables['line'] and assign __LINE__ as the value, where appropriate.

If it were me, first I'd consider breaking apart the 1000-line beast. There's probably no good reason for it to be so huge. Yes, it'll take longer than just trying to monkey-patch your current bug, but you'll probably find dozens more bugs in that function just trying to break it apart.
Lecture over, I'd do a search/replace for $variables['content'].*=([^;]*); to a method call like this: $variables['content'] = hello(\1, __LINE__);. This will fail if you are assigning strings with semicolons in them or something similar, so make sure you inspect every change carefully. Write a hello() function that takes two parameters: whatever it is you're assigning to $variables['content'] and the line number. In hello(), simply print your line number to the log or standard error or whatever is most convenient, and then return the first argument unchanged.
When you're done fixing it all up, you can either remove all those silly logging functions, or you can see if 'setting the $variables['content'] action' is important enough to have its own function that does something useful. Refactoring can start small. :)

I think this is a problem code tracing can help with.
In my case I had this variable that was being modified across many functions, and I didn't know where.
The problem is that at some point in the program the variable (a string) was around 40,000 characters, then at the end of the program something had cut it to 20 characters.
To find this information I stepped through the code with the Zend debugger. I found the information I wanted (what functions modified the variable), but it took me a while.
Apparently XDebug does tell you what line numbers, in what functions the variables are modified:
Example, tracing documentation, project home, tutorial article.

parse search string

I have search strings, similar to the one bellow:
energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport
and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters:
AllWords array
AnyWords array
NotWords array
These are the rules i have set:
If it has OR before or after the word or quoted words if belongs to
AnyWord.
If it has a NOT before word or quoted words it belongs to NotWords
If it has 0 or more more spaces before the word or quoted phrase it
belongs to AllWords.
So the end result should be something similar to:
AllWords: (energy, food, "olympics 2010")
AnyWords: (terrorism, "government", cups)
NotWords: (Transport)
What would be a good way to do this?

If you want to do this with Regex, be aware that your parsing will break on stupid user input (the user, not the input =) ).
I'd try the following Regexes.
NotWords:
(?<=NOT\s)\b((?!NOT|OR)\w+|"[^"]+")\b
AllWords:
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
AnyWords:
Well.. the rest. =) They are not that easy to spot, since I do not know how to put "OR behind it or OR in front of it" into regex. Maybe you could join the results from the three regexes
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
Problems: These require exactly one space between modifier words and expressions. PHP only supports lookbehinds for fixes length expressions, so I see no way around that, sorry. You could just use \b(\w+|"[^"]+")\b to split the input, and parse the resulting array manually.

This is an excellent example of how an test-first driven approach can help you arrive at a solution. It might not be the very best one, but having tests written allow you to refactor with confidence and instantly see if you break any of the existing tests. Anyway, you could set up a few tests like:
public function setUp () {
$this->searchParser = new App_Search_Parser();
}
public function testSingleWordParsesToAllWords () {
$this->searchParser->parse('Transport');
$this->assertEquals(
$this->searchParser->getAllWords(),
array('Transport')
);
$this->assertEquals($this->searchParser->getNotWords(), array());
$this->assertEquals($this->searchParser->getAnyWords());
}
public function testParseOfCombinedSearchString () {
$query = 'energy food "olympics 2010" Terrorism ' .
'OR "government" OR cups NOT transport';
$this->searchParser->parse($query);
$this->assertEquals(
$this->searchParser->getAllWords(),
array('energy', 'food', 'olympics 2010')
);
$this->assertEquals(
$this->searchParser->getNotWords(),
array('Transport')
);
$this->assertEquals(
$this->searchParser->getAnyWords(),
array( 'terrorism', 'government', 'cups')
);
}
Other good tests would include:
testParseTwoWords
testParseTwoWordsWithOr
testParseSimpleWithNot
testParseInvalid
Here you have to decide what invalid input looks like and how you interpret it, i.e:
'NOT Transport': Search for anything that doesn't contain Transport or inform the user that he has to include at least one search term too?
'OR energy': Is it ok to begin with a combinator?
'food OR NOT energy': Does this mean "search for food or anything that doesn't contain energy", or does it mean "search for food and not energy", or doesn't it mean anything? (i.e. throw exception, return false or whatnot)
testParseEmpty
Then, write the tests one by one, and write a simple solution that passes the test. Then refactor and make it right, and run again to see that you still pass the test.
Once a test passes and the code is refactored, then write the next test and repeat the procedure. Add more tests as you find special cases and refactor the code so that it passes all tests. If you break a test, back-up and re-write the code (not the test!) such that it passes.
As for how you can solve this problem, look into preg_match, strtok or rely simply loop through the string adding up tokens as you go.

Too big data objects

Recently, I have identified very strong code smell in one of the project, I'm working for.
Especially, in its cost counting feature. To count the total cost for some operation, we have to pass many kind of information to the "parser class".
For example:
phone numbers
selected campaigns
selected templates
selected contacts
selected groups
and about 2-4 information types more
Before refactoring there were all this parameters were passed through constructor to the Counter class(8 parameters, you can image that..).
To increase readability I have introduced a Data class, named CostCountingData with readable getters and setters to all this properties.
But I don't think, that that code became much readable after this refactoring:
$cost_data = new CostCountingData();
$cost_data->setNumbers($numbers);
$cost_data->setContacts($contacts);
$cost_data->setGroups($groups);
$cost_data->setCampaigns($campaigns);
$cost_data->setUser($user);
$cost_data->setText($text);
$cost_data->setTotalQuantity($total_quantity);
$CostCounter = new TemplateReccurentSendingCostCounter($cost_data);
$total_cost = $CostCounter->count();
Can you tell me whether there is some problem with readability of this code snippet?
Maybe you can see any code smells and can point me at them..
The only idea I have how to refactore this code, is to split this big data object to several, consisting related data types. But I'm not sure whether should I do this or not..
Whay do you think about it?

If what you want is named parameters (which it looks to me is what you are trying to achieve), would you ever consider just passing in an associative array, with the names as keys? There is also the Named Parameter Idiom, I can only find a good reference for this for C++ though, perhaps it's called something else by PHP programmers.
For that, you'd change your methods on CostCountingData so they all returned the original object. That way you could rewrite your top piece of code to this:
$cost_data = new CostCountingData();
$cost_data
->setNumbers($numbers)
->setContacts($contacts)
->setGroups($groups)
->setCampaigns($campaigns)
->setUser($user)
->setText($text)
->setTotalQuantity($total_quantity);
So there's two options. I would probably go for the named parameter idiom myself as it is self-documenting, but I don't think the difference is too great.

There are a few possibilities here:
If you only need a few of the parameters each time, the named parameter approach suggested elsewhere in this answer is a nice solution.
If you always need all of the parameters, I would suggest that the smell is coming from a broken semantic model of the data required to calculate a cost.
No matter how you dress up a call to a method that takes eight arguments, the fact that eight supposedly unrelated pieces of information are required to calculate something will always smell.
Take a step back:
Are the eight separate arguments really all unrelated, or can you compose intermediate objects that reflect the reality of the situation a little better? Products? invoice items? Something else?
Can the cost method be split into smaller methods that calculate part of the cost based on fewer arguments, and the total cost produced from adding up the component costs?

If I needed the data object (and classes called "data" are themselves a smell) I'd set its values in its constructor. But the interesting question is where are these values coming from? The source of the values should itself be an object of some sort, and its probably the one you are really interested in.
Edit: If the data is coming from POST, then I would make the class be wrapper around the POST data. I would not provide any setter functions, and would make the read accessors look like this (my PHP is more than a little rusty):
class CostStuff {
constructor CostStuff( $postdata ) {
$mypost = $postdata;
}
function User() {
return $mypost[ "user_name" ];
}
...
}

What do you think is wrong with your code snippet? If you have several variables that you've already used that you now need to get into $cost_data, then you'll need to set each one in turn. There's no way around this.
A question though, is why do you have these variables $numbers, $contracts, etc. that you're now deciding you need to move into $cost_data ?
Is the refactoring you're looking for possibly that the point in the code where you set $numbers should actually be the point you're setting $cost_data->numbers, and having the $numbers variable at all is superfluous?
E.g.
$numbers=getNumbersFromSomewhere() ;
// do stuff
$contracts=getContracstFromSomewhere() ;
// do stuff
$cost_data=new dataObject() ;
$cost_data->setNumbers($numbers);
$cost_data->setContracts($contracts) ;
$cost_data->someOperation() ;
becomes
$cost_data=new dataObject() ;
$cost_data->setNumbers(getNumbersFromSomewhere()) ;
// do stuff
$cost_data->setContracts(getContractsFromSomewhere()) ;
// do stuff
$cost_data->someOperation() ;

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.