So I am working on a simple micro language/alternative syntax for PHP.
Its syntax takes a lot from JavaScript and CoffeeScript including a few of my own concepts. I have hand written the parser (no parser generator used) in PHP to convert the code into PHP then execute it. It is more of a proof of concept/learning tool rather than anything else but I'd be lying if I said I didn't want to see it used on an actual project one day.
Anyway here is a little problem I have come across that I thought I would impose on you great intellects:
As you know in PHP the period ( . ) is used for string concatenation. However in JavaScript it is used for method chaining.
Now one thing that annoys me in PHP is having to use that bloody arrow (->) for my method chains, so I went the JavaScript way and implemented the period (.) for use with objects.
(I think you can see the problem already)
Because I'm currently only writing a 'dumb' parser that merely does a huge search and replace, there is no way to distinguish whether a period (.) is being used for concatenation or for method chaining.
"So if you are trying to be like JavaScript, just use the addition (+) operator Franky!", I hear you scream. Well I would but because the addition (+) operator is used for math in PHP I would merely be putting myself in the same situation.
Unless I can make my parser smart enough (with a crap load of work) to know that when the addition (+) operator is working with integers then don't convert it into a period (.) for concatenation I am pretty much screwed.
But here is the cool thing: because this is pretty much a new language, I don't have to use the period or addition operator for concatenation.
So my question is: If I was to decide to introduce a new method of string concatenation, what character would make the most sense?
Does it have to be one character? .. could work!
Any myriad of combinations, like ~~ or >: even!
If you don't want to use + or ., then I would recommend ^ because that's used in some other languages for string concatenation and I don't believe that it's used for anything in PHP.
Edit: It's been pointed out that it's used for XOR. One option would be to use ^ anyway since bitwise XOR is not commonly used and then to map something else like ^^ to XOR. Another option would be to use .. for concatenation. The problem is that the single characters are mostly taken.
Another option would be to use +, but map it to a function which concatenates when one argument is a string and adds otherwise. In order to not break things which rely on strings which are numbers being treated as their values, we should probably treat numeric strings as numbers for these purposes. Here's the function that I would use.
function smart_add($arg1, $arg2) {
    // is_numeric() is a plain function in PHP, not a method on the value
    if (is_numeric($arg1) && is_numeric($arg2)) {
        return $arg1 + $arg2;
    } else {
        return $arg1 . $arg2;
    }
}
Then a + b + c + d just gets turned into smart_add(smart_add(smart_add(a,b),c),d)
This may not be perfect in all cases, but it should work pretty well most of the time and has clear rules for use.
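To make the rules concrete, here is a minimal sketch of how such a chain behaves when it is folded from left to right. The helper name smart_add_chain is made up for illustration; it only mimics what the nested smart_add(smart_add(...)) calls would do.

```php
<?php
// Adds when both operands are numeric (including numeric strings),
// otherwise concatenates.
function smart_add($left, $right)
{
    if (is_numeric($left) && is_numeric($right)) {
        return $left + $right;
    }
    return $left . $right;
}

// Hypothetical helper: evaluates "a + b + c + ..." as
// smart_add(smart_add(a, b), c) and so on, left to right.
function smart_add_chain(array $operands)
{
    $first = array_shift($operands);
    return array_reduce($operands, 'smart_add', $first);
}
```

Note that the result depends on the order of the operands: [1, 2, 'x', 3] adds 1 and 2 first and only then starts concatenating, while ['x', 1, 2] concatenates all the way through.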
So my question is: If I was to decide to introduce a new method of
string concatenation, what character would make the most sense?
As you're well aware, you'll need to choose a character that is not already used as one of PHP's operators. Since string concatenation is a common technique, I would try to avoid characters that you need to press SHIFT to type, as those will be a hindrance.
Instead of trying to assign one character for string concatenation (as most are already in use), perhaps you should define your own syntax for string concatenation (or any other operation you need to overwrite with a different operator), as a shorthand operator (sort of). Something like:
[$string, $string]
Should be easy to pick up by a parser and form the resulting concatenated string.
Edit: I should also note that whether you're using literal strings or variables, there's no way (as far as I know) to confuse this syntax with any other PHP functionality, since the comma in the middle is invalid for array manipulations. So, all of the following would still be recognized as string concatenation and not something else in PHP.
["stack", "overflow"]
["stack", $overflow]
[$stack, $overflow]
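As a rough sketch of how a search-and-replace style translator could pick this up, the following rewrites the bracket form into ordinary PHP concatenation. The function name translate_bracket_concat is hypothetical, and the sketch assumes the items are only plain variables or simple quoted strings without escaped quotes; anything more complex would need a real parser.

```php
<?php
// Hypothetical helper: rewrites [$a, "b", $c] into ($a . "b" . $c).
// Assumption: items are plain variables or quoted strings without escapes.
function translate_bracket_concat(string $source): string
{
    // One item: a variable, a double-quoted string or a single-quoted string.
    $item = '(?:\$\w+|"[^"]*"|\'[^\']*\')';
    // A bracketed list of at least two comma-separated items.
    $pattern = '/\[\s*(' . $item . '(?:\s*,\s*' . $item . ')+)\s*\]/';

    return preg_replace_callback(
        $pattern,
        function (array $match) use ($item): string {
            // Pull the individual items back out and join them with ".".
            preg_match_all('/' . $item . '/', $match[1], $parts);
            return '(' . implode(' . ', $parts[0]) . ')';
        },
        $source
    );
}
```

Because quoted strings are matched as whole items, a comma inside a string literal does not split it.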
Edit: Since this conflicts with JSON notation (and with the short array syntax introduced in PHP 5.4), the following alternative variations exist:
Changing the delimiter
Omitting the delimiter
Example:
[$stack $overflow $string $concatenation] // Use nothing (but really need space)
The PHP debugging tool kint has a strange syntax where certain symbols can be prefixed to functions to alter their behavior, as shown in this guide.
The relevant information:
Modifiers are a way to change Kint output without having to use a different function. Simply prefix your call to kint with a modifier to apply it:
! Expand all data in this dump automatically
+ Disable the depth limit in this dump
- Attempt to clear any buffered output before this dump
# Return the output of this dump instead of echoing it
~ Use the text renderer for this dump
Example:
+Kint::dump($data); // Disabled depth limit
!d($data); // Expanded automatically
How does this work?
By looking at the source code it seems that the symbols are being parsed into an array called $modifiers. But how can you do this with PHP? And what is the scope of this: could I do this with other Unicode symbols as well, or are the five in question (+, -, ~, !, #) the only ones?
The '#' already has a use in PHP when prefixed, see: What is the use of the # symbol in PHP?. How can this be overruled?
Edit: A follow-up question to the answers given is how exactly kint bends the (php) rules. For example why the ~ doesn't give a syntax error. Consider this example:
<?php
function d($args) {
    echo $args[0];
}
d([1,2,3]); // prints 1
~d([1,2,3]); // syntax error, unsupported operand types
vs
<?php
require 'kint.php';
~d([1,2,3]); // prints the array with the text renderer with no issues
Edit 2: removed unsubstantiated claim that kint uses eval()
Original author of Kint here.
Sorry you found it confusing! The operators were added as a shorthand to switch some commonly used settings for common use cases.
Since Kint already parses the PHP code where it was called from to get and display the names (or expressions) of the passed variables that are being dumped, adding the operators was a minor addition to that functionality.
Note the variable name is displayed ^. As of time of writing Kint is still the only dumper that can do this!
And the actual explanation to the OP question comes from this in-depth answer:
PHP unary operators:
- is arithmetic negation
+ is arithmetic identity
! is logical negation
~ is bitwise not
Thus, it is perfectly allowable to prefix function calls with these operators, as long as the function returns a type of value that the operator would normally work on:
function foo() {
    return 0;
}
// All of these work just fine, and generate no errors:
-foo();
+foo();
!foo();
~foo();
Sorry for the late reply. I was just reading the Kint documentation and had the same question. After finding your question I decided to investigate. You may have figured it out by now, but Kint actually reads the source code of the file that invoked it and changes its behavior based on whether any of these "modifiers" were present.
This behavior is absolutely unpredictable as far as I'm concerned and I can't believe anybody would use this kind of trick as anything but a proof of concept. Notably, because the file must be readable, kint modifiers fail on eval()'d code (which you shouldn't be using to begin with) and perhaps in other unusual cases as well.
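For anyone wondering what "reading the calling file" can look like, here is a minimal sketch of the general technique. It is not Kint's actual code: the names find_call_modifiers and dump_sketch are made up, only four of the prefixes are handled (# is left out), and it assumes the call fits on a single line.

```php
<?php
// Hypothetical helper: finds unary prefixes written directly before a call.
function find_call_modifiers(string $sourceLine, string $functionName): array
{
    $pattern = '/([!+\-~]*)\s*(?<![\w$])'
        . preg_quote($functionName, '/') . '\s*\(/';
    if (!preg_match($pattern, $sourceLine, $match) || $match[1] === '') {
        return [];
    }
    return str_split($match[1]);
}

// Hypothetical dumper: looks up its own call site and re-reads that line.
function dump_sketch($value): int
{
    $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1)[0];
    $file = $frame['file'] ?? '';
    // For eval()'d code the "file" is not a readable path, so no modifiers.
    $lines = is_readable($file) ? file($file) : [];
    $line = $lines[($frame['line'] ?? 1) - 1] ?? '';

    $modifiers = find_call_modifiers($line, __FUNCTION__);
    echo in_array('~', $modifiers, true) ? '[text] ' : '[rich] ';
    echo var_export($value, true), "\n";

    // Returning an integer keeps -, +, ! and ~ valid on the result.
    return 0;
}
```

The prefix itself is still an ordinary unary operator applied to the return value; the function merely notices that it was written there.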
I just noticed that I could use a variable as an argument, like this: $variable = "This's a string."; function('$variable'), and not like this: function('This's a string');. I can see why I can't do the latter, but I don't understand what's happening behind the scenes that makes the first example work.
Have you heard about formal languages? The parser keeps track of the context, and so, it knows what the expected characters are and what not.
In the moment you close the already opened string, you're going back to the context before the opening of the string (that is, in the context of a function call in this case).
The relevant PHP-internal pieces of code are:
the scanner, which turns the sequence between ' and ' into an indivisible TOKEN;
the parser, which puts the individual indivisible tokens into a semantic context.
These are the relevant chunks of C code that make it work. They are part of the inner workings of PHP (particularly, the Zend Engine).
PHP does not anticipate anything, it really reads everything char by char and it issues a parsing error as soon as it finds an unexpected TOKEN in a semantic context where it's not allowed to be.
In your case, it reads the token 'This' and the scanner matches a new string. Then it goes on reading s and when it finds a space, it turns the s into a constant. As the constant and the previously found token 'This' together don't form any known reduction (the possible reductions are described in the parser-link I've given you above), the parser issues an error like
Unexpected T_STRING
As you can deduce from this message, it is really referring to what it has found (or what it hopes it has found), so there's really no anticipation of anything.
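You can watch the scanner do this from PHP itself with token_get_all(), which exposes the scanner's output without running the parser. This is only an illustration; the wrapper name token_names and the function name f in the sample input are made up.

```php
<?php
// Hypothetical helper: lists the token names the scanner produces.
function token_names(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Array tokens carry a numeric id; single characters such as "("
        // are returned as plain strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

// The broken call from the question.
$names = token_names("<?php f('This's a string');");
```

In the resulting list, 'This' shows up as a single T_CONSTANT_ENCAPSED_STRING and is immediately followed by a T_STRING for the stray s, which is the token the parser then complains about.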
Your question itself is wrong in the sense that there's no apostrophe in the variable (in the variable's identifier). You may have an apostrophe in the variable's value. Do not confuse them. A value can stand alone, without a variable:
<?php
'That\'s fine';
42;
(This is valid PHP code that just loads those values into memory.)
function('$variable') shouldn't be working correctly
Inside double quotes (" "), single quotes do not need to be escaped.
Inside single quotes (' '), a single quote has to be escaped with a backslash, because otherwise it ends the string.
Using "" also lets you use variables as part of a string, so:
$pet = 'cat';
$myStory = "the $pet walked down the street";
function($pet) is how the variable should be passed to the function.
use it like this
function('This\'s a string');
Before passing a string to eval() I would like to make sure the syntax is correct and allow:
Two functions: a() and b()
Four operators: /*-+
Brackets: ()
Numbers: 1.2, -1, 1
How can I do this, maybe it has something to do with PHP Tokenizer?
I'm actually trying to make a simple formula interpreter so a() and b() will be replaced by ln() and exp(). I don't want to write a tokenizer and parser from scratch.
As far as validation is concerned, the following character tokens are valid:
operator: [/*+-]
funcs: (a\(|b\()
brackets: [()]
numbers: \d+(\.\d+)?
space: [ ]
A simple validation could then check whether the input string consists of any combination of these patterns. Because the funcs token is pretty precise and does not clash much with the other tokens, this validation should be quite stable without needing to implement any syntax or grammar yet:
$tokens = array(
    'operator' => '[/*+-]',
    'funcs'    => '(a\(|b\()',
    'brackets' => '[()]',
    'numbers'  => '\d+(\.\d+)?',
    'space'    => '[ ]',
);
$pattern = '';
foreach ($tokens as $token)
{
    $pattern .= sprintf('|(?:%s)', $token);
}
$pattern = sprintf('~^(%s)*$~', ltrim($pattern, '|'));
echo $pattern;
The input validates only if the whole string matches the token-based pattern. It still might be syntactically wrong PHP, but you can ensure it is built only from the specified tokens:
~^((?:[/*+-])|(?:(a\(|b\())|(?:[()])|(?:\d+(\.\d+)?)|(?:[ ]))*$~
If you build the pattern dynamically - as in the example - you're able to modify your language tokens later on more easily.
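A short sketch of how the generated pattern might be applied with preg_match(). The function names build_token_pattern and uses_only_allowed_tokens are made up for illustration; the token list is the one from above.

```php
<?php
// Hypothetical helper: builds one anchored pattern from the token patterns.
function build_token_pattern(array $tokens): string
{
    $alternatives = [];
    foreach ($tokens as $token) {
        $alternatives[] = sprintf('(?:%s)', $token);
    }
    return sprintf('~^(%s)*$~', implode('|', $alternatives));
}

// Hypothetical helper: true if the input consists only of known tokens.
function uses_only_allowed_tokens(string $input): bool
{
    $tokens = [
        'operator' => '[/*+-]',
        'funcs'    => '(a\(|b\()',
        'brackets' => '[()]',
        'numbers'  => '\d+(\.\d+)?',
        'space'    => '[ ]',
    ];
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match(build_token_pattern($tokens), $input) === 1;
}
```

As said, this only checks the vocabulary: a(1.2) + b(-1) passes and system("ls") is rejected, but so does a nonsensical sequence like a(1))+* pass, because no grammar is involved yet.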
Additionally, this can be the first step towards your own tokenizer / lexer. The token stream can then be passed on to a parser, which can syntactically validate and interpret it. That's the part user187291 wrote about.
As an alternative to writing a full lexer and parser, if you need to validate the syntax you can formulate your grammar in terms of tokens as well, and then apply a regex-based token grammar to the token representation of the input.
The tokens are the words you use in your grammar. You will need to describe parentheses and function definitions more precisely than in tokens, and the tokenizer should follow clearer rules about which token supersedes another. The concept is outlined in another question of mine. It uses regex as well for grammar formulation and syntax validation, but it still does not parse. In your case eval would be the parser you're making use of.
Parser generators have indeed already been written for PHP, and "LIME" in particular comes with the typical "calculator" example, which would be an obvious starting point for your "mini language": http://sourceforge.net/projects/lime-php/
It's been years since I last played with LIME, but it was already mature & stable then.
Notes:
1) Using a full-on parser generator gives you the advantage of avoiding PHP eval() entirely if you wish - you can make LIME emit a parser which effectively provides an "eval" function for expressions written in your mini language (with validation baked in). This gives you the additional advantage of allowing you to add support for new functions, as needed.
2) It may seem like overkill at first to use a parser generator for such an apparently small task, but once you get the examples working you'll be impressed by how easy it is to modify and extend them. And it's very easy to underestimate the difficulty of writing a bug-free parser (even a "trivial" one) from scratch.
Yes, you need the Tokenizer, or something similar, but it's only part of the story. A tokenizer (more commonly called a "lexer") can only read and parse the elements of an expression, but has no means to detect that something like "foo()+*bar)" is invalid. You need the second part, called a parser, which is able to arrange the tokens into a kind of tree (called an "AST") or provide an error message when it fails to do so. Ironically, once you've got a tree, "eval" is not needed anymore: you can evaluate your expression directly from the tree.
I would recommend you to write a parser by hand because it's a very useful exercise and a lot of fun. Recursive descent parsers are quite easy to program.
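To give an idea of the size of such a thing, here is a minimal recursive descent sketch for the formula language from the question. It evaluates while parsing instead of building a tree first, and it assumes a() maps to ln() (PHP's log()) and b() to exp(), as the question suggests. The class name FormulaParser is made up.

```php
<?php
// expr := term (('+'|'-') term)*
// term := factor (('*'|'/') factor)*
// factor := '-' factor | number | ('a'|'b') '(' expr ')' | '(' expr ')'
final class FormulaParser
{
    private array $tokens;
    private int $pos = 0;

    public function __construct(string $input)
    {
        // Numbers, names, and any other single non-space character.
        preg_match_all('~\d+(?:\.\d+)?|[a-z]+|\S~', $input, $m);
        $this->tokens = $m[0];
    }

    public function evaluate(): float
    {
        $value = $this->expr();
        if ($this->peek() !== null) {
            throw new InvalidArgumentException('Unexpected token: ' . $this->peek());
        }
        return $value;
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function next(): ?string
    {
        return $this->tokens[$this->pos++] ?? null;
    }

    private function expect(string $token): void
    {
        if ($this->next() !== $token) {
            throw new InvalidArgumentException("Expected '$token'");
        }
    }

    private function expr(): float
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $op = $this->next();
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term(): float
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $op = $this->next();
            $rhs = $this->factor();
            $value = $op === '*' ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    private function factor(): float
    {
        $token = $this->next();
        if ($token === null) {
            throw new InvalidArgumentException('Unexpected end of input');
        }
        if ($token === '-') {
            return -$this->factor();
        }
        if ($token === '(') {
            $value = $this->expr();
            $this->expect(')');
            return $value;
        }
        if ($token === 'a' || $token === 'b') {
            $this->expect('(');
            $arg = $this->expr();
            $this->expect(')');
            return $token === 'a' ? log($arg) : exp($arg);
        }
        if (is_numeric($token)) {
            return (float) $token;
        }
        throw new InvalidArgumentException("Unexpected token: $token");
    }
}
```

Usage would be something like (new FormulaParser('b(0) + 2 * 3'))->evaluate(). Invalid input such as a()+*b) ends in an exception rather than in eval().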
You could use token_get_all(), inspect each token, and abort at the first invalid token.
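A minimal sketch of that idea, assuming the whitelist from the question (functions a and b, the four operators, brackets and numbers). The function name is made up, and like any token-level check it says nothing about whether the tokens are in a sensible order.

```php
<?php
// Hypothetical helper: rejects the formula at the first token that is not
// on the whitelist.
function formula_uses_only_whitelisted_tokens(string $formula): bool
{
    $allowedFunctions  = ['a', 'b'];
    $allowedCharacters = ['+', '-', '*', '/', '(', ')'];
    $allowedTokenIds   = [T_OPEN_TAG, T_LNUMBER, T_DNUMBER, T_WHITESPACE];

    // token_get_all() only tokenizes PHP code after an opening tag.
    foreach (token_get_all('<?php ' . $formula) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as "+" or "(".
            if (!in_array($token, $allowedCharacters, true)) {
                return false;
            }
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_STRING) {
            // Identifiers: only the whitelisted function names may appear.
            if (!in_array($text, $allowedFunctions, true)) {
                return false;
            }
            continue;
        }
        if (!in_array($id, $allowedTokenIds, true)) {
            return false;
        }
    }
    return true;
}
```

Variables, strings, comments, semicolons and any other function name all produce a token that is not on the list, so the check stops there.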
hakre's answer, using regexes, is a nice solution, but it is a wee bit complicated. Also, handling a whitelist of functions becomes rather messy. And if this does go wrong, it could have a very nasty effect on your system.
Is there a reason you don't use the javascript 'eval' instead?
I'm writing a script that will allow a user to input a string that is a math statement, to then be evaluated. I have, however, hit a roadblock: I cannot figure out how, using preg_match, to disallow statements that have variables in them.
Using this, $calc = create_function("", "return (" . $string . ");" ); $calc();, allows users to input a string that will be evaluated, but it crashes whenever something like echo 'foo'; is put in place of the variable $string.
I've seen this post, but it does not allow for math functions inside the string, such as $string = 'sin(45)';.
For a stack-based parser implemented in PHP that uses Dijkstra's shunting yard algorithm to convert infix to postfix notation, and with support for functions with a varying number of arguments, you can look at the source for the PHPExcel calculation engine (which does not use eval).
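For reference, a stripped-down sketch of the shunting yard idea looks roughly like this. It is not taken from PHPExcel: it only handles numbers, the four basic operators and parentheses (no functions, no unary minus), and the function names are made up.

```php
<?php
// Converts an infix expression into a list of postfix (RPN) tokens.
function infix_to_postfix(string $expression): array
{
    $precedence = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];
    preg_match_all('~\d+(?:\.\d+)?|[-+*/()]~', $expression, $m);
    // Reject input containing anything the tokenizer did not recognize.
    if (implode('', $m[0]) !== preg_replace('/\s+/', '', $expression)) {
        throw new InvalidArgumentException('Unsupported character in expression');
    }

    $output = [];
    $stack = [];
    foreach ($m[0] as $token) {
        if (is_numeric($token)) {
            $output[] = $token;
        } elseif (isset($precedence[$token])) {
            // Pop operators of higher or equal precedence (left-associative).
            while ($stack && isset($precedence[end($stack)])
                && $precedence[end($stack)] >= $precedence[$token]) {
                $output[] = array_pop($stack);
            }
            $stack[] = $token;
        } elseif ($token === '(') {
            $stack[] = $token;
        } else { // ")"
            while ($stack && end($stack) !== '(') {
                $output[] = array_pop($stack);
            }
            if (!$stack) {
                throw new InvalidArgumentException('Unbalanced parentheses');
            }
            array_pop($stack); // discard the "("
        }
    }
    while ($stack) {
        $op = array_pop($stack);
        if ($op === '(') {
            throw new InvalidArgumentException('Unbalanced parentheses');
        }
        $output[] = $op;
    }
    return $output;
}

// Evaluates a postfix token list with a simple value stack.
function evaluate_postfix(array $postfix): float
{
    $stack = [];
    foreach ($postfix as $token) {
        if (is_numeric($token)) {
            $stack[] = (float) $token;
            continue;
        }
        if (count($stack) < 2) {
            throw new InvalidArgumentException('Malformed expression');
        }
        $right = array_pop($stack);
        $left = array_pop($stack);
        switch ($token) {
            case '+': $stack[] = $left + $right; break;
            case '-': $stack[] = $left - $right; break;
            case '*': $stack[] = $left * $right; break;
            case '/': $stack[] = $left / $right; break;
        }
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Malformed expression');
    }
    return $stack[0];
}
```

Since the input never reaches eval() or create_function(), something like echo 'foo'; is simply rejected as an unsupported character.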
Also have a look at the responses to this question
How complex of a math function do you need to allow? If you only need basic math, then you might be able to get away with only allowing whitespace + the characters 0123456789.+/-* or some such.
In general, however, using the language's eval-type capabilities to just do math is probably a bad idea.
Something like:
^([\d\(\)\+\-*/ ,.]|sin\(|cos\(|sqrt\(|...)+$
would allow only numbers, brackets, math operations and the provided math functions. But it won't check whether the expression is valid, so something like +++sin()))333((( would still be accepted.
I wonder if this class would help you? Found that doing a search on Google for "php math expressions".
I have a need to evaluate user-defined logical expressions of arbitrary complexity on some PHP pages. Assuming that form fields are the primary variables, it would need to:
substitute "variables" for form field values;
handle comparison operators, minimally ==, <, <=, >= and >, by symbol or name (e.g. eq, lt, le, ge, gt respectively);
handle boolean operators not, and, or and possibly xor, by name or symbol (e.g. !, &&, || and ^^ respectively);
handle literal values for strings and numbers;
be plaintext, not XML (e.g. "firstname == '' or lastname == ''"); and
be reasonably performant.
Now in years gone by I've written recursive descent parsers that could build an expression tree and do this kind of thing, but that's not a task I'm relishing in PHP, so I'm hoping there are things out there that will at least get me some of the way there.
Suggestions?
Much time has gone by since this question was asked, and I happened to be looking for an expression parser for php. I chose to use the ExpressionLanguage component from Symfony 2.4. It can be installed with no dependencies from composer via packagist.
composer require symfony/expression-language
Check create_function: it creates an anonymous function from the string parameters passed. I'm not sure about its performance, but it's very flexible...
If I understand the problem correctly, you want the users to write out functions in non-PHP, and then have PHP interpret it?
If so, you could simply take their string and replace "lt" with "<" and "gt" with ">" ... then do eval().
I have a hunch the problem isn't this simple, but if it is, eval() could do the job. Of course, then you're opening yourself up for any kind of attack.
Take a look at my infix to postfix example. I think you could port it to PHP with relative ease. It only uses an array and some switches. No trees. A stack is only needed to run the postfix result.
Check out this function: http://pluginphp.com/plug-in31.php
You can try adapting my Evaluator class (https://github.com/djfm/Evaluator), it does arithmetic expressions (for now) and you can use variables too. All the major PHP operators are implemented.