I have an idea. I want to give our client the ability to specify pricing based on a number of variables by writing some simple code like this:
if customer.zip is "37208"
return 39.99
else
return 59.99
And in my code, I'd do something like this:
try {
$variables = array('customer' => array('zip' => '63901'));
$code = DSL::parse(DSL::tokenize($userCode));
$returnValue = DSL::run($code, $variables);
} catch (SyntaxErrorException $e) {
...
}
I guess what I'm wanting is to create a simple DSL in PHP that allows our customer to have a great deal of flexibility in setting pricing without having to have us code each and every case.
Here's the basic idea:
I would provide an array of variables and the code that the customer wrote.
The parser would evaluate the code that the user wrote using the variables provided and return to me the value that our customer returned. It would throw exceptions for any syntax errors, etc.
I would then use the returned value in the normal logic of the application.
So do you know of any resources or frameworks for building a simple DSL in PHP? Any ideas where to begin?
Thanks!
Technical limitations aside, you might want to really think twice about giving this kind of programming power to (I presume) non-programmers. They will probably mess up in completely unpredictable ways and you'll be the one having to clean up the mess. At least guard it with lots of tests. And possibly legalese as well.
But you asked a question, so I'll try to answer that. There is a distinction to be made between internal-style DSLs (what most people mean when they use the word DSL) and external-style DSLs (which are more like mini languages). Ruby is famous for having a syntax that lends itself well to internal-style DSLs. PHP, on the other hand, is quite bad in that regard.
That said, you can still do some stuff in PHP - The simplest is perhaps to just write up a library of functions and then have your customers write code in plain PHP, using that library. You would have to audit the code of course, but it would give all the benefits of using an existing runtime.
If that's not fancy enough, you will have to dig into the heavy stuff. First you need a parser. If you know how, they can be hand-written fairly easily, but unless you were forced to write one in school or you have a strange hobby of writing that kind of stuff just for fun (I do), it's probably going to take you a bit of work. The basic components of a parser are a tokenizer and some kind of automaton (state machine) that arranges the tokens into a tree structure (an AST).
Once you have your parsed structure, you need to evaluate it. Since this is a DSL, the number of features is limited and performance is probably not your biggest concern, so you could write some object-oriented code around the AST and leave it at that. Otherwise you have options like writing some sort of interpreter or cross-compiling it into another format (PHP would be an obvious choice).
The tricky part all the way through this is mostly in handling edge cases, such as syntax errors, and reporting something meaningful back to the user. Again, just giving them access to a subset of PHP will give you that for free, so consider that first.
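To make the tokenizer / parser / evaluator split concrete, here is a minimal sketch for the pricing rule from the question. It is written in Python only to keep it short; the same three stages port to PHP directly. The names DslSyntaxError, tokenize, parse and run are my own placeholders (they mirror the DSL::tokenize / DSL::parse / DSL::run calls in the question), and the grammar is an assumption: only "if A is B ... else ..." and "return X".

```python
import re

class DslSyntaxError(Exception):
    """Raised for malformed rule text (hypothetical name)."""

KEYWORDS = ("if", "is", "else", "return")
# One token per match: a number, a double-quoted string, or a dotted word.
TOKEN_RE = re.compile(r'\s*(?:(\d+(?:\.\d+)?)|"([^"]*)"|([A-Za-z_][\w.]*))')

def tokenize(source):
    """Turn rule text into a list of (kind, value) tokens."""
    source, tokens, pos = source.rstrip(), [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise DslSyntaxError("unexpected input at offset %d" % pos)
        number, string, word = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif string is not None:
            tokens.append(("str", string))
        else:
            tokens.append((word if word in KEYWORDS else "name", word))
        pos = match.end()
    return tokens

def parse(tokens):
    """Recursive descent: returns ('if', left, right, then, else) or ('return', operand)."""
    pos = 0

    def take(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise DslSyntaxError("expected %r at token %d" % (kind, pos))
        pos += 1

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("num", "str", "name"):
            pos += 1
            return tokens[pos - 1]
        raise DslSyntaxError("expected a value at token %d" % pos)

    def statement():
        if pos < len(tokens) and tokens[pos][0] == "if":
            take("if")
            left = operand()
            take("is")
            right = operand()
            then_branch = statement()
            take("else")
            return ("if", left, right, then_branch, statement())
        take("return")
        return ("return", operand())

    tree = statement()
    if pos != len(tokens):
        raise DslSyntaxError("unexpected trailing input")
    return tree

def run(tree, variables):
    """Walk the tree; dotted names such as customer.zip index into nested dicts."""
    def value(node):
        kind, payload = node
        if kind != "name":
            return payload
        current = variables
        for part in payload.split("."):
            current = current[part]
        return current

    while tree[0] == "if":
        _, left, right, then_branch, else_branch = tree
        tree = then_branch if value(left) == value(right) else else_branch
    return value(tree[1])

rule = 'if customer.zip is "37208"\nreturn 39.99\nelse\nreturn 59.99\n'
price = run(parse(tokenize(rule)), {"customer": {"zip": "63901"}})
```

The error messages here only report a token position; mapping that back to a line and column the customer can act on is where most of the real work goes.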
If anyone else is looking for another option, consider using Twig for creating the DSL/parsing (http://twig.sensiolabs.org/), which is integrated into the Pico CMS (http://pico.dev7studios.com/#).
The standard approach to building a DSL parser is to employ a parser generator aka a compiler-compiler to do the heavy lifting. This allows the developer to express the DSL in an abstract BNF-ish syntax, and not have to get into the nitty gritty of parsing and lexing.
Examples include Yacc in C, Regexp::Grammars in Perl, and ANTLR, which targets Java and several other languages, etc. The PHP option appears to be PHP-PEG.
I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:
This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.
I am also looking at IOC and dependency injection, as they might make the translation process easier and less error prone.
I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.
From then on I can build the AST, symbol tables and control flow.
Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.
Before you ask "What the hell is the point of this?" The answer is... It'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.
EDIT:
I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.
I've been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.
The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).
There's nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).
People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)
The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).
Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.
The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, well, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice. (Amazingly there are people that distrust automated tools and insist on translating million line systems by hand; that's even harder and they normally find out painfully with long time delays, high costs and often outright failure.)
What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.
Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that's painful.
I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.
My answer will address the specific task of parsing Python in order to translate it to another language, and not the higher-level aspects which Ira addressed well in his answer.
In short: do not use the parser module, there's an easier way.
The ast module, available since Python 2.6, is much more suitable for your needs, since it gives you a ready-made AST to work with. I wrote an article on this last year, but in short, use the parse function of ast to parse Python source code into an AST. The parser module will give you a parse tree, not an AST. Be wary of the difference.
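As a quick illustration of what ast.parse hands you, here is a small sketch (assuming Python 3; the sample function price and the variable names are made up for the example). It parses a snippet and inspects the resulting nodes, which are already abstract: no tokens for parentheses or colons, just FunctionDef, Return, IfExp and so on.

```python
import ast

# A made-up snippet standing in for the code you want to translate.
source = (
    "def price(zip_code):\n"
    "    return 39.99 if zip_code == '37208' else 59.99\n"
)

tree = ast.parse(source)          # an ast.Module node
func = tree.body[0]               # the FunctionDef for price()

func_name = func.name
param_names = [arg.arg for arg in func.args.args]
return_stmt = func.body[0]        # an ast.Return node
# Every node type that occurs anywhere in the tree, for a rough overview.
node_types = sorted({type(node).__name__ for node in ast.walk(tree)})

print(func_name, param_names, type(return_stmt.value).__name__)
print(ast.dump(return_stmt.value.test))
```

A translator front end is mostly a walk over nodes like these, with one handler per node type.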
Now, since Python's ASTs are quite detailed, given an AST the front-end job isn't terribly hard. I suppose you can have a simple prototype for some parts of the functionality ready quite quickly. However, getting to a complete solution will take more time, mainly because the semantics of the languages are different. A simple subset of the language (functions, basic types and so on) can be readily translated, but once you get into the more complex layers, you'll need heavy machinery to emulate one language's core in another. For example, consider Python's generators and list comprehensions, which don't exist in PHP (to the best of my knowledge, which is admittedly poor when PHP is involved).
To give you one final tip, consider the 2to3 tool created by the Python devs to translate Python 2 code to Python 3 code. Front-end-wise, it has most of the elements you need to translate Python to something. However, since the cores of Python 2 and 3 are similar, no emulation machinery is required there.
Writing a translator isn't impossible, especially considering that Joel's Intern did it over a summer.
If you want to do one language, it's easy. If you want to do more, it's a little more difficult, but not too much. The hardest part is that, while any Turing-complete language can do what another Turing-complete language does, built-in data types can change what a language does phenomenally.
For instance:
word = 'This is not a word'
print word[::-2]
takes a lot of C++ code to duplicate (ok, well you can do it fairly short with some looping constructs, but still).
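To show what a translator has to spell out when the target language has no extended slicing, here is a rough sketch of the loop hiding behind word[::-2]. The helper name slice_with_step is made up, and it only covers the "whole string with a step" case, not start/stop bounds.

```python
def slice_with_step(text, step):
    """Emulate text[::step] using only indexing and a loop."""
    if step == 0:
        raise ValueError("slice step cannot be zero")
    # A negative step starts from the last character, a positive one from the first.
    index = 0 if step > 0 else len(text) - 1
    picked = []
    while 0 <= index < len(text):
        picked.append(text[index])
        index += step
    return "".join(picked)

word = 'This is not a word'
same_as_slice = slice_with_step(word, -2) == word[::-2]
```

This is the kind of helper the generated code ends up calling wherever the source language leaned on a built-in.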
That's a bit of an aside, I guess.
Have you ever written a tokenizer/parser based on a language grammar? You'll probably want to learn how to do that if you haven't, because that's the main part of this project. What I would do is come up with a basic Turing complete syntax - something fairly similar to Python bytecode. Then you create a lexer/parser that takes a language grammar (perhaps using BNF), and based on the grammar, compiles the language into your intermediate language. Then what you'll want to do is do the reverse - create a parser from your language into target languages based on the grammar.
The most obvious problem I see is that at first you'll probably create horribly inefficient code, especially in more powerful* languages like Python.
But if you do it this way then you'll probably be able to figure out ways to optimize the output as you go along. To summarize:
read provided grammar
compile program into intermediate (but also Turing complete) syntax
compile intermediate program into final language (based on provided grammar)
...?
Profit!(?)
*by powerful I mean that this takes 4 lines:
myinput = raw_input("Enter something: ")
print myinput.replace('a', 'A')
print sum(ord(c) for c in myinput)
print myinput[::-1]
Show me another language that can do something like that in 4 lines, and I'll show you a language that's as powerful as Python.
There are a couple answers telling you not to bother. Well, how helpful is that? You want to learn? You can learn. This is compilation. It just so happens that your target language isn't machine code, but another high-level language. This is done all the time.
There's a relatively easy way to get started. First, go get http://sourceforge.net/projects/lime-php/ (if you want to work in PHP) or some such and go through the example code. Next, you can write a lexical analyzer using a sequence of regular expressions and feed tokens to the parser you generate. Your semantic actions can either output code directly in another language or build up some data structure (think objects, man) that you can massage and traverse to generate output code.
You're lucky with PHP and Python because in many respects they are the same language as each other, but with different syntax. The hard part is getting over the semantic differences between the grammar forms and data structures. For example, Python has lists and dictionaries, while PHP only has assoc arrays.
The "learner" approach is to build something that works OK for a restricted subset of the language (such as only print statements, simple math, and variable assignment), and then progressively remove limitations. That's basically what the "big" guys in the field all did.
Oh, and since you don't have static types in Python, it might be best to write and rely on PHP functions like "python_add" which adds numbers, strings, or objects according to the way Python does it.
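Here is a rough sketch of that "restricted subset first" approach, using Python's ast module as the front end rather than a generated parser (my substitution, to keep it short). It assumes Python 3.8 or later, and PhpEmitter and translate are made-up names. It handles only assignment, names, numeric and string constants, + and single-argument print(); everything else is flagged instead of being silently mistranslated. python_add is the hypothetical PHP runtime helper mentioned above and would still have to be written on the PHP side.

```python
import ast

class PhpEmitter(ast.NodeVisitor):
    """Emit PHP source text for a very small subset of Python."""

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_Assign(self, node):
        if len(node.targets) != 1:
            return self.generic_visit(node)
        return "%s = %s;" % (self.visit(node.targets[0]), self.visit(node.value))

    def visit_Expr(self, node):
        return self.visit(node.value) + ";"

    def visit_Name(self, node):
        return "$" + node.id

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            escaped = node.value.replace("\\", "\\\\").replace("'", "\\'")
            return "'%s'" % escaped
        if type(node.value) in (int, float):
            return repr(node.value)
        return self.generic_visit(node)

    def visit_BinOp(self, node):
        if not isinstance(node.op, ast.Add):
            return self.generic_visit(node)
        # Python's + is overloaded, so defer to a runtime helper in PHP.
        return "python_add(%s, %s)" % (self.visit(node.left), self.visit(node.right))

    def visit_Call(self, node):
        is_print = isinstance(node.func, ast.Name) and node.func.id == "print"
        if not is_print or len(node.args) != 1 or node.keywords:
            return self.generic_visit(node)
        # print() appends a newline, echo does not, so add PHP_EOL.
        return "echo %s, PHP_EOL" % self.visit(node.args[0])

    def generic_visit(self, node):
        # Anything outside the subset is flagged rather than guessed at.
        raise NotImplementedError("cannot translate %s yet" % type(node).__name__)

def translate(source):
    return PhpEmitter().visit(ast.parse(source))

php_code = translate("total = price + 5\nprint(total + 'x')")
print(php_code)
```

Removing limitations then means adding one visit_ method at a time and watching which NotImplementedError shows up next.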
Obviously, this can get much bigger if you let it.
I will second @EliBendersky's point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend that you review his blog. I used ast.parse to write a Python->JavaScript translator (https://bitbucket.org/amirouche/pythonium). I came up with the Pythonium design by reviewing other implementations and trying them on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS, which I also started; it's actually a complete rewrite. The overall design is inspired by PyPy and the http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.
Here is everything I tried, from the beginning to the best solution. Even if it looks like Pythonium marketing, it really isn't (don't hesitate to tell me if something doesn't seem to follow netiquette):
Implement Python semantics in Plain Old JavaScript using prototype inheritance: AFAIK it's impossible to implement Python multiple inheritance using the JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript; the best that exists is single inheritance + mixins, and I'm not sure they handle diamond inheritance. Kind of similar to Skulpt but without Google Closure.
I tried with Google Closure, just like Skulpt (compiler), instead of actually reading the Skulpt code #fail. Anyway, because of the JS prototype-based object system it was still impossible. Creating bindings was very, very difficult: you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the bindings into the build system. I think that Skulpt is a library and you just have to include your .py files in the HTML to be executed, with no compilation phase required of the developer.
Tried pyjaco (compiler), but creating bindings (calling JavaScript code from Python code) was very difficult; there was too much boilerplate code to create every time. Now I think pyjaco is the one nearest to Pythonium. pyjaco is written in Python (using ast.parse too), but a lot is written in JavaScript and it uses prototype inheritance.
I never actually succeeded at running Pyjamas #fail and never tried to read the code #fail again. In my mind Pyjamas was doing API->API translation (or framework to framework) and not Python to JavaScript translation: the JavaScript framework consumes data that is already in the page or data from the server, and Python code is only "plumbing". After that I discovered that Pyjamas was actually a real Python->JS translator.
Still, I think it's possible to do API->API (or framework->framework) translation, and that's basically what I do in Pythonium but at a lower level. Probably Pyjamas uses the same algorithm as Pythonium...
Then I discovered Brython, fully written in JavaScript like Skulpt: no need for compilation and a lot of fluff... but written in JavaScript.
Since the first line written in the course of this project, I knew about PyPy, even the JavaScript backend for PyPy. Yep, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say it was a disaster. I read nowhere why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something, and you probably fine-tune it to be the best at what it's meant to do, in the case of PyPy: Python->C translation. Those assumptions might not be relevant in another context; worse, they can incur overhead. In other words, direct translation will most likely always be better.
Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler, for performance reasons; also, it's actually easier to compile Python to JavaScript than to interpret it.
I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn't even bother to implement an OO system because of past experience. The subset of Python that I managed to translate to JavaScript is:
functions with full parameter semantics, both in definition and calling. This is the part I am most proud of.
while/if/elif/else
Python types were converted to JavaScript types (there are no Python types of any kind)
for could iterate over JavaScript arrays only (for a in array)
Transparent access to JavaScript: if you write Array in the Python code it will be translated to Array in JavaScript. This is the biggest achievement in terms of usability over its competitors.
You can pass functions defined in Python source to JavaScript functions. Default arguments will be taken into account.
It also has a special function called new, which is translated to JavaScript new, e.g. new(Python)(1, 2, spam, "egg") is translated to new Python(1, 2, spam, "egg").
"var" declarations are automatically handled by the translator (a very nice finding from Brett, a PythonJS contributor).
global keyword
closures
lambdas
list comprehensions
imports are supported via requirejs
single class inheritance + mixin via classyjs
This seems like a lot, but it is actually very narrow compared to the full-blown semantics of Python. It's really JavaScript with a Python syntax.
The generated JS is perfect, i.e. there is no overhead; it cannot be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any JS tricks that you can find in .js written by http://superherojs.com/, so it's very readable.
The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master : 793 SLOC + around 100 SLOC of code shared with the other translator.
An adapted version of pystones.py can be translated in Veloce mode cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master
After having set up basic Python->JavaScript translation, I chose another path to translate full Python to JavaScript: the glib way of doing object-oriented class-based code, except that the target language is JS, so you have access to arrays, map-like objects and many other tricks, and all that part was written in Python. IIRC there is no JavaScript code written in the Pythonium translator. Getting single inheritance is not difficult; here are the difficult parts of making Pythonium fully compliant with Python:
spam.egg in Python is always translated to getattribute(spam, "egg"). I did not profile this in particular, but I think that is where it loses a lot of time, and I'm not sure I can improve upon it with asm.js or anything else.
method resolution order: even with the algorithm written in Python, translating it to Python Veloce compatible code was a big endeavour.
getattribute: the actual getattribute resolution algorithm is kind of tricky and it still doesn't support data descriptors
metaclass class based: I know where to plug the code, but still...
last but not least: some_callable(...) is always translated to "call(some_callable)". AFAIK the translator doesn't use inference at all, so every time you do a call you need to check which kind of object it is in order to call it the way it's meant to be called.
This part is factored in https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master It's written in Python compatible with Python Veloce.
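To illustrate the two rewrites in the list above (attribute reads going through getattribute, calls going through call), here is a small sketch. It is not how Pythonium does it: it uses ast.NodeTransformer for an ast->ast rewrite because that is the shortest way to show the idea. It assumes Python 3.9+ for ast.unparse; CompliantRewriter and rewrite are made-up names, and passing the arguments along to call is my own choice.

```python
import ast

class CompliantRewriter(ast.NodeTransformer):
    """Route attribute reads and calls through runtime helper functions."""

    def visit_Attribute(self, node):
        self.generic_visit(node)              # rewrite nested attributes first
        if not isinstance(node.ctx, ast.Load):
            return node                       # leave assignment targets alone
        helper = ast.Name(id="getattribute", ctx=ast.Load())
        new_node = ast.Call(
            func=helper,
            args=[node.value, ast.Constant(value=node.attr)],
            keywords=[],
        )
        return ast.copy_location(new_node, node)

    def visit_Call(self, node):
        self.generic_visit(node)              # rewrite callee and arguments first
        helper = ast.Name(id="call", ctx=ast.Load())
        new_node = ast.Call(
            func=helper,
            args=[node.func] + node.args,
            keywords=node.keywords,
        )
        return ast.copy_location(new_node, node)

def rewrite(source):
    """Return Python source with the two rewrites applied."""
    tree = CompliantRewriter().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

rewritten_call = rewrite("spam.egg(1)")
rewritten_assign = rewrite("a.b = c.d")
print(rewritten_call)
print(rewritten_assign)
```

Every attribute read and every call now pays for a helper invocation at run time, which is the overhead described above.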
The actual compliant translator https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master doesn't generate JavaScript code directly and, most importantly, doesn't do ast->ast transformation. I tried the ast->ast thing, and the ast, even if nicer than a cst, is not nice to work with, even with ast.NodeTransformer. More importantly, I don't need to do ast->ast.
Doing Python ast to Python ast would, in my case at least, maybe be a performance improvement, since I sometimes inspect the content of a block before generating the code associated with it, for instance:
var/global: to be able to var something I must know what I need to var and what I must not. Instead of tracking which variables are created in a given block and inserting the declarations on top of the generated function block, I just look for relevant variable assignments when I enter the block, before actually visiting the child nodes to generate the associated code.
yield: generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write the "var my_generator = function"
So I don't really visit each node once for each phase of the translation.
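A rough sketch of that "look ahead before generating" step, assuming Python 3 and the ast module. scan_function is a made-up name and this is not Pythonium's code: it scans one function body, without descending into nested scopes, and reports which names would need a var declaration and whether the function is a generator.

```python
import ast

def scan_function(func):
    """Return (names needing 'var', is_generator) for one ast.FunctionDef."""
    params = {arg.arg for arg in func.args.args}
    declared_global = set()
    assigned = set()
    is_generator = False
    stack = list(func.body)
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.Lambda, ast.ClassDef)):
            continue                          # nested scopes get their own scan
        if isinstance(node, ast.Global):
            declared_global.update(node.names)
        elif isinstance(node, (ast.Yield, ast.YieldFrom)):
            is_generator = True
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            assigned.add(node.id)
        stack.extend(ast.iter_child_nodes(node))
    # Parameters and globals must not be redeclared with var.
    return sorted(assigned - params - declared_global), is_generator

source = """
counter = 0
def tally(items):
    global counter
    total = 0
    for item in items:
        counter += 1
        total += item
        yield total
"""
tally = ast.parse(source).body[1]
var_names, is_gen = scan_function(tally)
print(var_names, is_gen)
```

With this information in hand, the code generator can emit the declarations and pick the generator form of the function before it visits the body.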
The overall process can be described as:
Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code
Python builtins are written in Python code (!). IIRC there are a few restrictions related to bootstrapping types, but you have access to everything that Pythonium can translate in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master
The JS code generated by Pythonium compliant mode can be read and understood, but source maps will greatly help.
The advice I can give you in light of this experience is kind of old-fart wisdom:
extensively review the subject both in literature and existing projects, closed source or free. When I reviewed the different existing projects I should have given it way more time and motivation.
ask questions! If I had known beforehand that the PyPy backend was useless because of the overhead due to the C/JavaScript semantic mismatch, I might have had the Pythonium idea well before 6 months ago, maybe 3 years ago.
know what you want to do, have a target. For this project I had different objectives: practice a bit of JavaScript, learn more Python and be able to write Python code that would run in the browser (more on that below).
failure is experience
a small step is a step
start small
dream big
do demos
iterate
With Python Veloce mode only, I'm very happy! But along the way I discovered that what I was really looking for was liberating me and others from JavaScript and, more importantly, being able to create in a comfortable way. This led me to Scheme, DSLs, models and eventually domain-specific models (cf. http://dsmforum.org/).
About Ira Baxter's response:
The estimations are not helpful at all. It took me more or less 6 months of free time for both PythonJS and Pythonium. So I can expect more from 6 months of full time. I think we all know what 100 man-years in an enterprise context can mean and not mean at all...
When someone says something is hard or, more often, impossible, I answer that "it only takes time to find a solution for a problem that is impossible"; in other words, nothing is impossible unless it's proven impossible, in this case by a math proof...
If it's not proven impossible then it leaves room for imagination:
finding a proof proving it's impossible
and
If it is impossible there may be an "inferior" problem that can have a solution.
or
if it's not impossible, finding a solution
It's not just optimistic thinking. When I started Python->JavaScript everybody was saying it was impossible. PyPy impossible. Metaclasses too hard. Etc. I think that the only revolution that PyPy brings over the Scheme->C paper (which is 25 years old) is some automatic JIT generation (based on hints written in the RPython interpreter, I think).
Most people who say that a thing is "hard" or "impossible" don't provide the reasons. C++ is hard to parse? I know that, and still there are (free) C++ parsers. Evil is in the detail? I know that. Saying it's impossible alone is not helpful. It's even worse than "not helpful": it's discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.
What would be perfection for you? That's how you define the next goal and maybe reach the overall goal.
I am more interested in knowing what kinds of patterns I could enforce
on the code to make it easier to translate (ie: IoC, SOA ?) the code
than how to do the translation.
I see no patterns that cannot be translated from one language to another, at least in a less than perfect way. Since language to language translation is possible, you'd better aim for this first. I think that, according to http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism. Even if we already know that they are both Turing complete, so...
Framework->Framework, which I better visualize as API->API translation, might still be something to keep in mind as a way to improve the generated code. E.g. Prolog has a very specific syntax, but still you can do Prolog-like computation by describing the same graph in Python... If I were to implement a Prolog to Python translator, I wouldn't implement unification in Python but in a C library, and I would come up with a "Python syntax" that is very readable for a Pythonist. In the end, syntax is only "painting" to which we give a meaning (that's why I started Scheme). Evil is in the detail of the language, and I'm not talking about the syntax. Concepts used in the language, like the getattribute hook (you can live without it), or required VM features like tail-recursion optimisation, can be difficult to deal with. You don't care if the initial program doesn't use tail recursion, and even if there is no tail recursion in the target language you can emulate it using greenlets/event loop.
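As one concrete way of emulating tail calls in a target without them, here is a sketch using a trampoline. Note that this is a simpler substitute for the greenlets/event loop approach mentioned above, not the same technique. trampoline and count_down are made-up names; the translated function returns a zero-argument lambda instead of making the tail call itself.

```python
def trampoline(func, *args):
    """Run func, then keep calling whatever thunk it returns until a value comes back.

    Limitation of this sketch: a function whose real result is itself
    callable would be mistaken for a pending tail call.
    """
    result = func(*args)
    while callable(result):
        result = result()
    return result

def count_down(n, acc=0):
    """Sum 1..n in tail-recursive style, one stack frame at a time."""
    if n == 0:
        return acc
    # Instead of calling count_down directly, hand back a thunk.
    return lambda: count_down(n - 1, acc + n)

# Far deeper than Python's default recursion limit would allow.
total = trampoline(count_down, 100000)
print(total)
```

A translator would generate the thunk-returning form only for calls in tail position, which is something the flow analysis has to establish first.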
For target and source languages, look for:
Big and specific ideas
Tiny and common shared ideas
From this will emerge:
Things that are easy to translate
Things that are difficult to translate
You will also probably be able to know what will be translated to fast and slow code.
There is also the question of the stdlib or any library, but there is no clear answer; it depends on your goals.
Idiomatic code or readable generated code also have solutions...
Targeting a platform like PHP is much easier than targeting browsers, since you can provide a C implementation of slow and/or critical paths.
Given your first project is translating Python to PHP, at least for the PHP3 subset I know of, customising veloce.py is your best bet. If you can implement veloce.py for PHP then you will probably be able to run the compliant mode... Also, if you can translate PHP to the subset of PHP you can generate with php_veloce.py, it means that you can translate PHP to the subset of Python that veloce.py can consume, which would mean that you can translate PHP to JavaScript. Just saying...
You can also have a look at those libraries:
https://bitbucket.org/logilab/astroid
https://bitbucket.org/logilab/pylint-brain
Also you might be interested by this blog post (and comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/
This Google Tech Talk from Ira Baxter is interesting https://www.youtube.com/watch?v=C-_dw9iEzhA
You could take a look at the Vala compiler, which translates Vala (a C#-like language) into C.
I've heard in many places that PHP's eval function is often not the answer. In light of PHP 5.3's LSB and closures we're running out of reasons to depend on eval or create_function.
Are there any conceivable cases where eval is the best (only?) answer in PHP 5.3?
This question is not about whether eval is evil in general, as it obviously is not.
Summary of Answers:
Evaluating numerical expressions (or other "safe" subsets of PHP)
Unit testing
Interactive PHP "shell"
Deserialization of trusted var_export
Some template languages
Creating backdoors for administrators and/or hackers
Compatibility with < PHP 5.3
Checking syntax (possibly not safe)
If you're writing malware and you want to make life hard for the sysadmin who's trying to clean up after you. That seems to be the most common usage case in my experience.
Eric Lippert sums eval up over three blog posts. It's a very interesting read.
As far as I'm aware, the following are some of the only reasons eval is used.
For example, when you are building up complex mathematical expressions based on user input, or when you are serializing object state to a string so that it can be stored or transmitted, and reconstituted later.
The main problem with eval is it being a gateway for malicious code. Thus you should never use it in a context where it can be exploited from the outside, e.g. user provided input.
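For the mathematical-expression case, one common way to keep eval constrained is to whitelist the characters before evaluating anything. The following is only a sketch under stated assumptions: the helper name evalArithmetic is made up for illustration, PHP 7 or later is assumed (so that malformed input throws instead of returning false), and a real application may well prefer a proper expression parser.

```php
<?php
// Hypothetical helper: evaluates arithmetic made only of digits, + - * / ( ) . and spaces.
function evalArithmetic(string $expr)
{
    // Reject anything outside the whitelist, so no identifiers, variables or quotes reach eval.
    if (!preg_match('/^[0-9+\-*\/(). ]+\z/', $expr)) {
        throw new InvalidArgumentException('Unsupported characters in expression');
    }
    try {
        return eval('return ' . $expr . ';');
    } catch (Throwable $e) {
        // ParseError for malformed input; on PHP 8 also DivisionByZeroError.
        throw new InvalidArgumentException('Invalid expression', 0, $e);
    }
}

echo evalArithmetic('(2 + 3) * 4'), "\n"; // 20
```

Input such as phpinfo() never reaches eval here because letters are not on the whitelist.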
One valid UseCase would be in Mocking Frameworks.
Example from PHPUnit_Framework_TestCase::getMock()
// ... some code before
$mock = PHPUnit_Framework_MockObject_Generator::generate(
$originalClassName,
$methods,
$mockClassName,
$callOriginalClone,
$callAutoload
);
if (!class_exists($mock['mockClassName'], FALSE)) {
eval($mock['code']);
}
// ... some code after
There are actually a lot of things happening in the generate method. In layman's terms: PHPUnit will take the arguments to generate and create a class template from them. It will then eval that class template to make it available for instantiation. The point of this, of course, is to have TestDoubles to mock dependencies in UnitTests.
If you are writing a site that interprets and executes PHP code, like an interactive shell would.
...
I'm a systems guy, that's all I got.
You can use eval to create ad-hoc classes:
function myAutoLoad($sClassName){
    # classic part
    if (file_exists($sClassName.'.php')){
        require $sClassName.'.php';
    } else {
        # fallback: define a stub class on the fly
        # \$ keeps the parameter names literal inside the double-quoted string
        eval("
            class $sClassName{
                public function __call(\$sMethod,\$aArgs){
                    return 'No such class: $sClassName';
                }
            }");
    }
}
Although, of course, usage is quite limited (some API's or maybe DI containers, testing frameworks, ORMs which have to deal with databases with dynamic structure, code playgrounds)
eval is a construct that can be used to check for syntax errors.
Say you have these two PHP scripts:
script1.php
<?php
// This is a valid syntax
$a = 1;
script2.php
<?php
// This is an invalid syntax
$a = abcdef
You can check for syntax errors using eval:
$code1 = 'return true; ?>'.file_get_contents('script1.php');
$code2 = 'return true; ?>'.file_get_contents('script2.php');
echo eval($code1) ? 'script1 has valid syntax' : 'script1 has syntax errors';
echo eval($code2) ? 'script2 has valid syntax' : 'script2 has syntax errors';
Unlike php_check_syntax (which is deprecated and removed anyway), the code will not be executed.
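Note that since PHP 7 a parse error inside eval throws a ParseError instead of making eval return false, so the check above needs a try/catch there. A minimal sketch of the same trick, with a made-up helper name (hasValidSyntax) and assuming PHP 7 or later:

```php
<?php
// Hypothetical helper: true when $source (a whole PHP file, opening tag included) parses.
function hasValidSyntax(string $source): bool
{
    try {
        // "return true;" runs first, so the checked code is compiled but never executed.
        // The closing tag after it lets the file's own opening tag take effect.
        return eval('return true; ?>' . $source) === true;
    } catch (ParseError $e) {
        return false;
    }
}

var_dump(hasValidSyntax("<?php\n\$a = 1;\n"));     // bool(true)
var_dump(hasValidSyntax("<?php\n\$a = abcdef\n")); // bool(false)
```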
EDIT:
The other (preferred) alternative is php -l. You can use the solution above if you don't have access to system() or shell execution commands.
This method can inject classes/functions in your code. Be sure to enforce a preg_replace call or a namespace before doing so, to prevent them from being executed in subsequent calls.
As for the OP topic: When (if ever) is eval NOT evil? eval is simply not evil. Programmers are evil for using eval for no reason. eval can shorten your code (mathematical expression evaluation, for example).
I've found that there are times when most features of a language are useful. After all, even GOTO has had its proponents. Eval is used in a number of frameworks and it is used well. For example, CodeIgniter uses eval to distinguish between class hierarchy of PHP 4 and PHP 5 implementations. Blog plugins which allow for execution of PHP code definitely need it (and that is a feature available in Expression Engine, Wordpress, and others). I've also used it for one website where a series of views are almost identical, but custom code was needed for each and creating some sort of insane rules engine was far more complicated and slower.
While I know that this isn't PHP, I found that Python's eval makes implementation of a basic calculator much simpler.
Basically, here's the question:
Does eval make it easier to read? One of our chief goals is communicating to other programmers what was going through our head when we wrote this. In the CodeIgniter example it is very clear what they were trying to accomplish.
Is there another way? Chances are, if you're using eval (or variable variables, or any other form of string look-up or reflection syntax), there is another way to do it. Have you exhausted your other options? Do you have a reasonably limited input set? Can a switch statement be used?
Other considerations:
Can it be made safe? Is there a way that a stray piece of code can work its way into the eval statement?
Can it be made consistent? Can you, given an input, always and consistently produce the same output?
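To illustrate the "is there another way?" question: when the input set is limited, a lookup table of closures (available since PHP 5.3) often replaces an eval on a built-up string. This is only a sketch; the $operations table and the calculate function are hypothetical names chosen for the example.

```php
<?php
// Hypothetical dispatch table: operation names map to closures instead of evaluated strings.
$operations = [
    'add' => function ($a, $b) { return $a + $b; },
    'sub' => function ($a, $b) { return $a - $b; },
    'mul' => function ($a, $b) { return $a * $b; },
];

function calculate(array $operations, $name, $a, $b)
{
    // Unknown names are rejected explicitly rather than being executed as code.
    if (!isset($operations[$name])) {
        throw new InvalidArgumentException("Unknown operation: $name");
    }
    return $operations[$name]($a, $b);
}

echo calculate($operations, 'mul', 6, 7), "\n"; // 42
```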
An appropriate occasion (given the lack of easy alternatives) would be when trusted data was serialized with var_export and it's necessary to unserialize it. Of course, it should never have been serialized in that fashion, but sometimes the error is already done.
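A minimal sketch of that round trip, assuming the exported string really was produced by your own code (the $config array is made up for the example):

```php
<?php
$config = ['debug' => true, 'retries' => 3, 'hosts' => ['a.example', 'b.example']];

// var_export(..., true) returns PHP source code for the value instead of printing it.
$exported = var_export($config, true);

// Only acceptable because $exported comes from a trusted source.
$restored = eval('return ' . $exported . ';');

var_dump($restored === $config); // bool(true)
```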
I suppose eval should be used where the code actually needs to be compiled. I mean cases like template file compilation (template language into PHP for the sake of performance), plugin hook compilation, compilation for performance reasons, etc.
You could use eval to create a setup for adding code after the system installed. Normally if you would want to change the code on the server you would have to add/change existing PHP files. An alternative to this would be to store the code in a database and use eval to execute it. You'd have to be sure that the code added is safe though.
Think of it like a plugin, just one that can do about anything...
You could think of a site that would allow people to contribute code snippets that the users could then dynamically add into their web pages - without them actually persisting code on the webservers filesystem. What you would need is an approval process though...
This eval debate is actually one big misunderstanding in the context of PHP. People are brainwashed about eval being evil, but usually they have no problem using include, although include is essentially the same thing. Include foo is the same as eval file_get_contents foo, so every time you're including something you commit the mortal sin of eval.
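The equivalence can be sketched roughly as follows. The temporary file exists only for the demonstration, and the eval string has to start with ?> because eval begins in PHP mode while an included file begins in HTML mode:

```php
<?php
// Write a tiny PHP file to a temporary location (demo only).
$file = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($file, "<?php\nreturn 6 * 7;\n");

$viaInclude = include $file;

// Prefix a closing tag so the file's own opening tag is honoured inside eval.
$viaEval = eval('?>' . file_get_contents($file));

unlink($file);
var_dump($viaInclude === $viaEval); // bool(true), both are int(42)
```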
Compatibility. It's quite common to provide PHP4 fallbacks. Likewise, you might want to emulate PHP5.4 functionality in 5.3, for example SplString. While simply providing two include variants (include.php4 vs. include.php5) is common, it's sometimes more efficient or readable to resort to eval():
$IMPL_AA = PHP_VERSION >= 5 ? "implements ArrayAccess" : "";
eval(<<<END
class BaseFeature $IMPL_AA {
Where in this case the code would work on PHP4, but expose the nicer API/syntax only on PHP5. Note that the example is fictional.
I've used eval when I had a php-engined bot that communicated with me and I could tell it to do commands via EVAL: php commands here. Still evil, but if your code has no idea what to expect (in case you pull a chunk of PHP code from a database) eval is the only solution.
So, this should hold true for all languages with eval:
Basically, with few exceptions, if you are building the value passed to eval or getting it from an untrusted source you are doing something wrong. The same holds true if you are calling eval on a static string.
Beyond the performance problems with initializing the parser at runtime, and the security issues, you generally mess with the type system.
More seriously, it's just been shown that in the vast majority of cases, there are much more elegant approaches to the solution. However, instead of banning the construct outright, it's nice to think of it as one might think of goto. There are legitimate uses for both, but it is a good red flag that should get you thinking about whether you are approaching the problem the correct way.
In my experience, I've only found legitimate uses that fall in the categories of plugins and privileged user (for instance, the administrator of a website, not the user of such) extensions. Basically things that act as code coming from trusted sources.
Not direct use but the /e modifier to preg_replace utilizes eval and can be quite handy. See example #4 on http://php.net/preg_replace.
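Keep in mind that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback covers the same ground without evaluating a string as code. A small sketch that upper-cases HTML tag names (the sample markup is made up):

```php
<?php
$html = '<p>Hello <b>world</b></p>';

// The callback receives the match groups; nothing is evaluated as PHP source.
$result = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function (array $m) {
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html
);

echo $result, "\n"; // <P>Hello <B>world</B></P>
```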
Whether or not it's evil/bad is subjective and depends entirely on what you consider "good" in a specific context. When dealing with untrusted inputs it is usually considered bad. However, in other situations it can be useful. Imagine writing a one-time data conversion script under extreme deadline pressure. In this situation, if eval works and makes things easier, I would have trouble calling it evil.
I recently had an idea to create my own String class to make using PHP's functions easier. Instead of strlen($str) I write $str->length(). Makes it easier to remember parameter orders in certain functions, like substr.
I ran some timing scripts on it and found that it's about 5 times slower than using the regular functions. I haven't tested this in a real app yet, so I don't know how negligible that will be (1ms vs 5ms or 100ms vs 500ms?).
Anyway, it struck me that now that PHP is focusing more on OOP, wouldn't it make sense for strings, arrays and other basic types to be object oriented? They could then name the functions better and code would just "feel" nicer. And slowly phase out the old way of doing things. Any pros/cons to this?
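For readers who want to picture the wrapper idea, here is a rough sketch. It is not the asker's actual class: the name Str and its methods are assumptions for illustration (a class literally called String is not allowed since PHP 7, where "string" became a reserved class name), and PHP 7.1 or later is assumed for the nullable type.

```php
<?php
// Hypothetical wrapper around the built-in string functions.
final class Str
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function length(): int
    {
        return strlen($this->value);
    }

    // Returns a new Str so calls can be chained.
    public function substr(int $start, ?int $length = null): Str
    {
        $piece = $length === null
            ? substr($this->value, $start)
            : substr($this->value, $start, $length);
        return new Str((string) $piece);
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

$str = new Str('Hello, world');
echo $str->length(), "\n";     // 12
echo $str->substr(7, 5), "\n"; // world
```

Every call goes through an extra method invocation and often an extra object, which is consistent with the slowdown reported above.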
You could always document your code and just use the functions that PHP has provided. It is hard to tell whether or not it is going to affect you in the long run. There are many different factors that can influence that. Give it a shot, maybe and if it does not work out switch back to the original way.
Personally, I would just keep my code well documented instead of getting too fancy. If you want to make the push into more OOP, your best bet is to look into a framework.
I wish PHP would make it that simple from the start. The code would look so neat.
I think it would be nice, but it wouldn't be PHP any more...
Sure, $somestring->length() is nice but, on the flipside, you have to $somestring = new String('asdf...') whenever you make a string and then constantly convert Strings to strings and vice versa. You probably end up making things harder to write & maintain in the long run.
I don't really see the PHP language ever changing in this way - it would change too much of the fundamental language. If you want a language that is like this, you're going to have to change languages, rather than hope for the language to change.
I understand why you're doing this, but libraries and abstractions that do nothing else besides "make it easier on the programmer" are a waste of time, in my humble opinion. They're rarely efficient, they're fluff, and they're even kind of pretentious.
PHP has a lot of faults, that is to be certain, but you'll spend a lot of time that could be better spent elsewhere if you try to invent workarounds for them everywhere you go.
Have a look at Stringy. It gives you a ton of useful methods that make working with strings, especially UTF-8 encoded ones, easier in PHP:
use Stringy\Stringy as S;
$stringy = S::create('Fòô', 'UTF-8');
count($stringy); // 3
I program mostly in PHP and have a site along with other samples in ASP I need to convert over to PHP. Is there some kind of "translator" tool that can either enter lines of code or full slabs that attempts to output a close PHP equivalent?
Otherwise, is there an extensive table that lists comparisons (such as design215.com/toolbox/asp.php)?
It isn't perfect, but this will convert most code.
I think this is a poor way to do it. Sure, a quick-reference table helps a little. But really you need to be fluent in both ASP and current PHP best practices, and envision what a good PHP design would be. The naive transliteration will just give you PHP code that thinks it's ASP. A true port will be easier to understand and maintain.
I agree with Abinadi that the tool by Mike Kohn here is probably still the best available.
We did a successful conversion for a decent size project and wrote a blog about the process: Converting Classic ASP to PHP
While a standard lookup table of functions could work, it would still be a LOT of work to clean everything up. ASP to PHP is probably still one of the easier conversions but, as mentioned, you will most likely end up with the same potentially bad code, just in a different language.
Mike's tool handles fairly basic single-page conversions and is a good starting point, but it was outdated and missing a lot of functions and smarts when used on a bigger project. That said, it's still worth trying out even in its current state.
Here's a list of the main points we had to consider:
Not all types have a compatible type, eg dates and booleans
COM Objects can be used but may need heavy refactoring
Variable case sensitivity (tools can help here a lot)
Variable scoping (asp loves globals)
HTML/JS Get and Post case sensitivity (harder to fix with tools)
Object self references, eg PHP classes need $this->variable
If you use lots of let/get/set be prepared for some heavier re-factoring
Of course, the list above is just things to look out for. If you were to create a tool, you would have to factor in a lot of the basics of parsing/tokenising ASP code before even considering the above differences.
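Two of the points in the list, object self references and variable case sensitivity, look roughly like this on the PHP side. The Invoice class and its members are invented for the illustration and are not taken from the converted project; PHP 7.1 or later is assumed.

```php
<?php
// Illustrative only: class and property names are made up.
class Invoice
{
    private $total = 0.0;

    public function add(float $amount): void
    {
        // A VBScript class can refer to its members by bare name; PHP requires $this->.
        $this->total += $amount;
    }

    public function total(): float
    {
        return $this->total;
    }
}

$invoice = new Invoice();
$invoice->add(20.0);
$invoice->add(5.5);
echo $invoice->total(), "\n"; // 25.5

// VBScript treats Total and total as one variable; in PHP they are two.
$total = 1;
$Total = 2;
var_dump($total === $Total); // bool(false)
```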
Good luck to anyone attempting this conversion project; having done it before, we know the feeling.