I am looking for a language parser written in PHP.
The goal is to read a custom language, not read PHP code.
Basically, I want to specify a language syntax, give a code snippet and get back a structure representing it. Then I can traverse that structure to execute the code snippet. I believe the structure will be an AST, but I don't know if this is the only option (I am not intimate with parsers and their vocabulary).
I had a look at the Doctrine DQL parser but it doesn't seem like a generic language parser.
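To make "a structure you can traverse to execute the snippet" concrete, here is a minimal sketch of a tree-walking evaluator. It is written in Python only for brevity; the same shape ports directly to PHP arrays or classes. The tuple-based node layout and the name `evaluate` are illustrative assumptions, not the API of any library mentioned here.

```python
import operator

# Hypothetical AST layout: each node is a tuple whose first item is its kind.
#   ("num", 3)   ("var", "x")   ("binop", "+", left, right)   ("assign", "x", expr)
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(node, env):
    """Walk the tree recursively and return the value of `node`."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "binop":
        _, op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if kind == "assign":
        _, name, expr = node
        env[name] = evaluate(expr, env)
        return env[name]
    raise ValueError(f"unknown node kind: {kind!r}")

# The snippet "x = 2 + 3 * 4" as a parser might hand it back:
tree = ("assign", "x",
        ("binop", "+", ("num", 2),
         ("binop", "*", ("num", 3), ("num", 4))))
env = {}
evaluate(tree, env)
print(env)  # {'x': 14}
```

An AST is the usual choice for this, but not the only one: some interpreters execute directly while parsing, and others compile the tree to bytecode first.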
This is not a complete list, but if you're looking for lexers/parsers that run in PHP itself, one exceptional project is Phlexy by NikiC.
You can find a use case for it in PHP-Parser, which he also wrote. It is a parser for the PHP language that produces an abstract syntax tree (AST), and it is partially generated from a grammar file.
I have never managed to get that far myself. From my own research over the years, there are not many such projects in PHP userspace, and these two libraries from NikiC are very good examples.
If you're looking for a lexer that follows flex rules more closely, I have written one in XDOM that lexes CSS selector syntax. It also comes with a parser, but the parser is not based on a grammar file, even though a grammar exists in the CSS specs. The lexer is based on a .lex file.
Supposedly PHP has an extension (Parle) for building a lexer/parser:
http://php.net/manual/en/parle.examples.php
For Windows, it looks like you can grab a pre-compiled binary. The installation page
http://php.net/manual/en/parle.installation.php
says the available versions are listed here: http://windows.php.net/downloads/pecl/releases/parle/
I would like to know: is there something like pyparsing (a recursive descent parser) for PHP?
I have already looked for one, but it seems no one has written it yet. I hope I am wrong.
Thank you in advance.
I don't know of any maintained parser generators written in PHP. But there are parser generators written in other languages that have PHP as a target language. One I have personally used is kmyacc. There is a PHP- and Windows-compatible fork of it. The grammar for it is written in yacc format and can be compiled to PHP using this command:
kmyacc -l -m %PARSER_PROTOTYPE_FILE% -p %NAME% %GRAMMAR_FILE%
Kmyacc already comes with a procedural parser prototype file for PHP, but I personally use a modified version of an OOP-based prototype.
As an example: This grammar gets compiled into this parser. (Note that the grammar is huge, which is why the generated parser has two and a half thousand lines. A "normal" grammar would obviously be far smaller.)
If all you need to parse are "custom expressions", you can probably code a recursive descent parser by hand fairly easily, if you have already written down your grammar.
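As a rough illustration of how small a hand-written recursive descent parser can be, here is a sketch for arithmetic expressions. It is in Python for brevity, and the structure (one method per grammar rule) carries over to PHP unchanged. The grammar, the class name `Parser`, and the tuple-shaped tree are all assumptions made for this example.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.next()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token: {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return tree

print(parse("1 + 2 * (3 - 4)"))  # ('+', 1, ('*', 2, ('-', 3, 4)))
```

Each grammar rule becomes one method, and operator precedence falls out of which rule calls which.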
See this SO answer for details: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
You can try this:
http://pyparsing.wikispaces.com/message/view/home/41772107
I've been searching for this for a while, but I have never found a great answer.
I'm looking for a tutorial or code which parses a defined syntax like a new language. Preferably using strtok or tokenizer.
I need to write a simple language which I will parse later.
Thanks for any help.
edit
The language is quite simple. Basically variable assignment and loops as well as conditional checks. Nothing fancy.
edit
Judging from the answer I got, the title should be different. Something along the lines of "how to create a language in PHP" would be better. Thanks.
Basically, "making a language" involves several steps. First, you need a "lexer", which splits your input into substrings belonging to different symbol classes (like "identifier", "number", "operator" etc). Second, you write down a grammar of your language, usually using some kind of BNF. Then you use a program called a "parser generator", which turns your grammar into actual parser code, and finally you combine lexer and parser to get an actual compiler.
Normally, this kind of thing is done in C or Java; I've never heard of working compilers written in PHP. Still, you can use the PHP tokenizer for the first part (the lexer), assuming your language has syntax similar to PHP, and try http://pear.php.net/package/PHP_ParserGenerator to generate the parser.
Sorry if this sounds a bit complicated, but that's how it is.
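The first step, the lexer, can be sketched in a few lines with one combined regular expression. This example is in Python for brevity; PHP's `preg_match` with named groups would follow the same pattern. The token classes, the keyword set, and the name `lex` are assumptions chosen to fit a small language with assignment, loops and conditionals.

```python
import re

# Hypothetical token classes for a small language; order matters because
# the first alternative that matches wins.
TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),
    ("IDENT",    r"[A-Za-z_]\w*"),
    ("OPERATOR", r"==|[-+*/=<>]"),
    ("PUNCT",    r"[(){};]"),
    ("SKIP",     r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "else", "while"}

def lex(source):
    """Yield (kind, text) pairs for each token in `source`."""
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected {text!r} at offset {match.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        yield kind, text

for token in lex("while (i < 10) { i = i + 1; }"):
    print(token)
```

The parser then consumes these pairs instead of raw characters, which keeps the grammar free of whitespace handling.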
The question "Any decent PHP parser written in PHP?" discusses parsing PHP using PHP.
The value of this for the OP is that the answers provide several ways to obtain parser generators, some of which run in PHP itself, which would likely be useful to him.
I need to parse PHP & JavaScript documents structure to get the info about document functions & their parameters, classes & their methods, variables, and so on ...
I'm wondering if there is any solution for doing that (no regular expressions) ... I've heard about something called "lexing", but I was unable to find any examples, not even ones that could tell me whether this is what I am looking for ...
thanks in advance
By "lexing" you're referring to lexical analysis, and there are some ancient tools named Lex and Yacc which mostly still work. Lex builds the tokenizer, and Yacc, which stands for "yet another compiler compiler", builds the actual parser.
The concept of Lex/Yacc is that you build a grammar for the language and then run the grammar through the tool to generate source code (normally in C) that you can use to parse a file and take action on specific keywords and tokens. Martin Waldenburg wrote a Pascal version of Lex/Yacc named PasLex, which has been kicking around for well over a decade now and has been converted to Delphi (although it might not work with the latest versions without some minor work). If I remember correctly, it uses the same .L grammar input files as Lex, so any documentation you find for Lex/Yacc can also be applied to PasLex, with the exception that you get Pascal code as the output.
I'm not sure about current documentation availability. Before the internet (gasp) we used books and most of this was heavily documented on paper which has long turned yellow...however, rumor has it that you might..just might be able to pick up a used copy from Amazon. I cut my teeth on this using a book which is also known as "the dragon book" which appears to have been re-published as recently as 2006.
EDIT:
I was mistaken about the tool; it was TPLY. PasLex was a Delphi grammar implementation, whereas TPLY was the Lex/Yacc tool which generated Pascal source from a .L file.
I'm not sure if this is feasible, but for PHP, would you be able to invoke the PHP CLI from Delphi to get the information?
If so, you could call token_get_all() and then spit out the result in something that you can parse in Delphi (maybe XML, JSON, etc.). This is lexing. The catch is that this solves only half the problem: you still have to understand each token in context to get the results you want.
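To illustrate what "understanding each token in context" can mean, here is a sketch that picks named function declarations out of an exported token stream. It is in Python purely as a stand-in for the consuming side (Delphi in this case). The export format is an assumption: a JSON array in which named tokens are `[token_name, text]` pairs (names as returned by PHP's `token_name()`) and single-character tokens are plain strings. The sample data is hand-written for illustration, not real output.

```python
import json

# Hypothetical export of token_get_all() for:
#   <?php function greet($name) {} $f = function() {};
SAMPLE = json.dumps([
    ["T_OPEN_TAG", "<?php "],
    ["T_FUNCTION", "function"], ["T_WHITESPACE", " "], ["T_STRING", "greet"],
    "(", ["T_VARIABLE", "$name"], ")", ["T_WHITESPACE", " "], "{", "}",
    ["T_WHITESPACE", " "],
    ["T_VARIABLE", "$f"], "=", ["T_FUNCTION", "function"],
    "(", ")", "{", "}", ";",
])

def function_names(tokens_json):
    """Return names of declared functions; closures have no name and are skipped."""
    tokens = [t for t in json.loads(tokens_json)
              if not (isinstance(t, list) and t[0] == "T_WHITESPACE")]
    names = []
    # Context rule: a T_FUNCTION directly followed by a T_STRING is a
    # named declaration; followed by "(" it is an anonymous function.
    for current, following in zip(tokens, tokens[1:]):
        if (isinstance(current, list) and current[0] == "T_FUNCTION"
                and isinstance(following, list) and following[0] == "T_STRING"):
            names.append(following[1])
    return names

print(function_names(SAMPLE))  # ['greet']
```

A real extractor would need more rules than this one (functions returning by reference, methods inside classes, and so on), which is exactly the second half of the problem.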
I find myself drawn to the Parsing Expression Grammar formalism for describing domain specific languages, but so far the implementation code I've found has been written in languages like Java and Haskell that aren't web server friendly in the shared hosting environment that my organization has to live with.
Does anyone know of any PEG libraries or packrat parser generators for JavaScript or PHP? Of course, code generators in any language that can produce JavaScript or PHP source code would do the trick.
I have recently written PEG.js, a PEG-based parser generator for JavaScript. It can be used from the command line, or you can try it in your browser.
There is in fact one for JavaScript: OMeta. http://www.tinlizzie.org/ometa/
I also implemented a version of this in Python: http://github.com/python-parsley/parsley
For PHP, there is php-peg: https://github.com/maetl/php-peg
This post is really old, but I found it through Google, and it should have been answered.
Language.js:
Language.js is an open source, experimental new parser based on PEG (Parsing Expression Grammar), with the special addition of the "naughty OR" operator to handle errors in a unique new way. It makes use of memoization to achieve linear-time parsing speed.
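The memoization mentioned above is the core of packrat parsing: the result of every (rule, position) pair is cached, so backtracking never re-parses the same input twice. Here is a minimal sketch of the idea in Python. It is not how Language.js or any other library listed here is implemented; the grammar and function names are assumptions, and whitespace is not handled.

```python
import functools

def parse(text):
    """Evaluate `text` with a tiny PEG; each (rule, position) is computed once.

    Every rule returns (value, next_position) on success or None on failure.
    """

    @functools.lru_cache(maxsize=None)
    def sum_(pos):
        # Sum <- Product '+' Sum / Product
        left = product(pos)
        if left is not None and text.startswith("+", left[1]):
            right = sum_(left[1] + 1)
            if right is not None:
                return left[0] + right[0], right[1]
        # Second alternative: this repeated call is answered from the cache.
        return product(pos)

    @functools.lru_cache(maxsize=None)
    def product(pos):
        # Product <- Atom '*' Product / Atom
        left = atom(pos)
        if left is not None and text.startswith("*", left[1]):
            right = product(left[1] + 1)
            if right is not None:
                return left[0] * right[0], right[1]
        return atom(pos)

    @functools.lru_cache(maxsize=None)
    def atom(pos):
        # Atom <- [0-9]+ / '(' Sum ')'
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        if text.startswith("(", pos):
            inner = sum_(pos + 1)
            if inner is not None and text.startswith(")", inner[1]):
                return inner[0], inner[1] + 1
        return None

    result = sum_(0)
    if result is None or result[1] != len(text):
        raise SyntaxError("input does not match the grammar")
    return result[0]

print(parse("2*(3+4)+5"))  # 19
```

Without the cache, the second alternative of each rule would re-parse the same sub-expression, and deeply nested parentheses could take exponential time. The trade-off is memory proportional to the input length times the number of rules.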
There's also Kouprey for JavaScript, which is a very easy to use PEG generator/library.
Look at https://github.com/leblancmeneses/NPEG; it can easily be converted to PHP.
The parse tree is created with anonymous functions.
Have you looked at ANTLR? It produces lexer and parser code, handles abstract syntax trees, lets you insert code into the grammar to be injected into the lexer/parser code, and it's available for a variety of languages!