I need to parse the structure of PHP and JavaScript documents to get information about their functions and parameters, classes and methods, variables, and so on.
I'm wondering if there is any solution for doing that without regular expressions. I've heard about something called "lexing", but I was unable to find any examples, not even ones that could tell me whether this is what I am looking for.
thanks in advance
By "lexing" you're referring to lexical analysis, and there are some ancient tools, which mostly still work, named Lex and Yacc. Lex builds the tokenizer; Yacc stands for "yet another compiler compiler" and builds the actual parser.
The idea behind lex/yacc is that you write a grammar for the language and then run it through the paslex tool to generate source code (normally in C) that you can use to parse a file and take action on specific keywords and tokens. Martin Waldenburg wrote a Pascal version of lex/yacc named PasLex, which has been kicking around for well over a decade now and has been converted to Delphi (although it might not work with the latest versions without some minor work). If I remember correctly, it uses the same .L grammar input files as lex, so any documentation you find for lex/yacc can also be applied to paslex, with the exception that you get Pascal code as the output.
I'm not sure about current documentation availability. Before the internet (gasp) we used books, and most of this was heavily documented on paper which has long since turned yellow. However, rumor has it that you might, just might, be able to pick up a used copy from Amazon. I cut my teeth on this using a book also known as "the dragon book", which appears to have been re-published as recently as 2006.
EDIT:
I was mistaken about the tool: it was TPLY. PasLex was a Delphi grammar implementation; TPLY was the Lex/Yacc tool which generated Pascal source from a .L file.
I'm not sure if this is feasible but for PHP would you be able to invoke the PHP CLI from Delphi to get the information?
If so, you could call token_get_all() and then output the result in a format you can parse in Delphi (maybe XML, JSON, etc.). This is lexing. The catch is that it solves only half the problem: you still have to understand each token in context to get the results you want.
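To give an idea of what "understanding each token in context" can look like, here is a rough sketch in JavaScript. It assumes the tokens were exported from PHP as JSON and normalised into { type, text } objects, with single-character tokens given the made-up type "CHAR"; that shape and the function name findFunctions are my assumptions, only the T_* names come from PHP's tokenizer. It handles plain named functions only, so treat it as a starting point.

```javascript
// Assumed input: tokens exported from PHP as JSON and normalised to
// { type, text } objects. "CHAR" is a hypothetical type for the
// single-character tokens such as "(" and ")".
function findFunctions(tokens) {
  const significant = tokens.filter((t) => t.type !== "T_WHITESPACE");
  const functions = [];
  for (let i = 0; i < significant.length; i++) {
    const tok = significant[i];
    const next = significant[i + 1];
    // A named declaration is T_FUNCTION followed directly by a T_STRING name.
    // Anything else (e.g. anonymous functions) is skipped in this sketch.
    if (tok.type !== "T_FUNCTION" || !next || next.type !== "T_STRING") continue;

    const params = [];
    let depth = 0;
    // Collect variables until the parenthesis that closes the parameter list.
    for (let j = i + 2; j < significant.length; j++) {
      const t = significant[j];
      if (t.type === "CHAR" && t.text === "(") {
        depth++;
      } else if (t.type === "CHAR" && t.text === ")") {
        depth--;
        if (depth === 0) break;
      } else if (t.type === "T_VARIABLE" && depth === 1) {
        params.push(t.text);
      }
    }
    functions.push({ name: next.text, params });
  }
  return functions;
}
```

Classes and methods would need the same kind of bookkeeping (tracking T_CLASS and brace depth), which is exactly the part a real parser does for you.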
Related
I need to parse a small 'mini language' which users can type on my site. I was wondering what the counterparts of lex and jacc or antlr are for the world of php.
I used LIME Parser generator for PHP a couple of years ago, and it was already mature and stable.
The parser generator itself is written in PHP, which doesn't really matter in any technical sense - as we require only that the generated parser be in PHP - but I like this detail nonetheless. It makes me feel less apologetic about writing software in PHP ;-)
EDIT:
I should add:
Where I wrote "used" it would be more accurate to say that I "played with". I haven't written any production code using lime, yet. But I see no reason not to do so.
The "calculator example" provided with lime uses a tokenize() method which is very far from a real substitute for the power of lex. But if you need a real tokenizer it ought to be possible to use lex on the "front end" to feed tokens to lime on the "back end".
http://pear.php.net/package/PHP_ParserGenerator
http://wezfurlong.org/blog/2006/nov/parser-and-lexer-generators-for-php
I've ported Jison, a Bison clone in javascript, to php. The results are a killer parser, able to handle very simple and very complex lexing/parsing. It is now part of Jison, but there are a few updates in my fork - https://github.com/robertleeplummerjr/jison . The files are here - https://github.com/robertleeplummerjr/jison/tree/master/ports/php
See the readme on that page: you create JavaScript and PHP parsers at the same time, and they can do the same or different things. COOL!
I advise you to write your own parser, as it is quite easy today.
The easiest way to do so would be in my opinion to create one class for every syntax type possible (expression, test, loop, etc.).
Then in each class, code the following methods:
one method to determine from a string if the string is of the given type (a+b is of type 'expression', if(b) is not)
one method to "run" this type (a+b will return a->run() + b->run(), and a->run() will return a value)
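A minimal sketch of that idea, written in JavaScript for brevity (the same shape works in PHP). The class names, the matches/run method names and the toy language (integers and "+" only) are all made up for illustration:

```javascript
// Hypothetical node classes for a toy language with integers and "+".
class NumberNode {
  // Does this string look like a number?
  static matches(src) {
    return /^\s*\d+\s*$/.test(src);
  }
  constructor(src) {
    this.value = Number(src.trim());
  }
  run() {
    return this.value;
  }
}

class AddNode {
  // In this toy language any string containing "+" is an addition.
  static matches(src) {
    return src.includes("+");
  }
  constructor(src) {
    // Split at the last "+" so that a+b+c groups as (a+b)+c.
    const i = src.lastIndexOf("+");
    this.left = parse(src.slice(0, i));
    this.right = parse(src.slice(i + 1));
  }
  run() {
    return this.left.run() + this.right.run();
  }
}

// Order matters: try the composite type before the leaf type.
const NODE_TYPES = [AddNode, NumberNode];

function parse(src) {
  for (const Type of NODE_TYPES) {
    if (Type.matches(src)) return new Type(src);
  }
  throw new SyntaxError(`Cannot parse: "${src}"`);
}
```

With this, parse("1 + 2 + 3").run() evaluates the whole expression. The approach gets awkward once you add parentheses or operator precedence, because matching on raw strings has to find the right place to split.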
I've been searching for this for a while though I never get a great answer.
I'm looking for a tutorial or code which parses a defined syntax like a new language. Preferably using strtok or tokenizer.
I need to write a simple language which I will parse later.
Thanks for any help.
edit
The language is quite simple. Basically variable assignment and loops as well as conditional checks. Nothing fancy.
edit
I guess from the answer I got, the title should not be so. Something along the lines of "how to create a language in php" would be better. Thanks.
Basically, "making a language" involves several steps. First, you need a "lexer" which splits your input into substrings belonging to different symbol classes (like "identifier", "number", "operator" etc). Second, you write down a grammar of your language, usually using some kind of BNF. Then you use a program called a "parser generator" which turns your grammar into actual parser code, and finally you combine lexer and parser to get an actual compiler.
Normally, this kind of thing is done in C or Java; I've never heard of working compilers written in PHP. Still, you can use the PHP tokenizer for the first part (the lexer), assuming your language has syntax similar to PHP, and try http://pear.php.net/package/PHP_ParserGenerator to generate the parser.
Sorry if this sounds a bit complicated, but so it is.
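To make the lexer step less abstract, here is a small hand-rolled one in JavaScript. The token classes and the tokenize name are assumptions for a toy language with assignments, numbers and a few operators; it is only meant to show the shape of the thing.

```javascript
// Hypothetical token classes for a toy language; adjust to your own syntax.
// The "y" (sticky) flag makes each regex match only at lastIndex.
const TOKEN_SPEC = [
  ["whitespace", /\s+/y],
  ["number", /\d+(?:\.\d+)?/y],
  ["identifier", /[A-Za-z_][A-Za-z0-9_]*/y],
  ["operator", /[-+*\/=<>!]=?|[(){};]/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, regex] of TOKEN_SPEC) {
      regex.lastIndex = pos; // try this token class at the current position
      const m = regex.exec(input);
      if (m) {
        // Whitespace separates tokens but is not passed on to the parser.
        if (type !== "whitespace") tokens.push({ type, text: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character '${input[pos]}' at ${pos}`);
    }
  }
  return tokens;
}
```

For example, tokenize("x = 3 + 42;") yields an identifier, an operator, a number, an operator, a number and a final operator token. The parser then works on that list instead of on raw characters.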
The question "Any decent PHP parser written in PHP?" discusses parsing PHP using PHP.
The value of this for the OP is that the answers provide several ways to obtain parser generators, some of which run in PHP itself, which would likely be useful.
I'm looking for a way to have to write and maintain a certain algorithm (a graphics rendering sub-module of my code, actually) only once. I need the algorithm in C++, PHP and Javascript. Theoretically I could write it in C++ and wrap it into a PHP extension; but that has many issues of itself and doesn't solve the Javascript link.
What I'm looking for, I think, is a tool that converts from a language (doesn't matter which one) into the three (or two, if the source language is one of the three) output languages I'm targeting. I've found MetaL (http://www.meta-language.net/), which seems to do what I want but also looks dead (no updates since 2007) and only targets one of the three languages I need. It needs to be quite flexible and allow me to update the results - for example, I use Cairo in my C++ and PHP rendering, and HTML Canvas on the Javascript side. So I need to customize to the API for certain effects.
Alternatively, I'd settle for a PHP parser and lexer that would give me an AST with enough information for me to write generators for C++ and Javascript as an alternative backend.
Any ideas? Thanks.
You could take a look at Haxe. Haxe is an open source programming language. It can be compiled to JavaScript, Flash/ActionScript, PHP, C++, Java, C#, Python and Lua.
The Emscripten project (which I only spotted last week) might interest you: http://syntensity.blogspot.com/2011/04/emscripten-10.html
This guy has basically written a compiler for C/C++ that compiles to Javascript code.
That should solve the Javascript side of your problem.
Hope that helps.
Another product along the same lines, and a bit better known, is Google Web Toolkit (GWT). It's based on Java, but the end result is similar -- you write your web application in Java code and it compiles the front-end parts into Javascript and the back-end parts into regular Java bytecode. I know you're not asking for Java, but if it interests you, the link is here: http://code.google.com/webtoolkit/
Slightly less useful, but possibly more relevant to your question is PHPJS. This is a project to implement as much of the PHP language in Javascript as possible. They're doing it on a function-by-function basis, so it's only ever going to be an approximation, but given that the language syntaxes are similar, it may be possible to use it to write code that works unchanged in native PHP and also in Javascript on the client side.
Of course the one big down-side of compiling one language into another is that the resulting code is always going to be sub-optimal. There's not much you can do about that, but it's worth bearing in mind before you start down the path of writing a shared code-base in a single language.
Maybe look into 'coding' your original algorithm in xml and using various xslt templates to output to your target languages ? Or possibly antlr (http://www.antlr.org/ http://www.amazon.com/Definitive-Antlr-Reference-Domain-Specific-Programmers/dp/0978739256/ref=sr_1_1?s=books&ie=UTF8&qid=1303114884&sr=1-1).
Maybe you can just write it in Javascript and then run it through a Javascript interpreter from C++ and from PHP.
A completely different approach would be to use assembly code. Write the algorithm in a language of your choice, compile it to ASM source. Then provide the interface wrappers in the deployment languages.
Of course this is all so much 'air pie'. It depends upon so many variables, number of target platforms, importance of optimization, frequency of interface change related to implementation change etc etc
See How to escape quote PHP strings generated by Delphi?
I am just interested to hear if anyone has used Delphi (or possibly BCB) as a code generator for PHP ...
(or thoughts about code generation from one language to another in general)
Hmm, any good books about code generation ?
I've generated Javascript, SQL and Delphi many times. But mostly it is basic substitution (and the example in the post you mention looks the same), not really code generation in the "compiler" sense of the word.
But there are also many real compilers written in Pascal and Delphi-like dialects. The biggest one, I think, is Free Pascal (http://www.freepascal.org), which is a compiler for Object Pascal (aka Delphi).
(added later:)
Besides variable substitution, basic templating engines also fall in this category. Templates are sometimes easier to maintain than the equivalent code that builds the output fragment by fragment. This is used a lot, especially in HTML/CGI land.
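For what it's worth, "basic substitution" can be as small as the sketch below. It is in JavaScript rather than Delphi, and the {{name}} placeholder syntax and both function names are made up for the example. The one thing worth copying is that values get escaped for the target language before they are substituted; inside a single-quoted PHP string only the backslash and the single quote need escaping.

```javascript
// Escape a value for use as a single-quoted PHP string literal.
// Inside single quotes PHP only treats \\ and \' as escapes.
function phpSingleQuoted(value) {
  return "'" + String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Replace {{name}} placeholders (hypothetical syntax) with prepared values.
// Unknown names raise an error instead of silently producing broken code.
function renderTemplate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`Missing template value: ${name}`);
    }
    return values[name];
  });
}

const template = "<?php\n$greeting = {{greeting}};\necho $greeting;\n";
const output = renderTemplate(template, {
  greeting: phpSingleQuoted("It's here"),
});
```

Here output is a small PHP script whose string literal has the apostrophe escaped, so the generated file stays syntactically valid whatever text is fed in.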
You can generate anything from a tool which can export text files no?
You can write it all by hand, or in a "Delphi style" by using Delphi for PHP: http://www.embarcadero.com/products/delphi-for-php
best regards,
anyone has used Delphi (or possibly BCB) as a code generator for PHP
PHP - no, but I'm generating a lot of Delphi/Pascal code from Delphi. I've also generated all other things used for a web application: HTML, JavaScript, CSS - but never PHP because I didn't need that. So it's possible, but simply knowing it's possible is not going to help you much.
thoughts about code generation from one language to another in general
You need to look into "text template engines" for Delphi. I can't suggest any because I wrote my own (and I'm not planning on releasing it under any license).
I find myself drawn to the Parsing Expression Grammar formalism for describing domain specific languages, but so far the implementation code I've found has been written in languages like Java and Haskell that aren't web server friendly in the shared hosting environment that my organization has to live with.
Does anyone know of any PEG libraries or PackRat Parser Generators for Javascript or PHP? Of course code generators in any languages that can produce Javascript or PHP source code would do the trick.
I have recently written PEG.js, a PEG-based parser generator for JavaScript. It can be used from the command line, or you can try it in your browser.
There is in fact one for Javascript: OMeta. http://www.tinlizzie.org/ometa/
I also implemented a version of this in Python: http://github.com/python-parsley/parsley
php PEG https://github.com/maetl/php-peg
This post is really old, but I found it through Google, and it should have been answered.
Language.js:
Language.js is an open source, experimental parser based on PEG (Parsing Expression Grammar), with the special addition of the "naughty OR" operator to handle errors in a unique new way. It makes use of memoization to achieve linear-time parsing.
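In case the memoization part is unclear: a packrat parser caches the result of every (rule, position) pair, so no rule is ever evaluated twice at the same place in the input. The sketch below is not how Language.js is implemented; it is a hand-written toy in JavaScript, with a made-up two-rule grammar and made-up names, just to show the caching.

```javascript
// Toy grammar (an assumption for illustration):
//   Sum    <- Number ("+" Number)*
//   Number <- [0-9]+
function makeParser(input) {
  const memo = new Map(); // "rule@pos" -> cached result
  let calls = 0;          // how many times a rule body actually ran

  // Wrap a rule so each (rule, position) pair is evaluated at most once.
  function memoize(name, rule) {
    return (pos) => {
      const key = `${name}@${pos}`;
      if (!memo.has(key)) {
        calls++;
        memo.set(key, rule(pos));
      }
      return memo.get(key);
    };
  }

  // Each rule returns { value, next } on success or null on failure.
  const number = memoize("Number", (pos) => {
    let end = pos;
    while (end < input.length && input[end] >= "0" && input[end] <= "9") end++;
    return end > pos ? { value: Number(input.slice(pos, end)), next: end } : null;
  });

  const sum = memoize("Sum", (pos) => {
    const first = number(pos);
    if (!first) return null;
    let { value, next } = first;
    while (input[next] === "+") {
      const right = number(next + 1);
      if (!right) break; // PEG repetition: stop and keep what matched so far
      value += right.value;
      next = right.next;
    }
    return { value, next };
  });

  return { sum, number, stats: () => calls };
}
```

In a grammar this small the cache hardly matters, but with ordered choice and backtracking the same rule is often retried at the same position, and that is where the cache pays off.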
There's also Kouprey for JavaScript, which is a very easy to use PEG generator/library.
Look at https://github.com/leblancmeneses/NPEG, which can easily be converted into PHP.
Parse tree is created with anonymous functions.
Have you looked at ANTLR? It produces lexer and parser code, handles abstract syntax trees, lets you insert code into the grammar to be injected into the lexer/parser code, and it's available for a variety of languages!