String Parse, Lexer or Regex [duplicate] - php

I need to parse a small 'mini language' which users can type on my site. I was wondering what the counterparts of lex and jacc or antlr are for the world of php.

I used LIME Parser generator for PHP a couple of years ago, and it was already mature and stable.
The parser generator itself is written in PHP, which doesn't really matter in any technical sense - as we require only that the generated parser be in PHP - but I like this detail nonetheless. It makes me feel less apologetic about writing software in PHP ;-)
EDIT:
I should add:
Where I wrote "used" it would be more accurate to say that I "played with". I haven't written any production code using lime, yet. But I see no reason not to do so.
The "calculator example" provided with lime uses a tokenize() method which is very far from a real substitute for the power of lex. But if you need a real tokenizer it ought to be possible to use lex on the "front end" to feed tokens to lime on the "back end".

http://pear.php.net/package/PHP_ParserGenerator
http://wezfurlong.org/blog/2006/nov/parser-and-lexer-generators-for-php

I've ported Jison, a Bison clone in JavaScript, to PHP. The result is a killer parser, able to handle both very simple and very complex lexing/parsing. It is now part of Jison, but there are a few updates in my fork - https://github.com/robertleeplummerjr/jison . The files are here - https://github.com/robertleeplummerjr/jison/tree/master/ports/php
See the readme on that page: you create a JavaScript parser and a PHP parser at the same time, and they can do the same or different things. COOL!

I advise you to write your own parser, as it is quite easy today.
The easiest way to do so would be in my opinion to create one class for every syntax type possible (expression, test, loop, etc.).
Then in each class, code the following methods:
one method to determine from a string whether the string is of the given type (a+b is of type 'expression', if(b) is not)
one method to "run" this type (a+b will return a->run() + b->run(), and a->run() will return a value)
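To make the two-method idea above concrete, here is a minimal sketch, assuming PHP 7.4 or later. All names in it (Expr, NumberExpr, VariableExpr, SumExpr, tryParse, parseExpression) are made up for illustration, and it only handles integers, variable names and addition, so treat it as a starting point rather than a finished design.

```php
<?php
// Illustrative only: every class and function name here is invented for this sketch.

interface Expr
{
    // "Run" the node and return its value.
    public function run(array $vars): int;
}

final class NumberExpr implements Expr
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    // Is the string of this type? Returns a node if so, null otherwise.
    public static function tryParse(string $src): ?Expr
    {
        return preg_match('/^\d+$/', $src) === 1 ? new self((int) $src) : null;
    }

    public function run(array $vars): int
    {
        return $this->value;
    }
}

final class VariableExpr implements Expr
{
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public static function tryParse(string $src): ?Expr
    {
        return preg_match('/^[A-Za-z_]\w*$/', $src) === 1 ? new self($src) : null;
    }

    public function run(array $vars): int
    {
        if (!array_key_exists($this->name, $vars)) {
            throw new RuntimeException("Undefined variable {$this->name}");
        }
        return $vars[$this->name];
    }
}

final class SumExpr implements Expr
{
    private $left;
    private $right;

    public function __construct(Expr $left, Expr $right)
    {
        $this->left = $left;
        $this->right = $right;
    }

    public static function tryParse(string $src): ?Expr
    {
        // Split at the last '+', so a+b+c is read as (a+b)+c.
        $pos = strrpos($src, '+');
        if ($pos === false) {
            return null;
        }
        $left = parseExpression(substr($src, 0, $pos));
        $right = parseExpression(substr($src, $pos + 1));
        return ($left !== null && $right !== null) ? new self($left, $right) : null;
    }

    // a+b returns a->run() + b->run().
    public function run(array $vars): int
    {
        return $this->left->run($vars) + $this->right->run($vars);
    }
}

// Ask each syntax type in turn whether the string belongs to it.
function parseExpression(string $src): ?Expr
{
    $src = trim($src);
    foreach ([SumExpr::class, NumberExpr::class, VariableExpr::class] as $type) {
        $node = $type::tryParse($src);
        if ($node !== null) {
            return $node;
        }
    }
    return null;
}
```

With this, parseExpression('a + b') gives back a node you can run() against an array of variable values, while parseExpression('if(b)') gives null because none of the three types recognises it. Splitting on characters like this breaks down once you add parentheses or strings, which is where a real tokenizer starts to pay off.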

PHP preprocessing script? [duplicate]

PHP interpreters are very common, but the PHP syntax & libraries are inconsistent & cumbersome (IMO, of course). I think a language that compiles into PHP but provides higher-level features (like modules, mixins, list comprehensions, etc...) and easier syntax (like optional semicolons, implied returns, no dollar sign for variables, optional brackets and braces, etc...) would be valuable. Does anything like this exist?
I've been researching this a lot and at the moment it seems the answer is no. I'm the author of exactly such a project called Snowscript - it is far from complete, but the documentation is pretty good and some things do work. Would love to hear feedback of what you think about the syntax!
The short answer is "no." CoffeeScript rose to popularity because of a unique confluence of factors. For one, as Wesley points out, JavaScript has a monopoly on the browser platform, while PHP only has a monopoly on .php files. On your own servers, if you don't like PHP, you can just use Ruby, Python, Perl, or any of the myriad JVM or .NET languages.
Another factor is that JavaScript's design was something of an accident. Its creator, Brendan Eich, was told to "make it look like Java"; but semantically, it has more in common with Lisp and Smalltalk. CoffeeScript arguably provides a syntax that's a better fit with JavaScript's inner workings.
JavaScript's own syntactic evolution is severely hindered by the need to maintain compatibility with older browsers. PHP suffers no such limitations, as anyone who's transitioned their code from PHP4 to PHP5 can attest. If you want to make JavaScript a better language, you need a precompiler. If you want to make PHP a better language, post a feature request for PHP6. (Edit: In my original answer, I fell for an April Fool's joke claiming that PHP6 had been released in 2010. Obviously I'm not a PHP guy...)
All of that said, it could be cool to have a language that's like CoffeeScript for PHP. The ongoing success of WordPress, and its use on servers that users often have little control over, attests to PHP's unique place as a deployed language. It's also difficult to use PHP with alternative markup languages like Haml. Perhaps an alternative markup language combined with a fresh PHP syntax could produce a compelling enough reason for people to precompile their PHP.
Browsing the web, I found http://mammouth.boutglay.com/, which looks like the closest thing to a CoffeeScript-like language for PHP. It seems to do the job.
If I've understood what you want correctly, then there's Haxe, which can target PHP, as well as Flash, JavaScript, and others.
I've only ever used it for Flash but found it very useful.
If you like Lisps, have a look at Pharen. I haven't needed to use it yet, but it looks pretty nice - it has defmacro and even transforms tail recursion into loops.
@gosukiwi made Blueberry, which looks like this:
/*
I'm a multiline comment
*/
a = 1 # variable definition
# you can use JSON syntax to define associative arrays
arr = { "name": "Mike", "age": 18, "meta": { "items": [1, 2, 3] } }
if a == 1
  echo("Hello, World!")
end
for i in (0..10)
  echo(i)
end
class MyClass < MyParentClass
  @name
  def Greet
    echo("Hello! My name is " & @name)
  end
end
They also mentioned it in this comment.
Currently there is no production-ready or completed coffeescript-like language/compiler for PHP.
I am the author of CoffeePHP and am working on the compiler for the shorter syntax. It's actually another language.
https://github.com/c9s/coffeephp
Of course, you might be aware of this, but you could simply use nodejs with CoffeeScript... (unless you're specifically attached to PHP)
This library isn't like CoffeeScript, in itself, but it's a foundation for rewriting PHP to declare and use your own syntax. I don't have any experience with it, so don't read this as an endorsement, just an observation. https://github.com/theseer/preprocessor
Take a look at the coffeescript-php project, which is compatible with CoffeeScript 1.3.1. It can be found on GitHub at https://github.com/alxlit/coffeescript-php

how to create a language in php

I've been searching for this for a while though I never get a great answer.
I'm looking for a tutorial or code which parses a defined syntax like a new language. Preferably using strtok or tokenizer.
I need to write a simple language which I will parse later.
Thanks for any help.
edit
The language is quite simple. Basically variable assignment and loops as well as conditional checks. Nothing fancy.
edit
I guess from the answer I got, the title should not be so. Something along the lines of "how to create a language in php" would be better. Thanks.
Basically, "making a language" involves several steps. First, you need a "lexer" which splits your input into substrings belonging to different symbol classes (like "identifier", "number", "operator" etc). Second, you write down a grammar of your language, usually using some kind of BNF. Then you use a program called a "parser generator", which turns your grammar into actual parser code, and finally you combine the lexer and the parser to get an actual compiler.
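For a language as small as the one described in the question, the same steps can be sketched by hand. The example below is only an illustration, assuming PHP 7.4 or later: the lexer is a single regular expression, the grammar is written as a comment, and a hand-written recursive-descent parser stands in for the code a parser generator would produce. The names lex, MiniParser and runStatement are invented here, and only integer assignment with + and - is covered.

```php
<?php
// Step 1, the lexer: split the input into [type, text] pairs.
function lex(string $src): array
{
    // The A modifier anchors each match at the current offset.
    $pattern = '/\s*(?:(?<number>\d+)|(?<ident>[A-Za-z_]\w*)|(?<op>[=+\-]))/A';
    $src = rtrim($src);
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($src)) {
        if (!preg_match($pattern, $src, $m, 0, $offset)) {
            throw new InvalidArgumentException("Unexpected input at offset $offset");
        }
        foreach (['number', 'ident', 'op'] as $type) {
            if (isset($m[$type]) && $m[$type] !== '') {
                $tokens[] = [$type, $m[$type]];
                break;
            }
        }
        $offset += strlen($m[0]);
    }
    return $tokens;
}

// Step 2, the grammar, in a BNF-like notation:
//   statement ::= IDENT "=" expr
//   expr      ::= term (("+" | "-") term)*
//   term      ::= NUMBER | IDENT
//
// Step 3, the parser: one method per grammar rule, evaluating as it goes.
final class MiniParser
{
    private $tokens;
    private $vars;
    private $pos = 0;

    public function __construct(array $tokens, array $vars = [])
    {
        $this->tokens = $tokens;
        $this->vars = $vars;
    }

    // Parses one assignment and returns the updated variable table.
    public function statement(): array
    {
        $name = $this->expect('ident');
        $this->expect('op', '=');
        $this->vars[$name] = $this->expr();
        if ($this->pos < count($this->tokens)) {
            throw new RuntimeException('Unexpected trailing input');
        }
        return $this->vars;
    }

    private function expr(): int
    {
        $value = $this->term();
        while ($this->peekOp('+') || $this->peekOp('-')) {
            $op = $this->tokens[$this->pos++][1];
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term(): int
    {
        [$type, $text] = $this->tokens[$this->pos] ?? ['eof', ''];
        if ($type === 'number') {
            $this->pos++;
            return (int) $text;
        }
        if ($type === 'ident' && array_key_exists($text, $this->vars)) {
            $this->pos++;
            return $this->vars[$text];
        }
        throw new RuntimeException("Unexpected '$text' at token {$this->pos}");
    }

    private function peekOp(string $op): bool
    {
        return ($this->tokens[$this->pos] ?? null) === ['op', $op];
    }

    private function expect(string $type, ?string $text = null): string
    {
        [$t, $v] = $this->tokens[$this->pos] ?? ['eof', ''];
        if ($t !== $type || ($text !== null && $v !== $text)) {
            throw new RuntimeException("Expected $type, got '$v'");
        }
        $this->pos++;
        return $v;
    }
}

// Step 4: combine lexer and parser.
function runStatement(string $line, array $vars = []): array
{
    return (new MiniParser(lex($line), $vars))->statement();
}
```

Calling runStatement('x = 1 + 2') and then passing the returned array into runStatement('y = x - 1 + 10', $vars) carries variables from one line to the next. Loops and conditionals would need more rules plus a tree to interpret instead of evaluating on the fly, which is roughly the point where a generator becomes worth the setup.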
Normally this kind of thing is done in C or Java; I've never heard of a working compiler written in PHP. Still, you can use the PHP tokenizer for the first part (the lexer) - assuming your language has syntax similar to PHP - and try http://pear.php.net/package/PHP_ParserGenerator to generate the parser.
Sorry if this sounds a bit complicated, but so it is.
The linked question "Any decent PHP parser written in PHP?" discusses parsing PHP using PHP.
The value of this for the OP is that the answers provide several ways to obtain parser generators, some of which run in PHP itself, which would likely be useful to him.

Parsing PHP/JavaScript document structure in Delphi

I need to parse PHP & JavaScript documents structure to get the info about document functions & their parameters, classes & their methods, variables, and so on ...
I'm wondering if there is any solution for doing that (no regular expressions) ... I've heard about something called "lexing", however I was unable to find any examples, not even ones that could tell me whether this is what I am looking for ...
thanks in advance
By "lexing" you're referring to lexical analysis, and there are some ancient tools which mostly still work, named Lex and Yacc. Lex builds the tokenizer; Yacc, which stands for "yet another compiler compiler", builds the actual parser.
The idea behind lex/yacc is that you build a grammar for the language, and then run the grammar through the tools to generate source code (normally in C) that you can use to parse a file and take action on specific keywords and tokens. Martin Waldenburg wrote a Pascal version of lex/yacc named PasLex, which has been kicking around for well over a decade now and has been converted to Delphi (although it might not work with the latest versions without some minor work). If I remember correctly, it uses the same .L grammar input files as lex, so any documentation you find for lex/yacc can also be applied to PasLex, with the exception that you get Pascal code as the output.
I'm not sure about current documentation availability. Before the internet (gasp) we used books and most of this was heavily documented on paper which has long turned yellow...however, rumor has it that you might..just might be able to pick up a used copy from Amazon. I cut my teeth on this using a book which is also known as "the dragon book" which appears to have been re-published as recently as 2006.
EDIT:
I was mistaken about the tool; it was TPLY. PasLex was a Delphi grammar implementation... TPLY was the Lex/Yacc tool which generated Pascal source from a .L file.
I'm not sure if this is feasible but for PHP would you be able to invoke the PHP CLI from Delphi to get the information?
If so, you could call token_get_all() and then spit out the result in something that you can parse in Delphi (maybe XML, JSON, etc.). This is lexing. The problem is that this solves only half the problem - you still have to understand each token in context to get the results you want.
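As a rough sketch of the PHP side, something like the following could be run through the PHP CLI. It assumes PHP 7.3 or later with the tokenizer extension available; the function name tokensAsJson and the 'CHAR' label for single-character tokens are my own choices, not anything PHP defines.

```php
<?php
// Lex a PHP source string with the built-in tokenizer and return JSON.
function tokensAsJson(string $phpSource): string
{
    $out = [];
    foreach (token_get_all($phpSource) as $token) {
        if (is_array($token)) {
            // Most tokens come back as [token id, text, line number].
            [$id, $text, $line] = $token;
            $out[] = ['type' => token_name($id), 'text' => $text, 'line' => $line];
        } else {
            // Single characters such as ';' or '(' come back as plain strings.
            // 'CHAR' is a made-up label for them.
            $out[] = ['type' => 'CHAR', 'text' => $token, 'line' => null];
        }
    }
    return json_encode($out, JSON_THROW_ON_ERROR);
}

echo tokensAsJson('<?php function add($a, $b) { return $a + $b; }'), "\n";
```

On the Delphi side you would then walk the JSON array and, for example, treat the first T_STRING token after a T_FUNCTION token as a function name. That contextual step is the half of the problem the tokenizer does not solve for you.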

Are there any Parsing Expression Grammar (PEG) libraries for Javascript or PHP?

I find myself drawn to the Parsing Expression Grammar formalism for describing domain specific languages, but so far the implementation code I've found has been written in languages like Java and Haskell that aren't web server friendly in the shared hosting environment that my organization has to live with.
Does anyone know of any PEG libraries or PackRat Parser Generators for Javascript or PHP? Of course code generators in any languages that can produce Javascript or PHP source code would do the trick.
I have recently written PEG.js, a PEG-based parser generator for JavaScript. It can be used from the command line or you can try it from your browser.
There is in fact one for Javascript: OMeta. http://www.tinlizzie.org/ometa/
I also implemented a version of this in Python: http://github.com/python-parsley/parsley
php PEG https://github.com/maetl/php-peg
This post is really old, but I found it through Google, and it should have been answered.
Language.js:
Language.js is an open source experimental new parser based on PEG (Parsing Expression Grammar), with the special addition of the "naughty OR" operator to handle errors in a unique new way. It makes use of memoization to achieve linear time parsing speed.
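For anyone wondering what memoization means in a PEG (packrat) parser, here is a small hand-written sketch in PHP, assuming PHP 7.4 or later. It is not how Language.js or any of the libraries mentioned here is implemented; the class PackratParser and its two-rule grammar are invented purely to show the idea of caching each (rule, position) result so that backtracking never repeats work.

```php
<?php
// Illustrative packrat parser for the grammar
//   Sum   <- Value '+' Sum / Value
//   Value <- [0-9]+ / '(' Sum ')'
// Each rule returns [value, next position] on success or null on failure.
final class PackratParser
{
    private $input;
    private $memo = [];

    // Counts rule evaluations that were NOT served from the memo table.
    public $ruleCalls = 0;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Returns the computed sum, or null if the whole input does not match.
    public function parse(): ?int
    {
        $result = $this->apply('sum', 0);
        return ($result !== null && $result[1] === strlen($this->input)) ? $result[0] : null;
    }

    // Memoization: every (rule, position) pair is evaluated at most once.
    private function apply(string $rule, int $pos): ?array
    {
        $key = "$rule:$pos";
        if (!array_key_exists($key, $this->memo)) {
            $this->ruleCalls++;
            $this->memo[$key] = $this->$rule($pos);
        }
        return $this->memo[$key];
    }

    private function sum(int $pos): ?array
    {
        // First alternative: Value '+' Sum
        $left = $this->apply('value', $pos);
        if ($left !== null && ($this->input[$left[1]] ?? '') === '+') {
            $right = $this->apply('sum', $left[1] + 1);
            if ($right !== null) {
                return [$left[0] + $right[0], $right[1]];
            }
        }
        // Second alternative: Value. Asking again at the same position hits the memo.
        return $this->apply('value', $pos);
    }

    private function value(int $pos): ?array
    {
        // First alternative: [0-9]+ (\G anchors the match at $pos).
        if (preg_match('/\G[0-9]+/', $this->input, $m, 0, $pos)) {
            return [(int) $m[0], $pos + strlen($m[0])];
        }
        // Second alternative: '(' Sum ')'
        if (($this->input[$pos] ?? '') === '(') {
            $inner = $this->apply('sum', $pos + 1);
            if ($inner !== null && ($this->input[$inner[1]] ?? '') === ')') {
                return [$inner[0], $inner[1] + 1];
            }
        }
        return null;
    }
}
```

Because apply() stores every result, the number of real rule evaluations is bounded by the number of rules times the number of input positions, which is where the linear-time claim comes from. The price is the memory held by the memo table.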
There's also Kouprey for JavaScript, which is a very easy to use PEG generator/library.
Look at https://github.com/leblancmeneses/NPEG, which can easily be converted to PHP.
Parse tree is created with anonymous functions.
Have you looked at ANTLR? It produces lexer and parser code, handles abstract syntax trees, lets you insert code into the grammar to be injected into the lexer/parser code, and it's available for a variety of languages!