Lex and Yacc in PHP [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Is there an implementation of Lex and Yacc in PHP?
If not, can anyone suggest a lexical analyser and parser generator (i.e., anything like Lex and Yacc) that will create PHP code? I'm not too worried about the performance of the resulting parser.
I am sick of using regex to parse things that really shouldn't be parsed with regex...

There's JLexPHP: https://github.com/wez/JLexPHP/blob/master/jlex.php
I've not used it, but there's this: http://pear.php.net/package/PHP_ParserGenerator , which creates a PHP Parser from a Lemon grammar. The project seems to be inactive though.
I also found this project: http://code.google.com/p/antlrphpruntime/ , which uses Antlr. Again inactive though.

I've been looking for this kind of thing for a while. After finding this post, I tried the ANTLR PHP runtime. I can report that it's far from finished. There are several errors in the generated code, where the original Java runtime classes have not been properly translated to PHP (nested class declarations, using '.' instead of '->' when trying to access class methods).
The ANTLR framework itself is quite powerful (can't attest to the efficiency of the generated code).
The graphical tool ANTLRWorks in particular makes it easy to create and debug grammars. Just too bad about the PHP version. It's possible to roll your own, though. The best solution may be to analyse the generated ANTLR runtime class, figure out how it works, and come up with a lightweight, less enterprisey version of it.

Cheap trick: code a recursive descent parser. This will cover a lot of cases. See
Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
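To make the "cheap trick" concrete, here is a minimal sketch of a recursive descent parser in PHP. The grammar, the class name ArithmeticParser and its method names are all made up for illustration (they do not come from any library), and the code assumes PHP 7.4 or later for typed properties. Each grammar rule becomes one method that calls the methods for the rules it refers to.

```php
<?php
// Hypothetical grammar, for illustration only:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'

final class ArithmeticParser
{
    /** @var string[] */
    private array $tokens = [];
    private int $pos = 0;

    public function evaluate(string $input): float
    {
        // Tokenize into numbers, operators and parentheses.
        // Anything else (including whitespace) is silently skipped in this sketch.
        preg_match_all('/\d+(?:\.\d+)?|[-+*\/()]/', $input, $m);
        $this->tokens = $m[0];
        $this->pos = 0;

        $value = $this->expr();
        if ($this->pos < count($this->tokens)) {
            throw new RuntimeException('Unexpected token: ' . $this->tokens[$this->pos]);
        }
        return $value;
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function next(): ?string
    {
        return $this->tokens[$this->pos++] ?? null;
    }

    private function expr(): float
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $op = $this->next();
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term(): float
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $op = $this->next();
            $rhs = $this->factor();
            $value = $op === '*' ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    private function factor(): float
    {
        $token = $this->next();
        if ($token === '(') {
            $value = $this->expr();
            if ($this->next() !== ')') {
                throw new RuntimeException('Expected closing parenthesis');
            }
            return $value;
        }
        if ($token === null || !is_numeric($token)) {
            throw new RuntimeException('Expected a number');
        }
        return (float) $token;
    }
}
```

Usage would look like (new ArithmeticParser())->evaluate('2 + 3 * (4 - 1)'). Operator precedence falls out of the way expr() calls term() and term() calls factor(), which is why this style covers a lot of small languages without a generator.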

Another suggestion: avoid the Lex/Yacc approach and use PHP itself as a good string parser:
for simple tasks and simple translators: use Perl-compatible regular expressions (PCRE) with the PHP preg_* functions. The callbacks have the same power as Awk or Yacc rules, but are written in PHP code (!).
for complex tasks: translate your language to an XML dialect (with a PHP string or PCRE translator, or another translator), then process it with DOM and/or XSLT. XSLT is "rule oriented" (see xsl:template), like Yacc. With XSLT you also have access to PHP functions via registerPHPFunctions(). If you need to get back to a non-XML language or a complex I/O format, process the output (a saved XML file or the XSLT output) again with PCRE and string functions.
PS: for richer and more complex languages, the "translation to XML" task is possible (see the xSugar theory), but not always easy. You can use PHP-PEG to translate with PHP, or you can translate with an external tool, either to cache the XML or to keep a permanent XML-translated version of your language's scripts.
These two options have the same power as Lex and Yacc, and use only built-in PHP classes and functions.
For the complex cases, remember that XML, XSLT, etc. are W3C standards, so XML dialects are "standard formats", XML tools are optimized and still evolving, and XML data is interchangeable.
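As a rough illustration of the "simple tasks" option, the sketch below pairs each pattern with a PHP callback, in the spirit of an Awk or Yacc rule. The mini markup it translates and the function name translate() are invented for this example; preg_replace_callback_array() is a standard PHP function (PHP 7.0+), and the arrow functions assume PHP 7.4+.

```php
<?php
// Translate a made-up mini markup by applying pattern => action rules in order.
function translate(string $input): string
{
    $rules = [
        // *word*  becomes  <b>word</b>
        '/\*(\w+)\*/' => fn(array $m): string => '<b>' . $m[1] . '</b>',
        // {upper:text}  becomes  TEXT
        '/\{upper:([^}]*)\}/' => fn(array $m): string => strtoupper($m[1]),
        // "2 + 3"  becomes  "5" (integer addition only)
        '/\b(\d+)\s*\+\s*(\d+)\b/' => fn(array $m): string => (string) ((int) $m[1] + (int) $m[2]),
    ];

    return preg_replace_callback_array($rules, $input);
}
```

Calling translate('*hi* {upper:abc} 2 + 3') runs the three rules one after another over the whole string. This works well for flat, token-like constructs; nested structures are where a real parser starts to pay off.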


String Parse , Lexer or Regex [duplicate]

I need to parse a small 'mini language' which users can type on my site. I was wondering what the counterparts of lex and jacc or antlr are for the world of php.
I used LIME Parser generator for PHP a couple of years ago, and it was already mature and stable.
The parser generator itself is written in PHP, which doesn't really matter in any technical sense - as we require only that the generated parser be in PHP - but I like this detail nonetheless. It makes me feel less apologetic about writing software in PHP ;-)
EDIT:
I should add:
Where I wrote "used" it would be more accurate to say that I "played with". I haven't written any production code using lime, yet. But I see no reason not to do so.
The "calculator example" provided with lime uses a tokenize() method which is very far from a real substitute for the power of lex. But if you need a real tokenizer it ought to be possible to use lex on the "front end" to feed tokens to lime on the "back end".
http://pear.php.net/package/PHP_ParserGenerator
http://wezfurlong.org/blog/2006/nov/parser-and-lexer-generators-for-php
I've ported Jison, a Bison clone in JavaScript, to PHP. The result is a killer parser, able to handle very simple and very complex lexing/parsing. It is now part of Jison, but there are a few updates in my fork - https://github.com/robertleeplummerjr/jison . The files are here - https://github.com/robertleeplummerjr/jison/tree/master/ports/php
See the readme on that page: you create a JavaScript parser and a PHP parser at the same time, and they are capable of doing the same or different things. COOL!
I advise you to write your own parser, as it is quite easy today.
The easiest way to do so would be in my opinion to create one class for every syntax type possible (expression, test, loop, etc.).
Then in each class, code the following methods:
one method to determine from a string whether the string is of the given type (a+b is of type 'expression', if(b) is not)
one method to "run" this type (a+b will return a->run() + b->run(), and a->run() will return a value)
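A minimal sketch of that class-per-syntax-type idea follows. Everything here (the Node interface, NumberNode, SumNode, tryParse() and parseNode()) is a hypothetical name chosen for illustration, and only integers and '+' are supported. Each class has one static method that decides whether a string is of its type and one run() method that evaluates it. PHP 7.4+ is assumed.

```php
<?php
interface Node
{
    public function run(): int;
}

final class NumberNode implements Node
{
    private int $value;

    private function __construct(int $value)
    {
        $this->value = $value;
    }

    // Returns a node if $src is a plain integer, null otherwise.
    public static function tryParse(string $src): ?Node
    {
        $src = trim($src);
        return preg_match('/^\d+$/', $src) ? new self((int) $src) : null;
    }

    public function run(): int
    {
        return $this->value;
    }
}

final class SumNode implements Node
{
    private Node $left;
    private Node $right;

    private function __construct(Node $left, Node $right)
    {
        $this->left = $left;
        $this->right = $right;
    }

    // Returns a node if $src looks like "a + b", null otherwise.
    public static function tryParse(string $src): ?Node
    {
        // Split on the last '+', so "1+2+3" parses as (1+2)+3.
        $pos = strrpos($src, '+');
        if ($pos === false) {
            return null;
        }
        $left = parseNode(substr($src, 0, $pos));
        $right = parseNode(substr($src, $pos + 1));
        return ($left !== null && $right !== null) ? new self($left, $right) : null;
    }

    public function run(): int
    {
        // "a+b will return a->run() + b->run()"
        return $this->left->run() + $this->right->run();
    }
}

// Try each syntax type in turn; the first class that recognises the string wins.
function parseNode(string $src): ?Node
{
    foreach ([SumNode::class, NumberNode::class] as $class) {
        $node = $class::tryParse($src);
        if ($node !== null) {
            return $node;
        }
    }
    return null;
}
```

With this, parseNode('1 + 2 + 39')->run() evaluates the sum, while parseNode('if(b)') returns null because no class recognises it. Adding a new syntax type means adding one class and listing it in parseNode().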

How to programmatically turn any webpage into an RSS feed? [closed]

Closed 7 years ago.
There are many websites and blogs which provide RSS feeds, but there are also many which do not. I want to turn that type of web page into an RSS feed.
I found some solutions through Google, like Feed43, Page2rss, Dapper etc., but I want an open source project that can perform this task, or a tutorial explaining how to do it.
Please give me suggestions, and if you can explain, you are most welcome.
My preferred language is PHP.
There's nothing magic about RSS. I suggest you read this tutorial to understand how to build an RSS feed from scratch:
http://www.xul.fr/en-xml-rss.html
Then use your PHP skills to build one from your content. A generic HTML-to-RSS scraper can be found online by searching for "html to rss converter" or whatever, but most of these will be hosted solutions and the RSS feeds they produce aren't that great. A good RSS feed requires understanding the content that you're syndicating, not just the raw HTML. IMHO.
In general there is not going to be any "one size fits all" solution to something like this. You'll have to examine the HTML structure of the blog you want to build an RSS feed from, then parse out the content you are interested in, and stick it into an RSS feed.
Here's some PHP things to help get you started:
Parsing HTML:
DOMDocument (swiss-army-knife of HTML/XML parsing)
SimpleXML (easy to use, but requires valid XML)
Tidy (can be used to clean up bad HTML)
Understanding RSS Feeds:
http://en.wikipedia.org/wiki/RSS
To construct them with PHP, you can once again use DOMDocument or SimpleXML. Another option is, depending on the format of the HTML you want to convert into RSS, you may be able to create an XSLT stylesheet to transform it.
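As a sketch of the DOMDocument route, the function below builds a bare-bones RSS 2.0 document. The function name buildRssFeed() and the shape of the $items array (title, link, description keys) are assumptions made for this example; a real feed would probably also want pubDate and guid elements.

```php
<?php
// Build a minimal RSS 2.0 feed. $items is a list of arrays with
// 'title', 'link' and 'description' keys (a shape assumed for this sketch).
function buildRssFeed(string $title, string $link, string $description, array $items): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);

    $channel = $doc->createElement('channel');
    $rss->appendChild($channel);

    // Helper: append <name>text</name>; createTextNode() escapes &, < and >.
    $addText = function (DOMElement $parent, string $name, string $text) use ($doc): void {
        $element = $doc->createElement($name);
        $element->appendChild($doc->createTextNode($text));
        $parent->appendChild($element);
    };

    $addText($channel, 'title', $title);
    $addText($channel, 'link', $link);
    $addText($channel, 'description', $description);

    foreach ($items as $item) {
        $node = $doc->createElement('item');
        $channel->appendChild($node);
        $addText($node, 'title', $item['title']);
        $addText($node, 'link', $item['link']);
        $addText($node, 'description', $item['description']);
    }

    return $doc->saveXML();
}
```

Going through createTextNode() rather than string concatenation means titles such as "A & B" come out properly escaped.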
There is no simple or concrete answer to this question, but I will get you started.
First, you need to build a crawler of sorts. Typically, you are going to want this to be multi-threaded and run in the background on your server. This might be as simple as forking PHP processes on the server, but you might find a more efficient way, depending on how much traffic you expect.
Now probably the best way to start would be to read the DOM. See http://php.net/manual/en/class.domdocument.php. Look for headings and try to associate them with the paragraphs below them. Beware, though, that probably fewer than half the sites out there (and likely far fewer of the ones that don't already have a feed) structure their pages in an organized way. But it is a place to start.
There are plenty of element attributes too you can use, such as alt text. Also, in time you may find a lot of sites using a particular template that you can write code to handle directly.
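The heading-and-paragraph heuristic could be sketched like this with DOMDocument and DOMXPath. The function name extractItems() and the returned array shape are made up for this example, and the rule "take the first p after each h1/h2/h3" is just one guess at what "associate" could mean; real pages will need per-site tuning.

```php
<?php
// Pair every heading with the first paragraph that follows it in the document.
function extractItems(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $items = [];

    foreach ($xpath->query('//h1 | //h2 | //h3') as $heading) {
        // First <p> after the heading in document order, if there is one.
        $paragraph = $xpath->query('following::p[1]', $heading)->item(0);
        $items[] = [
            'title'       => trim($heading->textContent),
            'description' => $paragraph ? trim($paragraph->textContent) : '',
        ];
    }

    return $items;
}
```

Note that a heading with no paragraph of its own will pick up the next heading's paragraph, and that loadHTML() guesses the encoding when the page does not declare one, so both are things to check against the sites you actually crawl.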
You should also have something to read existing feeds. If a site has a feed, no sense in generating one for it, right? Use SimplePie to get started, but there are alternatives if you don't like it. http://simplepie.org/
Once you have parsed the page, you'll want a database backend to track it and changes and what not.
From there, you need something to generate the feed. There are plenty of OOP classes for doing this. Often times, I just write my own, but that is up to you.
If you build sites with the simple Symphony CMS then yes, it's very easy. See this snippet of a tutorial. Learn here

Is there a PDF parser for PHP? [closed]

Closed 9 years ago.
Hi I know about several PDF Generators for php (fpdf, dompdf, etc.)
What I want to know is about a parser.
For reasons beyond my control, certain information I need is only in a table inside a pdf
and I need to extract that table and convert it to an array.
Any suggestions?
I've written one before (for similar needs), and I can say this: Have fun. It's quite a complex task. The PDF specification is large and unwieldy. There are several methods of storing text inside of it. And the kicker is that each PDF generator is different in how it works. So while something like TFPDF or DOMPDF creates REALLY easy to read PDFs (from a machine standpoint), Acrobat makes some really hellish documents.
The reason is how it writes the text. Most DOM based renderers --that I've used-- write the entire line as one string, and position it once (which is really easy to read). Acrobat tries to be more efficient (and it is) by writing only one or maybe a few characters at a time, and positioning them independently. While this REALLY simplifies rendering, it makes reading MUCH more difficult.
The up side here, is that the PDF format in itself is really simple. You have "objects" that follow a regular syntax. Then you can link them together to generate the content. The specification does a good job at describing the file format. But real world reading is going to take a bit of brain power...
Some helpful pieces of advice that I had to learn the hard way if you're going to write it yourself:
Adobe likes to re-map fonts. So character 65 will likely not be A... You need to find a map object and deduce what it's doing based upon what characters are in there. And it is efficient since if a character doesn't appear in the document for that font, it doesn't include it (which makes life difficult if you try to programmatically edit a PDF)...
Write it as abstractly as possible. Write classes for each object type, and each native type (strings, numbers, etc). Let those classes parse for you. There will be a fair bit of repetition in there, but you'll save yourself in the end when you realize that you need to tweak something for only one specific type...
Write for a specific version or two of the PDF spec, and enforce it. Check the version number, and if it's higher than you expect, bail... And don't try to "make it work". If you want to support newer versions, break out the specification and upgrade the parser from there. Don't try to trial and error your way up (it's not fun)...
Good luck with compressed streams. I've found that typically you can't trust the length arguments to verify what you are uncompressing. Sometimes (for some generators) it works well... Others it's off by one or more bytes. I just attempt to deflate it if the filter matches, and then force the length...
When testing lengths, don't use strlen. Use mb_strlen($string, '8bit') since it will compensate for different character sets (and allow potentially invalid characters in other charsets).
Otherwise, best of luck...
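The compressed-stream advice above (attempt to inflate whenever the filter matches, rather than trusting the length entry) might look roughly like the sketch below. The function name inflateStreams() is hypothetical, and the regular expression is a deliberate simplification: it assumes flat, non-nested dictionaries and would not survive many real PDFs. FlateDecode data is zlib-format, so PHP's gzuncompress() (zlib extension) is used.

```php
<?php
// Return the inflated contents of every FlateDecode stream found in $pdf.
// The /Length entry in the dictionary is ignored on purpose.
function inflateStreams(string $pdf): array
{
    $results = [];

    // Simplified match: "<< dictionary >> stream EOL data endstream".
    $pattern = '/<<(.*?)>>\s*stream\r?\n(.*?)endstream/s';
    if (!preg_match_all($pattern, $pdf, $matches, PREG_SET_ORDER)) {
        return $results;
    }

    foreach ($matches as $match) {
        if (strpos($match[1], '/FlateDecode') === false) {
            continue; // not a deflate-compressed stream
        }

        $raw = $match[2];
        // The data may be followed by no EOL, a 1-byte EOL or a 2-byte EOL
        // before "endstream", so try each possibility in turn.
        foreach ([$raw, substr($raw, 0, -1), substr($raw, 0, -2)] as $candidate) {
            $data = @gzuncompress($candidate);
            if ($data !== false) {
                $results[] = $data;
                break;
            }
        }
    }

    return $results;
}
```

The inner loop is the "force the length" part: instead of cutting the data at the declared length, it lets the zlib checksum decide which slice is the real stream.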
I use PDFBox for that (http://pdfbox.apache.org/). This software is Java-based and platform-independent. It works fast and reliably. You can use it via exec or shell execute, or via a PHP/Java Bridge (http://php-java-bridge.sourceforge.net/).
Have you already looked at xPDF ? There is a program in there called pdftotext that will do the conversion. You can call it from PHP and then read in the text version of the PDF. You will need to have the ability to run exec() or system() from php, so this may not work on all hosted solutions though.
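Calling pdftotext from PHP could be sketched as follows. The function names are hypothetical, and the sketch assumes pdftotext is installed and on the PATH and that exec() is permitted. The -layout option (keep the physical layout, which tends to help with tables) and "-" as the output file (write to stdout) are standard pdftotext arguments.

```php
<?php
// Build the shell command; the path is escaped so spaces and quotes are safe.
function pdftotextCommand(string $pdfPath, bool $keepLayout = true): string
{
    $command = 'pdftotext';
    if ($keepLayout) {
        $command .= ' -layout';
    }
    // "-" as the output file name sends the text to stdout.
    return $command . ' ' . escapeshellarg($pdfPath) . ' -';
}

// Run pdftotext and return the extracted text.
function pdfToText(string $pdfPath): string
{
    exec(pdftotextCommand($pdfPath), $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext failed with exit code $exitCode");
    }
    return implode("\n", $lines);
}
```

From there, turning the text into an array is usually a matter of splitting each line on runs of two or more spaces, since -layout tends to keep table columns aligned.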
Also, there are some examples on the PHP site that will convert PDF to text, although it's pretty rough. You may want to try some of those examples as well. On that PHP page, search for luc at phpt dot org.
Zend_Pdf is part of the Zend Framework. Their manual states:
"The Zend_Pdf component is a PDF (Portable Document Format) manipulation engine. It can load, create, modify and save documents. Thus it can help any PHP application dynamically create PDF documents by modifying existing documents or generating new ones from scratch."
Have a look at GhostScript or ITextSharp; there are various cross-platform versions of both.
It may not actually be a table inside the PDF as the PDF loses that sort of information...
This is PHP PDF parser, which exists in two flavours:
Free version can parse PDFs up to format PDF 1.5
Commercial add-on can parse any PDF format (up to current 1.9)

Widely accepted methods for documenting PHP source code in an auto-doc way? [duplicate]

Closed 8 years ago.
For ActionScript 2, I've used NaturalDocs. However it has pretty poor support for PHP. I've looked so far at doxygen and phpDocumentor, but their output is pretty ugly in my opinion. Does anyone have any experience with automatic documentation generation for PHP? I'd prefer to be able to use javadoc-style tags, they are short to write and easy to remember.
ApiGen
http://apigen.org/
ApiGen has support for PHP 5.3 namespaces, packages, linking between documentation, cross referencing to PHP standard classes and general documentation, creation of highlighted source code and experimental support for PHP 5.4 traits.
DocBlox
http://www.docblox-project.org/
PHP 5.3 compatible API Documentation generator aimed at projects of all sizes and Continuous Integration.
able to fully parse and transform Zend Framework 2
There are two well-known and often-used tools that can generate API docs from docblocks:
phpDocumentor, which is specific to PHP, and is probably one of the most used tools for PHP projects
and Doxygen, which is more for C, but is used for PHP by some people.
About the "guidelines": I would say it depends on your projects, but, at least, I would expect to see:
a description of what the method/function does
parameters, with @param type name description of the parameter
return value, with @return type description of the return value
exceptions, with @throws type description of the exception that can be thrown
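Put together, a docblock following that checklist might look like the sketch below. The divide() function is a made-up example; the @param, @return and @throws tags are the standard phpDocumentor ones.

```php
<?php
/**
 * Divides one number by another.
 *
 * @param float $dividend the number to be divided
 * @param float $divisor  the number to divide by
 * @return float the quotient
 * @throws InvalidArgumentException if $divisor is zero
 */
function divide(float $dividend, float $divisor): float
{
    if ($divisor == 0.0) {
        throw new InvalidArgumentException('Division by zero');
    }
    return $dividend / $divisor;
}
```

The docblock has to sit directly above the function it documents, otherwise the generator (and the IDE) will not associate the two.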
A great thing being that recent IDE (like Eclipse PDT, for instance), can read and interpret those markers, to provide hints when you're coding ;-)
Also, there are more and more PHP tools that use php docblocks for some other purpose than documentation.
For instance, PHPUnit allows you to specify some test-related stuff using some specific tags -- see Annotations.
PHPDoc is probably as good as you'll get it in terms of Javadoc style inline commenting. You might also want to look at PHPXRef.
Yes, phpDocumentor (http://www.phpdoc.org/) is an acceptable standard tool for PHP autodocs. It's the de-facto standard.
It's acceptable to follow the general JavaDoc guidelines for code when documenting PHP code. However, you're going to run into cases where that's not enough because PHP and Java are different languages.
For example, PHP functions have no return type and it's inevitable (and sometimes desirable) for a function to return one type in one context, and another type in a second context. JavaDoc guidelines aren't going to help with that, because it's impossible to do in Java.
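For the "one type in one context, another in a second" case, phpDocumentor-style docblocks let you list alternative types separated by a pipe. A small sketch, using a made-up firstVowel() function:

```php
<?php
/**
 * Finds the position of the first vowel in a word.
 *
 * @param string $word the word to search
 * @return int|false zero-based position, or false when there is no vowel
 */
function firstVowel(string $word)
{
    // PREG_OFFSET_CAPTURE makes each match an array of [text, offset].
    return preg_match('/[aeiou]/i', $word, $match, PREG_OFFSET_CAPTURE)
        ? $match[0][1]
        : false;
}
```

Callers then need a strict comparison (=== false), since a match at position 0 is also falsy.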
phpDocumentor can output in a style similar to the php.net documentation (and it's built in with both Smarty and non-Smarty layouts).
Check out PHPDoctor http://peej.github.com/phpdoctor/, a simple yet fully featured Javadoc clone for PHP.
The Wordpress code-base is documented using phpdoc tags (see this) and it's generally pretty good - it's rendered here using phpDocumentor.
Most of the code that I write that's substantial enough that I need to think about autogenerating docs for is done in .Net, so I can't give much of a documentation writer's perspective though
If ugly's a problem, I'd just switch the stylesheet for a custom one. If you don't want to overwrite the generated one, you can use a Firefox plugin like Stylish.
You could edit the template file to add your own stylesheet to override the existing one.

Are there any Parsing Expression Grammar (PEG) libraries for Javascript or PHP?

I find myself drawn to the Parsing Expression Grammar formalism for describing domain specific languages, but so far the implementation code I've found has been written in languages like Java and Haskell that aren't web server friendly in the shared hosting environment that my organization has to live with.
Does anyone know of any PEG libraries or PackRat Parser Generators for Javascript or PHP? Of course code generators in any languages that can produce Javascript or PHP source code would do the trick.
I have recently written PEG.js, PEG-based parser generator for JavaScript. It can be used from a command-line or you can try it from your browser.
There is in fact one for Javascript: OMeta. http://www.tinlizzie.org/ometa/
I also implemented a version of this in Python: http://github.com/python-parsley/parsley
php PEG https://github.com/maetl/php-peg
This post is really old but I found it through google, and It should have been answered
Language.js:
Language.js is an open source experimental new parser based on PEG (Parsing Expression Grammar), with the special addition of the "naughty OR" operator to handle errors in a unique new way. It makes use of memoization to achieve linear time parsing speed
There's also Kouprey for JavaScript, which is a very easy to use PEG generator/library.
Look at https://github.com/leblancmeneses/NPEG; it can easily be converted into PHP.
Parse tree is created with anonymous functions.
Have you looked at ANTLR? It produces lexer and parser code, handles abstract syntax trees, lets you insert code into the grammar to be injected into the lexer/parser code, and it's available for a variety of languages!
