PHP-Parser pretty printer with max line length functionality

I wonder whether there is a pretty printer for PHP-Parser that can enforce a desired maximum line length.
(It seems rather simple to implement for some basic cases, such as array element lists and function argument lists, but it gets puzzling with variable expressions, etc.)

As far as I know there's no existing pretty printer for PHP-Parser that takes a right margin into account.
There's the standard pretty printer of PHP-Parser itself.
There's also a PSR-2 pretty printer made for PHP-Parser.
DIY
If these don't suffice, you'll have to write a pretty printer yourself.
IMHO this shouldn't be too hard. You can simply wrap when a node exceeds the right margin and indent by 4 spaces (or whatever you use). Then you can start optimizing things like array definitions and such.
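A minimal sketch of that wrapping rule, written as a standalone PHP function rather than as a real PHP-Parser printer: wrapList() is a hypothetical helper that takes already-printed items (for example, what a custom printer might produce for each array item or call argument) and only breaks the list when the one-line form exceeds the margin. It measures the list on its own, so a real printer would also have to account for the column at which the list starts.

```php
<?php
// Hypothetical helper, not part of PHP-Parser: lay out already-printed items
// on one line if that fits within $maxLen, otherwise one item per line.
function wrapList(string $open, array $items, string $close, int $maxLen = 80, string $indent = '    '): string
{
    $oneLine = $open . implode(', ', $items) . $close;
    if (strlen($oneLine) <= $maxLen) {
        return $oneLine;
    }
    // Too long: break after the opening delimiter, indent every item,
    // and put the closing delimiter on its own line.
    $lines = array_map(fn (string $item): string => $indent . $item, $items);
    return $open . "\n" . implode(",\n", $lines) . "\n" . $close;
}

echo wrapList('foo(', ['$a', '$b'], ')'), "\n";               // foo($a, $b)
echo wrapList('array(', ['$first', '$second'], ')', 15), "\n"; // wrapped: one item per line
```

For nested lists you would call it bottom-up; note that it does not re-indent the lines of an inner list that has already been wrapped, which a real printer would have to do.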

Sorry for the late reply. You could use PHP Front too. Indentation is done for all nestings of statements, 2 spaces per nesting.
Some customized indentation is possible as well.
The parser and the pretty printer are also tested together using the test-files of the source-distribution of PHP.
Each test-file is parsed, pretty-printed, parsed and pretty-printed again.
The correctness of this round-trip is tested by performing a diff between the two parsed and the two pretty-printed files.
However, I was advised to use the standard printer, as it has more features: it supports variable expressions and array expressions, whereas PHP Front still has some bugs with arrays.
Standard pretty printer (variable expressions and arrays):
public function pExpr_Variable(Expr\Variable $node) {
    if ($node->name instanceof Expr) {
        return '${' . $this->p($node->name) . '}';
    } else {
        return '$' . $node->name;
    }
}
public function pExpr_Array(Expr\Array_ $node) {
    return 'array(' . $this->pCommaSeparated($node->items) . ')';
}

Related

Functional Programming - Return Transformed array and the count of the array without calculating twice

I'm trying to write more functional code in PHP without any helper libraries.
I need to return some JSON that includes the results of a transformed array AND the count of that array (for convenience on the data consumer end). Since you're not supposed to use variables in FP, I'm stumped on how to get the count of the array without recalculating/remapping the array.
Here's an example of what my code currently looks like:
$duplicates = array_filter( get_results(), 'find_duplicates' );
send_json( array(
    "duplicates" => $duplicates,
    "numDuplicates" => count( $duplicates )
) );
How can I do the same without storing the result of the filter in a temporary variable, while still avoiding running array_filter() twice?
But first, acknowledge the following...
"Since you're not supposed to use variables in FP..." – that's a ludicrous understanding of functional programming. Variables are used constantly in functional programs. I'm guessing you saw point-free functional programs and then imagined that every program can be expressed in such a way...
the receiver of the JSON could easily get the number of duplicates using JSON.parse(json).duplicates.length because every Array in JavaScript has a length property – it's arguably silly to attach a numDuplicates in the first place. Anyway, let's assume your consumer has a specific API that requires the numDuplicates field...
functional programming is concerned with things like function purity. Maybe you've simplified your code in your post (which is bad; don't do that), or that is in fact your actual code. Either way, get_results() and send_json() are impure: send_json() has an obvious (but unknown) side effect (its return value is not used). You ask for a functional solution, but you have other outstanding non-functional code... so...
There's nothing wrong with the code you have. Sometimes removing a point (a variable or an argument) hurts the readability of the code. In your case, this code is perfectly legible. At this point I feel you're only trying to shorten the code or make it more clever. Your intention is to improve it, but I think you'd actually harm it in this case.
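On point #2, one caveat: array_filter() preserves the original keys, so the filtered array can have gaps, and json_encode() encodes an array with non-sequential keys as a JSON object, which has no length property. Reindexing with array_values() avoids that. A small sketch with made-up data:

```php
<?php
// array_filter() keeps the original keys, so the result can have gaps.
$filtered = array_filter(['a', 'b', 'c', 'd'], fn ($x) => $x === 'b' || $x === 'd');

// Non-sequential keys are encoded as a JSON object, which has no .length.
echo json_encode($filtered), "\n";               // {"1":"b","3":"d"}

// array_values() reindexes from 0, so the result is encoded as a JSON array.
echo json_encode(array_values($filtered)), "\n"; // ["b","d"]
```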
What if I told you...
a variable assignment can be replaced with a lambda? 0_0
(function ($duplicates) {
    send_json([
        'duplicates' => $duplicates,
        'numDuplicates' => count($duplicates)
    ]);
})(array_filter(get_results(), 'find_duplicates'));
But that made the code longer... and there's added abstraction, which hurts readability T_T. In this case, a normal variable assignment (as in your original code) would've been much better.
Combinators
OK, so what if you had some combinators at your disposal to massage the data into the desired shape?
function apply (...$xs) {
    return function ($f) use ($xs) {
        return call_user_func($f, ...$xs);
    };
}
function identity ($x) { return $x; }
// hey look, mom! no points!
send_json(
    array_combine(
        ['duplicates', 'numDuplicates'],
        array_map(
            apply(
                array_filter(get_results(), 'find_duplicates')),
            ['identity', 'count'])));
Did we achieve anything other than writing the weirdest PHP you or anyone else has probably seen? Not to mention, the input is strangely nested in the middle of the expression...
remarks
I'm nearly certain that you'll be disappointed with this answer (or disagree with me), but I'm also pretty confident that you're not sure what you're looking for. A guess: you saw functional programming that "doesn't use variables" and assumed that's how all programs can and should be written; but that's just not the case. Sometimes using a variable or two can dramatically improve the readability of a given expression.
Anyway, all of this is truly beside the point because attaching numDuplicates is arguably an anti-pattern in JSON anyway (point #2 above).

PHP regex parsing - splitting tokens in my own language. Is there a better way?

I am creating my own language.
The goal is to "compile" it to PHP or JavaScript and, ultimately, to interpret and run it in the same language, to make it look like a "middle-level" language.
Right now, I'm focusing on interpreting and running it in PHP.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, that isn't important).
The goal is to get what comes after each :, including any #. For example, :static#sub is a single token, saved as static#sub (without the :).
Everything works the same way except for the :cons token, which takes a value after it. The value is either a number (an integer or a float, called static or dynamic in the language, respectively) or a string, which must start and end with " and supports escaping like \". Multi-line strings aren't supported.
Variables are written like ~0; prefixing them with ^ gets the value from the :scope above.
Functions are similar but use &0 instead, and &-1 points to the current function (no need for ^&-1 here).
That said, is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2
[Update] To address the complexity and maintainability of the pattern, you can split it across lines using the x (PCRE_EXTENDED) modifier and add comments:
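$pattern = <<<'RE'
/
\:(
    # a constant: cons followed by a number...
    (?:cons\#(?:\d+(?:\.\d+)?
    # ...or a double-quoted string
    |(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))
    # an identifier, e.g. static or static#sub
    |(?:[a-z]+(?:\#[a-z]+)?
    # a variable or function reference, e.g. ~0, ^~0, &0 or &-1
    |\^?[\~\&](?:[a-z]+|\d+|\-1))
)
/x
RE;
// PHP has no "g" modifier; preg_match_all() collects every match instead.
// The nowdoc keeps the backslashes exactly as written on regex101.
preg_match_all($pattern, $input, $matches); // $matches[1]: tokens without the ':'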
Beware that with the x modifier, all unescaped whitespace outside character classes is ignored, and an unescaped # starts a comment that runs to the end of the line, so a literal # (as in cons# and static#sub) must be escaped as \#. Line breaks are therefore safe to use for layout.
Now, if you want to pimp your lexer and parser, read on:
What (f)lex (flex being the free implementation of lex) does is simply let you pass a list of regexps, each optionally assigned to a "group". You can also try ANTLR and its PHP target runtime to get the work done.
As for your request, I wrote a lexer in the past following the principle of flex. The idea is to cycle through the regexps like flex does:
$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS]; // placeholders; each regexp must be anchored with ^
$input = ...;
$tokens = [];
while ($input !== '') {
    $best = null;
    $k = null;
    foreach ($regexp as $re => $kind) {
        // Ignore empty matches, or the loop would never advance.
        if (preg_match($re, $input, $match) && $match[0] !== '') {
            $best = $match[0];
            $k = $kind;
            break; // the first matching regexp wins
        }
    }
    if (null === $best) {
        throw new Exception("could not analyze input, invalid token");
    }
    $tokens[] = ['kind' => $k, 'value' => $best];
    $input = substr($input, strlen($best)); // move past the token
}
Since flex and yacc/Bison are designed to work together, the usual pattern is to read only up to the next token, on demand (that is, they don't run a loop that reads all the input before parsing).
The $regexp array can be anything. I assumed "regexp" => "kind" key/value pairs, but you can also use an array like this:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
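The read-until-next-token pattern mentioned above can be sketched in PHP with a generator: the parser pulls one token at a time, and the lexer only scans as far as it has been asked to. lexOnDemand() and the two patterns in the usage lines are illustrative assumptions; the patterns use the "regexp" => "kind" form and start with \G so they only match at the current offset.

```php
<?php
// Pull-style lexer sketch: a generator yields one token at a time, so the
// parser drives the scanning instead of receiving a full token list up front.
// Each pattern must start with \G so it only matches at the current offset.
function lexOnDemand(string $input, array $patterns): Generator
{
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        foreach ($patterns as $re => $kind) {
            if (preg_match($re, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $offset += strlen($m[0]);
                yield ['kind' => $kind, 'value' => $m[0]];
                continue 2; // resume scanning at the new offset
            }
        }
        throw new RuntimeException("Invalid token at offset $offset");
    }
}

// Illustrative patterns; only the first token is scanned by this call.
$lexer = lexOnDemand('a+b', ['/\G[a-z]+/' => 'ID', '/\G\+/' => 'PLUS']);
$first = $lexer->current(); // ['kind' => 'ID', 'value' => 'a']
```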
You can also enable or disable regexps using groups (like flex start conditions): for example, consider the following code:
class Foobar {
    const FOOBAR = "arg";
    function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what comes after the "="). And there is no need to activate the class identifier regexp when you are already inside a class.
flex's groups also let you read comments: a first regexp activates a group that ignores the other regexps until some match ends it (like "*/").
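A minimal sketch of such groups in PHP, assuming a tiny language with identifiers, punctuation and /* */ comments: each state has its own rule list, and the "/*" rule switches to a COMMENT state in which every other rule is inactive until "*/" switches back. RULES and lexWithStates() are hypothetical names, and rules are tried in order with the first match winning, unlike flex's longest-match rule.

```php
<?php
// Sketch of flex-style groups (start conditions): each state has its own
// rule list, and a rule may switch the active state. States, token kinds and
// patterns are illustrative assumptions. Rule format: [regexp, kind|null, next state|null].
const RULES = [
    'INITIAL' => [
        ['~\G/\*~', null, 'COMMENT'],          // "/*" enters the comment state
        ['~\G\s+~', null, null],               // whitespace is skipped
        ['~\G[A-Za-z_]\w*~', 'ID', null],
        ['~\G[^\sA-Za-z_]~', 'PUNCT', null],
    ],
    'COMMENT' => [
        ['~\G\*/~', null, 'INITIAL'],          // "*/" returns to the normal state
        ['~\G(?:[^*]|\*(?!/))+~', null, null], // anything else in a comment is ignored
    ],
];

function lexWithStates(string $input): array
{
    $tokens = [];
    $state = 'INITIAL';
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        // Only the rules of the active state are tried.
        foreach (RULES[$state] as [$re, $kind, $next]) {
            if (preg_match($re, $input, $m, 0, $offset) === 1) {
                if ($kind !== null) {
                    $tokens[] = [$kind, $m[0]];
                }
                if ($next !== null) {
                    $state = $next;
                }
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        throw new RuntimeException("No rule matches at offset $offset in state $state");
    }
    return $tokens;
}

print_r(lexWithStates('f(x); /* f(y); */ g();'));
```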
Note that this is a naïve approach: a lexer generator like flex actually generates an automaton, which uses different states to represent your rules (each regexp is itself an automaton).
It uses an algorithm based on packed indexes, or something similar, which is memory- and speed-efficient (I used the naïve "for each" because I did not understand the algorithm well enough).
As I said, it was something I made in the past, six or seven years ago.
It was on Windows.
It was not particularly quick (well it is O(N²) because of the two loops).
I also think PHP was compiling the regexps each time. Now that I do Java, I use the Pattern class, which compiles the regexp once and lets you reuse it. I didn't know whether PHP does the same by first looking in a cache for an already compiled regexp. (In fact, PHP's PCRE extension does keep a cache of compiled patterns, so reusing the same pattern string avoids recompiling it.)
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
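Following that idea, here is a sketch of a flex-like tokenizer for the language in the question that uses preg_match() with an offset instead of substr(). With an offset, ^ still refers to the start of the whole string, so every pattern is anchored with \G, which matches exactly at the offset. The token kinds (WS, CONST, REF, WORD) and the simplified string rule (a backslash escapes any character) are assumptions for illustration, not the question's exact grammar.

```php
<?php
// Sketch of the flex-like loop for the language in the question, using
// preg_match() with an offset instead of substr(). Kinds and patterns are
// assumptions. Order matters: CONST must be tried before WORD, otherwise
// ":cons#1" would be read as the word "cons" followed by an invalid "#1".
const TOKEN_PATTERNS = [
    '/\G\s+/'                                           => 'WS',
    '/\G:cons#(?:\d+(?:\.\d+)?|"(?:\\\\.|[^"\\\\])*")/' => 'CONST',
    '/\G:\^?[~&](?:[a-z]+|\d+|-1)/'                     => 'REF',
    '/\G:[a-z]+(?:#[a-z]+)?/'                           => 'WORD',
];

function tokenize(string $input): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        foreach (TOKEN_PATTERNS as $re => $kind) {
            // The first pattern that matches at $offset wins (flex would prefer the longest match).
            if (preg_match($re, $input, $m, 0, $offset) === 1) {
                if ($kind !== 'WS') {
                    // Store the token without its leading ':'.
                    $tokens[] = ['kind' => $kind, 'value' => substr($m[0], 1)];
                }
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        throw new RuntimeException("Invalid token at offset $offset");
    }
    return $tokens;
}

print_r(array_column(tokenize(":call:static#sub:^~0:~1 :store:~0"), 'value'));
```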
You should try the ANTLR3 PHP code generation target, since the ANTLR grammar editor is pretty easy to use, and you will end up with much more readable and maintainable code :)

Creating custom PHP Syntax Parser

I am thinking about how one would go about creating a PHP equivalent for a couple of libraries I found for CSS and JS.
One is Less CSS, which is a dynamic stylesheet language. The basic idea behind Less CSS is that it lets you write more dynamic CSS rules containing entities that "regular" CSS does not support, such as mixins and functions, and then Less compiles that syntax into regular CSS.
Another interesting JS library which behaves in a (kind of) similar pattern is CoffeeScript where you can write "tidier & simpler" code which then gets compiled into regular Javascript.
How would one go about creating a similar simple interface for PHP? Just as a proof of concept; I am only trying to learn. Let's take a simple use case: extending classes.
class a
{
    function a_test()
    {
        echo "This is test in a ";
    }
}
class b extends a
{
    function b_test()
    {
        parent::a_test();
        echo "This is test in b";
    }
}
$b = new b();
$b->b_test();
Suppose I want to let the user write class b as (just for the example):
class b[a] //would mean b extends a
{
    function b_test()
    {
        [a_test] //would mean parent::a_test()
        echo "This is test in b";
    }
}
And then let that code "resolve" to regular PHP (usually by running a separate command/process, I would think). My question is how I would go about creating something like this. Can it be done in PHP, or would I need something like C/C++? How should I approach this problem? Are there any resources online? Any pointers are deeply appreciated!
Language transcoders are not as easy as one might think.
The example you gave can be implemented very easily with a preg_replace that looks for class definitions and replaces [a] with extends a.
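For illustration, a minimal sketch of that preg_replace approach, covering both toy rules from the question (class b[a] and a standalone [a_test] line). transcode() is a hypothetical name, and the patterns only handle the exact forms shown.

```php
<?php
// Naive regex "transcoder" for the two toy rules in the question:
//   class b[a]   ->  class b extends a
//   [a_test]     ->  parent::a_test();   (only when it stands alone on a line)
// It ignores strings, comments and real PHP array syntax such as $x[foo],
// which is exactly where this approach stops scaling.
function transcode(string $source): string
{
    $source = preg_replace('/\bclass\s+(\w+)\s*\[(\w+)\]/', 'class $1 extends $2', $source);
    return preg_replace('/^([ \t]*)\[(\w+)\][ \t]*$/m', '$1parent::$2();', $source);
}

echo transcode("class b[a]\n{\n    function b_test()\n    {\n        [a_test]\n        echo \"This is test in b\";\n    }\n}\n");
```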
But more complex features need a transcoder which is a suite of smaller logical pieces of code.
In programmer jargon, people often incorrectly call transcoders compilers, but the difference between the two is that compilers read source code and output raw binary machine code, while transcoders read source code and output (different) source code.
The PHP (or JavaScript) runtime, for example, is neither a compiler nor a transcoder; it's an interpreter.
But enough about jargon; let's talk about transcoders:
To build a transcoder, you must first build a tokenizer. It breaks the source code apart into tokens: if it sees an entire word such as 'class', the name of a class, 'function' or the name of a function, it captures that word and considers it a token. When it encounters something else, such as an opening round bracket, an opening brace or a square bracket, it considers that another token.
Luckily, all the tokens PHP recognizes can already be scanned by token_get_all(), a function bundled with PHP. You may have some trouble because PHP makes some assumptions about how you use symbols, but all in all you can make use of this function.
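A quick way to see what token_get_all() gives you is to dump the tokens of a small snippet. describeTokens() is a hypothetical helper; it relies on the fact that token_get_all() returns single-character tokens such as { as plain strings and everything else as [token id, text, line number] arrays, and it drops whitespace tokens to keep the listing short.

```php
<?php
// Print the token stream PHP's own tokenizer produces for a snippet.
// token_get_all() needs the opening tag; single-character tokens come back
// as plain strings, everything else as [token id, text, line number].
function describeTokens(string $code): array
{
    $out = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            [$id, $text] = $token;
            if ($id === T_WHITESPACE) {
                continue; // skip whitespace to keep the listing short
            }
            $out[] = token_name($id) . ' ' . $text;
        } else {
            $out[] = $token;
        }
    }
    return $out;
}

print_r(describeTokens('<?php class b extends a { }'));
```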
The tokenizer creates a flat list of all the tokens it finds and gives it to the parser.
The parser is the second phase of your transcoder. It reads the list of tokens and decides things like "if token[0] is a class keyword and token[1] is a name, then we have a class", etc. After running through the entire list of tokens, we should have an abstract syntax tree.
The abstract syntax tree is a structure that symbolically retains only the relevant information about the source code.
$ast = array(
    'my_derived_class' => array(
        'implements' => array(
            'my_interface_1',
            'my_interface_2',
            'my_interface_3'),
        'extends' => 'my_base_class',
        'members' => array(
            'my_property_name' => 'my_default_value',
            'my_method_name' => array( /* ... */ )
        )
    )
);
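To make the parser step concrete, here is a toy sketch that walks the token_get_all() output and builds an array shaped like $ast above, following the "class keyword followed by a name means we have a class" rule. It only understands class headers (class X extends Y implements A, B) with unqualified names; members, namespaced names (which PHP 8 tokenizes as T_NAME_QUALIFIED rather than T_STRING) and everything else are ignored. parseClassHeaders() is a hypothetical name.

```php
<?php
// Toy parser: build an AST array like $ast from token_get_all() output,
// for class headers only. Unqualified names only; members are not parsed.
function parseClassHeaders(string $code): array
{
    // Normalise tokens to [id-or-char, text] and drop whitespace.
    $tokens = [];
    foreach (token_get_all($code) as $t) {
        if (!is_array($t)) {
            $tokens[] = [$t, $t];
        } elseif ($t[0] !== T_WHITESPACE) {
            $tokens[] = [$t[0], $t[1]];
        }
    }

    $ast = [];
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        // "If this token is T_CLASS and the next one is a name, we have a class."
        if ($tokens[$i][0] !== T_CLASS || ($tokens[$i + 1][0] ?? null) !== T_STRING) {
            continue;
        }
        $name = $tokens[++$i][1];
        $node = ['implements' => [], 'extends' => null, 'members' => []];
        // Read the header up to the opening brace of the class body.
        while (++$i < $count && $tokens[$i][0] !== '{') {
            if ($tokens[$i][0] === T_EXTENDS) {
                $node['extends'] = $tokens[++$i][1];
            } elseif ($tokens[$i][0] === T_STRING) {
                $node['implements'][] = $tokens[$i][1]; // names after "implements"
            }
        }
        $ast[$name] = $node;
    }
    return $ast;
}

print_r(parseClassHeaders('<?php class my_derived_class extends my_base_class implements my_interface_1, my_interface_2 { }'));
```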
After you get an abstract syntax tree you need to walk through it and output the destination source code.
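A matching toy sketch of the output step, assuming an AST array shaped like $ast above: emitClasses() (a hypothetical name) walks the tree and emits class headers with empty bodies. Generating the members is left out, since their AST shape is elided in the example.

```php
<?php
// Toy code generator: walk an AST array shaped like $ast and emit PHP class
// headers. Members are not generated (their AST shape is elided in the example).
function emitClasses(array $ast): string
{
    $out = '';
    foreach ($ast as $name => $node) {
        $header = 'class ' . $name;
        if (!empty($node['extends'])) {
            $header .= ' extends ' . $node['extends'];
        }
        if (!empty($node['implements'])) {
            $header .= ' implements ' . implode(', ', $node['implements']);
        }
        $out .= $header . "\n{\n}\n";
    }
    return $out;
}

echo emitClasses([
    'my_derived_class' => [
        'implements' => ['my_interface_1', 'my_interface_2'],
        'extends' => 'my_base_class',
        'members' => [],
    ],
]);
```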
The real tricky part is the parser which (depending on the complexity of the language you are parsing) may need a backtracking algorithm or some other form of pattern matching to differentiate similar cases against one another.
I recommend reading about this in Terence Parr's book Language Implementation Patterns (http://pragprog.com/book/tpdsl/language-implementation-patterns), which describes in detail the design patterns needed to write a transcoder.
In Parr's book you'll find out why some languages, such as HTML or CSS, are much simpler (structurally) than PHP or JavaScript, and how that relates to the complexity of the language parser.

Spaces, line breaks, tabs: do they affect server performance?

I'm learning PHP, and before I go further with my current coding style, I want to make sure:
Do line breaks and spaces affect the performance of the server? I always add them for readability, for example in the following code:
import('something') ;
$var = 'A' ;
$varb = 'B' ;
switch($var) {
  case 'A' :
    doSomething() ;
    doAnotherThing() ;
    break ;
}
if ($var == $varb) { header('Location: somewhere.php') ; }
In summary:
I add a space before a semicolon
I add spaces before and after assignment and comparison operators
I add a space between ) and {
Usually I add a line break after { if the code following it consists of multiple statements.
Inside curly brackets, I always put a space before the first statement and another space after the last statement's semicolon
I always indent child elements by 2 spaces
I always add a space after 'Location:' inside the header function.
I always add a space before the colon of each case
I like this style: it's tidy and it makes debugging easier for me. What I wonder is: will this kind of coding style burden the system? Will it make the server slower by re-formatting my code? So far I've had no formatting errors.
Thank you for your kind answers
No. The extra formatting will not affect performance at all*.
Choose the coding style you like -- that is also acceptable for the team/project/existing code -- and, most importantly, be consistent. (Using an editor with customizable syntax formatting is helpful.)
Happy coding.
*While it could be argued that an insignificant increase in IO may occur and that the lexer must read an insignificantly larger number of characters, the end result is the same: there will be no performance decrease.
No and yes (but mostly insignificant). This is a slightly different way of thinking about the issue from @pst's answer (not even considering disk IO), but with the same end result.
Simplified, behind the scenes: PHP is compiled to bytecode at runtime. During compilation, all whitespace and comments are filtered out, among many other actions.
Filtering out more whitespace rather than less is insignificant compared with all the other actions.
The compiled bytecodes are what actually gets run.
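One way to see the filtering for yourself is PHP's own tokenizer extension: spacing comes out as T_WHITESPACE tokens, which carry no meaning for the compiler. This sketch (significantTokens() is a hypothetical helper) drops them and shows that the spaced-out style from the question and a compact version of the same code yield identical token streams. It illustrates the lexer stage only; it does not measure performance.

```php
<?php
// The lexer turns spacing into T_WHITESPACE tokens. Dropping them (and line
// numbers), the spaced-out and compact versions of the same code produce
// identical token streams.
function significantTokens(string $code): array
{
    $result = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_WHITESPACE) {
                continue;
            }
            $result[] = [token_name($token[0]), $token[1]];
        } else {
            $result[] = $token;
        }
    }
    return $result;
}

$spaced  = "<?php\nswitch(\$var) {\n  case 'A' :\n    doSomething() ;\n    break ;\n}";
$compact = "<?php\nswitch(\$var){case 'A':doSomething();break;}";
var_dump(significantTokens($spaced) === significantTokens($compact)); // bool(true)
```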
But let's say you are running a major website with 1000s of web servers, and each PHP file is called millions of times a day. All those previously insignificant bits of time add up. But so does all the other work the compiler is doing. By the time this becomes an issue for you, it's time to start looking into PHP caching/accelerators such as OPcache, which has been bundled with PHP since 5.5 (or, more likely, long before that point).
Basically, those cachers/accelerators cache the compiled bytecodes the first time they are produced after the files are modified. Subsequent calls to the same file skip the compiling phase and go right to the cached compiled bytecodes. At that stage all the whitespace no longer exists. So, it becomes a moot point because they only ever compile once.

Best way to implement conditional in a PHP templating system?

I'm creating a very simple PHP templating system for a custom CMS/news system. Each article has the intro and the full content (if applicable). I want to have a conditional of the form:
{continue}click to continue{/continue}
So if it's possible to continue, the text "click to continue" will display, otherwise, don't display this whole string. So far, it works like this: if we can continue, remove the {continue} tags; otherwise, remove the whole thing.
I'm currently using preg_match_all to match the string in the second case. What is the best function for removing the matched text? A simple str_replace, or something else?
Or is there a better way to implement this overall?
Why not use preg_replace_callback?
Specifically:
$html = preg_replace_callback('!\{continue\}(.*)\{/continue\}!Us', 'replace_continue', $html);
function replace_continue($matches) {
    if (can_continue()) { // hypothetical placeholder for your own "can we continue?" check
        return $matches[1];
    } else {
        return '';
    }
}
I find preg_replace_callback to be incredibly useful.
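A variant of the same idea with the "can continue" decision passed in explicitly, using an arrow function (PHP 7.4+) instead of a named callback. renderContinue() and its $canContinue parameter are assumptions about how your CMS exposes that flag. It uses the lazy quantifier .*? with only the s modifier, which has the same effect here as .* with U.

```php
<?php
// Keep the inner text when continuing is possible, drop the whole block otherwise.
// $canContinue is an assumed parameter; wire it to your own CMS logic.
function renderContinue(string $html, bool $canContinue): string
{
    return preg_replace_callback(
        '!\{continue\}(.*?)\{/continue\}!s',
        fn (array $m): string => $canContinue ? $m[1] : '',
        $html
    );
}

echo renderContinue('Intro {continue}click to continue{/continue}', true), "\n"; // Intro click to continue
```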
Sorry, I know people have been ridiculing this kind of answer, but just use Smarty. It's simple, stable, clever, free, small and cheap. Spend an hour learning how to use it and you will never look back.
Go to www.smarty.net
For PHP templating systems, the proper way is to parse the template (using a state machine, or at least preg_split()), generate PHP code from it, and then use only that PHP code. Then you'll be able to use a normal if for conditional expressions.
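A minimal sketch of that approach using preg_split(), assuming only the {continue}...{/continue} tag: the template is split into text and tags, tags become PHP alternative-syntax if/endif blocks, and the generated code reads a variable named $canContinue (an assumed name). compileTemplate() and renderCompiled() are hypothetical names, and eval() is used only to keep the sketch self-contained.

```php
<?php
// Compile the template to PHP once, then run only the PHP.
function compileTemplate(string $template): string
{
    // Split on the tags but keep them (PREG_SPLIT_DELIM_CAPTURE).
    $parts = preg_split('!(\{/?continue\})!', $template, -1, PREG_SPLIT_DELIM_CAPTURE);
    $php = '';
    $depth = 0;
    foreach ($parts as $part) {
        if ($part === '{continue}') {
            $php .= '<?php if ($canContinue): ?>';
            $depth++;
        } elseif ($part === '{/continue}') {
            if ($depth === 0) {
                throw new RuntimeException('{/continue} without a matching {continue}');
            }
            $php .= '<?php endif; ?>';
            $depth--;
        } else {
            $php .= $part; // plain text; a real compiler must also neutralise any "<?" in it
        }
    }
    if ($depth !== 0) {
        throw new RuntimeException('Unclosed {continue}');
    }
    return $php;
}

// Runs compiled template code. In practice you would write the compiled code
// to a cache file once and include it; eval() just keeps the sketch short.
function renderCompiled(string $php, bool $canContinue): string
{
    ob_start();
    eval('?>' . $php); // $canContinue is visible to the evaluated code
    return ob_get_clean();
}

echo renderCompiled(compileTemplate('Intro {continue}click to continue{/continue}'), true), "\n";
```

Because the nesting check happens once at compile time, malformed templates fail early instead of silently rendering half a block.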
Regular expressions aren't a good idea for implementing templates. It will be a PITA to handle nesting and enforce proper syntax beyond basic cases. Performance will be very poor (you have to be careful not to create expressions that backtrack heavily, and even then you'll end up scanning and copying KBs of template text several times).
