I have a piece of text like the following:
foo
and foo2
and bar
    or something
        and somethingElse
            or somethingElse2
            or somethingElse3
and baz
    or godknows
    or godknows2
This should be interpreted as:
(
foo
&& foo2
&& (bar || (something && (somethingElse || somethingElse2 || somethingElse3)))
&& (baz || godknows || godknows2)
)
At the moment I'm reading line by line. I know that I need to measure the indentation and look at the expression on the next line in order to figure out which expression the current line belongs to, but I'm having trouble working out how to do that usefully without consuming the next line too.
It seems like the kind of problem which has a recursive solution, but it's escaping me.
The input format isn't fixed. I just want to be able to turn a relatively readable expression into a tree of booleans, so if you can suggest a more suitable format which is still readable, please do :)
Python, which uses this style of indentation, does its parsing by maintaining a stack of indentation levels. Upon seeing a new line, it determines whether it has been indented from the previous line by seeing whether the current depth has increased. If so, Python pretends that there was an invisible symbol called "INDENT" that was inserted into the input stream. It then pushes the new depth onto the stack.
If the indentation decreases, Python repeatedly pops the stack and pretends that an invisible symbol called "DEDENT" was inserted into the input stream until the indentation level matches the value on the stack.
You could probably adapt this approach very easily here by replacing "INDENT" and "DEDENT" with ( and ). You would need to do a minor transformation afterwards by making sure that the ( token was inserted before the previous variable, but I'd expect this isn't too hard.
With that change, you should be able to parse this extremely easily. For example, the script
A
and B
    or C
        and D
or E
would transform into
A and (B or (C and D)) or E
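The stack-of-indentation-levels idea can be sketched roughly as follows. This is only an illustration, not Python's actual tokenizer: the function name `indented_to_expression` is made up for this example, and it assumes each line is either `name` or `and name` / `or name`, indented with spaces only.

```python
def indented_to_expression(text):
    """Turn an indented 'and'/'or' listing into a parenthesized expression."""
    tokens = []    # output tokens, joined at the end
    levels = [0]   # stack of indentation depths currently open

    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        depth = len(line) - len(line.lstrip(" "))
        words = line.split()
        if len(words) > 2:
            raise ValueError(f"expected '[and|or] name', got {line!r}")

        closed = False
        while depth < levels[-1]:   # DEDENT: close one group per level left
            levels.pop()
            tokens.append(")")
            closed = True
        if depth > levels[-1]:      # INDENT: open a new group
            if closed or not tokens:
                raise ValueError(f"inconsistent indentation at {line!r}")
            levels.append(depth)
            # The "(" goes before the previous operand, not before this line.
            tokens.insert(len(tokens) - 1, "(")

        tokens.extend(words)

    tokens.extend(")" for _ in levels[1:])  # close groups still open at the end
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")


print(indented_to_expression("A\nand B\n    or C\n        and D\nor E"))
```

The resulting string still has to be parsed into a tree, but by then the grouping is explicit, so an ordinary expression parser is enough.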
Hope this helps!
Related
I need to implement the following thing in my web application. I know my solution is incorrect, but I'm including the code just to demonstrate the idea.
There is a class 'arc'. I need to be able to assign ANY expression to this arc (e.g. a+b+c, a-c, if-then). Once an expression is assigned, I'd like to be able to execute it with some randomly chosen variables. Is it possible to implement such functionality in web applications? Maybe I should use some plug-in like MathPL? Or maybe there is a completely different approach to this kind of problem?
class arc {
    var $arcexpression;
    function setExpression($arcexpression) {
        $this->arcexpression = $arcexpression;
    }
    function getExpression() {
        return $this->arcexpression;
    }
}
$arc = new arc();
$arc->setExpression("if a>b then return a else return b");
$result = $arc->execute(a,b); // the function 'execute' should be somehow described in 'arc'
You don't need to implement a whole language for this. I would start by limiting what can be done: for example, limit your expressions to arithmetic operators (+, -, *, /), parentheses and an if-then operator. You'll need to enforce some sort of syntax for if-then to make it easier, possibly the same as PHP's ?: operator. After that you need to build a parser for this grammar only, to parse a given expression into a tree. For example, the expression 'a + b * c' would parse into something like this:
  +
 / \
a   *
   / \
  b   c
After that you'll just have to evaluate such expressions. For example, by passing an array into your evaluate function of type { a => 1, b => 2, c => 3 }, you'll get 7 out of it.
The idea of the parser is the following:
Start from position 1 in the string and call a recursive function to parse data from that position. In the function, start reading from the specified position.
If you read an opening parenthesis, the function calls itself recursively
If you encounter a closing parenthesis or end-of-string, return the root node
Read the first identifier (or recurse into parentheses)
Read the arithmetic sign
Read the second identifier (or recurse into parentheses)
If the sign is * or /, then create the node with the sign in it and two operands as children and attach that node as the corresponding (left or right) child of the previous operator.
If the sign is + or -, then create the node with the sign in it, one of the children being one of the operands and the other being the root of the subtree with * and / at the root (or the second operand, if it's a simple operation).
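For illustration, here is a minimal sketch of the parse-then-evaluate idea in Python. It uses the textbook recursive-descent layout (one function per precedence level), which is not necessarily the exact procedure listed above, and it covers only +, -, *, / and parentheses, not if-then. The names `parse_expression` and `evaluate` and the tuple-based tree are assumptions made for this example.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_expression(text):
    """Parse arithmetic text into nested (op, left, right) tuples."""
    # Identifiers, numbers, or any other single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok == "(":                      # recurse into parentheses
            node = total()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok is None or not re.fullmatch(r"\w+(?:\.\d+)?", tok):
            raise ValueError(f"unexpected token {tok!r}")
        return tok                          # leaf: identifier or number

    def product():                          # * and / bind tighter
        node = operand()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, operand())
        return node

    def total():                            # + and - bind looser
        node = product()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, product())
        return node

    tree = total()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node, variables):
    """Evaluate a parsed tree against a dict of variable values."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, variables), evaluate(right, variables))
    if node[0].isdigit():
        return float(node)
    return variables[node]


tree = parse_expression("a + b * c")
print(tree)                                       # ('+', 'a', ('*', 'b', 'c'))
print(evaluate(tree, {"a": 1, "b": 2, "c": 3}))   # 7
```

Keeping parsing and evaluation separate means the expression is parsed once when it is assigned and can then be evaluated many times with different variable values.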
Getting pure arithmetic, with parentheses, working is easy; if-then is a bit more tricky, but still not too bad. About 10 years ago I had to implement something like this in Java. It took me about 3 days to get everything sorted and was in total about 500 lines of code in 1 class, not counting javadoc. I suspect in PHP it will be less code, due to sheer simplicity of PHP syntax and type conversions.
It may sound complicated, but, in reality, it's much easier than it seems once you start doing it. I remember very well a university assignment to do something similar as part of the algorithms class, 17-18 years ago.
I want to extract the whole function given its name and starting line-number.
Output should be something like
function function_name( $a = null, $b = true ) {
    $i_am_test = 'foo';
    return $i_am_test;
}
or whatever the function definition is. Most tools (including grep etc.) only return the first line function function_name( $a = null, $b = true ) { but I need the entire function definition.
To accurately extract a function (variable/class/...) from a program source file, especially for PHP, you need a real parser for that language.
Otherwise you'll have some kind of hack that fails for an amazing variety of crazy reasons, some of which have to do with strings and comments confusing the extraction machinery (and trying to skip string literals in PHP is a nightmare), and some of which have to do with funny language rules you don't discover until you trip over them (what happens if your PHP file contains HTML that contains stuff that looks like PHP source code?).
Our DMS Software Reengineering Toolkit has a full PHP5 front end that understands PHP syntax completely. It can parse PHP source files, and then be configured to analyze/extract whatever code you want. The parser accurately captures line numbers on its internal ASTs, so it is quite easy to find the code in a file at a particular line number; given the code/AST, it is quite easy to print the AST that represents the code at that line number. If you find a function identifier on a particular line and print out the relevant AST, you'll get exactly the function source code.
I don't know if it's just me, but I am allergic to one-line ifs in any C-like language. I always like to see curly brackets after an if, so instead of
if($a==1)
    $b = 2;
or
if($a==1) $b = 2;
I'd like to see
if($a==1){
    $b = 2;
}
I guess I can support my preference by arguing that the first form is more error-prone and less readable.
My problem right now is that I'm working on code that is packed with these one-line ifs. I was wondering if there is some sort of utility that would help me correct them, some sort of PHP code beautifier that would do this.
I was also thinking of developing a regex that could be used with Linux's sed command to accomplish this, but I'm not sure that's even possible. The regex would have to match one-line ifs and wrap them in curly brackets, so the logic would be: find an if and its condition, then look for a { following the condition; if none is found, wrap the next line (or the next run of characters before a line break) in { and }.
What do you think?
Your best bet is probably to use the built-in PHP tokenizer, and to parse the resulting token stream.
See this answer for more information about the PHP Tokenizer: https://stackoverflow.com/a/5642653/1005039
You can also take a look at a script I wrote to parse PHP source files to fix another common problem in legacy code, namely to fix unquoted array indexes:
https://github.com/GustavBertram/php-array-index-fixer/blob/master/aif.php
The script uses a state machine instead of a generalized parser, but in your case it might be good enough.
Spaces, line breaks, tabs: do they affect server performance?
I'm learning PHP, and before I go further with my current coding style, I want to make sure:
Do line breaks and spaces affect the performance of the server? I always add them for readability, for example in the following code:
import('something') ;
$var = 'A' ;
$varb = 'B' ;
switch($var) {
  case 'A' :
    doSomething() ;
    doAnotherThing() ;
    break ;
}
if ($var == $varb) { header('Location: somewhere.php') ; }
Summary:
I add a space before each semicolon.
I add a space before and after assignment and comparison operators.
I add a space between ) and {.
I usually add a line break after { if the code following it consists of multiple statements.
Inside curly brackets, I always put a space before the first statement and another space after the last statement's semicolon.
I always use a 2-space indent for every child element.
I always add a space after 'Location:' inside the header function.
I always add a space before the colon of each case condition.
This style works for me: I like it, it's tidy, and it makes debugging easier. What I wonder is whether this kind of coding style will hurt or burden the system. Will it make the server slower because it has to re-format my code? So far I've had no formatting errors.
Thank you for your kind answers
No. The extra formatting will not affect performance at all*.
Choose the coding style you like -- that is also acceptable for the team/project/existing code -- and, most importantly, be consistent. (Using an editor with customizable syntax formatting is helpful.)
Happy coding.
*While it could be argued that an insignificant increase in I/O may occur and that an insignificantly greater number of symbols must be read by the lexer, the final result is: there will be no performance decrease.
No and yes (but mostly insignificant). This is a slightly different way of thinking about the issue from @pst's answer (not even considering disk I/O), but with the same end result.
Simplified PHP behind the scenes: PHP is compiled to bytecode at runtime. During compilation, all spaces and comments are filtered out, among many other actions.
Filtering out more whitespace from less is mostly insignificant compared with all the other actions.
The compiled bytecodes are what actually gets run.
But let's say you are running a major website, have 1000s of web servers and each php file is getting called millions of times a day. All those previously insignificant bits of time add up. But so does all the other stuff that the compiler is doing. At the point that this all becomes an issue for you, it's time to start looking into PHP caching/accelerators. (Or more likely long before this.)
Basically, those cachers/accelerators cache the compiled bytecodes the first time they are produced after the files are modified. Subsequent calls to the same file skip the compiling phase and go right to the cached compiled bytecodes. At that stage all the whitespace no longer exists. So, it becomes a moot point because they only ever compile once.
I need some PHP code to convert some PHP into JS.
functionality - I'm using common PHP functions from php.js
syntax - ???
The issue is converting the syntax. I don't need full PHP syntax, mind you; no need for supporting class definitions/declarations. Here's a small checklist of what needs conversion:
"." should be "+" (string concat)
"->" should be "." (object operator)
"::" should be "." (class operator - not really required)
Please note that the resulting code is pretty much independent of the PHP environment, so no "what if it uses a PHP class?".
I'm not asking for full code, just a tip on the right direction to this kind of conversion; I was thinking about employing a state machine/engine.
If you're curious as to why I'm pushing code to the user side: I need a dynamic way to change visibility of certain elements given certain conditions. My plan is to do this without having to execute this code server side and having unnecessary ajax calls.
Edit: Look people. I know not using AJAX sounds ludicrous to you but the world doesn't work on hype and nice-sounding design conditions (=ajax). I simply can't afford each single user polling my server 5 to 10 times per second just for my server to return a "yes" or "no" answer. Keep in mind that switching is asynchronous, and I can't buffer the AJAX calls.
Edit 2: I am sure what I'm doing is the best way in my situation. There is no "possibly better" way, so quit posting non-constructive comments. I can't get into any more detail than I have so already. The conversion from PHP code to JS is simply a matter of shortening user input; we only need one expression, then convert it to whichever language is necessary (in this particular case, from PHP to JS). The conditions on how this works will not change regardless if I describe the system down to the API specs, and inundating the topic with useless (for you) prototype docs will not help at all.
Also, for those thinking this idea came after waking up from some dream: know that this has been reviewed between technical development and QA, so please do not deviate into nonexistent design issues.
Edit 3: Examples (original PHP code and expected output):
(original) -- (converted)
5=="test" -- 5=="test"
'$'.(func(12)*10) -- '$'+(func(12)*10)
Fields::count()==5 -- Fields.count()==5
$this->id==5 -- this.id==5
About the last example, don't worry about context/scope, it is correct. Also note that the expressions may look weird; this is because they are expressions: single lines of code that must return a value, which explains the absence of a statement terminator (;) and the repeated use of boolean results. (Exotic stuff like backtick operator execution, PHP tags, echo, die, list, etc. is left out on purpose.)
Okay, let me take a stab at this one...
Screw regexes. I love them, but there's a better way, and it's built in. Check out token_get_all(). It will parse PHP source as a string and return a list of the very same tokens that PHP itself uses (including the beloved T_PAAMAYIM_NEKUDOTAYIM). You can then completely reconstruct the source of the script, one token at a time, translating it into Javascript syntax along the way.
[charles@teh ~]$ php --interactive
Interactive shell
php > print_r(token_get_all('<?php class Foo { public function bar() { echo "Yikes!"; } } $f = new Foo(); $f->bar(); ?>'));
Array
(
    [0] => Array
        (
            [0] => 368
            [1] => <?php
            [2] => 1
        )
    [1] => Array
        (
            [0] => 353
            [1] => class
            [2] => 1
        )
    [2] => Array
        (
            [0] => 371
            [1] =>
            [2] => 1
        )
    [3] => Array
        (
            [0] => 307
            [1] => Foo
            [2] => 1
        )
...
While this may be a bit overkill, it also uses the same parsing rules PHP uses, and should therefore be less of a long-term pain than regular expressions. It also gives you the flexibility to detect features that can't be translated (i.e. things that php-js doesn't support) and reject the translation and/or work around the problem.
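To show the shape of a token-by-token translation, here is a rough Python sketch. It does not call token_get_all(); a small hand-written regex tokenizer stands in for it, so it only understands simple expressions like the examples in the question. The names `TOKEN_RE`, `OPERATOR_MAP` and `php_expr_to_js` are invented for this illustration. Variable interpolation inside double-quoted strings, comments and numbers written like `.5` are not handled.

```python
import re

# Stand-in tokenizer (an assumption, not PHP's real lexer): quoted strings,
# numbers, $variables, bare names, then "->", "::" or any single character.
TOKEN_RE = re.compile(r"""
    (?P<string>'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")
  | (?P<number>\d+\.\d+|\d+)
  | (?P<variable>\$[A-Za-z_]\w*)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<other>->|::|.)
""", re.VERBOSE | re.DOTALL)

# The three syntax changes listed in the question.
OPERATOR_MAP = {".": "+", "->": ".", "::": "."}


def php_expr_to_js(php):
    """Translate a simple PHP expression to JavaScript, one token at a time."""
    out = []
    for match in TOKEN_RE.finditer(php):
        kind, text = match.lastgroup, match.group()
        if kind == "variable":
            out.append(text[1:])                      # drop the leading "$"
        elif kind == "other":
            out.append(OPERATOR_MAP.get(text, text))  # remap or pass through
        else:
            out.append(text)                          # strings, numbers, names
    return "".join(out)


print(php_expr_to_js("'$'.(func(12)*10)"))    # '$'+(func(12)*10)
print(php_expr_to_js("Fields::count()==5"))   # Fields.count()==5
print(php_expr_to_js("$this->id==5"))         # this.id==5
```

Because strings arrive as single tokens, a "." or "$" inside a string literal is left alone, which is the main thing regex search-and-replace gets wrong. With the real tokenizer the same loop would switch on PHP's token types instead of these made-up group names.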
Also, you still haven't told us what you're doing and why you're doing it. There are still probably more accurate, useful answers available. Help us help you by giving us more information.
You believe polling to be unrealistic due to an expected stupidly high number of requests per second. Why are you expecting that number? What does your application do that would cause such conditions?
Why do you want to translate PHP code rather than writing specific Javascript? You're just manipulating page contents a bit, why do you need PHP code to make that decision?
Language translation is probably the least simple solution to this problem, and is therefore an amazingly awful idea. It couldn't have been arrived at as the first option. What are your other options, and why have they been ruled out?
Have you tried Harmony Framework?
Here's the quick and dirty solution I came up with, written in under 20 minutes (probably lots of bugs), but it looks like it works.
function convertPhpToJs($php){
    $php=str_split($php,1); $js='';
    $str=''; // state; either empty or a quote character
    $strs=array('\'','`','"'); // string quotes; single double and backtick
    $nums=array('0','1','2','3','4','5','6','7','8','9'); // numerals
    $wsps=array(chr(9),chr(10),chr(13),chr(32)); // remove whitespace from code
    foreach($php as $n=>$c){
        $p=isset($php[$n-1])?$php[$n-1]:'';
        $f=isset($php[$n+1])?$php[$n+1]:'';
        if($str!='' && $str!=$c){ $js.=$c; continue; } // in a string
        if($str=='' && in_array($c,$strs)){ $str=$c; $js.=$c; continue; } // starting a string
        if($str!='' && $str==$c){ $str=''; $js.=$c; continue; } // ending a string
        // else, it is inside code
        if($c=='$')continue; // filter out perl-style variable names
        if($c==':' && $f==':'){ $js.='.'; continue; } // replace 1st of :: to .
        if($p==':' && $c==':')continue; // filter out 2nd char of ::
        if($c=='-' && $f=='>'){ $js.='.'; continue; } // replace 1st of -> to .
        if($p=='-' && $c=='>')continue; // filter out 2nd char of ->
        if($c=='.' && (!in_array($p,$nums) || !in_array($f,$nums))){ $js.='+'; continue; } // replace string concat op . to +
        if(in_array($c,$wsps))continue; // filter out whitespace
        $js.=$c;
    }
    return $js;
}
The following:
$window->alert("$".Math::round(450/10));
Converted to:
window.alert("$"+Math.round(450/10));
Edit: Can't believe all the fuss this question caused compared to the time taken.
Feel free to criticize at will. I don't actually like it much personally.
I wrote a tool called php2js that can automatically convert PHP code to javascript. It is not perfect, but supports the most common PHP functionality including classes, inheritance, arrays, etc, etc. It also includes and knows about php.js, so code written with php's standard library functions may "just work".
Maybe you will find it useful.
I created a tool, PHP-to-JavaScript, for converting PHP code to JavaScript. It supports:
Namespaces, use
Class, abstract class extends and interfaces
constants and define
Exceptions and catch
list()
magic methods __get __set and __call
and more