Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
The community reviewed whether to reopen this question 11 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:
This is geared towards web development. The original and target code will be be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.
I am also looking at IOC and dependency injection, as they might make the translation process easier and less error prone.
I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.
From then on I can build the AST, symbol tables and control flow.
Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.
Before you ask "What the hell is the point of this?" The answer is... It'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.
EDIT:
I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.
I've been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.
The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is perfect example of this).
There's nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).
People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are great example). The good news, is that part of the job is done. The bad news is that machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun).
The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).
Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.
The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert are pretty small, and you only intend to convert it once well, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice. (Amazingly there are people that distrust automated tools and insist on translating million line systems by hand; that's even harder and they normally find out painfully with long time delays, high costs and often outright failure.)
What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.
Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that's painful.
I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.
My answer will address the specific task of parsing Python in order to translate it to another language, and not the higher-level aspects which Ira addressed well in his answer.
In short: do not use the parser module, there's an easier way.
The ast module, available since Python 2.6 is much more suitable for your needs, since it gives you a ready-made AST to work with. I've written an article on this last year, but in short, use the parse method of ast to parse Python source code into an AST. The parser module will give you a parse tree, not an AST. Be wary of the difference.
Now, since Python's ASTs are quite detailed, given an AST the front-end job isn't terribly hard. I suppose you can have a simple prototype for some parts of the functionality ready quite quickly. However, getting to a complete solution will take more time, mainly because the semantics of the languages are different. A simple subset of the language (functions, basic types and so on) can be readily translated, but once you get into the more complex layers, you'll need heavy machinery to emulate one language's core in another. For example consider Python's generators and list comprehensions which don't exist in PHP (to my best knowledge, which is admittedly poor when PHP is involved).
To give you one final tip, consider the 2to3 tool created by the Python devs to translate Python 2 code to Python 3 code. Front-end-wise, it has most of the elements you need to translate Python to something. However, since the cores of Python 2 and 3 are similar, no emulation machinery is required there.
Writing a translator isn't impossible, especially considering that Joel's Intern did it over a summer.
If you want to do one language, it's easy. If you want to do more, it's a little more difficult, but not too much. The hardest part is that, while any turing complete language can do what another turing complete language does, built-in data types can change what a language does phenomenally.
For instance:
word = 'This is not a word'
print word[::-2]
takes a lot of C++ code to duplicate (ok, well you can do it fairly short with some looping constructs, but still).
That's a bit of an aside, I guess.
Have you ever written a tokenizer/parser based on a language grammar? You'll probably want to learn how to do that if you haven't, because that's the main part of this project. What I would do is come up with a basic Turing complete syntax - something fairly similar to Python bytecode. Then you create a lexer/parser that takes a language grammar (perhaps using BNF), and based on the grammar, compiles the language into your intermediate language. Then what you'll want to do is do the reverse - create a parser from your language into target languages based on the grammar.
The most obvious problem I see is that at first you'll probably create horribly inefficient code, especially in more powerful* languages like Python.
But if you do it this way then you'll probably be able to figure out ways to optimize the output as you go along. To summarize:
read provided grammar
compile program into intermediate (but also Turing complete) syntax
compile intermediate program into final language (based on provided grammar)
...?
Profit!(?)
*by powerful I mean that this takes 4 lines:
myinput = raw_input("Enter something: ")
print myinput.replace('a', 'A')
print sum(ord(c) for c in myinput)
print myinput[::-1]
Show me another language that can do something like that in 4 lines, and I'll show you a language that's as powerful as Python.
There are a couple answers telling you not to bother. Well, how helpful is that? You want to learn? You can learn. This is compilation. It just so happens that your target language isn't machine code, but another high-level language. This is done all the time.
There's a relatively easy way to get started. First, go get http://sourceforge.net/projects/lime-php/ (if you want to work in PHP) or some such and go through the example code. Next, you can write a lexical analyzer using a sequence of regular expressions and feed tokens to the parser you generate. Your semantic actions can either output code directly in another language or build up some data structure (think objects, man) that you can massage and traverse to generate output code.
You're lucky with PHP and Python because in many respects they are the same language as each other, but with different syntax. The hard part is getting over the semantic differences between the grammar forms and data structures. For example, Python has lists and dictionaries, while PHP only has assoc arrays.
The "learner" approach is to build something that works OK for a restricted subset of the language (such as only print statements, simple math, and variable assignment), and then progressively remove limitations. That's basically what the "big" guys in the field all did.
Oh, and since you don't have static types in Python, it might be best to write and rely on PHP functions like "python_add" which adds numbers, strings, or objects according to the way Python does it.
Obviously, this can get much bigger if you let it.
I will second #EliBendersky point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend you to review his blog. I used ast.parse to do Python->JavaScript translator (#https://bitbucket.org/amirouche/pythonium). I've come up with Pythonium design by somewhat reviewing other implementations and trying them on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS which I also started, It's actually a complete rewrite . The overall design is inspired from PyPy and http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.
Everything I tried, from beginning to the best solution, even if it looks like Pythonium marketing it really isn't (don't hesitate to tell me if something doesn't seem correct to the netiquette):
Implement Python semantic in Plain Old JavaScript using prototype inheritance: AFAIK it's impossible to implement Python multiple inheritance using JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript, the best that exists is Single inhertance + mixins and I'm not sure they handle diamond inheritance. Kind of similar to Skulpt but without google clojure.
I tried with Google clojure, just like Skulpt (compiler) instead of actually reading Skulpt code #fail. Anyway because of JS prototype based object system still impossible. Creating binding was very very difficult, you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the binding in the build system. I think that Skulpt is a library and you just have to include your .py files in the html to be executed, no compilation phase required to be done by the developer.
Tried pyjaco (compiler) but creating bindings (calling Javascript code from Python code) was very difficult, there was too much boilerplate code to create every time. Now I think pyjaco is the one that more near Pythonium. pyjaco is written in Python (ast.parse too) but a lot is written in JavaScript and it use prototype inheritance.
I never actually succeed at running Pyjamas #fail and never tried to read the code #fail again. But in my mind PyJamas was doing API->API tranlation (or framework to framework) and not Python to JavaScript translation. The JavaScript framework consume data that is already in the page or data from the server. Python code is only "plumbing". After that I discovered that pyjamas was actually a real python->js translator.
Still I think it's possible to do API->API (or framework->framework) translation and that's basicly what I do in Pythonium but at lower level. Probably Pyjamas use the same algorithm as Pythonium...
Then I discovered brython fully written in Javascript like Skulpt, no need for compilation and lot of fluff... but written in JavaScript.
Since the initial line written in the course of this project, I knew about PyPy, even the JavaScript backend for PyPy. Yep, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say, it was a disaster. I read no where why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something and probably you fine tune it to be the best at what it's meant to do in the case of PyPy: Python->C translation. Those assumptions might not be relevant in another context worse they can infere overhead otherwise said direct translation will most likely always be better.
Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler for performance reasons also it's actually more easy to compile Python to JavaScript than interpret it.
I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn't even bother to implement OO system because of past experience. The subset of Python that I achieved to translate to JavaScript are:
function with full parameters semantic both in definition and calling. This is the part I am most proud of.
while/if/elif/else
Python types were converted to JavaScript types (there is no python types of any kind)
for could iterate over Javascript arrays only (for a in array)
Transparent access to JavaScript: if you write Array in the Python code it will be translated to Array in javascript. This is the biggest achievement in terms of usability over its competitors.
You can pass function defined in Python source to javascript functions. Default arguments will be taken into account.
It add has special function called new which is translated to JavaScript new e.g: new(Python)(1, 2, spam, "egg") is translated to "new Python(1, 2, spam, "egg").
"var" are automatically handled by the translator. (very nice finding from Brett (PythonJS contributor).
global keyword
closures
lambdas
list comprehensions
imports are supported via requirejs
single class inheritance + mixin via classyjs
This seems like a lot but actually very narrow compared to full blown semantic of Python. It's really JavaScript with a Python syntax.
The generated JS is perfect ie. there is no overhead, it can not be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any JS tricks that you can find in .js written by http://superherojs.com/, so it's very readable.
The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found # https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master 793 SLOC + around 100 SLOC of shared code with the other translator.
An adapted version of pystones.py can be translated in Veloce mode cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master
After having setup basic Python->JavaScript translation I choosed another path to translate full Python to JavaScript. The way of glib doing object oriented class based code except the target language is JS so you have access to arrays, map-like objects and many other tricks and all that part was written in Python. IIRC there is no javascript code written by in Pythonium translator. Getting single inheritance is not difficult here are the difficult parts making Pythonium fully compliant with Python:
spam.egg in Python is always translated to getattribute(spam, "egg") I did not profile this in particular but I think that where it loose a lot of time and I'm not sure I can improve upon it with asm.js or anything else.
method resolution order: even with the algorithm written in Python, translating it to Python Veloce compatible code was a big endeavour.
getattributre: the actual getattribute resolution algorithm is kind of tricky and it still doesn't support data descriptors
metaclass class based: I know where to plug the code, but still...
last bu not least: some_callable(...) is always transalted to "call(some_callable)". AFAIK the translator doesn't use inference at all, so every time you do a call you need to check which kind of object it is to call it they way it's meant to be called.
This part is factored in https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master It's written in Python compatible with Python Veloce.
The actual compliant translator https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master doesn't generate JavaScript code directly and most importantly doesn't do ast->ast transformation. I tried the ast->ast thing and ast even if nicer than cst is not nice to work with even with ast.NodeTransformer and more importantly I don't need to do ast->ast.
Doing python ast to python ast in my case at least would maybe be a performance improvement since I sometime inspect the content of a block before generating the code associated with it, for instance:
var/global: to be able to var something I must know what I need to and not to var. Instead of generating a block tracking which variable are created in a given block and inserting it on top of the generated function block I just look for revelant variable assignation when I enter the block before actually visiting the child node to generate the associated code.
yield, generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write the "var my_generator = function"
So I don't really visit each node once for each phase of the translation.
The overall process can be described as:
Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code
Python builtins are written in Python code (!), IIRC there is a few restrictions related to bootstraping types, but you have access to everything that can translate Pythonium in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master
Reading JS code generated from pythonium compliant can be understood but source maps will greatly help.
The valuable advice I can give you in the light of this experience are kind old farts:
extensively review the subject both in literature and existing projects closed source or free. When I reviewed the different existing projects I should have given it way more time and motivation.
ask questions! If I knew beforehand that PyPy backend was useless because of the overhead due to C/Javascript semantic mismatch. I would maybe had Pythonium idea way before 6 month ago maybe 3 years ago.
know what you want to do, have a target. For this project I had different objectives: pratice a bit a javascript, learn more of Python and be able to write Python code that would run in the browser (more and that below).
failure is experience
a small step is a step
start small
dream big
do demos
iterate
With Python Veloce mode only, I'm very happy! But along the way I discovered that what I was really looking for was liberating me and others from Javascript but more importantly being able to create in a comfortable way. This lead me to Scheme, DSL, Models and eventually domain specific models (cf. http://dsmforum.org/).
About what Ira Baxter response:
The estimations are not helpful at all. I took me more or less 6 month of free time for both PythonJS and Pythonium. So I can expect more from full time 6 month. I think we all know what 100 man-year in an enterprise context can mean and not mean at all...
When someone says something is hard or more often impossible, I answer that "it only takes time to find a solution for a problem that is impossible" otherwise said nothing is impossible except if it's proven impossible in this case a math proof...
If it's not proven impossible then it leaves room for imagination:
finding a proof proving it's impossible
and
If it is impossible there may be an "inferior" problem that can have a solution.
or
if it's not impossible, finding a solution
It's not just optimistic thinking. When I started Python->Javascript everybody was saying it was impossible. PyPy impossible. Metaclasses too hard. etc... I think that the only revolution that brings PyPy over Scheme->C paper (which is 25 years old) is some automatic JIT generation (based hints written in the RPython interpreter I think).
Most people that say that a thing is "hard" or "impossible" don't provide the reasons. C++ is hard to parse? I know that, still they are (free) C++ parser. Evil is in the detail? I know that. Saying it's impossible alone is not helpful, It's even worse than "not helpful" it's discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.
What would be perfection for you? That's how you define next goal and maybe reach the overall goal.
I am more interested in knowing what kinds of patterns I could enforce
on the code to make it easier to translate (ie: IoC, SOA ?) the code
than how to do the translation.
I see no patterns that can not be translated from one language to another language at least in a less than perfect way. Since language to language translation is possible, you'd better aim for this first. Since, I think according to http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism. Even if we already know that they are both turing complete, so...
Framework->Framework which I better visualize as API->API translation might still be something that you might keep in mind as a way to improve the generated code. E.g: Prolog as very specific syntax but still you can do Prolog like computation by describing the same graph in Python... If I was to implement a Prolog to Python translator I wouldn't implement unification in Python but in a C library and come up with a "Python syntax" that is very readable for a Pythonist. In the end, syntax is only "painting" for which we give a meaning (that's why I started scheme). Evil is in the detail of the language and I'm not talking about the syntax. The concepts that are used in the language getattribute hook (you can live without it) but required VM features like tail-recursion optimisation can be difficult to deal with. You don't care if the initial program doesn't use tail recursion and even if there is no tail recursion in the target language you can emulate it using greenlets/event loop.
For target and source languages, look for:
Big and specific ideas
Tiny and common shared ideas
From this will emerge:
Things that are easy to translate
Things that are difficult to translate
You will also probably be able to know what will be translated to fast and slow code.
There is also the question of the stdlib or any library but there is no clear answer, it depends of your goals.
Idiomatic code or readable generated code have also solutions...
Targeting a platform like PHP is much more easy than targeting browsers since you can provide C-implementation of slow and/or critical path.
Given you first project is translating Python to PHP, at least for the PHP3 subset I know of, customising veloce.py is your best bet. If you can implement veloce.py for PHP then probably you will be able to run the compliant mode... Also if you can translate PHP to the subset of PHP you can generate with php_veloce.py it means that you can translate PHP to the subset of Python that veloce.py can consume which would mean that you can translate PHP to Javascript. Just saying...
You can also have a look at those libraries:
https://bitbucket.org/logilab/astroid
https://bitbucket.org/logilab/pylint-brain
Also you might be interested by this blog post (and comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/
This Google Tech Talk from Ira Baxter is interesting https://www.youtube.com/watch?v=C-_dw9iEzhA
You could take a look at the Vala compiler, which translates Vala (a C#-like language) into C.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
Hey. I'm taking a course titled Principles of Programming Languages, and I need to decide on a project to do this summer. Here is a short version of what the project needs to accomplish:
"The nature of the project is language processing. Writing a Scheme/Lisp processor is a project of this type. A compiler for a language like C or Pascal is also a potential project of this type. Some past students have done projects related to databases and processing SQL. Another possible project might relate to pattern matching and manipulating XML. Lisp, Pascal, and C usually result in the most straight forward projects."
I am very interested in web technologies, and have some experience with PHP, MySql, JavaScript, etc. and I would like to do something web oriented, but I'm having trouble coming up with any ideas. I also want this to be a worthwhile project that could have some significance, instead of just doing the same thing as everyone else in class.
Any ideas? Thanks!
EDIT: I really like the idea of a Latex to XHTML/MathML translator, and I passed the idea to my instructor, in which he wrote back:
"I think the idea is interesting, my question (and yours) is whether it is appropriate.
I think of LateX as a low-level mark-up language. I'm wondering if converting this to XHTML or MathML is really a change in levels and complexity. I think you can make your point with a little more discussion and some examples. You might also think of some other mark-up constructs which made it easier to describe equations."
Any ideas on how to convince him this may be appropriate, or any extensions of this idea that could work for the goals of my project?
Thanks for all the responses so far!
Hm, neat! Maybe:
1. A web-based language interpreter. eg, a very simple assembly interpreter in javascript, or a PHP-based C interpreter (PHP script reads C code, and executes it in some sort of sandboxed kind of way. Obviously it would only be able to implement a small subset of the C language)
2. Maybe some automated way to transform PHP data structures (like PHP arrays) into SQL queries, and vice versa. That kind of stuff has already been done, but you might be able to do something which (for example) takes an SQL query and creates the array datastructure that would be needed to "hold" the information returned by the SQL. It could support complex things like JOINS and GROUP BYs.
3. Maybe a C-to-PHP compiler? (or a PHP-to-C compiler, to be able to run simple PHP code natively. Use this with any combination of languages)
edit:
4. Maybe a regex-to-C parser. That is, something that takes a regex, and generates C code to match that pattern. Or something which takes a regex, and converts it into an FSM which represents the "mathematical" translation of that expression. Or the opposite - something which takes an FSM for a CFL and generates the perl-syntax regex for it.
5. Maybe an XML-to-PHP/MySQL parser. eg, an XML file might contain information about a database and fields, and then your program creates the SQL to create those tables, or the HTML/PHP code for the forms.
Best of luck!
I'd stay away from PHP and MySQL for a project like this. Both are commercial platforms that have compromised a lot of core CS principles in order to gain market share and solve user's problems. Given what you've described it sounds like the point of this project is to think about how programming languages are processed. Javascript The Language (not the browser API) might be a good choice here. Writing a processor/interpreter/compiler for Javascript or using Javascript itself to write a processor/interpreter/compiler for another language would meet the criteria for the assignment. Writing a Javascript "minifier" that removes all unnecessary white space (for smaller file sizes) while maintaining the program's functionality is another possible project.
Here's something I'd love: a PHP-based LaTeX-to-MathML translator. It wouldn't have to do everything, but if I could just cut-and-paste mathematical formulas written in valid LaTeX code into a window and have the script parse it and convert it into valid MathML, that'd be awesome.
Let me expand on this some more. The current state of scientific publishing on the web isn't great. Titles, headers, section numbers, tables, etc. can all be done in HTML, but for mathematical and chemical formulas which depend on precise two-dimensional formatting, scientific authors have only second-class options:
Publish their work in pdf format, which looks great but has a (comparably) huge file size and doesn't do hyperlinking well, or
Use something like latex-to-html, which converts formulas into .gif files (or some similar image file), which are semantically meaningless and thus doesn't lend themselves to indexing or searching.
Moreover, neither of these options allow for mathematical formulas to be generated programmatically, which would be helpful to the education community (think randomly-generated online homework).
Publishing scientific work in MathML would solve all of these issues, but it has a few of issues of its own, namely:
It's really too verbose to code by hand. I mean, you can do it, but c'mon.
The scientific community uses LaTeX for publishing, they're happy with it (for good reason), and they're not about to learn another mathematical markup language when they've got their own research and lesson-planning to do.
Browser support for MathML is currently pretty limited. I know this, and I don't mean to stick my head in the sand about it.
In other words: scientific authors know LaTeX, they use it daily, it's the de facto standard for authoring scientific content. MathML isn't and won't ever be the way math and science is authored, but it's the only semantically rich way to put hypertext mathematics on the web. Browser support for MathML is weak because nobody uses it; nobody uses it because it's too hard to write by hand. Now, maybe this is wishful thinking, but I have to believe that if it were only easier to write MathML, more scientists and mathematicians, especially the early-adopter types, would at least try it, and this would inspire browsers (especially open-source browsers) to improve their support, which would then lead to more authors using it, etc.
Here's where the translator comes in: Until the barrier-to-entry for MathML drops, it'll never be widely adopted. A simple LaTeX-to-MathML converter would take care of that. It would reduce the barrier-to-entry for MathML to near zero. If it leads to widespread use of and better support for MathML, it would be a major benefit to the scientific and education communities.
I finished this course last semester :)
IMHO the best way to go is to build an expression evaluator. build the simplest expression evaluator you can.
Then add these features in order as many as you like:
1- constant symbols, just place holders for variables. your evaluator should ask for their values after parsing the expression.
2- imperative-style variables. Like variables in any imperative language, where the user can change the value of a symbol anywhere in code.
3- simple control-statements. 'if-else' and pretest while loop are the simplest to consider.
4- arrays. if you really want your expression evaluator to be really like a programming language. It would be interesting if you add variable dimension arrays to your 'language'. you have to build a generic mapping function for your arrays.
Now you got a real programming language. To be a useful one, you might add subroutines.
so the list continuous:
5- subroutines. This is little harder than previous features, but it should not be impossible :)
6- build a simple math library for your new language in your language it self! and that is the fun part in my opinion ;)
Sebest book is a good book to overview famous imperative programming languages.
You shouldn't view creating an implementation of a particular language as insignificant. Everyone probably wants to be a famous programmer and not many people achieve it. This is a great opportunity to be familiar with very cool uncommon languages. (Lisp, APL, etc) If this is your first time creating a compiler/interpreter then it will also be a better choice to go with an already existent language (so you can see what design elements are needed to create a successful language.)
Significant ideas typically arise from necessity. People began using a language because they either needed it or it was a lot easier to accomplish the task they wanted to do. I don't think you will find the answer or the motivation to start a project from scratch here. That being said, I've always thought it would be cool to have a language that uses processor native byte code to create dynamic websites (without using something like cgi).
In response to your edit, here are some latex ideas:
LaTeX-to-ASCII pretty print, perhaps just for a small subset of TeX
LaTeX-to-Maple/Mathcad/Mathematica script, so that equations can be imported or edited or solved (don't know if that already exists)
Javascript LaTeX translator. basically, as you type, it does a translation from latex to html/css/.gif/whatever, so you can see your math "live" as you type it, kinda like the stackoverflow text editor.
Perhaps some sort of latex macros for expressing C code or something? Or how about this: often, C code is doing math: "det = (b*b - 4*a*c); det_sqrt = sqrt(det); etc" How about something which takes C (or java or whatever) code, which is performing a series of arithmetic assignments, and converts it into a nicely-formatted latex list of equations that are human-readable (ie, a \begin{eqnarray} block)
Or something that does the opposite: take a listing of latex computations or equations, and generates C code which declares the requisite variables, gets requisite user input, and performs the computations listed in your latex?
Why not write some sort of interface that can be interpreted/compiled down to the appropriate web technology of the users choice?
Or something like a Python to C compiler?
Just something I thought of recently: write a Ruby interpreter in Lisp.
Something that can be interesting to work on, is a regexp to automaton using Glouchkov's algorithm, here are some key features that can be implemented
Syntaxical analysis of regexp
Transformation into an automaton using Glouchkov's algorithm
Generating random phrases matching the regexp with that automaton / Validating phrases
Exporting automatons using XML
That's not a very long assignment so you may be able to handle it in a few months
You can try to make a scripting language in the vein of nadvsh if you want to do something interesting, but it might be too removed from what your instructor is expecting of you.
New Adventure Shell (nadvsh)
If you want to process language you can do a UIMA program. UIMA stands for Unstructured Information Management Architecture, it was developped by IBM at a cost of about 45Million dollars and is now available opensource. Basically UIMA is ascii codecs to analyse text documents to find patterns. It is made to find things where there is no order(finding needles in hay stacks). It uses XML and C.
The web is a rich area for doing work with languages. Take a look at a popular web framework like Ruby on Rails, and you'll find that much of its productivity comes from the fact that it implements a domain specific language well suited to web applications. Ruby just so happened to be a good language to implement such a language because of its dynamic nature, but the power comes from the language they created from it.
In your case, perhaps you could try designing your own domain specific language using a language that you are familiar with, such as PHP, to implement the essential core of a web framework:
routing URLs to pages
generating pages dynamically using a template (and maybe implement your own template syntax!)
connecting objects to underlying databases (object relational mapping)
If you are really ambitious, instead of building from an existing language, you could build your own language from the ground up (lexer, parser, code generator, etc) to do this.
You can ideas from this massive list.
Writing compiler for C or Pascal will likely take you months or years, if you are not compiler guru.
Write a simple web server. It will be fun and might prove useful as a simple and free solution. I once met a guy who said he did something like this and used for simple customer sites. Yours could become a useful thing as well.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
The question has been asked: No PHP for large projects? Why not? It's a recurring theme and PHP developers--with some cause--are forced to defend PHP.
All of these questions are valid and there have been some responses but this got me thinking. Based on the principle that you can write good code in any language and bad code in any language, I thought it worth asking a positive rather than negative question. Rather than why you can't, I wanted to ask how you can use PHP for large projects.
So, how do you write a large, complex, scalable, secure and robust PHP application?
EDIT: While I appreciate that the organizational aspects are important, they apply to any large project. What I'm primarily aiming for here is technical guidance and how to deal with common issues of scalability. Using an opcode cache like APC is an obvious starter. Cluster-aware sessions would be another. That's the sort of thing I'm getting at.
For the most part, the problems with php are not so much with the language. The problems come from the coupling of a low barrier of entry and the lack of any infrastructure to avoid common programming problems or security problems. Its a language that, by itself, is pretty quick-and-dirty. Nevertheless, it still has many advantages for large-scale web apps. You'll just need to know how to add in some level of infrastructure to avoid a lot of the common web programming blunders. See - What should a developer know before building a public web site for help with this.
You need to learn about the reasons php can make your web app to be insecure or problematic and learn to mitigate those problems. You should learn about how to use php to securely access your database. You should learn about avoiding SQL injection. You should learn about evil things like register_globals and why you should never ever use them. In short, you should do your homework about your tool before just diving in for a real-world, large-scale, web app.
Once you are educated, it comes down to you'll probably want to build a framework or use a preexisting framework that will mitigate these problems. Popular frameworks include PEAR and Zend.
Also, useful questions that might help:
What should every php programmer know?
What should a developer know before building a public web site
Using PHP for large projects isn't different than with any other language. You need experience and knowledge in writing maintainable and extendable source code. You need to be aware of security pitfalls and performance concerns. You need to have researched your problem domain and be well familiar with it.
In the end, same as any other language - what you need are high-quality and well-motivated developers.
i know, this is a little out of date, but still, i'll tempt an answer ...
use Haxe/PHP ... i could delve into details ... but if you look at the language, its features, and the nice way the PHP API is encapsulated into something rather consistent, you will soon see, what PHPs problems are ... and also, you have all the benefits of Haxe in the end ...
edit: this was a serious answer ... Haxe/PHP automatically solves a lot of problems mentioned in the post flagged as answer ...
register_globals is turned off ... you get your parameters through the php.Web
using the SPOD-layer (same API for php) for the database automatically takes care of escaping (and will automatically provide your model (and templo is quite a good template engine, so that should help for your views))
having a typed language, you are more likely to write better code ... plus language features as generics and enums are very powerful ... and there is a lot of compile time magic in Haxe that is also of interest ... a more powerful language is always good to adress complex problems ...
if you want to use other PHP frameworks, you only need to write the external classes and everything will work as expected ...
i think Haxe is a very good answer to "large", "complex", "secure" and "robust" ... scalability does not come from Haxe itself of course ... but still, if you check out haxelib, then you find many things, that would help for scalability ... such as memcached (you will have to change neko.net.Socket to php.net.Socket in memcached.Connection) ...
if you really want to use the PHP language, and not just the platform, Haxe won't help you of course ...
You do as you would in any other language or any other enviornment.
There are a couple of simple steps in project development:
Organization; You need to organize everything, having documentation, uml diagrams and other pre-work done, before you start programming.
Structure; Before you start coding and also aftter starting, you need to have a focus on structure, meaning that you always need to do it correctly and not do any spagetthi solutions. Keep code simple and well commented.
These two points, are simple and apply in all development areas, despite the language. Keep it simple and well documented and you will find that developing a large scale web app in PHP is as easy as it would be in ASP.NET, Ruby or whatever.
However when we come to the development stage, you need to get a nice IDE, use a good database, use a repo., get an MVC / Template system, this runs in the "Structure"-part though.
Just as a side point, splitting the application into different layers: DLF ( Data, Logic, Front ). Use at least these three layers and you will find that the development will go easy.
Use Model-View-Controller framework. It's been said, yes. And, have at least one engineer for each part.
Model: Your DBA should write the Model code. No should else should be allowed to write SQL statements.
View: The one with the best knowledge of CSS and Javascript should do the view part. He/she should write the least PHP code, he is the one using PHP variables.
Controller: She's the real PHP coder, and also back-end server engineer, hopefully, with or without using other script languages.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I am in the process of designing a Web Services course for students in an Information Technology program. Some students stop after getting a two-year associates degree, but other students in the program go on to a four-year bachelor's degree. This course would be for students going on to the four-year degree.
My initial thoughts for the course would be that it would cover:
Some simple database concepts, with enough command line practice to allow students to create simple relational database backends.
Enough PHP so students can create a web-interface that allows user to enter new data into the database backend, edit data in the database, and display fixed views of the database.
Basic security practices for PHP and web services in general.
Writing a barebones content management system using PHP and a database backend.
Learning about and using existing content management software such as Zope/Plone or Drupal.
Discuss feasibility of using existing content management software to provide ADA section 508 compliance for web pages. Contrast this with coming up with a simple framework to make ADA compliant pages using PHP.
Our semesters are 16 weeks long. Are there other topics that you cover instead of the ones listed? If you had a chance to design such a course, what would be the most pragmatic things to cover?
Edit: Based on the initial response, it is clear that the title of my question is misleading. It should be web programming instead of web services. The students taking this course will have already taken at least one programming course. The students would have all taken a course in Python. The Python course they take includes writing an XML parser that produces HTML with CSS. This course would also cover HTML, CSS, and JavaScript. XML would also be used (parsing XML using PHP, and possibly using converting XML into PHP code). Some of the students will also have taken an introductory course in Java, but that course will not cover JSP.
First of all, what do you understand by "web service"? As far as I know, the standard definition of a web service is that it's a "software system to support machine-to-machine interaction over a network". If it really is what you had in mind, well then (1) those parts about CMS doesn't apply and (2) there should definitely be some previous knowledge of web programming or something like that. Actually very little of the course description seems applicable for web services, from the description it reads like a generic web-development course.
Anyway, as that is probably not what you had in mind, the thing is, you cannot create "a web-interface" in PHP - you need HTML, CSS, JavaScript, etc. for that - will that be included in the course?
Regarding the last section about 508 - to be honest, it is a relatively minor part of everyday work in web development and it actually has nothing at all to do with PHP or programming, or server-side web development and more with what the client side code is like and how content is prepared.
You're probably going to need to talk about Xml. May want to even talk about XSDs... but that depends on what you want to get into in the course. I don't know about web services with PHP, but if it were .Net you would want to talk about serialization/deserialization.
I would teach (even briefly) the layers model. If students don't fundamentally understand it, somewhere down the road it will come back to haunt them. And yes, I have met students who went through a 4 year CS degree without understanding the network layer model or the OS layer model.
Why exactly are you teaching PHP as a CS course? Especially considering the web services topic.
Once these students graduate, 98% of their webservice work is going to be either Java, or C#.
Or perhaps you mean something different than REST, XML-RPC or SOAP for webservices?
Talking about the Sun's standard is a nice idea and maybe the oldest way to use WS in Java, the Apache Axi.
I believe that in the java session you could have a speech for jax-ws and jax-b.
Talk about the WS-* directions and how REST is changing our vision about service consumers and providers.
Some sort of programming language should be a pre-req to the course that way students can test what they are learning. It would be too much to have to teach a language. You, as the teacher, will be able to verify that they are in fact creating a service.
Maybe create some databases that the students can connect to and create services from.
Should probably get into REST vs nonREST, formats (xml, json, csv...)
Love it or hate it, SOAP is here to stay and extends beyond specific languages like java, php, etc... Web programming is no longer custom coded from the ground up. Teaching REST and SOAP is just like teaching people to use the standard template library in a C++ class. Reuse is paramount.
I would avoid writing a CMS - its usually the wrong choice for most web projects and if we have learned anything from twitter its that shoving a cms where it doesn't belong is bad juju. Plus its boring. Have them do a mashup competition instead. Foster creativity and entrepreneurship while also applying all the core concepts.
If there is time, delve into web tier architecture. I am always dismayed by the number of candidates I interview who don't understand how things fit together and how to scale. Understanding redundancy is impressive to any potential employer, especially if your students will move to corporate jobs.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
When asked to create system XYZ and you ask to do it in Python over PHP or Ruby, what are the main features you can mention when they require you to explain it?
This is one of those cases that really boil down to personal preference or situational details. If you're more comfortable and experienced with Python, then say so. Are they asking you to justify it because they're more comfortable with one of the other environments? After you're done, will the system be passed off to someone else for long-term maintenance?
If they ask you to use a technology or language that you're not as familiar with, then make sure they know up-front that it's going to take you longer.
The best sell of Python I've ever seen was by a manager in our group who had a young daughter. He used a quote attributed to Einstein:
If you can't explain something to a six-year-old, you really don't understand it yourself.
The next few slides of his presentation demonstrated how he was able to teach his young daughter some basic Python in less than 30 minutes, with examples of the code she wrote and an explanation of what it did.
He ended the presentation with a picture of his daughter and her quote "Programming is fun!"
I would focus on Python's user friendliness and wealth of libraries and frameworks. There are also a lot of little libraries that you might not get in other languages, and would have to write yourself (i.e. How a C++ developer writes Python).
Good luck!
It's one of the preferred languages over at Google - It's several years ahead of Ruby in terms of "maturity" (what ever that really means - but managers like that). Since it's prefered by Google you can also run it on the Google App Engine.
Mircosoft is also embracing Python, and will have a v2.0 of IronPython coming out shortly. They are working on a Ruby implementation as well, but the Python version is way ahead, and is actually "ready for primetime". That give you the possibility for easy integration with .NET code, as well as being able to write client side RIAs in Python when Silverlight 2 ships.
Focus on the shorter time needed for development/prototype and possibly easier maintenance (none of this may apply against Ruby).
I would consider that using python on a new project is completely dependent on what problem you are trying to solve with python. If you want someone to agree with you that you should use python, then show them how python's features apply specifically to that problem.
In the case of web development with python, talk about WSGI and other web libraries and frameworks you could use that would make your life easier. One note for python is that most of the frameworks for python web development can be plugged right into any current project. With ruby on rails, you're practically working in a DSL that anyone who uses your project will have to learn. If they know python, then they can figure out what you are doing with django, etc in a day.
I'm only talking about web development because it appears that's what you are going to be working on seeing ruby, python and PHP in the same list. The real message that's important is applying to whatever it is you like about python directly to some problem you are trying to solve.
Give them a snippet of code in each (no more than a page) that performs some cool function that they will like. (e.g show outliers in a data set).
Show them each page. One in PHP, Ruby and Python.
Ask them which they find easiest to understand/read.
Tell them thats why you want to use Python. It's easier to read if you've not written it, more manageable, less buggy and quicker to build features because it is the most elegant (pythonic)
I agree with mreggen. Tell them by working in Python you can get things done faster. Getting things done faster possibly means money saved by the client. In the least it means that you are working with a language you a more comfortable in, meaning faster development, debugging, and refactoring time. There will be less time spent looking up documentation on what function to use to find the length of a string, etc.
Though All 3 languages are versatile and used worldwide by programmers, Python still have some advantages over the other two. Like From my personal experience :-
Non-programmers love it (most of 'em choose Python as their first computer language,check this infographic php vs python vs ruby here)
Multiple frameworks (You can automate your system tasks, can develop apps for web and windows/mac/android OSes)
Making OpenCV apps easily than MATLAB
Testing done easy (you can work on Selenium for all kind of web testing)
OOPS concepts are followed by most languages now , so how come Python can stay behind! Inheritance, Abstraction and Encapsulation are followed by Python as well.
Python as of now is divided into two versions popularly that are not much different in terms of performance but features. Python2.x and Python 3.x both have same syntax ,except for some statements like :-
print "..." in Python2.x and print() in Python3.x
raw_input() in Python2.x and input() in Python3.x (for getting user input)
In the end, client only cares about money and Python helps you save a lot as compared to PHP and Ruby , because instead of hiring experienced programmers , you can make a newbie learn and use Python expertly.