I've been reading a lot about Facebook's PHP HipHop project, but one thing I can't seem to figure out (I don't have a 64 bit machine to test HipHop on) is whether or not one could use HipHop as simply a project conversion tool, rather than just to compile a binary.
Essentially if one were so inclined to try to convert a PHP CLI application to C++ using HipHop, would it therefore be possible to just maintain it in C++ in the future, rather than having to use HipHop every time?

I expect that automatically translated code will be harder to maintain than handwritten code. It is generated by a computer, so things like loops may have been eliminated in many places. A lot of it may be suboptimal (a direct translation cannot always accommodate idioms), and after hand-tweaking it for a while you will have a mix of styles (auto-generated and handwritten), which will be a maintenance nightmare.
Essentially, you should treat this C++ code just like a binary after a regular compilation. Would you bit-tweak a binary to add new functionality? No, you'd edit the source and then recompile it. That's what you should do here as well. The C++ is generated; it is an intermediate representation. The fact that it's in a human-readable language shouldn't tempt you to modify it.

The source C++ generated by HipHop is perfectly capable of being modified. People do use HipHop for this. It would not make sense to expand greatly on HipHop generated C++ since it is not so easy to follow as hand-written code (or anywhere near as efficient), but for small changes (perhaps optimizing) it can be done.
Also, HipHop can be used to generate a base that you then completely rewrite, if you want to convert a PHP script to C++. You can start by looking at the HipHop-generated code to help you rewrite the entire thing by hand.


From my point of view, both PHP and Java have a similar structure. First you write some high-level code, which must then be translated into a simpler code format to be executed by a VM. One difference is that PHP works directly from the source code files, while Java stores the bytecode in .class files, from which the VM can load it.
Nowadays the requirements for speedy PHP execution grow, which leads people to believe that it would be better to directly work with the opcodes and not go through the compiling step each time a user hits a file.
The solution seems to be a load of so-called accelerators, which basically store the compiled results in a cache and then use the cached opcodes instead of compiling again.
Another approach, done by Facebook, is to completely compile the PHP code to a different language.
So my question is, why is nobody in the PHP world doing what Java does? Are there some dynamic elements that really need to be recompiled each time, or something like that? Otherwise it would be much smarter to compile everything when the code goes into production and then just work with that.
The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.
PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.
PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.
Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.
Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.
It's not quite true that nobody in the PHP world is doing what Java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a Java servlet container (though his inspiration is more Ruby’s Rack and Python’s WSGI than Java).
PHP does not use a standard mechanism for opcodes. I wish it had stuck to either a stack VM (Python, Java) or a register VM (x86, Perl 6 etc). But it uses something absolutely homegrown, and therein lies the rub.
It uses a connected list in memory, which results in each opcode having a ->op1, a ->op2 and a ->result. Each of those is either a constant or an entry in a temp table etc. These pointers cannot be serialized in any sane fashion.
Now, people have accomplished this using tools like pecl/bcompiler, which does dump the stream to disk.
But classes make this even more complicated, because code fragments like this are possible:
if (rand(0, 1)) {
    class XYZ { }
} else {
    class XYZ { }
}
class ABC extends XYZ {}
Which means that a large number of decisions about classes & functions can only be done at runtime - something like Java would choke on two classes with the same name, which are defined conditionally at runtime. Basically, APC's inheritance & class caching code is perhaps the most complicated & bug-prone part of the codebase. Whenever a class is cached, all parent inherited members have to be scrubbed out before it can be saved to the opcode cache.
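The runtime-binding issue can be seen in other dynamic languages too. As a rough analogy only (this is Python, not PHP, and the names `build_hierarchy` and `use_first` are made up for illustration), the sketch below defines a class named XYZ in one of two ways depending on a runtime value and then derives ABC from it, so the parent of ABC cannot be known before the code actually runs:

```python
def build_hierarchy(use_first):
    """Return class ABC, whose parent XYZ is only chosen at runtime."""
    if use_first:
        class XYZ:
            def origin(self):
                return "first definition"
    else:
        class XYZ:
            def origin(self):
                return "second definition"

    # ABC inherits from whichever XYZ was defined above.
    class ABC(XYZ):
        pass

    return ABC


if __name__ == "__main__":
    print(build_hierarchy(True)().origin())
    print(build_hierarchy(False)().origin())
```

A cache that wanted to store ABC fully resolved would have to know which XYZ was picked, which is the kind of decision the answer says can only be made at runtime.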
The pointer problem is not insurmountable. There is an apc_bindump which I have never bothered to fix up to load entire cache entries off disk directly whenever a restart is done. But it's painful to debug all that to get something that still needs to locate all system pointers - the apache case is too easy, because all php processes have the same system pointers because of the fork behaviour. The old fastcgi versions were slower because they used to fork first & init php later - the php-fpm fixed that by doing it the other way around.
But eventually, what's really missing in PHP is the will to invent a bytecode format, throw away the current engine & all modules, and rewrite it using a stack VM & build a JIT. I wish I had the time - the fb guys are almost there with their HipHop HHVM, which sacrifices eval() for faster performance - which is a fair sacrifice :)
PS: I'm the guy who can't find time to update APC for 5.4 properly
I think all of you are misinformed. HHVM is not a compiler to another language; it is a virtual machine itself. The confusion arises because Facebook used to compile to C++, but that approach was too slow for the developers' requirements (ten minutes of compiling just to test some tiny thing).

I have a PHP script I'd like to compile into a standalone command-line executable for running on Linux.
Is this realistic? Are there compilers for this?
I know that there are PHP compilers around, but my question is oriented more to whether it is of advantage, and which is the best compiler to use.
Will it be faster or slower than running it through PHP? If it would speed up my script then that would be great, since it does a lot of processing (lots of loops and math) and takes an hour to run.
My understanding is that HipHop compiles to C++, not C, and that the process is quite easy. Facebook has applied it to vast amounts of code.
I'm not sure you get a .exe, but OP only wants "faster". He might get quite a good speedup; HipHop-compiled code averages only a factor of 2 or so faster, but that's because much of PHP execution is really library calls. He might get a pretty good speedup on his computation part, if the HipHop compiler can figure out what the data types are in the computation code. He might have to modify his code somewhat to make this clear to the HipHop compiler, by not using his computation variables for anything but computation, but that should be only a minor source code change. I'd expect the HipHop site to contain hints about what to do to speed up HipHop-compiled code along these lines.
All of this is educated guess based on what I understand and not actual experience. YMMV.
Well, compiling itself will add very little if your script runs for hours.
The best solution would be to use another language; C would seem to be the best choice.
You can try to translate your PHP to C using the HipHop compiler, but it is not a task I'd call easy.

I know you can minify PHP, but I'm wondering if there is any point. PHP is an interpreted language so will run a little slower than a compiled language. My question is: would clients see a visible speed improvement in page loads and such if I were to minify my PHP?
Also, is there a way to compile PHP or something similar?
PHP is compiled into bytecode, which is then interpreted on top of something resembling a VM. Many other scripting languages follow the same general process, including Perl and Ruby. It's not really a traditional interpreted language like, say, BASIC.
There would be no effective speed increase if you attempted to "minify" the source. You would get a major increase by using a bytecode cache like APC.
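Why minifying changes nothing once the source has been compiled can be illustrated with any bytecode-compiled language. The sketch below uses Python's built-in `compile()` purely as an analogy (it does not exercise the PHP engine, and `same_bytecode`, `VERBOSE` and `MINIFIED` are names invented for this example): comments and extra whitespace are discarded by the compiler, so both versions produce the same instructions.

```python
def same_bytecode(source_a, source_b):
    """Compile both sources and compare instructions, constants and names."""
    code_a = compile(source_a, "<verbose>", "exec")
    code_b = compile(source_b, "<minified>", "exec")
    return (code_a.co_code == code_b.co_code
            and code_a.co_consts == code_b.co_consts
            and code_a.co_names == code_b.co_names)


VERBOSE = """
# Price of one item, in cents.
price = 250

# Total for four items.
total  =  price  *  4     # extra spaces and a trailing comment
"""

MINIFIED = "price=250\ntotal=price*4\n"

if __name__ == "__main__":
    print(same_bytecode(VERBOSE, MINIFIED))
```

The only thing minification can save is parse time, and a bytecode cache removes that cost altogether by skipping the parse on later requests.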
Facebook introduced a compiler named HipHop that transforms PHP source into C++ code. Rasmus Lerdorf, one of the big PHP guys did a presentation for Digg earlier this year that covers the performance improvements given by HipHop. In short, it's not too much faster than optimizing code and using a bytecode cache. HipHop is overkill for the majority of users.
Facebook also recently unveiled HHVM, a new virtual machine based on their work making HipHop. It's still rather new and it's not clear if it will provide a major performance boost to the general public.
Just to make sure it's stated expressly, please read that presentation in full. It points out numerous ways to benchmark and profile code and identify bottlenecks using tools like xdebug and xhprof, also from Facebook.
2021 Update
HHVM diverged away from vanilla PHP a couple versions ago. PHP 7 and 8 bring a whole bunch of amazing performance improvements that have pretty much closed the gap. You now no longer need to do weird things to get better performance out of PHP!
Minifying PHP source code continues to be useless for performance reasons.
Forgo the idea of minifying PHP in favor of using an opcode cache, like PHP Accelerator, or APC.
Or something else like memcached
Yes there is one (non-technical) point.
Your hoster can spy on your code on his server. If you minify and uglify it, it is more difficult for spies to steal your ideas.
One reason for minifying and uglifying PHP may be spy protection. I think uglifying code should be one step in an automatic deployment.
With some rewriting (shorter variable names) you could save a few bytes of memory, but that's also seldom significant.
However, I do design some of my applications in a way that allows include scripts to be concatenated together. With php -w the result can be compacted significantly, adding a little speed gain for script startup. On a server with an opcode cache, however, this only saves a few file mtime checks.
This is less an answer than an advertisement. I've been working on a PHP extension that translates Zend opcodes to run on a VM with static typing. It doesn't accelerate arbitrary PHP code. It does allow you to write code that runs way faster than what regular PHP allows. The key here is static typing. On a modern CPU, a dynamic language eats branch misprediction penalties left and right. The fact that PHP arrays are hash tables also imposes a high cost: lots of branch mispredictions, inefficient use of cache, poor memory prefetching, and no SIMD optimization whatsoever. Branch mispredictions and cache misses in particular are the Achilles' heel of today's processors. My little VM sidesteps those problems by using static types and C arrays instead of hash tables. The result ends up running roughly ten times faster. This is using bytecode interpretation. The extension can optionally compile a function through gcc. In that case, you get two to five times more speed.
Here's the link for anyone interested:
Again, the extension is not a general PHP accelerator. You have to write code specific for it.
There are PHP compilers... see this previous question for a list; but (unless you're the size of Facebook or are targeting your application to run client-side) they're generally a lot more trouble than they're worth.
Simple opcode caching will give you more benefit for the effort involved. Or profile your code to identify the bottlenecks, and then optimise it.
You don't need to minify PHP.
In order to get better performance, install an opcode cache; but the ideal solution would be to upgrade your PHP to version 5.5 or above, because the newer versions bundle an opcode cache, Zend OPcache (formerly Zend Optimizer+), which performs better than the other ones: http://massivescale.blogspot.com/2013/06/php-55-zend-optimiser-opcache-vs-xcache.html
The "point" is to make the file smaller, because smaller files load faster than bigger files. Also, removing whitespace will make parsing a tiny bit faster since those characters don't need to be parsed out.
Will it be noticeable? Almost never, unless the file is huge and there's a big difference in size.

I am working on a project that requires reading text files, extracting data from them, and then generating reports (text files). Since there is a lot of string parsing, I decided to do it in Perl or Python or PHP (preference in that order). But I don't want to expose the source code to my client. Is there any good compiler for compiling Perl/Python/PHP scripts into Linux executables?
I am not looking for a 100% perfect one, but I am looking for one that is at least 90% perfect. By perfect, I mean the compiler doesn't require me to write scripts with a limited subset of language features.
I'm sorry, it's simply not worth spending your time on. For any language you choose (from among the ones you listed), for any compiler/obfuscator someone chooses to come up with, I promise you I can get readable source code out of it (within an hour if it's Perl; longer if it's Python or PHP simply because I'm less acquainted with the implementations of those languages, not because it's intrinsically harder with those languages).
I think you should take a better look at what your goals are and why you want to work for a client that you're assuming a priori wants to rip you off. And if you still want to go ahead with such a scheme, write in C or Fortran -- certainly not anything starting with "P".
There does exist a compiler for Perl, called perlcc. I'm not familiar with Perl, but it looks like what you're looking for.
There are 3 options for encrypting Perl code:
Use PAR to create an executable file with PAR::Filter::Obfuscate or PAR::Filter::Crypto
Use Filter::Crypto::CryptFile (will require some modules installed on the target OS)
Turn it into a module and encrypt it with Module::Crypt.
Also you can try B::C - it was removed from core Perl distribution and is now available on CPAN.
So far we've heard about perlcc and PAR with some obfuscation filters. These may work.
I have had very good luck with ActiveState's PerlApp which is part of their Perl Dev Kit.
It works well to bundle your code and hide it. You can try it for free, and it comes with some nice extras. Whether it is expensive or cheap depends on your perspective. For me, it was cheap. The cost in time of getting code hiding working reliably with PAR, or of messing with perlcc, would easily have exceeded the price of the package. YMMV.
For Python, you can compile your code and give the *.pyc file to the client.
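A minimal sketch of the .pyc route, using only the standard library (`py_compile` to produce the file and `importlib` to load it without the source). The helper name `compile_and_load` and the module name `report` are invented for this example. Keep in mind that a .pyc is tied to the Python version that produced it, and that it contains ordinary bytecode that can be disassembled, so this hides the source only from casual inspection.

```python
import importlib.machinery
import importlib.util
import os
import py_compile
import tempfile


def compile_and_load(source_text, module_name="report"):
    """Compile source to a .pyc, delete the .py, and import from the .pyc alone."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, module_name + ".py")
        pyc = os.path.join(tmp, module_name + ".pyc")
        with open(src, "w", encoding="utf-8") as handle:
            handle.write(source_text)

        # Write the bytecode next to the source instead of into __pycache__.
        py_compile.compile(src, cfile=pyc, doraise=True)
        os.remove(src)  # only the .pyc is left, as if shipped to the client

        loader = importlib.machinery.SourcelessFileLoader(module_name, pyc)
        spec = importlib.util.spec_from_loader(module_name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module


if __name__ == "__main__":
    mod = compile_and_load("def total(values):\n    return sum(values)\n")
    print(mod.total([1, 2, 3]))
```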
On Linux, an executable is anything that has +x set, so there's no need to compile scripts. To hide your source code you could use an obfuscator, which makes the source code unreadable.
I've never used this, so I don't know how easy it is to setup, but you could use HipHop PHP to turn your PHP scripts into C++ code and compile them. (Assuming you choose to write them in PHP)
For Python you can use cx_Freeze. I have not used this, but I believe that it basically bundles the .pyc files into a zip file and adds an executable front-end.
Alternatively, you could try compiling your Python code with Cython. This translates a modified version of the Python language into C code, which can then be compiled. It is normally used to write high-performance extensions or to interface with existing C libraries, but the latest version can also be used to make a complete executable.

If I write a hello world app using a PHP web framework such as CodeIgniter and then I compile it and run it using HipHop. Will it run faster than if I write the same hello world app in django or rails?
HipHop converts PHP code into C++ code, which needs to be compiled to run. Since pre-compiled code runs faster and uses less memory than scripting languages like Python/PHP, it will probably run faster in the example you have given.
However, HipHop does not convert all code. A lot of code in PHP is dynamic and cannot be translated to C++, which means you will have to write your code with this in mind. Whether CodeIgniter can even be compiled using HipHop is another question.
Terry Chay wrote a big article about HipHop, covering when to use it, its limitations, and its future. I would recommend reading it, as it will most likely answer most of your questions and give you some insight into how it works :)
At that point the run time is inconsequential. HipHop was designed for scaling... meaning billions of requests. There's absolutely no need to use something like HipHop for even a medium size website.
But more to the point of your question... I don't think there have been comparison charts available for us to see, but I doubt the run time would be faster at that level.
i don't know about django or rails, so this is a bit off-topic.
with plain php, the request goes to apache, then to mod_php. mod_php loads the helloworld.php script from disk, parses & tokenizes it, compiles it to bytecode, then interprets the bytecode, passes the output back to apache, apache serves it to the user.
with php and an optimizer the first run is about the same as with plain php, but the compiled source code is stored in ram. then, for the second request: goes to apache, apache to mod_php, apc loads bytecode from ram, interprets it, passes it back to apache, back to the user.
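The difference between the two request paths described so far can be sketched as a toy in-memory cache. This is an illustration in Python, not how APC is implemented: the class `BytecodeCache`, its `compilations` counter and the `demo` helper are all invented for this example. The script is compiled on the first request, and later requests reuse the compiled code as long as the file's modification time has not changed.

```python
import os
import tempfile


class BytecodeCache:
    """Toy cache: compile a script once, reuse the result until the file changes."""

    def __init__(self):
        self._entries = {}      # path -> (mtime_ns, code object)
        self.compilations = 0   # how many times we actually compiled

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns   # the cheap per-request check
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]                 # cache hit: no read, no compile
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        code = compile(source, path, "exec")
        self.compilations += 1
        self._entries[path] = (mtime, code)
        return code

    def run(self, path):
        namespace = {}
        exec(self.load(path), namespace)    # the "interpret the bytecode" step
        return namespace


def demo():
    """Serve the same script three times and report how often it was compiled."""
    cache = BytecodeCache()
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.py")
        with open(script, "w", encoding="utf-8") as handle:
            handle.write("greeting = 'hello world'\n")
        results = [cache.run(script)["greeting"] for _ in range(3)]
    return results, cache.compilations


if __name__ == "__main__":
    print(demo())
```

Even with the cache, every request still pays for the stat call and for interpreting the bytecode, which is the part an ahead-of-time compiler tries to remove.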
with hiphop there is no apache, but hiphop itself and there's no interpreter, so request goes directly to hiphop and back to the user. so yes, it's faster, because of several reasons:
faster startup because there's no bytecode compilation needed - the program is already in machine-readable code. so no per-request compilation and no source file reading.
no interpreter. machine code is not necessarily faster - that depends on the quality of source translation (hiphop) and the quality of the static compiler (g++). hiphop translated code is not fast compared to hand-written c code, because there's a bit of overhead because of type handling and such.
with node.js, there's also no apache. the script is started and directly compiled to machine code (because the V8 compiler does that), so it's kind of AOT (ahead of time) compiling (or is it still called JIT? i don't really know). every request is then directly handled by the already compiled machine code; so node.js is actually very comparable to hiphop. i assume hiphop to be multithreaded or something like this, while node does evented IO.
facebook claims a 50% speed gain, which is not really that much; if you compare the results of the language shootout, you'll see for the execution speed of assorted algorithms, php is 5 to 250 times slower.
so why only 50%? because ...
web apps depend on much more than just execution speed, e.g. IO
php's type system prevents hiphop from making the best use of c++'s static types
in practice, a lot of php is already C, because most of the functionality is either built in or comes from extensions. extensions are programmed in C and statically compiled.
i'm not sure if there was a huge performance gain for hello world, because hello world, even with a good framework, is still so small execution speed could be negligible in comparison to all the other overhead (network latency and stuff).
imo: if you want speed and ease of use, go for node.js :)
Running a simple application is always fast in any language. When it becomes as complex as Facebook, you will face numerous problems, and PHP's slowness will show its face. At the same time, converting existing code to another language is not an option, since all the logic and code is not so easy to translate into another language's syntax. That's why the Facebook developers decided to keep the old code and make PHP faster. That's the reason they created their own PHP compiler, called HipHop.
Read this story from the perspective of one of the Facebook developers, so you know the history of HipHop.
That is not really an apples-to-apples comparison. On the most level playing field you might have something like:
Django running behind apache
Django rendering an HTML template to say hello world (no caching)
HPHP running behind apache
HPHP rendering an HTML template to say hello world (again, no caching)
There is no database, almost no file I/O, and no caching. If you hit the page 10,000 times with a load generator at varying concurrency levels, you will probably find that HPHP outperforms Django or Rails - that is to say, it can render more pages per second and keep up with your traffic a bit better.
The question is, will you ever have this many concurrent users? If you will, will they likely be hitting a database or a cached page?
HPHP sounds cool, but IMHO there is no reason to jump ship just yet (unless you are getting lots of traffic, in which case it might make sense to check it out).
"Will it run faster than if I write the same hello world app in django or rails?"
It probably will, but don't fret. If we're talking prospective speed improvements from yet unreleased projects, Pythonistas have pypy-jit and unladen-swallow to look forward to ;)
