I was browsing through the docs of APC (Alternative PHP Cache) and saw that it has a function called apc_compile_file. The docs say that this function:
Stores a file in the bytecode cache,
bypassing all filters.
Is this like HipHop's idea of storing PHP code as more optimized code? If not, can someone educate me, because I'm kind of lost here. If it is indeed like that, then why is APC, which is older than HipHop, not getting all the buzz that HipHop gets?
Best regards!
The two are very, very different.
APC isn't a bytecode optimizer, just a bytecode cache. It saves the need for the PHP script to be parsed (or even read from the .php file on disk) on subsequent accesses, but it's still being executed as PHP bytecode.
HipHop doesn't just optimize PHP code, it transforms it into compilable C++ code, then compiles that into a native executable on the server. By its nature as compiled code, it then runs significantly faster than any scripted language.
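To make the APC side concrete, here is a minimal sketch (assuming the APC extension is loaded; the directory path is hypothetical) of pre-warming the bytecode cache at deploy time with apc_compile_file, so the first real request doesn't pay the parse/compile cost:

    <?php
    // Pre-warm APC's bytecode cache at deploy time.
    // The path is hypothetical; adjust for your application.
    foreach (glob('/var/www/app/*.php') as $file) {
        if (apc_compile_file($file)) {   // compile & cache, bypassing filters
            echo "cached: $file\n";
        }
    }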
Related
From my understanding, if you use a PHP caching program like APC, eAccelerator, etc., then opcodes will be stored in memory for faster execution on subsequent requests. My question is: why wouldn't it ALWAYS be better/faster to compile your scripts, assuming you're using a compiler like phc or even HPHP (although I know they have issues with dynamic constructs)? Why bother storing opcodes, which have to be re-read by the Zend Engine (which uses C functions to execute them), when you can just compile and skip that step?
You cannot simply compile to C and have your PHP script execute the same way. HPHP does real compilation, but it doesn't support the full set of PHP features.
Other "compilers" actually just embed a PHP interpreter in the binary, so you aren't really compiling the code anyway.
PHP is not meant to be compiled. Opcode caching is very fast and good enough for 99% of applications out there. If you have Facebook-level traffic, and you have already optimized your back-end DB, compilation might be the only way to increase performance.
PHP is not a thin layer over the standard C library.
If PHP didn't have eval(), it probably would be possible to do a straight PHP->compiled binary translation with (relative) ease. But since PHP can itself dynamically build/execute scripts on the fly via eval(), it's not possible to do a full-on binary. Any binary would necessarily have to contain the entirety of PHP because the compiler would have no idea what your dynamic code could do. You'd go from a small 1 or 2k script into a massive multi-megabyte binary.
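As a toy illustration of the problem (the function name here is arbitrary and could come from anywhere at runtime):

    <?php
    // The code string below doesn't exist until runtime, so an
    // ahead-of-time compiler can't know what it would need to bundle.
    $func = 'strtoupper';                      // could come from config, a DB, user input...
    $code = 'return ' . $func . '("hello");';  // source assembled on the fly
    echo eval($code);                          // parsed, compiled, and executed right now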
From Wikipedia:
Most PHP accelerators work by caching the compiled bytecode of PHP
scripts to avoid the overhead of parsing and compiling source code on
each request (some or all of which may never even be executed). To
further improve performance, the cached code is stored in shared
memory and directly executed from there, minimizing the amount of slow
disk reads and memory copying at runtime.
Just-in-time compilation:
JIT compilers represent a hybrid approach, with translation occurring
continuously, as with interpreters, but with caching of translated
code to minimize performance degradation.
So does using a PHP accelerator such as APC have equivalent performance implications to "just-in-time" compiling PHP (assuming that it's possible to do so)? In fact, are they actually the same thing?
Same concept, different execution.
When JIT is spoken of in most circles, it refers to compiling virtual machine bytecode into native machine code. For example, Facebook's HHVM is a PHP implementation that uses a JIT engine.
However, PHP's native virtual machine doesn't JIT to native machine code. In fact, it doesn't do JIT at all in the traditional sense. Whole files are compiled to PHP bytecode on demand, but that isn't actually JIT.
Be careful with the term "PHP accelerator." Back in the PHP 4 days, the bytecode created by the PHP parser could be optimized a bit to get better performance. That hasn't been needed since early PHP 5. The only thing that APC, the Zend "Optimizer" and other "accelerator" products do to increase PHP performance is cache bytecode. To avoid ambiguity, the term "accelerator" should no longer be used.
If you're using standard PHP, then you do want a bytecode cache, just steer clear of products saying that they try to do PHP bytecode optimization.
I know you can minify PHP, but I'm wondering if there is any point. PHP is an interpreted language, so it will run a little slower than a compiled language. My question is: would clients see a visible speed improvement in page loads and such if I were to minify my PHP?
Also, is there a way to compile PHP or something similar?
PHP is compiled into bytecode, which is then interpreted on top of something resembling a VM. Many other scripting languages follow the same general process, including Perl and Ruby. It's not really a traditional interpreted language like, say, BASIC.
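You can peek at the first stage of that pipeline with PHP's built-in tokenizer; a quick sketch:

    <?php
    // Source -> tokens: the step before compilation to opcodes,
    // which the Zend VM then executes.
    $tokens = token_get_all('<?php echo 1 + 2;');
    foreach ($tokens as $t) {
        if (is_array($t)) {                    // skip single-character tokens
            printf("%-20s %s\n", token_name($t[0]), trim($t[1]));
        }
    }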
There would be no effective speed increase if you attempted to "minify" the source. You would get a major increase by using a bytecode cache like APC.
Facebook introduced a compiler named HipHop that transforms PHP source into C++ code. Rasmus Lerdorf, one of the big PHP guys, did a presentation for Digg earlier this year that covers the performance improvements given by HipHop. In short, it's not too much faster than optimizing code and using a bytecode cache. HipHop is overkill for the majority of users.
Facebook also recently unveiled HHVM, a new virtual machine based on their work making HipHop. It's still rather new and it's not clear if it will provide a major performance boost to the general public.
Just to make sure it's stated expressly, please read that presentation in full. It points out numerous ways to benchmark and profile code and identify bottlenecks using tools like xdebug and xhprof (the latter also from Facebook).
2021 Update
HHVM diverged from vanilla PHP a couple of versions ago. PHP 7 and 8 bring a whole bunch of amazing performance improvements that have pretty much closed the gap. You no longer need to do weird things to get better performance out of PHP!
Minifying PHP source code continues to be useless for performance reasons.
Forgo the idea of minifying PHP in favor of using an opcode cache like PHP Accelerator or APC.
Or something else like memcached (which caches data rather than opcodes).
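A minimal sketch of the usual memcached pattern, assuming the pecl Memcached extension, a local server, and a hypothetical render_homepage() helper:

    <?php
    // Hypothetical expensive page renderer, stubbed for illustration.
    function render_homepage() {
        return '<html>...</html>';
    }

    $m = new Memcached();
    $m->addServer('127.0.0.1', 11211);         // assumed local memcached server

    $html = $m->get('homepage_html');
    if ($html === false) {                     // cache miss: rebuild and store
        $html = render_homepage();
        $m->set('homepage_html', $html, 300);  // keep for 5 minutes
    }
    echo $html;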
Yes, there is one (non-technical) point. Your hosting provider can snoop on your code on their server. If you minify and uglify it, it's harder for snoopers to steal your ideas. So one reason for minifying and uglifying PHP may be protection against snooping; I think uglifying code could be one step in an automated deployment.
With some rewriting (shorter variable names) you could save a few bytes of memory, but that's seldom significant.
However, I do design some of my applications in a way that allows the include scripts to be concatenated together. With php -w the result can be compacted significantly, adding a small speed gain at script startup. On a server with an opcode cache, however, this only saves a few file mtime checks.
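A rough sketch of that concatenation idea; php_strip_whitespace() is the programmatic equivalent of php -w. The file names are hypothetical, and each library file is assumed to contain no closing ?> tag:

    <?php
    // Merge library files into one compact bundle so startup does a
    // single read instead of many.
    $files  = array('lib/db.php', 'lib/http.php', 'lib/util.php');
    $bundle = "<?php\n";
    foreach ($files as $file) {
        $code    = php_strip_whitespace($file);                     // like `php -w`
        $bundle .= preg_replace('/^<\?php\s*/', '', $code) . "\n";  // drop the duplicate opener
    }
    file_put_contents('bundle.php', $bundle);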
This is less an answer than an advertisement. I've been working on a PHP extension that translates Zend opcodes to run on a VM with static typing. It doesn't accelerate arbitrary PHP code, but it does let you write code that runs way faster than regular PHP allows. The key here is static typing. On a modern CPU, a dynamic language eats branch-misprediction penalties left and right. The fact that PHP arrays are hash tables also imposes a high cost: lots of branch mispredictions, inefficient use of cache, poor memory prefetching, and no SIMD optimization whatsoever. Branch mispredictions and cache misses in particular are the Achilles' heel of today's processors. My little VM sidesteps those problems by using static types and C arrays instead of hash tables. The result ends up running roughly ten times faster. This is with bytecode interpretation. The extension can optionally compile a function through gcc; in that case, you get two to five times more speed.
Here's the link for anyone interested:
https://github.com/chung-leong/qb/wiki
Again, the extension is not a general PHP accelerator. You have to write code specific for it.
There are PHP compilers... see this previous question for a list; but (unless you're the size of Facebook or are targeting your application to run client-side) they're generally a lot more trouble than they're worth.
Simple opcode caching will give you more benefit for the effort involved. Or profile your code to identify the bottlenecks, and then optimise it.
You don't need to minify PHP.
To get better performance, install an opcode cache; but the ideal solution would be to upgrade your PHP to version 5.5 or above, because the newer versions ship with an opcode cache by default, called Zend OPcache, which performs better than the others: http://massivescale.blogspot.com/2013/06/php-55-zend-optimiser-opcache-vs-xcache.html
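A quick sketch to confirm the bundled cache is actually loaded and enabled on PHP 5.5+:

    <?php
    // Check whether OPcache is available before relying on it.
    if (function_exists('opcache_get_status')) {
        $status = opcache_get_status(false);   // false = omit per-script stats
        var_dump($status !== false && $status['opcache_enabled']);
    } else {
        echo "OPcache extension not loaded\n";
    }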
The "point" is to make the file smaller, because smaller files load faster than bigger files. Also, removing whitespace will make parsing a tiny bit faster since those characters don't need to be parsed out.
Will it be noticeable? Almost never, unless the file is huge and there's a big difference in size.
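If you want to see how little is at stake, PHP's built-in php_strip_whitespace() is essentially a minifier; a one-off sketch comparing a script with its stripped form:

    <?php
    // Compare this script's size before and after comment/whitespace removal.
    $before = filesize(__FILE__);
    $after  = strlen(php_strip_whitespace(__FILE__));
    printf("%d bytes -> %d bytes\n", $before, $after);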
If I write a hello world app using a PHP web framework such as CodeIgniter, then compile and run it using HipHop, will it run faster than if I write the same hello world app in Django or Rails?
HipHop converts PHP code into C++ code, which then needs to be compiled to run. Since compiled code runs faster and uses less memory than scripting languages like Python/PHP, it will probably run faster in the example you have given.
However, HipHop does not convert all code. A lot of PHP code is dynamic and cannot be translated to C++, which means you will have to write your code with this in mind. Whether CodeIgniter can even be compiled using HipHop is another question.
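For instance, toy constructs like these resolve their names only at runtime, which is what makes a straight translation to static C++ hard:

    <?php
    // Variable variables and dynamic calls: the names involved
    // don't exist until the script runs.
    $name  = 'greeting';
    $$name = 'hello';           // creates $greeting at runtime
    echo $greeting, "\n";

    $fn = 'str' . 'toupper';    // function name assembled at runtime
    echo $fn('world'), "\n";    // dynamic function call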
Terry Chay wrote a big article about HipHop, covering when to use it, its limitations, and its future. I would recommend reading it, as it will most likely answer most of your questions and give you some insight into how it works :)
http://terrychay.com/article/hiphop-for-faster-php.shtml
At that point the run time is inconsequential. HipHop was designed for scaling... meaning billions of requests. There's absolutely no need to use something like HipHop for even a medium-size website.
But more to the point of your question... I don't think there have been comparison charts available for us to see, but I doubt the run time would be faster at that level.
I don't know about Django or Rails, so this is a bit off-topic.
With plain PHP, the request goes to Apache, then to mod_php. mod_php loads the helloworld.php script from disk, parses and tokenizes it, compiles it to bytecode, then interprets the bytecode and passes the output back to Apache, which serves it to the user.
With PHP and an opcode cache, the first run is about the same as with plain PHP, but the compiled bytecode is stored in RAM. Then, for the second request: it goes to Apache, Apache to mod_php, APC loads the bytecode from RAM, interprets it, and passes it back to Apache and on to the user.
With HipHop there is no Apache, just HipHop itself, and there is no interpreter, so the request goes directly to HipHop and back to the user. So yes, it's faster, for several reasons:
Faster startup, because no bytecode compilation is needed: the program is already machine code, so there is no per-request compilation and no source file reading.
No interpreter. Machine code is not necessarily faster per se; that depends on the quality of the source translation (HipHop) and the quality of the static compiler (g++). HipHop-translated code is not fast compared to hand-written C code, because there is some overhead for type handling and such.
With node.js, there is also no Apache. The script is started and directly compiled to machine code (the V8 engine does that), so it's a kind of AOT (ahead-of-time) compilation (or is it still called JIT? I don't really know). Every request is then handled directly by the already-compiled machine code, so node.js is actually very comparable to HipHop. I assume HipHop is multithreaded or something like that, while node does evented IO.
Facebook claims a 50% speed gain, which is not really that much; if you look at the results of the language shootout, you'll see that for the execution speed of assorted algorithms, PHP is 5 to 250 times slower than compiled languages.
So why only 50%? Because...
Web apps depend on much more than just execution speed, e.g. IO.
PHP's type system prevents HipHop from making the best use of C++'s static types.
In practice, a lot of PHP is already C, because most of the functionality is either built in or comes from extensions, and extensions are programmed in C and statically compiled.
I'm not sure there was a huge performance gain for hello world, because hello world, even with a good framework, is still so small that execution speed is probably negligible compared to all the other overhead (network latency and such).
IMO: if you want speed and ease of use, go for node.js :)
Running a simple application is always faster in any language. When it becomes as complex as Facebook, you will face numerous problems, and PHP's slowness will show its face. At the same time, converting the existing code to another language is not an option, since all the logic and code is not so easy to translate to another language's syntax. That's why Facebook's developers decided to keep the old code and make PHP faster, and that's the reason they created their own PHP compiler, called HipHop.
Read this story from the perspective of one of Facebook's developers, so you know the history of HipHop.
That is not really an apples-to-apples comparison. On the most level playing field you might have something like:
Django running behind apache
Django rendering an HTML template to say hello world (no caching)
AND
HPHP running behind apache
HPHP rendering an HTML template to say hello world (again, no caching)
There is no database, almost no file I/O, and no caching. If you hit the page 10,000 times with a load generator at varying concurrency levels, you will probably find that HPHP will outperform Django or Rails; that is to say, it can render more pages per second and keep up with your traffic a bit better.
The question is, will you ever have this many concurrent users? If you will, will they likely be hitting a database or a cached page?
HPHP sounds cool, but IMHO there is no reason to jump ship just yet (unless you are getting lots of traffic, in which case it might make sense to check it out).
Will it run faster than if I write the same hello world app in Django or Rails?
It probably will, but don't fret. If we're talking about prospective speed improvements from yet-unreleased projects, Pythonistas have pypy-jit and unladen-swallow to look forward to ;)
Python can create a pre-compiled version of a script, file.pyc, so that the program can be run without being compiled again. Can Ruby, PHP, and Perl do the same on the command line?
There is no portable bytecode specification for Ruby, and thus also no standard way to load precompiled bytecode archives. However, almost all Ruby implementations use some kind of bytecode or intcode format, and several of them can dump and reload bytecode archives.
YARV always compiles to bytecode before executing the code, however that is usually only done in memory. There are ways to dump out the bytecode to disk. At the moment, there is no way to read it back in, however. This will change in the future: work is underway on a bytecode verifier for YARV, and once that is done, bytecode can safely be loaded into the VM, without fear of corruption. Also, the JRuby developers have indicated that they are willing to implement a YARV VM emulator inside JRuby, once the YARV bytecode format and verifier are stabilized, so that you could load YARV bytecode into JRuby. (Note that this version is obsolete.)
Rubinius also always compiles to bytecode, and it has a format for compiled files (.rbc files, analogous to JVM .class files) and there is talk about a bytecode archive format (.rba files, analogous to JVM .jar files). There is a chance that Rubinius might implement a YARV emulator, if deploying apps as YARV bytecode ever becomes popular. Also, the JRuby developers have indicated that they are willing to implement a Rubinius bytecode emulator inside JRuby, if Rubinius bytecode becomes a popular way of deploying Ruby apps. (Note that this version is obsolete.)
XRuby is a pure compiler: it compiles Ruby source code straight to JVM bytecode (.class files). You can deploy these .class files just like any other Java application.
JRuby started out as an interpreter, but it has both a JIT compiler and an AOT compiler (jrubyc) that can compile Ruby source code to JVM bytecode (.class files). Also, work is underway to create a new compiler that can compile (type-annotated) Ruby code to JVM bytecode that actually looks like a Java class and can be used from Java code without barriers.
Ruby.NET is a pure compiler that compiles Ruby source code to CIL bytecode (PE .dll or .exe files). You can deploy these just like any other CLI application.
IronRuby also compiles to CIL bytecode, but typically does this in-memory. However, you can pass command-line switches to it so that it dumps the .dll and .exe files out to disk. Once you have those, they can be deployed normally.
BlueRuby automatically pre-parses Ruby source code into BRIL (BlueRuby Intermediate Language), which is basically a serialized parse tree. (See Blue Ruby - A Ruby VM in SAP ABAP (PDF) for details.)
I think (but I am definitely not sure) that there is a way to get Cardinal to dump out Parrot bytecode archives. (Actually, Cardinal only compiles to PAST, and then Parrot takes over, so it would be Parrot's job to dump and load bytecode archives.)
Perl 5 can dump the bytecodes to disk, but it is buggy and nasty. Perl 6 has a very clean method of creating bytecode executables that Parrot can run.
Perl's compilation at startup is fast enough that this doesn't matter in most circumstances. One place where it does matter is in a CGI environment, which is what mod_perl is for.
For hysterical raisins, Perl 5 looks for .pmc files ahead of .pm files when searching for a module. These files could contain bytecode, though Perl doesn't write bytecode out by default (unlike Python).
Module::Compile (or: what's this PMC thingy?) goes into some more depth about this obscure feature. .pmc files are not frequently used, but...
The clever folks who wrote Module::Compile take advantage of this, to pre-compile Perl code into... well, it's still Perl, but it's preprocessed.
Among other benefits, this speeds up loading time and makes debugging easier when using source filters (Perl code modifying Perl source code before being loaded by the interpreter).
Not for PHP, although most PHP setups incorporate a bytecode cache that will cache the compiled bytecode so that the next time the script runs, the compiled version is run. This speeds up execution considerably.
There's no way I'm aware of to actually get at the bytecode through the command line.
For Perl you can try using B::Bytecode and perlcc. However, both of these are highly experimental. And Perl 6 is (theoretically) coming out soon, will run on Parrot, and will use a different bytecode, so all of this will be somewhat moot then.
Here are some example magic words for the command line:

    perl -MO=Bytecode,-H,-oModule.pmc Module.pm
According to the third edition of Programming Perl, it is possible to approximate this in some experimental ways.
If you use Zend Guard on your PHP scripts, it essentially precompiles the scripts to byte-code, which can then be run by the PHP engine if the Zend Optimizer extension is loaded.
So, yes, Zend Guard/Optimizer permits pre-compiled PHP scripts to be used.
For PHP, the Phalanger project compiles down to .NET assemblies. I'm not sure if that's what you were looking for, though.
Has anyone considered using LLVM bytecode, instead of yet another custom bytecode?
Ruby 1.8 doesn't actually use bytecode at all (even internally), so there is no pre-compilation step.