PHP is usually compiled to opcodes by the Zend engine at execution time.
To skip compiling every time, one can use an opcode cache like APC to save the opcodes in shared memory and reuse them.
Okay, now it seems that there is no solution yet for just compiling the PHP to opcode and using that. Similar to how you use Java.
But why? I am wondering about that b/c this is a quite obvious idea, so I guess there is a reason for this.
EDIT:
the core question is this:
wouldn't PHP compilation make opcode caching superfluous?
The only "reason" against it would be that you couldn't just fix something on the live-system ... which is anyway bad bad bad practice.
You've given one reason against it.
Another very important one is that if you separate the compile from the runtime, both in terms of when each occurs and in terms of the hardware where each runs, you quickly run into complex dependency problems - what happens when you try to run opcode generated by PHP 5.1 on a PHP 5.3 runtime?
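To make the version problem concrete, here is a rough sketch of the kind of check a precompiled artifact would need. The container format and the names packArtifact and loadArtifact are made up for illustration; nothing like this ships with PHP or APC.

```php
<?php
// Hypothetical container for precompiled code; not a real PHP or APC format.
function packArtifact(string $payload, int $versionId = PHP_VERSION_ID): string
{
    return json_encode(['php_version_id' => $versionId, 'payload' => $payload]);
}

// Accept the artifact only if it was built for the same major.minor release.
function loadArtifact(string $blob, int $runtimeVersionId = PHP_VERSION_ID): ?string
{
    $data = json_decode($blob, true);
    if (!is_array($data) || !isset($data['php_version_id'], $data['payload'])) {
        return null; // malformed artifact
    }
    // PHP_VERSION_ID is major*10000 + minor*100 + release, e.g. 50300 for 5.3.0.
    $sameMinor = intdiv($data['php_version_id'], 100) === intdiv($runtimeVersionId, 100);
    return $sameMinor ? $data['payload'] : null;
}

$built = packArtifact('opcodes-for-index.php', 50100); // "compiled" on PHP 5.1
var_dump(loadArtifact($built, 50300));                 // 5.3 runtime: rejected
var_dump(loadArtifact($built, 50100));                 // 5.1 runtime: accepted
```

An in-memory opcode cache never has to ask this question, because the same binary that compiled the code also runs it.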
It also makes debugging of code harder - since the debugger has to map the opcode back to the source code.
But a very important question you don't seem to have asked let alone answered is what is the benefit of pre-generating the opcode?
Would compiling the opcode prior to runtime have a significant benefit over caching the opcode? The difference would be immeasurably small.
Certainly the raison d'etre for HipHop is that natively compiled PHP code runs faster than PHP with opcode caching at the expense of some functionality. But that's something quite different.
Do you think that having only the opcodes on the server improves the security (by obscurity)?
Related
I have read many articles saying this: since Java directly runs bytecode, while PHP is loaded and compiled on every request, PHP is slower than Java.
But what if we use a cache system for PHP like APC or eAccelerator? Do Java and PHP then come close in terms of performance?
Do not read such articles... It's impossible to compare two different languages and say it's slower because it's not compiled. Yes, parsing PHP code takes time, but JVM requires additional resources, too.
APC or EAccelerator may increase performance but it doesn't mean it will be as fast as Java or C.
Frankly, few developers are writing performant enough code for this to matter. A good PHP programmer will write faster apps than an average Java programmer, and vice versa. And if you're not a good programmer, it won't matter, you'll make them both slow.
Don't get me wrong, you should certainly use an opcode cache for PHP. But if you are, the difference in performance between Java and PHP is unlikely to be the determining factor in your app's performance.
Java has threading and persistence, so if those are important use Java. PHP is super easy to deploy, and does not require extensive tuning of things like heap & garbage collection, so if that's important to you, use PHP.
Unless you're a decent sized website, use the one you know best. You'll have written it twice and optimized it before you write it once in the other language.
From my point of view, both PHP and Java have a similar structure. At first you write some high-level code, which then must be translated in a simpler code format to be executed by a VM. One difference is, that PHP works directly from the source code files, while Java stores the bytecode in .class files, from where the VM can load them.
Nowadays the requirements for speedy PHP execution grow, which leads people to believe that it would be better to directly work with the opcodes and not go through the compiling step each time a user hits a file.
The solution seems to be a load of so-called accelerators, which basically store the compiled results in a cache and then use the cached opcodes instead of compiling again.
Another approach, done by Facebook, is to completely compile the PHP code to a different language.
So my question is, why is nobody in the PHP world doing what Java does? Are there some dynamic elements that really need to be recompiled each time or something like that? Otherwise it would be really smarter to compile everything when the code goes into production and then just work with that.
The most important difference is that the JVM has an explicit specification that covers the bytecode completely. That makes bytecode files portable and useful for more than just execution by a specific JVM implementation.
PHP doesn't even have a language specification. PHP opcodes are an implementation detail of a specific PHP engine, so you can't really do anything interesting with them and there's little point in making them more visible.
PHP opcodes are not the same as Java classfiles. Java classfiles are well specified, and are portable between machines. PHP opcodes are not portable in any way. They have memory addresses baked into them, for example. They are strictly an implementation detail of the PHP interpreter, and shouldn't be considered anything like Java bytecode.
Does it have to be this way? No, probably not. But the PHP source code is a mess, and there is neither the desire, nor the political will in the PHP internals community to make this happen. I think there was talk of baking an opcode cache into PHP 6, but PHP 6 died, and I don't know the status of that idea.
Reference: I wrote phc so I was pretty knee deep in PHP implementation/compilation for a few years.
It's not quite true that nobody in the PHP world is doing what java does. Projects such as Alexey Zakhlestin's appserver provide a degree of persistence more akin to a java servlet container (though his inspiration is more Ruby’s Rack and Python’s WSGI than Java)
PHP does not use a standard mechanism for opcodes. I wish it either stuck to a stack VM (python, java) or a register VM (x86, perl6 etc). But it uses something absolutely homegrown and therein lies the rub.
It uses a connected list in memory which results in each opcode having a ->op1 ->op2 and ->result. Now each of those are either constants or entries in a temp table etc. These pointers cannot be serialized in any sane fashion.
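As a toy model of that op1 / op2 / result shape (purely illustrative: real Zend opcodes are C structs, and the array layout and the names fetchOperand and runOps are my own assumptions), each instruction below names two inputs and one output slot:

```php
<?php
// Toy model only: real Zend opcodes are C structs, not PHP arrays.
// Operands are tagged: ['const', value] or ['tmp', slot number].
$ops = [
    ['op' => 'ADD', 'op1' => ['const', 2], 'op2' => ['const', 3], 'result' => ['tmp', 0]],
    ['op' => 'MUL', 'op1' => ['tmp', 0],   'op2' => ['const', 4], 'result' => ['tmp', 1]],
];

// Resolve an operand to a value: constants are inline, temps live in a table.
function fetchOperand(array $operand, array $temps)
{
    [$kind, $value] = $operand;
    return $kind === 'const' ? $value : $temps[$value];
}

// Execute the instruction list and return the final temp table.
function runOps(array $ops): array
{
    $temps = [];
    foreach ($ops as $op) {
        $a = fetchOperand($op['op1'], $temps);
        $b = fetchOperand($op['op2'], $temps);
        $temps[$op['result'][1]] = $op['op'] === 'ADD' ? $a + $b : $a * $b;
    }
    return $temps;
}

$temps = runOps($ops);
echo $temps[1], "\n"; // (2 + 3) * 4 = 20
```

In this sketch the operands are plain slot numbers, so the whole list could be dumped to disk trivially. The trouble described above comes from the engine holding pointers in those positions instead.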
Now, people have accomplished this using items like pecl/bcompiler which does dump the stream into the disk.
But the classes make this even more complicated, which means that there are potential code fragments like
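if (<condition>)
{
    // one possible definition of XYZ
    class XYZ { }
}
else
{
    // a different definition with the same name
    class XYZ { }
}
// the parent of ABC is only known once the branch above has run
class ABC extends XYZ {}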
Which means that a large number of decisions about classes & functions can only be done at runtime - something like Java would choke on two classes with the same name, which are defined conditionally at runtime. Basically, APC's inheritance & class caching code is perhaps the most complicated & bug-prone part of the codebase. Whenever a class is cached, all parent inherited members have to be scrubbed out before it can be saved to the opcode cache.
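A runnable version of the fragment above, as a sketch. The condition and the kind() method are stand-ins I picked so the script executes; any runtime check would do.

```php
<?php
// Which XYZ exists is decided only when this branch executes.
$useFast = (PHP_INT_SIZE === 8); // stand-in for any runtime condition

if ($useFast) {
    class XYZ
    {
        public function kind(): string { return 'fast'; }
    }
} else {
    class XYZ
    {
        public function kind(): string { return 'portable'; }
    }
}

// XYZ is unknown at compile time, so ABC is bound to its parent at runtime.
class ABC extends XYZ {}

$obj = new ABC();
echo $obj->kind(), "\n";
```

A cache that stored ABC with its inherited members already copied in would be wrong whenever the other branch is taken, which is why the inherited parts have to be scrubbed before caching.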
The pointer problem is not insurmountable. There is an apc_bindump which I have never bothered to fix up to load entire cache entries off disk directly whenever a restart is done. But it's painful to debug all that to get something that still needs to locate all system pointers - the apache case is too easy, because all php processes have the same system pointers because of the fork behaviour. The old fastcgi versions were slower because they used to fork first & init php later - the php-fpm fixed that by doing it the other way around.
But eventually, what's really missing in PHP is the will to invent a bytecode format, throw away the current engine & all modules - to rewrite it using a stack VM & build a JIT. I wish I had the time - the FB guys are almost there with their HipHop HHVM, which sacrifices eval() for faster performance - which is a fair sacrifice :)
PS: I'm the guy who can't find time to update APC for 5.4 properly
I think all of you are misinformed. HHVM is not a compiler to another language; it is a virtual machine itself. The confusion arises because Facebook used to compile to C++, but that approach was too slow for the developers' needs (ten minutes of compiling just to test some tiny change).
from wikipedia:
Most PHP accelerators work by caching the compiled bytecode of PHP
scripts to avoid the overhead of parsing and compiling source code on
each request (some or all of which may never even be executed). To
further improve performance, the cached code is stored in shared
memory and directly executed from there, minimizing the amount of slow
disk reads and memory copying at runtime.
Just in time Compilation:
JIT compilers represent a hybrid approach, with translation occurring
continuously, as with interpreters, but with caching of translated
code to minimize performance degradation.
So does using a PHP accelerator such as APC have the same performance implications as "just-in-time" compiling PHP (assuming that's possible)? In fact, are they actually the same thing?
Same concept, different execution.
When JIT is spoken of in most circles, it refers to compiling virtual machine bytecode into native machine code. For example, Facebook's HHVM is a PHP implementation that uses a JIT engine.
However, PHP's native virtual machine doesn't do JIT to native code. In fact, it doesn't do JIT at all in the traditional sense. While whole files are compiled to PHP bytecode on demand, this isn't actually JIT.
Be careful with the term "PHP accelerator." Back in the PHP4 days, the bytecode created by the PHP parser could be optimized a bit to get better performance. This hasn't been needed since early PHP5. The only thing that APC, the Zend "Optimizer" and other "accelerator" products do to increase PHP performance is cache bytecode. To avoid ambiguity, the term "accelerator" should no longer be used.
If you're using standard PHP, then you do want a bytecode cache, just steer clear of products saying that they try to do PHP bytecode optimization.
I know you can minify PHP, but I'm wondering if there is any point. PHP is an interpreted language so will run a little slower than a compiled language. My question is: would clients see a visible speed improvement in page loads and such if I were to minify my PHP?
Also, is there a way to compile PHP or something similar?
PHP is compiled into bytecode, which is then interpreted on top of something resembling a VM. Many other scripting languages follow the same general process, including Perl and Ruby. It's not really a traditional interpreted language like, say, BASIC.
There would be no effective speed increase if you attempted to "minify" the source. You would get a major increase by using a bytecode cache like APC.
Facebook introduced a compiler named HipHop that transforms PHP source into C++ code. Rasmus Lerdorf, one of the big PHP guys did a presentation for Digg earlier this year that covers the performance improvements given by HipHop. In short, it's not too much faster than optimizing code and using a bytecode cache. HipHop is overkill for the majority of users.
Facebook also recently unveiled HHVM, a new virtual machine based on their work making HipHop. It's still rather new and it's not clear if it will provide a major performance boost to the general public.
Just to make sure it's stated expressly, please read that presentation in full. It points out numerous ways to benchmark and profile code and identify bottlenecks using tools like xdebug and xhprof, also from Facebook.
2021 Update
HHVM diverged away from vanilla PHP a couple versions ago. PHP 7 and 8 bring a whole bunch of amazing performance improvements that have pretty much closed the gap. You now no longer need to do weird things to get better performance out of PHP!
Minifying PHP source code continues to be useless for performance reasons.
Forgo the idea of minifying PHP in favor of using an opcode cache, like PHP Accelerator, or APC.
Or something else like memcached
Yes there is one (non-technical) point.
Your hoster can spy on your code on their server. If you minify and uglify it, it is more difficult for spies to steal your ideas.
One reason for minifying and uglifying PHP may be spy protection. I think uglifying code should be one step in an automated deployment.
With some rewriting (shorter variable names) you could save a few bytes of memory, but that's also seldom significant.
However I do design some of my applications in a way that allows to concatenate include scripts together. With php -w it can be compacted significantly, adding a little speed gain for script startup. On an opcode-enabled server this however only saves a few file mtime checks.
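For what it's worth, the effect of php -w can be reproduced from inside PHP with php_strip_whitespace(). A small sketch (the temp-file handling and variable names are just for illustration): the stripped copy is smaller, but it evaluates to exactly the same thing.

```php
<?php
// Write a small, heavily commented script to a temporary file.
$original = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($original, <<<'PHP'
<?php
// A comment that the compiler throws away anyway.
$add = function ( $a,    $b ) {
    /* more commentary */
    return $a   +   $b;
};

return $add( 2, 3 );
PHP
);

// php_strip_whitespace() returns the source without comments and extra whitespace.
$stripped = tempnam(sys_get_temp_dir(), 'min');
file_put_contents($stripped, php_strip_whitespace($original));

clearstatcache();
$sizeBefore = filesize($original);
$sizeAfter  = filesize($stripped);

// Both files return the same value when included.
$resultBefore = include $original;
$resultAfter  = include $stripped;

printf("bytes: %d -> %d, results: %d and %d\n", $sizeBefore, $sizeAfter, $resultBefore, $resultAfter);

unlink($original);
unlink($stripped);
```

The saving is in bytes read and tokens skipped, not in the code that actually runs, which matches the point that an opcode cache makes it almost irrelevant.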
This is less an answer than an advertisement. I've been working on a PHP extension that translates Zend opcodes to run on a VM with static typing. It doesn't accelerate arbitrary PHP code. It does allow you to write code that runs way faster than what regular PHP allows. The key here is static typing. On a modern CPU, a dynamic language eats branch misprediction penalties left and right. The fact that PHP arrays are hash tables also imposes a high cost: lots of branch mispredictions, inefficient use of cache, poor memory prefetching, and no SIMD optimization whatsoever. Branch mispredictions and cache misses in particular are the Achilles' heel of today's processors. My little VM sidesteps those problems by using static types and C arrays instead of hash tables. The result ends up running roughly ten times faster. This is using bytecode interpretation. The extension can optionally compile a function through gcc. In that case, you get two to five times more speed.
Here's the link for anyone interested:
https://github.com/chung-leong/qb/wiki
Again, the extension is not a general PHP accelerator. You have to write code specific for it.
There are PHP compilers... see this previous question for a list; but (unless you're the size of Facebook or are targeting your application to run client-side) they're generally a lot more trouble than they're worth.
Simple opcode caching will give you more benefit for the effort involved. Or profile your code to identify the bottlenecks, and then optimise it.
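Before reaching for a real profiler, a crude timing loop is often enough to confirm where the time goes. A minimal sketch, assuming PHP 7.3+ for hrtime(); the helper name averageNanoseconds and the two candidates are made up for the example:

```php
<?php
// Hypothetical helper: run $fn $iterations times, return average nanoseconds per call.
function averageNanoseconds(callable $fn, int $iterations = 1000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations;
}

// Two ways of building the same string, used only as things to measure.
$candidates = [
    'implode' => function () {
        return implode(',', range(1, 100));
    },
    'concat' => function () {
        $s = '';
        foreach (range(1, 100) as $n) {
            $s .= $n . ',';
        }
        return $s;
    },
];

foreach ($candidates as $name => $fn) {
    printf("%-8s %.0f ns per call\n", $name, averageNanoseconds($fn));
}
```

Numbers from a loop like this are only a rough guide; tools such as xdebug or xhprof show where a whole request spends its time.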
You don't need to minify PHP.
To get better performance, install an opcode cache. The ideal solution is to upgrade to PHP 5.5 or above, because those versions bundle an opcode cache called Zend OPcache (formerly Zend Optimizer+) that performs better than the other ones: http://massivescale.blogspot.com/2013/06/php-55-zend-optimiser-opcache-vs-xcache.html. Depending on the build, it may still need to be enabled in php.ini.
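To check whether the bundled cache is actually active on a given box, something like the following sketch should work. The wrapper opcacheSummary is my own name; opcache_get_status() and opcache_compile_file() come from the OPcache extension and only exist when it is loaded, hence the guards. Note that the CLI usually has the cache disabled unless opcache.enable_cli is set.

```php
<?php
// OPcache functions exist only when the extension is loaded, so guard every call.
function opcacheSummary(): array
{
    if (!function_exists('opcache_get_status')) {
        return ['loaded' => false, 'enabled' => false];
    }
    $status = @opcache_get_status(false); // false: skip the per-script list
    return [
        'loaded'  => true,
        'enabled' => is_array($status) && !empty($status['opcache_enabled']),
    ];
}

$summary = opcacheSummary();
echo 'OPcache loaded:  ', var_export($summary['loaded'], true), "\n";
echo 'OPcache enabled: ', var_export($summary['enabled'], true), "\n";

// Optionally warm the cache for one file without executing it.
if ($summary['enabled'] && function_exists('opcache_compile_file')) {
    $file = tempnam(sys_get_temp_dir(), 'warm');
    file_put_contents($file, "<?php return 42;\n");
    @opcache_compile_file($file);
    unlink($file);
}
```

Warming files with opcache_compile_file() at deploy time is about as close as stock PHP gets to "compile once, then run".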
The "point" is to make the file smaller, because smaller files load faster than bigger files. Also, removing whitespace will make parsing a tiny bit faster since those characters don't need to be parsed out.
Will it be noticeable? Almost never, unless the file is huge and there's a big difference in size.
Does anybody have experience working with PHP accelerators such as MMCache or Zend Accelerator? I'd like to know if using either of these makes PHP comparable to faster web-technologies. Also, are there trade offs for using these?
Note that Zend Optimizer and MMCache (or similar applications) are totally different things. While Zend Optimizer tries to optimize the program opcode MMCache will cache the scripts in memory and reuse the precompiled code.
I did some benchmarks some time ago and you can find the results in my blog (in German though). The basic results:
Zend Optimizer alone didn't help at all. Actually my scripts were slower than without optimizer.
When it comes to caches:
* fastest: eAccelerator
* XCache
* APC
And: You DO want to install a opcode cache!
For example:
(Chart: http://blogs.interdose.com/dominik/wp-content/uploads/2008/04/opcode_wordpress.png)
This is how long it took to request the WordPress homepage 10,000 times.
Edit: BTW, eAccelerator contains an optimizer itself.
MMCache has been deprecated. I recommend either http://pecl.php.net/package/APC or http://xcache.lighttpd.net/, both of which also give you variable storage (like Memcache).
Both are interesting and will provide speed boost since they compile source code into binary representation which is then executed by the PHP engine.
Any huge web site running with PHP (Facebook for example) is running some sort of opcode cache system like MMCache.
The problem is that they are not very easy to set up depending on your system.
Depending on how much of your PHP code is actually executed and how long that execution takes they can be a really big win. It certainly isn't going to hurt, but the gain you see will very much depend on where your time is currently spent.
btw mmcache has been rolled into a different project now, I forget the name but Google will tell you.
I use APC on my production servers and it works pretty well out of the box. Compile it and add it to PHP and there isn't much tweaking left to do for it. I check it every once in a while just to review stats but since I use MVC a lot all of the main files (routers, controllers, etc) rarely change on a day-to-day basis so that code stays compiled and runs pretty efficiently.
Currently we use APC. It's free and was a simple plug and play on our live servers. It provided a huge performance increase for our site, especially as the project size increased. I also have apc.stat disabled so it doesn't check whether the code has been updated, so whenever we need to update the code on the live site we restart Apache.
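The trade-off behind apc.stat can be shown with a toy cache. This is only a sketch of the idea, not APC's implementation: ToyCodeCache is a made-up class, and "compiling" here is just strtoupper().

```php
<?php
// Toy cache, not APC's implementation: "compiling" is any callable.
final class ToyCodeCache
{
    private $entries = [];
    private $compile;
    private $stat;
    public $compileCount = 0;

    public function __construct(callable $compile, bool $stat = true)
    {
        $this->compile = $compile;
        $this->stat = $stat;
    }

    public function get(string $path)
    {
        if (isset($this->entries[$path])) {
            $entry = $this->entries[$path];
            // With stat off, the file on disk is never looked at again.
            if (!$this->stat || $entry['mtime'] === $this->mtime($path)) {
                return $entry['code'];
            }
        }
        $this->compileCount++;
        $code = ($this->compile)(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $this->mtime($path), 'code' => $code];
        return $code;
    }

    private function mtime(string $path): int
    {
        clearstatcache(true, $path);
        return filemtime($path);
    }
}

$file = tempnam(sys_get_temp_dir(), 'toy');
file_put_contents($file, 'version one');

$noStat   = new ToyCodeCache('strtoupper', false);
$withStat = new ToyCodeCache('strtoupper', true);
$noStat->get($file);   // prime both caches
$withStat->get($file);

file_put_contents($file, 'version two');
touch($file, time() + 10); // make sure the modification time differs

$staleResult = $noStat->get($file);   // still the old code
$freshResult = $withStat->get($file); // recompiled after the mtime check

echo $staleResult, "\n", $freshResult, "\n";
unlink($file);
```

With the stat check off, the only way to pick up new code is to throw the cache away, which is what restarting Apache does.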
I use APC, and can attest that it can dramatically reduce the CPU and I/O load on an app server if you maintain a high cache-hit rate. It not only saves you from having to compile, it can save you from having to read the PHP files from disk at all (i.e. the bytecodes are served directly from main memory, so it's super fast). It lowers the time to render a single page, and increases the requests per second your server can handle.
If you use RedHat or CentOS, installing APC is super simple:
yum install php-devel httpd-devel php-pear
pecl install apc
echo "extension=apc.so" > /etc/php.d/apc.ini
# if you're using SELinux:
chcon "system_u:object_r:textrel_shlib_t" /usr/lib/php/modules/apc.so
/etc/init.d/httpd restart
You asked about downsides. The only downside is that it requires some memory. The default on APC is 30MB, but it can be adjusted, and the cost of a little bit of memory more than pays for itself with the increased speed and response rate.
BlaM's testing included all the DB calls made by WordPress. When you're making fewer DB calls, you'll see the performance gain of opcode caches be even more dramatic.
I used Zend Accelerator a little back in the day (2004-ish). It certainly gave some significant performance wins on code it could work with, but unfortunately the system I was using was designed to quite often dynamically load code and then eval it, which Zend Accelerator couldn't do much with at the time (and I'd guess still can't).
On the down side, we certainly saw some caching issues (where the code would be changed, but the compiled version would not sync with the change for one reason or another). I imagine those problems have likely been ironed out by now.
Anyway, I don't have any hard comparison numbers, and certainly didn't write the same system in different environments for comparison, but for the vast majority of systems, PHP isn't going to kill you performance wise.
Have you checked out Phalanger? It compiles PHP to .NET code. Here are some benchmarks which show that it can dramatically improve performance.