As a first step, the PHP interpreter converts the source code into bytecode. In a second step, the bytecode is passed to the Zend Engine, which creates the machine code for the relevant CPU. If opcache is activated, the bytecode is stored in a cache, so that the first step can be skipped in subsequent calls.
(Image taken from PHP Master: Write Cutting-Edge Code)
However, I do not understand why opcache caches bytecode instead of machine code. Would it not be a big benefit if one could skip both the first and the second step? Also, since a website is in many cases executed on a single server, I don't see how PHP benefits from bytecode, whose advantage (if I understood correctly) is that the code can be used on different hardware.
About my research: most questions that I found were about whether PHP is an interpreted or a compiled language.
The closest relevant question that I found was: Can you "compile" PHP code and upload a binary-ish file, which will just be run by the byte code interpreter? But there the question was whether it is possible to produce the bytecode beforehand and upload it (instead of caching it). My question is whether the machine code can be cached instead of the bytecode.
I assume this is done so that there only needs to be one compilation strategy, leaving it up to the OS-dependent PHP interpreter to convert the cached opcodes to machine code that can run natively.
Compare it to Java, where the compilation artifacts are JVM bytecode and it is up to the platform-specific JVM to turn this bytecode into executable machine code.
If PHP had to know every possible platform (Intel, AMD, SPARC, ARM, etc.), this would massively bloat the OpCache compiler. However, you usually only need the PHP runtime (referred to as the Zend Engine in your diagram) for your own machine.
As per Wikipedia:
Scripts are loaded into memory and compiled into Zend opcodes
One line below, it says:
The interpreter part analyzes the input code, translates it, and executes it.
As far as I know, the code is loaded into memory, then goes through lexical analysis, gets parsed, and is compiled to opcodes. I am still thoroughly confused, even after reading a ton of articles about the engine. So in the end, is PHP code compiled or interpreted?
I think the distinction between "compiling" and "interpreting" is less clear in practice than Computer Science lessons would imply, as is the distinction between a "runtime environment" and a "virtual machine".
The answer is essentially that it is both: the Zend Engine first compiles your PHP code to an intermediate representation called "opcodes"; it then interprets these opcodes to execute the code.
In some ways, this is similar to the way Java is first compiled to bytecode, and then executed on the Java Virtual Machine; however, the "VM" which executes the code in the Zend Engine is not defined like a real processor, and is closely tied to the PHP language. It therefore acts more like a traditional interpreter, but of a language that no human would write.
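To make "compile to opcodes, then interpret the opcodes" concrete, here is a minimal sketch in Python. It is a toy: the opcode names (PUSH, ADD, SUB, MUL) and the functions compile_expr and run are made up for illustration, and the stack machine was chosen for brevity. It makes no claim about how the Zend VM is actually laid out.

```python
# Toy illustration only: these opcode names are invented and are not Zend opcodes.
import ast

def compile_expr(source):
    """Compile an arithmetic expression into a flat list of (opcode, operand) pairs."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    program = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            program.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)   # operands first ...
            emit(node.right)
            program.append((ops[type(node.op)], None))  # ... then the operator
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return program

def run(program):
    """Interpret the opcode list on a simple value stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        else:
            right = stack.pop()
            left = stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            elif opcode == "MUL":
                stack.append(left * right)
    return stack.pop()

program = compile_expr("2 + 3 * 4")  # the "compile" step
result = run(program)                # the "interpret" step
```

The list held in program is the "language that no human would write": the interpreter loop never sees the original text, only the instruction list.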
The Zend Engine is responsible for the following tasks in PHP:
High performance parsing (including syntax checking), in-memory compilation and execution of PHP scripts [..]
Source: http://www.zend.com/products/zend_engine/in_depth
I was just thinking to myself "How exactly is a PHP script executed?" I thought it was parsed first for syntax errors etc, and then interpreted and executed.
However, I don't know why I believe that is correct. I'm probably wrong.
So, how exactly is a PHP file interpreted and executed? What stages does this involve? How do included files fit into the parsing of the script?
This is just to help me get my head around it. I'm interested and can not find a good answer with Google.
PHP has been a compiled language since PHP 4.0
What a compiler actually is seems to be a subject that causes great confusion. Some people assume that a compiler is a program that converts source code in one language into an executable program. The definition of a compiler is actually broader than that.
A compiler is a program that transforms source code into another representation of the code. The target representation is often machine code, but it may as well be source code in another language or even in the same language.
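As a small illustration of that broader definition, the following sketch is a "compiler" whose target is source code in the same language. It uses Python's standard ast module (ast.unparse needs Python 3.9 or later); the class name FoldConstants and the sample input are made up for this example, and it has nothing to do with PHP's own compiler.

```python
import ast

class FoldConstants(ast.NodeTransformer):
    """Replace additions of two numeric literals with their result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold inner expressions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node

source = "x = 1 + 2 + y"
tree = FoldConstants().visit(ast.parse(source))  # source -> tree -> transformed tree
ast.fix_missing_locations(tree)
output = ast.unparse(tree)                       # transformed tree -> source again
```

Here output is the string "x = 3 + y": source went in, different source came out, and no machine code was involved.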
PHP became a compiled language in the year 2000, when PHP 4 was released for the first time. Until version 3, PHP source code was parsed and executed right away by the PHP interpreter.
PHP 4 introduced the Zend Engine. This engine splits the processing of PHP code into several phases. The first phase parses PHP source code and generates a binary representation of the PHP code known as Zend opcodes. Opcodes are sets of instructions similar to Java bytecodes. These opcodes are stored in memory. The second phase of Zend Engine processing consists of executing the generated opcodes.
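The same two-phase split can be observed in Python, which is used here only as an analogue because its compile step is exposed as a built-in. The variable names and the one-line sample program are assumptions made for this sketch; the instruction names that dis reports are CPython's, not Zend's, and they vary between Python versions.

```python
import dis

source = "total = price * quantity"

# Phase 1: parse and compile the source into a code object held in memory.
code = compile(source, "<example>", "exec")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Phase 2: execute the already compiled code; the source text is no longer needed.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
total = namespace["total"]
```

Printing opnames shows the instruction stream that the second phase actually walks through, which is the rough counterpart of the opcodes described above.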
For more information, go to http://www.phpclasses.org/blog/post/117-PHP-compiler-performance.html
Basically, each time a PHP script is loaded, it goes through two steps:
The PHP source code is parsed and converted to what are called opcodes
Kind of an equivalent of Java's bytecode
If you want to see what those look like, you can use the VLD extension
Then, those opcodes are executed
These slides from Sebastian Bergmann, on SlideShare, might help you understand that process a bit better: PHP Compiler Internals
Here is also a list of all the parser tokens.
In some programming languages such as C, C++, C#, Java, etc., the code is converted into another form when it is compiled so that it can be executed. Does Apache do the same, or does it just execute the code without any conversion?
Apache doesn't do anything except handle the incoming request and serve the resulting output. Everything else is done by the PHP interpreter which pre-compiles the PHP code to a bytecode form, and then executes the bytecode instructions.
Apache is just a web server that may or may not be running with PHP as a module. It's better to think of the web server as a mere mediator between the frontend and the PHP binary.
The latter is itself a compiled program, yes, but it runs your code without compiling it to a native executable first. PHP is an interpreted language.
There are ways to accelerate PHP processing using an opcode cache or a just-in-time compiler, but default PHP doesn't deal with that.
Python can create a pre-compiled version of a file (file.pyc) so that the program can be run without being parsed and compiled again. Can Ruby, PHP, and Perl do the same on the command line?
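For reference, the Python behaviour the question describes can be reproduced with the standard py_compile module. This is a minimal sketch: the file names hello.py and hello.pyc and the temporary directory are assumptions made for the example. Note that the .pyc is still interpreted when it runs; only the parse and compile step is skipped.

```python
import os
import py_compile
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "hello.py")
    compiled_path = os.path.join(workdir, "hello.pyc")
    with open(source_path, "w") as handle:
        handle.write('print("hello from bytecode")\n')

    # Compile once; this writes the bytecode file to disk.
    py_compile.compile(source_path, cfile=compiled_path, doraise=True)

    # Delete the source to show that only the .pyc is needed from here on.
    os.remove(source_path)
    completed = subprocess.run(
        [sys.executable, compiled_path],
        capture_output=True, text=True, check=True,
    )
    output = completed.stdout.strip()
```

The command-line equivalent of the compile step is python -m py_compile hello.py, which by default places the result under __pycache__ rather than next to the source.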
There is no portable bytecode specification for Ruby, and thus also no standard way to load precompiled bytecode archives. However, almost all Ruby implementations use some kind of bytecode or intcode format, and several of them can dump and reload bytecode archives.
YARV always compiles to bytecode before executing the code, however that is usually only done in memory. There are ways to dump out the bytecode to disk. At the moment, there is no way to read it back in, however. This will change in the future: work is underway on a bytecode verifier for YARV, and once that is done, bytecode can safely be loaded into the VM, without fear of corruption. Also, the JRuby developers have indicated that they are willing to implement a YARV VM emulator inside JRuby, once the YARV bytecode format and verifier are stabilized, so that you could load YARV bytecode into JRuby. (Note that this version is obsolete.)
Rubinius also always compiles to bytecode, and it has a format for compiled files (.rbc files, analogous to JVM .class files) and there is talk about a bytecode archive format (.rba files, analogous to JVM .jar files). There is a chance that Rubinius might implement a YARV emulator, if deploying apps as YARV bytecode ever becomes popular. Also, the JRuby developers have indicated that they are willing to implement a Rubinius bytecode emulator inside JRuby, if Rubinius bytecode becomes a popular way of deploying Ruby apps. (Note that this version is obsolete.)
XRuby is a pure compiler, it compiles Ruby sourcecode straight to JVM bytecode (.class files). You can deploy these .class files just like any other Java application.
JRuby started out as an interpreter, but it has both a JIT compiler and an AOT compiler (jrubyc) that can compile Ruby sourcecode to JVM bytecode (.class files). Also, work is underway to create a new compiler that can compile (type-annotated) Ruby code to JVM bytecode that actually looks like a Java class and can be used from Java code without barriers.
Ruby.NET is a pure compiler that compiles Ruby sourcecode to CIL bytecode (PE .dll or .exe files). You can deploy these just like any other CLI application.
IronRuby also compiles to CIL bytecode, but typically does this in-memory. However, you can pass commandline switches to it, so it dumps the .dll and .exe files out to disk. Once you have those, they can be deployed normally.
BlueRuby automatically pre-parses Ruby sourcecode into BRIL (BlueRuby Intermediate Language), which is basically a serialized parsetree. (See Blue Ruby - A Ruby VM in SAP ABAP(PDF) for details.)
I think (but I am definitely not sure) that there is a way to get Cardinal to dump out Parrot bytecode archives. (Actually, Cardinal only compiles to PAST, and then Parrot takes over, so it would be Parrot's job to dump and load bytecode archives.)
Perl 5 can dump the bytecodes to disk, but it is buggy and nasty. Perl 6 has a very clean method of creating bytecode executables that Parrot can run.
Perl's just-in-time compilation is fast enough that this doesn't matter in most circumstances. One place where it does matter is in a CGI environment, which is what mod_perl is for.
For hysterical raisins, Perl 5 looks for .pmc files ahead of .pm files when searching for a module. These files could contain bytecode, though Perl doesn't write bytecode out by default (unlike Python).
Module::Compile (or: what's this PMC thingy?) goes into some more depth about this obscure feature. They're not frequently used, but...
The clever folks who wrote Module::Compile take advantage of this, to pre-compile Perl code into... well, it's still Perl, but it's preprocessed.
Among other benefits, this speeds up loading time and makes debugging easier when using source filters (Perl code modifying Perl source code before being loaded by the interpreter).
Not for PHP, although most PHP setups incorporate a bytecode cache that stores the compiled bytecode, so that the next time the script runs, the already compiled version is executed. This speeds up execution considerably.
There's no way I'm aware of to actually get at the bytecode through the command line.
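The idea behind such a bytecode cache can be sketched in a few lines. The following is a hypothetical, in-memory illustration written in Python; the class BytecodeCache, its methods and the file page.py are invented for this example, and invalidating entries by file modification time is an assumption of the sketch, not a description of any particular PHP cache's internals.

```python
import os
import tempfile

class BytecodeCache:
    """Hypothetical in-memory cache: path -> (mtime_ns, compiled code object)."""

    def __init__(self):
        self._entries = {}
        self.compilations = 0  # counts how often the compile step actually ran

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            # Cache miss or stale entry: pay for parsing and compiling.
            with open(path) as handle:
                code = compile(handle.read(), path, "exec")
            self.compilations += 1
            entry = (mtime, code)
            self._entries[path] = entry
        return entry[1]

    def run(self, path):
        namespace = {}
        exec(self.load(path), namespace)
        return namespace

with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "page.py")
    with open(script, "w") as handle:
        handle.write("answer = 6 * 7\n")

    cache = BytecodeCache()
    first = cache.run(script)["answer"]   # compiles, then executes
    second = cache.run(script)["answer"]  # reuses the cached code object
    compilations = cache.compilations
```

Both runs produce the same result, but the compile step runs only once; every later request goes straight to execution, which is where the speed-up comes from.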
For Perl you can try using B::Bytecode and perlcc. However, both of these are highly experimental. And Perl 6 is coming out soon (theoretically) and will be on Parrot and will use a different bytecode and so all of this will be somewhat moot then.
Here are some example magic words for the command line:
perl -MO=Bytecode,-H,-o"Module.pm"c "Module.pm"
According to the third edition of Programming Perl, it is possible to approximate this in some experimental ways.
If you use Zend Guard on your PHP scripts, it essentially precompiles the scripts to byte-code, which can then be run by the PHP engine if the Zend Optimizer extension is loaded.
So, yes, Zend Guard/Optimizer permits pre-compiled PHP scripts to be used.
For PHP, the Phalanger project compiles down to .NET assemblies. I'm not sure if that's what you were looking for, though.
Has anyone considered using LLVM's bytecode, instead of a yet-another-custom-bytecode?
Ruby 1.8 doesn't actually use bytecode at all (even internally), so there is no pre-compilation step.