In a first step, the PHP interpreter converts the source code into bytecode. In a second step, the bytecode is passed to the Zend Engine, which creates the machine code for the relevant CPU. If OPcache is activated, the bytecode is stored in a cache, so that the first step can be skipped on subsequent calls.
(Image taken from PHP Master: Write Cutting-Edge Code)
However, I do not understand why OPcache caches bytecode instead of machine code. Would it not be a big benefit if one could skip both the first and the second step? Also, since a website is in many cases executed on only a single server, I don't see how PHP benefits from bytecode, whose advantage (if I understood correctly) is that the code can be used on different hardware.
About my research: Most questions that I found were about whether PHP is an interpreted or a compiled language.
The closest relevant question that I found was: Can you "compile" PHP code and upload a binary-ish file, which will just be run by the byte code interpreter? There, however, the question was whether it is possible to generate the bytecode beforehand and upload it (instead of caching it). My question is whether the machine code can be cached instead of the bytecode.
I am assuming that this is done so that there only needs to be one compilation strategy, leaving it up to the OS-dependent PHP interpreter to convert the cached opcodes to machine code that can run natively.
Compare it to Java, where the compilation artifacts are JVM bytecode and it's up to the platform-specific JVM to turn this bytecode into executable machine code.
If PHP had to know every possible platform (Intel, AMD, SPARC, ARM, etc.), this would massively bloat the OPcache compiler. However, you usually only have the one PHP runtime (referred to as Zend Engine in your diagram) that you need on your machine.
My information:
PHP is a programming language which uses an interpreter.
The interpreter is compiled software that sits between the source code and the machine.
It reads and analyses the source code at runtime and starts its own subroutines based on the source code.
It's not compiling or translating the code into something new which could be saved, because it's a kind of executing the code.
The OPcache by Zend is able to store precompiled bytecode and to use it again. (I know how it generally works.)
http://www.sitepoint.com/understanding-opcache/
My question:
Where does the OPcache get its precompiled scripts from when the interpreter is not compiling?
It's not compiling or translating the code into something new which could be saved, because it's a kind of executing the code.
That's incorrect. The first thing the interpreter does is compile the PHP source code into an executable bytecode format, which is then executed.
It's not unlike what .NET and Java do, except that they do it ahead of time, whereas PHP does it on demand as the script is executed.
Things like the OPcache take this bytecode and cache it, saving the interpreter from having to fetch the source code and parse it each time a script is executed.
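Whether the bytecode of a given script is currently held by OPcache can be checked from PHP itself. The following is a minimal sketch, assuming the OPcache extension is loaded; the helper name describe_cache_state is made up for illustration, and on the command line OPcache is usually disabled unless opcache.enable_cli=1 is set.

```php
<?php
// Hypothetical helper (not part of PHP): reports how OPcache sees a script.
function describe_cache_state(string $file): string
{
    // The opcache_* functions only exist when the extension is loaded.
    if (!function_exists('opcache_get_status')) {
        return 'opcache not loaded';
    }
    // opcache_get_status() returns false when OPcache is disabled.
    $status = @opcache_get_status(false);
    if ($status === false || empty($status['opcache_enabled'])) {
        return 'opcache disabled';
    }
    // True when compiled opcodes for this file are already in the cache.
    return opcache_is_script_cached($file) ? 'cached' : 'not cached';
}

echo describe_cache_state(__FILE__), PHP_EOL;
```

Under a web server with OPcache enabled, the first request for a file would typically report "not cached" while that request is compiling it, and later requests would report "cached".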
I was just thinking to myself "How exactly is a PHP script executed?" I thought it was parsed first for syntax errors etc, and then interpreted and executed.
However, I don't know why I believe that is correct. I'm probably wrong.
So, how exactly is a PHP file interpreted and executed? What stages does this involve? How do included files fit into the parsing of the script?
This is just to help me get my head around it. I'm interested and cannot find a good answer with Google.
PHP has been a compiled language since PHP 4.0
The idea of what is a compiler seems to be a subject that causes great confusion. Some people assume that a compiler is a program that converts source code in one language into an executable program. The definition of what is a compiler is actually broader than that.
A compiler is a program that transforms source code into another representation of the code. The target representation is often machine code, but it may as well be source code in another language or even in the same language.
PHP became a compiled language in the year 2000, when PHP 4 was released for the first time. Until version 3, PHP source code was parsed and executed right away by the PHP interpreter.
PHP 4 introduced the Zend Engine. This engine splits the processing of PHP code into several phases. The first phase parses PHP source code and generates a binary representation of the PHP code known as Zend opcodes. Opcodes are sets of instructions similar to Java bytecodes. These opcodes are stored in memory. The second phase of Zend Engine processing consists of executing the generated opcodes.
For more information go to http://www.phpclasses.org/blog/post/117-PHP-compiler-performance.html
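The two phases described above can be sketched with a toy model. This is only an illustration under stated assumptions: the opcode names (PUSH, ADD, MUL) and the functions compile_node and run_ops are invented here and are not real Zend opcodes or engine functions. The expression tree is written by hand instead of being parsed.

```php
<?php
// Toy model only: opcode names and functions are invented for illustration.

/** Phase 1: flatten a nested expression tree into a linear opcode list. */
function compile_node(array $node, array &$ops): void
{
    if ($node[0] === 'num') {
        $ops[] = ['PUSH', $node[1]];
        return;
    }
    // Binary node: ['add' | 'mul', left, right]
    compile_node($node[1], $ops);
    compile_node($node[2], $ops);
    $ops[] = [strtoupper($node[0]), null];
}

/** Phase 2: execute the opcode list on a small stack machine. */
function run_ops(array $ops): int
{
    $stack = [];
    foreach ($ops as [$op, $arg]) {
        switch ($op) {
            case 'PUSH':
                $stack[] = $arg;
                break;
            case 'ADD':
                $b = array_pop($stack);
                $a = array_pop($stack);
                $stack[] = $a + $b;
                break;
            case 'MUL':
                $b = array_pop($stack);
                $a = array_pop($stack);
                $stack[] = $a * $b;
                break;
            default:
                throw new RuntimeException("Unknown opcode $op");
        }
    }
    return array_pop($stack);
}

// Hand-written tree for the expression 2 + 3 * 4
$tree = ['add', ['num', 2], ['mul', ['num', 3], ['num', 4]]];
$ops = [];
compile_node($tree, $ops);    // phase 1 runs once
echo run_ops($ops), PHP_EOL;  // phase 2 can be repeated; prints 14
```

Because $ops is plain data, it could be kept around and executed again without repeating phase 1, which is the idea an opcode cache builds on.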
Basically, each time a PHP script is loaded, it goes through two steps:
The PHP source code is parsed and converted to what are called opcodes
Kind of an equivalent of Java's bytecode
If you want to see what those look like, you can use the VLD extension
Then, those opcodes are executed
These slides from Sebastian Bergmann, on SlideShare, might help you understand that process a bit better: PHP Compiler Internals
Here is also a list of all the parser tokens.
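The very first part of the parsing step, splitting the source into tokens, can be observed from PHP itself through the tokenizer extension. A small sketch, assuming that extension is available (it usually is); describe_tokens is a name made up for this example.

```php
<?php
// Lex a small snippet and describe each token. token_get_all() returns
// arrays of [id, text, line] for named tokens and plain strings for
// single-character tokens such as '+' or ';'.
function describe_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        $out[] = is_array($token)
            ? token_name($token[0]) . ' ' . var_export($token[1], true)
            : 'CHAR ' . var_export($token, true);
    }
    return $out;
}

echo implode(PHP_EOL, describe_tokens('<?php echo 1 + 2;')), PHP_EOL;
```

The names printed (T_OPEN_TAG, T_ECHO, T_LNUMBER and so on) are the same ones that appear in the list of parser tokens.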
I was trying to understand how Zend works with the help of this excellent article. That's when I found out that the Zend Engine is a virtual machine.
Now my question is: what's the advantage of creating intermediate code for scripting languages like PHP?
I can understand that having intermediate code in the case of programming languages like Java and C# would introduce portability across different platforms like Linux and Windows.
It is faster to execute bytecode than to interpret source code.
This bytecode might be cached (this is done via PHP accelerators), thus giving up to a 20x performance boost.
The term VM in the article is completely wrong. In reality he's describing that PHP compiles the scripts to bytecode and that this bytecode is then interpreted; there's NO VM inside of PHP.
The bytecode operations (opcodes) are only an efficient representation of a PHP script, used to run the statements one after another and store the results correctly. Have a look at "abstract syntax trees" to fully understand bytecode and its advantage for every language.
Is PHP compiled or interpreted?
PHP is an interpreted language. The binary that lets you interpret PHP is compiled, but what you write is interpreted.
You can see more on the Wikipedia page for Interpreted languages
Both. PHP is compiled down to an intermediate bytecode that is then interpreted by the runtime engine.
The PHP compiler's job is to parse your PHP code and convert it into a form suitable for the runtime engine. Among its tasks:
Ignore comments
Resolve variables, function names, and so forth and create the symbol table
Construct the abstract syntax tree of your program
Write the bytecode
Depending on your PHP setup, this step is typically done just once, the first time the script is called. The compiler output is cached to speed up access on subsequent uses. If the script is modified, however, the compilation step is done again.
The runtime engine walks the AST and bytecode when the script is called. The symbol table is used to store the values of variables and provide the bytecode addresses for functions.
This process of compiling to bytecode and interpreting it at runtime is typical for languages that run on some kind of virtual runtime machine including Perl, Java, Ruby, Smalltalk, and others.
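The caching behaviour described above (compile once, reuse, recompile when the script is modified) can be sketched with a toy cache. This is an illustration only, not how OPcache is implemented: ToyCompileCache is a made-up class, it keeps entries in a plain array rather than in shared memory, and it uses token_get_all() as a stand-in for real compilation.

```php
<?php
// Toy compile cache that revalidates entries by file modification time.
final class ToyCompileCache
{
    private array $entries = [];
    public int $compileCount = 0;

    public function get(string $path): array
    {
        clearstatcache(true, $path);
        $mtime = filemtime($path);
        $entry = $this->entries[$path] ?? null;
        if ($entry !== null && $entry['mtime'] === $mtime) {
            return $entry['compiled']; // cache hit: skip "compilation"
        }
        // Cache miss or stale entry: "compile" again and remember the result.
        $this->compileCount++;
        $compiled = token_get_all(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $mtime, 'compiled' => $compiled];
        return $compiled;
    }
}

$path = tempnam(sys_get_temp_dir(), 'toy');
file_put_contents($path, '<?php echo 1;');

$cache = new ToyCompileCache();
$cache->get($path);                         // compiled
$cache->get($path);                         // served from the cache
file_put_contents($path, '<?php echo 2;');  // the script is modified
touch($path, time() + 10);                  // force a visibly newer timestamp
$cache->get($path);                         // compiled again

echo $cache->compileCount, PHP_EOL;         // prints 2
unlink($path);
```

Three lookups lead to only two compilations, because the second lookup finds an entry whose timestamp still matches the file.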
Compiled code can be executed directly by the computer's CPU. That is, the executable code is specified in the CPU's native language.
The code of interpreted languages must be translated at run-time from any format to CPU machine instructions. This translation is done by an interpreter.
It would not be proper to say that a language is interpreted or compiled, because interpretation and compilation are both properties of the implementation of that particular language and not a property of the language as such. So, any language can be compiled or interpreted — it just depends on what the particular implementation that you are using does.
The most widely used PHP implementation is powered by the Zend Engine and is known simply as PHP. The Zend Engine compiles PHP source into a format that it can execute, thus the Zend Engine works as an interpreter.
In general it is interpreted, but it can sometimes be used in compiled form, which really increases performance.
Open source tool to perform this operation:
hhvm.com
PHP is an interpreted language. It can be compiled to bytecode by third-party tools, though.
I know this question is old but it's linked all over the place and I think all answers here are incorrect (maybe because they're old).
There is NO such thing as an interpreted language or a compiled language. Any programming language can be interpreted and/or compiled.
First of all a language is just a set of rules, so when we are talking about compilation we refer to specific implementations of that language.
HHVM, for example, is an implementation of PHP. It uses JIT compilation to transform the code to intermediate HipHop bytecode, which is then translated into machine code. Is that enough to say it is compiled? Some Java implementations (not all) also use JIT. Google's V8 also uses JIT.
Using the old definitions of compiled vs. interpreted does not make sense nowadays.
"Is PHP compiled?" is a nonsensical question given that there are no longer clear and agreed delimiters between what is a compiled language vs an interpreted one.
One possible way to delimit them is (I don't find any meaning in this dichotomy):
compiled languages use Ahead of Time compilation (C, C++);
interpreted languages use Just in Time compilation or no compilation at all (Python, Ruby, PHP, Java).
This is a meaningless question. PHP uses yacc (bison), just like GCC. yacc is a "compiler compiler". The output of yacc is a compiler. The output of a compiler is "compiled". PHP is parsed by the output of yacc. So it is, by definition, compiled.
If that doesn't satisfy, consider the following. Both php (the binary) and gcc read your source code and produce an abstract syntax tree. Under versions 4 and 5, php then walks the tree to translate the program to bytecode (the compilation step). You can see the bytecode translated to opcodes (which are analogous to assembly) using the Vulcan Logic Dumper. Finally, php (in particular, the Zend Engine) interprets the bytecode. gcc, in comparison, walks the tree and outputs assembly; it can also run assemblers and linkers to finish the process. Calling a program handled by one "interpreted" and a program handled by the other "compiled" is meaningless. After all, in both cases the program is run through a "compiler".
You should actually ask the question you want to ask instead. ("Do I pay a performance penalty as PHP recompiles my source code for every request?", etc.)
Just keep in mind: if you need the source code every time you run the program, it is using an interpreter, so it's an interpreted language.
On the other hand, if you compile the source code and generate compiled code which you can execute, then it is using a compiler, and you no longer need the source code. Examples are C and Java.
At least it doesn't compile (or should I say optimize) the code as much as one might want it.
This code...
for($i=0;$i<100000000;$i++);
echo $i;
...delays the program by the same amount each time it is run.
It could have detected that it is a calculation that only needs to be done the first time.
The accepted answer is blatantly false. PHP IS compiled. End of story. Maybe not to native instructions but to an interpreted bytecode.