I was just thinking to myself "How exactly is a PHP script executed?" I thought it was first parsed for syntax errors, etc., and then interpreted and executed.
However, I don't know why I believe that is correct. I'm probably wrong.
So, how exactly is a PHP file interpreted and executed? What stages does this involve? How do included files fit into the parsing of the script?
This is just to help me get my head around it. I'm interested and cannot find a good answer with Google.
PHP has been a compiled language since PHP 4.0.
What counts as a compiler seems to be a subject that causes great confusion. Some people assume that a compiler is a program that converts source code in one language into an executable program. The actual definition of a compiler is broader than that.
A compiler is a program that transforms source code into another representation of the code. The target representation is often machine code, but it may also be source code in another language or even in the same language.
PHP became a compiled language in the year 2000, when PHP 4 was released for the first time. Until version 3, PHP source code was parsed and executed right away by the PHP interpreter.
PHP 4 introduced the Zend Engine. This engine splits the processing of PHP code into several phases. The first phase parses PHP source code and generates a binary representation of the PHP code known as Zend opcodes. Opcodes are sets of instructions similar to Java bytecodes. These opcodes are stored in memory. The second phase of Zend Engine processing consists of executing the generated opcodes.
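To make the two phases concrete, here is a toy model written in Python rather than C, so it is only an analogy for what the Zend Engine does: phase one compiles a tiny arithmetic expression into a list of stack-machine opcodes kept in memory, and phase two executes that list. The grammar and the opcode names (PUSH, ADD, MUL) are made up for illustration and are not Zend opcodes.

```python
import re

# Phase 1: compile source text into opcodes, i.e. a list of (name, argument) tuples.
# Toy grammar (hypothetical): expr := term ('+' term)* ; term := NUMBER ('*' NUMBER)*
def compile_expr(source):
    # This toy tokenizer keeps digits, '+' and '*' and skips everything else.
    tokens = re.findall(r"\d+|[+*]", source)
    ops = []
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at token {pos}")
        ops.append(("PUSH", int(tokens[pos])))
        pos += 1

    def term():
        nonlocal pos
        number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            number()
            ops.append(("MUL", None))

    term()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        term()
        ops.append(("ADD", None))
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ops  # syntax errors surface here, before anything runs


# Phase 2: execute the in-memory opcodes on a small stack machine.
def execute(ops):
    stack = []
    for name, arg in ops:
        if name == "PUSH":
            stack.append(arg)
        elif name in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if name == "ADD" else left * right)
        else:
            raise RuntimeError(f"unknown opcode {name}")
    return stack.pop()


opcodes = compile_expr("2 + 3 * 4")
print(opcodes)  # [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL', None), ('ADD', None)]
print(execute(opcodes))  # 14
```

A syntax error such as "2 + * 3" is reported by compile_expr before execute ever runs, which matches the parse-then-execute order described above.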
For more information, see http://www.phpclasses.org/blog/post/117-PHP-compiler-performance.html
Basically, each time a PHP script is loaded, it goes through two steps:
The PHP source code is parsed and converted to what are called opcodes
Kind of an equivalent of Java's bytecode
If you want to see what those look like, you can use the VLD extension
Then, those opcodes are executed
These slides from Sebastian Bergmann on SlideShare might help you understand that process a bit better: PHP Compiler Internals
Here is also a list of all the parser tokens.
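Tokens are the first thing the parser works with. As a rough illustration, the Python toy below splits a tiny subset of PHP into tokens labelled with names borrowed from PHP's token list (T_OPEN_TAG, T_VARIABLE, T_LNUMBER, T_ECHO, T_WHITESPACE). It is not PHP's real scanner and recognises almost nothing else; in PHP itself, token_get_all() returns the real token stream for a piece of source code.

```python
import re

# A tiny, hypothetical subset of PHP's lexical grammar; names mirror PHP's T_* tokens.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php\s"),
    ("T_VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("T_LNUMBER", r"\d+"),
    ("T_ECHO", r"echo\b"),
    ("T_WHITESPACE", r"\s+"),
    ("CHAR", r"[=+;]"),  # single-character tokens are reported as plain strings
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        tokens.append(text if kind == "CHAR" else (kind, text))
        pos = match.end()
    return tokens


for tok in tokenize("<?php $x = 1 + 2; echo $x;"):
    print(tok)
```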
Related
I was reading about preloading and was very excited about it; however, as I understood after searching more on Google, preloading and JIT both seem to have the same definition in my mind:
Preload: loading compiled PHP files on server startup and making all the defined classes and functions permanently available in the context of future requests (as I understand from here)
JIT: Compilation of files at run time rather than prior to execution
Which one affects performance more, especially for frameworks?
The confusion here is between two different meanings of "compiled"; or, really, the same meaning - transforming a high-level program into a set of lower-level instructions - applied twice to the same program.
Since PHP 4, the PHP code that humans write has been automatically compiled to a more abstract language, called "op codes". These act as instructions for a "virtual machine", but are still very high-level: each op code triggers a whole sub-routine in the Zend Engine.
The OpCache extension included with PHP since version 5.5 not only caches these op codes to save time re-compiling, it performs a lot of optimisations by manipulating them. Pre-loading is part of this mechanism: it runs the compilation and optimisation steps, and saves the op codes for reuse by multiple PHP processes.
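As a small example of what "manipulating" op codes can mean, here is constant folding sketched in Python over a hypothetical stack-machine opcode list: when a binary operation only consumes two constants, the result is computed once at compile time and the three instructions collapse into one. OpCache's real optimizer works on PHP's own opcode format and does far more than this; the sketch also assumes straight-line code with no jumps into the middle of a folded sequence.

```python
# Opcodes are (name, arg) tuples for a hypothetical stack machine.
FOLDABLE = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}


def fold_constants(ops):
    """Replace PUSH a, PUSH b, <binary op> with a single PUSH of the result."""
    out = []
    for name, arg in ops:
        if (name in FOLDABLE and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", FOLDABLE[name](a, b)))  # computed at compile time
        else:
            out.append((name, arg))  # depends on runtime values: keep as-is
    return out


# $x + 60 * 60: only the constant part can be folded.
ops = [("LOAD", "x"), ("PUSH", 60), ("PUSH", 60), ("MUL", None), ("ADD", None)]
print(fold_constants(ops))  # [('LOAD', 'x'), ('PUSH', 3600), ('ADD', None)]
```

Because the optimised op codes are what get cached (or preloaded), every later request skips that arithmetic entirely.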
However, those op codes are still a long way from what the CPU is actually going to run. The virtual machine that executes them is technically an interpreter, working through a list of instructions, and performing multiple steps even for something as simple as $x + $y.
The basic idea of JIT in PHP 8 is to supplement that interpreter with a second compiler - this time, compiling from op codes down to actual machine instructions. More specifically, a JIT compiler looks at a section of code as it runs (hence "Just In Time"), and generates a set of CPU instructions to implement it.
Now you may be wondering why we don't do this bit as early as possible - Just In Time seems to be the opposite of pre-loading! The advantage is that the JIT compiler can look at how code is actually being used, rather than all the possible ways it might be used. The interpreter looking at the op codes for $x + $y has to account for the fact that each time the code runs, those variables might be integers, floats, strings, or something where + needs to throw an error. The JIT compiler can see that the running program often has them both as integers, and compile some fast code for that scenario. When the other scenarios come up, the JIT compiler just hands back to the normal interpreter.
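The idea can be sketched in a few lines of Python. This is only a toy model under obvious simplifying assumptions: a Python closure stands in for generated machine code, and the coercion rules are much simpler than PHP's. A generic add handler inspects and coerces its operands on every call, while the "specialising" wrapper counts int + int calls, installs a fast path guarded by a cheap type check once a threshold is reached, and hands back to the generic handler whenever the guard fails.

```python
def to_number(value):
    """Simplified numeric coercion (toy rules, not PHP's exact semantics)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return float(value)  # raises ValueError for non-numeric strings
    raise TypeError(f"unsupported operand type: {type(value).__name__}")


def generic_add(x, y):
    # Interpreter path: inspect and coerce both operands every single time.
    return to_number(x) + to_number(y)


class SpecializingAdd:
    """Toy 'JIT': after enough int + int calls, switch to a guarded fast path."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.int_hits = 0
        self.fast_path = None
        self.fallbacks = 0

    def __call__(self, x, y):
        if self.fast_path is not None:
            if type(x) is int and type(y) is int:  # guard
                return self.fast_path(x, y)
            self.fallbacks += 1  # guard failed: hand back to the interpreter
            return generic_add(x, y)
        if type(x) is int and type(y) is int:
            self.int_hits += 1
            if self.int_hits >= self.threshold:
                # Stand-in for emitting machine code specialised for int + int.
                self.fast_path = lambda a, b: a + b
        return generic_add(x, y)


add = SpecializingAdd()
for i in range(5):
    add(i, 1)
print(add.fast_path is not None)  # True
print(add("5", 2.5), add.fallbacks)  # 7.5 1
```

Correctness never depends on the speculation: the guard is cheap compared with the generic coercion logic, and any unexpected operand type simply falls back to the interpreter path.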
As a first step, the PHP interpreter converts the source code into bytecode. In a second step, the bytecode is passed to the Zend Engine, which creates the machine code for the relevant CPU. If OpCache is activated, the bytecode is stored in a cache, so that the first step can be skipped in subsequent calls.
(Image taken from PHP Master: Write Cutting-Edge Code)
However, I do not understand why OpCache caches bytecode instead of machine code. Wouldn't it be a big benefit if one could skip both the first and the second step? Also, since a website is in many cases executed on only a single server, I don't see how PHP benefits from bytecode, whose advantage (if I understood correctly) is that the code can be used on different hardware.
About my research: most questions that I found were about whether PHP is an interpreted or a compiled language.
The closest relevant question that I found was: Can you "compile" PHP code and upload a binary-ish file, which will just be run by the byte code interpreter? But there the question was whether it is possible to produce the bytecode beforehand and upload it (instead of caching it), whereas my question is whether the machine code can be cached instead of the bytecode.
I am assuming that this is done so that there only needs to be one compilation strategy, leaving it up to the OS-dependent PHP interpreter to convert the cached opcodes into machine code that can run natively.
Compare it to Java, where the compilation artifacts are JVM bytecode and it's up to the JVM, which is platform-specific, to turn this bytecode into executable machine code.
If PHP had to know every possible platform (Intel, AMD, SPARC, ARM, etc.), this would massively bloat the OpCache compiler. However, on your machine you usually only have the one PHP runtime that you need (referred to as the Zend Engine in your diagram).
As per Wikipedia:
Scripts are loaded into memory and compiled into Zend opcodes
One line below, it says:
The interpreter part analyzes the input code, translates it, and executes it.
As far as I know, the code is loaded into memory, then goes through lexical analysis, and is parsed and compiled to opcodes. I'm still totally confused even after reading a ton of articles about the engine. So in the end, is PHP code compiled or interpreted?
I think the distinction between "compiling" and "interpreting" is less clear in practice than Computer Science lessons would imply, as is the distinction between a "runtime environment" and a "virtual machine".
The answer is essentially that it is both: the Zend Engine first compiles your PHP code to an intermediate representation called "opcodes"; it then interprets these opcodes to execute the code.
In some ways, this is similar to the way Java is first compiled to bytecode, and then executed on the Java Virtual Machine; however, the "VM" which executes the code in the Zend Engine is not defined like a real processor, and is closely tied to the PHP language. It therefore acts more like a traditional interpreter, but of a language that no human would write.
The Zend Engine is responsible for the following tasks in PHP:
High performance parsing (including syntax checking), in-memory compilation and execution of PHP scripts [..]
Source: http://www.zend.com/products/zend_engine/in_depth
I tried using mcrypt with a key and base64 to encrypt the code and then decode it, but the decoded code ends up in a variable, so when I run it using eval() I always get errors. Could you point me in the right direction? I also looked at building my own PHP extension, but I wouldn't know how to turn its output into working PHP code.
UPDATE
I have got it to work. Now I am going to convert it into an extension; I am just wondering: can people decompile PHP extensions?
Why would you want to write your own encoder? Please, don't. The problem is that at some point you will need to decode it into plain PHP code to feed to the PHP interpreter, and at that point someone can just come in and dump the code to a file.
Professional solutions like Zend Guard and ionCube are the only ones that actually work and are not hackable in 15 minutes by anyone with minimal PHP knowledge.
"can people decompile php extensions?"
Yes, it's certainly possible to reverse engineer and/or decompile compiled C code back to pseudo-code or source, but with your approach no one will need to: the code that you believe you are protecting is, in reality, merely hidden.
The eval() function that you are calling is part of the opensource PHP core, and the source code could be trivially exposed either by modifying the eval() module function or the function referenced by the zend_compile_string function pointer (typically this is the address of the compile_string function).
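To see why hiding the source is not the same as protecting it, here is a toy Python model of the mechanism described above. Every name in it is a hypothetical stand-in rather than PHP internals: engine_eval() plays the role of eval(), and compile_string_hook plays the role of the zend_compile_string function pointer. Because the decoded program must reach the compile hook as plain source, whoever can swap that hook can dump it.

```python
import base64


def default_compile_string(source):
    # Stand-in for the engine's real compiler: turn source text into code.
    return compile(source, "<eval>", "exec")


# Stand-in for the zend_compile_string function pointer.
compile_string_hook = default_compile_string


def engine_eval(source):
    """Toy eval(): whatever is passed in must be plain source at this point."""
    namespace = {}
    exec(compile_string_hook(source), namespace)
    return namespace


# The "protected" program ships base64-encoded and is decoded right before eval.
encoded = base64.b64encode(b"result = 6 * 7").decode("ascii")

captured = []


def dumping_compile_string(source):
    captured.append(source)  # someone with access to the hook dumps the source
    return default_compile_string(source)


compile_string_hook = dumping_compile_string
ns = engine_eval(base64.b64decode(encoded).decode("utf-8"))
print(ns["result"], captured)  # 42 ['result = 6 * 7']
```

This is why the encoders mentioned in this thread work on compiled bytecode rather than on source.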
Systems such as Zend and ionCube operate on compiled code (which PHP always produces ready for execution), and it's the bytecode that is encoded. Consequently there is no source code in encoded files to be restored at runtime. Additionally, a required component on the server may also contain a closed source execution engine rather than passing restored bytecode to the default bytecode execution engine in the PHP core, keeping bytecode more hidden and giving the opportunity to execute bytecode that does not conform to the usual PHP bytecode structure (hence needing more reverse engineering effort to understand it).
Is PHP compiled or interpreted?
PHP is an interpreted language. The binary that lets you interpret PHP is compiled, but what you write is interpreted.
You can see more on the Wikipedia page for interpreted languages.
Both. PHP is compiled down to an intermediate bytecode that is then interpreted by the runtime engine.
The PHP compiler's job is to parse your PHP code and convert it into a form suitable for the runtime engine. Among its tasks:
Ignore comments
Resolve variables, function names, and so forth and create the symbol table
Construct the abstract syntax tree of your program
Write the bytecode
Depending on your PHP setup, this step is typically done just once, the first time the script is called. The compiler output is cached to speed up access on subsequent uses. If the script is modified, however, the compilation step is done again.
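A minimal sketch of this compile-once, recompile-on-change policy, written in Python. It assumes the cache is keyed by file path and checks freshness against the file's modification time, which is roughly what OpCache does when timestamp validation is enabled; fake_compile is a hypothetical placeholder for the real lexing, parsing and bytecode generation.

```python
import os
import tempfile


class CompiledScriptCache:
    """Compile each script once; recompile only when its mtime changes."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.entries = {}  # path -> (mtime in ns, compiled result)
        self.compilations = 0

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self.entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: no lexing, parsing or compiling
        with open(path, encoding="utf-8") as f:
            compiled = self.compile_fn(f.read())
        self.compilations += 1
        self.entries[path] = (mtime, compiled)
        return compiled


def fake_compile(source):
    # Hypothetical placeholder for "parse the script and emit bytecode".
    return tuple(source.split())


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "index.php")
    with open(script, "w", encoding="utf-8") as f:
        f.write("echo 1;")
    cache = CompiledScriptCache(fake_compile)
    cache.get(script)  # first request: compiles
    cache.get(script)  # second request: served from the cache
    with open(script, "w", encoding="utf-8") as f:
        f.write("echo 2;")
    # Pin a distinct mtime so the demo does not depend on timestamp resolution.
    stamp = 1_600_000_000 * 10**9
    os.utime(script, ns=(stamp, stamp))
    latest = cache.get(script)  # file changed: compiles again
    print(cache.compilations, latest)  # 2 ('echo', '2;')
```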
The runtime engine walks the AST and bytecode when the script is called. The symbol table is used to store the values of variables and provide the bytecode addresses for functions.
This process of compiling to bytecode and interpreting it at runtime is typical for languages that run on some kind of virtual runtime machine including Perl, Java, Ruby, Smalltalk, and others.
Compiled code can be executed directly by the computer's CPU. That is, the executable code is expressed in the CPU's native language.
The code of interpreted languages must be translated at run time from whatever format it is in into CPU machine instructions. This translation is done by an interpreter.
It would not be proper to say that a language is interpreted or compiled, because interpretation and compilation are both properties of the implementation of that particular language and not a property of the language as such. So, any language can be compiled or interpreted — it just depends on what the particular implementation that you are using does.
The most widely used PHP implementation is powered by the Zend Engine and is known simply as PHP. The Zend Engine compiles PHP source into a format that it can execute, thus the Zend engine works as an interpreter.
In general it is interpreted, but it can sometimes be compiled, and that really increases performance.
An open-source tool that does this:
hhvm.com
PHP is an interpreted language. It can be compiled to bytecode by third-party tools, though.
I know this question is old but it's linked all over the place and I think all answers here are incorrect (maybe because they're old).
There is NO such thing as an interpreted language or a compiled language. Any programming language can be interpreted and/or compiled.
First of all a language is just a set of rules, so when we are talking about compilation we refer to specific implementations of that language.
HHVM, for example, is an implementation of PHP. It uses JIT compilation: the code is transformed into intermediate HipHop bytecode, which is then translated into machine code. Is that enough to say it is compiled? Some Java implementations (not all) also use JIT. Google's V8 also uses JIT.
Using the old definitions of compiled vs. interpreted does not make sense nowadays.
"Is PHP compiled?" is a nonsensical question given that there are no longer clear and agreed delimiters between what is a compiled language and an interpreted one.
One possible way to delimit them is (I don't find any meaning in this dichotomy):
compiled languages use Ahead of Time compilation (C, C++);
interpreted languages use Just in Time compilation or no compilation at all (Python, Ruby, PHP, Java).
This is a meaningless question. PHP uses yacc (bison), just like GCC. yacc is a "compiler compiler". The output of yacc is a compiler. The output of a compiler is "compiled". PHP is parsed by the output of yacc. So it is, by definition, compiled.
If that doesn't satisfy, consider the following. Both php (the binary) and gcc read your source code and produce an abstract syntax tree. Under versions 4 and 5, php then walks the tree to translate the program to bytecode (the compilation step). You can see the bytecode translated to opcodes (which are analogous to assembly) using the Vulcan Logic Dumper. Finally, php (in particular, the Zend engine) interprets the bytecode. gcc, in comparison, walks the tree and outputs assembly; it can also run assemblers and linkers to finish the process. Calling a program handled by one "interpreted" and a program handled by the other "compiled" is meaningless. After all, with both, programs are run through a "compiler".
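The parse-then-walk pipeline can be sketched in Python for a made-up mini-language of PHP-like assignments (only numbers, $variables, + and // comments; every name here is hypothetical). It strips comments, builds a tiny abstract syntax tree, records variables in a symbol table that maps names to numbered slots, and then walks the tree depth-first to emit bytecode. PHP's real compiler is far more complete, although it does likewise give local variables numbered slots (its "compiled variables").

```python
import re


def strip_comments(source):
    # Comments never reach the parser.
    return re.sub(r"//[^\n]*", "", source)


def parse_expr(text):
    # Operands joined by '+', each a number or a $variable; builds a left-leaning tree.
    parts = [p.strip() for p in text.split("+")]
    nodes = [("var", p[1:]) if p.startswith("$") else ("num", int(p)) for p in parts]
    tree = nodes[0]
    for node in nodes[1:]:
        tree = ("add", tree, node)
    return tree


def parse(source):
    statements = []
    for stmt in strip_comments(source).split(";"):
        if stmt.strip():
            name, expr = stmt.split("=", 1)
            statements.append(("assign", name.strip()[1:], parse_expr(expr)))
    return statements


def emit(node, symbols, code):
    # Depth-first tree walk: operands are emitted before their operator.
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", symbols[node[1]]))  # KeyError if never assigned
    else:
        emit(node[1], symbols, code)
        emit(node[2], symbols, code)
        code.append(("ADD", None))


def compile_program(source):
    symbols, code = {}, []  # symbol table: variable name -> slot number
    for _, name, expr in parse(source):
        emit(expr, symbols, code)
        slot = symbols.setdefault(name, len(symbols))
        code.append(("STORE", slot))
    return symbols, code


symbols, code = compile_program("$a = 2; // two\n$b = $a + 3;")
print(symbols)  # {'a': 0, 'b': 1}
print(code)  # [('PUSH', 2), ('STORE', 0), ('LOAD', 0), ('PUSH', 3), ('ADD', None), ('STORE', 1)]
```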
You should actually ask the question you want to ask instead. ("Do I pay a performance penalty as PHP recompiles my source code for every request?", etc.)
Just keep in mind: if you need the source code every time you run the program, it is using an interpreter, so it's an interpreted language.
On the other hand, if you compile the source code and generate compiled code that you can execute, then it is using a compiler, since you no longer need the source code. Examples are C and Java.
At least it doesn't compile (or should I say optimize) the code as much as one might want.
This code...
<?php
for ($i = 0; $i < 100000000; $i++); // empty body: the loop only counts
echo $i;
...delays the program just as much each time it is run.
It could have detected that it is a calculation that only needs to be done the first time.
The accepted answer is blatantly false. PHP IS compiled. End of story. Maybe not to native instructions but to an interpreted bytecode.