Why does opcache cache bytecode instead of machine code? - php

As a first step, the PHP interpreter converts the source code into bytecode. In a second step, the bytecode is passed to the Zend Engine, which creates the machine code for the relevant CPU. If opcache is activated, the bytecode is stored in a cache so that the first step can be skipped on subsequent calls.
(Image taken from PHP Master: Write Cutting-Edge Code)
However, I do not understand why opcache caches bytecode instead of machine code. Would it not be a big benefit if both the first and the second step could be skipped? Also, since a website is in many cases executed on only a single server, I don't see PHP using the benefit of bytecode, which (if I understood correctly) is that the code can run on different hardware.
About my research: most of the questions I found were about whether PHP is an interpreted or a compiled language.
The closest relevant question I found was: Can you "compile" PHP code and upload a binary-ish file, which will just be run by the byte code interpreter? However, that question asks whether the code can be compiled to bytecode beforehand and uploaded (instead of cached), whereas my question is whether the machine code can be cached instead of the bytecode.

I assume this is done so that there only needs to be one compilation strategy, leaving it to the OS-dependent PHP interpreter to turn the cached opcodes into machine code that can run natively.
Compare it to Java, where the compilation artifacts are JVM bytecode and it's up to the platform-specific JVM to turn this bytecode into executable machine code.
If PHP had to know every possible platform (Intel, AMD, SPARC, ARM, etc.), this would massively bloat the OpCache compiler. Instead, you usually only need the PHP runtime (referred to as the Zend Engine in your diagram) on your machine.
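The Java comparison can be illustrated with Python as an analogue (PHP does not expose its opcodes this way). Python's `marshal` format for code objects is documented as independent of machine architecture but tied to the Python version, so the same bytes can be loaded by the same Python version on a different CPU; only the interpreter itself is platform-specific. The split into a "build" and a "target" side below is illustrative.

```python
import marshal

source = "def add(a, b):\n    return a + b\n"

# "Build" side: compile once and serialise the code object to bytes.
code = compile(source, "<module>", "exec")
blob = marshal.dumps(code)  # could be written to disk or shipped elsewhere

# "Target" side: load the bytes and execute them; no source is needed.
loaded = marshal.loads(blob)
namespace = {}
exec(loaded, namespace)
print(namespace["add"](2, 3))  # 5
```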

Related

How to record how the server processes PHP code? How to decompile PHP code?

How is PHP code executed, and how does the server process it?
We all know that all software becomes machine language (01).
We can also easily convert machine language (01) to assembly language.
And finally, having done the decompilation, we can reverse engineer the software!
Is that true?!
OK!
Now, PHP code that is encrypted with different algorithms and different tools (such as Zend, ionCube, ...) must eventually be converted back to PHP code so that the server can process it, right?
The encrypted code must eventually become a language the server can read, right?
I want to know: what is the process of decompiling this source code on the server?
example:
This is my code:
<?php
$x = 1;
$y = 2;
$sum = $x + $y;
echo $sum;
output:
3
To do this simple calculation, the server takes the PHP code, processes it, and displays the output.
Is there a way to record how the server processes the code?
Something like how software eventually becomes machine language, so that we can obtain its machine code for reverse engineering!
I translated this question with Google Translate; I hope you understand what I mean! Thanks, all.
Some comments above are muddled and not entirely accurate. This should give a little insight.
"We all know that all software becomes machine language"
Putting aside code that sits below machine language, such as the microcode found in CISC or EISC architectures (e.g. the Linn Rekursiv, from the folks who used to make record players), software may not become machine language at all, but instead simply be interpreted as-is, or be compiled and executed by a program implementing a virtual machine. So it may be transformed, but not necessarily into machine language.
"How is php code executed and how is the server processing it"
As of PHP 4, PHP source gets compiled to bytecode and the bytecode is executed by a virtual machine, the so-called "Zend Engine". With the outcome of compilation being deterministic, this naturally lends itself to caching, and PHP installations will often have a caching component that optimises and caches the compiled bytecode in shared memory. The cache allows the compilation step to be skipped for files that are cached and unchanged, and by ensuring that some other prerequisites related to memory addressing are met, compiled code can be executed directly from the shared memory cache with no deserialisation and minimal overhead, thus delivering a good speedup.
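As a self-contained analogue (Python, not PHP), the standard `dis` module shows the same idea for the calculation from the question: the source is compiled to VM instructions that can be listed and "recorded", and compiling the same source twice yields identical bytecode, which is what makes caching the compiled form safe. The variable is named `total` rather than `sum` to avoid shadowing Python's built-in.

```python
import dis

source = "x = 1\ny = 2\ntotal = x + y\nprint(total)\n"

code = compile(source, "<example>", "exec")

# Print a human-readable listing of the instructions the VM would execute.
dis.dis(code)

# Collect the instruction names, e.g. to record what the compiler produced.
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Compilation is deterministic: the same source gives the same bytecode.
same = compile(source, "<example>", "exec").co_code == code.co_code
print(same)  # True
```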
There are various ways that a server may trigger PHP execution, such as directly running the PHP command line program, having the PHP core linked into the web server, or PHP processes running in a pool with I/O transferred between server and PHP via some IPC mechanism.
Rather than protecting source code as such, protection systems such as Zend and ionCube build on the compile-to-bytecode process: they first compile source to bytecode as well, and then go further by protecting the bytecode in various ways, perhaps also using a custom execution engine.
Bytecode systems tend to have instructions that are closer to the high-level language they target than native machine code is; PHP bytecode, for example, has instructions that implement foreach and some other keywords directly. As a result, bytecode tends to be easier to decompile, and there have been decompilers for PHP since around 2006, first coming out of China and later from projects such as XCache.
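A Python analogue illustrates why high-level bytecode is easier to decompile: Python bytecode, much like PHP's with foreach, has a dedicated iteration instruction (`FOR_ITER`), and the code object keeps the local variable names, so much of the original structure can be read back. The function `total_of` is a made-up example.

```python
import dis

def total_of(items):
    total = 0
    for item in items:  # high-level loop construct
        total += item
    return total

code = total_of.__code__

# Local variable names survive compilation; a stripped native binary
# usually keeps none of them.
print(code.co_varnames)  # ('items', 'total', 'item')

# The loop maps onto a dedicated high-level instruction.
ops = {ins.opname for ins in dis.get_instructions(code)}
print("FOR_ITER" in ops)  # True
```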

What is the difference between Preloading in PHP 7.4 and JIT (which is going to be released in PHP 8)?

I was reading about preloading and was very excited about it; however (from what I understood after searching more on Google), the two seem to have the same definition in my mind:
Preload: loading compiled PHP files on server startup and making all the defined classes and functions permanently available in the context of future requests (as I understand from here)
JIT: compilation of files at run time rather than prior to execution
Which one affects performance more, especially with frameworks?
The confusion here is between two different meanings of "compiled"; or, really, the same meaning - transforming a high-level program into a set of lower-level instructions - applied twice to the same program.
Since PHP 4, the PHP code that humans write has been automatically compiled to a more abstract language, called "op codes". These act as instructions for a "virtual machine", but are still very high-level: each op code triggers a whole sub-routine in the Zend Engine.
The OpCache extension included with PHP since version 5.5 not only caches these op codes to save time re-compiling, it performs a lot of optimisations by manipulating them. Pre-loading is part of this mechanism: it runs the compilation and optimisation steps, and saves the op codes for reuse by multiple PHP processes.
However, those op codes are still a long way from what the CPU is actually going to run. The virtual machine that executes them is technically an interpreter, working through a list of instructions, and performing multiple steps even for something as simple as $x + $y.
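To make "each op code triggers a whole sub-routine" concrete, here is a toy stack-based VM in Python. The opcode names (`FETCH`, `ADD`, `ASSIGN`) and the coercion rules in `to_number` are hypothetical simplifications, not the Zend Engine's; the point is that even `$x + $y` means several dispatches, and the add handler must inspect operand types every time it runs.

```python
# Toy stack-based VM. Opcode names are hypothetical and do not match the
# Zend Engine's; each opcode dispatches to a whole handler function.

def to_number(value):
    """Coerce an operand for '+', loosely in the spirit of PHP's rules."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        for convert in (int, float):
            try:
                return convert(value)
            except ValueError:
                pass
    raise TypeError(f"unsupported operand for +: {value!r}")

def op_fetch(stack, variables, arg):
    stack.append(variables[arg])      # push a variable's value

def op_add(stack, variables, arg):
    right = to_number(stack.pop())    # type checks happen on every run
    left = to_number(stack.pop())
    stack.append(left + right)

def op_assign(stack, variables, arg):
    variables[arg] = stack.pop()      # store the result

HANDLERS = {"FETCH": op_fetch, "ADD": op_add, "ASSIGN": op_assign}

def run(program, variables):
    stack = []
    for opcode, arg in program:       # the interpreter loop
        HANDLERS[opcode](stack, variables, arg)
    return variables

# Roughly: $sum = $x + $y;
program = [("FETCH", "x"), ("FETCH", "y"), ("ADD", None), ("ASSIGN", "sum")]
print(run(program, {"x": 1, "y": 2})["sum"])      # 3
print(run(program, {"x": "1.5", "y": 2})["sum"])  # 3.5
```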
The basic idea of JIT in PHP 8 is to supplement that interpreter with a second compiler - this time, compiling from op codes down to actual machine instructions. More specifically, a JIT compiler looks at a section of code as it runs (hence "Just In Time"), and generates a set of CPU instructions to implement it.
Now you may be wondering why we don't do this bit as early as possible - Just In Time seems to be the opposite of pre-loading! The advantage is that the JIT compiler can look at how code is actually being used, rather than all the possible ways it might be used. The interpreter looking at the op codes for $x + $y has to account for the fact that each time the code runs, those variables might be integers, floats, strings, or something where + needs to throw an error. The JIT compiler can see that the running program often has them both as integers, and compile some fast code for that scenario. When the other scenarios come up, the JIT compiler just hands back to the normal interpreter.
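The profile-then-specialise idea can be sketched as follows. This toy does not generate machine code; a plain Python method stands in for the compiled fast path, and `ToyJit`, `threshold` and `generic_add` are hypothetical names. It shows the three ingredients described above: observing the types that actually occur, compiling a fast path for the common case, and a guard that hands back to the general path when the assumption does not hold.

```python
from collections import Counter

def generic_add(a, b):
    """Slow path standing in for the interpreter's general '+' handler."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b
    raise TypeError(f"unsupported operands: {a!r}, {b!r}")

class ToyJit:
    """Hypothetical profile-then-specialise sketch. A real JIT would emit
    machine code; here a plain Python method stands in for it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.profile = Counter()   # (type(a), type(b)) -> times seen
        self.specialised = None    # fast path, once "compiled"
        self.fallbacks = 0         # times the guard failed

    def add(self, a, b):
        if self.specialised is not None:
            return self.specialised(a, b)
        self.profile[(type(a), type(b))] += 1
        if self.profile[(int, int)] >= self.threshold:
            self.specialised = self._int_int_add   # "compile" the hot case
        return generic_add(a, b)

    def _int_int_add(self, a, b):
        # Guard: use the fast path only while the observed types still hold.
        if type(a) is int and type(b) is int:
            return a + b
        self.fallbacks += 1
        return generic_add(a, b)   # hand back to the general path

jit = ToyJit(threshold=3)
for i in range(5):
    jit.add(i, i)                   # hot loop with int operands
print(jit.specialised is not None)  # True
print(jit.add(1.5, 2))              # 3.5 (guard fails, general path)
print(jit.fallbacks)                # 1
```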

Does the PHP Zend Engine compile the code or interpret it?

According to Wikipedia:
Scripts are loaded into memory and compiled into Zend opcodes
One line below, it says:
The interpreter part analyzes the input code, translates it, and executes it.
As I understand it, the code is loaded into memory, then goes through lexical analysis, gets parsed, and is compiled to opcodes. I'm still totally confused, even after a ton of articles about the engine. So, in the end, is PHP code compiled or interpreted?
I think the distinction between "compiling" and "interpreting" is less clear in practice than Computer Science lessons would imply, as is the distinction between a "runtime environment" and a "virtual machine".
The answer is essentially that it is both: the Zend Engine first compiles your PHP code to an intermediate representation called "opcodes"; it then interprets these opcodes to execute the code.
In some ways, this is similar to the way Java is first compiled to bytecode, and then executed on the Java Virtual Machine; however, the "VM" which executes the code in the Zend Engine is not defined like a real processor, and is closely tied to the PHP language. It therefore acts more like a traditional interpreter, but of a language that no human would write.
The Zend Engine is responsible for the following tasks in PHP:
High performance parsing (including syntax checking), in-memory compilation and execution of PHP scripts [..]
Source: http://www.zend.com/products/zend_engine/in_depth
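The "it is both" answer can be made concrete with a toy in Python. This is a hypothetical mini-language for arithmetic expressions, not PHP's compiler: `compile_expr` parses the source once and emits a flat list of stack opcodes (the compile phase), and `interpret` walks that list with a value stack (the interpret phase), which can be repeated without recompiling.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def compile_expr(source):
    """Compile phase: parse the source and emit a flat list of opcodes."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant):
            code.append(("PUSH_CONST", node.value))
        elif isinstance(node, ast.Name):
            code.append(("PUSH_VAR", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)
            emit(node.right)
            code.append(("BINARY", type(node.op)))
        else:
            raise SyntaxError(f"unsupported syntax: {ast.dump(node)}")

    emit(ast.parse(source, mode="eval").body)
    return code

def interpret(code, variables):
    """Interpret phase: execute the opcodes with a value stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH_CONST":
            stack.append(arg)
        elif op == "PUSH_VAR":
            stack.append(variables[arg])
        else:  # BINARY
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

opcodes = compile_expr("x + y * 2")   # compiled once...
print(opcodes)
print(interpret(opcodes, {"x": 1, "y": 2}))  # 5  ...interpreted as often as needed
```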

PHP interpreter Opcache

My understanding:
PHP is a programming language which uses an interpreter.
The interpreter is a compiled piece of software that sits between the source code and the machine.
It reads and analyses the source code at runtime and starts its own subroutines based on the source code.
It's not compiling or translating the code into something new which could be saved, because it's a kind of executing the code.
The Opcache by Zend is able to store precompiled bytecode and to use it again. (I know how it generally works.)
http://www.sitepoint.com/understanding-opcache/
My question:
Where does the Opcache get its precompiled scripts from when the interpreter is not compiling?
It's not compiling or translating the code into something new which could be saved, because it's a kind of executing the code.
That's incorrect. The first thing the interpreter does is compile the PHP source code into an executable bytecode format, which is then executed.
It's not unlike what .NET and Java do, except that they do it ahead of time, whereas PHP does it on demand as the script is executed.
Things like the OPcache take this bytecode and cache that, saving the interpreter from having to fetch the source code and parse it each time a script is executed.
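On-demand compilation with a per-file cache can be sketched in Python as an analogue. `FileBytecodeCache` is a hypothetical class: it compiles a file the first time it is requested and reuses the compiled code until the file's modification time changes, loosely similar in spirit to OPcache's timestamp validation (the `opcache.validate_timestamps` setting), though the real mechanism differs in detail.

```python
import os
import tempfile

class FileBytecodeCache:
    """Hypothetical sketch: cache compiled code per file path and
    recompile only when the file's modification time changes."""

    def __init__(self):
        self._entries = {}   # path -> (mtime_ns, code object)
        self.compiles = 0

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]                           # hit: no read or parse
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")    # miss: compile on demand
        self._entries[path] = (mtime, code)
        self.compiles += 1
        return code

    def run(self, path):
        namespace = {}
        exec(self.get(path), namespace)
        return namespace

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "page.py")
    with open(script, "w", encoding="utf-8") as f:
        f.write("result = 1 + 2\n")
    cache = FileBytecodeCache()
    cache.run(script)
    cache.run(script)
    first_compiles = cache.compiles               # 1: second run was a hit

    # Simulate a deploy: new content and a different modification time.
    with open(script, "w", encoding="utf-8") as f:
        f.write("result = 40 + 2\n")
    os.utime(script, ns=(0, 10**9))
    second_result = cache.run(script)["result"]   # 42, after recompiling
    second_compiles = cache.compiles              # 2
print(first_compiles, second_result, second_compiles)  # 1 42 2
```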

Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?

For Python, it can create a pre-compiled version, file.pyc, so that the program can be run without being compiled again. Can Ruby, PHP, and Perl do the same on the command line?
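For reference, the Python side of the question looks like this (the file name `hello.py` is made up). The standard `py_compile` module writes the `.pyc` explicitly, and the same is available on the command line as `python -m py_compile hello.py`; by default the file lands in `__pycache__` under a name that `importlib.util.cache_from_source` computes.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "hello.py")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("print('hello')\n")

    # Byte-compile the module; returns the path of the written .pyc file.
    pyc_path = py_compile.compile(source_path, doraise=True)

    # By default the .pyc goes into __pycache__, tagged with the interpreter version.
    expected = importlib.util.cache_from_source(source_path)
    match = pyc_path == expected
    exists = os.path.exists(pyc_path)
    print(match, exists)  # True True
```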
There is no portable bytecode specification for Ruby, and thus also no standard way to load precompiled bytecode archives. However, almost all Ruby implementations use some kind of bytecode or intcode format, and several of them can dump and reload bytecode archives.
YARV always compiles to bytecode before executing the code, however that is usually only done in memory. There are ways to dump out the bytecode to disk. At the moment, there is no way to read it back in, however. This will change in the future: work is underway on a bytecode verifier for YARV, and once that is done, bytecode can safely be loaded into the VM, without fear of corruption. Also, the JRuby developers have indicated that they are willing to implement a YARV VM emulator inside JRuby, once the YARV bytecode format and verifier are stabilized, so that you could load YARV bytecode into JRuby. (Note that this version is obsolete.)
Rubinius also always compiles to bytecode, and it has a format for compiled files (.rbc files, analogous to JVM .class files) and there is talk about a bytecode archive format (.rba files, analogous to JVM .jar files). There is a chance that Rubinius might implement a YARV emulator, if deploying apps as YARV bytecode ever becomes popular. Also, the JRuby developers have indicated that they are willing to implement a Rubinius bytecode emulator inside JRuby, if Rubinius bytecode becomes a popular way of deploying Ruby apps. (Note that this version is obsolete.)
XRuby is a pure compiler, it compiles Ruby sourcecode straight to JVM bytecode (.class files). You can deploy these .class files just like any other Java application.
JRuby started out as an interpreter, but it has both a JIT compiler and an AOT compiler (jrubyc) that can compile Ruby sourcecode to JVM bytecode (.class files). Also, work is underway to create a new compiler that can compile (type-annotated) Ruby code to JVM bytecode that actually looks like a Java class and can be used from Java code without barriers.
Ruby.NET is a pure compiler that compiles Ruby sourcecode to CIL bytecode (PE .dll or .exe files). You can deploy these just like any other CLI application.
IronRuby also compiles to CIL bytecode, but typically does this in-memory. However, you can pass commandline switches to it, so it dumps the .dll and .exe files out to disk. Once you have those, they can be deployed normally.
BlueRuby automatically pre-parses Ruby sourcecode into BRIL (BlueRuby Intermediate Language), which is basically a serialized parsetree. (See Blue Ruby - A Ruby VM in SAP ABAP(PDF) for details.)
I think (but I am definitely not sure) that there is a way to get Cardinal to dump out Parrot bytecode archives. (Actually, Cardinal only compiles to PAST, and then Parrot takes over, so it would be Parrot's job to dump and load bytecode archives.)
Perl 5 can dump the bytecodes to disk, but it is buggy and nasty. Perl 6 has a very clean method of creating bytecode executables that Parrot can run.
Perl's just-in-time compilation is fast enough that this doesn't matter in most circumstances. One place where it does matter is in a CGI environment which is what mod_perl is for.
For hysterical raisins, Perl 5 looks for .pmc files ahead of .pm files when searching for a module. These files could contain bytecode, though Perl doesn't write bytecode out by default (unlike Python).
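The lookup order described above can be modelled in a few lines of Python. This is a sketch of the behaviour, not Perl's implementation: `find_module_file` is a hypothetical helper, and the list of directories stands in for `@INC`. In each directory it tries `Foo/Bar.pmc` before `Foo/Bar.pm`.

```python
import os
import tempfile

def find_module_file(module_name, inc_dirs):
    """Model of the search order: in each directory, prefer Foo/Bar.pmc
    over Foo/Bar.pm. A sketch of the behaviour, not Perl's code."""
    relative = os.path.join(*module_name.split("::"))
    for directory in inc_dirs:
        for extension in (".pmc", ".pm"):
            candidate = os.path.join(directory, relative + extension)
            if os.path.isfile(candidate):
                return candidate
    return None

with tempfile.TemporaryDirectory() as lib:
    os.makedirs(os.path.join(lib, "Foo"))
    for name in ("Bar.pm", "Bar.pmc"):
        open(os.path.join(lib, "Foo", name), "w").close()
    found = find_module_file("Foo::Bar", [lib])       # .pmc wins

    os.remove(os.path.join(lib, "Foo", "Bar.pmc"))
    fallback = find_module_file("Foo::Bar", [lib])    # falls back to .pm
print(os.path.basename(found), os.path.basename(fallback))  # Bar.pmc Bar.pm
```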
Module::Compile (or: what's this PMC thingy?) goes into some more depth about this obscure feature. They're not frequently used, but...
The clever folks who wrote Module::Compile take advantage of this, to pre-compile Perl code into... well, it's still Perl, but it's preprocessed.
Among other benefits, this speeds up loading time and makes debugging easier when using source filters (Perl code modifying Perl source code before being loaded by the interpreter).
Not for PHP, although most PHP setups incorporate a Bytecode Cache that will cache the compiled bytecode so that next time the script runs, the compiled version is run. This speeds up execution considerably.
There's no way I'm aware of to actually get at the bytecode through the command line.
For Perl you can try using B::Bytecode and perlcc. However, both of these are highly experimental. And Perl 6 is coming out soon (theoretically) and will be on Parrot and will use a different bytecode and so all of this will be somewhat moot then.
Here are some example magic words for the command line:
perl -MO=Bytecode,-H,-o"Module.pm"c "Module.pm"
According to the third edition of Programming Perl, it is possible to approximate this in some experimental ways.
If you use Zend Guard on your PHP scripts, it essentially precompiles the scripts to byte-code, which can then be run by the PHP engine if the Zend Optimizer extension is loaded.
So, yes, Zend Guard/Optimizer permits pre-compiled PHP scripts to be used.
For PHP, the Phalanger Project compiles down to .NET assemblies. I'm not sure if that's what you were looking for, though.
Has anyone considered using LLVM's bytecode, instead of a yet-another-custom-bytecode?
Ruby 1.8 doesn't actually use bytecode at all (even internally), so there is no pre-compilation step.
