PHP is a general-purpose server-side scripting language. It is well known that PHP code is interpreted when the page loads and the resulting web page is shown. Recently I have heard of just-in-time compilers for PHP (the HipHop Virtual Machine). I would like to know what kind of difference a JIT would make in execution, and whether it is better to have a JIT instead of an interpreter. Are there any PHP engines that have a JIT?
EDIT:
Is the PHP execution flow like this: PHP code -> parsing -> tokens -> bytecode/opcodes -> PHP engine interpretation -> machine code -> execution?
Correct me if I am wrong. Bytecode is generally executed in a virtual machine, whereas an opcode (which is close to machine language) can be executed by the machine directly. Does this mean a PHP engine is a virtual machine, or are only some implementations of it virtual machines?
Thanks in Advance.
HipHop is not a JIT compiler - it's a code converter which changes PHP into C++ which can then be compiled using a conventional offline compiler.
As a result, eval and create_function won't work, nor the tokenizer functions. I've not looked into the matter deeply, but I would expect that conditional / runtime evaluated include operations will likely cause problems too.
There wouldn't be much point unless it made the code much faster.
OTOH using a PHP opcode cache gives a huge performance boost (not quite as much as native code) without compromising functionality.
(given the architecture of PHP a JIT compiler doesn't really make a lot of sense)
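As a rough illustration of the opcode-cache point, here is a minimal php.ini sketch, assuming the OPcache extension bundled with PHP 5.5 and later (older setups typically used APC instead); the values are only examples:
; keep compiled scripts in shared memory so they are not recompiled per request
opcache.enable=1
; shared memory reserved for compiled scripts, in megabytes
opcache.memory_consumption=128
; maximum number of scripts the cache will hold
opcache.max_accelerated_files=10000
; re-check file timestamps so edited scripts get recompiled (often set to 0 in production)
opcache.validate_timestamps=1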
Related
How is PHP code executed and how does the server process it?
We all know that all software eventually becomes machine language (binary).
We can also easily convert machine language (binary) to assembly language.
And finally, by decompiling, we can reverse engineer the software!
Is that true?!
OK!
Now, PHP code that is encrypted by different algorithms and different products such as Zend, ionCube, and so on will eventually have to be converted back into something the server can process, right?
The code that is encrypted should eventually become language readable by the server, right?
I want to know: what is the process of decompiling this source code on the server?
example:
This is my code:
<?php
$x = 1;
$y = 2;
$sum = $x + $y;
echo $sum;
output:
3
To do this simple calculation, the server takes the PHP code, processes it, and displays the output.
Is there a way to record the process of code processing by the server?
Just like software that eventually becomes machine language, where we can get the machine code of that software for reverse engineering!
I translated this question with Google Translate; I hope you understand what I mean! Thanks, all.
Some comments above are muddled and not entirely accurate. This should give a little insight.
"We all know that all software becomes machine language"
Putting aside code that sits below machine language, such as that found in CISC or EISC architectures (e.g. the Linn Rekursiv, from the folks that used to make record players), software may not become machine language but may instead simply be interpreted as-is, or be compiled and executed by a program implementing a virtual machine. So it may be transformed, but not necessarily to machine language.
"How is php code executed and how is the server processing it"
As of PHP 4, PHP source gets compiled to bytecode and the bytecode is executed by a virtual machine, the so-called "Zend Engine". With the outcome of compilation being deterministic, this naturally lends itself to caching, and PHP installations will often have a caching component that optimises and caches the compiled bytecode in shared memory. The cache allows the compilation step to be skipped for files that are cached and unchanged, and by ensuring that some other prerequisites related to memory addressing are met, compiled code can be executed directly from the shared memory cache with no deserialisation and minimal overhead, thus delivering a good speedup.
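As a small practical illustration of the cache, a script can ask OPcache about its own status; this is a minimal sketch assuming the OPcache extension is loaded (the functions below are part of its standard API):
<?php
// true if this file's compiled bytecode is already sitting in shared memory
var_dump(opcache_is_script_cached(__FILE__));
// overall cache statistics: memory usage, hit rate, number of cached scripts
$status = opcache_get_status(false);
echo 'cached scripts: ', $status['opcache_statistics']['num_cached_scripts'], "\n";
echo 'hit rate: ', round($status['opcache_statistics']['opcache_hit_rate'], 2), "%\n";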
There are various ways that a server may trigger PHP execution, such as directly running the PHP command line program, having the PHP core linked into the web server, or PHP processes running in a pool with I/O transferred between server and PHP via some IPC mechanism.
Rather than protecting source code, protection systems such as Zend and ionCube leverage the compilation to bytecode process first by also compiling source to bytecode, and then going further to protect the bytecode in various ways, perhaps by also utilising a custom execution engine.
Bytecode systems tend to have machine instructions that are closer to the high level language they target than native machine code, with PHP bytecode having instructions to implement foreach and some other keywords directly for example. As a result, bytecode tends to be easier to decompile, and there have been decompilers for PHP since around 2006 first coming out of China and later with projects such as XCache.
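If you want to see what that bytecode looks like for yourself, one common way is the third-party VLD extension from PECL (this assumes it is installed; it is not part of core PHP). Something like the following prints the opcode listing for a script without executing it (sum.php is just a placeholder file name):
php -d vld.active=1 -d vld.execute=0 sum.php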
I was reading about preloading and got very excited about it; however (as I understood by searching more on Google), the two seem to have the same definition in my mind:
Preload: loading compiled PHP files on server startup and making all the defined classes and functions permanently available in the context of future requests (as I understand it from here)
JIT: Compilation of files at run time rather than prior to execution
Which one affects performance more, especially with frameworks?
The confusion here is between two different meanings of "compiled"; or, really, the same meaning - transforming a high-level program into a set of lower-level instructions - applied twice to the same program.
Since PHP 4, the PHP code that humans write has been automatically compiled to a more abstract language, called "op codes". These act as instructions for a "virtual machine", but are still very high-level: each op code triggers a whole sub-routine in the Zend Engine.
The OpCache extension included with PHP since version 5.5 not only caches these op codes to save time re-compiling, it performs a lot of optimisations by manipulating them. Pre-loading is part of this mechanism: it runs the compilation and optimisation steps, and saves the op codes for reuse by multiple PHP processes.
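A minimal sketch of how preloading is wired up, assuming PHP 7.4+ with OPcache (the paths and the preload.php file name are only examples): php.ini points at a script that runs once at server startup, and that script compiles the files you want kept in memory.
; php.ini
opcache.preload=/var/www/preload.php
opcache.preload_user=www-data

<?php
// preload.php - executed once when the server starts; the compiled
// results are shared by every worker that handles later requests
foreach (glob('/var/www/src/*.php') as $file) {
    opcache_compile_file($file); // compile into the opcode cache without executing
}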
However, those op codes are still a long way from what the CPU is actually going to run. The virtual machine that executes them is technically an interpreter, working through a list of instructions, and performing multiple steps even for something as simple as $x + $y.
The basic idea of JIT in PHP 8 is to supplement that interpreter with a second compiler - this time, compiling from op codes down to actual machine instructions. More specifically, a JIT compiler looks at a section of code as it runs (hence "Just In Time"), and generates a set of CPU instructions to implement it.
Now you may be wondering why we don't do this bit as early as possible - Just In Time seems to be the opposite of pre-loading! The advantage is that the JIT compiler can look at how code is actually being used, rather than all the possible ways it might be used. The interpreter looking at the op codes for $x + $y has to account for the fact that each time the code runs, those variables might be integers, floats, strings, or something where + needs to throw an error. The JIT compiler can see that the running program often has them both as integers, and compile some fast code for that scenario. When the other scenarios come up, the JIT compiler just hands back to the normal interpreter.
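For completeness, a minimal php.ini sketch for switching the JIT on in PHP 8 (the buffer size is just an example value):
opcache.enable=1
; memory reserved for generated machine code; the JIT stays off if this is 0
opcache.jit_buffer_size=100M
; the "tracing" mode compiles hot code paths observed while the program runs
opcache.jit=tracing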
I was trying to understand the workings of Zend with the help of this excellent article. That's when I found out that the Zend Engine is a virtual machine.
Now my question is: what's the advantage of creating intermediate code for scripting languages like PHP?
I can understand that having intermediate code in the case of programming languages like Java and C# provides portability across different platforms like Linux and Windows.
It is faster to execute bytecode than to interpret source code.
This bytecode can be cached (this is done via PHP accelerators), giving up to a 20x performance boost.
The term VM in the article is completely wrong. In reality, he's describing that PHP compiles the scripts to bytecode, and this bytecode is then interpreted; there's NO VM inside of PHP.
The bytecode operations (opcodes) are just an efficient representation of a PHP script for running the statements one after another and storing the results correctly. Have a look at abstract syntax trees to fully understand bytecode and its advantages for any language.
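If you want to look at the tree form yourself, here is a small sketch assuming the third-party php-ast extension from PECL is installed (the 80 is the AST version number expected by that extension, not the PHP version):
<?php
// dump the abstract syntax tree for a tiny expression
$ast = ast\parse_code('<?php $sum = $x + $y;', 80);
var_dump($ast); // a tree of ast\Node objects: assignment -> binary op -> variables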
This is something I've always wondered: Why is PHP slower than Java or C#, if all 3 of these languages get compiled down to bytecode and then executed from there? I know that normally PHP recompiles each file with each request, but even when you bring APC (a bytecode cache) into the picture, the performance is nowhere near that of Java or C# (although APC greatly improves it).
Edit:
I'm not even talking about these languages on the web level. I am talking about the comparison of them when they're number crunching. Not even including startup time or anything like that.
Also, I am not making some kind of decision based on the replies here. PHP is my language of choice; I was simply curious about its design.
One reason is the lack of a JIT compiler in PHP, as others have mentioned.
Another big reason is PHP's dynamic typing. A dynamically typed language is always going to be slower than a statically typed language, because variable types are checked at run-time instead of compile-time. As a result, statically typed languages like C# and Java are going to be significantly faster at run-time, though they typically have to be compiled ahead of time. A JIT compiler makes this less of an issue for dynamically typed languages, but alas, PHP does not have one built-in. (Edit: PHP 8 will come with a built-in JIT compiler.)
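As a tiny illustration of what "checked at run-time" means, the same PHP function happily accepts operands of different types and only decides what + should do when it actually executes (a sketch using numeric values and numeric strings):
<?php
function add($a, $b) {
    return $a + $b; // the engine inspects the operand types on every call
}
var_dump(add(1, 2));     // int(3)
var_dump(add(1.5, 2));   // float(3.5)
var_dump(add('3', '4')); // int(7) - numeric strings are converted at run time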
I'm guessing you are comparing apples and oranges a little bit here - assuming that you are using all these languages to create web applications, there is quite a bit more to it than just the language. (And a lot of the time it is the database that is slowing you down ;-)
I would never suggest choosing one of these languages over the other on the basis of a speed argument.
Both Java and C# have JIT compilers, which take the bytecode and compile into true machine code. The act of compiling it can take time, hence C# and Java can suffer from slower startup times, but once the code is JIT compiled, its performance is in the same ballpark as any "truly compiled" language like C++.
The biggest single reason is that Java's HotSpot JVM and C#'s CLR both use Just-In-Time (JIT) compilation. JIT compilation compiles the bytecodes down to native code that runs directly on the processor.
Also, I think Java bytecode and CIL are lower-level than PHP's internal bytecode, which might make a lot of JIT optimizations easier and more effective.
A wild guess might be that Java depends on some kind of "application" server, while PHP doesn't -- which means a new environment has to be created each time a PHP page is called.
(This was especially true when PHP was/is used as a CGI, and not as an Apache module or via FastCGI)
Another idea might be that C# and Java compilers can do some heavy optimisations at compile time -- on the other hand, as PHP scripts are compiled each time a page is called (at least, if you don't "cheat" with an opcode cache), the compilation phase has to be really quick, which means it's not possible to spend much time optimizing.
Still: each version of PHP generally comes with some performance improvements; for instance, you can gain between 15% and 25% of CPU when switching from PHP 5.2 to 5.3.
For instance, take a look at these benchmarks:
Benchmark of PHP Branches 3.0 through 5.3-CVS
Performance PHP 5.2 vs PHP 5.3 - huge gain
Bench PHP 5.2 vs PHP 5.3 -- disclaimer: it's in French, and I'm the one who did it.
One important thing, also, is that PHP is quite easy to scale: just add a couple of web servers, and voilà!
The problem you often meet when going from one to several servers is with sessions -- store those in a DB or memcached (very easy), and problem solved!
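For example, here is a minimal sketch assuming the PECL memcached extension is installed (the host names are placeholders):
<?php
// keep session data in memcached instead of local files,
// so any web server in the pool can pick up the same session
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'sessions1.example.com:11211,sessions2.example.com:11211');
session_start();
$_SESSION['user_id'] = 42;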
As a side note: I would not recommend choosing a technology because there is a couple of percent difference in speed on some benchmark; there are far more important factors, like how well your team knows each technology -- or, even, the algorithms you are going to use!
There is no way an interpreted language can be faster than a compiled language, or even a JIT-compiled language, except under trivial conditions.
Unless your test program consists of printing out "Hello World", if you are concerned about speed, stick with C# or Java.
Depends on what you want to do. In some cases, PHP is definitely faster. PHP is (pretty) good at file manipulation and other basic stuff (also XML stuff). Java or C# might be slower in those cases (though I didn't benchmark).
Also, the PHP output (HTML or whatever) needs to be downloaded to the browser, which also consumes time.
Also, the speed of Java / C# very much depends on the machine it runs on (which could be any number of machines). Java / C# could be slow on your computer, while PHP just runs on one server, from which it is available, and is always as fast as that server is (except for download times, etc.).
I don't think they are comparable in a general manner. I think you need to take a task that could be accomplished with all three programming languages and then compare that. That is basically always what you should do when choosing a programming language: find the one that fits the task. Don't shape the task until it fits the programming language.
According to Wikipedia, PHP uses the Zend Engine, which does not have a JIT.
If I write a hello world app using a PHP web framework such as CodeIgniter, and then compile it and run it using HipHop, will it run faster than if I write the same hello world app in Django or Rails?
HipHop converts PHP code into C++ code, which needs to be compiled to run. Since pre-compiled code runs faster and uses less memory than scripting languages like Python/PHP, it will probably run faster in the example you have given.
However, HipHop does not convert all code. A lot of code in PHP is dynamic and cannot be translated to C++, which means you will have to write your code with this in mind. Whether CodeIgniter can even be compiled using HipHop is another question.
Terry Chay wrote a big article about HipHop, covering when to use it, its limitations, and its future. I would recommend reading it, as it will most likely answer most of your questions and give you some insight into how it works :)
http://terrychay.com/article/hiphop-for-faster-php.shtml
At that point the run time is inconsequential. HipHop was designed for scaling... meaning billions of requests. There's absolutely no need to use something like HipHop for even a medium size website.
But more to the point of your question... I don't think there have been comparison charts available for us to see, but I doubt the run time would be faster at that level.
I don't know about Django or Rails, so this is a bit off-topic.
With plain PHP, the request goes to Apache, then to mod_php. mod_php loads the helloworld.php script from disk, parses and tokenizes it, compiles it to bytecode, then interprets the bytecode and passes the output back to Apache, which serves it to the user.
With PHP and an opcode cache, the first run is about the same as with plain PHP, but the compiled bytecode is stored in RAM. For the second request, it goes to Apache, Apache to mod_php, APC loads the bytecode from RAM, interprets it, and passes the result back to Apache and on to the user.
With HipHop there is no Apache, only HipHop itself, and there is no interpreter, so the request goes directly to HipHop and back to the user. So yes, it's faster, for several reasons:
Faster startup, because no bytecode compilation is needed - the program is already machine code, so there is no per-request compilation and no source file reading.
No interpreter. Machine code is not necessarily faster - that depends on the quality of the source translation (HipHop) and the quality of the static compiler (g++). HipHop-translated code is not fast compared to hand-written C code, because type handling and the like add some overhead.
With Node.js, there's also no Apache. The script is started and directly compiled to machine code (because the V8 engine does that), so it's a kind of AOT (ahead-of-time) compilation (or is it still called JIT? I don't really know). Every request is then handled directly by the already-compiled machine code, so Node.js is actually very comparable to HipHop. I assume HipHop is multithreaded or something like that, while Node does evented I/O.
Facebook claims a 50% speed gain, which is not really that much; if you compare the results of the language shootout, you'll see that for the execution speed of assorted algorithms, PHP is 5 to 250 times slower.
So why only 50%? Because...
Web apps depend on much more than just execution speed, e.g. I/O.
PHP's type system prevents HipHop from making the best use of C++'s static types.
In practice, a lot of PHP is already C, because most of the functionality is either built in or comes from extensions. Extensions are programmed in C and statically compiled.
I'm not sure if there was a huge performance gain for hello world, because hello world, even with a good framework, is still so small that execution speed could be negligible compared to all the other overhead (network latency and the like).
IMO: if you want speed and ease of use, go for Node.js :)
Running a simple application is always fast in any language. When it becomes as complex as Facebook, you will face numerous problems. PHP's slowness will show its face. At the same time, converting the existing code to another language is not an option, since all the logic and code is not easy to translate into another language's syntax. That's why the Facebook developers decided to keep the old code and make PHP faster, and that's the reason they created their own PHP compiler, called HipHop.
Read this story from the perspective of one of the Facebook developers, so you know the history of HipHop.
That is not really an apples-to-apples comparison. On the most level playing field you might have something like:
Django running behind apache
Django rendering an HTML template to say hello world (no caching)
AND
HPHP running behind apache
HPHP rendering an HTML template to say hello world (again, no caching)
There is no database, almost no file I/O, and no caching. If you hit the page 10,000 times with a load generator at varying concurrency levels, you will probably find that HPHP will outperform Django or Rails - that is to say, it can render more pages per second and keep up with your traffic a bit better.
The question is, will you ever have this many concurrent users? If you will, will they likely be hitting a database or a cached page?
HPHP sounds cool, but IMHO there is no reason to jump ship just yet (unless you are getting lots of traffic, in which case it might make sense to check it out).
Will it run faster than if I write the same hello world app in Django or Rails?
It probably will, but don't fret. If we're talking prospective speed improvements from yet unreleased projects, Pythonistas have pypy-jit and unladen-swallow to look forward to ;)