Facebook's HipHop - What's it for?

The news in the PHP world today is Facebook's HipHop, which:
HipHop for PHP isn't technically a compiler itself. Rather it is a source code transformer. HipHop programmatically transforms your PHP source code into highly optimized C++ and then uses g++ to compile it. HipHop executes the source code in a semantically equivalent manner and sacrifices some rarely used features — such as eval() — in exchange for improved performance. HipHop includes a code transformer, a reimplementation of PHP's runtime system, and a rewrite of many common PHP Extensions to take advantage of these performance optimizations.
My question is, what type of web applications is this actually useful for?
It seems like typical database-bound web apps may not be greatly served by this, but rarer CPU-bound apps would be.

Web applications that do a lot of processing and/or use a lot of memory. Apparently HipHop will reduce CPU usage by around 50% and also reduce memory usage (I didn't see the memory reduction quantified anywhere). This means that you should be able to serve the same number of requests with fewer servers.
An added benefit may be that there will be some basic type checking to ensure that the code is consistent before it is compiled. This should help to locate the type of bugs that PHP currently tends to ignore as a result of its weak type system.
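For example, here is the kind of silent coercion that PHP's weak typing allows at runtime - the sort of thing a compile-time check could flag earlier (illustrative snippet; the behaviour differs by PHP version as noted):

    <?php
    // Loose comparisons silently coerce types; nothing warns you when this is a bug.
    var_dump("abc" == 0);    // bool(true) on PHP 7.x and earlier, bool(false) on PHP 8+
    var_dump("10" == "1e1"); // bool(true) - numeric strings are compared as numbers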
The downside appears to be that it might not support some of PHP's more dynamic features such as eval (though arguably that's a positive too).
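To make the eval() point concrete, here is a hedged sketch of the kind of runtime code generation an ahead-of-time transformer can't handle, next to a static-friendly alternative (the names are purely illustrative):

    <?php
    $op = 'strtoupper';

    // Dynamic: builds new PHP code as a string at runtime - impossible to translate ahead of time.
    eval('$result = ' . $op . '("hello");');

    // Static-friendly: the callable is just data; no new code is generated at runtime.
    $result = call_user_func($op, 'hello');
    echo $result; // HELLO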

Well, it "transforms" PHP into C++ to improve the performance of a large, highly scalable website.
So, HipHop is for when you have a website that you started at Harvard, that quickly grows into a billion-dollar company, and that people are making a movie about starring Justin Timberlake. When you have such a website and want to save CPU cycles but don't want to rewrite your codebase, you use HipHop.
If you are just starting out, unless you are trapped on a desert island with only PHP programmers who refuse to learn a more scalable language, you don't use HipHop.

Running machine code instead of interpreted code is faster. That is useful in itself, and it also reduces the number of machines you require, since each processor has less work to do.
This is good for a company like Facebook, in that they can cut the number of machines they need.
In terms of why it's useful for them: they probably run a lot of sorting and indexing on the large amounts of data they have.

This article:
http://terrychay.com/article/hiphop-for-faster-php.shtml
answers this question perfectly with its series of "if" statements.

You can think of it as a sort of compiler that takes in a bunch of .php files and generates a bunch of C++ files, which you can then compile using g++ (I'm not sure whether other compilers are supported). The resulting executable is your web application with a web server included, so you can just run it and you're good to go. The web server is based on libevent and is supposedly pretty efficient.

HipHop is essentially pointless for everyone except Facebook and other gigantic PHP-based sites. I'm sure many people will jump on the bandwagon because "it's fast", but how many PHP-based apps use whole server farms?
Just because you are working on a social network site doesn't mean you need to consider using HipHop.

Related

Facebook HipHop virtual machine for PHP?

I've looked at this article: HipHop PHP (was Hyper PHP by Facebook).
However, I recently came across this one: Facebook Speeds Development With “HipHop Virtual Machine”, A 60% Faster PHP Executor.
Does anyone have details on this? Is it worth exploring for a PHP developer?
You can find the details in the article linked in the article you linked:
https://www.facebook.com/notes/facebook-engineering/the-hiphop-virtual-machine/10150415177928920
hphpc is in essence a traditional static compiler that converts PHP→AST→C++→x64. We have long been keenly aware of the limitations to static analysis imposed by such a dynamic language as PHP, not to mention the risks inherent in developing software with hphpi and deploying with hphpc. Our experiences with hphpc led us to start experimenting with dynamic translation to native machine code, also known as just-in-time (JIT) compilation. A dynamic translator can observe data types as the program executes, and generate type-specialized machine code.
The type-specialized machine code runs faster. Unless you are the size of FB, you do not need this. Use APC or memcached and more traditional approaches to scaling out.
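As a rough idea of what that "more traditional approach" looks like, here is a minimal read-through cache sketch using the Memcached extension (the key, TTL, and DB helper are hypothetical):

    <?php
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    function get_user_profile(Memcached $cache, int $id): array
    {
        $key = "user_profile_$id";
        $profile = $cache->get($key);
        // get() returns false on a miss; check the result code to be sure.
        if ($profile === false && $cache->getResultCode() === Memcached::RES_NOTFOUND) {
            $profile = load_profile_from_db($id);   // hypothetical slow DB query
            $cache->set($key, $profile, 60);        // cache for 60 seconds
        }
        return $profile;
    }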
The other answer about HPHPC is accurate... the performance boost is quite nice, but only if you actually need it enough that it's worth the extra complexity.
I'd suggest waiting a few weeks or even months before looking into HPHPVM. It's looking promising, but it's still in the very early stages as far as optimization and bug-fixing go. In the long run it'll probably end up being a great alternative to Zend, but right now the relatively small boost in performance (compared to, say, full translated/compiled HipHop) is likely not worth the extra complexity. That said, do check back in a few months.

Is there a point to minifying PHP?

I know you can minify PHP, but I'm wondering if there is any point. PHP is an interpreted language so will run a little slower than a compiled language. My question is: would clients see a visible speed improvement in page loads and such if I were to minify my PHP?
Also, is there a way to compile PHP or something similar?
PHP is compiled into bytecode, which is then interpreted on top of something resembling a VM. Many other scripting languages follow the same general process, including Perl and Ruby. It's not really a traditional interpreted language like, say, BASIC.
There would be no effective speed increase if you attempted to "minify" the source. You would get a major increase by using a bytecode cache like APC.
Facebook introduced a compiler named HipHop that transforms PHP source into C++ code. Rasmus Lerdorf, one of the big PHP guys, did a presentation for Digg earlier this year that covers the performance improvements given by HipHop. In short, it's not much faster than optimizing your code and using a bytecode cache. HipHop is overkill for the majority of users.
Facebook also recently unveiled HHVM, a new virtual machine based on their work making HipHop. It's still rather new and it's not clear if it will provide a major performance boost to the general public.
Just to make sure it's stated expressly, please read that presentation in full. It points out numerous ways to benchmark and profile code and identify bottlenecks using tools like xdebug and xhprof, also from Facebook.
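If you want to try the xhprof route mentioned above, a minimal sketch looks something like this (it assumes the xhprof extension is installed; run_request() is a placeholder for whatever code path you want to profile):

    <?php
    // Start collecting wall time, CPU time and memory per function call.
    xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

    run_request();                        // hypothetical code path to profile

    $data = xhprof_disable();             // returns per-function call-graph timings
    file_put_contents('/tmp/profile.json', json_encode($data));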
2021 Update
HHVM diverged from vanilla PHP a few versions ago. PHP 7 and 8 bring a whole bunch of amazing performance improvements that have pretty much closed the gap. You no longer need to do weird things to get better performance out of PHP!
Minifying PHP source code continues to be useless for performance reasons.
Forgo the idea of minifying PHP in favor of using an opcode cache, like PHP Accelerator, or APC.
Or something else like memcached
Yes, there is one (non-technical) point.
Your hosting provider can look at your code on their server. If you minify and uglify it, it is more difficult for prying eyes to steal your ideas.
So one reason for minifying and uglifying PHP may be protection from snooping. I think uglifying the code could be one step in an automated deployment.
With some rewriting (shorter variable names) you could save a few bytes of memory, but that's seldom significant.
However, I do design some of my applications in a way that allows the include scripts to be concatenated together. With php -w the result can be compacted significantly, adding a small speed gain at script startup. On a server with an opcode cache, however, this only saves a few file mtime checks.
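A small sketch of that "concatenate the includes" build step, using php_strip_whitespace() (the same stripping that php -w performs from the command line); the file names are made up, and it assumes each part starts with an opening PHP tag and has no closing tag:

    <?php
    $parts  = ['bootstrap.php', 'helpers.php', 'app.php'];
    $bundle = "<?php\n";
    foreach ($parts as $file) {
        $src = php_strip_whitespace($file);                  // comments + whitespace removed
        $bundle .= preg_replace('/^<\?php\s*/', '', $src) . "\n";
    }
    file_put_contents('bundle.php', $bundle);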
This is less an answer than an advertisement. I've been working on a PHP extension that translates Zend opcodes to run on a VM with static typing. It doesn't accelerate arbitrary PHP code, but it does allow you to write code that runs way faster than regular PHP allows.

The key here is static typing. On a modern CPU, a dynamic language eats branch-misprediction penalties left and right. The fact that PHP arrays are hash tables also imposes a high cost: lots of branch mispredictions, inefficient use of cache, poor memory prefetching, and no SIMD optimization whatsoever. Branch mispredictions and cache misses in particular are the Achilles' heel of today's processors. My little VM sidesteps those problems by using static types and C arrays instead of hash tables. The result ends up running roughly ten times faster, and that is with bytecode interpretation. The extension can optionally compile a function through gcc, in which case you get two to five times more speed.
Here's the link for anyone interested:
https://github.com/chung-leong/qb/wiki
Again, the extension is not a general PHP accelerator. You have to write code specific for it.
There are PHP compilers... see this previous question for a list; but (unless you're the size of Facebook or are targeting your application to run client-side) they're generally a lot more trouble than they're worth.
Simple opcode caching will give you more benefit for the effort involved. Or profile your code to identify the bottlenecks, and then optimise it.
You don't need to minify PHP.
To get better performance, install an opcode cache; but the ideal solution would be to upgrade your PHP to version 5.5 or above, because the newer versions ship with an opcode cache by default (Zend OPcache) that performs better than the others: http://massivescale.blogspot.com/2013/06/php-55-zend-optimiser-opcache-vs-xcache.html.
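A quick sanity check that the opcode cache is actually active on PHP 5.5+ (just a sketch; OPcache is enabled in php.ini with opcache.enable=1):

    <?php
    if (function_exists('opcache_get_status')) {
        $status  = opcache_get_status(false);      // false if the cache is disabled
        $enabled = is_array($status) && !empty($status['opcache_enabled']);
        echo $enabled ? "OPcache is on\n" : "OPcache is loaded but disabled\n";
    } else {
        echo "OPcache extension not loaded\n";
    }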
The "point" is to make the file smaller, because smaller files load faster than bigger files. Also, removing whitespace will make parsing a tiny bit faster since those characters don't need to be parsed out.
Will it be noticeable? Almost never, unless the file is huge and there's a big difference in size.

PHP Speed Vs Other Languages

I have heard a lot that PHP is slow compared to other languages. Is the speed difference noticeable enough that I should switch to another language? And if so, what other language would you recommend? Or what would be some good optimizations that could speed up PHP?
This question comes up a lot. The answer is:
Yes it's slower than C#, Java, C/C++, etc.
No it probably won't matter.
You can build large-scale PHP systems. 4 of the top 20 most-visited websites are powered by PHP (Facebook, Yahoo, Wikipedia, Flickr). PHP with an opcode cache (e.g. APC) can take you much further than you'll probably need or care about.
Most slow websites have nothing to do with the language they're using. A lot of the time spent on an HTTP request comes down to network latency, absent or ineffectual caching of static resources, lack of compression resulting in more bandwidth used than necessary, poorly performing JavaScript, and so on.
If you get really desperate for performance you can always use HipHop, which compiles PHP to C++.
PHP will be plenty fast enough for web site applications if you use best practices.
If you compare PHP to, say C++, of course it will be slower. But you need to consider total cost of development. Just because one language produces faster programs doesn't mean it will be more cost effective. Depending on your programming style, experience, and the project you are working on, you may find that a different language is better suited for the task.
If you use an opcode cache, you will get a very big speed gain simply by removing the need for accessing the disk and parsing the PHP files.
As with any language, you do need to be familiar with the data structures and how they are to be used efficiently. Poor algorithms will be slow regardless of the language, but especially in a scripting language where lots of "magic" happens under the hood.
To speed up PHP, try APC - Alternative PHP Cache.
It can cache the compiled code so the source code files don't need to be reparsed for every request.
More info about APC and other PHP accelerators can be found at Wikipedia.
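Besides caching opcodes, APC also offers a user-data cache you can use directly. A small sketch (build_report() is hypothetical; on newer setups the APCu equivalents are apcu_fetch/apcu_store):

    <?php
    function get_expensive_report()
    {
        $report = apc_fetch('daily_report', $hit);   // $hit is set to true on a cache hit
        if (!$hit) {
            $report = build_report();                // hypothetical expensive call
            apc_store('daily_report', $report, 300); // cache the result for 5 minutes
        }
        return $report;
    }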
It depends on the use case. A nice example to illustrate this:
When you use PHP as a server-side web scripting language, it will be faster than a C/C++ program running as a CGI (this is because for CGI a separate process needs to be created and some setup must be done, while PHP scripts run inside an HTTP server module and are just "ready to go").
On the other hand, when you use PHP for numerical computation, it will be drastically slower than a program written in C/C++.
PHP is designed to be a server-side web programming language, and that is what it should be used for. It is reasonably efficient for this task, and you can speed it up with caching tools. If even that is not enough, you can write an extension using the Zend API.
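To see the numerical-computation gap for yourself, a tiny CPU-bound loop like this (purely illustrative) is the kind of workload where a C/C++ version typically wins by an order of magnitude or more:

    <?php
    // Tight arithmetic with no I/O to hide behind - a worst case for an interpreted language.
    $start = microtime(true);
    $sum = 0.0;
    for ($i = 1; $i <= 10000000; $i++) {
        $sum += sqrt($i);
    }
    printf("sum = %.2f computed in %.3f seconds\n", $sum, microtime(true) - $start);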

Interpreted vs. Compiled Languages for Web Sites (PHP, ASP, Perl, Python, etc.)

I build database-driven web sites. Previously I have used Perl or PHP with MySQL.
Now I am starting a big new project, and I want to do it in the way that will result in the most responsive possible site.
I have seen several pages here where questions about how to optimize PHP are criticized with various versions of "it's not worth going to great lengths to optimize PHP since it's an interpreted language and it won't make that much difference".
I have also heard various discussions (especially on the SO podcast) about the benefits of compiled vs. interpreted languages, and it seems as though it would be in my interests to use a compiled language to serve up the site instead of an interpreted language.
Is this even possible in a web context? If so, what would be a reasonable language choice?
In addition to speed, one benefit I foresee is the possibility of finding bugs at compile time instead of having to debug the web site. Is this reasonable to expect?
What you can do is what many heavy-traffic websites (like Facebook or Twitter) do: write your CPU-consuming algorithm as a C plugin.
For example, you could write a PHP extension if you plan to use PHP, or a Ruby extension if you plan to use Ruby / Ruby on Rails, etc.
That way, you can keep your mainline code simple and easy to maintain (it can be much harder to handle requests from C than from PHP), while having a strong and solid core (because it's compiled, and the compiler tells you what the issues are at compile time).
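A full Zend/C extension is the classic way to do this. As a lighter-weight sketch of the same idea, PHP 7.4+'s FFI extension lets you call into a compiled C library directly (the library and function names here are hypothetical, and FFI must be enabled in php.ini):

    <?php
    // Declare the hot C function and load the compiled shared library.
    $ffi = FFI::cdef(
        "double heavy_crunch(double x);",   // C declaration of the hot function
        "libcrunch.so"                      // hypothetical library built from your own C code
    );
    echo $ffi->heavy_crunch(42.0);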
If you were going to build a new language... and you came up with all the semantics and it was complete, and you had some magic box that had a switch between making the language compiled vs. interpreted, the compiled version would be faster than the interpreted version.
Why? Because compiling brings your semantics down to a lower level on the machine, which means it can be executed much faster, whereas interpreting means the semantics of your language will be translated by something (i.e. the interpreter) when the user actually uses your site.
Having said that... that doesn't necessarily mean that your site is going to run 100% faster on a compiled language vs. an interpreted language. There are interpreters out there that are very fast nowadays for various languages (e.g. PHP), and there are even optimizers for interpreted languages that make them faster still.
There are many other things that go into the performance of your site that are agnostic of the language you choose. Hardware setup, Database setup, Network topology, etc. These things can have a bigger impact on you. I would suggest measuring to be sure.
For me, finding bugs at compile time is a huge time saver, so I tend to prefer compiled languages which are strongly typed. It lets me get my work done faster, but that doesn't make it objectively the best option. Some people have no issue writing weakly typed code, and running test suites on them to verify their functionality, which I would think would work just as well.
IMHO it makes little sense to write a complex web app in a compiled language, as it offers few benefits against a number of maintainability problems.
There are a lot of ways to raise performance and scalability in a scripted language, both at the language level and at the system level, which makes the minor performance gain you might eventually get from a compiled language largely irrelevant.
On the other hand, I find it very useful to be able to follow an agile development and bug-hunting approach, simply changing your code and seeing the results.
Perl isn't a purely interpreted language: it is compiled to bytecode, so you pay the compilation price only when the perl executable starts. So when using it with Apache, don't use CGI but mod_perl.
Whatever you do, development time is probably going to widely exceed response time if you pick a language that isn't suitable to web programming or doesn't have good libraries to support what you need to do. E.g. I'd never pick C or C++. You don't want a web app that is blisteringly fast but buggy and 6 months late.
Tomcat is a common way to use compiled languages to deploy webpages, but before you go too far, seriously consider what your speed bottlenecks will be. There are a few main sources of slowdown in web applications:
Network latencies
Static media, especially images
Database queries
Server-side processing code
Client-side processing code
1 and 5 don't really have much to do with this question.
2 will be relevant if you have many images that vary from page to page. If that's the case, client browsers won't do such a good job caching, and each page-load will take some time. In this case, it is very likely that your server-side language won't be noticed, because the overhead from static media will dominate.
3 is likely to be a bigger factor than 4 for a lot of applications. If you have very little data, but you do a whole lot of processing, then 4 may dominate, but otherwise, 3 will dominate even if you're using an interpreted language.
People can ask "why optimize PHP?" because 2 and 3 are often more important anyway. Often, a good database caching framework is going to be a better (and easier) optimization.
There are lots of parts that goes into a web application. The time taken by the application layer doesn't need to be big. For a typical application, the biggest hogs would be in the webserver and in the database. Replacing PHP with a binary cgi isn't going to change this.
Furthermore, while the interpreted parts of PHP may be somewhat slow, that is only a small part of what goes on in the execution of a PHP script. All the functions that are provided as part of the language are implemented in native code. For example, when you call a function like preg_match, it will call out to a native code library and let it do its work. This means that there is less actual interpretation going on than you might think.
There may be some cases where using a different language than PHP might be worthwhile, but those are special cases. In general, there is nothing to gain here.
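You can see this effect with almost any built-in. The sketch below (illustrative only) compares array_sum(), which runs in native code, with the equivalent hand-written PHP loop that re-enters the interpreter on every iteration:

    <?php
    $data = range(1, 1000000);

    $t = microtime(true);
    $a = array_sum($data);                     // implemented in C inside the engine
    $builtin = microtime(true) - $t;

    $t = microtime(true);
    $b = 0;
    foreach ($data as $v) {                    // each iteration is interpreted
        $b += $v;
    }
    $loop = microtime(true) - $t;

    printf("array_sum: %.4fs, PHP loop: %.4fs (same result: %s)\n",
           $builtin, $loop, $a === $b ? 'yes' : 'no');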
The latency of the network is by far the greatest determining factor in this argument. In fact, network latency is so much of a factor that it renders language considerations rather unimportant from a performance standpoint. So... go with what you know. Use the language that you are most comfortable and most productive with, and other considerations can be worked out as you go along. Now, that said, it's always fun to try new stuff, and learning new things can become an obsession, so if the project is a personal one that allows you the opportunity to experiment, well, by all means...

Simple Facebook HipHop Performance Question

If I write a hello world app using a PHP web framework such as CodeIgniter, and then compile it and run it using HipHop, will it run faster than if I write the same hello world app in Django or Rails?
HipHop converts PHP code into C++ code, which then needs to be compiled to run. Since pre-compiled code runs faster and uses less memory than scripting languages like Python/PHP, it will probably run faster in the example you have given.
However, HipHop does not convert all code. A lot of PHP code is dynamic and cannot be translated to C++, which means you will have to write your code with this in mind. Whether CodeIgniter can even be compiled with HipHop is another question.
Terry Chay wrote a big article about HipHop, covering when to use it, its limitations, and its future. I would recommend reading it, as it will most likely answer most of your questions and give you some insight into how it works :)
http://terrychay.com/article/hiphop-for-faster-php.shtml
At that point the run time is inconsequential. HipHop was designed for scaling... meaning billions of requests. There's absolutely no need to use something like HipHop for even a medium size website.
But more to the point of your question... I don't think there have been comparison charts available for us to see, but I doubt the run time would be faster at that level.
I don't know about Django or Rails, so this is a bit off-topic.
With plain PHP, the request goes to Apache, then to mod_php. mod_php loads the helloworld.php script from disk, parses and tokenizes it, compiles it to bytecode, then interprets the bytecode, passes the output back to Apache, and Apache serves it to the user.
With PHP and an opcode cache, the first run is about the same as with plain PHP, but the compiled bytecode is stored in RAM. For the second request, it goes to Apache, Apache to mod_php, APC loads the bytecode from RAM and interprets it, passes it back to Apache, and back to the user.
With HipHop there is no Apache, only HipHop itself, and there's no interpreter, so the request goes directly to HipHop and back to the user. So yes, it's faster, for several reasons:
Faster startup, because no bytecode compilation is needed - the program is already machine code, so there's no per-request compilation and no source file reading.
No interpreter. Machine code is not necessarily faster - that depends on the quality of the source translation (HipHop) and the quality of the static compiler (g++). HipHop-translated code is not fast compared to hand-written C code, because there's some overhead for type handling and such.
With node.js, there's also no Apache. The script is started and directly compiled to machine code (because the V8 compiler does that), so it's a kind of AOT (ahead-of-time) compilation (or is it still called JIT? I don't really know). Every request is then handled directly by the already-compiled machine code, so node.js is actually very comparable to HipHop. I assume HipHop is multithreaded or something like that, while node does evented IO.
Facebook claims a 50% speed gain, which is not really that much; if you compare the results of the language shootout, you'll see that for the execution speed of assorted algorithms, PHP is 5 to 250 times slower.
So why only 50%? Because...
Web apps depend on much more than just execution speed, e.g. IO.
PHP's type system prevents HipHop from making the best use of C++'s static types.
In practice, a lot of PHP is already C, because most of the functionality is either built in or comes from extensions, which are programmed in C and statically compiled.
I'm not sure there would be a huge performance gain for hello world, because hello world, even with a good framework, is still so small that execution speed is likely negligible compared to all the other overhead (network latency and such).
IMO: if you want speed and ease of use, go for node.js :)
Running a simple application is fast in any language. When it becomes as complex as Facebook, you will face numerous problems, and PHP's slowness will show its face. At the same time, converting existing code to another language is not an option, since all the logic and code is not easily translated to another language's syntax. That's why Facebook's developers decided to keep the old code and make PHP faster instead. That's the reason they created their own PHP compiler, called HipHop.
Read this story from the perspective of one of Facebook's developers to learn the history of HipHop.
That is not really an apples-to-apples comparison. On the most level playing field, you might have something like:
Django running behind apache
Django rendering an HTML template to say hello world (no caching)
AND
HPHP running behind apache
HPHP rendering an HTML template to say hello world (again, no caching)
There is no database, almost no file I/O, and no caching. If you hit the page 10,000 times with a load generator at varying concurrency levels, you will probably find that HPHP outperforms Django or Rails - that is to say, it can render more pages per second and keep up with your traffic a bit better.
The question is, will you ever have this many concurrent users? If you will, will they likely be hitting a database or a cached page?
HPHP sounds cool, but IMHO there is no reason to jump ship just yet (unless you are getting lots of traffic, in which case it might make sense to check it out).
Will it run faster than if I write the same hello world app in django or rails?
It probably will, but don't fret. If we're talking prospective speed improvements from yet unreleased projects, Pythonistas have pypy-jit and unladen-swallow to look forward to ;)
