I've been writing PHP for years, and have used every framework under the sun, but one thing has always bugged me... and that's that the whole bloody thing has to be interpreted and executed every time someone tells my server they want the page served.
I've experimented with caching, FastCGI, the Zend Job Queue (and symfony plug-ins that do similar - as well as my own DB-based solutions that implement the System_Daemon class to run background processes) and I've managed to make my apps fairly quick using all that stuff... but I can't get over the mental block that my settings files, system/environment check functions, and all the stuff that should only really be loaded ONCE... loads every darn time someone hits my page.
So, my ramble leads to the following Q--
Is there some method/technique for loading certain aspects of PHP into RAM so that when that page is requested, all my settings.yml files, system checks, framework files, cached pages etc can be loaded directly from memory without ever even touching the HD... or needing to go through the same loading mechanism 50,000 times per day to init the program?
If there's nothing in PHP... are there any other 'web' languages that can be compiled in this way, to allow for true init-once apps?
I think you should give memcached a try if you're talking about caching data. PHP is also fairly good at caching compiled PHP pages if you use something like mod_php in Apache (which doesn't die between requests).
Take a look at APC (Alternative PHP Cache). It keeps a cache of compiled files (PHP opcode) and also lets you store arbitrary variables in memory with apc_fetch and apc_store.
The installation is very simple and it really gives a boost in performance.
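As a minimal sketch of the user-cache side (the key name, file path and TTL below are just illustrative, not from the question):

// Fetch parsed settings from APC; fall back to parsing the file once.
$settings = apc_fetch('app.settings');
if ($settings === false) {
    $settings = parse_ini_file('/path/to/settings.ini', true); // the slow part
    apc_store('app.settings', $settings, 300);                 // keep for 5 minutes
}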
Create a full-page cache on a RAM disk and make your web server serve pages from there. This is the method the WordPress Super Cache plugin uses, and it works great if your site is suitable for full-page caching. This way you are not even invoking the PHP interpreter.
For users that are logged in (have an open session) you can create a rewrite condition that will redirect their request to the PHP engine.
Also, always use an opcode cache like APC and use it for caching config files (memcache is also fine).
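The real win here is that Apache serves the cached file without touching PHP at all; PHP only has to write the file out once. A rough sketch of that writing step, assuming a tmpfs mount at /dev/shm and a cookie-based logged-in check (both made up for illustration):

// At the end of the front controller, after the page has been rendered
// into an output buffer started earlier with ob_start():
$cacheDir  = '/dev/shm/pagecache';                       // assumed tmpfs mount
$cacheFile = $cacheDir . '/' . md5($_SERVER['REQUEST_URI']) . '.html';

if (empty($_COOKIE['logged_in'])) {                      // only cache anonymous pages
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0755, true);
    }
    file_put_contents($cacheFile, ob_get_contents());    // mod_rewrite serves this next time
}
ob_end_flush();                                          // send the page to the current visitor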
If you are asking for a JVM/Tomcat-like application server, then the answer is likely no. To my knowledge nothing (usable) like this exists for PHP. PHP uses a shared-nothing architecture, so by design everything is set up on every request. But this is actually what makes PHP scale pretty well.
As for speeding up your apps, try to use memcached and a code accelerator. Maybe look into Zend Server to get a complete package.
Regarding your last question, I believe at least most of the Python and Ruby web frameworks work like that.
Ruby web applications are nowadays built so that the app is only initialized once per server process. When requests come in, the server (Apache, for example) passes them over the Rack interface to the web application, which is running in the background.
This is how web frameworks based on Rack work. Older versions of Ruby on Rails were similar, although they used a different interface to talk to the web server.
I'd keep an eye on the Facebook Engineering page (http://www.facebook.com/notes.php?id=9445547199); every now and then they come up with posts about how they keep things fast/optimized/scalable. I think their use of PHP is super impressive.
Well, to be specific: I'm running my own content management system on a Linux, Apache2, MySQL, PHP server. The system is comparable to the Linux kernel with its modules.
--(request start)--
The system launches its "init" script, which takes care of dependency-based module loading (only the minimum set of modules is loaded, in the proper order, so it "just works"; disabled/unused modules are not loaded).
Once the system is ready, request processing begins: all the data gets loaded, parsed, processed, buffered, chewed and so on, until we have a complete (X)HTML page.
--(request end)--
Once the request is processed, the data are passed to the browser and the system is torn down. All this happens in a very short time, but the most CPU-intensive part is the beginning (preparing the system for use).
I have a few options:
Leave the system the way it is now (and risk performance issues once it's deployed for REAL usage, approx. 100-500 requests/s per system)
Do some kind of preloading (preparing the system manually and not letting anything magic happen at request time)
Find a way to keep the system in a ready-for-use state (all modules loaded, classes initialized, a ready MySQL link, etc.)
Question is:
Is there a way to accomplish point 3? (point 2 is what I want the least)
If it's possible, how?
Thanks for any advice that'll point me the right way!
Probably what you need is APC, eAccelerator or some other extension that parses your code and keeps it as bytecode in memory, which in CPU-hungry situations can help your performance a LOT. It seems that you have the knowledge to set up such an extension; I would recommend APC, it being the most used and tested one out there:
http://en.wikipedia.org/wiki/List_of_PHP_accelerators
Edit: For MySQL I would go with using persistent connections, which might help as well.
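As a quick illustration (credentials are placeholders): with the mysqli extension, prefixing the host name with "p:" asks PHP to reuse an existing connection instead of opening a new one on every request (PHP 5.3+).

// Persistent connection: note the "p:" prefix on the host.
$db = new mysqli('p:localhost', 'user', 'password', 'mydb');
if ($db->connect_errno) {
    die('Connect failed: ' . $db->connect_error);
}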
You may want to compile your PHP into C++ or Java and gain performance by sacrificing a little bit of flexibility; check Wikipedia for more info on HipHop.
This is not a PHP question, but my expertise is with PHP frameworks.
A lot of frameworks have a bootstrapping (loading of classes and files) mechanism (Drupal and Zend Framework, to name a few).
Every time you make a request, the complete bootstrapping process needs to be repeated. It can be optimized using APC by automatically caching some intermediate code.
The general question is:
For any language, is there any way to not run the complete bootstrapping process on every request? Is there any way of "caching" the state at the end of the bootstrapping process (or starting from it) so as not to load everything again? (Maybe the answer is in some other language/framework/pattern.)
It looks extremely inefficient to me.
In general, it's quite possible to perform bootstrap/init code once per process, instead of having to reload it for every request. In your specific case, I don't think this is possible with PHP (but my knowledge of PHP is limited). I know I have seen this as a frequent criticism of PHP's architecture... but to be fair to PHP, it's not the only language or framework that does things this way. To go into some detail...
The style of "run everything for every request" came about with "CGI" scripts (c.f. Common Gateway Interface), which were essentially just programs that got executed as a separate process by the webserver whenever a request came in matching the file, and predefined environmental variables would be set providing meta information. The file could be basically any executable, written in any language. Since this was basically the first way anyone came up with of doing server-side scripting, a number of the first languages to integrate into a webserver used the cgi interface, Perl and PHP among them.
To eliminate the inefficiency you identified, a second method was devised, which used plugins into the web server itself... for Apache, this includes mod_perl for Perl and mod_python for Python (the latter now replaced by mod_wsgi). Using these plugins, you could configure the server to load a program once per process, which then does the requisite initialization, loads its persistent state into memory, and offers up a single function for the server to call whenever there is a request. This can lead to some extremely fast frameworks, as well as things such as easy database connection pooling.
The other solution that was devised was to write a web server (usually stripped down) in the language required, and then use the real web server as a proxy for the complicated requests, while still serving static files directly. This route is used frequently by Python (quite often via the server provided by the 'Paste' project). It's also used by Java, through the Tomcat web server. These servers, in turn, offer approximately the same interface as I mentioned in the last paragraph.
The short answer is: in PHP there's no good way to skip the bootstrapping. (Technically you could run a PHP service 24/7 that ran forked children to handle requests, but that's not going to make your life any better.)
A good framework shouldn't do much in bootstrapping. In my personal one that I use, it simply registers an autoload function for classes, loads the config settings from MemCache, and connects to a database.
At that point, it parses the request and sends it to the proper controller/action. While creating the new router object every time is a "waste," the actual process of handling the request needs to be done regardless of whether the bootstrapping process is magically "cached" between requests.
So I would measure the time it takes between starting the page and getting to the action method to see if it's even a problem. If the framework is doing expensive things related to configuration and class loading, you should be able to minimize that via storing the end results in memcache.
Note that you should always be using an opcode cache (e.g. APC) and a persistent SAPI (e.g., php-fpm) in production. Otherwise, there is a lot of overhead with starting up and shutting down.
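To make that concrete, here is a minimal sketch of the kind of lightweight bootstrap described above. It is not the author's actual code: the file layout, cache keys and config structure are made up, and it caches the config in APC rather than memcache for brevity.

// bootstrap.php: runs on every request, but does very little work.
spl_autoload_register(function ($class) {
    require __DIR__ . '/lib/' . str_replace('\\', '/', $class) . '.php';
});

// Config parsing is the expensive step, so its result is cached in APC.
$config = apc_fetch('app.config');
if ($config === false) {
    $config = parse_ini_file(__DIR__ . '/config.ini', true);
    apc_store('app.config', $config);
}

// The database connection; the DSN lives in the cached config.
$db = new PDO($config['db']['dsn'], $config['db']['user'], $config['db']['pass']);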
I would suggest you look into FastCGI and a C/C++ interface if you want to handle multiple requests per process. It usually brings many problems (such as data caching/flushing, memory leaks, etc.), but can raise performance 10-100 times.
PHP is more suitable for the web interface, and if you need fast processing then you can write a persistent handler.
Also take a look at Java / Tomcat, Python and mod_perl. Some people have also suggested xcache.
As for PHP frameworks, they would need to support a multi-request structure in their core, and I'm not aware of any framework doing that.
That said, I'd love to have a project that would let a PHP script respond to multiple requests inside a loop; not simultaneously, but bypassing the initialization.
Also you can take a look at https://github.com/kvz/system_daemon, and http://gearman.org/.
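Gearman, mentioned above, can get you surprisingly close to that "initialize once, answer many requests" loop if you're willing to push work to background workers. A rough sketch using the PECL gearman extension; the job name, config file and what the handler does are all made up:

// worker.php: started once (e.g. by System_Daemon), runs until killed.
$config = parse_ini_file('/path/to/config.ini');            // expensive init, done once
$db     = new PDO($config['dsn'], $config['user'], $config['pass']);

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('render_page', function (GearmanJob $job) use ($db) {
    $uri = $job->workload();
    // ... build the response using the already-initialized $db ...
    return 'rendered output for ' . $uri;
});

while ($worker->work()) {
    // Loop forever, reusing the same process (and its state) for every job.
}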
I've used Rails, Merb, Django and ASP.NET MVC applications in the past. What they have in common (and what is relevant to the question) is that they have code that sets up the framework. This usually means creating objects and state that are persisted until the web server is recycled (like setting up routing, or checking which controllers are available, etc.).
As far as I know, PHP is more like a CGI script that gets compiled to some bytecode each time it's run and is discarded after the request. Of course you can have sessions to persist data between requests from the same user, and as I understand it there are extensions like APC with which you can persist objects between requests at the server level.
My question is: how can one create a PHP application that works like Rails and the others? I mean an application that on the first request sets up the framework, and then on the second and later requests uses the objects that are already set up. Is there some built-in caching facility in mod_php? (For example, one that stores the compiled bytecode of the executed PHP applications.) Or is using APC or some similar extension the only way to solve this problem? How would you do it?
Thanks.
EDIT: Alternative question: if I create a large PHP application that has a very large set up time, but minor running time (like in the frameworks mentioned above) then how should I "cache" the things that are already set up (this might mean a lot of things, except for maybe the database connections, because for that you have persistent connections in PHP already).
To justify large set up time: what if I'm using PHP reflection to check what objects are available and set the runtime according to that. Doing a lot of reflection is usually slow, but one has to do it only once (and re-evaluate only if the source code is modified).
EDIT2: It seems it's APC then. The fact that it caches bytecode automatically is good to know.
Not sure if APC is the only solution but APC does take care of all your issues.
First, your script will be compiled once with APC and the bytecode is stored in memory.
If you have something taking a long time to set up, you can also cache it in APC as user data. For example, I do this all the time:
$table = @apc_fetch(TABLE_KEY);   // returns false on a cache miss
if (!$table) {
    $table = new Table();         // takes a long time
    apc_store(TABLE_KEY, $table); // cache it for later requests
}
With APC, the task of creating the table is only performed once per server instance.
PHP (and Ruby, for that matter) are interpreted languages. That is, they parse the files each time they are requested, and I suppose you could say the files are converted to a pseudo-bytecode. One could say this is more 'apparent' with PHP than with, say, RoR, but they both behave the same way.
The feature of persisting data between requests is a feature of the server, not of the language itself. For example, the RoR routing you speak of is in fact cached, but it's cached in the server's local memory; it isn't compiled and stored for faster reading. If the server restarts (and by server I mean both the box and the web service instances), this information is gone. The 'setting up the framework' you speak of still involves parsing EACH file involved in the framework. Rails parses each file during the request again and again; the production-level features may in fact cache this data in memory, but in development it certainly does not. The only reason I mention that is because it illustrates that it's a feature of the server, not the language.
To achieve the same thing in PHP you could use Zend Server. As far as I know this is the only PHP interpreter that will 'compile' and use bytecode when told to. Otherwise you'll need to find a way to store the data you want to persist across requests. APC, as you mentioned, is a very powerful option; a more distributed one is Memcached, and then of course there are more persistent forms like disk and SQL.
I am interested in knowing why you'd like this particular feature. Are you noticing performance issues that would be 'solved' by doing this?
I think you're making some incorrect generalizations. All of those frameworks (ex: Rails) can be run with different configurations. Under some, a process is created for every request. This obviously hurts performance, but it shows that these frameworks don't rely on a long-running process. They can set things up (reparse config files, create objects, etc.) every request if needed.
Of course, mod_php (the way PHP is usually used) runs inside the web server process, unlike CGI. So I don't see anything fundamentally different between CakePHP (for example) and Rails.
I think perhaps you are looking for something like Python's WSGI or Ruby's Rack, but for PHP. This specifies an interface (independent of how the language is run) for an application. For a new request, a new instance of an application object is created. As far as I know, this does not exist for PHP.
A friend has recommended that I install PHP APC, claiming it will help PHP run faster and use less memory.
Sounds promising, but I'm a little nervous about adding it to my VPS server.
I have one small app that I've built using CodeIgniter, and several sites that use the popular SlideShowPro photo gallery software.
Could installing this break any of the back-end code on my sites?
I'm no high-tech server guy, but should I give this a try?
Depends entirely on your situation.
Is your site unresponsive or slow at the moment? Is this definitely due to the PHP scripts and not any other data sources such as a database or remote API?
If you answered yes to the above, then installing one of the many PHP accelerators out there would be a good shout. As for using less memory, that's largely dependent on your Apache/lighttpd/nginx config and php.ini variables.
Most PHP accelerators work by converting PHP code that would otherwise be interpreted on every request into opcodes, which are then stored in memory (RAM) for fast access. If you haven't already implemented file-based caching in CodeIgniter, then the benefits of installing a PHP accelerator would be noticeable, but I suggest you implement the CodeIgniter caching first before moving straight over to (wasting?) spending time trying to install APC manually.
If your site is currently performing well and you're not too confident in your *nix skills, then I suggest you try implementing CodeIgniter caching first rather than messing with an already working VPS.
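For reference, CodeIgniter's built-in page caching is a one-liner in a controller method via $this->output->cache() (cache time in minutes); the controller and view names below are made up:

class Gallery extends CI_Controller {

    public function index()
    {
        $this->output->cache(10);           // cache the rendered page for 10 minutes
        $this->load->view('gallery_view');  // rendered once, then served from the cache dir
    }
}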
My personal preference is PHP eAccelerator.
Should installing a PHP cache engine not improve your site's performance, then I suggest you look at what other factors influence your application; as stated above, these could be the database or an API, to name a few.
Hope this helps.
APC is basically a cache engine that stores your compiled PHP scripts in a temporary location on your server, meaning that they do not have to be interpreted every time someone calls your script. It is a PHP extension that can safely be turned ON or OFF, and it does not affect your actual code. So... do not fear!
When a php script is processed, there is a compilation phase, where php converts the source code of the php files into "opcodes". APC simply caches the result of this compilation phase, so it should be safe to turn on.
That said, when making such changes to production code it is always wise to run a regression test to ensure no new issues have been introduced.
What is the best way of implementing a cache for a PHP site? Obviously, there are some things that shouldn't be cached (for example search queries), but I want to find a good solution that will make sure that I avoid the 'digg effect'.
I know there is WP-Cache for WordPress, but I'm writing a custom solution that isn't built on WP. I'm interested in either writing my own cache (if it's simple enough), or you could point me to a nice, light framework. I don't know much Apache though, so if it was a PHP framework then it would be a better fit.
Thanks.
You can use output buffering to selectively save parts of your output (those you want to cache) and display them to the next user if they were generated recently enough. This way you're still rendering other parts of the page on the fly (e.g., customizable boxes, personal information).
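A minimal sketch of that idea, using APC as the backing store (the key, lifetime and render_sidebar() helper are all hypothetical):

$key  = 'fragment_sidebar';
$html = apc_fetch($key);
if ($html === false) {
    ob_start();
    render_sidebar();           // hypothetical function that echoes the cacheable box
    $html = ob_get_clean();
    apc_store($key, $html, 60); // reuse the rendered fragment for 60 seconds
}
echo $html;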
If a proxy cache is out of the question, and you're serving complete HTML files, you'll get the best performance by bypassing PHP altogether. Study how WP Super Cache works.
Uncached pages are copied to a cache folder with a URL structure similar to your site's. On later requests, mod_rewrite notices the cached file exists and serves it instead. Other RewriteCond directives are used to make sure commenters/logged-in users get live PHP requests, but the majority of visitors will be served by Apache directly.
The best way to go is to use a proxy cache (Squid, Varnish) and serve appropriate Cache-Control/Expires headers along with ETags: see Mark Nottingham's Caching Tutorial for a full description of how caches work and how you can get the most performance out of a caching proxy.
Also check out memcached, and try to cache your database queries (or better yet, pre-rendered page fragments) in there.
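Something like the following is the usual pattern for caching a query result in memcached; the server address, key, query and the pre-existing $pdo connection are all assumptions for illustration:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$rows = $mc->get('recent_posts');
if ($rows === false) {                                    // cache miss: hit the database
    $stmt = $pdo->query('SELECT id, title FROM posts ORDER BY id DESC LIMIT 10');
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $mc->set('recent_posts', $rows, 120);                 // cache the result for two minutes
}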
I would recommend Memcached or APC. Both are in-memory caching solutions with dead-simple APIs and lots of libraries.
The trouble with those two is that you need to install them on your web server (or on another server, in Memcached's case).
APC
Pros:
Simple
Fast
Speeds up PHP execution also
Cons:
Doesn't work for distributed systems, each machine stores its cache locally
Memcached
Pros:
Fast(ish)
Can be installed on a separate server for all web servers to use
Highly tested, developed at LiveJournal
Used by all the big guys (Facebook, Yahoo, Mozilla)
Cons:
Slower than APC
Possible network latency
Slightly more configuration
I wouldn't recommend writing your own; there are plenty out there. You could go with a disk-based cache if you can't install software on your web server, but there are possible race conditions to deal with: one request could be writing to the file while another is reading it.
You actually could cache search queries, even for a few seconds to a minute. Unless your db is being updated more than a few times a second, some delay would be OK.
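One common way around the read/write race mentioned above is to write to a temporary file and rename() it into place, which is atomic on the same filesystem; the paths and the $html variable below are illustrative:

$cacheFile = '/tmp/cache/page.html';
$tmp = tempnam(dirname($cacheFile), 'cache_');  // temp file on the same filesystem
file_put_contents($tmp, $html);                 // $html holds the rendered page
rename($tmp, $cacheFile);                       // readers see either the old file or the new one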
The PHP Smarty template engine (http://www.smarty.net) includes a fairly advanced caching system.
You can find details in the caching section of the Smarty manual: http://www.smarty.net/manual/en/caching.php
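The basics look roughly like this with the Smarty 2-era API (Smarty 3 renames is_cached() to isCached()); the template name and load_posts() helper are made up:

require 'Smarty.class.php';

$smarty = new Smarty();
$smarty->caching        = 1;      // enable caching
$smarty->cache_lifetime = 300;    // seconds

if (!$smarty->is_cached('index.tpl')) {
    $smarty->assign('posts', load_posts());  // only do the expensive work on a cache miss
}
$smarty->display('index.tpl');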
You seem to be looking for a PHP cache framework.
I recommend the template system TinyButStrong, which comes with a very good CacheSystem plugin.
It's simple, light, customizable (you can cache whatever part of the HTML file you want), and very powerful ^^
For simple caching of pages, or parts of pages, there is the PEAR Cache_Lite class. I also use APC and memcache for different things, but the other answers I've seen so far are aimed at more complete, complex systems. If you just need to save some effort rebuilding a part of a page, Cache_Lite with a file-backed store is entirely sufficient and very simple to implement.
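A quick illustration of that pattern (the cache directory, id and build_sidebar() helper are arbitrary):

require_once 'Cache/Lite.php';

$cache = new Cache_Lite(array('cacheDir' => '/tmp/', 'lifeTime' => 600));

if (($fragment = $cache->get('homepage_sidebar')) === false) {
    $fragment = build_sidebar();                  // hypothetical slow rendering function
    $cache->save($fragment, 'homepage_sidebar');
}
echo $fragment;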
Project Gazelle (an open-source torrent site) provides a step-by-step guide to setting up Memcached, which you can easily apply to any other website you might set up that will handle a lot of traffic.
Grab the source and read the documentation.