PHP vs. long-running process (Python, Java, etc.)?

I'd like to have your opinion about writing web apps in PHP vs. a long-running process using tools such as Django or Turbogears for Python.
As far as I know:
- In PHP, pages are fetched from the hard-disk every time (although I assume the OS keeps files in RAM for a while after they've been accessed)
- Pages are recompiled into opcode every time (although tools from e.g. Zend can keep a compiled version in RAM)
- Fetching pages every time means reading global and session data every time, and re-opening connections to the DB
So, I guess PHP makes sense on a shared server (multiple sites sharing the same host) to run apps with moderate use, while a long-running process offers higher performance with apps that run on a dedicated server and are under heavy use?
Thanks for any feedback.

After you apply memcache, opcode caching, and connection pooling, the only real difference between PHP and the other options is that PHP is short-lived and process-based, while the others are typically long-lived and multithreaded.
The advantage PHP has is that it's dirt simple to write scripts. You don't have to worry about memory management (it's always released at the end of the request), and you don't have to worry much about concurrency.
The major disadvantage, as far as I can see, is that some more advanced (sometimes crazier?) things are harder: pre-computing results, warming caches, reusing existing data, request prioritization, and asynchronous programming. I'm sure people can think of many more.
Most of the time, though, those disadvantages aren't a big deal. You can scale by adding more machines and using more caching. The average web developer doesn't need to worry about concurrency control or memory management, so taking the minuscule hit from removing them isn't a big deal.

With APC, which is soon to be included in PHP by default, compiled bytecode is kept in RAM.
With mod_php, which is the most popular way to run PHP, the PHP interpreter stays in the web server's memory.
With the APC data store or memcache, you can keep persistent objects in RAM instead of, for example, recreating them on every request by fetching data from the DB.
In a real-life deployment you'd use all of the above.
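To make the APC data-store idea concrete, here is a minimal sketch of caching a database result in APC's user cache; apc_fetch() and apc_store() are the real APC functions, while get_user_from_db() is a hypothetical stand-in for your own DB code:

    <?php
    // Cache a user row in APC's in-RAM data store so repeated requests
    // skip the database entirely until the entry expires.
    function get_user($id) {
        $key = 'user_' . $id;
        $user = apc_fetch($key, $success);
        if (!$success) {
            $user = get_user_from_db($id); // hypothetical DB lookup
            apc_store($key, $user, 300);   // keep it in RAM for 5 minutes
        }
        return $user;
    }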

PHP is fine for either use, in my opinion; the performance overhead is rarely noticed. It's usually other processes that delay the program. It's easy to cache PHP programs with something like eAccelerator.

As many others have noted, neither PHP nor Django is going to be your bottleneck. Hitting the hard disk for the bytecode in PHP is irrelevant for a heavily trafficked site, because caching will take over at that point. The same is true for Django.
Model/view and user-experience design will have order-of-magnitude benefits to performance over the language itself.

PHP is a language like Java etc.
The only difference is that your executable is the PHP binary rather than the JVM. You can set a different max runtime for PHP scripts without any problems (if your shared hosting provider lets you).
Where your apps run shouldn't depend on the kind of server. It should depend on the resources used by the application (CPU time, RAM) and on what your server/VServer/shared host provides.
For performance tuning you should have a look at eAccelerator etc.
Apache also supports modules for connection pooling; see mod_dbd (a sketch of connection reuse from the PHP side follows below).
If you need to scale (like in a cluster) you can use distributed memory caching systems like memcached!
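As a rough illustration of connection reuse from the PHP side (mod_dbd itself is configured in Apache, not PHP), PDO can be asked to keep connections open across requests; the DSN and credentials below are placeholders:

    <?php
    // PDO::ATTR_PERSISTENT asks the driver to reuse an existing connection
    // (per Apache/PHP process) instead of opening a new one each request.
    $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret', array(
        PDO::ATTR_PERSISTENT => true,
    ));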

Related

Does loading .php files that you don't use affect performance?

Just wondering... does it? And how much?
Like including 20 .php files with classes in them, but without actually using the classes (they might be used, though).
I will give a slightly different answer to this:
If you are running on a tuned VPS or dedicated server: a trivial amount.
If you are running on a shared hosting service: it can considerably degrade performance of your script execution time.
Why? Because in the first case you should have configured a PHP opcode cache such as APC or XCache, which can, in practical terms, eliminate script load and compilation overheads. Even where files need to be read or stat-checked, the metadata and file data will be "hot" and therefore largely cached in the file-system cache if the (virtual) server is dedicated to the application.
On a shared service everything runs in the opposite direction: PHP is run as a per-request image under the user's UID; no opcode caching solutions support this mode, so everything needs to be compiled. The killer here is that files need to be read, and many (perhaps most) shared LAMP hosting providers use a scalable server farm for the LAMP tier, with the user data on shared NFS-mounted NAS infrastructure. Since these NFS mounts will have an acregmin of less than 1 min, the I/O requests will require RPCs off-server. My blog article gives some benchmarks here. The details for a shared IIS hosting template are different, but the net effects are similar.
I run the phpBB forum package on my shared service and I roughly halved response times by aggregating the common set of source includes as I describe here.
Yes, though by how much depends on a number of things. The performance cost isn't too high if you are using a PHP accelerator, but things will slow drastically if you aren't. Your best bet is generally to use autoloading, so you only load things at the point of actual use rather than loading everything just in case. That may reduce your memory consumption too.
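A minimal sketch of that autoloading approach, assuming classes live one-per-file in a classes/ directory (the layout and the Forum class are made up for the example):

    <?php
    // Register an autoloader: a class file is only read from disk the
    // first time the class is actually used.
    spl_autoload_register(function ($class) {
        $file = dirname(__FILE__) . '/classes/' . $class . '.php';
        if (is_file($file)) {
            require $file;
        }
    });

    // No include cost is paid until this line actually runs:
    $forum = new Forum();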
Of course it affects the performance. Everything you do in PHP does.
How much is a matter of how much data is in them and how long it takes to execute them, or, in the case of classes, to read them.
If you're not using them, why include them? I assume you're using some main engine file or header file, and you should rethink your methods of including files.
EDIT: Or, as @Pekka pointed out, you can autoload classes.
Short answer - yes it will.
For longer answers, a quick Google search revealed these: Will including unnecessary php files slow down website?; PHP Performance on including multiple files
Searching helps!

PHP website Optimization

I have a high-traffic website and I need to make sure my site is fast enough to display my pages to everyone rapidly.
I searched Google for articles about speed and optimization, and here's what I found:
- Cache the page in memory
- Save it to the disk

Caching the page in memory:
This is very fast, but if I need to change the content of my page I have to remove it from the cache and then re-save the file on the disk.
Saving it to disk:
This is very easy to maintain, but every time the page is accessed I have to read from the disk.
Which method should I go with?
Jan & idm are right but here's how to:
Caching (pages or contents) is crutial for performance. The minimum calls you request to the database or the file system is better whether if your content is static or dynamic.
You can use a PHP accelerator if you need to run dynamic content:
My recommendation is to use Alternative PHP Cache (APC)
Here are some benchmarks:
What is the best PHP accelerator to use?
PHP Accelerators : APC vs Zend vs XCache with Zend Framework
Lighttpd – PHP Acceleration Benchmarks
For caching content and even whole pages you can use Memcached or Redis (a short sketch follows after the benchmark links below).
Memcached:
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Redis
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
Both are very good tools for caching content or variables.
Here are some benchmarks so you can choose whichever you prefer:
Redis vs Memcached
Redis vs Memcached
Redis VS Memcached (slightly better bench)
On Redis, Memcached, Speed, Benchmarks and The Toilet
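A rough sketch of the Memcached idea in PHP, using the standard Memcached extension; the server address, the cache key, and the hypothetical render_page() helper are assumptions:

    <?php
    // Store a rendered page in memcached so the next request skips the
    // expensive page-building step until the entry expires.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $html = $mc->get('page_home');
    if ($html === false) {                // cache miss
        $html = render_page('home');      // hypothetical expensive rendering
        $mc->set('page_home', $html, 60); // expire after 60 seconds
    }
    echo $html;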
You can also install Varnish, nginx, or G-WAN.
Varnish:
Varnish is an HTTP accelerator designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, such as Squid, which began life as a client-side cache, or Apache, which is primarily an origin server, Varnish was designed from the ground up as an HTTP accelerator.
nginx
nginx (pronounced "engine-x") is a lightweight, high-performance web server/reverse proxy and e-mail (IMAP/POP3) proxy, licensed under a BSD-like license. It runs on Unix, Linux, BSD variants, Mac OS X, Solaris, and Microsoft Windows.
g-wan
G-WAN is a web server with ANSI C scripts and a key-value store, which outperforms all other solutions.
Here are some benchmarks so you can choose whichever you prefer:
Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN
Web Server Performance Benchmarks
Nginx+Varnish compared to Nginx
Apache, Varnish, nginx and lighttpd
G-WAN vs Nginx
You have a good idea, which is close to what I do myself. If I have a page that is 100% static, I'll save an HTML version of it and serve that to the user instead of generating the content again every time. This saves both MySQL queries and, in some cases, several I/O operations. Every time I make a change, my administration interface simply removes the HTML file and recreates it.
This method has proven to be around 100x faster on my server.
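A bare-bones sketch of that approach, assuming a writable cache/ directory and a hypothetical build_page() function:

    <?php
    // Serve the pre-generated HTML file when it exists; otherwise build
    // the page once, save it, and serve it.
    $id = (int)$_GET['id'];
    $cacheFile = dirname(__FILE__) . '/cache/page_' . $id . '.html';

    if (is_file($cacheFile)) {
        readfile($cacheFile);              // cheap: no DB queries at all
    } else {
        $html = build_page($id);           // hypothetical page builder
        file_put_contents($cacheFile, $html);
        echo $html;
    }
    // On edit, the admin interface just calls unlink($cacheFile) and the
    // page is regenerated on the next request.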
The big question with website performance is "do you serve static pages, or do you serve dynamic pages?".
Static pages
The best way to speed up static pages is to cache them outside your website. If you can afford to, serve them from a CDN (Akamai, Cotendo, Level3). In this case, the traffic never hits your site. There are several ways to control the cache - from fixed duration to the standard HTTP cache directives.
Even if you can't serve your HTML from a CDN, storing your images, javascript and other static assets on a CDN can speed up your site - you could use a cloud service like Amazon for this.
If you can't afford a CDN for your HTML, you could use your own caching proxy layer, as book of Zeus suggests. I've had good results with Varnish. Ideally, you'd run your caching proxy on its own hardware - but you can run it on your existing servers.
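For the "standard HTTP cache directives" mentioned above, a minimal example of what the PHP side might send so a CDN or caching proxy may hold the response (the one-hour lifetime is illustrative):

    <?php
    // Allow any shared cache (CDN, Varnish, browser) to keep this
    // response for up to an hour.
    header('Cache-Control: public, max-age=3600');
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');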
Dynamic pages
Dynamic pages are harder to cache, so you need to concentrate on making the pages themselves as efficient as possible. This basically means hunting for the bottleneck; in most systems the bottleneck is the database (but by no means always).
If you're confident your bottleneck is the database, there are several caching options: you can cache "snippets" of HTML, or you can cache database queries. Using an accelerator helps with this; I wouldn't invent one from scratch. This probably means re-architecting (parts of) your application.
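One way to cache an HTML "snippet", sketched with output buffering and memcache; $mc is assumed to be an already-connected Memcached instance, and render_top_posts() is hypothetical:

    <?php
    // Render the expensive block inside an output buffer and store the
    // resulting HTML fragment, so later requests echo it directly.
    $snippet = $mc->get('sidebar_top_posts');
    if ($snippet === false) {
        ob_start();
        render_top_posts();                // expensive: several DB queries
        $snippet = ob_get_clean();
        $mc->set('sidebar_top_posts', $snippet, 120);
    }
    echo $snippet;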
You have to profile your site first.
Instead of guessing wildly, you have to identify the specific bottleneck(s) and then solve those specific problems.
Caching is not a silver bullet, nor is it a synonym for optimization.
Sometimes caching is not applicable (for ads, for example); sometimes it will help nothing, as the cause of the site's slowness may be in some unrelated spot.
Your site may be running out of memory, in which case memory caching will make things worse.
I can't believe someone has a high-traffic site and said not a word about prior profiling. How can you run it knowing nothing of its internals: CPU load, memory load, disk I/O and such?
I can add:
- Cache everything you can
- Minimize the number of includes
- Use an accelerator

Please investigate what makes your site slow. Don't forget about YSlow and similar tools; they can help you a lot.
Besides, if you have heavy calculations, you could write a PHP extension for them, but I don't think this is your case.

Variable caching software (APC, Memcached) performance

You need to cache arbitrary data, like results of PHP logic within methods, database query calls, and generally any data resulting from a process (not opcode caching).
What would you choose between third-party caching software like APC and Memcached? What makes you prefer those tools to caching your data on your local file system?
Go with Memcache. It has a lot more support and a larger community (because it can be used from multiple languages). It supports access from multiple servers, allowing for a more scalable architecture.
That being said, still install APC or another opcode cache for PHP. It will significantly speed up PHP's execution time.
They're both different. APC is a local machine cache specific to PHP and memcached is a multiple-computer distributed cache. If you're trying to scale your programs memcached is often preferred. If you're designing for a single server then APC will suit you better.
I personally prefer a combination of both.
Simple answer: Memcache and APC store the data in memory, not on the disk. Access time is MUCH faster.
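An illustrative sketch of the "combination of both" approach mentioned above: check the local APC cache first, fall back to the shared memcached pool, and only then hit the database. The helper name and TTL are made up for the example:

    <?php
    // Two-tier lookup: APC is fastest (same-machine shared memory) but
    // local; memcached is shared across the whole web farm.
    function cache_get($key, Memcached $mc) {
        $value = apc_fetch($key, $hit);
        if ($hit) {
            return $value;
        }
        $value = $mc->get($key);
        if ($value !== false) {
            apc_store($key, $value, 30); // repopulate the local tier
            return $value;
        }
        return false; // caller falls back to the database
    }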

Why is PHP apt for high-traffic websites?

I was surprised to learn today that PHP is used widely in high-traffic websites.
I always thought that PHP is not strong in terms of performance, being a dynamic scripting language (e.g. compared to statically typed, compiled languages like C/Java/C#).
So how come it performs so well?
What you'll usually find is that it's not as slow as you think. The reason a lot of sites are slow is because the hosts are overloaded.
But one primary benefit of PHP over a compiled language is ease of maintenance. Because PHP is designed from the ground up for HTTP traffic, there's less to build than with most other compiled languages. Plus, merging in changes becomes easier as you don't need to recompile and restart the server (as you would with a compiled binary)...
I've done a considerable amount of benchmarking on both, and for anywhere under about 50k requests per second (based on my numbers) there really isn't a significant gain to using a compiled binary (FastCGI). Sure, it's a little faster using compiled C, but unless you're talking Facebook-level traffic, that's not really going to mean significant $$$. And it's definitely not going to offset the relatively rapid rate of development that PHP affords in comparison to C (which will more than likely require many times the code, since it's not memory-managed)...
PHP, if properly written can be quite scalable. The limiting factors are typically in your database engine. And that's going to be a common factor no matter what technology you use...
Java deployments in a big enterprise setting are a mess... fighting with builds and code that might not compile over the slightest little things. Also, PHP runs on a fairly simple server setup, not the bulky stack that is WebLogic (or others), so others are right that it's low-cost to develop and cheap to deploy on several different machines. It certainly didn't help that I was soured by working in a large, VERY inefficient corporate setting while doing Java...
I wouldn't say that PHP developers are cheaper per se (I make more now as a PHP developer than I did as a Java UI developer), but I do know that my last employer paid me for a not-insignificant amount of time spent configuring, deploying, compiling, etc. that is not required in PHP. We're talking probably one day a week of configuration fussing due to new branch rollouts or release-related configuration. So the extra I'm paid now is made up for by the significantly larger amount of code I'm able to work through each week.
PHP is certainly being helped by the fact that MySQL and Postgres (to a smaller extent) have become so much more powerful. They're not directly linked, but having that as a common pairing just sweetens the deal for those making decisions.
It doesn't really perform "so well", just well enough to be used. Keep in mind, though, that Java and C#.NET are also run as bytecode inside a VM. PHP, with tools such as Zend Optimizer, can also skip the compilation step and run as bytecode.
PHP will not run as fast as native, compiled C code, but websites such as Facebook compile PHP to C++ to make it run faster (see HipHop-PHP).
Most websites have performance bottlenecks when querying a database, etc. The amount of time the script spends executing is usually small compared to this. Using things like libmemcached can help mitigate this.
Many sites started as low-traffic sites. Once you have your PHP website running and suddenly you have to handle much higher traffic, it's cheaper just to buy more servers than to rewrite your app from PHP to something else. Moreover there are tools that improve PHP performance.
Also note, that there are other factors: database, caching strategy which affect performance more than PHP itself.
It doesn't, which is why there are projects like HipHop, but dynamic languages are often faster to develop in, and hardware is cheaper than developers.
In my opinion, the stateless nature of PHP is the most important factor in its scalability. It's been a while since I've done any web work with Java/ASP.NET, but I recall that both technologies have a central application "engine" that all requests are piped through. That's great, because information and state can be shared between instances, and a lot of bootstrapping (reading configuration files, connecting to databases, etc.) can be done once and then shared among instances. It's bad, though, because that central "engine" itself becomes a bottleneck for the whole application.
The lack of a central engine in PHP also means scaling your application is usually a simple matter of adding another web server to your rig (although scaling the database along with it is more complicated). I imagine scaling a Java/ASP.NET application is a good deal more complicated, and they reach a saturation point where adding more hardware gives less of a boost each time.

Is it possible to retain a variable in memory (RAM) in PHP?

I'm studying high-performance coding for websites in PHP, and this idea popped into my mind:
We know that accessing a database uses a significant amount of CPU, so we cache such data by saving it to the HDD. But I was wondering: can't it rest in the server's RAM, so I can access it even faster?
You might want to check out memcached:
http://www.php.net/manual/en/intro.memcache.php
PHP normally comes with APC as a bytecode cache. You can also use it as a local cache. If you need something in a distributed/clustered environment, then memcached (plus possibly beanstalkd) is the way to go.
XCache, eAccelerator, APC and memcache allow you to save items to semi-persistent memory (you don't necessarily know when an item will expire, in most cases). It isn't the same as a database; it's more like a key/value list. The downside is that it requires a third-party library, so you might be a bit limited depending on your environment.
I think you might be able to get the same effect using shared memory (via PHP's shmop_ functions). But I have never used them, and I don't know whether they are included with PHP's standard library, so feel free to bash me or edit out this mention.
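An untested sketch of that shmop_ idea: write a string into a raw shared-memory segment and read it back, possibly from a later request. The key (0xff3) and segment size are arbitrary:

    <?php
    // Create (or attach to) a 1 KB shared-memory segment and stash a value
    // in it; another PHP request using the same key can read it back.
    $shm = shmop_open(0xff3, 'c', 0644, 1024);
    shmop_write($shm, 'cached value', 0);

    $data = rtrim(shmop_read($shm, 0, shmop_size($shm)), "\0");
    echo $data;                            // "cached value"
    shmop_close($shm);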
If your server is ANY good, it will already do so. But of course, your server may be serving a few thousand other tasks besides yours, meaning you don't have the server's cache all to yourself.
And if there really are a few thousand others being served besides you, the probability rises that at least one nutcase among them is doing something he really shouldn't be doing, something the server has not been programmed to detect or to stop, only to try to make the best of, at the expense of resource availability for the x999 "responsible" users.
