Imagine there is a PHP page on http://server/page.php.
Clients send 100 requests from their browsers to the server for that page simultaneously.
Does the server run 100 separate processes of php.exe simultaneously?
Does it re-interpret the page.php 100 times?
The answer varies considerably, depending on the server configuration.
Let's answer question 1 first:
Does the server run 100 separate processes of php.exe simultaneously?
This depends on the way PHP is installed. If PHP is being run via CGI, then the answer is "Yes, each request calls a separate instance of PHP". If it's being run via an Apache module, then the answer is "No, each request starts a new PHP thread within the Apache executable".
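A quick way to see which case applies on your own server is to ask PHP which SAPI (server API) it is running under; a minimal sketch using the standard php_sapi_name() function:

    <?php
    // Prints e.g. "cgi-fcgi" (CGI/FastCGI), "apache2handler" (Apache module),
    // or "cli", showing how PHP is being invoked on this server.
    echo php_sapi_name();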
Similar variations will exist for other web servers. Please note that for a Unix/Linux based operating system, running separate copies of the executable for each request is not necessarily a bad thing for performance; the core of the OS is designed such that in many cases, tasks are better done by a number of separate executables rather than one monolithic one.
However, no matter what you do about it, having large numbers of simultaneous requests will drain your server resources and lead to timeouts and errors for your users. This is why it is important for your PHP programs to finish running as quickly as possible. Do not write PHP programs for web consumption that are slow to run; if you're likely to have a lot of traffic, you need to test for performance as much as you do for functionality. Having your programs exit quickly will dramatically reduce the likelihood of having a significant number of simultaneous requests, which will obviously have a big impact on your site's performance.
Now your second question:
Does it re-interpret the page.php 100 times?
For a standard PHP installation, the answer here is "Yes it does, and yes it does have a performance impact."
However, PHP provides several caching solutions designed specifically to mitigate this. The main options are APC and Zend OPcache, either of which can be installed as a standard module. Using these modules means that PHP caches the compiled code, so subsequent calls run much faster.
Zend OPcache is included as part of the standard PHP installation as of the PHP 5.5 release.
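For reference, a minimal php.ini sketch that enables the bundled opcode cache on PHP 5.5+ (the memory figure is illustrative, not a recommendation):

    ; Enable the bundled Zend OPcache
    opcache.enable=1
    ; Shared memory for cached compiled scripts, in megabytes (illustrative)
    opcache.memory_consumption=128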
Apache 2 has multiple modes in which it can work.
In "prefork" mode (the most commonly used), Apache creates a process for every request, and each process runs its own copy of PHP. The configuration file sets a maximum number of connections (MaxClients in httpd.conf), and Apache will create at most that many processes; this is to prevent memory exhaustion. Further requests are queued, waiting for a previous request to complete.
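For illustration, a typical prefork section of httpd.conf looks something like this (the numbers are examples, not recommendations):

    <IfModule prefork.c>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        MaxClients          150
        MaxRequestsPerChild 1000
    </IfModule>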
If you do not install an opcode cache extension such as APC, XCache, or eAccelerator, PHP will re-interpret page.php 100 times.
It depends.
There are different ways of setting things up, and things can get quite complex.
The short answer is 'more or less'. A number of apache processes will be spawned, the PHP code will be parsed and run.
If you want to avoid the parsing overhead, use an opcode cache. APC (Alternative PHP Cache) is a very popular one. It has a number of neat features worth digging into, but with no configuration beyond installing it, it will ensure that each PHP page is only parsed into opcodes once.
To change how many Apache processes are spawned, you'll most likely be using the prefork MPM. This lets you decide how you want Apache to deal with multiple users.
For general advice, in my experience (small sites, not a huge amount of traffic), installing APC is worth doing, for everything else the defaults are not too bad.
There are a number of answers to this. In general, Apache will create a process for an incoming request, so it is possible that 100 processes are created. However, a process takes time to create, so it might be that by the time one process has finished and died, one of those 100 connections actually arrives a fraction of a second later (since 100 connections at exactly the same time is very rare indeed, unless you're Google).
However, let us imagine that 100 processes really do need to be held in memory simultaneously, but that there is only room for 50 in available server RAM. In that case, 50 connections will be served, and 50 will have to wait for processes to die and be re-spawned. Thus, a random half of those requests will be delayed, though if a create-serve-die sequence only takes a fraction of a second, they won't have to wait very long. This is why, when improving server capacity, reducing your page load time is as important as adding more RAM: the sooner a process finishes, the sooner a new one can take its place.
One way, incidentally, to reduce load time is to spawn a number of PHP processes and hold them in memory. This is the basis of FastCGI (or fcgid, which is compatible). Rather than creating and killing a process for every request, a process is spawned in memory immediately and is re-used for several requests. For PHP, these processes are usually configured to die after a certain number of page requests (e.g. 1000), as historically PHP has had quite a lot of memory leaks (the more a process is reused, the worse the leaks get).
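For example, with PHP-FPM (one common FastCGI process manager) that recycling is controlled in the pool configuration; a sketch with illustrative values:

    ; PHP-FPM pool settings (values are illustrative)
    pm = dynamic
    pm.max_children = 50
    ; Respawn each worker after 1000 requests to contain memory leaks
    pm.max_requests = 1000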
You ask if a page is re-interpreted for every request. Normally yes, but if you also run a PHP accelerator, then no: the byte-code that PHP compiles to is cached and reused. Thus, mixing the FastCGI approach with an accelerator can make for a very speedy server indeed. Standard PHP does not come with an accelerator, but Zend OPcache is scheduled for inclusion in the PHP core.
I've been working with Java webapps for quite some time now. I was wondering how PHP-based webapps are different from Java-based ones. I could find some differences regarding security, availability of libraries, etc.
Basically, Java (servlet) based webapps are multi-threaded environments, whereby a single process caters to the needs of different requests. How does this work in PHP?
a) Is every request a single process? In such a case how can we ensure memory utilization and other shared resources are under control?
b) Is there a concept called application scope/singleton in PHP based webapps?
c) Can we have connection pools?
Basically, how is PHP different from CGI?
The question might sound silly to PHP developers. I'll be happy to learn there is already a place where the differences are documented. Thanks.
Java web applications are hosted in (i.e. run within) a stateful, long-running Java process; because of this you can take advantage of in-memory object caching and the ability to manipulate threads.
The standard CGI model (ignoring FastCGI for now) is considerably simpler: a process is started and is passed the incoming HTTP request. The CGI process handles the request itself (including the creation of its own threads, if necessary) and then returns the HTTP response to the process that spawned it. The CGI process is then terminated, so anything kept in-memory is lost and has to be serialized to some kind of persistence medium such as a database or a file on disk.
(Speculation: CGI's design probably has to do with limited resources available on servers in the early 1990s, and how websites weren't visited all that often so it didn't make sense to use memory like that; finally if you're working on a huge scalable project then you probably won't be interested in in-memory caching because you'll have a dedicated state server).
PHP is a CGI system, so it inherits the limitations of the "one server process per request" model. As for not supporting threads: that seems to be a conscious decision of PHP's developers, because it vastly simplifies the system (no need to worry about synchronisation, for example; and since PHP is probably the number one "beginner's language" nowadays, it makes sense not to give beginners enough rope to hang themselves). Besides, if you need multi-threading in a web-serving scenario, then PHP is probably the wrong tool to use in the first place.
PHP is not different to CGI - PHP implements CGI. Java webservers do not use CGI (at least not to serve Java applications, and note that there do exist CGI implementations of Java servlet hosts, but let's not complicate things).
Because PHP is not stateful it means it can't pool connections - but in practice this isn't a problem. When you pair PHP with MySQL you'll find that operations are surprisingly cheap. You can connect to a database, grab some data from a SELECT, and return a nicely formatted HTML table all in under 5ms even on a decade-old machine. It doesn't matter what platform you use so long as page generation times are kept under 30ms (my personal objective time limit for a good user experience).
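One caveat worth adding: although PHP cannot pool connections across worker processes the way a Java application server can, it does support persistent connections, which stay open between requests handled by the same process. A minimal sketch using mysqli's "p:" host prefix (host and credentials are placeholders):

    <?php
    // The "p:" prefix asks mysqli for a persistent connection, reused by
    // later requests served by the same PHP worker process.
    $db = new mysqli('p:localhost', 'user', 'password', 'mydb');
    $row = $db->query('SELECT NOW()')->fetch_row();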
To give some context:
I had a discussion with a colleague recently about the use of Autoloaders in PHP. I was arguing in favour of them, him against.
My point of view is that Autoloaders can help you minimise manual source dependency which in turn can help you reduce the amount of memory consumed when including lots of large files that you may not need.
His response was that including files you do not need is not a big problem, because after a file has been included once it is kept in memory by the Apache child process, and this portion of memory will be available for subsequent requests. He argues that you should not be concerned about the number of included files, because soon enough they will all be loaded into memory and used on-demand from memory. Therefore memory is less of an issue, and the overhead of trying to find the file you need on the filesystem is much more of a concern.
He's a smart guy and tends to know what he's talking about. However, I always thought that the memory used by Apache and PHP was specific to that particular request being handled.
Each request can use at most the amount of memory set by the memory_limit PHP option, and any source compilation and processing is only valid for the life of the request.
Even with op-code caches such as APC, I thought that the individual request still needs to load each file into its own portion of memory, and that APC is just a shortcut to having it pre-compiled for the responding process.
I've been searching for some documentation on this but haven't managed to find anything so far. I would really appreciate it if someone can point me to any useful documentation on this topic.
UPDATE:
Just to clarify, the autoloader discussion was more for context :).
It may not have been clear, but my main question is whether Apache will pool its resources to respond to multiple requests (especially memory used by included files), or whether each request needs to retrieve the code required to satisfy its execution path in isolation from other requests handled by the same process.
e.g.:
Files 1, 2, 3 and 4 are an equal size of 100KB each.
Request A includes file 1, 2 and 3.
Request B includes file 1, 2, 3 and 4.
In his mind he's thinking that Request A will consume 300KB for the entirety of its execution, and Request B will only consume a further 100KB because files 1, 2 and 3 are already in memory.
In my mind it's 300KB and 400KB, because each request is processed independently (even if by the same process).
This brings him back to his argument that "just include the lot 'cos you'll use it anyway" as opposed to my "only include what you need to keep the request size down".
This is fairly fundamental to how I approach building a PHP website, so I would be keen to know if I'm off the mark here.
I've also always been of the belief that for a large-scale website, memory is the most precious resource and more of a concern than the file-system checks of an autoloader, which are probably cached by the kernel anyway.
You're right though, it's time to benchmark!
Here's how you win arguments: run a realistic benchmark, and be on the right side of the numbers.
I've had this same discussion, so I tried an experiment. Using APC, I tried a Kohana app with a single monolithic include (containing all of Kohana) as well as with the standard autoloader. The final result was that the single include was faster by a statistically irrelevant margin (less than 1%) but used slightly more memory (according to PHP's memory functions). Running the test without APC (or XCache, etc.) is pointless, so I didn't bother.
So my conclusion was to continue using autoloading because it's much simpler. Try the same thing with your app and show your friend the results.
Now you don't need to guess.
Disclaimer: I wasn't using Apache. I cannot emphasize enough to run your own benchmarks on your own hardware on your own app. Don't trust that my experience will be yours.
You are the wiser ninja, grasshopper.
Autoloaders don't load the class file until the class is requested. This means that they will use at most the same amount memory as manual includes, but usually much less.
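For reference, a minimal autoloader sketch using the standard spl_autoload_register() function (the directory layout is an assumption):

    <?php
    // The class file is only read and compiled at the moment the class
    // is first used, not up front.
    spl_autoload_register(function ($class) {
        $file = __DIR__ . '/classes/' . $class . '.php'; // assumed layout
        if (is_file($file)) {
            require $file;
        }
    });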
Classes get read fresh from the file on each request, even though an Apache thread can handle multiple requests, so your friend's "eventually they're all read" argument doesn't hold water.
You can prove this by putting an echo 'foo'; above the class definition in the class file. You'll see that on each new request the line is executed, regardless of whether you autoload or manually include the whole world of class files at the start.
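Concretely, the test looks like this (Foo.php is a hypothetical class file):

    <?php
    // Foo.php
    echo 'foo'; // if this prints on every request, the file was re-read

    class Foo
    {
    }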
I couldn't find any good, concise documentation on this - I may write some up with memory-usage examples - as I have also had to explain it to others and show evidence to get it to sink in. I think the folks at Zend didn't imagine anyone would fail to see the benefits of autoloading.
Yes, APC and the like (like all caching solutions) can overcome the resource negatives and even eke out small gains in performance, but you eat up lots of unneeded memory if you do this across a non-trivial number of libraries while serving a large number of clients. Try loading a healthy chunk of the PEAR libraries in one massive include file while handling 500 connections hitting your page at the same time.
Even when using things like APC, you benefit from using autoloaders with any non-namespaced classes (most existing PHP code today), as they help avoid global namespace pollution when dealing with large numbers of class libraries.
This is my opinion.
I think autoloaders are a very bad idea, for the following reasons:
I like to know what and where my scripts are grabbing the data/code from. Makes debugging easier.
It also causes configuration problems: if one of your developers changes a file (an upgrade, etc.) or the configuration and things stop working, it is harder to find out where it broke.
I also think that it is lazy programming.
As to memory/performance issues, it is just as cheap to buy some more memory for the computer if it is struggling with that.
I am working on a Drupal-based site and notice there are a lot of separate CSS and JS files. Wading through some of the code, I can also see quite a few cases where many queries are used too.
What techniques have you tried to improve the performance of Drupal and what modules (if any) do you use to improve the performance of Drupal 'out of the box'?
Going to the admin/settings/performance page, turning on CSS and JS aggregation, and enabling page caching with a minimum lifetime of 1 minute will give you an immediate boost on a high-traffic site. If you're writing your own code and doing any queries, consider writing your own discrete caching for expensive functions. The linked article covers Drupal 5, not 6, but the only changes in D6 are the elimination of the serialization requirement and the function signatures of cache_set() and cache_get(). (Both noted in comments on the article.)
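A minimal sketch of that caching pattern using Drupal 6's cache API (the cache ID and the expensive function are hypothetical):

    <?php
    function mymodule_expensive_data() {
      // Serve from the cache table when possible; rebuild otherwise.
      $cache = cache_get('mymodule:expensive_data');
      if ($cache) {
        return $cache->data;
      }
      $data = mymodule_compute_expensive_data(); // hypothetical expensive call
      cache_set('mymodule:expensive_data', $data, 'cache', CACHE_TEMPORARY);
      return $data;
    }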
On large-scale sites, also consider dropping a memcached server onto the network: using the memcached module, you can entirely bypass the Drupal database for cached data. If you have huge amounts of content and search is a hot spot, you can also consider using Lucene/Solr as your search indexer instead of Drupal's built-in one. The built-in indexer is nice, but it's not designed for heavy loads (hundreds or thousands of new pieces of content an hour, say, with heavy faceted searching). The Apache Solr module can tie in with that.
If you're making heavy use of Views, be sure to check the queries it generates for unindexed fields; sorting and filtering by CCK fields in particular can be slow, because CCK doesn't automatically add indexes beyond the primary keys. In D6, preview the View in the admin screen, copy the text of the query, and run it through EXPLAIN in MySQL or whatever query analysis tool you have.
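For instance (the table and column names here are made up for illustration):

    -- Paste the query the View generated; a NULL in EXPLAIN's "key"
    -- column means no index was used for that table.
    EXPLAIN SELECT node.nid FROM node
      INNER JOIN content_type_story s ON s.nid = node.nid
      ORDER BY s.field_date_value;

    -- If the sort column turns out to be unindexed, add an index:
    ALTER TABLE content_type_story ADD INDEX (field_date_value);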
Tools like YSlow and Firebug can also help you spot slow stuff like massive image files, JS hosted on remote servers, and so on.
Drupal 6, out of the box, provides CSS and JavaScript aggregation: most CSS and JS files will be combined into a single file (and thus a single HTTP request), and are also whitespace-stripped (to reduce bandwidth consumption). You can enable this under /admin/settings/performance.
Also on that screen are controls for Drupal's (very effective) cache, which helps reduce the number of database queries.
Additionally, because Drupal (and all the modules you'll probably have installed) has a ton of PHP source, using a PHP opcode cache such as APC helps significantly decrease the request time.
I strongly second Benedict's recommendation of the Boost module - it alone will make your website fly on shared hosting, if configured correctly, and is not really buggy at all.
Turn on CSS/JS aggregation, turn on Boost, and your site can perform very well for anonymous users.
If your site deals with mostly logged-in users, you're going to have to do a lot more work making sure sessions are cached well, and probably consider using memcached and more SQL query caching.
The biggest performance gains always come from caching—but monitoring and adjusting your slow queries, monitoring and adjusting apache and PHP configurations, and being smart about the modules you use are very important as well.
Other than the default Drupal cache, there are some other methods to increase performance.
The Boost module is one of the best.
Memcache, Varnish (Drupal 7/Pressflow), and a CDN are other methods that can improve performance.
It's also worth mentioning the performance boost from using quality SSD storage. It has consistently cut my initial response load times by 30% or more when migrating to SSD (I went from about ~450ms to ~250ms on my last project, using identical Apache/Memcache/Cloudfront EC2 configs), not to mention how much nicer it is to manage a snappy server where every command or script you throw at it is nearly instantaneous. I will never go back.
I want to know, when building a typical site on the LAMP stack, how you optimize it for the best possible load times. I am picturing a typical DB-driven site.
This is a high-level look, so let me break it down into each layer of the stack.
L - At the system level (setup and filesystem), what can you do to improve speed? One thing I can think of is image sizes; can compression here help optimize anything?
A - There have to be a ton of settings related to site speed here in the web server. Not my forte. It probably depends a lot on how many sites are running concurrently.
M - MySQL. In a database-driven site, DB performance is key. Is there a better normalization approach, i.e. using link tables? Web developers often just make simple monolithic tables resembling 1NF, and this can kill performance.
P - Aside from performance-boosting settings like caching, what can the programmer do to affect performance at a high level? I would really like to know whether MVC design approaches hurt performance more than quick-and-dirty code. Other simple tips, like whether sessions are faster than cookies, would be interesting to know.
Obviously you have to get down and dirty into the details and find what code is slowing you down. I also realize that many sites have many different performance characteristics, but let's assume a typical site that has more reads than writes.
I am just wondering if we can compile a bunch of best practices, and I fully expect people to link other questions, so we can effectively work up a checklist.
My goal is to see whether, in addition to the usual performance issues, some oddball things you might not think of crop up, to go along with a best-practices summary.
So my question is, if you were starting from scratch, how would you make sure your LAMP site was fast?
Here are a few personal must-dos that I always set up in my LAMP applications.
Install mod_deflate for Apache, and do not use PHP's gzip handlers. mod_deflate will allow you to compress static content, like JavaScript/CSS/static HTML, as well as the usual dynamic PHP output, and it's one less thing you have to worry about in your code.
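A sketch of the corresponding Apache configuration (the MIME types listed are just common choices):

    # Compress text-based responses before sending them to the client
    AddOutputFilterByType DEFLATE text/html text/css text/plain application/x-javascript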
Be careful with .htaccess files! Enabling .htaccess files for directories in your app means that Apache has to scan the filesystem constantly, looking for .htaccess directives. It is far better to put directives inside the main configuration or a vhost configuration, where they are loaded once. Any time you can get rid of a directory-level access file by moving it into a main configuration file, you save disk access time.
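In practice that means something like this in the main or vhost configuration (the path is a placeholder), with the old .htaccess directives moved inside:

    <Directory /var/www/example>
        # Stops Apache scanning for .htaccess files under this path
        AllowOverride None
    </Directory>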
Prepare your application's database layer to utilize a connection manager of some sort (I use a Singleton for most applications). It's not very hard to do, and reducing the number of database connections your application opens saves resources.
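A minimal sketch of such a Singleton connection manager (class name and credentials are placeholders):

    <?php
    class Database
    {
        private static $instance = null;

        // Returns the one shared connection, creating it on first use.
        public static function get()
        {
            if (self::$instance === null) {
                self::$instance = new mysqli('localhost', 'user', 'password', 'mydb');
            }
            return self::$instance;
        }
    }

    $db = Database::get(); // every caller reuses the same connection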
If you think your application will see significant load, memcached can perform miracles. Keep this in mind while you write your code... perhaps one day, instead of creating objects on the fly, you will be getting them from memcached. A little foresight will make implementation painless.
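A sketch of the kind of call site that foresight makes easy, using the pecl/memcached extension (keys and TTL are illustrative):

    <?php
    $cache = new Memcached();
    $cache->addServer('localhost', 11211);

    // Try the cache first; fall back to building the object on a miss.
    $user = $cache->get('user:42');
    if ($user === false) {
        $user = load_user_from_db(42); // hypothetical expensive lookup
        $cache->set('user:42', $user, 300); // cache for five minutes
    }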
Once your app is up and running, set MySQL's slow query time to a small number and monitor the slow query log diligently. This will show you where your problem queries are coming from, and allow you to optimize your queries and indexes before they become a problem.
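For example, in my.cnf (the threshold is just a starting point, and directive names vary slightly between MySQL versions):

    [mysqld]
    # Log every query that takes longer than one second
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time     = 1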
For serious performance tweakers, you will want to compile PHP from source. Installing from a package installs a lot of libraries that you may never use. Since PHP environments are loaded into every instance of an Apache thread, even a 5MB memory overhead from extra libraries quickly becomes 250MB of lost memory when there are 50 Apache threads in existence. I keep a list of the standard ./configure line I use when building PHP, and I find it suits most of my applications. The downside is that if you end up needing a library, you have to recompile PHP to get it. Analyze your code and test it in a devel environment to make sure you have everything you need.
Minify your JavaScript.
Be prepared to move static content, such as images and video, to a non-dynamic web server. Write your code so that any URLs for images and video are easily configured to point to another server in the future. A web server optimized for static content can easily serve tens or even hundreds of times faster than a dynamic content server.
That's what I can think of off the top of my head. Googling around for PHP best practices will turn up a lot of tips on how to write faster/better code as well (such as: echo is faster than print).
First, realize that performance is an iterative process. You don't build a web application in a single pass, launch it, and never work on it again. On the contrary, you start small, and address performance issues as your site grows.
Now, onto specifics:
Profile. Identify your bottlenecks. This is the most important step: you need to focus your effort where you'll get the best results. You should have some sort of monitoring solution in place (like Cacti or Munin) giving you visibility into what's going on on your server(s).
Cache, cache, cache. You'll probably find that database access is your biggest bottleneck on the back end -- but you should verify this on your own. Fortunately, you'll probably find that a lot of your traffic is for a small set of resources. You can cache those resources in something like memcached, saving yourself the database hit, and resulting in better backend performance.
As others have mentioned above, take a look at the YDN performance rules. Consider picking up the accompanying book. This'll help you with front end performance
Install PHP APC, and make sure it's configured with enough memory to hold all your compiled PHP bytecode. We recently discovered that our APC installation didn't have nearly enough RAM; giving it enough room to work in cut our CPU time in half, and disk activity by 10%.
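The relevant knob is apc.shm_size in APC's ini file; a sketch (the size is illustrative, and older APC versions expect a bare number of megabytes rather than the "M" suffix):

    ; Shared memory segment large enough to hold all compiled bytecode
    apc.shm_size = 128M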
Make sure your database tables are properly indexed. This goes hand in hand with monitoring the slow query log.
The above will get you very far. That is to say, even a fairly db-heavy site should be able to survive a frontpage digg on a single modestly-spec'd server if you've done the above.
You'll eventually hit a point where the default apache config won't always be able to keep up with incoming requests. When you hit this wall, there are two things to do:
As above, profile. Monitor your apache activity -- you should have an idea of how many connections are active at any given time, in addition to the max number of active connections when you get sudden bursts of traffic
Configure apache with this in mind. This is the best guide to apache config I've seen: Practical mod_perl chapter 11
Take as much load off of apache as you can. Apache's too heavy-duty to serve static content efficiently. You should be using a lighter-weight reverse proxy (like squid) or webserver (lighttpd or nginx) to serve static content, and to take over the job of spoon-feeding bytes to slow clients. This leaves Apache to do what it does best: execute your code. Again, the mod_perl book does a good job of explaining this.
Once you've gotten this far, it's largely an issue of caching more, and keeping an eye on your database. Eventually, you'll outgrow a single server. First, you'll probably add more front end boxes, all backed by a single database server. Then you're going to have to start spreading your database load around, probably by sharding. For an excellent overview of this growth process, see this livejournal presentation
For a more in-depth look at much of the above, check out Building Scalable Web Sites, by Cal Henderson, of Flickr fame. Google has portions of the book available for preview
I've used MySQLTuner for performance analysis on my MySQL servers, and it's given good insight into further issues to Google for, as well as making its own recommendations.
A resource you might find helpful is the YDN set of performance rules.
Don't forget the fact that your users will be thousands of miles away from your server, and downloading dozens of files to render a single page. That latency, and the overhead of rendering the page in their browsers can be larger than the amount of time that you spend collecting the information, and generating the page.
See the pages at Yahoo Developer Network about Best Practices for Speeding Up Your Web Site, and the YSlow tool for seeing what part of the downloading of the site is taking time.
Don't forget to turn off atime for your filesystem!
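That is, mount your filesystems with the noatime option so that reads stop triggering metadata writes; an illustrative /etc/fstab line (device and mount point are placeholders):

    /dev/sda1  /var/www  ext3  defaults,noatime  0  2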
I'd recommend using Jet Profiler for MySQL to find any bad queries. I've successfully used it on a couple of my sites. Really helpful, and much easier to digest than the slow query log.
I'd recommend starting with http://highscalability.com/
As for your suggestions:
Compression for images: definitely not; image formats are already compressed, so gzipping them gains nothing. Filesystem tuning: yes, that could have some effect, but a minimal one. Actually, the best option is to use an in-memory reverse proxy, or even better a CDN.
For Apache, basically only load the modules you need; do not load anything else. Since with PHP you can only use the forking MPM (prefork), it's important to keep it slim. As for optimal settings, you have to fine-tune them to the specific application, hardware, etc. If you have enough CPU, it's recommended that you use mod_deflate: the faster the server can send data to the client, the faster it can start processing the next request.