What's the best way (or ways) to speed up a PHP web site, and how much faster can it get using this or that approach?
PHP isn't really the kind of language where you can do micro-optimizations, or just work on the code alone. There's really no point. Although PHP isn't particularly fast, PHP itself is rarely the bottleneck in a given web site.
You need to work out where that bottleneck is before you can fix it. There are a lot of common bottlenecks, with common solutions. It's difficult to generalize, given so few details, but there are a lot of performance hints that apply to most web sites.
The first good place to look is actually on the client side, rather than the server side. How large are your pages (including images, CSS, JavaScript and the like)? How many HTTP requests does a single page view require? Use something like Firebug (and the YSlow add-on for Firebug) to see how long your page actually takes to load, and which bits of your page cause the problem. Some general hints:
Work out ways to shrink the CSS and JavaScript - remove anything you don't need, and run the rest through a tool like YUI Compressor.
If you have multiple CSS and JavaScript files, try to combine them into a single file.
Optimize all of your images as much as possible, and see if you can combine any of those into a single file using CSS sprites or similar. PunyPNG is good for lossless images. A decent JPEG encoder (NOT Photoshop) is good for photos.
Move the CSS to the top of the page, and the JavaScript to the bottom, so the browser can render the page before the JavaScript has finished downloading.
Make sure that all of your CSS, JavaScript and HTML are being served compressed (see the sketch after this list).
Make sure that you're using appropriate caching - if a file hasn't changed, there's no point in re-downloading it.
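Compression for static CSS/JS is normally switched on in the web server configuration (mod_deflate and the like), but if you can't touch the server config, here is a minimal sketch of compressing the HTML that PHP itself generates, using the built-in ob_gzhandler (this assumes the zlib extension is available):

```php
<?php
// Compress the generated HTML when the browser advertises gzip support.
// ob_gzhandler inspects Accept-Encoding and falls back to plain output
// if the client can't handle compression.
if (!ob_start('ob_gzhandler')) {
    ob_start(); // zlib not available: buffer without compression
}
?>
<html>
  <body>
    <!-- the rest of the page is generated as usual -->
  </body>
</html>
```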
Once you've got the client side out of the way, you might have to turn your attention to the server side.
Install an opcode cache, like APC, XCache, or Zend Optimizer. It's very easy to do, and will always provide some improvement. Once you've done that, profile your pages, to find out where the time is actually being spent.
More likely than not, you'll be spending most of your time waiting for the database to return results. So, at a bare minimum:
Work out which queries are taking the longest, and work on them first. Use your head though - a query that takes five seconds on an admin page that nobody looks at is not as important as a query that takes one second on the front page.
Make sure that your query uses appropriate indexes. No common query should ever need to do a full table scan. Certain kinds of sorting or grouping may be unable to use indexes - try to avoid them, or modify the query so that it can use indexes.
Make sure that your queries aren't using temporary tables.
Use the EXPLAIN keyword - it's very useful (see the sketch after this list).
Tune the database server itself. MySQL's default configuration is generally not tuned for performance.
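As a rough sketch of the query advice above (timing queries and inspecting slow ones with EXPLAIN): the PDO connection details, table name, and 50 ms threshold below are illustrative assumptions, not part of the original advice.

```php
<?php
// Hypothetical helper: run a query and, if it is slow, log MySQL's EXPLAIN plan.
$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'secret');

function timed_query(PDO $pdo, $sql, array $params = array())
{
    $start = microtime(true);
    $stmt  = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows  = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $ms    = (microtime(true) - $start) * 1000;

    if ($ms > 50) { // arbitrary "slow" threshold in milliseconds
        $plan = $pdo->prepare('EXPLAIN ' . $sql);
        $plan->execute($params);
        error_log(sprintf("Slow query (%.1f ms): %s\n%s",
            $ms, $sql, print_r($plan->fetchAll(PDO::FETCH_ASSOC), true)));
    }
    return $rows;
}

// Example call - table and column names are made up.
$articles = timed_query($pdo,
    'SELECT id, title FROM articles WHERE author_id = ? ORDER BY created_at DESC',
    array(42));
```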
Once you've done that, it's usually best to start working out how to use caching. The best way to speed PHP code up is to reduce the amount of work it has to do.
Make sure your database's query cache is working properly.
Use something like Memcached to store frequently used results, instead of getting them from the database.
If you have enough memory, try to keep everything in Memcached, resorting to the database only when something isn't present in the cache.
If you have chunks of pages that are dynamic, but the same for all users, try caching those chunks. For example, if two users are looking at an article, the article itself is going to be exactly the same for each user, even if the rest of the page isn't. Generate the HTML for the article, and chuck it in the cache.
If you have lots of non-authenticated users, it's entirely possible that they'll all be seeing the exact same page. Two non-authenticated users looking at the above article won't just see an identical article - they'll see an identical page, right down to the login links. Set your PHP scripts up so you can use HTTP caching headers (check the last-modified date, and return a 304 Not Modified if it hasn't been changed). Once you've done that, stick a Squid reverse proxy in front of the webserver, and let Squid serve pages out of its cache.
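A minimal sketch of that conditional-GET idea; the article-lookup function and the 5-minute max-age are hypothetical placeholders, not a definitive implementation:

```php
<?php
$articleId = (int) $_GET['id'];
// Hypothetical: when this article's data last changed (Unix timestamp).
$lastModified = get_article_last_modified($articleId);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
header('Cache-Control: public, max-age=300');

// If the proxy/browser already has this version, answer 304 and do no work.
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

// ...otherwise render the full page as normal.
```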
After that point, the general approach is to start using more servers, and the problem becomes one of scaling, rather than raw speed. The general plan is to make sure that your website has a shared-nothing architecture - all persistent data is stored in the database. Then, you install multiple webservers, move the database server to a separate machine, and run the entire thing behind a caching reverse proxy. To add more capacity, you add more machines.
One way: PHP accelerators, e.g. APC.
Another: read blog articles, e.g. a performance tuning overview.
A general question, I would say. Try looking for optimization tips online...
Several parameters are involved:
I/O access (heavy use of file_exists, is_file and similar calls adds overhead)
Database access (optimize queries, use stored procedures, check your db cache)
Using an opcode cache (like APC)
Compressing output
Serving js/css minified and compressed (and using subdomains to deliver them to the browser)
Using memcache to cache data into memory for faster access
You can use benchmarking tools to test your environment before and after the optimizations.
Try ApacheBench, for example.
Filesize.
A file of 500 KB takes longer to download than a file of 300 KB. So optimize and crop as much as you can.
Accelerators
Self-explanatory: List of PHP accelerators
Server upgrade
Though this costs money, when dealing with a lot of traffic it will have an impact on how fast the .php files get processed and how fast data is sent to the user.
I don't recommend this though since there are other (free) ways to improve speed.
Don't use external resources
If you are linking images from other sites, the speed of downloading them will not be in your control. Instead, if you plan on using images from others, download them to your own server first (or upload them to your own provider) and serve them that way.
Review and improve your code
Find shortcuts, remove unnecessary code, delete unused variables, reuse others, etc.
There are other ways, but I believe the above has the most impact on your speed.
You should probably do some search for existing answers to this question, however...
APC for opcode caching
Memcached for object storing (to reduce the number of database queries)
Check for / optimize slow SQL queries
Measure and find bottlenecks
Don't rely on (slow) web services on each page load, etc.
Yahoo has some good basic advice on speeding up web pages, much of it very easy to implement. You may also want to download YSlow + Firebug for Firefox; they will help indicate possible basic bottlenecks from a client request perspective.
The rest of the advice here is good, so I won't add much else other than this: don't bother optimising any code until you're 100% sure that you've found a bottleneck. I can't stress that enough. Don't waste time tweaking code or implementing new things (i.e. caching) because you "feel" they will make things quicker; act only on real evidence (i.e. performance profiling).
Related
I have a situation where my Linux server will be running a website which gets some of its data from a 3rd-party server through a SOAP interface. The data isn't exactly real-time, but it does change every 5 minutes or so. I was told not to have our website hammer their website for data, which I can completely understand.
So I wondered if this was a good candidate for a cache scheme of some type, where, when a user comes to our web page to display the data, if it's less than 5 minutes old (for example), it would get that data from our server instead of polling the 3rd party for it. This way, if 100 users come to our website at once, our server won't access the 3rd-party website 100 times to share the exact same data within a given time-frame.
Is this a practical thing to do in PHP? Or should this be written in a faster language when it comes to caching? Are there cache packages for this sort of situation which can be used along with a PHP Joomla application? Thanks!
I think memcached is a good choice.
You can set a timeout when you store content on the memcached server; if the key is missing (or has expired), retrieve the data from the 3rd-party server and store it again.
There is a memcached extension for PHP; check the docs here.
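A hedged sketch of that pattern with the Memcached extension; the server address, cache key, 5-minute TTL, and the fetch function are placeholders:

```php
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$data = $mc->get('soap:latest-data');
if ($data === false) {                        // miss, or the TTL expired
    $data = fetch_from_soap_service();        // hypothetical: the real SOAP call
    $mc->set('soap:latest-data', $data, 300); // keep it for 300 seconds
}
// render the page from $data
```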
There are lots of ways to solve the problem - we can't say which is the right one without knowing a lot more about the constraints you are working under or how the service is used. If you are using Joomla then you're obviously not that bothered about performance - it would be really hard to write anything which has a measurable impact on your HTML generation times. This does not need to "be written in a faster language", but....
can you install additional software?
have you got access to cron?
at what rate is the service consumed?
how many webservers do you have consuming the service - do they have a shared filesystem? Are they on the same sub-net?
Is the SOAP response cacheable?
how do you deal with non-availability of the service?
For a very scalable solution I would suggest running a simple forward proxy (e.g. Squid), but do make sure that it's not accessible from the internet. It's true that POST requests are often not cacheable - but you can cache the response from a surrogate script on your own site accessed via GET, returning appropriate caching instructions - and this could return the data as a serialized PHP array / object, which is much less expensive to process. Indeed, whichever method you choose, I would recommend caching the parsed response - not the XML. This also allows you to override poor caching information from the service.
If the rate is less than around 1 per minute then the cron solution is overkill, but if it's more than 20 per minute then it makes a lot of sense. If you don't have access to cron / can't install your own software then you might consider simply caching the response and refreshing the cache on demand. Don't bother with memcache unless you are already using it - APC is faster on a single server, but memcache is distributed. If you have multiple servers then use whatever cluster storage you are currently sharing your data in (distributed filesystem / database cluster / shared filesystem....).
Don't try to use locking / mutexes around the cache refresh unless you really have to (i.e. only if accessing the service more than once every 5 minutes is a mortal sin) - this gets real complicated real quick - it's too easy to introduce bugs.
Do make sure you buffer and validate any responses before writing them to the cache.
Yes, just use HTTP. Most of the heavy lifting has already been built into your web server.
Since SOAP is just a simple HTTP POST request with an XML body, you could set up your website or HTTP API in front of the SOAP endpoint to act as a translator to regular HTTP, attaching the appropriate HTTP caching headers to the transformed response body, and then configure an Nginx reverse proxy in front of it.
Notably: if the transformation is simple you could just use XSLT to transform the response body from the SOAP API and remove the web service layer entirely.
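A rough sketch of the translator approach above, sitting behind the reverse proxy; the WSDL URL and operation name are placeholders, and a real service would need error handling:

```php
<?php
// soap_translator.php - exposes the SOAP result as a plain, cacheable GET resource.
$client = new SoapClient('https://thirdparty.example.com/service?wsdl'); // placeholder WSDL
$result = $client->GetLatestData();                                      // hypothetical operation

header('Content-Type: application/json');
header('Cache-Control: public, max-age=300'); // let Nginx (or Squid) reuse it for 5 minutes
echo json_encode($result);
```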
Your problem is a very small one, which does not require a complicated solution.
You could write a small cron job that is executed every five minutes, sends the request to the SOAP server, and stores the result in a local file. If any script needs the data, it reads the local file. This will result in 288 requests to the SOAP server per day, and have excellent performance for any script call that needs the results because they are already on your server.
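A rough sketch of that split, assuming a SoapClient call and a writable cache directory (all names below are placeholders):

```php
<?php
// fetch_soap.php - run from cron, e.g.  */5 * * * * php /var/www/cron/fetch_soap.php
$client = new SoapClient('https://thirdparty.example.com/service?wsdl'); // placeholder WSDL
$result = $client->GetLatestData();                                      // hypothetical operation

// Write to a temp file first, then rename, so readers never see a half-written file.
$tmp = '/var/www/cache/soap_data.tmp';
file_put_contents($tmp, serialize($result));
rename($tmp, '/var/www/cache/soap_data.ser');
```

The pages that display the data then never talk to the SOAP server at all:

```php
<?php
$data = unserialize(file_get_contents('/var/www/cache/soap_data.ser'));
```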
If you do not have cron jobs available and cannot fake them, any other cache will do. You really don't need fancy stuff like Memcached, unless it already is available. Storing the result to a cache file will work as well. Note that if you have to really fetch the SOAP result from the origin, this will take some more time and might affect the perceived performance of your site.
There are plenty of frameworks which also offer cache support, and if you use one you should investigate if there is support included. I'm not sure if Joomla has something appropriate for you. Otherwise, you can implement something yourself. It isn't that hard.
Cache functionality comes in various flavours:
memory-based, where a separate process on the server holds data in RAM (or overflows to disk) and you query it like you would a database; very efficient and powerful, and will have options to manage storage use and clear up after themselves, but requires setting up additional software on the server; e.g. memcached, redis
file-based, where you just write the data to disk; less efficient, but can be implemented in "user-land" code, i.e. pure PHP; beware of filling up your disk with variant caches that have expired but not been cleaned up; many frameworks have an implementation of this built in
database-backed, where you push data into an RDBMS (e.g. MySQL, PostgreSQL) or fully-featured NoSQL store (e.g. MongoDB); might make sense if you have a large amount of data, and can trade a bit of performance; as with files, you need to make sure that stale data is cleaned up
In each case, the basic idea is that you create a "key" that can tell one request from another (e.g. the name of the SOAP call and its input parameters, serialized), and pick a "lifetime" (how long you want to carry on using the same copy of the data). The caching engine or library then checks for a cache with that key, and if it is still within its "lifetime" returns the previously cached data. If there is a "cache miss" (there is no cache for that key, or it has expired), you perform the costly operation (in your case, the SOAP call) and save to the cache, using the same key.
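A minimal sketch of that key/lifetime pattern using APCu as the store (any of the flavours above could sit behind the same function); the key, lifetime, and SOAP call are placeholders:

```php
<?php
// Read-through cache: return the value for $key if it is still fresh,
// otherwise run $producer, store the result for $ttl seconds, and return it.
function remember($key, $ttl, callable $producer)
{
    $value = apcu_fetch($key, $hit);
    if ($hit) {
        return $value;            // cache hit: skip the costly operation
    }
    $value = $producer();         // cache miss: do the expensive work
    apcu_store($key, $value, $ttl);
    return $value;
}

// Usage: the key encodes the call and its parameters; the lifetime is 5 minutes.
$report = remember('soap:GetReport:customer=42', 300, function () {
    $client = new SoapClient('https://thirdparty.example.com/service?wsdl'); // placeholder
    return $client->GetReport(array('customerId' => 42));                    // hypothetical call
});
```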
You can do more complex things, like pre-caching things in the background so that there is never a cache miss, or having some code paths which accept stale data in order to return quickly, but these can generally be implemented on top of whatever you're using as the main caching solution.
Another important decision is at what level of granularity to cache the data, in relation to processing it. At one extreme, you could cache each individual SOAP call: simple to set up, but it means re-processing the same data repeatedly, and it can cause problems if two responses are related but cached independently and get out of sync. At the other extreme, you can cache whole rendered pages: pages load very fast once cached, but creating variations based on the same data without repeating work becomes tricky. In between are various points in your code where you have processed and combined data into meaningful chunks: if your application is well written, these are the inputs and outputs of major functions, or possibly even complete model objects. This is more work to implement, as you have to choose the right keys (avoiding two contexts overwriting each other's caches, while ignoring variables that have no impact on the data in question) and the right values (avoiding repeats of costly work without storing huge blobs of data that are slow to unserialize and use up the capacity of your cache store). As with anything else, no approach suits all needs, and a complex application will probably involve caching at multiple levels for different purposes.
My question is whether or not using multiple PHP includes() is a bad idea. The only reason I'm asking is because I always hear having too many stylesheets or scripts on a site creates more HTTP requests and slows page loading. I was wondering the same about PHP.
The detailed answer:
Every CSS or JS file referenced in a web page is actually fetched over the network by the browser, which often involves hundreds of milliseconds or more of network latency. Requests to the same server are (by convention, though not mandated) limited to one or two at a time, so these delays stack up.
PHP include files, on the other hand, are all processed on the server itself. Instead of hundreds of milliseconds, the local disk access will take tens of milliseconds or less, and if cached, it will be a direct memory access, which is even faster.
If you use something like http://eaccelerator.net/ or http://php.net/manual/en/book.apc.php then your PHP code will all be precompiled on the server once, and it doesn't even matter if you're including files or dumping them all in one place.
The short answer:
Don't worry about it. 99 times out of 100 with this issue, the benefits of better code organization outweigh the performance increases.
The use of includes helps with code organization, and is no hindrance in itself. If you're loading up a bunch of things you don't need, that will slow things down - but that's another problem. Clarification: as you include pages, be aware of what you're adding to the load; don't carelessly include unneeded resources.
As already said, the use of multiple PHP includes helps to keep your code organized, so it is not a bad idea. It can become a problem when too many includes are used, because the web server has to perform an I/O operation for each include you have.
If you have a large web application, you can boost it using a PHP accelerator, which caches data and compiled code from the PHP bytecode compiler in shared memory. If you have lots of PHP includes in a specific file they will be performed just once. Further calls to that file will hit the cache, so no PHP require will be performed.
It really depends on what you want to do. If you have a piece of code that is used all the time, it is really convenient to include it instead of copying and pasting it everywhere, and that will make your code clearer without making it slower. But if you include all the functions or classes you have written in your files without using them, of course that's not a good practice... I would suggest using a framework (like CodeIgniter or something else you find convenient) because it really helps clear these things up. Good luck!
"The only reason I'm asking is because I always hear having too many stylesheets or scripts on a site creates more HTTP requests and slows page loading. I was wondering the same about PHP."
Do notice that an HTTP request is several orders of magnitude slower than a PHP include.
Several HTTP requests -> the client has to request and receive several files over the wire
Several PHP includes -> the client has to request and receive only one file
The includes, obviously, will have a server penalty. But by your question... don't sweat it. Only on really large-scale PHP projects will you face such problems.
What are some important optimizations that can be made to a website to reduce the loading time?
Remove/Minimize any bottlenecks on the server side. For this purpose, use a profiler like Xdebug or Zend Debugger to find out where your application is doing expensive and slow operations. Implement caching where possible. Use an OpCode Cache. If this still isn't fast enough consider investing in more CPU or RAM or SSDs (depending on whether you are CPU, IO or Memory bound)
For general server/client side optimizations, see the Yahoo YSlow! User Guide.
It basically boils down to:
Minimize HTTP Requests
Use a Content Delivery Network
Add an Expires or a Cache-Control Header (see the sketch after this list)
Gzip Components
Put StyleSheets at the Top
Put Scripts at the Bottom
Avoid CSS Expressions
Make JavaScript and CSS External
Reduce DNS Lookups
Minify JavaScript and CSS
Avoid Redirects
Remove Duplicate Scripts
Configure ETags
Make AJAX Cacheable
Use GET for AJAX Requests
Reduce the Number of DOM Elements
No 404s
Reduce Cookie Size
Use Cookie-Free Domains for Components
Avoid Filters
Do Not Scale Images in HTML
Make favicon.ico Small and Cacheable
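For the "Add an Expires or a Cache-Control Header" item: static files normally get these headers from the web server configuration, but assets served through a PHP script can set them directly. A minimal sketch (the file path and 30-day lifetime are placeholders):

```php
<?php
// thumbnail.php - a script-served image given a long freshness lifetime.
$lifetime = 30 * 24 * 3600; // 30 days, in seconds
header('Content-Type: image/png');
header('Cache-Control: public, max-age=' . $lifetime);
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $lifetime) . ' GMT');
readfile('/var/www/images/thumb_42.png'); // placeholder path
```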
Before attempting any optimizations, you first need to be able to profile. Get Firebug for Firefox; then you can run some analysis using YSlow that will tell you exactly what to do. Fundamental things that you should do are listed here.
You definitely want to look at caching, as round trips to the DB are expensive.
Also, minify the JS.
Here are a few "best practice" things:
Caching CSS, JavaScript, images, etc.
Minifying Javascript files.
gzip content.
Place links to JavaScript files and inline JavaScript code at the bottom of your page when possible (links to CSS files belong at the top).
Load only what is necessary.
For an existing website, before you do any of this determine where your bottlenecks are with tools like Firebug and as someone else mentioned YSlow (I highly recommend this tool).
Install Firebug and the PageSpeed plugin.
Follow all the PageSpeed directives (where possible) and be happy.
http://code.google.com/intl/it/speed/page-speed/
Anyway, the most important optimization in my experience is to reduce the number of HTTP requests to a minimum...
The simple options I can think of are:
Gzip the (X)HTML, so the compressed file arrives at the user more quickly
minify the CSS
minify the JS
use caching where possible
use a content-delivery network
use a tool, such as yslow to identify bottlenecks and further suggestions
There are two sides you can care about when optimizing:
The server side: what matters is generating the output faster.
The client side: what matters is getting everything that has to be displayed to the browser faster.
Note: we, as developers, often think about optimizing the server side first... which in most cases represents less than 10% of the loading time of the page!
On the server side, you'll generally want to :
profile, to determine what's slow
optimize your SQL queries, and reduce their number
use caching
For more information, you can take a look at the answer I gave some time ago to this question: Optimizing Kohana-based Websites for Speed and Scalability
On the client side, the biggest gains are generally achieved by :
Reducing the number of HTTP requests -- the easiest way being to reduce the number of JS/CSS/images files, by combining several files into one
Compressing CSS/JS/HTML, using for instance Apache's mod_deflate.
On that subject, there is a lot of great stuff on Yahoo's Exceptional Performance pages: they've released lots of good practices and tools, such as YSlow.
We recently did this on our web site. Here we outlined nine techniques that seemed to have the highest impact with the least difficulty: http://mentormate.com/blog/easy-ways-speed-website-load-time/
The first optimisation is: Decide if it is slow, and if not, don't bother.
This is trickier than it sounds, because it's not like testing a desktop app or game. A game is slow if, when you play it on the target hardware, the frame rate is too low. This is very easy to measure.
A web site is trickier, because you, as the developer, are probably using a local test system with a very fast network. Even when you use your staging / system test servers, you're probably still on the local network. Even your production servers are, in all likelihood, on the same continent.
The same is possibly not true for quite a lot of your users.
Therefore the options which exist are:
Find out by asking your users whether they find it to be slow
Simulate a high latency environment and test it yourself (or your QA team)
Guesswork
The latter is not recommended.
An option which the holier-than-thou Yahoo Web Sites performance book (which yes, is a book you can buy) doesn't mention a lot is HTTPS. Most web applications which handle important data run mostly or entirely over HTTPS, which changes the rules of the game rather a lot. Remember to do all testing with it enabled.
I wrote some things about this; see:
Google page speed test optimization
As already mentioned, you can use the YSlow or PageSpeed Firefox extensions. But you can also use GTmetrix, an online service that scans your page with both tools.
Features I like / use:
soft, clean and usable presentation
comparison with another page - it's really interesting to see where your friends / competitors stand
(by the way, I'm not affiliated with GTmetrix!)
To reduce network traffic, you can minify static files, such as CSS and Javascript, and use gzip compression on generated content. You can also try using tools such as optipng to reduce the size of images.
However, the first step to take is to actually analyse what's taking all of the time - whether it's sending the bits over the network, or actually generating the content to send. There's no point making your CSS files 10% smaller if it takes a minute to generate each HTML page.
Don't use whitespace in code.
Load balancing would help to reduce the loading time immensely.
The site I am developing in PHP makes many MySQL database requests per page viewed, albeit many of them small requests against properly designed indexes. I do not know if it will be worthwhile to develop a cache script for these pages.
Is file I/O generally faster than database requests? Does this depend on the server? Is there a way to test how many of each your server can handle?
One of the pages checks the database for a filename, then checks the server to see if it exists, then decides what to display. This I would assume would benefit from a cached page view?
Also if there is any other information on this topic that you could forward me to that would be greatly appreciated.
If you're doing read-heavy access (looking up filenames, etc) you might benefit from memcached. You could store the "hottest" (most recently created, recently used, depending on your app) data in memory, then only query the DB (and possibly files) when the cache misses. Memory access is far, far faster than database or files.
If you need write-heavy access, a database is the way to go. If you're using MySQL, use InnoDB tables, or another engine that supports row-level locking. That will avoid people blocking while someone else writes (or worse, writing anyway).
But ultimately, it depends on the data.
It depends on how the data is structured, how much there is and how often it changes.
If you've got relatively small amounts, of relatively static data with relatively simple relationships - then flat files are the right tool for the job.
Relational databases come into their own when the connections between the data are more complex. For basic 'look up tables' they can be a bit overkill.
But, if the data is constantly changing, then it can be easier to just use a database rather than handle the configuration management by hand - and for large amounts of data, with flat files you've got the additional problem of how do you find the one bit that you need, efficiently.
This really depends on many factors. If you have a fast database with much of the data cached in RAM, or a fast RAID system, chances are slim that you will gain much from simple file system caching on the web server. Also think about scalability: under high workload a simple caching mechanism might easily become a bottleneck, while a database is well designed to handle high workloads.
If there are not so many requests and you (or the operating system) are able to keep the cache in RAM, you might be able to gain some performance. But then the question arises whether it is really necessary to perform caching under low workload.
From a plain performance perspective, it is wiser to tune the database server and not complicate the data access logic with intermediate file caches. A good database server will do the caching on its own if the results are cacheable. (I'm not sure what the case is with MySQL.)
If you have performance problems, you should profile the pages to see the real bottlenecks. Even when you are - like me - a fan of optimized code, putting stronger/more hardware into the equation is cheaper in the long run.
If you still need to use caches, consider using an existing solution, like memcached.
We are working on a website for a client that (for once) is expected to get a fair amount of traffic on day one. There are press releases, people are blogging about it, etc. I am a little concerned that we're going to fall flat on our face on day one. What are the main things you would look at to ensure (in advance without real traffic data) that you can stay standing after a big launch?
Details: This is a L/A/M/PHP stack, using an internally developed MVC framework. This is currently being launched on one server, with Apache and MySQL both on it, but we can break that up if need be.
We are already installing Memcached and doing as much PHP-level caching as we can think of. Some of the pages are rather query intensive, and we are using Smarty as our template engine. Keep in mind there is no time to change any of these major aspects - this is just the setup. What sorts of things should we watch out for?
Measure first, and then optimize. Have you done any load testing? Where are the bottlenecks?
Once you know your bottlenecks then you can intelligently decide if you need additional database boxes or web boxes. Right now you'd just be guessing.
Also, how does your load testing results compare against your expected traffic? Can you handle two times the expected traffic? Five times? How easy/fast can you acquire and release extra hardware? I'm sure the business requirement is to not fail during launch, so make sure you have lots of capacity available. You can always release it afterwards when the load has stabilized and you know what you need.
I would at least factor out all static content. Set up another vhost somewhere else and load all the graphics, CSS, and JavaScript onto it. You can buy some extra cycles by offloading the serving of that type of content. If you're really concerned, you can sign up for a content distribution service. There are lots now, similar to Akamai, and quite cheap.
Another idea might be to utilize Apache mod_proxy to keep the generated page output for a specific amount of time. APC would also be quite usable... You could employ output buffering capture + the last modified time of related data on the page, and use the APC cached version. If the page isn't valid any more, you regenerate and store in APC again.
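A hedged sketch of that output-buffering + APC idea, simplified to a fixed lifetime rather than a last-modified check; the render function, key scheme, and 60-second TTL are assumptions (and you would skip this entirely for logged-in users):

```php
<?php
// Whole-page cache: serve a stored copy if one is fresh, otherwise capture
// the generated output and keep it for the next visitor.
$key    = 'page:' . $_SERVER['REQUEST_URI'];
$cached = apc_fetch($key, $hit);   // apcu_fetch() on current PHP versions

if ($hit) {
    echo $cached;
    exit;
}

ob_start();
render_page();                     // hypothetical: whatever builds the page today
$html = ob_get_flush();            // send the page and keep a copy of it
apc_store($key, $html, 60);        // cache it for 60 seconds
```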
Good luck. It'll be a learning experience!
Have a beta period where you allow in as many users as you can handle, measure your site's performance, and work out bugs before you go live.
You can either control the number of users explicitly in a private beta, or a Google-style semi-public beta where each user has a number of referrals that they can offer to their friends.
To prepare for or handle a spike (or peak) in load, I would first determine whether you are ready through some simple performance testing with something like JMeter.
It is easy to set up and get started with, and it will give you early metrics on whether you will handle the expected peak load.
However, given your time constraints, other steps to take would be to prepare static versions of the content that will attract the highest attention (such as press releases on your launch day). Also ensure that you are making the best use of client-side caching (one fewer request to your server can make all the difference). The web is already designed for extremely high scalability, and effective use of content caching is your best friend in these situations.
When things calm down, there is an excellent podcast on high scalability on Software Engineering Radio about the design of the new Guardian website.
Good luck on the launch.
I'd personally do a few things:
1) Put in some sort of load balancer/database replication system
This means that you can have your service spread across multiple servers. Can't afford to have more than one server permanently? Use Amazon EC2 - it's good for putting in place for things like this (switch on a few more servers to handle the load).
2) Code in some "High Load" restrictions
For example, if your searching is inefficient - switch it off when load gets to a certain level. "Sorry, we're busy, try again later for searching"
3) Load test... Use something like ApacheBench to stress test your servers.
4) Personally, I think that switching "Keep-Alive" connections off is better. It may slightly reduce overall performance, but it means that instead of the site working well for a few people while the rest get timeouts, everyone gets a somewhat slower but working service if load reaches that level.
Linux Format did a good article on "How to survive a slashdotting"... which I've found useful in the past. It's available online as a PDF
Basic first steps to harden your site for high traffic.
Use a low-cost tool like https://browsermob.com/ to load-test your site. At a minimum, you should be looking at 100K unique visitors per hour. If you get an ad off of the MSN home page, look to be able to handle 500K unique visitors per hour.
Move all static graphic/video content to a CDN. Edgecast and Amazon are two excellent choices.
Use Jet Profiler to profile your MySQL server to analyze any slow performing queries. Minor changes can have huge benefits.
Look into using Varnish - it's a caching reverse proxy server (like Squid, but much more single purpose).
I've run some pretty big sites behind it, and it seemed to work really well.