Caching website by Codeigniter or 3rd party caching web application? - php

I'm very new to the concept "caching", so excuse me if my question is too simple.
So, I'm using CodeIgniter (a PHP framework), and it supports page caching simply by calling $this->output->cache(n), where n is the number of minutes the page remains cached.
(I think) CodeIgniter's caching stores each requested page in a cache file and serves that file immediately when the page is requested again.
There's also a 3rd-party web application called Varnish Cache. It sits between Apache and the client, caches requested pages, and re-sends them when needed. Isn't that the same thing CodeIgniter does, or is it different?
Wouldn't it be a waste to cache each page twice, by CodeIgniter and by Varnish?
Assuming they do exactly the same thing (cache pages and re-send them to the user), which one is more efficient for dynamic (database-driven) websites?

On the surface they do the same thing; however, there are appropriate uses for different levels of cache.
A cache like Varnish, which sits between the client and the web server, provides very high performance because cached responses never reach the application. You use it for static content like CSS, for static pages, and for dynamic content that changes very rarely.
An application cache is a less performant but far more flexible option. Usually you can cache by time, but also by application or request variables such as "current user". This lets you build a state-dependent cache with much finer control. For example, you could cache an object's detail page by its last-modified time in the database.
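To make the last example concrete, here is a minimal sketch of caching a detail page by its last-modified time. It is not CodeIgniter's API: the function name cachedDetailPage(), the record fields (id, name, updated_at) and the plain array used as the cache store are all assumptions made for illustration.

```php
<?php
// Hypothetical in-memory store standing in for a real cache backend.
$cache = [];

/**
 * Return the rendered detail page for a record, re-rendering only when
 * the record's last-modified timestamp has changed.
 */
function cachedDetailPage(array &$cache, array $record, callable $render): string
{
    // The timestamp is part of the key, so an edited record misses automatically.
    $key = 'detail:' . $record['id'] . ':' . $record['updated_at'];
    if (!isset($cache[$key])) {
        $cache[$key] = $render($record);
    }
    return $cache[$key];
}

$renders = 0;
$render = function (array $r) use (&$renders): string {
    $renders++; // count how often the expensive step really runs
    return '<h1>' . htmlspecialchars($r['name']) . '</h1>';
};

$product = ['id' => 7, 'name' => 'Widget', 'updated_at' => 1700000000];
cachedDetailPage($cache, $product, $render); // miss: renders the page
cachedDetailPage($cache, $product, $render); // hit: served from the cache

$product['name'] = 'Widget v2';
$product['updated_at'] = 1700000500; // the record was edited
$html = cachedDetailPage($cache, $product, $render); // new key, renders again
```

Note that you still need a cheap lookup of the timestamp on each request, and entries stored under old timestamps stay around until the cache store evicts them.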

Related

Linux server: Would a cache scheme help reduce hits to 3rd-party server?

I have a situation where my Linux server will be running a website which gets some of its data from a 3rd-party server through a SOAP interface. The data isn't exactly real-time, but it does change every 5 minutes or so. I was told not to have our website hammer their website for data, which I can completely understand.
So I wondered whether this is a good candidate for a cache scheme of some type: when a user comes to our web page to display the data, if the data is less than 5 minutes old (for example), it would come from our server instead of polling the 3rd-party for it. This way, if 100 users come to our website at once, our server won't access the 3rd-party website 100 times to fetch the exact same data within a given time-frame.
Is this a practical thing to do in PHP? Or should this be written in a faster language when it comes to caching? Are there cache packages for this sort of situation which can be used along with a PHP Joomla application? Thanks!
I think memcached is a good choice.
You can set a timeout when you store content in the memcached server; if the key is missing, retrieve the data from the 3rd-party server and store it again.
There is a Memcached extension for PHP; see its documentation.
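A rough sketch of that get-or-fetch pattern follows. So that it runs without the extension or a server, it uses a hypothetical ArrayCache class with the same get()/set() shape as PHP's Memcached class; the key name, the 300-second timeout and the $fetch callable (standing in for the real 3rd-party request) are assumptions too.

```php
<?php
/**
 * Hypothetical stand-in with the same get()/set() shape as PHP's Memcached
 * class, so the sketch runs without the extension or a server.
 */
class ArrayCache
{
    private array $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= time()) {
            return false; // Memcached::get() also returns false on a miss
        }
        return $this->items[$key]['value'];
    }

    public function set(string $key, $value, int $ttl): bool
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
        return true;
    }
}

/** Return the cached data, calling the 3rd-party server only on a miss. */
function getThirdPartyData($cache, callable $fetchFromThirdParty)
{
    $data = $cache->get('third_party_data');
    if ($data === false) {
        $data = $fetchFromThirdParty();               // the expensive remote call
        $cache->set('third_party_data', $data, 300);  // keep it for 5 minutes
    }
    return $data;
}

$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    return ['price' => 42]; // made-up payload
};

$cache  = new ArrayCache();
$first  = getThirdPartyData($cache, $fetch); // miss: calls the 3rd party
$second = getThirdPartyData($cache, $fetch); // hit: no remote call
```

With the real extension you would pass a connected Memcached instance instead of ArrayCache. One caveat of this simple check is that a legitimately stored false value looks like a miss.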
There are lots of ways to solve the problem; we can't say which is the right one without knowing a lot more about the constraints you are working in or how the service is used. If you are using Joomla then you're obviously not bothered about performance - it would be really hard to write anything which has a measurable impact on your html generation times. This does not need to "be written in a faster language", but....
can you install additional software?
have you got access to cron?
at what rate is the service consumed?
how many webservers do you have consuming the service - do they have a shared filesystem? Are they on the same sub-net?
Is the SOAP response cacheable?
how do you deal with non-availability of the service?
For a very scalable solution I would suggest running a simple forward proxy (e.g. squid) but do make sure that it's not accessible from the internet. Sven (see comment elsewhere) is right about POST sometimes not being cacheable - but you can cache the response from a surrogate script on your own site accessed via GET returning appropriate caching instructions - and this could return the data as a serialized php array / object which is much less expensive to process. Indeed whichever method you choose I would recommend caching the parsed response - not the XML. This also allows you to override poor caching information from the service.
If the rate is less than around 1 per minute then the cron solution is overkill. But if it's more than 20 per minute then it makes a lot of sense. If you don't have access to cron / can't install your own software then you might consider simply caching the response and refreshing the cache on demand. Don't bother with memcache unless you are already using it. APC is faster on a single server - but memcache is distributed. If you have multiple servers then use whatever cluster storage you are currently sharing your data in (distributed filesystem / database cluster / shared filesystem....).
Don't try to use locking / mutexes around the cache refresh unless you really have to (i.e. only if accessing the service more than once every 5 minutes is a mortal sin) - this gets real complicated real quick - it's too easy to introduce bugs.
Do make sure you buffer and validate any responses before writing them to the cache.
Yes, just use HTTP. Most of the heavy lifting has already been built into your web server.
Since SOAP is just a simple HTTP POST request with an XML body, you could set up your website or an HTTP API in front of the SOAP endpoint to act as a translator to regular HTTP, attach the appropriate HTTP caching headers to the transformed response body, and then configure an nginx reverse proxy in front of it.
Notably: if the transformation is simple you could just use XSLT to transform the response body from the SOAP API and remove the web service layer entirely.
Your problem is a very small one, which does not require a complicated solution.
You could write a small cron job that is executed every five minutes, sends the request to the SOAP server, and stores the result in a local file. If any script needs the data, it reads the local file. This will result in 288 requests to the SOAP server per day, and have excellent performance for any script call that needs the results because they are already on your server.
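Here is one way the cron script could look, as a sketch only. The function names, the cache file location and the $fetch callable (which would wrap the real SOAP request) are assumptions. It also validates the response before writing, and writes through a temporary file so that a reader never sees a half-written cache.

```php
<?php
/**
 * Refresh a local cache file. Meant to be run from cron every five minutes.
 * $fetch is a hypothetical callable wrapping the real SOAP request.
 */
function refreshCacheFile(string $cacheFile, callable $fetch): bool
{
    $data = $fetch();
    // Validate first, so a bad response never replaces good cached data.
    if (!is_array($data) || $data === []) {
        return false;
    }
    // Write to a temp file, then rename it over the cache file in one step.
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, serialize($data)) === false) {
        return false;
    }
    return rename($tmp, $cacheFile);
}

/** Read the cached data, or null when no usable cache file exists. */
function readCacheFile(string $cacheFile): ?array
{
    if (!is_file($cacheFile)) {
        return null;
    }
    $raw = file_get_contents($cacheFile);
    if ($raw === false) {
        return null;
    }
    $data = unserialize($raw, ['allowed_classes' => false]);
    return is_array($data) ? $data : null;
}

$cacheFile = sys_get_temp_dir() . '/soap_cache_demo.ser';
$ok  = refreshCacheFile($cacheFile, fn() => ['temperature' => 21]); // made-up data
$bad = refreshCacheFile($cacheFile, fn() => null); // simulated failed fetch
$current = readCacheFile($cacheFile);              // still the last good data
```

A crontab entry along the lines of `*/5 * * * * php /path/to/refresh.php` (path hypothetical) would run it every five minutes. Storing the parsed array rather than the raw XML means page requests skip the parsing step as well.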
If you do not have cron jobs available and cannot fake them, any other cache will do. You really don't need fancy stuff like Memcached, unless it already is available. Storing the result to a cache file will work as well. Note that if you have to really fetch the SOAP result from the origin, this will take some more time and might affect the perceived performance of your site.
There are plenty of frameworks which also offer cache support, and if you use one you should investigate if there is support included. I'm not sure if Joomla has something appropriate for you. Otherwise, you can implement something yourself. It isn't that hard.
Cache functionality comes in various flavours:
memory-based, where a separate process on the server holds data in RAM (or overflows to disk) and you query it like you would a database; very efficient and powerful, and will have options to manage storage use and clear up after themselves, but requires setting up additional software on the server; e.g. memcached, redis
file-based, where you just write the data to disk; less efficient, but can be implemented in "user-land" code, i.e. pure PHP; beware of filling up your disk with variant caches that have expired but not been cleaned up; many frameworks have an implementation of this built in
database-backed, where you push data into an RDBMS (e.g. MySQL, PostgreSQL) or fully-featured NoSQL store (e.g. MongoDB); might make sense if you have a large amount of data, and can trade a bit of performance; as with files, you need to make sure that stale data is cleaned up
In each case, the basic idea is that you create a "key" that can tell one request from another (e.g. the name of the SOAP call and its input parameters, serialized), and pick a "lifetime" (how long you want to carry on using the same copy of the data). The caching engine or library then checks for a cache with that key, and if it is still within its "lifetime" returns the previously cached data. If there is a "cache miss" (there is no cache for that key, or it has expired), you perform the costly operation (in your case, the SOAP call) and save to the cache, using the same key.
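As a rough illustration of the key and lifetime idea, here is a pure-PHP file-based sketch. The helper names cacheKey() and cached(), the call name getRates, its parameters and the returned data are all made up for the example.

```php
<?php
/** Build a key from the call name and its input parameters. */
function cacheKey(string $callName, array $params): string
{
    ksort($params); // the same parameters in a different order give the same key
    return $callName . '_' . md5(serialize($params));
}

/** Return the cached value for $key, or run $costly and cache its result. */
function cached(string $dir, string $key, int $lifetime, callable $costly)
{
    $file = $dir . '/' . $key . '.cache';
    // Hit: the file exists and is younger than the lifetime.
    if (is_file($file) && (time() - filemtime($file)) < $lifetime) {
        return unserialize(file_get_contents($file));
    }
    // Miss: do the costly work and save it under the same key.
    $value = $costly();
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

$dir = sys_get_temp_dir() . '/cache_demo_' . uniqid();
mkdir($dir);

$calls = 0;
$soapCall = function () use (&$calls) {
    $calls++; // stands in for the real SOAP request
    return ['EUR' => '0.92'];
};

$key = cacheKey('getRates', ['base' => 'USD', 'date' => '2024-01-01']);
$a = cached($dir, $key, 300, $soapCall); // miss: performs the call
$b = cached($dir, $key, 300, $soapCall); // hit: read back from disk
```

Expired files are only overwritten when the same key is requested again, so a real implementation would also need the clean-up mentioned above for keys that are never requested twice.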
You can do more complex things, like pre-caching things in the background so that there is never a cache miss, or having some code paths which accept stale data in order to return quickly, but these can generally be implemented on top of whatever you're using as the main caching solution.
Edit: Another important decision is at what level of granularity to cache the data, in relation to processing it. At one extreme, you could cache each individual SOAP call: simple to set up, but means re-processing the same data repeatedly, and can cause problems if two responses are related, but cached independently and may get out of sync. At the other extreme, you can cache whole rendered pages: pages load very fast once cached, but creating variations based on the same data without repeating work becomes tricky. In between are various points in your code where you have processed and combined data into meaningful chunks: if your application is well-written, these are the input and output of major functions, or possibly even complete model objects; this is more work to implement, as you have to choose the right keys (avoiding two contexts overwriting each other's caches while ignoring variables that have no impact on the data in question) and values (avoiding repeats of costly work without having to store huge blobs of data which will be slow to unserialize and use up the capacity of your cache store). As with anything else, no approach suits all needs, and a complex application will probably involve caching at multiple levels for different purposes.

Is it problematic to cache files for too long?

I discovered https://developers.google.com/speed/pagespeed/ the other day and have improved my website's page speed from ~75 to ~95 now.
One of the last few things it recommends is that I:
Leverage browser caching: Setting an expiry date or a maximum age in the HTTP headers
for static resources instructs the browser to load previously downloaded resources
from local disk rather than over the network.
The cache time for my main javascript and css files is set to 2 days, Google suggests I set it to at least 1 week. They also suggest that I do the same for html and php files.
What would happen to my users if I decided to make a large website change and they had just cached my website yesterday (for 1 week)? Would they not see the changes on my website until 1 week later?
Also, since my website contains a control panel and has some dynamically generated PHP pages, is there any reason for caching any of it? Wouldn't my server still be churning through php script and generating new content every time they logged into their account?
You probably don't want to cache your HTML and PHP in visitors' browsers. However, you might want to cache at a layer you have more control over, for example PHP opcode caching with APC and a reverse proxy like Varnish.
For the static assets, like your JavaScript and CSS files, it should be safe to cache them a year or more. If you make a change to them you can just update their URL to say mystyles.css?v=123 and browsers will think it's a whole different file from mystyles.css?v=122 or even just mystyles.css.
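If you would rather not bump the number by hand, a small helper can derive it from the file itself. This is only a sketch: assetUrl() and the temporary directory standing in for the document root are assumptions, and it hashes the file content so the URL changes exactly when the file does.

```php
<?php
/**
 * Append a version derived from the file's content to its public URL.
 * $docRoot is the directory the web server serves files from.
 */
function assetUrl(string $publicPath, string $docRoot): string
{
    $file = $docRoot . $publicPath;
    // Short content hash; falls back to "0" when the file is missing.
    $version = is_file($file) ? substr(md5_file($file), 0, 8) : '0';
    return $publicPath . '?v=' . $version;
}

// Demo with a throwaway directory standing in for the document root.
$docRoot = sys_get_temp_dir() . '/assets_demo_' . uniqid();
mkdir($docRoot);

file_put_contents($docRoot . '/mystyles.css', 'body { color: black; }');
$before = assetUrl('/mystyles.css', $docRoot);

file_put_contents($docRoot . '/mystyles.css', 'body { color: navy; }');
$after = assetUrl('/mystyles.css', $docRoot); // different ?v= value
```

Hashing on every request costs a file read, so on a busy site you might compute the versions once at deploy time instead.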

Use a cached version of a php page unless the database has changed

I've looked at similar questions about caching in PHP and I'm still stumped as to how to check whether the database has changed without making a new call to the database, which would defeat the point of caching.
I understand technically how to implement caching in PHP -- using ETag and Last Modified headers, output buffering, storing static files, etc. What is tripping me up is how to determine when to serve up a new version of a page instead of a cached version. If the database content has changed, I want to show the new version and not the cached version.
For example, let's say I have a page that displays details about a product. Generally, once the product info is stored in the database, it won't change much. But occasionally there might be an edit to the product description or a price change. If the product has a new price, I don't want to show the user the old price by using a cached version of the page. For that reason, updating the cached content every hour doesn't seem sufficient. Quite apart from that being too often for content that doesn't change, the real problem is that it won't update the content fast enough when there is a change.
So should I store something (e.g., an ETag value or a static html file) every time the product database is updated through a form in the Admin area of the application? What am I missing here?
[Note: Not interested in using a caching library here. I'd like to learn how to do it in straight PHP for now.]
Caching is a pretty complex topic, because you can cache all sort of data in various places. Usually you implement caching to relieve bottlenecks in your server structure.
In your setup you can cache data at three different locations:
1) Clientside, between client and server
You would use this method to save bandwidth and shorten loading times for the user. You can achieve this by setting cache related fields into the http header (Cache-Control, Expires, ETag and so on).
If you use Cache-Control or Expires, the decision whether to load an updated version from the server depends purely on the client. So even if a new version is available, the user won't see it until the cached copy expires. On the plus side, you save lots of CPU cycles on the server, because your PHP script won't be executed.
If you use ETag, you can inform the client on each request, if the version of the requested content has changed. But your php script will be executed on each request, even if the ETag is unchanged.
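The ETag exchange can be sketched in straight PHP as below. The helper names makeEtag() and conditionalResponse() are made up, the ETag is simply an MD5 of the rendered body, and the comparison is simplified (a real If-None-Match header may carry several values or weak validators).

```php
<?php
/** Derive an ETag from the rendered content. */
function makeEtag(string $body): string
{
    return '"' . md5($body) . '"';
}

/**
 * Decide between a full response and "304 Not Modified".
 * $ifNoneMatch is the value of the client's If-None-Match header, if any.
 */
function conditionalResponse(string $body, ?string $ifNoneMatch): array
{
    $etag = makeEtag($body);
    if ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) {
        // The client's copy is current: send headers only, no body.
        return ['status' => 304, 'headers' => ['ETag' => $etag], 'body' => ''];
    }
    return ['status' => 200, 'headers' => ['ETag' => $etag], 'body' => $body];
}

// In a real script the header would come from
// $_SERVER['HTTP_IF_NONE_MATCH'] ?? null, and the result would be sent
// with http_response_code(), header() and echo.
$page   = '<p>Price: 10 EUR</p>';
$first  = conditionalResponse($page, null);                       // first visit
$second = conditionalResponse($page, $first['headers']['ETag']);  // revisit, unchanged
$third  = conditionalResponse('<p>Price: 12 EUR</p>', $first['headers']['ETag']); // price changed
```

As noted above, the script still runs on every request; what you save is the bandwidth for the body, not the work of producing it.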
2) Serverside, between client and server
This kind of caching primarily reduces high cpu load on your server. It won't affect the amount of traffic generated between client and server.
You can use a reverse proxy like Varnish to store rendered responses on the server side. The good thing is that you have full control over the cache. If an updated version of the requested content is available, you can simply purge the old version from the cache, so that a new version is generated by your PHP script and stored in the cache.
Every response that is cacheable will only be generated one time and then be served from cache to the clients.
3) In your application
If you are heavily using your database, you should consider using a fast key value store like memcached to cache query results. Of course you have to adjust your database classes for this (first ask memcached, if memcached doesn't have the result ask the database and store the result into memcached), but the performance gain will be quite impressive, because memcached is really fast.
Sometimes it even makes sense to store data solely in memcached, if the data doesn't have to be persisted permanently (PHP sessions, for example).
I faced the same problem a long time ago (I don't know whether you will find my approach correct).
Why I needed the cache:
My site used to update the database by running a script in cron.php, and index.php used to show the listing from the database (this used to take ages to load).
My solution:
Every time a list was created or updated, I unlinked the cache file. Then, in index.php, I checked whether the cache file exists: if it does, load the cache; otherwise load the content from the database and, at the same time, write that data to the cache file so it is there the next time a user requests index.php.
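A minimal sketch of that approach, with made-up function names, a temp-file path, and a closure standing in for the slow database query:

```php
<?php
/** Serve the listing from the cache file, rebuilding it only when missing. */
function loadListing(string $cacheFile, callable $queryDatabase): string
{
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile); // cache exists: no database call
    }
    $html = $queryDatabase();                 // slow path
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

/** Call this from whatever code creates or updates a list. */
function invalidateListing(string $cacheFile): void
{
    if (is_file($cacheFile)) {
        unlink($cacheFile);
    }
    clearstatcache(true, $cacheFile); // make sure later is_file() checks are fresh
}

$cacheFile = sys_get_temp_dir() . '/listing_demo_' . uniqid() . '.html';
$rows = ['First item'];
$queries = 0;
$query = function () use (&$rows, &$queries): string {
    $queries++; // stands in for the slow database query
    return '<ul><li>' . implode('</li><li>', $rows) . '</li></ul>';
};

loadListing($cacheFile, $query);            // builds and caches
loadListing($cacheFile, $query);            // served from the file

$rows[] = 'Second item';                    // the "database" changes...
invalidateListing($cacheFile);              // ...so the writer drops the cache
$fresh = loadListing($cacheFile, $query);   // rebuilt with the new row
```

The page request never has to ask the database whether anything changed; the code that writes to the database is responsible for dropping the cache.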

Caching, CDN - are they the same for PHP Yii site? How to use it for a dynamic php site?

I have a dynamic php (Yii framework based) site. User has to login to do anything on the site. I am trying to understand how caching and CDN work; and I am a bit confused.
Caching (memcache):
My site has a good amount of css, js, and images. I've been given to understand that enabling caching ("memcache"?) will GREATLY speed up my site. But this has me confused. How does caching help? I mean, how can you cache something that's coming out of DB for each user separately? For instance, user-1 logs-in, he sees his control panel. User-2 logs-in, user 2 will see their control panel.
How do I determine what to cache? Plus, how do I enable caching (memcaching)?
CDN:
I have been told to use a content delivery network like CloudFlare. It is supposed to automatically cache my site. So, when my user-1 logs in, what will it cache? Will it cache only the homepage CSS, JS, and homepage images? Because everything else requires login? What happens when user logs-out? I mean, do "sessions" interfere with working of a CDN?
Does serving up images via CDN reduce significant load on my server? I don't have much cash for getting a clustered-server configuration. So, I just want my (shared) server to be able to devote all its resources in processing PHP code. So, how much load can I save by using "caching" (something like memcache) and/or "CDN" (something like CloudFlare)?
Finally,
What would be general strategy to implement in this scenario for caching, cdn, and basic performance optimization? do I need to make any changes to my php-code to enable CDN like CloudFlare and to enable/implement/configure caching? What can I do that would take least amount of developer/coding time and will make my site run much much faster?
Oh wait, some of my pages like "about us" page etc. are going to be static html too. But they won't get as many hits. Except for maybe the iFrame page that will be used for my Facebook Page.
I actually work for CloudFlare & thought I would hop in to address some of the concerns.
"do I need to make any changes to my php-code to enable CDN like
CloudFlare and to enable/implement/configure caching? What can I do
that would take least amount of developer/coding time and will make my
site run much much faster?"
No, nothing like a need to re-write urls, etc. We automatically cache static content by file extension. This does require changing your DNS to point to us, however.
Does serving up images via CDN reduce significant load on my server?
Yes, and it should also help most visitors access the site faster and save you a fair amount on bandwidth.
"Oh wait, some of my pages like "about us" page etc. are going to be
static html too."
CloudFlare doesn't cache HTML by default. You can use Page Rules to set up more advanced caching options for things like static HTML.
Caching helps because, instead of performing disk I/O for each user, the data is kept in memory (e.g. in memcached). This provides a SIGNIFICANT increase in performance.
Memcache is generally used for caching data, i.e. query results.
http://pureform.wordpress.com/2008/05/21/using-memcache-with-mysql-and-php/
There are lots of tutorials.
I have only ever used Amazon S3, which is not quite a CDN. It is more of a storage platform, but it still helps take the load off my own servers when serving media.
I would put all of your static resources, including JS and CSS, on a CDN so your own server does not have to serve them. It would not require any modification to your PHP code.
For your static pages (your about page) I'd make sure that php isn't processing that since there is no reason for it. Your web server should serve it directly.
Caching will require changes to your code. A normal flow for caching is:
1) user makes a request
2) check if data is in cache
3) if it is not in cache do the DB query and put it in cache
4) if it is in cache retrieve it
5) return data.
You can cache anything that requires disk io and you should see a speed up.
Memcached works by storing data (usually fetched from a remote database server, or from a database engine on the same server) as key-value pairs in RAM, not in files on the web server's filesystem. Reading a value straight from memory is much, much faster than repeating the query each time. This is typically useful when you have data that can be safely stored for certain periods of time because it is not subject to regular changes.
The way this works is that if you want to store a user's account information in a cache to speed up loading pages where that user is logged in. You would load the information and cache it locally. On any subsequent requests for that data, it will load in a fraction of the time it normally would take to load that information from the database itself. Obviously you will need to make sure that you update/recache that information if the user changes it while logged in, but you will greatly reduce the time it takes to serve up pages if you implement a caching system that can minimize the time spent waiting on the database.
I'm personally not familiar with CloudFlare so I can't offer any advice to that effect, but in terms of implementing caching in your application, you should check out:
http://code.google.com/p/memcached/wiki/NewOverview
And read the rest of the Wiki entries there which cover installation/implementation/etc. That should get you started on the right track.

php "page caching" solution suggestions for CMS Applications

Most examples use time-based cache expiration. I'd like to read more about file caches (where the database is called only when there is no file in a given directory). This is for a basic information site with CMS functions made with php/mysql. My searches are returning too many sites on web applications. Adding CMS to the search returns script repositories. I'd appreciate your suggestions.
It's not hard to write something like this yourself. Use file_exists() to check whether a specific file exists, or glob() to find how many files match a given pattern.
I use a page build system...
Each page created is given a GUID. When a request comes in for the page, check whether a file named GUID.xxx is in the cache: if so, serve it; if not, build the page and cache it.
On editing a page (or if its expiration has passed), delete the file from the cache.
You can elaborate at will on how the expiration is determined and administered, which portions of the page to cache, and which to build dynamically on each request...
I'm not quite sure what you're looking for.
If you're talking about generating a page (from the CMS) and placing it at the requested URI (so the next request bypasses even the CMS) - it's possible, but you make refreshing the 'cache' a little difficult.
However, what you may be looking for is just a server side cache (as opposed to telling the browser how long to cache a page). Those are usually file or memory based, and if you place the caching mechanism high in the CMS flow (perhaps where it processes the requests), you'll be caching a large portion of page creation.
Some cache libraries let you set an unlimited lifetime (for example Zend_Cache), leaving the cache maintenance up to you. That may be what you're looking for.
