If you open up your Mozilla Firefox web browser and turn on Firebug to check incoming and outgoing network traffic, you will see that, when you look at Wikipedia articles, a large amount of the content is served from the cache.
Unless the article in question has many pictures, most of the content comes from the cache.
I'd like to know whether that is done by the browser itself or by some underlying PHP caching mechanism (is that what they call memcache? APC?). It works very well, so I'd like to know how they do it.
Memcached, APC, etc. are server-side data stores. You basically use them as key-value stores so you don't have to hit your database all the time.
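As a rough illustration, here is a minimal sketch of that key-value usage with PHP's Memcached extension; the server address, key name, value, and TTL are assumptions for the example:

    <?php
    // Minimal sketch: memcached as a key-value store.
    // The server address, key, value, and TTL below are assumptions for the example.
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    // Store a computed fragment under a key for 10 minutes...
    $cache->set('homepage_sidebar', '<ul><li>Example sidebar markup</li></ul>', 600);

    // ...and read it back on later requests instead of asking the database again.
    $sidebarHtml = $cache->get('homepage_sidebar');
    echo $sidebarHtml;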
However, what you're actually seeing is a site being loaded from a primed cache. The technique is to have your web server tell the browser that your commonly used resources haven't changed since the last time you viewed them. This is achieved by setting far-future Expires headers so that the browser doesn't keep re-requesting those resources. A lot of sites use this technique, including SO.
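For a static asset served through PHP, those far-future headers could look something like the sketch below; the one-year lifetime and the file path are assumptions:

    <?php
    // Sketch: serve a stylesheet with far-future caching headers.
    // A one-year lifetime is common; you then "bust" the cache by changing the URL
    // whenever the file changes.
    $lifetime = 365 * 24 * 60 * 60;
    header('Cache-Control: public, max-age=' . $lifetime);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $lifetime) . ' GMT');
    header('Content-Type: text/css');
    readfile(__DIR__ . '/assets/site.css'); // hypothetical path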
Here's a great source to read up on if you want more info: http://developer.yahoo.com/performance/rules.html
Related
I have a very simple question, which I have been unable to find a clear answer to. I have a web page which is generated dynamically (so I absolutely do not want it cached), which loads in several thousand images. Since there are so many images, and they never change, I very definitely want the user's browser to cache these images.
I'm applying headers to the HTML page to prevent caching, by following the advice in this solution: How to control web page caching, across all browsers?
My question is: will this also cause the user's browser not to cache any images this page contains, or will it cache them? Thank you.
TL;DR the answer is not clear because it is complicated.
There is an ongoing struggle between the drive to do the "right" thing (i.e., follow the standards, which themselves have changed over time) and the drive to "improve" the standards to achieve better performance or a smoother navigation experience for users. So from the application's point of view you need to properly use headers such as ETag, If-Modified-Since and Expires, together with cache-hinting pragmas, but the browser - or something in the middle, such as a proxy - might still decide to override what would seem to be the "clear thing to do".
On the latest Firefox, talking directly to Apache 2.4 on a virtual Ubuntu machine, I tried with a page (test.html) referring to an image (test.jpg).
When the page is cached, on the server side I see a single request for the HTML and nothing for the image. What is probably happening is that the "rendering" part of Firefox does request the image (it has to!), but it is supplied entirely from the local cache. This makes sense: if the page has not changed, its content hasn't changed either.
When the page is not cached, I see two requests, one for the page and one for the image, and the server responds to the latter with a 304, but only because I also send the image's Last-Modified header. This also makes sense: if the page has changed, the images might have changed too, so the browser has to know whether this is the case, and it can only find out by asking the server (unless the Expires header is used to "assure" the client that the image will not change).
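That Last-Modified / 304 exchange can be reproduced when an image is served through PHP; here is a rough sketch, where the image path is an assumption:

    <?php
    // Sketch: answer conditional requests for an image with 304 when it hasn't changed.
    $file = __DIR__ . '/test.jpg'; // assumed path
    $lastModified = filemtime($file);

    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

    // If the browser sent If-Modified-Since and the file is not newer, reply 304 with no body.
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
        strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
        http_response_code(304);
        exit;
    }

    header('Content-Type: image/jpeg');
    header('Content-Length: ' . filesize($file));
    readfile($file);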
I have not yet tried with an uncached page that responds with a 304. I expect it to generate a single request (no image request to the server), for the same reasons.
What you might want to consider is that this way you will not cache the HTML page, but the browser might still issue a thousand image requests (which will yield a thousand 304s, but still). Performance in this case depends on whether the requests are sent independently or back-to-back using the HTTP/1.1 Keep-Alive extension (which has to be enabled and advertised server-side).
You should then use the Expires header on the images to tell the client that those resources will not go stale anytime soon.
You might perhaps also want to explore a different approach:
the HTML is cached
images are cached too
the HTML also references a (cached?) JavaScript file
variable content is loaded by the JavaScript via AJAX. That request can be made cache-unfriendly by including a timestamp, without involving the server at all.
This way you can configure the server for caching everything everywhere, except where you make sure it can't via a single crafted request.
Problem
When you load a page on my site, oftentimes one (or several) of the images (JPG files that I have saved from Lightroom and/or Photoshop) will not appear. It instead looks like a broken link (the ALT description appears) but no image. A hard reload of the browser solves the problem (i.e., all images load properly after a hard reload).
Error Message
Chrome displays an "ERR_CONTENT_LENGTH_MISMATCH" warning for all images it does not load. (Sometimes the image will flash quickly before going to what looks like a dead image)
Setup
Running the latest version of WordPress (4.2.2) on a shared host. The site is SSL (https), if that matters. Images are located in an image upload folder (nothing complex like ImageMagick, etc.) on the host.
My Troubleshooting
I have replicated the issue from multiple locations using various ISPs on various machines (both Mac & PC) and with various browsers (Chrome & Safari) some of which are not using any ad-blockers.
What I've tried is the following:
I asked the host if there was an issue on the server side. They claim no.
I've tried resetting the functions.php file. No impact.
I've disabled all plug-ins. No impact.
I've hardkeyed in the meta charset as UTF-8. No impact.
Checked if I'm using Gzip. I am not.
Enabled a WordPress cache plugin. No impact.
Cleared .htaccess of all non-necessary redirects & commands. No impact.
Replaced the wp-admin and wp-includes folders from a fresh install. No impact.
Deleted WordPress and reinstalled from a backup. No dice.
I've put the source code from pages that have this issue into a test.html file, and the images seem to load fine that way.
My Thoughts & Questions
The images are 100-200 KB each, and sometimes there are a fair number of them on the page. Is something timing out, and then once I hard reload, everything shows up because the timeout isn't tripped? That is the best guess I can come up with without understanding the issue properly.
Any ideas of things I can try? Should I delete the whole database and start again? Everything I know about computers is self-taught and server issues are not a strong point for me. Even if you don't know what it might be, could someone explain what a content length mismatch is in general terms?
Thanks much!
When you request data from a web server, it responds first with some information about the data (the HTTP headers) and then with the data itself. One of these headers is called Content-Length, and it tells the client how much data it should expect to receive. When your browser fetches an image, the server's response (very simplified) looks like:
Content-Length: 100000
< the image, 100000 bytes of data >
The client knows the response is complete when it has received the amount of data announced by Content-Length. Until it has received, in this case, 100,000 bytes (roughly 100 KB), it considers the image to not be done loading.
If the server ends the response before the client has received the announced amount of data, or if the client receives more data than was announced, the client will throw some sort of error, assume the data is corrupted/unusable, and discard it. How this is handled can vary between browsers.
How did you upload the images to your website? I have myself encountered this problem in a situation where the file's supposed size was stored in the database and was used to set the Content-Length header, but the size in the DB wasn't correct for the file. HOWEVER, I know that WordPress does not store file sizes in the database; media uploads are simply represented by a URL.
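For illustration, a script that derives the header from the file actually on disk avoids that particular kind of mismatch; this is only a sketch, and the path is an assumption:

    <?php
    // Sketch: always derive Content-Length from the file on disk,
    // not from a size stored elsewhere (e.g. a database row) that may be stale.
    $file = '/var/www/uploads/photo.jpg'; // hypothetical path

    header('Content-Type: image/jpeg');
    header('Content-Length: ' . filesize($file));
    readfile($file);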
This could also happen if the web server runs out of resources and can no longer fulfill your requests; you said you had lots of images per page. If you are on a really lousy shared hosting plan, it may be the case that the host imposes limits, or that the server simply can't handle the traffic of all the sites it hosts.
I wanted to circle back on this in case someone else is experiencing the problem. It appears that some kind of glitch between HTTPS and image retrieval was causing it. While I don't understand WHY that is, I converted my site from SSL/HTTPS to plain HTTP (which I was able to do, as it doesn't require encryption) and all images now load as they should.
If someone understands the "why", I'd love to know what the issue actually is. Luckily, I was able to come up with a workaround. So, while this doesn't answer my question, it does provide context on what is causing the problem, along with my common-sense workaround.
You might see this problem with a shared hosting service. Free bandwidth is like free speech, not free beer. Resource outage policies are invoked during traffic spikes.
A distributed system architecture solves this by inserting a front-end CDN tier (e.g. CloudFlare). CDNs cache your static resources and can vastly reduce the load on your host; in fact, for completely static sites the host can be shut down entirely.
There are other advantages to CDNs, like attack detection, free SSL (not beer) and overall improved performance and security compared to shared hosting alone.
Many CDNs are free (as in speech). You could also upgrade to private hosting, but that costs money, and you might still want a front-end tier.
I'm working on a website that can be found here:
http://odesktestanswers2013.com/Metareviewer
The index page appears to be unusually slow (it slows down the browser as it loads), even though YSlow doesn't seem to see anything particularly wrong with it and my PHP microtime returns a fine value.
What other things should I be looking into?
Using Chrome Developer Tools, the network tab shows this:
... a timeline of what's loading in your page.
There are also plenty of good practices that aren't being followed here. Some of these can be flagged up by using Google Chrome's Audit tool (F12 menu), but in my opinion the most important are:
Use a CDN for serving common library code. Do you really need to host jQuery yourself? (Side rant: do you really need jQuery at all?)
Your JavaScript files are taking a long time to load because each is served as a separate HTTP request. You can combine them into a single JavaScript file, and also minify them, to save a lot of bandwidth.
Foundation.css is very large. Not that there's anything wrong with large CSS files as such, but it looks like over 2,000 of the rules in that file aren't being used on your site. Do you need this file?
CACHE ALL THE THINGS - 26 HTTP requests are made, all uncached, meaning that everyone who visits your site has to download everything again on every request.
The total bandwidth can be reduced by about two thirds if you enable gzip compression on your server (or, even better, implement SPDY, but that's a newer technology with a smaller community); see the sketch after this list.
Take a look at http://caniuse.com - a lot of CSS features are supported in modern browsers without the need for -webkit or -moz prefixes, which could save a fortune of kilobytes.
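Enabling compression is normally done at the web-server level (e.g. Apache's mod_deflate); if you can't touch the server configuration, PHP can gzip its own output as a fallback. A minimal sketch:

    <?php
    // Sketch: compress this script's output with gzip when the client supports it.
    // ob_gzhandler checks the Accept-Encoding request header and falls back to
    // uncompressed output automatically, so it is safe for older clients.
    ob_start('ob_gzhandler');

    echo '<!DOCTYPE html><html><body>';
    echo str_repeat('<p>Some highly compressible markup.</p>', 100);
    echo '</body></html>';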
If I could change one thing on your site...
Having said all of that, each point above will make a very small (but cumulative) difference to the speed of your site, so it's probably a good idea to attack the worst offender first.
Look at the network graph. While all that JavaScript is downloading, it blocks the rest of the site from downloading.
If you're lazy, just move it all to the end of the document body. That way, the rest of the page will download before the JavaScript does, although this could harm the execution of your scripts if they are written in particular styles.
Hope this helps.
You should also consider using http://www.webpagetest.org/
It's one of the best tools when it comes to benchmarking your site's performance.
You can use this site (http://gtmetrix.com/) to analyze the causes and to fix them. The site provides the reasons as well as solutions, such as JS and CSS in optimized formats.
As per that site's report, you need to optimize the images and minify the JS and CSS files. The optimized images and minified JS and CSS files can be downloaded from the site.
Use Google Chrome -> F12 -> Network and check the connect, send, receive, etc. times for each resource used in your page.
It looks like your CSS and JS files have very long connect and wait times.
You can use the YSlow add-on, which is available for both Chrome and Firefox.
YSlow analyzes web pages and suggests ways to improve their performance based on a set of rules for high-performance web pages.
The add-on is freely available for Firefox, and you can also search for the Chrome version.
YSlow gives you details about your website's front end. Most likely you have a script that is looping one too many times in the background.
If you suspect that a sequence of code is hanging server-side, then you need to do a stack trace to pinpoint exactly where the overhead is occurring.
I recommend using New Relic.
Try using Opera. Right click -> Inspect element -> Profiler.
Also look at Inspect element -> Errors.
I have a dynamic PHP (Yii framework based) site. Users have to log in to do anything on the site. I am trying to understand how caching and CDNs work, and I am a bit confused.
Caching (memcache):
My site has a good amount of CSS, JS, and images. I've been given to understand that enabling caching ("memcache"?) will GREATLY speed up my site. But this has me confused. How does caching help? I mean, how can you cache something that comes out of the DB differently for each user? For instance, user 1 logs in and sees his control panel; user 2 logs in and sees their own control panel.
How do I determine what to cache? Plus, how do I enable caching (memcaching)?
CDN:
I have been told to use a content delivery network like CloudFlare. It is supposed to automatically cache my site. So, when user 1 logs in, what will it cache? Will it cache only the homepage CSS, JS, and homepage images, since everything else requires login? What happens when a user logs out? I mean, do "sessions" interfere with the working of a CDN?
Does serving images via a CDN significantly reduce the load on my server? I don't have much cash for a clustered-server configuration, so I just want my (shared) server to be able to devote all its resources to processing PHP code. How much load can I save by using "caching" (something like memcache) and/or a "CDN" (something like CloudFlare)?
Finally,
What would be the general strategy to implement in this scenario for caching, a CDN, and basic performance optimization? Do I need to make any changes to my PHP code to enable a CDN like CloudFlare and to enable/implement/configure caching? What can I do that would take the least amount of developer/coding time and make my site run much, much faster?
Oh wait, some of my pages, like the "about us" page, are going to be static HTML too. But they won't get as many hits, except maybe for the iFrame page that will be used for my Facebook Page.
I actually work for CloudFlare and thought I would hop in to address some of the concerns.
"do I need to make any changes to my php-code to enable CDN like
CloudFlare and to enable/implement/configure caching? What can I do
that would take least amount of developer/coding time and will make my
site run much much faster?"
No, there is nothing like a need to rewrite URLs, etc. We automatically cache static content by file extension. It does, however, require changing your DNS to point to us.
Does serving up images via CDN reduce significant load on my server?
Yes, and it should also help most visitors access the site faster and save you a fair amount on bandwidth.
"Oh wait, some of my pages like "about us" page etc. are going to be
static html too."
CloudFlare doesn't cache HTML by default. You can use Page Rules to set up more advanced caching options for things like static HTML.
Caching helps because, instead of performing disk I/O for each user's request, the data is stored in memory (i.e., in memcached). This provides a SIGNIFICANT increase in performance.
Memcache is generally used for caching data, i.e., query results.
http://pureform.wordpress.com/2008/05/21/using-memcache-with-mysql-and-php/
There are lots of tutorials.
I have only ever used Amazon S3, which is not quite a CDN. It is more of a storage platform, but it still helps take the load off my own servers when serving media.
I would put all of your static resources on a CDN so your own server would not have to serve them; this would not require any modification to your PHP code. This includes JS and CSS.
For your static pages (your about page), I'd make sure that PHP isn't processing them, since there is no reason for it; your web server should serve them directly.
Caching will require changes to your code. For caching, the normal flow is (see the sketch after this list):
1) The user makes a request.
2) Check whether the data is in the cache.
3) If it is not in the cache, do the DB query and put the result in the cache.
4) If it is in the cache, retrieve it from there.
5) Return the data.
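A sketch of that flow in PHP, where the memcached address, the DSN, the table, and the key name are all assumptions for the example:

    <?php
    // Sketch of the cache-aside flow described above.
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    $userId = 42;                         // assumed: normally taken from the session
    $key = 'user_profile_' . $userId;

    // 2) Check whether the data is already in the cache.
    $profile = $cache->get($key);

    if ($profile === false) {
        // 3) Not in the cache: run the DB query and store the result.
        $pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'secret'); // assumed credentials
        $stmt = $pdo->prepare('SELECT name, email FROM users WHERE id = ?');
        $stmt->execute([$userId]);
        $profile = $stmt->fetch(PDO::FETCH_ASSOC);

        $cache->set($key, $profile, 600); // keep it for 10 minutes
    }

    // 5) Return (here: output) the data.
    print_r($profile);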
You can cache anything that requires disk I/O, and you should see a speed-up.
Memcached works by storing data (usually query results coming from a database, which may live on a remote server or on the same machine as the web server) in memory on one or more cache servers. Reading that data back from memory in a regular key-value format is much, much faster than running the same query against the database each time. This is typically useful when you have data that can be safely reused for a certain period of time, because it is not subject to constant change.
The way this works: if you want to speed up pages where a user is logged in, you can store that user's account information in the cache. On any subsequent request for that data, it will load in a fraction of the time it would normally take to fetch it from the database. Obviously you will need to make sure that you update or re-cache that information if the user changes it while logged in, but you will greatly reduce the time it takes to serve pages if you implement a caching layer that minimizes time spent waiting on the database.
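Keeping the cache honest on updates usually amounts to writing the change to the database and then deleting (or re-setting) the cached key; a sketch, where the key name and the helper function are hypothetical:

    <?php
    // Sketch: invalidate the cached copy when the user edits their profile.
    function updateUserEmail(PDO $pdo, Memcached $cache, int $userId, string $newEmail): void
    {
        // 1) Write the change to the database first.
        $stmt = $pdo->prepare('UPDATE users SET email = ? WHERE id = ?');
        $stmt->execute([$newEmail, $userId]);

        // 2) Drop the stale cached copy; the next read will re-cache the fresh row.
        $cache->delete('user_profile_' . $userId);
    }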
I'm personally not familiar with CloudFlare so I can't offer any advice to that effect, but in terms of implementing caching in your application, you should check out:
http://code.google.com/p/memcached/wiki/NewOverview
And read the rest of the Wiki entries there which cover installation/implementation/etc. That should get you started on the right track.
I'd like to create a button on my site that COMPLETELY clears my cache, since neither Safari's nor Chrome's built-in features seem to work at all. Is this possible?
Not possible. That would expose low-level functionality to public access. Even if an exploit would simply empty your cache, it would still be undesirable. Firefox and Chrome both use Shift+Ctrl+Del for this, so at the cost of actually having to use your keyboard, you can do the same thing without the security risk.
It sounds like you want to clear the cached data that sits between your server and your browser, not the data that the browser has cached. A copy of your resources may be sitting on a machine between your client computer and your server, and that machine is returning the cached copy instead of asking your server for the data again.
You should read up on different caching methods so you can set certain types of files to be cached for a certain amount of time, etc. Try this for starters.
I usually set up static resources (CSS, JS, etc.) to be cached for a long period of time, but I change the URL when I have made changes to a file. I usually do this by rewriting the request URL, so /resources/dummy/file.css becomes /resources/file.css, and I can change "dummy" whenever I want. This creates the illusion of a different file (one that hasn't been cached yet) without my having to rename the actual file.
RewriteRule resources/[^/]+/([^/]+)$ resources/$1
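On the PHP side, a small hypothetical helper can generate those "dummy" URLs so every template uses the current version string; bumping the constant then busts the cache for all clients (the constant name, its value, and the helper are assumptions):

    <?php
    // Hypothetical helper matching the rewrite rule above:
    // /resources/<version>/file.css is rewritten by Apache to /resources/file.css,
    // so changing ASSET_VERSION makes browsers fetch a "new", uncached URL.
    const ASSET_VERSION = 'v42'; // assumed version label

    function asset_url(string $filename): string
    {
        return '/resources/' . ASSET_VERSION . '/' . $filename;
    }

    echo '<link rel="stylesheet" href="' . asset_url('file.css') . '">';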
I highly doubt this is possible, especially since you want it to be cross-browser. It would create a serious security flaw ripe for exploitation.