Here is what I found after investigating cache behavior in Magento.
I'm not sure about this and am asking for corrections.
When something like a product is modified, cache entries such as "HTML Block" become "invalidated", meaning they are ignored and no longer used on the frontend. This makes sense because that data is now outdated.
It remains "invalidated" until manually "refreshed" through admin area.
Once manually "refreshed", the first render of a cached block will construct its cached copy and append it to this HTML Block cache reserve. Subsequent render operation of this block will find this cache usable, and use it, finally, until cache becomes "invalidated" over again.
Why is this process called "refresh" when it should be something like "reset"? "Refresh" would mean it generates an updated cache snapshot, but instead it merely allows cache entries to be constructed again.
Why doesn't invalidated data get refreshed as soon as it is invalidated?
This makes me question my conclusion -- was I correct?
Why is this process called "refresh" when it should be something like "reset"?
Your general take on this is correct -- some people call it "refresh" because although the action you take resets the cache, in a working Magento system the cache will almost immediately rebuild itself the next time you (or another user) load the page.
Why doesn't invalidated data get refreshed as soon as it is invalidated?
When the cache is invalidated, it means the developer working on whatever backend feature invalidated the cache knew their actions required a cache refresh, but that programmatic cache control alone wasn't sufficient to refresh only their portion of the changed cache.
For example, certain blocks might render a product's price, which means any blocks with that price cached need to be refreshed when it changes. However, as a backend programmer, there's no way to know which blocks need that invalidation, nor which cache system (block cache, FPC, Varnish) they're stored in. There's also a question of store performance -- if you're editing 100 products, do you want Magento to rebuild the cache 100 times during peak traffic hours? So, instead of deciding how to handle all that, the developer marks the cache as invalidated. This allows the cache system to take whatever action it deems necessary.
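To make that concrete, here is a rough sketch of what "marking the cache as invalidated" looks like in Magento 1. The module, class and event names are made up for illustration; the relevant call is Mage_Core_Model_Cache::invalidateType(), which only flags the cache type in Cache Management rather than flushing anything.

<?php
// Hypothetical observer in a custom module (names are illustrative).
class My_Module_Model_Observer
{
    public function productSaveAfter(Varien_Event_Observer $observer)
    {
        // Flag the block HTML cache as invalidated; the actual flush is
        // left to the store owner (or whatever process they choose).
        Mage::app()->getCacheInstance()->invalidateType('block_html');

        // If the store runs full page cache, that type could be flagged too:
        // Mage::app()->getCacheInstance()->invalidateType('full_page');
    }
}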
In a perfect theoretical cache system, there would be automated processes running that would detect an invalidated cache and know what to do and when to refresh it. That's a complex system to implement and maintain, so instead Magento chose to simply notify the store owners of the invalidated cache and let them take whatever action they deemed appropriate.
By default, Magento cache refreshing should happen when data important to the user is modified -- e.g. order data, shipping info, etc.
This is the behavior I observed during my years writing extensions for the software. You can manually disable this behavior, but by default, dynamic data should be punching "holes" through the cache.
Related
We run a Magento shop with Varnish. Everything works fine except for the following problem: if you open a shop page, leave the browser open for a very long time (say 12-24 hours), and then reload the page, the page loads very slowly (about 15 s).
We traced the problem to the session_start() call in app/code/core/Mage/Core/Model/Session/Abstract.php. This call takes about 15 s.
We use memcache (not memcached) for session management.
We googled a lot, and found many posts about slow session starts, but none about this particular issue.
Can anybody help?
Many thanks in advance,
Tilman
I've seen this quite a lot on New Relic as well.
From what I've seen there are a few different causes. I don't have a complete understanding of this issue, but it's something I've been looking into recently. Here are my findings.
Sessions in Magento, Locking, and New Relic
Every controller action in Magento uses the session, whether it needs to or not. The session is eagerly instantiated in Mage_Core_Controller_Varien_Action::preDispatch.
If you have session locking enabled, this means that for the duration of the request your session is locked down until the request completes. I haven't found the bit of code that releases the session lock yet, but I'm pretty sure it's in there somewhere.
Ultimately this means that if you fire off multiple concurrent requests to Magento controller actions from one location using the same session, you will have to wait for some of those requests to complete and unlock the session before the others can proceed. I usually see this as a slow transaction on New Relic stuck at Mage_Core_Model_Session_Abstract_Varien::start for ~30 seconds (my session lock wait timeout, I think).
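As an aside, the lock itself is ordinary PHP session locking, so the generic mitigation (sketched below; plain PHP, not Magento-specific advice) is to close the session as soon as a request has read what it needs, which releases the lock for concurrent requests on the same session:

<?php
// Plain PHP illustration, not Magento core code.
session_start();
$customerId = isset($_SESSION['customer_id']) ? $_SESSION['customer_id'] : null;

// Release the session lock; writes to $_SESSION after this point are not persisted.
session_write_close();

// ...long-running work (external API calls, heavy rendering) continues without
// blocking other requests that share this session...

Magento does not do this for you out of the box, so treat it as a pattern to adapt rather than a drop-in fix.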
This reporting in New Relic has multiple downsides, as I see it:
Slows down the total average response time, because these requests are slower than they otherwise should have been.
New Relic records a sample of the slowest transactions. If I have performance bottlenecks that take, for example, 20 seconds, New Relic will not report them automatically for me when the same URL is plagued by session locking timeouts. The timeouts are hiding the useful data.
Causes
I've seen a few common causes for this; it's not a definitive list by any means.
Bots
Crawlers like Baidu and Yandex being a bit rude and battering the website. They're run from one location, firing off numerous requests using the same session and tripping up the session locking mechanism, hence the slow transactions showing up in New Relic.
Ajax calls to Magento controller actions
With Varnished websites, customer-specific data must be loaded with care; some websites manage this by using ajax calls to the Magento backend to get the required data. I have also seen some websites using ajax calls to the backend to get product-specific information, such as the amount left in stock when an item is on sale.
If a single page triggers multiple ajax calls to the backend on page load, it can potentially trigger the session locking mechanism. The more ajax calls back to the Magento backend the more likely you are to experience locking.
Varnish ESI
The same as above really, except that instead of ajax calls it uses Edge Side Includes, which appear to be additional calls to the backend.
My Plan
I have not actioned this yet, so it's still purely theoretical, but it's something I'm looking into doing over the next few months.
I brought this problem up during the Mage Titans UK 2016 conference and Fabrizio Branca pointed me towards the following module: https://github.com/AOEpeople/Aoe_BlackHoleSession.
Based on a regular expression, the module prevents bots from creating real sessions. This should have the benefit that no session lock is hit and that your session resources aren't battered by rude bots. Bots should no longer pollute your New Relic readings.
For ajax/ESI calls that fetch customer data on cached pages, there's nothing you can do that I can see. You need access to the session in order to retrieve customer-specific data.
However, for ajax/ESI calls that fetch catalog-specific data (such as limited stock), I don't see any need for a session to exist on that request at all. My plan for the future is to trial an extension to the Aoe_BlackHoleSession module so that I can silo off requests to a specific URL as sessionless.
I'm less familiar with the internals of ESI, so sadly I don't have too much to comment there.
An alternative
During the conference Fabrizio Branca said he was able to disable session locking completely without any ill effects -- test at your own risk.
We have 2x pfSense firewalls in HA; behind those, 2x Zen Load Balancers in a master/slave cluster; behind those, 3x front-end web stack servers running Nginx, PHP-FPM and PHP-APC. In the same network segment, there are 2x MySQL DB servers in master/slave replication.
PHP sessions on the front ends should be "cleaned up" after 1440 seconds:
session.gc_maxlifetime = 1440
Cookies expire when the user's browser closes:
session.cookie_lifetime = 0
Today, we were alerted by an end user that they logged in (PHP based login form on the website), but were authenticated as a completely different user. This is inconvenient to say the least.
The ZLBs are set to use Hash: Sticky Client. They should stick users to a single front end (FE) for the duration of their session. The only reason I can think of for this happening is that two of the FEs generated the same PHP session ID, and the user was then somehow unlucky enough to be directed to that other FE by the LBs.
My questions are plentiful, but for now, I only have a few:
Could I perhaps set a different SESSID name per front-end server? Would this stop the FEs from generating identical session IDs? This would at least result in the user getting logged out rather than logged in again as a different user!
We sync the site data using lsyncd and a whole bunch of inotifywatch processes, but we do not sync the /var/lib/php directories that contain the sessions. I deliberately didn't do this... I'm now thinking perhaps I should be syncing that. lsyncd will be able to duplicate session files across all 3 front ends within about 10 seconds of the sessions being modified. Good idea as a temporary fix?
Lastly, I know full well that the client should be using the DB to store sessions. That would completely eliminate the possibility of duplicate session IDs. But right now, they are unwilling to prioritise that in the development timeline.
Ideas are very much welcome, as I'm struggling to see an easy way out, even as a temporary measure. I can't let another client get logged in as a different user. It's a massive no-no.
Thanks!!
Judging by your question you are somewhat confused by the problem -- and it's not clear exactly what problem you are trying to fix.
Today, we were alerted by an end user that they logged in (PHP based login form on the website), but were authenticated as a completely different user
There's potentially several things happening here.
Cookies are expired when the users browser closes:
Not so. Depending on how the browser is configured, most will retain session cookies across restarts. Since this is controlled at the client, it's not something you can do much about.
PHP sessions on the front ends should be "cleaned up" after 1440 seconds
The magic word here is "after" - garbage collection is triggered on a random basis. Session files can persist for much longer and the default handler will happily retrieve and unserialize session data after the TTL has expired.
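For reference, the randomness comes from the standard php.ini garbage collection settings (the values below are the usual stock defaults, shown only to illustrate the point):

session.gc_probability = 1      ; GC runs on roughly gc_probability/gc_divisor of session starts
session.gc_divisor     = 100
session.gc_maxlifetime = 1440   ; seconds of inactivity before a session is merely *eligible* for GC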
Do you control the application code? (If not, your post is off-topic here.) If so, then it's possible you have session fixation and hijack vulnerabilities in your code (but that's based on the description provided by the user, which is typically imprecise and misleading).
It's also possible that content is being cached somewhere in the stack inappropriately.
You didn't say if the site is running on HTTP, HTTPS or mixed, and if HTTPS is involved, where the SSL is terminated. These are key to understanding where the issue may have arisen.
Your next steps are to ensure that:
you have logout functionality in your code which destroys the session data and changes the session ID
you change the session ID on authentication (see the sketch after this list)
your session-based scripts return appropriate caching information (including a Vary: Cookie header)
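A minimal sketch of the first two points in plain PHP follows; the form fields and helper functions are made up, and only the session calls matter here.

<?php
// login.php (illustrative) -- regenerate the session ID on successful authentication.
session_start();
if (checkCredentials($_POST['user'], $_POST['pass'])) {   // hypothetical helper
    session_regenerate_id(true);                           // new ID, old session file removed
    $_SESSION['user_id'] = findUserId($_POST['user']);     // hypothetical helper
}

// logout.php (illustrative) -- destroy the data and change the ID.
session_start();
$_SESSION = array();
session_regenerate_id(true);
session_destroy();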
It is highly improbable that 2 systems would generate the same session id around the same time.
Really you want to get away from using sticky sessions. It's not hard.
You've got 2 layers at your front end that are adding no functional or performance value, and since you are using sticky sessions, effectively no capacity or resilience value!!! Whoever sold you this is laughing all the way to the bank.
I'm using Symfony 2 to generate my pages from data in a MySQL database. For most content, users have to be authenticated, but the content itself does not change often and does not need to be customized for the users. So what's a good caching strategy for avoiding database calls while still maintaining the auth check?
Simply put, use Memcache to cache the SQL result set for an extended period of time.
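A hedged sketch of that approach with the PHP Memcached extension; the key, TTL, table name and the $pdo/$pageId variables are placeholders rather than part of any particular framework.

<?php
// Illustrative only: serve the result set from Memcached, fall back to MySQL on a miss.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$cacheKey = 'page_rows_' . (int) $pageId;       // $pageId assumed to come from the route
$rows = $memcached->get($cacheKey);

if ($memcached->getResultCode() === Memcached::RES_NOTFOUND) {
    $stmt = $pdo->prepare('SELECT id, title, body FROM pages WHERE id = ?');  // hypothetical table
    $stmt->execute(array((int) $pageId));
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $memcached->set($cacheKey, $rows, 3600);    // keep it for an hour
}

The auth check still runs in PHP on every request; only the database work is skipped.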
Maybe this is too big a change, but the following scheme may be useful in this case:
Create several sets of pages: one for not-yet-authenticated users (let's put it in the site root), and others for authenticated users who should see the same content (if two or more users should see the same content when authenticated, we create only one set for all of them), placed in directories under the root. Then set up simple .htaccess/.htpasswd files for each such 'for-authed-only' directory, and authentication becomes the web server's problem, not your script's.
Hope you get the idea. It's a fuzzy description, but it's easy to implement.
Example: say you want only authenticated users to see the page '/topsecret.html' on the site. Create a directory (/authed), establish HTTP auth on it, and put your topsecret.html into that directory (so it becomes '/authed/topsecret.html'). Now edit '/topsecret.html' and simply replace its main content with a 'sorry, please authenticate yourself' link that points to '/authed/topsecret.html'.
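For completeness, the per-directory protection described above is plain Apache HTTP authentication; a typical /authed/.htaccess would look something like this (the .htpasswd path is a placeholder):

AuthType Basic
AuthName "Authenticated users only"
# Placeholder path; keep the .htpasswd file outside the web root.
AuthUserFile /var/www/private/.htpasswd
Require valid-user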
If you use Symfony2, you are using Doctrine2, and if you use Doctrine2, caching should be enabled by default.
Choose your cache driver for your purposes and there should be no problem.
You might also be specifically interested in query result caching.
Do not use Doctrine without a metadata and query cache! Doctrine is highly optimized for working with caches. The main parts in Doctrine that are optimized for caching are the metadata mapping information with the metadata cache and the DQL to SQL conversions with the query cache. These 2 caches require only an absolute minimum of memory yet they heavily improve the runtime performance of Doctrine. The recommended cache driver to use with Doctrine is APC. APC provides you with an opcode-cache (which is highly recommended anyway) and a very fast in-memory cache storage that you can use for the metadata and query caches.
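Picking up the query result caching mentioned above, here is a hedged sketch using Doctrine 2's query API; the entity name, cache id and lifetime are arbitrary examples.

<?php
// Inside a Symfony2 controller or repository method (illustrative).
$query = $em->createQuery('SELECT p FROM Acme\DemoBundle\Entity\Page p WHERE p.published = true');
$query->useResultCache(true, 3600, 'published_pages');   // reuse the result set for an hour
$pages = $query->getResult();

The metadata and query caches from the quote above are configured in config.yml; the result cache is the extra, opt-in layer that avoids hitting MySQL for the content itself.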
I solved this by using Zend_Cache inside the cacheable actions to store the rendered template result. I then create a new Response object from the cached content. If the cache is empty, I generate the content.
I thought of creating a plugin that checks for an annotation and stores the Response output automatically, but it turned out that I only have 3-4 display actions that are cacheable, and they have very complex cache ID creation rules, so I put the caching logic directly into the controller code.
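Not the exact code, but a sketch of that pattern under the same assumptions (Zend_Cache file backend, a made-up cache ID rule, and an action inside a controller extending Symfony's base Controller):

<?php
// Illustrative action in a Symfony2 controller; Response is
// Symfony\Component\HttpFoundation\Response.
public function showAction($id)
{
    $cache = Zend_Cache::factory('Core', 'File',
        array('lifetime' => 3600, 'automatic_serialization' => true),
        array('cache_dir' => '/tmp/app_cache/')        // placeholder directory
    );

    $cacheId = 'page_show_' . (int) $id;               // the real ID rules were more complex
    if (($html = $cache->load($cacheId)) === false) {
        $html = $this->renderView('AcmeDemoBundle:Page:show.html.twig', array('id' => $id)); // hypothetical template
        $cache->save($html, $cacheId);
    }

    return new Response($html);
}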
It appears that you have a lot of options for caching with symfony: http://www.symfony-project.org/book/1_2/12-Caching (that's not for Symfony 2, but my guess is not a lot has changed).
You could put your heavy SQL statements in their own script and turn caching on for that script:
list:
  enabled:     on
  with_layout: false    # Default value
  lifetime:    86400    # Default value
Further, if you are sure that the generated content won't change for a while, you could use symfony to tell the user's browser not to even bother your server for it, which will cause the page to load nearly instantaneously for the user.
$this->getResponse()->addCacheControlHttpHeader('max_age=1200'); // in seconds; must be less than 1 year
Just make sure your max age is small enough that when something changes (say a code update) the user doesn't get stuck with the old page, since there is no way to force them to request that page again short of changing the URL.
What I'm trying to do is render either a partial or a fragment in Symfony from the cache (the easy part), but if the cache entry does not exist, I want Symfony to render nothing instead of recreating the cache.
My website pulls data from multiple other websites, which can insanely slow down page rendering speed, so instead of loading the info from other websites on the initial page load, I plan on doing it once the initial page is finished loading and a user clicks the appropriate button, then caching the data for later. However, if the data is cached (from a previous request) then I would rather dump the cached data right into the initial page load.
I tried to clarify it as much as possible, so hopefully it makes sense.
I think you could handle this with a filter and the getViewCacheManager().
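In case it helps, here is a hedged sketch of that idea for symfony 1.x; the filter name, its position in filters.yml and the internal URI are all assumptions, not working configuration.

<?php
// lib/filter/renderOnlyIfCachedFilter.class.php (hypothetical)
class renderOnlyIfCachedFilter extends sfFilter
{
  public function execute($filterChain)
  {
    $cacheManager = $this->getContext()->getViewCacheManager();
    $uri          = 'remoteData/summary';   // hypothetical internal URI of the fragment

    if ($cacheManager && $cacheManager->has($uri)) {
      // A cached copy exists, so let normal execution serve it from the cache.
      $filterChain->execute();
    } else {
      // Nothing cached yet: return an empty body instead of rebuilding the fragment.
      $this->getContext()->getResponse()->setContent('');
    }
  }
}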
So I have a PHP CodeIgniter webapp and am trying to decide whether to incorporate caching.
Please bear with me on this one, since I'll happily admit I don't fully understand caching!
So the first user loads a page of user-submitted content. It takes 0.8 seconds of processing to load it -- 'slow'. The next user then loads the same page; it takes 0.1 seconds to load it 'fast' from the cache.
The third user loads it up, also taking 0.1 seconds execution time. This user decides to comment on the page.
The fourth user loads it up 2 minutes later but doesn't see the third user's comment, because there's still another 50 minutes left before the cache expires.
What do you do in this situation? Is it worth incorporating caching on pages like this?
The reason I'd like to use caching is because I ran some tests. Without caching, my page took an average of 0.7864 seconds execution time. With caching, it took an average of 0.0138 seconds. That's an improvement of 5599%!
I understand it's still only a matter of milliseconds, but even so...
Jack
You want a better cache.
Typically, you should never reach your cache's timeout. Instead, some user-driven action will invalidate the cache.
So if you have a scenario like this:
Joe loads the page for the first time (ever). There is no cache, so it takes a while, but the result is cached along the way.
Mary loads the page, and it loads quickly, from the cache.
Mary adds a comment. The comment is added to the database (or whatever), and the software invalidates the cache
Pete comes along and loads the page; the cache is invalid, so it takes a second to render the page, and the result is cached (as a valid cache entry)
Sam comes along, page loads fast
Jenny comes along, page loads fast.
I'm not a CodeIgniter guy, so I'm not sure what that framework will do for you, but the above is generally what should happen. Your application should have enough smarts built-in to invalidate cache entries when data gets written that requires cache invalidation.
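A hedged CodeIgniter 2.x sketch of that write-path invalidation; the cache keys, $pageId and comment_model are made up, but the cache driver calls are the stock CI ones.

<?php
// Write path (e.g. in the comments controller): store the comment, then drop the cached page.
$this->load->driver('cache', array('adapter' => 'file'));
$this->comment_model->insert($commentData);        // hypothetical model
$this->cache->delete('page_' . $pageId);

// Read path: serve from cache when available, otherwise build it and keep it for an hour.
if (!$html = $this->cache->get('page_' . $pageId)) {
    $html = $this->load->view('page', $data, true); // render the view to a string
    $this->cache->save('page_' . $pageId, $html, 3600);
}
echo $html;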
Try CI's query caching instead. The page is still rendered every time, but the DB results are cached... and they can be deleted using native CI functionality (i.e. no third-party libraries).
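If you go the query caching route, the relevant calls look roughly like this (the table, controller and method names are placeholders):

<?php
// Read side: cache the expensive query's result on disk.
$this->db->cache_on();
$query = $this->db->get('comments');        // served from the cache after the first run
$this->db->cache_off();

// Write side: after saving a new comment, delete the cache for the page that shows it.
$this->db->insert('comments', $commentData);
$this->db->cache_delete('blog', 'view');    // placeholder controller/method segments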
Since CI offers only page-level caching without invalidation, I used to handle this issue somewhat differently. The simplest way to handle this problem was to load all the heavy content from the cache, while the comments were loaded via non-cacheable Ajax calls.
Or you might look into custom plugins which solve this, like the one you pointed out earlier.
It all comes down to the granularity at which you want to control the cache. For simple things like blogs, loading comments via external ajax calls (on demand - as in user explicitly requests the comments) is the best approach.