Output Cache Improvements in Orchard 1.9

Background

The release of Orchard 1.9 is imminent (yes I know, it’s been “imminent” for about 5 months now, but this time it’s really imminent) and one of the contributions made by us at IDeliverable for this release is a major overhaul of the output cache logic. The modifications significantly alter the way the output cache operates, and so I wanted to describe these changes in depth once and for all so that folks out there have some place to turn to really understand how output caching works under the hood and how to best put it to use for their sites.

All ye TL;DR-type people be warned - this is a long and detailed post. Now let's dig in!

The previous output cache implementation (i.e. before 1.9) had one serious performance issue, which would typically rear its ugly head whenever the following conditions were met:

  • Your site is running with Orchard.OutputCache enabled (duh).
  • Some resource on your site (such as a page) is CPU and/or database-intensive to render.
  • The resource is output cached with a finite expiration time.
  • The resource is under heavy user load (or more technically, the average time between requests for the resource is considerably less than the time it takes to render the resource).

We discovered this issue while working with a client who experiences an annual peak load during February and March. Luckily, they are proactive enough to also run annual performance tests in preparation for this peak period well in advance (a practice I would recommend to anybody, by the way), and it was during such load testing that we found their site kept crashing when load reached a certain level. The first symptom in their case was of course longer response times and abnormally high server CPU utilization. The second was that the ADO.NET connection pool was exhausted. The third was total denial of service.

After analyzing the issue and examining the code in Orchard.OutputCache, we realized this was happening due to the way the caching logic was designed. To understand the cause of the problems and the sequence of events leading up to an eventual crash, let’s look at a narrowed-down example involving a site with a fictitious page A:

  1. Page A contains a bunch of content and renders a bunch of menu widgets and projection widgets. Because it’s such a content-heavy page (and because Orchard is not the fastest crayon in the box) it is both CPU- and database-intensive, and takes 2 seconds to render from scratch, on an otherwise idle server.
  2. Our site is under heavy load, and page A is getting 10 requests per second.
  3. This is not a problem, because page A is currently in output cache, so the site is happily humming along and all requests are satisfied in a matter of milliseconds.
  4. Page A expires from the output cache.
  5. The next request for page A finds it missing from cache, and starts to generate it. This will take at least 2 seconds.
  6. 100 ms later the next request for page A comes in. It too finds it missing from cache, and starts to generate it. Now both requests are fighting for the same scarce CPU and database resources, and as a result, both will now likely take twice as long to complete. The estimated time until page A is once again in cache just went from 2 seconds to 4 seconds.
  7. More and more requests for page A are received, and things get worse with every one. The more requests start rendering page A, the longer they all take to complete because they compete for the same CPU and database resources, and as a result even more requests have time to arrive and start doing the same thing. Call it a feedback loop, spiraling effect, snowball effect – “a dear child has many names” as we say in Swedish – the point is, the problem exacerbates itself and it only gets worse from here.
  8. In the best case, at least one of the requests eventually succeeds in rendering page A and puts it back into the cache, the ones that fail do so in a graceful manner, and your site recovers (at least until the next time page A expires from the cache). In the worst case, your site crashes and burns.
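The runaway behavior in steps 5–7 is easy to reproduce with a naive cache that lets every miss render in parallel. Here's a deliberately simplified sketch in Python (the names and timings are mine, not Orchard's, and everything is scaled down to milliseconds):

```python
import threading
import time

cache = {}               # key -> rendered content
concurrent_renders = 0   # how many threads are rendering right now
peak_renders = 0         # worst pile-up observed
stats_lock = threading.Lock()

def render_page(key):
    """Simulate an expensive render and count concurrent renders."""
    global concurrent_renders, peak_renders
    with stats_lock:
        concurrent_renders += 1
        peak_renders = max(peak_renders, concurrent_renders)
    time.sleep(0.05)     # stand-in for 2 seconds of CPU/database work
    with stats_lock:
        concurrent_renders -= 1
    return f"<html>{key}</html>"

def naive_get(key):
    """No synchronization: every request that sees a miss renders the page."""
    if key not in cache:
        cache[key] = render_page(key)   # the dogpile happens here
    return cache[key]

# Page A just expired; 10 requests arrive while the first is still rendering.
threads = [threading.Thread(target=naive_get, args=("page-a",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent renders:", peak_renders)   # well above 1
```

On a real server each redundant render also slows all the others down, which is exactly what turns this pile-up into the feedback loop described above.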

Naturally, once I realized the problem I set out to fix it.

Possible Solutions

For an output caching solution to work reliably under the above conditions, there are three basic measures one might employ:

  1. Prevent multiple concurrent requests for the same resource from rendering that resource in parallel. Instead, let only the first request render the resource and make any subsequent requests block and wait until the rendering is finished. This is the most effective remedy, and for all practical intents and purposes solves the problem.
  2. Introduce a “grace time” between the expiration of a resource and the actual removal of that resource from cache. If an expired version of the resource still exists in cache, instead of blocking subsequent requests, simply serve them the stale content. This goes one step further and also improves the response time for those requests that would otherwise be blocked waiting for the first one to render the resource. It also shortens the request queue on the web server and uses fewer concurrent threads. In our example scenario above, this would mean 20 fewer requests waiting while the resource is being rendered.
  3. Proactively “pre-cache” resources and refresh them before they have a chance to expire.

One or more of these strategies can be found in most professional-grade caching solutions, such as nginx or Varnish.

In my opinion, combining methods #1 and #2 is the best way to go for Orchard. They are relatively simple to implement because they both act within the context of existing requests, and together they remove almost 100% of the problem. #3 is more complex and introduces new moving parts to the system (there needs to be some background task which renders resources independently of any incoming user requests). Additionally, for #3 alone to be effective, there needs to be a warmup period during which all resources are pre-cached before the server can start accepting incoming requests; otherwise the same problem will arise if a resource expires during heavy load. And besides, the only advantage that #3 brings over the other two is that it gives a faster response time for that one guy who happens to be the first one to request an expired resource. Hardly a game-changer.

So, for Orchard 1.9 I decided (after getting approval from the steering committee of course) to implement the first two measures.
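Conceptually, the combination of #1 and #2 behaves like the following sketch (Python for brevity; the names are mine and this is not the actual Orchard implementation):

```python
import threading
import time

cache = {}        # key -> (content, valid_until)
locks = {}        # key -> one lock per cacheable resource
locks_guard = threading.Lock()
renders = []      # records which threads actually rendered

def key_lock(key):
    with locks_guard:
        return locks.setdefault(key, threading.Lock())

def get(key, render, duration=60.0):
    """Measure #1 (serialize renders) plus #2 (serve stale while rendering)."""
    entry = cache.get(key)
    if entry and time.monotonic() < entry[1]:
        return entry[0]                        # fresh hit
    lock = key_lock(key)
    if entry and not lock.acquire(blocking=False):
        return entry[0]                        # stale, but someone else is rendering
    if not entry:
        lock.acquire()                         # miss: wait for the current render
        entry = cache.get(key)
        if entry:                              # it was rendered while we waited
            lock.release()
            return entry[0]
    try:
        content = render(key)
        renders.append(threading.current_thread().name)
        cache[key] = (content, time.monotonic() + duration)
        return content
    finally:
        lock.release()

def slow_render(key):
    time.sleep(0.05)                           # expensive render
    return f"<html>{key}</html>"

threads = [threading.Thread(target=get, args=("page-a", slow_render)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("actual renders:", len(renders))         # exactly one, despite 10 requests
```

Compare this with the naive version: ten concurrent requests now cost one render instead of ten.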

Implementation in Orchard

Considerations

When designing the new logic in Orchard, I had to consider a few challenges:

  • The storage mechanism for output cache (like most things in Orchard) is extensible and provider based. Depending on the underlying storage provider, expiration and eviction from the cache is likely done by the cache itself (by specifying an expiration policy when adding the item to the cache) rather than actively by Orchard. Therefore, in order to be able to serve stale content, an item must be considered expired by Orchard (and regeneration begin) before that item actually expires on the storage level.
  • Serving from cache is done in one method, while adding rendered content to the cache is done in another. This makes it difficult to reliably hold a lock for the duration of the request (no using statements or try/finally blocks are possible). This must be carefully considered – what if the request fails in such a way that the second part never executes? That could easily lead to deadlocks if we’re not careful.
  • The time it takes to render a given item is unknown. Any introduction of pre-fetch or grace times will therefore be arbitrary. Too wide a margin and cached content will be re-rendered too often - too narrow a margin and a number of requests will have to block. Ideally the time spans involved need to be configurable in output cache settings, and whether rendering takes longer or shorter than expected, it all needs to be handled gracefully.
  • Orchard is often deployed in web farms. The cache storage might be distributed/synchronized across farm nodes, but .NET thread synchronization primitives most certainly are not. Therefore, we must either use database transactions for cross-farm synchronizations, or let each node act independently in terms of caching logic and simply accept that multiple farm nodes will most likely race to render the same content. I decided that the latter was a completely acceptable trade-off, and should be considered a benign race condition.

New Configuration

To account for the fact that rendering time might vary from one resource to another, and to make grace time configurable in the output cache settings page, I added both a default grace time setting and the ability to override it per individual route, just like for the duration:

Screenshot of cache settings user interface

As you’ll notice, you can leave the grace time column per route empty to fall back on the configured default grace time, or you may specify 0 to disable grace time altogether for that route. I’ll provide some guidance later in this post as to what you should consider when configuring these values.
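Expressed as a tiny helper, the fallback rule looks like this (a sketch with hypothetical names, not Orchard's actual code):

```python
def effective_grace_time(route_grace, default_grace):
    """Resolve the grace time for a route: an empty (None) per-route value
    falls back to the default, while an explicit 0 disables grace time."""
    return default_grace if route_grace is None else route_grace

print(effective_grace_time(None, 60))   # column left empty: use the default
print(effective_grace_time(0, 60))      # explicit 0: grace time disabled
print(effective_grace_time(30, 60))     # per-route override
```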

To account for the fact that items are most often expired/evicted by the cache itself, there are now two datetime properties associated with a cache item:

  • ValidUntilUtc specifies the time at which the item is considered expired by Orchard. The first request for the item after this time will be tasked with regenerating it and refreshing the cache. This property is calculated as the time when the item is stored in the cache plus the configured duration for the item.
  • StoredUntilUtc specifies the time at which the item will be actually removed from the cache. This property is calculated as the ValidUntilUtc property value plus the configured grace time for the item. This is the value that is actually specified to the underlying cache storage as the expiration time; in the default storage implementation (which uses the ASP.NET cache) the cache will remove the item at this time.
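The relationship between the two timestamps can be sketched as follows (the property names follow the post; the helper itself is illustrative):

```python
from datetime import datetime, timedelta, timezone

def cache_item_times(stored_at_utc, duration_seconds, grace_seconds):
    """Compute the two expiration timestamps for a cache item."""
    valid_until_utc = stored_at_utc + timedelta(seconds=duration_seconds)   # ValidUntilUtc
    stored_until_utc = valid_until_utc + timedelta(seconds=grace_seconds)   # StoredUntilUtc
    return valid_until_utc, stored_until_utc

# With the default settings (300 s duration, 60 s grace time):
stored_at = datetime(2015, 3, 8, 12, 0, 0, tzinfo=timezone.utc)
valid_until, stored_until = cache_item_times(stored_at, 300, 60)
print(valid_until)    # considered expired (stale) from 12:05:00
print(stored_until)   # actually evicted from storage at 12:06:00
```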

Both these two values can now be seen in the Statistics tab in the output cache settings page:

Screenshot of cache statistics user interface

New Caching Logic

Based on these new configuration and storage values, the new output caching logic performs synchronization of concurrent requests for the same resource, as well as serving stale content during the configured grace time for a given resource. Let’s take a look at how the algorithm works.

The output caching logic resides in the Orchard.OutputCache.Filters.OutputCacheFilter class in the Orchard.OutputCache module. This class is both an IActionFilter and an IResultFilter in ASP.NET MVC parlance. For the purposes of output caching, the filter does its magic in the OnActionExecuting() and OnResultExecuted() methods. This separation of logic between two methods, each executing at a separate end of the request, is what requires some extra care when managing locks.

Let’s use two diagrams to illustrate how these two methods operate, starting with the OnActionExecuting() method:

Diagram illustrating the OnActionExecuting() method logic

Some things to note:

  • Brown items on the diagram indicate beginning and end of the request.
  • The filter maintains a ConcurrentDictionary containing cache keys and lock objects for each cache key. These lock objects are used to synchronize concurrent requests for each cacheable item individually. Orange items on the diagram indicate critical sections, i.e. sections of the logic during which the lock for a given cache key is held by the current request.
  • The “request allowed for cache?” step performs a number of checks to ensure the request is eligible for output caching. If not, then the whole output cache logic is bypassed and the request executes as it would without output caching enabled. These checks include:
    • Respecting any OutputCacheAttribute on the controllers and actions invoked
    • Not caching any POST requests
    • Not caching any admin dashboard requests
    • Not caching child actions
    • Not caching requests that are configured in output cache settings not to be cached
  • The “compute cache key” step determines a unique cache key for the resource. This key includes not only the resource identifier, but also things like the tenant name, action parameters, configured query string parameters, culture, request headers, and whether the request is authenticated or not.
  • If the cache key is found in the cache:
    • The filter checks to see if it has expired (i.e. the ValidUntilUtc value of the item has passed). If so, the filter can assume the item is in its grace period (if it were past its grace period, it would have been automatically evicted from the cache and we would not have found it there in the first place). If the item hasn’t expired yet, simply send it to the client and short-circuit the rest of the request.
    • If the item has expired (i.e. is in its grace period) the filter checks to see if the lock for the cache key can be acquired. If not, then some other request is already in the process of regenerating the content, so simply send the stale cached content to the client and short-circuit the rest of the request.
    • If the lock could be acquired, the filter sets up a capture of the response and executes the rest of the request.
  • If the cache key is not found in the cache:
    • The filter tries to acquire the lock for the cache key, with a timeout of 20 seconds. This is the mechanism that causes the request to block and wait if rendering of the requested resource is already in progress, and no stale content exists. If the lock cannot be acquired within the timeout, the request is executed with output caching completely bypassed. The timeout is there primarily as a fail-safe against the theoretical possibility that some request fails to release the lock. If this happens, at least the site can continue to operate normally for requests to all other resources instead of building an infinite queue of blocking requests for the offending one.
    • If the lock could be acquired successfully, the filter rechecks the cache to see if the item is now in the cache (which would be the case if we waited for another request to render the item). If it is, the lock is released and the cached item is sent to the client.
    • If the item is still not in cache, the filter sets up a capture of the response and executes the rest of the request.
  • There are some additional twists and turns in the implementation that I have omitted from the diagram for clarity, such as the fact that a “hard refresh” from the client forces regeneration of an item regardless of its current cache status.
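The branches above condense into surprisingly little pseudocode. Here is a Python sketch (hypothetical names; the real logic lives in OutputCacheFilter.OnActionExecuting()):

```python
import threading

LOCK_TIMEOUT = 20   # seconds; fail-safe against a lock that is never released

locks = {}
locks_guard = threading.Lock()

def key_lock(key):
    with locks_guard:
        return locks.setdefault(key, threading.Lock())

def on_action_executing(key, cache, now):
    """Returns (cached_response, holds_lock). A None response means
    'execute the rest of the request' (capturing the output if holds_lock)."""
    lock = key_lock(key)
    item = cache.get(key)
    if item is not None:
        if now < item["valid_until"]:
            return item["content"], False    # fresh hit: short-circuit the request
        if not lock.acquire(blocking=False):
            return item["content"], False    # grace period, regeneration in progress
        return None, True                    # expired: we regenerate, capture the response
    if not lock.acquire(timeout=LOCK_TIMEOUT):
        return None, False                   # fail-safe: bypass output caching entirely
    item = cache.get(key)                    # recheck: rendered while we were waiting?
    if item is not None:
        lock.release()
        return item["content"], False
    return None, True                        # true miss: render with capture

cache = {"page-a": {"content": "<html>A</html>", "valid_until": 100}}
print(on_action_executing("page-a", cache, now=50))    # fresh: served from cache
print(on_action_executing("page-a", cache, now=150))   # expired: this request regenerates
print(on_action_executing("page-a", cache, now=150))   # meanwhile: stale content is served
```

Note how the second call leaves the lock held; in the real filter it is OnResultExecuted() that releases it once the captured response has been stored.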

Now let’s look at a similar diagram of the OnResultExecuted() method to illustrate what happens after the request has been executed:

Diagram illustrating the OnResultExecuted() method logic

Here’s what happens:

  • Depending on what happened in OnActionExecuting() the thread may or may not hold the cache key lock at this point. The diagram assumes the former, which is why the relevant items are in orange.
  • If the response was captured (because a capture was set up in OnActionExecuting()) then the filter first checks if the response is allowed to be cached. If not, then some cache control headers are included in the response to prevent caching on proxy servers etc. This check includes:
    • not caching responses with an HTTP result code other than 200
    • not caching routes which are configured to not be cached
    • not caching the response if the request created any notification messages
  • If the response was deemed eligible for caching, it is written to the cache.
  • If the cache key lock is held by the current thread, it is released.
  • Finally, the response is sent to the client, and the request ends.
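In sketch form (Python again, with hypothetical names; the store-then-release ordering is the important part):

```python
import threading

def on_result_executed(lock, holds_lock, response, is_cacheable, store):
    """Second half of the filter: store the captured response if it is
    eligible, then release the per-key lock no matter what happened."""
    try:
        if response is not None and is_cacheable(response):
            store(response)          # write to the underlying cache provider
        # (the real filter adds no-cache headers for ineligible responses)
    finally:
        if holds_lock:
            lock.release()           # must happen even if storing failed

# Demo: a 500 response is not cached, but the lock is still released.
lock = threading.Lock()
lock.acquire()                       # pretend OnActionExecuting() acquired it
stored = []
on_result_executed(lock, True, {"status": 500, "body": "oops"},
                   lambda r: r["status"] == 200, stored.append)
print(len(stored), lock.locked())    # nothing stored, lock released
```

The try/finally here is the sketch's substitute for the careful lock management the post mentions: because the real acquire and release happen in two different methods, the release path must be robust against failed requests.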

Result

These modifications mean dramatically improved scalability characteristics for Orchard sites.

After completing the implementation, we once again put that same client’s site through a grueling round of load testing. We expected significant improvements, but quite frankly we were baffled by the result. The vendor that carries out the performance testing literally ran out of load agent capacity before we observed any noticeable impact on the Orchard site in terms of response times, CPU utilization or database query intensity.

For the most part, the site was now just happily humming along, effortlessly serving all content from cache. Once in a while, as expected, a small increase in resource use could be observed as a confirmation that some piece of content expired from cache and was being regenerated.

It’s no great mystery: the combination of blocking and grace time means that at any given time, no matter how short the expiration time of your content and no matter how many users are concurrently hammering your site, at most one of them will ever be rendering a given piece of content on your site. The rest are either waiting idle in the worst case, or served stale cached content in the best case.

Other Improvements

Aside from a lot of cleanup and refactoring of the output cache code, and a bunch of settings UI usability improvements, I also seized the opportunity to introduce a couple of other small functional improvements to the output cache module. Let’s take a look at them in this screenshot:

Screenshot of new authenticated request options

The labels and hints should be pretty self-explanatory. You now have the option to cache not only anonymous but also authenticated requests. You also have the option to cache different versions of rendered resources depending on whether the request was authenticated or not. This is useful on sites where pages do not contain any personal information for logged-in users, but where the rendered markup differs depending on whether the user is logged in or not.

Configuration Recommendations

So with these two configuration options (duration and grace time) now at your disposal, how should you configure them?

Well, I'll give you some recommendations based on my personal experience and preferences, but don't take them as absolute truth because all Orchard sites are different and YMMV - you need to consider the nature of your content and test the performance characteristics of your sites to make good determinations!

Let’s start with duration. This one comes down to a trade-off between how expensive your content is to render, how volatile it is (i.e. how frequently it changes) and how important it is that clients see an up-to-date version of the content. If your content is extremely static and extremely expensive to render, consider setting the duration to a very high value, such as 43,200 seconds (12 hours). If your content changes frequently or is very fast to render, consider setting the duration to a very low value, such as 30 seconds or even 15 seconds. If your content is expensive to render and changes frequently, you’re going to have to apply your judgment and make a reasonable trade-off. One good approach here is to run load tests, which can give you an indication of where the sweet spot is.

Grace time, on the other hand, comes down to how long you think it is acceptable to serve a stale (expired) version of your content. Most often this is proportional to the acceptable duration, but not always. Paradoxically though, the higher your user load is, the less likely it is that any stale cache item will remain in the cache for very long, because the next user will soon be along to request it and cause it to be regenerated, and the lower your user load is, the less useful the grace time becomes in the first place because blocking is less likely to happen anyway. As a general rule-of-thumb, if your content changes frequently then set your grace time to half the duration, otherwise if your content is highly static then set your grace time to double the duration.
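The rule of thumb reduces to a one-liner (my naming; treat the numbers as starting points, not gospel):

```python
def rule_of_thumb_grace_time(duration_seconds, changes_frequently):
    """Grace time heuristic: half the duration for volatile content,
    double the duration for highly static content."""
    return duration_seconds // 2 if changes_frequently else duration_seconds * 2

print(rule_of_thumb_grace_time(300, changes_frequently=True))    # volatile content
print(rule_of_thumb_grace_time(300, changes_frequently=False))   # static content
```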

Now, I realize not all your content shares the same characteristics. Unfortunately there's no way (yet) in Orchard to configure these things based on anything other than the route, which means in practice you have to pick one set of values for all your content, so you're going to have to find a reasonable compromise that works well for the majority of it. Ideally I think the configuration ought to be more granular and based on composition, so that you would be able to specify values on content types, content items, layers, widgets etc., and have them all result in a calculated duration and grace time for the final rendered page depending on which parts contributed to it. Who knows, maybe some day we'll take a stab at building such a configuration system into Orchard - if you or your company would find that valuable and are interested in co-funding such an effort, do get in touch.

The default values for a new Orchard installation are a duration of 300 seconds and a grace time of 60 seconds.

That’s it. If you made it this far, I’m impressed – you must really care about output caching! ;) And indeed you should! I had tons of fun working on these improvements, and I’m excited to see what kinds of results folks are going to see in terms of performance and scalability now that it goes into the wild and production sites start getting upgraded to Orchard 1.9. We sincerely hope this work will benefit other Orchard users out there as much as it has our clients.

23 comments

  • ... Posted by Sebastien Posted 03/08/2015 05:08 PM

    Open-source at its best ! Thanks Daniel.

  • ... Posted by spoissant Posted 03/10/2015 01:59 PM [http://koneka.com/]

    Amazing job, thanks for sharing all the information as well. I'm really looking forward to 1.9!

    Food for thoughts: For authenticated requests caching, I would probably look into adding the user's Role(s) to the cache key. I know for sure we have some pages that will contain slightly different information based on the user's Permissions.

  • ... Posted by Xeevis Posted 03/10/2015 06:15 PM

    Amazing job on the module and blog post. I've took it for a spin and got into troubles updating one of my modules to it. Deal is that it's also using OnActionExecuting() to apply response filter on resulting html, but it looks like it doesn't have a chance to run before cache is generated. Any idea what I can do to have my filter run before cache is stored?

    https://jadexhtmlmarkupminifier.codeplex.com/SourceControl/latest#Filters/HtmlFilter.cs

  • ... Posted by jtkech Posted 03/11/2015 12:13 AM

    Daniel, impressive and very informative

    I think there is still a little issue with child actions because of the parent and child handlers execution order. Child action needs to be tested in OnActionExecuting() as already done, but also at the beginning of OnResultExecuted(). See workitem 20879 on CodePlex

    Xeevis, Did you try this: http://www.orchardpros.net/tickets/4695

    Best

  • ... Posted by Xeevis Posted 03/11/2015 01:55 AM

    Oh this is embarrassing, I didn't check if someone replied to that question after Bertrand, didn't receive any mail notifications :(. From quick testing, your solution appears to be working perfectly. I finally get to update my favorite module yay! Thanks a lot jktech.

  • ... Posted by Daniel Stolt Posted 03/11/2015 09:02 AM (Author)

    spoissant: Yes this is a good idea. In 1.9 it is also now easier to add arbitrary things to the cache keys yourself by inheriting from OutputCacheFilter and overriding the GetCacheKeyParameters() method, like so:

    protected override IDictionary<string, object> GetCacheKeyParameters(ActionExecutingContext filterContext)
    {
        var result = base.GetCacheKeyParameters(filterContext);
    
        // Include dashboard access and hostname in the cache key.
        result["dashboard"] = m_Authorizer.Authorize(StandardPermissions.AccessAdminPanel).ToString();
        result["hostname"] = _workContextAccessor.GetContext().HttpContext.Request.Url.Host;
    
        return result;
    }
    

    Post-1.9 we have decided to make this provider-based so that anything from any module can participate in adding things to the cache keys, not just an inherited and substituted version of the filter itself. The inheritance/overriding approach was really just a quick fix to enable scenarios where you absolutely need to vary by additional things.

  • ... Posted by Daniel Stolt Posted 03/11/2015 09:10 AM (Author)

    Xeevis: IFilterProvider execution order is supposed to be based on dependencies, so if your module has a dependency on Orchard.OutputCache then your filter should run first. Not sure if it's related, but there was a fix recently regarding this execution order:

    Commit: 93abfc0acac6bdca195e41a3456feed7ef32965d [93abfc0]
    Parents: 1407ff7fce
    Author: Katsuyuki Ohmuro <harmony7@pex2.jp>
    Date: den 5 mars 2015 22:55:23
    Committer: Zoltán Lehóczky <lehoczky_zoltan@pyrocenter.hu>
    Labels: origin/1.8.2-int
    #21218: Fixing IFilterProvider execution order
    
    Work Item: 21218
    

    jtkech: I'll look into the issue with child actions, thanks for pointing it out.

  • ... Posted by Xeevis Posted 03/11/2015 12:14 PM

    Daniel: Indeed dependency also does the trick. Awesome! Thanks

  • ... Posted by Tapas Pal Posted 03/12/2015 06:04 AM [http://tapas-pal.blogspot.com/]

    Hello, This query is regarding Orchard CMS which is an open source enterprise community software. From Orchard project portal I got the link of your blog - http://www.orchardproject.net/about.

    One of our enterprise customers has unstructured data spread across discrete locations in the form of documents (.doc, .pdf, .ppt) that need to be migrated.

    This will be have to be stored on ShareDrive and Orchard needs to interface with it.

    The points we are specifically interested are..

    a) Scalability of the software, and whether it has any in-built database that can store documents. Currently our customer has 15000 documents.
    b) What operations can be performed using Orchard?
    c) Are there any enterprises that have deployed this software?
    d) Do you have an enterprise support team?
    e) Is Orchard a CMS for individuals & small organizations?

    This is bit urgent and we will appreciate if you can help us on this..

    Thanks and Regards, Tapas

  • ... Posted by dpomt Posted 03/12/2015 08:50 AM

    Great Job and post, looking forward to upgrade my sites to 1.9.
    Do your improvements also fix the compatibility issue for certain Content types (https://orchard.codeplex.com/workitem/20881)?

    Thanks and regards

  • ... Posted by Daniel Stolt Posted 03/12/2015 09:06 AM (Author)

    dpomt: It does not, but I'm trying to get that done.

  • ... Posted by dpomt Posted 03/12/2015 04:37 PM

    Daniel, that would be awesome. Thanks.

  • ... Posted by spoissant Posted 03/16/2015 01:16 AM [http://koneka.com]

    Thx for the reply Daniel, that will definitely do the trick!

  • ... Posted by Matthias Posted 03/16/2015 11:11 AM [http://www.nuboserv.com/]

    We have a low traffic web site. Therefore we are using the Warmup Module that is calling manually defined URLs periodically. I configured the cache period for 30 minutes and defined the warmup module to call the site every 15 minutes. If I understand the article correctly, this would be what you considered in method number #3 in possible solutions. Maybe it is the best to use the warmup module in 1.9 too (for our scenario). I can see the problem that the user asking for the site has wait too long if the sites do not get cached proactively . For a normal site I have encountered waiting periods of a few seconds. I expect the site to load within 300-1000 ms. By optimizing the caching period and using the Warmup module I was happy with the performance.

  • ... Posted by Dave Gardner Posted 03/16/2015 11:16 AM [http://cascadepixels.com.au]

    Do I understand correctly that,on a single server (not a farm), the first request to trigger regeneration of a page will be delayed until the page is regenerated, while subsequent requests that arrive during the grace period will instantly receive the old cached page?

    Thank you so much for documenting this. I hope other orchard developers now feel they have a new standard to meet!

  • ... Posted by Daniel Stolt Posted 03/16/2015 12:58 PM (Author)

    Matthias: Yes, certainly you can combine the Warmup module with output caching if it works well for your scenario!

    Dave: Yes, that is correct. However, if there is no stale content to serve for any reason (was never generated before, zero grace time configured, etc) then both the first and the subsequent requests will block.

  • ... Posted by Arman Posted 03/16/2015 06:15 PM

    Daniel, Thank you so much for this awesome article.

  • ... Posted by Derek Gabriel Posted 03/17/2015 04:43 AM [http://www.ignitetheday.com]

    Great work, thanks!

  • ... Posted by Chris Payne Posted 03/20/2015 05:04 PM

    Nice work!

    I've had a situation recently where I needed to vary the output cache key based on various factors (in this case, which layers were active).

    I forked Orchard and did the work to allow for module developers to create a provider to append to the cache key, but I have an issue with my Codeplex account that prevents me from pushing up to my fork :(

  • ... Posted by jtkech Posted 03/20/2015 10:52 PM

    I didn't know dependencies order, thanks

    After a quick test on the last version, here my finding

    If a CustomModule depends on OutputCache

    OnActionExecuting() of OutputCache run first!
        And OnActionExecuting() of CustomModule run after
        But OnResultExecuted() of CustomModule run first
    And OnResultExecuted() of OutputCache run last
    

    Best

  • ... Posted by Daniel Stolt Posted 03/21/2015 11:36 AM (Author)

    Arman and Derek: Thanks, much appreciated!
    Chris: Feel free to send me that code offline and I'll see if I can use it.

  • ... Posted by Darius Posted 06/28/2015 09:37 AM

    Amazing job, thank you for doing this! PS. nice markdown editor you have here! Where did you get it?

  • ... Posted by Daniel Stolt Posted 07/01/2015 11:06 PM (Author)

    Darius: Thanks! I actually built it myself last week. :) CodeMirror + implemented my own Markdown manipulation logic on top using CodeMirror APIs. I am considering putting the source and some documentation up on GitHub, because I actually couldn't find anything similar so seems like maybe it would be useful to many.

Leave a comment