
WW-5355 Integrate W-TinyLfu cache and use by default #766

Merged: 18 commits into master from WW-5355-cache-lru on Oct 18, 2023
Conversation

kusalk (Member) commented Oct 13, 2023

WW-5355

Current state

Struts currently caches the following:

  • compiled OGNL expressions
  • BeanInfo objects

There are currently 2 cache options:

  • Basic (OgnlDefaultCache.java)
  • LRU (OgnlLRUCache.java)

By default, both caches will use the 'basic' option.

However, they can be configured to use the 'LRU' option using the following configuration:

  • struts.ognl.expressionCacheLRUMode=true
  • struts.ognl.beanInfoCacheLRUMode=true

Motivation

The 'basic' cache option flushes the cache when its max size is reached. This is highly ineffective in the face of malicious requests.

The 'LRU' option has a better eviction policy but is backed by a synchronized LinkedHashMap which hampers raw performance greatly.

Instead I propose Caffeine, which "is a high performance, near optimal caching library" that implements the W-TinyLfu algorithm.
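
For illustration, a minimal sketch of constructing a size-bounded Caffeine cache (the limit shown is an arbitrary example, not this PR's default); bounding the cache by size is what puts Caffeine's W-TinyLfu eviction policy in charge:

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class CaffeineSketch {
    public static void main(String[] args) {
        // A bounded Caffeine cache; admission and eviction are governed by W-TinyLfu
        Cache<String, Object> cache = Caffeine.newBuilder()
                .maximumSize(10_000) // example limit only
                .build();

        cache.put("compiled-expression-key", new Object());
        System.out.println(cache.getIfPresent("compiled-expression-key") != null); // true
        System.out.println(cache.estimatedSize()); // roughly 1
    }
}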

Some raw read/write benchmarks for WTLFU as implemented in Caffeine can be found here.

Please also find this paper that evaluated the performance of WTLFU, including against malicious requests (thanks @ben-manes):

Changes introduced

In this PR I am introducing a 3rd cache option:

  • WTLFU (OgnlCaffeineCache.java)

The previous configuration constants were booleans and thus only allowed for 2 options. To accommodate the 3rd option, I've replaced them with the following configuration:

  • struts.ognl.expressionCacheType=basic|lru|wtlfu
  • struts.ognl.beanInfoCacheType=basic|lru|wtlfu

Additionally, I've made both caches default to 'WTLFU' in default.properties.

Note that the 'basic' cache option will still be used to bootstrap the container. This allows consumers who have configured an alternative cache type to exclude the caffeine dependency without any issues.

Original PR description

I see very limited advantage to using the default cache over the LRU one: the eviction process is faster, but at the cost of having to repopulate the whole cache from scratch!

Keen to hear other thoughts or anything I may have missed

Perhaps in the future we can move to a more advanced algorithm such as the ones provided by Caffeine, although I'm not sure how effective they'd be given the cache can be easily attacked.

kusalk (Member Author) commented Oct 13, 2023

@JCgH4164838Gh792C124B5

@@ -705,29 +710,31 @@ public void copy(final Object from, final Object to, final Map<String, Object> c
}

for (PropertyDescriptor fromPd : fromPds) {
Member Author:

Nothing to see here, just made this code more digestible

Contributor:

Looks reasonable to me.

@@ -30,10 +30,10 @@
public class OgnlDefaultCache<Key, Value> implements OgnlCache<Key, Value> {

private final ConcurrentHashMap<Key, Value> ognlCache;
-private final AtomicInteger cacheEvictionLimit = new AtomicInteger(25000);
+private final AtomicInteger cacheEvictionLimit = new AtomicInteger();
Member Author:

These init values are inconsequential as far as I can tell as they are immediately overridden by the constructor

Contributor:

I think setting a positive non-zero value for the initializer was to avoid the instance ever having a cacheEvictionLimit of zero (which "new AtomicInteger()" would). Also, if someone built a custom cache implementation extending OgnlDefaultCache with an overridden constructor that did not call super(), a (hopefully) reasonable nonzero initial value would be ensured.

Member Author:

The overriding is a fair point - I'll move the initialisation of the whole field into the constructor

@ben-manes

fwiw, this would change from a concurrent cache to a synchronized one. It looks like the original code replaced an unbounded concurrent map. The performance difference may be unacceptable or may explain the concern, though it's hard to say without benchmarks/profiling.

Caffeine is very resilient to attacks, like hash flooding, and is scan resistant. An independent analysis might be of interest: An evaluation of cache management policies under workloads with malicious requests.
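
To make the concurrency point concrete, a rough sketch of the two backing structures being compared; the field names and eviction limit are illustrative only, not the actual Struts code. An access-ordered LinkedHashMap must sit behind a single lock to be thread-safe, while the 'basic' option's ConcurrentHashMap allows concurrent access but has no LRU ordering.

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheBackingSketch {

    // 'Basic' style: concurrent reads and writes, but no eviction ordering
    final Map<String, Object> basicStyle = new ConcurrentHashMap<>();

    // 'LRU' style: access-ordered LinkedHashMap behind a synchronized wrapper;
    // every get/put contends on the same lock
    final Map<String, Object> lruStyle = Collections.synchronizedMap(
            new LinkedHashMap<String, Object>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                    return size() > 2_500; // example eviction limit
                }
            });
}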

kusalk (Member Author) commented Oct 14, 2023

Good call out @ben-manes, I completely glossed over the change from concurrent map to synchronised map (and its performance implications).

And thank you for that analysis on resistance to attacks.

In that case I may just go ahead and integrate Caffeine as it seems rather straightforward anyway.

JCgH4164838Gh792C124B5 (Contributor) left a comment

Hi @kusalk. Thanks for reaching out to invite comments on the potential cache implementation changes. 😄

Switching to LRU mode by default might be a bit risky performance-wise, until someone benchmarks performance with it on/off.

In terms of potentially integrating Caffeine to handle caching, that sounds interesting. Maybe making it another option (with a wrapper implementing the OgnlCache interface) would be one way to do so?

In that way, I think that might keep the library dependency optional, depending on what cache implementation was chosen via configuration (even if a Caffeine-based cache was to become the default cache implementation for the framework). Others might have different thoughts on the matter...

Regards.


@@ -36,15 +36,15 @@
public class OgnlLRUCache<Key, Value> implements OgnlCache<Key, Value> {

private final Map<Key, Value> ognlLRUCache;
-private final AtomicInteger cacheEvictionLimit = new AtomicInteger(2500);
+private final AtomicInteger cacheEvictionLimit = new AtomicInteger();
Contributor:

The OgnlDefaultCache comment above would apply here as well.

if (from == null || to == null) {
LOG.warn("Attempting to copy from or to a null source. This is illegal and is bein skipped. This may be due to an error in an OGNL expression, action chaining, or some other event.");
LOG.warn(
Contributor:

It looks like the wording of that log warning has been invariant as far back as the history for the file goes. Does changing it by reducing the wording detail (other than maybe fixing the "is bein" typo) make it less clear?

I am unsure of how frequently this log statement might get triggered, but adding "new RuntimeException()" as a parameter would (I think?) result in a stack trace being output each time. That might potentially create a lot of log clutter (a trade-off, if the stack trace gives better context)?

Member Author:

Currently, this method is only called by ChainingInterceptor within Struts, and from what I can tell, these parameters will never be null. It is, however, also a public method. Calling it with null values is improper usage, and the stack trace will provide better context for such an issue to be resolved. The previous warning message wasn't very helpful.
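
For context on the trade-off above: passing a Throwable as an extra argument is what makes the logging framework print a stack trace. A hedged sketch follows; the logger API shown is generic SLF4J-style and the message text is illustrative, not the exact code in this PR.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CopyWarningSketch {

    private static final Logger LOG = LoggerFactory.getLogger(CopyWarningSketch.class);

    public void copy(Object from, Object to) {
        if (from == null || to == null) {
            // The Throwable argument causes a stack trace to be logged,
            // showing which caller passed the null arguments
            LOG.warn("Attempted to copy from or to a null object; skipping.",
                    new RuntimeException("Illegal null argument passed to copy()"));
            return;
        }
        // ... copy properties from 'from' to 'to' ...
    }
}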


@@ -243,25 +243,25 @@ struts.ognl.enableExpressionCache=true
### For expressionCacheLRUMode true, the limit will ensure the cache does not exceed
### that size, dropping the oldest (least-recently-used) expressions to add new ones.
### NOTE: If not set, the default is 25000, which may be excessive.
# struts.ognl.expressionCacheMaxSize=1000
struts.ognl.expressionCacheMaxSize=10000
Contributor:

If an (uncommented) initial value is to be used/set in default.properties, maybe it should match the programmatic defaults in the factory and cache instances?

Whether 10000 (or 25000) is a reasonable default might not be determined without some benchmarks.

This comment would apply to any of the numeric size choices in the properties file.

Member Author:

Yeah I agree that it would be better to have the defaults in the code and properties file be consistent (as well as the accompanying comments in the properties file). I'll amend this.

Member Author:

I ended up defining the defaults in the properties file and removing defaults from the code - except for required values when bootstrapping the container. (And except for a deprecated constructor which is awaiting deletion)


### Indicates if the OGNL expressionCache should use LRU mode.
### NOTE: When true, make sure to set the expressionCacheMaxSize to a reasonable value
### for your application. Otherwise the default limit will never (practically) be reached.
# struts.ognl.expressionCacheLRUMode=false
struts.ognl.expressionCacheLRUMode=true
Contributor:

@ben-manes already pointed out potential implications in making the LRU mode cache active by default (switching from a concurrent to synchronized cache). It might be safer to keep LRU mode caching off by default, at least until someone can confirm if performance of the LRU mode cache is acceptable under load (benchmarking).


### Specify a limit to the number of entries in the OGNL beanInfoCache.
### For the standard beanInfoCache mode, when the limit is exceeded the entire cache's
### content will be cleared (can help prevent memory leaks).
### For beanInfoCacheLRUMode true, the limit will ensure the cache does not exceed
### that size, dropping the oldest (least-recently-used) expressions to add new ones.
### NOTE: If not set, the default is 25000, which may be excessive.
# struts.ognl.beanInfoCacheMaxSize=1000
struts.ognl.beanInfoCacheMaxSize=5000
Contributor:

Above comment for numeric defaults applies here as well.


### Indicates if the OGNL beanInfoCache should use LRU mode.
### NOTE: When true, make sure to set the beanInfoCacheMaxSize to a reasonable value
### for your application. Otherwise the default limit will never (practically) be reached.
# struts.ognl.beanInfoCacheLRUMode=false
struts.ognl.beanInfoCacheLRUMode=true
Contributor:

Above comment on LRU mode defaults applies here as well.

kusalk (Member Author) commented Oct 15, 2023

Thank you for the well-considered review @JCgH4164838Gh792C124B5

Yep I no longer intend to make LRU (backed by a synchronised LinkedHashMap) the default given the performance implications. And yes I intend to keep the cache implementation configurable, even with the implementation of Caffeine.

I think choosing a default cache size will always be a challenge as it is subject to multiple application-specific factors, so the best we can do here is to encourage developers to adjust the cache size as needed.

kusalk changed the title from "WW-5355 Use LRU cache by default" to "WW-5355 Use W-TinyLfu cache by default" on Oct 15, 2023
kusalk (Member Author) commented Oct 15, 2023

I found that previously, struts.ognl.expressionCacheLRUMode=true was not reliable due to setter injection not guaranteeing the order of operations. I've thus refactored the code to use constructor injection.
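
A rough sketch of why constructor injection sidesteps the ordering problem; the class, interface, and parameter names here are hypothetical stand-ins, not the actual Struts code. With setters, the container may deliver the cache type and the max size in either order, so a cache built inside one setter can see a stale value for the other property; with a constructor, both values are necessarily available before the cache is built.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExpressionCacheHolderSketch {

    // Minimal stand-in for the OgnlCache interface
    interface SimpleCache<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    private final SimpleCache<String, Object> expressionCache;

    // Constructor injection: both settings arrive together, so the cache is
    // always built with the intended type and size
    public ExpressionCacheHolderSketch(String cacheType, int maxSize) {
        this.expressionCache = buildCache(cacheType, maxSize);
    }

    private static SimpleCache<String, Object> buildCache(String cacheType, int maxSize) {
        // The real code would dispatch on basic|lru|wtlfu and honour maxSize;
        // a plain concurrent map stands in here
        Map<String, Object> backing = new ConcurrentHashMap<>();
        return new SimpleCache<String, Object>() {
            public Object get(String key) { return backing.get(key); }
            public void put(String key, Object value) { backing.put(key, value); }
        };
    }
}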

@@ -125,6 +127,24 @@
*/
public class DefaultConfiguration implements Configuration {

public static final Map<String, Object> BOOTSTRAP_CONSTANTS;
Member Author (Oct 15, 2023):

Extracted bootstrap constants into a public static final field for reuse by the default Struts configuration provider and any test code as needed. Adding new required constants was previously a pain

* @param <Value> The type for the cache value entries
*/
public class DefaultOgnlCacheFactory<Key, Value> implements OgnlCacheFactory<Key, Value> {

private final AtomicBoolean useLRUCache = new AtomicBoolean(false);
private final AtomicInteger cacheMaxSize = new AtomicInteger(25000);
Member Author:

Making this factory class thread-safe seemed unnecessary

@@ -81,7 +81,9 @@ public int getEvictionLimit() {

@Override
public void setEvictionLimit(int cacheEvictionLimit) {
if (cacheEvictionLimit < size()) {
Member Author:

Fixed this bug

@@ -143,12 +143,18 @@ protected void setEnableExpressionCache(String cache) {
enableExpressionCache = BooleanUtils.toBoolean(cache);
}

@Inject(value = StrutsConstants.STRUTS_OGNL_EXPRESSION_CACHE_MAXSIZE, required = false)
Member Author (Oct 15, 2023):

Doesn't make much sense to inject this both in the factory and here. Plus, Caffeine doesn't support changing the max size after construction (and neither does LRU really), so we'd have to flush/reconstruct the cache to continue supporting this method. I can't see much use for changing cache max size mid-life anyway so I've just gone ahead and deprecated it.

@ben-manes:

cache.policy().eviction().ifPresent(eviction -> {
  eviction.setMaximum(2 * eviction.getMaximum());
});

The Policy API contains various methods for pragmatic needs, while keeping the core APIs simple and focused on the common needs.

kusalk (Member Author) commented Oct 15, 2023

I've taken care to maintain full API compatibility given that these classes are extension points and thus public API

Comment on lines 58 to 60
if (cache.getIfPresent(key) == null) {
cache.put(key, value);
}

@ben-manes:

cache.asMap().putIfAbsent(key, value)


@Override
public int size() {
return Math.toIntExact(cache.estimatedSize());

@ben-manes:

cache.asMap().size()

kusalk (Member Author) commented Oct 16, 2023

Thanks @ben-manes, amended!
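
For clarity, the amended operations might look roughly like this in a Caffeine-backed wrapper (a sketch only; the surrounding class and method names are assumptions, not the final code):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class OgnlCaffeineCacheSketch<Key, Value> {

    private final Cache<Key, Value> cache;

    public OgnlCaffeineCacheSketch(int evictionLimit) {
        this.cache = Caffeine.newBuilder().maximumSize(evictionLimit).build();
    }

    // Was a racy check-then-act (getIfPresent followed by put);
    // now a single atomic operation on the cache's map view
    public void putIfAbsent(Key key, Value value) {
        cache.asMap().putIfAbsent(key, value);
    }

    // Was Math.toIntExact(cache.estimatedSize()), an estimate;
    // now the exact size of the underlying map view
    public int size() {
        return cache.asMap().size();
    }
}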

Comment on lines 66 to 68
CONCURRENT_BASIC,
SYNC_LINKED_LRU,
CAFFEINE_WTLFU
Member:

Are these different implementations of the cache layer? Shouldn't this go through the extension point?

Member Author (Oct 17, 2023):

I was wondering the same thing - I was only maintaining the design introduced in #528.

I simply replaced struts.ognl.expressionCacheLRUMode=true|false with struts.ognl.expressionCacheType=basic|lru|wtlfu

Let me look into refactoring this; it will likely be a breaking change though.
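
Purely as an illustration of the mapping being discussed (the actual factory wiring and signatures in the PR may differ), the three cacheType settings might line up with the enum and implementations roughly like this:

public final class CacheTypeSketch {

    // Mirrors the enum quoted above
    enum CacheType { CONCURRENT_BASIC, SYNC_LINKED_LRU, CAFFEINE_WTLFU }

    static CacheType fromConfig(String value) {
        switch (value.toLowerCase()) {
            case "basic": return CacheType.CONCURRENT_BASIC; // ConcurrentHashMap, flushed when the limit is hit
            case "lru":   return CacheType.SYNC_LINKED_LRU;  // synchronized LinkedHashMap, LRU eviction
            case "wtlfu": return CacheType.CAFFEINE_WTLFU;   // Caffeine, W-TinyLfu eviction
            default: throw new IllegalArgumentException("Unknown cache type: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromConfig("wtlfu")); // CAFFEINE_WTLFU
    }
}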

Member Author:

Yeah, given the breaking nature, I'm more inclined to take on this refactor as part of 7.0. All good for me to create a follow-up card for it?

Member:

Thanks! That would make more sense and then we can remove all the properties and leave that to a given implementation. Please create the ticket :)


lukaszlenart (Member) left a comment

Could you adjust the ticket's description? Right now I'm a bit confused about the scope of the ticket.


sonarcloud bot commented Oct 17, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
13 Code Smells (rating A)

90.1% Coverage
0.0% Duplication

kusalk (Member Author) commented Oct 17, 2023

@lukaszlenart I've updated the PR description :)

kusalk changed the title from "WW-5355 Use W-TinyLfu cache by default" to "WW-5355 Integrate W-TinyLfu cache and use by default" on Oct 17, 2023
kusalk merged commit cae627f into master on Oct 18, 2023
10 checks passed
kusalk deleted the WW-5355-cache-lru branch on October 18, 2023 13:42