Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ArchCondition evaluation #876

Merged
merged 12 commits into from
Jun 25, 2022
Merged

Improve ArchCondition evaluation #876

merged 12 commits into from
Jun 25, 2022

Conversation

codecholeric
Copy link
Collaborator

At the moment evaluating ArchConditions is quite memory inefficient. E.g. for a rule like noClasses().should().accessField(Foo.class, "bar") we create a ConditionEvent for every single field access and qualify it with "allowed" or "violation". Thus, this is independent of some access actually being a violation but done for every access. Furthermore, all such events of all evaluated objects will be kept in memory until the test failure is reported. This will keep in the former case events for all field accesses of all classes in memory until the final failure (or success) is reported.

The background for this behavior is the complex evaluation logic of nested and inverted conditions. E.g. noClasses() will invert the events and thus make the events that were "allowed" before to be violations. To make matters worse we have combinations like accessField(..).orShould()... and aggregations like noClasses().should().haveAny... where haveAny... will be satisfied if there is any occurrence that supports the condition, but by negating it via noClasses() we effectively change the evaluation to "all occurrences must NOT satisfy the condition". So it is quite challenging to determine at which point the information about "allowed" events is finally obsolete, since we can arbitrarily aggregate and invert.

I tried to come up with a lazy solution, where we would use some Supplier/Consumer approach, but failed to come up with one that a) makes up a good public API for users, b) works for all those cases of aggregation including customized behavior by users and c) really saves substantial amounts of memory. So in the end I gave up on that and focused on optimizing the current approach.

This PR will address two points: a) improve the memory footprint of ArchCondition evaluation and b) improve the public API to make future optimizations easier and the code more maintainable (now is a very good opportunity since we are about to release a new major version, so we can break the public API a little bit).

Performance Aspect

As for point a) the main optimization here is to throw out intermediate copying of events, and once we know we are done with event evaluation (i.e. we are now top-level about to create the result) to dump all references to now obsolete satisfied events. This had the following positive effect when testing noClasses().should().accessField(System.class, "out") on a big code base on my machine:

Before

Heap allocation

image

Performance

Evaluating the rule could be done 89 times in 60 seconds.

After

Heap allocation

image

Performance

Evaluating the rule could be done 135 times in 60 seconds.

Result

So altogether we see a smoother memory allocation with less GCs and a performance improvement of about 50% when evaluating such conditions. Thus, I consider the change worthwile.

API Aspect

Before the public API included a final class ConditionEvents, this is now an interface to allow different implementations to be used internally. Also the API of this interface has been stripped of all non-essential methods, like getAllowed() which would return the allowed events and force us to always keep a reference to satisfied events as long as we hold ConditionEvents. I tried to strip the API down to really just the essentials, e.g. containViolation() or getViolating() and remove all methods that don't make sense in all contexts.

@codecholeric codecholeric requested a review from hankem June 3, 2022 10:19
@codecholeric codecholeric added this to the 1.0.0 milestone Jun 18, 2022
The example was a little confusing since it looks like `areSubtypeOf` would be a predicate from the ArchUnit standard library, while it in fact was meant as a custom predicate. I have now replaced this by a predicate that really exists and made the example a little bit more realistic.

Signed-off-by: Peter Gafert <[email protected]>
To make `ConditionEvents` easier to maintain we should slim down the interface as much as possible. Users should rely on `EvaluationResult` to retrieve textual descriptions of the violations, not on `ConditionEvents`.

Signed-off-by: Peter Gafert <[email protected]>
To make `ConditionEvents` easier to maintain we should slim down the interface as much as possible. Users should rely on `EvaluationResult` to handle specific violations, not on `ConditionEvents`.

Signed-off-by: Peter Gafert <[email protected]>
For a Java 7 language level the API `handleViolations(ViolationHandler<T> handler)` was quite fitting, since there was a good chance that this would be an anonymous instantiation capturing the reified type `T`. However, with Java 8 language level the most straight forward API would accept a lambda, and in that case we would typically lose that information.
We will add a hack to ensure that the type parameter of `ViolationHandler<T>` will always be reified by adding an unused varargs array of that type to the method. While this is not very pretty, it is the only way I found to really force the type parameter to be reified and not get erased. Since the usage of the API is then quite convenient (e.g. it is possible to just write `handleViolations((Collection<JavaAccess<?>> items, String message) -> ...)` and the items will automatically be filtered for `JavaAccess`) I decided to still go the way of this "hack" with some proper documentation.

Signed-off-by: Peter Gafert <[email protected]>
We want to get rid of tracking the allowed events to save memory. A precondition for this is to remove `isEmpty()` which depends on both sets of events.

Signed-off-by: Peter Gafert <[email protected]>
This way it is easier to maintain it in the future since we can freely adapt / optimize the implementation.

Signed-off-by: Peter Gafert <[email protected]>
There is no need to store the collected events in a separate list just to invert them afterwards. Instead, we can invert them on the fly and add them to the original events.

Signed-off-by: Peter Gafert <[email protected]>
@codecholeric codecholeric force-pushed the improve-conditions branch 2 times, most recently from ba20156 to 300ccef Compare June 25, 2022 10:31
We do not need to hold the whole `ConditionEvents`, but only the violating events. Thus, after constructing the `EvaluationResult` the `ConditionEvents` can be garbage collected.

Signed-off-by: Peter Gafert <[email protected]>
This is an overly broad interface for our use case. In particular since we also have `getViolating()` which is the set we are always ultimately interested in. The allowed events are more of a current implementation detail that we need for the inversion logic, but the goal should be to remove this from the public interface eventually.

Signed-off-by: Peter Gafert <[email protected]>
Having this inside of `ArchCondition` is a little inconsistent, since all the other conditions, including `never` or `containsOnly`, reside inside the subpackage `conditions`. Also, these conditions can then not share package private classes with the other conditions.

Signed-off-by: Peter Gafert <[email protected]>
We do not want to track the allowed events in cases where we know we do not need them (e.g. when we know no inverting will happen anymore). This will improve the memory footprint since the garbage collector can clean those up once all necessary transformations are through.

Signed-off-by: Peter Gafert <[email protected]>
I do not see the advantage of encapsulating adding the event inside the `ConditionEvent`. On the other hand it provides a more flexible API if it is possible to simply retrieve the inverted event from a `ConditionEvent` and then decide as the caller what to do with it.

Signed-off-by: Peter Gafert <[email protected]>
@codecholeric codecholeric merged commit 19643f5 into main Jun 25, 2022
@codecholeric codecholeric deleted the improve-conditions branch June 25, 2022 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants