Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClassCastException when using Combinators during shrinking #475

Closed
SimY4 opened this issue Apr 20, 2023 · 21 comments
Closed

ClassCastException when using Combinators during shrinking #475

SimY4 opened this issue Apr 20, 2023 · 21 comments

Comments

@SimY4
Copy link

SimY4 commented Apr 20, 2023

Testing Problem

ClassCastException when using Combinators and arbitraries producing elements of the type hierarchy.

java.lang.ClassCastException: class mypackage.MyTypeLike$Any cannot be cast to class mypackage.MyTypeLike (mypackage.MyTypeLike$Any and mypackage.MyTypeLike are in unnamed module of loader 'app')
	at net.jqwik.engine.properties.arbitraries.combinations.DefaultCombinator5.lambda$combineFunction$0(DefaultCombinator5.java:33)
	at net.jqwik.engine.properties.shrinking.CombinedShrinkable.createValue(CombinedShrinkable.java:25)
	at net.jqwik.engine.properties.shrinking.CombinedShrinkable.value(CombinedShrinkable.java:21)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at net.jqwik.engine.properties.shrinking.AbstractSampleShrinker.lambda$shrink$2(AbstractSampleShrinker.java:52)

Suggested Solution

Very brief analysis points to the lack of variance in functional combinator methods. i.e.:

<R> Arbitrary<R> flatAs(F5<T1, T2, T3, T4, T5, Arbitrary<@NotNull R>> flatCombinator)

<R> Arbitrary<R> as(Combinators.F5<T1, T2, T3, T4, T5, R> combinator)

Combinators.Combinator5<T1, T2, T3, T4, T5> filter(Combinators.F5<T1, T2, T3, T4, T5, Boolean> filter)

<R> Function<List<Object>, R> combineFunction(Combinators.F5<T1, T2, T3, T4, T5, R> combinator)

should probably be relaxed to:

<R> Arbitrary<R> flatAs(F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, Arbitrary<? extends R>> flatCombinator)

<R> Arbitrary<R> as(Combinators.F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, ? extends R> combinator)

Combinators.Combinator5<T1, T2, T3, T4, T5> filter(Combinators.F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, Boolean> filter)

<R> Function<List<Object>, R> combineFunction(Combinators.F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, ? extends R> combinator)
@jlink
Copy link
Collaborator

jlink commented Apr 20, 2023

@SimY4 Thanks for reporting. Can you add a short example reproducing the phenomenon?

First thought: Changing type signatures without changing the implementation shouldn’t be able to get rid of a CCE if it already compiled before. But I may be wrong.

@jlink
Copy link
Collaborator

jlink commented May 3, 2023

@SimY4 I couldn't reproduce it on my own. Waiting for your input then.

@SimY4
Copy link
Author

SimY4 commented Jun 21, 2023

@jlink Sorry, took me a while to pin it down. I'm still trying to create a minimal reproducible example for this bug. But my current understanding of the problem is:

I have an entities with identity equality defined like so:

abstract class Common {
  private final String value;
  protected Common(String value) { this.value = value; }
  public boolean equals(Object o) { return this == o || (o instanceof Common && value.equals(((Common) o).value)); }
  public int hashCode() { return value.hashCode(); }
}

class This extends Common { 
  static final This INSTANCE = new This();
  This() { super(""); }
}

class That extends Common { 
  static final That INSTANCE = new That();
  That() { super(""); }
}

This and That types are different but their identity is the same, therefore:

Arbitraries.just(This.INSTANCE) and Arbitraries.just(That.INSTANCE) identity is also the same. And any derivative from both of these arbitraries will have the same identity.

So, when it comes to memorization that's built-in inside Jqwik, this right here:

Tuple3<Arbitrary<?>, Integer, Boolean> key = Tuple.of(arbitrary, genSize, withEdgeCases);
RandomGenerator<?> generator = computeIfAbsent(
generatorStore().get(),
key,
ignore -> generatorSupplier.get()
);
return (RandomGenerator<U>) generator;

won't tell them apart. So one can be substituted with another. And that would break the runtime type checker.

@vlsi
Copy link
Contributor

vlsi commented Jun 21, 2023

Based on the sample above, I would say equals is not well-defined.

What is the use case for having different classes that are equal in that way?

Have you considered adding getClass() == o.getClass() check to the Common implementation, and, getClass().hashCode() to Common#hashCode?

PS. I wonder if it would make sense to avoid caching Arbitraries.just in jqwik. In other words, Arbitraries.just should be a trivial generator, so it might be faster to just return the generator instead of trying to create a cache key, lookup the value in the cache, and keep the cache in-memory.

@SimY4
Copy link
Author

SimY4 commented Jun 21, 2023

@vlsi you might be right. But discovering that my somewhat broken equality is being used as a cache key also came as a surprise to me since I wasn't expecting jqwik to do that. I'd probably consider Arbitraries.just and Arbitraries.of as potentially not giving stable equality to rely on.

But the problem when goes higher because:

record Product(Common p1, Common p2) {}

Combinators.combine(arbCommon(), arbCommon()).as(Product::new)

are also the same now

@jlink
Copy link
Collaborator

jlink commented Jun 21, 2023

Caching is done on a heuristical basis and requires reasonable equality implementation. In a way this is a hack, but I haven't found a better solution yet. If caching was done on an identity basis it wouldn't have the intended effect.

@SimY4
Copy link
Author

SimY4 commented Jun 21, 2023

@jlink is there a configuration option to disable it? Maybe I can add that in

@jlink
Copy link
Collaborator

jlink commented Jun 21, 2023

is there a configuration option to disable it? Maybe I can add that in

Nope. Convince me that it would be a good idea to make caching configurable.

The problem is that without generator caching, many non-trivial value generators will become VERY slow.
In other words, configuration would have to happen on a case by case basis.
Something like arbitrary.dontCacheGenerators().

@vlsi
Copy link
Contributor

vlsi commented Jun 21, 2023

Convince me that it would be a good idea to make caching configurable.

WDYT of making the cache configurable (e.g. to allow more or less memory for the caching)?
At the same time, configuring the cache to 0 would be a reasonable way of disabling the cache, which might be helpful to see if the caching introduces differences (e.g. if there are objects with weird equality), and it might be helpful for benchmarking.

In other words, configuration would have to happen on a case by case basis.
Something like arbitrary.dontCacheGenerators().

The key issue here is that users supply Arbitraries.just(weridThing), and they, well, reasonably expect the arbitrary would produce the exactly the same thing they passed in the constructor.

I doubt they will know when it is the right time to call .dontCacheGenerators(), so I would consider that as a workaround only :-/

Even though handling Arbitraries.just specially does not seem to resolve all the possible cases of using arbitraries based on objects with weird equality, the following makes sense for me:

  1. Avoid caching generators for Arbitraries.just. We probably need to test performance, however, I would expect that creating the generator every time or caching it into Arbitraries.just would be better from the perf point of view
  2. Use the supplied just value for equality only for well-known classes like java.lang.String, java.lang.Integer, java.lang.Class, enum, and so on. Treat all other objects as "unknown", so use identity for JustArbitrary#equals and hashCode. Then, if people build trees of Arbitrary.randomOf(Arbitrary.just(weirdThing), Arbitrary.just(weirdThing)), then the existing caching would work just fine as it won't trust just(weirdThing).

Of course, the caching would be sub-optimal, however, it might be a reasonable compromise between reasonably hard to support code, reasonably easy to use (less caching surprises), and reasonably fast to execute.

WDYT?

Combinators.combine(arbCommon(), arbCommon()).as(Product::new)
are also the same now

@SimY4 , would you clarify the issue? I don't get it :-/

@jlink
Copy link
Collaborator

jlink commented Jun 22, 2023

  1. Avoid caching generators for Arbitraries.just. We probably need to test performance, however, I would expect that creating the generator every time or caching it into Arbitraries.just would be better from the perf point of view

The simplest solution for this special case is to use identity hashing for just(..). This is a single-line change. But that doesn't solve the general problem. Already Arbitraries.create(..) would fail with a broken equals implementation.

Frankly, I'm not convinced it's time well spent to handle anomalies like that. There are probably dozens or hundreds of similar cases all over jqwik.

@SimY4
Copy link
Author

SimY4 commented Jun 22, 2023

Some afterthoughts that I had lately:
What's the effectiveness of this cache anyway? If I'm just by chance defining equality for some of my arbitrary instances.
What's the identity of Arbitrary<Function<A, A>>? I've discovered some shady reflection magic that tries to compute identity of a function by hashing its inner fields. But stateless functions will all resolve as identical.

@vlsi
Copy link
Contributor

vlsi commented Jun 22, 2023

What's the effectiveness of this cache anyway?

jqwik relies on the cache for cases like frequencyOf(List<Tuple2<Integer, Arbitrary<T>>> frequencies).
In order to prepare a generator for such arbitrary, jqwik needs to calculate cumulative upper bounds, so the generation could be like "draw a uniform random and binary search the value according to frequencies". The cache does work and it improves perf in my case (==if I disable the cache, the number of tries/second reduces).

See

private void calculateUpperBorders(List<Tuple.Tuple2<Integer, T>> frequencies) {
List<T> values = new ArrayList<>(frequencies.size());
// Zero-frequency elements are possible, so we start with a list, and transform to array later
List<Integer> upperBounds = new ArrayList<>(frequencies.size());
for (Tuple.Tuple2<Integer, T> tuple : frequencies) {
int frequency = tuple.get1();
if (frequency <= 0)
continue;
size = Math.addExact(size, frequency);
T value = tuple.get2();
values.add(value);
upperBounds.add(size);
}
valuesToChooseFrom = values;
this.upperBounds = upperBounds.stream().mapToInt(i -> i).toArray();
}
private T choose(int index) {
int i = Arrays.binarySearch(upperBounds, index);
if (i < 0) {
// Exact value not found => convert "negative" insertion point to the actual index
i = -(i + 1);
} else {
// Exact value is found, use the next bucket
// For instance, if weights are {2,3}, the upperBounds are {2,5},
// and the input indices could be 0..4.
// "index" {0, 1} should be mapped to the first element
// "index" {2, 3, 4} should be mapped to the second
// If (input) index==2, binarySearch returns 0, so we must advance i by one
// This advance is never triggered for the very last element since random.nextInt
// excludes upper bound.
i++;
}
return valuesToChooseFrom.get(i);

The preparation step has non-trivial overhead, so creating generators always does not sound good.

@jlink
Copy link
Collaborator

jlink commented Jun 22, 2023

I've discovered some shady reflection magic

I love that term :-). jqwik is 50% reflection magic. Some of it is shady.

So you're right. My counter-argument is: It (mostly) works for the use cases I've seen so far.

@SimY4
Copy link
Author

SimY4 commented Jun 22, 2023

@vlsi I'm sorry, I may be overstepping here, but I don't see the need for what you just described. Isn't frequency about weighing your arbitraries and randomly selecting one based on that? I can't think of a reason for you to explode and weigh individual items.

The frequency method you've shown is about frequencies of values and not arbitraries, but I'm too new to this codebase to maybe I'm not seeing what you're showing me. I still can't see how caching of arbitraries is useful. Are you trying to save on heap space?

@vlsi
Copy link
Contributor

vlsi commented Jun 22, 2023

I still can't see how caching of arbitraries is useful. Are you trying to save on heap space?

Weighted generators require preprocessing which is CPU-intensive. Caching enables running the pre-processing once rather than "for every generated item".

@jlink
Copy link
Collaborator

jlink commented Jun 22, 2023

I still can't see how caching of arbitraries is useful. Are you trying to save on heap space?

Generators are cached, not arbitraries - based on the arbitrary which creates them.

It's not about saving heap space but creation time. In addition to the example from @vlsi, there are many more. One prominent example is recursive arbitraries. Creating a new generator on each recursion step can be very time-consuming. Same holds for nested combinators. Without this kind of caching (or another approach to speed this up) quite a few use cases get out of reasonable performance bounds.

All that said, is there a reason that the supposedly broken implementation of equals() cannot be fixed?
Is there another issue hiding underneath that I'm missing?

@SimY4
Copy link
Author

SimY4 commented Jun 23, 2023

Just wanted to include this for future references, just in case we'd land on some outcomes from these discussions. I've fixed my equalities and hashcodes and now they aren't universal like they used to. Most of the CCE exceptions are gone but some persisted and they happen in a different place in jqwik:

java.lang.ClassCastException: class mypackage.MyType cannot be cast to class mypackage.MyOtherType (mypackage.MyType and mypackage.MyOtherType are in unnamed module of loader 'app')
	at net.jqwik.engine.properties.arbitraries.EdgeCasesSupport.lambda$filter$7(EdgeCasesSupport.java:111)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at net.jqwik.engine.properties.arbitraries.EdgeCasesSupport.filter(EdgeCasesSupport.java:113)
	at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
	at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
	at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
	at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
	at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
	at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
	at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
	at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
	at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
	at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
	at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
	at net.jqwik.engine.properties.RandomizedShrinkablesGenerator.lambda$resolveEdgeCases$3(RandomizedShrinkablesGenerator.java:126)

@SimY4
Copy link
Author

SimY4 commented Jun 23, 2023

Sorry, it was my bad with the last one.

@jlink
Copy link
Collaborator

jlink commented Jun 23, 2023

Sorry, it was my bad with the last one.

You mean it's not a jqwik issue?

@SimY4
Copy link
Author

SimY4 commented Jun 23, 2023

@jlink I can twist it that it is. 😀 The last one came from ArbitraryConfiguratorBase's contract. Basically, the contract says that I should define a class like this:

class Configurator extends ArbitraryConfiguratorBase {
  public Arbitrary<MyType> configure(Arbitrary<MyType> arb, MyAnnotation ann) { ... }
}

This looks like it a correct configurator. Except it isn't because you also shouldn't forget to overrideboolean acceptTargetType(TypeUsage) with:

MyType.class.isAssignableFrom(targetType.getRawType());

without it it can mess up all annotated arbitraries.

But it's not relevant to the original issue, so it's irrelevant to the initial issue with cache collisions.

@jlink
Copy link
Collaborator

jlink commented Jun 23, 2023

@jlink I can twist it that it is. 😀 The last one came from ArbitraryConfiguratorBase's contract. Basically, the contract says that I should define a class like this:

class Configurator extends ArbitraryConfiguratorBase {
  public Arbitrary<MyType> configure(Arbitrary<MyType> arb, MyAnnotation ann) { ... }
}

This looks like it a correct configurator. Except it isn't because you also shouldn't forget to overrideboolean acceptTargetType(TypeUsage) with:

MyType.class.isAssignableFrom(targetType.getRawType());

without it it can mess up all annotated arbitraries.

This is exactly what I discovered yesterday when looking into "your" other issue: #493

At the time of implementing ArbitraryConfiguratorBase I forgot (or was too lazy to) doing it correctly. Technical debt.

@SimY4 SimY4 closed this as completed Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants