Bulk Evaluation Design and Goals #31

Open
erka opened this issue Aug 29, 2024 · 3 comments

erka (Contributor) commented Aug 29, 2024

The discussion has been initiated here[1].

It would be helpful to have more detailed documentation for the bulk evaluation endpoint, as the current documentation may be open to different interpretations.

From the original discussion, it seems the design primarily focuses on static context evaluation only. However, this concept isn’t thoroughly explained in the documentation[2] or the OpenAPI specification, which can cause confusion - especially for those working with highly dynamic evaluation contexts in backend services, cron jobs, or batch processing. They may view bulk evaluation as a way to reduce network costs compared to multiple single evaluation calls and could end up using OFREP incorrectly due to this lack of clarity.

Additionally, the ADR 0004-no-flag-filter-in-bulk-evaluation-request.md[3] mentions that systems that rely on a list of flags may use an evaluation context with a flags attribute, but it doesn’t clarify how the system should behave if the flags attribute is missing from the evaluation context. In our case, Flipt initially assumed that an error should be returned, since we want a list of flags to be provided. However, the original issue[1] suggests that the flags attribute should be treated as an optional filter. It would be beneficial to address this in the specification as well.
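
To make the ambiguity concrete, here is a rough sketch of the two readings (TypeScript; the endpoint path, the flags-in-context shape, and the flag keys are illustrative, based on my reading of the spec and ADR 0004, not normative):

```ts
// Reading A: bulk evaluation returns *all* flags for a (mostly static) context.
await fetch("/ofrep/v1/evaluate/flags", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    context: { targetingKey: "user-123" },
  }),
});

// Reading B (ADR 0004): the caller narrows the result by putting a `flags`
// attribute into the evaluation context. It is currently unclear whether a
// provider may *require* this attribute or must treat it as an optional filter.
await fetch("/ofrep/v1/evaluate/flags", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    context: { targetingKey: "user-123", flags: ["new-checkout", "dark-mode"] },
  }),
});
```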

Lastly, it would be valuable to understand the design for insights with bulk evaluation. Evaluating everything at once introduces additional complexities for both the provider and the client in managing the lifecycle of flags. If all flags are evaluated each time on the provider’s side, there’s no indication of which flags are active or stale. Similarly, the protocol does not provide any endpoint where the client could share how many times each flag has been evaluated. As a result, both the provider and the client are left to make assumptions about what's happening within the system. This is one of the reasons why Flipt would like to have flags in the evaluation context for bulk evaluation.

[1] flipt-io/flipt#3412
[2] https://github.com/open-feature/protocol/blob/main/service/adrs/0002-two-evaluation-endpoints.md
[3] https://github.com/open-feature/protocol/blob/main/service/adrs/0004-no-flag-filter-in-bulk-evaluation-request.md

toddbaert (Member) commented Aug 29, 2024

I see the problem and why a solution is desirable.

I think a better place for the data you describe would be a query param though... either as a repeated array, or as some kind of identifier for a flags set, or as a filter expression. My reasoning:

  • the context is a specific field that describes the subject of flag evaluation; this data doesn't really qualify as that
  • query params are easily parsed (unlike the body), so they can easily be (re)written by intervening infra in more advanced architectures; this sort of thing could then be controlled by infra, or cached
  • query params aren't stripped as headers often are
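
A hypothetical sketch of what that could look like (TypeScript; neither a `flags` nor a `flagSetId` query parameter exists in the spec today, these only illustrate the idea):

```ts
// Repeated query param with explicit flag keys:
await fetch("/ofrep/v1/evaluate/flags?flags=new-checkout&flags=dark-mode", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ context: { targetingKey: "user-123" } }),
});

// Or an identifier for a pre-defined flag set, which intervening infra could
// rewrite or use as a cache key:
await fetch("/ofrep/v1/evaluate/flags?flagSetId=checkout-experiments", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ context: { targetingKey: "user-123" } }),
});
```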

markphelps commented Aug 30, 2024

👋🏻 I think from the Flipt side (ha) we would like more clarification as to the expected behavior if a user does not provide a flags list. As @erka mentioned, our current implementation requires it because we wanted bulk eval to remain performant on the server side, and evaluating every flag when an organization has hundreds or more would not be.

Sorry if this is too provider (Flipt) specific, but I think, to @erka's point, the current spec is a bit too open to interpretation.

Currently we return an error, which I think is what @lukas-reining ran into when testing Flipt OFREP with the ofrep web provider. I think because of this error Flipt bulk eval is not technically compliant.

From our side we could do one of a few things:

  1. don't return an error if no flags are specified, but instead return an empty result set. This would have implications for the ofrep-web provider though, as in its current state it would still not function correctly when using Flipt as the provider
  2. perform an evaluation for all flags, not requiring a flags filter but optionally allowing it (we still need to determine the best way for this filter to be provided, either in the context like we are doing now or in a query param as @toddbaert mentioned; this would be great to have in the spec as well, see the sketch after this list). Doing so would require a bit of re-work on our side, as our current evaluation logic (which we reuse for ofrep) does not work this way, but that's our implementation detail and issue to figure out
  3. continue with the behavior we have today, but note in our documentation that we are not fully OFREP bulk eval compliant
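
To illustrate option 2 concretely, here is a minimal provider-side sketch (TypeScript; all names are hypothetical, not Flipt internals or OFREP API):

```ts
type EvaluationContext = { targetingKey?: string; flags?: string[] };
type EvaluatedFlag = { key: string; value: unknown; reason: string };

// Option 2: honor a `flags` filter when supplied, otherwise fall back to
// evaluating every flag instead of returning an error.
function bulkEvaluate(
  context: EvaluationContext,
  allFlagKeys: string[],
  evaluateOne: (key: string, ctx: EvaluationContext) => EvaluatedFlag,
): EvaluatedFlag[] {
  const keys = context.flags ?? allFlagKeys;
  return keys.map((key) => evaluateOne(key, context));
}
```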

From the OFREP side, I can also think of an option that would potentially help the clients:

Perhaps we could use the existing /ofrep/v1/configuration endpoint for the OFREP server to specify that the bulk evaluation endpoint requires a flags filter, similar to how the supported types work for single eval #21

This would at least allow the ofrep-web client to error or return a more helpful response, or maybe behave differently (e.g. not caching the response and using it for later evals) in the case that Flipt or other providers are not able to do bulk evaluation without a specified list of flags to consider.
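
As a purely hypothetical illustration, such a capability could be advertised roughly like this (TypeScript; the bulkEvaluation.flagFilterRequired field does not exist in the spec today and the overall response shape is only sketched):

```ts
// Sketch of a GET /ofrep/v1/configuration response advertising that bulk
// evaluation requires a flag filter (hypothetical field, not in the spec).
const configurationResponse = {
  name: "flipt",
  capabilities: {
    bulkEvaluation: {
      flagFilterRequired: true, // ofrep-web could fail early or adapt its behavior
    },
  },
};
```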

lukas-reining (Member) commented Sep 3, 2024

Sorry for the delay @erka @markphelps.

> From the original discussion, it seems the design primarily focuses on static context evaluation only. However, this concept isn’t thoroughly explained in the documentation[2] or the OpenAPI specification, which can cause confusion - especially for those working with highly dynamic evaluation contexts in backend services, cron jobs, or batch processing.

> Currently we return an error, which I think is what @lukas-reining ran into when testing Flipt OFREP with the ofrep web provider. I think because of this error Flipt bulk eval is not technically compliant.

Yes, this is why it is not compliant with "our idea" of this endpoint.
As we discussed, we have to document the intention behind this endpoint more thoroughly.
You are right @erka that static context evaluation is the main purpose of the bulk eval endpoint, while it may still be valuable for batch jobs or something similar, as you said.

> Lastly, it would be valuable to understand the design for insights with bulk evaluation. Evaluating everything at once introduces additional complexities for both the provider and the client in managing the lifecycle of flags. If all flags are evaluated each time on the provider’s side, there’s no indication of which flags are active or stale. Similarly, the protocol does not provide any endpoint where the client could share how many times each flag has been evaluated. As a result, both the provider and the client are left to make assumptions about what's happening within the system. This is one of the reasons why Flipt would like to have flags in the evaluation context for bulk evaluation.

I totally see the problem of observability and metrics gathering here.
As we said, maybe OFREP could offer some kind of tracking here, both for the case of cached evaluation and for general tracking of evaluations. Not sure how far this is from tracking in general @toddbaert?
Maybe we can come up with something here @open-feature/spec-evaluation-maintainers, as I can see this being a problem that comes up for many services.

> Perhaps we could use the existing /ofrep/v1/configuration endpoint for the OFREP server to specify that the bulk evaluation endpoint requires a flags filter, similar to how the supported types work for single eval #21

@markphelps I think this option could only be paired with some kind of specified filter, because otherwise OFREP implementations would tend to fail at being static-context compliant without the user being able to fix it.
In this case I would like the flags or filter to be specified so that at least consumers of the OFREP web providers can adapt to that condition and still use these services, even if they require a given parameter (without it they might fail).

The greatest lever for service compliance could be an endpoint that returns the rule set instead of the evaluated flags.
This is a pattern that many services follow, but the problem with it would be finding a rule description language, such as the often-discussed CEL or JsonLogic. This would reduce server compute but might give us PII problems, e.g. needing to hash PII to stay GDPR compliant.
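
As a purely hypothetical illustration of that direction, a ruleset payload using JsonLogic could look something like this (TypeScript; no such endpoint or format exists in OFREP today, and the flag, variants, and rule are made up):

```ts
// Hypothetical "return the ruleset" response: the client evaluates rules
// locally instead of asking the server for evaluated values.
const rulesetResponse = {
  flags: {
    "new-checkout": {
      variants: { on: true, off: false },
      defaultVariant: "off",
      // JsonLogic rule: serve "on" when the (possibly hashed, for PII/GDPR
      // reasons) country attribute equals "DE", otherwise "off".
      targeting: { if: [{ "==": [{ var: "country" }, "DE"] }, "on", "off"] },
    },
  },
};
```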

I would like to discuss the pros and cons of server-side filtered evaluation versus returning the full ruleset and specifying the ruleset format.
