Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure orderable types as input of array_sort when it is used as a mitigation function #6999

Closed
duanmeng opened this issue Oct 11, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@duanmeng
Copy link
Collaborator

duanmeng commented Oct 11, 2023

Description

In AggregationFuzzerTest order-dependent functions may depend on array_sort to transform/sort the results to a value that can be verified. The array_sort is supposed only to support orderable input types (see #6928), and the current fuzzer framework could only respect the function signature constraints when it is a 'test' function. In this scenario, the array_sort is not a 'test' function but a verify helper function, so its input would be the result of the actual test function, which may be an unorderable type (e.g. MapType).

https://github.com/facebookincubator/velox/blob/main/velox/exec/tests/AggregationFuzzerTest.cpp#L64-L94

// Functions whose results verification should be skipped. These can be
  // order-dependent functions whose results depend on the order of input rows,
  // or functions that return complex-typed results containing floating-point
  // fields. For some functions, the result can be transformed to a value that
  // can be verified. If such transformation exists, it can be specified to be
  // used for results verification. If no transformation is specified, results
  // are not verified.
  static const std::unordered_map<std::string, std::string>
      customVerificationFunctions = {
       ...
          {"set_agg", "array_sort({})"},
          {"set_union", "array_sort({})"},
          {"map_agg", "array_sort(map_keys({}))"},
          {"map_union", "array_sort(map_keys({}))"},
          {"map_union_sum", "array_sort(map_keys({}))"},

https://github.com/facebookincubator/velox/blob/main/velox/exec/tests/utils/AggregationFuzzer.cpp#L771-L789

if (customVerification) {
          // Add optional projection on the original result to make it order
          // independent for comparison.
          auto mitigation = customVerificationFunctions_.at(signature.name);
          if (!mitigation.empty()) {
            projections = groupingKeys;
            projections.push_back(fmt::format(fmt::runtime(mitigation), "a0"));
          }
        }

        auto input = generateInputData(argNames, argTypes);

        verifyAggregation(
            groupingKeys,
            {call},
            masks,
            input,
            customVerification,
            projections);

cc @mbasmanova @xiaoxmeng

@duanmeng
Copy link
Collaborator Author

Proposal solution

  • Ignore the mitigation, when the mitigation contains array_sort and the input types contains unordered type,
 bool ignoreMitigation = false;
 if (mitigation.find("array_sort") != std::string::npos) {
     ignoreMitigation = std::any_of(argTypes.cbegin(), argTypes.cend(),
                  [](const TypePtr& typePtr) {return !typePtr->isOrderable();});
  • Add a special version of array_sort without orderable input constraints

@mbasmanova
Copy link
Contributor

@duanmeng Thank you for reporting this problem. Ignoring verification is undesirable. Using an internal version of array_sort that can sort any type might be better.

@duanmeng
Copy link
Collaborator Author

@duanmeng Thank you for reporting this problem. Ignoring verification is undesirable. Using an internal version of array_sort that can sort any type might be better.

@mbasmanova Got it, will add it in PR #6928, after its preceding PRs #6998 and #6950 merged.

@duanmeng
Copy link
Collaborator Author

Close this issue as #6928 landed.

Note that this change reduces coverage in the AggregationFuzzer. Before this change, the Fuzzer was able to verify results against DuckDB (since it supports array_sort function), but now it cannot. We are going to face same problem with using Presto as the reference query runner. A solution could be to change the Fuzzer to apply post-aggregation projections using Velox only (once to Velox results, once to results from reference DB), then compare. This is a follow-up though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants