Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HyperLogLog type registration in registerApproxDistinctAggregates #10664

Closed

Conversation

pramodsatya
Copy link
Collaborator

@pramodsatya pramodsatya commented Aug 5, 2024

Prefix is not needed to register custom type HyperLogLog in ApproxDistinctAggregates.
registerHyperLogLogType is used instead to make registration of customType HyperLogLog
uniform across velox.

Use of prefix in registerCustomType here resulted in the following error while running tests
in presto PR:

Reason: Type doesn't exist: 'HYPERLOGLOG'
Retriable: False
Function: validateBaseTypeAndCollectTypeParams
File: /Users/pramod/Desktop/velox/velox/expression/FunctionSignature.cpp
Line: 125

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2024
Copy link

netlify bot commented Aug 5, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit da531cf
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66b31c6fa091270008b81e3f

@aditi-pandit
Copy link
Collaborator

@pramodsatya : Your fix seems reasonable.

What did you do to encounter this error ? Please can you elaborate.

Reason: Type doesn't exist: 'HYPERLOGLOG'
Retriable: False
Function: validateBaseTypeAndCollectTypeParams
File: /Users/pramod/Desktop/velox/velox/expression/FunctionSignature.cpp
Line: 125

Its best if there is a Presto e2e query to illustrate this problem. We can then link the Presto PR to this PR to give the reviewers more details.

@pramodsatya
Copy link
Collaborator Author

Hi @aditi-pandit, I came across this issue while running the tests from FunctionMetadataTest.cpp in presto native function metadata PR. I observed this after making changes to address this review comment, to replace schema from default to whatever is configured in the presto namespace.

I ran the existing e2e tests for ApproxDistinctAggregate with this change and did not see any failures. I am not sure how it could be reproduced with a presto e2e test, since the hyperloglog custom type is already registered without a prefix in PrestoServer.cpp (from registerHyperLogFunctions() -> registerHyperLogLogType()). The check which is causing the failure, hasType(HYPERLOGLOG) -> customTypeExists(HYPERLOGLOG), would not fail in presto native and it seems like a bug in velox FunctionSignature. Could you please share how to proceed?

@aditi-pandit
Copy link
Collaborator

Hi @aditi-pandit, I came across this issue while running the tests from FunctionMetadataTest.cpp in presto native function metadata PR. I observed this after making changes to address this review comment, to replace schema from default to whatever is configured in the presto namespace.

I ran the existing e2e tests for ApproxDistinctAggregate with this change and did not see any failures. I am not sure how it could be reproduced with a presto e2e test, since the hyperloglog custom type is already registered without a prefix in PrestoServer.cpp (from registerHyperLogFunctions() -> registerHyperLogLogType()). The check which is causing the failure, hasType(HYPERLOGLOG) -> customTypeExists(HYPERLOGLOG), would not fail in presto native and it seems like a bug in velox FunctionSignature. Could you please share how to proceed?

@pramodsatya : It might be better to say that this code is changed to make it consistent with https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/types/HyperLogLogType.cpp#L20

Actually its even better if you use the registerHyperLogLogType() function itself instead of the registerCustomType call.

@pramodsatya
Copy link
Collaborator Author

Thanks for the suggestion @aditi-pandit, it is better to use registerHyperLogLogType here and updated the PR.

@czentgr czentgr marked this pull request as ready for review August 6, 2024 20:56
@aditi-pandit aditi-pandit changed the title Remove prefix from HyperLogLog type registration Make HyperLogLog type registration consistent across different functions Aug 7, 2024
@aditi-pandit
Copy link
Collaborator

@pramodsatya : We should follow up with a PR adding e2e presto-native tests. Add a single function to test SQL with all the different functions using HLL to ensure the correctness of this fix.

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya

@mbasmanova mbasmanova changed the title Make HyperLogLog type registration consistent across different functions Fix HyperLogLog type registration in registerApproxDistinctAggregates Aug 7, 2024
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pramodsatya Nice catch. The change looks good. Some comments re: test.

I assume this issue shows up only when we register aggregate, but not scalar functions. I see that scalar functions register 'hyperloglog' type correctly.

velox/exec/tests/FunctionSignatureBuilderTest.cpp Outdated Show resolved Hide resolved
class SignaturesWithCustomType : public ::testing::Test {};

TEST_F(SignaturesWithCustomType, customType) {
aggregate::prestosql::registerAllAggregateFunctions("presto.default.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use registerApproxDistinctAggregates

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not present in any header and looks like only registerAllAggregateFunctions is exposed for tests. Is it fine if registerAllAggregateFunctions is retained?

velox/exec/tests/FunctionSignatureBuilderTest.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@pramodsatya pramodsatya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbasmanova, yes this issue shows up when we register aggregate functions with a prefix and do not register hyperloglog as a custom type (using registerHyperLogLogType).

class SignaturesWithCustomType : public ::testing::Test {};

TEST_F(SignaturesWithCustomType, customType) {
aggregate::prestosql::registerAllAggregateFunctions("presto.default.");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not present in any header and looks like only registerAllAggregateFunctions is exposed for tests. Is it fine if registerAllAggregateFunctions is retained?

@@ -432,5 +433,14 @@ TEST_F(ApproxDistinctTest, toIntermediate) {
digests, {"c0"}, {"merge(a0)"}, {"c0", "cardinality(a0)"}, {input});
}

TEST(ApproxDistinctSignatureTest, hyperLogLog) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned earlier, test name should match file name. Please, change to TEST_F(ApproxDistinctTest, ...

The test name 'hyperLogLog' is not very descriptive. Consider, typeRegistration.

It could be that changing to ApproxDistinctTest will end up registering scalar functions in the SetUp, in which case, this test won't reproduce the problem. I'm thinking that, perhaps, we don't need a test for this fix at all. Seems more trouble then worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes actually I had changed it because when I used the file name as the test name I got this error when running the test:

velox/cmake-build-debug/_deps/gtest-src/googletest/src/gtest.cc:2520: Failure
Failed
All tests in the same test suite must use the same test fixture
class, so mixing TEST_F and TEST in the same test suite is
illegal.  In test suite ApproxDistinctTest,
test groupByIntegers is defined using TEST_F but
test hyperLogLog is defined using TEST.  You probably
want to change the TEST to TEST_F or move it to another test
case.

Apologies for not mentioning this earlier in the comment.

Yes, I agree a test doesn't seem suitable for this fix, reverted the test.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pramodsatya Thank you for the fix.

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Aug 7, 2024
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in f340734.

Copy link

Conbench analyzed the 1 benchmark run on commit f3407347.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants