aggfuncs: implement bit-or with new aggregation framework #6975

Merged: 9 commits merged on Jul 5, 2018
12 changes: 7 additions & 5 deletions executor/aggfuncs/aggfuncs.go
@@ -32,11 +32,13 @@ var (
 	_ AggFunc = (*avgOriginal4Float64)(nil)
 	_ AggFunc = (*avgPartial4Float64)(nil)
 
-	// All the AggFunc implementations for "FIRSTROW" are listed here.
-	// All the AggFunc implementations for "MAX" are listed here.
-	// All the AggFunc implementations for "MIN" are listed here.
-	// All the AggFunc implementations for "GROUP_CONCAT" are listed here.
-	// All the AggFunc implementations for "BIT_OR" are listed here.
+	// All the AggFunc implementations for "FIRSTROW" are listed here.
+	// All the AggFunc implementations for "MAX" are listed here.
+	// All the AggFunc implementations for "MIN" are listed here.
+	// All the AggFunc implementations for "GROUP_CONCAT" are listed here.
+	// All the AggFunc implementations for "BIT_OR" are listed here.
+	_ AggFunc = (*bitOrUint64)(nil)
+
 	// All the AggFunc implementations for "BIT_XOR" are listed here.
 	// All the AggFunc implementations for "BIT_AND" are listed here.
 )
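The _ AggFunc = (*bitOrUint64)(nil) line is Go's compile-time interface check: assigning a typed nil pointer to the blank identifier makes the build fail if *bitOrUint64 ever stops satisfying AggFunc, and the value itself is discarded. A minimal standalone sketch, where the one-method AggFunc interface is a hypothetical cut-down stand-in for the real one:

```go
package main

import "fmt"

// AggFunc is a hypothetical, cut-down interface for illustration only.
type AggFunc interface {
	AllocPartialResult() uint64
}

type bitOrUint64 struct{}

func (e *bitOrUint64) AllocPartialResult() uint64 { return 0 }

// Compile-time check: this line fails to build if *bitOrUint64 does not
// implement AggFunc. The assigned value is discarded.
var _ AggFunc = (*bitOrUint64)(nil)

func main() {
	var fn AggFunc = &bitOrUint64{}
	fmt.Println(fn.AllocPartialResult()) // 0
}
```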
7 changes: 6 additions & 1 deletion executor/aggfuncs/builder.go
@@ -120,7 +120,12 @@ func buildGroupConcat(aggFuncDesc *aggregation.AggFuncDesc, ordinal int) AggFunc
 
 // buildCount builds the AggFunc implementation for function "BIT_OR".
 func buildBitOr(aggFuncDesc *aggregation.AggFuncDesc, ordinal int) AggFunc {
-	return nil
+	// BIT_OR doesn't need to handle the distinct property.
+	base := baseAggFunc{
+		args:    aggFuncDesc.Args,
+		ordinal: ordinal,
+	}
Member: This needs to handle functions that have the distinct property.

Contributor Author: Why does bit_or need to care about the distinct property?

Member: Consider the query "select bit_or(distinct a) from t": in this kind of query, only the distinct values of column a are aggregated.

Contributor Author: It doesn't matter, because bit_or(distinct a) = bit_or(a). The same holds for bit_and, but not for bit_xor.
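The author's argument rests on idempotence: x | x == x and x & x == x, so repeated values cannot change a BIT_OR or BIT_AND result, while x ^ x == 0 lets duplicates cancel out under BIT_XOR. A small sketch that folds a column with and without duplicates; bitOr, bitAnd, bitXor and distinct are hypothetical helpers, not part of TiDB:

```go
package main

import "fmt"

// bitOr folds vals with bitwise OR, starting from 0.
func bitOr(vals []uint64) uint64 {
	var r uint64
	for _, v := range vals {
		r |= v
	}
	return r
}

// bitAnd folds vals with bitwise AND, starting from all ones.
func bitAnd(vals []uint64) uint64 {
	r := ^uint64(0)
	for _, v := range vals {
		r &= v
	}
	return r
}

// bitXor folds vals with bitwise XOR, starting from 0.
func bitXor(vals []uint64) uint64 {
	var r uint64
	for _, v := range vals {
		r ^= v
	}
	return r
}

// distinct drops repeated values, keeping the first occurrence of each.
func distinct(vals []uint64) []uint64 {
	seen := make(map[uint64]bool)
	var out []uint64
	for _, v := range vals {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a := []uint64{7, 6, 7, 14}
	d := distinct(a)
	fmt.Println(bitOr(a) == bitOr(d))   // true: x|x == x
	fmt.Println(bitAnd(a) == bitAnd(d)) // true: x&x == x
	fmt.Println(bitXor(a) == bitXor(d)) // false: x^x == 0
}
```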

Member: OK.

Member: Please add a comment here stating that BIT_OR doesn't need to consider the distinct property.

Contributor Author: Done.

+	return &bitOrUint64{baseBitAggFunc{base}}
 }
 
 // buildCount builds the AggFunc implementation for function "BIT_XOR".
60 changes: 60 additions & 0 deletions executor/aggfuncs/func_bit_or.go
@@ -0,0 +1,60 @@
// Copyright 2018 PingCAP, Inc.
Contributor: Rename the file to func_bitfuncs.go.

//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// See the License for the specific language governing permissions and
// limitations under the License.

package aggfuncs

import (
	"github.com/juju/errors"
	"github.com/pingcap/tidb/sessionctx"
	"github.com/pingcap/tidb/util/chunk"
)

type baseBitAggFunc struct {
	baseAggFunc
}

type bitOrUint64 struct {
	baseBitAggFunc
}

type partialResult4BitFunc = uint64

func (e *bitOrUint64) AllocPartialResult() PartialResult {
Contributor: This can be a member function of *baseBitAggFunc.
Contributor Author: Great suggestion.

Contributor Author: ...I take back what I just said. 😂
We should not store the result value as a member of baseBitAggFunc, because that would make baseBitAggFunc stateful.
Consider another scenario: if there are many groups to aggregate and the AggFunc is stateful, we have to create many AggFuncs to handle them.
If the AggFunc is stateless, we can create just one AggFunc and many partialResult4BitFunc values, which reduces Go GC pressure.
(@zz-jason explained this to me. Thanks very much!)

	return PartialResult(new(partialResult4BitFunc))
}
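The stateless design described in the thread above can be sketched as one shared AggFunc value plus one heap-allocated uint64 per group. This assumes PartialResult is an unsafe.Pointer, as the casts in this file suggest; update and result are hypothetical simplifications of UpdatePartialResult and AppendFinalResult2Chunk, and the map of groups stands in for the executor's grouping:

```go
package main

import (
	"fmt"
	"unsafe"
)

// PartialResult is assumed to be an unsafe.Pointer handle.
type PartialResult unsafe.Pointer

type partialResult4BitFunc = uint64

// bitOrUint64 holds no running value, so one instance can serve every group.
type bitOrUint64 struct{}

func (e *bitOrUint64) AllocPartialResult() PartialResult {
	return PartialResult(new(partialResult4BitFunc))
}

// update ORs v into the group's partial result (hypothetical helper).
func (e *bitOrUint64) update(pr PartialResult, v uint64) {
	p := (*partialResult4BitFunc)(pr)
	*p |= v
}

// result reads the group's partial result (hypothetical helper).
func (e *bitOrUint64) result(pr PartialResult) uint64 {
	return *(*partialResult4BitFunc)(pr)
}

func main() {
	agg := &bitOrUint64{} // a single AggFunc shared by all groups
	groups := map[string][]uint64{"x": {1, 2}, "y": {4, 4, 8}}
	partials := make(map[string]PartialResult, len(groups))
	for key, vals := range groups {
		pr := agg.AllocPartialResult() // one small partial result per group
		for _, v := range vals {
			agg.update(pr, v)
		}
		partials[key] = pr
	}
	fmt.Println(agg.result(partials["x"]), agg.result(partials["y"])) // 3 12
}
```

Only the uint64 values scale with the number of groups; the AggFunc itself is allocated once.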

func (e *bitOrUint64) ResetPartialResult(pr PartialResult) {
Contributor: Ditto.

	p := (*partialResult4BitFunc)(pr)
	*p = 0
}

func (e *bitOrUint64) AppendFinalResult2Chunk(sctx sessionctx.Context, pr PartialResult, chk *chunk.Chunk) error {
Contributor: Ditto.

	p := (*partialResult4BitFunc)(pr)
	chk.AppendUint64(e.ordinal, *p)
	return nil
}
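The reviewer's "ditto" suggestions amount to defining AllocPartialResult, ResetPartialResult and the final-result method once on *baseBitAggFunc and letting Go's struct embedding promote them to each bit function, so the concrete types stay stateless and differ only in how they update the partial result. A minimal sketch of that idea; bitXorUint64, update and finalResult are hypothetical names introduced for illustration, and PartialResult is assumed to be an unsafe.Pointer:

```go
package main

import (
	"fmt"
	"unsafe"
)

// PartialResult is assumed to be an unsafe.Pointer handle.
type PartialResult unsafe.Pointer

type partialResult4BitFunc = uint64

type baseBitAggFunc struct {
	ordinal int // output column index, as in baseAggFunc
}

// Defined once on *baseBitAggFunc and promoted to every embedding type.
func (e *baseBitAggFunc) AllocPartialResult() PartialResult {
	return PartialResult(new(partialResult4BitFunc))
}

func (e *baseBitAggFunc) ResetPartialResult(pr PartialResult) {
	*(*partialResult4BitFunc)(pr) = 0
}

// finalResult is a hypothetical simplification of AppendFinalResult2Chunk
// that returns the value instead of appending it to a chunk.
func (e *baseBitAggFunc) finalResult(pr PartialResult) uint64 {
	return *(*partialResult4BitFunc)(pr)
}

// Each concrete type embeds baseBitAggFunc and supplies only its update.
type bitOrUint64 struct{ baseBitAggFunc }
type bitXorUint64 struct{ baseBitAggFunc } // hypothetical, for illustration

func (e *bitOrUint64) update(pr PartialResult, v uint64)  { *(*partialResult4BitFunc)(pr) |= v }
func (e *bitXorUint64) update(pr PartialResult, v uint64) { *(*partialResult4BitFunc)(pr) ^= v }

func main() {
	or, xor := &bitOrUint64{}, &bitXorUint64{}
	pOr, pXor := or.AllocPartialResult(), xor.AllocPartialResult()
	for _, v := range []uint64{6, 3, 6} {
		or.update(pOr, v)
		xor.update(pXor, v)
	}
	fmt.Println(or.finalResult(pOr), xor.finalResult(pXor)) // 7 3
	or.ResetPartialResult(pOr)
	fmt.Println(or.finalResult(pOr)) // 0
}
```

A BIT_AND implementation could not reuse the zeroing ResetPartialResult as-is: the identity for AND is all ones, so its partial result would need to start at ^uint64(0).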

func (e *bitOrUint64) UpdatePartialResult(sctx sessionctx.Context, rowsInGroup []chunk.Row, pr PartialResult) error {
	p := (*partialResult4BitFunc)(pr)
	for _, row := range rowsInGroup {
		inputValue, isNull, err := e.args[0].EvalInt(sctx, row)
Contributor: Should we wrap a cast to uint in typeInfer4BitFuncs? Otherwise bit_or(varchar) may fail.

Contributor Author: bit_or(varchar) will return 0.

Member: Actually, we do need to add a cast. Consider this case:

drop table if exists t;
create table t(a decimal(10, 4));
insert into t values(12.2);
select bit_or(a) from (select * from t union all select * from t) tmp;
TiDB(localhost:4000) > desc select bit_or(a) from (select * from t union all select * from t) tmp;
+--------------------------+------+----------------------------------------------+----------+
| id                       | task | operator info                                | count    |
+--------------------------+------+----------------------------------------------+----------+
| StreamAgg_13             | root | funcs:bit_or(tmp.a)                          | 1.00     |
| └─Union_21               | root |                                              | 20000.00 |
|   ├─TableReader_24       | root | data:TableScan_23                            | 10000.00 |
|   │ └─TableScan_23       | cop  | table:t, range:[-inf,+inf], keep order:false | 10000.00 |
|   └─TableReader_27       | root | data:TableScan_26                            | 10000.00 |
|     └─TableScan_26       | cop  | table:t, range:[-inf,+inf], keep order:false | 10000.00 |
+--------------------------+------+----------------------------------------------+----------+
6 rows in set (0.00 sec)

StreamAgg_13 above processes the original data directly, rather than the partial result of another aggregate operator (which is guaranteed to be uint64). This PR may fail on this query if we don't wrap a cast around its parameter.
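The reviewer's point is that when a stream aggregate reads raw column values, the argument must be converted to an integer before the bitwise OR, whereas a partial result coming from another aggregate is already uint64. A sketch of that normalization step, assuming float64 as a stand-in for DECIMAL and round-to-nearest as the conversion rule (both assumptions; castDecimalToUint64 and bitOrDecimals are hypothetical names):

```go
package main

import (
	"fmt"
	"math"
)

// castDecimalToUint64 is a hypothetical stand-in for the cast the reviewer
// asks for. float64 stands in for DECIMAL, and rounding to the nearest
// integer is an assumed conversion rule.
func castDecimalToUint64(d float64) uint64 {
	return uint64(int64(math.Round(d)))
}

// bitOrDecimals normalizes each raw value to an integer first and only then
// applies the bitwise OR, as a stream aggregate over raw rows would need to.
func bitOrDecimals(vals []float64) uint64 {
	var acc uint64
	for _, v := range vals {
		acc |= castDecimalToUint64(v)
	}
	return acc
}

func main() {
	// Two copies of 12.2, as produced by the UNION ALL in the query above.
	fmt.Println(bitOrDecimals([]float64{12.2, 12.2})) // 12
}
```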

Contributor Author: Yes, I'll fix it.

		if err != nil {
			return errors.Trace(err)
		}
		if isNull {
			continue
		}
		*p |= uint64(inputValue)
	}
	return nil
}
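UpdatePartialResult skips NULL inputs and ORs every other value into the partial result after converting the int64 returned by EvalInt to uint64, which reinterprets negative values in two's complement. A standalone sketch of that loop, with a hypothetical nullableInt64 type standing in for the (value, isNull) pair:

```go
package main

import "fmt"

// nullableInt64 models one evaluated argument: Valid == false plays the
// role of isNull == true returned by EvalInt.
type nullableInt64 struct {
	Val   int64
	Valid bool
}

// updateBitOr mirrors the loop in UpdatePartialResult: NULL inputs are
// skipped and every other value is OR-ed into the partial result after
// being reinterpreted as uint64 (two's complement, so -1 sets all 64 bits).
func updateBitOr(acc *uint64, rows []nullableInt64) {
	for _, r := range rows {
		if !r.Valid {
			continue
		}
		*acc |= uint64(r.Val)
	}
}

func main() {
	var acc uint64
	updateBitOr(&acc, []nullableInt64{{1, true}, {0, false}, {4, true}})
	fmt.Println(acc) // 5

	var all uint64
	updateBitOr(&all, []nullableInt64{{-1, true}})
	fmt.Println(all == ^uint64(0)) // true
}
```

Because NULL rows are skipped and the partial result starts at 0, a group whose inputs are all NULL yields 0.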