Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support internal function to_binary() for new charset #29863

Closed
Tracked by #26816
tangenta opened this issue Nov 17, 2021 · 3 comments · Fixed by #29736
Closed
Tracked by #26816

Support internal function to_binary() for new charset #29863

tangenta opened this issue Nov 17, 2021 · 3 comments · Fixed by #29736

Comments

@tangenta
Copy link
Contributor

tangenta commented Nov 17, 2021

The code changes about built-in functions in TiKV can be minimized.

After the investigation about several built-in functions, I found some common patterns:

  1. Create a new encoding.
  2. Convert the arguments into new charset from utf8.
  3. Apply these changes to the vectorized-version evaluation.

We can extract these patterns into another Sig named builtinToBinarySig(to_binary()) and wrap the cast function as follows:

The rewrite process before this proposal:

select some_builtin(arg1, arg2...)

is rewritten to

select some_builtin(cast_as_string(arg1), cast_as_string(arg2)...)

The rewrite process after this proposal:

select some_builtin(arg1, arg2...)

is rewritten to

select some_builtin(to_binary(cast_as_string(arg1)), to_binary(cast_as_string(arg2))...)

Then we can maintain a built-in function list to determine which function needs to be wrapped, instead of putting the boilerplates everywhere.

var toBinaryMap = map[string]struct{}{
	ast.Hex: {}, ast.Length: {}, ast.OctetLength: {}, ast.ASCII: {},
	ast.ToBase64: {},
}
@tangenta
Copy link
Contributor Author

tangenta commented Nov 17, 2021

@zimulala @xiongjiwei @ou-bing @Defined2014 @unconsolable @jackwener @chacha923 @hawkingrei @jayl-zxl FYI, please feel free to leave your suggestions here, if any.

@unconsolable
Copy link
Contributor

unconsolable commented Nov 17, 2021

Looks good since it can reduce code repetition, but I have a small question.
This code snippet comes from (*builtinInternalToBinarySig).vecEvalString

	for i := 0; i < n; i++ {
		var str string
		if buf.IsNull(i) {
			result.AppendNull()
			continue
		}
		str = buf.GetString(i)
		str, err = enc.EncodeString(str)

I wonder whether an encode buffer should be introduced here to reduce memory allocation.

@tangenta
Copy link
Contributor Author

tangenta commented Nov 17, 2021

@unconsolable Good catch! If you are interested in it you can file a PR to fix it.

By the way, we are trying to collect all the functions that can be wrapped by to_binary() and add them to the list, you can also include some of them in your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants