
feat(query): Support generate virtual columns #11590

Merged: 36 commits, Jun 1, 2023

b41sh (Member) commented on May 26, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support generating virtual columns

Adds the following SQL statements to create, alter, drop, and generate virtual columns:

create virtual columns (<expr>, ..) for <table>
alter virtual columns (<expr>, ..) for <table>
drop virtual columns for <table>
generate virtual columns for <table>

The intended workflow is as follows: a log analysis script finds which internal JSON fields are queried frequently, calls create virtual columns .. to create virtual columns for those fields, and then calls generate virtual columns .. to generate them. Once the virtual columns have been generated, reads of those internal JSON fields are served directly from the virtual columns.

For example:

create table test (id int, val json);
insert into test values(1, '{"a":33,"b":44}'),(2, '{"a":55,"b":66}');
create virtual columns (val['a'], val['b']) for test;
generate virtual columns for test;

select val['a'], val['b'] from test;
+----------+----------+
| val['a'] | val['b'] |
+----------+----------+
| 33       | 44       |
| 55       | 66       |
+----------+----------+
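To make the example above concrete, here is a rough Python model of what generating virtual columns buys: each requested JSON key is extracted once per row into its own column, so a later read of val['a'] scans a plain column instead of re-parsing every JSON document. This is a conceptual sketch only, not Databend's storage format or implementation; the function name generate_virtual_columns and the use of None for a missing key are assumptions for illustration.

```python
import json
from typing import Any


def generate_virtual_columns(rows: list[dict[str, Any]], json_col: str,
                             keys: list[str]) -> dict[str, list[Any]]:
    """Hypothetical model: parse each JSON document once and materialize
    the requested top-level keys as separate columns."""
    columns: dict[str, list[Any]] = {key: [] for key in keys}
    for row in rows:
        doc = json.loads(row[json_col])  # the only JSON parse per row
        for key in keys:
            # Assumption: a missing key becomes None, standing in for SQL NULL.
            columns[key].append(doc.get(key))
    return columns


# Same data as the `test` table in the example above.
rows = [
    {"id": 1, "val": '{"a":33,"b":44}'},
    {"id": 2, "val": '{"a":55,"b":66}'},
]
virtual = generate_virtual_columns(rows, "val", ["a", "b"])
# select val['a'], val['b'] from test  ->  read virtual["a"], virtual["b"]
print(virtual["a"], virtual["b"])  # [33, 55] [44, 66]
```

The trade-off being modelled: generation pays the JSON parsing cost once up front, and every later query on those fields avoids it.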

Part of #6994


mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) on May 26, 2023
b41sh marked this pull request as ready for review on May 29, 2023 08:00
b41sh requested a review from drmingdrmer as a code owner on May 29, 2023 08:00
RinChanNOWWW (Contributor) commented:

What does "Using a log analysis script to get which JSON internal fields are frequently queried" mean?

Review threads (outdated, resolved): src/meta/app/src/schema/virtual_column.rs (2), src/meta/api/src/schema_api_impl.rs (5)
b41sh (Member, Author) commented on May 30, 2023

> What does "Using a log analysis script to get which JSON internal fields are frequently queried" mean?

We want virtual columns to be generated automatically, without users having to create them manually. To do that, a Python script will analyze the query log, find the internal JSON fields that are queried frequently, and call the create virtual columns and generate virtual columns commands for them. This script can be implemented in a follow-up PR.
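The script described here was left for a later PR, so the following is only a hypothetical Python sketch of the idea: count col['key'] accesses in logged query text, keep the fields seen in at least min_count queries, and emit the create and generate statements from the PR summary. The regex, the function names, the min_count threshold, and the assumption that the log has already been filtered to queries on a single table are all illustrative, not part of Databend.

```python
import re
from collections import Counter

# Assumption: JSON fields are accessed as column['key'] with a single-quoted,
# single-level key, as in the val['a'] example from the PR summary.
FIELD_ACCESS = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\['([^']+)'\]")


def frequent_json_fields(queries: list[str], min_count: int) -> list[str]:
    """Return col['key'] expressions used by at least min_count queries."""
    counts: Counter[str] = Counter()
    for sql in queries:
        # A set, so each query counts a given field at most once.
        counts.update({f"{col}['{key}']" for col, key in FIELD_ACCESS.findall(sql)})
    frequent = [expr for expr, n in counts.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for a stable result.
    return sorted(frequent, key=lambda expr: (-counts[expr], expr))


def virtual_column_statements(table: str, exprs: list[str]) -> list[str]:
    """Build the SQL from the PR summary: create the columns, then generate them."""
    if not exprs:
        return []
    return [
        f"create virtual columns ({', '.join(exprs)}) for {table}",
        f"generate virtual columns for {table}",
    ]


# Hypothetical query log, already filtered to the `test` table.
query_log = [
    "select val['a'], val['b'] from test",
    "select val['a'] from test where val['a'] > 40",
    "select id, val['a'], val['b'] from test",
    "select val['c'] from test",
]
for stmt in virtual_column_statements("test", frequent_json_fields(query_log, min_count=2)):
    print(stmt)
# prints:
# create virtual columns (val['a'], val['b']) for test
# generate virtual columns for test
```

Counting per query rather than per occurrence keeps a single query that repeats a field (as in the where clause above) from inflating its score.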

Review threads (outdated, resolved): src/meta/api/src/schema_api_impl.rs (3)
b41sh requested a review from drmingdrmer on May 31, 2023 08:01
Review thread (outdated, resolved): src/meta/api/src/schema_api_impl.rs
b41sh merged commit 1abffb2 into main on Jun 1, 2023
b41sh deleted the virtual-columns_b41sh branch on June 1, 2023 09:27
BohuTANG (Member) commented:

cc @soyeric128 for documentation, thanks.

andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* feat(query): Support generate virtual columns

* fix

* fix

* fix taplo fmt

* add tests

* fix

* fix

* add test

* fix

* add tests

* add alter virtual column

* fix tests

* add virtual block folder

* fix

* fix

* fix meta api

* fix

* fix

* fix

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Labels: pr-feature (this PR introduces a new feature to the codebase)
5 participants