Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: catalog #1454

Merged
merged 15 commits into from
Apr 26, 2023
Merged

Conversation

v0y4g3r
Copy link
Contributor

@v0y4g3r v0y4g3r commented Apr 24, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

An epic refactor that completely removes the vicious CatalogList and corresponding adaptors from GreptimeDB.

CatalogList is used for query engine to get table entity by the triplet: catalog name/schema name/table name. But as we've discussed, all these traits are sync, which means we cannot rely on any async operation to retrieve the table entities from remote catalog implementation, for example, metasrv in distributed mode.

As an workaround, we've managed to bridge sync traits with async implementation, which works but brings too much overhead in that each catalog operation requires a new thread.

In this PR, instead fetching table entity from underlying storage during planning, we resolves tables from statements in advance and put these tables in a memory based catalog list.

To make "resolving tables in advance" possibile, we examines all the code paths where a catalog list is required.

  • when planning SQL statements, we can get all the required tables from statements and
    async fn plan_sql(&self, stmt: Statement, query_ctx: QueryContextRef) -> Result<LogicalPlan> {
    let df_stmt = (&stmt).try_into().context(SqlSnafu)?;
    let context_provider = DfContextProviderAdapter::try_new(
    self.engine_state.clone(),
    self.session_state.clone(),
    &df_stmt,
    query_ctx,
    )
    .await?;
  • when converting substrait encoded plans to datafusion's logical plans, we can get current catalog name and schema name from gRPC headers and build a dummy schema provider which resolves table in that catalog/schema asynchronously.
    • This depends on an invariant that catalog/schema name in gRPC header contains all the needed catalog/database names in this substrait plan, which we've checked with @fengjiachun

remember, datafusion's SchemaProvider::table is already async

pub(crate) async fn execute_logical(

As result, this PR is able to remove the implementation of trait datafusion::catalog::catalog::CatalogList/datafusion::catalog::catalog::CatalogProvider since we have our own async version alternatives.

This PR also removes the catalog/schema adaptors.

Also, please note that, since multiple table engines is supported now, table regional values need also a engine_name field to locate a table in RemoteSchemaProvider. This does not breaks compatibility since when this field is not present we use the default MITO_ENGINE as table engine name when getting tables from SchemaProvider.

async fn table(&self, name: &str) -> Result<Option<TableRef>> {
let key = self.build_regional_table_key(name).to_string();
let table_opt = self
.backend
.get(key.as_bytes())
.await?
.map(|Kv(_, v)| {
let TableRegionalValue { engine_name, .. } =
TableRegionalValue::parse(String::from_utf8_lossy(&v))
.context(InvalidCatalogValueSnafu)?;
let reference = TableReference {
catalog: &self.catalog_name,
schema: &self.schema_name,
table: name,
};
let engine_name = engine_name.as_deref().unwrap_or(MITO_ENGINE);
let engine = self
.engine_manager
.engine(engine_name)
.context(TableEngineNotFoundSnafu { engine_name })?;
let table = engine
.get_table(&EngineContext {}, &reference)
.with_context(|_| OpenTableSnafu {
table_info: reference.to_string(),
})?;
Ok(table)
})
.transpose()?
.flatten();
Ok(table_opt)
}

This PR also supports renaming tables in catalog, however, tables routes are not renamed in metasrv, which will leads to renamed_table does not exist.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

closes #1335
#816
#723

…catalog

# Conflicts:
#	src/common/substrait/src/df_substrait.rs
#	src/datanode/src/sql/alter.rs
#	src/datanode/src/sql/create.rs
#	src/table-procedure/src/alter.rs
#	tests/cases/distributed/alter/rename_table.result
@v0y4g3r v0y4g3r changed the title refactor: remove catalog refactor: catalog Apr 25, 2023
@v0y4g3r v0y4g3r marked this pull request as ready for review April 25, 2023 16:05
@codecov
Copy link

codecov bot commented Apr 25, 2023

Codecov Report

Merging #1454 (bbb15b0) into develop (1a245f3) will decrease coverage by 0.71%.
The diff coverage is 72.69%.

❗ Current head bbb15b0 differs from pull request most recent head 2420a6f. Consider uploading reports for the commit 2420a6f to get more accurate results

@@             Coverage Diff             @@
##           develop    #1454      +/-   ##
===========================================
- Coverage    86.19%   85.49%   -0.71%     
===========================================
  Files          547      546       -1     
  Lines        81948    81851      -97     
===========================================
- Hits         70639    69978     -661     
- Misses       11309    11873     +564     

src/catalog/src/lib.rs Outdated Show resolved Hide resolved
src/catalog/src/local/memory.rs Outdated Show resolved Hide resolved
src/common/substrait/src/df_logical.rs Show resolved Hide resolved
src/datanode/src/instance/grpc.rs Show resolved Hide resolved
src/meta-client/src/client/router.rs Show resolved Hide resolved
src/catalog/src/local/memory.rs Show resolved Hide resolved
src/catalog/src/datafusion/catalog_adapter.rs Outdated Show resolved Hide resolved
src/catalog/src/helper.rs Show resolved Hide resolved
tests/cases/distributed/alter/rename_table.sql Outdated Show resolved Hide resolved
src/servers/src/http/handler.rs Outdated Show resolved Hide resolved
src/meta-client/src/client/router.rs Show resolved Hide resolved
src/common/substrait/src/df_substrait.rs Show resolved Hide resolved
@waynexia waynexia enabled auto-merge (squash) April 26, 2023 08:28
@waynexia waynexia merged commit fb9978e into GreptimeTeam:develop Apr 26, 2023
paomian pushed a commit to paomian/greptimedb that referenced this pull request Oct 19, 2023
* wip

* add schema_async

* remove CatalogList

* remove catalog provider and schema provider

* fix

* fix: rename table

* fix: sqlness

* fix: ignore tonic error metadata

* fix: table engine name

* feat: rename catalog_async to catalog

* respect engine name in table regional value when deregistering tables

* fix: CR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce the impact of block methods on tokio runtime
4 participants