Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Catalogs #12709

Open
9 of 21 tasks
dain opened this issue Jun 6, 2022 · 16 comments
Open
9 of 21 tasks

Dynamic Catalogs #12709

dain opened this issue Jun 6, 2022 · 16 comments
Assignees
Labels
roadmap Top level issues for major efforts in the project

Comments

@dain
Copy link
Member

dain commented Jun 6, 2022

This is an umbrella tracking issue for the Dynamic Catalogs project. As we progress on this project, PRs and sub-issues will be linked back to this issue.

Goal

Add support for adding and removing catalogs without restarting servers or interrupting existing queries. This will be dependent on the connector being updated to support concurrent versions with same name (e.g., no JMX names containing just a catalog name). For non-compliant connectors, support a basic model where queries are killed when dropping.

For catalog properties the current plan is to simply use the existing Map<String, String> of arbitrary properties instead of requiring pre-declaration of typed catalog properties like is done for session and table properties. This is mainly for expedience and backwards compatibility of the existing connector configuration system.

MVP Plan

Follow-up Items

@dain dain added the roadmap Top level issues for major efforts in the project label Jun 6, 2022
@dain dain self-assigned this Jun 6, 2022
@dain
Copy link
Member Author

dain commented Jun 6, 2022

I have a prototype if the basic stuff working, and an working on cleaning this up for submission.

@Unclecc
Copy link

Unclecc commented Oct 25, 2022

Hi, I use this PR to test DynamicCatalog, However, I ran into a problem when I deleted and then created a Hive Catalog. like reload, the first time is OK and can execute SQL, the second time will have the following error occurred when i execute SQL.
java.lang.ClassCastException: class io.trino.plugin.hive.HiveTableHandle cannot be cast to class io.trino.plugin.hive.HiveTableHandle (io.trino.plugin.hive.HiveTableHandle is in unnamed module of loader io.trino.server.PluginClassLoader @408b024e; io.trino.plugin.hive.HiveTableHandle is in unnamed module of loader io.trino.server.PluginClassLoader @aa6aca3)
at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:150)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49)
at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:62)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:308)
at io.trino.operator.Driver.processInternal(Driver.java:411)
at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
at io.trino.operator.Driver.tryWithLock(Driver.java:706)
at io.trino.operator.Driver.process(Driver.java:306)
at io.trino.operator.Driver.processForDuration(Driver.java:277)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:739)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
at io.trino.$gen.Trino_400_100_gc6901d5_dirty____20221025_085036_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)

so i take two days to read trino code, i found this commit #1820, i think @dain will face this problem in worker too, because it will generate a new classloader when you update hive catalog, but io.trino.plugin.hive.HiveTableHandle's instance loaded last time

@leeyh0216
Copy link
Member

leeyh0216 commented Nov 30, 2022

Hi, @dain , @Unclecc .

I am also interested in Dynamic Catalog, so I am trying to implement my own based on the PR.
The above issue that @Unclecc mentioned was reproduced in the same way, but I thought that the issue you mentioned is not the problem.

I solved the problem by slightly modifying AbstractTypedJacksonModule.InternalTypeDeserializer .
Since the AsPropertyTypeDeserializer object caches the class loader for the class parsed once, the second call process does not use the class loader changed during the Dynamic Catalog application process, but the existing class loader is used.
Since it was for simple test purposes, I changed the method to create an AsPropertyTypeDeserializer object every time as shown below, and confirmed that it works.

    private static class InternalTypeDeserializer<T>
            extends StdDeserializer<T>
    {
        private TypeIdResolver typeIdResolver;

        private Class<T> baseClass;
        public InternalTypeDeserializer(Class<T> baseClass, TypeIdResolver typeIdResolver)
        {
            super(baseClass);
            this.typeIdResolver = typeIdResolver;
            this.baseClass = baseClass;
        }

        @SuppressWarnings("unchecked")
        @Override
        public T deserialize(JsonParser jsonParser, DeserializationContext deserializationContext)
                throws IOException
        {
            TypeDeserializer typeDeserializer = new AsPropertyTypeDeserializer(
                    TypeFactory.defaultInstance().constructType(baseClass),
                    typeIdResolver,
                    TYPE_PROPERTY,
                    false,
                    null);
            return (T) typeDeserializer.deserializeTypedFromAny(jsonParser, deserializationContext);
        }
    }

@Unclecc , If you apply the above code and test it, I think it will work.
@dain , If you have the same problem, I hope the above solution helps.

I'm not good at English... Please understand if the context is strange.

@Gqyanxin
Copy link

Hi, I am also interested in Dynamic Catalog.When will this feature be available?

@bitsondatadev
Copy link
Member

Hey @Gqyanxin and anyone else that stumbles upon this issue. Keep in mind that this is an open-source project and some of this work has to happen during the developers' free time.

The best you can do is follow this issue and the related to get a sense of how close they are.

To get a sense of why this issue may take some time, see the discussion on this during the Trino Summit. If Dain has an update on when to expect this, he will provide them here.

@cdrappier
Copy link

Hi,

At ForePaaS, we are currently testing this new feature, which is proving to be very helpful for our use case. I have start some work on a JdbcCatalogStore inspirated by the FileCatalogStore. If this is something that is relevant to the main Trino open source project, I will create a merge request as soon as possible to discuss the subject with you.
Is that something referring to you follow-up items : SPI for catalog storage ?

My goal is to have a dynamic catalog, with persistant data, but without directly using disk.
Thanks,

@bitsondatadev
Copy link
Member

@dain ^^

@Smith-Cruise
Copy link

What's grammar now, I want to have a test for it.

@ockhamlabs
Copy link

@dain Any update on when the pending items for MVP will be done? We have been eagerly waiting for this to be finished for a usable dynamic catalog capability. Pl let us know

@hashhar
Copy link
Member

hashhar commented May 17, 2023

There is related work ongoing in #15921 - without that while you can have dynamic catalogs one of the most important connectors - Hive (and Iceberg) won't be able to be used with it.

@xuhaiL
Copy link

xuhaiL commented Aug 16, 2023

i need this feature,it can be used?

@dungdm93
Copy link
Member

dungdm93 commented Sep 1, 2023

It's useful to implementation k8s CRD based catalogs. And further more is Trino k8s operator.

@gmartini2019
Copy link

Hey, did you guys figure out the dynamic catalog? I am trying to make an API just for that, but I am missing the specific user permission that will allow me to do so.

@trinodb trinodb deleted a comment from Unclecc Mar 19, 2024
@trinodb trinodb deleted a comment from Unclecc Mar 19, 2024
@trinodb trinodb deleted a comment from Unclecc Mar 19, 2024
@trinodb trinodb deleted a comment from hackeryang Mar 19, 2024
@trinodb trinodb deleted a comment from sarlaccpit Mar 19, 2024
@trinodb trinodb deleted a comment from hackeryang Mar 19, 2024
@trinodb trinodb deleted a comment from Unclecc Mar 19, 2024
@hashhar
Copy link
Member

hashhar commented Jul 31, 2024

@dain A question, the CatalogVersion javadoc says:

/**
 * Version of a catalog.  The string maybe compared lexicographically using ASCII, and to determine which catalog version is newer.
 */

Right now none of the implementations actually adhere to this though and nor do we use CatalogVersion#compareTo anywhere.

e.g. in FileCatalogStore#computeCatalogVersion we break the contract by returning a length-prefixed hash of catalog name, connector name and properties. We also have a javadoc there which says:

/**
 * This is not a generic, universal, or stable version computation, and can and will change from version to version without warning.
 * For places that need a long term stable version, do not use this code.
 */

This is interesting because the CatalogStore interface always returns the "latest" version of the catalog (because concurrency is being limited in CatalogDynamicCatalogManager usign the catalogsUpdateLock which means the CatalogVersion is of no use for doing things like optimistic concurrency control in a database-backed catalog store for example.

This is confusing. Can you please clarify whether CatalogVersion is something which the CatalogStore implementation should care about? If not maybe we should move that computation to the CatalogManager.

Also lot of SPIs around pluggable catalog stores are marked @Experimental at the moment. Do you think it's time to promote them to be stable?

@Unclecc

This comment was marked as off-topic.

@hashhar
Copy link
Member

hashhar commented Aug 2, 2024

In an offline discussion we realised that the CatalogVersion contract doesn't need to define lexicographic ordering because in current implementation the engine or catalog store never needs a sorting of catalog versions - only the latest catalog version and using the catalog version to disambiguate the catalog handles for the same connector with same catalog name.

I'll send a PR with the javadoc updates and some code cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
roadmap Top level issues for major efforts in the project
Development

No branches or pull requests