
Setting dataset_table_separator to _ throws an error. #1998

Open
ahnazary opened this issue Oct 28, 2024 · 0 comments
ahnazary commented Oct 28, 2024

dlt version

1.2.0

Describe the problem

I am trying to build a pipeline that moves data from Postgres to ClickHouse. Setting dataset_table_separator to a single underscore (_) throws an exception. I am setting dataset_table_separator like this:

dlt.secrets["destination.clickhouse.dataset_table_separator"] = "_"
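
For reference, the same value can presumably also be supplied through dlt's environment-variable configuration convention (uppercase keys, sections joined by double underscores); the variable name below is my assumption based on that convention, not something I have verified to change the behaviour:

import os

# Assumed equivalent of the dlt.secrets assignment above, following dlt's
# environment-variable naming convention. Must be set before pipeline.run()
# so config resolution picks it up.
os.environ["DESTINATION__CLICKHOUSE__DATASET_TABLE_SEPARATOR"] = "_"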

Here is the full error message:

dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage load when processing package 1729763674.409475 with exception:

<class 'dlt.destinations.exceptions.DatabaseTerminalException'>
Code: 57.
DB::Exception: Table dev.dlt_dlt_sentinel_table already exists. Stack trace:

0. Poco::Exception::Exception(String const&, int)
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool)
2. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&)
3. DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery&, DB::InterpreterCreateQuery::TableProperties const&, std::unique_ptr<DB::DDLGuard, std::default_delete<DB::DDLGuard>>&)
4. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&)
5. DB::InterpreterCreateQuery::execute()
6. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
7. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptr<DB::Context>, std::function<void (DB::QueryResultDetails const&)>, DB::QueryFlags, std::optional<DB::FormatSettings> const&, std::function<void (DB::IOutputFormat&)>)
8. DB::DDLWorker::tryExecuteQuery(DB::DDLTaskBase&, std::shared_ptr<zkutil::ZooKeeper> const&)
9. DB::DDLWorker::processTask(DB::DDLTaskBase&, std::shared_ptr<zkutil::ZooKeeper> const&)
10. DB::DatabaseReplicatedDDLWorker::tryEnqueueAndExecuteEntry(DB::DDLLogEntry&, std::shared_ptr<DB::Context const>)
11. DB::DatabaseReplicated::tryEnqueueReplicatedDDL(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const>, DB::QueryFlags)
12. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&)
13. DB::InterpreterCreateQuery::execute()
14. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
15. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
16. DB::TCPHandler::runImpl()
17. DB::TCPHandler::run()
18. Poco::Net::TCPServerConnection::start()
19. Poco::Net::TCPServerDispatcher::run()
20. Poco::PooledThread::run()
21. Poco::ThreadImpl::runnableEntry(void*)
22. ?
23. ?

From what I understand, dlt tries to create the dlt_dlt_sentinel_table table twice during pipeline.run() (see the sketch below for how that name is presumably composed). Using different combinations of values for dataset_sentinel_table_name and dataset_name did not work either.
I am wondering whether this is intended behaviour or not.
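
As an illustration only (not dlt's actual implementation), this is how I assume the fully qualified sentinel table name ends up matching the one ClickHouse reports as already existing, since ClickHouse has no separate dataset namespace and dlt prefixes table names with the dataset name and the separator:

# Rough sketch of the presumed name composition, using the values from my configuration.
dataset_name = "dlt"
dataset_table_separator = "_"
dataset_sentinel_table_name = "dlt_sentinel_table"

qualified_sentinel = f"{dataset_name}{dataset_table_separator}{dataset_sentinel_table_name}"
print(qualified_sentinel)  # -> dlt_dlt_sentinel_table, the table named in the error above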

Setting the dataset_table_separator in the secrets.toml file results in the same error too.

Expected behavior

The expected behaviour is a running pipeline that moves data from Postgres into a ClickHouse table, with _ as the separator character between dataset_name and table_name.

Steps to reproduce

Here is a code snippet to recreate the issue:

import dlt
from dlt.sources.sql_database import sql_table

# ClickHouse destination settings; the separator is the value that triggers the error
dlt.secrets["destination.clickhouse.dataset_table_separator"] = "_"
dlt.secrets["destination.clickhouse.table_engine_type"] = "merge_tree"
dlt.secrets["destination.clickhouse.dataset_sentinel_table_name"] = "dlt_sentinel_table"

pipeline = dlt.pipeline(
    pipeline_name="dummy_pipeline_name",
    destination="clickhouse",
    dataset_name="dlt",
)

# Postgres credentials for the sql_table source are read from secrets/config
table = sql_table(table="dummy_table_name")

info = pipeline.run(table)

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

Postgres

dlt destination

Clickhouse

Other deployment details

No response

Additional information

No response
