Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambiguate alter commands list #59532

Merged

Conversation

antaljanosbenjamin
Copy link
Member

@antaljanosbenjamin antaljanosbenjamin commented Feb 2, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Allow alter operations to be surrounded by parenthesis. The emission of parentheses can be controlled by the format_alter_operations_with_parentheses config. By default in formatted queries the parentheses are emitted as we store the formatted alter operations in some places as metadata (e.g.: mutations). The new syntax clarifies some of the queries where alter operations end in a list. E.g.: ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN c cannot be parsed properly with the old syntax. In the new syntax the query ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c) is obvious. Older versions are not able to read the new syntax, therefore using the new syntax might cause issues if newer and older version of ClickHouse are mixed in a single cluster.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-feature Pull request with new product feature label Feb 2, 2024
@robot-clickhouse-ci-1
Copy link
Contributor

robot-clickhouse-ci-1 commented Feb 2, 2024

This is an automated comment for commit 425799b with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
A SyncThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
SQLTestThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
SQLancerFuzzing tests that detect logical bugs with SQLancer tool✅ success
SqllogicRun clickhouse on the sqllogic test set against sqlite and checks that all statements are passed✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success
Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests❌ failure
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc❌ failure

@antaljanosbenjamin antaljanosbenjamin changed the title [WIP] Disambiguate ttl element list and alter command list Disambiguate ttl element list and alter command list Feb 5, 2024
@antaljanosbenjamin antaljanosbenjamin marked this pull request as ready for review February 5, 2024 16:21
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added pr-improvement Pull request with some product improvements and removed pr-feature Pull request with new product feature labels Feb 5, 2024
@UnamedRus
Copy link
Contributor

Can it cover as well #17303?

@antaljanosbenjamin
Copy link
Member Author

@UnamedRus I will check that issue, but it seems like not a parsing, but a logical issue there.

@UnamedRus
Copy link
Contributor

Actually yeah, my memory fails me.
There was another issue with comma, but for TTL GROUP BY #17304 and it fixed.

@antaljanosbenjamin antaljanosbenjamin changed the title Disambiguate ttl element list and alter command list Disambiguate alter commands list Feb 7, 2024
@antaljanosbenjamin
Copy link
Member Author

The scope of the PR is changed: because the necessary part is only the alter commands and the TTL elements are harder to parse properly ((expr) can be either a TTL element with parens inside and with the expression expr, or just a TTL element without parens with the expression (expr), it won't try to fix that now.

@antaljanosbenjamin
Copy link
Member Author

it fixed.

Actually there is still an issue if you don't use the SET keyword, so e.g.:

CREATE TABLE test_table
(
    `key_a` UInt32,
    `key_b` UInt32,
    `ts` DateTime,
    `value` UInt32
)
ENGINE = MergeTree
PARTITION BY tuple()
ORDER BY (key_a, key_b)
TTL
    ts GROUP BY key_a, key_b,
    ts + toIntervalMonth(2)


Received exception from server (version 24.2.1):
Code: 450. DB::Exception: Received from localhost:9000. DB::Exception: TTL Expression GROUP BY key should be a prefix of primary key. (BAD_TTL_EXPRESSION)

Here ts + toIntervalMonth(2) is parsed as GROUP BY, which is bad. I will create an issue about this.

@antaljanosbenjamin
Copy link
Member Author

So about this issue, I got stuck with some build issues. With Sasha we talked about that I shouldn't format the queries with parenthesis by default, because that makes online upgrading dangerous: if an alter command serialized by a new version reaches a node with an older version, then the parsing will fail on that node probably causing similar issues like when a user now runs ALTER TABLE x MODIFY TTL expr GROUP BY expr queries, but for all alter queries. That sounds a bit scary to me. As a result I tried to introduce a server setting to control the behavior of the parser, but it turns out this is more tricky than I thought. The two main approach here:

  1. Access the server settings in ASTAlterQuery::formatImpl and print the parentheses if the flag is set.
  2. Introduce a new field in FormatSettings and set that flag from the places where we know we want to serialize alter commands. This could cover most of the places, most critically where we serialize the mutation commands. However as we serialize ASTs in many different ways, we should get the server setting and transfer the flag to FormatSettings and then serialize the AST. For me this is kind of a bad solution, because then we don't have a well centralized place to handle this and there might be inconsistencies between different places. Right now most of the things are handled by the formatAST and serializeAST functions. It would be nice to read the server settings there, because then we have a somewhat centralized place to deal with this, but if this would be possible, than we could also go with option1.

However there is an issue with option1: in order to access the server settings, we need to call Context::getGlobalContextInstance()->getServerSettings(), which is defined in src/Interpreters/Context.cpp which is part of the dmbs lib in our CMake. This means everytime we link clickhouse_parsers to a somewhere, we should also link dbms. For me this is too intrusive solution, .e.g.: we would need to link dbms to clickhouse_compression, because it links to clickhouse_parsers. That means, we have to link dbms to clickhouse_common_zookeeper_no_log and clickhouse_common_zookeeper, which doesn't seem like a good idea from the dependency point of view.

If we opted for do not have server settings, then we would have to wait until the oldest LTS version of clickhouse will contain the new parsers capable of parsing alter commands with parentheses to fix this bug, or we should introduce a change where the upgrade from an older version can break online clusters if they do alters during upgrades.

I don't feel any if the above choice is a good solution, so I am here to ask you, which one should we do:

  1. Have a server setting
    • access the server settings directly from ASTAlterQuery and link dbms to every target which links against clickhouse_compression
    • access the server settings only in places where we think it is necessary to fix the bug
  2. Do not have a server setting
    • first roll out the new parsers in the next CH version, then in the near future roll out a version which not just parses, but also formats the alter queries with parentheses
    • just roll out the new parsing and formatting and make a fat note about this in our release notes
  3. something else

I really don't see a clearly good solution here.

As a result I will introduce a global variable that will be updated with the value of the regarding server setting in order to bridge the gap between server settings and the parsers. It is meant to a temporary solution until we can enforce the serialization with the new format.

@UnamedRus
Copy link
Contributor

BTW, when change happened to D DDL worker, it was done via user level setting distributed_ddl_entry_format_version, with ~1 year delay

#35463

@tavplubix
Copy link
Member

BTW, when change happened to D DDL worker, it was done via user level setting distributed_ddl_entry_format_version, with ~1 year delay

#35463

This change is not anyhow related to DDLWorker. There's already a compatibility setting (format_alter_operations_with_parentheses, disabled by default) that controls the new behavior, so nothing to do with distributed_ddl_entry_format_version

@tavplubix tavplubix self-assigned this Feb 16, 2024
Comment on lines +225 to +227
// This function is only meant to be called during application startup
// For reasons see https://github.com/ClickHouse/ClickHouse/pull/59532
static void setFormatAlterCommandsWithParentheses(bool value) { format_alter_commands_with_parentheses = value; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call it in clickhouse-client and clickhouse-local as well? They print a formatted query:

localhost.localdomain :) SeLeCt NuLl

SELECT NULL

Query id: fdbaa7b8-75c0-4f58-b79c-46b9ae290e1e

┌─NULL─┐
│ ᴺᵁᴸᴸ │
└──────┘

1 row in set. Elapsed: 0.005 sec. 

I think it's fine to unconditionally set it to true for that purpose

Comment on lines +280 to +282
const auto alter_command = command_col.getDataAt(i).toString();
const auto with_round_bracket = alter_command.front() == '(';
ParserAlterCommand parser{with_round_bracket};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to always allow round brackets when parsing, despite the value of the setting? So the setting will be only used for formatting, and the parser will be able to always understand both versions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is by allowing parsing with parentheses by default is the transition can be smoother during migrating to newer versions. E.g. during live migration if there are one replica which produces alter commands with parens, then the older replicas will parse it properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe my previous answer wasn't the best. So I wanted to avoid allowing the "mix" of the old and new syntax. So in an alter command list each command must or mustn't have to be surrounded with parentheses. E.g.: ALTER TABLE x MODIFY TTL, DROP COLUM a is fine, ALTER TABLE x (MODIFY TTL), (DROP COLUMN a) also fine, but ALTER TABLE x (MODIFY TTL), DROP COLUMN a is not fine.

This means in some way ParserAlterCommandList has to be able to enforce it. This could be done in two ways in my opinion:

  1. The list parser figures this out and passes it to the individual command parser.
  2. The individual command parser decides whether parens should be used or not and because in ParserAlterCommandList::parseImpl only a single parser is used, it can carry over this information. It is doable, but it would introduce in my opnion a hidden relation between the first parsed command and every command after that. I think right now parsers doesn't have this kind of "memory" in general. Thus, I opted for option 1.

That's why here determining with_round_bracket is necessary. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in an alter command list each command must or mustn't have to be surrounded with parentheses.

Yes, I agree.

This means in some way ParserAlterCommandList has to be able to enforce it. ... That's why here determining with_round_bracket is necessary.

Sorry, I missed this when reading the code, it was unobvious.
Yes, the first option is better because in the worst case, someone will pass true or false unconditionally instead of proper with_round_bracket flag (so only one version of the syntax will work in this place). And with the second option, someone may use one ParserAlterCommand for multiple queries, and it will cause an obscure bug

@@ -0,0 +1,3 @@
<clickhouse>
<format_alter_operations_with_parentheses>1</format_alter_operations_with_parentheses>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can enable it in functional tests by adding this config to tests/config/install.sh
Btw, functional tests are much better than integration tests: #39359 (comment)
(but it's not necessary to rewrite existing tests)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this, and if we would change this in functional tests, then we would change this for all of the functional tests. I think it shouldn't cause an issue, but we would test something that is not the "default" behavior of clickhouse.

Furthermore I used the same queries in functional tests, but expecting the old behavior. So I would say the only question is which way should be the default in functional tests? The other one has to be tested in integration test either way, because we have to change the server setting.

@antaljanosbenjamin
Copy link
Member Author

antaljanosbenjamin commented Feb 21, 2024

Test failures:

2024.02.21 02:22:36.637567 [ 33028 ] {} <Trace> DatabaseAtomic (test_15): Loading table test_15.test_param_view2
2024.02.21 02:22:36.637759 [ 33028 ] {} <Fatal> : Logical error: 'Invalid storage definition in metadata file: it's a bug or result of manual intervention in metadata files'.

The table test_param_view2 is created in tests/queries/0_stateless/02818_parameterized_view_with_cte_multiple_usage.sql:

create view test_param_view2 as
with {param_test_val:UInt8} as param_test_val
select param_test_val,
       arrayCount((a)->(a < param_test_val), t.arr) as cnt1,
       arrayCount((a)->(a < param_test_val+1), t.arr) as cnt2
from (select [1,2,3,4,5] as arr) t;

I don't think it has anything to do with my changes, so I will merge, but because our CI was a bit messy lately, I will merge a current master to see the actual state of tests. Maybe some unreported issues are present.

@antaljanosbenjamin
Copy link
Member Author

@antaljanosbenjamin antaljanosbenjamin merged commit 7cad005 into master Feb 22, 2024
255 of 268 checks passed
@antaljanosbenjamin antaljanosbenjamin deleted the disambiguate-ttl-element-list-and-alter-command-list branch February 22, 2024 18:17
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants