changefeedccl: remove limitations for parquet format #103129
Labels
A-cdc
Change Data Capture
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-cdc
Comments
jayshrivastava
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-cdc
Change Data Capture
T-cdc
labels
May 11, 2023
cc @cockroachdb/cdc
jayshrivastava
changed the title
changefeedccl: remove limitations for parquet
changefeedccl: remove limitations for parquet format
May 11, 2023
This was referenced May 15, 2023
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 5, 2023
Previously, `format=parquet` and `resolved` could not be used together when running changefeeds. This change adds support for this. The release note is being left intentionally blank for a future commit. Informs: cockroachdb#103129 Release note: None
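As a rough illustration of what combining these two options looks like from the client side, the sketch below renders a `CREATE CHANGEFEED` statement that carries both `format=parquet` and `resolved`. The helper name `build_changefeed_sql`, the table name `foo`, and the `nodelocal://1/demo` sink URI are placeholders invented for this example and are not part of the commit. It only builds a string; it does not talk to a cluster and does not escape its inputs.

```python
def build_changefeed_sql(table, sink_uri, options):
    """Render a CREATE CHANGEFEED statement from a dict of options.

    Hypothetical helper for illustration only. A value of None renders a
    bare flag (for example `resolved`); identifier-like values are left
    unquoted (format=parquet); anything else is single-quoted.
    Inputs are NOT escaped, so do not feed it untrusted strings.
    """
    rendered = []
    for name, value in options.items():
        if value is None:
            rendered.append(name)
        elif value.isidentifier():
            rendered.append(f"{name}={value}")
        else:
            rendered.append(f"{name}='{value}'")
    sql = f"CREATE CHANGEFEED FOR {table} INTO '{sink_uri}'"
    if rendered:
        sql += " WITH " + ", ".join(rendered)
    return sql


if __name__ == "__main__":
    # Placeholder table and sink URI; the interval on `resolved` is optional.
    print(build_changefeed_sql(
        "foo",
        "nodelocal://1/demo",
        {"format": "parquet", "resolved": "10s"},
    ))
```

Passing `{"format": "parquet", "resolved": None}` instead renders `resolved` as a bare flag with no interval.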
craig bot
pushed a commit
that referenced
this issue
Jun 5, 2023
101790: ui: update changefeed metrics page r=samiskin a=samiskin

Resolves #98085 Resolves #98088 Resolves #99409 Resolves #100640 Resolves #97931 Resolves #98086

This PR makes the following changes:
- Added a `scale` parameter to `Metrics` so that I could support a duration metric that's being emitted in Seconds rather than Nanoseconds. Would like frontend feedback on whether this is ok.
- Added support for minutes and hours in Duration graphs
- There is now a "Changefeed Status" graph to show counts of Running/Paused/Failed
- There is now a "Commit Latency" graph to show P50, P90, and P99 commit latencies
- Sink Byte Traffic is now Emitted Bytes
- Sink Timings has been removed because I don't believe either of the metrics exist anymore
- Max Changefeed Latency is now Max Checkpoint Latency
- There is now a Backfill Pending Ranges graph
- There is now a Protected Timestamp Max Age graph
- There is now a Schema Registry Registrations graph

Release note (ui change): The metrics page for changefeeds has been updated with new graphs to track backfill progress, protected timestamps age, and number of schema registry registrations.

104239: cloud: limit object reads for pebble for S3 and GCS r=RaduBerinde a=RaduBerinde

#### cloud: consolidate ReadFile APIs
This change consolidates the `ReadFile` and `ReadFileAt` APIs in `cloud.ExternalStorage`. We use a `ReadOptions` struct to optionally specify the offset or that we don't care about the size. This will allow us to add more options without large code changes.
Epic: none
Release note: None

#### cloud: add read LimitHint, implement for S3 and GCS
Epic: none
Release note: None

#### storage: set LimitHint for pebble object reads
Epic: none
Release note: None

104283: changefeedccl: support the resolved option with format=parquet r=miretskiy a=jayshrivastava

Previously, `format=parquet` and `resolved` could not be used together when running changefeeds. This change adds support for this. The release note is being left intentionally blank for a future commit.
Informs: #103129
Release note: None

104286: keyvisualizer: return error if delete query fails r=zachlite a=zachlite

This commit returns the error produced from `DeleteSamplesBeforeTime`, if any. Before this change, an error would cause a panic, which is disruptive and unnecessary. The caller of this function returns errors produced to the job system, which will back off, and try again later. For more details, see [Resume](https://github.com/cockroachdb/cockroach/blob/afcd974a8ca96f9f89a3ccb2e2b75bd70830fbf6/pkg/keyvisualizer/keyvisjob/job.go#L38).
resolves #103968
Epic: none
Release note (bug fix): The keyvisualizer job no longer panics if an error is encountered while cleaning up stale samples. Instead, if the job encounters an error, the job will try again later.

104288: roachtest: harden tpchvec/perf r=yuzefovich a=yuzefovich

This commit improves the `tpchvec/perf` roachtest so that it's less likely to fail due to some flake in performance. In particular, this test has an assertion that if a query runtime in ON config is 1.5x slower than in OFF config, then some bundles are collected and the test is failed. However, we've seen quite a few times when those bundles don't explain the slowness (which is likely to be intermittent). To prevent these false positives, this commit improves the test to run the query that was marked as too slow one more time and only fail the test if it's significantly slower again in ON config vs OFF config.
Fixes: #101526.
Release note: None

Co-authored-by: Shiranka Miskin
Co-authored-by: Radu Berinde
Co-authored-by: Jayant Shrivastava
Co-authored-by: Zach Lite
Co-authored-by: Yahor Yuzefovich
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 8, 2023
Previously, the option `key_in_value` was disallowed with `format=parquet`. This change allows these settings to be used together. Note that `key_in_value` is enabled by default with `cloudstorage` sinks and `format=parquet` is only allowed with cloudstorage sinks, so `key_in_value` is enabled for parquet by default. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
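The defaulting described here can be modelled in a few lines. The following is a toy model and not the CockroachDB implementation: the function name `effective_options` and the plain-string `sink_type` argument are assumptions made for this sketch. It encodes only the two rules stated in the commit message, namely that `format=parquet` requires a `cloudstorage` sink and that `cloudstorage` sinks turn `key_in_value` on by default.

```python
def effective_options(sink_type, options):
    """Return the options a changefeed would effectively run with.

    Toy model of the rules in the commit message (hypothetical helper):
      * format=parquet is only allowed with cloudstorage sinks
      * cloudstorage sinks enable key_in_value by default
    Options map a name to a value, or to None for a bare flag.
    """
    effective = dict(options)  # copy so the caller's dict is not mutated
    if effective.get("format") == "parquet" and sink_type != "cloudstorage":
        raise ValueError("format=parquet is only allowed with cloudstorage sinks")
    if sink_type == "cloudstorage":
        effective.setdefault("key_in_value", None)
    return effective


if __name__ == "__main__":
    # Parquet on a cloudstorage sink picks up key_in_value automatically.
    print(effective_options("cloudstorage", {"format": "parquet"}))
```

Because parquet can only reach the second rule through a `cloudstorage` sink, every parquet changefeed in this model ends up with `key_in_value` set, which matches the commit's conclusion.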
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 8, 2023
This change adds support for the `diff` changefeed option when using `format=parquet`. Enabling `diff` also adds support for CDC Transformations with parquet. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 8, 2023
This change adds support for the `end_time` changefeed option when using `format=parquet`. No significant code changes were needed to enable this feature. Closes: cockroachdb#103129 Closes: cockroachdb#99028 Epic: CRDB-27372 Release note (enterprise change): Changefeeds now officially support the parquet format at specification version 2.6. It is only usable with the cloudstorage sink. The syntax to use parquet is like the following: ``CREATE CHANGEFEED FOR foo INTO `...` WITH format=parquet``. It supports all standard changefeed options and features including CDC transformations, except it does not support the `topic_in_value` option.
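The one remaining restriction named in the release note, `topic_in_value`, lends itself to a small pre-flight check on the client side. The sketch below is an assumption-laden illustration: `UNSUPPORTED_WITH_PARQUET` and `unsupported_parquet_options` are hypothetical names, and the `end_time` value is a made-up placeholder. The server performs its own validation; this only mirrors the sentence in the release note.

```python
# Options the release note says are not supported with format=parquet.
UNSUPPORTED_WITH_PARQUET = frozenset({"topic_in_value"})


def unsupported_parquet_options(options):
    """List the requested options that the release note excludes for parquet.

    Hypothetical client-side check. Returns an empty list when the
    changefeed does not use format=parquet or when every option is allowed.
    """
    if options.get("format") != "parquet":
        return []
    return sorted(name for name in options if name in UNSUPPORTED_WITH_PARQUET)


if __name__ == "__main__":
    requested = {
        "format": "parquet",
        "diff": None,
        "end_time": "<timestamp>",  # placeholder value, not a real timestamp
        "topic_in_value": None,
    }
    print(unsupported_parquet_options(requested))
```

Keeping the excluded options in a single set means the check only needs a one-line change if the restriction is lifted later.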
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 8, 2023
This change forces all tests, including tests for `diff` and `end_time`, to run with the `cloudstorage` sink and `format=parquet` where possible. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 13, 2023
This change adds support for the `diff` changefeed option when using `format=parquet`. Enabling `diff` also adds support for CDC Transformations with parquet. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 13, 2023
This change forces all tests, including tests for `diff` and `end_time`, to run with the `cloudstorage` sink and `format=parquet` where possible. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
craig bot
pushed a commit
that referenced
this issue
Jun 15, 2023
104528: changefeedccl: add full support for the parquet format r=miretskiy a=jayshrivastava

### changefeedccl: support key_in_value with parquet format
Previously, the option `key_in_value` was disallowed with `format=parquet`. This change allows these settings to be used together. Note that `key_in_value` is enabled by default with `cloudstorage` sinks and `format=parquet` is only allowed with cloudstorage sinks, so `key_in_value` is enabled for parquet by default.
Informs: #103129
Informs: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note: None

---

### changefeedccl: add test coverage for parquet event types
When using `format=parquet`, an additional column is produced to indicate the type of operation corresponding to the row: create, update, or delete. This change adds coverage for this in unit testing. Additionally, the test modified in this change is simplified by reducing the number of rows and different types, because this complexity is unnecessary: all types are already tested within the util/parquet package.
Informs: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note: None
Epic: None

---

### util/parquet: support tuple labels in util/parquet testutils
Previously, the test utilities in `util/parquet` would not reconstruct tuples read from files with their labels. This change updates the package to do so. This is required for testing in users of this package such as CDC.
Informs: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note: None

---

### changefeedccl: support diff option with parquet format
This change adds support for the `diff` changefeed option when using `format=parquet`. Enabling `diff` also adds support for CDC Transformations with parquet.
Informs: #103129
Informs: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note: None

---

### changefeedccl: support end_time option with parquet format
This change adds support for the `end_time` changefeed option when using `format=parquet`. No significant code changes were needed to enable this feature.
Closes: #103129
Closes: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note (enterprise change): Changefeeds now officially support the parquet format at specification version 2.6. It is only usable with the cloudstorage sink. The syntax to use parquet is like the following: ``CREATE CHANGEFEED FOR foo INTO `...` WITH format=parquet``. It supports all standard changefeed options and features including CDC transformations, except it does not support the `topic_in_value` option.

---

### changefeedccl: use parquet with 50% probability in nemeses test
Informs: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note: None

---

### do not merge: force parquet cloud storage tests
This change forces all tests, including tests for `diff` and `end_time`, to run with the `cloudstorage` sink and `format=parquet` where possible.
Informs: #103129
Informs: #99028
Epic: [CRDB-27372](https://cockroachlabs.atlassian.net/browse/CRDB-27372)
Release note: None

Co-authored-by: Jayant Shrivastava
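The commit titled "use parquet with 50% probability in nemeses test" gives no detail beyond its title, so the following is only a generic sketch of how a randomized test might pick a format with a fixed probability. The function name `pick_format`, the use of Python rather than the project's Go test code, and the fallback format `json` are all assumptions for illustration.

```python
import random


def pick_format(rng, parquet_probability=0.5):
    """Pick a changefeed format for one randomized test run.

    Hypothetical helper: returns "parquet" with the given probability and
    "json" otherwise (the fallback format is an assumption of this sketch).
    Taking the RNG as a parameter keeps a failing run reproducible from
    its seed.
    """
    return "parquet" if rng.random() < parquet_probability else "json"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sequence can be replayed
    picks = [pick_format(rng) for _ in range(10)]
    print(picks)
```

Seeding the generator explicitly, instead of using the module-level `random` functions, is what lets a test failure be replayed with the same sequence of format choices.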
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 21, 2023
Previously, `format=parquet` and `resolved` could not be used together when running changefeeds. This change adds support for this. The release note is being left intentionally blank for a future commit. Informs: cockroachdb#103129 Release note: None
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 21, 2023
Previously, the option `key_in_value` was disallowed with `format=parquet`. This change allows these settings to be used together. Note that `key_in_value` is enabled by default with `cloudstorage` sinks and `format=parquet` is only allowed with cloudstorage sinks, so `key_in_value` is enabled for parquet by default. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 21, 2023
This change adds support for the `diff` changefeed option when using `format=parquet`. Enabling `diff` also adds support for CDC Transformations with parquet. Informs: cockroachdb#103129 Informs: cockroachdb#99028 Epic: CRDB-27372 Release note: None
jayshrivastava
added a commit
to jayshrivastava/cockroach
that referenced
this issue
Jun 21, 2023
This change adds support for the `end_time` changefeed option when using `format=parquet`. No significant code changes were needed to enable this feature. Closes: cockroachdb#103129 Closes: cockroachdb#99028 Epic: CRDB-27372 Release note (enterprise change): Changefeeds now officially support the parquet format at specification version 2.6. It is only usable with the cloudstorage sink. The syntax to use parquet is like the following: ``CREATE CHANGEFEED FOR foo INTO `...` WITH format=parquet``. It supports all standard changefeed options and features including CDC transformations, except it does not support the `topic_in_value` option.
#99028 tracks all the steps required to add the Apache Arrow Parquet library and remove the old one. This issue tracks the changefeed options that are not yet supported with the parquet format (these are the same as the options not supported by initial scan).
Jira issue: CRDB-27845
Epic CRDB-27372