Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update proxy to near TiKV 0b97a39d520afcd82a02d0f5c0966d6ccab2cd74 #377

Merged
merged 50 commits into from
Jul 16, 2024

Conversation

CalvinNeo
Copy link
Member

What is changed and how it works?

Issue Number: Close #xxx

What's Changed:


Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note


SpadeA-Tang and others added 30 commits June 14, 2024 05:40
ref tikv#16141, close tikv#17143

set boundries when gc

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…te range (tikv#17142)

ref tikv#16141, close tikv#17140

encode key for lock cf without mvcc version in delete range

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17123

The `safe_point` is used to check if a cache is eligible to serve a read request.
If the `safe_point` drifts into the future, for example by 10 minutes, the cache
will not be able to serve any requests for those 10 minutes.

This commit fixes such issue by getting the `safe_point` from PD.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141, ref tikv#16764

In-Memory Engine: Algorithmic Load and Eviction

Signed-off-by: Alex Feinberg <[email protected]>
ref tikv#16141, close tikv#17131

consider range overlaps when eviction

Signed-off-by: SpadeA-Tang <[email protected]>
ref tikv#16141

support delete range

Signed-off-by: SpadeA-Tang <[email protected]>
…global indexScan (tikv#17141)

close tikv#17138

let `ExtraPhysTblIDCol` column return partition id when it's a global indexScan

Signed-off-by: Jason Mo <[email protected]>
Signed-off-by: Hangjie Mo <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#16001

The dispatch of snapshot generation tasks and region destruction are handled 
by the same thread. It was observed in tikv#12587 that region destruction can take 
a long time and block snapshot generation. This commit addresses the problem by
handling the region destruction in a separate thread pool. To support running 
the logic in a separate pool, methods related to region destruction (e.g. 
clean_stale_ranges, clean_overlap_ranges_roughly, clean_overlap_ranges) are 
moved under a new struct called `RegionCleaner`. Note that there is no 
functional change to those methods at all. The new struct is wrapped inside a 
mutex for thread safety because the original thread also needs to call 
`clean_overlap_ranges` when applying a snapshot.

After the change, region destruction will no longer block snapshot generation, 
but it can still block snapshot apply because an apply task also needs to 
detroy regions.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17153

evict range should consider loading range

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141, close tikv#17103

clear write batch when range load failed

Signed-off-by: SpadeA-Tang <[email protected]>
ref tikv#11189

As suggested in tikv#11189, using a dedicated procedural macro for simple
literal case conversion is overkill. This commit replace `case_macros`
(a procedural macro crate) with `heck`. It not only simplifies the code,
but also slightly reduces the TiKV binary size, as `heck` is already
in the dependency tree.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#17078

In a recent issue, it took a long time for snapshots to be applied likely due
to a high number of L0 files, but it wasn't easy to troubleshoot the problem.
This commit introduces metrics to track the number of pending snapshot applies
and the occurrences of snapshot ingest delays to help diagnose similar 
problems.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: lucasliang <[email protected]>
…e to memory (tikv#17149)

close tikv#17104

fix concurrency issue between delete range and write to memory

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141

add seek duration metrics

Signed-off-by: SpadeA-Tang <[email protected]>
ref tikv#15990

The rocksdb log is replace by tikv log in tikv#7358 about 4 years ago. After this PR, rocksdb configs `info_log_max_size`, `info_log_roll_time`, `info_log_keep_log_file_num` and `info_log_level` are no long taken effect. This PR marks these configs as deprecated and clean up related code.

Signed-off-by: glorv <[email protected]>
)

ref tikv#16141

add metrics for prepare for write duration

Signed-off-by: SpadeA-Tang <[email protected]>
ref tikv#16141

set disk engine when start

Signed-off-by: SpadeA-Tang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17133

Signed-off-by: Jinlong Liu <[email protected]>
Signed-off-by: King-Dylan <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#8160

This commit offloads one case of snapshot deletion from the raftstore 
thread to the background cleanup thread introduced in tikv#12415. This 
aligns with the principle of performing only performance-critical work 
in the raftstore thread.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141

avoid using snapshot cache

Signed-off-by: SpadeA-Tang <[email protected]>
ref tikv#16141

evict range when becomeing follower, merge, and deleting data ranges.

Signed-off-by: SpadeA-Tang <[email protected]>
…tikv#17171)

ref tikv#16141

filter plus one when any key is removed during GC

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: Spade  A <[email protected]>
ref tikv#17166

Currently, the In-memory Engine heavily uses tikv/skiplist-rs, which is
a fork of crossbeam-skiplist. This fork was created to include a patch
crossbeam-skiplist#1091 that has not yet been merged.

Maintaining this fork in a separate repository has proven to be
impractical for several reasons:

* It contains only one TiKV-specific change.
* It will not be released to crates.io.
* PRs in this repository rarely receive code reviews.

After discussing offline, we have decided to move this fork into the TiKV
components repository. We hope this change will:

* Streamline maintenance.
* Increase visibility and code review participation.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#15083

Allows log backup to read the encryption config and use it to encrypt the local files.

Signed-off-by: hillium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#17178

Fix memory leak in crossbeam-channel's unbounded channel.

Signed-off-by: Neil Shen <[email protected]>
close tikv#17172

allow pop untracked lock to prevent cdc component panic

Signed-off-by: 3AceShowHand <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17176

Set compression arguments for TiKV service. The compression arguments are loaded from TiKV config.
It will affect TiKV's response to TiDB.

Signed-off-by: ekexium <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…egion (tikv#17169)

close tikv#17168

backup: continue to seek regions if one range has no located leader region

Signed-off-by: Jianjun Liao <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…entries. (tikv#17183)

close tikv#17106

fix under-estimate memory size usage of delete entry

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Connor1996 and others added 16 commits June 28, 2024 09:08
tikv#17164)

ref tikv#15990

Enhance slow log for peer/store messages with detailed info

Signed-off-by: Connor1996 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17211

cdc: rollback txn-source attached transactions correctly

Signed-off-by: qupeng <[email protected]>
Signed-off-by: qupeng <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: Neil Shen <[email protected]>
…7215)

close tikv#17191, close tikv#17213

After tikv#17019, raft snapshots get blocked when a peer is removed
immediately after requesting a snapshot. This occurs because the leader
must wait for `MsgSnapGenPrecheckResponse` before generating snapshots.
As a result, not only is the snapshot blocked, but subsequent conf
changes (such as adding a node) are also blocked.

To resolve this issue, TiKV needs to cancel the snapshot generation
task if a peer becomes unreachable.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Fix display of ingest SST and compaction size

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#17206

Introduce a new security mode `marker` for log desensitization.

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17182

Both Lightning import and BR restore will read ingest data for checksum
verification. However, TiKV currently does not load SST files immediately
after ingestion. This delay causes checksum mismatches, leading to
failures in Lightning and BR.

This commit workaround such issue by evict ranges that are ingested sst.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…ikv#17235)

close tikv#17234

Fix cpu profiling flag may not reset when meeting errors

Signed-off-by: Connor1996 <[email protected]>
ref tikv#16141

Add property test for skiplist map vs. RocksDB. This test validates
basic key-value operations including put, get, delete, scan, and
delete_range.

Signed-off-by: Neil Shen <[email protected]>
…ikv#17240)

ref tikv#17239

This pr is used to fix the incorrect metrics on `Ingest SST duration seconds`, introduced in tikv#17070

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17226

As a follow-up of tikv#17215, this commit fixes an issue with the
`cancel_generating_snap` function to ensure that `snap_tried_cnt`
within PeerStorage is not unexpectedly incremented.

Background:
In PeerStorage, `snap_tried_cnt` is used to track the number of
snapshot generation attempts. As with what happened in tikv#17226, TiKV
would panic if `snap_tried_cnt` exceeds the limit (currently hardcoded
to 5). By design, `snap_tried_cnt` should only be incremented when an
actual failure happens during the snapshot generation process. It
should not be incremented if the snapshot generation is intentionally
canceled.

Although its name suggests so, the `cancel_generating_snap` function
doesn't always cancel the snapshot generation. If the inputting
`compact_to` is so small that it doesn't make the snapshot stale,
`cancel_generating_snap` will return early and won't set `cancel` to
true. In the fix, we ensure that we only cancel the precheck when
`cancel` is set to true. Without it, `snapshot()` will see a 
`TryRecvError::Disconnected` error while `last_canceled` is false and 
increment `tried_cnt`.

Note that `cancel_generating_snap` is only called with a `compact_to`
value in `on_compact_raftlog`, which I believe to be the trigger of
issue tikv#17226.

Also, this commit fixes a similar problem to ensure that `tried_cnt` is
not incremented in another potential scenario—when the peer requesting
snapshot is unknown.

Signed-off-by: Bisheng Huang <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…7228)

close tikv#17229

cdc: revert the enhancemant about region partial subscription

Signed-off-by: qupeng <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141

Do not evict the cache when applying a snapshot because the peer
must be in the follower role during this process. The cache is
already evicted when the leader steps down.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17224

Add a disk usage check when execute `download` and `apply` RPC from br.
When the disk is not `Normal`, the request would be rejected.

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17233

cdc: skip incremental scaned events after region fails

Signed-off-by: qupeng <[email protected]>
…7257)

close tikv#17166

Following up on tikv#17167, this commit cherry-picks the remaining
`OwnedIter` changes from the tikv/skiplist-rs repository. It also
replaces skiplist-rs with crossbeam-skiplist in the TiKV codebase.

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: SpadeA-Tang <[email protected]>
@CLAassistant
Copy link

CLAassistant commented Jul 15, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
8 out of 17 committers have signed the CLA.

✅ SpadeA-Tang
✅ glorv
✅ Defined2014
✅ Leavrth
✅ 3AceShowHand
✅ ekexium
✅ wuhuizuo
✅ CalvinNeo
❌ afeinberg
❌ overvenus
❌ hicqu
❌ hbisheng
❌ King-Dylan
❌ YuJuncen
❌ Connor1996
❌ LykxSassinator
❌ RidRisR
You have signed the CLA already but the status is still pending? Let us recheck it.

@ti-chi-bot ti-chi-bot bot added the size/XXL label Jul 15, 2024
@CalvinNeo CalvinNeo changed the title Upd redact log Update proxy to near TiKV 0b97a39d520afcd82a02d0f5c0966d6ccab2cd74 Jul 15, 2024
@CalvinNeo
Copy link
Member Author

/retest

@ti-chi-bot ti-chi-bot bot added the lgtm label Jul 15, 2024
Copy link

ti-chi-bot bot commented Jul 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

ti-chi-bot bot commented Jul 15, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-15 11:27:07.656446339 +0000 UTC m=+266849.647387804: ☑️ agreed by JaySon-Huang.

@ti-chi-bot ti-chi-bot bot added the approved label Jul 15, 2024
@CalvinNeo
Copy link
Member Author

/merge_label

@CalvinNeo
Copy link
Member Author

/label tide/merge-method-merge

Copy link

ti-chi-bot bot commented Jul 16, 2024

@CalvinNeo: The label(s) /label tide/merge-method-merge cannot be applied. These labels are supported: ``. Is this label configured under labels -> additional_labels or `labels -> restricted_labels` in `plugin.yaml`?

In response to this:

/label tide/merge-method-merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CalvinNeo CalvinNeo added the tide/merge-method-merge merge method merge label Jul 16, 2024
@CalvinNeo CalvinNeo merged commit 0ea7365 into pingcap:raftstore-proxy Jul 16, 2024
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.