enhance: add create segment message, enable empty segment flush #37407

Open: chyezh wants to merge 1 commit into master from fix_streaming_flush_empty_segment


@chyezh (Contributor) commented Nov 4, 2024

issue: #37172

  • Add a redo interceptor to implement append-context refresh (it generates a new timetick).
  • Add a create-segment handler to the flusher.
  • Make empty segments flushable and transition them directly to Dropped.
  • Write a createSegment message into the WAL when a new growing segment is created.
  • Make inserts follow the sequence: createSegment -> insert -> insert -> flushSegment.
  • Make manual flush follow the sequence: flushTs -> flushSegment -> flushSegment -> manualFlush.
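The insert-path ordering described in the bullets above can be expressed as a simple check over one segment's messages. This is a hypothetical Go sketch, not code from this PR: MsgKind, its constants, and validateSegmentSequence are illustrative names, and it assumes the messages of a single segment are read back from the WAL in timetick order. A createSegment -> flushSegment pair (an empty segment) is accepted, matching the "empty segments are flushable" change.

```go
package main

import (
	"errors"
	"fmt"
)

// MsgKind is a hypothetical enumeration of WAL message kinds used only for this sketch.
type MsgKind int

const (
	CreateSegment MsgKind = iota
	Insert
	FlushSegment
)

// validateSegmentSequence checks that one segment's messages follow
// createSegment -> insert* -> flushSegment, in that order.
func validateSegmentSequence(msgs []MsgKind) error {
	if len(msgs) == 0 || msgs[0] != CreateSegment {
		return errors.New("sequence must start with createSegment")
	}
	for i, m := range msgs[1:] {
		pos := i + 1 // index in the original slice
		switch m {
		case Insert:
			// Any number of inserts is allowed before the flush.
		case FlushSegment:
			if pos != len(msgs)-1 {
				return fmt.Errorf("flushSegment at %d is not the last message", pos)
			}
		default:
			return fmt.Errorf("unexpected message %d at %d", m, pos)
		}
	}
	if msgs[len(msgs)-1] != FlushSegment {
		return errors.New("sequence must end with flushSegment")
	}
	return nil
}

func main() {
	ok := []MsgKind{CreateSegment, Insert, Insert, FlushSegment}
	empty := []MsgKind{CreateSegment, FlushSegment} // an empty segment is flushable too
	bad := []MsgKind{Insert, FlushSegment}          // missing createSegment
	fmt.Println(validateSegmentSequence(ok) == nil)    // true
	fmt.Println(validateSegmentSequence(empty) == nil) // true
	fmt.Println(validateSegmentSequence(bad) == nil)   // false
}
```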

@sre-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chyezh
To complete the pull request process, please assign yanliang567 after the PR has been reviewed.
You can assign the PR to them by writing /assign @yanliang567 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot added the size/XXL (Denotes a PR that changes 1000+ lines.) and area/dependency (Pull requests that update a dependency file) labels Nov 4, 2024
@mergify (bot) added the dco-passed (DCO check passed.) and kind/enhancement (Issues or changes related to enhancement) labels Nov 4, 2024
mergify bot (Contributor) commented Nov 4, 2024

@chyezh go-sdk check failed; comment "rerun go-sdk" to trigger the job again.

mergify bot (Contributor) commented Nov 4, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

@chyezh force-pushed the fix_streaming_flush_empty_segment branch from 39b8028 to cb5a5e9 on November 4, 2024 07:25
@@ -1013,6 +1013,24 @@ func UpdateIsImporting(segmentID int64, isImporting bool) UpdateOperator {
}
}

// UpdateAsDroppedIfEmptyWhenFlushing updates the segment state to Dropped if the segment is empty and in the Flushing state.
// It's used to drop an empty flushing segment directly.
func UpdateAsDroppedIfEmptyWhenFlushing(segmentID int64) UpdateOperator {

chyezh (Contributor Author)

Used to mark an empty segment as Dropped when SaveBinlogPaths is called.
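A minimal sketch of what the operator described above is expected to do, assuming simplified, hypothetical types (SegmentState, segmentInfo, updateOperator) in place of the real datacoord meta types: the segment moves to Dropped only if it is both Flushing and empty, and is left unchanged otherwise.

```go
package main

import "fmt"

// SegmentState is a simplified, hypothetical stand-in for the datacoord segment state enum.
type SegmentState int

const (
	Growing SegmentState = iota
	Flushing
	Flushed
	Dropped
)

// segmentInfo is a hypothetical, minimal segment record.
type segmentInfo struct {
	ID        int64
	State     SegmentState
	NumOfRows int64
}

// updateOperator mutates a segment and reports whether it changed anything
// (the return-value convention is an assumption of this sketch).
type updateOperator func(seg *segmentInfo) bool

// updateAsDroppedIfEmptyWhenFlushing marks a segment Dropped only when it is
// both in the Flushing state and contains no rows; otherwise it is a no-op.
func updateAsDroppedIfEmptyWhenFlushing(segmentID int64) updateOperator {
	return func(seg *segmentInfo) bool {
		if seg == nil || seg.ID != segmentID {
			return false
		}
		if seg.State == Flushing && seg.NumOfRows == 0 {
			seg.State = Dropped
			return true
		}
		return false
	}
}

func main() {
	empty := &segmentInfo{ID: 1, State: Flushing, NumOfRows: 0}
	full := &segmentInfo{ID: 2, State: Flushing, NumOfRows: 10}
	fmt.Println(updateAsDroppedIfEmptyWhenFlushing(1)(empty), empty.State == Dropped) // true true
	fmt.Println(updateAsDroppedIfEmptyWhenFlushing(2)(full), full.State == Flushing)  // false true
}
```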

@@ -147,6 +155,14 @@ func UpdateNumOfRows(numOfRows int64) SegmentAction {
}
}

func SetStartPositionIfNil(startPos *msgpb.MsgPosition) SegmentAction {

chyezh (Contributor Author)

Sets up the start position when the first insert message arrives; applied to the write buffer.
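A sketch of the set-if-nil behavior, assuming a hypothetical segment type and a func(*segment) action shape rather than the real write-buffer metacache types: a segment created by a CreateSegment message has no start position until the first insert arrives, and later inserts must not overwrite it.

```go
package main

import "fmt"

// msgPosition is a hypothetical stand-in for msgpb.MsgPosition.
type msgPosition struct {
	ChannelName string
	Timestamp   uint64
}

// segment is a hypothetical, minimal write-buffer segment.
type segment struct {
	startPosition *msgPosition
}

// segmentAction is the func(*segment) action shape assumed by this sketch.
type segmentAction func(s *segment)

// setStartPositionIfNil records startPos only when the segment has none yet,
// so the first insert fixes the start position and later inserts leave it alone.
func setStartPositionIfNil(startPos *msgPosition) segmentAction {
	return func(s *segment) {
		if s.startPosition == nil {
			s.startPosition = startPos
		}
	}
}

func main() {
	seg := &segment{} // created by a CreateSegment message, no data yet
	setStartPositionIfNil(&msgPosition{ChannelName: "v1", Timestamp: 100})(seg)
	setStartPositionIfNil(&msgPosition{ChannelName: "v1", Timestamp: 200})(seg)
	fmt.Println(seg.startPosition.Timestamp) // 100
}
```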

@@ -236,6 +236,19 @@ func (ddn *ddNode) Operate(in []Msg) []Msg {
WithLabelValues(fmt.Sprint(paramtable.GetNodeID()), metrics.DeleteLabel).
Add(float64(dmsg.GetNumRows()))
fgMsg.DeleteMessages = append(fgMsg.DeleteMessages, dmsg)
case commonpb.MsgType_CreateSegment:
createSegment := msg.(*adaptor.CreateSegmentMessageBody)

chyezh (Contributor Author)

Creates a new segment when a CreateSegment message arrives.
TODO: Move the datacoord create-segment operation, currently performed in the interceptor, to here.
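A hypothetical sketch of how a flow-graph node might react to a CreateSegment message: it registers the segment as growing in a per-vchannel registry. The names (createSegmentMsg, channelMeta, handleCreateSegment) are illustrative only and are not the ddNode code from this PR. Making the handler idempotent is a design choice of this sketch, so that seeing the same message twice (for example on replay) is harmless.

```go
package main

import "fmt"

// createSegmentMsg is a hypothetical, simplified CreateSegment message body.
type createSegmentMsg struct {
	SegmentID   int64
	PartitionID int64
	TimeTick    uint64
}

// channelMeta is a hypothetical per-vchannel registry of growing segments.
type channelMeta struct {
	growing map[int64]createSegmentMsg
}

// handleCreateSegment registers a new growing segment and reports whether it
// was added. A segment that is already known is left untouched.
func (m *channelMeta) handleCreateSegment(msg createSegmentMsg) bool {
	if _, ok := m.growing[msg.SegmentID]; ok {
		return false // already registered, nothing to do
	}
	m.growing[msg.SegmentID] = msg
	return true
}

func main() {
	m := &channelMeta{growing: make(map[int64]createSegmentMsg)}
	msg := createSegmentMsg{SegmentID: 42, PartitionID: 1, TimeTick: 1000}
	fmt.Println(m.handleCreateSegment(msg)) // true
	fmt.Println(m.handleCreateSegment(msg)) // false
	fmt.Println(len(m.growing))             // 1
}
```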

@@ -257,7 +258,11 @@ func (s *storageV1Serializer) serializeMergedPkStats(pack *SyncPack) (*storage.B
BF: pks.PkFilter,
PkType: int64(s.pkField.GetDataType()),
}
-}), segment.NumOfRows())
+})
+if len(stats) == 0 {

chyezh (Contributor Author)

Allows merging empty stats when the segment is empty.
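To illustrate accepting empty stats, here is a hypothetical Go sketch in which min/max primary keys stand in for the real bloom-filter statistics (pkStat and mergePkStats are illustrative names, not Milvus APIs). An empty input produces a zero-row stat rather than an error, so an empty segment can still go through serialization.

```go
package main

import "fmt"

// pkStat is a hypothetical, simplified primary-key statistic for one batch.
type pkStat struct {
	MinPK, MaxPK int64
	Rows         int64
}

// mergePkStats merges per-batch stats. An empty input is allowed and yields an
// empty stat (Rows == 0) instead of an error.
func mergePkStats(stats []pkStat) pkStat {
	if len(stats) == 0 {
		return pkStat{}
	}
	merged := stats[0]
	for _, s := range stats[1:] {
		if s.MinPK < merged.MinPK {
			merged.MinPK = s.MinPK
		}
		if s.MaxPK > merged.MaxPK {
			merged.MaxPK = s.MaxPK
		}
		merged.Rows += s.Rows
	}
	return merged
}

func main() {
	fmt.Println(mergePkStats(nil).Rows) // 0
	m := mergePkStats([]pkStat{{MinPK: 5, MaxPK: 9, Rows: 3}, {MinPK: 1, MaxPK: 7, Rows: 2}})
	fmt.Println(m.MinPK, m.MaxPK, m.Rows) // 1 9 5
}
```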

mergify bot (Contributor) commented Nov 4, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

@chyezh force-pushed the fix_streaming_flush_empty_segment branch from cb5a5e9 to 6ab0ba1 on November 4, 2024 08:16
mergify bot (Contributor) commented Nov 4, 2024

@chyezh go-sdk check failed; comment "rerun go-sdk" to trigger the job again.

mergify bot (Contributor) commented Nov 4, 2024

@chyezh cpp-unit-test check failed; comment "rerun cpp-unit-test" to trigger the job again.

mergify bot (Contributor) commented Nov 4, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

logger := log.With(
zap.String("vchannel", ddn.Name()),
zap.Int32("msgType", int32(msg.Type())),
zap.Uint64("timetick", createSegment.CreateSegmentMessage.TimeTick()),

Contributor

Better to log the segmentID as well.

chyezh (Contributor Author)

Will fix it soon.

@@ -16,9 +16,15 @@

package flusher

-import "github.com/milvus-io/milvus/pkg/streaming/util/message"
+import (

Contributor

Rename the file?

m.mu.Lock()
defer m.mu.Unlock()
// no-op if the incoming time tick is less than or equal to the fenced time tick.
if timeTick <= m.fencedAssignTimeTick {

Contributor

When would this happen?

chyezh (Contributor Author)

  1. When concurrent manual flushes arrive, the older manualFlush is ignored.
  2. If a manualFlush triggers a flushSegment operation, the operation is redone to generate a new timetick, which keeps the message sequence flushTs -> flushSegment -> flushSegment -> manualFlush. The redone operation is therefore rejected here. In the previous implementation, the sequence was flushTs -> manualFlush -> flushSegment -> flushSegment, which was odd.
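The fencing rule in the snippet above (ignore any time tick less than or equal to the fenced one) can be sketched as a small mutex-guarded type. flushFence and tryFence are hypothetical names; the sketch only shows how an older concurrent manual flush becomes a no-op, not the full manual-flush path.

```go
package main

import (
	"fmt"
	"sync"
)

// flushFence is a hypothetical time-tick fence for manual flush requests.
type flushFence struct {
	mu                   sync.Mutex
	fencedAssignTimeTick uint64
}

// tryFence advances the fence to timeTick and returns true, or returns false
// (no-op) when timeTick is less than or equal to the current fence.
func (f *flushFence) tryFence(timeTick uint64) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if timeTick <= f.fencedAssignTimeTick {
		return false
	}
	f.fencedAssignTimeTick = timeTick
	return true
}

func main() {
	f := &flushFence{}
	fmt.Println(f.tryFence(200)) // true: the newest manual flush wins
	fmt.Println(f.tryFence(150)) // false: an older concurrent manual flush is ignored
	fmt.Println(f.tryFence(200)) // false: an equal time tick is also a no-op
}
```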

@chyezh force-pushed the fix_streaming_flush_empty_segment branch 2 times, most recently from 0807515 to 9a09f1a on November 4, 2024 13:50
mergify bot (Contributor) commented Nov 4, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

codecov bot commented Nov 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.31%. Comparing base (f813fb4) to head (1936a4d).
Report is 7 commits behind head on master.

Additional details and impacted files


@@           Coverage Diff           @@
##           master   #37407   +/-   ##
=======================================
  Coverage   67.31%   67.31%           
=======================================
  Files         290      290           
  Lines       25377    25377           
=======================================
  Hits        17082    17082           
  Misses       8295     8295           
Components   Coverage Δ
Client       ∅ <ø> (∅)
Core         67.31% <ø> (ø)
Go           ∅ <ø> (∅)

@chyezh force-pushed the fix_streaming_flush_empty_segment branch from 9a09f1a to a2fbfc4 on November 5, 2024 08:35
mergify bot (Contributor) commented Nov 5, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

@chyezh force-pushed the fix_streaming_flush_empty_segment branch from a2fbfc4 to 32bed9b on November 5, 2024 12:50
@chyezh (Contributor Author) commented Nov 5, 2024

rerun ut

@mergify (bot) added and then removed the ci-passed label Nov 5, 2024
mergify bot (Contributor) commented Nov 6, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

@chyezh (Contributor Author) commented Nov 6, 2024

/run-cpu-e2e

@mergify (bot) added the ci-passed label Nov 6, 2024
- Add a redo interceptor to implement append-context refresh (it generates a new timetick).
- Add a create-segment handler to the flusher.
- Make empty segments flushable and transition them directly to Dropped.
- Write a createSegment message into the WAL when a new growing segment is created.

Signed-off-by: chyezh <[email protected]>
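The redo interceptor idea (refresh the append context by taking a new timetick and retrying) can be sketched as a retry loop. This is a hypothetical illustration, not the Milvus WAL API: errRedo, interceptor, and appendWithRedo are invented names, and the bound on redo attempts is an assumption added to avoid an infinite loop.

```go
package main

import (
	"errors"
	"fmt"
)

// errRedo is a hypothetical sentinel an interceptor returns to ask the
// appender to refresh the append context (allocate a new time tick) and retry.
var errRedo = errors.New("redo append with a new time tick")

// interceptor is a hypothetical hook run before a message is written.
type interceptor func(timeTick uint64) error

// appendWithRedo allocates a time tick via nextTick, runs the interceptor, and
// retries with a fresh time tick whenever the interceptor returns errRedo.
func appendWithRedo(nextTick func() uint64, ic interceptor, maxRedo int) (uint64, error) {
	for attempt := 0; attempt <= maxRedo; attempt++ {
		tt := nextTick()
		err := ic(tt)
		if errors.Is(err, errRedo) {
			continue // refresh the append context and try again
		}
		if err != nil {
			return 0, err
		}
		return tt, nil
	}
	return 0, fmt.Errorf("gave up after %d redo attempts", maxRedo)
}

func main() {
	tick := uint64(100)
	nextTick := func() uint64 { tick++; return tick }
	redone := false
	// Request exactly one redo, e.g. because other messages had to be appended first.
	ic := func(tt uint64) error {
		if !redone {
			redone = true
			return errRedo
		}
		return nil
	}
	tt, err := appendWithRedo(nextTick, ic, 3)
	fmt.Println(tt, err) // 102 <nil>
}
```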
@chyezh force-pushed the fix_streaming_flush_empty_segment branch from 32bed9b to 1936a4d on November 7, 2024 03:02
@mergify (bot) removed the ci-passed label Nov 7, 2024
mergify bot (Contributor) commented Nov 7, 2024

@chyezh E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

@chyezh (Contributor Author) commented Nov 7, 2024

/run-cpu-e2e

@mergify (bot) added the ci-passed label Nov 7, 2024
Labels
area/dependency (Pull requests that update a dependency file), ci-passed, dco-passed (DCO check passed.), kind/enhancement (Issues or changes related to enhancement), size/XXL (Denotes a PR that changes 1000+ lines.)
3 participants