CBG-3703 create a basic bucket to bucket XDCR implementation [HAS DEPENDENCY] #29

Closed
wants to merge 10 commits into from

torcolvin
Collaborator

@torcolvin torcolvin commented Mar 28, 2024

  • uses new common elements from sg-bucket
  • absorbs DataStoreName and GetCollectionID into DataStore to avoid duplication
  • The implementation only copies documents with xattrs, and excludes specially named documents. It does not implement
    _vv, _mou, or _sync handling.
  • The test is duplicated in the sync_gateway xdcr package, with the code matching the common interface

couchbase/sg-bucket#117

Includes #30 which can be merged independently

Contributor

@gregns1 gregns1 left a comment

Generally looks good, I think! Just a couple of questions in the comments. I'll let Adam have the final say too.

xdcr_test.go (outdated, resolved)
xdcr_test.go (resolved)
Contributor

@adamcfraser adamcfraser left a comment

A few questions/clarifications.

xdcr.go (outdated, resolved)
xdcr.go Outdated
fromBucket: fromBucket,
toBucket: toBucket,
replicationID: fmt.Sprintf("%s-%s", fromBucket.GetName(), toBucket.GetName()),
fromBucketCollectionIDs: map[uint32]sgbucket.DataStoreName{},
Contributor

What's the expected handling if there are collections defined in fromBucket that aren't present in toBucket? Should we be checking for this now instead of logging a warning for every mutation that arrives for a non-existent collection?

Contributor

Oh, I see below that we're only supporting the default collection at this point? (we're not passing any scopes args to the DCP feed). Are you thinking of adding that later?

Collaborator Author

This was an oversight, and I've added tests for the non-default collection.
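
One way to act on the question above is to compare the two buckets' collections once, before the feed starts, instead of warning per mutation. The following is a minimal sketch under stated assumptions: `DataStoreName` here is a hypothetical scope/collection pair standing in for `sgbucket.DataStoreName`, and `missingCollections` and `validateCollections` are illustrative names that do not appear in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// DataStoreName is a hypothetical stand-in for sgbucket.DataStoreName,
// reduced to a scope/collection pair for this sketch.
type DataStoreName struct {
	Scope      string
	Collection string
}

// missingCollections returns the collections present in from but absent
// in to, sorted so the result is deterministic.
func missingCollections(from, to []DataStoreName) []DataStoreName {
	present := make(map[DataStoreName]struct{}, len(to))
	for _, name := range to {
		present[name] = struct{}{}
	}
	var missing []DataStoreName
	for _, name := range from {
		if _, ok := present[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Slice(missing, func(i, j int) bool {
		if missing[i].Scope != missing[j].Scope {
			return missing[i].Scope < missing[j].Scope
		}
		return missing[i].Collection < missing[j].Collection
	})
	return missing
}

// validateCollections fails fast when the target bucket lacks a
// collection that exists on the source bucket.
func validateCollections(from, to []DataStoreName) error {
	if missing := missingCollections(from, to); len(missing) > 0 {
		return fmt.Errorf("target bucket is missing %d collection(s): %v", len(missing), missing)
	}
	return nil
}

func main() {
	from := []DataStoreName{{"_default", "_default"}, {"scope1", "coll1"}}
	to := []DataStoreName{{"_default", "_default"}}
	fmt.Println(validateCollections(from, to))
}
```

Failing at start trades flexibility for clarity: a misconfigured target is reported once, rather than once per mutation.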

xdcr.go Outdated

// getFromBucketCollectionName returns the collection name for a given collection ID in the from bucket.
func (r *XDCR) getFromBucketCollectionName(collectionID uint32) (sgbucket.DataStoreName, error) {
	dsName, ok := r.fromBucketCollectionIDs[collectionID]
Contributor

Since we need to know the set of fromBucket collections when we start the DCP feed anyway, I think populating fromBucketCollectionIDs on start instead of lazily will simplify things and avoid potential races accessing fromBucketCollectionIDs.

Collaborator Author

I made it a precondition that the DataStores are computed when XDCR.Start is called, and that the collection names must match between the two buckets.

We could relax this in a future PR but I don't think this is necessary for testing at this time.

xdcr.go Outdated

switch event.Opcode {
case sgbucket.FeedOpDeletion, sgbucket.FeedOpMutation:
	if strings.HasPrefix(docID, sgbucket.SyncDocPrefix) && !strings.HasPrefix(docID, sgbucket.Att2Prefix) {
Contributor

It would be nice to abstract this as a key filter function that's defined on the replication (i.e. XDCR.keyFilterFunc), and set that filter to a pre-defined function when the replication is created with sgbucket.XDCRMobileOn. That would keep the general XDCR implementation a bit more generic and make it clear what's happening when XDCRMobileOn is set.
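
The suggested abstraction might look like the sketch below. The names `keyFilterFunc`, `mobileKeyFilter`, `replicateAll`, and `newKeyFilter` are hypothetical, and the two prefix values are assumptions standing in for `sgbucket.SyncDocPrefix` and `sgbucket.Att2Prefix`. The filter returns true when a document should be replicated, which is the negation of the inline skip condition quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed values for this sketch; the review refers to the real constants
// as sgbucket.SyncDocPrefix and sgbucket.Att2Prefix.
const (
	syncDocPrefix = "_sync:"
	att2Prefix    = "_sync:att2:"
)

// keyFilterFunc reports whether a document ID should be replicated.
type keyFilterFunc func(docID string) bool

// replicateAll is the generic default: every key is replicated.
func replicateAll(string) bool { return true }

// mobileKeyFilter skips Sync Gateway metadata documents but still
// replicates attachment documents.
func mobileKeyFilter(docID string) bool {
	isSyncDoc := strings.HasPrefix(docID, syncDocPrefix)
	isAttachment := strings.HasPrefix(docID, att2Prefix)
	return !isSyncDoc || isAttachment
}

// newKeyFilter picks the filter once, when the replication is created.
func newKeyFilter(mobileOn bool) keyFilterFunc {
	if mobileOn {
		return mobileKeyFilter
	}
	return replicateAll
}

func main() {
	filter := newKeyFilter(true)
	for _, id := range []string{"doc1", "_sync:user:alice", "_sync:att2:abc"} {
		fmt.Printf("%s replicate=%v\n", id, filter(id))
	}
}
```

With the filter chosen at construction time, the feed callback only has to call it, and the mobile-specific rules stay in one named function.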

xdcr.go Outdated Show resolved Hide resolved
xdcr.go Outdated
return true
}

toCollection, ok := toDataStore.(*Collection)
Contributor

It's unexpected to me that we're doing this check after we've already done toDataStore.Get. Can we just get the collection in the first place on line 107?

Collaborator Author

I changed this code to avoid casts by implementing fixes in sgbucket to consolidate these interfaces into DataStore. This code exists in a separate PR but is included in this PR.

xdcr.go Outdated
return nil
}

// writeDoc writes a document to the target datastore. This will not return an error on a CAS mismatch, but will return error on other types of write.
Contributor

I'm not sure what "other types of write" means in this comment. Also, setWithMeta still needs to do a CAS check to ensure the document hasn't been mutated on the target between the time we evaluated LWW and the time we write, right?

xdcr.go Outdated
// writeDoc writes a document to the target datastore. This will not return an error on a CAS mismatch, but will return error on other types of write.
func writeDoc(ctx context.Context, collection *Collection, originalCas uint64, event sgbucket.FeedEvent) error {
	if event.Opcode == sgbucket.FeedOpDeletion {
		_, err := collection.Remove(string(event.Key), originalCas)
Contributor

I think XDCR does delWithMeta to ensure that the delete mutation on the target ends up with the same CAS as the delete mutation on the source - does collection.Remove() do the same thing?

Collaborator Author

Good catch, fixed.
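
The difference raised in this thread can be shown with a toy model. Everything here is hypothetical: `target`, `remove`, and `delWithMeta` are illustrative and do not reflect the signatures in rosmar or Couchbase Server. The sketch only assumes what the reviewer describes, namely that a replicated delete should leave a tombstone carrying the source's CAS, whereas a plain delete gets a CAS assigned by the target.

```go
package main

import "fmt"

type entry struct {
	cas       uint64
	tombstone bool
}

// target is a hypothetical in-memory bucket keyed by document ID.
type target map[string]entry

// remove models a plain delete: the target assigns its own CAS.
func (t target) remove(key string, localCas uint64) {
	t[key] = entry{cas: localCas, tombstone: true}
}

// delWithMeta models a replicated delete: the tombstone keeps the
// CAS of the delete mutation on the source.
func (t target) delWithMeta(key string, sourceCas uint64) {
	t[key] = entry{cas: sourceCas, tombstone: true}
}

func main() {
	const sourceDeleteCas = 42

	plain := target{}
	plain.remove("doc1", 1000) // CAS diverges from the source
	fmt.Println(plain["doc1"].cas == sourceDeleteCas)

	replicated := target{}
	replicated.delWithMeta("doc1", sourceDeleteCas) // CAS matches the source
	fmt.Println(replicated["doc1"].cas == sourceDeleteCas)
}
```

Keeping the CAS identical on both sides matters for any later CAS-based comparison between the two buckets.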

xdcr.go Outdated

err := collection.SetWithMeta(ctx, string(event.Key), originalCas, event.Cas, event.Expiry, xattrs, body, event.DataType)

if !collection.IsError(err, sgbucket.KeyNotFoundError) {
Contributor

Why do we want to ignore a KeyNotFound error here?

Collaborator Author

Good catch.

xdcr.go Outdated

}

err := collection.SetWithMeta(ctx, string(event.Key), originalCas, event.Cas, event.Expiry, xattrs, body, event.DataType)
Contributor

Are we preserving system (_sync) xattrs on the target here? Or is that a pending enhancement?

Collaborator Author

All xattrs are preserved. A future enhancement will be to handle _vv and _mou xattrs.

@adamcfraser adamcfraser assigned torcolvin and unassigned adamcfraser and gregns1 Mar 28, 2024
@torcolvin torcolvin changed the title from "[DO NOT MERGE] CBG-3764 create a basic bucket to bucket XDCR implementation" to "CBG-3703 create a basic bucket to bucket XDCR implementation [HAS DEPENDENCY]" Apr 1, 2024
@torcolvin torcolvin assigned adamcfraser and unassigned torcolvin Apr 2, 2024
@torcolvin torcolvin closed this Apr 2, 2024
@torcolvin torcolvin deleted the CBG-3764 branch April 2, 2024 14:30
@torcolvin torcolvin restored the CBG-3764 branch April 2, 2024 14:30
@torcolvin torcolvin reopened this Apr 2, 2024
@torcolvin
Collaborator Author

Dropping this PR to implement this entirely within Sync Gateway, which avoids needing PRs across three repos.

@torcolvin torcolvin closed this Apr 9, 2024
@torcolvin torcolvin deleted the CBG-3764 branch April 9, 2024 14:22