Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set write batch readTs to the readMark.DoneUntil() value #778

Merged
merged 1 commit into from
May 2, 2019

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Apr 23, 2019

Every Transaction stores the latest value of readTs it is aware of. When the transaction is discarded (which happens even when we commit), the global value of readMark is updated.

badger/txn.go

Lines 501 to 503 in 1fcc96e

if !txn.db.orc.isManaged {
txn.db.orc.readMark.Done(txn.readTs)
}

Previously, the readTs of transaction inside the write batch struct was set to 0. So the global value of readMark would also be set to 0 (unless someone ran a transaction after using write batch).
Due to the 0 value of the global readMark, the compaction algorithm would skip all the values which were inserted in the write batch call.

badger/levels.go

Lines 480 to 484 in 1fcc96e

// Pick a discard ts, so we can discard versions below this ts. We should
// never discard any versions starting from above this timestamp, because
// that would affect the snapshot view guarantee provided by transactions.
discardTs := s.kv.orc.discardAtOrBelow()

and

badger/txn.go

Lines 138 to 145 in 1fcc96e

func (o *oracle) discardAtOrBelow() uint64 {
if o.isManaged {
o.Lock()
defer o.Unlock()
return o.discardTs
}
return o.readMark.DoneUntil()
}

The o.readMark.DoneUntil() call would always return 0 and so the compaction wouldn't compact the newer values.

With this commit, the compaction algorithm works as expected with key-values inserted via Transaction API or via the Write Batch API.

Fixes #767


This change is Reviewable

@jarifibrahim jarifibrahim force-pushed the gc-test branch 3 times, most recently from 80a4518 to 7e9b9e4 Compare April 23, 2019 16:16
@jarifibrahim jarifibrahim changed the title Add more tests for vlog GC Set write batch readTs to the latest known readTs Apr 23, 2019
@jarifibrahim jarifibrahim marked this pull request as ready for review April 23, 2019 16:26
@jarifibrahim jarifibrahim force-pushed the gc-test branch 3 times, most recently from 036d3e7 to 7d9781e Compare April 24, 2019 10:03
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, @mangalaman93, and @manishrjain)


batch.go, line 138 at r1 (raw file):

	wb.txn.CommitWith(wb.callback)
	wb.txn = wb.db.newTransaction(true, true)
	// See comment about readTs in NewWriteBatch. This readTs is used only to detect conflicts in a transaction.

100 chars.

Remove that comment about "this readts ..."

@mangalaman93
Copy link
Contributor

@jarifibrahim will it now affect the performance as @manishrjain mentioned? How are we solving that? We should probably add comments around it in the PR.

@jarifibrahim
Copy link
Contributor Author

@jarifibrahim will it now affect the performance as @manishrjain mentioned? How are we solving that? We should probably add comments around it in the PR.

@mangalaman93 No, this change shouldn't affect performance. My previous changes would've affected the performance but this one shouldn't.

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @mangalaman93, and @manishrjain)


batch.go, line 138 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Remove that comment about "this readts ..."

Done.

…d older versions of keys during compactions.

Every Transaction stores the latest value of `readTs` it is aware of. When the transaction is discarded (which happens even when we commit), the global value of `readMark` is updated.
https://github.com/dgraph-io/badger/blob/1fcc96ecdb66d221df85cddec186b6ac7b6dab4b/txn.go#L501-L503
Previously, the `readTs` of transaction inside the write batch struct was set to 0. So the global value of `readMark` would also be set to 0 (unless someone ran a transaction after using write batch).
Due to the 0 value of the global `readMark`, the compaction algorithm would skip all the values which were inserted in the write batch call.
See https://github.com/dgraph-io/badger/blob/1fcc96ecdb66d221df85cddec186b6ac7b6dab4b/levels.go#L480-L484
and
https://github.com/dgraph-io/badger/blob/1fcc96ecdb66d221df85cddec186b6ac7b6dab4b/txn.go#L138-L145
The `o.readMark.DoneUntil()` call would always return `0` and so the compaction wouldn't compact the newer values.

With this commit, the compaction algorithm works as expected with key-values inserted via Transaction API or via the Write Batch API.

See dgraph-io#767
@ashish-goswami
Copy link
Contributor

@jarifibrahim can you check why unit tests are failing on Teamcity?

@jarifibrahim jarifibrahim changed the title Set write batch readTs to the latest known readTs Set write batch readTs to the readMark.DoneUntil() value May 2, 2019
@jarifibrahim jarifibrahim merged commit 6b796b3 into dgraph-io:master May 2, 2019
@jarifibrahim jarifibrahim deleted the gc-test branch May 2, 2019 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Deleting values and running GC doesn't reclaim space
4 participants