This repository has been archived by the owner on Jun 19, 2023. It is now read-only.

blockstore: Adding Stat method to map from Cid to BlockSize #5

Merged 1 commit into ipfs:master on Aug 7, 2018

Conversation

taylormike
Contributor

@taylormike taylormike commented Jul 11, 2018

Performant way to map from Cid to BlockSize.

Helps resolve issue: ipfs/kubo#4378 (comment)
@whyrusleeping @kevina @magik6k

License: MIT
Signed-off-by: Jeromy [email protected]

@taylormike
Contributor Author

I'm looking into the unit test failure.
I can't reproduce it locally; the test passes on my machine, so I'm going to try running the tests on another machine.
--- FAIL: TestHasIsBloomCached (0.09s)
	bloom_cache_test.go:113: Bloom filter has cache miss rate of more than 5%

Member

@magik6k magik6k left a comment


Not sure about the test failure; the code LGTM.

@Stebalien
Member

ipfs/kubo#3208

@Stebalien
Member

This should probably return a special Stat struct or be called `GetSize`. That is:

type Stat struct {
    Size int // -1 means "don't know"
}

@taylormike
Contributor Author

@Stebalien Thank you, I will make this change.

- I will rename Stat to GetSize.
- I will also change the return type from a uint to an int and return -1 when the BlockSize is unknown.

@Stebalien
Member

> I will also change the return type from a uint to an int and return -1 when the BlockSize is unknown.

If we're going to go with a method dedicated to returning the size, I'd just return (usize, error) and always get the size (falling back on querying the underlying datastore if necessary).

I suggested using -1 in the Stat struct if we "don't know" the size for performance reasons. That way, the user can use Stat to try to get cached/fast information that the datastore tracks and use Get (plus some additional logic) to get information the datastore doesn't. In retrospect, that was probably a bad idea anyways...

@taylormike
Contributor Author

@Stebalien Thank you for your suggestions. After giving it some thought, I decided on a method dedicated to returning the size: GetSize(*cid.Cid) (uint, error). I pushed this change and squashed it into a single commit.

Let me know if you have any questions. I'm happy to share more details and discuss further.

Member

@Stebalien Stebalien left a comment


The code looks good in general, but this'll fail for zero-sized blocks (forgot we had those...). We could just memorize all zero-sized blocks (one per hash) but that's a decision I'd like to make later, if ever. Ideally, we should start using identity hashes for zero-length blocks, so a zero-length block table wouldn't be very useful.

arc_cache.go Outdated
 		return ErrNotFound
 	}

 	b.arc.Remove(k) // Invalidate cache before deleting.
 	err := b.blockstore.DeleteBlock(k)
 	switch err {
 	case nil, ds.ErrNotFound, ErrNotFound:
-		b.addCache(k, false)
+		b.addCache(k, 0)
Member

So, we can actually have zero-sized blocks. We'll eventually "inline" these into CIDs but that's a work in progress. You'll probably need to use -1 to indicate a missing block.
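The point about zero-sized blocks can be sketched as follows. This is only an illustration under assumptions: `sizeCache` is a hypothetical map-backed stand-in for the ARC cache, and `lookup` is an invented helper. It shows why 0 cannot double as the "missing" marker once sizes are cached, while -1 can.

```go
package main

import "fmt"

// sizeCache is a hypothetical stand-in for the ARC cache: it maps a key
// to a cached block size, where -1 records "known to be missing".
type sizeCache struct {
	entries map[string]int
}

// lookup reports the cached size, whether the block exists, and whether
// the cache had an answer at all.
func (c *sizeCache) lookup(key string) (size int, exists bool, cached bool) {
	n, ok := c.entries[key]
	if !ok {
		return -1, false, false // nothing cached; the caller must ask the store
	}
	if n < 0 {
		return -1, false, true // cached: the block is known to be missing
	}
	return n, true, true // cached: the block exists, possibly with size 0
}

func main() {
	c := &sizeCache{entries: map[string]int{
		"empty":   0,  // a real zero-sized block
		"deleted": -1, // recorded after a delete
	}}
	for _, k := range []string{"empty", "deleted", "unknown"} {
		size, exists, cached := c.lookup(k)
		fmt.Println(k, size, exists, cached)
	}
}
```

If a delete recorded 0 instead of -1, the "empty" and "deleted" entries above would be indistinguishable.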

Contributor Author

@Stebalien
Done. I pushed out these changes and squashed into a single commit.

Member

@Stebalien Stebalien left a comment


Damn I'm bad about catching everything on the first review.

GetSize should probably return ErrNotFound when the block isn't found to match Get.


It would also be nice to have a zero-sized block test (so we don't regress there).

blockstore.go Outdated
func (bs *blockstore) GetSize(k *cid.Cid) (int, error) {
	maybeData, err := bs.datastore.Get(dshelp.CidToDsKey(k))
	if err == ds.ErrNotFound {
		return -1, nil
Member

Given that Get returns bs.ErrNotFound when a block isn't found, this should probably do the same.

Contributor Author

Taking a look... I'll need to think about this.

Removing that check on line 191 triggers three test failures:

--- FAIL: TestPutManyAddsToBloom (0.00s)
	bloom_cache_test.go:68: datastore: key not found
--- FAIL: TestGetAndDeleteFalseShortCircuit (0.00s)
	arc_cache_test.go:41: get hit datastore
--- FAIL: TestHasRequestTriggersCache (0.00s)
	arc_cache_test.go:41: has hit datastore

The two arc_cache_test failures occur because the test cases assume that no error is returned when a block is not found via 'arccache.Has'. Removing the check in 'GetSize' (line 191) triggers these failures because arccache.Has depends on GetSize:
func (b *arccache) Has(k *cid.Cid) (bool, error) {
	blockSize, err := b.GetSize(k)
	return blockSize > -1, err
}

The bloom_cache_test failure has a similar cause: the test case assumes that no error is returned when a block is not found via 'bloomcache.GetSize'.
	blockSize, err = cachedbs.GetSize(block2.Cid())
	if err != nil {
		t.Fatal(err) // <-- Test fails here: bloom_cache_test.go:68: datastore: key not found
	}
	if blockSize > -1 || has {
		t.Fatal("not added block is reported to be in blockstore")
	}

This is what I need to think about a bit more.

Open question: Ideally, should 'Has' depend on 'GetSize', or should the two be decoupled?

Open question: Ideally, should 'GetSize' and/or 'Has' return an error when a block is not found? With or without my changes, the current implementation of 'Has' suppresses the error in this case.

Contributor Author

Solution
I can pull that check out of 'GetSize' on line 191 and add it to 'arccache.Has'.

  1. This would allow 'Has' to return false instead of an error when a block isn't found.
  2. This would also allow 'GetSize' to return bs.ErrNotFound when a block isn't found.

func (b *arccache) Has(k *cid.Cid) (bool, error) {
	blockSize, err := b.GetSize(k)
	if err == ds.ErrNotFound {
		return false, nil
	}
	return blockSize > -1, err
}

Member

That sounds like the right way to go.

Contributor Author

@Stebalien This is done. I pushed out these changes and squashed into a single commit.

Performant way to map from Cid to BlockSize. 

License: MIT
Signed-off-by: Jeromy <[email protected]>
@taylormike
Contributor Author

@Stebalien This is done. I pushed out these changes and squashed into a single commit.

@Stebalien Stebalien merged commit a8e9dc0 into ipfs:master Aug 7, 2018
3 participants