-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow data transfers with more than 8192 received CIDs #118
Conversation
integrate cid lists as a migraiton to replace the previous cidlist stored in the state
use different tempdirs per peer, don't delete files for received cids lists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits but otherwise LGTM 👍
From a design perspective my only concern is whether we would need to change any lotus tooling to be aware of the new directory needed to store cid lists. I'm not sure if this is going to be an issue in practice, just flagging in case it's something we need to think about.
cidlists/cidlists.go
Outdated
} | ||
|
||
// CreateList initializes a new CID list with the given initial cids (or can be empty) for a data transfer channel | ||
func (cl *cidLists) CreateList(chid datatransfer.ChannelID, initalCids []cid.Cid) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (cl *cidLists) CreateList(chid datatransfer.ChannelID, initalCids []cid.Cid) error { | |
func (cl *cidLists) CreateList(chid datatransfer.ChannelID, initialCids []cid.Cid) error { |
cidlists/cidlists.go
Outdated
} | ||
} | ||
|
||
// DeleteList deletes the lsit for the given data transfer channel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// DeleteList deletes the lsit for the given data transfer channel | |
// DeleteList deletes the list for the given data transfer channel |
cidlists/cidlists_test.go
Outdated
|
||
func TestCIDLists(t *testing.T) { | ||
|
||
err := os.MkdirAll("_test", 0755) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe create _test in a tmp directory, then we don't need the .gitignore?
channels/channel_state.go
Outdated
@@ -97,7 +96,8 @@ func (c channelState) Voucher() datatransfer.Voucher { | |||
|
|||
// ReceivedCids returns the cids received so far on this channel | |||
func (c channelState) ReceivedCids() []cid.Cid { | |||
return c.receivedCids | |||
receivedCids, _ := c.channelCIDsReader(c.ChannelID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return the error from ReceivedCids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just cause it's an API change -- and there's an interface that expects no error. But I will log it.
cidlists/cidlists.go
Outdated
return err | ||
} | ||
defer func() { | ||
_ = f.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth returning the error on Close. It seems like the caller would want to know if the file couldn't be closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of a late review: I'm concerned about the perf of opening a file per CID fetched.
An alternative approach would be to just write to the datastore. That is, write /transferprefix/TRANSFER_ID/cids/CID
. We can then extract/delete the list with a range query. However, I'm also not entirely sure about the performance here either.
return c.receivedCids | ||
receivedCids, err := c.channelCIDsReader(c.ChannelID()) | ||
if err != nil { | ||
log.Error(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider propagating this up and letting the caller handle it, but logging it and moving on is also quite reasonable as we have "no information" in this case.
} | ||
|
||
// AppendList appends a single CID to the list for a given data transfer channel | ||
func (cl *cidLists) AppendList(chid datatransfer.ChannelID, c cid.Cid) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to be really slow if we have to do this per cid? I'd suggest having some form of batching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most files will be big, and the default lotus chunking is 1MB, so this should be usable
But it's still pretty far from ideal
Goals
Fix filecoin-project/lotus#4856
Implementation
This is a first attempt to fix 4856 -- it solves the issue in a protocol non-blocking way. There are more ideal ways to solve the issue but they involve changes to graphsync and require changes on the network.
What this does is remove the ReceivedCids list from the stored state for the channel, and it puts them instead in an append only file.
This has the effect of not only removing the limit but significantly reducing io & memory consumption relating to loading and saving these CIDs. The list is simply left in the file except when its needed for a restart.
The files are placed in a directory, indexed by channel ID.