
Compartmentalization/files API simplification #93

Open
rakoo opened this issue Apr 22, 2015 · 3 comments


rakoo commented Apr 22, 2015

Hey there,

While working on the files subset of the code (specifically, gathering everything file-related: the various *files.go, the various *cache.go, sftp.go), I realized there is room to make the code simpler and more understandable. Here's a rough list of the things I have in mind:

Modification of active pieces

A torrent.ActivePiece would keep a []byte with the whole piece content. In https://github.com/jackpal/Taipei-Torrent/blob/master/torrent/torrent.go#L1061 we would no longer write the chunk to the FileStore but to the active piece's buffer; then in https://github.com/jackpal/Taipei-Torrent/blob/master/torrent/torrent.go#L844 we would no longer read content back from the FileStore but directly from the active piece's buffer. (A minimal sketch of this follows the list below.)

  • Instead of many little writes (16 KiB at a time), we only have one "big" write (pieceLength).
  • If the piece is not complete, TT (or any torrent client) restarts the piece from scratch, even if all but one chunk has already been downloaded, because there is no way to know whether the chunks on disk are correct. So we might as well not write any chunk until the piece is complete.
  • pieceLength shouldn't be huge, and TT actively tries to have as few active pieces as possible, so this won't take a lot of memory. With the repo's DSL ISO, a memprofile shows 2 MB of memory for the whole download.
  • Instead of reading back from the FileStore (which could be slow if it's SFTP and you don't have a cache for some reason), the piece is verified directly from memory. checkPiece and computePieceSum also aren't needed anymore.
  • If the client is purely leeching, there is no need for a cache, even if the FileStore is remote.
  • We'd need to specify in the FileStore doc that WriteAt is only called for full, verified pieces, and that pieces aren't read back unless another peer asks for them.
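For illustration, here is a minimal sketch of what such an in-memory active piece could look like. The names (activePiece, buf, received, writeChunk) are made up for this example and are not identifiers from the TT codebase; only the idea (buffer the whole piece, verify it from memory, write it once) comes from the points above.

import "crypto/sha1"

// activePiece buffers a whole piece in memory until it is complete and verified.
type activePiece struct {
  buf      []byte // whole piece content, len == pieceLength
  received int    // bytes of chunk data received so far (assumes no duplicate chunks)
}

func newActivePiece(pieceLength int) *activePiece {
  return &activePiece{buf: make([]byte, pieceLength)}
}

// writeChunk copies a 16 KiB chunk (smaller for the last one) into the buffer
// instead of writing it to the FileStore.
func (p *activePiece) writeChunk(offset int, chunk []byte) {
  copy(p.buf[offset:], chunk)
  p.received += len(chunk)
}

// complete reports whether every chunk of the piece has been received.
func (p *activePiece) complete() bool { return p.received == len(p.buf) }

// verified checks the piece hash directly from memory, so no read-back from the
// FileStore (and no checkPiece/computePieceSum) is needed.
func (p *activePiece) verified(expected [sha1.Size]byte) bool {
  return sha1.Sum(p.buf) == expected
}

Once complete() and verified() both return true, the whole buf can go to the FileStore in a single WriteAt call and the activePiece can be discarded.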

FileStore as the only relevant outside interface

FileStore would go back to what it was before, i.e.

type FileStore interface {
  // WriteAt is only ever called for full, verified pieces;
  // once a piece is written it's not read back until another client needs it
  io.WriterAt

  io.ReaderAt
  io.Closer
}

A TorrentCache would actually be a specific implementation of the FileStore interface that wraps another FileStore; there would be no specific interface just for it. From the user's point of view there is no indication that the FileStore is actually a cache. CacheProvider would be changed so that NewCache also takes the underlying FileStore as an argument. All reads are forwarded (if necessary), and all writes are forwarded.

Combined with the previous change to active pieces, there is no need for a specific Commit phase in FileStore, because all writes are "clean", i.e. they're only full pieces. Any write of a piece to the cache should be forwarded immediately to the underlying FileStore. This also gets rid of all the "unfulfilled" handling, because the cache wraps the FileStore: the cache just reads the whole piece from the FileStore, stores it locally, and hands back only the relevant part to the caller.
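To make that concrete, here is a rough sketch of such a cache-as-FileStore wrapper under the assumptions above; cachingFileStore and newCachingFileStore are hypothetical names, and the range-tracking/cache-fill logic is elided.

type cachingFileStore struct {
  backing FileStore // the real storage (disk, SFTP, ...)
  cache   FileStore // e.g. an in-memory or on-disk piece cache
}

func newCachingFileStore(backing, cache FileStore) FileStore {
  return &cachingFileStore{backing: backing, cache: cache}
}

// WriteAt only ever sees full, verified pieces, so there is no Commit phase:
// the piece goes to the cache and is forwarded to the backing store right away.
func (c *cachingFileStore) WriteAt(p []byte, off int64) (int, error) {
  if _, err := c.cache.WriteAt(p, off); err != nil {
    return 0, err
  }
  return c.backing.WriteAt(p, off)
}

// ReadAt serves from the cache when it can and otherwise falls back to the
// backing store (in the full version it would read the whole piece, cache it,
// and hand back only the requested part).
func (c *cachingFileStore) ReadAt(p []byte, off int64) (int, error) {
  if n, err := c.cache.ReadAt(p, off); err == nil {
    return n, nil
  }
  return c.backing.ReadAt(p, off)
}

func (c *cachingFileStore) Close() error {
  if err := c.cache.Close(); err != nil {
    return err
  }
  return c.backing.Close()
}

From the caller's point of view this is just another FileStore, which is the point: no dedicated cache interface.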

I am working on this in my file branch (see the storage sub-package) if you want to have a look.

Compartmentalization

Instead of having pretty much everything in the torrent package, I'd like to move the file-handling code into its own package (I have a storage package). I also externalized the bitset.

Thoughts?


Zilog8 commented Apr 23, 2015

Modification of active pieces & FileStore

I like the idea from a code perspective, but I'm still a bit unsure about the memory-use implications. Memory use would basically be unbounded (depending on how many open ActivePieces there are and the size of the torrent's pieces). Although I guess if that were to become a problem, it could be mitigated through other means (off the top of my head, maybe choosing at runtime whether an ActivePiece should use a []byte or a file for storage), and we would still have the benefit of simpler code.

If the client is purely leeching, there is no need for a cache, even if the FileStore is remote

BitTorrent file distribution "works" because no client can/should be "purely leeching".

Otherwise, looks good to me.


rakoo commented Apr 23, 2015

100% agreed, this proposal makes memory usage unbounded. Here's another proposal: keep an upper bound on the memory used by all ActivePieces, and in RequestBlock, don't create a new ActivePiece if we have already reached the limit. Somewhere around here, check whether the sum of the active pieces' content plus the potential new piece goes beyond the threshold; if it does, don't request a new piece, just wait for the other ones to be retrieved, committed to disk and then discarded so that memory becomes available again.
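A minimal sketch of the kind of check this implies; maxActiveMemory, activePieces and pieceLength are illustrative names, not identifiers from the TT codebase.

const maxActiveMemory = 32 << 20 // e.g. a 32 MiB budget for all active pieces

// canStartNewPiece reports whether a brand-new piece buffer would still fit
// under the budget; if not, keep requesting blocks for pieces already in flight.
func canStartNewPiece(activePieces map[int][]byte, pieceLength int) bool {
  inUse := 0
  for _, buf := range activePieces {
    inUse += len(buf)
  }
  return inUse+pieceLength <= maxActiveMemory
}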

As you can guess, my goal is to avoid mixing parts as much as possible to keep everything simple (and, as you raised, under control).

BitTorrent file distribution "works" because no client can/should be "purely leeching".

If you're talking about "bittorrent the community", then I totally agree with you. However, I'd like to see bittorrent used beyond the "community" use case, because the transport itself is extremely powerful; I'd like bittorrent to be the default transport protocol for distributing Wikipedia dumps, or Internet Archive items, or OSM tiles (or raw data), or my distribution's packages. For those cases I expect users to purely leech content until completion and then leave the swarm (which would be kind of OK because there will always be an available seed: the upstream project). I'd expect them to barely seed content. Basically, I'd like bittorrent to be usable when a client just wants to reliably retrieve the data behind a torrent (or even just a magnet), wherever it comes from, without having to seed just to improve its choking score.

Anyway, point taken, TT should remain "community" oriented in that it should not be directed towards pure leeching (that's a fair assumption).


Zilog8 commented Apr 23, 2015

Here's another proposal: keep an upper boundary of memory used for all ActivePieces, and in RequestBlock, prevent from creating a new ActivePiece if we have already reached the limit...

I like this; it seems like the most correct approach: "Don't []byte off more than we can chew" :-)

Zilog8 added a commit that referenced this issue Feb 16, 2016
A while back rakoo, in issue #93, proposed some improvements and
simplifications that could be done to how TT handles active pieces,
caching, and FileStore. This commit starts on implementing a few.