
Compartmentalization/files API simplification #93

Open
rakoo opened this issue Apr 22, 2015 · 3 comments


rakoo commented Apr 22, 2015

Hey there,

While working on the files subset of the code (specifically, gathering everything file-related: the various *files.go, the various *cache.go, sftp.go), I realized there is room to make the code simpler and more understandable. Here's a rough list of the things I have in mind:

Modification of active pieces

A torrent.ActivePiece would keep a []byte with the whole piece content. In https://github.com/jackpal/Taipei-Torrent/blob/master/torrent/torrent.go#L1061 we would no longer write the chunk to the FileStore but to the active piece's buffer; then in https://github.com/jackpal/Taipei-Torrent/blob/master/torrent/torrent.go#L844 we would no longer read content back from the FileStore but directly from the active piece's buffer. (A minimal sketch of this follows the list below.)

  • Instead of many little writes (16 KiB at a time), we only have one "big" write (pieceLength).
  • If the piece is not complete, TT (or any torrent client) restarts the piece from scratch, even if all but one chunk has already been downloaded, because there is no way to know whether the chunks on disk are correct. So we might as well not write any chunk until the piece is complete.
  • pieceLength shouldn't be huge, and TT actively tries to have as few active pieces as possible, so this won't take a lot of memory. With the repo's DSL ISO, a memprofile shows 2 MB of memory for the whole download.
  • Instead of reading back from the FileStore (which could be slow if it's SFTP and you don't have a cache for some reason), the piece is verified directly from memory. checkPiece and computePieceSum also aren't needed anymore.
  • If the client is purely leeching, there is no need for a cache, even if the FileStore is remote.
  • We'd need to specify in the FileStore doc that WriteAt is only called for full, verified pieces, and that pieces aren't read back unless another peer asks for them.
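For illustration, here is a minimal sketch of what such an in-memory active piece could look like. The names (activePiece, buf, received, writeChunk) are made up for this example and are not identifiers from the TT codebase; only the idea (buffer the whole piece, verify it from memory, write it once) comes from the points above.

import "crypto/sha1"

// activePiece buffers a whole piece in memory until it is complete and verified.
type activePiece struct {
  buf      []byte // whole piece content, len == pieceLength
  received int    // bytes of chunk data received so far (assumes no duplicate chunks)
}

func newActivePiece(pieceLength int) *activePiece {
  return &activePiece{buf: make([]byte, pieceLength)}
}

// writeChunk copies a 16 KiB chunk (smaller for the last one) into the buffer
// instead of writing it to the FileStore.
func (p *activePiece) writeChunk(offset int, chunk []byte) {
  copy(p.buf[offset:], chunk)
  p.received += len(chunk)
}

// complete reports whether every chunk of the piece has been received.
func (p *activePiece) complete() bool { return p.received == len(p.buf) }

// verified checks the piece hash directly from memory, so no read-back from the
// FileStore (and no checkPiece/computePieceSum) is needed.
func (p *activePiece) verified(expected [sha1.Size]byte) bool {
  return sha1.Sum(p.buf) == expected
}

Once complete() and verified() both return true, the whole buf can go to the FileStore in a single WriteAt call and the activePiece can be discarded.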

FileStore as the only relevant outside interface

FileStore would go back to what it was before, i.e.

type FileStore interface {
  // WriteAt is only ever called for full, verified pieces;
  // once a piece is written it's not read back until another client needs it
  io.WriterAt

  io.ReaderAt
  io.Closer
}

A TorrentCache would actually be a specific implementation of the FileStore interface that wraps another FileStore; there would be no specific interface just for it. From the user's point of view there is no indication that the FileStore is actually a cache. CacheProvider would be changed so that NewCache also takes the underlying FileStore as an argument. All reads are forwarded (if necessary), and all writes are forwarded.

Combined with the previous change to active pieces, there is no need for a specific Commit phase in FileStore, because all writes are "clean", i.e. they're only full pieces. Any write of a piece to the cache should be forwarded immediately to the underlying FileStore. This also gets rid of all the "unfulfilled" handling, because the cache wraps the FileStore: the cache just reads the whole piece from the FileStore, stores it locally, and hands back only the relevant part to the caller.
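To make that concrete, here is a rough sketch of such a cache-as-FileStore wrapper under the assumptions above; cachingFileStore and newCachingFileStore are hypothetical names, and the range-tracking/cache-fill logic is elided.

type cachingFileStore struct {
  backing FileStore // the real storage (disk, SFTP, ...)
  cache   FileStore // e.g. an in-memory or on-disk piece cache
}

func newCachingFileStore(backing, cache FileStore) FileStore {
  return &cachingFileStore{backing: backing, cache: cache}
}

// WriteAt only ever sees full, verified pieces, so there is no Commit phase:
// the piece goes to the cache and is forwarded to the backing store right away.
func (c *cachingFileStore) WriteAt(p []byte, off int64) (int, error) {
  if _, err := c.cache.WriteAt(p, off); err != nil {
    return 0, err
  }
  return c.backing.WriteAt(p, off)
}

// ReadAt serves from the cache when it can and otherwise falls back to the
// backing store (in the full version it would read the whole piece, cache it,
// and hand back only the requested part).
func (c *cachingFileStore) ReadAt(p []byte, off int64) (int, error) {
  if n, err := c.cache.ReadAt(p, off); err == nil {
    return n, nil
  }
  return c.backing.ReadAt(p, off)
}

func (c *cachingFileStore) Close() error {
  if err := c.cache.Close(); err != nil {
    return err
  }
  return c.backing.Close()
}

From the caller's point of view this is just another FileStore, which is the point: no dedicated cache interface.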

I am working on this in my file branch (see the storage sub-package) if you want to have a look.

Compartmentalization

Instead of having pretty much everything in the torrent package, I'd like to move the file-handling code into its own package (I have a storage package). I also externalized the bitset.

Thoughts?


Zilog8 commented Apr 23, 2015

Modification of active pieces & FileStore

I like the idea from a code perspective, but I'm still a bit unsure about the memory-use implications. Memory use would basically be unbounded (depending on how many open ActivePieces there are and the size of the torrent's pieces). Although I guess if that were to become a problem, it could be mitigated through other means (off the top of my head, maybe choosing at runtime whether an ActivePiece should use a []byte or a file for storage), and we would still have the benefit of simpler code.

If the client is purely leeching, there is no need for a cache, even if the FileStore is remote

BitTorrent file distribution "works" because no client can/should be "purely leeching".

Otherwise, looks good to me.


rakoo commented Apr 23, 2015

100% agreed, this proposal makes memory usage unbounded. Here's another proposal: keep an upper bound on the memory used by all ActivePieces, and in RequestBlock, don't create a new ActivePiece if we have already reached the limit. Somewhere around here, check whether the sum of the active pieces' content plus the potential new piece goes beyond the threshold; if it does, don't request a new piece, just wait for the other ones to be retrieved, committed to disk and then discarded so that memory becomes available again.
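A minimal sketch of the kind of check this implies; maxActiveMemory, activePieces and pieceLength are illustrative names, not identifiers from the TT codebase.

const maxActiveMemory = 32 << 20 // e.g. a 32 MiB budget for all active pieces

// canStartNewPiece reports whether a brand-new piece buffer would still fit
// under the budget; if not, keep requesting blocks for pieces already in flight.
func canStartNewPiece(activePieces map[int][]byte, pieceLength int) bool {
  inUse := 0
  for _, buf := range activePieces {
    inUse += len(buf)
  }
  return inUse+pieceLength <= maxActiveMemory
}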

As you can guess, my goal is to avoid mixing parts as much as possible to keep everything simple (and, as you raised, under control).

BitTorrent file distribution "works" because no client can/should be "purely leeching".

If you're talking about "bittorrent the community", then I totally agree with you. However, I'd like to see bittorrent used beyond the "community" use case, because the transport itself is extremely powerful; I'd like bittorrent to be the default transport protocol for distributing Wikipedia dumps, or Internet Archive items, or OSM tiles (or raw data), or my distribution's packages. For those cases I expect users to purely leech content until completion and then leave the swarm (which would be kind of OK because there will always be an available seed: the upstream project). I'd expect them to barely seed content. Basically, I'd like bittorrent to be usable when a client just wants to reliably retrieve the data behind a torrent (or even just a magnet), wherever it comes from, without having to seed just to improve its choking score.

Anyway, point taken, TT should remain "community" oriented in that it should not be directed towards pure leeching (that's a fair assumption).


Zilog8 commented Apr 23, 2015

Here's another proposal: keep an upper boundary of memory used for all ActivePieces, and in RequestBlock, prevent from creating a new ActivePiece if we have already reached the limit...

I like this; it seems like the most correct approach: "Don't []byte off more than we can chew" :-)

Zilog8 added a commit that referenced this issue Feb 16, 2016
A while back rakoo, in issue #93, proposed some improvements and
simplifications that could be done to how TT handles active pieces,
caching, and FileStore. This commit starts on implementing a few.