Create a super-seeder docker image/container #43

Open
kelson42 opened this issue Jan 2, 2019 · 16 comments · May be fixed by #214
@kelson42
Contributor

kelson42 commented Jan 2, 2019

Looks like not all BitTorrent clients can deal properly with our Web seeds yet. Having a complete and always-running super-seeder would help to solve that problem. We could run it on a mirror (the files are already there). Additionally, this Docker image might be interesting to a few Kiwix supporters, who would then have an easy way to support the project by sharing a bit of their bandwidth.

Using rsync (see https://download.kiwix.org/README) and rtorrent, that should not be too complicated.

@nemobis
Contributor

nemobis commented Jun 2, 2019

Do you think the docker image should assume a disk space in the TBs, to automatically seed everything, or something more conservative?

@kelson42
Contributor Author

kelson42 commented Jun 2, 2019

@nemobis The whole download.kiwix.org is around 10 TB. It is difficult to assume that a seeder has that much space. What might be a solution to that problem is to be able to pass (as a Docker environment variable) a list of path regular expressions to filter what the user wants to seed from download.kiwix.org.
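As an illustration, a minimal Python sketch of such a filter; the SEED_FILTERS variable name, its semicolon-separated format and the example paths are hypothetical:

```python
import os
import re

# Hypothetical variable: semicolon-separated path regexes, e.g.
# SEED_FILTERS="^zim/wikipedia/.*_en_;^zim/wiktionary/"
patterns = [p for p in os.environ.get("SEED_FILTERS", ".*").split(";") if p]
filters = [re.compile(p) for p in patterns]

def wanted(path: str) -> bool:
    """True if this path (relative to download.kiwix.org) matches any filter."""
    return any(f.search(path) for f in filters)

# Example usage with illustrative file names:
paths = [
    "zim/wikipedia/wikipedia_en_all_maxi_2019-06.zim",
    "zim/gutenberg/gutenberg_fr_all_2019-06.zim",
]
print([p for p in paths if wanted(p)])
```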

@kelson42
Contributor Author

kelson42 commented Jun 5, 2019

An old attempt can be found in this repo, in the files:

  • kiwix_superseed.sh
  • .rtorrent.rc

These files should be removed at the end of the implementation.

@stale

stale bot commented Aug 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Aug 4, 2019
@kelson42 kelson42 self-assigned this Sep 12, 2020
@stale stale bot removed the stale label Sep 12, 2020
@kelson42
Contributor Author

Here is a base for the work: https://gitlab.com/adrienandrem/kiwix-torrent-watcher

@rgaudin
Member

rgaudin commented Sep 14, 2020

I see this is very recent. What's the status of this? Is there any need beyond that?

@stale

stale bot commented Dec 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Dec 24, 2020
@kelson42
Contributor Author

I have been thinking about this ticket over the last weeks and I think I now know the best way to do it.

First of all, I plan to use a pre-existing Docker image, https://github.com/linuxserver/docker-qbittorrent, because:

  • It exists and is maintained
  • qBittorrent is a well-known BitTorrent client
  • It is based on qbittorrent-nox, the headless version
  • qBittorrent has offered an API for many years, which gives us a proper way to drive it
  • We can control it easily from outside the container
  • Many client tools and libraries exist to deal with this API
  • I have verified that it works and offers the options I believe we need

Assuming we reuse the linuxserver/qbittorrent Docker image, we still need a solution to synchronise (add/remove torrents) with https://download.kiwix.org (or maybe even better https://library.kiwix.org?). I plan to do so as follows:

  • Build a dedicated Docker image based on a simple bash script running in cron
  • The script will retrieve the list of ZIM files to mirror in the super-seeder from the OPDS feed (so the user can set a filter if needed)
  • Based on the feed data (parsed with gron), the script will ask the qBittorrent client via its API (using https://github.com/fedarovich/qbittorrent-cli/) to download new content (see the sketch after this list)
  • Content which is not in the feed anymore will be deleted after a configurable delay.
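As a rough sketch of that synchronisation loop, written here in Python with the qbittorrent-api library rather than gron + qbittorrent-cli, just to illustrate the logic; the OPDS endpoint, the credentials and the ".torrent" URL convention are assumptions to verify:

```python
import urllib.request
import xml.etree.ElementTree as ET

import qbittorrentapi  # pip install qbittorrent-api

# Assumed OPDS endpoint and qBittorrent credentials; adjust to the real setup.
OPDS_URL = "https://library.kiwix.org/catalog/v2/entries"
qbt = qbittorrentapi.Client(host="localhost", port=8080,
                            username="admin", password="adminadmin")

ns = {"a": "http://www.w3.org/2005/Atom"}
feed = ET.parse(urllib.request.urlopen(OPDS_URL)).getroot()

for entry in feed.findall("a:entry", ns):
    for link in entry.findall("a:link", ns):
        if link.get("type") != "application/x-zim":
            continue
        href = link.get("href")
        # Assumption: the acquisition link points at the ZIM download URL
        # (possibly with a .meta4 suffix) and a sibling .torrent file exists.
        if href.endswith(".meta4"):
            href = href[: -len(".meta4")]
        qbt.torrents_add(urls=href + ".torrent", save_path="/data/zims")
```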

@stale stale bot removed the stale label Feb 19, 2022
@kelson42 kelson42 linked a pull request Feb 20, 2022 that will close this issue
@stale

stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@benoit74
Collaborator

benoit74 commented Oct 3, 2023

I like the idea of using linuxserver/qbittorrent and its Docker image; the project is very active and mature, and the web API is very useful.

I don't get whether we want to:

  1. download missing content locally to be able to serve it
  2. take advantage of existing local content, meaning it has to be installed on a mirror

It looks like the initial idea was option 2, but we have switched to option 1, and I don't know why.

I see lots of advantages in option 2:

  • we already have tooling to mirror files; I don't see the benefit of re-implementing it with BitTorrent instead of rsync
  • it might save storage space and bandwidth for those who already have a mirror (and this is our case)
  • it will avoid complexity in our tooling (filtering what we want to download, deciding what has to be purged and after which delay, ...)
  • it will avoid potential conflicts if installed in a place where a mirroring tool is already running (otherwise both the mirroring tool and the super-seeder will need write access to the same location)
  • it will work even for hidden ZIM files / non-ZIM content if installed on download.kiwix.org

The drawbacks of option 2 are that:

  • we need to detect which files are available locally to add them to qBittorrent (I've checked, it is capable of handling already existing files); see the sketch after this list
  • we don't need to use the OPDS feed (I assume qBittorrent will check the file hash in any case before seeding it)
  • we need to detect which files have been removed so we can remove them from qBittorrent (but qBittorrent probably already handles this to some extent; it is quite common that users move downloaded files once the download is finished, and usually - at least in Transmission - the client does not restart the download)
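Regarding the first point, a minimal sketch (assuming the qbittorrent-api Python library suggested below, a hypothetical mirror path and the usual "<zim-url>.torrent" convention) of registering already-present files so qBittorrent only rechecks and seeds them instead of re-downloading:

```python
from pathlib import Path

import qbittorrentapi  # pip install qbittorrent-api

MIRROR_ROOT = Path("/srv/mirror/download.kiwix.org")  # assumed local mirror location
BASE_URL = "https://download.kiwix.org"

qbt = qbittorrentapi.Client(host="localhost", port=8080,
                            username="admin", password="adminadmin")

for zim in MIRROR_ROOT.glob("zim/**/*.zim"):
    rel = zim.relative_to(MIRROR_ROOT)
    # Assumption: every published ZIM has a matching .torrent at the same URL.
    torrent_url = f"{BASE_URL}/{rel}.torrent"
    # save_path points at the directory already holding the file, so qBittorrent
    # only rechecks the existing data before seeding it.
    qbt.torrents_add(urls=torrent_url, save_path=str(zim.parent),
                     is_skip_checking=False)
```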

Looking at the existing proposal, I can't comment much on it: it's a shell script, and that's clearly not a language I can comment on in depth. The overall logic is simple, so it looks like it will work. I don't know how many subtleties we might discover once running in real conditions.

I wonder if we should instead write this additional tooling in Python (a short sketch of the cleanup side follows this list), because:

  • the qBittorrent CLI project is not maintained anymore, while the Python library (https://github.com/rmartin16/qbittorrent-api) is maintained and very active (already supporting Python 3.12 and the latest qBittorrent release)
  • it is easier (for me at least) to develop / test / maintain
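As for the third drawback above (removing torrents whose files are no longer wanted), a small sketch of what this could look like with qbittorrent-api; the category name and the set of wanted files are placeholders:

```python
import qbittorrentapi  # pip install qbittorrent-api

qbt = qbittorrentapi.Client(host="localhost", port=8080,
                            username="admin", password="adminadmin")

# Placeholder: names still present in the feed / on the mirror.
wanted = {"wikipedia_en_all_maxi_2023-10.zim"}

# Assumption: super-seeder torrents are grouped under a dedicated category.
for torrent in qbt.torrents_info(category="kiwix-superseeder"):
    if torrent.name not in wanted:
        # delete_files=True also removes the data from disk; use False to keep it.
        qbt.torrents_delete(delete_files=True, torrent_hashes=torrent.hash)
```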

@stale stale bot removed the stale label Oct 3, 2023
@rgaudin
Member

rgaudin commented Oct 3, 2023

I think what this ticket misses is an (up-to-date) description of the problem it should solve, with examples of user scenarios.
The discussion already highlights that storage/selection/cleanup is core. It's important to clarify how our needs and those of Kiwix enthusiasts align, for instance.

@kelson42
Contributor Author

kelson42 commented Oct 14, 2023

> I don't get whether we want to:
>
> 1. download missing content locally to be able to serve it
> 2. take advantage of existing local content, meaning it has to be installed on a mirror
>
> It looks like the initial idea was option 2, but we have switched to option 1, and I don't know why.

We should be able to do both because:

> I see lots of advantages in option 2:
>
> * we already have tooling to mirror files; I don't see the benefit of re-implementing it with BitTorrent instead of rsync
> * it might save storage space and bandwidth for those who already have a mirror (and this is our case)
> * it will avoid complexity in our tooling (filtering what we want to download, deciding what has to be purged and after which delay, ...)
> * it will avoid potential conflicts if installed in a place where a mirroring tool is already running (otherwise both the mirroring tool and the super-seeder will need write access to the same location)
> * it will work even for hidden ZIM files / non-ZIM content if installed on download.kiwix.org

Creating an HTTP mirror requires a lot more infrastructure effort than creating a BitTorrent super-seeder; this is why the two solutions don't really compete in the same field. The most obvious requirements are a big and stable bandwidth and a fixed IP.

> The drawbacks of option 2 are that:
>
> * we need to detect which files are available locally to add them to qBittorrent (I've checked, it is capable of handling already existing files)

Yes, this should be trivial. I'm ready to reconsider the requirements if not.

> * we don't need to use the OPDS feed (I assume qBittorrent will check the file hash in any case before seeding it)

True, but that part is already implemented in the BitTorrent tracker; this is not really new work.

> * we need to detect which files have been removed so we can remove them from qBittorrent (but qBittorrent probably already handles this to some extent; it is quite common that users move downloaded files once the download is finished, and usually - at least in Transmission - the client does not restart the download)

True, I wonder if this part is not also handled by the BitTorrent tracker!

> Looking at the existing proposal, I can't comment much on it: it's a shell script, and that's clearly not a language I can comment on in depth. The overall logic is simple, so it looks like it will work. I don't know how many subtleties we might discover once running in real conditions.

If I remember correctly, I was almost done with the work and was just lacking time. I don't remember having faced big challenges linked to subtleties.

> I wonder if we should instead write this additional tooling in Python, because:
>
> * the qBittorrent CLI project is not maintained anymore, while the Python library (https://github.com/rmartin16/qbittorrent-api) is maintained and very active (already supporting Python 3.12 and the latest qBittorrent release)
> * it is easier (for me at least) to develop / test / maintain

Nothing against this; it should be fairly easy. I made it in Bourne shell because I didn't want to impose Perl, as I cannot write Python myself. Actually, this is probably even a good idea.

@kelson42
Contributor Author

For the following reasons I believe the effort of completing this PR would be really helpful:

  • We have recently invested a significant amount of resources to improve download speed
  • The audience is growing and we have HTTP mirrors which struggle
  • Kiwix Desktop (in particular the downloader) has made a significant quality jump with version 2.4.0; we would be ready there to better support BitTorrent (both download and upload)
  • Still, not all BitTorrent clients support Web seeds properly

For all these reasons, I believe we should now complete the super-seeder, so that it guarantees that downloading via BitTorrent works as well as we could expect.

@nemobis
Contributor

nemobis commented Oct 27, 2024

Nice to see some movement. I'm happy to help test this but I'll need some suggestions on what files to seed (the last times I tried to seed Kiwix torrents I failed to reach any meaningful ratio).

What I'd personally really like is a ruTorrent/transmission/other plugin to handle the addition and removal of torrents. That would be easy to install on top of any existing installation method, be it a web UI or a docker image.

@kelson42
Contributor Author

kelson42 commented Oct 27, 2024

> Nice to see some movement. I'm happy to help test this but I'll need some suggestions on what files to seed (the last times I tried to seed Kiwix torrents I failed to reach any meaningful ratio).

We have had our own BitTorrent tracker for around two years. Therefore, for any "famous" ZIM file, you will find peers to share bits with, not even counting the Web seeds.

This issue is only there to offer a guarantee that there is - at least - one peer (to download from).

> What I'd personally really like is a ruTorrent/transmission/other plugin to handle the addition and removal of torrents. That would be easy to install on top of any existing installation method, be it a web UI or a docker image.

To me this belongs to another issue, which remains to be opened. I was not even aware it was possible to create a plugin for such a purpose for Transmission.

@nemobis
Contributor

nemobis commented Oct 27, 2024

> Therefore, for any "famous" ZIM file, you will find peers to share bits with, not even counting the Web seeds.

Ok. I've never had trouble finding peers via DHT or the public trackers. I just never get anyone leeching these days, probably because the web seeds are so fast. So I have no idea what to seed.

> To me this belongs to another issue, which remains to be opened.

Maybe. OTOH it's compatible with the idea you wrote above:

> The script will retrieve the list of ZIM files to mirror in the super-seeder from the OPDS feed

The "script" can be implemented as something that uses the transmission/rtorrent/other RPC.
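For illustration, a minimal sketch of that approach against Transmission's RPC using the transmission-rpc Python package; the connection details and the torrent URL are placeholders:

```python
from transmission_rpc import Client  # pip install transmission-rpc

# Placeholder connection details for a local Transmission daemon.
client = Client(host="localhost", port=9091)

# Placeholder torrent URL; in practice it would come from the OPDS feed or a mirror listing.
torrent_url = "https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim.torrent"
client.add_torrent(torrent_url, download_dir="/data/zims")
```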
