
rclone Integration #5324

Open
buengese opened this issue Sep 4, 2020 · 13 comments

Comments

@buengese

buengese commented Sep 4, 2020

Hi,
I'm one of the developers of rclone, a command-line tool written in Go for interacting with various cloud storage providers. I'm wondering whether the team here is interested in some sort of integration with rclone.
We have a number of users who use borg with a cloud storage provider mounted via rclone, with varying levels of success. Even though rclone's VFS has matured considerably over time, there is still significant overhead associated with it, so a more direct integration would be beneficial. rclone already integrates with restic through a custom HTTP/2 API that we run over stdio. I've only had a quick look at borg's remote repository protocol and am not yet sure whether it would be viable to implement something based on it on rclone's side.

Given the rather obvious absence of any cloud storage integration on borg's side, which may have been a conscious decision, I'd like to know whether this is something you would in principle be interested in before looking into it any further.
buengese

@ThomasWaldmann
Member

I guess it would be interesting if it could be done without many changes to the existing Repository and RemoteRepository classes, e.g. by adding yet another class and triggering its use via the repo URL.
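A minimal sketch of what "triggering its use via repo URL" could look like: the repository class is picked from the URL scheme. `Repository` and `RemoteRepository` below are empty stand-ins, not borg's real classes, and `RcloneRepository`, `REPOSITORY_CLASSES` and `open_repository` are hypothetical names. borg's scp-style `user@host:path` syntax is deliberately not handled here.

```python
from urllib.parse import urlsplit


class Repository:
    """Stand-in for borg's local Repository (hypothetical stub)."""

    def __init__(self, location: str) -> None:
        self.location = location


class RemoteRepository(Repository):
    """Stand-in for borg's ssh-based RemoteRepository (hypothetical stub)."""


class RcloneRepository(Repository):
    """Hypothetical extra class that would talk to rclone instead of borg serve."""


# URL scheme -> repository class; an empty scheme means a plain local path.
REPOSITORY_CLASSES = {
    "": Repository,
    "file": Repository,
    "ssh": RemoteRepository,
    "rclone": RcloneRepository,
}


def open_repository(url: str) -> Repository:
    """Instantiate the class registered for the URL's scheme."""
    scheme = urlsplit(url).scheme
    try:
        cls = REPOSITORY_CLASSES[scheme]
    except KeyError:
        raise ValueError(f"unsupported repository URL scheme: {scheme!r}") from None
    return cls(url)
```

The appeal of a lookup table is that the existing classes stay untouched; adding a backend is one new class plus one entry.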

I can help if there are questions about borg internals (that are not already answered in the internals docs), but I don't use cloud storage myself.

@buengese
Author

buengese commented Sep 9, 2020

Thanks for the info. I'm going to look into how far this is possible with minimal changes to the current RemoteRepository setup.

@dragetd
Contributor

dragetd commented Oct 4, 2020

Some thoughts I want to drop here, without being deep in the code base of either project. Please excuse me if I misunderstand concepts; if that is the case, I will delete my post. :) Background: long-time borg and rclone user, currently using borg via an rclone FUSE mount, including all the pain. :P

borg has a serve mode in which it takes commands via SSH and acts as the 'storage engine'. Having code at the receiving end allows some operations to be done on the server side without round trips to the client. This, of course, is not possible with typical cloud storage.

So one would not be able to use this architecture and would need to hook into the storage layer only. That prevents some optimizations, but it is a logical consequence.

As far as I know, there is no big abstraction in borg's storage/file-I/O code in general. Given the huge number of options rclone has, I think a tight integration might be difficult here. At the same time, rclone already has an architecture for exposing backends through other protocols (e.g. SFTP) via its own serve command. One idea I had was to decouple things: rclone runs as its own process and provides a protocol (backend) via rclone serve, which we integrate into RemoteRepository on the borg side, or we add a different kind of RemoteRepository.
Borg on the client side takes care of deduplication, hashing, etc., and the borg serve side 'only' needs to support some repository operations.

A typical borg remote repository is configured as ssh://user@host:port/path/to/repo (or without the explicit protocol, but I'd keep it here). Instead, the user could specify either
ssh+rclone://user@host:port/remote/path/to/repo or
rclone:///remote/path/to/repo

If ssh+rclone is specified, borg connects to the remote host and starts borg serve as usual. If only rclone is specified, borg would also start borg serve, but would need to skip SSH and maybe use TCP locally to talk to it. Alternatively, we would support only the ssh+rclone variant.

borg would look for rclone in the PATH and then strip the first component of the path, using it as the remote, which has to be configured in rclone. Then rclone is launched with a new borg-specific backend, which is passed the remote to use and the path of the repository. The new borg backend also checks whether the remote supports everything we need (without seeking/appending, it would be a PITA).
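A minimal sketch of parsing the two proposed URL forms, assuming the first path component names the rclone remote as described above. `RcloneLocation` and `parse_rclone_url` are hypothetical names; `rclone_path` uses rclone's usual `remote:path` notation for addressing a path on a configured remote.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class RcloneLocation:
    ssh_user: Optional[str]  # None for the local rclone:/// form
    ssh_host: Optional[str]
    ssh_port: Optional[int]
    remote: str              # first path component = configured rclone remote
    path: str                # repository path inside that remote

    @property
    def rclone_path(self) -> str:
        """Path in rclone's remote:path notation."""
        return f"{self.remote}:{self.path}"


def parse_rclone_url(url: str) -> RcloneLocation:
    """Parse ssh+rclone://user@host:port/remote/path or rclone:///remote/path."""
    parts = urlsplit(url)
    if parts.scheme not in ("rclone", "ssh+rclone"):
        raise ValueError(f"not an rclone URL: {url!r}")
    if parts.scheme == "rclone" and parts.netloc:
        raise ValueError("rclone:/// URLs must not contain a host")
    if parts.scheme == "ssh+rclone" and not parts.hostname:
        raise ValueError("ssh+rclone:// URLs need a host")
    # Strip the first path component and use it as the rclone remote name.
    remote, _, path = parts.path.lstrip("/").partition("/")
    if not remote:
        raise ValueError("missing rclone remote name in path")
    return RcloneLocation(
        ssh_user=parts.username,
        ssh_host=parts.hostname,
        ssh_port=parts.port,
        remote=remote,
        path=path,
    )
```

For example, `parse_rclone_url("ssh+rclone://user@host:2222/gdrive/backups/borg").rclone_path` gives `gdrive:backups/borg`, with the SSH fields available for starting borg serve on `host`.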

This rclone backend would have to reimplement the repository configuration and locking mechanisms, the latter probably being a major PITA along the way. What is left is mapping the repository operations onto however rclone backends work.

That is basically the idea for working around some of the architectural challenges we have right now in borg. The idea of a more flexible server-side implementation that supports different protocols has been floating around for a while, but was never picked up. The loose coupling as I described it might reduce the amount of refactoring needed in borg. So I personally would be okay with hacking something rclone-specific into a new 'RemoteRepository' implementation in borg rather than trying to design a general, flexible backend architecture here.

PS: As I said, I am not deep in either codebase. But maybe someone would be interested in bouncing around ideas in an online video session. I can offer a BigBlueButton instance (browser-based, no login, privacy-friendly, open source) and would be available tomorrow (Monday, 5 October) around 21:00 CEST. Drop a note!

[EDIT]
Okay, after talking with someone on IRC and reading up on a few things: the serve 'protocol' is actually simpler than I thought, so maybe a direct 'serve' integration would work.

@buengese
Author

buengese commented Oct 7, 2020

@dragetd Great to see someone else is interested in this. Sorry for getting back to you this late. I had already looked at borg's RemoteRepository, and it's definitely feasible to implement this in rclone. I think the current approach used by RemoteRepository is quite viable. As far as I can tell, it works by passing msgpack data via stdio, through the SSH tunnel, to another instance of borg. Most of the repository logic is on the client side. rclone already does something similar for restic (another backup tool): the communication also goes through stdio, but uses HTTP/2, with rclone running as a separate process.
Admittedly, I haven't worked on this any further despite announcing my intention to do so over a month ago (other projects got in the way). I'd still like to get to it soon, hopefully this weekend, especially now that there is interest from others as well.
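A minimal sketch of the request/response pattern over a pair of byte streams, as one would use on a child process's stdin/stdout. To stay within the standard library it swaps borg's msgpack encoding for newline-delimited JSON, and the message fields (`id`, `method`, `args`, `result`, `error`) are hypothetical, not borg's actual wire format. `io.BytesIO` stands in for the pipes.

```python
import io
import json
from typing import Any, BinaryIO, Callable, Dict


def send(stream: BinaryIO, message: Dict[str, Any]) -> None:
    """Write one message as a single JSON line (msgpack would be binary instead)."""
    stream.write(json.dumps(message).encode("utf-8") + b"\n")


def receive(stream: BinaryIO) -> Dict[str, Any]:
    """Read one JSON-line message; an empty read means the peer hung up."""
    line = stream.readline()
    if not line:
        raise EOFError("peer closed the connection")
    return json.loads(line)


def serve_one(requests: BinaryIO, responses: BinaryIO,
              handlers: Dict[str, Callable[..., Any]]) -> None:
    """Handle a single request: look up the method, call it, send back the result."""
    request = receive(requests)
    try:
        result = handlers[request["method"]](**request["args"])
        send(responses, {"id": request["id"], "result": result})
    except Exception as exc:  # report the failure to the client instead of dying
        send(responses, {"id": request["id"], "error": type(exc).__name__})


if __name__ == "__main__":
    store = {"abc": "hello"}
    to_server, to_client = io.BytesIO(), io.BytesIO()
    send(to_server, {"id": 1, "method": "get", "args": {"key": "abc"}})
    to_server.seek(0)
    serve_one(to_server, to_client, {"get": lambda key: store[key]})
    to_client.seek(0)
    print(receive(to_client))
```

In a real setup, `requests` and `responses` would be the pipes of a `subprocess.Popen(..., stdin=PIPE, stdout=PIPE)` child (possibly reached through ssh), and the server would call `serve_one` in a loop.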

@enkore
Copy link
Contributor

enkore commented Oct 14, 2020

The serve protocol is pretty simple, as you noted, and also fairly stable (because of backwards-compatibility with old borg versions).

One potential drawback of adding repository drivers like this, which came up way back then, was that you would probably end up with totally or slightly different repository formats, so migrating a repository that uses driver A to one using driver B would need support from borg (a low-level, object-for-object copy, which is fairly simple, unlike replication, but it would still have to be added).
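A minimal sketch of such an object-for-object copy, assuming each driver exposes some way to list object IDs and get/put raw objects. The `ObjectStore` protocol and `DictStore` are hypothetical stand-ins, not borg APIs. The copy moves each object verbatim under the same ID, without decrypting or re-chunking anything; a real migration would also have to deal with repository config, manifest and transactions.

```python
from typing import Dict, Iterable, Protocol


class ObjectStore(Protocol):
    """Minimal object-level interface a repository driver might expose (assumed)."""

    def list(self) -> Iterable[bytes]: ...
    def get(self, object_id: bytes) -> bytes: ...
    def put(self, object_id: bytes, data: bytes) -> None: ...


class DictStore:
    """In-memory stand-in for a driver-backed repository."""

    def __init__(self) -> None:
        self.objects: Dict[bytes, bytes] = {}

    def list(self) -> Iterable[bytes]:
        return list(self.objects)  # builtin list(): snapshot of the keys

    def get(self, object_id: bytes) -> bytes:
        return self.objects[object_id]

    def put(self, object_id: bytes, data: bytes) -> None:
        self.objects[object_id] = data


def copy_objects(src: ObjectStore, dst: ObjectStore) -> int:
    """Copy every object verbatim (same ID, same bytes); return how many were copied."""
    copied = 0
    for object_id in src.list():
        dst.put(object_id, src.get(object_id))
        copied += 1
    return copied
```

Because nothing is re-encoded, the copy does not need the repository key, which is what keeps it "fairly simple" compared to replication.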

IIRC back then the idea was to just use "xyz://..." which would invoke something like "borg-driver-xyz" (=> loosely coupled binaries) with the rest of the URI as the sole parameter and use that for RPC.
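A minimal sketch of that loose coupling, turning `xyz://...` into a command line for a `borg-driver-xyz` binary found in `PATH`. Reading "the rest of the URI" as everything after `xyz://` is an assumption, and `driver_command` is a hypothetical helper; the `which` parameter only exists so the lookup can be replaced in tests.

```python
import re
import shutil
from typing import Callable, List, Optional

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".".
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def driver_command(url: str,
                   which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Map "xyz://rest" to ["/path/to/borg-driver-xyz", "rest"]."""
    scheme, sep, rest = url.partition("://")
    if not sep or not _SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"not a driver URL: {url!r}")
    name = f"borg-driver-{scheme.lower()}"
    path = which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found in PATH")
    # The caller would then start it, e.g. subprocess.Popen(cmd, stdin=PIPE,
    # stdout=PIPE), and speak the RPC protocol over its stdin/stdout.
    return [path, rest]
```

The same argv could just as well be prefixed with `ssh host` or a Qubes exec wrapper, which is what makes separate binaries attractive.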

There were also some daydreams about this then being able to be routed through qubes exec and such with basically no effort, which would have made backups of Qubes machines wayyyyyyy easier (their built-in backup was crap at the time).

@luminoso

Is there any progress on this?

@ThomasWaldmann
Member

not AFAICS.

@enkore
Contributor

enkore commented Dec 27, 2020

What needs to be done from my PoV to make this happen:

  1. Take a stern look at RemoteRepository and the server. There are a bunch of legacy hacks and workarounds in there. Try to figure out how to get rid of them for a "clean & stable" RPC protocol.
  2. Document all RPC calls in black-box form (at the level of msgpack messages). Document the transaction model implied by those RPC calls.
  3. Implement foobar:// invoking borg-driver-foobar.

1.) is the hard part, 2.) is a bunch of legwork, 3.) is easy.
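As an illustration of what "documenting the transaction model" might pin down, here is a toy model in which writes are staged until a commit and discarded on rollback. This is a hypothetical sketch for discussion, not borg's actual repository semantics; those are exactly what step 2 would have to establish.

```python
from typing import Dict, Optional


class ToyTransactionalStore:
    """Toy model: put/delete are staged until commit(); rollback() drops them."""

    _DELETED = object()  # marker for a staged delete

    def __init__(self) -> None:
        self._committed: Dict[bytes, bytes] = {}
        self._pending: Dict[bytes, object] = {}

    def put(self, key: bytes, data: bytes) -> None:
        self._pending[key] = data

    def delete(self, key: bytes) -> None:
        if self.get(key) is None:
            raise KeyError(key)
        self._pending[key] = self._DELETED

    def get(self, key: bytes) -> Optional[bytes]:
        # Reads inside a transaction see that transaction's own uncommitted writes.
        if key in self._pending:
            value = self._pending[key]
            return None if value is self._DELETED else value
        return self._committed.get(key)

    def commit(self) -> None:
        for key, value in self._pending.items():
            if value is self._DELETED:
                self._committed.pop(key, None)
            else:
                self._committed[key] = value
        self._pending.clear()

    def rollback(self) -> None:
        self._pending.clear()
```

Questions a real specification would answer, and this toy glosses over, include what a crash between `put` and `commit` leaves behind and whether two clients may hold open transactions at once.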

@git70

git70 commented Aug 26, 2022

Hi

Two questions:

  1. Any updates on this topic?
  2. Do I understand correctly that the end result of this case may be similar to Restic's native Rclone support?
    https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#other-services-via-rclone

I love Borg and I'm not going to betray him ;)
I just want to complete Borg with some cool features for me :)

@ThomasWaldmann
Member

@git70

  1. Not as far as I know.
  2. Maybe, but I'm not sure whether that can be done with what we currently have.

The easiest workaround is to just use a remote with borg support: BorgBase, Hetzner Storage Box, rsync.net, your own machine, ...

@git70

git70 commented Aug 26, 2022

A bit disappointing, but I understand that there may be serious reasons and it may be too much work :(
The point is, I have a lifetime account on pCloud, so I wouldn't have to pay for other services separately.
Regards!

@ThomasWaldmann
Member

ThomasWaldmann commented Aug 19, 2024

After #8332 is finished / merged, integrating rclone would likely become much easier.

Maybe it could be done in a similar way as with restic:

  • borgstore would need an HTTP REST client as a backend; that does not exist yet (file: and sftp: do exist)
  • rclone would need an HTTP REST server similar to the one it has for restic
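A minimal sketch of the client half, assuming a simple object-per-URL layout where store/load/delete map onto HTTP PUT/GET/DELETE. The class names, method names and URL layout are assumptions for illustration, not the API of borgstore or of any existing rclone server; a tiny in-memory server is included only so the client can be exercised locally.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Optional


class RestStoreClient:
    """Hypothetical REST backend: one URL per object, mapped onto GET/PUT/DELETE."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        # Empty ProxyHandler: ignore *_proxy environment variables in this sketch.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def _request(self, method: str, name: str, data: Optional[bytes] = None) -> bytes:
        req = urllib.request.Request(f"{self.base_url}/{name}", data=data, method=method)
        with self._opener.open(req) as resp:
            return resp.read()

    def store(self, name: str, value: bytes) -> None:
        self._request("PUT", name, value)

    def load(self, name: str) -> bytes:
        try:
            return self._request("GET", name)
        except urllib.error.HTTPError as exc:
            if exc.code == 404:
                raise KeyError(name) from None
            raise

    def delete(self, name: str) -> None:
        self._request("DELETE", name)


class InMemoryHandler(BaseHTTPRequestHandler):
    """Toy server standing in for a future REST endpoint (e.g. one served by rclone)."""

    objects: Dict[str, bytes] = {}  # class-level, so it is shared by all requests

    def do_PUT(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self.objects[self.path] = self.rfile.read(length)
        self.send_response(204)
        self.end_headers()

    def do_GET(self) -> None:
        body = self.objects.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_DELETE(self) -> None:
        if self.objects.pop(self.path, None) is None:
            self.send_error(404)
            return
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep the demo quiet


def start_test_server() -> ThreadingHTTPServer:
    """Serve InMemoryHandler on a free localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), InMemoryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping the client this thin means all backend-specific logic lives in the server, which matches the restic model where rclone does the talking to the cloud provider.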

In borg 2.0 (current master branch), things are still a bit complicated as it needs to support old borg repos also, at least as a source for borg transfer.

borg 2.1 is planned as the version when we get rid of all the borg < 2 legacy and remove a lot of code that won't be needed any more (old crypto, old repo code, ...).

@ThomasWaldmann
Member

@buengese Did you already have a look?

#8332 is merged, and the borgstore repo has a PR for a REST client, but there is no REST server yet.


6 participants