Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce zero merge conflicts #172

Closed
joshuakarp opened this issue May 28, 2021 · 10 comments
Closed

Enforce zero merge conflicts #172

joshuakarp opened this issue May 28, 2021 · 10 comments
Assignees
Labels
design Requires design r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management

Comments

@joshuakarp
Copy link
Contributor

joshuakarp commented May 28, 2021

Created by @CMCDragonkai

I want to explore the idea of making Polykey NEVER have a merge conflict.

How do we do this? Well we must limit the control surface of pushes/pulls and changes so that this can never occur.

One idea is this. As soon as a vault has been pushed to a keynode. If that keynode changes that vault. It doesn't mutate that vault. Instead what happens is that the vault is forked, and represents a new vault.

This idea is actually similar to persistent data structures. That is if every vault is a chain of states/changes. Then that makes it a linked list. Any subsequent change doesn't mean the old version of the data structure is gone.

In Haskell and in js-resource-counter we have persistent data structures. Sharing a linked list between 2 functions that change the list doesn't affect each other.

This also means you can never rewrite history. You can only go ahead. Vaults are there append-only hash chains (linked lists).

This also means visible vaults are like garbage collected "handles". Or views of the data structures. You can view at any point.

Then pushing a vault from source keynode to target keynode is about appending new changes to a particular vault view.

All the vaults can then form a tree. But any specific vault forms a linear chain back to the beginning.

This should prevent any possibility of merge conflicts.

Other ideas include CRDTs.

This may impact how we identify vaults. Instead of randomly generated vaultids. They are some deterministically generated identifier of a particular position in the tree of commits.


Resolved in https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205

@joshuakarp
Copy link
Contributor Author

@CMCDragonkai : This means mutating a vault that isn't "owned" by you should fork a new branched vault and you never merge back. This behaviour needs to be verified @DrFacepalm @scottmmorris

@CMCDragonkai CMCDragonkai added this to the Polykey CLI 1.0.0 Release milestone Jul 3, 2021
@CMCDragonkai CMCDragonkai removed their assignment Jul 3, 2021
@CMCDragonkai CMCDragonkai added design Requires design development Standard development and removed development Standard development labels Jul 3, 2021
@CMCDragonkai
Copy link
Member

Consider that the GitHub wiki has the same problem if there are concurrent editors. I wonder how GitHub wiki enforces zero merge conflicts, since there is no interface to fix merge conflicts if 2 people edit the same wiki page. Good idea to investigate the GitHub wiki and see what we should do our vaults if this occurs.

@CMCDragonkai
Copy link
Member

While reviewing the Vault class right now in client-refactoring, I can see that it's quite high level doesn't expose the underlying fs interface much at all. If we later want to expose the vault has a mountable filesystem, this will need to be changed later.

Actually I might have to change how fs operations translate to git commits quite significantly...

@CMCDragonkai
Copy link
Member

Designing this requires completing these MRs:

@CMCDragonkai CMCDragonkai self-assigned this Jul 14, 2021
@CMCDragonkai
Copy link
Member

Copying this comment which is relevant to this design:

Vaults are an abstraction on top of EFS.

Right now we are creating an explicit set of operations for a vault.

But there is a more flexible way to do this.

Instead expose the entire EFS, so it can be used like a normal FS.

The main difference is that all FS operations must be wrapped together to form a "transaction".

That transaction is then committed via the iso git integration.

Here's an example:

vault.commit((fs) => {
  await promisify(fs.readFile).bind(fs)();
  await promisify(fs.writeFile).bind(fs)();
  // ...
});

Right now there's no promise interface that EFS nor VFS exposes. So for now you have to use promisify or callback chaining.

The commit method automatically takes all changes there and turns it into a Git commit.

This requires a few things:

  • the fs object passed in has to be a sort of proxy on top of EFS
  • the proxy has to allow us to "observe" what the changes are, this is what allows to make an "automatic git commit message"
  • the fs interface has to be limited right now, in particular we should not provide access to the underlying .git directory
  • the translation of high level commands like pk secrets put vault:secretfile would then have to be done at the highest level, like in client service

Given the above interface is a "callback" style, I don't think this is supported in GRPC yet, so it's not possible to have the CLI/GUI pass a "callback" to the server to act on. The closest thing would be to create a client to server stream, where the client streams commands to the server, and the server translates these into a callback. But this is overkill atm. (Alternatively the client constructs a serialized command language, and sends it over to the server to execute (like having a SQL thing)).

So the final effect of this is that the Vault methods are still really general, and that actual translation of high level commands from CLI/GUI to fs manipulation happens in clientService. This opens up the possibility of a future fs manipulation language if needed, and potentially for the agent to mount a vault on the Unix/Windows filesystem similar to how keybase does things.

This means for now the fs object can be limited in its type interface, but retain the full object capability of an efs instance.

In terms of limiting access to .git..., this would have to be figured out later, our proxies can be tracing the calls and preventing access to it, maybe we can put the .git somewhere else, but not sure what is a good method for this.

Creating a proxy to EFS can be done either by extending the EFS class instance, or by applying an ES6 proxy, in such a case, we are mostly recording what operations are being performed.

Alternatively we don't even use the proxy, but allow any changes to occur, and then use isogit to just capture the changes, and isogit to tell us what files changed and construct the commit message from that.

@scottmmorris
Copy link
Contributor

'isomorphic-git` does not allow merge conflicts either, so a simple try and catch statement when pulling has been added.

As far as I can see the only case where merge conflicts occur is when updateSecret is used on ANY file in the vault has been changed and the content is different to the latest commit on the remote.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jul 23, 2021 via email

@scottmmorris
Copy link
Contributor

Right, that case is just for pulling from the original repository. I'll make more edits once I tests with vaults that are not the original repository

@CMCDragonkai
Copy link
Member

The challenge of concurrent edits to the vault needs to be considered.

If there are multiple CLI calls or GUI calls or GRPC calls to the same vault, and they are editing different files, the commits should be tracked against the files being edited and we also need to consider globbing here. Then the commits can be serialised one at a time.

We need to ensure that isogit actually properly locks when it commits. If it doesn't, we can use an async-mutex for each vault to ensure that commits are serialised without inconsistency.

This means that each command such as in MatrixAI/Polykey-CLI#32 must keep track of all files and directories edited, and those are the ones added into the commit.

What happens if 2 commands are editing the same file? How is this sorted in GitHub/GitLab wiki? It turns out that each edit to the wiki is a complete update to the file, and there are no merged to concurrent editing. See: https://gitlab.com/gitlab-org/gitlab/-/issues/200002 and https://gitlab.com/gitlab-org/gitlab-foss/-/issues/1827.

echo 'abc abc' | pk secrets write vault1:/abc &
echo 'blah blah' | pk secrets write vault2:/abc &

In the case of PK how is this sorted out? At this point this probably depends on the concurrency of the actual EFS, how the changes are applied there. There may be some clobbering here. But then how does one commit if the file changes are already clobbered and committed in the first case?

One quick way of resolving this is just prevent parallel edits to the same vault. If we add a global lock to the vault entirely and synchronise edits this way. But this seems heavy weight. Let's start with a whole vault lock first, and then let's see what we can do subsequently.

@CMCDragonkai
Copy link
Member

This was done by merging of vaultsrefactoring https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205.

Note that making pulled vaults remote should still be tested for. Docs will be MatrixAI/Polykey-Docs#3. @joshuakarp please make sure to indicate any missing tests with regards to this issue.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management label Jul 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Requires design r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management
Development

No branches or pull requests

3 participants