Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure Session token remains safe with multiple processes. #236

Closed
2 of 3 tasks
tegefaulkes opened this issue Sep 1, 2021 · 31 comments
Closed
2 of 3 tasks

Ensure Session token remains safe with multiple processes. #236

tegefaulkes opened this issue Sep 1, 2021 · 31 comments
Assignees
Labels
development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management

Comments

@tegefaulkes
Copy link
Contributor

tegefaulkes commented Sep 1, 2021

An MR has been created for this issue here: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/212

Specification

The Session Session.writeToken in src/sessions/Session.ts:116 makes use of proper-lockfile. However we need to create a test to make sure this functions properly when multiple processes are attempting to update the token file.

The proper-lockfile library is fairly old, and is more complex than what we require. As such, we should switch to using fd-lock instead.

fd-lock only provides two methods: lock and unlock. There is no explicit method to check whether a file is already locked, however, since a file can only be locked if it is not already locked then we can simply attempt to lock the file - if we fail, the file is already locked.

Additional context

imagine 2 calls

pk dosomething1 # uses T1 token (T1 token changes to T2)
pk dosomething2 # uses T2 token

that's in the case of a serial calls but in parallel calls

pk dosomething1 & # uses T1 token (changes to T2 when it finishes)
pk dosomething2 & # uses T1 token as well 
(tries to change to T3, but drops the change when it realises that T2 locking

so at the end of parallel calls, the token state us still T2, but that's fine as T2 is still valid, and T3 is simply dropped

Information on fd-lock can be found here.

Tasks

  1. Explore the use of fd-lock as an alternative to proper-lockfile
  2. Create test for multiple processes attempting to update session and token file (in tests/client and/or tests/bin).
  3. Confirm that the locking functions as expected.
@tegefaulkes tegefaulkes added the development Standard development label Sep 1, 2021
@tegefaulkes
Copy link
Contributor Author

@CMCDragonkai
Copy link
Member

The primary thing to understand here is that the multi-process situation here is about the PK client, no the PK agent.

We have a client-server architecture now between clients and agents.

Every single CLI call pk ... is a separate execution of the PK client process.

This means there are going to be multiple PK client processes running simultaneously.

The PK GUI is also a PK client, it's just a long-running one.

Therefore we need to ensure that the session token usage by the PK clients are safe across multiple simultaneous process executions and that means we need simulation tests inside tests/client or tests/bin to ensure that our session tokens are not clobbered up.

@CMCDragonkai
Copy link
Member

A diagram should be made for this and ported to the sessions wiki that shows the component of multiple clients, a single agent, and the usage of a common session key stored at the HOME directory or client state directory that is being used to authenticate.

A second component diagram can be used to represent a multi-step process (similar to a sequence diagram) but with labelled ordinal arrows.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 27, 2021

This should illustrate the entire architecture.

image

In addition to this, one should use the same diagram, but illustrate a labelled ordinal actions.

I have to redo the diagram cause asciiflow broke.. so the above is a snapshot.

The labelled ordinal action diagram is actually a "top-down" sequence diagram. It's the application of the sequence of steps to a component diagram. An example is this: https://datatracker.ietf.org/doc/html/rfc6749#section-1.2. The advantage of using this over a sequence diagram are:

  • More clearer mapping of the sequence to the components - can aid mental mapping
  • Is clearer for multiple components, whereas normal sequence diagram would have to cross lines (avoid line crossing in diagrams)

Disadvantage is:

  • Cannot be used for very complex sequences, there will be too many arrows

So the normal sequence diagram is better for fewer agents and complex sequences and sequence nesting. While top down sequence is better for multiple agents, and in order to map the a high level sequence to multiple components.

@CMCDragonkai
Copy link
Member

Because each Session shares the ~/.polykey/client/token, they need to use a lock to coordinate access. This lock is primarily intended for when the token is updated.

The token is updated on every single call. The PK agent returns a new valid token to be used.

In our case, we expect that any call holds a lock to the token, and if another process discovers that the lock is held, it simply drops and cancels the session token update request. That way the token is only updated by the first process that is starting a call, one the process finishes their call, they should release the token file lock. All of this logic should be inside the Session class.

@CMCDragonkai
Copy link
Member

This issue is independent of #240, however this should be done first as it in relation to correctness of our system.

And #240 is an enhancement of the UX of using the CLI/GUI.

@CMCDragonkai
Copy link
Member

Diagram fixed up and with extra details.

                                       ┌────────────────┐
                                       │ SessionManager │  SessionManager generates and verifies session tokens
Client on different host/user          └────────┬───────┘
                                                │
    ┌───────────────┐                   ┌───────┴──────┐
    │ PolykeyClient ├───────────────────► PolykeyAgent │   Agent accepts authenticated requests
    └───────┬───────┘                   └───────▲──────┘
            │                                   │
       ┌────┴────┐            ┌─────────────────┼─────────────────┐
       │ Session │            │                 │                 │
       └─────────┘    ┌───────┴───────┐ ┌───────┴───────┐ ┌───────┴───────┐
                      │ PolykeyClient │ │ PolykeyClient │ │ PolykeyClient │  Multiple CLI & GUI clients on same host/user
                      └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
                              │                 │                 │
                         ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
                         │ Session │       │ Session │       │ Session │     Session manages token lifecycle
                         └────┬────┘       └────┬────┘       └────┬────┘
                              │                 │                 │
                              └─────────────────┼─────────────────┘
                                                │
                                   ┌────────────┴────────────┐
                                   │ ~/.polykey/client/token │  Concurrency managed via file lock
                                   └─────────────────────────┘

                         Shared session token for authenticated requests

@CMCDragonkai
Copy link
Member

The https://github.com/mafintosh/fd-lock library is a much simpler locking library. Since the concurrency management only requires you to check if the lock is in fact locked and not have to wait/block for the lock to be unlocked, then it can be a much simpler library to use instead of proper-lockfile. BTW I don't think proper-lockfile uses fd-lock.

The proper-lockfile is much older, bigger attack surface. So it's better to use something simpler.

@emmacasolin
Copy link
Contributor

@CMCDragonkai I've remade your diagram above with sequence numbers included. This is my understanding of what safety of the session token should look like across multiple processes. Please let me know if I'm off about anything.

┌─────────────────────────────────────────┐
│              SessionManager             │
└──────▲─┬────────────────────────────────┘
       │ │
       │ │
┌──────┴─┴────────────────────────────────┐
│                                         ├────────────┐
│               PolykeyAgent              │            │
│                                         ◄─────────┐  │
└────┬─┬─┬────────────────────┬───────────┘         │  │
     │ │ │                    │                     │  │
     │ │ │                    │                     │  │
     1 2 3                    4                     6  │ Client on different
     │ │ │                    │                     │  │      host/user
     │ │ │                    │                     │  │
 ┌───┴─┴─┴───────┐        ┌───┴───────────┐        ┌┴──┴───────────┐
 │ PolykeyClient │        │ PolykeyClient │        │ PolykeyClient │
 └───┬─┬─┬───┬───┘        └───┬───▲───┬───┘        └───┬───▲───┬───┘
     │ │ │   │                │   │   │                │   │   │
    ┌┴─┴─┴───┴┐              ┌┴───┴───┴┐              ┌┴───┴───┴┐
    │ Session │              │ Session │              │ Session │
    └┬───┬───┬┘              └┬───┬───┬┘              └┬───┬───┬┘
     │   │   │                │   │   │                │   │   │
     │   │   │                │   │   │                │   │   │
     │   │   11               │   5   9                │   │   │
     │   │   │                │   │   │                7   8   10
     ▼   │   │                │   │   ▼                │   │   │
  (empty)│   │                │   │(locked)            │   │   │
     ┌───▼───▼────────────────▼───┴───┐                │   │   │
     │                                ◄────────────────┘   │   │
     │                                │                    │   │
     │    ~/.polykey/client/token     ├────────────────────┘   │
     │                                │                        │
     │                                │(locked)◄───────────────┘
     └────────────────────────────────┘
  1. PolykeyAgent starts a new PolykeyClient and associated Session to run a command. The session checks ~/.polykey/client/token and finds either no token or an invalid token.
  2. Session requests a new token from the SessionManager.
  3. SessionManager provides a new token, which is passed back through to the Session, which puts it in ~/.polykey/client/token, locking the file. Now the PolykeyClient can begin running its asynchronous command.
  4. While the first command is running a second command is initiated, so the PolykeyAgent initialises a second PolykeyClient and associated Session. This Session checks ~/.polykey/client/token and finds a valid token.
  5. The valid token is passed to the second PolykeyClient and the second asynchronous command can also commence.
  6. While both of the first two commands are still running, a Client on a different host sends an authenticated request to the PolykeyAgent.
  7. The PolykeyAgent responds to this third request and the third PolykeyClient requests a token from its Session. The Session subsequently checks ~/.polykey/client/token for a valid token.
  8. A valid token is returned and the third PolykeyClient can run its command.
  9. The second PolykeyClient has now finished running its command and attempts to refresh the shared session token in ~/.polykey/client/token, however, since the file was locked by the first PolykeyClient's Session it drops this request to refresh the token.
  10. Similarly, when the third PolykeyClient attempts to refresh the session token it also drops the request upon discovering that the file is locked.
  11. The first PolykeyClient finishes running its command. It unlocks ~/.polykey/client/token and refreshes the token.

Note that the second and third PolykeyClients must also lock ~/.polykey/client/token while their own commands are running to prevent the token from being refreshed by any other command that finishes before their own. The session token should only be refreshed by the last Client to finish using it. In this sense, EVERY command needs to lock ~/.polykey/client/token when it starts and unlock it when it ends, so we need to be able to have multiple, independent locks on the session token such that it can only be refreshed when ALL of the locks are released.

@CMCDragonkai
Copy link
Member

Just a quick comment, in this diagram, you don't need the other 2 PolykeyClient. Focus on just the interaction between client and agent. Then you can bring up the second client to show the locking interaction.

@emmacasolin
Copy link
Contributor

So just like this? Or are the arrows labelled 4 and 5 also not necessary?

┌────────────────────────────────────────┐
│             SessionManager             │
└─────▲─┬────────────────────────────────┘
      │ │
      │ │
┌─────┴─┴────────────────────────────────┐
│                                        │
│              PolykeyAgent              │
│                                        │
└───┬─┬─┬────────────────────┬───────────┘
    │ │ │                    │
    │ │ │                    │
    1 2 3                    4
    │ │ │                    │
    │ │ │                    │
┌───┴─┴─┴───────┐        ┌───┴───────────┐
│ PolykeyClient │        │ PolykeyClient │
└───┬─┬─┬───┬───┘        └───┬───▲───┬───┘
    │ │ │   │                │   │   │
   ┌┴─┴─┴───┴┐              ┌┴───┴───┴┐
   │ Session │              │ Session │
   └┬───┬───┬┘              └┬───┬───┬┘
    │   │   │                │   │   │
    │   │   │                │   │   │
    │   │   7                │   5   6
    │   │   │                │   │   │
    ▼   │   │                │   │   ▼
 (empty)│   │                │   │(locked)
    ┌───▼───▼────────────────▼───┴───┐
    │                                │
    │                                │
    │    ~/.polykey/client/token     │
    │                                │
    │                                │
    └────────────────────────────────┘

@emmacasolin
Copy link
Contributor

Also I had a look at fd-lock and I don't think it will work for our purposes. If a file is already locked then you can't lock it again, and there's no way to determine who locked the file and what processes are still using it. All it offers is locking and unlocking so we would need another way to track which processes are still reliant on the file before we unlock and refresh the token.

@CMCDragonkai
Copy link
Member

As 2 separate diagrams.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 29, 2021 via email

@emmacasolin
Copy link
Contributor

But what if the first client (who locks the file) finishes before the second client?
For example:

  1. A locks the file
  2. B starts running a command using the token stored in the locked file
  3. A finishes its process, unlocks the file, and refreshes the token. It has no way of knowing that B is still using the original token

In this case, wouldn't B now be using an invalid token?

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 29, 2021 via email

@CMCDragonkai
Copy link
Member

We may be using a file locking in the agent for the agent lockfile. It might be using the same library. Even in that case fd-lock should be sufficient too so we reduce the complexity of our dependencies. Although I wonder how difficult it is to extend it in the future to enable blocking waits.

@emmacasolin
Copy link
Contributor

Ahh ok I think I understand now.

So in the first diagram we have a single agent and client interaction:

┌──────────────────┐
│  SessionManager  │
└──────▲───┬───────┘
       │   │
       │   3
       │   │
┌──────┴───┴───────┐
│   PolykeyAgent   │
└──┬───┬───┬───────┘
   │   │   │
   1   │   │
   │   │   │
┌──┴───┴───┴───────┐
│  PolykeyClient   │
└──┬───┬───┬───┬───┘
   │   │   │   │
   │   2   │   4
   │   │   │   │
┌──┴───┴───┴───┴───┐
│     Session      │
└──┬───────┬───┬───┘
   │       │   │
   │       │   │
   │       │   │
┌──▼───────▼───▼───┐
│   sessionFile    │
└──────────────────┘
  1. Agent starts up a new Client and Session. Session checks the session file for a valid token (none found).
  2. Session requests a valid token from SessionManager.
  3. SessionManager returns a new token and the Session writes this to the session file and locks it.
  4. When the Client finishes the Session unlocks the session file and refreshes the token.

And then in the second diagram is two clients operating at the same time:

┌──────────────────────────────────────────┐
│               PolykeyAgent               │
└────┬────────────────────────┬────────────┘
     │                        │
     1                        3
     │                        │
┌────┴────────────┐      ┌────┴────────────┐
│  PolykeyClient  │      │  PolykeyClient  │
└────┬───▲───┬────┘      └────┬───▲───┬────┘
     │   │   │                │   │   │
     │   │   6                │   │   5
     │   │   │                │   │   │
┌────┴───┴───┴────┐      ┌────┴───┴───┴────┐
│     Session     │      │     Session     │
└────┬───┬───┬────┘      └────┬───┬───┬────┘
     │   │   │                │   │   │
     │   2   │                │   4   ▼
     │   │   │                │   │(locked)
┌────▼───┴───▼────────────────▼───┴────────┐
│               sessionFile                │
└──────────────────────────────────────────┘
  1. Agent starts up a new Client and Session. Session checks the session file for a valid token (found) and locks the session file.
  2. Valid token from session file is returned to the Client.
  3. Agent starts up a second Client and Session. Second Session also checks the session file for a valid token (found) and attempts to lock the session file, however this fails because it's already locked.
  4. Valid token from session file is returned to the second Client.
  5. When the second Client finishes the Session attempts to unlock the session file, however this fails because it does not possess the lock. The session refresh attempt is dropped.
  6. When the first Client finishes the Session unlocks the session file and refreshes the token.

@emmacasolin
Copy link
Contributor

The MR for this has been merged now. There were a few things that I'll be continuing over in #240 (tests for renewal of the session token) but otherwise there are only a couple outstanding issues:

  • The tests for concurrent commands use jest's concurrent feature at the moment - this feature is considered experimental so we may run into issues with it in the future. It works really well at the moment but we should eventually look into alternatives.
  • The current tests are unable to test the actual locking and unlocking of the session file - I looked into mocking the Session class in order to achieve this, however decided to leave it for now. We'll need to write tests for this in the future. At the moment we're assuming that the locking function is working due to the concurrent process tests passing, but we don't know for sure.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 8, 2021 via email

@CMCDragonkai
Copy link
Member

@emmacasolin before closing this issue, can you update the session wiki reference with the diagrams you did above. Suggest you update to using arrow labels on the side rather than putting labels inline.

@CMCDragonkai
Copy link
Member

Then confirm and close.

@CMCDragonkai
Copy link
Member

In the future remember to cross reference the MR and PR.

@emmacasolin
Copy link
Contributor

Regarding the second kind of tests, why not just do a unit test where you lock the session file and then use fd-lock and test whether the file is locked or not. That should work.

Yeah I could add in a test that does that, but I was referring more to the order in which the locking and unlocking happens and being able to test that it's happening while a process is occurring - for example being able to pause the process during its execution and check that the session file is locked, then letting the process finish and checking again that the session file is unlocked.

For concurrent, it's considered experimental but is it expected to be standard later? We're using a bunch of experimental features in js-async-init anyway.

I couldn't find anything confirming or denying this, but if I had to guess I'd say it will probably eventually become standard. There are still a few outstanding issues with it (for example, concurrent tests currently don't wait for beforeAll() or afterAll() blocks so I had to create my own async functions that are awaited at the beginning of each concurrent test and resolved at the end of each before/afterAll block) so it's very likely that there will be changes to the way it works that might affect the tests I wrote, but we can deal with that if/when it becomes an issue.

And yep I'll fix up the diagram labels and then add them to the session wiki right now.

@emmacasolin
Copy link
Contributor

The Sessions wiki (https://github.com/MatrixAI/js-polykey/wiki/Sessions) has been updated with the diagrams from above, so this issue can now be closed with this comment.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 11, 2021 via email

@CMCDragonkai
Copy link
Member

Regarding wiki...

The Agent starts up a new Client and Session in response to a CLI call. The Session checks the session file (located at ~/.polykey/client/token) for a valid token. At this stage, either the session file is empty or the stored token is invalid.

I don't think the "Agent starts up a new Client" is correct.

The Client is started from the CLI, starting clients have nothing to do with the Agent. Can you update this?

@CMCDragonkai
Copy link
Member

The Session requests a new, valid token from the Session Manager.

The session doesn't request a valid token from the session manager. The session only just reads and writes tokens. It also contains the session token data.

The PolykeyClient is what requests the token from the agent.

@CMCDragonkai
Copy link
Member

When the Client finishes its command, its Session unlocks the session file and refreshes the token.

It actually unlocks the session file as soon as the write is completed.

PK clients can send repeated calls to the agent, each time we get a new token from the agent, and the PK client can initiate a write to the session file.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 31, 2021

I'm writing these comments in relation to the code by the way. I know I wrote before:

In our case, we expect that any call holds a lock to the token, and if another process discovers that the lock is held, it simply drops and cancels the session token update request. That way the token is only updated by the first process that is starting a call, one the process finishes their call, they should release the token file lock. All of this logic should be inside the Session class.

It turns out that this isn't necessary. It's only when the returned metadata exists with the new session token data do we try to acquire a lock and write the updated session token file. Only at that point is the lock required.

@CMCDragonkai
Copy link
Member

Note that with https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213 merged, the behaviour of locking in Session changed. Now using it with session interceptor means that each call does perform a write lock. To be optimised in the future, we need #290.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management
Development

No branches or pull requests

3 participants