Skip to content
This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

Latest commit

 

History

History
30 lines (17 loc) · 5.46 KB

2018_07_23.md

File metadata and controls

30 lines (17 loc) · 5.46 KB

Update for July 23, 2018

Contributions

@MoritzKn fixed a bug where we were incorrectly calculating the position to place the cursor when inserting strings containing multibyte characters. Thanks!

Convergence for replicated directory trees

Late last week we were able to achieve convergence in our randomized tests of replicated directory trees. As I mentioned in the last update, the biggest challenge was the possibility of concurrent moves introducing cycles. Our proposed solution of breaking cycles via synthetic "fixup" operations worked out well, but determining exactly which fixup operations to generate was still a challenging problem.

In certain diabolical scenarios, reverting a move to break one cycle could end up introducing a second cycle. By reverting multiple moves, however, it should always be possible to end up with a directory tree that is free of cycles, and so that's what we do. Whenever a cycle is detected, we continually revert the most recent move that contributes to that cycle, ignoring any moves that have already been reverted. Eventually, we're guaranteed to end up with a tree that's free of cycles. Though we don't have a formal proof to back up our intuition, we've been unable to find a failing scenario over a million randomized trials, and we're ready to move forward.

We applied the same "fixup" strategy to recover from directory name conflicts as well. When we attempt to insert a directory entry whose name conflicts with an existing entry, we compare the entries' Lamport timestamps and replica ids to select an entry that gets to keep the existing name. For the other entry, we append ~ characters until we find a name that does not conflict and synthesize a rename operation. In a real-time scenario, this situation should almost never occur, but if it does, renaming one of the directories means we can mirror the state of the CRDT to the file system without losing data. The users can then decide how to deal with the situation by deleting one of the directories or merging their contents.

Interacting with the file system

For our convergence results to be useful outside the realm of automated tests, we need to communicate changes to and from the file system. That presents its own set of challenges, since we can't rely on our internal representation always being perfectly synchronized with the state of the disk. After confusing ourselves a bit too much trying to devise a strategy for file system synchronization that could cover every possible scenario, we've decided to focus on a few narrowly-defined situations on the critical path to a working demo.

  • Read a tree into a new index: When the Eon daemon starts, we will need to read the current state of the tree into our internal representation.
  • Write an index to a file system tree: When you want to clone a remote replica, we need to write its initial state to your local disk.
  • Update an index from a tree: Once the daemon is started, we want to watch the file system for changes. When we detect a change, we will scan the directory tree to determine which directories have been inserted, removed, or moved.
  • Write incoming operations to the disk: As operations come in, we interpret them relative to our internal index and translate them into writes to the file system.

For now, we've decided to rely on the fact that files and directories get associated with unique inode numbers in order to detect moves. In our previous attempt, we were hoping to not fully rely on inodes in hopes of covering cases such as the entire repository being recursively copied or another system like Git manipulating the file tree. Now we've decided we will deal with those scenarios in a separate pass once we get the basic scenario working. Tracking the mapping between our internal file identifiers and inodes makes everything much simpler.

One thing that makes it challenging (if not impossible) to mirror changes to the file system perfectly is the inability to perform file system operations atomically. When we receive a move operation from the network, we'll resolve abstract identifiers in the operation to actual paths on the local disk. If the disk's contents have changed in the meantime and we haven't heard about it, there's a potential for these paths to be wrong. To mitigate this issue, we will always confirm that the relevant paths exist and have the expected inode numbers before applying a remote operation. If we detect that our index has fallen behind the contents of the disk, we will defer handling the operation until the next update.

However, even if we determine that our index is consistent with the disk, this determination isn't atomic. In the microseconds between checking for consistency and performing the write, another change might invalidate our conclusion and cause the operation to fail. Worse, a change might cause the same paths to point to different inodes, meaning the operation would succeed but apply to different paths. Luckily, we anticipate this sort of situation to be extremely rare. It could only happen if a file at a given path was replaced with another in the moment between our consistency check and actually writing the operation. It might lead to surprising results, but we don't think the consequences are catastrophic.

Dealing with all of these problems and getting changes to and from the file system will be our focus for this week.