Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make write safer #1416

Merged
merged 4 commits into from
Sep 13, 2023
Merged

Make write safer #1416

merged 4 commits into from
Sep 13, 2023

Conversation

JakeSiFive
Copy link
Contributor

Recently we had a very small mistake (swapping the arguments of write) cause total workspace data loss. This is unacceptable for wake. This change makes write safer via the following measures:

  1. write will no longer allow "." or "" as a write location
  2. write will only unlink files, and not do a deep unlink. This means that two different build options might conflict in wake but I think that's bad style anyway
  3. write will not write outside of the root workspace. This will break some rare use cases but they can be replaced with a bespoke job instead
  4. write will not overwrite a source file

It would be nice to add an additional check that write does not overwrite anything previously hashed but this is a bit tricky to do and thanks to the fact that we don't deep unlink anything now, I think this should be fine for now as data loss is quite limited and it would be unusual for the content to 1) be a valid file path and 2) point to something the user intended to keep.

Comment on lines +54 to +55
require False = matches `\.\..*` path
else failWithError "Attempt to write outside of the workspace"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs to be run through a relativizer of some sort (I don't think it is yet, but I might have missed something) to catch subdir/../... It's also filtering out things like ..inRoot which are most definitely bad names but which are still in the workspace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All paths leading into this function simplify the path before passing it to the implementation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

Comment on lines +982 to +986
require Pair (Pass inFile) _ =
write specFilePath (prettyJSON json)
| rmap getPathName
else Pair (Fail (makeError "Failed to 'write {specFilePath}'.")) ""
| addErrorContext "Failed to 'write {specFilePath}: '"
| (Pair _ "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the Pair for? It seems to be unconditionally added with no meaningful data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type of this function is Pair (Result ...) String because its a runner pre function. You can see that previously the same Pair was being constructed in the else branch. I just made the error message nicer while working around the return type being a Pair

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now that I'm seeing the mixed lines, it looks like something to fit the else case.

else failWithError "Attempt to write to an absolute path"

# Source files should never be deleted so we check for this case
def scan dir regexp = prim "sources"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling prim "sources" for every call to write is expensive no? Do we cache the result somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wake reads the full list at the start and then does a linear regex scan through it. The linear scan could be improved to be a Tree in this case since we don't need a regex but this is the same cost we incur when we source a file today so I think its fine? I'll test it on a larger build to see what the effect is.

@JakeSiFive JakeSiFive merged commit 117fb96 into master Sep 13, 2023
12 checks passed
@JakeSiFive JakeSiFive deleted the make_write_safer branch September 13, 2023 19:45
@mmjconolly
Copy link
Contributor

We should try to get this into a wake release

V-FEXrt pushed a commit that referenced this pull request Sep 15, 2023
1. write will no longer allow "." or "" as a write location
2. write will only unlink files, and not do a deep unlink. This means that two different build options might conflict in wake but I think that's bad style anyway
3. write will not write outside of the root workspace. This will break some rare use cases but they can be replaced with a bespoke job instead
4. write will not overwrite a source file
Comment on lines +261 to +265
error +=
" is a directory and cannot be overwritten. If this is intentional please manually "
"delete this directory";
size_t len = std::min(error.size(), max_error);
String *out = String::claim(runtime.heap, error.c_str(), len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is running into max_error problems, and really doesn't look good when it does: Fail (Error "src is a directory and cannot be overwritten. If this is intentional please manually delete this direct" Nil)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh crap, thanks for finding and debugging that! We should increase max error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's always going to be some problem if we simply bump the value. Using an unrealistically low max_error to stand in for a longer path than I want to type: Fail (Error "very/long/path/to/some/fi" Nil). We need to trim the filepath (with some indication that it's been trimmed) without risking the error message itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably won't be prioritizing that level of quality fix but I'd love to see that sort of thing go in. We can add in the the size of the path instead I think. Paths on Linux are not allowed to be any longer than 4096 so we can set it to something like 100 + std::max(path.size(), 4096).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants