Postprocess hardening #9122

Merged
merged 10 commits into from
Aug 19, 2024

Conversation

freddyaboulton
Collaborator

Description

Ensures that only files located in the current working directory or the system temp directory can be moved to the cache.
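
As a rough illustration of the rule described above, a containment check could look like the sketch below. The function name `is_in_allowed_root` and its `roots` parameter are hypothetical and are not taken from the Gradio codebase; the sketch only assumes that "located in" means "resolves to a path underneath" one of the allowed directories.

```python
import os
import tempfile
from pathlib import Path


def is_in_allowed_root(path, roots=None):
    """Return True if `path` resolves to a location under one of `roots`.

    Hypothetical helper: by default the roots are the current working
    directory and the system temp directory, as in the PR description.
    """
    if roots is None:
        roots = [os.getcwd(), tempfile.gettempdir()]
    # Resolve symlinks and ".." segments before comparing, so that a path
    # like "<root>/../secret" is not mistaken for a path inside <root>.
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in roots)
```

`Path.is_relative_to` compares whole path components (Python 3.9+), so a sibling directory that merely shares a name prefix with an allowed root is not accepted.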

🎯 PRs Should Target Issues

Before you create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.

Not adhering to this guideline will result in the PR being closed.

Tests

  1. PRs will only be merged if tests pass on CI. To run the tests locally, please set up your Gradio environment locally and run the tests: bash scripts/run_all_tests.sh

  2. You may need to run the linters: bash scripts/format_backend.sh and bash scripts/format_frontend.sh

@gradio-pr-bot
Collaborator

gradio-pr-bot commented Aug 14, 2024

🪼 branch checks and previews

Name | Status | URL
Website | ready! | Website preview
🦄 Changes | detected! | Details

@gradio-pr-bot
Collaborator

gradio-pr-bot commented Aug 14, 2024

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
gradio minor
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Postprocess hardening

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@freddyaboulton freddyaboulton marked this pull request as ready for review August 15, 2024 18:34
@freddyaboulton freddyaboulton requested review from aliabid94, pngwn, abidlabs, aliabd, dawoodkhan82 and hannahblair and removed request for pngwn August 15, 2024 18:34
raise HTTPException(403, f"File not allowed: {path_or_url}.")

if not abs_path.exists():
    raise HTTPException(404, f"File not found: {path_or_url}.")
Collaborator Author

According to the unit tests, we actually don't raise 404 if the file does not exist

@@ -1910,6 +1909,10 @@ async def process_api(
state_ids_to_track, hashed_values = self.get_state_ids_to_track(block_fn, state)
changed_state_ids = []

from gradio.context import LocalContext

LocalContext.blocks.set(self)
Member

Interesting, why do we set it here and not in the init of Blocks?

Collaborator Author

I thought this would be more "future proof" as it's the most up to date version of the Blocks before any sort of processing happens. But I don't think we need this actually.

Collaborator Author

We need this to be done after init for is_running to be set. I think actually this is a good place for it.
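
The ordering concern in this thread can be sketched with the standard library's `contextvars`. Everything here is a hypothetical stand-in: the `Blocks` class below is not `gradio.Blocks`, and `current_blocks` only assumes that `LocalContext.blocks` behaves like a context variable. The point is that a guard of the form `blocks is None or not blocks.is_running` stays inactive until the context is set on a running app.

```python
from contextvars import ContextVar

# Hypothetical stand-in for LocalContext.blocks.
current_blocks = ContextVar("blocks", default=None)


def postprocess_guard_active():
    """Mirror the `blocks is None or not blocks.is_running` early return."""
    blocks = current_blocks.get()
    return blocks is not None and blocks.is_running


class Blocks:
    """Toy class for illustration only, not gradio.Blocks."""

    def __init__(self):
        # At construction time the app is not running yet.
        self.is_running = False

    def launch(self):
        self.is_running = True

    def process_api(self):
        # Set the context at request time, when `is_running` is meaningful.
        current_blocks.set(self)
        return postprocess_guard_active()
```

In this toy version, calling `process_api()` before `launch()` leaves the guard inactive, and calling it afterwards activates it.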

if blocks is None or not blocks.is_running:
    return

import tempfile
Member

suggest putting this import at top of file

Collaborator Author

Done

msg += "located in either the current working directory or your system's temp directory. "
msg += "To fix this error, please ensure your function returns files located in either "
msg += f"the current working directory ({os.getcwd()}), your system's temp directory ({tempfile.gettempdir()}) "
msg += f"or add {str(abs_path.parent)} to the allowed_paths parameter."
Member

I would add something like, "if you'd like to specifically allow this file to be served, you can add it to the allowed_paths parameter of launch()"

Collaborator Author

done

):
raise InvalidPathError(
"Dotfiles located in the temporary directory cannot be moved to the cache for security reasons. "
"Please check whether this file is valid and rename it if so."
Member

same^
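
For readers following the dotfile rule in the diff excerpt above, a minimal sketch is given here. The condition itself is not visible in the excerpt, so this assumes that a "dotfile" means any path component below the temp directory that starts with a dot. `reject_temp_dotfiles` and its `temp_dir` parameter are hypothetical, and `InvalidPathError` is redefined locally as a stand-in.

```python
import tempfile
from pathlib import Path


class InvalidPathError(ValueError):
    """Local stand-in for the exception named in the diff."""


def reject_temp_dotfiles(path, temp_dir=None):
    """Raise InvalidPathError for dotfiles inside the temp directory.

    Assumption: any component below the temp root starting with "."
    counts as a dotfile. Paths outside the temp root are left to other checks.
    """
    temp_root = Path(temp_dir or tempfile.gettempdir()).resolve()
    resolved = Path(path).resolve()
    if not resolved.is_relative_to(temp_root):
        return
    # Only inspect the components below the temp root itself.
    relative_parts = resolved.relative_to(temp_root).parts
    if any(part.startswith(".") for part in relative_parts):
        raise InvalidPathError(
            "Dotfiles located in the temporary directory cannot be moved "
            "to the cache for security reasons."
        )
```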

allowed_paths=create_path_list(),
file_sets=st.lists(create_path_set(), min_size=0, max_size=3),
)
def test_is_allowed_file_fuzzer(
Member

Cool idea to use the fuzzer here. But if I'm understanding correctly, randomly generating paths is very, very unlikely to produce any overlaps between the parameters, right? So this is actually unlikely to test what happens if a path is in blocked_paths or allowed_paths or file_sets? Perhaps part of the path should be manually specified or the values should be chosen from a set with smaller cardinality

Collaborator Author

I greatly reduced the cardinality of the allowed possible paths set and also added some corner cases manually in a different test.
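
The cardinality point can be illustrated without the fuzzing library. The sketch below swaps in the standard library's `random` module in place of the property-based strategies used in the actual test, and `is_allowed` is a toy rule, not Gradio's `is_allowed_file`. It measures how often a randomly drawn path lands in the blocked or allowed set when everything is drawn from the same pool. With Hypothesis, drawing from a fixed pool would typically be expressed with `st.sampled_from`.

```python
import random

# Hypothetical small pool of candidate paths.
POOL = ["/tmp/a.txt", "/tmp/b.txt", "/data/a.txt", "/data/b.txt"]


def is_allowed(path, blocked, allowed):
    """Toy rule for illustration: a blocked path is denied, then allowed is consulted."""
    if path in blocked:
        return False
    return path in allowed


def overlap_rate(pool, trials=2000, seed=0):
    """Fraction of trials in which the drawn path is in `blocked` or `allowed`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        path = rng.choice(pool)
        blocked = set(rng.sample(pool, k=rng.randint(0, 2)))
        allowed = set(rng.sample(pool, k=rng.randint(0, 2)))
        if path in blocked or path in allowed:
            hits += 1
            # Only overlapping cases exercise the interesting branches.
            assert is_allowed(path, blocked, allowed) == (path not in blocked)
    return hits / trials
```

With a four-element pool, a large share of trials exercise the blocked and allowed branches. With a pool of thousands of distinct paths, overlaps become rare, which is the concern raised in the review.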

is_dir = abs_path.is_dir()

if is_dir or in_blocklist:
if abs_path.is_dir() or not abs_path.exists():
    raise HTTPException(403, f"File not allowed: {path_or_url}.")
Member

nit: it would be better to set an allowed flag here and raise a single 403 later, after utils.is_allowed_file has completed, so that malicious attackers cannot use timing attacks to infer whether a particular filepath even exists (that itself can be insecure in certain settings)

Collaborator Author

Yea good call.
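
One way to read this suggestion is sketched below. The names are hypothetical: `check_access` is not the actual route handler, `HTTPException` is a local stand-in for the framework class, and `is_allowed_file` is passed in as a callable rather than imported from `utils`. The idea is to run every check, then raise a single 403 with the same message regardless of which check failed.

```python
from pathlib import Path


class HTTPException(Exception):
    """Local stand-in for the web framework's HTTPException."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def check_access(path_or_url, abs_path, in_blocklist, is_allowed_file):
    """Evaluate all checks first, then raise one uniform 403 if any failed."""
    # Each check is evaluated unconditionally (no early return), so a
    # missing file and a disallowed file follow the same code path.
    is_dir = abs_path.is_dir()
    exists = abs_path.exists()
    allowed = is_allowed_file(abs_path)
    if is_dir or not exists or in_blocklist or not allowed:
        raise HTTPException(403, f"File not allowed: {path_or_url}.")
```

This narrows the observable difference between "does not exist" and "exists but is not allowed". It does not by itself guarantee constant-time behavior.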

@abidlabs
Member

Note: I believe the gr.Code component is still vulnerable to attacks, since in postprocess() we simply read the file and return its contents to users. I think we should just deprecate letting gr.Code take in a tuple value.

@abidlabs
Member

abidlabs commented Aug 15, 2024

Left some comments but overall this works great. Good stuff @freddyaboulton, will go ahead and approve

@freddyaboulton freddyaboulton merged commit 2672ea2 into 5.0-dev Aug 19, 2024
20 checks passed
@freddyaboulton freddyaboulton deleted the postprocess-hardening branch August 19, 2024 17:52
@freddyaboulton freddyaboulton mentioned this pull request Aug 23, 2024
3 participants