File access security guide #9156

Merged: 8 commits merged into 5.0-dev from file-access-security-guide-2 on Aug 22, 2024

Conversation

freddyaboulton
Collaborator

Description

Pull out the "File Access" section from the "Sharing your App" guide into its own guide.

Take two of #9154

Closes: #(issue)

🎯 PRs Should Target Issues

Before you create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.

Not adhering to this guideline will result in the PR being closed.

Tests

  1. PRs will only be merged if tests pass on CI. To run the tests locally, please set up your Gradio environment locally and run the tests: bash scripts/run_all_tests.sh

  2. You may need to run the linters: bash scripts/format_backend.sh and bash scripts/format_frontend.sh

Add code

Add code

Add code

emphasis
@gradio-pr-bot
Collaborator

gradio-pr-bot commented Aug 20, 2024

🪼 branch checks and previews

| Name | Status | URL |
| --- | --- | --- |
| Spaces | ready! | Spaces preview |
| Website | building... | |
| Storybook | ready! | Storybook preview |
| 🦄 Changes | detected! | Details |

Install Gradio from this PR

pip install https://gradio-pypi-previews.s3.amazonaws.com/cab9ba13ebec3bbe56ce358ab2ec7802b0b28240/gradio-4.42.0-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@cab9ba13ebec3bbe56ce358ab2ec7802b0b28240#subdirectory=client/python"

Install Gradio JS Client from this PR

npm install https://gradio-npm-previews.s3.amazonaws.com/cab9ba13ebec3bbe56ce358ab2ec7802b0b28240/gradio-client-1.5.1.tgz

@gradio-pr-bot
Collaborator

gradio-pr-bot commented Aug 20, 2024

🦄 change detected

This Pull Request includes changes to the following packages.

| Package | Version |
| --- | --- |
| website | minor |
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

File access security guide

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.


Sharing your Gradio app with others (by hosting it on Spaces, on your own server, or through temporary share links) **exposes** certain files on your machine to the internet.

This guide will explain which ones as well as some best practices for making sure the files on your machine are secure.
Member

@abidlabs abidlabs Aug 20, 2024

Suggested change
This guide will explain which ones as well as some best practices for making sure the files on your machine are secure.
This guide explains which files are exposed as well as best practices for making sure the files on your machine are secure.


First, it's important to understand that Gradio places files in a special `cache` before returning them to the frontend. For example, if your prediction function returns a video file, then Gradio will move that video to the `cache` after your prediction function runs and returns a URL the frontend can use to show the video. Any file in the `cache` is available via URL while the application is running.

Tip: You can customize the location of the `cache` by setting the `GRADIO_TEMP_DIR` environment variable to an absolute path, such as `/home/usr/scripts/project/temp/`.
Member

Suggested change
Tip: You can customize the location of the `cache` by setting the `GRADIO_TEMP_DIR` environment variable to an absolute path, such as `/home/usr/scripts/project/temp/`.
Tip: You can customize the location of the cache by setting the `GRADIO_TEMP_DIR` environment variable to an absolute path, such as `/home/usr/scripts/project/temp/`.
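
As a rough illustration of the tip above, the variable can be set from Python before the app starts. This is a minimal sketch: the helper name `configure_cache_dir` and the `my_gradio_cache` directory are hypothetical, and it assumes the variable is set before Gradio is imported. Exporting `GRADIO_TEMP_DIR` in the shell is the more common route.

```python
import os
import tempfile
from pathlib import Path


def configure_cache_dir(cache_dir: Path) -> Path:
    """Create cache_dir if needed and point GRADIO_TEMP_DIR at it."""
    # The tip asks for an absolute path, so normalise first.
    cache_dir = Path(os.path.abspath(cache_dir))
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["GRADIO_TEMP_DIR"] = str(cache_dir)
    return cache_dir


# Hypothetical location, used only for illustration.
cache = configure_cache_dir(Path(tempfile.gettempdir()) / "my_gradio_cache")
print(os.environ["GRADIO_TEMP_DIR"])
```

Keeping the cache in a dedicated directory makes it easier to see exactly which files are reachable by URL while the app is running.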

3. It is in the current working directory of the python interpreter.
4. It is in the temp directory obtained by `tempfile.gettempdir()`.

Additionally, files in the current working directory whose name starts with a period (`.`) will not be moved to the cache. If no criteria are met, the prediction function that created that file will error. Gradio performs this check so that arbitrary files on your machine are not moved to the cache.
Member

Suggested change
Additionally, files in the current working directory whose name starts with a period (`.`) will not be moved to the cache. If no criteria are met, the prediction function that created that file will error. Gradio performs this check so that arbitrary files on your machine are not moved to the cache.
Note: files in the current working directory whose name starts with a period (`.`) will not be moved to the cache, since they often contain sensitive information for your application.
If none of these criteria are met, the prediction function that created that file will raise an exception instead of moving the file to cache. Gradio performs this check so that arbitrary files on your machine cannot be accessed.
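
The two criteria visible in this hunk, plus the dotfile rule, can be sketched as a small predicate. This is not Gradio's implementation: the function name `may_move_to_cache` is hypothetical, criteria 1 and 2 are outside the quoted context and are not modelled, and treating "in the current working directory" as "anywhere below it" is an assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def _is_within(child: Path, parent: Path) -> bool:
    """True if child is parent itself or lies somewhere below it."""
    return child == parent or parent in child.parents


def may_move_to_cache(
    path: str, cwd: Optional[str] = None, temp_dir: Optional[str] = None
) -> bool:
    """Approximate criteria 3 and 4 and the dotfile exclusion."""
    cwd_path = Path(os.path.abspath(cwd or os.getcwd()))
    temp_path = Path(os.path.abspath(temp_dir or tempfile.gettempdir()))
    # Relative paths are interpreted against the working directory.
    target = Path(os.path.abspath(os.path.join(cwd_path, path)))

    in_cwd = _is_within(target, cwd_path)
    in_temp = _is_within(target, temp_path)

    # Dotfiles in the working directory are never moved to the cache.
    if in_cwd and target.name.startswith("."):
        return False
    return in_cwd or in_temp


print(may_move_to_cache("/srv/app/.env", cwd="/srv/app", temp_dir="/tmp"))
```

In the guide's wording, a `False` result corresponds to the prediction function raising an exception instead of the file being moved to the cache.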

* Set a `max_file_size` for your application.
* Do not treat arbitrary user input as input to a file-based component (`gr.Image`, `gr.File`, etc.).
* Prefer to use absolute paths in `allowed_paths`. If a path in `allowed_paths` is a directory, any file within that directory can be accessed. If passing a directory is necessary, make sure it only contains files related to your application.
* Run your gradio application from the same directory the application file is located in. This will narrow the scope of files Gradio will be allowed to move into the cache.
Member

Also might be worth adding a quick example:

python app.py

instead of

python Users/.../dev/app.py


* Set a `max_file_size` for your application.
* Do not treat arbitrary user input as input to a file-based component (`gr.Image`, `gr.File`, etc.).
* Prefer to use absolute paths in `allowed_paths`. If a path in `allowed_paths` is a directory, any file within that directory can be accessed. If passing a directory is necessary, make sure it only contains files related to your application.
Member

It sounds like you're connecting

> Prefer to use absolute paths in allowed_paths.

and

> If a path in allowed_paths is a directory, any file within that directory can be accessed. If passing a directory is necessary, make sure it only contains files related to your application.

but these are two independent points iiuc. I would separate them into two separate bullet points

Collaborator Author

The point I was trying to make is that if you add some_big_dir/ to allowed_paths, everything in it is exposed. I'm suggesting that the directories in allowed_paths are "as small as possible".

@@ -107,6 +107,15 @@ Environment variables in Gradio provide a way to customize your applications and
```


### 12. `GRADIO_EXAMPLES_CACHE`
Member

ah nice

@abidlabs
Member

Went a bit overboard with the suggestions sorry @freddyaboulton 😅, take whatever sounds reasonable. Also there's one or two places in the Guides where we link to /guides/sharing-your-app#security-and-file-access, we should update those

@freddyaboulton
Collaborator Author

Thanks @abidlabs - will address in a bit!

@freddyaboulton
Collaborator Author

Should be good for another review @abidlabs !


Note: files in the current working directory whose name starts with a period (`.`) will not be moved to the cache, since they often contain sensitive information.

If none of these criteria are met, the prediction function that created that file will raise an exception instead of moving the file to cache. Gradio performs this check so that arbitrary files on your machine cannot be accessed.
Member

Suggested change
If none of these criteria are met, the prediction function that created that file will raise an exception instead of moving the file to cache. Gradio performs this check so that arbitrary files on your machine cannot be accessed.
If none of these criteria are met, the prediction function that is returning that file will raise an exception instead of moving the file to cache. Gradio performs this check so that arbitrary files on your machine cannot be accessed.


While running, Gradio apps will NOT ALLOW users to access:

- **Files that you explicitly block via the `blocked_paths` parameter in `launch()`**. You can pass in a list of additional directories or exact filepaths to the `blocked_paths` parameter in `launch()`. This parameter takes precedence over the files that Gradio exposes by default or by the `allowed_paths`.
Member

I believe this is true:

Suggested change
- **Files that you explicitly block via the `blocked_paths` parameter in `launch()`**. You can pass in a list of additional directories or exact filepaths to the `blocked_paths` parameter in `launch()`. This parameter takes precedence over the files that Gradio exposes by default or by the `allowed_paths`.
- **Files that you explicitly block via the `blocked_paths` parameter in `launch()`**. You can pass in a list of additional directories or exact filepaths to the `blocked_paths` parameter in `launch()`. This parameter takes precedence over the files that Gradio exposes by default or by the `allowed_paths` parameter or by `gr.set_static_paths`.
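
The precedence described in this bullet can be illustrated with a toy model. It only models `allowed_paths` and `blocked_paths`, not the files Gradio exposes by default or `gr.set_static_paths`. The helper `is_served` and all paths are hypothetical, and the matching logic is an assumption about the behaviour, not Gradio's code. The `launch()` call in `main` uses the two parameters named in the guide.

```python
import os
from pathlib import Path
from typing import Iterable


def _is_within(child: Path, parent: Path) -> bool:
    """True if child is parent itself or lies somewhere below it."""
    return child == parent or parent in child.parents


def is_served(path: str, allowed_paths: Iterable[str], blocked_paths: Iterable[str]) -> bool:
    """Blocked entries win over allowed entries, as the guide describes."""
    target = Path(os.path.abspath(path))
    if any(_is_within(target, Path(os.path.abspath(b))) for b in blocked_paths):
        return False
    return any(_is_within(target, Path(os.path.abspath(a))) for a in allowed_paths)


def main() -> None:
    import gradio as gr  # imported lazily so the helper above works without Gradio

    demo = gr.Interface(lambda s: s, "text", "text")
    # Hypothetical paths: expose one directory but keep a subdirectory private.
    demo.launch(
        allowed_paths=["/srv/app/public"],
        blocked_paths=["/srv/app/public/private"],
    )
```

Calling `main()` launches the app. The helper can be exercised on its own to reason about which paths a given pair of lists would expose.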

## Best Practices

* Set a `max_file_size` for your application.
* Do not treat arbitrary user input as input to a file-based component (`gr.Image`, `gr.File`, etc.). For example, the following interface would allow anyone to move an arbitrary file in your local directory to the cache: `gr.Interface(lambda s: s, "text", "file")`. This is because the user input is treated as an arbitrary file path.
Member

Suggested change
* Do not treat arbitrary user input as input to a file-based component (`gr.Image`, `gr.File`, etc.). For example, the following interface would allow anyone to move an arbitrary file in your local directory to the cache: `gr.Interface(lambda s: s, "text", "file")`. This is because the user input is treated as an arbitrary file path.
* Do not return arbitrary user input from a function that is connected to a file-based output component (`gr.Image`, `gr.File`, etc.). For example, the following interface would allow anyone to move an arbitrary file in your local directory to the cache: `gr.Interface(lambda s: s, "text", "file")`. This is because the user input is treated as an arbitrary file path.
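
One way to avoid the pattern in `gr.Interface(lambda s: s, "text", "file")` is to map the user's text to a file inside a single known directory and reject everything else. The sketch below is one possible approach, not something the guide prescribes: `resolve_download` and `DOWNLOAD_DIR` are hypothetical names.

```python
import os
from pathlib import Path

# Hypothetical directory holding the only files users may download.
DOWNLOAD_DIR = "/srv/app/downloads"


def resolve_download(name: str, base_dir: str = DOWNLOAD_DIR) -> str:
    """Map a user-supplied file name to a path inside base_dir, or raise."""
    base = Path(os.path.realpath(base_dir))
    # realpath collapses ".." segments and follows symlinks before the check.
    candidate = Path(os.path.realpath(base / name))
    # Reject anything that is not strictly below the download directory.
    if base not in candidate.parents:
        raise ValueError(f"{name!r} is outside the download directory")
    return str(candidate)


def main() -> None:
    import gradio as gr  # imported lazily so the helper above works without Gradio

    # The file output receives a validated path, not the raw user string.
    gr.Interface(lambda name: resolve_download(name), "text", "file").launch()
```

With this in place, inputs such as `../secret.txt` or an absolute path raise an error instead of being handed to the file component.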


* Set a `max_file_size` for your application.
* Do not treat arbitrary user input as input to a file-based component (`gr.Image`, `gr.File`, etc.). For example, the following interface would allow anyone to move an arbitrary file in your local directory to the cache: `gr.Interface(lambda s: s, "text", "file")`. This is because the user input is treated as an arbitrary file path.
* Make `allowed_paths` as small as possible. If a path in `allowed_paths` is a directory, any file within that directory can be accessed. Ma sure the entires of `allowed_paths` only contains files related to your application.
Member

Suggested change
* Make `allowed_paths` as small as possible. If a path in `allowed_paths` is a directory, any file within that directory can be accessed. Ma sure the entires of `allowed_paths` only contains files related to your application.
* Make `allowed_paths` as small as possible. If a path in `allowed_paths` is a directory, any file within that directory can be accessed. Make sure the entries of `allowed_paths` only contain files related to your application.

* Set a `max_file_size` for your application.
* Do not treat arbitrary user input as input to a file-based component (`gr.Image`, `gr.File`, etc.). For example, the following interface would allow anyone to move an arbitrary file in your local directory to the cache: `gr.Interface(lambda s: s, "text", "file")`. This is because the user input is treated as an arbitrary file path.
* Make `allowed_paths` as small as possible. If a path in `allowed_paths` is a directory, any file within that directory can be accessed. Ma sure the entires of `allowed_paths` only contains files related to your application.
* Run your gradio application from the same directory the application file is located in. This will narrow the scope of files Gradio will be allowed to move into the cache. For examples, prefer `python app.py` to `python Users/sources/project/app.py`.
Member

Suggested change
* Run your gradio application from the same directory the application file is located in. This will narrow the scope of files Gradio will be allowed to move into the cache. For examples, prefer `python app.py` to `python Users/sources/project/app.py`.
* Run your gradio application from the same directory the application file is located in. This will narrow the scope of files Gradio will be allowed to move into the cache. For example, prefer `python app.py` to `python Users/sources/project/app.py`.
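
The difference between the two invocations comes down to the interpreter's working directory. A small check like the following could flag the second case at startup. The helper `launched_from_app_dir` is hypothetical and is not part of Gradio.

```python
import os
from pathlib import Path
from typing import Optional


def launched_from_app_dir(app_file: str, cwd: Optional[str] = None) -> bool:
    """True if the working directory is the directory containing app_file."""
    current = Path(os.path.abspath(cwd or os.getcwd()))
    # A relative app_file is interpreted against the working directory,
    # which is how `python Users/sources/project/app.py` would be resolved.
    app_dir = Path(os.path.abspath(os.path.join(current, app_file))).parent
    return app_dir == current


# `python app.py` run from inside the project directory
print(launched_from_app_dir("app.py", cwd="/Users/sources/project"))
# `python Users/sources/project/app.py` run from the filesystem root
print(launched_from_app_dir("Users/sources/project/app.py", cwd="/"))
```

Inside an app, `launched_from_app_dir(__file__)` could be used to log a warning when the working directory is broader than the project directory.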

Member

@abidlabs abidlabs left a comment

LGTM @freddyaboulton! Small comments

@freddyaboulton freddyaboulton merged commit 8deeeb6 into 5.0-dev Aug 22, 2024
21 of 22 checks passed
@freddyaboulton freddyaboulton deleted the file-access-security-guide-2 branch August 22, 2024 19:43
@freddyaboulton
Collaborator Author

Thank you @abidlabs !

freddyaboulton added a commit that referenced this pull request Aug 22, 2024
* first draft

Add code

Add code

Add code

emphasis

* suggestions

* redirects

* add changeset

* trigger ci

* typos

---------

Co-authored-by: gradio-pr-bot <[email protected]>
freddyaboulton added a commit that referenced this pull request Aug 28, 2024
* Fix unified case

* commit

* Add code

* add changeset

* notebook

* Lint

* delete

* Fix code

* fix tests

* File access security guide (#9156)

* first draft

Add code

Add code

Add code

emphasis

* suggestions

* redirects

* add changeset

* trigger ci

* typos

---------

Co-authored-by: gradio-pr-bot <[email protected]>

* redirect

* typos

* link

* fix

* See what the problem is

* less time

* fix

* try again with busted cache

* try again

* Code

* Demo and code

---------

Co-authored-by: gradio-pr-bot <[email protected]>
Co-authored-by: pngwn <[email protected]>
3 participants