Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use yaml.safe_dump and _load? #26

Closed
bollwyvl opened this issue Sep 8, 2022 · 7 comments
Closed

Use yaml.safe_dump and _load? #26

bollwyvl opened this issue Sep 8, 2022 · 7 comments

Comments

@bollwyvl
Copy link
Contributor

bollwyvl commented Sep 8, 2022

Using the default load and dump features of pyyaml can have unintended consequences, as they can use YAML tags to execute arbitrary code.

For an application, this is great as it provides a low-code way to use complex structures.

For a library, this is kinda bad, as it means any yaml source is immediately an arbitrary code execution engine.

Likely, this would just need from yaml import safe_dump, safe_load.

Additionally, when reading yaml, using read_text(encoding="utf-8") is a good way to avoid future issues.

@Zsailer
Copy link
Member

Zsailer commented Sep 8, 2022

Thank you, @bollwyvl! This is super helpful!

I'll work on a PR later today to address these issues.

@dlqqq
Copy link
Contributor

dlqqq commented Sep 8, 2022

Python example of this can be found in TF: GHSA-r6jx-9g48-2r5r

@dlqqq
Copy link
Contributor

dlqqq commented Sep 8, 2022

@bollwyvl Wanted some more clarification from you here. Would you agree in saying this issue is specific to a YAML engine implementation? It doesn't look like the YAML spec requires an implementation to support custom tags that allow for ACE. The required tags are for very common types (e.g. dict, str, float) and seem to be safe as long as the engine is using the language's standard library and is safe. It seems the issue stems from the fact most implementations choose to have ACE via tags available by default for whatever reason, rather than stemming from the YAML spec itself.

@bollwyvl
Copy link
Contributor Author

bollwyvl commented Sep 8, 2022

Yes, indeed it's implementation-driven, but also use case driven.

Throughout the jupyter ecosystem, YAML has had a lot of push-back as an interchange and configuration format because of these security gotchas around untrusted input, performance, and the complexity of adopting one of the implementations: for a few years, the primary one just didn't work very well on windows. The deployment and performance situation has improved, but the fact that, as a default, it will allow calling any-old-class is not

That being said: for hand-writing JSON schema, YAML is lovely, especially for things like anchors, with decent tooling e.g. yaml-language-server.

If the only purpose is for loading configurations in this library is to be then be handed over to instantiate a JSON Schema validator, none of the none-core JSON types are of particular value, and safe_ is the right play.

If events are going to be flowing as YAML (which is a bad idea for myriad reasons), then best-effort-at-security seems like a requirement, rather than a sensible default.

At any rate: starting with pyyaml's best-effort-at-security safe_load and safe_dump is probably a better starting point for a library, and should probably be the default until a real need arises.

@dlqqq
Copy link
Contributor

dlqqq commented Sep 9, 2022

Agreed. We should not follow the existing convention of being unsafe by default 🙈

@dlqqq
Copy link
Contributor

dlqqq commented Sep 9, 2022

@Zsailer Sorry, forgot to specify "fixes #XX" instead to auto-close.

@Zsailer
Copy link
Member

Zsailer commented Sep 9, 2022

No problem. Thanks, @dlqqq. Closing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants