Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does PyYAML 5.1 raise YAMLLoadWarning when the default loader has been made safer already? #292

Open
lonelearner opened this issue Apr 14, 2019 · 8 comments
Labels

Comments

@lonelearner
Copy link

Here is my code:

import yaml
yaml.load('foo')

This code leads to the following warning with PyYAML (5.1).

$ pip install pyyaml
$ python3 foo.py
foo.py:2: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml.load('foo')

So I visited https://msg.pyyaml.org/load to see what this is about but I do not understand the need for this warning.

First, the documentation says,

UnsafeLoader (also called Loader for backwards compatability)

The original Loader code that could be easily exploitable by untrusted data input.

Okay, that makes sense. In an earlier version, the original loader was unsafe. Further, it says,

FullLoader

Loads the full YAML language. Avoids arbitrary code execution. This is currently (PyYAML 5.1) the default loader called by yaml.load(input) (after issuing the warning).

So the current version uses FullLoader which is not unsafe. This is confirmed again in the document.

The load function was also made much safer by disallowing the execution of arbitrary functions by the default loader (FullLoader).

If the current version that uses FullLoader is not unsafe, then why do we need the YAMLLoadWarning at all?

@JonathonReinhart
Copy link

Because if your code also ran against an older version of PyYAML, then it would still be vulnerable. The purpose of the warning is to get you (the developer) to change your code to explicitly pass a loader, so that it can be safe on any version of PyYAML.

ngfgrant added a commit to Sceptre/sceptre that referenced this issue Apr 16, 2019
Updates the PyYaml version to 5.1.

Adds in YamlLoader as per yaml/pyyaml#292

Other incompatible changes were reviewed
(yaml/pyyaml#265) and the yaml.Loader appears
to the be only concern for now.

[Resolves #665]
ngfgrant added a commit to Sceptre/sceptre that referenced this issue Apr 23, 2019
Updates the PyYaml version to 5.1.

Adds in YamlLoader as per yaml/pyyaml#292

Other incompatible changes were reviewed
(yaml/pyyaml#265) and the yaml.Loader appears
to the be only concern for now.

[Resolves #665]
@slavaGanzin
Copy link

And how it can be safer on any version of PyYAML if Loader.FullLoader was implemented recently?

ngfgrant added a commit to Sceptre/sceptre that referenced this issue May 2, 2019
Updates the PyYaml version to 5.1.

Adds in YamlLoader as per yaml/pyyaml#292

Other incompatible changes were reviewed
(yaml/pyyaml#265) and the yaml.Loader appears
to the be only concern for now.

[Resolves #665]
@bingyao
Copy link

bingyao commented Jun 5, 2019

I recently also encounter this warning, after some reading, I posted an answer for your question in StackOverflow, just copied the answer here:

I think this warning is more like a notification & guidance to let the user know what is the PyYAML best practice in the future. Recall that: Explicit is better than implicit.


Before version 5.1 (e.g. 4.1), the yaml.load api use Loader=Loader as default:

def load(stream, Loader=Loader):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags.
    """
    return load(stream, SafeLoader)

At that time, there were only three available choice for Loader class: limited BaseLoader, SafeLoader and the unsafe Loader. Although the default one is unsafe, just like we read from the doc:

PyYAML's load function has been unsafe since the first release in May
2006. It has always been documented that way in bold type: PyYAMLDocumentation.
PyYAML has always provided a safe_load function that can load a subset of
YAML without exploit.

But there are still a lot of resources and tutorials prefer using yaml.load(f) directly, so the users (and especially the new user) are choosing a default Loader class implicitly.


And since PyYAML version 5.1, the yaml.load api is changed to be more explicit:

def load(stream, Loader=None):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    if Loader is None:
        load_warning('load')
        Loader = FullLoader

    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags. This is known
    to be safe for untrusted input.
    """
    return load(stream, SafeLoader)

And a new FullLoader is added into Loader classes. As users, we should also be aware of the changes and use yaml.load more explicitly:

  • yaml.load(stream, SafeLoader)

    Recommended for untrusted input. Limitation: Loads a subset of the YAML language.

  • yaml.load(stream, FullLoader)

    For more trusted input. Still a bit of Limitation: Avoids arbitrary code execution.

  • yaml.load(stream, Loader) (UnsafeLoader is the same as Loader)

    Unsafe. But has the full power.

@slavaGanzin
Copy link

slavaGanzin commented Jun 5, 2019

Explicit is better than implicit.

I just want my YAML loaded. I don't care about best practices, moral views of creators and other unrelated stuff. You made it more secure? Awesome, good job. But make changes unnoticeable: set securest loader by default. It's service library and it should make things done and not "educate users".

And if you like to cite principles, here is one for you too:
https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle

@rpuntaie
Copy link

Flectra has a problem with pyyaml 5.1, which would not need to be. I just changed =Loader to =FullLoader consistently in __init__() and then flectra works with 5.1. Why the change in only load() to make it inconsistent with the other functions in __init__.py?

I would suggest some tests that test the interface of pyyaml as a whole, i.e. the interoperability of the functions, like add_constructor() and load().
Use these tests to verify that changes do not change the interface.
Interface changes do produce a lot of effort. We've seen this with the Python 3 move.

I wonder whether semantic versioning is the right way to go at all.
New interfaces should rather be handled via a new name and be installable in parallel.
So we would have pyyaml5 1.0 instead of pyyaml 5.1.0. In code we would do import yaml5 instead of import yaml.

@perlpunk
Copy link
Member

I just changed =Loader to =FullLoader consistently in __init__() and then flectra works with 5.1

This has been fixed in the 5.2 release branch a while ago, but the release was blocked.
We're trying to get that out as soon as possible now.

@perlpunk
Copy link
Member

perlpunk commented Dec 2, 2019

We released 5.2: https://pypi.org/project/PyYAML/5.2/

thawkson pushed a commit to thawkson/sceptre that referenced this issue Feb 6, 2021
Updates the PyYaml version to 5.1.

Adds in YamlLoader as per yaml/pyyaml#292

Other incompatible changes were reviewed
(yaml/pyyaml#265) and the yaml.Loader appears
to the be only concern for now.

[Resolves Sceptre#665]
@syed-shah100
Copy link

@perlpunk Is there a way to silence this warning without changing the code or has it been fixed in any newer versions? I am currently using PyYAML==5.3.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants