Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safe_load #119

Closed
postmodern opened this issue Jan 13, 2013 · 129 comments
Closed

Add safe_load #119

postmodern opened this issue Jan 13, 2013 · 129 comments

Comments

@postmodern
Copy link
Member

In lieu of the recent Rails YAML RCE vulnerability, Psych should provide a safe_load equivalent method, that only loads Ruby primitives.

@tenderlove
Copy link
Member

Can you please be more specific about the desired behavior? Is it acceptable to load Symbols? Are subclasses of Ruby primitives considered "primitive"? If someone sets an instance variable on a string, then dumps / loads that string, should the instance variable survive?

Can you please demonstrate how to use YAML dump / load to execute arbitrary code? AFAIK it depends on objects other than YAML.

@postmodern
Copy link
Member Author

  • Symbols, sure they are primitives. In order to prevent a DoS via Symbols, perhaps there should be a configuration mechanism to specify exactly what types/tags should be allowed.
  • Subclasses of primitives are no longer primitives. In safe mode the custom class could be ignored and the primitive class used instead.
  • Instance variables in primitives should not survive.

Can you please demonstrate how to use YAML dump / load to execute arbitrary code?

rails_rce.rb

AFAIK it depends on objects other than YAML.

Which is why I believe there should be a safe_load method or a configuration mechanism that restricts YAML to only loading primitives or explicitly allowed Classes.

@tenderlove
Copy link
Member

@postmodern does the rails_rce.rb script depend on code other than YAML? I'm confused.

@tenderlove
Copy link
Member

  • Symbols, sure they are primitives. In order to prevent a DoS via Symbols, perhaps there should be a configuration > mechanism to specify exactly what types/tags should be allowed.
  • Subclasses of primitives are no longer primitives. In safe mode the custom class could be ignored and the primitive > class used instead.
  • Instance variables in primitives should not survive.

Given these points, why not just use JSON?

@postmodern
Copy link
Member Author

JSON does not support Symbols, does not support non-String-keyed Hashes and does not have special formatting such as:

summary: |
  bla bla bla bla bla bla
  bla bla bla bla

    indented stuff

  bla bla bla bla

Is there any reason YAML should not have a secure mode?

@postmodern
Copy link
Member Author

@tenderlove rails_rce.rb uses !ruby/hash:MyClass and ActionController::Routing::RouteSet::NamedRouteCollection.

@charliesome's PoC used ActiveSupport::Deprecation::DeprecatedInstanceVariableProxy, ERB, Gem::Requirement and Rack::Session::Cookie::Base64::Marshal.

These exploits would not work if the YAML parser was configured to only allow primitives (aka safe_load).

@haileys
Copy link

haileys commented Jan 14, 2013

+1. Only primitives that YAML can deserialize without using ! directives should be allowed.

No symbols IMO. I can't see a use case where you would want to allow symbols but still restrict other deserialization.

@envygeeks
Copy link

👍 on what @postmodern and @charliesome are saying. I really wish it supported a way for it to skip over everything and raise what isn't in the specs by default, and certainly like what @charliesome said about skipping over symbols too... the symbols part is kind of important to me because it would make it easier to keep a consistent configuration file to API without having to run to_s on all the keys myself before I create an indifferent Hash.

@postmodern
Copy link
Member Author

Instead of having a big case/when statement of all the tags in to_ruby.rb, Psych could have a table of parser methods/procs. This way Psych could be configured to run in "safe mode". When an unknown tag is encountered, it could skip over it (and it's children) or raise an exception.

@envygeeks
Copy link

There is no other option other than raise an exception so it 💣 and somebody knows they did bad.

@tenderlove
Copy link
Member

@postmodern so YAML didn't execute the code, but Rails did?

Is there any reason YAML should not have a secure mode?

I don't particularly care whether it does or not. We just have to be specific about what it means. Also, we need someone to write the patch. :-)

@postmodern
Copy link
Member Author

@tenderlove YAML just happened to call #[]= on an instance of the Class we specified. ActionController::Routing::RouteSet::NamedRouteCollection just so happens to pass the key value from #[]= down to module_eval. We could have used other non-Rails Classes that somehow pass our data into instance_eval or send. CVE-2013-0156 is not really specific to Rails, other applications that accept arbitrary YAML are also vulnerable; such as chef-server which uses extlib.

Please see the Rails PoC write up for a full walk through.

@tenderlove
Copy link
Member

Ah, so it's not specific to YAML, but anything that will feed strings to eval. Makes sense.

Aaron Patterson
http://tenderlovemaking.com/
I'm on an iPhone so I apologize for top posting.

On Jan 13, 2013, at 7:26 PM, Postmodern [email protected] wrote:

@tenderlove YAML just happened to call #[]= on an instance of the Class we specified. ActionController::Routing::RouteSet::NamedRouteCollection just so happens to pass the key value from #[]= down to module_eval. We could have used other non-Rails Classes that somehow pass our data into instance_eval or send. CVE-2013-0156 is not really specific to Rails, other applications that accept arbitrary YAML are also vulnerable; such as chef-server which uses extlib.

Please refer to the Rails PoC write up for a full walk through.


Reply to this email directly or view it on GitHub.

@benmmurphy
Copy link

Marshal.load has this problem as well and so would any serialization mechanism that does the following with arbitrary Foo classes and arbitrary field values.

obj = Foo.allocate
obj.instance_variable_set(field, value)

@envygeeks
Copy link

@benmmurphy it's in Marshal's nature to be like this by simple definition of what it is, YAML however, it's not.

@trans
Copy link

trans commented Jan 15, 2013

I think safe load can have a very simple definition: It simply loads the YAML as if no ! tags are present. The only exceptions would be the YAML standard types which are not the implicit types (e.g. omap).

Psych can add a yaml_tag method so a program can see what the tag is. This would allow a program to look at the tag and decide what to do with it if need be, or it can just ignore the tag if there is no need. Having the tag available is important though. In fact, a fundamental design principle of YAML is the ability to "round-trip" --a YAML document that is loaded and then re-emitted should be substantively the same. So we don't want to loose that information even if it is being ignored in a safe load.

@envygeeks
Copy link

👍 on @trans tag callback idea.

@postmodern
Copy link
Member Author

👍 @trans. The definition is easy to grok.

@NathanZook
Copy link

I have been toying with my own ideas for a framework to do rpc/serialize objects for a while. I really don't know if this is the correct approach. Attribute assignment safety is not, and, in a language such as ruby, cannot ever be global. Indeed, some of the high-level classes (drb, tk, and the like) are the sorts of things that would make me nervous unless I examined them. (I count 82 instances of []= in /usr/local/lib/ruby/2.0.0.) On the other hand, there are a huge number of user classes for which []= is perfectly safe.

My approach is to whitelist. You can examine the existing ruby base and stdlib classes and determine if there is a problem with any of them. There are several ways to proceed from there, but most likely, the data should be held in a data structure inside Psyche. All classes not registered as safe then are unsafe. A facility would be provided to register a class as safe. (HashWithIndifferentAccess comes to mind) Presumably, Psyche might probe a previously unknown class to see if it lists itself as safe.

The next issue is attribute assignment generally. Again, if a whitelist approach is taken, (and exact matches required), it become fairly simple to proceed. Indeed, both of these facilities might well specify that some inputs are safe but not others.

With this approach, you have the full ability to do attribute assignment where it is known to be safe. And if a maintainer does not want to bother, then their classes are simply not safe.

A global switch that says, "while processing the following string, consider everything as safe" would bring back the existing behaviour--which is exactly what would be needed in some cases.

@envygeeks
Copy link

@Student Protecting setters is your problem, not Pysch's problem. Psych is not your object-guardian. The goal is to block unserializing attributes. Example: If people don't want symbols then they come out as ":value" rather than :value, as far as []= and "protecting it"... that is your job to protect. The rest of your argument, I'm not even gonna touch, I'll leave that to other folk to debate.

@NathanZook
Copy link

The question to me seems to be how much pain a user must endure in order to make use of a tool. If YAML is to compete with JSON, then YAML cannot require the user to audit their entire object library for []= weirdness. What's more, I doubt that I'm the only one that did not know that YAML was calling []= until last week--people won't even know about the threat. You can sit back at say nasty things about people who do weird stuff with []= or you can make a library that doesn't regularly get implicated in security problems.

Again, what I'm saying is that safety requires that unsafe actions be screened out by default. To me, that suggests an audit of Ruby's base classes and a facility for classes to register themselves as safe to whatever degree.

@haileys
Copy link

haileys commented Jan 21, 2013

@Student: Why does this have to be so complicated? @trans's approach is the way to go - deserializing the YAML as if the ! directives didn't exist ensures that only 'primitive' Ruby types are created.

@JakubOboza
Copy link

Do we really needs this ? I have a strange feeling that making safe_load will be really hard and in the end nobody will use it because it will be so limited. Lets just not use YAML as something we accept from requests/external sources and just parse. If someone wants to do something like this he should explicitly say he wants to expose himself to problems that it carries with.

In my opinion we should use only/mostly JSON and Ruby. For config files ruby is so expressive that we can have instead of databas.yml like things something that will just load .rb file. aka

MyBaddAssWebAPP::Application.database do
  production do
    host: "localhost" 
     ....
  end
end

Or something similar to this. Do we need YAML at all ?

@envygeeks
Copy link

@JakubOboza How is it going to be limited? The fact of the matter is it already has an unserializer to move non-primates into their Objects, at that point it's a matter of having that Object tell you what kind of non-primative the Object wants to be and then you deciding if you want that non-primative, or giving you the ability to block non-primatives (like symbols) right off the bat. Since it has to have a decision engine for them to begin with (see lib/psych/visitors/to_ruby.rb) the logic for rejection is not really all that hard to add in. The callbacks might be hard, but only in the sense of how do you do it properly, not how do you implement it.

@JakubOboza
Copy link

Yeah right. So first of all do symbols get ever garbage collected ? No. So basically someone can makes series of reuqests with dozens of symbols and make your VM grow to insane size and die. Performing this DoS attack is not so hard.

So making load_safe or safe_load don't solve a problem but probably creates new ones :)

@trans
Copy link

trans commented Jan 21, 2013

@JakubOboza Using Ruby is the exact opposite of the solution to this issue. It would create even more security issues --insurmountable ones.

@Student Is your whitelist solution more complicated than we need? I wonder if whitelisting prior to loading is necessary when we can just safe-load plain YAML and then instantiate any nodes we need proactively. In other words, we can simply apply our own whitelist solution after loading given the plain YAML. That way Psych doesn't need to handle it and all the additional code and api it would entail.

However, I could see a nice general-purpose method for traversing arrays and hashes to do these conversions would be helpful to make that dead simple.

@JakubOboza
Copy link

Why ? if you use ruby in config files and never autoparse yaml in requests ? I gave the example just to show that we don't need yaml.

Everything you need from serialization format is list of objects thingy [ ], hash/map/object { } and string "" and i know working is not perfect but thats all and thats the stuff that you have in JSON.

@envygeeks
Copy link

@JakubOboza I think you should take a second, recollect your thoughts and come back when you are more rational and not spewing trash. Psych is what makes the non-primitives. So what you are saying is broken and flawed logic because safe_load would prevent the symbols from ever being created and would keep them as primitive strings.

@NathanZook
Copy link

First, an apology. I forgot that in general YAML is creating instances of objects, then making calls on these instances or setting attributes on them directly. Some of my first comments were coloured by that error. Clearly, it is safe to set attributes on a newly created instance, so there is no concern there.

However, if a class uses a singleton object, this is no longer safe, as global state is being overwritten. (I observe no occurrences of "ingleton" in my ruby 1.9.3p327 source.)

[]=, is another matter entirely, however. []= is no more a setter than a method ending in ? is a boolean probe. In fact, I would argue that it is less so. There is nothing at all wrong with having some sort of generic class that implements []= as sending messages to other classes. It just breaks some assumptions.

Now as I mentioned before, someone who wants to use gem X (which depends on gem Y) should not have to do research to figure out if it is safe to do so with YAML parsing external input. And just because I need to use a gem that needs a gem that implements an obscure class that is not safe, why should I be prevented from using YAML in the normal way for my entire application?

I audited the rails lib classes yesterday, and as of 2.0.0.preview2, they are clean. It is easy enough to create a structure to hold this data. When a unknown classes is encountered, it can be queried to determine if it has a yaml_safe method. If it returns false, then not even attributes will be set. If it returns true, then []= will be called with abandon. If it returns a symbol, that method will be called instead of []=. If it returns something that responds to call.... Otherwise, the class is registered as unknown, and attributes will be set but []= will not be called.

Of course, the global settings can also be used, and the end user can explicitly register whatever sort of bizarre rules make sense for them. Let the user decide!

With such a facility, the community has an easy way to register the external-input safety of their classes, and 99.9% of the people can proceed in ignorant bliss.

I REALLY want to be able to use YAML for parsing external input. JSON is entirely too lowest-common-denominator. XML is stilted at best. Marshall practically requires both ends to be running the same everything. DrB is for internal use only.

@envygeeks
Copy link

This shit is bananas. Seriously though, this thread turned from useful into a bag full of "come again say what?"

@blambeau
Copy link

@Kimcha

We are discussing convenience vs security and I cannot believe you are in favour of convenience at such a risk.

I'm affraid, my experience in the ruby world seems to suggest that ruby programmers tend to favor convenience over almost everything, including security. Given what you've already told about your project and concerns, I'm not sure the ruby/rails culture will fit very well.

@envygeeks
Copy link

@blambeau I'll just leave this here:

YAML
serialization language
serialization
broadly useful for programming needs
broadly useful
broadly
configuration files to Internet messaging
configuration files

Just because it says it can be used for the internet doesn't mean it's a good idea and doesn't mean "should".

@postmodern
Copy link
Member Author

@Kimcha changing YAML.load would not just break the application, it would break many dependencies. The change would be comparable to upgrading from 1.8 to 1.9.

We are discussing convenience vs security and I cannot believe you are in favour of convenience at such a risk.

I do not agree that this is a binary choice between convenience and security, that we must be forced into choosing one or the other. It is simply the matter of the right tool for the right job:

  • Is the YAML input trusted: YAML.load
  • Is the YAML input from the user: YAML.load_safe

Just because nobody has reported to the rails security team that a part is vulnerable, doesnt mean it is not.

This is where one should take extreme measures, such as auditing their code-base for any YAML.load calls. On a more philosophical note, we can never know every vulnerability before they are reported. We can only respond to vulnerabilities as they are reported.

It is an never ending game to try to fix all the attacking vectors when you can simply fix it at the root of the issue in default behaviour.

You assume that every developer (and user) would immediately upgrade to the latest version of Psych containing safe_load. :) Although, there is a bit of truth to this. Even if we were to audit every rubygem and every Rails app, a clueless (or sleep deprived) developer could still use YAML.load unsafely. The Ruby programming language does not prevent the developer from making bad design decisions. Even if we changed YAML.load to behave like YAML.safe_load, there is still eval, instance_eval, module_eval, send, system, CSV.load, JSON.load. Likewise, SQL injection and XSS are still possible in the Java and Haskell programming languages. A competent developer must be aware of the potential dangers of their languages/frameworks and avoid them. I feel that Secure Coding Guidelines are a part of the solution here.

@blambeau
Copy link

@Kimcha something I haven't told you about the ruby world: confusion. A nice example from @postmodern here:

It is simply the matter of the right tool for the right job:

  • Is the YAML input trusted: YAML.load
  • Is the YAML input from the user: YAML.load_safe

Very strong confusion between functional requirements (loading data that might or might not contain only scalars) and non-functional requirements (keeping the operation safe).

There is no such right tool for the right job, especially when your job tells you to load non-scalar data in a safe way. Anyway...

@mpapis
Copy link

mpapis commented Feb 16, 2013

continuing #119 (comment) - I assume that every developer does not write to_s_safe, you just assume that default is the safe the same default in the feature should be safe, even if it takes 3 years to make it happen - we should make it happen.

Would it be hard to announce an deprecation today, add a warning in a year, allow flag in two years and switch the new default in 3 years - is it enough time to make things safe?

@crazymykl
Copy link

I think YAML.load should remain as it is, .load implies that we are trusting the input, just as in Marshal.load or CSV.load. Perhaps, by analogy, the primitives-only version should be YAML.parse?

@envygeeks
Copy link

If you want to fall inline with JSON then sure, which I wouldn't mind since it means consistency.

@trevor
Copy link

trevor commented Jun 2, 2013

since it hasn't been mentioned, here's documentation that contains examples of python's / pyyaml's take on safe_load: http://pyyaml.org/wiki/PyYAMLDocumentation

@tenderlove
Copy link
Member

safe_load is added and released in 2.0.0, so I'm closing this.

@envygeeks
Copy link

Thank you!

@postmodern
Copy link
Member Author

w00t finally!

mislav added a commit to lostisland/faraday_middleware that referenced this issue Oct 10, 2014
We can't just switch to `YAML.safe_load()` ourselves since that would
break backwards compatibility. For instance, `safe_load` returns nil for
empty yaml documents where `load` returns `false`. Also, `safe_load`
will refuse to parse Symbol keys since DoS attacks targeting symbols are
a real thread. Finally, not every Ruby version has a Psych that supports
`safe_laod`.

ruby/psych#119 (comment)

Fixes #92
gylaz pushed a commit to houndci/hound that referenced this issue Feb 26, 2015
There's a vulnerability in `YAML.load` which can enable arbitrary code to be run
on our sytems. `YAML.safe_load` does not deserialize unsafe classes.

Related reading:
http://blog.codeclimate.com/blog/2013/01/10/rails-remote-code-execution-vulnerability-explained/
ruby/psych#119
http://docs.ruby-lang.org/en/2.1.0/Psych.html#method-c-safe_load
mislav added a commit to lostisland/faraday_middleware that referenced this issue Jul 7, 2015
We can't just switch to `YAML.safe_load()` ourselves since that would
break backwards compatibility. For instance, `safe_load` returns nil for
empty yaml documents where `load` returns `false`. Also, `safe_load`
will refuse to parse Symbol keys since DoS attacks targeting symbols are
a real thread. Finally, not every Ruby version has a Psych that supports
`safe_laod`.

ruby/psych#119 (comment)

Fixes #92
petems pushed a commit to petems/faraday_middleware_safeyaml that referenced this issue Jul 3, 2017
An issue from June 2014,
lostisland#92, raises
the risks of the current `FaradayMiddleware::ParseYaml` middleware
which uses `YAML.load`. This method is very unsafe and exposes you
to remote code execution - see
ruby/psych#119 for discussion.

At the time, @mislav decided not to make this change to avoid
messing with backwards compatability.

I would suggest that we should revisit this decision - the risks
of this are very high, very few people are using this middleware
most likely, and it doesn't seem unreasonable to break this
as long as we are clear on the change in the changelog.

This does that by installing the `safe_yaml` gem, which is
compatible with all Ruby versions we support.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests