Add safe_load #119

postmodern · 2013-01-13T08:50:11Z

In lieu of the recent Rails YAML RCE vulnerability, Psych should provide a safe_load equivalent method, that only loads Ruby primitives.

tenderlove · 2013-01-13T20:21:24Z

Can you please be more specific about the desired behavior? Is it acceptable to load Symbols? Are subclasses of Ruby primitives considered "primitive"? If someone sets an instance variable on a string, then dumps / loads that string, should the instance variable survive?

Can you please demonstrate how to use YAML dump / load to execute arbitrary code? AFAIK it depends on objects other than YAML.

postmodern · 2013-01-14T02:19:10Z

Symbols, sure they are primitives. In order to prevent a DoS via Symbols, perhaps there should be a configuration mechanism to specify exactly what types/tags should be allowed.
Subclasses of primitives are no longer primitives. In safe mode the custom class could be ignored and the primitive class used instead.
Instance variables in primitives should not survive.

Can you please demonstrate how to use YAML dump / load to execute arbitrary code?

rails_rce.rb

AFAIK it depends on objects other than YAML.

Which is why I believe there should be a safe_load method or a configuration mechanism that restricts YAML to only loading primitives or explicitly allowed Classes.

tenderlove · 2013-01-14T02:21:10Z

@postmodern does the rails_rce.rb script depend on code other than YAML? I'm confused.

tenderlove · 2013-01-14T02:22:36Z

Symbols, sure they are primitives. In order to prevent a DoS via Symbols, perhaps there should be a configuration > mechanism to specify exactly what types/tags should be allowed.

Subclasses of primitives are no longer primitives. In safe mode the custom class could be ignored and the primitive > class used instead.

Instance variables in primitives should not survive.

Given these points, why not just use JSON?

postmodern · 2013-01-14T02:29:00Z

JSON does not support Symbols, does not support non-String-keyed Hashes and does not have special formatting such as:

summary: |
  bla bla bla bla bla bla
  bla bla bla bla

    indented stuff

  bla bla bla bla

Is there any reason YAML should not have a secure mode?

postmodern · 2013-01-14T02:33:50Z

@tenderlove rails_rce.rb uses !ruby/hash:MyClass and ActionController::Routing::RouteSet::NamedRouteCollection.

@charliesome's PoC used ActiveSupport::Deprecation::DeprecatedInstanceVariableProxy, ERB, Gem::Requirement and Rack::Session::Cookie::Base64::Marshal.

These exploits would not work if the YAML parser was configured to only allow primitives (aka safe_load).

haileys · 2013-01-14T02:42:09Z

+1. Only primitives that YAML can deserialize without using ! directives should be allowed.

No symbols IMO. I can't see a use case where you would want to allow symbols but still restrict other deserialization.

envygeeks · 2013-01-14T02:47:09Z

👍 on what @postmodern and @charliesome are saying. I really wish it supported a way for it to skip over everything and raise what isn't in the specs by default, and certainly like what @charliesome said about skipping over symbols too... the symbols part is kind of important to me because it would make it easier to keep a consistent configuration file to API without having to run to_s on all the keys myself before I create an indifferent Hash.

postmodern · 2013-01-14T02:50:51Z

Instead of having a big case/when statement of all the tags in to_ruby.rb, Psych could have a table of parser methods/procs. This way Psych could be configured to run in "safe mode". When an unknown tag is encountered, it could skip over it (and it's children) or raise an exception.

envygeeks · 2013-01-14T02:52:04Z

There is no other option other than raise an exception so it 💣 and somebody knows they did bad.

tenderlove · 2013-01-14T03:17:53Z

@postmodern so YAML didn't execute the code, but Rails did?

Is there any reason YAML should not have a secure mode?

I don't particularly care whether it does or not. We just have to be specific about what it means. Also, we need someone to write the patch. :-)

postmodern · 2013-01-14T03:26:30Z

@tenderlove YAML just happened to call #[]= on an instance of the Class we specified. ActionController::Routing::RouteSet::NamedRouteCollection just so happens to pass the key value from #[]= down to module_eval. We could have used other non-Rails Classes that somehow pass our data into instance_eval or send. CVE-2013-0156 is not really specific to Rails, other applications that accept arbitrary YAML are also vulnerable; such as chef-server which uses extlib.

Please see the Rails PoC write up for a full walk through.

tenderlove · 2013-01-14T06:29:23Z

Ah, so it's not specific to YAML, but anything that will feed strings to eval. Makes sense.

Aaron Patterson
http://tenderlovemaking.com/
I'm on an iPhone so I apologize for top posting.

On Jan 13, 2013, at 7:26 PM, Postmodern [email protected] wrote:

@tenderlove YAML just happened to call #[]= on an instance of the Class we specified. ActionController::Routing::RouteSet::NamedRouteCollection just so happens to pass the key value from #[]= down to module_eval. We could have used other non-Rails Classes that somehow pass our data into instance_eval or send. CVE-2013-0156 is not really specific to Rails, other applications that accept arbitrary YAML are also vulnerable; such as chef-server which uses extlib.

Please refer to the Rails PoC write up for a full walk through.

�
Reply to this email directly or view it on GitHub.

benmmurphy · 2013-01-14T09:48:07Z

Marshal.load has this problem as well and so would any serialization mechanism that does the following with arbitrary Foo classes and arbitrary field values.

obj = Foo.allocate
obj.instance_variable_set(field, value)

envygeeks · 2013-01-14T10:32:06Z

@benmmurphy it's in Marshal's nature to be like this by simple definition of what it is, YAML however, it's not.

trans · 2013-01-15T22:32:41Z

I think safe load can have a very simple definition: It simply loads the YAML as if no ! tags are present. The only exceptions would be the YAML standard types which are not the implicit types (e.g. omap).

Psych can add a yaml_tag method so a program can see what the tag is. This would allow a program to look at the tag and decide what to do with it if need be, or it can just ignore the tag if there is no need. Having the tag available is important though. In fact, a fundamental design principle of YAML is the ability to "round-trip" --a YAML document that is loaded and then re-emitted should be substantively the same. So we don't want to loose that information even if it is being ignored in a safe load.

envygeeks · 2013-01-15T22:37:28Z

👍 on @trans tag callback idea.

postmodern · 2013-01-15T22:41:09Z

👍 @trans. The definition is easy to grok.

NathanZook · 2013-01-21T03:08:00Z

I have been toying with my own ideas for a framework to do rpc/serialize objects for a while. I really don't know if this is the correct approach. Attribute assignment safety is not, and, in a language such as ruby, cannot ever be global. Indeed, some of the high-level classes (drb, tk, and the like) are the sorts of things that would make me nervous unless I examined them. (I count 82 instances of []= in /usr/local/lib/ruby/2.0.0.) On the other hand, there are a huge number of user classes for which []= is perfectly safe.

My approach is to whitelist. You can examine the existing ruby base and stdlib classes and determine if there is a problem with any of them. There are several ways to proceed from there, but most likely, the data should be held in a data structure inside Psyche. All classes not registered as safe then are unsafe. A facility would be provided to register a class as safe. (HashWithIndifferentAccess comes to mind) Presumably, Psyche might probe a previously unknown class to see if it lists itself as safe.

The next issue is attribute assignment generally. Again, if a whitelist approach is taken, (and exact matches required), it become fairly simple to proceed. Indeed, both of these facilities might well specify that some inputs are safe but not others.

With this approach, you have the full ability to do attribute assignment where it is known to be safe. And if a maintainer does not want to bother, then their classes are simply not safe.

A global switch that says, "while processing the following string, consider everything as safe" would bring back the existing behaviour--which is exactly what would be needed in some cases.

envygeeks · 2013-01-21T03:52:25Z

@Student Protecting setters is your problem, not Pysch's problem. Psych is not your object-guardian. The goal is to block unserializing attributes. Example: If people don't want symbols then they come out as ":value" rather than :value, as far as []= and "protecting it"... that is your job to protect. The rest of your argument, I'm not even gonna touch, I'll leave that to other folk to debate.

NathanZook · 2013-01-21T07:51:02Z

The question to me seems to be how much pain a user must endure in order to make use of a tool. If YAML is to compete with JSON, then YAML cannot require the user to audit their entire object library for []= weirdness. What's more, I doubt that I'm the only one that did not know that YAML was calling []= until last week--people won't even know about the threat. You can sit back at say nasty things about people who do weird stuff with []= or you can make a library that doesn't regularly get implicated in security problems.

Again, what I'm saying is that safety requires that unsafe actions be screened out by default. To me, that suggests an audit of Ruby's base classes and a facility for classes to register themselves as safe to whatever degree.

haileys · 2013-01-21T08:28:32Z

@Student: Why does this have to be so complicated? @trans's approach is the way to go - deserializing the YAML as if the ! directives didn't exist ensures that only 'primitive' Ruby types are created.

JakubOboza · 2013-01-21T10:40:07Z

Do we really needs this ? I have a strange feeling that making safe_load will be really hard and in the end nobody will use it because it will be so limited. Lets just not use YAML as something we accept from requests/external sources and just parse. If someone wants to do something like this he should explicitly say he wants to expose himself to problems that it carries with.

In my opinion we should use only/mostly JSON and Ruby. For config files ruby is so expressive that we can have instead of databas.yml like things something that will just load .rb file. aka

MyBaddAssWebAPP::Application.database do
  production do
    host: "localhost" 
     ....
  end
end

Or something similar to this. Do we need YAML at all ?

envygeeks · 2013-01-21T15:44:38Z

@JakubOboza How is it going to be limited? The fact of the matter is it already has an unserializer to move non-primates into their Objects, at that point it's a matter of having that Object tell you what kind of non-primative the Object wants to be and then you deciding if you want that non-primative, or giving you the ability to block non-primatives (like symbols) right off the bat. Since it has to have a decision engine for them to begin with (see lib/psych/visitors/to_ruby.rb) the logic for rejection is not really all that hard to add in. The callbacks might be hard, but only in the sense of how do you do it properly, not how do you implement it.

JakubOboza · 2013-01-21T16:55:54Z

Yeah right. So first of all do symbols get ever garbage collected ? No. So basically someone can makes series of reuqests with dozens of symbols and make your VM grow to insane size and die. Performing this DoS attack is not so hard.

So making load_safe or safe_load don't solve a problem but probably creates new ones :)

trans · 2013-01-21T16:56:28Z

@JakubOboza Using Ruby is the exact opposite of the solution to this issue. It would create even more security issues --insurmountable ones.

@Student Is your whitelist solution more complicated than we need? I wonder if whitelisting prior to loading is necessary when we can just safe-load plain YAML and then instantiate any nodes we need proactively. In other words, we can simply apply our own whitelist solution after loading given the plain YAML. That way Psych doesn't need to handle it and all the additional code and api it would entail.

However, I could see a nice general-purpose method for traversing arrays and hashes to do these conversions would be helpful to make that dead simple.

JakubOboza · 2013-01-21T16:59:09Z

Why ? if you use ruby in config files and never autoparse yaml in requests ? I gave the example just to show that we don't need yaml.

Everything you need from serialization format is list of objects thingy [ ], hash/map/object { } and string "" and i know working is not perfect but thats all and thats the stuff that you have in JSON.

envygeeks · 2013-01-21T16:59:19Z

@JakubOboza I think you should take a second, recollect your thoughts and come back when you are more rational and not spewing trash. Psych is what makes the non-primitives. So what you are saying is broken and flawed logic because safe_load would prevent the symbols from ever being created and would keep them as primitive strings.

NathanZook · 2013-01-22T00:21:52Z

First, an apology. I forgot that in general YAML is creating instances of objects, then making calls on these instances or setting attributes on them directly. Some of my first comments were coloured by that error. Clearly, it is safe to set attributes on a newly created instance, so there is no concern there.

However, if a class uses a singleton object, this is no longer safe, as global state is being overwritten. (I observe no occurrences of "ingleton" in my ruby 1.9.3p327 source.)

[]=, is another matter entirely, however. []= is no more a setter than a method ending in ? is a boolean probe. In fact, I would argue that it is less so. There is nothing at all wrong with having some sort of generic class that implements []= as sending messages to other classes. It just breaks some assumptions.

Now as I mentioned before, someone who wants to use gem X (which depends on gem Y) should not have to do research to figure out if it is safe to do so with YAML parsing external input. And just because I need to use a gem that needs a gem that implements an obscure class that is not safe, why should I be prevented from using YAML in the normal way for my entire application?

I audited the rails lib classes yesterday, and as of 2.0.0.preview2, they are clean. It is easy enough to create a structure to hold this data. When a unknown classes is encountered, it can be queried to determine if it has a yaml_safe method. If it returns false, then not even attributes will be set. If it returns true, then []= will be called with abandon. If it returns a symbol, that method will be called instead of []=. If it returns something that responds to call.... Otherwise, the class is registered as unknown, and attributes will be set but []= will not be called.

Of course, the global settings can also be used, and the end user can explicitly register whatever sort of bizarre rules make sense for them. Let the user decide!

With such a facility, the community has an easy way to register the external-input safety of their classes, and 99.9% of the people can proceed in ignorant bliss.

I REALLY want to be able to use YAML for parsing external input. JSON is entirely too lowest-common-denominator. XML is stilted at best. Marshall practically requires both ends to be running the same everything. DrB is for internal use only.

envygeeks · 2013-01-22T00:39:44Z

This shit is bananas. Seriously though, this thread turned from useful into a bag full of "come again say what?"

blambeau · 2013-02-12T14:37:53Z

@Kimcha

We are discussing convenience vs security and I cannot believe you are in favour of convenience at such a risk.

I'm affraid, my experience in the ruby world seems to suggest that ruby programmers tend to favor convenience over almost everything, including security. Given what you've already told about your project and concerns, I'm not sure the ruby/rails culture will fit very well.

envygeeks · 2013-02-12T14:40:13Z

@blambeau I'll just leave this here:

YAML
serialization language
serialization
broadly useful for programming needs
broadly useful
broadly
configuration files to Internet messaging
configuration files

Just because it says it can be used for the internet doesn't mean it's a good idea and doesn't mean "should".

postmodern · 2013-02-12T15:01:38Z

@Kimcha changing YAML.load would not just break the application, it would break many dependencies. The change would be comparable to upgrading from 1.8 to 1.9.

We are discussing convenience vs security and I cannot believe you are in favour of convenience at such a risk.

I do not agree that this is a binary choice between convenience and security, that we must be forced into choosing one or the other. It is simply the matter of the right tool for the right job:

Is the YAML input trusted: YAML.load
Is the YAML input from the user: YAML.load_safe

Just because nobody has reported to the rails security team that a part is vulnerable, doesnt mean it is not.

This is where one should take extreme measures, such as auditing their code-base for any YAML.load calls. On a more philosophical note, we can never know every vulnerability before they are reported. We can only respond to vulnerabilities as they are reported.

It is an never ending game to try to fix all the attacking vectors when you can simply fix it at the root of the issue in default behaviour.

You assume that every developer (and user) would immediately upgrade to the latest version of Psych containing safe_load. :) Although, there is a bit of truth to this. Even if we were to audit every rubygem and every Rails app, a clueless (or sleep deprived) developer could still use YAML.load unsafely. The Ruby programming language does not prevent the developer from making bad design decisions. Even if we changed YAML.load to behave like YAML.safe_load, there is still eval, instance_eval, module_eval, send, system, CSV.load, JSON.load. Likewise, SQL injection and XSS are still possible in the Java and Haskell programming languages. A competent developer must be aware of the potential dangers of their languages/frameworks and avoid them. I feel that Secure Coding Guidelines are a part of the solution here.

blambeau · 2013-02-12T15:07:41Z

@Kimcha something I haven't told you about the ruby world: confusion. A nice example from @postmodern here:

It is simply the matter of the right tool for the right job:

Is the YAML input trusted: YAML.load

Is the YAML input from the user: YAML.load_safe

Very strong confusion between functional requirements (loading data that might or might not contain only scalars) and non-functional requirements (keeping the operation safe).

There is no such right tool for the right job, especially when your job tells you to load non-scalar data in a safe way. Anyway...

mpapis · 2013-02-16T00:41:59Z

continuing #119 (comment) - I assume that every developer does not write to_s_safe, you just assume that default is the safe the same default in the feature should be safe, even if it takes 3 years to make it happen - we should make it happen.

Would it be hard to announce an deprecation today, add a warning in a year, allow flag in two years and switch the new default in 3 years - is it enough time to make things safe?

crazymykl · 2013-02-20T15:56:29Z

I think YAML.load should remain as it is, .load implies that we are trusting the input, just as in Marshal.load or CSV.load. Perhaps, by analogy, the primitives-only version should be YAML.parse?

envygeeks · 2013-02-20T16:38:26Z

If you want to fall inline with JSON then sure, which I wouldn't mind since it means consistency.

trevor · 2013-06-02T23:48:26Z

since it hasn't been mentioned, here's documentation that contains examples of python's / pyyaml's take on safe_load: http://pyyaml.org/wiki/PyYAMLDocumentation

tenderlove · 2013-07-05T19:55:51Z

safe_load is added and released in 2.0.0, so I'm closing this.

envygeeks · 2013-07-05T19:57:02Z

Thank you!

postmodern · 2013-07-05T22:42:30Z

w00t finally!

We can't just switch to `YAML.safe_load()` ourselves since that would break backwards compatibility. For instance, `safe_load` returns nil for empty yaml documents where `load` returns `false`. Also, `safe_load` will refuse to parse Symbol keys since DoS attacks targeting symbols are a real thread. Finally, not every Ruby version has a Psych that supports `safe_laod`. ruby/psych#119 (comment) Fixes #92

There's a vulnerability in `YAML.load` which can enable arbitrary code to be run on our sytems. `YAML.safe_load` does not deserialize unsafe classes. Related reading: http://blog.codeclimate.com/blog/2013/01/10/rails-remote-code-execution-vulnerability-explained/ ruby/psych#119 http://docs.ruby-lang.org/en/2.1.0/Psych.html#method-c-safe_load

We can't just switch to `YAML.safe_load()` ourselves since that would break backwards compatibility. For instance, `safe_load` returns nil for empty yaml documents where `load` returns `false`. Also, `safe_load` will refuse to parse Symbol keys since DoS attacks targeting symbols are a real thread. Finally, not every Ruby version has a Psych that supports `safe_laod`. ruby/psych#119 (comment) Fixes #92

@mislav

An issue from June 2014, lostisland#92, raises the risks of the current `FaradayMiddleware::ParseYaml` middleware which uses `YAML.load`. This method is very unsafe and exposes you to remote code execution - see ruby/psych#119 for discussion. At the time, @mislav decided not to make this change to avoid messing with backwards compatability. I would suggest that we should revisit this decision - the risks of this are very high, very few people are using this middleware most likely, and it doesn't seem unreasonable to break this as long as we are clear on the change in the changelog. This does that by installing the `safe_yaml` gem, which is compatible with all Ruby versions we support.

geemus mentioned this issue Jun 11, 2013

"Called 'load_file' without the :safe option -- defaulting to safe mode." fog/fog#1880

Closed

tenderlove closed this as completed Jul 5, 2013

gjtorikian mentioned this issue Sep 11, 2013

Add yaml front-matter filter gjtorikian/html-pipeline#83

Closed

dtao mentioned this issue Apr 9, 2014

Is safe_yaml safe from CVE-2014-2525? dtao/safe_yaml#56

Closed

bklang mentioned this issue Aug 29, 2014

unable to run scenario mojolingo/sippy_cup#63

Closed

gylaz mentioned this issue Feb 26, 2015

Use YAML.safe_load when loading custom config houndci/hound#684

Merged

croaky mentioned this issue Feb 26, 2015

Use YAML.safe_load rubocop/rubocop#1680

Merged

kbrock mentioned this issue Nov 13, 2015

Stopped logging warnings for large queue payload ManageIQ/manageiq#5393

Closed

otheus mentioned this issue Jul 22, 2016

Can we please get a "raw" loader? #286

Open

dometto mentioned this issue Apr 8, 2017

Yaml frontmatter gollum/gollum-lib#252

Merged

justin808 mentioned this issue Apr 10, 2017

Psych::DisallowedClass during some locale files convertion to js shakacode/react_on_rails#798

Closed

timrogers mentioned this issue Jun 14, 2017

Parse YAML safely lostisland/faraday_middleware#157

Merged

lacostej mentioned this issue Oct 1, 2017

undefined method `safe_load' for Psych:Module DragonBox/u3d#156

Closed

cben mentioned this issue May 31, 2018

[security] Config.read: Use YAML.safe_load to avoid arbitrary Ruby class instantiation ManageIQ/kubeclient#334

Merged

eli-darkly mentioned this issue Jul 24, 2019

Using YAML.load for the FileDataSource launchdarkly/ruby-server-sdk#139

Closed

Add safe_load #119

Add safe_load #119

Comments

postmodern commented Jan 13, 2013

tenderlove commented Jan 13, 2013

postmodern commented Jan 14, 2013

tenderlove commented Jan 14, 2013

tenderlove commented Jan 14, 2013

postmodern commented Jan 14, 2013

postmodern commented Jan 14, 2013

haileys commented Jan 14, 2013

envygeeks commented Jan 14, 2013

postmodern commented Jan 14, 2013

envygeeks commented Jan 14, 2013

tenderlove commented Jan 14, 2013

postmodern commented Jan 14, 2013

tenderlove commented Jan 14, 2013

benmmurphy commented Jan 14, 2013

envygeeks commented Jan 14, 2013

trans commented Jan 15, 2013

envygeeks commented Jan 15, 2013

postmodern commented Jan 15, 2013

NathanZook commented Jan 21, 2013

envygeeks commented Jan 21, 2013

NathanZook commented Jan 21, 2013

haileys commented Jan 21, 2013

JakubOboza commented Jan 21, 2013

envygeeks commented Jan 21, 2013

JakubOboza commented Jan 21, 2013

trans commented Jan 21, 2013

JakubOboza commented Jan 21, 2013

envygeeks commented Jan 21, 2013

NathanZook commented Jan 22, 2013

envygeeks commented Jan 22, 2013

blambeau commented Feb 12, 2013

envygeeks commented Feb 12, 2013

postmodern commented Feb 12, 2013

blambeau commented Feb 12, 2013

mpapis commented Feb 16, 2013

crazymykl commented Feb 20, 2013

envygeeks commented Feb 20, 2013

trevor commented Jun 2, 2013

tenderlove commented Jul 5, 2013

envygeeks commented Jul 5, 2013

postmodern commented Jul 5, 2013