Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport tests and fix for CVE-2018-3740 to 2.x branch. Resolves #187 #188

Merged
merged 7 commits into from
Sep 30, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,26 @@
Sanitize History
================================================================================

Version 2.1.1 (2018-30-09)
--------------------------

* [CVE-2018-3740][176]: Backported the fix for an HTML injection vulnerability that could allow
XSS from the `sanitize 4.x` line.

When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
specially crafted HTML fragment can cause libxml2 to generate improperly
escaped output, allowing non-whitelisted attributes to be used on whitelisted
elements.

Sanitize now performs additional escaping on affected attributes to prevent
this.

Many thanks to the Shopify Application Security Team for responsibly reporting
this issue.

[176]:https://github.com/rgrove/sanitize/issues/176
[187]:https://github.com/rgrove/sanitize/issues/187

Version 2.1.0 (2014-01-13)
--------------------------

Expand Down
54 changes: 53 additions & 1 deletion lib/sanitize/transformers/clean_element.rb
Original file line number Diff line number Diff line change
@@ -1,6 +1,32 @@
class Sanitize; module Transformers

class CleanElement

# Attributes that need additional escaping on `<a>` elements due to unsafe
# libxml2 behavior.
UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
name
])

# Attributes that need additional escaping on all elements due to unsafe
# libxml2 behavior.
UNSAFE_LIBXML_ATTRS_GLOBAL = Set.new(%w[
action
href
src
])

# Mapping of original characters to escape sequences for characters that
# should be escaped in attributes affected by unsafe libxml2 behavior.
UNSAFE_LIBXML_ESCAPE_CHARS = {
' ' => '%20',
'"' => '%22'
}

# Regex that matches any single character that needs to be escaped in
# attributes affected by unsafe libxml2 behavior.
UNSAFE_LIBXML_ESCAPE_REGEX = /[ "]/

def initialize(config)
@config = config

Expand Down Expand Up @@ -88,11 +114,37 @@ def call(env)
!protocol[attr_name].include?(:relative)
end

attr.unlink if del
if del
attr.unlink
else
# Leading and trailing whitespace around URLs is ignored at parse
# time. Stripping it here prevents it from being escaped by the
# libxml2 workaround below.
attr.value = attr.value.strip
end
end
end
end

# libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
# attempt to preserve server-side includes. This can result in XSS since
# an unescaped double quote can allow an attacker to inject a
# non-whitelisted attribute.
#
# Sanitize works around this by implementing its own escaping for
# affected attributes, some of which can exist on any element and some
# of which can only exist on `<a>` elements.
#
# The relevant libxml2 code is here:
# <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
node.attribute_nodes.each do |attr|
attr_name = attr.name.downcase
if UNSAFE_LIBXML_ATTRS_GLOBAL.include?(attr_name) ||
(name == 'a' && UNSAFE_LIBXML_ATTRS_A.include?(attr_name))
attr.value = attr.value.gsub(UNSAFE_LIBXML_ESCAPE_REGEX, UNSAFE_LIBXML_ESCAPE_CHARS)
end
end

# Add required attributes.
if @add_attributes.has_key?(name)
@add_attributes[name].each {|key, val| node[key] = val }
Expand Down
2 changes: 1 addition & 1 deletion lib/sanitize/version.rb
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
class Sanitize
VERSION = '2.1.0'
VERSION = '2.1.1'
end
84 changes: 79 additions & 5 deletions test/test_sanitize.rb
Original file line number Diff line number Diff line change
Expand Up @@ -55,10 +55,10 @@

:malicious => {
:html => '<b>Lo<!-- comment -->rem</b> <a href="javascript:pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <<foo>script>alert("hello world");</script>',
:default => 'Lorem ipsum dolor sit amet script&gt;alert("hello world");',
:restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet script&gt;alert("hello world");',
:basic => '<b>Lorem</b> <a rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet script&gt;alert("hello world");',
:relaxed => '<b>Lorem</b> <a title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet script&gt;alert("hello world");'
:default => 'Lorem ipsum dolor sit amet &lt;script&gt;alert("hello world");',
:restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet &lt;script&gt;alert("hello world");',
:basic => '<b>Lorem</b> <a rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet &lt;script&gt;alert("hello world");',
:relaxed => '<b>Lorem</b> <a title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet &lt;script&gt;alert("hello world");'
},

:raw_comment => {
Expand Down Expand Up @@ -181,7 +181,7 @@
:default => '',
:restricted => '',
:basic => '',
:relaxed => '<img src="">'
:relaxed => '<img src>'
}
}

Expand Down Expand Up @@ -645,3 +645,77 @@
Sanitize.clean!('foo <style>bar').must_equal('foo bar')
end
end

describe 'Malicious HTML' do
make_my_diffs_pretty!
parallelize_me!

before do
@s = Sanitize.new(Sanitize::Config::RELAXED)
end

# libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
# attempt to preserve server-side includes. This can result in XSS since an
# unescaped double quote can allow an attacker to inject a non-whitelisted
# attribute. Sanitize works around this by implementing its own escaping for
# affected attributes.
#
# The relevant libxml2 code is here:
# <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
describe 'unsafe libxml2 server-side includes in attributes' do
tag_configs = [
{
tag_name: 'a',
escaped_attrs: %w[ action href src name ],
unescaped_attrs: []
},

{
tag_name: 'div',
escaped_attrs: %w[ action href src ],
unescaped_attrs: %w[ name ]
}
]

before do
@s = Sanitize.new({
elements: %w[ a div ],

attributes: {
all: %w[ action href src name ]
}
})
end

tag_configs.each do |tag_config|
tag_name = tag_config[:tag_name]

tag_config[:escaped_attrs].each do |attr_name|
input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]

it 'should escape unsafe characters in attributes' do
@s.clean(input).must_equal(%[<#{tag_name} #{attr_name}="examp<!--%22%20onmouseover=alert(1)>-->le.com">foo</#{tag_name}>])
end

it 'should round-trip to the same output' do
output = @s.clean(input)
@s.clean(output).must_equal(output)
end
end

tag_config[:unescaped_attrs].each do |attr_name|
input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]

it 'should not escape characters unnecessarily' do
@s.clean(input).must_equal(input)
end

it 'should round-trip to the same output' do
output = @s.clean(input)
@s.clean(output).must_equal(output)
end
end
end
end
end