Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] nokogiri-1.11.0-x86_64-linux causes Active Support failure #2168

Closed
yahonda opened this issue Jan 5, 2021 · 8 comments
Closed

[bug] nokogiri-1.11.0-x86_64-linux causes Active Support failure #2168

yahonda opened this issue Jan 5, 2021 · 8 comments

Comments

@yahonda
Copy link

yahonda commented Jan 5, 2021

Please describe the bug

Rails CI got failed using nokogiri-1.11.0-x86_64-linux native gem. It does not reproduce without native gem or nokogiri-1.11.0-x86_64-darwin

This issue has been reported at rails/rails#41015 first.

Help us reproduce what you're seeing

git clone https://github.com/rails/rails.git
cd rails
rm Gemfile.lock
bundle install
bundle info nokogiri
cd activesupport
bin/test test/xml_mini/libxmlsax_engine_test.rb test/xml_mini/nokogirisax_engine_test.rb --seed 30319 -n "/^(?:LibXMLSAXEngineTest#(?:test_blank_returns_empty_hash)|NokogiriSAXEngineTest#(?:test_exception_thrown_on_expansion_attack))$/"

Expected behavior
It should pass.

Actual behavior

$ bundle info nokogiri
  * nokogiri (1.10.10)
	Summary: Nokogiri () is an HTML, XML, SAX, and Reader parser
	Homepage: https://nokogiri.org
	Documentation: https://nokogiri.org/rdoc/index.html
	Source Code: https://github.com/sparklemotion/nokogiri
	Changelog: https://nokogiri.org/CHANGELOG.html
	Bug Tracker: https://github.com/sparklemotion/nokogiri/issues
	Path: /home/yahonda/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.10.10
$ cd activesupport
$ bin/test test/xml_mini/libxmlsax_engine_test.rb test/xml_mini/nokogirisax_engine_test.rb --seed 30319 -n "/^(?:LibXMLSAXEngineTest#(?:test_blank_returns_empty_hash)|NokogiriSAXEngineTest#(?:test_exception_thrown_on_expansion_attack))$/"

Run options: --seed 30319 -n "/^(?:LibXMLSAXEngineTest#(?:test_blank_returns_empty_hash)|NokogiriSAXEngineTest#(?:test_exception_thrown_on_expansion_attack))$/"

# Running:

..

Finished in 0.202464s, 9.8783 runs/s, 14.8174 assertions/s.
2 runs, 3 assertions, 0 failures, 0 errors, 0 skips
$

Environment

$ bundle info nokogiri
  * nokogiri (1.11.0)
	Summary: Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
	Homepage: https://nokogiri.org
	Documentation: https://nokogiri.org/rdoc/index.html
	Source Code: https://github.com/sparklemotion/nokogiri
	Changelog: https://nokogiri.org/CHANGELOG.html
	Bug Tracker: https://github.com/sparklemotion/nokogiri/issues
	Path: /home/yahonda/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.0-x86_64-linux
$ bundle exec nokogiri -v
# Nokogiri (1.11.0)
    ---
    warnings: []
    nokogiri:
      version: 1.11.0
      cppflags:
      - "-I/home/yahonda/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.0-x86_64-linux/ext/nokogiri"
      - "-I/home/yahonda/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.0-x86_64-linux/ext/nokogiri/include"
      - "-I/home/yahonda/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.0-x86_64-linux/ext/nokogiri/include/libxml2"
    ruby:
      version: 3.0.0
      platform: x86_64-linux
      gem_platform: x86_64-linux
      description: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Remove-script-macro-support.patch
      - 0003-Update-entities-to-remove-handling-of-ssi.patch
      - 0004-libxml2.la-is-in-top_builddir.patch
      - 0005-Fix-infinite-loop-in-xmlStringLenDecodeEntities.patch
      - 0006-htmlParseComment-treat-as-if-it-closed-the-comment.patch
      - 0007-use-new-htmlParseLookupCommentEnd-to-find-comment-en.patch
      - '0008-use-glibc-strlen.patch'
      - '0009-avoid-isnan-isinf.patch'
      libxml2_path: "/home/yahonda/.rbenv/versions/3.0.0/lib/ruby/gems/3.0.0/gems/nokogiri-1.11.0-x86_64-linux/ext/nokogiri"
      iconv_enabled: true
      compiled: 2.9.10
      loaded: 2.9.10
    libxslt:
      source: packaged
      precompiled: true
      patches: []
      compiled: 1.1.34
      loaded: 1.1.34
    other_libraries:
      zlib: 1.2.11
$

Additional context

@flavorjones
Copy link
Member

Thanks for reporting, I'll take a look.

@flavorjones
Copy link
Member

Just a note for lurkers: In order to set up the failure, I needed to bundle set --local without storage to avoid installing azure-storage-blob v2.0.0 which depends on Nokogiri ~> 1.10.4.

I've reproduced this (on x86_64-linux).

@flavorjones
Copy link
Member

OK - this isn't an issue with the precompiled gems, this is an issue with v1.11.0 generally. I'm able to reproduce this failure with both nokogiri-1.11.0-x86_64-linux and nokogiri-1.11.0 (the ruby platform gem file).

@flavorjones
Copy link
Member

git bisect indicates this is the offending commit:

771164daed8a1c6b902c7ca086ba479729b13f72 is the first bad commit
commit 771164daed8a1c6b902c7ca086ba479729b13f72
Author: Mike Dalessio <[email protected]>
Date:   Mon Dec 28 12:59:46 2020 -0500

    fix(cruby): stop clobbering libxml2 error handler on SAX parser init
    
    This was leading to loss of error capture on extremely short HTML docs
    when encoding was not passed by the caller.
    
    This call was introduced in d23fe2c (#87) for reasons that are
    unclear, but we've come a long way with how we manage the global error
    handlers and so I think we're OK to stop doing this now.

 CHANGELOG.md                        |  1 +
 ext/nokogiri/xml_sax_parser.c       |  2 --
 test/html/test_document_encoding.rb | 11 +++++++++++
 3 files changed, 12 insertions(+), 2 deletions(-)

@flavorjones
Copy link
Member

OK, there's a couple of things going on here.

First, here's the test that's failing:

    def test_exception_thrown_on_expansion_attack
      assert_raise expansion_attack_error do
        Hash.from_xml(<<~eoxml)
          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE member [
            <!ENTITY a "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
            <!ENTITY b "&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;">
            <!ENTITY c "&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;">
            <!ENTITY d "&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;">
            <!ENTITY e "&f;&f;&f;&f;&f;&f;&f;&f;&f;&f;">
            <!ENTITY f "&g;&g;&g;&g;&g;&g;&g;&g;&g;&g;">
            <!ENTITY g "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx">
          ]>
          <member>
            &a;
          </member>
        eoxml
      end
    end

This is supposed to be testing for a RuntimeError when an XXE attack doc is parsed. Unfortunately, this is not what's being tested. Because the xml string is created with <<-eoxml it has whitespace at the start of the string, and libxml2 raises the following fatal error: Fatal error: XML declaration allowed only at the start of the document at :1. instead of the soft error Fatal error: Entity 'a' not defined at :12. which is what should be getting tested if there was no initial whitespace.

Second, the change made within nokogiri in commit 771164d avoids clobbering the global error handler used by libxml2. Unfortunately, it looks like a handler has been set up that has been trapping the fatal error in this test (possibly libxml-ruby?) and so that handler is still active during this test.

Points to investigate:

  • who's setting that handler?
  • can we fix this in a patch version of nokogiri without having to patch rails or rails tests?

Things to do now:

  • patch Nokogiri
  • send a PR to the rails test suite to make sure it's testing the right thing

Things to do later:

  • take inventory of whether nokogiri is showing good hygiene in how it's setting this handler (I think it is but let's make sure)

@flavorjones
Copy link
Member

I've confirmed that libxml-ruby is setting the handler that's active during this Nokogiri xml_mini test. The change in Nokogiri that allowed this to happen was the removal of some defensive code to prevent using foreign handlers.

The approach I'd like to take here is to set up the test suite to set a foreign handler before every test, and assert that it doesn't get called. This should expose the areas in which we have poor handler hygiene, and give me a failing test (or many of them) to make pass.

This will take a few more hours, but the net result will be:

  • more robust error handling
  • protection against regressions
  • a deeper understanding of what objects might need to change to track errors → future work

flavorjones added a commit that referenced this issue Jan 6, 2021
to check that we're setting error handlers everywhere we need to.

Related to #2168
flavorjones added a commit that referenced this issue Jan 6, 2021
Note that this change regresses the behavior changed in 771164d which
we'll have to fix in an upcoming commit.

Related to #2168, #87
flavorjones added a commit that referenced this issue Jan 6, 2021
to check that we're setting error handlers everywhere we need to.

Related to #2168
flavorjones added a commit that referenced this issue Jan 6, 2021
Note that this change regresses the behavior changed in 771164d which
we'll have to fix in an upcoming commit.

Related to #2168, #87
flavorjones added a commit that referenced this issue Jan 6, 2021
@flavorjones
Copy link
Member

#2169 has been merged, I'm planning to ship a release for this tonight.

@flavorjones flavorjones added topic/libxml-ruby and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Jan 6, 2021
@flavorjones flavorjones added this to the v1.11.1 milestone Jan 6, 2021
@flavorjones
Copy link
Member

Release v1.11.1 / 2021-01-06 has been shipped with this fix.

This was referenced Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants