Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrectly calculated sum in #unnormalize #193

Closed
vikiv480 opened this issue Aug 6, 2024 · 2 comments · Fixed by #195
Closed

Incorrectly calculated sum in #unnormalize #193

vikiv480 opened this issue Aug 6, 2024 · 2 comments · Fixed by #195

Comments

@vikiv480
Copy link
Contributor

vikiv480 commented Aug 6, 2024

Describe the bug

I'm not completely familiar with this repo so please enlighten me if I'm wrong. I suspect sum is calculated incorrectly in #unnormalize. rv.bytesize is added multiple times over, even for matches that has already been substituted.

if matches.size > 0
sum = 0
matches.each do |entity_reference|
unless filter and filter.include?(entity_reference)
entity_value = entity( entity_reference, entities )
if entity_value
re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
rv.gsub!( re, entity_value )
sum += rv.bytesize
if sum > Security.entity_expansion_text_limit
raise "entity expansion has grown too large"
end
else
er = DEFAULT_ENTITIES[entity_reference]
rv.gsub!( er[0], er[2] ) if er
end
end
end
rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
end


How to reproduce

require "rexml/parsers/baseparser"

entity_less_than = "<"
entitiy_length = 100

filler_text = "A"
filler_length = 100

feed = "#{entity_less_than * entitiy_length}#{filler_text * filler_length}"

base_parser = REXML::Parsers::BaseParser.new("")
base_parser.unnormalize(feed) # => "<" * 100 + "A" * 100

Error:

❯ bundle exec ruby test.rb
/Users/vikiv/.gem/ruby/3.1.2/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:560:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
	from /Users/vikiv/.gem/ruby/3.1.2/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:552:in `each'
	from /Users/vikiv/.gem/ruby/3.1.2/gems/rexml-3.3.4/lib/rexml/parsers/baseparser.rb:552:in `unnormalize'
	from test.rb:13:in `<main>'

Suggestion/fix

diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 28810bf..e6d6ae6 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -549,7 +549,7 @@ module REXML
         matches.collect!{|x|x[0]}.compact!
         if matches.size > 0
           sum = 0
-          matches.each do |entity_reference|
+          matches.uniq.each do |entity_reference|
             unless filter and filter.include?(entity_reference)
               entity_value = entity( entity_reference, entities )
               if entity_value
@@ -557,7 +557,7 @@ module REXML
                 rv.gsub!( re, entity_value )
                 sum += rv.bytesize
                 if sum > Security.entity_expansion_text_limit
-                  raise "entity expansion has grown too large"
+                  # raise "entity expansion has grown too large"
                 end
               else
                 er = DEFAULT_ENTITIES[entity_reference]
@@ -566,6 +566,7 @@ module REXML
             end
           end
           rv.gsub!( Private::DEFAULT_ENTITIES_PATTERNS['amp'], '&' )
+          puts "Sum of entity expansions: #{sum}"
         end
         rv
       end

Result:

# Before change
❯ bundle exec ruby test.rb
Sum of entity expansions: 20000

# After change
❯ bundle exec ruby test.rb
Sum of entity expansions: 200
@dstarner
Copy link

dstarner commented Aug 6, 2024

We are also encountering the entity expansion has grown too large exception on 3.3.4. Reverting back to 3.3.2 resolved the issue for us.

vikiv480 added a commit to vikiv480/rexml that referenced this issue Aug 7, 2024
* Improve `#unnormalize` by only iterating over unique matches
* Fix bug where `sum` for `#unnormalize` is calculated multiple times over causing a runtime error "entity expansion has grown too large"
* Adjust tests to the reflect the changes to the `entity_expansion_count`

See ruby#193
@otegami
Copy link
Contributor

otegami commented Aug 8, 2024

Below is an example of a problem you might encounter while using REXML::Parsers::BaseParser to parse XML data. If this helps in any way, I'd be delighted.

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "rexml", path: "~/work/ruby/rexml"
end

require 'rexml/parsers/baseparser'

xml = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <e type="test" ver="1">
        Commiter List: &quot;A&quot; of 1 commit, &quot;B&quot; of 2 commits, &quot;C&quot; of 3 commits, &quot;D&quot; of 4 commits, 
        &quot;E&quot; of 5 commits, &quot;F&quot; of 6 commits, &quot;G&quot; of 7 commits, &quot;H&quot; of 8 commits, 
        &quot;I&quot; of 9 commits, &quot;J&quot; of 10 commits, &quot;K&quot; of 11 commits, &quot;L&quot; of 12 commits, 
        &quot;M&quot; of 13 commits, &quot;N&quot; of 14 commits, &quot;O&quot; of 15 commits, &quot;P&quot; of 16 commits, 
        &quot;Q&quot; of 17 commits, &quot;R&quot; of 18 commits, &quot;S&quot; of 19 commits, &quot;T&quot; of 20 commits, 
        &quot;U&quot; of 21 commits, &quot;V&quot; of 22 commits, &quot;W&quot; of 23 commits, &quot;X&quot; of 24 commits, 
        &quot;Y&quot; of 25 commits, &quot;Z&quot; of 26 commits.
    </e>
</root>
XML

parser = REXML::Parsers::BaseParser.new('')
parser.unnormalize(xml)
$ git diff
diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 28810bf..0d235ba 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -549,6 +549,7 @@ module REXML
         matches.collect!{|x|x[0]}.compact!
         if matches.size > 0
           sum = 0
+          p matches
           matches.each do |entity_reference|
             unless filter and filter.include?(entity_reference)
               entity_value = entity( entity_reference, entities )
@@ -556,6 +557,7 @@ module REXML
                 re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
                 rv.gsub!( re, entity_value )
                 sum += rv.bytesize
+                p sum
                 if sum > Security.entity_expansion_text_limit
                   raise "entity expansion has grown too large"
                 end
$ ruby xml_parser.rb
["quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot", "quot"]
652
1304
1956
2608
3260
3912
4564
5216
5868
6520
7172
7824
8476
9128
9780
10432
/home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:562:in `block in unnormalize': entity expansion has grown too large (RuntimeError)
	from /home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:553:in `each'
	from /home/otegami/work/ruby/rexml/lib/rexml/parsers/baseparser.rb:553:in `unnormalize'
	from xml_parser.rb:26:in `<main>'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants