explore: removing or overhauling the EncodingReader #2513

flavorjones · 2022-04-11T21:41:19Z

The Nokogiri::HTML4::EncodingReader class is used to try to detect the encoding of HTML4 documents when their encoding is ambiguous.

Recently, a ReDoS vulnerability was found in this code. The class contains other regular expressions that should be vetted, and we should explore replacing some of them with simpler calls like String#include?.
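
As a rough illustration of that idea, here is a minimal sketch of a substring-based check. The method name `may_declare_charset?` is hypothetical and is not part of Nokogiri; it only shows how a cheap pre-filter could avoid a backtracking regex, under the assumption that the caller just needs to know whether a chunk is worth scanning further.

```ruby
# Hypothetical helper, not part of Nokogiri: decides whether a chunk of
# input is worth scanning for a <meta> charset declaration.
def may_declare_charset?(chunk)
  # Work on a binary, lowercased copy so that invalid byte sequences
  # cannot raise and the match is case-insensitive for ASCII letters.
  bytes = chunk.b.downcase

  # Plain substring checks run in linear time and cannot backtrack.
  bytes.include?("<meta") && bytes.include?("charset")
end
```

A check like this cannot extract the encoding name by itself, so it would only replace regexes that are used as yes/no tests.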

This class was written at a time (Ruby 1.9) when Ruby strings were encoded as ASCII-8BIT by default. That hasn't been true since (I think) Ruby 2.0, so this complexity may only serve an edge case that we no longer need to support. If so, maybe we can remove the entire class, simplifying both the CRuby and JRuby implementations.

flavorjones · 2022-11-14T02:40:16Z

Perhaps more specifically: let's consider unifying the encoding detection algorithm from the HTML5 parser and the HTML4 parser.

flavorjones · 2022-11-14T14:11:47Z

@stevecheckoway notes that the HTML5 encoding detection is incomplete with respect to https://html.spec.whatwg.org/multipage/parsing.html#prescan-a-byte-stream-to-determine-its-encoding
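
For readers unfamiliar with the prescan, the following is a heavily simplified sketch of the general idea: look at a bounded prefix of the byte stream for a `<meta>` tag that names a charset. It is not the WHATWG algorithm and not Nokogiri's implementation. The method name `sniff_meta_charset` and the 1024-byte default are assumptions made for illustration, and the sketch ignores comments, proper attribute parsing, whitespace around `=`, and the `http-equiv` rules that the spec requires.

```ruby
# Hypothetical, heavily simplified prescan. NOT the full WHATWG algorithm.
def sniff_meta_charset(html, limit = 1024)
  # Inspect only a bounded, binary, lowercased prefix of the input.
  head = html.b[0, limit].downcase
  pos = 0

  while (start = head.index("<meta", pos))
    stop = head.index(">", start)
    break unless stop # unterminated tag: give up

    tag = head[start...stop]
    if (at = tag.index("charset="))
      # Skip past "charset=" (8 bytes) and one optional opening quote.
      value = tag[(at + 8)..].delete_prefix('"').delete_prefix("'")
      # Anchored character class: linear time, no backtracking.
      name = value[/\A[a-z0-9._:-]+/]
      return name if name
    end

    pos = stop + 1
  end

  nil
end
```

Because the input is lowercased before matching, the lookup is case-insensitive for both the tag name and the `charset` attribute, and the returned name is always lowercase.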

flavorjones mentioned this issue Nov 14, 2022

fix: html5 encoding detection case insensitive re: meta tag #2693

Merged

flavorjones mentioned this issue Feb 28, 2023

[bug] HTML5 document encoding differs from HTML4 #2801

Open

flavorjones added this to the v1.18.0 milestone Jul 3, 2024
