Test EncodedString#to_s for undefined conversion / invalid byte sequence #134
Changes from all commits
4949be0
2615f3d
5c54a6e
09ec191
78b032e
d431a4d
d85222d
93776b9
a51874c
d3a95df
8aa73ae
db2c3a4
591c3fc
b3a0257
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,8 @@ branch = File.read(File.expand_path("../maintenance-branch", __FILE__)).chomp | |
end | ||
|
||
### dep for ci/coverage | ||
gem 'simplecov', '~> 0.8' | ||
gem 'simplecov', '~> 0.9' | ||
gem 'simplecov-html', :github => 'colszowka/simplecov-html' | ||
One of your comments mentioned this addresses a failure but I don't understand...we didn't need this gem before so why do we need it now? What failure does it address? |
||
|
||
gem 'rubocop', "~> 0.23.0", :platform => [:ruby_19, :ruby_20, :ruby_21] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,8 +7,14 @@ module RSpec | |
module Support | ||
# rubocop:disable ClassLength | ||
class Differ | ||
if String.method_defined?(:encoding) | ||
EMPTY_DIFF = EncodedString.new("", Encoding.default_external) | ||
else | ||
EMPTY_DIFF = EncodedString.new("") | ||
end | ||
|
||
def diff(actual, expected) | ||
diff = "" | ||
diff = EMPTY_DIFF.dup | ||
Why was Also, is the |
||
|
||
if actual && expected | ||
if all_strings?(actual, expected) | ||
|
@@ -25,12 +31,10 @@ def diff(actual, expected) | |
|
||
# rubocop:disable MethodLength | ||
def diff_as_string(actual, expected) | ||
@encoding = pick_encoding actual, expected | ||
|
||
@encoding = EncodedString.pick_encoding(actual, expected) | ||
@actual = EncodedString.new(actual, @encoding) | ||
@expected = EncodedString.new(expected, @encoding) | ||
|
||
output = EncodedString.new("\n", @encoding) | ||
output = EncodedString.new("\n", @encoding) | ||
|
||
hunks.each_cons(2) do |prev_hunk, current_hunk| | ||
begin | ||
|
@@ -47,8 +51,6 @@ def diff_as_string(actual, expected) | |
finalize_output(output, hunks.last.diff(format_type).to_s) if hunks.last | ||
Is this being removed because of https://github.com/rspec/rspec-expectations/pull/220/files#diff-9533f5f156a38a3307ecfc610d2282d7R47 ? I ask because rspec-mocks uses the differ in RSpec 2.2....but it doesn't rescue this error. It makes me wonder if we should move this rescue back into here and/or add a similar rescue to rspec-mocks. @JonRowe, given that you authored rspec/rspec-expectations#220, what do you think should be done?

Actually, I'm noticing that rspec-expectations doesn't rescue this error anymore, either...so why is it safe to remove, @bf4? (Apologies if your commit messages explain but at 23 commits it's a lot to look through).

After reading through the rest of the diff it looks like this error isn't possible anymore since

@myronmarston That's my experience thus far. I have been unable to cause the code to fail with that exception, which is why I tracked down where it came from. Specifically, as I noted in the commit message bf4@96584ba
As I commented elsewhere, I want to confirm this isn't still a problem on 1.9.2.

Given how notoriously difficult it is to actually craft an invalidly encoded string when you run in a conventionally encoded environment, I'm uncomfortable with this FYI |
||
|
||
color_diff output | ||
rescue Encoding::CompatibilityError | ||
handle_encoding_errors | ||
end | ||
# rubocop:enable MethodLength | ||
|
||
|
@@ -188,26 +190,6 @@ def object_to_string(object) | |
PP.pp(object, "") | ||
end | ||
end | ||
|
||
if String.method_defined?(:encoding) | ||
def pick_encoding(source_a, source_b) | ||
Encoding.compatible?(source_a, source_b) || Encoding.default_external | ||
end | ||
else | ||
def pick_encoding(_source_a, _source_b) | ||
end | ||
end | ||
|
||
def handle_encoding_errors | ||
if @actual.source_encoding != @expected.source_encoding | ||
"Could not produce a diff because the encoding of the actual string " \ | ||
"(#{@actual.source_encoding}) differs from the encoding of the expected " \ | ||
"string (#{@expected.source_encoding})" | ||
else | ||
"Could not produce a diff because of the encoding of the string " \ | ||
"(#{@expected.source_encoding})" | ||
end | ||
end | ||
end | ||
# rubocop:enable ClassLength | ||
end | ||
|
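For readers following the thread about the removed `rescue Encoding::CompatibilityError`, here is a minimal sketch of when Ruby raises that error and what `Encoding.compatible?` (the call inside `pick_encoding`) returns. The sample strings and variable names are invented for illustration and are not part of this PR.

```ruby
# Illustrative only: these sample strings are made up, not taken from the PR.
utf8_text   = "caf\u00e9".encode(Encoding::UTF_8)          # UTF-8 with a non-ASCII character
binary_text = "\xFF".dup.force_encoding(Encoding::BINARY)  # binary with a non-ASCII byte
ascii_text  = "abc".encode(Encoding::UTF_8)                # ASCII-only content

# Encoding.compatible? returns an encoding both strings can share, or nil.
shared_with_ascii = Encoding.compatible?(ascii_text, binary_text)
shared_with_utf8  = Encoding.compatible?(utf8_text, binary_text)

# Concatenating two incompatible strings raises Encoding::CompatibilityError.
concat_error =
  begin
    utf8_text + binary_text
    nil
  rescue Encoding::CompatibilityError => e
    e.class
  end

# Same fallback as pick_encoding in the diff: use the shared encoding if
# there is one, otherwise Encoding.default_external.
picked = Encoding.compatible?(utf8_text, binary_text) || Encoding.default_external
```

In this sketch the ASCII-only string is compatible with the binary one, while the two strings that both carry non-ASCII bytes are not, which is the situation the old rescue was guarding against.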
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,19 @@ module RSpec | |
module Support | ||
# @private | ||
class EncodedString | ||
MRI_UNICODE_UNKOWN_CHARACTER = "\xEF\xBF\xBD" | ||
if String.method_defined?(:encoding) | ||
# see https://github.com/ruby/ruby/blob/ca24e581ba/encoding.c#L1191 | ||
def self.pick_encoding(source_a, source_b) | ||
Encoding.compatible?(source_a, source_b) || Encoding.default_external | ||
end | ||
else | ||
def self.pick_encoding(_source_a, _source_b) | ||
end | ||
end | ||
|
||
# Ruby's default replacement string for is U+FFFD ("\xEF\xBF\xBD") for Unicode encoding forms | ||
This sentence doesn't make grammatical sense ("string for is" reads awkwardly and I'm not sure what you're trying to say). Can you rephrase it? |
||
# else is '?' ("\x3F") | ||
REPLACE = "\x3F" | ||
Can this just be Also why is this better than the default replacement string of |
||
|
||
def initialize(string, encoding=nil) | ||
@encoding = encoding | ||
|
@@ -33,21 +45,52 @@ def to_s | |
|
||
private | ||
|
||
ENCODING_STRATEGY = { | ||
:bad_bytes => { | ||
:invalid => :replace, | ||
# :undef => :nil, | ||
:replace => REPLACE | ||
}, | ||
:cannot_convert => { | ||
# :invalid => :nil, | ||
:undef => :replace, | ||
:replace => REPLACE | ||
}, | ||
:no_converter => { | ||
:invalid => :replace, | ||
# :undef => :nil, | ||
:replace => REPLACE | ||
} | ||
} | ||
What do you think of this? I kind of like it but I've never seen it before. I found it while looking through the Ruby encoding / transcoding code and tests.

What's with the commented out entries in these hashes?

Also, is there any real benefit to sticking all these strategies in a big named hash? Why not just make them separate constants and refer to them separately? That would remove an unnecessary method call ( |
||
|
||
# Raised by Encoding and String methods: | ||
# Encoding::UndefinedConversionError: | ||
# when a transcoding operation fails | ||
# e.g. "\x80".encode('utf-8','ASCII-8BIT') | ||
# Encoding::InvalidByteSequenceError: | ||
# when the string being transcoded contains a byte invalid for the either | ||
# the source or target encoding | ||
Should "...for the either the source or target..." be "...for either the source or target..."? |
||
# e.g. "\x80".encode('utf-8','US-ASCII') | ||
# Raised by transcoding methods: | ||
# Encoding::ConverterNotFoundError: | ||
# when a named encoding does not correspond with a known converter | ||
# e.g. 'abc'.force_encoding('utf-8').encode('foo') | ||
# Encoding::CompatibilityError | ||
# | ||
def matching_encoding(string) | ||
string.encode(@encoding) | ||
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError | ||
normalize_missing(string.encode(@encoding, :invalid => :replace, :undef => :replace)) | ||
encoding = EncodedString.pick_encoding(source_encoding, @encoding) | ||
# Converting it to a higher character set (UTF-16) and then back (to UTF-8) | ||
# ensures that we strip away invalid or undefined byte sequences | ||
# => no need to rescue Encoding::InvalidByteSequenceError, ArgumentError | ||
string.encode(::Encoding::UTF_16LE, ENCODING_STRATEGY[:bad_bytes]). | ||
encode(encoding) | ||
rescue Encoding::UndefinedConversionError, Encoding::CompatibilityError | ||
string.encode(encoding, ENCODING_STRATEGY[:cannot_convert]) | ||
# Begin: Needed for 1.9.2 | ||
rescue Encoding::ConverterNotFoundError | ||
normalize_missing(string.force_encoding(@encoding).encode(:invalid => :replace)) | ||
end | ||
|
||
def normalize_missing(string) | ||
if @encoding.to_s == "UTF-8" | ||
string.gsub(MRI_UNICODE_UNKOWN_CHARACTER.force_encoding(@encoding), "?") | ||
else | ||
string | ||
end | ||
string.force_encoding(encoding).encode(ENCODING_STRATEGY[:no_converter]) | ||
end | ||
# End: Needed for 1.9.2 | ||
|
||
def detect_source_encoding(string) | ||
string.encoding | ||
|
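The comment block in `matching_encoding` lists three failure modes. The sketch below reproduces each one and then shows the two replacement strategies the new code relies on. It is an illustration under stated assumptions, not the PR's implementation: the helper `raised_error` and the sample inputs are hypothetical, and `"?"` stands in for the `REPLACE` constant.

```ruby
# Hypothetical helper: returns the class of the encoding error raised by the
# block, or nil if the block succeeds.
def raised_error
  yield
  nil
rescue EncodingError => e
  e.class
end

# The three failure modes named in the code comment.
undefined_conversion = raised_error { "\x80".encode("UTF-8", Encoding::BINARY) }
invalid_byte_seq     = raised_error { "\x80".encode("UTF-8", "US-ASCII") }
converter_not_found  = raised_error { "abc".encode("foo") }

# Strategy for undefined conversions: replace the character that has no
# mapping in the target encoding.
replaced_undef = "\x80".encode("UTF-8", Encoding::BINARY,
                               :undef => :replace, :replace => "?")

# Strategy for bad bytes: transcode to UTF-16LE and back, replacing invalid
# byte sequences on the way out.
bad_bytes = "abc\xFF".dup.force_encoding("UTF-8")
cleaned   = bad_bytes.encode(Encoding::UTF_16LE, :invalid => :replace, :replace => "?").
            encode("UTF-8")
```

After the round trip, `cleaned.valid_encoding?` is true even though `bad_bytes.valid_encoding?` was false.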
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
module RSpec | ||
module Support | ||
module EncodingHelpers | ||
module_function | ||
|
||
# For undefined conversions, replace as "U+<codepoint>" | ||
# e.g. '\xa0' becomes 'U+00A0' | ||
# see https://github.com/ruby/ruby/blob/34fbf57aaa/test/ruby/test_transcode.rb#L2050 | ||
def safe_chr | ||
# rubocop:disable Style/RescueModifier | ||
@safe_chr ||= Hash.new { |h, x| h[x] = x.chr rescue ("U+%.4X" % [x]) } | ||
I consider trailing

I was actually thinking these helpers need their own tests :) |
||
# rubocop:enable Style/RescueModifier | ||
end | ||
|
||
if String.method_defined?(:encoding) | ||
|
||
def safe_codepoints(str) | ||
str.each_codepoint.map { |codepoint| safe_chr[codepoint] } | ||
rescue ArgumentError | ||
What part of |
||
str.each_byte.map { |byte| safe_chr[byte] } | ||
end | ||
|
||
# rubocop:disable MethodLength | ||
def expect_identical_string(str1, str2, expected_encoding=str1.encoding) | ||
expect(str1.encoding).to eq(expected_encoding) | ||
str1_bytes = safe_codepoints(str1) | ||
str2_bytes = safe_codepoints(str2) | ||
return unless str1_bytes != str2_bytes | ||
str1_differences = [] | ||
str2_differences = [] | ||
# rubocop:disable Style/Next | ||
str2_bytes.each_with_index do |str2_byte, index| | ||
str1_byte = str1_bytes.fetch(index) do | ||
str2_differences.concat str2_bytes[index..-1] | ||
return | ||
end | ||
if str1_byte != str2_byte | ||
str1_differences << str1_byte | ||
str2_differences << str2_byte | ||
end | ||
end | ||
# rubocop:enable Style/Next | ||
expect(str1_differences.join).to eq(str2_differences.join) | ||
end | ||
You mentioned before that |
||
# rubocop:enable Style/MethodLength | ||
|
||
else | ||
|
||
def safe_codepoints(str) | ||
str.split(//) | ||
end | ||
|
||
def expect_identical_string(str1, str2) | ||
str1_bytes = safe_codepoints(str1) | ||
str2_bytes = safe_codepoints(str2) | ||
expect(str1_bytes).to eq(str2_bytes) | ||
end | ||
end | ||
end | ||
end | ||
end |
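Two points in the helpers above drew review questions: the trailing `rescue` in `safe_chr` and the `rescue ArgumentError` in `safe_codepoints`. The sketch below shows one possible shape with explicit rescues. The method names `chr_or_label` and `codepoints_or_bytes` are hypothetical and are not proposed in this PR. It relies on two facts about core Ruby: `Integer#chr` raises `RangeError` for values it cannot represent, and `String#each_codepoint` raises `ArgumentError` on an invalid byte sequence.

```ruby
# Hypothetical alternative to safe_chr: rescue only the error Integer#chr
# raises, instead of using a rescue modifier that swallows every StandardError.
def chr_or_label(codepoint)
  codepoint.chr
rescue RangeError
  format("U+%.4X", codepoint)
end

# Hypothetical alternative to safe_codepoints: fall back to raw bytes when
# the string is not valid in its own encoding.
def codepoints_or_bytes(str)
  str.each_codepoint.to_a
rescue ArgumentError
  str.each_byte.to_a
end

valid_text   = "h\u00e9".encode(Encoding::UTF_8)
invalid_text = "h\xFF".dup.force_encoding(Encoding::UTF_8)

valid_units   = codepoints_or_bytes(valid_text)    # codepoints
invalid_units = codepoints_or_bytes(invalid_text)  # bytes, via the fallback
```

Narrowing each rescue to a single exception class keeps unrelated failures visible, which matters for helpers that are themselves used to assert on test results.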
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,6 +7,10 @@ source $SCRIPT_DIR/predicate_functions.sh | |
|
||
# idea taken from: http://blog.headius.com/2010/03/jruby-startup-time-tips.html | ||
export JRUBY_OPTS="${JRUBY_OPTS} -X-C" # disable JIT since these processes are so short lived | ||
# Set the external encoding to UTF-8 in a 1.8.7-compatible way | ||
export LANG=en_US.UTF-8 | ||
export LC_ALL=en_US.UTF-8 | ||
Why is this needed now when it wasn't before? I thought the external encoding was already set to UTF-8? |
||
|
||
SPECS_HAVE_RUN_FILE=specs.out | ||
MAINTENANCE_BRANCH=`cat maintenance-branch` | ||
|
||
|
@@ -112,7 +116,7 @@ function check_documentation_coverage { | |
} | ||
|
||
function check_style_and_lint { | ||
echo "bin/rubucop lib" | ||
echo "bin/rubocop lib" | ||
Ha, good catch! |
||
bin/rubocop lib | ||
} | ||
|
||
|
Why all these JRUBY_OPTS changes? I don't understand what these options do and I'd prefer to keep our JRUBY_OPTS simpler without a clear reason to add the complication of these options...especially since we haven't needed these options in the other repos or in this repo before now.