Allow encoding to be specified with S3 storage class #585

pond · 2022-06-15T06:19:30Z

When using local file storage with Shrine, a call to #read will accept encoding: ... in the same way as one might expect for a regular IO object. This is quite important when dealing with input data for things like CSV which might come from all manner of sources, including correctly-encoded UTF-8 with or without BOM, ISO8859-1 or (quite often) Windows CP1252.

In our application, we used this to figure out the encoding and decode the CSV accordingly, but while it works for local development where file storage is convenient for most engineers, it didn't work in deployment since the S3 storage class didn't understand that option and just passed it through to the AWS SDK, which understandably complained.

This PR fixes that in a backwards-compatible way by accepting a with-nil-default encoding option and passing it to the Down::ChunkedIO constructor. The updated tests pass with the patch present, else fail with:

  1) Failure:
Shrine::Storage::S3::#open#test_0007_accepts :encoding option [...shrine/test/storage/s3_test.rb:334]:
Expected: #<Encoding:UTF-8>
  Actual: #<Encoding:ASCII-8BIT>

That's not the end of the story, unfortunately! When Shrine is actually (out of tests) driving the Down::ChunkedIO object, it calls #read on the instance and passes a buffer length. For some reason, if and only if a buffer length is given, Down currently ignores the encoding that it's been given. This is addressed by janko/down#74, which will hopefully be merged soon.

In the mean time, this PR for Shrine is harmless and beneficial if either using a monkey-patched Down gem (as we currently are, as well as monkey patched Shrine!) or using a future Down gem release with the above PR merged.

pond · 2022-06-20T05:14:51Z

(Bump) Any interest in this? It's quite important to us (various encoding issues discovered down the chain - see referenced Down gem PR and subsequently linked issue janko/down#73). Thanks!

benedikt · 2022-10-06T16:53:10Z

We're running into the exact same issue and passing the encoding to Down::ChunkedIO seems to fix things for us as well. Would appreciate if this was merged :)

janko · 2022-10-06T17:22:18Z

Sorry for the wait, I just haven't been finding the energy for Shrine. This change looks good and useful, I'm merging 👍🏻

benedikt · 2022-10-06T17:36:41Z

@janko Thanks a lot for merging this and for all the time and energy you've put into Shrine. It's very much appreciated!

pond · 2022-10-15T22:06:26Z

Thanks for merging this 😄

Is it possible to do a Shrine release to wrap up recent fixes? At present, the most up-to-date public release on RubyGems is 3.4.0 from June 14, 2021.

Allow encoding to be specified with S3 storage class

fe0c279

pond mentioned this pull request Jun 15, 2022

Do not ignore encoding just because a read length is given janko/down#74

Open

janko merged commit 59cdf1f into shrinerb:master Oct 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow encoding to be specified with S3 storage class #585

Allow encoding to be specified with S3 storage class #585

pond commented Jun 15, 2022

pond commented Jun 20, 2022

benedikt commented Oct 6, 2022

janko commented Oct 6, 2022

benedikt commented Oct 6, 2022

pond commented Oct 15, 2022

Allow encoding to be specified with S3 storage class #585

Allow encoding to be specified with S3 storage class #585

Conversation

pond commented Jun 15, 2022

pond commented Jun 20, 2022

benedikt commented Oct 6, 2022

janko commented Oct 6, 2022

benedikt commented Oct 6, 2022

pond commented Oct 15, 2022