Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Sidekiq delivery method for non-UTF8 email #43

Merged
merged 1 commit into from
Sep 24, 2015

Conversation

DouweM
Copy link
Collaborator

@DouweM DouweM commented Sep 24, 2015

DouweM added a commit that referenced this pull request Sep 24, 2015
Fix Sidekiq delivery method for non-UTF8 email
@DouweM DouweM merged commit 32aa5ee into tpitale:master Sep 24, 2015
@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

@tpitale Could you do another release? We want to get this in a GitLab patch release.

@DouweM DouweM deleted the utf8-message branch September 24, 2015 11:35
@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

Woah. You just created a PR, and merged it yourself without any look from me? And you added a new gem dependency to do it?

@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

@tpitale :/ Sorry about that, I didn't really think about it. I thought it'd be okay as it's a bug we've encountered with Sidekiq/JSON/text encoding in GitLab a few times already, where we use the same fix.

I thought part of the reason why I was added as a collaborator was to be able to quickly merge minor (in my eyes) stuff like this, but I realize we never discussed that.

I'm happy to revert it and discuss it properly.

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

It's okay, I'm going to release this gem. But, I'd like to discuss ways we can get rid of the dependency. I think there are some simple things we can do in ruby to check for compatible encoding changes and try to encode them. If they can't be switched between encodings … we can decide what to do then.

We also probably have this issue with essentially every other delivery method.

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

Released 0.5.2.

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

So, one issue I see is that charlock_holmes is a C extension. Not that we explicitly support jruby, but I like to imagine we do 😄

@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

I think there are some simple things we can do in ruby to check for compatible encoding changes and try to encode them.

I'm not so sure, what we need to do is detect the encoding based on the content, and that's exactly what CharlockHolmes was written for. Next, we need to convert it, which Ruby can't do by default either.

We also probably have this issue with essentially every other delivery method.

We have the issue with Sidekiq, because we're encoding the message body as JSON, which requires UTF-8. If the message body isn't UTF-8 but ASCII-8BIT, JSON will error out. :( Other delivery methods don't do JSON encoding, so probably don't have this issue, right?

So, one issue I see is that charlock_holmes is a C extension. Not that we explicitly support jruby, but I like to imagine we do 😄

Fair enough. :)

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

Also, the new Que delivery method does dump json: https://github.com/tpitale/mail_room/blob/master/lib/mail_room/delivery/que.rb#L57

@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

@tpitale Right you are :/ I jumped the gun on merging this, my bad.

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

Can you open an issue, with some of the email encoding that was causing issues for gitlab users? I'd like to take a stab at running it through mail_room and fixing the issue with just ruby. If it turns out it can't be done, we'll expand our use of charlock (or something else maybe).

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

@DouweM in the future, I'd love to at least have a chance to 👍 any PRs before merging. I'll try to be responsive. I do appreciate all your help with this, and I'm excited that it's helping gitlab.

@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

A reproduction of the specific case GitLab users were having trouble with is this:

message = "é"
message.force_encoding("ASCII-8BIT")

Note that in this case, the message was actually UTF-8-encoded, but for some reason Ruby thought it was ASCII-8BIT, and when trying to do JSON.generate, it tried to convert this string that was actually UTF-8 encoded from ASCII-8BIT to UTF-8, yielding the "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError) error you see in https://gitlab.com/gitlab-org/gitlab-ce/issues/2698.

Reproduce this by doing:

JSON.generate(message) # => error

In this case, simply doing message.force_encoding('UTF-8') would work, but not in this case, where the original message was ISO-8859-1 rather than UTF-8:

message = "é".encode('ISO-8859-1')
message.force_encoding('ASCII-8BIT')
message.force_encoding("UTF-8")
JSON.generate(message) # => error

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

Thanks, I'm going to make this an issue.

@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

@tpitale I completely understand, and I will do better at that in the future. I thought this MR was minor, but as we see now, it definitely wasn't.

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

No worries. Every OSS project is different, too. I appreciate you being considerate and working with me on this.

@tpitale
Copy link
Owner

tpitale commented Sep 24, 2015

I've opened issue #44 and I'll try to see if I can fix this in ruby for Que.

@DouweM
Copy link
Collaborator Author

DouweM commented Sep 24, 2015

Thanks @tpitale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants