Text conversion does not decode foreign non-UTF8 text fields (e.g. subject) correctly when using .NET Core #330

Closed
AntGraham opened this issue Aug 23, 2017 · 3 comments
Labels
bug Something isn't working

Comments

@AntGraham

First - great library. I'm trying to arrange for my current employer to make a substantial $ contribution.

When parsing non-UTF8 fields (e.g. using the Japanese EML you have in UnitTests\TestData\messages\japanese.txt), fields such as subject aren't decoded correctly when using .NET Core.

The unit test TestJapaneseMessage will succeed for .NET 4.x, but will fail if using .NET Core 2.0.

To save stepping through the EML parsing code, the failure can be reproduced directly by passing the (internal) Header constructor the same inputs it receives when MimeMessage.Load parses japanese.txt:

// For simplification, get the raw bytes that would be read in from a problematic EML
byte[] field = Encoding.ASCII.GetBytes("Subject");
byte[] value = Encoding.ASCII.GetBytes(
      " =?ISO-2022-JP?B?GyRCRnxLXDhsJWEhPCVrJUYlOSVIGyhCICh0ZXN0aW5nIEph?=\r\n" +
      " =?ISO-2022-JP?B?cGFuZXNlIGVtYWlscyk=?=\r\n");

// Get the internal constructor "internal Header (ParserOptions options, byte[] field, byte[] value)" that gets called when parsing an EML
ConstructorInfo[] c = typeof(Header).GetConstructors(BindingFlags.NonPublic | BindingFlags.Instance);

// (I know it's the second of three constructors listed)
Header h = c[1].Invoke(new object[] { ParserOptions.Default, field, value }) as Header;

If the above code is called from a .NET Core 2.0 console app, 'h.Value' (which calls the decoding code) isn't decoded correctly.
In 4.x it's correct: '日本語メールテスト (testing Japanese emails)'
In .NET Core 2.0 it's incorrect: '�$BF|K\8l%a!<%k%F%9%H�(B (testing Japanese emails)'
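
The garbled .NET Core output looks like the raw ISO-2022-JP byte stream shown as plain ASCII rather than decoded: the base64 payload starts with the escape sequence ESC $ B, which switches the stream into the Japanese character set, and ESC ( B switches it back. A minimal sketch that inspects the first encoded-word's payload without MimeKit follows; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of MimeKit): shows what the first
// encoded-word of the Subject header contains before charset decoding.
public static class EncodedWordPeek
{
    // Base64 payload copied from the first =?ISO-2022-JP?B?...?= word above.
    public const string Payload = "GyRCRnxLXDhsJWEhPCVrJUYlOSVIGyhCICh0ZXN0aW5nIEph";

    public static byte[] RawBytes()
    {
        return Convert.FromBase64String(Payload);
    }

    // Render printable ASCII as-is and everything else as {XX} hex,
    // so the ESC (0x1B) bytes of the escape sequences are visible.
    public static string Printable(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            if (b >= 0x20 && b < 0x7F)
                sb.Append((char)b);
            else
                sb.Append('{').Append(b.ToString("X2")).Append('}');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Printable(RawBytes()));
    }
}
```

The first three bytes are 0x1B, '$' and 'B', which matches the '�$B' at the start of the incorrect subject.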

I'm currently trying to find a solution, by the way, and will submit a pull request if and when I find one.

Ant.

@AntGraham AntGraham changed the title Text conversion fails for non-UTF8 text fields (e.g. subject) when using .NET Core Text conversion fails for foreign non-UTF8 text fields (e.g. subject) when using .NET Core Aug 23, 2017
@AntGraham AntGraham changed the title Text conversion fails for foreign non-UTF8 text fields (e.g. subject) when using .NET Core Text conversion does not decode foreign non-UTF8 text fields (e.g. subject) correctly when using .NET Core Aug 23, 2017
@jstedfast
Owner

With .NET Core, character encoding support apparently works a bit differently than in .NET 4.x.

I thought it would be enough to depend on System.Text.Encoding.CodePages 4.3.0, but it seems the code pages encoding provider also needs to be registered, like so:

System.Text.Encoding.RegisterProvider (System.Text.CodePagesEncodingProvider.Instance);

I could probably add this to MimeKit itself (need to make sure it won't conflict with anyone's app if they make the same call), but in the meantime, you should be able to do that yourself and things should work for you. Let me know if they don't.
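
As a rough sketch of the workaround in application code, the provider can be registered once before any decoding happens, after which Encoding.GetEncoding should resolve "iso-2022-jp". The class name and the Decode helper below are hypothetical, the payload is the first encoded-word from the report, and the sketch assumes the System.Text.Encoding.CodePages package is referenced on .NET Core 2.0.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not MimeKit code.
public static class Iso2022JpDemo
{
    public static string Decode(string base64)
    {
        // Register the code pages provider before asking for the encoding.
        // Assumption: registering the same provider instance again is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding jis = Encoding.GetEncoding("iso-2022-jp");
        byte[] raw = Convert.FromBase64String(base64);
        return jis.GetString(raw);
    }

    public static void Main()
    {
        // Make sure the console can display the Japanese characters.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Decode("GyRCRnxLXDhsJWEhPCVrJUYlOSVIGyhCICh0ZXN0aW5nIEph"));
    }
}
```

In a real app the RegisterProvider call would more likely sit at startup (for example at the top of Main) than inside a helper; it is placed in Decode here only to keep the sketch self-contained.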

jstedfast added a commit that referenced this issue Aug 23, 2017
@jstedfast jstedfast added the bug Something isn't working label Aug 23, 2017
@AntGraham
Author

Great - that fixed it, thanks!

@jstedfast
Owner

no problem!
