Text conversion does not decode foreign non-UTF8 text fields (e.g. subject) correctly when using .NET Core #330

AntGraham · 2017-08-23T02:25:06Z

First - great library. I'm trying to arrange for my current employer to make a substantial $ contribution.

When parsing non-UTF8 fields (e.g. using the Japanese EML you have in UnitTests\TestData\messages\japanese.txt), fields such as subject aren't decoded correctly when using .NET Core.

The unit test TestJapaneseMessage will succeed for .NET 4.x, but will fail if using .NET Core 2.0.

To save having to step through the EML parsing code, the failure can be reproduced by passing the (internal) Header c'tor the same inputs it receives when you run MimeMessage.Load on the japanese.txt input:

// For simplification, get the raw bytes that would be read in from a problematic EML
byte[] field = Encoding.ASCII.GetBytes("Subject");
byte[] value = Encoding.ASCII.GetBytes(
      " =?ISO-2022-JP?B?GyRCRnxLXDhsJWEhPCVrJUYlOSVIGyhCICh0ZXN0aW5nIEph?=\r\n" +
      " =?ISO-2022-JP?B?cGFuZXNlIGVtYWlscyk=?=\r\n");

// Get the internal constructor "internal Header (ParserOptions options, byte[] field, byte[] value)" that gets called when parsing an eml
ConstructorInfo[] c = typeof(Header).GetConstructors(BindingFlags.NonPublic | BindingFlags.Instance);

// (I know it's the second of three constructors listed)
Header h = c[1].Invoke(new object[] { ParserOptions.Default, field, value }) as Header;

If the above code is called from a .NET Core 2.0 console app, 'h.Value' (which calls the decoding code) hasn't been decoded correctly.
In 4.x it's correct: '日本語メールテスト (testing Japanese emails)'
In .NET Core 2.0 it's incorrect: '�$BF|K\8l%a!<%k%F%9%H�(B (testing Japanese emails)'

I'm currently trying to find a solution, btw, and will submit a pull request if and when I find one.

Ant.


jstedfast · 2017-08-23T15:28:46Z

With .NET Core, character encoding support works a bit differently than in .NET 4.x apparently.

I thought it would be enough to depend on System.Text.Encoding.CodePages 4.3.0, but it turns out the encoding provider also needs to be registered, like so:

System.Text.Encoding.RegisterProvider (System.Text.CodePagesEncodingProvider.Instance);

I could probably add this to MimeKit itself (need to make sure it won't conflict with anyone's app if they make the same call), but in the meantime, you should be able to do that yourself and things should work for you. Let me know if they don't.
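
For anyone who wants to check the workaround without going through MimeKit at all, here is a minimal sketch. It assumes a .NET Core project that references the System.Text.Encoding.CodePages package; the class and method names (CodePagesDemo, DecodeSample) are made up for illustration. It base64-decodes the first encoded-word from the report above and decodes the bytes as ISO-2022-JP. As far as I know, registering the same provider instance more than once is a no-op, so an app that also makes this call should not conflict.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of MimeKit.
static class CodePagesDemo
{
    public static string DecodeSample ()
    {
        // Make the code-page encodings (including iso-2022-jp) available.
        // Without this call, .NET Core does not know this charset.
        Encoding.RegisterProvider (CodePagesEncodingProvider.Instance);

        // Payload of the first encoded-word in the Subject header above.
        byte[] raw = Convert.FromBase64String ("GyRCRnxLXDhsJWEhPCVrJUYlOSVIGyhCICh0ZXN0aW5nIEph");

        // Decode the raw bytes using the charset named in the encoded-word.
        return Encoding.GetEncoding ("iso-2022-jp").GetString (raw);
    }
}
```

If the provider is registered, DecodeSample () should return the Japanese text followed by the start of the English part, rather than the raw '$BF|K\8l...' escape sequences shown in the report.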


AntGraham · 2017-08-23T22:24:35Z

Great - that fixed it, thanks!

jstedfast · 2017-08-24T14:33:03Z

no problem!

AntGraham changed the title ~~Text conversion fails for non-UTF8 text fields (e.g. subject) when using .NET Core~~ Text conversion fails for foreign non-UTF8 text fields (e.g. subject) when using .NET Core Aug 23, 2017

AntGraham changed the title ~~Text conversion fails for foreign non-UTF8 text fields (e.g. subject) when using .NET Core~~ Text conversion does not decode foreign non-UTF8 text fields (e.g. subject) correctly when using .NET Core Aug 23, 2017

jstedfast added a commit that referenced this issue Aug 23, 2017

For .NET Core, register the CodePagesEncodingProvider

3e80c5c

Fixes issue #330

jstedfast added the bug label Aug 23, 2017

jstedfast closed this as completed Aug 23, 2017

jstedfast mentioned this issue Dec 13, 2017

System.NotSupportedException: No data is available for encoding 51936 jstedfast/MailKit#598

Closed

whitneyschmidt mentioned this issue Jun 9, 2020

Could not load the file 'I18N, Version=2.0.5.0, Culture=neutral, PublicKeyToken=0738eb9f132ed756'. #578 xamarin/xamarin-macios#8815

Closed
