This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Add System.Resources.Extensions #36906

Merged on Apr 26, 2019 (21 commits)

@ericstj ericstj commented Apr 16, 2019

This adds a new pair of IResourceWriter & IResourceReader named BinaryResourceWriter and BinaryResourceReader.

The primary difference between these and the base reader/writer is that they support more formats for pre-serialized data. The benefit of using pre-serialized data is that we don't need to create live objects during the resource writing phase. This will enable us to fix dotnet/msbuild#2221 in a source-compatible way.

MSBuild will adopt the new BinaryResourceWriter to pass through resource data from resx directly to resources, rather than reading the resx, creating live objects, and directly serializing those.

The writer is designed in such a way that it only requires the new reader when any of the new pass-through formats are used. Otherwise it will write the .resources in a format that is compatible with the existing built-in ResourceReader.
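
As a point of reference for the compatibility claim, the existing built-in format can be round-tripped with the stock ResourceWriter and ResourceReader. The following is a minimal sketch using only those framework types. The class name BaselineRoundTrip and the resource names are made up for illustration, and it does not use the new writer from this PR.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Resources;

public static class BaselineRoundTrip
{
    // Writes two primitive resources and reads them back with the built-in types.
    public static Dictionary<string, object> RoundTrip()
    {
        var buffer = new MemoryStream();
        using (var writer = new ResourceWriter(buffer))
        {
            writer.AddResource("Greeting", "hello");
            writer.AddResource("Answer", 42);
            writer.Generate();
        }

        // Disposing the writer closes the stream; ToArray still works on a closed MemoryStream.
        byte[] bytes = buffer.ToArray();

        var values = new Dictionary<string, object>();
        using (var reader = new ResourceReader(new MemoryStream(bytes)))
        {
            // Enumeration order is not guaranteed to match insertion order.
            foreach (DictionaryEntry entry in reader)
            {
                values[(string)entry.Key] = entry.Value;
            }
        }
        return values;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, object> pair in RoundTrip())
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Strings and primitives like these are what the built-in reader handles natively, which is why a writer that emits only such entries can stay readable by it.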

The writer must target netstandard so that it can be used by MSBuild when building from Visual Studio. The reader targets netstandard so that other frameworks might benefit from this improved resource build process, but that is a secondary requirement.

Since these new types are largely just an extension of the existing ResourceWriter and ResourceReader functionality I unsealed the base types and added the minimal amount of virtuals to enable this. To benefit from the same code-sharing I made the netstandard implementation carry a copy of the base ResourceReader and ResourceWriter types. I have hidden this derivation from the reference assemblies since it cannot be made universally (types are sealed in netstandard2.0) and it's not particularly useful as an "is a" relationship.

I did investigate trying to implement this functionality by wrapping the existing ResourceReader/ResourceWriter types (and wrapping their format) but I couldn't do so cleanly without duplicating a resourceId to type-table map in the outer format. This would bloat the resources considerably and add quite a lot of complexity, so I avoided it.

This is currently in draft. I'd like feedback on the names and the strategy. @stephentoub @jkotas @tarekgh @rainersigwald @nguerrera

@ericstj ericstj added this to the 3.0 milestone Apr 16, 2019
@ericstj ericstj self-assigned this Apr 16, 2019

jkotas commented Apr 16, 2019

cc @GrabYourPitchforks

@GrabYourPitchforks GrabYourPitchforks left a comment

Quick initial comments and thoughts.

src/Common/src/CoreLib/System/Resources/ResourceReader.cs (three outdated, resolved review threads)
// file, just return a Stream pointing to that block of memory.
unsafe
{
    stream = new UnmanagedMemoryStream(ums.PositionPointer, length, length, FileAccess.Read);

Looks like a bug in the original ResourceReader code. The ResourceReader ctor assumes that any instance of an UnmanagedMemoryStream passed in to its ctor must correspond to memory that will never be freed for the lifetime of the application, because a non-lifetime-tracked pointer to that memory is being exposed to the outside world via this method. While this is true for resources loaded from assemblies (where the assembly isn't in an unloadable context), this is not generally a safe assumption to make.

I don't think we need to address this now since this code has existed for 20 years without issue, but it's definitely a bug.

@ericstj ericstj Apr 16, 2019

Agreed this is a bad assumption. UMS doesn't provide any way to rent out a ref count to its pointer. Probably need a better check to make sure the UMS points to the embedded resources if we wanted to avoid this.

> I don't think we need to address this now since this code has existed for 20 years without issue, but it's definitely a bug.

This was fixed recently: dotnet/coreclr#22925

I see, we can't take advantage of that in the netstandard build, but can in the corelib version.

stream = new MemoryStream(bytes, false);
}

value = Activator.CreateInstance(type, new object[] { stream });

This could be as dangerous as BinaryFormatter. I think we said this was intended to operate over trusted data, correct?

Correct: resources are assumed to be as trusted as executable code. Even before we get here ResourceManager had the opportunity to do something similar based on the resources header (the extension point we're using here): https://github.com/dotnet/coreclr/blob/8252365bfb7651a5473e98d3bdaeaa8cb22c0ec3/src/System.Private.CoreLib/shared/System/Resources/ManifestBasedResourceGroveler.cs#L255

The point of this change is not to make resources more trusted; it's to enable non-primitive resources without relying on BinaryFormatter.
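
To illustrate why the activator path only makes sense for trusted data, here is a small sketch of the pattern in the quoted line: a type name that comes from the data decides which constructor runs. StreamBackedBlob and ActivatorSketch are hypothetical names invented for this example. They are not types from the PR.

```csharp
using System;
using System.IO;

// Hypothetical resource type with a public constructor that takes a Stream.
public sealed class StreamBackedBlob
{
    public byte[] Data { get; }

    public StreamBackedBlob(Stream source)
    {
        using (var copy = new MemoryStream())
        {
            source.CopyTo(copy);
            Data = copy.ToArray();
        }
    }
}

public static class ActivatorSketch
{
    // Resolves the type by name and invokes its Stream constructor,
    // mirroring Activator.CreateInstance(type, new object[] { stream }).
    public static object Create(string typeName, byte[] bytes)
    {
        Type type = Type.GetType(typeName, throwOnError: true);
        Stream stream = new MemoryStream(bytes, false);
        return Activator.CreateInstance(type, new object[] { stream });
    }

    public static void Main()
    {
        // In a real .resources file the type name would come from the file itself.
        string typeName = typeof(StreamBackedBlob).AssemblyQualifiedName;
        var blob = (StreamBackedBlob)Create(typeName, new byte[] { 1, 2, 3 });
        Console.WriteLine(blob.Data.Length);
    }
}
```

Because whoever authors the data picks the type name, any loadable type with a matching constructor can be made to run, which is the sense in which resources have to be as trusted as executable code.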

ericstj commented Apr 16, 2019

Here are some alternate options:

  1. Just copy the code.
  • We don't need a public abstraction; we only need the behavior provided by ResourceReader/ResourceWriter. We could copy it all.
  • Pro: No new API on framework types. Works everywhere.
  • Con: Long term we have a maintenance issue, in that we need to service multiple assemblies when this code changes.
  2. New abstraction inside ResourceReader.
  • We don't need to override all of the behavior of ResourceReader. Rather than unsealing it and exposing virtuals, we could have an internal abstraction for deserialization. We could define this abstraction through a new version of resources and a type name in the resources section. We would still need a public type/interface for the abstraction. ResourceWriter would need a parallel abstraction, and might need API specific to the abstraction.
  • Pro: No need to unseal the reader, just add some new functionality to it.
  • Con: Doesn't work on frameworks without the reader change, requires a rev to the resources format, and adds a new extension point rather than reusing an existing one.
  3. Envelope approach
  • Instead of embedding information inside the current format, wrap the format and store new information outside. The new reader and writer can handle the new information, and the old reader can deal with the old info. We can store data blobs as streams in the inner format. The outer format has a table that maps resource ID to final type and serialization method.
  • Pro: No new API on framework types. Works everywhere.
  • Con: Larger payload due to an additional resource header and an additional ID table. More complicated reader/writer, since they now need to understand the header and a format for the additional table.
  4. ResourceSet converter
  • Similar to the envelope approach, but doesn't wrap ResourceReader with a custom ResourceReader. Instead we only write a custom ResourceSet. The ResourceSet is the go-between for ResourceManager and ResourceReader, so it can intercept the data coming out of the resource reader and deserialize/convert it. We could use a "known key" in the underlying resource collection to store our type/format table, serialized in whatever format we want.
  • ResourceManager permits passing a custom ResourceSet to its constructor, so if we could change the designer-generated code this could work on its own without needing a custom ResourceWriter.
  • Pro: No new API on framework types. Works everywhere. Data inside the reader/resources is pure primitive data.
  • Con: Requires either a change to designer-generated code or a custom writer. If anyone had a scenario for accessing the raw reader to get non-primitive data, it wouldn't work. It looks like the performance of a ResourceSet without "special" knowledge of the ResourceReader is significantly degraded, since the IResourceReader abstraction is effectively just enumeration. Unfortunately, most of the smarts that let RuntimeResourceSet + ResourceReader have reasonable performance are due to tight coupling of internals.
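
To make the ResourceSet converter option more concrete, here is a rough sketch of a ResourceSet that converts string data on the way out. It is not code from this PR: the ConvertingResourceSet name, the "__typeTable" known key, and the "name=type" line format are all assumptions for illustration, and TypeConverter stands in for whatever conversion would really be used.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Resources;

public sealed class ConvertingResourceSet : ResourceSet
{
    // Hypothetical "known key" holding lines of the form "resourceName=typeName".
    public const string TypeTableKey = "__typeTable";

    private readonly Dictionary<string, string> _typeNames = new Dictionary<string, string>();

    public ConvertingResourceSet(IResourceReader reader) : base(reader)
    {
        // The base constructor has already enumerated the reader into its table.
        string table = GetString(TypeTableKey) ?? string.Empty;
        foreach (string line in table.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = line.IndexOf('=');
            if (eq > 0)
            {
                _typeNames[line.Substring(0, eq)] = line.Substring(eq + 1);
            }
        }
    }

    // Intercepts the primitive data and converts it to the recorded type.
    public override object GetObject(string name)
    {
        object raw = base.GetObject(name);
        string typeName;
        if (raw is string text && _typeNames.TryGetValue(name, out typeName))
        {
            Type type = Type.GetType(typeName, throwOnError: true);
            return TypeDescriptor.GetConverter(type).ConvertFromInvariantString(text);
        }
        return raw;
    }
}

public static class ConvertingResourceSetDemo
{
    public static ConvertingResourceSet Load()
    {
        var buffer = new MemoryStream();
        using (var writer = new ResourceWriter(buffer))
        {
            writer.AddResource("Title", "plain text");
            writer.AddResource("Timeout", "00:00:05");
            writer.AddResource(ConvertingResourceSet.TypeTableKey, "Timeout=System.TimeSpan");
            writer.Generate();
        }
        var reader = new ResourceReader(new MemoryStream(buffer.ToArray()));
        return new ConvertingResourceSet(reader);
    }

    public static void Main()
    {
        using (ConvertingResourceSet set = Load())
        {
            Console.WriteLine(set.GetObject("Timeout").GetType());
            Console.WriteLine(set.GetObject("Title"));
        }
    }
}
```

Note that the base ResourceSet constructor reads every entry from the reader up front, which is one form of the performance cost listed in the Con above.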

jkotas commented Apr 16, 2019

> Long term we have a maintenance issue, in that we need to service multiple assemblies when this code changes.

The current PR has this problem too because it compiles the CoreLib code into the NuGet package.

FWIW, I do not see a problem with duplicating the code. It is simplest and most flexible.

ericstj commented Apr 16, 2019

> The current PR has this problem too because it compiles the CoreLib code into the NuGet package.

I was trying to draw the distinction that at the limit with the current PR we wouldn't rely on the duplicated code. We could imagine dead-ending the downlevel implementation then only servicing the latest version. In that case there would not be duplication.

> FWIW, I do not see a problem with duplicating the code. It is simplest and most flexible.

Are you referring to the current PR or the alternate option 1?

jkotas commented Apr 17, 2019

> Are you referring to the current PR or the alternate option 1?

Option 1.

The CoreFX implementation style tends to prefer compiling the same code into multiple places with different shapes. We have somewhere between 2 MB and 5 MB worth of C# under Common that is compiled into more than one assembly. Adding 100 kB of resource handling to this pile does not sound like a big deal.

(I do not have a strong opinion on whether option 1 is better than the current PR. It is a different style, but about as good.)

ericstj commented Apr 18, 2019

So I'll plan to discuss this option + alternate 1 in FXDC. Before I write up a proposal, can I get feedback on naming and factoring?

Currently I chose System.Resources.Binary, BinaryResourceWriter, and BinaryResourceReader on the loose premise that this scheme is based on preserialized (binary rather than object) data. For the type names I think it makes sense to have a ResourceWriter and ResourceReader with a common prefix. Names aren't terribly important here, since the reader will only be used by the resource manager (not actual developers) and the writer will only be used by MSBuild, but I'd still like to make sure they are sensible.

Here are some other options:

  • namespace

    • System.ComponentModel.Resources
    • System.Resources.Serialization
    • System.Resources.PassThrough
    • System.Resources.Extensions
    • System.Resources.Data
  • name prefix

    • Serializing
    • PassThrough
    • Data
    • Object
    • Component
    • Converting

I'd also like feedback on assembly factoring. Currently I've placed the reader and writer in separate assemblies since the writer is a build time component and reader is runtime. I could combine them if folks don't think the separation saves much. The size of the writer is currently 23KB and its dependencies are a subset of the reader's.

@rainersigwald

Now that I've started to consume this, I'd rather move away from Binary in the names.

In MSBuild (left over from resgen) there is a Format enum that has Text, XML, Assembly, and Binary, and the writer gets used in the Binary case, regardless of whether it's this one or the old one, because the output is a binary .resources file.

I like PassThrough but it feels like it might be obtuse for someone who isn't deeply familiar with the area. Though maybe that's less of a problem as a prefix than as a namespace?

tarekgh commented Apr 18, 2019

I prefer to use

System.Resources.Extensions

and even to name the new library the same. Extensions really reflects that this is a resource extension, which I think is the case. Having Extensions in the name will allow us to add more to it in the future without needing to create a new library.

For the prefix, I think Serialization or Converter makes more sense to me.

jkotas commented Apr 18, 2019

> I'd also like feedback on assembly factoring.

Does the reader need to be a separate NuGet package assembly? Can it be inbox only?

ericstj commented Apr 18, 2019

> Does the reader need to be a separate NuGet package assembly? Can it be inbox only?

It must be inbox at least for WindowsDesktop. It could be inbox on the base shared framework if that's what folks think makes sense. It needs to sit rather high in the stack due to the dependencies on BinaryFormatter and TypeConverter. I need a NuGet package for the writer so that it can be used by MSBuild running on desktop inside VS. MSBuild can target many frameworks, which is the reason why I also have a package for the reader. I'd prefer it not go inbox, or at least not in a way that freezes the API surface, as we might want to iterate on that before it is done.

> For the prefix, I think Serialization or Converter makes more sense to me.

Both these names read a little unusual to me: SerializationResourceReader|Writer, ConverterResourceReader|Writer. Perhaps ConvertingResourceReader|Writer. I don't like Serialization here, as the whole purpose of the writer is to force the caller to do the serialization, working with pre-serialized data. Perhaps PreserializedResourceWriter and ConvertingResourceReader. That loses the parallelism, but then again the writer is already 1:many with readers, as it supports writing the old format.

@ericstj ericstj closed this Apr 18, 2019
@ericstj ericstj reopened this Apr 18, 2019

ericstj commented Apr 18, 2019

Poking around this code a bit more, I came up with an Option 4 above: #36906 (comment). I think it's interesting and may experiment with it a bit. @tarekgh do you know of many scenarios for folks directly accessing the reader, or do you think it would be reasonable to lift the converter behavior out into a custom ResourceSet?

tarekgh commented Apr 19, 2019

The common scenario is that people use ResourceReader with the ResourceSet. I have seen a few cases of using the reader by itself. We may try to get some more data about that. But in general, using ResourceSet is a good idea.

ericstj commented Apr 19, 2019

I'm not so sure of the viability of only implementing a custom ResourceSet since it wouldn't be able to access ResourceReader internals to provide the same perf characteristics that people get today. I'll leave it as an option but I'm less enthusiastic after digging into that code.

@ericstj ericstj force-pushed the PassThroughResources branch 3 times, most recently from 81169f5 to df1eaa2 Compare April 24, 2019 06:44
@ericstj ericstj changed the title Add System.Resources.Binary.Reader|Writer Add System.Resources.Extensions Apr 24, 2019
@ericstj ericstj removed the NO REVIEW Experimental/testing PR, do NOT review it label Apr 25, 2019

ericstj commented Apr 25, 2019

Ok, this should be ready now.

@tarekgh tarekgh left a comment

Added some minor comments/questions. other than that, LGTM

We don't want to use the runtime type identity when doing type checks or writing
types as this may change.  Instead, check-in hard-coded strings that match the
type identity.

Add a test case to ensure the resources we generate match what we
test the reader against in the resource manager case (also to track any
change to the binary format).
@ericstj ericstj merged commit da36ea2 into dotnet:master Apr 26, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/coreclr that referenced this pull request Apr 26, 2019
* Add System.Resources.Binary.Reader|Writer

* Fix ResourceWriter tests

* Test fixes and PR feedback

* More test fixes

* Add packages for System.Resources.Binary.*

* Suppress duplicate types in System.Resources.Binary.*

* Test refactoring and adding RuntimeResourceSet

It turns out we must have our own ResourceSet since the CoreLib resource set doesn't
expose a constructor that takes an IResourceReader.

I've shared the code since it does a bit of non-trivial caching.

* Don't use auto-property initializers for platform-sensitive test data

For some reason I thought these lazy-initialized the properties but they don't.

As a result we were hitting the platform sensitive code even when we never
called the getter.  Switch to an expression instead.

* Only use Drawing converters on Windows

* Fix test failures

* Don't leak System.Private.CoreLib into resources

* Make sure RuntimeResourceSet doesn't call ResourceReader(IResourceReader)

* WIP

* Rename types in System.Resources.Extensions

Leave RuntimeResourceSet internal as it doesn't need to be public.

* Update packages

* Respond to API review feedback

Remove abstraction for ResourceReader/Writer: just reuse the source.

Remove non-essential members.

* Clean up

* Further cleanup

* Further cleanup

* Review feedback

* Ensure we have stable type names in Resources.Extensions

We don't want to use the runtime type identity when doing type checks or writing
types as this may change.  Instead, check-in hard-coded strings that match the
type identity.

Add a test case to ensure the resources we generate match what we
test the reader against in the resource manager case (also to track any
change to the binary format).

Signed-off-by: dotnet-bot <[email protected]>
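
The commit note above about auto-property initializers describes a general C# behavior that a short example makes easier to see. This is a minimal sketch, not the test code from the PR: PlatformData and its members are invented names, and static properties are used here only to keep the example small.

```csharp
using System;

public static class PlatformData
{
    public static int InitializerRuns;
    public static int ExpressionRuns;

    // Auto-property initializer: evaluated once when the type is initialized,
    // whether or not anyone ever reads the property.
    public static string Initialized { get; } = FromInitializer();

    // Expression-bodied property: evaluated on each read, and only on a read.
    public static string OnDemand => FromExpression();

    private static string FromInitializer()
    {
        InitializerRuns++;
        return "initializer";
    }

    private static string FromExpression()
    {
        ExpressionRuns++;
        return "expression";
    }

    public static void Main()
    {
        // Neither property has been read yet.
        Console.WriteLine(InitializerRuns); // 1: the initializer already ran
        Console.WriteLine(ExpressionRuns);  // 0: nothing has called the getter

        string first = OnDemand;
        string second = OnDemand;
        Console.WriteLine(ExpressionRuns);  // 2: one evaluation per read
    }
}
```

If the initializer touches platform-sensitive code, it runs even on platforms where the value is never used, which matches the failure the commit message describes.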
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Apr 26, 2019
jkotas pushed a commit to dotnet/coreclr that referenced this pull request Apr 27, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Commit migrated from dotnet/corefx@da36ea2
Linked issue that this pull request may close: GenerateResource task does not support non-string resources on .NET core
5 participants