Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using JSON columns for TPH inheritance #32242

Closed
clintsinger opened this issue Nov 6, 2023 · 22 comments
Closed

Using JSON columns for TPH inheritance #32242

clintsinger opened this issue Nov 6, 2023 · 22 comments
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@clintsinger
Copy link

I have looked through the issues related to inheritance and json mappings and so far I don't believe I have come across a proposal such as this, though it looks like there may have been similar discussions that I feel were more complex than the whole thing should have been.

With the TPH model one is required to create a column to map to every type of property that might exist on derived classes. The challenge with this is that the table can get pretty ugly if you have a large number of classes that share the same base type.

Most databases support a JSON column which can effectively represent any type including deeply nested complex ones. If one could use, in combination with a discriminator column, a JSON column to map to the derived classes properties it would allow for a richer inheritance with minimal complexity.

It would be expected that the serialization would be a simple process of taking the object that is being serialized, ignoring the base class properties and putting the remaining into the json column.

When deserializing, since the base class properties aren't included in the json document they won't be overwritten. Or alternatively, the json deserialization could occur first and then base class properties filled in afterwards.

The difference between this proposal and the others regarding complex types is that this serializes/deserializes the object at the root level and not a specific property.

An example of how the system would map the entities could be something like this

modelBuilder.Entity<ContentDbEntity>()
    .HasDiscriminator()
    .ToJsonColumn("properties")
    .HasValue<FileContentDbEntity>("file")
    .HasValue<ImageContentDbEntity>("image")
    .HasValue<VideoContentDbEntity>("video");

The discriminator is set and the values are assigned but when the serialization happens, EF knows to use the properties column (in this case) for the root properties that aren't mapped explicitly in the base type.

@roji
Copy link
Member

roji commented Nov 6, 2023

@clintsinger that looks like #31250?

The difference between this proposal and the others regarding complex types is that this serializes/deserializes the object at the root level and not a specific property.

Do you mean that the root entity itself is persisted as a JSON document into a single column in the database, presumably alongside a single primary key column? How is the configuration of that table determined, e.g. the primary key column type, the names of the key/JSON columns etc.? At the end of the day, when modeling a relational database schema, there really is a table with columns - this isn't something we want to hide away from the user.

@clintsinger
Copy link
Author

clintsinger commented Nov 6, 2023

I was thinking along these lines, given the following class structure:

public class Animal  // could be abstract or an interface
{
  public int Id { get; set; }
  public string Species { get; set; }
}

public class Pet // and this could also be abstract or an interface
{
  public string Name { get; set; }
}

public class Cat : Pet
{
  public string EducationLevel { get; set; }
}

public class Dog : Pet
{
  public string FavoriteToy { get; set; }
}

For the modelBuilder it would look something like

  modelBuilder.Entity<Animal>()
    .HasDiscriminator()
    .ToJsonColumn("Properties")  // or whatever would make sense for this scenario
    .HasValue<Cat>("Cat")
    .HasValue<Dog>("Dog")

The database would end up like the following for a Cat and Dog types based on knowing that Animal is the base class and everything else is inherited.

Id Species Discriminator Properties
1 Felis catus Cat { Name: "Alice", EducationLevel: "MBA" }
2 Canis familiaris Dog { Name: "Toast", FavoriteToy: "Mr. Squirrel" }

What EF would do is map the Animal properties to appropriate columns in the table as it does now and have an extra column that would take the place of all of the nullable columns that would have mapped to the individual properties of all of the derived types that would have been necessary for TPH.

@clintsinger
Copy link
Author

One advantage to this model is that it implicitly supports complex types because they are just serialized into the Json string.

@clintsinger
Copy link
Author

#31250 could be saying the same thing but I was honestly having a hard time groking it. Hopefully my example is clearer.

@clintsinger
Copy link
Author

Currently if you have to map a JSON column to an object, it can only be to a single field per column which makes for some terrible classes such as the following:


public class Cat : Animal
{
    public CatProperties Properties {get; set; }
}

public class Dog : Animal
{
    public DogProperties Properties { get; set; }
}

modelBuilder.Entity<Animal>()
    .HasDiscriminator()
    .HasValue<Cat>("Cat")
    .HasValue<Dog>("Dog");

modelBuilder.Entity<Animal>(entity =>
{
    // Configure Animal class but not the properties field
});

modelBuilder.Entity<Cat>(entity =>
{
    entity.Property(e => e.Properties)
        .HasColumnType("jsonb")
        .HasColumnName("properties")
        .HasJsonConversion();
});

modelBuilder.Entity<Dog>(entity =>
{
    entity.Property(e => e.Properties)
        .HasColumnType("jsonb")
        .HasColumnName("properties")
        .HasJsonConversion();
});

Where .HasJsonConversion() is a value converter in the form of an extension method.

public static PropertyBuilder<T> HasJsonConversion<T>(this PropertyBuilder<T> propertyBuilder) where T : class
{
    ValueConverter<T, string> converter = new(
        v => v.Serialize()!,
        v => v.Deserialize<T>()!
    );

    ValueComparer<T?> comparer = new(
        (l, r) => l.Serialize() == r.Serialize(),
        v => v == null ? 0 : v.Serialize()!.GetHashCode(),
        v => v != null ? v.Serialize().Deserialize<T>() : default
    );

    propertyBuilder.HasConversion(converter);
    propertyBuilder.Metadata.SetValueConverter(converter);
    propertyBuilder.Metadata.SetValueComparer(comparer);
    
    return propertyBuilder;
}

By mapping a json column to the root type you can have a more natural inheritance structure without having to do weird single property hack.

@roji
Copy link
Member

roji commented Nov 7, 2023

#31250 will allow you to map a complex type hierarchy to a single database column, serializing the graph into a JSON document. In other words, you'll be able to do the following:

public class Animal
{
    // Animal properties
}

public class Cat : Animal
{
    // Cat properties
}

public class Dog : Animal
{
    // Dog properties
}

public class AnimalContainer
{
    public int Id { get; set; }
    public Animal Animal { get; set; }
}

AnimalContainer is a regular entity type, whose Animal property is mapped to a single JSON column in the database.

Note that crucially, the discriminator would most likely be inside the JSON document, and not in a column outside/alongside it. This allows using TPH-style inheritance for types nested deeply inside the JSON document - each type carries its own discriminator within it.

This indeed requires you to have another type (AnimalContainer) to map to the table, rather than just mapping Animal directly. But that's necessary to allow you to configure various aspects on the Id column, on the column containing the JSON data (what should be its name?).

Note also that no value converters are (or should be) involved - the JSON support is built into EF itself. This allows EF to be aware of the structure inside the JSON document, and to e.g. correctly translate LINQ queries inside it.

@clintsinger
Copy link
Author

clintsinger commented Nov 7, 2023

Ok, I understand what #31250 is doing now. Though it isn't really TPH in that the root object isn't the object that has the discriminator. And, I do see the value in being able to support inheritance in complex properties.

I'm actually migrating from Cosmos DB to PostgreSQL and this is where I noticed the issue. With Cosmos, due to its document nature, it supported discriminator style inheritance quite well including the rich no-schema model. I was looking for a natural way to fit that in a relational database. What I had proposed seemed to fit that quite well.

Any reason what I proposed couldn't work? It obviously works for databases like cosmos so there is precedent for this. (I understand that they are fundamentally different).

@roji
Copy link
Member

roji commented Nov 7, 2023

There's a fundamental difference between document (e.g. Cosmos) and relational (e.g. PG) databases, where the former store (JSON-like) documents, whereas the latter store tables, in which one (or more) columns can contain a JSON-like document.

So yeah, if the intent really is to store just a single JSON document and nothing more, than the way things currently look, you need to have the additional boilerplate of an entity type to map to the table, in addition to the entity types that make up the JSON document - that's not necessary in a document database since there's no enclosing table which contains the document.

EF could in theory have a mode where you instruct it to map an entity type directly to a JSON column, and leave out the details of the enclosing table (EF would do that for you implicitly). This would be a bit of sugar, removing the need to define the table yourself; but as I wrote above, the table still needs to have e.g. a primary key, and you need some way to configure that (should it be an int, a GUID, how is it generated, etc. etc.), as well as the JSON column itself. So I don't think such an implicit "shortcut" mode is appropriate for relational mapping, and in any case the value it provides isn't huge (it just removes the need for the table entity).

Note that none of the above is related to inheritance per se, just to JSON mapping in general.

@clintsinger
Copy link
Author

I feel like you might have missed something in my example. You talk about needing primary keys, etc. But what I was proposing still has those things. They were in the base class, Animal. Only the inherited properties were in the extra json column. The goal isn't just to simply "store a json document" but to take advantage of having a schemaless column that could map to class properties of derived types.

In TPH we would normally have something like this where all of the columns have to be defined for all of the sub types.

Id Species Discriminator Name EducationLevel FavoriteToy
1 Felis catus Cat Alice MBA NULL
2 Canis familiaris Dog Toast NULL Mr. Squirrel

Where, what I was saying is you still have all of the normal base class fields like you would have had in TPH but the "extra" columns that you would have needed for each property are now mapped to the single JSON column.

Id Species Discriminator Properties
1 Felis catus Cat { Name: "Alice", EducationLevel: "MBA" }
2 Canis familiaris Dog { Name: "Toast", FavoriteToy: "Mr. Squirrel" }

What you were saying about #31250 has it's purpose but what I'm proposing basically folds what would have been the need for an extra property on the C# classes up to the root level. That would be a more natural fit for TPH.

From the EF perspective everything looks like standard inheritance.

@roji
Copy link
Member

roji commented Nov 7, 2023

What's Species in your example, where does it come from and how does it relate to Discriminator?

How do I configure the Properties column in your table above, given that there's no property anywhere that corresponds to it? For example, I may want to specify its type or MaxLength facet, or name it something other than Properties.

And what happens if I want to have an Id property inside the JSON? How do you distinguish between the Id property on Animal which gets mapped to the relational Id column, and the property that needs to actually go inside the JSON?

Re extracting the discriminator out to its own column outside the JSON, that goes against the de-facto standard for representing polymorphism in JSON, where a $type property inside the JSON contains the discriminator (see System.Text.Json docs). It may have some benefit in terms of easier indexing, but that's not the way things work.

I'm still not clear on what you're getting here compared to the direction that we're already planning to implement in #31250. All it seems to do is allow you to not define the extra entity (AnimalContainer), which doesn't seem that important.

@clintsinger
Copy link
Author

clintsinger commented Nov 7, 2023

Species was in the original Animal base type when I first included all of the classes earlier in the thread. I basically used the same classes from community standup

public class Animal  // could be abstract or an interface
{
  public int Id { get; set; }
  public string Species { get; set; }
}

The Id and Species are mapped to columns like any other time. They don't go into the JSON at all. The JSON is just treated as virtual columns as if you had created them in the table. But instead of dozens of null columns you have a single JSON document that contains their values.

The Discriminator is the same as any other EF inheritance strategy, such in TPH where you configure the modelBuilder. You don't define the discriminator property on the EF side in the class directly but it still gets mapped to the Discriminator column on the database because of the following code.

  modelBuilder.Entity<Animal>()
    .HasDiscriminator()
    .ToJsonColumn("Properties")  // or whatever would make sense for this scenario
    .HasValue<Cat>("Cat")
    .HasValue<Dog>("Dog")

What I slipped in there was another method which doesn't exists today in EF and I had called .ToJsonColumn("Properties") (or whatever it should look like and/or be called) which set up the system to place all of the "extra" properties in the derived class into the same JSON column.
This was the magic that changed how the inheritance is handled. The name of the column was passed in as a parameter. Just like you do to .HasValue(..._), for example.

This is different than when you want a property to also supports inheritance on the property type. And the reason why my proposal is important is because it keeps the inheritance top-level as you get with all of the other inheritance strategies, TPH, TPC, TPT, etc.

@roji
Copy link
Member

roji commented Nov 7, 2023

Do I understand correctly that all properties on Animals (e.g. Species) get mapped to regular columns, but all properties in subtypes get mapped to a JSON document stored in the Properties column? If so, that seems like an extremely arbitrary mapping - why should there be such a fundamental difference between the root type and its subtypes?

@roji
Copy link
Member

roji commented Nov 7, 2023

I mean, I understand the desire to not have any subtype columns in the table, since they're only relevant for that specific subtypes and would be null for other subtypes. But shaping the entire inheritance strategy around this - this point where base properties are mapped to columns and subtype properties to JSON seems very... disjoint, and not what most people would want...

First, note that in regular TPH mapping, the nullable columns are seldom an actual problem; people frequently assume there's an issue with having such a "wide" table with mostly nullable problems, but that's not necessarily the case. SQL Server has sparse columns which can be ideal for such subtype columns, where most rows contain null. Otherwise there's of course TPT and TPC as alternative inheritance mapping strategies.

Finally, I'm still not sure why the planned JSON inheritance feature (as in #31250) isn't sufficient; it means that all properties in the hierarchy are mapped to the JSON column (including the base type's), making things more consistent, and really doesn't seem to have any significant disadvantage compared to what you're proposing here.

@clintsinger
Copy link
Author

It certainly made sense for the scenarios that I was working on. For me the base type tended to have more of my queryable properties and the subtypes tended to be more of the extra "metadata".

Currently I have been simulating what I expect #31250 will provide and having to create a wrapper class to stuff all of the class properties into a single property isn't as ergonomic as having access to the properties directly.

I'm ending up with classes like the following:

public class ParkFeatureDbEntity : PropertyFeatureDbEntity
{
    public class ParkFeatureProperties
    {
        public MapDbEntity? Map { get; set; }

        public MarkersDbEntity? Markers { get; set; }
    }

    public ParkFeatureProperties? Properties { get; set; }
}

Where I have this Properties on every derived class that has to also have a wrapper class.

The use of it now looks something like

obj.Properties?.Map
obj.Properties?.Markers

Instead of the more natural option of

obj.Map
obj.Markers

I'm assuming this is how it is going to be with #31250 as well?

In this case the MapDbEntity and MarkersDbEntity are both complex objects which would either need their own JSON column in the database for TPH or just naturally get serialized if they were using a shared JSON column that serialized at the root ignoring the root specific properties during serialization.

I'll suppose I'll have to try #31250 once it becomes generally available. It is possible I'm misunderstanding how it works.

@roji
Copy link
Member

roji commented Nov 8, 2023

Yes, your understanding is more or less correct.

The main point here is that the mapping you're proposing may fit your very specific needs, but isn't very general... For example, you may want Map and Markers to go into different JSON documents (i.e. have two JSON columns in the table), at which point your proposal breaks down. Similarly, the division of base types having more "queryable properties" vs. subtypes having more "extra metadata" seems very specific to your scenarios; we're seeing different scenarios using JSON mappings for various different things.

At the end of the day, you're right that #31250 would impose having a wrapper entity type (ParkFeatureDbEntity in your above example), and you'd have to drill into an explicit Properties type which represents the JSON document. You're right that this adds some boilerplate, and if all your mapping follows the same scheme as above, that may seem like a burden. But this allows for full flexibility in terms of mapping whatever you want, in whatever you want it.

@clintsinger
Copy link
Author

If you want Map and Markers to go into their own columns, then there are already mechanisms to support that model. You can do that today. #31250 wouldn't solve that problem either.

I think both options have value and the conversation was worthwhile. You can't have too many tools available to you for the job. Perhaps other people will also like the idea that I'm proposing as it will fit their needs too and at that point might get some traction as a feature to add.

@roji
Copy link
Member

roji commented Nov 8, 2023

If you want Map and Markers to go into their own columns, then there are already mechanisms to support that model. You can do that today. #31250 wouldn't solve that problem either.

It would if Map and/or Markers themselves require inheritance.

I think both options have value and the conversation was worthwhile. You can't have too many tools available to you for the job

There's an infinite number of possible specialized mapping strategies that fit specific needs of specific people; it's not feasible for EF to support them all, and it would also make for quite an unwieldy and confusing product... There's a balancing act here, where we provide a general-enough solution to cover most/all needs, without going into too many very speciifc solutions that fit only certain scenarios..

@clintsinger
Copy link
Author

I certainly agree on the need for balance. Personally I saw this more as an extension to the TPH that already exists, but allows one to use a JSON column as virtual columns to all of the ones you would have had to manually add at the time of the database creation. It was nothing more than that.

@roji
Copy link
Member

roji commented Nov 8, 2023

Personally I saw this more as an extension to the TPH that already exists, but allows one to use a JSON column as virtual columns to all of the ones you would have had to manually add at the time of the database creation. It was nothing more than that.

This is certainly no simple "extension" to TPH; for example, it requires treating base properties very differently from subtype ones (the former get mapped to real columns, the latter to the document). It also requires that the same entity type be mapped both to the table, and to the JSON document in a column contained by that document.

In other words, this is a full-on additional mapping strategy with some very non-trivial implementation aspects; and the value here is to reduce a bit of .NET boilerplate, for when your needs match exactly what this mapping provides. That's why I don't see us implementing something like this.

@clintsinger
Copy link
Author

You're the expert here so I'm sure you are right. On the surface it doesn't appear complicated but I can see how it could get ugly under the covers.

@ajcvickers ajcvickers added closed-no-further-action The issue is closed and no further action is planned. and removed type-enhancement shay-loves-labels labels Jan 3, 2024
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Jan 3, 2024
@peter-8640
Copy link

peter-8640 commented Aug 27, 2024

+1 for this feature. Thanks @clintsinger for the very clear examples which imho precisely articulate the problem.
I have exactly the same use -> several hundred derived classes, each with a few unique properties.
In the example,

  • Animal would have a json attribute
  • In a derived class there would be an overriding method to_json where you can define which properties of that class are stored in the animal class

I see what @roji says about no performance issue from having a very wide table, but from a practical point of view, having a table a few thousand columns wide is not easy to navigate. It also means you have to handle EF migrations every time you add a derived class, which wouldn't be the case if all the properties were in a single json field.

Perhaps an easier implication, within the TPH, would be to restrict which fields are created? i.e. only create fields that exist in the abstract class, not in the derived class.

@roji
Copy link
Member

roji commented Aug 27, 2024

@peter-8640 we do plan to introduce TPH-like inheritance in JSON mappings - that's #31250. This issue (which is closed) is about a very specific, constrained mapping model which doesn't offer much beyond that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

4 participants