-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split Field model into distinct Feature and Entity objects #655
Conversation
/test test-end-to-end |
/test test-end-to-end |
/test test-end-to-end-batch |
Could you please summarize in the description what the SQL schema changes are that are implied with this branch? Should the PR include a SQL migration script? |
For the script, should it perhaps be at the release level, or is there an advantage to having it at the PR level? |
Could be at release. With the PR, reviewers could run it on an existing development environment with data in it and a.) see that the migration script works, and b.) experiment with the PR with some existing data. Manual of course, but it's something. This would be out of scope for this PR for sure, but separately maybe we could consider integrating something like Flyway into the project, both for development convenience and for shipping migration scripts with Feast releases that operators can have a process to apply. Could be used to load seed data for automated integration tests too. |
Yea I really like Flyway. I think it makes sense. I have only used it as an external tool in the past in non-JVM projects. It seems like integration here would mean that migrations could be triggered manually using the CLI or The value add would be mostly in the migration scripts themselves, so perhaps we could start there I think, and add documentation and Flyway around it. |
@ches @woop The necessary migration between the 0.47 schema and 0.5 is the following:
I can add migration scripts to this PR if necessary, the only difficulty here seems to be the entities and features. Alternatively we can alter the existing |
Thanks @zhilingc From an upgrade perspective I think people care about retaining their existing data.
Are any of our systems (like Spring) picky about column order? |
It's not spring, spring doesn't care, but it's not directly supported in postgres. |
Ok, so where does the failure happen if we lose column ordering? I am trying to understand the comment you made above. |
Oh, actually I suppose its not necessary to reorder columns.
|
481ab83
to
030d6e6
Compare
/test test-end-to-end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to love the migration, but I think that I'll be happier after it's done.
Functionally LGTM, I think it's a good move for internals of Feast's design, as the rest of your work based on it probably will further prove. My remarks are just doc nits.
An aside about SQL data model in general, not specific to this PR: any particular reasons for the heavy use of string keys? There are several cases where a string PK is entirely made up of data in other columns in the table, they could possibly be composite keys instead of surrogates. If there's going to be a surrogate anyway, I feel like auto-generated numeric IDs would get us into less trouble down the road than "project/feature_set_name:42"
when versions are going away and projects may have changes coming…
/test test-end-to-end |
Removed the embedded ids in favor of surrogate ids. I've retained the child -> parent relationship because bidirectional relationships are more efficient than unidirectional ones. |
/test test-end-to-end-batch |
This looks good to me. @ches I'll leave this open until EOD or until you greenlight it, whichever comes first :) |
I'm still not crystal clear on what tradeoffs we've just made in the last commit. Could we update examples of the SQL schema and/or migration? Roughly speaking, I get that with the prior I'm not sure whether they need/warrant a table though (objectively neutral statement here pending pros and cons to weigh, not an "I'm not sure" to be read as "I don't think they do"). Or if there is a join table, should it also use surrogate rather than composite keys because of open questions for the future about reference components, in #479? Assuming we have simpler schema as per the most recent change, and assuming we need/want references to have first class types so we soon define canonical domain models for them without SQL backing, what have we lost or gained with the resulting design? Feel somewhat as if we exhausted @zhilingc into submission without hearing some valid considerations articulated. Sorry to drag out any further. This has begged for a whiteboard session. |
The join tables have been removed. The generated SQL for migration is following:
|
Disclaimer: My gripe is mostly on the join tables and their value in a one-to-many relationship, and not so much whether surrogates or composite keys are preferable. We don't need join tables to use composite keys, but I understand how they help when you want to propagate all the fields down to children. In favour of Surrogate/Hierarchical/Normalized approach
In favour of Composite/EmbeddedId approach
I take it you mean we could have FeatureRefs, FeatureSetRefs, and EntityRefs floating around without each one of them being persisted. Something like GetOnlineFeatures where no persistence layer is touched for these references, but they are still functionally used? If that is the case then it seems like we have three possible implementations areas:
It seems like one risk that you are flagging is a potential disconnect between (2) and (3) that may not exist if we followed the EmbeddedId approach where they could be the same thing? Whether or not we follow the EmbeddedId (FeatureRef, EntityRef) approach or not, we will still have to make separate classes for FeatureRef, and EntityRef. I like the idea of these classes being used for querying Features/Entities with Spring’s Specifications Interface. So we basically can get the benefits of composite keys without the horrible (in my opinion) data model, and we’d also not need to query for surrogates. This also keeps the FeatureRef and EntityRef pure (annotation free). Although the same won't apply to Features, Entities, Jobs, etc. I expect the domain model and persistence models to be the same classes. It's clear that there are unknowns here and so we should probably err on the side of simplicity and delaying unnecessary complexity. So I think the normalized/surrogate approach has the edge here, but perhaps I am missing some obvious advantage to going the composite/embedded approach. Useful links |
Always happy with good feedback, but we are under time pressure to cut this release in order to roll it out. We still need to get through version removal as well, which is based on this PR. Async communication seems to be working well here, but if you think jumping on a quick call to discuss would be better then we could do that as well. |
Thanks a lot for the detail @woop, that's the analysis I was after and the ERDs are essentially the whiteboard session I was looking for. I'm sold on the model without the join tables.
Yes, exactly, and that's precisely a case where looking them up from persistence would be unnecessary, and undesirable unless maybe for a SQL online store.
I hadn't thought very deeply about it yet, but no, so far I'm not worried that maintaining appropriate invariants in code will be a problem here. I suggest being conservative about committing to proto schemas until we're more certain about further outlook of #479, I think that's discussion for #674.
Indeed, this is part of what I had in mind in raising discussion of whether join models would have value or not—if they did, it would still likely be cumbersome because of how widely these are going to be needed, beyond SQL persistence contexts as above. |
/lgtm |
What this PR does / why we need it:
This is a split-off off #612 that introduces the model changes made in that PR in a more digestible chunk.
This PR includes:
SQL changes:
long
ids for feature sets, features and entitiesDoes this PR introduce a user-facing change?: