Ability to provide a custom `feature_store.yaml` during CLI operations #1556
Comments
Hey @MattDelac Do you think #1509 (with separate repositories) would address this problem?
I don't think so, as you stay in the same repository. It's just that we would like the flexibility to have multiple configurations.
For example, this would be useful to let the user confirm that their new FeatureView is properly applied by doing ….
@MattDelac Some options that I can imagine:

**Option 1: One repo, one config.** This is what we have today. The idea is that the … but I think it's possible to do … so your prod and staging would pull definitions from ….

**Option 2: One repo, many configs.** Alternatively, we could make it possible to specify a remote configuration file. My main concern with that is that it could be unintuitive how it would function. Would we still consider it to be the root of a feature repo? When I see a command like ….
We tried to organize our code with …. I may not have a good understanding of the problem statement. What benefit does one repo with multiple feature store definitions give us if we can structure our repo to match GCP projects?
Same thing on our side! We basically have ….

Then once we merge a new PR, our CD tool is going to spin up two jobs that basically do ….

That's where I should be able to skip copying the files and directly do ….

Also, to give you more details, in our code we change the GCP project of our …. We have something like

```python
table_ref: str = f"{get_bigquery_project()}.{BIGQUERY_SCHEMA}.{entity}_{feature_view}"
```

So the two registries (prod & dev) do not contain exactly the same information (as the …).
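The environment-driven parameterization described above can be sketched as follows. This is a minimal illustration only: `get_bigquery_project`, the `GCP_PROJECT` environment variable, and the `feast_offline` dataset name are assumptions, not Feast APIs or names from the thread.

```python
import os

BIGQUERY_SCHEMA = "feast_offline"  # hypothetical dataset name


def get_bigquery_project() -> str:
    # Hypothetical helper: pick the GCP project from the environment,
    # so the same repo code targets prod or dev depending on which CD job runs it.
    return os.environ.get("GCP_PROJECT", "my-dev-project")


def build_table_ref(entity: str, feature_view: str) -> str:
    # Mirrors the table_ref construction quoted in the comment above.
    return f"{get_bigquery_project()}.{BIGQUERY_SCHEMA}.{entity}_{feature_view}"
```

With this shape, the prod and dev CD jobs only differ in the environment they export, which is why the resulting registries diverge.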
So more tangibly @MattDelac, are you suggesting that all parameterization should happen in the …? I'm just trying to figure out what the most natural approach is here for users.
Yes
**Digression warning**

One of the things I have been thinking about a lot is the philosophy behind Black. The idea is basically that we should stop thinking about formatting and just let a tool handle it. The reason I think this may apply to Feast is that we could also let Feast take a more opinionated approach to managing a feature repository.

Let's take feature inference, for instance. Today, you have something like

```python
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    input=driver_hourly_stats
)
```

after which you should run `feast apply`, which infers features and adds them to the registry. The repo itself is generalized and lightweight. At first glance this sounds great, but I have been thinking about whether this is actually a good practice. How does a user constrain the schema of a feature view? They should add specific features to the …. Imagine instead a command like `feast discover`, which infers schemas for defined feature views and updates them in the repository, like

```python
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    input=driver_hourly_stats
)
```

A benefit of this approach is that we can version control all schema changes in git, and we have a consistent way to define features (all of it is in the repo, as opposed to some in the repo and some inferred).

How does this relate to this particular issue? Well, if we have a single repo then the user probably has conditional logic within their FeatureView, meaning Feast will probably have trouble updating/adding the FeatureView in the repo. Also, if we go the single-repo (or folder) route, then it's not possible to easily ….

Don't feel too strongly, but just some things on my mind.
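To make the discover-style idea above concrete, here is a hypothetical sketch of the kind of schema inference such a command would perform. None of these names are real Feast APIs; the type mapping and the `infer_features` helper are illustrative assumptions only.

```python
# Illustrative mapping from simple Python types to Feast-like ValueType names.
TYPE_TO_VALUE_TYPE = {
    int: "INT64",
    float: "FLOAT",
    str: "STRING",
    bool: "BOOL",
}


def infer_features(sample_row: dict, entity_columns: set) -> list:
    """Infer (feature_name, value_type) pairs from one sample source row,
    skipping entity/join-key columns."""
    features = []
    for column, value in sample_row.items():
        if column in entity_columns:
            continue
        value_type = TYPE_TO_VALUE_TYPE.get(type(value), "UNKNOWN")
        features.append((column, value_type))
    return features


row = {"driver_id": 1001, "conv_rate": 0.42, "acc_rate": 0.9, "avg_daily_trips": 14}
print(infer_features(row, {"driver_id"}))
# → [('conv_rate', 'FLOAT'), ('acc_rate', 'FLOAT'), ('avg_daily_trips', 'INT64')]
```

A discover-style command would then write the inferred list back into the FeatureView definition in the repo, so the schema lives in version control rather than only in the registry.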
I am clearly not against a more opinionated approach. It might be hard, though, as Feast is trying to be a tool which lets users connect an OfflineStore & OnlineStore (through a Provider).
Oh, I see what you mean here, and it might be a good approach. The problem (at least for us) is that our FS repo is also our source of truth about which features are published and which are not. Moreover, we add extra information like ….
I don't know if we could easily infer the description of the Features with OfflineStores other than BigQuery (e.g. Presto). Even if every OfflineStore supported it, it would mean that it's the responsibility of the upstream pipeline to properly document a FeatureView. This would be harder to enforce, as we would need to build this logic into all of our upstream tools.
I mean, it depends how we can save metadata. It sounds like adding tags to FeatureViews gives a lot of flexibility to the user. This gives them the creativity to "tweak" Feast to make it work in a specific environment (each company is different). Keeping track of those tags (or another form of metadata) should be trivial, I believe, and is key.
I don't understand what you mean here
Same on my side. I really enjoy this chat as it helps me think out of the box 🙂
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closed by #3077
**Is your feature request related to a problem? Please describe.**
We often want to run `feast apply` (or other CLI operations) on different GCP projects. Therefore it would be nice if we could point to a specific `feature_store.yaml` when we use the CLI.

**Describe the solution you'd like**
Something easy like `feast apply --conf feature_store_prod.yaml`. By default `--conf` would be `feature_store.yaml`.

**Describe alternatives you've considered**
Copying a specific yaml to `feature_store.yaml` when we need to perform CLI operations on different environments.

**Additional context**
Add any other context or screenshots about the feature request here.
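A minimal sketch of how the requested `--conf` flag could behave, using Python's standard `argparse`. This is purely illustrative of the proposal in this issue (the flag name `--conf` and its default come from the request above; this is not Feast's actual CLI implementation, which uses Click):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI wiring for the proposal: an optional --conf flag
    # that falls back to the conventional feature_store.yaml.
    parser = argparse.ArgumentParser(prog="feast")
    parser.add_argument(
        "--conf",
        default="feature_store.yaml",
        help="Path to the feature store configuration file.",
    )
    return parser


args = build_parser().parse_args(["--conf", "feature_store_prod.yaml"])
print(args.conf)
# → feature_store_prod.yaml
```

With a default in place, existing invocations like plain `feast apply` would keep working unchanged, while CD jobs could point at per-environment config files without copying them over `feature_store.yaml` first.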