Clean up Feast configuration #525

Closed
woop opened this issue Mar 9, 2020 · 8 comments · Fixed by #611

Comments

@woop
Member

woop commented Mar 9, 2020

Creating this card to clean up the Feast Core and Feast Serving configuration files. The configuration files have the following problems:

  • No clear separation of static Spring or JVM configuration from user level configuration
  • Duplicate configuration points for the same thing (BigQuery, or Dataflow configuration)
  • Lack of documentation of the options that are configurable.
  • Lack of documentation on where new configuration should be placed.
  • [EDIT] Adding more scope to this refactor. Configuration should also be easy to define and pass to stores. Validation should happen at the storage layer, not in Feast Core or Feast Serving. This change should make it easier to support stores as interfaces, possibly with factories building the stores/retrievers based on configuration.

This issue tracks the refactor, cleanup, and documentation to solve the above.
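
As a rough illustration of the last bullet, the sketch below shows one way a store could validate its own configuration while a factory builds stores from a type name and a key/value map. Everything here is hypothetical: the `Store` interface, `BigQueryStore`, `StoreFactory`, and the `project_id`/`dataset_id` keys are assumptions made for the example, not Feast's actual classes or configuration keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical store interface; not Feast's actual API.
interface Store {
    String type();
}

// Hypothetical store that validates its own configuration when constructed,
// so Feast Core / Feast Serving do not need to know which keys it requires.
final class BigQueryStore implements Store {
    private final String projectId;
    private final String datasetId;

    BigQueryStore(Map<String, String> config) {
        // Key names are assumptions made for this sketch.
        this.projectId = require(config, "project_id");
        this.datasetId = require(config, "dataset_id");
    }

    private static String require(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "BigQuery store config is missing '" + key + "'");
        }
        return value;
    }

    @Override
    public String type() {
        return "BIGQUERY";
    }

    String table(String name) {
        return projectId + "." + datasetId + "." + name;
    }
}

// Hypothetical factory: maps a store type name to a builder function.
final class StoreFactory {
    private final Map<String, Function<Map<String, String>, Store>> builders =
        new HashMap<>();

    void register(String type, Function<Map<String, String>, Store> builder) {
        builders.put(type, builder);
    }

    Store create(String type, Map<String, String> config) {
        Function<Map<String, String>, Store> builder = builders.get(type);
        if (builder == null) {
            throw new IllegalArgumentException("Unknown store type: " + type);
        }
        // Validation happens inside the store's constructor, not here.
        return builder.apply(config);
    }
}
```

With this shape, supporting a new store would mean registering one more builder (for example `factory.register("BIGQUERY", BigQueryStore::new)`), and a bad configuration fails inside the store that owns the keys.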

@mrzzy
Collaborator

mrzzy commented Mar 26, 2020

Not sure if this should be a separate issue, but this is definitely a pain point. There is almost no documentation on feast.dev about configuring and using the Dataflow runner, except that it is faster and more reliable than DirectRunner.

Getting Dataflow to work:

  • Had to reverse engineer the config file from core/src/main/java/feast/core/config/JobConfig.java. Also had to make sure to set a region from the Dataflow regional endpoints.
  • Found out from the Google docs that I needed to supply service account credentials via GOOGLE_APPLICATION_CREDENTIALS.
  • Figured out from the logs that the service account needed Storage Bucket permissions in addition to Dataflow permissions.
  • Also needed to enable the Dataflow API and the Cloud Resource Manager API.
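
Some of the steps above could be caught before a job is submitted. The following is a minimal sketch of a preflight check, assuming the runner options arrive as a plain key/value map. The `DataflowPreflight` class and the `region` option key are hypothetical names chosen for the example; only `GOOGLE_APPLICATION_CREDENTIALS` comes from the list above. IAM permissions and enabled APIs cannot be verified locally this way, so they are not checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Feast.
final class DataflowPreflight {

    // Returns a human-readable list of problems; empty means nothing was found.
    static List<String> check(Map<String, String> env,
                              Map<String, String> runnerOptions) {
        List<String> problems = new ArrayList<>();

        // Service account credentials are picked up from this variable.
        if (isBlank(env.get("GOOGLE_APPLICATION_CREDENTIALS"))) {
            problems.add("GOOGLE_APPLICATION_CREDENTIALS is not set");
        }

        // "region" is an assumed option key for this sketch.
        if (isBlank(runnerOptions.get("region"))) {
            problems.add("Dataflow region is not set in the runner options");
        }

        // Not checked here: Storage Bucket and Dataflow permissions on the
        // service account, and whether the Dataflow and Cloud Resource
        // Manager APIs are enabled for the project.
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

A caller could run `DataflowPreflight.check(System.getenv(), options)` at startup and log each returned problem rather than leaving users to find them one at a time in the job logs.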

@woop
Member Author

woop commented Mar 28, 2020

Yea, that is definitely a major pain point. Thanks for pointing it out and being so detailed.

woop self-assigned this Mar 28, 2020
@woop
Member Author

woop commented Mar 28, 2020

Throwing this into the ring as well. The following BQ configuration parameters must be set for BQ retrieval to work.

bigquery-initial-retry-delay-secs: 
bigquery-total-timeout-secs: 21600

@woop
Member Author

woop commented Mar 30, 2020

Two more notes.

  • We can generate configuration metadata from @ConfigurationProperties annotated classes (like FeastProperties). This produces a JSON file with keys/descriptions/defaults for all Feast keys in application.yml. We could easily convert that into markdown and publish it to docs.feast.dev.
  • We should ensure that our JavaDocs are up to date. They are currently not being generated.
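
To make the first note concrete, here is a minimal sketch of the JSON-to-markdown step. It assumes the metadata file (Spring Boot's annotation processor writes it to META-INF/spring-configuration-metadata.json) has already been parsed into simple objects by some JSON library. The `ConfigProperty` and `ConfigDocs` classes are hypothetical names for the example; the `name`, `description`, and `defaultValue` fields mirror the attributes Spring Boot emits for each property.

```java
import java.util.List;

// Hypothetical holder for one entry of the metadata file's "properties" array.
final class ConfigProperty {
    final String name;
    final String description;   // may be null if the field has no Javadoc
    final String defaultValue;  // may be null if no default is declared

    ConfigProperty(String name, String description, String defaultValue) {
        this.name = name;
        this.description = description;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical renderer; not part of Feast or Spring Boot.
final class ConfigDocs {

    // Renders the properties as a markdown table, one row per key.
    static String toMarkdown(List<ConfigProperty> properties) {
        StringBuilder sb = new StringBuilder();
        sb.append("| Key | Description | Default |\n");
        sb.append("| --- | --- | --- |\n");
        for (ConfigProperty p : properties) {
            sb.append("| `").append(p.name).append("` | ")
              .append(cell(p.description)).append(" | ")
              .append(cell(p.defaultValue)).append(" |\n");
        }
        return sb.toString();
    }

    // Keeps a value on one table row: nulls become empty cells, pipes are
    // escaped, and line breaks are flattened to spaces.
    private static String cell(String value) {
        if (value == null) {
            return "";
        }
        return value.replace("|", "\\|").replace("\n", " ");
    }
}
```

Because descriptions come from the Javadoc on each property field, a generated table like this would only be as good as the JavaDocs mentioned in the second note.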

@ches
Member

ches commented Apr 4, 2020

The account which someone apparently created to catch accidental mentions like that one is my favorite thing on the Internet today.

Returning to my seat in the peanut gallery now.

@woop
Member Author

woop commented Apr 4, 2020

> The account which someone apparently created to catch accidental mentions like that one is my favorite thing on the Internet today.
>
> Returning to my seat in the peanut gallery now.

Just noticed it, hahahahahahha

@woop
Member Author

woop commented Apr 16, 2020

Keeping this open for the time being until #533 is done.

@woop
Member Author

woop commented May 13, 2020

I believe this issue is done enough. Let's open a new one if needed.

woop closed this as completed May 13, 2020