Feast Roadmap for 0.11+ #1527

Closed
woop opened this issue Apr 30, 2021 · 19 comments

Comments

@woop
Member

woop commented Apr 30, 2021

Our current proposed roadmap for 0.11 and onward is as follows:

Backlog

  • On demand transformations
  • Data quality monitoring
  • Snowflake offline store support
  • Bigtable support for Feast
  • Add Push/Ingestion API support

Scheduled for development (next 3 months)

We're open to feedback. Either new roadmap items or reprioritization!

@woop changed the title from "Feast Roadmap" to "Feast Roadmap for 0.11+" on Apr 30, 2021
@YikSanChan
Contributor

YikSanChan commented May 6, 2021

@woop

Add Push/Ingestion API support

Is "ingestion" == "consuming some streams"?

@YikSanChan
Contributor

@woop Also I hope there will be Hive support for offline store

@woop
Member Author

woop commented May 6, 2021

Is "ingestion" == "consuming some streams"?

No. It's simply allowing teams to push events to the online store. We aren't starting consumption jobs, but in theory you could launch those jobs with a custom provider.
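To make the distinction concrete, here is a minimal sketch of the push model described above: the caller hands events to the online store directly and no consumption job is started. `InMemoryOnlineStore` and its `push`/`read` methods are hypothetical stand-ins invented for illustration, not part of Feast's API.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


class InMemoryOnlineStore:
    """Hypothetical stand-in for an online store; not a Feast class."""

    def __init__(self) -> None:
        # (feature_view, entity_key) -> (event_timestamp, feature values)
        self._rows: Dict[Tuple[str, str], Tuple[datetime, Dict[str, Any]]] = {}

    def push(self, feature_view: str, entity_key: str,
             event_timestamp: datetime, values: Dict[str, Any]) -> bool:
        """Write one event; reject it if a newer row is already stored."""
        key = (feature_view, entity_key)
        current = self._rows.get(key)
        if current is not None and current[0] > event_timestamp:
            return False  # stale event, keep the newer row
        self._rows[key] = (event_timestamp, dict(values))
        return True

    def read(self, feature_view: str, entity_key: str) -> Optional[Dict[str, Any]]:
        """Return the latest feature values for an entity, or None."""
        row = self._rows.get((feature_view, entity_key))
        return None if row is None else row[1]


# The producer pushes events itself; nothing here consumes a stream.
store = InMemoryOnlineStore()
newer = datetime(2021, 5, 6, 12, 0, tzinfo=timezone.utc)
older = datetime(2021, 5, 6, 11, 0, tzinfo=timezone.utc)
accepted = store.push("driver_stats", "driver_1001", newer, {"trips_today": 7})
rejected = store.push("driver_stats", "driver_1001", older, {"trips_today": 5})
```

Rejecting out-of-order events is a design choice made for this sketch; a real store might handle late events differently.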

@woop Also I hope there will be Hive support for offline store

@YikSanChan the current plan does not include developing Hive support. What if we work together to make it easy to add support for Hive? We can add a simple plugin interface and you can extend it to support Hive.
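As a rough illustration of what such a plugin interface could look like, the sketch below defines an abstract contract, a registry, and a Hive implementation selected by name. Every name here (`OfflineStorePlugin`, `register`, `load_offline_store`, `HiveOfflineStore`) is hypothetical and chosen for the example; it is not the interface Feast ended up shipping.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class OfflineStorePlugin(ABC):
    """Hypothetical plugin contract; not Feast's actual interface."""

    @abstractmethod
    def build_historical_query(self, table: str, features: List[str],
                               timestamp_column: str) -> str:
        """Return the SQL used to read feature rows from the store."""


# Maps a store name (as it might appear in config) to its plugin class.
_REGISTRY: Dict[str, Type[OfflineStorePlugin]] = {}


def register(name: str) -> Callable[[Type[OfflineStorePlugin]], Type[OfflineStorePlugin]]:
    """Class decorator that adds a plugin to the registry under `name`."""
    def decorator(cls: Type[OfflineStorePlugin]) -> Type[OfflineStorePlugin]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("hive")
class HiveOfflineStore(OfflineStorePlugin):
    def build_historical_query(self, table: str, features: List[str],
                               timestamp_column: str) -> str:
        columns = ", ".join([timestamp_column] + features)
        return f"SELECT {columns} FROM {table}"


def load_offline_store(name: str) -> OfflineStorePlugin:
    """Instantiate the plugin registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no offline store plugin registered as {name!r}") from None


query = load_offline_store("hive").build_historical_query(
    "driver_stats", ["conv_rate"], "event_timestamp"
)
```

With this shape, the core library only depends on the abstract class, and a third party can add a backend without changes to the core.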

@YikSanChan
Contributor

@woop

We can add a simple plugin interface and you can extend it to support Hive.

That sounds good! Will Hive support be added in a similar way to Dynamo / Redshift support, or not?

@jianshen92

Looking forward to Redshift support!

@cloudbow

cloudbow commented May 7, 2021

Looking forward to Redshift, Dynamo, and feature view support.

@oleg-savko

Looking forward to ClickHouse support!

@singh-b

singh-b commented Jun 9, 2021

Is there a plan to add AWS as a provider?

@woop
Member Author

woop commented Jun 18, 2021

Is there a plan to add AWS as a provider?

Yes, development is in progress.

@bennfocus
Contributor

@YikSanChan @woop Any progress re the Hive support? I'd like to talk/contribute to it as well.

@YikSanChan
Contributor

YikSanChan commented Jul 2, 2021

@YikSanChan @woop Any progress re the Hive support? I'd like to talk/contribute to it as well.

FYI I am not working on Hive support

@bennfocus
Contributor

FYI, I created a new issue for Hive support (#1686) and will start working on it soon.

@woop
Member Author

woop commented Jul 5, 2021

Thanks @Baineng

@rakshithvsk

rakshithvsk commented Jul 6, 2021

Hey, Hi @woop,

I have recently moved from "feast 0.9.3" to "feast 0.11.0" and have a few questions after using FeatureViews. In particular, I see a few gaps in FeatureView:

  1. As far as I can see, FeatureView revolves around the FeatureRepo directory. So with the introduction of FeatureView, are we planning to remove the dependency on Feast Core, Serving, Postgres and Feast Spark?

  2. Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable, and hence Feast 0.9, for online ingestion?

  3. Feast 0.9.3 had a client.ingest() API where data was written with datetime partitions, which helped speed up historical retrieval. But with Feast 0.11, in the case of the local provider, we need to deal with a single large file, which might not scale to larger datasets. In particular, if we need an on-prem deployment and I do not want to depend on GCP, this would be an issue.

I believe points 2 and 3 should definitely be addressed as part of the roadmap if we are going with FeatureView.

Thanks

Let me know your thoughts @woop

@rightx2
Contributor

rightx2 commented Jul 6, 2021

@rakshithvsk, @woop I am also interested in your Q1 and Q2. I'd like to add some points on Q2:

  1. If we should depend on FeatureTable, how can we materialize from a stream source to the online store? In the new version (feast >= v0.10), Client no longer has start_stream_to_online_ingestion().
  2. If I want to use FeatureView, is there no way to materialize from an online source to online storage?

Thanks

@woop
Member Author

woop commented Jul 9, 2021

Hey, Hi @woop,

I have recently moved from "feast 0.9.3" to "feast 0.11.0" and have a few questions after using FeatureViews. In particular, I see a few gaps in FeatureView:

  1. As far as I can see, FeatureView revolves around the FeatureRepo directory. So with the introduction of FeatureView, are we planning to remove the dependency on Feast Core, Serving, Postgres and Feast Spark?

Yes, we are removing those dependencies, but we are not precluding the use of Spark or of an API-centric registry. We just think that the base installation of Feast should be more lightweight. We are building extension points for Feast so that teams can plug in their own storage or compute systems.

  2. Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable, and hence Feast 0.9, for online ingestion?

Feast 0.9 has streaming ingestion. We don't have streaming support in 0.10+ yet, since we've removed the Spark dependency. Streaming jobs will be launched through the apply method.

  3. Feast 0.9.3 had a client.ingest() API where data was written with datetime partitions, which helped speed up historical retrieval. But with Feast 0.11, in the case of the local provider, we need to deal with a single large file, which might not scale to larger datasets. In particular, if we need an on-prem deployment and I do not want to depend on GCP, this would be an issue.

Our File sources are meant for convenience today, and won't scale to production loads. It's just using Pandas under the hood, not Spark. I don't think the key value here was ingest(), but more the compute layer that did the retrieval, right?
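For context on why datetime partitions help retrieval, here is a minimal sketch of partition pruning: given a time range, only the matching partition directories need to be read instead of one large file. The Hive-style `date=YYYY-MM-DD` layout and the `partitions_for_range` helper are assumptions made for illustration; they are not the layout or API used by Feast 0.9.

```python
from datetime import date, timedelta
from typing import List


def partitions_for_range(root: str, start: date, end: date) -> List[str]:
    """Return one partition path per day in [start, end], inclusive.

    Assumes a hypothetical layout of <root>/date=YYYY-MM-DD/.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        f"{root}/date={(start + timedelta(days=offset)).isoformat()}"
        for offset in range(days)
    ]


# A three-day retrieval touches three partitions, however large the table is.
paths = partitions_for_range(
    "features/driver_stats", date(2021, 7, 1), date(2021, 7, 3)
)
```

A data warehouse performs this kind of pruning internally, which is part of why the compute layer matters more than the ingestion call.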

One of our design goals is to double down on storage technologies that offload a lot of the complexity of reading, writing, and transforming data. We can't support all technologies. We'd rather support BigQuery, Redshift, and other data warehouses instead of having to rewrite the same queries as ETL pipelines in Spark. In your case it seems like Hive might be a good idea, but we don't support that today. It should be a pretty straightforward addition though.

I believe points 2 and 3 should definitely be addressed as part of the roadmap if we are going with FeatureView.

Thanks

Let me know your thoughts @woop

@woop
Member Author

woop commented Jul 9, 2021

@rakshithvsk, @woop I am also interested in your Q1 and Q2. I'd like to add some points on Q2:

  1. If we should depend on FeatureTable, how can we materialize from a stream source to the online store? In the new version (feast >= v0.10), Client no longer has start_stream_to_online_ingestion().

Streaming jobs can be launched by apply if you use a custom provider. Other than that you will need to wait for us to add streaming support.
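One way to picture this is a provider whose apply step starts a streaming job for every feature view that declares a stream source. The sketch below is purely illustrative: `FeatureViewSpec`, `StreamingProvider`, and the `launch_job` callback are hypothetical names, and Feast's real provider interface has different methods and signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FeatureViewSpec:
    """Hypothetical, simplified description of a feature view."""
    name: str
    stream_topic: Optional[str] = None  # None means batch-only


@dataclass
class StreamingProvider:
    """Hypothetical custom provider that launches jobs during apply."""
    launch_job: Callable[[str, str], str]  # (view name, topic) -> job id
    running_jobs: Dict[str, str] = field(default_factory=dict)

    def apply(self, views: List[FeatureViewSpec]) -> None:
        for view in views:
            # Skip batch-only views and views that already have a job,
            # so that repeated apply calls stay idempotent.
            if view.stream_topic is None or view.name in self.running_jobs:
                continue
            self.running_jobs[view.name] = self.launch_job(
                view.name, view.stream_topic
            )


launched: List[Tuple[str, str]] = []


def fake_launcher(view_name: str, topic: str) -> str:
    """Stand-in for submitting a real streaming job (e.g. to a cluster)."""
    launched.append((view_name, topic))
    return f"job-{view_name}"


provider = StreamingProvider(launch_job=fake_launcher)
views = [
    FeatureViewSpec("driver_stats", stream_topic="driver-events"),
    FeatureViewSpec("customer_profile"),
]
provider.apply(views)
provider.apply(views)  # second call launches nothing new
```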

  2. If I want to use FeatureView, is there no way to materialize from an online source to online storage?

Materialize()

Thanks

@rightx2
Contributor

rightx2 commented Jul 12, 2021

Materialize()

Isn't that for loading data from an 'offline' source into 'online' storage? I was asking about going from an 'online (streaming)' source to 'online' storage... But according to your first answer, I guess I need to wait for you guys to add streaming support.
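For readers unfamiliar with the term, offline-to-online materialization can be sketched as: scan batch rows within a time window and keep the latest row per entity in a key-value store. The toy function below is an assumption-laden illustration, not Feast's implementation; `materialize_window`, the row shape, and the inclusive window bounds are all hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Tuple

# (entity_key, event_timestamp, feature values); hypothetical row shape.
Row = Tuple[str, datetime, Dict[str, Any]]
OnlineStore = Dict[str, Tuple[datetime, Dict[str, Any]]]


def materialize_window(offline_rows: Iterable[Row], start: datetime,
                       end: datetime, online: OnlineStore) -> int:
    """Copy the latest in-window row per entity into `online`.

    Returns the number of upserts performed. The window is treated as
    inclusive on both ends, which is an assumption of this sketch.
    """
    upserts = 0
    for entity_key, event_timestamp, features in offline_rows:
        if not (start <= event_timestamp <= end):
            continue  # outside the requested window
        current = online.get(entity_key)
        if current is None or event_timestamp >= current[0]:
            online[entity_key] = (event_timestamp, dict(features))
            upserts += 1
    return upserts


rows = [
    ("driver_1001", datetime(2021, 7, 1), {"conv_rate": 0.1}),
    ("driver_1001", datetime(2021, 7, 3), {"conv_rate": 0.3}),
    ("driver_1001", datetime(2021, 7, 2), {"conv_rate": 0.2}),
    ("driver_1002", datetime(2021, 7, 9), {"conv_rate": 0.9}),
]
online_store: OnlineStore = {}
upsert_count = materialize_window(
    rows, datetime(2021, 7, 1), datetime(2021, 7, 5), online_store
)
```

Note that the input here is a finite batch of rows, which is exactly why this path does not cover a streaming source.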

@woop
Member Author

woop commented Jul 14, 2021

Materialize()

Isn't that for loading data from an 'offline' source into 'online' storage? I was asking about going from an 'online (streaming)' source to 'online' storage... But according to your first answer, I guess I need to wait for you guys to add streaming support.

Ah, I see. I was thrown off by "online source". Yeah, there is no solution right now.

@adchia closed this as completed on Nov 22, 2021