Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote apply #4529

Open
dmartinol opened this issue Sep 17, 2024 · 6 comments · May be fixed by #4559
Open

Remote apply #4529

dmartinol opened this issue Sep 17, 2024 · 6 comments · May be fixed by #4559

Comments

@dmartinol
Copy link
Contributor

Expected Behavior

Running feast apply from a client connected using remote proxies should apply the python definitions.

Current Behavior

The apply fails. With a project using the postgres template and a deployment on kind using #4528 it throws the following error:

% feast feature-views list
NAME                         ENTITIES    TYPE
driver_hourly_stats_fresh    {'driver'}  FeatureView
driver_hourly_stats          {'driver'}  FeatureView
transformed_conv_rate_fresh  {'driver'}  OnDemandFeatureView
transformed_conv_rate        {'driver'}  OnDemandFeatureView

% feast apply
...
  File "/Users/dmartino/.pyenv/versions/3.11.9/bin/feast", line 8, in <module>
    sys.exit(cli())
             ^^^^^
...
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/repo_operations.py", line 355, in apply_total
    apply_total_with_repo_instance(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/repo_operations.py", line 313, in apply_total_with_repo_instance
    store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/feature_store.py", line 903, in apply
    self._make_inferences(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/feature_store.py", line 627, in _make_inferences
    update_feature_views_with_inferred_features_and_entities(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/inference.py", line 174, in update_feature_views_with_inferred_features_and_entities
    _infer_features_and_entities(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/inference.py", line 212, in _infer_features_and_entities
    table_column_names_and_types = fv.batch_source.get_table_column_names_and_types(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/infra/offline_stores/contrib/postgres_offline_store/postgres_source.py", line 113, in get_table_column_names_and_types
    with _get_conn(config.offline_store) as conn, conn.cursor() as cur:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/infra/utils/postgres/connection_utils.py", line 18, in _get_conn
    conninfo=_get_conninfo(config),
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/infra/utils/postgres/connection_utils.py", line 60, in _get_conninfo
    "user": config.user,
            ^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/pydantic/main.py", line 811, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'RemoteOfflineStoreConfig' object has no attribute 'user'

Reference to local feature_store.yaml:

project: sample
registry:
  path: localhost:8001
  registry_type: remote
offline_store:
  host: localhost
  port: 8002
  type: remote
online_store:
  path: http://localhost:8003
  type: remote
entity_key_serialization_version: 2
auth:
  type: no_auth

Forwarded ports for all remote services to local ports 8001-8003 (see working example of feast feature-views list).

Steps to reproduce

Follow steps of PR #4528 and then run feast apply from the client folder (after copying the example_repo.py)

@tokoko
Copy link
Collaborator

tokoko commented Sep 17, 2024

right, makes sense. this is similar to #4186. Basically, we need to move all DataSource methods that actually need to touch the datasets from DataSource to OfflineStore. I did this only for valiadate before, now we need to do the same for get_table_column_names_and_types.

@dmartinol
Copy link
Contributor Author

I see that validate_data_source is a static method in OfflineStore interface. I assume we want to make it an instance method and re-implement if in the OfflineServer class, right? together with the other new method get_table_column_names_and_types, of course.

This should not require a big effort, even if the Arrow Flight server was not exactly design to serve such purposes but rather suited for "efficient data transport". Shouldn't we make it a multi-protocol application instead and delegate some endpoints unrelated to data-transport to a traditional REST or grpc server?

@tokoko
Copy link
Collaborator

tokoko commented Sep 18, 2024

This should not require a big effort, even if the Arrow Flight server was not exactly design to serve such purposes but rather suited for "efficient data transport". Shouldn't we make it a multi-protocol application instead and delegate some endpoints unrelated to data-transport to a traditional REST or grpc server?

imho that would end up being more complicated than adding a couple of non-data "endpoints" to a flight server,

@dmartinol
Copy link
Contributor Author

I will start evaluating this one.

@tokoko
Copy link
Collaborator

tokoko commented Sep 18, 2024

I think we can start by requiring data source read permissions for both.

@dmartinol
Copy link
Contributor Author

dmartinol commented Sep 23, 2024

@tokoko While exploring this option I had one big doubt:

  • XYZDataSource.get_table_column_names_and_types must execute from an instance where the offline_store config is the actual data store, so it can run the metadata collection query like SELECT * FROM (SELECT * FROM feast_driver_hourly_stats) AS sub LIMIT 0. The Offline Server seems the perfect place to run this query, BUT:
  • This query MUST run only after the feature view table has been created, of course, otherwise it fails with:
psycopg.errors.UndefinedTable: relation "feast_driver_hourly_stats" does not exist
LINE 1: SELECT * FROM (SELECT * FROM feast_driver_hourly_stats) AS s...

At least, this is the psql implementation, but I had a look at other DS and I think they are also raising errors in case the "table" is not there at the time we execute the method. I debugged and the method is called before the tables being created, so it fails.

Am I missing something?

@dmartinol dmartinol linked a pull request Sep 23, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants