fix: Use ParquetDataset for Schema Inference (#2686)
Updates schema inference to use ParquetDataset instead of ParquetFile. This supports both single Parquet files and directories of partitioned Parquet datasets.

Signed-off-by: Dirk Van Bruggen <[email protected]>
dvanbrug authored May 13, 2022
1 parent 7c69f1c commit 4f85e3e
Showing 1 changed file with 3 additions and 3 deletions.
sdk/python/feast/infra/offline_stores/file_source.py
```diff
--- a/sdk/python/feast/infra/offline_stores/file_source.py
+++ b/sdk/python/feast/infra/offline_stores/file_source.py
@@ -3,7 +3,7 @@

 from pyarrow._fs import FileSystem
 from pyarrow._s3fs import S3FileSystem
-from pyarrow.parquet import ParquetFile
+from pyarrow.parquet import ParquetDataset

 from feast import type_map
 from feast.data_format import FileFormat, ParquetFormat
@@ -179,9 +179,9 @@ def get_table_column_names_and_types(
         filesystem, path = FileSource.create_filesystem_and_path(
             self.path, self.file_options.s3_endpoint_override
         )
-        schema = ParquetFile(
+        schema = ParquetDataset(
             path if filesystem is None else filesystem.open_input_file(path)
-        ).schema_arrow
+        ).schema.to_arrow_schema()
         return zip(schema.names, map(str, schema.types))

     @staticmethod
```
