Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove file_type() from FileFormat #10499

Merged
merged 1 commit into from
May 15, 2024
Merged

Conversation

Jefffrey
Copy link
Contributor

Which issue does this PR close?

Closes #8657

Rationale for this change

Was implementing rudimentary ORC integration for DataFusion:

https://github.com/datafusion-contrib/datafusion-orc/blob/main/src/datafusion/file_format.rs

Noticed that to implement FileFormat it requires a file_type(), even though this is not really used anywhere. Removing this can make it clearer for those wanting to extend with more custom file formats.

What changes are included in this PR?

Remove FileFormat::file_type() which was unused.

Are these changes tested?

Are there any user-facing changes?

Yes, function removed from trait

@github-actions github-actions bot added the core Core DataFusion crate label May 14, 2024
@Jefffrey
Copy link
Contributor Author

There are some related issues that I think might be good to close now? #8345 and #8637

I was able to implement basic read support for ORC files via ListingTable provider:

https://github.com/datafusion-contrib/datafusion-orc/blob/main/src/datafusion/mod.rs

Which might be enough to demonstrate those issues are done?

@alamb alamb added the api change Changes the API exposed to users of the crate label May 14, 2024
@alamb
Copy link
Contributor

alamb commented May 14, 2024

Makes sense to me -- thank you @Jefffrey

As you go through implementing ORC support, if you hit anything else that woudl make it easier to add new format support to the core and/or listing table that would be great.

@alamb
Copy link
Contributor

alamb commented May 14, 2024

strictly speaking this is an API change but I think it makes it easier to implement new formats 🚀

@alamb
Copy link
Contributor

alamb commented May 14, 2024

There are some related issues that I think might be good to close now? #8345 and #8637

I was able to implement basic read support for ORC files via ListingTable provider:

https://github.com/datafusion-contrib/datafusion-orc/blob/main/src/datafusion/mod.rs

Which might be enough to demonstrate those issues are done?

Indeed -- it sounds reasonable

In my mind, the ideal example is to make an example in datafusion-examples showing how to create a custom file format

I wonder if we could create a custom CSV reader that ignores comments 🤔 -- for example #10262 that @bbannier has been working on

@Jefffrey
Copy link
Contributor Author

As you go through implementing ORC support, if you hit anything else that woudl make it easier to add new format support to the core and/or listing table that would be great.

Will definitely keep this in mind 👍

@alamb alamb merged commit 33aea7e into apache:main May 15, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented May 15, 2024

Thanks again @Jefffrey

@Jefffrey Jefffrey deleted the remove_file_type branch May 16, 2024 18:47
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make/Remove FileType enum and replace with a trait
2 participants