Dataframe interchange protocol #25

Closed
datapythonista opened this issue Aug 7, 2020 · 4 comments

@datapythonista
Member

datapythonista commented Aug 7, 2020

This issue supersedes #1 and #14. As agreed in the 6 Aug call, the first milestone in defining the dataframe API will be the part that covers data interchange. As a sample use case, Matplotlib should be able to receive dataframes from different implementations (e.g. pandas, vaex, modin, cudf).

This work was originally discussed in OSSData, and an initial draft was later proposed here: wesm/dataframe-protocol#1.

The topics to discuss and decide on are the following:

The procedure for including this part in the standard RFC will be as follows:

  1. Define the goals, requirements, target audience, scope and use cases, and include them in the RFC
  2. Discuss and build a standalone document specific to data interchange, based on the above topics
  3. Review internally, and post publicly for additional feedback
  4. Update the prototype with the agreed API
  5. Finalize and approve the API and the prototype, and add them to the RFC document
@jorisvandenbossche
Member

There is already quite some discussion about this at wesm/dataframe-protocol#1. Do we want to continue the discussion there, or rather here?
Is what is being discussed over there the basis of the proposal here, or are there clear aspects where we want to deviate?

@datapythonista
Member Author

Since we're focusing on the interchange protocol, we're using the discussion in Wes' PR as a starting point. I've been commenting on that PR for topics that were already discussed there, and I'm creating issues in this repo for topics that weren't.

@rgommers
Member

> Is what is being discussed over there the basis of the proposal here, or are there clear aspects where we want to deviate?

Probably not. And if there are any, they should be discussed on wesm/dataframe-protocol.

> There is already quite some discussion about this at wesm/dataframe-protocol#1. Do we want to continue the discussion there, or rather here?

I think the intent here is to go back and actually start by documenting what was missed in the whole Discourse discussion that started that prototype: namely purpose and scope, goals and non-goals, etc. That would have made that discussion a whole lot more productive, and the decisions on the prototype easier and more consistent.

The prototype itself seems great, and Wes and a few others had goals very much in line with the broader goals for the dataframe effort within this Consortium. From my perspective, that prototype stalled because of (a) a muddled discussion in which half the participants didn't understand the goals, and (b) a lack of funding.

So if we can fix those issues, hopefully we can bring __dataframe__ to life.

@rgommers
Member

This is a very old issue, so let's close it.

> So if we can fix those issues, hopefully we can bring __dataframe__ to life.

I think we're slowly getting there: there are 4 implementations in the wild now, and in principle the protocol seems to cover what scikit-learn needs.

Thanks all!


3 participants