Improvements to user guide for DataFrames #11388

Open
Tracked by #7013
alamb opened this issue Jul 10, 2024 · 0 comments

alamb commented Jul 10, 2024

First of all, I'm not sure we need the distinction between "user guide" and "library user guide" when it comes to data frames. The only way you can use a data frame is if you are using it as a library, isn't it? I'm unsure why I should be reading one section rather than the other.

Second, I think you lose a lot of context by removing the table. The SessionContext and DataFrame structs both expose large API surfaces. I think they become much easier to digest once you understand that there is actually a fairly small number of categories of things being exposed. However, the API documentation doesn't provide any way of seeing this structure. Ideally, there would be some way to tag the methods into different categories.

But I think the important part is simply to note that there are transformations, methods that execute the frame, and administrative methods. I might further break down the methods that execute the frame into those that return a new frame in some way and those that write to a data sink. That is, I'm not sure it's necessary to list every method in each of these categories, but it is helpful to identify the categories. That being said, I think a table, perhaps more granular, with links to the API documentation for each method and possibly even links to the SQL equivalent where appropriate, would be a good long-term goal. Is there some tooling or macros we could build to support this in a sustainable way?
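As a rough illustration of the tagging idea, here is a minimal, standard-library-only sketch that groups method names by category and renders a Markdown summary table. Everything in it is hypothetical: the `Category` enum, the `TAGGED_METHODS` list, and both helper functions are made up for this example, and the assignment of each method name to a category is an assumption rather than anything the DataFusion project has defined. The "returns results" label is one reading of the execution breakdown suggested above.

```rust
use std::collections::BTreeMap;

/// Categories taken from the comment above. Variant order controls row order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Category {
    Transformation,
    ExecuteReturnResults,
    ExecuteWriteToSink,
    Administrative,
}

impl Category {
    /// Human-readable label used in the first table column.
    fn label(self) -> &'static str {
        match self {
            Category::Transformation => "Transformation",
            Category::ExecuteReturnResults => "Execution (returns results)",
            Category::ExecuteWriteToSink => "Execution (writes to a sink)",
            Category::Administrative => "Administrative",
        }
    }
}

/// Hypothetical hand-maintained tags: (method name, category).
/// The category chosen for each name is an assumption for illustration.
const TAGGED_METHODS: &[(&str, Category)] = &[
    ("select", Category::Transformation),
    ("filter", Category::Transformation),
    ("aggregate", Category::Transformation),
    ("join", Category::Transformation),
    ("collect", Category::ExecuteReturnResults),
    ("show", Category::ExecuteReturnResults),
    ("write_csv", Category::ExecuteWriteToSink),
    ("write_parquet", Category::ExecuteWriteToSink),
    ("write_json", Category::ExecuteWriteToSink),
    ("schema", Category::Administrative),
];

/// Group method names by category, preserving the order in which they were tagged.
fn group_by_category(
    tags: &[(&'static str, Category)],
) -> BTreeMap<Category, Vec<&'static str>> {
    let mut groups: BTreeMap<Category, Vec<&'static str>> = BTreeMap::new();
    for &(method, category) in tags {
        groups.entry(category).or_default().push(method);
    }
    groups
}

/// Render one Markdown table row per category.
fn summary_table(tags: &[(&'static str, Category)]) -> String {
    let mut out = String::from("| Category | Methods |\n| --- | --- |\n");
    for (category, methods) in group_by_category(tags) {
        let cells: Vec<String> = methods.iter().map(|m| format!("`{}`", m)).collect();
        out.push_str(&format!("| {} | {} |\n", category.label(), cells.join(", ")));
    }
    out
}
```

Calling `summary_table(TAGGED_METHODS)` yields a four-row table, one row per category. A real version would presumably derive the tags from doc attributes or a macro rather than a hand-written list, and would emit links to the API documentation instead of bare names.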

Also, is it the case that I can only create a data frame via SessionContext? The word "typically" in the introduction suggests there are other ways of doing it. I wonder if it would be better to be more precise and just enumerate the different ways you can create a data frame. I think it's something like: read from a file, read from a table (which really covers a lot of possibilities), or execute SQL statements.

So, to make this actionable within the context of this PR, perhaps reduce the tables to more of a summary? I'm also curious to hear from others.

Finally, not for this PR, I wonder if SessionContext warrants its own section. As with DataFrame, I think it would benefit from a discussion of the different categories of things it can be used for. Relatedly, it's becoming clear to me from poking around the documentation and methods that there is a great deal of flexibility in mixing and matching SQL and data frames if you want to, but I'm not sure that's coming across in the guides. When I have time I can try drafting something to see how it might fit.

Originally posted by @efredine in #11324 (comment)
