Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mechanism for verifying that source code in documentation is valid #7956

Closed
wants to merge 28 commits into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Oct 27, 2023

Which issue does this PR close?

Closes #7951

Rationale for this change

Adds a mechanism for ensuring that the documentation contains valid source code. See docs README in this PR for a description of the mechanism.

What changes are included in this PR?

  • Moves some source code from markdown to unit tests
  • Adds include directives in the markdown
  • Adds a Python script to merge the markdown and source code when building the docs

This resulted in some small improvements to the documentation for the two pages that are updated to use this mechanism:

  • The udf docs no longer use let mut when creating contexts, and no longer use unwrap
  • The code for building logical plans is now tested to make sure it produces the expected plan
  • The published code is now formatted by cargo fmt

We can follow up with more PRs to move more code over. I suggest we create one PR per page to keep reviews short.

Are these changes tested?

Yes! Some code that was previously untested is now tested.

Are there any user-facing changes?

No

@github-actions github-actions bot added documentation Improvements or additions to documentation development-process Related to development process of DataFusion core Core DataFusion crate labels Oct 27, 2023
@andygrove andygrove added the devrel Issues related to making it easier to use DataFusion by developers label Oct 27, 2023
@github-actions github-actions bot removed development-process Related to development process of DataFusion core Core DataFusion crate labels Oct 28, 2023
@andygrove andygrove marked this pull request as ready for review October 28, 2023 16:44
@andygrove andygrove changed the title Verify that source code in documentation is valid Add mechanism for verifying that source code in documentation is valid Oct 28, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @andygrove -- this is a great step forward

I ran build.sh locally and it looks great to me

Screenshot 2023-10-29 at 8 06 09 AM

I have two suggestions:

  1. Add an error if there is an include that refers to a non existent example
  2. Inline the examples in the doc and extract them for testing (though I don't feel strongly about this)

@@ -53,6 +60,25 @@ To make changes to the docs, simply make a Pull Request with your
proposed changes as normal. When the PR is merged the docs will be
automatically updated.

## Including Source Code

We want to make sure that all source code in the documentation is tested as part of the build and release process. We
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you considered the opposite:

Leaving the code example inlined in the documentation, and extracting it out to datafusion-docs-test with a script.

This would have the benefit that the markdown would be closer to the actual output documentation, and it might be easier to work on the examples inline.

The downside is that then we would need to somehow run a script to copy the examples into datafusion-docs-tests as part of CI to make sure they were up to date

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is more challenging because sometimes we are copying a whole function, and sometimes just a few lines of code, and some snippets depend on other snippets. It is also hard to write and test code with this approach because there is no IDE support.

@@ -21,8 +21,14 @@
set -e
rm -rf build 2> /dev/null
rm -rf temp 2> /dev/null

# copy source to temp dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a build.bat file in this directory that might need to get updated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I had missed that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a convenient way to test on Windows, unfortunately

// print the plan
println!("{}", plan.display_indent_schema());
```
<!-- include: library_logical_plan::create_plan -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested what happens if there is a bad link to an example here:

$ git diff
diff --git a/docs/source/library-user-guide/building-logical-plans.md b/docs/source/library-user-guide/building-logical-plans.md
index b75a788e83..6ae376d445 100644
--- a/docs/source/library-user-guide/building-logical-plans.md
+++ b/docs/source/library-user-guide/building-logical-plans.md
@@ -36,7 +36,7 @@ much easier to use the [LogicalPlanBuilder], which is described in the next sect

 Here is an example of building a logical plan directly:

-<!-- include: library_logical_plan::create_plan -->
+<!-- include: library_logical_plan::create_planXXXXX -->

 This example produces the following plan:

It seems to just leave a blank in the docs

Screenshot 2023-10-29 at 8 10 17 AM

Is there any way to generate an error in this case instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add some error checks.

Copy link
Contributor

@alamb alamb Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this again and now the script simply doesn't change the section, which I think is fine

@andygrove
Copy link
Member Author

@alamb I have addressed the feedback. PTAL when you have time.

@alamb
Copy link
Contributor

alamb commented Nov 7, 2023

I took the liberty of merging this branch to main.

While I was testing it it turns out there are a bunch of clippy errors in the library code now such as

error: unused imports: `SessionContext`, `create_udf`
  --> docs/src/library_udfs.rs:24:27
   |
24 | use datafusion::prelude::{create_udf, SessionContext};
   |                           ^^^^^^^^^^  ^^^^^^^^^^^^^^


error: this import is redundant
  --> docs/src/library_udfs.rs:26:1
   |
26 | use tokio;
   | ^^^^^^^^^^ help: remove it entirely
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_component_path_imports
   = note: `-D clippy::single-component-path-imports` implied by `-D warnings`


error: function `add_one` is never used
  --> docs/src/library_udfs.rs:29:4
   |
29 | fn add_one(args: &[ArrayRef]) -> Result<ArrayRef> {
   |    ^^^^^^^
   |
   = note: `-D dead-code` implied by `-D warnings`


However, this PR seems not to trigger the full CI checks (probably because it only runs the docs workflow): https://github.com/apache/arrow-datafusion/blob/main/.github/workflows/docs_pr.yaml

Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Apr 23, 2024
@github-actions github-actions bot closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devrel Issues related to making it easier to use DataFusion by developers documentation Improvements or additions to documentation Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add mechanism to verify that source code in documentation is valid
2 participants