best practice for integrating pandas-generated tables into manubot docs #494

Open
cmungall opened this issue May 25, 2023 · 5 comments

@cmungall

[feel free to close if this is too open-ended]

I like to have a Jupyter notebook that accompanies every paper I write, and I am always trying to better automate syncing between artefacts generated by the notebook and the paper; in particular data tables.

I like how manubot allows me to use markdown tables, as these are easy to auto-generate. It seems I should read up on the pandoc docs on the different table options (I usually just use the basic flavor of markdown).

It looks like there are possibilities to do things like auto-include tables that are generated outside:

But I feel I'm not enough of a manubot wizard to understand what's happening here. The {{ makes it look like jinja templates are being applied? Where do the variables come from? Apologies if I am missing a guide somewhere...

@agitter
Member

agitter commented May 26, 2023

Open-ended issues are welcome and helpful for other users. This is also a fairly custom use case, so we don't have a guide anywhere. I can try to help you set up something that works for your project.

Looking back at the example in #461, you're right that jinja templates are being used. That example manuscript is a very complex project that runs scripts in GitHub Actions and stores a lot of data in JSON files on a separate branch, including these Markdown tables. Check out line 28 of owiddata/owiddata-stats.json in the pull request that added support for those tables to see an example: https://github.com/greenelab/covid19-review/pull/1104/files#diff-2978568b038ee194710db4ab79813d6dcd7e6647dda2b1c71cfe38558dfddd7c

That JSON file and all the variables within are then made accessible to jinja by modifying the Manubot build script and setting the --template-variables-path argument: https://github.com/greenelab/covid19-review/blob/e60f9dbb029ae8708655e748a202b8574454b14a/build/build.sh#L47

Do you have your Jupyter notebook in the same repository as your Manubot manuscripts? If so, you should be able to set up a workflow that roughly:

  • has the notebook export dataframes as Markdown tables and save them in a JSON file, as suggested in a comment on #461 (import table file)
  • provides that JSON file to manubot process in the build script using --template-variables-path
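
The first of the two steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not code from either project: the helper names `rows_to_markdown` and `export_tables`, the variable name `my_table`, and the output file name are all hypothetical. In a notebook, the Markdown string could instead come from pandas `DataFrame.to_markdown()` (which needs the optional `tabulate` package) or from a custom converter.

```python
import json
from pathlib import Path


def rows_to_markdown(rows):
    """Render a non-empty list of dicts (all with the same keys) as a pipe table."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def export_tables(tables, path):
    """Write {variable_name: markdown_table} to a JSON file and return the mapping.

    `tables` maps a (hypothetical) template variable name to a list of row dicts.
    """
    payload = {name: rows_to_markdown(rows) for name, rows in tables.items()}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


if __name__ == "__main__":
    # Hypothetical data and file name, for illustration only.
    rows = [{"gene": "A", "score": 1}, {"gene": "B", "score": 2}]
    export_tables({"my_table": rows}, "tables.json")
```

The resulting JSON file is what would be handed to manubot process through --template-variables-path in the build script, so that each key becomes available to the jinja templates in the manuscript.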

The first step would be to get it working once. Then we could think about how to automate syncing by exporting the Markdown tables from the notebook on every manuscript build, a schedule, every commit, etc.

@cmungall
Author

cmungall commented Jun 2, 2023

I ended up writing my own dataframe-to-markdown converter (unfortunately pandas to_markdown doesn't support styling, for things like highlighting the max value in a column). My notebook exports this to the ./content/ folder.

I feel I should just be able to

{% include 'my_table.md' %}

but this always results in:

jinja2.exceptions.TemplateNotFound: my_table.md

I will try putting the markdown in the json and rendering this, but it feels a little contorted...
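
For reference, the JSON route amounts to something like the sketch below. It only imitates the idea of rendering manuscript text with variables loaded from a JSON document; it is not Manubot's actual code, and the function name `render_manuscript` and the variable name `my_table` are hypothetical.

```python
import json

import jinja2


def render_manuscript(text, variables_json):
    """Render manuscript text, exposing the top-level JSON keys as jinja variables."""
    variables = json.loads(variables_json)
    env = jinja2.Environment(autoescape=False)
    return env.from_string(text).render(**variables)


if __name__ == "__main__":
    # The table is stored as one Markdown string under a hypothetical key.
    variables_json = json.dumps({"my_table": "| a |\n|---|\n| 1 |"})
    print(render_manuscript("Table:\n\n{{ my_table }}", variables_json))
```

The manuscript then contains only the {{ my_table }} placeholder, and the table text lives entirely in the JSON file.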

@cmungall cmungall closed this as completed Jun 2, 2023
@agitter
Member

agitter commented Jun 2, 2023

Using the jinja include would be more elegant. I'm going to reopen this so we can consider whether we should support that in the future.

I'm not familiar with include problems in jinja2. After a quick Stack Overflow search, it looks like the general solution to TemplateNotFound is to use a FileSystemLoader so that jinja2 can see other "templates" (files). If that's correct, it would require changing how the Manubot Python package calls jinja2: https://github.com/manubot/manubot/blob/f62dd4cfdebf67f99f63c9b2e64edeaa591eeb69/manubot/process/util.py#L313
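
A small sketch of the difference, under stated assumptions: the failing case uses an environment with a bare `jinja2.BaseLoader`, which cannot locate any files (an assumption made to reproduce the error, not a statement of exactly how Manubot is configured), and the working case uses `jinja2.FileSystemLoader` pointed at a temporary folder standing in for ./content/. The names `render_with_loader` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path

import jinja2

MANUSCRIPT = "Results:\n\n{% include 'my_table.md' %}\n"


def render_with_loader(text, loader):
    """Render text in an environment that resolves includes through `loader`."""
    env = jinja2.Environment(loader=loader, autoescape=False)
    return env.from_string(text).render()


def demo():
    """Return (include_failed_without_filesystem_loader, rendered_text)."""
    with tempfile.TemporaryDirectory() as content_dir:
        Path(content_dir, "my_table.md").write_text(
            "| a | b |\n|---|---|\n| 1 | 2 |", encoding="utf-8"
        )
        try:
            # BaseLoader has no search path, so the include cannot be resolved.
            render_with_loader(MANUSCRIPT, jinja2.BaseLoader())
            failed = False
        except jinja2.TemplateNotFound:
            failed = True
        # FileSystemLoader looks the included name up inside content_dir.
        rendered = render_with_loader(
            MANUSCRIPT, jinja2.FileSystemLoader(content_dir)
        )
    return failed, rendered


if __name__ == "__main__":
    print(demo())
```

Note that an included file is itself processed as a jinja template, so any {{ or {% sequences inside the table file would be interpreted rather than copied verbatim.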

@agitter agitter reopened this Jun 2, 2023
@cmungall
Author

cmungall commented Jun 2, 2023

That would be great!

I seem to recall doing something similar in the past in a different project: create the loader, pass the loader to the environment, and then load directly from the folder:
https://github.com/linkml/linkml/blob/main/linkml/generators/docgen.py#L303-L306

@dhimmel
Member

dhimmel commented Jun 3, 2023

> Using the jinja include would be more elegant

Hmm yeah, a way to insert entire text files, either from a local path or URL, would be a great solution here. So the questions are:

  • do we use jinja include for this?
  • if so, do we apply jinja2.FileSystemLoader by default, with a default searchpath directory in the repo?
  • or do we let the manubot process command take a list of paths/URLs that then get loaded and passed to something like jinja2.DictLoader?
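
To make the last option concrete, here is a rough sketch of the DictLoader variant. It is only an illustration of the idea, not a proposed Manubot implementation: the function names are hypothetical, each file is registered under its bare file name (an arbitrary choice), and only local paths are handled, so URLs would need separate fetching.

```python
from pathlib import Path

import jinja2


def build_dict_loader(paths):
    """Read each local file and register its text under the file's base name."""
    templates = {
        Path(p).name: Path(p).read_text(encoding="utf-8") for p in paths
    }
    return jinja2.DictLoader(templates)


def render_with_files(text, paths):
    """Render text so that {% include 'name' %} resolves to one of `paths`."""
    env = jinja2.Environment(loader=build_dict_loader(paths), autoescape=False)
    return env.from_string(text).render()
```

Compared with a FileSystemLoader and a default search path, this keeps the set of includable files explicit, at the cost of having to list every file on the command line.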
