Add Myst-to-NB converter #82

chrisjsewell · 2020-03-15T19:10:09Z

This PR introduces a function for converting text documents, written in the myst format, to a notebook. It could be used in the parser and in #55 to convert these documents to notebooks before executing/reading. See the test input/output for an example.

I would envisage this could be utilised by also adding myst to https://github.com/ExecutableBookProject/MyST-NB/blob/4c3f36b884086878dc0af5dfcfd735b98ebbbacf/myst_nb/parser.py#L27
This would denote files with the .myst extension as ones that should be parsed as notebooks, as opposed to normal .md files, then in https://github.com/ExecutableBookProject/MyST-NB/blob/4c3f36b884086878dc0af5dfcfd735b98ebbbacf/myst_nb/parser.py#L35
we would check the documents extension and apply this function, instead of nbf.reads, before proceeding with the rest of the process.

This addresses to a large extent #12

chrisjsewell · 2020-03-15T19:39:49Z

Or we could have a parser subclass:

import nbformat as nbf
from myst_nb.parser import NotebookParser
from myst_nb.convert import myst_to_nb


class MystNbParser(NotebookParser):
    supported = ("mystnb",)

    def parse(self, inputstring, document):
        ntbk = myst_to_nb(inputstring)
        return super().parse(nbf.writes(ntbk), document)

(it would be good here to allow NotebookParser.parse to optionally take NotebookNode directly, so we don't have to do a spurious round trip conversion)

chrisjsewell · 2020-03-15T19:48:46Z

FYI there is an edge case that should probably be raised with sphinx: if you have multiple documents with the same name but different extensions; sphinx will not complain but will read the same one multiple times, e.g. if you had

a.md
a.rst
a.ipynb
a.myst

Only a.rst would be converted to a.html

choldgraf · 2020-03-15T19:59:38Z

How do you see this usecase being used relative to writing in a .md file? Or more specifically - do you imagine users commonly writing three different kinds of documents: markdown, myst-notebook-markdown, and notebooks-with-myst-inside?

chrisjsewell · 2020-03-15T20:12:53Z

Yes, I think these are three distinct document types. Although note, users could override this behaviour with source_suffix

source_suffix = {
    '.md': 'myst',
}

Which would make all .md be read "as notebooks"

chrisjsewell · 2020-03-15T20:36:44Z

Note this is kind of an alternative to using the jupyter-sphinx directives; the problem with them being:

(a) at present you end up with different formatting/css relative to NotebookParser, but mainly
(b) how can they be tied in to #55; where we want to identify all the notebooks to execute, before any parsing has been done?

with the approach outlined in this PR, that is relatively trivial, and means we don't have to have any code (other than this function) that is different for notebooks and text documents; they'll be treated identically.

chrisjsewell · 2020-03-15T20:47:27Z

Perhaps the default extension for these files could be .mystnb (or Nmd would be analogous to Rmd), and this is also what jupytext would convert to/from

chrisjsewell · 2020-03-15T22:23:02Z

In 5d50424 I have added the full round-trip conversion (well obviously sans code outputs). nb_to_myst could probably be added directly to jupytext, but myst_to_nb uses myst-parser so they probably wouldn't want that as a dependency and would want to re-write.

choldgraf · 2020-03-15T23:40:59Z

Actually, this comment: mwouts/jupytext#447 (comment) makes me think that marc would be happy for us to make myst a requirement for the ntbk <--> myst markdown sync. When people want to use pandoc markdown he just calls pandoc, and myst-parser is a more lightweight dependency than pandoc.

Maybe @AakashGfude could give a look at @chrisjsewell 's code and see how tough it'd be to use that as the basis of a PR to jupytext?

choldgraf · 2020-03-16T15:47:52Z

just to confirm - if the jupytext PR gets merged, then that will supercede this one right?

choldgraf · 2020-03-16T15:49:31Z

(and, just another thought in case it seems relevant in the other PR, we could also keep the merging logic in this repository and have Jupytext use a function that myst_nb provides, or something like this...that might give us more iterative control over the code without needing to wait for upstream reviews in jupytext)

chrisjsewell · 2020-03-16T16:39:04Z

just to confirm - if the jupytext PR gets merged, then that will supercede this one right?

It would yes

(and, just another thought in case it seems relevant in the other PR, we could also keep the merging logic in this repository and have Jupytext use a function that myst_nb provides, or something like this...that might give us more iterative control over the code without needing to wait for upstream reviews in jupytext)

The problem with this is that all the testing is in jupytext. So any changes to the function might break these tests, but you wouldn't know until you tried to use the function in jupytext

choldgraf · 2020-03-16T16:42:03Z

@chrisjsewell makes sense - and to be clear I prefer that this code is an upstream contribution to jupytext, I'm just a bit worried if we need to make some rapid changes to the codebase etc.

choldgraf · 2020-03-16T16:45:40Z

also let's discuss the {execute} vs {nb-code} thing here, so it doesn't clutter the other PR (since Marc has no context for that conversation).

I added {execute} as a synonym for {jupyter-execute} only as a stopgap in this repository until we could get a proper "myst-based notebook format" ready to go. I think that asking users to write {execute} is more intuitive than asking them to write {nb-cell}, so my preference is to use that pattern, rather than to only use {execute} as a short-hand for {jupyter-execute}

choldgraf · 2020-03-16T16:47:01Z

(happy to hop on a phone call to discuss this if that's a faster way to resolve this. I think that the "notebook format in myst" should match exactly what we have in the issue in this repository, unless the group decides otherwise, since that's what everybody agreed on the last time we discussed)

chrisjsewell · 2020-03-16T17:01:30Z

I think that asking users to write {execute} is more intuitive than asking them to write {nb-code}

I wouldn't say I have any strong preference either way TBH.

Firstly though, I wanted to clarify the distinction, and note that if we do use execute, then we should remove https://github.com/ExecutableBookProject/MyST-NB/blob/4c3f36b884086878dc0af5dfcfd735b98ebbbacf/myst_nb/__init__.py#L99

Secondly, from a different perspective, if you're thinking of the file in terms of a notebook, isn't it more intuitive to use nb-code which implies directly that this is a notebook code cell?

Thirdly, {raw} is already a sphinx directive, so we would have to use a different name.

chrisjsewell · 2020-03-16T17:08:36Z

Also, it does feel a bit of a "fudge" that these directives aren't technically directives (i.e. that can be directly used by sphinx), but I don't see any better trade-off, to achieve a full round-trip capability?

choldgraf · 2020-03-16T17:22:53Z

Firstly though, I wanted to clarify the distinction, and note that if we do use execute, then we should remove

Yep, that was my intention all along. I just added the add_directive bit to give us something to work with for development purposes. If we're not going to use {execute} as the way to represent code cells in MyST, then I'll probably remove this line anyway because I think it's confusing to have two words for the same directive split across two packages

Secondly, from a different perspective, if you're thinking of the file in terms of a notebook, isn't it more intuitive to use nb-code which implies directly that this is a notebook code cell?

Thirdly, {raw} is already a sphinx directive, so we would have to use a different name.

I think these are both reasonable points - but the point is that we need to decide as a group what the syntax should be, we can't choose different names just at the time of a PR without getting feedback from others. I think these are reasonable options, and you make a good point about clobbering things like {raw}, but especially for things like user-facing API of the core myst-markdown format, others should have an opportunity to weigh in. We need to treat #12 as the canonical source of truth for "representing a notebook in myst markdown"

Another option is to treat this as a stop-gap for the sake of making progress, let marc know that the names may change but we're happy to make a PR to do this if it happens, then update this issue to have further discussion about how the proposed format should change.

WDYT?

chrisjsewell · 2020-03-16T17:34:54Z

Yep, that was my intention all along. I just added the add_directive bit to give us something to work with for development purposes.

👍

but especially for things like user-facing API of the core myst-markdown format, others should have an opportunity to weigh in

Oh yeh absolutely. I have actually written these functions to take the directive names as variables:
https://github.com/ExecutableBookProject/jupytext/blob/8999b1d93ee4938ddbc36fd749c02639b5d9b588/jupytext/myst.py#L98-L100
with the original intention in this PR of leaving that decision until later.
But then for the jupytext PR obviously they have to be solidified

Another option is to treat this as a stop-gap for the sake of making progress, let marc know that the names may change but we're happy to make a PR to do this if it happens, then update this issue to have further discussion about how the proposed format should change.

yes I'd be happy with this route

chrisjsewell · 2020-03-16T20:28:03Z

then update this issue to have further discussion about how the proposed format should change.

The good thing about having now made this initial jupytext implementation, is that we have a whole suite of output .mystnb/.mnb files, for everyone to inspect and get a feel for how they are looking:

https://github.com/ExecutableBookProject/jupytext/tree/mystnb/tests/notebooks/mirror/ipynb_to_myst

choldgraf · 2020-03-16T21:52:34Z

@mmcky and @jstac would be great to hear your thoughts

jstac · 2020-03-16T22:14:44Z

The syntax here looks perfectly fine to my eyes: https://github.com/ExecutableBookProject/jupytext/blob/mystnb/tests/notebooks/mirror/ipynb_to_myst/Notebook%20with%20metadata%20and%20long%20cells.mystnb

I don't have strong opinions on this point.

chrisjsewell · 2020-03-17T01:36:59Z

authors shouldn't have to think in notebooks.

Mostly I am trying to not deviate as much as possible from RMarkdown's patterns of {r}and {python}...ideally users would be able to just use that exactly in order to define executable code, since it's such a common pattern already. I'm just not sure if that would be possible without colliding with other things in Sphinx, which is why I was thinking {nb:python} etc.

The common pattern here is jupyter notebooks, and how they run code, not however RMarkdown runs code; we, and jupyter-sphinx, and sphinxcontrib-jupyter are all literally converting the blocks to notebooks and running them. I think we should absolutely be saying to authors "think of the code blocks running in terms of notebooks", that is the simplest way for them to understand the execution: one kernel per document, that dictates the language/executable that all of the cells in that notebook are run with, with all cells running consecutively.

This is why {r} or {python} or {nb:python} makes no sense to me, because there is no guarantee that that is the language that the code cells will actually be run with. You are just going to have multiple names for exactly the same underlying directive. We would also have to have one of these directive names for at least every one of the languages supported by jupytext:

_JUPYTER_LANGUAGES = ['R', 'bash', 'sh', 'python', 'python2', 'python3', 'javascript', 'js', 'perl',
                      'html', 'latex', 'markdown', 'pypy', 'ruby', 'script', 'svg',
                      'matlab', 'octave', 'idl', 'robotframework', 'spark', 'sql']

Lastly by having {r}, {python}, ... you are suggesting to the user that they can do something like:

```{python}
python code
```
```{r}
r code
```

and have that execute, which is just not the case, unless we completely re-write myst-nb and scrap notebook round-tripping and jupyer-cache.

jstac · 2020-03-17T01:44:22Z

I agree. Jupyter notebooks is the right mental model for authors, one kernel per notebook. I imagine most people will start off authoring in ipynb anyway, and then gradually start to use text based source as they seek extra features.

mmcky · 2020-03-17T01:50:42Z

Jupyter notebooks is the right mental model for authors, one kernel per notebook.

I agree that thinking in terms of notebooks is a good way to think about how the blocks are executed but I don't think the syntax should be tied to notebooks. It is ok if I get outvoted on this but I guess my point is when I am writing a myst text file I wouldn't think to naturally write {nb-code} for an executable code block as I am writing a document not a notebook.

chrisjsewell · 2020-03-17T02:01:00Z

when I am writing a myst text file I wouldn't think to naturally write {nb-code}

But the converse is, if I had converted from a notebook then seeing {nb-code} would be pretty intuitive

choldgraf · 2020-03-17T02:14:54Z

Personally, I agree with you both in terms of {nb-code} or {execute} being better names than {python} or {r}. I'm just thinking in terms of the behemoth in this space, which is RMarkdown, and our opportunity to bring some of those folks onto this stack by making it as painless as possible.

I'd be fine with either {nb-code} or {execute}/{exec}....hmmm

what about {cell} or {code-cell}? Those are not quite "notebook-specific" but they convey the same idea. If you're coming from a notebook you'll understand what they mean, but if you aren't then they still make sense?

why do we not like {execute} anymore, again? a matter of {nb-code} being more intuitive for notebook users?

chrisjsewell · 2020-03-17T02:21:33Z

. I'm just thinking in terms of the behemoth in this space, which is RMarkdown, and our opportunity to bring some of those folks onto this stack by making it as painless as possible.

Yeh I absolutely take your point, its just trying to get the balance right. Remember that, since jupytext now round-trips both RMarkdown and MyST-NB, we do have a ready-made conversion tool from RMarkdown 😄

{code-cell} would be fine by me, not cell because we would also have {raw-cell}

mmcky · 2020-03-17T03:23:53Z

I think {code-cell} would make the markup syntax clearer.

chrisjsewell · 2020-03-17T03:29:08Z

See mwouts/jupytext@d37f73a

choldgraf · 2020-03-17T03:46:24Z

thanks for the quick discussion and turnaround on this one all, much appreciated!

it definitely does make me more responsive when I'm not supposed to leave the house 😆

jstac · 2020-03-17T04:36:37Z

I hear what you're saying about Rmarkdown, @choldgraf , although perhaps it's possible to overstate the number of would-be authors who are coming from that environment. I imagine that the great majority will be notebook users hoping to soup up the presentation of their results, while knowing little about Rmarkdown.

(Like me 😁)

I suspect most (but not all) Rmarkdown users are R users, who have good tools already in terms of EBP substitutes.

Anyways, +1 for {code-cell}.

choldgraf · 2020-03-17T14:58:19Z

Hey all - another decision-point has come up in @chrisjsewell's PR to jupytext:

Should the extension coming out of jupytext be

.md
.mnb
.myst.md
.myst

My preference as you can guess is for plain old .md, I think it will confuse casual users (AKA, most of our users) if there is anything other than .md in their file extensions. But, I am open to alternative points. Chris made the good point that having a myst-specific extension would make it much easier to know when a markdown file is "myst markdown" for things like parsing, translation, etc. That said, the more we make people do anything different from what they're doing, the more people will not try out this stack in the first place.

jstac · 2020-03-17T18:31:53Z

Looks like you got your preference: mwouts/jupytext@62351f4

chrisjsewell · 2020-03-17T19:06:15Z

Looks like you got your preference: mwouts/jupytext@62351f4

Yeh I can't be bothered to argue about it lol. It does mean that I'll probably put the LSP on the back-burner though. Because, if there is no way to distinguish myst from md, then it doesn't really work.

chrisjsewell · 2020-03-17T19:18:14Z

Also these files won't work directly with the myst-parser extension (because it won't know what to do with code-cell and raw-cell). You need to add a separate parser to myst-nb that calls jupytext.myst.matches_mystnb, to work out if the markdown file should be read as pure markdown or converted to a notebook. (it should just need to be a small subclass of NotebookParser)

@AakashGfude will also need to use it in some manner in #55 to work out which markdown files to convert/execute/cache

jstac · 2020-03-17T19:44:32Z

Yeh I can't be bothered to argue about it lol. It does mean that I'll probably put the LSP on the back-burner though. Because, if there is no way to distinguish myst from md, then it doesn't really work.

In that case I can definitely be bothered to argue about it.

In terms of the user interface, this is hugely important. I can tell you this as someone who has been authoring in jupinx for a long time. The lack of editor support for the source format is absolutely the first thing I would change if I could.

I think user confusion between md and mnb is likely to be minor. In fact, as a user rather than a developer, I would strongly prefer mnb because this is a one-to-one mapping with a notebook. It is, from the very start, a markdown representation of a notebook. This is exactly the mental model that we want users to have in mind. So mnb is perfect.

jstac · 2020-03-17T19:48:42Z

Sorry I'm late to the party on this --- clearly I didn't understand all the implications --- but should we put a hold on mwouts/jupytext#458 until this is decided?

choldgraf · 2020-03-17T19:54:06Z

Given what @chrisjsewell mentioned I think I'd be fine if we used .myst or .mnb. We can always revisit this later if it seems like a big problem. Would it be enough for our usecases?

I am also happy to hop on a call to discuss this if that's helpful to hash out the details.

chrisjsewell · 2020-03-17T19:55:52Z

Sorry I'm late to the party on this --- clearly I didn't understand all the implications --- but should we put a hold on mwouts/jupytext#458 until this is decided?

No, that's probably not necessary. Note that, with the current PR state, if you have jupytext installed and this at the top of your file:

---
jupytext:
  text_representation:
    format_name: myst
    format_version: '0.7'
    jupytext_version: 1.4.0+dev
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

Then any of the following extensions will open directly as notebooks (and changes saved back to them):

chrisjsewell · 2020-03-17T20:02:57Z

I think theres a way forward on this, so mwouts/jupytext#458 can definitely be merged. Then @choldgraf can decide if you want to work out how to make the default pairing (if going from ipynb -> text based) be to .md, rather than default now which is .mnb (see mwouts/jupytext#458 (comment))

choldgraf · 2020-03-17T20:04:57Z

Let's keep it mnb for now, and see how things go.

chrisjsewell · 2020-03-17T20:07:17Z

Let's keep it mnb for now, and see how things go.

Yeh cool cool, if you mention that to mwouts in the PR then, and hopefully we can get a jupytext release with this code in soon, to utilise with #82 (comment)

choldgraf · 2020-03-17T20:10:58Z

done 👍

chrisjsewell · 2020-03-17T20:13:07Z

Cheers 😁

jstac · 2020-03-17T20:33:24Z

Thanks all!!

mmcky · 2020-03-17T23:52:52Z

Nice work! Can I just say it's great working in a team that is open to debate and dialogue on these issues. The project will be better off for it.

chrisjsewell · 2020-03-19T12:39:34Z

MyST format is now merged into jupytext 🎉 https://github.com/mwouts/jupytext/releases/tag/1.4.1

chrisjsewell added 2 commits March 15, 2020 17:37

Add initial implementation

76e1c2d

Add reading of block breaks

911b1b2

chrisjsewell requested a review from choldgraf March 15, 2020 19:10

chrisjsewell added 2 commits March 15, 2020 19:12

add metadata to code cell test

165c655

add rstrip to final markdown cell

7a9ef46

add TODO

97b18de

Add nb_to_myst and round-trip conversion tests

5d50424

minor update

b824b79

chrisjsewell mentioned this pull request Mar 16, 2020

Add mystnb format mwouts/jupytext#456

Merged

chrisjsewell mentioned this pull request Mar 17, 2020

Files with same name but different extensions leads to unexpected behaviour sphinx-doc/sphinx#7324

Closed

chrisjsewell mentioned this pull request Mar 17, 2020

Add new parser for text-based notebooks #83

Closed

chrisjsewell closed this Mar 19, 2020

chrisjsewell mentioned this pull request Apr 8, 2020

Infer MyST notebook files by looking for ```{code-cell} blocks #120

Closed

Add Myst-to-NB converter #82

Add Myst-to-NB converter #82

Conversation

chrisjsewell commented Mar 15, 2020 • edited Loading

chrisjsewell commented Mar 15, 2020 • edited Loading

chrisjsewell commented Mar 15, 2020

choldgraf commented Mar 15, 2020

chrisjsewell commented Mar 15, 2020

chrisjsewell commented Mar 15, 2020 • edited Loading

chrisjsewell commented Mar 15, 2020 • edited Loading

chrisjsewell commented Mar 15, 2020 • edited Loading

choldgraf commented Mar 15, 2020

choldgraf commented Mar 16, 2020

choldgraf commented Mar 16, 2020

chrisjsewell commented Mar 16, 2020

choldgraf commented Mar 16, 2020

choldgraf commented Mar 16, 2020 • edited Loading

choldgraf commented Mar 16, 2020

chrisjsewell commented Mar 16, 2020

chrisjsewell commented Mar 16, 2020

choldgraf commented Mar 16, 2020 • edited Loading

chrisjsewell commented Mar 16, 2020 • edited Loading

chrisjsewell commented Mar 16, 2020

choldgraf commented Mar 16, 2020

jstac commented Mar 16, 2020

chrisjsewell commented Mar 17, 2020 • edited Loading

jstac commented Mar 17, 2020

mmcky commented Mar 17, 2020

chrisjsewell commented Mar 17, 2020

choldgraf commented Mar 17, 2020 • edited Loading

chrisjsewell commented Mar 17, 2020

mmcky commented Mar 17, 2020 • edited Loading

chrisjsewell commented Mar 17, 2020

choldgraf commented Mar 17, 2020

jstac commented Mar 17, 2020

choldgraf commented Mar 17, 2020 • edited Loading

jstac commented Mar 17, 2020 • edited by chrisjsewell Loading

chrisjsewell commented Mar 17, 2020

chrisjsewell commented Mar 17, 2020 • edited Loading

jstac commented Mar 17, 2020

jstac commented Mar 17, 2020

choldgraf commented Mar 17, 2020

chrisjsewell commented Mar 17, 2020 • edited Loading

chrisjsewell commented Mar 17, 2020

choldgraf commented Mar 17, 2020

chrisjsewell commented Mar 17, 2020

choldgraf commented Mar 17, 2020

chrisjsewell commented Mar 17, 2020

jstac commented Mar 17, 2020

mmcky commented Mar 17, 2020

chrisjsewell commented Mar 19, 2020

chrisjsewell commented Mar 15, 2020 •

edited

Loading

chrisjsewell commented Mar 15, 2020 •

edited

Loading

chrisjsewell commented Mar 15, 2020 •

edited

Loading

chrisjsewell commented Mar 15, 2020 •

edited

Loading

chrisjsewell commented Mar 15, 2020 •

edited

Loading

choldgraf commented Mar 16, 2020 •

edited

Loading

choldgraf commented Mar 16, 2020 •

edited

Loading

chrisjsewell commented Mar 16, 2020 •

edited

Loading

chrisjsewell commented Mar 17, 2020 •

edited

Loading

choldgraf commented Mar 17, 2020 •

edited

Loading

mmcky commented Mar 17, 2020 •

edited

Loading

choldgraf commented Mar 17, 2020 •

edited

Loading

jstac commented Mar 17, 2020 •

edited by chrisjsewell

Loading

chrisjsewell commented Mar 17, 2020 •

edited

Loading

chrisjsewell commented Mar 17, 2020 •

edited

Loading