Switch to CommonMark #1371

Open
mgeier opened this issue Apr 19, 2016 · 15 comments

Comments

@mgeier
Contributor

mgeier commented Apr 19, 2016

I don't know if this has been discussed yet, but I think it's high time to switch to CommonMark.

It looks like there are (at least) 2 JavaScript implementations:

  • commonmark.js
  • markdown-it

The latter seems to allow syntax extensions, and there is even already a MathJax extension: https://github.com/classeur/markdown-it-mathjax

@minrk
Member

minrk commented Apr 19, 2016

We've been planning to do this for some time, but we have been hoping for an official extension syntax from CommonMark as the impetus for doing so, rather than the de facto extensions where everybody defines their own markdown dialect, like markdown-it. Right now, markdown-it can either implement CommonMark or have extensions; the two are mutually exclusive.

@mgeier
Contributor Author

mgeier commented Apr 20, 2016

The existence of the CommonMark spec doesn't preclude extensions.

Currently, the CommonMark people are working on finalizing the spec for the "basic" stuff and they don't have time for thinking about extensions, but that doesn't mean that the spec forbids extensions.
Incidentally, the very document holding the CommonMark spec uses an ad-hoc extension for showing Markdown/HTML examples. And that's perfectly fine. There will always be extensions!

After the "basic" CommonMark spec is completed (it is unclear to me when this is going to happen ... beginning of 2016?), there might be some work on "official" extensions (or not). If an "official" math extension is made, I'm quite sure it will be the same syntax that we are already using and that pandoc has been using for a very long time (and probably many other implementations, too).
Therefore, regardless of whether there will be an "official" math extension, we'll be on the safe side if we just keep using what we are already using.

Long story short: we don't have to feel dirty just because we use a (not yet and probably never "officially authorized") extension to CommonMark.

About an "official extension syntax": Do you mean the proposed block syntax with colons? Something like:

::: {some strange attributes}
I'm text in an
extension block!
:::

Sure, such a thing might be officially supported in some time (in a few years), but I urge you not to force Jupyter Notebook users to use this syntax for block equations. This would be a huge step backwards for the ease-of-use of the Notebook (not to mention it would break all existing notebooks that use block equations).

Coming back to markdown-it: they claim "100% CommonMark support", and the ease of creating extensions doesn't change anything about that. Just because it's harder to make extensions for commonmark.js doesn't mean that one of them is more or less CommonMark compliant than the other. I'm not saying that markdown-it is the best, though. If you prefer, we can also use the reference implementation commonmark.js (or something entirely different), but then we'll have to keep using the same hacky work-around as we are using now to get the math extension working. OTOH, it seems that commonmark.js is currently more active than markdown-it.

I don't really care which library gets selected, I'm just suggesting to change to CommonMark without further delay (and of course keeping the math extension that's currently in use).

@takluyver
Member

The whole purpose of producing a common specification, though, is that Markdown can be interoperable, and not dependent on whatever ad-hoc extensions are implemented in particular libraries and not properly specified. So again, we should wait for CommonMark to specify some kind of syntax for extensions to use, analogous to roles and directives in rst. I know that's annoying, but if it increases the pressure on CommonMark to do that, everyone wins eventually.

We will probably never force people to switch to the new extension syntax for equations, because that would break existing content. But everything else should go through one extension syntax, even if that's a bit ugly, rather than trying to pick special symbols for each thing to make it concise.

@mgeier
Contributor Author

mgeier commented Apr 21, 2016

I'm not suggesting to add new syntax!

I know we were talking about new syntax in #1292, but that's a different issue!

I'm "just" suggesting to switch to CommonMark. And the sooner we do that, the better. Deliberately delaying that doesn't help anyone.

Do you not want to switch to CommonMark?

@takluyver
Member

  • Does it break rendering of any content we currently have? E.g. an upgrade of marked at one point stopped #header working, which broke some notebooks that previously worked. This could potentially break much more.
  • Does it make it any harder (or easier) to do the fiddling we currently do with math expressions? I suspect it needn't change, because we handle them quite separately from the parser, if I recall correctly.
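
To make the second point concrete, here is a minimal sketch of handling math separately from the Markdown parser: math spans are swapped for placeholders before rendering and restored afterwards. This is not the Notebook's actual implementation. The names `protect_math`, `restore_math`, `render_with_math` and the `@@MATH…@@` placeholder format are hypothetical, and `render` stands in for whichever Markdown library ends up being used.

```python
import re
from typing import Callable, List, Tuple

# Block math ($$...$$) is listed before inline math ($...$) so that "$$" is
# not mistaken for the start of an inline span.
MATH_RE = re.compile(r"\$\$.+?\$\$|\$[^$\n]+?\$", re.DOTALL)


def protect_math(text: str) -> Tuple[str, List[str]]:
    """Replace each math span with a placeholder (hypothetical format)."""
    stash: List[str] = []

    def _stash(match):
        stash.append(match.group(0))
        return f"@@MATH{len(stash) - 1}@@"

    return MATH_RE.sub(_stash, text), stash


def restore_math(html: str, stash: List[str]) -> str:
    """Put the original math spans back into the rendered output."""
    # A function is used as the replacement so that backslashes in the
    # LaTeX source are inserted literally.
    return re.sub(r"@@MATH(\d+)@@", lambda m: stash[int(m.group(1))], html)


def render_with_math(text: str, render: Callable[[str], str]) -> str:
    """Render Markdown while keeping math out of the parser's reach."""
    protected, stash = protect_math(text)
    return restore_math(render(protected), stash)
```

Because the parser only ever sees the placeholders, this approach should not care much which Markdown library sits in the middle. The regular expression is deliberately naive: it would also treat plain text such as "$5 and $10" as math, and it ignores code spans.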

In principle, I'm fine with switching to Commonmark, but I don't see much advantage to it at this point. As Min mentioned, if there was an official syntax for extension points, that would be a compelling argument to change.

@rgbkrk
Member

rgbkrk commented Apr 21, 2016

Thank you for raising this, @mgeier. I'm a fan of CommonMark and have been adapting projects to use CommonMark-compatible parsers and renderers, in particular commonmark.js or remark. In the notebook we've been using a markdown implementation that is bundled with "extensions". For instance, GitHub flavored markdown is supported in the notebook by marked.

As much as I agree with switching to CommonMark, there are several costs (the magnitude of which is unknown):

  • How difficult is adapting the MathJax/LaTeX handling to the new parser?
  • What extensions have we been implicitly supporting?
  • How many notebooks will "break" after a switch to another parser?
  • How does the parser handle embedded HTML?

That's likely the source of most of the hesitation about pushing this forward. People agree that we should do it someday. What holds it back is the maintenance burden we perceive, relative to finishing other work on the project. If you start the investigation in earnest with a PR, @mgeier, we'll be better able to evaluate the switch.
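
One way to start estimating how many notebooks would "break" is to scan existing notebooks for constructs that CommonMark treats differently. The sketch below looks only for the `#header` case mentioned earlier in the thread: CommonMark requires a space (or the end of the line) after the opening `#` characters of a heading. It assumes the nbformat 4 layout (a top-level `cells` list); the function names are hypothetical, and the fence tracking is crude.

```python
import json
import re
from typing import Iterator, List, Tuple

# One to six "#" at the start of a line (after at most three spaces),
# immediately followed by a character that is neither whitespace nor "#".
# CommonMark does not treat such a line as a heading.
TIGHT_HEADING_RE = re.compile(r"^ {0,3}#{1,6}[^\s#]")


def load_notebook(path: str) -> dict:
    """Read a notebook file as plain JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def markdown_sources(notebook: dict) -> Iterator[Tuple[int, str]]:
    """Yield (cell index, source text) for every Markdown cell."""
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # The source may be stored as a string or as a list of lines.
            yield index, source if isinstance(source, str) else "".join(source)


def find_tight_headings(notebook: dict) -> List[Tuple[int, str]]:
    """Return (cell index, line) pairs that look like '#header' headings."""
    hits = []
    for index, source in markdown_sources(notebook):
        in_fence = False
        for line in source.splitlines():
            if line.lstrip().startswith(("```", "~~~")):
                in_fence = not in_fence  # crude fenced-code tracking
                continue
            if not in_fence and TIGHT_HEADING_RE.match(line):
                hits.append((index, line))
    return hits
```

The hits are only candidates: a line such as `#1292` or `#hashtag` is reported as well, so the results still need a human look. Running `find_tight_headings(load_notebook(path))` over a collection of notebooks would nevertheless give a first rough number for this one incompatibility.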

I do agree that making our own markdown cells adhere to a well specced format is paramount to saying that we have a well specced notebook format. Same goes for embedded LaTeX/MathJax.

@mgeier
Contributor Author

mgeier commented Apr 22, 2016

Thanks, now we can finally talk about the actual issue instead of bashing CommonMark extensions!

I think most of the concerns raised above reduce to these two points:

  1. cost of implementation (be it proper extensions or ugly work-around hacks)
  2. breakage of existing notebooks

Although the magnitude of both is unknown at this time (as @rgbkrk mentioned), I think it's possible to make some predictions:

  • The cost of implementation will not change if we wait
  • If we change later, more notebooks will break (simply because more notebooks will be created in the meantime)

For me, this is an absolutely compelling argument to switch to CommonMark as soon as possible!

Are there any objections in general?

@takluyver
Member

> If we change later, more notebooks will break

The flip side is that when you change something that's going to break some things, it's much easier if there's a compelling advantage that comes with the changes. The 'eat your vegetables' approach (do this because it's the right thing to do) doesn't make us many friends.

@minrk
Member

minrk commented Apr 22, 2016

The main argument now is that if CommonMark extensions do happen and we adopt them, then we would be breaking notebooks twice instead of once. To be clear, adopting markdown-it + extensions is not adopting CommonMark, it is adopting yet another markdown dialect defined by a single implementation. That's not to say that I'm opposed to adopting markdown-it, but doing so should mean that we are giving up on the prospect of actual CommonMark extensions in the vaguely near future. Of course, if we stick with CommonMark-derived implementations (as we probably should), the domain of those incompatibilities ought to be confined to the areas affected by extensions, and not basic things like spaces in/around headings, like we've seen in the past.

@mgeier
Contributor Author

mgeier commented Apr 23, 2016

@takluyver Do you want to switch to CommonMark or not?
If yes, your argument applies equally whether we switch now or later.
If no, that's a different story ...

I think switching to CommonMark is a compelling enough advantage on its own; we don't need anything else to "sell" it. Also, most users won't even see a difference.

@minrk wrote:

> ... if CommonMark extensions do happen and we adopt them, then we would be breaking notebooks twice instead of once.

I don't get it.
If we switch to CommonMark now, we should try to make sure that the "extensions" we were using before don't change their behavior too much. This way, only "basic" CommonMark features will break.
If there are ever relevant "official" extensions (like inline math, block math, tables) and we change to them, there will be possible breakage there.
But a single spot will break at most once.

And of course there will be breakage if the CommonMark spec changes, which hopefully will happen extremely seldom.

But really, all this won't be such a big deal. Only some extreme cases will break when switching to CommonMark. And I'm quite sure that if there is ever a LaTeX-math extension, it will look very similar to what we are using now (i.e. what is used in pandoc).
I'm not so sure about tables, but I also hope that there won't be too much breakage.

> adopting markdown-it + extensions is not adopting CommonMark, it is adopting yet another markdown dialect defined by a single implementation

TBH, I don't care which JavaScript library is used; I don't have experience with any of them. I only mentioned markdown-it because I found it in a quick web search and it seemed to be the only JavaScript library available that provides an API for syntax extensions (and has many existing extensions using this API). All other libraries provide "only" an API for AST transformations and custom writers. But since the ugly hack for math blocks is already in place, that's probably not such a big problem. The only thing that may be problematic is the implementation of tables, which are not (and won't be) part of "core" CommonMark.

markdown-it claims to follow the CommonMark spec, so if they are not lying, it would be adopting CommonMark. The existence or non-existence of extensions doesn't change anything about that.

> doing so should mean that we are giving up on the prospect of actual CommonMark extensions in the vaguely near future

As I said above, potential "official" extensions for LaTeX math and tables will very likely be very close to what we're using now. They will probably be less restrictive, but this wouldn't be a backwards-compatibility problem.

> if we stick with CommonMark-derived implementations

I'm not quite sure what you mean by that.
Do you mean implementations that pass the test suite provided by https://github.com/jgm/CommonMark?
If yes, I agree, whatever library we choose should pass all those tests.
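
As a rough illustration of what "passing the tests" could look like for a candidate library, the sketch below runs a renderer over a list of spec examples and collects the mismatches. It assumes each example is a dict with `"markdown"` and `"html"` keys, as in a JSON dump of the spec examples; `run_spec_examples` and `summarize` are hypothetical names, and `render` stands in for the library under test.

```python
from typing import Callable, Dict, List


def run_spec_examples(examples: List[Dict], render: Callable[[str], str]) -> List[Dict]:
    """Return the examples whose rendered HTML differs from the expected HTML."""
    failures = []
    for example in examples:
        actual = render(example["markdown"])
        # Exact string comparison; no HTML normalization is attempted here.
        if actual != example["html"]:
            failures.append({**example, "actual": actual})
    return failures


def summarize(examples: List[Dict], failures: List[Dict]) -> str:
    """Format a one-line pass count."""
    passed = len(examples) - len(failures)
    return f"{passed}/{len(examples)} examples passed"
```

Exact string equality is stricter than a comparison that first normalizes insignificant HTML differences, so a library could fail here purely on formatting. The same function can also be pointed at Markdown cells taken from real notebooks, with the current renderer's output as the expected HTML, to see what a switch would change.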

@willingc
Member

@mgeier Historically, the Jupyter team has been very committed to providing installed users, even those that may be working with edge use cases, backward compatibility and minimal breakage when making changes to the code base.

Breakage now, or in the future, is still breakage for the end user and their organization. As for the implementation costs, there are a number of variables (developer time, support for users experiencing breakage, resource availability within the larger project scope, etc.) that make it difficult to prove with a high degree of certainty that implementing changes now is preferable to waiting.

Overall, I believe that @minrk and @takluyver may simply be saying that their preference is to err on the side of caution when it comes to impacting our end users.

@mgeier
Contributor Author

mgeier commented May 14, 2016

Just a little update: I haven't found an extensible CommonMark implementation for Python, which I would need for nbsphinx (and which would be needed for nbconvert, too).
I hope an implementation will turn up at some point; if not, I guess I'll have to write one myself ...

Until then, we can also hold off on changing the JavaScript implementation.

@Carreau Carreau modified the milestone: wishlist Jun 17, 2016
@jasongrout
Member

Another argument for switching away from marked: the last commit to master was about a year ago.

@gazzar

gazzar commented Sep 30, 2018

@mgeier Just an update: there is now an MIT-licensed "fast, extensible and spec-compliant Markdown parser in pure Python" called mistletoe:
https://github.com/miyuchina/mistletoe

@mgeier
Contributor Author

mgeier commented Oct 4, 2018

Thanks for the hint, @gazzar! It indeed looks like it has an API for syntax extensions. Now I don't have an excuse anymore ...


9 participants