Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

supporting GitHub-flavored Markdown as Description-Content-Type #126

Closed
brainwane opened this issue Mar 17, 2018 · 24 comments
Closed

supporting GitHub-flavored Markdown as Description-Content-Type #126

brainwane opened this issue Mar 17, 2018 · 24 comments

Comments

@brainwane
Copy link
Contributor

Some users would like PyPI to support READMEs (long_descriptions) written in GitHub-flavored Markdown as a description_content_type. Right now we only support CommonMark. Implementing this may span readme_renderer and other projects, so I'm putting it in this repo for discussion and sorting out next steps.

Followup to #46.

Ref:

@ewdurbin
Copy link
Member

Should be specified as a variant in the Description-Content-Type ala:

Description-Content-Type: text/markdown; charset=UTF-8; variant=GitHub or similar

@theacodes
Copy link
Member

theacodes commented Mar 18, 2018

I'm trying to hunt down a good candidate for the gfm markdown renderer.

GitHub uses https://github.com/github/markup to determine which renderer to use. For .md, it uses CommonMarker with some extension. CommonMarker is a Ruby binding for https://github.com/github/cmark which is a fork of https://github.com/commonmark/cmark adding the GitHub flavored markdown extensions.

There is a set of Python bindings for the upstream cmark available over at https://pypi.python.org/pypi/paka.cmark. It should be (relatively) easy to fork/extend that to use GitHub's fork.

There's a lot of other options but few are well maintained or promise as much compatibility.

@theacodes
Copy link
Member

To make my last comment actionable:

  1. Is it acceptable for readme_renderer and warehouse to take on a cffi dependency?
  2. Would PyPA be willing to fork / contribute to https://github.com/PavloKapyshin/paka.cmark to add support for using GitHub's fork of cmark?

@theacodes
Copy link
Member

theacodes commented Mar 18, 2018

A little more research:

cmark is actually fairly easy to wrap if we don't need any of the AST / Node functionality. We literally just need the cmark_render_html function (It's a bit more complicated than that). I'm happy to throw together a minimal library that just binds that for our usage. WD(Y'all)T?

@ewdurbin
Copy link
Member

Any reason to avoid https://pypi.org/project/Markdown/? It seems to be fairly extensible and and would be straightforward to keep in sync with various variants. I haven’t dug deeply into the codebase but have used it with great success for other projects.

Biggest risk I see is any potentially nasty attack vectors brought on by executing the rendering code. New infra makes spinning a task like that out into its own service fairly straightforward, but I’m unsure of the trade off in complicating things.

@theacodes
Copy link
Member

Mostly just that if we're going to say we support "GitHub Flavored Markdown", I would ideally want to use the exact same renderer. Of course, that's not my decision to make, but wrapping a small C library to ensure the exact same semantics seems better than having to bend another implementation our will.

Happy to be convinced otherwise.

@ewdurbin
Copy link
Member

This may not be the last request for a specific flavor of Markdown, as the specifications are poor and everyone seems to do it differently. I don’t think we can sanely meet the same standard of shipping the reference implementation for every variant, so an extensibile Markdown renderer we can iterate/tune for each flavor is my preference.

@theacodes
Copy link
Member

as the specifications are poor and everyone seems to do it differently

This is exactly my argument in favor of using the reference implementations. In fact, if we use github's fork of cmark we can actually use that for both CommonMark and gfm. Considering these two variants are by far the most common, I think we could potentially draw the line there.

@ewdurbin
Copy link
Member

You make a sound point. I’m absolutely good with this approach, but would love to hear others input, particularly @di, having done most of the prior art in the space for Warehouse/PEP 566

@ncoghlan
Copy link
Member

From a portability perspective, the main tool where we want to avoid introducing a C dependency is pip, simply because having a C dependency at that layer significantly complicates bootstrapping new platforms and systems.

However, I don't see a problem with adding a C extension dependency for publishing tools or index servers (especially if that extension wraps a C library that is amenable to static linking).

@theacodes
Copy link
Member

Cool, yeah, this wouldn't require pip to have the c extension - just readme_renderer (and therefore warehouse). We could also make it an optional for readme_renderer.

I've got a proof of concept working already using github's fork (rendering CommonMark), I'm going to try to get it uploaded to github tonight (as soon as I can get the extensions API working and render GFM).

@theacodes
Copy link
Member

Alright: https://pypi.org/project/cmarkgfm/ and https://github.com/jonparrott/cmarkgfm are up - feel free to test it out and see if it's worth continuing to pursue.

Couple of notes:

  1. This should be the same thing github uses: cmarkgfm.cmark.markdown_to_html_with_extensions(text, options=2048, extensions=['autolink', 'tagfilter', 'table', 'strikethrough']). (Magic number simply because I haven't mapped all the options yet).
  2. I'm happy to move this to the PyPA org if we want to use it.
  3. I'm still relatively green on cffi, so if I did something dumb please let me know.
  4. There's still a bit more clean-up to do. :)

@dstufft
Copy link
Member

dstufft commented Mar 18, 2018

C dependency for readme_renderer and/or Warehouse is fine. Warehouse already has a number of them anyways.

@theacodes
Copy link
Member

Boom pypa/readme_renderer#67

@di
Copy link
Member

di commented Mar 19, 2018

Given how many folks will probably want GFM over CommonMark, I wonder if we should make that the default variant.

@theacodes
Copy link
Member

From pypa/readme_renderer#67 (comment)

@di:

This should Just Work™, the only change to Warehouse necessary would be bumping the readme_renderer dependency.

We need to change this to accept the "gfm" variant in addition to the "CommonMark" variant.

@di
Copy link
Member

di commented Mar 19, 2018

@jonparrott Ah yup, good catch, forgot that we were validating the variant as well.

@theacodes
Copy link
Member

@ewdurbin caught it, not me. :)

@theacodes
Copy link
Member

theacodes commented Mar 20, 2018

Okay, it seems all the pieces of the puzzle are in place for this:

Once those are done (and warehouse is deployed):

  • Verify that uploading a package with a Github-flavored Markdown description is displayed correctly. (owner: @jonparrott).
  • Update packaging.python.org to document this support. (owner: @jonparrott, reviewer: @brainwane).
  • Write a nice blog post about it. (owner: @jonparrott or other volunteer)

@theacodes
Copy link
Member

theacodes commented Apr 1, 2018

This is done. There are few things around explicitly setting the GFM variant that need to be deployed (it should be deployed now), but, the default variant is now GFM and Github-Flavored Markdown in long_description should now just work.

@brainwane we're good to close this!

@di
Copy link
Member

di commented Apr 1, 2018

GFM is now fully deployed as the default variant. Thanks @jonparrott!

@di di closed this as completed Apr 1, 2018
@brainwane
Copy link
Contributor Author

Write a nice blog post about it. (owner: @jonparrott or other volunteer)

@jonparrott I nominate you -- take a victory lap! :)

@theacodes
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants