Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown rendering changes - stripping classes? #9054

Closed
2 of 7 tasks
cipherboy opened this issue Nov 17, 2019 · 9 comments · Fixed by #9075
Closed
2 of 7 tasks

Markdown rendering changes - stripping classes? #9054

cipherboy opened this issue Nov 17, 2019 · 9 comments · Fixed by #9075

Comments

@cipherboy
Copy link
Contributor

  • Gitea version (or commit ref): 1.10.0
  • Git version: 2.20.1
  • Pandoc version: 2.5-2 (dpkg)
  • KaTeX version: 0.11.0
  • Operating system: Ubuntu 19.10
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
  • Log gist:

Nothing relevant in logs.

Description

I've installed a custom markdown rendering based on pandoc as such:

[markup.markdown]
ENABLED         = true
FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND  = pandoc -f markdown -t html --katex

This lets me add a custom header and render LaTeX in Markdown with KaTeX:

    <link rel="stylesheet" href="https://git.cipherboy.com/internal-custom/static/katex/katex.min.css" integrity="sha384-BdGj8xC2eZkQaxoQ8nSLefg4AV4/AwB3Fj+8SUSo7pnKP6Eoy18liIKTPn9oBYNG" crossorigin="anonymous">

    <!-- The loading of KaTeX is deferred to speed up page rendering -->
    <script defer src="https://git.cipherboy.com/internal-custom/static/katex/katex.min.js" integrity="sha384-JiKN5O8x9Hhs/UE5cT5AAJqieYlOZbGT3CHws/y97o3ty4R7/O5poG9F3JoiOYw1" crossorigin="anonymous"></script>

    <!-- To automatically render math in text elements, include the auto-render extension: -->
    <script defer src="https://git.cipherboy.com/internal-custom/static/katex/auto-render.min.js" integrity="sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" crossorigin="anonymous" onload="renderMathInElement(document.body);"></script>

Sometime recently (I remember it working early in 1.9.x series) this got broken. Looking at the source, it looks like classes on elements started getting stripped by gitea after pandoc got done rendering it.

For example, the browser gets sent source like:

...
<li><span>\{..., -16, -11, -6, -1, 4, 9, 14, ...\}</span></li>
<li><span>\{...,-7, -4, -1, 2, 5, 8, ...\}</span></li>
<li><span>\{-2, -1, 0, 1, 2, 3, 4, 5, 6\}</span></li>
<li><span>\{-1, 0, 1, 2, 3, 4, 5, 6, 7\}</span></li>
<li><span>\{-\sqrt{3}, \sqrt{3} \}</span></li>
...

However, running the pandoc command above on the server (on the same source file) gives:

...
<li><span class="math inline">\{..., -16, -11, -6, -1, 4, 9, 14, ...\}</span></li>
<li><span class="math inline">\{...,-7, -4, -1, 2, 5, 8, ...\}</span></li>
<li><span class="math inline">\{-2, -1, 0, 1, 2, 3, 4, 5, 6\}</span></li>
<li><span class="math inline">\{-1, 0, 1, 2, 3, 4, 5, 6, 7\}</span></li>
<li><span class="math inline">\{-\sqrt{3}, \sqrt{3} \}</span></li>
...

Which makes me think gitea changed something recently. This results in KaTeX not rendering anything, which means my Math+Markdown files are now broken.

Did something change? Perhaps more markdown sanitation was added recently?

Screenshots

Screenshot from 2019-11-17 08-44-23

@guillep2k
Copy link
Member

Currently, output from all renderers is passed through a post-processor and an HTML sanitizer:

func render(parser Parser, rawBytes []byte, urlPrefix string, metas map[string]string, isWiki bool) []byte {
urlPrefix = strings.Replace(urlPrefix, " ", "+", -1)
result := parser.Render(rawBytes, urlPrefix, metas, isWiki)
// TODO: one day the error should be returned.
result, err := PostProcess(result, urlPrefix, metas, isWiki)
if err != nil {
log.Error("PostProcess: %v", err)
}
return SanitizeBytes(result)
}

The sanitizer rules are getting in your way. There's no way ATM of bypassing or tailoring those rules.

@guillep2k
Copy link
Member

I don't know in what version that behavior could have changed, though.

@guillep2k
Copy link
Member

May be options to bypass those steps could be added to the renderers' configuration?

@cipherboy
Copy link
Contributor Author

I don't know in what version that behavior could have changed, though.

Ah no, you're right and I'm wrong. I pulled 1.9.1, 1.9.0, and 1.8.3 and all had the same behavior as 1.10.0. I must've been misremembering the order I did the migration in. I thought I went gogs -> gitea and then added the custom renderer but I must've done the reverse. Sorry for the noise!

May be options to bypass those steps could be added to the renderers' configuration?

That'd be fine with me (UNSAFE_NO_SANITIZE or some such). My use case for gitea being is as a locked-down git forge for myself and very few others, with both DISABLE_REGISTRATION = true and REQUIRE_SIGNIN_VIEW = true. If someone uploads malicious Markdown, they're probably pranking me.

Alternatively, what about something like:

diff --git a/modules/markup/sanitizer.go b/modules/markup/sanitizer.go
index f873e8105..66bf1b60f 100644
--- a/modules/markup/sanitizer.go
+++ b/modules/markup/sanitizer.go
@@ -48,6 +48,9 @@ func ReplaceSanitizer() {
 
        // Allow keyword markup
        sanitizer.policy.AllowAttrs("class").Matching(regexp.MustCompile(`^` + keywordClass + `$`)).OnElements("span")
+
+       // Allow KaTeX markup from pandoc
+       sanitizer.policy.AllowAttrs("class").Matching(regexp.MustCompile(`^(math\s*|inline\s*|display\s*){0,3}$`)).OnElements("span")
 }
 
 // Sanitize takes a string that contains a HTML fragment or document and applies policy whitelist.

That seems simpler: you're letting through a very small subset of classes and in the default installation, nothing will happen because you're not shipping KaTeX and you're not shipping the Pandoc render. The owner would have to manually add KaTeX rendering (scripts + css + ...) and modify the config to switch to Pandoc to hit this in most cases.

Thought?

@lunny
Copy link
Member

lunny commented Nov 19, 2019

Or we could add custom regexp express on third-party external renderer configuration so that user could define themselves.

@cipherboy
Copy link
Contributor Author

cipherboy commented Nov 19, 2019 via email

@guillep2k
Copy link
Member

guillep2k commented Nov 19, 2019

That sounds like the best candidate. May I take a shot at implementing that?

@cipherboy Of course! You can check modules/markup/sanitizer.go. Rules are added at ReplaceSanitizer().

@cipherboy
Copy link
Contributor Author

OK, I have a working draft on my fork.

Is it possible to run tests without drone? It doesn't work on my system as it seems to require Docker, which doesn't work very well (last time I tried pulling a Docker container it crashed the Docker daemon...). Podman works better but it doesn't appear that drone CLI has support for podman yet.

@guillep2k
Copy link
Member

You certainly can run partial (but meaningful) tests with:

make test
make test-sqlite

Once you're satisfied, you can run:

make lint
make revive

cipherboy added a commit to cipherboy/gitea that referenced this issue Dec 6, 2019
Allowing the gitea administrator to configure sanitization policy allows
them to couple external renders and custom templates to support more
markup. In particular, the `pandoc` renderer allows generating KaTeX
annotations, wrapping them in `<span>` elements with class `math` and
either `inline` or `display` (depending on whether or not inline or
block mode was requested).

This iteration gives the administrator whitelisting powers; carefully
crafted regexes will thus let through only the desired attributes
necessary to support their custom markup.

Resolves: go-gitea#9054

Signed-off-by: Alexander Scheel <[email protected]>
techknowlogick pushed a commit that referenced this issue Dec 7, 2019
* Support custom sanitization policy

Allowing the gitea administrator to configure sanitization policy allows
them to couple external renders and custom templates to support more
markup. In particular, the `pandoc` renderer allows generating KaTeX
annotations, wrapping them in `<span>` elements with class `math` and
either `inline` or `display` (depending on whether or not inline or
block mode was requested).

This iteration gives the administrator whitelisting powers; carefully
crafted regexes will thus let through only the desired attributes
necessary to support their custom markup.

Resolves: #9054

Signed-off-by: Alexander Scheel <[email protected]>

* Document new sanitization configuration

 - Adds basic documentation to app.ini.sample,
 - Adds an example to the Configuration Cheat Sheet, and
 - Adds extended information to External Renderers section.

Signed-off-by: Alexander Scheel <[email protected]>

* Drop extraneous length check in newMarkupSanitizer(...)

Signed-off-by: Alexander Scheel <[email protected]>

* Fix plural ELEMENT and ALLOW_ATTR in docs

These were left over from their initial names. Make them singular to
conform with the current expectations.

Signed-off-by: Alexander Scheel <[email protected]>
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants