Pandoc 2.11 adds HTML to rendered references when --to=markdown_strict #6921

Closed
dhimmel opened this issue Dec 4, 2020 · 11 comments

dhimmel commented Dec 4, 2020

With pandoc 2.11.2 and the following pandoc command (run via bash):

pandoc --citeproc --to=markdown_strict --wrap=none <<< "
---
nocite: '@*'
csl: https://github.com/manubot/rootstock/raw/97b294802ffcd39071b6e5b8ab59f60faf4be118/build/assets/style.csl
references:
- id: f51SCNU1
  type: webpage
  title: test
...
"

outputs:

<span class="csl-left-margin">1. </span><span class="csl-right-inline">**test**</span>

Formerly with pandoc 2.9.2.1 and --filter=pandoc-citeproc rather than citeproc:

1\. **test**

We've been using markdown_strict for manubot cite markdown output because it did not include the HTML snippets. Is this regression intentional? Is there any way to specify markdown output without these HTML fragments added to the bibliography?

Author

dhimmel commented Dec 4, 2020

As for the numbered list change: 1. is preferable to 1\., so some parts of the new behavior are nice.

Owner

jgm commented Dec 4, 2020

Raw inline HTML is part of original (strict) markdown.
If you don't want it, though, you can specify -t markdown_strict-raw_html
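
For scripted use, the suggestion above can be wrapped in a small helper. This is only a sketch: the function names pandoc_args and render_references are hypothetical, the flags are the ones quoted in this thread, and actually running it assumes pandoc 2.11 or later is on the PATH.

```python
import subprocess


def pandoc_args(allow_raw_html=False):
    """Build the pandoc command line discussed in this thread.

    With allow_raw_html=False the raw_html extension is disabled, so the
    bibliography should come out without <span> wrappers.
    """
    target = "markdown_strict" if allow_raw_html else "markdown_strict-raw_html"
    return ["pandoc", "--citeproc", f"--to={target}", "--wrap=none"]


def render_references(markdown_text, allow_raw_html=False):
    """Pipe a markdown document (with YAML references) through pandoc."""
    result = subprocess.run(
        pandoc_args(allow_raw_html),
        input=markdown_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits non-zero
    )
    return result.stdout
```

Calling render_references with the YAML document from the first comment would be the scripted equivalent of the bash here-string shown there.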

jgm closed this as completed Dec 4, 2020
Owner

jgm commented Dec 4, 2020

By the way, the reason those spans are there is to get proper CSL block-level formatting. If you strip it out or ignore it, you'll lose some distinctions that the style requires.

If you strip it out, you will once again get the escape in 1\., because otherwise this would be interpreted as a markdown ordered list item.

dhimmel added a commit to dhimmel/manubot that referenced this issue Dec 4, 2020
dhimmel added a commit to dhimmel/manubot that referenced this issue Dec 4, 2020
Author

dhimmel commented Dec 4, 2020

Raw inline HTML is part of original (strict) markdown. If you don't want it, though, you can specify -t markdown_strict-raw_html

That removed the unwanted <span> elements. Thanks!

By the way, the reason those spans are there is to get proper CSL block-level formatting

Ah yes. That is something else I've noticed with the citeproc migration. We no longer have line breaks between CSL blocks with our existing style. For example, the plain text output looks like:

2. Honey bee sting pain index by body locationMichael L SmithPeerJ (2014-04-03) https://doi.org/gfrfbmDOI: 10.7717/peerj.338 · PMID: 24765572 · PMCID: PMC3994616

Rather than:

2. Honey bee sting pain index by body location
Michael L Smith
PeerJ (2014-04-03) https://doi.org/gfrfbm
DOI: 10.7717/peerj.338 · PMID: 24765572 · PMCID: PMC3994616

Is this something where we need to update our CSL style? Let me know if I should open another issue describing this more clearly.

Owner

jgm commented Dec 4, 2020

Yes, take a look at the CSS in the current pandoc default template.

Author

dhimmel commented Dec 4, 2020

take a look at the CSS in the current pandoc default template

I found the following, which styles some of the CSL classes but not csl-block:

$if(csl-css)$
div.csl-bib-body { }
div.csl-entry {
  clear: both;
$if(csl-entry-spacing)$
  margin-bottom: $csl-entry-spacing$;
$endif$
}
.hanging div.csl-entry {
  margin-left: 2em;
  text-indent: -2em;
}
div.csl-left-margin {
  min-width: 2em;
  float: left;
}
div.csl-right-inline {
  margin-left: 2em;
  padding-left: 1em;
}
div.csl-indent {
  margin-left: 2em;
}

For HTML output, we could update our CSS to place csl-blocks on their own lines. But for --to=plain, --to=markdown_strict-raw_html, --to=docx, etc., editing the CSS won't have an effect, right? So does that mean it's no longer possible to have newlines between the components of a reference in a way that applies to all output formats?

Owner

jgm commented Dec 4, 2020

Note: in the AST, we represent the display styles using Spans, since the type is [Inline]. But the HTML writer will render these as divs, hence the rules for divs in the css.

No special style was added for 'block' because the default rendering of a div is fine for that.

But it looks as if for some reason we're not rendering the Span with class csl-block as a div in the HTML. I need to look into this.
Example:

<div class="csl-left-margin">6. </div><div class="csl-right-inline"><strong>A6</strong> <span class="csl-block">John Doe</span> <em>Cambridge University Press</em> (2010) <a href="https://127.0.0.1/documents/Watson--paper.pdf">https://127.0.0.1/documents/Watson--paper.pdf</a></div>

To get the plain markdown output you want, you could use a filter that adds soft breaks before each Span with class csl-block -- or something like that. I might want to experiment with adding these soft breaks automatically for all formats, since this will produce nicer output outside of HTML/LaTeX.
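
A filter along those lines could look roughly like the following. It is a minimal sketch, not part of pandoc: it works directly on pandoc's JSON AST using only the Python standard library, and the names add_breaks, is_csl_block and main are hypothetical. As far as I know, pandoc 2.11 and later apply --citeproc and filters in command-line order, so the filter would need to be listed after --citeproc.

```python
import json
import sys


def is_csl_block(node):
    """True for a Span whose class list contains csl-block."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Span"
        and "csl-block" in node["c"][0][1]  # Span content is [[id, classes, attrs], inlines]
    )


def add_breaks(node, break_type="SoftBreak"):
    """Return a copy of the AST with a break inserted before each csl-block Span."""
    if isinstance(node, list):
        out = []
        for child in node:
            if is_csl_block(child):
                out.append({"t": break_type})
            out.append(add_breaks(child, break_type))
        return out
    if isinstance(node, dict):
        return {key: add_breaks(value, break_type) for key, value in node.items()}
    return node  # strings, numbers, booleans, None


def main():
    """Read a pandoc JSON document on stdin and write the modified one to stdout."""
    json.dump(add_breaks(json.load(sys.stdin)), sys.stdout)

# To use this as a pandoc JSON filter, save it as a script that ends with: main()
```

How the inserted element is rendered depends on the writer options. With --wrap=none a SoftBreak is likely to come out as a space, so --wrap=preserve, or passing break_type="LineBreak", may be needed to get an actual newline.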

jgm reopened this Dec 4, 2020
Owner

jgm commented Dec 4, 2020

OK, I see the bug in cslEntryToHTML (also ToLaTeX, ToDocx): it doesn't properly handle nested Spans with csl display attributes.
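
To illustrate what handling nested Spans means here, the sketch below renders a small subset of pandoc's inline AST (in its JSON form) to HTML, recursing into Span contents so that an inner Span with a CSL display class also becomes a div. It is not pandoc's cslEntryToHTML, which is written in Haskell; render_inline, render_inlines and DISPLAY_CLASSES are hypothetical names, and only a few inline types are covered.

```python
from html import escape

# Classes that mark CSL display attributes; taken from the CSS and HTML quoted above.
DISPLAY_CLASSES = {"csl-block", "csl-left-margin", "csl-right-inline", "csl-indent"}


def render_inlines(inlines):
    """Render a list of inline nodes and concatenate the results."""
    return "".join(render_inline(node) for node in inlines)


def render_inline(node):
    """Render one inline node, recursing so nested display Spans become divs too."""
    kind = node["t"]
    if kind == "Str":
        return escape(node["c"])
    if kind == "Space":
        return " "
    if kind in ("Strong", "Emph"):
        tag = "strong" if kind == "Strong" else "em"
        return f"<{tag}>{render_inlines(node['c'])}</{tag}>"
    if kind == "Span":
        (_ident, classes, _attrs), content = node["c"]
        tag = "div" if DISPLAY_CLASSES.intersection(classes) else "span"
        attr = f' class="{escape(" ".join(classes))}"' if classes else ""
        # The recursive call is the important part: inner Spans get the same treatment.
        return f"<{tag}{attr}>{render_inlines(content)}</{tag}>"
    raise ValueError(f"inline type not covered by this sketch: {kind}")
```

Applied to a csl-right-inline Span that contains a csl-block Span, this produces a div nested inside a div, rather than the inner span seen in the example output above.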

jgm added a commit that referenced this issue Dec 4, 2020
Previously inner Spans used to represent
CSL display attributes were not rendered as div tags.

See #6921.
Owner

jgm commented Dec 4, 2020

I've made some fixes to both HTML and LaTeX output; maybe you could try.

Btw, I'm not sure the way you're using the "block" display style is right; I think that after using the "block" for the author, you should include another block for the rest; otherwise the HTML doesn't look right. Maybe there's a way to fix this by changing CSS, I'm not sure.

jgm added a commit that referenced this issue Dec 4, 2020
This just looks better and doesn't affect the semantics.
See #6921.
Owner

jgm commented Dec 4, 2020

I've added some newlines in the markdown output which should improve things for you.

jgm closed this as completed Dec 4, 2020
Author

dhimmel commented Dec 5, 2020

Btw, I'm not sure the way you're using the "block" display style is right; I think that after using the "block" for the author, you should include another block for the rest

Yes, we alternated block display for every other line since otherwise references were double spaced. See manubot/rootstock#346 (comment) and manubot/rootstock#134. But we should revisit our style for the new citeproc.

One thing we did is create a document with all combinations of CSL JSON fields in manubot/manubot#110. Then we could render it for a given CSL style and check the formatting was as expected.
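
As a rough illustration of the idea (not the actual approach used in manubot/manubot#110), such a test set can be generated by emitting one CSL JSON item per non-empty subset of a few fields. The function name field_combinations, the example field values, and the fixed type are all assumptions made for this sketch.

```python
import itertools


def field_combinations(fields, item_type="webpage"):
    """Yield one CSL JSON item for every non-empty subset of the given fields."""
    names = sorted(fields)
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            # The id records which fields are present, so a rendered
            # bibliography entry can be traced back to its inputs.
            item = {"id": "-".join(subset), "type": item_type}
            item.update({name: fields[name] for name in subset})
            yield item


# Hypothetical example values; any CSL JSON fields could be used instead.
EXAMPLE_FIELDS = {
    "title": "test",
    "URL": "https://example.org",
    "issued": {"date-parts": [[2014, 4, 3]]},
}

references = list(field_combinations(EXAMPLE_FIELDS))
```

The resulting list can be serialized with json.dumps and rendered with a given CSL style to check each combination by eye. The number of items grows as 2^n - 1 for n fields, so the field list has to stay short.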

Will hopefully do this soon for an updated style and report back.
