pandoc does not pick up the figure which pandoc-crossref specifies using "<figure" #9720
Please report this to pandoc-crossref instead.
Thank you for your instruction! Do you suggest that pandoc-crossref should not generate
OK. Actually, this may point to something that can be done in pandoc.
I imagine that pandoc-crossref is inserting something like this into the AST:
The problem is that pandoc's markdown writer will render this as HTML. And then, if you try to go from that markdown to docx, the raw HTML will disappear. Why does the markdown writer use raw HTML here? I'm not sure. You can disable raw HTML, though, with
and that, I think, will go through to docx. I think the markdown writer should probably just generate a standard
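A minimal Python sketch of that two-run workaround follows. It assumes the raw HTML switch meant here is pandoc's `raw_html` extension, turned off on the writer with `-t markdown-raw_html`. The function names and file names are hypothetical, and whether a given figure then reaches docx depends on which non-HTML form the markdown writer picks for it (implicit figure or nested divs).

```python
from __future__ import annotations

import shutil
import subprocess


def two_pass_commands(src: str, tmp: str, out: str) -> list[list[str]]:
    """Build the two pandoc runs: crossref -> Markdown, then Markdown -> docx.

    "markdown-raw_html" is the markdown writer with the raw_html extension
    switched off, so a Figure cannot be written as a raw <figure> block
    (raw HTML that the later docx conversion would drop).
    File names are placeholders.
    """
    first = ["pandoc", src, "--filter", "pandoc-crossref",
             "-t", "markdown-raw_html", "-o", tmp]
    second = ["pandoc", tmp, "-o", out]
    return [first, second]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each command in order; needs pandoc and pandoc-crossref on PATH."""
    for tool in ("pandoc", "pandoc-crossref"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(two_pass_commands("Ch3.md", "Ch3_tmp.md", "Ch3.docx"))` would run both passes when the tools are installed.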
In retrospect I don't think this is a problem for pandoc-crossref, so you can cancel any request you made there.
OK, I see what is going on here. The HTML you display above was probably the result of rendering this AST element (inserted by pandoc-crossref):
In deciding whether to use an implicit figure, the markdown writer tries to determine whether that representation would capture all of the information in this Figure element. One case in which it wouldn't is when the image has an image description (alt text) that differs from the figure's caption. (An implicit figure just takes the caption from what would otherwise be the image's alt text.) So the writer tests for this. Notice that the caption and the image description are almost the same in this case: the difference is that the caption also includes the label "Figure 1:". That is why we fall back to raw HTML. I suppose one way around this would be to just check that the suffix of the Caption matches the image description. This might lead to some false positives, but it's probably fairly reliable.
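The suffix check suggested above can be sketched in Python as follows. This illustrates the idea only and is not pandoc's actual (Haskell) markdown writer code: inlines are simplified to a list of strings with `" "` standing in for a Space, and `label_prefix` is a hypothetical name. Returning the leftover prefix rather than a bare boolean keeps the "Figure 1:" part available for whatever is decided about it.

```python
from __future__ import annotations

from typing import Optional, Sequence

# Hypothetical simplified inline representation: each element is a word,
# or " " standing in for pandoc's Space inline.
Inlines = Sequence[str]


def label_prefix(caption: Inlines, description: Inlines) -> Optional[list[str]]:
    """Return the part of the caption that precedes the image description.

    [] means caption and description are identical; a non-empty list
    (e.g. a "Figure 1:" label) means the description is a proper suffix of
    the caption; None means they differ, so an implicit figure would lose
    information. Like any suffix test, this can give false positives.
    """
    n = len(description)
    if n == 0:
        # An empty description only "matches" an empty caption.
        return [] if len(caption) == 0 else None
    if n > len(caption):
        return None
    if list(caption[-n:]) == list(description):
        return list(caption[: len(caption) - n])
    return None
```

For the caption `Figure 1: A cat` and the description `A cat`, it returns the tokens for `Figure 1: `; whether those belong in the implicit figure's alt text is the open question raised in the thread.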
What I'm not sure about is what we should do in the case where the suffix matches. Should the image description in the implicit figure include the "Figure 1:" part or not? If it does, then we might get bad results in formats that add their own figure number (e.g. LaTeX/PDF).
It seems to me that pandoc-crossref, when invoked, is responsible for naming the figures (and the tables, and the equations). Would this information help with the decision? :D
I think we need feedback from @lierdakil on this.
I'm a bit confused by the premise: converting to Markdown through pandoc-crossref, then converting the output to docx. I don't know what you're trying to do, but it sounds like using native/json as the intermediate format would resolve this, no?
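As a sketch of that suggestion, the two runs could keep pandoc's native AST between them instead of Markdown. The Python below only builds the argument lists; the helper name and file names are hypothetical. It passes `-s` on the first run on the assumption that document metadata should be kept, since native output without `--standalone` contains only the block list.

```python
from __future__ import annotations


def native_pipeline(src: str, tmp: str, out: str) -> list[list[str]]:
    """Two pandoc runs that use the native AST format as the intermediate file.

    Run 1 applies pandoc-crossref and serializes the whole document AST
    (-s keeps the metadata); run 2 reads that AST back and writes docx, so
    the Figure element never passes through Markdown or raw HTML.
    """
    first = ["pandoc", src, "--filter", "pandoc-crossref",
             "-s", "-t", "native", "-o", tmp]
    second = ["pandoc", "-f", "native", tmp, "-o", out]
    return [first, second]


# Print the commands instead of running them (placeholder file names).
for cmd in native_pipeline("Ch3.md", "Ch3_tmp.native", "Ch3.docx"):
    print(" ".join(cmd))
```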
Honestly, Markdown-to-Markdown conversions were never a target, and in Pandoc, Markdown is not guaranteed to round-trip in the first place. I could make a patch changing the alt text to match the caption, though 🤷
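The idea of making the alt text match the caption can be sketched as a Python pass over pandoc's JSON AST (the format `pandoc -t json` emits and `--filter` programs receive on stdin). This is not the actual pandoc-crossref patch. It assumes the pandoc 3 layout of a Figure as `[attr, [short_caption, caption_blocks], body_blocks]` and of an Image as `[attr, alt_inlines, [url, title]]`, and the function names are hypothetical.

```python
from __future__ import annotations

import copy
import json
import sys
from typing import Any


def caption_inlines(caption_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Concatenate the inlines of the caption's Plain/Para blocks."""
    inlines: list[dict[str, Any]] = []
    for block in caption_blocks:
        if block.get("t") in ("Plain", "Para"):
            inlines.extend(block["c"])
    return inlines


def align_alt_with_caption(node: Any) -> Any:
    """Walk a pandoc JSON AST and, inside every Figure, replace the alt-text
    inlines of each Image sitting in a Plain/Para body block with a copy of
    the figure's caption inlines."""
    if isinstance(node, list):
        for item in node:
            align_alt_with_caption(item)
    elif isinstance(node, dict):
        if node.get("t") == "Figure":
            _attr, (_short, caption_blocks), body = node["c"]
            caption = caption_inlines(caption_blocks)
            for block in body:
                if block.get("t") in ("Plain", "Para"):
                    for inline in block["c"]:
                        if inline.get("t") == "Image":
                            inline["c"][1] = copy.deepcopy(caption)
        # Recurse into children so nested Figures are handled too.
        for value in node.values():
            align_alt_with_caption(value)
    return node


def main() -> None:
    """JSON-filter entry point: read the AST from stdin, write it to stdout."""
    doc = json.load(sys.stdin)
    align_alt_with_caption(doc["blocks"])
    json.dump(doc, sys.stdout)
```

To try it as a filter, one would save it as an executable script that calls `main()` and pass it with `--filter`. With identical caption and alt text, the markdown writer should then be able to choose an implicit figure, per the reasoning earlier in the thread.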
The reason my workflow depends (or depended) on intermediate Markdown files is chapter-wise references (chapter-wise bibliographies, if I remember correctly) :D. A few years ago I read about this idea in the Google discussion group (I cannot find it now since the group is not accessible...).
I don't necessarily see why that would prevent you from using
Anyway, probably worth making the change regardless. This should work: lierdakil/pandoc-crossref@5f2b087 There is a bit of a twist, however. In some cases, pandoc-crossref will add attributes on the Figure element. If that happens, the resulting figure is impossible to represent in Markdown any more, so Pandoc will go back to representing it as raw HTML (if enabled) or nested divs. This does require explicit opt-in via pandoc-crossref configuration, and I don't really see a workaround, so I'm inclined to leave it be. @jiucenglou if you could test this commit for your use case and report back, that would be nice. Automatic builds will (edit: well, should, can't promise that, CI is a bit flaky) become available at the following links once CI finishes (in an hour or two probably):
P.S. I'll probably make a proper release tomorrow, lest I forget.
Many thanks! Using the command line syntax below to use native as an intermediate format seems to work very well.
I can test and report back. Would you suggest keeping native as the intermediate format even with the new patch?
I don't know the particulars of your setup, so it's up to you. If you don't really care about the intermediate format,
In my real use case, the two command lines look like:
"${Pandoc}" "${Header}" "${TmpMd2}" -F pandoc-crossref --citeproc --csl="${CiteStyle}" -t markdown-citations -o "${TmpMd3}" --wrap=preserve --resource-path=$(dirname "${TmpMd2}")
"${Pandoc}" "${Header}" "${TmpMd3}" --fail-if-warnings -L Dry12_for_docx.lua -L skip_placeholder.lua -L mhchem.lua --reference-doc="${RefWordDocx}" -s -o "${MSWord}"
I mean, the first run has a
As shown below, I tried native on my real use case and I got
Meanwhile, it turns out I forgot to update some tests, so that CI build failed. Anyway, I'll just cut a release, I guess, and we can do another one if this doesn't work out for some reason. For future reference, 0.3.17.1 (artefacts not yet built, but this time CI should finish fine 🤞)
Thanks @lierdakil - it looks like this isn't going to require pandoc changes, so I'll close this issue.
Explain the problem.
I have a folder tree of
Ch3.md is
and I am using the following two runs to get docx
As shown below for the intermediate Ch3_tmp.md, the latest pandoc & pandoc-crossref start to specify the figure using "<figure". However, the second run above now generates a docx file without the figure in it...
With pandoc 2.19 and the compatible pandoc-crossref, "<figure" is not yet used, and the resulting docx file contains the figure. Could you suggest what I could do to generate a docx file with the figure in it using the latest pandoc?
Many thanks !
Pandoc version?
latest 3.13