Allow specifying Inkscape for svg conversion tool (as well as rsvg-convert) #8176

Closed
fuhrmanator opened this issue Jul 11, 2022 · 18 comments

@fuhrmanator

Describe your proposed improvement and the problem it solves.

Some SVG files (possibly those containing certain foreignObject elements) convert better to PDF (or other destination formats) with Inkscape than with rsvg-convert. It would be good to have a way to specify which SVG converter to use.

Describe alternatives you've considered.

I've considered various filters (gists, python, hs), but none of them seem to be maintained by a community.
ConTeXt apparently supports inkscape (if it's installed), but for my project I can't use ConTeXt.

@jgm
Owner

jgm commented Jul 12, 2022

What would be the command for converting on the command line via inkscape?

@fuhrmanator
Author

I'm not sure I understand the question, but I could suggest for pandoc:

--svg-converter=CONVERTER

CONVERTER is either rsvg-convert or inkscape

Or are you asking about how to convert with inkscape:

inkscape --export-filename=sample.pdf sample.svg

I believe the extension of the export filename determines the destination format (pdf, png, etc.)

There is documentation here.

@jgm
Owner

jgm commented Jul 12, 2022

Is inkscape strictly better than rsvg-convert? Or is it better in some cases and worse in others? If it's strictly better, perhaps we could always use inkscape if available and fall back on rsvg-convert. Then we wouldn't need a new option.

@fuhrmanator
Author

> Is inkscape strictly better than rsvg-convert? Or is it better in some cases and worse in others?

I think it's the latter. I know that Inkscape has been involved in SVG as a standard for a while, e.g., https://inkscape.org/support-us/svg-standards-work/

Inkscape as a creation tool is very powerful and the things you can create with it are converted well to PDF (maintaining vectors -- there are examples I know of, especially with paths, that rsvg-convert rasterizes in the PDF).

But the tool itself is likely much bigger (it's a pretty big install on Windows, about 0.5 GB, for example).

I suspect the performance of both will also be different. Given a large batch of SVG files, rsvg-convert might be faster overall.

> If it's strictly better, perhaps we could always use inkscape if available and fall back on rsvg-convert.

I like that idea, but I'm biased in wanting to use Inkscape. It's easy enough to remove inkscape from one's path to prevent pandoc from using it, I suppose.

@tarleb
Collaborator

tarleb commented Jul 12, 2022

For the necessary commands see the respective function in the diagram-generator.lua filter. As can be seen there, the whole process is made more complex by having to check for the installed inkscape version, as the parameter names are not the same in v1 and v2.
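The version check described above can be sketched roughly as follows. This is not the code from diagram-generator.lua; it is a minimal Python illustration that assumes the split falls between pre-1.0 releases (which used `--export-pdf` and `--without-gui`) and 1.0 or later (which use `--export-filename`, as in the command quoted earlier in this thread). The function names are hypothetical, and the exact flags should be checked against the installed Inkscape.

```python
import re


def parse_inkscape_major(version_output: str) -> int:
    """Extract the major version from the text printed by `inkscape --version`."""
    match = re.search(r"Inkscape\s+(\d+)\.", version_output)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return int(match.group(1))


def inkscape_pdf_command(major: int, svg_path: str, pdf_path: str) -> list[str]:
    """Build an argument list for SVG-to-PDF conversion (assumed flag names)."""
    if major >= 1:
        # Assumption: 1.0+ infers the output format from the file extension.
        return ["inkscape", f"--export-filename={pdf_path}", svg_path]
    # Assumption: pre-1.0 releases need a format-specific export flag
    # and an explicit request to skip the GUI.
    return ["inkscape", "--without-gui", f"--export-pdf={pdf_path}", svg_path]
```

A caller would feed the output of `inkscape --version` to `parse_inkscape_major` and pass the resulting argument list to something like `subprocess.run`.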

@matclab

matclab commented Dec 21, 2023

If you arrived here because of some SVG produced by Mermaid, you can get rid of the foreignObjects by adding %%{init: {"flowchart": { "htmlLabels": false}} }%% before the flowchart or graph line.

@alerque
Contributor

alerque commented Dec 21, 2023

Regarding the original suggestion, inkscape continues to be problematic for scripted CLI usage. An unpredictable set of actions and/or input/output formats triggers the GUI to start and close even when running CLI actions, and the options regarding bounding boxes (page, bleed, crop boxes, etc.) are not tested from the CLI before releases and regularly break. I would suggest it is not well suited to baking into pandoc. Perhaps allowing a custom converter command to be specified at runtime would serve the purpose for specific scenarios, but inkscape is a relatively unstable target to try to bake in support for (speaking as someone who maintains baked-in support for scripted use in casile).
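To make the "custom converter command" idea concrete, here is a minimal Python sketch of one way such a command could be expanded. Nothing like this exists in pandoc; the `{input}` and `{output}` placeholders and the function name are assumptions made purely for illustration.

```python
import shlex


def build_custom_command(template: str, svg_path: str, out_path: str) -> list[str]:
    """Expand a user-supplied converter template into an argument list.

    The template is split into arguments first and the placeholders are
    substituted afterwards, so paths containing spaces stay in one argument.
    """
    if "{input}" not in template or "{output}" not in template:
        raise ValueError("template must contain both {input} and {output}")
    args = shlex.split(template)
    return [a.replace("{input}", svg_path).replace("{output}", out_path) for a in args]
```

For example, the template `inkscape --export-filename={output} {input}` would expand to a three-element argument list that can be passed to `subprocess.run` without going through a shell.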

@jgm jgm closed this as completed Dec 21, 2023
@Foadsf

Foadsf commented Oct 21, 2024

To hijack this discussion as well: the @matclab solution did not work for me, as explained here. Alternatively, one can use mermaid-cli to export the Mermaid diagrams as PDF:

mmdc -i path\to\input.md --outputFormat=pdf --pdfFit -o path\to\input_preprocessed.md

and then use Pandoc to convert it to PDF:

pandoc input_preprocessed.md -f markdown-implicit_figures -o path\to\output.pdf

Alternatively, if you prefer using mermaid-filter, then change your fenced code blocks from

```mermaid
```

to

```{.mermaid format=pdf}
```

and then

pandoc -F "%APPDATA%\npm\mermaid-filter.cmd" input.md -o output.pdf

Please note that my solution is for Windows. On POSIX-compatible platforms such as macOS and Linux, you will need to adapt the paths and commands.

@Foadsf

Foadsf commented Oct 21, 2024

@fuhrmanator I think an inherently better and more robust solution is to use a Chromium-based browser, such as Google Chrome or Microsoft Edge, in headless mode to convert SVG to PDF. See this post, for example:

msedge --headless --disable-gpu --print-to-pdf=<output-pdf-path> <input-svg-path>

@fuhrmanator
Author

Thanks @Foadsf ! That is super useful, and I've used headless Chrome in software testing pipelines.

Now, how to make it work in the pandoc pipeline? I'm using tools like Quarto and it defaults to rsvg-convert which fails on many of my SVG files, the impetus for this issue.

@Foadsf

Foadsf commented Oct 21, 2024

I'm totally a fan of your proposal to let users choose their SVG converter, but yeah, rsvg-convert, CairoSVG, Ghostscript, Inkscape, ... all seem to have issues with SVGs generated with Mermaid CLI. So I suppose the starting point should be to search Pandoc's code base, find the exact line(s) where it attempts to call rsvg-convert, and replace that call with a headless browser. Trial and error!

@jgm
Owner

jgm commented Oct 21, 2024

I don't want to reopen this issue, because I don't think inkscape is an improvement over rsvg-convert. But headless chrome might be. Feel free to open a new issue, including the command-line invocations that would work for both chrome and edge. (I guess we could always see what is in the path - edge, chrome, chromium, rsvg-convert - and use the best available option.)
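The "see what is in the path and use the best available option" idea could look roughly like the Python sketch below. The preference order and the executable names are assumptions (real names vary by platform, for example `google-chrome` or `chromium-browser` on some Linux systems), and this is not how pandoc currently selects a converter.

```python
import shutil
from typing import Callable, Optional, Sequence

# Hypothetical preference order: headless browsers first, rsvg-convert last.
DEFAULT_CANDIDATES = ("msedge", "chrome", "chromium", "rsvg-convert")


def pick_converter(
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

The `which` parameter is injectable only so the lookup can be exercised without the tools being installed; by default it is `shutil.which`.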

@fuhrmanator
Author

I was ready to open a new issue, and tried some conversions to verify the command-line info. But I got stuck because everything comes out in the PDF as US Letter size (I was able to turn off header/footer with --no-pdf-header-footer), which is definitely not good for my use cases. I verified in my SVG that my document size was constrained to the image (there's an <svg width="500px" height="500px" viewBox="0 0 500 500" ...> tag), but the PDF produced by msedge was still 8.5" x 11".

It seems edge/chrome/chromium don't have a command-line option to use the SVG's bounding box for the PDF page size. There are some work-arounds (see https://stackoverflow.com/questions/44970113/how-can-i-change-paper-size-in-headless-chrome-print-to-pdf/) if you put the SVG in an HTML document with a CSS setting of the bounding box, but that seems overly complicated compared to how the other tools work. ChatGPT suggested creating a script that uses Puppeteer (nodejs), which also doesn't seem viable for a pandoc pipeline.

@Foadsf did I miss something? Otherwise, without variable-sized PDF output, I can't see this as a good option.
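For reference, the HTML-wrapper workaround mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the SVG is assumed to be well-formed XML with `width` and `height` on its root element, and whether a given browser version honors the CSS `@page` size when printing headlessly has not been verified here.

```python
import re
import xml.etree.ElementTree as ET


def wrap_svg_for_print(svg_text: str) -> str:
    """Inline an SVG in an HTML page whose CSS page size matches the SVG."""
    root = ET.fromstring(svg_text)
    width, height = root.get("width"), root.get("height")
    if width is None or height is None:
        raise ValueError("root <svg> element lacks width/height attributes")
    # An XML declaration is not valid inside an HTML body, so drop it if present.
    inline_svg = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", svg_text)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><style>\n'
        f"@page {{ size: {width} {height}; margin: 0; }}\n"
        "html, body { margin: 0; }\n"
        "svg { display: block; }\n"
        "</style></head>\n"
        f"<body>{inline_svg}</body></html>\n"
    )
```

The resulting HTML file would then be passed to the browser in place of the SVG, using the same `--headless`, `--no-pdf-header-footer`, and `--print-to-pdf` options discussed in this thread.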

@Foadsf

Foadsf commented Oct 22, 2024

@fuhrmanator I am no expert here, but I would say manually hardcoding the page/frame size is not the best idea. According to this comment, Inkscape has the --export-area-drawing option to take care of that, and mmdc also has --pdfFit, which fits my use case better.

TBH, I have not tested the headless browser idea much beyond this post, but I think if we want to resolve these issues in a canonical way, we need to go upstream and open issues for rsvg-convert/cairosvg... Another option is to use Puppeteer, which AFAIK is also used by mmdc, and the output PDFs seem fine.

P.S.1. I see that somebody has already opened an issue here upstream.

P.S.2. Boy oh boy, the state of SVG rendering and conversion is really messy. The more I read about it the more I get convinced that SVG was probably a bad choice by the Mermaid team, and other projects such as Inkscape, as the intermediary markup language. Probably they would be better off using something like Postscript .eps. But if SVG is here to stay then maybe one could look into html2image, html2pdf, and/or html2canvas.

P.S.3. Apparently, Mozilla has solved the foreignObject support issue in their Gecko engine, but it seems like they have never published it as a standalone, modular and reusable library. More on that here. (Yeah, it seems like Mozilla has pivoted towards Direct2D since then.)

@Foadsf

Foadsf commented Oct 22, 2024

@jgm please check this simple Node.js script. Feel free to include it anywhere you need. More on this here.

@jgm
Owner

jgm commented Oct 22, 2024

@Foadsf this worked pretty well in my tests, but it produced a two-page PDF for a one-page SVG.

About the dependencies: I know nothing about 'puppeteer' - is this dependency something that would need to be installed?

@Foadsf

Foadsf commented Oct 22, 2024

@jgm I know nothing about Puppeteer either, but this is the same library that Mermaid CLI uses to generate the PDFs, and if I'm not mistaken this is basically equivalent to the proposal shared above about using the headless browser. I'm not sure if there is anything similar to Puppeteer for Haskell, but if someone has Mermaid CLI, they already have Node.js and Puppeteer. Feel free to take the code I shared in the Gist with a WTFPL license and alter it in any way you like. I think eventually something like this could replace rsvg-convert.

In parallel, I believe it should be possible to flatten and simplify the SVGs that include foreignObject in a way that librsvg, Cairo, and Inkscape can handle them. I have had limited success with Cairo so far, but not with the other two. I might share something here if I have any success. If this works, then Mermaid visualizations should practically compile with almost all other PDF engines Pandoc knows.
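Flattening itself is not attempted here, but a first step would be detecting which SVG files contain foreignObject elements at all. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes the SVG is well-formed XML that uses the standard SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def count_foreign_objects(svg_text: str) -> int:
    """Count <foreignObject> elements anywhere in an SVG document."""
    root = ET.fromstring(svg_text)
    # ElementTree writes namespaced tags as "{namespace}localname".
    return sum(1 for _ in root.iter(f"{{{SVG_NS}}}foreignObject"))
```

A pipeline could use a non-zero count to decide whether a file needs a different converter or a preprocessing pass before it reaches rsvg-convert.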

@tarleb
Collaborator

tarleb commented Oct 22, 2024

A past Haskell/Google Summer of Code project touched on this, cdp-hs. It should be possible to use that instead of puppeteer.
