Support for chrome (and chromium) as PDF engines #7261

Closed
tarleb opened this issue May 1, 2021 · 6 comments

Comments

@tarleb
Collaborator

tarleb commented May 1, 2021

The Chrome and Chromium web browsers support printing of websites through the command line, e.g.

chromium --headless --print-to-pdf=OUTFILE.pdf WEBSITE

It would be nice if pandoc could generate PDFs via HTML by supporting chrome and chromium as PDF engines.
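For illustration, here is a minimal sketch of how a wrapper might build and run the command shown above. It is not how pandoc does or would do it. The helper names (`find_browser`, `print_to_pdf_argv`, `html_to_pdf`) and the list of candidate binary names are assumptions; only the `--headless` and `--print-to-pdf=` flags come from the command in this comment.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence

# Assumed binary names; the actual name differs between distributions.
CANDIDATES = ("chromium", "chromium-browser", "google-chrome", "chrome")


def find_browser(candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def print_to_pdf_argv(browser: str, source: str, outfile: str) -> List[str]:
    """Mirror: chromium --headless --print-to-pdf=OUTFILE.pdf WEBSITE"""
    return [browser, "--headless", f"--print-to-pdf={outfile}", source]


def html_to_pdf(source: str, outfile: str) -> None:
    """Run the browser; raises if no binary is found or the call fails."""
    browser = find_browser()
    if browser is None:
        raise FileNotFoundError("no Chrome/Chromium binary found on PATH")
    subprocess.run(print_to_pdf_argv(browser, source, outfile), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the output path contains spaces.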

There isn't much documentation on command line options, but the devtools interface has many more options. Maybe we can make use of them somehow. https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
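As a rough sketch of what using the devtools interface could involve: the protocol exchanges JSON messages, and `Page.printToPDF` returns the PDF as base64-encoded data. The code below only builds the request and decodes a reply; the transport (a WebSocket connection to the browser) is left out. The function names are hypothetical, and the three parameters are just a small selection of the options listed on the linked page.

```python
import base64
import json


def print_to_pdf_request(message_id: int, *, landscape: bool = False,
                         print_background: bool = True,
                         prefer_css_page_size: bool = True) -> str:
    """Serialize a Page.printToPDF call as a DevTools protocol message."""
    return json.dumps({
        "id": message_id,
        "method": "Page.printToPDF",
        "params": {
            "landscape": landscape,
            "printBackground": print_background,
            "preferCSSPageSize": prefer_css_page_size,
        },
    })


def pdf_bytes_from_response(raw: str) -> bytes:
    """Extract the PDF bytes from the reply to a Page.printToPDF call."""
    reply = json.loads(raw)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "unknown DevTools error"))
    # The result carries the document as a base64 string under "data".
    return base64.b64decode(reply["result"]["data"])
```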

@tarleb
Collaborator Author

tarleb commented May 1, 2021

Maybe a counter-argument, taken from chrome's man page:

Google Chrome has hundreds of undocumented command-line flags that are added and removed at the whim of the developers.

But this option has been supported for 4 years now, so it appears to be reasonably stable.

@tarleb
Collaborator Author

tarleb commented May 1, 2021

Closing again, because making the PDF look reasonably nice seems to be non-trivial; see, e.g., the chrome-headless-render-pdf npm package. Maybe adding support for a wrapper like pagedjs-cli would be a better time investment.

Sorry for the noise.

tarleb closed this as completed May 1, 2021
@mb21
Collaborator

mb21 commented May 2, 2021

I've been wondering about this as well! And I'm still a bit surprised that there's no official (or at least commonly used) command-line wrapper around that use case of Chromium (analogous to wkhtmltopdf). The closest thing I could find was puppeteer-cli, which is a wrapper around Puppeteer. Puppeteer is a Node.js package that seems to be the most commonly used way to access Chromium over the DevTools Protocol, and it seems to be fairly stable (I think it's even developed by Google itself). We could of course also try to call the DevTools Protocol directly from Haskell, but that's probably not the right approach. I'd rather just add another --pdf-engine option.

Note that pagedjs is a different beast. It's a JavaScript library, meant to run in a browser, that's basically a polyfill for the CSS paged media specs that no browser has thoroughly implemented so far. (I've added it to PanWriter for the paginated preview and it works 'okay'...)

A quick search turned up this blog post, which says much the same thing.

@tarleb
Collaborator Author

tarleb commented May 2, 2021

There is also a pagedjs-cli package on npm, which provides a convenient interface to go from HTML to PDF.

pagedjs-cli input.html -o output.pdf

I tried it on the manual, and as you say, it's not perfect but quite nice already.
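A sketch of how the two steps could be chained, assuming both `pandoc` and `pagedjs-cli` are on PATH. The function names and the choice to write the intermediate HTML next to the PDF are illustrative assumptions; the `pagedjs-cli` arguments follow the invocation shown above.

```python
import subprocess
from pathlib import Path
from typing import List


def pipeline_commands(source: str, pdf: str) -> List[List[str]]:
    """Return the two commands: source -> standalone HTML -> PDF."""
    # Intermediate HTML file placed alongside the PDF (an arbitrary choice).
    html = str(Path(pdf).with_suffix(".html"))
    return [
        ["pandoc", "--standalone", source, "-o", html],
        ["pagedjs-cli", html, "-o", pdf],
    ]


def run_pipeline(source: str, pdf: str) -> None:
    """Run both steps in order, stopping at the first failure."""
    for argv in pipeline_commands(source, pdf):
        subprocess.run(argv, check=True)
```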

@mb21
Collaborator

mb21 commented May 2, 2021

Right, so while puppeteer-cli runs Chromium directly on the input HTML/CSS, pagedjs-cli also injects the paged.js JavaScript and puts in divs etc. to do the pagination, before running Chromium on it...

Neither package seems to have gained much adoption (yet), so it's a bit hard to justify adding either one to pandoc as a --pdf-engine option... or what do you think? We could at least keep the issue open?

@tarleb
Collaborator Author

tarleb commented May 2, 2021

I just found #6126, which is probably a better placeholder than this issue. I'll add a comment about pagedjs there.
