Customization guidelines #189

Open
slochower opened this issue Mar 13, 2019 · 9 comments

Comments

@slochower
Collaborator

This is a followup to some of the discussion in #169, viz. #169 (comment).

I see two possible directions for detailing the customization options.

  1. Is it useful (and feasible) to describe the common customizations one might want for writing a typical journal or grant? The motivation is: if I use Manubot to write a grant, I don't want the readers thinking "why does this document look different to the others" or worse "this document doesn't conform to our formatting specifications and we won't consider it". Dealing with formatting is (unfortunately) still a real problem. Personally, I can work in Word and I can usually figure out how to mess with formatting to conform to requirements (and I can leave this to the last minute), but if I write with Manubot, how much time should I allocate for tweaking the formatting? A few hours? A few days so I can post Issues and wait for feedback? I think it's murky. Is our recommendation, on behalf of Manubot, to tell users with formatting requirements to export to DOCX and do the formatting there? Or do we want to detail how to customize the CSS? Say I want to submit an article to Nature, I need to make sure I don't exceed 5 pages, try to use Times New Roman, 12 pt, with Greek letters for math, put tables on a separate page with the description double spaced, figures as small as possible, with amino acid sequences in Courier, etc... Do we want to show how to write a multi-font document, resize figures inline, put in manual page breaks, write distinct divs for different body elements? Do we even support page numbers (I don't think most print PDF from HTML formatters do)? I'm not sure what the guidance should be...

  2. How easy is it to get rid of some of the plugins if a user wishes? This is comparatively easy. In USAGE.md, we write that each plugin is enabled during build.sh and can be individually turned off.

@dhimmel
Member

dhimmel commented Mar 13, 2019

Is it useful (and feasible) to describe the common customizations one might want for writing a typical journal or grant?

Regarding complying with specific submission formats, I think we should consider focusing on the DOCX export rather than the HTML styling. My reasoning is:

  • even if most formatting options can be applied to the HTML view via CSS, there will usually be some that cannot, requiring using a word processor for the final touches.
  • many places with strict formatting requirements will require the source submission as DOCX or LaTeX. Therefore, formatting effort put into the HTML/PDF may have to be repeated in the future.

Based on these reasons, I suggest we explore DOCX output with Pandoc's --reference-doc option:

Use the specified file as a style reference in producing a docx or ODT file. For best results, the reference docx should be a modified version of a docx file produced using pandoc. The contents of the reference docx are ignored, but its stylesheets and document properties (including margins, page size, header, and footer) are used in the new docx.

Thus --reference-doc should be able to deal with font, sizing, and other formatting stipulations. Perhaps, we can find reference-docs for existing journals somewhere, or potentially create a catalog of community contributions.
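
For reference, a minimal sketch of that workflow, assuming Pandoc is installed. The file names (manuscript.md, custom-reference.docx, manuscript.docx) are placeholders and not paths from the Manubot build. The first Pandoc command, which dumps the default reference document, is the one described in the Pandoc manual.

```shell
#!/bin/sh
set -eu

# Placeholder file names (assumptions, not paths used by Manubot).
INPUT="manuscript.md"
REFERENCE_DOC="custom-reference.docx"
OUTPUT="manuscript.docx"

# Print the DOCX build command so it can be inspected without running Pandoc.
docx_command() {
  printf 'pandoc %s --reference-doc=%s --output=%s\n' \
    "$INPUT" "$REFERENCE_DOC" "$OUTPUT"
}

if command -v pandoc >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  # Step 1: write Pandoc's default reference.docx to disk as a starting point.
  # Open it in a word processor and edit the styles (fonts, sizes, margins).
  if [ ! -f "$REFERENCE_DOC" ]; then
    pandoc -o "$REFERENCE_DOC" --print-default-data-file reference.docx
  fi
  # Step 2: build the manuscript using the styles from the reference document.
  pandoc "$INPUT" --reference-doc="$REFERENCE_DOC" --output="$OUTPUT"
else
  echo "pandoc or $INPUT not found; would run: $(docx_command)"
fi
```

Only the styles and document properties of the reference file carry over, so any journal-specific text in it is ignored.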

Now there are also cases where individuals want a different HTML/PDF style. I believe CSS provides some customization and hopefully users can modify default.html to make minor updates as needed. We also can potentially provide some turnkey configuration for the CSS. However, I am skeptical whether it will ever be easy enough for most users. On the other hand, editing a DOCX file does seem like it will be more accessible.
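
As a rough illustration of the CSS route, here is a sketch that writes a small style override and passes it to Pandoc with --include-in-header. The file names and the .sequence and .page-break class names are hypothetical, and whether these rules take precedence over the Manubot theme depends on where the theme's own CSS is inserted, which this sketch does not address.

```shell
#!/bin/sh
set -eu

# Hypothetical file name for the style override.
STYLE_FILE="custom-style.html"

# Write a small <style> block: serif 12 pt body text, monospace sequences,
# and a class that forces a page break when the HTML is printed to PDF.
cat > "$STYLE_FILE" <<'EOF'
<style>
  body { font-family: "Times New Roman", Times, serif; font-size: 12pt; }
  .sequence { font-family: "Courier New", Courier, monospace; }
  .page-break { page-break-before: always; }
</style>
EOF

INPUT="manuscript.md"      # placeholder input
OUTPUT="manuscript.html"   # placeholder output

if command -v pandoc >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  # --include-in-header inserts the file's contents into the HTML <head>.
  pandoc "$INPUT" --standalone --include-in-header="$STYLE_FILE" --output="$OUTPUT"
else
  echo "Wrote $STYLE_FILE; pandoc or $INPUT not found, skipping HTML build."
fi
```

Even this small override assumes the user is comfortable with CSS selectors and print rules, which is part of why the DOCX route may be more approachable.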

Finally, our long-term goal would be submission using JATS XML, but this is years off as no journals I'm aware of actually accept JATS submissions at the moment.

@slochower
Collaborator Author

I think we should consider focusing on the DOCX export rather than the HTML styling

Okay, fair points.

Thus --reference-doc should be able to deal with font, sizing, and other formatting stipulations. Perhaps, we can find reference-docs for existing journals somewhere, or potentially create a catalog of community contributions.

I've used this a bit, but it still leaves some things to be desired (e.g., keeping captions with Figures). It may be the best we can do, but it will be good to put up some guidance.

Do you have things to add @agitter ?

@agitter
Member

agitter commented Mar 13, 2019

I don't have much to add other than to say that addressing these practical formatting concerns will be important if we want to scale up to be a writing platform that researchers can use for their everyday needs. DOCX export does seem like our best option for now, but it is a time sink if the formatting has to be manually fixed many times when writing.

Some of the problems and solutions may become more apparent as we accrue examples of working with Manubot for different use cases. Your grant writing example was enlightening. So far with deep review and meta review the journals did not impose strict formatting requirements, so I have not had to deal with them much.

We could reach out to the few other users who have submitted Manubot manuscripts to journals to learn about their strategies.

@slochower
Collaborator Author

We could reach out to the few other users who have submitted Manubot manuscripts to journals to learn about their strategies.

Good idea! @dhimmel, I suspect you have the most knowledge of who is using Manubot. I wonder if we should make it easy for users to add a reference DOCX here via PR or we should create a templates repository under the main Manubot organization repo.

@dhimmel
Member

dhimmel commented Mar 13, 2019

I wonder if we should make it easy for users to add a reference DOCX here via PR or we should create a templates repository under the main Manubot organization repo

Another option besides a new repo to host reference-docs could be a directory in https://github.com/manubot/resources. Let's check whether Pandoc supports --reference-doc=URL like it does for --csl=URL.
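
Until that is checked, one way to sidestep the question is to download the reference document first and hand Pandoc a local path. A rough sketch, assuming curl is available; the URL and file names are hypothetical placeholders, not files that exist in manubot/resources:

```shell
#!/bin/sh
set -eu

# Hypothetical URL; replace with the real location of a reference DOCX.
REFERENCE_URL="https://example.org/reference-docs/journal.docx"

# Derive a local file name from the last path component of the URL.
local_name() {
  basename "$1"
}

REFERENCE_DOC="$(local_name "$REFERENCE_URL")"

if command -v curl >/dev/null 2>&1 && command -v pandoc >/dev/null 2>&1 \
    && [ -f manuscript.md ]; then
  # --fail makes curl exit non-zero on HTTP errors; --location follows redirects.
  curl --fail --silent --show-error --location \
    --output "$REFERENCE_DOC" "$REFERENCE_URL"
  # Pass the downloaded file to Pandoc as a local reference document.
  pandoc manuscript.md --reference-doc="$REFERENCE_DOC" --output=manuscript.docx
else
  echo "Would download $REFERENCE_URL to $REFERENCE_DOC and pass it to --reference-doc"
fi
```

This works whether or not Pandoc itself accepts a URL for the option, at the cost of one extra step in the build.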

@vincerubinetti
Collaborator

vincerubinetti commented Mar 18, 2019

FWIW, I intend to eventually write exhaustive docs (that go into way more detail than could reasonably fit in a readme.md). We'll definitely cover what the plugins do, how to customize them, what the themes do, how to customize them, and more.

@dhimmel
Member

dhimmel commented Mar 21, 2019

A note for future reference: we should look at projects like pubcss to see how much styling is possible with CSS.

@rhagenson
Contributor

rhagenson commented Aug 27, 2019

Conversation began in #235

To better understand where this customization piece hooks into the conversion process I have a few questions:

  • Is there an existing document on the passes/sweeps that Manubot makes to transform from source to HTML/PDF/DOCX?
  • How much transformation is driven by Manubot and how much is Pandoc?
  • Is there any work done by Pandoc currently that needs to be undone before further processing by Manubot?

@dhimmel
Member

dhimmel commented Aug 28, 2019

Is there an existing document on the passes/sweeps that Manubot makes to transform from source to HTML/PDF/DOCX?

Almost all of the conversion process is done by the build.sh script, so that is the best place to look. The description in the Manubot software paper is probably a bit too general to be useful here. Prior to Pandoc, processing is done by the manubot process command:

rootstock/build/build.sh

Lines 16 to 20 in bc094bd

manubot process \
  --content-directory=content \
  --output-directory=output \
  --cache-directory=ci/cache \
  --log-level=INFO

How much transformation is driven by Manubot and how much is Pandoc?

Most of the transformation is driven by Pandoc. However, Manubot does a lot of the citation/bibliographic processing (the whole cite-by-persistent-identifier stuff). In addition, we use customized Pandoc commands to, for example, insert custom HTML / CSS / JS into the manuscript.html output.

As a general guideline, we'd like to delegate as much as possible to Pandoc. Ideally, we can avoid duplicating features. There is room for improvement here. For example, perhaps citation-by-identifier could be a pandoc filter as opposed to a separate workflow that must search through markdown (see manubot/manubot#99). Or perhaps we can use Pandoc templates for our custom HTML.

Is there any work done by Pandoc currently that needs to be undone before further processing by Manubot?

Not really.
