Word docx to adoc conversion feature #144

Closed
Intelligent2013 opened this issue May 16, 2022 · 12 comments
Labels
enhancement New feature or request


@Intelligent2013
Contributor

To do: add conversion of Word documents (.docx format) to adoc, based on the Word document's styles:

  • 'Foreword Title'
  • 'Foreword Text'
  • 'std_docNumber'
  • 'zzSTDTitle'
  • 'Heading 1'
  • 'Heading 2'
  • 'TermNum'
  • 'Term(s)'
  • 'Example'
  • 'Note'
  • 'RefNorm'
  • 'Source'
  • etc.
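A minimal Python sketch of style-driven conversion, assuming each paragraph's style is read from its `w:pPr/w:pStyle/@w:val` style ID (for example, the 'Heading 1' style is usually stored as `Heading1`). `STYLE_TO_ADOC` and the helper names are hypothetical and cover only a handful of styles; this is not mnconvert's implementation.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# WordprocessingML main namespace (ECMA-376), in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical, partial mapping from paragraph style IDs to adoc line prefixes.
STYLE_TO_ADOC = {
    "ForewordTitle": "== ",
    "Heading1": "== ",
    "Heading2": "=== ",
    "Terms": "=== ",
}


def read_document_xml(docx_path: str) -> bytes:
    """A .docx file is a ZIP package; the main body lives in word/document.xml."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml")


def paragraphs(document_xml: bytes) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (style_id, text) for every w:p, taking the style from w:pPr/w:pStyle/@w:val."""
    root = ET.fromstring(document_xml)
    for p in root.iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else None
        # Concatenate the visible text of all runs (w:t); deleted text sits in w:delText.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style_id, text


def to_adoc(document_xml: bytes) -> str:
    """Emit one adoc paragraph per Word paragraph; unmapped styles become plain text."""
    blocks = []
    for style_id, text in paragraphs(document_xml):
        blocks.append(STYLE_TO_ADOC.get(style_id, "") + text)
    return "\n\n".join(blocks)
```

Usage would be `to_adoc(read_document_xml("input.docx"))`. Keeping the mapping in a plain table makes it easy to extend style by style as the list above grows.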
@Intelligent2013 added the enhancement (New feature or request) label May 16, 2022
@Intelligent2013 self-assigned this May 16, 2022
@Intelligent2013
Contributor Author

document.xml (WordprocessingML) from the docx should first be cleaned of deleted text, because such text can't be converted into adoc and significantly increases the complexity of the conversion:

Deleted text on the cover page:
[screenshot]

Deleted items in the Bibliography:
[screenshot]

Deleted 'obligation' for an Annex:
[screenshot]
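A minimal sketch of the pre-cleaning step, assuming the deleted text is stored as Word tracked-change markup, i.e. `w:del` elements wrapping runs whose text is in `w:delText`. The function name `strip_deleted` is hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def strip_deleted(document_xml: bytes) -> bytes:
    """Drop tracked deletions (w:del and everything inside it) from document.xml."""
    ET.register_namespace("w", W_NS)  # keep the familiar w: prefix on output
    root = ET.fromstring(document_xml)
    # ElementTree has no parent pointers, so visit each element and remove
    # its w:del children; list() snapshots avoid mutating while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == W + "del":
                parent.remove(child)
    return ET.tostring(root)
```

A `w:del` inside `w:rPr` (a deleted paragraph mark) is removed as well, but this sketch does not merge the affected paragraphs. The output is meant to feed the conversion, not to be reopened in Word.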

@ronaldtse
Contributor

@Intelligent2013 regarding the styles, perhaps you can work with @opoudjis on an authoritative list of these styles and their mapping for metanorma.org? That would make users' lives much easier too. Thanks!

@Intelligent2013
Contributor Author

@ronaldtse as I understand it, the Word docx to Metanorma adoc converter should support the following styles:

  • docx styles from the metanorma gem;
  • docx styles from the ISO template (i.e. a Word document created directly in Word by ISO);
  • a mix of both.

Right?

@ronaldtse
Contributor

@Intelligent2013 No, that's a misunderstanding.

There are only 2 types of styles:

  1. ISO Simple Template styles.
  2. ISO Edited DIS template styles.

Metanorma is supposed to generate both types of Word templates. If Metanorma creates content outside these styles, it's wrong.

There is no mix of styles. A document can either be in the ISO Simple Template, or in the ISO Edited DIS template.

@Intelligent2013
Contributor Author

Intelligent2013 commented May 24, 2022

DRAFT mapping table between docx styles (ISO Simple Template) and adoc:

| ISO Simple Template docx style | Internal XML style | adoc example | mnconvert support |
|---|---|---|---|
| ANNEX | ANNEX | `[[AnnexA]]` | + |
| | | `[appendix,obligation=normative]` | |
| | | `== Determination of defects` | |
| Admonition | Admonition | | |
| AdmonitionTitle | AdmonitionTitle | | |
| AltTerm(s) | AltTerms | `alt:[paddy rice]` | + |
| | | `alt:[rough rice]` | |
| Annex Figure Title Char | AnnexFigureTitleChar | | |
| Annex Figure Title | AnnexFigureTitle | `.Split-it-right sample divider` | + |
| Annex Table Title Char | AnnexTableTitleChar | | |
| Annex Table Title | AnnexTableTitle | `.Table title` | + |
| Annex Table TitleCxSpFirst | AnnexTableTitleCxSpFirst | | |
| Annex Table TitleCxSpLast | AnnexTableTitleCxSpLast | | |
| Annex Table TitleCxSpMiddle | AnnexTableTitleCxSpMiddle | | |
| Biblio Title | BiblioTitle | | |
| biblio | Biblio | | |
| Code | Code | `[source]` | + |
| | | `--` | |
| | | `puts "Hello, world."` | |
| | | `--` | |
| Definition Char | DefinitionChar | | |
| Definition | Definition | `domain:[rice]` | + |
| | | `organic and inorganic components other than whole or broken kernels` | |
| DeprecatedTerm(s) | DeprecatedTerms | `deprecated:[cargo rice]` | + |
| Example Char | ExampleChar | | |
| Example | Example | `====` | + |
| | | `Example text` | |
| | | `====` | |
| Figure Title Char | FigureTitleChar | | |
| Figure Title | FigureTitle | `.Split-it-right sample divider` | + |
| Figure TitleCxSpFirst | FigureTitleCxSpFirst | | |
| Figure TitleCxSpLast | FigureTitleCxSpLast | | |
| Figure TitleCxSpMiddle | FigureTitleCxSpMiddle | | |
| Foreword Text Char | ForewordTextChar | | |
| Foreword Text | ForewordText | `Text` | + |
| Foreword Title | ForewordTitle | `== Foreword` | + |
| Formula | Formula | | |
| Heading 1 | Heading1 | `== Title` | + |
| Heading 2 | Heading2 | `=== Title` | + |
| Heading 3 | Heading3 | `==== Title` | + |
| Heading 4 | Heading4 | `===== Title` | + |
| Heading 5 | Heading5 | `====== Title` | + |
| Heading 6 | Heading6 | `[level=6]` | + |
| | | `====== Title` | |
| Heading 7 | Heading7 | `[level=7]` | + |
| | | `====== Title` | |
| Intro Title | IntroTitle | `== Introduction` | + |
| List Paragraph Char | Char2 | | |
| normref | NormRef | | |
| Note Char | NoteChar | | |
| Note | note | `NOTE: This category includes ...` | + |
| Quote | Quote | | |
| quoteattribution | QuoteAttribution | | |
| RecommendationTitle | RecommendationTitle | | |
| Source Char | SourceChar | | |
| Source | Source | `<<ISO_7301_2011,clause=3.1>>` | + |
| SourceTitle | SourceTitle | | |
| Sourcecode | Sourcecode | | |
| Table Grid | MsoTableGrid | | |
| Table ISO | MsoISOTable, MsoISOTableBig | | |
| Table title Char | TabletitleChar | | |
| Table titleCxSpFirst | TabletitleCxSpFirst | | |
| Table titleCxSpLast | TabletitleCxSpLast | | |
| Table titleCxSpMiddle | TabletitleCxSpMiddle | | |
| TableFootnote | TableFootnote | | |
| Tabletitle | tabletitle | `.Table title` | + |
| Term(s) | Terms | `=== term's name` | + |
| TermNum | TermNum | skip | + |
| titlepagesubhead | TitlePageSubhead | | |
| TOC x | TOCx | skip | + |
| boilerplate-address | boilerplate-address | | |
| boilerplate-copyright | boilerplate-copyright | | |
| boilerplate-name | boilerplate-name | | |
| content | content | | |
| coverpage | coverpage | | |
| coverpage-doc-identity | coverpage-doc-identity | | |
| coverpage-logo | coverpage-logo | | |
| coverpage-stage-block | coverpage-stage-block | | |
| coverpage-tc-name | coverpage-tc-name | | |
| coverpage-title | coverpage-title | | |
| coverpage-warning | coverpage-warning | | |
| coverpage_docnumber | coverpage_docnumber | | |
| coverpage_docstage | coverpage_docstage | | |
| coverpage_techcommittee | coverpage_techcommittee | | |
| coverpage_warning | coverpage_warning | | |
| doctitle | doctitle | | |
| example_label | example_label | | |
| figdl | figdl | | |
| formula | formula | | |
| formula_dl | formula_dl | | |
| msotoctextspan | MsoTocTextSpan | | |
| note_label | note_label | | |
| pseudocode | pseudocode | | |
| table.dl | table.dl | | |
| tablefootnoteref | tablefootnoteref | skip | + |
| title | title | | |
| title-second | title-second | | |
| zzAddress | zzaddress | skip | + |
| zzContents | zzContents | skip | + |
| zzCopyright | zzCopyright | skip | + |
| zzSTDTitle | zzSTDTitle, zzSTDTitle1, zzSTDTitle2 | | |
| zzwarning | zzWarning | | |
| zzwarninghdr | zzWarningHdr | | |
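The heading and 'skip' rows of the draft table follow a regular pattern. A minimal Python sketch, assuming the 'internal xml style' column gives the style IDs seen in the docx and that 'skip' means the paragraph is dropped from the adoc output; `SIMPLE_TEMPLATE_MAP`, `heading` and `convert` are hypothetical names covering only a few rows.

```python
from typing import Optional

# A few rows of the draft table, keyed by the "internal xml style" column.
# None marks rows whose adoc example is "skip" (assumed: dropped from output).
SIMPLE_TEMPLATE_MAP = {
    "ForewordTitle": "== Foreword",
    "IntroTitle": "== Introduction",
    "TermNum": None,
    "TOCx": None,
    "zzContents": None,
    "zzCopyright": None,
}


def heading(level: int, title: str) -> str:
    """Heading N -> adoc section title, following the Heading 1..7 rows."""
    if not 1 <= level <= 7:
        raise ValueError(f"unsupported heading level: {level}")
    if level <= 5:
        # Heading 1 is '==', Heading 2 is '===', ..., Heading 5 is '======'.
        return "=" * (level + 1) + " " + title
    # Heading 6 and 7 reuse the deepest marker and state the level explicitly.
    return f"[level={level}]\n====== {title}"


def convert(style_id: str, text: str) -> Optional[str]:
    """Return the adoc for one paragraph, or None when the style is skipped."""
    if style_id.startswith("Heading") and style_id[7:].isdigit():
        return heading(int(style_id[7:]), text)
    if style_id in SIMPLE_TEMPLATE_MAP:
        return SIMPLE_TEMPLATE_MAP[style_id]
    return text
```

Returning `None` for skipped styles lets the caller filter them out in one place rather than special-casing each style.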

@ronaldtse
Contributor

@Intelligent2013 just a reminder that there is also the ISO Edited DIS template, which uses different styles from the ISO Simple Template. @opoudjis has been generating the Edited DIS style for DIS/FDIS/final ISO Word documents, and that work is now very close to completion.

@Intelligent2013
Contributor Author

@ronaldtse yes, I remember, thank you.

@Intelligent2013
Contributor Author

Intelligent2013 commented May 30, 2022

Further tasks:

  • process review comments (resolve comment references against comments.xml)
  • process _term_ (<<term>>) as term:[ ]
  • remove Annex clause numbers (example: A.1 Principle)
  • process inline clause numbers such as '4.2.1'
  • add each clause's id from w:bookmarkStart
  • convert Word math markup to adoc or MathML
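Two of these tasks (dropping Annex clause numbers from headings and taking the clause id from `w:bookmarkStart`) can be sketched together. This is a minimal Python sketch assuming the numbering formats shown in the regex and that the first bookmark in a heading paragraph names the clause; `strip_clause_number`, `clause_id` and `heading_to_adoc` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Assumed numbering formats: annex clauses like "A.1" / "B.2.3",
# body clauses like "4" / "4.2.1", followed by whitespace.
CLAUSE_NUMBER = re.compile(r"^(?:[A-Z](?:\.\d+)+|\d+(?:\.\d+)*)\s+")


def strip_clause_number(title: str) -> str:
    """'A.1 Principle' -> 'Principle'; titles without a number are unchanged."""
    return CLAUSE_NUMBER.sub("", title, count=1)


def clause_id(paragraph: ET.Element) -> Optional[str]:
    """Take the first w:bookmarkStart/@w:name in the paragraph as the clause anchor."""
    bookmark = paragraph.find(f".//{W}bookmarkStart")
    return bookmark.get(W + "name") if bookmark is not None else None


def heading_to_adoc(paragraph: ET.Element, marker: str) -> str:
    """Render a heading paragraph as an optional [[anchor]] line plus the adoc title."""
    title = "".join(t.text or "" for t in paragraph.iter(W + "t"))
    anchor = clause_id(paragraph)
    prefix = f"[[{anchor}]]\n" if anchor else ""
    return f"{prefix}{marker} {strip_clause_number(title)}"
```

Requiring at least one `.N` after an annex letter keeps ordinary titles such as "A guide to ..." intact. Word's own hidden bookmarks (names starting with `_`) are not filtered in this sketch.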

@Intelligent2013
Contributor Author

Added conversion from Word math markup (OMML) to MathML and AsciiMath (for simple formulas like r or a=b+c).
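For the simple cases, the AsciiMath path can be sketched as follows: an OMML `m:oMath` that contains only plain runs (`m:r`/`m:t`) is flattened to its text, and anything structured returns `None` so it can go through a MathML conversion instead. This is a minimal Python sketch under that assumption, not the converter's actual code; `omml_to_asciimath` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Office Math Markup Language (OMML) namespace.
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"


def omml_to_asciimath(omath: ET.Element) -> Optional[str]:
    """Return AsciiMath for an m:oMath built only from plain runs, else None."""
    parts = []
    for child in omath:
        if child.tag != M + "r":
            # Fractions (m:f), scripts (m:sSup, m:sSub), radicals (m:rad), ...
            # are out of scope here and would go through the MathML path instead.
            return None
        parts.append("".join(t.text or "" for t in child.iter(M + "t")))
    return "".join(parts)
```

In adoc the result would then be wrapped in a stem macro, e.g. `stem:[a=b+c]`.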

@Intelligent2013
Contributor Author

This ticket has grown too long, so I'll split it into two, one for the 'ISO Edited DIS Template' and one for the 'ISO Simple Template', to track further development.


2 participants