Problems with .odt when converting to docx with LibreOffice #43

Closed
hcf-n opened this issue Feb 1, 2021 · 17 comments

@hcf-n

hcf-n commented Feb 1, 2021

When I convert to .odt with make4ht I get a file that works fine in LibreOffice. But when I try to save it as .docx I get some problems and LibreOffice refuses to convert. Investigating this, I tried the validator at https://odfvalidator.org. It seems that the .odt file from make4ht has some errors.

If this is something you could look into, I would happily make a test file to identify the errors. Since the validator supports several different versions of the .odt format, I wonder which one I should aim for in the tests.

best regards
Hans

@michal-h21
Owner

Yes, please make a small test file. Hopefully it will be just one command or package that causes the problems.

Best regards,
Michal

@hcf-n
Author

hcf-n commented Feb 7, 2021

I've looked into it, and for article-sized files it is difficult to pinpoint the exact LaTeX code that is causing problems. It seems to me that it is the styles that are causing problems. If I unzip the odt file produced by make4ht and simply delete styles.xml, it seems to work better. This is confirmed by the odfvalidator. The following MWE converts fine with make4ht:

\documentclass[11pt, a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc,url}
\usepackage{textcomp}
\begin{document}
\title{Placeholder for title}
\author{Firstname Lastname}
\date{\today}
\maketitle
Test.
\end{document}

But the resulting odt file does not validate. If I set the validator to ODF 1.0 Strict I get the following errors:

Details:

mwe2.odt: Info: ODF version of root document: 1.0
mwe2.odt/META-INF/manifest.xml: Warning: The directory 'Pictures/' is not a sub-document and should not be listed in the 'META-INF/manifest.xml' file of ODF package 'mwe2.odt'!
internal:/schema/odf1.0/OpenDocument-manifest-schema-v1.0-os.rng: Info: parsed.
mwe2.odt/META-INF/manifest.xml: Info: no errors, 1 warnings
mwe2.odt/mimetype: Info: no errors, no warnings
mwe2.odt: Info: Media Type: application/vnd.oasis.opendocument.text
internal:/schema/odf1.0/OpenDocument-strict-schema-v1.0-os.rng: Info: resolving 'internal:/schema/odf1.0/OpenDocument-schema-v1.0-os.rng'
internal:/schema/odf1.0/OpenDocument-strict-schema-v1.0-os.rng: Info: parsed.
mwe2.odt/meta.xml: Info: Generator: TeX4ht from mwe2.tex, options: xhtml,charset=utf-8,ooffice,html,refcaption (http://www.cse.ohio-state.edu/~gurari/TeX4ht/)
mwe2.odt/meta.xml: Info: no errors, no warnings
mwe2.odt/settings.xml: Info: no errors, no warnings
mwe2.odt/styles.xml[38,245]: Error: unexpected attribute "style:text-underline-height"
style:text-underline-color="#0000FF" /> </style:style> <style:style style:name=
----^
mwe2.odt/styles.xml[45,107]: Error: attribute "style:text-position" has a bad value
erties style:text-position="-25 100%"/> </style:style>
----^
mwe2.odt/styles.xml[47,107]: Error: attribute "style:text-position" has a bad value
operties style:text-position="15 70%"/> </style:style>
----^
mwe2.odt/styles.xml[53,304]: Error: unexpected attribute "style:background-transparency"
ansparency="100%" fo:margin-left="5%" > </style:graphic-properties> </style:sty
----^
mwe2.odt/styles.xml[76,306]: Error: unexpected attribute "fo:hyphenate"
ount="no-limit" style:page-number="0"/> </style:style>
----^
mwe2.odt/styles.xml[78,306]: Error: unexpected attribute "fo:hyphenate"
ount="no-limit" style:page-number="0"/> </style:style>
----^
mwe2.odt/styles.xml[80,330]: Error: unexpected attribute "fo:hyphenate"
ign="end" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[82,330]: Error: unexpected attribute "fo:hyphenate"
ign="end" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[121,112]: Error: unexpected attribute "style:num-format"
="1" text:min-label-distance="0.05in"/>
----^
mwe2.odt/styles.xml[124,112]: Error: unexpected attribute "style:num-format"
="a" text:min-label-distance="0.05in"/>
----^
mwe2.odt/styles.xml[127,111]: Error: unexpected attribute "style:num-format"
="i" text:min-label-distance="0.05in"/>
----^
mwe2.odt/styles.xml[130,111]: Error: unexpected attribute "style:num-format"
="A" text:min-label-distance="0.05in"/>
----^
mwe2.odt/styles.xml[172,222]: Error: unexpected attribute "fo:margin-top"
gin-left="0cm" fo:margin-right="0cm" /> </style:columns> </style:section-proper
----^
mwe2.odt/styles.xml[174,141]: Error: attribute "fo:column-count" has a bad value: "0" does not satisfy the "positiveInteger" type
:column-count="0" fo:column-gap="0cm"/> </style:section-properties > </style:st
----^
mwe2.odt/styles.xml[176,509]: Error: unexpected attribute "fo:font-size"
r-lines="false" text:line-number="0" /> </style:style>
----^
mwe2.odt/styles.xml[179,338]: Error: unexpected attribute "fo:font-size"
er-lines="false" text:line-number="0"/>
----^
mwe2.odt/styles.xml[191,377]: Error: unexpected attribute "fo:wrap-option"
-properties fo:wrap-option="no-wrap" /> </style:style> <style:style style:name=
----^
mwe2.odt/styles.xml[191,664]: Error: unexpected attribute "fo:wrap-option"
-properties fo:wrap-option="no-wrap" /> </style:style> <style:style style:name=
----^
mwe2.odt/styles.xml[195,51]: Error: unexpected attribute "fo:wrap-option"
<style:text-properties fo:wrap-option="no-wrap" />
----^
mwe2.odt/styles.xml[198,51]: Error: unexpected attribute "fo:wrap-option"
<style:text-properties fo:wrap-option="no-wrap" />
----^
mwe2.odt/styles.xml[239,112]: Error: unexpected attribute "style:text-underline"
erties style:text-underline="dotted" /> </style:style> <style:style style:name=
----^
mwe2.odt/styles.xml[240,304]: Error: unexpected attribute "fo:font-size"
er" style:justify-single-word="false"/> </style:style>
----^
mwe2.odt/styles.xml[242,332]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[244,154]: Error: unexpected attribute "fo:font-size"
er" style:justify-single-word="false"/> </style:style>
----^
mwe2.odt/styles.xml[246,182]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[248,154]: Error: unexpected attribute "fo:font-size"
er" style:justify-single-word="false"/> </style:style>
----^
mwe2.odt/styles.xml[250,182]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[253,192]: Error: unexpected attribute "fo:font-size"
="0cm" style:auto-text-indent="false"/>
----^
mwe2.odt/styles.xml[256,220]: Error: unexpected attribute "fo:font-size"
t="false" style:writing-mode="rl-tb" />
----^
mwe2.odt/styles.xml[260,156]: Error: unexpected attribute "fo:font-weight"
er" style:justify-single-word="false"/>
----^
mwe2.odt/styles.xml[264,193]: Error: unexpected attribute "fo:font-weight"
t="0cm" style:auto-text-indent="false">
----^
mwe2.odt/styles.xml[271,346]: Error: unexpected attribute "style:leader-char"
le:type="right" style:leader-char="."/> </style:tab-stops> </style:paragraph-pr
----^
mwe2.odt/styles.xml[273,342]: Error: unexpected attribute "style:leader-char"
le:type="right" style:leader-char="."/> </style:tab-stops> </style:paragraph-pr
----^
mwe2.odt/styles.xml[277,103]: Error: unexpected attribute "style:leader-char"
le:type="right" style:leader-char="."/>
----^
mwe2.odt/styles.xml[285,85]: Error: unexpected attribute "style:leader-char"
le:type="right" style:leader-char="."/>
----^
mwe2.odt/styles.xml[317,316]: Error: attribute "style:page-number" has a bad value: "0" does not satisfy the "positiveInteger" type
le-word="false" style:page-number="0"/> </style:style>
----^
mwe2.odt/styles.xml[327,99]: Error: unexpected attribute "fo:font-size"
n-bottom="0.21cm" fo:font-size="18pt"/>
----^
mwe2.odt/styles.xml[341,256]: Error: unexpected attribute "fo:font-size"
n-top="40pt" fo:margin-bottom="25pt" /> </style:style>
----^
mwe2.odt/styles.xml[343,292]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[346,207]: Error: unexpected attribute "fo:font-size"
in-top="12pt" fo:margin-bottom="9pt" /> </style:style>
----^
mwe2.odt/styles.xml[348,290]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[351,196]: Error: unexpected attribute "fo:font-size"
4pt" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml[353,278]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[356,216]: Error: unexpected attribute "fo:font-size"
ic" style:font-weight-complex="bold" /> </style:style>
----^
mwe2.odt/styles.xml[358,297]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[361,301]: Error: unexpected attribute "fo:font-size"
="0cm" style:auto-text-indent="false"/> </style:style>
----^
mwe2.odt/styles.xml[363,383]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[366,193]: Error: unexpected attribute "fo:font-size"
75%" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml[368,275]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[371,193]: Error: unexpected attribute "fo:font-size"
75%" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml[373,275]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[376,193]: Error: unexpected attribute "fo:font-size"
75%" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml[378,275]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[381,193]: Error: unexpected attribute "fo:font-size"
75%" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml[383,275]: Error: unexpected attribute "fo:font-size"
d="false" style:writing-mode="rl-tb" /> </style:style>
----^
mwe2.odt/styles.xml[386,193]: Error: unexpected attribute "fo:font-size"
75%" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml[388,193]: Error: unexpected attribute "fo:font-size"
75%" style:font-weight-complex="bold"/> </style:style>
----^
mwe2.odt/styles.xml: Info: 57 errors, no warnings
mwe2.odt/content.xml: Info: Adapting OpenDocument CD2 namspace'http://www.w3.org/2000/svg' (has been stored by old OOo versions)
mwe2.odt/content.xml: Info: Adapting OpenDocument CD2 namspace'http://www.w3.org/1999/XSL/Format' (has been stored by old OOo versions)
mwe2.odt/content.xml[6,98]: Error: unexpected attribute "style:editable"
left='0.25in' fo:margin-right='0.25in'>
----^
mwe2.odt/content.xml: Info: 1 errors, no warnings
mwe2.odt: Info: 58 errors, 1 warnings
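
A report of this size is easier to read when the errors are grouped by attribute. The following is a minimal sketch, not part of the validator or of make4ht, that tallies the two error shapes seen above ("unexpected attribute" and "has a bad value"). The function name `tally_errors` and the `SAMPLE` string are hypothetical; the sample reuses three error lines from the report.

```python
import re
from collections import Counter

# The two error shapes that appear in the validator report.
UNEXPECTED = re.compile(r'Error: unexpected attribute "([^"]+)"')
BAD_VALUE = re.compile(r'Error: attribute "([^"]+)" has a bad value')


def tally_errors(report: str) -> Counter:
    """Count validator errors per (kind, attribute name)."""
    counts = Counter()
    for name in UNEXPECTED.findall(report):
        counts[("unexpected", name)] += 1
    for name in BAD_VALUE.findall(report):
        counts[("bad value", name)] += 1
    return counts


# Three lines copied from the report, used only as illustration.
SAMPLE = '''
mwe2.odt/styles.xml[76,306]: Error: unexpected attribute "fo:hyphenate"
mwe2.odt/styles.xml[78,306]: Error: unexpected attribute "fo:hyphenate"
mwe2.odt/styles.xml[45,107]: Error: attribute "style:text-position" has a bad value
'''

for (kind, name), n in tally_errors(SAMPLE).most_common():
    print(f"{n:3d}  {kind:10s}  {name}")
```

Passing the full report text to `tally_errors` instead of `SAMPLE` shows which attributes account for most of the strict-mode errors.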

Is this expected, or is it something wrong with my setup?

@michal-h21
Owner

@hcf-n are you using an up-to-date TeX Live? I compiled your example, and the validator reports the following:

sample.odt: Info: ODF version of root document: 1.0
sample.odt/META-INF/manifest.xml: Warning: The directory 'Pictures/' is not a sub-document and should not be listed in the 'META-INF/manifest.xml' file of ODF package 'sample.odt'!
internal:/schema/odf1.0/OpenDocument-manifest-schema-v1.0-os.rng: Info: parsed.
sample.odt/META-INF/manifest.xml: Info: no errors, 1 warnings
sample.odt/mimetype: Info: no errors, no warnings
sample.odt: Info: Media Type: application/vnd.oasis.opendocument.text
internal:/schema/odf1.0/OpenDocument-schema-v1.0-os.rng: Info: parsed.
sample.odt/meta.xml: Info: Generator: TeX4ht from sample.tex, options: xhtml,charset=utf-8,ooffice,html,refcaption (http://www.cse.ohio-state.edu/~gurari/TeX4ht/)
sample.odt/meta.xml: Info: no errors, no warnings
sample.odt/settings.xml: Info: no errors, no warnings
sample.odt/styles.xml: Info: no errors, no warnings
sample.odt/content.xml: Info: Adapting OpenDocument CD2 namspace'http://www.w3.org/1999/XSL/Format' (has been stored by old OOo versions)
sample.odt/content.xml: Info: Adapting OpenDocument CD2 namspace'http://www.w3.org/2000/svg' (has been stored by old OOo versions)
sample.odt/content.xml: Info: no errors, no warnings
sample.odt: Info: no errors, 1 warnings

@hcf-n
Author

hcf-n commented Feb 8, 2021

When validating with 1.0 I get the same result as you. But when I use 1.0 strict I get the errors from my last post. I also get errors when validating against 1.1 strict, 1.2 or 1.3.

The problem for me is that the odt output generated by make4ht can't be parsed by pandoc, and that LibreOffice sometimes crashes when saving the odt file as docx. That is why I looked into the validity of the files, trying to get an idea of why the odt file is problematic.

What do you think? Could a stricter validation help make the odt file more conformant with other tools?

@michal-h21
Owner

I think that it is important that the ODT file is conformant with the version it declares. It is unfortunate that Pandoc doesn't show a more helpful error message that says what the actual issue is that prevents it from parsing the file.

@hcf-n
Author

hcf-n commented Feb 8, 2021

Agree,
But should it validate to 1.0 strict?

@michal-h21
Owner

I don't know. It would be quite difficult to fix all these issues. It is mainly that lots of attributes are not valid in the strict mode. I've tried to modify the ODT file by hand and succeeded in making it valid ODF 1.0. But Pandoc still cannot parse it, so there must be another issue.

@hcf-n
Author

hcf-n commented Feb 8, 2021

Thank you for trying out the manual edit.

I can try to investigate further, on the Pandoc side, why Pandoc can't parse the odt output.

@michal-h21
Owner

I wouldn't expect Pandoc to fail just because of some spurious attributes. I expect that it fails because some attribute it needs is missing. It would be useful if they could investigate what the problem is.

@hcf-n
Author

hcf-n commented Feb 8, 2021

They are making progress on this over at pandoc.
jgm/pandoc#7091 (comment)

@michal-h21
Owner

Thanks, so the issue seems to be caused by the <?xtpipes ?> processing instruction. It is inserted by TeX4ht; xtpipes is one of the programs in the conversion process, and it fixes some common issues in the XML file. I can remove this instruction, since it has no function after xtpipes has run.
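
For ODT files that were already generated with the instruction in place, it can be stripped after the fact. The following is only a workaround sketch under stated assumptions, not the fix that was made in the TeX4ht sources: it assumes the processing instruction target is literally `xtpipes`, as quoted above, and the function names `strip_xtpipes` and `rewrite_odt` are hypothetical. It copies every member of the package, removes the instruction from the XML members, and keeps `mimetype` uncompressed in its original position.

```python
import re
import zipfile

# Matches <?xtpipes ... ?> processing instructions (assumed target name).
XTPIPES_PI = re.compile(rb"<\?xtpipes\b.*?\?>", re.DOTALL)


def strip_xtpipes(data: bytes) -> bytes:
    """Remove every <?xtpipes ...?> processing instruction from XML bytes."""
    return XTPIPES_PI.sub(b"", data)


def rewrite_odt(src: str, dst: str) -> None:
    """Copy the ODT package src to dst, cleaning the XML members."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # infolist() keeps the original member order, so mimetype stays first.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.endswith(".xml"):
                data = strip_xtpipes(data)
            # The mimetype member must be stored without compression.
            method = (zipfile.ZIP_STORED if info.filename == "mimetype"
                      else zipfile.ZIP_DEFLATED)
            zout.writestr(info.filename, data, compress_type=method)
```

A call such as `rewrite_odt("mwe2.odt", "mwe2-clean.odt")` would write a cleaned copy next to the original; the output file name is only an example.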

@hcf-n
Author

hcf-n commented Feb 8, 2021

Seems so. Would you like to leave a comment over at the pandoc issue yourself, saying that this problem will disappear?

@michal-h21
Owner

It should be fixed in the sources by the latest commit. Pandoc can convert the ODT file now.

@hcf-n
Author

hcf-n commented Feb 8, 2021

I have to apologize for making a bigger problem out of the odt styles than there really was. I was having two problems: converting with LibreOffice was unstable (which I have now solved by using soffice --convert-to), and pandoc wouldn't parse the file. I took that as an indication of some kind of syntactical problem with the file. With a more stable environment on my part, I look forward to working on the odt conversion. I hope I can still bother you with my efforts to get a smooth transition from tex to docx (as required by publishers).
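
The command-line conversion mentioned above can be scripted. The following is a minimal sketch, assuming `soffice` is on the PATH; only `--convert-to` comes from this thread, while `--headless` and `--outdir` are additional LibreOffice options added here as an assumption about a typical batch setup. The helper names `build_convert_command` and `convert` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="docx", outdir="."):
    """Build the soffice argument list for a command-line conversion."""
    return [
        "soffice",
        "--headless",                   # assumption: run without the GUI
        "--convert-to", target_format,  # the option mentioned in the thread
        "--outdir", str(outdir),        # assumption: explicit output directory
        str(source),
    ]


def convert(source, target_format="docx", outdir="."):
    """Run the conversion and return the path where the result is expected."""
    subprocess.run(build_convert_command(source, target_format, outdir),
                   check=True)
    return Path(outdir) / f"{Path(source).stem}.{target_format}"
```

Calling `convert("mwe2.odt")` would run the conversion in the current directory and return the expected path of the resulting docx file; `check=True` makes a non-zero exit status raise an exception.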

@michal-h21
Owner

No problem, the XML instructions were unnecessary and it is not that hard to remove them. I was worried about the syntactical issues, because I find it quite difficult to find any information about ODF. So it is really good that this is not the case. I certainly welcome any feature requests and bug reports.

@hcf-n
Author

hcf-n commented Feb 10, 2021

Is this committed to TeX Live? I still have files generated with

@michal-h21
Owner

No, I have some work in progress in Make4ht, so I want to update it when it is done.
