nul byte difference between xmp and info dictionary #1017

beat2 · 2019-02-07T16:43:35Z

When validating a PDF/A-1b file, we encountered this issue:

If a document information dictionary does appear at a document, then all of its entries that have analogous properties in predefined XMP schemas, shall also be embedded in the file in XMP form with equivalent values

The producer contains extra zeroes at the end:

/Producer (\376\377\000A\000d\000o\000b\000e\000 \000P\000S\000L\000 \0001\000.\000
2\000e\000 \000f\000o\000r\000 \000C\000a\000n\000o\000n\000\000)

The XML contains no trailing blank, and NUL bytes are not allowed in XML. For a discussion of the same issue, see the PDFBox issue PDFBOX-2503.
Example files are there too.

I propose to add a trim within XMPChecker#checkCOSStringProperty.
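A minimal sketch of what such a trim could look like, in plain Java. This is not veraPDF's actual code: the class `TrailingNulTrim` and all of its method names are hypothetical, and the Latin-1 fallback is a simplification of PDFDocEncoding. It decodes the Info dictionary value, strips only trailing U+0000 characters, and then compares the result with the XMP value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of veraPDF or PDFBox.
final class TrailingNulTrim {

    // Decode a PDF text string: UTF-16BE when it starts with the FE FF byte order mark,
    // otherwise Latin-1 (assumption: a stand-in for PDFDocEncoding).
    static String decodeTextString(byte[] raw) {
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Remove U+0000 characters from the end only; NULs elsewhere are kept.
    static String trimTrailingNul(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '\0') {
            end--;
        }
        return s.substring(0, end);
    }

    // Compare the Info dictionary value with the XMP value after trimming.
    static boolean matchesXmp(byte[] infoValue, String xmpValue) {
        return trimTrailingNul(decodeTextString(infoValue)).equals(xmpValue);
    }

    // Test helper: encode a string as UTF-16BE with a leading byte order mark.
    static byte[] encodeUtf16WithBom(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE);
        byte[] raw = new byte[body.length + 2];
        raw[0] = (byte) 0xFE;
        raw[1] = (byte) 0xFF;
        System.arraycopy(body, 0, raw, 2, body.length);
        return raw;
    }

    public static void main(String[] args) {
        // Same shape as the /Producer value above: text followed by one U+0000.
        byte[] producer = encodeUtf16WithBom("Adobe PSL 1.2e for Canon\0");
        System.out.println(matchesXmp(producer, "Adobe PSL 1.2e for Canon"));
    }
}
```

Because only the end of the string is touched, a NUL in the middle of a value still causes a mismatch with the XMP entry.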

Slightly related, the BFO library added a "workaround" for this too some years ago: javadoc

bdoubrov · 2019-02-08T13:15:14Z

Thanks for bringing this to our attention! I've double checked that indeed other PDF/A validators behave exactly in this way. So, we'll indeed fix our logic as well.

a20god · 2019-02-11T09:25:18Z

Please do not try to make veraPDF bug-compatible with broken software. Apparently, some PDF/A validators use inadequate means (wcscmp()?) for comparing strings, stopping comparison at the first NUL character. This has nothing to do with some characters not being representable in XML.

Examples:
t3.pdf
t5.pdf

beat2 · 2019-02-14T10:29:07Z

@a20god please see the linked pdfbox issue - this is not about NUL between, but only at the end

quote:

msahyoun Maruan Sahyoun added a comment - 18/Nov/14 10:48 - edited

If the DocumentInformation metadata contains trailing NUL characters everything is fine. For all others, the trailing characters as well as control characters within the string are taken into account by Adobe Preflight as well as others and validated against the XMP entry.

From these tests IMHO we should only trim trailing NUL

a20god · 2019-02-14T15:32:56Z

> @a20god please see the linked pdfbox issue - this is not about NUL between, but only at the end

A NUL at the end is just a special case of the general Adobe Acrobat breakage: Adobe Acrobat's Preflight thinks that my t3.pdf conforms to PDF/A-1b. However, the string in the Document Information Dictionary has two additional characters at the end, U+0000 and U+0041.
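To make the distinction concrete, here is a small Java sketch of three comparison strategies. All names are hypothetical, and the sample strings only mimic the shape described for t3.pdf (text followed by U+0000 and U+0041), not its actual content. The "truncate at first NUL" variant imitates what a `wcscmp()`-style comparison would do and is an assumption about how some validators might behave, not a statement about any specific product.

```java
// Hypothetical illustration; none of these methods exist in veraPDF.
final class NulComparisonStrategies {

    // Strict comparison: every character, including U+0000, must match.
    static boolean exactMatch(String info, String xmp) {
        return info.equals(xmp);
    }

    // Ignore U+0000 only at the very end of the Info dictionary value.
    static boolean trailingNulTrimMatch(String info, String xmp) {
        int end = info.length();
        while (end > 0 && info.charAt(end - 1) == '\0') {
            end--;
        }
        return info.substring(0, end).equals(xmp);
    }

    // C-string style: stop at the first U+0000 and ignore everything after it.
    static boolean firstNulTruncateMatch(String info, String xmp) {
        int firstNul = info.indexOf('\0');
        String head = firstNul < 0 ? info : info.substring(0, firstNul);
        return head.equals(xmp);
    }

    public static void main(String[] args) {
        String xmp = "Producer";
        String trailingNul = "Producer\0";   // NUL at the end only
        String nulThenText = "Producer\0A";  // U+0000 followed by U+0041

        for (String info : new String[] {trailingNul, nulThenText}) {
            System.out.println(exactMatch(info, xmp)
                    + " " + trailingNulTrimMatch(info, xmp)
                    + " " + firstNulTruncateMatch(info, xmp));
        }
    }
}
```

Under these assumptions, the trailing-trim strategy accepts only the first sample, while the truncating strategy accepts both, which is the behavior reported above for Preflight on t3.pdf.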

I propose to fix the PDF producer rather than breaking all existing PDF/A validators.

If you think that U+0000 is to be ignored at the end of strings in the Document Information Dictionary, please point to chapter and verse in any relevant standard.

If a string cannot be represented in XML, then that string cannot be used as the value of one of the entries in the Document Information Dictionary that must match the document metadata of conforming PDF/A-1b documents.

bdoubrov assigned BezrukovM Feb 8, 2019

bdoubrov added the labels bug (A product defect that needs fixing) and P2 (Medium priority issues to be scheduled in a future release) Feb 13, 2019

bdoubrov added this to the v1.14-m4 milestone Feb 13, 2019

BezrukovM mentioned this issue Apr 3, 2019

FIX #1017 veraPDF/veraPDF-validation#278

Merged

carlwilson closed this as completed in veraPDF/veraPDF-validation#278 Apr 9, 2019
