When validating a PDF/A-1b file, we encountered this issue:
If a document information dictionary does appear at a document, then all of its entries that have analogous properties in predefined XMP schemas, shall also be embedded in the file in XMP form with equivalent values
The XMP (XML) value has no trailing blank or NUL bytes, since they are not allowed in XML. For a discussion of the same issue, please see the PDFBOX issue here: PDFBOX-2503.
Example files are there too.
I propose to add a trim within XMPChecker#checkCOSStringProperty.
Slightly related, the BFO library added a "workaround" for this too some years ago: javadoc
Thanks for bringing this to our attention! I've double checked that indeed other PDF/A validators behave exactly in this way. So, we'll indeed fix our logic as well.
Please do not try to make veraPDF bug-compatible with broken software. Apparently, some PDF/A validators use inadequate means (wcscmp()?) for comparing strings, stopping comparison at the first NUL character. This has nothing to do with some characters not being representable in XML.
If the DocumentInformation metadata contains trailing NUL characters, everything is fine. In all other cases, trailing characters as well as control characters within the string are taken into account by Adobe Preflight (and by other validators) and validated against the XMP entry.
From these tests, IMHO we should only trim trailing NUL characters.
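A minimal sketch of what "only trim trailing NUL" could look like. The class and method names are hypothetical and are not taken from veraPDF; the assumption is that both values are already available as Java strings before they are compared.

```java
// Hypothetical helper, not part of veraPDF: strips U+0000 only at the end of a value.
final class TrailingNulTrim {

    private TrailingNulTrim() {
    }

    /** Removes trailing U+0000 characters; NULs inside the string are left untouched. */
    static String trimTrailingNul(String value) {
        if (value == null) {
            return null;
        }
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == '\u0000') {
            end--;
        }
        return value.substring(0, end);
    }

    /** Compares an Info dictionary value with its XMP counterpart after the trim. */
    static boolean matchesXmp(String infoValue, String xmpValue) {
        if (infoValue == null || xmpValue == null) {
            return infoValue == xmpValue;
        }
        return trimTrailingNul(infoValue).equals(xmpValue);
    }
}
```

With this helper, a value ending in a single NUL matches the XMP value, while a value with characters after the NUL still fails the comparison.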
@a20god please see the linked pdfbox issue - this is not about NUL characters in the middle of the string, but only at the end
A NUL at the end is just a special case of the general Adobe Acrobat breakage: Adobe Acrobat's Preflight thinks that my t3.pdf conforms to PDF/A-1b. However, the string in the Document Information Dictionary has two additional characters at the end, U+0000 and U+0041.
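To make the t3.pdf case concrete, here is a small sketch contrasting a C-style comparison that stops at the first NUL with an exact comparison. Whether any validator actually compares strings this way is an assumption; the class name and the sample value "Example" are made up for illustration.

```java
// Hypothetical illustration: the sample strings do not come from t3.pdf.
final class NulComparisonSketch {

    private NulComparisonSketch() {
    }

    /** Mimics wcscmp-like behaviour: everything from the first U+0000 on is ignored. */
    static boolean equalsUpToFirstNul(String a, String b) {
        return untilFirstNul(a).equals(untilFirstNul(b));
    }

    private static String untilFirstNul(String s) {
        int nul = s.indexOf('\u0000');
        return nul < 0 ? s : s.substring(0, nul);
    }

    public static void main(String[] args) {
        String infoValue = "Example\u0000A"; // two extra characters: U+0000 and U+0041
        String xmpValue = "Example";
        System.out.println(equalsUpToFirstNul(infoValue, xmpValue)); // true
        System.out.println(infoValue.equals(xmpValue));              // false
    }
}
```

The truncating comparison accepts the pair even though the Info value carries an extra "A" after the NUL, which is the breakage described above.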
I propose to fix the PDF producer rather than breaking all existing PDF/A validators.
If you think that U+0000 is to be ignored at the end of strings in the Document Information Dictionary, please point to chapter and verse in any relevant standard.
If a string cannot be represented in XML, then that string cannot be used as the value of one of the entries in the Document Information Dictionary that must match the document metadata of conforming PDF/A-1b documents.
The Producer entry in the example file contains extra zeroes at the end:
/Producer (\376\377\000A\000d\000o\000b\000e\000 \000P\000S\000L\000 \0001\000.\000
2\000e\000 \000f\000o\000r\000 \000C\000a\000n\000o\000n\000\000)
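The /Producer string above can be decoded to see where the extra character sits. This is a rough sketch that assumes the two lines form one literal string, handles only octal escapes, and expects a UTF-16BE byte order mark; the class and constant names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for illustration; it is not a complete PDF string parser.
final class ProducerDecodeSketch {

    /** The /Producer literal from the example, with the line break removed. */
    static final String PRODUCER_LITERAL =
            "\\376\\377\\000A\\000d\\000o\\000b\\000e\\000 \\000P\\000S\\000L\\000 "
            + "\\0001\\000.\\0002\\000e\\000 \\000f\\000o\\000r\\000 "
            + "\\000C\\000a\\000n\\000o\\000n\\000\\000";

    private ProducerDecodeSketch() {
    }

    /** Turns octal escapes (up to three digits) into bytes; other characters are copied as-is. */
    static byte[] unescapeOctal(String literal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length() && isOctal(literal.charAt(i + 1))) {
                i++;
                int value = 0;
                int digits = 0;
                while (digits < 3 && i < literal.length() && isOctal(literal.charAt(i))) {
                    value = value * 8 + (literal.charAt(i) - '0');
                    i++;
                    digits++;
                }
                out.write(value & 0xFF);
            } else {
                out.write(c); // assumes plain ASCII outside of escapes
                i++;
            }
        }
        return out.toByteArray();
    }

    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    /** Decodes the bytes as UTF-16BE, skipping the leading FE FF byte order mark. */
    static String decode(String literal) {
        byte[] bytes = unescapeOctal(literal);
        return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
    }
}
```

Decoding `PRODUCER_LITERAL` this way yields "Adobe PSL 1.2e for Canon" followed by a single U+0000, which is the trailing character under discussion.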