Result xml when reading from stdin is different from invocation with file name #1014

Closed
dschulten opened this issue Jan 19, 2019 · 1 comment · Fixed by veraPDF/veraPDF-apps#273
Labels
bug A product defect that needs fixing P2 Medium priority issues to be scheduled in a future release

dschulten commented Jan 19, 2019

I invoke veraPDF as a serverless OpenFaaS function. Validation from stdin is very useful for that, since I do not have to write a temp file inside my Docker container before calling the verapdf CLI. I do have to filter out the textual help message that precedes the XML output (which is a bit silly for a tool processing stdin, but I can imagine why you did that).
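A minimal sketch of the filtering step mentioned above, assuming the help message always precedes the XML declaration on the same output stream, as in the transcript further down. The function name `strip_stdin_banner` is hypothetical and not part of veraPDF.

```python
def strip_stdin_banner(output: str) -> str:
    """Drop everything before the XML declaration in captured CLI output.

    Assumption: the XML part starts with '<?xml', as in the transcript
    in this issue. Raises ValueError if no declaration is present.
    """
    start = output.find("<?xml")
    if start == -1:
        raise ValueError("no XML declaration found in output")
    return output[start:]


if __name__ == "__main__":
    captured = (
        "veraPDF is processing STDIN and is expecting an EOF marker.\n"
        " - Linux or Mac users should type CTRL-D\n"
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        "<processorResult/>\n"
    )
    print(strip_stdin_banner(captured))
```

Searching for the declaration rather than skipping a fixed number of lines keeps the filter working if the wording or length of the help message changes.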

BUT: If I invoke verapdf and pass the PDF via stdin, the resulting XML is fundamentally different from the documented format at http://docs.verapdf.org/cli/validation/#auto-profile. The documentation says the result XML contains a validationReport element, whereas the result of a stdin invocation contains a validationResult element and, aside from that, is structured very differently.
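Until the two invocations agree, a caller could check which of the two element names is present before parsing further. This is only a sketch: the element names validationReport and validationResult come from this issue, while the function name `report_kind` and the idea of searching the whole tree are assumptions.

```python
import xml.etree.ElementTree as ET


def report_kind(xml_text: str) -> str:
    """Return which known report element occurs in the document.

    Namespace prefixes are ignored; only local element names are compared.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Tags look like '{namespace}name' or plain 'name'; keep the local part.
    names = {el.tag.rsplit("}", 1)[-1] for el in root.iter()}
    if "validationReport" in names:
        return "validationReport"
    if "validationResult" in names:
        return "validationResult"
    return "unknown"
```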

I tried --format mrr (see below) as well as the formats xml and text; they all yield the same result when using stdin.

When calling with a file name (not stdin), --format xml gives me a larger XML document, which records the configuration used and appears to contain the output of the stdin call as a subtree.

If the behaviour of the stdin invocation is not already documented somewhere, can it be documented?
If possible, can the stdin invocation support --format mrr and (with that option) yield the same result as the invocation which passes a file name?

Making mrr the default would break the current interface and should not be done lightly, but maybe that is something to consider nevertheless. It is quite confusing that the invocation via stdin behaves differently than an invocation with file name.

Below you see how mrr appears to be ignored when stdin is used:

ds@ds-Nitro-AN515-42:~/verapdf$ cat corpus/veraPDF-corpus-staging/PDF_A-1b/6.6\ Actions/6.6.1\ General/veraPDF\ test\ suite\ 6-6-1-t01-fail-a.pdf | ./verapdf --format mrr
veraPDF is processing STDIN and is expecting an EOF marker.
If this isn't your intention you can terminate by typing an EOF equivalent:
 - Linux or Mac users should type CTRL-D
 - Windows users should type CTRL-Z
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<processorResult xmlns:ns2="http://www.verapdf.org/ValidationProfile" isPdf="true" isEncryptedPdf="false">
    <itemDetails size="-1">
        <name>STDIN</name>
    </itemDetails>
    <validationResult flavour="PDFA_1_B" totalAssertions="363" isCompliant="false">
        <ns2:profileDetails creator="veraPDF Consortium" created="2017-09-06T15:12:20.277+02:00">
            <ns2:name>PDF/A-1B validation profile</ns2:name>
            <ns2:description>Validation rules against ISO 19005-1:2005, Cor.1:2007 and Cor.2:2011, Level B</ns2:description>
        </ns2:profileDetails>
        <ns2:assertions>
            <ns2:assertion ordinal="363" status="FAILED">
                <ns2:ruleId specification="ISO_19005_1" clause="6.6.1" testNumber="1"/>
                <ns2:message>The Launch, Sound, Movie, ResetForm, ImportData and JavaScript actions shall not be permitted. 
                        Additionally, the deprecated set-state and no-op actions shall not be permitted. The Hide action shall not be permitted (Corrigendum 2)</ns2:message>
                <ns2:location>
                    <ns2:level>CosDocument</ns2:level>
                    <ns2:context>root/document[0]/OpenAction[0](5 0 obj PDAction)</ns2:context>
                </ns2:location>
            </ns2:assertion>
        </ns2:assertions>
    </validationResult>
    <fixerResult status="NO_ACTION">
        <ns2:appliedFixes/>
    </fixerResult>
    <featuresReport/>
    <taskResult>
        <taskResult type="VALIDATE" isExecuted="true" isSuccess="true">
            <duration start="1547889097054" finish="1547889097643">00:00:00.589</duration>
        </taskResult>
    </taskResult>
</processorResult>
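For callers who have to consume the stdin output as it stands, the processorResult shown above can be read with the standard library. This is a sketch under assumptions: the element and attribute names are taken from the output above, the function name `summarize_stdin_result` is hypothetical, and any text before the XML declaration has to be removed before calling it.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared by xmlns:ns2 in the output above.
NS = {"vp": "http://www.verapdf.org/ValidationProfile"}


def summarize_stdin_result(xml_text: str) -> dict:
    """Extract flavour, compliance flag and failed rules from a processorResult."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # validationResult itself is not namespaced in the output above.
    result = root.find("validationResult")
    if result is None:
        raise ValueError("no validationResult element found")
    failed = []
    for assertion in result.findall("vp:assertions/vp:assertion", NS):
        if assertion.get("status") != "FAILED":
            continue
        rule = assertion.find("vp:ruleId", NS)
        if rule is not None:
            failed.append((rule.get("clause"), int(rule.get("testNumber"))))
    return {
        "flavour": result.get("flavour"),
        "compliant": result.get("isCompliant") == "true",
        "failed_rules": failed,
    }
```

Applied to the output above, this would report the flavour PDFA_1_B, a non-compliant result, and the single failed rule for clause 6.6.1, test number 1.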
@bdoubrov
Contributor

For historical reasons the processing of an input stream is indeed different from the (batch) processing of a folder or a single PDF path. We'll fix this to make sure the input stream is validated as if it was a single local PDF file.

@bdoubrov bdoubrov added bug A product defect that needs fixing P2 Medium priority issues to be scheduled in a future release labels Feb 13, 2019
@bdoubrov bdoubrov added this to the v1.14-m4 milestone Feb 13, 2019
This was referenced Mar 28, 2019