Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support detection of digital signatures #2655

Closed
macdeport opened this issue May 19, 2024 · 14 comments
Closed

ENH: Support detection of digital signatures #2655

macdeport opened this issue May 19, 2024 · 14 comments
Labels
is-feature A feature request

Comments

@macdeport
Copy link

Explanation

IMHO pypdf deserves that the is_encrypted property be completed by is_signed (Digitally signed).

Code Example

How would your feature be used? (Remove this if it is not applicable.)

import pypdf

def is_signed(pdf_file_path):
    reader = pypdf.PdfReader(pdf_file_path)
    root = reader.trailer['/Root']; l_signed = False
    if acroform := root.get('/AcroForm'):
        if acroform and (sig := acroform.get('/SigFlags')):
            l_signed = bool(sig & 1)
    return l_signed
# inspired by the [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF/) code

# Use:
reader = PdfReader(pdf_file)

if reader.is_signed:
    print(f'{pdf_file!r} is digitaly signed...n')
@stefan6419846
Copy link
Collaborator

You are of course invited to propose a corresponding PR with tests and being compatible with Python 3.7 as well.

@macdeport
Copy link
Author

Thanks for the invitation. But I don't have the necessary know-how for this PR and am not currently available to learn by doing ;-)

For a personal application, which I use several times a day in production, I have on the other hand carried out tests (signed and unsigned files) of very reasonable reliability, but cannot communicate these files as private.

Do all the test files in the pypdf project contain signed files?

@pubpub-zz
Copy link
Collaborator

Encryption is not signauture. Also signature is a complex process for both signature and validation.
Also, the flag you're referecing just indicated that there is a signature field not indicating if signed or not

@pubpub-zz
Copy link
Collaborator

Do all the test files in the pypdf project contain signed files?

Currently we have not signed document

@macdeport
Copy link
Author

macdeport commented May 19, 2024

Encryption is not signature. Also signature is a complex process for both signature and validation. Also, the flag you're referencing just indicated that there is a signature field not indicating if signed or not

@pubpub-zz Your are right.

Preliminary statement: in PDF format, I'm just an unfortunate self-taught man with his limits... Sharing these limitations often allows us to go further together ;-)

So can you suggest a realistic scenario (or better still show a file) where such a field would be present without the PDF containing a valid signature, or one that was or will be valid?

This proposal simply concerns the detection of digital signatures and not, of course, their validation.

The naming can lead to confusion: is_signed can become has_signature.

Furthermore, the proposed algorithm, here adapted to the pypdf context, has been used by James Barlow for his two pikepdf and OCRmyPDF projects and their many users for years: isn't that a great realistic cohort of testers?

This heuristic is undoubtedly not perfect, but better (at this time) than others on the web.

@macdeport
Copy link
Author

macdeport commented May 19, 2024

Do all the test files in the pypdf project contain signed files?

Currently we have not signed document

The lack of test files (valid/invalid digital signature) will not help testing or short-term development.

So below 4 different signed documents (3 only with a valid signature) as test files:
la-grenouille-et-le-boeuf-vivant-signed-1.pdf
la-grenouille-et-le-boeuf-vivant-signed-2.pdf
la-grenouille-et-le-boeuf-vivant-signed-2-comp.pdf
la-grenouille-et-le-boeuf-vivant-signed_.pdf <= non valid

@MartinThoma
Copy link
Member

I vaguely remember that we decided against adding signature support as we wanted to focus on other parts + pyHanko exists: https://pypi.org/project/pyHanko/

@MartinThoma
Copy link
Member

@MartinThoma
Copy link
Member

Since we closed those issues, quite a lot of improvements were done. I would be open to discuss this topic again. We would need to clarify:

  1. Scope: We need to define which capabilities we want / we feel confident to be able to maintain. What would be the value for the user? Do we have evidence that people look for this (especially when we say that we only support signatures partially this is questionable)
  2. Examples for testing: Having a few example files we can test with - thank you, I'll add those to https://github.com/py-pdf/sample-files/ if that is fine to you @macdeport
  3. PR: We would need to find somebody who is willing to create a PR. @macdeport , it's fine to not be an expert :-) Nobody here does PDF as their main job. And if you need help with finding where something fits in pypdf, I'm confident people will help :-)

@MartinThoma MartinThoma changed the title ENH: proposal for a new R/O property is_signed PdfReader/PdfWrite ENH: Support detection of digital signatures May 20, 2024
@MartinThoma MartinThoma added the is-feature A feature request label May 20, 2024
@stefan6419846
Copy link
Collaborator

I personally do not think that we should really start with the signature stuff. We already have the pyHanko project which does this completely fine (although on an older copy of pypdf) - why should we re-implement this here while just increasing the maintenance load?

@pubpub-zz
Copy link
Collaborator

As said earlier, signing is quite tough : In order to be efficient the solution needs to apply work with PKCS / PKI. I have not used yet pyHanko. It looks quite good, although I see 2 points : It stills uses pyPDF2 and uses some old interfaces (e.g. PdfFileReader) which prevents a proper work with some incremental PDFs (I've started to work on the PR which has been proposed although the work is quite tough). Maybe we should start to exchange with pyHanko to make it compatible with the latest pypdf?

@stefan6419846
Copy link
Collaborator

According to MatthiasValvekens/pyHanko#127 and MatthiasValvekens/pyHanko#335 (comment), there are quite some changes which make it hard to update pypdf to the latest version on pyHanko. pyHanko itself embeds an own, heavily modified copy of some older PyPDF2 version. Upgrading will most likely be a large and tedious task, while there are more important aspects to work on in my opinion.

@macdeport
Copy link
Author

#2655 (comment)
@MartinThoma Thank you for your kind and careful summary.

These PDFs have been produced especially for the test base of this project, which I appreciate both for its didactic contribution and as a user of the functions developed. Thank you to all those who are helping to bring it to life.

@pubpub-zz
Copy link
Collaborator

based on exchanges I convert this thread into a discussion

@py-pdf py-pdf locked and limited conversation to collaborators May 26, 2024
@pubpub-zz pubpub-zz converted this issue into discussion #2678 May 26, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
is-feature A feature request
Projects
None yet
Development

No branches or pull requests

4 participants