OCR Overwrites digitally signed files #220
Comments
Hi @farhills and thanks for reporting this. Indeed, I'm afraid you're right, and this issue seems rather to be related to As far as I understand, technically the tool cannot preserve a valid digital PDF signature, since it changes the document's content, which invalidates any signature. One way would be to tell If you're able to sign your documents via CLI, you could also try to chain the OCR workflow with the external command workflow
Thanks. As I wrote the issue, I realized it would be the underlying library that has to deal with this. My professional organization has teamed up with a very closed-source certificate authority, so there's no CLI option for signing; the process is heavily locked down. I'll mark the issue as closed. If ocrmypdf adds a new switch '--skip-signed' or similar, I'll open a new feature request here to tap into that functionality. Thanks!
And just like that it's been fixed! OCRmyPDF v14.4.0 and later will preserve digital signatures by default. Earlier versions clobber the signature without warning.
Thanks for letting us know! Sounds like we might want to introduce an additional switch for the digital signature behaviour.
In my use case, digitally signed documents should never be changed, even if the document OCR is imperfect or incomplete. These files represent final outputs and need to be retained unmodified. When OCR is complete, a new file is saved, so the digital signature is lost (as opposed to editing a signed file, where the signature is retained but made invalid by the edit). I would, at most, add the
Some additional feedback - the app notifications need to be updated to catch and handle the no-output condition when processing a digitally signed file. IMO this can be done silently. Currently it throws an error in the browser and desktop client. CLI output for the same file:
Good catch, thanks for the hint. I think we need to properly recognize this situation and, instead of throwing an error, log an informational message, for example.
Hello. In my use case, most of the time I would not care about the original digital signature, but I do care about proper OCR. I understand that an altered file cannot retain its original signature, and I want OCR nonetheless. But I would not use force OCR, because I do not want to destroy the original (probably best) OCR. It would be great if this were an option like the Remove background option, because it makes perfect sense to accept possible deletion of the digital signature in modes like skip text.
Current implementation plan would be like the following:
Please see my comments here
Describe the bug
Files with a digital signature are being overwritten, deleting the digital seal (leaving just the image of the signature)
System
ocrmypdf
version: 14.1.0
How to reproduce
Steps to reproduce the behavior:
Screenshots
Additional context
I've deleted the OCR rule for 'file modified', but in my typical workflow I print to PDF and immediately sign, so the files are captured in the queue and often don't get processed until after they've been signed.
It would be great if we could detect if a file is signed and skip it.
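For illustration, here is a rough sketch of what "detect and skip" could look like, using only the Python standard library. It is a heuristic, not how this app or ocrmypdf actually does it: it assumes a signed PDF stores its signature dictionary uncompressed, with a `/ByteRange` array of four integers, which is common but not guaranteed for every file. The names `looks_signed` and `split_queue` are hypothetical.

```python
import re
from pathlib import Path

# Heuristic marker: a PDF signature dictionary carries a /ByteRange array of
# four integers. Assumption: it is stored uncompressed in the file body.
_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*\d+\s+\d+\s+\d+\s+\d+\s*\]")


def looks_signed(data: bytes) -> bool:
    """Return True if the raw PDF bytes appear to contain a signature."""
    return _BYTE_RANGE.search(data) is not None


def split_queue(paths):
    """Split paths into (to_ocr, skipped_signed). Hypothetical helper."""
    to_ocr, skipped = [], []
    for path in map(Path, paths):
        # Read the whole file; fine for a sketch, costly for very large PDFs.
        target = skipped if looks_signed(path.read_bytes()) else to_ocr
        target.append(path)
    return to_ocr, skipped
```

A queue processor could call `split_queue` before handing files to OCR and simply log the skipped ones. A proper implementation would more likely parse the PDF and inspect its form fields rather than scan raw bytes.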
I've also commented on ocrmypdf #1040 as I recognize this issue may be more appropriately directed toward that project.
ocrmypdf/OCRmyPDF#1040