How to keep source file time, date, metadata.... etc for Target File? #1003

Closed
limopc opened this issue Aug 5, 2022 · 4 comments

Comments

@limopc

limopc commented Aug 5, 2022

Describe the bug
When I use ocrmypdf, the output file gets a new date and time:
the created file carries the current modification date and time.
I would like to keep the date and time of the original file.
To Reproduce
A normal ocrmypdf command creates the output file with the current modification date, not the modification date of the source file.
Expected behavior
What did you expect to happen?

Screenshots
If applicable, add screenshots to help explain your problem.

System (please complete the following information):

  • OS: EndeavourOS
  • Python version:
  • OCRmyPDF version:

Installation
How did you install OCRmyPDF? Did you install it from your operating system's
package manager, or using pip?

Additional context
I assume my OCRmyPDF is the latest version, as EOS is a rolling release. I installed it on August 4.
I would appreciate any help with keeping the source file's modification date on the output.

@jbarlow83
Collaborator

ocrmypdf updates the modification timestamp inside the PDF, since the PDF specification states that a compliant PDF writer should update it when the PDF is modified. Your computer's filesystem also updates timestamps when a file is modified. This is all standard behavior, and doing anything else would violate norms and expectations that other users and applications rely on.

I understand you require or expect ocrmypdf to work differently. You could achieve that yourself, for example by using a script that changes the output file's modification time to that of the input file after ocrmypdf creates its output. You could also use pikepdf to edit the timestamp metadata inside the PDF.
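The script approach mentioned above could look roughly like the following. This is a minimal sketch, not part of ocrmypdf: the helper name `copy_file_times` is hypothetical, and it only restores the filesystem timestamps. The modification date stored inside the PDF is left untouched.

```python
import os
from pathlib import Path


def copy_file_times(source: Path, target: Path) -> None:
    """Copy the access and modification times of source onto target.

    Hypothetical helper for illustration; only filesystem timestamps change.
    """
    st = os.stat(source)
    # Nanosecond values avoid rounding that float timestamps can introduce.
    os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns))
```

A typical use would be to run `ocrmypdf input.pdf output.pdf` first and then call `copy_file_times(Path("input.pdf"), Path("output.pdf"))`.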

@jbarlow83 closed this as not planned Aug 5, 2022
@ferdiga

ferdiga commented Jul 9, 2024

Hi, I ran into the same issue.
It seems crucial to me not to lose this information.
A possible solution could be to optionally add the original timestamp to the file name, controlled by two parameters:

  • prepend/append
  • timestamp format

BTW - I have seen many organisations (including ours) include a convenient date in the file name, precisely because many apps alter the saved timestamp, which makes it impossible to simply sort files and see the time sequence the organisation needs.
We prepend YYYYMMDD to each file name, usually the date when the file was created. This is especially useful for PDFs.

This would be compliant with the specs.

BTW - I find ocrmypdf very useful.

@ferdiga

ferdiga commented Jul 9, 2024

With respect to the discussion above, "OCR Overwrites digitally signed files":
the date should always be appended, to keep the filename similar to the original.

@jbarlow83
Collaborator

Third party software should implement its own renaming behavior and other such policies, and pass that to ocrmypdf as the desired output filename.
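As a rough illustration of that division of labour, a wrapper could derive the dated filename itself and hand it to ocrmypdf as the output path. The sketch below is an assumption-laden example, not an ocrmypdf feature: the function names, the `fmt` and `append` parameters, the underscore separator, and the choice to format the date in UTC are all hypothetical.

```python
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dated_output_path(source: Path, fmt: str = "%Y%m%d", append: bool = True) -> Path:
    """Build an output path that carries the source file's modification date."""
    mtime = os.stat(source).st_mtime
    # Assumption: format in UTC so the name does not depend on the local timezone.
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime(fmt)
    stem = f"{source.stem}_{stamp}" if append else f"{stamp}_{source.stem}"
    return source.with_name(stem + source.suffix)


def ocr_with_dated_name(source: Path) -> Path:
    """Run ocrmypdf with the computed path as the desired output filename."""
    target = dated_output_path(source)
    # Uses the basic command form: ocrmypdf <input> <output>.
    subprocess.run(["ocrmypdf", str(source), str(target)], check=True)
    return target
```

With `append=True` the name stays close to the original (the date follows the stem), while `append=False` gives the prepended YYYYMMDD style described earlier in the thread.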


3 participants