Problem: Images rendered from PDF pages are hard to read and sometimes have issues #1792

Closed
mcantelon opened this issue Apr 2, 2024 · 1 comment · Fixed by #1793
Labels
Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result.
mcantelon commented Apr 2, 2024

Current Behavior

Steps to reproduce the behavior

Here are the steps needed to enable the setting, upload a PDF, and navigate to the image rendered from one of the PDF's pages:

  1. Navigate to settings/uploads.
  2. Change the "Upload multi-page files as multiple descriptions" setting to "Yes".
  3. Click the "Save" button.
  4. Navigate to informationobject/add and add an information object.
  5. Once you've created the information object, open the "More" select menu and select "Link Digital Object".
  6. Click "Choose file", select a multi-page PDF file, then click the "Create" button.
  7. Click the preview image for one of the PDF's pages to navigate to that page's corresponding information object.
  8. Click on the PDF page's representation image.
  9. The full PDF page image will be displayed.

Example PDFs, provided by Dan, that don't render optimally:

Expected Behavior

Rendered pages should be legible.

Possible Solution

Passing additional arguments to the invocation of the "convert" tool can fix this.
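
A minimal sketch of what such an invocation could look like, using the two options named in the fix commit referenced on this issue ("-density 300" and "-alpha remove"). This is not AtoM's actual code (AtoM is written in PHP); the helper name `build_pdf_page_command` and the file names are hypothetical. It assumes ImageMagick's `convert`, where `-density` has to come before the input file so that it applies when the PDF is rasterized, and `-alpha remove` comes after the input so that it applies to the rendered pages.

```python
import shlex


def build_pdf_page_command(pdf_path, output_pattern, density=300):
    """Build an argv list for ImageMagick's convert (hypothetical helper)."""
    return [
        "convert",
        "-density", str(density),  # placed before the input: rasterize the PDF at this resolution
        pdf_path,
        "-alpha", "remove",        # placed after the input: flatten the alpha channel of each page
        output_pattern,            # a %d pattern asks convert for one output file per page
    ]


# Hypothetical file names, for illustration only.
cmd = build_pdf_page_command("example.pdf", "page-%d.png")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when the PDF path contains spaces.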

Context and Notes

AtoM has a setting that allows a PDF, uploaded as a digital object to an information object, to be "exploded" into child information objects, one for each of the PDF's pages. Each child information object has a digital object attached that's an image rendered from the corresponding PDF page.
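
As a rough illustration of the "exploded" structure described above, the sketch below models a parent description with one child per rendered page. The class and function names are hypothetical and do not correspond to AtoM's PHP classes; the page image paths are assumed to have been produced already.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InformationObject:
    """Hypothetical stand-in for a description with an optional attached file."""
    title: str
    digital_object: Optional[str] = None  # path of the attached file, if any
    children: List["InformationObject"] = field(default_factory=list)


def explode_pdf(parent, page_images):
    """Attach one child description per rendered page image."""
    for number, image_path in enumerate(page_images, start=1):
        child = InformationObject(
            title=f"{parent.title} - page {number}",
            digital_object=image_path,
        )
        parent.children.append(child)
    return parent


# Hypothetical file names, for illustration only.
record = explode_pdf(
    InformationObject("Report", "report.pdf"),
    ["page-0.png", "page-1.png"],
)
```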

Version used

2.8.2 - 193

Operating System and version

Ubuntu 20.04

Default installation culture

en

PHP version

PHP 7.4

Contact details

[email protected]

@mcantelon mcantelon added the Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. label Apr 2, 2024
mcantelon added a commit that referenced this issue Apr 2, 2024
Added new CLI options to command used to extract images of PDF pages.
Added "-density 300" to increase image detail and "-alpha remove" to
fix issue where the alpha channel is rendered as black and causes
images to be illegible.
mcantelon added a commit that referenced this issue Apr 3, 2024
Fixed issue with the digital objects regeneration task
(digitalobject:regen-derivatives) deleting, but not regenerating,
digital objects representing PDF pages.

Removed unneeded and unused digital object class method.
mcantelon added a commit that referenced this issue Apr 3, 2024
Added new CLI options to command used to extract images of PDF pages.
Added "-density 300" to increase image detail and "-alpha remove" to
fix issue where the alpha channel is rendered as black and causes
images to be illegible.
mcantelon commented

Merged PR to fix this.

@anvit anvit linked a pull request May 16, 2024 that will close this issue
@anvit anvit added this to the 2.8.2 milestone May 16, 2024
@anvit anvit closed this as completed May 16, 2024