Commit

added README note on poppler install and better error handling for poppler not found
grantbuster committed Nov 20, 2023
1 parent 4e4ac93 commit 0e95088
Showing 2 changed files with 16 additions and 5 deletions.
19 changes: 14 additions & 5 deletions elm/pdf.py
@@ -254,12 +254,21 @@ def clean_poppler(self, layout=True):
         if not os.path.exists(os.path.dirname(fp_out)):
             os.makedirs(os.path.dirname(fp_out), exist_ok=True)

-        stdout = subprocess.run(args, check=True, stdout=subprocess.PIPE)
-        if stdout.returncode != 0:
-            msg = ('Poppler raised return code {}: {}'
-                   .format(stdout.returncode, stdout))
+        try:
+            stdout = subprocess.run(args, check=True,
+                                    stdout=subprocess.PIPE)
+            if stdout.returncode != 0:
+                msg = ('Poppler raised return code {}: {}'
+                       .format(stdout.returncode, stdout))
+                logger.exception(msg)
+                raise RuntimeError(msg)
+        except Exception as e:
+            msg = ('PDF cleaning with poppler failed! This is usually '
+                   'because you have not installed the poppler utility '
+                   '(see https://poppler.freedesktop.org/). '
+                   f'Full error: {e}')
             logger.exception(msg)
-            raise RuntimeError(msg)
+            raise RuntimeError(msg) from e

         with open(fp_out, 'r') as f:
             clean_txt = f.read()
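
The hunk above catches every exception in a single handler. With `check=True`, `subprocess.run` already raises `CalledProcessError` on a nonzero exit code, and a missing executable raises `FileNotFoundError`, so the two failure modes can be told apart. The following is only a sketch of that idea, not the code in this commit: `run_poppler` is a hypothetical helper, and it assumes `args[0]` is the poppler executable name.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def run_poppler(args):
    """Run a poppler command line (hypothetical helper, not part of elm)."""
    try:
        # check=True raises CalledProcessError on any nonzero exit code,
        # so no separate returncode test is needed after this call.
        return subprocess.run(args, check=True, stdout=subprocess.PIPE)
    except FileNotFoundError as e:
        # The executable itself could not be found on PATH.
        msg = ('Could not find the executable "{}". Is poppler installed '
               '(see https://poppler.freedesktop.org/)?'.format(args[0]))
        logger.error(msg)
        raise RuntimeError(msg) from e
    except subprocess.CalledProcessError as e:
        # The executable ran but reported a failure.
        msg = ('Poppler raised return code {}: {}'
               .format(e.returncode, e.output))
        logger.error(msg)
        raise RuntimeError(msg) from e
```

Splitting the handlers keeps the "not installed" hint out of errors caused by, for example, a corrupt PDF, while `from e` still preserves the original traceback in both cases.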
2 changes: 2 additions & 0 deletions examples/energy_wizard/README.rst
@@ -8,6 +8,8 @@ corpus.

 Notes:

+- In this example, we use the optional `poppler <https://poppler.freedesktop.org/>`_ PDF utility, which you will have to install separately. You can also use the Python-native ``PyPDF2`` package when using ``elm.pdf.PDFtoTXT``, but we have found that poppler works better.
+
 - Streamlit is required to run this app, which is not an explicit requirement of this repo (``pip install streamlit``)

 - You need to set up your own OpenAI or Azure-OpenAI API keys to run the scripts.
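
Because poppler is installed separately, it can help to check for it before starting a long run. The sketch below is one possible way to do that and is not part of this repo: `poppler_available` is a hypothetical helper, and it assumes the poppler command in use is `pdftotext`.

```python
import shutil


def poppler_available(executable="pdftotext"):
    """Return True if the named executable can be found on PATH.

    Hypothetical helper; "pdftotext" is assumed to be the poppler
    command that the PDF cleaning step relies on.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if poppler_available():
        print("poppler found; poppler-based PDF cleaning should work")
    else:
        print("poppler not found; install it or fall back to PyPDF2")
```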