Infinite loop on empty input #16

Open
jsonn opened this issue Jan 16, 2011 · 6 comments

Comments


jsonn commented Jan 16, 2011

Create an empty StringIO and call the PDF reader (PdfFileReader) on it. It loops forever in the readNextEndLine calls that run before the %%EOF check in read.


tongwang commented May 4, 2012

It enters an infinite loop for single-line text files and some other files too.

@alexgarel

I got this bug too!

@alexgarel

Proposed patch

diff --git a/pyPdf/pdf.py b/pyPdf/pdf.py
index bf60d01..586ea81 100644
--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"

Without patch::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('')
    >>> pdf = pyPdf.PdfFileReader(c)
    --- Infinite loop ---
    ^CTraceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 705, in read
        line = self.readNextEndLine(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 870, in readNextEndLine
        line = x + line
    KeyboardInterrupt

With patch::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('')
    >>> pdf = pyPdf.PdfFileReader(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
        raise utils.PdfReadError, "EOF marker not found"
    pyPdf.utils.PdfReadError: EOF marker not found
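The effect of the added `stream.tell()` guard can be checked without pyPdf using a small model. This is only a sketch: `read_next_end_line` and `find_eof_line` are simplified stand-ins for the methods in the diff (Python 3, bytes instead of str), `seek_back`, `outcome` and `LoopLimitExceeded` are hypothetical helpers, and the model assumes that seeking before the start of the stream stops at offset 0 instead of raising. A step limit replaces the real hang so the example terminates.

```python
import io

EOL = (b"\n", b"\r")


class LoopLimitExceeded(Exception):
    """Raised by this model where the real code would spin forever."""


def seek_back(stream, n):
    # Assumption: a relative seek past the start stops at offset 0
    # rather than raising, which is what makes the hang possible.
    stream.seek(max(stream.tell() - n, 0))


def read_next_end_line(stream, max_steps=1000):
    """Simplified model of the backward scan (not pyPdf's full method)."""
    line = b""
    for _ in range(max_steps):
        x = stream.read(1)
        seek_back(stream, 2)
        if x in EOL:
            return line
        line = x + line
    raise LoopLimitExceeded("backward scan never reached a line ending")


def find_eof_line(stream, guarded):
    """Model of the start of read(); `guarded` toggles the patched condition."""
    stream.seek(0, 2)       # go to the end ...
    seek_back(stream, 1)    # ... then one byte back, like stream.seek(-1, 2)
    line = b""
    while not line and (stream.tell() or not guarded):
        line = read_next_end_line(stream)
    if line[:5] != b"%%EOF":
        raise ValueError("EOF marker not found")
    return line


def outcome(data, guarded):
    """Return a short label for how the model ends on `data`."""
    try:
        find_eof_line(io.BytesIO(data), guarded)
        return "found"
    except ValueError:
        return "error"
    except LoopLimitExceeded:
        return "loops"
```

In this model, `outcome(b"", guarded=False)` gives `"loops"` and `outcome(b"", guarded=True)` gives `"error"`, matching the two sessions above. `outcome(b"  ", guarded=True)` still gives `"loops"`, because the inner scan never stops at offset 0 when the stream has no line ending.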

@alexgarel

Hmm, a better patch:

--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"
@@ -857,7 +857,7 @@ class PdfFileReader(object):

     def readNextEndLine(self, stream):
         line = ""
-        while True:
+        while stream.tell():
             x = stream.read(1)
             stream.seek(-2, 1)
             if x == '\n' or x == '\r':

This one works with an empty stream and also with a one-line stream::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('  ')
    >>> pdf = pyPdf.PdfFileReader(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
        raise utils.PdfReadError, "EOF marker not found"
    pyPdf.utils.PdfReadError: EOF marker not found
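For comparison, here is one way a backward line scan can be written so that it always terminates and still reads the byte at offset 0. It is a sketch under assumptions, not pyPdf's code and not the patch from this thread: `read_line_backwards` and `check_eof_marker` are hypothetical names, it is Python 3 working on bytes, and it tracks the position explicitly instead of relying on relative seeks.

```python
import io

EOL = (b"\n", b"\r")


def read_line_backwards(stream):
    """Return the last non-empty line that ends at or before the current
    position, and leave the stream positioned at the start of that line.

    Returns b"" once the start of the stream has been reached.
    """
    pos = stream.tell()
    line = b""
    while pos > 0:              # pos shrinks every pass, so this terminates
        stream.seek(pos - 1)
        ch = stream.read(1)
        if ch in EOL:
            if line:
                break           # line ending that precedes our line
            # otherwise: trailing line ending, skip it
        else:
            line = ch + line
        pos -= 1
    stream.seek(pos)
    return line


def check_eof_marker(stream):
    """Raise ValueError unless the last non-empty line starts with %%EOF."""
    stream.seek(0, 2)           # start at the end
    line = read_line_backwards(stream)
    if line[:5] != b"%%EOF":
        raise ValueError("EOF marker not found")
    return line
```

Because the loop condition is `pos > 0` and `pos` decreases on every pass that does not break, an empty or one-line stream yields a result after a bounded number of reads, and a stream consisting only of `%%EOF` is still recognised.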


jsonn commented Jun 18, 2012

The second chunk is not really going to work...

@alexgarel

Sorry, corrected :-)
