Microsoft Reporting Service workaround #23

Open
ghost opened this issue Feb 28, 2011 · 1 comment

ghost commented Feb 28, 2011

hey folks :)

On some files generated by Microsoft Reporting Services, I get one of the following errors when running this script:


from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(file("infile.pdf", "rb"))

output.addPage(input1.getPage(0))

outputStream = file("outfile.pdf", "wb")

output.write(outputStream)

Traceback (most recent call last):
File "/backup/print/municipality stara zagora/110228/Aitos_1/test.py", line 20, in <module>
output.write(outputStream)
.....
File "/usr/local/lib/python2.6/site-packages/pyPdf/generic.py", line 232, in readFromStream
return NumberObject(name)
ValueError: invalid literal for int() with base 10: ''

Or, using another approach (loading the pages into a list and then saving them):

Traceback (most recent call last):
File "/backup/print/municipality stara zagora/110228/municipality stara zagora pdf combine 110228 start.py", line 60, in <module>
outpdf.write(outfile)
.....
File "/usr/local/lib/python2.6/site-packages/pyPdf/pdf.py", line 545, in getObject
self.stream.seek(start, 0)
ValueError: I/O operation on closed file

where the file is (of course) not closed.

I worked around it by re-saving the file with pdftk, like this:


from pyPdf import PdfFileWriter, PdfFileReader

import shlex, subprocess
pdftkcommand = 'pdftk infile.pdf cat output fixed_infile.pdf'
args = shlex.split(pdftkcommand)
subprocess.call(args)

output = PdfFileWriter()
input1 = PdfFileReader(file("fixed_infile.pdf", "rb"))

output.addPage(input1.getPage(0))

outputStream = file("outfile.pdf", "wb")

output.write(outputStream)

But this only works with the latest pdftk version (1.44); version 1.41 produces a blank PDF. I guess this is what the pdftk developers fixed:
1.43 - September 30, 2010
Fixed a stream parsing bug that was causing page content to disappear after merge of PDFs generated by Microsoft Reporting Services PDF Rendering Extension 10.0.0.0.

Unfortunately I can't provide the broken file, as its contents are confidential.

hope this helps :)

georgi


ghost commented Feb 28, 2011

I don't know why the formatting broke; I copy-pasted plain text :( Also, I can provide the full traceback if needed.

@johnwhitington

I just put a workaround into CamlPDF to fix the same problem.

The malformation is that streams in files produced by Microsoft Reporting Services have a space character immediately after the 'stream' keyword (before the CR / LF).

The solution is, after reading the 'stream' keyword, to consume any whitespace characters other than CR or LF before looking for the newline as normal.
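For illustration, the fix described above could be sketched in Python roughly as follows. This is not pyPdf's or CamlPDF's actual code: the function name `skip_stream_keyword_eol` and the constant `INLINE_WHITESPACE` are hypothetical, and the sketch assumes the reader is a seekable binary file object positioned just after the `stream` keyword.

```python
import io

# Whitespace bytes other than CR / LF (hypothetical constant for this sketch):
# space, tab, form feed and NUL.
INLINE_WHITESPACE = (b" ", b"\t", b"\x0c", b"\x00")


def skip_stream_keyword_eol(stream):
    """Advance `stream` past the end-of-line that follows the 'stream' keyword.

    Assumes the keyword itself has already been consumed. Returns the
    offset at which the stream data begins.
    """
    ch = stream.read(1)
    # Tolerate stray whitespace (e.g. the extra space written by
    # Microsoft Reporting Services) before the newline.
    while ch in INLINE_WHITESPACE:
        ch = stream.read(1)
    if ch == b"\r":
        # Normal case is CR LF; a lone CR is tolerated as well.
        following = stream.read(1)
        if following not in (b"\n", b""):
            stream.seek(-1, 1)  # not an LF: it belongs to the data
    elif ch not in (b"\n", b""):
        stream.seek(-1, 1)  # no newline at all: data starts right here
    return stream.tell()


# Usage: a stream header with the extra space after the keyword.
buf = io.BytesIO(b"stream \r\nDATA")
buf.seek(len(b"stream"))
skip_stream_keyword_eol(buf)
payload = buf.read()
```

One caveat with this approach: if a file has no newline after the keyword and its data happens to begin with spaces or tabs, those bytes would be skipped too, so a stricter parser might only apply the tolerance when a CR or LF actually follows.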
