-
I'm having a hard time accomplishing what I thought would be a simple goal: remove a placeholder with pypdf, then add a new signature field with pyHanko. When I remove the widget, it's gone from the annotations list. However, it still appears to remain in the fields list. When pyHanko checks for existing fields, it iterates the fields list and not the annotations list, so it throws an exception because the field is already there in the fields list. How can I remove the reference to the field in the fields list?

This is what my function is doing right now:

    # try using pypdf to remove all annotations and see if pyHanko still complains
    pypdf_writer = PdfWriter(clone_from=OLD_FILE)
    fields = pypdf_writer.get_fields()
    for page in pypdf_writer.pages:
        placeholders = []
        for i, annot in enumerate(page[PageAttributes.ANNOTS]):
            annot = annot.get_object()
            if annot[AnnotationDictionaryAttributes.Subtype] == "/Widget":
                try:
                    n = annot[FieldDictionaryAttributes.T]
                    if n.startswith('sig') or n.startswith('init'):
                        rect = annot['/Rect']
                        boxes[n] = {
                            "page": page.page_number,
                            "box": (rect[0], rect[1], rect[2], rect[3])
                        }
                        placeholders.append(i)
                        fields[n].remove_from_tree()
                except KeyError as e:
                    pass
        for i in placeholders[::-1]:
            del page[PageAttributes.ANNOTS][i]
    pypdf_writer.write(NEW_FILE)

I tried using
-
Annotations are stored in an array. Please refer to the PDF reference.
I understand they are stored in an array called /Annots per page, but they also have references in another array for the whole document under /Root. This additional removal is what I needed for the readers/importers that ignore the page /Annots list and only look through the document root:

    writer._root_object['/AcroForm']['/Fields'].remove(annotation.indirect_reference)
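
For anyone landing here later, the two removals can be combined roughly as follows. This is only a sketch, not an official pypdf recipe: the function name `remove_placeholder_fields` and its `prefixes` parameter are made up for illustration, `writer` is assumed to be a pypdf `PdfWriter`, and it assumes each placeholder is a merged field/widget object listed directly in /Fields (widgets nested under a parent field's /Kids are not handled).

```python
def remove_placeholder_fields(writer, prefixes=("sig", "init")):
    """Drop matching widgets from each page's /Annots and from /AcroForm /Fields.

    `writer` is assumed to be a pypdf PdfWriter, for example
    PdfWriter(clone_from="old.pdf"). Returns the removed field names.
    """
    removed_names = []
    removed_refs = []

    for page in writer.pages:
        if "/Annots" not in page:
            continue
        annots = page["/Annots"]
        # Collect indices first, then delete back to front so the
        # remaining indices stay valid.
        doomed = []
        for i, entry in enumerate(annots):
            annot = entry.get_object()
            if annot.get("/Subtype") != "/Widget" or "/T" not in annot:
                continue
            name = str(annot["/T"])
            if name.startswith(tuple(prefixes)):
                doomed.append(i)
                removed_names.append(name)
                removed_refs.append(annot.indirect_reference)
        for i in reversed(doomed):
            del annots[i]

    # The same objects are also listed document-wide. Readers that only walk
    # /AcroForm /Fields would otherwise still see the removed fields.
    root = writer._root_object  # private attribute, as used in the reply above
    if "/AcroForm" in root and "/Fields" in root["/AcroForm"]:
        fields = root["/AcroForm"]["/Fields"]
        for ref in removed_refs:
            if ref in fields:
                fields.remove(ref)

    return removed_names
```

After calling it on the writer, `writer.write(NEW_FILE)` should produce a file where neither list references the placeholders, so a later field-existence check that walks /Fields no longer finds them.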
More context on what triggered this issue for me:
MatthiasValvekens/pyHanko#430