A curated list of resources around PDF files
- PDF Association: PDF Specification Index, 2021.
- Jindrich Kubec, Jiri Sejtko: X is not enough! Grab the PDF by the tail! at Virus Bulletin, 2011.
- Selected compilation of PDF Standards from the Adobe Open Source Reference, 2022.
- PDF Reference 1.0
- PDF Reference 1.2
- PDF Reference 1.3
- PDF Reference 1.4
- PDF Reference 1.5 (v6)
- PDF Reference 1.6
- PDF Reference 1.7 (ISO 32000-1:2008)
- PDF Reference 2.0 (ISO 32000-2:2020) (ISO standard made freely available through corporate sponsorship)
- Adobe: XMP Specification Part 3, January 2020.
- KOReader: a document viewer primarily aimed at e-ink readers
- react-native-pdf: a React Native PDF view component
- PdfViewPager: Android widget to display PDF documents in your Activities or Fragments
- vue-pdf: Vue.js PDF viewer
- pdftotext: an application that converts Portable Document Format (PDF) files to plain text. Part of poppler-utils.
- pdfminer.six: a Python library for extracting information from PDF documents
- pdfplumber: Plumb a PDF for detailed information about each text character, rectangle, and line. Plus: Table extraction and visual debugging.
- Tabula: an application for extracting tables
- camelot: PDF Table Extraction
- awesome-document-understanding: A curated list of resources on Document Understanding (DU)
Anything that can produce PDF files from scratch:
- fpdf2: An Open Source Python library for generating PDFs
- pdflatex (e.g. in TeX Live): A LaTeX-to-PDF converter
- reportlab: An Open Source Python library for generating PDFs and graphics.
- prawn: a pure Ruby PDF generation library
- react-pdf: Create PDF files using React
- markdown-pdf: Markdown to PDF converter
- mpdf: a PHP library that generates PDF files from UTF-8 encoded HTML
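
To illustrate what "from scratch" means for the tools above, here is a minimal sketch in plain Python that assembles a one-page PDF byte by byte: a header, five numbered objects, a cross-reference table, and a trailer. It is not how any of the listed libraries is implemented; the function name `build_minimal_pdf`, the US Letter page size, and the text position are assumptions chosen for the example, and it only handles text that fits in Latin-1.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Assemble a one-page PDF showing `text` in Helvetica (hypothetical helper)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font F1 at 24 pt, move to (72, 720), draw the text.
    # Assumption: the text is encodable as Latin-1.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    # Object bodies in object-number order (object 1 is the document catalog).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, used by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free-object list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: points at the catalog and records where the xref table starts.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result with `open("hello.pdf", "wb").write(build_minimal_pdf("Hello, PDF"))` should give a file that common viewers can open. The libraries listed above take care of what this sketch leaves out, such as font embedding, compression, Unicode text, and layout.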
Anything that's used to edit an existing PDF file:
- pdfarranger: a small Python-GTK application that helps the user merge or split PDF documents and rotate, crop, and rearrange their pages using a graphical interface
- OCRmyPDF: adds an OCR text layer to scanned PDF files, allowing them to be searched
- Pdfalyzer: PDF analysis tool that visualizes the internal data structure of a PDF in large, colorful diagrams and scans the binary streams embedded in the PDF against a collection of YARA rules specific to malicious PDFs.
- Malicious PDF Generator: generates a set of malicious PDF files with phone-home functionality
- pdfbox: a Java tool for browsing the internal structure of a PDF. Download it and run it as
java -jar pdfbox-app-x.y.z.jar debug pdf_file
- pdftk: command-line tool for working with PDFs. It is commonly used for client-side scripting or server-side processing of PDFs.
- pypdf: a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
- pikepdf: a Python library for reading and writing PDF, powered by qpdf
- PyMuPDF: Python bindings to MuPDF.
- pypdfium2: Python bindings to PDFium.
- borb: reading, creating and manipulating PDF files in Python
- pdfcpu: batch processing and scripting via a rich command line
- pdf-lib: Create and modify PDF documents in any JavaScript environment
- HexaPDF: A pure Ruby PDF creation and manipulation library