Awesome PDF

A curated list of resources around PDF files

The File Format

PDF Association: PDF Specification Index, 2021.
Jindrich Kubec, Jiri Sejtko: X is not enough! Grab the PDF by the tail! at Virus Bulletin, 2011.
Selected compilation of PDF Standards from the Adobe Open Source Reference, 2022.
1. PDF Reference 1.0
2. PDF Reference 1.2
3. PDF Reference 1.3
4. PDF Reference 1.4
5. PDF Reference 1.5 (v6)
6. PDF Reference 1.6
7. PDF Reference 1.7 (ISO 32000-1:2008)
8. PDF Reference 2.0 (ISO 32000-2:2020) (freely available ISO standard due to corporate sponsorship)
Adobe: XMP Specification Part 3, January 2020.

Viewers

KOReader: a document viewer primarily aimed at e-ink readers
react-native-pdf: a React Native PDF view component
PdfViewPager: Android widget to display PDF documents in your Activities or Fragments
vue-pdf: a Vue.js PDF viewer

Data Extraction

pdftotext: an application that converts Portable Document Format (PDF) files to plain text. Part of poppler-utils.
pdfminer.six: a Python library for extracting information from PDF documents
pdfplumber (built on pdfminer.six): plumb a PDF for detailed information about each text character, rectangle, and line. Also offers table extraction and visual debugging.
Tabula: an application for extracting tables
camelot: PDF Table Extraction
awesome-document-understanding: A curated list of resources for Document Understanding (DU) topic

Generators

Anything that can produce PDF files from scratch:

fpdf2: An Open Source Python library for generating PDFs
pdflatex (e.g. in TeX Live): a LaTeX-to-PDF converter
reportlab: An Open Source Python library for generating PDFs and graphics.
prawn: a pure Ruby PDF generation library
react-pdf: Create PDF files using React
markdown-pdf: Markdown to PDF converter
mpdf: PHP library generating PDF files from UTF-8 encoded HTML
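
To give a rough sense of what "from scratch" involves, here is a minimal sketch that writes a blank one-page PDF using only the Python standard library. It assumes a PDF 1.4 file with a classic cross-reference table and a US Letter page size. The function name build_minimal_pdf is hypothetical and not part of any library listed here; the generators above handle fonts, content streams, compression and much more.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a blank one-page PDF and return its raw bytes (illustrative sketch)."""
    # Three indirect objects: document catalog, page tree, and a single page.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # Record the byte offset of each object for the cross-reference table.
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved free object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer points at the catalog; startxref points at the table above.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    with open("minimal.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

Running the script writes minimal.pdf to the current directory. The file should open as an empty page in most viewers, though strict validators may be less forgiving than viewers are.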

Manipulators

Anything that's used to edit an existing PDF file:

pdfarranger: a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using a graphical interface
OCRmyPDF: adds an OCR text layer to scanned PDF files, allowing them to be searched

File Analysis / Security

Pdfalyzer: PDF analysis tool that visualizes the internal data structure of a PDF in large, colorful diagrams and scans the binary streams embedded in the PDF against a collection of YARA rules specific to malicious PDFs.
Malicious PDF Generator: generates a set of malicious PDF files with phone-home functionality
pdfbox: Java tool for browsing the internal structure of a PDF. Download the app jar and run it as: java -jar pdfbox-app-x.y.z.jar debug pdf_file

Multi-Purpose Libraries

pdftk: command-line tool for working with PDFs. It is commonly used for client-side scripting or server-side processing of PDFs.
pypdf: a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
pikepdf: a Python library for reading and writing PDF, powered by qpdf
PyMuPDF: Python bindings to MuPDF
pypdfium2: Python bindings to PDFium
borb: reading, creating and manipulating PDF files in Python
pdfcpu: batch processing and scripting via a rich command line
pdf-lib: create and modify PDF documents in any JavaScript environment
HexaPDF: a pure Ruby PDF creation and manipulation library
