A script which makes use of GNU parallel to transform TIFF files into a PDF and/or a DJVU document.

d0b3rm4n/ptiff2doc

ptiff2doc

ptiff2doc is a shell script that combines the TIFF files in a folder into a PDF and/or DJVU file. It assumes the TIFF files have been pre-processed with a tool like ScanTailor. A hidden text layer, generated with tesseract (OCR), is added to the PDF/DJVU document. This makes the PDF/DJVU file searchable (also known as a sandwich PDF).

ptiff2doc uses GNU parallel to process the TIFF files in parallel across several CPU cores. ptiff2doc is very resource hungry (CPU and disk space). Expect the temporary files to take up about twice the size of the folder with the TIFF files (they are removed when the script finishes). The temporary folder is created in the current working directory (cwd).

If you need more control over the created PDF/DJVU document, it's recommended to use gscan2pdf.

ptiff2doc depends on many external tools (see below). For convenience, the following command installs the needed packages on Fedora:

dnf install parallel libtiff-tools tesseract netpbm-progs djvulibre \
poppler-utils perl-Log-Log4perl gscan2pdf perl-File-Slurp perl-File-Temp \
perl-PDF-API2 perl-Getopt-Long perl-Encode perl-Encode-Locale perl-TimeDate
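
On other distributions the package names differ, so it may be easier to check for the commands directly. The sketch below is not part of ptiff2doc. The helper name `check_tools` is made up, and the list only contains tools that are named in this README (parallel, tesseract, c44), so it is probably incomplete.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Only tools named in this README are listed; extend the list as needed.
check_tools parallel tesseract c44 || echo "install the missing tools before running ptiff2doc"
```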

Usage

    ./ptiff2doc.sh [OPTIONS] [FOLDER WITH TIFF FILES]

    [FOLDER WITH TIFF FILES]
        a folder with .tif files; if the folder is omitted,
        the current working directory (cwd) is used.

Options [default value]:
    -h | --help          This help
    -b | --docname       The basename of the output document [book]
    -d | --dpi           DPI setting for c44 [300]
    -j | --djvu          Create .djvu
    -p | --pdf           Create .pdf
    -a | --author        Author to be set in .pdf/.djvu
    -t | --title         Title to be set in .pdf/.djvu
    -l | --language      Language setting for tesseract [deu]
                         See 'tesseract --list-langs' for supported languages
                         deu = German
                         eng = English
                         fin = Finnish
                         for mixed language documents 'deu+eng' is also possible
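
As an illustration of the options above, the following sketch assembles a call that creates both a PDF and a DJVU document with mixed German/English OCR. The values `mybook`, `Jane Doe`, `My Scanned Book` and `./scans` are placeholders. It is an assumption that options which take a value expect it as a separate argument, and that the resulting files are named after the basename (for example `mybook.pdf`).

```shell
# Assumption: value-taking options are followed by their value as a separate argument.
# All values below are placeholders for this example.
set -- --pdf --djvu --docname mybook --language deu+eng \
       --author "Jane Doe" --title "My Scanned Book" ./scans

# Show the command for review first.
echo "./ptiff2doc.sh $*"

# Run it only if the script is actually present in the current directory.
if [ -x ./ptiff2doc.sh ]; then
    ./ptiff2doc.sh "$@"
fi
```

Start the command from a directory with enough free space, since the temporary folder is created in the cwd.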

Needed tools

Needed Perl Libraries
