ivy-rew/booksAlive

Books alive

Restore and conserve great pieces of literature

Restore German books set in Fraktur fonts

Google Books contains some books that are no longer available anywhere else, even though they may be great literature that was popular decades ago.

Unfortunately, the approach used to detect the text in the scanned books seems to have been unaware of old German Fraktur fonts. As a result, the available textual representations contain little more than a jumble of letters mixed with unrecognizable characters.

Sample

The book "Peterli am Lift" by Niklaus Bolt has been scanned by Google. While the scanned PDF is of good quality, the generated textual representations, such as the EPUB, are poor.

More?

Search

Conversion Scripts

Here's a script that lifts the treasure locked inside the page images of such a PDF.

The script relies on popular PDF tooling (poppler) and an OCR engine (tesseract) with its extension for German Fraktur. This set of freely available software is able to bring the classic back to life.

The scripts were written to run on a Debian-based operating system such as Ubuntu (tested with Linux Mint 18).

wget https://archive.org/download/peterliamlift00boltgoog/peterliamlift00boltgoog.pdf
./revealLetters.sh peterliamlift00boltgoog.pdf
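
The pipeline described above (poppler to get page images, tesseract to read them) could look roughly like the following. This is a minimal sketch, not the contents of revealLetters.sh: the function name `reveal_letters`, the `pages` working directory, the `book.txt` output file and the 300 dpi resolution are all assumptions made for illustration. The Fraktur model name `deu_frak` is also an assumption. It matches older tesseract language packs, while newer installations may ship the model under a different name such as `frk`.

```shell
#!/bin/sh
# Hypothetical sketch of the PDF -> images -> OCR pipeline.
# It is NOT the repository's revealLetters.sh; all names below are illustrative.

# Tool names and the OCR model can be overridden through the environment.
PDFTOPPM="${PDFTOPPM:-pdftoppm}"      # part of poppler
TESSERACT="${TESSERACT:-tesseract}"
OCR_LANG="${OCR_LANG:-deu_frak}"      # assumption: Fraktur model name of your install

# reveal_letters PDF [WORKDIR]
# Renders every page of PDF to a PNG, runs OCR on each page image and
# concatenates the recognized text into WORKDIR/book.txt.
reveal_letters() {
    pdf="$1"
    workdir="${2:-pages}"
    mkdir -p "$workdir" || return 1

    # Render each page as a PNG named <workdir>/page-<number>.png.
    # 300 dpi is an assumed value, not taken from the original script.
    "$PDFTOPPM" -r 300 -png "$pdf" "$workdir/page" || return 1

    : > "$workdir/book.txt"
    # pdftoppm zero-pads page numbers, so the glob expands in page order.
    for img in "$workdir"/page-*.png; do
        [ -e "$img" ] || continue
        # tesseract writes its result to <output base>.txt
        "$TESSERACT" "$img" "${img%.png}" -l "$OCR_LANG" || return 1
        cat "${img%.png}.txt" >> "$workdir/book.txt"
    done
}
```

After sourcing the file, calling `reveal_letters peterliamlift00boltgoog.pdf` would leave the recognized text in `pages/book.txt`. Keeping one image and one text file per page makes it easy to compare a single page's OCR output with its scan.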
