LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance where needed). The system is open-source and provides a simple baseline function for extracting text from primary research articles using rules that developers can customize. This means that the system works qu…

denkbares/lapdftext

lapdftext

This is a fork of lapdftext.

We aim to integrate new classifiers (e.g. based on d3web) and parsers (e.g. based on OmniPage XML).

When running lapdftext from an IDE, check that your Run Configuration is set up correctly.

Languages

  • XSLT 43.3%
  • Java 37.8%
  • HTML 16.4%
  • XProc 2.2%
  • CSS 0.3%