
Releases: VikParuchuri/surya

OCR v2

16 Aug 17:40
8d5affa

A new version of the OCR model with a custom architecture.

  • 20% faster
  • Automatic language detection, with support for optional language hints
  • Better accuracy on old/noisy documents
  • Basic English handwriting support (to be improved soon)

Faster text detection + layout

12 Jul 16:06
03b859e

Switched model architecture for the text detection and layout models:

  • 30% faster on GPU
  • 4x faster on CPU
  • 12x faster on MPS (M-series Macs)

Based on my benchmarks, accuracy should be about the same or slightly better.

v0.4.14: Merge pull request #141 from VikParuchuri/dev

30 Jun 14:39
f7c6c04

A new transformers version added a new kwarg to the Donut embeddings. This release handles and ignores that kwarg, and also slightly future-proofs the code in case this happens again.
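A common way to keep a wrapper working when an upstream library starts passing a new keyword argument is to accept `**kwargs` and ignore what is not needed. The sketch below is a minimal, hypothetical illustration of that pattern: `StrictEmbeddings`, `TolerantEmbeddings` and `some_new_flag` are made-up names, not the actual Donut or transformers classes, and this is not necessarily how surya implements the fix.

```python
from typing import Any, List


class StrictEmbeddings:
    """Hypothetical embeddings layer whose signature lists only known arguments."""

    def forward(self, pixel_values: List[float]) -> List[float]:
        # Stand-in for the real embedding computation.
        return [2.0 * v for v in pixel_values]


class TolerantEmbeddings(StrictEmbeddings):
    """Same computation, but extra keyword arguments are accepted and ignored."""

    def forward(self, pixel_values: List[float], **kwargs: Any) -> List[float]:
        # A kwarg added by a newer caller (e.g. a new library version) lands in
        # `kwargs` and is dropped instead of raising TypeError.
        return super().forward(pixel_values)


def strict_rejects_new_kwarg() -> bool:
    """Return True if the strict layer raises on an unexpected keyword argument."""
    try:
        StrictEmbeddings().forward([1.0], some_new_flag=True)
    except TypeError:
        return True
    return False


print(strict_rejects_new_kwarg())  # True
print(TolerantEmbeddings().forward([1.0, 2.0], some_new_flag=True))  # [2.0, 4.0]
```

The trade-off is that misspelled keyword arguments are swallowed silently as well, so the tolerant signature is best kept to the few entry points that an upstream library actually calls.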

Minor bugfixes

28 May 21:44
c5f5e77
  • Fix rotation and copy bugs

Fix image bugs

28 May 21:16
53135d0
  • Fix bugs with RGBA images
  • Fix assert bug
  • Restore the thumbnail method for resizing
  • Slightly optimize segformer code
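RGBA images are a classic source of bugs in image pipelines: if the alpha channel is simply dropped, fully transparent pixels keep whatever colour they happen to store, often black. The sketch below works on plain Python pixel tuples and contrasts that naive conversion with alpha-compositing onto a background. It illustrates one common fix and is not taken from surya's code; the white default background is an assumption.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def drop_alpha(pixel: RGBA) -> RGB:
    """Naive conversion: discard alpha and keep whatever colour is stored."""
    r, g, b, _ = pixel
    return (r, g, b)


def composite_over(pixel: RGBA, background: RGB = (255, 255, 255)) -> RGB:
    """Alpha-composite one RGBA pixel over an opaque background colour."""
    r, g, b, a = pixel
    alpha = a / 255.0
    # out = alpha * foreground + (1 - alpha) * background, per channel.
    out = [
        round(alpha * fg + (1.0 - alpha) * bg)
        for fg, bg in zip((r, g, b), background)
    ]
    return (out[0], out[1], out[2])


def flatten_image(pixels: List[RGBA], background: RGB = (255, 255, 255)) -> List[RGB]:
    """Flatten a row-major list of RGBA pixels to RGB."""
    return [composite_over(p, background) for p in pixels]


# A fully transparent pixel that happens to store black:
transparent = (0, 0, 0, 0)
print(drop_alpha(transparent))      # (0, 0, 0): would show up as black ink
print(composite_over(transparent))  # (255, 255, 255): blank paper
```

With Pillow, the same idea is typically expressed by pasting the image onto a white `RGB` canvas using its alpha band as the paste mask.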

Change image resize

28 May 02:55
d167369
  • Switch image resizing from cv2 to PIL, since cv2 caused benchmark regressions

OCR speedups

27 May 21:56
31e36e7
  • Speed up the base OCR model by ~15-20% and reduce memory usage by ~25%, allowing higher batch sizes
  • Add a static cache for compilation; using torch.compile gives another ~15% speedup
  • Other optimizations, such as faster image resizing
  • Bugfixes, such as allowing OCR language inputs of different lengths (so documents with different languages can be batched together)
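Batching documents whose language lists differ in length generally means padding those lists to a common length and carrying a mask so the padded positions are ignored. The sketch below shows that idea in plain Python; the function name, `pad_id` and the integer language ids are hypothetical and not surya's API. The optional `target_len` pads every row to a fixed size, which is the general motivation behind static caches with torch.compile: keeping shapes fixed avoids recompiling for every new input length.

```python
from typing import List, Optional, Tuple


def pad_language_batch(
    batch: List[List[int]],
    pad_id: int = 0,
    target_len: Optional[int] = None,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad per-document language-id lists to one length and build a mask.

    pad_id and the integer language ids are hypothetical placeholders.
    If target_len is given, every row is padded to that fixed length.
    """
    if not batch:
        return [], []
    longest = max(len(langs) for langs in batch)
    length = longest if target_len is None else target_len
    if length < longest:
        raise ValueError("target_len is shorter than the longest language list")

    padded: List[List[int]] = []
    mask: List[List[int]] = []
    for langs in batch:
        n_pad = length - len(langs)
        padded.append(langs + [pad_id] * n_pad)
        # 1 marks a real language id, 0 marks padding the model should ignore.
        mask.append([1] * len(langs) + [0] * n_pad)
    return padded, mask


ids, mask = pad_language_batch([[5], [5, 7, 9]])
print(ids)   # [[5, 0, 0], [5, 7, 9]]
print(mask)  # [[1, 0, 0], [1, 1, 1]]
```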

Processor improvements

23 May 23:12
80889bd
  • Remove unneeded format conversions
  • Fix an OCR bug where only one color channel was used; results should be better now
  • Speed up layout/text detection a bit
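To see why using a single channel can hurt OCR, consider coloured ink: red text on white paper is invisible in the red channel alone. The sketch below compares keeping one channel with a weighted combination of all three, using the ITU-R BT.601 luma weights (the same weights Pillow documents for conversion to mode `L`). It only illustrates the failure mode; the release note does not say how the fixed code combines the channels.

```python
from typing import Callable, Tuple

RGB = Tuple[int, int, int]


def single_channel(pixel: RGB, channel: int = 0) -> float:
    """Keep only one colour channel (the kind of behaviour the bug caused)."""
    return float(pixel[channel])


def luma(pixel: RGB) -> float:
    """Combine all three channels with ITU-R BT.601 weights."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) / 1000


def contrast(ink: RGB, paper: RGB, to_gray: Callable[[RGB], float]) -> float:
    """Absolute intensity difference between ink and paper after conversion."""
    return abs(to_gray(paper) - to_gray(ink))


red_ink, white_paper = (255, 0, 0), (255, 255, 255)
print(contrast(red_ink, white_paper, single_channel))      # 0.0: red text vanishes
print(round(contrast(red_ink, white_paper, luma), 1))      # 178.8
```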

OCR speedup

18 May 04:03
74e8c0c

Cut OCR time in half. Combined with the previous release, OCR should now take about 40% of the time it did before.
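Read together, the two figures imply how much the earlier OCR speedup contributed. Writing $t_0$ for the original OCR time, $t_1$ for the time after the earlier speedup the note refers to, and $t_2$ for the time after this release (a back-of-the-envelope reading of the note, not a benchmark):

$$
t_2 = \tfrac{1}{2}\,t_1, \qquad t_2 \approx 0.4\,t_0 \quad\Longrightarrow\quad t_1 \approx 0.8\,t_0,
$$

so the earlier release would have cut OCR time by roughly 20%, and the combined speedup is about $t_0 / t_2 \approx 2.5\times$.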

Significant speedup for layout, line detection

17 May 22:04
7a65c45
  • Improve CPU postprocessing for line detection and layout; postprocessing now takes about 1/3 of its original time
  • Unpin transformers version after investigating model performance

This should result in a ~2x speedup for layout and text detection, most noticeable on GPU, though I haven't fully benchmarked it.
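Amdahl's law gives a rough way to relate the 1/3 postprocessing time to the ~2x overall figure, and to see why the gain is largest on GPU. The sketch assumes postprocessing is the only thing that changed, which ignores the transformers unpin, so the numbers are illustrative rather than measured.

```python
def overall_speedup(fraction: float, part_speedup: float) -> float:
    """Amdahl's law: whole-pipeline speedup when only `fraction` of the
    runtime is accelerated by a factor of `part_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if part_speedup <= 0.0:
        raise ValueError("part_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def fraction_needed(target: float, part_speedup: float) -> float:
    """Share of runtime the accelerated part must occupy to reach `target`.

    Solves 1 / target = (1 - f) + f / part_speedup for f.
    """
    if part_speedup <= 1.0:
        raise ValueError("part_speedup must be greater than 1")
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / part_speedup)


# Postprocessing now takes about 1/3 of its original time (a 3x part speedup).
print(overall_speedup(0.75, 3.0))            # 2.0
print(round(fraction_needed(2.0, 3.0), 2))   # 0.75
print(round(overall_speedup(0.25, 3.0), 2))  # 1.2
```

Under that assumption, a ~2x overall speedup implies postprocessing was about three quarters of the runtime. On GPU the model itself runs fast, so CPU postprocessing is a larger share of the total, which fits the note that the effect is most noticeable there; if postprocessing were only a quarter of the runtime, the same change would yield about 1.2x.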