Releases: VikParuchuri/surya

OCR v2

16 Aug 17:40

VikParuchuri

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

A new version of the OCR model with a custom architecture.

20% faster
Automatic language detection, with support for optional language hints
Better accuracy on old/noisy documents
Basic English handwriting support (to be improved soon)

Faster text detection + layout

12 Jul 16:06

VikParuchuri

Switched model architecture for the text detection and layout models:

30% faster on GPU
4x faster on CPU
12x faster on MPS (M-series Macs)

Accuracy should be about the same, or slightly better, from my benchmarks.

v0.4.14: Merge pull request #141 from VikParuchuri/dev

30 Jun 14:39

VikParuchuri

A new transformers version added a kwarg to the Donut embeddings. This release accepts and ignores that kwarg, and is slightly future-proofed in case this happens again.
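
One common way to tolerate a keyword argument that a newer caller starts passing is to accept and discard extra keyword arguments. The sketch below only illustrates that pattern and is not surya's actual code; the class names and the `new_upstream_flag` kwarg are hypothetical.

```python
class StrictEmbeddings:
    """Hypothetical layer with a fixed signature (not surya's real class)."""

    def forward(self, pixel_values):
        return [value * 2 for value in pixel_values]


class TolerantEmbeddings:
    """Hypothetical layer that ignores keyword arguments it does not know."""

    def forward(self, pixel_values, **unused_kwargs):
        # Extra kwargs from a newer caller are collected here and dropped,
        # so the call succeeds instead of raising TypeError.
        return [value * 2 for value in pixel_values]


def call_with_new_kwarg(layer):
    """Simulate a newer caller that passes an extra (hypothetical) kwarg."""
    try:
        return layer.forward([1, 2], new_upstream_flag=True)
    except TypeError:
        return None


strict_result = call_with_new_kwarg(StrictEmbeddings())
tolerant_result = call_with_new_kwarg(TolerantEmbeddings())
```

The trade-off is that silently ignoring kwargs can hide a caller's typo, so this pattern fits best at a boundary with a library whose signature may change.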

Minor bugfixes

28 May 21:44

VikParuchuri

Fix rotation and copy bugs

Fix image bugs

28 May 21:16

VikParuchuri

Fix bugs with RGBA images
Fix assert bug
Add back in thumbnail method for resizing
Slightly optimize segformer code

Change image resize

28 May 02:55

VikParuchuri

Switch image resizing from cv2 to PIL; cv2 caused benchmark regressions

OCR speedups

27 May 21:56

VikParuchuri

Speed up base OCR model ~15-20%, and reduce memory usage by ~25% (can do higher batch sizes)
Add static cache for compilation - torch.compile will result in another 15% speedup
Other optimizations, like faster image resizing
Bugfixes, like enabling different length language inputs for OCR (batching different docs with different languages together)
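
Batching documents whose language lists differ in length usually means padding the shorter lists so the batch is rectangular. The release notes do not say how surya does this; the helper name `pad_language_lists` and the `<pad>` token below are assumptions made for illustration.

```python
def pad_language_lists(batch_langs, pad_token="<pad>"):
    """Pad each document's language list to the longest list in the batch.

    Hypothetical helper: the name and the pad token are illustrative only.
    """
    max_len = max((len(langs) for langs in batch_langs), default=0)
    return [
        list(langs) + [pad_token] * (max_len - len(langs))
        for langs in batch_langs
    ]


# Two documents batched together: one with a single language, one with three.
padded = pad_language_lists([["en"], ["en", "hi", "ar"]])
```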

Processor improvements

23 May 23:12

VikParuchuri

Remove unneeded format conversions
Fix an OCR bug where only one color channel was used; results should be better now
Speed up layout/text detection a bit

OCR speedup

18 May 04:03

VikParuchuri

Cut OCR time in half. Combined with the previous release, OCR should now take about 40% as much time as it did before.
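
Successive speedups multiply, so the two figures above imply what the earlier release contributed. The short calculation below is an inference from those two numbers only, assuming the "40%" is the product of both releases' time factors; it is not a measured benchmark.

```python
# Fraction of the original OCR time remaining after each change.
this_release_factor = 0.5   # "cut OCR time in half"
combined_factor = 0.4       # "about 40% as much time as it did before"

# Time factors multiply: combined = previous * this_release.
implied_previous_factor = combined_factor / this_release_factor

# Overall speedup is the inverse of the remaining time fraction.
overall_speedup = 1 / combined_factor

print(f"implied previous-release factor: {implied_previous_factor:.2f}")
print(f"overall speedup: {overall_speedup:.1f}x")
```

Under that assumption, the earlier release would have brought OCR time down to roughly 80% of the original, for a combined speedup of about 2.5x.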

Significant speedup for layout, line detection

17 May 22:04

VikParuchuri

Improve CPU postprocessing for line detection and layout - cut postprocessing time to 1/3 of original
Unpin transformers version after investigating model performance

This should result in a ~2x speedup for layout and text detection. The effect will be most noticeable on GPU. I haven't fully benchmarked, though.
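
How much an end-to-end run gains from faster postprocessing depends on what share of the total time postprocessing took. The sketch below applies the standard Amdahl's-law calculation; the fractions tried are illustrative assumptions, not measurements from surya, and it ignores any effect of the transformers version change.

```python
def overall_speedup(postprocess_fraction, postprocess_factor=1 / 3):
    """End-to-end speedup when only postprocessing gets faster.

    postprocess_fraction: assumed share of total time spent in postprocessing.
    postprocess_factor: remaining postprocessing time (1/3 per the notes above).
    """
    remaining_time = (1 - postprocess_fraction) + postprocess_fraction * postprocess_factor
    return 1 / remaining_time


# Try a few assumed shares of postprocessing in the total runtime.
for fraction in (0.25, 0.5, 0.75):
    print(f"postprocessing share {fraction:.0%}: {overall_speedup(fraction):.2f}x overall")
```

By this calculation, a roughly 2x overall speedup from that change alone would correspond to postprocessing having taken about three quarters of the total time, which fits the note that the effect is most noticeable on GPU, where model inference is a smaller share.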
