Please provide Python API #2

Closed
saitej123 opened this issue Jul 24, 2024 · 14 comments · Fixed by #21
Labels
enhancement New feature or request

Comments

@saitej123

Please provide Python API

@tylermaran
Contributor

I'd love to support a Python API and publish a package on pip. Right now neither of the maintainers are super good Python devs, but if you know anyone who would want to make a contribution, let us know!

The roadmap so far is:

  • Add more chunking options (right now we're just separating at page level)
  • Supporting additional document formats. This will likely be adding a document => pdf step prior to the LLM step.
  • Supporting additional models

@batmanscode

@tylermaran I've been thinking of building a version of this for myself for a while now and I was so excited to see your project on HN so that I didn't have to build it myself haha

Let me look into this. Maybe I can help out with a python package

@tylermaran
Contributor

Hey @batmanscode 🦇

I would love to have some help here. It looks like there is a similar pip package for pdf2image.
https://github.com/Belval/pdf2image

It uses Poppler under the hood. I wonder if there's a variant that uses ImageMagick like the current node version does. But either way it should be pretty easy to set up. Within the npm setup we have an install-dependencies script to make sure all the prereqs are set up.

I'd like to keep this as a monorepo if possible. Probably something like:

zerox/
├── .gitignore
├── README.md
├── LICENSE
├── package.json     # npm config
├── setup.py         # pip config
├── node-zerox/      # typescript source
│   ├── src/
│   ├── dist/
│   ├── tests/
│   └── etc/
└── py-zerox/        # python source
    ├── src/
    ├── tests/
    └── etc/            

@wizenheimer
Contributor

Hey @tylermaran and @batmanscode,
This looks interesting, would love to collaborate. I have experience with both TypeScript and Python package development.

Have reviewed zerox source, can assist in replicating it to Python. My goal would be to ensure that the API and build process remain consistent across both the TypeScript and Python implementations.

Looking forward to working together!

@wizenheimer
Contributor

Hey @tylermaran,
Quick update. Prepared a PR #4 which presents the monorepo structure for Zerox. This includes Poetry for dependency management, a Makefile for build automation, and some code quality checks.
Current implementations are placeholders. The actual implementation details will be added once the proposed structure gets reviewed and approved :D

@saitej123
Author

Can gpt-4o-mini also provide bounding box details? I want to highlight key information in the document.

@tylermaran
Contributor

@saitej123 I've been looking into this as well. It doesn't seem to be immediately available using gpt-4o-mini.

I know it's possible to use a library like YOLOv8 to grab bounding boxes. But that gets a little harder when you have to host an additional model.

I think the general flow would be:

  1. Parse the document with gpt mini
  2. Split the resulting markdown into semantic sections (i.e. headers, subheaders, tables, etc.)
  3. For each semantic section, use some tool to find bounding boxes in the original image
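
Step 2 of the flow above could be sketched roughly like this. It is only an illustration, not Zerox code: the `Section` type and the `split_markdown` helper are hypothetical names, and the splitting rules (ATX headers, pipe tables, blank-line-separated paragraphs) are a simplifying assumption about what the model's markdown looks like.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    kind: str  # "header", "table", or "paragraph"
    text: str


# ATX-style markdown headers: one to six '#' followed by whitespace.
HEADER_RE = re.compile(r"^#{1,6}\s")


def split_markdown(markdown: str) -> List[Section]:
    """Split markdown into coarse semantic sections (hypothetical helper)."""
    sections: List[Section] = []
    buffer: List[str] = []
    buffer_kind: Optional[str] = None

    def flush() -> None:
        # Emit whatever lines have accumulated as one section.
        nonlocal buffer, buffer_kind
        if buffer:
            sections.append(Section(buffer_kind, "\n".join(buffer)))
        buffer, buffer_kind = [], None

    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current block
        elif HEADER_RE.match(stripped):
            flush()
            sections.append(Section("header", stripped))
        else:
            # Lines starting with '|' are treated as table rows.
            kind = "table" if stripped.startswith("|") else "paragraph"
            if kind != buffer_kind:
                flush()
                buffer_kind = kind
            buffer.append(stripped)
    flush()
    return sections


if __name__ == "__main__":
    page = "# Invoice\n\nBilled to ACME.\n\n| item | price |\n|---|---|\n| pen | 2 |\n"
    for section in split_markdown(page):
        print(section.kind, repr(section.text))
```

Each resulting section could then be handed to whatever tool ends up doing step 3.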

This is a bit separate from the python request, so I added a tracking issue #7

@saitej123
Author

If we use Azure OCR or GCP, we can map the text to bounding boxes. I'm not sure, though: the mapping may fail if the OCR splits the text in a different way.
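
To make that concern concrete, here is a rough sketch of one way the mapping could work, using fuzzy matching from the standard library. Everything in it is an assumption: the OCR output is modeled as a plain list of `(word, (x0, y0, x1, y1))` tuples rather than the actual Azure or GCP response format, and `locate_section` and `min_score` are hypothetical names. It returns `None` when the best match is weak, which is the failure case described above.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout
OcrWord = Tuple[str, Box]                # simplified stand-in for OCR output


def normalize(text: str) -> str:
    """Lowercase and drop markdown/punctuation so both sides compare fairly."""
    return " ".join(re.findall(r"\w+", text.lower()))


def union(boxes: List[Box]) -> Box:
    """Smallest box that encloses all the given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def locate_section(
    section_text: str, ocr_words: List[OcrWord], min_score: float = 0.8
) -> Optional[Tuple[Box, float]]:
    """Find the run of OCR words most similar to a markdown section."""
    target = normalize(section_text).split()
    if not target or not ocr_words:
        return None
    tokens = [normalize(word) for word, _ in ocr_words]
    size = min(len(target), len(tokens))
    target_str = " ".join(target)

    best_score, best_start = 0.0, 0
    # Slide a window of `size` words over the OCR output and score each one.
    for start in range(len(tokens) - size + 1):
        window = " ".join(t for t in tokens[start:start + size] if t)
        score = SequenceMatcher(None, target_str, window).ratio()
        if score > best_score:
            best_score, best_start = score, start

    if best_score < min_score:
        return None  # the OCR split the text too differently to trust
    boxes = [box for _, box in ocr_words[best_start:best_start + size]]
    return union(boxes), best_score


if __name__ == "__main__":
    words = [
        ("Invoice", (0, 0, 50, 10)),
        ("Total:", (0, 20, 40, 30)),
        ("$42", (45, 20, 70, 30)),
    ]
    print(locate_section("**Total: $42**", words))
```

Returning the score alongside the box lets the caller decide how strict to be, instead of silently highlighting the wrong region.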

@tylermaran tylermaran added the enhancement New feature or request label Jul 29, 2024
@tylermaran
Contributor

@wizenheimer merged your repo updates for the python package in #4

Great work. Now we just need to add the core logic.

@wizenheimer
Contributor

Hey @tylermaran,
Added PR #10, introducing the Python SDK for Zerox. Ensured the external API and types remain consistent across the SDKs.

@RazvanMihaiPopa

Could you add a usage section for python in the README?

@guici123

guici123 commented Sep 4, 2024

Could you add a usage section for python in the README? @tylermaran

@pradhyumna85
Contributor

@guici123, @RazvanMihaiPopa have a look at this PR #21, should be useful.

7 participants