Please provide Python API #2

saitej123 · 2024-07-24T15:54:55Z

Please provide Python API

tylermaran · 2024-07-25T19:47:32Z

I'd love to support a Python API and publish a package on pip. Right now neither of the maintainers are super good Python devs, but if you know anyone who would want to make a contribution, let us know!

The roadmap so far is:

- Add more chunking options (right now we're just separating at page level)
- Support additional document formats. This will likely mean adding a document => pdf step prior to the LLM step.
- Support additional models

batmanscode · 2024-07-27T10:56:04Z

@tylermaran I've been thinking of building a version of this for myself for a while now and I was so excited to see your project on HN so that I didn't have to build it myself haha

Let me look into this. Maybe I can help out with a python package

tylermaran · 2024-07-27T18:55:42Z

Hey @batmanscode 🦇

I would love to have some help here. It looks like there is a similar pip package for pdf2image.
https://github.com/Belval/pdf2image

Uses poppler under the hood. I wonder if there's a variant that uses ImageMagick like the current Node version does. But either way it should be pretty easy to set up. Within the npm setup we have an install-dependencies script to make sure all the prereqs are set up.
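For reference, a minimal sketch of the pdf2image route might look like the following. It assumes `pdf2image` and poppler are installed; the function names, the output file naming, and the 200 DPI default are illustrative choices, not anything taken from the zerox codebase.

```python
import shutil
from pathlib import Path
from typing import List


def poppler_available() -> bool:
    """Return True if poppler's pdftoppm binary is on PATH (pdf2image shells out to it)."""
    return shutil.which("pdftoppm") is not None


def pdf_to_page_images(pdf_path: str, out_dir: str, dpi: int = 200) -> List[str]:
    """Render each PDF page to a PNG and return the file paths in page order."""
    # Imported lazily so this module still loads when pdf2image is not installed.
    from pdf2image import convert_from_path

    if not poppler_available():
        raise RuntimeError("poppler (pdftoppm) not found; install it first")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    # convert_from_path returns one PIL image per page.
    for number, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        target = out / f"page_{number}.png"  # hypothetical naming scheme
        page.save(target, "PNG")
        paths.append(str(target))
    return paths
```

The `poppler_available` check plays roughly the role the install-dependencies script plays on the npm side: fail early with a clear message if the prerequisite is missing.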

I'd like to keep this as a monorepo if possible. Probably something like:

zerox/
├── .gitignore
├── README.md
├── LICENSE
├── package.json     # npm config
├── setup.py         # pip config
├── node-zerox/      # typescript source
│   ├── src/
│   ├── dist/
│   ├── tests/
│   └── etc/
└── py-zerox/        # python source
    ├── src/
    ├── tests/
    └── etc/

wizenheimer · 2024-07-27T22:21:40Z

Hey @tylermaran and @batmanscode,
This looks interesting, would love to collaborate. I have experience with both TypeScript and Python package development.

Have reviewed zerox source, can assist in replicating it to Python. My goal would be to ensure that the API and build process remain consistent across both the TypeScript and Python implementations.

Looking forward to working together!

wizenheimer · 2024-07-28T17:43:22Z

Hey @tylermaran,
Quick update. Prepared a PR #4 which presents the monorepo structure for Zerox. This includes Poetry for dependency management, a Makefile for build automation, and some code quality checks.
Current implementations are placeholders. The actual implementation details will be added once the proposed structure gets reviewed and approved :D

saitej123 · 2024-07-29T07:42:03Z

Can gpt-4o-mini also provide bounding box details? I'd like to highlight key information in the document.

tylermaran · 2024-07-29T18:43:51Z

@saitej123 I've been looking into this as well. It doesn't seem to be immediately available using gpt-4o-mini.

I know it's possible to use a library like YOLOv8 to grab bounding boxes. But that gets a little harder when you have to host an additional model.

I think the general flow would be:

1. Parse the document with gpt-4o-mini
2. Split the resulting markdown into semantic sections (i.e. headers, subheaders, tables, etc.)
3. For each semantic section, use some tool to find bounding boxes in the original image
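The markdown-splitting step above could be sketched roughly like this. It is a simplified illustration that only splits on ATX headings (`#` through `######`); tables and other block types are left inside their section body, and headings inside fenced code blocks are not handled. The `Section` type and `split_markdown` name are made up for the example.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Section:
    level: int  # heading depth; 0 for text before the first heading
    title: str
    body: str


def split_markdown(markdown: str) -> List[Section]:
    """Split markdown into sections, starting a new one at every heading."""
    sections: List[Section] = []
    level, title, lines = 0, "", []

    def flush() -> None:
        # Skip an empty leading section (no title and no non-blank text).
        if title or any(line.strip() for line in lines):
            sections.append(Section(level, title, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level, title, lines = len(match.group(1)), match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Each `Section` could then be handed to whatever tool does the bounding-box lookup on the page image.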

This is a bit separate from the python request, so I added a tracking issue #7

saitej123 · 2024-07-29T19:00:21Z

If we use Azure OCR or GCP, we can map the bounding boxes. I'm not sure, though: the mapping may fail if the text is split in a different way.
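One way to make that mapping tolerant of different splits is to match on normalized words rather than exact strings. The sketch below assumes the OCR output has already been converted into `(text, (x0, y0, x1, y1))` word tuples; that shape, the `locate_section` name, and the 0.6 coverage threshold are all assumptions for illustration, not the actual Azure or GCP response format.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
OcrWord = Tuple[str, Box]                # hypothetical normalized OCR output


def _norm(token: str) -> str:
    """Lowercase and drop punctuation/markdown symbols so '**Total:**' matches 'Total:'."""
    return re.sub(r"[^a-z0-9]", "", token.lower())


def locate_section(
    section_text: str, ocr_words: List[OcrWord], min_coverage: float = 0.6
) -> Optional[Box]:
    """Return the union box of OCR words matching the section, or None if too few match."""
    wanted = [t for t in (_norm(w) for w in section_text.split()) if t]
    seen = [_norm(text) for text, _ in ocr_words]
    if not wanted:
        return None

    # Align the two word sequences; matching blocks keep the original order.
    matcher = SequenceMatcher(None, wanted, seen, autojunk=False)
    hits: List[int] = []
    for block in matcher.get_matching_blocks():
        hits.extend(range(block.b, block.b + block.size))

    if not hits or len(hits) / len(wanted) < min_coverage:
        return None  # the mapping failed, e.g. the text was split or ordered differently

    boxes = [ocr_words[i][1] for i in hits]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

Because the alignment is order-sensitive, a section whose words appear in a different reading order in the OCR output will fall below the threshold and return `None`, which at least makes the failure case explicit instead of highlighting the wrong region.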

tylermaran · 2024-07-30T01:07:49Z

@wizenheimer merged your repo updates for the python package in #4

Great work. Now we just need to add the core logic.

wizenheimer · 2024-07-30T18:35:42Z

Hey @tylermaran,
Added the PR #10 introducing Python SDK for Zerox. Ensured the external API and types remain consistent across the SDKs.

RazvanMihaiPopa · 2024-08-14T12:34:35Z

Could you add a usage section for python in the README?

guici123 · 2024-09-04T03:20:48Z

Could you add a usage section for python in the README?

guici123 · 2024-09-04T03:21:13Z

Could you add a usage section for python in the README? @tylermaran

pradhyumna85 · 2024-09-06T01:39:16Z

@guici123, @RazvanMihaiPopa have a look at this PR #21, should be useful.

tylermaran added the enhancement label (New feature or request) on Jul 29, 2024

pradhyumna85 mentioned this issue Sep 10, 2024

FEAT: Introducing support for vision models from all major providers like Azure OpenAI, Anthropic etc and custom system prompt in python SDK #21

Merged

xdotli closed this as completed in #21 Sep 12, 2024
