Query PDF and CSV with LLM

This repo hosts a FastAPI app for querying PDF and CSV files for information using natural language.

The query flow for PDF files is implemented as follows:

Read the text from the PDF file
Split the text into chunks
Encode the chunks into embedding vectors using Hugging Face GTR-T5 or OpenAI Ada text embeddings
Upload the text chunks and embedding vectors to the Qdrant vector database
Encode the user's query text into an embedding vector
Search the vector database for the text chunks whose embeddings are closest to the query embedding by cosine similarity
Generate an answer with FLAN-T5 or OpenAI GPT-3 based on the user's query text and the retrieved text chunks
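
The steps above can be sketched in plain Python. This is a minimal illustration, not the code in queryfile.py: `toy_embed` is a bag-of-words stand-in for GTR-T5 or Ada embeddings, the in-memory `search` stands in for Qdrant, and all function names, the chunk size, and the prompt wording are assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (sizes are illustrative)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, top_k=1):
    """Return the top_k (score, chunk) pairs, highest similarity first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, context):
    """Combine the retrieved chunk and the question into one LLM prompt."""
    return (
        "Answer the question using the context.\n\n"
        f"Context: {context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "Qdrant stores embedding vectors.",
        "The invoice total is 42 dollars.",
    ]
    question = "What is the invoice total?"
    score, best = search(question, chunks)[0]
    print(build_prompt(question, best))
```

In the real flow the prompt would be sent to FLAN-T5 or GPT-3. The sketch stops at building the prompt so that it runs without any model or database.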

Setup

Clone this repo

git clone https://github.com/haizadtarik/queryfile.git

Install dependencies

cd queryfile
python -m pip install -r requirements.txt

Run the Qdrant database

docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
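
Before starting the app, it can help to confirm that the container is accepting connections on the port mapped above. The following is a small sketch using only the standard library. The helper name `is_port_open` is hypothetical, and it only checks that something is listening on the port, not that the service is healthy.

```python
import socket


def is_port_open(host="localhost", port=6333, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Qdrant port reachable:", is_port_open())
```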

NOTE

To use OpenAI embeddings or GPT, create a .env file and put your API key in it:

OPENAI_KEY=<OPEN_API_KEY>
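
How the app reads this file is not shown here. As an illustration only, a file in this format could be parsed with the standard library as below. The function name `load_env_file` is hypothetical, and the actual code may use a different loader.

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    if os.path.exists(".env"):
        settings = load_env_file()
        print("OPENAI_KEY is set:", bool(settings.get("OPENAI_KEY")))
```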
