This project implements a PDF chatbot powered by Retrieval-Augmented Generation (RAG). Users can upload any PDF document, have its contents analyzed, and ask questions about the uploaded file. The chatbot answers based on the PDF's content and can decline to reply if the requested information is not present in the document.
- PDF Upload & Analysis: Upload any PDF file and have it automatically analyzed for content extraction.
- Interactive Chatbot: Ask questions related to the uploaded PDF and receive contextual answers.
- Out-of-Scope Responses: If the requested information is missing from the PDF, the chatbot can say so and decline to answer.
- Document Chunking: The PDF is split into smaller chunks to manage the document efficiently during retrieval.
- Embeddings: Each chunk is embedded into a high-dimensional space to represent the content numerically.
- Vector Store: The embedded chunks are stored in a vector database, enabling fast similarity searches based on the user’s query.
- Retriever-LLM: The retriever finds the most relevant chunks from the vector store, which are passed to a language model (LLM) to generate responses.
- Augmented Responses: The LLM synthesizes answers by using both the retrieved context from the PDF and its own pre-trained knowledge.
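The chunk, embed, store, and retrieve steps above can be sketched with the standard library alone. This is a toy illustration, not the project's implementation: the word-count "embedding" and the in-memory `ToyVectorStore` are hypothetical stand-ins for a real embedding model and vector database, and the names `chunk_text`, `embed`, and `cosine` are assumptions made for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once the remaining words are already covered by the previous chunk.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (a real system would use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self, chunks: list[str]):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        """Return the k chunks most similar to the query as (score, chunk) pairs."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), chunk) for chunk, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = ToyVectorStore([
    "The invoice total is 420 dollars.",
    "Cats sleep most of the day.",
    "Payment is due within 30 days.",
])
print(store.search("What is the invoice total?", k=1))
```

The overlap between neighbouring chunks is a common way to avoid cutting a sentence's context in half at a chunk boundary; the sizes used here are arbitrary.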
This file handles the UI components for the project, built with two primary sections:
- Left Panel:
  - Upload Button: Allows the user to upload a PDF file.
  - Analyze Button: Analyzes the uploaded PDF and prepares it for question answering.
- Right Panel:
  - Chat Area: Displays the chat history between the user and the chatbot.
  - User Input Area: A textbox where the user types questions, with a send button to submit them.
This file handles the following:
- PDF Analysis: Processes the uploaded PDF, chunks the document, and embeds each chunk.
- Vector Database & Retriever: The chunks are stored in a vector database for efficient retrieval.
- Query Processing: When the user asks a question, it is embedded, and relevant chunks are retrieved from the vector store.
- LLM Integration: The retrieved information is passed to the language model for generating responses.
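The hand-off from retriever to LLM described above could look roughly like the sketch below. It assumes the retriever returns `(score, chunk)` pairs and that `llm` is any callable that maps a prompt string to a reply string; `build_prompt`, `answer`, `REFUSAL`, and the `min_score` threshold are hypothetical names and values chosen for illustration, not taken from the project.

```python
from typing import Callable, Optional

# Hypothetical refusal message used when nothing relevant was retrieved.
REFUSAL = "I could not find that information in the uploaded PDF."


def build_prompt(question: str,
                 retrieved: list[tuple[float, str]],
                 min_score: float = 0.2) -> Optional[str]:
    """Assemble a prompt from retrieved chunks, or return None if none is relevant."""
    relevant = [chunk for score, chunk in retrieved if score >= min_score]
    if not relevant:
        return None
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(relevant, 1))
    return (
        "Answer the question using the context below. "
        f"If the answer is not in the context, reply exactly: {REFUSAL}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str,
           retrieved: list[tuple[float, str]],
           llm: Callable[[str], str]) -> str:
    """Refuse early when retrieval found nothing relevant; otherwise ask the LLM."""
    prompt = build_prompt(question, retrieved)
    if prompt is None:
        return REFUSAL
    return llm(prompt)


# Usage with a placeholder model that just reports the prompt length.
reply = answer(
    "What is the invoice total?",
    [(0.8, "The invoice total is 420 dollars."), (0.05, "Cats sleep most of the day.")],
    llm=lambda prompt: f"(model reply to a {len(prompt)}-character prompt)",
)
print(reply)
```

Filtering on a similarity threshold before calling the model is one simple way to support refusals; instructing the model to refuse in the prompt is a second, complementary safeguard.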
This file handles grammar and spelling correction of user inputs:
- User Input Correction: It checks the user's query for spelling and grammar errors and replaces the original text in the chat area with the corrected version.
- Punctuation Handling: This version of the project does not correct punctuation.
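As a rough illustration of query correction, the sketch below fixes misspelled words by fuzzy-matching them against a small vocabulary with the standard library's `difflib`. This is an assumption made for the example and covers spelling only: the project may well use a dedicated spelling or grammar library, and `VOCABULARY`, `correct_query`, and the `cutoff` value are hypothetical. Punctuation is passed through unchanged, in line with the limitation noted above.

```python
import difflib
import re

# Hypothetical vocabulary; a real corrector would use a full dictionary.
VOCABULARY = ["what", "is", "the", "total", "amount", "invoice",
              "document", "summary", "of", "this"]


def correct_query(query: str, vocabulary: list[str] = VOCABULARY,
                  cutoff: float = 0.75) -> str:
    """Replace each unknown word with its closest vocabulary match, if any."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in vocabulary:
            return word  # already a known word, keep it as typed
        candidates = difflib.get_close_matches(word.lower(), vocabulary,
                                               n=1, cutoff=cutoff)
        # Keep the original word when nothing is similar enough.
        return candidates[0] if candidates else word

    # Only alphabetic runs are rewritten; punctuation and digits stay untouched.
    return re.sub(r"[A-Za-z]+", fix, query)


print(correct_query("What is the invoise totl?"))
```

Corrected words come back in lowercase because the vocabulary is lowercase; preserving the user's capitalization would need an extra step.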
- Clone the Repository: `git clone https://github.com/dheerajkallakuri/pdfChatbot.git`
- Install Requirements: From the project directory, install all required libraries using the `requirements.txt` file by running: `pip install -r requirements.txt`
- Run the Application: From the project directory, run: `python app.py`
This PDF chatbot uses RAG to give users an interactive way to question any PDF. Because responses are grounded in the chunks retrieved from the document, answers stay tied to the document's content.