PDF Chatbot using Retrieval Augmented Generation (RAG)

This project implements a PDF chatbot powered by Retrieval Augmented Generation (RAG). Users can upload any PDF document, have its contents analyzed, and ask questions about the uploaded file. The chatbot answers from the PDF's content and can decline to reply if the requested information is not present in the document.

Key Features

PDF Upload & Analysis: Upload any PDF file and have it automatically analyzed for content extraction.
Interactive Chatbot: Ask questions related to the uploaded PDF and receive contextual answers.
Out-of-Scope Responses: If the requested information is not in the PDF, the chatbot can say so instead of answering.

RAG Architecture Overview

Document Chunking: The PDF is split into smaller chunks to manage the document efficiently during retrieval.
Embeddings: Each chunk is embedded into a high-dimensional space to represent the content numerically.
Vector Store: The embedded chunks are stored in a vector database, enabling fast similarity searches based on the user’s query.
Retriever-LLM: The retriever finds the most relevant chunks from the vector store, which are passed to a language model (LLM) to generate responses.
Augmented Responses: The LLM synthesizes answers by using both the retrieved context from the PDF and its own pre-trained knowledge.
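
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the code in `ragModel.py`: the word-count "embedding" is a toy stand-in for a real embedding model, the in-memory `VectorStore` class stands in for a real vector database, and every name and parameter here (`chunk_text`, `embed`, `cosine`, `chunk_size`, `overlap`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows of words (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k (score, chunk) pairs most similar to the query."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), chunk) for chunk, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


document = (
    "Congress shall have power to lay and collect taxes. "
    "The President shall be commander in chief of the army and navy. "
    "The judicial power shall be vested in one supreme court."
)
chunks = chunk_text(document, chunk_size=12, overlap=3)
store = VectorStore()
store.add(chunks)
top = store.search("Who is commander in chief of the army?", k=1)
print(top[0][1])
```

The overlap between neighbouring chunks is a common design choice: it reduces the chance that a sentence relevant to the query is cut in half at a chunk boundary.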

Project Structure

1. `app.py` - The User Interface

This file implements the project's UI, which has two primary sections:

Left Panel:
- Upload Button: Allows the user to upload a PDF file.
- Analyze Button: Analyzes the uploaded PDF and prepares it for question answering.
Right Panel:
- Chat Area: Displays the chat history between the user and the chatbot.
- User Input Area: A textbox for the user to input questions with a send button to submit the queries.

2. `ragModel.py` - The Core Logic

This file handles the following:

PDF Analysis: Processes the uploaded PDF, chunks the document, and embeds each chunk.
Vector Database & Retriever: The chunks are stored in a vector database for efficient retrieval.
Query Processing: When the user asks a question, it is embedded, and relevant chunks are retrieved from the vector store.
LLM Integration: The retrieved information is passed to the language model for generating responses.
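
One way the last two steps could fit together is sketched below, assuming the retriever returns `(score, chunk)` pairs and the LLM is any callable that maps a prompt string to a reply. The function names, the `min_score` threshold, and the refusal wording are all hypothetical; the README does not specify how `ragModel.py` builds its prompt or decides to refuse.

```python
REFUSAL = "I could not find that information in the uploaded PDF."


def build_prompt(question, retrieved, min_score=0.2):
    """Assemble an LLM prompt from retrieved chunks.

    `retrieved` is a list of (score, chunk) pairs. Returns None when no
    chunk clears `min_score`, so the caller can refuse without calling the LLM.
    """
    relevant = [chunk for score, chunk in retrieved if score >= min_score]
    if not relevant:
        return None
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(relevant, start=1))
    return (
        "Answer the question using only the context below. "
        f'If the context does not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, retrieved, llm):
    """Return the LLM's reply, or the refusal message if nothing relevant was retrieved."""
    prompt = build_prompt(question, retrieved)
    if prompt is None:
        return REFUSAL
    return llm(prompt)


# Usage with a placeholder "LLM" that simply echoes its prompt.
echo_llm = lambda prompt: prompt
in_scope = answer(
    "Who commands the army?",
    [(0.66, "The President shall be commander in chief of the army and navy.")],
    echo_llm,
)
out_of_scope = answer("What is the capital of France?", [(0.05, "unrelated text")], echo_llm)
print(out_of_scope)
```

Refusing in two places (a score threshold before the call, and an instruction inside the prompt) is a design choice made for this sketch: the threshold avoids an unnecessary LLM call, while the instruction covers chunks that score well but still lack the answer.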

3. `grammarCheck.py` - Grammar and Spelling Check

This file handles grammar and spelling correction of user inputs:

User Input Correction: It checks the user's query for spelling and grammar errors and replaces the query shown in the chat area with the corrected version.
Punctuation Handling: This version of the project does not correct punctuation.
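
As a rough illustration of query correction, the sketch below fixes misspelled words by fuzzy-matching them against a known vocabulary with Python's standard `difflib` module. It covers spelling only, not grammar, and it is an assumption for illustration: the README does not say which library or method `grammarCheck.py` uses, and `VOCABULARY`, `correct_query`, and the `cutoff` value are hypothetical.

```python
import difflib

# Hypothetical vocabulary; a real checker would use a full dictionary.
VOCABULARY = [
    "what", "does", "the", "constitution", "say",
    "about", "congress", "president", "powers",
]


def correct_query(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary match, if one is close enough."""
    corrected = []
    for word in query.split():
        lowered = word.lower()
        if lowered in vocabulary:
            corrected.append(word)  # already a known word
            continue
        matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
        # Keep the original word when nothing in the vocabulary is similar enough.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


fixed = correct_query("wat does the constitutoin say abot congres")
print(fixed)
```

Like the project version described above, this sketch makes no attempt to handle punctuation; words are split on whitespace only.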

How to Run the Project

Clone the Repository:
```
git clone https://github.com/dheerajkallakuri/pdfChatbot.git
```

Install Requirements: Navigate to the project directory and install the required libraries listed in requirements.txt:
```
pip install -r requirements.txt
```

Run the Application: From the project directory, run:
```
python app.py
```

Conclusion

This PDF chatbot uses RAG to let users query any PDF through a chat interface. Answers are generated from passages retrieved from the uploaded document, which keeps responses grounded in its content.
