Update RAG, daemon and reporting docs (#1)
* Update RAG app doc
* Add daemon, rag and reporting docs
1 parent 16012f1, commit 42d0395
Showing 8 changed files with 111 additions and 23 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
_site |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
theme: jekyll-theme-midnight | ||
title: Pebblo Documentation Home | ||
description: Pebblo Gen-AI application data governance tool documetation | ||
|
||
title: Pebblo Documentation | ||
description: OpenSource Safe Data Loader for Gen AI applications |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Pebblo Daemon | ||
|
||
## Overview | ||
|
||
Pebblo has two components. | ||
|
||
1. Pebblo Daemon | ||
2. Pebblo Langchain SafeLoader | ||
|
||
This document describes how `Pebblo Daemon` works to give any Langchain Gen-AI application deep visibility into the types of Topics and Entities ingested through Document Loaders. For more details on how Pebblo enables your Langchain RAG application, see the [Pebblo SafeLoader](/pebblo-docs/rag.html) document. | ||
|
||
## Pebblo Daemon | ||
|
||
Pebblo Daemon is a `FastAPI` application that exposes a locally hosted REST API endpoint that Pebblo SafeLoader-enabled Langchain applications connect to. | ||
|
||
By default, `Pebblo Daemon` runs at `localhost:8000`, and the `Pebblo SafeLoader` connects to this hostname and port. If the daemon is running on a different port or hostname, the SafeLoader environment variable `PEBBLO_CLASSIFIER_URL` needs to be set to the correct URL. | ||
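The lookup described above can be sketched as follows. This is a minimal illustration and not the SafeLoader's actual code: the helper name `resolve_classifier_url` is hypothetical, the variable name is assumed to be `PEBBLO_CLASSIFIER_URL`, and the `http://` scheme on the default is an assumption, since the text only gives `localhost:8000`.

```python
import os

# Assumed default: the docs give localhost:8000; the http scheme is a guess.
DEFAULT_CLASSIFIER_URL = "http://localhost:8000"


def resolve_classifier_url(env=None):
    """Return the daemon URL from the environment, or the default if unset."""
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value the same as an unset variable.
    url = env.get("PEBBLO_CLASSIFIER_URL", "").strip()
    # Drop any trailing slash so endpoint paths can be appended consistently.
    return url.rstrip("/") if url else DEFAULT_CLASSIFIER_URL
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without changing the real environment.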
|
||
## Report Generation | ||
|
||
A separate `Data Report` is generated for every complete document load operation. A subsequent document load, whether run periodically (say every day or every week) or on demand, will not overwrite a previous load's `Data Report`. | ||
|
||
## Report Location | ||
|
||
By default, all reports are stored in a `.pebblo` directory in the home directory of the system running `Pebblo Daemon`. When multiple RAG applications use the same `Pebblo Daemon`, each application gets a separate subdirectory named after it. | ||
|
||
```bash | ||
|
||
$ cd $HOME/.pebblo | ||
$ tree | ||
├── acme-corp-rag-1 | ||
│ ├── pebblo_report.pdf | ||
│ ├── bfd46d34-42c7-4819-846c-f54b3620f540 | ||
│ │ ├── metadata | ||
│ │ │ └── metadata.json | ||
│ │ └── report.json | ||
``` |
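Given the layout shown above, the stored reports can be enumerated with a short script. This is a sketch under assumptions: the helper name `list_reports` is hypothetical, and it assumes every load directory holds a `report.json` directly beneath the application directory, as in the listing.

```python
from pathlib import Path


def list_reports(root=None):
    """Map each application directory to the load IDs that contain a report.json."""
    # Default to the .pebblo directory in the current user's home directory.
    root = Path(root) if root is not None else Path.home() / ".pebblo"
    reports = {}
    if not root.is_dir():
        return reports
    for app_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Each load is a subdirectory; files such as pebblo_report.pdf are skipped.
        reports[app_dir.name] = sorted(
            load_dir.name
            for load_dir in app_dir.iterdir()
            if load_dir.is_dir() and (load_dir / "report.json").is_file()
        )
    return reports
```

Calling `list_reports()` with no argument scans `$HOME/.pebblo`; passing a path is useful when the reports have been copied elsewhere.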
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
# Pebblo Docs Home | ||
# Contents | ||
|
||
- [Installation](/pebblo-docs/installation.html) | ||
- [Development Environment](/pebblo-docs/development.html) | ||
- [Pebblo SafeLoader for Langchain RAG](/pebblo-docs/rag.html) | ||
- [Pebblo Reports](/pebblo-docs/reporting.html) | ||
- [Reports](/pebblo-docs/reporting.html) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,33 @@ | ||
# Pebblo Reports | ||
# Pebblo Data Reports | ||
|
||
Pebblo Data Reports provide in-depth visibility into the documents ingested into a Gen-AI RAG application during every load. | ||
|
||
This document describes the information produced in the Data Report. | ||
|
||
# Report Summary | ||
|
||
Report Summary provides the following details: | ||
|
||
1. **Findings**: Total number of Topics and Entities found across all the snippets loaded in this specific load run. | ||
1. **Files with Findings**: The number of files that have one or more `Findings` over the total number of files used in this document load. This field indicates the number of files that need to be inspected to remediate any text that may need to be removed and/or cleaned for Gen-AI inference. | ||
1. **Number of Data Sources**: The number of data sources used to load documents into the Gen-AI RAG application. For example, this field will be two if a RAG application loads data from two different directories or two different AWS S3 buckets. | ||
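The three summary fields above can be illustrated with a small aggregation. This is a sketch only: the function `summarize` and the record fields `file`, `source`, and `findings` are hypothetical and do not reflect the schema of Pebblo's `report.json`.

```python
def summarize(snippets):
    """Aggregate per-snippet records into the three Report Summary fields."""
    findings_per_file = {}
    sources = set()
    for snippet in snippets:
        # Accumulate findings per file so files without findings still count.
        name = snippet["file"]
        findings_per_file[name] = findings_per_file.get(name, 0) + snippet["findings"]
        sources.add(snippet["source"])
    return {
        "findings": sum(findings_per_file.values()),
        "files_with_findings": sum(1 for n in findings_per_file.values() if n > 0),
        "total_files": len(findings_per_file),
        "data_sources": len(sources),
    }
```

Here `files_with_findings` and `total_files` together give the "over the total number of files" ratio described for the second field.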
|
||
# Top Files with Most Findings | ||
|
||
This table lists the files with the most findings. Typically these are the most _offending_ files, which need immediate attention and offer the best ROI for data cleansing and remediation. | ||
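The ranking behind such a table can be sketched as below. The function `top_files`, its `limit` default, and the alphabetical tie-break are all assumptions made for illustration; the source does not say how Pebblo orders or truncates this table.

```python
def top_files(findings_per_file, limit=5):
    """Return up to `limit` (file, findings) pairs, highest findings first."""
    # Sort by descending findings, then by file name so ties are deterministic.
    ranked = sorted(findings_per_file.items(), key=lambda item: (-item[1], item[0]))
    # Files with no findings are not candidates for remediation.
    return [(name, count) for name, count in ranked if count > 0][:limit]
```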
|
||
# Load History | ||
|
||
This table provides the history of findings and the paths to the reports for previous loads of the same RAG application. | ||
|
||
# Instance Details | ||
|
||
This section provides a quick view of where the RAG application is physically running, such as a laptop (Mac OSX) or a Linux VM, along with related properties like IP address, local filesystem path and Python version. | ||
|
||
# Data Source Findings Table | ||
|
||
This table provides a summary of all the different Topics and Entities found across all the files ingested using `Pebblo SafeLoader` enabled Document Loaders. | ||
|
||
# Snippets | ||
|
||
This section provides the actual text inspected by the `Pebblo Daemon` using the `Pebblo Topic Classifier` and `Pebblo Entity Classifier`. It is useful for quickly inspecting and remediating text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file it was loaded from, for easy remediation.