Releases · gustavz/DataChad
DataChad V2
This is the cut-off point for DataChad V2
How does it work?
- Upload any `file(s)` or enter any `path` or `url`
- The data source is detected and loaded into text documents
- The text documents are embedded using OpenAI embeddings
- The embeddings are stored as a vector dataset in Activeloop's database hub
- A LangChain chain is created consisting of an LLM (`gpt-3.5-turbo` by default) and the vector store as retriever (see the sketch below)
- When you ask the app a question, the chain embeds the input prompt, runs a similarity search in the vector store, and uses the best results as context for the LLM to generate an appropriate response
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
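For illustration, the steps above map roughly onto the following LangChain sketch. This is not DataChad's actual code: the file path, chunking parameters, and dataset path are placeholders chosen for the example.

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import DeepLake

# Load the data source into text documents and split them into chunks
docs = TextLoader("my_file.txt").load()  # illustrative path
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100  # illustrative parameters
).split_documents(docs)

# Embed the chunks and store them as a vector dataset on Activeloop's hub
db = DeepLake.from_documents(
    chunks,
    OpenAIEmbeddings(),
    dataset_path="hub://<your-org>/<dataset-name>",  # illustrative path
)

# Build the chain: the LLM plus the vector store as retriever
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=db.as_retriever(),
)
```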
Good to know
- The app only runs on `py>=3.10`!
- By default this git repository is used as context, so you can start asking questions about its functionality right away without choosing your own data source.
- To run locally or deploy somewhere, execute `cp .env.template .env` and set your credentials in the newly created `.env` file (a sample is sketched after this list). Other options are setting system environment variables manually, or storing them in `.streamlit/secrets.toml` when hosted via Streamlit.
- If you have credentials set as explained above, you can just hit `submit` in the authentication step without re-entering your credentials in the app.
- Your data won't load? Feel free to open an Issue or PR and contribute!
- Yes, Chad in `DataChad` refers to the well-known meme
- DataChad V2 does not support local mode, but many features will follow soon. Stay tuned!
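For the credentials step, the created `.env` could look roughly like the snippet below. The variable names are assumptions based on the OpenAI and Activeloop services the app uses; `.env.template` is the authoritative reference.

```sh
# .env -- variable names are assumptions, check .env.template for the real ones
OPENAI_API_KEY=sk-...
ACTIVELOOP_TOKEN=...
```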
DataChad V1
This is the cut-off point for DataChad V1
How does it work?
- Upload any `file(s)` or enter any `path` or `url`
- The data source is detected and loaded into text documents
- The text documents are embedded using OpenAI embeddings
- The embeddings are stored as a vector dataset in Activeloop's database hub
- A LangChain chain is created consisting of an LLM (`gpt-3.5-turbo` by default) and the vector store as retriever
- When you ask the app a question, the chain embeds the input prompt, runs a similarity search in the vector store, and uses the best results as context for the LLM to generate an appropriate response (see the usage sketch below)
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
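As a usage illustration, asking questions through a chain like the one sketched under V2 above looks roughly like this. The `chain` object and the question text are placeholders, not the app's actual code.

```python
# Ask questions against a ConversationalRetrievalChain built as in the V2 sketch
chat_history = []  # the app caches this locally to keep the conversation going

question = "What does this repository do?"
result = chain({"question": question, "chat_history": chat_history})

# Append the turn so follow-up questions can reference earlier answers
chat_history.append((question, result["answer"]))
print(result["answer"])
```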
Good to know
- The app only runs on `py>=3.10`!
- By default this git repository is used as context, so you can start asking questions about its functionality right away without choosing your own data source.
- To run locally or deploy somewhere, execute `cp .env.template .env` and set your credentials in the newly created `.env` file. Other options are setting system environment variables manually, or storing them in `.streamlit/secrets.toml` when hosted via Streamlit.
- If you have credentials set as explained above, you can just hit `submit` in the authentication step without re-entering your credentials in the app.
- To enable `Local Mode` (disabled for the demo), set `ENABLE_LOCAL_MODE` to `True` in `datachad/constants.py`. You need to have the model binaries downloaded and stored inside `./models/` (see the sketch after this list).
- The currently supported `Local Mode` OSS model is GPT4All. To add more models, update `datachad/models.py`
- If you are running `Local Mode`, all your data stays on your machine. No API calls are made. The same goes for the embeddings database, which stores its data in `./data/`
- Your data won't load? Feel free to open an Issue or PR and contribute!
- Yes, Chad in `DataChad` refers to the well-known meme
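For a feel of what `Local Mode` wires together, here is a minimal sketch using LangChain's GPT4All integration with a local Deep Lake dataset. The model filename, dataset path, and embedding model are assumptions for illustration, not DataChad's actual configuration.

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.vectorstores import DeepLake

# Local LLM: the binary must already be downloaded into ./models/
# (the filename below is illustrative -- use whatever GPT4All binary you have)
llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin")

# Local embeddings, so no API calls leave the machine
# (the embedding model is an assumption; DataChad may use a different one)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Local vector store under ./data/ instead of Activeloop's hub
db = DeepLake(dataset_path="./data/my_dataset", embedding_function=embeddings)
```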