CodeSeer is a code search system that indexes, embeds, and semantically searches Python code. Its results can be supplied as context to large language models (LLMs). The project consists of several components that work together:
- Environment Configuration: Set up your Python code path and OpenAI API key.
- Parsing Python Code: Analyze your Python code using Abstract Syntax Trees (AST) and create a JSON file with all methods, classes, and variables.
- Creating Embeddings: Create a local Chroma vector database and fetch embeddings for each code element previously indexed.
- Semantic Code Query: Query your code semantically and retrieve relevant methods, classes, and variables to feed LLM context.
The `.env.example` file contains the example environment variables you need to set: the path to your Python code and your OpenAI API key.
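As a rough sketch of how a script might read these settings, the snippet below checks that both values are present before any work starts. The variable names `CODE_PATH` and `OPENAI_API_KEY` are assumptions for illustration; the names in `.env.example` are authoritative. It also assumes the values have already been exported to the process environment, for example by a tool that loads `.env` files.

```python
import os

# Hypothetical variable names; check .env.example for the ones the project uses.
CODE_PATH_VAR = "CODE_PATH"
API_KEY_VAR = "OPENAI_API_KEY"


def load_settings(env=os.environ):
    """Return the code path and API key, failing early if either is missing."""
    missing = [name for name in (CODE_PATH_VAR, API_KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"code_path": env[CODE_PATH_VAR], "api_key": env[API_KEY_VAR]}
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised later by the embedding step.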
The `create_code_data.py` script parses all your Python code using AST and writes a JSON file containing every method, class, and variable it finds. It includes a `CodeVisitor` class that visits function definitions, class definitions, variable assignments, and other nodes to extract the required information.
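To illustrate the idea, here is a minimal visitor built on the standard library's `ast.NodeVisitor`. It is a sketch, not the project's actual `CodeVisitor`: the record fields (`type`, `name`, `line`) and the `parse_source` helper are assumptions chosen for the example.

```python
import ast
import json


class CodeVisitor(ast.NodeVisitor):
    """Collect functions, classes, and assigned variable names from one module."""

    def __init__(self):
        self.elements = []

    def _record(self, kind, name, node):
        self.elements.append({"type": kind, "name": name, "line": node.lineno})

    def visit_FunctionDef(self, node):
        self._record("function", node.name, node)
        self.generic_visit(node)  # keep walking to find nested definitions

    # Treat `async def` the same way as `def`.
    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_ClassDef(self, node):
        self._record("class", node.name, node)
        self.generic_visit(node)

    def visit_Assign(self, node):
        # Only simple targets such as `x = 1`; tuple unpacking and attributes are skipped.
        for target in node.targets:
            if isinstance(target, ast.Name):
                self._record("variable", target.id, node)
        self.generic_visit(node)


def parse_source(source):
    """Parse a string of Python source and return the collected elements."""
    visitor = CodeVisitor()
    visitor.visit(ast.parse(source))
    return visitor.elements


sample = "LIMIT = 10\n\nclass Scorer:\n    def confidence(self):\n        return LIMIT\n"
print(json.dumps(parse_source(sample), indent=2))
```

A real indexer would likely also store the file path and the element's source text so that the embedding step has something meaningful to embed.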
The `create_embeddings.py` script creates a local Chroma vector database. It fetches an embedding for each previously indexed code element (methods, classes, variables) and stores the embeddings in the database.
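The storage step could look roughly like the sketch below. The `index_elements` helper, the `source` field on each element, and the ID scheme are hypothetical. The `collection` argument is expected to behave like a Chroma collection, whose `add` method accepts `ids`, `documents`, `embeddings`, and `metadatas`. The `embed` argument is any callable that turns text into a vector, such as a wrapper around the OpenAI embeddings endpoint.

```python
def index_elements(elements, embed, collection):
    """Embed each code element and add it to a Chroma-style collection.

    elements:   list of dicts with "type", "name", and "source" keys (assumed shape)
    embed:      callable mapping a string to a list of floats
    collection: object exposing Chroma's Collection.add(ids=..., documents=..., ...)
    """
    # The index suffix keeps IDs unique when two elements share a name.
    ids = [f"{e['type']}:{e['name']}:{i}" for i, e in enumerate(elements)]
    documents = [e["source"] for e in elements]
    collection.add(
        ids=ids,
        documents=documents,
        embeddings=[embed(doc) for doc in documents],
        metadatas=[{"type": e["type"], "name": e["name"]} for e in elements],
    )
    return ids


# With chromadb installed, a persistent local collection can be obtained with:
#   import chromadb
#   client = chromadb.PersistentClient(path="./chroma_db")  # path is an assumption
#   collection = client.get_or_create_collection("code")    # name is an assumption
```

Passing the collection and the embedding function in as arguments keeps the sketch independent of network access and makes it easy to try with stand-ins.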
The `code_query.py` script lets you query your code semantically. It searches for relevant methods, classes, and variables and returns them so they can be used as LLM context.
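A query step along these lines might embed the question, ask the collection for its nearest neighbours, and join the hits into one context string. This is a sketch under assumptions: `build_context` and the output format are invented for illustration, and the metadata keys (`type`, `name`) are assumed to have been stored at indexing time. The `collection.query` call and the nested-list shape of its result follow Chroma's collection API.

```python
def build_context(query, embed, collection, n_results=5):
    """Return the most relevant code elements as one block of text for an LLM prompt."""
    result = collection.query(query_embeddings=[embed(query)], n_results=n_results)
    # Chroma returns one list per query; a single query was sent, so take index 0.
    documents = result["documents"][0]
    metadatas = result["metadatas"][0]
    sections = [
        f"# {meta['type']} {meta['name']}\n{doc}"
        for doc, meta in zip(documents, metadatas)
    ]
    return "\n\n".join(sections)
```

The query must be embedded with the same model that produced the stored embeddings; otherwise the distances between vectors are not meaningful.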
- Set Up Environment Variables: Copy `.env.example` to `.env` and fill in the required information.
- Create Code Data: Run `create_code_data.py` to parse all your Python code and create a JSON file with all methods, classes, and variables.
- Create Embeddings: Run `create_embeddings.py` to create a local Chroma vector database with embeddings for each code element.
- Query Code: Use `code_query.py` to query your code semantically and retrieve relevant code elements.
To use `code_query.py`, pass your query as a command-line argument. For instance:
python code_query.py "How to calculate confidence with this project"
- Python 3.x
- OpenAI API
- Other dependencies are listed in `pyproject.toml` and can be installed using Poetry.
This project is licensed under the terms of the license found in the LICENSE file.