Leto-Modelizer AI Proxy is a Python-based proxy API which interfaces with several AI models in order to generate infrastructure code for Leto-Modelizer. The purpose of this API is to facilitate the automation and customization of infrastructure code generation for various use cases.
Linked to Leto-Modelizer, the API provides a seamless integration with the powerful infrastructure generation capabilities of the Leto-Modelizer framework. Whether you need to generate code for cloud infrastructure, or any other domain-specific infrastructure, our API leverages the advanced modeling capabilities of Leto-Modelizer to automate and optimize the code generation process.
Built on top of Python's robust web framework, the Leto-Modelizer Proxy API offers a flexible and extensible architecture for integrating various AI models. Whether you prefer using pre-trained models or training your own custom models, our API seamlessly handles the code generation process. By leveraging the power of AI and Leto-Modelizer, our API simplifies the task of infrastructure code generation, saving you time.
Currently this project use Ollama or Gemini in order to generate code.
To configure the API, you need to edit the configuration/configuration.json
file (default location).
Or you can set the environment variables API_CONFIGURATION
and put the configuration.json
file in that path`.
Currently the configuration has the following settings:
Setting | Description |
---|---|
pluginPreferences | A dictionary containing the plugin preferences, which are the AI models to use for what plugin. |
ollama | A dictionary containing the ollama configuration (cf: next section). |
Gemini | A dictionary containing the Gemini configuration (cf: next section). |
Here is an example of the configuration.json
file:
{
"pluginPreferences":{
"default": "ollama"
},
"ollama":
{
"base_url": "http://localhost:11434/api",
"models": ["mistral"],
"defaultModel": "mistral",
"modelFiles": {
"generate": {
"default": "default-mistral-modelfile_generate",
"@ditrit/kubernator-plugin": "default-kubernetes-mistral-modelfile_generate",
"@ditrit/githubator-plugin": "default-githubactions-mistral-modelfile_generate"
},
"message": {
"default": "default-mistral-modelfile_message",
"@ditrit/kubernator-plugin": "default-kubernetes-mistral-modelfile_message",
"@ditrit/githubator-plugin": "default-githubactions-mistral-modelfile_message"
}
}
},
"gemini": {
"base_url": "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent",
"key": "YOUR-KEY",
"system_instruction": {
"generate": {
"default": "default-generate.json",
"@ditrit/kubernator-plugin": "default-kubernetes.json",
"@ditrit/githubator-plugin": "default-githubactions.json"
},
"message": {
"default": "default-message.json",
"@ditrit/kubernator-plugin": "default-kubernetes.json",
"@ditrit/githubator-plugin": "default-githubactions.json"
}
}
}
}
The mandatory default
key is used to specify which AI model to use by default for every plugin.
Moreover, the AI precised for each plugin in the pluginPreferences
key must exist in the configuration.json
.
Ollama can be found here: https://github.com/ollama/ollama Ollama has the following settings:
Setting | Description |
---|---|
base_url | The base URL of the Ollama API. |
models | A list of models to use. |
defaultModel | The default model to use. |
modelFiles | The Ollama model files to use. They are seperate by purpose, one for generate and one for message mode |
Currently only the default model is used. But later, we will be able to handle more smoothly the rest of the models to use, depending on the usage.
Gemini can be found here: https://github.com/google-gemini/
Don't forget to set the key
in the configuration.json
by creating an account and getting an API key.
Gemini has the following settings:
Setting | Description |
---|---|
base_url | The base URL of the Gemini API. |
key | The API key to use. |
system_instruction | Json file used to generate the response according to the methodology we need. (same as Ollama modelfiles) |
NOTE: Currently Gemini does not handle the message mode (conversation with a context) correctly due to the way the API works with context.
Currently the API only supports the Ollama and Gemini. In the near future, we will be able to have more AI models. Moreover, you can add your own AI models, to do so see here.
So you need to install Ollama on your local:
curl -fsSL https://ollama.com/install.sh | sh
Then you have to pull a model from Ollama:
ollama pull mistral
In order to test Ollama, you can run your model using the following command:
ollama run mistral
Then you can just ask it a question !
NOTE: You may need to install NVIDIA Container Toolkit to run Ollama. cf: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
For Gemini, you need to create an account and get an API key, on this website: https://ai.google.dev/
Then you just need to add it in the configuration.json
file, in the ai-models
section.
First install pipenv and create a virtual environment with the needed packages:
-
pipenv
On modern Linux distributions, you can install pipenv using the following command:
* ``` sudo apt install pipenv ``` * ``` sudo apt install pipx && pipx install pipenv && pipx ensurepath ```
NOTE: After PEP668, all python packages should be installed in a virtual environment (recommanded to use pipx) or using the debian package manager.
Create virtual environment and install dependencies from your repo:
pipenv install
Install dev dependencies:
pipenv install --dev
Activate virtual environment:
pipenv shell
To deactivate your virtual environment:
deactivate
Then launch the API using hypercorn (from the root folder and your virtual environment is activated): The initialize script initializes everything the handlers needs in order to work (for instance the Ollama handler needs to be initialized with the model files). This script can be launched anytime. Even after the API is running.
python3 initialize.py
hypercorn src.main:app --reload --bind 127.0.0.1:8585
Once it is running, you can request it on this url: http://localhost:8585/
And the Swagger UI is available on this url: http://localhost:8585/docs
You need to run the following commands to launch the API.
First install NVIDIA Container Toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure NVIDIA Container Toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Test GPU integration
docker run --gpus all nvidia/cuda:11.5.2-base-ubuntu20.04 nvidia-smi
then build the docker image using the following command:
docker build -t leto-modelizer-ai-proxy .
Then run the image:
docker compose up -f docker-compose-nvidia.yaml
Build the docker image using the following command:
docker build -t leto-modelizer-ai-proxy .
Then run the image:
docker compose up -f docker-compose.yaml
The initialize script initializes everything the handlers needs in order to work (for instance the Ollama handler needs to be initialized with the model files). This script can be launched anytime. Even after the API is running.
python3 initialize.py
Once your docker is running and initialized, you can request it on this url: http://localhost:8585/
NOTE: You do not need to launch the initialize script everytime you launch the API with docker, once is enough since it target the AI (i.e model files for Ollama).
Currently the API only supports these enpoints:
Method | Enpoint | Description |
---|---|---|
GET | / | Returning a welcome message |
POST | /api/diagram | Generating diagram code |
POST | /api/message | Send a message to the AI and get a response with the associated context |