Astro is a virtual assistant for console.redhat.com created with Rasa.
Astro uses pipenv to manage its dependencies. Follow the pipenv installation instructions to install it.
After installing pipenv, you are ready to install the project development dependencies by running:
pipenv install --dev
pipenv run install-api
This will install rasa inside the virtual environment created by pipenv.
To access it, you can run pipenv run rasa to execute any rasa command, or pipenv shell to start a session within the virtual environment.
ℹ️ Any command below assumes that you are within a pipenv shell (i.e. after executing pipenv shell). If you don't want to use that mode, just prepend pipenv run to any command.
The training files are inside the data directory. The common files are at the root of this directory and specific files are in the subfolders. The current approach is to have a directory for the namespace/bundle and a sub-directory for the application. Each application can write its own nlu, domain, stories and rules files, depending on its needs.
For example, data/console/rbac/stories.yml holds the nlu info for RBAC.
Intents and responses are spread throughout the data directory in domain.yml files. This allows us to make changes to one bundle without affecting the others.
The custom actions (Python code) are found in the actions directory, which holds the code required to execute them.
Test files are found in the tests folder.
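The data layout described above can be inspected with a short script. The sketch below is a hypothetical helper (group_training_files is not part of the project) that walks the data directory and groups the .yml files by folder, assuming the data/<bundle>/<application>/ layout: common files land under the empty key and application files under (bundle, application).

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def group_training_files(data_dir: str) -> dict[tuple[str, ...], list[str]]:
    """Group the .yml files under data_dir by their folder, relative to data_dir.

    Hypothetical helper: files at the root of data_dir (the common files) are
    grouped under (), files in data/<bundle>/<app>/ under (bundle, app).
    """
    root = Path(data_dir)
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for path in sorted(root.rglob("*.yml")):
        # Path(".").parts == (), so root-level files get the empty key.
        key = path.parent.relative_to(root).parts
        groups[key].append(path.name)
    return dict(groups)
```

For example, group_training_files("data")[("console", "rbac")] would list the nlu, domain, stories and rules files written for RBAC.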
You can run make train to start a full training session, or incrementally train your previous model with make train-finetune.
Note that incremental training doesn't work if you added any new intents or actions.
By default, models are saved to ./models.
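Rasa commands such as rasa shell load the most recent model from that folder unless a model is passed explicitly. The sketch below is a hypothetical helper (latest_model is not part of the project) that approximates that lookup by modification time, assuming models are stored as .tar.gz archives; Rasa's own selection logic may differ in details.

```python
from __future__ import annotations

from pathlib import Path


def latest_model(models_dir: str = "./models") -> Path | None:
    """Return the most recently modified model archive, or None if there is none."""
    # Assumption: trained models are saved as *.tar.gz files in models_dir.
    archives = list(Path(models_dir).glob("*.tar.gz"))
    if not archives:
        return None
    # Pick the newest archive by modification time.
    return max(archives, key=lambda p: p.stat().st_mtime)
```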
To be able to locally run the actions, you need to have a valid offline token for https://sso.redhat.com.
All the API calls will be made on behalf of the user of this token. You can generate an offline token at
https://access.redhat.com/management/api by clicking "Generate token".
Copy this token to the environment variable OFFLINE_REFRESH_TOKEN (a .env file is supported).
Note: If you want to use the stage environment, generate the token at https://access.stage.redhat.com/management/api and also set the environment variable SSO_REFRESH_TOKEN_URL to https://sso.stage.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token. You will also need to point the CONSOLEDOT_BASE_URL environment variable to https://console.stage.redhat.com, and make sure that your HTTP proxy URL is set up properly.
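An offline token is a long-lived OAuth refresh token: before calling an API it is exchanged at the SSO token endpoint for a short-lived access token. The sketch below only builds that request so the variables above can be checked. It assumes the production endpoint mirrors the stage URL, and it uses the rhsm-api client id that Red Hat documents for tokens generated at access.redhat.com/management/api; the project's actions may use a different client id. build_token_request is a hypothetical helper.

```python
from __future__ import annotations

import os
import urllib.parse
import urllib.request

# Assumption: production token endpoint, mirroring the stage URL above.
DEFAULT_SSO_URL = (
    "https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token"
)
# Assumption: client id for offline tokens from access.redhat.com/management/api.
CLIENT_ID = "rhsm-api"


def build_token_request(env: dict[str, str] | None = None) -> urllib.request.Request:
    """Build (but do not send) an OAuth refresh_token request for the offline token."""
    env = dict(os.environ) if env is None else env
    url = env.get("SSO_REFRESH_TOKEN_URL", DEFAULT_SSO_URL)
    body = urllib.parse.urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": CLIENT_ID,
            "refresh_token": env["OFFLINE_REFRESH_TOKEN"],
        }
    ).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it with json.load(urllib.request.urlopen(build_token_request())) should return a JSON object whose access_token field is the bearer token for API calls.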
This app is configured to use a postgres database to store its conversation history. To start the database, run make run-db.
Once rasa is running, it will automatically migrate the database, creating an events table.
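To inspect a stored conversation, the events table can be queried directly. The sketch below is a hypothetical helper (conversation_events is not part of the project) that assumes the table has sender_id, type_name, timestamp and data columns, with data holding each event serialized as JSON. It uses the qmark placeholder style of sqlite3; with a postgres driver such as psycopg2, replace ? with %s.

```python
from __future__ import annotations

import json


def conversation_events(conn, sender_id: str) -> list[tuple[str, dict]]:
    """Return (type_name, event payload) pairs for one conversation, oldest first.

    Assumes an events table with sender_id, type_name, timestamp and data
    columns, where data is the JSON-serialized event.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT type_name, data FROM events WHERE sender_id = ? ORDER BY timestamp",
        (sender_id,),
    )
    return [(type_name, json.loads(data)) for type_name, data in cur.fetchall()]
```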
Once you have a trained model, you can run a local chat instance with rasa shell or make run-cli.
After setting up your offline token, start the actions server by executing rasa run actions --auto-reload or make run-actions.
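To confirm the actions server is up before starting a chat, you can poll its health endpoint. This sketch assumes the rasa_sdk defaults (port 5055 and a /health endpoint that answers {"status": "ok"}); actions_server_healthy is a hypothetical helper, not part of the project.

```python
import json
import urllib.request


def actions_server_healthy(base_url: str = "http://localhost:5055", timeout: float = 2.0) -> bool:
    """Return True if the actions server answers its /health endpoint with status ok.

    Assumes the rasa_sdk default port (5055) and health endpoint.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        return False
```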
RasaException: Failed to find input channel class for 'channels.console.ConsoleInput'. Unknown input channel. Check your credentials configuration to make sure the mentioned channel is not misspelled. If you are creating your own channel, make sure it is a proper name of a class in a module.
This error happens when the class channels.console.ConsoleInput is not found OR the file containing that class fails to load. In the latter case, the underlying error doesn't appear in the console, but we can reveal it by importing that file directly.
Running the following should make the error evident or at least give more information.
python -c "import channels.console"
We use a custom Makefile with some useful targets. The main Makefile is located at the root of the project (Makefile), and its submodules are stored in the make folder.
Each module has its own targets to help us with development.
Has general targets for installing dependencies, cleaning and running the project.
- install: Installs the dependencies and dev-dependencies of both rasa and the actions server
- clean: Cleans any known temporary file, model, cache, results or reports from rasa
- run: Runs rasa
- run-interactive: Runs rasa in interactive mode
- run-actions: Runs the actions server
- run-cli: Runs rasa and shows the CLI/shell mode
- run-db: Starts the postgres database
- db
- drop-db
- compose
Contains global variables used in all the other Makefiles to execute rasa and python in a common way. Also checks for the DEBUG or VERBOSE environment variables to include the respective flags in the train and run arguments.
Contains targets to train the models.
- train: Full training (intents and stories)
- train-finetune: Incremental training
- train-nlu: Only trains the NLU components
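How the training targets and the DEBUG/VERBOSE check fit together can be sketched in Python. This is a hypothetical mirror of the Makefile logic, not its actual contents: the mapping assumes the targets wrap rasa train, rasa train --finetune and rasa train nlu, and that --debug takes precedence when both variables are set.

```python
from __future__ import annotations

import os
from typing import Mapping

# Hypothetical mapping; the real Makefile targets may pass more arguments.
TRAIN_COMMANDS = {
    "train": ["rasa", "train"],
    "train-finetune": ["rasa", "train", "--finetune"],
    "train-nlu": ["rasa", "train", "nlu"],
}


def rasa_command(target: str, env: Mapping[str, str] | None = None) -> list[str]:
    """Return the rasa command for a training target, with logging flags from env."""
    env = os.environ if env is None else env
    cmd = list(TRAIN_COMMANDS[target])
    # Assumption: a non-empty DEBUG wins over VERBOSE.
    if env.get("DEBUG"):
        cmd.append("--debug")
    elif env.get("VERBOSE"):
        cmd.append("--verbose")
    return cmd
```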
General purpose linting for our project. Inspects yml and python files.
- lint: Runs the linter in read-only mode. Outputs the errors but does not fix them.
- lint-fix: Runs the linter and attempts to fix the lint errors
- test: Alias for test-rasa and test-python
- test-rasa: Alias for test-stories, test-data and test-nlu
- test-data (alias: validate): Checks for inconsistencies in rasa's files.
- test-nlu: Runs a data split and runs the NLU tests on the results. Split files are written under .astro/train_test_split
- test-stories-nlu: Extracts user utterances and intents from the test stories and runs the NLU test on these. Files are written under .astro/nlu-from-stories
- test-stories: Runs stories tests
- test-python: Runs python tests
- test-identity: Convenience target to call the API
- test-is-org-admin: Convenience target to call the API as an org admin
- test-is-not-org-admin: Convenience target to call the API as a non org admin
- hyperopt-nlu: Does an NLU optimization. See Optimizing hyperparameters
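The extraction step of test-stories-nlu can be pictured with a small sketch. It is not the project's implementation: stories_to_nlu is a hypothetical helper that assumes the test stories were already parsed from YAML (for example with yaml.safe_load) into plain dicts, with user steps carrying both user and intent keys, and it emits the NLU YAML format grouped by intent.

```python
from __future__ import annotations

from collections import defaultdict


def stories_to_nlu(test_stories: dict) -> str:
    """Turn parsed test stories into an NLU YAML document grouped by intent."""
    examples: dict[str, list[str]] = defaultdict(list)
    for story in test_stories.get("stories", []):
        for step in story.get("steps", []):
            # Only user steps annotated with an intent become NLU examples.
            if "user" in step and "intent" in step:
                text = step["user"].strip()
                if text not in examples[step["intent"]]:
                    examples[step["intent"]].append(text)
    lines = ["nlu:"]
    for intent, texts in examples.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        lines.extend(f"    - {text}" for text in texts)
    return "\n".join(lines) + "\n"
```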
Hyperparameters are the parameters used in our configs. There are multiple tools to optimize them, such as hyperopt. There is a Rasa implementation that uses hyperopt to optimize the NLU pipeline; it can be seen at RasaHQ/nlu-hyperopt.
We have a make target (hyperopt-nlu) that sets this up for this project.
It takes the configuration from config/nlu-hyperopt/, but further configuration can be done in .astro/nlu-hyperopt after the first run.
Refer to our config files and the original repository for more information.
To make it easier to review all our intents and their training examples, there are some scripts in scripts/.
- scripts/dump_data.py: Scans the data directory, loads all the intents and dumps them to stdout in CSV format. It also checks for repeated intents, which is something that could be extracted to a separate process.
- scripts/update_google_sheet.py: Reads the file ./intents.csv and uploads it to Google Sheets using the following environment variables:
  - SPREADSHEET_ID: ID of the Google sheet
  - WORKSHEET_NAME: Name of the worksheet
  - GOOGLE_CLOUD_ACCOUNT: Service account email
  - GOOGLE_CLOUD_ACCOUNT_SECRET: Private key of the service account
This is currently done automatically on commits to the main branch by one of our GitHub workflows.
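A rough idea of what scripts/dump_data.py does can be sketched as follows. This is not the script itself: dump_intents is a hypothetical helper that works on NLU files already parsed from YAML, the CSV columns (file, intent, example) are assumed, and a "repeated intent" is interpreted here as an intent defined in more than one nlu block.

```python
from __future__ import annotations

import csv
import sys
from collections import Counter


def dump_intents(nlu_files: dict[str, dict], out=sys.stdout) -> list[str]:
    """Write one CSV row per training example and return intents defined more than once."""
    writer = csv.writer(out)
    writer.writerow(["file", "intent", "example"])
    seen: Counter = Counter()
    for filename, content in nlu_files.items():
        for item in content.get("nlu", []):
            if "intent" not in item:
                continue  # skip synonyms, regexes and lookup tables
            seen[item["intent"]] += 1
            # Rasa stores examples as a block string of "- example" lines.
            for line in item.get("examples", "").splitlines():
                line = line.strip()
                if line.startswith("- "):
                    writer.writerow([filename, item["intent"], line[2:]])
    return sorted(intent for intent, count in seen.items() if count > 1)
```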