The official repository for the paper RL-PGO: Reinforcement Learning-based Planar Pose-Graph Optimization
The RL training and testing Python script is located in the RL-PGO folder and was evaluated in a miniconda environment. For the GPU installation, the steps assume training is conducted on an NVIDIA RTX 3090 GPU with a driver supporting CUDA 11.6.
First refer to this guide to install conda or miniconda.
Clone the directory to your desired location:
git clone https://github.com/Nick-Kou/RL-PGO.git
Create a conda environment with Python 3.7:
conda create -n myenv python=3.7
Activate the conda environment:
conda activate myenv
Install the following pip dependencies:
pip install PyOpenGL
pip install numpy
pip install tensorboard
pip install pandas
pip install matplotlib
pip install gym
For GPU only (tested on CUDA 11.6 and PyTorch 1.13):
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c dglteam dgl-cuda11.6
For CPU only (tested on PyTorch 1.13):
conda install pytorch torchvision torchaudio cpuonly -c pytorch
conda install -c dglteam dgl
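Optional sanity check (a suggestion, not part of the original setup): with the conda environment active, confirm that PyTorch and DGL import correctly; on a GPU install the first command should also print True:
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python3 -c "import dgl; print(dgl.__version__)"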
All C++ related dependencies were installed using g++-7 and tested on Ubuntu 20.04. If you need to change the default compiler, refer to:
How to switch between multiple GCC and G++ compiler versions on Ubuntu 20.04 LTS Focal Fossa
Install dependencies for minisam (tested on Eigen 3.3.7):
sudo apt-get update
sudo apt-get install libeigen3-dev
sudo apt-get install libsuitesparse-dev
Install the Sophus dependency for minisam:
git clone https://github.com/strasdat/Sophus.git
cd Sophus
git checkout d63ad09   # Jan 1, 2020 version
mkdir build
cd build
cmake ..
cmake --build .
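Note (an assumption not stated in the original instructions): depending on your setup, Sophus may also need to be installed system-wide from the build directory so that CMake can locate it when configuring minisam below:
sudo make install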
Install minisam. Ensure that your conda environment is activated before proceeding with the next steps. Set -DMINISAM_WITH_CUSOLVER=ON for the GPU option:
conda activate yourenvname
cd /PATH/TO/RL-PGO/minisam
mkdir build
cd build
cmake .. -DMINISAM_BUILD_PYTHON_PACKAGE=ON -DMINISAM_WITH_INTERNAL_TIMING=ON -DMINISAM_WITH_SOPHUS=ON -DMINISAM_WITH_CHOLMOD=ON -DMINISAM_WITH_SPQR=ON -DMINISAM_WITH_CUSOLVER=OFF
Before proceeding to install the minisam Python package, verify that the Python executable points to your miniconda environment. This can be checked from the Python path reported in the output of the cmake .. command above.
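As an additional check (a suggestion only; the path shown is illustrative), the interpreter on the PATH should resolve to your miniconda environment:
which python3
# should print something like /home/<user>/miniconda3/envs/myenv/bin/python3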
Install minisam python wrapper and package:
make
sudo make install
sudo make python_package
sudo make install
Note: If you look in your environment's lib/python3.7/site-packages folder, you may find a folder named "minisam-0.0.0-py3.7" containing a folder named minisam. If so, copy the inner minisam folder directly into site-packages and delete the "minisam-0.0.0-py3.7" folder.
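For example, a minimal sketch of this copy step (paths are illustrative and assume miniconda at ~/miniconda3 with an environment named myenv; the last command just verifies that the package imports):
cd ~/miniconda3/envs/myenv/lib/python3.7/site-packages
cp -r minisam-0.0.0-py3.7/minisam .
rm -rf minisam-0.0.0-py3.7
python3 -c "import minisam; print(minisam.__file__)"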
For the GUI (optional), refer to: Pangolin. Also ensure that your conda environment is activated so that the Python wrapper is installed in the correct site-packages location, similar to the minisam installation process. If the GUI is installed, uncomment the "#import pangolin" line (line 9) in RL_PGO.py.
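For example (a convenience only, assuming line 9 reads exactly "#import pangolin" as noted above), the line can be uncommented with:
sed -i '9s/^#import pangolin/import pangolin/' RL_PGO.py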
Example command to train the proposed recurrent SAC agent:
python3 RL_PGO.py --train --no-load --frob --no-grad --no-change --no-gui --sqrt_loss --no_change_noise --lr=3e-4 --grad_value=20.0 --trans_noise=0.1495 --rot_noise=0.2 --loop_close --inter_dist=1 --prob_of_loop=0.5 --freq_of_change=1 --hidden_dim=512 --state_dim=20 --reward_numerator_scale=1.0 --reward_update_scale=10.0 --training_name=Toy_Pose_sac_v2_lstm_Training452 --num_nodes=20 --rot_action_range=0.15 --batch_size=128 --update_itr=1 --max_steps=5 --gamma=1.0 --max_episodes=230000 --max_evaluation_episodes=1
Example command to test the proposed recurrent SAC agent:
python3 RL_PGO.py --test --load --frob --no-grad --no-change --no-gui --sqrt_loss --no_change_noise --lr=3e-4 --grad_value=20.0 --trans_noise=0.1495 --rot_noise=0.2 --loop_close --inter_dist=1 --prob_of_loop=0.5 --freq_of_change=1 --hidden_dim=512 --state_dim=20 --reward_numerator_scale=1.0 --reward_update_scale=10.0 --training_name=Toy_Pose_sac_v2_lstm_Training373 --num_nodes=20 --rot_action_range=0.15 --batch_size=128 --update_itr=1 --max_steps=5 --gamma=1.0 --max_episodes=230000 --max_evaluation_episodes=10
Evaluate or train:
--test or --train
Option to load a pose-graph already placed in the run/trainingname folder or instead synthetically generate one at training time (a graph must be loaded for --test mode).
--load or --no-load
Option to use the geodesic or Frobenius norm for the rotation/orientation cost included in the reward function:
--no-frob or --frob
Option to use the gradient clipping technique during training (only applicable for --train mode):
--no-grad or --grad
Option to sample a new graph at the specified noise parameters every "freq_of_change" episodes of training (only applicable for --train mode):
--no-change or --change
Option to use GUI for evaluation (only applicable for --test mode):
--no-gui or --gui
Option to use the square root of the sum of squared orientation residuals for each factor:
--no_sqrt_loss or --sqrt_loss
Option to change the specified noise parameters with which graphs are sampled during training or evaluation (only applicable for --train mode):
--no_change_noise or --change_noise
Option to change the learning rate for training (only applicable for --train mode):
--lr=3e-4
Option to change the gradient clipping norm value if --grad is set for training (only applicable for --train mode):
--grad_value=20.0
Option to change the translational (position) noise parameter in meters for the sampled graph during training (only applicable for --train mode):
--trans_noise=0.1495
Option to change the rotational (orientation) noise parameter in radians for the sampled graph during training (only applicable for --train mode):
--rot_noise=0.2
Option to include loop closures in the sampled graph (only applicable for --train mode):
--loop_close
Option to adjust the value of inter-nodal distance spacing between synthetically generated poses in meters (only applicable for --train mode):
--inter_dist=1
Option to change the probability a loop closure exists between factors (only applicable for --train mode). Value ranges from 0-1:
--prob_of_loop=0.5
Option to change after how many episodes a new graph may be sampled (only applicable if --change is set). The setting below samples a new graph after every episode:
--freq_of_change=1
Option to change the hidden layer size for the recurrent LSTM unit:
--hidden_dim=512
Option to change the length of the encoded low-dimensional state vector output by the GNN:
--state_dim=20
Option to change the numerator of the reward, which is defined as numerator / (orientation cost + 1):
--reward_numerator_scale=1.0
Option to change the reward scaling hyperparameter used for training stabilization (only applicable for --train mode):
--reward_update_scale=10.0
Option to change the directory name where training information such as the cumulative reward plot and the best saved network weights are stored:
--training_name=Toy_Pose_sac_v2_lstm_Training373
Option to change the number of poses in the graph sampled from the environment (only applicable for --train mode). Must be a multiple of 10:
--num_nodes=20
Option to change the arbitrary rotation magnitude scale in radians used by the agent for the SO(2) retraction on each neighboring pose:
--rot_action_range=0.15
Option to change the number of episodes sampled from the replay buffer for each update during training (only applicable for --train mode):
--batch_size=128
Option to change the frequency of updates once a batch's worth of episodes has been stored in the replay buffer (only applicable for --train mode). With the setting below, a network update occurs after every episode once 128 episodes (the batch size) have initially been completed:
--update_itr=1
Option to change the number of cycles per episode:
--max_steps=5
Option to change the gamma (discount factor) hyperparameter involved in the update:
--gamma=1.0
Option to change the number of episodes that must be completed for training to terminate:
--max_episodes=230000
Option to change the number of evaluation episodes (only applicable if --test is set):
--max_evaluation_episodes=10
If you use RL-PGO in any project please cite:
@article{Kourtzanidis2022RLPGORL,
title={RL-PGO: Reinforcement Learning-based Planar Pose-Graph Optimization},
author={Nikolaos Kourtzanidis and Sajad Saeedi},
journal={ArXiv},
year={2022},
volume={abs/2202.13221}}