
A DRL Central Pattern Generation algorithm built from Matsuoka networks and controlled by the MPO algorithm.


Adaptive_RL
Adaptive_RL is an open-source framework for implementing state-of-the-art Reinforcement Learning (RL) algorithms, designed to be modular and easily adaptable to different research and real-world applications. The repository emphasizes continuous action-space problems.

The current implementation targets Gymnasium and MyoSuite environments and is built on PyTorch.

Currently, the repository supports DDPG (Deep Deterministic Policy Gradient), SAC (Soft Actor-Critic), PPO (Proximal Policy Optimization), and MPO (Maximum a Posteriori Policy Optimization), with a flexible architecture that allows for the integration of new algorithms.

Features:

  • Replay buffer and experience sampling to stabilize training (see the sketch below).
  • Customizable neural network architectures for the actor and critic networks.
  • Modular framework allowing easy extension to additional algorithms.
  • Comprehensive logging of training metrics and performance evaluation.
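
The repository ships its own buffer implementation; purely as an illustration of the idea (not Adaptive_RL's actual API), a minimal uniform-sampling replay buffer looks like this:

```python
import random
from collections import deque

import numpy as np
import torch


class ReplayBuffer:
    """Minimal uniform-sampling replay buffer (illustrative sketch only)."""

    def __init__(self, capacity=1_000_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size=256):
        batch = random.sample(self.storage, batch_size)
        # Stack the fields into batched tensors for a single update step.
        fields = map(np.stack, zip(*batch))
        return tuple(torch.as_tensor(f, dtype=torch.float32) for f in fields)
```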

Implemented Algorithms

  • DDPG: Suited to environments requiring deterministic, high-precision control, such as robotic control or simulated physical systems.
  • SAC: Best for tasks requiring robust exploration, where a stochastic policy adapts better to uncertain environments.
  • MPO: Ideal for complex environments requiring a careful balance between policy stability and adaptability.
  • PPO: Strikes a balance between performance and simplicity. It clips each policy update to a fixed range, preventing destructively large updates and stabilizing training (see the sketch below).
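
For reference (this is the standard PPO objective, not necessarily the repository's exact code), the clipped surrogate loss can be written in PyTorch as:

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_range=0.3):
    """Clipped surrogate policy loss from the PPO paper (illustrative sketch).

    `clip_range` mirrors the --clip_range argument below (default 0.3).
    """
    ratio = torch.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_range, 1 + clip_range) * advantage
    # Take the pessimistic (minimum) objective, then negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```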

CPG

The CPG components of the DRL framework are developed in the MatsuokaOscillator folder.

The MatsuokaOscillator directory implements networks of multiple neurons and multiple coupled oscillators that produce rhythmic movement patterns. The oscillators can be driven either by manually set parameters or by a DRL algorithm; a minimal sketch of the underlying dynamics follows below.
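
For orientation only (standard textbook equations with illustrative parameters chosen inside the known oscillatory regime, not the repository's defaults), a two-neuron mutually inhibiting Matsuoka oscillator can be integrated like this:

```python
import numpy as np

def matsuoka_step(x, v, u=1.0, beta=2.5, w=2.0, tau=0.25, tau_prime=0.5, dt=0.01):
    """One Euler step of a two-neuron Matsuoka oscillator (illustrative sketch)."""
    y = np.maximum(x, 0.0)                  # firing rates (rectified states)
    # Mutual inhibition: each neuron is inhibited by the other's output.
    dx = (-x - beta * v - w * y[::-1] + u) / tau
    dv = (-v + y) / tau_prime               # slow self-inhibition (adaptation)
    return x + dt * dx, v + dt * dv

# The rectified difference y1 - y2 yields an alternating rhythmic signal.
x, v = np.array([0.1, 0.0]), np.zeros(2)   # small asymmetry breaks symmetry
trace = []
for _ in range(2000):
    x, v = matsuoka_step(x, v)
    y = np.maximum(x, 0.0)
    trace.append(y[0] - y[1])
```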

Hodgkin–Huxley neurons: simulate ionic currents such as sodium and potassium, allowing the system to generate more biologically accurate oscillations (see the sketch below). This addition provides the flexibility to model more complex neuronal behaviors and enriches the simulation of Central Pattern Generators (CPGs) in robotic control, prosthetics, and other neural control systems.
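
As background (textbook squid-axon constants, not necessarily the parameters used in this repository), the Hodgkin–Huxley membrane equations integrate like this:

```python
import numpy as np

# Standard squid-axon constants (mV, ms, mS/cm^2, uA/cm^2).
C_M, G_NA, G_K, G_L = 1.0, 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387

def rates(V):
    """Voltage-dependent opening/closing rates for the m, h, n gates."""
    a_m = 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
    b_m = 4.0 * np.exp(-(V + 65.0) / 18.0)
    a_h = 0.07 * np.exp(-(V + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
    a_n = 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
    b_n = 0.125 * np.exp(-(V + 65.0) / 80.0)
    return (a_m, b_m), (a_h, b_h), (a_n, b_n)

def hh_step(V, m, h, n, I_ext=10.0, dt=0.01):
    """One Euler step of the Hodgkin-Huxley membrane equations."""
    (a_m, b_m), (a_h, b_h), (a_n, b_n) = rates(V)
    # Ionic currents: fast sodium, delayed-rectifier potassium, passive leak.
    I_na = G_NA * m**3 * h * (V - E_NA)
    I_k = G_K * n**4 * (V - E_K)
    I_l = G_L * (V - E_L)
    V += dt * (I_ext - I_na - I_k - I_l) / C_M
    m += dt * (a_m * (1.0 - m) - b_m * m)
    h += dt * (a_h * (1.0 - h) - b_h * h)
    n += dt * (a_n * (1.0 - n) - b_n * n)
    return V, m, h, n

V, m, h, n = -65.0, 0.05, 0.6, 0.32   # approximate resting state
for _ in range(5000):                  # 50 ms at dt = 0.01 ms
    V, m, h, n = hh_step(V, m, h, n)
```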

Quickstart

Clone the repository and install dependencies:

git clone https://github.com/YusseffRuiz/Adaptive_RL
cd Adaptive_RL
pip install -r requirements.txt

How to use

Running the Training Script

You can use the training script to train an agent on a specified environment with any of the supported algorithms (PPO, SAC, MPO, or DDPG). The script accepts several command-line arguments to customize the training process.

Here is how to run the training script:

To run the SAC agent on the MuJoCo Humanoid-v4 environment:

python train.py --algorithm SAC --env Humanoid-v4 --steps 1000000

To Plot:

Plot every algorithm found inside a folder (or restrict --path to a specific subfolder):

python plot.py --path training/ --x_axis "train/seconds" --x_label "Time (s)" --title "Training Progress"

To plot multiple folders:

python plot.py --path training/Humanoid-v4-{SAC,CPG-SAC,DDPG,CPG-DDPG} --x_axis "train/seconds" --x_label "Time (s)" --title "Training Progress"

Test an Agent:

python simulate.py --algorithm SAC --env Humanoid-v4

Training can be parallelized through the repository's Multiprocess class (see the --parallel and --seq arguments below).
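
The Multiprocess class itself is internal to this repository; as a rough analogue only, here is how parallel environment stepping looks with Gymnasium's built-in vectorized API:

```python
import gymnasium as gym

# Four parallel workers, each running its own Humanoid-v4 instance.
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make("Humanoid-v4") for _ in range(4)]
)
obs, info = envs.reset(seed=42)
for _ in range(100):
    actions = envs.action_space.sample()   # batched random actions
    obs, rewards, terminated, truncated, info = envs.step(actions)
envs.close()
```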

Command-Line Arguments

Below are the key arguments that can be passed to the script for customizing the training:

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `--algorithm` | str | `'PPO'` | The RL algorithm to use for training. Options: `'PPO'`, `'SAC'`, `'MPO'`, `'DDPG'`. |
| `--env` | str | `'Humanoid-v4'` | The environment name to train on (e.g., `'Humanoid-v4'`, `'Walker2d-v4'`). |
| `--cpg` | flag | False | Enable Central Pattern Generator (CPG) for training. |
| `--f` | str | None | Folder to save logs, models, and results. If not specified, one is created from the environment and algorithm names. |
| `--experiment_number` | int | 0 | Experiment number, for logging purposes. |
| `--steps` | int | 1e7 | Maximum number of training steps. |
| `--seq` | int | 1 | Number of sequential environments. |
| `--parallel` | int | 1 | Number of parallel environments to run. |

Hyperparameters

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `--learning_rate` | float | 3.57e-05 | Learning rate for the actor network. |
| `--lr_critic` | float | 3e-4 | Learning rate for the critic network. |
| `--ent_coeff` | float | 0.00238306 | Entropy coefficient used by the PPO and SAC algorithms. |
| `--clip_range` | float | 0.3 | Clip range for PPO. |
| `--lr_dual` | float | 0.000356987 | Learning rate for the dual variables (MPO). |
| `--gamma` | float | 0.95 | Discount factor. |
| `--neuron_number` | int/list | 256 | Number of neurons per hidden layer; a single integer, or a list specifying each layer's size. |
| `--layers_number` | int | 2 | Number of hidden layers in the neural network. |
| `--batch_size` | int | 256 | Batch size used during training. |
| `--replay_buffer_size` | int | 10e5 | Size of the replay buffer used by off-policy algorithms such as SAC and DDPG. |
| `--epsilon` | float | 0.1 | Exploration rate for epsilon-greedy exploration. |
| `--learning_starts` | int | 10000 | Number of steps collected before learning starts. |
| `--noise` | float | 0.01 | Scale of the exploration noise added to actions (DDPG). |

Example Commands

Train PPO on Humanoid-v4 for 1 million steps:

python train.py --algorithm PPO --env Humanoid-v4 --steps 1000000

Train SAC on Walker2d-v4 with a custom learning rate:

python train.py --algorithm SAC --env Walker2d-v4 --learning_rate 0.0003

Enable CPG training with DDPG:

python train.py --algorithm DDPG --env Humanoid-v4 --cpg

To run the SAC agent on the MuJoCo Humanoid-v4 environment with CPG:

python train.py --algorithm SAC --env Humanoid-v4 --steps 1000000 --cpg

Train MPO on Humanoid-v4 with parallel environments and multiple sequenced environments:

python train.py --algorithm MPO --env Humanoid-v4 --parallel 4 --seq 4

Train SAC with optimized hyperparameters based on the RL Baselines3 Zoo library:

python train.py --algorithm SAC --env Humanoid-v4 --params utilities_repository/hyperparameters.yaml

To Do:

  • ARS and TRPO implementations; analysis of other state-of-the-art algorithms.
  • TensorFlow implementation.

Upcoming Features

  • Integration of Trust Region Policy Optimization (TRPO) and Augmented Random Search (ARS).
  • Enhanced support for discrete action spaces.

Credit

This repository is based on the Tonic RL library and inspired by SB3, modified to match the current Gymnasium implementation and to support MyoSuite development.

Changes include:

  • Direct control over learning rates and network sizes.
  • Simplification of classes.
  • Updated libraries.
  • Parallelization of the algorithms and replay buffer using torch tensors.
