[FEATURE] Vsys feature: massively parallel domain randomization #458

Velythyl · 2024-02-15T22:48:00Z

Hello!

For an unrelated research project, I needed a massively parallel RL environment with domain randomization capabilities. Isaac Sim/Gym/Omniverse fit the bill, but I also needed the simulator to be differentiable w.r.t. each domain randomization parameter.

So I set out to implement DR in brax. This is research code, so it's obviously a little janky and ad-hoc. But I thought maybe the brax community could find this interesting, and perhaps (with a lot of tuning) even merge it into brax main.

Special thanks to this GitHub issue, from which I stole some code ;)

Note that this domain randomization method is more powerful than this. With this code, we can randomize every single simulation step, if we so wish.

In summary, the implementation is simple: we augment the simulation state to contain `sys`, which gives every parallel environment access to its own separate `sys`. This also lets us resample `sys` according to some rule (for example, "resample every 50 steps").
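A minimal sketch of that idea, assuming NumPy and hypothetical `Sys`, `VState`, `sample_sys`, and `step` names (these are not the PR's or brax's actual classes): each environment carries its own row of `sys` inside its state, and rows are swapped for fresh draws every `resample_every` steps. Setting `resample_every=1` would randomize at every simulation step.

```python
from typing import NamedTuple

import numpy as np


class Sys(NamedTuple):
    """Hypothetical stand-in for brax's System with one randomizable field."""
    mass: np.ndarray  # shape (num_envs, num_links), one row per env


class VState(NamedTuple):
    """Hypothetical env state that carries its own per-env sys."""
    qpos: np.ndarray  # shape (num_envs,)
    sys: Sys
    steps: np.ndarray  # shape (num_envs,), steps taken so far


def sample_sys(rng, base_mass, num_envs, low=-0.5, high=0.5):
    # One independent draw per env, as offsets relative to the base values.
    offsets = rng.uniform(low, high, size=(num_envs, base_mass.shape[0]))
    return Sys(mass=base_mass[None, :] + offsets)


def step(state, rng, base_mass, resample_every=50):
    # Toy dynamics: each env reads only its own row of sys.
    qpos = state.qpos + 1.0 / state.sys.mass.sum(axis=1)
    steps = state.steps + 1
    # Resample sys for envs whose step count hits the period.
    fresh = sample_sys(rng, base_mass, qpos.shape[0])
    resample = (steps % resample_every) == 0
    mass = np.where(resample[:, None], fresh.mass, state.sys.mass)
    return VState(qpos=qpos, sys=Sys(mass=mass), steps=steps)


rng = np.random.default_rng(0)
base_mass = np.ones(7)  # 7 links, as in the YAML example
state = VState(np.zeros(4), sample_sys(rng, base_mass, 4), np.zeros(4, dtype=int))
initial_mass = state.sys.mass
for _ in range(50):
    state = step(state, rng, base_mass)  # resamples on step 50
```

Because `sys` lives in the state rather than in the wrapper, resampling is just another state update and needs no special handling at reset time.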

Features:

The vsys wrapper allows for a vectorized `sys` variable that might contain different domain randomization values for each vectorized env

Domain randomization is controlled via a simple yaml file format that describes the path to a domain randomization target. Example:

```yaml
link:
  inertia:
    mass:
      base: [r, r, r, r, r, r, r]
      min: [-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5]
      max: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
  constraint_ang_damping:
    min: [-1, -1, -1, 1, 1, 1, 1]
    max: [2, 2, 2, 1.5, 1, 1, 1]
```

This randomizes over the 7 links of the robot. For the mass, the base is "r", so the value is "read" from the default value defined in the URDF file. Both min and max are offsets relative to the base, so the current setup randomizes each mass within [r-0.5, r+0.5]. For the damping, no base is given, so it defaults to "r". The base can also be set to a float value, or to "n" ("none"), which disables randomization for that index.
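A minimal sketch of how one leaf of that spec could be turned into sampling bounds, assuming NumPy and a hypothetical `resolve_bounds` helper (not the PR's code). The dict mirrors what a YAML loader produces for one leaf such as `link.inertia.mass`; uniform sampling between the bounds is also an assumption, since the distribution isn't stated above.

```python
import numpy as np


def resolve_bounds(spec, default):
    """Turn one spec leaf into per-index (low, high) bounds (hypothetical helper).

    spec: dict with 'min'/'max' offsets and an optional per-index 'base' whose
          entries are "r" (read the default), "n" (no randomization), or a float.
    default: per-index values read from the robot description file.
    """
    default = np.asarray(default, dtype=float)
    base = spec.get("base", ["r"] * len(default))  # missing base means "r"
    low, high = default.copy(), default.copy()    # "n" keeps low == high == default
    for i, b in enumerate(base):
        if b == "n":
            continue
        center = default[i] if b == "r" else float(b)
        low[i] = center + spec["min"][i]
        high[i] = center + spec["max"][i]
    return low, high


spec = {"base": ["r", "r", "n"], "min": [-0.5, -0.5, -0.5], "max": [0.5, 0.5, 0.5]}
low, high = resolve_bounds(spec, default=[2.0, 3.0, 4.0])
sample = np.random.default_rng(0).uniform(low, high)  # one draw per index
```

Here `low` is `[1.5, 2.5, 4.0]` and `high` is `[2.5, 3.5, 4.0]`, so the third index always samples its default.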

Domain randomization is differentiable (!)

For example, with a simple optax optimizer, we can recover the true domain randomization parameters in effect at a specific timestep.
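To illustrate the idea, here is a toy JAX sketch (not brax and not the PR's test script): a 1D point mass with a hidden mass value, recovered by gradient descent on a trajectory-matching loss. Plain gradient descent stands in for the optax optimizer mentioned above, and `rollout`, `loss`, and the constants are hypothetical.

```python
import jax
import jax.numpy as jnp


def rollout(mass, force=1.0, dt=0.1, steps=20):
    """Toy 1D point mass pushed by a constant force; returns its positions."""
    mass = jnp.asarray(mass, dtype=jnp.float32)
    zero = jnp.zeros((), dtype=jnp.float32)

    def body(carry, _):
        pos, vel = carry
        vel = vel + dt * force / mass  # semi-implicit Euler update
        pos = pos + dt * vel
        return (pos, vel), pos

    _, positions = jax.lax.scan(body, (zero, zero), None, length=steps)
    return positions


true_mass = 2.0                # the "hidden" randomized value to recover
observed = rollout(true_mass)  # trajectory observed under that value


def loss(mass):
    # Mean squared error between simulated and observed trajectories.
    return jnp.mean((rollout(mass) - observed) ** 2)


grad_fn = jax.jit(jax.grad(loss))  # d(loss)/d(mass), through the whole rollout
mass = jnp.float32(1.0)            # initial guess
for _ in range(200):
    mass = mass - 1.0 * grad_fn(mass)  # plain gradient descent, lr = 1.0
```

The gradient flows through every simulation step, which is what makes the randomized value identifiable from observed behavior.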

Known issues:

- Because `sys` must be included in the state, a few of the Python type hints are broken.
- The YAML definition system is arbitrary and might not follow best practices.
- This is a fairly large PR, so thoroughly testing all ~800 lines of code changes will be tough.
- You can test the changes with the script in the vsys wrapper's `if __name__ == "__main__":` block, specifically here: https://github.com/Velythyl/brax/blob/b6cab6449ba677108e37739286e0521f7c226a9e/brax/envs/wrappers/vsys.py#L553

Again, I don't expect this to be merged as-is. But perhaps the implementation might be interesting to the community, hence the reason for this PR.


lebrice · 2024-02-16T05:10:28Z

Hey @Velythyl I've been looking forward to this feature for a while now, thanks a lot for sharing this!
I'm just curious, why did you close the PR?

Velythyl · 2024-02-16T16:20:43Z

@lebrice Hey! Sorry, I realized I had some cleanup to do, and it was way past 5pm so I wanted to go home. I reopened it now.

lebrice

(I'm not a maintainer; this is just a fix for some spacing typos. This is very clean!)

lebrice · 2024-04-30T21:26:37Z

brax/envs/half_cheetah.py

```diff
@@ -173,12 +173,12 @@ def reset(self, rng: jax.Array) -> State:
         'reward_ctrl': zero,
         'reward_run': zero,
     }
-    return State(pipeline_state, obs, reward, done, metrics)
+    return State(pipeline_state, obs, reward, done,sys,  metrics)
```

Suggested change:

```diff
-    return State(pipeline_state, obs, reward, done,sys,  metrics)
+    return State(pipeline_state, obs, reward, done, sys, metrics)
```

lebrice · 2024-04-30T21:27:33Z

brax/envs/hopper.py

```diff
@@ -218,12 +218,12 @@ def reset(self, rng: jax.Array) -> State:
         'x_position': zero,
         'x_velocity': zero,
     }
-    return State(pipeline_state, obs, reward, done, metrics)
+    return State(pipeline_state, obs, reward, done,sys,  metrics)
```

Suggested change:

```diff
-    return State(pipeline_state, obs, reward, done,sys,  metrics)
+    return State(pipeline_state, obs, reward, done, sys, metrics)
```

btaba · 2024-04-30T21:53:33Z

Thanks @Velythyl! The recent comment just made me realize that maintainers hadn't commented on the PR. A few design decisions went into `DomainRandomizationVmapWrapper`:

- We saw better performance when `sys` was not added as part of `State`.
- We wanted the user to fully define the randomization strategy rather than have a schema. At HEAD, this can be done via the `randomization_fn`.

The cons of the implementation at HEAD are:

- The reset is static and stored in the wrapper, which this PR addresses.
- Simple randomization strategies still require the user to write a `randomization_fn`.

What I think would make sense to merge is a wrapper with the same API as `DomainRandomizationVmapWrapper` that passes `in_axes` and the randomized `Sys` PyTree values in the `State`, as discussed in this thread: #446.
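A toy sketch of the `in_axes` idea, assuming a plain dict stands in for the `Sys` PyTree (this is not the brax API): leaves marked `0` are batched per environment, leaves marked `None` are shared, and `jax.vmap` uses that tree to decide what to map over. `sys`, `in_axes`, and `step` are hypothetical names.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy "sys": gravity is shared, mass is randomized per env.
sys = {
    "gravity": jnp.float32(-9.81),
    "mass": jnp.array([1.0, 1.5, 2.0], dtype=jnp.float32),
}
# Same structure as sys: 0 = batched along axis 0, None = broadcast to all envs.
in_axes = {"gravity": None, "mass": 0}


def step(sys, vel, dt=0.01, force=1.0):
    # Single-env update; vmap supplies one scalar mass per env.
    acc = force / sys["mass"] + sys["gravity"]
    return vel + dt * acc


vel = jnp.zeros(3, dtype=jnp.float32)
batched_step = jax.vmap(step, in_axes=(in_axes, 0))
new_vel = batched_step(sys, vel)  # one velocity per env
```

Since resampling only changes values, not shapes, a jitted version of `batched_step` would not need to retrace when new randomized values are passed in.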

Velythyl and others added 13 commits June 22, 2023 22:28

Merge pull request #1 from google/main

4d569d4

Sync

dr works. onto prod

ea0c6ae

done?

a60207d

upd

d8d4914

added tracking of current vals

769d057

fixed vals

0fc424b

fixed vals again?

2d1fdcd

added logging for resampling

e0317b3

hmm

dda64ef

fix logging of skr vals

3c66f57

wat

5408fcc

merge

5952601

pr

b6cab64

Velythyl closed this Feb 15, 2024

small cleanup

3da30d0

Velythyl reopened this Feb 16, 2024

huge refactor

0d835eb

lebrice reviewed Apr 30, 2024


Velythyl added 5 commits April 30, 2024 23:21

fix rngs

ca84df8

more explicit

3ae61cf

fix pusher

f952b9f

upd

3ff0565

upd

7767989
