feature(rjy): add mamujoco env and related configs #153
base: main
Conversation
Integrate the Sampled EfficientZero algorithm in the form of independent learning, to validate the environment logic.
""" | ||
Overview: | ||
The modified Multi-agentMuJoCo environment with continuous action space for LightZero's algorithms. | ||
""" |
Please add detailed and clear comments here, similar to https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py. You can polish them with GPT-4 using the prompts from https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR, then correct them manually.
Please add a brief summary of this PR to the PR description.
It seems the original DI-engine does not support replay here yet; I will test this after finishing the other changes.
# split a full batch into slices of mini_infer_size, to save GPU memory for more GPU actors
slices = int(np.ceil(transition_batch_size / self._cfg.mini_infer_size))
network_output = []
for i in range(slices):
    beg_index = self._cfg.mini_infer_size * i
    end_index = self._cfg.mini_infer_size * (i + 1)
-   m_obs = torch.from_numpy(value_obs_list[beg_index:end_index]).to(self._cfg.device).float()
+   m_obs = to_dtype(to_device(to_tensor(value_obs_list[beg_index:end_index]), self._cfg.device), torch.float)
Why is this change needed? Did the previous approach raise errors in the multi-agent setting? Does your current version behave as expected in both the single-agent and multi-agent cases?
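For context on the question above, a minimal sketch of the difference, assuming multi-agent observations arrive as per-agent dicts rather than a flat np.ndarray (the dict layout below is an assumption for illustration, not taken from this PR): torch.from_numpy only accepts a plain ndarray, so nested containers need a recursive conversion.

import numpy as np
import torch

obs_batch_single = np.random.rand(4, 8)  # single-agent: plain ndarray
obs_batch_ma = [  # multi-agent: list of per-agent dicts (assumed layout)
    {'agent_0': np.random.rand(8), 'agent_1': np.random.rand(8)} for _ in range(4)
]

# Works for the single-agent ndarray ...
m_obs = torch.from_numpy(obs_batch_single).float()

# ... but fails for the dict-structured batch:
# torch.from_numpy(obs_batch_ma)  ->  TypeError: expected np.ndarray (got list)

def to_tensor_recursive(item):
    # Recurse into lists/dicts, converting ndarray leaves to float tensors,
    # which is the behavior the to_dtype(to_device(to_tensor(...))) chain relies on.
    if isinstance(item, dict):
        return {k: to_tensor_recursive(v) for k, v in item.items()}
    if isinstance(item, list):
        return [to_tensor_recursive(v) for v in item]
    return torch.from_numpy(item).float()

m_obs_ma = to_tensor_recursive(obs_batch_ma)  # dict structure preserved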
# calculate the target value
m_obs = default_collate(m_obs)
Same question as above.
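For reference, a minimal sketch of what default_collate does to a dict-structured batch, assuming torch >= 1.11 where torch.utils.data.default_collate is public (the shapes and agent keys are illustrative; the PR may import it from a different path):

import torch
from torch.utils.data import default_collate

batch = [{'agent_0': torch.randn(8), 'agent_1': torch.randn(8)} for _ in range(4)]
collated = default_collate(batch)
# Per-agent tensors are stacked along a new batch dimension:
print(collated['agent_0'].shape)  # torch.Size([4, 8])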
-   target_values.append(0)
-   target_value_prefixs.append(value_prefix)
+   target_values.append(np.zeros_like(value_list[0]))
+   target_value_prefixs.append(np.array([0, ]))
Do both single-agent and multi-agent runs work correctly? Please test mamujoco hopper and lunarlander-cont.
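A short sketch of why np.zeros_like replaces the scalar 0, assuming each multi-agent target value is a vector with one entry per agent (the shapes are assumptions): the padding appended beyond the trajectory end must match the per-agent shape.

import numpy as np

value_list_single = [np.array([0.7])]        # single agent: shape (1,)
value_list_ma = [np.array([0.7, 0.3])]       # two agents: shape (2,)

for value_list in (value_list_single, value_list_ma):
    pad = np.zeros_like(value_list[0])       # padding matches the per-agent shape
    print(pad.shape)                         # (1,), then (2,)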
lzero/mcts/buffer/game_segment.py
Outdated
-   pad_frames = np.array([stacked_obs[-1] for _ in range(pad_len)])
-   stacked_obs = np.concatenate((stacked_obs, pad_frames))
+   pad_frames = [stacked_obs[-1] for _ in range(pad_len)]
+   stacked_obs += pad_frames
Do both single-agent and multi-agent runs work correctly? Please test mamujoco hopper and lunarlander-cont.
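A sketch of the padding concern, assuming multi-agent frames are dicts (the frame layout is an assumption): np.concatenate needs array-like frames and breaks on dict elements, while plain list extension works for any frame type.

import numpy as np

stacked_obs = [{'agent_0': np.zeros(8), 'agent_1': np.zeros(8)}]  # dict frames
pad_len = 2

# np.concatenate((stacked_obs, pad_frames)) would fail here, since dicts
# cannot be concatenated as arrays.
pad_frames = [stacked_obs[-1] for _ in range(pad_len)]
stacked_obs += pad_frames
print(len(stacked_obs))  # 3: the original frame plus two copies of the last frame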
@@ -0,0 +1,540 @@
from typing import Optional, Tuple
Inherit from SampledEfficientZeroModelMLP and override only the methods that differ; add an overview explaining the concrete differences.
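A minimal sketch of the suggested structure (the subclass name, the choice of overridden method, and the import path are assumptions for illustration):

from lzero.model.sampled_efficientzero_model_mlp import SampledEfficientZeroModelMLP

class SampledEfficientZeroModelMLPMA(SampledEfficientZeroModelMLP):
    """
    Overview:
        Multi-agent variant of SampledEfficientZeroModelMLP. Only the methods
        that must handle per-agent observation dicts are overridden; everything
        else is inherited from the parent class.
    """

    def initial_inference(self, obs):
        # Override only where multi-agent observations need special handling,
        # e.g. merging per-agent dicts before reusing the parent logic.
        ...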
lzero/worker/muzero_collector.py
Outdated
@@ -388,8 +398,12 @@ def collect(self,
    ready_env_id = ready_env_id.union(set(list(new_available_env_id)[:remain_episode]))
    remain_episode -= min(len(new_available_env_id), remain_episode)

-   stack_obs = {env_id: game_segments[env_id].get_obs() for env_id in ready_env_id}
+   stack_obs = {env_id: game_segments[env_id].get_obs()[0] for env_id in ready_env_id}
Please confirm that this is compatible with both single-agent and multi-agent settings.
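One way to make the compatibility explicit, as a hypothetical sketch (the branch condition and return shapes are assumptions, not from this PR): indexing get_obs()[0] unconditionally changes what single-agent code receives, so an explicit branch avoids silent breakage.

def get_stack_obs(game_segment, multi_agent: bool):
    obs = game_segment.get_obs()
    # Take the first element only in the multi-agent case, instead of
    # relying on [0] meaning the same thing in both settings.
    return obs[0] if multi_agent else obs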
if __name__ == "__main__":
    from zoo.multiagent_mujoco.entry import train_sez_independent_mamujoco

    train_sez_independent_mamujoco([main_config, create_config], seed=seed, max_env_step=max_env_step)
What is the current status of the experiments on mamujoco? Please write it in the PR description, along with an overview of the core algorithmic differences relative to the single-agent case.
) -> 'Policy':  # noqa
    """
    Overview:
        The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, Gumbel MuZero.
Update the overview to clearly explain the main code changes.
What is the main difference between this and the original train_muzero? If the difference is small, please reuse the existing code as much as possible.
@@ -0,0 +1,132 @@
from easydict import EasyDict
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '6'
Clean up the config and remove the parts that are not generally applicable.
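A sketch of the suggested cleanup (the config keys shown are illustrative): machine-specific settings such as a hard-coded GPU index should move out of the committed config and into the launch environment.

from easydict import EasyDict

main_config = EasyDict(dict(
    policy=dict(
        cuda=True,  # use whatever GPU the launch environment exposes
    ),
))

# Instead of os.environ["CUDA_VISIBLE_DEVICES"] = '6' inside the config,
# select the device at launch time:
#   CUDA_VISIBLE_DEVICES=0 python <config>.py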
-   @POLICY_REGISTRY.register('sampled_efficientzero')
-   class SampledEfficientZeroPolicy(MuZeroPolicy):
+   @POLICY_REGISTRY.register('sampled_efficientzero_ma')
+   class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
This file should stay the same as the original.
class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
    """
    Overview:
        The policy class for Sampled EfficientZero proposed in the paper https://arxiv.org/abs/2104.06303.
Update the comments and override only the methods that need to change; most of them should not need rewriting.
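A thin-subclass sketch of the requested structure (the overridden method and the import paths are assumptions; only the class and registry names come from the diff above):

from ding.utils import POLICY_REGISTRY
from lzero.policy.sampled_efficientzero import SampledEfficientZeroPolicy

@POLICY_REGISTRY.register('sampled_efficientzero_ma')
class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
    """
    Overview:
        Multi-agent variant of SampledEfficientZeroPolicy. Differences from the
        parent: observation batching handles per-agent dicts; everything else
        is inherited unchanged.
    """

    def _forward_collect(self, data, **kwargs):
        # Only the methods that actually differ are rewritten; the rest of the
        # policy interface is inherited from SampledEfficientZeroPolicy.
        ...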
MAMuJoCo Environment Integration: I have added support for the MAMuJoCo environment and adapted it for use with LightZero. For detailed information about the MAMuJoCo environment, please refer to the original repository at MaMuJoCo Environments.
Independent Learning Pipeline: a new independent-learning pipeline has been introduced. It is integrated with the existing codebase and can be activated by setting the 'multi_agent' parameter accordingly.
These updates enhance the project's functionality and scalability, providing a robust framework for multi-agent learning scenarios.
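A sketch of how the 'multi_agent' switch described above might look in a config (the key's placement is an assumption; only the parameter name comes from the PR text):

from easydict import EasyDict

main_config = EasyDict(dict(
    policy=dict(
        # True: route through the independent-learning multi-agent pipeline;
        # False: keep the original single-agent path.
        multi_agent=True,
    ),
))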