feature(rjy): add mamujoco env and related configs #153
base: main
Conversation
Integrate the Sampled EfficientZero algorithm in the form of independent learning, to validate the environment logic.
""" | ||
Overview: | ||
The modified Multi-agentMuJoCo environment with continuous action space for LightZero's algorithms. | ||
""" |
Please add detailed and clear comments here, similar to https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py. You can polish them with GPT-4 using the prompts from https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR, then correct them manually.
Please add a brief summary of this PR to the PR description.
It seems the original DI-engine does not support replay here yet; I will test this after finishing the other changes.
# split a full batch into slices of mini_infer_size, to save GPU memory for more GPU actors
slices = int(np.ceil(transition_batch_size / self._cfg.mini_infer_size))
network_output = []
for i in range(slices):
    beg_index = self._cfg.mini_infer_size * i
    end_index = self._cfg.mini_infer_size * (i + 1)
-   m_obs = torch.from_numpy(value_obs_list[beg_index:end_index]).to(self._cfg.device).float()
+   m_obs = to_dtype(to_device(to_tensor(value_obs_list[beg_index:end_index]), self._cfg.device), torch.float)
Why is this change needed? Did the previous approach raise errors in the multi-agent setting? Does your current version behave as expected in both the single-agent and multi-agent cases?
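For context on the question above, a minimal sketch of the difference, assuming multi-agent observations arrive as per-agent dicts rather than a flat np.ndarray (the dict layout below is an assumption for illustration, not taken from this PR): torch.from_numpy only accepts a plain ndarray, so nested containers need a recursive conversion.

import numpy as np
import torch

obs_batch_single = np.random.rand(4, 8)  # single-agent: plain ndarray
obs_batch_ma = [  # multi-agent: list of per-agent dicts (assumed layout)
    {'agent_0': np.random.rand(8), 'agent_1': np.random.rand(8)} for _ in range(4)
]

# Works for the single-agent ndarray ...
m_obs = torch.from_numpy(obs_batch_single).float()

# ... but fails for the dict-structured batch:
# torch.from_numpy(obs_batch_ma)  ->  TypeError: expected np.ndarray (got list)

def to_tensor_recursive(item):
    # Recurse into lists/dicts, converting ndarray leaves to float tensors,
    # which is the behavior the to_dtype(to_device(to_tensor(...))) chain relies on.
    if isinstance(item, dict):
        return {k: to_tensor_recursive(v) for k, v in item.items()}
    if isinstance(item, list):
        return [to_tensor_recursive(v) for v in item]
    return torch.from_numpy(item).float()

m_obs_ma = to_tensor_recursive(obs_batch_ma)  # dict structure preserved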
# calculate the target value
m_obs = default_collate(m_obs)
Same question as above.
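For reference, a minimal sketch of what default_collate does to a dict-structured batch, assuming torch >= 1.11 where torch.utils.data.default_collate is public (the shapes and agent keys are illustrative; the PR may import it from a different path):

import torch
from torch.utils.data import default_collate

batch = [{'agent_0': torch.randn(8), 'agent_1': torch.randn(8)} for _ in range(4)]
collated = default_collate(batch)
# Per-agent tensors are stacked along a new batch dimension:
print(collated['agent_0'].shape)  # torch.Size([4, 8])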
-   target_values.append(0)
-   target_value_prefixs.append(value_prefix)
+   target_values.append(np.zeros_like(value_list[0]))
+   target_value_prefixs.append(np.array([0, ]))
Do both single-agent and multi-agent runs work correctly? Please test mamujoco hopper and lunarlander-cont.
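A short sketch of why np.zeros_like replaces the scalar 0, assuming each multi-agent target value is a vector with one entry per agent (the shapes are assumptions): the padding appended beyond the trajectory end must match the per-agent shape.

import numpy as np

value_list_single = [np.array([0.7])]        # single agent: shape (1,)
value_list_ma = [np.array([0.7, 0.3])]       # two agents: shape (2,)

for value_list in (value_list_single, value_list_ma):
    pad = np.zeros_like(value_list[0])       # padding matches the per-agent shape
    print(pad.shape)                         # (1,), then (2,)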
lzero/mcts/buffer/game_segment.py
Outdated
-   pad_frames = np.array([stacked_obs[-1] for _ in range(pad_len)])
-   stacked_obs = np.concatenate((stacked_obs, pad_frames))
+   pad_frames = [stacked_obs[-1] for _ in range(pad_len)]
+   stacked_obs += pad_frames
Do both single-agent and multi-agent runs work correctly? Please test mamujoco hopper and lunarlander-cont.
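A sketch of the padding concern, assuming multi-agent frames are dicts (the frame layout is an assumption): np.concatenate needs array-like frames and breaks on dict elements, while plain list extension works for any frame type.

import numpy as np

stacked_obs = [{'agent_0': np.zeros(8), 'agent_1': np.zeros(8)}]  # dict frames
pad_len = 2

# np.concatenate((stacked_obs, pad_frames)) would fail here, since dicts
# cannot be concatenated as arrays.
pad_frames = [stacked_obs[-1] for _ in range(pad_len)]
stacked_obs += pad_frames
print(len(stacked_obs))  # 3: the original frame plus two copies of the last frame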
@@ -0,0 +1,540 @@
from typing import Optional, Tuple
Inherit from SampledEfficientZeroModelMLP and override only the methods that differ; add an overview explaining the concrete differences.
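A minimal sketch of the suggested structure (the subclass name, the choice of overridden method, and the import path are assumptions for illustration):

from lzero.model.sampled_efficientzero_model_mlp import SampledEfficientZeroModelMLP

class SampledEfficientZeroModelMLPMA(SampledEfficientZeroModelMLP):
    """
    Overview:
        Multi-agent variant of SampledEfficientZeroModelMLP. Only the methods
        that must handle per-agent observation dicts are overridden; everything
        else is inherited from the parent class.
    """

    def initial_inference(self, obs):
        # Override only where multi-agent observations need special handling,
        # e.g. merging per-agent dicts before reusing the parent logic.
        ...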
lzero/worker/muzero_collector.py
Outdated
@@ -388,8 +398,12 @@ def collect(self,
    ready_env_id = ready_env_id.union(set(list(new_available_env_id)[:remain_episode]))
    remain_episode -= min(len(new_available_env_id), remain_episode)

-   stack_obs = {env_id: game_segments[env_id].get_obs() for env_id in ready_env_id}
+   stack_obs = {env_id: game_segments[env_id].get_obs()[0] for env_id in ready_env_id}
Please confirm that this is compatible with both single-agent and multi-agent settings.
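One way to make the compatibility explicit, as a hypothetical sketch (the branch condition and return shapes are assumptions, not from this PR): indexing get_obs()[0] unconditionally changes what single-agent code receives, so an explicit branch avoids silent breakage.

def get_stack_obs(game_segment, multi_agent: bool):
    obs = game_segment.get_obs()
    # Take the first element only in the multi-agent case, instead of
    # relying on [0] meaning the same thing in both settings.
    return obs[0] if multi_agent else obs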
if __name__ == "__main__":
    from zoo.multiagent_mujoco.entry import train_sez_independent_mamujoco

    train_sez_independent_mamujoco([main_config, create_config], seed=seed, max_env_step=max_env_step)
What is the current status of the experiments on mamujoco? Please write it in the PR description, along with an overview of the core algorithmic differences relative to the single-agent case.
) -> 'Policy':  # noqa
    """
    Overview:
        The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, Gumbel MuZero.
Update the overview to clearly explain the main code changes.
What is the main difference between this and the original train_muzero? If the difference is small, please reuse the existing code as much as possible.
@@ -0,0 +1,132 @@
from easydict import EasyDict
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '6'
Clean up the config and remove the parts that are not generally applicable.
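A sketch of the suggested cleanup (the config keys shown are illustrative): machine-specific settings such as a hard-coded GPU index should move out of the committed config and into the launch environment.

from easydict import EasyDict

main_config = EasyDict(dict(
    policy=dict(
        cuda=True,  # use whatever GPU the launch environment exposes
    ),
))

# Instead of os.environ["CUDA_VISIBLE_DEVICES"] = '6' inside the config,
# select the device at launch time:
#   CUDA_VISIBLE_DEVICES=0 python <config>.py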
-   @POLICY_REGISTRY.register('sampled_efficientzero')
-   class SampledEfficientZeroPolicy(MuZeroPolicy):
+   @POLICY_REGISTRY.register('sampled_efficientzero_ma')
+   class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
This file should stay the same as the original.
class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
    """
    Overview:
        The policy class for Sampled EfficientZero proposed in the paper https://arxiv.org/abs/2104.06303.
Update the comments and override only the methods that need to change; most of them should not need rewriting.
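A thin-subclass sketch of the requested structure (the overridden method and the import paths are assumptions; only the class and registry names come from the diff above):

from ding.utils import POLICY_REGISTRY
from lzero.policy.sampled_efficientzero import SampledEfficientZeroPolicy

@POLICY_REGISTRY.register('sampled_efficientzero_ma')
class SampledEfficientZeroMAPolicy(SampledEfficientZeroPolicy):
    """
    Overview:
        Multi-agent variant of SampledEfficientZeroPolicy. Differences from the
        parent: observation batching handles per-agent dicts; everything else
        is inherited unchanged.
    """

    def _forward_collect(self, data, **kwargs):
        # Only the methods that actually differ are rewritten; the rest of the
        # policy interface is inherited from SampledEfficientZeroPolicy.
        ...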
MAMuJoCo Environment Integration: I have added support for the MAMuJoCo environment and adapted it for use with LightZero. For detailed information about the MAMuJoCo environment, please refer to the original repository at MaMuJoCo Environments.
Independent Learning Pipeline: a new independent-learning pipeline has been introduced. It is integrated with the existing codebase and can be activated by setting the 'multi_agent' parameter accordingly.
These updates enhance the project's functionality and scalability, providing a robust framework for multi-agent learning scenarios.
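A sketch of how the 'multi_agent' switch described above might look in a config (the key's placement is an assumption; only the parameter name comes from the PR text):

from easydict import EasyDict

main_config = EasyDict(dict(
    policy=dict(
        # True: route through the independent-learning multi-agent pipeline;
        # False: keep the original single-agent path.
        multi_agent=True,
    ),
))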