Commit

deploy: f1511fb
puyuan1996 committed Sep 27, 2024
1 parent 7491d77 commit 773f2e2
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions _modules/lzero/entry/train_muzero_with_gym_env.html
@@ -100,6 +100,7 @@ <h1>Source code for lzero.entry.train_muzero_with_gym_env</h1><div class="highli
<span class="kn">from</span> <span class="nn">ding.config</span> <span class="kn">import</span> <span class="n">compile_config</span>
<span class="kn">from</span> <span class="nn">ding.envs</span> <span class="kn">import</span> <span class="n">DingEnvWrapper</span><span class="p">,</span> <span class="n">BaseEnvManager</span>
<span class="kn">from</span> <span class="nn">ding.policy</span> <span class="kn">import</span> <span class="n">create_policy</span>
<span class="kn">from</span> <span class="nn">ding.rl_utils</span> <span class="kn">import</span> <span class="n">get_epsilon_greedy_fn</span>
<span class="kn">from</span> <span class="nn">ding.utils</span> <span class="kn">import</span> <span class="n">set_pkg_seed</span>
<span class="kn">from</span> <span class="nn">ding.worker</span> <span class="kn">import</span> <span class="n">BaseLearner</span>
<span class="kn">from</span> <span class="nn">lzero.envs.get_wrapped_env</span> <span class="kn">import</span> <span class="n">get_wrappered_env</span>
@@ -218,6 +219,17 @@ <h1>Source code for lzero.entry.train_muzero_with_gym_env</h1><div class="highli
<span class="n">trained_steps</span><span class="o">=</span><span class="n">learner</span><span class="o">.</span><span class="n">train_iter</span>
<span class="p">)</span>

<span class="k">if</span> <span class="n">policy_config</span><span class="o">.</span><span class="n">eps</span><span class="o">.</span><span class="n">eps_greedy_exploration_in_collect</span><span class="p">:</span>
<span class="n">epsilon_greedy_fn</span> <span class="o">=</span> <span class="n">get_epsilon_greedy_fn</span><span class="p">(</span>
<span class="n">start</span><span class="o">=</span><span class="n">policy_config</span><span class="o">.</span><span class="n">eps</span><span class="o">.</span><span class="n">start</span><span class="p">,</span>
<span class="n">end</span><span class="o">=</span><span class="n">policy_config</span><span class="o">.</span><span class="n">eps</span><span class="o">.</span><span class="n">end</span><span class="p">,</span>
<span class="n">decay</span><span class="o">=</span><span class="n">policy_config</span><span class="o">.</span><span class="n">eps</span><span class="o">.</span><span class="n">decay</span><span class="p">,</span>
<span class="n">type_</span><span class="o">=</span><span class="n">policy_config</span><span class="o">.</span><span class="n">eps</span><span class="o">.</span><span class="n">type</span>
<span class="p">)</span>
<span class="n">collect_kwargs</span><span class="p">[</span><span class="s1">&#39;epsilon&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">epsilon_greedy_fn</span><span class="p">(</span><span class="n">collector</span><span class="o">.</span><span class="n">envstep</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">collect_kwargs</span><span class="p">[</span><span class="s1">&#39;epsilon&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.0</span>

<span class="c1"># Evaluate policy performance.</span>
<span class="k">if</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">should_eval</span><span class="p">(</span><span class="n">learner</span><span class="o">.</span><span class="n">train_iter</span><span class="p">):</span>
<span class="n">stop</span><span class="p">,</span> <span class="n">reward</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">eval</span><span class="p">(</span><span class="n">learner</span><span class="o">.</span><span class="n">save_checkpoint</span><span class="p">,</span> <span class="n">learner</span><span class="o">.</span><span class="n">train_iter</span><span class="p">,</span> <span class="n">collector</span><span class="o">.</span><span class="n">envstep</span><span class="p">)</span>
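
The second hunk builds an epsilon schedule from `policy_config.eps` (`start`, `end`, `decay`, `type`) and evaluates it at `collector.envstep` to set `collect_kwargs['epsilon']`, falling back to `0.0` when `eps_greedy_exploration_in_collect` is off. As a rough illustration of what such a schedule computes, here is a minimal self-contained sketch. `make_epsilon_schedule` is a hypothetical stand-in written for this note, not DI-engine's `get_epsilon_greedy_fn`; the linear and exponential formulas and the sample values are assumptions, and the library's actual decay curve may differ.

```python
import math
from typing import Callable


def make_epsilon_schedule(
    start: float, end: float, decay: int, type_: str = "exp"
) -> Callable[[int], float]:
    """Hypothetical schedule factory (an assumption, not DI-engine's code).

    Returns a function that maps an environment step count to an epsilon value.
    """
    if decay <= 0:
        raise ValueError("decay must be positive")

    if type_ == "linear":
        def linear(step: int) -> float:
            # Interpolate from start to end over `decay` steps, then hold at end.
            frac = min(max(step / decay, 0.0), 1.0)
            return start + (end - start) * frac
        return linear

    if type_ == "exp":
        def exponential(step: int) -> float:
            # Decay exponentially from start toward end with time constant `decay`.
            return end + (start - end) * math.exp(-max(step, 0) / decay)
        return exponential

    raise ValueError(f"unknown schedule type: {type_!r}")


# Mirror the pattern in the diff: look up epsilon for the current env step
# and pass it to the collector through collect_kwargs.
collect_kwargs = {}
eps_fn = make_epsilon_schedule(start=0.25, end=0.05, decay=1000, type_="linear")
for envstep in (0, 500, 1000, 2000):
    collect_kwargs["epsilon"] = eps_fn(envstep)
    print(envstep, round(collect_kwargs["epsilon"], 4))
```

In the committed code the schedule function is evaluated with `collector.envstep`, so epsilon tracks environment interaction steps rather than learner iterations.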