Merge pull request #7 from TianyiQ/main
docs: fix examples in docs and add API instructions
TianyiQ authored Sep 26, 2024
2 parents e1f7a8b + 13064ae commit d9d9181
Showing 18 changed files with 152 additions and 62 deletions.
Binary file modified doc_generation/build/doctrees/Data.doctree
Binary file not shown.
Binary file modified doc_generation/build/doctrees/Model.doctree
Binary file not shown.
Binary file modified doc_generation/build/doctrees/running.doctree
Binary file not shown.
Binary file modified doc_generation/build/doctrees/usage.doctree
Binary file not shown.
7 changes: 5 additions & 2 deletions doc_generation/source/running.rst
@@ -52,7 +52,7 @@ will automatically download the relevant files and then compile them into the de
- Organize downloaded data into an unified format, and the resulting historical text dataset is saved as a series of files ``./dataset/dataset_textseq/[century_num]/[year_num].json``
- Train a series of historical language models ``./dataset/dataset_modelseq/[century_num]`` using the historical text dataset.
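A minimal sketch of walking that output layout (only the ``./dataset/dataset_textseq/[century_num]/[year_num].json`` path pattern above is taken as given; the per-file JSON schema is not shown in this diff):

.. code-block:: python

   # Walks the compiled historical text dataset; the directory layout
   # follows the pattern documented above, while the JSON schema inside
   # each file is left unspecified here.
   import glob
   import json
   import os

   for path in sorted(glob.glob("./dataset/dataset_textseq/*/*.json")):
       century = os.path.basename(os.path.dirname(path))   # [century_num]
       year = os.path.splitext(os.path.basename(path))[0]  # [year_num]
       with open(path, "r", encoding="utf-8") as f:
           records = json.load(f)
       print(century, year, type(records))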

**Running The Benchmarks**
**Running The Benchmarks on Command Line**

Running

@@ -88,7 +88,10 @@ See comments at the beginning of ``run_benchmark.py`` for a detailed usage of th
[-h | --help]
[...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported
**Running The Benchmarks with Python API**

You can also run the benchmark with the Python API using the :code:`run_benchmark` function.

.. autofunction:: run_benchmark.run_benchmark

It's worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from :code:`JudgeBase` and :code:`ExamineeBase` respectively) and pass them to the :code:`run_benchmark` function.
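The rendered docs changed in this commit (``docs/running.html``, further down) show one concrete call; the sketch below restates it, assuming the ``progressgym`` PyPI package named there:

.. code-block:: python

   # Sketch based on the example in this commit's rendered docs; the
   # progressgym import path and class names are taken from that example.
   from progressgym import run_benchmark, CoevolveJudge, LifelongDPOExaminee

   # Pass class objects, not instances; extra keyword arguments must be
   # str-typed, mirroring the command-line flags.
   results = run_benchmark(LifelongDPOExaminee, CoevolveJudge)
   print(results)  # Dict[str, Any], same format as command-line results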
13 changes: 10 additions & 3 deletions doc_generation/source/usage.rst
@@ -1,11 +1,18 @@
Quickstart: Implementing your own experiment
============================================

General Instructions
--------------------

To run the benchmark for your own alignment algorithm and assess its temporal alignment
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`. Implement it
in a new ``.py`` file in ``algorithm``, then refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`.

You could either implement it in a new ``.py`` file in ``algorithm`` and use the command-line interface for running it, or by directly implementing a subclass of :class:`benchmark.framework.ExamineeBase` in your own script and use the Python API for running it. Refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.

Implementation Instructions
---------------------------

Feeling daunted? Dom't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.
Feeling daunted? Don't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.

.. autoclass:: benchmark.framework.ExamineeBase
:members:
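A hypothetical skeleton of such a subclass (``run`` is the one ``ExamineeBase`` method this commit's genindex names; its signature here is an assumption to check against the autodoc above):

.. code-block:: python

   # Hypothetical examinee skeleton; everything beyond the existence of
   # ExamineeBase and its run() method is an assumption.
   from benchmark.framework import ExamineeBase, JudgeBase

   class MyAlignmentExaminee(ExamineeBase):
       """Wraps a custom temporal-alignment algorithm for benchmarking."""

       def run(self, judge: JudgeBase):
           # Align to whatever the judge presents at each timestep, then
           # defer to the base class for shared bookkeeping.
           return super().run(judge)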
41 changes: 21 additions & 20 deletions docs/Data.html
@@ -274,12 +274,13 @@ <h1>Data<a class="headerlink" href="#data" title="Permalink to this heading">¶<
<dd class="field-even"><p><strong>FileNotFoundError</strong> – If file is not found in default search path and path is not specified.</p>
</dd>
</dl>
<p>Examples:
.. code-block:: python</p>
<blockquote>
<div><p>Data(‘c4_demo’, data_type = ‘sft’, data_path = ‘./libs/llama_factory/data/c4_demo.json’)
Data(‘c4_demo’, data_type = ‘sft’)</p>
</div></blockquote>
<dl>
<dt>Examples:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">Data</span><span class="p">(</span><span class="s1">&#39;c4_demo&#39;</span><span class="p">,</span> <span class="n">data_type</span> <span class="o">=</span> <span class="s1">&#39;sft&#39;</span><span class="p">,</span> <span class="n">data_path</span> <span class="o">=</span> <span class="s1">&#39;./libs/llama_factory/data/c4_demo.json&#39;</span><span class="p">)</span>
<span class="n">Data</span><span class="p">(</span><span class="s1">&#39;c4_demo&#39;</span><span class="p">,</span> <span class="n">data_type</span> <span class="o">=</span> <span class="s1">&#39;sft&#39;</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<dl class="py method">
@@ -340,12 +341,13 @@ <h1>Data<a class="headerlink" href="#data" title="Permalink to this heading">¶<
</ul>
</dd>
</dl>
<p>Example:
.. code-block:: python</p>
<blockquote>
<div><p>data.set_key_fields(prompt_field_name=’content’) # for pretraining dataset stored in content field
data.set_key_fields(prompt_field_name=’instruction’, query_field_name=’input’, response_field_name=’output’) # for QA dataset with system prompt</p>
</div></blockquote>
<dl>
<dt>Example:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">data</span><span class="o">.</span><span class="n">set_key_fields</span><span class="p">(</span><span class="n">prompt_field_name</span><span class="o">=</span><span class="s1">&#39;content&#39;</span><span class="p">)</span> <span class="c1"># for pretraining dataset stored in content field</span>
<span class="n">data</span><span class="o">.</span><span class="n">set_key_fields</span><span class="p">(</span><span class="n">prompt_field_name</span><span class="o">=</span><span class="s1">&#39;instruction&#39;</span><span class="p">,</span> <span class="n">query_field_name</span><span class="o">=</span><span class="s1">&#39;input&#39;</span><span class="p">,</span> <span class="n">response_field_name</span><span class="o">=</span><span class="s1">&#39;output&#39;</span><span class="p">)</span> <span class="c1"># for QA dataset with system prompt</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<dl class="py method">
@@ -397,16 +399,15 @@ <h1>Data<a class="headerlink" href="#data" title="Permalink to this heading">¶<
<p>If collection_path is omitted, we will search for collection_name in directories specified in abstractions_config.json.
When file_selection_func is supplied, files will be captured real-time, instead of only when initializing. Only json files will be captured.
You may want to exclude undated.json using file_selection_func. That file is huge.</p>
<p>Example:
.. code-block:: python</p>
<blockquote>
<div><dl class="simple">
<dt>DataFileCollection(collection_name=’histtext_1826_to_2018’,</dt><dd><p>data_type=’pretrain’,
collection_path = ‘./dataset/dataset_text_sequence/’,
file_selection_func = (lambda path: 1826 &lt;= int(path.split(‘/’)[-1][1:6]) &lt;= 2018))</p>
<dl>
<dt>Example:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">DataFileCollection</span><span class="p">(</span><span class="n">collection_name</span><span class="o">=</span><span class="s1">&#39;histtext_1826_to_2018&#39;</span><span class="p">,</span>
<span class="n">data_type</span><span class="o">=</span><span class="s1">&#39;pretrain&#39;</span><span class="p">,</span>
<span class="n">collection_path</span> <span class="o">=</span> <span class="s1">&#39;./dataset/dataset_text_sequence/&#39;</span><span class="p">,</span>
<span class="n">file_selection_func</span> <span class="o">=</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">path</span><span class="p">:</span> <span class="mi">1826</span> <span class="o">&lt;=</span> <span class="nb">int</span><span class="p">(</span><span class="n">path</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;/&#39;</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">:</span><span class="mi">6</span><span class="p">])</span> <span class="o">&lt;=</span> <span class="mi">2018</span><span class="p">))</span>
</pre></div>
</div>
</dd>
</dl>
</div></blockquote>
</dd></dl>

<dl class="py method">
13 changes: 7 additions & 6 deletions docs/Model.html
@@ -271,12 +271,13 @@ <h1>Model<a class="headerlink" href="#model" title="Permalink to this heading">
</ul>
</dd>
</dl>
<p>Examples:
.. code-block:: python</p>
<blockquote>
<div><p>Model(model_name = ‘Gemma-2B_sft’, is_instruct_finetuned = True, model_path = ‘./output/training_results/Gemma-2B_sft/’)
Model(model_name = ‘Gemma-2B_sft’, is_instruct_finetuned = True)</p>
</div></blockquote>
<dl>
<dt>Examples:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">Model</span><span class="p">(</span><span class="n">model_name</span> <span class="o">=</span> <span class="s1">&#39;Gemma-2B_sft&#39;</span><span class="p">,</span> <span class="n">is_instruct_finetuned</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span> <span class="n">model_path</span> <span class="o">=</span> <span class="s1">&#39;./output/training_results/Gemma-2B_sft/&#39;</span><span class="p">)</span>
<span class="n">Model</span><span class="p">(</span><span class="n">model_name</span> <span class="o">=</span> <span class="s1">&#39;Gemma-2B_sft&#39;</span><span class="p">,</span> <span class="n">is_instruct_finetuned</span> <span class="o">=</span> <span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<dl class="py method">
7 changes: 5 additions & 2 deletions docs/_sources/running.rst.txt
@@ -52,7 +52,7 @@ will automatically download the relevant files and then compile them into the de
- Organize downloaded data into an unified format, and the resulting historical text dataset is saved as a series of files ``./dataset/dataset_textseq/[century_num]/[year_num].json``
- Train a series of historical language models ``./dataset/dataset_modelseq/[century_num]`` using the historical text dataset.

**Running The Benchmarks**
**Running The Benchmarks on Command Line**

Running

@@ -88,7 +88,10 @@ See comments at the beginning of ``run_benchmark.py`` for a detailed usage of th
[-h | --help]
[...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported
**Running The Benchmarks with Python API**

You can also run the benchmark with the Python API using the :code:`run_benchmark` function.

.. autofunction:: run_benchmark.run_benchmark

It's worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from :code:`JudgeBase` and :code:`ExamineeBase` respectively) and pass them to the :code:`run_benchmark` function.
13 changes: 10 additions & 3 deletions docs/_sources/usage.rst.txt
@@ -1,11 +1,18 @@
Quickstart: Implementing your own experiment
============================================

General Instructions
--------------------

To run the benchmark for your own alignment algorithm and assess its temporal alignment
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`. Implement it
in a new ``.py`` file in ``algorithm``, then refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`.

You could either implement it in a new ``.py`` file in ``algorithm`` and use the command-line interface for running it, or by directly implementing a subclass of :class:`benchmark.framework.ExamineeBase` in your own script and use the Python API for running it. Refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.

Implementation Instructions
---------------------------

Feeling daunted? Dom't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.
Feeling daunted? Don't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.

.. autoclass:: benchmark.framework.ExamineeBase
:members:
2 changes: 2 additions & 0 deletions docs/genindex.html
@@ -532,6 +532,8 @@ <h2>R</h2>
<li><a href="Examinee.html#benchmark.framework.ExamineeBase.run">(benchmark.framework.ExamineeBase method)</a>
</li>
</ul></li>
<li><a href="running.html#run_benchmark.run_benchmark">run_benchmark() (in module run_benchmark)</a>
</li>
</ul></td>
</tr></table>
</section>
6 changes: 5 additions & 1 deletion docs/index.html
@@ -267,7 +267,11 @@ <h2>Contents<a class="headerlink" href="#contents" title="Permalink to this head
<li class="toctree-l2"><a class="reference internal" href="running.html#basic-usage">Basic Usage</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="usage.html">Quickstart: Implementing your own experiment</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage.html">Quickstart: Implementing your own experiment</a><ul>
<li class="toctree-l2"><a class="reference internal" href="usage.html#general-instructions">General Instructions</a></li>
<li class="toctree-l2"><a class="reference internal" href="usage.html#implementation-instructions">Implementation Instructions</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="documentation.html">API reference</a><ul>
<li class="toctree-l2"><a class="reference internal" href="documentation.html#contents">Contents</a></li>
</ul>
Binary file modified docs/objects.inv
Binary file not shown.
38 changes: 36 additions & 2 deletions docs/running.html
@@ -295,7 +295,7 @@ <h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to thi
</ul>
</dd>
</dl>
<p><strong>Running The Benchmarks</strong></p>
<p><strong>Running The Benchmarks on Command Line</strong></p>
<p>Running</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>run_benchmark.py
<span class="go"> --algorithms=ALGO1[,ALGO2[,ALGO3[,...]]]</span>
@@ -323,6 +323,37 @@ <h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to thi
<span class="go"> [...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported</span>
</pre></div>
</div>
<p><strong>Running The Benchmarks with Python API</strong></p>
<p>You can also run the benchmark with the Python API using the <code class="code docutils literal notranslate"><span class="pre">run_benchmark</span></code> function.</p>
<dl class="py function">
<dt class="sig sig-object py" id="run_benchmark.run_benchmark">
<span class="sig-prename descclassname"><span class="pre">run_benchmark.</span></span><span class="sig-name descname"><span class="pre">run_benchmark</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">ExamineeClass</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Type</span><span class="p"><span class="pre">[</span></span><a class="reference internal" href="Examinee.html#benchmark.framework.ExamineeBase" title="benchmark.framework.ExamineeBase"><span class="pre">ExamineeBase</span></a><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">JudgeClass</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Type</span><span class="p"><span class="pre">[</span></span><a class="reference internal" href="Judge.html#benchmark.framework.JudgeBase" title="benchmark.framework.JudgeBase"><span class="pre">JudgeBase</span></a><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#x2192;</span> <span class="sig-return-typehint"><span class="pre">Dict</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="pre">Any</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#run_benchmark.run_benchmark" title="Permalink to this definition"></a></dt>
<dd><p>Run a single benchmarking test on a single examinee and a single judge, and return the results.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>ExamineeClass</strong> – Necessary, examinee class object representing the algorithm to be evaluated. Can be any subclass of ExamineeBase, including user-implemented ones. Note that this is the class object, not an instance of the class.</p></li>
<li><p><strong>judge</strong> (<em>Type</em><em>[</em><a class="reference internal" href="Judge.html#benchmark.framework.JudgeBase" title="benchmark.framework.JudgeBase"><em>JudgeBase</em></a><em>]</em>) – Necessary, judge class object representing the challenge to be evaluated. Can be any subclass of JudgeBase, including user-implemented ones. Note that this is the class object, not an instance of the class.</p></li>
<li><p><strong>kwargs</strong> (<em>Dict</em><em>[</em><em>str</em><em>, </em><em>str</em><em>]</em>) – Optional, additional arguments to be passed to the examinee and the judge. Pass the same str-typed arguments as you would in the command line.</p></li>
</ul>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>A dictionary containing the results of the benchmarking test. The dictionary is in the exact same format as the results of command-line benchmarking.</p>
</dd>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p>Dict[str, Any]</p>
</dd>
</dl>
<dl>
<dt>Example:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">progressgym</span> <span class="kn">import</span> <span class="n">run_benchmark</span><span class="p">,</span> <span class="n">CoevolveJudge</span><span class="p">,</span> <span class="n">LifelongDPOExaminee</span> <span class="c1"># if using PyPI package</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">run_benchmark</span><span class="p">(</span><span class="n">LifelongDPOExaminee</span><span class="p">,</span> <span class="n">CoevolveJudge</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<p>It’s worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from <code class="code docutils literal notranslate"><span class="pre">JudgeBase</span></code> and <code class="code docutils literal notranslate"><span class="pre">ExamineeBase</span></code> respectively) and pass them to the <code class="code docutils literal notranslate"><span class="pre">run_benchmark</span></code> function.</p>
</section>
</section>
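The custom-class route that closing paragraph describes might look like the following sketch (``MyExaminee``, ``MyJudge``, and ``some_option`` are hypothetical names; only the ``run_benchmark.run_benchmark`` entry point is documented above):

.. code-block:: python

   # Hypothetical custom classes passed directly to run_benchmark; the
   # class bodies and the extra str-typed keyword argument are placeholders.
   from run_benchmark import run_benchmark
   from benchmark.framework import ExamineeBase, JudgeBase

   class MyExaminee(ExamineeBase):
       pass  # override ExamineeBase hooks as needed

   class MyJudge(JudgeBase):
       pass  # override JudgeBase hooks as needed

   results = run_benchmark(MyExaminee, MyJudge, some_option="value")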

@@ -384,7 +415,10 @@ <h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to thi
<li><a class="reference internal" href="#">Quickstart: Running the benchmark</a><ul>
<li><a class="reference internal" href="#requirements">Requirements</a></li>
<li><a class="reference internal" href="#installation">Installation</a></li>
<li><a class="reference internal" href="#basic-usage">Basic Usage</a></li>
<li><a class="reference internal" href="#basic-usage">Basic Usage</a><ul>
<li><a class="reference internal" href="#run_benchmark.run_benchmark"><code class="docutils literal notranslate"><span class="pre">run_benchmark()</span></code></a></li>
</ul>
</li>
</ul>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.
