Merge pull request #7 from TianyiQ/main
docs: fix examples in docs and add API instructions
TianyiQ authored Sep 26, 2024
2 parents e1f7a8b + 13064ae commit d9d9181
Showing 18 changed files with 152 additions and 62 deletions.
Binary file modified doc_generation/build/doctrees/Data.doctree
Binary file not shown.
Binary file modified doc_generation/build/doctrees/Model.doctree
Binary file not shown.
Binary file modified doc_generation/build/doctrees/running.doctree
Binary file not shown.
Binary file modified doc_generation/build/doctrees/usage.doctree
Binary file not shown.
7 changes: 5 additions & 2 deletions doc_generation/source/running.rst
@@ -52,7 +52,7 @@ will automatically download the relevant files and then compile them into the de
- Organize downloaded data into an unified format, and the resulting historical text dataset is saved as a series of files ``./dataset/dataset_textseq/[century_num]/[year_num].json``
- Train a series of historical language models ``./dataset/dataset_modelseq/[century_num]`` using the historical text dataset.
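A minimal sketch of walking that output layout (only the ``./dataset/dataset_textseq/[century_num]/[year_num].json`` path pattern above is taken as given; the per-file JSON schema is not shown in this diff):

.. code-block:: python

   # Walks the compiled historical text dataset; the directory layout
   # follows the pattern documented above, while the JSON schema inside
   # each file is left unspecified here.
   import glob
   import json
   import os

   for path in sorted(glob.glob("./dataset/dataset_textseq/*/*.json")):
       century = os.path.basename(os.path.dirname(path))   # [century_num]
       year = os.path.splitext(os.path.basename(path))[0]  # [year_num]
       with open(path, "r", encoding="utf-8") as f:
           records = json.load(f)
       print(century, year, type(records))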

**Running The Benchmarks**
**Running The Benchmarks on Command Line**

Running

@@ -88,7 +88,10 @@ See comments at the beginning of ``run_benchmark.py`` for a detailed usage of th
[-h | --help]
[...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported
**Running The Benchmarks with Python API**

You can also run the benchmark with the Python API using the :code:`run_benchmark` function.

.. autofunction:: run_benchmark.run_benchmark

It's worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from :code:`JudgeBase` and :code:`ExamineeBase` respectively) and pass them to the :code:`run_benchmark` function.
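The rendered docs changed in this commit (``docs/running.html``, further down) show one concrete call; the sketch below restates it, assuming the ``progressgym`` PyPI package named there:

.. code-block:: python

   # Sketch based on the example in this commit's rendered docs; the
   # progressgym import path and class names are taken from that example.
   from progressgym import run_benchmark, CoevolveJudge, LifelongDPOExaminee

   # Pass class objects, not instances; extra keyword arguments must be
   # str-typed, mirroring the command-line flags.
   results = run_benchmark(LifelongDPOExaminee, CoevolveJudge)
   print(results)  # Dict[str, Any], same format as command-line results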
13 changes: 10 additions & 3 deletions doc_generation/source/usage.rst
@@ -1,11 +1,18 @@
Quickstart: Implementing your own experiment
============================================

General Instructions
--------------------

To run the benchmark for your own alignment algorithm and assess its temporal alignment
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`. Implement it
in a new ``.py`` file in ``algorithm``, then refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`.

You could either implement it in a new ``.py`` file in ``algorithm`` and use the command-line interface for running it, or by directly implementing a subclass of :class:`benchmark.framework.ExamineeBase` in your own script and use the Python API for running it. Refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.

Implementation Instructions
---------------------------

Feeling daunted? Dom't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.
Feeling daunted? Don't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.

.. autoclass:: benchmark.framework.ExamineeBase
:members:
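A hypothetical skeleton of such a subclass (``run`` is the one ``ExamineeBase`` method this commit's genindex names; its signature here is an assumption to check against the autodoc above):

.. code-block:: python

   # Hypothetical examinee skeleton; everything beyond the existence of
   # ExamineeBase and its run() method is an assumption.
   from benchmark.framework import ExamineeBase, JudgeBase

   class MyAlignmentExaminee(ExamineeBase):
       """Wraps a custom temporal-alignment algorithm for benchmarking."""

       def run(self, judge: JudgeBase):
           # Align to whatever the judge presents at each timestep, then
           # defer to the base class for shared bookkeeping.
           return super().run(judge)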
41 changes: 21 additions & 20 deletions docs/Data.html
@@ -274,12 +274,13 @@ <h1>Data<a class="headerlink" href="#data" title="Permalink to this heading">¶<
<dd class="field-even"><p><strong>FileNotFoundError</strong> – If file is not found in default search path and path is not specified.</p>
</dd>
</dl>
<p>Examples:
.. code-block:: python</p>
<blockquote>
<div><p>Data(‘c4_demo’, data_type = ‘sft’, data_path = ‘./libs/llama_factory/data/c4_demo.json’)
Data(‘c4_demo’, data_type = ‘sft’)</p>
</div></blockquote>
<dl>
<dt>Examples:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">Data</span><span class="p">(</span><span class="s1">&#39;c4_demo&#39;</span><span class="p">,</span> <span class="n">data_type</span> <span class="o">=</span> <span class="s1">&#39;sft&#39;</span><span class="p">,</span> <span class="n">data_path</span> <span class="o">=</span> <span class="s1">&#39;./libs/llama_factory/data/c4_demo.json&#39;</span><span class="p">)</span>
<span class="n">Data</span><span class="p">(</span><span class="s1">&#39;c4_demo&#39;</span><span class="p">,</span> <span class="n">data_type</span> <span class="o">=</span> <span class="s1">&#39;sft&#39;</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<dl class="py method">
@@ -340,12 +341,13 @@ <h1>Data<a class="headerlink" href="#data" title="Permalink to this heading">¶<
</ul>
</dd>
</dl>
<p>Example:
.. code-block:: python</p>
<blockquote>
<div><p>data.set_key_fields(prompt_field_name=’content’) # for pretraining dataset stored in content field
data.set_key_fields(prompt_field_name=’instruction’, query_field_name=’input’, response_field_name=’output’) # for QA dataset with system prompt</p>
</div></blockquote>
<dl>
<dt>Example:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">data</span><span class="o">.</span><span class="n">set_key_fields</span><span class="p">(</span><span class="n">prompt_field_name</span><span class="o">=</span><span class="s1">&#39;content&#39;</span><span class="p">)</span> <span class="c1"># for pretraining dataset stored in content field</span>
<span class="n">data</span><span class="o">.</span><span class="n">set_key_fields</span><span class="p">(</span><span class="n">prompt_field_name</span><span class="o">=</span><span class="s1">&#39;instruction&#39;</span><span class="p">,</span> <span class="n">query_field_name</span><span class="o">=</span><span class="s1">&#39;input&#39;</span><span class="p">,</span> <span class="n">response_field_name</span><span class="o">=</span><span class="s1">&#39;output&#39;</span><span class="p">)</span> <span class="c1"># for QA dataset with system prompt</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<dl class="py method">
@@ -397,16 +399,15 @@ <h1>Data<a class="headerlink" href="#data" title="Permalink to this heading">¶<
<p>If collection_path is omitted, we will search for collection_name in directories specified in abstractions_config.json.
When file_selection_func is supplied, files will be captured real-time, instead of only when initializing. Only json files will be captured.
You may want to exclude undated.json using file_selection_func. That file is huge.</p>
<p>Example:
.. code-block:: python</p>
<blockquote>
<div><dl class="simple">
<dt>DataFileCollection(collection_name=’histtext_1826_to_2018’,</dt><dd><p>data_type=’pretrain’,
collection_path = ‘./dataset/dataset_text_sequence/’,
file_selection_func = (lambda path: 1826 &lt;= int(path.split(‘/’)[-1][1:6]) &lt;= 2018))</p>
<dl>
<dt>Example:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">DataFileCollection</span><span class="p">(</span><span class="n">collection_name</span><span class="o">=</span><span class="s1">&#39;histtext_1826_to_2018&#39;</span><span class="p">,</span>
<span class="n">data_type</span><span class="o">=</span><span class="s1">&#39;pretrain&#39;</span><span class="p">,</span>
<span class="n">collection_path</span> <span class="o">=</span> <span class="s1">&#39;./dataset/dataset_text_sequence/&#39;</span><span class="p">,</span>
<span class="n">file_selection_func</span> <span class="o">=</span> <span class="p">(</span><span class="k">lambda</span> <span class="n">path</span><span class="p">:</span> <span class="mi">1826</span> <span class="o">&lt;=</span> <span class="nb">int</span><span class="p">(</span><span class="n">path</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;/&#39;</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">:</span><span class="mi">6</span><span class="p">])</span> <span class="o">&lt;=</span> <span class="mi">2018</span><span class="p">))</span>
</pre></div>
</div>
</dd>
</dl>
</div></blockquote>
</dd></dl>

<dl class="py method">
13 changes: 7 additions & 6 deletions docs/Model.html
@@ -271,12 +271,13 @@ <h1>Model<a class="headerlink" href="#model" title="Permalink to this heading">
</ul>
</dd>
</dl>
<p>Examples:
.. code-block:: python</p>
<blockquote>
<div><p>Model(model_name = ‘Gemma-2B_sft’, is_instruct_finetuned = True, model_path = ‘./output/training_results/Gemma-2B_sft/’)
Model(model_name = ‘Gemma-2B_sft’, is_instruct_finetuned = True)</p>
</div></blockquote>
<dl>
<dt>Examples:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">Model</span><span class="p">(</span><span class="n">model_name</span> <span class="o">=</span> <span class="s1">&#39;Gemma-2B_sft&#39;</span><span class="p">,</span> <span class="n">is_instruct_finetuned</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span> <span class="n">model_path</span> <span class="o">=</span> <span class="s1">&#39;./output/training_results/Gemma-2B_sft/&#39;</span><span class="p">)</span>
<span class="n">Model</span><span class="p">(</span><span class="n">model_name</span> <span class="o">=</span> <span class="s1">&#39;Gemma-2B_sft&#39;</span><span class="p">,</span> <span class="n">is_instruct_finetuned</span> <span class="o">=</span> <span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<dl class="py method">
7 changes: 5 additions & 2 deletions docs/_sources/running.rst.txt
@@ -52,7 +52,7 @@ will automatically download the relevant files and then compile them into the de
- Organize downloaded data into an unified format, and the resulting historical text dataset is saved as a series of files ``./dataset/dataset_textseq/[century_num]/[year_num].json``
- Train a series of historical language models ``./dataset/dataset_modelseq/[century_num]`` using the historical text dataset.

**Running The Benchmarks**
**Running The Benchmarks on Command Line**

Running

@@ -88,7 +88,10 @@ See comments at the beginning of ``run_benchmark.py`` for a detailed usage of th
[-h | --help]
[...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported
**Running The Benchmarks with Python API**

You can also run the benchmark with the Python API using the :code:`run_benchmark` function.

.. autofunction:: run_benchmark.run_benchmark

It's worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from :code:`JudgeBase` and :code:`ExamineeBase` respectively) and pass them to the :code:`run_benchmark` function.
13 changes: 10 additions & 3 deletions docs/_sources/usage.rst.txt
@@ -1,11 +1,18 @@
Quickstart: Implementing your own experiment
============================================

General Instructions
--------------------

To run the benchmark for your own alignment algorithm and assess its temporal alignment
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`. Implement it
in a new ``.py`` file in ``algorithm``, then refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.
abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`.

You could either implement it in a new ``.py`` file in ``algorithm`` and use the command-line interface for running it, or by directly implementing a subclass of :class:`benchmark.framework.ExamineeBase` in your own script and use the Python API for running it. Refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.

Implementation Instructions
---------------------------

Feeling daunted? Dom't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.
Feeling daunted? Don't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.

.. autoclass:: benchmark.framework.ExamineeBase
:members:
2 changes: 2 additions & 0 deletions docs/genindex.html
@@ -532,6 +532,8 @@ <h2>R</h2>
<li><a href="Examinee.html#benchmark.framework.ExamineeBase.run">(benchmark.framework.ExamineeBase method)</a>
</li>
</ul></li>
<li><a href="running.html#run_benchmark.run_benchmark">run_benchmark() (in module run_benchmark)</a>
</li>
</ul></td>
</tr></table>
</section>
6 changes: 5 additions & 1 deletion docs/index.html
@@ -267,7 +267,11 @@ <h2>Contents<a class="headerlink" href="#contents" title="Permalink to this head
<li class="toctree-l2"><a class="reference internal" href="running.html#basic-usage">Basic Usage</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="usage.html">Quickstart: Implementing your own experiment</a></li>
<li class="toctree-l1"><a class="reference internal" href="usage.html">Quickstart: Implementing your own experiment</a><ul>
<li class="toctree-l2"><a class="reference internal" href="usage.html#general-instructions">General Instructions</a></li>
<li class="toctree-l2"><a class="reference internal" href="usage.html#implementation-instructions">Implementation Instructions</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="documentation.html">API reference</a><ul>
<li class="toctree-l2"><a class="reference internal" href="documentation.html#contents">Contents</a></li>
</ul>
Binary file modified docs/objects.inv
Binary file not shown.
38 changes: 36 additions & 2 deletions docs/running.html
@@ -295,7 +295,7 @@ <h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to thi
</ul>
</dd>
</dl>
<p><strong>Running The Benchmarks</strong></p>
<p><strong>Running The Benchmarks on Command Line</strong></p>
<p>Running</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>python<span class="w"> </span>run_benchmark.py
<span class="go"> --algorithms=ALGO1[,ALGO2[,ALGO3[,...]]]</span>
@@ -323,6 +323,37 @@ <h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to thi
<span class="go"> [...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported</span>
</pre></div>
</div>
<p><strong>Running The Benchmarks with Python API</strong></p>
<p>You can also run the benchmark with the Python API using the <code class="code docutils literal notranslate"><span class="pre">run_benchmark</span></code> function.</p>
<dl class="py function">
<dt class="sig sig-object py" id="run_benchmark.run_benchmark">
<span class="sig-prename descclassname"><span class="pre">run_benchmark.</span></span><span class="sig-name descname"><span class="pre">run_benchmark</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">ExamineeClass</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Type</span><span class="p"><span class="pre">[</span></span><a class="reference internal" href="Examinee.html#benchmark.framework.ExamineeBase" title="benchmark.framework.ExamineeBase"><span class="pre">ExamineeBase</span></a><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">JudgeClass</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Type</span><span class="p"><span class="pre">[</span></span><a class="reference internal" href="Judge.html#benchmark.framework.JudgeBase" title="benchmark.framework.JudgeBase"><span class="pre">JudgeBase</span></a><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">&#x2192;</span> <span class="sig-return-typehint"><span class="pre">Dict</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="pre">Any</span><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#run_benchmark.run_benchmark" title="Permalink to this definition"></a></dt>
<dd><p>Run a single benchmarking test on a single examinee and a single judge, and return the results.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>ExamineeClass</strong> – Necessary, examinee class object representing the algorithm to be evaluated. Can be any subclass of ExamineeBase, including user-implemented ones. Note that this is the class object, not an instance of the class.</p></li>
<li><p><strong>judge</strong> (<em>Type</em><em>[</em><a class="reference internal" href="Judge.html#benchmark.framework.JudgeBase" title="benchmark.framework.JudgeBase"><em>JudgeBase</em></a><em>]</em>) – Necessary, judge class object representing the challenge to be evaluated. Can be any subclass of JudgeBase, including user-implemented ones. Note that this is the class object, not an instance of the class.</p></li>
<li><p><strong>kwargs</strong> (<em>Dict</em><em>[</em><em>str</em><em>, </em><em>str</em><em>]</em>) – Optional, additional arguments to be passed to the examinee and the judge. Pass the same str-typed arguments as you would in the command line.</p></li>
</ul>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>A dictionary containing the results of the benchmarking test. The dictionary is in the exact same format as the results of command-line benchmarking.</p>
</dd>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p>Dict[str, Any]</p>
</dd>
</dl>
<dl>
<dt>Example:</dt><dd><div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">progressgym</span> <span class="kn">import</span> <span class="n">run_benchmark</span><span class="p">,</span> <span class="n">CoevolveJudge</span><span class="p">,</span> <span class="n">LifelongDPOExaminee</span> <span class="c1"># if using PyPI package</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">run_benchmark</span><span class="p">(</span><span class="n">LifelongDPOExaminee</span><span class="p">,</span> <span class="n">CoevolveJudge</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</dd></dl>

<p>It’s worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from <code class="code docutils literal notranslate"><span class="pre">JudgeBase</span></code> and <code class="code docutils literal notranslate"><span class="pre">ExamineeBase</span></code> respectively) and pass them to the <code class="code docutils literal notranslate"><span class="pre">run_benchmark</span></code> function.</p>
</section>
</section>
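The custom-class route that closing paragraph describes might look like the following sketch (``MyExaminee``, ``MyJudge``, and ``some_option`` are hypothetical names; only the ``run_benchmark.run_benchmark`` entry point is documented above):

.. code-block:: python

   # Hypothetical custom classes passed directly to run_benchmark; the
   # class bodies and the extra str-typed keyword argument are placeholders.
   from run_benchmark import run_benchmark
   from benchmark.framework import ExamineeBase, JudgeBase

   class MyExaminee(ExamineeBase):
       pass  # override ExamineeBase hooks as needed

   class MyJudge(JudgeBase):
       pass  # override JudgeBase hooks as needed

   results = run_benchmark(MyExaminee, MyJudge, some_option="value")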

@@ -384,7 +415,10 @@ <h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to thi
<li><a class="reference internal" href="#">Quickstart: Running the benchmark</a><ul>
<li><a class="reference internal" href="#requirements">Requirements</a></li>
<li><a class="reference internal" href="#installation">Installation</a></li>
<li><a class="reference internal" href="#basic-usage">Basic Usage</a></li>
<li><a class="reference internal" href="#basic-usage">Basic Usage</a><ul>
<li><a class="reference internal" href="#run_benchmark.run_benchmark"><code class="docutils literal notranslate"><span class="pre">run_benchmark()</span></code></a></li>
</ul>
</li>
</ul>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.
