diff --git a/doc_generation/build/doctrees/Data.doctree b/doc_generation/build/doctrees/Data.doctree
index bd9d00f..0de623b 100644
Binary files a/doc_generation/build/doctrees/Data.doctree and b/doc_generation/build/doctrees/Data.doctree differ
diff --git a/doc_generation/build/doctrees/Model.doctree b/doc_generation/build/doctrees/Model.doctree
index 6f7d89e..3f13b80 100644
Binary files a/doc_generation/build/doctrees/Model.doctree and b/doc_generation/build/doctrees/Model.doctree differ
diff --git a/doc_generation/build/doctrees/running.doctree b/doc_generation/build/doctrees/running.doctree
index 41511d1..a2f0742 100644
Binary files a/doc_generation/build/doctrees/running.doctree and b/doc_generation/build/doctrees/running.doctree differ
diff --git a/doc_generation/build/doctrees/usage.doctree b/doc_generation/build/doctrees/usage.doctree
index 5481b38..414311c 100644
Binary files a/doc_generation/build/doctrees/usage.doctree and b/doc_generation/build/doctrees/usage.doctree differ
diff --git a/doc_generation/source/running.rst b/doc_generation/source/running.rst
index 5a95217..9d7b378 100644
--- a/doc_generation/source/running.rst
+++ b/doc_generation/source/running.rst
@@ -52,7 +52,7 @@ will automatically download the relevant files and then compile them into the de
 - Organize downloaded data into an unified format, and the resulting historical text dataset is saved as a series of files ``./dataset/dataset_textseq/[century_num]/[year_num].json``
 - Train a series of historical language models ``./dataset/dataset_modelseq/[century_num]`` using the historical text dataset.
 
-**Running The Benchmarks**
+**Running The Benchmarks on the Command Line**
 
 Running
 
@@ -88,7 +88,10 @@ See comments at the beginning of ``run_benchmark.py`` for a detailed usage of th
         [-h | --help]
         [...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported
 
-
+**Running The Benchmarks with the Python API**
+You can also run the benchmark with the Python API using the :code:`run_benchmark` function.
+.. autofunction:: run_benchmark.run_benchmark
+It's worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from :code:`JudgeBase` and :code:`ExamineeBase` respectively) and pass them to the :code:`run_benchmark` function.
\ No newline at end of file
diff --git a/doc_generation/source/usage.rst b/doc_generation/source/usage.rst
index 0d443a3..7bdafc3 100644
--- a/doc_generation/source/usage.rst
+++ b/doc_generation/source/usage.rst
@@ -1,11 +1,18 @@
 Quickstart: Implementing your own experiment
 ============================================
 
+General Instructions
+--------------------
+
 To run the benchmark for your own alignment algorithm and assess its temporal alignment
-abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`. Implement it
-in a new ``.py`` file in ``algorithm``, then refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.
+abilities, you have to implement your own algorithm as a subclass of :class:`benchmark.framework.ExamineeBase`.
+
+You can either implement it in a new ``.py`` file in ``algorithm`` and run it via the command-line interface, or implement a subclass of :class:`benchmark.framework.ExamineeBase` directly in your own script and run it via the Python API. Refer to :doc:`running` for guidance on how to run the benchmark on the examinee you implemented.
+
+Implementation Instructions
+---------------------------
 
-Feeling daunted? Dom't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.
+Feeling daunted? Don't worry. Read through the following descriptions of :class:`benchmark.framework.ExamineeBase`, and you'll understand the overall workings of the examinee class in our framework.
 
 .. autoclass:: benchmark.framework.ExamineeBase
    :members:
diff --git a/docs/Data.html b/docs/Data.html
index be678eb..d65c73a 100644
--- a/docs/Data.html
+++ b/docs/Data.html
@@ -274,12 +274,13 @@
FileNotFoundError – If file is not found in default search path and path is not specified.
 Examples:
-.. code-block:: python
-
-Data(‘c4_demo’, data_type = ‘sft’, data_path = ‘./libs/llama_factory/data/c4_demo.json’)
-Data(‘c4_demo’, data_type = ‘sft’)
-
+Data('c4_demo', data_type = 'sft', data_path = './libs/llama_factory/data/c4_demo.json')
+Data('c4_demo', data_type = 'sft')
+
 Example:
-.. code-block:: python
-
-data.set_key_fields(prompt_field_name=’content’) # for pretraining dataset stored in content field
-data.set_key_fields(prompt_field_name=’instruction’, query_field_name=’input’, response_field_name=’output’) # for QA dataset with system prompt
-
+data.set_key_fields(prompt_field_name='content') # for pretraining dataset stored in content field
+data.set_key_fields(prompt_field_name='instruction', query_field_name='input', response_field_name='output') # for QA dataset with system prompt
+
If collection_path is omitted, we will search for collection_name in directories specified in abstractions_config.json. When file_selection_func is supplied, files will be captured real-time, instead of only when initializing. Only json files will be captured. You may want to exclude undated.json using file_selection_func. That file is huge.
 Example:
-.. code-block:: python
-
-DataFileCollection(collection_name=’histtext_1826_to_2018’,
-data_type=’pretrain’,
-collection_path = ‘./dataset/dataset_text_sequence/’,
-file_selection_func = (lambda path: 1826 <= int(path.split(‘/’)[-1][1:6]) <= 2018))
-
+DataFileCollection(collection_name='histtext_1826_to_2018',
+                   data_type='pretrain',
+                   collection_path = './dataset/dataset_text_sequence/',
+                   file_selection_func = (lambda path: 1826 <= int(path.split('/')[-1][1:6]) <= 2018))
+
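+A complementary sketch of the exclusion advice above (the collection_name here is a hypothetical placeholder, not part of the shipped docstring): a file_selection_func that drops the oversized undated.json while keeping everything else:
+
+DataFileCollection(collection_name='histtext_dated_only',
+                   data_type='pretrain',
+                   collection_path = './dataset/dataset_text_sequence/',
+                   file_selection_func = (lambda path: not path.endswith('undated.json'))) # skip the huge undated.json file
+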
 Examples:
-.. code-block:: python
-
-Model(model_name = ‘Gemma-2B_sft’, is_instruct_finetuned = True, model_path = ‘./output/training_results/Gemma-2B_sft/’)
-Model(model_name = ‘Gemma-2B_sft’, is_instruct_finetuned = True)
-
+Model(model_name = 'Gemma-2B_sft', is_instruct_finetuned = True, model_path = './output/training_results/Gemma-2B_sft/')
+Model(model_name = 'Gemma-2B_sft', is_instruct_finetuned = True)
+
 $ python run_benchmark.py
     --algorithms=ALGO1[,ALGO2[,ALGO3[,...]]]
@@ -323,6 +323,37 @@ Basic Usage
     [...] #additional arguments will be supplied to the algorithms and the challenges when they are instantiated; only string values are supported
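+For instance, a hypothetical invocation (the algorithm and challenge names are illustrative placeholders, and the --challenges flag is assumed to select judges symmetrically to --algorithms; check the full flag listing above for the authoritative options):
+
+$ python run_benchmark.py --algorithms=LifelongDPO --challenges=Coevolve #both flags accept comma-separated lists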
+Running The Benchmarks with the Python API
+You can also run the benchmark with the Python API using the run_benchmark function.
+Run a single benchmarking test on a single examinee and a single judge, and return the results.
+Parameters:
+ExamineeClass (Type[ExamineeBase]) – Necessary, examinee class object representing the algorithm to be evaluated. Can be any subclass of ExamineeBase, including user-implemented ones. Note that this is the class object, not an instance of the class.
+judge (Type[JudgeBase]) – Necessary, judge class object representing the challenge to be evaluated. Can be any subclass of JudgeBase, including user-implemented ones. Note that this is the class object, not an instance of the class.
+kwargs (Dict[str, str]) – Optional, additional arguments to be passed to the examinee and the judge. Pass the same str-typed arguments as you would in the command line.
+Returns:
+A dictionary containing the results of the benchmarking test. The dictionary is in the exact same format as the results of command-line benchmarking.
+Return type:
+Dict[str, Any]
+from progressgym import run_benchmark, CoevolveJudge, LifelongDPOExaminee # if using PyPI package
+results = run_benchmark(LifelongDPOExaminee, CoevolveJudge)
+
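+If you need to forward additional options, the kwargs parameter mirrors the command line. A minimal sketch (the option name below is illustrative, not a documented flag; consult the flag listing at the beginning of run_benchmark.py for real ones):
+
+results = run_benchmark(LifelongDPOExaminee, CoevolveJudge, output_filename='my_run') # str-typed kwargs reach both the examinee and the judge when they are instantiated
+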
+It’s worth noting that the Python API is more flexible than the command line interface, as you can implement your own Judge and Examinee classes (inheriting from JudgeBase and ExamineeBase respectively) and pass them to the run_benchmark function.
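+A minimal sketch of that custom-class route (the import of ExamineeBase from the package root is an assumption; if it is not re-exported there, import it from benchmark.framework instead, and see the ExamineeBase documentation for the actual methods to override):
+
+from progressgym import run_benchmark, CoevolveJudge, ExamineeBase # if using PyPI package
+
+class MyExaminee(ExamineeBase):
+    # Override the relevant ExamineeBase hooks here to implement your own
+    # alignment algorithm; a bare subclass is shown only to illustrate the wiring.
+    pass
+
+results = run_benchmark(MyExaminee, CoevolveJudge) # pass the class object itself, not an instance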