diff --git a/README.md b/README.md
index 642af24a..6c5d147a 100644
--- a/README.md
+++ b/README.md
@@ -150,13 +150,18 @@ Here `run_1` to `run_n` correspond to the number of independent runs in a given
 >
 > For producing probability of improvement plots, it is important that any algorithm names in the dataset do not contain any commas.
 
-### Data Tooling
-[**Pull Neptune Data**](marl_eval/json_tools/pull_neptune_data.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.
+### JSON Data Tooling
 
-[**JSON Files Merging Script**](marl_eval/json_tools/merge_json_files.py): `concatenate_files` reads multiple json files from a specified local directory and concatenates their contents into a single structured dictionary, while ensuring uniqueness of seed numbers within the data. It handles nested json structures and saves the concatenated result into a new single json file for downstream aggregation and plotting.
+[**JSON Logger**](marl_eval/json_tools/json_logger.py): `JsonLogger` handles logging data according to the structured format detailed [above](#data-structure-for-raw-experiment-data-📒).
+
+[**Neptune Data Pulling Script**](marl_eval/json_tools/pull_neptune_data.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.
+
+[**JSON File Merging Script**](marl_eval/json_tools/merge_json_files.py): `concatenate_json_files` reads multiple JSON files from a specified local directory and concatenates their contents into a single structured JSON file.
 
 > 📌 Using `pull_neptune_data` followed by `concatenate_files` forms an effective workflow, where multiple JSON files from different experiment runs are first pulled from Neptune and then merged into a single file, ready for use in marl-eval.
 
+For more details on how to use the JSON tools, please see the [detailed usage guide]().
+
 ### Metrics to be normalised during data processing ⚗️
 
 Certain metrics, like episode returns, are required to be normalised during data processing. In order to achieve this it is required that users give these metric names, in the form of strings in a python list, to the `data_process_pipeline` function, the `create_matrices_for_rliable` function and all plotting functions as an argument. In the case where no normalisation is required this argument may be omitted.
diff --git a/docs/json_tooling_usage.md b/docs/json_tooling_usage.md
new file mode 100644
index 00000000..d1b6e0b0
--- /dev/null
+++ b/docs/json_tooling_usage.md
@@ -0,0 +1,94 @@
+# JSON tooling usage guide
+
+## JSON logger
+
+The JSON logger writes experiment data to JSON files in the format required for downstream aggregation and plotting with the MARL-eval tools. To initialise the logger, the following arguments are required:
+
+* `path`: the path where a file called `metrics.json`, containing all logged metrics for a given experiment, will be stored. Data will be stored in `/metrics.json` by default. If a JSON file already exists at a particular path, new experiment data will be appended to it. MARL-eval currently does **NOT SUPPORT** asynchronous logging, so if you intend to run distributed experiments, please create a unique `path` per experiment and concatenate all generated JSON files after all experiments have been run.
+* `algorithm_name`: the name of the algorithm being run in the current experiment.
+* `task_name`: the name of the task in the current experiment.
+* `environment_name`: the name of the environment in the current experiment.
+* `seed`: the integer value of the seed used for pseudo-randomness in the current experiment.
+
+An example of initialising the JSON logger could look something like:
+
+```python
+from marl_eval.json_tools import JsonLogger
+
+json_logger = JsonLogger(
+    path="experiment_results",
+    algorithm_name="IPPO",
+    task_name="2s3z",
+    environment_name="SMAX",
+    seed=42,
+)
+```
+
+To write data to the logger, the `write` method takes in the following arguments:
+
+* `timestep`: the current environment timestep at the time of evaluation.
+* `key`: the name of the metric to be logged.
+* `value`: the scalar value to be logged for the current metric.
+* `evaluation_step`: the number of evaluations that have been performed so far.
+* `is_absolute_metric`: a boolean flag indicating whether an absolute metric is being logged.
+
+Suppose the `4`th evaluation is being performed at environment timestep `40000` for the `episode_return` metric with a value of `12.9`. The `write` method could then be used as follows:
+
+```python
+json_logger.write(
+    timestep=40_000,
+    key="episode_return",
+    value=12.9,
+    evaluation_step=4,
+    is_absolute_metric=False,
+)
+```
+
+In the case where the absolute metric for the `win_rate` metric with a value of `85.3` is logged at the `200`th evaluation after `2_000_000` timesteps, the `write` method would be called as follows:
+
+```python
+json_logger.write(
+    timestep=2_000_000,
+    key="win_rate",
+    value=85.3,
+    evaluation_step=200,
+    is_absolute_metric=True,
+)
+```
+
+## Neptune data pulling script
+The `pull_neptune_data` script will download JSON data for multiple experiment runs from Neptune given a list of one or more Neptune experiment tags. The function accepts the following arguments:
+
+* `project_name`: the name of the Neptune project where data has been logged, given as `/`.
+* `tag`: a list of Neptune experiment tags for which JSON data should be downloaded.
+* `store_directory`: a local directory where downloaded JSON files should be stored.
+* `neptune_data_key`: a key in a particular Neptune run where JSON data has been stored. By default this will be `metrics`, implying that the JSON file will be stored as `metrics/.zip` in a given Neptune run. For an example of how data is uploaded, please see [here](https://github.com/instadeepai/Mava/blob/ce9a161a0b293549b2a34cd9a8d794ba7e0c9949/mava/utils/logger.py#L182).
+
+In order to download data, the tool can be used as follows:
+
+```python
+from marl_eval.json_tools import pull_neptune_data
+
+pull_neptune_data(
+    project_name="DemoWorkspace/demo_project",
+    tag=["experiment_1"],
+    store_directory="./neptune_json_data",
+)
+```
+
+## JSON file merging script
+The `concatenate_json_files` function will merge all JSON files found in a given directory into a single JSON file ready to be used for downstream aggregation and plotting with MARL-eval. The function accepts the following arguments:
+
+* `input_directory`: the path to the directory containing multiple JSON files. This directory can contain JSON files in arbitrarily nested directories.
+* `output_json_path`: the path where the merged JSON file should be stored.
+
+The function can be used as follows:
+
+```python
+from marl_eval.json_tools import concatenate_json_files
+
+concatenate_json_files(
+    input_directory="path/to/some/folder/",
+    output_json_path="path/to/merged_file/folder/",
+)
+```
diff --git a/marl_eval/json_tools/__init__.py b/marl_eval/json_tools/__init__.py
index 22d1c582..d8cfcd64 100644
--- a/marl_eval/json_tools/__init__.py
+++ b/marl_eval/json_tools/__init__.py
@@ -15,3 +15,5 @@
 """JSON tools for data preprocessing."""
 
 from .json_logger import JsonLogger
+from .merge_json_files import concatenate_json_files
+from .pull_neptune_data import pull_neptune_data
diff --git a/marl_eval/json_tools/json_logger.py b/marl_eval/json_tools/json_logger.py
index 571fc136..e56d63e3 100644
--- a/marl_eval/json_tools/json_logger.py
+++ b/marl_eval/json_tools/json_logger.py
@@ -83,7 +83,7 @@ def write(
         Args:
             timestep (int): the current environment timestep.
             key (str): the name of the metric to be logged.
-            value (str): the value of the metric to be logged.
+            value (float): the value of the metric to be logged.
             evaluation_step (int): the number of evaluations already run.
             is_absolute_metric (bool): whether the metric being logged is an
                 absolute metric.
diff --git a/marl_eval/json_tools/merge_json_files.py b/marl_eval/json_tools/merge_json_files.py
index f2dae389..336ef852 100644
--- a/marl_eval/json_tools/merge_json_files.py
+++ b/marl_eval/json_tools/merge_json_files.py
@@ -62,7 +62,7 @@ def _check_seed(concatenated_data: Dict, algo_data: Dict, seed_number: str) -> s
     return seed_number
 
 
-def concatenate_files(
+def concatenate_json_files(
     input_directory: str, output_json_path: str = "concatenated_json_files/"
 ) -> Dict:
     """Concatenate all json files in a directory and save the result in a json file."""
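
To illustrate the kind of merge the usage guide describes for `concatenate_json_files`, here is a minimal standalone sketch that uses only the Python standard library. It is not the marl-eval implementation: the helper names `merge_nested` and `merge_json_directory` are hypothetical, and raising on conflicting leaf values is an assumption made for this example. The library's own seed handling in `_check_seed` is not reproduced here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_nested(target: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `source` into `target` in place and return `target`.

    Hypothetical helper: nested dictionaries are merged key by key, and a
    ValueError is raised if the same key holds two different non-dict values.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dictionaries, so descend one level.
            merge_nested(target[key], value)
        elif key in target and target[key] != value:
            # Assumption for this sketch: conflicting values are an error.
            raise ValueError(f"Conflicting values for key {key!r}")
        else:
            target[key] = value
    return target


def merge_json_directory(input_directory: str) -> Dict[str, Any]:
    """Merge every *.json file found under `input_directory`, including subfolders."""
    merged: Dict[str, Any] = {}
    # rglob walks arbitrarily nested directories; sorting keeps the order stable.
    for json_path in sorted(Path(input_directory).rglob("*.json")):
        with open(json_path, "r") as json_file:
            merge_nested(merged, json.load(json_file))
    return merged
```

Calling `merge_json_directory("path/to/some/folder/")` on a folder that holds one JSON file per experiment would return a single dictionary in which runs logged under the same environment, task and algorithm sit side by side. This mirrors the "one unique `path` per experiment, then concatenate" workflow recommended for distributed runs.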