feat: update readme with json logger details
RuanJohn committed Feb 26, 2024
1 parent ce80357 commit 4567b75
Showing 5 changed files with 106 additions and 5 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -150,13 +150,18 @@ Here `run_1` to `run_n` correspond to the number of independent runs in a given
>
> For producing probability of improvement plots, it is important that any algorithm names in the dataset do not contain any commas.
### Data Tooling
[**Pull Neptune Data**](marl_eval/json_tools/pull_neptune_data.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.
### JSON Data Tooling

[**JSON Files Merging Script**](marl_eval/json_tools/merge_json_files.py): `concatenate_files` reads multiple json files from a specified local directory and concatenates their contents into a single structured dictionary, while ensuring uniqueness of seed numbers within the data. It handles nested json structures and saves the concatenated result into a new single json file for downstream aggregation and plotting.
[**JSON Logger**](marl_eval/json_tools/json_logger.py): `JsonLogger` handles logging data according to the structured format detailed [above](#data-structure-for-raw-experiment-data-📒).

[**Neptune Data Pulling Script**](marl_eval/json_tools/pull_neptune_data.py): `pull_neptune_data` connects to a Neptune project, retrieves experiment data from a given list of tags and downloads it to a local directory. This function is particularly useful when there is a need to pull data from multiple experiments that were logged separately on Neptune.

[**JSON File Merging Script**](marl_eval/json_tools/merge_json_files.py): `concatenate_json_files` reads multiple JSON files from a specified local directory and concatenates their contents into a single structured JSON file.

> 📌 Using `pull_neptune_data` followed by `concatenate_files` forms an effective workflow, where multiple JSON files from different experiment runs are first pulled from Neptune and then merged into a single file, ready for use in marl-eval.
For more details on how to use the JSON tools, please see the [detailed usage guide]().

### Metrics to be normalised during data processing ⚗️
Certain metrics, like episode returns, are required to be normalised during data processing. In order to achieve this it is required that users give these metric names, in the form of strings in a python list, to the `data_process_pipeline` function, the `create_matrices_for_rliable` function and all plotting functions as an argument. In the case where no normalisation is required this argument may be omitted.

94 changes: 94 additions & 0 deletions docs/json_tooling_usage.md
@@ -0,0 +1,94 @@
# JSON tooling usage guide

## JSON logger

The JSON logger will write experiment data to JSON files in the format required for downstream aggregation and plotting with the MARL-eval tools. To initialise the logger the following arguments are required:

* `path`: the directory in which a file called `metrics.json`, containing all logged metrics for a given experiment, will be stored. By default, data is written to `<path>/metrics.json`. If a JSON file already exists at that path, new experiment data will be appended to it. MARL-eval currently does **NOT SUPPORT** asynchronous logging, so if you intend to run distributed experiments, please create a unique `path` per experiment and concatenate all generated JSON files after all experiments have finished.
* `algorithm_name`: the name of the algorithm being run in the current experiment.
* `task_name`: the name of the task in the current experiment.
* `environment_name`: the name of the environment in the current experiment.
* `seed`: the integer value of the seed used for pseudo-randomness in the current experiment.

An example of initialising the JSON logger could look something like:

```python
from marl_eval.json_tools import JsonLogger

json_logger = JsonLogger(
    path="experiment_results",
    algorithm_name="IPPO",
    task_name="2s3z",
    environment_name="SMAX",
    seed=42,
)
```
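
Since asynchronous logging is not supported, distributed runs need one `path` per experiment. The sketch below shows one possible way to derive such paths. The helper `unique_experiment_path` and its directory naming scheme are assumptions made for illustration and are not part of MARL-eval.

```python
from pathlib import Path


def unique_experiment_path(
    base_dir: str, algorithm_name: str, task_name: str, seed: int
) -> str:
    """Build a per-experiment directory name (hypothetical helper, not in MARL-eval)."""
    # One directory per (algorithm, task, seed), so no two runs share a metrics.json.
    return (Path(base_dir) / f"{algorithm_name}_{task_name}_seed_{seed}").as_posix()


# Three seeds of the same algorithm and task get three distinct directories.
paths = [
    unique_experiment_path("experiment_results", "IPPO", "2s3z", seed)
    for seed in range(3)
]
```

Each resulting string could then be passed as the `path` argument when creating the logger for that run.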

To write data to the logger, the `write` method takes in the following arguments:

* `timestep`: the current environment timestep at the time of evaluation.
* `key`: the name of the metric to be logged.
* `value`: the scalar value to be logged for the current metric.
* `evaluation_step`: the number of evaluations that have been performed so far.
* `is_absolute_metric`: a boolean flag indicating whether an absolute metric is being logged.

Suppose the `4`th evaluation is being performed at environment timestep `40000` for the `episode_return` metric with a value of `12.9`. The `write` method could then be used as follows:

```python
json_logger.write(
    timestep=40_000,
    key="episode_return",
    value=12.9,
    evaluation_step=4,
    is_absolute_metric=False,
)
```

In the case where an absolute metric is logged, for example `win_rate` with a value of `85.3` at the `200`th evaluation after `2_000_000` timesteps, the `write` method would be called as follows:

```python
json_logger.write(
    timestep=2_000_000,
    key="win_rate",
    value=85.3,
    evaluation_step=200,
    is_absolute_metric=True,
)
```
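
The two kinds of call can be combined into a single evaluation loop. The sketch below assumes evaluations happen at a fixed timestep interval and that evaluation steps are counted from 1, as in the examples above. `RecordingLogger` is a hypothetical stand-in that only mirrors the `write` arguments listed earlier so that the example runs without MARL-eval installed, and `log_training_run` is likewise illustrative. How the absolute metric value is computed is left to the caller.

```python
from typing import List, Tuple


class RecordingLogger:
    """Hypothetical stand-in that mirrors the `write` arguments of the JSON logger."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, str, float, int, bool]] = []

    def write(
        self,
        timestep: int,
        key: str,
        value: float,
        evaluation_step: int,
        is_absolute_metric: bool,
    ) -> None:
        # Store the call in memory instead of writing to disk.
        self.records.append((timestep, key, value, evaluation_step, is_absolute_metric))


def log_training_run(
    logger: RecordingLogger,
    episode_returns: List[float],
    timesteps_per_eval: int,
    absolute_value: float,
) -> None:
    """Log one value per evaluation, then one value flagged as an absolute metric."""
    evaluation_step = 0
    for evaluation_step, episode_return in enumerate(episode_returns, start=1):
        logger.write(
            timestep=evaluation_step * timesteps_per_eval,
            key="episode_return",
            value=episode_return,
            evaluation_step=evaluation_step,
            is_absolute_metric=False,
        )
    # The absolute metric is logged once, at the final evaluation step.
    logger.write(
        timestep=evaluation_step * timesteps_per_eval,
        key="episode_return",
        value=absolute_value,
        evaluation_step=evaluation_step,
        is_absolute_metric=True,
    )


logger = RecordingLogger()
log_training_run(
    logger, [3.1, 7.4, 10.2, 12.9], timesteps_per_eval=10_000, absolute_value=12.9
)
```

With MARL-eval installed, the same function could presumably be called with a `JsonLogger` instance in place of `logger`, provided its `write` method accepts these keyword arguments as documented above.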

## Neptune data pulling script
The `pull_neptune_data` script will download JSON data for multiple experiment runs from Neptune given a list of one or more Neptune experiment tags. The function accepts the following arguments:

* `project_name`: the name of the Neptune project where data has been logged, given as `<workspace_name>/<project_name>`.
* `tag`: a list of Neptune experiment tags for which JSON data should be downloaded.
* `store_directory`: a local directory where downloaded JSON files should be stored.
* `neptune_data_key`: a key in a particular Neptune run where JSON data has been stored. By default this will be `metrics`, implying that the JSON file will be stored as `metrics/<metric_file_name>.zip` in a given Neptune run. For an example of how data is uploaded please see [here](https://github.com/instadeepai/Mava/blob/ce9a161a0b293549b2a34cd9a8d794ba7e0c9949/mava/utils/logger.py#L182).

In order to download data, the tool can be used as follows:

```python
from marl_eval.json_tools import pull_neptune_data

pull_neptune_data(
    project_name="DemoWorkspace/demo_project",
    tag=["experiment_1"],
    store_directory="./neptune_json_data",
)
```

## JSON file merging script
The `concatenate_json_files` function will merge all JSON files found in a given directory into a single JSON file ready to be used for downstream aggregation and plotting with MARL-eval. The function accepts the following arguments:

* `input_directory`: the path to the directory containing multiple JSON files. This directory can contain JSON files in arbitrarily nested directories.
* `output_json_path`: the path where the merged JSON file should be stored.

The function can be used as follows:

```python
from marl_eval.json_tools import concatenate_json_files

concatenate_json_files(
    input_directory="path/to/some/folder/",
    output_json_path="path/to/merged_file/folder/",
)
```
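
To make the idea of merging concrete, the sketch below recursively collects JSON files from nested directories and deep-merges their dictionaries. It is a simplified illustration and not the implementation of `concatenate_json_files`: the helpers `deep_merge` and `merge_json_tree` are hypothetical, and the sketch assumes that no two files define the same leaf key, so it does not deal with duplicate seeds.

```python
import json
from pathlib import Path
from typing import Any, Dict


def deep_merge(target: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `source` into `target` in place, recursing into nested dicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            # Assumption for this sketch: on a clash, the later file wins.
            target[key] = value
    return target


def merge_json_tree(input_directory: str) -> Dict[str, Any]:
    """Deep-merge every *.json file found anywhere under `input_directory`."""
    merged: Dict[str, Any] = {}
    # rglob walks arbitrarily nested sub-directories; sorting keeps the order stable.
    for json_file in sorted(Path(input_directory).rglob("*.json")):
        with open(json_file, "r") as f:
            deep_merge(merged, json.load(f))
    return merged
```

The result of `merge_json_tree` is a single nested dictionary, which could be written out with `json.dump` if a merged file is needed.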
2 changes: 2 additions & 0 deletions marl_eval/json_tools/__init__.py
@@ -15,3 +15,5 @@

"""JSON tools for data preprocessing."""
from .json_logger import JsonLogger
from .merge_json_files import concatenate_json_files
from .pull_neptune_data import pull_neptune_data
2 changes: 1 addition & 1 deletion marl_eval/json_tools/json_logger.py
@@ -83,7 +83,7 @@ def write(
Args:
timestep (int): the current environment timestep.
key (str): the name of the metric to be logged.
value (str): the value of the metric to be logged.
value (float): the value of the metric to be logged.
evaluation_step (int): the number of evaluations already run.
is_absolute_metric (bool): whether the metric being logged is
an absolute metric.
2 changes: 1 addition & 1 deletion marl_eval/json_tools/merge_json_files.py
@@ -62,7 +62,7 @@ def _check_seed(concatenated_data: Dict, algo_data: Dict, seed_number: str) -> s
return seed_number


def concatenate_files(
def concatenate_json_files(
input_directory: str, output_json_path: str = "concatenated_json_files/"
) -> Dict:
"""Concatenate all json files in a directory and save the result in a json file."""