add details about sample target #8

Open
wants to merge 1 commit into
base: main
8 changes: 5 additions & 3 deletions hub-config/tasks.json
@@ -132,12 +132,14 @@
 "sample": {
 "output_type_id_params": {
 "is_required": false,
-"type": "integer",
+"type": "character",
+"max_length": 15,
 "min_samples_per_task": 100,
-"max_samples_per_task": 100
+"max_samples_per_task": 100,

I think we could consider allowing more than this. Main concern would be disk space / file size.

+"compound_taskid_set": ["reference_date"]
 },
 "value": {
-"type": "integer",
+"type": "double",
 "minimum": 0
 }
 }
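
The constraints in this hunk can be read as row-level checks on a sample submission. The sketch below is a minimal illustration in Python, not the hubverse validator: the function name `check_sample_rows` and the dict-per-row layout are hypothetical, and it assumes that the number of samples per task is counted as the number of distinct `output_type_id` values within each combination of the `compound_taskid_set` columns.

```python
from collections import defaultdict


def check_sample_rows(rows, compound_taskid_set=("reference_date",),
                      min_samples=100, max_samples=100, max_length=15):
    """Return a list of problems found in sample rows (empty if none).

    Hypothetical helper: defaults mirror the values in the tasks.json hunk.
    """
    problems = []
    ids_per_task = defaultdict(set)
    for i, row in enumerate(rows):
        sample_id = row["output_type_id"]
        # "type": "character", "max_length": 15
        if not isinstance(sample_id, str) or len(sample_id) > max_length:
            problems.append(
                f"row {i}: output_type_id must be a string of at most "
                f"{max_length} characters"
            )
        # "value": {"type": "double", "minimum": 0}
        value = row["value"]
        if (isinstance(value, bool) or not isinstance(value, (int, float))
                or value < 0):
            problems.append(f"row {i}: value must be a number >= 0")
        # Assumption: samples are counted per compound task.
        key = tuple(row[k] for k in compound_taskid_set)
        ids_per_task[key].add(sample_id)
    for key, ids in ids_per_task.items():
        if not min_samples <= len(ids) <= max_samples:
            problems.append(
                f"task {key}: {len(ids)} distinct samples, expected "
                f"{min_samples} to {max_samples}"
            )
    return problems
```

Raising `max_samples_per_task`, as suggested in the review comment, would correspond to passing a larger `max_samples` here.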
16 changes: 13 additions & 3 deletions model-output/README.md
@@ -169,7 +169,7 @@ Values in the `output_type` column are either
 - "quantile" or
 - "samples".

-This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions.
+This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes dependencies across forecast `horizon` and `location`.

Suggested change
-This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes dependencies across forecast `horizon` and `location`.
+This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes predictive statistical dependency across forecast `horizon`s and/or `location`s.

We want to allow single location / multiple horizon joint samples and single horizon multiple location joint samples.

 ### `output_type_id`
 Values in the `output_type_id` column specify identifying information for the output type.
@@ -217,8 +217,18 @@ Teams must provide the following 23 quantiles:

 #### sample output

-When the predictions are samples, values in the `output_type_id` column are indexes for the samples.
-*More details to be added here*
+When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`, will share `output_type_id` for predictions for different horizons within a same `location` as below:

Suggested change
-When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`, will share `output_type_id` for predictions for different horizons within a same `location` as below:
+When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`s for a given `location`, will share `output_type_id` for predictions for different horizons within a same `location` as below:

+
+| origin_date|horizon| location | output_type| output_type_id | value |
+|:---------- |:-----:|:-----:| :-------- | :------------ | :---- |
+| 2024-10-15 | -1 | MA | sample | s0 | - |
+| 2024-10-15 | 0 | MA | sample | s0 | - |
+| 2024-10-15 | 1 | MA | sample | s0 | - |
+| 2024-10-15 | -1 | MA | sample | s1 | - |
+| 2024-10-15 | 0 | MA | sample | s1 | - |
+| 2024-10-15 | 1 | MA | sample | s1 | - |
+
+Here, `output_type_id = s0` specifies that the predictions for horizons -1, 0, and 1 are part of the same joint distribution. More details on sample output can be found in the [hubverse documentation of sample output type](https://hubverse.io/en/latest/user-guide/sample-output-type.html).

For extra clarity, give an example of a second location, so people know how to indicate that MA trajectories are not joint with, e.g., NH trajectories, but are joint across horizons for each location? And maybe also give an example of a submission of trajectories that are joint across both locations and horizons?
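
The two layouts asked about in this comment can be sketched in Python as follows. This is only an illustration of how shared and distinct `output_type_id` values would encode the dependence, under stated assumptions: the helper `sample_rows`, the `joint_across_locations` flag, and the `MA_s0`-style ID naming are all hypothetical and not part of the PR, and `value` is left as a placeholder.

```python
def sample_rows(origin_date, locations, horizons, n_samples,
                joint_across_locations=False):
    """Build placeholder sample rows (hypothetical helper).

    Rows that share an output_type_id belong to the same joint draw.
    """
    rows = []
    for loc in locations:
        for s in range(n_samples):
            if joint_across_locations:
                # Same ID in every location: joint across locations and horizons.
                sample_id = f"s{s}"
            else:
                # ID is unique to the location: joint across horizons only.
                sample_id = f"{loc}_s{s}"
            for h in horizons:
                rows.append({
                    "origin_date": origin_date,
                    "horizon": h,
                    "location": loc,
                    "output_type": "sample",
                    "output_type_id": sample_id,
                    "value": None,  # placeholder, as in the README table
                })
    return rows


# MA and NH trajectories are each joint across horizons, but not with each other.
per_location = sample_rows("2024-10-15", ["MA", "NH"], [-1, 0, 1], n_samples=2)

# Trajectories are joint across both locations and horizons.
fully_joint = sample_rows("2024-10-15", ["MA", "NH"], [-1, 0, 1], n_samples=2,
                          joint_across_locations=True)
```

In the first layout no `output_type_id` appears under more than one location; in the second, each ID spans every location and horizon.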


 ### `value`