add details about sample target #8

Open: wants to merge 1 commit into base: main

@sbidari (Collaborator) commented Oct 16, 2024

add more details re. submitting samples from a marginal/joint distribution with an example data table.

✅ Hub correctly configured!

2024-10-16 21:35:54 UTC

@dylanhmorris dylanhmorris left a comment

Looks very good, thanks @sbidari! A few tweaks/suggestions.

@@ -169,7 +169,7 @@ Values in the `output_type` column are either
- "quantile" or
- "samples".

This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions.
This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes dependencies across forecast `horizon` and `location`.

Suggested change
This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes dependencies across forecast `horizon` and `location`.
This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes predictive statistical dependency across forecast `horizon`s and/or `location`s.

We want to allow both single-location, multiple-horizon joint samples and single-horizon, multiple-location joint samples.
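
As a hedged sketch of what those two shapes mean in practice, the helper below counts how many distinct `location`s and `horizon`s each `output_type_id` spans. The function name `dependence_structure` is hypothetical (it is not part of any hubverse tooling), and rows are assumed to be plain dicts keyed by the column names used in this PR.

```python
from collections import defaultdict


def dependence_structure(rows):
    """Return {output_type_id: (n_locations, n_horizons)} for sample rows.

    Hypothetical helper for illustration only; each row is a dict with
    'location', 'horizon', and 'output_type_id' keys.
    """
    locations = defaultdict(set)
    horizons = defaultdict(set)
    for row in rows:
        sample_id = row["output_type_id"]
        locations[sample_id].add(row["location"])
        horizons[sample_id].add(row["horizon"])
    return {
        sample_id: (len(locations[sample_id]), len(horizons[sample_id]))
        for sample_id in locations
    }


example_rows = [
    # s0: one location, several horizons (joint across horizons)
    {"location": "MA", "horizon": 0, "output_type_id": "s0"},
    {"location": "MA", "horizon": 1, "output_type_id": "s0"},
    # s1: one horizon, several locations (joint across locations)
    {"location": "MA", "horizon": 0, "output_type_id": "s1"},
    {"location": "NH", "horizon": 0, "output_type_id": "s1"},
]

structure = dependence_structure(example_rows)
```

Here `s0` spans one location and two horizons, while `s1` spans two locations and one horizon. The two shapes are mixed in one list only to show both cases side by side; a real submission would presumably use one structure consistently.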

"min_samples_per_task": 100,
"max_samples_per_task": 100
"max_samples_per_task": 100,

I think we could consider allowing more than this. Main concern would be disk space / file size.
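
A back-of-the-envelope sketch of how the row count scales with the sample cap. The location and horizon counts below are placeholders chosen for illustration, not this hub's actual configuration, and `sample_rows` is a hypothetical helper.

```python
def sample_rows(n_samples, n_locations, n_horizons):
    """Rows needed when every (location, horizon) task gets n_samples samples."""
    return n_samples * n_locations * n_horizons


# Placeholder task dimensions (assumptions, not taken from the hub config).
N_LOCATIONS = 50
N_HORIZONS = 4

for n_samples in (100, 200, 500):
    rows = sample_rows(n_samples, N_LOCATIONS, N_HORIZONS)
    print(f"{n_samples} samples per task -> {rows} sample rows per submission")
```

The row count grows linearly in the cap, so doubling `max_samples_per_task` roughly doubles the sample portion of each submission file.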

@@ -217,8 +217,18 @@ Teams must provide the following 23 quantiles:

#### sample output

When the predictions are samples, values in the `output_type_id` column are indexes for the samples.
*More details to be added here*
When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`, will share `output_type_id` for predictions for different horizons within a same `location` as below:

Suggested change
When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`, will share `output_type_id` for predictions for different horizons within a same `location` as below:
When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`s for a given `location` will share an `output_type_id` across the different horizons within that `location`, as below:

| 2024-10-15 | 0 | MA | sample | s1 | - |
| 2024-10-15 | 1 | MA | sample | s1 | - |

Here, `output_type_id = s0` specifies that the predictions for horizons -1, 0, and 1 are part of the same joint distribution. More details on sample output can be found in the [hubverse documentation of sample output type](https://hubverse.io/en/latest/user-guide/sample-output-type.html).

For extra clarity, could we give an example with a second location, so people know how to indicate that MA trajectories are not joint with, e.g., NH trajectories but are joint across horizons within each location? Maybe also give an example of a submission whose trajectories are joint across both locations and horizons.
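
One possible shape for those two examples, sketched in Python. The column names are an assumed reading of the table excerpt in this diff, the `-` value is the same placeholder the excerpt uses, and both builder functions are hypothetical.

```python
from itertools import product


def make_row(reference_date, horizon, location, sample_id):
    # Column names are an assumed reading of the table in this PR.
    return {
        "reference_date": reference_date,
        "horizon": horizon,
        "location": location,
        "output_type": "sample",
        "output_type_id": sample_id,
        "value": "-",  # placeholder, as in the table excerpt
    }


def joint_across_horizons(reference_date, locations, horizons, n_samples):
    """Each trajectory is joint across horizons but belongs to one location.

    Sample ids are never reused across locations, so MA and NH
    trajectories are not linked to each other.
    """
    rows = []
    next_id = 0
    for location in locations:
        for _ in range(n_samples):
            sample_id = f"s{next_id}"
            next_id += 1
            for horizon in horizons:
                rows.append(make_row(reference_date, horizon, location, sample_id))
    return rows


def joint_across_horizons_and_locations(reference_date, locations, horizons, n_samples):
    """Each sample id is shared by every (location, horizon) pair."""
    rows = []
    for i in range(n_samples):
        sample_id = f"s{i}"
        for location, horizon in product(locations, horizons):
            rows.append(make_row(reference_date, horizon, location, sample_id))
    return rows


per_location = joint_across_horizons("2024-10-15", ["MA", "NH"], [-1, 0, 1], n_samples=2)
fully_joint = joint_across_horizons_and_locations("2024-10-15", ["MA", "NH"], [-1, 0, 1], n_samples=2)
```

In `per_location`, MA and NH use disjoint sets of `output_type_id` values; in `fully_joint`, the same ids appear under both locations, which is what signals dependence across locations as well as horizons.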

2 participants