Enhance Series-Analysis to compute the BRIERCL statistic from the PSTD line type #2003

Closed
21 tasks
j-opatz opened this issue Jan 6, 2022 · 2 comments · Fixed by #2034

Comments

@j-opatz
Contributor

j-opatz commented Jan 6, 2022

Describe the New Feature

While working with CPC on NMME data and Probability of Exceedance (POE) calculations, we have repeatedly seen cases where a score (e.g., Brier) lines up correctly but the corresponding skill score (e.g., BSS) does not. In these situations, access to a 2D field of the reference value, BRIERCL, would be very useful for diagnosing where the discrepancies between CPC and MET output come from.

This was discussed with Tara, and both parties agreed that while it would be a useful diagnosis tool, it may not necessarily belong in the full release (i.e., this may be a feature in beta only). More input will be needed from the C++ engineering members.

Acceptance Testing

NMME data from CPC can be used. Located on Kiowa.

METplus GenEnsProd config file: /d1/projects//CPC_data/scripts/GenEnsProd-SA_NMME.conf
Python scripts to ingest files: /d1/projects/CPC_data/scripts/forecast_read-in_CFSv2.py and /d1/projects//CPC_data/scripts/preprocessFun.py
Ensemble input file: /d1/projects/CPC_data/input/NMME/new_data/raw_fcst/.fcst.nc
GenEnsProd output files: /d1/projects/CPC_data/output/NMME_out/GenEnsProd-SA/.nc

METplus SeriesAnalysis config file: /d1/projects/CPC_data/scripts/SA_testing_for_CFSv2_GeorgeFix.conf
Climo mean field input file: /d1/projects/CPC_data/input/NMME/new_data/ghcn_cams.1x1.1982-2010.mon.clim.nc
Climo Stddev field input file: /d1/projects/CPC_data/input/NMME/new_data/ghcn_cams.1x1.1982-2010.mon.stddev.nc
Obs field input file: /d1/projects/CPC_data/input/NMME/new_data/ghcn_cams.1x1.1982-2020.mon.nc

Where to request this output is open for suggestion: because the workflow is GenEnsProd > SeriesAnalysis, it would be ideal if the PSTD line type could provide support for a BRIERCL column request in series-analysis output. I can also see a path forward in grid-stat, using an nc_pairs_flag option.

Time Estimate

Will require engineer input
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2702691

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@j-opatz j-opatz added type: new feature Make it do something new alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle labels Jan 6, 2022
@j-opatz j-opatz added this to the MET 10.1.0 milestone Jan 6, 2022
@j-opatz j-opatz self-assigned this Jan 6, 2022
@j-opatz j-opatz changed the title Add netCDF output option for BRIERCL Add field output support for BRIERCL Jan 6, 2022
@j-opatz j-opatz changed the title Add field output support for BRIERCL Add field output support for BRIERCL column in PSTD Jan 6, 2022
@j-opatz j-opatz removed this from the MET 10.1.0 milestone Jan 6, 2022
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.1.0 milestone Jan 6, 2022
@JohnHalleyGotway
Collaborator

JohnHalleyGotway commented Jan 6, 2022

@j-opatz regarding Series-Analysis, I took a look starting at this line of code. And I see that "BRIERCL" is NOT included as one of the output stat types. However on this line of code, I see that Series-Analysis is storing the climo information, assuming it has been provided.

So adding it to the Series-Analysis output should be relatively straightforward... assuming there isn't some logical issue that arises in the computation of that statistic over the time series of data. But I don't anticipate that.

I don't think it can be added to the NetCDF matched pairs output from Grid-Stat. BRIERCL is just a statistic derived from an Nx2 contingency table, just like the BRIER score is. So it's computed by aggregating over "something". In Grid-Stat, it's a spatial aggregation over multiple grid points. In Series-Analysis, it's (typically) a temporal aggregation, computed separately for each grid point. That being said... I suppose it's possible that there's some climo-related field that already is (or could be) computed during the processing logic... and that could be added to the NetCDF output. But I'd need you to clarify exactly what that is.
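
To make the aggregation point concrete, here is a minimal Python sketch of a per-grid-point temporal aggregation. It is not MET code: the function names and the nested-list layout are hypothetical, and it uses the textbook mean-squared-error form of the Brier score rather than the binned Nx2 table that MET works from. It assumes the climatological event probability at each time and grid point is already available.

```python
from typing import Dict, List, Sequence


def brier(probs: Sequence[float], obs: Sequence[int]) -> float:
    """Mean squared difference between probabilities and binary outcomes."""
    if len(probs) != len(obs) or len(probs) == 0:
        raise ValueError("need equal-length, non-empty series")
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)


def series_brier_fields(
    fcst: List[List[float]],   # forecast probabilities, indexed [time][point]
    climo: List[List[float]],  # climatological probabilities, [time][point]
    obs: List[List[int]],      # observed event (1) or non-event (0), [time][point]
) -> List[Dict[str, float]]:
    """Aggregate over time separately for each grid point."""
    n_points = len(obs[0])
    fields = []
    for j in range(n_points):
        f = [step[j] for step in fcst]
        c = [step[j] for step in climo]
        o = [step[j] for step in obs]
        bs = brier(f, o)       # forecast Brier score
        bs_cl = brier(c, o)    # reference (climatology) Brier score
        # The skill score is undefined when the reference score is zero.
        bss = 1.0 - bs / bs_cl if bs_cl > 0 else float("nan")
        fields.append({"BRIER": bs, "BRIERCL": bs_cl, "BSS": bss})
    return fields
```

With a 2D field of the reference score available next to the forecast score, a mismatch in the skill score can be traced to either the numerator or the denominator at each grid point.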

@j-opatz
Contributor Author

j-opatz commented Jan 7, 2022

Thanks for looking into this @JohnHalleyGotway, and it seems like you've grasped the situation perfectly.

An output from series-analysis would be sufficient, and I understand that a similar output from grid-stat isn't doable, given the aggregation need. As this is using a gen-ens-prod --> series-analysis route, it's a good outcome that BRIERCL be accessible via series-analysis.

@JohnHalleyGotway JohnHalleyGotway self-assigned this Jan 20, 2022
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Jan 20, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Add field output support for BRIERCL column in PSTD Enhance Series-Analysis to compute the BRIERCL statistics from the PSTD line type Jan 20, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance Series-Analysis to compute the BRIERCL statistics from the PSTD line type Enhance Series-Analysis to compute the BRIERCL statistic from the PSTD line type Jan 20, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 20, 2022
…imo mean and standard deviation fields found. Also add the following to the list of PSTD stats: BRIERCL, BRIERCL_NCL, BRIERCL_NCU, and BSS_SMPL.
@TaraJensen TaraJensen removed the alert: NEED ACCOUNT KEY Need to assign an account key to this issue label Jan 20, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 25, 2022
…ml to test running Series-Analysis with probability data and climo.
JohnHalleyGotway added a commit that referenced this issue Jan 25, 2022
…at we need it for the derivation of climo probs.
JohnHalleyGotway added a commit that referenced this issue Jan 25, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 28, 2022
… output. But I have 2 concerns... doing a deep copy of cdf_info for each grid point seems like a lot of wasted space. Consider changing PairBase::cdf_info into an unallocated pointer instead. Also, deriving the probability by sampling some number of times from the climo distribution seems unnecessarily computationally expensive. We do this to mimic existing NOAA/EMC logic. But it sure does seem like computing the inverse of the CDF would be much simpler.
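
The trade-off raised in that commit message can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the MET or NOAA/EMC implementation: it assumes the climatology at a grid point is a normal distribution defined by the climo mean and standard deviation, and both function names and the `n_bins` parameter are hypothetical. One function samples the distribution at the centres of equal-probability bins and counts exceedances; the other evaluates the distribution function at the threshold directly.

```python
from statistics import NormalDist


def climo_prob_sampled(mean: float, stdev: float, thresh: float,
                       n_bins: int = 10) -> float:
    """Approximate P(X > thresh) by taking one value from the centre of each
    of n_bins equal-probability bins and counting how many exceed thresh."""
    dist = NormalDist(mean, stdev)
    samples = [dist.inv_cdf((i + 0.5) / n_bins) for i in range(n_bins)]
    return sum(value > thresh for value in samples) / n_bins


def climo_prob_direct(mean: float, stdev: float, thresh: float) -> float:
    """P(X > thresh) for X ~ Normal(mean, stdev), taken straight from the CDF."""
    return 1.0 - NormalDist(mean, stdev).cdf(thresh)
```

In this sketch the sampled estimate can only take values that are multiples of `1 / n_bins`, while the direct calculation costs a single CDF evaluation per grid point.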
JohnHalleyGotway added a commit that referenced this issue Jan 29, 2022
…h a PairBase::climo_cdf_ptr pointer to one. This is needed to avoid creating separate ClimoCDFInfo objects for each grid point in Series-Analysis... since we have a PairDataPoint object for each. Using pointers should consume much less memory.
JohnHalleyGotway added a commit that referenced this issue Jan 29, 2022
…Analysis application code to handle the ClimoCDFInfo pointer.
JohnHalleyGotway added a commit that referenced this issue Jan 29, 2022
… config files to indicate the logic I'm adding to the tool. If block_size <= 0, automatically set it to the full dimension of the verification grid.
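
The block_size rule described in that commit message can be sketched in a few lines of Python. This is a hypothetical illustration rather than the Series-Analysis source: it assumes "the full dimension of the verification grid" means the total number of grid points (`nx * ny`), and the helper names are made up for the example.

```python
import math


def resolve_block_size(block_size: int, nx: int, ny: int) -> int:
    """Return the number of grid points to process per pass.

    A non-positive block_size is treated as a request to process the
    whole grid at once (assumed here to mean nx * ny points).
    """
    if nx <= 0 or ny <= 0:
        raise ValueError("grid dimensions must be positive")
    return nx * ny if block_size <= 0 else block_size


def passes_needed(n_points: int, block_size: int) -> int:
    """Number of passes through the data for a given block size."""
    return math.ceil(n_points / block_size)
```

For a 360 x 181 grid, `resolve_block_size(0, 360, 181)` gives 65160 points, so the series is read in a single pass, whereas a block size of 1024 would need 64 passes under these assumptions.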
JohnHalleyGotway added a commit that referenced this issue Jan 29, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 29, 2022
… config files, including the direct_prob boolean option. Still need to actually add the code for the latter.
JohnHalleyGotway added a commit that referenced this issue Jan 31, 2022
…nalysis, need to store the aggregation object in the map BEFORE storing the pointer to the CDF thresh array. The opposite doesn't work and caused stat_analysis in unit_climatology_1.0deg to fail.
JohnHalleyGotway added a commit that referenced this issue Jan 31, 2022
…'t allocate PairDataPoint objects until the grid is actually defined.
JohnHalleyGotway added a commit that referenced this issue Jan 31, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 31, 2022
JohnHalleyGotway added a commit that referenced this issue Jan 31, 2022
… it to 0. That way it'll automatically resize to the dimension of the grid. Doing this makes it run about 20 seconds faster, which offsets the additional run of Series-Analysis.
JohnHalleyGotway added a commit that referenced this issue Jan 31, 2022
…from Series-Analysis config files since it has no impact on the output. That only applies to Grid-Stat and Point-Stat. But add entries for climo_cdf.direct_prob to all Point-Stat and Grid-Stat config files since those tools do derive climo probabilities. Do not add it to Ensemble-Stat config files since Ensemble-Stat does not derive climo probs.
@JohnHalleyGotway JohnHalleyGotway linked a pull request Jan 31, 2022 that will close this issue
14 tasks
JohnHalleyGotway added a commit that referenced this issue Feb 1, 2022
…Tweaking a PB2NC log message to clarify that the reported time range is the input observation timestamps.
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Feb 1, 2022