Data quality flags - planning #55

Open
bpbond opened this issue Nov 13, 2023 · 7 comments

Comments

@bpbond
Member

bpbond commented Nov 13, 2023

We want a flag database

General

  • L1 -> Flags applied to data -> Filtering -> Computation -> L2
  • L2 filters out data and assigns a quality column
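The L2 step above could be sketched roughly as follows. This is only an illustration in Python: the `apply_flags` name, the `ID` and `quality` keys, the `"Good"` label, and the rule that the worst severity wins are all assumptions, and the Drop/Warning/Note levels are taken from the Severity field proposed in this comment.

```python
# Hypothetical ranking: a higher number means a more severe flag.
SEVERITY_RANK = {"Note": 1, "Warning": 2, "Drop": 3}


def apply_flags(observations, flags):
    """Drop observations flagged 'Drop'; label the rest with a quality value."""
    # Find the worst severity recorded for each observation ID.
    worst = {}
    for flag in flags:
        oid = flag["observation_id"]
        current = SEVERITY_RANK.get(worst.get(oid), 0)
        if SEVERITY_RANK[flag["severity"]] > current:
            worst[oid] = flag["severity"]

    l2 = []
    for obs in observations:
        severity = worst.get(obs["ID"])
        if severity == "Drop":
            continue  # filtered out of L2
        # Unflagged observations get the assumed label "Good".
        l2.append({**obs, "quality": severity or "Good"})
    return l2


observations = [
    {"ID": "a", "value": 1.0},
    {"ID": "b", "value": 2.0},
    {"ID": "c", "value": 3.0},
]
flags = [
    {"observation_id": "a", "severity": "Drop"},
    {"observation_id": "b", "severity": "Note"},
    {"observation_id": "b", "severity": "Warning"},
]
l2_rows = apply_flags(observations, flags)
```

Here observation `a` is removed, `b` carries its worst severity, and `c` is unflagged.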

The "add a flag" API would take something like:

  • Timestamp: not strictly necessary given the ID, but useful for quick separation by e.g. year/month
  • Observation ID: necessary
  • Severity: Drop/Warning/Note
  • Flag type: Out_of_bounds/Timestep_outlier/Expert_opinion/...
  • Author: human or algorithmic
  • Remark: remark text
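As a rough sketch of what such a call might look like, here is a minimal version using Python's built-in `sqlite3` module. The table name `flags`, the column names, and the `create_flag_table` and `add_flag` functions are hypothetical, chosen only to mirror the fields listed above.

```python
import sqlite3

SEVERITIES = ("Drop", "Warning", "Note")


def create_flag_table(con):
    """Create the (hypothetical) flags table if it does not exist yet."""
    con.execute(
        """
        CREATE TABLE IF NOT EXISTS flags (
            timestamp      TEXT,           -- optional; handy for year/month queries
            observation_id TEXT NOT NULL,  -- necessary
            severity       TEXT NOT NULL
                CHECK (severity IN ('Drop', 'Warning', 'Note')),
            flag_type      TEXT NOT NULL,  -- e.g. Out_of_bounds
            author         TEXT NOT NULL,  -- human or algorithmic
            remark         TEXT
        )
        """
    )


def add_flag(con, observation_id, severity, flag_type, author,
             remark="", timestamp=None):
    """Insert one flag record, validating severity before touching the database."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    con.execute(
        "INSERT INTO flags (timestamp, observation_id, severity, "
        "flag_type, author, remark) VALUES (?, ?, ?, ?, ?, ?)",
        (timestamp, observation_id, severity, flag_type, author, remark),
    )
    con.commit()


con = sqlite3.connect(":memory:")
create_flag_table(con)
add_flag(con, "obs_0001", "Drop", "Out_of_bounds", "L1.qmd",
         remark="value below sensor range",
         timestamp="2023-11-13 12:00:00")
rows = con.execute(
    "SELECT observation_id, severity, flag_type FROM flags"
).fetchall()
```

Storing the timestamp as ISO-style text keeps year/month filtering a simple prefix match, which is one possible reason to keep the column even though the ID alone identifies the observation.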

Other things we will want

  • Get flags for a year/month
  • Get flags for a timestamp/ID
  • Get flags for an ID
  • Clear flag(s)
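These lookups map fairly directly onto SQL. A minimal sketch with Python's `sqlite3` follows; the `flags` table layout and the function names `flags_for_month`, `flags_for_id`, and `clear_flags` are assumptions made for illustration, not an agreed design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table layout mirroring the flag fields discussed in this issue.
con.execute(
    "CREATE TABLE flags (timestamp TEXT, observation_id TEXT, severity TEXT, "
    "flag_type TEXT, author TEXT, remark TEXT)"
)
con.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2023-11-13 12:00:00", "obs_0001", "Drop", "Out_of_bounds", "algorithm", ""),
        ("2023-11-13 12:15:00", "obs_0002", "Warning", "Timestep_outlier", "algorithm", ""),
        ("2023-12-01 00:00:00", "obs_0003", "Note", "Expert_opinion", "human", "checked in field"),
    ],
)
con.commit()


def flags_for_month(con, year, month):
    """Get flags for a year/month by matching the 'YYYY-MM' timestamp prefix."""
    prefix = f"{year:04d}-{month:02d}"
    return con.execute(
        "SELECT * FROM flags WHERE substr(timestamp, 1, 7) = ? ORDER BY timestamp",
        (prefix,),
    ).fetchall()


def flags_for_id(con, observation_id, timestamp=None):
    """Get flags for an ID, optionally narrowed to one timestamp."""
    sql = "SELECT * FROM flags WHERE observation_id = ?"
    params = [observation_id]
    if timestamp is not None:
        sql += " AND timestamp = ?"
        params.append(timestamp)
    return con.execute(sql, params).fetchall()


def clear_flags(con, observation_id, flag_type=None):
    """Delete flags for an ID (optionally one flag type); return rows removed."""
    sql = "DELETE FROM flags WHERE observation_id = ?"
    params = [observation_id]
    if flag_type is not None:
        sql += " AND flag_type = ?"
        params.append(flag_type)
    cur = con.execute(sql, params)
    con.commit()
    return cur.rowcount


november = flags_for_month(con, 2023, 11)
removed = clear_flags(con, "obs_0001")
```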

Implementation

Flags come from

  • OOB column -> converted
  • Outlier or other statistical analysis
  • Relationship or other analysis
  • Human QAQC (e.g., Shiny app)
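For the first source, converting an OOB column into flag records might look something like the sketch below. The column names `ID`, `TIMESTAMP`, and `OOB`, the `oob_to_flags` name, the default author string, and the choice of `Drop` as the severity are all assumptions for illustration.

```python
def oob_to_flags(rows, author="L1_OOB_check", severity="Drop"):
    """Turn rows whose OOB column is 1 into flag records (one per row)."""
    flags = []
    for row in rows:
        if row["OOB"] == 1:
            flags.append(
                {
                    "timestamp": row["TIMESTAMP"],
                    "observation_id": row["ID"],
                    "severity": severity,  # assumed default; not decided in this thread
                    "flag_type": "Out_of_bounds",
                    "author": author,
                    "remark": "converted from OOB column",
                }
            )
    return flags


l1_rows = [
    {"ID": "obs_0001", "TIMESTAMP": "2023-11-13 12:00:00", "value": -5.0, "OOB": 1},
    {"ID": "obs_0002", "TIMESTAMP": "2023-11-13 12:15:00", "value": 21.3, "OOB": 0},
]
oob_flags = oob_to_flags(l1_rows)
```

The other sources (statistical analysis, relationship checks, human QAQC) could emit records of the same shape with a different `flag_type` and `author`.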

Flags get written out as CSVs with

  • L1_flag intermediate data product?
  • L2 data?

What about "hierarchical flags"? Discuss this with @roylrich @selinalcheng @stephpenn1

@bpbond
Member Author

bpbond commented Nov 13, 2023

I have played with RSQLite and it seems straightforward. Much better than writing our own code!

@bpbond
Member Author

bpbond commented Nov 17, 2023

How do flags get removed from the database? Does L2 remove them? Do they ever get removed?

@bpbond
Member Author

bpbond commented Nov 24, 2023

Basically: is the flags database persistent between driver() invocations?

  • If no, everything is simplified: we expect that L1.qmd will create or overwrite the database (using the OOB column); L2_algorithmic.qmd and the Human_QAQC Shiny app (neither of which exists yet) will add to it; and L2 will read it.
  • If yes, L1 adds to whatever database is already there. L2 can remove from it, though (right?).
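To make the two options concrete, here is a small sketch using Python's `sqlite3`. The `open_flag_db` and `run_l1` helpers, the file name, and the table layout are hypothetical; the point is only the difference between overwriting and appending.

```python
import os
import sqlite3
import tempfile


def open_flag_db(path, persistent):
    """Open the flags database; start from scratch unless it is persistent."""
    if not persistent and os.path.exists(path):
        os.remove(path)  # non-persistent: each driver run overwrites the database
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS flags "
        "(observation_id TEXT, severity TEXT, flag_type TEXT)"
    )
    return con


def run_l1(path, persistent):
    """Stand-in for one L1 run: write one OOB flag and report the row count."""
    con = open_flag_db(path, persistent)
    con.execute(
        "INSERT INTO flags VALUES ('obs_0001', 'Drop', 'Out_of_bounds')"
    )
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM flags").fetchone()[0]
    con.close()
    return count


with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, "flags.sqlite")
    fresh_counts = [run_l1(db_path, persistent=False) for _ in range(2)]
    kept_counts = [run_l1(db_path, persistent=True) for _ in range(2)]
```

In the non-persistent case the count stays at one per run. In the persistent case the same OOB flag accumulates on every run, so a persistent design would probably need either a uniqueness rule or an explicit removal step.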

@bpbond
Member Author

bpbond commented Nov 28, 2023

NCAS data https://sites.google.com/ncas.ac.uk/ncasobservations/home/data-project/ncas-data-standards/ncas-amof/data-quality-flags

As the name suggests, data quality flags are used to let the user know the quality of a particular data variable or factors that impact the quality of a variable. In this standard we use an integer value in the range 0 to n:
0 is reserved for future use and is not used.
1 is always good data.
The values of n, what they represent, and how data with that flag value should be interpreted are incorporated into files by means of a variable that is structured as follows.
A file containing just one data quality flag will contain the variable qc_flag.
Where a file contains more than one data quality flag variable, the data quality flag name is structured as: qc_flag_<name>
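A minimal sketch of how data following this convention might be consumed is below. The function names `qc_flag_name` and `good_values` are hypothetical, and treating a 0 flag as an error is an assumption based only on the statement that 0 is reserved and not used.

```python
def qc_flag_name(name=None):
    """Return the flag variable name: qc_flag, or qc_flag_<name> if several exist."""
    return "qc_flag" if name is None else f"qc_flag_{name}"


def good_values(values, qc_flag):
    """Keep only values whose flag is 1 (always good data in this convention)."""
    if len(values) != len(qc_flag):
        raise ValueError("values and qc_flag must have the same length")
    if any(flag == 0 for flag in qc_flag):
        # Assumption: 0 is reserved, so seeing it suggests a malformed file.
        raise ValueError("qc_flag value 0 is reserved and should not appear")
    return [v for v, flag in zip(values, qc_flag) if flag == 1]


temperatures = [12.1, 12.4, 99.9, 12.6]
flags = [1, 1, 2, 1]
kept = good_values(temperatures, flags)
```

What values 2 to n mean is defined per file, so this sketch deliberately does not interpret them.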

@bpbond
Member Author

bpbond commented Nov 28, 2023

Ameriflux https://ameriflux.lbl.gov/data/flux-data-products/data-qaqc/physical-range-module/

Pastorello, G., et al. (2020), The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data, Scientific Data, 7(1), 225, DOI:10.1038/s41597-020-0534-3

Chu, H., Christianson, D. S., Cheah, Y.-W., Pastorello, G., O’Brien, F., Geden, J., Ngo, S.-T., Hollowgrass, R., Leibowitz, K., Beekwilder, N. F., Sandesh, M., Dengel, S., Chan, S. W., Santos, A., Delwiche, K., Yi, K., Buechner, C., Baldocchi, D., Papale, D., Keenan, T. F., Biraud, S. C., Agarwal, D. A., and Torn, M. S.: AmeriFlux BASE data pipeline to support network growth and data sharing, Sci Data, 10, 614, 2023.

@bpbond
Member Author

bpbond commented Nov 28, 2023

UGA LTER https://gce-lter.marsci.uga.edu/gce_toolbox/wiki/QAQC.htm

QA/QC flag codes should be documented in the metadata (i.e. 'Data' category, 'Codes' field) using the following format: "Q = questionable value, I = invalid value, M = missing", etc. This ensures that the flag codes are properly displayed in standard and XML metadata, and also allows column values codes to be automatically generated when flags are optionally converted to encoded integer columns during ASCII or MATLAB export operations or manually in the structure editor. A GUI flag definition editor is provided with the GCE Data Toolbox, which can be opened using the 'View/Edit Q/C Flag Definitions' option on the 'Edit > Q/C Flag Functions' menu.

Suggested flag codes are listed below:

   I = invalid value (out of range) -- use for out-of-range/impossible values (e.g. negative mass)
   Q = questionable value -- use for values outside of expected range (e.g. below detection limit,
       well outside of historical value range, pattern indicating data contamination)
   E = estimated value -- use for values that were estimated by interpolation or other means
   S = spike/noise -- use for sharp discontinuities/spikes indicating data contamination
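The metadata format quoted above ("Q = questionable value, I = invalid value, M = missing") is simple enough to parse into a lookup table. A rough Python sketch is below; the `parse_flag_codes` name is hypothetical and this is not how the GCE Data Toolbox itself does it.

```python
def parse_flag_codes(text):
    """Parse 'Q = questionable value, I = invalid value' into a dict."""
    codes = {}
    for part in text.split(","):
        code, sep, meaning = part.partition("=")
        if not sep:
            continue  # skip fragments without an '=' sign
        codes[code.strip()] = meaning.strip()
    return codes


definitions = parse_flag_codes(
    "Q = questionable value, I = invalid value, M = missing"
)
```

Because the split is on commas, a meaning that itself contains a comma would be truncated; a real parser would need a stricter delimiter rule.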

@roylrich

> What about "hierarchical flags"? Discuss this with @roylrich @selinalcheng @stephpenn1
YES, but I am not sure how?
