Validate inputs #85

JMGilbert · 2023-06-15T16:43:22Z

So far the input validation checks:

That there are two RCPs and that they have labels rcp45 and rcp85 (assertion)
That there are 15 batches with labels batch0 - batch14 (assertion)
That the chunksize for batch is 15 (assertion)
That all of the chunksizes are exactly as we want them (warning)

In general, I have tried to design the validate_damages function to be pretty flexible so it should be easy to change what checks we run on the input damages.

codecov · 2023-06-15T19:33:14Z

Codecov Report

Merging #85 (172a518) into dscim-v0.4.0_fixes (970c623) will increase coverage by 0.56%.
The diff coverage is 100.00%.

@@                  Coverage Diff                   @@
##           dscim-v0.4.0_fixes      #85      +/-   ##
======================================================
+ Coverage               68.15%   68.72%   +0.56%     
======================================================
  Files                      17       17              
  Lines                    1878     1912      +34     
======================================================
+ Hits                     1280     1314      +34     
  Misses                    598      598

Impacted Files	Coverage Δ
src/dscim/preprocessing/input_damages.py	`89.57% <100.00%> (+1.21%)`	⬆️
src/dscim/preprocessing/preprocessing.py	`72.18% <100.00%> (+0.37%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

JMGilbert · 2023-06-20T18:39:32Z

This validation script requires very strict input damages, but we may want to generalize in the future. The package itself requires very particular dimensions for each object, so we may want the config to include the dimensions which would allow more flexible input files.

brews

Hey y'all. I gave this a quick look and I have some comments here and in-line.

Having validation throw assertions is very common; unfortunately, it's possible to turn off assertions entirely. It also makes it hard for users to catch validation errors, specifically if assertions are getting thrown from different bodies of code in this pkg. Consider throwing a custom exception or something else instead. I'd consider what happens or what failures you'd see if this ran without throwing assertions and how that changes behavior. This might be something to consider in another PR later.
Checking for chunking might be too rigid. Is there a really good reason for this? Some of these checks feel pretty rigid, generally. Just double-check that you actually need to validate this stuff. It's pretty common to dance around with how tight or loose checks are. I'd say you want to ensure the data has what you need, but you also want to avoid blocking people from doing good work in the future because you were overzealous with checks.

brews · 2023-07-06T18:53:39Z

CHANGELOG.md

+## Unreleased
+### Added
+- Function to validate input damages. ([PR #85](https://github.com/ClimateImpactLab/dscim/pull/83), [@JMGilbert](https://github.com/JMGilbert))
+


Why add this in a new "unreleased" section here and not in the one below?

brews · 2023-07-06T18:54:32Z

src/dscim/preprocessing/input_damages.py

@@ -863,3 +874,50 @@ def coastal_inputs(
            consolidated=True,
            mode="w",
        )
+
+
+def validate_damages(sector, path):


Docstr? What are the inputs params and what is being checked? Expected behavior?

Be clear about what you're actually checking for. Don't be shy with inline comments, too because people will usually be back to read this kind of checking code in the future. (You did a pretty good job with comments here so 👍 )

brews · 2023-07-06T18:55:41Z

src/dscim/preprocessing/input_damages.py

+
+
+def validate_damages(sector, path):
+    inputs = xr.open_zarr(path)


Curious why have IO here instead of reading in a zarr store? Would this make it so you don't need to monkey patch it for tests?

brews · 2023-07-06T19:01:46Z

src/dscim/preprocessing/input_damages.py

+    chunks_expected = dict(zip(dims, chunks))
+
+    assert dims_expected == dict(inputs.dims)
+    for i in list(inputs.keys()):


Might be able to just have this as inputs.keys() without casting it to a list?

brews · 2023-07-06T19:05:53Z

tests/test_input_damages.py

+    monkeypatch.setattr(
+        "dscim.preprocessing.input_damages.validate_damages", lambda *args: True
+    )
+


(also applies to all monkeypatching below)

Why does this need to be monkey patched out?

Might be better if the validation function takes a dataset or dataarray as input instead of doing its own IO?

Patching like this in tests might be a symptom of a design problem.

brews · 2023-07-06T19:06:50Z

tests/test_input_damages.py

+
+
+@pytest.mark.parametrize("sector", ["mortality", "coastal"])
+def test_validate_damages_correct(tmp_path, sector):


Need docstr. What behavior is this testing for?

brews · 2023-07-06T19:08:21Z

tests/test_input_damages.py

+    validate_damages(sector, path)  # No assertion error should be raised
+
+
+def test_validate_damages_incorrect_batches(tmp_path):


Need docstr. What's this testing. I can kinda get a feel for it from the title but might want to say more specifically what you're looking for.

These docstrs also can print when tests fail so its good to have even if they seem obvious.

brews · 2023-07-06T19:09:23Z

tests/test_input_damages.py

+    )
+
+
+def test_validate_damages_incorrect_rcps(tmp_path):


Need docstr. Harder to grok what this is testing for exactly.

brews · 2023-07-06T19:09:38Z

tests/test_input_damages.py

+
+
+@pytest.mark.parametrize("sector", ["mortality", "coastal"])
+def test_validate_damages_incorrect_chunk_sizes(tmp_path, sector):


Needs docstr.

brews · 2023-07-06T19:09:56Z

tests/test_input_damages.py

+
+
+@pytest.mark.parametrize("sector", ["mortality", "coastal"])
+def test_validate_damages_incorrect_region_chunk_sizes(tmp_path, sector):


Neeeeds doooocstrr.

brews · 2023-07-06T19:27:30Z

One more thing! Consider having an option to turn off hard, run-blocking validation in functions where you've hardcoded it. This is another thing that could help with testing or if the validation starts getting in the way for silly reasons.

Input damages validation + tests

b4d35c8

JMGilbert changed the base branch from main to dscim-v0.4.0_fixes June 15, 2023 16:43

JMGilbert added 6 commits June 15, 2023 11:48

black

3400f51

Add missing import

84e3cdb

Attempt to monkeypatch validation function

5cd68f2

Fix monkeypatch

b33343b

Fix flake8

3bbdfd1

Fix monkeypatch

172a518

Base automatically changed from dscim-v0.4.0_fixes to dscim-v0.4.0 July 6, 2023 17:53

kemccusker requested review from brews and kemccusker July 6, 2023 18:04

brews reviewed Jul 6, 2023

View reviewed changes

Base automatically changed from dscim-v0.4.0 to main July 6, 2023 19:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate inputs #85

Validate inputs #85

JMGilbert commented Jun 15, 2023

codecov bot commented Jun 15, 2023

JMGilbert commented Jun 20, 2023

brews left a comment

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews Jul 6, 2023

brews commented Jul 6, 2023



		def validate_damages(sector, path):
		inputs = xr.open_zarr(path)



		@pytest.mark.parametrize("sector", ["mortality", "coastal"])
		def test_validate_damages_correct(tmp_path, sector):

		validate_damages(sector, path) # No assertion error should be raised


		def test_validate_damages_incorrect_batches(tmp_path):



		@pytest.mark.parametrize("sector", ["mortality", "coastal"])
		def test_validate_damages_incorrect_chunk_sizes(tmp_path, sector):



		@pytest.mark.parametrize("sector", ["mortality", "coastal"])
		def test_validate_damages_incorrect_region_chunk_sizes(tmp_path, sector):

Validate inputs #85

Are you sure you want to change the base?

Validate inputs #85

Conversation

JMGilbert commented Jun 15, 2023

codecov bot commented Jun 15, 2023

Codecov Report

JMGilbert commented Jun 20, 2023

brews left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brews commented Jul 6, 2023