Add pandas index checks #200

skrawcz · 2022-09-22T19:24:38Z

Adds some basic index type checking to df creation

Related to issue #191, this commit is to help surface index type issues.

Specifically:

1. Warn if there are index type mismatches.
2. Log the details at debug level, so you need to set your logger to debug if you want to see them.
3. Provide a "ResultBuilder" class that uses strict index type matching, so if you want to error on index type mismatches, this is the result builder to use.
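As a rough illustration of the three behaviors above, here is a minimal sketch. It is not the code in this PR: `group_by_index_type` and `check_index_types` are hypothetical names, and the way index types are labeled (class name plus dtype) is an assumption made for the example.

```python
import logging
from collections import defaultdict
from typing import Any, Dict, List

import pandas as pd

logger = logging.getLogger(__name__)


def group_by_index_type(outputs: Dict[str, Any]) -> Dict[str, List[str]]:
    """Maps an index-type label to the names of the outputs that carry it."""
    groups = defaultdict(list)
    for name, value in outputs.items():
        if isinstance(value, (pd.Series, pd.DataFrame)):
            # Assumption: class name plus dtype is enough to tell index types apart.
            label = f"{type(value.index).__name__}:{value.index.dtype}"
        else:
            label = "no-index"  # scalars, lists, etc. have no pandas index
        groups[label].append(name)
    return dict(groups)


def check_index_types(outputs: Dict[str, Any], strict: bool = False) -> bool:
    """Returns True if all indexed outputs share one index type.

    On a mismatch: warns (details only at DEBUG), or raises if strict=True.
    """
    groups = group_by_index_type(outputs)
    indexed = {label: names for label, names in groups.items() if label != "no-index"}
    if len(indexed) <= 1:
        return True
    message = f"Found {len(indexed)} different index types among the outputs."
    if strict:
        raise ValueError(f"{message} Details: {indexed}")
    logger.warning("%s Set the logger to DEBUG for details.", message)
    logger.debug("Index types by output: %s", indexed)
    return False


# Example: a default RangeIndex next to a DatetimeIndex is a mismatch.
ints = pd.Series([1, 2, 3])
dated = pd.Series([1, 2, 3], index=pd.date_range("2022-01-01", periods=3))
matches = check_index_types({"ints": ints, "dated": dated})
```

With `strict=False` the mismatch only produces a warning and the function returns `False`; with `strict=True` the same inputs raise a `ValueError`.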

I don't think we should build anything more custom unless there's a clear
common use case. User-contributed result builders sound like an interesting idea here, though.

To use the new result builder, the code should look like the following:

from hamilton import base, driver
strict_builder = base.StrictIndexTypePandasDataFrameResult()
adapter = base.SimplePythonGraphAdapter(strict_builder)
...
dr = driver.Driver(config, *modules, adapter=adapter)
df = dr.execute(...)  # this will now error if index types mismatch.
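To see the extra details behind a mismatch warning, the relevant logger has to be at debug level before `dr.execute(...)` runs. A minimal sketch using the standard `logging` module follows; the logger name `"hamilton.base"` is an assumption based on the usual `logging.getLogger(__name__)` convention and is not confirmed by this PR.

```python
import logging

# Send log records to stderr; INFO is the default threshold for everything else.
logging.basicConfig(level=logging.INFO)

# Assumption: the index checks log through a module-level logger named
# "hamilton.base". Lower only that logger to DEBUG to see the details.
logging.getLogger("hamilton.base").setLevel(logging.DEBUG)
```

Lowering just the one logger keeps debug output from other libraries out of the way.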

Changes

One cleanup commit around imports.
One commit with the index type changes.
One commit to handle Python 3.6 support and the pandas index classes changing.

Testing

Unit tests.
Checked things in the REPL locally against the examples in Better error handling for errors in the execute() stage #191.

Notes

Checklist

Testing checklist

Python - local testing

python 3.6
python 3.7
python 3.8
python 3.9

Importing `typing` and using that as a prefix was getting unsightly. So moved to importing the types explicitly.


Pandas dropped support for Python 3.6 in 1.2, so pandas 1.1.5 is what we're using for Python 3.6 in our CI system, and that version does not have the `NDArrayBackedExtensionIndex` type. I'm guessing here, but looking at the 1.1.5 source, we want `ExtensionIndex` instead.
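One way to handle this kind of version difference is an import fallback. The sketch below is an illustration, not the code from this commit: the alias `ExtensionIndexBase` is hypothetical, and `pandas.core.indexes.extension` is a private pandas module whose contents can change between releases.

```python
import pandas as pd

try:
    # Present in newer pandas releases (1.2 and later).
    from pandas.core.indexes.extension import (
        NDArrayBackedExtensionIndex as ExtensionIndexBase,
    )
except ImportError:
    # Fallback for pandas 1.1.x, the last line that supports Python 3.6.
    from pandas.core.indexes.extension import ExtensionIndex as ExtensionIndexBase

# Datetime-like indexes derive from this base class; a plain RangeIndex does not.
is_datetime_extension = issubclass(pd.DatetimeIndex, ExtensionIndexBase)
is_range_extension = issubclass(pd.RangeIndex, ExtensionIndexBase)
```

Because both names live in a private module, a check like this is worth covering with a unit test on every pandas version in the CI matrix.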

elijahbenizzy

Overall looks good -- curious on some decisions though around time-series

hamilton/base.py

TIL: you can create a dataframe and pass in an Index object, and it'll happily use it as a column. So this test should exist for dataframe creation, since it's a valid case. For the index type checking, I'm adding it here even though it does not have an explicit index, so one could argue it doesn't qualify. But I'd rather push people to be explicit in their code: if they want to be strict on indexes, they should make the index a series rather than passing an Index object.
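A small sketch of the behavior described in this comment, using only public pandas calls (the column names are made up for the example):

```python
import pandas as pd

dates = pd.date_range("2022-01-01", periods=3, freq="D")

# Passing an Index object as a value: pandas stores it as a regular column
# and the frame gets a default RangeIndex.
as_column = pd.DataFrame({"dates": dates, "values": [1.0, 2.0, 3.0]})
column_names = list(as_column.columns)
column_frame_index_type = type(as_column.index).__name__

# Being explicit: attach the dates as the index of a Series, so the
# resulting frame is actually indexed by them.
explicit = pd.Series([1.0, 2.0, 3.0], index=dates, name="values")
as_index = pd.DataFrame({"values": explicit})
explicit_frame_index_type = type(as_index.index).__name__
```

In the first frame the dates are just data, so an index type check only sees the default `RangeIndex`; in the second frame the `DatetimeIndex` is what the check compares.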

So that way it's clearer what's going on and why. I decided to use private functions to the static ones because I don't really want them used outside of that function.

elijahbenizzy

Yeah, this is all good except the time-series stuff is confusing me.

hamilton/base.py

skrawcz added 2 commits September 22, 2022 12:26

Refactors base typing imports

678656b


skrawcz force-pushed the add_pandas_index_checks branch from 41c65d8 to 7305ad0 on September 22, 2022 19:26

skrawcz force-pushed the add_pandas_index_checks branch from 7305ad0 to 23eb685 on September 22, 2022 19:28

skrawcz requested a review from elijahbenizzy September 22, 2022 19:31

elijahbenizzy reviewed Sep 22, 2022


skrawcz added 2 commits September 23, 2022 14:19

Refactors pandas_index_types to be more legible

d7bc3f5


elijahbenizzy reviewed Sep 26, 2022


elijahbenizzy self-requested a review October 14, 2022 20:32

elijahbenizzy approved these changes Oct 14, 2022


skrawcz merged commit 5a9de3f into main Oct 14, 2022

skrawcz deleted the add_pandas_index_checks branch October 14, 2022 20:33
