Add profiler option for column level invalid values #704
@@ -2512,6 +2522,11 @@ def tqdm(level: Set[int]) -> Generator[int, None, None]:
min_true_samples = self._profile[prof_idx]._min_true_samples
try:
null_values = self._profile[prof_idx]._null_values
This is a huge add and mostly LGTM. One big issue here, though: because a dict is mutable, this will change self._null_values with the update. If we instead copy it to a variable first, that would alleviate the issue. Great job though!
Just need to fix in the locations where we update
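To illustrate the aliasing concern raised above, here is a minimal, self-contained sketch. `ColumnProfile` is a hypothetical stand-in for a column profile object, not the project's actual class, and the dict contents are made up for the example.

```python
# Hypothetical stand-in for a column profile that stores its null values in a dict.
class ColumnProfile:
    def __init__(self, null_values):
        self._null_values = null_values


# Without a copy: the local name refers to the same dict object,
# so update() also changes the profile's stored null values.
profile = ColumnProfile({"": 0, "nan": 0})
aliased = profile._null_values
aliased.update({"1": 0})
leaked = "1" in profile._null_values

# With a copy: update() only touches the local dict,
# and the profile's stored null values stay as they were.
profile = ColumnProfile({"": 0, "nan": 0})
copied = profile._null_values.copy()
copied.update({"1": 0})
preserved = "1" not in profile._null_values
```

A shallow `.copy()` is enough in this sketch because the values are plain integers; nested mutable values would still be shared.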
added .copy()
Head branch was pushed to by a user without write access
@@ -100,10 +103,13 @@ def __init__(
}
if options:
if options.null_values is not None:
self._null_values = options.null_values
self._null_values = options.null_values.copy()
added copy
@@ -2594,7 +2615,11 @@ def tqdm(level: Set[int]) -> Generator[int, None, None]:
prof_idx = col_idx_to_prof_idx[col_idx]
if min_true_samples is None:
min_true_samples = self._profile[prof_idx]._min_true_samples
null_values = self._profile[prof_idx]._null_values
null_values = self._profile[prof_idx]._null_values.copy()
here
@@ -2576,7 +2591,13 @@ def tqdm(level: Set[int]) -> Generator[int, None, None]:
prof_idx = col_idx_to_prof_idx[col_idx]
if min_true_samples is None:
min_true_samples = self._profile[prof_idx]._min_true_samples
null_values = self._profile[prof_idx]._null_values
null_values = self._profile[prof_idx]._null_values.copy()
here
@@ -2536,7 +2546,12 @@ def tqdm(level: Set[int]) -> Generator[int, None, None]:
if min_true_samples is None:
min_true_samples = self._profile[prof_idx]._min_true_samples
try:
null_values = self._profile[prof_idx]._null_values
null_values: Dict = self._profile[prof_idx]._null_values.copy()
here
This PR adds the ability to set column-level null values in addition to the global ones. Here is an example of how to use this:
In addition to the global null value 9999999, column 0 has the null value 1 and column 1 has the null value 3.
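As a rough sketch of the behavior described here (not the PR's actual code), the per-column null values can be thought of as the global dict merged with that column's overrides. The function name `effective_null_values`, the dict shapes, and the `0` flag values are assumptions made for illustration only.

```python
from typing import Dict


def effective_null_values(
    global_nulls: Dict[str, int],
    column_nulls: Dict[int, Dict[str, int]],
    col_idx: int,
) -> Dict[str, int]:
    """Return the global null values plus any defined for the given column."""
    # Copy first so the shared global dict is never mutated by the update.
    merged = global_nulls.copy()
    merged.update(column_nulls.get(col_idx, {}))
    return merged


# Values taken from the description: global 9999999, column 0 -> 1, column 1 -> 3.
global_nulls = {"9999999": 0}
column_nulls = {0: {"1": 0}, 1: {"3": 0}}

col0 = effective_null_values(global_nulls, column_nulls, 0)
col1 = effective_null_values(global_nulls, column_nulls, 1)
col2 = effective_null_values(global_nulls, column_nulls, 2)  # no column-level entry
```

Columns without a column-level entry fall back to the global null values alone, and `global_nulls` itself is left unchanged after every call.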