[MinLength/MaxLength] Apply filtered row behavior at the row level evaluation #547

Merged · 2 commits · Mar 10, 2024

rdsharma26 (Contributor) commented Mar 10, 2024

Description of changes:

  • For certain scenarios, the filtered-row behavior for MinLength and MaxLength was not working correctly.
  • For example, consider a single check that uses both the minLength and maxLength constraints, each with == <value> as its assertion. The row-level outcome for the filtered rows was false. This happened because, for filtered rows, we replaced the value with MaxValue for Min and with MinValue for Max, and the assertion was still evaluated on that replacement value. With an == assertion, neither replacement value equals the expected length, and no single number could equal both, so the filtered rows failed the check.
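```scala
import com.amazon.deequ.analyzers.{AnalyzerOptions, FilteredRowOutcome, NullBehavior}
import com.amazon.deequ.checks.{Check, CheckLevel}

val analyzerOptions = AnalyzerOptions(
  nullBehavior = NullBehavior.EmptyString,
  filteredRow = FilteredRowOutcome.TRUE
)

// Both constraints assert that Company has exactly 8 characters, evaluated only on rows with ID > 2.
val check = new Check(CheckLevel.Error, "test-check")
  .hasMinLength("Company", _ == 8, analyzerOptions = Some(analyzerOptions)).where("ID > 2")
  .hasMaxLength("Company", _ == 8, analyzerOptions = Some(analyzerOptions)).where("ID > 2")
```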


```text
+---+----------------+-------+-----+-----------+----------+
|ID |Company         |ZipCode|State|City       |test-check|
+---+----------------+-------+-----+-----------+----------+
|1  |Acme            |90210  |CA   |Los Angeles|false     |   <-- Incorrect outcome for filtered row
|2  |Acme            |90211  |CA   |Los Angeles|false     |   <-- Incorrect outcome for filtered row
|3  |Robocorp        |NULL   |NJ   |NULL       |true      |
|4  |Robocorp        |NULL   |NY   |New York   |true      |
+---+----------------+-------+-----+-----------+----------+
```


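  • Updated the row-level assertion logic for MinLength/MaxLength to match what was already done for Min/Max: the filtered-row outcome is now applied at the row-level evaluation instead of through a replacement value.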
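To make the failure mode concrete, here is a minimal, Spark-free Scala sketch of the row-level evaluation for the example check. It is only a model under stated assumptions: the names (`Row`, `oldMinLengthOutcome`, `newLengthOutcome`, `whereIdAboveTwo`, etc.) are hypothetical and are not Deequ's internal functions, the `where("ID > 2")` filter is modelled as a plain predicate, and `FilteredRowOutcome.TRUE` is modelled as the Boolean `true`. It replays the four rows from the example table.

```scala
// Hypothetical, Spark-free model of row-level MinLength/MaxLength evaluation.
// None of these names are Deequ APIs; they only mirror the behavior described in this PR.
object FilteredRowSketch {

  // One input row: the ID that the filter looks at and the Company value.
  final case class Row(id: Int, company: String)

  // Old behavior: a filtered-out row is replaced by a sentinel and the assertion
  // is still evaluated on it (MaxValue for MinLength, MinValue for MaxLength).
  def oldMinLengthOutcome(row: Row, keep: Row => Boolean, assertion: Double => Boolean): Boolean =
    assertion(if (keep(row)) row.company.length.toDouble else Double.MaxValue)

  def oldMaxLengthOutcome(row: Row, keep: Row => Boolean, assertion: Double => Boolean): Boolean =
    assertion(if (keep(row)) row.company.length.toDouble else Double.MinValue)

  // New behavior: a filtered-out row never reaches the assertion; it takes the
  // configured filtered-row outcome directly (true models FilteredRowOutcome.TRUE).
  def newLengthOutcome(row: Row, keep: Row => Boolean, assertion: Double => Boolean,
                       filteredOutcome: Boolean = true): Boolean =
    if (keep(row)) assertion(row.company.length.toDouble) else filteredOutcome

  // Rows from the example table; the predicate mirrors .where("ID > 2").
  val rows: Seq[Row] = Seq(Row(1, "Acme"), Row(2, "Acme"), Row(3, "Robocorp"), Row(4, "Robocorp"))
  val whereIdAboveTwo: Row => Boolean = _.id > 2
  val equalsEight: Double => Boolean = _ == 8

  // A row passes the check only if both the MinLength and MaxLength constraints pass.
  def oldCheck(row: Row): Boolean =
    oldMinLengthOutcome(row, whereIdAboveTwo, equalsEight) &&
      oldMaxLengthOutcome(row, whereIdAboveTwo, equalsEight)

  def newCheck(row: Row): Boolean =
    newLengthOutcome(row, whereIdAboveTwo, equalsEight) &&
      newLengthOutcome(row, whereIdAboveTwo, equalsEight)

  def main(args: Array[String]): Unit =
    rows.foreach(r => println(s"ID ${r.id}: old=${oldCheck(r)} new=${newCheck(r)}"))
}
```

Running `main` prints `old=false` for IDs 1 and 2 (the incorrect outcome shown in the table) and `new=true` for all four rows. The design point is that the sentinel trick only works for assertions that happen to accept the sentinel (for example `>=` on Min), whereas deciding the outcome for filtered rows before the assertion runs is independent of the assertion.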

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Review comment on the following change:

```scala
conditionSelectionGivenColumn(colLengths, Option(isNullCheck), replaceWith = 0.0).cast(DoubleType)
case _ =>
colLengths
val colLength = length(col(column)).cast(DoubleType)
```

A contributor commented: nice, this is much cleaner

eycho-am (Contributor) left a comment:

LGTM - added one small comment for documenting this new behavior

Follow-up commit:
- Added comments to the "aggregationFunctions" method in the Min, Max, MinLength and MaxLength analyzers.
- Refactored the criterion method to reuse an existing variable.
rdsharma26 (Contributor, Author) commented:

Thanks for the review @eycho-am. I've addressed the comments in the latest commit.

eycho-am merged commit 798901e into awslabs:master on Mar 10, 2024 (1 check passed)
eycho-am pushed a commit that referenced this pull request Mar 10, 2024
rdsharma26 deleted the minmax-length-row-level branch on March 10, 2024 22:00
eycho-am pushed a commit that referenced this pull request Apr 3, 2024
rdsharma26 added a commit that referenced this pull request Apr 16, 2024
rdsharma26 added a commit that referenced this pull request Apr 16, 2024
rdsharma26 added a commit that referenced this pull request Apr 16, 2024
rdsharma26 added a commit that referenced this pull request Apr 17, 2024
rdsharma26 added a commit that referenced this pull request Apr 17, 2024
arsenalgunnershubert777 pushed a commit to arsenalgunnershubert777/deequ that referenced this pull request Nov 8, 2024