Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add failure entity for incomplete events #886

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

spenes
Copy link
Contributor

@spenes spenes commented Mar 27, 2024

Failure entities are started to be attached to incomplete events as derived contexts.

There are some cases that we need more information than in the bad rows to create failure entities therefore it isn't possible to create failure entities from bad rows directly. For this purpose, we created wrapper classes to attach extra information about failure entities.

Copy link
Contributor

@benjben benjben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the current design could be simplified.

We have:

  • BadRow
  • FailureEntity
  • BadRowWithFailureEntities that contains both
  • SchemaViolationWithExtraContext that contains part of a BadRow and part of a FailureEntity

This is due to the fact the information contained in a BadRow isn't always enough to build a FailureEntity.

But the information from the FailureEntitys is enough to reconstruct a bad row. So it's possible to do the other way around: to use FailureEntitys as a basis and to build a BadRow when we need it. This removes the need for other classes/wrappers.

What do you think of this alternative approach @spenes ?

@spenes
Copy link
Contributor Author

spenes commented Mar 27, 2024

Thanks for the review @benjben!

I see a few problems with the approach you suggested. First, I think it is hard to reconstruct BadRow from FailureEntitys. As you know, there can be multiple FailureEntity for one BadRow. We need some kind of mechanism to map FailureEntitys to their BadRow. Other problem I see with that approach is that we would be starting to redefine BadRows as FailureEntity in this case just to add a few extra bits. I am not sure it is good idea to duplicate BadRow definitions in the Enrich codebase. Of course, other solution might be add those extra bit of information to original BadRowdefinitions but I am not sure we want to go that route at this stage since that would make things more complicated.

I agree wrappers doesn't look pretty but this is kinda necessary workaround for this feature.

SchemaViolationWithExtraContext contains only a few extra bits additional to SchemaViolation to create FailureEntity. This will be exactly what we will do even if define completely new FailureEntity with structure almost same with SchemaViolation itself.

I agree with your comment about BadRowWithFailureEntities but don't we need both BadRow and List[FailureEntity] in the end ? We can postpone their creation with creating intermediate format that can be converted to both but we need to create both eventually.

@benjben
Copy link
Contributor

benjben commented Mar 28, 2024

As you know, there can be multiple FailureEntity for one BadRow. We need some kind of mechanism to map FailureEntitys to their BadRow

There are 3 steps that can produce a bad row:

  1. Validating the input
  2. Enriching
  3. Validating the output

Each of these steps would now return a List[FailureEntity] instead of a BadRow.
We know that all the FailureEntitys for a step are for the same bad row, so we just need to pattern match on the first one and we'll know the type of bad row to use.
The left part of the IorT would then be NonEmptyList[List[FailureEntity]].

With this approach it is all about the mapping function from FailureEntity to FailureDetails. We get rid of all the wrappers.

Other problem I see with that approach is that we would be starting to redefine BadRows as FailureEntity

No, we'd leave BadRow intact. What do you think that we'd need to change it ?

@spenes spenes force-pushed the incomplete_events_failure_entity branch 2 times, most recently from 38e0a96 to 36fc8fd Compare March 29, 2024 09:47
Copy link
Contributor

@benjben benjben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great @spenes ! And great work on the testing 👍

I think that we can still simplify the design a bit by getting rid of IntermediateBadRow and by using IorT.fold.

@spenes spenes force-pushed the incomplete_events_failure_entity branch 2 times, most recently from f229016 to 247542f Compare April 3, 2024 11:39
Copy link
Contributor

@benjben benjben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great @spenes ! There is one last important change needed and we're good ! 🥳

@spenes spenes requested a review from benjben April 4, 2024 22:54
Copy link
Contributor

@benjben benjben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @spenes !

Comment on lines +172 to +175
val derivedContexts = enrichmentResult.leftMap { ll =>
ll.flatten.toList
.map(_.toSDJ(timestamp, processor))
}.merge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice !

Remove this after failure entity schema is merged
With this commit, failure entities are started to be attached to incomplete
events as derived contexts.

There are some cases that we need more information than in the bad rows to create
failure entities therefore it isn't possible to create failure entities from bad rows directly.
For this purpose, we created wrapper classes to attach extra information about failure entities.
@spenes spenes force-pushed the incomplete_events_failure_entity branch from 23b1237 to c4ab89b Compare April 5, 2024 09:35
@spenes spenes merged commit c4ab89b into incomplete_events Apr 5, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants