Avoid needless data loss during schema mapping failures #110
Comments
Agreed on the better logging. The reason we let the exception propagate all the way up to Thread.run() is intentional: we force a new instance of the processor to be created, because whatever state the existing instance had built up could be the cause of the exception. This is similar to how actor systems deal with failure. The fact that you lose the entire micro-batch is a bit of a tradeoff; this just happens to be the granularity of processing. In essence, a mapping that throws exceptions is a misconfigured system, and as such it might lose some events.
Take a look at #111; I think it avoids losing the micro-batch (though perhaps there is a case where the entire batch needs to be discarded). Sure, it is possible that there is some state that is messed up, but more than likely that isn't the case.
Actually, I'm struggling to imagine what state in the thread would get cleaned up by discarding it, and I'm drawing a blank.
This is a conceptual thing. We don't know what state could cause the exception now and we don't know what future changes might incur additional ones. We guard against all of that by just cleaning up and starting fresh. The real remedy is: if mapping throws exceptions, fix that, not the mechanism that catches it. An exception is an exceptional condition; it shouldn't occur frequently enough to warrant handling that takes performance into account. |
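The restart-on-failure strategy described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the concept (the `Processor` interface, `SupervisedWorker`, and the factory are illustrative names, not Divolte's actual classes): on any exception, the current processor instance is discarded along with whatever state it accumulated, and a fresh one is created, much like an actor supervisor restarting a failed actor. Note that in this sketch the failing event itself is lost, which is exactly the tradeoff being discussed.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class SupervisedWorker {
    // Illustrative stand-in for the per-thread item processor.
    interface Processor {
        void process(String item);
    }

    static int restarts = 0;

    static void run(Iterator<String> items, Supplier<Processor> factory) {
        Processor processor = factory.get();
        while (items.hasNext()) {
            String item = items.next();
            try {
                processor.process(item);
            } catch (RuntimeException e) {
                // Discard the possibly-corrupt instance and start fresh.
                // The failing item is dropped: the "lose some events" tradeoff.
                restarts++;
                processor = factory.get();
            }
        }
    }

    public static void main(String[] args) {
        run(List.of("a", "boom", "b").iterator(), () -> item -> {
            if (item.equals("boom")) throw new IllegalStateException("mapping error");
            System.out.println("processed " + item);
        });
        System.out.println("restarts: " + restarts); // restarts: 1
    }
}
```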
As an aside: it's better to start these kinds of discussions off on the mailing list and see from there whether an issue needs to be logged. It helps keep this space clean.
Divolte should minimize data loss due to schema mapping errors, and should provide better logging when they occur. Right now it appears that a failure while mapping a single field of a single event results in the loss not only of that event, but of all other events in the same micro-batch. It also results in the exception unwinding the stack all the way up to Thread.run(), despite this comment in ItemProcessor:
At the very least, the rest of the batch should be preserved. Ideally, failures for one field shouldn't prevent other fields being populated, and the unstructured raw data should be passed in to a field that allows the case to be later analyzed and potentially even corrected.
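The per-event isolation being asked for could look roughly like this. This is a hypothetical sketch, not code from #111 or from Divolte itself (`BatchMapper`, `processBatch`, and the string-based "events" are all illustrative): the batch is processed item by item, a mapping failure only discards that one event, and the failed event's raw payload is preserved on the side for later analysis or correction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchMapper {
    // Events that mapped successfully, and raw payloads that did not.
    public final List<String> mapped = new ArrayList<>();
    public final List<String> failedRaw = new ArrayList<>();

    public void processBatch(List<String> rawEvents, Function<String, String> mapper) {
        for (String raw : rawEvents) {
            try {
                mapped.add(mapper.apply(raw));
            } catch (RuntimeException e) {
                // Keep the raw event instead of dropping the whole micro-batch,
                // so the failure can be analyzed (and possibly replayed) later.
                failedRaw.add(raw);
            }
        }
    }

    public static void main(String[] args) {
        BatchMapper m = new BatchMapper();
        m.processBatch(List.of("ok1", "bad", "ok2"), raw -> {
            if (raw.startsWith("bad")) throw new IllegalArgumentException("mapping failed: " + raw);
            return raw.toUpperCase();
        });
        System.out.println(m.mapped);    // [OK1, OK2]
        System.out.println(m.failedRaw); // [bad]
    }
}
```

Extending the same idea one level down (a try/catch per field rather than per event) would let partially mapped records through as well, at the cost of deciding what a record with missing fields means downstream.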
Here's a sample of what I'm seeing in production: