Optimize object creation for new Delta snapshot #326
Merged
Fixes #325
The current code in the DeltaClient creates unnecessary objects when it computes the file diff that identifies new and removed files. It first converts every Delta Action in the current Delta log snapshot to OneDataFiles, computes the diff on those OneDataFiles, and then converts the resulting OneDataFiles collection back to Delta Action objects for writing. This is a round trip from Delta Action to OneDataFiles and back. For large tables with thousands of Actions in a snapshot, it allocates a large number of objects unnecessarily.
This change skips the conversion of Delta Actions from the previous snapshot into OneDataFiles and back into Delta Actions. The optimization does not change the behavior of the translation.
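For reviewers who want a picture of the idea, here is a minimal sketch of a diff that keeps the previous snapshot's actions as they are and only creates new action objects for files that were actually added. It is not the code in this PR: `SnapshotDiffSketch`, `Action`, `DataFile`, and `Diff` are hypothetical stand-ins for the real classes, and it assumes each file path appears at most once per snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical stand-ins, not the project's real classes.
public class SnapshotDiffSketch {

  /** Stand-in for a Delta action that references a data file by path. */
  static final class Action {
    final String path;

    Action(String path) {
      this.path = path;
    }
  }

  /** Stand-in for the format-neutral file representation. */
  static final class DataFile {
    final String path;

    DataFile(String path) {
      this.path = path;
    }
  }

  /** Result of the diff: actions for added files and for removed files. */
  static final class Diff {
    final List<Action> added = new ArrayList<>();
    final List<Action> removed = new ArrayList<>();
  }

  static Diff diff(List<Action> previousSnapshot, List<DataFile> targetFiles) {
    // Index the previous snapshot by path, keeping the original Action objects.
    // Assumes paths are unique within a snapshot.
    Map<String, Action> previousByPath = new HashMap<>();
    for (Action action : previousSnapshot) {
      previousByPath.put(action.path, action);
    }

    Diff result = new Diff();
    for (DataFile file : targetFiles) {
      // remove() returns null when the path was absent, so the file is new.
      // Only in that case is a new Action object created.
      if (previousByPath.remove(file.path) == null) {
        result.added.add(new Action(file.path));
      }
    }

    // Whatever is left existed in the previous snapshot but not in the target.
    // These are the original Action objects, reused without any conversion.
    result.removed.addAll(previousByPath.values());
    return result;
  }
}
```

With a previous snapshot of `a.parquet` and `b.parquet` and a target of `a.parquet` and `c.parquet`, this sketch reports `c.parquet` as added and hands back the original `b.parquet` action as removed, so the number of new objects scales with the size of the change rather than the size of the snapshot.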
This change is already covered by the existing tests for Delta conversion.