Use bzip2 compressed feature set json as pipeline option #466
There are a lot of formatting changes in this. Which IDE + settings are you using? I have imported the IntelliJ settings as described here: but I don't think it matches what you've submitted.
It's the Maven Spotless plugin.
public class ProtoUtil {

  public static String toJson(List<FeatureSetProto.FeatureSet> featureSets) throws IOException {
Can we avoid creating non-generic utility methods? ProtoUtil and toJson seem like a generic class and method, but the implementation is specific to FeatureSetProtos.
Either we need to rename this to be more specific and generalize later, or move this functionality out.
Renaming seems like more of a band-aid solution, so I refactored my commits so that this is no longer under util and can be extended for other compression strategies.
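For illustration only, "can be extended for other compression strategies" could look like a small strategy interface with interchangeable implementations. This is a minimal sketch, not the classes from this PR: every name below (`CompressionStrategy`, `GzipStrategy`, `DeflateStrategy`, `StringCompressor`) is hypothetical, and GZIP and DEFLATE from the JDK stand in for bzip2, which the JDK standard library does not provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical strategy: each algorithm only says how to wrap the raw streams.
interface CompressionStrategy {
  OutputStream wrap(OutputStream out) throws IOException;

  InputStream unwrap(InputStream in) throws IOException;
}

// GZIP from the JDK, used here as a stand-in for bzip2.
final class GzipStrategy implements CompressionStrategy {
  public OutputStream wrap(OutputStream out) throws IOException {
    return new GZIPOutputStream(out);
  }

  public InputStream unwrap(InputStream in) throws IOException {
    return new GZIPInputStream(in);
  }
}

// A second strategy, to show that adding one does not touch the caller.
final class DeflateStrategy implements CompressionStrategy {
  public OutputStream wrap(OutputStream out) {
    return new DeflaterOutputStream(out);
  }

  public InputStream unwrap(InputStream in) {
    return new InflaterInputStream(in);
  }
}

// Strategy-agnostic compressor for UTF-8 text such as a feature set JSON string.
final class StringCompressor {
  private final CompressionStrategy strategy;

  StringCompressor(CompressionStrategy strategy) {
    this.strategy = strategy;
  }

  byte[] compress(String text) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Closing the wrapped stream flushes the compressed trailer into the buffer.
    try (OutputStream out = strategy.wrap(buffer)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  String decompress(byte[] compressed) {
    ByteArrayOutputStream text = new ByteArrayOutputStream();
    try (InputStream in = strategy.unwrap(new ByteArrayInputStream(compressed))) {
      byte[] chunk = new byte[4096];
      int read;
      while ((read = in.read(chunk)) != -1) {
        text.write(chunk, 0, read);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(text.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

With this shape, a bzip2 strategy would be one more small class implementing the same two methods, and callers would keep using `StringCompressor` unchanged.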
ingestion/src/main/java/feast/ingestion/utils/CompressionUtil.java (outdated)
551652a to 0fe5654
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: khorshuheng, woop
* Use bzip2 compressed feature set json as pipeline option
* Make decompressor and compressor more generic and extensible
* Avoid code duplication in test
What this PR does / why we need it:
The Dataflow runner has a limit of 256 KB for pipeline options. As we are storing feature sets as a JSON string in a pipeline option, the size grows proportionally to the number of feature set versions. Compressing the feature set JSON will help us support more feature sets.
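A minimal sketch of the idea, under stated assumptions: the JSON is compressed before it is placed in the option and decompressed when the pipeline reads it back. GZIP from the JDK is swapped in for bzip2 (the JDK has no bzip2 codec), Base64 is assumed as a way to carry the bytes in a string-valued option, and the limit is taken as 256 * 1024 bytes. The class and method names are hypothetical and are not the ones in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; GZIP is a stand-in for the bzip2 compression used in the PR.
final class FeatureSetOptionCodec {
  // Limit quoted in the PR description, assumed here to mean 256 * 1024 bytes.
  static final int OPTION_LIMIT_BYTES = 256 * 1024;

  // Compresses the JSON and Base64-encodes it so it can travel as a string.
  static String encode(String json) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(json.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(buffer.toByteArray());
  }

  // Reverses encode(): Base64-decode, then decompress back to the JSON text.
  static String decode(String option) {
    byte[] compressed = Base64.getDecoder().decode(option);
    ByteArrayOutputStream json = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] chunk = new byte[4096];
      int read;
      while ((read = in.read(chunk)) != -1) {
        json.write(chunk, 0, read);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(json.toByteArray(), StandardCharsets.UTF_8);
  }

  // True when the option value is within the assumed size limit.
  static boolean fitsInOption(String option) {
    return option.getBytes(StandardCharsets.UTF_8).length <= OPTION_LIMIT_BYTES;
  }
}
```

A caller could check `fitsInOption(encode(json))` before submitting the job and fail early with a clear message instead of relying on the runner to reject the submission.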
Which issue(s) this PR fixes:
None
Does this PR introduce a user-facing change?:
Users will be able to have more feature sets before Dataflow job submission fails. However, this depends on the compression ratio, which in turn depends on how much repetition exists in the feature set JSON.
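The dependence on repetition can be illustrated with a rough sketch that compares the compression ratio of repetitive JSON-like text against random bytes of the same length. GZIP from the JDK is used here as a stand-in for bzip2, and the sample feature set fields and all names are made up for the example, so the numbers it produces are indicative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; GZIP stands in for bzip2 and the JSON shape is invented.
final class CompressionRatioSketch {
  // Compressed size divided by original size (smaller means better compression).
  static double ratio(byte[] input) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return (double) buffer.size() / input.length;
  }

  // Many versions of one feature set: only the version number changes.
  static byte[] repetitiveJson(int versions) {
    StringBuilder sb = new StringBuilder("[");
    for (int v = 1; v <= versions; v++) {
      if (v > 1) {
        sb.append(',');
      }
      sb.append("{\"name\":\"driver_stats\",\"version\":")
          .append(v)
          .append(",\"features\":[{\"name\":\"trips\",\"valueType\":\"INT64\"}]}");
    }
    return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
  }

  // Random bytes have essentially no repetition for the compressor to exploit.
  static byte[] randomBytes(int length) {
    byte[] bytes = new byte[length];
    new Random(42).nextBytes(bytes);
    return bytes;
  }
}
```

Calling `ratio(repetitiveJson(200))` and `ratio(randomBytes(n))` for the same length `n` shows the gap: many versions of similar feature sets compress well, while dissimilar content gains far less.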