Common: get rid of atomic-events folder #183
Comments
Sounds good @chuwy. Just a note on (1) - this is a dedicated step only if
What are the benefits of this change? Will the absence of the RDD action have a notable performance impact?
👍
Maybe not as notable as we'd like: just around 2-3%, and a bit more if we use more than one node, because Spark will be able to partition the data evenly. Also, the fewer steps we have, the fewer points of failure there are. However, the main reason is that we need to unify the data and simplify the flow for future refactoring.
We can treat atomic events as a (special) TSV-shredded type in the output data, with the iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0 (snowplow/iglu-central#778) schema. This would allow us to get rid of one S3DistCp step and a dedicated RDD action in Shredder. This would require:
Did I miss anything, @stdfalse? ☝️
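To illustrate the idea of treating atomic events as just another shredded type, here is a rough Python sketch (not Shredder code). It parses the Iglu URI from the proposal into its parts and derives an output folder from it, the same way it would for any other type. The `SchemaKey` type, the `parse_schema_key` and `output_folder` helpers, and the folder layout are all hypothetical names made up for this example.

```python
import re
from typing import NamedTuple

# Iglu schema URI shape: iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.\-]+)/(?P<name>[A-Za-z0-9_\-]+)/"
    r"(?P<format>[A-Za-z0-9_\-]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


class SchemaKey(NamedTuple):
    """Hypothetical container for the parts of an Iglu schema URI."""
    vendor: str
    name: str
    format: str
    model: int
    revision: int
    addition: int


def parse_schema_key(uri: str) -> SchemaKey:
    """Split an Iglu URI into its components, or raise ValueError."""
    m = IGLU_URI.match(uri)
    if m is None:
        raise ValueError(f"not a valid Iglu URI: {uri!r}")
    return SchemaKey(
        m["vendor"], m["name"], m["format"],
        int(m["model"]), int(m["revision"]), int(m["addition"]),
    )


def output_folder(key: SchemaKey) -> str:
    """Build an output folder for a shredded type.

    The layout is an assumption for illustration only; the point is that
    atomic events need no special-cased folder once they carry a schema key.
    """
    return f"vendor={key.vendor}/name={key.name}/model={key.model}"


# The atomic schema named in the proposal, handled like any other type.
ATOMIC = parse_schema_key(
    "iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0"
)
```

With something like this, `output_folder(ATOMIC)` goes through exactly the same code path as any contexts or unstructured event type, which is the unification the proposal is after.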