Common: get rid of atomic-events folder #183

Closed
chuwy opened this issue Mar 23, 2020 · 2 comments

chuwy commented Mar 23, 2020

We can treat atomic events as a (special) TSV-shredded type in the output data, using the iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0 schema (snowplow/iglu-central#778). This would allow us to get rid of one S3DistCp step and a dedicated RDD action in the Shredder.
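For illustration, the shredded output would then carry atomic data as just one more partition next to the self-describing entities, instead of a separate atomic-events folder. The bucket, run ID, entity, and exact folder convention below are hypothetical:

```
s3://archive/shredded/good/run=2020-03-23-12-00-00/
  vendor=com.snowplowanalytics.snowplow/name=atomic/format=tsv/model=1/   <- was atomic-events/
  vendor=com.acme/name=checkout/format=tsv/model=1/
```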

This would require:

  1. A change in EmrEtlRunner to make it skip the atomic S3DistCp step (otherwise it fails, as there's no data)
  2. Making RDB Shredder 0.19.0 (assuming it will implement the change) compatible only with Loader 0.19.0 and above
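On the Shredder side, here is a minimal Spark sketch of the idea, assuming hypothetical names throughout (the real RDB Shredder is structured differently); it only shows atomic events folded into the single partitioned shredded write:

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch only: a shredded record reduced to the fields needed
// to pick an output partition.
final case class Shredded(vendor: String, name: String, format: String, model: Int, data: String)

object AtomicAsShreddedSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    // Stand-ins for enriched atomic events (TSV lines) and shredded entities.
    val atomicLines = Seq("app1\tweb\t2020-03-23...", "app1\tweb\t2020-03-23...")
    val entities    = Seq(Shredded("com.acme", "checkout", "tsv", 1, "sku-123"))

    // The proposal: tag each atomic event with the
    // com.snowplowanalytics.snowplow/atomic/1-0-0 schema key...
    val atomic = atomicLines.map(Shredded("com.snowplowanalytics.snowplow", "atomic", "tsv", 1, _))

    // ...so one partitioned write replaces the dedicated atomic-events RDD
    // action (and the S3DistCp step that moved its output afterwards).
    (atomic ++ entities).toDF()
      .write
      .partitionBy("vendor", "name", "format", "model")
      .text("/tmp/shredded/good/run=2020-03-23-12-00-00")

    spark.stop()
  }
}
```

With the atomic rows tagged like any other shredded type, the write above is the only action Spark has to run, which is what lets both the dedicated RDD action and the extra S3DistCp step disappear.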

Did I miss anything, @stdfalse? ☝️

@chuwy chuwy self-assigned this Mar 23, 2020
@stdfalse
Collaborator

Sounds good @chuwy.

Just a note on (1): this is a dedicated step only if consolidated_shredded_output is enabled. I believe the logic will be: if rdb_shredder >= '0.19.0' and consolidated_shredded_output is True, do not submit the step. You might need to check how the consolidation is implemented to ensure this won't cause any side effects.
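EmrEtlRunner itself is a Ruby application, so the sketch below (in Scala, with made-up config field and function names) is only an illustrative restatement of that gating condition:

```scala
// Illustrative only: restates the gating logic from the comment above.
final case class RunnerConfig(rdbShredderVersion: String, consolidatedShreddedOutput: Boolean)

object AtomicDistcpGate {
  /** Submit the atomic S3DistCp step only while the Shredder still writes a
    * separate atomic-events folder, i.e. before 0.19.0 or whenever
    * consolidated_shredded_output is disabled. */
  def shouldSubmitAtomicDistcp(config: RunnerConfig): Boolean = {
    def parts(v: String): Seq[Int] = v.split('.').toSeq.map(_.toInt)
    // Naive numeric comparison, enough for plain "0.19.0"-style versions.
    val atLeast0190 = parts(config.rdbShredderVersion).zipAll(parts("0.19.0"), 0, 0)
      .find { case (a, b) => a != b }
      .forall { case (a, b) => a > b }
    !(atLeast0190 && config.consolidatedShreddedOutput)
  }

  def main(args: Array[String]): Unit = {
    assert(!shouldSubmitAtomicDistcp(RunnerConfig("0.19.0", consolidatedShreddedOutput = true)))
    assert(shouldSubmitAtomicDistcp(RunnerConfig("0.18.2", consolidatedShreddedOutput = true)))
    assert(shouldSubmitAtomicDistcp(RunnerConfig("0.19.0", consolidatedShreddedOutput = false)))
  }
}
```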

What are the benefits of this change? Will the absence of the RDD action have a notable performance impact?

chuwy commented Mar 24, 2020

You might need to check how the consolidation is implemented to ensure this won't cause any side effects.

👍

What are the benefits of this change? Will the absence of the RDD action have a notable performance impact?

Maybe not as notable as we'd like - just around 2-3%, and a bit more if we use more than one node, because Spark will be able to partition the data evenly. Also, the fewer steps we have, the fewer points of failure there are. However, the main reason is that we need to unify the data and simplify the flow for future refactoring.
