-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ETL-492] Update Glue table schemas to match crawler findings #63
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this schema change for the production data affect test runs with the pilot data given that the pilot data doesn't have these updated fields/new fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here fall into a few different categories:
- New fields -- these changes don't affect the pilot data. The fields aren't in the data, so they aren't included in the parquet.
- Overdue changes -- In some cases, specifically in
EnrolledParticipants.EventDates
andSymptomLog.Value
, we hadn't updated the types after the transformations we do to these fields in the S3 to JSON job. They are string in the original data, but they're string-encoded JSON. We load them as dicts and write them as JSON objects in the ndjson, so the schemas reflect the object structure now.
So the pilot data isn't affected by these changes. It's fair to ask why was nothing breaking before because of the overdue changes? TBH, I don't think Glue entirely respects the data types we use in our tables. In some cases it just uses whatever data type it detects and can load the data with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasyu888 see my comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 I'm going to pre-approve but I have the same question as rixing. Thanks for pushing this through!
No description provided.