Increase JSON to Parquet timeout #94
Conversation
LGTM!
@@ -49,7 +49,7 @@ Parameters:
   TimeoutInMinutes:
     Type: Number
     Description: The job timeout in minutes (integer).
-    Default: 720
+    Default: 1200
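For context on what this parameter controls: in a CloudFormation template like this one, the value would typically be passed through to the Glue job's `Timeout` property. A minimal sketch under that assumption (the resource name, role, and script location below are illustrative, not taken from this repo):

```yaml
Parameters:
  TimeoutInMinutes:
    Type: Number
    Description: The job timeout in minutes (integer).
    Default: 1200

Resources:
  JsonToParquetJob:                   # hypothetical resource name
    Type: AWS::Glue::Job
    Properties:
      Role: !GetAtt GlueJobRole.Arn   # assumes an IAM role defined elsewhere
      Timeout: !Ref TimeoutInMinutes  # Glue's Timeout property is in minutes
      Command:
        Name: glueetl
        ScriptLocation: s3://example-bucket/json_to_parquet.py  # placeholder path
```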
This means we're getting JSON to Parquet jobs that run over 12 hours right now? For the bigger data types like HealthKit?
Just FitbitIntraday. HealthKitV2Samples is our second largest data type, but it only takes ~2 hours.
Do you think we need to specify `TimeoutInMinutes` for each data type, so we know generally how long each data type should be taking?
Well... if it ain't broke, don't fix it? Doing it for each data type would add a lot of complexity for little or no benefit. We could do something like what we do for the number of workers, where we have a separate `LargeJobNumberOfWorkers` variable for just the larger jobs. But for now, using the same timeout for everything hasn't caused any issues, so my vote is for doing nothing.
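If a per-size timeout were ever wanted, it could mirror that workers split. A minimal sketch, assuming the same CloudFormation template (`LargeJobTimeoutInMinutes` and the resource names are hypothetical; only `LargeJobNumberOfWorkers` is mentioned in this thread):

```yaml
Parameters:
  TimeoutInMinutes:
    Type: Number
    Description: The job timeout in minutes (integer).
    Default: 1200
  LargeJobNumberOfWorkers:        # existing pattern mentioned in this thread
    Type: Number
    Default: 10                   # placeholder value
  LargeJobTimeoutInMinutes:       # hypothetical, mirroring the workers split
    Type: Number
    Description: Timeout for the largest data types (e.g., FitbitIntraday).
    Default: 1200

Resources:
  FitbitIntradayJob:              # illustrative large-job resource
    Type: AWS::Glue::Job
    Properties:
      Role: !GetAtt GlueJobRole.Arn                  # assumes a role defined elsewhere
      Timeout: !Ref LargeJobTimeoutInMinutes         # large jobs get their own ceiling
      NumberOfWorkers: !Ref LargeJobNumberOfWorkers  # the existing large-job knob
      WorkerType: G.1X                               # illustrative worker type
      Command:
        Name: glueetl
        ScriptLocation: s3://example-bucket/json_to_parquet.py  # placeholder path
```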
Sounds good! Do we have a table somewhere of the current average job run times (or at least run-time ranges) and workers per data type? I feel like it might still be good to have, in case something is taking abnormally long compared to usual. What do you think?
We don't. If I thought a job might be running for an abnormally long time, I would compare it with previous job run times. My main objection to having a table with this information is that, since the size of the data is always changing (and we may make other changes, like the number of workers, or even architectural changes), the table would need to be updated regularly, at least monthly, and I don't think it would even be referenced that often.
This is approximately the largest timeout we could use while keeping the entire workflow time under ~22-24 hours: 1200 minutes is 20 hours, which leaves a few hours of headroom for the rest of the workflow.