Increase JSON to Parquet timeout #94

Merged: philerooski merged 1 commit into main from etl-583 on Dec 12, 2023
Conversation

philerooski (Contributor) commented:

This is approximately the largest timeout we could use (1,200 minutes is 20 hours) while keeping the entire workflow time under ~22-24 hours.

@rxu17 (Contributor) left a review:

LGTM!

@@ -49,7 +49,7 @@ Parameters:
   TimeoutInMinutes:
     Type: Number
     Description: The job timeout in minutes (integer).
-    Default: 720
+    Default: 1200
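
For context, a minimal hypothetical sketch of how a CloudFormation template typically wires a parameter like this into an AWS Glue job's Timeout property (the logical IDs, role reference, and script path below are placeholders, not taken from this repository):

```yaml
# Hypothetical wiring; the actual template in this repository may differ.
Parameters:
  TimeoutInMinutes:
    Type: Number
    Description: The job timeout in minutes (integer).
    Default: 1200

Resources:
  JsonToParquetJob:                 # placeholder logical ID
    Type: AWS::Glue::Job
    Properties:
      Role: !Ref GlueJobRole        # placeholder role reference
      # AWS::Glue::Job's Timeout is expressed in minutes, matching the parameter.
      Timeout: !Ref TimeoutInMinutes
      Command:
        Name: glueetl
        ScriptLocation: s3://example-bucket/json_to_parquet.py  # placeholder path
```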
rxu17 (Contributor) commented:

This means we're getting json to parquet jobs that run over 12 hours right now? For the bigger data types like HealthKit?

philerooski (Contributor, Author) replied:

Just FitbitIntraday. HealthKitV2Samples is our second largest data type, but it only takes ~2 hours.

rxu17 (Contributor) replied:

Do you think we need to specify TimeoutInMinutes for each data type so we know generally how long each data type should be taking?

philerooski (Contributor, Author) replied:

Well... if it ain't broke, don't fix it? Doing it for each data type would add a lot of complexity for little to no benefit. We could do something like what we do for the number of workers, where we have a separate LargeJobNumberOfWorkers variable for just the larger jobs, but for now using the same timeout for everything hasn't caused any issues, so my vote is for doing nothing.
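
For illustration only (nothing in this PR does this): a per-size timeout could mirror the existing LargeJobNumberOfWorkers pattern with a hypothetical parameter along these lines:

```yaml
Parameters:
  TimeoutInMinutes:
    Type: Number
    Description: The job timeout in minutes (integer).
    Default: 720
  # Hypothetical parameter mirroring LargeJobNumberOfWorkers; not part of this PR.
  LargeJobTimeoutInMinutes:
    Type: Number
    Description: Timeout in minutes for the largest data types (e.g. FitbitIntraday).
    Default: 1200
```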

rxu17 (Contributor) replied:

Sounds good! Do we have a table somewhere of the current average job run times (or at least run-time ranges) and workers per data type? I feel like it might still be good to have in case something takes abnormally long compared to usual. What do you think?

philerooski (Contributor, Author) replied:

We don't. If I thought a job might be running for an abnormally long time, I would compare it with previous job run times. My main objection to having a table with this information is that since the size of the data is always changing (and we may make other changes, like the number of workers or even architectural changes), the table would need to be updated regularly, at least monthly. I don't think it would even be referenced monthly.

philerooski merged commit e882fee into main on Dec 12, 2023
14 checks passed
philerooski deleted the etl-583 branch on December 12, 2023 at 23:40