Update workflow #24

jrbourbeau · 2024-03-06T02:38:46Z

No description provided.

mrocklin · 2024-03-06T13:03:22Z

pipeline/data.py

+            # tables each time the flow is run to produce unique transactions
+            # xref https://discourse.prefect.io/t/how-to-get-flow-count/3996
+            # if table in ["lineitem", "orders"]:
+            #     df[f"{table[0]}_orderkey"] += counter


We might want to do more than just add now - 1995 or whatever. We might want to do an affine transform so that the entire previous range (like 1985-1995) gets squeezed into the last hour.
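A minimal sketch of the affine transform idea, assuming the source dates span a known range such as 1985-1995 and the target is the hour ending at the current run time. The function name `squeeze_into_window` and its parameters are hypothetical, not part of this PR:

```python
from datetime import datetime, timedelta


def squeeze_into_window(ts, src_start, src_end, dst_end, window=timedelta(hours=1)):
    """Affinely map ts from [src_start, src_end] onto [dst_end - window, dst_end].

    Hypothetical helper for illustration only; the names are assumptions.
    """
    span = src_end - src_start
    if span <= timedelta(0):
        raise ValueError("src_end must be later than src_start")
    # Position of ts within the source range, as a float in [0, 1]
    # when ts lies inside the range (timedelta / timedelta -> float).
    fraction = (ts - src_start) / span
    # Scale that position into the target window and shift it to end at dst_end.
    return (dst_end - window) + fraction * window


# Example inputs (assumed values, not taken from the dataset).
src_start = datetime(1985, 1, 1)
src_end = datetime(1995, 1, 1)
run_time = datetime(2024, 3, 6, 13, 0, 0)

oldest = squeeze_into_window(src_start, src_start, src_end, run_time)
newest = squeeze_into_window(src_end, src_start, src_end, run_time)
```

Unlike a plain offset (`now - 1995`), this keeps the relative ordering and spacing of the original dates while compressing the whole range into the most recent hour.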

mrocklin · 2024-03-06T13:03:46Z

pipeline/monitor.py

-def check_model_endpoint():
-    r = requests.get("http://0.0.0.0:8080/health")
-    if not r.json() == ["ok"]:
-        raise ValueError("Model endpoint isn't healthy")


Still doing the serving with --subdomain or no?

My hope is that serving is cheap and the ROI is high

Yes. Not added yet -- that's for me to do today

mrocklin · 2024-03-06T13:05:39Z

pipeline/reduce.py

@@ -18,7 +18,7 @@


 @task
-def save_query(region, part_type):
+def save_query(segment):


Maybe a better name? Something that's connected to the query itself like "revenue_by_supplier" or something?

I went with unshipped_orders_by_revenue -- further suggestions are welcome

mrocklin · 2024-03-06T13:06:38Z

pipeline/reduce.py

-
-            outfile = RESULTS_DIR / region / part_type / "result.snappy.parquet"
+            ).compute()
+            outfile = RESULTS_DIR / f"{segment}.snappy.parquet"
            fs.makedirs(outfile.parent, exist_ok=True)
            result.to_parquet(outfile, compression="snappy")


 @flow
 def query_reduce():
    with lock_compact:


Maybe cluster creation should go here, in the flow, instead of in save_query?
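A rough sketch of what that suggestion could look like, assuming the goal is to create the cluster once per flow run rather than once per task. `fake_cluster` is a hypothetical stand-in written with the standard library only; it is not the Dask, Coiled, or Prefect API, and the Prefect decorators are omitted so the snippet runs on its own:

```python
from contextlib import contextmanager

created = []  # records each hypothetical cluster start, for illustration


@contextmanager
def fake_cluster(name):
    """Hypothetical stand-in for an expensive-to-create cluster."""
    created.append(name)
    try:
        yield {"name": name}
    finally:
        pass  # a real cluster would be shut down here


def save_query(segment, cluster):
    # The task reuses the cluster it is given instead of creating its own.
    return f"{segment} computed on {cluster['name']}"


def query_reduce(segments):
    # The cluster is created once at the flow level and shared by every task call.
    with fake_cluster("reduce") as cluster:
        return [save_query(segment, cluster) for segment in segments]
```

With this layout the setup cost is paid once per flow run, regardless of how many segments are processed.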

Update workflow

c4f4bb4

mrocklin reviewed Mar 6, 2024

jrbourbeau added 2 commits March 6, 2024 15:51

Dashboard

7695b45

More

9c349a7

jrbourbeau changed the title ~~[WIP] Update workflow~~ Update workflow Mar 12, 2024

jrbourbeau marked this pull request as ready for review March 12, 2024 19:35

jrbourbeau merged commit a00d25c into main Mar 12, 2024
1 check passed

jrbourbeau deleted the update branch March 13, 2024 01:19
