
[Lake][DuckDB] Accuracy app issues and improvements #1054

Open
KatunaNorbert opened this issue May 16, 2024 · 2 comments

Comments

@KatunaNorbert
Member

Issues:

  • The current GQL fetch pulls slot data from the subgraph even when a slot still has a "Pending" status, which means its trueval is null and its stakes are 0. Because of the way we calculate the timestamp of the last data to fetch, the most recent slots will always have a Pending status. If we keep fetching new data every 5 minutes, all of the new data has a null trueval and 0 stakes, which means that over time all the accuracy values trend to 0. Possible fix: fetch only slots with status = "Paying".
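A minimal sketch of the proposed fix, assuming a hypothetical subgraph endpoint and illustrative field names (`predictSlots`, `trueValue`, `slot_gt`) rather than the actual Predictoor subgraph schema. The query filters server-side on `status: "Paying"`, and a small client-side guard drops anything non-final:

```python
import json
import urllib.request

SUBGRAPH_URL = "https://example.com/subgraph"  # placeholder, not the real endpoint

# Only slots already in "Paying" status have a final trueval and stakes.
# Field names here are illustrative, not the actual subgraph schema.
PAYING_SLOTS_QUERY = """
query ($ts: Int!) {
  predictSlots(where: { status: "Paying", slot_gt: $ts }) {
    id
    slot
    status
    trueValue
  }
}
"""

def keep_paying(slots: list) -> list:
    """Client-side guard: drop any slot that is not yet 'Paying'."""
    return [s for s in slots if s.get("status") == "Paying"]

def fetch_paying_slots(ts: int) -> list:
    """POST the filtered query; returns only finalized slots newer than ts."""
    body = json.dumps(
        {"query": PAYING_SLOTS_QUERY, "variables": {"ts": ts}}
    ).encode()
    req = urllib.request.Request(
        SUBGRAPH_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return keep_paying(data["data"]["predictSlots"])
```

With this filter, "Pending" slots never enter the lake, so they cannot drag the accuracy values toward 0.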

Improvements:

  • etl.update() fetches data for the entire ETL. We could fetch just the raw data and skip the bronze tables (and any other tables added to the ETL later), or, even better, fetch data only for the slots table. For this we could use the gql.update() function and modify it to accept, as a parameter, the tables to fetch data for.
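A hypothetical sketch of what a `gql.update()` signature extended with a tables parameter could look like. The function and table names (`pdr_predictions`, `pdr_slots`, `pdr_payouts`) are illustrative, not the actual pdr-backend API:

```python
# Illustrative registry of raw tables; not the actual pdr-backend names.
ALL_TABLES = ["pdr_predictions", "pdr_slots", "pdr_payouts"]

def update(tables=None):
    """Fetch raw subgraph data for the given tables only (default: all).

    Returns the list of tables it fetched, for easy inspection.
    """
    targets = list(tables) if tables is not None else list(ALL_TABLES)
    unknown = set(targets) - set(ALL_TABLES)
    if unknown:
        raise ValueError(f"unknown tables: {sorted(unknown)}")
    for name in targets:
        _fetch_raw(name)
    return targets

def _fetch_raw(name):
    # Placeholder for the per-table subgraph fetch.
    pass
```

Called as `update(["pdr_slots"])`, this would refresh only the slots table and skip everything else.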
@idiom-bytes
Member

idiom-bytes commented May 27, 2024

> The current GQL fetch pulls slot data from the subgraph even when a slot still has a "Pending" status, which means its trueval is null and its stakes are 0. Because of the way we calculate the timestamp of the last data to fetch, the most recent slots will always have a Pending status. If we keep fetching new data every 5 minutes, all of the new data has a null trueval and 0 stakes, which means that over time all the accuracy values trend to 0. Possible fix: fetch only slots with status = "Paying".

This is the correct behavior. We need to incorporate the "update queries" in order to properly address this. Please do not change this; instead, incorporate the right update procedure.

> etl.update() fetches data for the entire ETL. We could fetch just the raw data and skip the bronze tables (and any other tables added to the ETL later), or, even better, fetch data only for the slots table. For this we could use the gql.update() function and modify it to accept, as a parameter, the tables to fetch data for.

No, ETL only fetches the new data and rebuilds new rows for bronze and other tables.

> skip the bronze tables and other tables that are going to be added to the ETL,

This is not the right pattern.

@idiom-bytes
Member

idiom-bytes commented May 27, 2024

[My Feedback]
There are 2 ways to do this:

  1. The Old Way - When the API is hit, fetch from subgraph, calculate the accuracies, return the answer
  2. The New Way - Update the lake + tables; when the API is hit, fetch from the lake, calculate the accuracies, return the answer
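The "New Way" read path can be sketched as follows: the API handler queries the local lake instead of the subgraph and computes accuracy over finalized slots only. DuckDB is the real lake engine; sqlite3 is used here purely as a stdlib stand-in, and the table/column names (`pdr_slots`, `trueval`, `pred`) are illustrative:

```python
import sqlite3

# Build a tiny in-memory "lake" (sqlite3 standing in for DuckDB).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pdr_slots (slot INT, status TEXT, trueval INT, pred INT)")
con.executemany(
    "INSERT INTO pdr_slots VALUES (?, ?, ?, ?)",
    [
        (1, "Paying", 1, 1),
        (2, "Paying", 0, 1),
        (3, "Paying", 1, 1),
        (4, "Pending", None, 1),  # not finalized: excluded from accuracy
    ],
)

def accuracy_from_lake(con) -> float:
    """Accuracy computed over finalized ('Paying') slots only."""
    row = con.execute(
        "SELECT AVG(CASE WHEN trueval = pred THEN 1.0 ELSE 0.0 END) "
        "FROM pdr_slots WHERE status = 'Paying'"
    ).fetchone()
    return row[0]
```

Because the query filters on `status = 'Paying'`, the still-Pending slot 4 never dilutes the accuracy figure.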

[Problem]
The new way requires [slot tables] + [handling update events] such that the slots table is updated and a slot goes from "Pending" to "Paying"

Before doing this to the slots table, let's please implement this with the predictions table first... such that when a new "Payout" event shows up, the existing "Prediction" record is updated with its payout.
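The update-event pattern described above can be sketched like this, again with sqlite3 as a stdlib stand-in for DuckDB and hypothetical table/column names (`pdr_predictions`, `payout`, `prediction_id`): a Payout event updates the matching Prediction row in place rather than inserting a new one.

```python
import sqlite3

# In-memory stand-in for the lake's predictions table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pdr_predictions (id TEXT PRIMARY KEY, payout REAL)")
con.execute("INSERT INTO pdr_predictions VALUES ('p1', NULL)")  # payout unknown yet

def apply_payout_event(con, event: dict) -> None:
    """When a Payout event shows up, update the existing Prediction row in place."""
    con.execute(
        "UPDATE pdr_predictions SET payout = ? WHERE id = ?",
        (event["payout"], event["prediction_id"]),
    )

# A new Payout event arrives for prediction p1.
apply_payout_event(con, {"prediction_id": "p1", "payout": 1.5})
```

Once this works for predictions, the same pattern would carry over to flipping slots from "Pending" to "Paying".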
