[Lake] Integrate pdr_subscriptions into GQL Data Factory #468

Closed
9 tasks done
idiom-bytes opened this issue Dec 21, 2023 · 1 comment · Fixed by #469
Labels: Type: Enhancement (New feature or request)

Comments

idiom-bytes commented Dec 21, 2023

Motivation

Subscriptions are not being fetched from the subgraph and added to the lake. Similar logic is already used to determine consumption for df_buyer, but the requirements there are different.

Instead, simply use predictSubscriptions w/ a top-level timestamp filter to fetch all subscriptions between the expected [st_ut, fin_ut].
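
As a rough illustration of that query, here is a minimal sketch that builds the GraphQL string. The function name `build_subscriptions_query`, the selected fields (`id`, `timestamp`), and the paging defaults are assumptions for illustration, not the project's actual code; the `_gte`/`_lte` filter suffixes and `first`/`skip` arguments are standard subgraph query features. It also assumes st_ut and fin_ut are already in the same unit as the subgraph's timestamp field.

```python
def build_subscriptions_query(st_ut: int, fin_ut: int, first: int = 1000, skip: int = 0) -> str:
    """Build a query for all subscriptions with st_ut <= timestamp <= fin_ut.

    Hypothetical helper: field names and paging defaults are assumptions.
    """
    if st_ut > fin_ut:
        raise ValueError("st_ut must not exceed fin_ut")
    # Doubled braces are literal braces inside an f-string.
    return f"""
    {{
      predictSubscriptions(
        first: {first}
        skip: {skip}
        where: {{ timestamp_gte: {st_ut}, timestamp_lte: {fin_ut} }}
      ) {{
        id
        timestamp
      }}
    }}
    """
```

Calling it repeatedly with an increasing `skip` is one way to page through a large date range.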

When implementing this, GQL Data Factory will handle 2 different subgraph queries over the same date range. Because a subgraph table may or may not have records in that range, the Data Factory should gracefully handle empty gql queries and empty parquet files, and its basic interface, get_dfs(), should keep working without any issues.

DoD:

  • Improve gql_data_factory so it can handle empty fetches, empty parquet files, and general mismatches between fs.
  • Improve gql_data_factory to return an empty dataframe of the expected schema if nothing is available.
  • Add fn to fetch subscriptions from subgraph
  • Create subscription object for models
  • Create subscription schema for lake
  • Hook subscriptions to lake
  • Add mock utils for subscriptions
  • Add tests that verify that subscription objects, subgraph, and components are working as expected
  • Add tests that verify that GQL Data Factory handles empty subgraph fetches + empty parquet files
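
To make the "subscription object", "subscription schema", and "empty dataframe of the expected schema" items concrete, here is a minimal library-free sketch. The `Subscription` fields, `SUBSCRIPTION_SCHEMA`, and `to_columns` are all hypothetical names chosen for illustration, and a plain dict of column lists stands in for the dataframe the lake would actually use.

```python
from dataclasses import dataclass, asdict

# Hypothetical schema: column name -> Python type.
SUBSCRIPTION_SCHEMA = {"id": str, "timestamp": int, "user": str}


@dataclass
class Subscription:
    """Hypothetical model object; the real one may carry more fields."""
    id: str
    timestamp: int
    user: str


def to_columns(subscriptions, schema=SUBSCRIPTION_SCHEMA):
    """Convert objects to a column-oriented table.

    With no input objects, the result still has every schema column,
    each holding an empty list.
    """
    columns = {name: [] for name in schema}
    for sub in subscriptions:
        row = asdict(sub)
        for name, typ in schema.items():
            columns[name].append(typ(row[name]))
    return columns
```

The point of building the columns from the schema rather than from the data is that callers always see the same column names, whether or not the fetch returned anything.
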
@idiom-bytes idiom-bytes added the Type: Enhancement New feature or request label Dec 21, 2023
@idiom-bytes idiom-bytes changed the title [lake] Integrate subgraph subscriptions [Lake] Integrate subscriptions from pdr subgraph Dec 21, 2023
@idiom-bytes idiom-bytes changed the title [Lake] Integrate subscriptions from pdr subgraph [Lake] Integrate pdr subscriptions from subgraph Dec 21, 2023
@idiom-bytes idiom-bytes changed the title [Lake] Integrate pdr subscriptions from subgraph [Lake] Integrate pdr_subscriptions into GQL Data Factory Dec 22, 2023

idiom-bytes commented Dec 22, 2023

Implemented pdr_subscriptions while updating GQL Data Factory, such that it behaves in the expected manner.

Example:

  1. gql_data_factory already has predictions.parquet saved locally and up-to-date, so it skips
  2. gql_data_factory doesn't have subscriptions.parquet, so it tries to update it
  3. gql_data_factory can't find subscriptions, so it doesn't save to parquet
  4. gql_data_factory loads predictions.parquet successfully, creating charts
  5. gql_data_factory can't load subscriptions.parquet, so it returns an empty dataframe w/ the expected schema
    image
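
The five steps above can be sketched roughly as follows. This is a simplified stand-in, not the actual gql_data_factory code: JSON files replace parquet so the example needs only the standard library, "file exists" is treated as "up-to-date", and `SCHEMAS`, `load_or_empty`, `get_dfs`, and the `fetchers` argument are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schemas: table name -> expected column names.
SCHEMAS = {
    "predictions": ["id", "timestamp"],
    "subscriptions": ["id", "timestamp", "user"],
}


def load_or_empty(path, columns):
    """Load a table, or return empty columns if the file is missing or empty."""
    if not path.exists() or path.stat().st_size == 0:
        return {col: [] for col in columns}
    rows = json.loads(path.read_text())
    return {col: [row.get(col) for row in rows] for col in columns}


def get_dfs(folder, fetchers):
    """Update each table if needed, then load all tables."""
    dfs = {}
    for name, columns in SCHEMAS.items():
        path = Path(folder) / f"{name}.json"
        if not path.exists():  # steps 1-2: skip tables already on disk
            rows = fetchers[name]()
            if rows:  # step 3: an empty fetch writes nothing
                path.write_text(json.dumps(rows))
        dfs[name] = load_or_empty(path, columns)  # steps 4-5
    return dfs


def demo():
    """Predictions are cached; the subscriptions fetch returns nothing."""
    with tempfile.TemporaryDirectory() as folder:
        cached = [{"id": "p1", "timestamp": 1703116800}]
        (Path(folder) / "predictions.json").write_text(json.dumps(cached))

        def fail():
            raise RuntimeError("predictions should not be refetched")

        return get_dfs(folder, {"predictions": fail, "subscriptions": lambda: []})
```

In `demo()`, the predictions fetcher is never called, no subscriptions file is written, and the subscriptions table comes back with the expected columns and no rows.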
