feat(seer grouping): Add Seer fields to grouphash metadata table #78106

Merged
merged 3 commits into from
Sep 30, 2024

lobsterkatie
Member

@lobsterkatie lobsterkatie commented Sep 25, 2024

This adds fields to the GroupHashMetadata table to track the following for hashes sent to Seer:

  • When the grouphash was sent to Seer
  • Which event's stacktrace was sent (stored as an event id)
  • The Seer model version used to analyze the stacktrace
  • The matched hash returned by Seer, if any (stored as a reference to that hash's GroupHash record)
  • The similarity distance returned by Seer, if any

Use of these fields is added in #78107.

@github-actions github-actions bot added Scope: Frontend Automatically applied to PRs that change frontend components Scope: Backend Automatically applied to PRs that change backend components labels Sep 25, 2024

@lobsterkatie lobsterkatie removed the Scope: Frontend Automatically applied to PRs that change frontend components label Sep 25, 2024

Member

@wedamija wedamija left a comment

migration lgtm. The table has only 14k rows so adding the index is fine.

@lobsterkatie lobsterkatie force-pushed the kmclb-add-seer-fields-to-grouphash-metadata branch from 68eaeab to 7e84260 Compare September 25, 2024 20:22
@github-actions github-actions bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Sep 25, 2024

@lobsterkatie lobsterkatie force-pushed the kmclb-add-seer-fields-to-grouphash-metadata branch from 6ab3c0d to 71a3939 Compare September 27, 2024 21:34
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/0769_add_seer_fields_to_grouphash_metadata.py:

--
-- Add field seer_date_sent to grouphashmetadata
--
ALTER TABLE "sentry_grouphashmetadata" ADD COLUMN "seer_date_sent" timestamp with time zone NULL;
--
-- Add field seer_event_sent to grouphashmetadata
--
ALTER TABLE "sentry_grouphashmetadata" ADD COLUMN "seer_event_sent" varchar(32) NULL;
--
-- Add field seer_match_distance to grouphashmetadata
--
ALTER TABLE "sentry_grouphashmetadata" ADD COLUMN "seer_match_distance" double precision NULL;
--
-- Add field seer_matched_grouphash to grouphashmetadata
--
ALTER TABLE "sentry_grouphashmetadata" ADD COLUMN "seer_matched_grouphash_id" bigint NULL;
--
-- Add field seer_model to grouphashmetadata
--
ALTER TABLE "sentry_grouphashmetadata" ADD COLUMN "seer_model" varchar NULL;
ALTER TABLE "sentry_grouphashmetadata" ADD CONSTRAINT "sentry_grouphashmeta_seer_matched_groupha_c92b0107_fk_sentry_gr" FOREIGN KEY ("seer_matched_grouphash_id") REFERENCES "sentry_grouphash" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "sentry_grouphashmetadata" VALIDATE CONSTRAINT "sentry_grouphashmeta_seer_matched_groupha_c92b0107_fk_sentry_gr";
CREATE INDEX CONCURRENTLY "sentry_grouphashmetadata_seer_matched_grouphash_id_c92b0107" ON "sentry_grouphashmetadata" ("seer_matched_grouphash_id");
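
Every column in the generated SQL above is added as nullable, so rows that already exist simply read back NULL for the Seer fields and need no backfill. A minimal sketch of that behavior, using Python's built-in `sqlite3` as a stand-in for Postgres. The table name `grouphashmetadata` and the simplified column types are assumptions for illustration; the `NOT VALID` / `VALIDATE CONSTRAINT` and `CREATE INDEX CONCURRENTLY` steps are Postgres-specific and are not reproduced here.

```python
import sqlite3

# Stand-in for the real table; the SQLite types below only approximate
# the Postgres types in the generated SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grouphashmetadata (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO grouphashmetadata (id) VALUES (1)")

# Add the five Seer columns. Columns are nullable by default, which
# mirrors the explicit NULL in the migration.
for column, sql_type in [
    ("seer_date_sent", "TEXT"),
    ("seer_event_sent", "VARCHAR(32)"),
    ("seer_match_distance", "REAL"),
    ("seer_matched_grouphash_id", "INTEGER"),
    ("seer_model", "VARCHAR"),
]:
    conn.execute(f"ALTER TABLE grouphashmetadata ADD COLUMN {column} {sql_type}")

# The row inserted before the columns existed has no Seer data.
existing_row = conn.execute(
    "SELECT seer_date_sent, seer_event_sent, seer_match_distance, "
    "seer_matched_grouphash_id, seer_model FROM grouphashmetadata WHERE id = 1"
).fetchone()
```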

@lobsterkatie lobsterkatie removed the Scope: Frontend Automatically applied to PRs that change frontend components label Sep 30, 2024
@lobsterkatie lobsterkatie merged commit 4652526 into master Sep 30, 2024
51 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-add-seer-fields-to-grouphash-metadata branch September 30, 2024 17:16
lobsterkatie added a commit that referenced this pull request Sep 30, 2024
…78107)

This is a follow-up to #78106, which added Seer-relevant fields to the `GroupHashMetadata` table. This change starts actually using those fields to store data about Seer calls during ingest. The following data is stored:

- When the grouphash was sent to Seer
- Which event's stacktrace was sent (stored as an event id)
- The Seer model version used to analyze the stacktrace
- The matched hash returned by Seer, if any (stored as a reference to that hash's `GroupHash` record)
- The similarity distance returned by Seer, if any
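
As a rough illustration of what gets recorded, the sketch below models the five fields as a plain Python dataclass and fills them in after a Seer call. The field names follow the column names from the migration in #78106; the `SeerFields` class and the `record_seer_result` helper are hypothetical and are not the code in this commit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SeerFields:
    """Hypothetical stand-in for the Seer columns on GroupHashMetadata."""

    seer_date_sent: Optional[datetime] = None
    seer_event_sent: Optional[str] = None  # event id, at most 32 characters
    seer_model: Optional[str] = None
    seer_matched_grouphash_id: Optional[int] = None
    seer_match_distance: Optional[float] = None


def record_seer_result(
    metadata: SeerFields,
    event_id: str,
    model_version: str,
    matched_grouphash_id: Optional[int] = None,
    match_distance: Optional[float] = None,
) -> SeerFields:
    """Store what was sent to Seer and, if there was a match, what came back."""
    metadata.seer_date_sent = datetime.now(timezone.utc)
    metadata.seer_event_sent = event_id
    metadata.seer_model = model_version
    # These two stay None when Seer finds no match.
    metadata.seer_matched_grouphash_id = matched_grouphash_id
    metadata.seer_match_distance = match_distance
    return metadata


no_match = record_seer_result(SeerFields(), "a" * 32, "v1")
matched = record_seer_result(
    SeerFields(), "b" * 32, "v1", matched_grouphash_id=42, match_distance=0.01
)
```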

Notes:

- As a result of this change, we're no longer storing Seer results in event data. This is better - before, to see the Seer results you had to find the first event to generate a given hash (which is the first event in a group in cases where Seer doesn't find a match, but is some random event among a group's full event list in cases where Seer does match to an existing group). Now, you can find Seer results from any event in a group, via the group's `GroupHash` records. (It did mean I had to pull out some temporary logs I had added to group creation, but it's unclear if they're still necessary. If the issue which prompted them comes up again, I'll add them back in a different way.)

- As agreed offline, I'm updating the `GroupHashMetadata` records which are created elsewhere (rather than waiting to create them until the Seer results are available, or until we decide not to call Seer) because it's easier to reason about and because the cardinality here is low, given that more than 99% of events match to an existing group and therefore never hit this code. If, after the `GroupHashMetadata` MVP is done, we decide we need to optimize this, we can do so before GA.

- Not done here: Updating the backfill code to also store Seer results in grouphash metadata rather than on the group. This should be done before the next time we run a backfill, but given that the current backfill is already half-completed, having them in one place for new groups and a different (single) place for backfilled groups - while not ideal - seemed better than having them in one place for new groups and sometimes the same place but sometimes a different place for backfilled groups.
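
The first note above says Seer results can now be reached from any event in a group through the group's `GroupHash` records. A hedged sketch of that lookup with in-memory stand-ins; the `GroupHashRecord` class, its fields, and `seer_results_for_group` are hypothetical and only approximate the real models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupHashRecord:
    """Hypothetical stand-in for a GroupHash row plus its Seer metadata."""

    id: int
    group_id: int
    seer_model: Optional[str] = None  # None if this hash was never sent to Seer
    seer_matched_grouphash_id: Optional[int] = None
    seer_match_distance: Optional[float] = None


def seer_results_for_group(
    grouphashes: List[GroupHashRecord], group_id: int
) -> List[GroupHashRecord]:
    """Return the group's hashes that carry Seer data, whichever event produced them."""
    return [
        gh
        for gh in grouphashes
        if gh.group_id == group_id and gh.seer_model is not None
    ]


records = [
    GroupHashRecord(id=1, group_id=10, seer_model="v1"),
    GroupHashRecord(
        id=2,
        group_id=10,
        seer_model="v1",
        seer_matched_grouphash_id=1,
        seer_match_distance=0.02,
    ),
    GroupHashRecord(id=3, group_id=10),  # never sent to Seer
    GroupHashRecord(id=4, group_id=11, seer_model="v1"),  # different group
]
found_ids = [gh.id for gh in seer_results_for_group(records, 10)]
```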
@github-actions github-actions bot locked and limited conversation to collaborators Oct 16, 2024