Partition summary for embeddings #12

Open
vahuja4 opened this issue Jul 20, 2023 · 1 comment

Comments


vahuja4 commented Jul 20, 2023

Interesting approach for drift detection! For embeddings, is the partition summary the same as the one described at https://dm4ml.github.io/gate/how-it-works/ (listed below), or do you take other factors into account?

- coverage: The fraction of the column that has non-null values.
- mean: The mean of the column.
- p50: The median of the column.
- num_unique_values: The number of unique values in the column.
- occurrence_ratio: The count of the most frequent value divided by the total count.
- p95: The 95th percentile of the column.

shreyashankar (Contributor) commented

The partition summary includes the summary statistics listed above, for each dimension of the embeddings!
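
To make "for each dimension" concrete, here is a minimal sketch of computing those six statistics per embedding dimension. It is not gate's actual implementation or API: the names `summarize_embeddings` and `percentile` are hypothetical, missing values are assumed to be `None`, the percentiles use simple linear interpolation, and "total count" in `occurrence_ratio` is assumed to mean the number of non-null values.

```python
from collections import Counter
from statistics import fmean, median


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize_embeddings(embeddings):
    """Return one dict of summary statistics per embedding dimension.

    `embeddings` is a list of equal-length vectors; None marks a missing value.
    """
    n_rows = len(embeddings)
    summaries = []
    for column in zip(*embeddings):  # each column is one embedding dimension
        valid = sorted(v for v in column if v is not None)
        if not valid:
            # Nothing to summarize when every value in the dimension is missing.
            summaries.append({"coverage": 0.0})
            continue
        most_common_count = Counter(valid).most_common(1)[0][1]
        summaries.append({
            "coverage": len(valid) / n_rows,
            "mean": fmean(valid),
            "p50": median(valid),
            "num_unique_values": len(set(valid)),
            # Assumption: "total count" means the number of non-null values.
            "occurrence_ratio": most_common_count / len(valid),
            "p95": percentile(valid, 95),
        })
    return summaries


# Four 2-dimensional embeddings; the second one is missing its last value.
example = [[1.0, 0.0], [1.0, None], [3.0, 2.0], [5.0, 4.0]]
summaries = summarize_embeddings(example)
```

Each entry of `summaries` corresponds to one dimension, so a 768-dimensional embedding column would yield 768 such summaries per partition under this sketch.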
