PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation

Implementation of the PQMass two-sample test from Lemos et al. 2024.

Install

Just do:

pip install pqm

Usage

This is the main use case:

from pqm import pqm_pvalue, pqm_chi2
import numpy as np

x_sample = np.random.normal(size = (500, 10))
y_sample = np.random.normal(size = (400, 10))

# To get pvalues from PQMass
pvalues = pqm_pvalue(x_sample, y_sample, num_refs = 100, re_tessellation = 50)
print(np.mean(pvalues), np.std(pvalues))

# To get chi^2 from PQMass
chi2_stat = pqm_chi2(x_sample, y_sample, num_refs = 100, re_tessellation = 50)
print(np.mean(chi2_stat), np.std(chi2_stat))

If your two samples are drawn from the same distribution, the p-values should be uniformly distributed on (0, 1). A very small value (e.g., 1e-6) therefore means the null hypothesis is rejected: the two samples are not drawn from the same distribution. If you get values approximately equal to 1 every time, that suggests potential duplication of samples between x_sample and y_sample.
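One way to check the uniformity described above is to measure how far the empirical distribution of the p-values sits from uniform(0, 1). The sketch below is only an illustration: the helpers `ks_uniform_statistic` and `summarize_pvalues` are hypothetical names, not part of pqm, and the input lists are made-up stand-ins for the output of `pqm_pvalue`.

```python
def ks_uniform_statistic(pvalues):
    """Kolmogorov-Smirnov distance between the empirical CDF of
    `pvalues` and the uniform(0, 1) CDF (0 = perfect match, 1 = worst)."""
    p = sorted(pvalues)
    n = len(p)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, v in enumerate(p, start=1):
        # Compare the uniform CDF (which equals v) with the empirical CDF
        # just after (i / n) and just before ((i - 1) / n) the i-th point.
        d = max(d, i / n - v, v - (i - 1) / n)
    return d


def summarize_pvalues(pvalues, alpha=0.05):
    """Hypothetical summary: distance from uniform plus tail fractions."""
    n = len(pvalues)
    return {
        "ks_uniform": ks_uniform_statistic(pvalues),
        "frac_below_alpha": sum(v < alpha for v in pvalues) / n,
        "frac_above_1_minus_alpha": sum(v > 1 - alpha for v in pvalues) / n,
    }


# Made-up inputs standing in for the output of pqm_pvalue
uniform_like = [(i + 0.5) / 50 for i in range(50)]  # evenly spread on (0, 1)
rejected = [1e-6] * 50                              # every test rejects

print(summarize_pvalues(uniform_like))
print(summarize_pvalues(rejected))
```

A small `ks_uniform` with tail fractions near `alpha` is consistent with the null hypothesis. A value close to 1 with all p-values piled up near 0 points to different distributions, and all p-values piled up near 1 points to duplicated samples.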

For the chi^2 metric, if your two sets of samples come from the same distribution, the histogram of your chi^2 values should follow the chi^2 distribution with DoF = num_refs - 1 degrees of freedom. The peak of this distribution is at DoF - 2, the mean is DoF, and the standard deviation is sqrt(2 * DoF). If your chi^2 values are too high (chi^2 / DoF well above 1), it suggests that the samples are out of distribution. Conversely, if the values are too low (chi^2 / DoF well below 1), it indicates potential duplication of samples between x_sample and y_sample (i.e., memorization for generative models).
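The reference values quoted above are easy to compute for a given num_refs, which helps when reading the printed mean and standard deviation of the chi^2 values. The following is a minimal sketch: `chi2_reference` and `reduced_chi2_summary` are hypothetical helper names, not part of pqm, and the chi^2 values in the example are made up.

```python
import math


def chi2_reference(num_refs):
    """Expected peak, mean and standard deviation of the chi^2
    distribution with DoF = num_refs - 1 (assumes num_refs >= 3)."""
    dof = num_refs - 1
    return {
        "dof": dof,
        "mode": dof - 2,              # peak of the distribution
        "mean": float(dof),
        "std": math.sqrt(2 * dof),
    }


def reduced_chi2_summary(chi2_values, num_refs):
    """Compare the mean of observed chi^2 values with the reference."""
    ref = chi2_reference(num_refs)
    mean_chi2 = sum(chi2_values) / len(chi2_values)
    return {
        "reduced_chi2": mean_chi2 / ref["dof"],
        # Offset of the observed mean, in units of the single-draw std
        "offset_in_std": (mean_chi2 - ref["mean"]) / ref["std"],
    }


# With num_refs = 100, as in the usage example: DoF = 99
print(chi2_reference(100))

# Made-up chi^2 values standing in for the output of pqm_chi2
print(reduced_chi2_summary([95.0, 104.0, 98.0], num_refs=100))
```

A `reduced_chi2` near 1 matches the same-distribution case. Values well above 1 correspond to the out-of-distribution case, and values well below 1 to the duplication case.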

Developing

If you're a developer then:

git clone git@github.com:Ciela-Institute/PQM.git
cd PQM
git checkout -b my-new-branch
pip install -e .

Please open an issue first so we can discuss implementation ideas.
