Explore output-specific small number suppression #759

Open
ccunningham101 opened this issue Mar 16, 2022 · 2 comments

Comments

@ccunningham101

Current small number suppression may be overly stringent in some cases.
For decile charts, for example, we only care that there are at least n practices (or other group-by variable) per decile, rather than each practice having at least n events.
Louis has prototyped some decile charts suppression code here
This could be an opportunity to experiment with a reusable redaction action that would run between the generation of the measures files and their use.
The reusable action could incorporate #559, #560 and #561.

@sebbacon
Contributor

I like this idea!

@LFISHER7

In many cases, the redaction can happen directly after the measures files are generated. My normal approach uses this function to first suppress low numbers and then redact further values if the total number of suppressed values is <=5. This reduces the risk of secondary disclosure whilst keeping as many true values as possible. It doesn't protect against small number differences between months, as I have generally used it to measure events occurring within each month. If you were measuring "ever had a vaccine" each month, you would need to protect against differencing of the cumulative values.
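To make the two-step idea concrete, here is a minimal sketch. It is not the function linked above. It assumes "total number of suppressed values" means the sum of the suppressed counts, that zeros are treated like any other small number, and that the next-smallest remaining count is the one redacted. The name `suppress_small_counts` is hypothetical.

```python
from typing import List, Optional


def suppress_small_counts(counts: List[int], threshold: int = 5) -> List[Optional[int]]:
    """Return the counts with redacted entries replaced by None.

    Assumption: this is one reading of the two-step approach described
    above, not the project's own implementation.
    """
    result: List[Optional[int]] = list(counts)

    # Step 1 (primary suppression): redact every count at or below the threshold.
    suppressed_total = 0
    for i, count in enumerate(counts):
        if count <= threshold:
            result[i] = None
            suppressed_total += count

    # Step 2 (secondary suppression): if anything was redacted but the redacted
    # counts sum to the threshold or less, keep redacting the smallest
    # remaining count until the suppressed total exceeds the threshold.
    if any(value is None for value in result):
        remaining = sorted(
            (i for i, value in enumerate(result) if value is not None),
            key=lambda i: counts[i],
        )
        for i in remaining:
            if suppressed_total > threshold:
                break
            result[i] = None
            suppressed_total += counts[i]

    return result
```

With counts of 3, 10, 20 and 50, the 3 is suppressed first. Because the suppressed total (3) is still <=5, the 10 is redacted as well, leaving 20 and 50 as true values.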

As the above can get quite tricky once you start to think about secondary disclosure, the prevailing way to redact measures has been to first redact numbers <=5 and then round to the nearest 5 (or 7 in the vaccine report). This protects against primary disclosure and provides some protection against secondary disclosure (including preventing differencing of cumulative counts between months).
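The redact-then-round rule is simple enough to sketch in a few lines. The function name `redact_and_round` and its parameters are hypothetical, and representing a redacted value as `None` is an assumption.

```python
from typing import Optional


def redact_and_round(count: int, threshold: int = 5, base: int = 5) -> Optional[int]:
    """Redact counts <= threshold, otherwise round to the nearest multiple of base.

    Assumption: a redacted value is represented as None.
    """
    if count <= threshold:
        return None
    # Integer arithmetic avoids floating point; exact halves (only possible
    # for an even base) round up.
    return (count + base // 2) // base * base


# Cumulative counts of 11 and 13 in consecutive months become 10 and 15,
# so the true month-on-month difference of 2 cannot be read off the output.
monthly = [redact_and_round(c) for c in (4, 11, 13)]
```

Passing `base=7` gives the rounding used in the vaccine report, though whether that report also redacts at <=5 is not stated here.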

On the decile chart redaction: this is one case where the measures file is grouped by a high-cardinality variable (and hence quite likely to have small counts) that is aggregated later, so redacting early can reduce utility more significantly. It's not that we don't care about low counts here; rather, it only makes sense to produce a decile chart by practice if there are enough events! So a couple of extra checks would be needed if the plan was to automate it.
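As a rough illustration of what such checks might look like (the thread does not say what they would actually be), the sketch below tests two conditions before a decile chart is drawn: enough practices to populate each decile, and enough events overall. The function name, the input shape and both default thresholds are hypothetical.

```python
from typing import Dict


def can_plot_deciles(
    events_by_practice: Dict[str, int],
    min_practices_per_decile: int = 5,   # hypothetical threshold
    min_total_events: int = 100,         # hypothetical threshold
) -> bool:
    """Return True if a practice-level decile chart looks safe to produce.

    Assumption: splitting the practices into ten equal groups, each group
    needs at least `min_practices_per_decile` practices, and the measure
    needs at least `min_total_events` events in total.
    """
    n_practices = len(events_by_practice)
    total_events = sum(events_by_practice.values())
    enough_practices = n_practices >= 10 * min_practices_per_decile
    enough_events = total_events >= min_total_events
    return enough_practices and enough_events
```

A check like this would run once per time period on the unredacted measures file, which is why it does not fit well with redacting each practice's count up front.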

3 participants