Allow adding custom values to telemetry masking #80
Comments
So is the idea that you pass a list of strings to replace with something like "REDACTED"? We of course don't want to leak PII, but additional bloat to the module is also something we want to avoid. |
Exactly. I think that as much masking logic as is reasonable should be here, since it is shared so widely. However, I'm sure there will always be cases where the masking logic would miss something, and if it's hyper-specific, it would be better for that to live in the caller's code rather than bloat here. Could the parameter also accept |
The litmus test for whether the code should be here or with the caller could basically be--is the thing being masked likely to only come from that extension, or is it reasonable to think it may come from all of them? |
Yeah, I almost want to say this specific feature is out of scope, as most people won't use masking and the people who do are likely to have a custom telemetry service that wraps this module. |
That's fair, though I would say my motivation is chiefly to make the path of least resistance also the good path (i.e. the most privacy-respecting path possible). If it's very easy for consumers of this package to mask PII, they're more likely to do it. |
I think #93 fixes this. |
Thanks @lramos15, @aeisenberg, I took a look at the PR and this is great! One question; is there a way to alter/add new replacements after the reporter is constructed? |
No. That wasn't something I considered. It did not seem to fit in with the way the TelemetryReporter is implemented. Do you think that's a feature users will want? For my use case, all replacements have been hard-coded. |
@bwateratmsft Not at this time, no. I think the general idea was that whatever you're trying to filter is most likely a static known set, or at least known before you start sending telemetry. This design decision makes a lot of sense in my opinion, because you wouldn't want to send some telemetry unfiltered and then later add filters that would not retroactively apply to the previously sent events. |
Currently we are adding masks after the fact in our custom logic, but the regexes we are adding are pretty narrowly scoped (e.g. to the exact Azure subscription ID of the user). It's possible that more broadly applicable, statically defined regexes would cover our needs. I'll investigate. |
I mean, the telemetry reporters don't really have state. In theory you could instantiate one per Azure subscription and just cache telemetry until events are ready to be filtered. This feels a bit hacky though. |
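For illustration, the "cache telemetry until events are ready to be filtered" idea could be sketched roughly as follows. This is only a sketch under stated assumptions: `BufferedReporter`, `setFilter`, and the `send` callback are hypothetical names invented here, not part of vscode-extension-telemetry.

```typescript
// Hypothetical wrapper, not an API of vscode-extension-telemetry.
type Properties = Record<string, string>;
type Filter = (properties: Properties) => Properties;
type Sender = (eventName: string, properties: Properties) => void;

class BufferedReporter {
  private pending: Array<{ eventName: string; properties: Properties }> = [];
  private filter: Filter | undefined = undefined;
  private readonly send: Sender;

  constructor(send: Sender) {
    // `send` stands in for whatever actually transmits the event.
    this.send = send;
  }

  sendEvent(eventName: string, properties: Properties): void {
    if (this.filter === undefined) {
      // No filter yet: hold the event so nothing leaves unfiltered.
      this.pending.push({ eventName, properties });
    } else {
      this.send(eventName, this.filter(properties));
    }
  }

  setFilter(filter: Filter): void {
    this.filter = filter;
    // Flush everything that was cached, applying the filter first.
    for (const event of this.pending) {
      this.send(event.eventName, filter(event.properties));
    }
    this.pending = [];
  }
}
```

The cost of this approach is that events are delayed, and are lost if the filter is never set, which is part of why it feels hacky.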
Hmm @aeisenberg, I'm writing tests for this now and I think I completely blanked on this really only being a key mask and not a value one. So you replace entries with the key "foo" with a replacement string. This doesn't work for the use case of wanting to replace, let's say, some PII in a property, because you would always just be redacting, say, the property "foo", and in that case why not just not send that property? I don't think this is what you need @bwateratmsft |
Here, we redact the value of a key if that key matches a regex supplied by the user, or that key is deleted if no replacement value is provided. Is your suggestion that users may want to redact the key instead of the value? How would that work if multiple keys match the regex? And what happens to the value? Perhaps I'm misunderstanding what you are asking for. |
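A minimal sketch of the key-based behaviour described in this comment, for readers following along. The names `KeyReplacement`, `lookup`, `replacementString`, and `redactByKey` are assumptions made for illustration and may not match what the PR actually implements.

```typescript
// Hypothetical option shape, assumed for illustration only.
interface KeyReplacement {
  lookup: RegExp; // tested against each property key (avoid the `g` flag here)
  replacementString?: string; // when omitted, a matching key is deleted
}

function redactByKey(
  properties: Record<string, string>,
  replacements: KeyReplacement[],
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(properties)) {
    let matched: KeyReplacement | undefined = undefined;
    for (const candidate of replacements) {
      if (candidate.lookup.test(key)) {
        matched = candidate; // first matching rule wins
        break;
      }
    }
    if (matched === undefined) {
      result[key] = properties[key]; // no rule matched: keep as-is
    } else if (matched.replacementString !== undefined) {
      result[key] = matched.replacementString; // replace the whole value
    }
    // Otherwise the key matched and has no replacement, so it is dropped.
  }
  return result;
}
```

In this sketch, if several keys match the same regex, each of them is redacted or deleted independently.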
My suggestion is that users want to redact values, not keys. So let's say you want to make sure an IP doesn't end up in your telemetry; then you could use an IPv4 regex such as |
Thanks for the clarification. This was not the implementation I had in mind when I created the PR. I was only considering keys. I do see how it can be useful to ensure privacy. It probably wouldn't be too complex to apply the same lookup and replacement to the values, but there would be some slightly different behaviour. If the lookup matches a key, then the entire value is replaced or removed. If the lookup matches a value, then the portion that is matched is replaced or removed (possibly with There could also be two separate parameters, one for key lookups and replacements, and one for values. Personally, I think it's simpler for users to have a single parameter to describe both. What do you think? |
@aeisenberg Could your use case be done by just value redaction? My worry is that overloading the use case could be surprising. The current key redaction is surprising to me since keys are almost always known so I wonder if it's surprising to others. |
I'm sure we could do something. The problem is that Application Insights adds a bunch of keys that we know we don't want, which we have no other way of controlling. So it is safer to delete those keys instead of relying on patterns. Eg- for Can you think of another way of achieving the same goal? |
Thanks for the feedback @aeisenberg . Maybe then just a boolean to apply the regex to the key vs value is best. We should probably default to value since the privacy case is more common. Would this work for you? |
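One way the boolean proposed here could look, sketched under assumptions: the flag name `isKeyPattern`, the `Replacement` shape, and `applyReplacements` are hypothetical, and the IPv4-like pattern in the usage note is deliberately simplified. Value matching is the default; key matching replaces or removes the whole entry.

```typescript
// Hypothetical option shape; names are assumptions, not the module's API.
interface Replacement {
  lookup: RegExp;
  replacementString?: string;
  isKeyPattern?: boolean; // default (false/undefined): match against values
}

function applyReplacements(
  properties: Record<string, string>,
  replacements: Replacement[],
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(properties)) {
    let value: string | undefined = properties[key];
    for (const r of replacements) {
      if (value === undefined) {
        break; // entry already deleted by an earlier key rule
      }
      r.lookup.lastIndex = 0; // guard against stateful `g`/`y` regexes
      if (r.isKeyPattern) {
        if (r.lookup.test(key)) {
          value = r.replacementString; // undefined deletes the entry
        }
      } else {
        // Only the matched portion is replaced (or removed if no replacement).
        // Use the `g` flag on `lookup` to replace every occurrence.
        const substitute = r.replacementString === undefined ? "" : r.replacementString;
        value = value.replace(r.lookup, () => substitute);
      }
    }
    if (value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}
```

For example, passing `{ lookup: /\b\d{1,3}(\.\d{1,3}){3}\b/g, replacementString: "REDACTED" }` would mask IPv4-looking substrings inside any value, while `{ lookup: /^foo$/, isKeyPattern: true }` would drop the `foo` property entirely.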
I'd be fine with that. |
The vscode-azureextensionui package has a feature allowing callers to add some custom values to a list that ought to be masked out of telemetry events. You can see that here: https://github.com/microsoft/vscode-azuretools/blob/main/ui/src/masking.ts
Ideally all masking logic should occur here in vscode-extension-telemetry, nothing downstream. I think it is a useful feature in the somewhat uncommon cases where a telemetry caller knows something should be masked that the shared telemetry library couldn't possibly predict.