Allow adding custom values to telemetry masking #80

bwateratmsft · 2021-11-17T17:32:39Z

The vscode-azureextensionui package has a feature allowing callers to add some custom values to a list that ought to be masked out of telemetry events. You can see that here: https://github.com/microsoft/vscode-azuretools/blob/main/ui/src/masking.ts

Ideally all masking logic should occur here in vscode-extension-telemetry, nothing downstream. I think it is a useful feature in the somewhat uncommon cases where a telemetry caller knows something should be masked that the shared telemetry library couldn't possibly predict.

The text was updated successfully, but these errors were encountered:

lramos15 · 2021-11-17T22:03:28Z

So is the idea that you pass a list of strings to replace with like "REDACTED". We of course don't want to leak PII but additional bloat to the module is also something we want to avoid

bwateratmsft · 2021-11-17T22:11:16Z

Exactly. I think that as much masking logic as is reasonable should be here, since it is shared so widely. However, I'm sure there will always be cases where the masking logic would miss something, and if it's hyper-specific, it would be better for that to live in the caller's code rather than bloat here.

Could the parameter also accept RegExp's, in addition to strings?

bwateratmsft · 2021-11-17T22:14:32Z

The litmus test for whether the code should be here or with the caller could basically be--is the thing being masked likely to only come from that extension, or is it reasonable to think it may come from all of them?

lramos15 · 2021-11-17T23:09:18Z

Yeah I almost want to say this feature in specific is out of scope as most people won't use masking and people who do are likely to have a custom telemetry service that wraps this module.

bwateratmsft · 2021-11-18T14:39:12Z

That's fair, though I would say my motivation is chiefly to make the path of least resistance also the good path (i.e. the most privacy-respecting path possible). If it's very easy for consumers of this package to mask PII, they're more likely to do it.

lramos15 · 2022-04-05T13:46:19Z

I think #93 fixes this.

bwateratmsft · 2022-04-05T14:19:03Z

Thanks @lramos15, @aeisenberg, I took a look at the PR and this is great! One question; is there a way to alter/add new replacements after the reporter is constructed?

aeisenberg · 2022-04-05T14:22:26Z

No. That wasn't something I considered. It did not seem to fit in with the way the TelemetryRepoter is implemented. Do you think that's a feature users will want? For my use case, all replacements have been hard coded.

lramos15 · 2022-04-05T14:22:54Z

@bwateratmsft Not at this time no. I think the general idea was that whatever you're trying to filter is most likely a static known set or at least known before you start sending telemetry. This design decision makes a lot of sense in my opinion because you wouldn't want to sends some telemetry unfiltered and then later add filters which would not retroactively apply to the previously sent events.

bwateratmsft · 2022-04-05T14:24:46Z

Currently we are adding masks after-the-fact in our custom logic but the regexs we are adding are pretty narrowly-scoped (e.g. to the exact Azure subscription ID of the user). It's possible that more broadly-applicable, statically-defined regexs would cover our needs. I'll investigate.

lramos15 · 2022-04-05T14:49:54Z

Currently we are adding masks after-the-fact in our custom logic but the regexs we are adding are pretty narrowly-scoped (e.g. to the exact Azure subscription ID of the user).

I mean the telemetry reporters don't have state really. In theory you could instantiate one per azure subscription and just cache telemetry until events are ready to be filtered. This feels a bit hacky though

lramos15 · 2022-05-02T19:13:42Z

Hmm @aeisenberg I'm writing tests for this now and I think I completely blanked on this really only being a key mask and not a value one. So you replace entries with the key "foo" with a replacement string. This doesn't work for the use case of wanting to replace let's say some PII in a property because you would always just be redacting say the property "foo" and in that case why not just not send that property? I don't think this is what you need @bwateratmsft

aeisenberg · 2022-05-02T19:50:55Z

Here, we redact the value of a key if that key matches a regex supplied by the user, or that key is deleted if no replacement value is provided.

https://github.com/microsoft/vscode-extension-telemetry/pull/93/files#diff-ba58f348990cdb73794c3b09904e7a51fed5a9373e2aebfcbe87046206b99591R43-R49

Is your suggestion that users may want to redact the key instead of the value? How would that work if multiple keys match the regex? And what happens to the value? Perhaps I'm misunderstanding what you are asking for.

lramos15 · 2022-05-02T19:58:45Z

Is your suggestion that users may want to redact the key instead of the value? How would that work if multiple keys match the regex? And what happens to the value? Perhaps I'm misunderstanding what you are asking for.

My suggestion is that users want to redact values not keys. So lets say you want to make sure an IP doesn't end up in your telemetry then you could use an ipv4 regex such as (\b25[0-5]|\b2[0-4][0-9]|\b[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3} and regardless of the keys it would replace it with the replacement string. Maybe an optional full replacement so {myProperty: "Failed to connect to 127.0.0.1"} would become {myProperty :"Failed to connect to redacted"} or full replacement {myProperty: "redacted"}. I understand that this isn't the current implementation and this is more what I had in mind at the time but believe I did not articulate it properly or catch it during my review. Reading back it is very clear this was your intent and makes sense for the data tags AI adds when you use the keys but most of the data that is being sent by the user is known keys with unknown values via the object {properties: {...}, measures: {...}} and therefore PII might enter there that they want to redact. The module does its best to redact this but of course every extension is different. Let me know your thoughts on this approach.

aeisenberg · 2022-05-02T21:26:51Z

Thanks for the clarification. This was not the implementation I had in mind when I created the PR. I was only considering keys. I do see how it can be useful to ensure privacy.

It probably wouldn't be too complex to apply the same lookup and replacement to the values, but there would be some slightly different behaviour. If the lookup matches a key, then the entire value is replaced or removed. If the lookup matches a value, then the portion that is matched is replaced or removed (possibly with ****).

There could also be two separate parameters, one for key lookups and replacements, and one for values. Personally, I think it's simpler for users to have a single parameter to describe both. What do you think?

lramos15 · 2022-05-27T18:24:47Z

@aeisenberg Could your use case be done by just value redaction? My worry is that overloading the use case could be surprising. The current key redaction is surprising to me since keys are almost always known so I wonder if it's surprising to others.

aeisenberg · 2022-05-27T18:52:53Z

I'm sure we could do something. The problem is that application insights adds a bunch of keys that we know we don't want, which we have no other way of controlling. So it is safer to delete those keys instead of relying on patterns. Eg- for ai.device.osPlatform we would need to figure out what all the potential values are and write a regex that matches them. It's not impossible, but it just seems more straight forward to delete the key.

Can you think of another way of achieving the same goal?

lramos15 · 2022-05-31T17:47:19Z

Can you think of another way of achieving the same goal?

Thanks for the feedback @aeisenberg . Maybe then just a boolean to apply the regex to the key vs value is best. We should probably default to value since the privacy case is more common. Would this work for you?

aeisenberg · 2022-05-31T17:53:03Z

I'd be fine with that.

This was referenced Nov 17, 2021

Reduce aggressiveness of masking logic microsoft/vscode-azuretools#1025

Closed

Mask out username from events #81

Open

maskValue is corrupting telemetry data seemingly at random microsoft/vscode-docker#3324

Closed

lramos15 mentioned this issue Mar 22, 2022

Exposes the addTelemetryProcessor method #56

Closed

lramos15 closed this as completed Apr 5, 2022

bwateratmsft mentioned this issue Apr 5, 2022

Use new built-in telemetry masking microsoft/vscode-azuretools#1104

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow adding custom values to telemetry masking #80

Allow adding custom values to telemetry masking #80

bwateratmsft commented Nov 17, 2021

lramos15 commented Nov 17, 2021

bwateratmsft commented Nov 17, 2021

bwateratmsft commented Nov 17, 2021

lramos15 commented Nov 17, 2021

bwateratmsft commented Nov 18, 2021

lramos15 commented Apr 5, 2022

bwateratmsft commented Apr 5, 2022

aeisenberg commented Apr 5, 2022

lramos15 commented Apr 5, 2022

bwateratmsft commented Apr 5, 2022

lramos15 commented Apr 5, 2022

lramos15 commented May 2, 2022

aeisenberg commented May 2, 2022

lramos15 commented May 2, 2022

aeisenberg commented May 2, 2022

lramos15 commented May 27, 2022

aeisenberg commented May 27, 2022

lramos15 commented May 31, 2022

aeisenberg commented May 31, 2022

Allow adding custom values to telemetry masking #80

Allow adding custom values to telemetry masking #80

Comments

bwateratmsft commented Nov 17, 2021

lramos15 commented Nov 17, 2021

bwateratmsft commented Nov 17, 2021

bwateratmsft commented Nov 17, 2021

lramos15 commented Nov 17, 2021

bwateratmsft commented Nov 18, 2021

lramos15 commented Apr 5, 2022

bwateratmsft commented Apr 5, 2022

aeisenberg commented Apr 5, 2022

lramos15 commented Apr 5, 2022

bwateratmsft commented Apr 5, 2022

lramos15 commented Apr 5, 2022

lramos15 commented May 2, 2022

aeisenberg commented May 2, 2022

lramos15 commented May 2, 2022

aeisenberg commented May 2, 2022

lramos15 commented May 27, 2022

aeisenberg commented May 27, 2022

lramos15 commented May 31, 2022

aeisenberg commented May 31, 2022