Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[text analytics] PII updates for v5.1.0b6 #17038

Merged
merged 8 commits into from
Mar 5, 2021

Conversation

abhahn
Copy link
Member

@abhahn abhahn commented Mar 3, 2021

fixes #16174

@abhahn
Copy link
Member Author

abhahn commented Mar 3, 2021

TODO: update tests

@@ -7,6 +7,12 @@
- Renamed properties `aspect` and `opinions` to `target` and `assessments` respectively in class `MinedOpinion`.
- Renamed classes `AspectSentiment` and `OpinionSentiment` to `TargetSentiment` and `AssessmentSentiment` respectively.

**New Features**

- Added parameter `pii_entity_categories` to the `recognize_pii_entities` client method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @abhahn I actually made a gist for this design, here it is, sorry I should've 100% clarified this with you. js also did a pii entity pr that aligns with this (idt they added an enum though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've update things to align with the gist above.

@@ -829,7 +835,7 @@ def begin_analyze_batch_actions( # type: ignore
except ValueError as error:
if "API version v3.0 does not have operation 'begin_analyze'" in str(error):
raise ValueError(
"'begin_analyze_batch_actions' endpoint is only available for API version v3.1-preview.3"
"'begin_analyze_batch_actions' endpoint is only available for API version v3.1-preview.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for pii categories too? You can also add it to samples if you think it's necessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a couple of tests. One thing I'm still unsure about is why using domain=phi seems to recognize all entities, even the ones that, according to the official docs, are not usually supposed to be returned as PHI. Since the interaction between domain and pii_categories is an intersection, I'd eventually like to write a test to show that there is at least one combination that will cause a list of zero entities to be returned, but I need to know which entities are not supposed to be classified as PHI.

For samples, we may want to consider adding a new one to show this interaction between domain and categories, but seems like it doesn't fit well with the existing sample. Let me know if this is something we should do for this release or if we should wait to get more info from the service team about the issue I mentioned above about PHI.

@abhahn abhahn requested a review from iscai-msft March 4, 2021 17:19
Copy link
Contributor

@iscai-msft iscai-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good, thanks so much @abhahn !

@@ -10,6 +10,9 @@

**New Features**

- Added parameter `categories_filter` to the `recognize_pii_entities` client method.
- Added `categries_filter` property to class `RecognizePiiEntitiesAction`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v tiny nit: categories_filter

@@ -1410,13 +1410,21 @@ class RecognizePiiEntitiesAction(DictMixin):
:keyword str model_version: The model version to use for the analysis.
:keyword str domain_filter: An optional string to set the PII domain to include only a
subset of the PII entity categories. Possible values include 'phi' or None.
:keyword categories_filter: A list of specific PII entity categories to return. If the value of `domain_filter`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if it's worth to mention domain_filter in this docstring. Following UX studies I've been trying to put more examples in the docstrings, so the docstring could be something like

:keyword categories_filter: Instead of filtering over all PII entity categories, you can pass in a list of the specific PII entity categories you want to filter out. For example, if you only want to filter out U.S. social security numbers in a document, you can pass in `[PiiEntityCategoryType. US_SOCIAL_SECURITY_NUMBER]` for this kwarg.


self.assertEqual(len(result[0].entities), 1)
entity = result[0].entities[0]
self.assertEqual(entity.category, PiiEntityCategoryType.US_SOCIAL_SECURITY_NUMBER.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test look great, and I think you've already mentioned this offline. But just for posterity, can you also add tests for pii-categories to the analyze endpoint? thanks!

@iscai-msft iscai-msft merged commit b6d2979 into Azure:master Mar 5, 2021
iscai-msft added a commit to abhahn/azure-sdk-for-python that referenced this pull request Mar 5, 2021
…into ta-v5.1.0b6-analyze

* 'master' of https://github.com/Azure/azure-sdk-for-python: (24 commits)
  [text analytics] PII updates for v5.1.0b6 (Azure#17038)
  Fix bug where imported matrix parameter duplicates are not overrided (Azure#17126)
  Add NULL to readme (Azure#17076)
  Sas batching (Azure#17133)
  dropping py3.5 (Azure#17127)
  [EventHubs] 5.3.1 update changelog (Azure#17132)
  [text analytics] assertions (Azure#17098)
  add recordings (Azure#17125)
  [core] add HttpRequest and HttpResponse reprs (Azure#16972)
  [text analytics] add actual normalized_text tests (Azure#17123)
  update samples to use actual role names (Azure#17124)
  Sync eng/common directory with azure-sdk-tools for PR 1448 (Azure#17085)
  Enable APIView status check (Azure#17107)
  Fix PackageName typo (Azure#17109)
  Move SetTestPipelineVersion.ps1 to eng/common (Azure#17103)
  Fix paths for non-windows agents (Azure#17096)
  [Communication] - Identity - Update README (Azure#17091)
  Rename CertificateCredential's certificate_bytes -> certificate_data (Azure#17090)
  fix shared reqs (Azure#17095)
  [translation] initial library (Azure#16837)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[text analytics] PII updates for v3.1-preview.4
2 participants