Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow places's categories to be NULL (although categories.primary must still exist if any categories.alternate are present) #231

Merged
merged 10 commits into from
Jul 10, 2024

Conversation

Dritte
Copy link
Contributor

@Dritte Dritte commented Jul 2, 2024

Description

Reporting back on the longstanding issue of 2.5M overture places with missing categories, needing a fix in advance of General Availability

tl;dr - I want to change the schema to allow for NULL categories.

My first finding is that ~half come from microsoft, and half come from Meta. Someone from Microsoft should investigate and/or fix their half.

In terms of why Meta's half is missing, it's because they only have extremely generic categories that aren't specific enough to be meaningful, and so those categories don't exist in the category mapping.

They're generic like "Local", "Product Service", and "Health Beauty" - so generic that they're essentially meaningless. In some large fraction of places, it seems like they have this category merely so that they can be classified as a place. (In our internally-enforced category hierarchy, some categories are disallowed from being places. "Local" is the most generic category that is still allowed to be a place.)

So our options are:

  1. add a mapping to some inappropriate category for the generic internal categories
  2. add these new generic categories to the Overture category ontology
  3. change the schema to allow categories to be NULL
  4. remove all category-less places from the release

I like option 3. As Jeff Underwood reported, the quality of these category-less places is quite low (which matches my investigation today), so I'm ok to remove them, but some of them are good and with better signals collected on them then we could classify them even better. I prefer to maintain the option to collect signals on them to then improve them, which I view as a large portion of why we're in the Overture project to begin with.

I discussed this with @jwass and @jenningsanderson , who supported making categories into an optional field. In particular, @jwass liked how keeping them there invites feedback for actually assigning an appropriate category... and elevates the importance of having a system to collect and respond to that feedback.

Reference

In regards to the longstanding NULL category issue from https://github.com/OvertureMaps/tf-place-data/issues/123

This was discussed and aligned on in the place engineering meeting on July 2, 2024 https://github.com/OvertureMaps/tf-place-data/issues/142

Testing

Added new test case and removed obsolete counterexample.

schema tester in GitHub action succeeded.

Checklist

Checklist of tasks commonly-associated with schema pull requests. Please review the relevant checklists and ensure you do all the tasks that are required for the change you made.

  1. Add relevant examples.
  2. Add relevant counterexamples.
  3. Update any counterexamples that became obsolete. For example, if a counterexample uses property A but is not intended to test property A's validity, and you made a schema change that invalidates property A in that counterexample, fix the counterexample to align it with your schema change.
  4. Update in-schema documentation using plain English written in complete sentences, if an update is required.
  5. Update Docusaurus documentation, if an update is required.
  6. Review change with Overture technical writer to ensure any advanced documentation needs will be taken care of, unless the change is trivial and would not affect the documentation.

Documentation Website

Update the hyperlink below to put the pull request number in.

[Docs preview for this PR.](https://dfhx9f55j8eg5.cloudfront.net/pr/<PUT THE PR # HERE>)

In regards to the longstanding NULL category issue from OvertureMaps/tf-place-data#123

Reporting back on the longstanding issue of 2.5M overture places with missing categories, needing a fix in advance of General Availability

**tl;dr - I want to change the schema to allow for NULL categories.**

My first finding is that ~half come from microsoft, and half come from Meta. Someone from Microsoft should investigate and/or fix their half.

In terms of why Meta's half is missing, it's because they only have extremely generic categories that aren't specific enough to be meaningful, and so those categories don't exist in the category mapping.

 They're generic like "Local", "Product Service", and "Health Beauty" - so generic that they're essentially meaningless. In some large fraction of places, it seems like they have this category merely so that they can be classified as a place. (In our internally-enforced category hierarchy, some categories are disallowed from being places. "Local" is the most generic category that is still allowed to be a place.)

So our options are:
1) add a mapping to some inappropriate category for the generic internal categories
2) add these new generic categories to the Overture category ontology
3) change the schema to allow categories to be NULL
4) remove all category-less places from the release

I like option 3. As Jeff Underwood reported, the quality of these category-less places is quite low (which matches my investigation today), so I'm ok to remove them, but some of them are good and with better signals collected on them then we could classify them even better. I prefer to maintain the option to collect signals on them to then improve them, which I view as a large portion of why we're in the Overture project to begin with.

I discussed this with @jwass and @jenningsanderson , who supported making categories into an optional field. In particular, @jwass liked how keeping them there invites feedback for actually assigning an appropriate category... and elevates the importance of having a system to collect and respond to that feedback.
Copy link
Collaborator

@vcschapp vcschapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked in schema TF.

Will follow names pattern, allow omitting whole categories property. So categories becomes not required, but categories.primary is required.

schema/places/place.yaml Outdated Show resolved Hide resolved
@Dritte Dritte changed the title Allow places's categories.primary to be NULL Allow places's categories to be NULL (although categories.primary must still exist if any categories.alternate are present) Jul 8, 2024
@jenningsanderson jenningsanderson merged commit 7451a4f into dev Jul 10, 2024
2 checks passed
@jenningsanderson jenningsanderson deleted the places-null-category branch July 10, 2024 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants