-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow places's categories to be NULL (although categories.primary must still exist if any categories.alternate are present) #231
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In regards to the longstanding NULL category issue from OvertureMaps/tf-place-data#123 Reporting back on the longstanding issue of 2.5M overture places with missing categories, needing a fix in advance of General Availability **tl;dr - I want to change the schema to allow for NULL categories.** My first finding is that ~half come from microsoft, and half come from Meta. Someone from Microsoft should investigate and/or fix their half. In terms of why Meta's half is missing, it's because they only have extremely generic categories that aren't specific enough to be meaningful, and so those categories don't exist in the category mapping. They're generic like "Local", "Product Service", and "Health Beauty" - so generic that they're essentially meaningless. In some large fraction of places, it seems like they have this category merely so that they can be classified as a place. (In our internally-enforced category hierarchy, some categories are disallowed from being places. "Local" is the most generic category that is still allowed to be a place.) So our options are: 1) add a mapping to some inappropriate category for the generic internal categories 2) add these new generic categories to the Overture category ontology 3) change the schema to allow categories to be NULL 4) remove all category-less places from the release I like option 3. As Jeff Underwood reported, the quality of these category-less places is quite low (which matches my investigation today), so I'm ok to remove them, but some of them are good and with better signals collected on them then we could classify them even better. I prefer to maintain the option to collect signals on them to then improve them, which I view as a large portion of why we're in the Overture project to begin with. I discussed this with @jwass and @jenningsanderson , who supported making categories into an optional field. In particular, @jwass liked how keeping them there invites feedback for actually assigning an appropriate category... and elevates the importance of having a system to collect and respond to that feedback.
Dritte
requested review from
DavidKarlas,
jwass,
jenningsanderson,
vcschapp,
JanCallewaert-TomTom,
jensgoossens-tomtom and
TristanDiet-TomTom
July 2, 2024 18:53
jwass
reviewed
Jul 2, 2024
vcschapp
reviewed
Jul 3, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked in schema TF.
Will follow names
pattern, allow omitting whole categories
property. So categories
becomes not required, but categories.primary
is required.
jwass
approved these changes
Jul 8, 2024
jenningsanderson
approved these changes
Jul 8, 2024
Dritte
changed the title
Allow places's categories.primary to be NULL
Allow places's categories to be NULL (although categories.primary must still exist if any categories.alternate are present)
Jul 8, 2024
TristanDiet-TomTom
approved these changes
Jul 10, 2024
DavidKarlas
approved these changes
Jul 10, 2024
mojodna
approved these changes
Jul 10, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Reporting back on the longstanding issue of 2.5M overture places with missing categories, needing a fix in advance of General Availability
tl;dr - I want to change the schema to allow for NULL categories.
My first finding is that ~half come from microsoft, and half come from Meta. Someone from Microsoft should investigate and/or fix their half.
In terms of why Meta's half is missing, it's because they only have extremely generic categories that aren't specific enough to be meaningful, and so those categories don't exist in the category mapping.
They're generic like "Local", "Product Service", and "Health Beauty" - so generic that they're essentially meaningless. In some large fraction of places, it seems like they have this category merely so that they can be classified as a place. (In our internally-enforced category hierarchy, some categories are disallowed from being places. "Local" is the most generic category that is still allowed to be a place.)
So our options are:
I like option 3. As Jeff Underwood reported, the quality of these category-less places is quite low (which matches my investigation today), so I'm ok to remove them, but some of them are good and with better signals collected on them then we could classify them even better. I prefer to maintain the option to collect signals on them to then improve them, which I view as a large portion of why we're in the Overture project to begin with.
I discussed this with @jwass and @jenningsanderson , who supported making categories into an optional field. In particular, @jwass liked how keeping them there invites feedback for actually assigning an appropriate category... and elevates the importance of having a system to collect and respond to that feedback.
Reference
In regards to the longstanding NULL category issue from https://github.com/OvertureMaps/tf-place-data/issues/123
This was discussed and aligned on in the place engineering meeting on July 2, 2024 https://github.com/OvertureMaps/tf-place-data/issues/142
Testing
Added new test case and removed obsolete counterexample.
schema tester in GitHub action succeeded.
Checklist
Checklist of tasks commonly-associated with schema pull requests. Please review the relevant checklists and ensure you do all the tasks that are required for the change you made.
A
but is not intended to test propertyA
's validity, and you made a schema change that invalidates propertyA
in that counterexample, fix the counterexample to align it with your schema change.Documentation Website
Update the hyperlink below to put the pull request number in.
[Docs preview for this PR.](https://dfhx9f55j8eg5.cloudfront.net/pr/<PUT THE PR # HERE>)