From d94e2b7c88bb488d0a81405cdf293b033547250a Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 13:56:21 -0400
Subject: [PATCH 01/10] Make categories an optional field for place

Regarding the longstanding NULL category issue from OvertureMaps/tf-place-data#123

Reporting back on the issue of 2.5M Overture places with missing categories, which needs a fix in advance of General Availability.

**tl;dr - I want to change the schema to allow for NULL categories.**

My first finding is that roughly half of these places come from Microsoft and half come from Meta. Someone from Microsoft should investigate and/or fix their half.

Meta's half is missing categories because those places only have extremely generic internal categories, such as "Local", "Product Service", and "Health Beauty". These are too generic to be meaningful, so they don't exist in the category mapping. For a large fraction of these places, it seems they have the category merely so that they can be classified as a place. (In our internally-enforced category hierarchy, some categories are disallowed from being places. "Local" is the most generic category that is still allowed to be a place.)

So our options are:

1) add a mapping to some inappropriate category for the generic internal categories
2) add these new generic categories to the Overture category ontology
3) change the schema to allow categories to be NULL
4) remove all category-less places from the release

I like option 3. As Jeff Underwood reported, the quality of these category-less places is quite low (which matches my investigation today), so I'm OK with removing them. But some of them are good, and with better signals collected on them we could classify them better. I prefer to keep the option of collecting signals on them and then improving them, which I view as a large part of why we're in the Overture project to begin with.

I discussed this with @jwass and @jenningsanderson, who supported making categories into an optional field. In particular, @jwass liked how keeping them there invites feedback for actually assigning an appropriate category... and elevates the importance of having a system to collect and respond to that feedback.
---
 examples/places/place-null-categories.yaml | 31 ++++++++++++++++++++++
 schema/places/place.yaml                   |  1 -
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 examples/places/place-null-categories.yaml

diff --git a/examples/places/place-null-categories.yaml b/examples/places/place-null-categories.yaml
new file mode 100644
index 00000000..92d0a869
--- /dev/null
+++ b/examples/places/place-null-categories.yaml
@@ -0,0 +1,31 @@
+geometry:
+  coordinates:
+    - 0
+    - 0
+  type: Point
+id: overture:places:place:1
+properties:
+  addresses:
+    - freeform: 770 Broadway, Floor 8
+      locality: New York
+    - country: US
+      freeform: '770 Broadway #802'
+      locality: New York
+      region: US-NY
+  brand:
+    name: Example
+    wikidata: Q1000
+  categories: NULL
+  emails:
+    - info@example.com
+  phones:
+    - +32 1207
+  socials:
+    - https://www.twitter.com/example
+  theme: places
+  type: place
+  update_time: '2024-06-12T12:20:08-06:00'
+  version: 1
+  websites:
+    - https://www.example.com
+type: Feature
diff --git a/schema/places/place.yaml b/schema/places/place.yaml
index f3f69b3e..d948efdc 100644
--- a/schema/places/place.yaml
+++ b/schema/places/place.yaml
@@ -17,7 +17,6 @@ properties:
       - "$ref": https://geojson.org/schema/Point.json
   properties:
     unevaluatedProperties: false
-    required: [categories]
     allOf:
       - "$ref": ../defs.yaml#/$defs/propertyContainers/overtureFeaturePropertiesContainer
       - "$ref": ../defs.yaml#/$defs/propertyContainers/namesContainer

From 8bce68a1e487762c6396b9d8fc4ab5f0077d963d Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 14:08:31 -0400
Subject: [PATCH 02/10] Don't require main in place's categories

---
 schema/places/place.yaml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/schema/places/place.yaml b/schema/places/place.yaml
index d948efdc..0d9b2bba 100644
--- a/schema/places/place.yaml
+++ b/schema/places/place.yaml
@@ -26,7 +26,6 @@ properties:
           The categories of the place. Complete list is available on
           GitHub: https://github.com/OvertureMaps/schema/blob/main/task-force-docs/places/overture_categories.csv
         type: object
-        required: [primary]
         properties:
           primary:
             description: The primary or main category of the place.

From b3f46e55acad4879545d4d4044e4d838d1b4d52d Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 14:17:30 -0400
Subject: [PATCH 03/10] Require categories but allow categories.main to be NULL

---
 examples/places/place-null-categories.yaml | 3 ++-
 schema/places/place.yaml                   | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/examples/places/place-null-categories.yaml b/examples/places/place-null-categories.yaml
index 92d0a869..71fc073d 100644
--- a/examples/places/place-null-categories.yaml
+++ b/examples/places/place-null-categories.yaml
@@ -15,7 +15,8 @@ properties:
   brand:
     name: Example
     wikidata: Q1000
-  categories: NULL
+  categories:
+    main: NULL
   emails:
     - info@example.com
   phones:
diff --git a/schema/places/place.yaml b/schema/places/place.yaml
index 0d9b2bba..9a026369 100644
--- a/schema/places/place.yaml
+++ b/schema/places/place.yaml
@@ -17,6 +17,7 @@ properties:
       - "$ref": https://geojson.org/schema/Point.json
   properties:
     unevaluatedProperties: false
+    required: [categories]
     allOf:
       - "$ref": ../defs.yaml#/$defs/propertyContainers/overtureFeaturePropertiesContainer
       - "$ref": ../defs.yaml#/$defs/propertyContainers/namesContainer

From 9d916336f32be4a93eb628aa5578cc3f6ba190e1 Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 14:20:31 -0400
Subject: [PATCH 04/10] Missing categories.main counterexample now permissible

---
 .../places/bad-categories-missing-primary.yaml | 17 -----------------
 1 file changed, 17 deletions(-)
 delete mode 100644 counterexamples/places/bad-categories-missing-primary.yaml

diff --git a/counterexamples/places/bad-categories-missing-primary.yaml b/counterexamples/places/bad-categories-missing-primary.yaml
deleted file mode 100644
index 8049c93e..00000000
--- a/counterexamples/places/bad-categories-missing-primary.yaml
+++ /dev/null
@@ -1,17 +0,0 @@
----
-id: overture:places:place:1
-type: Feature
-geometry:
-  type: Point
-  coordinates: [0, 0]
-properties:
-  ext_expected_errors: ["missing properties: 'primary'"]
-  theme: places
-  type: place
-  version: 1
-  update_time: "2024-06-12T12:41:30-06:00"
-  names:
-    primary: Fancy POI missing primary category property
-  categories:
-    alternate:
-      - some_category

From a0c4c0fd1cafca616b8b3369c9ccb8643f27cee9 Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 14:35:20 -0400
Subject: [PATCH 05/10] main to primary

---
 examples/places/place-null-categories.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/places/place-null-categories.yaml b/examples/places/place-null-categories.yaml
index 71fc073d..49932696 100644
--- a/examples/places/place-null-categories.yaml
+++ b/examples/places/place-null-categories.yaml
@@ -16,7 +16,7 @@ properties:
     name: Example
     wikidata: Q1000
   categories:
-    main: NULL
+    primary: NULL
   emails:
     - info@example.com
   phones:

From 05dd0fefd24b035306a478202279b2b48d1e21cf Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 14:42:43 -0400
Subject: [PATCH 06/10] Allow places.primary to be NULL

---
 schema/places/place.yaml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/schema/places/place.yaml b/schema/places/place.yaml
index 9a026369..69e2d135 100644
--- a/schema/places/place.yaml
+++ b/schema/places/place.yaml
@@ -30,7 +30,9 @@ properties:
         properties:
           primary:
             description: The primary or main category of the place.
-            "$ref": "./defs.yaml#/$defs/typeDefinitions/category"
+            oneOf:
+              - "$ref": "./defs.yaml#/$defs/typeDefinitions/category"
+              - "type": "null"
           alternate:
             description:
               Alternate categories of the place. Some places might fit into two

From cb03e498cbe3d759e66f2a8364fe14de267ff630 Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Tue, 2 Jul 2024 14:48:57 -0400
Subject: [PATCH 07/10] Require categories.primary but allow it to be NULL

---
 schema/places/place.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/schema/places/place.yaml b/schema/places/place.yaml
index 69e2d135..c10403b0 100644
--- a/schema/places/place.yaml
+++ b/schema/places/place.yaml
@@ -27,6 +27,7 @@ properties:
           The categories of the place. Complete list is available on
           GitHub: https://github.com/OvertureMaps/schema/blob/main/task-force-docs/places/overture_categories.csv
         type: object
+        required: [primary]
         properties:
           primary:
             description: The primary or main category of the place.

From e2c847a87e47c1b5b7c7c367c25f262cc3af400a Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Mon, 8 Jul 2024 09:30:49 -0400
Subject: [PATCH 08/10] Allow categories to be null, but require categories.primary if categories is non-NULL

---
 schema/places/place.yaml | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/schema/places/place.yaml b/schema/places/place.yaml
index c10403b0..d948efdc 100644
--- a/schema/places/place.yaml
+++ b/schema/places/place.yaml
@@ -17,7 +17,6 @@ properties:
       - "$ref": https://geojson.org/schema/Point.json
   properties:
     unevaluatedProperties: false
-    required: [categories]
     allOf:
       - "$ref": ../defs.yaml#/$defs/propertyContainers/overtureFeaturePropertiesContainer
       - "$ref": ../defs.yaml#/$defs/propertyContainers/namesContainer
@@ -31,9 +30,7 @@ properties:
         properties:
           primary:
             description: The primary or main category of the place.
-            oneOf:
-              - "$ref": "./defs.yaml#/$defs/typeDefinitions/category"
-              - "type": "null"
+            "$ref": "./defs.yaml#/$defs/typeDefinitions/category"
           alternate:
             description:
               Alternate categories of the place. Some places might fit into two

From ad3905be8eb9c1bb09dfb34f70cd1ead886b468f Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Mon, 8 Jul 2024 09:31:28 -0400
Subject: [PATCH 09/10] Set categories counter-example to have NULL categories.

---
 examples/places/place-null-categories.yaml | 2 --
 1 file changed, 2 deletions(-)

diff --git a/examples/places/place-null-categories.yaml b/examples/places/place-null-categories.yaml
index 49932696..9857fc33 100644
--- a/examples/places/place-null-categories.yaml
+++ b/examples/places/place-null-categories.yaml
@@ -15,8 +15,6 @@ properties:
   brand:
     name: Example
     wikidata: Q1000
-  categories:
-    primary: NULL
   emails:
     - info@example.com
   phones:

From cfc5ad4041612aef50787a27a94bd82eeb3ceb1a Mon Sep 17 00:00:00 2001
From: Bobby Fortanely
Date: Mon, 8 Jul 2024 10:30:20 -0400
Subject: [PATCH 10/10] Re-add missing categories counter-example, based on places schema refinement

---
 .../places/bad-categories-missing-primary.yaml | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 counterexamples/places/bad-categories-missing-primary.yaml

diff --git a/counterexamples/places/bad-categories-missing-primary.yaml b/counterexamples/places/bad-categories-missing-primary.yaml
new file mode 100644
index 00000000..8049c93e
--- /dev/null
+++ b/counterexamples/places/bad-categories-missing-primary.yaml
@@ -0,0 +1,17 @@
+---
+id: overture:places:place:1
+type: Feature
+geometry:
+  type: Point
+  coordinates: [0, 0]
+properties:
+  ext_expected_errors: ["missing properties: 'primary'"]
+  theme: places
+  type: place
+  version: 1
+  update_time: "2024-06-12T12:41:30-06:00"
+  names:
+    primary: Fancy POI missing primary category property
+  categories:
+    alternate:
+      - some_category
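
Reviewer note: the net effect of the ten patches on `categories`, as the hunks above read, is that the key may be omitted entirely, but when it is present it must be an object that contains `primary`. The sketch below restates that rule in plain Python so the example and counterexample can be compared side by side. It is only an illustration under stated assumptions: `check_place_categories` is a hypothetical helper, not part of the schema repository, and it does not check `primary` against the `category` definition in `defs.yaml`, which these patches do not show.

```python
from typing import Any


def check_place_categories(properties: dict[str, Any]) -> list[str]:
    """Return validation errors for the `categories` field of a place.

    Hypothetical helper that mirrors the rule left in place.yaml after
    this series; it is not the project's validator.
    """
    errors: list[str] = []
    if "categories" not in properties:
        # `categories` is no longer listed under `required`, so omitting it is valid.
        return errors
    categories = properties["categories"]
    if not isinstance(categories, dict):
        # The schema still declares `type: object`, so an explicit null is rejected.
        errors.append("categories must be an object")
        return errors
    if "primary" not in categories:
        # `required: [primary]` applies whenever `categories` is present.
        errors.append("missing properties: 'primary'")
    return errors


# Mirrors examples/places/place-null-categories.yaml (no `categories` key).
no_categories = {"theme": "places", "type": "place", "version": 1}
# Mirrors counterexamples/places/bad-categories-missing-primary.yaml.
missing_primary = {"categories": {"alternate": ["some_category"]}}

print(check_place_categories(no_categories))
print(check_place_categories(missing_primary))
```

Under this reading, a producer with no usable category should drop the `categories` key rather than emit `categories: NULL` or `primary: NULL`, which matches what patch 09 does to the example file.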