Create endpoint for conflation of FMTM data with existing OSM data #1548

Closed
spwoodcock opened this issue Jun 5, 2024 · 12 comments
Assignees
Labels
backend Related to backend code effort:high Broader scope task with unclear timeline (consider splitting) enhancement New feature or request priority:high Should be addressed as a priority testing:ready Ready for testing

Comments

@spwoodcock
Member

spwoodcock commented Jun 5, 2024

Is your feature request related to a problem? Please describe.

  • https://github.com/hotosm/conflator requires OSM data in a local postgres instance.
  • We need an endpoint that can conflate FMTM submissions against existing OSM data.
  • It should take the final submission geojson from FMTM, including geometries and tags.
  • The conflation code should ideally return the certainty of the conflation:
    • No overlap (new): add geometry and tags with certain flag.
    • Exact building overlap: conflate tags and return geom, with certain flag.
    • Partial building overlap: return both geometries, including conflated tags on both, with uncertain flag.
  • The resulting GeoJSON will have geometries and tags, with a certainty attached, ready for manual validation via the FMTM frontend.
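
A minimal sketch of how the three cases above could map to a certainty flag, assuming the conflation step has already computed an overlap ratio between the FMTM and OSM footprints. The function name, the case labels and the 0.95 "exact" threshold are all hypothetical and would need tuning against real data:

```python
from typing import Literal

Certainty = Literal["certain", "uncertain"]

# Hypothetical threshold: ratios at or above this are treated as an exact overlap.
EXACT_OVERLAP = 0.95


def classify_overlap(overlap_ratio: float) -> tuple[str, Certainty]:
    """Map an overlap ratio (0.0 to 1.0) to a conflation case and certainty flag."""
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be between 0 and 1")
    if overlap_ratio == 0.0:
        # No overlap: the FMTM geometry is new.
        return "new", "certain"
    if overlap_ratio >= EXACT_OVERLAP:
        # Near-identical footprints: conflate tags, return one geometry.
        return "exact", "certain"
    # Anything in between needs a human to pick the geometry.
    return "partial", "uncertain"
```

The returned pair could be written into each feature's properties so the frontend can filter on the uncertain ones.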

Describe the solution you'd like

  • Option 1: host our own OSM database with updates (not ideal, wasted money), and create an endpoint.
  • Option 2: integrate the conflation with raw-data-api, which already has an existing postgres instance with updated OSM data (i.e. a new endpoint).

Discussing the long term solution with @kshitijrajsharma over the coming weeks.
The most achievable option now is something like https://github.com/kshitijrajsharma/osmconflator/blob/3ecf12b0d31773750cf1f806fd4547532f19d046/osmconflator/utils.py#L18

Process:

  • User requests conflation
  • Latest OSM data for the AOI is downloaded via raw-data-api
    • The data should include all geometry types and all tags.
    • The data is used for conflation against the FMTM geometries and tags.
  • The endpoint can return the certainties of conflation as described above.
  • The returned GeoJSON is used for visual validation, as described in the follow on issues in this milestone.

Additional considerations

  • How do we handle the case when the offset is so large that there is no building overlap (for matching buildings)?
  • Is this a likely scenario?
@spwoodcock
Member Author

For now this should be adapted from: https://github.com/kshitijrajsharma/osmconflator/blob/3ecf12b0d31773750cf1f806fd4547532f19d046/osmconflator/utils.py#L18

  • This would provide very basic conflation of the geometries.
  • At first, we could just merge all the tags.
  • The next step would be tag conflation, but most likely this will be handled in the near future in hotosm/conflator.
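
As a rough illustration of the "just merge all the tags" first step, here is a minimal sketch. It assumes (this is not decided above) that the field-collected FMTM value wins when both sides set the same key; `merge_tags` is a hypothetical helper name:

```python
def merge_tags(osm_tags: dict[str, str], fmtm_tags: dict[str, str]) -> dict[str, str]:
    """Naively merge two tag dicts.

    Assumption: on a key conflict the FMTM (field-collected) value wins.
    """
    merged = dict(osm_tags)   # copy so the inputs are not mutated
    merged.update(fmtm_tags)  # FMTM values overwrite OSM values
    return merged
```

For example, merging `{"building": "yes"}` from OSM with `{"building": "residential", "name": "Clinic"}` from FMTM keeps the FMTM `building` value and adds `name`.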

@spwoodcock
Member Author

spwoodcock commented Aug 2, 2024

More context on this after a lengthy internal discussion with team members.

OSM Refs & Raw-Data-API

  • OSM has nodes (points with lat/lng), ways (a collection of nodes that form a geometry), and relations (a collection of nodes/ways with a similar theme, not as relevant for our purposes).
  • Tags can be attached to a node, if the feature is a point feature, or to ways, if the feature is a polygon/linestring feature.
  • In OSM XML we have the concept of refs, which link nodes to ways:
    • A simplified node entry:
      <node id="298884269" lat="54.0901746" lon="12.2482632"/>
    • A simplified way entry (polygon):
      <way id="26659127">
        <nd ref="292403538"/>
        <nd ref="298884289"/>
        <nd ref="261755686"/>
        <nd ref="261728686"/>
        <nd ref="292403538"/>
      </way>
  • In raw-data-api, any nodes without a tag attached (i.e. anything that is not a point feature) are removed, along with the refs entries for ways. This leaves us with a collection of geometries (point, polyline, polygon), with attached osm_id and tags.
  • These refs could be useful for easier correlation of geometries and for generating a new OSM XML during conflation (they are essential if using JOSM to conflate). However, storing the refs in raw-data-api would amount to hundreds of GB of data, which is expensive in cloud databases. We need a workaround.
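
To make the node/way/ref relationship concrete, here is a minimal sketch that resolves `nd` refs into coordinates using only the standard library. The sample XML and the `ways_to_features` helper are made up for illustration. Treating every closed way as a Polygon is a simplification (closed ways can also be linear features such as roundabouts):

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the same shape as the simplified entries above.
OSM_XML = """
<osm version="0.6">
  <node id="1" lat="54.0" lon="12.0"/>
  <node id="2" lat="54.0" lon="12.1"/>
  <node id="3" lat="54.1" lon="12.1"/>
  <way id="100">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
</osm>
"""


def ways_to_features(osm_xml: str) -> list[dict]:
    """Convert ways to GeoJSON-like features, keeping the refs alongside."""
    root = ET.fromstring(osm_xml.strip())
    # Index node coordinates by id, as GeoJSON [lon, lat] pairs.
    coords = {
        node.get("id"): [float(node.get("lon")), float(node.get("lat"))]
        for node in root.iter("node")
    }
    features = []
    for way in root.iter("way"):
        refs = [nd.get("ref") for nd in way.iter("nd")]
        line = [coords[ref] for ref in refs if ref in coords]
        tags = {tag.get("k"): tag.get("v") for tag in way.iter("tag")}
        # Simplification: any closed way is treated as a Polygon.
        closed = len(refs) >= 4 and refs[0] == refs[-1]
        geometry = (
            {"type": "Polygon", "coordinates": [line]}
            if closed
            else {"type": "LineString", "coordinates": line}
        )
        features.append({
            "type": "Feature",
            "geometry": geometry,
            "properties": {"osm_id": int(way.get("id")), "refs": refs, "tags": tags},
        })
    return features
```

Keeping `refs` in the feature properties is one possible workaround for the conflation step only, since the XML is fetched fresh for a small AOI rather than stored long term.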

GeoJSON Conflation Workflow (OSM specific)

  • At the end of field mapping we have a GeoJSON file of geometries, with attached osm_id values for existing geometries, plus OSM tags that we converted from field mapping.
  • We are creating a small UI in FMTM for conflation in addition to an API to remove the need for JOSM.

Proposed workflow:

  • Download extract from raw-data-api, including OSM IDs (during project creation).
  • Carry out field mapping, collecting submissions / tags for geometries.
  • At the conflation step, download OSM XML for the AOI directly from OSM, also including OSM IDs.
    • The great thing about going straight to OSM is that the data is as up to date as it gets.
    • We can use the OSM API 0.6 GET /api/0.6/map?bbox=left,bottom,right,top endpoint to get all nodes, ways, relations for a bbox.
  • Convert the latest OSM XML to GeoJSON with OSM IDs attached.
    • First we should strip out any geometries not falling within our AOI (the BBOX-based download from OSM is not ideal and may extend outside of our irregularly shaped task area).
    • Then we need to keep a copy of the XML for later (as it has helpful node/way relations that will be lost in geojson conversion).
    • Alternatively we just use Overpass! 😅
    • Convert to GeoJSON for the conflation.
  • Conflate the FMTM GeoJSON with the OSM GeoJSON:
    • Attempt to match geometries on the OSM IDs.
    • We have three distinct workflows, described below.
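
The ID matching step could be sketched roughly as follows. It assumes both inputs are lists of GeoJSON-like features carrying `properties.osm_id`, and that new FMTM features either lack an `osm_id` or carry one that is not present in OSM. `partition_by_osm_id` is a hypothetical helper name:

```python
def partition_by_osm_id(fmtm_features: list[dict], osm_features: list[dict]):
    """Split features into three buckets by osm_id.

    Returns (new, matched, only_in_osm):
      new          FMTM features with no matching OSM id
      matched      (fmtm_feature, osm_feature) pairs sharing an id
      only_in_osm  OSM features that FMTM has no record of
    """
    osm_by_id = {f["properties"]["osm_id"]: f for f in osm_features}
    seen_ids = set()
    new, matched = [], []
    for feature in fmtm_features:
        osm_id = feature["properties"].get("osm_id")
        if osm_id in osm_by_id:
            matched.append((feature, osm_by_id[osm_id]))
            seen_ids.add(osm_id)
        else:
            new.append(feature)
    only_in_osm = [f for oid, f in osm_by_id.items() if oid not in seen_ids]
    return new, matched, only_in_osm
```

The `matched` pairs still need a footprint and tag comparison to decide whether they are unchanged, modified or flagged as deleted.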

New Geoms

For new geoms in the FMTM GeoJSON (ID in FMTM but not OSM, feature was added during field mapping), it is likely the field verified geometry will take precedence.

  • However we should first check if OSM contains a geometry that intersects with our new geometry (i.e. is in the place where we want our new one to be!)
    • We could use ST_Overlaps in PostGIS for this (or ST_Intersects, which also catches one geometry fully containing the other).
    • If there is no geometry conflict, we can accept the new FMTM geometry.
    • If there is a geometry conflict, we need to flag this in the same way as the modified geometries described below (likely requiring manual selection / user input).
  • If uploading a new geometry to OSM we need to generate the nodes for the XML here - smallest number necessary to make the way geometry.

We need to call the API with: PUT /api/0.6/[node|way|relation]/create with the required nodes and way.
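
On "smallest number necessary": a closed GeoJSON ring repeats its first coordinate at the end, but the way only needs one node per distinct vertex, with the first node ref repeated to close it (as in the way example earlier). A minimal sketch, where `ring_to_nodes` is a hypothetical helper and the returned refs are indexes into the node list rather than real OSM IDs:

```python
def ring_to_nodes(ring: list[list[float]]):
    """Return (unique_nodes, way_refs) for a closed GeoJSON ring.

    unique_nodes holds one dict per distinct vertex.
    way_refs holds indexes into unique_nodes, in ring order, so the first
    index appears again at the end to close the way.
    """
    unique_nodes = []
    index_of = {}
    way_refs = []
    for lon, lat in ring:
        key = (lon, lat)
        if key not in index_of:
            index_of[key] = len(unique_nodes)
            unique_nodes.append({"lat": lat, "lon": lon})
        way_refs.append(index_of[key])
    return unique_nodes, way_refs
```

A square ring with five coordinates yields four nodes and the refs `[0, 1, 2, 3, 0]`. The indexes would be swapped for the real node IDs returned by the create calls.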

Modified Geoms

Modified geoms (i.e. an OSM ID match, but differing geometry footprints) should be flagged and their percentage overlap determined, then picked up in the frontend later for manual verification of which geometry will be kept.

  • It is likely that the new geometry in OSM has been updated for a reason (i.e. better remote mapping using new high-resolution imagery), so this will likely take precedence.
    • We can also check that the version attribute is higher than the version we currently hold, to verify the update.
  • If it is determined the OSM geometry update was in error, and the FMTM geometry should be used, then it will be necessary to modify the lat/lon values of the nodes in the XML.
    • We can also use the timestamp and version to help identify whether this change was made after the FMTM project started, and so whether it is some sort of error.
    • We should also account for the digitization_correct field, but keep in mind that a 'field verified' geometry is simply a visual check, not measured with a measuring tape! An updated geometry in OSM may actually be more accurate if the digitiser used high resolution imagery.
  • If the FMTM geometry is the result of splitting a larger geometry into smaller ones, we should have a flag for this and account for it. The multiple FMTM geometries would replace the large OSM geometry.
  • If the FMTM geometry is the result of lumping smaller geoms into one larger one, we should have a flag for this and account for it. The single FMTM geometry would replace the multiple OSM geometries.

To update an existing geometry we need to call the API via PUT /api/0.6/[node|way|relation]/#id to update existing nodes if necessary, and update the way with tags.
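
The version and timestamp check described above might look like this minimal sketch. It assumes the data extract recorded the version of each feature at project creation, and that the project start is available as a timezone-aware datetime. Both function names are hypothetical:

```python
from datetime import datetime, timezone


def parse_osm_timestamp(value: str) -> datetime:
    """Parse an OSM timestamp such as "2024-08-02T10:15:00Z"."""
    # Swap the "Z" suffix so fromisoformat accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def osm_changed_since_project_start(
    extract_version: int,
    latest_version: int,
    latest_timestamp: str,
    project_start: datetime,
) -> bool:
    """True if OSM holds a newer version that was edited after the project began."""
    newer_version = latest_version > extract_version
    edited_after_start = parse_osm_timestamp(latest_timestamp) > project_start
    return newer_version and edited_after_start
```

For example, with a project started on 2024-06-05 (UTC), a feature going from version 1 to version 2 on 2024-08-02 would be reported as changed.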

Deleted Geoms

  • Deleted geoms have two possible scenarios:
    • There is an OSM ID match, but the FMTM geometry has a tag to say that the digitization_correct field suggests the building does not exist.
    • ID in OSM but not FMTM: the geometry was added after the FMTM project started, so we can't be sure if the field team missed it and it's valid, or the OSM feature creation was in error.
      • We should probably inform the validator ('This geometry is newly added in OSM, but was deleted in FMTM and the digitisation field verified. Should we delete it?')
    • Also check the date modified time of the OSM feature to help inform the decision.

For the actual deletion we need to call the API: DELETE /api/0.6/[node|way|relation]/#id to delete the way and all related nodes.

Needs further research!

Updating OSM via the API

  • First we need to open a changeset and get an ID returned. This describes the changes being made.
  • With the existing XML we can modify as required and call the API for actions we need:
    • Add new geom. E.g. for a node (no id is included, it's generated automatically):
      <osm>
          <node changeset="12" lat="..." lon="...">
      	    <tag k="note" v="Just a node"/>
      	    ...
          </node>
      </osm>
    • Update existing nodes/ways. E.g. for a node (we need to include the current version for each object, then it is automatically incremented by OSM):
      <osm>
          <node changeset="188021" id="4326396331" lat="50.4202102" lon="6.1211032" version="1" visible="true">
      	    <tag k="foo" v="barzzz" />
          </node>
      </osm>
    • Delete existing nodes/ways. E.g. for a node:
      <osm>
          <node id="..." version="..." changeset="..." lat="..." lon="..." />
      </osm>
  • We then need to close the changeset when we are done editing.
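
The three node payloads above share one shape, so they could be generated by a single helper. This is a minimal sketch using only the standard library. `build_node_payload` is a hypothetical name, and it only builds the XML body (no HTTP calls or authentication are shown):

```python
import xml.etree.ElementTree as ET


def build_node_payload(changeset_id, lat, lon, tags=None, node_id=None, version=None):
    """Build an <osm><node .../></osm> body for a create, update, or delete call.

    Leave node_id unset for a create. For an update or delete, pass both
    node_id and the current version of the node.
    """
    attrs = {"changeset": str(changeset_id), "lat": str(lat), "lon": str(lon)}
    if node_id is not None:
        if version is None:
            raise ValueError("version is required when node_id is given")
        attrs["id"] = str(node_id)
        attrs["version"] = str(version)
    osm = ET.Element("osm")
    node = ET.SubElement(osm, "node", attrs)
    for key, value in (tags or {}).items():
        ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(osm, encoding="unicode")
```

Calling `build_node_payload(12, 50.42, 6.12, {"note": "Just a node"})` gives a create body, while adding `node_id` and `version` gives the body for an update or delete.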

@spwoodcock
Member Author

The above comment may change some of the API we have already designed unfortunately! Sorry @Sujanadh 🙏 Let's look into this together when we get a chance

@spwoodcock spwoodcock added effort:high Broader scope task with unclear timeline (consider splitting) priority:high Should be addressed as a priority and removed effort:medium Likely a day or two labels Aug 2, 2024
@rsavoye
Contributor

rsavoye commented Aug 2, 2024

I think you can skip the "Download extract from raw-data-api, including OSM IDs.", since in the next step you are doing the same thing, just differently. The conflation software converts the OSM XML internally to GeoJson anyway, and later converts the results back to OSM XML. A GeoJson file is generated too. OSM XML is important for JOSM. Otherwise if you edit the GeoJson file, at least in JOSM, you need to manually cut & paste the tags from the external dataset into the OSM layer. Creating a changeset is a good idea, and could eliminate the need for an OSM XML file and JOSM. It'd be easy to take the list of GeoJson features after validation and generate a changeset. I've basically been using JOSM as my UI.

I have seen that the external datasets for highways can be split at intersections, since the traced line may not understand where the surface/smoothness/name changes. But if you are ground-truthing, this is a common thing.

@spwoodcock
Member Author

By "Download extract from raw-data-api" I mean the initial data extract download during project creation - we need the geometries to have something to map!

I think we need a conflation flow designed from the ground up to be independent of JOSM:

  • We shouldn't let restrictions from one tool determine the development of a potentially more efficient or easier conflation flow.
  • JOSM is quite antiquated now, and although powerful, doesn't provide a user experience at the level we would provide in a web UI.
  • Saving the requirement for yet another tool as part of the FMTM stack would be nice!

@rsavoye
Contributor

rsavoye commented Aug 3, 2024

Sure, you can just use the GeoJson format if you're going to upload via changeset. Is this UI part of FMTM or a separate program? JOSM is not antiquated, it is under active development, and many advanced mappers use it. And it can work fully offline, critical in the field. Not everything needs to be a website. Since the conflation code returns a list of GeoJson features, converting to OSM XML is separate, so you can ignore that if you want.

@charliemcgrady
Collaborator

charliemcgrady commented Aug 6, 2024

Hi Sam, great writeup. Adding a couple of questions below.

  1. For new geometries, will there be a step which ensures an OSM building has not been added since the FMTM building was collected? ID in FMTM but not OSM implies no spatial join will be performed to ensure duplicate buildings are not added to OSM.
  2. Will multi-polygons be supported for addition/modification/deletion? In this case, the API will need to handle building relations as well.
  3. Are there plans to handle more complex merge conflicts during manual validation? For rural areas, it's likely sufficient to either take the OSM or the FMTM buildings, as the conflicts will be more isolated. However, denser regions where buildings are either connected or close to each other may lead to more complex conflicts which require a more sophisticated merge process.
Screenshot 2024-08-05 at 7 58 54 PM

@spwoodcock
Member Author

spwoodcock commented Aug 6, 2024

Hi @charliemcgrady! Thanks so much for the input, it's really appreciated 🙏

  1. Excellent point - I have updated to add the check for new geometries here.

Note

I should preface the answers to Q2 & Q3 by saying that the approach listed above is definitely a more naive (v1) approach.

Our main goal for FMTM (at least to start, based on user requirements) is to map regions that have typically been poorly mapped, mainly in developing countries. So as you say, this normally means much sparser geometries and an easier merge.

Once we nail down this simple conflation, we will move onto other conflation requirements based on user needs.

  2. Although MultiPolygons are less common in data extracts I have seen in poorly mapped areas, we definitely need to handle this! At the start we could do a similar approach to Polygons, where we form a GeoJSON from the downloaded XML (using the relations as you say), then attempt to match the footprints of the FMTM MultiPolygon with the current OSM MultiPolygon.

  3. This is an important question & a tricky one. When we start factoring in building types common in many developed city centres (terraces, blocks, etc), things get messy! We are definitely helped out by the sparseness of data we typically encounter. But this will 💯 have to be addressed with some more thought!

@spwoodcock
Member Author

spwoodcock commented Aug 6, 2024

Quick update on this based on discussions with @kshitijrajsharma who architected raw-data-api!

Regarding point 2, on MultiPolygon (+ also MultiLineString).

Relations / Multi-Geoms in OSM (background info)

  • In OSM there is no concept of geometries, so MultiPolygons do not exist.
  • The main criteria for identifying a MultiPolygon are:
    • The OSM entry is a relation.
    • It has a tag type=multipolygon or type=boundary.
    • Also if we identify multiple ways inside the relation area, it's likely a multipolygon (requires additional processing though).
  • We can use other tags such as route=x to determine MultiLineStrings.
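
The first two criteria can be checked directly on the relation element in the OSM XML. A minimal sketch, where `is_multipolygon` is a hypothetical helper. It does not attempt the "multiple ways inside the relation area" heuristic, which needs geometry processing:

```python
import xml.etree.ElementTree as ET

# Values of the "type" tag that mark a relation as a MultiPolygon candidate.
MULTIPOLYGON_TYPES = {"multipolygon", "boundary"}


def is_multipolygon(element: ET.Element) -> bool:
    """True if the element is a relation tagged type=multipolygon or type=boundary."""
    if element.tag != "relation":
        return False
    tags = {tag.get("k"): tag.get("v") for tag in element.findall("tag")}
    return tags.get("type") in MULTIPOLYGON_TYPES
```

A relation tagged only with `route=x` returns False here and would be handled by a separate MultiLineString check.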

How raw-data-api handles multi-geoms

How to use this info during conflation

  • The conversion to geometries is already done by raw-data-api.
  • When we get the up-to-date OSM XML during the conflation step, we need to process the XML in the same way, to identify the MultiPolygon / MultiLineString.
  • We can then match the geometry types together based on the osm_id and do a footprint comparison as described above.

@rsavoye
Contributor

rsavoye commented Aug 6, 2024

I have seen buildings as MultiPolygons in OSM. In some countries there is a large courtyard in the middle, and the building wraps around it. So when conflating, I ignore the inner polygons. If a data extract for ODK Collect is used, then conflation is relatively easy as we have the OSM ID. Then it's just merging tags together. If there is a building in the basemap used for the location, or the GPS, but it's not in OSM, then it's a new feature. For FMTM, nobody is mapping building polygons with Collect that I'm aware of... so no spatial conflation is needed of the Polygons. Spatial conflation is more for building imports, not field mapping with ODK. Also ODK collected data is just a single node as well, so no Polygon to conflate with. Currently the conflation code supports conflating a single node with a nearby building for the cases where you aren't using a data extract, or there is a building in OSM with only "building=yes" from remote mapping.

Also don't forget conflating highways & waterways. When I tried a data extract of highways in Collect, I can still select it and answer the survey questions cause I don't need the geometry. Once again, just the tags. In a lot of remote areas the OSM feature only has "highway=track", and I want to add surface, smoothness, tracktype, and width to improve navigation.

Where conflation gets interesting is with external datasets not from ODK, so I don't think FMTM would be involved.

@rsavoye
Contributor

rsavoye commented Aug 6, 2024

Btw, I've got a whole doc on conflating with ODK field collected data for more detail:
https://hotosm.github.io/osm-merge/odkconflation/

@rsavoye
Contributor

rsavoye commented Aug 28, 2024

I need to figure out why the images don't appear, but I just wrote a doc on conflating highways. While FMTM isn't mapping highways (yet), I believe it's on the roadmap.
https://hotosm.github.io/osm-merge/highways/. Right now it's focused on remote US roads in national forests, but could easily be extended for other countries.


5 participants