Are the pois module and footprints module redundant? #478

Closed
gboeing opened this issue May 26, 2020 · 21 comments · Fixed by #542 or #549

Comments

@gboeing
Owner

gboeing commented May 26, 2020

Over time, the pois and footprints modules have expanded their functionality and grown more similar to each other. I believe they are approximately redundant now and should be merged together to simplify the codebase.

The following methods are approximately equivalent:

import osmnx as ox
ox.config(use_cache=True, log_console=True)
place = 'Emeryville, CA, USA'

# use pois module
po = ox.pois_from_place(place, tags={'building':True})
po.shape #(1117, 73)

# use footprints module
fp = ox.footprints_from_place(place)
fp.shape #(1127, 55)

Minor differences would have to be reconciled between these modules (especially the detailed handling of relations in footprints) to account for the differences between the resulting GeoDataFrames.

@juliamgomes

Hey! Made a google draw diagram of the call stack / code duplications to visualize the issue:

[Image: diagram of the footprints and pois call stacks]

How are you envisioning the final interface? For example, we could replace footprints_from_address and pois_from_address with geodataframe_from_address that takes in the union of the params in the two original functions. Or alternatively leave the interface unchanged and add helper functions to contain duplicate code.

The final calls at the end for _create_footprints_gdf and _create_poi_gdf are quite different. Curious if these should also be merged to provide a _create_gdf function that can handle both footprints and poi.

@gboeing
Owner Author

gboeing commented Jun 11, 2020

Thanks! I'm grateful to have you taking a crack at this. A few ideas...

The pois module at this point essentially offers a generalization of the footprints module, though they diverge in how they handle relations in building geometry objects. Ideally, the footprints_from_x functions would just become thin wrappers that merely return pois_from_x(tags={'building':True}). So the interface would either remain unchanged, or the footprints user functions could eventually be deprecated.
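As a rough illustration of the thin-wrapper idea, the sketch below delegates a footprints call to a POI call with a single tag. The `pois_from_place` defined here is only a stand-in stub that echoes its arguments (the real function queries OSM), and the `footprint_type` default is an assumption, so this shows the delegation pattern rather than the actual OSMnx code.

```python
def pois_from_place(place, tags):
    """Stand-in stub for the real query function; it only echoes its arguments."""
    return {"place": place, "tags": tags}


def footprints_from_place(place, footprint_type="building"):
    """Thin wrapper: a footprint query is just a POI query for one tag key."""
    return pois_from_place(place, tags={footprint_type: True})


result = footprints_from_place("Emeryville, CA, USA")
```

With a wrapper like this, any relation or geometry handling would only need to live in one place, which is why the footprints logic would have to migrate into pois first.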

Either way, this would require migrating some of the relation/geometry handling code from footprints -> pois. For example, when handling complex multipolygons with inner holes, the footprints module currently is more robust than the pois module. At least, I think. At this point they do approximately the same thing in very different ways, which I haven't had the time to slowly dig through and unravel. If you can wrap your head around their internal functions' similarities, overlap, and differences that'd be really helpful.

Then we can rethink how they handle relations/complex geometries and migrate/merge their functionality.

See also: https://wiki.openstreetmap.org/wiki/Relation:multipolygon

@AtelierLibre
Contributor

Just to add to the conversation...

Originally, as I understood it, there was more separation between the two - the POIs module returned points of interest and the buildings module returned building footprints as a nice to have visual reference for the network analysis. When I worked on the buildings module to generalise it to footprints and add multipolygon support I realised that there had to be a lot of overlap as you have to go through the points -> lines -> polygons to get to the multipolygons.

The footprints module already returns at least some line geometry if you set retain_invalid=True:

import osmnx as ox
gdf = ox.footprints_from_point((3.8594,11.5200), 2000, retain_invalid=True)
gdf['geometry'].type.unique()

Not that I'm suggesting it, but I guess it would return points as well if the tags were added to them in the main _responses_to_dicts() for loop and they weren't filtered out later on.

Just to pick up on @juliamgomes point I suspect there are a lot of people using OSMnx to get OSM geometry (whether points, lines or (multi)polygons) into GeoPandas without necessarily engaging with the networkx side at all. So it makes sense to me to aim for geodataframe_from_ functions sitting alongside graph_from_ functions. This would get away from the links to specific geometries that both names currently suggest but I accept you would then end up with two sets of thin layers instead of one.

I'm not familiar with the POI code so I can't comment on which is more robust / a better place to start from but at a glance it looks cleaner and can make more specific queries which are both positives.

@gboeing
Owner Author

gboeing commented Jun 11, 2020

Thanks @AtelierLibre.

Originally, as I understood it, there was more separation between the two

That's right. When OSMnx was originally developed, it was merely designed as a "connector" between networkx and OpenStreetMap (hence the name). Soon after, I whipped up the footprints module as a quick and dirty way to add in building footprints to analyses for visual reference and simple morphology analysis. Obviously OSMnx's uses and functionality have evolved much since then.

I suspect there are a lot of people using OSMnx to get OSM geometry

Yes. And I should have been more detailed in my previous response to @juliamgomes... deprecating/replacing the pois_from_x and footprints_from_x functions with a new gdf_from_x function that queries OSM for geometries (rather than topological graph creation) may be the right move in the end. But, note that it would require some other cascading deprecations because gdf_from_place and gdf_from_places already exist in the boundaries module. That said, this may be a good opportunity to rethink that too. I had recently been considering renaming the boundaries module as a new geocode module and moving utils_geo.geocode into it. In that case, perhaps gdf_from_place and gdf_from_places could be deprecated/replaced with a geocode_to_gdf function in that geocode module.

At that point, we could have new gdf_from_x functions that replace pois_from_x and footprints_from_x as generic OSM geometry query -> geopandas GeoDataFrame functions. This would all look something like:

  • geocode module that replaces current boundaries module and includes functions:
    • geocode_to_gdf (replaces current gdf_from_place and gdf_from_places functions)
    • geocode (moved from current location in utils_geo module)
  • geometries module that generalizes/replaces pois and footprints modules and includes user-facing functions:
    • gdf_from_x (replace current pois_from_x and footprints_from_x functions)
  • pois module gets deprecated and later removed
  • footprints module gets deprecated and later removed
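One way the "deprecated and later removed" step could be staged is with a warning shim, sketched below. Both functions are hypothetical stubs written for this example: `gdf_from_point` simply echoes its arguments, and the warning text is an assumption, not wording taken from OSMnx.

```python
import warnings


def gdf_from_point(point, tags, dist=1000):
    """Hypothetical replacement function; a stub that echoes its arguments."""
    return {"point": point, "tags": tags, "dist": dist}


def pois_from_point(point, tags, dist=1000):
    """Deprecated alias: warn the caller, then delegate to the replacement."""
    warnings.warn(
        "pois_from_point is deprecated; use gdf_from_point instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return gdf_from_point(point, tags, dist=dist)
```

Existing user code keeps working during the transition, and the old name can be dropped in a later release once callers have had time to migrate.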

@gboeing
Owner Author

gboeing commented Jun 11, 2020

Also note #504 which removes a bunch of old deprecated params, greatly streamlining several modules and functions (including pois and footprints). That PR should be merged by the end of this week.

@AtelierLibre
Contributor

AtelierLibre commented Jun 13, 2020

That all sounds great, I'd forgotten about the existing gdf_from_x functions.

Just on the original question, I've done a couple of tests looking at the practical differences.

The POI module tries to convert all OSM ways directly to polygons and then create multipolygons from those where necessary - this is essentially what the footprints module originally did.

There are two issues with this:

  1. In OSM any tag can be applied to any geometry/relation so the assumption that a POI query will only return points or polygons isn't true - you are quite likely to get lines as well.
  2. Polygons/Multipolygons in OSM can be represented by a ring of open ways grouped into a relation (essentially a closed ring of linestrings, no polygons at all).

To resolve these issues the footprints module has an intermediate step where it separates closed ways (which it keeps as polygons) from open ways (which it keeps as linestrings) and then uses both to create (multi)polygons. By default it then discards the linestrings at the end. As I mentioned above this has the side effect that you can actually use the footprints module to get linear geometries if you set the footprints module parameter retain_invalid to True. You can see the difference in behaviour with this example:

import osmnx as ox # osmnx 0.14.1

point = (51.0856, -1.1635)
dist = 400
tag = 'highway'

footprint_gdf = ox.footprints_from_point(point=point, dist=dist, footprint_type=tag, retain_invalid=True)
poi_gdf = ox.pois_from_point(point=point, tags={tag: True}, dist=dist)

print("poi_gdf geometry types:", poi_gdf.geometry.type.unique())
print("footprint_gdf geometry types:", footprint_gdf.geometry.type.unique())

ax = poi_gdf.plot(figsize=(16,16), alpha=0.5, color='orange')
footprint_gdf.plot(figsize=(16,16), alpha=0.5, ax=ax)

[Image: POI polygons vs. footprint lines]

This is particularly relevant for handling rings of open ways defining polygons. In this toy example the polygon on the right is made up of three open ways inside a relation. The POI module returns these as three individual polygons grouped into a multipolygon whereas the footprints module assembles the three open ways into a single polygon.

import osmnx as ox # osmnx 0.14.1

point = (51.0281, -1.07)
dist = 200
tag = 'natural'

footprint_gdf = ox.footprints_from_point(point=point, dist=dist, footprint_type=tag)
poi_gdf = ox.pois_from_point(point=point, tags={tag: True}, dist=dist)

ax = poi_gdf.plot(figsize=(16,16), alpha=0.5, color='orange')
footprint_gdf.plot(figsize=(16,16), alpha=0.5, ax=ax)

[Image: POI polygons from open ways in a relation]
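To make the "ring of open ways" case concrete, here is a minimal pure-Python sketch of chaining open ways end to end into a single closed ring. It is not the footprints module's implementation: `assemble_ring` is a hypothetical helper, ways are plain lists of coordinate tuples, and only a single outer ring is handled (no inner holes).

```python
def assemble_ring(ways):
    """Join open ways end-to-end into one closed ring of coordinates.

    Each way is a list of (x, y) tuples. Ways may be stored in either
    direction, so a way is reversed when only its last point matches.
    Returns None if the ways cannot be chained into a closed ring.
    """
    if not ways:
        return None
    remaining = [list(way) for way in ways]
    ring = remaining.pop(0)
    while remaining:
        for i, way in enumerate(remaining):
            if way[0] == ring[-1]:
                ring.extend(way[1:])
            elif way[-1] == ring[-1]:
                ring.extend(reversed(way[:-1]))
            else:
                continue
            remaining.pop(i)
            break
        else:
            return None  # no remaining way continues the chain
    return ring if ring[0] == ring[-1] else None


# Three open ways, one of them stored "backwards", forming a triangle.
way_a = [(0, 0), (1, 0)]
way_b = [(1, 1), (1, 0)]
way_c = [(1, 1), (0, 0)]
ring = assemble_ring([way_a, way_b, way_c])
```

Converting each open way to its own polygon first, as described for the POI module, cannot produce this ring, because no individual way is closed. With Shapely available, `shapely.ops.linemerge` and `shapely.ops.polygonize` are the usual tools for this kind of stitching.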

If the polygons created by the POI module from the linestrings in a relation are themselves geometrically invalid it will discard the whole multipolygon. This example is from issue #270

# Issue #270
import osmnx as ox
print(ox.__version__)

point = (48, 10)
dist = 10_000
tag = 'landuse'

footprint_gdf = ox.footprints_from_point(point=point, dist=dist, footprint_type=tag)
poi_gdf = ox.pois_from_point(point=point, tags={tag: True}, dist=dist)

ax = poi_gdf.plot(figsize=(16,16), alpha=0.5, color='orange')
footprint_gdf.plot(figsize=(16,16), alpha=0.5, ax=ax)

[Image: issue #270 example]

From the few examples that I have looked at I think the differences between the two modules are more significant than they appear. It would need the proper handling of open ways/linestrings as well as the more robust processing of polygons & multipolygons to be moved across to/merged with the POI module. However, I still think the proposal makes a lot of sense.

@gboeing
Owner Author

gboeing commented Jun 20, 2020

I'm creating a new geocoding module as discussed in #478 (comment) in PR #506.

If anyone wants to take a crack at the proposed geometries module that generalizes/replaces the pois and footprints modules, the help would be much appreciated.

@AtelierLibre
Contributor

I'd be happy to start having a look at it aiming to keep the structure and organisation of the pois module but combine the processing of footprints. Shall I start it off in a branch on my fork?

@gboeing
Owner Author

gboeing commented Jun 22, 2020

That'd be great. Thanks!

@AtelierLibre
Contributor

Great! Just to say that I have started it off on this branch - the module is called geometries. Any comments or feedback as I'm working through it just let me know.

@AtelierLibre
Contributor

I wanted to share some progress on this. There are quite a few things to tidy up so it's just to show things moving forward. As the image below shows:

The pois module returns:

  • Tagged nodes as points
  • Every OSM way that it can convert to a polygon
  • Any multipolygons that it can create from those polygons

The footprints module returns:

  • Closed OSM ways converted to polygons
  • Multipolygons assembled from component open and closed OSM ways
  • Open OSM ways that it would otherwise discard as linestrings (If retain_invalid=True)

The draft geometries module returns:

  • Tagged nodes as points
  • Open ways as linestrings
  • Closed ways as linestrings/polygons
  • Multipolygon relations as multipolygons.

One of the trickier things to resolve is which closed OSM ways (i.e. first point and last point are the same) should be resolved to LineStrings (e.g. a roundabout or fence round a field) and which should be resolved to Polygons (e.g. land use, buildings etc.). To do that requires looking beyond the geometry and using the tags. I have implemented an approach based on the JSON available from this page Overpass_turbo/Polygon_Features. This means that closed ways returned from a query with tags={'highway':True} should create LineStrings while those returned from tags={'landuse':True} should create Polygons.

There is an edge case which is not entirely resolved yet where a single closed OSM way can represent both a LineString and Polygon (e.g. when it is tagged barrier=fence and landuse=farmland). I am working through that at the moment but wanted to share the general direction of travel.

The notebook for creating the image from my fork is here

[Image: comparison of pois, footprints, and geometries output]

@gboeing
Owner Author

gboeing commented Jul 17, 2020

@AtelierLibre fantastic! This is looking great.

@AtelierLibre
Contributor

@gboeing Great, thanks, good to know it's going in the right direction!

@xgerrmann
Contributor

xgerrmann commented Jul 29, 2020

Hi guys,

I tried using the pois module to plot the outlines of rivers and canals and found that it required some adjustments for my use.
Some of the adjustments I made:

  • pois assumes all ways are closed, I made a fix for this
  • I simplified the logic for the relations and use shapely for merging and stitching outlines.

Below is an example of all the water in and around Amsterdam:
[Image: map of water in and around Amsterdam]

Additionally, I found the custom filter lacking, since it doesn't allow relations to be included (at least as I understand it).

I would like to discuss the changes I've made to the codebase and see how it can be best integrated with the rest.

If interested, can someone please contact me to further discuss this?

@gboeing
Owner Author

gboeing commented Jul 29, 2020

@AtelierLibre if you open a PR for your work in progress, then @xgerrmann could weigh in on it there?

@AtelierLibre
Contributor

@gboeing Sure, that sounds like a good idea. Should I just open a pull request straight into master?

I'll also write up some of the changes that I have made in the latest versions and @xgerrmann would be great to get some extra eyes on it.

@gboeing
Owner Author

gboeing commented Jul 29, 2020

Should I just open a pull request straight into master?

@AtelierLibre sure.

@AtelierLibre
Contributor

@gboeing Okay, I have opened pull request #542. There are a few things I wanted to flag up - but I need a little bit of time to write that up.

@xgerrmann It handles multipolygon relations - is that what you mean? I have just run this Amsterdam query through it and the result is below:

Amsterdam = ox.gdf_from_point((52.3716,4.9005), dist=15000, tags={'natural':'water'})
Amsterdam.to_crs(epsg=28992, inplace=True)
Amsterdam.plot(figsize=(16,16))

[Image: plot of the Amsterdam query result]

@AtelierLibre
Contributor

These are some notes on the work that I have done:

There are three main stages in all three modules (pois, footprints, geometries):

  1. Creating the query and requesting the JSON
  2. Parsing the JSON to Shapely geometries in a GeoDataFrame
  3. Filtering the final GeoDataFrame

I have focused on parsing the JSON onwards. There are some differences in the first stage but I assume the issues must be common to the graph module as well so I have left them alone for future streamlining?

1. Creating the query and requesting the JSON

  • footprints uses the polygon as part of its query so it receives and has to process less information (pois and geometries use the bounding box).

  • footprints splits large requests into pieces and then processes the separate JSONs received - I don't know how beneficial this is but, assuming it is, wouldn't graph benefit from this as well?
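For readers unfamiliar with request splitting, the sketch below shows one simple way to cut a large bounding box into tiles that could be queried separately. It is an assumption-laden illustration: `split_bbox` and `max_side` are hypothetical names, and a regular grid over the bounding box is not necessarily how the footprints module subdivides its query area.

```python
import math


def split_bbox(north, south, east, west, max_side):
    """Split a bounding box into a grid of tiles no larger than max_side per side."""
    n_rows = max(1, math.ceil((north - south) / max_side))
    n_cols = max(1, math.ceil((east - west) / max_side))
    height = (north - south) / n_rows
    width = (east - west) / n_cols
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            tiles.append((
                south + (row + 1) * height,  # tile north
                south + row * height,        # tile south
                west + (col + 1) * width,    # tile east
                west + col * width,          # tile west
            ))
    return tiles


tiles = split_bbox(north=1.0, south=0.0, east=2.0, west=0.0, max_side=1.0)
```

Features that straddle a tile edge are likely to come back in more than one response, so the separate results would need to be de-duplicated by element type and id before building the final GeoDataFrame.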

2. Parsing the JSON to Shapely geometries in a GeoDataFrame

There are differences between the approaches of pois and footprints. I have tried to rationalise the approach as much as I can. I think this has speed benefits as well:

Loop through the JSON only once

  • The nodes, ways and relations come from the Overpass API in order so I don't think it is necessary to loop through to construct the nodes, the ways, and then separately again for the relations.

Single dictionary approach

From what I understand there isn't anything faster than a dictionary when you are processing, adding, and retrieving individual objects to some kind of collection. The pois module:

  • converts the dictionaries of nodes and ways to two separate GeoDataFrames
  • transposes them individually
  • projects them individually
  • uses gdf_ways to construct each multipolygon which is then individually appended to gdf_ways.
  • appends the two GeoDataFrames

Creating, transposing and appending GeoDataFrames are all quite resource-intensive operations, so I have preferred to create a single dictionary from the start, with a single call to gpd.GeoDataFrame.from_dict(geometries, orient="index") at the end. I think this is the fastest solution.

As nodes, ways and relations can share id numbers it does require a new unique id formed from the element type and its id number.
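The single-pass, single-dictionary idea with composite ids can be sketched in plain Python as follows. This is a simplified illustration, not the code in the PR: `parse_elements` is a hypothetical name, geometries are represented as plain tuples instead of Shapely objects, relations are omitted, and it assumes (as described above) that nodes arrive before the ways that reference them.

```python
def parse_elements(elements):
    """Build one dict of features keyed by '<type>/<id>' in a single pass."""
    coords = {}      # node id -> (lon, lat), needed to resolve way geometry
    geometries = {}  # unique id -> record holding the geometry and the tags
    for element in elements:
        unique_id = f"{element['type']}/{element['id']}"
        tags = element.get("tags", {})
        if element["type"] == "node":
            coords[element["id"]] = (element["lon"], element["lat"])
            if tags:  # only tagged nodes become features in their own right
                geometries[unique_id] = {"geometry": ("Point", coords[element["id"]]), **tags}
        elif element["type"] == "way":
            line = [coords[node_id] for node_id in element["nodes"]]
            geometries[unique_id] = {"geometry": ("LineString", line), **tags}
    return geometries


# A node and a way that share the id number 1 but get distinct keys.
elements = [
    {"type": "node", "id": 1, "lon": 0.0, "lat": 0.0, "tags": {"amenity": "cafe"}},
    {"type": "node", "id": 2, "lon": 1.0, "lat": 0.0},
    {"type": "way", "id": 1, "nodes": [1, 2], "tags": {"highway": "residential"}},
]
features = parse_elements(elements)
```

A dictionary shaped like `features` (one record per unique id) is what a call such as `gpd.GeoDataFrame.from_dict(geometries, orient="index")` would then turn into rows, once the tuples are replaced with real Shapely geometries.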

Blocklist/passlist

Probably the biggest change is the inclusion of the blocklist/passlist JSON. I have tried to follow the Overpass Turbo approach linked to from this page.

The only way to determine whether a closed OSM way should become a LineString (e.g. highway) or a Polygon (e.g. landuse) is from its tags. I mentioned above that some ways carry both types of tags. I have done some reading about this (there is quite a long discussion here) and it is tricky to handle: if you create both geometries, do you need to create new id numbers, or separate the relevant tags onto the different geometries? I have followed what I think is the Overpass Turbo / OpenStreetMap Carto approach, which is to return only a single geometry for every closed way: a linestring by default, or a polygon if it has any polygon-type tagging.
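A small sketch of the "linestring by default, polygon if any polygon-type tag" rule is given below. The rule table is an illustrative subset written for this example in the passlist/blocklist style; the specific keys and values are assumptions and should not be read as the full polygon-features list or as the code in the PR.

```python
# Illustrative subset of rules; keys and values are assumptions for this sketch.
POLYGON_FEATURES = {
    "building": {"polygon": "all"},
    "landuse": {"polygon": "all"},
    "highway": {"polygon": "passlist", "values": {"services", "rest_area"}},
    "natural": {"polygon": "blocklist", "values": {"coastline", "cliff"}},
}


def closed_way_is_polygon(tags):
    """Return True if any tag marks a closed way as an area, else False (LineString)."""
    for key, value in tags.items():
        rule = POLYGON_FEATURES.get(key)
        if rule is None:
            continue  # this key says nothing about area vs. line
        if rule["polygon"] == "all":
            return True
        if rule["polygon"] == "passlist" and value in rule["values"]:
            return True
        if rule["polygon"] == "blocklist" and value not in rule["values"]:
            return True
    return False


roundabout = closed_way_is_polygon({"highway": "residential"})
fenced_field = closed_way_is_polygon({"barrier": "fence", "landuse": "farmland"})
```

Under this rule the dual-tagged way (barrier=fence plus landuse=farmland) resolves to a single polygon, so no new ids or tag splitting are needed, at the cost of losing the fence as a separate linear feature.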

I'm having a bit of trouble understanding how the JSON should be included in the package and could use some pointers for that.

Creation of Shapely geometries

I have removed the try/except blocks from the creation of Points and Polygons. According to Shapely's documentation there is no such thing as an invalid Point and it does not check the validity of the Polygons when it is creating them, only when it is using them in further operations. So I have limited the try/except to the .difference() operation when creating the MultiPolygons. If there is some other reason for the try/except on the Points and Polygons they can of course go back.

It is not clear to me what the _invalid_multipoly_handler() was doing - I hope that it is no longer necessary due to more robust creation of the component LineStrings/Polygons in the earlier stages. If I have misunderstood I am sure it can go back in some way.

3. Filtering the final GeoDataFrame

Where footprints uses the polygon in its Overpass request pois relies on filtering the GeoDataFrame with the polygon at the end.

From my tests, to_keep = gdf_proj.centroid.within(poly_proj) is probably the most 'expensive' line in the pois module. In geometries I have replaced it with your utils_geo._intersect_index_quadrats(gdf.centroid, polygon), which is significantly faster but still accounts for roughly one third of the total execution time in some of the tests that I did. The footprints module just doesn't do this step, so it saves a lot of time at the expense of not accurately adhering to the polygon boundary.
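To show what the centroid filter does conceptually, here is a pure-Python sketch that keeps only features whose centroid falls inside the query polygon. It substitutes a ray-casting test for Shapely's `within` and uses a cheap bounding-box pre-check as a stand-in for a spatial index; it is not the `_intersect_index_quadrats` implementation, and both function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon's vertex ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_centroid(centroids, polygon):
    """Keep ids whose centroid passes a bounding-box check, then the exact test."""
    xs, ys = zip(*polygon)
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    kept = []
    for feature_id, (x, y) in centroids.items():
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # outside the bounding box, so certainly outside the polygon
        if point_in_polygon((x, y), polygon):
            kept.append(feature_id)
    return kept


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
centroids = {"way/1": (2, 2), "way/2": (5, 2), "node/3": (3.9, 0.1)}
kept = filter_by_centroid(centroids, square)
```

The pre-check rejects far-away features without running the exact test, which is the same general motivation as using an index: the exact containment test is the costly part, so it pays to run it on as few candidates as possible.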

I have wondered if the filtering by polygon could be made optional or whether it could be removed if the Overpass query is created more accurately but I haven't investigated this.

I think there are some subtler points that I might have omitted like #364 so I'll try and review those.

Any comments/queries please let me know.

@gboeing
Owner Author

gboeing commented Jul 30, 2020

Let's move this conversation to the PR. That will help to document its development for future reference. I'll provide responses there.

@xgerrmann
Contributor

@gboeing Okay, I have opened pull request #542. There are a few things I wanted to flag up - but I need a little bit of time to write that up.

@xgerrmann It handles multipolygon relations - is that what you mean? I have just run this Amsterdam query through it and the result is below:

Amsterdam = ox.gdf_from_point((52.3716,4.9005), dist=15000, tags={'natural':'water'})
Amsterdam.to_crs(epsg=28992, inplace=True)
Amsterdam.plot(figsize=(16,16))

[Image: plot of the Amsterdam query result]

Yes, exactly!
I'm now going through all the comments and the corresponding pull request to see where I can contribute.
