Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle more relation types in geometries module and multi-index gdf #654

Merged
merged 3 commits into from
Feb 11, 2021

Conversation

gboeing
Copy link
Owner

@gboeing gboeing commented Feb 5, 2021

Currently, the geometries module only handles OSM relations of type multipolygon. However, type boundary is used and preferred for grouping ways into place boundaries.

As we're not currently handling boundary relations, we get some inconsistent effects. For example, in Turin this suburb's boundary is represented as a multipolygon relation but this suburb's boundary is represented as a boundary relation. Currently, we only retain the former's geometry and not the latter's. Example:

import osmnx as ox
ox.config(log_console=True)
place = 'Torino, Italia'
tags = {'place':'suburb'}
gdf = ox.geometries_from_place(place, tags)
gdf.shape # (46, 14)

In this PR, I've added handling for boundary relation types. Example:

import osmnx as ox
ox.config(log_console=True)
place = 'Torino, Italia'
tags = {'place':'suburb'}
gdf = ox.geometries_from_place(place, tags)
gdf.shape # (61, 15)

I'd appreciate comments and thoughts on this proposal if anyone would be kind enough to offer them. Some specific questions.

  1. There are many other relation types as well: see the OSM relation types documentation. Do we need to handle other types?
  2. Does the existing _parse_relation_to_multipolygon handle this use case flexibly enough for boundary relation types? It appears to work well with both shapely Polygon and MultiPolygon geometries (the function's name may be a misnomer). What about other relation types?

@AtelierLibre may have specific thoughts on the original decision to retain only multipolygon relation types and if this proposal here makes sense.

See also #542 where the original geometries module work took place and this question on StackExchange.

@codecov
Copy link

codecov bot commented Feb 5, 2021

Codecov Report

Merging #654 (9b2a4ac) into master (c2f55a3) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #654      +/-   ##
==========================================
+ Coverage   92.82%   92.84%   +0.02%     
==========================================
  Files          24       24              
  Lines        2326     2333       +7     
==========================================
+ Hits         2159     2166       +7     
  Misses        167      167              
Impacted Files Coverage Δ
osmnx/geometries.py 95.68% <100.00%> (-0.02%) ⬇️
osmnx/osm_xml.py 95.16% <0.00%> (-0.04%) ⬇️
osmnx/truncate.py 97.67% <0.00%> (+0.05%) ⬆️
osmnx/distance.py 89.47% <0.00%> (+0.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c2f55a3...9b2a4ac. Read the comment docs.

@AtelierLibre
Copy link
Contributor

AtelierLibre commented Feb 7, 2021

@gboeing from what I can see, adding "boundary" as another geometry relation type should be okay - ideally they should use the same "outer" and "inner" roles as "multipolygons" and so should parse properly.

The reason that I limited it specifically to MultiPolygons is that relations more broadly do two things - one is to represent single multi-part geometries (i.e. MultiPolygons and now boundaries) - the other is as a kind of functional group of objects that are all independent geometries in their own right e.g. buildings belonging to a university, street segments making up bus routes etc. Unfortunately the division between the two uses is not strict and actually you can see this mixed approach in the relation that you referenced relation/9153339 which is tagged as a MultiPolygon type but also includes an independent point node with its own information.

As the functional grouping type of relations aren't geometries per se - rather they are a list of references to other geometries - I don't see how they can be held in the geodataframe? Perhaps they could be held separately in a standard Pandas DataFrame with pointers to the GeoDataframe?

One thing to bear in mind is that because the non-geometry relations carry their own tags (e.g. "amenity"="university) and these aren't necessarily duplicated on the actual geometries that they reference, the current approach can result in nothing being returned at all because the the non-geometry relation with the query tags is not held anywhere by OSMnx and the individual geometries are also discarded because they don't individually carry the requested tags.

In short, I think including boundaries with multipolygons should make sense, for the other non-geometry relations I think we'd have to develop a new approach. Getting more input from others with experience of this would be good, whether it is desirable to keep non-geometry relations at all and, if so, how to implement it.

@gboeing
Copy link
Owner Author

gboeing commented Feb 8, 2021

Thanks @AtelierLibre. That's a clear explanation of everything. For documentation here, I'll try to summarize. Relations (relevant to our geometries retrieval use case) serve two broad categories of purpose:

  1. Group multi-part geometries of a single real-world object
    1. eg, type=multipolygon
    2. eg, type=boundary
  2. Grouping a set of several functionally related real-world objects
    1. eg, type=route
    2. eg, type=waterway
    3. and more

I agree that it makes sense to limit the geometries module to the first category. The "route" and "waterway" types of relations, for example, would seem to be a better fit for network analysis but would require a very different querying paradigm than OSMnx currently uses. So for now I'd say that's outside the project's scope.

I'll add a little more documentation to the geometries module to make it clearer what it does and does not retrieve.

@gboeing
Copy link
Owner Author

gboeing commented Feb 9, 2021

@AtelierLibre one more question: all the geometries_from_x functions return a GeoDataFrame with element_type, osmid, and unique_id columns, with the latter being a conflation of the two former for unique identification. It would be more consistent with the rest of the package if we uniquely identify the GeoDataFrame rows instead by their index labels.

In other words, the returned GeoDataFrame would look like:

import osmnx as ox
place = 'Los Angeles, California, USA'
tags = {'place': 'suburb'}
gdf = ox.geometries_from_place(place, tags)

# index by element_type and osmid and discard the unique_id field
gdf.set_index(['element_type', 'osmid']).drop(columns='unique_id')

Are there any drawbacks to this? The benefits would be 1) less duplication of data in the GeoDataFrame, 2) a more useful and meaningful index, 3) better consistency with the output format of the graph_to_gdfs function.

@AtelierLibre
Copy link
Contributor

@gboeing in terms of the code, I don't see a problem with that.

Use of the composite unique_id in the form 'node/6313343549' runs right the way through the geometries module from the point of creation of the original dictionary of geometries but it is only in the last two lines that it gets reset out of the index and into a column.

If you would prefer not to do that, but rather replace it at the end with a MultiIndex as in your code above I can't see it affecting how the module works. If you think that creates a more useful index and better consistency, then that sounds good.

On a purely personal level, in terms of usability, I'm can see I'm going to have to put in a bit of effort to get comfortable working with MultiIndexes!

@gboeing gboeing changed the title handle more relation types in geometries module handle more relation types in geometries module and multi-index gdf Feb 9, 2021
@gboeing
Copy link
Owner Author

gboeing commented Feb 9, 2021

Multi-indexing has been added in 9b2a4ac.

On a purely personal level, in terms of usability, I'm can see I'm going to have to put in a bit of effort to get comfortable working with MultiIndexes!

@AtelierLibre others may have the same question so I'll add a quick usage example here.

Old way:

import osmnx as ox
import pandas as pd
gdf = ox.geometries_from_place('Los Angeles, California, USA', {'place': 'suburb'})

# select all rows of element type "node"
gdf[gdf['element_type']=='node']

# select all rows of element type "way" or "relation"
gdf[gdf['element_type'].isin(['way', 'relation'])]

# select all rows with osmid 150933823, regardless of element type
gdf[gdf['osmid']==150933823]

# select all rows of element type "node" with osmid 150934802 or 150935958
gdf[(gdf['element_type']=='node') & (gdf['osmid'].isin([150934802, 150935958]))]

New way:

import osmnx as ox
import pandas as pd
gdf = ox.geometries_from_place('Los Angeles, California, USA', {'place': 'suburb'})

# select all rows of element type "node"
gdf.loc['node']

# select all rows of element type "way" or "relation"
gdf.loc[['way', 'relation']]

# select all rows with osmid 150933823, regardless of element type
gdf.loc[pd.IndexSlice[:, [150933823]], :]

# select all rows of element type "node" with osmid 150934802 or 150935958
gdf.loc[pd.IndexSlice[['node'], [150934802, 150935958]], :]

@gboeing gboeing mentioned this pull request Feb 9, 2021
18 tasks
@AtelierLibre
Copy link
Contributor

@gboeing That's great, thanks.

@martinfleis
Copy link
Contributor

Hi, just heads up that this change disabled explode() on GeoDataFrames (without resetting index) since there is apparently a bug in GeoPandas - geopandas/geopandas#1937.

I don't think there's anything to be done on your side (we should fix that in GeoPandas) but it may be good to be aware of the situation.

@gboeing
Copy link
Owner Author

gboeing commented May 6, 2021

Thanks Martin, I appreciate the heads-up. I'll try to keep my eye on that upstream GeoPandas issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants