handle more relation types in geometries module and multi-index gdf #654
Codecov Report
@@ Coverage Diff @@
## master #654 +/- ##
==========================================
+ Coverage 92.82% 92.84% +0.02%
==========================================
Files 24 24
Lines 2326 2333 +7
==========================================
+ Hits 2159 2166 +7
Misses 167 167
Continue to review full report at Codecov.
|
@gboeing from what I can see, adding "boundary" as another geometry relation type should be okay - ideally they should use the same "outer" and "inner" roles as "multipolygons" and so should parse properly. The reason that I limited it specifically to MultiPolygons is that relations more broadly do two things - one is to represent single multi-part geometries (i.e. MultiPolygons and now boundaries) - the other is as a kind of functional group of objects that are all independent geometries in their own right, e.g. buildings belonging to a university, street segments making up bus routes etc. Unfortunately the division between the two uses is not strict, and you can see this mixed approach in the relation that you referenced, relation/9153339, which is tagged as a MultiPolygon type but also includes an independent point node with its own information. As the functional grouping type of relations aren't geometries per se - rather they are a list of references to other geometries - I don't see how they can be held in the GeoDataFrame. Perhaps they could be held separately in a standard Pandas DataFrame with pointers to the GeoDataFrame? One thing to bear in mind is that the non-geometry relations carry their own tags (e.g. "amenity"="university") and these aren't necessarily duplicated on the actual geometries that they reference. As a result, the current approach can return nothing at all: the non-geometry relation with the query tags is not held anywhere by OSMnx, and the individual geometries are also discarded because they don't individually carry the requested tags. In short, I think including boundaries with multipolygons should make sense; for the other non-geometry relations I think we'd have to develop a new approach. Getting more input from others with experience of this would be good, both on whether it is desirable to keep non-geometry relations at all and, if so, on how to implement it. |
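To make the distinction above concrete, here is a minimal sketch that separates relations by their "type" tag. It is not OSMnx's implementation: the function name `split_relations`, the constant `GEOMETRY_RELATION_TYPES`, and the sample elements are all hypothetical, and the elements are hand-written in the general shape of Overpass JSON output.

```python
# Relation "type" tags treated as single multi-part geometries.
# This set mirrors the discussion above; it is an assumption, not OSMnx code.
GEOMETRY_RELATION_TYPES = {"multipolygon", "boundary"}


def split_relations(elements):
    """Split Overpass-style elements into geometry and grouping relations."""
    geometry_relations = []
    grouping_relations = []
    for element in elements:
        if element.get("type") != "relation":
            continue  # nodes and ways are not relations, so skip them here
        relation_type = element.get("tags", {}).get("type")
        if relation_type in GEOMETRY_RELATION_TYPES:
            geometry_relations.append(element)
        else:
            grouping_relations.append(element)
    return geometry_relations, grouping_relations


# Hypothetical elements, made up for illustration only.
elements = [
    {"type": "relation", "id": 1,
     "tags": {"type": "multipolygon", "place": "suburb"},
     "members": [{"type": "way", "ref": 10, "role": "outer"}]},
    {"type": "relation", "id": 2,
     "tags": {"type": "boundary", "place": "suburb"},
     "members": [{"type": "way", "ref": 11, "role": "outer"}]},
    {"type": "relation", "id": 3,
     "tags": {"type": "site", "amenity": "university"},
     "members": [{"type": "way", "ref": 12, "role": ""}]},
    {"type": "way", "id": 10, "nodes": [100, 101, 102, 100]},
]

geometry_relations, grouping_relations = split_relations(elements)
print([r["id"] for r in geometry_relations])
print([r["id"] for r in grouping_relations])
```

In this sketch the "site" relation carries the "amenity" tag itself while its member way does not, which is the case described above where a tag query could end up returning nothing.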
Thanks @AtelierLibre. That's a clear explanation of everything. For documentation here, I'll try to summarize. Relations (relevant to our geometries retrieval use case) serve two broad categories of purpose:
I agree that it makes sense to limit the I'll add a little more documentation to the |
@AtelierLibre one more question: all the In other words, the returned GeoDataFrame would look like:
import osmnx as ox
place = 'Los Angeles, California, USA'
tags = {'place': 'suburb'}
gdf = ox.geometries_from_place(place, tags)
# index by element_type and osmid and discard the unique_id field
gdf.set_index(['element_type', 'osmid']).drop(columns='unique_id')
Are there any drawbacks to this? The benefits would be 1) less duplication of data in the GeoDataFrame, 2) a more useful and meaningful index, 3) better consistency with the output format of the |
@gboeing in terms of the code, I don't see a problem with that. Use of the composite If you would prefer not to do that, but rather replace it at the end with a MultiIndex as in your code above, I can't see it affecting how the module works. If you think that creates a more useful index and better consistency, then that sounds good. On a purely personal level, in terms of usability, I can see I'm going to have to put in a bit of effort to get comfortable working with MultiIndexes! |
Multi-indexing has been added in 9b2a4ac.
@AtelierLibre others may have the same question so I'll add a quick usage example here.
Old way:
import osmnx as ox
import pandas as pd
gdf = ox.geometries_from_place('Los Angeles, California, USA', {'place': 'suburb'})
# select all rows of element type "node"
gdf[gdf['element_type']=='node']
# select all rows of element type "way" or "relation"
gdf[gdf['element_type'].isin(['way', 'relation'])]
# select all rows with osmid 150933823, regardless of element type
gdf[gdf['osmid']==150933823]
# select all rows of element type "node" with osmid 150934802 or 150935958
gdf[(gdf['element_type']=='node') & (gdf['osmid'].isin([150934802, 150935958]))]
New way:
import osmnx as ox
import pandas as pd
gdf = ox.geometries_from_place('Los Angeles, California, USA', {'place': 'suburb'})
# select all rows of element type "node"
gdf.loc['node']
# select all rows of element type "way" or "relation"
gdf.loc[['way', 'relation']]
# select all rows with osmid 150933823, regardless of element type
gdf.loc[pd.IndexSlice[:, [150933823]], :]
# select all rows of element type "node" with osmid 150934802 or 150935958
gdf.loc[pd.IndexSlice[['node'], [150934802, 150935958]], :] |
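For anyone less comfortable with `pd.IndexSlice`, the following sketch shows two alternatives that plain pandas offers on any MultiIndexed frame: `xs` for a cross-section on one level, and `reset_index` to get `element_type` and `osmid` back as ordinary columns. It uses a toy DataFrame as a stand-in for the GeoDataFrame. The osmids are borrowed from the example above, while the `name` values and the pairing of osmid 150933823 with a way are made up.

```python
import pandas as pd

# Toy stand-in for the GeoDataFrame returned by OSMnx (assumed layout).
df = (
    pd.DataFrame(
        {
            "element_type": ["node", "node", "way"],
            "osmid": [150934802, 150935958, 150933823],
            "name": ["A", "B", "C"],  # made-up values
        }
    )
    .set_index(["element_type", "osmid"])
    .sort_index()
)

# cross-section: all rows with a given osmid, regardless of element type
by_osmid = df.xs(150933823, level="osmid")

# restore element_type and osmid as ordinary columns, then filter the old way
flat = df.reset_index()
nodes = flat[flat["element_type"] == "node"]

print(by_osmid)
print(nodes)
```

Note that `xs` drops the selected level from the result by default, so `by_osmid` is indexed by `element_type` only.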
@gboeing That's great, thanks. |
Hi, just heads up that this change disabled I don't think there's anything to be done on your side (we should fix that in GeoPandas) but it may be good to be aware of the situation. |
Thanks Martin, I appreciate the heads-up. I'll try to keep my eye on that upstream GeoPandas issue. |
Currently, the geometries module only handles OSM relations of type multipolygon. However, type boundary is used and preferred for grouping ways into place boundaries.
As we're not currently handling boundary relations, we get some inconsistent effects. For example, in Turin this suburb's boundary is represented as a multipolygon relation but this suburb's boundary is represented as a boundary relation. Currently, we only retain the former's geometry and not the latter's. Example:
In this PR, I've added handling for boundary relation types. Example:
I'd appreciate comments and thoughts on this proposal if anyone would be kind enough to offer them. Some specific questions.
Does _parse_relation_to_multipolygon handle this use case flexibly enough for boundary relation types? It appears to work well with both shapely Polygon and MultiPolygon geometries (the function's name may be a misnomer). What about other relation types?
@AtelierLibre may have specific thoughts on the original decision to retain only multipolygon relation types and on whether this proposal makes sense.
See also #542 where the original geometries module work took place and this question on StackExchange.