-
Notifications
You must be signed in to change notification settings - Fork 826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Are the pois module and footprints module redundant? #478
Comments
Hey! Made a google draw diagram of the call stack / code duplications to visualize the issue: How are you envisioning the final interface? For example, we could replace The final calls at the end for |
Thanks! I'm grateful to have you taking a crack at this. A few ideas... The pois module at this point essentially offers a generalization of the footprints module, though they diverge in how they handle relations in building geometry objects. Ideally, the Either way, this would require migrating some of the relation/geometry handling code from footprints -> pois. For example, when handling complex multipolygons with inner holes, the footprints module currently is more robust than the pois module. At least, I think. At this point they do approximately the same thing in very different ways, which I haven't had the time to slowly dig through and unravel. If you can wrap your head around their internal functions' similarities, overlap, and differences that'd be really helpful. Then we can rethink how they handle relations/complex geometries and migrate/merge their functionality. See also: https://wiki.openstreetmap.org/wiki/Relation:multipolygon |
Just to add to the conversation... Originally, as I understood it, there was more separation between the two - the POIs module returned points of interest and the buildings module returned building footprints as a nice to have visual reference for the network analysis. When I worked on the buildings module to generalise it to footprints and add multipolygon support I realised that there had to be a lot of overlap as you have to go through the points -> lines -> polygons to get to the multipolygons. The footprints module already returns at least some line geometry if you set retain_invalid=True:
Not that I'm suggesting it, but I guess it would return points as well if the tags were added to them in the main _responses_to_dicts() for loop and they weren't filtered out later on. Just to pick up on @juliamgomes point I suspect there are a lot of people using OSMnx to get OSM geometry (whether points, lines or (multi)polygons) into GeoPandas without necessarily engaging with the networkx side at all. So it makes sense to me to aim for I'm not familiar with the POI code so I can't comment on which is more robust / a better place to start from but at a glance it looks cleaner and can make more specific queries which are both positives. |
Thanks @AtelierLibre.
That's right. When OSMnx was originally developed, it was merely designed as a "connector" between networkx and OpenStreetMap (hence the name). Soon after, I whipped up the
Yes. And I should have been more detailed in my previous response to @juliamgomes... deprecating/replacing the At that point, we could have new
|
Also note #504 which removes a bunch of old deprecated params, greatly streamlining several modules and functions (including |
That all sounds great, I'd forgotten about the existing Just on the original question, I've done a couple of tests looking at the practical differences. The POI module tries to convert all OSM ways directly to polygons and then create multipolygons from those where necessary - this is essentially what the footprints module originally did. There are two issue with this:
To resolve these issues the footprints module has an intermediate step where it separates closed ways (which it keeps as polygons) from open ways (which it keeps as linestrings) and then uses both to create (multi)polygons. By default it then discards the linestrings at the end. As I mentioned above this has the side effect that you can actually use the footprints module to get linear geometries if you set the footprints module parameter
This is particularly relevant for handling rings of open ways defining polygons. In this toy example the polygon on the right is made up of three open ways inside a relation. The POI module returns these as three individual polygons grouped into a multipolygon whereas the footprints module assembles the three open ways into a single polygon.
If the polygons created by the POI module from the linestrings in a relation are themselves geometrically invalid it will discard the whole multipolygon. This example is from issue #270
From the few examples that I have looked at I think the differences between the two modules are more significant than they appear. It would need the proper handling of open ways/linestrings as well as the more robust processing of polygons & multipolygons to be moved across to/merged with the POI module. However, I still think the proposal makes a lot of sense. |
I'm creating a new If anyone wants to take a crack at the proposed |
I'd be happy to start having a look at it aiming to keep the structure and organisation of the |
That'd be great. Thanks! |
Great! Just to say that I have started it off on this branch - the module is called |
I wanted to share some progress on this. There are quite a few things to tidy up so it's just to show things moving forward. As the image below shows: The
The
The draft
One of the trickier things to resolve is which closed OSM ways (i.e. first point and last point are the same) should be resolved to LineStrings (e.g. a roundabout or fence round a field) and which should be resolved to Polygons (e.g. land use, buildings etc.). To do that requires looking beyond the geometry and using the tags. I have implemented an approach based on the JSON available from this page Overpass_turbo/Polygon_Features. This means that closed ways returned from a query with There is an edge case which is not entirely resolved yet where a single closed OSM way can represent both a LineString and Polygon (e.g. when it is tagged The notebook for creating the image from my fork is here |
@AtelierLibre fantastic! This is looking great. |
@gboeing Great, thanks, good to know it's going in the right direction! |
@AtelierLibre if you open a PR for your work in progress, then @xgerrmann could weigh in on it there? |
@gboeing Sure, that sounds like a good idea. Should I just open a pull request straight into master? I'll also write up some of the changes that I have made in the latest versions and @xgerrmann would be great to get some extra eyes on it. |
@AtelierLibre sure. |
@gboeing Okay, I have opened pull request #542. There are a few things I wanted to flag up - but I need a little bit of time to write that up. @xgerrmann It handles multipolygon relations - is that what you mean? I have just run this Amsterdam query through it and the result is below:
|
These are some notes on the work that I have done: There are three main stages in all three modules (
I have focused on parsing the JSON onwards. There are some differences in the first stage but I assume the issues must be common to the graph module as well so I have left them alone for future streamlining? 1. Creating the query and requesting the JSON
2. Parsing the JSON to Shapely geometries in a GeoDataFrameThere are differences between the approaches of Loop through the JSON only once
Single dictionary approachFrom what I understand there isn't anything faster than a dictionary when you are processing, adding, and retrieving individual objects to some kind of collection. The
Creating, Transposing and Appending the GeoDataFrames are all quite resource intensive operations so I have preferred to create a single dictionary from the start with a single call to As nodes, ways and relations can share id numbers it does require a new unique id formed from the element type and its id number. Blocklist/passlistProbably the biggest change is the inclusion of the blocklist/passlist JSON. I have tried to follow the Overpass Turbo approach linked to from this page. The only way to determine if a closed OSM way should become a LineString (e.g. highway) or a Polygon (e.g. landuse) is from its tags. I mentioned above that some ways are tagged with both types of tags - I have done some reading about this (there is quite a long discussion here) and this is tricky to handle - if you create both geometries do you need to create new id numbers? separate the relevant tags onto the different geometries? I have followed what I think is the Overpass Turbo / OpenStreetMap Carto approach which is to only return a single geometry for every closed way - a linestring by default, or a polygon if it has any polygon type tagging. I'm having a bit of trouble understanding how the JSON should be included in the package and could use some pointers for that. Creation of Shapely geometriesI have removed the try/except blocks from the creation of Points and Polygons. According to Shapely's documentation there is no such thing as an invalid Point and it does not check the validity of the Polygons when it is creating them, only when it is using them in further operations. So I have limited the try/except to the It is not clear to me what the 3. Filtering the final GeoDataFrameWhere From my tests I have wondered if the filtering by polygon could be made optional or whether it could be removed if the Overpass query is created more accurately but I haven't investigated this. I think there are some subtler points that I might have omitted like #364 so I'll try and review those. Any comments/queries please let me know. |
Let's move this conversation to the PR. That will help to document its development for future reference. I'll provide responses there. |
Yes, exactly! |
Over time, the pois and footprints modules have expanded their functionality and grown more similar to each other. I believe they are approximately redundant now and should be merged together to simplify the codebase.
The following methods are approximately equivalent:
Minor differences would have to be reconciled between these modules (especially the detailed handling of relations in
footprints
) to account for the differences between the resulting GeoDataFrames.The text was updated successfully, but these errors were encountered: