
Transport infrastructure data packs are out #116

Open
hulsiejames opened this issue Aug 25, 2022 · 5 comments

@hulsiejames
Collaborator

The current version of the transport infrastructure data packs is out (being processed & published) under the 0.4 releases!

Data packs are currently generated through: create_data_packs.R

These data packs contain the following function outputs:

  • oi_recode_road_class - re-codes default OSM highway values into usable UK road descriptions
  • oi_road_names - re-codes OSM highway names from osm_id to road name & ref (i.e. 056274512 --> St. Peters Road, A61)
  • oi_active_cycle - shows ways with allowed access for cyclists
  • oi_active_walk - shows ways with allowed access for pedestrians
  • oi_is_lit - indicates the presence of lighting on OSM ways, useful when considering active travel after dark
  • oi_clean_maxspeed_uk - re-codes OSM maxspeed values to be compliant with current official UK maxspeeds
  • oi_inclusive_mobility - re-categorises OSM data to indicate the presence of features indicative of inclusive mobility
@hulsiejames
Collaborator Author

hulsiejames commented Aug 25, 2022

The current workflow can be optimised. It currently runs as follows:

1 - Download england-latest.osm.pbf and store it locally

2 - For each LAD, get the default OSM network by subsetting from england-latest using oe_read(), pointing to the local england-latest as the input file. Save each LAD network as a .geojson

3 - For each LAD network, read the .geojson into R, then apply the openinfra functions to create a data pack for that LAD

4 - Write the data pack locally as both .geojson and .gpkg

5 - Upload the local data packs to GitHub releases using the piggyback package

Through R, step 2 is taking a considerable amount of time (~1 minute per LAD, and in total there are over 330, I believe)

In #49 @bdon proposed using osmium-tool, which can generate extracts from a single large file (say, england-latest.osm.pbf).
The main benefit: osmium-tool can generate multiple region extracts from a single file with a single command (see creating multiple extracts in one go)

This requires a config file (.json) that specifies the boundary to be used for each region. I imagine it will be simple to save each LAD polygon as a single .geojson to be used by osmium-tool as a boundary - the fun part will be linking these up into a config file to be read by osmium-tool... the given example:

{
    "directory": "/tmp/",
    "extracts": [
        {
            "output": "dep01-ain.osm.pbf",
            "polygon": {
                "file_name": "dep01-ain.geojson",
                "file_type": "geojson"
            }
        },
        {
            "output": "dep02-aisne.osm.pbf",
            "polygon": {
                "file_name": "dep02-aisne.geojson",
                "file_type": "geojson"
            }
        },
        ...
    ]
}

output: the name of the extract .osm.pbf to be generated
file_name: the filename of each LAD's bounding polygon
file_type: the file type of the file_name file

I've had a quick look and it seems this file will need to be written by hand, which for a few hundred LADs will be repetitive, but I think once it is set up it can be reused, provided the file_name of each LAD polygon stays the same.

Given the time constraints left on the project this is not something I am prioritising, but if there is a rainy weekend I may try to write a script that can write such a config file to be used by the osmium-tool.

The benefit of this would be that, rather than oe_read() reading and subsetting from the entire england-latest.osm.pbf, it would only read from a much smaller, nearly perfect .osm.pbf.
The issue is that there will still be a few ways that extend beyond the LAD boundary, as osmium-tool does not perfectly clip ways at a boundary (see Extract strategies, under 9.). Thus I would still re-pass these smaller .osm.pbf files through oe_read(), applying the LAD polygon as the boundary= arg with boundary_type="clipsrc" to create the perfect network. From here I would apply the openinfra functions to create the data pack

@hulsiejames
Collaborator Author

Update:

  • We have now added another function, oi_bicycle_parking, which gets OSM data on cycle parking points within each LAD.
  • Data packs for each LAD now come in two forms, lines and points (linestrings - roads, paths etc. & nodes - bins, bicycle parking, crossings etc.)
  • A draft of the above can be found in 0.4.1 releases
  • With the new points & lines data packs and the two formats (.geojson & .gpkg) there are around 1260 individual data packs to be uploaded. Each time you ask piggyback for a new upload, it queries the current release's assets to avoid uploading duplicates; however, this is starting to take some time (1-2s) per data pack, see below...
Uploading lines data pack for: Barking_and_Dagenham with format: .gpkg
ℹ Running gh query, got 400 records of about 700

To reduce this, I propose we could try packaging the data packs by region into a single .zip (inc. the .geojson & .gpkg formats - or another compressed format) and upload these, reducing total uploads to around 630.

Alternatively, we could package all lines and all points into respective zipped directories and upload these two to releases. Though, this would require planners to download the data for all of England before finding their specific regions.

@GretaTimaite
Collaborator

After seeing Robin's issue #128, I had a look at the vignette too and noticed a few mismatches(?) re IM function:

  1. you mention a function, but there are no visualisations of anything the function returns. This is confusing, so either delete the mention of the function or add some kind of visualisation (kerbs, footways/footpaths/implied, etc.)
  2. I'd modify the description from

Adds a number of Inclusive Mobility (IM) "im_***" columns that reflect whether or not a piece of infrastructure meets the requirements of the IM Guide

to

Adds a number of columns relevant to planning inclusive pedestrian infrastructure. For more information see the documentation
The problem I have with the existing formulation: saying that it "reflects whether or not..." implies that the data is 100% reliable and the recategorisation correct; however, this is not true, as the function provides a simplification because, in most cases, data is lacking.

  3. it's minor, but I'd change the word "assesses" to "indicates", as I simply recategorise data rather than do any kind of evaluation that "assess" would presume? Dunno, this is linguistics, I guess.
  4. Please, if you change anything in the function, then update the vignette too. For example, in your data packs vignette speed recategorisation is missing from the IM function, but it's still there in the IM vignette. Which is the current version of the function? I do realise that there's a separate function for speed recategorisation, so I don't mind this as long as we keep everything consistent.

Also, I believe there's a separate lit function now, so creating this column as part of the IM function becomes rather redundant too. There's a lack of consistency that may cause confusion for a potential user.

@hulsiejames
Collaborator Author

hulsiejames commented Sep 2, 2022

Yes this vignette needs updating!

To clarify - this issue was not promoting the vignette as published; rather, it was announcing that actual data packs had been made available through the 0.4 releases.
Though this also required the data_packs vignette to be updated... my current thoughts are:

  • re-write the data_packs vignette to have the following format:
      • a brief intro on motivation and context for transport infrastructure data packs,
      • followed by visualisations of a 1/2.5km circular buffer around the city of Leeds showing data pack outputs. Each visualisation has descriptions of data pack column names, what they mean, and potentially how they are derived... leading onto the next point:
      • followed by reproducible code on how to create the plots (or at least a link to another vignette that goes through this in more detail) so people/planners can visualise their own data packs (obtainable from the project releases page)

This vignette actually hasn't been updated since the steering group meeting with Kayley towards the end of July, I think, which may be the reason for some of the discrepancies.

you mention a function, but there are no visualisations of anything the function returns. This is confusing, so either delete the function or add some kind of a visualisation (kerbs, footways/footpaths/implied, etc)

Originally I had intended to include all of the IM function columns within the meeting; however, I believe there are an additional 11 columns returned by the IM function, and at the time of compiling I didn't have time to create individual .html maps for each.
For now I will remove the mention of this, and add it back once I have created some .html maps or we change it to individual functions.

Also, I believe there's a separate lit function now, then creating this column as part of im function becomes rather redundant too. There's a lack of consistency that may cause confusion for a potential user.

This is actually the original reason why I removed im_lighting from the IM function within the data packs, as I had already created the oi_is_lit function at that point. Apologies, I had forgotten to specify this within the vignette, and I did not want to edit your vignette in case you had linked it on a personal website or it has been included in any publications etc.

I'll get to work on updating the vignette now.

@GretaTimaite
Collaborator

Thanks James for clarifying.

I don't link to individual functions on my website as I see the project in its totality. It's an interesting point, though: I've noticed that you add your name to every vignette you start, but neither I nor Robin have done it. Maybe that's something we should discuss at some point.

I guess just because something has been mentioned in a publication, it doesn't mean it stops being subject to change. So I'd say as long as the communication/updating is clear, then we can avoid miscommunication in the future. We might simply start keeping a log of changes to documentation/vignettes/etc. specifically used in publications. Changes are unavoidable in ongoing open-source projects.
