
Transport infrastructure data packs are out #116

Open
hulsiejames opened this issue Aug 25, 2022 · 5 comments

@hulsiejames
Collaborator

The current version of the transport infrastructure data packs is out (being processed & published) under the 0.4 releases!

Data packs are currently generated through: create_data_packs.R

These data packs contain the following function outputs:

  • oi_recode_road_class - re-codes default OSM highway values into usable UK road descriptions
  • oi_road_names - re-codes OSM highway names from osm_id to road name & ref (i.e. 056274512 --> St. Peters Road, A61)
  • oi_active_cycle - shows ways with allowed access for cyclists
  • oi_active_walk - shows ways with allowed access for pedestrians
  • oi_is_lit - indicates the presence of lighting on OSM ways, useful when considering active travel after dark
  • oi_clean_maxspeed_uk - re-codes OSM maxspeed values to be compliant with current official UK maxspeeds
  • oi_inclusive_mobility - re-categorises OSM data to indicate the presence of features indicative of inclusive mobility
@hulsiejames
Collaborator Author

hulsiejames commented Aug 25, 2022

The current workflow can be optimised. It currently runs as follows:

1 - Download england-latest.osm.pbf and store it locally

2 - For each LAD, get the default OSM network by subsetting from england-latest using oe_read(), pointing to the local england-latest as the input file. Save each LAD network as a .geojson

3 - For each LAD network, read the .geojson into R, then apply the openinfra functions to create a data pack for that LAD

4 - Write the data pack locally as both .geojson and .gpkg

5 - Upload the local data packs to GitHub releases using the piggyback package

Through R, step 2 is taking a considerable amount of time (~1 minute per LAD, and in total there are over 330, I believe)

In #49 @bdon proposed using osmium-tool, which can generate extracts from a single large file (say, england-latest.osm.pbf).
The main benefit: osmium-tool can generate multiple region extracts from a single file with a single command (see creating multiple extracts in one go)

This requires a config file (.json) that specifies the boundary to be used for each region. I imagine it will be simple to save each LAD polygon as a single .geojson to be used by osmium-tool as a boundary - the fun part will be linking these up into a config file to be read by osmium-tool... the given example:

{
    "directory": "/tmp/",
    "extracts": [
        {
            "output": "dep01-ain.osm.pbf",
            "polygon": {
                "file_name": "dep01-ain.geojson",
                "file_type": "geojson"
            }
        },
        {
            "output": "dep02-aisne.osm.pbf",
            "polygon": {
                "file_name": "dep02-aisne.geojson",
                "file_type": "geojson"
            }
        },
        ...
    ]
}

output: the name of the extract .osm.pbf to be generated
file_name: the filename of each LAD's bounding polygon
file_type: the file type of the file_name file

I've had a quick look and it seems this file will need to be written by hand, which for a few hundred LADs will be repetitive, but I think once it is set up it can be reused, provided the file_name of each LAD polygon stays the same.

Given the time constraints left on the project this is not something I am prioritising, but if there is a rainy weekend I may try to write a script that can write such a config file to be used by the osmium-tool.

The benefit of this would be that, rather than oe_read() reading and subsetting from the entire england-latest.osm.pbf, it would only read from a much smaller, nearly perfect .osm.pbf.
The issue is that there will still be a few ways that extend beyond the LAD boundary, as osmium-tool does not perfectly clip ways at a boundary (see Extract strategies, under 9.). Thus I would still re-pass these smaller .osm.pbf files through oe_read(), applying the LAD polygon as the boundary= arg with boundary_type="clipsrc" to create the perfect network. From here I would apply the openinfra functions to create the data pack

@hulsiejames
Collaborator Author

Update:

  • We have now added another function, oi_bicycle_parking, which gets OSM data on cycle parking points within each LAD.
  • Data packs for each LAD now come in two forms, lines and points (linestrings - roads, paths etc. & nodes - bins, bicycle parking, crossings etc.)
  • A draft of the above can be found in 0.4.1 releases
  • With the new points & lines data packs and the two formats (.geojson & .gpkg) there are around 1260 individual data packs to be uploaded. Each time you ask piggyback for a new upload, it queries the current release's assets to avoid uploading duplicates; however, this is starting to take some time (1-2s) per data pack, see below...
Uploading lines data pack for: Barking_and_Dagenham with format: .gpkg
ℹ Running gh query, got 400 records of about 700

To reduce this, I propose we could try packaging the data packs by region into a single .zip (inc. the .geojson & .gpkg formats - or another compressed format) and upload these, reducing total uploads to around 630.

Alternatively, we could package all lines and all points into respective zipped directories and upload these two to releases. Though, this would require planners to download the data for all of England before finding their specific regions.

@GretaTimaite
Collaborator

After seeing Robin's issue #128, I had a look at the vignette too and noticed a few mismatches(?) re IM function:

  1. you mention a function, but there are no visualisations of anything the function returns. This is confusing, so either delete the mention of the function or add some kind of visualisation (kerbs, footways/footpaths/implied, etc.)
  2. I'd modify the description from

Adds a number of Inclusive Mobility (IM) "im_***" columns that reflect whether or not a piece of infrastructure meets the requirements of the IM Guide

to

Adds a number of columns relevant to planning inclusive pedestrian infrastructure. For more information see the documentation
The problem I have with the existing formulation: saying that it "reflects whether or not..." implies that the data is 100% reliable and the recategorisation correct; however, this is not true, as the function provides a simplification because, in most cases, data is lacking.

  3. it's minor, but I'd change the word "assesses" to "indicates", as I simply recategorise data rather than do any kind of evaluation that "assess" would presume? Dunno, this is linguistics, I guess.
  4. Please, if you change anything in the function, then update the vignette too. For example, in your data packs vignette speed recategorisation is missing from the IM function, but it's still there in the IM vignette. Which is the current version of the function? I do realise that there's a separate function for speed recategorisation, so I don't mind this as long as we keep everything consistent.

Also, I believe there's a separate lit function now, so creating this column as part of the IM function becomes rather redundant too. There's a lack of consistency that may cause confusion for a potential user.

@hulsiejames
Collaborator Author

hulsiejames commented Sep 2, 2022

Yes this vignette needs updating!

To clarify - this issue was not promoting the vignette as published; rather, it was announcing that actual data packs had been made available through the 0.4 releases.
Though this also required the data_packs vignette to be updated... my current thoughts are:

  • re-write the data_packs vignette to have the following format:
      • a brief intro on motivation and context for transport infrastructure data packs,
      • followed by visualisations of a 1/2.5km circular buffer around the city of Leeds showing data pack outputs. Each visualisation has descriptions of data pack column names, what they mean, and potentially how they are derived... leading onto the next point:
      • followed by reproducible code on how to create the plots (or at least a link to another vignette that goes through this in more detail) so people/planners can visualise their own data packs (obtainable from the project releases page)

This vignette actually hasn't been updated since the steering group meeting with Kayley towards the end of July, I think, which may be the reason for some of the discrepancies.

you mention a function, but there are no visualisations of anything the function returns. This is confusing, so either delete the function or add some kind of a visualisation (kerbs, footways/footpaths/implied, etc)

Originally I had intended to include all of the IM function columns within the meeting; however, I believe there are an additional 11 columns returned by the IM function, and at the time of compiling I didn't have time to create individual .html maps for each.
For now I will remove the mention of this, and add it back once I have created some .html maps or we change it to individual functions.

Also, I believe there's a separate lit function now, then creating this column as part of im function becomes rather redundant too. There's a lack of consistency that may cause confusion for a potential user.

This is actually the original reason why I removed im_lighting from the IM function within the data packs, as I had already created the oi_is_lit function at that point. Apologies, I had forgotten to specify this within the vignette, and I did not want to edit your vignette in case you had linked it on a personal website or it has been included in any publications etc.

I'll get to work on updating the vignette now.

@GretaTimaite
Collaborator

Thanks James for clarifying.

I don't link to individual functions on my website as I see the project in its totality. It's an interesting point, though: I've noticed that you add your name to every vignette you start, but neither I nor Robin have done it. Maybe that's something we should discuss at some point.

I guess just because something has been mentioned in a publication, it doesn't mean it stops being subject to change. So I'd say as long as the communication/updating is clear, then we can avoid miscommunication in the future. We might simply start keeping a log of changes to documentation/vignettes/etc. specifically used in publications. Changes are unavoidable in ongoing open-source projects.
