Import UK postcodes from ONSPD instead of Code-Point Open #216

Closed
wants to merge 22 commits into from

Commits on Dec 15, 2015

  1. Script for importing UK postcodes from ONSPD

    Importing everything from one source reduces what we have to download.  We have to download the ONSPD anyway to get the NI and Crown Dependency postcodes, which aren't in Code-Point Open.
    
    Other advantages are that the ONSPD contains all live and terminated postcodes and (as of the Aug 2015 release at least) every area code is a GSS code, rather than a mix of ONS and GSS codes.
    
    We add a toggle to allow importing terminated postcodes, as it can be useful to have old postcodes in the db to allow searching on old addresses.  Note that although Code-Point Open doesn't include terminated postcodes, we can still end up with them in our dataset, but only if we have a long-lived database that has imported multiple releases of the dataset.  For example, if we import the May 2012 dataset and then import the Aug 2012 one when it comes out, our db will contain the postcodes that were terminated between those two releases, but none that were terminated before May 2012.  This means a db rebuilt from scratch using the current dataset would have a different set of postcodes from one that had been around for a few years and had releases imported as they arrived.  Notionally both represent the current data, but one has more postcodes.  Using the ONSPD and allowing terminated postcodes fixes this problem.
    h-lame committed Dec 15, 2015
    b2d1d6d
  2. Allow detecting location availability per postcode row

    If `--no-location` was not set we would try to detect a location for each row, and this would break if the location fields could not be coerced into floats.  Some datasets mix location and non-location postal codes, and to import them all we had to filter the data and run the importer twice.
    
    This change allows individual importers to implement `location_available_for_row` to say if the supplied row has location data or not.  The method is called on each row and will run the `--no-location` path if we can't extract location fields for that row.  If `--no-location` is set, we always run that path, regardless of the `location_available_for_row` value.
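
    The per-row check described above can be sketched roughly as follows.  This is a minimal illustration, not the importer's real code: only `location_available_for_row` and `--no-location` come from the commit message, and every other name (`PostcodeImporter`, `handle_row`, the row layout) is a hypothetical stand-in.

```python
class PostcodeImporter:
    """Hypothetical base importer; only location_available_for_row is named in the commit."""

    def __init__(self, no_location=False):
        # Mirrors the --no-location command line option.
        self.no_location = no_location

    def location_available_for_row(self, row):
        # Default: assume every row carries usable location fields.
        return True

    def handle_row(self, row):
        # --no-location always wins; otherwise ask the importer about this row.
        if self.no_location or not self.location_available_for_row(row):
            return {"postcode": row[0], "location": None}
        return {"postcode": row[0], "location": (float(row[1]), float(row[2]))}


class MixedImporter(PostcodeImporter):
    def location_available_for_row(self, row):
        # Illustrative rule only: blank coordinate fields mean "no location".
        return row[1] != "" and row[2] != ""
```

    With this shape a dataset that mixes both kinds of row can be imported in a single pass, because rows without coordinates never reach the `float()` conversion.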
    h-lame committed Dec 15, 2015
    74aa673
  3. Provide fixture data for NI council / electoral / wards

    In April 2015 councils and wards changed in Northern Ireland so the old ni-electoral-areas data files no longer represent the truth.
    
    The new ni-electoral-areas-2015 file provides the names and GSS codes of the new Districts, Electoral Areas, and Wards of Northern Ireland following the Apr 2015 reorganisation.  We synthesized this from a few datasets.
    ** The District -> Electoral Area -> Ward breakdown is taken directly from the legislation[2] - although this only contains the names
    ** The GSS codes of the Districts and Wards are taken from the "Wards (2015) to district council areas (2015) NI lookup"[3] dataset provided by the ONS.
    ** The GSS codes of the Electoral areas are taken from running sparql queries against the ONS Linked Data Portal[4].  The query used was:
    
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX geography: <http://statistics.data.gov.uk/def/>
        PREFIX statistical-entity: <http://statistics.data.gov.uk/def/statistical-entity#>
        SELECT DISTINCT ?item
    
        WHERE {
        ?item rdf:type geography:statistical-geography .
    
          ?item statistical-entity:code <http://statistics.data.gov.uk/id/statistical-entity/N10> .
        }
        ORDER BY ASC(?code)
        LIMIT 100
        OFFSET 0
    
    N10 is the code for an NI Electoral Area (found by looking up one of the areas by name and investigating).  We then fetch the data for each entry in the result set and extract its GSS code to match it up with the name.
    
    This data will be used to help import NI areas from the OSNI boundary datasets which do not all contain GSS codes.
    
    [1]: https://geoportal.statistics.gov.uk/geoportal/catalog/search/resource/details.page?uuid=%7B2196C1D5-6A11-47DE-BD0E-26311E3D6D9F%7D
    [2]: http://www.legislation.gov.uk/uksi/2014/270/made
    [3]: https://geoportal.statistics.gov.uk/geoportal/catalog/search/resource/details.page?uuid=%7BFE83C43C-9403-408C-833C-367BC56659C9%7D
    [4]: http://statistics.data.gov.uk/
    h-lame committed Dec 15, 2015
    32d772e
  4. Allow importing postcodes with no location

    Some rows in the ONSPD have no location: those where the 12th column is a '9'.  The existing importers for NI and GB postcodes automatically ignore these, but it can be useful to include them for existence checks even if we can't give more geographical information about them.
    
    We implement `location_available_for_row` to return `False` if the quality field of the row is `'9'`, and `True` otherwise.  This lets us import rows that have no location if we want to.  To keep the Code-Point Open import behaviour we provide an option, `--allow-no-location-postcodes`, to turn this on; without this option the importer will not import these rows.
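
    A rough sketch of that rule, assuming each row is a plain list of strings as produced by a CSV reader.  The constant and the `should_import` helper are hypothetical names for illustration; only `location_available_for_row`, the 12th column, the value '9' and the option name come from the text above.

```python
QUALITY_COLUMN = 11  # the 12th column, counted from zero


def location_available_for_row(row):
    # A quality value of '9' marks a postcode with no usable location.
    return row[QUALITY_COLUMN] != "9"


def should_import(row, allow_no_location_postcodes=False):
    # Without --allow-no-location-postcodes, rows lacking a location are skipped,
    # which matches the Code-Point Open behaviour.
    return location_available_for_row(row) or allow_no_location_postcodes
```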
    h-lame committed Dec 15, 2015
    7f0f41e
  5. Merge onspd and nspd_crown_dependencies importers

    Now that we can handle rows with no location we don't need a separate importer for crown dependencies.  We add a new `--crown-dependencies` option to the onspd importer that accepts one of the following values: `include`, `exclude`, or `only`.
    
    The default is `exclude` which means we retain the previous behaviour of the onspd importer to only import GB postcodes and require a second run to import the crown dependency ones.  Choosing `include` means we import all the GB + crown dependency postcodes in one pass (we still ignore the NI ones though).  The `only` option means that we want to only import crown dependency postcodes - this gives us the behaviour of the old nspd_crown_dependencies importer, which is useful should you choose to import GB postcodes from the Code-Point Open dataset.
    
    Note that crown dependency postcodes (currently) have no location information in ONSPD and while the importer has an option for allowing postcodes with no location to be imported (--allow-no-location-postcodes) this option has no effect on the behaviour of importing crown dependency postcodes.  Crown Dependency postcodes are imported solely based on the value of the --crown-dependencies option, as outlined above.  This could be confusing, but the help text should make it clear.
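
    The three modes can be illustrated with a small filter.  This is a sketch under assumptions: it detects Crown Dependency postcodes by their postcode area (GY, JE and IM), whereas the real importer may use a different field of the ONSPD row, and the function names are hypothetical.  NI handling is left out, matching the behaviour described for this commit.

```python
CROWN_DEPENDENCY_AREAS = ("GY", "JE", "IM")  # Guernsey, Jersey, Isle of Man


def is_crown_dependency(postcode):
    # Assumption: the postcode area is enough to identify a Crown Dependency.
    return postcode.strip().upper().startswith(CROWN_DEPENDENCY_AREAS)


def wanted(postcode, crown_dependencies="exclude"):
    crown = is_crown_dependency(postcode)
    if crown_dependencies == "only":
        return crown
    if crown_dependencies == "exclude":
        return not crown
    # "include": import both GB and Crown Dependency postcodes.
    return True
```

    Note that nothing here consults `--allow-no-location-postcodes`, which reflects the point above that Crown Dependency rows are selected purely by the `--crown-dependencies` value.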
    h-lame committed Dec 15, 2015
    8dafb12
  6. Add command for importing osni boundary data

    This is mostly a rip from import_boundary_line but instead of taking one
    file as an argument we specify which files to work on as options because
    the OSNI releases are single files per shape, unlike Boundary-Line which
    has everything in one folder.  We could expect a user to put everything in
    one folder first, but we need to handle each shapefile differently
    depending on how we expect it to be used.  For example the Westminster
    Parliamentary constituencies boundaries have the name in PC_NAME and the
    gss code in PC_ID, whereas the LGD file has them in LGDNAME and LGDCode.
    
    We add a new codetype for identifying the OSNI id of the boundaries and a
    new nametype for the OSNI names.
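
    One way to picture the per-file handling is a table of field names keyed by area type.  The field names below are the ones quoted above; the dictionary, the function and the use of a plain dict to stand in for a shapefile feature are illustrative assumptions.

```python
# Which shapefile attributes hold the name and GSS code for each area type.
OSNI_FIELDS = {
    "WMC": {"name": "PC_NAME", "code": "PC_ID"},
    "LGD": {"name": "LGDNAME", "code": "LGDCode"},
}


def extract_name_and_code(area_type, feature):
    # `feature` is a dict standing in for one shapefile feature's attributes.
    fields = OSNI_FIELDS[area_type]
    return feature[fields["name"]], feature[fields["code"]]
```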
    h-lame committed Dec 15, 2015
    503745d
  7. Add option for importing OSNI ward boundaries

    Name is in WARDNAME and GSS code is in WardCode.  We did see them publish
    this data in a set that had no WardCode, but did have LGDName so we could
    use our ni-electoral-areas-2015.csv fixture to do a name match to get the
    GSS code.  Hopefully though they won't publish that dataset again.
    h-lame committed Dec 15, 2015
    d861f29

Commits on Dec 16, 2015

  1. Import Northern Ireland Assembly constituencies

    We process the Westminster file twice, once to generate areas with type
    WMC and then again to generate areas with type NIE.  This is what the old
    shape-less importer did so I assume it's still good.  These NIE areas
    don't have GSS codes because they are legally identical to the Westminster
    constituencies and don't have a code issued by the ONS.
    h-lame committed Dec 16, 2015
    1f4b13a
  2. Import NI Electoral Areas (LGE)

    These are the areas that live between LGD (NI Councils) and LGW (NI
    Wards).  The shapefile released by the OSNI does not include GSS codes
    so we have to synthesize them by doing a name lookup against the
    ni-electoral-areas-2015.csv fixture.  We've asked OSNI if they plan to
    expose GSS codes in this dataset.
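
    The name lookup might look something like this.  The column names (`type`, `name`, `gss`) are assumptions, since the layout of ni-electoral-areas-2015.csv is not described here, and the sample row uses placeholder values rather than real data.

```python
import csv
import io


def load_lge_codes(fixture_file):
    # Build a case-insensitive name -> GSS code table for Electoral Areas only.
    reader = csv.DictReader(fixture_file)
    return {
        row["name"].strip().lower(): row["gss"]
        for row in reader
        if row["type"] == "LGE"
    }


def gss_code_for(name, codes_by_name):
    # Returns None when the shapefile name has no match in the fixture.
    return codes_by_name.get(name.strip().lower())


# Placeholder fixture content for illustration only.
sample = io.StringIO("type,name,gss\nLGE,Example Area,PLACEHOLDER-1\nLGW,Example Ward,PLACEHOLDER-2\n")
codes = load_lge_codes(sample)
```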
    h-lame committed Dec 16, 2015
    b861d8f
  3. Import NI EUR area

    Note that importing this area generates some warnings about invalid
    geometry, but the existing "fix_invalid_geos_geometry" is able to turn
    the data into something that it considers valid.
    h-lame committed Dec 16, 2015
    47a26c3
  4. b924486
  5. Simplify field extraction by using objects not hashes

    This means we don't need to deal with strings vs callables and can isolate
    complexity to the area codes that need it (LGE and NIE).
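
    As a sketch of the idea (class names and the `NAME_FIELD` attribute are hypothetical, not taken from the command itself), each area type gets an object with the same two methods, and only the awkward types override them:

```python
class FieldExtractor:
    """Reads the name and code straight from named shapefile attributes."""

    def __init__(self, name_field, code_field=None):
        self.name_field = name_field
        self.code_field = code_field

    def name(self, feature):
        return feature[self.name_field]

    def code(self, feature):
        return feature[self.code_field]


class LookupCodeExtractor(FieldExtractor):
    """For a type such as LGE: the code comes from a name lookup, not the file."""

    def __init__(self, name_field, codes_by_name):
        super().__init__(name_field)
        self.codes_by_name = codes_by_name

    def code(self, feature):
        return self.codes_by_name.get(self.name(feature).lower())


class NoCodeExtractor(FieldExtractor):
    """For a type such as NIE, which has no GSS code at all."""

    def code(self, feature):
        return None
```

    The calling code can then use `extractor.name(feature)` and `extractor.code(feature)` everywhere without checking whether a mapping entry is a string or a callable.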
    h-lame committed Dec 16, 2015
    5f19d5f

Commits on Dec 17, 2015

  1. Allow specifying SRID of OSNI imports

    From looking at the shapefiles in a viewing tool (qgis) it appears that the
    co-ords are in the NI projection (29902) in some files, but in the 102100
    projection in others.  For some files this matches up with the arcgis
    metadata pointed to by the OSNI for each release (e.g. the LGEs
    dataset[1] points to the following metadata as its source[2] which lists 29902
    as the projection, and the LGW dataset[3] points to source metadata[4] which
    lists the 102100 projection).  For others however this is not true (e.g. the LGDs
    dataset[5] has a source metadata[6] that says 29902, but the downloaded
    shapefile is actually in 102100).
    
    As it appears this is changeable per release (possibly per download) we allow
    for telling the importer what srid a given file is in.  If it's not 29902 we
    convert it to 29902 before importing so that everything is consistent.
    
    Note that for some reason the geometry imported from the shapefiles does
    not contain an SRID, so we have to set it even if we don't have
    to transform it.
    
    As a further wrinkle, PostGIS doesn't support 102100, but it is mathematically
    equivalent to 3857 which it does support.  Unfortunately using that projection
    causes failures during point-based lookup of parents, but if we use 4326
    instead it works.  Apparently 102100 and 4326 are both "web mercator"
    projections so are probably very similar (if not exactly mathematically
    equivalent).  Interestingly, opening a shapefile that is in 102100 in a viewing
    tool such as qgis reports it as 4326, whereas a 29902 file reports as a custom
    projection that is identical in all but name to 29902.  This suggests it's safe
    to use 4326 as a replacement for 102100.
    
    The defaults we set for the options are based on the SRIDs of the data files we've downloaded in Dec 2015 - they may change over time.
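
    The set-then-transform step can be sketched as below.  The function name is hypothetical, and the code only assumes a geometry object with a writable `srid` attribute and an in-place `transform(srid)` method (GeoDjango's `GEOSGeometry` has both).  `FakeGeometry` is a stand-in so the sketch runs without GDAL; it is not part of the importer.

```python
def normalise_srid(geom, source_srid, target_srid=29902):
    # Geometry read from the shapefiles carries no SRID, so always set it first.
    geom.srid = source_srid
    # Only reproject when the file is not already in the target projection.
    if source_srid != target_srid:
        geom.transform(target_srid)
    return geom


class FakeGeometry:
    """Stand-in for a real geometry; records what was asked of it."""

    def __init__(self):
        self.srid = None
        self.transformed_to = None

    def transform(self, srid):
        self.transformed_to = srid
        self.srid = srid
```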
    
    [1]: http://osni.spatial-ni.opendata.arcgis.com/datasets/981a83027c0e4790891baadcfaa359a3_4
    [2]: https://gisservices.spatialni.gov.uk/arcgisc/rest/services/OpenData/OSNIOpenData_LargescaleBoundaries/MapServer/4
    [3]: http://osni.spatial-ni.opendata.arcgis.com/datasets/55cd419b2d2144de9565c9b8f73a226d_0
    [4]: https://services3.arcgis.com/dNsInyVNGMqG1QjF/arcgis/rest/services/OSNI_Open_Data_Largescale_Boundaries_Wards_2012/FeatureServer/0
    [5]: http://osni.spatial-ni.opendata.arcgis.com/datasets/a55726475f1b460c927d1816ffde6c72_2
    [6]: https://gisservices.spatialni.gov.uk/arcgisc/rest/services/OpenData/OSNIOpenData_LargescaleBoundaries/MapServer/2
    h-lame committed Dec 17, 2015
    645a722
  2. Handle NI postcodes in ONSPD importer

    We add an option to allow importing the NI postcodes at the same time as the rest of the postcodes.  The option takes 3 values: 'include', 'exclude', 'only' with the same behaviour as the --crown-dependencies option:
    
    * 'include' will import NI postcodes
    * 'exclude' will not import NI postcodes
    * 'only' will only import NI postcodes
    
    The default is 'exclude' to maintain previous behaviour.  Unlike Crown Dependency postcodes NI postcodes might have location data, and so the --allow-no-location-postcodes setting (default false) does affect how we import NI postcodes.
    
    Setting both --crown-dependencies and --northern-ireland to 'only' is an error and will halt the importer before it begins.
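
    A minimal sketch of that up-front check, written with `argparse` for brevity.  This is an assumption about shape only: the real command is a Django management command and may declare its options differently.

```python
import argparse

CHOICES = ("include", "exclude", "only")


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--crown-dependencies", choices=CHOICES, default="exclude")
    parser.add_argument("--northern-ireland", choices=CHOICES, default="exclude")
    return parser


def parse_options(argv):
    parser = build_parser()
    options = parser.parse_args(argv)
    # "only" for both would select nothing, so refuse before reading any rows.
    if options.crown_dependencies == "only" and options.northern_ireland == "only":
        parser.error("--crown-dependencies and --northern-ireland cannot both be 'only'")
    return options
```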
    
    We've also updated the documentation provided by the options to be clearer about how the various options interact.
    
    Unlike the old nspd_ni importer that relied on the nspd_ni_areas importer to be run and the ni-electoral-areas.csv to directly assign areas to NI postcodes, this importer has no special handling.  We assume that the new OSNI importer has been run and the relevant shapefiles have been imported, much like we assume that the boundary-line importer has been run to provide the areas for the rest of the UK.
    h-lame committed Dec 17, 2015
    eef599c
  3. Allow specifying srid for GB vs. NI postcodes

    The --gb-srid and --ni-srid options have sensible defaults (27700 and
    29902 respectively) and override the --srid option on a per-row basis,
    depending on whether or not the postcode is in Northern Ireland (i.e.
    starts with BT).
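
    In other words, something along these lines (the function name is hypothetical; the defaults and the BT rule are the ones given above):

```python
def srid_for_postcode(postcode, gb_srid=27700, ni_srid=29902):
    # Northern Ireland postcodes all start with BT; everything else is treated as GB.
    if postcode.strip().upper().startswith("BT"):
        return ni_srid
    return gb_srid
```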
    h-lame committed Dec 17, 2015
    91ca8f1
  4. Provide ONSPD version of scilly command

    One command can, by checking the lengths of the rows, work on both
    Code-Point Open and ONSPD files for dealing with Scilly wards.
    h-lame committed Dec 17, 2015
    c55e7f6

Commits on Dec 18, 2015

  1. Add script for adding GSS codes to NI Areas

    Some mapit installations already have NI Areas, with or without boundaries,
    but these areas may not have GSS codes.  This script uses the
    ni-electoral-areas-2015.csv hierarchy to find LGDs, LGEs, and LGWs by
    name and add their GSS codes.  Because names are not necessarily unique
    it respects the hierarchy in the fixture.  If a name cannot be found (it
    does a case-insensitive lookup) the row is ignored and a warning is issued.
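
    The hierarchy-aware matching can be sketched with plain dictionaries standing in for Area records.  Everything here is illustrative: the record layout, the function name and the sample values are assumptions, and the real script works against the database and issues warnings rather than returning a list.

```python
def add_gss_codes(fixture_rows, areas):
    # Index existing areas by (parent name, own name), both lower-cased, so that
    # two areas sharing a name under different parents stay distinct.
    index = {
        ((area["parent"] or "").lower(), area["name"].lower()): area
        for area in areas
    }
    skipped = []
    for parent_name, name, gss in fixture_rows:
        area = index.get((parent_name.lower(), name.lower()))
        if area is None:
            # The real script ignores the row and warns; here we just record it.
            skipped.append(name)
            continue
        area["gss"] = gss
    return skipped


# Placeholder data: the same ward name appears under two different parents.
areas = [
    {"name": "CENTRAL", "parent": "District A", "gss": None},
    {"name": "Central", "parent": "District B", "gss": None},
]
fixture = [
    ("District A", "Central", "PLACEHOLDER-A"),
    ("District B", "Central", "PLACEHOLDER-B"),
    ("District B", "Missing", "PLACEHOLDER-C"),
]
skipped = add_gss_codes(fixture, areas)
```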
    h-lame committed Dec 18, 2015
    822f443
  2. Add script for adding names to NI Areas

    The names in the OSNI data don't always match the names for the same
    areas in the ni-electoral-areas-2015.csv fixture which was extracted from
    the legislation.  In some cases it's just an uppercase difference, or a
    lack of punctuation.  In others the names are completely different.  For
    example in the fixture GSS N09000011 is called "North Down and Ards" but
    in the OSNI shapefile it is called "East Coast".  Turns out this is
    because the council voted to change the name to the OSNI one, but backed
    down after outcry and reverted[1].
    
    This script goes through the fixture and matches on GSS code to find the
    Areas and add a new override name to the area if the fixture name is not
    already present.
    
    [1]: http://www.belfasttelegraph.co.uk/news/northern-ireland/backlash-forces-council-to-ditch-new-east-coast-name-that-cost-thousands-30902221.html
    h-lame committed Dec 18, 2015
    2617420
  3. Correct LGD names in ni-electoral-areas-2015.csv

    Mostly this is just extending the name to include the council type (District, Borough, or City), similar to naming of some council areas in the rest of the UK.  In the case of "North Down and Ards" we also rename to their final name choice of "Ards and North Down".  For "Derry and Strabane" and "Armagh, Banbridge and Craigavon" we also include "City" in the appropriate place ("Derry City" and "Armagh City").
    h-lame committed Dec 18, 2015
    99847c4
  4. Make adding gss codes to ni areas work for real data

    We incorporate the feedback from mysociety about running the `mapit_UK_add_gss_codes_to_ni_areas` command against their real data.  Because we're doing name matches we need to change our naive `names_name__iexact` match and sanitize the data a bit.
    h-lame committed Dec 18, 2015
    f4e27f3

Commits on Jan 4, 2016

  1. Allow add_x_to_ni_areas scripts to work on 1st import

    If there are no active generations we set the "current" generation to
    the "new" generation.  Otherwise we try to find objects in the 0th
    generation and this won't work.
    h-lame committed Jan 4, 2016
    c159ba1

Commits on Jan 5, 2016

  1. Convert geometry to application projection in NI shape imports

    We used to import all the NI shapes with an srid of 29902 (the Irish grid [1]).
    For some reason when the geometry was extracted from the DB it was in 27700
    (the GB grid [2]) but had not undergone any transformation from 29902 to 27700.
    Consequently the NI shapes were in the wrong place (covering Liverpool, North
    Wales and some of the Irish Sea).
    
    It's not clear how this happened, but we can fix it by always transforming the
    NI shape data from whatever srid it is provided as into the 27700 srid used by
    the rest of the UK data.  Note that we actually use the
    `settings.MAPIT_AREA_SRID` srid rather than 27700 directly: in most UK
    instances of mapit this will be 27700, but on the off chance it's not we don't
    want things to break.
    
    [1]: http://spatialreference.org/ref/epsg/tm65-irish-grid/
    [2]: http://spatialreference.org/ref/epsg/27700/
    h-lame committed Jan 5, 2016
    0730197