Import UK postcodes from ONSPD instead of Code-Point Open #216

Closed
wants to merge 22 commits into from

h-lame
Contributor

@h-lame h-lame commented Nov 26, 2015

Code-Point Open only contains live Postcodes for GB and we need ONSPD to get the postcodes for Northern Ireland and Crown Dependencies. As well as the NI and CD postcodes the ONSPD also contains all the postcodes that Code-Point Open has and all the terminated postcodes. This PR introduces a new mapit_UK_import_onspd command that can be used to import all of these postcodes instead of using a mixture of Code-Point Open and ONSPD.

Because the ONSPD now contains only GSS codes we also introduce a new mapit_UK_import_onspd_ni_areas command to replace the existing mapit_UK_import_nspd_ni_areas command. The new command uses a new set of fixtures to generate the most recent (April 2015) Northern Ireland areas for: Councils (LGD), Electoral Areas (LGE), Wards (LGW), Westminster Constituencies (WMC), and Northern Irish Assembly Constituencies (NIE). These areas, like those created by the old command, do not have boundary data.

The new mapit_UK_import_onspd importer has lots of options to configure which postcodes to import from the dataset and to allow it to be used as a drop in replacement for the various old commands if required. The default behaviour with no options is to import only what the mapit_UK_import_codepoint command would: live GB postcodes with locations. The configuration options are:

  • --allow-terminated-postcodes - supplying this option imports terminated postcodes as well as live ones
  • --allow-no-location-postcodes - supplying this option imports postcodes with no location information
  • --crown-dependencies - this option controls how to import postcodes from Crown Dependencies
  • --northern-ireland - this option controls how to import postcodes from Northern Ireland

The last two accept "include" (to mean import these postcodes), "exclude" (to mean don't import these postcodes), or "only" (to mean only import these postcodes, ignoring all others) as values, with "exclude" being the default.

We don't remove the mapit_UK_import_codepoint importer; people may still want to use it, after all. It's a smaller dataset, and if you don't want terminated, NI, or CD postcodes it may still be useful. We do, however, make the scilly command work with either Code-Point Open or ONSPD files.

Documentation for importing the UK includes these lines currently (see: http://mapit.poplus.org/docs/self-hosted/import/uk/):

./manage.py mapit_UK_import_codepoint ../data/Code-Point-Open/*.csv
./manage.py mapit_UK_scilly ../data/Code-Point-Open/tr.csv
./manage.py mapit_UK_import_nspd_ni_areas
./manage.py mapit_UK_import_nspd_ni ../data/ONSPD.csv
./manage.py mapit_UK_import_nspd_crown_dependencies ../data/ONSPD.csv

With this new importer we can replace it with:

./manage.py mapit_UK_import_onspd_ni_areas
./manage.py mapit_UK_import_onspd --northern-ireland=include --crown-dependencies=include ../data/ONSPD.csv
./manage.py mapit_UK_scilly ../data/ONSPD.csv

Or, if we wanted to import everything from ONSPD:

./manage.py mapit_UK_import_onspd_ni_areas
./manage.py mapit_UK_import_onspd --allow-terminated-postcodes --allow-no-location-postcodes --northern-ireland=include --crown-dependencies=include ../data/ONSPD.csv
./manage.py mapit_UK_scilly ../data/ONSPD.csv

Some more detail (for example where the data comes from for the new NI fixtures) can be found in the commit messages.

@h-lame
Contributor Author

h-lame commented Nov 26, 2015

The commits here tell the whole story, in particular that I started with many separate commands and then merged them after feedback in IRC. Let me know if you'd like things squashed to hide the "misstep".

return options['crown-dependencies'] == 'exclude' # reject if we should exclude these codes
elif options['crown-dependencies'] == 'only':
return True # if we're only importing these codes, reject other codes
return False # otherwise keep
Member

I think the indent is wrong here, though it has no material effect :)

Contributor Author

Oops! I reckon I must have rebase-awayed the else: part.

@dracos
Member

dracos commented Dec 4, 2015

Thanks for this, looks really good :-) As you have now spotted OSNI have released ward etc boundary data as OpenData, so the NI aspects of this can probably be much simplified! I guess it might be worth doing something with that first rather than later? I think almost all of this can remain the same, it just wouldn't need to add anything to areas in the post_row any longer (as the boundaries would do it for you), and we'd need to adjust the area script to actually load in the boundaries instead of working things out.

On your last point, yes please do squash it however you want :) Perhaps we could for now ignore the boundary things and start with 3a4d450 and then all the ONSPD postcode import script stuff (which works assuming boundaries are okay, as I understand it), and then have new NI stuff on top.

Importing everything from one source reduces what we have to download and we have to download ONSPD to get NI and Crown dependencies which aren't in Code Point Open.

Other advantages are that the ONSPD has all live and terminated postcodes in it and (as of Aug 2015 release at least) everything is a gss code, rather than a mix of ons + gss.

We add a toggle to allow importing terminated postcodes as it can be useful to have old postcodes in the db to allow searching on old addresses.  Note that although code point open doesn't include terminated postcodes we can end up with them in our dataset, but only if we had a long-lived database that imported multiple releases of the dataset.  For example, we import the May 2012 dataset and then when the next one comes out in Aug 2012 we import that - our db will have in it the postcodes that were terminated between those two releases, but it won't have any that were terminated before May 2012.  This leads to the situation where a db rebuilt from scratch using the current dataset would have a different set of postcodes to one that had been around for a few years having had releases imported as they arrive.  Notionally both represent the current data, but one has more postcodes.  Using the ONSPD and allowing terminated postcodes fixes this problem.
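
The difference between a long-lived database and a freshly rebuilt one can be shown with a small worked example. This is only a sketch: the postcodes and the `import_release` helper are hypothetical, and it assumes an import that adds or updates live postcodes but never deletes anything.

```python
# Hypothetical postcodes, for illustration only.
may_2012_release = {"AA1 1AA", "AA1 1AB", "AA1 1AC"}
aug_2012_release = {"AA1 1AA", "AA1 1AC", "AA1 1AD"}  # AA1 1AB terminated, AA1 1AD added


def import_release(db, release):
    """Assumed Code-Point Open style import: add live postcodes, never delete."""
    return db | release


# A database that has imported each release as it arrived.
long_lived_db = import_release(import_release(set(), may_2012_release), aug_2012_release)

# A database rebuilt from scratch using only the current release.
rebuilt_db = import_release(set(), aug_2012_release)

# The postcode terminated between the two releases survives only in the long-lived db.
only_in_long_lived = long_lived_db - rebuilt_db
print(sorted(only_in_long_lived))
```

Importing from ONSPD with terminated postcodes allowed would put the terminated postcode into both databases, which removes the discrepancy.
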
If `--no-location` is not set we would try to detect a location for each row, and this would break if the location fields could not be coerced into floats.  Some datasets mix location and non-location postal codes and to import them all we have to filter the data and run the importer twice.

This change allows individual importers to implement `location_available_for_row` to say if the supplied row has location data or not.  The method is called on each row and will run the `--no-location` path if we can't extract location fields for that row.  If `--no-location` is set, we always run that path, regardless of the `location_available_for_row` value.
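
A minimal sketch of the per-row dispatch described above. The class names, the `handle_row` method and the row layout are assumptions made for illustration and are not MapIt's actual importer classes; only the `location_available_for_row` hook name and the `--no-location` behaviour come from the commit message.

```python
class BaseImporter:
    """Hypothetical stand-in for the shared CSV importer."""

    def __init__(self, no_location=False):
        # Mirrors the --no-location option.
        self.no_location = no_location

    def location_available_for_row(self, row):
        # Default: assume every row carries usable location fields.
        return True

    def handle_row(self, row):
        # --no-location always wins; otherwise ask the hook about this row.
        if self.no_location or not self.location_available_for_row(row):
            return ("no-location", row["postcode"])
        return ("location", row["postcode"],
                float(row["easting"]), float(row["northing"]))


class MixedImporter(BaseImporter):
    """Hypothetical importer for a dataset that mixes both kinds of row."""

    def location_available_for_row(self, row):
        return bool(row["easting"]) and bool(row["northing"])
```

With this shape a single run can process a mixed dataset, because rows without coordinates never reach the `float()` coercion.
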
In April 2015 councils and wards changed in Northern Ireland so the old ni-electoral-areas data files no longer represent the truth.

The new ni-electoral-areas-2015 file provides the names and GSS codes of the new Districts, Electoral Areas, and Wards of Northern Ireland following the Apr 2015 reorganisation.  We synthesized this from a few datasets.
** The District -> Electoral Area -> Ward breakdown is taken directly from the legislation[2] - although this only contains the names
** The GSS codes of the Districts and Wards are taken from the "Wards (2015) to district council areas (2015) NI lookup"[3] dataset provided by the ONS.
** The GSS codes of the Electoral areas are taken from running sparql queries against the ONS Linked Data Portal[4].  The query used was:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX geography: <http://statistics.data.gov.uk/def/>
    PREFIX statistical-entity: <http://statistics.data.gov.uk/def/statistical-entity#>
    SELECT DISTINCT ?item

    WHERE {
    ?item rdf:type geography:statistical-geography .

      ?item statistical-entity:code <http://statistics.data.gov.uk/id/statistical-entity/N10> .
    }
    ORDER BY ASC(?code)
    LIMIT 100
    OFFSET 0

N10 is the code for a NI Electoral Area (found by looking up one of the areas by name and investigating).  We then extract the data for each entry in this result set and extract the GSS code to match up with the name.

This data will be used to help import NI areas from the OSNI boundary datasets which do not all contain GSS codes.

[1]: https://geoportal.statistics.gov.uk/geoportal/catalog/search/resource/details.page?uuid=%7B2196C1D5-6A11-47DE-BD0E-26311E3D6D9F%7D
[2]: http://www.legislation.gov.uk/uksi/2014/270/made
[3]: https://geoportal.statistics.gov.uk/geoportal/catalog/search/resource/details.page?uuid=%7BFE83C43C-9403-408C-833C-367BC56659C9%7D
[4]: http://statistics.data.gov.uk/
Some rows in the ONSPD have no location: those where the 12th column (the quality indicator) is '9'.  The existing importers for ni and gb postcodes automatically ignore these, but it can be useful to include them for existence checks even if we can't give more geographical information about them.

We implement `location_available_for_row` to return `False` if the quality column for the row is `'9'`, `True` otherwise.  This lets us import rows that have no location if we want to.  To keep the Code-Point Open import behaviour we provide an option `--allow-no-location-postcodes` to turn this on; without this option the importer will not import these rows.
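
As a rough sketch of the check described in this commit message, assuming each row is a plain list of column values. The `QUALITY_COLUMN` constant and the `should_import` helper are hypothetical names; only the 12th-column rule and the option's effect come from the text.

```python
QUALITY_COLUMN = 11  # the 12th column, zero-indexed


def location_available_for_row(row):
    # A quality value of '9' marks a postcode with no location.
    return row[QUALITY_COLUMN] != '9'


def should_import(row, allow_no_location_postcodes=False):
    # Without the option, rows that have no location are skipped entirely.
    return location_available_for_row(row) or allow_no_location_postcodes
```
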
Now that we can handle rows with no location we don't need a separate importer for crown-dependencies.  We add a new `--crown-dependencies` option to the onspd importer that accepts one of the following values: `include`, `exclude`, or `only`.

The default is `exclude` which means we retain the previous behaviour of the onspd importer to only import GB postcodes and require a second run to import the crown dependency ones.  Choosing `include` means we import all the GB + crown dependency postcodes in one pass (we still ignore the NI ones though).  The `only` option means that we want to only import crown dependency postcodes - this gives us the behaviour of the old nspd_crown_dependencies importer, which is useful should you choose to import GB postcodes from the Code-Point Open dataset.

Note that crown dependency postcodes (currently) have no location information in ONSPD and while the importer has an option for allowing postcodes with no location to be imported (--allow-no-location-postcodes) this option has no effect on the behaviour of importing crown dependency postcodes.  Crown Dependency postcodes are imported solely based on the value of the --crown-dependencies option, as outlined above.  This could be confusing, but the help text should make it clear.
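
The three values can be sketched as a single reject decision per row. This is an illustration rather than the importer's code: it assumes crown dependency postcodes can be recognised by the GY, JE and IM postcode areas, and the function names are hypothetical.

```python
# Assumption: Guernsey, Jersey and Isle of Man postcode areas.
CROWN_DEPENDENCY_AREAS = ("GY", "JE", "IM")


def is_crown_dependency(postcode):
    return postcode.strip().upper().startswith(CROWN_DEPENDENCY_AREAS)


def reject_row(postcode, crown_dependencies="exclude"):
    """Return True if the row should be skipped."""
    if is_crown_dependency(postcode):
        # Crown dependency rows are dropped only when excluded.
        return crown_dependencies == "exclude"
    # Every other row is dropped only when importing crown dependencies alone.
    return crown_dependencies == "only"
```

Note that the decision for a crown dependency row never consults `--allow-no-location-postcodes`, which matches the behaviour described above.
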
This is mostly a rip from import_boundary_line but instead of taking one
file as an argument we specify which files to work on as options because
the OSNI releases are single files per shape, unlike Boundary-Line which
has everything in one folder.  We could expect a user to put everything in
one folder first, but we need to handle each shapefile differently
depending on how we expect it to be used.  For example the Westminster
Parliamentary constituencies boundaries have the name in PC_NAME and the
gss code in PC_ID, whereas the LGD file has them in LGDNAME and LGDCode.

We add a new codetype for identifying the OSNI id of the boundaries and a
new nametype for the OSNI names.
Name is in WARDNAME and GSS code is in WardCode.  We did see them publish
this data in a set that had no WardCode, but did have LGDName so we could
use our ni-electoral-areas-2015.csv fixture to do a name match to get the
GSS code.  Hopefully though they won't publish that dataset again.
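
The per-file handling in the commit messages above amounts to a lookup from area type to shapefile attribute names. A minimal sketch, assuming each shapefile feature has already been read into a dict of attributes; `FIELD_NAMES` and `name_and_code` are hypothetical names, while the attribute names themselves are the ones quoted above.

```python
# Attribute names per area type, as quoted in the commit messages.
FIELD_NAMES = {
    "WMC": {"name": "PC_NAME", "code": "PC_ID"},
    "LGD": {"name": "LGDNAME", "code": "LGDCode"},
    "LGW": {"name": "WARDNAME", "code": "WardCode"},
}


def name_and_code(area_type, feature):
    """Pull the area name and GSS code out of a feature's attributes."""
    fields = FIELD_NAMES[area_type]
    return feature[fields["name"]], feature[fields["code"]]
```

For example, `name_and_code("LGD", {"LGDNAME": "East Coast", "LGDCode": "N09000011"})` gives back the name and code as a pair.
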
@h-lame
Contributor Author

h-lame commented Dec 15, 2015

Apologies for the delay. I've updated the PR to squash the work somewhat and incorporate importing boundaries for NI areas from the OSNI data available under open government licence from http://osni.spatial-ni.opendata.arcgis.com/. This data is (currently) missing GSS codes for NI Electoral Areas (LGE) so we still use the ni-electoral-areas-2015.csv fixture data to do name->GSS matching.

Would there be interest in me providing the specific datafiles I used to write the osni importer so they can be mirrored on http://parlvid.mysociety.org/os/ ? The software powering http://osni.spatial-ni.opendata.arcgis.com/ means you can't just link directly to a download because they're generated on the fly (and then cached for a bit I guess). A given download URL may return a JSON payload telling you how the download generation is going instead of the requested download itself. The URLs also aren't persistent - as the datasets are re-released with fixes the uuid/sha in the URLs changes.

@h-lame
Contributor Author

h-lame commented Dec 15, 2015

FWIW - this PR no longer includes the ability to import boundary-less NI areas from the new fixtures, and no longer supports directly assigning these shapeless NI Areas to NI postcodes in the ONSPD importer (I do still have those versions in another branch if we wanted to keep them though). I haven't removed the old NI importers though, you may still want them for importing NI postcodes/ areas for historical datasets.

@dracos
Member

dracos commented Dec 15, 2015

Wow, thanks :) No need to apologise! Good work on the random SRID stuff…

If someone (cough, us) has already got Northern Ireland areas in our database, from LGW up to NIE, is there a way we can add the boundaries from this data without creating new areas (given they aren’t new areas, they’re the same)? Could there be a control file provided that would do name matching, that people (cough, us) could then use? :)

"Would there be interest in me providing the specific datafiles I used to write the osni importer so they can be mirrored on http://parlvid.mysociety.org/os/ ?” - Yes please, the OS site behaves the same and we grab copies from there and mirror them as you suggest.

We process the Westminster file twice, once to generate areas with type
WMC and then again to generate areas with type NIE.  This is what the old
shape-less importer did so I assume it's still good.  These NIE areas
don't have GSS codes because they are legally identical to the Westminster
constituencies and don't have a code issued by the ONS.
These are the areas that live between LGD (NI Councils) and LGW (NI
Wards).  The shapefile released by the OSNI does not include GSS codes
so we have to synthesize them by doing a name lookup against the
ni-electoral-areas-2015.csv fixture.  We've asked OSNI if they plan to
expose GSS codes in this dataset.
Note that importing this area generates some warnings about invalid
geometry, but the existing "fix_invalid_geos_geometry" is able to turn
the data into something that it considers valid.
This means we don't need to deal with strings vs callables and can isolate
complexity to the area codes that need it (LGE and NIE).
From looking at the shapefiles in a viewing tool (qgis) it appears that the
co-ords are in the NI projection (29902) in some files, but in the 102100
projection in others.  For some files this matches up with the arcgis
metadata pointed to by the OSNI for each release (e.g. the LGEs
dataset[1] points to the following metadata as its source[2] which lists 29902
as the projection, and the LGW dataset[3] points to source metadata[4] which
lists 102100 projection).  For others however this is not true (e.g. the LGDs
dataset[5] has a source metadata[6] that says 29902, but the downloaded
shapefile is actually in 102100).

As it appears this is changeable per release (possibly per download) we allow
for telling the importer what srid a given file is in.  If it's not 29902 we
convert it to 29902 before importing so that everything is consistent.

Note that the geometry imported from the shapefiles does not contain an
SRID for some reason, so we have to set it even if we don't have to
transform it.

As a further wrinkle, PostGIS doesn't support 102100, but it is mathematically
equivalent to 3857 which it does support.  Unfortunately using that projection
causes failures during point-based lookup of parents, but if we use 4326
instead it works.  Apparently 102100 and 4326 are both "web mercator"
projections so are probably very similar (if not exactly mathematically
equivalent).  Interestingly opening a shapefile that is in 102100 in a viewing
tool such as qgis reports it as 4326 whereas a 29902 reports as a custom
projection that is identical in all but name to 29902.  This suggests it's safe
to use 4326 as a replacement for 102100.

The defaults we set for the options are based on the SRIDs of the data files we've downloaded in Dec 2015 - they may change over time.

[1]: http://osni.spatial-ni.opendata.arcgis.com/datasets/981a83027c0e4790891baadcfaa359a3_4
[2]: https://gisservices.spatialni.gov.uk/arcgisc/rest/services/OpenData/OSNIOpenData_LargescaleBoundaries/MapServer/4
[3]: http://osni.spatial-ni.opendata.arcgis.com/datasets/55cd419b2d2144de9565c9b8f73a226d_0
[4]: https://services3.arcgis.com/dNsInyVNGMqG1QjF/arcgis/rest/services/OSNI_Open_Data_Largescale_Boundaries_Wards_2012/FeatureServer/0
[5]: http://osni.spatial-ni.opendata.arcgis.com/datasets/a55726475f1b460c927d1816ffde6c72_2
[6]: https://gisservices.spatialni.gov.uk/arcgisc/rest/services/OpenData/OSNIOpenData_LargescaleBoundaries/MapServer/2
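
The SRID handling in the commit message above (set the SRID on every geometry, substitute 4326 for 102100, transform anything that is not already 29902) can be sketched as a small planning function. It mirrors the substitution the commit describes rather than endorsing it, it performs no reprojection itself, and `plan_srid` and `SRID_SUBSTITUTES` are hypothetical names.

```python
TARGET_SRID = 29902  # the NI projection the commit normalises to

# The commit reports that PostGIS lacks 102100 and that 4326 worked in its place.
SRID_SUBSTITUTES = {102100: 4326}


def plan_srid(declared_srid, target_srid=TARGET_SRID):
    """Return (srid_to_assign, needs_transform) for one shapefile."""
    srid_to_assign = SRID_SUBSTITUTES.get(declared_srid, declared_srid)
    return srid_to_assign, srid_to_assign != target_srid
```

In an importer the first value would be set on each geometry unconditionally, since the shapefile geometries arrive without an SRID, and the second would decide whether to reproject. `target_srid` is left as a parameter so a different storage SRID can be supplied.
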
We add an option to allow importing the NI postcodes at the same time as the rest of the postcodes.  The option takes 3 values: 'include', 'exclude', 'only' with the same behaviour as the --crown-dependencies option:

* 'include' will import NI postcodes
* 'exclude' will not import NI postcodes
* 'only' will only import NI postcodes

The default is 'exclude' to maintain previous behaviour.  Unlike Crown Dependency postcodes NI postcodes might have location data, and so the --allow-no-location-postcodes setting (default false) does affect how we import NI postcodes.

Setting both --crown-dependencies and --northern-ireland to 'only' is an error and will halt the importer before it begins.
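
A sketch of that up-front check, written as a plain function so it runs on its own. `check_options` is a hypothetical name, and `ValueError` stands in for whatever error the real command raises.

```python
VALID_VALUES = ("include", "exclude", "only")


def check_options(crown_dependencies="exclude", northern_ireland="exclude"):
    """Validate the two tri-state options before any rows are read."""
    for name, value in (("--crown-dependencies", crown_dependencies),
                        ("--northern-ireland", northern_ireland)):
        if value not in VALID_VALUES:
            raise ValueError("%s must be one of %s" % (name, ", ".join(VALID_VALUES)))
    # 'only' for both would reject every row, so treat it as a usage error.
    if crown_dependencies == "only" and northern_ireland == "only":
        raise ValueError(
            "--crown-dependencies and --northern-ireland cannot both be 'only'")
```
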

We've also updated the documentation provided by the options to be clearer about how the various options interact.

Unlike the old nspd_ni importer that relied on the nspd_ni_areas importer to be run and the ni-electoral-areas.csv to directly assign areas to NI postcodes, this importer has no special handling.  We assume that the new OSNI importer has been run and the relevant shapefiles have been imported, much like we assume that the boundary-line importer has been run to provide the areas for the rest of the UK.
The --gb-srid and --ni-srid options have defaults (27700 and 29902
respectively) that are sensible and will change the --srid option on
a per row basis if the postcode is for Northern Ireland (e.g. starts
with BT) or not.
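
A minimal sketch of the per-row SRID choice, assuming the BT prefix is the only test needed. The function name is hypothetical; the default values are the ones given above.

```python
def srid_for_postcode(postcode, gb_srid=27700, ni_srid=29902):
    """Pick the SRID for a row's coordinates from its postcode."""
    # Northern Ireland postcodes all sit in the BT postcode area.
    if postcode.strip().upper().startswith("BT"):
        return ni_srid
    return gb_srid
```
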
One command can, by checking the lengths of the rows, work on both
Code-Point Open and ONSPD files for dealing with scilly wards.
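
Format detection by row length might look like the sketch below. The column count is an assumption: it treats a 10-column row as Code-Point Open and anything else as ONSPD, and both the constant and the function name are hypothetical.

```python
# Assumption: Code-Point Open rows have 10 columns; ONSPD rows have many more.
CODEPOINT_OPEN_COLUMNS = 10


def detect_format(row):
    """Guess which dataset a CSV row came from by its length."""
    if len(row) == CODEPOINT_OPEN_COLUMNS:
        return "codepoint"
    return "onspd"
```

The detected format would then decide which columns hold the postcode and ward code for that row.
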
Some mapit installations already have NI Areas with or without boundaries
but these areas may not have GSS codes.  This script uses the
ni-electoral-areas-2015.csv hierarchy to find LGDs, LGEs, and LGWs by
name and add their GSS codes.  Because names are not necessarily unique
it respects the hierarchy in the fixture.  If names cannot be found (it
does a case-insensitive lookup) the row is ignored and a warning issued.
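
The hierarchy-aware, case-insensitive lookup can be sketched with plain dicts. Everything here is illustrative: the fixture rows, area names and codes are made up, areas are represented as dicts rather than database objects, and the function names are hypothetical.

```python
import warnings

# Made-up fixture rows: (name, code) for district, electoral area and ward.
FIXTURE = [
    {"lgd": ("District A", "D-A"), "lge": ("Area One", "E-1"), "lgw": ("Castle", "W-1")},
    {"lgd": ("District B", "D-B"), "lge": ("Area Two", "E-2"), "lgw": ("Castle", "W-2")},
]


def build_index(fixture):
    """Key each code by (type, lowercased name, lowercased parent name)."""
    index = {}
    for row in fixture:
        (d_name, d_code), (e_name, e_code), (w_name, w_code) = (
            row["lgd"], row["lge"], row["lgw"])
        index[("LGD", d_name.lower(), None)] = d_code
        index[("LGE", e_name.lower(), d_name.lower())] = e_code
        index[("LGW", w_name.lower(), e_name.lower())] = w_code
    return index


def add_gss_codes(areas, fixture):
    """areas: dicts with 'type', 'name' and 'parent' (parent name or None)."""
    index = build_index(fixture)
    for area in areas:
        parent = area["parent"].lower() if area["parent"] else None
        code = index.get((area["type"], area["name"].lower(), parent))
        if code is None:
            # Unmatched names are skipped with a warning, as described above.
            warnings.warn("No fixture match for %s %r" % (area["type"], area["name"]))
            continue
        area["gss"] = code
    return areas
```

Including the parent name in the key is what lets two wards that are both called "Castle" resolve to different codes.
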
The names in the OSNI data don't always match the names for the same
areas in the ni-electoral-areas-2015.csv fixture which was extracted from
the legislation.  In some cases it's just an uppercase difference, or a
lack of punctuation.  In others the names are completely different.  For
example in the fixture GSS N09000011 is called "North Down and Ards" but
in the OSNI shapefile it is called "East Coast".  Turns out this is
because the council voted to change the name to the OSNI one, but backed
down after outcry and reverted[1].

This script goes through the fixture and matches on GSS code to find the
Areas and add a new override name to the area if the fixture name is not
already present.

[1]: http://www.belfasttelegraph.co.uk/news/northern-ireland/backlash-forces-council-to-ditch-new-east-coast-name-that-cost-thousands-30902221.html
@h-lame
Contributor Author

h-lame commented Dec 18, 2015

I think this latest version covers everything.

  1. Instead of a control file for the OSNI import I added a script to go through the ni-electoral-areas-2015.csv and do name lookups on any existing NI areas (EUR, LGD, LGW, LGE only) to add GSS codes (mapit_UK_add_gss_codes_to_ni_areas in 822f443). All we can do is a name lookup and this might not be particularly useful depending on the existing data because...
  2. The OSNI data sources have names that don't include punctuation and in some cases are completely different (e.g. "East Coast" vs. "North Down and Ards"). To help with this I added a script to go through the ni-electoral-areas-2015.csv and do GSS code lookups on the same areas and add any missing names as overrides (mapit_UK_add_names_to_ni_areas in 2617420). Of course this can only do GSS lookup so there's something of a chicken+egg problem for those who already have areas, but no boundaries and no GSS codes.
  3. On IRC we talked about the datasets question and there's now a mirror of the data I used at http://parlvid.mysociety.org/os/osni-dec-2015.tar.gz
  4. Had to allow for specifying the SRID of each OSNI shapefile as it can change depending on the release / download. Gory details in 645a722.

Is the documentation available as a collaborative site? There's enough change here that http://mapit.poplus.org/docs/self-hosted/import/uk/ would become out of date and it would be remiss of me if I didn't offer to help update that too.

@dracos
Member

dracos commented Dec 18, 2015

It’s actually Ards and North Down, they didn’t revert :-) http://www.belfasttelegraph.co.uk/news/politics/inside-the-truly-bonkers-world-of-our-new-super-councils-30968587.html

Your script in 1 works fine on our MapIt, great, apart from a couple of tweaks, given below (I guess the Ards one should be changed in the CSV). I’ve gone for the direct name entry as for some reason our current NI areas don’t have anything in names; name should always be set, so that should be fine. I’ve got it do a startswith match if the exact fails, which catches the council names ending in District/Borough/City Council.

The docs site is at https://github.com/mysociety/mapit.poplus.org :)

$ diff mapit_gb/management/commands/mapit_UK_add_gss_codes_to_ni_areas.py{O,}
83a84,89
>         area_name = area_name.replace('St. ', 'St ')
>         if area_type == 'LGD':
>             area_name = area_name.replace('Derry and', 'Derry City and')
>             area_name = area_name.replace('Armagh, ', 'Armagh City, ')
>             area_name = area_name.replace('North Down and Ards', 'Ards and North Down')
> 
87c93
<                 names__name__iexact=area_name,
---
>                 name__iexact=area_name,
90a97,107
>         except Area.DoesNotExist:
>             try:
>                 area = area_source.get(
>                     country=self.country, type__code=area_type,
>                     name__istartswith=area_name,
>                     generation_low__lte=self.current_generation,
>                     generation_high__gte=self.current_generation
>                 )
>             except Area.DoesNotExist:
>                 return None, False
>         else:
95,96d111
<         except Area.DoesNotExist:
<             return None, False

@h-lame
Contributor Author

h-lame commented Dec 18, 2015

Thanks for trying that out. I thought it might need tweaking to work correctly on your data, so I'll fold in your change - it's ultimately for your install so it's best if it reflects your data.

I'll update the CSV too in a new commit for "Ards and North Down" and I'll check the names of all of them just in case (hope that wikipedia is up to date ;).

Mostly this is just extending the name to include the council type (District, Borough, or City), similar to naming of some council areas in the rest of the UK.  In the case of "North Down and Ards" we also rename to their final name choice of "Ards and North Down".  For "Derry and Strabane" and "Armagh, Banbridge and Craigavon" we also include "City" in the appropriate place ("Derry City" and "Armagh City").
@h-lame
Contributor Author

h-lame commented Dec 18, 2015

Updated the csv with names that include the council type for LGDs. Also added "City" where appropriate and changed "North Down and Ards" to "Ards and North Down" so I didn't fold in these lines:

>         if area_type == 'LGD':
>             area_name = area_name.replace('Derry and', 'Derry City and')
>             area_name = area_name.replace('Armagh, ', 'Armagh City, ')
>             area_name = area_name.replace('North Down and Ards', 'Ards and North Down')

from your diff as I reckon they'd be redundant now.

We incorporate the feedback from mySociety about running the `mapit_UK_add_gss_codes_to_ni_areas` command against their real data.  Because we're doing name matches we need to change our naive `names__name__iexact` match and sanitize the data a bit.
If there are no active generations we set the "current" generation to
the "new" generation.  Otherwise we try to find objects in the 0th
generation and this won't work.
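
A small sketch of the fallback, using plain dicts in place of database objects. The function names and the area record are hypothetical; the generation range test mirrors the `generation_low`/`generation_high` filters visible in the diff earlier in this thread.

```python
def generation_to_search(current_generation, new_generation):
    """Fall back to the new generation when nothing is active yet."""
    if current_generation is None:
        return new_generation
    return current_generation


def areas_in_generation(areas, generation):
    """Names of the areas whose generation range covers the given generation."""
    return [a["name"] for a in areas
            if a["generation_low"] <= generation <= a["generation_high"]]
```

On a first import every area belongs to generation 1, so a search against generation 0 finds nothing, while a search against the fallback generation finds the area.
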
@h-lame
Contributor Author

h-lame commented Jan 4, 2016

Added one (hopefully) final commit to allow running some of the new scripts on first import. Checking for objects in the right generation was always returning None because we look for something no higher than the 0th generation (which doesn't exist).

@h-lame
Contributor Author

h-lame commented Jan 4, 2016

Disappointingly I expect another commit (or rebase) tomorrow - some checks locally show me that the NI shape data has imported into the wrong place. It's dropped it all on top of Liverpool / North Wales. I suspect I need to do more SRID/projection manipulation to get it into the correct place.

We used to import all the NI shapes with an srid of 29902 (the Irish grid [1]).
For some reason when the geometry was extracted from the DB it was in 27700
(the GB grid [2]) but had not undergone any transformation from 29902 to 27700.
Consequently the NI shapes were in the wrong place (covering Liverpool, North
Wales and some of the Irish Sea).

It's not clear how this happened, but we can fix it by always transforming the
NI shape data from whatever srid it is provided as into the 27700 srid used by
the rest of the UK data.  Note that we actually use the
`settings.MAPIT_AREA_SRID` srid and not 27700 directly as in most cases of a UK
instance of mapit this will be 27700, but in the off chance it's not we don't
want things to break.

[1]: http://spatialreference.org/ref/epsg/tm65-irish-grid/
[2]: http://spatialreference.org/ref/epsg/27700/
@h-lame
Contributor Author

h-lame commented Jan 5, 2016

Ok. This last commit 0730197 should finally close this off as ready to review/merge/etc...

Details are in the commit, but in summary: because the rest of the shape data for the UK is in 27700, and the default SRID for the instance is 27700, it makes sense to transform the NI shape areas to 27700 for storage. If we don't do this then when the geometry is retrieved from the DB it reports as being in 27700 without actually having been transformed from whatever SRID it is in, so ultimately it is wrong (e.g. the NI shapes end up being over Liverpool/North Wales).

Apologies for the churn here. I've done more local testing and visualising and am way more confident that I'm actually done now.

h-lame added a commit to h-lame/mapit.poplus.org that referenced this pull request Jan 6, 2016
We break the UK import documentation into two parts.  The second part is the original documentation, now titled as "For data released before Nov 2015".  The first part, titled "For data released after Nov 2015", covers what commands to use to import postcodes solely from ONSPD and import NI shape data from the OSNI Open Data.

This acts as documentation for the changes in mysociety/mapit#216
@dracos
Member

dracos commented Jan 22, 2016

I have cherry-picked b2d1d6d 74aa673 7f0f41e 8dafb12 32d772e and 99847c4 (the first five commits and the ni-electoral-areas-2015.csv fixup) into master (I word wrapped and tweaked a few of the commit messages (e.g. removed an unused footnote from one) but otherwise made no changes). If you want to rebase the rest of this PR on top of that, feel free, otherwise I'll continue merging as time allows, thanks :-)

@dracos
Member

dracos commented Feb 10, 2016

Sigh, Safari ate my comment :-/ This is all now rebased/squashed/merged, thanks very much for your help and patience. Using diff <(git diff master...alphagov/upstream-onspd-importers mapit_gb) <(git diff bb2bc4a..upstream-onspd-importers mapit_gb) you can see I've made the following changes which you might want to look at:

  • Changed some print to self.stdout.writes (helped with testing with piping without encoding issues)
  • Stopped a "St." creeping in via add_names_to_ni_areas ;)
  • Allowed the two add_*_to_ni_areas scripts to be run without needing a new generation (as I've run them first on our MapIt), just using the latest generation, active or not
  • Added a name match of NIE areas in import_osni so that our existing NIE areas can be used, without having to create new ones (this last one as a new commit, the rest squashed in).

This is all now present on mapit.mysociety.org too, e.g. http://mapit.mysociety.org/area/16985.html has a boundary for the first time ever, hooray! I'm just going through and checking the manually created postcode–area lookups match before dropping them (it doesn't affect the API call but they do duplicate on the HTML page). Thanks again, ask if you have any questions.

@dracos dracos closed this Feb 10, 2016
@dracos dracos removed the Current label Feb 10, 2016
@h-lame
Contributor Author

h-lame commented Feb 11, 2016

Brilliant, thanks @dracos! If we have more changes in the future we'll do our best to make them smaller so they're easier to incorporate, rather than a single massive "change everything plz" PR like this one.

@barrucadu barrucadu deleted the upstream-onspd-importers branch April 27, 2018 12:53
@timwis

timwis commented Dec 16, 2019

Forgive me if I'm misunderstanding, but doesn't this suggest MapIt will return results for terminated postcodes? I'm getting no results for G41 5PF, which is in the ONSPD dataset and existed from 1990-1992 (the good ol' days, of Bryan Adams, Mariah Carey, and Boyz II Men yore).

@h-lame
Contributor Author

h-lame commented Dec 16, 2019

> Forgive me if I'm misunderstanding, but doesn't this suggest MapIt will return results for terminated postcodes? I'm getting no results for G41 5PF, which is in the ONSPD dataset and existed from 1990-1992 (the good ol' days, of Bryan Adams, Mariah Carey, and Boyz II Men yore).

Hi @timwis - it might depend on the import arguments of the mapit instance you're using, as allowing terminated postcodes is an optional feature, specified with --allow-terminated-postcodes when importing. If it's the public mapit instance maybe other mySociety folks can confirm what options were used, and if you should be seeing terminated postcodes.

@timwis

timwis commented Dec 16, 2019

@h-lame thanks for the quick response! We are indeed using the public mapit instance, so it would be great to know whether terminated postcodes are included in that one.

@dracos
Member

dracos commented Dec 16, 2019

Umm, just checking - it doesn't look like we import terminated postcodes on mapit.mysociety.org at present, but also have not e.g. deleted postcodes that became terminated during the lifespan of the site, so it's not exactly consistent. I can't see a reason not to include older ones in that case, unless there's some overwhelming number of them, will look to include them in the next import.

@timwis

timwis commented Dec 16, 2019

Thanks @dracos. According to the ONSPD documentation (Table 4 / Page 29), there are 872,377 terminated postcodes in the UK (including Northern Ireland, Channel Islands, etc.). Does that sound reasonable to include? I'm not sure of the exact number of records overall in ONSPD, but the full csv file has 2,632,805 lines, so assuming that's the number of records, terminated postcodes represent 33% of it.
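
The 33% figure can be checked with a couple of lines of arithmetic. This sketch assumes that one of the CSV's lines is a header row, which is not stated above; the variable names are illustrative.

```python
terminated = 872_377   # terminated postcodes, per the ONSPD documentation
csv_lines = 2_632_805  # line count of the full csv file

records = csv_lines - 1  # assumption: one header line
share = terminated / records

# Prints the share as a percentage to one decimal place.
print("{:.1%}".format(share))
```

Dropping the header assumption changes the result only in the seventh decimal place, so the estimate of roughly a third holds either way.
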

@chris48s
Contributor

Keeping terminated postcodes around might not be as useful as you think it is. I don't really know the full mechanics of it, but a large number of them end up being assigned to Royal Mail sorting offices so your centroid point ends up not describing the centroid of the group of properties the postcode used to describe before it was terminated, but the location of the nearest Royal Mail sorting office. It might depend on the postcode type. To give a couple of examples:

(just for the lolz, have a look how many different postcodes in the mapit DB have a point somewhere in that one sorting office).

Broadly this situation already exists in mapit for the reason you've described (tbh I've always assumed you just import terminated postcodes because of the number that exist). I guess it depends on your viewpoint whether you see this as a problem or a desirable quality, but it's worth thinking about if you're making a decision on what to do about terminated postcodes.
