hyphenated housenumber parsing #204

missinglink · 2019-05-22T13:45:49Z

We have a conservative setting for parsing hyphenated house numbers.

i.e., is 4-6 a 'number range' or a 'house number and apartment number'?

In some countries, such as Canada, the postal authority recommends separating the house number and apartment number with a hyphen:
https://en.wikipedia.org/wiki/Address#Canada

If there is an apartment number it should be written before the house number and separated by a hyphen.

As we cannot reliably determine the postal addressing format, we discard the address rather than potentially corrupting the number series with an incorrect value.

We have tests for this behaviour here: https://github.com/pelias/interpolation/blob/master/test/lib/analyze.js#L41

It would probably be better to assume these numbers are ranges by default, detect the countries where hyphens are used to delimit apartment numbers (possibly via a bbox check), and apply the conservative logic only in those countries.
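A minimal Python sketch of that proposal, assuming a plain "A-B" input and a point location for the address. The bounding box is a deliberately crude, hypothetical approximation of Canada; it also covers parts of the northern USA, which shows why a bbox check can only ever be approximate. `parse_hyphenated` and `HYPHEN_APARTMENT_REGIONS` are illustrative names, not part of lib/analyze.js.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# "A-B" where A and B are plain integers, e.g. "4-6" or "104-114".
HYPHENATED = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


# Hypothetical, very rough bbox for Canada. It also overlaps parts of the
# northern USA, so a real implementation would need something better.
HYPHEN_APARTMENT_REGIONS: List[BBox] = [BBox(-141.0, 41.7, -52.6, 83.1)]


def parse_hyphenated(raw: str, lon: float, lat: float) -> Optional[Tuple[int, int]]:
    """Treat "A-B" as the range A..B unless the point lies in a region
    where hyphens usually separate an apartment number from a house number."""
    m = HYPHENATED.match(raw)
    if m is None:
        return None
    if any(box.contains(lon, lat) for box in HYPHEN_APARTMENT_REGIONS):
        return None  # conservative: could be "apartment-house"
    start, end = int(m.group(1)), int(m.group(2))
    if start >= end:
        return None  # not an ascending range; don't guess
    return start, end
```

With this sketch, "104-114" at a London coordinate parses as a range, while "4-6" at a Toronto coordinate is still discarded.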


missinglink · 2019-05-22T13:48:27Z

we already have some configurable values to control how we handle hyphens:
https://github.com/pelias/interpolation/blob/master/lib/analyze.js#L1-L8

vicchi · 2020-07-15T16:30:49Z

@missinglink Putting my "UK centric addressing format" hat on for a second ...

When running ./interpolate to create address.db, based on a GB OSM extract, I see lots of messages like ...

could not reliably parse housenumber 6 & 8
could not reliably parse housenumber 104-114

... both of which make sense for the UK. Apartment/flat/unit numbers are (almost) always expressed as Apt 1 or Flat 1 followed by the rest of the address, so 104-114 is (almost) always a building number range. Of course, except when it's not. But mostly it is. This is not an exact science as you well know.

Also, given that a significant number of UK streets are numbered with odd numbers on one side of the road and even numbers on the other, 6 & 8 is really a range covering two adjacent buildings. Except when they're not.
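The UK heuristic described above (read "A-B" and "A & B" as ranges, and expect both ends to share the odd/even side of the road) could be sketched in Python roughly as follows. This is not how lib/analyze.js works; the pattern and the parity rule are assumptions for illustration, and the "except when they're not" cases will still slip through.

```python
import re
from typing import Optional, Tuple

# Hypothetical UK-leaning pattern: "104-114" or "6 & 8".
UK_RANGE = re.compile(r"^\s*(\d+)\s*(?:-|&)\s*(\d+)\s*$")


def parse_uk_range(raw: str) -> Optional[Tuple[int, int]]:
    """Read "A-B" or "A & B" as the range A..B on one side of the street."""
    m = UK_RANGE.match(raw)
    if m is None:
        return None
    start, end = int(m.group(1)), int(m.group(2))
    if start >= end:
        return None  # descending or degenerate: not a plausible range
    if start % 2 != end % 2:
        return None  # mixed parity: probably not the same side of the road
    return start, end
```

Requiring matching parity is a cheap sanity check: "6 & 8" passes, while something like "5-8" is rejected because its ends would sit on opposite sides of the road.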

Looking at the constants at https://github.com/pelias/interpolation/blob/master/lib/analyze.js#L1-L8 ... I'd welcome some suggestions on which magic values to tweak so that the interpolate script treats cases such as these as ranges, since the data sets I'm using for Pelias are (currently) only for the UK.

missinglink · 2020-07-15T17:02:01Z

Hmm, yeah, so we can totally add country-specific logic; some potential issues with adding that:

We don't actually know which country each address belongs to!
The function signatures would need to be updated to allow us to pass this info in, and possibly to return multiple values.
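One way such an updated signature might look, as a Python sketch: the caller passes an optional country code, and the function returns a list so that a range can yield both endpoints. The name `analyze_housenumber`, the country set and the list-based return are all hypothetical, not the existing lib/analyze.js API.

```python
import re
from typing import List, Optional

# Hypothetical set of ISO 3166-1 alpha-2 codes where "A-B" often means
# "apartment A, house B" (e.g. Canada), so ranges are not assumed there.
HYPHEN_APARTMENT_COUNTRIES = {"CA"}

PLAIN = re.compile(r"^\s*(\d+)\s*$")
HYPHENATED = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


def analyze_housenumber(raw: str, country: Optional[str] = None) -> List[int]:
    """Return [] to discard, [n] for a single number, or [start, end]
    for the two endpoints of a range."""
    m = PLAIN.match(raw)
    if m:
        return [int(m.group(1))]
    m = HYPHENATED.match(raw)
    if m is None:
        return []
    if country is None or country.upper() in HYPHEN_APARTMENT_COUNTRIES:
        return []  # country unknown or ambiguous: stay conservative
    start, end = int(m.group(1)), int(m.group(2))
    return [start, end] if start < end else []
```

Defaulting `country` to `None` keeps today's conservative behaviour for callers that cannot supply a country.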

OSM has the concept of interpolation ranges, these are much more reliable and already supported out-of-the-box, as are TIGER ranges.

You should also consider just doing nothing, which I know sounds like an anti-solution but let me explain 😄

Interpolation ranges are only valuable when they are valid, if one or more erroneous members are introduced into the range then it can screw up most of the street.

However, if we have fewer points then we only lose out on precision, so a valid sparse index is probably preferable to a dense range with errors, if that makes sense?
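To make the trade-off concrete, here is a simplified one-dimensional model in Python: house numbers are mapped to a position (metres along the street) by linear interpolation between neighbouring known numbers. The street and the bad record are made up for illustration, and this is not the pelias/interpolation implementation.

```python
from bisect import bisect_left
from typing import List, Tuple


def interpolate(points: List[Tuple[int, float]], number: int) -> float:
    """Estimate the position (metres along the street) of `number` by
    linear interpolation between the neighbouring known house numbers.

    `points` holds (house number, position) pairs.
    """
    pts = sorted(points)
    nums = [n for n, _ in pts]
    i = bisect_left(nums, number)
    if i < len(pts) and nums[i] == number:
        return pts[i][1]
    if i == 0 or i == len(pts):
        raise ValueError("number is outside the known range")
    (n0, p0), (n1, p1) = pts[i - 1], pts[i]
    return p0 + (p1 - p0) * (number - n0) / (n1 - n0)


# Made-up street: No. 2 at 0 m and No. 50 at 240 m.
sparse = [(2, 0.0), (50, 240.0)]
# Same street plus one hypothetical bad record: No. 46 placed near the start.
dense_with_error = sparse + [(46, 20.0)]

print(interpolate(sparse, 26))            # 120.0, midway along the street
print(interpolate(dense_with_error, 26))  # about 10.9, dragged towards the start
```

The sparse index is merely imprecise, whereas the single erroneous point pulls every estimate between No. 2 and No. 46 towards the start of the street.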

Maybe you could send me an example of a street which you'd like to improve?

missinglink · 2020-07-15T17:08:12Z

Ugh, the address coverage in the UK is just so bad. Whatever happened to the OpenAddressesUK project and the rumours of Ordnance Survey opening some block range data up?

There's an interactive demo where you can click streets to see the coverage, which just proves how sparse the coverage is in the UK, even in London:

Maybe I missed a bunch of data in the last import?

vicchi · 2020-07-15T17:14:39Z

Here's a good example, which happens to be my local supermarket ... Tesco, 20-28 Broad St, Teddington TW11 8RF

AFAIK OpenAddresses UK almost got there, but then died due to claims of legal rights over the data from an "organisation", which resulted in almost half of the data being excised, so the project ... expired. See also (cough cough) this.

There is a whole new load of OS open data coming this month as a result of the UK Geospatial Commission shaking things up. I'm eagerly waiting to see just what gets released and whether I can a) use it in my Pelias instance and then (of course) b) contribute it back to Pelias. But right now ... I'm waiting.

missinglink · 2020-07-15T17:15:26Z

END OF RANT. To answer your question, the easiest thing to do is split the data yourself, so that a single row in your file becomes two rows: one with the beginning number and one with the end number.

That's it; there is no added value in generating all the other values within the range, since they can be interpolated.
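A minimal Python sketch of that preprocessing step, assuming the input is an OpenAddresses-style CSV with a `NUMBER` column (the column names and the placeholder coordinates are assumptions). Both output rows keep the original coordinates; only the house number changes.

```python
import csv
import io
import re
import sys
from typing import Dict, Iterable, Iterator

# "A-B" where A < B, e.g. "20-28".
RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


def split_ranges(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Replace each range row with two rows: one for the start number
    and one for the end number. Other rows pass through unchanged."""
    for row in rows:
        m = RANGE.match(row["NUMBER"])
        if m and int(m.group(1)) < int(m.group(2)):
            yield {**row, "NUMBER": m.group(1)}
            yield {**row, "NUMBER": m.group(2)}
        else:
            yield row


if __name__ == "__main__":
    # Placeholder coordinates; the address is the Broad St example above.
    sample = "LON,LAT,NUMBER,STREET\n-0.33,51.42,20-28,Broad St\n"
    reader = csv.DictReader(io.StringIO(sample))
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(split_ranges(reader))
```

The intermediate numbers (22, 24, 26) are deliberately not emitted; the interpolation step can infer them from the two endpoints.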

vicchi · 2020-07-15T17:16:56Z

That makes a lot of sense and I'll give that a try. Also, I appreciated the rant about open addressing data in the UK. I feel that way ... a lot

missinglink · 2020-07-15T17:19:01Z

Amazing, I've been waiting 6 years for this day, if/when that happens we should jump on a call.

vicchi · 2020-07-15T17:39:38Z

@missinglink Hmm ... three new open data sets are now up on the new OS Data Hub: Open TOIDs, Open UPRNs and Open USRNs ... sadly I'm underwhelmed at first glance. Not what I'd hoped for. They're just identifiers that link into proprietary data sets such as AddressBase and MasterMap ... https://osdatahub.os.uk/downloads/open

missinglink · 2020-07-15T17:43:22Z

👑 📧 is the 😈

missinglink · 2020-07-15T17:46:32Z

What I would love to have (at minimum) is 4 house numbers per street: just the start-left, start-right, end-left & end-right house numbers. From this we can figure out quite a lot, and if those 4 points also had postal code info, that would make a huge difference.
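A rough Python sketch of what those four numbers allow, assuming each side of the street uses a single parity and numbers vary linearly along it. `StreetRange` and `locate` are hypothetical names, not part of pelias/interpolation; the result is a side plus a fraction of the street length.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreetRange:
    """Hypothetical record holding the four 'corner' house numbers of a street."""
    start_left: int
    start_right: int
    end_left: int
    end_right: int


def locate(street: StreetRange, number: int) -> Optional[Tuple[str, float]]:
    """Return (side, fraction along the street in [0, 1]) for `number`,
    assuming each side uses a single parity (odd or even)."""
    sides = (
        ("left", street.start_left, street.end_left),
        ("right", street.start_right, street.end_right),
    )
    for side, first, last in sides:
        if first % 2 != number % 2:
            continue  # wrong side of the road
        if min(first, last) <= number <= max(first, last):
            if first == last:
                return side, 0.0
            # Works for ascending and descending numbering alike.
            return side, (number - first) / (last - first)
    return None
```

For a street numbered 1-21 on the left and 2-22 on the right, No. 11 lands halfway along the left side; multiplying the fraction by the street length gives an approximate position.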

What I'm describing is the TIGER file I'm using for the USA; to some degree, we could delete all of OSM and OA for the USA and it wouldn't be too bad.

vicchi · 2020-07-15T18:36:18Z

Hmmm ... with some custom preprocessing and tooling you might be able to cobble that together from (off the top of my head) ONS PD, OS OpenNames, CodePoint Open and OpenRoads and WOF. Maybe. Plus some interpolation into OSM. Though that may veer into (OSM ODbL) derived data set licensing horrendousness.

vicchi · 2020-07-15T18:37:01Z

But that would only be for England and Wales. Maybe for Scotland too. But definitely not for Northern Ireland. Because history and politics.

missinglink · 2020-07-16T09:38:41Z

Hmm... I had a quick look at those data sources today and I couldn't find anything more granular than a street 😢

vicchi · 2020-07-16T16:52:31Z

@missinglink There's a conversation about UK open data, probably mainly about admin polygons, going on over on Gitter, and it would be good to get your take on it when you have a moment.
