this is largely based on [elasticsearch-phone] and internally uses
[libphonenumber].
this intentionally only ports a subset of the features: only `phone` and
`phone-search` are supported right now, `phone-email` can be added
if/when there's a clear need for it.
using `libphonenumber` is required since parsing phone numbers is a
non-trivial task (even though it might seem trivial at first glance!),
as can be seen in the list [falsehoods programmers believe about phone
numbers][falsehoods].
this allows defining the region to be used when analysing a phone
number. so far only the generic "unkown" region (`ZZ`) had been used
which worked as long as international numbers were prefixed with `+` but
did not work when using local numbers (e.g. a number stored as
`+4158...` was not matched against a number entered as `004158...` or
`058...`).
example configuration for an index:
```json
{
"index": {
"analysis": {
"analyzer": {
"phone": {
"type": "phone"
},
"phone-search": {
"type": "phone-search"
},
"phone-ch": {
"type": "phone",
"phone-region": "CH"
},
"phone-search-ch": {
"type": "phone-search",
"phone-region": "CH"
}
}
}
}
}
```
this creates four analyzers: `phone` and `phone-search` which do not
explicitly specify a region and thus fall back to `ZZ` (unknown region,
regional version of international dialing prefix (e.g. `00` instead of
`+` in most of europe) will not be recognised) and `phone-ch` and
`phone-search-ch` which will try to parse the phone number as a swiss
phone number (thus e.g. `00` as a prefix is recognised as the
international dialing prefix).
note that the analyzer is (currently) not meant to find phone numbers in
large text documents - instead it should be used on fields which contain
just the phone number (though extra text will be ignored) and it
collects the whole content of the field into a `String` in memory,
making it unsuitable for large field values.
this has been implemented in a new plugin which is however part of the
central opensearch repository as it was deemed too big an overhead to
have it in a separate repository but not important enough to bundle it
directly in `analysis-common` (see the discussion on the issue and the
PR for further details).
note that the new plugin has been added to the exclude list of the
javadoc check as this check is overzealous and also complains in many
cases where it shouldn't (e.g. on overridden methods - which it should
theoretically not do - or constructors which don't even exist). the
check first needs to be improved before this exclusion could be removed.
closes opensearch-project#11326
[elasticsearch-phone]: https://github.com/purecloudlabs/elasticsearch-phone
[libphonenumber]: https://github.com/google/libphonenumber
[falsehoods]: https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md
Signed-off-by: Ralph Ursprung <[email protected]>