60-81-identifiers #84

Merged
merged 6 commits into from
Jan 12, 2022
16 changes: 2 additions & 14 deletions src/main/asciidoc/api_specifications.adoc
@@ -270,14 +270,12 @@ A fixed list of possible values of a property can be specified using `enum`.
However, this may make it harder to change the list of possible values, as client applications will often depend on the specified list e.g. by using code generation.

`enum` SHOULD only be used when the list of values is unlikely to change or when changing it has a big impact on clients of the API.

Enumerated string values SHOULD be declared in lowerCamelCase, just as property names.
====

.Enum declaration
====
```YAML
state:
State:
type: string
enum:
- processing
@@ -286,17 +284,7 @@ state:
```
====

[.rule, caption="Rule {counter:rule-number}: "]
.String and integer types
====
When defining the type for a property representing a numerical code or identifier:

* if the values constitute a list of sequentially generated codes (e.g. gender ISO code), `type: integer` SHOULD be used. It is RECOMMENDED to further restrict the format of the type (e.g. `format: int32`).
* if the values are of fixed length or not sequentially generated, `type: string` SHOULD be used (e.g. Ssin, EnterpriseNumber). This prevents leading zeros from being hidden.

When using a string data type, each code SHOULD have a unique representation, e.g. don't allow representations both with and without leading zeros or spaces for a single code.
If possible, specify a `pattern` with a regular expression restricting the allowed representations.
====
When defining a type for an identifier or code, like the above example, the guidelines under <<Identifier>> apply, even when not used as a URL path parameter of a document resource.

[[openapi-tools]]
=== Tools
3 changes: 2 additions & 1 deletion src/main/asciidoc/changelog.adoc
@@ -1,5 +1,6 @@
== Changelog
* 2021-xx-xx
* 2022-xx-xx
** new: guidelines when designing new <<Identifier,identifiers and codes>> or using existing numerical ones
** new: <<Service Unavailable>> problem type (http 503)
** added: use Retry-After HTTP header in <<Too Many Failed Requests>> and <<Too Many Requests>>
* 2021-06-24
189 changes: 184 additions & 5 deletions src/main/asciidoc/resources-document.adoc
@@ -29,14 +29,193 @@ image::collection.png[]
When defining child resources, stick to concepts _within the same API_. An employer resource could contain child concepts such as employees, debts, taxes, mandates, declarations and risks, but putting all these different concepts below a single document resource becomes unmanageable.
In that case *prefer associations and create <<links,links>>* to other APIs from the employer resource.

*Identity key*
=== Identifier

The _identity key_ is preferably a _natural business identifier_, uniquely identifying the business resource. If such key does not exist, a _surrogate or technical key_ (like http://tools.ietf.org/html/rfc4122[UUID^]) can be used.
[.rule, caption="Rule {counter:rule-number}: "]
.Choice of identifier
====
The _identity key_ of a document resource is preferably a _natural business identifier_ that uniquely identifies the business resource, such as an ISBN or SSIN. If no such key exists, a _surrogate or technical key_ can be created.
====

New types of identifiers should be designed carefully. Once an identifier has been introduced, it may get widespread usage in various systems even beyond the scope for which it was initially designed, making it very hard to change its structure later on.

WARNING: Especially for unsecured resources, avoid technical keys that are easy to guess (for example sequential identifiers)
Designing identifiers in a URI structure, as specified in the https://github.com/belgif/thematic/blob/master/URI/iceg_uri_standard.md[ICEG URI standard], is useful as it makes the identifier context-independent and more self-descriptive. A REST API may choose to use a shorter API-local form of a URI identifier because of practical considerations.

When designing an identifier, various requirements may be important, depending on the use case:

* the governance and lifecycle of identifiers and the entities they represent
* easy to memorize (e.g. textual identifier like problem types)
* input by user (e.g. web form, over phone/mail)
** easy to type (ignore special separator chars, difference between lower/capital case), limited length
** validation of typing errors, e.g. by checksum, fixed length, ...
** hint on format to recognize purpose of identifier based on its value
* printable (restricted length)
* open to evolve structure for new use cases
* ability to generate identifiers collision-free at multiple independent sources, e.g. by adding a source-specific prefix, using UUIDs, ...
* stable across different deployment environments (e.g. problem type codes)
* hide any business information (e.g. no sequential number that indicates number of resources created)
* not easy to guess a valid identifier, especially for unsecured resources (e.g. no sequentially generated identifier)
* easy to represent in URL parameter without escaping
* sortable (for technical reasons e.g. pagination)

[[new-identifiers]]
[.rule, caption="Rule {counter:rule-number}: "]
.Designing new identifiers
====
For new identifiers, a string based format SHOULD be used: textual lowerCamelCase string codes, http://tools.ietf.org/html/rfc4122[UUID^], URI or other custom formats. Take into account the requirements that follow from the ways the identifier will be used.

Each identifier MUST be represented by exactly one string value, so that a string equality check can be used to test whether two identifiers are identical. This means that capitalization, whitespace and leading zeroes are significant.

In the OpenAPI data type for the identifier, a regular expression may be specified if helpful for input validation or as a hint of the structure (e.g. to avoid whitespace or wrong capitalization), but it shouldn't be too restrictive, so that the format can still evolve.

No business meaning SHOULD be attributed to parts of the identifier. This should be captured in separate data fields. Parts with technical meaning like a checksum are allowed.
====
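
.Sketch: OpenAPI type for a new string identifier
====
As a minimal sketch of the rule above, a new identifier could be declared as follows. The `DossierId` name, its value format and the example value are hypothetical and only serve as illustration; they are not defined in this guide.
```YAML
# Hypothetical type, for illustration only
DossierId:
  description: Identifier of a dossier. Two identifiers are identical only if their string values are equal.
  type: string
  pattern: '^[a-z0-9-]+$' # rejects whitespace and capitals, leaves length and structure open to evolve
  example: "d-7f3k9q2m"
```
The pattern is deliberately loose: it helps catch whitespace or wrong capitalization without fixing the length or internal structure of the identifier.
====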

NOTE: Parts of an identifier may carry some business meaning for easier readability, like the problem type identifiers in this guide, but no application logic should parse and interpret these parts.

WARNING: Don't use database-generated keys as identity keys in your public API to avoid tight coupling between the database schema and API. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (agility) and guarantees uniqueness.

.Identifiers
====
The table below lists some examples of identifiers, though it does not list all possibilities or considerations when designing a new identifier.
|===
| identifier structure | example | OpenAPI type | considerations

|UUID
| "d9e35127-e9b1-4201-a211-2b52e52508df"
a|
Type defined in https://github.com/belgif/openapi-common/blob/master/src/main/swagger/common/v1/common-v1.yaml[common-v1.yaml]
```YAML
Uuid:
  description: Universally Unique Identifier, as standardized in RFC 4122 and ISO/IEC 9834-8
  type: string
  pattern: '^[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}$'
```
a|
long identifier,
not easy to memorize or input by user,
easy to generate,
resistant to brute-force guessing

| URI (URN)
| "urn:problem-type:belgif:resourceNotFound"
a|
```YAML
type: string
format: uri
pattern: "^urn:problem-type:.+$" # further restrictions may be possible
```
|
can be human readable,
long, not easy to input by user
| URI (http)
| "https://www.waterwegen.be/id/rivier/schelde"
a|
```YAML
type: string
format: uri
```
|
can be human readable,
long, not easy to input by user,
requires character escaping when used as URL parameter,
can be generated collision-free by multiple sources (different domain name)

| custom format
| "ab12347895"
a|
```YAML
type: string
pattern: "^[a-z0-9]{1,20}$"
```
|
short,
easy to encode
|===
====

A _code_ is a special type of identifier:

* it has an exhaustive list of possible values that doesn't change frequently over time
* each value identifies a concept (examples: a country, a gender, ...).

[.rule, caption="Rule {counter:rule-number}: "]
.Designing new codes
====
New code types SHOULD be represented as string values in lowerCamelCase.

Depending on context, the OpenAPI data type may enumerate the list of allowed values (see <<enum-rule>>).
====

.Code
====
// TODO: replace PensionType fabricated example with a real one
`GET /refData/pensionTypes/{id}` with `id` of type `PensionType`

As string with enumeration:
```YAML
PensionType:
  type: string
  enum:
    - retirementPension
    - survivalPension
    - guaranteedIncomeElderly
```

As string with regular expression:
```YAML
PensionType:
  type: string
  pattern: "^[A-Za-z0-9]+$"
  example: "retirementPension"
```
====
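
.Sketch: using a code type as path parameter
====
The `id` path parameter of `GET /refData/pensionTypes/{id}` could reference the `PensionType` type as sketched here. This assumes OpenAPI 3.0 syntax and that the type is declared under `components/schemas` of the same document; the response description is a placeholder.
```YAML
# Illustrative fragment, assuming OpenAPI 3.0 and a local PensionType schema
paths:
  /refData/pensionTypes/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            $ref: "#/components/schemas/PensionType"
      responses:
        "200":
          description: The requested pension type (placeholder description)
```
====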

[.rule, caption="Rule {counter:rule-number}: "]
.Representing existing numerical identifiers
====
When defining the type for a property representing an existing numerical code or identifier:

* Identifiers that are commonly represented (e.g. when displayed or entered by a user) with *leading zeros* present SHOULD be represented using a string type. A regular expression SHOULD be specified in the OpenAPI data type to avoid erroneous values (i.e. without leading zeros).
* Otherwise, use an integer-based type. It is RECOMMENDED to further restrict the type (e.g. using `format: int32` and `minimum`/`maximum`).

For new identifiers, however, a number type is not recommended, as stated in <<new-identifiers>>.
====

.Representing existing numerical identifiers
====
An employer ID may be of variable length. Leading zeroes are ignored and most of the time not displayed.
```YAML
EmployerId:
  description: Definitive or provisional NSSO number, assigned to each registered employer or local or provincial administration.
  type: integer
  minimum: 0
  maximum: 5999999999
  example: 21197
```

If an SSIN has a zero as its first digit, that zero is always displayed.

```YAML
Ssin:
  description: Social Security Identification Number issued by the National Register or CBSS
  type: string
  pattern: '^\d{11}$'
```

A country NIS code is a three-digit code; the first digit cannot be a zero.

```YAML
CountryNisCode:
  description: NIS code representing a country as defined by statbel.fgov.be
  type: integer
  minimum: 100
  maximum: 999
  example: 150 # represents Belgium
```

====
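
.Sketch: existing identifier written with a leading zero
====
By the same reasoning as for `Ssin`, an enterprise number that is commonly written with its leading zero, such as "0202239951", would be declared as a string. The schema below is only a sketch: the type name, description and ten-digit pattern are assumptions, not a copy of an official definition.
```YAML
# Assumed definition, for illustration only
EnterpriseNumber:
  description: Enterprise number, always written with all ten digits including a leading zero
  type: string
  pattern: '^\d{10}$' # assumption: exactly ten digits
  example: "0202239951"
```
Declaring the type as an integer would drop the leading zero and allow two representations of the same identifier.
====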

[.rule, caption="Rule {counter:rule-number}: "]
.Identifier name
====
@@ -106,9 +285,9 @@ a|
{
"self": "{API}/employers/93017373[/employers/93017373^]",
"name": "Belgacom",
"nssoNbr": 93017373,
"employerId": 93017373,
"company": {
"cbeNbr": 202239951,
"enterpriseNumber": "0202239951",
"href": "{API}/companies/202239951[/companies/202239951^]"
}
}