Review Updates

- Added style guidelines as an asset. - Linked these guidelines in the specification section - Moved the large PR link to the reference implementation section - Added reference to protoc-gen-doc - Added two rejected ideas - Added paragraph breaks in two places - Updated style guidelines to mention PBJ and Map field concerns. Signed-off-by: Joseph Sinclair <[email protected]>
hashgraph · Oct 1, 2024 · 23d9432 · 23d9432
1 parent 87d5e01
commit 23d9432
Show file tree

Hide file tree

Showing 2 changed files with 333 additions and 6 deletions.
diff --git a/HIP/hip-1037.md b/HIP/hip-1037.md
@@ -9,7 +9,7 @@ needs-council-approval: Yes
 status: Review
 created: 2024-08-09
 discussions-to: https://github.com/hashgraph/hedera-improvement-proposal/discussions/1037
-updated: 2024-09-18
+updated: 2024-10-01
 ---
 
 ## Abstract
@@ -30,6 +30,7 @@ consensus node software without extensive reference to the existing source
 code. Second, developers are often unclear how best to interact with the
 Hedera API, and independent SDK developers, in particular, may struggle to
 correctly implement these API calls.
+
 As part of expanding the Hedera ecosystem we MUST reduce the difficulty and
 hurdles for independent developers when building systems that interact with
 the Hedera network. Clear and accurate API specifications are one element
@@ -39,6 +40,7 @@ required to accomplish this goal.
 This HIP improves the developer experience and enhances decentralization by
 making it easier to design distributed applications or alternative
 implementations of core network systems.
+
 The use of a markdown format that is automatically generated from the
 functional protocol buffer definition files encourages more frequent updates
 that are more closely tied to the functional updates to the API documents, and
@@ -59,10 +61,12 @@ reduces the inevitable "drift" between functional API and API specification.
 This HIP makes sweeping changes to specification text for the existing API
 documents. These changes affect over 20,000 lines of text across almost 150
 files, which is excessive for a document such as this HIP. The full
-specification, therefore, is only linked here for brevity and clarity.<br/>
+implementation, therefore, is only linked here for brevity and clarity.<br/>
 The specification changes intended for this HIP are detailed in github as a
-[pull request](https://github.com/hashgraph/hedera-protobufs/pull/388)
-for the hedera-protobufs repository.
+pull request linked [later in this document](#reference-implementation).
+The core of this HIP is, however,
+[the guidelines](../assets/hip-1037/Specification-Format-Style-Guidelines.md)
+recommended for this, and all future Hedera protocol buffer specifications.
 
 ## Backwards Compatibility
 This HIP does not change any functional characteristic, and documents the
@@ -79,15 +83,28 @@ specification help everyone involved to develop distributed applications
 more quickly and with greater confidence.
 
 ## Reference Implementation
-Not applicable
+We have a
+[pull request](https://github.com/hashgraph/hedera-protobufs/pull/388)
+that implements these guidelines across the full hedera-protobufs repository.
 
 ## Rejected Ideas
-Not applicable
+1. Use of a highly formal "language" for specification
+   * This was deemed excessive.  Protocol Buffers are already a highly formal
+     language, and adding specification as markdown comments, using the 
+     light weight RFC/HIP language is sufficient while remaining approachable
+     for the largest practical number of community members.
+1. Addition of specialized "tag" elements to specification comments.
+   * This would have required building a complete custom documentation
+     processor and would have been non-standard.  Rather than do so we chose
+     to adhere closely to existing standard approaches for protocol buffers.
+
 
 ## Open Issues
 None.
 
 ## References
+- Protocol Buffer Document Generator
+  [protoc-gen-doc](https://github.com/pseudomuto/protoc-gen-doc/?tab=readme-ov-file#protoc-gen-doc).
 
 ## Copyright/license
 This document is licensed under the Apache License, Version 2.0 --

diff --git a/assets/hip-1037/Specification-Format-Style-Guidelines.md b/assets/hip-1037/Specification-Format-Style-Guidelines.md
@@ -0,0 +1,310 @@
+# Protocol Buffer Style Guidelines
+
+## Style Guidelines for "specification" comments
+- Wrapping is 80 characters for top-level messages.
+- Wrap everything else to 80 characters or 100, depending
+   - Prefer to wrap lines to 80 characters
+   - It is OK to go over 80, but less than 100, if it makes the
+     text clearer and easier to read.
+     - This SHOULD be rare, however.
+   - If you have links or tags, wrapping at 100 characters for clarity
+     is acceptable.
+   - We choose 80 characters here to work better in reviews because Github
+     comment boxes are limited to roughly 80 characters width. The primary
+     "code" interaction with proto files and markdown files is in review.
+     Also, some markdown processors struggle with longer lines.
+- Use blank lines (with `*` prefix) for paragraph breaks in message or
+  enum declarations, regardless of where defined.
+   - Use `<p>` tags for _fields_, and _no_ "empty" lines (it breaks the tables).
+- Use HTML tags where appropriate, but stick to markdown as much as possible.
+- Use correct US-English spelling and grammar to the maximum extent possible.
+   - Write proper and complete sentences, avoid partial sentences and
+     unnecessary abbreviations.
+   - Do not use partial words (e.g. instead of "spec" write "specification").
+   - Explain all acronyms on first use in a document.
+      - Use "Hedera Consensus Service(HCS)" the first time "HCS" is written
+        but "HCS" may be used thereafter.
+      - Extremely common acronyms (e.g. RAM or LASER) do not need
+        to be expanded.
+- First line of each document block is "what is this", and should be a single
+  sentence.  Examples: "An identifier for an account." or "A `tokenTransfer`
+  transaction request body.".
+   - Avoid "this is a thing" (or shorter "thing") type sentences,
+     write "A thing" instead.
+      - This applies for the _first sentence_ only, **not** elsewhere.
+        The first sentence is a "what is this" answer, not full description
+        and mostly just translates from "code" syntax to plain English.
+   - Minimize detail here, qualifiers, usage, etc... belong in the brief
+     description or requirements, not here.
+      - A good rule of thumb is that if this first sentence exceeds 60
+        characters, then it is trying to describe too much.
+- Include a line break (`<br/>`) after the first line, then a brief description,
+  if appropriate.
+   - Not every field needs additional description, but _every_ `message` does.
+   - Descriptions may be detailed, but should not include internal
+     implementation details or requirements. The description _should_ be
+     detailed enough to explain to an external contributor what the
+     message or field contains, where it fits within the overall network
+     system, and why it is needed.
+      - This is a long-term goal; we have a long way to go to get to this
+        level of description. Every PR to change `.proto` files should seek
+        to improve the documentation for that file.
+   - Avoid describing what must, should, must not, or should not be
+     sent, or how the field values affect output. Those items belong in the
+     _requirements_, instead.
+- After a paragraph break (a blank line for messages), one requirement per
+  line. Use break tags at the end of each requirement except the last.
+   - In field comments a `<p>` tag is _required_ for paragraph breaks.
+     Blank lines break the automated documentation generator in **fields**
+     because fields are output as markdown tables, and markdown tables require
+     each row to be a single long line.
+- Be _prescriptive_ in requirements.  Do not _describe_ the behavior.
+   - Declare what MUST be done for valid inputs, and what SHALL be present in
+     valid outputs (or in state messages).
+   - Declare what SHALL happen as a result of a successful, or failed,
+     transaction.
+   - The word `will` usually means `SHALL` or `MUST`.
+   - The word `is` often means `SHALL be`.
+   - Document both positive (`SHALL` or `MUST`) and negative (`SHALL NOT` or
+     `MUST NOT`) requirements.
+   - Avoid "perfect" or "imperfect" verb aspects and "wiggle" words
+     (might, could, perhaps, etc...).
+      - The specification items MAY, SHOULD, and RECOMMENDED serve to fill
+        the _occasional_ need for less imperative requirements.
+   - Also avoid "progressive" verbs (e.g. working, writing, going, deleting).
+- Requirements should be short and focused.
+   - Better to have three separate lines for three items than one longer
+     line with three clauses.
+   - Duplicating text across lines with slightly different requirements
+     is not only acceptable, but often recommended.
+   - Well written specification tends to be much more short and clipped
+     sentences than "normal" prose. Some describe the flow as "staccato".
+- Add another blank line after requirements for `message` documents only.
+- Each message that represents a transaction body **must** document the
+  Block Stream effects with a heading `### Block Stream Effects`.
+- General notes go last, for `message` only, with a heading
+  `#### Additional Notes`.
+
+## Content Guidelines
+
+### General `proto` File Guidelines
+- Protocol buffers are compiled for the `hedera-services` codebase using a
+  custom processor 'PBJ'. While no other entity is required to use `PBJ`, all
+  protocol buffer files MUST be compatible with `PBJ` processing.
+
+### Field Type Guidelines
+- There are some negative patterns present in existing `proto` files.  These
+  SHOULD NOT be replicated in new files, when possible.
+   - Older files may use `int64` or `int32` when a `uint` is more appropriate,
+     and document a "non-negative" requirement.  This SHOULD NOT be replicated.
+     in new content (files, messages, fields) we SHOULD use _unsigned_ numeric
+     types where appropriate.
+      - We MUST NOT change existing field types without very careful
+        consideration for binary compatibility and impact to clients.
+- Protocol buffer encoding for Hedera MUST be deterministic.
+   - For this reason `Map' fields CANNOT be used because most protocol buffer
+     implementations are _intentionally_ non-deterministic in
+     serializing and parsing those fields.
+   - Unknown fields are, likewise, not handled deterministically by most
+     protocol buffer implementations, so unknown fields MUST NOT be used.
+      - One exception here is forward compatibility, as implemented in the
+        `PBJ` processor, which is supported in selected scenarios where
+        deterministic behavior is not strictly required.
+
+### Package Directive Guidelines
+- There are some negative patterns present in existing `proto` files.  These
+  SHOULD NOT be replicated in new files, when possible.
+   - Older files have a `package proto;` directive.  This is leftover from
+     _sample_ content, and SHOULD NOT be used in new files.
+      - To avoid major compatibility issues, particularly for SDK authors, we
+        SHOULD NOT change the `package` directive in _existing_ files.
+- For _new_ `proto` files only, the file SHOULD use a `package` directive with
+  a package that is identical to the value in the
+  `pbj.java_package` pseudo-directive.
+   - One exception to this guideline is that gRPC `service` and `rpc`
+     definitions include the package name in the url path.  As a result we
+     cannot easily use full package names for these.
+      - Completely new services and gRPC endpoints SHOULD be defined alone in a
+        separate `thing_service.proto` file _without_ transaction bodies. The
+        endpoint MAY, then, be defined in a proper package.
+      - Transaction body messages and any component messages SHOULD be defined
+        in a proper package, and references in the gRPC service definition may
+        then use the full package prefix when referring to those messages.
+   - The "namespace" behavior of package names is not well supported by PBJ at
+     this time, so packages must be fully specified whenever used.
+
+
+## Examples
+
+### A Message Example
+```protobuf
+/**
+ * A transaction body to add a new consensus node to the network address book.
+ *
+ * This transaction body SHALL be considered a "privileged transaction".
+ *
+ * This message supports a transaction to create a new node in the network
+ * address book. The transaction, once complete, enables a new consensus node
+ * to join the network, and requires governing council authorization.
+ *
+ * - A `NodeCreateTransactionBody` MUST be signed by the governing council.
+ * - A `NodeCreateTransactionBody` MUST be signed by the `Key` assigned to the
+ *   `admin_key` field.
+ * - The newly created node information SHALL be added to the network address
+ *   book information in the network state.
+ * - The new entry SHALL be created in "state" but SHALL NOT participate in
+ *   network consensus and SHALL NOT be present in network "configuration"
+ *   until the next "upgrade" transaction (as noted below).
+ * - All new address book entries SHALL be added to the active network
+ *   configuration during the next `freeze` transaction with the field
+ *   `freeze_type` set to `PREPARE_UPGRADE`.
+ *
+ * ### Block Stream Effects
+ * Upon completion the newly assigned `node_id` SHALL be recorded in
+ * the transaction receipt.<br/>
+ * This value SHALL be the next available node identifier.<br/>
+ * Node identifiers SHALL NOT be reused.
+ */
+```
+
+### A Field Example
+```protobuf
+    /**
+     * A list of service endpoints for gossip.
+     * <p>
+     * These endpoints SHALL represent the published endpoints to which other
+     * consensus nodes may _gossip_ transactions.<br/>
+     * These endpoints MUST specify a port.<br/>
+     * This list MUST NOT be empty.<br/>
+     * This list MUST NOT contain more than `10` entries.<br/>
+     * The first two entries in this list SHALL be the endpoints published to
+     * all consensus nodes.<br/>
+     * All other entries SHALL be reserved for future use.
+     * <p>
+     * Each network may have additional requirements for these endpoints.
+     * A client MUST check network-specific documentation for those
+     * details.<br/>
+     * If the network configuration value `gossipFqdnRestricted` is set, then
+     * all endpoints in this list MUST supply only IP address.<br/>
+     * If the network configuration value `gossipFqdnRestricted` is _not_ set,
+     * then endpoints in this list MAY supply either IP address or FQDN, but
+     * MUST NOT supply both values for the same endpoint.
+     */
+```
+
+### An Enum Example
+```protobuf
+/**
+ * An informational enumeration of all known states.
+ * This enumeration is included here So that people know the mapping from
+ * integer to state "name".
+ *
+ * State changes are expressed in terms of changes to named states at the
+ * high level conceptual model of the state type like map key/values or
+ * queue appends. To say which state the change is on we will include an
+ * `integer` number for the state name. This is done for performance and
+ * efficiency as there will be 10s of thousands of state changes in a block.
+ *
+ * We use an integer, and provide this enumeration, for the following reasons.
+ * - If we have a extra 8-10 bytes per state change at 40-50K state changes
+ *   per second then that is an extra 2.5-4 megabits of bandwidth. Compression
+ *   should help a lot but that is not guaranteed.
+ * - When the state name is used as part of complex key in the big state
+ *   merkle map. The smaller the key is, in bytes, the more efficient the
+ *   database is, because more keys can fit in a single disk page.
+ * - When parsing keys, parsing a UTF-8 string to a Java String is a many
+ *   times more expensive than parsing a VARINT to an integer.
+ *
+ * Note: This enumeration is never transmitted directly in the block stream.
+ * This enumeration is provided for clients to _interpret_ the value
+ * of the `StateChange`.`state_id` field.
+ */
+```
+
+### A Complex Message Example
+```protobuf
+/**
+ * A single Account in the Hedera distributed ledger.
+ * 
+ * Each Account SHALL have a unique three-part identifier, a Key, and one
+ * or more token balances.<br/>
+ * Each Account SHALL have an alias, which has multiple forms, and MAY be set automatically.<br/>
+ * Several additional items SHALL be associated with the Account to enable
+ * full functionality.<br/>
+ * Assets SHALL be represented as linked-lists with only the "head" item
+ * referenced directly in the Account, and the remaining items SHALL be
+ * accessible via the token relation or unique tokens maps.<br/>
+ * Accounts, as most items in the network, SHALL have an expiration time,
+ * recorded as seconds since the epoch, and MUST be "renewed" for a small fee
+ * at expiration. This helps to reduce the amount of inactive accounts retained
+ * in state.<br/>
+ * Another account MAY be designated to pay any renewal fees and automatically
+ * renew an account for (by default) 30-90 days at a time as a means to
+ * optionally ensure important accounts remain active.<br/>
+ * Accounts MAY participate in securing the network by "staking" the account
+ * balances to a particular network node, and receive a portion of network
+ * fees as a reward. An account MAY optionally decline these rewards but still
+ * stake its balances.<br/>
+ * An account MAY optionally require that inbound transfer transactions be
+ * signed by that account as receiver
+ * (in addition to the sender's signature).<br/>
+ * As with all network entities, Account ID SHALL be represented as
+ * shard.realm.X.<br/>
+ * Alias and contractId SHALL be additional identifiers used to connect accounts
+ * to transactions before the account is fully enabled,
+ * or in EVM contracts.<br/>
+ *
+ * ---
+ *
+ * #### Alias
+ * There is considerable complexity with `alias` (aka `evm_address`) for
+ * Accounts. Much of this comes from the existence of a "hidden" alias for
+ * almost all accounts, and the reuse of the alias field for both EVM reference
+ * and "automatic" account creation.
+ *
+ * For the purposes of this specification, we will use the following terms for
+ * clarity.
+ *   - `key_alias` is the account public key as a protobuf serialized message
+ *     and used for auto-creation and subsequent lookup. This is only valid if
+ *     the account key is a
+ *     single `primitive` key, either ED25519 or ECDSA_SECP256K1.
+ *   - `evm_address` exists for every account and is one of
+ *      - `contract_address`, which is the 20 byte EVM contract address per
+ *        EIP-1014
+ *      - `evm_key_address`, which is the keccak-256 hash of a ECDSA_SECP256K1
+ *        `primitive` key.
+ *         - This is for accounts lazy-created from EVM public keys, when the
+ *           corresponding ECDSA_SECP256K1 public key is presented in a
+ *           transaction signed by the private key for that public key, the
+ *           account is created that key assigned, and the protobuf-serialized
+ *           form is set as the account alias.
+ *      - `long_zero`, is a synthetic 20 byte address inferred for "normally"
+ *        created accounts. It is constructed from the "standard" AccountID as
+ *        follows.
+ *         - 4 byte big-endian shard number
+ *         - 8 byte big-endian realm number
+ *         - 8 byte big-endian entity number
+ *
+ * The `alias` field in the `Account` message SHALL contain one of four values
+ * for any given account.
+ *   - The `key_alias`, if the account was created by transferring HBAR to the
+ *     account referenced by `key_alias`.
+ *   - The `evm_key_address` if the account was created from an EVM public key
+ *   - The `contract_address` if the account belongs to an EVM contract
+ *   - Not-Set/null/Bytes.EMPTY (collectively `null`) if the account was
+ *     created normally
+ *
+ * If the `alias` field of an `Account` is any form of `null`, then the account
+ * MAY be referenced by `alias` in an `AccountID` by using the `long_zero`
+ * address for the account. This "hidden default" alias SHALL NOT be stored,
+ * but is synthesized by the node software as needed, and may be synthesized by
+ * an EVM contract or client software as well.
+ *
+ * An AccountID in a transaction MAY reference an `Account` with
+ * `shard`.`realm`.`alias`.<br/>
+ * If the account `alias` field is set for an Account, that value SHALL be the
+ * account alias.<br/>
+ * If the account `alias` field is not set for an Account, the `long_zero`
+ * alias SHALL be the account alias.
+ */
+```