Markdown updates for 3.0.4 #3932

Merged
merged 5 commits into from
Aug 2, 2024

Contributor

@lornajane lornajane commented Jun 26, 2024

We need to handle the changes to anchor links and headings (see #3548 for context). We also have #3596 to add tooling, which I will work on after we've done this initial cleanup.

The commits tell their own story, but this is all very repeatable and hopefully transferable between branches. Shout out to @handrews, who got me started with the anchor/link rewriting script.

The process goes like this:

  1. Run prettier for formatting

    • prettier --write --single-quote 3.0.4.md
  2. Run markdownlint to fix whatever it can

    • markdownlint --fix 3.0.4.md
    • Configuration file .markdownlint.yaml:
            # Unordered list symbol
            MD004:
              style: asterisk
      
            # Unordered list indentation
            MD007:
              indent: 2
      
            MD012: false # allow blank lines
      
            MD013:
              line_length: 800
              tables: false
      
            MD024: false # duplicate headings
            MD033: false # inline HTML
  3. Manually fix additional markdownlint problems

    • heading levels aren't continuous MD001
    • code fences need a language MD040
    • table has the wrong number of cells MD056
  4. Take out the table of contents; we don't need it, since both GitHub and ReSpec render one automatically

  5. Run a magical one-off script to update/fix/rewrite/remove all our anchors and internal links. Basic idea:

    • make all our anchor links kebab-case
    • update all other internal document links
    • remove the ones that are in headings
    • add a lookup/override to fix things that either were inflected
      differently or we were using an anchor link that didn't match the
      title
    • (script is in comment)
    • use markdownlint again to check that this all worked because it can check internal links
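
The final check in step 5 (confirming that every internal link still has a target) can be sketched in a few lines of Python. This is an illustration only, not what markdownlint does internally: `github_slug` and `broken_internal_links` are hypothetical names, and the slug rule is a rough approximation of GitHub's heading anchors that ignores duplicate-heading suffixes, links inside headings, and headings inside code fences.

```python
import re


def github_slug(heading: str) -> str:
    """Approximate a GitHub heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    text = heading.strip().lower()
    # keep word characters, hyphens and spaces; drop everything else
    text = re.sub(r'[^\w\- ]', '', text)
    return text.replace(' ', '-')


def broken_internal_links(markdown: str) -> list[str]:
    """Return the (#fragment) link targets that match no heading slug and no named anchor."""
    targets = set()
    for line in markdown.splitlines():
        heading = re.match(r'#{1,6}\s+(.*)', line)
        if heading:
            targets.add(github_slug(heading.group(1)))
    # explicit anchors that were kept outside headings also count as targets
    targets.update(re.findall(r'<a name="([^"]*)"', markdown))
    fragments = re.findall(r'\]\(#([^)]+)\)', markdown)
    return sorted({f for f in fragments if f not in targets})


sample = (
    "# Spec\n\n## Path Item Object\n\n"
    "See [Path Item Object](#path-item-object) and [old](#pathItemObject).\n\n"
    '<a name="data-types"></a>Types: [types](#data-types)\n'
)
print(broken_internal_links(sample))
```

On the sample text, only the old camelCase fragment is reported, which is the kind of leftover the linter pass is meant to catch.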

It's 850 lines of change, I don't know how we're going to review it, but take a look!

@lornajane
Contributor Author

Super special python script:

from sys import argv
from pathlib import Path
import re

# this script tries to inflect the old links, some are just missing and we need to use the title's version instead
updates = {}
updates["revision-history"] = "appendix-a-revision-history"
updates["data-type-conversion"] = "appendix-b-data-type-conversion"
updates["using-r-f-c6570-implementations"] = "appendix-c-using-rfc6570-implementations"
updates["serializing-headers-and-cookies"] = "appendix-d-serializing-headers-and-cookies"
updates["percent-encoding-and-form-media-types"] = "appendix-e-percent-encoding-and-form-media-types"
updates["document-structure"] = "openapi-description-structure"
updates["oas-object"] = "openapi-object"
updates["components-security-schemes"] = "security-scheme-object"
updates["schema-composition"] = "composition-and-inheritance-polymorphism"
updates["http-codes"] = "http-status-codes"
updates["oas-document"] = "openapi-description"
updates["rich-text"] = "rich-text-formatting"
updates["relative-references"] = "relative-references-in-urls"
updates["runtime-expression"] = "runtime-expressions"
updates["runtime-expression-examples"] = "examples"



def kebab_it(c):
    # prefix each uppercase character with a hyphen and lowercase it,
    # so "pathItemObject" becomes "path-item-object"
    if c.lower() != c:
        return f'-{c.lower()}'
    return c

if __name__ == '__main__':
    text = Path(argv[1]).read_text(encoding='utf-8')

    names = {}
    removals = {}
    # find every named anchor, plus whatever precedes it on the same line
    for match in re.finditer(r'\n(.*)<a name="([^"]*)"', text):
        name = match.group(2)
        names[name] = ''.join([kebab_it(c) for c in name])

        # was it a heading? file it for removal
        if match.group(1).startswith('#'):
            removals[name] = True

    for current, replacement in names.items():
        # the lookup overrides the inflected name where the title differs
        if replacement in updates:
            replacement = updates[replacement]
        text = text.replace(f'(#{current})', f'(#{replacement})')

        # only remove if removal is indicated, otherwise update
        if current in removals:
            text = text.replace(f'<a name="{current}"></a>', '')
        else:
            text = text.replace(f'<a name="{current}"></a>', f'<a name="{replacement}"></a>')

    print(text)

Run it like: python kebab_it.py 3.0.4.md > temp.md and then if temp.md looks good, copy it back over 3.0.4.md
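
For illustration, the per-character rule in `kebab_it` can be exercised on its own. This is a minimal sketch under stated assumptions: `inflect` is a hypothetical helper name, `pathItemObject` is one of the old link targets quoted in this thread, and `usingRFC6570Implementations` is an assumed old anchor name inferred from the matching key in `updates`.

```python
def kebab_it(c):
    # same per-character rule as the script: uppercase becomes "-" plus lowercase
    if c.lower() != c:
        return f'-{c.lower()}'
    return c


def inflect(name):
    # hypothetical helper: apply kebab_it to every character of an anchor name
    return ''.join(kebab_it(c) for c in name)


# the second name is an assumption, chosen to show a run of capitals
for name in ['pathItemObject', 'usingRFC6570Implementations']:
    print(name, '->', inflect(name))
```

Because every capital gets its own hyphen, a run such as `RFC` comes out as `-r-f-c`, which would explain why the lookup needs an entry like `using-r-f-c6570-implementations`.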

Member

@handrews handrews left a comment

A few quick first impressions...

versions/3.0.4.md (outdated)
Comment on lines 246 to 284
-* `binary` is used where unencoded binary data is allowed, such as when sending a binary payload as an HTTP message body, or as part of a `multipart/*` payload that allows binary parts
-* `byte` is used where binary data is embedded in a text-only format such as `application/json` or `application/x-www-form-urlencoded`
+- `binary` is used where unencoded binary data is allowed, such as when sending a binary payload as an HTTP message body, or as part of a `multipart/*` payload that allows binary parts
+- `byte` is used where binary data is embedded in a text-only format such as `application/json` or `application/x-www-form-urlencoded`
Member

I'm a bit puzzled by this list indicator change. I feel like we have always consistently used `*` for this and the change introduces a lot of noise. It's not a hill I want to die on, but I'd prefer `*` (both because it's what I'm used to scanning visually for when I edit, and to trim down this diff).

Contributor Author

`-` has evolved as a standard in markdown tools; the linter actually checks that the lists are consistent, but I can totally enforce `*` instead if that's our normal.

Member

`-` has evolved as a standard in markdown tools

UGH why???? I'm a bit reluctant to start introducing digressions. I do think `*` makes lists much easier to read in the source, but maybe that's because I'm more used to it. It's just... bullet points... `*` looks like a bullet point, `-` does not!

Comment on lines -182 to 148
Source | Target | Alternative
------ | ------ | -----------
[Security Requirement Object](#securityRequirementObject) `{name}` | [Security Scheme Object](#securitySchemeObject) name under the [Components Object](#componentsObject) | _n/a_
[Discriminator Object](#discriminatorObject) `mapping` _(implicit, or explicit name syntax)_ | [Schema Object](#schemaObject) name under the Components Object | `mapping` _(explicit URI syntax)_
[Operation Object](#operationObject) `tags` | [Tag Object](#tagObject) `name` (in the Components Object) | _n/a_
[Link Object](#linkObject) `operationId` | [Path Item Object](#pathItemObject) `operationId` | `operationRef`
| Source | Target | Alternative |
| -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- | --------------------------------- |
| [Security Requirement Object](#security-requirement-object) `{name}` | [Security Scheme Object](#security-scheme-object) name under the [Components Object](#components-object) | _n/a_ |
| [Discriminator Object](#discriminator-object) `mapping` _(implicit, or explicit name syntax)_ | [Schema Object](#schema-object) name under the Components Object | `mapping` _(explicit URI syntax)_ |
| [Operation Object](#operation-object) `tags` | [Tag Object](#tag-object) `name` (in the Components Object) | _n/a_ |
| [Link Object](#link-object) `operationId` | [Path Item Object](#path-item-object) `operationId` | `operationRef` |

Member

@handrews handrews Jun 26, 2024

I have mixed feelings about this sort of table re-formatting. On the one hand, it appeals to my sense of aesthetics. On the other, it's very hard to get people to maintain it correctly. I've gotten rather used to not bothering with consistent widths most of the time. But I'm sure I bothered with it at least once, so... idk.

For me, it boils down to: "If a new contributor messes up this table spacing, will we block their PR until they fix it, and is that the sort of contributor experience we want to create?"

Member

Although if this is being done by prettier or some other script that we can set up to run as a pre-commit hook, then I'm 100% fine with it (and even the list item change, although I really do find `*` much easier to spot when scanning text, as it is rarely used for anything other than lists, while `-` is used for many, many things and has less visual weight as well).

Contributor Author

It is all automatic. A couple of tables had varying numbers of columns, but otherwise this is all auto-fix, and we can get pre-commit and/or CI to do it or at least check it.

@handrews handrews added the editorial Wording and stylistic issues label Jun 27, 2024
@handrews handrews added this to the v3.0.4 milestone Jun 27, 2024
@mikekistler
Contributor

Do we need to keep the TOC? GitHub creates one for us ...

@handrews
Member

@mikekistler we've said the HTML rendering is authoritative now, so yes. (but it's a worthwhile question!)

Contributor

@miqui miqui left a comment

I have gone through the entire PR! Formatting looks good. Nice, @lornajane!!

@lornajane
Contributor Author

About the TOC, if I understood the thread correctly, respec builds a TOC as well, so we do not need one in our source at all

@lornajane lornajane force-pushed the fix-markdown branch 2 times, most recently from e360e30 to abeb7f3 Compare July 1, 2024 20:15
@lornajane
Contributor Author

I'm not sure respec does the table of contents in a way we can use, so I've regenerated it in this update. Thanks @ralfhandl for getting into the respec details, I hadn't got there yet!

Changed items:

  • rebased to pick up newest additions to the dev branch
  • switched required bullet point format to * and made the toc command use this format as well
  • reapplied all other changes

It did occur to me that we could keep the <!-- toc --> notation in the source and just generate that bit when we're building the HTML version. It's the last commit on the branch here anyway, so super easy to remove.

@ralfhandl
Contributor

I'm not sure respec does the table of contents in a way we can use, so I've regenerated it in this update.

Our HTML build script removes the table of contents from the Markdown, the ToC we see for example in https://spec.openapis.org/oas/latest.html is generated by ReSpec from the section headlines.

Here's the relevant part of our build script:

if (line.startsWith('## Table of Contents')) inTOC = true;
if (line.startsWith('<!-- /TOC')) inTOC = false;
if (inTOC) line = '';

Every line between lines starting with ## Table of Contents and <!-- /TOC is removed, including these lines.

With the current change from <!-- /TOC to <!-- tocstop --> everything except the first 18 lines would be removed unless we adjust the build script to the new ToC tool.
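
To make the failure mode concrete, here is a small Python port of the three-line excerpt above. It is a sketch rather than the actual build script: `strip_toc` and its `end_marker` parameter are hypothetical names, and the sample document is invented. As in the excerpt, the end-marker line itself passes through unchanged.

```python
def strip_toc(lines, end_marker='<!-- /TOC'):
    """Blank every line from '## Table of Contents' until a line starting with end_marker."""
    in_toc = False
    out = []
    for line in lines:
        if line.startswith('## Table of Contents'):
            in_toc = True
        if line.startswith(end_marker):
            in_toc = False
        out.append('' if in_toc else line)
    return out


# invented sample using the new-style markers
doc = [
    '# Title',
    '## Table of Contents',
    '<!-- toc -->',
    '* [A](#a)',
    '<!-- tocstop -->',
    '## A',
    'body',
]

print(strip_toc(doc))                              # old marker never matches
print(strip_toc(doc, end_marker='<!-- tocstop'))   # adjusted marker
```

With the old marker the flag is never cleared, so everything after the heading is blanked; with the adjusted marker the content after the ToC survives.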

A better way forward would be to completely remove the "Table of Contents" subsection because

  • we don't need it for producing the published HTML
  • we don't need it for viewing the raw *.md files in GitHub because the GitHub Markdown viewer now has an Outline pane on the right with an auto-generated table of contents.

@@ -17,80 +17,35 @@ For examples of OpenAPI usage and additional documentation, please visit [learn.
For extension registries and other specifications published by the OpenAPI Initiative, as well as the authoritative rendering of this specification, please visit [spec.openapis.org](https://spec.openapis.org/).

## Table of Contents
Contributor Author

Remove the heading

@lornajane
Contributor Author

Should also tackle #1720

miqui previously approved these changes Jul 16, 2024
mikekistler previously approved these changes Jul 17, 2024
Contributor

@mikekistler mikekistler left a comment

Looks good. 👍

I reviewed this by performing all the automated changes myself on a fresh branch and then reviewing the (much smaller) diff.
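
The review approach described here (regenerate the automated changes, then compare against the PR) could be sketched with Python's standard `difflib`. This is only an illustration: `review_diff` and the two file labels are hypothetical, and the reviewer may well have used `git diff` instead.

```python
import difflib


def review_diff(regenerated: str, proposed: str) -> str:
    """Return a unified diff between a locally regenerated file and the PR's version."""
    diff = difflib.unified_diff(
        regenerated.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile='regenerated/3.0.4.md',  # hypothetical label
        tofile='pr/3.0.4.md',             # hypothetical label
    )
    return ''.join(diff)


# an empty result means the automated steps reproduce the PR exactly
print(review_diff('a\nb\n', 'a\nc\n'))
```

Whatever remains in the diff is the manual part of the change, which is the only part that needs line-by-line review.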

Thank you @lornajane for this work.

@mikekistler
Contributor

My notes on the process that @lornajane put in the PR description:

git checkout v3.0.4-dev
git checkout -b fix-markdown
cd versions

  1. Run prettier for formatting

npx prettier --write --single-quote 3.0.4.md

  2. Run markdownlint to fix whatever it can

Create markdown-lint.yaml with contents given, then

npx markdownlint-cli --config markdown-lint.yaml --fix 3.0.4.md

  3. Take out the table of contents and its comments, replace with a single

awk '/## Table of Contents/{f=1} //{f=0; print ""; next} {if (f==0) {print}} ' 3.0.4.md > temp.md; mv temp.md 3.0.4.md

  4. Run a magical one-off script to update/fix/rewrite/remove all our anchors and internal links.

python kebab_it.py 3.0.4.md > temp.md; mv temp.md 3.0.4.md
npx markdownlint-cli --fix 3.0.4.md

  • use markdownlint again to check that this all worked because it can check internal links

npx markdownlint-cli --config markdown-lint.yaml --fix 3.0.4.md

@lornajane lornajane dismissed stale reviews from mikekistler and miqui via 531253e August 1, 2024 18:53
@lornajane lornajane marked this pull request as ready for review August 1, 2024 19:02
@lornajane lornajane requested review from miqui, mikekistler and a team August 1, 2024 19:04
@lornajane
Contributor Author

Updated to apply the changes to the newest 3.0.4 version, and marked as ready to review as I don't think we have any more changes in flight for this branch.

Contributor

@mikekistler mikekistler left a comment

Renewing my approval. 👍

Contributor

@ralfhandl ralfhandl left a comment

Also checked the generated HTML:

  • internal links use the new anchors
  • all "code" blocks are prettier-formatted
  • no further differences (after fixing the build script 😁)

@@ -1,6 +1,6 @@
# OpenAPI Specification

-#### Version 3.0.4
+## Version 3.0.4
Contributor

This threw off the respec build script, fixed with #3992

@ralfhandl ralfhandl merged commit fe10c1c into OAI:v3.0.4-dev Aug 2, 2024
1 check passed