Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to a different release notes system #25155

Open
asmeurer opened this issue May 19, 2023 · 44 comments
Open

Move to a different release notes system #25155

asmeurer opened this issue May 19, 2023 · 44 comments
Labels

Comments

@asmeurer
Copy link
Member

We should consider moving from sympy-bot to something that uses GitHub actions for the release notes. The bot is currently having issues sympy/sympy-bot#105, and GitHub actions would be harder to maintain. The main downside is that writing the notes as a file in the repo would be harder, but it might also make people actually write better notes, so that could be a good thing.

@oscarbenjamin
Copy link
Collaborator

If there can be a quick fix for the existing release note bot then that would be good. With other CI problems it can be possible to temporarily disable a CI check but that does not work for the release notes because we would lose all the data for what should be in the release notes.

In general though I am definitely in favour of moving to a different system. The current system involves a bunch of things that are difficult to automate during the release process. I also don't like the fact that the release notes can be edited without going through the normal review process and that release notes are not as well reviewed as the other content of PRs. I also would prefer that the release notes be a part of the docs and I think that amalgamating the release notes for a release should be a PR that gets reviewed by as many people as possible.

In sympy/sympy-bot#105 (comment) @skieffer suggested the possibility of towncrier which I think could work. In sympy/sympy-bot#105 (comment) @asmeurer suggested looking at the system that NumPy uses.

@moorepants
Copy link
Member

I also would prefer that the release notes be a part of the docs

I'd like to see them as part of the docs as well as being shipped with our source tarballs.

@oscarbenjamin
Copy link
Collaborator

being shipped with our source tarballs.

The doc sources are included in the sdists. I guess what you mean though is that the changelog or release notes should be more prominent in the sdist than just being under doc/src somewhere.

Currently the contents of the sdist are like:

sympy-1.12
├── AUTHORS
├── bin
├── data
├── doc
├── examples
├── isympy.py
├── LICENSE
├── PKG-INFO
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── sympy
└── sympy.egg-info

Maybe there should just be a "changelog" directory. The docs could then be built using that but it would be visible at top-level in the sdist.

For NumPy there seem to be two separate things: the changelog which lists all PRs and the release notes which list only changes of potential relevance for downstream:
https://numpy.org/doc/stable/release/1.24.0-notes.html
https://github.com/numpy/numpy/blob/main/doc/changelog/1.24.0-changelog.rst

It looks like the changelog is generated from git using this script:
https://github.com/numpy/numpy/blob/main/tools/changelog.py

The release notes part is made using towncrier:
https://github.com/numpy/numpy/tree/main/doc/release/upcoming_changes

I suggest we add a changelog directory at top-level in the repo and under it the structure would be like:

changelog/
├── 1.10
│   ├── changelog.md
│   └── release-notes.md
├── 1.11
│   ├── changelog.md
│   └── release-notes.md
├── 1.12
│   ├── changelog.md
│   └── release-notes.md
└── 1.13
    └── notes
        ├── 12345.polys.md
        ├── 12567.solvers.md
        └── 12890.core.md

Then release notes (if needed) would be added in a PR with an issue/PR number and a submodule in the name. The towncrier script would be used before the release to merge those into a release-notes.md and a separate script could list all PRs in the changelog.md. Since that change would come through as a PR before release it gives a chance for the release notes to be reviewed again in context. The docs would include these pages somewhere.

Since there would also be a changelog it might not be as necessary for most PRs to include a release note although there should be better discipline about having good PR titles since those are what would show in the changelog. Currently there is a requirement to at least add NO ENTRY if there should not be a release note but I think that it might be fiddly to check this.

If we are agreed about this as an overall structure then I can put the basic parts together to unblock CI. We would need to think about exactly how to incorporate this into the docs but that could be done later. For now what is needed is some way to collect release notes from PRs because currently all new PRs will be blocked on this.

@oscarbenjamin
Copy link
Collaborator

└── 1.13
└── notes
├── 12345.polys.md
├── 12567.solvers.md
└── 12890.core.md

Actually it would be better to have these in something like changelog/master rather than changelog/1.13 so that they do not need to be changed for a long running PR after a release is made.

@moorepants
Copy link
Member

The doc sources are included in the sdists. I guess what you mean though is that the changelog or release notes should be more prominent in the sdist than just being under doc/src somewhere.

I was under the impression that a copy of the release notes are not present in our sdists. Maybe it is? Or it has been added since I last checked.

My suggestion is that we simply include release notes in the sdist, whether they are prominent or not, I have no opinion.

@asmeurer
Copy link
Member Author

asmeurer commented Jun 2, 2023

We should just include them in the docs. That's the most natural place for them to live. In general, we should move all "official" stuff that's currently in the wiki to the docs. This has already been done for the development guides (there might be a couple of things that are still left to move; I need to do a full runthrough). If the release notes are in the docs, then people can just navigate to https://docs.sympy.org/latest/release-notes.html (or whatever) from the docs site to read them. The docs will also get better indexing from search engines than the wiki or any other source. And if the notes are in the docs, it makes it trivial to reference documentation from the notes, which makes them much more readable, and also to reference release notes from the documentation (that will be rarer but it could be useful).

The wiki dates back to the earliest days of SymPy when there was no actual documentation and Ondřej just set up a MediaWiki server (this was later moved to the GitHub wiki). While there are still some projects today which exclusively use wikis for documentation, the best documentation is not in wikis. Wikis like the GitHub wiki are a poor choice for documentation because they lack important features like navigation, search and cross references, "official" pages are intermixed with "scratchpad" pages, and edits to wikis can be done by anyone and mostly go through unreviewed.

We just need to make sure that this doesn't make overall development process doesn't become too complicated. Ideally, the current make html should just work, regardless of what state the release notes files are in. For creating the notes themselves, it should be something very simple, like just creating a new file in some directory with some simple structure. We can cleanup the files into a single file whenever we do a release, but that script should also be called automatically by make html so that building the dev docs works.

Also, a minor nitpick from the NumPy method: I don't like that their release notes files contain the PR number in the file name. That means that you can't write the release notes entry until after you open your PR. But you should be in the habit of always writing entries, so they should be written and committed alongside your changes, before you even open a PR. So I would omit the PR number from the filename (also, the PR number can be inferred automatically from the git history, so this is also just unnecessary work for contributors).

@skieffer
Copy link
Member

skieffer commented Jun 3, 2023

That means that you can't write the release notes entry until after you open your PR.

I agree that that's a drawback to this system. Some projects use the issue number (when there is one) instead of the PR number.

One way to semi-automate it would be to check for presence of the release notes file as the very first CI test. When the file is absent, then all other tests are canceled, and the error message could include a GitHub URL to create the file with the correct name.

Still, this means that every PR fails its first run, which is a little annoying.

@skieffer
Copy link
Member

skieffer commented Jun 3, 2023

the PR number can be inferred automatically from the git history

That's an interesting idea. If it was guaranteed that every merged PR would have the standardized commit message of the form

    Merge pull request #NNNNN from USER/BRANCH-NAME

    Etc.

then the rule for naming the release notes file in your PR could be to use USER.BRANCH-NAME.feature.rst (or replace feature with bug etc.). A script could then search git log for the merge commits, rename the files to now use the PR number in place of USER.BRANCH-NAME, and then finally run towncrier.

@asmeurer
Copy link
Member Author

asmeurer commented Jun 3, 2023

Basically every PR merge commit has that commit message, unless the person who merges it manually changes it to not be that way (which no one does). Even if that were the case we could figure it out using the GitHub API.

BTW that point isn't just that you can't make the file without making the PR first. It's also annoying and error prone to lookup the PR number to put it in the file name (same with an issue number). This is stuff that can be figured out by the scripts automatically, so we shouldn't make people do it. The whole process should be as easy as possible: just create a file with any (unique) name, containing a simple structure. The actual metadata like the PR number and the authors can be constructed automatically, just like @sympy-bot does.

@oscarbenjamin
Copy link
Collaborator

I would really like the relevant information for the release notes to be in the diff that gets merged so that it is visible to everyone including both PR author and reviewer. We can of course make something that will help someone to write the release note in the expected format like a script that can be run as:

$ bin/make_release_note.py
Enter GitHub username for the release note: asmeurer
Branch name is fix-lambdify
Enter sympy submodule (e.g. solvers, simplify, ...): utilities
Enter type of change (feat, bug, doc, ...): bug
Created doc/release_notes/master/asmeurer.fix-lambdify-1.txt
Contents of file:
-----------------------------------------------------
Author: @asmeurer
Submodule: utilities
Type: bug
Note:

<Please replace this line with an explanation of the change
written in markdown for the release notes in the docs>
------------------------------------------------------
Please edit doc/release_notes/master/asmeurer.fix-lambdify-1.txt
to add the explanation for the release note.

I suggest for now that we do just that so that at least PRs can be merged without losing the information needed to reconstruct release notes later.

Currently the bot also notes who merged the PR but I guess there is no way to get that information until after it is merged.

@oscarbenjamin
Copy link
Collaborator

Maybe just this:

import sys
import subprocess
import argparse
import pathlib


template = """\
Author: {username}
Submodule: {submodule}
Type: {change_type}
Note:

<Please replace this line with an explanation of the change
written in markdown for the release notes in the docs>
"""


def main(*args):
    """
    Example:

    $ python bin/make_release_note.py
    Enter GitHub username for the release note: asmeurer
    Branch name is fix-lambdify
    Enter sympy submodule (e.g. solvers, simplify, ...): utilities
    Enter type of change (feat, bug, doc, ...): bug
    Created doc/release_notes/master/asmeurer.fix-lambdify-1.txt
    Contents of file:
    -----------------------------------------------------
    Author: @asmeurer
    Submodule: utilities
    Type: bug
    Note:

    <Please replace this line with an explanation of the change
    written in markdown for the release notes in the docs>
    ------------------------------------------------------
    Please edit doc/release_notes/master/asmeurer.fix-lambdify-1.txt
    to add the explanation for the release note.
    """
    username = input("Enter GitHub username for the release note: ")
    branch = subprocess.check_output(["git", "branch", "--show-current"]).decode().strip()
    print("Branch name is", branch)
    submodule = input("Enter sympy submodule (e.g. solvers, simplify, ...): ")
    change_type = input("Enter type of change (feat, bug, doc, ...): ")

    filename = "{}.{}.{}.txt".format(username, branch, change_type)
    filepath = pathlib.Path("doc") / "release_notes" / "master" / filename
    contents = template.format(
        username=username,
        submodule=submodule,
        change_type=change_type)
    )

    with open(filename, "w") as f:
        f.write(contents)

    print("Created", filename)
    print("Contents of file:")
    print("-" * 50)
    with open(filename) as f:
        print(f.read().strip())
    print("-" * 50)
    print("Please edit {} to add the explanation for the release note.".format(filename))


if __name__ == '__main__':
    main(*sys.argv[1:])

@skieffer
Copy link
Member

skieffer commented Jun 7, 2023

To ensure that filenames are easy to parse later, I'd suggest adding a check that the branch name not contain any dots:

branch = subprocess.check_output(["git", "branch", "--show-current"]).decode().strip()
if '.' in branch:
    print("Please use a branch name without dot (.) characters.")
    sys.exit(1)
print("Branch name is", branch)

I can't recall seeing anyone use dots in a topic branch name (release branch, yes), but they are allowed.

As for valid chars in github usernames, I found it remarkably difficult to find docs on this. This project claims to have determined (via form validation messages on the "Join GitHub" page!) that it's limited to alphanumerics and hyphens.

@skieffer
Copy link
Member

skieffer commented Jun 7, 2023

To ensure that filenames are easy to parse later

Then again,

parts = filename.split('.')
username = parts[0]
change_type = parts[-2]
branch = '.'.join(parts[1:-2])

so maybe it doesn't matter.

@skieffer
Copy link
Member

skieffer commented Jun 7, 2023

But yet again, what about slashes in branch names?

Maybe it would be good to require that they use just alphanumerics and hyphens?

Or do percent encoding, in order not to have to bother the contributor.

@asmeurer
Copy link
Member Author

asmeurer commented Jun 7, 2023

Currently the bot also notes who merged the PR but I guess there is no way to get that information until after it is merged.

No it doesn't? https://github.com/sympy/sympy/wiki/Release-Notes-for-1.12 And I don't think that info really needs to be there either.

It does include multiple authors when someone else pushes something to the PR, and it also includes the person who opened the PR if they happened to not actually push any commits. It does this using the GitHub API but for this script it would be simpler to use the git history (especially considering we already have a well maintained .mailmap).

To ensure that filenames are easy to parse later, I'd suggest adding a check that the branch name not contain any dots:

I don't think the filename even needs to be based on the branch name. It's a good default for a script that generates it for you (and that script can just remove things like dots and slashes). But in general, whatever builds the notes should just gather all the files together, and the filenames would not be part of the final notes. It doesn't matter for reviewing either because you're just going to review whatever file is new in the PR.

@oscarbenjamin
Copy link
Collaborator

Currently the bot also notes who merged the PR
No it doesn't?

No it doesn't... I guess I misremembered that.

But in general, whatever builds the notes should just gather all the files together, and the filenames would not be part of the final notes.

Exactly, the filenames just need to be unique. We don't need to use the branch name but I imagined that it would usually give a vaguely meaningful unique identifier.

@oscarbenjamin
Copy link
Collaborator

t would be simpler to use the git history (especially considering we already have a well maintained .mailmap).

The git history uses different metadata from what is currently used by the bot. The bot uses GitHub usernames but the .mailmap file uses the names and email addresses that are in the commit metadata. The .mailmap file maps all commits uniquely to a single name/email for the purposes of the git metadata which then means that every commit corresponds to a single line in the AUTHORS file. There is no mapping from the git metadata to GitHub usernames or vice versa though.

I don't know if it is intentional that the release notes use GitHub usernames (maybe it was just easier for the bot to get those?). It would be easier just to use git metadata for the release notes since it is already in the commits and the .mailmap already resolves ambiguities. If we imagine a future in which there would be a need to move away from GitHub then it would be better if long term records like the release notes were not tied to GitHub metadata.

I also wonder if it could be possible to combine the release notes file with the need to update .mailmap to make it easier for new contributors somehow because the .mailmap is something that many contributors struggle with. I think this last point is something to hold off on for now though because we just need to get something working for the release notes ASAP.

@asmeurer
Copy link
Member Author

asmeurer commented Jun 7, 2023

I did that in the bot so that it could cross-link to people's GitHub profiles. That isn't that important if it's hard to do.

@oscarbenjamin
Copy link
Collaborator

I did that in the bot so that it could cross-link to people's GitHub profiles. That isn't that important if it's hard to do.

It's not hard to do if we ask for the information to be provided as I suggested above:

 username = input("Enter GitHub username for the release note: ")

If we want to infer it automatically then it is harder.

Even using the git metadata is not really "automatic". It would be automatic if all contributors set up their git/GitHub metadata and we just used it directly without question. We don't just use it though and instead require all git metadata to be listed in the mailmap. Seeing the ways that new contributors get stuck with the mailmap check in CI shows that many contributors do not provide the metadata in the commits in such a way that we could just use it automatically as suitable for the AUTHORS file.

@asmeurer
Copy link
Member Author

asmeurer commented Jun 7, 2023

We should do whatever is easiest. We don't even have to include the author information at all if we don't want to. The NumPy notes do not, for instance https://numpy.org/doc/stable/release/1.24.0-notes.html. That info is available in the linked PR.

@skieffer
Copy link
Member

skieffer commented Jun 8, 2023

Okay, there are a lot of different methods being considered here. For a separate project that I maintain, I am considering using percent-encoded branch names in the filenames, and using git log to transform these into PR numbers, by seeking commit messages of the form, Merge pull request #NNNNN from USER/BRANCH-NAME. If you want to use that system, I've written a script for it, and share it here in case it's useful.

#!/usr/bin/env python

"""
Rename files in the release notes directory, changing names like

    USER.BRANCH.bug.txt

to instead be of the form

    12345.bug.txt

where the number denotes the PR in which this branch was merged.
"""

import argparse
import pathlib
import subprocess
import sys
import urllib.parse


def rename_files(dry_run=False):
    notes_dir = pathlib.Path("doc") / "release_notes" / "master"
    if not notes_dir.exists():
        print(f'Release notes directory `{notes_dir}` does not exist.')
        sys.exit(1)

    filepaths = list(notes_dir.iterdir())
    print(f'Found {len(filepaths)} release notes files.')

    no_match = []
    match_count = 0
    for filepath in filepaths:
        filename = filepath.name
        username, pct_enc_branch, change_type, ext = filename.split('.')
        branch = urllib.parse.unquote(pct_enc_branch)
        regex = rf'Merge pull request #\d+ from {username}/{branch}$'
        cmd = f'git log -P --format=%s --max-count=1 --grep="{regex}"'

        commit_title = subprocess.check_output(cmd, shell=True).decode().strip()
        if not commit_title:
            no_match.append(filename)
            continue
        else:
            match_count += 1

        PR = commit_title.split()[3][1:]
        newname = f'{PR}.{change_type}.{ext}'
        cmd = f'mv {notes_dir/filename} {notes_dir/newname}'
        print(cmd)
        if not dry_run:
            subprocess.run(cmd.split())

    print(f'{"Would rename" if dry_run else "Renamed"} {match_count} files.')
    if no_match:
        print(f'Could not find merge commit for {len(no_match)} files:')
        for filename in no_match:
            print('  ', filename)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Rename release notes files.')
    parser.add_argument('-d', '--dry', help='do a dry run', action='store_true')
    args = parser.parse_args()
    rename_files(dry_run=args.dry)

@oscarbenjamin
Copy link
Collaborator

That is useful and I think that could work.

We would need to ensure before the PR is merged that the filename is correct for the given username and branch name.

In GitHub actions the GitHub username is available as github.event.pull_request.user.login. Possibly the branch name is github.ref_name. We could pass those to a script that would validate the given file.

Do we want to require the submodule name as part of the file like oscarbenjamin.lambdify_fix.utilities.txt?

Then if there are release notes for multiple submodules they just need to have separate files?

Do we need the equivalent of requiring a contributor to explicitly add NO ENTRY? I guess it could be oscarbenjamin.typo_fix.no_entry.txt?

The thing that needs to be agreed here before we can fix this is what precisely we ask contributors to do and when exactly are CI checks expected to fail either because the release notes are missing or incorrect.

Maybe the rule from the CI check could be

  • EITHER there is exactly one release note for the given username and branch name and it is for the no_entry submodule and just says NO ENTRY.
  • OR there are one or more release note files for different submodules.

Then also what should the content of the file be? Should it just be an unordered markdown list like:

- A bug in solve was fixed.
- Also solve was made faster for some things.

Do we need to validate that the markdown is syntactically correct? (as long as the text content is reasonable it is probably not difficult to fix any markdown syntax later).

Alternatively we ask contributors to just create a file called user.branch.txt and inside they can add headings like now with:

* solvers
    * Fixed a bug

Or otherwise the file just contains the text NO ENTRY. This is potentially easier for contributors since it is not multiple files but validating that the contents of the release note file match the expected pattern is a little harder.

@skieffer
Copy link
Member

skieffer commented Jun 8, 2023

This is potentially easier for contributors since it is not multiple files

I guess I'd lean toward whatever makes things easiest for contributors, and I would agree with you that that would probably mean forming just a single release notes file. I think the interactive script that generates the file is great.

As for processing the contents of the file, if that's to be written in the form

* solvers
    * Fixed a bug

can existing code from sympy-bot be used here?

@asmeurer
Copy link
Member Author

asmeurer commented Jun 9, 2023

In GitHub actions the GitHub username is available as github.event.pull_request.user.login. Possibly the branch name is github.ref_name. We could pass those to a script that would validate the given file.

Do we want to require the submodule name as part of the file like oscarbenjamin.lambdify_fix.utilities.txt?

Again, I don't think any of those should be required to be part of the filename. The filename should just be something unique, and not be part of the final notes. The username can be inferred. You're already suggesting pulling it from the API, so why not just use it? Or we can just use the real name from git. The submodule can be part of the markdown. A short name like lambdify_fix is useful for the filename but doesn't really add anything to the final release notes.

can existing code from sympy-bot be used here?

Here's all the relevant code https://github.com/sympy/sympy-bot/blob/master/sympy_bot/changelog.py. The list of submodules is at https://github.com/sympy/sympy-bot/blob/master/sympy_bot/submodules.txt, but we could also just pull them directly from the source.

@oscarbenjamin
Copy link
Collaborator

Again, I don't think any of those should be required to be part of the filename. The filename should just be something unique, and not be part of the final notes. The username can be inferred. You're already suggesting pulling it from the API, so why not just use it?

There are two separate stages here:

  1. When a contributor submits a PR the CI checks will run and can validate whether the release note data is in the intended format.
  2. Later on someone can run a script (like the one shown by @skieffer above) that will amalgamate the previously merged release note files into a release note page for the docs.

I suggested that at stage 1 we can infer the github username and the branch name in a CI script in order to validate that that information is correctly encoded in the files that are added/changed by the PR. However at stage 2 we no longer have the username and branch name unless they were previously encoded in the actual diff of the PR somehow. The information can be in the filename or it can be in the content of the file but it needs to be somehow in the file tree because none of the other git metadata stores the GitHub username.

Or we can just use the real name from git.

I'm not sure that it is easy to get that from the release note file and git history. For example we would still want to link to at least the PR if not an associated issue and I don't immediately see how we would get that from the git history (without using @skieffer's approach that requires knowing both GitHub username and branch name which requires that those are both present somehow in the file tree).

@oscarbenjamin
Copy link
Collaborator

Let's get this in perspective: currently SymPy development is largely blocked on this issue. The blocker here is how exactly we infer some data rather than asking that the contributor provide the data. There is a very simple solution to this though: just ask the contributor to provide the data.

There are two sides to this as with all data because there is one side (the contributor) who provides the data and another (the release scripts) that consumes the data. When I am in the position of being the consumer of the data I would much prefer that the data was explicitly provided rather than needing to try to infer the data from imperfect or indirect sources. When I am in the position of being the producer of the data I would much prefer to provide the data explicitly so that I know what data is being provided. On either side I prefer the data to be explicit rather than inferred.

We can provide tooling to help generate the data and to validate that it is in the intended form. To me that would seem more helpful than a system that tries to infer data in a way that I don't understand (as a user of that system).

I can send a PR that requires users to explicitly provide the data along with scripts that help to generate and validate that data. If anyone prefers that the data be inferred then can they please outline a full workable system for doing that along with how CI checks and everything else would work?

@asmeurer
Copy link
Member Author

Let's get this in perspective: currently SymPy development is largely blocked on this issue. The blocker here is how exactly we infer some data rather than asking that the contributor provide the data. There is a very simple solution to this though: just ask the contributor to provide the data.

The blocker here is someone writing the script to do this. I don't have the time right now to do it.

@asmeurer
Copy link
Member Author

Since the bot is still broken, and it seems no one has had the time to fix it or move to a new system, I've moved everything back to Heroku from Render. This is currently costing me $5 a month. We could do this indefinitely (although if we do, we should move the billing to our SymPy NumFOCUS account), but the things discussed here seem to have some advantages so it's probably still worth pursuing them.

@skieffer
Copy link
Member

Just for future reference, I did release a package for branch-based news fragment filenames, which gets around the problem of not knowing the PR number before you open it. After using it for one release cycle of a project I maintain, I would say that it works well.

However, the experience has also shown me a virtue of the SymPy release notes system, where the news is written in the opening comment of the PR. Namely, even when using news fragment files, you still have to write opening comments for your PRs, and I often found myself manually copy-pasting the exact text from the news fragment files for this, which was somewhat tedious to do. SymPy's system is nice in that it avoids this.

@oscarbenjamin
Copy link
Collaborator

Branchnews does look nice.

@skieffer
Copy link
Member

Thanks! Like I say, I find it works well enough for what it is, but I'm still not sure what the best system overall would be.

@oscarbenjamin
Copy link
Collaborator

I'm still not sure what the best system overall would be.

Neither am I. There are some nice aspects to the sympy-bot approach.

The primary problem from my perspective is that the release notes are on the wiki which is separate from the codebase and needs manual management around releases.

@oscarbenjamin
Copy link
Collaborator

In #27081 I backport two pull requests that were merged to main to the 1.13 release branch. It is awkward to copy the release notes manually from the OPs of the pull requests and now we will end having duplicate release notes for 1.13.3 and 1.14 unless I go and delete them from 1.14. Often I just forget to do this and so we end up having bugfix releases with missing release notes.

It would be a lot easier to keep track of what was going on in the release notes if they were in the file tree and could be seen in the diff and edited locally etc.

@asmeurer
Copy link
Member Author

branchnews looks like an improvement over towncrier, but wouldn't an even simpler system just be to have a single file that the user creates, which then gets amalgamated on merge? That is, don't require the user to name or even create the file. There would always just be a file, say

docs/CHANGES_FOR_THIS_PR.md

which would contain instructions and a place for the user to write their notes. Then, on merge, some CI bot automatically takes the contents of this file and moves them to the release notes page. You also could have the CI do this in the PR itself (similar to how precommit CI works), so that you aren't creating new commits on top of every merge.

Or maybe even better: just have users edit the release notes page directly, but have a CI bot that automatically adds commits to the PR to hash out annoying merge conflicts.

There's a lot of room for innovation here. My goal with sympy-bot was to make writing notes as dead simple as possible, so that people would actually do it. If it's a multi-step process to write notes, then either people won't do it, or we make it required and then we have to constantly bug people to do it and instruct them in how it works and so on.

I think I wrote this somewhere above, but ideally the user should not be required to write anything in a release notes entry that cannot be obtained automatically. That includes their username, the branch name (which isn't actually relevant), or the PR number, and that includes both the contents of the note and any potential file name.

Ultimately, it can be hard to get some people to actually write about their code, whether that be commit messages, release note entries, PR descriptions, docstrings, useful comments, etc. That's the fundamental problem, and there's only so much we can technically do to solve it.

Another crazy idea is to have LLM try to write release notes automatically. But that would work best when it can summarize something that was written somewhere else (PR summary, commit messages, ...), because they are usually pretty bad at just guessing from the diff, so the above problem would still be relevant.

@oscarbenjamin
Copy link
Collaborator

Then, on merge, some CI bot automatically takes the contents of this file and moves them to the release notes page.

This does not seem simpler to me at all.

If it's a multi-step process to write notes, then either people won't do it, or we make it required and then we have to constantly bug people to do it and instruct them in how it works and so on.

This is still the case with sympy-bot except that "we" (the reviewers) often don't check the release notes because they are separated from the diff. I don't see why the sympy-bot complaint about release notes is any better than a CI check that says "add a notes/xyz.txt file with release notes".

Another crazy idea is to have LLM try to write release notes automatically.

Definitely crazy!

There's a lot of room for innovation here.

There is also room for just using standard tooling, matching the workflow of other projects etc. SymPy is hardly the first project to consider these issues so without some significant effort I don't see why we would come up with something that is significantly better than what other projects have.

Honestly, I don't see what is so bad about just using towncrier.

@asmeurer
Copy link
Member Author

It depends on your perspective. It's maybe not simpler for the person writing the tool but it is simpler for the people using it.

There is also room for just using standard tooling, matching the workflow of other projects etc. SymPy is hardly the first project to consider these issues so without some significant effort I don't see why we would come up with something that is significantly better than what other projects have.

I mentioned this because @skieffer mentioned writing his own tool. If you're going to write your own tool, it would be good to do it right. Another option is upstreaming improvements to towncrier or some other tool itself (it is open source after all).

I don't see why the sympy-bot complaint about release notes is any better than a CI check that says "add a notes/xyz.txt file with release notes".

It's not about where the message lives (I agree a PR comment vs. a CI check is not that important). It's about the steps the user has to take to write the notes. Every extra step someone has to take to contribute beyond the actual code they write is friction. It also makes it harder on reviewers because they will have to walk every new contributor through the process if it is not straightforward. If we can avoid things that are technically unnecessary we should.

The reason people don't write notes with sympy-bot is that the bot has the NO ENTRY escape hatch, and people do that because it's easier. People are also going to do that with towncrier or whatever other tool as long as it's an option. Again, the fundamental issue is that (a lot of) people don't like writing, and tooling cannot fix that. I do think we should remove NO ENTRY, and just have a separate "internal" category or something, so that every change has to have an entry (and we could do this pretty easily right now with sympy-bot if we wanted to).

Reviewers not reviewing the notes is also something I'm not completely convinced will change with different tooling. Reviewers are supposed to do that. If they aren't, that's a problem with our reviewing process. In fact, the whole point of sympy-bot is that reviewers can write or edit the notes themselves very easily, just by editing the PR summary. Again, this is a social (people) issue, not a technical one. Maybe people will be more adamant about reviewing the notes if they are in the diff, but I wouldn't just assume that would be the case.

To me, the main technical issues with sympy-bot are:

  • the notes are put on the wiki, but the docs would be better (this could actually be fixed with sympy-bot itself, if we wanted)
  • It's an independent system that we have to maintain, and it would be better to use something off the shelf that someone else maintains.

@oscarbenjamin
Copy link
Collaborator

I do think we should remove NO ENTRY, and just have a separate "internal" category or something, so that every change has to have an entry (and we could do this pretty easily right now with sympy-bot if we wanted to).

I agree with this. I still don't want to use sympy-bot but it would be good to have a changelog that is not "release notes" where you just note what any change is regardless of if it is relevant to users/downstream.

Maybe people will be more adamant about reviewing the notes if they are in the diff, but I wouldn't just assume that would be the case.

Speaking for myself it requires extra diligence (that does not always happen) to look at the release notes. I always check the full diff as the final step before merging but the release notes are not in there and so I often miss them.

@skieffer
Copy link
Member

Another option is upstreaming improvements to towncrier or some other tool itself

Yes, what branchnews does should probably just be an option in towncrier, which I considered during development, but decided to build a standalone prototype first for rapid proof of concept. I may still open a discussion over there at some point.

One difficulty which I think none of these tools overcomes is what happens when one PR cancels out or alters the effects of an earlier one, within the same release cycle. In such cases, the release notes from the earlier PR may need to be altered too. It's probably unrealistic to hope contributors would ever be willing or able to try to manage this themselves, and I guess this will always have to be the responsibility of project maintainers.

Automated tools could at least raise a red flag for maintainers when a second PR touches some of the same lines of code as an earlier one within the release cycle, but even that wouldn't be bulletproof. Two totally separate files A.py and B.py can, together, produce user-visible behavior, so PRs which intersect in terms of release notes don't necessarily intersect in terms of lines of code.

@oscarbenjamin
Copy link
Collaborator

One difficulty which I think none of these tools overcomes is what happens when one PR cancels out or alters the effects of an earlier one, within the same release cycle.

This is why it would be better if aggregating the news entries happened in a PR that everyone can review.

@asmeurer
Copy link
Member Author

One difficulty which I think none of these tools overcomes is what happens when one PR cancels out or alters the effects of an earlier one, within the same release cycle. In such cases, the release notes from the earlier PR may need to be altered too. It's probably unrealistic to hope contributors would ever be willing or able to try to manage this themselves, and I guess this will always have to be the responsibility of project maintainers.

I would say storing the notes in the repo solves this problem. If PR A adds a new feature and PR B later removes that feature, all within the same release cycle, then PR B can just delete the note that was added from PR A. This sort of thing actually happens a lot and it's one of the downsides of the sympy-bot approach.

That's also why I would love to see a system where everyone just directly edits release-notes.md, instead of aggregating files. That makes it really easy to see what the release notes are, even before a release. It would also put the pre-release notes in the dev docs (similar to how we have pre-release notes visible now). I think the only problem with doing it that way is you end up with a lot of merge conflicts, but I think it should be possible to work around that with the right tooling.

@skieffer
Copy link
Member

I would say storing the notes in the repo solves this problem.

I agree that storing the notes in the repo provides a mechanism for solving the problem, but I have doubts about contributors' willingness and ability to use it. I suppose that here your one-file idea actually makes it much more likely to work. I can't imagine contributors combing through multiple news fragment files to look for overlap.

Actually, the one-file plan is so simple that it makes me re-question why many projects (including my own) have moved to the fragment file (i.e. towncrier) system. You pointed out one advantage of the latter, namely,

  • No merge conflicts

What else? I suppose it has largely to do with having the final change log be automatically generated, which results in things like:

  • Consistent formatting
  • All the desired section headings ("Bug Fixes", "Improvements", etc.)
  • Automatically generated links to the related GitHub issues and PRs

But I'll admit that none of those are things that can't be done manually. If editing release-notes.md was to be a part of every PR, then maintainers could point out formatting rules in the review process, and require links to PRs and issues.

There would still be the problem of not knowing the PR number until after opening it, requiring at least one additional commit to add the link. But at this point I think it's clear that every system has its imperfections, so...hard to say which is best.

@skieffer
Copy link
Member

it would be better if aggregating the news entries happened in a PR that everyone can review.

So, under this system, there would be a single, final review of the release notes, just prior to making a release. At that time, project maintainers could look for conflicting or overlapping changes, and rewrite the notes as necessary.

This allows use of fragment files. But it saves all the work until the end, instead of gradually building and maintaining the notes in a single file, updated once per PR. I'm not sure which would be easier in practice.

@oscarbenjamin
Copy link
Collaborator

But it saves all the work until the end, instead of gradually building and maintaining the notes in a single file, updated once per PR. I'm not sure which would be easier in practice.

I don't think it saves all the work until the end. Each PR gets a release note at the time but at the when aggregating them we have a second round of review where we can see in context how all those release notes actually look. Similar release notes can be merged together or some can be reworded or removed if they no longer make sense. It is very hard to write release notes for things that you haven't actually worked on yourself. It is comparatively a lot easier to edit/reword release notes later if you actually have some meaningful content to work with.

@skieffer
Copy link
Member

even when using news fragment files, you still have to write opening comments for your PRs, and I often found myself manually copy-pasting the exact text from the news fragment files for this, which was somewhat tedious to do.

Thinking more about this, a solution would be to have CI automatically add a comment to the PR, showing the contents of the news fragment file. Contributors could then be instructed that they need not repeat anything in their opening comment that is already documented in their news fragment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants