Investigate further automation for publishing npm packages #7352

Closed
aslawson opened this issue Mar 5, 2021 · 6 comments

Comments

@aslawson
Contributor

aslawson commented Mar 5, 2021

SDK publishing has been a continuous bottleneck for those who depend on it, yet the existing process is still long and manual.

Can we automate further to have a continuous publish process or a nightly job?

Output: A proposal for how to proceed with automation

@aslawson
Contributor Author

@gastonponti will we have tickets that come out of this that we can pull into the coming sprint/milestone?

@piersy
Contributor

piersy commented May 12, 2021

Ponti's initial investigation.

*** Notes ***

Yarn v1 workspace + lerna

  • We are not using it as intended
  • Lerna as a publishing tool:
    • pros:
      • Can maintain the same version for every package (instead of "version": "independent" as we have now)
      • Publishes to npm, generates the tags for GitHub, and generates the changelog entry
      • Checks which packages changed between releases and only publishes those
      • Can also check what has been added in the unstable version (lerna diff)
      • ultra nice to have:
        Used with conventional commits (https://www.conventionalcommits.org/en/v1.0.0/#specification), you specify in every commit which kind of change you are adding. So it will not only generate the changelog of every package with its bullet points, but also infer the version of the package using semantic versioning. If between releases we added only bug fixes, it will increment the patch version; if we fixed bugs AND added a feature, it will increase only the minor version.
        As everyone is using this repo, it is kind of a dream to think that everyone will follow this convention when pushing changes, but at least we could start moving in that direction, and it would be very useful for the changelog update.
    • cons:
      • I wasn't able to run two different instances of lerna in the same monorepo (I didn't fully test it either), so if we plan to have this kind of publishing, maybe the only solution would be to extract the sdk (a long-running battle)
  • Yarn as a package manager:
    • Yarn is expected to be the only source of truth for the workspace. Instead of us updating every package.json by hand, this tool is the one in charge of maintaining and keeping the same package versions across every package. The way I see it, if we want to truly use yarn, we should act as if the package.json files don't exist: we shouldn't modify them, only yarn is in charge of doing that.
      So, if you want to add a dep, use yarn; if you want to update a dep, use yarn; if you want to force a resolution, again, use yarn. Everything that could make you change a package.json has its own way of being done with yarn.
      The problem with this is, again, that the other packages do not all follow this rule.
      Either we force them to use the same rules (not every package has fixed versions required for a build) or we extract the sdk. IMHO, if we are not planning to remove the sdk, what is really clear is that if for some reason one package can't follow the same versions used when updating everything (using yarn), then that specific package does not belong in the monorepo. So, for example, our protocol package could eventually share the same deps, but not the phone-number-etc. But why? If it's only because of the Google Cloud build, we could try to adapt that job before fully removing it.
      What is clear is that the whole monorepo should not suffer because of the specific rule of one particular package: either we adapt that rule, or we remove the package.
    • To fully use Yarn, we should unify all versions, which basically means updating everything from the root using yarn. Probably before this we will have to remove the packages that don't belong in the monorepo, AND we should add some sort of material/tutorials for the devs who will change things here, to avoid the usage diverging again. It will probably mean being more strict in PRs until everyone is on the same page.
      As this will require not only work on the repo but also some education, I would analyze the possibility of introducing yarn v2 instead (see next point)
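
To make the conventional-commits point above concrete, here is a rough sketch of how a version bump can be inferred from commit messages. It is only an illustration of the rule in the Conventional Commits spec (fix → patch, feat → minor, `!` or a `BREAKING CHANGE:` footer → major), not lerna's or yarn's actual code. The names `bumpForCommit`, `inferBump` and `nextVersion` are made up for this example, and it assumes plain MAJOR.MINOR.PATCH versions with no -dev suffix.

```typescript
// Hypothetical sketch: infer a semver bump from Conventional Commits messages.
// Not taken from lerna or yarn; all names here are illustrative.
type Bump = "major" | "minor" | "patch" | "none";

// Classify a single commit message.
function bumpForCommit(message: string): Bump {
  const header = message.split("\n")[0] ?? "";
  // Matches "type: ", "type(scope): " and "type!: " style headers.
  const match = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (match === null) return "none"; // not a conventional commit
  // "!" after the type/scope or a BREAKING CHANGE footer marks a breaking change.
  if (match[3] === "!" || /^BREAKING[ -]CHANGE: /m.test(message)) return "major";
  if (match[1] === "feat") return "minor";
  if (match[1] === "fix") return "patch";
  return "none"; // docs, chore, refactor, ... do not trigger a release here
}

const rank: Record<Bump, number> = { none: 0, patch: 1, minor: 2, major: 3 };

// The release bump is the highest bump required by any commit since the last release.
function inferBump(messages: string[]): Bump {
  let result: Bump = "none";
  for (const message of messages) {
    const bump = bumpForCommit(message);
    if (rank[bump] > rank[result]) result = bump;
  }
  return result;
}

// Apply a bump to a plain MAJOR.MINOR.PATCH version string.
function nextVersion(current: string, bump: Bump): string {
  const parts = current.split(".").map((p) => Number(p));
  const major = parts[0] ?? 0;
  const minor = parts[1] ?? 0;
  const patch = parts[2] ?? 0;
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return current;
  }
}
```

So a release containing only `fix:` commits moves 1.2.3 to 1.2.4, while a mix of `fix:` and `feat:` commits moves it to 1.3.0, which is the behaviour described in the notes.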

Yarn v2

Yarn v2 adds a lot of features. One of those is a "foreach" over every workspace, which is the same as what lerna does, and it also has a way to publish every package at once, inferring the versions (it's in an experimental stage, but has been tested for a while: https://yarnpkg.com/features/release-workflow).
So, basically, yarn v2 is yarn v1 + lerna (you could still use lerna if you want).
It not only allows nested workspaces (which could be useful in our sdk scenario), but also adds a ton of new things, for example Zero Installs.
Zero Installs basically pushes the libs directly to the repo, which in the case of node_modules could be a nightmare, but they claim that a node_modules of 1.2 GB is stored in 139 MB. This gives us a few advantages: for every commit we will be using the same fixed libs (not just suggested versions), which makes debugging easier and adds another security layer, and it will also shorten all our CI jobs, because every lib is already downloaded.
Even if we don't use Zero Installs, yarn v2 seems to be what we have now plus a lot of other plugins that would be helpful, so IMHO, if we are going to spend time fixing yarn v1 + lerna, we could instead spend the same amount of time making everything work with yarn v2.

Unpublished work

I think that we use the -dev version just to be sure that we are on an unstable version. If we are planning to use lerna and yarn as intended, we should avoid that and just use plain versions. This will make the release process faster, since we won't need to change and remove every possible -dev version. If we forget to update a version and leave it fixed, it is because we are not using yarn the way we are supposed to, haha.

Publishing versions

  • Same vs Independent. I don't have a really strong opinion on this. Both have their pros and cons. Both are handled by lerna or yarn v2, so it's a matter of tooling. What I know from my experience in the JavaScript world is that with the same version it's easier to understand which sub lib "matches" another sub lib.
    Is the [email protected] supported by the [email protected]? In an independent scenario it may or may not be; it depends on the previous releases and how the team is working. If both libs are released with the same version (and the dev team is not doing anything weird), you can be 100% sure that those libs are compatible.
    That's why I tend to prefer the same version for every package. Having the same version complicates the release process, but if we are using a tool that handles everything, it won't be a problem.

@piersy
Contributor

piersy commented May 27, 2021

Investigation into publishing automation

Layout of the monorepo

The monorepo contains individual packages and also groups of packages, such as the packages grouped under sdk and phone-number-privacy.

I can see that we currently seem to be maintaining the same version for all sdk packages, phone-number-privacy seems to have no clear connection between the sub package versions, and the remaining packages are all individual packages and seem to be versioned independently.

So based on this layout ideally we would like a tool that can handle releasing packages individually and also releasing groups of packages with matching versions.

Available tools

Yarn

Yarn provides functionality to update a single package version with the version command and can automatically commit the version change with a corresponding git tag. The git tagging is not useful because it simply uses the version as the tag, and in a repo with multiple packages, versions are not unique.
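
A small sketch of why version-only tags are a problem in a multi-package repo. The package names and versions are invented for illustration, and the `v` prefix is an assumption about the tag format. The name-qualified scheme is shown only for contrast, not as something yarn v1 does.

```typescript
// Hypothetical illustration of tag collisions; names and versions are made up.
interface Pkg {
  name: string;
  version: string;
}

// Tag built from the version alone (assumed "v" prefix).
function versionOnlyTag(pkg: Pkg): string {
  return `v${pkg.version}`;
}

// Tag qualified by the package name, shown for contrast.
function qualifiedTag(pkg: Pkg): string {
  return `${pkg.name}@${pkg.version}`;
}

// Returns every tag that more than one package would produce.
function collisions(pkgs: Pkg[], tagOf: (p: Pkg) => string): string[] {
  const counts: Record<string, number> = {};
  for (const pkg of pkgs) {
    const tag = tagOf(pkg);
    counts[tag] = (counts[tag] ?? 0) + 1;
  }
  return Object.keys(counts).filter((tag) => (counts[tag] ?? 0) > 1);
}

const examplePkgs: Pkg[] = [
  { name: "pkg-a", version: "1.0.0" },
  { name: "pkg-b", version: "1.0.0" },
  { name: "pkg-c", version: "2.1.0" },
];

// Two packages at 1.0.0 would both want the tag "v1.0.0".
const clashingTags = collisions(examplePkgs, versionOnlyTag);
// With name-qualified tags there is nothing to clash.
const qualifiedClashes = collisions(examplePkgs, qualifiedTag);
```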

Lerna

Lerna provides the ability to update multiple package versions with the version command, but doesn't provide any support for releasing just some packages: either you release everything or nothing (there is a flag to ignore certain packages, but it can be overruled in some cases, so it cannot be relied upon). You can't have 2 instances of lerna in the same repo; lerna assumes there is one instance and works from the repo root.

It doesn't look like this is going to change. In the words of the biggest contributor, responding to a request for an option to skip publishing certain packages:

I'm sorry, this fundamentally breaks the contract of lerna: Every package that has changed since the most recent git tag will be published with a new version. Skipping a package during publish is an enormous footgun, and will not be accepted.

Unfortunately I've been unable to find the definition of the "contract of lerna", but it seems from this comment that lerna's view of a monorepo is a bunch of packages that must all be released together. That seems a little short-sighted to me, because I think there is lots of value in sharing common config, tooling, linting, etc., none of which implies that the dependent projects must be released as one.

Yarn v2

Yarn v2 looks like it could provide us with the functionality we need. As @gastonponti mentioned in his notes, you can use yarn workspaces foreach with include and exclude patterns to run commands on just a subset of the workspaces, which lets us bump a group of packages' versions in a single step using yarn version. There is also a new command, yarn up, that can be used to update references to a package across the repo.

I spent a few hours trying to get yarn 2 set up on the project and was successful (branch here). The tools could be easier to use, but they seem to get the job done. I had some problems with husky (what we use for our git hooks) and it's still not working, so that will require a bit of consideration.

In more detail: workspaces are packages that are marked in the package.json (currently all packages in the monorepo are also workspaces, so there is a 1:1 mapping between packages and workspaces). Being a workspace means that when yarn tries to resolve a dependency on a package, it will first look for a workspace to fill that dependency. Workspaces allow us to modify a package and see that change in a dependent package without having to publish a new version and update the dependent package.

The yarn workspaces foreach command lets us operate over groups of workspaces.

The command yarn version will update the version of the specified package (it can do multiple if used with workspaces foreach). It will also update any workspaces that were referencing the previous version of the package. It takes a specific version or one of the strings major, minor or patch.

E.g. if package X was at version 1.2.3 and we bump it to version 1.2.4, and package Y was previously referencing X@1.2.3, Y will have its dependency bumped to 1.2.4, but package Z, which was referencing 1.2.2, will not be updated. This is slightly annoying in that you cannot limit package dependency upgrades to just the SDK, for example. But it should be easy enough to git checkout unwanted dependency upgrades after running the command.
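
The X/Y/Z example can be modelled with a few lines of code. This is only a model of the behaviour as I observed it, not yarn's implementation. The `Workspace` shape and the `bumpWithDependents` function are invented for this sketch, and dependencies are assumed to be exact versions rather than ranges.

```typescript
// Hypothetical model of the dependent-update behaviour described above.
interface Workspace {
  name: string;
  version: string;
  // dependency name -> exact version referenced (ranges are ignored in this sketch)
  dependencies: Record<string, string>;
}

// Bump `target` to `newVersion` and update only those workspaces that referenced
// the version `target` had before the bump. The input array is not mutated.
function bumpWithDependents(
  workspaces: Workspace[],
  target: string,
  newVersion: string
): Workspace[] {
  const current = workspaces.filter((w) => w.name === target)[0];
  if (current === undefined) throw new Error(`unknown workspace: ${target}`);
  const previous = current.version;
  return workspaces.map((w) => {
    const dependencies = { ...w.dependencies };
    // Only references to the previous version are rewritten.
    if (dependencies[target] === previous) dependencies[target] = newVersion;
    return {
      name: w.name,
      version: w.name === target ? newVersion : w.version,
      dependencies,
    };
  });
}

const beforeBump: Workspace[] = [
  { name: "X", version: "1.2.3", dependencies: {} },
  { name: "Y", version: "0.1.0", dependencies: { X: "1.2.3" } },
  { name: "Z", version: "0.1.0", dependencies: { X: "1.2.2" } },
];

// After this call X is at 1.2.4, Y references X 1.2.4, and Z still references X 1.2.2.
const afterBump = bumpWithDependents(beforeBump, "X", "1.2.4");
```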

There is also a new command yarn up that can be used to update references to a package across the repo, so this can be used in the case that we want all packages across the repo to reference a single version of another package. It also has an interactive mode that lets you select a version for each package.

Using the yarn workspaces foreach command can be quite verbose because it requires you to specify the --include flag once for each workspace you want to include. I've added a script in the branch which aims to mitigate this by printing --include xxx for all workspaces matching a given prefix.

So, using that, the command to bump the major version for all packages in the sdk becomes:

yarn workspaces foreach `./packages-with-prefix packages/sdk` version major
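
For readers without access to the branch, the idea behind the helper can be sketched as below. This is not the actual packages-with-prefix script. It assumes the workspace list arrives as one JSON object per line with `location` and `name` fields (which is what I believe yarn workspaces list --json prints, but check before relying on it), and that --include is given workspace names. The sample names and paths are illustrative.

```typescript
// Hypothetical sketch of a "packages with prefix" helper; not the script from the branch.
interface WorkspaceInfo {
  name: string;
  location: string;
}

// Parse newline-delimited JSON, one workspace per line (assumed input format).
function parseWorkspaceList(ndjson: string): WorkspaceInfo[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as WorkspaceInfo);
}

// Build one "--include <name>" argument per workspace whose location starts with the prefix.
function includeFlags(workspaces: WorkspaceInfo[], prefix: string): string {
  return workspaces
    .filter((w) => w.location.indexOf(prefix) === 0)
    .map((w) => `--include ${w.name}`)
    .join(" ");
}

// Illustrative sample input; the names and paths are assumptions.
const sampleList = [
  '{"location":".","name":"celo"}',
  '{"location":"packages/protocol","name":"@celo/protocol"}',
  '{"location":"packages/sdk/contractkit","name":"@celo/contractkit"}',
  '{"location":"packages/sdk/utils","name":"@celo/utils"}',
].join("\n");

// Produces "--include @celo/contractkit --include @celo/utils".
const sdkFlags = includeFlags(parseWorkspaceList(sampleList), "packages/sdk");
```

The resulting string is what gets substituted into the foreach command by the backticks.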

Conclusion

Lerna has such a limited view of how a monorepo should behave that I don't think we should continue to use it. It definitely doesn't suit our current needs and even if we were able to fit into its framework, I worry that it would restrict future development when things need to change.

It looks like Yarn 2 is the way to go!!

@piersy piersy closed this as completed May 27, 2021
@piersy
Contributor

piersy commented Jun 1, 2021

Update: The linked branch doesn't build on CI. I think I have tracked this down to a bug in yarn 2 which means that packages with C dependencies end up with different checksums in different environments. This is a blocker for yarn 2, so I'm not sure what we want to do about this.

Options seem to be:

  • Wait for the bug to be fixed
  • Remove the dependency causing problems (@celo/ganache-cli)
  • Do something else like write scripts to automate the release process.

I lean towards trying to get yarn 2 to work. Writing scripts to do the releasing will require building much of what is already provided by yarn 2, and they are likely to be quite tricky to get right. Having said that, if we can't get yarn 2 to work and no fix is forthcoming I guess writing our own scripts is the only option.

@aslawson
Contributor Author

@piersy @AlexBHarley seems like this is well scoped. Have we created epic tickets to allocate and begin implementation?

@piersy
Contributor

piersy commented Jun 16, 2021

@aslawson @AlexBHarley I have not created any epic tickets.

A further update though.

I managed to resolve the yarn 2 bug. It was not a bug in yarn 2; it was due to a package bundling C/C++ dependencies (which means that those dependencies will be built on the releasing machine, and are therefore unlikely to work on any other machine).

My latest progress with getting yarn2 to work is in this branch https://github.com/celo-org/celo-monorepo/tree/piersy/y2 which works on my machine but not on CI. I've been unable to determine what is causing the difference between the 2 machines.

There is also this commit on this older branch https://github.com/celo-org/celo-monorepo/commits/piersy/yarn2 which has a useful script for simplifying package release operations, so once we have yarn2 working I would suggest cherry picking it.
