Development process improvements #3542
Replies: 3 comments · 13 replies
-
My $0.01, based on conversations I've had with others.
I'd be fine with imposing size limits. What if we required each PR to be at most 1,500 added lines? No upper limit on deleted lines. Expected review turnaround could be 2 business days, unless specified otherwise. Reviewers could be selected from an
Right now, I think this is unfamiliarity with the codebase, especially with regard to how multiple different modules interact.

I'm very opposed to deleting working tests, even if they are flaky. Flaky tests should be fixed. Working tests are crystallized knowledge of how the system is supposed to work; we delete them at our own peril.

There are some unit tests that cannot be run in parallel because they bind on ports, and might clobber one another. I think this is fixable, provided that

A lot of the "slow" tests are slow because they fully mock the runtime environment of a Stacks node -- they spin up a regtest Bitcoin node, submit live transaction data to it, and process live Bitcoin blocks. Some tests even spin up multiple Stacks node runloops in separate threads, and have them interact with one another via
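One common fix for the port-clobbering problem is to stop hard-coding ports in tests and instead ask the OS for an ephemeral one by binding to port 0. This is a minimal, hypothetical sketch (not code from the Stacks repo) of that tactic:

```rust
use std::net::TcpListener;

// Bind to port 0 so the OS assigns a free ephemeral port. Tests that
// each need their own listener can then run in parallel without
// clobbering one another on a fixed port number.
fn bind_ephemeral() -> (TcpListener, u16) {
    let listener = TcpListener::bind("127.0.0.1:0").expect("failed to bind");
    let port = listener.local_addr().expect("no local addr").port();
    (listener, port)
}

fn main() {
    let (_a, port_a) = bind_ephemeral();
    let (_b, port_b) = bind_ephemeral();
    // Both binds succeed concurrently because the OS picked distinct ports.
    assert_ne!(port_a, port_b);
}
```

The test then passes the assigned port to the code under test instead of assuming a well-known one.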
Right now, I think the biggest hindrance is that Codecov is hilariously unreliable. But this is fixable. Once it's reliable, we should pick a percentage of code coverage that's acceptable and stick with it.

It's hard to specify this formally, but tests should spend most of their time exercising the "unhappy" paths. The "happy" path is easy to test, and it's easy to convince oneself that if the "happy" path works, the code works. But it is these "unhappy" paths that end up biting us in production. The code should be written so that exercising them is straightforward. For example, I've used fault injection in various places to ensure that certain degraded modes of operation are tested.

I'm skeptical about mocking as a test tactic. I think it's unhygienic, because it carries the risk of testing your code's behavior against the mocked component instead of the real thing. I can understand why it's often necessary, but I think testing with the real thing should be preferred when it is possible / tractable. An alternative tactic could be to focus on making the connected modules "scriptable" so that they exhibit the behavior that the tested module expects (fault injection is one form of this).
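As a hedged illustration of the fault-injection tactic mentioned above (all names here are hypothetical, not from the Stacks codebase), a test can flip a flag to force a degraded path that would otherwise require a broken disk or network to reach:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical fault-injection switch: tests flip it to force the
// error path of `store_block` without needing real I/O to fail.
static INJECT_IO_FAULT: AtomicBool = AtomicBool::new(false);

fn store_block(data: &[u8]) -> Result<usize, String> {
    if INJECT_IO_FAULT.load(Ordering::SeqCst) {
        // Degraded mode under test: pretend the write failed.
        return Err("injected I/O fault".to_string());
    }
    // Normal path: pretend the write succeeded and report bytes written.
    Ok(data.len())
}

fn main() {
    // Happy path works as usual...
    assert_eq!(store_block(b"block"), Ok(5));
    // ...and the unhappy path is reachable on demand.
    INJECT_IO_FAULT.store(true, Ordering::SeqCst);
    assert!(store_block(b"block").is_err());
}
```

In a real codebase the flag would typically be compiled in only under `#[cfg(test)]` or a test-only feature, so production builds carry no injection hooks.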
My asks here are:
I am terrible at time estimation. I think I could be better if we did some kind of collective secret-ballot voting on a task's time estimate, and took the median. Then, my bad estimates could be thrown out in the near term, and would approach the group's median estimate in the long term. I like @kantai's suggestion that no task should take more than one week (5 business days); if it does, it should be broken down further. I'd love to spend more time planning out features at regular intervals. Historically, I haven't felt like I've had the time.
Super-spicy take: there shall never be more than 1 page of open issues that do NOT have someone working on them. If your issue falls to the second page, it's closed. If it's that important, it will get re-opened. Issues pertaining to long-term architectural tasks should be kept separate from blockchain issues. I think this includes things like new Clarity language features, new network protocols and services, and so on -- i.e. things that haven't yet been broken down into tasks, or are not ready to implement. The blockchain issues page is only for things that are in-flight and for bug reports.
I think the overall goal is to eliminate dependencies. I think a big part of why the event-observer gets broken is because it's set up to fail that way. If the event observer were instead tasked only with (1) receiving a blocking notification when a new block is ready, (2) pulling the relevant data it needs from the Stacks node, and (3) consuming the notification, then I think breakage would be a lot less frequent.
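The three steps above can be sketched as a notify-then-pull loop. This is a hypothetical toy (the `Node` type and channel here are illustration, not the actual stacks-node API): the node sends only a lightweight "block N is ready" signal, and the observer pulls whatever data it needs afterward, so a slow or broken observer cannot corrupt a push pipeline.

```rust
use std::sync::mpsc;

// Toy stand-in for the Stacks node's queryable state.
struct Node {
    blocks: Vec<String>,
}

impl Node {
    // Observer-driven pull: fetch block data on demand.
    fn get_block(&self, height: usize) -> Option<&String> {
        self.blocks.get(height)
    }
}

fn main() {
    let node = Node { blocks: vec!["genesis".into(), "block-1".into()] };
    let (tx, rx) = mpsc::channel::<usize>();

    // (1) The node sends a blocking notification that block 1 is ready.
    tx.send(1).unwrap();

    // (2) The observer pulls the relevant data from the node...
    let height = rx.recv().unwrap();
    let data = node.get_block(height).cloned().unwrap();

    // (3) ...and consumes the notification.
    assert_eq!(data, "block-1");
}
```

The key property is that the notification carries no payload; all payload flows through the pull interface, which the observer can retry at its own pace.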
-
Started this wiki page to capture some concrete things folks were aligned on. We can use this discussion to hash out ideas, and as we get alignment on various things, we can update the wiki.
-
Notes from discussion on 2023-02-01

Attendees: @jcnelson, @kantai, @obycode, @saralab, @donpdonp, @igorsyl, @netrome, Sergey

Note: I tried my best to faithfully yet concisely capture the discussion from the meeting. I apologize for any errors / omissions. Please feel free to add if I missed or misrepresented anything.

After quick context-setting + a silent pre-read, the group emoji-voted to prioritize the areas listed in the original post for discussion. Rough prioritization that emerged:
The group then started discussion on the first area. Unfortunately, this took up most of the time. The group agreed this call was more the beginning of conversations around these topics, not the end.

Norms for pull requests
Impact on downstream dependencies
Testing hygiene
-
👋🏽 everyone!
Many aspects of development on the Stacks blockchain are challenging today. This affects developer productivity, as well as the predictability of shipping milestones on time and with high quality.
This is a request for input from contributors to the Stacks blockchain (past or present). Please try to provide your feedback before Wednesday, February 1st -- there is a call on Feb 1st and your feedback will serve as a starting point for that discussion.
If it helps, here are some areas / prompts to consider:
Note that the process improvements are not necessarily limited to engineers / individual contributors. Guidelines for managers of teams or for non-technical roles are also very much in scope, insofar as they impact (directly or indirectly) development on the Stacks blockchain.