Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate with Github pull requests #4

Closed
qwertzguy opened this issue Apr 1, 2021 · 14 comments
Closed

Integrate with Github pull requests #4

qwertzguy opened this issue Apr 1, 2021 · 14 comments

Comments

@qwertzguy
Copy link

Is there a plan to make this tool integrate with Github's pull requests?

I work at FB right now and I notice that your tool is inspired by FB's workflow and I had the same idea of making a tool to replicate the same workflow with git and Github. Your tool seems to match pretty well what I'm looking for aside from being able to do code reviews of the commit before merging them in the main branch.

@arxanas
Copy link
Owner

arxanas commented Apr 1, 2021

Hi @qwertzguy, I didn't have any specific plans to integrate with GitHub. Can you clarify more about the workflow you'd like to see? Do you mean as a maintainer, to try out pull requests locally, or as a contributor, to create and update pull requests?

You should be able to check out a remote pull request as per the docs here: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/checking-out-pull-requests-locally. Then you can just commit/rebase/amend commits as normal.

@arxanas
Copy link
Owner

arxanas commented Apr 1, 2021

And if I may ask, how did you hear about this project?

@qwertzguy
Copy link
Author

Hi @arxanas, I mean as a contributor. My idea of the workflow, would be one similar to FB:

  • Checkout master
  • Write code and make a commit
  • Create a PR for that one commit
  • Be able to checkout master again and work on another change and commit it
  • Create a 2nd PR for that one other commit
  • Be able to checkout the 1st commit to make fixes following the comments in the PR and update the PR

Let me know if that makes sense.

I found the project by typing 'git branchless' in Google. I typed that in to try to find tools to have a similar workflow as in FB but with git. So it's pretty by chance that I thought of searching using the same term as the name of your project :)

@arxanas
Copy link
Owner

arxanas commented Apr 1, 2021

I found the project by typing 'git branchless' in Google. I typed that in to try to find tools to have a similar workflow as in FB but with git. So it's pretty by chance that I thought of searching using the same term as the name of your project :)

Perfect 😄

You should be able to use git-branchless in conjunction with Github. I recommend that you get the gh CLI client (https://cli.github.com/) for Github, so that you can do things like create a pull request from the command-line.

Fundamentally, you need to have a branch somewhere in order to interface with Github, since it serves as the canonical reference to the tip of the stack you want to send upstream. You have three choices, varying in how much you want to use branches:

  1. Use branches as normal with a Github workflow. In this workflow, you'll check out directly to the branch and make commits there. When you're ready to update the pull request, you'll git push (or git push -f, if you rebased/amended/rewrote any commits). You would gain the nice visualization for git sl, but not much else.
  2. Update your local branch only when you're ready to update the pull request. You would need to create the branch, but you wouldn't need to check it out. Understand you make your additional commits on top of the branch, you can squash them all together and run git restack, which will update the branch automatically, or you can manually move the branch with git branch -f. Then run git push/git push -f as normal. This is how I prefer to work with Github.
  3. Only create a branch in your remote repository. Then, you can update it to an arbitrary commit using an invocation like git push origin HEAD:my-branch (I think). If you hate branches, then this is the way to go.

The Phabricator diff number D123 is surfaced in the smartlog. I would be open to adding a similar mechanism to surface which Github issues/pull requests are addressed in the commit message.

@qwertzguy
Copy link
Author

qwertzguy commented Apr 2, 2021 via email

@arxanas
Copy link
Owner

arxanas commented Apr 2, 2021

Integrating with Github sounds strategically useful for the project. But I think there are a lot of design decisions that would need to be resolved first.

  • What should the interface for git-branchless be for managing pull requests?
    • It should be decided how this relates to gh pr.
    • Should it be a default aliased command, or should users write something like git branchless pr in order to invoke it? I don't want to add confusion with gh pr.
    • Do you have to run the command once to create the PR (essentially the same as gh pr, except that we may create a branch name)? And then again to update the PR? Or should we create different commands?
  • Generating a branch: I think it would be best to generating a local branch as well. After all, local branches already have a mechanism to keep up to date with remote branches. If we don't reuse the branch name, then there are a number of additional problems we have to solve:
    • Keeping the PR associated with a commit in sync with rewritten versions of the commit. Currently, if you amend/rebase a commit, the commit hash changes, but git-branchless will warn you to run git restack to move any attached branches back into place. We would have to add an implicit or explicit way to move around or copy the PR reference for the commit.
    • When you add a commit to a line of development, it's unclear if the entire stack should be associated with the PR, or the commit which was previously associated with it. More generally, it's not clear if people have a one-commit-per-PR workflow or a many-commits-per-PR workflow. Branches solve this by not moving the branch unless it's checked out when you make the commit.
    • When you add multiple descendant commits to a line of development, it's even more unclear which of the descendant commits should be associated with the PR. Branches also solve this, since it's explicit which commit is associated with the PR.
  • Showing the PR number: showing the PR number might be possible without hitting the network, if the PR number is encoded in the commit somehow.
    • You could put it in the commit message, in the same way as Phabricator diff numbers are put in the commit message, but I imagine there will be some projects with strict commit message requirements that won't like that.
    • It might be best to use the branch name to encode the PR. Actually, then, you wouldn't need a separate field for the PR at all.
  • Showing the status/number of comments:
    • It would take a while to render the smartlog if we have to hit the network. I recall that Facebook solves this by having a separate hg ssl command, which will take longer to render. We could also just have the user configure whether or not they want their smartlog to hit the network.
      • Alternatively, it would be nice if we could somehow update the already-displayed output using some fancy terminal-drawing, but I don't see a way to make it work if more output was produced than can fit in the terminal window.
      • Or, we could show the networkless smartlog first, and then wait on the network request. The user can either cancel with ctrl-C, or we can re-print the smartlog output with the live Github information once it's available.
    • If we hit the network, we may want to make sure that the user has a Github authentication token, so that we're less likely to cause them to hit a rate limit.
    • From the UI perspective, it's a lot of information, and I'm not sure where we would want to put it in the smartlog. I experimented a while ago with having more than one status line per commit, but it was pretty hard for me to read. We might want to try it again — it might be that only showing additional metadata for commits which are linked to a Github PR is not too unwieldy.
  • Command to open in browser: this might be better solved by the user's terminal emulator. For example, iTerm 2 has "smart selections", which are what I use to automatically open Phabricator reviews in the browser (like this: https://hlung.medium.com/using-smart-selection-in-iterm2-to-make-asana-task-id-or-any-text-clickable-dddbbbe2bbe8). If we reuse the branch name to encode the PR number, then perhaps we wouldn't need to surface additional information at all.

Since I don't use Github extensively right now, and since the design issues need to be resolved, I don't have any specific plans to tackle this myself. I'm happy to set up a call to onboard anyone who might be interested in implementing it.

@qwertzguy
Copy link
Author

I've been thinking quite a bit about it, but I'm a bit torn on which solution to use. Ideally I would want something close to the workflows at FB, but at the same time it does not match the github workflow well: for example github doesn't support amending a commit in a PR well, it will replace the commit but not let reviewers view previous versions of the commit.

Here are some thoughts on your points:

  1. Since we'll likely be automating some of the PR creation process, I think it should be something like git branchless pr with a different syntax than gh pr.
  2. I think generating a local branch could be ok, but for most IDEs, it then doesn't show if you are behind on the main branch (or whichever is the branch you made the new branch from).
    1. If we don't do a local branch and instead move the local main branch pointer when user switches from one branchless commit to another, then IDEs will be able to use origin/main as the upstream and make it easier to see if there's commits that have been pushed on the remote and easier to rebase. We can then track the commit using the main branch and save the commit hash when the user changes to another branchless commit. As far as where to keep the info about the remote branch and PR, I think it can be in .git/config which already has info for each local branch, and I think we can put our own config in there which would not be a local branch config, but more of a per commit config.
    2. I think per commit PR would be best to match the FB workflow and gerrit workflow, which I like. However it does make it a bit harder, as then we would need to make one remote branch per commit and then have PRs that have the previous commit's branch as the base branch, which might be a bit weird, and once the previous commit's PR gets merged, we would need to update the PRs of the descendant commits.
    3. I think one branch per commit does help to address being able to update a commit: in github each commit in a branch is a bit like a version of a commit in phabricator, so we could auto rename the commits like v1, v2, etc. Problem though is it will make the git log pretty confusing. So not sure if that's good. Kinda bummer that github doesn't handle amending commits well.
  3. I think we could store the PR info in .git/config
  4. Another option could be that we cache the PR info results and have some daemon that can refresh in the background. Could be useful for fetching remote from time to time as well. Ad there could be a flag / command to force refresh synchronously.
  5. Ah true, easiest would be to display the entire URL, that way there's no config needed in most terminal.

Definitely! I appreciate the discussion and I'm open to work on it. I just need to think a bit more about it and whether it fits well into your project. Out of curiosity: how were you planning to handle code reviews with your tool? Or is the idea to not have code reviews and push into the main branch directly?

@arxanas
Copy link
Owner

arxanas commented Apr 7, 2021

I just need to think a bit more about it and whether it fits well into your project.

Github integration has a definite place in the project, given that most people using it will probably be using Github. At the very least, we should be able to reflect the work in progress on Github (e.g. linking commits to PRs somehow). We'll have to add write operations as necessary in order to support that workflow (e.g. creating/updating PRs).

Out of curiosity: how were you planning to handle code reviews with your tool? Or is the idea to not have code reviews and push into the main branch directly?

There's no specific code review workflow prescribed by the tool at present:

  • For repos which I own, I just rebase my work onto the local main branch once it's done, and then push to the remote main branch. Sometimes I create a local/remote branch pair to get a CI run on Github (example: https://github.com/arxanas/git-branchless/actions/runs/662701629 for the git-v2.31 branch, it fails the tests 😭).
  • For Github PR workflows, I make a branch for the PR/commit and push it to my fork as usual. Once merged/rebased into the remote main branch, I can fetch the remote main branch and rebase my work on top of it.
  • For the Phabricator workflow, I create a branch of development, and use arc diff (actually a company-specific tool arc multi-diff, which handles stacks) to create or update the code reviews on Phabricator. Later, they're reviewed by someone else, and I click the button in the Phabricator interface to submit the commit to the main branch. Finally, I fetch the remote main branch and rebase my work on top of it.

Currently, none of these workflows show review status. I use the Github/Phabricator website tooling to manage my in-progress reviews.

Since we'll likely be automating some of the PR creation process, I think it should be something like git branchless pr with a different syntax than gh pr.

Sounds good to me. We might want to take a step back and think more holistically about the workflow. For example, Phabricator uses a single command to both create and update code reviews (arc diff). Maybe we should have a single command like git submit, which submits a stack of work to the code review backend (Github/Phabricator/etc.)? It can also handle the case of updating all dependent PRs, as you mention later.

If we do so, we should decide whether we want to support the one-commit-per-code-review workflow, or the many-commits-per-code-review-workflow, or both.

I think generating a local branch could be ok, but for most IDEs, it then doesn't show if you are behind on the main branch (or whichever is the branch you made the new branch from).

That's disappointing, but won't that be an issue for people who don't use git-branchless as well? How do they handle their code review management when they don't know how far behind their branches are?

If we want, we can update git sl to show the number of commits ahead/behind our local lines of development are. (NB: an old version of git-branchless used to do this, but the ahead/behind calculation was pretty slow in large repositories. We'd have to pay attention to performance. There's a mergebase-caching mechanism specifically to address this point.)

If we don't do a local branch and instead move the local main branch pointer when user switches from one branchless commit to another

I really don't want to move the local main branch on behalf of the user. It should always represent the remote main branch. There's also the issue that if git-branchless crashes or gets into a bad state after a given command, then the user may not know how to remediate.

then IDEs will be able to use origin/main as the upstream and make it easier to see if there's commits that have been pushed on the remote and easier to rebase

An alternative is that we can have local branches which correspond to PRs always correspond to the remote main branch, which will enable this. Then, instead of git push to update the PR, we can use a theoretical git submit command to push it to the correct remote branch, rather than the remote main branch. (Although, I suppose this could be risky, because users who manually git push will push to the wrong branch... we could hook into the local pre-push hook to detect this case.)

As far as where to keep the info about the remote branch and PR, I think it can be in .git/config which already has info for each local branch, and I think we can put our own config in there which would not be a local branch config, but more of a per commit config.

I'm not sure if the approach you're proposing handles the evolution of commit: when you amend/rebase a commit, it's replaced with an entirely new commit object. I wrote https://github.com/arxanas/git-branchless/wiki/Architecture#commit-evolution to detail this further. What should the behavior be in the following cases?

  • A commit is rebased, resulting in a new commit object. However, the commit-to-PR mapping only has an entry for the previous version of the commit. Should we prompt the user to add a new entry for the new commit object, or automatically do it?
  • A commit is combined with another commit (via a git rebase -i fixup), but they currently have separate PRs. Do we need to combine the PRs somehow?

Branches are a concept which the user is already familiar with, and which already has semantics in the git-branchless system, so I would like to reuse them as much as possible for this use-case as well.

once the previous commit's PR gets merged, we would need to update the PRs of the descendant commits.

This feature alone itself might be a killer feature for some Github users!

Another option could be that we cache the PR info results and have some daemon that can refresh in the background. Could be useful for fetching remote from time to time as well. Ad there could be a flag / command to force refresh synchronously.

So far, I've tried to avoid having long-running processes, as they significantly complicate the architecture. I would prefer to only add a synchronous update flag to git sl. Then the user can opt to always run with that flag (or alias git sl to do so), or they can do it only when they want an update on status. Thus, we won't sacrifice performance without an explicit opt-in from the user.

@arxanas
Copy link
Owner

arxanas commented Apr 24, 2021

There's a lot of prior work in this thread: https://news.ycombinator.com/item?id=26913959

@qwertzguy
Copy link
Author

Got a busy these past weeks. The HN thread is interesting, especially ghstack. I wonder if using git branchless and ghstack together works.

@arxanas
Copy link
Owner

arxanas commented Apr 26, 2021

It was made by a current Facebook engineer, so I'm hopeful that it fits our workflows!

@krobelus
Copy link

Hi, I found this because I have a similar project with the same name: https://github.com/krobelus/git-branchless
I wonder if there is a better name for mine, since it has a smaller scope. Maybe git stacked-branches? 🤔

  1. Checkout master
  2. Write code and make a commit
  3. Create a PR for that one commit
  4. Be able to checkout master again and work on another change and commit it
  5. Create a 2nd PR for that one other commit
  6. Be able to checkout the 1st commit to make fixes following the comments in the PR and update the PR

that is well supported by my tool. In step 4., you don't need to checkout master again, because you don't switch branches in the first place. My git branchless adds branches to the Git object database without needing to ever check them out.
There's also an integration example on how to cherry-pick others' GitHub PRs to your local branch.

@arxanas
Copy link
Owner

arxanas commented Apr 27, 2021

@krobelus whoops, I searched for "git branchless" before I named the project, and I thought I didn't find anything.

It looks like your workflow is a lot simpler for the common case where your commits don't conflict with each other! I suppose my workflow is better only if you want to do a lot of backtracking and make tree-structured histories. I'm not sure what a better name would be. git-treeful, git-forest, git-backtrack?

Maybe for yours, you could emphasize the single branch nature. git-monobranch, git-uberbranch? Or you could emphasize dividing the commits up after the fact. git assign, git divide, git partition, git-group, git-group-by? A name from the latter group would also suggest partitioning by other factors, such as splitting up the commits by which files were touched.

@krobelus
Copy link

krobelus commented May 3, 2021

Right, my workflow doesn't offer a solution for the case when there are conflicts. Thanks for your suggestions, I'm leaning towards git bs now, since it's pretty similar to git ps (though I still need to try that).

(bit of a brain dump ahead, soz)
Yours is interesting - so the user can switch between commits they are not sure about, and the commits can have conflicts and shared history. Getting to any local commit is already possible via the reflog, but the interface can definitely be improved.
What I sometimes dislike about the reflog is that, when I'm searching around and check out some of entries, then reflog entry numbers are shifted. There is no git reflog pause to inspect the state of history without writing changing its state.. I imagine it might be nice to have something like Emacs' undo-tree, or a git bisect for the reflog...
The tree search terminology reminds me of CDCL, so.. is this conflict-driven commit learning? 😆

I like the concept of hiding commits, I wonder what's the best way to attach such metadata to commits.
I'm using commit messages; some other projects are creating new Git objects for metadata; at one point
I've thought about attaching Git notes (notes also survive rebase/amend). One important point
for me is that I want to integrate naturally with standard Git tools and workflows.

@arxanas arxanas closed this as completed Jun 28, 2021
Repository owner locked and limited conversation to collaborators Jun 28, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants