revset: add `fork_point` function #4795

bnjmnt4n · 2024-11-07T22:01:19Z

This can be used to find the best common ancestors of a revset with an arbitrary number of commits, which cannot be done currently.

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

algmyr · 2024-11-08T12:56:30Z

I don't think GCA is the term used for this, it's lowest (or sometimes last) common ancestor. I would probably be fine with lca, but maybe others might not be as used to the abbreviations. Expanding it fully feels quite wordy though. Maybe common_ancestors(revset) is precise/descriptive enough?

yuja

Perhaps, it's the "greatest" in terms of generation numbers? I feel both gca and lca are equally cryptic.

common_ancestor(), best_ancestor(), closest_ancestor()?

lib/src/default_index/revset_engine.rs

lib/src/revset.rs

bnjmnt4n · 2024-11-08T13:59:55Z

Git calls it best common ancestor (via git merge-base):

git merge-base finds the best common ancestor(s) between two commits to use in a three-way merge. One common ancestor is better than another common ancestor if the latter is an ancestor of the former. A common ancestor that does not have any better common ancestor is a best common ancestor, i.e. a merge base. Note that there can be more than one merge base for a pair of commits.

bnjmnt4n · 2024-11-12T05:19:08Z

Does anyone else have any thoughts on the naming? Also, should the name be in singular or plural form (common_ancestor vs common_ancestors)? I went with plural because the function can return multiple results if there are multiple common ancestors.

yuja · 2024-11-12T07:10:44Z

If we don't say "best", "greatest", etc., a singular form would suggest that it won't include all common ancestors. So I prefer singular common_ancestor().

bnjmnt4n · 2024-11-12T07:17:42Z

Now that I think about it, I agree that common_ancestors seems to indicate ::a & ::b & ::c & .... What about best_common_ancestors?

yuja · 2024-11-12T07:43:23Z

What about best_common_ancestors?

That seems good, but isn't it too wordy? I'm okay with either common_ancestor() (singular), best_common_ancestor[s]() (singular/plural). The return value is usually singular.

emilazy · 2024-11-12T14:59:39Z

Would ancestor be too much? I feel the fact that it’s common is implicit to the fact that you’re passing multiple arguments. Also, I might be misreading, but ancestor() should be root() (it looks like it might currently be none()?).

bnjmnt4n · 2024-11-12T15:30:14Z

Would ancestor be too much? I feel the fact that it’s common is implicit to the fact that you’re passing multiple arguments.

ancestor is way too close to ancestors, which obtains the ancestors of all commits in the given revset. Additionally, both functions technically accept only a single revset argument, but the revset can resolve to multiple commits instead.

Also, I might be misreading, but ancestor() should be root() (it looks like it might currently be none()?).

This is an interesting point, I'm not familiar enough with graph theory (?) to say if the best common ancestors of an empty set of nodes is the root of the graph.

algmyr · 2024-11-12T16:55:27Z

I think the best common ancestor of a single commit should be itself, and the best common ancestor of an empty set ought to be none (it doesn't exist)

emilazy · 2024-11-12T16:58:08Z

Hmm, you might actually be right. We want the property that ancestor(As | Bs) is ancestor(ancestor(As) | ancestor(Bs)), right? Therefore, yes, ancestor(A) (where A is a single commit) should be A, and ancestor(none()) should be the identity element, and I guess none() is that identity?

martinvonz · 2024-11-12T16:59:04Z

Something like fork_point() might also work. I'm not sure I like it better, but figured I'd share it at least.

joyously · 2024-11-12T17:05:51Z

How can a commit be its own ancestor?

martinvonz · 2024-11-12T17:19:40Z

How can a commit be its own ancestor?

We simply define it such that a commits ancestor includes itself. We do that because it's often useful in practice. Mercurial does the same.

emilazy · 2024-11-12T17:31:59Z

How can a commit be its own ancestor?

In the same way that you can define the ancestors of a person as:

A person is their own ancestor.
The parents of a person’s ancestor are also their ancestors.

which is consistent but gives more answers than a definition that excludes the first reflexive condition. It’s the same reason that the sum of one number is itself and the product of one number is itself, even though there’s no addition or multiplication going on there; similarly why we define the sum of no numbers to be 0, and the product of no numbers to be 1. (In mathematical terms, we’re trying to make this operation form a monoid, because that results in good properties.)

emilazy · 2024-11-12T17:34:11Z

Or, to motivate it directly in terms of a defining property: the greatest common ancestor of a set of commits is the commit that all elements of the set are descendents of, such that there is no descendent of the ancestor that also satisfies that property. For a single commit, that’s the commit itself. (This still leaves the choice of the ancestor of an empty set somewhat arbitrary, though it already makes it clear that there can be no single unique choice for many commit graphs, which motivates none() as a result.)

joyously · 2024-11-12T23:33:12Z

I think if it includes itself, it shouldn't be called ancestor. Commits don't appear out of thin air, do they? Wouldn't there at least be root()?

algmyr · 2024-11-12T23:41:52Z

I think if it includes itself, it shouldn't be called ancestor.

In graph theory, when talking about things like lowest common ancestor (LCA) you consider nodes to be descendants of themselves.

emilazy · 2024-11-12T23:52:38Z

It’s also the same case as the greatest common divisor of a single number being itself (which Python’s math.gcd implements).

joyously · 2024-11-12T23:53:16Z

In graph theory, when talking about things like lowest common ancestor (LCA) you consider nodes to be descendants of themselves.

If this is simply for the name of a function in code, call it whatever and explain with a comment.
If it's something a user will see, don't confuse us with "you are your own ancestor", even if another VCS did it.

emilazy · 2024-11-12T23:55:04Z

It’s the standard graph theory terminology, not a VCS‐specific quirk, and the mathematically obvious definition. People anyway aren’t likely to call ancestor on single commits except as a result of computed revsets, where the single‐commit case will provide correct results that are consistent with its behaviour on revsets of other sizes.

algmyr · 2024-11-13T00:15:54Z

Hmm, you might actually be right. We want the property that ancestor(As | Bs) is ancestor(ancestor(As) | ancestor(Bs)), right? Therefore, yes, ancestor(A) (where A is a single commit) should be A, and ancestor(none()) should be the identity element, and I guess none() is that identity?

This is mostly a tangent, but the empty case is a bit iffy and you end up having to be careful in how you define what a common ancestor is (to avoid issues of vacuous truths).

E.g. for a set of nodes S and we want to define the set of all common ancestors (ignoring the lowest/best part), it's tempting to try

"common_ancestors(S) are nodes with all nodes in S as descendants"

But this behaves poorly if S is empty. It's vacuously true that any node is an ancestor of all nodes in the empty set!

If you make the (sensible) constraint that common ancestors need to be an ancestors of some node in S, then you should end up with nicer behavior that gives you the none result.

emilazy · 2024-11-13T01:32:18Z

Some more motivation of the base cases here:

If we desire that ancestor(as | bs) equals ancestor(ancestor(as) | ancestor(bs)), as it does for the >1 commit case, then we immediately get ancestor(as) = ancestor(as | none()) (because as | none() = as) = ancestor(ancestor(as) | ancestor(none())), which can only (as far as I can see) be equal to ancestor(as | none()) if ancestor(none()) = none() and ancestor(ancestor(as)) = ancestor(as), which gives us both the zero and one commit cases (assuming I understand correctly that ancestor() should never return a revset of more than one commit – although I wonder if that’s actually a desirable property, as I think it differs from heads(::a & ::b)?).

More pragmatically, a common practical use of this operation – Git’s git merge-base – is to answer the question “what’s the most ‘recent’ commit I can base my work on if I need it to be able to cleanly merge into these branches without pulling in additional commits only present in some of them?”. For the single‐commit case, the answer is that you can base that work on the commit itself, since there’s nothing “missing” from the non‐existent other branches. (Although, in that case I think it doesn’t give us a clear answer for ancestor(none()), since I think root() would be a valid answer there; it’s other properties that rule that case out.)

Base cases are hard. It wasn’t that long ago that people found the idea of zero very alien, and rejected formulas like n + 0 = n, n × 0 = 0, or the idea that you could sum up no integers at all and get 0, or take the product of no integers at all and get 1. But once you work out the rules it’s clear that there’s only one correct answer, and that skipping defining it because it seems weird and unintuitive at first reduces the power and convenience of your operations and forces you to introduce special cases elsewhere in the system. (Corollary of base cases being hard: I might have the reasoning wrong here! But I’m pretty confident that returning the input for a single commit is the correct behaviour, that any other concrete choice would make things worse and break things you would reasonably want to do, and that declining to define it would make things more awkward and less useful in general.)

yuja · 2024-11-13T07:27:51Z

assuming I understand correctly that ancestor() should never return a revset of more than one commit

There's an edge case that ancestor() is resolved to multiple revisions, so I'm afraid that we cannot define ancestor(*xs) based on ancestor(a | b | ...).
a1ced5b#diff-6bd5d4c0fd459e2c1cb3f1e0d880d6f23c4e3b11863adc76db3931e4b94d194dR2524-R2542

Something like fork_point() might also work.

Sounds also good. It would reflect user's intent.

martinvonz · 2024-11-13T18:28:29Z

Something like fork_point() might also work.

Sounds also good. It would reflect user's intent.

Actually, maybe even just forks() since we have merges() (not merge_point()). "Forks" and "merges" seem to work well as analogies to roads forking and merging.

EDIT: But an important difference between them is that merges() doesn't take an input set, so maybe forks() (and fork_points()?) would make it sound like forks() returns all commits with more than one child.

yuja · 2024-11-14T00:47:15Z

EDIT: But an important difference between them is that merges() doesn't take an input set, so maybe forks() (and fork_points()?) would make it sound like forks() returns all commits with more than one child.

Yes, maybe we can add merge_point(xs) that finds the best common descendants, but I don't know if this name makes sense.

martinvonz · 2024-11-14T17:09:31Z

There doesn't seem to be a clear consensus and no one seems to feel strongly. To avoid this just getting stalled, I'm fine with @bnjmnt4n making the final decision on the name. Sounds good? Or do other feel like there's a name that's clearly best (and or names that they really don't want)?

bnjmnt4n · 2024-11-15T12:47:03Z

I like the suggested fork_point, which is relatively short and doesn't have any immediately ambiguous/conflicting meanings. I'm not too sure about the preferred definition of fork_point in docs/revsets.md: would anyone have any suggestions for the definition?

algmyr · 2024-11-15T14:22:23Z

Maybe it makes sense to give the example of the two commit case (which I think is heads(::x & ::y)) and just say fork_points generalizes this to an arbitrary number of commits?

yuja · 2024-11-15T14:37:01Z

We have "examples" section in the doc. fork_point() should probably be added there.

This can be used to find the fork point (best common ancestors) of a revset with an arbitrary number of commits, which cannot be expressed currently in the revset language.

martinvonz · 2024-11-15T18:30:14Z

docs/revsets.md

+    * `fork_point(D|B)` ⇒ `{A}`
+    * `fork_point(B|C)` ⇒ `{A}`
+    * `fork_point(A)` ⇒ `{A}`
+    * `fork_point()` ⇒ `{}`


nit: maybe fork_point(none()) so users don't think it can be called with no arguments

bnjmnt4n force-pushed the bnjmnt4n/push-xykukoyzpukl branch from 441f09c to 6afd5b2 Compare November 8, 2024 07:36

yuja reviewed Nov 8, 2024

View reviewed changes

lib/src/default_index/revset_engine.rs Show resolved Hide resolved

lib/src/revset.rs Outdated Show resolved Hide resolved

bnjmnt4n force-pushed the bnjmnt4n/push-xykukoyzpukl branch from 6afd5b2 to 9264792 Compare November 8, 2024 14:55

bnjmnt4n changed the title ~~revset: add gca function~~ revset: add common_ancestors function Nov 8, 2024

bnjmnt4n force-pushed the bnjmnt4n/push-xykukoyzpukl branch 2 times, most recently from c713fe9 to a1ced5b Compare November 12, 2024 05:17

bnjmnt4n force-pushed the bnjmnt4n/push-xykukoyzpukl branch from a1ced5b to 3fee11e Compare November 15, 2024 12:45

bnjmnt4n changed the title ~~revset: add common_ancestors function~~ revset: add fork_point function Nov 15, 2024

revset: add fork_point function

4c5e8e5

This can be used to find the fork point (best common ancestors) of a revset with an arbitrary number of commits, which cannot be expressed currently in the revset language.

bnjmnt4n force-pushed the bnjmnt4n/push-xykukoyzpukl branch from 3fee11e to 4c5e8e5 Compare November 15, 2024 17:36

martinvonz approved these changes Nov 15, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revset: add `fork_point` function #4795

revset: add `fork_point` function #4795

bnjmnt4n commented Nov 7, 2024 •

edited

Loading

algmyr commented Nov 8, 2024 •

edited

Loading

yuja left a comment

bnjmnt4n commented Nov 8, 2024

bnjmnt4n commented Nov 12, 2024

yuja commented Nov 12, 2024

bnjmnt4n commented Nov 12, 2024

yuja commented Nov 12, 2024

emilazy commented Nov 12, 2024 •

edited

Loading

bnjmnt4n commented Nov 12, 2024 •

edited

Loading

algmyr commented Nov 12, 2024

emilazy commented Nov 12, 2024 •

edited

Loading

martinvonz commented Nov 12, 2024

joyously commented Nov 12, 2024

martinvonz commented Nov 12, 2024

emilazy commented Nov 12, 2024

emilazy commented Nov 12, 2024

joyously commented Nov 12, 2024

algmyr commented Nov 12, 2024

emilazy commented Nov 12, 2024

joyously commented Nov 12, 2024

emilazy commented Nov 12, 2024

algmyr commented Nov 13, 2024 •

edited

Loading

emilazy commented Nov 13, 2024 •

edited

Loading

yuja commented Nov 13, 2024

martinvonz commented Nov 13, 2024 •

edited

Loading

yuja commented Nov 14, 2024

martinvonz commented Nov 14, 2024

bnjmnt4n commented Nov 15, 2024

algmyr commented Nov 15, 2024

yuja commented Nov 15, 2024

martinvonz Nov 15, 2024

revset: add fork_point function #4795

Are you sure you want to change the base?

revset: add fork_point function #4795

Conversation

bnjmnt4n commented Nov 7, 2024 • edited Loading

Checklist

algmyr commented Nov 8, 2024 • edited Loading

yuja left a comment

Choose a reason for hiding this comment

bnjmnt4n commented Nov 8, 2024

bnjmnt4n commented Nov 12, 2024

yuja commented Nov 12, 2024

bnjmnt4n commented Nov 12, 2024

yuja commented Nov 12, 2024

emilazy commented Nov 12, 2024 • edited Loading

bnjmnt4n commented Nov 12, 2024 • edited Loading

algmyr commented Nov 12, 2024

emilazy commented Nov 12, 2024 • edited Loading

martinvonz commented Nov 12, 2024

joyously commented Nov 12, 2024

martinvonz commented Nov 12, 2024

emilazy commented Nov 12, 2024

emilazy commented Nov 12, 2024

joyously commented Nov 12, 2024

algmyr commented Nov 12, 2024

emilazy commented Nov 12, 2024

joyously commented Nov 12, 2024

emilazy commented Nov 12, 2024

algmyr commented Nov 13, 2024 • edited Loading

emilazy commented Nov 13, 2024 • edited Loading

yuja commented Nov 13, 2024

martinvonz commented Nov 13, 2024 • edited Loading

yuja commented Nov 14, 2024

martinvonz commented Nov 14, 2024

bnjmnt4n commented Nov 15, 2024

algmyr commented Nov 15, 2024

yuja commented Nov 15, 2024

martinvonz Nov 15, 2024

Choose a reason for hiding this comment

revset: add `fork_point` function #4795

revset: add `fork_point` function #4795

bnjmnt4n commented Nov 7, 2024 •

edited

Loading

algmyr commented Nov 8, 2024 •

edited

Loading

emilazy commented Nov 12, 2024 •

edited

Loading

bnjmnt4n commented Nov 12, 2024 •

edited

Loading

emilazy commented Nov 12, 2024 •

edited

Loading

algmyr commented Nov 13, 2024 •

edited

Loading

emilazy commented Nov 13, 2024 •

edited

Loading

martinvonz commented Nov 13, 2024 •

edited

Loading