Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Create an example API. #3869

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

matts1
Copy link
Collaborator

@matts1 matts1 commented Jun 13, 2024

This PR is meant to trigger a discussion on what the specific design of the API should look like.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@matts1 matts1 mentioned this pull request Jun 13, 2024
4 tasks
@matts1
Copy link
Collaborator Author

matts1 commented Jun 13, 2024

I believe that the best command to start with would probably be jj log, since the APIs for that are probably ~80% of what's needed for APIs (I can run `jj describe -m "..." myself pretty easily, for example, but determining what the commit message currently is so I can modify it would be hard).

Copy link
Collaborator

@PhilipMetzger PhilipMetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments and questions, as I only can represent a maintainers view on this. I think that our embedders probably are better reviewers when this gets into a workable state.

Comment on lines +62 to +64
// Open question: Is a template a first-class citizen of jj, or is it specific
// to jj-cli?
string rendered = 7;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo, template rendering should be a first-class citizen of the jj client/CLI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling @martinvonz's comment into here to discuss further in a single thread.

My gut feeling is that templating should not be in the API.

Seems the two of you are disagreeing here, so I'll list what I think the consequences are of each option:

  • If a template is a first-class citizen of jj, then you should be able to call ListRevision(template="...") and have it return Revision(rendered = "...")
  • This means that any extension should be able to use your templates / template aliases that you defined
    • This means that the jj daemon will need to read your config to be aware of your templates
      • I personally don't have a problem with this - I think that the daemon is definitely going to need to read your config regardless, since I think it'll be required for Revset
  • If a template is not a first-class citizen of jj, then it must be done purely from the jj cli.
    • Potential performance overhead of this, since jj-cli will need to always fill all fields just in case the template references them (unless you can do a significant amount of pre-processing to determine what properties of the template it's gonna reference).
    • Templates will be useable only for jj-cli (eg. extensions cannot use them).

After thinking about it in more detail, I think that adding it as a first-class citizen is probably my preferred choice. I can imagine that, for example, an extension such as jj piper upload might want to add a template alias so that a user could override it from printing:

Exported CLs:
  http://cl/643830362 qtoyoqum d5973671 cl/643830362 | foo

And instead print:

Exported CLs:
  http://cl/643830362 foo

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine that, for example, an extension such as jj piper upload might want to add a template alias so that a user could override it from printing:

I thought the conclusion from the design doc was that the RPC is basically for adding new UIs while the Rust API is usually required for adding new commands. Did I misunderstand/misremember?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like some miscommunication went on. I suspect that my jj piper upload example may have been a confusing choice of command, since it already exists, and wasn't implemented as an extension (since we don't support extensions, and even if we did, we'd require modifying jj for the extra backend support).

My assumption was that:

  • Commands could be added via an extension (FR: JJ extension suppport #3575)
  • An extension would be a binary external to jj (and thus not neccesarily written in rust) that does whatever it wants under the hood via the jj API.

For example, if we chose to implement gerrit support as an extension rather than natively in jj, I assumed that it might look like (very pseudocode):

#!/usr/bin/env python3

import jj

args = argparse.ArgumentParser(...).parse_args()
client = jj.client()
transaction = client.StartTransaction()
rev = client.GetRevision(jj.RevisionRequest(transaction=transaction, revset=args.revset))
rev = client.UpdateDescription(transaction=transaction, revision=rev, description=client.description + "Change-ID: ...")
client.EndTransaction(transaction)
subcommand.run(["git", "push", "--remote", "gerrit", f"{rev.commit_id}:refs/for/main"]
print(f"Sent, {rev} to gerrit for review")

My assumption In extension would be a binary external to jj, and that it wouldn't have to be implemented in rust. I assumed that I'd be able to write an extension in python, for example, and then use the RPCs to implement it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused because I had assumed that it was mutually exclusive with making templates a first-class citizen of the jj client. Could you explain what you're hoping for in more detail

My assumption was that either templates would format a Commit object inside the daemon, and be treated as a first-class citizen:

let revision = client.GetRevisions(ListRevisionsRequest(template="..."))
let rendered = template.rendered

Or templates would be purely for the use of jj-cli:

let revision = client.GetRevisions(ListRevisionsRequest());
let template = Template::new("...");
let rendered = template.render(revision);

And that in this second scenario, they would be unable to be a first-class citizen of the jj-client, since a python extension, for example, would be unable to use the rust-native implementation of Template

In the second scenario, we definitely aren't exposing the actual storage type, but in the first scenario, templates work the same way they currently do, and I don't think that would involve exposing the actual storage type either (the end-user would just receive a string).

I think API clients should have access to the templates, but as Martin says this may have an perf overhead and it also isn't clear on how to expose them correctly. I think a lightweight variant of them should be sensible for now. The reason for that is, like I've written in the doc to discourage stdout parsing.

After thinking about it in more detail, I think that adding it as a first-class citizen is probably my preferred choice. I can imagine that, for example, an extension such as jj piper upload might want to add a template alias so that a user could override it from printing:

Exported CLs:
  http://cl/643830362 qtoyoqum d5973671 cl/643830362 | foo

And instead print:

Exported CLs:
  http://cl/643830362 foo

My opinion on that is that we need general commit metadata anyways, so piper upload can set something like "key" => codereview "value" => cl/NNNNN which we then can standardize across forges and can make available in templates. This should require no extension work.

RE: conclusion conversation, in my mind it isn't done yet so I'd be nice have a conclusion there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have an perf overhead, and It also isn't clear on how to expose them correctly

My thoughts was that you would execute a ListRevisions(revision_mask=RevisionMask(template="..."), and then each Revision returned would have the template filled in.

If you executed ListRevisions(revision_mask=RevisionMask(...)) (no template), then we wouldn't calculate the templates. This should have no performance overhead, since templates are only calculated when we explicitly request for them. In fact, this is likely actually cheaper, since if you calculated templates client side, you'd need to calculate all fields just in case they're needed for templates.

My opinion on that is that we need general commit metadata anyways, so piper upload can set something like "key" => codereview "value" => cl/NNNNN which we then can standardize across forges and can make available in templates. This should require no extension work.

I think you may have misunderstood what my point was here. I was simply saying that an arbitrary extension may want access to templates to change how commits are printed. I was simply suggesting that the user might want to change the template from seperate(" ", codereview_link(), change_id(), commit_id(), branches(), "|", description()) to seperate(" ", codereview_link(), description()), and that this wasn't easily possible if the extension doesn't have access to templates.

// to jj-cli?
string rendered = 7;

map<string, File> files = 8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should have a specific RepoPath type, as could model the whole rename tracing API a bit better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would a RepoPath type look like? I think it's a good point that we should track renames, but I'm not convinced it should be in the key of the map, since that means the user would need to write something like revision.files[RepoPath(current_path="foo", parent_path="bar")], which seems like a really awkward API.

How would you feel about tracking renames with additional metadata in the File object instead?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a RepoPath type like the one in jj-lib is useful just for type safety. I don't see the connection to renames, however (i.e., I think it would be about equally useful whether or not we support renames).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically any server, remote or local would benefit from a restricted type with some guarantees. The current design could mean that you can escape the repo/workspace which isn't something we want to support anyway.


package jj_api.objects;

message Revset {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be named BuiltinRevset or so for the first class types and then just Revset for the user input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of that idea. The user input is just a string, so doesn't need its own type, and I think the term BuiltinRevset just adds additional confusion to a user of the API.

@martinvonz WDYT?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow revsets strings here, then we presumably allow nesting revset strings inside other revset nodes (e.g. intersection=[change_id="xyz", revset="main.."]). That seems reasonable and useful for a UI.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of that idea. The user input is just a string, so doesn't need its own type, and I think the term BuiltinRevset just adds additional confusion to a user of the API.

Then how do custom downstreams implement builtins? Imagine if Google implements a commit_visible_to(user, group) for Piper, how would you access it in the implementation? Anyways that was my thought.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption was that the custom downstream would just use Revset(revset="commit_visible_to(user, group)"). I'm also not sure I understand how your proposal would solve that problem.

Alternatively, we could just add a type for a custom function call.

message Arg {
  oneof {
    Revset revset = 1;
    string literal = 2;
  }
}

message FunctionCall {
  string function_name = 2;
  repeated Arg args;

message Revset {
  oneof kind {
    // existing options
    FunctionCall function_call;
  }
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption was that the custom downstream would just use Revset(revset="commit_visible_to(user, group)").

Yeah, that definitely works.

I'm also not sure I understand how your proposal would solve that problem.

It would allow downstreams to add it to their "BuiltinRevset" enumeration, although this could also be problematic if it leads to tag overlaps.

Alternatively, we could just add a type for a custom function call.

I like that solution, but it is very low-level and I'm not sure if that aligns with the stated goal.

Comment on lines +18 to +26
// 3) Remove this from the API and put it in jj-cli entirely. Templates would,
// instead of formatting a Commit object, format a Revision proto.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally don't like exposing the actual storage type, so a wrapper is my preferred option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, you're saying that you'd prefer that a Template should format a revision proto rather than the Commit object?

I'm a little confused because I had assumed that it was mutually exclusive with making templates a first-class citizen of the jj client. Could you explain what you're hoping for in more detail

My assumption was that either templates would format a Commit object inside the daemon, and be treated as a first-class citizen:

let revision = client.GetRevisions(ListRevisionsRequest(template="..."))
let rendered = template.rendered

Or templates would be purely for the use of jj-cli:

let revision = client.GetRevisions(ListRevisionsRequest());
let template = Template::new("...");
let rendered = template.render(revision);

And that in this second scenario, they would be unable to be a first-class citizen of the jj-client, since a python extension, for example, would be unable to use the rust-native implementation of Template

In the second scenario, we definitely aren't exposing the actual storage type, but in the first scenario, templates work the same way they currently do, and I don't think that would involve exposing the actual storage type either (the end-user would just receive a string).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, you're saying that you'd prefer that a Template should format a revision proto rather than the Commit object?

I'm saying that the API server should have a distinct definition of what is stored, as not everyone will want a full commit. See #2988 for my thoughts on that, because I think Google wants to run jj api like the ClangD implementation.

For the other part I'm gonna answer upthread.

Comment on lines 11 to 14
// This is generally the first rpc you should be calling.
// It deviates from the norm by returning an object rather than a response so
// that you can pass that in to options in later requests.
rpc Snapshot(jj_api.rpc.SnapshotRequest) returns (jj_api.objects.RepoState) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought of the option of making this implicit? Otherwise a Begin/EndTransaction pair probably would be better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this, but I think you raise a good point. In general, I believe that while a user interface should have reasonable implicit defaults, APIs should be explicit. I think there's a decent middle-ground though, where we allow a RevisionsRequest to do it all in one RPC, but it will throw an error if they don't explicitly say whether they want to snapshot or not.

I do like your Begin/EndTransaction idea for read-write operations (which I haven't really considered yet), so I've fleshed out how I think that might look.

My design is, roughly:

  • You can create a transaction from a RepoState.
  • Write operations will require a transaction ID to be set
  • Readonly operations will require either a transaction ID or a RepoState, since they don't require a transaction to have begun.

Copy link
Owner

@martinvonz martinvonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting late, but here are a few comments for now

Comment on lines +62 to +63
// Open question: Is a template a first-class citizen of jj, or is it specific
// to jj-cli?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting question! My gut feeling is that templating should not be in the API.

However, if we return all the data any caller might need, it might get unnecessarily slow. For example, it's currently expensive to check if merge commits are empty (because we check that by merging the parents' trees). So we may want to be able to indicate which data we want back. There may be some value in being able to make that conditional like you can in a template (e.g. only care whether a commit is empty if it's mutable). That suggests something like templates, but I'm not sure it will be needed. Maybe callers that care enough can send multiple requests instead (e.g. send an additional request for each mutable commit). Even if we provide some support for conditionals like that, I don't think we should return a rendered string. The the caller might have to parse it. It seems better to return it in some structured form.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if we return all the data any caller might need, it might get unnecessarily slow.

I'm open to adding fields to the RevisionMask proto.

There may be some value in being able to make that conditional like you can in a template (e.g. only care whether a commit is empty if it's mutable).

I'm not a huge fan of this, this seems unnecessarily complex, and I can't see a use case for it. I think that, for example, if the user wanted to retrieve the diff, and thus only wanted to know if the current commit was empty, but needed the parent commit, then they can simply do a GetRevision on the current revision, and add a mask for the parents so that the parents and the revision you actually care about return different sets of data.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of the current CLI as an example user of the API, but I think our conclusion from the design doc was that the current CLI needs more control than we expect to provide via the RPC API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sounds like we both understood the conclusion differently here. I understood our conclusion to be that the current CLI currently needs more control than we can expect to provide via the RPC API, but the conclusion I drew was that we would have to do significant refactorings to the current CLI in order to change that. I think, however, that your conclusion is probably better for the short-term, and for now we shouldn't try to migrate jj-cli over.

That being said, I think we should design this with the assumption that if jj-cli needs more control than we expect to provide, then other users may need that as well. Thus, I think we should design this API with the intent that if jj-cli requires more than this API can provide right now, we should attempt to design it so that we can at least add those features to the API in the future so we could migrate.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're ever going to get to a point where the RPC API is basically as flexible as the Rust API, I would recommend looking at e.g. jj parallelize and rewrite that using an hypothetical RPC API (i.e. no need to actually implement the RPC API). I'm not yet convinced that it's feasible to get the RPC API to that point, but I'm also not convinced that it's not. Using some moderately complex existing command like jj parallelize seems like a good way of figuring whether it is.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parts I were thinking of as hard were the transaction management and the callback in transform_descendants(). I think we previously said the API would be stateless. I see that that has now changed with the transaction methods. Callbacks might be implementable using streaming RPCs. Maybe we'll implement some such state management using RefCell and it won't be much of a problem. I'm sure it's all solvable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I originally planned for the API to be stateless, because I assumed that the transaction would take a lock and didn't want an extension crashing or something similar blocking something else from running.

I'm honestly on the fence and wanted to experiment with both options.

What I like about stateful RPCs

  • You can do more complex things that might not neccesarily map to a single jj command
  • You can perform atomic operations containing lots of things at once
  • Potential performance benefits when doing complex things. For example, if we had to perform a bunch of rebases, and then update a bunch of commit descriptions, we could do it in a single transaction, allowing us to skip creating the intermediate commits between the two.

What I like about a 1:1 mapping of RPC <-> jj command

  • Users will generally think in terms of jj commands, so it's much easier to use
  • Users don't have to implement complex commands like jj parallelize themselves

I think it's mostly a question of power vs simplicity. Low-level operations as commands give you a lot of power, but are far more difficult to use. I think, to be honest, looking at it now, we need those high-level operations (if a user had to implement parallelize themselves, they're just gonna use the jj cli instead of the jj api).

We could potentially implement both sets, with the high-level commands written in terms of those low-level commands, and I think that's probably the best option, but that will introduce a maintainence burden (not sure how much).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly on the fence and wanted to experiment with both options.

What I like about stateful RPCs

  • You can do more complex things that might not neccesarily map to a single jj command
  • You can perform atomic operations containing lots of things at once
  • Potential performance benefits when doing complex things. For example, if we had to perform a bunch of rebases, and then update a bunch of commit descriptions, we could do it in a single transaction, allowing us to skip creating the intermediate commits between the two.

What I like about a 1:1 mapping of RPC <-> jj command

  • Users will generally think in terms of jj commands, so it's much easier to use
  • Users don't have to implement complex commands like jj parallelize themselves

I think it's mostly a question of power vs simplicity. Low-level operations as commands give you a lot of power, but are far more difficult to use. I think, to be honest, looking at it now, we need those high-level operations (if a user had to implement parallelize themselves, they're just gonna use the jj cli instead of the jj api).

If we take the approach of both providing a low-level API and a CLI like API, they should be separate services with different guarantees to not have such a large API surface on a single service.

We could potentially implement both sets, with the high-level commands written in terms of those low-level commands, and I think that's probably the best option, but that will introduce a maintainence burden (not sure how much).

I agree but think that you still should put your sketch of the high level API up, even if it gets superseded by the low-level API for this PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should focus on the high-level API for now. The level I'm thinking of would be roughly matching the CLI. The purpose is to make it easier to implement UIs and IDE extensions.

We can add lower-level functionality later IMO.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really care about the design as long as it's usable and I have no clue of any internals here
but as a developer for other projects I can say that what I want from an api right now is just something that outputs information from e.g. jj diff, jj st, jj log and allows to use basic commands
this is required to be able to write integrations which are more or less future proof (language independent once that gets implemented) and you don't need to parse ugly text

Comment on lines +7 to +9
// Should ideally only be used when the user types in their own revset.
// Otherwise, construct it using this proto.
string revset = 1;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this resolve revset aliases? If it does, it implies the server will have to know how to read the user's config.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wrote the below before your reply above)
Finding the user's config is not a problem since the daemon is running as the user. The question is more whether the API should take the user's config into account. Since the API is meant (in part) to make it easier to write UIs, it seems like it should know about the user's config. (The lib crate, however, is meant to be able to run in a server and host multiple users, so it doesn't read the user's config.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should. If an extension ever wants to take input from a user, I think that the user would likely have the assumption that the revset will work correctly with their aliases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a reasonable middle ground where we could take in a boolean use_user_config, and default it to False. I think that there should be a way for the APIs to use user config, but I think your point is quite reasonable, and so maybe we should just not do so by default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added GlobalOptions (similar to jj-cli's global options), which now has the field use_user_config and extra_configs.

@matts1 matts1 force-pushed the api-protos branch 3 times, most recently from cf0e303 to f816a64 Compare June 17, 2024 01:39
This PR is meant to trigger a discussion on what the specific structure of the API should look like.

This crate contains only the minimal set of code for an external library to start using it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants