Git support: what is supported? #89

Closed
gabrielrussoc opened this issue Nov 17, 2021 · 6 comments

Comments

@gabrielrussoc

gabrielrussoc commented Nov 17, 2021

On #88, @fanzeyi mentions "The git support we have is very basic", but it's not clear to me how one would actually use eden with Git.

I understand eden was born as a virtual FS for mercurial and the names are all changing, which causes a bit of confusion when reading the docs. For clarity, in this issue I will call:

  • hg: The client. The binary people use to run things like hg status and hg commit.
  • eden: The virtual filesystem, which mounts a FUSE on machines able to download objects on demand.
  • mononoke: The server. The one serving and storing objects.

For someone who has a vanilla Git repo (i.e. official git client, remote on GitHub), I considered the following:

eden only

Would it be possible to only use the virtual filesystem?
I believe the answer is no, but I figured I'd ask it explicitly.
According to https://github.com/facebookexperimental/eden/blob/main/eden/fs/docs/Process_State.md, EdenFS exposes a Thrift API, which the vanilla git client does not speak.

In fact I tried to eden clone a git repo and the best I could do was (~/universe is my local vanilla git clone):

$ edenfsctl clone ~/universe ~/universe-eden
$ edenfsctl info
{
  "mount": "/Users/gabriel.russo/universe-eden",
  "scm_type": "git",
  "snapshot": "878577d1fac768214acc7db6fe5c1072e8d24090",
  "state_dir": "/Users/gabriel.russo/local/.eden/clients/universe-eden-3",
  "mount_protocol": "fuse"
}
$ mount
...
eden on /Users/gabriel.russo/universe-eden (macfuse_eden, synchronous)

What does this mean exactly? I can use all the files from the eden checkout just fine, but there is no .git folder there.

My best guess here is that things like git status would be even slower because it wouldn't be leveraging any of eden's knowledge, just adding overhead to every filesystem call.

I couldn't figure out whether eden can read from github or any generic git server. Even without the hg client, is eden able to download objects on demand from a git server?

Edit: maybe one can use watchman in front of the vanilla git client? But not sure if this would cover all cases. git status would probably work, but other commands might still end up downloading the whole thing.

eden + hg

In order to actually make use of eden's APIs, the answer is probably to use hg. I haven't looked at the code or tried it, but my guess is that hg status would be smart enough to make a Thrift call to eden and figure out what has changed much faster.

I'm also assuming that everything would just work for a git repository, and both hg and eden know what to do when they see a git object model. This is probably not true, or is it? To what extent?

eden + hg + mononoke

Lastly, if eden is not able to download objects from a generic git server, I figured one must also use mononoke. Is mononoke capable of hosting git, or must one migrate the entire object model?

@fanzeyi
Member

fanzeyi commented Nov 17, 2021

I can use all the files from the eden checkout just fine, but there is no .git folder there.

Yep, that's one issue, and that's probably the biggest challenge here in terms of git support in EdenFS.

EdenFS knows how to interpret a git bare repository and serve its content as a virtual filesystem. It will also let you make changes in it, and that's pretty much all.

The issue is that git does not know how to deal with EdenFS (as you described, commands other than git status might just force EdenFS to download everything). We have made our own changes to Mercurial so that it works well with EdenFS and knows what to do when serving status, making a commit, and more. Git does not have that.

maybe one can use watchman in front of the vanilla git client?

Yep! You can absolutely do that and it will make things much faster. Watchman is also maintained by us and Watchman knows how to speed up things by talking directly with EdenFS without downloading files, and AFAIK, git knows how to talk with Watchman.

Speaking of which, is Databricks using Watchman? If not, I would highly recommend looking into that first. It would significantly improve git performance and is easier to set up.
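
For anyone who wants to try the Watchman route on a regular git clone (one that has a .git directory, unlike the eden mount above), here is a minimal sketch. It assumes Watchman is installed and that the repository contains the fsmonitor-watchman.sample hook that Git normally copies into new repositories; the function name enable_watchman_fsmonitor and the path in the usage line are purely illustrative.

```shell
#!/bin/sh
# Sketch: point Git's core.fsmonitor setting at the sample Watchman hook.
# Assumption: <hooks dir>/fsmonitor-watchman.sample exists in the repository.
enable_watchman_fsmonitor() (
    # The function body runs in a subshell, so the cd does not leak out.
    cd "$1" || exit 1
    hooks_dir="$(git rev-parse --git-path hooks)" || exit 1
    sample="$hooks_dir/fsmonitor-watchman.sample"
    if [ ! -f "$sample" ]; then
        echo "sample hook not found: $sample" >&2
        exit 1
    fi
    # Activate the hook by copying it without the .sample suffix.
    cp "$sample" "$hooks_dir/fsmonitor-watchman" &&
    chmod +x "$hooks_dir/fsmonitor-watchman" &&
    # Tell Git to ask this hook which files changed.
    git config core.fsmonitor "$hooks_dir/fsmonitor-watchman"
)
```

Usage would look like `enable_watchman_fsmonitor ~/universe`, after which git status in that clone queries Watchman through the hook instead of scanning the whole tree.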

eden + hg

It might work, but there's probably gonna be some work converting your git monorepo into Mercurial, and you will very likely need to bring Mononoke in as well, or do some work to decouple them. However, I think there was some discussion about having some sort of Git support in Mercurial. @quark-zju is the expert on this topic.

Is mononoke capable of hosting git or one must migrate the entire object model?

AFAIK, not currently but it might be in the future.

@quark-zju
Contributor

There is some interest in supporting git repos transparently in "hg". The rough idea is:

  • Related storage layers know how to read and write git commits, trees, and files (blobs) in the git object store. e.g. "hg commit" writes git objects directly to the git store that can be read by a git client.
  • Do not support the git index (staging area). Use working copy (vanilla, with watchman, or with "eden") implementation from "hg".
  • Do not support git wire-protocol directly. Delegate related commands to git commands.

In this setup, you get the UX (ex. revset), working copy (ex. status), but not the laziness of the backend storage. Objects are local. So there is no need for Mononoke protocols, and no need to convert the repo format.

However, we don't have a timeline at this point.

@quark-zju
Contributor

There is some recent progress on git support. Currently the plan is to support 2 modes:

  1. EdenSCM working copy + git (full) bare repo + git exchange. status, log, rebase, commit are handled by EdenSCM. push, pull, clone are handled by vanilla git. This is useful if you want to try the UX on small-ish git repos.
  2. EdenSCM working copy + EdenSCM HTTP protocols. Unlike 1, clone, pull, push use EdenSCM protocols instead of git. This allows the client repo to be lazy. Therefore it is more interesting for larger centralized repos.

1 is being worked on and might be somewhat usable in weeks. 2 will take more time and we hope it can be somewhat usable in a few months.
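
To make the division of labor in mode 1 concrete, here is a toy sketch of which tool handles which command, taken directly from the list above. The function handler_for is hypothetical and only models the split; it is not how EdenSCM dispatches commands internally.

```shell
#!/bin/sh
# Toy model of mode 1: which component handles a given command.
# handler_for is an illustrative name, not part of EdenSCM or git.
handler_for() {
    case "$1" in
        status|log|rebase|commit) echo "edenscm" ;;  # working copy and UX
        push|pull|clone)          echo "git" ;;      # exchange via vanilla git
        *)                        echo "unknown" ;;  # not covered in the list above
    esac
}
```

For example, `handler_for rebase` prints edenscm and `handler_for push` prints git.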

@gabrielrussoc
Author

@quark-zju any news related to git support?

@quark-zju
Contributor

quark-zju commented Apr 12, 2022

@gabrielrussoc Mode 1 mentioned above has been implemented. It is marked experimental but does have users internally. You can use clone --git or init --git to get started. It provides the UX but not the scalability.
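
A rough sketch of what getting started with those two flags could look like. Only the clone --git and init --git flags come from the comment above; the binary name (assumed to be hg, overridable through a hypothetical SCM variable), the positional arguments, the example URL, and the DRY_RUN helper are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: SCM, DRY_RUN, run, clone_git_repo and init_git_repo are
# illustrative names. The client binary is assumed to be "hg".

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clone an existing git repo (assumed arguments: source URL, destination).
clone_git_repo() {
    run "${SCM:-hg}" clone --git "$1" "$2"
}

# Create a new repo backed by git storage (assumed argument: destination).
init_git_repo() {
    run "${SCM:-hg}" init --git "$1"
}
```

Running `DRY_RUN=1 clone_git_repo https://example.com/repo.git universe-hg` only prints the command it would run, which is a cheap way to check the invocation before pointing it at a real repository.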

Mode 2 is being planned. The goal is scalability (lazy commit graph + lazy filesystem). It might be implemented in a way that does not require running a special git server.

@yancouto
Contributor

Git support news: the Sapling release documents what is supported a lot better.
