This repository has been archived by the owner on Jun 27, 2019. It is now read-only.

Encapsulate LibGit2Sharp and expose our own abstraction layer for the bits we need. #3

Closed
dazinator opened this issue Apr 20, 2015 · 32 comments

@dazinator
Member

I think we are all agreed this should be a long-term goal for the core (correct me if I'm wrong).

I'm not sure whether it's better to do this for version 1. If we leave it until version 2, we will potentially have to refactor all the tools multiple times (once for version 1, and again for version 2).

Perhaps it would be better to have this in the initial release, as it will no doubt form a major part of the public API for GitCore. What do you think?

@dazinator
Member Author

And here is a related gist on this topic, explaining some work done in GitReleaseNotes to encapsulate LibGit2Sharp and expose a GitRepositoryContext:

https://gist.github.com/dazinator/cef95296b27e2f965716

@dazinator
Member Author

@JakeGinnivan - I'm struggling to understand the following class. Is this fairly useful helper functionality that could be placed in the core, do you think? I'd be grateful if you could help me understand its usage a bit more:

https://github.com/GitTools/GitReleaseNotes/blob/master/src/GitReleaseNotes/Git/GitRepositoryInfoFinder.cs

@GeertvanHorrik GeertvanHorrik modified the milestones: 1.0.0, 2.0.0 Apr 21, 2015
@dazinator
Member Author

I've gone slightly off track and started to create a fluent API for git, over here: https://github.com/dazinator/FluentGit

I've got some simple tests passing.

My aim is to go through the tools and look at the git functions they need, i.e. cloning, fetching, examining tags, remotes and branches, committing, etc.

I'll be looking at surfacing those functions in FluentGit over time.

@dazinator
Member Author

  • Ultimately, once it's mature, I can just contribute the code files directly to GitTools.Core - or we can embed it via Costura so it's available for all of the tools to use.

@JakeGinnivan
Contributor

Cool, you should share that with the libgit2sharp guys.

What I was thinking of is more of a facade over git for our uses in GitVersion. For example:

interface IGitCoreFacade
{
    IEnumerable<BranchInfo> GetBranch(GitCoreContext context);
    IEnumerable<CommitInfo> GetCommits(GitCoreContext context);
    IEnumerable<TagInfo> GetTags(GitCoreContext context);
    string GetTextFileContent(GitCoreContext context);
}

Then we would have a switching layer which would pick the best implementation. To start with there would just be LibGit2SharpFacade : IGitCoreFacade, but I would want a GitHubGitRepoFacade : IGitCoreFacade which is used when the context is set to a GitHub repo.

Then when the build server checks out a single commit, we can just hit the GitHub APIs to get the branch/tag/commit info we need for GitVersion to run properly, and we don't need to do the dynamic repo stuff at all.

The dynamic repo stuff would then be moved behind the LibGit2SharpFacade, because it is an implementation detail of 'the repo GitVersion is running in is not a complete repo, so we are missing information'.

@JakeGinnivan
Contributor

We might even be able to use git ls-remote instead of doing the dynamic repo stuff: https://www.kernel.org/pub/software/scm/git/docs/git-ls-remote.html

Assuming libgit2sharp supports it.
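For illustration, ls-remote lists a remote's refs (HEAD, branches, tags) without transferring any blobs at all. A minimal offline sketch, using a throwaway local repository as the stand-in remote (all names here are made up for the demo):

```shell
# Create a throwaway repo to act as the "remote" (keeps this offline).
git init --quiet demo-remote
git -C demo-remote -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit" --quiet
git -C demo-remote tag v1.0.0

# ls-remote prints "<sha>  <ref>" pairs for HEAD, branches, and tags,
# without cloning or fetching any objects.
git ls-remote demo-remote
```

For what it's worth, I believe LibGit2Sharp exposes something similar (a Repository.ListRemoteReferences method), though that's worth verifying against the version being used.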

@GeertvanHorrik
Contributor

Where would you like to see the documentation?

@JakeGinnivan
Contributor

@GeertvanHorrik for this interface? In the GitTools.Core repo

But basically the idea is to look at the git usage across GitVersion, GitReleaseNotes and GitHubReleaseManager, and create a facade which hides us from the details of what needs to happen.

@GeertvanHorrik
Contributor

I mean: @dazinator requested docs on how the dynamic repo stuff works. Is it worth creating a GitTools-docs, or will we also put that on readme.io?

@JakeGinnivan
Contributor

So the facade would not have concepts of cloning, current state/working directory etc. It should just be a way to query metadata and maybe fetch the content of a blob (so we can fetch config files or something).

@JakeGinnivan
Contributor

@GeertvanHorrik ah ok. Just throw it in the wiki for the moment.

@GeertvanHorrik
Contributor

@dazinator
Member Author

@GeertvanHorrik - thank you for that.

There are some additional things happening, though, that I'd be really grateful for any insight on, as my git knowledge isn't the best (I'm working on that!).

After the bare clone is made, and before the Fetch is done, we:

  1. Ensure there is exactly one remote (throw an exception for any other number).
  2. Ensure the remote has fetch refspecs; if there are none, we add some in.
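For context on that second step: a fetch refspec maps remote refs to local ones and controls what `git fetch` retrieves. A short offline sketch (the repo name and URL are placeholders):

```shell
# Throwaway repo with a dummy remote; the URL is a placeholder.
git init --quiet refspec-demo
git -C refspec-demo remote add origin https://example.com/repo.git

# "git remote add" configures the standard wildcard refspec, so fetching
# this remote updates every remote-tracking branch:
git -C refspec-demo config --get-all remote.origin.fetch
# → +refs/heads/*:refs/remotes/origin/*

# A repo set up some other way may lack that entry, which is why the
# code adds refspecs back in before fetching. Extra refspecs can also
# be added, e.g. to fetch GitHub pull-request heads:
git -C refspec-demo config --add remote.origin.fetch \
    "+refs/pull/*/head:refs/remotes/origin/pull/*/head"
```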

After the Fetch is done, but before switching to the correct branch:

  1. CreateMissingLocalBranchesFromRemoteTrackingOnes() - isn't this what an ordinary clone would do?
  2. If HEAD is detached, find the branch with the commit SHA for HEAD. If one is found, Checkout() that branch; if none is found, CreateFakeBranchPointingAtThePullRequestTip(); if multiple are found, throw an exception.

I am wondering if this logic needs to be revisited before it's pulled into the core verbatim. Some questions are:

  1. The check for exactly one remote - could this be fragile with cached, long-living dynamic repos that are just re-fetched over time? I.e. could we ever expect more than one remote to appear, and do we need to handle that, e.g. by specifying which remote to fetch from, or fetching from all of them?
  2. Refspecs - does bare cloning not add in refspecs? I'll read up on these, although I am guessing they cause fetching from that remote to update the local branches?
  3. We seem to be doing a bare clone, but then additional work to patch up the repo ready for use. Are we still certain that doing a bare clone plus this work is giving us gains compared to doing an ordinary clone?
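On point 3, the difference between the two clone types is essentially the working tree: a bare clone has the object database and refs but no checked-out files. A quick offline comparison (directory names are made up for the demo):

```shell
# Local source repo to clone from.
git init --quiet src
git -C src -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit" --quiet

git clone --quiet --bare src bare-copy   # .git internals only, no working tree
git clone --quiet src full-copy          # working tree plus a .git directory

ls bare-copy   # HEAD, config, objects/, refs/, ... (the .git internals)
ls full-copy   # the checked-out files; git data lives in full-copy/.git
```

So the saving of a bare clone is the checkout (disk I/O for the working files); the history transferred is the same, which is presumably why the extra branch/refspec patch-up work is needed either way.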

@dazinator
Member Author

@JakeGinnivan

but I would want a GitHubGitRepoFacade : IGitCoreFacade which is used when the context is set to a GitHub repo.

I had no idea GitHub offered an API for querying repo information - doh! That gives an alternative to using libgit2sharp, but obviously only for GitHub repos. So your facade idea now makes complete sense to me. However, what's the advantage you see in having multiple APIs to implement (libgit2sharp, GitHub API, etc.) over cloning and always using one API (i.e. libgit2sharp)?

@GeertvanHorrik
Contributor

A 1/2) Ensuring the remote is just to get the right remote (PR / branch / whatever). The refspecs are required for pull requests.

B 1) A bare clone only clones the default branch; we need to ensure all the branches we need are available.
B 2) Not sure, I think @JakeGinnivan wrote that part.

C 1/2) I don't think we check for one remote. I think we search for the right remote.
C 3) Yes, a bare clone simply means: don't create the files (so only create a .git). We still need to do additional work to switch branches, etc.

@dazinator
Member Author

@GeertvanHorrik - this is the ensure-remote code I took from GitReleaseNotes. It seems to throw if there is not exactly one remote:

private Remote EnsureSingleRemoteIsDefined()
{
    var remotes = Repository.Network.Remotes;
    var howMany = remotes.Count();

    if (howMany == 1)
    {
        var remote = remotes.Single();

        Log.InfoFormat("One remote found ({0} -> '{1}').", remote.Name, remote.Url);
        return remote;
    }

    var message = string.Format("{0} remote(s) have been detected. When being run on a TeamCity agent, the Git repository is expected to bear one (and no more than one) remote.", howMany);
    throw new InvalidOperationException(message);
}

@dazinator
Member Author

  • and that would no doubt have been copied from GitVersion

@GeertvanHorrik
Contributor

Definitely not my code because it involves TeamCity, I don't use that ;-)

@dazinator
Member Author

Haha ok - sounds like I shouldn't use this code verbatim then :)

@gep13
Member

gep13 commented Apr 23, 2015

I had some hand in this here as well:

https://github.com/ParticularLabs/GitVersion/pull/262/files
https://github.com/ParticularLabs/GitVersion/pull/286/files

This was based on conversations with @nulltoken. These issues were found when implementing support for AppVeyor, but it was decided that they could be brought into the core GitHelpers as well.

@JakeGinnivan
Contributor

I had no idea GitHub offered an API for querying repo information - doh! That gives an alternative to using libgit2sharp, but obviously only for GitHub repos. So your facade idea now makes complete sense to me. However, what's the advantage you see in having multiple APIs to implement (libgit2sharp, GitHub API, etc.) over cloning and always using one API (i.e. libgit2sharp)?

Performance - we don't need the blobs from the repository, we just need the metadata (refs etc). If we optimise for GitHub, then when using GitHub it will be super fast, with no cloning/disk IO. This will be especially handy for things like https://github.com/JakeGinnivan/GitReleaseNotes/tree/master/src/GitReleaseNotes.Website, which we could host on Azure or somewhere, and saving the bandwidth costs for GitHub could make a difference if they get popular.

@JakeGinnivan
Contributor

If HEAD is detached, find the branch with the commit SHA for HEAD. If one is found, Checkout() that branch; if none is found, CreateFakeBranchPointingAtThePullRequestTip(); if multiple are found, throw an exception.

When building pull requests, the PR ref is a hidden ref: it is not under refs/heads/, which means that if you check out a pull request ref (refs/pull/*/merge or refs/pull/*/head) you will have a detached HEAD. We then create a fake branch called refs/heads/pull/<pr number>/merge (or /head) so the rest of the logic can work the same as if it was not a pull request.
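That sequence can be sketched offline, with a local repo simulating the hidden ref GitHub keeps for a pull request (the directory names are made up; the ref names follow GitHub's refs/pull/&lt;number&gt;/head convention):

```shell
# "server" plays the role of the GitHub-hosted repo.
git init --quiet server
git -C server -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit" --quiet
# Simulate the hidden ref GitHub maintains for PR #1:
git -C server update-ref refs/pull/1/head HEAD

git clone --quiet "$PWD/server" local   # clone does NOT bring refs/pull/* across
git -C local fetch --quiet origin "+refs/pull/1/head:refs/pull/1/head"

# Checking out a ref outside refs/heads/ detaches HEAD...
git -C local checkout --quiet refs/pull/1/head
git -C local symbolic-ref -q HEAD || echo "HEAD is detached"

# ...so a "fake" local branch is created, giving the rest of the
# tooling an ordinary branch to work with:
git -C local checkout --quiet -b pull/1/head
```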

@JakeGinnivan
Contributor

The check for exactly one remote - could this be fragile with cached, long-living dynamic repos that are just re-fetched over time? I.e. could we ever expect more than one remote to appear, and do we need to handle that, e.g. by specifying which remote to fetch from, or fetching from all of them?

A dynamic repo is created per repo URL, so if that changes it will be a fresh clone.

@JakeGinnivan
Contributor

Ensure there is exactly one remote (throw an exception for any other number).

I think this is only checked if we are running on a build server. Running locally in a real repo should not do any fetching or these checks.

@dazinator
Member Author

Performance - we don't need the blobs from the repository, we just need the metadata (refs etc). If we optimise for GitHub, then when using GitHub it will be super fast, with no cloning/disk IO.

Nice. But just to play devil's advocate: wouldn't performing ongoing fetches on a repo (and then querying the repo locally) be faster / more efficient than querying across the network, i.e. hitting the GitHub API to get stuff every time? I guess my point is, isn't there some kind of trade-off between a local cache with incremental updates (i.e. pay a high price initially to populate the cache) versus continually accessing the same info across the network (info that could have been held locally)?
I don't know how much of that applies in this instance, just chucking it out there.

@dazinator
Member Author

Performance - we don't need the blobs from the repository, we just need the metadata (refs etc). If we optimise for GitHub, then when using GitHub it will be super fast, with no cloning/disk IO.

  • Until you want a blob - but I guess then you are just paying for the blobs you actually need, rather than having them all locally even if you don't end up needing them.

@JakeGinnivan
Contributor

Any blobs should be accessible via the working directory rather than the git repo.

And we have to fetch the metadata every time, yes, but the payloads should be small. With a local repo we still have to do a fetch, which will include the changes to blobs and will likely be much larger than any API request.

@dazinator
Member Author

Focusing back on remote repo support, in the core.

When building pull requests it is a hidden ref, it is not under refs/heads/ which means that if you checkout a pull request ref refs/pull/*/merge or refs/pull/*/head then you will have a detached head.

@JakeGinnivan thanks for pointing me in the right direction there, I've educated myself a bit more fully on that topic now :)

I'm now thinking about how the core will fetch optimally - i.e only the branches / pr branches that each tool needs.

For example, currently we ensure the remote has refspecs for all branches - but then doing a fetch could be wasteful if the tool in question actually only requires a single branch to be fetched - so perhaps this doesn't belong in the core.

I'd hazard that each tool will need total control over the refspecs used, and fetch operations.

I'm thinking it would be better to:

  1. Perform the clone (if necessary, i.e. the repo doesn't already exist).
  2. Leave the default refspecs alone.
  3. Do a fetch based on explicit refspecs passed in from the particular tool.
  4. Hand the repo (now updated to the latest) back to the tool to use normally.

Something along those lines..
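In plain git terms, that flow might look roughly like this, offline, with a local directory standing in for the remote URL (directory names and the branch refspec are illustrative):

```shell
# Stand-in for the remote repository.
git init --quiet upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "initial commit" --quiet
BRANCH=$(git -C upstream symbolic-ref --short HEAD)  # default branch name

CACHE=cache-repo
# 1. Clone only if the cached dynamic repo doesn't already exist.
[ -d "$CACHE" ] || git clone --quiet --bare upstream "$CACHE"

# 2./3. Leave the configured refspecs alone; fetch exactly the refs the
#       calling tool asked for, via an explicit refspec argument.
git -C "$CACHE" fetch --quiet "$PWD/upstream" \
    "+refs/heads/$BRANCH:refs/heads/$BRANCH"

# 4. The repo is now up to date and can be handed back to the tool.
git -C "$CACHE" for-each-ref refs/heads
```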

@JakeGinnivan
Contributor

The issue is that, say for GitVersion, if you are on develop we need to fetch all tags and master. With feature branches we do inheritance of config based on the source branch. Without all the branches, I'm not sure we could do that?

If we could work remotely with refspecs without cloning/fetching that may work?

@GeertvanHorrik
Contributor

Sorry for my absence, had a few deadlines. Going to pick up GT again this week. What's the status of this feature? Is it manageable for the next release?

@dazinator
Member Author

I've not had any further time to look at this either - apologies.

I was going to go through each of the tools and list out how it uses libgit2sharp - to ensure that we can expose a compatible interface in the core - as per @JakeGinnivan's idea:

interface IGitCoreFacade
{
    IEnumerable<BranchInfo> GetBranch(GitCoreContext context);
    IEnumerable<CommitInfo> GetCommits(GitCoreContext context);
    IEnumerable<TagInfo> GetTags(GitCoreContext context);
    string GetTextFileContent(GitCoreContext context);
}

I don't think I'll be able to start on this for a while, so if someone wants to try to make some headway with it then feel free..

@asbjornu
Member

As GitVersion stopped using GitTools.Core in GitTools/GitVersion#1581, this issue is unfortunately not relevant anymore and the whole repository will be archived. Please try the latest build of GitVersion and see if it solves whatever problem you had when you submitted this issue. If the problem persists, please submit a new issue or PR against the GitVersion repository.
