This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

Proposal: Make CommitObjects(), BlobObjects() and similar methods stop returning unreachable objects, due to performance problems #1023

Open
ajnavarro opened this issue Nov 16, 2018 · 2 comments

Comments

@ajnavarro
Contributor

Right now, the object iterator methods TreeObjects(), CommitObjects(), BlobObjects() and TagObjects() also return unreachable objects.

This is because we iterate over the whole packfile, reading each object header and returning only the objects whose type matches the requested one. On really large repositories this makes the iterators very slow, because we constantly have to skip objects of other types.
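To make the cost concrete, here is a minimal sketch of the scan-and-filter behavior described above. It does not use go-git: ObjectType, packEntry and scanByType are hypothetical names, and a packfile is modelled as a plain slice of headers. What it illustrates is that the work grows with the total number of objects in the pack, not with the number of objects of the requested type, and that reachability is never checked.

```go
package main

import "fmt"

// ObjectType is a hypothetical stand-in for a git object type
// (commit, tree, blob, tag); it is not go-git's plumbing.ObjectType.
type ObjectType int

const (
	CommitObject ObjectType = iota
	TreeObject
	BlobObject
	TagObject
)

// packEntry models one object in a packfile: only its header (the
// type) is needed to decide whether to return it.
type packEntry struct {
	Hash string
	Type ObjectType
}

// scanByType mimics the current behavior: every entry's header is
// read, and every entry of the wanted type is returned, whether or
// not any reference can reach it. headersRead counts the work done.
func scanByType(pack []packEntry, want ObjectType) (matches []string, headersRead int) {
	for _, e := range pack {
		headersRead++ // each header is read, even for objects we skip
		if e.Type == want {
			matches = append(matches, e.Hash)
		}
	}
	return matches, headersRead
}

func main() {
	pack := []packEntry{
		{"c1", CommitObject}, {"t1", TreeObject}, {"b1", BlobObject},
		{"b2", BlobObject}, {"b3", BlobObject}, {"c2", CommitObject},
	}
	commits, read := scanByType(pack, CommitObject)
	fmt.Println(commits, read) // [c1 c2] 6
}
```

Two commits are returned, but all six headers had to be read to find them.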

To avoid that, we should only return objects that are reachable from a git reference.

As an example, the CommitObjects() iterator should emulate the git log --all command: collect the hashes that all references point to, and walk the commit history starting from them.
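A rough sketch of that walk, under simplifying assumptions: commits are modelled by a hypothetical Commit struct that only holds parent hashes, and references by a plain name-to-hash map (real code would also have to resolve symbolic references such as HEAD and peel annotated tags). This is not go-git code. Unlike git log, which orders commits by date, the sketch sorts hashes only to make its output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Commit is a hypothetical, minimal commit model: a hash plus the
// hashes of its parents. It is not go-git's object.Commit.
type Commit struct {
	Hash    string
	Parents []string
}

// logAll emulates `git log --all`: it seeds the walk with the target
// hash of every reference, then follows parent links, visiting each
// commit once. Commits that no reference can reach are never visited.
func logAll(refs map[string]string, commits map[string]Commit) []string {
	seen := make(map[string]bool)
	var stack []string
	for _, h := range refs {
		stack = append(stack, h)
	}
	var out []string
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[h] {
			continue
		}
		seen[h] = true
		c, ok := commits[h]
		if !ok {
			continue // dangling reference or missing object: skip it
		}
		out = append(out, h)
		stack = append(stack, c.Parents...)
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	commits := map[string]Commit{
		"a": {Hash: "a"},
		"b": {Hash: "b", Parents: []string{"a"}},
		"c": {Hash: "c", Parents: []string{"b"}},
		"d": {Hash: "d", Parents: []string{"a"}},
		"x": {Hash: "x"}, // unreachable: no reference points at it
	}
	refs := map[string]string{
		"refs/heads/master":  "c",
		"refs/heads/feature": "d",
	}
	fmt.Println(logAll(refs, commits)) // [a b c d]
}
```

The unreachable commit "x" is never touched, and the cost of the walk depends only on how much history the references can reach.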

WDYT @smola @mcuadros ?

@smola
Collaborator

smola commented Nov 16, 2018

@ajnavarro I think the current go-git behavior is correct, even if it could be better documented. Even if it weren't, this change is not backwards compatible and cannot be made in v4.

But the concern is legitimate for a lot of use cases, including gitbase (src-d/gitbase#617). That's why I would like to propose solving the problem as follows:

  1. Change gitbase table implementations to do the right thing for its use case.
  2. Provide required APIs in go-git when they are missing.
    2.1. For example, for commits, we have already discussed adding an option (All bool) to LogOptions to mimic git log --all, which gitbase could use for the commits table.
    2.2. We could also provide something like revlist.Objects, but restricted to specific object types and returning iterators.

@smola
Collaborator

smola commented Nov 19, 2018

I've filed the git.Log one here: #1024
I'm still thinking about the right API for other object types.
