It seems there are some leftover trace lines.
Trace lines removed, sorry about that.
Are there any performance tests?
Other than what I've run manually, no.
We could add Go benchmarks, or a simpler benchmark such as a script in a contrib folder?
Maybe I made a mistake in my benchmark (#54), but I don't see any speed-up on some big repos. Edit:
@sapk Thanks for writing the tests, I'll look into it.
If you try a repo with many files (e.g. https://github.com/ethantkoenig/manyfiles), you should see a noticeable speed-up (old: 50 seconds, new: 5 seconds on my laptop). I'll look into making my implementation comparable to the old one for non-pathological cases.
Branch updated from d78c76e to ab9e103.
@sapk I've found a faster implementation that improves all 5 benchmark tests (see PR description). Let me know what you think.
Branch updated from ab9e103 to f5f6f1b.
tree_entry_test.go (outdated)
panic(err)
}
entries.Sort()
b.Run(benchmark.name, func(b *testing.B) {
This needs Go 1.7. I think we support Go 1.6 or later for Gitea, but I can't find where that is documented ^^
I think this repo already doesn't support Go 1.6, since we use the "context" package in command.go.
I was thinking this change wasn't merged because of that.
tree_entry_test.go (outdated)
{url: "https://github.com/torvalds/linux.git", name: "linux"},
}
for _, benchmark := range benchmarks {
var commit *Commit
You should still wrap the initialization in b.StopTimer() ... b.StartTimer().
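A minimal sketch of that pattern, assuming a hypothetical `buildEntries` helper in place of the real repository setup (cloning and reading the tree are not shown). The timer is already running when a benchmark function is called, so it is paused around the setup and resumed before the measured loop.

```go
package main

import (
	"sort"
	"strconv"
	"testing"
)

// buildEntries is a hypothetical stand-in for expensive setup that should not
// be measured, such as cloning a repository and listing a tree's entries.
func buildEntries(n int) []string {
	entries := make([]string, n)
	for i := range entries {
		// Descending numeric suffixes, so the input starts out unsorted.
		entries[i] = "file" + strconv.Itoa(n-i)
	}
	return entries
}

// BenchmarkSortEntries measures only the loop below: the timer is stopped
// while the setup runs and restarted right before the timed work.
func BenchmarkSortEntries(b *testing.B) {
	b.StopTimer()
	entries := buildEntries(1000)
	b.StartTimer()

	for i := 0; i < b.N; i++ {
		// The copy is timed too, but it is cheap next to the sort.
		work := append([]string(nil), entries...)
		sort.Strings(work)
	}
}
```

Inside a `_test.go` file this would run with `go test -bench=SortEntries`. When all setup happens once before the loop, calling `b.ResetTimer()` after it is a common, equivalent alternative.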
Updated
tree_entry_test.go (outdated)
var commit *Commit
var entries Entries
if repoPath, err := setupGitRepo(benchmark.url, benchmark.name); err != nil {
panic(err)
I find it cleaner to use b.Fatal(err).
Updated
You must have forgotten to commit? No change was made here.
Whoops 🤦♂️, see #55
tree_entry_test.go (outdated)
const benchmarkReposDir = "benchmark_repos/"

func setupGitRepo(url string, name string) (string, error) {
repoDir := filepath.Join(benchmarkReposDir, name)
I prefer to ask for a temp dir via ioutil.TempDir, but that might not be everyone's choice.
I didn't want to have to re-clone the repositories each time the tests run; it took me several minutes to clone them all.
OK, so maybe use /benchmark/repos and exclude /benchmark, in case we later add other benchmarks with other resources.
Updated
Overall: I haven't reviewed the implementation yet, but the benchmarks show a small improvement that would be good to have. I left some comments on the benchmark part. I would have preferred that you cherry-pick my commits/PR and make your changes on top, but it's fine for this PR ^^.
Branch updated from f5f6f1b to 76cec74.
@sapk Rebased to include your commits, sorry about that.
@ethantkoenig Couldn't this use some concurrency, like the old code, to speed it up further? I know that more goroutines aren't always the solution, and maybe you've already tested that.
@sapk I don't think this implementation is as amenable to concurrency as the old one. As far as I can tell, the main benefit from using concurrency in the old implementation was to make calls to
@ethantkoenig Looking at it, it could use some concurrency, but with a different strategy. For example, a controller could start goroutines that run git log over the history and send the matching paths back over a channel. The controller stops starting new goroutines once all paths are resolved. This could be done later. This PR gives an improvement, so LGTM (except for the panic() calls, which need to be changed). This PR could also have a test to catch regressions, but I don't have a good idea of how to do that, so maybe another time. ^^
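The controller idea could look roughly like the following sketch. It is not part of this PR: an in-memory `[]Commit` history (newest first) stands in for `git log --name-only` output, and `segSize` and `parallel` are hypothetical knobs. Each goroutine scans one slice of history and reports matches over a channel; the controller stops starting new scans once every path has a commit, but still drains the scans already running and keeps the match from the newest (lowest-index) slice, so the result matches a sequential scan.

```go
package main

// Commit is a hypothetical, already-parsed `git log --name-only` record.
type Commit struct {
	ID    string
	Paths []string
}

// segmentResult reports, for one slice of history, the newest commit that
// touched each wanted path.
type segmentResult struct {
	index int
	found map[string]string
}

// scanSegment plays the role of one goroutine running git log over a slice
// of history (newest commit first).
func scanSegment(index int, commits []Commit, wanted map[string]bool, out chan<- segmentResult) {
	found := make(map[string]string)
	for _, c := range commits {
		for _, p := range c.Paths {
			if _, seen := found[p]; wanted[p] && !seen {
				found[p] = c.ID
			}
		}
	}
	out <- segmentResult{index: index, found: found}
}

// LatestCommitsConcurrent keeps up to `parallel` scans in flight and stops
// launching new ones once every wanted path has a commit.
func LatestCommitsConcurrent(history []Commit, paths []string, segSize, parallel int) map[string]string {
	if segSize < 1 {
		segSize = 1
	}
	if parallel < 1 {
		parallel = 1
	}
	wanted := make(map[string]bool, len(paths))
	for _, p := range paths {
		wanted[p] = true // only read by the goroutines, never written again
	}
	type hit struct {
		seg int
		id  string
	}
	best := make(map[string]hit) // touched only by the controller
	out := make(chan segmentResult)
	next, inFlight := 0, 0
	hasMore := func() bool { return next*segSize < len(history) }
	launch := func() {
		start := next * segSize
		end := start + segSize
		if end > len(history) {
			end = len(history)
		}
		go scanSegment(next, history[start:end], wanted, out)
		next++
		inFlight++
	}

	for inFlight < parallel && hasMore() {
		launch()
	}
	for inFlight > 0 { // drain every launched scan, so no goroutine leaks
		r := <-out
		inFlight--
		for p, id := range r.found {
			if h, ok := best[p]; !ok || r.index < h.seg {
				best[p] = hit{seg: r.index, id: id}
			}
		}
		if len(best) < len(wanted) && hasMore() {
			launch() // some path is still unresolved
		}
	}
	result := make(map[string]string, len(best))
	for p, h := range best {
		result[p] = h.id
	}
	return result
}
```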
LGTM
A faster implementation of `GetCommitsInfo`. Addresses go-gitea/gitea#491 and go-gitea/gitea#502.

The previous implementation made a call to `git log` for each entry, each of which required scanning through the commit history. This new implementation instead makes a single call to `git log`. This is faster because it scans the commit history only once.

BENCHMARK RESULTS:
Shoutout to @sapk for the benchmark tests he wrote (#54), which I have stolen here.
Old implementation:
New implementation:
IMPLEMENTATION DETAILS:
Gets the 16 latest commits affecting the relevant entries (`git log --name-only -16 HEAD -- <treePath>`). The output of this command lists which files were affected by each commit. The implementation scans this list of commits and stops once a commit has been found for each entry.

If it gets through the first batch of 16 commits without finding every entry, it fetches the next 32 commits (`git log --name-only -32 <last-commit-from-first-batch>^ -- entry1 entry2 ...`), doubling the number of commits each time. This ensures that in the common case only a small number (16) of commits has to be read, while not making too many calls to `git log` when the search has to go further back in the commit history.

Finally, if you are looking for 32 or fewer entries, each entry is listed explicitly in the `git log` command (`git log --name-only -- entry1 entry2 ...`) to make the search more targeted.