Performance #34

Open
boyter opened this issue Mar 5, 2018 · 8 comments

boyter commented Mar 5, 2018

Performance could be a lot better through the use of fan-out, i.e. spreading the work across multiple goroutines. It might also be possible to speed up the matching by using byte comparisons rather than string comparisons. Both need investigating, as the tool can be quite slow at times.
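
For illustration, a minimal sketch of the fan-out idea, assuming the work can be split per file. The names `fanOut`, `countHits` and `result` are hypothetical and not taken from lc itself; `countHits` is only a stand-in for the real matching step.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// result pairs an input name with the number of keywords found in it.
type result struct {
	name string
	hits int
}

// countHits is a stand-in for the real per-file matching work.
func countHits(content string, keywords []string) int {
	hits := 0
	for _, k := range keywords {
		if strings.Contains(content, k) {
			hits++
		}
	}
	return hits
}

// fanOut spreads the per-file work over a fixed number of workers
// (workers must be at least 1) and collects the results into a map.
func fanOut(files map[string]string, keywords []string, workers int) map[string]int {
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				// The files map is only read here, so sharing it is safe.
				results <- result{name, countHits(files[name], keywords)}
			}
		}()
	}

	// Feed the workers, then signal that no more jobs are coming.
	go func() {
		for name := range files {
			jobs <- name
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[string]int, len(files))
	for r := range results {
		out[r.name] = r.hits
	}
	return out
}

func main() {
	files := map[string]string{
		"LICENSE": "Permission is hereby granted, free of charge",
		"main.go": "package main",
	}
	out := fanOut(files, []string{"Permission", "free of charge"}, 4)
	fmt.Println(out["LICENSE"], out["main.go"]) // prints "2 0"
}
```

Whether this helps depends on how expensive the per-file work is relative to everything else in the run, so it would need to be measured rather than assumed.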

boyter commented Mar 9, 2018

Benchmark #1: lc -pbl .git,vendor,licenses -f tabular .
  Time (mean ± σ):      3.425 s ±  0.053 s    [User: 3.617 s, System: 0.122 s]
  Range (min … max):    3.375 s …  3.521 s

Example of performance as it currently stands.

boyter commented Mar 10, 2018

Initial tests did not look good. Simply adding fan-out to the existing work produced no speed improvement.

Checking against a single file shows that the initial pass alone takes ~600ms, so there is a lot to be gained there as well.

Probably need to consider changing how the whole pipeline works to achieve a speedup here.

boyter commented Mar 23, 2018

(pprof) top10
Showing nodes accounting for 22200ms, 60.77% of 36530ms total
Dropped 205 nodes (cum <= 182.65ms)
Showing top 10 nodes out of 79
      flat  flat%   sum%        cum   cum%
    6930ms 18.97% 18.97%     6930ms 18.97%  runtime.indexbytebody
    5490ms 15.03% 34.00%    16370ms 44.81%  strings.Index
    1630ms  4.46% 38.46%     1630ms  4.46%  runtime.memeqbody
    1500ms  4.11% 42.57%     2800ms  7.66%  runtime.slicerunetostring
    1280ms  3.50% 46.07%     1280ms  3.50%  runtime.encoderune
    1170ms  3.20% 49.27%     1560ms  4.27%  runtime.mapiternext
    1150ms  3.15% 52.42%     2240ms  6.13%  runtime.mapaccess2_faststr
    1070ms  2.93% 55.35%     1070ms  2.93%  runtime.aeshashbody
    1050ms  2.87% 58.23%     1920ms  5.26%  runtime.scanobject
     930ms  2.55% 60.77%      930ms  2.55%  runtime.memequal

Most of the time is spent in contains and index comparisons: strings.Index alone accounts for 44.81% of cumulative time. It might be faster to move over to byte comparisons.
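
A minimal sketch of what moving to byte comparisons could look like, assuming file content is kept as []byte and the keywords are converted once up front. The names `toByteKeywords` and `matchBytes` are hypothetical and not taken from lc. Any gain would likely come from avoiding repeated conversions (the profile shows runtime.slicerunetostring) rather than from the search itself, and that assumption needs checking with a benchmark.

```go
package main

import (
	"bytes"
	"fmt"
)

// toByteKeywords converts the keywords once, up front, so no
// string conversion is needed inside the matching loop.
func toByteKeywords(keywords []string) [][]byte {
	out := make([][]byte, len(keywords))
	for i, k := range keywords {
		out[i] = []byte(k)
	}
	return out
}

// matchBytes returns the indexes of the keywords that occur in content.
func matchBytes(content []byte, keywords [][]byte) []int {
	var found []int
	for i, k := range keywords {
		if bytes.Contains(content, k) {
			found = append(found, i)
		}
	}
	return found
}

func main() {
	content := []byte("Permission is hereby granted, free of charge, to any person")
	keywords := toByteKeywords([]string{"hereby granted", "GNU General", "free of charge"})
	fmt.Println(matchBytes(content, keywords)) // prints [0 2]
}
```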

boyter commented May 1, 2018

https://blog.sourced.tech/post/gld/

Called out publicly... oh, it's on now. Time to double down on performance.

boyter added the enhancement (New feature or request) label May 1, 2018
boyter self-assigned this May 1, 2018
boyter added the "its on" label May 1, 2018

boyter commented May 2, 2018

Although one of their goals is to

"Favor false positives over false negatives (target data mining instead of compliance)."

That is something I did not want to do.

boyter commented Jan 22, 2021

http://web.archive.org/web/20180904032703/https://blog.sourced.tech/post/gld/

Updated link, because the original post went away.

boyter commented Jan 27, 2021

Although it seems it lives on, somewhat, here: https://github.com/go-enry/go-license-detector
