lc thinks celt license is BSD-2-Clause-NetBSD because of copyright #38
Comments
Of course I noticed this just after pushing out 1.3.1. Could you supply the example with the copyright removed, please? I will then be able to replicate it more closely; I'm having issues doing so right now.
Eh, no worries. I am testing lc in a huge repository, so there will be a stream of bug reports.
Cool, I'll have a look at it this afternoon then. If you find more issues by then I can probably group them all together.
OK, so this one and #39, #40, #41 are all the same issue. I'm going to close the others as duplicates. The issue is that after identifying the licenses using keywords, a percentage match is taken using the vector space. Because that's fuzzy, it pops the wrong license to the top. I'm either going to remove this portion or have it combine that percentage with the keyword percentage. I'll run a few tests to see which one works better; it might take a while. I should say, I have a fix that resolves this one at least. I have committed it under the branch https://github.com/boyter/lc/compare/Issue40. Note that tests are still failing there, but it might improve accuracy for you in the short term. Would you be willing to trial it out when done?
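To illustrate the second option (combining the vector space percentage with the keyword percentage), here is a minimal sketch. It is not lc's actual code: `termFreq`, `cosine`, `keywordRatio`, `combinedScore` and the `weight` parameter are all hypothetical names, and a plain bag-of-words cosine similarity stands in for whatever the vector space implementation really does.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// termFreq builds a bag-of-words vector: lowercased, split on whitespace.
func termFreq(text string) map[string]float64 {
	tf := make(map[string]float64)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		tf[w]++
	}
	return tf
}

// cosine returns the cosine similarity of two term-frequency vectors.
// It returns 0 when either vector is empty.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for w, x := range a {
		dot += x * b[w] // a missing key in b contributes 0
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// keywordRatio is the fraction of a license's keywords found in the text.
func keywordRatio(text string, keywords []string) float64 {
	if len(keywords) == 0 {
		return 0
	}
	lower := strings.ToLower(text)
	hits := 0
	for _, k := range keywords {
		if strings.Contains(lower, strings.ToLower(k)) {
			hits++
		}
	}
	return float64(hits) / float64(len(keywords))
}

// combinedScore blends the two signals. weight is a hypothetical tuning
// knob in [0, 1]: 1 trusts only keywords, 0 trusts only the vector space.
func combinedScore(keyword, vector, weight float64) float64 {
	return weight*keyword + (1-weight)*vector
}

func main() {
	file := "Redistributions of source code must retain the above copyright notice"
	template := "Redistributions of source code must retain the above notice"
	keywords := []string{"redistributions of source code", "netbsd foundation"}

	kw := keywordRatio(file, keywords)
	vs := cosine(termFreq(file), termFreq(template))
	fmt.Printf("keyword=%.2f vector=%.2f combined=%.2f\n", kw, vs, combinedScore(kw, vs, 0.5))
}
```

With a blend like this, a license that scores well on the fuzzy match but misses most of its keywords no longer jumps to the top on the fuzzy score alone. The weighting would need tuning against real license files.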
On Thu, Mar 08, 2018 at 10:41:12PM -0800, Ben Boyter wrote:
Would you be willing to trial it out when done?
Sure.
Ninja edit while you posted that... copied again. I have a fix that resolves this issue at least. I have committed it under the branch https://github.com/boyter/lc/compare/Issue40. Note that tests are still failing there, but it might improve accuracy for you in the short term. If you check out and build it you may get fewer false positives.
Found another one. ykpers has BSD-2-Clause
I'm pretty convinced the reason for this is the fallback to the vector space. I might disable that for all except those which I know don't match any of the keyword matches, or if nothing matches and fuzzy search is enabled. While doing that I should look into moving it over to streams to get some multi-CPU action happening as well.
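A rough sketch of that plan, under stated assumptions: `guess`, `chooseGuesses` and `processAll` are hypothetical names rather than anything in lc, only the "nothing matches and fuzzy search is enabled" case is modelled, and "streams" is read here as a channel feeding a pool of goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// guess is a hypothetical result type: a license ID and its score.
type guess struct {
	License string
	Score   float64
}

// chooseGuesses trusts keyword matches whenever there are any. The vector
// space fallback runs only when nothing matched and fuzzy search is enabled.
func chooseGuesses(keyword []guess, fuzzy bool, vectorSpace func() []guess) []guess {
	if len(keyword) > 0 {
		return keyword
	}
	if fuzzy {
		return vectorSpace()
	}
	return nil
}

// processAll runs identify over every file using a fixed number of workers,
// so that several CPUs can be used at once.
func processAll(files []string, workers int, identify func(string) []guess) map[string][]guess {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string][]guess, len(files))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				g := identify(f)
				mu.Lock()
				results[f] = g
				mu.Unlock()
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Stand-in identifier: pretends only "LICENSE" has keyword matches.
	identify := func(path string) []guess {
		var kw []guess
		if path == "LICENSE" {
			kw = []guess{{License: "BSD-2-Clause", Score: 1}}
		}
		return chooseGuesses(kw, false, func() []guess {
			return []guess{{License: "fuzzy-guess", Score: 0.4}}
		})
	}
	out := processAll([]string{"LICENSE", "README"}, 4, identify)
	fmt.Println(len(out["LICENSE"]), len(out["README"]))
}
```

Gating the fallback this way means a fuzzy score can never outrank a keyword match; it only fills in when there is nothing else to go on.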
100% convinced it's the vector space. Playing around with the code,
Examples 3 and 4 are taken from the above so it works now. Going to finish this off with the additional performance tweaks which do the following to the runtime.
to...
Still working on this, but it looks very promising so far.
Looks amazing :o
@maxice8 if you are prepared to build from source, building from the current master and trying again should show a marked improvement in detection and performance. There are likely to be some bugs in there, but it's getting closer to being ready for release.
with copyright lines removed
LICENSE FILE