-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor db #16
Refactor db #16
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with the code changes themselves, but I am not sure what to test here. @Keyrxng could you give me a use case?
I'm unsure what you mean a use case? Like to practically work with the embeddings in unison with an LLM? If I was testing this PR I'd ensure that the comments and specs are being stored properly. You could create embeddings for a spec or a readme then create a query embedding and see far fruitful it is with that alone. Although chatbot/embedding effectiveness I'd think is sort of out of scope for this. The intention of the PR was to make it easier to work with the embeddings from more than one angle. The class system should make it easier for us to refine our embedding scope before we compare, that way we improve it's accuracy because we can remove entire "sections" of data and have it only consider what we tell it to. The infra isn't in place yet of course, but the class system for e.g we install a chatbot on ubq.fi, easy to do and good value for money, it should be scoped to high level org stuff and guard railed. Moreover, the class system can be used as a weighted tier system for prioritizing certain sections over others under certain conditions. #18 introduces logic that'll gather readmes which would give us something that we could effectively test in terms of having a well-rounded sort of chatbot with actionable milestones for confirming it effectiveness. I'm not sure I answered your question or not please let me know |
@Keyrxng thank you for the clarification. To ask simply, I wondered what I should be looking for when testing, the comments that contain HTML should be stripped from their tags and stored as raw text in the db is that correct? |
I think we should store with full source code and then stripped just text and compare the results. But eventually we should settle on one way and I'm pretty sure it's gonna end up being with the full markup |
I agree, I think we ought to refer to others in the space and see how they are doing it.
Everyone does something similar but different although they all chunk their codebase, split it into semantically effective sections. So we need to determine how we are going to prep our codebase(s) for chunking, if we'll do it all manually or involve some other recommended tooling or whatever. I've mentioned this before but we need to effectively parse multiple different programming languages. We have our top 5 that we use within our org so we should aim to cover at least those with whatever implementation we go with, others should not be considered a priority imo unless whatever tools we use easily support multiple languages. |
I can't help but notice the last two features did not come with tests included despite them being quite pricey in terms of reward. I thought that we were enforcing that new features extend test suites to at least cover the basic added functionality? I'd appreciate being included in code review for AI plugins/features going forward if that is alright with the team. |
Resolves #13
Resolves #15
embeddings
entitybody > title
, so to make up for that I simSearch using the title and body separatelymswjs
DB and handlers like other plugins tend to do (mocking modules kills me for some reason, but the db and handlers I understand 😂)QA:
ignore the row id 8 with the "::", I added
body > title
logic and removed it immediately.I have another pull with the
setup_instructions
logic included, left out as it was not in scope for this PR.