feat: Add LostInTheMiddleRanker #5457
Pull Request Test Coverage Report for Build 5739623079 (Coveralls)
@bglearning, have a look here.
Very interesting new ranker! Now I am curious to see how it performs. The tests are extensive, which is great! 👍 I have a couple of change requests regarding the implementation but the general functionality looks good.
logger = logging.getLogger(__name__)


class LostInTheMiddleRanker(BaseRanker):
I wonder whether we can come up with a name that expresses that relevant documents are ranked so that they end up at the top and bottom, while irrelevant documents end up in the middle.
LITM is a very technical name that hardly anyone using Haystack will know. I don't have a better idea yet. Maybe you can come up with something together with @dfokina?
I actually like the LITM name because it is unique, descriptive, and associated with that now "famous" paper. So I'd keep it if there are no objections...
I like the LITM name in the sense that it is a clear reference to the paper. I don't think users would specifically search for this type of ranking without understanding the background for it.
Anyway, I tried to think of possible alternatives, but have no good ideas so far. They're coming out pretty bad (IrrelevantMiddleRanker, TopBottomRanker, IrrelevantDocsInTheMiddleRanker...)
4a03765 to ee62713
I just pushed some docstring updates and added the ranker to pydoc.
bf83076 to c5fcc30
df14403 to 2c46028
@vblagoje Overall this looks really good!
See the discussion here
Looks good!
Question: Write an elaborate review of this PR and give no more than five recommendations on how it can be improved.
Answer: Here are some recommendations for improvement:
Question: Please rate the readiness of this Pull Request (PR) for integration on a scale of 1 to 10, with 1 indicating it is completely unready and 10 signifying it is fully prepared for merging into the main branch. First elaborate and then provide your rating.
Answer: However, there are a few minor points that could be improved:
Considering these points, I would rate this PR as an 8. It's well-prepared and mostly ready for merging, but could benefit from a few minor improvements.
I addressed Julian's concerns but have to dismiss his review, as it is blocking the merge and Julian is on PTO.
* Add lost in the middle ranker
* Add release note
* Julian's feedback: more precise version of truncate
* Better comments for the litm algorithm
* Sebastian PR feedback
* Add check for invalid values of word_count_threshold
* Remove _truncate as it is not needed any more
---------
Co-authored-by: Darja Fokina <[email protected]>
What?
Introduces a new ranker, LostInTheMiddleRanker. This ranker ranks documents based on the "Lost in the Middle" order, designed to position "the best" documents (low index in the given list of documents) at the beginning and the end of the resulting list while placing "the worst" documents (high index in the given list of documents) in the middle.
Why?
The "Lost in the Middle" order is a strategy for laying out paragraphs so that the most relevant paragraphs sit at the beginning or end of the LLM's context window while the least relevant paragraphs sit in the middle. This strategy is based on the recent research paper "Lost in the Middle: How Language Models Use Long Contexts" by Liu et al. Implementing this strategy as a ranker in Haystack allows users to leverage this approach in their RAG pipelines.
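To make the ordering concrete, here is a minimal, self-contained sketch of one way to produce it. It is not the code from this PR: the function name lost_in_the_middle_order is hypothetical, and it assumes the input list is already sorted from most to least relevant. Items are dealt alternately to the front and the back of the result, so the lowest-ranked items meet in the middle.

```python
from typing import List, TypeVar

T = TypeVar("T")


def lost_in_the_middle_order(items: List[T]) -> List[T]:
    """Reorder a relevance-sorted list so the best items sit at both ends.

    Hypothetical helper for illustration only; assumes items[0] is the
    most relevant entry and items[-1] the least relevant.
    """
    # Items at even positions (0, 2, 4, ...) fill the result from the front.
    front = items[0::2]
    # Items at odd positions (1, 3, 5, ...) fill it from the back, so they
    # are reversed to put the more relevant ones closer to the end.
    back = items[1::2][::-1]
    return front + back


if __name__ == "__main__":
    ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]
    print(lost_in_the_middle_order(ranked))
```

With five inputs ranked doc1 (best) to doc5 (worst), the sketch returns doc1, doc3, doc5, doc4, doc2: the two best documents occupy the first and last positions and the worst lands in the middle.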
How can it be used?
The LostInTheMiddleRanker can be used like other rankers in Haystack. After initializing the ranker with the desired parameters, it can be used to rank/reorder a list of documents based on the "Lost in the Middle" order.
How did you test it?
The functionality of the LostInTheMiddleRanker has been tested via unit tests in the test/nodes/test_lost_in_the_middle.py file. These tests ensure the ranker correctly reorders documents and handles edge cases appropriately.
Notes for the reviewer
Please review the implementation of the LostInTheMiddleRanker and its associated tests; perhaps some edge unit test cases were not covered.