Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Exception while running node 'Reader':pop index out of range #5618

Closed
wants to merge 7 commits into from

Conversation

x110
Copy link
Contributor

@x110 x110 commented Aug 24, 2023

Fixes #4954
This is an implementation to the solution proposed by @Koenlaermans and @julian-risch

@x110 x110 requested a review from a team as a code owner August 24, 2023 16:25
@x110 x110 requested review from anakin87 and removed request for a team August 24, 2023 16:25
@CLAassistant
Copy link

CLAassistant commented Aug 24, 2023

CLA assistant check
All committers have signed the CLA.

@x110 x110 changed the title Reader bug bug: Exception while running node 'Reader':pop index out of range Aug 24, 2023
@anakin87 anakin87 requested review from julian-risch and removed request for anakin87 August 25, 2023 07:53
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@x110 Thank you for working on this issue and opening a pull request! 🙂 The newly introduced handle_qa_overlap_duplicates(...) looks quite good already. There are just two issues currently. breaked is not used and the final return is at the wrong indentation level. I describe it in more detail in my comments below.

To get the PR ready to be merged, there are two more small things. Could you please add a release note with reno as described here? It's much easier than it sounds at first.
It would be great if you could add a test case to this PR in the following file: https://github.com/deepset-ai/haystack/blob/main/test/nodes/test_reader.py
The test case should be limited to just testing the new function handle_qa_overlap_duplicates by providing some inputs and checking that the result is as expected.

if ans.confidence < pot_dupl_ans.confidence:
pred.prediction.pop(ans_idx)
breaked = True
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of break we should use return here to break out of both for loops. The lines breaked = False and breaked = True can also be removed then. For context, feel free to have a look a my comment on the original issue here: #4954 (comment)

else:
overlapping_doc_pred.prediction.pop(pot_dupl_ans_idx)

return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return is at the wrong indentation level. Currently the outer for loop for overlap in relevant_overlaps is stopped by return after the first execution of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Julian. Fixed the issues you have mentioned. I am looking into the release notes.

@x110 x110 requested a review from a team as a code owner August 31, 2023 10:42
@x110 x110 requested review from dfokina and removed request for a team August 31, 2023 10:42
@julian-risch
Copy link
Member

@x110 Sorry for not getting back to you earlier. Did you test this PR in one of your own projects and can confirm that it solves the issue? Before we can merge it, we would need a test case in https://github.com/deepset-ai/haystack/blob/main/test/nodes/test_reader.py
The test case should be limited to just testing the new function handle_qa_overlap_duplicates by providing some inputs and checking that the result is as expected.
If you want I could also look into it myself and possibly open a PR with a green CI to start with.

@masci masci changed the base branch from main to v1.x November 24, 2023 12:05
@masci masci added the 1.x label Nov 24, 2023
@x110
Copy link
Contributor Author

x110 commented Nov 25, 2023

No problem at all! Appreciate your follow-up. I haven't had an opportunity to test the PR in my projects just yet. While working on a test case for handle_qa_overlap_duplicates, I encountered some challenges capturing the appropriate inputs for the function. Please feel free to initiate a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception while running node 'Reader':pop index out of range + solution
4 participants