This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix the IndexError when CNNDailyMailDatasetReader reads test data. #306

Merged 8 commits on Nov 1, 2021

@xinzhel (Contributor) commented on Oct 25, 2021

Problem: An IndexError is raised when the first line of a story file contains only "(CNN)". This happens when reading the CNN test data.

Example: See the file 12078b09d95c01cedb06da7fc63faab540432dee.story in the extracted cnn_stories directory.
This file is read as test data; it corresponds to this URL in `all_text.txt`: http://web.archive.org/web/20150617021105id_/http://www.cnn.com/2015/04/16/opinions/medical-marijuana-revolution-sanjay-gupta/

Note: The provided training config does not evaluate on test data, so this error can be ignored when using that config.

Solution: Remove "(CNN)" before checking for an empty string.
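
To illustrate the ordering issue described above, here is a minimal sketch. It is not the reader's actual code: the function names, the `CNN_PREFIX` constant, and the list of sentence endings are all hypothetical, and the real implementation may differ. It only shows why stripping "(CNN)" after the empty-string check can leave an empty string that is then indexed.

```python
# Hypothetical names throughout; this is an illustration, not the library's code.
CNN_PREFIX = "(CNN)"
SENTENCE_ENDINGS = (".", "!", "?", '"', "'", ")")


def sanitize_line_buggy(line):
    """Checks for an empty line BEFORE removing the prefix."""
    line = line.strip()
    if line == "":
        return None
    # A line that is exactly "(CNN)" becomes "" here, after the check.
    line = line.replace(CNN_PREFIX, "").strip()
    if line[-1] not in SENTENCE_ENDINGS:  # IndexError on an empty string
        line += "."
    return line


def sanitize_line_fixed(line):
    """Removes the prefix first, then checks for an empty line."""
    line = line.replace(CNN_PREFIX, "").strip()
    if line == "":
        return None  # nothing left to sanitize, so the caller can skip it
    if line[-1] not in SENTENCE_ENDINGS:
        line += "."
    return line


if __name__ == "__main__":
    print(sanitize_line_fixed("(CNN)"))
    print(sanitize_line_fixed("(CNN) Some opening sentence"))
    try:
        sanitize_line_buggy("(CNN)")
    except IndexError as err:
        print("buggy version raised IndexError:", err)
```

With this ordering, a first line consisting only of "(CNN)" is treated like any other blank line and skipped.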

@epwalsh self-assigned this on Oct 29, 2021

@epwalsh (Member) left a comment

LGTM! Thanks!

@epwalsh merged commit 84ba7cf into allenai:main on Nov 1, 2021