Enhancement: Distinguish between error message and context in validation of spans #2329
Codecov Report
Base: 92.55% // Head: 91.83% // Decreases project coverage by 0.72%.
Additional details and impacted files:
@@            Coverage Diff             @@
##           develop   #2329      +/-   ##
==========================================
- Coverage    92.55%  91.83%   -0.72%
==========================================
  Files          159     161       +2
  Lines         7840    7897      +57
==========================================
- Hits          7256    7252       -4
- Misses         584     645      +61
Hi @tomaarsen, looks much more readable! Perhaps it makes sense to directly link them to the provided input. Also, I think the substring part ( ("ORG", 0, 6) - 'This i' defined in 'This is Tom...'
Even better. I also considered output along the lines of this:
But I suspect that we would be best off implementing #1843 instead and using colors. Thoughts? I'm down to update this PR according to your suggestion, by the way.
Yes, I agree that something like that would be even better, but it might require a bit more custom code/logic, so I would, for now, go for a quick win, and based on the initial So, if it doesn't take up too much time, the last proposed design seems best to me.
I've updated the error to match the proposed message from #2329 (comment). I think we should skip the proposed error from #2329 (comment).
import argilla as rg
record = rg.TokenClassificationRecord(
    text="This is Tom's text",
    tokens=["This", "is", "Tom", "'s", "text"],
    prediction=[("ORG", 0, 6), ("PER", 8, 16)],
)
I think this PR is ready. The next step could come when #1843 is implemented.
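For context, a span counts as misaligned when its character offsets do not coincide with token boundaries. Below is a minimal sketch of such a check, independent of argilla. The helper names `token_offsets` and `misaligned_spans` are hypothetical, and this is not argilla's actual validation code; it only illustrates why both spans in the record above are rejected.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start, end), character offsets


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Locate each token in `text`, scanning left to right."""
    offsets = []
    cursor = 0
    for token in tokens:
        # str.index raises ValueError if a token cannot be found.
        start = text.index(token, cursor)
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


def misaligned_spans(text: str, tokens: List[str], spans: List[Span]) -> List[Span]:
    """Return the spans whose start or end is not on a token boundary."""
    offsets = token_offsets(text, tokens)
    starts = {start for start, _ in offsets}
    ends = {end for _, end in offsets}
    return [span for span in spans if span[1] not in starts or span[2] not in ends]


text = "This is Tom's text"
tokens = ["This", "is", "Tom", "'s", "text"]
spans = [("ORG", 0, 6), ("PER", 8, 16)]

bad = misaligned_spans(text, tokens, spans)
for label, start, end in bad:
    print((label, start, end), "-", repr(text[start:end]))
```

Both spans start on a token boundary but end in the middle of a token (offset 6 falls inside "is", offset 16 inside "text"), so both are reported.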
Hi @tomaarsen, sorry, readability is still a bit off for me. I think the "defined in" makes it a bit unclear.

Spans:
("ORG", 0, 6, "This i")
("PER", 8, 16, "Tom's te")
Tokens:
['This', 'is', 'Tom', "'s", 'text']

Spans:
("ORG", 0, 6) - "This i"
("PER", 8, 16) - "Tom's te"
Tokens:
['This', 'is', 'Tom', "'s", 'text']

Spans:
("ORG", 0, 6, "This i") - "This is Tom..."
("PER", 8, 16, "Tom's te") - "...s is Tom's text"
Tokens:
['This', 'is', 'Tom', "'s", 'text']
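The contexts in the last layout above ("This is Tom..." and "...s is Tom's text") are consistent with padding each span by five characters on either side and marking truncation with an ellipsis. A rough sketch under that assumption follows; the function name `context_window` and the default `padding=5` are hypothetical choices that happen to reproduce the example, not something taken from the PR.

```python
def context_window(text: str, start: int, end: int, padding: int = 5) -> str:
    """Return text[start:end] widened by `padding` characters on each side.

    An ellipsis marks each side on which the original text was cut off.
    """
    left = max(0, start - padding)
    right = min(len(text), end + padding)
    prefix = "..." if left > 0 else ""
    suffix = "..." if right < len(text) else ""
    return f"{prefix}{text[left:right]}{suffix}"


text = "This is Tom's text"
for label, start, end in [("ORG", 0, 6), ("PER", 8, 16)]:
    snippet = text[start:end]
    print((label, start, end, snippet), "-", context_window(text, start, end))
```

This kind of windowing costs a little custom logic, which is the trade-off raised earlier in the thread against the simpler snippet-only layout.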
No worries, I'm open to iterating here.
I'm curious to hear your thoughts.
I wanted to avoid wordiness because we are dealing with a lot of text/words. If you feel option 2 is too unintuitive, I would prefer your final proposal.
No, I think option 2 is fairly clear, especially as we must consider that the snippet follows
Updated the output:
import argilla as rg
record = rg.TokenClassificationRecord(
    text="This is Tom's text",
    tokens=["This", "is", "Tom", "'s", "text"],
    prediction=[("ORG", 0, 6), ("PER", 8, 16)],
)
I'm open to any additional feedback.
Description
I expanded slightly on the error message raised when the provided spans do not match the tokenization.
Consider the following example script:
The (truncated) output on the develop branch:
The distinction between "defined in" and "This is" is unclear. I've worked on this.
The (truncated) output after this PR:
Note the additional '. Note that the changes rely on repr, so if the snippet contains ' itself, it uses " instead, e.g.:
Type of change
How Has This Been Tested
Modified the relevant tests, ensured they worked.
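The repr-based quoting mentioned in the description can be sanity-checked without argilla. This is only a small illustration of how Python's built-in `repr` picks its quote character for the two snippets from the example; the printed layout is illustrative and is not the exact error message produced by the PR.

```python
text = "This is Tom's text"
spans = [("ORG", 0, 6), ("PER", 8, 16)]

# repr() wraps a string in single quotes by default, and switches to double
# quotes when the string contains a single quote but no double quote.
quoted = [repr(text[start:end]) for _, start, end in spans]

for (label, start, end), snippet in zip(spans, quoted):
    print(f"{(label, start, end)} - {snippet}")
```

The first snippet comes out as 'This i' and the second as "Tom's te", because the second contains an apostrophe.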
Checklist
I have merged the original branch into my forked branch
I added relevant documentation
My code follows the style guidelines of this project
I did a self-review of my code
I added comments to my code
I made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
Tom Aarsen