v1.26.0
🔆 Release highlights
Spans question
We've added a new type of question to Feedback Datasets: the `SpanQuestion`. This type of question lets you highlight portions of text in a specific field and apply a label to them. It is especially useful for token classification (like NER or POS tagging) and information extraction tasks.
*(Demo video: spans_demo.mp4)*
With this type of question you can:
✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.
⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.
🖱️ Draw a span by dragging your mouse over the parts of the text you want to select, or, if it's a single token, just double-click on it.
🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.
🔎 Annotate at character level when you need more fine-grained spans. Hold the Shift key while drawing the span and the resulting span will start and end at the exact boundaries of your selection.
✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.
🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.
🧼 Remove labels by hovering over the label name in the span and then clicking on the ✕ icon on the left-hand side.
Here's an example of what your dataset would look like from the SDK:
```python
import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

# connect to your Argilla instance
rg.init(...)

# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text",  # the field where you want to do the span annotation
            required=True,
        )
    ],
)

# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions=[
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0,  # position of the first character of the span
                    end=10,  # position of the character right after the end of the span
                    label="ORG",
                    score=1.0,
                )
            ],
            "agent": "my_model",
        }
    ],
)

# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)
```
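If you generate suggestions programmatically, the `start`/`end` values are plain character offsets into the field's text. Here's a minimal sketch of how such offsets could be produced; the text, entity lookup, and score are made up for illustration and are not part of the release:

```python
import re

import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

text = "Argilla joined Hugging Face to host the NLP meetup."

# Hypothetical lookup of known entity mentions and their labels,
# only here to show how character offsets are computed.
known_entities = {"Argilla": "ORG", "Hugging Face": "ORG", "NLP meetup": "EVE"}

span_values = []
for mention, label in known_entities.items():
    for match in re.finditer(re.escape(mention), text):
        span_values.append(
            SpanValueSchema(
                start=match.start(),  # first character of the span
                end=match.end(),      # character right after the span
                label=label,
                score=0.9,            # whatever confidence your model provides
            )
        )

record = rg.FeedbackRecord(
    fields={"text": text},
    suggestions=[{"question_name": "entities", "value": span_values, "agent": "my_model"}],
)
```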
To learn more about this and all the other questions available in Feedback Datasets, check out our documentation.
Changelog 1.26.0
Added
- If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
- Added support for span questions in the Python SDK. (#4617)
- Added support for span values in suggestions and responses. (#4623)
- Added `span` questions for `FeedbackDataset`. (#4622)
- Added `ARGILLA_CACHE_DIR` environment variable to configure the client cache directory (see the sketch after this list). (#4509)
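A minimal sketch of using the new cache variable, assuming it only needs to be set in the environment before the client is imported and initialized (the path is just an example):

```python
import os

# Point the Argilla client cache at a custom directory.
# The path is a placeholder; any writable directory works.
os.environ["ARGILLA_CACHE_DIR"] = "/tmp/my-argilla-cache"

import argilla as rg  # imported after setting the variable so the client can pick it up

rg.init(...)
```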
Fixed
- Fixed contextualized workspaces. (#4665)
- Fixed prepare for training when passing `RankingValueSchema` instances to suggestions. (#4628)
- Fixed parsing ranking values in suggestions from HF datasets. (#4629)
- Fixed reading description from API response payload. (#4632)
- Fixed pulling (n*chunk_size)+1 records when using `ds.pull` or iterating over the dataset (see the sketch after this list). (#4662)
- Fixed client's resolution of enum values when calling the Search and Metrics API, to support Python >=3.11 enum handling. (#4672)
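For context, the over-fetching fix affects the two usage paths named above. A hypothetical sketch, assuming a remote dataset called "my-dataset" in workspace "my-workspace" (names are placeholders):

```python
import argilla as rg

rg.init(...)

remote = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# Pull a local snapshot of the records ...
local = remote.pull()

# ... or iterate over the remote dataset directly; both paths previously
# fetched (n*chunk_size)+1 records, which #4662 fixes.
for record in remote:
    print(record.fields["text"])
```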
New Contributors
- @davidefiocco made their first contribution in #4639
Full Changelog: v1.25.0...v1.26.0