v1.26.0

@jfcalvo jfcalvo released this 22 Mar 11:33
· 623 commits to develop since this release

🔆 Release highlights

Spans question

We've added a new type of question to Feedback Datasets: the SpanQuestion. This type of question allows you to highlight portions of text in a specific field and apply a label. It is especially useful for token classification (like NER or POS tagging) and information extraction tasks.

(Demo video: spans_demo.mp4)

With this type of question you can:

✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.

⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.

🖱️ Draw a span by dragging your mouse over the parts of the text you want to select or, if it's a single token, just double-click on it.

🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.

🔎 Annotate at character-level when you need more fine-grained spans. Hold the Shift key while drawing the span and the resulting span will start and end in the exact boundaries of your selection.

✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.

🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.

🧼 Remove labels by hovering over the label name in the span and then clicking the ✕ on its left-hand side.

Here's an example of what your dataset would look like from the SDK:

import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

# connect to your Argilla instance
rg.init(...)

# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text", # the field where you want to do the span annotation
            required=True
        )
    ]
)

# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions=[
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0, # position of the first character of the span
                    end=10, # position of the character right after the end of the span
                    label="ORG",
                    score=1.0
                )
            ],
            "agent": "my_model",
        }
    ]
)

# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)
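As the comments in the record above note, span `start`/`end` values are character offsets with an exclusive end, so the labeled text can always be recovered with a plain Python slice. A minimal, standalone sketch (the text and offsets mirror the illustrative record above):

```python
# Span offsets are character-based: `start` is the index of the first
# character and `end` is the index right after the last one, so a plain
# slice of the field value recovers the labeled text.
text = "This is the text of the record"
start, end = 0, 10  # the suggested "ORG" span from the record above

span_text = text[start:end]
print(span_text)  # -> "This is th"
```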

To learn more about this and all the other questions available in Feedback Datasets, check out our documentation.

Changelog 1.26.0

Added

  • If you expand the labels of a single- or multi-label question, the expanded state is now maintained during the entire annotation process. (#4630)
  • Added support for span questions in the Python SDK. (#4617)
  • Added support for span values in suggestions and responses. (#4623)
  • Added span questions for FeedbackDataset. (#4622)
  • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)
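The new `ARGILLA_CACHE_DIR` variable is read from the environment, so one way to use it is to set it before initializing the client. A minimal sketch, assuming you set it from Python rather than your shell (the path is just an example):

```python
import os

# Point the Argilla client cache at a custom directory.
# Set this before the client reads its configuration (e.g. before rg.init).
os.environ["ARGILLA_CACHE_DIR"] = "/tmp/argilla-cache"  # example path

print(os.environ["ARGILLA_CACHE_DIR"])  # -> "/tmp/argilla-cache"
```

Setting it in your shell (`export ARGILLA_CACHE_DIR=...`) before launching Python works equally well.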

Fixed

  • Fixed contextualized workspaces. (#4665)
  • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
  • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
  • Fixed reading description from API response payload. (#4632)
  • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
  • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)
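For context on that last fix: in Python 3.11, `__format__` for `str`-mixin enums changed to delegate to `__str__`, so f-strings that previously produced the raw value started producing the member name instead. Reading `.value` explicitly is stable across versions. A small illustration of the general pitfall (the enum here is hypothetical, not Argilla's actual code):

```python
from enum import Enum

class SortOrder(str, Enum):  # hypothetical enum, not Argilla's actual code
    ASC = "asc"
    DESC = "desc"

# f"{SortOrder.ASC}" yields "asc" on Python <= 3.10 but "SortOrder.ASC"
# on Python >= 3.11; accessing .value avoids the version difference.
print(SortOrder.ASC.value)  # -> "asc"
```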

New Contributors

Full Changelog: v1.25.0...v1.26.0