Releases: argilla-io/argilla
v1.29.1
What's Changed
- 🙏 Update community link for v1.29.1 by @damianpumar in #5257
- bug: 5123 metrics by @sdiazlor in #5245
Full Changelog: v1.29.0...v1.29.1
v2.0.0rc2
What's Changed
- Docs: new review UI guide by @nataliaElv in #5083
- [ENHANCEMENT] ci: Review event triggers to reduce CI runs by @frascuchon in #5075
- docs: fix minor warning by @sdiazlor in #5089
- 🔥 Fix reorder labels by @damianpumar in #5084
- ✨ Refactor CSS by @damianpumar in #5085
- ✨ Fix issue on iterator by @damianpumar in #5099
- [ENHANCEMENT] CI: Allow to publish hidden version for docs/ branches by @frascuchon in #5088
- [ENHANCEMENT / BUGFIX] CI: publish version docs on tag creation by @frascuchon in #5092
- [DOCS] swap extra_headers for headers in updated sdk docs by @burtenshaw in #5100
- docs: change references slack by @sdiazlor in #5101
- [BUGFIX] remove name as default description in settings models by @burtenshaw in #5081
- 🐛 Fix banner by @damianpumar in #5127
- ✨ Improve docs by @damianpumar in #5094
- change: delete on cascade responses when associated user is deleted by @jfcalvo in #5126
- ✨ Add LaTex support by @damianpumar in #5129
- docs: small clarifications by @sdiazlor in #5131
- fix: UI - scrollable records in bulk view by @leiyre in #5143
- fix: copy the dataset name by clicking the copy button by @leiyre in #5142
- [ENHANCEMENT] argilla: simplify structure for flatten records to list by @frascuchon in #5137
- [ENHANCEMENT] argilla: define argilla-v1 as optional dependency by @frascuchon in #5120
- refactor: improve get pop issues by @sdiazlor in #5135
- [BUGFIX] argilla: normalize records when exporting flatten by @frascuchon in #5138
- [BUGFIX] argilla: support read draft response models without values by @frascuchon in #5124
- [REFACTOR] Redefine some property methods by @frascuchon in #5114
- fix: conditional checking SQLite connection so connection configuration is correctly executed by @jfcalvo in #5149
- chore: update SQLAlchemy dependencies by @jfcalvo in #5154
- [ENHANCEMENT/REFACTOR] argilla: lazy resolution for dataset workspaces by @frascuchon in #5152
- [REFACTOR]: argilla: Rename status to response.status for filtering using the SDK by @frascuchon in #5145
- [ENHANCEMENT] [REFACTOR] optimise and refactor SDK ingestion methods by @burtenshaw in #5107
- [BUGFIX] argilla-server: await on similarity search when filtering response values without user by @frascuchon in #5159
- [BUGFIX] rename optional deps v1 by @frascuchon in #5164
- [REVERT] Rename sdk-v1 to legacy by @frascuchon in #5168
- [RELEASES] 2.0.0rc2 by @frascuchon in #5160
Full Changelog: v2.0.0rc1...v2.0.0rc2
v2.0.0rc1
🔆 Release highlights
One Dataset to rule them all
The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single, highly configurable Dataset class.
With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.
Important
If you want to continue using legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.
FeedbackDatasets do not need to be converted, as they are already compatible with the Argilla v2 format.
New SDK
We've redesigned our SDK to adapt it to the new single Dataset class and, most importantly, to improve the user and developer experience.
The main goal of the new design is to make the SDK easier to use and learn, so that configuring your dataset and getting it up and running is much simpler and faster.
To learn more about this new SDK, you can check:
- our new documentation: https://argilla-io.github.io/argilla/latest/
- @burtenshaw's blog post: https://argilla.io/blog/introducing-argilla-new-sdk
- this community meetup: https://www.youtube.com/watch?v=G3lZBtPrtgU
New UI layout
We have also revamped our UI for Argilla 2.0:
- We've redistributed the information in the Home page
- Datasets no longer have Tasks; they have Questions.
- Annotation guidelines and your progress are now accessible at all times within the dataset page.
- Dataset pages also have a new flexible layout, so you can change the size of different panes and expand or collapse the guidelines and progress.
- SpanQuestions are now supported in the bulk view.
New documentation
This new version of Argilla comes hand in hand with revamped documentation: https://argilla-io.github.io/argilla/latest
We have applied the Diátaxis framework and UX principles in the hope of making this version cleaner and the information easier to find. Let us know what you think!
Share your thoughts with us!
Note
This is a release candidate ahead of the official Argilla 2.0 release. Try it out and let us know what you think.
Find us on Discord or open a GitHub issue here.
What's Changed
- change: deleted unused API v0 code by @jfcalvo in #4852
- [RELEASE] 1.29.0 by @frascuchon in #4896
- 💀 feat/remove older datasets by @damianpumar in #4903
- feat: update sign-in page UI by @leiyre in #4915
- ✨ Endpoint migration by @damianpumar in #4883
- [FEATURE-BRANCH] refactor: improve API v1 error handling by @jfcalvo in #4887
- [FEAT BRANCH] Add argilla-sdk project by @frascuchon in #4891
- 💀 feat/improve dataset table by @damianpumar in #4917
- [REFACTOR] Remove old API calls for argilla-sdk by @frascuchon in #4937
- feat: UI table styles by @leiyre in #4953
- docs: fastfit tutorial by @sdiazlor in #4958
- [DOCS] [FIX] Fix logging, typing and docstrings based on feedback by @burtenshaw in #4968
- [BUGFIX] ci: Configure argilla server deps properly by @frascuchon in #4962
- Fix/add-checked-types-to-io by @burtenshaw in #4974
- [CI] Configure build on push feat/ branches by @frascuchon in #4960
- refactor: API folder structure improvements by @jfcalvo in #4959
- feat: UI - remove sidebar components by @leiyre in #4978
- docs: add changelog by @sdiazlor in #4983
- docs: popular issues file generator by @sdiazlor in #4971
- 🚄 feat/improve performance metrics by @damianpumar in #4981
- Update ACCESS_TOKEN naming and documentation hierarchy guides by @davidberenstein1957 in #4990
- feat: UI - update colors and small screen padding by @leiyre in #4999
- feat: UI - remove all train components by @leiyre in #4998
- [FEATURE] SDK - Add support for response status by @frascuchon in #4977
- chore: add new argilla-server folder structure to README.md by @jfcalvo in #4976
- chore: set logger level to error to reduce noise from Elasticsearch and OpenSearch client libraries by @jfcalvo in #4979
- [FEATURE] remove random password generation when creating a user and password is not provided by @jfcalvo in #4993
- [ENHANCEMENT] stop warning on existing datasets by @burtenshaw in #4987
- feat: delete records by @sdiazlor in #4980
- docs: fastfit tutorial contains link to copied blog post by @sdiazlor in #4995
- [BUGFIX] argilla-server: Query on response values without an user by @frascuchon in #5003
- [FIX] [ENHANCEMENT] logging records in notebook without ipython by @burtenshaw in #4988
- [FEATURE] Prepare new argilla package by @frascuchon in #5006
- [FIX] Docs: fix missing conflicts resolution by @frascuchon in #5007
- docs: 4920 v1 docs add banner with link to the new docs post refactor by @davidberenstein1957 in #5008
- [CHORE] Argilla server: Add missing CHANGELOG entry by @frascuchon in #5024
- [CHORE] Review and fix commit hooks by @frascuchon in #5027
- chore: execute pre-commit autoupdate manually by @jfcalvo in #5029
- [BUGFIX] Argilla server: looking for records with external_id or id on bulk operations by @frascuchon in #5014
- [CI] Remove all tag and release events by @frascuchon in #5036
- ↔ feat/resizable layout by @damianpumar in #4921
- [CI] Prepare workflow for argilla-v1 - 1.29.0 by @frascuchon in #5032
- [CI] Prepare argilla release job by @frascuchon in #5037
- fix: UI - remove duplicated flexible border by @leiyre in #5038
- [CHORE] Argilla: remove pydantic warnings by @frascuchon in #5025
- fix: UI - border radius in progress bar by @leiyre in #5041
- 🎯 feat/enable bulk span by @damianpumar in #4986
- [ENHANCEMENT] argilla: support python 3.12 by @frascuchon in #5040
- [ENHANCEMENT] Argilla SDK: Updating record fields and vectors by @frascuchon in #5026
- [BUGFIX] argilla: Prevent errors checking Dataset instances when datasets is not installed. by @frascuchon in #5045
- [CI] Prepare the argilla-server package release by @frascuchon in #5039
- [ENHANCEMENT] argilla: Remove attribute-like access by @frascuchon in #5048
- feat: UI - flexible layout QA by @leiyre in #5046
- [ENHANCEMENT] docs: Add howto update record vectors by @frascuchon in #5052
- [BUGFIX] argilla: Support export action with filtered records by @frascuchon in #5054
- feat: New illustration and styles for login page by @leiyre in #5030
- [CI] docs: Configure docs publish for releases by @frascuchon in #5047
- [CI] Point dev version for docs to develop branch by @frascuchon in #5060
- [BUGFIX] ci: Define conditions to publish the release properly by @frascuchon in #5061
- [FEATURE-BRANCH] v2.0.0 changes by @jfcalvo in #4869
- [ENHANCEMENT] ci: remove paths for builds by @frascuchon in #5063
- [ENHANCEMENT] ci: Build docker images on PRs, release, and develop by @frascuchon in #5064
- [ENHANCEMENT] ci: Rem...
v1.29.0
🔆 Release highlights
Warning
This will be the last release of Argilla v1. Starting from Argilla 2.0.0, we will only support FeedbackDatasets, which will be renamed to Dataset. All other dataset types (DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text) will be deprecated. In the next release, we will provide more information and documentation on how to migrate all your datasets into Argilla 2.0 Datasets.
Improved record search
Your search matches are now highlighted, so you can easily see the results of your search. We’ve also added a selector for datasets with more than one record field, so you can choose whether to search in All fields or in a specific one.
Record information and metadata in the UI
You can now check all the information and metadata associated with each record directly in the UI.
What's Changed in v1.29.0
- feat: small UI improvements by @leiyre in #4770
- feat:update UI for settings page by @leiyre in #4767
- Fix: "cannot import name 'formatargspec' from 'inspect'" with Python 3.11 by @walter-hernandez in #4693
- 🐛 Ranking component not showing rankings by @damianpumar in #4775
- Adding LlamaIndex docs to integrations by @ignacioct in #4803
- docs: use FeedbackDataset in HF example by @sdiazlor in #4805
- docs: clarification/typo in tutorial by @sdiazlor in #4810
- Log if a dataset is deleted by @paulbauriegel in #4752
- ✨ Search text filtering by field by @damianpumar in #4771
- ✨ Add text search for fields by @damianpumar in #4831
- ✨ Fix shift issue and Letter S on issue reported by @damianpumar in #4836
- 🚑 Fix issue for intentional submission by @damianpumar in #4840
- ci: Mono repo setup by @frascuchon in #4742
- fix: add branches and tags to argilla-server.yml GitHub workflow by @jfcalvo in #4854
- fix: GitHub action names with typos by @jfcalvo in #4850
- fix: remove non necessary conditional to build argilla-server docker images by @jfcalvo in #4855
- chore: update datasets.py by @eltociear in #4842
- docs: Fix typo Argila -> Argilla by @louisguitton in #4870
- fix: add error code when searching for a record missing specific vector by @jfcalvo in #4856
- 🐛 Fix highlight multiple fields by @damianpumar in #4866
- feat: add support for value zero on rating questions by @jfcalvo in #4864
- fix(import): remove non-existent server module by @frascuchon in #4874
- 🐛 Fix pre selection by @damianpumar in #4872
- support for Python 3.12 by @nicoloboschi in #4837
- Search bar and highlight docs by @nataliaElv in #4882
- feat: UI Metadata info component by @leiyre in #4851
- [IMPROVEMENT] Update pip when building docker image by @frascuchon in #4907
- [BUGFIX] Filter record metadata value based on metadata property policies by @frascuchon in #4906
- feat: UI - metadata adjustments by @leiyre in #4905
- [REVIEW] Add missing entries in CHANGELOG files by @frascuchon in #4910
New Contributors
- @walter-hernandez made their first contribution in #4693
- @eltociear made their first contribution in #4842
- @louisguitton made their first contribution in #4870
- @nicoloboschi made their first contribution in #4837
Full Changelog: v1.28.0...v1.29.0
v1.28.0
🔆 Release highlights
Improved suggestions
Multiple scores support for MultiLabelQuestion and RankingQuestion
MultiLabelQuestion and RankingQuestion now take one score per suggested label / value, making the scores easier to interpret. Learn more about suggestions and their scores here.
Warning
If you upgrade to this version, all previous scores in suggestions for MultiLabelQuestion, RankingQuestion and SpanQuestion will turn to NULL, as they are not valid in the new schema. Please make sure you upload the scores again if you want to use them.
See scores next to their label / value
Scores are now shown next to their label / value in all questions. This makes them more visible and easier to interpret.
Suggestions first - 🌟 Community request: #4647
Now you can order labels in MultiLabelQuestion so that suggestions are always shown first. This will help you make sure that the most relevant labels are always at hand. Plus, if you’ve added scores to your labels, suggested labels will be sorted by score in descending order. To enable this, go to the Dataset Settings page > Questions and enable “Suggestions first” for the desired question.
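The ordering described above can be sketched in plain Python. This is not Argilla's implementation. It assumes a suggestion shaped like the dictionaries used in the v1.26.0 SDK example, with a list under "score" holding one score per suggested label. The question name, labels and scores are hypothetical.

```python
from typing import Dict, List


def order_labels_suggestions_first(
    labels: List[str], suggested: List[str], scores: List[float]
) -> List[str]:
    """Put suggested labels first, highest score first, then the rest."""
    if len(suggested) != len(scores):
        raise ValueError("expected exactly one score per suggested label")
    score_by_label: Dict[str, float] = dict(zip(suggested, scores))
    # sorted() is stable (also with reverse=True), so tied scores keep
    # the question's original label order
    first = sorted(
        (label for label in labels if label in score_by_label),
        key=lambda label: score_by_label[label],
        reverse=True,
    )
    # Labels without a suggestion keep their original relative order
    rest = [label for label in labels if label not in score_by_label]
    return first + rest


# Hypothetical suggestion with one score per suggested label
suggestion = {"question_name": "topics", "value": ["tech", "sports"], "score": [0.9, 0.4]}
labels = ["sports", "politics", "tech", "health"]
ordered = order_labels_suggestions_first(labels, suggestion["value"], suggestion["score"])
```

Here `ordered` becomes `["tech", "sports", "politics", "health"]`. The suggested labels move to the front in descending score order, and the remaining labels keep their configured order.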
SpanQuestion improvements
Pre-selection highlight
We’ve improved the way selections are shown. You can now see a highlight that represents what the final selection will look like while you’re dragging your mouse. This helps you select faster and shows the difference between token and character selection.
Note
Remember that character-level spans are activated by holding Shift while doing the selection.
New label selector
We’ve improved the way the label selector works in the SpanQuestion when overlapping spans are enabled, so it’s easier to add or correct labels. Simply click on the desired span to activate the selector and click on the label(s) that you want to add or remove.
Persistent storage warning
We’ve added a warning for Argilla instances deployed on Hugging Face Spaces to alert users to possible data loss when persistent storage is not enabled.
To learn more about this warning and how to disable it, go to our docs.
Changelog 1.28.0
Added
- Added suggestion multi score attribute. (#4730)
- Added order by suggestion first. (#4731)
- Added multi selection entity dropdown for span annotation overlap. (#4735)
- Added pre selection highlight for span annotation. (#4726)
- Added banner when persistent storage is not enabled. (#4744)
- Added support in the Python SDK for the new multi-label question labels_order attribute. (#4757)
Changed
- Changed how the Hugging Face Space and user are shown on the sign-in page. (#4748)
Fixed
- Fixed reversed Korean characters. (#4753)
- Fixed requirements for the version of the wrapt library conflicting with Python 3.11. (#4693)
Full Changelog: v1.27.0...v1.28.0
v1.27.0
🔆 Release highlights
Overlapping spans
We are finally releasing a long-awaited feature: overlapping spans. This allows you to draw more than one span over the same token(s)/character(s).
To try them out, set up a SpanQuestion with the argument allow_overlapping=True like this:
import argilla as rg

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="spans",
            labels=["label1", "label2", "label3"],
            field="text",
            allow_overlapping=True,  # allow several spans over the same tokens/characters
        )
    ],
)
Learn more about configuring this and other question types here.
Global progress bars
We’ve included a new column on our home page that shows the global progress of your datasets, so that you can see at a glance which datasets are closer to completion.
These bars show progress by grouping records based on the status of their responses:
- Submitted: Records where all responses have the submitted status.
- Discarded: Records where all responses have the discarded status.
- Conflicting: Records with at least one submitted and one discarded response.
- Left: All other records that have no submitted or discarded responses. These may be in pending or draft status.
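The following minimal plain-Python sketch shows how these groups divide up the records. It is not how Argilla computes the bars. Each record is assumed to be just the list of its response status strings. "Left" is treated as the catch-all for anything that matches none of the first three groups, which is our reading of "All other records". For example, a record with one submitted and one draft response lands in "Left".

```python
from collections import Counter
from typing import Iterable, List


def progress_group(statuses: List[str]) -> str:
    """Classify one record by the statuses of its responses."""
    if statuses and all(status == "submitted" for status in statuses):
        return "submitted"
    if statuses and all(status == "discarded" for status in statuses):
        return "discarded"
    if "submitted" in statuses and "discarded" in statuses:
        return "conflicting"
    # Catch-all: no responses, only pending/draft, or any other mix
    return "left"


def global_progress(records: Iterable[List[str]]) -> Counter:
    """Count how many records fall into each progress group."""
    return Counter(progress_group(statuses) for statuses in records)


records = [
    ["submitted", "submitted"],
    ["discarded"],
    ["submitted", "discarded"],
    [],
    ["draft"],
    ["pending", "submitted"],
]
progress = global_progress(records)
```

With these sample records, `progress` counts one submitted, one discarded and one conflicting record, plus three in "Left".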
Suggestions got a new look
We’ve improved the way suggestions are shown in the UI to make their purpose clearer: now you can identify each suggestion with a sparkle icon ✨.
The behavior is still the same:
- Suggested values appear as pre-filled responses, marked with the sparkle icon.
- Make changes to the incorrect suggestions, then save as a draft or submit.
- The icon stays to mark the suggestions, so you can compare the final response with the suggested one.
Increased label limits
We’ve increased the maximum number of labels you can use in Label, Multilabel and Span questions to 500. If you need to go beyond that number, you can set up a custom limit using the following environment variables:
- ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS to set the limit in label and multi-label questions.
- ARGILLA_SPAN_OPTIONS_MAX_ITEMS to set the limit in span questions.
Warning
The UI has been optimized to support up to 1000 labels. If you go beyond this limit, the UI may not be as responsive.
Learn more about this and other environment variables here.
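These limits are configured on the Argilla server side. The sketch below is a hypothetical client-side pre-check and is not part of the Argilla SDK. It reads the two variable names listed above from an environment mapping and falls back to the default of 500 mentioned in these notes. It also flags label lists above the 1000 labels the UI is optimized for. The function names and the exact parsing are our assumptions.

```python
import os
from typing import List, Mapping, Optional

DEFAULT_MAX_LABELS = 500  # default limit stated in these release notes
UI_OPTIMIZED_MAX = 1000  # the UI may be less responsive beyond this many labels


def max_labels(env: Mapping[str, str], variable: str) -> int:
    """Read a label limit from an environment mapping, or fall back to 500."""
    raw = env.get(variable)
    return int(raw) if raw else DEFAULT_MAX_LABELS


def check_labels(
    labels: List[str],
    variable: str = "ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS",
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Raise if labels exceed the configured limit; return any UI warnings."""
    env = os.environ if env is None else env
    limit = max_labels(env, variable)
    if len(labels) > limit:
        raise ValueError(f"{len(labels)} labels exceed the limit of {limit} ({variable})")
    if len(labels) > UI_OPTIMIZED_MAX:
        return [f"{len(labels)} labels: the UI may not be as responsive"]
    return []
```

Passing `env` explicitly keeps the check testable. When it is omitted, the function reads the current process environment, which only helps if the check runs with the same variables as the server.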
Argilla auf Deutsch!
Thanks to our contributor @paulbauriegel, you can now use Argilla fully in German! If that is the main language of your browser, there is nothing you need to do: the UI will automatically detect it and switch to German.
Would you like to translate Argilla to your own language? Reach out to us and we'll help you!
Changelog 1.27.0
Added
- Added support for overlapping spans in the FeedbackDataset. (#4668)
- Added allow_overlapping parameter for span questions. (#4697)
- Added overall progress bar on the Datasets table. (#4696)
- Added German language translation. (#4688)
Changed
- New UI design for suggestions (#4682)
Fixed
- Improve performance for more than 250 labels (#4702)
New Contributors
- @stevengans made their first contribution in #4646
- @tim-win made their first contribution in #4672
- @strickvl made their first contribution in #4675
- @paulbauriegel made their first contribution in #4688
- @davanstrien made their first contribution in #4687
Full Changelog: v1.26.1...v1.27.0
v1.26.1
v1.26.0
🔆 Release highlights
Spans question
We've added a new type of question to Feedback Datasets: the SpanQuestion. This type of question allows you to highlight portions of text in a specific field and apply a label. It is especially useful for token classification (like NER or POS tagging) and information extraction tasks.
With this type of question you can:
✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.
⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.
🖱️ Draw a span by dragging your mouse over the parts of the text you want to select or if it's a single token, just double-click on it.
🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.
🔎 Annotate at character level when you need more fine-grained spans. Hold the Shift key while drawing the span, and the resulting span will start and end at the exact boundaries of your selection.
✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.
🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.
🧼 Remove labels by hovering over the label name in the span and then clicking on the 𐢫 on the left-hand side.
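The token snapping described above can be approximated with a short sketch. It assumes tokens are runs of non-whitespace characters, which is our simplification and not Argilla's tokenizer. The function name `snap_to_tokens` is hypothetical, and the `character_level` flag stands in for holding Shift.

```python
import re
from typing import Tuple


def snap_to_tokens(
    text: str, start: int, end: int, character_level: bool = False
) -> Tuple[int, int]:
    """Expand a character selection [start, end) to whole-token boundaries."""
    if not 0 <= start < end <= len(text):
        raise ValueError("selection must satisfy 0 <= start < end <= len(text)")
    if character_level:
        # Like holding Shift: keep the exact boundaries of the selection
        return start, end
    # Tokens are approximated as runs of non-whitespace characters
    overlapping = [
        match
        for match in re.finditer(r"\S+", text)
        if match.start() < end and match.end() > start
    ]
    if not overlapping:
        # The selection only covers whitespace: leave it unchanged
        return start, end
    # Grow the selection to cover every token it touches
    return overlapping[0].start(), overlapping[-1].end()


text = "Acme Corp opened an office in Springfield"
snapped = snap_to_tokens(text, 2, 7)  # the user dragged over "me Co"
```

Here `snapped` is `(0, 9)`, so `text[0:9]` covers the whole tokens "Acme Corp". With `character_level=True`, the selection stays at `(2, 7)`.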
Here's an example of what your dataset would look like from the SDK:
import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

# connect to your Argilla instance
rg.init(...)

# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text",  # the field where you want to do the span annotation
            required=True,
        )
    ],
)

# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions=[
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0,  # position of the first character of the span
                    end=10,  # position of the character right after the end of the span
                    label="ORG",
                    score=1.0,
                )
            ],
            "agent": "my_model",
        }
    ],
)

# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)
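Span offsets are easy to get wrong by one. The helper below is a hypothetical sketch that computes them with the convention from the comments in the SDK example: start is the first character and end is the character right after the span, so `text[start:end]` returns the span text. It returns plain dictionaries with the same keyword names the example passes to SpanValueSchema (start, end, label, score). You could unpack each one with `SpanValueSchema(**value)`, but we have only checked this against the keywords shown there.

```python
from typing import Dict, List, Tuple


def find_spans(text: str, phrase: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of non-overlapping occurrences of phrase."""
    if not phrase:
        raise ValueError("phrase must not be empty")
    spans: List[Tuple[int, int]] = []
    start = text.find(phrase)
    while start != -1:
        # end points to the character right after the span,
        # so text[start:end] == phrase
        end = start + len(phrase)
        spans.append((start, end))
        start = text.find(phrase, end)
    return spans


def span_values(text: str, phrase: str, label: str, score: float = 1.0) -> List[Dict]:
    """Build span dictionaries using the same keyword names as SpanValueSchema."""
    return [
        {"start": start, "end": end, "label": label, "score": score}
        for start, end in find_spans(text, phrase)
    ]


text = "Acme Corp hired Ana, and Acme Corp is hiring again."
org_spans = span_values(text, "Acme Corp", "ORG")
```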
To learn more about this and all the other questions available in Feedback Datasets, check out our documentation on:
Changelog 1.26.0
Added
- If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
- Added support for span questions in the Python SDK. (#4617)
- Added support for span values in suggestions and responses. (#4623)
- Added span questions for FeedbackDataset. (#4622)
- Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)
Fixed
- Fixed contextualized workspaces. (#4665)
- Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
- Fixed parsing ranking values in suggestions from HF datasets. (#4629)
- Fixed reading description from API response payload. (#4632)
- Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
- Fixed client's resolution of enum values when calling the Search and Metrics API, to support Python >=3.11 enum handling. (#4672)
New Contributors
- @davidefiocco made their first contribution in #4639
Full Changelog: v1.25.0...v1.26.0
v1.25.0
🔆 Release highlights
Reorder labels
Admin and owner users can now change the order in which labels appear in the question form. To do this, go to the Questions tab inside Dataset Settings and move the labels until they are in the desired order.
Aligned SDK status filter
The missing status has been removed from the SDK filters. To filter records that don't have responses, you will now need to use the pending status like so:
filtered_dataset = dataset.filter_by(response_status="pending")
Learn more about how to use this filter in our docs
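Older scripts may still pass "missing". A small hypothetical helper like the one below can translate the filter value before it reaches filter_by. It assumes filter_by accepts either a single status string or a list of statuses. The helper name is ours and is not part of the SDK.

```python
from typing import List, Union

StatusFilter = Union[str, List[str]]


def migrate_status_filter(statuses: StatusFilter) -> StatusFilter:
    """Replace the removed "missing" status with "pending"."""
    if isinstance(statuses, str):
        return "pending" if statuses == "missing" else statuses
    migrated: List[str] = []
    for status in statuses:
        status = "pending" if status == "missing" else status
        if status not in migrated:  # avoid passing "pending" twice
            migrated.append(status)
    return migrated
```

For example, `dataset.filter_by(response_status=migrate_status_filter(["missing", "submitted"]))` would then filter on `["pending", "submitted"]`.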
Pandas 2.0 support
We’ve removed the pandas <2.0.0 restriction, so you can now use Argilla with pandas v1 or v2 safely.
Changelog 1.25.0
Note
For changes in the argilla-server module, visit the argilla-server release notes
Added
- Reorder labels in the dataset settings page for single/multi label questions. (#4598)
- Added pandas v2 support using the Python SDK. (#4600)
Removed
- Removed missing response for status filter. Use pending instead. (#4533)
Fixed
- Fixed FloatMetadataProperty: value is not a valid float (#4570)
- Fixed redirect to user-settings instead of 404 user_settings. (#4609)
New Contributors
Full Changelog: v1.24.0...v1.25.0
v1.24.0
Note
This release does not contain any new features, but it includes a major change in the argilla server.
The package now uses the argilla-server dependency defined here.
Full Changelog: v1.23.1...v1.24.0