All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Note
This version is the first release of the Argilla Server. Before this release, the Argilla Server was part of the Argilla SDK. Now, the Argilla Server is a separate package that can be installed and used independently of the Argilla SDK.
- Fixed problems using `ARGILLA_BASE_URL` environment variable. (#14)
- Added bulk annotation by filter criteria. (#4516)
- Automatically fetch new datasets on focus tab. (#4514)
- API v1 responses returning `Record` schema now always include `dataset_id` as attribute. (#4482)
- API v1 responses returning `Response` schema now always include `record_id` as attribute. (#4482)
- API v1 responses returning `Question` schema now always include `dataset_id` attribute. (#4487)
- API v1 responses returning `Field` schema now always include `dataset_id` attribute. (#4488)
- API v1 responses returning `MetadataProperty` schema now always include `dataset_id` attribute. (#4489)
- API v1 responses returning `VectorSettings` schema now always include `dataset_id` attribute. (#4490)
- Added `pdf_to_html` function to `.html_utils` module that converts PDFs to dataURL to be able to render them in the Argilla UI. (#4481)
- Added `ARGILLA_AUTH_SECRET_KEY` environment variable. (#4539)
- Added `ARGILLA_AUTH_ALGORITHM` environment variable. (#4539)
- Added `ARGILLA_AUTH_TOKEN_EXPIRATION` environment variable. (#4539)
- Added `ARGILLA_AUTH_OAUTH_CFG` environment variable. (#4546)
- Added OAuth2 support for HuggingFace Hub. (#4546)
- Deprecated `ARGILLA_LOCAL_AUTH_*` environment variables. Will be removed in the release v1.25.0. (#4539)
- Changed regex pattern for `username` attribute in `UserCreate`. Now uppercase letters are allowed. (#4544)
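
A minimal sketch of what a username check that accepts uppercase letters could look like. The pattern below is hypothetical and chosen only for illustration; the actual pattern used by `UserCreate` is not shown in this changelog.

```python
import re

# Hypothetical pattern (an assumption, not the server's real one):
# starts with a letter or digit, followed by letters, digits, "_" or "-".
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def is_valid_username(username: str) -> bool:
    """Return True when the whole string matches the illustrative pattern."""
    return USERNAME_PATTERN.fullmatch(username) is not None
```

With a pattern of this shape, a name such as `Alice_01` passes, while a name containing spaces does not.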
- Remove sending `Authorization` header from python SDK requests. (#4535)
- Fixed keyboard shortcut for label questions. (#4530)
- Added Bulk annotation support. (#4333)
- Restore filters from feedback dataset settings. (#4461)
- Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
- Added pydantic v2 support using the python SDK. (#4459)
- Added `vector_settings` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. (#4454)
- Added integration for `sentence-transformers` using `SentenceTransformersExtractor` to configure `vector_settings` in `FeedbackDataset` and `FeedbackRecord`. (#4454)
- Module `argilla.cli.server` definitions have been moved to `argilla.server.cli` module. (#4472)
- [breaking] Changed `vector_settings_by_name` for generic `property_by_name` usage, which will return `None` instead of raising an error. (#4454)
- The constant definition `ES_INDEX_REGEX_PATTERN` in module `argilla._constants` is now private. (#4472)
- `nan` values in metadata properties will raise a 422 error when creating/updating records. (#4300)
- `None` values are now allowed in metadata properties. (#4300)
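
Because `nan` metadata values are rejected while `None` is accepted, it can help to check records on the client before sending them. The helper below is a hypothetical sketch (not part of the Argilla SDK) that lists the offending metadata keys.

```python
import math
from typing import Any, Dict, List


def find_nan_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return the names of metadata properties whose value is a float NaN.

    None values are left alone, since the entries above say they are allowed.
    """
    return [
        name
        for name, value in metadata.items()
        if isinstance(value, float) and math.isnan(value)
    ]
```

A caller could replace the reported values with `None` (or drop them) before creating or updating records.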
- Paginating to a new record, automatically scrolls down to selected form area. (#4333)
- The `missing` response status for filtering records is deprecated and will be removed in the release v1.24.0. Use `pending` instead. (#4433)
- The deprecated `python -m argilla database` command has been removed. (#4472)
- Added new draft queue for annotation view (#4334)
- Added annotation metrics module for the `FeedbackDataset` (`argilla.client.feedback.metrics`). (#4175)
- Added strategy to handle and translate errors from the server for `401` HTTP status code. (#4362)
- Added integration for `textdescriptives` using `TextDescriptivesExtractor` to configure `metadata_properties` in `FeedbackDataset` and `FeedbackRecord`. (#4400) Contributed by @m-newhauser
- Added `POST /api/v1/me/responses/bulk` endpoint to create responses in bulk for current user. (#4380)
- Added list support for term metadata properties. (Closes #4359)
- Added new CLI task to reindex datasets and records into the search engine. (#4404)
- Added `httpx_extra_kwargs` argument to `rg.init` and `Argilla` to allow passing extra arguments to `httpx.Client` used by `Argilla`. (#4440)
- Added `ResponseStatusFilter` enum in `__init__` imports of Argilla (#4118). Contributed by @Piyush-Kumar-Ghosh.
- More productive and simpler shortcut system (#4215)
- Move `ArgillaSingleton`, `init` and `active_client` to a new module `singleton`. (#4347)
- Updated `argilla.load` functions to also work with `FeedbackDataset`s. (#4347)
- [breaking] Updated `argilla.delete` functions to also work with `FeedbackDataset`s. It now raises an error if the dataset does not exist. (#4347)
- Updated `argilla.list_datasets` functions to also work with `FeedbackDataset`s. (#4347)
- Fixed error in `TextClassificationSettings.from_dict` method in which the `label_schema` created was a list of `dict` instead of a list of `str`. (#4347)
- Fixed total records on pagination component (#4424)
- Removed `draft` auto save for annotation view (#4334)
- Added `GET /api/v1/datasets/:dataset_id/records/search/suggestions/options` endpoint to return suggestion available options for searching. (#4260)
- Added `metadata_properties` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. (#4192)
- Added `get_model_kwargs`, `get_trainer_kwargs`, `get_trainer_model`, `get_trainer_tokenizer` and `get_trainer` methods to the `ArgillaTrainer` to improve interoperability across frameworks. (#4214)
- Added additional formatting checks to the `ArgillaTrainer` to allow for better interoperability of `defaults` and `formatting_func` usage. (#4214)
- Added a warning to the `update_config` method of `ArgillaTrainer` to emphasize if the `kwargs` were updated correctly. (#4214)
- Added `argilla.client.feedback.utils` module with `html_utils` (this mainly includes `video/audio/image_to_html`, which convert media to dataURL to be able to render them in the Argilla UI, and `create_token_highlights` to highlight tokens in a custom way. Both work on `TextQuestion` and `TextField` with `use_markdown=True`) and `assignments` (this mainly includes `assign_records` to assign records according to a number of annotators and records, an overlap and the shuffle option; and `assign_workspace` to assign and create if needed a workspace according to the record assignment). (#4121)
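
To illustrate the idea behind record assignment with an overlap and a shuffle option, here is a standalone sketch. It is not the `assign_records` implementation: the function name, the `seed` parameter and the interpretation of `overlap` as "number of annotators who receive each record" are assumptions made for this example.

```python
import random
from typing import Any, Dict, List, Optional, Sequence


def sketch_assign(
    annotators: Sequence[str],
    records: Sequence[Any],
    overlap: int = 1,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Dict[str, List[Any]]:
    """Distribute records round-robin so each record goes to `overlap` annotators."""
    if not 1 <= overlap <= len(annotators):
        raise ValueError("overlap must be between 1 and the number of annotators")
    items = list(records)
    if shuffle:
        # A seeded generator keeps the shuffle reproducible.
        random.Random(seed).shuffle(items)
    assignments: Dict[str, List[Any]] = {name: [] for name in annotators}
    cursor = 0
    for record in items:
        for offset in range(overlap):
            annotator = annotators[(cursor + offset) % len(annotators)]
            assignments[annotator].append(record)
        cursor += overlap
    return assignments
```

With three annotators, three records and `overlap=2`, every record is seen by two different annotators and the load is spread evenly.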
- Fixed error in `ArgillaTrainer`, with numerical labels, using `RatingQuestion` instead of `RankingQuestion` (#4171)
- Fixed error in `ArgillaTrainer`, now we can train for `extractive_question_answering` using a validation sample (#4204)
- Fixed error in `ArgillaTrainer`, when training for `sentence-similarity` it didn't work with a list of values per record (#4211)
- Fixed error in the unification strategy for `RankingQuestion` (#4295)
- Fixed `TextClassificationSettings.labels_schema` order was not being preserved. Closes #3828 (#4332)
- Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
- Fixed error when passing `draft` responses to create records endpoint. (#4354)
- [breaking] Suggestions `agent` field now only accepts some specific characters and a limited length. (#4265)
- [breaking] Suggestions `score` field now only accepts float values in the range `0` to `1`. (#4266)
- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support optional `query` attribute. (#4327)
- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. (#4327)
- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support optional `query` attribute. (#4270)
- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. (#4270)
- Changed the logging style while pulling and pushing `FeedbackDataset` to Argilla from `tqdm` style to `rich`. (#4267). Contributed by @zucchini-nlp.
- Updated `push_to_argilla` to print `repr` of the pushed `RemoteFeedbackDataset` after push and changed `show_progress` to True by default. (#4223)
- Changed `models` and `tokenizer` for the `ArgillaTrainer` to explicitly allow for changing them when needed. (#4214).
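
The breaking change on the suggestion `score` field can be mirrored on the client side with a small check. This is a hypothetical helper, assuming the range is inclusive at both ends; the server's exact validation code is not shown in this changelog.

```python
def is_valid_suggestion_score(score: object) -> bool:
    """Return True for float values in the inclusive range 0 to 1.

    NaN fails the comparison and is therefore rejected as well.
    """
    return isinstance(score, float) and 0.0 <= score <= 1.0
```

Running such a check before pushing suggestions avoids a rejected request for scores like `1.5` or string values.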
- Added `POST /api/v1/datasets/:dataset_id/records/search` endpoint to search for records without user context, including responses by all users. (#4143)
- Added `POST /api/v1/datasets/:dataset_id/vectors-settings` endpoint for creating vector settings for a dataset. (#3776)
- Added `GET /api/v1/datasets/:dataset_id/vectors-settings` endpoint for listing the vectors settings for a dataset. (#3776)
- Added `DELETE /api/v1/vectors-settings/:vector_settings_id` endpoint for deleting a vector settings. (#3776)
- Added `PATCH /api/v1/vectors-settings/:vector_settings_id` endpoint for updating a vector settings. (#4092)
- Added `GET /api/v1/records/:record_id` endpoint to get a specific record. (#4039)
- Added support to include vectors for `GET /api/v1/datasets/:dataset_id/records` endpoint response using `include` query param. (#4063)
- Added support to include vectors for `GET /api/v1/me/datasets/:dataset_id/records` endpoint response using `include` query param. (#4063)
- Added support to include vectors for `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint response using `include` query param. (#4063)
- Added `show_progress` argument to `from_huggingface()` method to make the progress bar for parsing records process optional. (#4132)
- Added a progress bar for parsing records process to `from_huggingface()` method with `trange` in `tqdm`. (#4132)
- Added to sort by `inserted_at` or `updated_at` for datasets with no metadata. (#4147)
- Added `max_records` argument to `pull()` method for `RemoteFeedbackDataset`. (#4074)
- Added functionality to push your models to the Hugging Face hub with `ArgillaTrainer.push_to_huggingface` (#3976). Contributed by @Racso-3141.
- Added `filter_by` argument to `ArgillaTrainer` to filter by `response_status` (#4120).
- Added `sort_by` argument to `ArgillaTrainer` to sort by `metadata` (#4120).
- Added `max_records` argument to `ArgillaTrainer` to limit records used for training (#4120).
- Added `add_vector_settings` method to local and remote `FeedbackDataset`. (#4055)
- Added `update_vectors_settings` method to local and remote `FeedbackDataset`. (#4122)
- Added `delete_vectors_settings` method to local and remote `FeedbackDataset`. (#4130)
- Added `vector_settings_by_name` method to local and remote `FeedbackDataset`. (#4055)
- Added `find_similar_records` method to local and remote `FeedbackDataset`. (#4023)
- Added `ARGILLA_SEARCH_ENGINE` environment variable to configure the search engine to use. (#4019)
- [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
- [breaking] Users working with OpenSearch engines must use version >=2.4 and set `ARGILLA_SEARCH_ENGINE=opensearch`. (#4019 and #4111)
- [breaking] Changed `FeedbackDataset.*_by_name()` methods to return `None` when no match is found (#4101).
- [breaking] `limit` query parameter for `GET /api/v1/datasets/:dataset_id/records` endpoint now only accepts values greater than or equal to `1` and less than or equal to `1000`. (#4143)
- [breaking] `limit` query parameter for `GET /api/v1/me/datasets/:dataset_id/records` endpoint now only accepts values greater than or equal to `1` and less than or equal to `1000`. (#4143)
- Update `GET /api/v1/datasets/:dataset_id/records` endpoint to fetch record using the search engine. (#4142)
- Update `GET /api/v1/me/datasets/:dataset_id/records` endpoint to fetch record using the search engine. (#4142)
- Update `POST /api/v1/datasets/:dataset_id/records` endpoint to allow creating records with `vectors` (#4022)
- Update `PATCH /api/v1/datasets/:dataset_id` endpoint to allow updating `allow_extra_metadata` attribute. (#4112)
- Update `PATCH /api/v1/datasets/:dataset_id/records` endpoint to allow updating records with `vectors`. (#4062)
- Update `PATCH /api/v1/records/:record_id` endpoint to allow updating a record with `vectors`. (#4062)
- Update `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to allow searching records with vectors. (#4019)
- Update `BaseElasticAndOpenSearchEngine.index_records` method to also index record vectors. (#4062)
- Update `FeedbackDataset.__init__` to allow passing a list of vector settings. (#4055)
- Update `FeedbackDataset.push_to_argilla` to also push vector settings. (#4055)
- Update `FeedbackDatasetRecord` to support the creation of records with vectors. (#4043)
- Using cosine similarity to compute similarity between vectors. (#4124)
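
For readers unfamiliar with the metric named in the last entry, cosine similarity between two vectors can be computed as below. This is a plain-Python sketch of the formula, not the search engine's implementation; the function name is chosen for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

The result is 1 for vectors pointing in the same direction, 0 for orthogonal vectors and -1 for opposite ones, independent of vector length.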
- Fixed svg images out of screen with too large images (#4047)
- Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
- Fixed deleting or updating responses as an owner for annotators. (Commit 403a66d)
- Fixed passing user_id when getting records by id. (Commit 98c7927)
- Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)
- New `GET /api/v1/datasets/:dataset_id/metadata-properties` endpoint for listing dataset metadata properties. (#3813)
- New `POST /api/v1/datasets/:dataset_id/metadata-properties` endpoint for creating dataset metadata properties. (#3813)
- New `PATCH /api/v1/metadata-properties/:metadata_property_id` endpoint allowing the update of a specific metadata property. (#3952)
- New `DELETE /api/v1/metadata-properties/:metadata_property_id` endpoint for deletion of a specific metadata property. (#3911)
- New `GET /api/v1/metadata-properties/:metadata_property_id/metrics` endpoint to compute metrics for a specific metadata property. (#3856)
- New `PATCH /api/v1/records/:record_id` endpoint to update a record. (#3920)
- New `PATCH /api/v1/dataset/:dataset_id/records` endpoint to bulk update the records of a dataset. (#3934)
- Missing validations to `PATCH /api/v1/questions/:question_id`. Now `title` and `description` are using the same validations used to create questions. (#3967)
- Added `TermsMetadataProperty`, `IntegerMetadataProperty` and `FloatMetadataProperty` classes allowing to define metadata properties for a `FeedbackDataset`. (#3818)
- Added `metadata_filters` to `filter_by` method in `RemoteFeedbackDataset` to filter based on metadata i.e. `TermsMetadataFilter`, `IntegerMetadataFilter`, and `FloatMetadataFilter`. (#3834)
- Added a validation layer for both `metadata_properties` and `metadata_filters` in their schemas and as part of the `add_records` and `filter_by` methods, respectively. (#3860)
- Added `sort_by` query parameter to listing records endpoints that allows to sort the records by `inserted_at`, `updated_at` or metadata property. (#3843)
- Added `add_metadata_property` method to both `FeedbackDataset` and `RemoteFeedbackDataset` (i.e. `FeedbackDataset` in Argilla). (#3900)
- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema`. (#3822)
- Added support for `sort_by` for `RemoteFeedbackDataset` i.e. a `FeedbackDataset` uploaded to Argilla. (#3925)
- Added `metadata_properties` support for both `push_to_huggingface` and `from_huggingface`. (#3947)
- Add support for update records (`metadata`) from Python SDK. (#3946)
- Added `delete_metadata_properties` method to delete metadata properties. (#3932)
- Added `update_metadata_properties` method to update `metadata_properties`. (#3961)
- Added automatic model card generation through `ArgillaTrainer.save` (#3857)
- Added `FeedbackDataset` `TaskTemplateMixin` for pre-defined task templates. (#3969)
- A maximum limit of 50 on the number of options a ranking question can accept. (#3975)
- New `last_activity_at` field to `FeedbackDataset` exposing when the last activity for the associated dataset occurs. (#3992)
- `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records` and `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to return the `total` number of records. (#3848, #3903)
- Implemented `__len__` method for filtered datasets to return the number of records matching the provided filters. (#3916)
- Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
- Force elastic index refresh after records creation. (#3929)
- Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
- Using metadata property name instead of id for indexing data in search engine index. (#3994)
- Fixed response schemas to allow `values` to be `None` i.e. when a record is discarded the `response.values` are set to `None`. (#3926)
- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema` (#3822).
- Added automatic model card generation through `ArgillaTrainer.save` (#3857).
- Added task templates to the `FeedbackDataset` (#3973).
- Updated `Dockerfile` to use multi stage build (#3221 and #3793).
- Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
- Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
- FeedbackDataset API methods have been aligned to be accessible through the several implementations (#3937).
- The `unify_responses` support for remote datasets (#3937).
- Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
- Updated active learning for text classification notebooks to pass ids of type int to `TextClassificationRecord` (#3831).
- Fixed record fields validation that was preventing from logging records with optional fields (i.e. `required=False`) when the field value was `None` (#3846).
- Always set `pretrained_model_name_or_path` attribute as string in `ArgillaTrainer` (#3914).
- The `inserted_at` and `updated_at` attributes are created using the `utcnow` factory to avoid unexpected race conditions on timestamp creation (#3945)
- Fixed `configure_dataset_settings` when providing the workspace via the arg `workspace` (#3887).
- Fixed saving of models trained with `ArgillaTrainer` with a `peft_config` parameter (#3795).
- Fixed backwards compatibility on `from_huggingface` when loading a `FeedbackDataset` from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
- Fixed wrong `__repr__` problem for `TrainingTask`. (#3969)
- Fixed wrong key return error `prepare_for_training_with_*` for `TrainingTask`. (#3969)
- Function `rg.configure_dataset` is deprecated in favour of `rg.configure_dataset_settings`. The former will be removed in version 1.19.0
- Added `ArgillaTrainer` integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
- Added `ArgillaTrainer` integration with `TrainingTask.for_question_answering` (#3740)
- Added `Auto save record` to save automatically the current record that you are working on (#3541)
- Added `ArgillaTrainer` integration with OpenAI, allowing fine tuning for chat completion (#3615)
- Added `workspaces list` command to list Argilla workspaces (#3594).
- Added `datasets list` command to list Argilla datasets (#3658).
- Added `users create` command to create users (#3667).
- Added `whoami` command to get current user (#3673).
- Added `users delete` command to delete users (#3671).
- Added `users list` command to list users (#3688).
- Added `workspaces delete-user` command to remove a user from a workspace (#3699).
- Added `workspaces create` command to create an Argilla workspace (#3676).
- Added `datasets push-to-hub` command to push a `FeedbackDataset` from Argilla into the HuggingFace Hub (#3685).
- Added `info` command to get info about the used Argilla client and server (#3707).
- Added `datasets delete` command to delete a `FeedbackDataset` from Argilla (#3703).
- Added `created_at` and `updated_at` properties to `RemoteFeedbackDataset` and `FilteredRemoteFeedbackDataset` (#3709).
- Added handling `PermissionError` when executing a command with a logged in user with not enough permissions (#3717).
- Added `workspaces add-user` command to add a user to workspace (#3712).
- Added `workspace_id` param to `GET /api/v1/me/datasets` endpoint (#3727).
- Added `workspace_id` arg to `list_datasets` in the Python SDK (#3727).
- Added `argilla` script that allows to execute Argilla CLI using the `argilla` command (#3730).
- Added support for passing already initialized `model` and `tokenizer` instances to the `ArgillaTrainer` (#3751)
- Added `server_info` function to check the Argilla server information (also accessible via `rg.server_info`) (#3772).
- Move `database` commands under `server` group of commands (#3710)
- `server` commands only included in the CLI app when `server` extra requirements are installed (#3710).
- Updated `PUT /api/v1/responses/{response_id}` to replace `values` stored with received `values` in request (#3711).
- Display a `UserWarning` when the `user_id` in `Workspace.add_user` and `Workspace.delete_user` is the ID of a user with the owner role as they don't require explicit permissions (#3716).
- Rename `tasks` sub-package to `cli` (#3723).
- Changed `argilla database` command in the CLI to now be accessed via `argilla server database`, to be deprecated in the upcoming release (#3754).
- Changed `visible_options` (of label and multi label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
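
The `visible_options` rule in the last entry can be expressed as a short check. This is a hypothetical sketch, not the backend code; treating an unset value (`None`) as valid is an assumption of this example.

```python
from typing import Optional


def is_valid_visible_options(visible_options: Optional[int], num_options: int) -> bool:
    """Check that 3 <= visible_options <= number of provided options."""
    if visible_options is None:
        # Assumption: leaving the setting unset is allowed.
        return True
    return 3 <= visible_options <= num_options
```

Under this rule a question with five options accepts values from 3 to 5, while a question with only two options cannot set `visible_options` at all.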
- Fixed `remove user modification in text component on clear answers` (#3775)
- Fixed `Highlight raw text field in dataset feedback task` (#3731)
- Fixed `Field title too long` (#3734)
- Fixed error messages when deleting a `DatasetForTextClassification` (#3652)
- Fixed `Pending queue` pagination problems during data annotation (#3677)
- Fixed `visible_labels` default value to be 20 just when `visible_labels` not provided and `len(labels) > 20`, otherwise it will either be the provided `visible_labels` value or `None`, for `LabelQuestion` and `MultiLabelQuestion` (#3702).
- Fixed `DatasetCard` generation when `RemoteFeedbackDataset` contains suggestions (#3718).
- Add missing `draft` status in `ResponseSchema` as now there can be responses with `draft` status when annotating via the UI (#3749).
- Searches when queried words are distributed along the record fields (#3759).
- Fixed Python 3.11 compatibility issue with `/api/datasets` endpoints due to the `TaskType` enum replacement in the endpoint URL (#3769).
- Fixed `RankingValueSchema` and `FeedbackRankingValueModel` schemas to allow `rank=None` when `status=draft` (#3781).
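
The `visible_labels` default described in the list above can be read as the following decision rule. The function is a hypothetical sketch written from the changelog wording, not the SDK's code.

```python
from typing import Optional, Sequence


def resolve_visible_labels(
    labels: Sequence[str], visible_labels: Optional[int] = None
) -> Optional[int]:
    """Default to 20 only when nothing was provided and there are more than 20 labels."""
    if visible_labels is None and len(labels) > 20:
        return 20
    # Otherwise keep whatever was provided, which may be None.
    return visible_labels
```

So a question with 25 labels and no explicit setting shows 20, an explicit value is kept as is, and a short label list without a setting stays at `None`.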
- Fixed `Text component` text content sanitization behavior just for markdown to prevent the text from disappearing (#3738)
- Fixed `Text component` now you need to press Escape to exit the text area (#3733)
- Fixed `SearchEngine` was creating the same number of primary shards and replica shards for each `FeedbackDataset` (#3736).
- Added `Enable to update guidelines and dataset settings for Feedback Datasets directly in the UI` (#3489)
- Added `ArgillaTrainer` integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
- Added `formatting_func` to `ArgillaTrainer` for `FeedbackDataset` datasets to add a custom formatting for the data (#3599).
- Added `login` function in `argilla.client.login` to login into an Argilla server and store the credentials locally (#3582).
- Added `login` command to login into an Argilla server (#3600).
- Added `logout` command to logout from an Argilla server (#3605).
- Added `DELETE /api/v1/suggestions/{suggestion_id}` endpoint to delete a suggestion given its ID (#3617).
- Added `DELETE /api/v1/records/{record_id}/suggestions` endpoint to delete several suggestions linked to the same record given their IDs (#3617).
- Added `response_status` param to `GET /api/v1/datasets/{dataset_id}/records` to be able to filter by `response_status` as previously included for `GET /api/v1/me/datasets/{dataset_id}/records` (#3613).
- Added `list` classmethod to `ArgillaMixin` to be used as `FeedbackDataset.list()`, also including the `workspace` to list from as arg (#3619).
- Added `filter_by` method in `RemoteFeedbackDataset` to filter based on `response_status` (#3610).
- Added `list_workspaces` function (to be used as `rg.list_workspaces`, but `Workspace.list` is preferred) to list all the workspaces from a user in Argilla (#3641).
- Added `list_datasets` function (to be used as `rg.list_datasets`) to list the `TextClassification`, `TokenClassification`, and `Text2Text` datasets in Argilla (#3638).
- Added `RemoteSuggestionSchema` to manage suggestions in Argilla, including the `delete` method to delete suggestions from Argilla via `DELETE /api/v1/suggestions/{suggestion_id}` (#3651).
- Added `delete_suggestions` to `RemoteFeedbackRecord` to remove suggestions from Argilla via `DELETE /api/v1/records/{record_id}/suggestions` (#3651).
- Changed `Optional label for * mark for required question` (#3608)
- Updated `RemoteFeedbackDataset.delete_records` to use batch delete records endpoint (#3580).
- Included `allowed_for_roles` for some `RemoteFeedbackDataset`, `RemoteFeedbackRecords`, and `RemoteFeedbackRecord` methods that are only allowed for users with roles `owner` and `admin` (#3601).
- Renamed `ArgillaToFromMixin` to `ArgillaMixin` (#3619).
- Move `users` CLI app under `database` CLI app (#3593).
- Move server `Enum` classes to `argilla.server.enums` module (#3620).
- Fixed `Filter by workspace in breadcrumbs` (#3577)
- Fixed `Filter by workspace in datasets table` (#3604)
- Fixed `Query search highlight` for Text2Text and TextClassification (#3621)
- Fixed `RatingQuestion.values` validation to raise a `ValidationError` when values are out of range i.e. [1, 10] (#3626).
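
As an illustration of the range rule for rating values, the helper below reports which values fall outside [1, 10]. It is a hypothetical sketch; the real `RatingQuestion` raises a `ValidationError` through its schema rather than returning a list.

```python
from typing import List, Sequence


def out_of_range_ratings(values: Sequence[int], low: int = 1, high: int = 10) -> List[int]:
    """Return the rating values that are not within the inclusive range [low, high]."""
    return [value for value in values if not low <= value <= high]
```

An empty result means every value would pass the range check.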
- Removed `multi_task_text_token_classification` from `TaskType` as not used (#3640).
- Removed `argilla_id` in favor of `id` from `RemoteFeedbackDataset` (#3663).
- Removed `fetch_records` from `RemoteFeedbackDataset` as now the records are lazily fetched from Argilla (#3663).
- Removed `push_to_argilla` from `RemoteFeedbackDataset`, as it just works when calling it through a `FeedbackDataset` locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
- Removed `set_suggestions` in favor of `update(suggestions=...)` for both `FeedbackRecord` and `RemoteFeedbackRecord`, as all the updates of any "updateable" attribute of a record will go through `update` instead (#3663).
- Remove unused `owner` attribute for client Dataset data model (#3665)
- Fixed PostgreSQL database not being updated after `begin_nested` because of missing `commit` (#3567).
- Fixed `settings` could not be provided when updating a `rating` or `ranking` question (#3552).
- Added `PATCH /api/v1/fields/{field_id}` endpoint to update the field title and markdown settings (#3421).
- Added `PATCH /api/v1/datasets/{dataset_id}` endpoint to update dataset name and guidelines (#3402).
- Added `PATCH /api/v1/questions/{question_id}` endpoint to update question title, description and some settings (depending on the type of question) (#3477).
- Added `DELETE /api/v1/records/{record_id}` endpoint to remove a record given its ID (#3337).
- Added `pull` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) to pull all the records from it and return it as a local copy as a `FeedbackDataset` (#3465).
- Added `delete` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) (#3512).
- Added `delete_records` method in `RemoteFeedbackDataset`, and `delete` method in `RemoteFeedbackRecord` to delete records from Argilla (#3526).
- Improved efficiency of weak labeling when dataset contains vectors (#3444).
- Added `ArgillaDatasetMixin` to detach the Argilla-related functionality from the `FeedbackDataset` (#3427)
- Moved `FeedbackDataset`-related `pydantic.BaseModel` schemas to `argilla.client.feedback.schemas` instead, to be better structured and more scalable and maintainable (#3427)
- Update CLI to use database async connection (#3450).
- Limit rating questions values to the positive range [1, 10] (#3451).
- Updated `POST /api/users` endpoint to be able to provide a list of workspace names to which the user should be linked to (#3462).
- Updated Python client `User.create` method to be able to provide a list of workspace names to which the user should be linked to (#3462).
- Updated `GET /api/v1/me/datasets/{dataset_id}/records` endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
- Updated `POST /api/v1/me/datasets/{dataset_id}/records` endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
- Updated `SearchEngine.search` method to allow searching records matching one of the response statuses provided (#3359).
- After calling `FeedbackDataset.push_to_argilla`, the methods `FeedbackDataset.add_records` and `FeedbackRecord.set_suggestions` will automatically call Argilla with no need of calling `push_to_argilla` explicitly (#3465).
- Now calling `FeedbackDataset.push_to_huggingface` dumps the `responses` as a `List[Dict[str, Any]]` instead of `Sequence` to make it more readable via 🤗`datasets` (#3539).
- Fixed issue with `bool` values and `default` from Jinja2 while generating the HuggingFace `DatasetCard` from `argilla_template.md` (#3499).
- Fixed `DatasetConfig.from_yaml` which was failing when calling `FeedbackDataset.from_huggingface` as the UUIDs cannot be deserialized automatically by `PyYAML`, so UUIDs are neither dumped nor loaded anymore (#3502).
- Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
- `TextClassificationSettings` and `TokenClassificationSettings` labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
- Fixed `PUT /api/v1/datasets/{dataset_id}/publish` to check whether at least one field and question has `required=True` (#3511).
- Fixed `FeedbackDataset.from_huggingface` as `suggestions` were being lost when there were no `responses` (#3539).
- Fixed `QuestionSchema` and `FieldSchema` not validating `name` attribute (#3550).
- After calling `FeedbackDataset.push_to_argilla`, calling `push_to_argilla` again won't do anything since the dataset is already pushed to Argilla (#3465).
- After calling `FeedbackDataset.push_to_argilla`, calling `fetch_records` won't do anything since the records are lazily fetched from Argilla (#3465).
- After calling `FeedbackDataset.push_to_argilla`, the Argilla ID is no longer stored in the attribute/property `argilla_id` but in `id` instead (#3465).
- Fixed `ModuleNotFoundError` caused because the `argilla.utils.telemetry` module used in the `ArgillaTrainer` was importing an optional dependency not installed by default (#3471).
- Fixed `ImportError` caused because the `argilla.client.feedback.config` module was importing `pyyaml` optional dependency not installed by default (#3471).
- The `suggestion_type_enum` ENUM data type created in PostgreSQL didn't have any value (#3445).
- Fix database migration for PostgreSQL (See #3438)
- Added `GET /api/v1/users/{user_id}/workspaces` endpoint to list the workspaces to which a user belongs (#3308 and #3343).
- Added `HuggingFaceDatasetMixin` for internal usage, to detach the `FeedbackDataset` integrations from the class itself, and use Mixins instead (#3326).
- Added `GET /api/v1/records/{record_id}/suggestions` API endpoint to get the list of suggestions for the responses associated to a record (#3304).
- Added `POST /api/v1/records/{record_id}/suggestions` API endpoint to create a suggestion for a response associated to a record (#3304).
- Added support for `RankingQuestionStrategy`, `RankingQuestionUnification` and the `.for_text_classification` method for the `TrainingTaskMapping` (#3364)
- Added `PUT /api/v1/records/{record_id}/suggestions` API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
- Added `suggestions` attribute to `FeedbackRecord`, and allow adding and retrieving suggestions from the Python client (#3370)
- Added `allowed_for_roles` Python decorator to check whether the current user has the required role to access the decorated function/method for `User` and `Workspace` (#3383)
- Added API and Python Client support for workspace deletion (Closes #3260)
- Added `GET /api/v1/me/workspaces` endpoint to list the workspaces of the current active user (#3390)
- Updated output payload for `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records`, `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to include the suggestions of the records based on the value of the `include` query parameter (#3304).
- Updated `POST /api/v1/datasets/{dataset_id}/records` input payload to add suggestions (#3304).
- The `POST /api/datasets/:dataset-id/:task/bulk` endpoints don't create the dataset if it does not exist (Closes #3244)
- Added Telemetry support for `ArgillaTrainer` (closes #3325)
- `User.workspaces` is no longer an attribute but a property, and is calling `list_user_workspaces` to list all the workspace names for a given user ID (#3334)
- Renamed `FeedbackDatasetConfig` to `DatasetConfig` and export/import from YAML as default instead of JSON (just used internally on `push_to_huggingface` and `from_huggingface` methods of `FeedbackDataset`) (#3326).
- The protected metadata fields support other than textual info - existing datasets must be reindexed. See docs for more detail (Closes #3332).
- Updated `Dockerfile` parent image from `python:3.9.16-slim` to `python:3.10.12-slim` (#3425).
- Updated `quickstart.Dockerfile` parent image from `elasticsearch:8.5.3` to `argilla/argilla-server:${ARGILLA_VERSION}` (#3425).
- Removed support for non-prefixed environment variables. All valid env vars start with `ARGILLA_` (See #3392).
- Fixed `GET /api/v1/me/datasets/{dataset_id}/records` endpoint always returning the responses for the records even if `responses` was not provided via the `include` query parameter (#3304).
- Values for protected metadata fields are not truncated (Closes #3331).
- Big number ids are properly rendered in UI (Closes #3265)
- Fixed `ArgillaDatasetCard` to include the values/labels for all the existing questions (#3366)
- Integer support for record id in text classification, token classification and text2text datasets.
- Using `rg.init` with default `argilla` user skips setting the default workspace if not available. (Closes #3340)
- Resolved wrong import structure for `ArgillaTrainer` and `TrainingTaskMapping` (Closes #3345)
- Pin pydantic dependency to version < 2 (Closes #3348)
- Added `RankingQuestionSettings` class allowing to create ranking questions in the API using `POST /api/v1/datasets/{dataset_id}/questions` endpoint (#3232)
- Added `RankingQuestion` in the Python client to create ranking questions (#3275).
- Added `Ranking` component in feedback task question form (#3177 & #3246).
- Added `FeedbackDataset.prepare_for_training` method for generating a framework-specific dataset with the responses provided for `RatingQuestion`, `LabelQuestion` and `MultiLabelQuestion` (#3151).
- Added `ArgillaSpaCyTransformersTrainer` class for supporting the training with `spacy-transformers` (#3256).
- Added instructions for how to run the Argilla frontend in the developer docs (#3314).
- All docker related files have been moved into the `docker` folder (#3053).
- `release.Dockerfile` has been renamed to `Dockerfile` (#3133).
- Updated `rg.load` function to raise a `ValueError` with an explanatory message for the cases in which the user tries to use the function to load a `FeedbackDataset` (#3289).
- Updated `ArgillaSpaCyTrainer` to allow re-using `tok2vec` (#3256).
- Check available workspaces on Argilla on `rg.set_workspace` (Closes #3262)
- Replaced `np.float` alias by `float` to avoid `AttributeError` when using `find_label_errors` function with `numpy>=1.24.0` (#3214).
- Fixed `format_as("datasets")` when no responses or optional responses in `FeedbackRecord`, to set their value to what 🤗 Datasets expects instead of just `None` (#3224).
- Fixed `push_to_huggingface()` when `generate_card=True` (default behaviour), as we were passing a sample record to the `ArgillaDatasetCard` class, and `UUID`s introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
- Fixed `from_argilla` and `push_to_argilla` to ensure consistency on both field and question re-construction, and to ensure `UUID`s are properly serialized as `str`, respectively (#3234).
- Refactored usage of `import argilla as rg` to clarify package navigation (#3279).
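
The two `UUID` fixes in the list above rest on a general Python behaviour: the standard `json` module cannot encode `uuid.UUID` objects, so they have to be converted to `str` first. A minimal sketch of that behaviour follows; the `record` dictionary and its field names are made up for illustration and are not Argilla's record schema.

```python
import json
import uuid

# Hypothetical sample record; the keys are illustrative only
record = {"id": uuid.uuid4(), "fields": {"text": "Hello"}}

try:
    json.dumps(record)
    raised = False
except TypeError:
    # The json module has no built-in encoder for uuid.UUID
    raised = True

# `default=str` is called for every object json cannot encode (here, the UUID)
serialized = json.dumps(record, default=str)
restored = json.loads(serialized)

print(raised, type(restored["id"]).__name__)
```

After the round trip the identifier is a plain string, which is the form a JSON payload or dataset card can carry.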
- Fixed URLs in Weak Supervision with Sentence Transformers tutorial #3243.
- Fixed library buttons' formatting on Tutorials page (#3255).
- Modified styling of error code outputs in notebooks (#3270).
- Added ElasticSearch and OpenSearch versions (#3280).
- Removed template notebook from table of contents (#3271).
- Fixed tutorials with `pip install argilla` to not use older versions of the package (#3282).
- Added `metadata` attribute to the `Record` of the `FeedbackDataset` (#3194)
- New `users update` command to update the role for an existing user (#3188)
- New `Workspace` class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
- Added `User` class to let users manage their Argilla users via the Python client (#3169).
- Added an option to display `tqdm` progress bar to `FeedbackDataset.push_to_argilla` when looping over the records to upload (#3233).
- The role system now supports three different roles: `owner`, `admin` and `annotator` (#3104)
- `admin` role is scoped to workspace-level operations (#3115)
- The `owner` user is created among the default pool of users in the quickstart, and the default user in the server now has the `owner` role (#3248), reverting (#3188).
- As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
- Added search component for feedback datasets (#3138)
- Added markdown support for feedback dataset guidelines (#3153)
- Added Train button for feedback datasets (#3170)
- Updated `SearchEngine` and `POST /api/v1/me/datasets/{dataset_id}/records/search` to return the `total` number of records matching the search query (#3166)
- Replaced Enum for string value in URLs for client API calls (Closes #3149)
- Resolve breaking issue with `ArgillaSpanMarkerTrainer` for Named Entity Recognition with `span_marker` v1.1.x onwards.
- Move `ArgillaDatasetCard` import under `@requires_version` decorator, so that the `ImportError` on `huggingface_hub` is handled properly (#3174)
- Allow flow `FeedbackDataset.from_argilla` -> `FeedbackDataset.push_to_argilla` under different dataset names and/or workspaces (#3192)
- Added boolean `use_markdown` property to `TextFieldSettings` model.
- Added boolean `use_markdown` property to `TextQuestionSettings` model.
- Added new status `draft` for the `Response` model.
- Added `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API (#3005)
- Added `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
- Added `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoint (#3068).
- Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
- Added docstrings to the `pydantic.BaseModel`s defined at `argilla/client/feedback/schemas.py` (#3137)
- Added the information about executing tests in the developer documentation ([#3143]).
- Updated `GET /api/v1/me/datasets/:dataset_id/metrics` output payload to include the count of responses with `draft` status.
- Added `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API.
- Added `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API.
- Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
- Updated `alembic` setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
- Improved `DatasetCard` generation on `FeedbackDataset.push_to_huggingface` when `generate_card=True`, following the official HuggingFace Hub template, but suited to `FeedbackDataset`s from Argilla (#3110)
- Disallow `fields` and `questions` in `FeedbackDataset` with the same name (#3126).
- Fixed broken links in the documentation and updated the development branch name from `development` to `develop` ([#3145]).
- `/api/v1/datasets` new endpoint to list and create datasets (#2615).
- `/api/v1/datasets/{dataset_id}` new endpoint to get and delete datasets (#2615).
- `/api/v1/datasets/{dataset_id}/publish` new endpoint to publish a dataset (#2615).
- `/api/v1/datasets/{dataset_id}/questions` new endpoint to list and create dataset questions (#2615)
- `/api/v1/datasets/{dataset_id}/fields` new endpoint to list and create dataset fields (#2615)
- `/api/v1/datasets/{dataset_id}/questions/{question_id}` new endpoint to delete a dataset question (#2615)
- `/api/v1/datasets/{dataset_id}/fields/{field_id}` new endpoint to delete a dataset field (#2615)
- `/api/v1/workspaces/{workspace_id}` new endpoint to get workspaces by id (#2615)
- `/api/v1/responses/{response_id}` new endpoint to update and delete a response (#2615)
- `/api/v1/datasets/{dataset_id}/records` new endpoint to create and list dataset records (#2615)
- `/api/v1/me/datasets` new endpoint to list user visible datasets (#2615)
- `/api/v1/me/dataset/{dataset_id}/records` new endpoint to list dataset records with user responses (#2615)
- `/api/v1/me/datasets/{dataset_id}/metrics` new endpoint to get the dataset user metrics (#2615)
- `/api/v1/me/records/{record_id}/responses` new endpoint to create record user responses (#2615)
- showing new feedback task datasets in datasets list ([#2719])
- new page for feedback task ([#2680])
- show feedback task metrics ([#2822])
- user can delete dataset in dataset settings page ([#2792])
- Support for `FeedbackDataset` in Python client (parent PR #2615, and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
- Integration with the HuggingFace Hub ([#2949])
- Added `ArgillaPeftTrainer` for text and token classification #2854
- Added `predict_proba()` method to `ArgillaSetFitTrainer`
- Added `ArgillaAutoTrainTrainer` for Text Classification #2664
- New `database revisions` command showing database revisions info
- Avoid rendering html for invalid html strings in Text2text ([#2911])
- The `database migrate` command accepts a `--revision` param to provide specific revision id
- `tokens_length` metrics function returns empty data (#3045)
- `token_length` metrics function returns empty data (#3045)
- `mention_length` metrics function returns empty data (#3045)
- `entity_density` metrics function returns empty data (#3045)
- Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
- `tokens_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `token_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `mention_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `entity_density` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- Removed mention `density`, `tokens_length` and `chars_length` metrics from token classification metrics storage (#3045)
- Removed token `char_start`, `char_end`, `tag`, and `score` metrics from token classification metrics storage (#3045)
- Removed tags-related metrics from token classification metrics storage (#3045)
- add `max_retries` and `num_threads` parameters to `rg.log` to run data logging requests concurrently with backoff retry policy. See #2458 and #2533
- `rg.load` accepts `include_vectors` and `include_metrics` when loading data. Closes #2398
- Added `settings` param to `prepare_for_training` (#2689)
- Added `prepare_for_training` for `openai` (#2658)
- Added `ArgillaOpenAITrainer` (#2659)
- Added `ArgillaSpanMarkerTrainer` for Named Entity Recognition (#2693)
- Added `ArgillaTrainer` CLI support. Closes (#2809)
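
The "concurrent logging with backoff retry" idea from the first entry of the list above can be illustrated with the standard library alone. This is a generic sketch, not Argilla's code: `send_with_retry`, `log_concurrently`, `flaky_send` and the `base_delay` value are hypothetical, and only the parameter names `max_retries` and `num_threads` are borrowed from the entry.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_with_retry(send, batch, max_retries=3, base_delay=0.01):
    """Call send(batch); on failure retry up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except Exception:
            if attempt == max_retries:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** attempt)


def log_concurrently(send, batches, num_threads=2, max_retries=3):
    """Send every batch through a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda b: send_with_retry(send, b, max_retries), batches))


attempts = {}


def flaky_send(batch):
    # Hypothetical sender that fails on the first attempt for each batch
    key = tuple(batch)
    attempts[key] = attempts.get(key, 0) + 1
    if attempts[key] == 1:
        raise ConnectionError("temporary failure")
    return len(batch)


results = log_concurrently(flaky_send, [[1, 2], [3], [4, 5, 6]])
print(results)
```

Each batch fails once and succeeds on its retry, so the call returns the batch sizes in submission order.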
- fix image alignment on token classification
- Argilla quickstart image dependencies are externalized into `quickstart.requirements.txt`. See #2666
- bulk endpoints will upsert data when record `id` is present. Closes #2535
- moved from `click` to `typer` CLI support. Closes (#2815)
- Argilla server docker image is built with PostgreSQL support. Closes #2686
- The `rg.log` computes all batches and raises an error for all failed batches.
- The default batch size for `rg.log` is now 100.
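
Two of the changes in the list above, upserting by record `id` and logging in batches of 100, can be pictured with an in-memory sketch. It assumes a plain dictionary as the store and dictionaries as records; `iter_batches` and `upsert_batch` are hypothetical helpers, not Argilla functions, and giving records without an `id` a fresh one is an assumption of this sketch.

```python
import uuid


def iter_batches(records, batch_size=100):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def upsert_batch(store, batch):
    """Overwrite records whose id already exists; insert the rest."""
    for record in batch:
        record_id = record.get("id")
        if record_id is None:
            # Assumption: records without an id get a fresh one and are always inserted
            record_id = str(uuid.uuid4())
        store[record_id] = {**record, "id": record_id}


store = {}
records = [{"id": i, "text": f"v1-{i}"} for i in range(250)]

batch_sizes = []
for batch in iter_batches(records):
    batch_sizes.append(len(batch))
    upsert_batch(store, batch)

# Logging id 0 again updates the existing record; the id-less record is added
upsert_batch(store, [{"id": 0, "text": "v2-0"}, {"text": "no id"}])

print(batch_sizes, len(store), store[0]["text"])
```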
- `argilla.training` bugfixes and unification (#2665)
- Resolved several small bugs in the `ArgillaTrainer`.
- The `rg.log_async` function is deprecated and will be removed in next minor release.
- `ARGILLA_HOME_PATH` new environment variable (#2564).
- `ARGILLA_DATABASE_URL` new environment variable (#2564).
- Basic support for user roles with `admin` and `annotator` (#2564).
- `id`, `first_name`, `last_name`, `role`, `inserted_at` and `updated_at` new user fields (#2564).
- `/api/users` new endpoint to list and create users (#2564).
- `/api/users/{user_id}` new endpoint to delete users (#2564).
- `/api/workspaces` new endpoint to list and create workspaces (#2564).
- `/api/workspaces/{workspace_id}/users` new endpoint to list workspace users (#2564).
- `/api/workspaces/{workspace_id}/users/{user_id}` new endpoint to create and delete workspace users (#2564).
- `argilla.tasks.users.migrate` new task to migrate users from old YAML file to database (#2564).
- `argilla.tasks.users.create` new task to create a user (#2564).
- `argilla.tasks.users.create_default` new task to create a user with default credentials (#2564).
- `argilla.tasks.database.migrate` new task to execute database migrations (#2564).
- `release.Dockerfile` and `quickstart.Dockerfile` now create a default `argilladata` volume to persist data (#2564).
- Add user settings page. Closes #2496
- Added `Argilla.training` module with support for `spacy`, `setfit`, and `transformers`. Closes #2504
- Now the `prepare_for_training` method is working when `multi_label=True`. Closes #2606
- `ARGILLA_USERS_DB_FILE` environment variable is now only used to migrate users from YAML file to database (#2564).
- `full_name` user field is now deprecated and `first_name` and `last_name` should be used instead (#2564).
- `password` user field now requires a minimum of `8` and a maximum of `100` characters in size (#2564).
- `quickstart.Dockerfile` image default users from `team` and `argilla` to `admin` and `annotator` including new passwords and API keys (#2564).
- Datasets to be managed only by users with `admin` role (#2564).
- The list of rules is now accessible while metrics are computed. Closes #2117
- Style updates for weak labeling and adding feedback toast when deleting rules. See #2626 and #2648
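
The `password` length constraint noted in the list above (minimum `8`, maximum `100` characters) amounts to a simple bounds check. The sketch below is illustrative only: `validate_password` and the two constants are hypothetical names, and the server's actual validation may be implemented differently.

```python
MIN_PASSWORD_LENGTH = 8
MAX_PASSWORD_LENGTH = 100


def validate_password(password: str) -> str:
    """Return the password unchanged if its length is within bounds, else raise ValueError."""
    if not MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH:
        raise ValueError(
            f"password must have between {MIN_PASSWORD_LENGTH} "
            f"and {MAX_PASSWORD_LENGTH} characters, got {len(password)}"
        )
    return password


validate_password("12345678")  # accepted: exactly 8 characters

try:
    validate_password("1234567")  # rejected: only 7 characters
except ValueError as error:
    print(error)
```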
- `email` user field (#2564).
- `disabled` user field (#2564).
- Support for private workspaces (#2564).
- `ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY` and `ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD` environment variables. Use `python -m argilla.tasks.users.create_default` instead (#2564).
- The old headers for `API Key` and `workspace` from python client
- The default value for old `API Key` constant. Closes #2251
1.5.1 - 2023-03-30
- Copying datasets between workspaces with proper owner/workspace info. Closes #2562
- Copy dataset with empty workspace to the default user workspace 905d4de
- Using elasticsearch config to request backend version. Closes #2311
- Remove sorting by score in labels. Closes #2622
- Update field name in metadata for image url. See #2609
- Improvements in tutorial doc cards. Closes #2216
1.5.0 - 2023-03-21
- Add the fields to retrieve when loading the data from argilla. `rg.load` takes too long because of the vector field, even when users don't need it. Closes #2398
- Add new page and components for dataset settings. Closes #2442
- Add ability to show image in records (for TokenClassification and TextClassification) if an URL is passed in metadata with the key _image_url
- Non-searchable fields support in metadata. #2570
- Add record ID references to the prepare for training methods. Closes #2483
- Add tutorial on Image Classification. #2420
- Add Train button, visible for "admin" role, with code snippets from a selection of libraries. Closes #2591
- Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see argilla-io/argilla#2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
- The shortcuts improvement for labels #2339 have been moved to the vuex ORM in dataset settings feature #2444
- Update "Define a labeling schema" section in docs.
- The record inputs are sorted alphabetically in UI by default. #2581
- The record inputs are fully visible when pagination size is one and the height of collapsed area size is bigger for laptop screen. #2587
- Allow URL to be clickable in Jupyter notebook again. Closes #2527
- Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client `<v1.3.0`
- Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version `<1.3.0`
- Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.