
Release 1.4.0 #2500

Merged (53 commits, Mar 9, 2023)

Commits
e8a8f89
[pre-commit.ci] pre-commit autoupdate (#2299)
pre-commit-ci[bot] Feb 7, 2023
a09c74d
Add deploy to readme (#2307)
dvsrepo Feb 7, 2023
2bc97c6
Update README.md (#2308)
dvsrepo Feb 7, 2023
d354274
chore: Upgrade package version
Feb 9, 2023
ff4e79f
chore: Merge branch 'main' into develop
Feb 9, 2023
5f0627c
ci: Replace `isort` by `ruff` in `pre-commit` (#2325)
tomaarsen Feb 14, 2023
91a77ad
Docs: Update readme with quickstart section and new links to guides (…
dvsrepo Feb 14, 2023
6d2885f
Enhancement: Also validate records on assignment of variables (#2337)
tomaarsen Feb 15, 2023
179ffb9
Enhancement: Distinguish between error message and context in validat…
tomaarsen Feb 15, 2023
c83ec9e
ci: Setup black line-length in toml file (#2352)
frascuchon Feb 16, 2023
4c5f513
Use `rich` for logging, tracebacks, printing, progressbars (#2350)
tomaarsen Feb 16, 2023
5a8bb28
chore: Replace old recognai emails with argilla ones (#2365)
jfcalvo Feb 20, 2023
175a52a
refactor: remove the classification labeling rules service (#2361)
frascuchon Feb 20, 2023
3c27fd5
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] Feb 21, 2023
8f0d10d
Documentation update: adding missing n (#2362)
Gnonpi Feb 21, 2023
8f90000
chore: Merge branch 'main' into develop
frascuchon Feb 22, 2023
5ab38be
ci: Remove Pyre from CI (#2358)
tomaarsen Feb 22, 2023
e999fe2
Refactor/deprecate dataset owner (#2386)
frascuchon Feb 22, 2023
4e623d4
feat: Add `active_client` function to main argilla module (#2387)
frascuchon Feb 22, 2023
4e18c6b
Refactor/remove no workspace usage and better superuser computation (…
frascuchon Feb 22, 2023
317ce42
ci: remove checkpoint from PR template (#2390)
keithCuniah Feb 22, 2023
ed55719
[pre-commit.ci] pre-commit autoupdate (#2376)
tomaarsen Feb 22, 2023
c35d63a
CI: Skip rather than failing in 2 common scenarios (#2392)
tomaarsen Feb 24, 2023
68eddcb
Refactor: Replace "ar" with "rg" in test imports (#2393)
tomaarsen Feb 27, 2023
c27672f
Refactor: Add `require_version` function and `requires_version` decor…
tomaarsen Feb 27, 2023
fd72834
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] Feb 28, 2023
b127ff0
[pre-commit.ci] pre-commit autoupdate (#2431)
tomaarsen Feb 28, 2023
4a92b35
feat: Extend shortcuts to include alphabet for token classification (…
cceyda Feb 28, 2023
f5834a5
refactor: Improve efficiency of `.scan` (and `.load`) if `limit` is …
tomaarsen Mar 1, 2023
d789fa1
fix: added regex match to set workspace method (#2427)
davidberenstein1957 Mar 2, 2023
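The fix in #2427 adds a regex match to the set-workspace method. The actual pattern is not visible in this diff, so the sketch below is an assumption about what such validation might look like:

```python
import re

# Assumed pattern — the regex Argilla actually uses in #2427 is not shown here.
WORKSPACE_PATTERN = re.compile(r"^[a-zA-Z0-9_\-]+$")


def set_workspace(name: str) -> str:
    """Return the workspace name if it matches the allowed pattern."""
    if not WORKSPACE_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid workspace name: {name!r}")
    return name
```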
fc71c3b
fix: error when loading record with empty string query (#2429)
davidberenstein1957 Mar 2, 2023
6649e5f
Refactor/prepare datasets endpoints (#2403)
frascuchon Mar 2, 2023
a456f58
chore: Merge branch 'main' into develop
frascuchon Mar 2, 2023
ca06deb
Merge branch 'develop' of github.com:recognai/rubrix into develop
frascuchon Mar 2, 2023
40ca933
refactor: Make workspace required in requests (#2471)
frascuchon Mar 3, 2023
b3b897a
feat: Allow passing workspace as client param for `rg.log` or `rg.loa…
davidberenstein1957 Mar 6, 2023
3ebea76
feat: Deprecate `chunk_size` in favor of `batch_size` for `rg.log` (#…
tomaarsen Mar 6, 2023
e25be3e
feat: Expose `batch_size` parameter for `rg.load` (#2460)
tomaarsen Mar 6, 2023
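The two commits above deprecate `chunk_size` in favor of `batch_size` for `rg.log` and expose `batch_size` on `rg.load` (#2460). As a hedged illustration of the deprecation-shim pattern involved (not Argilla's actual implementation), the old argument can keep working while warning callers:

```python
import warnings


def log(records, name, batch_size=500, chunk_size=None):
    """Sketch of a deprecation shim: keep accepting the old `chunk_size`
    argument, but warn and funnel its value into `batch_size`.
    Returns the number of batches that would be sent."""
    if chunk_size is not None:
        warnings.warn(
            "`chunk_size` is deprecated and will be removed in a future "
            "release; please use `batch_size` instead.",
            DeprecationWarning,
        )
        batch_size = chunk_size
    # Slice the records into batches of at most `batch_size` items.
    batches = [records[i : i + batch_size] for i in range(0, len(records), batch_size)]
    return len(batches)
```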
7199780
docs: Add AutoTrain to readme
dvsrepo Mar 6, 2023
5600301
fix: added flexible app redirect to docs page (#2428)
davidberenstein1957 Mar 6, 2023
21efb83
feat: Add text2text support for prepare for training spark nlp (#2466)
davidberenstein1957 Mar 6, 2023
58aa9f9
[pre-commit.ci] pre-commit autoupdate (#2490)
pre-commit-ci[bot] Mar 7, 2023
4aecb13
Feat/python api sort support (#2487)
davidberenstein1957 Mar 7, 2023
3fce915
feat: Bulk annotation improvement (#2437)
leiyre Mar 8, 2023
7b89393
chore: set release version
frascuchon Mar 8, 2023
93fa938
chore: Update dev version
frascuchon Mar 8, 2023
b49fa2b
chore: Merge branch 'main' into develop
frascuchon Mar 8, 2023
c9a30fb
chore: Merge branch 'develop' into releases/1.4.0
frascuchon Mar 8, 2023
8551dc5
refactor: send username when no workspace setup (#2499)
frascuchon Mar 8, 2023
d2ed1fc
chore: Review active client doc references (#2501)
frascuchon Mar 8, 2023
29c9ee3
feat: `configure_dataset` accepts a workspace as argument (#2503)
frascuchon Mar 8, 2023
a97b7d0
Docs for bulk annotation (#2506)
nataliaElv Mar 8, 2023
3a91bbc
chore: Set workspace to None as default for dataset configuration
frascuchon Mar 9, 2023
24 changes: 20 additions & 4 deletions .github/workflows/package.yml
@@ -208,6 +208,11 @@ jobs:
defaults:
run:
shell: bash -l {0}
# Only build the package if we can deploy it as a docker image
env:
IS_DEPLOYABLE: ${{ secrets.AR_DOCKER_USERNAME != '' }}
outputs:
code_changes: ${{ steps.filter.outputs.code_changes }}

steps:
- name: Checkout Code 🛎
@@ -228,7 +233,7 @@
- '.github/workflows/package.yml'
- name: Cache pip 👜
uses: actions/cache@v2
if: steps.filter.outputs.code_changes == 'true'
if: steps.filter.outputs.code_changes == 'true' && env.IS_DEPLOYABLE == 'true'
env:
# Increase this value to reset cache if pyproject.toml has not changed
CACHE_NUMBER: 0
@@ -238,18 +243,18 @@

- name: Setup Node.js
uses: actions/setup-node@v2
if: steps.filter.outputs.code_changes == 'true'
if: steps.filter.outputs.code_changes == 'true' && env.IS_DEPLOYABLE == 'true'
with:
node-version: "14"

- name: Build Package 🍟
if: steps.filter.outputs.code_changes == 'true'
if: steps.filter.outputs.code_changes == 'true' && env.IS_DEPLOYABLE == 'true'
run: |
pip install -U build
scripts/build_distribution.sh

- name: Upload package artifact
if: steps.filter.outputs.code_changes == 'true'
if: steps.filter.outputs.code_changes == 'true' && env.IS_DEPLOYABLE == 'true'
uses: actions/upload-artifact@v2
with:
name: python-package
@@ -262,6 +267,9 @@ jobs:
- build
- test-elastic
- test-opensearch
env:
IS_DEPLOYABLE: ${{ secrets.AR_DOCKER_USERNAME != '' }}
if: needs.build.outputs.code_changes == 'true'
strategy:
matrix:
include:
@@ -281,30 +289,36 @@
steps:
- name: Checkout Code 🛎
uses: actions/checkout@v2
if: env.IS_DEPLOYABLE == 'true'

- name: Download python package
uses: actions/download-artifact@v2
with:
name: python-package
path: dist
if: env.IS_DEPLOYABLE == 'true'

- name: Set up QEMU
uses: docker/setup-qemu-action@v2
if: env.IS_DEPLOYABLE == 'true'

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
if: env.IS_DEPLOYABLE == 'true'

- name: Docker meta
id: meta
uses: crazy-max/ghaction-docker-meta@v2
with:
images: ${{ matrix.image }}
if: env.IS_DEPLOYABLE == 'true'

- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.AR_DOCKER_USERNAME }}
password: ${{ secrets.AR_DOCKER_PASSWORD }}
if: env.IS_DEPLOYABLE == 'true'

- name: Build & push Docker image
uses: docker/build-push-action@v2
@@ -315,6 +329,7 @@ jobs:
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
push: true
if: env.IS_DEPLOYABLE == 'true'

- name: Docker Hub Description
uses: peter-evans/dockerhub-description@v3
@@ -323,6 +338,7 @@
password: ${{ secrets.AR_DOCKER_PASSWORD }}
repository: ${{ matrix.image }}
readme-filepath: ${{ matrix.readme }}
if: env.IS_DEPLOYABLE == 'true'

# This job will upload a Python Package using Twine when a release is created
# For more information see:
78 changes: 0 additions & 78 deletions .github/workflows/pyre-check.yml

This file was deleted.

12 changes: 8 additions & 4 deletions .pre-commit-config.yaml
@@ -20,15 +20,19 @@ repos:
# - --remove-header

- repo: https://github.com/psf/black
rev: 22.12.0
rev: 23.1.0
hooks:
- id: black
additional_dependencies: ["click==8.0.4"]

- repo: https://github.com/pycqa/isort
rev: 5.11.5
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.0.254
hooks:
- id: isort
# Simulate isort via (the much faster) ruff
- id: ruff
args:
- --select=I
- --fix

- repo: https://github.com/alessandrojcm/commitlint-pre-commit-hook
rev: v9.4.0
33 changes: 0 additions & 33 deletions .pyre_configuration

This file was deleted.

2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
@@ -60,7 +60,7 @@ representative at an online or offline event.

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
contact@recogn.ai.
contact@argilla.io.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
101 changes: 25 additions & 76 deletions README.md
@@ -16,19 +16,27 @@
<a href="https://pepy.tech/project/argilla">
<img alt="CI" src="https://static.pepy.tech/personalized-badge/argilla?period=month&units=international_system&left_color=grey&right_color=blue&left_text=pypi%20downloads/month">
</a>
<a href="https://huggingface.co/new-space?template=argilla/argilla-template-space">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-sm.svg" />
</a>
</p>

<h2 align="center">Open-source framework for data-centric NLP</h2>
<p align="center">Data Labeling, curation, and Inference Store</p>
<p align="center">Designed for MLOps & Feedback Loops</p>
<h2 align="center">Open-source platform for data-centric NLP</h2>
<p align="center">Data Labeling for MLOps & Feedback Loops</p>


> 🆕 🔥 Play with the Argilla UI in this [live-demo](https://argilla-live-demo.hf.space) powered by Hugging Face Spaces (login: `argilla`, password: `1234`)

> 🆕 🔥 Since `1.2.0` Argilla supports vector search for finding the most similar records to a given one. This feature uses vector or semantic search combined with more traditional search (keyword and filter based). Learn more on this [deep-dive guide](https://docs.argilla.io/en/latest/guides/features/semantic-search.html)

https://user-images.githubusercontent.com/1107111/223220683-fbfa63da-367c-4cfa-bda5-66f47413b6b0.mp4

<br />

> 🆕 🔥 Train custom transformers models with no-code: [Argilla + AutoTrain](https://www.argilla.io/blog/argilla-meets-autotrain)

> 🆕 🔥 Deploy [Argilla on Spaces](https://huggingface.co/new-space?template=argilla/argilla-template-space)

> 🆕 🔥 Since `1.2.0` Argilla supports vector search for finding the most similar records to a given one. This feature uses vector or semantic search combined with more traditional search (keyword and filter based). Learn more on this [deep-dive guide](https://docs.argilla.io/en/latest/guides/features/semantic-search.html)

![imagen](https://user-images.githubusercontent.com/1107111/204772677-facee627-9b3b-43ca-8533-bbc9b4e2d0aa.png)

<!-- https://user-images.githubusercontent.com/25269220/200496945-7efb11b8-19f3-4793-bb1d-d42132009cbb.mp4 -->

@@ -62,7 +70,7 @@

### Advanced NLP labeling

- Programmatic labeling using [weak supervision](https://docs.argilla.io/en/latest/guides/techniques/weak_supervision.html). Built-in label models (Snorkel, Flyingsquid)
- Programmatic labeling using [rules and weak supervision](https://docs.argilla.io/en/latest/guides/programmatic_labeling_with_rules.html). Built-in label models (Snorkel, Flyingsquid)
- [Bulk-labeling](https://docs.argilla.io/en/latest/reference/webapp/features.html#bulk-annotate) and [search-driven annotation](https://docs.argilla.io/en/latest/guides/features/queries.html)
- Iterate on training data with any [pre-trained model](https://docs.argilla.io/en/latest/tutorials/libraries/huggingface.html) or [library](https://docs.argilla.io/en/latest/tutorials/libraries/libraries.html)
- Efficiently review and refine annotations in the UI and with Python
@@ -72,93 +80,34 @@
### Monitoring

- Close the gap between production data and data collection activities
- [Auto-monitoring](https://docs.argilla.io/en/latest/guides/steps/3_deploying.html) for [major NLP libraries and pipelines](https://docs.argilla.io/en/latest/tutorials/libraries/libraries.html) (spaCy, Hugging Face, FlairNLP)
- [Auto-monitoring](https://docs.argilla.io/en/latest/guides/log_load_and_prepare_data.html) for [major NLP libraries and pipelines](https://docs.argilla.io/en/latest/tutorials/libraries/libraries.html) (spaCy, Hugging Face, FlairNLP)
- [ASGI middleware](https://docs.argilla.io/en/latest/tutorials/notebooks/deploying-texttokenclassification-fastapi.html) for HTTP endpoints
- Argilla Metrics to understand data and model issues, [like entity consistency for NER models](https://docs.argilla.io/en/latest/guides/steps/4_monitoring.html)
- Argilla Metrics to understand data and model issues, [like entity consistency for NER models](https://docs.argilla.io/en/latest/guides/measure_datasets_with_metrics.html)
- Integrated with Kibana for custom dashboards

### Team workspaces

- Bring different users and roles into the NLP data and model lifecycles
- Organize data collection, review and monitoring into different [workspaces](https://docs.argilla.io/en/latest/getting_started/installation/user_management.html#workspace)
- Organize data collection, review and monitoring into different [workspaces](https://docs.argilla.io/en/latest/getting_started/installation/configurations/user_management.html)
- Manage workspace access for different users

## Quickstart
Argilla is composed of a `Python Server` with Elasticsearch as the database layer, and a `Python Client` to create and manage datasets.

To get started you need to **install the client and the server** with `pip`:
```bash

pip install "argilla[server]"

```

Then you need to **run [Elasticsearch (ES)](https://www.elastic.co/elasticsearch)**.

The simplest way is to use `Docker` by running:

```bash

docker run -d --name elasticsearch-for-argilla --network argilla-net -p 9200:9200 -p 9300:9300 -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.5.3

```
> :information_source: **Check [the docs](https://docs.argilla.io/en/latest/getting_started/quickstart.html) for further options and configurations for Elasticsearch.**
👋 Welcome! If you have just discovered Argilla, this is the best place to get started. Argilla is composed of:

Finally you can **launch the server**:

```bash

python -m argilla

```
> :information_source: The most common error message after this step is related to the Elasticsearch instance not running. Make sure your Elasticsearch instance is running on http://localhost:9200/. If you already have an Elasticsearch instance or cluster, you can point the server to its URL by using [ENV variables](#)
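A hedged way to sanity-check that Elasticsearch is reachable before launching the server, using only the standard library (the URL below is the default from the Docker command above; this helper is not part of Argilla):

```python
import json
import urllib.request


def elasticsearch_is_up(url: str = "http://localhost:9200") -> bool:
    """Return True if `url` answers with Elasticsearch's JSON info document."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            info = json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response.
        return False
    return "version" in info
```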
* Argilla Client: a powerful Python library for reading and writing data into Argilla, using all the libraries you love (transformers, spaCy, datasets, and any others).

* Argilla Server and UI: the API and UI for data annotation and curation.

🎉 You can now access Argilla UI pointing your browser at http://localhost:6900/.
To get started you need to:

**The default username and password are** `argilla` **and** `1234`.
1. Launch the Argilla Server and UI.

Your workspace will contain no datasets yet, so let's use the `datasets` library to create our first ones!

First, you need to install `datasets`:
```bash

pip install datasets

```

Then go to your Python IDE of choice and run:
```python

import pandas as pd
import argilla as rg
from datasets import load_dataset

# load dataset from the hub
dataset = load_dataset("argilla/gutenberg_spacy-ner", split="train")

# read in the dataset, assuming it's a dataset for token classification
dataset_rg = rg.read_datasets(dataset, task="TokenClassification")

# log the dataset to the Argilla web app
rg.log(dataset_rg, "gutenberg_spacy-ner")

# load dataset from json
my_dataframe = pd.read_json(
"https://raw.githubusercontent.com/recognai/datasets/main/sst-sentimentclassification.json")

# convert pandas dataframe to DatasetForTextClassification
dataset_rg = rg.DatasetForTextClassification.from_pandas(my_dataframe)

# log the dataset to the Argilla web app
rg.log(dataset_rg, name="sst-sentimentclassification")
```
2. Pick a tutorial and start rocking with Argilla using Jupyter Notebooks, or Google Colab.

This will create two datasets that you can use to do a quick tour of the core features of Argilla.
To get started follow the steps [on the Quickstart docs page](https://docs.argilla.io/en/latest/getting_started/quickstart.html).

> 🚒 **If you find issues, get direct support from the team and other community members on the [Slack Community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g)**

For getting started with your own use cases, [go to the docs](https://docs.argilla.io).
## Principles
- **Open**: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can **use and combine your preferred libraries** without implementing any specific interface.

Expand Down
Binary file modified docs/_source/_static/reference/webapp/features-annotate.png
Binary file removed docs/_source/_static/reference/webapp/homepage.png
(Several additional binary image files under docs/_source/_static/ were also modified; previews are not available.)