docs: add end2end example on creating a basic text-classification dataset #4208

Merged
merged 66 commits into from
Nov 29, 2023
Changes from 7 commits
4f9bbb6
docs: first tutorial for end2end examples on feedback dataset for tex…
plaguss Nov 13, 2023
a663698
Merge branch 'develop' of github.com:argilla-io/argilla into docs/end…
plaguss Nov 13, 2023
c8681ba
docs: update with task templates and push to argilla hf hub
plaguss Nov 13, 2023
eb35dd3
feat: parametrize notebook to run via papermill
plaguss Nov 15, 2023
212752c
feat: script to run end2end notebooks
plaguss Nov 15, 2023
66c55cc
refactor: grab apikey based on quickstart image
plaguss Nov 15, 2023
fea8b11
feat: initial version of end2end examples from ci/cd
plaguss Nov 15, 2023
fd0ba5b
feat: apply comments from code review
plaguss Nov 15, 2023
3130fdf
feat: update workflow to run depending on the result from check-repo-…
Nov 16, 2023
5a676fb
feat: update naming convention for examples to allow sorting them bef…
Nov 16, 2023
a6fc67a
refactor: update glob call and sort notebooks before running
Nov 16, 2023
6398947
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2023
bb5c580
chore: remove deleted file
Nov 16, 2023
a9c893a
Merge branch 'docs/end2end-text-classification' of https://github.com…
Nov 16, 2023
3ed3992
Merge branch 'develop' of https://github.com/argilla-io/argilla into …
Nov 16, 2023
b4f61cb
feat: add telemetry for the tutorial
Nov 16, 2023
6acfe0c
refactor: set python3 as the default kernel for running the notebooks
Nov 16, 2023
72c3ddf
tests: updated en2end workflow
davidberenstein1957 Nov 21, 2023
52276c0
chore: added file changes for end2end tests
davidberenstein1957 Nov 21, 2023
99be629
chore: updated server launch command
davidberenstein1957 Nov 21, 2023
f66345f
chore: updated credentials
davidberenstein1957 Nov 21, 2023
d4ace2a
chore: add HF_HUB_ACCESS_TOKEN
davidberenstein1957 Nov 21, 2023
a1c8220
chore: add .png for workflows
davidberenstein1957 Nov 22, 2023
9f96702
docs: updated end2end tutorials
davidberenstein1957 Nov 22, 2023
0eb1f8f
chore: updated starting reference
davidberenstein1957 Nov 22, 2023
328bb7e
chore: updated relative path to end2end examples
davidberenstein1957 Nov 22, 2023
adfb5cf
chore: updated check-repo-files to hit updates in correct example folder
davidberenstein1957 Nov 22, 2023
5c2e468
docs: updated phrasing
davidberenstein1957 Nov 22, 2023
a284033
docs: updated phrasing
davidberenstein1957 Nov 22, 2023
6c3ec05
docs: renamed files for sequential runnign
davidberenstein1957 Nov 22, 2023
4d55a00
docs: added updated data model
davidberenstein1957 Nov 22, 2023
9f0352a
docs: update user management section
davidberenstein1957 Nov 26, 2023
cd822eb
docs: added extra section on users and workspaces
davidberenstein1957 Nov 26, 2023
9e07b00
docs: Adding `metadata` to a `text-classification` dataset (#4313)
kursathalat Nov 26, 2023
eb43ba3
docs: slight updates phrasing
davidberenstein1957 Nov 26, 2023
f3650c0
Merge branch 'develop' into docs/end2end-text-classification
Nov 27, 2023
a464917
feat: update parameter cells and some comments in notebooks
Nov 27, 2023
0c0c255
feat: update parameters from code review
Nov 27, 2023
8b91a3c
fix: remove root folder reference from the script, warn that the scri…
Nov 28, 2023
b6289a0
docs: Adding `vectors` to a text-classification dataset (#4338)
kursathalat Nov 28, 2023
2473b91
docs: Update records with `Responses` and `Suggestions` (#4326)
kursathalat Nov 28, 2023
bb99b19
refactor: write output file to tmpdir
Nov 28, 2023
b0f3018
docs: example on assigning record to your team for a text classificat…
sdiazlor Nov 29, 2023
0f6d36d
docs: updated references
davidberenstein1957 Nov 29, 2023
2ebd343
docs: added reference to parpermill description
davidberenstein1957 Nov 29, 2023
c8739a3
docs: updated tabs
davidberenstein1957 Nov 29, 2023
94436ec
docs: marked cells as parameters
davidberenstein1957 Nov 29, 2023
1ae526c
docs: assign records
davidberenstein1957 Nov 29, 2023
e92e80c
chore: add utils module
davidberenstein1957 Nov 29, 2023
6bc5fff
docs: assign records
davidberenstein1957 Nov 29, 2023
63aa5d8
Merge branch 'develop' into docs/end2end-text-classification
davidberenstein1957 Nov 29, 2023
c1d3f49
chore: updated cell id
davidberenstein1957 Nov 29, 2023
0279821
chore: added kernelspec info
davidberenstein1957 Nov 29, 2023
a520efa
chore: updated add vectors
davidberenstein1957 Nov 29, 2023
01c5d97
docs: updated suggestions and responses
davidberenstein1957 Nov 29, 2023
cd8643e
docs: Tutorial on fine-tuning for a text-classification dataset (#4348)
kursathalat Nov 29, 2023
9050330
docs: updated token
davidberenstein1957 Nov 29, 2023
bfcc207
docs: added updated to end2end training
davidberenstein1957 Nov 29, 2023
a5cbedb
chore: review notebook validator
davidberenstein1957 Nov 29, 2023
2b75399
docs: example on filtering and querying records for a text classifica…
sdiazlor Nov 29, 2023
b2da325
docs: updated end2end to running examples
davidberenstein1957 Nov 29, 2023
dedadae
docs: updated tutorials and images
davidberenstein1957 Nov 29, 2023
18a62ee
docs: updated message used
davidberenstein1957 Nov 29, 2023
1a298a9
docs: added reference to dataset
davidberenstein1957 Nov 29, 2023
dfd9bad
docs: push with local model instead
davidberenstein1957 Nov 29, 2023
89c0c50
docs: updated practical guides image
davidberenstein1957 Nov 29, 2023
76 changes: 76 additions & 0 deletions .github/workflows/end2end-examples.yml
@@ -0,0 +1,76 @@
name: Run end2end-examples

on:
  workflow_dispatch:
    inputs:
      argillaDockerImage:
        description: "The name of the Docker image of Argilla to use."
        default: argilla/argilla-quickstart:latest
        required: false
        type: string
  schedule:
    # “At 01:13 on Saturday.”
    - cron: "13 1 * * 6"

jobs:
  end2end-examples:
    name: end2end notebook examples, FeedbackDataset for text-classification
    runs-on: ubuntu-latest
    services:
      search_engine:
        image: ${{ inputs.argillaDockerImage }}
        ports:
          - 6900:6900
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Checkout Code 🛎
        uses: actions/checkout@v3
      - name: Setup Conda Env 🐍
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          use-mamba: true
          activate-environment: argilla
      - name: Get date for conda cache
        id: get-date
        run: echo "::set-output name=today::$(/bin/date -u '+%Y%m%d')"
        shell: bash
      - name: Cache Conda env
        uses: actions/cache@v3
        id: cache
        with:
          path: ${{ env.CONDA }}/envs
          key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}
      - name: Update environment
        if: steps.cache.outputs.cache-hit != 'true'
        run: mamba env update -n argilla -f environment_dev.yml
      - name: Cache pip 👜
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}
      - name: Set huggingface hub credentials
        if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop' || startsWith(github.ref, 'refs/heads/releases')
        run: |
          echo "HF_HUB_ACCESS_TOKEN=${{ secrets.HF_HUB_ACCESS_TOKEN }}" >> "$GITHUB_ENV"
          echo "Enable HF access token"
      - name: Set Argilla search engine env variable
        if: startsWith(inputs.searchEngineDockerImage, 'docker.elastic.co')
        run: |
          echo "ARGILLA_SEARCH_ENGINE=elasticsearch" >> "$GITHUB_ENV"
          echo "Configure elasticsearch engine"
      - name: Set Argilla search engine env variable
        if: startsWith(inputs.searchEngineDockerImage, 'opensearchproject')
        run: |
          echo "ARGILLA_SEARCH_ENGINE=opensearch" >> "$GITHUB_ENV"
          echo "Configure opensearch engine"
      - name: Run end2end examples 📈

Since this runs every time, do you think we can filter it a bit and only run it when there are changes to src or examples.py? Also, perhaps we can use a subset of the datasets and/or set up a persistent cache for the datasets, configured the same way as our "Cache pip 👜" step?

This might be relevant in other places we download 'datasets' for our cache.

Contributor Author

Will update that

        env:
          ARGILLA_ENABLE_TELEMETRY: 0
        run: |
          pip install -e .
          pip install papermill
          python scripts/end2end_examples.py
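
For reference, the review suggestion on the "Run end2end examples" step (trigger only on relevant changes, and cache the downloaded datasets the way pip is cached) could look roughly like the fragment below. This is only a sketch and not what the PR ended up with. The `paths` globs and the cache key are assumptions made for illustration, and the cache path assumes the Hugging Face `datasets` library is using its default cache directory.

```yaml
# Hypothetical fragment illustrating the review suggestion; not part of this PR.
on:
  workflow_dispatch:
  push:
    # Assumed locations of the package source and the runner script.
    paths:
      - "src/**"
      - "scripts/end2end_examples.py"

jobs:
  end2end-examples:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Cache Hugging Face datasets
        uses: actions/cache@v3
        with:
          # Default cache directory of the `datasets` library.
          path: ~/.cache/huggingface/datasets
          # Assumed key: reuse the cache until the runner script changes.
          key: ${{ runner.os }}-hf-datasets-${{ hashFiles('scripts/end2end_examples.py') }}
          restore-keys: |
            ${{ runner.os }}-hf-datasets-
```

The `restore-keys` prefix lets a run fall back to the most recent datasets cache when the exact key misses, so a small script change would not force a full re-download.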