docs: add end2end example on creating a basic text-classification dataset #4208
Merged

davidberenstein1957 merged 66 commits into `develop` from `docs/end2end-text-classification` on Nov 29, 2023
Commits (66):
4f9bbb6
docs: first tutorial for end2end examples on feedback dataset for tex…
plaguss a663698
Merge branch 'develop' of github.com:argilla-io/argilla into docs/end…
plaguss c8681ba
docs: update with task templates and push to argilla hf hub
plaguss eb35dd3
feat: parametrize notebook to run via papermill
plaguss 212752c
feat: script to run end2end notebooks
plaguss 66c55cc
refactor: grab apikey based on quickstart image
plaguss fea8b11
feat: initial version of end2end examples from ci/cd
plaguss fd0ba5b
feat: apply comments from code review
plaguss 3130fdf
feat: update workflow to run depending on the result from check-repo-…
5a676fb
feat: update naming convention for examples to allow sorting them bef…
a6fc67a
refactor: update glob call and sort notebooks before running
6398947
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] bb5c580
chore: remove deleted file
a9c893a
Merge branch 'docs/end2end-text-classification' of https://github.com…
3ed3992
Merge branch 'develop' of https://github.com/argilla-io/argilla into …
b4f61cb
feat: add telemetry for the tutorial
6acfe0c
refactor: set python3 as the default kernel for running the notebooks
72c3ddf
tests: updated en2end workflow
davidberenstein1957 52276c0
chore: added file changes for end2end tests
davidberenstein1957 99be629
chore: updated server launch command
davidberenstein1957 f66345f
chore: updated credentials
davidberenstein1957 d4ace2a
chore: add HF_HUB_ACCESS_TOKEN
davidberenstein1957 a1c8220
chore: add .png for workflows
davidberenstein1957 9f96702
docs: updated end2end tutorials
davidberenstein1957 0eb1f8f
chore: updated starting reference
davidberenstein1957 328bb7e
chore: updated relative path to end2end examples
davidberenstein1957 adfb5cf
chore: updated check-repo-files to hit updates in correct example folder
davidberenstein1957 5c2e468
docs: updated phrasing
davidberenstein1957 a284033
docs: updated phrasing
davidberenstein1957 6c3ec05
docs: renamed files for sequential runnign
davidberenstein1957 4d55a00
docs: added updated data model
davidberenstein1957 9f0352a
docs: update user management section
davidberenstein1957 cd822eb
docs: added extra section on users and workspaces
davidberenstein1957 9e07b00
docs: Adding `metadata` to a `text-classification` dataset (#4313)
kursathalat eb43ba3
docs: slight updates phrasing
davidberenstein1957 f3650c0
Merge branch 'develop' into docs/end2end-text-classification
a464917
feat: update parameter cells and some comments in notebooks
0c0c255
feat: update parameters from code review
8b91a3c
fix: remove root folder reference from the script, warn that the scri…
b6289a0
docs: Adding `vectors` to a text-classification dataset (#4338)
kursathalat 2473b91
docs: Update records with `Responses` and `Suggestions` (#4326)
kursathalat bb99b19
refactor: write output file to tmpdir
b0f3018
docs: example on assigning record to your team for a text classificat…
sdiazlor 0f6d36d
docs: updated references
davidberenstein1957 2ebd343
docs: added reference to parpermill description
davidberenstein1957 c8739a3
docs: updated tabs
davidberenstein1957 94436ec
docs: marked cells as parameters
davidberenstein1957 1ae526c
docs: assign records
davidberenstein1957 e92e80c
chore: add utils module
davidberenstein1957 6bc5fff
docs: assign records
davidberenstein1957 63aa5d8
Merge branch 'develop' into docs/end2end-text-classification
davidberenstein1957 c1d3f49
chore: updated cell id
davidberenstein1957 0279821
chore: added kernelspec info
davidberenstein1957 a520efa
chore: updated add vectors
davidberenstein1957 01c5d97
docs: updated suggestions and responses
davidberenstein1957 cd8643e
docs: Tutorial on fine-tuning for a text-classification dataset (#4348)
kursathalat 9050330
docs: updated token
davidberenstein1957 bfcc207
docs: added updated to end2end training
davidberenstein1957 a5cbedb
chore: review notebook validator
davidberenstein1957 2b75399
docs: example on filtering and querying records for a text classifica…
sdiazlor b2da325
docs: updated end2end to running examples
davidberenstein1957 dedadae
docs: updated tutorials and images
davidberenstein1957 18a62ee
docs: updated message used
davidberenstein1957 1a298a9
docs: added reference to dataset
davidberenstein1957 dfd9bad
docs: push with local model instead
davidberenstein1957 89c0c50
docs: updated practical guides image
davidberenstein1957
New workflow file (76 lines added), reconstructed from the diff with standard GitHub Actions indentation. Two fixes applied in the reconstruction: the `inputs:` block is nested under `workflow_dispatch:` (where GitHub Actions expects it), and the deprecated `::set-output` command is replaced with a write to `$GITHUB_OUTPUT`:

```yaml
name: Run end2end-examples

on:
  workflow_dispatch:
    inputs:
      argillaDockerImage:
        description: "The name of the Docker image of Argilla to use."
        default: argilla/argilla-quickstart:latest
        required: false
        type: string
  schedule:
    # “At 01:13 on Saturday.”
    - cron: "13 1 * * 6"

jobs:
  end2end-examples:
    name: end2end notebook examples, FeedbackDataset for text-classification
    runs-on: ubuntu-latest
    services:
      search_engine:
        image: ${{ inputs.argillaDockerImage }}
        ports:
          - 6900:6900
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Checkout Code 🛎
        uses: actions/checkout@v3
      - name: Setup Conda Env 🐍
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          use-mamba: true
          activate-environment: argilla
      - name: Get date for conda cache
        id: get-date
        run: echo "today=$(/bin/date -u '+%Y%m%d')" >> "$GITHUB_OUTPUT"
        shell: bash
      - name: Cache Conda env
        uses: actions/cache@v3
        id: cache
        with:
          path: ${{ env.CONDA }}/envs
          key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}
      - name: Update environment
        if: steps.cache.outputs.cache-hit != 'true'
        run: mamba env update -n argilla -f environment_dev.yml
      - name: Cache pip 👜
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}
      - name: Set huggingface hub credentials
        if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop' || startsWith(github.ref, 'refs/heads/releases')
        run: |
          echo "HF_HUB_ACCESS_TOKEN=${{ secrets.HF_HUB_ACCESS_TOKEN }}" >> "$GITHUB_ENV"
          echo "Enable HF access token"
      # note: searchEngineDockerImage is not declared as an input of this
      # workflow, so the two conditions below always evaluate to false
      - name: Set Argilla search engine env variable
        if: startsWith(inputs.searchEngineDockerImage, 'docker.elastic.co')
        run: |
          echo "ARGILLA_SEARCH_ENGINE=elasticsearch" >> "$GITHUB_ENV"
          echo "Configure elasticsearch engine"
      - name: Set Argilla search engine env variable
        if: startsWith(inputs.searchEngineDockerImage, 'opensearchproject')
        run: |
          echo "ARGILLA_SEARCH_ENGINE=opensearch" >> "$GITHUB_ENV"
          echo "Configure opensearch engine"
      - name: Run end2end examples 📈
        env:
          ARGILLA_ENABLE_TELEMETRY: 0
        run: |
          pip install -e .
          pip install papermill
          python scripts/end2end_examples.py
```
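The workflow's last step calls `scripts/end2end_examples.py`, whose contents are not part of this diff. The sketch below is hypothetical: it only illustrates what the commit messages describe (glob the example notebooks, sort them so the numeric naming convention controls execution order, run each with papermill using the `python3` kernel, and write outputs to a temporary directory). Function names and parameter names are illustrative, not the actual script's API.

```python
import glob
import os
import subprocess
import tempfile


def discover_notebooks(examples_dir: str) -> list[str]:
    """Return the example notebooks sorted by filename, so a naming
    convention like '1-topic.ipynb', '2-topic.ipynb' controls run order."""
    return sorted(glob.glob(os.path.join(examples_dir, "*.ipynb")))


def run_notebook(path: str, output_dir: str, api_url: str, api_key: str) -> None:
    """Execute one notebook with papermill, injecting the Argilla
    connection settings into its 'parameters' cell."""
    output = os.path.join(output_dir, os.path.basename(path))
    subprocess.run(
        [
            "papermill", path, output,
            "-p", "api_url", api_url,
            "-p", "api_key", api_key,
            "-k", "python3",  # force the python3 kernel
        ],
        check=True,
    )


def main(examples_dir: str) -> None:
    # Write executed notebooks to a throwaway directory so the repo
    # checkout stays clean.
    with tempfile.TemporaryDirectory() as tmpdir:
        for nb in discover_notebooks(examples_dir):
            run_notebook(nb, tmpdir, "http://localhost:6900", "owner.apikey")
```

Sorting by filename is why the PR renames the tutorials to a numbered convention: plain lexicographic order then matches the intended pedagogical order.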
Binary file added (BIN, +63.7 KB): `...rce/practical_guides/examples/images/feedback-dataset-text-classification-1.png`
Because of running this every time, do you think we can filter it a bit and only run it when there are changes to `src` or `examples.py`? Also, perhaps we can use a subset of the `datasets` and/or set up a persistent cache for the `datasets` and set this equal to our "Cache pip 👜" step?

This might be relevant in other places we download `datasets` for our cache.

Will update that.
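One way to implement the reviewer's first suggestion is a `push` trigger with `paths` filters, so the notebooks only re-run when relevant files change. This is a sketch, not the merged configuration; the branch name and path globs below are assumptions about the repo layout:

```yaml
on:
  workflow_dispatch:
  schedule:
    - cron: "13 1 * * 6"
  push:
    branches:
      - develop
    paths:
      # hypothetical paths: adjust to wherever the notebooks actually live
      - "src/**"
      - "docs/_source/practical_guides/examples/**"
      - "scripts/end2end_examples.py"
```

The weekly `schedule` trigger is kept alongside the filter so drift in external dependencies is still caught even when no matching files change.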