docs: add end2end example on creating a basic text-classification dataset (#4208)

<!-- Thanks for your contribution! As part of our Community Growers
initiative 🌱, we're donating Justdiggit bunds in your name to reforest
sub-Saharan Africa. To claim your Community Growers certificate, please
contact David Berenstein in our Slack community or fill in this form
https://tally.so/r/n9XrxK once your PR has been merged. -->

# Description

This PR adds two features toward issue #4178:
- A tutorial on creating a `FeedbackDataset` for
`text-classification`.
- A new script, `scripts/end2end_examples.py`, that runs the notebooks
automatically via the `end2end-examples.yml` workflow.
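
The sketch below illustrates how a script could run a folder of notebooks through the papermill command-line tool, which the workflow installs before calling `scripts/end2end_examples.py`. It is not the script added in this PR: the function names, the output folder, and the `api_key` parameter are hypothetical and chosen only for illustration.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def find_notebooks(folder: Path) -> List[Path]:
    """Return the notebooks in `folder`, sorted by name for a stable run order."""
    return sorted(folder.glob("*.ipynb"))


def build_papermill_command(
    notebook: Path,
    output_dir: Path,
    parameters: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the papermill CLI call: `papermill INPUT OUTPUT -p NAME VALUE ...`."""
    command = ["papermill", str(notebook), str(output_dir / notebook.name)]
    # Parameter names are hypothetical; they must match a cell tagged
    # "parameters" in the notebook for papermill to override them.
    for name, value in (parameters or {}).items():
        command += ["-p", name, str(value)]
    return command


def run_notebooks(
    folder: Path,
    output_dir: Path,
    parameters: Optional[Dict[str, str]] = None,
) -> None:
    """Execute every notebook in order, stopping at the first failure."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for notebook in find_notebooks(folder):
        # check=True raises CalledProcessError if papermill exits non-zero,
        # which makes the CI step fail.
        subprocess.run(build_papermill_command(notebook, output_dir, parameters), check=True)
```

Calling `run_notebooks(Path("docs/.../end2end_examples"), Path("executed"))` would then execute each notebook in turn; the executed copies land in the output folder so a failed run can be inspected.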

Closes #4179 and #4220

**Type of change**

(Remember to title the PR according to the type of change)

- [ ] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes.)

- [ ] `sphinx-autobuild` (read [Developer
Documentation](https://docs.argilla.io/en/latest/community/developer_docs.html#building-the-documentation)
for more details)

**Checklist**

- [ ] I added relevant documentation
- [x] I followed the style guidelines of this project
- [x] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)

---------

Co-authored-by: Agustin Piqueres <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: davidberenstein1957 <[email protected]>
Co-authored-by: kursathalat <[email protected]>
Co-authored-by: sdiazlor <[email protected]>
Co-authored-by: Sara Han <[email protected]>
7 people authored Nov 29, 2023
1 parent f2f82e0 commit cf7f67a
Showing 55 changed files with 6,665 additions and 8 deletions.
10 changes: 10 additions & 0 deletions .github/workflows/check-repo-files.yml
@@ -6,6 +6,9 @@ on:
      pythonChanges:
        description: "True if some files in python code have changed"
        value: ${{ jobs.check-repo-files.outputs.pythonChanges }}
      end2endChanges:
        description: "True if some files affecting the end2end examples have changed"
        value: ${{ jobs.check-repo-files.outputs.end2endChanges }}
      buildChanges:
        description: "True if some files affecting the build have changed"
        value: ${{ jobs.check-repo-files.outputs.buildChanges }}
@@ -17,6 +20,7 @@ jobs:
    outputs:
      pythonChanges: ${{ steps.path_filter.outputs.pythonChanges }}
      buildChanges: ${{ steps.path_filter.outputs.buildChanges }}
      end2endChanges: ${{ steps.path_filter.outputs.end2endChanges }}
    steps:
      - name: Checkout Code 🛎
        uses: actions/checkout@v3
@@ -30,6 +34,12 @@ jobs:
              - 'tests/**'
              - 'pyproject.toml'
              - 'setup.py'
            end2endChanges:
              - 'src/**'
              - 'pyproject.toml'
              - 'setup.py'
              - 'scripts/end2end_examples.py'
              - 'docs/_source/tutorials_and_integrations/tutorials/feedback/end2end_examples/**'
            buildChanges:
              - 'src/**'
              - 'frontend/**'
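
To make the effect of the `end2endChanges` filter concrete, here is a rough Python approximation of the check it performs on a list of changed files. This is only a sketch: the workflow's path filter uses its own glob matcher, whereas this example uses `fnmatch` from the standard library, whose `*` also matches `/`. The function name `end2end_changes` is hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable

# Patterns copied from the end2endChanges filter in check-repo-files.yml.
END2END_PATTERNS = [
    "src/**",
    "pyproject.toml",
    "setup.py",
    "scripts/end2end_examples.py",
    "docs/_source/tutorials_and_integrations/tutorials/feedback/end2end_examples/**",
]


def end2end_changes(changed_files: Iterable[str]) -> bool:
    """Return True if any changed file matches one of the end2end patterns."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in END2END_PATTERNS
    )
```

Under this approximation, a change to `src/argilla/client.py` or to `scripts/end2end_examples.py` would trigger the end2end run, while a change limited to `frontend/` would not.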
98 changes: 98 additions & 0 deletions .github/workflows/end2end-examples.yml
@@ -0,0 +1,98 @@
name: Run end2end sdk examples

on:
  workflow_call:
    inputs:
      runsOn:
        required: false
        type: string
        default: extended-runner
      searchEngineDockerImage:
        description: "The name of the Docker image of the search engine to use."
        default: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
        required: false
        type: string
      searchEngineDockerEnv:
        description: "The environment variables, as a JSON string, for the search engine Docker container."
        default: '{"discovery.type": "single-node", "xpack.security.enabled": "false"}'
        required: false
        type: string
env:
  # Increase this value to reset cache if environment_dev.yml has not changed
  CACHE_NUMBER: 5

jobs:
  # Runs depending on the result from the check-repo-files.yml
  call-check-repo-files:
    uses: ./.github/workflows/check-repo-files.yml

  end2end-examples:
    name: end2end notebook examples, FeedbackDataset for text-classification
    runs-on: ${{ inputs.runsOn }}
    services:
      search_engine:
        image: ${{ inputs.searchEngineDockerImage }}
        ports:
          - 9200:9200
        env: ${{ fromJson(inputs.searchEngineDockerEnv) }}
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Checkout Code 🛎
        uses: actions/checkout@v3
      - name: Setup Conda Env 🐍
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          use-mamba: true
          activate-environment: argilla
      - name: Get date for conda cache
        id: get-date
        # Write the step output through $GITHUB_OUTPUT; the ::set-output command is deprecated
        run: echo "today=$(/bin/date -u '+%Y%m%d')" >> "$GITHUB_OUTPUT"
        shell: bash
      - name: Cache Conda env
        uses: actions/cache@v3
        id: cache
        with:
          path: ${{ env.CONDA }}/envs
          key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}
      - name: Update environment
        if: steps.cache.outputs.cache-hit != 'true'
        run: mamba env update -n argilla -f environment_dev.yml
      - name: Cache pip 👜
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}
      - name: Set huggingface hub credentials
        if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop' || startsWith(github.ref, 'refs/heads/releases')
        run: |
          echo "HF_HUB_ACCESS_TOKEN=${{ secrets.HF_HUB_ACCESS_TOKEN }}" >> "$GITHUB_ENV"
          echo "Enable HF access token"
      - name: Set Argilla search engine env variable
        if: startsWith(inputs.searchEngineDockerImage, 'docker.elastic.co')
        run: |
          echo "ARGILLA_SEARCH_ENGINE=elasticsearch" >> "$GITHUB_ENV"
          echo "Configure elasticsearch engine"
      - name: Set Argilla search engine env variable
        if: startsWith(inputs.searchEngineDockerImage, 'opensearchproject')
        run: |
          echo "ARGILLA_SEARCH_ENGINE=opensearch" >> "$GITHUB_ENV"
          echo "Configure opensearch engine"
      - name: Launch Argilla Server
        env:
          ARGILLA_ENABLE_TELEMETRY: 0
        run: |
          pip install -e .
          python -m argilla server database migrate
          python -m argilla server database users create_default
          uvicorn argilla.server.app:app --port 6900 --host 0.0.0.0 &
      - name: Run end2end examples 📈
        env:
          ARGILLA_ENABLE_TELEMETRY: 0
          HF_HUB_ACCESS_TOKEN: ${{ secrets.HF_HUB_ACCESS_TOKEN }}
        run: |
          pip install papermill
          python scripts/end2end_examples.py
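
The workflow above derives `ARGILLA_SEARCH_ENGINE` from the prefix of the Docker image name and decodes the container environment from a JSON string with `fromJson`. The following Python sketch mirrors that logic for illustration only. The function names are hypothetical, and the lowercasing is an assumption meant to approximate the case-insensitive comparison of the workflow's `startsWith` expression.

```python
import json
from typing import Dict, Optional


def resolve_search_engine(image: str) -> Optional[str]:
    """Map a Docker image name to the ARGILLA_SEARCH_ENGINE value the workflow sets."""
    normalized = image.lower()
    if normalized.startswith("docker.elastic.co"):
        return "elasticsearch"
    if normalized.startswith("opensearchproject"):
        return "opensearch"
    # Neither conditional step would run, so the variable stays unset.
    return None


def parse_service_env(raw: str) -> Dict[str, str]:
    """Decode the searchEngineDockerEnv input, as fromJson does in the workflow."""
    return json.loads(raw)
```

For the two matrix entries in `package.yml`, this sketch yields `elasticsearch` for the `docker.elastic.co/elasticsearch/elasticsearch:8.8.2` image and `opensearch` for `opensearchproject/opensearch:2.4.1`.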
22 changes: 22 additions & 0 deletions .github/workflows/package.yml
@@ -74,6 +74,28 @@ jobs:
      pytestArgs: tests/unit
    secrets: inherit

  run_end2end_tests:
    strategy:
      matrix:
        include:
          - searchEngineDockerImage: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
            searchEngineDockerEnv: '{"discovery.type": "single-node", "xpack.security.enabled": "false"}'
            coverageReport: coverage-elasticsearch-8.8.2
            runsOn: extended-runner
          - searchEngineDockerImage: opensearchproject/opensearch:2.4.1
            searchEngineDockerEnv: '{"discovery.type": "single-node", "plugins.security.disabled": "true"}'
            coverageReport: coverage-opensearch-2.4.1
            runsOn: ubuntu-latest
    name: Run end2end tests
    uses: ./.github/workflows/end2end-examples.yml
    needs: check_repo_files
    if: needs.check_repo_files.outputs.end2endChanges == 'true'
    with:
      runsOn: ${{ matrix.runsOn }}
      searchEngineDockerImage: ${{ matrix.searchEngineDockerImage }}
      searchEngineDockerEnv: ${{ matrix.searchEngineDockerEnv }}
    secrets: inherit

  run_unit_test_with_extra_engines:
    strategy:
      matrix:
3 changes: 2 additions & 1 deletion docs/_source/_static/images/llms/fb-model.svg