docs: add end2end example on creating a basic text-classification dataset #4208
…2end-text-classification
Example running locally (with elasticsearch and argilla quickstart images):
argilla on docs/end2end-text-classification [!?] via 🐍 v3.10.13 (.venv) on ☁️ (us-east-1)
❯ python scripts/end2end_examples.py --api-key admin.apikey
Executing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 46/46 [00:19<00:00, 2.33cell/s]
✅ text-classification-create-dataset
Removed output notebook: output-notebook
Removed output folder: output_notebooks
And an example forcing the process to fail with an error in a cell:
❯ python scripts/end2end_examples.py
/home/agustin/github_repos/argilla-io/argilla/.venv/lib/python3.10/site-packages/nbformat/__init__.py:93: MissingIDFieldWarning: Code cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
validate(nb)
Executing: 75%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 3/4 [00:01<00:00, 1.86cell/s]
❌ test_notebook
Traceback (most recent call last):
File "/home/agustin/github_repos/argilla-io/argilla/scripts/end2end_examples.py", line 65, in <module>
main()
File "/home/agustin/github_repos/argilla-io/argilla/scripts/end2end_examples.py", line 56, in main
example.run()
File "/home/agustin/github_repos/argilla-io/argilla/scripts/end2end_examples.py", line 39, in run
raise e from None
File "/home/agustin/github_repos/argilla-io/argilla/scripts/end2end_examples.py", line 35, in run
papermill.execute_notebook(str(self.src_filename), str(self.dst_filename), parameters=self.parameters)
File "/home/agustin/github_repos/argilla-io/argilla/.venv/lib/python3.10/site-packages/papermill/execute.py", line 134, in execute_notebook
raise_for_execution_errors(nb, output_path)
File "/home/agustin/github_repos/argilla-io/argilla/.venv/lib/python3.10/site-packages/papermill/execute.py", line 241, in raise_for_execution_errors
raise error
papermill.exceptions.PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [3]":
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[3], line 1
----> 1 assert 1==2
AssertionError:
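For readers following the traceback above, here is a minimal sketch of a runner with the same shape: a `run()` method that hands `src_filename`, `dst_filename` and `parameters` to `papermill.execute_notebook`, prints a status line, and re-raises with `raise e from None`. It is not the actual content of `scripts/end2end_examples.py`. The class name `ExampleNotebook` and the `executor` hook are assumptions added for illustration, so the class can be exercised without a live Argilla instance.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict


def _papermill_executor(src: str, dst: str, parameters: Dict[str, Any]) -> None:
    # Imported lazily so the sketch can be loaded without papermill installed.
    import papermill

    papermill.execute_notebook(src, dst, parameters=parameters)


@dataclass
class ExampleNotebook:
    """Hypothetical wrapper around one notebook execution."""

    src_filename: Path
    dst_filename: Path
    parameters: Dict[str, Any] = field(default_factory=dict)
    # Hypothetical hook (not in the PR): lets a test swap papermill for a stub.
    executor: Callable[[str, str, Dict[str, Any]], None] = _papermill_executor

    def run(self) -> None:
        try:
            self.executor(str(self.src_filename), str(self.dst_filename), self.parameters)
        except Exception as e:
            # Report the failing notebook, then surface the error without
            # chaining it to this handler.
            print(f"❌ {self.src_filename.stem}")
            raise e from None
        print(f"✅ {self.src_filename.stem}")
```

With a shape like this, a `main()` function would only need to build one object per notebook and call `run()` on each, stopping at the first failure.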
        run: |
          echo "ARGILLA_SEARCH_ENGINE=opensearch" >> "$GITHUB_ENV"
          echo "Configure opensearch engine"
      - name: Run end2end examples 📈
Since this runs every time, do you think we can filter it a bit and only run it when there are changes to `src` or `examples.py`? Also, perhaps we can use a subset of the `datasets` and/or set up a persistent cache for the `datasets`, configured the same way as our "Cache pip 👜" step?
This might be relevant in other places we download 'datasets' for our cache.
Will update that
scripts/end2end_examples.py (outdated)
        "hf_token": hf_token,
    }

    examples = [
Perhaps it is better to do this with `glob` and select everything in our folder as examples?
I wasn't sure whether we could add other files here, but I think that's better
@davidberenstein1957 I think we would like to have these run in a specific order. For that should we name them using something that can be sorted? (maybe just a number at the end works fine)
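To make the two suggestions in this thread concrete, here is a rough sketch of collecting the notebooks with `glob` and running them in a fixed order. The helper name `discover_examples` and the naming scheme (a number at the end of the file stem, e.g. `my-example-001.ipynb`) are assumptions for illustration, not something the PR has settled on.

```python
import re
from pathlib import Path
from typing import List, Tuple


def discover_examples(folder: Path) -> List[Path]:
    """Return every notebook in `folder`, ordered by the number ending its stem."""

    def sort_key(path: Path) -> Tuple[int, int, str]:
        match = re.search(r"(\d+)$", path.stem)
        if match:
            # Compare numerically so that 10 sorts after 2.
            return (0, int(match.group(1)), path.name)
        # Notebooks without a trailing number run last, alphabetically.
        return (1, 0, path.name)

    return sorted(folder.glob("*.ipynb"), key=sort_key)
```

Sorting on the parsed integer rather than on the raw file name avoids depending on zero-padding, and non-notebook files in the folder are ignored by the `*.ipynb` pattern.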
@gabrielmbmb I requested your review to know your opinion on the workflow, but feel free to skip the remaining content of the PR
7be9cd5 to a6fc67a
for more information, see https://pre-commit.ci
@plaguss @davidberenstein1957 if the plan is to run this as part of CI/CD, periodically, or for testing/QA purposes, please make sure we DON'T track any telemetry, as this will affect our understanding of real usage, errors, etc.
…/argilla-io/argilla into docs/end2end-text-classification
Sure @dvsrepo, that should be taken into account in the workflow:
env:
  ARGILLA_ENABLE_TELEMETRY: 0
run: |
  pip install -e .
  pip install papermill
  python scripts/end2end_examples.py
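For anyone running the script outside the workflow, the same setting could be applied from Python before launching it. This is only a sketch: the helper names `telemetry_free_env` and `run_examples` are made up here, and the only things taken from the thread are the `ARGILLA_ENABLE_TELEMETRY` variable and the script path.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def telemetry_free_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment mapping and switch Argilla telemetry off in the copy."""
    env = dict(os.environ if base is None else base)
    env["ARGILLA_ENABLE_TELEMETRY"] = "0"
    return env


def run_examples(script: str = "scripts/end2end_examples.py") -> int:
    """Run the examples script in a child process with telemetry disabled."""
    completed = subprocess.run([sys.executable, script], env=telemetry_free_env())
    return completed.returncode
```

Building a copy of the environment keeps the setting scoped to the child process instead of changing it for the whole session.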
Perfect, @plaguss!
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4208-ki24f765kq-no.a.run.app
…ion dataset (#4342)

# Description

Closes #4184

**Type of change**

- [ ] Documentation update

**How Has This Been Tested**

- [x] `sphinx-autobuild` (read [Developer Documentation](https://docs.argilla.io/en/latest/community/developer_docs.html#building-the-documentation) for more details)

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)
18815fd to 1ae526c
# Description

Closes #4187

**Type of change**

- [ ] Documentation update

**How Has This Been Tested**

- [ ] `sphinx-autobuild` (read [Developer Documentation](https://docs.argilla.io/en/latest/community/developer_docs.html#building-the-documentation) for more details)

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)
…tion dataset (#4350)

# Description

Closes #4185

**Type of change**

- [x] Documentation update

**How Has This Been Tested**

- [x] `sphinx-autobuild` (read [Developer Documentation](https://docs.argilla.io/en/latest/community/developer_docs.html#building-the-documentation) for more details)

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)
docs: changed to spacy training instead of trf
* develop: (41 commits)
  chore: update dev version
  chore: update CHANGELOG.md before release v1.20.0 (#4357)
  docs: temporal update to indicate persistent storage (#4355)
  docs: add suggestions and responses filters and sorting (#4345)
  docs: add end2end example on creating a basic text-classification dataset (#4208)
  Fix/responses suggestions filter fine tune (#4356)
  Fix/responses suggestions filter fine tune (#4356)
  fix: Accept draft responses on dataset records creation (#4354)
  Feature/responses operator (#4352)
  Feature/responses operator (#4352)
  chore: increase dev version release to 1.21.0
  chore: remove dev suffix for release branch
  fix: responses and suggestions filter QA (#4337)
  feat: delete suggestion from record on search engine (#4336)
  feat: update suggestion from record on search engine (#4339)
  bug: fix bug and update test (#4341)
  fix: preserve `TextClassificationSettings.label_schema` order (#4332)
  Update issue templates
  feat: 🚀 support for filtering and sorting by responses and suggestions (#4160)
  fix: handling errors for non-existing endpoints (#4325)
  ...

# Conflicts:
#   frontend/v1/domain/entities/question/Question.ts
#   frontend/v1/domain/entities/record/Record.ts
* develop: (21 commits)
  ✨ Fix error handling in axios plugin for 401 (#4362)
  docs: Change `telemetry` section in tutorials to directly executable cells (#4399)
  docs: add faq files (#4363)
  fix: pinning `pytest-asyncio` to version `0.21.1` to avoid problems running unit tests on GitHub workflows (#4395)
  docs: add making most of markdown to tutorial page (#4376)
  Fixing typo in Fine Tuning LLMs Practical Guides (#4392)
  Token Classification epochs parameter trainer changed (#4393)
  docs: align practical guidescreate datasethtml with end2end examples structure (#4375)
  docs: hugging face space url (#4379)
  docs: extend using proxy section (#4368)
  chore: update dev version
  chore: update CHANGELOG.md before release v1.20.0 (#4357)
  docs: temporal update to indicate persistent storage (#4355)
  docs: add suggestions and responses filters and sorting (#4345)
  docs: add end2end example on creating a basic text-classification dataset (#4208)
  Fix/responses suggestions filter fine tune (#4356)
  Fix/responses suggestions filter fine tune (#4356)
  fix: Accept draft responses on dataset records creation (#4354)
  Feature/responses operator (#4352)
  Feature/responses operator (#4352)
  ...
Description
This PR includes 2 features towards the #4178 issue.
`FeedbackDataset` for `text-classification`.
Closes #4179 and #4220
Type of change
How Has This Been Tested
`sphinx-autobuild` (read Developer Documentation for more details)
Checklist
`CHANGELOG.md` file (See https://keepachangelog.com/)