-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adding utils module and functions #4121
feat: adding utils module and functions #4121
Conversation
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4121-ki24f765kq-no.a.run.app |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
def create_token_highlights( | ||
tokens: List[str], weights: List[float], c_map: Optional[Union[str, Callable]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokens: List[str], weights: List[float], c_map: Optional[Union[str, Callable]] = None | |
tokens: List[str], weights: List[float], c_map: Optional[Union[str, Callable]] = "viridis" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sdiazlor we would also need to make some changes later in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidberenstein1957 yeah, of course, let me know to talk more about the approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sdiazlor we just need to remove the if None check we use lateron
} | ||
|
||
|
||
def media_to_html(media_type: str, path: str, file_type: Optional[str] = None) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, could we also potentially allow for passing a byte string directly instead of passing a path? Perhaps we can check to allow for a path or non-b64encoded string.
} | ||
|
||
|
||
def media_to_html(media_type: str, media_source: Union[str, bytes], file_type: Optional[str] = None) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is bytes
defined?
] | ||
|
||
# Get the color map if set to None or not indicated | ||
if c_map is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this part
Hi @sdiazlor, I think we can just use a small or a subset of the examples to showcase the point without requiring users to download a lot of content from the HF datasets for the Perhaps we can add a reference to this section to the tutorial about making most of markdown? https://docs.argilla.io/en/latest/practical_guides/create_dataset.html#define-fields |
@sdiazlor, this looks great :) |
@sdiazlor , can you also add the same |
@davidberenstein1957 I added in assignment.py the whole assign_records logic (including the part for the docs), let me know your comments |
# Step 3: push dataset/s | ||
|
||
|
||
def assign_workspaces(assignments, workspace_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thionk we should add some docstrings and type hinting here
for group, users in assignments.items(): | ||
if workspace_type == "group": | ||
workspace_name = group | ||
user_ids = [User.from_name(username).id for username in users.keys()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also check if the user exists
assignments = {user.username: [] for user in users} | ||
num_users = len(users) | ||
|
||
for idx, record in enumerate(records): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can use a rich progress bar here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns: | ||
A dictionary where keys are usernames and values are lists of assigned records. | ||
""" | ||
if overlap < 0 or overlap >= len(users): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say overlap is a percentage between 0 and 1 right? Do you think it might be easy to add this? @nataliaElv what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you may want to have 2 or 3 annotators giving responses to one record, so this could be greater than 1? But let me know if I'm misunderstanding what the function is doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right @nataliaElv, I thought wrongly about this.l
num_groups = len(group_names) | ||
|
||
group_records = defaultdict(list) | ||
for idx, record in enumerate(records): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can add a rich progress bar here?
Raises: | ||
ValueError: If `overlap` is higher than the number of groups or negative. | ||
""" | ||
if overlap < 0 or overlap >= len(groups.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as below about opverlap in percentage and # per users
…add-a-utils-module-to-the-package
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4121 +/- ##
============================================
- Coverage 90.13% 64.93% -25.21%
============================================
Files 233 324 +91
Lines 12493 18708 +6215
============================================
+ Hits 11261 12148 +887
- Misses 1232 6560 +5328
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sdiazlor, this looks great and is almost ready to be merged. Can you also update the docs w.r.t.
https://docs.argilla.io/en/latest/practical_guides/assign_records.html
https://docs.argilla.io/en/latest/getting_started/installation/configurations/image_support.html
Additionally, token highlights have not been referenced anywhere and might be mentioned somewhere in the docs or tutorial. Where do you think it should fit?
Also the changelog can be slightly extended :)
Regarding the token highlights, I'll mention them in the tutorial, I think that maybe add them when referencing to markdown in create a dataset is too much info as we also mention media support |
```{note} | ||
If you haven't done so already, decide on the settings of the project (the `fields`, `questions` and `guidelines`) as detailed in the [Create a Feedback Dataset guide](/practical_guides/create_dataset.md#feedback-dataset) and set those as variables. | ||
from argilla.client.feedback.utils import assign_records | ||
assignments = assign_records(users, records, 1, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can use the explicit variable names users=users
, records=records
etc.
``` | ||
::: | ||
from argilla.client.feedback.utils import assign_workspaces | ||
wk_assignments = assign_workspaces(assignments, "individual") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use assignments
like it is use below
fields = [...] | ||
questions = [...] | ||
guidelines = "..." | ||
# assign_workspace(assignments, workspace_type): to check the workspaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here you can also use explicit variable name
assign_workspace(
assignments=assignments,
workspace_type="group"
)
you can add additional context about the workspace types outside of the code block
@davidberenstein1957 I updated the last changes |
<!-- Thanks for your contribution! As part of our Community Growers initiative 🌱, we're donating Justdiggit bunds in your name to reforest sub-Saharan Africa. To claim your Community Growers certificate, please contact David Berenstein in our Slack community or fill in this form https://tally.so/r/n9XrxK once your PR has been merged. --> # Description Adding a new utils module : * submodule: HTML (media_to_html and create_tokens_highlights) * submodule: records (assign records) Closes #4030 Closes #4003 Closes #3803 Closes #3928 Closes #4031 **Type of change** (Please delete options that are not relevant. Remember to title the PR according to the type of change) - [x] New feature (non-breaking change which adds functionality) - [ ] Refactor (change restructuring the codebase without changing functionality) - [ ] Improvement (change adding some improvement to an existing functionality) **How Has This Been Tested** (Please describe the tests that you ran to verify your changes. And ideally, reference `tests`) - [ ] Test A - [ ] Test B **Checklist** - [ ] I added relevant documentation - [ ] I followed the style guidelines of this project - [x] I did a self-review of my code - [ ] I made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above) - [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/) --------- Co-authored-by: David Berenstein <[email protected]>
* develop: (30 commits) chore: increase dev version release to 1.21.0 fix: responses and suggestions filter QA (#4337) feat: delete suggestion from record on search engine (#4336) feat: update suggestion from record on search engine (#4339) bug: fix bug and update test (#4341) fix: preserve `TextClassificationSettings.label_schema` order (#4332) Update issue templates feat: 🚀 support for filtering and sorting by responses and suggestions (#4160) fix: handling errors for non-existing endpoints (#4325) feat: adding utils module and functions (#4121) Update labels in github workflows (#4315) fix: correct unification implementation for `RankingQuestionStrategy` (#4295) fix: update to solve the error of integration tests in CI (#4314) docs: revisit install process (#4261) feat: increase timeout minutes for python tests (#4307) docs: docs export dataset does not apply coloring for code snippets (#4296) docs: update final section of the rag haystack blog post (#4294) feat: add multi_modal templates and update vector setting (#4283) feat: better logging bar for FeedbackDataset (#4267) refactor: ArgillaTrainer for unified variable usage (#4214) ... # Conflicts: # frontend/v1/infrastructure/repositories/RecordRepository.ts
Description
Adding a new utils module :
Closes #4030
Closes #4003
Closes #3803
Closes #3928
Closes #4031
Type of change
(Please delete options that are not relevant. Remember to title the PR according to the type of change)
How Has This Been Tested
(Please describe the tests that you ran to verify your changes. And ideally, reference
tests
)Checklist
CHANGELOG.md
file (See https://keepachangelog.com/)