Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate chunk_size in favor of batch_size for rg.log #2455

Merged
merged 2 commits into from
Mar 6, 2023

Conversation

tomaarsen
Copy link
Contributor

Closes #2453

Pull Request overview

  • Deprecate chunk_size in favor of batch_size in rg.log:
    • Move chunk_size to the end of all related signatures.
    • Set chunk_size default to None and update the typing accordingly.
    • Introduce batch_size in the old position in the signature.
    • Update docstrings accordingly.
    • Introduce a FutureWarning if chunk_size is used.
  • Introduce test showing that rg.log(..., chunk_size=100) indeed throws a warning.
  • Update a warning to no longer include a newline and a lot of spaces (see first comment of this PR)

Details

Note that this deprecation is non-breaking: Code that uses chunk_size will still work, as batch_size = chunk_size after a FutureWarning is given, if chunk_size is not None.

Discussion

  • Should I use a FutureWarning? Or a DeprecationWarning? Or a PendingDeprecationWarning? The last two make sense, but they are ignored by default, I'm afraid.
  • Is the deprecation message in the format that we like?

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested

I introduced a test, and ran all tests.

Checklist

  • I have merged the original branch into my forked branch
  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works

  • Tom Aarsen

@codecov
Copy link

codecov bot commented Mar 2, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.38 ⚠️

Comparison is base (f5834a5) 92.00% compared to head (f579949) 91.63%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2455      +/-   ##
===========================================
- Coverage    92.00%   91.63%   -0.38%     
===========================================
  Files          162      162              
  Lines         7896     7899       +3     
===========================================
- Hits          7265     7238      -27     
- Misses         631      661      +30     
Flag Coverage Δ
pytest 91.63% <100.00%> (-0.38%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/argilla/client/api.py 97.67% <ø> (ø)
src/argilla/client/client.py 87.70% <100.00%> (+0.56%) ⬆️
.../server/daos/backend/client_adapters/opensearch.py 83.26% <0.00%> (-8.18%) ⬇️
...lla/server/daos/backend/client_adapters/factory.py 76.31% <0.00%> (-5.27%) ⬇️
...gilla/labeling/text_classification/label_errors.py 86.41% <0.00%> (-3.71%) ⬇️
...rgilla/server/daos/backend/search/query_builder.py 90.75% <0.00%> (-2.90%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@davidberenstein1957 davidberenstein1957 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FutureWarning seems fine to me. I found some other chunk-size references within the scan_data_response methods for src/argilla/server/apis/v0/handlers. IMO these can also be renamed, but @frascuchon not sure if you agree.

@frascuchon these methods

src/argilla/client/client.py Show resolved Hide resolved
src/argilla/client/client.py Show resolved Hide resolved
tests/client/test_api.py Show resolved Hide resolved
@davidberenstein1957
Copy link
Member

davidberenstein1957 commented Mar 2, 2023 via email

@tomaarsen
Copy link
Contributor Author

tomaarsen commented Mar 2, 2023

I'll include it in the other PR that I'm preparing :)
Edit: I will not. See #2455 (comment)

@frascuchon frascuchon added this to the v1.4.0 milestone Mar 2, 2023
@frascuchon
Copy link
Member

FutureWarning seems fine to me. I found some other chunk-size references within the scan_data_response methods for src/argilla/server/apis/v0/handlers. IMO these can also be renamed, but @frascuchon not sure if you agree.

@frascuchon these methods

No worries about that. These endpoints are not used anymore and will be removed in v1.5.0

@frascuchon frascuchon merged commit 3ebea76 into argilla-io:develop Mar 6, 2023
@tomaarsen tomaarsen deleted the deprecate/log_chunk_size branch March 6, 2023 13:01
@frascuchon frascuchon mentioned this pull request Mar 8, 2023
frascuchon added a commit that referenced this pull request Mar 9, 2023
# [1.4.0](v1.3.1...v1.4.0)
(2023-03-09)

### Features

* `configure_dataset` accepts a workspace as argument
([#2503](#2503))
([29c9ee3](29c9ee3)),
* Add `active_client` function to main argilla module
([#2387](#2387))
([4e623d4](4e623d4)),
closes [#2183](#2183)
* Add text2text support for prepare for training spark nlp
([#2466](#2466))
([21efb83](21efb83)),
closes [#2465](#2465)
[#2482](#2482)
* Allow passing workspace as client param for `rg.log` or `rg.load`
([#2425](#2425))
([b3b897a](b3b897a)),
closes [#2059](#2059)
* Bulk annotation improvement
([#2437](#2437))
([3fce915](3fce915)),
closes [#2264](#2264)
* Deprecate `chunk_size` in favor of `batch_size` for `rg.log`
([#2455](#2455))
([3ebea76](3ebea76)),
closes [#2453](#2453)
* Expose `batch_size` parameter for `rg.load`
([#2460](#2460))
([e25be3e](e25be3e)),
closes [#2454](#2454)
[#2434](#2434)
* Extend shortcuts to include alphabet for token classification
([#2339](#2339))
([4a92b35](4a92b35))


### Bug Fixes

* added flexible app redirect to docs page
([#2428](#2428))
([5600301](5600301)),
closes [#2377](#2377)
* added regex match to set workspace method
([#2427](#2427))
([d789fa1](d789fa1)),
closes [#2388]
* error when loading record with empty string query
([#2429](#2429))
([fc71c3b](fc71c3b)),
closes [#2400](#2400)
[#2303](#2303)
* Remove extra-action dropdown state after navigation
([#2479](#2479))
([9328994](9328994)),
closes [#2158](#2158)


### Documentation

* Add AutoTrain to readme
([7199780](7199780))
* Add migration to label schema section
([#2435](#2435))
([d57a1e5](d57a1e5)),
closes [#2003](#2003)
[#2003](#2003)
* Adds zero+few shot tutorial with SetFit
([#2409](#2409))
([6c679ad](6c679ad))
* Update readme with quickstart section and new links to guides
([#2333](#2333))
([91a77ad](91a77ad))


## As always, thanks to our amazing contributors!
- Documentation update: adding missing n (#2362) by @Gnonpi
- feat: Extend shortcuts to include alphabet for token classification
(#2339) by @cceyda
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate the current chunk_size in favour of batch_size for rg.log
3 participants