Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lint missing_iterator_fold #12211

Closed

Conversation

Philippe-Cholet
Copy link

@Philippe-Cholet Philippe-Cholet commented Jan 29, 2024

fixes #12209

Specialize fold might improve performance of methods relying on it and other iterator adaptors calling fold.

changelog: [missing_iterator_fold]: Check that Iterator implementations specialize the fold method.

I am still questioning myself about the lint name, is "missing" appropriate?

I am unsure if in_external_macro check is needed or not in this situation.

Note: I would (later) suggest similar lints for DoubleEndedIterator::rfold (and Iterator::try_fold/DoubleEndedIterator::try_rfold except they currently can't be specialized on Rust stable since the trait Try is still unstable) and Iterator::size_hint. All those lints could solved with a single LateLintPass.

@rustbot
Copy link
Collaborator

rustbot commented Jan 29, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @blyxyas (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Jan 29, 2024
Copy link
Member

@blyxyas blyxyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good first iteration! Thanks for the contribution, I have some small questions and nits, but this is a very good state.


impl LateLintPass<'_> for MissingIteratorFold {
fn check_item(&mut self, cx: &LateContext<'_>, item: &Item<'_>) {
if rustc_middle::lint::in_external_macro(cx.sess(), item.span) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally import this function

Comment on lines 71 to 72
.map(|assoc| assoc.ident.name.as_str())
.any(|name| name == "fold");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map can be removed to avoid walking over the iterator (and allocating memory) more than necessary

Suggested change
.map(|assoc| assoc.ident.name.as_str())
.any(|name| name == "fold");
.any(|assoc| assoc.ident.name.as_str() == "fold");

/// ```
#[clippy::version = "1.77.0"]
pub MISSING_ITERATOR_FOLD,
nursery,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm... what category do you think that this should be? I'm thinking about perf, but the lint may be too demanding to be put in perf

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely about performance but there is also pedantic because this is "rather strict".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually actionable in all cases? As in, can you always provide a more useful implementation than the default fold implementation? Because if not, I'd go as far as saying it could be a restriction lint. For example, std's Range Iterator implementation also does not implement fold and I'm not sure how/why it should

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason why specialize fold would necessarily improve performance in the general case. It certainly is in some cases but not all.
This is definitely a lint that should be allowed in some cases after some consideration (and benchmark). This is exactly our case at rust-itertools.
Not sure why either about Range (while RangeInclusive does if I remember well).

Copy link
Author

@Philippe-Cholet Philippe-Cholet Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the category to restriction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the sake of completion, could you add another struct but implement fold, just to make sure that it doesn't lint then?

@Philippe-Cholet
Copy link
Author

Philippe-Cholet commented Feb 1, 2024

@blyxyas Hi! Each point was addressed. Is there anything else or should I just squash my commits? (and rebase?)

Should I suggest a blank fold specialization? (fn fold... { todo!() })

@blyxyas
Copy link
Member

blyxyas commented Feb 3, 2024

Should I suggest a blank fold specialization

I don't think so, that would prompt users to just leave it unchanged.

Note: I would (later) suggest similar lints for DoubleEndedIterator::rfold (and Iterator::try_fold/DoubleEndedIterator::try_rfold except they currently can't be specialized on Rust stable since the trait Try is still unstable) and Iterator::size_hint. All those lints could solved with a single LateLintPass.

Hmmmm... These should be the same lint imo. Creating 3 - 4 lints just for slightly different methods would be...


Also, do you have an example where manually implementing fold is beneficial for performance? I'm not seeing how it could improve it.

@Philippe-Cholet
Copy link
Author

Philippe-Cholet commented Feb 3, 2024

rust-itertools/itertools#775 had benchmarks 34 times faster but that's the exception. Most other fold specializations are "only" up to 2-4 times faster like rust-itertools/itertools#849 and rust-itertools/itertools#848. Some others are only (very roughly) 20% faster.
Some other do not benefit from a specialization and allow this lint would help document that.
Have the lint would also help maintain a massive TODO list and prevent us from adding a new iterator without considering a fold specialization.

Have a single lint for size_hint/[try_][r]fold/... would be bad for flexibility. We would not be able to allow a lint for one of the method without allowing it for others, even with a crate-level configuration like disallowed_types has.

However, I clearly understand that you don't want a bunch of lints.

  • The current lint could simultanously check for missing DoubleEndedIterator::rfold specializations as well, I have no problem with that because it's on different implementations.
  • One for size_hint seems important to me as a more precise size hint can allow a single allocation (e.g. when collecting to a vector or any collection) and prevent repetive slow reallocations.
  • One for try_fold/try_rfold seems reasonable to me but to be specialized, it needs to name the unstable Try trait. However, I noticed a few lints about nightly features such as empty_enum.
    This one can wait for Try to be stable, but it deserved to be mentionned.

Leading to a total of three lints (in a same lint pass).

Most Iterator (non-adaptor) methods rely on [try_][r]fold by default (count/last/for_each/partition/all/any/find/find_map/[r]position/reduce/(max|min)[_by[_key]]/[partial_]cmp[_by]/eq[_by]/ne/lt/le/gt/ge and some try_ variants). Have [try_][r]fold methods faster makes all others faster with them and all adaptors relying on them too.
The methods not relying on them is somehow short (nth/by_ref/[try_]collect/unzip, probably one or two more).

I think that specialize those are good practices and can have huge impact on the Iterator ecosystem.
I think that such (somehow limited) inclusions would be beneficial to more than rust-itertools. But it's your call.

EDIT: Maybe merge [try_][r]fold to a single lint, but I'm not fan.

EDIT 2: [try_]fold by default repeatedly calls next. But next might update some internal state (like an index) that we won't need once the iterator is finished (for fold) or maybe we can update the internals only once (for try_fold). Specialize [try_]fold might allow to avoid repetitive updates of the internals. And if we adapt an iterator, we can use the specializations of the adapted iterator for our own specializations.

@Philippe-Cholet
Copy link
Author

Before this PR, I took advice on zulip and the discussion here eventually took place back on zulip.
The discussion was insightful and lead me to close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

missing_iterator_fold
4 participants