Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closures in generic code can create duplicate monomorphizations #83010

Closed
tgnottingham opened this issue Mar 11, 2021 · 4 comments
Closed

Closures in generic code can create duplicate monomorphizations #83010

tgnottingham opened this issue Mar 11, 2021 · 4 comments
Labels
A-typesystem Area: The type system I-compiletime Issue: Problems and improvements with respect to compile times. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tgnottingham
Copy link
Contributor

tgnottingham commented Mar 11, 2021

Summary

When a closure is defined in a generic context like a generic function or a method of a generic class, its type includes, in some sense or another, the parent's generic parameters. If the closure doesn't depend on some of those generic parameters, it can lead to the instantiation of mono items which are functionally identical, but distinct by virtue of false dependencies on unused generic parameters of the parent. I'll call these mono items "duplicates".

These duplicates create needless work and code bloat. At the very least, they go through the codegen and optimization pipeline, and if not optimized away, are included in the binary. I imagine there is other overhead, such as creation of more distinct types.

Example 1

fn foo<T: std::fmt::Display>(t: T) {
    let _ = vec!["hello".to_string()].iter().map(|x| x);
    println!("{}", t);
}

fn main() {
    foo(0i32);
    foo(0i64);
}

The map call is passed a closure |x| x which does not depend the parent function's generic parameter T. The code for that closure is identical in both foo::<i32> and foo::<i64>, so we should be able to only generate one map instantiation and Map type (the return type of map, which is generic over the closure that map was passed), and use it in both foo instances. Yet this creates two map instantiations, and two Map::new instantiations, because the closure type is polluted by the parent's T parameter, which, again, the closure doesn't use in any way.

Excerpt from -Z print-mono-items=lazy:

MONO_ITEM fn <std::slice::Iter<std::string::String> as std::iter::Iterator>::map::<&std::string::String, [closure@src/main.rs:2:50: 2:55]> @@ lzylnwmetevawv6[External]
MONO_ITEM fn <std::slice::Iter<std::string::String> as std::iter::Iterator>::map::<&std::string::String, [closure@src/main.rs:2:50: 2:55]> @@ lzylnwmetevawv6[External]

MONO_ITEM fn std::iter::Map::<std::slice::Iter<std::string::String>, [closure@src/main.rs:2:50: 2:55]>::new @@ 290s40pyalavf195[External]
MONO_ITEM fn std::iter::Map::<std::slice::Iter<std::string::String>, [closure@src/main.rs:2:50: 2:55]>::new @@ 290s40pyalavf195[External]

(I don't know why, but there weren't any mono items in the output for the closure itself in this example.)

If we actually did anything interesting with the Maps, it would create yet more duplicate instantiations. And if the closure captured anything by value that needed dropping, it would create duplicate drop_in_place instantiations for the closure and Map types.

Example 2

This isn't a rare event. Here's an example from the standard library which was quickly and easily found. This case is different, because the parent generics come from the struct for which the function is implemented:

impl<K, V, S> PartialEq for HashMap<K, V, S>
where
    K: Eq + Hash,
    V: PartialEq,
    S: BuildHasher,
{
    fn eq(&self, other: &HashMap<K, V, S>) -> bool {
        if self.len() != other.len() {
            return false;
        }

        self.iter().all(|(key, value)| other.get(key).map_or(false, |v| *value == *v))
    }
}

The closure passed to map_or really only depends on V, yet it has false dependencies on K and S. Creating two HashMaps with the same value type will cause duplicate mono items to be created for this closure and map_or:

use std::collections::HashMap;

fn main() {
    let m1 = HashMap::<String, u8>::new();
    assert_eq!(m1, m1);

    let m2 = HashMap::<i32, u8>::new();
    assert_eq!(m2, m2);
}

Excerpt from -Z print-mono-items=lazy:

MONO_ITEM fn <std::collections::HashMap<i32, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0} @@ i6hou2d2a3r4xaf[External]
MONO_ITEM fn <std::collections::HashMap<std::string::String, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0} @@ i6hou2d2a3r4xaf[External]

MONO_ITEM fn std::option::Option::<&u8>::map_or::<bool, [closure@<std::collections::HashMap<i32, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0}]> @@ 583fmf0zweyhx5vp[External]
MONO_ITEM fn std::option::Option::<&u8>::map_or::<bool, [closure@<std::collections::HashMap<std::string::String, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0}]> @@ 583fmf0zweyhx5vp[External]

Impact

Okay, so there are some duplicate mono items. Maybe even a lot in a real codebase. But is it impactful? I haven't measured, but I suspect it is. Take a look at the Rust and assembly for Option::map_or (for the above closure) in debug mode, and consider the above example:

    pub fn map_or<U, F: FnOnce(T) -> U>(self, default: U, f: F) -> U {
        match self {
            Some(t) => f(t),
            None => default,
        }
    }
Assembly
<<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>>:
	                sub    $0x58,%rsp
	                mov    %rdi,0x18(%rsp)
	                mov    %sil,%al
	                and    $0x1,%al
	                mov    %al,0x37(%rsp)
	                mov    %rdx,0x38(%rsp)
	                movb   $0x0,0x36(%rsp)
	                movb   $0x0,0x35(%rsp)
	                movb   $0x1,0x36(%rsp)
	                movb   $0x1,0x35(%rsp)
	                mov    0x18(%rsp),%rcx
	                test   %rcx,%rcx
	                setne  %al
	                movzbl %al,%r8d
	                mov    %r8d,%ecx
	                mov    %rdx,0x10(%rsp)
	                mov    %sil,0xf(%rsp)
	         /----- je     <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x4d>
	         |  /-- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x4b>
	      /--|--\-X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x60>
	      |  \----> movb   $0x0,0x36(%rsp)
	      |         mov    0xf(%rsp),%al
	      |         and    $0x1,%al
	      |         mov    %al,0x27(%rsp)
	/-----|-------- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xac>
	|     |         ud2    
	|     \-------> mov    0x18(%rsp),%rax
	|               mov    %rax,0x40(%rsp)
	|               movb   $0x0,0x35(%rsp)
	|               mov    %rax,0x28(%rsp)
	|               mov    0x28(%rsp),%rsi
	|               mov    0x10(%rsp),%rdi
	|               callq  <<std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>
	|               mov    %al,0xe(%rsp)
	|           /-- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x89>
	|           \-> mov    0xe(%rsp),%al
	|               and    $0x1,%al
	|               mov    %al,0x27(%rsp)
	+-------------- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xac>
	|  /----------> testb  $0x1,0x36(%rsp)
	|  |  /-------- jne    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb5>
	|  |  |  /----> mov    0x27(%rsp),%al
	|  |  |  |      and    $0x1,%al
	|  |  |  |      movzbl %al,%eax
	|  |  |  |      add    $0x58,%rsp
	|  |  |  |      retq   
	|  +--|--|--/-X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x95>
	\--|--|--|--|-> testb  $0x1,0x35(%rsp)
	   |  |  |  \-- jne    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xaa>
	   \--|--|----- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x95>
	      \--\----X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x9c>
	         /--/-X jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xc0>
	      /--|--|-> testb  $0x1,0x36(%rsp)
	      |  |  \-- jne    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb7>
	      |  \----> mov    0x48(%rsp),%rdi
	      |         callq  <_Unwind_Resume@plt>
	      |         ud2    
	      |         mov    %rax,0x48(%rsp)
	      |         mov    %edx,0x50(%rsp)
	      \-------- jmp    <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb9>
	                nopw   0x0(%rax,%rax,1)

(Wow, that's surprisingly complicated.)

In debug mode, this code is included in the binary twice, once for each closure. In release mode, the code may be optimized out, but it's still processed for some part of compilation. This was a trivial example. Imagine if map_or was a complicated function!

If there are hundreds or thousands of duplicate mono items in moderate to large codebases, which seems quite possible, then this may have a noticeable impact on compile times and binary sizes.

Solution

If possible, we should make closure types depend only on the parent generics which they actually use.

The ClosureSubsts comment explains why parent generics are included:

/// All right, you say, but why include the type parameters from the
/// original function then? The answer is that codegen may need them
/// when monomorphizing, and they may not appear in the upvars. A
/// closure could capture no variables but still make use of some
/// in-scope type parameter with a bound (e.g., if our example above
/// had an extra `U: Default`, and the closure called `U::default()`).
///
/// There is another reason. This design (implicitly) prohibits
/// closures from capturing themselves (except via a trait
/// object). This simplifies closure inference considerably, since it
/// means that when we infer the kind of a closure or its upvars, we
/// don't have to handle cycles where the decisions we make for
/// closure C wind up influencing the decisions we ought to make for
/// closure C (which would then require fixed point iteration to
/// handle). Plus it fixes an ICE. :P

See #27087, where this was introduced. #27086 may also be relevant. Here and here are FIXMEs for that issue, later removed.

I have a feeling the case where the parent generics come from a generic struct on which the function including the closure is defined (example 2) may be harder to solve.

@rustbot label T-compiler A-typesystem I-compiletime I-heavy

@rustbot rustbot added A-typesystem Area: The type system I-compiletime Issue: Problems and improvements with respect to compile times. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 11, 2021
@Aaron1011
Copy link
Member

I believe this is polymorphization: https://rust-lang.github.io/compiler-team/working-groups/polymorphization/

@tgnottingham
Copy link
Contributor Author

tgnottingham commented Mar 11, 2021

Yeah, polymorphization might be a solution. Though, if it's feasible to just not depend on the unused parent generics, that would be preferable over polymorphization, since it means the duplicates are never even introduced. And -Z polymorphize isn't smart enough to handle this yet either.

@panstromek
Copy link
Contributor

Note for context: The original motivating issue which was closed by polymorhization talks about basically the same problem: #46477

@tgnottingham
Copy link
Contributor Author

Closing this since it's redundant. Solving this by simply treating two types -- which are otherwise equivalent except that they contain different unused generics -- as equivalent, as I originally thought might be possible as a kind of alternative to polymorphization, is probably not viable, at the very least because the two types still need to have different TypeIds (see #75325). There are probably other reasons why it's not viable, but one reason is good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-typesystem Area: The type system I-compiletime Issue: Problems and improvements with respect to compile times. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants