
Investigate memory usage of compiling the packed_simd crate #57829

Closed
hsivonen opened this issue Jan 22, 2019 · 14 comments · Fixed by #58207
Labels
A-simd Area: SIMD (Single Instruction Multiple Data) I-compilemem Issue: Problems and improvements with respect to memory usage during compilation.

Comments

@hsivonen
Member

Steps to reproduce

  1. Create a new crate with cargo.
  2. Add `packed_simd = "0.3.1"` to the Cargo.toml of the new crate.
  3. Build the new crate.
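For step 2, the dependency section of the new crate's Cargo.toml would look like this (version pinned to the one reported above):

```toml
[dependencies]
packed_simd = "0.3.1"
```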

Actual results

While compiling packed_simd, rustc takes more than 2 GB of RAM.

Expected results

Lower RAM usage.

Additional info

Maybe it's just the nature of packed_simd that it takes a lot of RAM to compile, and there's no bug. However, if RAM usage reached 3 GB in the future, the crate would become unbuildable on 32-bit systems. It might be worthwhile to investigate whether building packed_simd has to take this much RAM, or whether there is an opportunity to use less RAM without adversely affecting compilation speed on systems that have plenty of RAM.

@gnzlbg
Contributor

gnzlbg commented Jan 22, 2019

cc @mw @nnethercote

@Centril Centril added I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. A-simd Area: SIMD (Single Instruction Multiple Data) labels Jan 22, 2019
@matthiaskrgr
Member

Looks like NLL needs a lot of memory here:

   Compiling packed_simd v0.3.1
  time: 0.054; rss: 57MB	parsing
  time: 0.000; rss: 58MB	attributes injection
  time: 0.000; rss: 58MB	recursion limit
  time: 0.000; rss: 58MB	crate injection
  time: 0.000; rss: 58MB	plugin loading
  time: 0.000; rss: 58MB	plugin registration
  time: 0.005; rss: 58MB	pre ast expansion lint checks
    time: 2.550; rss: 369MB	expand crate
    time: 0.000; rss: 369MB	check unused macros
  time: 2.550; rss: 369MB	expansion
  time: 0.000; rss: 369MB	maybe building test harness
  time: 0.012; rss: 369MB	maybe creating a macro crate
  time: 0.048; rss: 370MB	creating allocators
  time: 0.036; rss: 370MB	AST validation
  time: 0.497; rss: 412MB	name resolution
  time: 0.075; rss: 412MB	complete gated feature checking
  time: 0.321; rss: 481MB	lowering ast -> hir
  time: 0.081; rss: 482MB	early lint checks
    time: 0.052; rss: 504MB	validate hir map
  time: 0.353; rss: 504MB	indexing hir
  time: 0.000; rss: 504MB	load query result cache
  time: 0.000; rss: 504MB	looking for entry point
  time: 0.000; rss: 504MB	dep graph tcx init
  time: 0.001; rss: 504MB	looking for plugin registrar
  time: 0.001; rss: 504MB	looking for derive registrar
  time: 0.019; rss: 504MB	loop checking
  time: 0.024; rss: 504MB	attribute checking
    time: 0.000; rss: 515MB	solve_nll_region_constraints(DefId(0/1:2171 ~ packed_simd[a932]::v64[0]::f32x2[0]::{{constant}}[0]))
*snip*
    time: 0.000; rss: 527MB	solve_nll_region_constraints(DefId(0/1:4611 ~ packed_simd[a932]::vSize[0]::{{impl}}[587]::from[0]::U[0]::array[0]::{{constant}}[0]))
  time: 0.636; rss: 527MB	stability checking
  time: 0.124; rss: 527MB	type collecting
  time: 0.003; rss: 527MB	outlives testing
  time: 0.019; rss: 527MB	impl wf inference
    time: 0.000; rss: 1113MB	solve_nll_region_constraints(DefId(0/1:224 ~ packed_simd[a932]::codegen[0]::shuffle[0]::{{impl}}[0]::{{constant}}[0]))
*snip*
    time: 0.000; rss: 1246MB	solve_nll_region_constraints(DefId(0/1:4867 ~ packed_simd[a932]::vPtr[0]::{{impl}}[104]::{{constant}}[0]))
  time: 9.972; rss: 1408MB	coherence checking
  time: 0.002; rss: 1408MB	variance testing
    time: 0.000; rss: 1605MB	solve_nll_region_constraints(DefId(0/1:366 ~ packed_simd[a932]::codegen[0]::v16[0]::{{impl}}[0]::NT[0]::{{constant}}[0]))
*snip*
    time: 0.000; rss: 2013MB	solve_nll_region_constraints(DefId(0/0:4027 ~ packed_simd[a932]::codegen[0]::reductions[0]::mask[0]::{{impl}}[7]::any[0]))
    time: 0.000; rss: 2013MB	solve_nll_region_constraints(DefId(0/0:4053 ~ packed_simd[a932]::codegen[0]::reductions[0]::mask[0]::{{impl}}[17]::any[0]))
  time: 5.040; rss: 2013MB	MIR borrow checking
  time: 0.000; rss: 2013MB	dumping chalk-like clauses
  time: 0.005; rss: 2013MB	MIR effect checking
  time: 0.072; rss: 2018MB	death checking
  time: 0.021; rss: 2018MB	unused lib feature checking
  time: 0.176; rss: 2019MB	lint checking
  time: 0.000; rss: 2019MB	resolving dependency formats
    time: 0.890; rss: 2055MB	write metadata
      time: 0.010; rss: 2055MB	collecting roots
      time: 0.186; rss: 2056MB	collecting mono items
    time: 0.196; rss: 2056MB	monomorphization collection
    time: 0.001; rss: 2056MB	codegen unit partitioning
    time: 0.122; rss: 2060MB	codegen to LLVM IR
    time: 0.000; rss: 2060MB	assert dep graph
    time: 0.000; rss: 2060MB	serialize dep graph
  time: 1.215; rss: 2060MB	codegen
    time: 0.056; rss: 2063MB	llvm function passes [packed_simd.smey8184-cgu.0]
    time: 0.777; rss: 2071MB	llvm module passes [packed_simd.smey8184-cgu.0]
    time: 0.798; rss: 2079MB	codegen passes [packed_simd.smey8184-cgu.0]
  time: 1.703; rss: 1539MB	LLVM passes
  time: 0.000; rss: 1540MB	serialize work products
  time: 0.017; rss: 1540MB	linking
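For reference, output like the above comes from rustc's `-Z time-passes` flag. To spot the peak quickly, the `rss:` column can be extracted with a few lines of code (a sketch; the line format is copied from the log above):

```rust
// Extract the rss value (in MB) from one `-Z time-passes` line such as
// `time: 0.054; rss: 57MB	parsing`; returns None for non-matching lines.
fn rss_mb(line: &str) -> Option<u64> {
    let rest = line.split("rss: ").nth(1)?;
    let digits: String = rest
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

fn main() {
    // A few lines taken from the log above.
    let log = "time: 0.054; rss: 57MB\tparsing\n\
               time: 5.040; rss: 2013MB\tMIR borrow checking\n\
               time: 1.703; rss: 1539MB\tLLVM passes";
    let peak = log.lines().filter_map(rss_mb).max().unwrap();
    println!("peak rss: {}MB", peak);
}
```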

@gnzlbg
Contributor

gnzlbg commented Jan 22, 2019

Coherence checking also takes a good chunk of memory:

time: 0.000; rss: 1246MB	solve_nll_region_constraints(DefId(0/1:4867 ~ packed_simd[a932]::vPtr[0]::{{impl}}[104]::{{constant}}[0]))
  time: 9.972; rss: 1408MB	coherence checking

although NLL is the first suspect here. I wonder why NLL uses this much memory; packed_simd is full of methods, but the great majority of them are essentially one-liners.

@memoryruins
Contributor

Reported the following spike of memory usage in #57432, which occurred after #56723:

[graph: packed_simd memory usage spike]

@mati865
Contributor

mati865 commented Jan 30, 2019

This one could be closed as duplicate of #57432 I guess.

@gnzlbg
Contributor

gnzlbg commented Jan 30, 2019

EDIT: @mati865 you are right, these are duplicates. I thought that was a different issue that apparently never got filed, so forget this.


original comment:

@mati865 while they are related, they are two different issues:

  • this issue is about compiling packed_simd itself, which started using much more memory recently, resulting in some builds failing for consumers (encoding-rs)

  • Compile time perf regression for packed-simd's max-rss #57432 is about increased compile-times and memory usage when compiling other crates when packed_simd is part of libcore (e.g. via core::simd)

@nnethercote
Contributor

I did a DHAT run. The "At t-gmax" measurement is the relevant one; it's short for "time of global max". It shows that the interning of constants within TypeFolder accounts for over 54% of the global peak:

AP 1.1.1.1.1/2 (2 children) {
  Total:     912,261,120 bytes (12.02%, 7,312.63/Minstr) in 6 blocks (0%, 0/Minstr), avg size 152,043,520 bytes, avg lifetime 103,155,024,513.33 instrs (82.69% of program duration)
  At t-gmax: 912,261,120 bytes (54.74%) in 6 blocks (0%), avg size 152,043,520 bytes
  At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  Reads:     1,827,458,569 bytes (4.97%, 14,648.81/Minstr), 2/byte
  Writes:    844,260,160 bytes (9.59%, 6,767.54/Minstr), 0.93/byte
  Allocated at {
    #1: 0xB66BCCB: alloc (alloc.rs:72)
    #2: 0xB66BCCB: alloc (alloc.rs:148)
    #3: 0xB66BCCB: allocate_in<u8,alloc::alloc::Global> (raw_vec.rs:96)
    #4: 0xB66BCCB: with_capacity<u8> (raw_vec.rs:140)
    #5: 0xB66BCCB: new<u8> (lib.rs:66)
    #6: 0xB66BCCB: arena::DroplessArena::grow (lib.rs:346)
    #7: 0x8C1BB25: alloc_raw (lib.rs:362)
    #8: 0x8C1BB25: alloc<rustc::ty::sty::LazyConst> (lib.rs:378)
    #9: 0x8C1BB25: alloc<rustc::ty::sty::LazyConst> (lib.rs:465)
    #10: 0x8C1BB25: intern_lazy_const (context.rs:1123)
    #11: 0x8C1BB25: <rustc::traits::project::AssociatedTypeNormalizer<'a, 'b, 'gcx, 'tcx> as rustc::ty::fold::TypeFolder<'gcx, 'tcx>>::fold_const (project.rs:423)
    #12: 0x8C1B235: fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:1049)
    #13: 0x8C1B235: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:719)
    #14: 0x8C1B235: <rustc::traits::project::AssociatedTypeNormalizer<'a, 'b, 'gcx, 'tcx> as rustc::ty::fold::TypeFolder<'gcx, 'tcx>>::fold_ty (project.rs:337)
    #15: 0x890C0D0: fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:769)
    #16: 0x890C0D0: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:135)
    #17: 0x890C0D0: fold_with<rustc::ty::subst::Kind,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #18: 0x890C0D0: {{closure}}<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:328)
    #19: 0x890C0D0: call_once<(&rustc::ty::subst::Kind),closure> (function.rs:279)
    #20: 0x890C0D0: map<&rustc::ty::subst::Kind,rustc::ty::subst::Kind,&mut closure> (option.rs:414)
    #21: 0x890C0D0: next<rustc::ty::subst::Kind,core::slice::Iter<rustc::ty::subst::Kind>,closure> (mod.rs:567)
    #22: 0x890C0D0: <smallvec::SmallVec<A> as core::iter::traits::collect::Extend<<A as smallvec::Array>::Item>>::extend (lib.rs:1349)
    #23: 0x8EF9787: from_iter<[rustc::ty::subst::Kind; 8],core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>> (lib.rs:1333)
    #24: 0x8EF9787: collect<core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>,smallvec::SmallVec<[rustc::ty::subst::Kind; 8]>> (iterator.rs:1466)
    #25: 0x8EF9787: rustc::ty::subst::<impl rustc::ty::fold::TypeFoldable<'tcx> for &'tcx rustc::ty::List<rustc::ty::subst::Kind<'tcx>>>::super_fold_with (subst.rs:328)
    #26: 0x8C1B183: fold_with<&rustc::ty::List<rustc::ty::subst::Kind>,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #27: 0x8C1B183: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:721)
    #28: 0x8C1B183: <rustc::traits::project::AssociatedTypeNormalizer<'a, 'b, 'gcx, 'tcx> as rustc::ty::fold::TypeFolder<'gcx, 'tcx>>::fold_ty (project.rs:337)
    #29: 0x890C0D0: fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:769)
    #30: 0x890C0D0: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:135)
    #31: 0x890C0D0: fold_with<rustc::ty::subst::Kind,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #32: 0x890C0D0: {{closure}}<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:328)
    #33: 0x890C0D0: call_once<(&rustc::ty::subst::Kind),closure> (function.rs:279)
    #34: 0x890C0D0: map<&rustc::ty::subst::Kind,rustc::ty::subst::Kind,&mut closure> (option.rs:414)
    #35: 0x890C0D0: next<rustc::ty::subst::Kind,core::slice::Iter<rustc::ty::subst::Kind>,closure> (mod.rs:567)
    #36: 0x890C0D0: <smallvec::SmallVec<A> as core::iter::traits::collect::Extend<<A as smallvec::Array>::Item>>::extend (lib.rs:1349)
    #37: 0x8EF9787: from_iter<[rustc::ty::subst::Kind; 8],core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>> (lib.rs:1333)
    #38: 0x8EF9787: collect<core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>,smallvec::SmallVec<[rustc::ty::subst::Kind; 8]>> (iterator.rs:1466)
    #39: 0x8EF9787: rustc::ty::subst::<impl rustc::ty::fold::TypeFoldable<'tcx> for &'tcx rustc::ty::List<rustc::ty::subst::Kind<'tcx>>>::super_fold_with (subst.rs:328)
    #40: 0x8BFE173: fold_with<&rustc::ty::List<rustc::ty::subst::Kind>,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #41: 0x8BFE173: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (macros.rs:344)
    #42: 0x8BFE173: fold_with<rustc::ty::sty::TraitRef,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #43: 0x8BFE173: super_fold_with<rustc::ty::sty::TraitRef,rustc::traits::project::AssociatedTypeNormalizer> (macros.rs:397)
    #44: 0x8BFE173: fold_with<core::option::Option<rustc::ty::sty::TraitRef>,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #45: 0x8BFE173: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (macros.rs:344)
    #46: 0x8BFE173: fold_with<rustc::ty::ImplHeader,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
    #47: 0x8BFE173: fold<rustc::ty::ImplHeader> (project.rs:315)
    #48: 0x8BFE173: normalize_with_depth<rustc::ty::ImplHeader> (project.rs:274)
    #49: 0x8BFE173: normalize<rustc::ty::ImplHeader> (project.rs:258)
    #50: 0x8BFE173: rustc::traits::coherence::with_fresh_ty_vars (coherence.rs:107)

@nnethercote
Contributor

@eddyb @oli-obk @RalfJung Any thoughts on how to improve intern_lazy_const?

@RalfJung
Member

RalfJung commented Feb 4, 2019

Cc @eddyb

@nnethercote
Contributor

Any thoughts on how to improve intern_lazy_const?

There is an obvious problem: intern_lazy_const doesn't intern the value! And the values passed are exceedingly repetitive. Here's a histogram of the top 10, which account for 97.2% of the calls:

17886042 counts:
(  1)  5253160 (29.4%, 29.4%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 2 }) })
(  2)  5192895 (29.0%, 58.4%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 4 }) })
(  3)  3928986 (22.0%, 80.4%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 8 }) })
(  4)  1600916 ( 9.0%, 89.3%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 16 }) })
(  5)   719785 ( 4.0%, 93.3%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 32 }) })
(  6)   299507 ( 1.7%, 95.0%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 1 }) })
(  7)   271847 ( 1.5%, 96.5%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 64 }) })
(  8)    61636 ( 0.3%, 96.9%): Unevaluated(DefId(0/1:4735 ~ packed_simd[3c0f]::vPtr[0]::mptrx4[0]::{{constant}}[0]), [])
(  9)    61636 ( 0.3%, 97.2%): Unevaluated(DefId(0/1:4823 ~ packed_simd[3c0f]::vPtr[0]::mptrx8[0]::{{constant}}[0]), [])
( 10)    61636 ( 0.3%, 97.6%): Unevaluated(DefId(0/1:4653 ~ packed_simd[3c0f]::vPtr[0]::mptrx2[0]::{{constant}}[0]), [])

Fixing this should drastically reduce the memory usage.

I tried doing the obvious thing by introducing GlobalCtxt::lazy_const_interner, heavily inspired by GlobalCtxt::layout_interner, but I couldn't get the lifetimes to work. I will try again tomorrow if nobody else beats me to it.
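The interning pattern being described can be sketched as follows (a minimal, hypothetical interner, not rustc's actual `GlobalCtxt` API; `Box::leak` stands in for the `'tcx`-scoped arena allocation so interned values have a stable address):

```rust
use std::collections::HashMap;

// Stand-in for rustc's `LazyConst`: a small value that is created
// millions of times with only a handful of distinct payloads.
#[derive(PartialEq, Eq, Hash, Clone, Copy, Debug)]
struct LazyConst {
    bits: u128,
}

// A minimal interner: identical values map to the same allocation, so
// repeated calls cost a hash lookup instead of a fresh arena slot.
#[derive(Default)]
struct Interner {
    map: HashMap<LazyConst, &'static LazyConst>,
}

impl Interner {
    fn intern(&mut self, c: LazyConst) -> &'static LazyConst {
        if let Some(&interned) = self.map.get(&c) {
            return interned; // deduplicated: no new allocation
        }
        // In rustc this would be an arena allocation tied to `'tcx`;
        // `Box::leak` gives the same stable-address property here.
        let interned: &'static LazyConst = Box::leak(Box::new(c));
        self.map.insert(c, interned);
        interned
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern(LazyConst { bits: 2 });
    let b = interner.intern(LazyConst { bits: 2 });
    let c = interner.intern(LazyConst { bits: 4 });
    // Pointer identity shows the duplicate was shared, not reallocated.
    assert!(std::ptr::eq(a, b));
    assert!(!std::ptr::eq(a, c));
    println!("distinct allocations: {}", interner.map.len());
}
```

The lifetime difficulty mentioned above comes from tying the returned reference to the arena's lifetime rather than `'static`, which this sketch sidesteps by leaking.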

nnethercote added a commit to nnethercote/rust that referenced this issue Feb 6, 2019
Currently it just unconditionally allocates it in the arena.

For a "Clean Check" build of the `packed-simd` benchmark, this
change reduces both the `max-rss` and `faults` counts by 59%; it
slightly (~3%) increases the instruction counts but the `wall-time` is
unchanged.

For the same builds of a few other benchmarks, `max-rss` and `faults`
drop by 1--5%, but instruction counts and `wall-time` changes are in the
noise.

Fixes rust-lang#57432, fixes rust-lang#57829.
@hsivonen
Member Author

hsivonen commented Feb 7, 2019

FWIW, without the in-flight fix here, a relatively small tweak to packed_simd made packed_simd uncompilable on an ARMv7 system whose /proc/meminfo says there's 3624684 kB of RAM plus some swap. (And a Chrome OS kernel; I don't know what kind of swap use policy Chrome OS applies.)

I'll test again once the fix for this issue is in nightly.

@RalfJung
Member

RalfJung commented Feb 9, 2019

This just brought down my whole system -- 16GB of RAM used to be enough to compile two rustc in parallel (with 8 jobs each), but with the current RAM consumption that does not seem to be the case any more.

bors added a commit that referenced this issue Feb 9, 2019
Make `intern_lazy_const` actually intern its argument.

Currently it just unconditionally allocates it in the arena.

For a "Clean Check" build of the `packed-simd` benchmark, this
change reduces both the `max-rss` and `faults` counts by 59%; it
slightly (~3%) increases the instruction counts but the `wall-time` is
unchanged.

For the same builds of a few other benchmarks, `max-rss` and `faults`
drop by 1--5%, but instruction counts and `wall-time` changes are in the
noise.

Fixes #57432, fixes #57829.
@oli-obk
Contributor

oli-obk commented Feb 10, 2019

Can you try again with today's nightly?

@hsivonen
Member Author

FWIW, without the in-flight fix here, a relatively small tweak to packed_simd made packed_simd uncompilable on an ARMv7 system whose /proc/meminfo says there's 3624684 kB of RAM plus some swap. (And a Chrome OS kernel; I don't know what kind of swap use policy Chrome OS applies.)

I'll test again once the fix for this issue is in nightly.

Much better memory usage now. Thank you!

It seems it would be worthwhile to nominate this for uplift to beta, but I'm not permitted to add the tag myself.

pietroalbini pushed a commit to pietroalbini/rust that referenced this issue Feb 17, 2019
Currently it just unconditionally allocates it in the arena.

For a "Clean Check" build of the `packed-simd` benchmark, this
change reduces both the `max-rss` and `faults` counts by 59%; it
slightly (~3%) increases the instruction counts but the `wall-time` is
unchanged.

For the same builds of a few other benchmarks, `max-rss` and `faults`
drop by 1--5%, but instruction counts and `wall-time` changes are in the
noise.

Fixes rust-lang#57432, fixes rust-lang#57829.