Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

summarize: add "aggregate" sub-command for analyzing sets of profiles. #129

Merged
merged 2 commits into from
Feb 19, 2021

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Aug 23, 2020

For my ongoing rdpmc instruction-counting work I've ended up with a lot of profile results (over 700 and there were hundreds I've already removed), that I don't really have any tools to analyze in-depth.

The fundamental problem is that e.g. summarize summarize will effectively sum together many small intervals between consecutive timestamps/instruction counts, so the distribution of variance in individual samples (of the clock/instruction counter), or the exact sources of large variances, are lost in the summary.

This PR seeks to address that by introducing a summarize aggregate command specifically for sets of deterministic runs (identical in all respects except the timestamps themselves). Example output:

$ summarize aggregate core-{48481,48701,48905,49453,49648,49860,50178,50587,50783,50978}
Smallest 10 durations:
  80 ns: 81 occurrences, or 0.00%
  90 ns: 10018 occurrences, or 0.59%
  91 ns: `hir_owner()`
  100 ns: 38546 occurrences, or 2.26%
  101 ns: 9 occurrences, or 0.00%
  110 ns: 49097 occurrences, or 2.87%
  111 ns: 14 occurrences, or 0.00%
  120 ns: 42200 occurrences, or 2.47%
  121 ns: 22 occurrences, or 0.00%
  130 ns: 52373 occurrences, or 3.07%

Largest 10 durations:
  111073277 ns: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  112291280 ns: `free_global_ctxt()`
  117943976 ns: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  126222248 ns: in `typeck()`,
    between `opt_const_param_of()`
        and `upvars_mentioned()`

  208308909 ns: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  220730751 ns: `build_hir_map()`
  221392949 ns: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  375357799 ns: `hir_lowering()`
  858511163 ns: `resolve_crate()`
  1139917634 ns: in `expand_crate()`, after `pre_AST_expansion_lint_checks()`

Smallest 10 variances:
  ±0 ns: 6057 occurrences, or 0.35%
  ±0.5 ns: 79 occurrences, or 0.00%
  ±4.5 ns: 8 occurrences, or 0.00%
  ±5 ns: 107368 occurrences, or 6.28%
  ±5.5 ns: 775 occurrences, or 0.05%
  ±9.5 ns: 55 occurrences, or 0.00%
  ±10 ns: 112015 occurrences, or 6.56%
  ±10.5 ns: 390 occurrences, or 0.02%
  ±14.5 ns: 42 occurrences, or 0.00%
  ±15 ns: 58164 occurrences, or 3.40%

Largest 10 variances:
  ±6107910 ns: `get_lib_features()`
  ±6942093.5 ns: `hir_lowering()`
  ±8277596.5 ns: in `privacy_access_levels()`,
    between `module_exports()`
        and `def_kind()`

  ±8340332.5 ns: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  ±8977432 ns: `resolve_crate()`
  ±9465039 ns: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  ±9681275.5 ns: in `privacy_access_levels()`, after `module_exports()`
  ±9737941 ns: in `death_checking()`,
    between `codegen_fn_attrs()`
        and `inherent_impls()`

  ±10313490.5 ns: in `expand_crate()`, after `pre_AST_expansion_lint_checks()`
  ±14306281.5 ns: `free_global_ctxt()`

For comparison, this is my best set of runs so far, for instruction counting (with rdpmc measuring both instructions:u and hardware interrupts, to account for iret overcounting, and cpuid for serialization):

$ summarize aggregate core-{047003,047937,048427,048692,048991,049165,049351,049555,049743,050036} \
    | sed 's/ ns/ instructions/'
Smallest 10 durations:
  97 instructions: 3 occurrences, or 0.00%
  99 instructions: 2 occurrences, or 0.00%
  112 instructions: `llvm_dump_timing_file()`
  114 instructions: `dep_graph_tcx_init()`
  121 instructions: `link_binary_check_files_are_writeable()`
  123 instructions: `plugin_registration()`
  157 instructions: `serialize_work_products()`
  164 instructions: `attributes_injection()`
  166 instructions: `drop_dep_graph()`
  170 instructions: in `index_hir()`, after `build_hir_map()`

Largest 10 durations:
  289482983 instructions: `early_lint_checks()`
  421515729 instructions: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  426300619 instructions: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  434275596 instructions: in `typeck()`,
    between `opt_const_param_of()`
        and `upvars_mentioned()`

  754467760 instructions: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  769694431 instructions: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  1082435704 instructions: `hir_lowering()`
  1270692563 instructions: `build_hir_map()`
  3884803038 instructions: `resolve_crate()`
  4059854999 instructions: in `expand_crate()`, after `pre_AST_expansion_lint_checks()`

Smallest 10 variances:
  ±0 instructions: 1701795 occurrences, or 99.61%
  ±0.5 instructions: 519 occurrences, or 0.03%
  ±1 instructions: 11 occurrences, or 0.00%
  ±1.5 instructions: 61 occurrences, or 0.00%
  ±2 instructions: 290 occurrences, or 0.02%
  ±2.5 instructions: 108 occurrences, or 0.01%
  ±3 instructions: 6 occurrences, or 0.00%
  ±3.5 instructions: 124 occurrences, or 0.01%
  ±4 instructions: 68 occurrences, or 0.00%
  ±4.5 instructions: 71 occurrences, or 0.00%

Largest 10 variances:
  ±217.5 instructions: in `mir_borrowck()`,
    between `mir_validated()`
        and `is_late_bound_map()`

  ±236.5 instructions: in `mir_borrowck()`,
    between `is_late_bound_map()`
        and `type_op_prove_predicate()`

  ±263.5 instructions: in `mir_borrowck()`, after `is_late_bound_map()`
  ±267.5 instructions: in `mir_borrowck()`,
    between `mir_validated()`
        and `is_late_bound_map()`

  ±354.5 instructions: in `mir_borrowck()`, after `type_op_prove_predicate()`
  ±445 instructions: in `mir_borrowck()`, after `is_late_bound_map()`
  ±468.5 instructions: in `drop_ast()`, before `analysis()`
  ±649.5 instructions: `hir_lowering()`
  ±659 instructions: `free_global_ctxt()`
  ±1147 instructions: in `expand_crate()`, after `pre_AST_expansion_lint_checks()`

This is not the intended final output, just the simplest analysis I could do, that is still somewhat indicative of the quality of the data. To be truly useful in improving that quality, it has to be able to pinpoint to outliers, not just quantify their existence.

EDIT: added some very basic indications of sources of single occurrences (and updated the example output above)
EDIT2: streamlined the single-occurrence output and included the parent (or caller) of the two interval endpoints, when it's relevant (i.e. when the interval is part of it, between one call ending and the other starting, not either callee)

Copy link
Member

@wesleywiser wesleywiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I think all of the feedback I could give is covered by the FIXMEs 🙂

@wesleywiser wesleywiser self-assigned this Aug 26, 2020
@eddyb eddyb force-pushed the aggregate branch 5 times, most recently from e79bac9 to ad97fb3 Compare September 8, 2020 20:52
@eddyb eddyb force-pushed the aggregate branch 2 times, most recently from ef25949 to ccbbe4c Compare October 25, 2020 09:29
measureme/src/counters.rs Outdated Show resolved Hide resolved
measureme/src/counters.rs Outdated Show resolved Hide resolved
measureme/src/counters.rs Outdated Show resolved Hide resolved
measureme/src/counters.rs Outdated Show resolved Hide resolved
@eddyb
Copy link
Member Author

eddyb commented Oct 26, 2020

Uhh, please ignore the extra commits, they're from rebasing on the rdpmc branch, PR for which is coming soon.
I should've rebased on the commit just before but this was more convenient.

@eddyb
Copy link
Member Author

eddyb commented Nov 3, 2020

Not sure if I should put this into the PR description, but these are the current results with #143's instructions-minus-irqs:u (specifically for the runs used in the "Results" section):

Smallest 10 durations:
  119 instructions: `llvm_dump_timing_file()`
  122 instructions: `dep_graph_tcx_init()`
  129 instructions: `link_binary_check_files_are_writeable()`
  131 instructions: `plugin_registration()`
  170 instructions: `serialize_work_products()`
  171 instructions: `drop_dep_graph()`
  176 instructions: `attributes_injection()`
  208 instructions: 3 occurrences, or 0.00%
  209 instructions: `link_binary_remove_temps()`
  210 instructions: 2 occurrences, or 0.00%

Largest 10 durations:
  437874189 instructions: `early_lint_checks()`
  456828967 instructions: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  469996314 instructions: in `typeck()`,
    between `parent_module_from_def_id()`
        and `upvars_mentioned()`

  501380829 instructions: `drop_ast()`
  678063701 instructions: in `typeck()`,
    between `opt_const_param_of()`
        and `in_scope_traits_map()`

  679879471 instructions: in `typeck()`,
    between `opt_const_param_of()`
        and `in_scope_traits_map()`

  1381363724 instructions: `hir_lowering()`
  2099521684 instructions: `build_hir_map()`
  3951196865 instructions: `resolve_crate()`
  6541458129 instructions: in `expand_crate()`, after `pre_AST_expansion_lint_checks()`

Smallest 4 variances:
  ±0 instructions: 1903452 occurrences, or 99.98%
  ±0.5 instructions: 423 occurrences, or 0.02%
  ±1 instructions: 3 occurrences, or 0.00%
  ±4.5 instructions: `write_crate_metadata()`

Largest 4 variances:
  ±0 instructions: 1903452 occurrences, or 99.98%
  ±0.5 instructions: 423 occurrences, or 0.02%
  ±1 instructions: 3 occurrences, or 0.00%
  ±4.5 instructions: `write_crate_metadata()`

(Should probably not show the "Largest N variances" entries that are also in the "Smallest N variances")

@eddyb eddyb marked this pull request as ready for review November 23, 2020 00:44
@eddyb eddyb changed the title [WIP] summarize: add "aggregate" sub-command for analyzing sets of profiles. summarize: add "aggregate" sub-command for analyzing sets of profiles. Nov 23, 2020
@eddyb
Copy link
Member Author

eddyb commented Nov 23, 2020

This looks good to me. I think all of the feedback I could give is covered by the FIXMEs 🙂

Do you think any of them should block merging? I think this could be useful to anyone who wants to reproduce the rdpmc experiments we did.

Copy link
Member

@wesleywiser wesleywiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think any of them should block merging? I think this could be useful to anyone who wants to reproduce the rdpmc experiments we did.

No, I don't think so. Sorry for the delay on this, I wanted to review it again since it's been a while.

.map(|file| ProfilingData::new(&file))
.collect::<Result<Vec<_>, _>>()?;

// FIXME(eddyb) return some kind of serializable data structure from `aggregate_profiles`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice to have since it would probably be part of any plans to implement the aggregation feature for perf.rlo.

summarize/src/aggregate.rs Outdated Show resolved Hide resolved
@wesleywiser wesleywiser merged commit 61e3581 into rust-lang:master Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants