pre-populate code cache from provided profiling data #2463

Closed
derekbruening opened this issue Jun 2, 2017 · 3 comments

@derekbruening

In some attach scenarios (currently only fully supported with static DR in a start/stop type model) we want to pre-populate the code cache on the side so when we trigger the attach we can avoid the cost of building the cache up. This issue covers creating some kind of interface where a list of basic block start tags obtained from profiling can be passed to DR to create a pre-warmed cache.

@derekbruening

One complication here is that the user of this interface typically does some kind of sampling to obtain the list of tags. Decoding is often needed to further find target bb entries, and self-sampling of this decoding code can produce tags for DR code. This then ends up instrumenting such bbs, even if they will never be executed. For static DR this can hit this assert:

<Application xxx (112699) DynamoRIO usage error : dr_insert_get_seg_base supports TLS segonly with -private_loader>

#5  0x00000000043e1bdb in external_error (file=0x6baea47 <.L.str.1> "core/lib/instrument.c", line=7231,
    msg=0x6bb0fd5 <.L.str.225> "dr_insert_get_seg_base supports TLS segonly with -private_loader") at core/utils.c:200
#6  0x0000000004285ac2 in dr_insert_get_seg_base (drcontext=0x52c75080, ilist=0x52ca20f0, instr=0x52cca540, seg=106, reg=2) at core/lib/instrument.c:7229
#7  0x0000000003fab9d6 in drutil_insert_get_mem_addr_x86 (drcontext=0x52c75080, bb=0x52ca20f0, where=0x52cca540, memref=..., dst=1, scratch=2) at ext/drutil/drutil.c:151
#8  0x0000000003fab83a in drutil_insert_get_mem_addr (drcontext=0x52c75080, bb=0x52ca20f0, where=0x52cca540, memref=..., dst=1, scratch=2) at ext/drutil/drutil.c:121
#9  0x0000000003f96353 in instru_t::insert_obtain_addr (this=0x52c2bf00, drcontext=0x52c75080, ilist=0x52ca20f0, where=0x52cca540, reg_addr=1, reg_scratch=2, ref=...)
    at clients/drcachesim/tracer/instru.cpp:109
#10 0x0000000003f96edd in offline_instru_t::insert_save_addr (this=0x52c2bf00, drcontext=0x52c75080, ilist=0x52ca20f0, where=0x52cca540, reg_ptr=2, reg_addr=1, adjust=8, ref=..., write=false)
    at clients/drcachesim/tracer/instru_offline.cpp:258
#11 0x0000000003f971aa in offline_instru_t::instrument_memref (this=0x52c2bf00, drcontext=0x52c75080, ilist=0x52ca20f0, where=0x52cca540, reg_ptr=2, reg_tmp=1, adjust=8, app=0x52cca540, ref=..., write=false,
    pred=DR_PRED_NONE) at clients/drcachesim/tracer/instru_offline.cpp:284
#12 0x0000000003f9bff9 in instrument_memref (drcontext=0x52c75080, ud=0x52ca4868, ilist=0x52ca20f0, where=0x52cca540, reg_ptr=2, reg_tmp=1, adjust=8, app=0x52cca540, ref=..., write=false, pred=DR_PRED_NONE)
    at clients/drcachesim/tracer/tracer.cpp:697
#13 0x0000000003f9a462 in event_app_instruction (drcontext=0x52c75080, tag=0x44bb25f <safe_read_tls_magic>, bb=0x52ca20f0, instr=0x52cca540, for_trace=false, translating=false, user_data=0x52ca4868)
    at clients/drcachesim/tracer/tracer.cpp:907
#14 0x0000000003fa86e5 in drmgr_bb_event (drcontext=0x52c75080, tag=0x44bb25f <safe_read_tls_magic>, bb=0x52ca20f0, for_trace=0 '\000', translating=0 '\000') at ext/drmgr/drmgr.c:656
#15 0x0000000004272eb6 in instrument_basic_block (dcontext=0x52c75080, tag=0x44bb25f <safe_read_tls_magic> "e\213\004%h", bb=0x52ca20f0, for_trace=0 '\000', translating=0 '\000', emitflags=0x7fa9bf007584)
    at core/lib/instrument.c:1554
#16 0x000000000402c5e6 in client_process_bb (dcontext=0x52c75080, bb=0x7fa9bf007c00) at core/arch/interp.c:2799
#17 0x0000000004001f44 in build_bb_ilist (dcontext=0x52c75080, bb=0x7fa9bf007c00) at core/arch/interp.c:4166
#18 0x000000000400953c in build_basic_block_fragment (dcontext=0x52c75080, start=0x44bb25f <safe_read_tls_magic> "e\213\004%h", initial_flags=0, link=1 '\001', visible=1 '\001', for_trace=0 '\000',
    unmangled_ilist=0x0) at core/arch/interp.c:5175
#19 0x00000000042863c0 in dr_prepopulate_cache (tags=0x7fa9b88fe020, tags_count=75761) at core/lib/instrument.c:7378

We disabled -private_loader for static DR (#2117).

For non-static-DR we would just avoid building this bb by checking for a DR
address. For static we'd have to check safe_read_tls_{magic,self}. And some
other inline asm os.c routine for ARM? What if -no_safe_read_tls_init? What
about TLS reads inside C functions with branches where the bb won't start at
some named label? Our samples should only be in routines called while decoding
which hopefully will only be get_thread_private_dcontext() but that still hits
the aforementioned issues for some arch or option combos.
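
As a rough illustration of the "checking for a DR address" idea for non-static DR, here is a minimal sketch that drops sampled tags falling inside one address range before they are handed to the prepopulation call. The `app_pc` typedef is a stand-in for DR's type, and `pc_range_t` and `drop_tags_in_range()` are hypothetical names invented for this sketch; how the caller learns DR's library bounds is not shown. As noted above, a range check like this does not help for static DR, where DR's code lives inside the application binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DR's app_pc so the sketch builds without DR headers. */
typedef unsigned char *app_pc;

/* Hypothetical: the address range occupied by the DR library. */
typedef struct {
    app_pc start; /* inclusive */
    app_pc end;   /* exclusive */
} pc_range_t;

static bool
pc_in_range(const pc_range_t *range, app_pc pc)
{
    return (uintptr_t)pc >= (uintptr_t)range->start &&
        (uintptr_t)pc < (uintptr_t)range->end;
}

/* Compacts tags in place, dropping any tag inside the range.
 * Returns the number of tags kept at the front of the array.
 */
size_t
drop_tags_in_range(app_pc *tags, size_t count, const pc_range_t *range)
{
    size_t kept = 0;
    assert(range != NULL && (tags != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        if (!pc_in_range(range, tags[i]))
            tags[kept++] = tags[i];
    }
    return kept;
}
```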

It would be nice to turn asserts into just bb building failures during prepop,
though perhaps not all asserts, as there could be general errors in the client
instrumentation or something similar. dr_insert_get_seg_base() already returns
false on the next line, so we could remove this assert. The failure would then
propagate back to instru_t::insert_obtain_addr(), which has a DR_ASSERT(ok),
and that is a release-build failure. It seems like somebody has to fail
somewhere, and DR's bb building events aren't set up to handle a client bb
failing and aborting bb building, because normally that's not an option for
non-prepop bb building. It seems we have to avoid building the bb in the first
place, or always silence this type of failure.

@derekbruening

For the DR segment read: I'm going with a solution of looking for the start PC of the
safe_read_tls_{magic,self} routines, and will re-examine this if we hit it on ARM too.
Kind of ugly but the alternatives seem uglier.
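
The start-PC check described here could look roughly like the sketch below. It is not DR's actual is_DR_segment_reader_entry() code: the two placeholder objects merely stand in for the entry points of safe_read_tls_magic and safe_read_tls_self, and `is_segment_reader_entry_sketch()` and `skip_segment_readers()` are hypothetical names. The point is only that the filter compares each tag against a short list of known entry addresses rather than a whole address range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for DR's app_pc so the sketch builds without DR headers. */
typedef unsigned char *app_pc;

/* Placeholders for the entry points of safe_read_tls_magic and
 * safe_read_tls_self; only their (distinct) addresses matter here.
 */
unsigned char entry_magic_placeholder[1];
unsigned char entry_self_placeholder[1];

/* True if pc is the first instruction of one of the known segment readers. */
bool
is_segment_reader_entry_sketch(app_pc pc)
{
    return pc == entry_magic_placeholder || pc == entry_self_placeholder;
}

/* Compacts tags in place, skipping segment-reader entries.
 * Returns the number of tags kept.
 */
size_t
skip_segment_readers(app_pc *tags, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (!is_segment_reader_entry_sketch(tags[i]))
            tags[kept++] = tags[i];
    }
    return kept;
}
```

A check like this only catches blocks that begin exactly at those entry points, which is the limitation raised earlier about TLS reads inside C functions with branches.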

derekbruening added a commit that referenced this issue Jul 7, 2017
Adds a new API routine dr_prepopulate_cache() meant to be called between
dr_app_setup() and dr_app_start() to build up a code cache in parallel with
app execution, to avoid the cost of a cold cache upon attach.

Fixes some auxiliary issues with building blocks ahead of time from
sampling data:
+ Sets up TLS during bb building for dcontexts but does not enable signal
  handlers to avoid perturbing the app.
+ Adds is_DR_segment_reader_entry() to avoid pre-building a problematic
  self-sampled bb that reads DR segments when DR is a static library and we
  do not support mangling such reads.
+ Fixes an initialized-dcontext issue in dr_get_isa_mode().

Adds a test.

Fixes #2463
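
A minimal sketch of the intended call order, under stated assumptions: the three functions below are stand-in stubs that only record the order in which they are called, so the example builds without DynamoRIO. The (tags, tags_count) parameters match the backtrace above; the return types, and dr_app_setup() returning zero on success, are assumptions to check against the real headers. `attach_with_warm_cache()` is a hypothetical wrapper name. In a real build the stubs would be replaced by the DR headers and library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for DR's app_pc so the sketch builds without DR headers. */
typedef unsigned char *app_pc;

/* Stand-in stubs: they record call order and do nothing else. */
enum { CALL_SETUP = 1, CALL_PREPOP, CALL_START };
#define MAX_CALLS 8
int call_log[MAX_CALLS];
int call_count;

static void
log_call(int which)
{
    assert(call_count < MAX_CALLS);
    call_log[call_count++] = which;
}

static int
dr_app_setup(void)
{
    log_call(CALL_SETUP);
    return 0; /* assumed: zero means success */
}

static bool
dr_prepopulate_cache(app_pc *tags, size_t tags_count)
{
    log_call(CALL_PREPOP);
    return tags != NULL || tags_count == 0;
}

static void
dr_app_start(void)
{
    log_call(CALL_START);
}

/* Hypothetical wrapper: set up, pre-build blocks for the tags, then start. */
bool
attach_with_warm_cache(app_pc *tags, size_t tags_count)
{
    if (dr_app_setup() != 0)
        return false;
    /* The app is still running natively here, so this work is off the
     * critical path of the attach itself.
     */
    bool warmed = dr_prepopulate_cache(tags, tags_count);
    dr_app_start();
    return warmed;
}
```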
@derekbruening

Xref #1594. The caveat there about the ISA mode applies here too: handling it is up to the caller of this API. The caveat about the mcontext at the top of the bb also applies: the example there is Dr. Memory, which uses the mcontext of the very first bb event to get the mcontext for the primary thread, working around DR's failure to provide it there (#1152).

derekbruening added a commit that referenced this issue Jun 18, 2018
Adds dr_prepopulate_indirect_targets() for filling in indirect branch
target tables to avoid trips to dispatch when attaching.

Adds a test to api.static_prepop.

Issue: #2463
derekbruening added a commit that referenced this issue Jul 13, 2018
Fixes two races with shared ibt tables:

+ Adding a new table entry must write the start_pc before the tag.
  This is accomplished with a new ENTRY_SET_TO_ENTRY hashtablex.h
  optional specifier.  For ARM #2502 a new MEMORY_STORE_BARRIER macro
  is added.

+ Resizing a table must not clear the tags in the old table to avoid
  losing the tag on the target_delete ibl path.

Adds a test api.ibl-stress which uses the DR IR to synthetically
construct thousands of basic blocks with indirect branches between
them.

To make the test work, relaxes several is-on-stack checks to support
pre-building basic blocks (#2463) from generated code or other
locations not known prior to starting the application.

Issue: #3098, #2502, #2463

Fixes #3098
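
The first race fix (write start_pc before the tag) can be illustrated with a small C11 analog. This is not DR's hashtablex.h code or its ENTRY_SET_TO_ENTRY / MEMORY_STORE_BARRIER macros: the table layout, the `ibt_add()` and `ibt_lookup()` names, linear probing, and the use of a release store in place of an explicit store barrier are all assumptions made for this sketch. It also assumes writers are serialized by some outer lock and that a tag of 0 means an empty slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8 /* power of two so masking works as modulo */

typedef struct {
    _Atomic uintptr_t tag; /* 0 means empty; readers match on this field */
    uintptr_t start_pc;    /* code cache address for the tag */
} ibt_entry_t;

typedef struct {
    ibt_entry_t entries[TABLE_SIZE];
} ibt_table_t;

/* Adds tag -> start_pc. Assumes a single writer at a time. */
bool
ibt_add(ibt_table_t *t, uintptr_t tag, uintptr_t start_pc)
{
    assert(tag != 0);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        ibt_entry_t *e = &t->entries[(tag + i) & (TABLE_SIZE - 1)];
        if (atomic_load_explicit(&e->tag, memory_order_relaxed) == 0) {
            /* Payload first, then publish the tag with release ordering,
             * so a reader that sees the tag also sees a valid start_pc.
             */
            e->start_pc = start_pc;
            atomic_store_explicit(&e->tag, tag, memory_order_release);
            return true;
        }
    }
    return false; /* table full */
}

/* Returns the start_pc for tag, or 0 if the tag is not present. */
uintptr_t
ibt_lookup(ibt_table_t *t, uintptr_t tag)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        ibt_entry_t *e = &t->entries[(tag + i) & (TABLE_SIZE - 1)];
        uintptr_t cur = atomic_load_explicit(&e->tag, memory_order_acquire);
        if (cur == tag)
            return e->start_pc;
        if (cur == 0)
            return 0;
    }
    return 0;
}
```

If the two stores in `ibt_add()` were swapped, a concurrent reader could match the tag and jump to a stale or zero start_pc, which is the kind of race the commit describes.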