Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meta hide_in_stacktrace to hide internal julia functions #35371

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

c42f
Copy link
Member

@c42f c42f commented Apr 6, 2020

Currently stacktrace uses the heuristic that every C function is internal, and every Julia function is of interest to the user. This works surprisingly well, but on occasion it can lead to confusion or redundancy. A particular example was include() as noted in #33065. This has since been worked around, but at the cost of duplicating the implementation of include().

This PR implements a hide_in_stacktrace meta to indicate that certain frames should be hidden, reviving an idea originally mentioned in #33065 (comment). (Though with the motivation of removing the duplication of include in preparation for moving more of the runtime C frontend code into Julia code as part of #35243)

I've used this to mark a few functions as hidden:

  • Functions that unconditionally throw an error aren't terribly useful to have in the stacktrace, for example error
  • Functions that walk the stack should be removed from the visible stacktrace. Hidden frames provide a neat way to do this.
  • Shims generated by lowering (keyword sorters) should be removed from stacktraces. This was previously fixed by relying on a naming convention, but attaching a meta during lowering is a bit neater.
  • The aforementioned include() internals

Minor caveats and yak shaving

  • Currently our debug info lookup can't get at the CodeInfo flag for inlined functions, so I've made @hide_in_stacktrace imply noinline for now. This seems ok for the types of backtrace-producing and error throwing functions this is typically useful for.
  • Serialization tries to support deserializing CodeInfo from older Julia versions, but any added CodeInfo flags are Bool which leaves us unable to infer the CodeInfo version. To fix this, I've added an explicit codeinfo version to the serialized form. The rationale for this is explained in much more detail in the third commit message.

@c42f
Copy link
Member Author

c42f commented Apr 6, 2020

Test failures seem to be #35356. Rebasing...

c42f added 3 commits April 6, 2020 19:17
This implements a hide_in_backtrace meta which may be attached to
individual methods (via a flag on the CodeInfo) to mark them for hiding
in backtraces.

Use this infrastructure to hide several julia methods by default (these
can still be seen using the internal_funcs flag to backtrace())
* error(), backtrace(), stacktrace()
* keyword sorters
This reunifies the versions of include() in MainInclude and Base which
have started diverging after being duplicated for improved stack traces.
@ hide_in_backtrace provides a solution where the implementation is
marked as internal.
Here we introduce an explicit codeinfo_ver version number to track the
CodeInfo struct. Core.CodeInfo has seen multiple changes between Julia
versions and the addition of the hide_in_stacktrace flag is not possible
to detect via the types which exist in the serialized stream.

To solve this problem and to avoid backward compatibility code becoming
more convoluted and hard to maintain in the future, it seems simplest to
introduce an Int32 version flag which should be a single byte of
overhead.

An alternative which was considered was to rely on the value of the
header ser_version. However, this would need to be somehow stored with
any user-defined subtype of AbstractSerializer which would be a API
breaking change.
@StefanKarpinski
Copy link
Member

I would argue that the default heuristic should probably be changed. Stack traces should, by default, only show functions that are exported. Yes, that means we lose information in bug reports, which is visible to people here and therefore will be unpopular, but the flip side is that we are scaring huge numbers of people away from Julia with long, unintelligible stack traces, which is a silent but more worse problem.

@KristofferC
Copy link
Member

only show functions that are exported.

You mean literally exported? That seems like a too rough heuristic, export is only loosely tried to public API.

@StefanKarpinski
Copy link
Member

We probably should have a way to declare something to be public without exporting it, but we don't so for now export is the best we have. Another possibility would be if the caller and the callee are not defined in the same module, then the callee should be included in the stack trace. If the caller and the callee are defined in the same module and the callee isn't exported, then the call can be considered an implementation detail and omitted from the stack trace by default.

@JeffBezanson
Copy link
Member

I really disagree with removing a lot of frames from stack traces. When the truth is ugly, lying about it tends not to make things better. Somebody who doesn't want to look at a stack trace will still not want to look at it, but if somebody does want to try to trace through and see what happened, it will become impossible.

Stack traces always and inherently show implementation details. That's their purpose. The fact that f calls g is an implementation detail no matter what g is or where it's defined.

If we want to do something here, rather than having a heuristic that leaves out random frames we should address this with formatting or UI. For example maybe show only the top 1 or 2 frames with a key you can press to see the rest. Though that will certainly reduce the usefulness of bug reports, as you noted.

@JeffBezanson
Copy link
Member

The inlining pass can see this flag, so it should be possible to have it remove the frame info for a function it's inlining instead of implying noinline.

@StefanKarpinski
Copy link
Member

I think you're really viewing stack traces purely from the point of view of a developer of the system, not as a user of the system. For most users, the primary purpose of a stack trace is to communicate to them where they did something wrong, not where in the guts of the system the thing they did wrong happened to trigger a problem. I agree that it is a UI problem, however, and we don't need to not capture those stack frames, we just need to not show them all the time.

@JeffBezanson
Copy link
Member

I understand that seeing a large stack trace full of internals is confusing and unhelpful. That's not hard to imagine. I just think there is a large asymmetry: hiding information could be a big nuisance for a (julia or package) developer or anybody who might want to look into the stack trace, but is at best a small help to somebody who finds stack traces confusing.

I agree it would be great if we could just show what the user did wrong, but I don't think that's possible. For example, if you pass an invalid value to a function that doesn't have an explicit check for that case, human knowledge is required to know at which level the invalid value arose. It also might just be a bug in the library code.

But I do think we can and should improve stack trace ergonomics (e.g. truncating huge types, formatting, other things discussed in #33065), and I'm open to UI changes.

@c42f
Copy link
Member Author

c42f commented Apr 7, 2020

If we want to do something here, rather than having a heuristic that leaves out random frames we should address this with formatting or UI.

This is exactly what this is intended to implement: a way to refine the existing formatting heuristic that C frames are left out. So perhaps a compromise would be to not use it in backtrace, and only use it in showerror, because showerror is UI? But then, should we include C frames in backtrace by default? I mean, if someone is calling into an external C library which calls a julia callback, having the C frames would be very useful in bug reports. Also for reporting crashes in the runtime.

Probably a better option would be to open a log file for each julia session and to log much more detailed backtraces in there, separately from the summary that the user sees by default. Then get people used to sending in their log files, and also have some REPL tooling to see a more full backtrace for the last error?

The inlining pass can see this flag, so it should be possible to have it remove the frame info for a function it's inlining instead of implying noinline.

I considered this but I didn't pursue it yet because I worried it would imply loosing those frames completely in the raw backtrace() and stacktrace(true). Though I guess we could munge the flag into the debug info string and extract it when reading debug info. Relatedly, this is a problem for getting the module of inlined functions, which would be super handy for improving the backtrace printing heuristic (cf #33065 (comment) #33065 (comment))

@c42f c42f requested review from JeffBezanson and removed request for vtjnash April 7, 2020 04:24
@antoine-levitt
Copy link
Contributor

I agree stack traces need improvements, but removing non exported functions feels much too blunt. Hiding types (like juno does) or having an array display of traces would be a better way to help this.

@c42f
Copy link
Member Author

c42f commented Apr 7, 2020

Here's a summary of a fairly long and productive discussion I had about this with Jeff and Keno.

I think Jeff had a very strong intuitive reaction that completely hiding frames is a bad idea... somehow that it's aberrant behavior / against the spirit of what a backtrace should offer / and that offering it to libraries at the Julia level would just be a very bad idea. I didn't buy all the arguments for this, but I'm convinced there was some underlying kernel of truth, if only we could express it clearly.

Eventually we got to discussing why some frames may be considered truly internal which justifies hiding them in the current implementation, and why most frames do not qualify for this treatment. Here's my takeaways from that discussion:

Truly internal frames are those which exist only as part of the "underlying computational substrate" which implements julia semantics. Personally I believe examples of this are:

  • C frames which are part of the interpreter (eg, eval_body)
  • C frames which are part of the julia calling convention (eg, jl_apply)
  • Julia frames which are part of the JuliaInterpreter interpreter (rather than the code being interpreted)
  • Shims produced as an implementation detail of code lowering (eg keyword sorters)
  • Functions which implement backtrace collection (backtrace, stacktrace)

Frames which should be user visible are any frames which would logically be part of the call stack if there was a formal notion of the abstract machine on which Julia runs. These include

  • error, which is a perfectly normal Julia function
  • Frames of external C libraries which the user might call. For example, GTK.

Now we see how our current heuristics do fail us in certain circumstances:

  • If you call stacktrace() from within JuliaInterpreter you get something completely different and unrelated to the stacktrace of the user code that is running. (It's all internal JuliaInterpreter guts, at least in the version I happen to have installed.)
  • Calling stacktrace() will omit frames from external C libraries such as GTK. These are logically part of the user's program and could be rather important to the user.

Conversely I think the following comment in debuginfo is another interesting case where we correctly distinguish between internal vs user-visible frames:
https://github.com/JuliaLang/julia/blob/master/src/debuginfo.cpp#L512-L515

I hope that categorization is not too controversial and can help frame the discussion here?

Unfortunately things are never entirely clear cut. For example, if I move all the implementation detail from an internal C function like jl_parse_eval_all out into the Julia layer in Base._include, what then? On some level it seems absurd that these implementation details of code loading then become visible to the user, when they weren't before.

@JeffBezanson
Copy link
Member

  • If you call stacktrace() from within JuliaInterpreter you get something completely different and unrelated to the stacktrace of the user code that is running. (It's all internal JuliaInterpreter guts, at least in the version I happen to have installed.)

This is relative to who you ask for the stacktrace --- if you ask the underlying julia runtime, you should get all the JuliaInterpreter guts, but if you ask JuliaInterpreter you should get the stack trace of its subject program.

I'll add that the other reason we have decided to hide frames in the past is if they would be part of every stack trace. For example if every REPL input begins evaluation with the same couple frames, the REPL should probably not show those. But I think removing various frames in the middle of a stack trace is a different story.

@c42f
Copy link
Member Author

c42f commented Apr 8, 2020

This is relative to who you ask for the stacktrace --- if you ask the underlying julia runtime, you should get all the JuliaInterpreter guts, but if you ask JuliaInterpreter you should get the stack trace of its subject program.

Sure, completely agreed. It's not so much that the internal heuristics fail us, but that JuliaInterpreter had a bug such that stacktrace() running inside JuliaInterpreter still uses the underlying runtime's view of the stack. It's just a bug which can be fixed (or may already have been, I need to update).

But rather than nitpicking at this one small part of the summary, what do you think about the point that there are some truly internal frames? Yesterday you argued that you never want to see the C frames from the runtime but allowed that C frames from external libs can be relevant to the user.

I think the distinction comes down to "Which frames exist only as an implementation detail of how the abstract julia call stack is concretely realized in the runtime?"

@c42f
Copy link
Member Author

c42f commented Apr 8, 2020

rather than nitpicking at this one small part of the summary

Hrm, I don't entirely like how I wrote that. To be quite clear, I'm not unhappy about nitpicking but I want to focus on the larger point: we manifestly do have hidden frames (currently all of the C frames and a few ad-hoc exceptions to make Julia frames hidden).

So can we agree on some definition for what an internal frame actually is?

@JeffBezanson
Copy link
Member

Yesterday you argued that you never want to see the C frames from the runtime but allowed that C frames from external libs can be relevant to the user.

I think the distinction comes down to "Which frames exist only as an implementation detail of how the abstract julia call stack is concretely realized in the runtime?"

Yes I agree.

@c42f
Copy link
Member Author

c42f commented Apr 9, 2020

Cool, so under that definition of "internal" things become clearer. If we restrict to discussing a possible Core.@_internal_function_meta (deleting the convenience version of the macro):

  • This meta is not a UI hint. Perhaps it should still be documented with a docstring, but only with a clear disclaimer about the intended purpose.
  • Functions like error should not use this
  • This not an annotation for use by Julia libraries. (Though an exception can be made for compiler-like libraries, eg those doing GPU codegen.)
  • It makes some sense for debug info generation to pay attention to the annotation in inlining.
  • Eventually we can upgrade our heuristics for C frames to only exclude those from libjulia or some such thing.

I could go either way on _include. Currently I feel that a lot of the code in loading.jl is actually internal. From an abstract machine point of view, include("foo.jl") "just executes" successive top level statements in foo.jl and for users it's consistent that there's no intervening frames. Much the same thing could be said of using and any eval machinery.

Perhaps that sounds iffy, but there's no slippery slope here. The same argument can not be extended to the rest of Base.

We can have a separate discussion about UI hints. One small point: users will not be tempted to abuse @_internal_function_meta if there is a compelling alternative.

@c42f
Copy link
Member Author

c42f commented Apr 16, 2020

For anyone who happens to be following with interest — we're deferring this for now to avoid getting too bogged down with difficult design work. There's bigger usability gains to be made elsewhere.

@c42f
Copy link
Member Author

c42f commented Oct 9, 2020

Rust elides some frames from the start and end of backtraces by default. (Mentioned in https://blog.rust-lang.org/2020/10/08/Rust-1.47.html#shorter-backtraces.)

Some implementation details:

@KristofferC
Copy link
Member

So the equivalent for us would be to elide frames 3-6 in

> julia foo.jl
 [1] error() at ./error.jl:42
 [2] top-level scope at /Users/kristoffercarlsson/juliamaster/foo.jl:1
 [3] include(::Function, ::Module, ::String) at ./Base.jl:380
 [4] include(::Module, ::String) at ./Base.jl:368
 [5] exec_options(::Base.JLOptions) at ./client.jl:296
 [6] _start() at ./client.jl:506
in expression starting at /Users/kristoffercarlsson/juliamaster/foo.jl:1

in julia foo.jl? For the REPL we already do this:

julia> stacktrace()
13-element Array{Base.StackTraces.StackFrame,1}:
 top-level scope at REPL[2]:1
 eval(::Module, ::Any) at boot.jl:331
 eval_user_input(::Any, ::REPL.REPLBackend) at REPL.jl:134
 repl_backend_loop(::REPL.REPLBackend) at REPL.jl:195
 start_repl_backend(::REPL.REPLBackend, ::Any) at REPL.jl:180
 run_repl(::REPL.AbstractREPL, ::Any; backend_on_current_task::Bool) at REPL.jl:292
 run_repl(::REPL.AbstractREPL, ::Any) at REPL.jl:288
 (::Base.var"#806#808"{Bool,Bool,Bool,Bool})(::Module) at client.jl:399
 #invokelatest#1 at essentials.jl:710 [inlined]
 invokelatest at essentials.jl:709 [inlined]
 run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at client.jl:383
 exec_options(::Base.JLOptions) at client.jl:313
 _start() at client.jl:506

julia> error("foo.jl")
ERROR: foo.jl
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope at REPL[3]:1

@c42f
Copy link
Member Author

c42f commented Oct 9, 2020

Right, it's not very complicated. Just a slightly more general way to do what the REPL is already doing.

@BioTurboNick
Copy link
Contributor

Just linking this PR to my https://github.com/BioTurboNick/AbbreviatedStackTraces.jl and its primordial related PR #40537.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants