Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quit from filed-in script can cause silent nil DNU #setToEnd error #1184

Closed
daniels220 opened this issue Mar 24, 2023 · 2 comments · Fixed by #1255
Closed

Quit from filed-in script can cause silent nil DNU #setToEnd error #1184

daniels220 opened this issue Mar 24, 2023 · 2 comments · Fixed by #1255
Labels

Comments

@daniels220
Copy link
Contributor

Describe the bug
A filed-in script (either from a menu or starting the image with -f) that ends by calling SessionManager current quit[:] may result in a brief flash of a walkback dialog, and logging to the .errors file, of an UndefinedObject does not understand #setToEnd error due to SourceManager default changesFiler having closed and nil'ed its underlying stream.

Specifically, this only occurs if the script ends with a chunk marker followed by at least one more character (including—indeed most likely—whitespace), e.g. SessionManager current quit!\r\n (a trailing newline). The error occurs trying to log an empty chunk, after evaluating the chunk containing the quit.

Worth noting also that this can only occur in a GUI session, because in a console session the primQuit: that actually exits the image is issued synchronously rather than in a postToInputQueue block.

Several things contribute to this, suggesting several possible fixes:

  1. ChunkSourceFiler>>fileIn does a skipSeparators after deciding whether another chunk may be present. Simply changing the start of the method to [stream skipSeparators; atEnd] whileFalse: [(stream peekFor: $!) ...etc... would eliminate this particular problem, and this seems like a reasonable thing to do anyway to avoid extraneous empty entries in the changes file.
  2. There is some inconsistency as to whether the SourceManager's subsidiary filers are simply closed and left in place, or removed from the SourceFiles array as well.
  • SourceManager>>onExit just closes them, while #closeSources removes/nils them as well.
  • #hasSources checks that the streams are actually present, but most other methods (all the #log... and #store... methods that write to the changes file, as well as #getSourceFromDescriptor:) only checks that the filter is not nil.
  • Removing/niling the files in onExit would cause any attempted logging this late in the shutdown process to be silently skipped. This I'm not sure about, since it really ought not to be necessary—I do think this particular logging attempt should be stopped earlier as per (1), and in general, closing the sources/changes files is the very last thing that happens before SessionManager>>onExit returns and the final primQuit: is queued, well past the point of "second childishness and mere oblivion" (I love those comments by the way).
  1. It is weird, at a very basic level, that quit[:] returns normally to its sender. If I were to write something like:

    "Almost like a ^return, but for a multi-chunk script file, and setting the exit code"
    someErrorCondition ifTrue: [SessionManager current quit: 1].
    self doSomeOtherStuff.
    

    I would certainly expect doSomeOtherStuff not to execute when the condition is true, but actually it would. Indeed the entire rest of the script, multiple chunks and all, would attempt to continue executing, though it would be brought up short on the next chunk by the DNU #setToEnd error, and the resulting forkMain from the walkback dialog would process the primQuit: just as it does when the following chunk is empty.

    I assume the reason the primQuit: is deferred using postToInputQueue: is to avoid executing it within a callback? (Are there any other conditions where we'd want to defer the quit?) That would make sense, for the same reason that we clean up various resources nicely even though Windows would ultimately clean up after us once the VM terminates. We could take a more nuanced approach, though. If we're not in a callback, just issue the primQuit: immediately. If we are, do the postToInputQueue, but then terminate (or kill?) the active process. (Interesting point of similarity with SessionManager>>onStartup:, which also ends with a kill, for a different but related reason.) Or, rather than postToInputQueue, we could just fork a process to do the primQuit:.

To Reproduce

  1. File in a script ending with SessionManager current quit! , including a trailing space, newline, etc.
  2. You may see a brief flash of walkback dialog, and in any case, note that the .errors file contains a stack dump like:
 2023-03-24T13:31:35.0988149-04:00: Unhandled exception - a MessageNotUnderstood('UndefinedObject does not understand #setToEnd')

Object>>doesNotUnderstand:
ChunkSourceFiler(SourceFiler)>>setToEnd
SourceManager>>changesFiler
SourceManager>>logEvaluate:
Compiler class>>evaluate:for:evaluationPools:logged:ifFail:
Compiler class>>evaluate:for:evaluationPools:logged:
Compiler class>>evaluate:for:logged:
[] in ChunkSourceFiler>>fileIn
ExceptionHandler(ExceptionHandlerAbstract)>>markAndTry
[] in ExceptionHandler(ExceptionHandlerAbstract)>>try:
BlockClosure>>ifCurtailed:
BlockClosure>>ensure:
ExceptionHandler(ExceptionHandlerAbstract)>>try:
BlockClosure>>on:do:
ChunkSourceFiler>>fileIn
SourceManager>>fileInFrom:normalizeLineEndings:
[] in SourceManager>>fileIn:normalizeLineEndings:
BlockClosure>>ifCurtailed:
BlockClosure>>ensure:
SourceManager>>fileIn:normalizeLineEndings:
[] in RefactoringSmalltalkSystem(SmalltalkSystem)>>fileFileIn
BlockClosure>>ifCurtailed:
BlockClosure>>ensure:
Cursor>>showWhile:
RefactoringSmalltalkSystem(SmalltalkSystem)>>fileFileIn
Symbol>>forwardTo:
CommandDescription>>performAgainst:
[] in Command>>value
BlockClosure>>ifCurtailed:
BlockClosure>>ensure:
Command>>value
ShellView>>performCommand:
SmalltalkSystemShell(Shell)>>performCommand:
CommandQuery>>perform
DelegatingCommandPolicy(CommandPolicy)>>route:
[] in ShellView(View)>>onCommand:
BlockClosure>>ifCurtailed:
BlockClosure>>ensure:
Cursor>>showWhile:
ShellView(View)>>onCommand:
ShellView(View)>>wmCommand:wParam:lParam:
ShellView(View)>>dispatchMessage:wParam:lParam:
[] in InputState>>wndProc:message:wParam:lParam:cookie:
BlockClosure>>ifCurtailed:
GuiInputState(InputState)>>wndProc:message:wParam:lParam:cookie:
GuiInputState(InputState)>>pumpMessage:
GuiInputState(InputState)>>loopWhile:
GuiInputState(InputState)>>mainLoop
[] in GuiInputState(InputState)>>forkMain
ExceptionHandler(ExceptionHandlerAbstract)>>markAndTry
[] in ExceptionHandler(ExceptionHandlerAbstract)>>try:
BlockClosure>>ifCurtailed:
BlockClosure>>ensure:
ExceptionHandler(ExceptionHandlerAbstract)>>try:
BlockClosure>>on:do:
[] in BlockClosure>>newProcess:
----------------------------------------------------------------------------------------------------

Expected behavior
Image exist cleanly.

Please complete the following information):

  • OS: Windows 10
@daniels220 daniels220 added the bug label Mar 24, 2023
@blairmcg
Copy link
Contributor

blairmcg commented Mar 25, 2023

The quit being deferred is, as you surmise, to avoid it being processed inside a callback, although it's only actually necessary where there is an underlying user->kernel->user mode transition such as when dispatching a windows message. See 2647cf7, #355, and dolphinsmalltalk/DolphinVM#128 for more explanation. It is a workaround hack and really needs a proper solution in the VM. Not deferring when not inside a callback would probably improve your scenario, but at the expense of introducing another confusing variability.

Forking the quit into a separate process won't work reliably because the forked processes are green threads and so all run on only one real OS thread, and therefore one machine stack. That means the callback for one process might still be active when another calls the quit primitive.

A proper fix for the VM is not immediately obvious so needs more thought than I've ever given it. You are right that for the normal case of Dolphin running in its own process, just exiting the process directly from the primitive should be fine as the OS will clean up all the resources anyway. However, a solution to exit cleanly without killing the process is still required when running in-proc as a DLL hosted in another process, e.g. in some COM scenarios.

Closing the source files fully in the way expected by the source manager would seem the right thing to do anyway, and would avoid the problem in your scenario. Whether or not the file-in should skip the trailing whitespace seems like a separate matter and only avoids the failure as a side effect.

A temporary workaround is, ironically, to post the quit to the input queue. This is effectively what the "RegressionTestsRun.st" script is doing,

Lastly, I was thinking it would be a good idea if there was a command line option to specify to exit after processing any file-ins, but there is one already. If you are able to add -x to the end of your command line, then you won't need to quit from the file-in.

@daniels220
Copy link
Contributor Author

Oooh, I remember seeing #355 and being sympathetically frustrated with the Windows change...the u->k->u exception throw seems like a potentially useful thing for programs in general, not just interpreters. And it looks like you're already aware that there are situations—mainly if the quit: is issued from outside the UI process, or from what was once a UI process but a new one has taken over the job—where there might still be trouble. (Re: my suggestion of doing the quit: in a forked process—that was only in the scenario where we are in a callback and escape from it by terminating the active process—then once we know we aren't, a forked process can handle it without spinning up the event-loop machinery again. "Terminating the active process" is of course not totally reliable either, as the callback could be from another process...)

Overall, yeah, seems like a thorny problem, and I certainly don't know enough about Win32 or the Dolphin VM to have any idea. Good luck, if/when you ever decide to tackle it.

Re: when to skip the trailing whitespace—I agree it only avoids this problem as a side-effect, but it seems equally clearly the right thing to do anyway. If you look at the method, it skips that whitespace immediately after checking atEnd—if only whitespace is left, really it will be atEnd by the time it actually does anything, so probably should have skipped it before checking.

So, seems like both that and the change to fully close the source files would make sense to make? I can make a PR in the next couple days I think.

Re: workarounds—yes, explicitly posting the whole quit to the input queue does work—I realize that I have one script that does this already, because it's doing a save and quit and wants to do so without the tail end of the script execution on the stack, so it not only posts the quit but does so as a MessageSequence rather than a block, to completely divorce it from its origins (a little bit of inspiration from the ImageStripper, I think?).

Of course, there's an even simpler workaround—just leave off the last chunk marker in the file! Posting the quit is probably more robust though.

I had forgotten about -x—that would work sometimes, but doesn't allow setting the exit code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants