Bootstrap build is failing on Mac OS X with "rust: task ... ran out of stack" #6061

Closed
pnkfelix opened this issue Apr 25, 2013 · 21 comments

@pnkfelix
Member

Fresh clone of repo.
commit 1d53bab

% configure --disable-debug --disable-optimize  --enable-clang
% time make -j8
...
compile_and_link: x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/lib/libcore.dylib
rust: task 7fa031d00000 ran out of stack
compile_and_link: x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/lib/libstd.dylib
cp: x86_64-apple-darwin/stage2/lib/libcore.dylib
cp: x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/lib/libcore-*.dylib: No such file or directory
make: *** [x86_64-apple-darwin/stage2/lib/libcore.dylib] Error 1
make: *** Waiting for unfinished jobs....
rust: task 7fa058c0ad10 ran out of stack

real    4m54.362s
user    22m36.394s
sys     1m4.580s

This may be a duplicate of another issue, for example perhaps #6049. I'm honestly not sure.

At this point I have bisected the problem down to:

4a24f10 is the first bad commit
commit 4a24f10
Author: Huon Wilson [email protected]
Date: Wed Apr 24 22:29:19 2013 +1000

(But it is possible that Huon's code is just exposing a latent bug in rustc, rather than being fundamentally wrong itself.) I plan to investigate backing out Huon's change to see if that makes the problem go away, just so I can actually bootstrap a compiler and do more investigation.

pnkfelix added a commit to pnkfelix/rust that referenced this issue Apr 25, 2013
…he generic `gen`."

This reverts commit 4a24f10
as part of Felix's attempt to resolve Issue rust-lang#6061.
pnkfelix added a commit to pnkfelix/rust that referenced this issue Apr 25, 2013
This reverts commit 9860fe1
as part of Felix's attempt to resolve Issue rust-lang#6061.
@pnkfelix
Member Author

A theory from #rust IRC channel:

cuddling: @pnkfelix: regarding #6049/#6061 ... rustsqlite uses HashMap, so maybe gen() has a problem

@huonw
Member

huonw commented Apr 25, 2013

Does RUST_LOG=::rt::backtrace make indicate where the infinite recursion is? (Assuming it is actually that.)

(Also, sorry for breaking everything :( )

@huonw
Member

huonw commented Apr 25, 2013

And, does giving type hints, i.e. r.gen::<u64>() in hashmap, fix it? (I don't have an osx computer to experiment with.)

@catamorphism
Contributor

I'm checking on this.

@brson
Contributor

brson commented Apr 26, 2013

@pcwalton's recent ffi changes could have significantly changed the way stack accounting works. You could try running with RUST_MAX_STACK=20000000 (20MB, the default is 8MB) and see if that gives you enough overhead.

@brson
Contributor

brson commented Apr 26, 2013

More specifically, C stacks were never accounted for before when deciding whether there was enough stack. I assume that now when we use the fast_ffi stacks, that is attributing an additional 2MB of stack to the task.

@huonw
Member

huonw commented Apr 26, 2013

Would #[fixed_stack_segment] on the rustrt functions in rand.rs help this?

(I'm still not quite sure what it does, but it's helped with stack troubles before.)

@huonw
Member

huonw commented Apr 26, 2013

@catamorphism (or @brson, or anyone else who can reproduce this) I just opened #6073, which reduces the number of C calls that the rand module makes (they are then only necessary for generating the initial seed). That may fix this, if that is the problem. I can't test it, so I'd appreciate it if someone could throw some spare CPU cycles at it.

@pnkfelix
Member Author

@huonw

% RUST_LOG=rustc=1,::rt::backtrace x86_64-apple-darwin/stage1/bin/rustc -Z verbose --cfg stage1    --target=x86_64-apple-darwin   -o x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/lib/libsyntax.dylib /tmp/rust/src/libsyntax/syntax.rc
rust: task 7f800b40ac80 ran out of stack

Maybe I am doing it wrong, or maybe our stack exhaustion handling code needs some hacking. Looking through the other suggestions.

@pnkfelix
Member Author

@huonw commit 6d50d55 does not fix the issue for me.

That is, I fetched that repository, rebased that commit onto my incoming (commit 1d53bab), cleaned my build tree (via: ( rm -rf objdir/* src/libuv src/llvm && git checkout src/libuv src/llvm ) to side-step the problems with submodules I have been encountering lately), and then did a configure+make as I wrote at the top of this issue. The same problem persists:

% RUST_LOG=rustc=1,::rt::backtrace make -j1 VERBOSE=1
cfg: build triple x86_64-apple-darwin
cfg: host triples x86_64-apple-darwin
cfg: target triples x86_64-apple-darwin
cfg: disabling rustc optimization (CFG_DISABLE_OPTIMIZE)
cfg: host for x86_64-apple-darwin is x86_64
cfg: os for x86_64-apple-darwin is apple-darwin
cfg: using clang
x86_64-apple-darwin/stage1/bin/rustc --cfg stage1    --target=x86_64-apple-darwin  --cfg rustc -o x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/bin/rustc /Users/fklock/Dev/Mozilla/rust.git/src/driver/driver.rs
rust: task 7fca11d00000 ran out of stack
cp x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/bin/rustc x86_64-apple-darwin/stage2/bin/rustc
cp: x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/bin/rustc: No such file or directory
make: *** [x86_64-apple-darwin/stage2/bin/rustc] Error 1

@brson Increasing the stack size by various reasonable+unreasonable amounts does not fix the problem:

% time RUST_MAX_STACK=10000000 x86_64-apple-darwin/stage1/bin/rustc --cfg stage1    --target=x86_64-apple-darwin  --cfg rustc -o x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/bin/rustc /Users/fklock/Dev/Mozilla/rust.git/src/driver/driver.rs
rust: task 7fbb50d00000 ran out of stack

real    0m0.376s
user    0m0.364s
sys     0m0.011s
% time RUST_MAX_STACK=100000000 x86_64-apple-darwin/stage1/bin/rustc --cfg stage1    --target=x86_64-apple-darwin  --cfg rustc -o x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/bin/rustc /Users/fklock/Dev/Mozilla/rust.git/src/driver/driver.rs
rust: task 7f950ad00000 ran out of stack

real    0m3.734s
user    0m3.694s
sys     0m0.039s
% time RUST_MAX_STACK=1000000000 x86_64-apple-darwin/stage1/bin/rustc --cfg stage1    --target=x86_64-apple-darwin  --cfg rustc -o x86_64-apple-darwin/stage1/lib/rustc/x86_64-apple-darwin/bin/rustc /Users/fklock/Dev/Mozilla/rust.git/src/driver/driver.rs
rust: task 7fdde2d00000 ran out of stack

real    0m37.261s
user    0m36.967s
sys     0m0.295s

(I still haven't investigated what it is about the commits that causes the problem. I plan to do that now; I just wanted to make sure I addressed these comments first.)
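
The three timings above already hint at unbounded recursion: the run time grows roughly in proportion to the stack limit, as if the task simply burns through whatever stack it is given. A minimal sketch of that arithmetic follows. The byte counts and wall-clock seconds are copied from the runs above; the function names `nanos_per_byte` and `spread` are made up for this illustration.

```rust
/// (RUST_MAX_STACK in bytes, wall-clock "real" seconds) from the three runs above.
const RUNS: [(f64, f64); 3] = [
    (10_000_000.0, 0.376),
    (100_000_000.0, 3.734),
    (1_000_000_000.0, 37.261),
];

/// Time spent per byte of permitted stack, in nanoseconds, for each run.
fn nanos_per_byte() -> Vec<f64> {
    RUNS.iter()
        .map(|&(bytes, secs)| secs * 1e9 / bytes)
        .collect()
}

/// Ratio of the largest to the smallest value; close to 1.0 means "roughly constant".
fn spread(values: &[f64]) -> f64 {
    let max = values.iter().cloned().fold(f64::MIN, f64::max);
    let min = values.iter().cloned().fold(f64::MAX, f64::min);
    max / min
}
```

If the compiler merely needed somewhat more stack than the default, one of the larger limits would have let it finish. A near-constant cost per byte instead suggests that the failure point moves with the limit.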

bors added a commit that referenced this issue Apr 26, 2013
See #6061. Reverting these commits fixes it; later we'll isolate the problem.
@brson
Contributor

brson commented Apr 26, 2013

FWIW I opened pull request #6081 to fix some longstanding issues about recursion limits. It doesn't sound like it will fix this problem though.

@pnkfelix
Member Author

@huonw A new discovery: rather than reverting the entirety of the rand-related commits, this much smaller change: pnkfelix/rust@b9aaa79 on my own repo (branch: investigate-issue-6061) also appears to fix the stack exhaustion problem.

That smaller change was largely a shot in the dark on my part (it just happened to be the first in a series of planned changes to narrow down the reversion to something we can actually analyze). It removes a trait impl of the form:

// Allow direct chaining with `task_rng`
impl<R: Rng> Rng for @R {
    fn next(&self) -> u32 { (*self).next() }
}

and then puts in some supplementary explicit derefs that I'm guessing were meant to be made unnecessary by the direct chaining, but became necessary after removing the above impl.

I'll admit, I find the whole matter bizarre; @pcwalton asserted yesterday that we should be auto-deref'ing on these sorts of method invocations anyway. Maybe that's a hint to what is going on here, some sort of nasty interaction between the above impl and the auto-deref'ing?

@pnkfelix
Member Author

I freely admit that the auto-derefs I removed do not look like they have anything to do with the impl I removed. Maybe I was mistaken about their relationship, but I really thought I tried to compile with just the impl removed and found rustc complained about the auto-derefs I referenced.

Anyway, at this point I have further narrowed down the change that brings bootstrapping back to a working state to the following:

diff --git a/src/libcore/hashmap.rs b/src/libcore/hashmap.rs
index d2be041..c31905b 100644
--- a/src/libcore/hashmap.rs
+++ b/src/libcore/hashmap.rs
@@ -56,7 +56,7 @@ fn resize_at(capacity: uint) -> uint {
 pub fn linear_map_with_capacity<K:Eq + Hash,V>(
     initial_capacity: uint) -> HashMap<K, V> {
     let r = rand::task_rng();
-    linear_map_with_capacity_and_keys(r.gen(), r.gen(),
+    linear_map_with_capacity_and_keys((*r).gen(), (*r).gen(),
                                       initial_capacity)
 }

Curiouser and curiouser. I am still looking.

@pnkfelix
Member Author

Okay I finally had the bright idea to make a debug-enabled build and set a breakpoint at the point in the C++ code rust_task::new_stack where it issues the "ran out of stack" message. Now I can see the infinite recursion:

#0  rust_task::new_stack (this=0x103500000, requested_sz=24) at /Users/pnkfelix/Dev/Mozilla/rust.git/src/rt/rust_task.cpp:532
#1  0x000000010332643f in __morestack ()
#2  0x0000000103317771 in rust_task::return_c_stack () at /Users/pnkfelix/Dev/Mozilla/rust.git/src/rt/rust_task.h:472
#3  0x0000000103317771 in rust_task::call_on_c_stack (this=0x103500000, args=0x81fb80, fn_ptr=0x100000) at rust_task.h:477
#4  0x0000000103318a50 in rust_task::next_stack (this=<value temporarily unavailable, due to optimizations>, stk_sz=<value temporarily unavailable, due to optimizations>, args_addr=0x105525130, args_sz=<value temporarily unavailable, due to optimizations>) at rust_task.h:588
#5  0x000000010017afb9 in __morestack ()
#6  0x00000001000a2394 in rand::__extensions__::next_12073::_4b24ebc0314a72c::_07pre ()
#7  0x00000001000a23b0 in rand::__extensions__::next_12073::_4b24ebc0314a72c::_07pre ()
#8  0x00000001000a23b0 in rand::__extensions__::next_12073::_4b24ebc0314a72c::_07pre ()
...
#32764 0x00000001000a23b0 in rand::__extensions__::next_12073::_4b24ebc0314a72c::_07pre ()
#32765 0x00000001000a23b0 in rand::__extensions__::next_12073::_4b24ebc0314a72c::_07pre ()
#32766 0x000000010017afe7 in __morestack ()

@pnkfelix
Member Author

So at this point I'm fairly confident that the stack exhaustion problem that we are observing here is because the trait impl from the comment above is erroneous: the way its code is written, it makes one think that it is doing one level of dereference (a single unrolling), but in fact it is compiling into a self-recursive call, at least on my host.

But I think I understand why @huonw put in that impl; if you leave it out and try to rely on auto-derefs, the compilation fails with:

/Users/pnkfelix/Dev/Mozilla/rust.git/src/libcore/hashmap.rs:59:38: 59:46 error: failed to find an implementation of trait rand::Rng for <V5>
/Users/pnkfelix/Dev/Mozilla/rust.git/src/libcore/hashmap.rs:59     linear_map_with_capacity_and_keys(r.gen(), r.gen(),
                                                                                                     ^~~~~~~~
make: *** [x86_64-apple-darwin/stage0/lib/rustc/x86_64-apple-darwin/lib/libcore.dylib] Error 101

and, after fixing that:

/Users/pnkfelix/Dev/Mozilla/rust.git/src/libcore/rand.rs:702:4: 703:1 error: failed to find an implementation of trait rand::Rng for <V0>
/Users/pnkfelix/Dev/Mozilla/rust.git/src/libcore/rand.rs:702     task_rng().gen()
/Users/pnkfelix/Dev/Mozilla/rust.git/src/libcore/rand.rs:703 }
make: *** [x86_64-apple-darwin/stage0/lib/rustc/x86_64-apple-darwin/lib/libcore.dylib] Error 101

My hypothesis is that the latter failures are problems in our auto-deref insertion logic, at least at the level of the static analysis before code generation.

The question is: Why is it that the resulting code works at all on some (many!) targets, but compiles into the infinite loop on others? Very very strange.

I'm also not sure whether it makes sense to be able to impl a trait that has &self methods for a non-ref type. Or at least this seems potentially confusing. What I mean is, when you look at code like this:

trait Fooable { fn foo(&self) -> int; }
impl Fooable for int { fn foo(&self) -> int { 3 } }
impl Fooable for @str { fn foo(&self) -> int { 4 } }

Would *self denote an int in the first case and a str in the second case (which seems like magic)? I only wrote the above example off the top of my head; I'll need to double-check that I even got the details right later. But my point is, a bug like the Rng issue above seems like it might be arising in part due to confusion in scenarios like the above.
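
One way to probe that question is to measure what `*self` denotes in each impl. The following is a minimal sketch in present-day Rust, not the 2013 dialect used in this thread: `Box<str>` stands in for `@str` (an assumption), and `pointee_size`, `int_case`, and `boxed_str_case` are hypothetical helpers added for the illustration.

```rust
use std::mem::size_of_val;

trait Fooable {
    fn foo(&self) -> i32;
    /// Size in bytes of the value that `*self` denotes inside the impl.
    fn pointee_size(&self) -> usize;
}

impl Fooable for i32 {
    fn foo(&self) -> i32 { 3 }
    // Here `self` is `&i32`, so `*self` is an `i32`.
    fn pointee_size(&self) -> usize { size_of_val(self) }
}

impl Fooable for Box<str> {
    fn foo(&self) -> i32 { 4 }
    // Here `self` is `&Box<str>`, so `*self` is the box (pointer plus length),
    // not the string data it points to.
    fn pointee_size(&self) -> usize { size_of_val(self) }
}

fn int_case() -> (i32, usize) {
    let n = 3i32;
    (n.foo(), n.pointee_size())
}

fn boxed_str_case() -> (i32, usize) {
    let s: Box<str> = Box::from("hello");
    (s.foo(), s.pointee_size())
}
```

In this sketch `*self` is always the type named after `for`, so in the second impl it is the pointer type rather than the 5-byte `str`. Reaching the string itself takes a second dereference.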

@Blei
Contributor

Blei commented Apr 28, 2013

Hi all, I think I have a fix:

diff --git a/src/libcore/rand.rs b/src/libcore/rand.rs
index cdf6a5b..5294439 100644
--- a/src/libcore/rand.rs
+++ b/src/libcore/rand.rs
@@ -690,7 +690,7 @@ pub fn task_rng() -> @IsaacRng {

 // Allow direct chaining with `task_rng`
 impl<R: Rng> Rng for @R {
-    fn next(&self) -> u32 { (*self).next() }
+    fn next(&self) -> u32 { (**self).next() }
 }

 /**

The problem is that the first * only dereferences the &. self at this point is &@R, so (*self) is still an @R, the call resolves to this same impl, and you keep endlessly recursing.
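
The same shape can be reproduced in present-day Rust. The sketch below is an analogue under stated assumptions rather than the libcore code: `Box<R>` stands in for `@R`, and `StepRng` and `boxed_next_twice` are hypothetical names invented for the example.

```rust
use std::cell::Cell;

trait Rng {
    fn next(&self) -> u32;
}

/// Hypothetical stand-in generator: returns 7, 8, 9, ...
struct StepRng {
    state: Cell<u32>,
}

impl Rng for StepRng {
    fn next(&self) -> u32 {
        let v = self.state.get();
        self.state.set(v.wrapping_add(1));
        v
    }
}

// Allow calling `next` directly on a boxed generator.
impl<R: Rng> Rng for Box<R> {
    fn next(&self) -> u32 {
        // `self` is `&Box<R>`: `*self` is `Box<R>` and `**self` is `R`.
        // Writing `(*self).next()` here would select this same impl again.
        (**self).next()
    }
}

fn boxed_next_twice() -> (u32, u32) {
    let r = Box::new(StepRng { state: Cell::new(7) });
    // Go through the `Box<R>` impl explicitly, twice.
    (Rng::next(&r), Rng::next(&r))
}
```

With the double dereference, each call forwards to the inner generator exactly once and returns.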

@pnkfelix
Member Author

@Blei your hypothesis is plausible. But it does not explain why the infinite loop is only arising on some targets. I think there is still a latent platform-dependent bug there, perhaps in trait method resolution or perhaps in the auto-deref generation.

@huonw
Member

huonw commented Apr 28, 2013

@Blei, (maybe) you could/should open a pull request with that fix, so that this gets solved. The platform dependency can get investigated/solved progressively.

FWIW, the following gives a run out of stack error. (On a computer that doesn't exhibit it when bootstrapping.)

trait Foo {
    fn x(&self);
}

impl Foo for int {
    fn x(&self) {}
}

impl<F:Foo> Foo for @F {
    fn x(&self) {
        (*self).x()
    }
}

fn main() {
    (@1i).x()
}
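
A bounded variant of this test case makes the self-call observable without actually exhausting the stack. This is a sketch in present-day Rust, with `Box<F>` standing in for `@F`; the `depth` parameter, the `LIMIT` cut-off, and the names `depth_reached` and `buggy_depth` are assumptions added for the illustration.

```rust
/// Artificial cut-off so the demonstration terminates.
const LIMIT: u32 = 1_000;

trait Foo {
    /// Returns the depth at which the call chain finally stopped.
    fn depth_reached(&self, depth: u32) -> u32;
}

impl Foo for i32 {
    // The intended end of the chain: reached only if the box is unwrapped.
    fn depth_reached(&self, depth: u32) -> u32 {
        depth
    }
}

impl<F: Foo> Foo for Box<F> {
    fn depth_reached(&self, depth: u32) -> u32 {
        if depth == LIMIT {
            return depth;
        }
        // `*self` is still a `Box<F>`, so this resolves to this same impl.
        (*self).depth_reached(depth + 1)
    }
}

fn buggy_depth() -> u32 {
    Box::new(1i32).depth_reached(0)
}
```

Here `buggy_depth()` only stops because of the cut-off, so the `i32` impl is never reached. Changing the call to `(**self).depth_reached(depth + 1)` would hand over to the `i32` impl on the first step.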

@pnkfelix
Member Author

An interesting detail I just discovered while trying to understand why the infinite recursion is not exhibited across all architectures: when I do a bootstrap build, the default configure settings for Mac OS X (namely x86_64-apple-darwin) cause the infinite recursion, but with this setting:
configure --build-triple=i686-apple-darwin
the infinite recursion does not occur.

Clarification: the above experiment was applied to the commit preceding Blei's fix (SHA: cdd342b); the whole point is to find out why the code before the fix was not behaving uniformly across the board (either breaking everywhere or working everywhere).

@bors bors closed this as completed in cdd342b Apr 30, 2013
@pnkfelix
Member Author

pnkfelix commented May 1, 2013

I think I was mistaken in my earlier comment when I claimed that the relevant bit was the choice between x86_64 and i686. There were a number of other settings on my original configure call, and subsequent attempts to investigate/replicate the problem have made me realize that the relevant bit might in fact be:

configure --disable-optimize

I am still investigating. This particular issue has been closed, but there is a more insidious bug hiding here I believe, and if I can replicate it readily, I will open a fresh issue for it.

@brson
Contributor

brson commented May 2, 2013

While I was bisecting through this commit range I saw this problem on linux too with --disable-optimize

flip1995 pushed a commit to flip1995/rust that referenced this issue Apr 8, 2021
Lint: filter(Option::is_some).map(Option::unwrap)

Fixes rust-lang#6061

*Please write a short comment explaining your change (or "none" for internal only changes)*
changelog:
* add new lint for filter(Option::is_some).map(Option::unwrap)

First Rust PR, so I'm sure I've violated some idioms. Happy to change anything.

I'm getting one test failure locally -- a stderr diff for `compile_test`. I'm having a hard time seeing how I could be causing it, so I'm tentatively opening this in the hopes that it's an artifact of my local setup against `rustc`. Hoping it can at least still be reviewed in the meantime.

I'm gathering that since this is a method lint, and `.filter(...).map(...)` is already checked, the means of implementation needs to be a little different, so I didn't exactly follow the setup boilerplate. My way of checking for method calls seems a little too direct (ie, "is the second element of the expression literally the path for `Option::is_some`?"), but it seems like that's how some other lints work, so I went with it. I'm assuming we're not concerned about, eg, closures that just end up equivalent to `Option::is_some` by eta reduction.