RFC: run tests with a release+asserts build and 4 workers #11614

Closed
wants to merge 2 commits into from


JeffBezanson
Member

Hoping this will help with #11553 a bit. It seems to take about the same amount of time, but with half as many workers. (The tests are roughly 2x slower in a debug build.)

Were there reasons to run the tests in debug mode other than assertions?

…cores

this will hopefully run faster and use less memory
@tkelman
Contributor

tkelman commented Jun 8, 2015

> Were there reasons to run the tests in debug mode other than assertions?

Differences in backtraces, maybe? If this helps (and at least from the small sample so far, it looks like it does), I say go for it.

@tkelman
Contributor

tkelman commented Jun 8, 2015

At the time #5228 was merged, the Travis builds apparently took only 5 minutes. Crazy how things have ballooned since then. Since we now run the separate "starts without sys.so" test, and I think we've been putting debug symbols in libjulia.so by default for a while, I'm not sure running a full debug build actually catches much of anything. If we need to, we can keep an eye out by running tests with julia-debug on the buildbots.

@JeffBezanson
Member (author)

If we can fix #10205 and this PR works out, it will help a lot.

@tkelman
Contributor

tkelman commented Jun 8, 2015

I'm not so sure building the C libraries is that big a contributor to the build time, but if we can find a sufficiently up-to-date PPA or bring them into juliadeps, it'll help some.

If this speeds things up enough, it might also be worth trying to bring back OS X builds. I'm trying that, and cleaning some other things up, on a branch.

@tkelman
Contributor

tkelman commented Jun 8, 2015

6 apparently led to a GC segfault. Not the OOM killer, but not a good sign either: https://travis-ci.org/JuliaLang/julia/jobs/65938347

@yuyichao
Contributor

yuyichao commented Jun 8, 2015

Incidentally, that's the same code path I mentioned in #11606 (comment)...

Edit: I'm also running a collection here to check whether it's an actual issue or not...

@tkelman
Contributor

tkelman commented Jun 8, 2015

And the stack overflow in dates (https://travis-ci.org/JuliaLang/julia/jobs/65938351) has been happening on the buildbots a lot, but I think this might be the first time I've seen it on Travis, so I haven't filed a separate issue for it yet.

@tkelman
Contributor

tkelman commented Jun 8, 2015

I'd be in favor of merging if you rebase out the second commit, or just cherry-picking the first.

@tkelman tkelman deleted the jb/travistweaks branch June 9, 2015 00:46
@yuyichao
Contributor

yuyichao commented Jun 9, 2015

So it happens with 4 workers as well? https://travis-ci.org/JuliaLang/julia/jobs/65979278
When did it first happen on the buildbot?

@tkelman
Contributor

tkelman commented Jun 9, 2015

If I had to guess, I'd say probably shortly after the tuple overhaul. There's a similar-looking stack overflow in Enums that happened at 0cd2677 on this build http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x86/builds/1366/steps/shell_2/logs/stdio and one in dates a few days later http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x86/builds/1423/steps/shell_2/logs/stdio

@yuyichao
Contributor

yuyichao commented Jun 9, 2015

Does it make sense to print frame numbers in the backtrace? I find it quite hard to compare backtraces printed with and without stripped symbols...
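As a rough illustration of the frame-number idea, here is a small Python sketch (not Julia's backtrace printer) that prefixes each frame with its position, so two traces can be compared frame by frame. The `(function, file, line)` tuple shape, the use of `None` for a missing line number, and the helper names `number_frames` and `differing_frames` are all hypothetical.

```python
from itertools import zip_longest


def number_frames(frames):
    """Format (function, file, line) tuples as numbered backtrace lines.

    Frames are numbered from 1, innermost first. A line of None stands in
    for a frame whose line information is unavailable (e.g. stripped symbols).
    """
    out = []
    for i, (func, path, line) in enumerate(frames, start=1):
        loc = path if line is None else f"{path}:{line}"
        out.append(f"[{i}] in {func} at {loc}")
    return out


def differing_frames(a, b):
    """Return the 1-based frame numbers at which two backtraces disagree."""
    return [
        i
        for i, (fa, fb) in enumerate(zip_longest(a, b), start=1)
        if fa != fb
    ]


# Usage with frames shaped like the ones in this thread's logs.
with_symbols = [("error", "./error.jl", 22), ("test_approx_eq", "./test.jl", 139)]
stripped = [("error", "./error.jl", None), ("test_approx_eq", "./test.jl", 139)]
print("\n".join(number_frames(with_symbols)))
print(differing_frames(with_symbols, stripped))  # [1]
```

With numbered frames, a mismatch can be reported as "frame 1 differs" instead of having to line up two unnumbered lists by eye.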

@yuyichao
Contributor

yuyichao commented Jun 9, 2015

I ran the sparse test 300 times last night and got this stack overflow error 3 times, plus the following failure once:

```
     * sparse              exception on 1: ERROR: LoadError: LoadError: assertion failed: |F' \ ones(elty,5) - full(A1pd)' \ ones(5)| <= 1.1641532182693481e-5
  F' \ ones(elty,5) = [1.8239389082351972e6
 934570.1978360965
 8.603310773126363
 30.915492846568245
 5.86573069208295]
  full(A1pd)' \ ones(5) = [1.8239389082799375e6,934570.1978590211,8.603310773126363,30.91549284656825,5.86573069208295]
  difference = 4.474027082324028e-5 > 1.1641532182693481e-5
 in error at ./error.jl:22
 in test_approx_eq at ./test.jl:139
 in anonymous at ./no file:382
 in include at ./boot.jl:253
 in include_from_node1 at ./loading.jl:133
 in include at ./boot.jl:253
 in runtests at /home/yuyichao/projects/julia/master/test/testdefs.jl:197
 in anonymous at ./multi.jl:644
 in run_work_thunk at ./multi.jl:605
 in remotecall_fetch at ./multi.jl:678
 in remotecall_fetch at ./multi.jl:693
 in anonymous at ./task.jl:1422
while loading /home/yuyichao/projects/julia/master/test/sparsedir/cholmod.jl, in expression starting on line 318
while loading /home/yuyichao/projects/julia/master/test/sparse.jl, in expression starting on line 6
ERROR: LoadError: LoadError: LoadError: assertion failed: |F' \ ones(elty,5) - full(A1pd)' \ ones(5)| <= 1.1641532182693481e-5
  F' \ ones(elty,5) = [1.8239389082351972e6
 934570.1978360965
 8.603310773126363
 30.915492846568245
 5.86573069208295]
  full(A1pd)' \ ones(5) = [1.8239389082799375e6,934570.1978590211,8.603310773126363,30.91549284656825,5.86573069208295]
  difference = 4.474027082324028e-5 > 1.1641532182693481e-5
 in error at ./error.jl:22
 in test_approx_eq at ./test.jl:139
 in anonymous at ./no file:382
 in include at ./boot.jl:253
 in include_from_node1 at ./loading.jl:133
 in include at ./boot.jl:253
 in runtests at /home/yuyichao/projects/julia/master/test/testdefs.jl:197
 in anonymous at ./multi.jl:644
 in run_work_thunk at ./multi.jl:605
 in remotecall_fetch at ./multi.jl:678
 in remotecall_fetch at ./multi.jl:693
 in anonymous at ./task.jl:1422
while loading /home/yuyichao/projects/julia/master/test/sparsedir/cholmod.jl, in expression starting on line 318
while loading /home/yuyichao/projects/julia/master/test/sparse.jl, in expression starting on line 6
while loading /home/yuyichao/projects/julia/master/test/runtests.jl, in expression
```

That looks like a normal precision error to me. Does the result make sense for the input, and should we relax the tolerance here a little?
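To put numbers on "a normal precision error", the Python sketch below recomputes the failing comparison from the values printed in the log. It assumes the check takes the largest elementwise absolute difference and that the tolerance has the form 1e4 · n · eps(x), with n = 5 entries and eps taken at the largest magnitude. That form is inferred because it reproduces the printed tolerance 1.1641532182693481e-5 exactly; it is not read from Julia's test code.

```python
import math

# Values copied from the failing cholmod test output in the log.
lhs = [1.8239389082351972e6, 934570.1978360965, 8.603310773126363,
       30.915492846568245, 5.86573069208295]
rhs = [1.8239389082799375e6, 934570.1978590211, 8.603310773126363,
       30.91549284656825, 5.86573069208295]

# Assumption: the check compares the largest elementwise absolute difference.
diff = max(abs(a - b) for a, b in zip(lhs, rhs))

# Assumption: tolerance = 1e4 * n * eps(max |x|); math.ulp(x) is eps(x).
scale = max(max(map(abs, lhs)), max(map(abs, rhs)))
tol = 1e4 * len(lhs) * math.ulp(scale)

# Error relative to the largest entry, for a sense of scale.
rel = diff / scale

print(f"diff = {diff:.6e}, tol = {tol:.6e}, ratio = {diff / tol:.2f}, relative = {rel:.2e}")
```

The absolute difference is roughly 4x the tolerance, but relative to the largest entry (about 1.8e6) it is on the order of 1e-11. That is consistent with ordinary floating-point round-off on a randomly generated input, not with a wrong result.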

@tkelman
Contributor

tkelman commented Jun 9, 2015

> Which looks like a normal precision error for me. Does the result make sense for the input and should we relax the requirement here a little bit?

Probably. cc @andreasnoack

@andreasnoack
Member

I think it is better to set the seed so the test is deterministic. If we relax the tolerance, it will just happen again, but with smaller probability.

@StefanKarpinski
Member

Agreed. The seed should be fixed.
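The argument for fixing the seed rather than relaxing the tolerance can be illustrated with a toy Python sketch. `random_test_matrix` and `run_check` are hypothetical stand-ins, not the actual cholmod test, and the Julia-side equivalent (seeding the RNG at the top of the test file) is not shown here.

```python
import random


def random_test_matrix(n, seed=None):
    """Hypothetical stand-in for a randomly generated n x n test input."""
    rng = random.Random(seed)  # private generator; seed=None gives an unpredictable seed
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


def run_check(seed, threshold):
    """Toy flaky check: passes when no entry is closer to 0 than threshold.

    Lowering threshold plays the role of relaxing a tolerance: failures get
    rarer but do not disappear. This is not the cholmod test itself.
    """
    a = random_test_matrix(5, seed)
    return min(abs(x) for row in a for x in row) >= threshold


# Unseeded: the outcome can change from run to run.
unseeded = [run_check(None, 0.01) for _ in range(1000)]
print(unseeded.count(False), "of 1000 unseeded runs failed")

# Seeded: the same input every time, so the outcome never changes.
outcomes = {run_check(1234, 0.01) for _ in range(100)}
print(outcomes)  # a one-element set: {True} or {False}
```

Unseeded, some fraction of runs fail, and lowering `threshold` only shrinks that fraction. With a fixed seed, the check either always passes or always fails, so a precision problem shows up once and can be addressed instead of surfacing intermittently on CI.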

5 participants