Several testsuite tests fail when compiling with clang under OSX #74

Closed
heplesser opened this issue Aug 6, 2015 · 13 comments
Assignees
Labels
I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) S: Normal Handle this with default priority T: Bug Wrong statements in the code or documentation ZC: Installation DO NOT USE THIS LABEL ZP: PR Created DO NOT USE THIS LABEL

Comments

@heplesser
Contributor

As first reported by Mario Mulansky on NEST User (17 July 2015), several testsuite tests fail when compiling NEST under OSX using the clang compiler.

To reproduce:

  • OSX 10.10.4
  • Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
  • ../src/configure --prefix=`pwd`/install --without-openmp
  • GSL 1.16 from homebrew
  • NEST master branch aeb4165

The following tests fail:

  • Running test unittests/test_aeif_cond_alpha_multisynapse.sli... Failed: segmentation fault
  • Running test unittests/test_mip_corrdet.sli... Failed: segmentation fault
  • Running test unittests/test_recorder_close_flush.sli... Failed: missed C++ assertion
  • Running test regressiontests/ticket-80-175-179.sli... Failed: segmentation fault
  • All 15 nest.tests.test_connect_distributions tests
@heplesser
Contributor Author

Further observations:

  • The errors also occur if NEST is compiled with clang using --with-debug --with-optimize CFLAGS=-O0 CXXFLAGS=-O0
  • Errors occur in different locations on subsequent runs (backtrace)
  • Occasionally, instead of failing, tests enter an infinite loop
  • No other tests in the testsuite fail
  • If compiled for debugging as above, then also regressiontests/ticket-787.sli fails
    • This is because SLI command ctermid triggers a segmentation fault
    • This does not occur if NEST is not built for debugging
    • On one occasion, skipping the test on ctermid made NEST enter an infinite loop on Hz
    • Failure on ctermid happens reliably on every run in the same location, see below
  • The errors also occur if I hide my Homebrew directory during compilation and testing, so this does not seem to be related to interference between system and other versions of the same library.
Process 96435 stopped
* thread #1: tid = 0x9f7c3, 0x00000001004e9f5c nest`TokenArrayObj::capacity(this=0x00007974742f77c5) const + 12 at tarrayobj.h:86, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7974742f77dd)
    frame #0: 0x00000001004e9f5c nest`TokenArrayObj::capacity(this=0x00007974742f77c5) const + 12 at tarrayobj.h:86
   83     size_t
   84     capacity( void ) const
   85     {
-> 86       return ( size_t )( end_of_free_storage - p );
   87     }
   88   
   89     Token& operator[]( size_t i )
(lldb) bt
* thread #1: tid = 0x9f7c3, 0x00000001004e9f5c nest`TokenArrayObj::capacity(this=0x00007974742f77c5) const + 12 at tarrayobj.h:86, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7974742f77dd)
  * frame #0: 0x00000001004e9f5c nest`TokenArrayObj::capacity(this=0x00007974742f77c5) const + 12 at tarrayobj.h:86
    frame #1: 0x00000001004e9ee0 nest`TokenArrayObj::push_back(this=0x00007974742f77c5, t=0x00007fff5fbff4e8) + 32 at tarrayobj.h:158
    frame #2: 0x00000001004d4b28 nest`TokenStack::push(this=0x00007974742f77c5, e=0x00007fff5fbff4e8) + 40 at tokenstack.h:63
    frame #3: 0x00000001004380da nest`Processes::CtermidFunction::execute(this=0x0000000100c1e788, i=0x00007974742f7665) const + 218 at processes.cc:889
    frame #4: 0x0000000100416c02 nest`Datum::execute(this=0x0000000101010cd8, i=0x00007fff5fbff898) + 50 at datum.h:179
    frame #5: 0x000000010041144b nest`SLIInterpreter::execute_(this=0x00007fff5fbff898, exitlevel=0) + 571 at interpret.cc:1375
    frame #6: 0x0000000100411e0a nest`SLIInterpreter::execute(this=0x00007fff5fbff898, v=0) + 282 at interpret.cc:1287
    frame #7: 0x0000000100000ac7 nest`main(argc=3, argv=0x00007fff5fbffa88) + 103 at main.cpp:45
    frame #8: 0x00007fff937f55c9 libdyld.dylib`start + 1
    frame #9: 0x00007fff937f55c9 libdyld.dylib`start + 1
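
A side observation on the backtrace above, offered as a hypothesis rather than a confirmed diagnosis: the interpreter pointer `i` in frame #3 (`0x00007974742f7665`) looks like ASCII text when read as little-endian bytes. The Python sketch below decodes it and computes the distance to the faulting `this` pointer from frames #0 to #2. The assumption that `ctermid` wrote the path `/dev/tty` is not stated anywhere in this thread.

```python
# Pointer values copied verbatim from the lldb backtrace above.
I_PTR = 0x00007974742F7665     # `i` in frame #3 (CtermidFunction::execute)
THIS_PTR = 0x00007974742F77C5  # `this` in frames #0 to #2


def pointer_as_text(ptr: int) -> str:
    """Render a 64-bit value as little-endian bytes, '.' for non-printables."""
    raw = ptr.to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


decoded = pointer_as_text(I_PTR)
offset = THIS_PTR - I_PTR

print(decoded)      # the bytes of `i` rendered as text
print(hex(offset))  # distance between the two invalid pointers

# Assumption (not from the thread): ctermid returned the usual "/dev/tty".
ASSUMED_PATH = "/dev/tty"
print(ASSUMED_PATH.endswith(decoded.rstrip(".")))
```

If the decoded bytes are a suffix of the terminal path, that would be consistent with a character buffer in `CtermidFunction::execute` being overrun and overwriting `i` on the stack; the fixed offset between `i` and `this` would then simply be the position of a member inside the interpreter object. Comparing the buffer size used in `processes.cc` with `L_ctermid` from `<stdio.h>` would confirm or rule this out.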

@mariomulansky

Please let me know if you need further information on this.

On a side note: if you prefer reports on GitHub instead of the mailing list, I suggest changing the corresponding message shown when tests fail.

@jougs
Contributor

jougs commented Aug 18, 2015

@mariomulansky: See #86 for a pull request that fixes the web references in the source code. Thanks for pointing this out!

@borismarin

I am getting the same segfaults on OSX 10.10.5, using either Apple LLVM version 7.0.0 (clang-700.1.76) (--without-openmp) or clang-omp (--with-openmp).

https://gist.github.com/borismarin/064587c7536a2f9d1c1e

@tammoippen
Contributor

Please also see #205, especially this comment by @heplesser:

I have explored the test failures a bit further (all with clang-omp):

  • test_aeif_cond_alpha_multisynapse.sli
    • fails on most runs with a segfault, usually somewhere during update()
    • detailed locations vary
    • on rare occasions, it does not crash, but appears to hang
    • in one case, it seg faulted during finalize_nodes()
    • the test uses only a single thread
  • test_mip_corrdet.sli
    • uses single thread
    • seg faults reliably
    • mostly in finalize_nodes(), once in update()
  • test_multithreading_devices.sli
    • uses 1 or 2 threads
    • seg faults reliably
    • test_multithreading.sli passes reliably
  • test_recorder_close_flush.sli
    • seg faults reliably
    • on one occasion, it ran for a while before segfaulting
  • ticket-80-175-179.sli
    • seg faults in most cases
    • hangs occasionally

Tests coming after ticket-80-175-179.sli have not run yet.
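
Because the failures above are intermittent (segfault on most runs, occasional hang, varying locations), it can help to run a single test many times and tally the outcomes. The following is a minimal Python sketch of such a loop, not part of the NEST test suite; the function names `classify_run` and `tally` and the example command line are hypothetical.

```python
import subprocess
import sys
from collections import Counter


def classify_run(cmd, timeout_s=60.0):
    """Run cmd once; return 'pass', 'fail', 'signal:<n>' or 'hang'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "pass"
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the signal
        # that terminated the child (11 is SIGSEGV on macOS and Linux).
        return f"signal:{-proc.returncode}"
    return "fail"


def tally(cmd, runs=20, timeout_s=60.0):
    """Repeat cmd `runs` times and count each kind of outcome."""
    return Counter(classify_run(cmd, timeout_s) for _ in range(runs))


# Hypothetical usage; adjust the binary and test path to your install:
# print(tally(["nest", "unittests/test_mip_corrdet.sli"], runs=20))
```

A tally such as "mostly `signal:11`, a few `hang`" makes it easier to compare compilers and build options than single runs do. The timeout is a judgment call: it has to be longer than the slowest passing run of the test in question.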

@abigailm
Contributor

@heplesser, @jougs: what is the status of this problem, and what needs to happen to close the issue?

@heplesser
Contributor Author

@abigailm I will check whether these tests also fail under OSX 10.11 and will report back soon.

@heplesser heplesser self-assigned this Aug 1, 2016
@heplesser heplesser added the T: Bug Wrong statements in the code or documentation label Aug 1, 2016
@heplesser
Contributor Author

I just tried with the newest version of Apple Clang that ships with Xcode 8:

  • NEST master 38a9608
  • OSX 10.11.6
  • Apple LLVM version 8.0.0 (clang-800.0.38)
  • Python 2.7.12 :: Continuum Analytics, Inc. (Anaconda)

Results:

  • Running test 'unittests/test_aeif_cond_alpha_multisynapse.sli'... Failed: segmentation fault
  • Running test 'unittests/test_recorder_close_flush.sli'... Failed: segmentation fault
  • Running test 'regressiontests/ticket-80-175-179.sli'... [Hung & killed]
  • Python tests (nosetests) fail with
    Failure: ImportError (dlopen(/Users/plesser/NEST/code/bld_clang/install/lib/python2.7/site-packages/nest/pynestkernel.so, 8): Symbol not found: __ZNK12lockPTRDatumIN4nest12AbstractMaskEXadL_ZNS0_14TopologyModule8MaskTypeEEEE4infoERNSt3__113basic_ostreamIcNS4_11char_traitsIcEEEE

@seeholza
Contributor

seeholza commented Sep 27, 2016

I have very similar issues compiling the master branch https://github.com/nest/nest-simulator/tree/4b94532f5e592883e3ba87969dd3dd2a3eb32c22:

  • OSX 10.11.6
  • Apple LLVM version 8.0.0 (clang-800.0.38)
  • MacPorts Python 2.7.10

Results are exactly the same.

For reference, all tests pass when compiling with gcc/g++, e.g. when calling cmake with:

cmake -DCMAKE_C_COMPILER=/opt/local/bin/gcc-mp-4.9 -DCMAKE_CXX_COMPILER=/opt/local/bin/g++-mp-4.9

@adamhaber

I am experiencing very similar issues as well. Installed nest 2.10.0 from source, with brew's GSL. Running make installcheck gave the following result:

Total number of tests: 729
Passed: 708
Failed: 21 (17 PyNEST)

*** There were errors detected during the run of the NEST test suite!
*** Please report the problem at
*** https://github.com/nest/nest-simulator/issues
*** To help us diagnose the problem, please attach the archived content
*** of these directories to the issue:
*** - '/Users/adam/nest-2.10.0-build/reports'
*** - '/var/folders/ry/34c3djms6bbfmwt6pfnyr4y40000gn/T//nest.7bugZ'

This is the relevant log file, as far as I can tell:
installcheck log.txt

BTW, at least most of the numerical deviations I've seen in the failed tests seem quite small, in case it helps...

  • Python 2.7.12 (Anaconda)
  • macOS 10.12
  • Apple LLVM version 7.3.0 (clang-703.0.31)

@heplesser heplesser added ZC: Installation DO NOT USE THIS LABEL I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) ZP: Pending DO NOT USE THIS LABEL S: Normal Handle this with default priority labels Nov 17, 2016
@heplesser
Contributor Author

I tried again with the newest master and compilers; the problems persist.

Apple LLVM version 8.1.0 (clang-802.0.42)

  • Still no support for OpenMP
  • Crashes:
    • unittests/test_aeif_cond_alpha_multisynapse.sli
    • unittests/test_aeif_cond_beta_multisynapse.sli
    • unittests/test_mip_corrdet.sli
    • unittests/test_recorder_close_flush.sli
    • regressiontests/ticket-80-175-179.sli
  • Python still fails

clang version 4.0.0 (tags/RELEASE_400/final)

  • Installed using brew install llvm
  • Configuration
export CPPFLAGS=-I/usr/local/opt/llvm/include
export LDFLAGS=-L/usr/local/opt/llvm/lib
cmake -DCMAKE_C_COMPILER=/usr/local/opt/llvm/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++  -DCMAKE_INSTALL_PREFIX:PATH=$PWD/install -Dwith-mpi=OFF -Dwith-python=OFF   /Users/plesser/NEST/code/src
  • Supports threading
  • Failed test:
    • regressiontests/ticket-681.sli
  • Crashes:
    • unittests/test_aeif_cond_alpha_multisynapse.sli
    • unittests/test_aeif_cond_beta_multisynapse.sli
    • unittests/test_mip_corrdet.sli
    • unittests/test_multithreading_devices.sli
    • unittests/test_recorder_close_flush.sli
    • regressiontests/ticket-80-175-179.sli
  • This is consistent with the crashes using Apple Clang (ticket-681 and test_multithreading_devices are skipped without threads)
  • I do not know how "normal" and Apple version numbers for LLVM/Clang are related.

@hakonsbm
Contributor

hakonsbm commented May 22, 2017

All tests pass when compiling with clang under Ubuntu 16.04.2, using clang version 3.8.0.
What I did:

export LDFLAGS="-L/usr/lib/llvm-3.8/lib -L/path/to/llvm-openmp/bld/install/lib"
export LD_LIBRARY_PATH=/path/to/llvm-openmp/bld/install/lib
  • Run cmake with paths to the clang compiler, specifying the compiled OpenMP version and its include path
cmake -DCMAKE_C_COMPILER=/usr/lib/llvm-3.8/bin/clang -DCMAKE_CXX_COMPILER=/usr/lib/llvm-3.8/bin/clang++ -DCMAKE_INSTALL_PREFIX:PATH=$PWD/install -Dwith-mpi=ON -Dwith-openmp=-fopenmp=libomp -Dwith-python=2 -DCMAKE_CXX_FLAGS=-I/path/to/llvm-openmp/bld/install/include ../src
  • Run make -j4 install installcheck as usual.

@heplesser
Contributor Author

I now built under MacOS 10.12.5 with clang 4.0.0 from brew with everything turned off:

export CPPFLAGS=-I/usr/local/opt/llvm/include
export LDFLAGS=-L/usr/local/opt/llvm/lib

cmake -DCMAKE_C_COMPILER=/usr/local/opt/llvm/bin/clang \
     -DCMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++ \
     -DCMAKE_INSTALL_PREFIX:PATH=$PWD/install \
     -Dwith-mpi=OFF -Dwith-python=OFF  -Dwith-gsl=OFF -Dwith-openmp=OFF \
     -Dwith-readline=OFF -Dwith-ltdl=OFF -Dwith-optimize=-O0 -Dwith-debug=ON \
     -Dstatic-libraries=ON /Users/plesser/NEST/code/src

Building with static libraries requires deletion of duplicate names from topology_names.h.

Then, the following tests lead to segmentation faults:

  • unittests/test_mip_corrdet.sli
  • unittests/test_recorder_close_flush.sli
  • regressiontests/ticket-80-175-179.sli
    The other tests reported failing above (test_aeif_*, test_multithreading_*, ticket-681.sli) are skipped.

In addition,

  • regressiontests/ticket-787.sli
    causes a segmentation fault. This is caused by calling ctermid, and this is the only function in this test that causes the segmentation fault. Most likely, this is not related to the errors above.

@heplesser heplesser added ZP: PR Created DO NOT USE THIS LABEL and removed ZP: Pending DO NOT USE THIS LABEL labels May 23, 2017
heplesser pushed a commit that referenced this issue Apr 23, 2020

9 participants