Fix threading issue when connecting #1276
threading_issues
looks already quite good, thanks for fixing this! i've added a few comments that should be addressed
@@ -427,7 +426,10 @@ nest::ConnectionManager::connect( const index sgid,
{
  kernel().model_manager.assert_valid_syn_id( syn_id );

  have_connections_changed_ = true;
  if ( not have_connections_changed_[ target_thread ] )
do we need this check here? isn't it cheaper to just write in all cases? the logic wouldn't change, would it?
@jakobj I tested on Blaustein, and if we do not check before setting the parameter, then the performance gets a lot worse. You can see it as 4c388f4 here. It must have something to do with all the threads trying to access at once and maybe something with the omp automatic write in the `set` function. I wrote a comment before every `if` to try to explain and make sure nobody removes the `if` without testing the performance.
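For readers following the discussion, here is a minimal sketch of the check-before-write pattern described above. `PerThreadFlags`, `set_if_unset` and `stores_after_calls` are hypothetical names and not part of NEST. The store counter exists only to make the effect visible, and the calls run sequentially rather than from OpenMP threads. The sketch does not reproduce the measured slowdown; it only shows that the guard turns repeated stores into a single store followed by reads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-thread flag container. This is NOT
// NEST's CompletedChecker; all names here are assumptions for illustration.
class PerThreadFlags
{
public:
  explicit PerThreadFlags( const std::size_t num_threads )
    : flags_( num_threads, 0 )
    , stores_( num_threads, 0 )
  {
  }

  // Unconditional variant: every call stores to the flag.
  void
  set( const std::size_t tid )
  {
    flags_[ tid ] = 1;
    ++stores_[ tid ]; // instrumentation only, to count stores
  }

  // Guarded variant, as in the diff: read first, store only if needed.
  void
  set_if_unset( const std::size_t tid )
  {
    if ( not flags_[ tid ] )
    {
      set( tid );
    }
  }

  bool
  is_set( const std::size_t tid ) const
  {
    return flags_[ tid ] != 0;
  }

  std::size_t
  stores( const std::size_t tid ) const
  {
    return stores_[ tid ];
  }

private:
  std::vector< char > flags_;         // char rather than bool: vector< bool > packs bits
  std::vector< std::size_t > stores_; // number of stores per slot
};

// Simulates `calls` connect() calls on slot 0 and returns how many of
// them actually stored to the flag.
inline std::size_t
stores_after_calls( const bool guarded, const std::size_t calls )
{
  PerThreadFlags flags( 1 );
  for ( std::size_t i = 0; i < calls; ++i )
  {
    if ( guarded )
    {
      flags.set_if_unset( 0 );
    }
    else
    {
      flags.set( 0 );
    }
  }
  return flags.stores( 0 );
}
```

With the guard, `stores_after_calls( true, 1000 )` performs one store; without it, `stores_after_calls( false, 1000 )` performs one store per call. Whether that difference explains the benchmark result is an assumption that would need profiling to confirm.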
ok, thanks for clarifying
const thread tid = kernel().vp_manager.get_thread_id();

if ( not have_connections_changed_[ tid ] )
(see above)
@@ -694,7 +693,10 @@ nest::ConnectionManager::find_connection( const thread tid, const synindex syn_i
void
nest::ConnectionManager::disconnect( const thread tid, const synindex syn_id, const index sgid, const index tgid )
{
  have_connections_changed_ = true;
  if ( not have_connections_changed_[ tid ] )
(see above)
ConnectionCreator::ConnectionCreator( DictionaryDatum dict )
  : allow_autapses_( true )
  , allow_multapses_( true )
  , allow_oversized_( false )
what is this used for?
I think I just added it because I noticed it wasn't initialized. I can remove it again.
@jakobj If `allow_oversized_` is not initialized here, the test `test_oversized_mask.sli` fails. I don't understand why the test suddenly fails here and not in master though.
uninitialized is always bad, thanks for fixing this.
7de30bd to b16db1f
but just for mpi, as this is easier to read.
b16db1f to 63c0862
@jakobj First of all, sorry for taking way too long to address your comments! I have made some changes based on your comments. Concerning your comments about why I don't set So no need to do another review yet.
@heplesser It is hanging on me I'm afraid.
@jakobj @jarsi The PR is again ready for review, sorry for the delay. Somewhere along the way the benchmarks showed worse performance again, and it took some time to detect the problem, but that is fixed now: benchmark_results.pdf
@stinebuu thanks! everything looks good, just one minor comment where we could use std function. otherwise 👍
@@ -64,6 +64,34 @@ CompletedChecker::all_true() const
  return true;
}

bool
CompletedChecker::any_false() const
why not use `std::any_of` since we use c++11 now? we should probably also replace the other functions, maybe in a separate PR though
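To illustrate the suggestion, a minimal sketch of how such predicates could look with the C++11 standard algorithms. It assumes the flags live in a `std::vector< char >`, which is an assumption for this example and not necessarily how `CompletedChecker` stores them; whether the approach fits the actual class is not established in this thread.

```cpp
#include <algorithm>
#include <vector>

// Free functions over a hypothetical flag container. The storage type
// std::vector< char > is an assumption for this sketch, not NEST code.
inline bool
any_true( const std::vector< char >& flags )
{
  return std::any_of( flags.begin(), flags.end(), []( const char f ) { return f != 0; } );
}

inline bool
any_false( const std::vector< char >& flags )
{
  return std::any_of( flags.begin(), flags.end(), []( const char f ) { return f == 0; } );
}

inline bool
all_true( const std::vector< char >& flags )
{
  return std::all_of( flags.begin(), flags.end(), []( const char f ) { return f != 0; } );
}
```

Note that on an empty range `std::any_of` returns false and `std::all_of` returns true, which may or may not match the behaviour of the hand-written loops.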
}

bool
CompletedChecker::any_true() const
see above
Thanks for the work, the pr looks fine to me.
@jakobj Thanks for your comments! We cannot use As all reviewers have approved now, I will merge.
A little while ago @heplesser found that when trying to reproduce the figures from Ippen et al. (2017), the strong scaling by threads variant behaved badly with master. Luckily, he found the problem as well, which is that a lot of threads are trying to write to `have_connections_changed_` at once. This PR uses `CompletedChecker` for `have_connections_changed_` to make sure that all threads do not write to the same variable.

A benchmark can be found here: Fig_5_results_one_node_noinst.pdf The benchmark is from July, 0ac2385 was master at the time, and 8fa2e37 contains the fix. With the fix we see that strong scaling by threads and strong scaling by mpi again are equal when we connect.

This PR also cleans up some of our thread-safe connection routines. Instead of passing around one static empty dummy dictionary when connecting, we use a list of empty dictionaries, one dict per thread.
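
As a rough illustration of the "one dict per thread" idea, the sketch below replaces a single shared dummy with a vector indexed by thread id. `Dict` and `PerThreadDummyDicts` are hypothetical names chosen for this example; `std::map` merely stands in for NEST's dictionary type, and the real routines may be organised differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical dictionary type standing in for NEST's dictionary class.
using Dict = std::map< std::string, double >;

// Holds one empty dictionary per thread instead of one shared static dummy.
class PerThreadDummyDicts
{
public:
  explicit PerThreadDummyDicts( const std::size_t num_threads )
    : dicts_( num_threads )
  {
  }

  // Each thread is expected to access only the entry for its own tid.
  Dict&
  for_thread( const std::size_t tid )
  {
    return dicts_.at( tid );
  }

  std::size_t
  size() const
  {
    return dicts_.size();
  }

private:
  std::vector< Dict > dicts_;
};
```

Because every thread gets a distinct object, code that touches the dummy during a connect call no longer shares one instance across threads.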