Improve NEST performance by revised connection exchange and spike delivery #2926
Merged
…spike_data; use max spike-data buffer size
…ther_spike_data Ensure thread-local memory allocation
…om:suku248/nest-simulator into test_single_threading_in_gather_spike_data
…gle_threading_in_gather_spike_data
…m:suku248/nest-simulator into single_batchwise
…single_threading_in_gather_spike_data Conflicts: nestkernel/event_delivery_manager.cpp
…nto def_nolag_mrg
mlober
reviewed
Sep 11, 2023
Co-authored-by: Jochen Martin Eppler <[email protected]>
mlober
reviewed
Sep 11, 2023
Co-authored-by: Jochen Martin Eppler <[email protected]>
Co-authored-by: Melissa Lober <[email protected]>
Co-authored-by: Melissa Lober <[email protected]>
…nto def_nolag_mrg
mlober
reviewed
Sep 11, 2023
Co-authored-by: Melissa Lober <[email protected]>
heplesser
requested review from
jougs
and removed request for
suku248 and
JanVogelsang
September 13, 2023 08:09
jougs
approved these changes
Sep 13, 2023
LGTM! Many thanks!
jessica-mitchell
approved these changes
Sep 13, 2023
mlober
approved these changes
Sep 13, 2023
Labels
I: Behavior changes
Introduces changes that produce different results for some users
S: High
Should be handled next
T: Enhancement
New functionality, model or documentation
This pull request significantly improves NEST performance by
The changes are described in detail below.
Note that transmission of SecondaryEvents is essentially not affected by this PR since they are written directly into ready-made buffers on the receiver side. There is only some effect on connection transmission to the presynaptic side.
Many NEST developers have contributed to this work, especially @diesmann, @suku248, @JoseJVS, @med-ayssar, @mlober, @hakonsbm, @ackurth and @JanVogelsang.
Breaking changes in NEST
Spikes from last slice not delivered
(… spike_recorder) are not affected since they receive spikes locally.
Removed kernel parameters
sort_connections_by_source: use_compressed_spikes remains, which automatically activates connection sorting. There is simply no relevant use case where sorting but not compressing would make sense.
adaptive_spike_buffers: spike buffers are now always adaptive, see section on Buffer growth and shrinking
max_buffer_size_spike_data: there is no upper limit since all spikes need to be transmitted in one round
New kernel parameters
The following parameters control or report spike buffer resizing (see Buffer growth and shrinking for details):
spike_buffer_grow_extra
spike_buffer_shrink_limit
spike_buffer_shrink_spare
spike_buffer_resize_log
Things to check in particular
(… kernel_manager.h)
Modified tests
test_mip_corrdet: need to simulate one step longer due to delivery at beginning of next step
test_regression_issue-1034: need to subtract min delay due to moved delivery
Connection compression and transmission
… gather_target_data_compressed().
… spike in a number of method and data structure names below has historic roots and should be cleaned up in a follow-up step.
Data Structures
SourceTable::sources_
sources_[target_thread][syn_id][lcid] stores raw connection info, built during connection creation
SourceTable::compressible_sources_
compressible_sources_[target_thread][syn_id][idx] contains one pair<source_node_id, SpikeData(target_thread, syn_id, conn_lcid, 0)> entry, mapping each source node id to a SpikeData entry identifying the first sources_ entry for that source node in the sorted sources_ array
ConnectionManager::compressed_spike_data
compressed_spike_data[syn_id][source_index][target_thread] is the result of the second compression step
… pair from compressible_sources_.
SourceTable::compressed_spike_data_map_ provides an index from source node id to source_index in compressed_spike_data.
SourceTable::compressed_spike_data_map_
compressed_spike_data_map_[syn_id] maps each unique source node id onto the corresponding source_index in the compressed_spike_data
… CSDMapEntry
… compressed_spike_data_map_.
ConnectionManager::iteration_state_
vector< pair< syn_id, map< source_gid, CSDMapEntry >::const_iterator > >
… compressed_spike_data_map_.
Source compression
(… SourceTable::collect_compressible_sources(), thread parallel)
… SourceTable::sources_, create entry in compressible_sources_ connecting source ID to info about first entry for that source in sources_ and mark sequence of connections from that source neuron as source_has_more_targets.
(… SourceTable::fill_compressed_spike_data(), serial)
… syn_id, iterate over connections on all threads in compressible_sources_
… compressed_spike_data_map_ (for that syn_id)
… compressed_spike_data with one slot per thread
… compressed_spike_data_map_
… SpikeData entry created in the first compression step.
Connection transmission
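As background for the transmission steps in this section, the two compression steps described under Source compression can be pictured with a minimal C++ sketch. It is not the NEST implementation: SpikeDataSketch, CompressedSketch, collect_compressible(), fill_compressed() and kInvalidLcid are hypothetical names, the sketch handles a single synapse type, and it assumes that each per-thread source list is already sorted by source node id.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using NodeId = std::size_t;

// Hypothetical marker for "no connection from this source on this thread".
constexpr std::size_t kInvalidLcid = static_cast< std::size_t >( -1 );

// Hypothetical, simplified stand-in for SpikeData.
struct SpikeDataSketch
{
  std::size_t tid;
  std::size_t syn_id;
  std::size_t lcid;
};

using Compressible = std::vector< std::pair< NodeId, SpikeDataSketch > >;

// Step 1 (one call per thread): one entry per unique source, pointing at the
// first connection of that source in the sorted source list.
Compressible
collect_compressible( const std::vector< NodeId >& sources, std::size_t tid, std::size_t syn_id )
{
  Compressible out;
  for ( std::size_t lcid = 0; lcid < sources.size(); ++lcid )
  {
    assert( lcid == 0 || sources[ lcid - 1 ] <= sources[ lcid ] ); // sorted input
    if ( lcid == 0 || sources[ lcid ] != sources[ lcid - 1 ] )
    {
      out.push_back( { sources[ lcid ], SpikeDataSketch{ tid, syn_id, lcid } } );
    }
  }
  return out;
}

// Result of step 2: data[source_index][tid], plus the source -> index map.
struct CompressedSketch
{
  std::map< NodeId, std::size_t > source_index;
  std::vector< std::vector< SpikeDataSketch > > data;
};

// Step 2 (serial): merge the per-thread entries, one slot per thread.
CompressedSketch
fill_compressed( const std::vector< Compressible >& per_thread, std::size_t syn_id )
{
  CompressedSketch c;
  const std::size_t num_threads = per_thread.size();
  for ( std::size_t tid = 0; tid < num_threads; ++tid )
  {
    for ( const auto& entry : per_thread[ tid ] )
    {
      auto it = c.source_index.find( entry.first );
      if ( it == c.source_index.end() )
      {
        // New source: register its index and create empty slots for all threads.
        it = c.source_index.emplace( entry.first, c.data.size() ).first;
        c.data.emplace_back( num_threads, SpikeDataSketch{ 0, syn_id, kInvalidLcid } );
      }
      c.data[ it->second ][ tid ] = entry.second;
    }
  }
  return c;
}
```

In this sketch, a thread without connections from a given source keeps kInvalidLcid in its slot, so a receiver can skip it.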
Connection transmission works in multiple rounds if necessary; the buffer size may be adjusted
Collocation of data is assigned to "assigned ranks"
Writing is mainly done by ConnectionManager::fill_target_buffer as follows:
… compressed_spike_data_map_, outermost by syn_id, then over source entries.
Gather MPI-exchanges data that has been written to buffers
If not all data has been transmitted, do more rounds until all data has been transmitted.
For each compressed set of connections, we send to the presynaptic side
compressed_spike_data
NOTE: The iteration scheme is different from the original approach. We stop as soon as a single rank has filled its part of the buffer. In the original, iteration would continue until the last rank had filled its chunk. CSDMap entries were marked as processed when written. On the next round, iteration through CSDMap would start at the point where the first rank had to stop writing, skipping all entries that had been written.
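The round-based scheme in the note above can be illustrated with a small C++ sketch. EntrySketch, fill_round() and transmit_all() are hypothetical names; the MPI exchange and any buffer resizing between rounds are left out, and "filled" is taken here to mean that a chunk has reached chunk_size entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entry: some payload destined for the chunk of one rank.
struct EntrySketch
{
  std::size_t rank;
  std::size_t payload;
};

// Writes entries starting at pos into per-rank chunks and stops as soon as
// one chunk is full. Returns the position at which the next round resumes.
std::size_t
fill_round( const std::vector< EntrySketch >& entries,
  std::size_t pos,
  std::size_t chunk_size,
  std::vector< std::vector< std::size_t > >& chunks )
{
  assert( chunk_size > 0 );
  for ( auto& chunk : chunks )
  {
    chunk.clear(); // each round starts with empty chunks
  }
  while ( pos < entries.size() )
  {
    assert( entries[ pos ].rank < chunks.size() );
    auto& chunk = chunks[ entries[ pos ].rank ];
    chunk.push_back( entries[ pos ].payload );
    ++pos;
    if ( chunk.size() == chunk_size )
    {
      break; // first full chunk ends the round
    }
  }
  return pos;
}

// Repeats rounds until all entries have been written; returns the round count.
std::size_t
transmit_all( const std::vector< EntrySketch >& entries, std::size_t num_ranks, std::size_t chunk_size )
{
  std::vector< std::vector< std::size_t > > chunks( num_ranks );
  std::size_t pos = 0;
  std::size_t rounds = 0;
  do
  {
    pos = fill_round( entries, pos, chunk_size, chunks );
    // A real implementation would exchange the chunks between ranks here.
    ++rounds;
  } while ( pos < entries.size() );
  return rounds;
}
```

Because every round resumes exactly where the previous one stopped, the sketch needs no per-entry "processed" flag and never skips over entries.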
Spike transmission
… emitted_spikes_register_ during node update are gathered at end of time slice and exchanged between ranks by gather_spike_data() and collocate_spike_data_buffers_().
… deliver_events_()
Data structures
Spike register
… emitted_spikes_register
… SpikeData (to be written directly to transmission buffer) and rank of target neuron (for writing to correct section of target buffer)
… emitted_spikes_register_ in EventDeliveryManager::send_remote(), which is called when a node sends a spike
… emitted_spikes_register_ in EventDeliveryManager::collocate_spike_data_buffers_(), called from gather_spike_data()
SendBufferPosition
… TargetSendBufferPosition used for connection communication with assigned ranks.
SpikeData
… OffGrid version
… send_remote(), we immediately create the eventual SpikeData entry, which is later copied to the transmission buffer by collocate_spike_data_buffers_(), no more re-coding in the process
… (Target, lag) for direct insertion to emitted_spikes_register as part of SpikeDataWithRank entry
SpikeDataWithRank combines SpikeData with target rank information needed for eventual writing to transmission buffer.
… emplace_back() into emitted_spikes_register, i.e., direct construction instead of construct and copy.
… set_lcid() to allow transmission of locally required buffer size per rank in LCID field
… get_marker()
… struct SpikeDataWithRank for emitted_spikes_register, also variant for OffGrid
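The interplay of the spike register and the transmission buffer described above can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not the NEST code: SpikeDataSketch, SpikeDataWithRankSketch and collocate() are hypothetical names, fields are plain size_t values rather than a compact encoding, and the send buffer is modeled as one flat vector with chunk_size entries per rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for SpikeData.
struct SpikeDataSketch
{
  std::size_t tid = 0;
  std::size_t syn_id = 0;
  std::size_t lcid = 0;
  std::size_t lag = 0;

  SpikeDataSketch() = default;
  SpikeDataSketch( std::size_t tid_, std::size_t syn_id_, std::size_t lcid_, std::size_t lag_ )
    : tid( tid_ )
    , syn_id( syn_id_ )
    , lcid( lcid_ )
    , lag( lag_ )
  {
  }
};

// Register entry: the final buffer entry plus the rank it must be sent to.
struct SpikeDataWithRankSketch
{
  SpikeDataSketch spike_data;
  std::size_t rank;

  SpikeDataWithRankSketch( std::size_t rank_,
    std::size_t tid_,
    std::size_t syn_id_,
    std::size_t lcid_,
    std::size_t lag_ )
    : spike_data( tid_, syn_id_, lcid_, lag_ )
    , rank( rank_ )
  {
  }
};

// Copies register entries into the per-rank sections of the send buffer.
// Returns the number of spikes destined for each rank; spikes that do not
// fit into a section are counted but not written.
std::vector< std::size_t >
collocate( const std::vector< SpikeDataWithRankSketch >& spike_register,
  std::size_t num_ranks,
  std::size_t chunk_size,
  std::vector< SpikeDataSketch >& send_buffer )
{
  send_buffer.assign( num_ranks * chunk_size, SpikeDataSketch() );
  std::vector< std::size_t > count( num_ranks, 0 );
  for ( const auto& entry : spike_register )
  {
    assert( entry.rank < num_ranks );
    if ( count[ entry.rank ] < chunk_size )
    {
      send_buffer[ entry.rank * chunk_size + count[ entry.rank ] ] = entry.spike_data;
    }
    ++count[ entry.rank ];
  }
  return count;
}
```

A caller would fill the register with calls such as spike_register.emplace_back( rank, tid, syn_id, lcid, lag ), which constructs each entry in place. The largest value in the returned counts plays the role of the locally required buffer size per rank.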
Deliver events first
… deliver_events_() again in a separate method at the beginning of each update loop.
clock_ is advanced by one min_delay when spikes are delivered compared to when they were sent.
min_delay needs to be subtracted from clock_ when computing arrival times
… prepared_timestamps in deliver_events_()
… rate_*_impl.h files
Spike gathering and transmission
Marking completeness and required buffer size
… SpikeData::marker_ field
… begpos, the last endpos (this position is included in the chunk, it is not one beyond)
local_max_spikes_per_rank is the largest number of spikes a given rank needs to transmit to any other rank.
global_max_spikes_per_rank is the maximum of all local_max_spikes_per_rank values. It determines the minimum required buffer chunk size.
SpikeData marker values are defined as follows:
DEFAULT: Normal entry, cannot occur in endpos
END: Marks last entry containing data. … local_max_spikes_per_rank of the sending rank is equal to the current buffer size
… local_max_spikes_per_rank.
… local_max >= chunk_size, set endpos markers to INVALID and store local_max there.
… INVALID, otherwise set END marker on last position written to.
… COMPLETE on endpos for chunk and store local_max in endpos LCID
… global_max_spikes_per_rank from all local_max information obtained
… global_max > chunk_size, grow buffer and repeat entire process.
Buffer growth and shrinking
… gather_spike_data_()
… global_max_spikes_per_rank_, i.e., the largest number of spikes that any rank has sent to any other rank. The individual sections of the spike transmission buffer must be at least this size.
… global_max_spikes_per_rank_, growth during gathering if required.
Growing
… global_max_spikes_per_rank_
… (1 + spike_buffer_grow_extra) * global_max_spikes_per_rank_ to keep number of grow operations small
… spike_buffer_grow_extra == 0.5
Shrinking
global_max_spikes_per_rank_ < spike_buffer_shrink_limit * buffer_size
new_size = ( 1 + spike_buffer_shrink_spare ) * global_max_spikes_per_rank_
spike_buffer_shrink_limit = 0
spike_buffer_shrink_limit == 0.3
spike_buffer_shrink_spare == 0.1
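The growing and shrinking rules above can be summarized in a short C++ sketch. ResizeParams, max_spikes() and next_chunk_size() are hypothetical names. The parameter values are the ones quoted in the text, truncation to whole entries is an assumption, and the lower bound of one entry when shrinking is an assumption of this sketch rather than something stated in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three resize parameters named in the text.
struct ResizeParams
{
  double grow_extra = 0.5;   // spike_buffer_grow_extra
  double shrink_limit = 0.3; // spike_buffer_shrink_limit
  double shrink_spare = 0.1; // spike_buffer_shrink_spare
};

// Largest entry of a list of counts. Applied to the spikes one rank sends to
// each other rank it yields a local maximum; applied to all local maxima it
// yields the global maximum.
std::size_t
max_spikes( const std::vector< std::size_t >& counts )
{
  return counts.empty() ? 0 : *std::max_element( counts.begin(), counts.end() );
}

// Per-rank chunk size to use after a gather round.
std::size_t
next_chunk_size( std::size_t chunk_size, std::size_t global_max, const ResizeParams& p )
{
  assert( p.grow_extra >= 0.0 && p.shrink_limit >= 0.0 && p.shrink_spare >= 0.0 );
  const double gmax = static_cast< double >( global_max );

  if ( global_max > chunk_size )
  {
    // Grow with some headroom to keep the number of grow operations small.
    return static_cast< std::size_t >( ( 1.0 + p.grow_extra ) * gmax );
  }
  if ( gmax < p.shrink_limit * static_cast< double >( chunk_size ) )
  {
    // Shrink, keeping some spare room. Assumption: never go below one entry.
    const std::size_t new_size = static_cast< std::size_t >( ( 1.0 + p.shrink_spare ) * gmax );
    return std::max< std::size_t >( 1, new_size );
  }
  return chunk_size; // size is adequate, leave the buffer alone
}
```

With a shrink limit of 0, the shrink condition in this sketch can never hold, because the global maximum is never negative.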
Logging
… global_max_spikes_per_rank_ and the new buffer size are recorded.
… spike_buffer_resize_log, which is a dictionary with the same structure as events dictionaries of recorders, i.e., containing one array for each of the three quantities recorded.
Spike delivery
deliver_events_() is called at beginning of each time slice except for the very first time slice (time 0, nothing to deliver)
deliver_events_() is called in a thread-parallel context
… end_marker in section from each rank
Minor changes
Limit on LCID values
MAX_LCID now used to mark invalid_lcid
… MAX_LCID-1
… nest_types.h
MPIManager changes
… EventDeliveryManager.
FULL_LOGGING() macro
write_to_dump() method for logging output
critical sections for output
CMakeLists.txt
cmake/ProcessOptions.cmake
libnestutil/config.h.in
kernel_manager.h,cpp
Touch ups
… int by size_t for spike multiplicity
music_event_out_proxy
spike_recorder
event
stimulation_backend_mpi.h
recording_backend_mpi.h
Updated tests
test_stdp_synapse: modernization, no change to logic
SLI unittest distributed_process_invariant_events...
Open issues to be followed up
… MAX_ and invalid_ constants, see Systematize definition of INVALID_* and MAX_* constants #2529
… EventDeliveryManager. Move to ConnectionManager.
… MPIManager and code using the buffers in complicated ways. This should be made more systematic.
deliver_events_() can be simplified by use of functions.
Should SendBufferPosition be turned into proper iterator (or array of iterators), and should TargetSendBufferPosition be moved to file of its own?
… spike in names in connection infrastructure building
SourceTable::compressed_spike_data_map_ can be cleared after connection transmission
This PR replaces #2617.