Added new specialized sparse-matrix extensions. #924

Merged
7 commits merged on Jul 21, 2023


m4rs-mt (Owner) commented Feb 6, 2023

This PR adds new helper classes and methods to process sparse matrices on the GPU, with support for masking operations that avoid computing unneeded intermediate values. It also adds a new ConcurrentStreamProcessor to perform generic operations concurrently on multiple GPU streams.

A sample use case that multiplies masked sparse matrices is shown below. There are two options available:

  1. Create a single-stream-focused masked multiplier to have explicit control over matrix multiplications
  2. Create a stream-based concurrent processor to perform batched multiplications

// Create a single-stream masked sparse matrix multiplier
var processor = accel.CreateSparseTransposedMatrixMultiplierMasked<
    float, // Matrix type
    FloatEpsPredicate<Stride2D.DenseY>, // A generic predicate - a float-eps-value predicate in this case
    Stride2D.DenseY, // Custom striding of the matrix to use
    FloatMaskedSparseMatrixProcessor>(); // A float-specific FMA operation to accumulate dot products

// Multiply a single masked sparse matrix
processor(
    accel.DefaultStream,
    new(denseMatrixBuffer.View, 2.0f),
    denseMatrixBuffer.View,
    sparseView, // Sparse matrix view, built explicitly or via one of the available sparsification methods; transposed at runtime (see below)
    outBuffer.View);

// Create a concurrent processor that supports batched multiplication
using var maskedProcessor = new MaskedMatrixProcessor<
    float,
    FloatEpsPredicate<Stride2D.DenseY>,
    Stride2D.DenseY,
    FloatMaskedSparseMatrixProcessor>(accel);
maskedProcessor.Predicate = /* custom predicate here */;
maskedProcessor.MultiplyBatchedTransposed(
    accel.DefaultStream, // The main accelerator stream to attach the next multiplication operations to
    aViewList, // A list of dense input matrices to multiply with the sparse matrices in bViewList
    bViewList, // A list of sparse input matrices (will be transposed)
    outViewList); // A list of output views that receive the results
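
To make the intent of the masked multiplication easier to follow, here is a minimal CPU-side sketch of the computation. It is not the library's implementation and rests on several assumptions: the mask is an eps test on a dense matrix, the result is A * B^T, masked-out entries are simply left at zero, and the sparse matrix is stored as per-row neighbor lists. The names `MaskedSparseReference` and `MultiplyTransposedMasked` are hypothetical and not part of ILGPU.

```csharp
using System;

// Hypothetical CPU reference; none of these names are part of ILGPU.
public static class MaskedSparseReference
{
    // Computes result[i, j] = sum_k a[i, k] * b[j, k] (that is, A * B^T), but only
    // for entries where |mask[i, j]| > eps (assumed mask semantics). All other
    // entries are skipped and stay at zero.
    // B is given as neighbor lists: row j holds numNeighbors[j] non-zero entries
    // with column indices columns[j, n] and values values[j, n].
    public static float[,] MultiplyTransposedMasked(
        float[,] a,
        float[,] mask,
        float eps,
        int[] numNeighbors,
        int[,] columns,
        float[,] values)
    {
        int rows = a.GetLength(0);
        int cols = numNeighbors.Length;
        var result = new float[rows, cols];

        for (int i = 0; i < rows; ++i)
        {
            for (int j = 0; j < cols; ++j)
            {
                // Skip the whole dot product for masked-out entries
                if (Math.Abs(mask[i, j]) <= eps)
                    continue;

                // Accumulate only over the stored non-zero entries of row j of B
                float sum = 0.0f;
                for (int n = 0; n < numNeighbors[j]; ++n)
                    sum += a[i, columns[j, n]] * values[j, n];
                result[i, j] = sum;
            }
        }
        return result;
    }
}
```

The saving comes from two places in this sketch: masked-out output entries never trigger a dot product, and each dot product touches only the stored non-zero entries of the sparse operand.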

The following sample uses the newly added extensions to create large sparse matrices directly on the accelerator:

// Setup sparse 2D matrix
var matrix = new float[4, 4]
{
    { 1, 0, 0, 0 },
    { 2, 1, 0, 0 },
    { 3, 0, 1, 0 },
    { 4, 0, 0, 1 },
};

// Allocate basic matrix on the accelerator and transfer it to the device
using var matrixBuffer = accel.Allocate2DDenseY<float>(matrix.GetExtent());
matrixBuffer.View.CopyFromCPU(matrix);

// Allocate a temp buffer (or use existing memory from somewhere else)
using var tempBuffer = accel.Allocate1D<int>(1);

// Initialize the basic shape converter and data converters
var shapeConverter = accel.CreateSparseTransposedMatrixShapeConverter<
    float,
    FloatEpsPredicate<Stride2D.DenseY>,
    Stride2D.DenseY>(tempBuffer.View);
var converter = accel.CreateSparseMatrixConverter<float, Stride2D.DenseY>();

// Get basic shape of the sparse matrix living on the device
using var numNeighborsBuffer = accel.Allocate1D<int>(matrix.GetLength(1));
var shapeView = shapeConverter(
    accel.DefaultStream,
    matrixBuffer,
    new(matrixBuffer, 0.0f),
    numNeighborsBuffer.View,
    maxNumNeighbors =>
        // Allocate a shape-view buffer to store the neighbor lists
        accel.Allocate2DDenseY<int>((matrix.GetLength(1), maxNumNeighbors)));

// Allocate the actual sparse data buffer
using var dataBuffer = accel.Allocate2DDenseY<float>(
    (matrix.GetLength(1),
    shapeView.MaxNonZeroEntries));

// Convert data and fill our sparse matrix structure
var sparseView = converter(accel.DefaultStream, matrixBuffer, shapeView, dataBuffer);
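
The two-step conversion above (determine the shape first, then fill in the data) can be sketched on the CPU as follows. This is only an illustration under assumptions: entries count as non-zero when their absolute value exceeds eps, and neighbor lists are built per row. The PR sizes its buffers with `matrix.GetLength(1)` and uses a transposed shape converter, so the device-side layout may be organized differently. `SparsifyReference` and `ToNeighborLists` are hypothetical names, not ILGPU APIs.

```csharp
using System;

// Hypothetical CPU reference; none of these names are part of ILGPU.
public static class SparsifyReference
{
    // Converts a dense matrix into padded per-row neighbor lists.
    public static (int[] NumNeighbors, int[,] Columns, float[,] Values) ToNeighborLists(
        float[,] matrix,
        float eps)
    {
        int rows = matrix.GetLength(0);
        int cols = matrix.GetLength(1);

        // Pass 1 (shape): count the non-zero entries per row and track the maximum
        var numNeighbors = new int[rows];
        int maxNumNeighbors = 0;
        for (int i = 0; i < rows; ++i)
        {
            for (int j = 0; j < cols; ++j)
            {
                if (Math.Abs(matrix[i, j]) > eps)
                    ++numNeighbors[i];
            }
            maxNumNeighbors = Math.Max(maxNumNeighbors, numNeighbors[i]);
        }

        // Pass 2 (data): the buffers can only be sized once the maximum is known
        var columns = new int[rows, maxNumNeighbors];
        var values = new float[rows, maxNumNeighbors];
        for (int i = 0; i < rows; ++i)
        {
            int n = 0;
            for (int j = 0; j < cols; ++j)
            {
                if (Math.Abs(matrix[i, j]) > eps)
                {
                    columns[i, n] = j;
                    values[i, n] = matrix[i, j];
                    ++n;
                }
            }
            // Unused slots of shorter rows keep their default value of zero
        }
        return (numNeighbors, columns, values);
    }
}
```

Applied to the 4x4 sample matrix with eps = 0, this sketch stores at most two entries per row instead of four. That is the reason the sample above determines the maximum neighbor count before it allocates the data buffer.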

Note: This PR requires PR #989 to be merged.

Co-authored by @corwinjoy who wrote the initial POC.

@m4rs-mt m4rs-mt force-pushed the sparse_matrix branch 2 times, most recently from a929e5f to ad17267 Compare February 6, 2023 16:32
@m4rs-mt m4rs-mt added this to the v1.4 milestone Feb 16, 2023
@m4rs-mt m4rs-mt modified the milestones: v1.4, v1.5 Mar 16, 2023
@m4rs-mt m4rs-mt force-pushed the sparse_matrix branch 4 times, most recently from c0ba115 to f944ca3 Compare April 12, 2023 21:11
@m4rs-mt m4rs-mt requested a review from corwinjoy April 12, 2023 21:13
@m4rs-mt m4rs-mt force-pushed the sparse_matrix branch 3 times, most recently from aa46563 to 0bd56f2 Compare April 12, 2023 21:19
@m4rs-mt m4rs-mt marked this pull request as ready for review April 12, 2023 21:19
@m4rs-mt m4rs-mt added the feature A new feature (or feature request) label Apr 12, 2023
@m4rs-mt m4rs-mt force-pushed the sparse_matrix branch 3 times, most recently from 356eb44 to 46fbc48 Compare April 17, 2023 15:01

@corwinjoy corwinjoy left a comment

Overall looks pretty good. I suggested a few minor changes.
I think the big piece that is missing here is documentation as to the motivation of this class and what problem it solves. We know, but I think users will be mystified.

@m4rs-mt m4rs-mt force-pushed the sparse_matrix branch 2 times, most recently from ab37a3e to c45f3c1 Compare July 13, 2023 16:21

@corwinjoy corwinjoy left a comment

Looks good. As mentioned, I think the only thing that is missing is a piece of documentation explaining why end-users would want to use this / what problem this solves.

m4rs-mt (Owner, Author) commented Jul 20, 2023

> Looks good. As mentioned, I think the only thing that is missing is a piece of documentation explaining why end-users would want to use this / what problem this solves.

Absolutely! I think we should create a sample in a separate PR and add further documentation to the wiki before the final release.


@corwinjoy corwinjoy left a comment

OK. Updated tests look good.

@m4rs-mt m4rs-mt merged commit 00d5c98 into master Jul 21, 2023
@m4rs-mt m4rs-mt deleted the sparse_matrix branch July 21, 2023 10:16
Labels
feature A new feature (or feature request)