Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed pipelines #428

Merged
merged 5 commits into from
Oct 3, 2022
Merged

Distributed pipelines #428

merged 5 commits into from
Oct 3, 2022

Commits on Oct 3, 2022

  1. [DAPHNE-428] Distributed Pipelines Compiler Infrastructure

    Basic compiler infrastructure for distributed pipelines.
    
    - DistributedPipelineOp akin to VectorizedPipelineOp
      - main difference: pipeline body given as IR string, not as nested code region
    - New DistributePipelinesPass rewriting VectorizedPipelineOp to DistributedPipelineOp.
      - including the generation of the IR string
      - distribution requires vectorization (--vec) to be effective
    - RewriteToCallKernelOpPass lowers DistributedPipelineOp to CallKernelOp
      - special rewrite pattern for DistributedPipelineOp, for certain reasons it cannot be treated as any other op here
    - distributedPipeline-kernel: currently just a stub that prints the information it receives, including the IR string.
    - Limitations:
      - currently, we only support distributed pipelines with 1 or 2 outputs (we still have the same limitation for vectorized pipelines)
      - currently, we only support DenseMatrix<double>
      - currently, all vectorized pipelines are rewritten to distributed pipelines
    pdamme authored and corepointer committed Oct 3, 2022
    Configuration menu
    Copy the full SHA
    8fdbf65 View commit details
    Browse the repository at this point in the history
  2. [DAPHNE-96, DAPHNE-194] Distributed Runtime Refactoring

    This commit merges a longer development process to the main branch. The general topic is given in the first line of the commit message and the aggregated individual commit messages are listed below.
    
    Closes #96, Closes #194
    
    Distributed runtime updates:
    - Updated project structure
    - Moved distributed related kernels and datastructures under runtime/distributed/coordinator/
    - Generalized Broadcast&Distribute kernels
    - DistributedWrapper implementation
    - Updated Worker to support Vectorized execution.
    
    Implementation of vectorizedPipeline local kernel
    
    - Updated distributedCollect Primitive
    - TODO generated ir code needs to be fixed
    - TODO Additional debugging needed on worker side
    
    Distributed runtime updates:
    
    - Extended parseType for rows/cols
    
    - Updated Distributed Runtime for COLS combine
    
    -Updated DistributedTest
    
    - DistributedDescriptor implementation (metadata for the distributed runtime)
    - Distributed allocation descriptor implementation. simply holds object metadata information
    - Distributed kernels (distribute/broadcast/compute, etc.) use template functions for each communication framework.
    - MPI implementation missing.
    
    - New enum type for distributed backend implementation
    
    [MINOR] Changes for readCSVFiles after rebasing
    aristotelis96 authored and corepointer committed Oct 3, 2022
    Configuration menu
    Copy the full SHA
    bf56d8f View commit details
    Browse the repository at this point in the history
  3. [DAPHNE-367] Distributed Pipelines Metadata Handling

    This commit merges a longer development process to the main branch. The general topic is given in the first line of the commit message and the aggregated individual commit messages are listed below.
    
    Closes #367
    
    Initial AllocationDescriptor Distributed Implementation
    
    - GRPC implementation
    
    Moved gRPC-related classes and files under "runtime/distributed/proto/" .
    
    - Some files containing gRPC code where located under distributed/worker. Moved class ProtoDataConverter, class CallData
    - Some files containing gRPC code where located under distributed/coordinator. Moved class DistributedGRPCCaller
    - Updated CMAKE files.
    
    Updated DistributedWorker
    
    - Seperated worker implementation from gRPC.
    - Worker gRPC implementation now derives from base class Worker Implementation.
    - Base class WorkerImpl contains generic functions for storing data, computing pipelines, etc.
    - class WorkerImplGRPC contains functions for communicating with gRPC and using parent class for storing/computing data.
    - TODO WorkerImplMPI.
    
    Distributed pipeline kernel: Support for more than two outputs.
    
    Enabling multiple outputs for distributed pipelines.
    
    - There was already a partial implementation transfering the recent changes from vectorized pipelines to distributed pipelines.
    
    - However, a few pieces were still missing to make it work:
    
    - The CallKernelOp generated for the DistributedPipelineOp in RewriteToCallKernelOpPass must have the attribute "hasVariadicResults" to ensure correct lowering in LowerToLLVMPass.
    
    - The number of outputs must come after the outputs in the kernel, and must not be added as an operand to the CallKernelOp, since it is added automatically for variadic results in LowerToLLVMPass.
    
    [MINOR] Bugfix: grpc was not throwing an error when handling unsupported types (for now we support only Dense<double>)
    
    - Support for broadcasting single double values.
    - Minor fixes.
    - Due to current Object Meta Data limitations, we only support unique inputs for a Distributed Pipeline (no duplicate pipeline inputs).
    
    Distributed kernels
    
    - Distributed kernels have specializations for each distributed backend implementation.
    - Distributed kernels update the meta data and handle the communication using specific distributed-backend implementation.
    - Distributed metadata now hold only information.
    - TODO: Add simple transferTo/From functions in the meta data class for the distributed gRPC implementation.
    - Various small changes.
    
    Rebased onto main
    
    - main includes the initial Meta Data implementation
    - MetaDataObject mdo field of class Structure is now public. Distributed kernels need to access and modify the metadata of an object.
    - Various small updates to kernels in order to support the new meta data implementation.
    
    Updated distributed runtime tests
    
    - WorkerTest.cpp now tests the generic WorkerImpl class, instead of the gRPC specific implementation.
    - TODO: Add a test for the gRPC WorkerImpl class.
    - Removed unused utility function "StartDistributedWorker"
    - Disabled "DistributedRead" test. With the new Distributed-Pipeline implementation we do not support distributed read yet, therefore this test does not actually test something significant.
    - Updated a few test-scripts for the distributed runtime, due to unique-pipeline-inputs limitations.
    
    Cleanup.
    
    - Added Status nested class to WorkerImpl.
    - Renamed and moved AllocationDescriptorGRPC.
    - Renamed Worker::StoredInfo::filename to identifier.
    - Improved serialization from CSRMatrix to protobuf.
    - Changed MetaDataObject mdo in Structure class, from public to private.
    - Added getter by reference for modifying MetaDataObject of a Structure.
    - Improved CSRMatrix serialization from Daphne object to protobuf.
    - Fixed various warnings.
    - Minor changes.
    aristotelis96 authored and corepointer committed Oct 3, 2022
    Configuration menu
    Copy the full SHA
    a1e92f5 View commit details
    Browse the repository at this point in the history
  4. [DAPHNE-428] Distributed Context & CLI flag

    Added distributed context and cli argument for distributed execution.
    
    - Added DistributedContext.h containg information about distributed workers.
    - Removed duplicate code.
    - Added command line argument "--distributed", in order to enable distributed execution for DAPHNE.
    - Various small fixes after adding "--distributed"" flag.
    
    Co-authored-by: Stratos Psomadakis <[email protected]>
    Co-authored-by: Aristotelis Vontzalidis <[email protected]>
    2 people authored and corepointer committed Oct 3, 2022
    Configuration menu
    Copy the full SHA
    b5a84c3 View commit details
    Browse the repository at this point in the history
  5. [DAPHNE-428] Distributed Runtime Bugfixes & Cleanups

    This commit merges a longer development process to the main branch. The general topic is given in the first line of the commit message and the aggregated individual commit messages are listed below.
    
    Closes #428
    
    - Fixed various memory leaks.
    - Fixed headers.
    - Changed some pass-by-value to const-reference.
    - Channel map changed to inline static.
    
    [MINOR] Silenced some warnings
    
    [Bugfix] Worker receiving and storing a value.
    
    When the worker receives a value, we need to allocate memory, store the value and keep the address, for later use.
    Since we did not allocate any memory this resulted in a bug where the address stored was not pointing to any value.
    
    [Bugfix] Distribute/Broadcast kernels never actually checked if something was already placed at the workers.
    aristotelis96 authored and corepointer committed Oct 3, 2022
    Configuration menu
    Copy the full SHA
    f199b4b View commit details
    Browse the repository at this point in the history