Distributed pipelines #428
Commits on Oct 3, 2022
[DAPHNE-428] Distributed Pipelines Compiler Infrastructure
Basic compiler infrastructure for distributed pipelines.
- DistributedPipelineOp, akin to VectorizedPipelineOp; the main difference is that the pipeline body is given as an IR string, not as a nested code region.
- New DistributePipelinesPass rewriting VectorizedPipelineOp to DistributedPipelineOp, including the generation of the IR string. Distribution requires vectorization (--vec) to be effective.
- RewriteToCallKernelOpPass lowers DistributedPipelineOp to CallKernelOp; a special rewrite pattern is needed, since for certain reasons DistributedPipelineOp cannot be treated like any other op here.
- distributedPipeline kernel: currently just a stub that prints the information it receives, including the IR string.
- Limitations:
  - currently, we only support distributed pipelines with 1 or 2 outputs (the same limitation still holds for vectorized pipelines)
  - currently, we only support DenseMatrix<double>
  - currently, all vectorized pipelines are rewritten to distributed pipelines
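The core rewrite described above can be sketched without MLIR: the vectorized op's nested body is serialized into a single IR string that can later be shipped to workers. All names and the one-op-per-line body representation below are illustrative assumptions, not the actual DAPHNE classes.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two ops; the real ones are MLIR operations.
struct VectorizedPipelineOp {
    // In MLIR the body is a nested region; here, one line per op for brevity.
    std::vector<std::string> bodyOps;
};

struct DistributedPipelineOp {
    // The body travels as a plain IR string instead of a nested region.
    std::string irString;
};

// Sketch of the rewrite performed by DistributePipelinesPass:
// serialize the nested body into a single IR string.
DistributedPipelineOp distributePipeline(const VectorizedPipelineOp &op) {
    std::ostringstream ir;
    for (const auto &line : op.bodyOps)
        ir << line << '\n';
    return DistributedPipelineOp{ir.str()};
}
```

In the real pass the string is attached as an attribute of the new op, so the subsequent lowering to CallKernelOp can pass it to the (currently stubbed) distributedPipeline kernel unchanged.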
Commit 8fdbf65
[DAPHNE-96, DAPHNE-194] Distributed Runtime Refactoring
This commit merges a longer development process into the main branch. The general topic is given in the first line of the commit message, and the aggregated individual commit messages are listed below. Closes #96, closes #194.

Distributed runtime updates:
- Updated project structure: moved distributed-related kernels and data structures under runtime/distributed/coordinator/.
- Generalized broadcast & distribute kernels.
- DistributedWrapper implementation.
- Updated the worker to support vectorized execution; implementation of the vectorizedPipeline local kernel.
- Updated the distributedCollect primitive.
- TODO: generated IR code needs to be fixed.
- TODO: additional debugging needed on the worker side.

Distributed runtime updates:
- Extended parseType for rows/cols.
- Updated the distributed runtime for the COLS combine.
- Updated DistributedTest.
- DistributedDescriptor implementation (metadata for the distributed runtime).
- Distributed allocation descriptor implementation; it simply holds object metadata.
- Distributed kernels (distribute/broadcast/compute, etc.) use template functions for each communication framework; the MPI implementation is still missing.
- New enum type for the distributed backend implementation.

[MINOR] Changes for readCSVFiles after rebasing.
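The "template functions for each communication framework" design above can be sketched as a kernel templated on a backend enum, with one specialization per framework. The enum name, the `Distribute` struct, and the row-partitioning logic are assumptions for illustration; only the gRPC path is functional, mirroring the commit's note that MPI is still missing.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mirror of the new enum selecting the distributed backend.
enum class ALLOCATION_TYPE { DIST_GRPC, DIST_MPI };

// Distributed kernels are templated on the backend, with one
// specialization per communication framework.
template <ALLOCATION_TYPE BACKEND>
struct Distribute;

// gRPC specialization: partition the rows across the given workers.
template <>
struct Distribute<ALLOCATION_TYPE::DIST_GRPC> {
    static std::vector<size_t> apply(size_t numRows,
                                     const std::vector<std::string> &workers) {
        std::vector<size_t> rowsPerWorker(workers.size(),
                                          numRows / workers.size());
        for (size_t i = 0; i < numRows % workers.size(); ++i)
            rowsPerWorker[i] += 1; // spread the remainder over the first workers
        return rowsPerWorker;
    }
};

// The MPI implementation is still missing in this commit; keep it a stub.
template <>
struct Distribute<ALLOCATION_TYPE::DIST_MPI> {
    static std::vector<size_t> apply(size_t, const std::vector<std::string> &) {
        throw std::logic_error("MPI backend not implemented yet");
    }
};
```

Keeping the backend as a template parameter lets each specialization own its communication code while the call sites stay backend-agnostic.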
Commit bf56d8f
[DAPHNE-367] Distributed Pipelines Metadata Handling
This commit merges a longer development process into the main branch. The general topic is given in the first line of the commit message, and the aggregated individual commit messages are listed below. Closes #367.

Initial AllocationDescriptor distributed implementation (gRPC implementation).

Moved gRPC-related classes and files under "runtime/distributed/proto/":
- Some files containing gRPC code were located under distributed/worker; moved class ProtoDataConverter and class CallData.
- Some files containing gRPC code were located under distributed/coordinator; moved class DistributedGRPCCaller.
- Updated CMake files.

Updated DistributedWorker:
- Separated the worker implementation from gRPC.
- The worker's gRPC implementation now derives from a base worker implementation class.
- The base class WorkerImpl contains generic functions for storing data, computing pipelines, etc.
- The class WorkerImplGRPC contains functions for communicating via gRPC and uses the parent class for storing/computing data.
- TODO: WorkerImplMPI.

Distributed pipeline kernel: support for more than two outputs. Enabling multiple outputs for distributed pipelines:
- There was already a partial implementation transferring the recent changes from vectorized pipelines to distributed pipelines.
- However, a few pieces were still missing to make it work:
  - The CallKernelOp generated for the DistributedPipelineOp in RewriteToCallKernelOpPass must have the attribute "hasVariadicResults" to ensure correct lowering in LowerToLLVMPass.
  - The number of outputs must come after the outputs in the kernel, and must not be added as an operand to the CallKernelOp, since it is added automatically for variadic results in LowerToLLVMPass.

[MINOR] Bugfix: gRPC was not throwing an error when handling unsupported types (for now we support only Dense<double>).
- Support for broadcasting single double values.
- Minor fixes.
- Due to current object metadata limitations, we only support unique inputs for a distributed pipeline (no duplicate pipeline inputs).

Distributed kernels:
- Distributed kernels have specializations for each distributed backend implementation.
- Distributed kernels update the metadata and handle the communication using the specific distributed-backend implementation.
- Distributed metadata now holds only information.
- TODO: add simple transferTo/From functions in the metadata class for the distributed gRPC implementation.
- Various small changes.

Rebased onto main:
- main includes the initial metadata implementation.
- The MetaDataObject mdo field of class Structure is now public; distributed kernels need to access and modify the metadata of an object.
- Various small updates to kernels in order to support the new metadata implementation.

Updated distributed runtime tests:
- WorkerTest.cpp now tests the generic WorkerImpl class instead of the gRPC-specific implementation.
- TODO: add a test for the gRPC WorkerImpl class.
- Removed the unused utility function "StartDistributedWorker".
- Disabled the "DistributedRead" test; with the new distributed-pipeline implementation we do not support distributed read yet, so this test does not actually test anything significant.
- Updated a few test scripts for the distributed runtime, due to the unique-pipeline-inputs limitation.

Cleanup:
- Added a Status nested class to WorkerImpl.
- Renamed and moved AllocationDescriptorGRPC.
- Renamed Worker::StoredInfo::filename to identifier.
- Improved serialization from CSRMatrix to protobuf.
- Changed MetaDataObject mdo in class Structure from public to private; added a getter by reference for modifying the MetaDataObject of a Structure.
- Improved CSRMatrix serialization from DAPHNE object to protobuf.
- Fixed various warnings.
- Minor changes.
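The WorkerImpl/WorkerImplGRPC split above can be sketched as follows: the generic base class owns storage and computation, and the gRPC subclass only adds the (de)serialization and transport layer. The method names, the string-based "serialization", and the double-valued storage are illustrative assumptions; the real classes handle DAPHNE data objects and protobuf messages.

```cpp
#include <map>
#include <string>

// Generic worker logic, independent of any communication framework.
class WorkerImpl {
public:
    virtual ~WorkerImpl() = default;
    // Store an object under a fresh identifier; retrieve it later by id.
    std::string Store(double value) {
        std::string id = "obj_" + std::to_string(nextId_++);
        storage_[id] = value;
        return id;
    }
    double Retrieve(const std::string &id) const { return storage_.at(id); }

private:
    std::map<std::string, double> storage_;
    size_t nextId_ = 0;
};

// The gRPC worker derives from the generic implementation; in the real code
// it deserializes protobuf messages and delegates to the base class.
class WorkerImplGRPC : public WorkerImpl {
public:
    std::string HandleStoreRequest(const std::string &serialized) {
        // stand-in for protobuf deserialization
        return Store(std::stod(serialized));
    }
};
```

This layering is what lets WorkerTest.cpp exercise the generic WorkerImpl directly, without spinning up a gRPC server.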
Commit a1e92f5
[DAPHNE-428] Distributed Context & CLI flag
Added a distributed context and a CLI argument for distributed execution.
- Added DistributedContext.h containing information about distributed workers.
- Removed duplicate code.
- Added the command line argument "--distributed" to enable distributed execution for DAPHNE.
- Various small fixes after adding the "--distributed" flag.

Co-authored-by: Stratos Psomadakis <[email protected]>
Co-authored-by: Aristotelis Vontzalidis <[email protected]>
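A minimal sketch of what such a context could hold: the list of worker addresses for a distributed run. The DISTRIBUTED_WORKERS environment variable name and the comma-separated "host:port" format are assumptions for illustration, not confirmed by this commit message.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hedged sketch of a DistributedContext: holds the addresses of the
// distributed workers available to this DAPHNE process.
struct DistributedContext {
    std::vector<std::string> workers;

    // Assumed convention: worker addresses come from an environment
    // variable as a comma-separated list.
    static DistributedContext fromEnv() {
        DistributedContext ctx;
        const char *env = std::getenv("DISTRIBUTED_WORKERS");
        if (!env)
            return ctx; // no workers configured
        std::stringstream ss(env);
        std::string addr;
        while (std::getline(ss, addr, ','))
            ctx.workers.push_back(addr);
        return ctx;
    }
};
```

With "--distributed" set, the runtime would consult such a context to decide where to ship pipeline inputs and IR strings.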
Commit b5a84c3
[DAPHNE-428] Distributed Runtime Bugfixes & Cleanups
This commit merges a longer development process into the main branch. The general topic is given in the first line of the commit message, and the aggregated individual commit messages are listed below. Closes #428.
- Fixed various memory leaks.
- Fixed headers.
- Changed some pass-by-value parameters to const references.
- Changed the channel map to inline static.

[MINOR] Silenced some warnings.

[Bugfix] Worker receiving and storing a value: when the worker receives a value, we need to allocate memory, store the value, and keep the address for later use. Since we did not allocate any memory, the stored address was not pointing to any value.

[Bugfix] The distribute/broadcast kernels never actually checked whether something was already placed at the workers.
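The second bugfix above can be sketched as follows: before sending, a broadcast must check whether the target worker already holds the object, instead of unconditionally re-transmitting. The class and method names are illustrative; the real kernels track placement via the object metadata.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of the placement check the kernels were missing.
class Broadcaster {
public:
    // Returns the workers that actually received data on this call.
    std::vector<std::string> broadcast(const std::string &objId,
                                       const std::vector<std::string> &workers) {
        std::vector<std::string> sentTo;
        for (const auto &w : workers) {
            // The fix: skip workers where the object is already placed.
            if (placed_.count({objId, w}))
                continue;
            placed_.insert({objId, w});
            sentTo.push_back(w);
        }
        return sentTo;
    }

private:
    // (object id, worker address) pairs already transmitted.
    std::set<std::pair<std::string, std::string>> placed_;
};
```

Without the check, every pipeline invocation would re-broadcast inputs that the workers already cached, defeating the point of tracking placement at all.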
Commit f199b4b