Refactoring MetaDataObject out of DenseMatrix #758

Open · wants to merge 3 commits into base: main

corepointer (Collaborator)

This PR moves the MetaDataObject (MDO) functionality out of DenseMatrix and generalizes it so that other classes derived from Structure can use it as well.

Furthermore, it includes a performance improvement that avoids excessive allocation ID lookups, and it separates the handling of ranged and full allocations.

All tests pass except the distributed ones.
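To make the first point more concrete, here is a minimal sketch of a meta data object that lives in a common base class, so that both a dense and a CSR matrix type inherit it. All names (AllocType, HostAllocation, addAllocation, findAllocation, numAllocations) are hypothetical stand-ins; DAPHNE's actual MetaDataObject, AllocationDescriptor, and Structure interfaces are not reproduced here and likely differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical allocation types; DAPHNE's real ALLOCATION_TYPE enum may differ.
enum class AllocType { HOST, CUDA, DIST_GRPC };

// Hypothetical minimal allocation descriptor interface.
struct AllocationDescriptor {
    virtual ~AllocationDescriptor() = default;
    virtual AllocType getType() const = 0;
};

struct HostAllocation : AllocationDescriptor {
    AllocType getType() const override { return AllocType::HOST; }
};

// Meta data object tracking all allocations of one data object,
// independent of the concrete matrix type.
class MetaDataObject {
    std::vector<std::unique_ptr<AllocationDescriptor>> allocations;

  public:
    // Takes ownership and returns the index as an (illustrative) allocation ID.
    std::size_t addAllocation(std::unique_ptr<AllocationDescriptor> alloc) {
        allocations.push_back(std::move(alloc));
        return allocations.size() - 1;
    }
    // Linear search by type: the kind of lookup that pinning tries to avoid.
    const AllocationDescriptor *findAllocation(AllocType type) const {
        for (const auto &a : allocations)
            if (a->getType() == type)
                return a.get();
        return nullptr;
    }
    std::size_t numAllocations() const { return allocations.size(); }
};

// The base class owns the meta data object, so every derived type gets it.
// Shared ownership is an assumption of this sketch.
class Structure {
  protected:
    std::shared_ptr<MetaDataObject> mdo = std::make_shared<MetaDataObject>();

  public:
    virtual ~Structure() = default;
    MetaDataObject &getMetaDataObject() { return *mdo; }
};

class DenseMatrix : public Structure {};
class CSRMatrix : public Structure {};
```

With the MDO in the base class, a new data type such as CSRMatrix only has to derive from Structure to gain allocation tracking, instead of duplicating the logic from DenseMatrix.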

corepointer added a commit to corepointer/daphne that referenced this pull request Jul 22, 2024
* This commit introduces the meta data object to the CSR data type

* Memory pinning

To prevent excessive allocation ID lookups in the hot path when using --vec, this change "pins" memory to the allocation type of the previous access.
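The pinning idea can be sketched as a small cache in front of the meta data lookup: the accessor remembers the allocation type and pointer of the previous access and queries the meta data again only when a different type is requested. This is an illustrative sketch, not DAPHNE's implementation; MetaData, PinnedAccess, and its getValues(AllocType) method are hypothetical. Like the commit message, the sketch assumes the data does not change while the pin is held.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical allocation types for illustration.
enum class AllocType { HOST, CUDA };

// Stand-in for the meta data object: maps an allocation type to a buffer and
// counts how often it is queried (the lookup this optimization tries to avoid).
class MetaData {
    std::map<AllocType, double *> buffers;

  public:
    mutable std::size_t lookups = 0;
    void set(AllocType t, double *p) { buffers[t] = p; }
    double *lookup(AllocType t) const {
        ++lookups;
        auto it = buffers.find(t);
        return it == buffers.end() ? nullptr : it->second;
    }
};

// Pinning sketch: remember the type and pointer of the last access and only
// query the meta data again when a different allocation type is requested.
class PinnedAccess {
    const MetaData &md;
    bool pinned = false;
    AllocType pinnedType = AllocType::HOST;
    double *pinnedPtr = nullptr;

  public:
    explicit PinnedAccess(const MetaData &m) : md(m) {}
    double *getValues(AllocType t) {
        if (!pinned || t != pinnedType) { // access type changed: look up again
            pinnedPtr = md.lookup(t);
            pinnedType = t;
            pinned = true;
        }
        return pinnedPtr; // same access type as before: no lookup needed
    }
};
```

In a vectorized pipeline that repeatedly requests host values, only the first call pays for the lookup; switching to device memory triggers exactly one more.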
corepointer mentioned this pull request Jul 22, 2024
corepointer added a commit to corepointer/daphne that referenced this pull request Jul 29, 2024
corepointer added a commit to corepointer/daphne that referenced this pull request Aug 19, 2024
corepointer added a commit to corepointer/daphne that referenced this pull request Oct 18, 2024
corepointer added a commit to corepointer/daphne that referenced this pull request Oct 18, 2024
… Pinning

* This commit introduces the meta data object to the CSRMatrix data type.
  To implement this change, handling of the AllocationDescriptors has been refactored out of DenseMatrix.

* Separate handling of ranges
  Since range tracking is currently only used in the distributed setting, ranges are handled separately, and local computation always assumes a full allocation. This should reduce the number of unnecessary "if range is not null do this, else do that" branches.

* Memory pinning
  To prevent excessive allocation ID lookups in the hot path, especially when using --vec, this change "pins" memory to the allocation type of the previous access.
  Simply put, as long as the access type does not change (e.g., calling getValues() for host memory versus device memory), the data is assumed to be unchanged and the meta data object does not need to be queried.

Closes daphne-eu#758
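The separation of ranged and full allocations can be illustrated with a tracker that keeps the full allocation used by local computation apart from the ranged allocations needed only in the distributed setting, so the local path never branches on whether a range is present. The types Range, Alloc, RangedAlloc, and AllocTracker are hypothetical and only mirror the idea described in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sub-block of a matrix covered by a (distributed) allocation.
struct Range {
    std::size_t r_start, c_start, r_len, c_len;
};

// Hypothetical allocation handle; only an ID for illustration.
struct Alloc {
    int id;
};

struct RangedAlloc {
    Range range;
    Alloc alloc;
};

class AllocTracker {
    std::optional<Alloc> full;       // local computation: always the whole object
    std::vector<RangedAlloc> ranged; // distributed setting only

  public:
    void setFull(Alloc a) { full = a; }
    void addRanged(Range r, Alloc a) { ranged.push_back({r, a}); }

    // Local path: no "if range != nullptr ... else ..." branching.
    const Alloc *getFull() const { return full ? &*full : nullptr; }

    // Distributed path: find an allocation covering exactly the requested range.
    const Alloc *getRanged(const Range &r) const {
        for (const auto &ra : ranged)
            if (ra.range.r_start == r.r_start && ra.range.c_start == r.c_start &&
                ra.range.r_len == r.r_len && ra.range.c_len == r.c_len)
                return &ra.alloc;
        return nullptr;
    }
};
```

Keeping the two paths apart means ranged bookkeeping costs nothing when the code runs locally.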
corepointer (Collaborator, Author)

The numerous force pushes are a result of my local clang-format disagreeing with the CI's clang-format:

```diff
--- src/runtime/local/datastructures/AllocationDescriptorGRPC.h	(original)
+++ src/runtime/local/datastructures/AllocationDescriptorGRPC.h	(reformatted)
@@ -35,7 +35,7 @@
   public:
     AllocationDescriptorGRPC() = default;
     AllocationDescriptorGRPC(DaphneContext *ctx, const std::string &address, const DistributedData &data)
-        : ctx(ctx), workerAddress(address), distributedData(data) {};
+        : ctx(ctx), workerAddress(address), distributedData(data){};
 
     ~AllocationDescriptorGRPC() override = default;
     [[nodiscard]] ALLOCATION_TYPE getType() const override { return type; };
```

corepointer marked this pull request as ready for review on October 18, 2024, 17:15
corepointer added the labels feature (missing/requested features), performance (label for PRs of perf++ and issues of perf--), Accelerators, and Distributed (Issues and PRs related to distributed computation) on Oct 18, 2024
corepointer (Collaborator, Author)

Explaining the labels:

  • feature: CUDA support for CSRMatrix is new
  • Accelerators: it's (also) about CUDA ops
  • Distributed: the refactoring affects this component
  • performance: this one is self-explanatory; in addition, the memory pinning and the ability to run sparse operations on the GPU help with performance 💪

…not throw

Change the behavior of fileExists() to a plain boolean query, as the method's name suggests. Throwing an exception is up to the caller of this method.

Closes daphne-eu#867
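A minimal sketch of the fileExists() change, assuming C++17 std::filesystem: the function only answers the question its name asks, and a hypothetical caller, requireFile(), decides whether a missing file is an error. DAPHNE's actual implementation and signature may differ.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

// A boolean query: reports whether the path exists and never throws.
bool fileExists(const std::string &path) {
    std::error_code ec; // selects the non-throwing overload of exists()
    return std::filesystem::exists(path, ec);
}

// Hypothetical caller: it, not fileExists(), decides that a missing file is an error.
void requireFile(const std::string &path) {
    if (!fileExists(path))
        throw std::runtime_error("file not found: " + path);
}
```

Callers that merely probe for an optional file can now use fileExists() in a condition without wrapping it in try/catch.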
Because a pointer to a local variable was used, the distributed (GRPC_SYNC) mode crashed in the test cases. This patch fixes the crash by using std::unique_ptr appropriately.
Furthermore, a nullptr check is now performed before retrieving the distributed data, so that an error message indicates that execution failed at this point.
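The crash described above is the classic dangling-pointer bug: a pointer to a local variable outlives the variable's scope. The sketch below shows the fix pattern with std::unique_ptr and the added nullptr check; DistributedData, Descriptor, makeDescriptor(), and getDistributedData() are hypothetical simplifications, not the actual GRPC_SYNC code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the distributed data held by a descriptor.
struct DistributedData {
    std::string workerAddress;
};

struct Descriptor {
    // Owning pointer: the data lives exactly as long as the descriptor.
    std::unique_ptr<DistributedData> data;
};

// Buggy pattern (illustration only, not compiled):
//   DistributedData local{addr}; ptr = &local; // dangles once the function returns

// Fixed pattern: allocate on the heap and hand ownership to the descriptor.
Descriptor makeDescriptor(const std::string &addr) {
    Descriptor d;
    d.data = std::make_unique<DistributedData>(DistributedData{addr});
    return d;
}

// Check for nullptr before use so a failure produces a clear message.
const DistributedData &getDistributedData(const Descriptor &d) {
    if (!d.data)
        throw std::runtime_error("execution failed: no distributed data available");
    return *d.data;
}
```

The worker address in any usage would be illustrative; the point is that ownership travels with the descriptor instead of depending on a stack frame that has already been popped.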
Labels: Accelerators, Distributed, feature, performance

1 participant