Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAPHNE-191] Refactor CUDA buffer mgmt (aka introducing Object Meta D… #334

Merged
merged 4 commits into from
Aug 25, 2022

Conversation

corepointer
Copy link
Collaborator

@corepointer corepointer commented Apr 6, 2022

…ata)

This change introduces a major change how external storage buffers (CUDA memory specifically) are handled. In that regard, the following noteworthy changes are implemented:

  • Factor out CUDA allocations from DenseMatrix (one of the initial motivations of issue Refactor CUDA buffer mgmt #191)
  • Introduce a mechanism to handle several storage backends and track ranges of a Structure's data.
  • To make use of the mechanism, an AllocationDescriptor is passed to create() and getValues() (at the moment only DenseMatrix is supported).
  • Allocation descriptors need to implement the IAllocationDescriptor interface. This decouples backend specific dependencies.
  • AllocationDescriptorHost and AllocationDescriptorCUDA are implemented atm. The former is more or less a no-op for now.
  • The CUDA memory allocation and data movement is moved to the CUDAContext class. It keeps track of its allocations per device. For now this does nothing but can be used to reuse allocations in the future.

corepointer added a commit that referenced this pull request Apr 6, 2022
…ata)

This change introduces a major change how external storage buffers (CUDA memory specifically) are handled. In that regard, the following noteworthy changes are implemented:
* Factor out CUDA allocations from DenseMatrix (one of the initial motivations of issue #191)
* Introduce a mechanism to handle several storage backends and track ranges of a Structure's data.
* To make use of the mechanism, an AllocationDescriptor is passed to create() and getValues() (at the moment only DenseMatrix is supported).
* Allocation descriptors need to implement the IAllocationDescriptor interface. This decouples backend specific dependencies.
* AllocationDescriptorHost and AllocationDescriptorCUDA are implemented atm. The former is more or less a no-op for now.
* The CUDA memory allocation and data movement is moved to the CUDAContext class. It keeps track of its allocations per device. For now this does nothing but can be used to reuse allocations in the future.

Closes #191, Closes #334
@aristotelis96
Copy link
Collaborator

aristotelis96 commented May 17, 2022

Thank you @corepointer for this great contribution! One thing though, on my machine when I run the following script:
a = rand(5,5,0.0,10.0,1,-1);
print(a+a);
on the local runtime: ./build/bin/daphne test.script I get a segmentation fault.

I noticed that you comment out the initalization of the result matrix on the EwBinaryMat.h kernel. Un-commenting that out fixes the error, however the error re-appears when running with vectorized execution (./build/bin/daphnec --vec test.script).

@corepointer
Copy link
Collaborator Author

Thank you, @aristotelis96, for trying out these code changes and finding the bugs and pointing me right to them :D Obviously I did not clean up the code properly and code, that I commented for testing was not put in place again. I also found a few other issues. While cleaning it up and testing it. New version coming soon ©️

@aristotelis96
Copy link
Collaborator

aristotelis96 commented May 21, 2022

Thanks @corepointer for fixing those issues. I found a new bug with this PR (haven't tested which commit might have introduced it though). This bug applies only for the vectorized engine (--vec). There seems to be something wrong with the transpose function. The following script is returning different results.
The result dimensions seem to be fine, however the resulted values of the matrix, are wrong.

a = rand(10,5,0.0,10.0,1,123);
print(t(a));

corepointer added a commit that referenced this pull request May 24, 2022
…ata)

This change introduces a major change how external storage buffers (CUDA memory specifically) are handled. In that regard, the following noteworthy changes are implemented:
* Factor out CUDA allocations from DenseMatrix (one of the initial motivations of issue #191)
* Introduce a mechanism to handle several storage backends and track ranges of a Structure's data.
* To make use of the mechanism, an AllocationDescriptor is passed to create() and getValues() (at the moment only DenseMatrix is supported).
* Allocation descriptors need to implement the IAllocationDescriptor interface. This decouples backend specific dependencies.
* AllocationDescriptorHost and AllocationDescriptorCUDA are implemented atm. The former is more or less a no-op for now.
* The CUDA memory allocation and data movement is moved to the CUDAContext class. It keeps track of its allocations per device. For now this does nothing but can be used to reuse allocations in the future.

Closes #191, Closes #334
@corepointer
Copy link
Collaborator Author

I fixed the transpose.Thanks for spotting this. The issue should have been there before the introduction of the object meta data feature 🤔
Sorry for force pushing - I rebased this branch to the latest in main. This means you will get conflicts pulling updates in this branch. You can work around it by dropping your local version of this branch. Either delete your local 191-metadata-allocations branch and check out again or if you're currently on that branch, do a git fetch and git reset --hard origin/191-metadata-allocations. In any case this will remove changes you might have made to this branch. Uncommitted changes can be stashed of course ;-)

aristotelis96 pushed a commit to aristotelis96/daphne that referenced this pull request Jun 6, 2022
…ata)

This change introduces a major change how external storage buffers (CUDA memory specifically) are handled. In that regard, the following noteworthy changes are implemented:
* Factor out CUDA allocations from DenseMatrix (one of the initial motivations of issue daphne-eu#191)
* Introduce a mechanism to handle several storage backends and track ranges of a Structure's data.
* To make use of the mechanism, an AllocationDescriptor is passed to create() and getValues() (at the moment only DenseMatrix is supported).
* Allocation descriptors need to implement the IAllocationDescriptor interface. This decouples backend specific dependencies.
* AllocationDescriptorHost and AllocationDescriptorCUDA are implemented atm. The former is more or less a no-op for now.
* The CUDA memory allocation and data movement is moved to the CUDAContext class. It keeps track of its allocations per device. For now this does nothing but can be used to reuse allocations in the future.

Closes daphne-eu#191, Closes daphne-eu#334
…ata)

This change introduces a major change how external storage buffers (CUDA memory specifically) are handled. In that regard, the following noteworthy changes are implemented:
* Factor out CUDA allocations from DenseMatrix (one of the initial motivations of issue #191)
* Introduce a mechanism to handle several storage backends and track ranges of a Structure's data.
* To make use of the mechanism, an AllocationDescriptor is passed to create() and getValues() (at the moment only DenseMatrix is supported).
* Allocation descriptors need to implement the IAllocationDescriptor interface. This decouples backend specific dependencies.
* AllocationDescriptorHost and AllocationDescriptorCUDA are implemented atm. The former is more or less a no-op for now.
* The CUDA memory allocation and data movement is moved to the CUDAContext class. It keeps track of its allocations per device. For now this does nothing but can be used to reuse allocations in the future.

Closes #191, Closes #334
The code contained in LoadPartitioning.h does not need to be included all over the place (through inclusion in DaphneUserConfig.h)
@corepointer corepointer merged commit 31a0852 into main Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants