
Import/export for AIEs via vmem API #25

Draft: wants to merge 20 commits into base: iree-aie
Conversation

ypapadop-amd (Collaborator)

This PR allows sharing GPU memory with AIE agents via the vmem API. Sharing via the other alternatives (hsa_amd_memory_lock, hsa_amd_interop_map_buffer, hsa_amd_agents_allow_access) is not allowed and will return an error.
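For context, the vmem path works by reserving a virtual address range, creating a memory handle, mapping the handle into the range, and then granting access per agent (roughly the hsa_amd_vmem_address_reserve / hsa_amd_vmem_handle_create / hsa_amd_vmem_map / hsa_amd_vmem_set_access sequence). A minimal POSIX analogy of that reserve-then-commit shape, not the HSA API itself:

```cpp
#include <cassert>
#include <cstring>
#include <sys/mman.h>

// Reserve a virtual address range with no access rights, analogous to a
// vmem address reserve: the range exists but cannot be touched yet.
void* ReserveVA(size_t size) {
  void* va = mmap(nullptr, size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return va == MAP_FAILED ? nullptr : va;
}

// Commit the range and grant read/write access in place, analogous to the
// vmem map + set-access steps that make the memory usable by an agent.
bool MapAndAllowAccess(void* va, size_t size) {
  return mprotect(va, size, PROT_READ | PROT_WRITE) == 0;
}
```

The HSA calls additionally take the source memory pool and per-agent access descriptors, which (as I understand it) is what lets an AIE agent be named explicitly as an accessor.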

@ypapadop-amd ypapadop-amd self-assigned this Sep 16, 2024
@@ -861,7 +847,6 @@ class Runtime {
MemoryHandle* mem_handle;
AddressHandle* address_handle;
uint64_t offset;
uint64_t mmap_offset;
Collaborator Author
This was not used anywhere; I typed it in accidentally. Removing it so people don't make the same mistake.

@@ -508,19 +507,22 @@ hsa_status_t MemoryRegion::AllowAccess(uint32_t num_agents,

bool cpu_in_list = false;

std::set<GpuAgentInt*> whitelist_gpus;
Collaborator Author

Never used.

@@ -584,25 +585,25 @@ hsa_status_t MemoryRegion::Lock(uint32_t num_agents, const hsa_agent_t* agents,
return HSA_STATUS_SUCCESS;
}

std::set<core::Agent*> whitelist_gpus;
Collaborator Author

Never used.

Collaborator

I think you may want to make a separate change for this and get David's feedback. He's mentioned this code to me in the past, although I have to admit I never bothered to understand how it works. Weird that it'd be unused though.

Collaborator Author

OK, I'll take it to the public ROCR.

Collaborator

David has mentioned to me in the past that we'll need to think about how to add AIEs to these lists, so they should definitely be used somehow. Yeah, get his feedback there.


Yes, it does look like whitelist_gpus are not used anymore and we forgot to remove them at some point. Feel free to remove them.

atgutier: I think you are talking about the RVD filters in amd_topology.c. A user can export an environment variable, e.g. ROCR_VISIBLE_DEVICES=0,2 on a system with, let's say, 3 GPUs, and this will effectively hide the second GPU from users of ROCr (i.e., the device will not be listed when someone uses the iterate_agents APIs).
It does look like whitelist_gpus was used to keep track of visible GPUs at some point and affected when initDma() was called, but this is not necessary anymore.
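As a sketch of the filtering atgutier describes, here is a hypothetical parser for a ROCR_VISIBLE_DEVICES-style value (not the actual amd_topology code; the helper name is made up):

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Parse a comma-separated device list such as "0,2" into the set of
// visible device indices. Any index absent from the set would be hidden
// from agent iteration, mirroring the ROCR_VISIBLE_DEVICES behavior.
std::set<int> ParseVisibleDevices(const std::string& value) {
  std::set<int> visible;
  std::stringstream ss(value);
  std::string token;
  while (std::getline(ss, token, ',')) {
    if (!token.empty()) visible.insert(std::stoi(token));
  }
  return visible;
}
```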

Comment on lines +3205 to +3214
// For now, this is only supported for KFD due to the call to
// GetAmdgpuDeviceArgs
if (agent->device_type() != core::Agent::DeviceType::kAmdGpuDevice)
return HSA_STATUS_ERROR_INVALID_AGENT;

// Create handle by exporting and importing the memory from the owning agent
hsa_status_t status =
agent->ExportDMABuf(memoryHandleIt->first, size, &dmabuf_fd, &offset);
if (status != HSA_STATUS_SUCCESS)
return status;
@ypapadop-amd (Collaborator Author) Sep 16, 2024

I think this is effectively a create-shareable-handle procedure.
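The export/import flow in this hunk behaves like ordinary fd-based handle sharing: once the importer holds its own reference, the exporter's descriptor can be dropped. A minimal illustration of that lifecycle using plain pipe fds in place of a dma-buf fd (ImportHandle is a made-up name, not the runtime's API):

```cpp
#include <cassert>
#include <unistd.h>

// The exporter hands out a file descriptor; the importer dups it into its
// own reference, after which the exporter's fd can be closed without
// invalidating the import. dma-buf fds follow the same ownership rules.
int ImportHandle(int exported_fd) {
  return dup(exported_fd);  // importer now holds an independent reference
}
```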

Comment on lines 3251 to 3254
// MAYBE? auto status = agentPermsIt->second.RemoveAccess();
hsa_status_t status = agentPermsIt->second.targetAgent->Unmap(
&(agentPermsIt->second.ldrm_bo), va, mappedHandleIt->second.offset,
size);
Collaborator Author

Checking if this is indeed RemoveAccess (it seems so).

reinterpret_cast<uint64_t>(va), drm_perm(perms), AMDGPU_VA_OP_MAP);
if (ret) return HSA_STATUS_ERROR;
// CPU agents use a different offset
offset = reinterpret_cast<uint64_t>(mappedHandle->drm_cpu_addr);
Collaborator Author

I'm not entirely sure about why this is different.

@atgutier (Collaborator) left a comment

Need to do a deeper review of the core changes, but wanted to request some changes first, particularly regarding the agent API.

runtime/hsa-runtime/core/driver/xdna/amd_xdna_driver.cpp (outdated; resolved)
runtime/hsa-runtime/core/inc/agent.h (outdated; resolved)
runtime/hsa-runtime/core/driver/xdna/amd_xdna_driver.cpp (outdated; resolved)
runtime/hsa-runtime/core/driver/xdna/amd_xdna_driver.cpp (outdated; resolved)
@@ -278,6 +278,35 @@ class Agent : public Checked<0xF6BC25EB17E6F917> {
// @brief Returns an array of regions owned by the agent.
virtual const std::vector<const core::MemoryRegion*>& regions() const = 0;

// @brief Maps the memory associated with the handle.
virtual hsa_status_t Map(void *handle, void *va, size_t offset, size_t size,
Collaborator

These should be members of the Driver as the other OS driver functions are, not the agent. It'd warrant much broader/deeper discussion if we want to expose these in the Agent API.

@ypapadop-amd (Collaborator Author) Sep 17, 2024

There's a tricky part. GpuAgent::ImportDMABuf uses GpuAgent::libDrmDev() which is set by hsaKmtGetAMDGPUDeviceHandle(node_id(), &device_handle);

If we were to move it into the driver, for every import we would be adding a lookup for the driver type (we should probably cache that one) and an ioctl.

The rest of the functions can be moved to the driver, but then we'll have one function in the agent and the rest in the driver. Maybe have the import / export in the agent and the rest in the driver?

We also don't have a CPU driver to put some of these functions.

@atgutier (Collaborator) Sep 17, 2024

Yeah, there are some things we need to think about wrt the driver interface. I think for now I'm more concerned about the interface than lookup overhead. I was actually thinking we should have a reference to the driver object inside the agent. Would that help?

For some of the areas where there already seem to be sidebands thru the Agent, we can rethink those.

@ypapadop-amd (Collaborator Author) Sep 17, 2024

> Yeah, there are some things we need to think about wrt the driver interface. I think for now I'm more concerned about the interface than lookup overhead.

I'd like to avoid unnecessary, obvious overhead though, especially if it's something that will require extensive refactoring.

> I was actually thinking we should have a reference to the driver object inside the agent. Would that help?

I was planning to do the driver lookup at AIEAgent construction and store it as a pointer, since the driver is guaranteed to exist and outlasts the lifetime of the agent.

> For some of the areas where there already seem to be sidebands thru the Agent, we can rethink those.

We can pass the agent in, but then we create a cyclic dependency between agent and driver. Maybe I can dig a little more and see if that libDrm can be expressed as another handle? But we are going past the scope of this work.
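A minimal sketch of the arrangement described here, with assumed class shapes (the real core::Driver and agent interfaces differ): the driver is looked up once at agent construction and cached as a non-owning pointer, so Map/Unmap-style calls forward through it with no per-call lookup.

```cpp
#include <cassert>
#include <string>

// Assumed stand-in for core::Driver; the real interface differs.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual std::string name() const = 0;
};

class XdnaDriver : public Driver {
 public:
  std::string name() const override { return "xdna"; }
};

class AIEAgent {
 public:
  // Driver lookup happens once, at construction.
  explicit AIEAgent(Driver& driver) : driver_(&driver) {}
  // Memory operations would forward through the cached pointer.
  const Driver& driver() const { return *driver_; }

 private:
  Driver* driver_;  // non-owning: the runtime guarantees the driver
                    // outlives every agent bound to it
};
```

A plain pointer (rather than a reference member) keeps the agent assignable while still encoding non-ownership.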

Collaborator Author

> I was actually thinking we should have a reference to the driver object inside the agent. Would that help?

AIEAgents now have a pointer to their driver as of ff00230.

@ypapadop-amd (Collaborator Author)

Test will fail until 0d77583 is merged.

Is there any reason not to sync the repo with ROCR-Runtime:amd-staging and merge in amd-staging to the iree-aie branch?
