RFC: Establish concept of a computing device #52

Closed
wants to merge 3 commits into from
3 changes: 3 additions & 0 deletions src/Adapt.jl
@@ -52,4 +52,7 @@ include("arrays.jl")
# helpers
include("macro.jl")

# compute devices
include("computedevs.jl")

end # module
160 changes: 160 additions & 0 deletions src/computedevs.jl
@@ -0,0 +1,160 @@
# compute units: types and queries for the computing device backing an object


"""
    abstract type AbstractComputeUnit

Supertype for arbitrary computing devices (CPU, GPU, etc.).

`adapt(dev::AbstractComputeUnit, x)` adapts `x` for `dev`.

`Sys.total_memory(dev)` and `Sys.free_memory(dev)` return the total and free
memory on the device.
"""
abstract type AbstractComputeUnit end
export AbstractComputeUnit


"""
    struct ComputingDeviceIndependent

`get_compute_unit(x) === ComputingDeviceIndependent()` indicates
that `x` is not tied to a specific computing device. This typically
means that `x` is a statically allocated object.
"""
struct ComputingDeviceIndependent end
export ComputingDeviceIndependent


"""
    UnknownComputeUnitOf(x)

`get_compute_unit(x) === UnknownComputeUnitOf(x)` indicates
that the computing device for `x` cannot be determined.
"""
struct UnknownComputeUnitOf{T}
    x::T
end


"""
    struct MixedComputeSystem <: AbstractComputeUnit

A (possibly heterogeneous) system of multiple compute units.
"""
struct MixedComputeSystem <: AbstractComputeUnit end
export MixedComputeSystem


"""
    struct CPUDevice <: AbstractComputeUnit

`CPUDevice()` is the default CPU device.
"""
struct CPUDevice <: AbstractComputeUnit end
export CPUDevice

adapt_storage(::CPUDevice, x) = adapt_storage(Array, x)

Sys.total_memory(::CPUDevice) = Sys.total_memory()
Sys.free_memory(::CPUDevice) = Sys.free_memory()


"""
    abstract type AbstractComputeAccelerator <: AbstractComputeUnit

Supertype for computing accelerators such as GPUs.
"""
abstract type AbstractComputeAccelerator <: AbstractComputeUnit end
export AbstractComputeAccelerator


"""
    abstract type AbstractGPUDevice <: AbstractComputeAccelerator

Supertype for GPU computing devices.
"""
abstract type AbstractGPUDevice <: AbstractComputeAccelerator end
export AbstractGPUDevice


merge_compute_units() = ComputingDeviceIndependent()

@inline function merge_compute_units(a, b, c, ds::Vararg{Any,N}) where N
    a_b = merge_compute_units(a, b)
    return merge_compute_units(a_b, c, ds...)
end

@inline merge_compute_units(a::UnknownComputeUnitOf, b::UnknownComputeUnitOf) = a
@inline merge_compute_units(a::UnknownComputeUnitOf, b::Any) = a
@inline merge_compute_units(a::Any, b::UnknownComputeUnitOf) = b

@inline function merge_compute_units(a, b)
    return (a === b) ? a : compute_unit_mergeresult(
        compute_unit_mergerule(a, b),
        compute_unit_mergerule(b, a),
    )
end

struct NoCUnitMergeRule end

@inline compute_unit_mergerule(a::Any, b::Any) = NoCUnitMergeRule()
@inline compute_unit_mergerule(a::UnknownComputeUnitOf, b::Any) = a
@inline compute_unit_mergerule(a::UnknownComputeUnitOf, b::UnknownComputeUnitOf) = a
@inline compute_unit_mergerule(a::ComputingDeviceIndependent, b::Any) = b

@inline compute_unit_mergeresult(a_b::NoCUnitMergeRule, b_a::NoCUnitMergeRule) = MixedComputeSystem()
@inline compute_unit_mergeresult(a_b, b_a::NoCUnitMergeRule) = a_b
@inline compute_unit_mergeresult(a_b::NoCUnitMergeRule, b_a) = b_a
@inline compute_unit_mergeresult(a_b, b_a) = a_b === b_a ? a_b : MixedComputeSystem()


"""
    get_compute_unit(x)::Union{
        AbstractComputeUnit,
        ComputingDeviceIndependent,
        UnknownComputeUnitOf
    }

Get the computing device backing object `x`.

Don't specialize `get_compute_unit`, specialize
[`Adapt.get_compute_unit_impl`](@ref) instead.
"""
get_compute_unit(x) = get_compute_unit_impl(Union{}, x)
export get_compute_unit


"""
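    get_compute_unit_impl(::Type{TypeHistory}, x)::AbstractComputeUnit

See [`get_compute_unit`](@ref).

`TypeHistory` is the union of the types already visited while walking `x`;
it is used to guard against object reference loops.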

Specializations that directly resolve the compute unit based on `x` can
ignore `TypeHistory`:

```julia
Adapt.get_compute_unit_impl(@nospecialize(TypeHistory::Type), x::SomeType) = ...
```
"""
function get_compute_unit_impl end


@inline get_compute_unit_impl(@nospecialize(TypeHistory::Type), ::Array) = CPUDevice()

# Guard against object reference loops:
@inline get_compute_unit_impl(::Type{TypeHistory}, x::T) where {TypeHistory,T<:TypeHistory} = begin
    UnknownComputeUnitOf(x)
end

@generated function get_compute_unit_impl(::Type{TypeHistory}, x) where TypeHistory
Member

I am really not a fan of this @generated implementation. I think it would be preferable to follow the Adapt.jl pattern here and to perform a tree-walk. Now it is probably the case that the current infrastructure is not general enough, but if this goes into Adapt.jl we should have one mechanism to do this.

@maleadt and I briefly spoke about this and in general we are not opposed to this functionality being in Adapt.jl

Author

> @maleadt and I briefly spoke about this and in general we are not opposed to this functionality being in Adapt.jl

Thanks!

Author

> I am really not a fan of this @generated implementation [...] would be preferable to follow the Adapt.jl pattern here and to perform a tree-walk [...] if this goes into Adapt.jl we should have one mechanism to do this.

Do you mean that it's @generated, or what it does? In principle it does a tree walk, I would say (and objects encountered on the way can use a different method/specialization of get_compute_unit_impl). The function could be written without @generated, using just getfield, map and ntuple. I only wrote it this way to minimize the resulting code and increase type stability. But maybe I'm missing your point here?

Member

There are two points:

  1. I personally aim to reduce reliance on @generated as much as possible. If we can write this without staged functions, that would be great: it often makes the intent clearer, simplifies extensibility, and is better for compile times and type stability.
  2. This duplicates the core functionality of Adapt.jl, which is essentially a tree walk over structs. Instead of two implementations it would be great to have one, but Tim and I acknowledge that this might be a trickier design.

Author

Ok, I'll try to rewrite without @generated.

Author

> without staged functions [...] better type-stability

Type stability was actually one of the reasons I went for @generated here. :-)

Ok, I tried without @generated, @vchuravy, but type stability suffers. One implementation I came up with is:

function get_compute_unit_impl_nogen(::Type{TypeHistory}, x::T) where {TypeHistory,T}
    NewTypeHistory = Union{TypeHistory, T}
    fields = ntuple(Base.Fix1(getfield, x), Val(fieldcount(T)))
    merge_compute_units(map(Base.Fix1(get_compute_unit_impl, NewTypeHistory), fields)...)
end

Nice and short, but:

julia> x = (a = 4.2, b = (c = rand(2,3)))
(a = 4.2, b = [0.7927795137326867 0.8930673224466184 0.15921059563423712; 0.6399176439174568 0.4501022168243579 0.3951239506670382])

julia> @inferred get_compute_unit_impl(Union{}, x)
Adapt.CPUDevice()

julia> @inferred get_compute_unit_impl_nogen(Union{}, x)
ERROR: return type Adapt.CPUDevice does not match inferred return type Any

And in truth, this version also uses generated code underneath, because ntuple does. Is there a way to do this without ntuple, efficiently?
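
For illustration only, one way to walk the fields without `ntuple` is to recurse on the field index through `Val`. The sketch below is self-contained and uses hypothetical stand-in names (`NoDevice`, `CPU`, `Mixed`, `merge_units`, `unit_of`, `walk_fields`) rather than the types and functions from this PR. It has no guard against reference loops, and whether inference resolves it fully for a given type would still need to be checked with `@inferred`.

```julia
# Hypothetical stand-ins for the PR's types; this is not the Adapt.jl API.
struct NoDevice end   # plays the role of ComputingDeviceIndependent
struct CPU end        # plays the role of CPUDevice
struct Mixed end      # plays the role of MixedComputeSystem

# Simplified merge rules: equal units merge to themselves, NoDevice is
# neutral, and any other combination is treated as a mixed system.
merge_units(a, b) = a === b ? a : Mixed()
merge_units(::NoDevice, b) = b
merge_units(a, ::NoDevice) = a
merge_units(::NoDevice, ::NoDevice) = NoDevice()

# Arrays resolve directly; everything else is walked field by field.
unit_of(::Array) = CPU()
unit_of(x::T) where {T} =
    isbitstype(T) ? NoDevice() : walk_fields(x, Val(fieldcount(T)))

# Recurse from the last field down to field 1; Val{0} ends the recursion.
walk_fields(x, ::Val{0}) = NoDevice()
walk_fields(x, ::Val{N}) where {N} =
    merge_units(unit_of(getfield(x, N)), walk_fields(x, Val(N - 1)))

x = (a = 4.2, b = rand(2, 3))
unit_of(x)   # CPU() under these stand-in rules
```

Because this sketch carries no type history, a self-referential mutable object would recurse without end; the `TypeHistory` argument in the PR exists to avoid exactly that.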

Author

@vchuravy how would you prefer that I proceed here?

    if isbitstype(x)
        :(ComputingDeviceIndependent())
    else
        NewTypeHistory = Union{TypeHistory, x}
        impl = :(begin dev_0 = ComputingDeviceIndependent() end)
        append!(impl.args, [:($(Symbol(:dev_, i)) = merge_compute_units(get_compute_unit_impl($NewTypeHistory, getfield(x, $i)), $(Symbol(:dev_, i-1)))) for i in 1:fieldcount(x)])
        push!(impl.args, :(return $(Symbol(:dev_, fieldcount(x)))))
        impl
    end
end
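
To make the generator body above easier to read, the following sketch builds the same expression with an ordinary function and prints it for a two-field named tuple. `sketch_generated_body` is a hypothetical helper written for this illustration. It only constructs the expression, so the names `ComputingDeviceIndependent`, `merge_compute_units` and `get_compute_unit_impl` appear in it as unevaluated symbols and nothing from the PR needs to be loaded.

```julia
# Hypothetical helper mirroring the generator body from the diff. It takes
# the type T of `x` (as a generated function would see it) and returns the
# expression that would become the method body.
function sketch_generated_body(TypeHistory::Type, T::Type)
    isbitstype(T) && return :(ComputingDeviceIndependent())
    NewTypeHistory = Union{TypeHistory, T}
    impl = :(begin dev_0 = ComputingDeviceIndependent() end)
    for i in 1:fieldcount(T)
        # dev_i merges the unit of field i with the result so far (dev_{i-1}).
        push!(impl.args, :($(Symbol(:dev_, i)) = merge_compute_units(
            get_compute_unit_impl($NewTypeHistory, getfield(x, $i)),
            $(Symbol(:dev_, i - 1)))))
    end
    push!(impl.args, :(return $(Symbol(:dev_, fieldcount(T)))))
    return impl
end

# Build and print the body for a two-field named tuple.
body = sketch_generated_body(Union{}, typeof((a = 4.2, b = rand(2, 3))))
println(Base.remove_linenums!(body))
```

The printed block contains one `dev_i` assignment per field, chained through `merge_compute_units`, followed by a `return` of the last one. For an `isbitstype` input the body is just `ComputingDeviceIndependent()`.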