misc: various Dimension internal fixes #2205
```diff
@@ -147,6 +147,12 @@ def preprocess(clusters, options=None, **kwargs):
     found = []
     for c1 in list(queue):
         distributed_aindices = c1.halo_scheme.distributed_aindices
+        h_indices = set().union(*[(d, d.root)
+                                  for d in c1.halo_scheme.loc_indices])
+
+        # Skip if the Halo exchange would end up outside its needed iteration space
+        if h_indices and not h_indices & dims:
+            continue
 
         diff = dims - distributed_aindices
         intersection = dims & distributed_aindices
```

> **Reviewer (FabioLuporini):** couple of typos here *(conversation resolved)*
```diff
@@ -24,7 +24,7 @@
 from devito.symbolics import estimate_cost
 from devito.tools import (DAG, OrderedSet, Signer, ReducerMap, as_tuple, flatten,
                           filter_sorted, frozendict, is_integer, split, timed_pass,
-                          timed_region)
+                          timed_region, contains_val)
 from devito.types import Grid, Evaluable
 
 __all__ = ['Operator']

@@ -526,6 +526,7 @@ def _prepare_arguments(self, autotune=None, **kwargs):
        edges = [(i, i.parent) for i in self.dimensions
                 if i.is_Derived and i.parent in set(nodes)]
        toposort = DAG(nodes, edges).topological_sort()

        futures = {}
        for d in reversed(toposort):
            if set(d._arg_names).intersection(kwargs):
```
```diff
@@ -560,18 +561,23 @@ def _prepare_arguments(self, autotune=None, **kwargs):
                     # a TimeFunction `usave(t_sub, x, y)`, an override for `fact` is
                     # supplied w/o overriding `usave`; that's legal
                     pass
-                elif is_integer(args[k]) and args[k] not in as_tuple(v):
+                elif is_integer(args[k]) and not contains_val(args[k], v):
                     raise ValueError("Default `%s` is incompatible with other args as "
                                      "`%s=%s`, while `%s=%s` is expected. Perhaps you "
                                      "forgot to override `%s`?" %
                                      (p, k, v, k, args[k], p))
 
         args = kwargs['args'] = args.reduce_all()
 
         # DiscreteFunctions may be created from CartesianDiscretizations, which in
         # turn could be Grids or SubDomains. Both may provide arguments
         discretizations = {getattr(kwargs[p.name], 'grid', None) for p in overrides}
         discretizations.update({getattr(p, 'grid', None) for p in defaults})
         discretizations.discard(None)
+        # Remove subgrids if multiple grids
+        if len(discretizations) > 1:
+            discretizations = {g for g in discretizations
+                               if not any(d.is_Derived for d in g.dimensions)}
         for i in discretizations:
             args.update(i._arg_values(**kwargs))
```

> **Reviewer:** We need to coin a term for any CartesianDiscretization that is not the main Grid. Also, I think we need to distinguish between SubDomain and SubGrid, the latter being a coarser representation of the (main) Grid. I don't know exactly what names to use, but I feel an increasing need for a set of CartesianDiscretization attributes -- suitably overridden in the various subclasses -- along the lines of `is_root` (`is_main`? `is_domain`?), and then here we would just filter on that attribute.
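The body of the new `contains_val` helper imported above is not part of this diff. Purely as an illustration of the kind of check it might perform (this is a guess, not devito's actual implementation): a membership test that also tolerates a non-iterable right-hand side, which the previous `args[k] not in as_tuple(v)` form handled by tuple-wrapping.

```python
def contains_val(val, items):
    """Illustrative sketch only -- not devito's actual implementation.

    True if `val` equals `items` or appears in it, where `items` may be a
    single value or any iterable of values.
    """
    try:
        return val in items
    except TypeError:
        # `items` is not iterable (e.g. a plain int): fall back to equality
        return val == items
```

Whatever the real implementation, the visible change is that the compatibility check no longer relies on wrapping the expected value with `as_tuple`.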
```diff
@@ -584,6 +590,9 @@ def _prepare_arguments(self, autotune=None, **kwargs):
         if configuration['mpi']:
             raise ValueError("Multiple Grids found")
         try:
+            # Take the biggest grid, i.e. discard grids with subdimensions
+            grids = {g for g in grids if not any(d.is_Sub for d in g.dimensions)}
+            # First grid as there is no heuristic on how to choose from the leftovers
             grid = grids.pop()
         except KeyError:
             grid = None
```

> **Reviewer:** this needs to be better abstracted in my opinion. I wonder whether we could have a SubGrid object (returned via …).
>
> **Reviewer:** My main thought is that this needs to be part of the …
>
> **Reviewer:** What happens if we drop this line and modify the one above as …? That should be enough to filter away the SubDomain grids. Also: can we not at least add a property to Grid, such as …?
>
> **Author:** No, that would create an issue: for example, if there is only one "sub-grid", the set would end up empty and lead to missing arguments. But it does look like a duplicate of the filtering above now, so that one might be removable.
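To make the two review threads above concrete, here is a rough sketch of the attribute-based approach being floated. The flag name (`is_root`) and the classes shown are placeholders from the discussion, not devito's actual hierarchy or API.

```python
# Rough sketch of the reviewers' suggestion -- names and classes are
# placeholders from the discussion, not devito's actual hierarchy.

class CartesianDiscretization:
    # Default: not the "main" discretization
    is_root = False


class MainGrid(CartesianDiscretization):
    # The main, top-level discretization
    is_root = True


class SubDomainGrid(CartesianDiscretization):
    # A discretization derived from the main grid (e.g. via a SubDomain)
    is_root = False


def pick_roots(discretizations):
    """With such a flag, the dimension-based filtering in the two hunks
    above could collapse into one uniform check."""
    roots = {g for g in discretizations if g.is_root}
    # Fall back to the full set so a lone "sub-grid" is never filtered away,
    # mirroring the author's concern about missing arguments
    return roots or discretizations
```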
```diff
@@ -214,6 +214,9 @@ def DeviceIteration(self):
     def Prodder(self):
         return self.lang.Prodder
 
+    def _n_device_pointers(self, *args, **kwargs):
+        return 0
+
 
 class DeviceAwareMixin(object):
```
```diff
@@ -325,6 +328,12 @@ def _(iet):
 
         return _initialize(iet)
 
+    def _n_device_pointers(self, iet):
+        functions = FindSymbols().visit(iet)
+        devfuncs = [f for f in functions if f.is_Array and f._mem_local]
+
+        return len(devfuncs)
+
     def _is_offloadable(self, iet):
         """
         True if the IET computation is offloadable to device, False otherwise.
```

> **Reviewer:** first, the nitpicks: … Now the actual issue: … I think what you're actually after are the so-called …
>
> **Author:** No, it isn't. This only detects the number of "present" entries in the IET, which must match: …
```diff
@@ -336,7 +345,8 @@ def _is_offloadable(self, iet):
         functions = FindSymbols().visit(iet)
         buffers = [f for f in functions if f.is_Array and f._mem_mapped]
         hostfuncs = [f for f in functions if not is_on_device(f, self.gpu_fit)]
-        return not (buffers and hostfuncs)
+
+        return not (hostfuncs and buffers)
 
 
 class Sections(tuple):
```

> **Reviewer:** leftover change?
```diff
@@ -295,15 +295,13 @@ def _select_candidates(self, candidates):
             except TypeError:
                 pass
 
-            # At least one inner loop (nested) or
-            # we do not collapse most inner loop if it is an atomic reduction
-            if not i.is_ParallelAtomic or nested:
-                collapsable.append(i)
+            collapsable.append(i)
 
         # Give a score to this candidate, based on the number of fully-parallel
         # Iterations and their position (i.e. outermost to innermost) in the nest
         score = (
             int(root.is_ParallelNoAtomic),
+            self._n_device_pointers(root),  # Outermost offloadable
             int(len([i for i in collapsable if i.is_ParallelNoAtomic]) >= 1),
             int(len([i for i in collapsable if i.is_ParallelRelaxed]) >= 1),
             -(n0 + 1)  # The outermost, the better
```
```diff
@@ -377,6 +375,12 @@ def _make_partree(self, candidates, nthreads=None):
                                       ncollapsed=ncollapsed, nthreads=nthreads,
                                       **root.args)
             prefix = []
+        elif all(i.is_ParallelRelaxed for i in candidates) and nthreads is not None:
+            body = self.HostIteration(schedule='static',
+                                      parallel=nthreads is not self.nthreads_nested,
+                                      ncollapsed=ncollapsed, nthreads=nthreads,
+                                      **root.args)
+            prefix = []
         else:
             # pragma ... for ... schedule(..., expr)
             assert nthreads is None
```

> **Reviewer:** how about just …? Actually, to mirror what we have in the first …
```diff
@@ -428,11 +432,6 @@ def _make_nested_partree(self, partree):
         if self.nhyperthreads <= self.nested:
             return partree
 
-        # Loop nest with atomic reductions are more likely to have less latency
-        # keep outer loop parallel
-        if partree.root.is_ParallelAtomic:
-            return partree
-
         # Note: there might be multiple sub-trees amenable to nested parallelism,
         # hence we loop over all of them
         #
```
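For context on how the `score` tuple in `_select_candidates` plays out: Python compares tuples lexicographically, so (presumably) the candidate with the best leading entries wins, the new `_n_device_pointers` count acts as the second criterion, and `-(n0 + 1)` is only a tie-breaker favouring outer loops. A tiny standalone illustration, not devito code:

```python
# Standalone illustration of lexicographic tuple scoring (not devito code).
# Each candidate gets a tuple; max() compares element by element, so earlier
# entries dominate and -(depth + 1) only breaks ties in favour of outer loops.
candidates = {
    "outer_loop":  (1, 0, 1, 1, -1),   # fully parallel, depth 0
    "middle_loop": (1, 2, 1, 1, -2),   # fully parallel, 2 device pointers, depth 1
    "inner_loop":  (0, 0, 0, 1, -3),   # only "relaxed" parallel, depth 2
}

best = max(candidates, key=candidates.get)
assert best == "middle_loop"  # wins on the second entry despite being deeper
print(best)
```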
```diff
@@ -94,7 +94,7 @@ def __init_finalize__(self, *args, function=None, **kwargs):
             # a reference to the user-provided buffer
             self._initializer = None
             if len(initializer) > 0:
-                self.data_with_halo[:] = initializer
+                self.data_with_halo[:] = initializer[:]
             else:
                 # This is a corner case -- we might get here, for example, when
                 # running with MPI and some processes get 0-size arrays after
```

> **Reviewer:** interesting, what's the difference here?
>
> **Author:** No difference, just a debug leftover; the issue was calling …
>
> **Reviewer:** just being paranoid here... are we completely sure both the performance and the semantics of this assignment are exactly the same as master, except for that corner case you mention above?
>
> **Reviewer:** I worry the new one might be slightly slower.
>
> **Reviewer:** well, the view gets created, so it's not really 0 impact... but yeah, I agree it's gonna be 0.0000001 in practice 😬
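A small self-contained illustration of the point being debated (plain NumPy, not devito code): both forms copy the same values into the destination buffer; the only extra work in the `initializer[:]` form is constructing a temporary view of the source, a small constant cost independent of the data size.

```python
import numpy as np

# Plain-NumPy illustration of the two assignment forms discussed above.
src = np.array([[0.0, 1.0, 2.0],
                [3.0, 4.0, 5.0]])
dst1 = np.empty_like(src)
dst2 = np.empty_like(src)

dst1[:] = src       # copy values straight from src
dst2[:] = src[:]    # src[:] first creates a view (no data copy), then values are copied

assert np.array_equal(dst1, dst2)   # identical results
view = src[:]
assert view.base is src             # the slice is only a view onto src's buffer
```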
> **Reviewer:** For homogeneity with the rest of the code below, I would rather write: … Note: I'm using `_defines` above.
>
> **Author:** `d_defines`?
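The snippet the reviewer had in mind was not captured on this page. Purely as a guess at what a `_defines`-based formulation of the `h_indices` construction could look like, assuming each Dimension `d` exposes a `_defines` set that already contains `d` and its root:

```python
# Hypothetical reconstruction of the suggestion -- the reviewer's actual snippet
# was not captured. Assumes `d._defines` already includes `d` and its root, so
# the explicit (d, d.root) tuples in the first hunk become unnecessary.
h_indices = set().union(*[d._defines for d in c1.halo_scheme.loc_indices])
```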