How to treat name of root node? #81

TomNicholas · 2022-04-29T16:14:58Z

In #76 I refactored the tree structure to use a path-like syntax. This includes referring to the root of a tree as "/", same as in cd / in a unix-like filesystem.

This makes accessing nodes and variables of nodes quite neat, because you can reference nodes via absolute or relative paths:

In [23]: from datatree.tests.test_datatree import create_test_datatree

In [24]: dt = create_test_datatree()

In [25]: dt['set2/a']
Out[25]: 
<xarray.DataArray 'a' (x: 2)>
array([2, 3])
Dimensions without coordinates: x

In [26]: dt['/set2/a']
Out[26]: 
<xarray.DataArray 'a' (x: 2)>
array([2, 3])
Dimensions without coordinates: x

In [27]: dt['./set2/a']
Out[27]: 
<xarray.DataArray 'a' (x: 2)>
array([2, 3])
Dimensions without coordinates: x
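Here is a minimal sketch of how one path syntax can serve absolute and relative access. It uses a toy tree of nested Python dicts in place of DataTree nodes and normalises paths with the standard-library posixpath module. The resolve function and its cwd parameter are hypothetical and are not part of datatree's API.

```python
import posixpath

# Toy tree: nested dicts stand in for DataTree nodes (hypothetical, not the datatree API).
TREE = {
    "set1": {"a": 0, "b": 1},
    "set2": {"a": [2, 3], "b": [0.1, 0.2]},
}


def resolve(tree, path, cwd="/"):
    """Look up `path` (absolute, or relative to `cwd`) in a nested-dict tree."""
    # join() discards cwd when path is absolute; normpath() collapses "." and "..".
    full = posixpath.normpath(posixpath.join(cwd, path))
    node = tree
    for part in full.strip("/").split("/"):
        if part:  # the root "/" produces a single empty part, which we skip
            node = node[part]
    return node


print(resolve(TREE, "set2/a"))                  # [2, 3]
print(resolve(TREE, "/set2/a"))                 # [2, 3]
print(resolve(TREE, "./set2/a"))                # [2, 3]
print(resolve(TREE, "../set1/a", cwd="/set2"))  # 0
```

In this model the root is just the empty path "/", so resolving never needs the root to have a name.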

This refactor also made the name of a DataTree object optional. Previously every node was required to have a name. (Nodes still have a .name attribute; it can now be None.)

In [28]: dt.name

Normally this doesn't matter, because once a node is assigned a .parent, its .name property simply returns the key under which it is stored as a child. This mirrors the way an unnamed DataArray can be stored in a Dataset.

In [29]: import xarray as xr

In [30]: ds = xr.Dataset()

In [31]: da = xr.DataArray(0)

In [32]: ds['foo'] = da

In [33]: ds['foo'].name
Out[33]: 'foo'
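The "name comes from the key" behaviour can be sketched with a hypothetical Node class. This is an illustration only and not DataTree's actual implementation. A detached node reports whatever name it was given, which may be None. Once it is stored under a parent, its name is the key it is stored under.

```python
from typing import Dict, Optional


class Node:
    """Toy tree node (hypothetical; not datatree's actual implementation)."""

    def __init__(self, name: Optional[str] = None) -> None:
        self._name = name  # explicit name, only reported while the node is detached
        self.parent: Optional["Node"] = None
        self.children: Dict[str, "Node"] = {}

    @property
    def name(self) -> Optional[str]:
        # Once attached, report the key this node is stored under in its parent.
        if self.parent is not None:
            for key, child in self.parent.children.items():
                if child is self:
                    return key
        return self._name

    def __setitem__(self, key: str, child: "Node") -> None:
        # Attaching a child records the parent link; the key becomes its name.
        child.parent = self
        self.children[key] = child


root = Node()
child = Node()
root["foo"] = child
print(root.name)   # None
print(child.name)  # foo
```

In this model only the root, which never gets a parent, falls back to its own (optional) name.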

However, this means that the root node of a tree is no longer required to have a name in general.

This is good because:

- As a user you normally don't care about the name of the root when manipulating the tree, only about the names of the nodes.
- It makes the __init__ signature simpler, as name is no longer a required argument.
- It most closely echoes how filepaths work (the filesystem root "/" doesn't have another name).
- Roundtripping from Zarr/netCDF files still seems to work (see test_io.py).
- Roundtripping from dictionaries still works if the root node is unnamed:

In [35]: d = {node.path: node.ds for node in dt.subtree}

In [36]: roundtrip = DataTree.from_dict(d)

In [37]: roundtrip
Out[37]: 
DataTree('None', parent=None)
│   Dimensions:  (y: 3, x: 2)
│   Dimensions without coordinates: y, x
│   Data variables:
│       a        (y) int64 6 7 8
│       set0     (x) int64 9 10
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')

In [38]: dt.equals(roundtrip)
Out[38]: True

But it's bad because:

- Roundtripping from dictionaries no longer works if the root node is named:

In [39]: dt2 = dt

In [40]: dt2.name = "root"

In [41]: d2 = {node.path: node.ds for node in dt2.subtree}

In [42]: roundtrip2 = DataTree.from_dict(d2)

In [43]: roundtrip2
Out[43]: 
DataTree('None', parent=None)
│   Dimensions:  (y: 3, x: 2)
│   Dimensions without coordinates: y, x
│   Data variables:
│       a        (y) int64 6 7 8
│       set0     (x) int64 9 10
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')

In [44]: dt2.equals(roundtrip2)
Out[44]: False

- The signature of DataTree.from_dict becomes a bit awkward. To name the root node, you have to pass a separate name argument, e.g.

In [45]: dt3 = DataTree.from_dict(d, name='root')

In [46]: dt3
Out[46]: 
DataTree('root', parent=None)
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')
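A toy model reproduces the asymmetry above. It assumes a hypothetical Node dataclass and hypothetical to_dict/from_dict helpers, not datatree's own code. Every node is keyed by its path, and the root's path is always "/". The dict therefore has nowhere to record the root's name, and only a separate name argument can restore it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Toy stand-in for a DataTree node (hypothetical, not the datatree API)."""

    data: dict
    name: Optional[str] = None  # only the root uses this; children are named by their key
    children: Dict[str, "Node"] = field(default_factory=dict)


def to_dict(node: Node, path: str = "/") -> Dict[str, dict]:
    # Key every node by its path. The root is always "/", so its name is not recorded.
    out = {path: node.data}
    for key, child in node.children.items():
        out.update(to_dict(child, path.rstrip("/") + "/" + key))
    return out


def from_dict(d: Dict[str, dict], name: Optional[str] = None) -> Node:
    # The root's name can only come from the separate `name` argument.
    root = Node(d.get("/", {}), name=name)
    # Process shallower paths first so parents exist before their children.
    for path in sorted(d, key=lambda p: p.count("/")):
        if path == "/":
            continue
        *parents, leaf = path.strip("/").split("/")
        node = root
        for part in parents:
            node = node.children.setdefault(part, Node({}))
        node.children[leaf] = Node(d[path])
    return root


unnamed = Node({"a": [6, 7, 8]}, children={"set1": Node({"a": 0}), "set2": Node({"a": [2, 3]})})
named = Node(unnamed.data, name="root", children=unnamed.children)

print(from_dict(to_dict(unnamed)) == unnamed)           # True
print(from_dict(to_dict(named)) == named)               # False: the root name is lost
print(from_dict(to_dict(named), name="root") == named)  # True
```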

What do we think about this behaviour? Does this seem like a good design, or annoyingly finicky?

@jhamman I notice that in the code you wrote for the io you put a note about not being able to specify a root group for the tree. Is that related to this question? Do you have any other thoughts on this?


jhamman · 2022-05-03T05:06:35Z

> @jhamman I notice that in the code you wrote for the io you put a note about not being able to specify a root group for the tree. Is that related to this question? Do you have any other thoughts on this?

I believe my comment was referring to supplying the root group when writing a datatree, so that the child paths are prefixed with the group id (i.e. dt.to_netcdf('foo.nc', group='/foo/bar/')). I don't think anything kept me from implementing that feature apart from my goal of an MVP at the time. I also think the changes in #76 (and your description above) will work with this feature if or when someone implements it. (TL;DR: I don't think there is a problem here.)
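The group-prefixing idea is mostly path arithmetic. The sketch below assumes that a writer maps each node path onto a file group. group_paths is a hypothetical helper: it only computes strings, and it does not open netCDF files or call datatree. It does show that an unnamed root fits in cleanly, because "/" simply becomes the group itself.

```python
import posixpath
from typing import Dict, Iterable


def group_paths(node_paths: Iterable[str], group: str = "/") -> Dict[str, str]:
    """Map each tree path to the file group it would be written to (hypothetical helper)."""
    mapping = {}
    for path in node_paths:
        # Drop the leading "/" so join() nests the path beneath the group
        # instead of treating it as a new absolute path.
        rel = path.lstrip("/")
        mapping[path] = posixpath.normpath(posixpath.join("/", group, rel))
    return mapping


print(group_paths(["/", "/set1", "/set1/set2"], group="/foo/bar/"))
# {'/': '/foo/bar', '/set1': '/foo/bar/set1', '/set1/set2': '/foo/bar/set1/set2'}
```

With the default group="/", every path maps to itself, which corresponds to the current behaviour of writing the tree at the file root.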

TomNicholas added the "design question" and "IO" (Representation of particular file formats as trees) labels Apr 29, 2022

TomNicholas mentioned this issue in to/from_dict #82 (merged) Apr 29, 2022

TomNicholas closed this as completed Jul 12, 2022

TomNicholas mentioned this issue in Improving the string repr #184 (closed) Jan 5, 2023
