Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NXexperiment and NXexperimentdata #580

Closed
wants to merge 1 commit into from

Conversation

jacobfilik
Copy link
Contributor

Contributed definitions to allow NXdata groups to be related to each other and an NXdetector.

With the aim of making writing software for visualisation and processing of data collections with multiple detectors in the same scan easier by explicitly describing how the NXdata groups are released to each other and the corresponding NXDetector.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd">
<doc>
Describe the contents of an NXentry resulting from a data collection using multiple detectors,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per #469 & #270, make the first line a one-line summary since only the first line will be used to generate the summary documentation about each NXDL shown on http://download.nexusformat.org/doc/html/classes/contributed_definitions/index.html

Describe the contents of an NXentry resulting from a data collection using multiple detectors,
generating raw and derived data, potentially being scanned across multiple dimensions.

It is not uncommon for a beamline at a synchrotron to collect data from multiple detectors in during a single scan. NeXus files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove in

xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd"
>
<doc>
Describe all the plottable data from a single detector in a single entity. To be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per #469 & #270, make the first line a one-line summary since only the first line will be used to generate the summary documentation about each NXDL shown on http://download.nexusformat.org/doc/html/classes/contributed_definitions/index.html


(**required**) :ref:`NXdata` at least one NXdata, related to the NXdetector
</doc>
<attribute name="DATANAME_origin">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add nameType="any" (per #562) to indicate that DATANAME is not required text in the name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check, the issue suggests using this in fields, whereas this is an attribute. No other attributes (such as AXISNAME_indices) follow this convention.

name of an :ref:`NXdata` that must be in the same :ref:`NXexperimentdata`.
</doc>
</attribute>
<attribute name="primary">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change name of this attribute to default (per #380)

documentation should mention that an absolute HDF5 path address (starts with /, such as /entry/instrument/detector1/ccd3) must be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to have to use the absolute HDF5 path for this, just the name of (one of) the NXdata in this class.

This would be expected to be used more like the signal attribute in NXData, that is, from my collection of objects, which is the one i should actually care about. If it is a path to a location outside the group, it loses all the connections in the group.
For example, NXexperimentdata contains

Xspress3 (/entry/instrument/detector)[NXDetector]
AllElementSum(/entry/AllElementSum)[NXData]
FeKAlpha(/entry/FeKAlpha)[NXData]

with default wanting to refer to FeKalpha (which has an origin of AllElementSum, which has an origin of Xspress3). If default was /entry/FeKalpha I have to strip the HDF5 path to make any use of the origin attributes.
There is the requirement that all the NXdata groups in this class must have unique names (which could be explicitly stated in the doc) but if there are mulitple NXdata, derived from the same NXDetector, with the same name, that suggests to me bigger issues with the file.

Since this behaviour breaks with the use of default in the issue (by not using the full path), but would work with the external default definition (I assume there is no reason why the default in an NXentry couldn't contain the full path to the NXdata in an NXexperimentaldata), maybe a different name would be better.

@rayosborn
Copy link
Contributor

Thanks for making this suggestion, which looks interesting, but I'm struggling to understand exactly how these classes would be used. Could you provide an example file tree, such as your SAXS example, with its various detectors?

@jacobfilik
Copy link
Contributor Author

jacobfilik commented Jun 23, 2017

Happy to, but it will take a few days to find suitable example data and construct the file by hand.

The main issue the classes aim to solve is relating NXdata to each other and NXdetector groups. For simple files with one NXdata and one NXdetector the problem is trivial.

The minute there are multiple NXdata and NXdetector groups there is no way to relate them, except potentially by comparing names/strings which can't be good. (If there is an official NeXus way to achieve this that I have overlooked I am happy to be educated). Relating an NXdata to the NXdetector is crucial, if you want to, say, convert from pixel to q space.

Also, for a grid scan, if one NXdata contains the detector sum (which would be a 2D image), and another contains the detector data (a 3 or 4D block) there seems to be no way to encode that each pixel of the sum is derived from a frame of the detector. Again for one detector you could guess, but for N detectors you are back to comparing names.

I have no emotional attachment to these classes. @markbasham and I have discussed around this issue for a long time, and this seems to be one method of encapsulating the information we need, without just adding more fields to the existing base classes. I'm sure there are many elegant solutions, and I am happy to discuss the most NeXus way to get the information we need from the file.

@rayosborn
Copy link
Contributor

It is possible that the functionality that you are looking for is already provided by NeXus links. Here is part of the tree structure of one of the example files distributed with NeXpy:

>>> print(chopper.tree)
chopper:NXroot
  entry:NXentry
    analysis = 'TOFNDGS'
    data:NXdata
      @axes = ['polar_angle' 'time_of_flight']
      @signal = 'data'
      data = int32(148x750)
      polar_angle -> /entry/instrument/detector/polar_angle
        @target = '/entry/instrument/detector/polar_angle'
      time_of_flight -> /entry/instrument/detector/time_of_flight
        @target = '/entry/instrument/detector/time_of_flight'
    instrument:NXinstrument
      detector:NXdetector
        polar_angle = float32(148)
          @target = '/entry/instrument/detector/polar_angle'
        time_of_flight = float32(751)
          @target = '/entry/instrument/detector/time_of_flight'

So polar_angle and time_of_flight are added to the /entry/data NXdata group as links to fields that are stored in the /entry/instrument/detector group. The target attribute can be used to identify the NXdetector group from which the NXdata axes are derived. One of the rationales for treating these as soft links within NeXus (even though they are stored as hard links in the HDF5 file) is so that you can identify what detector group it comes from.

It also allows the mixing of axes from multiple groups. For example, if you are scanning both incident energy and detector angle, the former could be stored in a NXmonochromator group and the latter in a NXdetector group.

Your method may be more transparent, although high-level APIs can make the association clearer. For example, in the attached image from NeXpy, the chain-link icon shows that the item is a link. If you right-click on the link, one of the options is Show Link, which will automatically select the target field in the detector group.

tree-view

Let us know if this meets your needs, or if you think new classes will still be useful. We will certainly consider them if you do.

@jacobfilik
Copy link
Contributor Author

Thank you, I had seen the @target attribute in Nexus files before, but I could never find anything in the documentation, so assumed it was something internal. This would work perfectly for my first use case (better in-fact including the axes example you show), but I would want to add some doc to the NXData class to describe this use-case. Assuming NXdata is the main place @target is used.

I think the second use case (multiple NXData, some derived from processing others) could be met by clever use of @target and NXprocess. I will have a think about this and possibly submit an alternative pull request, so the relative merits of each can be discussed.

Thanks again for taking time to review and discuss this request, it is greatly appreciated.

@prjemian
Copy link
Contributor

Good point. The @target attribute is more general than just the NXdata base class, it is common to all links in a NeXus data file. And you have identified another TODO item in the NeXus manual. The only documentation is an h5dump example of a NeXus link.

That documentation will be addressed with (new) issue #581.

@prjemian
Copy link
Contributor

This is where the @target attribute is defined in NeXus: http://download.nexusformat.org/doc/html/nxdl_desc.html#linktype

@prjemian
Copy link
Contributor

The Design chapter of the manual describes NeXus links and gives an example use of the @target attribute.

prjemian added a commit that referenced this pull request Jun 25, 2017
@jacobfilik
Copy link
Contributor Author

Brilliant! Thank you, I will start making sure target attributes are added appropriately. I assume it is also correct to add them when the data is externally linked (e.g. unless the data is only written directly in the NXdata i should expect/write @target)?

Finally, is there an approved manner of representing that a certain NXdata contains the result of processing another NXdata? I am considering including an NXprocess in the processed 'NXdata`, showing the provenance. If this is acceptable (and ideally recommended) I will happily close this pull request, as all the functionality I hoped to add is already there.

@prjemian
Copy link
Contributor

There is not a standard set for NXdata contains NXprocess, it makes some sense.

Another way is to place both NXprocess and NXprocess as siblings within NXentry. There may be one or more such NXprocess groups in a numbered sequence, describing a series of processing events. Each one of those might result in a NXdata group.

@prjemian
Copy link
Contributor

The NAPI code makes no @target attribute assignment for externally linked data.

You'd best avoid using the @target attribute with external file links. Should also be added to the documentation? Seems reasonable at this point.

prjemian added a commit that referenced this pull request Jun 25, 2017
prjemian added a commit that referenced this pull request Jun 25, 2017
@jacobfilik
Copy link
Contributor Author

Thank you so much for this again, I just want to check with one final example and then I will close the request (although at this point it sounds like I should try the NAPI if that is the reference implementation).

If I have some data in myfile.h5, and I make an eternal link to it in my NeXus file in the NXdetector at /entry/instrument/detector/data, this should have no @target (it would make absolutely no sense to have one).
If then in the same NeXus file I make an NXdata /entry/mydata/data which is just a link to the data in the NXdetector this should have a @target, because it is important to show that this is the data from the detector (and it is a link to another dataset in the same file, even though the data is in an external file)?

@prjemian
Copy link
Contributor

The procedure sounds right. BUT, HDF5 will not let you be do this ("Interfile hard links are not allowed"). Just tried it with Python and h5py, following the external file example in the manual. Continuing that example:

In [62]: f = h5py.File(FILE_HDF5_MASTER, "a")

In [63]: ds = f["/entry/data/counts"]

In [64]: ds.name
Out[64]: u'/entry/instrument/detector/counts'

In [65]: ds.attrs["target"] = ds.name

In [66]: f["/entry/nexus_link"] = ds
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-66-659843c8241d> in <module>()
----> 1 f["/entry/nexus_link"] = ds

h5py\_objects.pyx in h5py._objects.with_phil.wrapper (C:\Minonda\conda-bld\h5py_1474482483473\work\h5py\_objects.c:2705)
()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper (C:\Minonda\conda-bld\h5py_1474482483473\work\h5py\_objects.c:2663)
()

D:\Apps\Anaconda\lib\site-packages\h5py\_hl\group.pyc in __setitem__(self, name, obj)
    275
    276         if isinstance(obj, HLObject):
--> 277             h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
    278
    279         elif isinstance(obj, SoftLink):

h5py\_objects.pyx in h5py._objects.with_phil.wrapper (C:\Minonda\conda-bld\h5py_1474482483473\work\h5py\_objects.c:2705)
()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper (C:\Minonda\conda-bld\h5py_1474482483473\work\h5py\_objects.c:2663)
()

h5py\h5o.pyx in h5py.h5o.link (C:\Minonda\conda-bld\h5py_1474482483473\work\h5py\h5o.c:3619)()

RuntimeError: Unable to create link (Interfile hard links are not allowed)

Conclusion is that when you want to make a link to an object in a NeXus HDF5 data file and that object is itself externally linked, then instead of making a hard link at the new location, you make an external file link at the new location back to the data source. Clear enough?

@rayosborn
Copy link
Contributor

Your external link example does reveal a limitation in using the @target attribute to logically associate two groups. The only way to add such an attribute to the externally linked HDF5 dataset is for it to be added to the dataset itself in the remote file, where it might not make sense, e.g., it might be an invalid path. In practice, or at least in my own experience, external links are most often used for very large multidimensional arrays, i.e., signal arrays rather than one-dimensional axis arrays. If you stored the axis arrays in the local file, you could still use them to associate the NXdata and NXdetector groups, as in the chopper example, even if the signal array is externally linked in both groups. But I agree that external links expose a flaw in this approach.

@jacobfilik
Copy link
Contributor Author

That's a shame, the @target attribute looks like exactly what I want, but a significant amount of our data is externally linked.

I don't suppose there is a plan to move the @target attribute off the data set into the group (in the same way @signal was moved)? Defining @DATASETNAME_target with all the axes and axes_indicies etc, feels logical to me.

(As an aside, should I close this pull request - i'm now not convinced this is the most NeXus approach - and open an issue pointing at it? Are more people likely to read pull request comments or issues?)

@prjemian
Copy link
Contributor

The @target attribute can be used with both fields (a.k.a. HDF5 datasets) and groups to indicate a link to another item in the same file. No plans to change this as this is one of the older internal structures in NeXus, many software tools expect this to remain as they know it now.

Perhaps we re-examine your original intention, which was to associate the NXdata and NXdetector groups. I agree that closing this PR without a merge is the right course, and open a new issue to discuss how best to implement your intention. The scope may expand to include associations between NXprocess, NXlog, and NXnote as well.

@mkoennecke
Copy link
Contributor

Hi,

to add my 2c's worth: In a normal in-file link look at the target attribute. For an external link, NAPI
writes the napimount attribute. Which contains a URL like string pointing into the external file.

So, yes, we have the use case for NXexperiment_data well covered.

@rayosborn
Copy link
Contributor

rayosborn commented Jun 27, 2017

Hi Mark,
This discussion has now been transferred to #583. I'm not sure the @napimount attribute does what Jacob needs. It points to the object in the remote file, but Jacob is asking how to link an NXdata and NXdetector group in the same file that happen to have the same external link. Also, does the @napimount attribute still exist once the HDF5 file has been written? I thought it was just used by the API in creating the external link. In HDF5, the external link in the parent file cannot have different attributes to the dataset in the remote file, so once the link has been made and saved to the HDF5 file, these extra attributes must disappear. That is what happens in the Python nexusformat API.

P.S. I see that Jacob was already making a similar point.

@jacobfilik
Copy link
Contributor Author

jacobfilik commented Jun 27, 2017

Hi,

Thanks, but if this points to the external file, the only way I can find the corresponding NXDetector is to parse all the NXDetectors looking for the same url which I don't really want to do. Also, is the napimount atttribute well documented NeXus standard (I haven't seen it before, but I am certainly not a NeXus expert)?

Finally structuring code like:

if (contains target attribute) {
//Can go directly to NXdetector
} else if (contains napimount attribute) {
//walk detectors finding same url
} else {
//walk detectors finding same HDF5 dataset id
}

doesn't seem ideal. I admit that the NXexperiment/NXexperimentdata is not the way we want to do things, but the above can't be the recommended approach?

@jacobfilik jacobfilik closed this Jun 28, 2017
@jacobfilik
Copy link
Contributor Author

jacobfilik commented Jun 28, 2017

Not the right solution for this functionality, discussion moved to issue #583

prjemian added a commit that referenced this pull request Jun 30, 2017
prjemian added a commit that referenced this pull request Jun 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants