Questions regarding required fields and missing data #1143
If the field/group with the broken link is optional, I would say it is still NeXus compliant. A broken link is equivalent to the field/group not being present. Validators need to be smart enough to know that. If the field/group with the broken link is required, I would say it is not NeXus compliant. A validator would fail when checking the field type, field dimensionality, field/group attributes (like @units) and group members.
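A minimal sketch of the rule described above, assuming a validator that has already tried to resolve every link: a broken link is treated the same as an absent field, so it only yields an error when the field is required. `BROKEN_LINK`, `FieldSpec`, `check_presence` and the optional path `/entry/notes` are hypothetical names used for illustration; they are not part of any NeXus tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical sentinel: the link exists in the file, but its target cannot be opened.
BROKEN_LINK = object()


@dataclass
class FieldSpec:
    path: str
    required: bool


def check_presence(entries: Dict[str, Any], specs: List[FieldSpec]) -> List[str]:
    """Return one error message per required field that is absent or broken."""
    errors = []
    for spec in specs:
        # A broken link counts as "not present", exactly like a missing key.
        absent = spec.path not in entries or entries[spec.path] is BROKEN_LINK
        if absent and spec.required:
            errors.append(f"{spec.path}: required field is missing or its link is broken")
    return errors


specs = [
    FieldSpec("/entry/source/name", required=True),
    FieldSpec("/entry/notes", required=False),  # illustrative optional field
]
entries = {"/entry/source/name": BROKEN_LINK, "/entry/notes": BROKEN_LINK}
for message in check_presence(entries, specs):
    print(message)
```

Only the required field is reported; the broken optional link passes silently, which matches the "equivalent to not being present" reading.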
With NaNs and empty strings you could make anything NeXus compliant. In fact you don't need any data at all. The spirit of required is that it has to have a value. I know it's quite harsh, but in this case the people at the instrument/source/experiment side should try to find a way to produce/measure that kind of information. Adding a NaN should really be a desperate last resort. And please don't tell anyone you're doing it ;-).
Dear Colleagues,
We are engaged in experimental sciences, and sometimes all or part of an experiment fails. Perhaps we decide to throw away such partial data, but there are many times when preserving and using partial data is worth the effort. Using a ".", "?" or NaN is simply an honest disclosure that a certain data value is not available, not something to hide.
Regards,
Herbert
Sure, but wouldn't it then be better to produce a file that does not pass NeXus validation, instead of making the validator think all required information is present?
In the design page of the manual, when it is talking specifically about application definitions:
If you write data knowing that it differs from what the application definition requires, you should expect an application that processes your data to fail. Also, if the author(s) of an application definition specify something as required, yet non-compliant data is provided, as you suggest, then the author should be challenged on what the true requirement is. NeXus should not be in the business of splitting this more finely. We should not be describing how to violate the contract of an application definition.
Thank you all for your replies. I agree: if data for a required field is missing, validators should not accept those files. Adding "no data" defaults would clearly circumvent that and basically make the required data optional. I completely agree that that is not desirable; this is why I opened issue #966. So to sum up, when aiming to write files following a NeXus application definition while some required information is missing, it would be better to leave out the missing data and ultimately produce non-compliant files. Of course, that makes it impossible to guarantee that the files our software produces really fulfill that definition. Ultimately, it is in the hands of those defining the application definition to decide what data is crucial for further processing and what is not. This might be an opportunity for the NXmx community to revisit the required fields and decide if they are truly required, that is, whether a file becomes unusable with processing software if the information is missing.
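The approach summarized above (leave missing data out and accept a non-compliant file) could be sketched as follows. This is only an illustration under stated assumptions: `REQUIRED` holds just the two fields named in this thread, not the full NXmx list, and `plan_write` is a hypothetical helper that decides what to write without touching any HDF5 file.

```python
from typing import Dict, List, Optional, Tuple

# Only the two examples mentioned in this thread; NOT the complete NXmx requirement list.
REQUIRED = ("/entry/source/name", "/entry/end_time_estimated")


def plan_write(values: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Drop unknown values instead of substituting placeholders.

    Returns the fields that should be written and the required fields
    that will be absent, so the caller can log the non-compliance.
    """
    to_write = {path: value for path, value in values.items() if value is not None}
    missing = [path for path in REQUIRED if path not in to_write]
    return to_write, missing


values = {
    "/entry/source/name": None,  # the user did not provide a source name
    "/entry/end_time_estimated": "2022-06-29T17:14:00+00:00",
}
to_write, missing = plan_write(values)
if missing:
    print("File will not be NXmx compliant; missing:", ", ".join(missing))
```

Reporting the missing fields at write time keeps the disclosure honest without making a validator believe the data is there.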
@soph-dec I agree with your summary here (thank you) and with your call for a review of required fields in NXmx. An issue here could be a good place for this. Alternatively, as @yayahjb mentioned, the ACA SIG meeting on best practices could be a good place (July 20th). I'd like to add a further note: the standard is not just for software interoperability but also for long-term archival and provenance. There is a lot of metadata that is completely irrelevant for standard processing methods, including the example you mentioned, source name. I just checked, and the DIALS processing suite doesn't even look for NXsource when reading the data. That doesn't make it any less required for NXmx, though.
For those attending the ACA today, we'll be discussing this issue during the XFEL session this afternoon, at 4:45 PM Pacific time. It is a hybrid session, but you need to be a registered attendee to get access to the Zoom link.
Hi @soph-dec, @yayahjb and I reviewed this issue during the XFEL session at the ACA with a well attended audience. Specifically regarding your point:
I thought this was a good thing to do, so before the session I reviewed the NXmx spec and compiled a list of fields that are required in NXmx but that wouldn't necessarily prevent data processing with standard software suites if they were absent. These included:
We then showed this list and asked the attendees if they thought these parameters should continue to be required. The general consensus seemed to be that they should be. But! Today I am thinking about this more. Of these fields, the only ones that are not necessarily known during data collection by the Dectris DAQ systems would be:
Would you agree that all of the other ones should certainly remain required? If so, the question that remains is: could these two non-deterministic fields be moved to recommended, so that if they are absent they only generate warnings instead of errors? I don't know that the community would have as strong an opinion on this. Regardless, I think that simplifies the discussion at least. If you agree, I'd propose closing this issue and continuing the discussion over in #966. Reasonable?
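To make the required-versus-recommended distinction concrete, here is a small sketch of how a validator might map the two levels to errors and warnings. The `Level` enum, the `validate` function and the paths `/entry/field_a` and `/entry/field_b` are hypothetical placeholders; they do not name the actual fields under discussion.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Level(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


def validate(present: Set[str], spec: Dict[str, Level]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for fields that are absent from the file."""
    errors, warnings = [], []
    for path, level in spec.items():
        if path in present:
            continue
        if level is Level.REQUIRED:
            errors.append(path)      # absence makes the file non-compliant
        elif level is Level.RECOMMENDED:
            warnings.append(path)    # absence is reported but tolerated
    return errors, warnings


# Placeholder paths, for illustration only.
spec = {
    "/entry/field_a": Level.REQUIRED,
    "/entry/field_b": Level.RECOMMENDED,
}
errors, warnings = validate(set(), spec)
print("errors:", errors)
print("warnings:", warnings)
```

Under this scheme, moving a field from required to recommended changes only the severity of the report, not whether its absence is noticed.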
Hi @phyy-nx, thank you for looking into it and discussing it with the attendees at the ACA, I really appreciate it.
Besides those two, SOURCE.name, SAMPLE.name and INSTRUMENT.name also cannot be known unless the user inputs that information. We also do not want to force users to give that information, so for now I would opt for documenting that this information is needed if the files should follow NXmx. If no names are given, the corresponding datasets will be missing in the HDF5 files.
Yes, it makes sense to close this issue, since the original questions regarding defaults and links have been answered.
In some cases, from a software point of view, it is not possible to guarantee that required data is available when a file is written.
For example, in NXmx the field source name is required. But when this is not set by the user, there is no way for the software to guess that information. Another example is entry/end_time_estimated. As discussed in #966, there are cases when no useful estimates can be deduced from the configuration.
The question is, what information should software that aims to produce NXmx compliant files write in situations like that? In #966 it was suggested that for entry/end_time_estimated, which is of type NX_DATE_TIME, we should use a "." or a "?".
Are there general definitions for default values that symbolize that the data is not available? If not, could that be defined?
Possibilities could be:
(I asked basically the same question here, but I thought it would be easier to make a separate issue for this.)
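As an aside on the "." or "?" placeholders suggested for entry/end_time_estimated: NX_DATE_TIME values are ISO 8601 strings, so a validator that strictly checks the field type would reject such placeholders unless it were taught to accept them. The sketch below illustrates this with Python's standard library; `is_iso8601` is a hypothetical helper, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on older Python versions, so it stands in for a real NeXus type check rather than reproducing one.

```python
from datetime import datetime


def is_iso8601(text: str) -> bool:
    """Return True if the standard library can parse `text` as an ISO 8601 date-time."""
    try:
        datetime.fromisoformat(text)
    except ValueError:
        return False
    return True


for candidate in ("2022-06-29T17:14:00+00:00", ".", "?"):
    print(repr(candidate), is_iso8601(candidate))
```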
Another thing we were wondering is regarding soft links. Let's say we have a required field in the master file and we realize it as a soft link to an external file. If for some reason that external file is missing or corrupt, the link will be broken. Would the master file still be considered to be following NeXus? If not, what should we do in such cases? Use VDS with a fill value that matches the defaults mentioned above? Are there other options?