Remaking NDData to support LSST needs (APE11) #14
I'm not sure where the discussion should be, so I'll post it here. I agree on many points of your APE but I don't really like the proposed implementation. Let me summarize the current standard:
So the planned refactor would promote Additionally the current
Sorry this was so negative. I think a lot of your APE makes sense, for example the bitmask, exposureinfo and variance uncertainty would be really great additions. But I somehow disagree on the proposed implementation. I think these could also be integrated without changing (much) of Just to cross-link it here: @crawfordsm already mentioned some or even all of these points in https://groups.google.com/forum/#!topic/astropy-dev/I_co2botSNw |
@MSeifert04, I should say that the primary goal of this effort was to see if we couldn't align NDData better with LSST needs and it was deemed as an important goal by the coordinators, so I gave it a shot. I'll try to address some of the comments, but really, it is probably better if someone representing LSST (e.g., @r-owen) responds to the justification for these since I don't really feel I should speak for them in the end.
First, the main goal is not to break backward compatibility, at least for normal uses. This may not be true for those building subclasses that depend on deeper aspects of NDData and associated classes. Are you in that category? Some have questioned the need for the dualism of NDDataBase and NDData (is there any existing compelling use of this split yet?).
The need for NDArr: I'm guessing the main justification LSST has for this is for an object for which mathematical operations have unambiguous results. This isn't true for your proposed solution of using a subclass since meta or other attributes do not have well accepted semantics in this case. Yes, they could use these with conventions (ignore these attributes, or not set them) but then you have mixed usages of these objects that aren't clear from the class alone. Having subclasses that disable attributes or capabilities in the parent class is generally not a good idea. The proposed NDArr class removes these sorts of problems. It seems much cleaner to me than multiple flavors of NDData. IMHO, when there are many different dimensions to how a class will be subclassed (e.g., through mixins of different kinds or other means), one is probably asking for a lot of trouble and confusion down the road. Reducing the number of such variants is a good thing if it can be done flexibly.
Regarding masks: I don't think a property that returns a boolean in some cases and a callable in others is a good solution.
Regarding offsets: Yet one more mixin. My previous concern applies.
Regarding ExposureInfo: I wasn't clear enough. I don't think such an attribute is justified. It is something that LSST asked for, but I feel that it could be handled on their part by a special class or subclass if they need it in their software, with a simple function to extract the needed info from an NDData instance (or other constructors if they don't need an NDArr instance).
Again, I welcome LSST comments and any others that have a better solution to their needs than this APE currently provides. In the end, it is up to the coordinators to decide whether to accept this or some other variant. Cheers, Perry
I think it may be most useful to split the discussion along with the two types: a simple data array with mask and uncertainty on which mathematical operations are possible ( For One specific comment: I don't quite see why |
@mhvk, I don't think WCS as we usually understand it should be part of it since having it makes operations much more fragile or presents many options, none of which are the obvious default. If two NDArr objects with different WCS are combined, what is to be done? Generally resampling is required and that is a complex operation with many issues to decide. But it is true the slice aspect with regard to the parent is a kind of a wcs, but it isn't any sort of absolute coordinate system. It is much simpler to handle though. It is something to discuss as to whether that information should affect operations on these arrays. |
@perrygreenfield - yes, I agree. It was more that I felt that even an offset in an original frame is best done via a wcs. But this may just be an implementation detail. |
Here is the way LSST views things and what we are trying to emulate with We are keen to avoid having a WCS in For uniformity and expandability, I suggest that WCS stored as part of LSST finds that the metadata is often useful without the pixels, so a single object containing all of it (including the WCS) is a big win. Some use cases:
As @mhvk notes, the offset of the |
APE9.rst
Outdated
NDArr will support optional units since the propagation of units is
unambiguous in mathematical operations.

Supported numerical operations for NDData are: (+,-,*,/). While the discussion
Will NDData (or rather NDArr) really support these numerical operations?
Ah, that's a typo. It was missed in editing. NDData here should be NDArr.
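For readers following the arithmetic question, here is a minimal sketch of what "unambiguous" results for (+,-,*,/) could look like when a value carries a variance. It is not the APE's implementation: `PixelValue` is a hypothetical stand-in for a single NDArr element, and the formulas assume first-order propagation of uncorrelated errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelValue:
    """Hypothetical stand-in for one array element: a value plus its variance."""

    value: float
    variance: float = 0.0

    def __add__(self, other: "PixelValue") -> "PixelValue":
        # Variances of uncorrelated quantities add under addition.
        return PixelValue(self.value + other.value,
                          self.variance + other.variance)

    def __sub__(self, other: "PixelValue") -> "PixelValue":
        # Subtraction also adds the variances (assumes no correlation).
        return PixelValue(self.value - other.value,
                          self.variance + other.variance)

    def __mul__(self, other: "PixelValue") -> "PixelValue":
        # First order: var(a*b) ~ b^2 var(a) + a^2 var(b).
        return PixelValue(
            self.value * other.value,
            other.value ** 2 * self.variance + self.value ** 2 * other.variance,
        )

    def __truediv__(self, other: "PixelValue") -> "PixelValue":
        # First order: var(a/b) ~ var(a)/b^2 + a^2 var(b)/b^4.
        return PixelValue(
            self.value / other.value,
            self.variance / other.value ** 2
            + self.value ** 2 * other.variance / other.value ** 4,
        )


a = PixelValue(2.0, 0.5)
b = PixelValue(4.0, 1.0)
total = a + b
product = a * b
```

Nothing here depends on metadata or WCS, which is the property the thread attributes to NDArr: the result is fully determined by the operands.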
Thanks @perrygreenfield and @r-owen. I've made a few comments in line but I've also made a pull request against @perrygreenfield with an alternative solution. The reason for the alternative solution is that I really don't like the idea of NDData being required to use NDArr. I think that limits the flexibility of NDData, which was one of its features, and if it isn't required to use NDArr, then it increases the ambiguity, which makes compatibility more difficult. The bounding box/origin just seems like a very restricted case of WCS (where the only transformation is a translation), but I can see where having a shortcut like that could simplify the coding and be very useful so I could see that being included immediately into NDData regardless of this APE. |
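To make the "translation-only WCS" remark concrete, the sketch below maps a pixel index in a cutout back to the parent array's frame by adding the cutout's origin. The function name `to_parent_index` and the tuple-based origin are illustrative assumptions, not part of NDData or of the APE.

```python
from typing import Tuple


def to_parent_index(child_index: Tuple[int, ...],
                    origin: Tuple[int, ...]) -> Tuple[int, ...]:
    """Translate an index in a sub-array to the parent array's frame.

    `origin` is the parent-frame index of the sub-array's first pixel
    (a hypothetical convention chosen for this sketch).
    """
    if len(child_index) != len(origin):
        raise ValueError("index and origin must have the same dimensionality")
    # The only transformation is a per-axis shift.
    return tuple(c + o for c, o in zip(child_index, origin))


# A cutout whose first pixel sits at (10, 20) in the parent image.
origin = (10, 20)
parent_of_first = to_parent_index((0, 0), origin)
parent_of_other = to_parent_index((3, 4), origin)
```

Because the mapping is a pure shift, two cutouts of the same parent can be compared by their origins alone, without any resampling.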
@crawfordsm could you provide an example or two where an NDData implementation using NDArr reduces its flexibility. I'm not sure I see the point being made. |
APE9.rst
Outdated
object that mimics one). [An alternative option is to assume that the mask
attribute of NDArr is a callable whereas the mask attribute of NDData represents
a boolean array; this would be workable, but also likely confusing]
This APE proposes that the new attribute have the name: lone_ranger.
😆
The problem that arose previously was that if NDData has to have a specific type of |
Sorry, I started replying to the prior email thread during my vacation in July, but then this APE got circulated before I finished my message and I'm just getting back to it after catching up with other things :-(.
The basic requirements here sound completely fine to me, with a few nice new features -- except that I still disagree with NDData (or a sub-class) not supporting mathematical operations. While I don't doubt there is valuable experience behind the objection to arithmetic, I'm thinking in particular of my own usage here; I would like to work with high-level objects describing exposures and all their logically-associated information while retaining the major convenience of being able to operate on them concisely, in a clear and ergonomic sequence. I really don't want bookkeeping or conceptual operations that take several steps to perform in my scientific process. These are things I want to manipulate hands-on. If the default behaviour for arithmetic doesn't make sense, it can be altered in a similar way to your dmask.apply_flags() example or the attribute to disable variance propagation. Otherwise I am just going to end up extending it to work anyway (in fact I will have to if I re-base DataFile on it).
I still haven't thought of a great alternative name, but NDArr seems like it could easily be confused with NDArray. What do people think of something like NDPixData? Then again, people might get used to either.
I'm not clear on the memory implications of all these (strong?) references to parent objects, though this is something LSST must have considered carefully. Obviously in the ideal case, if one were to slice a data array, drop the variance/mask somehow and delete the original reference(s) to the parent, then only the main subarray would survive (might be awkward to implement without shooting oneself in the foot?).
By the way, Gemini has also been (with a bit of encouragement) re-basing a version of the AstroData class used in our pipeline on NDData. I would favour retaining backwards compatibility insofar as it doesn't distort the design unduly. I'll make a couple more comments in-line then have a better look over the discussion... Thanks (& sorry for being a bit flaky), James.
BTW: +1 to bit planes (I think we could probably deprecate flags?) and variance (insofar as it's the most convenient thing for me and Gemini, other people's use cases notwithstanding). |
attribute. Making attributes such as these at the top level could be done
through subclasses, though it may lead to many variants. Perhaps the best way
to deal with this is for applications or libraries to state their requirements
for items required to be in meta and leave it at that.
I would have thought it will be useful to have more functionality than just a meta dictionary and WCS. While it's not at all practical to standardize everything comprehensively, some sort of fairly general meta-data abstraction (like Gemini has in our mostly-internal AstroData class) would help write data reduction routines that can be used with data from different instruments etc. (or at least avoid hard-wiring more conventions than are necessary). I thought we also came up with some useful attributes to parse into objects besides WCS at the meeting but will have to dig out the photo. One thing I wonder about, off hand, is tracking associations between NDData objects besides parent-subarray (eg. calibrations that go with a science exposure). By the way, is the implied alternative to an ExposureInfo object here just to keep meta in NDData?
@jehturner The devil's in the details. If there is a broad consensus for extra attributes and their meaning, fine. But one could quickly get into a VO-like morass before long. I'd suggest putting that off and settling the other issues first.
@parejkoj - my specific suggestion would be to include a bit plane "USER", which is the one that gets affected by setting |
Deriving a binary mask from bitplanes sounds fine to me. I just need the bitplanes... I should be able to switch to |
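As an illustration of deriving a binary mask from bit planes, the sketch below ORs together the caller's chosen planes and tests each pixel against that selection. The plane names and bit positions in `PLANES` are hypothetical (only "USER" is named in this thread), and plain Python lists stand in for the integer arrays a real implementation would presumably use.

```python
from typing import Dict, Iterable, List

# Hypothetical plane names and bit positions, chosen for this sketch only.
PLANES: Dict[str, int] = {"BAD": 0, "SATURATED": 1, "USER": 2}


def boolean_mask(bitmask: List[List[int]],
                 planes: Iterable[str]) -> List[List[bool]]:
    """Return True where any of the selected planes is set for a pixel."""
    selected = 0
    for name in planes:
        # Build one integer with a 1 in each selected plane's bit position.
        selected |= 1 << PLANES[name]
    return [[(pixel & selected) != 0 for pixel in row] for row in bitmask]


# Pixel values: 0 = clean, 1 = BAD, 2 = SATURATED, 4 = USER.
bits = [[0, 1],
        [2, 4]]
mask_bad_or_user = boolean_mask(bits, ["BAD", "USER"])
mask_nothing = boolean_mask(bits, [])
```

Selecting planes by name rather than exposing the raw bits keeps the user-facing result a simple boolean mask, while the bit planes remain available underneath.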
So it seems like there may be some consensus on |
Good point; I'd been so focused on
I'll try to formalize all of the recent suggestions here into an update of the APE over the weekend. |
That sounds reasonable. While The most obvious justification for having both classes would seem to be compatibility with old code, but I wonder whether one could be sub-classed from the other, so that it doesn't necessarily break but there is really only one implementation and |
@mwcraig Just curious what the status of the updated APE is? I think we've had enough LSST-astropy agreement now that there shouldn't be many issues from our end, but we'll all probably have to read through the final document again to remind ourselves of everything that we've discussed over the past year+. |
@parejkoj — I’ve been about 2 days from finishing the APE update for almost 2 weeks. Just wrapped up something that had been consuming a lot of time, so hopeful I’ll have the update ready by the weekend. Then one more broad call for comments, and assuming no major objections, the APE gets merged and the coding starts to implement it! |
An update to the APE language is pending (perrygreenfield#3) but in the meantime I'll outline the revisions in the comments here.
API additions
API changes
API not changing
Though it is unusual to call out things that are not changing, it seems appropriate to do so for
API deprecations in astropy 3 for removal in astropy 4
|
The diagram below summarizes the new class structure; most of the changes are in I do realize the bit plane class is not on here yet... EDIT: strikeout indicates deprecated attributes/properties/methods. Mint green indicates a change (addition or deletion). |
One more update
@parejkoj -- this is ready for more feedback. I know there still needs to be some work on the details of the bit plane object, but I agree with @TallJimbo's advice above that users should be presented with a way to specify which planes are included in calculating the mask rather than access to individual planes. Not sure I'll have time to implement anything more than a very basic bit plane before feature freeze for 3.0 next Friday, but my intent is to get as much of the stuff in this APE in there as I can.... |
@jehturner @crawfordsm @MSeifert04 -- comments welcome! |
@mwcraig: Many thanks for the summary and sorry I can't really do this justice before the code freeze (I only became conscious of the latter recently and just couldn't find much time). I think this sounds quite sensible overall, at least at a glance. Does this mean that the name Will The way you have divided things up for backwards compatibility seems fine, but I'm again thinking that, in principle, there might be some code that could be compatible with either OK, have to go now. Hope that's slightly helpful (& thanks for the work and happy Christmas!).
It has been 6 years since the last reply. Should we just say this won't go in at this point, or are the interested parties going to revisit this?
Thanks for the reminder. I think we're still interested in looking at differences in how we handle mask bit fields: we've already adopted CCDData for our alert packets, and submitted a PR to include a PSF cutout to support that. I'll see if we can find some time to discuss this more on the LSST side. |
Thank you for your response, @parejkoj! It has been almost another year since then, and CoCo reached out to VRO again. It appears that while there is definitely interest, unfortunately there is no available bandwidth to pursue this in the near future. Therefore, CoCo has decided to reject this APE for now. However, I will open a follow-up issue (#100) to revisit should both interest and bandwidth return in the future. Thank you all again and sorry this did not work out this time around.
This is a first draft of an APE to support the issues raised by LSST and the subsequent email discussions regarding that