Which (FITS) header should be stored in Spectrum1D.meta (and where)? #617
Comments
I'm a bit worried about carrying around a lot of metadata by default in a data structure that could otherwise be relatively lightweight. Many users will want to do various things with the spectrum, perhaps combining it with data from other files. When writing out the data, they can go back to their FITS files and extract the relevant metadata, but I think it's very hard to anticipate in any general way what they will want. Perhaps, at least as an interim solution, this could be handled in the docs by adding an example or two of how to copy your FITS metadata into the Spectrum1D object. That leaves the user free to put in the header or headers, or to select the subset of keywords they want. It also gives them a lot of freedom in the data structure to use within
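A docs example along those lines might look roughly like the sketch below. It makes two assumptions. First, the header is treated as a mapping, which `astropy.io.fits.Header` supports via `in` and `header[key]`. Second, a plain `dict` stands in for the header so the snippet runs on its own. `copy_keywords` is a hypothetical helper for illustration only.

```python
def copy_keywords(header, keys):
    """Hypothetical helper: return a plain dict with the requested keywords present in `header`."""
    # Keywords missing from this particular file are skipped instead of raising,
    # since which cards exist varies between instruments and pipelines.
    return {key: header[key] for key in keys if key in header}


# A plain dict stands in for the primary FITS header (illustrative values).
primary_header = {
    "SIMPLE": True,
    "OBJECT": "NGC 1068",
    "TELESCOP": "HST",
    "EXPTIME": 1200.0,
}

# The user, not the loader, decides which keywords travel with the spectrum.
meta = copy_keywords(primary_header, ["OBJECT", "EXPTIME", "AIRMASS"])
```

The resulting dict could then be passed as the `meta` argument when constructing the `Spectrum1D`, keeping the choice of layout entirely with the user.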
So your suggestion, as a 4th option, would be not to read any header info at all (by default)?
Ah. Good point about having to use I don't think one can default to storing both the primary and extension header in the same string or flat dictionary, because there can be naming collisions between the two headers. So one needs either a dictionary of strings or a nested dictionary, or something like an HDU list.
Yes, the
Since we already have a dependency on
I think it's totally fine (and good) to collect header/meta data. The problem I always encounter is that there is not enough metadata to fully characterize the data, rather than the other way around. I don't have strong feelings about As much as possible, I would like to take advantage of and mirror the functionality that already exists for headers, so I think I am in favor of
Having gone back through this with a bit of hindsight, I think I want to vote for @dhomeier's original option 3. My reasoning is twofold:
2 in particular leads to the impedance mismatch we're struggling with: FITS or similar files in principle can store multiple spectra in one file, but As a middle ground between #617 (comment) and this option 3, another possibility (Option 3.5?) is to do option 3 and also what @hcferguson suggests in #617 (comment): then the user can see the "originals" if they really want to, but we tell them the only thing that will be used by downstream tools is what's in As for
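Here is a rough sketch of what "Option 3.5" could look like, using plain dicts for the headers. Where the merged cards and the originals should live is exactly what is still undecided here. The `"header"` and `"original_headers"` keys and the `build_meta` helper are therefore hypothetical, not an agreed layout.

```python
def build_meta(primary, extension):
    """Hypothetical 'Option 3.5' layout: merged cards plus copies of the originals."""
    merged = dict(primary)
    # Extension values win on collisions, like updating the primary header
    # with the data extension header (option 3).
    merged.update(extension)
    return {
        "header": merged,
        # Hypothetical key: separate copies for users who want the originals.
        "original_headers": [dict(primary), dict(extension)],
    }


meta = build_meta(
    {"OBJECT": "NGC 1068", "DATE": "2019-03-01"},
    {"DATE": "2019-03-02", "EXPTIME": 1200.0},
)
```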
That's only one line more, and is more explicit in that it makes it clear that the user is intentionally starting fresh. And that leaves us free to implement option 3 in
I'm excited to move this discussion forward and am happy for the header data (a dictionary) to be in the In the docs, I think we need to include info about
If there's code which needs to be written for the writers/readers, I'm happy to contribute, especially if someone pointed me in the right direction. E.g., open some issues and assign me. :)
I've done some "round trip" testing using the
I can also update the
I agree with putting everything directly in meta because that's the most usable. I do want to point out something, though, to keep in mind for both users and developers: Operations that the user performs on the
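Here is one concrete way data and metadata drift apart. Keywords that describe the array, such as its length or a linear wavelength solution, do not follow the data when the spectrum is sliced. This sketch uses plain lists and dicts with the standard FITS keywords `NAXIS1`, `CRVAL1`, `CRPIX1` and `CDELT1`. `slice_spectrum` is a hypothetical helper for illustration only.

```python
def wavelength(meta, pixel):
    """Linear FITS-style solution: CRVAL1 + (pixel - CRPIX1) * CDELT1 (1-based pixels)."""
    return meta["CRVAL1"] + (pixel - meta["CRPIX1"]) * meta["CDELT1"]


def slice_spectrum(flux, meta, start):
    """Hypothetical helper: drop the first `start` pixels and keep the keywords consistent."""
    new_meta = dict(meta)
    new_meta["NAXIS1"] = len(flux) - start
    # Moving the reference pixel keeps CRVAL1 tied to the same wavelength.
    new_meta["CRPIX1"] = meta["CRPIX1"] - start
    return flux[start:], new_meta


flux = [1.0, 2.0, 3.0, 4.0, 5.0]
meta = {"NAXIS1": 5, "CRVAL1": 4000.0, "CRPIX1": 1.0, "CDELT1": 2.0}

# Naive slice: the data change but the copied metadata still describe 5 pixels.
naive_flux, naive_meta = flux[2:], dict(meta)

# Slice that also updates the keywords describing the array.
sub_flux, sub_meta = slice_spectrum(flux, meta, 2)
```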
it's always going to be possible for data and metadata to get out of sync somehow (if i had a nickel for every time i've had this happen with a FITS file...). i don't think dropping metadata on the floor is the answer, though. anything in
There seems to be consensus that the Primary Header, i.e., whatever you'd put in HDU0, "should" be in There is not a consensus around what to do with extension headers, and I am currently OK with that. I propose that docs be updated to reflect the general expectation that a header be put in
I understood those two to be just the opposite, storing the cards directly as keys of
Yes, you're right, I've changed my mind. I'm gonna edit my comment.
Since there are a lot of open issues on this topic, I went ahead and made a discussion to consolidate these discussions and so we can decide how to proceed - we have a ticket on our board this sprint to start this work. I have a proposed plan there (storing everything in 'meta' as Header objects, and keeping the primary header separate for those formats that want to preserve that structure, or combine them) but I would appreciate some input before I start this work to make sure we are in agreement. Thanks!
I think most of the information should go into the primary header. According to my understanding of multi-extension files, their use case is to store different planes of the same data, not just a pile of them; for that case, another model should be used. I've also seen multi-detector instruments use MEFs, where each detector is stored as a separate extension. In either case the user must be able to control where a given piece of metadata is going to be stored.
Loaders for FITS files are saving the full information from the header cards in `Spectrum1D.meta['header']`. In #608 (comment) I noted some ambiguity as to which header should be stored. Most of the `default_loaders` for such formats, namely `apogee`, `hst_*`, `muscles`, `subaru_pfs` and `sdss` `spec_loader`, are pulling the info from the FITS header of the first HDU, while the spectral data are read from one (or several) of the extension HDUs.

This primary HDU header will usually contain general information on the target, observing run etc., while the HDU with the spectrum will include more specific info on the dataset. In principle both could be of interest. Should there be a general recommendation for new default (or custom) loaders (and a fix to existing ones) on how to handle this?
I see the following options:

1. Leave it at the primary HDU header, as with the majority of current `default_loaders`.
   - Pros: no API changes.
   - Cons: discarding potentially useful information; inconsistent with Astropy's `io.fits` behaviour for `Table` (see below).
2. Always read the header for the HDU containing the data. This is already implicitly done by `generic_spectrum_from_table` and `spectrum_from_column_mapping`, as they are loading `table.meta` under the hood.
   - Pros: matches the `Table.meta` data; possibility to use the `io.fits` mechanism to filter relevant keywords (i.e. excluding `REMOVE_KEYWORDS` and all that have already been used in defining column formats and units).
   - Cons: losing info from the primary HDU header (among the current formats with data in a BINTABLE extension, at least Apogee, HST, MUSCLES and SDSS also have potentially interesting information in the primary HDU). Some formats, e.g. `apogee`, are parsing spectrum data from several different extensions.
3. Combine all header info from the primary HDU and any extension HDU from which data are read. This is already implemented in the `jwst_reader` by using the (FITS) `header.extend` method to update the primary header with the data extension header.
   - Pros: no or minimal loss of information.
   - Cons: possibly excessive accumulation of values in `spectrum.meta['header']`; writing the spectrum back will produce a FITS file with different headers (but currently there is no guarantee that any of the loaders will write back identically formatted files anyway).

I tend towards the 3rd option, using a similar implementation to `jwst_reader`. It might still be preferable to use the Astropy `Table.meta` dict for updating, to take advantage of the `io.fits` mechanism for stripping excess cards.

Addendum
In the above I was assuming that the `Spectrum1D` format always has the header info collected in one dictionary within the meta dictionary, as `Spectrum1D.meta['header']`; this is also suggested in the docs for creating a custom loader by prescribing `meta = {'header': header}`, and is followed by most of the `default_loaders`.

But in fact neither `parsing_utils` nor `jwst_reader` follow this scheme; instead they put all saved header cards directly into `Spectrum1D.meta`.

I therefore suggest first settling on a consistent scheme for organising the `meta` dict.