
add: new SDSS V datatype loaders #1107

Merged: 31 commits, Feb 13, 2024
Conversation

@rileythai (Contributor) commented Nov 15, 2023:

Within the 5th generation of the SDSS, there are several new data products, and many of the access methods for existing datatypes have been changed.

This pull request will add new default loaders for the new/updated SDSS-V spectra data products in a new file called sdss_v.py, along with unit tests in test_sdss_v.py, which will allow for automatic loading of SDSS-V data into Spectrum1D/List objects, and directly into jdaviz.

For mwm and spec files, the first HDU with data is loaded with the Spectrum1D loader, whereas all spectra within a file are loaded with the SpectrumList loader.

Remaining questions:

  • How can/should a user specify a given spectrum from files with stacked spectra?
    • mwm files can contain multiple stacked spectra from the APOGEE and BOSS spectrographs. spec-full files can contain both the coadd and individual visits.
    • I've left it to default to the coadd (HDU1) for spec files, and the first HDU with data for mwm files.
  • SpectrumList inherits a basic reading method from the Spectrum1D methods. I don't want it to, because some access methods will encounter non-spectra when we want to import everything.
  • jdaviz does not yet have an implementation for Spectrum1D objects with nD flux. Do we want to accommodate that here? Would that be its expected behaviour anyway (i.e. only one spectrum per Spectrum1D object, not several flux arrays)?
    • Spectra in mwmVisit files are stacked, similarly to apStar files. If we keep the current access methods, accessing a single spectrum would require a double specification (the observatory/instrument plus the specific spectrum from that obs/instrument).
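The "first HDU with data" default described above can be sketched roughly as follows. This is an illustrative helper built on astropy.io.fits; the function name and structure are assumptions for the sketch, not the PR's actual code:

```python
# Hypothetical sketch of the "first HDU with data" default,
# using astropy.io.fits; illustrative only.
from astropy.io import fits
import numpy as np

def first_data_hdu(hdulist):
    """Return the index of the first HDU that actually carries data."""
    for i, hdu in enumerate(hdulist):
        if hdu.data is not None:
            return i
    raise ValueError("No HDU with data found")

# Small in-memory file: empty primary HDU, then an image HDU with data.
hdul = fits.HDUList([fits.PrimaryHDU(),
                     fits.ImageHDU(data=np.arange(4.0))])
print(first_data_hdu(hdul))  # -> 1
```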

@havok2063 (Contributor) left a comment:

This is a great start. One thing we might want to have a think on, or get some feedback from the specutils folks, is for the multi loaders, if we should use SpectrumList or Spectrum1D. If all the spectra in that file are on the same wavelength solution (and shape), we may want to use Spectrum1D as the container to take advantage of its numpy array operations. If they really are different, then we can stick with SpectrumList, which is just a regular python list.

Resolved review threads (outdated): mpl_preamble.py, .gitignore, specutils/io/default_loaders/sdss_v.py

"""
# Orderly with an OrderedDict
common_meta = OrderedDict([])
Contributor:

I think dicts are ordered by default now, as of python 3.7. So this could be common_meta = {}.
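For illustration, since Python 3.7 a plain dict preserves insertion order, so OrderedDict is no longer needed for ordered metadata:

```python
# Since Python 3.7, plain dicts preserve insertion order,
# so OrderedDict([]) can simply be replaced with {}.
from collections import OrderedDict

ordered = OrderedDict([("b", 1), ("a", 2)])
plain = {"b": 1, "a": 2}

# Both iterate in insertion order, not alphabetical order.
print(list(ordered))  # -> ['b', 'a']
print(list(plain))    # -> ['b', 'a']
```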

Contributor (author):

Should we keep it for backward compatibility? It needs to work on Python 3.4-3.6 as per the contributing guidelines.

Contributor (author):

We might need to keep it for backward compatibility with older Python versions (3.4-3.6) as per the guidelines, but I've changed all OrderedDict() metadata calls to dict() in b61a668.

Contributor:

I think those docs are out of date. The minimum Python version for specutils is 3.8, and Python 3.4-3.7 is already end-of-life.

Contributor:

Wow, I didn't realize our contributor docs were so out of date. I'll open a PR to fix that soon.

Comment on lines 61 to 80
for key in hdulist[0].header.keys():
    # exclude these keys
    if key.startswith(("TTYPE", "TFORM", "TDIM")) or key in (
        "",
        "COMMENT",
        "CHECKSUM",
        "DATASUM",
        "NAXIS",
        "NAXIS1",
        "NAXIS2",
        "XTENSION",
        "BITPIX",
        "PCOUNT",
        "GCOUNT",
        "TFIELDS",
    ):
        continue

    common_meta[key.lower()] = hdulist[0].header.get(key)  # add key to dict
Contributor:

This isn't documented very well, but the header is expected to be stored within meta under a key called header. I think we can also just dump the complete primary header here, something like meta['header'] = hdulist[0].header

All the other keys you've added can either be added at the top level of meta, or if you'd rather they live in the header, you can add them there. I think where they live only matters when writing out the Spectrum1D object to a file and reading it back in again.
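A minimal sketch of the suggestion above; the header keyword and its value here are made up for illustration:

```python
# Sketch: dump the whole primary header into meta['header'] instead of
# copying keys one by one, with optional top-level copies of useful keys.
from astropy.io import fits

hdulist = fits.HDUList([fits.PrimaryHDU()])
hdulist[0].header["TELESCOP"] = "SDSS 2.5-M"  # illustrative value

meta = {}
meta["header"] = hdulist[0].header                      # complete primary header
meta["telescope"] = hdulist[0].header.get("TELESCOP")   # optional top-level copy

print(meta["telescope"])  # -> SDSS 2.5-M
```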

Contributor:

Also see discussions in #617 and #1102

Contributor (author):

b61a668

This should be fine then. I haven't tested the tabular writer wcs1d_fits_writer, but I'd assume leaving this header as-is in the metadata would be fine.

mwmVisit/Star might fail on writing like that though, since it partially stores some metadata within the data component of the BinTableHDU (SNR, telescope, observation date, MJD), so I've left that in the meta dictionary for now.

Contributor:

I'm not too worried about people writing the Spectrum1D objects back to files. Currently, both writers would write out files that are not in the original format, and not that useful. Many of the SDSS products in here may not write correctly at the moment.

There is a longer ongoing discussion about improving the default writers to make more fits-like things. We could write custom writers for our loaders, but I think that's out of scope for this PR.

spectral_axis=spectral_axis,
flux=flux,
uncertainty=e_flux,
# mask=mask,
@havok2063 (Contributor) commented Nov 16, 2023:

I would advocate for including the mask where possible in all the loaders. The SDSS mask arrays are a bit different from the astropy masked arrays used in specutils, which are basic boolean True/False. See the SDSS manga loaders for examples of converting from an SDSS maskbit array to the boolean one that specutils expects. It's not ideal, but it's something.

Contributor (author):

fd999ce -- I'm not sure if this is how the bitmask works (do we consider 0 valid or invalid?), and also the coadd mask is done with two methods (AND and OR) for BOSS spectra.

Contributor:

for SDSS, a value of 0 is a good pixel (valid) and >0 is a bad pixel (invalid). The Spectrum1D mask attribute is an "array where values in the flux to be masked are those that astype(bool) converts to True. (For example, integer arrays are not masked where they are 0, and masked for any other value.)". For SDSS products that have a single mask array, flipping the condition like in the manga loader should work.

For the BOSS AND/OR masks, I think we can ask @Sean-Morrison whether we should use one or the other as the default input for the mask attribute, or if they should be combined, and if so, how?

Comment:

I would think for BOSS we want to use OR masks, as that would result in a cleaner plot, but we might want to check with @joelbrownstein for what we did in the old SDSS webapp

Contributor:

I think once we fix the mask arrays, this PR is ready to be marked as ready and reviewed by the larger specutils group. They should be able to give us more insight into our other questions!


Contributor:

@rileythai Others should perhaps comment, but I think we need to convert all the mask arrays explicitly to boolean True/False arrays, similar to what is done in manga. I believe a fair amount of other specutils methods/functions assume this and do things like s.flux[~s.mask] for selection of good values. If we retain the integer values, this selection won't work, and people need to explicitly do s.mask==False.
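A minimal sketch of the conversion being discussed, assuming the SDSS convention stated above (0 means a good pixel, any nonzero maskbit means bad). This is illustrative, not the PR's actual code:

```python
# Convert an SDSS integer maskbit array into the boolean mask that
# Spectrum1D expects, where True means "masked" (bad pixel).
import numpy as np

sdss_mask = np.array([0, 4, 0, 1024, 0])  # hypothetical maskbit values

# Nonzero maskbit -> bad pixel -> masked (True)
bool_mask = sdss_mask != 0

print(bool_mask)           # -> [False  True False  True False]
print((~bool_mask).sum())  # count of good pixels -> 3
```

With a boolean mask, selections like s.flux[~s.mask] work as other specutils methods expect.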

@rileythai (Contributor, author) commented:

This is a great start. One thing we might want to have a think on, or get some feedback from the specutils folks, is for the multi loaders, if we should use SpectrumList or Spectrum1D. If all the spectra in that file are on the same wavelength solution (and shape), we may want to use Spectrum1D as the container to take advantage of its numpy array operations. If they really are different, then we can stick with SpectrumList, which is just a regular python list.

The main issue is jdaviz. It won't load Spectrum1D objects with 2D flux arrays (not implemented yet). As you said, if someone who knows a little more about specutils + jdaviz could comment on this, it'd be great.

@havok2063 (Contributor) commented:

The main issue is jdaviz. It won't load Spectrum1D objects with 2D flux arrays (not implemented yet). As you said, if someone who knows a little more about specutils + jdaviz could comment on this, it'd be great.

If it's primarily a jdaviz issue, then it should be fixed there. We can open a new issue in jdaviz if there isn't one already. specutils has more utility outside of Jdaviz, so we want to make sure we're doing the "right" thing here. I'm not suggesting we change anything, just saying.

Comment on lines +164 to +167
# reduce flux array if 1D in 2D np array
# NOTE: bypasses jdaviz specviz NotImplementedError, but could be the expected output for specutils
if flux.shape[0] == 1:
    flux = flux[0]
    e_flux = e_flux[0]

Contributor:

I'm not sure we want any jdaviz-specific logic in here, since it's a downstream package and this could change the use of this apStar Spectrum1D outside of that context.

Contributor (author):

Would this be its expected behavior anyways? If there's only a single spectrum within the data file, is the flux expected to still be loaded as an array within an array (2D, so without this block), or just a 1D array (with this block). Just wondering what the convention for this is.

Contributor:

I think Spectrum1D just takes what you pass in as flux, so np.array([1,2,3,4]), and [np.array([1,2,3,4])] retain their shapes of (4,) and (1,4) respectively. My point is only that we shouldn't put logic in here that is specifically addressing something in Jdaviz, but if this is the behaviour we want for the Spectrum1D object, then that's certainly ok.

@rosteen (Contributor) commented Feb 13, 2024:

I think reducing a 2D flux array where one of the axes is degenerate to a 1D is reasonable, I would expect a 1D flux to come out of the reader for a 1D spectrum.
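The reduction under discussion amounts to flattening a degenerate leading axis; a minimal sketch with made-up values:

```python
# A single-spectrum file may store flux as shape (1, N); reduce it to (N,)
# so a 1D spectrum comes out of the reader as a 1D flux array.
import numpy as np

flux = np.array([[1.0, 2.0, 3.0, 4.0]])    # one spectrum stored 2D
e_flux = np.array([[0.1, 0.1, 0.2, 0.2]])

if flux.shape[0] == 1:
    flux = flux[0]
    e_flux = e_flux[0]

print(flux.shape)  # -> (4,)
```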

rileythai added a commit to rileythai/specutils-sdss-loaders that referenced this pull request Nov 23, 2023
fix as per @Sean-Morrison's suggestion in astropy pull req astropy#1107

could be reverted in future, in which case this commit can just be deleted
@rileythai rileythai marked this pull request as ready for review November 23, 2023 00:39
@havok2063 (Contributor) commented:

@rileythai I think there's something wrong with the SDSS-V identify functions on the loaders. The following code should correctly identify the format of this file either as SDSS-V spec multi or SDSS-V spec.

import specutils
from specutils.io.registers import identify_spectrum_format

file = '/Users/Brian/Work/sdss/sas/sdsswork/bhm/boss/spectro/redux/v6_0_6/spectra/lite/017057/59631/spec-017057-59631-27021598108289694.fits'

identify_spectrum_format(file, specutils.SpectrumList)
[]

This function basically loops over all the identifiers in the formats table, io_registry.get_formats() for the input object type, and returns the format of the best match. If this can return successfully, then Spectrum1D.read(file) will work without manually specifying the format.

Examples for MaNGA, and spec-lite for SDSS-IV.

# manga file
identify_spectrum_format("redux/v3_1_1/8485/stack/manga-8485-1901-LOGCUBE.fits.gz", specutils.Spectrum1D)
'MaNGA cube'

# eboss file
identify_spectrum_format("eboss/spectro/redux/v5_10_0/spectra/lite/3606/spec-3606-55182-0537.fits", specutils.SpectrumList)
'SDSS-III/IV spec'
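For illustration, an identifier of the kind identify_spectrum_format loops over typically just inspects the file's header for format-specific markers. The function name and the TELESCOP check below are assumptions for the sketch, not the actual SDSS-V identifier logic:

```python
# Hedged sketch of an identifier-style check: open the primary header and
# look for a format-specific keyword. The keyword tested here is purely
# illustrative; real identifiers test markers actually present in the files.
from astropy.io import fits

def is_sdss_v_spec(path):
    """Return True if the primary header looks like an SDSS-V spec file."""
    with fits.open(path) as hdulist:
        hdr = hdulist[0].header
        return hdr.get("TELESCOP", "").startswith("SDSS")
```

If every registered identifier returns False for a file (as happened with the OBSERVAT check above), identify_spectrum_format returns an empty list and Spectrum1D.read needs an explicit format.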

@rileythai (Contributor, author) commented:

@rileythai I think there's something wrong with the SDSS-V identify functions on the loaders. The following code should correctly identify the format of this file either as SDSS-V spec multi or SDSS-V spec.

I had it check for an OBSERVAT column in the primary HDU, which doesn't exist in the file you've used. I've removed it since it probably doesn't appear in other files (99eccef), so it should work now for other similar files.

In [1]: import specutils

In [2]: from specutils.io.registers import identify_spectrum_format

In [3]: path = "/home/riley/uni/rproj/data/"

In [4]: identify_spectrum_format(path + "spec-017057-59631-27021598108289694.fits", specutils.SpectrumList)
Out[4]: ['SDSS-V spec', 'SDSS-V spec multi']

@rosteen (Contributor) commented Feb 8, 2024:

Thanks for this contribution, I'm trying to make some time to review in the next week. In the meantime, would you resolve the conflict with main? Cheers.

@rosteen rosteen added the io label Feb 8, 2024
@rosteen rosteen added this to the v1.x milestone Feb 8, 2024
codecov bot commented Feb 12, 2024:

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (c646007) 71.11% compared to head (1fdb4a2) 72.21%.

Files                                      Patch %   Missing lines
specutils/io/default_loaders/sdss.py       21.21%    26
specutils/io/default_loaders/sdss_v.py     96.64%    6
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
+ Coverage   71.11%   72.21%   +1.10%     
==========================================
  Files          61       62       +1     
  Lines        4248     4427     +179     
==========================================
+ Hits         3021     3197     +176     
- Misses       1227     1230       +3     


working on loaders
new helper funcs:
 - _fetch_metadata to perform grab of common metadata
 - _fetch_flux_unit to get flux unit from the given HDU and convert it to an astropy unit object
can fully load boss files now

other changes:
- commented out abstract decorators so that IPython autoreload works

notes:
- flux unit where for BOSS files?
- what the heck are spAll, zAll, and zList HDUs?
- does the InverseVariance need a unit?
added mwmVisit and mwmStar loaders.

updated demonstration notebook accordingly and output to PDF

del: test.pdf and test.ipynb

del: secret SDSS-V data
able to now load all BOSS spec directly with the same underlying code.

required refactoring methods into BOSS_spec loaders
going to now write implementation test

add
mwm confirmed working

still todo:
- add HDU not-specified message
- merge the mwm types into two loaders
- confirm all other loaders work and add to __all__
- refactored BOSS spec methods and mwm spec methods into single functions for simplicity
- all loaders WORKING!! (except apStar multi)
- all the documentation + type hinting (excluding outputs)
- changed variable names to standard types used in specutils
- TODO: the apStar multi-loader is confusing, so it remains unimplemented for now.
- CHECK: do I need to clean the files of zero vals?
- TODO: BUNIT pulls for spec and mwm files
- TODO: check with data team what mwm files are needed
- currently non-functional because of zero values in x-axis

- deleted test_implementation.ipynb for policy reasons
- jdaviz hates nan and zero flux, so they have to be removed
- TODO: open issue on jdaviz repo about nan and zero flux bug

the bug originates in the x_min x_max values used for the redshift slider for line_lists (somehow) on nan and zero flux values in the Spectrum object.
apStar loader not yet tested because file is of length 1 (no visits)
mwm loaders will SKIP any DATASUM=0 because Spectrum1D cannot be instantiated with zero data
rileythai and others added 17 commits February 13, 2024 11:32
fixes a jdaviz issue regarding a 1D flux in a 2D object, where it gets confused and explodes

i will put an issue in for it

this fix is different from the previous as it keeps all zero and NaN flux points
need someone to help me write a BinTableHDU for mwm files...
still need to write mwm dummy file for the tests

there's also a foobar variable check for the metadata
now obtains header from PrimaryHDU in the HDUList, any data that was previously accessed through it has been removed too
keeping .jukit in case anyone else uses vim-jukit during dev
Spectrum1D initializer converts any 0 to valid values. I'm assuming that zeroes in the bitmask mean the pixel is valid, as per manga.py
fix as per @Sean-Morrison 's suggestion in astropy pull req [astropy#1107](astropy#1107)

could be reverted in future, in which case this commit can just be deleted
OBSERVAT column is not in everything, so I changed it, also adding another LOGLAM check to the coadd HDU check.
instead of specifying an HDU on Spectrum1D loaders for spec and mwm types, it will now find the first HDU with data, or in the case of spec, just use the coadd.

this means that it works directly with jdaviz for those two datatypes correctly now.

there are no user-facing methods, and I don't want to break anything, but it should be noted that these datafiles can contain several spectra, which inherently limits this.

in theory, I could put everything as a Spectrum1D nD flux object, but I'm pretty sure that breaks sometimes for jdaviz.
- force masks to be boolean prior to entering initializer
- add mwm file tests based on dummy file (credit to @andycasey for those dummy file generators)
- add more mwm file tests for failures
- added checks to see if file is empty for mwm files based on datasum (failsafe)
@rosteen (Contributor) left a comment:

Looks good to me, thanks (and double thanks for adding tests)!

4 participants