
add: new SDSS V datatype loaders #1107

Merged: 31 commits, Feb 13, 2024
Conversation

@rileythai (Contributor) commented Nov 15, 2023:

Within the 5th generation of the SDSS, there are several new data products, and many of the access methods for existing datatypes have been changed.

This pull request will add new default loaders for the new/updated SDSS-V spectra data products in a new file called sdss_v.py, along with unit tests in test_sdss_v.py, which will allow for automatic loading of SDSS-V data into Spectrum1D/List objects, and directly into jdaviz.

For mwm and spec files, the first HDU with data is loaded with the Spectrum1D loader, whereas all spectra within a file are loaded with the SpectrumList loader.

Remaining questions:

  • How can/should a user specify a given spectrum from files with stacked spectra?
    • mwm files can contain multiple stacked spectra from the APOGEE and BOSS spectrographs. spec-full files can contain both the coadd and individual visits.
    • I've left it to default to the coadd (HDU1) for spec files, and the first HDU with data for mwm files.
  • SpectrumList inherits a basic reading method from the Spectrum1D methods. I don't want it to, because some access methods will encounter non-spectra when we want to import everything.
  • jdaviz does not yet have an implementation for Spectrum1D objects with nD flux. Do we want to accommodate that here? Would that be its expected behaviour anyway (i.e. only one spectrum per Spectrum1D object, not several flux arrays)?
    • Spectra in mwmVisit files are stacked, similarly to apStar files. If we keep the current access methods, accessing a single spectrum would require a double specification (the observatory/instrument plus the specific spectrum from that obs/instrument).
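The "first HDU with data" default described above can be sketched roughly as follows. This is an illustrative helper built on astropy.io.fits; the function name and structure are assumptions for the sketch, not the PR's actual code:

```python
# Hypothetical sketch of the "first HDU with data" default,
# using astropy.io.fits; illustrative only.
from astropy.io import fits
import numpy as np

def first_data_hdu(hdulist):
    """Return the index of the first HDU that actually carries data."""
    for i, hdu in enumerate(hdulist):
        if hdu.data is not None:
            return i
    raise ValueError("No HDU with data found")

# Small in-memory file: empty primary HDU, then an image HDU with data.
hdul = fits.HDUList([fits.PrimaryHDU(),
                     fits.ImageHDU(data=np.arange(4.0))])
print(first_data_hdu(hdul))  # -> 1
```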

@havok2063 (Contributor) left a comment:

This is a great start. One thing we might want to have a think on, or get some feedback from the specutils folks, is for the multi loaders, if we should use SpectrumList or Spectrum1D. If all the spectra in that file are on the same wavelength solution (and shape), we may want to use Spectrum1D as the container to take advantage of its numpy array operations. If they really are different, then we can stick with SpectrumList, which is just a regular python list.

Resolved review threads (outdated): mpl_preamble.py, .gitignore, specutils/io/default_loaders/sdss_v.py

"""
# Orderly with an OrderedDict
common_meta = OrderedDict([])
Contributor:

I think dicts are ordered by default now, as of python 3.7. So this could be common_meta = {}.
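For illustration, since Python 3.7 a plain dict preserves insertion order, so OrderedDict is no longer needed for ordered metadata:

```python
# Since Python 3.7, plain dicts preserve insertion order,
# so OrderedDict([]) can simply be replaced with {}.
from collections import OrderedDict

ordered = OrderedDict([("b", 1), ("a", 2)])
plain = {"b": 1, "a": 2}

# Both iterate in insertion order, not alphabetical order.
print(list(ordered))  # -> ['b', 'a']
print(list(plain))    # -> ['b', 'a']
```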

Contributor (author):

Should we keep it for backward compatibility? It needs to work on Python 3.4-3.6 as per the contributing guidelines.

Contributor (author):

We might need to keep it for backward compatibility with older Python versions (3.4-3.6) as per the guidelines, but I've changed all OrderedDict() metadata calls to dict() in b61a668.

Contributor:

I think those docs are out of date. The minimum Python version for specutils is 3.8, and Python 3.4-3.7 is already end-of-life.

Contributor:

Wow, I didn't realize our contributor docs were so out of date. I'll open a PR to fix that soon.

Comment on lines 61 to 80
for key in hdulist[0].header.keys():
    # exclude these keys
    if key.startswith(("TTYPE", "TFORM", "TDIM")) or key in (
        "",
        "COMMENT",
        "CHECKSUM",
        "DATASUM",
        "NAXIS",
        "NAXIS1",
        "NAXIS2",
        "XTENSION",
        "BITPIX",
        "PCOUNT",
        "GCOUNT",
        "TFIELDS",
    ):
        continue

    common_meta[key.lower()] = hdulist[0].header.get(key)  # add key to dict
Contributor:

This isn't documented very well, but the header is expected to be stored within meta under a key called header. I think we can also just dump the complete primary header here, something like meta['header'] = hdulist[0].header

All the other keys you've added can either be added at the top level of meta, or if you'd rather they live in the header, you can add them there. I think where they live only matters when writing out the Spectrum1D object to a file and reading it back in again.
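A minimal sketch of the suggestion above; the header keyword and its value here are made up for illustration:

```python
# Sketch: dump the whole primary header into meta['header'] instead of
# copying keys one by one, with optional top-level copies of useful keys.
from astropy.io import fits

hdulist = fits.HDUList([fits.PrimaryHDU()])
hdulist[0].header["TELESCOP"] = "SDSS 2.5-M"  # illustrative value

meta = {}
meta["header"] = hdulist[0].header                      # complete primary header
meta["telescope"] = hdulist[0].header.get("TELESCOP")   # optional top-level copy

print(meta["telescope"])  # -> SDSS 2.5-M
```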

Contributor:

Also see discussions in #617 and #1102

Contributor (author):

b61a668

This should be fine then. I haven't tested the tabular writer wcs1d_fits_writer, but I'd assume leaving this header as-is in the metadata would be fine.

mwmVisit/Star might fail on writing like that though, since it partially stores some metadata within the data component of the BinTableHDU (SNR, telescope, observation date, MJD), so I've left that in the meta dictionary for now.

Contributor:

I'm not too worried about people writing the Spectrum1D objects back to files. Currently, both writers would write out files that are not in the original format, and not that useful. Many of the SDSS products in here may not write correctly at the moment.

There is a longer ongoing discussion about improving the default writers to make more fits-like things. We could write custom writers for our loaders, but I think that's out of scope for this PR.

spectral_axis=spectral_axis,
flux=flux,
uncertainty=e_flux,
# mask=mask,
@havok2063 (Contributor) commented Nov 16, 2023:

I would advocate for including the mask where possible in all the loaders. The SDSS mask arrays are a bit different from the astropy masked arrays used in specutils, which are basic boolean True/False. See the SDSS manga loaders for examples of converting from an SDSS maskbit array to the boolean one that specutils expects. It's not ideal, but it's something.

Contributor (author):

fd999ce -- I'm not sure if this is how the bitmask works (do we consider 0 valid or invalid?), and also the coadd mask is done with two methods (AND and OR) for BOSS spectra.

Contributor:

for SDSS, a value of 0 is a good pixel (valid) and >0 is a bad pixel (invalid). The Spectrum1D mask attribute is an "array where values in the flux to be masked are those that astype(bool) converts to True. (For example, integer arrays are not masked where they are 0, and masked for any other value.)". For SDSS products that have a single mask array, flipping the condition like in the manga loader should work.

For the BOSS AND/OR masks, I think we can ask @Sean-Morrison whether we should use one or the other as the default input for the mask attribute, or if they should be combined, and if so, how?

Comment:

I would think for BOSS we want to use OR masks, as that would result in a cleaner plot, but we might want to check with @joelbrownstein for what we did in the old SDSS webapp

Contributor:

I think once we fix the mask arrays, this PR is ready to be marked as ready and reviewed by the larger specutils group. They should be able to give us more insight into our other questions!


Contributor:

@rileythai Others should perhaps comment, but I think we need to convert all the mask arrays explicitly to boolean True/False arrays, similar to what is done in manga. I believe a fair amount of other specutils methods/functions assume this and do things like s.flux[~s.mask] for selection of good values. If we retain the integer values, this selection won't work, and people need to explicitly do s.mask==False.
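A minimal sketch of the conversion being discussed, assuming the SDSS convention stated above (0 means a good pixel, any nonzero maskbit means bad). This is illustrative, not the PR's actual code:

```python
# Convert an SDSS integer maskbit array into the boolean mask that
# Spectrum1D expects, where True means "masked" (bad pixel).
import numpy as np

sdss_mask = np.array([0, 4, 0, 1024, 0])  # hypothetical maskbit values

# Nonzero maskbit -> bad pixel -> masked (True)
bool_mask = sdss_mask != 0

print(bool_mask)           # -> [False  True False  True False]
print((~bool_mask).sum())  # count of good pixels -> 3
```

With a boolean mask, selections like s.flux[~s.mask] work as other specutils methods expect.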

@rileythai (Contributor, author) commented:

This is a great start. One thing we might want to have a think on, or get some feedback from the specutils folks, is for the multi loaders, if we should use SpectrumList or Spectrum1D. If all the spectra in that file are on the same wavelength solution (and shape), we may want to use Spectrum1D as the container to take advantage of its numpy array operations. If they really are different, then we can stick with SpectrumList, which is just a regular python list.

The main issue is jdaviz. It won't load Spectrum1D objects with 2D flux arrays (not implemented yet). As you said, if someone who knows a little more about specutils + jdaviz could comment on this, it'd be great.

@havok2063 (Contributor) commented:

The main issue is jdaviz. It won't load Spectrum1D objects with 2D flux arrays (not implemented yet). As you said, if someone who knows a little more about specutils + jdaviz could comment on this, it'd be great.

If it's primarily a jdaviz issue, then it should be fixed there. We can open a new issue in jdaviz if there isn't one already. specutils has more utility outside of Jdaviz, so we want to make sure we're doing the "right" thing here. I'm not suggesting we change anything, just saying.

Comment on lines +164 to +167
# reduce flux array if 1D in 2D np array
# NOTE: bypasses jdaviz specviz NotImplementedError, but could be the expected output for specutils
if flux.shape[0] == 1:
    flux = flux[0]
    e_flux = e_flux[0]

Contributor:

I'm not sure we want any jdaviz-specific logic in here, since it's a downstream package and this could change the use of this apStar Spectrum1D outside of that context.

Contributor (author):

Would this be its expected behavior anyways? If there's only a single spectrum within the data file, is the flux expected to still be loaded as an array within an array (2D, so without this block), or just a 1D array (with this block). Just wondering what the convention for this is.

Contributor:

I think Spectrum1D just takes what you pass in as flux, so np.array([1,2,3,4]), and [np.array([1,2,3,4])] retain their shapes of (4,) and (1,4) respectively. My point is only that we shouldn't put logic in here that is specifically addressing something in Jdaviz, but if this is the behaviour we want for the Spectrum1D object, then that's certainly ok.

@rosteen (Contributor) commented Feb 13, 2024:

I think reducing a 2D flux array where one of the axes is degenerate to a 1D is reasonable, I would expect a 1D flux to come out of the reader for a 1D spectrum.
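The reduction under discussion amounts to flattening a degenerate leading axis; a minimal sketch with made-up values:

```python
# A single-spectrum file may store flux as shape (1, N); reduce it to (N,)
# so a 1D spectrum comes out of the reader as a 1D flux array.
import numpy as np

flux = np.array([[1.0, 2.0, 3.0, 4.0]])    # one spectrum stored 2D
e_flux = np.array([[0.1, 0.1, 0.2, 0.2]])

if flux.shape[0] == 1:
    flux = flux[0]
    e_flux = e_flux[0]

print(flux.shape)  # -> (4,)
```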

rileythai added a commit to rileythai/specutils-sdss-loaders that referenced this pull request Nov 23, 2023
fix as per @Sean-Morrison's suggestion in astropy pull req astropy#1107

could be reverted in future, in which case this commit can just be deleted
@rileythai rileythai marked this pull request as ready for review November 23, 2023 00:39
@havok2063 (Contributor) commented:

@rileythai I think there's something wrong with the SDSS-V identify functions on the loaders. The following code should correctly identify the format of this file either as SDSS-V spec multi or SDSS-V spec.

import specutils
from specutils.io.registers import identify_spectrum_format

file = '/Users/Brian/Work/sdss/sas/sdsswork/bhm/boss/spectro/redux/v6_0_6/spectra/lite/017057/59631/spec-017057-59631-27021598108289694.fits'

identify_spectrum_format(file, specutils.SpectrumList)
[]

This function basically loops over all the identifiers in the formats table, io_registry.get_formats() for the input object type, and returns the format of the best match. If this can return successfully, then Spectrum1D.read(file) will work without manually specifying the format.

Examples for MaNGA, and spec-lite for SDSS-IV.

# manga file
identify_spectrum_format("redux/v3_1_1/8485/stack/manga-8485-1901-LOGCUBE.fits.gz", specutils.Spectrum1D)
'MaNGA cube'

# eboss file
identify_spectrum_format("eboss/spectro/redux/v5_10_0/spectra/lite/3606/spec-3606-55182-0537.fits", specutils.SpectrumList)
'SDSS-III/IV spec'
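For illustration, an identifier of the kind identify_spectrum_format loops over typically just inspects the file's header for format-specific markers. The function name and the TELESCOP check below are assumptions for the sketch, not the actual SDSS-V identifier logic:

```python
# Hedged sketch of an identifier-style check: open the primary header and
# look for a format-specific keyword. The keyword tested here is purely
# illustrative; real identifiers test markers actually present in the files.
from astropy.io import fits

def is_sdss_v_spec(path):
    """Return True if the primary header looks like an SDSS-V spec file."""
    with fits.open(path) as hdulist:
        hdr = hdulist[0].header
        return hdr.get("TELESCOP", "").startswith("SDSS")
```

If every registered identifier returns False for a file (as happened with the OBSERVAT check above), identify_spectrum_format returns an empty list and Spectrum1D.read needs an explicit format.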

@rileythai (Contributor, author) commented:

@rileythai I think there's something wrong with the SDSS-V identify functions on the loaders. The following code should correctly identify the format of this file either as SDSS-V spec multi or SDSS-V spec.

I had it check for an OBSERVAT column in the primary HDU, which doesn't exist in the file you've used. I've removed it since it probably doesn't appear in other files (99eccef), so it should work now for other similar files.

In [1]: import specutils

In [2]: from specutils.io.registers import identify_spectrum_format

In [3]: path = "/home/riley/uni/rproj/data/"

In [4]: identify_spectrum_format(path + "spec-017057-59631-27021598108289694.fits", specutils.SpectrumList)
Out[4]: ['SDSS-V spec', 'SDSS-V spec multi']

@rosteen (Contributor) commented Feb 8, 2024:

Thanks for this contribution, I'm trying to make some time to review in the next week. In the meantime, would you resolve the conflict with main? Cheers.

@rosteen rosteen added the io label Feb 8, 2024
@rosteen rosteen added this to the v1.x milestone Feb 8, 2024
codecov bot commented Feb 12, 2024:

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (c646007) 71.11% compared to head (1fdb4a2) 72.21%.

Files                                      Patch %   Missing lines
specutils/io/default_loaders/sdss.py       21.21%    26
specutils/io/default_loaders/sdss_v.py     96.64%    6
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
+ Coverage   71.11%   72.21%   +1.10%     
==========================================
  Files          61       62       +1     
  Lines        4248     4427     +179     
==========================================
+ Hits         3021     3197     +176     
- Misses       1227     1230       +3     


working on loaders
new helper funcs:
 - _fetch_metadata to perform grab of common metadata
 - _fetch_flux_unit to get flux unit from the given HDU and convert it to an astropy unit object
can fully load boss files now

other changes:
- commented out abstract decorators so that IPython autoreload works

notes:
- flux unit where for BOSS files?
- what the heck are spAll, zAll, and zList HDUs?
- does the InverseVariance need a unit?
added mwmVisit and mwmStar loaders.

updated demonstration notebook accordingly and output to PDF

del: test.pdf and test.ipynb

del: secret SDSS-V data
able to now load all BOSS spec directly with the same underlying code.

required refactoring methods into BOSS_spec loaders
going to now write implementation test

add
mwm confirmed working

still todo:
- add HDU not-specified message
- merge the mwm types into two loaders
- confirm all other loaders work and add to __all__
- refactored BOSS spec methods and mwm spec methods into single functions for simplicity
- all loaders WORKING!! (except apStar multi)
- all the documentation + type hinting (excluding outputs)
- changed variable names to standard types used in specutils
- TODO: the apStar multi-loader is confusing, so it remains unimplemented for now.
- CHECK: do I need to clean the files of zero vals?
- TODO: BUNIT pulls for spec and mwm files
- TODO: check with data team what mwm files are needed
- currently non-functional because of zero values in x-axis

- deleted test_implementation.ipynb for policy reasons
- jdaviz hates nan and zero flux, so they have to be removed
- TODO: open issue on jdaviz repo about nan and zero flux bug

the bug originates in the x_min x_max values used for the redshift slider for line_lists (somehow) on nan and zero flux values in the Spectrum object.
apStar loader not yet tested because file is of length 1 (no visits)
mwm loaders will SKIP any DATASUM=0 because Spectrum1D cannot be instantiated with zero data
rileythai and others added 17 commits February 13, 2024 11:32
fixes a jdaviz issue regarding a 1D flux in a 2D object, where it gets confused and explodes

i will put an issue in for it

this fix is different from the previous as it keeps all zero and NaN flux points
need someone to help me write a BinTableHDU for mwm files...
still need to write mwm dummy file for the tests

there's also a foobar variable check for the metadata
now obtains header from PrimaryHDU in the HDUList, any data that was previously accessed through it has been removed too
keeping .jukit in case anyone else uses vim-jukit during dev
Spectrum1D initializer converts any 0 to valid values. I'm assuming that zeroes in the bitmask mean the pixel is valid, as per manga.py
fix as per @Sean-Morrison 's suggestion in astropy pull req [astropy#1107](astropy#1107)

could be reverted in future, in which case this commit can just be deleted
OBSERVAT column is not in everything, so I changed it, also adding another LOGLAM check to the coadd HDU check.
instead of specifying an HDU on Spectrum1D loaders for spec and mwm types, it will now find the first HDU with data, or in the case of spec, just use the coadd.

this means that it works directly with jdaviz for those two datatypes correctly now.

there are no user-facing methods, and I don't want to break anything, but it should be noted that these datafiles can contain several spectra, which inherently limits this.

in theory, I could put everything as a Spectrum1D nD flux object, but I'm pretty sure that breaks sometimes for jdaviz.
- force masks to be boolean prior to entering initializer
- add mwm file tests based on dummy file (credit to @andycasey for those dummy file generators)
- add more mwm file tests for failures
- added checks to see if file is empty for mwm files based on datasum (failsafe)
@rosteen (Contributor) left a comment:

Looks good to me, thanks (and double thanks for adding tests)!

4 participants