
Add 6dFGS loader #608

Merged: 4 commits merged into astropy:master from aragilar's add_6dfgs_tabular branch on Apr 23, 2020
Conversation

@aragilar (Contributor)

This adds support for loading 6dFGS files that were stored in the tabular variant.

Some questions:

  • What's the best way of uploading test data for the tests? Currently there is no stable place for downloading the files (I can provide example files, though). I'll add tests once I know where I can put the data files.
  • I'm unsure where the tox configuration is stored; I'm currently running into problems with tests not finding Cython (even though it is installed).

@aragilar force-pushed the add_6dfgs_tabular branch 2 times, most recently from 37d2fc5 to fe7f02e, on March 18, 2020 01:26
@dhomeier (Contributor)

  • What's the best way of uploading test data for the tests? Currently there is no stable place for downloading the files (I can provide example files, though). I'll add tests once I know where I can put the data files.

You mean you don't have a permanent download link for the files?
There are a number of test files uploaded to Zenodo for just these purposes (see tests/conftest.py and tests/test_default_loaders.py for more details). I don't know if it is possible to add files to one of the existing archives, or if you could simply upload them into your own archive and pass the location with the respective
@remote_access([{'id': file_id, 'filename': file_name}])

Otherwise I guess for small files it would be acceptable to create e.g. an io/default_loaders/tests/data subdirectory, analogously to the astropy test file setup, but I'd wait for one of the maintainers' opinions.
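
For illustration, a remote-data test along those lines might look roughly like the sketch below. The Zenodo record id, file name, loader format name and the way the decorator hands the downloaded path to the test function are all assumptions here, so the actual setup in tests/conftest.py should be checked first.

from specutils import Spectrum1D
from .conftest import remote_access  # assumed import path; see tests/conftest.py


@remote_access([{'id': '1234567', 'filename': 'example_6dfgs_tabular.fits'}])  # placeholder Zenodo entry
def test_sixdfgs_tabular_loader(remote_data_path):
    # The fixture argument name and the format string are illustrative only.
    spec = Spectrum1D.read(remote_data_path, format="6dFGS-tabular")
    assert spec.flux.size > 0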

@dhomeier (Contributor)

  • I'm unsure where the tox configuration is stored; I'm currently running into problems with tests not finding Cython (even though it is installed).

Have you tried running pytest directly? The default options (like --remote-data=any) are already set in setup.cfg; I'd think that if Cython is in your standard path this should work.

@aragilar (Contributor Author)

I managed to run setup.py test within a venv where I installed all the deps, so I can now run the tests.

There's no permanent place yet. I'm currently working on getting a large number of older surveys ingested into Data Central, but it will take some time before there's a permanent and reliable host for the data.

extensions=["fit", "fits"])
def sixdfgs_tabular_fits_loader(filename, **kwargs):
"""
Load the tabular variant of a 6dFGS file
Contributor:

Could you provide a bit more detail on the format here (just a few words)? I vaguely know the 6 degree surveys, but a brief characterisation for the user would be nice.

    The 6dF spectrum that is represented by the data in this table.
    """
    with fits.open(filename) as hdulist:
        header = hdulist[0]
Contributor:

Don't you want hdulist[0].header here?
That aside, there is the question of which header really is the best one to store. I am just realising I had not given this further thought when revising the MUSCLES loader, which is similarly organised.
I don't know the specific content here, but the primary HDU header will usually contain general information on the target, observing run etc., while the HDU with the spectrum (here the 1st extension) will include more specific info on this dataset. In principle both could be interesting, so I could imagine two solutions:

  1. First get the primary HDU header, and then update it from the extension:
     header = hdulist[0].header
     header.update(hdulist[1].header)
  2. Revise the meta info to include additional header1 etc. keys (a rough sketch of this option follows below).

But this is of sufficient general interest that I'd better open a separate issue about it.
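
For concreteness, option 2 could look something like the sketch below; the extra key name is only an assumption, not an agreed convention.

from astropy.io import fits

with fits.open(filename) as hdulist:        # `filename` as passed to the loader
    meta = {"header": hdulist[0].header,    # primary HDU header
            "header1": hdulist[1].header}   # table extension header under its own key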

Contributor Author:

Ah, yeah, typo on my part. :)

The original 6dFGS Final Data Release released the observations in two formats: the tabular one (which this loader loads), which only contained flux and wavelength (no uncertainty), and one where the reduced spectra (with sky/uncertainty information, both for the separate arms and combined) were included together with 2MASS and SuperCOSMOS (I think?) postage stamps. Then GAMA came along and only used the combined and best version (so a single HDU). And now the plan for Data Central is to split all the 6dFGS spectra into their own individual files (avoiding the need to keep downloading postage stamps, and making it easier to understand what data each object has).

I thought about including the header from the actual table HDU, but there's no metadata there (other than the table metadata), so rather than adding another key/dict pair, I left it out. Happy to include it for completeness' sake, though.

Contributor:

If the table HDU header really contains no information of interest (you could also check table.meta below, which will only preserve non-trivial header cards), then there is probably no need to add it. #617 has raised a few more questions on how to organise the header info inside meta.
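
For reference, a quick way to see what the table HDU would actually contribute, assuming the spectrum table sits in the first extension as in this loader (`filename` stands in for the file being inspected):

from astropy.table import Table

table = Table.read(filename, hdu=1)  # hdu=1: the table extension
print(table.meta)  # only non-trivial header cards survive into meta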

    with fits.open(filename) as hdulist:
        header = hdulist[0]
        table = QTable(hdulist[1])
        flux = table["FLUX"]
Contributor:

Just a note that the existing loaders all get along without importing QTable; when reading as a regular Table, the values just need to be passed as
flux = Quantity(table['FLUX'])
and so on. I think this might carry slightly less overhead, but there is no official recommendation either way, so it's up to your taste.
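
A minimal sketch of that pattern, reading straight from the file for brevity; the wavelength column name is a guess here, and any column units are picked up automatically if present.

from astropy.table import Table
from astropy.units import Quantity

table = Table.read(filename, hdu=1)   # plain Table rather than QTable; filename is the 6dFGS file
flux = Quantity(table["FLUX"])        # wraps the column as a Quantity
wavelength = Quantity(table["WAVE"])  # column name is an assumption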

Contributor Author:

Oh, cool, I didn't know I could do that. I'll match the other loaders for consistency then.

    flux._unit = Unit("count/s")
    meta = {"header": header}

    return Spectrum1D(flux=flux, spectral_axis=wavelegth, meta=meta)
Contributor:

Typo "wavelength"?

Is there no uncertainty information of any kind stored in the data? If at all possible, the Spectrum1D.uncertainty should be filled as well.
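
For reference, attaching an uncertainty to a Spectrum1D looks roughly like the sketch below, using dummy arrays since the tabular 6dFGS files carry no uncertainty at all.

import numpy as np
import astropy.units as u
from astropy.nddata import StdDevUncertainty
from specutils import Spectrum1D

wavelength = np.linspace(4000, 7500, 100) * u.AA
flux = np.random.random(100) * u.Unit("count/s")
variance = np.random.random(100)  # stand-in for a variance array from the file

spec = Spectrum1D(flux=flux, spectral_axis=wavelength,
                  uncertainty=StdDevUncertainty(np.sqrt(variance)))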

Contributor Author:

Oops, thanks.

Yeah, the table variant didn't include uncertainty data, unfortunately.

@aragilar (Contributor Author)

I'll look at uploading a few of the FITS files to Zenodo, and then add some tests.

Do the APOGEE tests normally time out? I'm noticing that they are the failing tests on Travis.

@aragilar (Contributor Author)

Sorry for not getting back to this earlier; I've added the test and fixed the other issues.

@dhomeier (Contributor)

dhomeier commented Apr 1, 2020

Looks good; I agree that in your case the only meta information of interest is in the primary HDU's header.

Thanks for pointing to the IMAGE-format files; that data model turned out to be of interest as I am adding a WCS spectrum writer in #632, but I have not implemented any functionality for storing uncertainty!
If it is of interest, the updated loader from that PR would also be able to read these spectra with format='wcs1d-fits', hdu_idx=5 and so on, so in principle the uncertainties from there should be accessible as StdDevUncertainty(np.sqrt(spectrum.flux.value[1])).
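
Purely as an illustration of that suggestion: the hdu_idx keyword is only proposed in #632 and may change, and the file name and HDU layout below are placeholders.

import numpy as np
from astropy.nddata import StdDevUncertainty
from specutils import Spectrum1D

# Read one of the IMAGE-format 6dFGS files via the (proposed) wcs1d-fits loader.
spectrum = Spectrum1D.read("some_6dfgs_image_file.fits", format="wcs1d-fits",
                           hdu_idx=5)
# Per the comment above, row 1 of the data would hold the variance, so the
# standard deviation could be reconstructed like this:
uncertainty = StdDevUncertainty(np.sqrt(spectrum.flux.value[1]))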

@dhomeier (Contributor)

dhomeier commented Apr 8, 2020

Have you made any changes in the last commit? I got a notification some 3 days ago, but the commit it referred to seems to have already been deleted again.
Generally it looks like you have completely reset all previous commits before force-pushing a new version here. This makes it almost impossible to keep track of the changes as your PR evolved; I suggest checking the recommendations on rebasing and squashing.

@aragilar (Contributor Author)

aragilar commented Apr 9, 2020

Odd, I only rebased this at the same time as #633 (as I figured it'd have the same CI failure otherwise). The code hasn't changed since I added the tests.

@dhomeier (Contributor)

dhomeier commented Apr 9, 2020

Strange – yes, the rebase after 8339d8a was in place to fix the build_docs failure, but I see a total of 6 force-pushes and only a single commit in the entire PR.

@dhomeier (Contributor) left a comment:

I have added some more extensive change suggestions that would bring this loader in line with #637 by making all loaders accept file names, file-like objects or HDUList objects alike.

specutils/tests/test_loaders.py (review thread resolved)
@aragilar (Contributor Author)

Cool, I've applied the suggestions (it's probably worth pulling the things from #637 into a common decorator, so that support for the different types doesn't require custom code each time).
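
For what it's worth, such a shared decorator could look roughly like the sketch below; this is only an illustration of the idea, not code from #637, and the name and behaviour are hypothetical.

import functools
from astropy.io import fits


def accepts_fits_input(loader):
    """Let a loader written against an HDUList also accept paths or file objects."""
    @functools.wraps(loader)
    def wrapper(file_obj, **kwargs):
        if isinstance(file_obj, fits.HDUList):
            return loader(file_obj, **kwargs)
        with fits.open(file_obj) as hdulist:
            return loader(hdulist, **kwargs)
    return wrapper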

@aragilar (Contributor Author)

I've added the other format variants that 6dFGS can come in. It looks like the tests are again failing due to a reference issue in the docs; should I rebase this again?

@dhomeier (Contributor)

dhomeier commented Apr 17, 2020

I've added the other format variants that 6dFGS can come in. It looks like the tests are again failing due to a reference issue in the docs; should I rebase this again?

I think we can (have to) ignore the build_docs failures for now; from #655 there is no consistent fix for this yet anyway, so no point in rebasing, thanks!

@nmearl (Contributor)

nmearl commented Apr 22, 2020

This looks pretty good to me -- is there any more intended work on the loaders?

@aragilar (Contributor Author)

No, these should cover all the possible outputs I know to exist for 6dFGS.

@nmearl merged commit 4afeee0 into astropy:master on Apr 23, 2020
@aragilar deleted the add_6dfgs_tabular branch on June 2, 2020 at 01:46