Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datalink getdataset too eager #554

Open
msdemlei opened this issue Jun 13, 2024 · 2 comments
Open

Datalink getdataset too eager #554

msdemlei opened this issue Jun 13, 2024 · 2 comments

Comments

@msdemlei
Copy link
Contributor

Consider this script:

import pyvo

ACCESS_URL = "http://dc.g-vo.org/lswscans/res/positions/siap/siap.xml?"
svc = pyvo.sia.SIAService(ACCESS_URL)
images = svc.search((11,35), size=(0.1, 0.1))
images[0].cachedataset()

Warning: This will download at least 100 Megabytes with current pyVO.

It shouldn't, though.

This is one of the SIA1 services that return cutouts as access_urls. However, there is a datalink service on the result, too. That service only uses the full data ids, not any cutout instructions. That is, #this can be a FITS of 1 GB or so, whereas the accref can point to just a few epsilons.

Now, you can argue that if I can't make #this exactly the dataset that is described in the SIA row then I shouldn't make the datalink in the first place; but I claim this arrangement is useful in practice (e.g., people can adjust their cutouts, or pull the full image if that's what they want) and at least not totally wrong.

On the other hand, pyVO's current behaviour, that is, to prefer datalink's #this over the access reference given in the SIAP response, is weird (to say the least). I'm not even sure why we're doing it. It seems the original code was written by @funbaker, who I think has dropped out of pyVO development, so I don't think we can ask him.

Anyway, I'd suggest to prefer SIAP's access reference and only fall back to datalink if that's not going anywhere. Technically, I'd simply turn around the logic in adhoc.DatalinkRecordMixin.getdataset: try super() first and only if that exceptions out try datalink's #this.

Does anyone feel that would be grossly wrong?

@andamian
Copy link
Contributor

andamian commented Jun 13, 2024

Can the intersect argument be used to infer user's intentions? ENCLOSED probably means cutout whereas COVERS and CENTER does not. Not sure about OVERLAPS. But even then, there's the question of what is the default. But at least we could make it clear what it defaults to (right now is None)

@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 14, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants