-
-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sane behaviour for servicetype="image" #429
Comments
I'm very much leaning towards option 3) and leave it to the user to decide which one to use. Then the second choice is 5). But I fully agree that it's something to be raised for input from the archives, especially from those who decide to serve both versions of the protocol. I would expect while some may prefer SIA v1, others very much prefer SIA v2. |
Btw, this discussion is probably not limited to python/pyvo, but also I wonder whether it should be the same for e.g. the Gavo webform of the registry when one selects "Image Access"? Which options does it do currently? |
On Thu, Mar 02, 2023 at 11:23:39AM -0800, Brigitta Sipőcz wrote:
Btw, this discussion is probably not limited to python/pyvo, but
also I wonder whether it should be the same for e.g. the Gavo
webform of the registry when one selects "Image Access"? Which
options does it do currently?
WIRR (the gavo webform) uses standard_id LIKE '<stdid>%', which does
not make a difference between sia1 and sia2. For WIRR, that's
perhaps not a disaster, since people will manually shove around
access URLs. When sending services to TOPCAT via SAMP, TOPCAT does
its own Registry query, and it will switch versions based on what it
sees. I've not tried which one it prefers when there's both versions
on a record, but since TOPCAT has basically the "minimal common SIA
interface" proposed above, that doesn't matter much.
I don't know what Aladin does -- I suppose we should ask Pierre?
If the pyVO release planning allows this: should we take this to the
apps session in Bologna and hope for coffee break discussions to
figure out the right way to deal with this?
|
Yes, I think that's reasonable to bring up as a discussion item. Also, I don't think this should necessarily hold up v1.5, currently, I consider #428 and #386 blockers for the release, but nothing else. |
On Tue, Mar 07, 2023 at 10:36:35AM -0800, Brigitta Sipőcz wrote:
Yes, I think that's reasonable to bring up as a discussion item.
Who's going to go for it? Frankly, I'd much rather have it in Apps
than in Registry (where it probably tends to be ignored). @andamian
-- can you see yourself volunteering? I'd be happy to help out.
Also, I don't think this should necessarily hold up v1.5,
currently, I consider #428 and #386 blockers for the release, but
nothing else.
Ah yes, getting #386 merged would be great. I'm happy to rebase and
polish as soon as @tomdonaldson gives me a green light.
For #428, sorry about bypassing setup.cfg. Should be better now, and
of course feel free to further re-arrange the plumbing as you see
fit.
|
Yes, we can probably find a spot in an Apps session for this discussion. Without being very familiar with SIA1, I've always considered SIA2 as a natural evolution of the standard that will eventually supersede version 1. In that case, (5) made the most sense. However, recent SIA2 refactoring and the comments around it make them look like competing versions of the same standard. Is that correct? If so, (3) is the best option. |
On Wed, Mar 08, 2023 at 10:28:35AM -0800, Adrian wrote:
Without being very familiar with SIA1, I've always considered SIA2
as a natural evolution of the standard that will eventually
supersede version 1. In that case, (5) made the most sense.
However, recent SIA2 refactoring and the comments around it make
them look like competing versions of the same standard. Is that
correct? If so, (3) is the best option.
Unfortunately, there is not terribly much consensus about this.
SIAv2 was basically a political compromise that went through based on
a "we need it for cubes and because we want some parameter-based
protocol" narrative. As so often, there was *some* truth to that
narrative, but probably not enough to make it really true, which is
part of the reason for the lackluster takeup.
Be that as it may, since SIAv2 is now there and at least in
combination with datalink it's certainly not less powerful than
SIAv1, I suspect we should make a drive to sunset SIAv1 with its ugly
ad-hoc UCDs and missing support for time constraints. Perhaps that's
an agenda Apps could officially petition DAL to pursue?
I guess what I'm trying to say is: I'd suggest we at least pretend
that SIAv1 services won't be around any more 10 years from now.
|
I've given a talk on that at the Bologna Interop (https://wiki.ivoa.net/internal/IVOA/InterOpMay2023Apps/sia-notes.pdf), and my conclusion from having prepared the talk and the reactions at the conference is: servicetype="image" was a mistake because it suggests an abstraction far too leaky to be useful (cf. https://en.wikipedia.org/wiki/Leaky_abstraction). I suppose the intention was (or would now be) to enable an all-VO image search in that way. But such a thing is a lot more complicated that just looking for a certain kind of service. For one, you'd still be missing ObsTAP, and I've not even started with de-duplication, unifying interfaces, and so on. Hence, here is what I propose: (1) both servicetype="image" and servicetype="spectrum" won't change their current behaviour (i.e., they'll return sia1 or ssap services, respectively), but they will result in a deprecation warning saying, for now, that people should use sia or ssap instead. That's for the next release. (2) We create a new module for doing global dataset discovery that hides the various details of protocols and all that from the user. That's definitely material for the next release, and I can't promise I'll invest terribly much time into that, but I'm not convinced that's what we need if we want user-comprehensible global dataset discovery. I'd say more about it in a bug I'd open when I close this. If and when the new functions are available, the deprecation warnings would point to them. What does everyone think? |
I'm not super happy about this, there was also a reasonably widespread notion that end-users shouldn't know much about vo, e.g. know sia vs sia2. With this proposal, we actually force them to know the difference and whether a dataset is offered via one or the other (e.g. at Irsa we have a mixture of the two and we really should not expect the users to know which datasets served by either). |
On Fri, May 19, 2023 at 09:33:15AM -0700, Brigitta Sipőcz wrote:
I'm not super happy about this, there was also a reasonably
widespread button that end users shouldn't know much about vo, e.g.
knowing sia vs sia2. With this proposal we actually force them to
know the difference and whether a dataset is offered via one or the
other (e.g at Irsa we have a mixture of the two and we really
should not expect the users to know which datasets served by
either).
No, on the contrary. The reason I propose to drop image is exactly
because I don't believe the "shouldn't now much about" can fly based
on any "slim" layer on top of the raw registry (i.e., whatever
servicetype="image" could do).
To be a bit more concrete, I think what we should come up with to
replace it (perhaps expanding its capabilities over time) is function
of the type
global_find_images(spatial=None, spectral=None, temporal=None,
maxrec=1000) -> [obscore record]
(where spatial would probably start out as a circle, and the other
two as min,max pairs; we should probably quickly expand to support
multiple positions). The trickery of picking apart all the
complications to achieve halfway sane and predictable behaviour can
IMHO only be encapsulated in a function abstracting away all of SIA,
SIA2, Obscore and who knows what else (HiPS?). And if that were easy
enough to hide within servicetype="image", I'd promise it for 1.5 :-).
|
I agree that For a data centric functionality, the workflow should probably be:
Related to this, I've noticed that the CADC SIA service has 2 access URLs (one for each version) but the registry returns only one of them. TOPCAT shows both. Is that a bug or me not knowing how to use the registry search? |
On Wed, May 31, 2023 at 11:03:49PM -0700, Adrian wrote:
I agree that `servicetype="image"` -> `SIA1` is incorrect but I
don't know what the appropriate solution is.
`global_find_images` might be a bit too ambitious. There could be
tens of services, do we search all of them even if they will return
duplicate data?
Ah, we'll be querying hundreds of services sometimes, and dealing
with duplicate data as best we can is part of the challenge why I
think "give me all image services" is nowhere near enough to let
people do global image discovery.
On the other hand, global data discovery has been the VO's promise
for a long time, and if we can't pull it off in pyvo, who else could?
For a data centric functionality, the workflow should probably be:
1. Search for all image services and return a list of services
Actually, we should already use constraints at this level -- more and
more services give their STC coverage, and we ought to use this
information to cut down on the number of services to query. Users
will quickly notice that search times go down dramatically if they're
precise in their constraints, and this kind of thing may actually be
the killer application that makes data providers give their STC
coverage.
2. (optional) User selects a subset of services. This can be based
on the data provider or service type.
Well... if people know about the data provider, they can probably use
the registry directly -- I don't think there's a strong case to be
made to use some global_find_x with that kind of constraint. But who
knows? Perhaps people will like the interface so much that we should
support arbitrary rtcons constraints?
3. Search for data in the service subset and return a list
Yup -- and there's the usual problems here: parallel queries,
filtering dupes, when to give up on services not returning data for X
seconds, etc. My plan would be to start slowly and do more advanced
stuff when we find people actually like it.
5.. Download data in the subset (with potential SODA operations:
cutouts, packaging etc)
In particular automatic cutouts would be nice, yes.
`servicetype` can achieve this with the union option. It will then
be up to the user to pick services from the result and use them
accordingly and this is the part that a higher level class could be
useful. So as a fix, I suggest to make it a union not only of SIAs
but all the services that return that data type. That can be
deprecated in the future when/if a viable alternative becomes
available.
As I said, I don't think we can make a service interface that
reasonably encapsulates the intracies of the various service types,
and I don't think we can reasonably explain how to do the service
selection smartly using registry constraints to a normal astronomers
(starting with: "Well, why don't all data providers declare their STC
coverage?" and "Why do you have SIA2 when there's obscore?", which I
couldn't honestly answer myself).
All-VO operations (which is what this is about) require non-trivial
development, and I'm sure we can't ask astronomers to do that
themselves.
Data discovery (i.e., "stuff without a servicetype constraint") are a
different matter, and they have a completely different workflow, so
that's fine; folks can look at what services are available on the
data they've discovered, and there's no need to consider semantics of
"image" or "sia".
Related to this, I've noticed that the CADC SIA service has 2
access URLs (one for each version) but the registry returns only
one of them. TOPCAT shows both. Is that a bug or me not knowing how
to use the registry search?
Where do you look? If you used record.access_url, you should have
got a DeprecationWarning if there's multiple access URLs (i.e.,
interfaces) for a capability; and then pyvo just picks one of the
multiple URIs. That's again legacy API we should get rid of (there's
no such thing as *the* access URL of a resource). If you did
something else: What was it?
|
FWIT, I agree with much of what Brigitta and Adrian said. I don't disagree with Markus' aims, but I don't like the current PR by itself. If we had an alternative workflow for users to find all-VO images, that would be different, but we don't yet. I would go for option 5 or 3 (duplication will always be with us) along with documenting what that means. I think astronomers can deal with the resulting duplication. If at some point, we have a better way, we can revisit this, but we do not as yet. This PR as implemented it seems to me will only add confusion. |
On Tue, Jun 13, 2023 at 08:43:05AM -0700, trjaffe wrote:
FWIT, I agree with much of what Brigitta and Adrian said. I don't
disagree with Markus' aims, but I don't like the current PR by
itself. If we had an alternative workflow for users to find all-VO
images, that would be different, but we don't yet. I would go for
Well, it's not like there is a working scheme for having all-VO
cross-protocol discovery right now, so this PR won't damage anything
working so far. This PR just stops claiming something that's been...
factually incorrect (to avoid less polite words) ever since we
defined Obscore, and which has become more factually incorrect with
SIAv2.
option 5 or 3 (duplication will always be with us) along with
documenting what that means. I think astronomers can deal with the
resulting duplication. If at some point, we have a better way, we
The problem with this plan is that SIAService and SIA2Service have
rather different signatures -- their query methods accept different
parameters and have different return types. Just blindly returning
them intermixed is going to be a very painful experience if someone
really wants to do all-VO queries. That's why I was talking about
the compatibility shim in the original bug, but building that again
is painful and relatively complicated. I'd much rather invest the
time to build such a thing into a proper all-VO dataset discovery
facility that people can actually use.
Let's just write the global discovery functions -- help (in
particular with writing the tests, which is going to be really
painful as we'll have to mock half of the VO) is highly welcome.
The image and spectrum service types, however, appear less and less
salvagable the longer I think about them...
can revisit this, but we do not as yet. This PR as implemented it
seems to me will only add confusion.
Where would you expect confusion? You mean when people had
servicetype='image' and now get warnings telling them to better use
something that doesn't exist yet?
Well, yes, I'd be happier if that other thing existed, too, but I
think it's actually a good thing to tell them that "image" isn't what
they may think it is. I'm more than happy to consider better
alternatives for the warning text, though. Perhaps: "Don't use image
any more, because it's *not* the all-VO image search you may
suspect." Better alternatives would be most appreciated.
|
Given we have a long discussion here, I move my comment from #449 to here: I would not like to go ahead with this deprecation, until we have a good/convenient alternative. I totally see the problems, but OTOH, the users should not really need to know about the versions, so doing an E.g. I strongly feel instead of this, one should be able to do one
The user should do a lot of digging after that anyway to see e.g. which "Spitzer" they want, but otherwise all the entries look so similar, that they should be in one result.
|
In our Registry interface, we have had an "image" service type for a long time. It has so far been a synonym for sia1. Now that our Registry discovery supports sia2, too, the question is what to do with the image service type.
Here's a few options I have been pondering:
(1) Leave things as they are. Not pretty, because why should sia2 not be image?
(2) Deprecate servicetype="image" and then one day drop it. Well, we can always do that when we find out we can't come up with useful semantics.
(3) Make image the union of sia and sia2. I think that's about the worst we can do, because several services offer both sia and sia2, and these will then be queried twice (though we can't avoid duplicate results in general; in DaCHS services, for instance, there's just one global SIA2 service but many per-data collection SIA services, and it's hard to work out which SIA results you might also get through SIA2).
(4) Make image return sia services where available and sia2 otherwise. Since I personally don't like sia2 in its current state, I'd like that a lot. But we'd have quite a bit of explaining to do if we went for that.
(5) Make image return sia2 services where available and sia otherwise. I think that is about the least surprising thing we can do.
For (3) through (5), however, we have the additional problem of what get_service() would return. For reference, SIAService's search has the signature
whereas SIA2Service's has
– where pos is, I think, a rather different beast in SIA2Service than in SIAService (it is in the protocol, I've not looked at the pyVO implementation).
This means that if we want to enable something like:
I think we won't get around a compatibility shim. What could this be? We could implement SIA2's extra conditions as good as we can locally on top of SIAResults (with suffiently rich service responses, that would be reasonable for band, time, and serveral of the others; btw: you don't have any guarantees that the constraints will be honoured in SIA2 either). But, really, someone else would have to contribute code for that. Volunteers?
Me, I think I'd do a GenericImageService with a search that just has pos and size, which would be translated into a CIRCLE query against SIA2 services. The return rows, on the other hand, would be pretty much SIA2Results, which are basically obscore rows.
So, what does everyone think? Do you perhaps have better ideas? I predict our future selves will be grateful if we think about this a bit. We'll always have similar problems when we have two major versions of a protocol coexisting, and this will probably happen quite a few times in the VO's future.
Should we have a side meeting on this in Bologna?
The text was updated successfully, but these errors were encountered: