MANGO Annotation Scope #18

Open
lmichel opened this issue Mar 19, 2021 · 20 comments

Comments

@lmichel
Collaborator

lmichel commented Mar 19, 2021

This issue is a fork of #12, which diverged from the initial dependent-axes topic.

Last message (#12 (comment)):

On Fri, Mar 19, 2021 at 07:23:56AM -0700, Laurent MICHEL wrote:

The scope of the annotations must go beyond simple column
annotations which must remain supported though.
I detailed it here section 2.
I'm starting to be unsure whether we are actually disagreeing on much
here -- and I've not found anything in that section 2 that I'd need
to contradict.

So, perhaps a clarification: is my time series use case "single
column annotation", and if so, why? What actual usage would go
beyond what's possible there?

My point is that, since we have a self-consistent model made of a
hierarchy of elements identified with dmtype, dmrole and other
things, the annotation must be something matching that structure.

Well, the thing with dmrole and dmtype to me is the annotation, but
I think what you're saying here is that the annotation should be
directly derived from the model.

That I wholeheartedly agree with,
and that's why I'm so concerned about the current MCT proposal -- if
it were some abstract musing, I'd be totally ok with it. But when
the model defines the annotation structure, whatever we do in the
model has concrete operational consequences. Which, mind you, is
fine -- we'll have to deal with them somewhere and the DM is the
right place for that.

Once you have it, you can use accessors based on those identifiers.
That is what I call a public API: it does not refer to any native data element, only to model elements.

...and I still cannot figure out why you want this -- after all, the
point of the whole exercise IMNSHO is to add information to VOTables
(and later perhaps other container formats) that is not previously in
there.

What would the use case for your free-floating annotation be, if this
is what you are proposing?

In the examples I showed for these use cases, I transform the
annotation into Python dictionaries that are easily serializable in
JSON (a good point for data exchange).

In pseudo code, this would look like this:

import sys

# AnnotationReader is a hypothetical class; my_votable is an already parsed VOTable
annotation_reader = AnnotationReader(my_votable)
if not annotation_reader.support("mango"):
    sys.exit(1)

mango_instance = annotation_reader.get_first_row()
print(mango_instance.get_measures())
# ['pos', 'magField']
print("Magnetic field is: " + str(mango_instance.get_measure("magField")))
# Magnetic field is: 1.23e-6T +/- 2.e-7

This wouldn't require Python classes implementing the model
(fundamental point)

I claim that the annotation must be designed in a way that allows
this in addition to basic usages.
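
As a rough illustration of the class-free approach described above, here is a minimal sketch assuming the annotation has already been resolved into nested Python dictionaries. The dictionary layout, the type names, and the `get_measures`/`get_measure` helpers are assumptions made for this example; they are not taken from MANGO or from any existing library.

```python
import json

# Hypothetical, hand-written instance. In practice a reader would build it
# from the annotation block and one table row.
mango_instance = {
    "dmtype": "mango:Source",  # illustrative type name
    "measures": {
        "pos": {"dmtype": "meas:Position", "value": [12.5, -3.2], "unit": "deg"},
        "magField": {"dmtype": "meas:GenericMeasure", "value": 1.23e-6,
                     "error": 2.0e-7, "unit": "T"},
    },
}


def get_measures(instance):
    """Return the names of all measures carried by the instance."""
    return list(instance["measures"])


def get_measure(instance, name):
    """Return one measure as a plain dictionary, addressed by its name."""
    return instance["measures"][name]


# Plain dictionaries serialize to JSON without any model-specific classes.
serialized = json.dumps(mango_instance)
restored = json.loads(serialized)
```

The accessors only know about model-level names (`measures`, `magField`), never about FIELD or PARAM elements, which is the property the comment argues for.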

-- but why would you want to do this JSON serialisation? Wouldn't it
be much better overall to just put that value into a VOTable and
transmit that rather than fiddle around with custom JSON
dictionaries? In particular when there are quite tangible benefits
if you make it explicit in the model what exactly it is that you're
annotating?

By the way, if by "wouldn't require Python classes" you mean "You
don't have to map model classes into python classes" then yes, I
agree, that is a very desirable part of anything we come up with.
Let's avoid code generators and similar horrors as much as we can.
Nobody likes those.

Let's assume that all Vizier tables come with such annotations; the same API code could then get many things:

  • Basic quantities (no significant gain I admit)
  • Complex quantities (e.g. complex errors)
  • Columns grouping
  • Status values
  • Associated data or services

I agree to all these use cases (except, as I said, even for basic
quantities the gain is enormous because we can finally express
frames, photometric systems, and the like in non-hackish ways).

But: which of these use cases would you miss with the non-entangled,
explicit-reference models?

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

So, perhaps a clarification: is my time series use case "single
column annotation", and if so, why? What actual usage would go
beyond what's possible there?

Any usage that mixes columns together (e.g. error matrix, column grouping).

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

we do in the model has concrete operational consequences. Which, mind you, is
fine -- we'll have to deal with them somewhere and the DM is the
right place for that.

If you change the model in a way that breaks backward compatibility, you will get concrete operational consequences regardless of how you associate the model with the data.

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

...and I still cannot figure out why you want this -- after all, the
point of the whole exercise IMNSHO is to add information to VOTables
(and later perhaps other container formats) that is not previously in
there.

At least two reasons for targeting this:

  1. I would be happy if I could develop my client just by reading the model spec, without fighting with VOTable elements (supposing that someone provided me with a low-level library doing the dirty job).
  2. The comparison between two datasets is straightforward if the quantities are certified 100% model-compliant.

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

-- but why would you want to do this JSON serialisation?

I DO NOT want this JSON serialisation. It is both an example for our discussion and a convenient way to exercise and validate my proposal.
JSON is, however, a convenient way to exchange data whatever their complexity. Let's imagine I spot a very interesting source in my VOTable and I want to share it with another client (e.g. by SAMP). No doubt the best way to do it would be to send a JSON MANGO (or whatever model) instance of that source.
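
The sharing scenario above could be sketched as follows. A SAMP message is a map with `samp.mtype` and `samp.params` keys, and parameter values are strings, lists or maps, so the instance travels as a JSON string. The MType `x-mango.instance.load`, the parameter name `instance`, and the content of `source` are assumptions made for this example; no hub is contacted here.

```python
import json

# Hypothetical model instance for the source picked in the VOTable.
source = {
    "dmtype": "mango:Source",  # illustrative type name
    "identifier": "src-001",
    "measures": {"magField": {"value": 1.23e-6, "error": 2.0e-7, "unit": "T"}},
}

# SAMP-style message map. The MType below is made up for this sketch.
message = {
    "samp.mtype": "x-mango.instance.load",
    "samp.params": {"instance": json.dumps(source)},
}


def unpack(message):
    """Rebuild the model instance on the receiving side."""
    return json.loads(message["samp.params"]["instance"])
```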

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

Let's avoid code generators and similar horrors as much as we can.
Nobody likes those.

At least a clear point of agreement

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

But: which of these use cases would you miss with the non-entangled,
explicit-reference models?

For TS or spectra, I refer you to @mcdittmar's responses.

For the catalog case, let's talk MANGO.
I cannot tell what the MANGO entanglement level is, so just have a look at it.

Mango is a simple model with two docks (containers).

  • One for the measures (mostly MCT objects)
  • One for the associated data (out of the current topic)

The content of those docks is totally free (non-entangled components?)

  • Q#1 Why use such containers?

They are designed to carry any metadata we need to describe any measure completely, so that Mango instances are self-consistent. If, by some magic, you need to handle some of them outside the VOTable scope (SAMP, datalink...), I'd expect them to be complete.

  • Q#2 You are going to object that most of these meta-data are already in the VOTable and that we shouldn't duplicate them in the annotation.

This is not false, but it is an annotation issue. If I have a unit in some model leaf, my annotation scheme must be able to say that this unit comes from that FIELD.
After this, resolving such references or not is the client's business.
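
A minimal sketch of that reference mechanism, assuming a model leaf that points at a FIELD by its ID instead of duplicating the unit. The leaf layout (`ref`, `attribute`) and the `resolve` helper are assumptions made for this example, and the tiny VOTable omits the XML namespace that real files declare.

```python
import xml.etree.ElementTree as ET

# Stripped-down VOTable fragment (no namespace, no data) for illustration.
VOTABLE = """<VOTABLE><RESOURCE><TABLE>
  <FIELD ID="col_bfield" name="B" datatype="double" unit="T"/>
</TABLE></RESOURCE></VOTABLE>"""

# Hypothetical annotation leaf: the unit is a reference to a FIELD attribute.
leaf = {"dmrole": "unit", "ref": "col_bfield", "attribute": "unit"}


def resolve(leaf, votable_xml):
    """Follow the leaf's reference and return the referenced FIELD attribute."""
    root = ET.fromstring(votable_xml)
    for field in root.iter("FIELD"):
        if field.get("ID") == leaf["ref"]:
            return field.get(leaf["attribute"])
    raise KeyError("no FIELD with ID " + leaf["ref"])
```

A client that only needs the column values can skip `resolve` entirely; one that wants a complete, self-contained instance calls it for each referencing leaf.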

@msdemlei
Contributor

msdemlei commented Mar 22, 2021 via email

@msdemlei
Contributor

msdemlei commented Mar 22, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 22, 2021

Could you recommend a specific one that I should tackle to show that
this kind of thing is of course possible with explicit referencing?

  • Column grouping: here. This is based on a real Vizier table.
  • Error matrix: here. This is based on a mock VOTable that I wrote to test my code. The real use case is Gaia, and testing this feature on it is still planned.

@lmichel
Collaborator Author

lmichel commented Mar 22, 2021

Yes, but the question is: Will changing one model in this way take
entire rest of the annotation with it or will the remaining
annotation keep working? This is what the entanglement problem
is about.

IMO, the annotation must be faithful to the model, but does not require the model to be totally mapped.
Only data present in the dataset have to be mapped. The rest can (must) be ignored.
The mapping block represents a subset of the model.
If model changes preserve backward compatibility, the 'old' annotations remain consistent and the interoperability between datasets mapped with different DM versions is preserved.

If you are saying that clients must be updated to take advantage of new model features, you are right. Whatever the annotation scheme is, this is simply because new model class => new role => new processing.
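
The partial-mapping argument above can be sketched as a client that processes the roles it knows and ignores the others. The role names, the handler table, and the `process` function are assumptions made for this example, not part of any model or library.

```python
def process(annotation, handlers):
    """Apply a handler to each known role; collect unknown roles untouched."""
    handled, ignored = {}, []
    for role, node in annotation.items():
        handler = handlers.get(role)
        if handler is None:
            ignored.append(role)  # newer model feature: skipped, not an error
        else:
            handled[role] = handler(node)
    return handled, ignored


# Handlers written against an older model version (hypothetical roles).
handlers = {
    "position": lambda node: (node["ra"], node["dec"]),
    "magnitude": lambda node: node["value"],
}

# Annotation produced with a newer, backward-compatible model version.
annotation = {
    "position": {"ra": 12.5, "dec": -3.2},
    "magnitude": {"value": 17.4},
    "status": {"value": "variable"},  # role unknown to the old client
}

handled, ignored = process(annotation, handlers)
```

The old client keeps working on `position` and `magnitude`; supporting `status` only requires adding one handler.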

@msdemlei
Contributor

msdemlei commented Mar 23, 2021 via email

@msdemlei
Contributor

msdemlei commented Mar 23, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 24, 2021

I'm afraid I don't really understand this use case

  • This is a Vizier use case, more to say.

Looking at your annotation,

Again, do not mix model and annotation.

  • The model describes how things relate to each other. This should drive the design of both annotations and model processing at client level.
  • I have repeated several times that the model must be self-consistent and independent of any particular dataset.

I've added an annotation to this table and ...

  • Mark T and FXP seem interested in such a feature.
  • I would say that the issues page is not the right place to question one of the use cases that have been proposed and validated about 2 months ago.

@lmichel
Collaborator Author

lmichel commented Mar 24, 2021

No, that is not my point. My p

Continued in #24

@msdemlei
Contributor

msdemlei commented Mar 25, 2021 via email

@msdemlei
Contributor

msdemlei commented Mar 25, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 26, 2021

See the Wiki post by Gilles.
Some catalogs may have columns that give extra information about a particular quantity (e.g. quality flag, statistical sample size...). A client could hide such associated information at first and then show it on demand (e.g. with a tooltip).

Another use case, a bit aside, is shown here as an alternative way to define (in)dependent axes.
The independent axis is represented by a parameter and all the dependent axes are its associated parameters.
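
A minimal sketch of the associated-parameter idea, assuming each parameter lists the names of its associated parameters. The dictionary layout, the parameter names, and both helpers are assumptions made for this example.

```python
# Hypothetical instance: "flux" has two associated parameters.
instance = {
    "parameters": {
        "flux": {"value": 1.7, "unit": "mJy",
                 "associated": ["flux_quality", "flux_nsample"]},
        "flux_quality": {"value": "good", "associated": []},
        "flux_nsample": {"value": 42, "associated": []},
    }
}


def primary_parameters(instance):
    """Names of parameters that are not attached to another parameter."""
    params = instance["parameters"]
    attached = {name for p in params.values() for name in p["associated"]}
    return [name for name in params if name not in attached]


def associated_values(instance, name):
    """Values of the parameters associated with `name`, e.g. for a tooltip."""
    params = instance["parameters"]
    return {a: params[a]["value"] for a in params[name]["associated"]}
```

A client would display `primary_parameters(instance)` by default and call `associated_values` only when the user asks for details. The same structure could carry an independent axis as the primary parameter with its dependent axes as associated ones.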

@lmichel
Collaborator Author

lmichel commented Mar 26, 2021

Hm -- complicating things a great deal to perhaps simplify standards
development a bit doesn't sound like a good deal to me.

I wouldn't say that using annotations faithful to the model complicates things. It is rather the opposite.

Wouldn't you agree that out in the field, people should be taking the
annotation from the VOTables?

Yes, I do. I even argue that these annotations, read from the VOTable, should bear the structure of the model.

But wouldn't such a comparison happen in a client after it's parsed
and deserialised the instances into whatever representation it
chooses? Where would such an abstract "normalise-and-compare"
operation play a role?

You are pointing to the root of our disagreement:
1- You propose to let clients deal with the model if they want to, and to provide them with the minimal material to do so.
2- I (with @mcdittmar) propose to provide clients with model instances that can be parsed as such.

I do not say that your approach is not appropriate, but I claim that it makes the job tougher for clients for little benefit, whereas your way of consuming data does work with my annotation scheme. This is not a good deal.

The way out of this discussion is likely somewhere in this topic

@msdemlei
Contributor

msdemlei commented Mar 26, 2021 via email

@lmichel
Collaborator Author

lmichel commented Apr 7, 2021

it seems you're doing something very much different from grouping
different columns. The associatedDataDock annotation looks more like
an "associated link" thing,

associatedDataDock has nothing to do with the associated parameters.

  • associatedParameter is a 1-n relationship that binds MANGO parameters

  • associatedDataDock is a dock carrying data that are associated with MANGO instances

    • It allows attaching services (DL included) or data to MANGO instances.
    • There are two basic use cases for this feature:
      1. Having both sources and detections in one VOTable
      2. Adding semantics to columns containing URLs
