Minimal provenance in VOTable output #37
On Fri, Apr 16, 2021 at 10:22:24AM -0700, gilleslandais wrote:
For VizieR it would be really appreciated (not to say required) to have a common way to provide minimal origin information.
The mango VizieR prototype uses the dock "associatedData" to link
a remote URL which contains a "complete" VOProvenance.
I would like to add another concise provenance output in the VOTable (for "naive" clients).
The minimal provenance for a VOTable is:
author + year_of_publication, DOI or bibcode of the reference
article.
In DatasetDM, I didn't see a clear distinction between
creator/author .. Markus do you have an example of this
serialization in your output?
No. But now that you mention it, what *might* actually be smart is
to sync VOResource's "implicit" (there's no VO-DML (yet?)) data model
with Dataset DM and friends.
I'm not 100% sure I'd like to see a lot of VOResource in instance
documents (as usual: What should clients do with it?), and we'd have
to think about whether there ought to be a single "resource" DM or
whether some of the types could become DMs of their own. But
whatever the result of these considerations: if DM deals with
Registry content, we should make sure there are no unnecessary
inconsistencies.
.. and I would like more - but is it possible, in a concise
serialization, to add a short annotation specifying the origin
of a measure - e.g. the filter configuration, with the curator + a
URL?
Any ideas?
Well, I've been advocating in-VOTable provenance forever, and this is
a nice example. While I'm too lazy to properly re-read the ProvDM
docs (or its VO-DML), this would basically look like this in my
annotation:
```
<TEMPLATES>
  <INSTANCE dmtype="prov:Agent" id="fred">
    <ATTRIBUTE dmrole="name" value="Fred Hoyle"/>
    <ATTRIBUTE dmrole="affiliation" value="University of Cambridge"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Activity" id="reduction">
    <!-- Embedding parameters probably isn't going to be so simple in
      current ProvDM, but in an example this might be ok -->
    <ATTRIBUTE dmrole="parameters">
      <COLLECTION>
        <INSTANCE dmtype="prov:Parameter">
          <ATTRIBUTE dmrole="name" value="filter profile"/>
          <ATTRIBUTE dmrole="value" value="http://whatever"/>
        </INSTANCE>
        <INSTANCE dmtype="prov:Parameter">
          <ATTRIBUTE dmrole="name" value="magic fudge parameter"/>
          <ATTRIBUTE dmrole="value" value="27"/>
        </INSTANCE>
      </COLLECTION>
    </ATTRIBUTE>
    <ATTRIBUTE dmrole="WasAssociatedWith" ref="fred"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Entity" id="reduced_mag">
    <!-- this isn't pretty, but the ProvDM authors probably didn't
      expect immediately resolvable ids; this kind of thing would need
      some thought in the mapping doc (and perhaps a fix in the DM, as
      I'd much rather use a proper ref attribute here) -->
    <ATTRIBUTE dmrole="id">#mag_v</ATTRIBUTE>
    <ATTRIBUTE dmrole="WasGeneratedBy" ref="reduction"/>
  </INSTANCE>
</TEMPLATES>
<FIELD ID="mag_v" .../>
```
It's an interesting exercise to add a cutout FIELD and use it as an
Entity that's used by #reduction... I'll do that on request, because
I'll have to brush up on ProvDM again to confidently write such a
thing.
Also note how this to me is an argument against the division between
GLOBALS (IIRC) and TEMPLATES in the original VO-DML annotation
proposal: Nothing at all is saved if #fred would need to jump into
Globals here (or jump back to TEMPLATES if its
***@***.***="name"] became a ref to some FIELD).
The DatasetDM does map its content to the Resource Metadata elements, so it is, in a sense, actualizing the 'implicit' model, and it is VO-DML compliant. As for creator/author: what distinction are you looking for?
I think 'author' implies the Dataset is a paper of some sort, rather than a Photometric Filter or LightCurve. I've also been curious to see how Provenance will get conveyed in the context of Datasets.
Do you mean outside of the Annotation syntax? Similar to a COOSYS/TIMESYS element directly in VOTable?
OK for the DatasetDM: in the VizieR context, CDS is the publisher (Curator) and the author is the Creator (including the bibliographic reference in the same DataID), so it could be added in AssocDataDock. For measures it may be more complicated. So maybe a simple way could be just a comment? In that case, is it better to put the (origin) comment on Mango:Parameter.comment or on Mango:stcextend.Photometry?
On Mon, Apr 19, 2021 at 1:43 PM gilleslandais wrote:
For measures it may be more complicated.
In VizieR the photometry filter characteristics are not part of the
original data (they could be, but often they are added by CDS, who assigned a
filter or a similar filter to magnitude columns). If VizieR provides the
photometry characteristics, it is important to specify the origin. ProvDM
is suited to that, but the parsing may be a little discouraging for
clients.
So maybe a simple way could be just a comment? In that case, is it
better to put the (origin) comment on Mango:Parameter.comment or on
Mango:stcextend.Photometry?
So, you can put the filter at the 'normal' space according to the model,
but now you want to tag/record that this is something added by CDS, and not
part of the original dataset...
That's a good thread to include for exploring the Provenance usage within
datamodel instances.
Technically, I think one answer would be that this is a NEW Dataset,
created by CDS through an Activity which assigned the Filter to the
original Dataset.
So, your Provenance would point to the original Dataset which does not
include the Filter, is created by XYZ, etc.
That seems rather unappetizing in practice though.
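A rough sketch of that "new Dataset" reading, reusing the annotation syntax from the example earlier in this thread. All ids and the `name` values are made up for illustration, and the role names (`Used`, `WasGeneratedBy`, `WasAssociatedWith`) follow the W3C PROV relations without having been checked against ProvDM's VO-DML:

```xml
<TEMPLATES>
  <!-- hypothetical ids and values throughout; roles not checked against ProvDM's VO-DML -->
  <INSTANCE dmtype="prov:Agent" id="cds">
    <ATTRIBUTE dmrole="name" value="CDS"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Entity" id="original_table">
    <!-- the dataset as published by its creator, without the filter -->
    <ATTRIBUTE dmrole="name" value="original dataset"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Activity" id="filter_assignment">
    <ATTRIBUTE dmrole="name" value="photometric filter assignment"/>
    <ATTRIBUTE dmrole="Used" ref="original_table"/>
    <ATTRIBUTE dmrole="WasAssociatedWith" ref="cds"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Entity" id="vizier_table">
    <!-- the table actually served, i.e. original data plus assigned filter -->
    <ATTRIBUTE dmrole="WasGeneratedBy" ref="filter_assignment"/>
  </INSTANCE>
</TEMPLATES>
```

Even this stripped-down version needs four instances to say "CDS added the filter", which gives a feel for why it may look unappetizing to data providers.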
If the purpose of the embedded Prov is just to say whether a filter has been added by the CDS, we could consider doing things in a simpler way. As PhotFilter still has to be wrapped into MANGO, we can add a field telling the filter origin. This is somewhat similar to the reduction status Mango had at the beginning (raw, calibrated..). This value could be carried either by an enum or a vocabulary.
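For illustration only, such an origin flag might look roughly like this in the annotation syntax used earlier in the thread. The `filterOrigin` role and its `curator` value are hypothetical (no such attribute is defined in the thread), and the `dmtype` is simply borrowed from the type name mentioned above:

```xml
<TEMPLATES>
  <!-- "filterOrigin" and "curator" are hypothetical; nothing like them is defined in the thread -->
  <INSTANCE dmtype="mango:stcextend.Photometry">
    <!-- would draw its value from an enum or vocabulary, e.g. original vs. curator -->
    <ATTRIBUTE dmrole="filterOrigin" value="curator"/>
    <!-- points at the magnitude column the flag applies to -->
    <ATTRIBUTE dmrole="value" ref="mag_v"/>
  </INSTANCE>
</TEMPLATES>
```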
On Mon, Apr 19, 2021 at 10:43:02AM -0700, gilleslandais wrote:
But you are right, a <REFSYS> including (refCode, author, year) is
something that I prefer!
Ah well... that's the temptation of quick and simple solutions...
I'd always be in favour of those *except* we already have a more
general and comprehensive thing for that. And there's few things worse
than having two mechanisms that do the same thing -- it's the
guaranteed end of interoperability. If you're unlucky, exactly half
the producers will implement one but not the other, and exactly half
the consumers will implement one but not the other. Then, the
likelihood that an annotation can be used is one in four.
Let's not do that.
It also makes us look bad.
For measures it may be more complicated.
In VizieR the photometry filter characteristics are not part of
the original data (they could be, but often they are added by CDS, who
assigned a filter or a similar filter to magnitude columns). If
VizieR provides the photometry characteristics, it is important to
specify the origin. ProvDM is suited to that, but the parsing
may be a little discouraging for clients.
It's even more discouraging if clients can't predict if they have the
complex or the simple thing.
Of course, in the RFC of ProvDM I've also argued that we're
introducing too much complexity in one go, so you have my sympathy
when you say that full IVOA ProvDM is perhaps a little, if you will,
discouraging.
But rather than building an incompatible alternative, I'd much prefer
if we defined a ProvCore (say) that basically is a VO-DML mapping of W3C
prov and that is *a true subset* of ProvDM (so full ProvDM consumers
understand it). Or we just define a "pattern" how this kind of thing
is to be written that people can apply without having to read all of
ProvDM.
So maybe a simple way could be just a comment? In that case, is
it better to put the (origin) comment on Mango:Parameter.comment or
on Mango:stcextend.Photometry?
That's of course another thought (I'm calling it Mark's law because
Mark Taylor taught it to me): Only make machine-readable what
machines want to read.
What you describe sounds like something that perhaps is just fine
somewhere where humans that care in a particular case can reliably
find it.
On Wed, Apr 28, 2021 at 5:44 AM msdemlei wrote:
But rather than building an incompatible alternative, I'd much prefer
if we defined a ProvCore (say) that basically is a VO-DML mapping of W3C
prov and that is *a true subset* of ProvDM (so full ProvDM consumers
understand it). Or we just define a "pattern" how this kind of thing
is to be written that people can apply without having to read all of
ProvDM.
+1 on generating a pattern to map provenance information to the provenance
model.
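As a starting point for discussion, such a pattern for the minimal origin information requested at the top of the thread (author, year, DOI or bibcode) might look something like the following. This is only a sketch in the annotation syntax used earlier: the ids are invented, the values are placeholders rather than real references, and the roles `WasAttributedTo` and `WasDerivedFrom` are taken from the W3C PROV relations without being checked against ProvDM's VO-DML:

```xml
<TEMPLATES>
  <!-- hypothetical pattern; placeholder values, roles not checked against ProvDM's VO-DML -->
  <INSTANCE dmtype="prov:Agent" id="first_author">
    <ATTRIBUTE dmrole="name" value="FIRST AUTHOR (placeholder)"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Entity" id="reference_article">
    <!-- a bibcode starts with the publication year, so it can carry the year as well -->
    <ATTRIBUTE dmrole="id" value="BIBCODE OR DOI (placeholder)"/>
    <ATTRIBUTE dmrole="WasAttributedTo" ref="first_author"/>
  </INSTANCE>
  <INSTANCE dmtype="prov:Entity" id="this_table">
    <ATTRIBUTE dmrole="WasDerivedFrom" ref="reference_article"/>
  </INSTANCE>
</TEMPLATES>
```

A "naive" client would only need to look for an Entity with a `WasDerivedFrom` reference and read the identifier and agent name behind it, while a full ProvDM consumer could treat the same instances as ordinary provenance.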