uniqueness of Artifact.uri #2
Comments
Examples that abuse the intended Artifact.uri uniqueness: at CADC, many observations include a raw Plane and a calibrated Plane; previews are generated from the calibrated data but are assigned (added as Artifacts) to both Planes. For the raw Plane, the preview does convey what that data could look like (after calibration), but it doesn't convey what it looks like as-is. So it is a preview, but...
At MAST, observations from some missions like JWST include artifacts that are shared between multiple observations. These are things like guide star files, association tables, and association pool files. They are auxiliary products that get produced by the pipeline but don't belong to a single observation. As such, we list them for each observation that requires them, as we would otherwise need to have a concept of an orphaned artifact (i.e., a file with no observation/plane). During our meeting, we discussed the potential of having more complex observations that store all artifacts in them (e.g., all extracted spectra). This would be a significant change, both in our code base and potentially in the conceptual understanding of our users (depending on how we present that). We may explore that at some point, but for now it's not in the cards. My recommendation is that Artifact.uri be unique within an observation, but that multiple instances of the same Artifact.uri can be shared across observations.
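To make the recommendation concrete, here is a rough Python sketch of a "unique within an observation, shared across observations" check. The dict layout, the function name, and the URIs are all made up for illustration; this is not the CAOM library's API.

```python
from collections import Counter


def duplicate_uris_within(observation):
    """Return the sorted artifact URIs that occur more than once in ONE observation.

    `observation` is a hypothetical plain-dict stand-in:
    {"planes": [{"artifacts": [{"uri": ...}, ...]}, ...]}
    """
    counts = Counter(
        artifact["uri"]
        for plane in observation["planes"]
        for artifact in plane["artifacts"]
    )
    return sorted(uri for uri, n in counts.items() if n > 1)


# Hypothetical URIs: one guide star file listed by two different observations.
GS = "example:JWST/guide_star.fits"
obs_1 = {"planes": [{"artifacts": [{"uri": "example:JWST/obs1_cal.fits"}, {"uri": GS}]}]}
obs_2 = {"planes": [{"artifacts": [{"uri": "example:JWST/obs2_cal.fits"}, {"uri": GS}]}]}

# Sharing across observations passes; each observation is checked on its own.
print(duplicate_uris_within(obs_1), duplicate_uris_within(obs_2))
```

Under this rule the shared guide star file is fine, but the same URI appearing in two planes of one observation would be reported.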
A MAST example is that of Guide Star files that are obtained for an entire telescope pointing (HST & JWST) and so are associated with multiple observations. Each observation is processed individually into the CAOM XML file, so each GS file becomes a non-unique Artifact.uri in every observation. They do, however, have unique UUIDs.
The "shared preview" usage at CADC would violate "unique within an observation". If a preview is supposed to be a quick visual way to "examine the content" then that usage seems OK. If it were restricted to "examine the quality" then maybe the shared preview is more questionable, but I don't think we can feasibly limit the meaning of preview like that, especially when we define Artifact.productType to be the terms of the DataLink semantics vocabulary.
I am thinking in this direction:
Artifact.uri should be globally unique for productType == this: a primary file should only belong to a single plane in a single observation. All other productType(s) are references and their URIs can be used in multiple planes/observations. Obviously, having two of the same artifact in the same plane is still not allowed (surely a bug) and the current code prevents it. Code validation could check for duplicate "this" artifacts in an observation, but the check vs other observations would require a unique index in a database. This is implementable in PostgreSQL because one can define an index with a "where" clause (a partial index), but such a complex rule would potentially be problematic in other DB servers.
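A rough sketch of what such a conditional unique index could look like. To keep it runnable without a server it uses Python's built-in sqlite3 (SQLite also accepts a WHERE clause on CREATE INDEX) rather than PostgreSQL. The table, column, and index names are hypothetical and not taken from any actual CAOM schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical, simplified artifact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE artifact (
            id           INTEGER PRIMARY KEY,
            plane_id     INTEGER NOT NULL,
            uri          TEXT NOT NULL,
            product_type TEXT NOT NULL,
            UNIQUE (plane_id, uri)  -- same artifact twice in one plane is a bug
        )
        """
    )
    # Partial unique index: uniqueness is enforced only for rows where
    # product_type = 'this'; every other product type may repeat a URI.
    conn.execute(
        "CREATE UNIQUE INDEX artifact_this_uri "
        "ON artifact (uri) WHERE product_type = 'this'"
    )
    return conn


def add_artifact(conn, plane_id, uri, product_type):
    """Insert one artifact row; raises sqlite3.IntegrityError on a violation."""
    conn.execute(
        "INSERT INTO artifact (plane_id, uri, product_type) VALUES (?, ?, ?)",
        (plane_id, uri, product_type),
    )


conn = make_db()
add_artifact(conn, 1, "example:obs/raw.fits", "this")
add_artifact(conn, 1, "example:obs/prev.png", "preview")
add_artifact(conn, 2, "example:obs/prev.png", "preview")  # shared reference: accepted
```

With this index, a second "this" row using example:obs/raw.fits in any plane would be rejected, while the shared preview is accepted.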
Aside about Artifact.id (uuid): these denote a single entity (row in database), so they have to be different due to the type of arrow (composition) in the Plane->Artifact relationship. To have multiple planes refer to the same Artifact, the relation would have to be a reference and Artifact(s) would not be part of the Observation: they would be separate entities that have to be persisted and managed independently. I think that would really break the core concept that an Observation is a single self-contained entity (and fairly denormalised to accomplish that) that can be curated and synced. I think being more clear that the URI in Artifact is a reference to an external resource is sufficient. Of course, if you have multiple artifacts with the same URI, you have more complex work to maintain the other Artifact metadata (contentLength, contentChecksum).
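As an illustration of that extra maintenance work, here is a small Python sketch that flags URIs whose artifacts disagree on contentLength or contentChecksum. The dict-based records, the function name, and the example values are hypothetical.

```python
from collections import defaultdict


def inconsistent_uris(artifacts):
    """Return the sorted URIs whose artifacts carry differing
    (contentLength, contentChecksum) pairs."""
    seen = defaultdict(set)
    for a in artifacts:
        seen[a["uri"]].add((a["contentLength"], a["contentChecksum"]))
    return sorted(uri for uri, metas in seen.items() if len(metas) > 1)


# Hypothetical records: the same URI harvested into three observations,
# one of which still carries stale metadata.
records = [
    {"uri": "example:gs.fits", "contentLength": 100, "contentChecksum": "md5:aa"},
    {"uri": "example:gs.fits", "contentLength": 100, "contentChecksum": "md5:aa"},
    {"uri": "example:gs.fits", "contentLength": 120, "contentChecksum": "md5:bb"},
    {"uri": "example:cal.fits", "contentLength": 50, "contentChecksum": "md5:cc"},
]
print(inconsistent_uris(records))
```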
In the analysis of observations belonging to multiple proposals, the current solution is to create an observation with no proposal and then create a DerivedObservation for each proposal. Here, belonging is in the sense of implied access rights (for proprietary data). The important point here: the two DerivedObservation-Plane.Artifact could have the same Artifact.uri(s) if the file(s) are the same. So, it is not really feasible to restrict the uniqueness of Artifact.uri -- it is a simple reference to an external resource.
The Artifact.uri field is a reference to an externally stored object (usually a file, but could be a database table or maybe a directory in VOSpace holding multiple files). It was intended that Artifact.uri is unique -- no two artifacts have the same URI -- but the implementations did not fully enforce this and usage is not consistent with the original intent.
detail: