From 87d6ac5435728466b25965e04f87b896f2114472 Mon Sep 17 00:00:00 2001 From: Mireille LOUYS <33840665+loumir@users.noreply.github.com> Date: Thu, 10 Oct 2024 19:01:17 +0200 Subject: [PATCH 1/6] Update MANGO.tex section Uses cases rewording and usage of properties instead of parameters when possible . --- doc/MANGO.tex | 111 ++++++++++++++++++++++++-------------------------- 1 file changed, 54 insertions(+), 57 deletions(-) diff --git a/doc/MANGO.tex b/doc/MANGO.tex index ebe1f34..6e28de3 100644 --- a/doc/MANGO.tex +++ b/doc/MANGO.tex @@ -164,29 +164,29 @@ \section{Representing observed astronomical objects : Use Cases and Requirement \subsection{Use Cases} The main purpose of MANGO is to add an upper description level to the tabular data of query responses. -MANGO is not designed to replace the meta-data already present in query responses, +MANGO is not designed to replace the meta-data already present in query responses, but on the contrary, to provide a model_aware layer with structured classes to interpret them and exploit them in client applications. Uses-cases have been collected since 2019 from representatives of various astronomical missions, archive designers and tools developers. -The contribution was totally open. This gave a good picture of the needs but we do not pretend +The call for contribution was totally open. This gave a good picture of the needs but we do not pretend that everything will be supported by this first version. All the use-cases summarized below are detailed in appendix. -\subsubsection{Gaia} -Gaia mission is producing the largest and more precise 3D map of our galaxy. -Gaia core solution is able to solve the astrometric solution of more than 1 +\subsubsection{GAIA} +The GAIA mission is producing the largest and more precise 3D map of our galaxy. +The GAIA Astrometric Core Solution is able to provide the astrometry of more than 1 billion sources by complex models and algorithms \citep{2012A&A...538A..78L}. 
-Using a minimisation problem approach, different detections identified on -different scans can be associated to the same astronomical source. Some of the +Using a minimization problem approach, different detections identified on +different scans can be associated to the appropriate astronomical source. Some of the properties would be direct measurements on single scans (e.g. positions or -magnitudes). Also other properties like radial velocity (measured in redshift +magnitudes). Other properties like radial velocity (measured in redshift units) are also obtained at integration time of the scans. -A non-exhaustive list of properties required for Gaia use cases would be composed +A non-exhaustive list of properties required for GAIA use cases would be composed of: \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item identifier + \item detection identifier \item sky reference position \item proper motion \item parallax and distance @@ -201,18 +201,18 @@ \subsubsection{Gaia} \end{itemize} \subsubsection{Euclid} -Euclid telescope has been designed to unveil some of the questions about the -dark Universe, including dark matter and dark energy, what would include, e.g. +The Euclid telescope has been designed to unveil some of the questions about the +dark Universe, including dark matter and dark energy, what would include, f.i quite accurate measurements of the expansion of the Universe. -Euclid will mainly observe extragalactic objects providing, e.g. information -of the shapes of galaxies, gravitational lensing, baryon acoustic oscillations +Euclid will mainly observe extragalactic objects providing, f.i information +about the shapes of galaxies, gravitational lensing, baryon acoustic oscillations and distances to galaxies using spectroscopic data. For this mission, and apart from the common metadata provided for extra galactic sources into astronomical catalogues, a good support for object taxonomy and shapes of objects will be required. 
As known due to general relativity effects, -shapes far galaxies could be deformed due to gravitational lensing effects, +shapes of far galaxies could be deformed due to gravitational lensing effects, producing convergence (visual displacements on the position) and rear (deformation of the shape) effects. All these metadata should be ready for annotations and, also, correlated to theoretical or real metadata in other datasets. @@ -222,6 +222,8 @@ \subsubsection{Euclid} observatories will be combined with Euclid data to produce consistent scientific datasets. +Typical features for objects entail: + \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] \item identifier \item sky position @@ -231,7 +233,6 @@ \subsubsection{Euclid} \item redshift \item photometric redshift \end{itemize} -From the above contributions we can issue a list of use cases that lead the MANGO design: \subsubsection{Exoplanets} Annotation of (exo-)planetary records in catalogues requires some @@ -347,30 +348,29 @@ \subsubsection{Chandra Archive} \end{enumerate} -These science usecases are detailed in ref{sec:chandra}. +These science usecases are detailed \ref{sec:chandra}. \subsubsection{Vizier catalog archive} -VizieR provides science ready catalogs coming from space agencies or articles and covering number of -different science cases. +VizieR provides science ready catalogs coming from space agencies or articles from the astronomical journals, covering number of different science cases. Published data encompass a very large set of measures (position, photometry, redshift, source type, etc.) depending on their origin. They can result from observations, simulations, models or catalog compilations. Individual Vizier tables can contain data all related to one source (e.g. time series of positions or magnitudes) or to a set of sources (one row per source) or a mix of both. 
The MANGO model must be able to provide a standard representation of most of the metadata contained -in Vizier query responses, whether native or computed by the CDS, -simple quantities or associated complex data. -MANGO is not meant to replace the current management of the meta-data, -it is a way to make those meta-data understandable for a wide panel of VO-compliant clients. +in Vizier query responses, either native or computed by the CDS, and organized either as +simple quantities or as associated complex data. +MANGO is not meant to replace the current management of the VizieR metadata, but rather +to make them understandable and interoperable for a wide panel of VO-compliant clients. \subsubsection{Client (on Mark Taylor behalf)} -Right now, the meta-data provided within the VOTable allow clients such Aladin or Topcat to run most -of the functionalities expected by the user, either for data analysis of plotting. -This information is often guess from UCDs, UTypes or columns name. It can also be given by the user. -Clients have no expectations of working with full model instances but in some cases models -can help to know how quantities in an input table relate to each other. +Right now, the meta-data provided within the VOTable allow client software such as Aladin or Topcat to run most +of the functionalities expected by the user, either for data analysis or plotting. +This information is often inferred from UCDs, UTypes or column names. It can also be given by the user. +Client applications do not require working with full model instances but in some cases models +can make it explicit how quantities in an input table relate to each other. 
-In most cases this is for visualisation, e.g.: +Most cases are oriented towards interpretation of columns for visualization, e.g.: \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] \item what is the sky position for this row (what columns contain latitude and longitude, and what sky system are they in) @@ -390,10 +390,10 @@ \subsubsection{Client (on Mark Taylor behalf)} \item does this table contain sky positions, or HEALPix tiles, or both? What's the best way to represent it on the sky? - \item What is the meaning of such URL found out in a table?s + \item what is the meaning of a URL found in a table? \end{itemize} -But there are some other places too: +But there are some other cases like: \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] \item how do I propagate this sky position to a future epoch (what columns contain pmra, pmdec, and maybe all the @@ -405,16 +405,15 @@ \subsubsection{Client (on Mark Taylor behalf)} \end{itemize} This usage shows that MANGO must be designed in a way that individual measurements or quantities -can be easily be identified as such and manipulated independently of the whole instance. +can easily be identified as such and manipulated independently of the whole instance. - -tTis document does not recommend one approach over another. +This document does not recommend one approach over another. This is a matter for the data providers to decide. \subsubsection{Xmatch tool } The basic cross-match of two astronomical tables consists in associating pairs of sources -- one from each table -- fulfilling a given angular distance based criterion. -In relational algebra terms, it is a theta-join on a distance predicate. +%In relational algebra terms, it is a theta-join on a distance predicate. More generally, a cross-match is the association of sources from different tables given their proximity in an astrometric (but also possibly photometric, statistical, ...) 
parameter @@ -425,16 +424,17 @@ \subsubsection{Xmatch tool } It may also take into account positional uncertainties to reject the statistically unlikely associations. In the latter case (cross-match between two tables taking into account positional errors), -the tool needs to be able to retrieve the errors associated to the each position in each table. +the tool needs to retrieve the errors associated with each position in each table. -UCDs may help in identifying the errors associated to a positional columns as shown in -table, but this is not sufficient to table with more complex cases based on multi-parameter cases. +UCDs may help in identifying the errors associated with positional columns, +%as shown in table +but this is not sufficient for tables with more complex, multi-parameter cases. \subsection{Requirements} From the above list of use-cases, we have identified 4 domains for which -the model should provide added value: 1) supported quantities 2) data description enhancement, +the model should provide added value: 1) nature of supported quantities 2) data description enhancement, 3) description of quantities consisting of several columns and 4) connected quantities. %\begin{itemize} @@ -478,10 +478,9 @@ \subsection{Requirements} \begin{itemize} \item Supported quantities: \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item The nature and number of properties characterising a MANGO object must be open. - - \item MANGO must support explicit classes, native or imported from IVOA data-models, - for the most used properties. + \item The nature and number of properties characterizing a MANGO object must be open. + \item MANGO must support explicit classes, native or imported from IVOA data models, + for the most used astronomical properties. \item MANGO must provide a generic way to support properties that do not enter the above category. \item MANGO object must support multiple instances of the same property class. 
\item The presence of any property in MANGO instances must be optional. @@ -493,13 +492,13 @@ \subsection{Requirements} \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] \item MANGO must support a convenient way to identify model instances. \item MANGO must be able to attach relevant coordinate (or calibration) - systems to any quantities. - \item MANGO must be able to attach complex errors to any numerical quantities. - \item MANGO must be able to define semantics for any quantity or group of quantities. - This will add a capability that is currently missing from the VOTables schema. + systems to any measured quantity. + \item MANGO must be able to attach complex errors to any numerical quantity. + \item MANGO must be able to define semantics for any quantity or group of quantities. \\ + This will add a capability that is currently missing from the VOTable schema. This will also make it possible to specify the role of quantities - that are present more than once, for example by distinguishing between a pointing direction - and the target position. + that are present more than once, for example by distinguishing between a pointing position on the sky + and a target position. \item MANGO must be able to specify the set of allowed values for quantities which purpose is to flag data (e.g. detection flag). It must also be able to provide a description for each of these values. This model feature will provide a straightforward way of providing users the meaning of flag values. @@ -510,27 +509,25 @@ \subsection{Requirements} \item Multi-columns quantities \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item MANGO must be able to provide an accurate description of quantities which parameters are spread + \item MANGO must be able to provide an accurate description of properties whose attributes are spread out on multiple columns (e.g. positions, errors). 
\item MANGO must be able to describe errors with the most common shapes (symmetric values, correlation or covariance matrices, ellipses), all with different confidence levels. Such complex quantities cannot be reconstructed from simple field descriptions, but with a model that captures all the components and provides the missing metadata; - \item MANGO must be able to set up correlation links between parameters. For example, + \item MANGO must be able to set up correlation links between properties. For example, the position of an object may depend on its proper motion. This kind of correlation can be revealed with a model that can link data columns. \item MANGO must provide an accurate description of the epoch propagation. - This is probably the most important use case for MANGO. It consists of constructing 6 parameter - position vectors (position, proper motion, parallax and radial velocity), whose components are correlated and - valid for a given epoch. - This feature is required to compare positions given by surveys with high astrometry accuracy such as GAIA. + This is probably the most important use case for MANGO. It consists of constructing 6-parameter position vectors (position, proper motion, parallax and radial velocity), whose components are correlated and valid for a given epoch. + This feature is required to compare positions given by surveys with high astrometric accuracy such as GAIA. \end{itemize} \item Connected quantities : \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item MANGO must be able to setup links between different parameters of the same table. + \item MANGO must be able to set up links between different properties of the same table. This can be relevant for instance for attaching detection likelihoods with source positions - or to tag properties with timestamps? + or to tag properties with timestamps. 
\item MANGO must be able to link MANGO instances to each other, allowing for instance to connect one source with all of its detections. \end{itemize} From cb850029c6af8a1a7c5d0ca872794a3e0f66eb78 Mon Sep 17 00:00:00 2001 From: Laurent MICHEL Date: Thu, 17 Oct 2024 14:30:19 +0200 Subject: [PATCH 2/6] remove commented block --- doc/MANGO.tex | 37 ------------------------------------- 1 file changed, 37 deletions(-) diff --git a/doc/MANGO.tex b/doc/MANGO.tex index 202eb73..a639cce 100644 --- a/doc/MANGO.tex +++ b/doc/MANGO.tex @@ -437,43 +437,6 @@ \subsection{Requirements} the model should provide added value: 1) nature of supported quantities 2) data description enhancement, 3) description of quantities consisting of several columns and 4) connected quantities. -%\begin{itemize} -% \item Supported properties: -% \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] -% \item COORDINATE SYSTEM : attach a specific coordinate (or calibration) -% system to mapped quantities. -% \item SEMANTIC : Defining a semantic for mapped quantities adds a capability that is currently -% missing from the VOTables schema. This also makes it possible to specify the role of quantities -% that are present more than once, for example by distinguishing between a pointing direction -% and the target position. -% \item FLAG VALUE : Some quantities come with quality flags, the interpretation of which requires inference -% to a free text description. The model can provide a straightforward way of telling the user -% what the current value means. -% \item DATA ORIGIN : -% \item DATA LINK : Some quantities come with links to external data referenced by WEB endpoints. -% Such links are considered as object properties for which the model provides -% an accurate way to specify the nature of these links. 
-% Usually object links are provided by DataLink services, -% then this MANGO feature is proposed to annotate datasets issued by services -% \end{itemize} -% \item Multi-columns quantities -% \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] -% \item ERROR : Errors can have different shapes (symmetric values, correlation or covariance matrices, ellipses), -% all with different confidence levels. Such complex quantities cannot be reconstructed from -% simple field descriptions, but with a model that captures all the components -% and provides the missing metadata; -% \item QUANTITY CORRELATIONS : In some cases, quantities can be correlated. For example, -% the position of an object may depend on its proper motion. This kind of correlation can be revealed -% with a model that can link data columns. -% \item EPOCH PROPAGATION: This is probably the most important use case for MANGO. It consists of constructing 6 parameter -% position vectors (position, proper motion, parallax and radial velocity), whose components are correlated and -% valid for a given epoch. This feature is required to compare positions of high precision astrometry surveys such as GAIA. -% \end{itemize} -% \item Connected quantities : There are several ways to link quantities. Quantities in the same table can be -% such as values with their errors or with their associated probabilities. We can also join quantities from different -% tables, such as sources with their detections. Both patterns require a model to be properly exposed. 
-% -%\end{itemize} \begin{itemize} \item Supported quantities: From 17785f16c58c4973ec7b56b44b6c4775539a730d Mon Sep 17 00:00:00 2001 From: Laurent MICHEL Date: Thu, 17 Oct 2024 15:39:11 +0200 Subject: [PATCH 3/6] Add Ian Evans input --- doc/MANGO.tex | 138 ++++++++++++++++++++------------------------------ 1 file changed, 56 insertions(+), 82 deletions(-) diff --git a/doc/MANGO.tex b/doc/MANGO.tex index a639cce..39cb329 100644 --- a/doc/MANGO.tex +++ b/doc/MANGO.tex @@ -164,7 +164,8 @@ \section{Representing observed astronomical objects : Use Cases and Requirement \subsection{Use Cases} The main purpose of MANGO is to add an upper description level to the tabular data of query responses. -MANGO is not designed to replace the meta-data already present in query responses, but on the contrary, to provide a model_aware layer with structured classes to interpret them and exploit them in client applications. +MANGO is not designed to replace the meta-data already present in query responses, but on the contrary, +to provide a model-aware layer with structured classes to interpret them and exploit them in client applications. Uses-cases have been collected since 2019 from representatives of various astronomical missions, archive designers and tools developers. @@ -268,87 +269,60 @@ \subsubsection{Morphologically Complex Structures} \item aggregation of sub-parts (that can be heterogeneous). \end{itemize} -\subsubsection{Chandra Archive} -The Chandra Source Catalog(CSC) is the definitive catalog of serendipitous X-ray sources identified in -publicly released imaging observations obtained by NASA’s ChandraX-ray Observatory (CXO). - -The catalog itself consists of approximately 1,700 columns covering properties at the -individual observation and stacked analysis levels. -Table \ref{tab:chandra_properties} summarizes some of the basic catalog properties derived -from standard CSCView queries. - -\begin{table}[ht!] 
- \tiny - \begin{tabular}{|p{0.4cm}p{10.0cm}|} - \hline - \multicolumn{2}{|l|}{\textbf{Per Source:}} \\ - & \texttt{ Source name } \\ - & \texttt{ Source position and position errors } \\ - & \texttt{ Significance of the source (signal to noise) } \\ - & \texttt{ Likelihood of the source (True, False, or Marginal detection) } \\ - & \texttt{ Source extent flag } \\ - & \texttt{ Variability flag } \\ - & \texttt{ Spectral variability flag } \\ - & \texttt{ Fluxes and flux errors in ACIS bands b, h, m, s, u } \\ - & \texttt{ Flux and flux error in HRC band w } \\ - & \texttt{ Hardness ratios and errors for hm, hs, ms colors } \\ - & \texttt{ Short term (intra-obs) variability probability for each band } \\ - & \texttt{ Long term (inter-obs) variability probability for each band } \\ - & \texttt{ Spectral (hardness ratios) variability for each color } \\ - \hline - \multicolumn{2}{|l|}{\textbf{Per Detection (at the stack level):}} \\ - & \texttt{ Detection ID } \\ - & \texttt{ Detection position and position errors } \\ - & \texttt{ Flux significance of the detection (S/N) } \\ - & \texttt{ Detection likelihood (True, False, or marginal detection?) 
} \\ - & \texttt{ Source extent code (codification of source extent in different bands) } \\ - & \texttt{ Variability flag } \\ - & \texttt{ Spectral variability flag } \\ - & \texttt{ Fluxes and flux errors in ACIS bands b, h, m, s, u } \\ - & \texttt{ Flux and flux error in HRC band w } \\ - & \texttt{ Hardness ratios and errors for hm, hs, ms colors } \\ - & \texttt{ Short term (intra-obs) variability probability for each band } \\ - & \texttt{ Long term (inter-obs) variability probability for each band } \\ - & \texttt{ Spectral (hardness ratios) variability probability for each color } \\ - \hline - \multicolumn{2}{|l|}{\textbf{Per Detection (at the observation level):}} \\ - \multicolumn{2}{|l|}{ Note that source detection is done at the stack level, but properties are estimated for the } \\ - \multicolumn{2}{|l|}{detections at each observation using the detection region from the stack level.} \\ - & \texttt{ Detection ID } \\ - & \texttt{ Detection position and position errors } \\ - & \texttt{ Flux significance of the detection (S/N) } \\ - & \texttt{ Detection likelihood } \\ - & \texttt{ Source extent code (codification of source extent in different bands) } \\ - & \texttt{ Variability code (applies to intra-obs only) } \\ - & \texttt{ Fluxes and flux errors in ACIS bands b, h, m, s, u } \\ - & \texttt{ Flux and flux error in HRC band w } \\ - & \texttt{ Hardness ratios and errors for hm, hs, ms colors } \\ - & \texttt{ Short term (intra-obs) variability probability for each band } \\ - \hline - - \end{tabular} - \caption{ Example Chandra Source Catalog Properties } - \label{tab:chandra_properties} - \end{table} - - -\begin{enumerate}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item Searching for spectrally variable or flaring point sources - \item Identifying flaring point sources - \item Find sources with changing properties: Look for sources with changes of spectral - slope and column density between observations so as function of time; - this 
can easily be done across X-ray catalogs provided that the same spectral model (absorbed power-law) - is used in the different catalogs. The changes in spectral slope and column density are measured - in sigma using the errors as well on each quantity to evaluate the statistical significance of the changes. - \item Finding Tidal Disruption Events in the CSC - \item Quick, rough identification of AGN, galaxies, and stars - \item Follow-up research - \item Spectral decomposition of X-ray sources - \item Using CSC 2.0 data to create Color-Color-Intensity plots(CCI) \item Using CSC 2.0 data to create Color-Color-Intensity plots(CCI) - -\end{enumerate} - -These science usecases are detailed \ref{sec:chandra}. +\subsubsection{X-ray Observatory Archives} + +The requirements for both Chandra (get more in appendix \ref{sec:chandra}) +and XMM-Newton \footnote{https://www.cosmos.esa.int/web/xmm-newton} science cases +are combined in this use case. +These 2 X-ray observatories have many common features that could take advantage of sharing the same model: + +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item Both work as photon counters with a good time resolution. + The result is that physical quantities remain tied to the instrument response. + Therefore, the metadata must refer to instrumental parameters that are needed to + understand the data well. + \item Both observatories work in pointed mode and provide the community with sets of products per observation. + \item Observation-level data are periodically merged into catalog of detections, + which is a very important scientific product, + but individual observations are equally important and are used directly for analysis. + \item Detection catalogs are merged into source catalogs, and it is important to be able to + associate sources with their detections. + \item Equally important, given the more than 2 decades that both spacecraft are flying, + is the ability to correlate catalog data with time. 
+ \item X-ray data reveal quantities that are usually not well supported by the VO: + \begin{itemize} + \item energy bands + \item hardness ratio + \item Flags that are very important for understanding the source detections. + \item Complex errors (asymmetric, ellipse) + \item model-based data (flux, spectra) + \end{itemize} + \item X-ray data are often analyzed in conjunction with data from other domains, + This is made easier if they all have the same way of describing the quantities of interest. +\end{itemize} + +% Ian E. Mail (17/10)======================== + +% The CSC does provide independent lower and upper confidence limits for each measurement as part of the data tables. +% They are separate columns for us (eg, we have measurement, measurement_lolim, and measurement_hilim as 3 columns) +% but I wonder if these can be handled as a single concept in MANGO? +% For positions on the sky we similarly use a position error ellipse with defined semi-axes and orientation. + +% The one other thing I think about is that we often have many very similar measurements in the CSC. +% For example, for aperture photometry, we have multiple energy bands, +% and in each energy band we measure flux in the aperture in multiple ways +% (eg, photon flux, energy flux, model energy flux (based on several canonical +% spectral models such as absorbed power-law, absorbed black-body, …), +% spectral fit energy flux (based on several spectral models where the parameters are +% fitted to the data - requires more counts to get robust fits). +% And we may do this for multiple configurations of individual observations of a source +% (eg, a straight average - usually for comparison with other catalogs, or a set based on a multi-band Bayesian +% Blocks analysis - +% so we’re grouping observations in which the source has constant flux in each of the energy bands). 
+% How we represent these many different types of very similar measurements in a way that is scientifically useful +% and searchable is complex. Can this even be done usefully using UCDs? + +% ============================================ \subsubsection{Vizier catalog archive} VizieR provides science ready catalogs coming from space agencies or articles from the astronomical journals, covering number of different science cases. From a3e07ab02aace789805127b5e0f7de38b7b0d734 Mon Sep 17 00:00:00 2001 From: Laurent MICHEL Date: Thu, 17 Oct 2024 15:39:19 +0200 Subject: [PATCH 4/6] data --- doc/Makefile | 2 +- doc/ivoatexmeta.tex | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/Makefile b/doc/Makefile index d9beb35..f9e568e 100644 --- a/doc/Makefile +++ b/doc/Makefile @@ -7,7 +7,7 @@ DOCNAME = MANGO DOCVERSION = 0.1 # Publication date, ISO format; update manually for "releases" -DOCDATE = 2024-10-10 +DOCDATE = 2024-10-17 # What is it you're writing: NOTE, WD, PR, REC, PEN, or EN DOCTYPE = WD diff --git a/doc/ivoatexmeta.tex b/doc/ivoatexmeta.tex index 7be4a02..5fedb98 100644 --- a/doc/ivoatexmeta.tex +++ b/doc/ivoatexmeta.tex @@ -1,7 +1,7 @@ % GENERATED FILE -- edit this in the Makefile \newcommand{\ivoaDocversion}{0.1} -\newcommand{\ivoaDocdate}{2024-10-10} -\newcommand{\ivoaDocdatecode}{20241010} +\newcommand{\ivoaDocdate}{2024-10-17} +\newcommand{\ivoaDocdatecode}{20241017} \newcommand{\ivoaDoctype}{WD} \newcommand{\ivoaDocname}{MANGO} \renewcommand{\ivoaBaseURL}{https://www.ivoa.net/documents/MANGO} From be0b9d14ba2463f40d0c754c66016416b51767a3 Mon Sep 17 00:00:00 2001 From: Laurent MICHEL Date: Thu, 17 Oct 2024 15:51:19 +0200 Subject: [PATCH 5/6] put use cases in a separate file --- doc/MANGO.tex | 243 +---------------------------------------------- doc/usecases.tex | 240 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 241 insertions(+), 242 deletions(-) create mode 100644 doc/usecases.tex diff --git a/doc/MANGO.tex 
b/doc/MANGO.tex index 39cb329..8a88d8a 100644 --- a/doc/MANGO.tex +++ b/doc/MANGO.tex @@ -163,248 +163,7 @@ \section{Representing observed astronomical objects : Use Cases and Requirement \subsection{Use Cases} -The main purpose of MANGO is to add an upper description level to the tabular data of query responses. -MANGO is not designed to replace the meta-data already present in query responses, but on the contrary, -to provide a model-aware layer with structured classes to interpret them and exploit them in client applications. - -Uses-cases have been collected since 2019 from representatives of various astronomical -missions, archive designers and tools developers. -The call for contribution was totally open. This gave a good picture of the needs but we do not pretend -that everything will be supported by this first version. -All the use-cases summarized below are detailed in appendix. - -\subsubsection{GAIA} -The GAIA mission is producing the largest and more precise 3D map of our galaxy. -The GAIA Astrometric Core Solution is able to provide the astrometry of more than 1 -billion sources by complex models and algorithms \citep{2012A&A...538A..78L}. -Using a minimization problem approach, different detections identified on -different scans can be associated to the appropriate astronomical source. Some of the -properties would be direct measurements on single scans (e.g. positions or -magnitudes). Other properties like radial velocity (measured in redshift -units) are also obtained at integration time of the scans. 
- -A non-exhaustive list of properties required for GAIA use cases would be composed -of: - -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item detection identifier - \item sky reference position - \item proper motion - \item parallax and distance - - \item source extension - \item radial velocity - \item redshift - \item photometry - \item date of detection - \item correlation - \item multiple detection -\end{itemize} - -\subsubsection{Euclid} -The Euclid telescope has been designed to unveil some of the questions about the -dark Universe, including dark matter and dark energy, what would include, f.i -quite accurate measurements of the expansion of the Universe. - -Euclid will mainly observe extragalactic objects providing, f.i information -about the shapes of galaxies, gravitational lensing, baryon acoustic oscillations -and distances to galaxies using spectroscopic data. - -For this mission, and apart from the common metadata provided for extra galactic -sources into astronomical catalogues, a good support for object taxonomy and -shapes of objects will be required. As known due to general relativity effects, -shapes of far galaxies could be deformed due to gravitational lensing effects, -producing convergence (visual displacements on the position) and rear (deformation -of the shape) effects. All these metadata should be ready for annotations and, -also, correlated to theoretical or real metadata in other datasets. - -Finally, crossmatch information with other catalogues will be of crucial interest -as data from other satellites and, more importantly, from ground based -observatories will be combined with Euclid data to produce consistent scientific -datasets. 
- -Typical features for objects entail: - -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item identifier - \item sky position - \item correlation with other catalogues - \item photometry (ground + satellite ) - \item morphology class - \item redshift - \item photometric redshift -\end{itemize} - -\subsubsection{Exoplanets} -Annotation of (exo-)planetary records in catalogues requires some -specific metadata or model. - -The use cases identified requires the following metadata: -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item the degree of confidence in the detection: exoplanets candidates -w.r.t. confirmed ones, plus last update of the record content ; - \item the method used in the discovery (since it affects the available -stellar system description parameters); - \item a set of stellar host characteristics (besides sky coordinates): -activity, mass, type, -metallicity, age, some systemic values, like the global RV (radial -velocity) of the system, and so on; - \item (exo-)planet parameters, like mass, orbital period, orbit's -eccentricity, RV semi-amplitude, time at periastron (for RV detections) -or central transit time (for transit method), longitude of periastron, -and so on. -\end{itemize} - - -\subsubsection{Morphologically Complex Structures} -The ViaLactea Knowledge Base (VLKB, see \cite{2016SPIE.9913E..0HM}) is a set of data -resources and services built up to study the star formation regions and -processes in the Milky Way. Besides 2-D images and 3-D radial velocity -cubes, the VLKB exposes a bunch of source catalogues. -A model that supports description of such catalogues will need a -way to describe sources with: -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item non-point-like positions; - \item extended complex area, possibly as multiple detached areas; - \item aggregation of sub-parts (that can be heterogeneous). 
-\end{itemize} - -\subsubsection{X-ray Observatory Archives} - -The requirements for both Chandra (get more in appendix \ref{sec:chandra}) -and XMM-Newton \footnote{https://www.cosmos.esa.int/web/xmm-newton} science cases -are combined in this use case. -These 2 X-ray observatories have many common features that could take advantage of sharing the same model: - -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item Both work as photon counters with a good time resolution. - The result is that physical quantities remain tied to the instrument response. - Therefore, the metadata must refer to instrumental parameters that are needed to - understand the data well. - \item Both observatories work in pointed mode and provide the community with sets of products per observation. - \item Observation-level data are periodically merged into catalog of detections, - which is a very important scientific product, - but individual observations are equally important and are used directly for analysis. - \item Detection catalogs are merged into source catalogs, and it is important to be able to - associate sources with their detections. - \item Equally important, given the more than 2 decades that both spacecraft are flying, - is the ability to correlate catalog data with time. - \item X-ray data reveal quantities that are usually not well supported by the VO: - \begin{itemize} - \item energy bands - \item hardness ratio - \item Flags that are very important for understanding the source detections. - \item Complex errors (asymmetric, ellipse) - \item model-based data (flux, spectra) - \end{itemize} - \item X-ray data are often analyzed in conjunction with data from other domains, - This is made easier if they all have the same way of describing the quantities of interest. -\end{itemize} - -% Ian E. Mail (17/10)======================== - -% The CSC does provide independent lower and upper confidence limits for each measurement as part of the data tables. 
-% They are separate columns for us (eg, we have measurement, measurement_lolim, and measurement_hilim as 3 columns) -% but I wonder if these can be handled as a single concept in MANGO? -% For positions on the sky we similarly use a position error ellipse with defined semi-axes and orientation. - -% The one other thing I think about is that we often have many very similar measurements in the CSC. -% For example, for aperture photometry, we have multiple energy bands, -% and in each energy band we measure flux in the aperture in multiple ways -% (eg, photon flux, energy flux, model energy flux (based on several canonical -% spectral models such as absorbed power-law, absorbed black-body, …), -% spectral fit energy flux (based on several spectral models where the parameters are -% fitted to the data - requires more counts to get robust fits). -% And we may do this for multiple configurations of individual observations of a source -% (eg, a straight average - usually for comparison with other catalogs, or a set based on a multi-band Bayesian -% Blocks analysis - -% so we’re grouping observations in which the source has constant flux in each of the energy bands). -% How we represent these many different types of very similar measurements in a way that is scientifically useful -% and searchable is complex. Can this even be done usefully using UCDs? - -% ============================================ - -\subsubsection{Vizier catalog archive} -VizieR provides science ready catalogs coming from space agencies or articles from the astronomical journals, covering number of different science cases. -Published data encompass a very large set of measures (position, photometry, redshift, source type, etc.) -depending on their origin. -They can result from observations, simulations, models or catalog compilations. -Individual Vizier tables can contain data all related to one source (e.g. 
time series of positions or magnitudes) or to a set of sources (one row per source) or a mix of both. - -The MANGO model must be able to provide a standard representation of most of the metadata contained -in Vizier query responses, either native or computed by the CDS, and organized either as -simple quantities or as associated complex data. -MANGO is not meant to replace the current management of the ViZier metadata, but rather -to make those understandable/interoperable for a wide panel of VO-compliant clients. - -\subsubsection{Client (on Mark Taylor behalf)} -Right now, the meta-data provided within the VOTable allow client software such Aladin or Topcat to run most -of the functionalities expected by the user, either for data analysis or plotting. -This information is often inferred from UCDs, UTypes or column names. It can also be given by the user. -Client applications do not require working with full model instances but in some cases models -can make it explicit how quantities in an input table relate to each other. - -Most cases are oriented towards interpretation of columns for visualization, e.g.: -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item what is the sky position for this row - (what columns contain latitude and longitude, and what sky system are they in) - - \item what +/-ERR error bars should I plot for these points - (what column is a simple error for column A) - - \item what error ellipses should I plot for these sky positions - (what columns provide ra\_error, dec\_error, ra\_dec\_corr, - or how can I derive those from columns that do exist) - - \item where do I get the grid information for a column containing - a vector of samples so I can label the X axis of a spectrogram - (what column or parameter contains an axis vector matching - the sample vectors) - - \item does this table contain sky positions, or HEALPix tiles, or both? - What's the best way to represent it on the sky? 
- - \item what is the meaning of such URL found out in a table?s -\end{itemize} - -But there are some other cases like: -\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] - \item how do I propagate this sky position to a future epoch - (what columns contain pmra, pmdec, and maybe all the - associated errors and correlation coefficients) - - \item what is the error ellipse/oid to use for a sky/Cartesian crossmatch - (what columns provide the relevant errors and, if available, - correlations) -\end{itemize} - -This usage shows that MANGO must be designed in a way that individual measurements or quantities -can easily be identified as such and manipulated independently of the whole instance. - -This document does not recommend one approach over another. -This is a matter for the data providers to decide. - -\subsubsection{Xmatch tool } -The basic cross-match of two astronomical tables consists in associating pairs of sources -- one -from each table -- fulfilling a given angular distance based criterion. -%In relational algebra terms, it is a theta-join on a distance predicate. - -More generally, a cross-match is the association of sources from different tables given their -proximity in an astrometric (but also possibly photometric, statistical, ...) parameter -space \citep{2017A&A...597A..89P} . - -If proper motions (plus parallax and radial velocities) are available, the cross-match tool -may propagate the positions of each table to a common epoch. -It may also take into account positional uncertainties to reject the statistically unlikely associations. - -In the latter case (cross-match between two tables taking into account positional errors), -the tool needs to retrieve the errors associated to the each position in each table. - -UCDs may help in identifying the errors associated to a positional columns, -%as shown in table -but this is not sufficient for tables with more complex cases based on multi-parameter cases. 
- - +\input{usecases.tex} \subsection{Requirements} From the above list of use-cases, we have identified 4 domains for which diff --git a/doc/usecases.tex b/doc/usecases.tex new file mode 100644 index 0000000..bc1df9c --- /dev/null +++ b/doc/usecases.tex @@ -0,0 +1,240 @@ +The main purpose of MANGO is to add an upper description level to the tabular data of query responses. +MANGO is not designed to replace the meta-data already present in query responses, but on the contrary, +to provide a model-aware layer with structured classes to interpret them and exploit them in client applications. + +Use cases have been collected since 2019 from representatives of various astronomical +missions, archive designers and tool developers. +The call for contributions was totally open. This gave a good picture of the needs but we do not claim +that everything will be supported by this first version. +All the use cases summarized below are detailed in the appendix. + +\subsubsection{GAIA} +The GAIA mission is producing the largest and most precise 3D map of our galaxy. +The GAIA Astrometric Core Solution is able to provide the astrometry of more than 1 +billion sources using complex models and algorithms \citep{2012A&A...538A..78L}. +Using a minimization problem approach, different detections identified on +different scans can be associated with the appropriate astronomical source. Some of the +properties would be direct measurements on single scans (e.g. positions or +magnitudes). Other properties like radial velocity (measured in redshift +units) are also obtained at integration time of the scans.
+ +A non-exhaustive list of properties required for GAIA use cases would be composed +of: + +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item detection identifier + \item sky reference position + \item proper motion + \item parallax and distance + + \item source extension + \item radial velocity + \item redshift + \item photometry + \item date of detection + \item correlation + \item multiple detection +\end{itemize} + +\subsubsection{Euclid} +The Euclid telescope has been designed to address some of the open questions about the +dark Universe, including dark matter and dark energy, which would include, for instance, +quite accurate measurements of the expansion of the Universe. + +Euclid will mainly observe extragalactic objects providing, for instance, information +about the shapes of galaxies, gravitational lensing, baryon acoustic oscillations +and distances to galaxies using spectroscopic data. + +For this mission, and apart from the common metadata provided for extragalactic +sources in astronomical catalogues, good support for object taxonomy and +shapes of objects will be required. As predicted by general relativity, +the shapes of distant galaxies can be deformed by gravitational lensing, +producing convergence (visual displacements on the position) and shear (deformation +of the shape) effects. All these metadata should be ready for annotations and, +also, correlated to theoretical or real metadata in other datasets. + +Finally, crossmatch information with other catalogues will be of crucial interest +as data from other satellites and, more importantly, from ground-based +observatories will be combined with Euclid data to produce consistent scientific +datasets.
+ +Typical object features include: + +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item identifier + \item sky position + \item correlation with other catalogues + \item photometry (ground + satellite) + \item morphology class + \item redshift + \item photometric redshift +\end{itemize} + +\subsubsection{Exoplanets} +Annotation of (exo-)planetary records in catalogues requires some +specific metadata or a dedicated model. + +The identified use cases require the following metadata: +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item the degree of confidence in the detection: exoplanet candidates +versus confirmed ones, plus the last update of the record content; + \item the method used in the discovery (since it affects the available +stellar system description parameters); + \item a set of stellar host characteristics (besides sky coordinates): +activity, mass, type, +metallicity, age, some systemic values, like the global RV (radial +velocity) of the system, and so on; + \item (exo-)planet parameters, like mass, orbital period, orbit's +eccentricity, RV semi-amplitude, time at periastron (for RV detections) +or central transit time (for transit method), longitude of periastron, +and so on. +\end{itemize} + + +\subsubsection{Morphologically Complex Structures} +The ViaLactea Knowledge Base (VLKB, see \cite{2016SPIE.9913E..0HM}) is a set of data +resources and services built to study star formation regions and +processes in the Milky Way. Besides 2-D images and 3-D radial velocity +cubes, the VLKB exposes a number of source catalogues. +A model that supports description of such catalogues will need a +way to describe sources with: +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item non-point-like positions; + \item extended complex area, possibly as multiple detached areas; + \item aggregation of sub-parts (that can be heterogeneous).
+\end{itemize} + +\subsubsection{X-ray Observatory Archives} + +The requirements for both Chandra (see appendix \ref{sec:chandra}) +and XMM-Newton\footnote{https://www.cosmos.esa.int/web/xmm-newton} science cases +are combined in this use case. +These two X-ray observatories have many common features and could benefit from sharing the same model: + +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item Both work as photon counters with a good time resolution. + The result is that physical quantities remain tied to the instrument response. + Therefore, the metadata must refer to instrumental parameters that are needed to + understand the data well. + \item Both observatories work in pointed mode and provide the community with sets of products per observation. + \item Observation-level data are periodically merged into catalogs of detections, + which are very important scientific products, + but individual observations are equally important and are used directly for analysis. + \item Detection catalogs are merged into source catalogs, and it is important to be able to + associate sources with their detections. + \item Equally important, given that both spacecraft have been flying for more than two decades, + is the ability to correlate catalog data with time. + \item X-ray data reveal quantities that are usually not well supported by the VO: + \begin{itemize} + \item energy bands + \item hardness ratio + \item flags that are very important for understanding the source detections + \item complex errors (asymmetric, ellipse) + \item model-based data (flux, spectra) + \end{itemize} + \item X-ray data are often analyzed in conjunction with data from other domains. + This is made easier if they all have the same way of describing the quantities of interest. +\end{itemize} + +% Ian E. Mail (17/10)======================== + +% The CSC does provide independent lower and upper confidence limits for each measurement as part of the data tables.
+% They are separate columns for us (eg, we have measurement, measurement_lolim, and measurement_hilim as 3 columns) +% but I wonder if these can be handled as a single concept in MANGO? +% For positions on the sky we similarly use a position error ellipse with defined semi-axes and orientation. + +% The one other thing I think about is that we often have many very similar measurements in the CSC. +% For example, for aperture photometry, we have multiple energy bands, +% and in each energy band we measure flux in the aperture in multiple ways +% (eg, photon flux, energy flux, model energy flux (based on several canonical +% spectral models such as absorbed power-law, absorbed black-body, …), +% spectral fit energy flux (based on several spectral models where the parameters are +% fitted to the data - requires more counts to get robust fits). +% And we may do this for multiple configurations of individual observations of a source +% (eg, a straight average - usually for comparison with other catalogs, or a set based on a multi-band Bayesian +% Blocks analysis - +% so we’re grouping observations in which the source has constant flux in each of the energy bands). +% How we represent these many different types of very similar measurements in a way that is scientifically useful +% and searchable is complex. Can this even be done usefully using UCDs? + +% ============================================ + +\subsubsection{VizieR catalog archive} +VizieR provides science-ready catalogs coming from space agencies or from articles in astronomical journals, covering a number of different science cases. +Published data encompass a very large set of measures (position, photometry, redshift, source type, etc.) +depending on their origin. +They can result from observations, simulations, models or catalog compilations. +Individual VizieR tables can contain data all related to one source (e.g.
time series of positions or magnitudes) or to a set of sources (one row per source) or a mix of both. + +The MANGO model must be able to provide a standard representation of most of the metadata contained +in VizieR query responses, either native or computed by the CDS, and organized either as +simple quantities or as associated complex data. +MANGO is not meant to replace the current management of the VizieR metadata, but rather +to make those understandable/interoperable for a wide range of VO-compliant clients. + +\subsubsection{Client (on Mark Taylor's behalf)} +Right now, the meta-data provided within the VOTable allow client software such as Aladin or TOPCAT to run most +of the functionalities expected by the user, either for data analysis or plotting. +This information is often inferred from UCDs, UTypes or column names. It can also be given by the user. +Client applications do not require working with full model instances but in some cases models +can make it explicit how quantities in an input table relate to each other. + +Most cases are oriented towards interpretation of columns for visualization, e.g.: +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item what is the sky position for this row + (what columns contain latitude and longitude, and what sky system are they in) + + \item what +/-ERR error bars should I plot for these points + (what column is a simple error for column A) + + \item what error ellipses should I plot for these sky positions + (what columns provide ra\_error, dec\_error, ra\_dec\_corr, + or how can I derive those from columns that do exist) + + \item where do I get the grid information for a column containing + a vector of samples so I can label the X axis of a spectrogram + (what column or parameter contains an axis vector matching + the sample vectors) + + \item does this table contain sky positions, or HEALPix tiles, or both? + What's the best way to represent it on the sky?
+ + \item what is the meaning of a URL found in a table? +\end{itemize} + +But there are some other cases like: +\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] + \item how do I propagate this sky position to a future epoch + (what columns contain pmra, pmdec, and maybe all the + associated errors and correlation coefficients) + + \item what is the error ellipse/oid to use for a sky/Cartesian crossmatch + (what columns provide the relevant errors and, if available, + correlations) +\end{itemize} + +This usage shows that MANGO must be designed in a way that individual measurements or quantities +can easily be identified as such and manipulated independently of the whole instance. + +This document does not recommend one approach over another. +This is a matter for the data providers to decide. + +\subsubsection{Xmatch tool} +The basic cross-match of two astronomical tables consists in associating pairs of sources -- one +from each table -- fulfilling a given angular distance based criterion. +%In relational algebra terms, it is a theta-join on a distance predicate. + +More generally, a cross-match is the association of sources from different tables given their +proximity in an astrometric (but also possibly photometric, statistical, ...) parameter +space \citep{2017A&A...597A..89P}. + +If proper motions (plus parallax and radial velocities) are available, the cross-match tool +may propagate the positions of each table to a common epoch. +It may also take into account positional uncertainties to reject statistically unlikely associations. + +In the latter case (cross-match between two tables taking into account positional errors), +the tool needs to retrieve the errors associated with each position in each table. + +UCDs may help in identifying the errors associated with positional columns, +%as shown in table +but this is not sufficient for tables with more complex, multi-parameter cases.
From 12ebab5360ac686ee492e751e3cfa9fb3a630632 Mon Sep 17 00:00:00 2001 From: Laurent MICHEL Date: Thu, 17 Oct 2024 15:51:42 +0200 Subject: [PATCH 6/6] layout --- doc/MANGO.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/MANGO.tex b/doc/MANGO.tex index 8a88d8a..afd1a06 100644 --- a/doc/MANGO.tex +++ b/doc/MANGO.tex @@ -162,8 +162,8 @@ \subsection{Role within the VO Architecture} \section{Representing observed astronomical objects : Use Cases and Requirements} \subsection{Use Cases} - \input{usecases.tex} + \subsection{Requirements} From the above list of use-cases, we have identified 4 domains for which