
HADatAc User Guide


Getting Started

This page provides instructions and links on how to use HADatAc for the first time. These instructions progress from installing and setting up the infrastructure, to understanding its initial page, to submitting data and metadata for ingestion, to searching ingested data, and eventually to selecting and downloading data and metadata.

Installing HADatAc

When installing HADatAc for the first time, we strongly recommend installing a development version, which can run on either Linux (as described in Section 1.2) or MacOS (as described in Section 1.3).

Setting Up HADatAc

Correct configuration is required for HADatAc to connect to its repositories, to authenticate new users through email, to adjust the infrastructure to operating system requirements, and much more. Section 2.1. describes the configuration files and their use.

HADatAc's Knowledge Graph (KG) is the main element of the infrastructure, allowing it to manage files, messages, and their content. Section 2.2. describes the bootstrapping process required to initialize the KG.

HADatAc should be ready to use after its elements have been installed, the main software components have been configured, and its KG has been bootstrapped.


Using HADatAc

Getting Data and Metadata In

Data files (files with names starting with the DA- prefix), data messages (broadcast over the web through an IP address), and metadata files (files with names starting with any of the following prefixes: DPL, STD, SSD, SDD, or STR) are used to feed content to HADatAc. Section 3.2. describes the basic way to manually submit files for ingestion.
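
As a hedged illustration of the file-naming convention above, the short sketch below sorts candidate submission files by their prefix. The prefixes are the ones listed in this paragraph; the helper function and example file names are hypothetical and not part of HADatAc itself.

```python
# Illustrative sketch only: the prefixes come from this guide, but the helper
# and the example file names are hypothetical, not part of HADatAc.
METADATA_PREFIXES = ("DPL", "STD", "SSD", "SDD", "STR")

def classify_submission(filename: str) -> str:
    """Return a rough submission type based on the file-name prefix."""
    if filename.startswith("DA-"):
        return "data file"
    if filename.startswith(METADATA_PREFIXES):
        return "metadata file"
    return "unrecognized"

for name in ["DA-heart-rate-2020.csv", "SDD-heart-rate.xlsx", "notes.txt"]:
    print(f"{name}: {classify_submission(name)}")
```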

As described above, many distinct types of content can be submitted for ingestion into HADatAc. Submitted content, however, must follow the ordering constraints among file types specified in the ingestion workflow, which is described in Section 3.4.

Once study content has been ingested into the knowledge graph, the infrastructure can be used to search and browse that content, as described below.

Searching and Browsing the Knowledge Graph

HADatAc's data faceted search is the main mechanism for inspecting the overall content of the infrastructure and for selecting content across studies and instruments. Section 3.5. describes how to understand the available content and how to select content for eventual download; the data faceted search itself is described in Section 3.5.1. A data spatial search is also available when ingested data is semantically annotated with spatial properties; it is described in Section 3.5.2.

Metadata stored in HADatAc can be browsed and searched through the app. Section 3.6. describes these capabilities.

The actual knowledge graph of the infrastructure can be graphically inspected using the Browse Knowledge Graph capability described in Section 3.7.

Getting Data and Metadata Out

At any moment during a faceted search, users can download the content related to the current search. Section 3.9. describes the options for downloading the data selected under the current search.

Data and metadata stored in HADatAc can also be retrieved programmatically. Section 3.8. describes the available RESTful services. Data visualization frameworks, visual analytics frameworks, and dashboards backed by HADatAc are expected to use this API to retrieve content from the app.
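
As a minimal, hedged sketch of programmatic retrieval, the snippet below issues an HTTP GET against a running HADatAc instance. The base URL, endpoint path, and query parameter are assumptions made for illustration only; the actual RESTful services are listed in Section 3.8.

```python
# Minimal sketch of programmatic retrieval. The base URL, endpoint path, and
# query parameter below are assumptions for illustration; see Section 3.8 for
# the RESTful services actually exposed by HADatAc.
import requests

BASE_URL = "http://localhost:9000"            # assumed local deployment
ENDPOINT = f"{BASE_URL}/hadatac/api/values"   # hypothetical endpoint

response = requests.get(ENDPOINT, params={"study": "STD-EXAMPLE"}, timeout=30)
response.raise_for_status()

for record in response.json():                # assumes a JSON array response
    print(record)
```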


Further reading:

Software Architecture and Knowledge Specification

HADatAc is implemented as a web application. Section 4.1. describes how the main website is built in terms of software components. That section also describes which data repositories are used to manage data and metadata content, and which infrastructures are used to manage evolving ontologies.

HADatAc's knowledge graph is composed of a collection of ontologies and a knowledge base containing many instances of concepts defined in those ontologies. A collection of foundational ontologies, along with definitions of the concepts used to align the combined vocabulary of these ontologies, is embedded into the Human-Aware Science Ontology described in Section 4.2.

Metadata Files

Section 5 describes the five kinds of semantic metadata specifications used to describe a study's data content:

  • Deployment Description (DPL);
  • Study Description (STD);
  • Semantic Study Design (SSD);
  • Semantic Data Dictionary (SDD); and
  • Stream Specification (STR).

Data Owner Guide

  1. Installation
    1.1. Installing for Linux (Production)
    1.2. Installing for Linux (Development)
    1.3. Installing for MacOS (Development)
    1.4. Deploying with Docker (Production)
    1.5. Deploying with Docker (Development)
    1.6. Installing for Vagrant under Windows
    1.7. Upgrading
    1.8. Starting HADatAc
    1.9. Stopping HADatAc
  2. Setting Up
    2.1. Software Configuration
    2.2. Knowledge Graph Bootstrap
    2.2.1. Knowledge Graph
    2.2.2. Bootstrap without Labkey
    2.2.3. Bootstrap with Labkey
    2.3. Config Verification
  3. Using HADatAc
    3.1. Initial Page
    3.1.1. Home Button
    3.1.2. Sandbox Mode Button
    3.2. File Ingestion
    3.2.1. Ingesting Study Content
    3.2.2. Manual Submission of Files
    3.2.3. Automatic Submission of Files
    3.2.4. Data File Operations
    3.3. Manage Working Files
    3.3.1. [Create Empty Semantic File from Template]
    3.3.2. SDD Editor
    3.3.3. DD Editor
    3.4. Manage Metadata
    3.4.1. Manage Instrument Infrastructure
    3.4.2. Manage Deployments
    3.4.3. Manage Studies
    3.4.4. [Manage Object Collections]
    3.4.5. Manage Streams
    3.4.6. Manage Semantic Data Dictionaries
    3.4.7. Manage Indicators
    3.5. Data Search
    3.5.1. Data Faceted Search
    3.5.2. Data Spatial Search
    3.6. Metadata Browser and Search
    3.7. Knowledge Graph Browser
    3.8. API
    3.9. Data Download
  4. Software Architecture
    4.1. Software Components
    4.2. The Human-Aware Science Ontology (HAScO)
  5. Metadata Files
    5.1. Deployment Specification (DPL)
    5.2. Study Specification (STD)
    5.3. Semantic Study Design (SSD)
    5.4. Semantic Data Dictionary (SDD)
    5.5. Stream Specification (STR)
  6. Content Evolution
    6.1. Namespace List Update
    6.2. Ontology Update
    6.3. [DPL Update]
    6.4. [SSD Update]
    6.5. SDD Update
  7. Data Governance
    7.1. Access Network
    7.2. User Status, Categories and Access Permissions
    7.3. Data and Metadata Privacy
  8. HADatAc-Supported Projects
  9. Derived Products and Technologies
  10. Glossary