Skip to content

2.1. Software Configuration

Paulo Pinheiro edited this page Aug 14, 2020 · 45 revisions

Most of the software configuration described in this section is done by changing content inside of files of the HADatAc distribution. If the installation is in a local computer, the HADatAc distribution folder should be in the filesystem of the local computer. If the installation is in a remote computer, the HADatAc distribution folder should be in the filesystem of the local computer, that may require the use of some utility like SSH to gain access to the distribution folder. The "HADatAc distribution folder" is what is called "[HADatAc_folder]" in this documentation. The actual value of [HADatAc_folder] is a path in a filesystem of the machine hosting HADatAc, and it may look something like /home/data/git/hadatac/conf (this location in set during HADatAc's installation).

Assuming that HADatAc is installed at [HADatAc_folder], configuration files are located at [HADatAc_folder]/conf. The way content in [HADatAc_folder]/conf is used is different when using HADatAc in development mode and production mode.

In production, a software upgrade DOES NOT replace the content of [HADatAc_folder]/conf in the distribution because the upgrade should preserve current configuration. To change any configuration file, one needs to change the configuration file in both the current distribution and in "/data/conf" that should not be in "[HADatAc_folder]/conf".

In development mode, you may just need to update a few files, including hadatac.conf, and that change can be done directly in the local copy of the master branch. In case code is contributed back to the master branch, please do not add and commit changes to [HADatAc_folder]/conf.

2.1.1. Setting up hadatac.conf

This is the main configuration file and tells the system important information about how the webapp connects to SOLR and Blazegraph repositories, and what is going to be the URL of the webapp once it is deployed.

If you are a developer, using a local copy of HADatAc on your machine, and are able to call 'http://localhost:9000' to invoke HADatAc, you may not need to change this part of the configuration.

If you are deploying HADatAc on a server and you expect users to access HADAtAc over the web, you will need the parameters below according to your domain name and firewall/port restrictions.

HADatAc URL Configuration

  • the application's base host URL

    default value: host="http://localhost:9000"

  • the url that the application is deployed on

    default value: host_deploy="http://localhost:9000"

  • the base url that the application uses to send and receive email. This is the value prefixed to HADatAc services used to communicate with users during email authentication or password reset. Emails may be sent out if this value is not set properly, but the confirmation of any action embedded in email messages may not have any effect on HADatAc.

    default value: base_url="127.0.0.1:9000"

  • the kb's base host URL -- usually, the application's base host URL without any port information. This value is used to build absolute URLs from internal relative URLS, and it is used to inter-connect HADatAc functionalities including the URLs of internal proxies used by javascript code.

    default value: kb="http://localhost"

SOLR Configuration

  • HOME: the path in the file system where the SOLR instances are located

    default: home=/../hadatac/solr

  • URL for data collections

    default: data="http://127.0.0.1:8983/solr"

Blazegraph Configuration

  • URL used to retrieve content from a Blazegraph repository.

  • activity flags are used to verify if HADatAc knowledge base contains

    • concepts essential for supported scientific activities

    • use true for empirical activities involving the use of sensors

      empirical=true

    • use true for computational activities involving computational simulations

      computational=false

  • properties about community using current HADatAc installation

    • these properties are used to project customization of HADaAc installations

      default: fullname="Child Health Exposure Analysis Repository"

      default: shortname="CHEAR"

      default: description=""

      default: ont_prefix="chear"

Setting up BioPortal Searching

The SDD-editor can perform ontology class searches from within the browser using the BioPortal search engine if a user adds their api-key to the search.config file within hadatac. This section will cover acquiring an api-key and adding it to hadatac.

  1. Login or Signup for a BioPortal account: https://bioportal.bioontology.org/login

      Signing up is easy, simply provide your: name, desired account name, email, and desired password.

  1. Copy the api-key from your BioPortal account settings: https://bioportal.bioontology.org/account

      After signing in, go here and under your account information, you will be identified by your API Key, which is a 36 character string.

  1. Add your BioPortal api-key to Hadatac

      Open (hadatac install directory)/conf/hadatac.config in your favorite text editor, paste your api-key in the quotes of the line that says bioportal_api_key="", and save. After this addition, the line should look like:

bioportal_api_key="########-####-####-####-############"

Setting up SDD-Gen

The SDD-editor can suggest potential ontology class matches for various cells within an SDD using the inference engine SDD-Gen. To enable this feature simply add the address of SDD-Gen to (hadatac install directory)/conf/hadatac.config to the line that says sdd_gen_address="". After this addition, the line should look like:

sdd_gen_address="http://www.example.com:5000"

2.1.2. Setting up email configuration

You may not need to set up email configuration if you are using HADatAc for development purposes. This configuration is essential if you are planning to create users with authenticated access to the system. In this case, the email configuration will enable users to verify their emails and to request password reset.

Authentication is done through email verification, which requires HADatAc to communicate with users through emails. The configuration file is smtp.conf under [HADatAc_folder]/conf/play_authenticate. Instructions for filling out the configuration file are described inside of the file itself. Include your own email account (not the gmail account shown in the example below). The account can be any email that HADatAc can communicate with it. The name and password for the account in the snipet below are JUST EXAMPLES THAT NEED TO BE REPLACED.

play.mailer {
    # TODO: Disable this in production
    mock=false
    # SMTP server
    # (mandatory)
    # defaults to gmail
    host=smtp.gmail.com

    # SMTP port
    # defaults to 25
    port=465

    # Use SSL
    # for GMail, this should be set to true
    ssl=true

    # authentication user
    # Optional, comment this line if no auth
    # defaults to no auth
    user="[email protected]"

    # authentication password
    # Optional, comment this line to leave password blank
    # defaults to no password
    password="password"
}

2.1.3. Setting up application.conf

This is the main configuration file of Play Framework behind HADatAc. From this configuration file, Play Framework can locate other configuration files. The session configuration parameters and the java configuration parameters are the parameters that may be changed.

  • Deadbolt (do not change)

    required value: "play-authenticate/deadbolt.conf"

  • SMTP (do not change)

    required value: "play-authenticate/smtp.conf"

  • Play authenticate (do not change)

    required value: "play-authenticate/mine.conf"

  • HADatAc (do not change)

    required value: "hadatac.conf"

  • Session conf: specifies how long a session can last without any activity from the user.

    session.maxAge=1h play.http.session.maxAge=12h

  • java config: specifies the amount of memory allocated to a HADatAc instance.

    jvm.memory=-Xmx2048M -Xms2048M

2.1.4. Setting up autoccsv.config

autoccsc.config specifies the folders where files are stored when they are initially uploaded into HADatAc, and where they are stored after HADatAc has processed them (i.e., after HADatAc has "ingested" the contents of these files). Data Files and metadata files that are uploaded into HADatAc are managed as part of the overall content of the app.

IMPORTANT: the folder paths identified in this configuration file MUST BE OUTSIDE of the HADatAc distribution folder so that existing files are not affected during HADatAc's software upgrade.

  • The path of processed files. The path can be an absolute (when starting with "/") or relative to HADatAc's deployment location.
    path_proc=processed_csv/

  • The path of unprocessed files. The path can be an absolute (when starting with "/") or relative to HADatAc's deployment location.
    path_unproc=unprocessed_csv/

  • identifies whether the autoannotator is on or off by default.
    auto=on

2.1.5. Setting up namespaces.properties

This configuration file has a list of all the namespaces used in HADatAc's knowledge base. This list is cached inside of HADatAc when it is running, and HADatAc needs to be restarted for any change to this configuration file to take effect.

This list has three roles:

  • to indicate which ontologies should be loaded into HADatAc's knowledge graph
  • to indicate which namespaces may be used as a prefix during the execution of any SPARQL query against the repository
  • to indicate the relative priority of the domain ontologies

Every loaded ontology should be included in the list of SPARQL prefixes, but the content inside the knowledge graph may refer to terms from namespaces that are not necessarily loaded into the knowledge graph.

Each namespace has five attributes:

  • definition (mandatory): It is of the form [abbreviation]=[reference_URI] like xsd=http://www.w3.org/2001/XMLSchema# where [abbreviation] is xsd and [reference_URI] is http://www.w3.org/2001/XMLSchema#
  • mime type (only for loaded ontologies): this attribute identifies the encoding format of the ontology to be loaded. If the ontology is encoded in RDF/XML the value for this attribute should be application/rdf+xml. If the ontology is encoded in turtle, the value for this attribute should be text/turtle.
  • loading URL (only for loaded ontologies): this is a URL that should be resolvable on the web (it is suggested that the URL is tested in a browser to verify if the URL is resolvable).
  • ontology priority (optional for loaded ontologies): this is an integer greater than 0, which indicates the priority of an ontology for sdd-editor suggestions. The lower the number the higher the priority and the more likely the sdd-editor will suggest terms from that ontology.

2.1.7. Setting up template.conf

This file is used to define the name of the column headers for HADatAc's metadata file templates. Each one of these template files is composed of a fixed, ordered list of properties, and the column header of each one of these templates must be a column header defined in the vocabulary of this configuration file.

2.1.8. Creating Master User

Each HADatAc installation has one Master user. This user is created once, and the password used for the master user needs to be carefully noted.

The Master user is unique because of the following:

  • Every functional HADatAc installation has only one Master User
  • It is expected to be created once (it is possible to be recreated, but this is not desirable);
  • It does not need email verification (and thus, without the need of setting up an emailer to work with HADatAc)
  • It is already created with ADMIN permission; In summary, the Master User is everything needed for ONE USER to start using HADatAc functionalities in HADatAc.

It is assumed that a development installation of HADatAc only has the master user and no other.

How to Create a Master User
The user is created by selecting the "Sign Up" option in HADatAc's main page (on the very top of the page on the black navigation bar). The process is the same as the sign up of any other user. Once the Master user is created, the account is ready to be used. If a Master User is created, there is no way to create a new Master User, and any use of the Sign Up button will only be allowed to pre-registered users (see Section 6.2).

How to Reset the Password for the Master User
This option consists of erasing the following:

  • In Blazegraph: the content of the store_users. One easy way of erasing the content from store_users' namespace in Blazegraph is to erase the namespace itself and create it again.
  • In Solr, the content of the following tables under solr/solr-home: users, linked_account, and token_action. Each table in Solr is composed of two folders and one file: conf, data and core.properties. The content of the table is inside the data folder. In this case, it is possible to erase the data folder that is going to be recreated the next time you start Solr. Another option is to replace the entire solr/solr_home folder with its content from github.

2.1.9. Setting up Google Sign-In for Gmail Users

HADatAc currently supports authenticating users using their google accounts with Google Sign-In. To enable this feature, add the following snippet to the play-authenticate/mine.conf:

google {
    redirectUri {
        # Whether the redirect URI scheme should be HTTP or HTTPS (HTTP by default)
        secure=false

        # You can use this setting to override the automatic detection
        # of the host used for the redirect URI (helpful if your service is running behind a CDN for example)
        # host=yourdomain.com
    }

    # Google credentials
    # These are mandatory for using OAuth and need to be provided by you,
    # if you want to use Google as an authentication provider.
    # Get them here: https://code.google.com/apis/console
    clientId="974495099172-adsr39dq03fq875ti67p8tupfokr5jqe.apps.googleusercontent.com"
    clientSecret="xxxxx-xxxxxxxxxxxxxxxxxx"
}

and change the clientId and clientSecret accordingly. Please refer to https://code.google.com/apis/console for further information. Specifically, add http://127.0.0.1:9000/hadatac/authenticate/google for the Authorized redirect URIs.

Data Owner Guide

  1. Installation
    1.1. Installing for Linux (Production)
    1.2. Installing for Linux (Development)
    1.3. Installing for MacOS (Development)
    1.4. Deploying with Docker (Production)
    1.5. Deploying with Docker (Development)
    1.6. Installing for Vagrant under Windows
    1.7. Upgrading
    1.8. Starting HADatAc
    1.9. Stopping HADatAc
  2. Setting Up
    2.1. Software Configuration
    2.2. Knowledge Graph Bootstrap
    2.2.1. Knowledge Graph
    2.2.2. Bootstrap without Labkey
    2.2.3. Bootstrap with Labkey
    2.3. Config Verification
  3. Using HADatAc
    3.1. Initial Page
    3.1.1. Home Button
    3.1.2. Sandbox Mode Button
    3.2. File Ingestion
    3.2.1. Ingesting Study Content
    3.2.2. Manual Submission of Files
    3.2.3. Automatic Submission of Files
    3.2.4. Data File Operations
    3.3. Manage Working Files 3.3.1. [Create Empty Semantic File from Template]
    3.3.2. SDD Editor
    3.3.3. DD Editor
    3.4. Manage Metadata
    3.4.1. Manage Instrument Infrastructure
    3.4.2. Manage Deployments 3.4.3. Manage Studies
    3.4.4. [Manage Object Collections]
    3.4.5. Manage Streams
    3.4.6. Manage Semantic Data Dictionaries
    3.4.7. Manage Indicators
    3.5. Data Search
    3.5.1. Data Faceted Search
    3.5.2. Data Spatial Search
    3.6. Metadata Browser and Search
    3.7. Knowledge Graph Browser
    3.8. API
    3.9. Data Download
  4. Software Architecture
    4.1. Software Components
    4.2. The Human-Aware Science Ontology (HAScO)
  5. Metadata Files
    5.1. Deployment Specification (DPL)
    5.2. Study Specification (STD)
    5.3. Semantic Study Design (SSD)
    5.4. Semantic Data Dictionary (SDD)
    5.5. Stream Specification (STR)
  6. Content Evolution
    6.1. Namespace List Update
    6.2. Ontology Update
    6.3. [DPL Update]
    6.4. [SSD Update]
    6.5. SDD Update
  7. Data Governance
    7.1. Access Network
    7.2. User Status, Categories and Access Permissions
    7.3. Data and Metadata Privacy
  8. HADatAc-Supported Projects
  9. Derived Products and Technologies
  10. Glossary
Clone this wiki locally